x86 oddities
Updated Feb 26, 2012 by ange.alb...@gmail.com


This page enumerates various oddities of the x86/x64 architectures.

They are all implemented and tested in CoST.

general

register order

  • In 32b, each register has some unique role (unlike in 64b, where all registers of the r8-r15 group are equivalent, from a mnemonic perspective): LODSD reads from [ESI] to EAX, LOOP relies on ECX, XLAT uses EBX as a base, IN reads from port DX...
  • Registers are named with the letters A, B, C, D, but these stand for Accumulator, Base, Counter, Data, not for the letters themselves.
Their logical order is not the alphabetic one: in the CPU, they're encoded in the A, C, D, B order:
for example, INC EAX is encoded 40, INC ECX is encoded 41, and so on.
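
The encoding order can be checked with a small Python sketch. The table and the helper name inc_opcode are illustrative assumptions, not part of CoST; the sketch only relies on the one-byte 32-bit INC opcode being 40h plus the register number.

```python
# 32-bit register numbers as used in opcode encodings (illustrative table).
REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]

def inc_opcode(reg: str) -> int:
    """Return the one-byte 32-bit-mode encoding of INC <reg>: 40h + register number."""
    return 0x40 + REG32.index(reg)

# The first four registers come out in A, C, D, B order, not alphabetically.
for name in REG32[:4]:
    print(f"inc {name}: {inc_opcode(name):02x}")
```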

instruction length

An instruction is limited to 15 bytes on recent CPUs (it changed over time):

for example, while a nop preceded by 14 useless prefixes is valid,

66 66 66 66 66 66 66 66 66 66 66 66 66 66 90: nop

=> nothing
adding one more prefix will exceed the limit and trigger an exception:
66 66 66 66 66 66 66 66 66 66 66 66 66 66 66 90: ??

=> exception

However, it's possible to reach that limit exactly with legitimate operations:

  • in 64 bits:

2e 67 f0 48 818480 23df067e 89abcdef: lock add qword cs:[eax + 4 * eax + 07e06df23h], 0efcdab89h

(the same with more details):

2e         cs: segment override
67         address-size override (e?x instead of r?x in the address)
f0         lock
48         REX.W: qword operand
818480     add [?ax + 4 * ?ax + disp32], imm32
23df067e   disp32 = 07e06df23h
89abcdef   imm32 = 0efcdab89h

  • it's also possible in 16 bits:

f0 2e 66 67 818418 67452301 efcdab89: lock add dword cs:[eax + ebx + 001234567h], 089abcdefh

VirtualPC has been known to incorrectly ignore the 15-byte limit.
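
As a quick sanity check, the byte counts of the dumps above can be verified with a few lines of Python. This is only a sketch: insn_length and fits_limit are hypothetical helper names, and the code counts bytes without decoding anything.

```python
MAX_INSN_LEN = 15  # limit on recent CPUs, as stated above

def insn_length(hex_bytes: str) -> int:
    """Number of bytes in a hex dump such as '66 90' (spaces are ignored)."""
    return len(bytes.fromhex(hex_bytes))

def fits_limit(hex_bytes: str) -> bool:
    """True if the dump is no longer than the 15-byte limit."""
    return insn_length(hex_bytes) <= MAX_INSN_LEN

valid_nop = "66 " * 14 + "90"      # 14 prefixes + nop
too_long_nop = "66 " * 15 + "90"   # one prefix too many
long_add_64 = "2e 67 f0 48 818480 23df067e 89abcdef"
long_add_16 = "f0 2e 66 67 818418 67452301 efcdab89"

for dump in (valid_nop, too_long_nop, long_add_64, long_add_16):
    print(insn_length(dump), fits_limit(dump))
```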

mnemonic length

  • or, in, jz/jp/js/jo, bt are the shortest mnemonics
  • maskmovdqu, vbroadcast, vzeroupper, vfnmadd132pd, vbroadcastf128 are long names...
    • but aeskeygenassist beats them all.

registers

MMX and FPU

MMX and FPU registers are overlapping, but in opposite directions: 0, 1,2,3... mapped to 7,6,5...

Thus, a single FPU operation on ST0 will modify FST, ST0, but also MM7 (and CR0, under XP).

d9eb: fldpi

=> fst = 03800h
   st0 = 04000c90fdaa22168c235h
   mm7 =     0c90fdaa22168c235h
   cr0 = 080010031h (under XP)
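
The values above can be decoded with a short Python sketch. It assumes the usual layout of the FPU status word (the TOP field sits in bits 11-13) and that an MMX register aliases the low 64 bits of the 80-bit FPU register; the helper names are illustrative.

```python
def fpu_top(fst: int) -> int:
    """Extract the TOP field (bits 11-13) from the FPU status word."""
    return (fst >> 11) & 7

def mmx_view(st_value: int) -> int:
    """Low 64 bits of an 80-bit FPU register, as seen through its MMX alias."""
    return st_value & 0xFFFFFFFFFFFFFFFF

fst = 0x3800
st0 = 0x4000C90FDAA22168C235  # pi, as loaded by fldpi

print(fpu_top(fst))        # physical register that currently holds ST0
print(hex(mmx_view(st0)))  # what the aliased MMX register contains
```

With fst = 03800h, TOP is 7, which is why the value pushed by fldpi shows up in MM7.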

GS

On 32bit Windows, GS is not saved in the execution context: when the OS switches from one application to another, the content of GS is lost. This can be used as an anti-emulator or anti-stepping trick: after some time of execution, GS will eventually be reset:

  1. set GS to X
  2. wait until GS is null <== thread switch eventually happens, and resets GS
  3. execution resumes here

os/tool/vm detection

At any defined point of execution (EntryPoint, DllMain, TLS...), registers might have different values, depending on the OS.

  • tools like packers (such as UPX) or debuggers (such as OllyDbg) might also alter these values.

And, at any point of execution:

  • smsw, sidt, str, sgdt will return different values depending on the OS.
  • sldt, lsl, str might return different values if execution takes place in a virtual machine.

These values are currently being collected in the Initial Values page.

specific

nop

nop is an alias mnemonic of 90: xchg *ax, *ax (which does nothing, as no flags are affected by xchg): the whole 90-97 range is actually xchg *ax, <reg32>.

90: xchg eax, eax

=> eax, eax = eax, eax ;)
91: xchg ecx, eax

=> ecx, eax = eax, ecx

However, xchg *ax, *ax has another encoding, which is not considered a nop. And, on 64 bits, it clears the upper 32 bits of rax. So, not all xchg *ax, *ax are nops.

87c0: xchg eax, eax

=> rax = eax

Fortunately, 90 is truly a nop, even in 64 bits.

xchg/xadd

xchg, xadd are opcodes that affect both source and target operands (like fxch).

Moreover, they can operate on different parts of the same register, which has the potential to break trivial logic analyzers:

0f c0c4: xadd ah, al

=> ah, al = al + ah, ah
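
A minimal Python model of this case, assuming only the data flow matters (flags are ignored); xadd_ah_al is a hypothetical helper name.

```python
def xadd_ah_al(ax: int) -> int:
    """Model `xadd ah, al` on a 16-bit AX value (flags are not modelled)."""
    ah, al = (ax >> 8) & 0xFF, ax & 0xFF
    new_ah = (ah + al) & 0xFF  # destination (ah) receives the sum
    new_al = ah                # source (al) receives the old destination
    return (new_ah << 8) | new_al

print(hex(xadd_ah_al(0x0102)))
```

Both halves of AX change in a single instruction, which is what an analyzer tracking AX as one unit can get wrong.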

aad

aad is officially defined to use only 10/0Ah as a default operand, but it accepts any other byte operand.

This makes it the first multiply-and-add opcode, as al = ah * operand + al (and ah is cleared).

ax = 325h

d507: aad 7

=> ax = 3 * 7 + 25h = 3ah

aam

Similar logic for aam:

  • it's officially defined with 10/0ah, but it just works with any byte.
  • it's a division, and quotient and remainder go to ah and al respectively.

al = 3ah

d403: aam 3

=> ah = 3ah / 3 = 13h
   al = 3ah % 3 = 1
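
Both examples can be reproduced with a small Python sketch. The function names are illustrative, flags are not modelled, and the zero-divisor case (which raises a divide error on a real CPU) is only mirrored by a Python exception.

```python
def aad(ax: int, imm: int = 10) -> int:
    """Model `aad imm`: al = ah * imm + al, ah = 0. Returns the new AX."""
    ah, al = (ax >> 8) & 0xFF, ax & 0xFF
    return (ah * imm + al) & 0xFF

def aam(al: int, imm: int = 10) -> int:
    """Model `aam imm`: ah = al / imm, al = al % imm. Returns the new AX."""
    if imm == 0:
        raise ZeroDivisionError("aam 0 raises a divide error")
    al &= 0xFF
    return ((al // imm) << 8) | (al % imm)

print(hex(aad(0x325, 7)))  # aad 7 with ax = 325h
print(hex(aam(0x3A, 3)))   # aam 3 with al = 3ah
```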

bswap

bswap is officially undefined on WORDS. In reality, it just clears the register, unexpectedly.

66 0fc8: bswap ax

=> ax = 0

cmpxchg*

  • lock cmpxchg8b doesn't crash CPUs anymore, but some tools might still show obsolete warnings about it.

  • for bus optimization purposes, all 3 cmpxchg opcodes always write the operand, even if the values are unchanged: this could trigger an exception on read-only memory.

crc32

The crc32 opcode implements the full algorithm with a single operation. However, it's not the commonly used CRC32 (used in Zip), but CRC-32C (Castagnoli CRC-32), which uses a different polynomial.

While it's technically the same algorithm as the 'common' CRC32, the different polynomial makes it return different results, so it's useless for Zip and the countless other formats that rely on the standard CRC32.

eax = 0abcdef9h
ebx = 12345678h

f2 0f 38f1c3: crc32 eax, ebx

=> eax = 0c0c38ce0h

It's still usable independently as a checksum, and is actually used in network protocols such as iSCSI and SCTP. It's actually more efficient than the standard CRC32 (when used for recovery purposes), but it's just incompatible.
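
The incompatibility can be illustrated with a bitwise Python sketch. It assumes the usual software model of the instruction: the reflected Castagnoli polynomial 82F63B78h, source bytes folded in little-endian order, and no initial or final XOR inside the instruction itself. crc32_insn and crc32c are hypothetical names, and zlib.crc32 stands in for the 'common' CRC32.

```python
import zlib

CRC32C_POLY_REFLECTED = 0x82F63B78  # Castagnoli polynomial 1EDC6F41h, bit-reflected

def crc32_insn(crc: int, value: int, size: int = 4) -> int:
    """Model `crc32 r32, r/m`: fold `size` little-endian bytes of `value` into `crc`."""
    for byte in value.to_bytes(size, "little"):
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ (CRC32C_POLY_REFLECTED if crc & 1 else 0)
    return crc & 0xFFFFFFFF

def crc32c(data: bytes) -> int:
    """Complete CRC-32C: initial value and final XOR of FFFFFFFFh around the raw steps."""
    crc = 0xFFFFFFFF
    for b in data:
        crc = crc32_insn(crc, b, size=1)
    return crc ^ 0xFFFFFFFF

check = b"123456789"
print(hex(crc32c(check)))      # CRC-32C of the standard check string
print(hex(zlib.crc32(check)))  # 'common' CRC32 of the same data

# Cross-check of the register example above (eax = seed, ebx = data).
print(hex(crc32_insn(0x0ABCDEF9, 0x12345678)))
```

The two checksums of the same input differ, which is why the opcode cannot be used to validate Zip data.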

mov

  • mov to/from control and debug registers ignores the mod bits of the modRM byte (the field that specifies whether the operation is done on a register or on memory).

0f 2000: mov eax, cr0

by the usual standards, it should have been decoded as mov [eax], cr0 instead, which would be invalid.

  • mov <reg32>, <selector> is officially MOV (not MOVZX), yet it modifies the full DWORD from a WORD. The upper word is officially undefined on the Pentium and older CPUs (although it's zero on a Pentium anyway). In any case, the upper word is null on modern CPUs.

8cc8: mov eax, cs

=> eax = 0000001bh (xp)

push

Even though selectors are WORD-sized registers, like standard 16-bit registers such as AX, they're not pushed on the stack the same way.

1e: push ds

=> esp = esp - 4
   word ptr [esp] = ds
66 50: push ax

=> esp = esp - 2
   word ptr [esp] = ax

In both cases, no other word is changed: push ds reserves a DWORD on the stack but only writes its lower WORD.

movbe

  • movbe (MOV Big Endian) is a recent opcode equivalent to mov + bswap, but only to/from memory.

[ebx] = 011223344h

0f 38f003: movbe eax, [ebx]

=> eax = 044332211h

  • unlike bswap, it's able to work with a WORD without resetting it.
  • At the time of writing, it's found only on Atom CPUs. Thus, netbooks support it, but not powerful CPUs such as i7.
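
The load above can be modelled in Python by reading the same memory bytes with the opposite byte order. This is only a sketch; movbe32 and movbe16 are hypothetical helper names.

```python
def movbe32(memory: bytes) -> int:
    """Model `movbe r32, m32`: read 4 bytes and interpret them as big-endian."""
    return int.from_bytes(memory[:4], "big")

def movbe16(memory: bytes) -> int:
    """Model `movbe r16, m16`: the WORD form also swaps, without clearing anything."""
    return int.from_bytes(memory[:2], "big")

# How [ebx] = 11223344h is laid out in (little-endian) memory.
mem = (0x11223344).to_bytes(4, "little")

print(hex(movbe32(mem)))
print(hex(movbe16(mem)))
```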

lzcnt

lzcnt (Leading Zero CouNT) is an opcode created in 2007, only supported by AMD in their Barcelona architecture and later (it's planned in Intel Haswell for 2013, along with its counterpart tzcnt).

Recent opcodes would usually trigger an exception when executed on a CPU not supporting them.

However, this one is mapped on 0fbd: bsr (Bit Scan Reverse) with an f3 prefix, so it will not trigger any exception on a CPU that doesn't support it:

  1. it will just execute bsr and ignore the prefix.
  2. bsr and lzcnt work on the same register, and have the same instruction length, so the same target register will be modified, and the next instruction will be the same. Thus, only the target register and flags might be different.

if you execute:

ecx = 35abc80eh (00110101101010111100100000001110b)

f3 0f bdc1:

if lzcnt is supported by the CPU:

f3 0f bdc1: lzcnt eax, ecx

=> eax = 2

if not:

f3         <== ignored prefix
   0f bdc1: bsr eax, ecx

=> eax = 1dh

It makes lzcnt an odd exception-less AMD detector (for now). Besides, with a null source, lzcnt returns the operand size (32 here), while bsr leaves the target unmodified.
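
The two possible outcomes of f3 0f bdc1 can be compared with a Python sketch. It assumes the documented lzcnt behaviour (a zero source yields the operand size) and the commonly observed bsr behaviour (a zero source leaves the destination untouched); flags are not modelled and the function names are illustrative.

```python
def lzcnt32(src: int) -> int:
    """Model `lzcnt r32, r/m32`: number of leading zero bits, 32 for a zero source."""
    return 32 - (src & 0xFFFFFFFF).bit_length()

def bsr32(src: int, dest: int) -> int:
    """Model `bsr r32, r/m32`: index of the highest set bit, dest kept if src is 0."""
    src &= 0xFFFFFFFF
    return dest if src == 0 else src.bit_length() - 1

ecx = 0x35ABC80E
print(hex(lzcnt32(ecx)))            # result on a CPU that supports lzcnt
print(hex(bsr32(ecx, 0)))           # result when the f3 prefix is ignored
print(lzcnt32(0), hex(bsr32(0, 0x1234)))  # behaviour with a null source
```

For any non-zero source, the two results add up to 31, so a single comparison is enough to tell which instruction actually ran.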

sal

Shift Arithmetic Left (the opcode with modRM reg field 110) is identical to SHL (modRM reg field 100), and is usually encoded directly as SHL: assemblers always generate the SHL opcode, so the SAL encoding is sometimes totally ignored by disassemblers/emulators/...

al = 1010b

c0f0 02: sal al, 2

=> al = 101000b

It's informally called SAL, because it's technically a different opcode (in hex), but functionally, it's the same as SHL.

modRM   100  101  110    111
opcode  SHL  SHR  'SAL'  SAR

salc

  • salc is sometimes written setalc
  • it stands for Set AL on Carry
  • it's undocumented by Intel - but not by AMD, and it's unexpectedly supported by Intel's public tools.
  • it's a one-byte equivalent of 1ac0: sbb al, al
    • al = cf ? -1 : 0
f9: stc
d6: salc

=> al = -1
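
The equivalence with sbb al, al can be checked with a tiny Python model, written under the assumption that only AL matters (salc leaves the flags alone, while sbb updates them). The helper names are illustrative.

```python
def salc(cf: int) -> int:
    """Model `salc`: al = FFh if the carry flag is set, else 0."""
    return 0xFF if cf else 0x00

def sbb_al_al(al: int, cf: int) -> int:
    """Model the AL result of `sbb al, al`: al - al - cf, truncated to 8 bits."""
    return (al - al - cf) & 0xFF

for cf in (0, 1):
    print(cf, hex(salc(cf)), hex(sbb_al_al(0x42, cf)))
```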

lock

lock: works only on memory targets:

  • f0 0100: lock:add [eax], eax is valid.
  • f0 01c0: lock:add eax, eax and f0 0300: lock:add eax, [eax] trigger exceptions.

and on the following opcodes:

  • adc, add, and, or, sbb, sub, xor, dec, inc, neg, not
  • cmpxchg, cmpxchg8b
  • btr, bts, btc
    • f0 0fa300: lock:bt [eax], eax does trigger an exception.
  • xadd, xchg (xchg with a memory operand is already atomic, so lock: is superfluous there)
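
The two rules above (memory target, and a fixed set of opcodes) can be summarized in a Python sketch. It is a simplification under stated assumptions: the set only contains the mnemonics listed here, and the destination kind is passed in by hand instead of being decoded from bytes. lock_is_valid is a hypothetical name.

```python
# Mnemonics that accept a lock: prefix, as listed above.
LOCKABLE = {
    "adc", "add", "and", "or", "sbb", "sub", "xor", "dec", "inc", "neg", "not",
    "cmpxchg", "cmpxchg8b", "btr", "bts", "btc", "xadd", "xchg",
}

def lock_is_valid(mnemonic: str, dest_is_memory: bool) -> bool:
    """True if lock: is accepted: lockable opcode AND a memory destination."""
    return dest_is_memory and mnemonic in LOCKABLE

print(lock_is_valid("add", True))   # lock add [eax], eax
print(lock_is_valid("add", False))  # lock add eax, eax / lock add eax, [eax]
print(lock_is_valid("bt", True))    # lock bt [eax], eax
```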

XP bug

lock: is wrongly parsed by Windows XP:

  1. Upon an exception, XP tries to determine whether it should be an INVALID LOCK SEQUENCE or just an ILLEGAL INSTRUCTION
  2. but it checks too briefly for an F0 byte: in the case of FEF0, which is just undefined, an INVALID LOCK SEQUENCE is still triggered by XP even if, in this case, it has nothing to do with a lock prefix (for reference, FEC0 decodes as inc al)

Windows 7 just avoids the problem altogether by triggering an ILLEGAL INSTRUCTION on all invalid opcodes, no matter what, including invalid use of the LOCK: prefix. No parsing, no mistake!

fef0: ??

=> INVALID LOCK SEQUENCE (XP, bug)
   ILLEGAL INSTRUCTION (W7)

smsw

  • returns CR0 value (WORD or DWORD)
  • unprivileged, unlike mov eax, cr0: it's an old 286 instruction, while mov cr0 is only present in 386 and later.
  • upper bits are officially undefined, but in reality, they're just CR0 contents.
  • 0f 01e0: smsw eax
    
    => eax = 8001003b (XP)
  • since CR0 is influenced by other events (FPU) under XP, it makes it a tough anti-emulator.
  • smsw is defined on DWORD or WORD on registers, but always on WORD in memory(see below).

str/sldt

Like smsw, they work on DWORD or WORD on registers, but only on WORD in memory.

   0f 00c8: str eax

=> eax = 00000028h (XP)
66 0f 00c8: str ax

=> ax  =     0028h (XP)
   0f 0008: str [eax]

=> word ptr [eax] = 0028h (XP)

it's the same for sldt.

test

test <r32>, <imm32> has an alternate encoding (F7 /1, next to the usual F7 /0) that is sometimes forgotten, as it's never generated by compilers or assemblers.

f7c8 44332211: test eax, 11223344h

IceBP

  • like salc, IceBP is undocumented by Intel, but not by AMD, and supported by Intel tools.
  • it stands for In-Circuit Emulator Breakpoint.
  • it's unprivileged.
  • it triggers a SINGLE STEP exception, after execution.
  • it's sometimes written Int1, as it's the stepping interrupt, but executing CD 01:Int 1 doesn't trigger SINGLE STEP.
f1: IceBp

=> SINGLE STEP (80000004h) exception

rdtscp

rdtscp is a recent opcode that just returns the usual rdtsc result to eax/edx, and also changes ECX: it's loaded with the low-order 32-bits of IA32_TSC_AUX MSR ... which means most of the time, 0.

0f 01f9: rdtscp

=> edx:eax = <rdtsc>
   ecx = 0

hint nop

  • hint nop is officially documented by Intel as opcode 0f 1f, but it's actually available on range 0f 19-1f.
  • as one would expect from a nop, it never triggers an exception, even when referencing an invalid address.
0f1980 00000080: nop [eax + 80000000h]

=> nothing
  • but, of course, if the bytes of the instruction itself (including its encoded operand) lie on an invalid page, it can still trigger an exception.

branch hints

  • branch hints are officially defined to give hints to the CPU on whether a branch is likely to be taken or not.
  • they are supposedly generated by compilers, but there is no official way to assemble or disassemble them.
  • they are re-using the 2e/3e bytes, which are mapped to CS: and DS: prefixes.

16b flow

  • call, jumps, return, loops can either jump to 32b or 16b via the 66: prefix.
  • there is no official way to disassemble a return to word : small retn, retn word, retn.w...
68 00104000: push 401000h
66 c3:       retn

=> eip = 00001000h
   esp = esp - 2
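
The stack effect of this pair can be traced with a toy Python model. The flat bytearray stack, its size, and the helper names are assumptions made for illustration only.

```python
def push32(stack: bytearray, esp: int, value: int) -> int:
    """Model `push imm32`: esp -= 4, then store the DWORD little-endian."""
    esp -= 4
    stack[esp:esp + 4] = value.to_bytes(4, "little")
    return esp

def retn16(stack: bytearray, esp: int):
    """Model `66 c3`: pop only a WORD into the instruction pointer."""
    eip = int.from_bytes(stack[esp:esp + 2], "little")
    return eip, esp + 2

stack = bytearray(16)
start_esp = len(stack)

esp = push32(stack, start_esp, 0x401000)
eip, esp = retn16(stack, esp)

print(hex(eip), esp - start_esp)  # new eip, and net change of esp
```

Only the low word of the pushed address is consumed, so the upper word stays on the stack and esp ends up 2 bytes below its starting value.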

obsolete opcodes

There are many opcodes that are never (or only in extreme cases) generated by compilers nowadays, yet still fully work on modern CPUs. The list is long: xadd, aaa, daa, aas, das, aad, aam, l*s, bound, arpl, xlatb, lar, verr*, cmpxchg*, lsl...

For example, here is some code, fully working on a modern CPU, but obfuscated by its obsolescence:

into
bound eax, [edx]
verr cx
lar eax, ecx
str edx
aaa
lsl eax, ecx
sfence
arpl cx, ax
aam
bswap ecx
lock cmpxchg8b [esi]
lds ebx, [esi]
xlatb
daa
xadd ecx, eax
prefetch [eax]

future opcodes

Intel Haswell will introduce very useful opcodes (on general registers) such as:

  • andn:

andn eax, ebx, ecx

=> eax = ~ebx & ecx

which is functionally equivalent to 8086 instructions (from 1978):

89d8 mov eax, ebx
f7d0 not eax
21c8 and eax, ecx
  • mulx, rorx, sarx, shlx, shrx will do the same as their cousins from the late 70's, but without affecting the flags.
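
The equivalence between andn and the three-instruction sequence (mov, not, and with ecx, matching the 21c8 encoding) can be checked with a Python sketch on 32-bit values. Flags are not modelled and the function names are illustrative.

```python
MASK32 = 0xFFFFFFFF

def andn(src1: int, src2: int) -> int:
    """Model `andn dest, src1, src2`: dest = ~src1 & src2 (32-bit)."""
    return (~src1 & src2) & MASK32

def andn_legacy(ebx: int, ecx: int) -> int:
    """Same result with the three classic instructions."""
    eax = ebx & MASK32     # mov eax, ebx
    eax = ~eax & MASK32    # not eax
    eax &= ecx & MASK32    # and eax, ecx
    return eax

ebx, ecx = 0xF0F0F0F0, 0xFFFF0000
print(hex(andn(ebx, ecx)), hex(andn_legacy(ebx, ecx)))
```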

x64

32 bits zero extending

In 64 bits, operations on 32-bit registers zero-extend their result into the full 64-bit register.

thus, while

   fec0: inc al
66 ffc0: inc ax
48 ffc0: inc rax

all do what you would expect,

on the other hand,

   ffc0: inc eax

resets the upper 32 bits of RAX.
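
The difference between operand sizes can be modelled in Python. This sketch assumes the standard 64-bit rule (8-bit and 16-bit writes preserve the rest of the register, 32-bit writes clear the upper half) and uses illustrative helper names.

```python
MASK64 = 0xFFFFFFFFFFFFFFFF

def write_reg(rax: int, value: int, bits: int) -> int:
    """Write `value` to the low `bits` of RAX, following the 64-bit mode rules."""
    if bits == 64:
        return value & MASK64
    if bits == 32:
        return value & 0xFFFFFFFF            # upper 32 bits are cleared
    mask = (1 << bits) - 1                   # 8 or 16 bits: the rest is preserved
    return (rax & ~mask & MASK64) | (value & mask)

def inc(rax: int, bits: int) -> int:
    """Model `inc` on the low `bits` of RAX (flags are not modelled)."""
    mask = (1 << bits) - 1
    return write_reg(rax, ((rax & mask) + 1) & mask, bits)

rax = 0x1122334455667788
for bits in (8, 16, 32, 64):
    print(bits, hex(inc(rax, bits)))
```

Only the 32-bit case loses the upper half of the register.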

switching between 32b and 64b modes

On a 64-bit CPU, the CPU can just change from/to 32b mode by jumping to a properly defined selector. In short, changing the number of bits just means jumping to a different value of CS.

For example, in a 64b version of Windows, selector 33h is for 64b. Jumping to it from a 32b process, then jumping back, will switch to 64b, then back to 32b. It's as simple as that.

    <32b>
call far 33h:_64b
    <32b>

_64b:
    <64b>
    ...
    retf

32+64

Since there are some opcodes specific to 32 bits mode (arpl, ...), and others specific to 64 bits mode (movsxd, ...), the same hex data can lead to completely different disassembly, just because CS is different at the start.
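
Opcode 63 is a convenient illustration, since it is arpl in 32 bits mode and movsxd in 64 bits mode. The Python sketch below decodes only the register-to-register form, without prefixes; the decoder, its name, and its output format are assumptions made for illustration, not the behaviour of any specific disassembler.

```python
REG16 = ["ax", "cx", "dx", "bx", "sp", "bp", "si", "di"]
REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]

def decode_63(modrm: int, mode_bits: int) -> str:
    """Decode `63 <modrm>` (register form only) for the given CPU mode."""
    if modrm >> 6 != 3:
        raise ValueError("only register-to-register forms are handled here")
    reg, rm = (modrm >> 3) & 7, modrm & 7
    if mode_bits == 32:
        return f"arpl {REG16[rm]}, {REG16[reg]}"      # ARPL r/m16, r16
    if mode_bits == 64:
        return f"movsxd {REG32[reg]}, {REG32[rm]}"    # MOVSXD r32, r/m32 (no REX.W)
    raise ValueError("mode_bits must be 32 or 64")

# The same two bytes, 63 c8, read under both values of CS.
print(decode_63(0xC8, 32))
print(decode_63(0xC8, 64))
```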

acknowledgements

  • Peter Ferrie
  • BeatriX
  • Czerno
  • Eugeny Suslikov
  • Gil Dabah
  • Guillaume Delugré
  • Igor Skochinsky
  • Jean-Baptiste Bédrune
  • Jim Leonard
  • Jon Larimer
  • Moritz Kroll
  • Oleh Yuschuk
  • Sebastian Biallas
  • Yoann Guillot
