CIS-77 Home http://www.c-jump.com/CIS77/CIS77syllabus.htm
Encoding Real x86 Instructions
- Encoding Real x86 Instructions
- x86 Instructions Overview
- x86 Instruction Format Reference
- x86 Opcode Sizes
- x86 ADD Instruction Opcode
- Encoding x86 Instruction Operands, MOD-REG-R/M Byte
- General-Purpose Registers
- REG Field of the MOD-REG-R/M Byte
- MOD R/M Byte and Addressing Modes
- SIB (Scaled Index Byte) Layout
- Scaled Indexed Addressing Mode
- Encoding ADD Instruction Example
- Encoding ADD CL, AL Instruction
- Encoding ADD ECX, EAX Instruction
- Encoding ADD EDX, DISPLACEMENT Instruction
- Encoding ADD EDI, [EBX] Instruction
- Encoding ADD EAX, [ ESI + disp8 ] Instruction
- Encoding ADD EBX, [ EBP + disp32 ] Instruction
- Encoding ADD EBP, [ disp32 + EAX*1 ] Instruction
- Encoding ADD ECX, [ EBX + EDI*4 ] Instruction
- Encoding ADD Immediate Instruction
- Encoding Eight, Sixteen, and Thirty-Two Bit Operands
- Encoding Sixteen Bit Operands
- x86 Instruction Prefix Bytes
- Alternate Encodings for Instructions
- x86 Opcode Summary
- MOD-REG-R/M Byte Summary
- ISA Design Considerations
- ISA Design Challenges
- Intel Architecture Software Developer's Manual
- Intel Instruction Set Reference (Volume2)
- Chapter 3 of Intel Instruction Set Reference
- Intel Reference Opcode Bytes
- Intel Reference Opcode Bytes, Cont.
- Intel Reference Opcode Bytes, Cont.
- Intel Reference Opcode Bytes, Cont.
- Intel Reference Opcode Bytes, Cont.
- Intel Reference Opcode Bytes, Cont.
- Intel Reference Instruction Column
1. Encoding Real x86 Instructions
-
It is time to take a look at the actual machine instruction format of the x86 CPU family.
-
They don't call the x86 CPU a Complex Instruction Set Computer (CISC) for nothing!
-
Although more complex instruction encodings exist, no one is going to challenge that the x86 has a complex instruction encoding:
2. x86 Instructions Overview
3. x86 Instruction Format Reference
4. x86 Opcode Sizes
-
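The x86 opcode bytes are the 8-bit equivalents of the iii field that we discussed in the simplified encoding.
-
Counting both the one-byte opcodes and the two-byte opcodes (those that begin with the 0Fh escape byte), this provides for up to 512 different instruction classes, although the x86 does not yet use them all.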
5. x86 ADD Instruction Opcode
6. Encoding x86 Instruction Operands, MOD-REG-R/M Byte
-
The MOD-REG-R/M byte specifies instruction operands and their addressing mode(*):
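The byte layout can be sketched in a few lines of Python, assuming the usual field positions (MOD in bits 7:6, REG in bits 5:3, R/M in bits 2:0). The helper name `modrm` is only an illustrative choice, not part of any assembler API.

```python
def modrm(mod: int, reg: int, rm: int) -> int:
    """Pack MOD (bits 7:6), REG (bits 5:3) and R/M (bits 2:0) into one byte."""
    if not (0 <= mod <= 3 and 0 <= reg <= 7 and 0 <= rm <= 7):
        raise ValueError("field out of range")
    return (mod << 6) | (reg << 3) | rm

# MOD = 11 (register operand), REG = 000, R/M = 001
print(f"{modrm(0b11, 0b000, 0b001):02X}")  # C1
```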
7. General-Purpose Registers
-
Similarly, AL, CL, DL, and BL represent the low-order 8 bits of EAX, ECX, EDX, and EBX, respectively.
8. REG Field of the MOD-REG-R/M Byte
-
___________
-
(*) For certain (often single-operand or immediate-operand) instructions, the REG field may contain an opcode extension rather than the register bits. In such a case, the R/M field specifies the operand.
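As a rough illustration of how a 3-bit register code maps to a register name, the lookup below uses the standard x86 register numbering (the same table that appears later in these notes). The names `REGISTERS` and `reg_code` are hypothetical helpers for this sketch.

```python
# Register names indexed by their 3-bit code, grouped by operand size.
REGISTERS = {
    8:  ["al", "cl", "dl", "bl", "ah", "ch", "dh", "bh"],
    16: ["ax", "cx", "dx", "bx", "sp", "bp", "si", "di"],
    32: ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"],
}

def reg_code(name: str) -> int:
    """Return the 3-bit code used in the REG (or R/M) field for a register."""
    lowered = name.lower()
    for names in REGISTERS.values():
        if lowered in names:
            return names.index(lowered)
    raise ValueError(f"unknown register: {name}")

print(reg_code("cl"), reg_code("ecx"), reg_code("edi"))  # 1 1 7
```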
9. MOD R/M Byte and Addressing Modes
10. SIB (Scaled Index Byte) Layout
11. Scaled Indexed Addressing Mode
12. Encoding ADD Instruction Example
-
The ADD opcode can be decimal 0, 1, 2, or 3, depending on the direction and size bits in the opcode:
-
-
How could we encode various forms of the ADD instruction using different addressing modes?
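The four opcode values can be derived with a tiny sketch, assuming the direction bit d is bit 1 of the opcode and the size bit s is bit 0. The function name `add_opcode` is illustrative only.

```python
def add_opcode(d: int, s: int) -> int:
    """Build the ADD opcode byte from the direction (d) and size (s) bits."""
    return (d << 1) | s

for d in (0, 1):
    for s in (0, 1):
        print(f"d={d} s={s} -> opcode {add_opcode(d, s):02X}h")
```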
13. Encoding ADD CL, AL Instruction
-
An interesting side effect of the direction bit and the MOD-REG-R/M byte organization: some instructions can have two different encodings, and both are legal!
-
For example, the encoding of
add cl, al
could be 00 C1 (if d = 0) or 02 C8 (if the d bit is set to 1).
-
This dual-encoding possibility applies to all instructions with two register operands.
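Both encodings of add cl, al can be reproduced with a short sketch. It assumes MOD = 11 (register operand), that REG holds the source when d = 0 and the destination when d = 1, and the register codes AL = 000 and CL = 001; the function name is hypothetical.

```python
def encode_add_reg_reg(dest: int, src: int, d: int, s: int) -> bytes:
    """Encode ADD dest, src for two register operands (MOD = 11)."""
    opcode = (d << 1) | s                      # ADD opcodes 00h..03h
    # d = 0: REG holds the source; d = 1: REG holds the destination.
    reg, rm = (dest, src) if d else (src, dest)
    return bytes([opcode, (0b11 << 6) | (reg << 3) | rm])

AL, CL = 0b000, 0b001
print(encode_add_reg_reg(CL, AL, d=0, s=0).hex(" "))  # 00 c1
print(encode_add_reg_reg(CL, AL, d=1, s=0).hex(" "))  # 02 c8
```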
14. Encoding ADD ECX, EAX Instruction
-
add ecx, eax
-
-
Note that we could also encode ADD ECX, EAX using the bytes 03 C8.
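Going the other way, a minimal decoder sketch shows that the byte pairs 01 C1 and 03 C8 describe the same instruction. It only handles the 32-bit register-to-register ADD forms (opcodes 01h and 03h with MOD = 11); `decode_add_reg_reg` is a hypothetical helper.

```python
REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]

def decode_add_reg_reg(code: bytes) -> str:
    """Decode a two-byte, 32-bit register-to-register ADD."""
    opcode, modrm = code
    if opcode not in (0x01, 0x03) or (modrm >> 6) != 0b11:
        raise ValueError("not a 32-bit register-to-register ADD")
    d = (opcode >> 1) & 1
    reg = (modrm >> 3) & 0b111
    rm = modrm & 0b111
    # d = 1: REG is the destination; d = 0: R/M is the destination.
    dest, src = (reg, rm) if d else (rm, reg)
    return f"add {REG32[dest]}, {REG32[src]}"

print(decode_add_reg_reg(bytes.fromhex("01c1")))  # add ecx, eax
print(decode_add_reg_reg(bytes.fromhex("03c8")))  # add ecx, eax
```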
15. Encoding ADD EDX, DISPLACEMENT Instruction
-
Encoding the ADD EDX, DISP Instruction:
add edx, disp
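A sketch of this encoding, assuming 32-bit addressing where MOD = 00 with R/M = 101 selects a displacement-only operand, and using opcode 03h (d = 1, s = 1). The function name and the sample address are made up for illustration.

```python
import struct

def encode_add_edx_disp32(disp: int) -> bytes:
    """Encode ADD EDX, [disp32] as opcode, MOD-REG-R/M, little-endian disp32."""
    opcode = 0x03                                   # d = 1, s = 1
    modrm = (0b00 << 6) | (0b010 << 3) | 0b101      # REG = EDX, R/M = disp32 only
    return bytes([opcode, modrm]) + struct.pack("<I", disp)

print(encode_add_edx_disp32(0x12345678).hex(" "))  # 03 15 78 56 34 12
```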
16. Encoding ADD EDI, [EBX] Instruction
-
Encoding the ADD EDI, [ EBX ] instruction:
add edi, [ebx]
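Under the same assumptions (opcode 03h, MOD = 00 for register-indirect addressing with no displacement), the two bytes of this instruction can be computed as follows; the variable names are illustrative.

```python
EDI, EBX = 0b111, 0b011

opcode = 0x03                              # ADD with d = 1, s = 1
modrm = (0b00 << 6) | (EDI << 3) | EBX     # MOD = 00, REG = EDI, R/M = [EBX]
code = bytes([opcode, modrm])

print(code.hex(" "))  # 03 3b
```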
17. Encoding ADD EAX, [ ESI + disp8 ] Instruction
-
Encoding the ADD EAX, [ ESI + disp8 ] instruction:
add eax, [ esi + disp8 ]
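A sketch for the 8-bit displacement form, assuming MOD = 01 selects a signed one-byte displacement that follows the MOD-REG-R/M byte. The function name is hypothetical and the displacement values are arbitrary examples.

```python
import struct

def encode_add_eax_esi_disp8(disp8: int) -> bytes:
    """Encode ADD EAX, [ESI + disp8]; disp8 must fit in a signed byte."""
    EAX, ESI = 0b000, 0b110
    modrm = (0b01 << 6) | (EAX << 3) | ESI          # 46h
    return bytes([0x03, modrm]) + struct.pack("<b", disp8)

print(encode_add_eax_esi_disp8(8).hex(" "))    # 03 46 08
print(encode_add_eax_esi_disp8(-4).hex(" "))   # 03 46 fc
```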
18. Encoding ADD EBX, [ EBP + disp32 ] Instruction
-
Encoding the ADD EBX, [ EBP + disp32 ] instruction:
add ebx, [ ebp + disp32 ]
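The 32-bit displacement form differs only in the MOD value and the displacement width. This sketch assumes MOD = 10 selects a four-byte displacement stored low-order byte first; the function name and sample value are illustrative.

```python
import struct

def encode_add_ebx_ebp_disp32(disp32: int) -> bytes:
    """Encode ADD EBX, [EBP + disp32] with a little-endian 32-bit displacement."""
    EBX, EBP = 0b011, 0b101
    modrm = (0b10 << 6) | (EBX << 3) | EBP          # 9Dh
    return bytes([0x03, modrm]) + struct.pack("<i", disp32)

print(encode_add_ebx_ebp_disp32(0x100).hex(" "))  # 03 9d 00 01 00 00
```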
19. Encoding ADD EBP, [ disp32 + EAX*1 ] Instruction
-
Encoding the ADD EBP, [ disp32 + EAX*1 ] instruction:
add ebp, [ disp32 + eax*1 ]
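This form needs a SIB byte. The sketch assumes R/M = 100 announces the SIB byte and that, with MOD = 00, a SIB base field of 101 means "no base register, disp32 follows". The helper names `sib` and `encode_add_ebp_disp32_eax` are hypothetical.

```python
import struct

def sib(scale: int, index: int, base: int) -> int:
    """Pack the SIB byte: scale (bits 7:6), index (bits 5:3), base (bits 2:0)."""
    return (scale << 6) | (index << 3) | base

def encode_add_ebp_disp32_eax(disp32: int) -> bytes:
    """Encode ADD EBP, [disp32 + EAX*1]."""
    EBP, EAX = 0b101, 0b000
    modrm = (0b00 << 6) | (EBP << 3) | 0b100        # R/M = 100: SIB byte follows
    sib_byte = sib(0b00, EAX, 0b101)                # scale *1, no base register
    return bytes([0x03, modrm, sib_byte]) + struct.pack("<I", disp32)

print(encode_add_ebp_disp32_eax(0x1000).hex(" "))  # 03 2c 05 00 10 00 00
```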
20. Encoding ADD ECX, [ EBX + EDI*4 ] Instruction
-
Encoding the ADD ECX, [ EBX + EDI*4 ] instruction:
add ecx, [ ebx + edi*4 ]
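With a base register and a scaled index, the SIB byte carries all three fields. The sketch assumes the scale field encodes *1, *2, *4, *8 as 00, 01, 10, 11; the helper names are illustrative.

```python
SCALE_BITS = {1: 0b00, 2: 0b01, 4: 0b10, 8: 0b11}

def sib(scale: int, index: int, base: int) -> int:
    """Pack the SIB byte from a scale factor (1, 2, 4 or 8), index and base codes."""
    return (SCALE_BITS[scale] << 6) | (index << 3) | base

ECX, EBX, EDI = 0b001, 0b011, 0b111
modrm = (0b00 << 6) | (ECX << 3) | 0b100    # MOD = 00, R/M = 100: SIB byte follows
code = bytes([0x03, modrm, sib(4, EDI, EBX)])

print(code.hex(" "))  # 03 0c bb
```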
21. Encoding ADD Immediate Instruction
-
If the high-order bit of the opcode is set to 1, the instruction has an immediate constant.
-
There is no direction bit in the opcode:
-
indeed, you cannot specify a constant as a destination operand!
-
Therefore, the destination operand is always the location encoded in the MOD and R/M fields of the MOD-REG-R/M byte.
-
In place of the direction bit d, the opcode has a sign-extension bit x:
-
For 8-bit operands, the CPU ignores the x bit.
-
For 16-bit and 32-bit operands, the x bit specifies the size of the constant that follows at the end of the instruction:
-
If the x bit contains zero, the constant is the same size as the operand (i.e., 16 or 32 bits).
-
If the x bit contains one, the constant is a signed 8-bit value, and the CPU sign-extends this value to the appropriate size before adding it to the operand.
-
-
This little x trick often makes programs shorter, because adding small-value constants to 16 or 32 bit operands is very common.
-
-
-
The third difference between the ADD-immediate and the standard ADD instruction is the meaning of the REG field in the MOD-REG-R/M byte:
-
Since the instruction implies that
-
the source operand is a constant, and
-
MOD-R/M fields specify the destination operand,
the instruction does not need to use the REG field to specify an operand.
-
-
Instead, the x86 CPU uses these three bits as an opcode extension.
-
For the ADD-immediate instruction the REG bits must contain zero.
-
Other bit patterns would correspond to a different instruction.
-
-
Note that when adding a constant to a memory location, the displacement (if any) immediately precedes the immediate (constant) value in the instruction's byte sequence.
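The effect of the x bit can be sketched for a 32-bit register destination. The code assumes opcodes 83h (x = 1, s = 1) and 81h (x = 0, s = 1), REG = 000 as the ADD opcode extension, and MOD = 11; the function name is a hypothetical helper, and it picks the short form whenever the constant fits in a signed byte.

```python
import struct

def encode_add_reg32_imm(reg: int, imm: int) -> bytes:
    """Encode ADD r32, imm, using the sign-extended 8-bit form when possible."""
    modrm = (0b11 << 6) | (0b000 << 3) | reg    # REG = 000 selects ADD
    if -128 <= imm <= 127:
        return bytes([0x83, modrm]) + struct.pack("<b", imm)     # x = 1
    return bytes([0x81, modrm]) + struct.pack("<I", imm & 0xFFFFFFFF)  # x = 0

ECX = 0b001
print(encode_add_reg32_imm(ECX, 5).hex(" "))        # 83 c1 05
print(encode_add_reg32_imm(ECX, 0x12345).hex(" "))  # 81 c1 45 23 01 00
```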
22. Encoding Eight, Sixteen, and Thirty-Two Bit Operands
-
Intel studied the x86 instruction set and came to the conclusion:
-
in a 32-bit environment, programs use 8-bit and 32-bit operands far more often than 16-bit operands.
-
-
So Intel decided to let the size bit s in the opcode select between 8- and 32-bit operands.
23. Encoding Sixteen Bit Operands
-
The programmer does not have to do anything explicitly to put an operand size prefix byte in front of a 16-bit instruction:
-
the assembler does this automatically as soon as a 16-bit operand is found in the instruction.
-
-
However, keep in mind that whenever you use a 16-bit operand in a 32-bit program, the instruction is longer by one byte:
Opcode    Instruction
--------  ------------
41h       INC ECX
66h 41h   INC CX
-
Be careful about using 16-bit instructions if size (and to a lesser extent, speed) is important, because
-
the instructions are longer, and
-
they are slower because of their effect on the instruction cache.
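The one-byte cost of a 16-bit operand can be illustrated with the INC example above. The sketch assumes a 32-bit code segment, where the single-byte INC opcode is 40h plus the register code and 66h is the operand size prefix; `encode_inc` is a hypothetical helper.

```python
def encode_inc(reg: int, bits: int) -> bytes:
    """Encode INC for a 16-bit or 32-bit register in a 32-bit code segment."""
    if bits not in (16, 32):
        raise ValueError("bits must be 16 or 32")
    prefix = b"\x66" if bits == 16 else b""     # operand size prefix
    return prefix + bytes([0x40 + reg])

CX = ECX = 0b001
print(encode_inc(ECX, 32).hex(" "))  # 41
print(encode_inc(CX, 16).hex(" "))   # 66 41
```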
-
24. x86 Instruction Prefix Bytes
-
An x86 instruction can have up to 4 prefixes.
-
Each prefix adjusts interpretation of the opcode:
-
The LOCK prefix (from the repeat/lock prefix group) guarantees that the instruction will have exclusive use of all shared memory until the instruction completes execution:
F0h = LOCK
-
String manipulation instruction prefixes
F3h = REP, REPE
F2h = REPNE
where
-
REP repeats the instruction the number of times specified by the iteration count in ECX.
-
The REPE and REPNE prefixes also allow the loop to terminate early based on the value of the ZF CPU flag.
Related string manipulation instructions are:
-
MOVS, move string
-
STOS, store string
-
SCAS, scan string
-
CMPS, compare string, etc.
See also string manipulation sample program: rep_movsb.asm
-
-
A segment override prefix causes the memory access to use the specified segment instead of the default segment designated for the instruction operand.
2Eh = CS
36h = SS
3Eh = DS
26h = ES
64h = FS
65h = GS
-
Operand-size override, 66h. Changes the size of the data expected by the default mode of the instruction, e.g. 16-bit to 32-bit and vice versa.
-
Address-size override, 67h. Changes the size of the address expected by the instruction. A 32-bit address could switch to 16-bit and vice versa.
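The prefix values listed above can be collected into a table and used to separate leading prefix bytes from the rest of an instruction. This is only a sketch: `PREFIXES` and `split_prefixes` are hypothetical names, and the scan does not enforce the limit of four prefixes.

```python
PREFIXES = {
    0xF0: "LOCK", 0xF3: "REP/REPE", 0xF2: "REPNE",
    0x2E: "CS", 0x36: "SS", 0x3E: "DS", 0x26: "ES", 0x64: "FS", 0x65: "GS",
    0x66: "operand-size override", 0x67: "address-size override",
}

def split_prefixes(code: bytes):
    """Return (list of prefix names, remaining bytes starting at the opcode)."""
    i = 0
    while i < len(code) and code[i] in PREFIXES:
        i += 1
    return [PREFIXES[b] for b in code[:i]], code[i:]

print(split_prefixes(bytes.fromhex("6641")))  # (['operand-size override'], b'A')
```

The remaining byte 41h prints as b'A' because Python shows printable ASCII bytes as characters.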
-
25. Alternate Encodings for Instructions
-
To shorten program code, Intel created alternate (shorter) encodings of some very commonly used instructions.
-
For example, x86 provides a single-byte opcode for
add al, constant    ; one-byte opcode and no MOD-REG-R/M byte
add eax, constant   ; one-byte opcode and no MOD-REG-R/M byte
The opcodes are 04h and 05h, respectively.
-
These instructions are one byte shorter than their standard ADD immediate counterparts.
-
Note that
add ax, constant    ; operand size prefix byte + one-byte opcode, no MOD-REG-R/M byte
requires an operand size prefix, just as a standard ADD AX, constant instruction does, yet it is still one byte shorter than the corresponding standard version of ADD immediate.
-
Any decent assembler will automatically choose the shortest possible instruction when translating the program into machine code.
-
Intel provides alternate encodings only for the accumulator registers AL, AX, and EAX.
-
This is a good reason to use accumulator registers if you have a choice
-
(also a good reason to take some time and study encodings of the x86 instructions.)
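The one-byte saving can be checked with a short comparison. The sketch assumes the short form is opcode 05h followed by a 32-bit constant, and the standard form is opcode 81h with a MOD-REG-R/M byte of C0h (MOD = 11, REG = 000, R/M = EAX); both function names are illustrative.

```python
import struct

def add_eax_imm32_short(imm: int) -> bytes:
    """Accumulator form: one-byte opcode 05h, no MOD-REG-R/M byte."""
    return bytes([0x05]) + struct.pack("<I", imm)

def add_eax_imm32_standard(imm: int) -> bytes:
    """Standard ADD immediate form: opcode 81h plus a MOD-REG-R/M byte."""
    return bytes([0x81, 0xC0]) + struct.pack("<I", imm)

imm = 0x12345678
print(add_eax_imm32_short(imm).hex(" "))     # 05 78 56 34 12
print(add_eax_imm32_standard(imm).hex(" "))  # 81 c0 78 56 34 12
```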
-
26. x86 Opcode Summary
-
x86 opcodes are represented by one or two bytes.
-
The opcode can extend into otherwise unused bits of the MOD-REG-R/M byte.
-
The opcode encodes information about
-
operation type,
-
operands,
-
size of each operand, including the size of an immediate operand.
-
27. MOD-REG-R/M Byte Summary
-
Whether the operand is in memory or in a register:
-
The MOD field (bits [7:6]), combined with the R/M field (bits [2:0]), specifies the memory/register operand, as well as its addressing mode.
-
The REG field (bits [5:3]) specifies the other register operand of a two-operand instruction.
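The bit positions in this summary translate directly into shifts and masks. The function name `split_modrm` is a hypothetical helper for this sketch.

```python
def split_modrm(byte: int):
    """Split a MOD-REG-R/M byte into its (mod, reg, rm) fields."""
    mod = (byte >> 6) & 0b11     # bits [7:6]
    reg = (byte >> 3) & 0b111    # bits [5:3]
    rm = byte & 0b111            # bits [2:0]
    return mod, reg, rm

print(split_modrm(0xC1))  # (3, 0, 1)
print(split_modrm(0x46))  # (1, 0, 6)
```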
-
28. ISA Design Considerations
-
Instruction set architecture design that can stand the test of time is a true intellectual challenge.
-
It takes several compromises between space and efficiency to assign opcodes and encode instruction formats.
-
Today people are using the Intel x86 instruction set for purposes never intended by the original designers.
-
Extending the CPU is a very difficult task.
-
The instruction set can become extremely complex.
-
If the x86 CPU were designed from scratch today, it would have a totally different ISA!
-
Software developers usually don't have a problem adapting to a new architecture when writing new software...
-
...but they are very resistant to moving existing software from one platform to another.
-
-
This is the primary reason the Intel x86 platform remains so popular to this day.
29. ISA Design Challenges
-
Allowing for future expansion of the chip requires some undefined opcodes.
-
From the beginning there should be a balance between the number of undefined opcodes and
-
the number of initial instructions, and
-
the size of your opcodes (including special assignments.)
-
-
Hard decisions:
-
Reduce the number of instructions in the initial instruction set?
-
Increase the size of the opcode?
-
Rely on an opcode prefix byte(s), which makes later added instructions longer?
-
-
There are no easy answers to these challenges for CPU designers!
30. Intel Architecture Software Developer's Manual
-
The classic Intel Pentium II Architecture Software Developer's Manual contains three parts:
-
Intel Basic Architecture (PDF, 2.6 MB).
-
Instruction Set Reference (PDF, 6.6 MB).
-
System Programming Guide (PDF, 5.1 MB).
-
-
It is highly recommended that you download the above manuals and use them as a reference.
31. Intel Instruction Set Reference (Volume2)
-
Chapter 3 of the Instruction Set Reference describes
-
each Intel instruction in detail
-
algorithmic description of each operation
-
effect on flags
-
operand(s), their sizes and attributes
-
CPU exceptions that may be generated.
-
-
The instructions are arranged in alphabetical order.
-
Appendix A provides an opcode map for the entire Intel Architecture instruction set.
32. Chapter 3 of Intel Instruction Set Reference
-
Chapter 3 begins with an instruction format example and explains the Opcode column encoding.
-
The Opcode column gives the complete machine code as it is understood by the CPU.
-
When possible, the actual machine code bytes are given as exact hexadecimal bytes, in the same order in which they appear in memory.
-
However, there are opcode definitions other than hexadecimal bytes...
33. Intel Reference Opcode Bytes
-
For example,
34. Intel Reference Opcode Bytes, Cont.
-
/digit - A digit between 0 and 7 indicates that
-
The reg field of Mod R/M byte contains the instruction opcode extension.
-
The r/m (register or memory) operand of Mod R/M byte indicates
R/M  Addressing Mode
===  ===========================
000  register ( al / ax / eax )
001  register ( cl / cx / ecx )
010  register ( dl / dx / edx )
011  register ( bl / bx / ebx )
100  register ( ah / sp / esp )
101  register ( ch / bp / ebp )
110  register ( dh / si / esi )
111  register ( bh / di / edi )
-
-
The size bit in the opcode specifies 8 or 32-bit register size.
-
A 16-bit register requires a prefix byte:
Opcode    Instruction
--------  ------------
41h       INC ECX
66h 41h   INC CX
35. Intel Reference Opcode Bytes, Cont.
-
/r - Indicates that the instruction uses the Mod R/M byte of the instruction.
-
Mod R/M byte contains both
-
a register operand reg and
-
an r/m (register or memory) operand.
-
36. Intel Reference Opcode Bytes, Cont.
-
cb, cw, cd, cp - A 1-byte (cb), 2-byte (cw), 4-byte (cd), or 6-byte (cp) value, following the opcode, is used to specify
-
a code offset,
-
and possibly a new value for the code segment register CS.
-
37. Intel Reference Opcode Bytes, Cont.
-
ib, iw, id - A 1-byte (ib), 2-byte (iw), or 4-byte (id) value indicates the presence of an immediate operand in the instruction.
-
The typical order of instruction bytes is
-
opcode
-
Mod R/M byte (optional)
-
SIB scale-indexing byte (optional)
-
immediate operand.
-
-
The opcode determines if the operand is a signed value.
-
All words and doublewords are given with the low-order byte first (little endian).
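The little-endian byte order of word and doubleword immediates can be seen with Python's built-in `int.to_bytes`; the constants below are arbitrary example values.

```python
# Low-order byte first: the least significant byte comes first in memory.
imm16 = (0x1234).to_bytes(2, "little")
imm32 = (0x12345678).to_bytes(4, "little")

print(imm16.hex(" "))  # 34 12
print(imm32.hex(" "))  # 78 56 34 12
```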
38. Intel Reference Opcode Bytes, Cont.
-
+rb, +rw, +rd - A register code, from 0 through 7, added to the hexadecimal byte given at the left of the plus sign to form a single opcode byte.
-
Register Encodings Associated with the +rb, +rw, and +rd:
For example,
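A small sketch of the +rd rule, assuming the 32-bit register codes 0 through 7 in the order EAX to EDI and the single-byte INC r32 form (40h + rd) that matches the INC ECX opcode 41h shown earlier. The helper name `plus_rd` is hypothetical.

```python
REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]

def plus_rd(base_opcode: int, reg_name: str) -> int:
    """Add the register code (0..7) to the base opcode byte."""
    return base_opcode + REG32.index(reg_name.lower())

for name in REG32:
    print(f"{plus_rd(0x40, name):02X}h  INC {name.upper()}")
```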
39. Intel Reference Instruction Column
-
The Instruction column gives the syntax of the instruction statement as it would appear in a 386 Assembly program.
-
For example,