
I need help understanding endianness inside CPU registers of x86 processors. I wrote this small assembly program:

section .data
section .bss

section .text
    global _start
_start:
    nop
    mov eax, 0x78FF5ABC
    mov ebx,'WXYZ'
    nop  ; GDB breakpoint here.
    mov eax, 1
    mov ebx, 0
    int 0x80

I ran this program in GDB with a breakpoint on line number 10 (commented in the source above). At this breakpoint, info registers shows the value of eax=0x78ff5abc and ebx=0x5a595857.

Since the ASCII codes for W, X, Y and Z are 0x57, 0x58, 0x59 and 0x5A respectively, and Intel is little endian, 0x5a595857 seems like the correct byte order (least significant byte first). Why, then, isn't the output for the eax register 0xbc5aff78 (the least significant byte of the number 0x78ff5abc first) instead of 0x78ff5abc?

wrxyz
4 Answers


Endianness inside a register makes no sense, since endianness describes whether the byte order runs from low to high memory addresses or from high to low. Registers are not byte addressable, so there is no low or high address within a register. What you are seeing is simply how your debugger prints the data.
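A small sketch in C (used here only for illustration; the helper names `low8`, `high8` and `byte_n` are hypothetical) models a register as a plain 32-bit number. The AL/AH-style bytes are picked out by shifting and masking, never by looking at memory addresses, so the results are the same on little- and big-endian hosts.

```c
#include <assert.h>
#include <stdint.h>

/* Model a 32-bit register as a plain number (an illustration; no real
   register is touched). Sub-register bytes are defined by arithmetic
   significance, not by memory addresses, so no endianness is involved. */

/* AL/BL: bits 0..7, the least significant byte of the value. */
static uint8_t low8(uint32_t reg) { return (uint8_t)(reg & 0xffu); }

/* AH/BH: bits 8..15. */
static uint8_t high8(uint32_t reg) { return (uint8_t)((reg >> 8) & 0xffu); }

/* Byte n by significance (0 = least significant). Bytes 2 and 3 have
   no named 8-bit sub-register on x86. */
static uint8_t byte_n(uint32_t reg, unsigned n)
{
    assert(n < 4);
    return (uint8_t)((reg >> (8u * n)) & 0xffu);
}
```

For example, `low8(0x5a595857)` is 0x57 ('W'), which is why BL shows 87 for 'WXYZ', and `byte_n(0x78ff5abc, 3)` is 0x78.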

slebetman
  • Thank you for the response. Apparently 'WXYZ' is indeed stored in the reverse order in the ebx register. According to the debugger, the BL register contains 87 (decimal value of 0x57); surely, there is a notion of byte ordering here? Why was 'W' not stored in the highest 8 bits of ebx instead? – wrxyz Dec 22 '10 at 00:01
  • Actually, the registers *are* partially byte addressable. You can access the lower two bytes of EAX with AL and AH. – Jim Mischel Dec 22 '10 at 00:46
  • That's byte accessible, not byte "addressable". You can access that lower byte but still cannot answer the question: "Is that lower byte located at a higher or lower memory address than the higher byte?" (Well, you could argue that the opcode, if interpreted as an integer, is larger or smaller than the other, but that is very arbitrary.) – slebetman Jul 12 '13 at 06:54
  • So when we say that a CPU is little-endian, are we saying that the CPU will read multiple bytes from the memory following the "little-endian" rules? And the values in the register will be held in Big Endian always? – Koray Tugay Jan 20 '16 at 20:30
  • @KorayTugay: As I said, endianness makes no sense for registers. Endianness refers to which address stores which part of a number. Registers have only ONE address. Memory has one address for each byte. – slebetman Jan 21 '16 at 00:15
  • @KorayTugay: I would also like to point out that `integer & 0xff` will ALWAYS give you the least significant byte in C/C++, regardless of little endian or big endian. It's just how integers are defined. You need to view that integer as an array of char to see endianness behavior. – slebetman Jan 21 '16 at 00:17
  • @slebetman Thank you for the information. Can we say that the accepted answer here: http://programmers.stackexchange.com/questions/223957/little-and-big-endian-confusion is not quite right? – Koray Tugay Jan 21 '16 at 07:29
  • @slebetman I would also like to ask if endianness is only about memory, or does it also apply to how information is stored in the hard drive? – Koray Tugay Jan 21 '16 at 07:44
  • @KorayTugay: A hard drive is also memory: magnetic memory. Since data on a drive is stored as a sequence of bytes at distinct offsets, yes, endianness matters on disk. – slebetman Jan 21 '16 at 09:51
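To illustrate the `& 0xff` point, here is a hedged C sketch (helper names are hypothetical): masking gives the least significant byte on any host, while copying the integer's bytes into a `char` array exposes the host's memory order.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Least significant byte by arithmetic: the same on every host.
   Note the bitwise &, not the logical &&. */
static uint8_t lsb_by_mask(uint32_t x)
{
    return (uint8_t)(x & 0xffu);
}

/* Copy the object representation of x into out[0..3] (out must have
   room for 4 bytes), lowest address first. On x86 (little-endian)
   out[0] is the least significant byte; on a big-endian CPU it is the
   most significant one. */
static void bytes_in_memory(uint32_t x, unsigned char *out)
{
    assert(out != NULL);
    memcpy(out, &x, sizeof x);
}

/* Host check built on the same idea: where does the 1 of 1u land? */
static int host_is_little_endian(void)
{
    unsigned char b[4];
    bytes_in_memory(1u, b);
    return b[0] == 1;
}
```

On x86, `bytes_in_memory(0x78ff5abc, b)` leaves 0xbc in `b[0]`; a big-endian machine would leave 0x78 there. Writing `x && 0xff` instead would simply evaluate to 1 for any nonzero `x`.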

The assembler is handling the two constants differently. Internally, a value in the EAX register is stored in big-endian format. You can see that by writing:

mov eax, 1

If you inspect the register, you'll see that its value is 0x00000001.

When you tell the assembler that you want the constant value 0x78ff5abc, that's exactly what gets stored in the register. The high 8 bits of EAX will contain 0x78, and the AL register contains 0xbc.

Now if you were to store the value from EAX into memory, it would be laid out in memory in the reverse order. That is, if you were to write:

mov [addr],eax

And then inspected memory at [addr], you would see 0xbc, 0x5a, 0xff, 0x78.

In the case of 'WXYZ', the assembler assumes that you want to load the value such that if you were to write it to memory, it would be laid out as 0x57, 0x58, 0x59, 0x5a.

Take a look at the code bytes that the assembler generates and you'll see the difference. In the case of mov eax,0x78ff5abc, you'll see:

<opcodes for mov eax>, 0xbc, 0x5a, 0xff, 0x78

In the case of mov eax,'WXYZ', you'll see:

<opcodes for mov eax>, 0x57, 0x58, 0x59, 0x5a
Jim Mischel
  • So when we say that a CPU is little-endian, are we saying that the CPU will read multiple bytes from the memory following the "little-endian" rules? And the values in the register will be held in Big Endian always? – Koray Tugay Jan 20 '16 at 20:30
  • @KorayTugay: To my knowledge, that's true for modern processors. I don't know about older processors, but I suspect it's true. "Endianness" is concerned only with how the CPU expects values to be stored in memory. – Jim Mischel Jan 20 '16 at 20:32
  • @JimMischel I realize this post is many years old by now. But why do you claim "Internally, a value in the EAX register is stored in big-endian format"? In your example, the AL register contains 0xbc, which means the least significant bits are stored in the lower 8 bits of EAX. Isn't that the definition of little endian? – Oliver Young Aug 16 '19 at 05:58
  • @OliverYoung No, that is not little endian. In little endian, the 32-bit value 0xDEADBEEF stored in memory at address 0x12345678 would have the byte 0xEF at address 0x12345678, 0xBE at 0x12345679, 0xAD at 0x1234567A and 0xDE at 0x1234567B. The bytes are stored least significant first. That's not the case when you view the CPU register. In any case, as others pointed out, endianness makes sense only for memory. – Jim Mischel Aug 16 '19 at 15:11

Endianness makes sense only for memory, where each byte has a numeric address. When the MSByte of a value is stored at a higher memory address than the LSByte, it's called little endian, and this is the endianness of every x86 processor.

While for integers the distinction between LSByte and MSByte is clear:

    0x12345678
MSB---^^    ^^---LSB

It's not defined for string literals! It's not obvious which part of 'WXYZ' should be considered the LSB or the MSB:

1) The most obvious way,

'WXYZ' ->  0x5758595A

would lead to memory order ZYXW.

2) The not-so-obvious way, where the memory order matches the order of the characters in the literal:

'WXYZ' ->  0x5A595857

The assembler has to choose one of them, and apparently it chooses the second.
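Both options can be written down in a few lines of C (a sketch assuming ASCII; both function names are invented for illustration). Option 1 treats the first character as the most significant byte; option 2 treats it as the least significant byte, so a little-endian store reproduces the source order in memory.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Option 1: the first character becomes the most significant byte. */
static uint32_t pack_first_char_msb(const char *s)
{
    assert(s != NULL);
    uint32_t v = 0;
    for (size_t i = 0; i < 4; i++)
        v = (v << 8) | (unsigned char)s[i];
    return v;
}

/* Option 2: the first character becomes the least significant byte,
   so storing the value little-endian yields the characters in source
   order. */
static uint32_t pack_first_char_lsb(const char *s)
{
    assert(s != NULL);
    uint32_t v = 0;
    for (size_t i = 0; i < 4; i++)
        v |= (uint32_t)(unsigned char)s[i] << (8 * i);
    return v;
}
```

`pack_first_char_msb("WXYZ")` yields 0x5758595A, which a little-endian store lays out as Z, Y, X, W; `pack_first_char_lsb("WXYZ")` yields 0x5A595857, matching what GDB reported.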

ruslik
  • Yes, NASM chooses the sensible option: multi-character literals assemble to an integer value with the bytes in source order, whether that's in a `db 'WXYZ'` or a `mov eax, 'WXYZ'` or `mov dword [rdi], 'WXYZ'`. ([How are dw and dd different from db directives for strings?](https://stackoverflow.com/q/38860174)) Some other assemblers, like MASM, are less sane. [When using the MOV mnemonic to load/copy a string to a memory register in MASM, are the characters stored in reverse order?](https://stackoverflow.com/a/57436181) – Peter Cordes Dec 04 '22 at 18:11

In simple terms, treat registers as plain values; the endianness of how they are eventually stored is not important.

When you write to eax, you write a 32-bit number, and when you read from eax, you read back the same 32-bit number. In these terms, endianness doesn't matter.

Then you know that al holds the least significant 8 bits of the value, and ah holds the most significant 8 bits of the lower 16 bits. There is no way to access individual bytes of the upper 16 bits directly, except by reading the whole 32-bit value.
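Here is a small C model of that behaviour (an illustration only: EAX is treated as a `uint32_t`, and `write_al`, `write_ah` and `upper_byte` are hypothetical names). Writing AL or AH replaces just that byte of the value, while a byte in the upper 16 bits can only be reached by shifting the whole 32-bit value.

```c
#include <assert.h>
#include <stdint.h>

/* `mov al, x`: only bits 0..7 change. */
static uint32_t write_al(uint32_t eax, uint8_t al)
{
    return (eax & ~(uint32_t)0xffu) | al;
}

/* `mov ah, x`: only bits 8..15 change. */
static uint32_t write_ah(uint32_t eax, uint8_t ah)
{
    return (eax & ~(uint32_t)0xff00u) | ((uint32_t)ah << 8);
}

/* The upper 16 bits have no byte-sized name, so read the whole value
   and shift. which: 0 = bits 16..23, 1 = bits 24..31. */
static uint8_t upper_byte(uint32_t eax, unsigned which)
{
    assert(which < 2);
    return (uint8_t)(eax >> (16u + 8u * which));
}
```

For instance, `write_ah(0x78ff5abc, 0x22)` gives 0x78ff22bc, and `upper_byte(0x78ff5abc, 1)` gives 0x78.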