An interactive guide to x86-64 assembly - moving data

This is the second part of a series of interactive articles on the x86-64 architecture. This part will focus on the first assembly instructions, visualizing the way data moves in memory when they are executed.

Visualizing memory

In the previous post we introduced some basics on data, encodings, and the places where data is stored: registers and memory. We also introduced a common way to visualize memory, which will be used extensively in this article: hex dumps.

The example below shows a hexdump of some example data taken from the stack frame of a process. Use the slider to adjust the number of bytes you want to see in a single row.

(Hexdump of addresses 0x00000000 to 0x000000df; the raw bytes and their ASCII rendering follow.)

6578616d706c652061736369692074657874000000000000e95155555555000040dcffff0100000058dcffffff7f00000000000000000000e804be1278e96fe058dcffffff7f0000e951555555550000987d55555555000040d0fff7ff7f0000e8041ca48716901fe8043428fd06901f00000000ff7f0000000000000000000000000000000000000000000000000000000000000000000000429e875dca2f7e0000000000000000409ec2f7ff7f000068dcffffff7f0000987d555555550000e0e2fff7ff7f0000000000000000000000000000000000000051555555550000

example ascii text.......QUUUU..@.......X...................x.o.X........QUUUU...}UUUU..@.................4(.............................................B..]./~........@.......h........}UUUU...........................QUUUU..
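To make the layout of a hex dump concrete, here is a minimal Python sketch that formats bytes as rows of address, hex values, and printable ASCII. The function name `hexdump` and its `width` parameter are illustrative choices, not part of any tool mentioned in this article; `width` plays the role of the slider.

```python
def hexdump(data: bytes, width: int = 16, base: int = 0) -> str:
    """Format data as rows of: address, hex bytes, printable ASCII."""
    pad = width * 3 - 1  # width of a full row of "xx " groups, minus the last space
    rows = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{b:02x}" for b in chunk)
        # Non-printable bytes are rendered as '.', as in the dump above.
        ascii_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        rows.append(f"{base + offset:08x}  {hex_part:<{pad}}  {ascii_part}")
    return "\n".join(rows)


# First 32 bytes of the example data shown above.
sample = bytes.fromhex(
    "6578616d706c6520"
    "6173636969207465"
    "7874000000000000"
    "e951555555550000"
)

print(hexdump(sample, width=8))
```

Changing `width` to 16 shows the same bytes in two rows instead of four, which is all the slider does.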

The reason I want you to familiarize yourself with this visualization is also the rationale behind this series of articles: most resources online explain low-level topics (such as stack frames, data alignment, or buffer overflows) using abstract diagrams. But when you approach these topics in practice, you will use tools like gdb, which visualize data in a completely different way from those diagrams.
For example, this is a screenshot of my setup when running gdb with GEF and the Python pwntools library:

Screenshot of a tmux terminal split vertically into two panes. The pane on the left is an interactive Python shell. It has received the input: p.send(bytearray(key)). The pane on the right is a gdb session with the GEF plugin enabled. You can see the registers, stack, and current instruction for a process called ./babyrev_level6.0. The program is about to run a call to glibc readline.

All the visualizations in this article emulate the way data is visualized in real-world scenarios, with tools like gdb or PWNDBG, popular in CTF competitions. My hope is that this will lower the steep learning curve of those tools.

Moving data

The first instruction we are going to see is mov, which moves data around. Despite the name, it copies the data: the source is left unchanged. It can copy a constant into a register, the content of one register into another, a register into memory, or memory into a register.

These first examples are self-explanatory:

mov rbx, 0x10  ;copies the integer 0x10 into rbx
mov rax, rbx   ;copies the content of rbx into rax

Moving data to memory requires some extra syntax.
The following snippet writes the byte 0xff to the memory cell at address 0x10:

mov rax, 0x10
mov byte ptr [rax], 0xff

Let’s break it down:

  • First, we put 0x10, the address of the cell we want to write to, in a register.
  • Then we perform a mov instruction with square brackets around the register name, to indicate that we want to write 0xff to the memory address the register points to, and not into the register itself.

Notice how in that example we moved a single byte, and we used the syntax byte ptr. You can change that to word, dword, or qword if you want to move 2, 4, or 8 bytes instead.

The interactive example below allows you to experiment with all possible variations of the pointer syntax. You can click “run” to see how the memory is affected.

code

mov rbx, 0x4242424242424242
mov rax, 0x20
mov byte ptr [rax], bl   ;also try word/bx, dword/ebx, qword/rbx

memory

(Hexdump of addresses 0x00000000 to 0x0000006f; the raw bytes and their ASCII rendering follow.)

00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000

................................................................................................................
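If you cannot run the interactive example, its effect can be approximated with a toy model in Python. This is only a sketch: `store` and `SIZES` are hypothetical names, the "memory" is a plain bytearray, and the value is truncated to the low bytes the way the sub-registers bl, bx, and ebx expose the low part of rbx.

```python
SIZES = {"byte": 1, "word": 2, "dword": 4, "qword": 8}


def store(memory: bytearray, address: int, value: int, size: str) -> None:
    """Emulate `mov <size> ptr [address], value` on a toy memory."""
    n = SIZES[size]
    # Keep only the low n bytes, like using bl / bx / ebx / rbx as the source.
    truncated = value & ((1 << (8 * n)) - 1)
    # The byte order does not matter here because every byte is 0x42.
    memory[address:address + n] = truncated.to_bytes(n, "little")


rbx = 0x4242424242424242
for size in SIZES:
    memory = bytearray(0x70)  # 112 zeroed bytes, as in the example above
    store(memory, 0x20, rbx, size)
    print(f"{size:5} -> {memory[0x20:0x28].hex()}")
```

Each variant overwrites 1, 2, 4, or 8 bytes starting at address 0x20 and leaves every other cell untouched.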

A sidenote on endianness

We managed to reach this point by ignoring an important fact: x86-64 is a little-endian architecture, which means that numbers are not stored in memory in the order you might expect.
In the previous example, you saw what the number 0x4242424242424242 looks like in memory, but we chose that number carefully to hide the issue: all of its bytes are identical. In the next example, you can enter any number you want.
Can you spot what’s happening?

code

mov rbx, <number>   ;the number you enter
mov rax, 0x20
mov qword ptr [rax], rbx

memory

(Hexdump of addresses 0x00000000 to 0x0000006f; the raw bytes and their ASCII rendering follow.)

4578616d706c65207465787400000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000

Example text....................................................................................................

In case you missed it, numbers are stored with their bytes in reverse order. For example, the number 0xcafe is composed of the byte ca followed by the byte fe, but in memory the byte fe comes first (at the lower address), followed by the byte ca.

What’s going on here is that both humans and computers use a positional number system to represent integers, but with a different order.
When we (humans using Hindu-Arabic numerals) represent numbers, we write the most significant digit first, and continue in descending order. This is the same order used by big-endian architectures.

  human-readable decimal number
  1337 
  |  |
  |  Least significant digit 
  Most significant digit 

  human-readable hex number
  0xcafebabe  
    |     |
    |     Least significant byte
    Most significant byte

Little-endian architectures instead store the least significant byte first (at the lowest address), and continue in ascending order.
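The two byte orders can be compared directly with Python's standard struct module. This is a small sketch, not tied to any tool in this article: it encodes 0xcafe as an 8-byte integer (the size of a qword) in both orders.

```python
import struct

value = 0xcafe

little = struct.pack("<Q", value)  # 8-byte unsigned integer, little-endian
big = struct.pack(">Q", value)     # same integer, big-endian

print(little.hex())  # feca000000000000
print(big.hex())     # 000000000000cafe

# Reading the bytes back with the matching order recovers the number.
assert int.from_bytes(little, "little") == value
assert int.from_bytes(big, "big") == value
```

The little-endian result is what a qword write of 0xcafe leaves in memory on x86-64: fe at the lowest address, then ca, then six zero bytes.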

This topic is explained in depth on Wikipedia, with some useful diagrams that should resolve any doubts you might have.
Endianness only concerns the way the processor handles multi-byte numbers such as integers. Other kinds of data, such as text, are usually stored in the order you would expect. Floating point numbers use a completely different format (IEEE 754). You can read more about them in this great article, or in this visual guide by Ciechanowski.

The stack

x86-64, like most architectures, has the concept of a stack: an area of memory whose top is pointed to by the special register rsp.
You can add or remove elements from the top of the stack by using the push and pop instructions. This is the most common interaction, but it’s also valid to directly adjust the value of rsp. In this interactive example the stack area is highlighted in blue, together with the value of the rsp and rax registers.

code

mov rax, 0x4242424242424242
push rax   ;or pop rax

registers

rsp  : [50 ff ff 7f 00 00 00 00]0x7fffff50 
rax  : [00 00 00 00 00 00 00 00]0x00 

memory

(Hexdump of addresses 0x7fffff00 to 0x7fffffdf; the raw bytes and their ASCII rendering follow.)

54686973206973206578616d706c65206461746100000000e95155555555000040dcffff0100000058dcffffff7f00000000000000000000e804be1278e96fe058dcffffff7f0000e951555555550000987d55555555000040d0fff7ff7f0000e8041ca48716901fe8043428fd06901f00000000ff7f0000000000000000000000000000000000000000000000000000000000000000000000429e875dca2f7e0000000000000000409ec2f7ff7f000068dcffffff7f0000987d555555550000e0e2fff7ff7f0000000000000000000000000000000000000051555555550000

This is example data.....QUUUU..@.......X...................x.o.X........QUUUU...}UUUU..@.................4(.............................................B..]./~........@.......h........}UUUU...........................QUUUU..

There are two key elements you should notice by playing with the example above:

  • rsp points to the top of the stack. It is decreased by 8 when we push a value, and increased by 8 when we pop a value.
  • When we pop a value from the stack, that value is not deleted: the area of memory that contains it simply stops being part of the stack. The only thing that changes is the address stored in rsp (and the register that receives the popped value).

Basically, push rax does the same as the following code:

sub rsp, 8
mov qword ptr [rsp], rax

And pop rax does the same as the following code:

mov rax, qword ptr [rsp]
add rsp, 8
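The two equivalences above can be checked with a toy model. The sketch below is an assumption-laden simplification: `ToyStack` is a hypothetical class, the stack is a bytearray covering the address range of the interactive example, and the starting value of rsp is the one shown there.

```python
class ToyStack:
    """Toy model of rsp plus a small window of stack memory."""

    def __init__(self, base: int = 0x7fffff00, size: int = 0xe0,
                 rsp: int = 0x7fffff50) -> None:
        self.base = base
        self.memory = bytearray(size)
        self.rsp = rsp

    def push(self, value: int) -> None:
        self.rsp -= 8                                      # sub rsp, 8
        off = self.rsp - self.base
        self.memory[off:off + 8] = value.to_bytes(8, "little")  # mov qword ptr [rsp], value

    def pop(self) -> int:
        off = self.rsp - self.base
        value = int.from_bytes(self.memory[off:off + 8], "little")  # mov reg, qword ptr [rsp]
        self.rsp += 8                                      # add rsp, 8
        return value


stack = ToyStack()
stack.push(0x4242424242424242)
print(hex(stack.rsp))   # 0x7fffff48
rax = stack.pop()
print(hex(stack.rsp))   # 0x7fffff50

# The popped bytes are still in memory, just below the new top of the stack.
off = 0x7fffff48 - stack.base
print(stack.memory[off:off + 8].hex())  # 4242424242424242
```

Nothing in pop clears the memory it read from, which is why the old value stays visible in the hexdump until a later push overwrites it.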

There is a confusing element here: when we put something onto the stack we are growing the stack, and yet we are moving towards lower memory addresses.

With the way we visualize memory this actually looks correct: the stack is growing towards the top.
But if we only look at the numeric addresses of elements on the stack, newer elements have smaller addresses, which looks backwards.
Even when you are aware of this, it’s common to get confused and end up thinking: “I put a new value on the stack, but it has a smaller address than the previous value, what is going on?”

Memory alignment

I don’t think memory alignment can be explained better than this article does, so check it out. Here we’ll only focus on how memory alignment affects the way we visualize the stack:
Every time you push or pop something from the stack, you move the stack pointer 8 bytes up or down. If you look carefully at the previous example, you’ll also notice that the addresses in the stack pointer are always multiples of 8: they always end with either 0 or 8.

This kind of alignment is done on purpose for performance reasons, and you will encounter it everywhere. As a consequence, when we visualize memory in a hexdump it’s common to start from addresses that are multiples of 8 or 16, so that data fits properly in a row.
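Checking or enforcing this kind of alignment is a one-line bit operation. The helpers below are a hedged sketch with hypothetical names (`is_aligned`, `align_down`); they assume the alignment is a power of two, which holds for 8 and 16.

```python
def is_aligned(address: int, alignment: int) -> bool:
    """True if address is a multiple of alignment (a power of two)."""
    return (address & (alignment - 1)) == 0


def align_down(address: int, alignment: int) -> int:
    """Round address down to the nearest multiple of alignment."""
    return address & ~(alignment - 1)


rsp = 0x7fffff50
for _ in range(3):
    rsp -= 8  # what each push does to rsp
    print(hex(rsp), is_aligned(rsp, 8), is_aligned(rsp, 16))

# A 16-byte hexdump row that contains an address starts at the aligned address below it.
print(hex(align_down(0x7fffff4b, 16)))  # 0x7fffff40
```

Starting from a multiple of 8 and moving in steps of 8, rsp stays 8-byte aligned after every push, while it is 16-byte aligned only every other step.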

This is a hexdump taken from the stack memory of a function. Two different variables are highlighted: one is the 32-bit integer 0xcafebabe, the other is a stack canary, which we’ll see in another article. You can adjust the slider to change the start address in the hexdump.

showing memory from address 0x0

(Hexdump of addresses 0x00000000 to 0x000000df; the raw bytes and their ASCII rendering follow.)

6578616d706c652061736369692074657874000000000000e951555555550000bebafeca0000000058dcffffff7f00000000000000000000e804be1278e96fe058dcffffff7f0000e951555555550000987d55555555000040d0fff7ff7f0000e8041ca48716901fe8043428fd06901f00000000ff7f0000000000000000000000000000000000000000000000000000000000000000000000429e875dca2f7e0000000000000000409ec2f7ff7f000068dcffffff7f0000987d555555550000e0e2fff7ff7f0000000000000000000000000000000000000051555555550000

example ascii text.......QUUUU..........X...................x.o.X........QUUUU...}UUUU..@.................4(.............................................B..]./~........@.......h........}UUUU...........................QUUUU..
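To see where the integer 0xcafebabe sits in the dump above, you can search for its little-endian encoding. The sketch below uses only the first 40 bytes of that dump, copied verbatim; the variable names are illustrative.

```python
import struct

# First 40 bytes of the dump above, split into 8-byte groups.
data = bytes.fromhex(
    "6578616d706c6520"
    "6173636969207465"
    "7874000000000000"
    "e951555555550000"
    "bebafeca00000000"
)

needle = struct.pack("<I", 0xcafebabe)  # 4-byte little-endian encoding: be ba fe ca
offset = data.find(needle)

print(hex(offset))  # 0x20
print(hex(int.from_bytes(data[offset:offset + 4], "little")))  # 0xcafebabe
```

The integer starts at offset 0x20, a multiple of 4, so it never straddles two rows when the dump starts at an aligned address.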

What I’m trying to show here is that everything is relative. What you see is always an abstract representation of the actual data, and it’s up to you to visualize it in a way that matches your mental model.

Further Reading

This article is still under development, and it’s improving over time.