An Interesting Pointer Puzzle – Dennis Kubes

A reader of my blog sent me a question the other day asking me to explain a piece of code with pointers. I found it to be a very interesting puzzle, and not just because I had to drop into an object dump with a friend to work through it. The erroneous output is consistent, even across platforms. Here is a slightly modified version of the original code. We will call this file bad.c. See if you can spot the error.

#include <stdio.h>

int main()
{
    int *a[] = {0,1,2,3,4};
    printf("arr0=%d\n", *a+0);
    printf("arr1=%d\n", *a+1);
    printf("arr2=%d\n", *a+2);
    printf("arr3=%d\n", *a+3);
    printf("arr4=%d\n", *a+4);
    return 0;
}

If you compile that with gcc bad.c, you will get a bunch of warnings. It is never good to ignore warnings, but if you run it anyway you should see output like this.

arr0=0
arr1=4
arr2=8
arr3=12
arr4=16

We ran this on both x64 Mac and Linux, with the same result. The expected output is something like this.

arr0=0
arr1=1
arr2=2
arr3=3
arr4=4

Try to figure out the error before going further.

The Error

The error is actually easy to overlook. The int *a[] means an array of int pointers. What was probably intended was an array of ints.

int main()
{
    int *a[] = {0,1,2,3,4};
    // should be int a[] = {0,1,2,3,4};
    // notice no pointer star before a
    printf("arr0=%d\n", *a+0);
    // also *a+0 should probably be *(a + 0)
    ...
}

Change that, compile it, and we are done: we get the expected output. The *a+n expression dereferences the value at the start of the a array, in this case 0, and adds n to it. The *a+0 was probably intended to be *(a + 0) as well, which reads element n directly and leads to the same output but for a different reason. Mildly interesting.

Setting aside the array of int pointers instead of ints and the dereferencing logic errors, what is interesting is that the erroneous output is consistent, even across platforms. Do you know why?

The Puzzle

Confused? Run the error version multiple times and you will get the same output. Usually with pointer errors you get changing memory locations and values, but here the erroneous output was consistent. I had to work through it in an object dump with my friend Seth Hartbecke to figure out what was going on.

#include <stdio.h>

int main()
{
    int *a[] = {0,0,0,0,0};
    // change array to all zeros, compile, same output
}

Let’s drop into assembly output.

000000000040052d <main>:
  40052d: 55                   push   %rbp
  40052e: 48 89 e5             mov    %rsp,%rbp
  400531: 48 83 ec 30           sub    $0x30,%rsp
  400535: 48 c7 45 d0 00 00 00 movq   $0x0,-0x30(%rbp) # placing array onto stack
  40053c: 00
  40053d: 48 c7 45 d8 00 00 00 movq   $0x0,-0x28(%rbp)
  400544: 00
  400545: 48 c7 45 e0 00 00 00 movq   $0x0,-0x20(%rbp)
  40054c: 00
  40054d: 48 c7 45 e8 00 00 00 movq   $0x0,-0x18(%rbp)
  400554: 00
  400555: 48 c7 45 f0 00 00 00 movq   $0x0,-0x10(%rbp)
  40055c: 00
  40055d: 48 8b 45 d0           mov    -0x30(%rbp),%rax # move first array value to rax register
  400561: 48 89 c6             mov    %rax,%rsi # no add just move rax value
  400564: bf 94 06 40 00       mov    $0x400694,%edi
  400569: b8 00 00 00 00       mov    $0x0,%eax
  40056e: e8 9d fe ff ff       callq  400410 <printf@plt>
  400573: 48 8b 45 d0           mov    -0x30(%rbp),%rax # move first array value to rax register
  400577: 48 83 c0 04           add    $0x4,%rax # add 4 to the value in the rax register
  40057b: 48 89 c6             mov    %rax,%rsi
  40057e: bf 9d 06 40 00       mov    $0x40069d,%edi
  400583: b8 00 00 00 00       mov    $0x0,%eax
  400588: e8 83 fe ff ff       callq  400410 <printf@plt>
  40058d: 48 8b 45 d0           mov    -0x30(%rbp),%rax # move first array value to rax register
  400591: 48 83 c0 08           add    $0x8,%rax # add 8 to the value in the rax register
  400595: 48 89 c6             mov    %rax,%rsi
  400598: bf a6 06 40 00       mov    $0x4006a6,%edi
  40059d: b8 00 00 00 00       mov    $0x0,%eax
  4005a2: e8 69 fe ff ff       callq  400410 <printf@plt>
  4005a7: 48 8b 45 d0           mov    -0x30(%rbp),%rax # move first array value to rax register
  4005ab: 48 83 c0 0c           add    $0xc,%rax # add 12 to the value in the rax register
  4005af: 48 89 c6             mov    %rax,%rsi
  4005b2: bf af 06 40 00       mov    $0x4006af,%edi
  4005b7: b8 00 00 00 00       mov    $0x0,%eax
  4005bc: e8 4f fe ff ff       callq  400410 <printf@plt>
  4005c1: 48 8b 45 d0           mov    -0x30(%rbp),%rax # move first array value to rax register
  4005c5: 48 83 c0 10           add    $0x10,%rax # add 16 to the value in the rax register
  4005c9: 48 89 c6             mov    %rax,%rsi
  4005cc: bf b8 06 40 00       mov    $0x4006b8,%edi
  4005d1: b8 00 00 00 00       mov    $0x0,%eax
  ...

This is interesting. It takes the first value of the array, 0, and adds 4, 8, 12, and 16 to it in sequence to give us our output. Where did those values come from? Have you solved it?

A Bad Case of Pointer Math

This is a bad case of pointer math.

int main()
{
    int *a[] = {0,0,0,0,0};  // array of int pointers, pointers hold addresses
    printf("arr0=%d\n", *a+0);
    printf("arr1=%d\n", *a+1);
    printf("arr2=%d\n", *a+2);
    printf("arr3=%d\n", *a+3);
    printf("arr4=%d\n", *a+4);
    ...
}

The first piece of the puzzle is that int *a[] is an array of pointers. The initializers {0,0,0,0,0} or {0,1,2,3,4} are therefore treated by the compiler as pointer values, that is, as addresses. There are cases where absolute addresses are used on purpose, though that is more common in embedded programming.

The second piece of the puzzle is the 0, 4, 8, 12, 16 in the assembly. These are equivalent to 0 * 4, 1 * 4, 2 * 4, 3 * 4, and 4 * 4 respectively.

Because int *a[] is an array of pointers, the compiler is doing pointer math. The expression *a is an int pointer, so *a + n is evaluated as an int pointer plus n items. In this case ints are 4 bytes, and according to pointer math, adding n items to an int pointer advances the address by 4 bytes * n. Since *a holds address 0, the results are 0, 4, 8, 12, and 16.

int main()
{
    int *a[] = {0,0,0,0,0};  // array of int pointers, pointers hold addresses
    printf("arr0=%d\n", *a+0); // *a+0 == 4 bytes * 0 == 0
    printf("arr1=%d\n", *a+1); // *a+1 == 4 bytes * 1 == 4
    printf("arr2=%d\n", *a+2); // *a+2 == 4 bytes * 2 == 8
    printf("arr3=%d\n", *a+3); // *a+3 == 4 bytes * 3 == 12
    printf("arr4=%d\n", *a+4); // *a+4 == 4 bytes * 4 == 16
    ...
}

In the end a consistent, and somewhat interesting, case of bad pointer math. All from one little character.