Of Pointers and Men (3)

This post is an automatic translation from French. You can read the original version here.

The weather is nice today: the sun is out, the birds are chirping… I think it is time for a short article on the common sources of errors when using pointers. It will be hard to be exhaustive, but I think it won’t be too long: we should be able to cover a few typical examples and get back to playing in the sun soon enough. Shall we?

The MMU, the safety net of processes

As we saw in the previous episodes, a pointer is just a variable containing a memory cell number, or an “address” if you prefer. But that address still needs to be valid!

What happens if it is not? To understand the answer to this question, we need to return to virtual memory management and the role of the MMU (Memory Management Unit).

On a PC, every process believes it is alone in the world and has the entirety of memory to itself. But it is being lied to. The addresses it uses are nothing but fake addresses (called virtual addresses) which are translated by the MMU into physical addresses.

[Figure: the virtual memory zones 1, 2 and 3 of a process, mapped by the MMU onto RAM]

Thus, if the process does not use certain parts of memory, the MMU has not yet established a correspondence with actual RAM, and we save space! In the example above, the process only uses three memory zones, respectively labeled 1, 2 and 3. The MMU has mapped them to three zones of RAM that it reserved for this purpose.

If a second process also needs RAM, it will do the same:

[Figure: two processes mapped by the MMU onto RAM, sharing the zone labeled “2”]

Two things make this system work. The first is that a process generally does not use all the RAM available to it. Unused zones are simply unmapped (with no RAM correspondence) and therefore take up no physical memory space.

The second thing that works in our favor is that processes will sometimes ask for the same resource to be loaded into memory: a library, a file, or something else. In that case, the MMU loads the resource only once into memory and maps it independently for each process that requests it. In the diagram above, this is the case for the zone labeled “2”.

But the MMU is not only in charge of “translating” addresses: it also manages access rights. Just like files, a memory zone can have read, write and execute permissions.

One last thing: the MMU’s granularity of work is not the byte. To simplify its job, it works in “memory pages”, meaning in chunks of 4096 bytes (this may vary depending on the machine, generally between 512 and 8192 bytes). If you only want 4 bytes, the MMU will map 4096 anyway (1 page).

Let’s come back to our pointers. In this context of MMU protection, when can they cause problems?

The dangling pointer, or how to shoot yourself in the foot.

The C developer’s worst enemy is a pointer containing an invalid address, because it will have dramatic consequences on the program’s behavior. Let’s take a simple example:

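#include <stdio.h>

int
main()
{

    int variable = 17 ;
    int* pointeur = &variable;   // pointeur holds the address of variable

    // %p expects a void*, hence the cast
    printf("Le pointeur contient l'adresse %p \n", (void*) pointeur) ;

    printf("Je modifie cette adresse \n") ;
    *pointeur = 42 ;             // writes 42 into variable through the pointer

    printf("still alive !!!! \n") ;

    return 0 ;
}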

The program works perfectly:

$ ./test
Le pointeur contient l'adresse 0x7ffcfb309ffc
Je modifie cette adresse
still alive !!!!

But what would happen if we forgot to initialize our pointer?

#include <stdio.h>

int
main()
{

    int variable = 17 ;
    int* pointeur ;         // The pointer contains an unknown address!!!!

    printf("Le pointeur contient l'adresse %p \n", (void*) pointeur) ;

    printf("Je modifie cette adresse \n") ;
    *pointeur = 42 ;        // Undefined behavior: we write wherever the pointer happens to point

    printf("still alive !!!! \n") ;

    return 0 ;
}

$ ./test
Le pointeur contient l'adresse (nil)
Je modifie cette adresse
Erreur de segmentation

This time, the program crashes with a magnificent SEGFAULT. It is actually the MMU raising the alarm: our uninitialized pointer happened to contain zero (printed as “(nil)”), and that address is not authorized for us!

A segmentation fault can occur in several situations:

  • When trying to access address zero (the NULL pointer)
  • When trying to access a memory zone not mapped by the MMU
  • When trying to write to, or execute, a memory zone for which we don’t have the corresponding right

However, if you write to the wrong place inside a memory zone that you do own, the MMU lets you do it! After all, being foolish has never been illegal as long as you follow the rules! (come to think of it, that’s the whole story of Twitter…)

This last type of error is the most insidious. And on top of that, it will sometimes be hard to reproduce for debugging purposes!

Stressing the MMU a bit…

To illustrate the access rules a bit, let’s see what happens if we systematically write anywhere:

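#include <stdio.h>
#include <stdlib.h>

int A ;

int
main()
{
    int* p ;
    p = &A ;                  // start from the address of the global A

    while(1) {

        printf("Acces à %p :\n", (void*) p ) ;
        *p= 42 ;              // deliberately reckless write (undefined behavior)
        printf("ok ! \n") ;

        p += 1 ;              // move on to the next int in memory
    }

    return 0 ;
}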

As I imagine you’ve understood, we start from A’s address and progressively advance through memory, without caring where we are writing. Let’s see how far we can go:

$ ./test
Acces à 0x55872c89903c :
ok !
Acces à 0x55872c899040 :
ok !
Acces à 0x55872c899044 :
ok !
Acces à 0x55872c899048 :
ok !
Acces à 0x55872c89904c :
ok !
Acces à 0x55872c899050 :
ok !
Acces à 0x55872c899054 :

[...]

Acces à 0x55872c899ff8 :
ok !
Acces à 0x55872c899ffc :
ok !
Acces à 0x55872c89a000 :
Erreur de segmentation

Mmmmh… That’s a lot of reckless writes! What is the MMU doing?

If we do the math:

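0x55872c89a000 - 0x55872c89903c = 0xFC4 = 4036

That is almost the size of a memory page! The missing 60 bytes (0x03c) are simply the offset of A from the start of its page: 4036 + 60 = 4096.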

To be sure, let’s start the program again in a debugger:

$ gcc -g -o test main.c
$ gdb ./test

[...]

@gef> b main
Breakpoint 1 at 0x114d: file main.c, line 10.

@gef>  run
Starting program: /home/rancune/test/test
Breakpoint 1, main () at main.c:10
[...]

@gef>  print &A
$1 = (int *) 0x55555555803c <A>

@gef>  vmmap
[ Legend:  Code | Heap | Stack ]
Start              End                Offset             Perm Path
0x00555555554000 0x00555555555000 0x00000000000000 r-- /home/rancune/test/test
0x00555555555000 0x00555555556000 0x00000000001000 r-x /home/rancune/test/test
0x00555555556000 0x00555555557000 0x00000000002000 r-- /home/rancune/test/test
0x00555555557000 0x00555555558000 0x00000000002000 r-- /home/rancune/test/test
0x00555555558000 0x00555555559000 0x00000000003000 rw- /home/rancune/test/test
0x007ffff7de1000 0x007ffff7de3000 0x00000000000000 rw-
0x007ffff7de3000 0x007ffff7e05000 0x00000000000000 r-- /lib64/libc-2.33.so
0x007ffff7e05000 0x007ffff7f48000 0x00000000022000 r-x /lib64/libc-2.33.so
0x007ffff7f48000 0x007ffff7f93000 0x00000000165000 r-- /lib64/libc-2.33.so
0x007ffff7f93000 0x007ffff7f97000 0x000000001af000 r-- /lib64/libc-2.33.so
0x007ffff7f97000 0x007ffff7f99000 0x000000001b3000 rw- /lib64/libc-2.33.so
0x007ffff7f99000 0x007ffff7f9f000 0x00000000000000 rw-
0x007ffff7fc6000 0x007ffff7fca000 0x00000000000000 r-- [vvar]
0x007ffff7fca000 0x007ffff7fcc000 0x00000000000000 r-x [vdso]
0x007ffff7fcc000 0x007ffff7fcd000 0x00000000000000 r-- /lib64/ld-2.33.so
0x007ffff7fcd000 0x007ffff7ff1000 0x00000000001000 r-x /lib64/ld-2.33.so
0x007ffff7ff1000 0x007ffff7ffb000 0x00000000025000 r-- /lib64/ld-2.33.so
0x007ffff7ffb000 0x007ffff7ffd000 0x0000000002e000 r-- /lib64/ld-2.33.so
0x007ffff7ffd000 0x007ffff7fff000 0x00000000030000 rw- /lib64/ld-2.33.so
0x007ffffffdd000 0x007ffffffff000 0x00000000000000 rw- [stack]
0xffffffffff600000 0xffffffffff601000 0x00000000000000 --x [vsyscall]

@gef>  continue
[...]
Acces à 0x555555579ffc :
ok !
Acces à 0x55555557a000 :

Program received signal SIGSEGV, Segmentation fault.

The vmmap command allows us to view the memory spaces currently mapped by the MMU.

Our variable A, whose address is 0x55555555803c, is in the block:

[ Legend:  Code | Heap | Stack ]
Start              End                Offset             Perm Path
[...]
0x00555555558000 0x00555555559000 0x00000000003000 rw- /home/rancune/test/test
[...]

And in that block, we happen to have write permissions (rw), which is quite handy, it must be said.

But then what?

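Well then, there is nothing: in this snapshot, no memory block is mapped by the MMU between the end of our block (0x555555559000) and address 0x007ffff7de1000. So the MMU complains when we step into that gap, and that’s the segfault! (In the gdb session above, the crash only comes at 0x55555557a000: a likely explanation is that another zone, probably the heap, was mapped right after our block once the program was running, after the vmmap snapshot was taken.)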

There are several things to observe in this little experiment. The first is that we can write without issue pretty much everywhere, as long as we stay within the zones mapped by the MMU. A (small) address error won’t necessarily crash the program right away and will come back to haunt us much later! The whole philosophy of C is that the programmer knows what they are doing, and we won’t get in their way!

The second is that this block is indeed 4096 bytes long (0x555555559000 - 0x555555558000 = 0x1000): the MMU mapped an entire memory page for us even though, presumably, there is not much going on in that area. This is the “bss” zone which contains uninitialized global variables.

$ size ./test
text    data     bss     dec     hex filename
1552     592       8    2152     868 ./test

I now suggest we look at a few classic error examples…

A small gallery of errors

  • The uninitialized pointer

I won’t go over this extensively – we just did it in great detail:

int
main()
{
    int* p ;
    *p = 69 ;

    return 0 ;
}

We are writing a 4-byte value somewhere in memory. But where exactly???? It is unlikely to be without consequences…

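A good habit is to always initialize a pointer to NULL (zero) when you declare it and have no valid address to give it yet. It costs nothing, and it is much easier to debug: on a system like the one above, dereferencing NULL crashes right away instead of silently writing somewhere.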

  • C syntax

Now this one is not too serious because the compiler will yell at you. When you want to declare three pointers, you might be tempted to do this:

double* A, B, C ;

Except no. In what I just wrote, A is indeed a double*, but B and C are doubles. The correct declaration syntax is this:

double *A, *B, *C ;

I don’t really categorize this as a pointer error, but I needed to tell you about it. Now it’s done! \o/

  • The “expired” pointer

This one is a bit more insidious. Take the following function:

int*
fonction_qui_pue( int A ) {
    int p ;
    p = A+1 ;
    return &p ;
}

The address you get back as the function’s result is the address of a local variable, whose scope is limited to the function. And when you try to use that address, p is no longer there. p has vanished.

In this specific case, p lives on the stack, and that means if we use this address much later in the program, there is a strong chance we will write right in the middle of other data. Not good.

A fun variant:

int
main() {
    int *K ;
    {
        int G ;
        K = &G ;
    }

    [...]

}

In C, a variable’s scope is the block in which it is declared. When you exit that block, the variable no longer exists. K will therefore contain a problematic address…

Final words

As always, I tell myself “My goodness, already? This is way too long!!!!” And yet there is so much more to say! Pointers are a powerful tool, but with no safety net, and you’d better understand what you are doing when manipulating them.

There are plenty of other ways to shoot yourself in the foot, especially with malloc and free, but we will cover those next time!

See you soon,

Rancune.

This post is licensed under CC BY 4.0 by the author.