On Friday, I spent a number of hours trying to run down an error in a fairly substantial piece of code. All I really knew was that I kept getting an error that said something like:
*** glibc detected *** ./my_program: malloc(): memory corruption: 0x0000000002296980 ***
When I ran the code under gdb, I discovered that the error triggered on the second pass through a function that, at its heart, called SSL_write(). Apparently there was an allocation inside SSL_write() that was causing some mischief, right?
Wrong. That’s just where the error was being discovered. At some point before the SSL_write() call, I was corrupting the heap, and malloc() only noticed the damage the next time it had to walk its own bookkeeping. There are basically two ways this occurs:
- You try to free something that has already been freed – usually glibc will tell you if it’s a double free
- You write past the bounds of an allocation, or free less than you should have, leaving the heap with some allocated patches that shouldn’t be allocated or (more commonly) some unallocated patches that you should have allocated
Usually, we catch a double-free or an off-by-one allocation issue fairly easily. They’re both very common problems. We usually double-free when we are unaware that another function frees data for us automatically, or when we misunderstand the flow of the code and inadvertently free something that is already gone or (in some spaghetti cases) not allocated yet.
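One common defensive pattern against the double-free case is to give each heap block a single owner and to reset the pointer as soon as it is released. The sketch below is only an illustration: `struct buffer`, `buffer_init()`, and `buffer_release()` are hypothetical names, not anything from my program. It relies on the fact that free(NULL) is defined to do nothing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical owner of one heap block; the names are illustrative only. */
struct buffer {
    char  *data;
    size_t len;
};

/* Allocate len bytes for the buffer. Returns 0 on success, -1 on failure. */
int buffer_init(struct buffer *b, size_t len)
{
    assert(b != NULL);
    b->data = malloc(len);
    b->len  = (b->data != NULL) ? len : 0;
    return (b->data != NULL) ? 0 : -1;
}

/* Release the block and clear the pointer. A second call then passes
 * NULL to free(), which is a harmless no-op instead of a double free. */
void buffer_release(struct buffer *b)
{
    assert(b != NULL);
    free(b->data);
    b->data = NULL;
    b->len  = 0;
}
```

This only protects you when every release goes through the one owning struct. If a copy of the raw pointer escapes and gets freed elsewhere, you are back where you started.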
We often overwrite allocated memory when we fail to think about null characters and/or actual lengths. We see the first case most often with strings – we think of “hello world” as having 11 characters, but in memory it is “hello world\0”, which includes the null character at the end and takes 12 bytes. If you try to write the null-terminated string to an 11-byte memory location, you will overflow it by one byte. Of course, off-by-one errors are especially common among new programmers, so some of this is to be expected, but it is essential that you develop patterns that prevent such errors.
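Here is a minimal sketch of the kind of pattern I mean for strings. The helper name `copy_string()` is made up for this example. The point is simply that the allocation size is always strlen() plus one, computed in a single place.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Return a heap-allocated copy of src, or NULL if allocation fails.
 * The caller owns the result and must free() it exactly once. */
char *copy_string(const char *src)
{
    assert(src != NULL);

    size_t len = strlen(src);      /* 11 for "hello world"; excludes '\0' */
    char  *dst = malloc(len + 1);  /* +1 leaves room for the terminator   */
    if (dst == NULL)
        return NULL;

    memcpy(dst, src, len + 1);     /* copies the characters and the '\0'  */
    return dst;
}
```

Writing malloc(len) here instead would compile and often appear to work, which is exactly why this kind of bug tends to surface much later and somewhere else.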
I finally tracked my problem to a free() operation that is probably necessary, but was freeing a data patch that was the wrong size (somehow). As I continue to dig, I will discover exactly why this is a problem, but for the sake of proving the rest of my code worked, I temporarily commented it out. “Temporarily” is important here – I left ample commentary around the site to remind myself to a) go back and fix it, and b) keep track of everything I’ve learned so far.
There are three ways built into glibc to catch malloc() errors in your code, so long as you’re working with the GNU toolchain. They are described in the heap consistency checking section of the GNU C Library manual – one is a build option (linking with -lmcheck) that turns on consistency checks for every allocation, one is an environment variable (MALLOC_CHECK_) that enables a more defensive allocator at run time (at the cost of speed and memory), and one is a function (mprobe()) that lets you test particular allocated blocks.
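As a rough sketch of the third method, the helpers below wrap mprobe() so that a suspect block can be tested at several points in the program, which helps narrow down where the damage happens. This is glibc-specific, and the names `describe_status()` and `require_intact()` are my own inventions for illustration. It assumes checking has already been switched on before the first allocation (by linking with -lmcheck or calling mcheck(NULL) early); otherwise mprobe() just reports that checking is disabled. On recent glibc releases the checks may also need the separate malloc debugging library to be preloaded, so treat this as a starting point rather than a recipe.

```c
#include <assert.h>
#include <mcheck.h>   /* glibc-specific: mprobe(), enum mcheck_status */
#include <stdio.h>
#include <stdlib.h>

/* Turn an mcheck status code into a readable message. */
const char *describe_status(enum mcheck_status st)
{
    switch (st) {
    case MCHECK_OK:       return "block looks consistent";
    case MCHECK_HEAD:     return "memory just before the block was overwritten";
    case MCHECK_TAIL:     return "memory just after the block was overwritten";
    case MCHECK_FREE:     return "block has already been freed";
    case MCHECK_DISABLED: return "consistency checking is not enabled";
    default:              return "unknown status";
    }
}

/* Probe one malloc()ed block and abort loudly if glibc reports damage.
 * A disabled checker is treated as "nothing to report". */
void require_intact(void *ptr)
{
    assert(ptr != NULL);

    enum mcheck_status st = mprobe(ptr);
    if (st != MCHECK_OK && st != MCHECK_DISABLED) {
        fprintf(stderr, "heap check failed: %s\n", describe_status(st));
        abort();
    }
}
```

Sprinkling require_intact(p) calls before and after suspicious operations turns "something corrupted the heap at some point" into "the block was fine here and broken there".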
None of those methods found my problem, though – sometimes a programmer has to actually walk the beat himself. Who knew?