So there I was. I had to use a proprietary library, for which I had no sources and no real hope of support from its creators. I built my program against it, ran it, and got a segmentation fault: an exception that seemed to happen inside that insidious library, which was of course stripped of all debugging information. I scratched my head, changed my code, checked traces, tried valgrind, strace, and other debugging tools, but found no obvious error. Finally, I assumed that I had to dig deeper and do some serious debugging of the library's assembly code with gdb. The rest of the post is dedicated to the steps I followed to find out what was happening inside the wily proprietary library, which we will call libProprietary. Prerequisites for this article are some knowledge of gdb and of the ARM architecture.
Some background on the task I was doing: I am a Canonical employee who works as a developer on Ubuntu for Phones. In most, if not all, phones, the BSP code is not 100% open and we have to use proprietary libraries built for Android. These libraries therefore use bionic, Android's libc implementation. As we want to call them from binaries compiled with glibc, we resort to libhybris, an ingenious library that is able to load and call libraries compiled against bionic while the rest of the process uses glibc. This will turn out to be critical in this debugging. Note also that we are debugging ARM 32-bit binaries here.
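To make the setup more concrete, here is a minimal sketch of how a glibc program can load and call an Android library through libhybris. The hybris_dlopen/hybris_dlsym/hybris_dlclose names are how I remember libhybris's wrappers and are declared here as assumptions rather than taken from a header, and the ProprietaryProcedure signature is purely hypothetical:

#include <dlfcn.h>    /* for RTLD_LAZY */
#include <stdio.h>

/* Assumed libhybris entry points (names from memory, declared here for the
 * sketch instead of including a libhybris header; check your installation). */
extern void *hybris_dlopen(const char *path, int flags);
extern void *hybris_dlsym(void *handle, const char *symbol);
extern int   hybris_dlclose(void *handle);

int main(void)
{
    /* Load the bionic-built library through libhybris' copy of the Android
     * linker, while this program itself is linked against glibc. */
    void *lib = hybris_dlopen("/android/system/lib/libProprietary.so", RTLD_LAZY);
    if (lib == NULL) {
        fprintf(stderr, "could not load libProprietary\n");
        return 1;
    }

    /* Hypothetical signature: the real ProprietaryProcedure prototype is not
     * part of this article. */
    void (*proc)(void *) = (void (*)(void *))hybris_dlsym(lib, "ProprietaryProcedure");
    if (proc != NULL)
        proc(NULL);

    hybris_dlclose(lib);
    return 0;
}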
The Debugging Session
To start, I made sure I had installed the debug symbols for glibc and other libraries, and started to debug with gdb in the usual way:
$ gdb myprogram
GNU gdb (Ubuntu 7.9-1ubuntu1) 7.9
...
Starting program: myprogram
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/arm-linux-gnueabihf/libthread_db.so.1".
[New Thread 0xf49de460 (LWP 7101)]
[New Thread 0xf31de460 (LWP 7104)]
[New Thread 0xf39de460 (LWP 7103)]
[New Thread 0xf41de460 (LWP 7102)]
[New Thread 0xf51de460 (LWP 7100)]

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0xf49de460 (LWP 7101)]
0x00000000 in ?? ()
(gdb) bt
#0  0x00000000 in ?? ()
#1  0xf520bd06 in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
(gdb) info proc mappings
process 7097
Mapped address spaces:

    Start Addr   End Addr       Size     Offset  objfile
       0x10000    0x17000     0x7000        0x0  /usr/bin/myprogram
    ...
    0xf41e0000 0xf49df000   0x7ff000        0x0  [stack:7101]
    ...
    0xf51f6000 0xf5221000    0x2b000        0x0  /android/system/lib/libProprietary.so
    0xf5221000 0xf5222000     0x1000        0x0
    0xf5222000 0xf5224000     0x2000    0x2b000  /android/system/lib/libProprietary.so
    0xf5224000 0xf5225000     0x1000    0x2d000  /android/system/lib/libProprietary.so
    ...
(gdb)
We can see here that we get the promised crash. I execute a couple of gdb commands after that to see the backtrace and the part of the process address space that will be of interest in the following discussion. The backtrace shows that a segmentation violation happened when the CPU tried to execute instructions at address zero, and by checking the process mappings we can see that the previous frame lives inside the text segment of libProprietary.so. There is no backtrace beyond that point, but that should come as no surprise: there is no DWARF information in libProprietary, and use of the frame pointer is quite commonly optimized away these days.
After this I tried to get a bit more information on the CPU state when the crash happened:
(gdb) info reg
r0             0x0          0
r1             0x0          0
r2             0x0          0
r3             0x9          9
r4             0x0          0
r5             0x0          0
r6             0x0          0
r7             0x0          0
r8             0x0          0
r9             0x0          0
r10            0x0          0
r11            0x0          0
r12            0xffffffff   4294967295
sp             0xf49dde70   0xf49dde70
lr             0xf520bd07   -182403833
pc             0x0          0x0
cpsr           0x60000010   1610612752
(gdb) disassemble 0xf520bd02,+10
Dump of assembler code from 0xf520bd02 to 0xf520bd0c:
   0xf520bd02:  b       0xf49c9cd6
   0xf520bd06:  movwpl  pc, #18628      ; 0x48c4        <UNPREDICTABLE>
   0xf520bd0a:  andlt   r4, r11, r8, lsr #12
End of assembler dump.
(gdb)
Hmm, we are starting to see weird things here. First, at 0xf520bd02 (which was probably executed shortly before the crash) we get an unconditional branch to some point in the thread stack (see the mappings in the previous listing). Second, the instruction at 0xf520bd06 (which should be executed after returning from the procedure that provokes the crash) would load into the pc (program counter) an address that is not mapped: we saw in the previous listing that the first mapped address is 0x10000. The movw instruction also has a "pl" suffix that makes it execute only when the operand is positive or zero... which is obviously unnecessary, as 0x48c4 is encoded in the instruction.
I resorted to disassembling the library with objdump -d libProprietary.so and comparing with the gdb output. objdump shows, for that part of the file (subtracting the library load address gives us the offset inside the file: 0xf520bd02-0xf51f6000=0x15d02):
   15d02:       f7f3 eade       blx     92c0 <__android_log_print@plt>
   15d06:       f8c4 5304       str.w   r5, [r4, #772]  ; 0x304
   15d0a:       4628            mov     r0, r5
   15d0c:       b00b            add     sp, #44         ; 0x2c
   15d0e:       e8bd 8ff0       ldmia.w sp!, {r4, r5, r6, r7, r8, r9, sl, fp, pc}
which is completely different from what gdb shows! What is happening here? Taking a look at the addresses in both code chunks, we see that instructions are always 4 bytes in the gdb output, while they are 2 or 4 bytes in objdump's. Well, you have guessed it, haven't you? We are seeing "normal" ARM instructions in gdb, while objdump is decoding THUMB-2 instructions. objdump certainly seems to be right here, as its output makes more sense: we have a call to an executable part of the process space at 0x15d02 (it is resolved to a known function, __android_log_print), and the following instructions look like a normal function epilogue in ARM: a return value is stored in r0, the sp (stack pointer) is incremented (we are freeing space in the stack), and we restore registers.
If we go back to the register values, we see that cpsr (the current program status register [1]) does not have the T bit set, so gdb thinks we are using ARM instructions. We can change this by doing:
(gdb) set $cpsr=0x60000030
(gdb) disass 0xf520bd02,+15
Dump of assembler code from 0xf520bd02 to 0xf520bd11:
   0xf520bd02:  blx     0xf51ff2c0
   0xf520bd06:  str.w   r5, [r4, #772]  ; 0x304
   0xf520bd0a:  mov     r0, r5
   0xf520bd0c:  add     sp, #44         ; 0x2c
   0xf520bd0e:  ldmia.w sp!, {r4, r5, r6, r7, r8, r9, r10, r11, pc}
End of assembler dump.
OK, much better now [2]. The Thumb bit in cpsr is determined by the last bx/blx call: if the target address is odd, the procedure we are calling contains THUMB instructions, otherwise it contains ARM instructions (a good reference for these instructions is [3]). In this case, after an exception the CPU moves to ARM mode, and gdb is unable to know which is the right mode when disassembling. We can search for hints on which parts of the code are ARM/Thumb by looking at the values in registers used by bx/blx, or by looking at the lr (link register): we can see above that its value after the crash was 0xf520bd07, which is odd and indicates that 0xf520bd06 contains a Thumb instruction. However, for some reason gdb is not able to take advantage of this information.
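Those two rules (the T bit is bit 5 of the cpsr, and an odd bx/blx target selects Thumb) are easy to encode in a trivial C snippet, shown here applied to the values we saw after the crash:

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define CPSR_T_BIT (1u << 5)          /* Thumb state bit in the ARM CPSR */

/* A bx/blx target with bit 0 set selects Thumb state. */
static bool target_is_thumb(uint32_t addr) { return (addr & 1u) != 0; }

/* The CPU is currently executing Thumb code if the T bit is set. */
static bool cpsr_is_thumb(uint32_t cpsr) { return (cpsr & CPSR_T_BIT) != 0; }

int main(void)
{
    uint32_t lr   = 0xf520bd07;   /* lr value observed after the crash        */
    uint32_t cpsr = 0x60000010;   /* T bit clear, hence gdb assumes ARM mode  */

    printf("lr points to %s code\n", target_is_thumb(lr) ? "Thumb" : "ARM");
    printf("cpsr says we are in %s mode\n", cpsr_is_thumb(cpsr) ? "Thumb" : "ARM");
    return 0;
}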
Of course this problem does not happen when we have debugging information: in that case there are special symbols that let gdb know whether the section containing the code holds Thumb instructions or not [4]. As those are not found here, gdb falls back to the cpsr value. objdump seems to have better heuristics, though.
After solving this issue with instruction decoding, I started to debug __android_log_print to check what was happening there, as it looked like the crash was happening in that call. I spent quite a lot of time there, but found nothing. All looked fine, and I started to despair. Until I inserted a breakpoint at address 0xf520bd06, right after the call to __android_log_print, ran the program... and it stopped at that address, with no crash. I then started to execute the program instruction by instruction:
(gdb) b *0xf520bd06
(gdb) run
...
Breakpoint 1, 0xf520bd06 in ?? ()
(gdb) si
0xf520bd0a in ?? ()
(gdb) si
0xf520bd0c in ?? ()
(gdb) si
0xf520bd0e in ?? ()
Warning:
Cannot insert breakpoint 0.
Cannot access memory at address 0x0
Something was apparently wrong with the ldmia instruction, which restores registers, including the pc, from the stack. I took a look at the stack at that moment (taking into account that ldmia had already modified the sp after restoring 9 registers == 36 bytes):
(gdb) x/16xw $sp-36
0xf49dde4c:     0x00000000      0x00000000      0x00000000      0x00000000
0xf49dde5c:     0x00000000      0x00000000      0x00000000      0x00000000
0xf49dde6c:     0x00000000      0x00000000      0x00000000      0x00000000
0xf49dde7c:     0x00000000      0x00000000      0x00000000      0x00000000
All zeros! At this point it is clear that this is the real point where the crash happens, as we are loading 0 into the pc. This clearly looked like a stack corruption issue.
But, before moving forward, why are we getting a wrong backtrace from gdb? Well, gdb is seeing a corrupted stack, so it is not able to unwind it; it would not be able to unwind it even with full debug information. The only hint it has is the lr. This register contains the return address after execution of a bl/blx instruction [3]. If the called procedure is non-leaf, the lr is saved in the prologue and restored in the epilogue, because it gets overwritten when branching to other procedures. In the epilogue it is restored to the pc, and sometimes it is also saved back into the lr, depending on whether the procedure has ARM/Thumb interworking built in or not [5]. It is not overwritten if we have a leaf procedure (as there are no procedure calls inside those).
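A small C sketch may help to visualise the leaf vs. non-leaf distinction; the register behaviour described in the comments is how compilers typically emit ARM/Thumb code, not something taken from libProprietary:

/* Leaf procedure: no calls inside, so lr still holds the return address at
 * the end and the compiler can typically return with "bx lr" without ever
 * saving lr to the stack. */
int leaf(int a, int b)
{
    return a + b;
}

/* Non-leaf procedure: the call below overwrites lr, so the prologue pushes
 * lr onto the stack and the epilogue pops it straight into pc, just like
 * ProprietaryProcedure's "stmdb sp!, {..., lr}" / "ldmia.w sp!, {..., pc}"
 * pair in the disassembly further down. */
int non_leaf(int a)
{
    return leaf(a, 1) + 2;
}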
As gdb has no additional information, it uses the lr to build the backtrace, assuming we are in a leaf procedure. However, this is not true here, and the backtrace turns out to be wrong. Nonetheless, this information was not completely useless: lr was pointing to the instruction right after the last bl/blx instruction that was executed, which was not that far away from the real point where the program was crashing. This happened because, fortunately, __android_log_print has interworking code and restores the lr; otherwise the value of lr could have come from a point much further away from where the real crash happens. Believe it or not, it could have been even worse!
Having now a clear idea of where and why the crash was happening, things accelerated. The procedure where the crash happened, as disassembled by objdump, was (I include here only the most relevant parts of the code):
00015b1c <ProprietaryProcedure@@Base>:
   15b1c:       e92d 4ff0       stmdb   sp!, {r4, r5, r6, r7, r8, r9, sl, fp, lr}
   15b20:       b08b            sub     sp, #44         ; 0x2c
   15b22:       497c            ldr     r1, [pc, #496]  ; (15d14 <ProprietaryProcedure@@Base+0x1f8>)
   15b24:       2500            movs    r5, #0
   15b26:       9500            str     r5, [sp, #0]
   15b28:       4604            mov     r4, r0
   15b2a:       4479            add     r1, pc
   15b2c:       462b            mov     r3, r5
   15b2e:       f8df 81e8       ldr.w   r8, [pc, #488]  ; 15d18 <ProprietaryProcedure@@Base+0x1fc>
   15b32:       462a            mov     r2, r5
   15b34:       f8df 91e4       ldr.w   r9, [pc, #484]  ; 15d1c <ProprietaryProcedure@@Base+0x200>
   15b38:       ae06            add     r6, sp, #24
   15b3a:       f8df a1e4       ldr.w   sl, [pc, #484]  ; 15d20 <ProprietaryProcedure@@Base+0x204>
   15b3e:       200f            movs    r0, #15
   15b40:       f8df b1e0       ldr.w   fp, [pc, #480]  ; 15d24 <ProprietaryProcedure@@Base+0x208>
   15b44:       f7f3 ef76       blx     9a34 <prctl@plt>
   15b48:       44f8            add     r8, pc
   15b4a:       4629            mov     r1, r5
   15b4c:       44f9            add     r9, pc
   15b4e:       2210            movs    r2, #16
   15b50:       44fa            add     sl, pc
   15b52:       4630            mov     r0, r6
   15b54:       44fb            add     fp, pc
   15b56:       f7f3 ea40       blx     8fd8 <memset@plt>
   15b5a:       a807            add     r0, sp, #28
   15b5c:       f7f3 ef70       blx     9a40 <sigemptyset@plt>
   15b60:       4b71            ldr     r3, [pc, #452]  ; (15d28 <ProprietaryProcedure@@Base+0x20c>)
   15b62:       462a            mov     r2, r5
   15b64:       9508            str     r5, [sp, #32]
   15b66:       4631            mov     r1, r6
   15b68:       447b            add     r3, pc
   15b6a:       681b            ldr     r3, [r3, #0]
   15b6c:       200a            movs    r0, #10
   15b6e:       9306            str     r3, [sp, #24]
   15b70:       f7f3 ef6c       blx     9a4c <sigaction@plt>
   ...
   15d02:       f7f3 eade       blx     92c0 <__android_log_print@plt>
   15d06:       f8c4 5304       str.w   r5, [r4, #772]  ; 0x304
   15d0a:       4628            mov     r0, r5
   15d0c:       b00b            add     sp, #44         ; 0x2c
   15d0e:       e8bd 8ff0       ldmia.w sp!, {r4, r5, r6, r7, r8, r9, sl, fp, pc}
The addresses where this code is loaded can easily be computed by adding 0xf51f6000 to the file offsets shown in the first column. We see that ProprietaryProcedure, which is itself an exported symbol, performs a few calls to different external functions [6].
I restarted the debug session, added a breakpoint at the start of ProprietaryProcedure, right after stmdb saves the state, and checked the stack values:
(gdb) b *0xf520bb20
Breakpoint 1 at 0xf520bb20
(gdb) cont
...
Breakpoint 1, 0xf520bb20 in ?? ()
(gdb) p $sp
$1 = (void *) 0xf49dde4c
(gdb) x/16xw $sp
0xf49dde4c:     0xf49de460      0x0007df00      0x00000000      0xf49dde70
0xf49dde5c:     0xf49de694      0x00000000      0xf77e9000      0x00000000
0xf49dde6c:     0xf75b4491      0x00000000      0xf49de460      0x00000000
0xf49dde7c:     0x00000000      0xfd5b4eba      0xfe9dd4a3      0xf49de460
We can see that the stack contains something, including a return address that looks valid (0xf75b4491). Note also that the procedure must never touch this part of the stack, as it belongs to the caller of ProprietaryProcedure.
Now it is simply a matter of bisecting the code between the beginning and the end of ProprietaryProcedure to find out where we are clobbering the stack. I will spare you this tedious process. Instead, I will just show that, in the end, it turned out that the call to sigemptyset() is the culprit [7]:
(gdb) b *0xf520bb5c
Breakpoint 1 at 0xf520bb5c
(gdb) b *0xf520bb60
Breakpoint 2 at 0xf520bb60
(gdb) run
Breakpoint 1, 0xf520bb5c in ?? ()
(gdb) x/16xw 0xf49dde4c
0xf49dde4c:     0xf49de460      0x0007df00      0x00000000      0xf49dde70
0xf49dde5c:     0xf49de694      0x00000000      0xf77e9000      0x00000000
0xf49dde6c:     0xf75b4491      0x00000000      0xf49de460      0x00000000
0xf49dde7c:     0x00000000      0xfd5b4eba      0xfe9dd4a3      0xf49de460
(gdb) cont
Continuing.
Breakpoint 2, 0xf520bb60 in ?? ()
(gdb) x/16xw 0xf49dde4c
0xf49dde4c:     0x00000000      0x00000000      0x00000000      0x00000000
0xf49dde5c:     0x00000000      0x00000000      0x00000000      0x00000000
0xf49dde6c:     0x00000000      0x00000000      0x00000000      0x00000000
0xf49dde7c:     0x00000000      0x00000000      0x00000000      0x00000000
Note that here I am printing the part of the stack that is not reserved by the function (0xf49dde4c is the value of the sp before execution of the instruction at offset 0x15b20, see the code above).
What is going wrong here? Remember that at the beginning of the article I mentioned that we were using libhybris. libProprietary assumes a bionic environment, and the libc functions it calls are bionic's. However, libhybris has hooks for some bionic functions: for those, bionic is not called; instead, the hook is invoked. libhybris does this to avoid conflicts between bionic and glibc: for instance, having two allocators fighting for the process address space is a recipe for disaster, so malloc() and related functions are hooked and the hooks end up calling the glibc implementation. Signal-related functions were hooked too, including sigemptyset(), and in this case the hook simply called the glibc implementation.
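To give an idea of the mechanism, here is a conceptual sketch of such a hook table; the structure and names are illustrative, not libhybris's actual code. When resolving a symbol for an Android library, the table is consulted first and the real bionic symbol is used only if no hook is registered:

#include <stdlib.h>
#include <string.h>

/* One entry per hooked bionic symbol. */
struct hook {
    const char *name;
    void       *func;
};

static const struct hook hooks[] = {
    { "malloc",      (void *)malloc },   /* forwarded to glibc */
    { "free",        (void *)free },
    /* { "sigemptyset", (void *)sigemptyset },  <- the problematic kind of hook */
    { NULL, NULL }
};

/* Return the hook for a symbol, or NULL so the caller falls back to bionic. */
void *get_hooked_symbol(const char *name)
{
    for (const struct hook *h = hooks; h->name != NULL; h++)
        if (strcmp(h->name, name) == 0)
            return h->func;
    return NULL;
}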
I looked at the glibc and bionic implementations; in both cases sigemptyset() is a very simple utility function that clears a sigset_t variable with memset(). Everything pointed to different definitions of sigset_t depending on the library. The definition turned out to be a bit messy to follow in the code, as it depends on build-time definitions, so I resorted to gdb to print the type. For an executable compiled against glibc, I saw
(gdb) ptype sigset_t
type = struct {
    unsigned long __val[32];
}
and for one using bionic
(gdb) ptype sigset_t
type = unsigned long
This finally confirms where the bug is, and explains it: we are overwriting the stack because libProprietary reserves stack memory for bionic's sigset_t, while we are using glibc's sigemptyset(), which uses a different definition of the type. As that definition is much bigger, the stack gets overwritten by the call to memset(). We then get the crash later, when the function returns and tries to restore the registers.
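Here is a minimal C sketch of the mismatch; the two typedefs mirror the ptype output above, while proprietary_procedure() and glibc_sigemptyset() are of course just illustrative stand-ins for the real code:

#include <string.h>

typedef unsigned long bionic_sigset_t;                        /* bionic's 32-bit sigset_t  */
typedef struct { unsigned long __val[32]; } glibc_sigset_t;   /* glibc's sigset_t (128 B)  */

/* What the glibc implementation effectively does. */
static int glibc_sigemptyset(glibc_sigset_t *set)
{
    memset(set, 0, sizeof(*set));   /* clears 128 bytes on 32-bit ARM */
    return 0;
}

void proprietary_procedure(void)
{
    bionic_sigset_t set;            /* only 4 bytes reserved on the stack */

    /* libProprietary believes it is calling bionic's sigemptyset(), but the
     * libhybris hook routes the call to glibc's version, which writes 128
     * bytes and tramples the saved registers sitting above `set` on the
     * stack -- exactly the corruption observed in the debugging session. */
    glibc_sigemptyset((glibc_sigset_t *)&set);
}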
After knowing this, the solution was simple: I removed the libhybris hooks for signal functions, recompiled it, and… all worked just fine, no crashes anymore!
However, this is not the final solution: as signals are shared resources, it makes sense to hook them in libhybris. But to do it properly, the hooks have to translate types between bionic and glibc, something we were not doing (we were simply calling the glibc implementation). That, however, is "just work".
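As an illustration only (assumed names, not the actual libhybris patch), a type-aware hook would look conceptually like this: it operates on the caller's bionic-sized object, and translates the mask whenever a glibc call really is needed.

#include <signal.h>

/* Bionic's 32-bit sigset_t is a single unsigned long (one bit per signal). */
typedef unsigned long bionic_sigset_t;

/* Hooked sigemptyset(): touch only the bionic-sized object the caller owns. */
static int hook_sigemptyset(bionic_sigset_t *set)
{
    *set = 0;
    return 0;
}

/* Helper for hooks that must forward a mask to glibc: expand the bionic
 * bitmask into a glibc sigset_t instead of reinterpreting the pointer. */
static void bionic_to_glibc_sigset(const bionic_sigset_t *in, sigset_t *out)
{
    sigemptyset(out);
    for (int sig = 1; sig <= 32; sig++)
        if (*in & (1ul << (sig - 1)))
            sigaddset(out, sig);
}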
Of course I wondered why the heck a library that is kind of generic needs to mess around with signals, but hey, that is not my fault ;-).
Conclusions
I can say I learned several things while debugging this:
- Not having the sources is terrible for debugging (well, I already knew this). Unfortunately not open sourcing the code is still a standard practice in part of the industry.
- The most interesting technical bit here is, IMHO, that we need to be very cautious with the backtrace that debuggers show after a crash. If you start to see things that do not make sense, it is possible that the registers or the stack have been messed up and the real crash happened elsewhere. Bear in mind that the very first thing to do when a program crashes is to make sure we know the exact point where that happens.
- We have to be careful when disassembling ARM code, because if there is no debug information we could be seeing the wrong instruction set. We can check the evenness of the addresses used by bx/blx and of the lr to make sure we are in the right mode.
- Sometimes taking a look at the assembly code can help us when debugging, even when we have the sources. Note that if I had had the C sources I would have seen the crash happening right when returning from a function, and it might not have been that immediate to find out that the stack was messed up. The assembly clearly pointed to an overwritten stack.
- Finally, I personally learned some bits of ARM architecture that I did not know, which was great.
Well, this is it. I hope you enjoyed the (lengthy, I know) article. Thanks for reading!
[1] http://www.heyrick.co.uk/armwiki/The_Status_register
[2] We can get the same result by executing set arm fallback-mode thumb in gdb, but changing the register seemed more pedagogical here.
[3] http://infocenter.arm.com/help/topic/com.arm.doc.dui0068b/DUI0068.pdf
[4] http://reverseengineering.stackexchange.com/questions/6080/how-to-detect-thumb-mode-in-arm-disassembly
[5] http://www.mcternan.me.uk/ArmStackUnwinding/
[6] In fact the calls are to the PLT section, which is inside the library. The PLT calls in turn, by using addresses in the GOT data section, either directly the function or the dynamic loader, as we are doing lazy loading. See https://www.technovelty.org/linux/plt-and-got-the-key-to-code-sharing-and-dynamic-libraries.html, for instance.
[7] I had to use two breakpoints on consecutive instructions because the "ni" gdb command was not working well here.