Figuring out corrupt stacktraces on ARM

If you’re developing C/C++ on embedded devices, you might already have stumbled upon a corrupt stacktrace like this when trying to debug with gdb:

(gdb) bt 
#0  0xb38e32c4 in pthread_getname_np () from /home/enrique/buildroot/output5/staging/lib/libpthread.so.0
#1  0xb38e103c in __lll_timedlock_wait () from /home/enrique/buildroot/output5/staging/lib/libpthread.so.0 
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

In these cases I usually give up gdb and try to solve my problems by adding printf()s and resorting to other tools. However, there are times when you really really need to know what is in that cursed stack.

ARM devices subroutine calls work by setting the return address in the Link Register (LR), so the subroutine knows where to point the Program Counter (PC) register to. While not jumping into subroutines, the values of the LR register is saved in the stack (to be restored later, right before the current subroutine returns to the caller) and the register can be used for other tasks (LR is a “scratch register”). This means that the functions in the backtrace are actually there, in the stack, in the form of older saved LRs, waiting for us to get them.

So, the first step would be to dump the memory contents of the backtrace, starting from the address pointed by the Stack Pointer (SP). Let’s print the first 256 32-bit words and save them as a file from gdb:

(gdb) set logging overwrite on
(gdb) set logging file /tmp/bt.txt
(gdb) set logging on
Copying output to /tmp/bt.txt.
(gdb) x/256wa $sp
0xbe9772b0:     0x821e  0xb38e103d   0x1aef48   0xb1973df0
0xbe9772c0:      0x73d  0xb38dc51f        0x0          0x1
0xbe9772d0:   0x191d58    0x191da4   0x19f200   0xb31ae5ed
...
0xbe977560: 0xb28c6000  0xbe9776b4        0x5      0x10871 <main(int, char**)>
0xbe977570: 0xb6f93000  0xaaaaaaab 0xaf85fd4a   0xa36dbc17
0xbe977580:      0x130         0x0    0x109b9 <__libc_csu_init> 0x0
...
0xbe977690:        0x0         0x0    0x108cd <_start>  0x0
0xbe9776a0:        0x0     0x108ed <_start+32>  0x10a19 <__libc_csu_fini> 0xb6f76969  
(gdb) set logging off
Done logging to /tmp/bt.txt.

Gdb already can name some of the functions (like main()), but not all of them. At least not the ones more interesting for our purpose. We’ll have to look for them by hand.

We first get the memory page mapping from the process (WebKit’s WebProcess in my case) looking in /proc/pid/maps. I’m retrieving it from the device (named metro) via ssh and saving it to a local file. I’m only interested in the code pages, those with executable (‘x’) permissions:

$ ssh metro 'cat /proc/$(ps axu | grep WebProcess | grep -v grep | { read _ P _ ; echo $P ; })/maps | grep " r.x. "' > /tmp/maps.txt

The file looks like this:

00010000-00011000 r-xp 00000000 103:04 2617      /usr/bin/WPEWebProcess
...
b54f2000-b6e1e000 r-xp 00000000 103:04 1963      /usr/lib/libWPEWebKit-0.1.so.2.2.1 
b6f6b000-b6f82000 r-xp 00000000 00:02 816        /lib/ld-2.24.so 
be957000-be978000 rwxp 00000000 00:00 0          [stack] 
be979000-be97a000 r-xp 00000000 00:00 0          [sigpage] 
be97b000-be97c000 r-xp 00000000 00:00 0          [vdso] 
ffff0000-ffff1000 r-xp 00000000 00:00 0          [vectors]

Now we process the backtrace to remove address markers and have one word per line:

$ cat /tmp/bt.txt | sed -e 's/^[^:]*://' -e 's/[<][^>]*[>]//g' | while read A B C D; do echo $A; echo $B; echo $C; echo $D; done | sed 's/^0x//' | while read P; do printf '%08x\n' "$((16#"$P"))"; done | sponge /tmp/bt.txt

Then merge and sort both files, so the addresses in the stack appear below their corresponding mappings:

$ cat /tmp/maps.txt /tmp/bt.txt | sort > /tmp/merged.txt

Now we process the resulting file to get each address in the stack with its corresponding mapping:

$ cat /tmp/merged.txt | while read LINE; do if [[ $LINE =~ - ]]; then MAPPING="$LINE"; else echo $LINE '-->' $MAPPING; fi; done | grep '/' | sed -E -e 's/([0-9a-f][0-9a-f]*)-([0-9a-f][0-9a-f]*)/\1 - \2/' > /tmp/mapped.txt

Like this (address in the stack, page start (or base), page end, page permissions, executable file load offset (base offset), etc.):

0001034c --> 00010000 - 00011000 r-xp 00000000 103:04 2617 /usr/bin/WPEWebProcess
...
b550bfa4 --> b54f2000 - b6e1e000 r-xp 00000000 103:04 1963 /usr/lib/libWPEWebKit-0.1.so.2.2.1 
b5937445 --> b54f2000 - b6e1e000 r-xp 00000000 103:04 1963 /usr/lib/libWPEWebKit-0.1.so.2.2.1 
b5fb0319 --> b54f2000 - b6e1e000 r-xp 00000000 103:04 1963 /usr/lib/libWPEWebKit-0.1.so.2.2.1
...

The addr2line tool can give us the exact function an address belongs to, or even the function and source code line if the code has been built with symbols. But the addresses addr2line understands are internal offsets, not absolute memory addresses. We can convert the addresses in the stack to offsets with this expression:

offset = address - page start + base offset

I’m using buildroot as my cross-build environment, so I need to pick the library files from the staging directory because those are the unstripped versions. The addr2line tool is the one from the buldroot cross compiling toolchain. Written as a script:

$ cat /tmp/mapped.txt | while read ADDR _ BASE _ END _ BASEOFFSET _ _ FILE; do OFFSET=$(printf "%08x\n" $((0x$ADDR - 0x$BASE + 0x$BASEOFFSET))); FILE=~/buildroot/output/staging/$FILE; if [[ -f $FILE ]]; then LINE=$(~/buildroot/output/host/usr/bin/arm-buildroot-linux-gnueabihf-addr2line -p -f -C -e $FILE $OFFSET); echo "$ADDR $LINE"; fi; done > /tmp/addr2line.txt

Finally, we filter out the useless [??] entries:

$ cat /tmp/bt.txt | while read DATA; do cat /tmp/addr2line.txt | grep "$DATA"; done | grep -v '[?][?]' > /tmp/fullbt.txt

What remains is something very similar to what the real backtrace should have been if everything had originally worked as it should in gdb:

b31ae5ed gst_pad_send_event_unchecked en /home/enrique/buildroot/output5/build/gstreamer1-1.10.4/gst/gstpad.c:5571 
b31a46c1 gst_debug_log en /home/enrique/buildroot/output5/build/gstreamer1-1.10.4/gst/gstinfo.c:444 
b31b7ead gst_pad_send_event en /home/enrique/buildroot/output5/build/gstreamer1-1.10.4/gst/gstpad.c:5775 
b666250d WebCore::AppendPipeline::injectProtectionEventIfPending() en /home/enrique/buildroot/output5/build/wpewebkit-custom/build-Release/../Source/WebCore/platform/graphics/gstreamer/mse/AppendPipeline.cpp:1360 
b657b411 WTF::GRefPtr<_GstEvent>::~GRefPtr() en /home/enrique/buildroot/output5/build/wpewebkit-custom/build-Release/DerivedSources/ForwardingHeaders/wtf/glib/GRefPtr.h:76 
b5fb0319 WebCore::HTMLMediaElement::pendingActionTimerFired() en /home/enrique/buildroot/output5/build/wpewebkit-custom/build-Release/../Source/WebCore/html/HTMLMediaElement.cpp:1179 
b61a524d WebCore::ThreadTimers::sharedTimerFiredInternal() en /home/enrique/buildroot/output5/build/wpewebkit-custom/build-Release/../Source/WebCore/platform/ThreadTimers.cpp:120 
b61a5291 WTF::Function<void ()>::CallableWrapper<WebCore::ThreadTimers::setSharedTimer(WebCore::SharedTimer*)::{lambda()#1}>::call() en /home/enrique/buildroot/output5/build/wpewebkit-custom/build-Release/DerivedSources/ForwardingHeaders/wtf/Function.h:101 
b6c809a3 operator() en /home/enrique/buildroot/output5/build/wpewebkit-custom/build-Release/../Source/WTF/wtf/glib/RunLoopGLib.cpp:171 
b6c80991 WTF::RunLoop::TimerBase::TimerBase(WTF::RunLoop&)::{lambda(void*)#1}::_FUN(void*) en /home/enrique/buildroot/output5/build/wpewebkit-custom/build-Release/../Source/WTF/wtf/glib/RunLoopGLib.cpp:164 
b6c80991 WTF::RunLoop::TimerBase::TimerBase(WTF::RunLoop&)::{lambda(void*)#1}::_FUN(void*) en /home/enrique/buildroot/output5/build/wpewebkit-custom/build-Release/../Source/WTF/wtf/glib/RunLoopGLib.cpp:164 
b2ad4223 g_main_context_dispatch en :? 
b6c80601 WTF::{lambda(_GSource*, int (*)(void*), void*)#1}::_FUN(_GSource*, int (*)(void*), void*) en /home/enrique/buildroot/output5/build/wpewebkit-custom/build-Release/../Source/WTF/wtf/glib/RunLoopGLib.cpp:40 
b6c80991 WTF::RunLoop::TimerBase::TimerBase(WTF::RunLoop&)::{lambda(void*)#1}::_FUN(void*) en /home/enrique/buildroot/output5/build/wpewebkit-custom/build-Release/../Source/WTF/wtf/glib/RunLoopGLib.cpp:164 
b6c80991 WTF::RunLoop::TimerBase::TimerBase(WTF::RunLoop&)::{lambda(void*)#1}::_FUN(void*) en /home/enrique/buildroot/output5/build/wpewebkit-custom/build-Release/../Source/WTF/wtf/glib/RunLoopGLib.cpp:164 
b2adfc49 g_poll en :? 
b2ad44b7 g_main_context_iterate.isra.29 en :? 
b2ad477d g_main_loop_run en :? 
b6c80de3 WTF::RunLoop::run() en /home/enrique/buildroot/output5/build/wpewebkit-custom/build-Release/../Source/WTF/wtf/glib/RunLoopGLib.cpp:97 
b6c654ed WTF::RunLoop::dispatch(WTF::Function<void ()>&&) en /home/enrique/buildroot/output5/build/wpewebkit-custom/build-Release/../Source/WTF/wtf/RunLoop.cpp:128 
b5937445 int WebKit::ChildProcessMain<WebKit::WebProcess, WebKit::WebProcessMain>(int, char**) en /home/enrique/buildroot/output5/build/wpewebkit-custom/build-Release/../Source/WebKit/Shared/unix/ChildProcessMain.h:64 
b27b2978 __bss_start en :?

I hope you find this trick useful and the scripts handy in case you ever to resort to examining the raw stack to get a meaningful backtrace.

Happy debugging!

8 thoughts on “Figuring out corrupt stacktraces on ARM”

  1. Excellent! Found out this several times. It would be great if gdb could handle this, or have a tool/script to do it easily.

  2. Hello,

    I get below error,

    -bash: 16#”24dad”: syntax error: invalid arithmetic operator (error token is “”24dad””)

    when executing this command,

    cat /tmp/bt.txt | sed -e ‘s/^[^:]*://’ -e ‘s/[]*[>]//g’ | while read A B C D; do echo $A; echo $B; echo $C; echo $D; done | sed ‘s/^0x//’ | while read P; do printf ‘%08x\n’ “$((16#”$P”))”; done | sponge /tmp/bt.txt

    Can you please help to overcome this error?

    1. Replacing “$((16#”$P”))” with “$((16#$P))” worked for me in zsh. Maybe the same applies to you if it happens in an older bash version. bash 5.2.26 was happy with this code here.

  3. It works for me here. I’ve split the processing of bt.txt in multiple steps. Please check every step in your case (using my example bt.txt) and ensure that there’s no format change in the generated output caused by any weird LANG or LOCALE settings.
    This is too much unformatted text for a comment, so I’ve posted everything in this pastebin: https://pastebin.com/W9vgnvu8

  4. Great stuff!

    Another way to find the library address mappings, specially useful if the process is already dead (for example, when dealing with a coredump) is to type in Gdb the command “info proc mappings”

Leave a Reply

Your email address will not be published.

*

This site uses Akismet to reduce spam. Learn how your comment data is processed.