Cisco ASA series part seven: Checkheaps

This article is part of a series of blog posts. We recommend that you start at the beginning. Alternatively, scroll to the bottom of this article to navigate through the whole series.

As part of our ongoing series we would like to talk about Cisco's Checkheaps security and stability mechanism. More specifically, we’ll look at how it impacts exploitation, how to test it and how you can disable it during exploitation if necessary.

What is Checkheaps?

Cisco IOS and ASA devices make use of an interesting memory validation routine called Checkheaps. On Cisco ASA, this is a dedicated thread that is periodically run at a default interval of 60 seconds. Its purpose is to analyse all chunks present on the heap and ensure that they meet some basic sanity checks.

It seems that Checkheaps was designed to ensure Cisco devices automatically recover from heap corruption errors by checking that the router's memory is in a sane state and rebooting the system if not. It may have been introduced as a security mechanism, but it's unlikely this was the primary reasoning.

This regular battery of checks means that Checkheaps also (intentionally or not) works as a fairly interesting hurdle when exploiting heap corruption vulnerabilities. At the very least, it forces an exploit payload to either disable Checkheaps from running or be vigilant about cleaning up any unwanted corruption that may violate the sanity checks.

Older Cisco IOS devices also include Checkheaps, but our research is focused on Cisco ASA devices. These ASA devices use more widely known public heap allocators (dlmalloc-2.8.x and ptmalloc2), so the Checkheaps implementation differs from that of traditional IOS, which has its own proprietary memory allocator.

This is by no means the first time Checkheaps has been looked at, though most previous discussions are specific to IOS. Felix Lindner (FX) documented Checkheaps in 2002 in Phrack #60 [1] and at BlackHat USA 2009 [2], and Michael Lynn [3] documented an interesting way to bypass it at BlackHat USA 2005. Gyan Chawdhary [4] released a fairly extensive paper discussing Checkheaps and Lynn's attack on it, and mentioned it briefly in a follow-up presentation at BlackHat 2008 [5]. FX also released a tool called Cisco Incident Response (CIR) [6] that includes its own implementation of Checkheaps for analysing IOS memory dumps.

ASA checkheaps command

The Cisco Command Line Interface (CLI) provides a number of commands that let you probe and modify the state of Checkheaps. The main command is show checkheaps:

ciscoasa(config)# show checkheaps
 Checkheaps stats from buffer validation runs
 --------------------------------------------
 Time elapsed since last run	: 55 secs
 Duration of last run		: 10 millisecs
 Number of buffers created	: 21036
 Number of buffers allocated	: 20882
 Number of buffers free		: 154
 Total memory in use		: 113279512 bytes
 Total memory in free buffers	: 116544 bytes
 Total number of runs		: 52

This tells us several useful things: how long Checkheaps takes to parse the entire heap, whether it is still running (the number of runs should keep incrementing), what the run interval is (approximately the largest value you ever see in the time elapsed since the last run), and a very coarse-grained count of allocated and free buffers.
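If you poll this command repeatedly while testing, the counters are easy to compare programmatically. The following is a minimal Python sketch rather than any Cisco tool: the helper names parse_checkheaps and still_running are our own, and it assumes the output has already been captured as text in the format shown above.

```python
import re

# Output copied from the example above (captured as plain text).
SAMPLE = """\
Checkheaps stats from buffer validation runs
--------------------------------------------
Time elapsed since last run : 55 secs
Duration of last run : 10 millisecs
Number of buffers created : 21036
Number of buffers allocated : 20882
Number of buffers free : 154
Total memory in use : 113279512 bytes
Total memory in free buffers : 116544 bytes
Total number of runs : 52
"""


def parse_checkheaps(text):
    """Return {label: integer value} for every 'label : N unit' line."""
    stats = {}
    for line in text.splitlines():
        match = re.match(r"\s*(.+?)\s*:\s*(\d+)", line)
        if match:
            stats[match.group(1)] = int(match.group(2))
    return stats


def still_running(earlier, later):
    """Checkheaps is still being scheduled if the run counter went up."""
    return later["Total number of runs"] > earlier["Total number of runs"]


stats = parse_checkheaps(SAMPLE)
print(stats["Total number of runs"])
# In this sample the created count equals allocated plus free.
print(stats["Number of buffers created"]
      == stats["Number of buffers allocated"] + stats["Number of buffers free"])
```

Comparing two snapshots with still_running() is the same check we do by eye later in this article when confirming that a bypass has stopped Checkheaps.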

The checkheaps command can be used to set the time out interval after which the heap will be validated. Using checkheaps ? will show the following two options:

ciscoasa(config)# checkheaps ?
 configure mode commands/options:
 check-interval			Buffer verification interval
 validate-checksum		Code space checksum validation

The validate-checksum command appears to be relatively useless (at least in the versions we analysed) and involves validating that certain static magic values (0xdeadcode or 0xfeedface) at static locations in memory haven't been modified. Otherwise, we are interested in the check-interval command, which lets you specify how often Checkheaps should run. This can be useful for testing if you want to validate if a Checkheaps bypass works; you can set it to run every second for instance. You can use the following command to observe that Checkheaps has its own process (as in lina process, not Linux process) running on the device:

ciscoasa(config)# show process memory |	include Checkheaps
 0          0           0           0				Checkheaps

Knowing that Checkheaps is its own process can be helpful later when trying to reverse engineer the related functionality.

It's worth noting that, outside of exploitation, you can effectively disable Checkheaps for testing purposes by increasing the check-interval to such a large value that it should (almost) never run again:

ciscoasa(config)# checkheaps check-interval 2147483
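
The value 2147483 happens to be the largest whole number of seconds that still fits in a signed 32-bit integer once converted to milliseconds. Whether the ASA actually stores the interval this way is purely our assumption and we have not verified it; the short Python check below only confirms the arithmetic. The function name is hypothetical.

```python
INT32_MAX = 2**31 - 1


def largest_interval_secs(limit=INT32_MAX, ms_per_sec=1000):
    """Largest number of seconds whose millisecond value stays within `limit`."""
    return limit // ms_per_sec


secs = largest_interval_secs()
print(secs)            # seconds
print(secs / 86400)    # the same interval expressed in days
```

Either way, an interval of this size is a little under 25 days, which is effectively "never" for the duration of a test.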

Checkheaps assertions

One interesting distinction to make is that the term 'checkheaps' is used in two places by Cisco that don't necessarily mean the same thing. The first is the Checkheaps process that we discussed above. The second is the use of the CHECKHEAPS term when throwing assertions related to heap validation routines that aren't necessarily explicitly triggered by the Checkheaps process running (though they could be). An example of such an assertion (edited for brevity) might look approximately like this:

============= CHECKHEAPS HAS DETECTED A MEMORY CORRUPTION ===============
 Reason: Heap memory corrupted
 ------------- NEXT MALLOC CHUNK --------------
 Malloc chunk ptr	0xa8db5780
 Prev Size			352
 Size			-1462019984
 Prev Chunk Ptr	0x54545454
 Next Chunk Ptr	0x0875ba64
 Malloc header ptr	0xa8db5788
 Magic Head	0x875ba64
 Buffer Len	0x54545454
 Refcount		0x54545454
 Owner		0x54545454
 Caller PC	0x54545454
 ------ Dump including 128 bytes around malloc buffer ------
 0xa8db5740: 00 00 00 00 00 00 00 00 10 32 e3 5e 55 55 55 55  |  .........2.^UUUU
 0xa8db5750: d0 d4 80 81 19 00 00 00 a0 45 8e ac 44 00 80 a8  |  .........E..D...
 0xa8db5760: 23 01 ee f3 ef cd ee f3 48 00 00 00 02 01 00 00  |  #.......H.......
 0xa8db5770: 23 01 1c a1 d0 00 00 00 00 00 00 00 ef cd ee f3  |  #...............
 0xa8db5780: 60 01 00 00 70 58 db a8 64 ba 75 08 54 54 54 54  |  `...pX..d.u.TTTT
 0xa8db5790: 54 54 54 54 54 54 54 54 54 54 54 54 54 54 54 54  |  TTTTTTTTTTTTTTTT
 0xa8db57a0: 54 54 54 54 54 54 54 54 54 54 54 54 54 54 54 54  |  TTTTTTTTTTTTTTTT
 0xa8db57b0: 54 54 54 54 54 54 54 54 54 54 54 54 54 54 54 54  |  TTTTTTTTTTTTTTTT
 0xa8db57c0: 54 54 54 54 54 54 54 54 54                       |  TTTTTTTTT
 ==========================================================================
 assertion "(next == m->top || cinuse(next))" failed: file "malloc.c", line 2848

The above assertion might actually be raised by checks that the heap allocator (in this example dlmalloc-2.8.x) performs itself when processing chunks (e.g. when freeing or coalescing two chunks). However, the assertion could also be raised when Checkheaps is scanning the heap. The assertions are shared because, as we will see, Checkheaps' implementation is based on dlmalloc-2.8.x debugging functionality enabled by the DEBUG compile-time constant.

Reversing Checkheaps

Identifying the main Checkheaps process can be done by searching for string references containing ‘checkheaps’. There is at least one error string mentioning checkheaps_process(). This is the primary Checkheaps function started at runtime by init_checkheaps().

After the timers initialise, the code enters a loop and, when it wakes up, calls the validate_buffers() function. Once completed, it resets the timers and then idles again. A portion of that code is as follows:

.text:09C18B48 mov eax, ds:dword_B25B0B0
.text:09C18B4D mov edx, ds:dword_B25B0B4
.text:09C18B53 mov [esp], ebx
.text:09C18B56 mov [ebp+var_20], eax
.text:09C18B59 mov [ebp+var_1C], edx
.text:09C18B5C call timer_running_and_awake
.text:09C18B61 test eax, eax
.text:09C18B63 jz loc_9C18AB5
.text:09C18B69 mov dword ptr [esp], offset checkheaps_timer
.text:09C18B70 call mgd_timer_first_expired
.text:09C18B75 movzx eax, word ptr [eax+18h]
.text:09C18B79 test ax, ax
.text:09C18B7C jnz b_try_tx_checksum
.text:09C18B82 mov dword ptr [esp], offset ch_validate_buffers_timer
.text:09C18B89 call mgd_timer_stop
.text:09C18B8E rdtsc
.text:09C18B90 mov [esi], eax
.text:09C18B92 mov [esi+4], edx
.text:09C18B95 mov dword ptr [esp], 1
.text:09C18B9C call validate_buffers

Our main point of interest is understanding what this validate_buffers function does, as it should carry out most of the checks we want to avoid. The function is quite complicated but upon initial analysis it quickly becomes apparent that the code is heavily borrowed from the dlmalloc source file malloc-2.8.3.c [7]. Note we’ll refer to this file in the future as dlmalloc-2.8.3 so as to remind you that it is specifically dlmalloc we are talking about.

The main giveaway that Cisco is using dlmalloc version 2.8.3 is the set of assert() strings, which match a number of debug validation routines present in most dlmalloc versions.

Some examples include:

.text:09BE6FA9 b_assert_pqinuse: ; CODE XREF: validate_buffers+EE
.text:09BE6FA9 mov dword ptr [esp+8], 0BDCh
.text:09BE6FB1 mov dword ptr [esp+4], offset aMalloc_c ; "malloc.c"
.text:09BE6FB9 mov dword ptr [esp], offset aPinuseQ ; "pinuse(q)"
.text:09BE6FC0 call __lina_assert
.text:09BE6FC5 ; ---------------------------------------------------------------------------
.text:09BE6FC5
.text:09BE6FC5 b_refcount_error: ; CODE XREF: validate_buffers+444
.text:09BE6FC5 mov dword ptr [esp+8], 17DDh
.text:09BE6FCD mov dword ptr [esp+4], offset aMalloc_c ; "malloc.c"
.text:09BE6FD5 mov dword ptr [esp], offset aMhMh_refcount ; "mh->mh_refcount"
.text:09BE6FDC call __lina_assert
.text:09BE6FE1 ; ---------------------------------------------------------------------------
.text:09BE6FE1
.text:09BE6FE1 b_assert_bad_footprint_val: ; CODE XREF: validate_buffers+387
.text:09BE6FE1 mov dword ptr [esp+8], 0C63h
.text:09BE6FE9 mov dword ptr [esp+4], offset aMalloc_c ; "malloc.c"
.text:09BE6FF1 mov dword ptr [esp], offset aMFootprintMMax ; "m->footprint <= m->max_footprint"
.text:09BE6FF8 call __lina_assert

Upon further reversing, you will find that Checkheaps on the ASA devices is a small wrapper around a modified version of the dlmalloc-2.8.3 [7] traverse_and_check() function, which you can confirm by comparing it with the open source implementation. It loops over each mempool on the system (which, if you have read the libmempool blog post, you will know is managed in part by a modified dlmalloc-2.8.3 mstate) and then uses the custom traverse_and_check() call to validate the underlying memory segments tracked by each mstate. We've included an approximate C version of what validate_buffers looks like in Appendix A. The assumption is that the CHECKHEAPS constant would be defined when compiling code to run on Cisco ASA devices.

At a high level, what happens is that validation routines are called on each encountered mempool. For each mempool, the associated mstate is obtained. For every mstate, each associated memory segment (of which there can be many in a list) is parsed. For each memory segment, the associated memory mapping is linearly walked from the beginning one chunk at a time. Each encountered chunk has validation done on it. The specific validation carried out is dependent on if the chunk is in use or free and what type of chunk it is (small, tree). The checks done on each chunk are quite extensive.
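
The innermost step, linearly walking one memory segment and checking each chunk, can be sketched in Python as follows. This is a heavily simplified model under our own assumptions, not Cisco's code: it uses a 32-bit layout of prev_foot, size and mh_magic as seen in the gdb output later in this article, it performs only a size check and a head magic check, and the names make_chunk, walk_segment, HeapCorruption and MIN_CHUNK are hypothetical.

```python
import struct

PINUSE, CINUSE = 0x1, 0x2     # dlmalloc 2.8.x flag bits stored in the size field
MAGIC_INUSE = 0xA11C0123      # mempool header magic on in-use chunks
MAGIC_FREE = 0xF3EE0123       # mempool header magic on free chunks
HEADER_LEN = 12               # prev_foot + size + mh_magic in this model
MIN_CHUNK = 16                # assumed minimum chunk size for this model


class HeapCorruption(Exception):
    """Stands in for the assert() that would crash the real device."""


def make_chunk(size, in_use=True):
    """Build one chunk: prev_foot, size|flags, mh_magic, then zero padding."""
    head = size | PINUSE | (CINUSE if in_use else 0)
    magic = MAGIC_INUSE if in_use else MAGIC_FREE
    header = struct.pack("<III", 0, head, magic)
    return header + b"\x00" * (size - len(header))


def walk_segment(segment):
    """Walk a segment chunk by chunk and return the offsets visited."""
    visited = []
    offset = 0
    while offset < len(segment):
        if offset + HEADER_LEN > len(segment):
            raise HeapCorruption(f"truncated header at offset {offset:#x}")
        _prev_foot, head, magic = struct.unpack_from("<III", segment, offset)
        size = head & ~(PINUSE | CINUSE)
        if size < MIN_CHUNK or offset + size > len(segment):
            raise HeapCorruption(f"bad size at offset {offset:#x}")
        expected = MAGIC_INUSE if head & CINUSE else MAGIC_FREE
        if magic != expected:
            raise HeapCorruption(f"bad mh_magic at offset {offset:#x}")
        visited.append(offset)
        offset += size            # the next chunk starts where this one ends
    return visited


segment = bytearray(b"".join(make_chunk(s) for s in (0x70, 0x98, 0x98, 0x38)))
print([hex(o) for o in walk_segment(segment)])

# Clobber mh_magic of the second chunk: the next walk detects it.
struct.pack_into("<I", segment, 0x70 + 8, 0xDEADBEEF)
try:
    walk_segment(segment)
except HeapCorruption as exc:
    print("detected:", exc)
```

The key property to keep in mind is that the walk trusts each chunk's size field to find the next chunk, which is exactly what makes the chunk-skipping trick described below possible.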

The most notable change from the original dlmalloc traverse_and_check() function is the removal of the bin_find() calls. This is presumably for performance reasons, as bin_find() would walk a bin list for each in-use and free chunk, of which there are many. The primary addition to the function is a check for special mempool header magic values: 0xa11c0123 and 0xa11ccdef on in-use chunks, and 0xf3ee0123 and 0xf3eecdef on free chunks.

Having reversed this logic, and knowing the open source component on which it is based, we know all of the checks Checkheaps can make to detect corruption, which gives us some useful ideas for working around it.

Bypassing Checkheaps with chunk skipping

It's worth noting that once you've fully understood all of the checks that Checkheaps will carry out, it can be possible (depending on your bug constraints) to simply corrupt the heap in such a way that Checkheaps can't even detect that something is wrong. This can be done by ensuring it encounters a chunk header with a corrupted (but otherwise valid) size that skips it past other corrupted chunks that wouldn't be considered 'sane'.

Alternatively, if you can ensure that all chunks you corrupt remain properly aligned with adjacent chunks and retain correct headers, Checkheaps will happily validate them.

Even though the dlmalloc 2.8.3 base code does not support safe unlinking when a free chunk is unlinked from a doubly linked bin list, Checkheaps does validate the linkage of free chunks when it runs its own linear checks. This is due to the DEBUG constant being set. Consequently, if you rely on corrupting the forward and backward pointers of a free chunk that you intend to have coalesced, Checkheaps can potentially detect it before coalescing even happens!

On the flip side, Checkheaps does not validate linkage on the mempool header linked list. This therefore means if you rely solely on corrupting mempool links for your mirror-write primitive, Checkheaps will not catch you.

To better demonstrate how to trick Checkheaps into skipping over chunks, let us present some diagrams. You will see that when the size of a chunk is corrupted so that it spans a number of adjacent corrupted chunks, Checkheaps will not trigger any assertions and will jump straight to the next sane chunk when it linearly walks the memory segment.

The first diagram below shows how corrupting a chunk (C) to the extent that the linkage could be abused to achieve a mirror overwrite would be detected by Checkheaps as it checks each chunk.

Checkheaps detecting corruption

Alternately, the next diagram shows how corrupting chunk (A)'s length so that it points to (D) directly will confuse Checkheaps into only checking (A) and (D) and not (B) or (C). This allows the corrupted (C) linkage values to be used during exploitation.

Checkheaps missing corruption
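
The two diagrams can also be expressed as a toy Python model. Everything here is illustrative: the addresses and sizes are made up, the "sane" field stands in for whichever header or linkage check would assert on a real device, and the function names are our own.

```python
def walk(chunks, start):
    """Visit chunks by adding each chunk's size to its address.

    `chunks` maps address -> dict(name, size, sane). A chunk that is not
    'sane' stands in for any check that would raise an assertion.
    """
    visited = []
    addr = start
    while addr in chunks:
        chunk = chunks[addr]
        if not chunk["sane"]:
            raise AssertionError(f"corruption detected in chunk {chunk['name']}")
        visited.append(chunk["name"])
        addr += chunk["size"]
    return visited


def layout():
    """Four adjacent chunks; (C) carries corrupted linkage."""
    return {
        0x000: {"name": "A", "size": 0x20, "sane": True},
        0x020: {"name": "B", "size": 0x30, "sane": True},
        0x050: {"name": "C", "size": 0x30, "sane": False},
        0x080: {"name": "D", "size": 0x20, "sane": True},
    }


# First diagram: the walk reaches (C) and asserts.
heap = layout()
try:
    walk(heap, 0x000)
except AssertionError as exc:
    print(exc)

# Second diagram: (A) is resized so that it ends exactly where (D) begins.
heap = layout()
heap[0x000]["size"] = 0x20 + 0x30 + 0x30
print(walk(heap, 0x000))
```

In the second case only (A) and (D) are ever inspected, so the corrupted (C) is never seen by the walk.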

Theoretical race condition risks

If you attempt to bypass Checkheaps by working around it as described above, there are still technically some race conditions that are worth noting. If Checkheaps happens to already be running and hasn't yet gotten to the chunk you're in the act of corrupting, or is analysing the same chunk you're actively corrupting, there is a risk that while your chunk is partially corrupted, Checkheaps will analyse it and detect an error. This would be more likely to occur on multi-processor devices.

Due to the behavior of the internal Cisco scheduler traditionally being run-to-completion (at least on Cisco IOS, but we haven't reversed the scheduler on the ASA), it might not be possible for this race to even occur on uniprocessor systems.

To illustrate this as an example, let's consider the previous two diagrams with slight alterations to show times that things occur.

Checkheaps race condition

At time T1 above, we see that Checkheaps is checking chunks (A) and (B), which are untouched at this time; the checks pass and it keeps moving on to adjacent chunks. At time T2, a separate thread triggers some memory corruption which also attempts to trick Checkheaps by corrupting chunk (A) such that Checkheaps would skip (B) and (C). The thread at time T2 is also corrupting (C), which it doesn't want Checkheaps to see. However, at time T3 we see that Checkheaps is now validating the corrupted (C) chunk, which will fail. This type of race seems much harder to work around, but the time window is extremely small, so it should be extremely rare in practice.

Disabling Checkheaps via write primitive

Overwriting the ch_is_validating global

Inspired by Michael Lynn's crashing_already_ trick to disable Checkheaps, we wanted to see if we could come up with a similar way to bypass Checkheaps by attacking it directly. The crashing_already_ approach is not viable on Cisco ASA, as the symbol doesn't exist and the assert() calls thrown by Checkheaps crash the device more directly.

However, if you look at the code in Appendix A, you might have noticed a variable called ch_is_validating. Whenever the Checkheaps process wakes up, it will call a function called validate_buffers which does the actual heap walking. The first thing validate_buffers does is try to test and set a lock called ch_is_validating, shown in the disassembly below.

.text:09BE69E0 push ebp
.text:09BE69E1 mov ebp, esp
.text:09BE69E3 push edi
.text:09BE69E4 push esi
.text:09BE69E5 push ebx
.text:09BE69E6 sub esp, 5Ch
.text:09BE69E9 mov ebx, ds:ch_is_validating
.text:09BE69EF test ebx, ebx
.text:09BE69F1 jz short loc_9BE6A00
.text:09BE69F3
.text:09BE69F3 b_exit_validate_buffers: ; CODE XREF: validate_buffers+599
.text:09BE69F3 add esp, 5Ch
.text:09BE69F6 pop ebx
.text:09BE69F7 pop esi
.text:09BE69F8 pop edi
.text:09BE69F9 pop ebp
.text:09BE69FA retn
.text:09BE69FA ; ---------------------------------------------------------------------------
.text:09BE69FB align 10h
.text:09BE6A00
.text:09BE6A00 loc_9BE6A00: ; CODE XREF: validate_buffers+11
.text:09BE6A00 mov ds:ch_is_validating, 1
.text:09BE6A0A xor eax, eax

We assume this lock is in here to prevent re-entrancy in the event that the Checkheaps timer fires while another instance of Checkheaps hasn't yet finished validating the heap. In a device under significantly high memory and CPU load (and where the Checkheaps interval is relatively small) there may be a case where one run of Checkheaps hasn't completed before the next is scheduled.

An obvious bypass approach is to simply set the re-entrancy lock to a non-zero value using a write primitive to trick Checkheaps into thinking it's already running permanently.

There is one unfortunate race condition with this approach, which is that if Checkheaps is in fact already running at the time and we use a write primitive to set the lock, Checkheaps will unset the lock when it finishes running. This will prevent the bypass from being permanent. To address this you can disable Checkheaps via write primitive initially and then disable it again from a payload to be sure it's off while you clean up.
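
Both the bypass and its race can be illustrated with a small Python model of the lock. This is a sketch under our own assumptions: only the test-and-set of ch_is_validating is taken from the disassembly above, while the class, the runs counter, the mid_run hook and the point at which the lock is cleared are simplifications we introduced for illustration.

```python
class CheckheapsModel:
    """Toy model of validate_buffers() and its re-entrancy lock."""

    def __init__(self):
        self.ch_is_validating = 0   # the re-entrancy lock
        self.runs = 0               # hypothetical stand-in for the run counter

    def validate_buffers(self, heap_is_sane=True, mid_run=None):
        if self.ch_is_validating:
            return False            # lock appears held: return without walking
        self.ch_is_validating = 1
        if mid_run is not None:
            mid_run(self)           # something else runs while the heap is walked
        if not heap_is_sane:
            raise AssertionError("heap memory corrupted")
        self.runs += 1
        self.ch_is_validating = 0   # lock released when the run finishes
        return True


def write_primitive(model):
    """The attacker's write: force the lock to a non-zero value."""
    model.ch_is_validating = 1


# Case 1: the write lands while Checkheaps is idle. Later runs return
# immediately, even though the heap is corrupted.
idle = CheckheapsModel()
write_primitive(idle)
print(idle.validate_buffers(heap_is_sane=False), idle.runs)

# Case 2: the write lands in the middle of a run. The run clears the lock
# when it finishes, so the bypass does not stick.
racing = CheckheapsModel()
racing.validate_buffers(mid_run=write_primitive)
print(racing.ch_is_validating)
```

The second case is why we suggest setting the lock once via the write primitive and then again from the payload.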

It's worth noting that Lynn's approach was more powerful as it would prevent a crash even if Checkheaps did detect the error, whereas in our case if Checkheaps is already running and we can't trick it into skipping invalid chunks, then it would still crash.

However, in the typical case, using a write primitive to disable Checkheaps in this way is quite useful, especially since it can simply be turned back on once the heap is cleaned up.

Given we are able to automate finding symbols across lina binaries using previously described methods, identifying the ch_is_validating locations on non-ASLR lina binaries is trivial, so knowing where to write on most firmware versions is easy.

Other approaches

There is a global timeout value called ch_check_interval_timeout used to store the interval for validating buffers. Given an arbitrary write primitive you can simply set this to a large value so that Checkheaps won't run for a long time, giving you the opportunity to fix up the heap.

You can also confuse validate_buffers by modifying the mempool_list to refer to itself rather than the real list of mempools. This will prevent any mempool from being validated.

There are many other ways this could likely be achieved. The point simply being that, given an arbitrary write primitive, Checkheaps should no longer be considered an effective mitigation.

Testing Checkheaps behavior and bypasses

In this section we will test the behavior of corruption on Checkheaps, as well as testing both theories for bypassing Checkheaps. First we look at 32-bit, where Checkheaps is most effective. Then we look at 64-bit, where we make an interesting observation about why Checkheaps is almost useless in practice.

Checkheaps on a 32-bit Cisco ASA device

Basic assert() testing

First, let's look at the regular behaviour on an ASA running firmware 924. We change Checkheaps to execute every second.

ciscoasa-924(config)# checkheaps check-interval 1
ciscoasa-924(config)# show checkheaps
Checkheaps stats from buffer validation runs
--------------------------------------------
Time elapsed since last run : 0 secs
Duration of last run : 12 millisecs
Number of buffers created : 22102
Number of buffers allocated : 21942
Number of buffers free : 160
Total memory in use : 128630552 bytes
Total memory in free buffers : 4305640 bytes
Total number of runs : 18

Now we find some chunk on the heap (any will do) by looking at the mstate and selecting one.

(gdb) dlchunk -v 0xac96be40
struct malloc_chunk @ 0xac96be40 {
prev_foot = 0x8140d4d0
size = 0x58 (CINUSE|PINUSE)
struct mp_header @ 0xac96be48 {
mh_magic = 0xa11c0123
mh_len = 0x28
mh_refcount = 0x0
mh_unused = 0x0
mh_fd_link = 0xa89b9768 (OK)
mh_bk_link = 0xa8400344 (-)
alloc_pc = 0x9167916 (-)
free_pc = 0x9b44a9e (-)

Above, we analyse the chunk at 0xac96be40. As we can see, it is a small in-use chunk. Now let's modify its mh_magic value and see what happens.

(gdb) set *(unsigned int*)0xac96be48=0xdeadbeef
(gdb) c
Continuing.

Thread 6 received signal SIGABRT, Aborted.
[Switching to Thread 524]
0xffffe430 in __kernel_vsyscall ()
(gdb) bbt
#0 0xffffe430 in __kernel_vsyscall ()
#1 0xdc6f5bcb in write () from /home/aa/cisco/firmware/asa924/_asa924-k8.bin.extracted/rootfs/lib/libpthread.so.0
#2 0x090e7fa1 in lina_send_signal ()
#3 0x090e84be in int3 ()
#4 0x09c04cf8 in __lina_assert ()
#5 0x09be6fa9 in validate_buffers ()
#6 0x09c18ba1 in checkheaps_process ()
#7 0x0806ab8c in sub_806AB30 ()
#8 0x00000000 in ()

Thread 6 received signal SIGABRT, Aborted.
[Switching to Thread 524]
0xffffe430 in __kernel_vsyscall ()

Using the bbt command from the ret-sync plugin we can get a backtrace showing the symbols for the various functions from our idb. We can see that our modification resulted in Checkheaps raising an assertion inside validate_buffers(), as we'd expect. You can refer to Appendix A for the plethora of assert() calls that could result in similar scenarios.

Now that we've seen we can trigger Checkheaps, let's see if we can test our bypasses.

Chunk-skipping bypass test

If you have a lot of control over what you can write to memory while corrupting the heap, and you have confidence in the layout of the chunks you will be corrupting, the easiest approach to bypass Checkheaps could involve chunk skipping, as described earlier. By this we simply mean having a chunk layout such as [A][B][C][D] where you end up corrupting [B] and [C] in ways that Checkheaps would normally detect, but you corrupt the size of [A] in such a way that it points to [D]. This also relies on [D] either already holding, or being corrupted to hold, flags that properly reflect the state of [A].

To demonstrate this we will select a number of pre-existing chunks that are all in use. We do this so that it's as easy as possible for someone else to replicate. We select chunks at the beginning of the memory segment as we assume these will be long-lived and less prone to change during test. Let's pick our candidates:

(gdb) dlchunk -c 5 0xa8400000
0xa8400000 M sz:0x00ae8 fl:CP alloc_pc:0xad024ed0,-
0xa8400ae8 M sz:0x00070 fl:CP alloc_pc:0x0916548a,- [A]
0xa8400b58 M sz:0x00098 fl:CP alloc_pc:0x090fa11e,- [B]
0xa8400bf0 M sz:0x00098 fl:CP alloc_pc:0x090fa12a,- [C]
0xa8400c88 M sz:0x00038 fl:CP alloc_pc:0xdc7db737,- [D]

The first chunk is the mstate structure itself, which won't have a mempool header, so we'll select 0xa8400ae8 to be our [A], and we will increment the names in alphabetical order.

All of these chunks are in use and each one’s previous chunk is also in use so they all have the CINUSE and PINUSE flags set. This means the prev_foot member won't hold the previous chunk size, which simplifies our requirements. Our plan is to modify the size of [A] to skip over [B] and [C].

(gdb) dlchunk -v 0xa8400ae8 
struct malloc_chunk @ 0xa8400ae8 {
prev_foot = 0x0
size = 0x70 (CINUSE|PINUSE)
struct mp_header @ 0xa8400af0 {
mh_magic = 0xa11c0123
mh_len = 0x44
mh_refcount = 0x0
mh_unused = 0x0
mh_fd_link = 0x0 (unmapped)
mh_bk_link = 0xa87a39f0 (OK)
alloc_pc = 0x916548a (-)
free_pc = 0x0 (-)

We know the two adjacent chunks are both of size 0x98, so we want to modify the size field above to 0x70 + (0x98*2) | (CINUSE|PINUSE).
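
As a quick sanity check of that arithmetic, the following Python snippet computes the new size field and the address the walk will land on. The flag values are the standard dlmalloc 2.8.x bits (PINUSE is 0x1, CINUSE is 0x2); the helper name merged_head is our own.

```python
PINUSE, CINUSE = 0x1, 0x2      # dlmalloc 2.8.x flag bits in the size field
ADDR_A = 0xA8400AE8            # address of [A] from the dlchunk output
SIZE_A, SIZE_B, SIZE_C = 0x70, 0x98, 0x98


def merged_head(*sizes):
    """Size field for one in-use chunk spanning all of the given chunks."""
    return sum(sizes) | CINUSE | PINUSE


head = merged_head(SIZE_A, SIZE_B, SIZE_C)
next_chunk = ADDR_A + (head & ~(CINUSE | PINUSE))
print(hex(head))         # 0x1a3
print(hex(next_chunk))   # 0xa8400c88, the address of [D]
```

The value 0x1a3 is the size 0x1a0 with both flag bits set, and the resulting next-chunk address matches [D] in the listing above.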

(gdb) set *(0xa8400ae8+4)=(0x70 + (0x98*2) + 3)
(gdb) dlchunk -v 0xa8400ae8
struct malloc_chunk @ 0xa8400ae8 {
prev_foot = 0x0
size = 0x1a0 (CINUSE|PINUSE)
struct mp_header @ 0xa8400af0 {
mh_magic = 0xa11c0123
mh_len = 0x44
mh_refcount = 0x0
mh_unused = 0x0
mh_fd_link = 0x0 (unmapped)
mh_bk_link = 0xa87a39f0 (OK)
alloc_pc = 0x916548a (-)
free_pc = 0x0 (-)
(gdb) dlchunk -c 5 0xa8400000
0xa8400000 M sz:0x00ae8 fl:CP alloc_pc:0xad024ed0,-
0xa8400ae8 M sz:0x001a0 fl:CP alloc_pc:0x0916548a,- [A]
0xa8400c88 M sz:0x00038 fl:CP alloc_pc:0xdc7db737,- [D]
0xa8400cc0 M sz:0x00030 fl:CP alloc_pc:0xdc7db737,-
0xa8400cf0 M sz:0x00038 fl:CP alloc_pc:0xdc7db2ca,-

As we see above, the [A] chunk now encapsulates both the [B] and [C] chunk and it tricks our dlchunk tool into seeing only [A] and [D] (much in the way that Checkheaps will be tricked). However, we can still analyse [B] by accessing its address directly:

(gdb) dlchunk -c 5 0xa8400b58
0xa8400b58 M sz:0x00098 fl:CP alloc_pc:0x090fa11e,- [B]
0xa8400bf0 M sz:0x00098 fl:CP alloc_pc:0x090fa12a,- [C]
0xa8400c88 M sz:0x00038 fl:CP alloc_pc:0xdc7db737,- [D]
0xa8400cc0 M sz:0x00030 fl:CP alloc_pc:0xdc7db737,-
0xa8400cf0 M sz:0x00038 fl:CP alloc_pc:0xdc7db2ca,-

The important thing to realise here is that [B] and [C] are now invisible from the perspective of Checkheaps. This means their header contents can be invalid and yet they will never be tested despite Checkheaps running regularly. Let's test this theory:

First we continue and set Checkheaps to run at a one second interval and confirm it's indeed running regularly:

ciscoasa(config)# checkheaps check-interval 1
ciscoasa(config)# show checkheaps
Checkheaps stats from buffer validation runs
--------------------------------------------
Time elapsed since last run : 0 secs
Duration of last run : 26 millisecs
Number of buffers created : 21163
Number of buffers allocated : 21027
Number of buffers free : 136
Total memory in use : 113439264 bytes
Total memory in free buffers : 35520 bytes
Total number of runs : 3
ciscoasa(config)# show checkheaps
Checkheaps stats from buffer validation runs
--------------------------------------------
Time elapsed since last run : 0 secs
Duration of last run : 11 millisecs
Number of buffers created : 21162
Number of buffers allocated : 21027
Number of buffers free : 135
Total memory in use : 113439264 bytes
Total memory in free buffers : 35520 bytes
Total number of runs : 8

Now, we trap back into gdb and corrupt [B] and [C] to hold invalid mh_magic values as we did in our original assert() test. Before corruption we have:

(gdb) dlchunk -v 0xa8400b58
struct malloc_chunk @ 0xa8400b58 {
prev_foot = 0x8140d4d0
size = 0x98 (CINUSE|PINUSE)
struct mp_header @ 0xa8400b60 {
mh_magic = 0xa11c0123
mh_len = 0x6c
mh_refcount = 0x0
mh_unused = 0x0
mh_fd_link = 0x0 (unmapped)
mh_bk_link = 0xa8400bf8 (OK)
alloc_pc = 0x90fa11e (-)
free_pc = 0x0 (-)
(gdb) dlchunk -v 0xa8400bf0
struct malloc_chunk @ 0xa8400bf0 {
prev_foot = 0x8140d4d0
size = 0x98 (CINUSE|PINUSE)
struct mp_header @ 0xa8400bf8 {
mh_magic = 0xa11c0123
mh_len = 0x6c
mh_refcount = 0x0
mh_unused = 0x0
mh_fd_link = 0xa8400b60 (-)
mh_bk_link = 0xa877b848 (OK)
alloc_pc = 0x90fa12a (-)
free_pc = 0x0 (-)

Now we corrupt both chunks and check the result:

(gdb) set *(unsigned int *)0xa8400b60=0xdeadbeef
(gdb) set *(unsigned int *)0xa8400bf8=0xdeadbeef
(gdb) dlchunk -v 0xa8400b58
struct malloc_chunk @ 0xa8400b58 {
prev_foot = 0x8140d4d0
size = 0x98 (CINUSE|PINUSE)
struct mp_header @ 0xa8400b60 {
mh_magic = 0xdeadbeef
mh_len = 0x6c
mh_refcount = 0x0
mh_unused = 0x0
mh_fd_link = 0x0 (unmapped)
mh_bk_link = 0xa8400bf8 (OK)
alloc_pc = 0x90fa11e (-)
free_pc = 0x0 (-)
(gdb) dlchunk -v 0xa8400bf0
struct malloc_chunk @ 0xa8400bf0 {
prev_foot = 0x8140d4d0
size = 0x98 (CINUSE|PINUSE)
struct mp_header @ 0xa8400bf8 {
mh_magic = 0xdeadbeef
mh_len = 0x6c
mh_refcount = 0x0
mh_unused = 0x0
mh_fd_link = 0xa8400b60 (-)
mh_bk_link = 0xa877b848 (OK)
alloc_pc = 0x90fa12a (-)
free_pc = 0x0 (-)

Now we can continue execution to confirm it won't be detected. Finally, to show for certain that this is what's happening, let's fix the size of [A] and leave the mh_magic fields of [B] and [C] as the incorrect values.

(gdb) set *(0xa8400ae8+4)=0x73
(gdb) dlchunk -c 3 -v 0xa8400ae8
struct malloc_chunk @ 0xa8400ae8 {
prev_foot = 0x0
size = 0x70 (CINUSE|PINUSE)
struct mp_header @ 0xa8400af0 {
mh_magic = 0xa11c0123
mh_len = 0x44
mh_refcount = 0x0
mh_unused = 0x0
mh_fd_link = 0x0 (unmapped)
mh_bk_link = 0xa87a39f0 (OK)
alloc_pc = 0x916548a (-)
free_pc = 0x0 (-)
--
struct malloc_chunk @ 0xa8400b58 {
prev_foot = 0x8140d4d0
size = 0x98 (CINUSE|PINUSE)
struct mp_header @ 0xa8400b60 {
mh_magic = 0xdeadbeef
mh_len = 0x6c
mh_refcount = 0x0
mh_unused = 0x0
mh_fd_link = 0x0 (unmapped)
mh_bk_link = 0xa8400bf8 (OK)
alloc_pc = 0x90fa11e (-)
free_pc = 0x0 (-)
--
struct malloc_chunk @ 0xa8400bf0 {
prev_foot = 0x8140d4d0
size = 0x98 (CINUSE|PINUSE)
struct mp_header @ 0xa8400bf8 {
mh_magic = 0xdeadbeef
mh_len = 0x6c
mh_refcount = 0x0
mh_unused = 0x0
mh_fd_link = 0xa8400b60 (-)
mh_bk_link = 0xa877b848 (OK)
alloc_pc = 0x90fa12a (-)
free_pc = 0x0 (-)
(gdb) c
Continuing.

Thread 6 received signal SIGABRT, Aborted.
[Switching to Thread 522]
0xffffe430 in __kernel_vsyscall ()
(gdb) bbt
#0 0xffffe430 in __kernel_vsyscall ()
#1 0xdc6f6611 in pause () from /home/aa/cisco/firmware/asa924/_asa924-k8.bin.extracted/rootfs/lib/libpthread.so.0
#2 0x090e84c5 in int3 ()
#3 0x09c04cf8 in __lina_assert ()
#4 0x09be6fa9 in validate_buffers ()
#5 0x09c18ba1 in checkheaps_process ()
#6 0x0806ab8c in sub_806AB30 ()
#7 0x00000000 in ()

This is perhaps an obvious trick, but the point is to show that careful consideration of how you lay out and corrupt chunks during exploitation can prevent Checkheaps from detecting an attack entirely. The main thing to note is that if you intend to rely on corrupting free chunks or providing fake free chunks for the purpose of coalescing, you should consider hiding them inside a chunk designed to skip over them. Otherwise there is the risk Checkheaps will detect them mid-exploitation.

ch_is_validating bypass test

Similar to the last test, we'll do this through simulation in gdb because it's easier to demonstrate. As before, we set the Checkheaps interval to one second so that it is obvious when it stops running. After waiting a few minutes, we run the command again to show that it's running regularly.

ciscoasa(config)# show checkheaps
Checkheaps stats from buffer validation runs
--------------------------------------------
Time elapsed since last run : 0 secs
Duration of last run : 11 millisecs
Number of buffers created : 21170
Number of buffers allocated : 21033
Number of buffers free : 137
Total memory in use : 113455840 bytes
Total memory in free buffers : 18368 bytes
Total number of runs : 208

Now from a gdb shell we write to ch_is_validating, which is at 0x0B2545E0 on this particular firmware image.

(gdb) x/x 0x0B2545E0
0xb2545e0: 0x00000000
(gdb) set *0x0B2545E0=1
(gdb) x/x 0x0B2545E0
0xb2545e0: 0x00000001

Following this, we can see how many times Checkheaps had run immediately prior (229):

ciscoasa(config)# show checkheaps
Checkheaps stats from buffer validation runs
--------------------------------------------
Time elapsed since last run : 0 secs
Duration of last run : 0 millisecs
Number of buffers created : 21172
Number of buffers allocated : 21033
Number of buffers free : 139
Total memory in use : 113455840 bytes
Total memory in free buffers : 18368 bytes
Total number of runs : 229

If we wait a few more seconds, the run count should increase, since Checkheaps is configured to run at a one-second interval. But we see that it does not:

ciscoasa(config)# show checkheaps
Checkheaps stats from buffer validation runs
--------------------------------------------
Time elapsed since last run : 0 secs
Duration of last run : 0 millisecs
Number of buffers created : 21172
Number of buffers allocated : 21033
Number of buffers free : 139
Total memory in use : 113455840 bytes
Total memory in free buffers : 18368 bytes
Total number of runs : 229
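Comparing the counters by eye works, but the same check is easy to script. The sketch below assumes the show checkheaps output has already been captured as text by some means (how it is collected is left out); total_runs() and checkheaps_stalled() are hypothetical helper names.

```python
import re

def total_runs(show_checkheaps_output):
    """Extract the 'Total number of runs' counter from show checkheaps text."""
    match = re.search(r"Total number of runs\s*:\s*(\d+)", show_checkheaps_output)
    if match is None:
        raise ValueError("no 'Total number of runs' line found")
    return int(match.group(1))

def checkheaps_stalled(earlier, later):
    """True when the run counter did not advance between two samples."""
    return total_runs(later) <= total_runs(earlier)

# Counter lines taken from the three outputs shown in this test.
before_write = "Total number of runs : 208"
after_write = "Total number of runs : 229"
later_sample = "Total number of runs : 229"

print(checkheaps_stalled(before_write, after_write))
print(checkheaps_stalled(after_write, later_sample))
```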

Our bypass appears to work well. But to confirm for certain, let's corrupt a chunk in a similar fashion to earlier tests to see that it will not assert:

(gdb) dlchunk -v 0xacff5bb0
struct malloc_chunk @ 0xacff5bb0 {
prev_foot = 0x8140d4d0
size = 0xf8 (CINUSE|PINUSE)
struct mp_header @ 0xacff5bb8 {
mh_magic = 0xa11c0123
mh_len = 0xcc
mh_refcount = 0x0
mh_unused = 0x0
mh_fd_link = 0xac96bd48 (OK)
mh_bk_link = 0xa84005c4 (-)
alloc_pc = 0x8262b45 (-)
free_pc = 0x0 (-)
(gdb) set *(unsigned int *)0xacff5bb8=0xdeadbeef
(gdb) c
Continuing.

As you will see, it won't crash. We can run show checkheaps again and see that the run count is not incrementing. Let's confirm that this is happening for exactly the reason we expect. We'll set a breakpoint inside validate_buffers():

(gdb) br *0x09BE69E9
Breakpoint 1 at 0x9be69e9
(gdb) c
Continuing.
[Switching to Thread 520]

Thread 6 hit Breakpoint 1, 0x09be69e9 in ?? ()
(gdb) bbt
#0 0x09be69e9 in validate_buffers ()
#1 0x09c18ba1 in checkheaps_process ()
#2 0x0806ab8c in sub_806AB30 ()
#3 0x00000000 in ()
(gdb) disp/5i $pc
1: x/5i $pc
=> 0x9be69e9: mov ebx,DWORD PTR ds:0xb2545e0
0x9be69ef: test ebx,ebx
0x9be69f1: je 0x9be6a00
0x9be69f3: add esp,0x5c
0x9be69f6: pop ebx
(gdb) x/x 0xb2545e0
0xb2545e0: 0x00000001
(gdb) si
0x09be69ef in ?? ()
1: x/5i $pc
=> 0x9be69ef: test ebx,ebx
0x9be69f1: je 0x9be6a00
0x9be69f3: add esp,0x5c
0x9be69f6: pop ebx
0x9be69f7: pop esi
(gdb)
0x09be69f1 in ?? ()
1: x/5i $pc
=> 0x9be69f1: je 0x9be6a00
0x9be69f3: add esp,0x5c
0x9be69f6: pop ebx
0x9be69f7: pop esi
0x9be69f8: pop edi
(gdb)
0x09be69f3 in ?? ()
1: x/5i $pc
=> 0x9be69f3: add esp,0x5c
0x9be69f6: pop ebx
0x9be69f7: pop esi
0x9be69f8: pop edi
0x9be69f9: pop ebp

You can see it exits as expected due to the ch_is_validating variable being set. Now as the final confirmation, let's keep the chunk corrupted and set ch_is_validating back to zero.

(gdb) x/x 0xb2545e0
0xb2545e0: 0x00000001
(gdb) set *0xb2545e0=0
(gdb) x/x 0xb2545e0
0xb2545e0: 0x00000000
(gdb) c
Continuing.

Thread 6 hit Breakpoint 1, 0x09be69e9 in ?? ()
1: x/5i $pc
=> 0x9be69e9: mov ebx,DWORD PTR ds:0xb2545e0
0x9be69ef: test ebx,ebx
0x9be69f1: je 0x9be6a00
0x9be69f3: add esp,0x5c
0x9be69f6: pop ebx
(gdb) bbt
#0 0x09be69e9 in validate_buffers ()
#1 0x09c18ba1 in checkheaps_process ()
#2 0x0806ab8c in sub_806AB30 ()
#3 0x00000000 in ()
(gdb) si
0x09be69ef in ?? ()
1: x/5i $pc
=> 0x9be69ef: test ebx,ebx
0x9be69f1: je 0x9be6a00
0x9be69f3: add esp,0x5c
0x9be69f6: pop ebx
0x9be69f7: pop esi
(gdb)
0x09be69f1 in ?? ()
1: x/5i $pc
=> 0x9be69f1: je 0x9be6a00
0x9be69f3: add esp,0x5c
0x9be69f6: pop ebx
0x9be69f7: pop esi
0x9be69f8: pop edi
(gdb)
0x09be6a00 in ?? ()
1: x/5i $pc
=> 0x9be6a00: mov DWORD PTR ds:0xb2545e0,0x1
0x9be6a0a: xor eax,eax
0x9be6a0c: mov DWORD PTR [eax+0xb749b94],0x0
0x9be6a16: add eax,0x4
0x9be6a19: cmp eax,0x1c
(gdb) c
Continuing.

Thread 6 received signal SIGABRT, Aborted.
0xffffe430 in __kernel_vsyscall ()
(gdb) bbt
#0 0xffffe430 in __kernel_vsyscall ()
#1 0xdc6f5bcb in write () from /home/aa/cisco/firmware/asa924/_asa924-k8.bin.extracted/rootfs/lib/libpthread.so.0
#2 0x090e7fa1 in lina_send_signal ()
#3 0x090e84be in int3 ()
#4 0x09c04cf8 in __lina_assert ()
#5 0x09be6fa9 in validate_buffers ()
#6 0x09c18ba1 in checkheaps_process ()
#7 0x0806ab8c in sub_806AB30 ()
#8 0x00000000 in ()

It doesn't take long for the system to go down due to Checkheaps.
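The behaviour observed across this test can be summarised with a small model of the guard at the top of validate_buffers(). It is a sketch under simplifying assumptions: the class and its heap_is_sane parameter are hypothetical stand-ins, and the whole heap walk is reduced to a single boolean.

```python
class CheckheapsModel:
    """Toy model of the re-entrancy guard at the top of validate_buffers()."""

    def __init__(self):
        self.ch_is_validating = 0
        self.run_count = 0

    def validate_buffers(self, heap_is_sane=True):
        # mov ebx, [ch_is_validating]; test ebx, ebx; return early when set
        if self.ch_is_validating:
            return "skipped"
        self.ch_is_validating = 1
        if not heap_is_sane:
            # Stand-in for the assert that produces the SIGABRT in the trace above.
            raise AssertionError("Allocated buffer corrupted")
        self.run_count += 1
        self.ch_is_validating = 0
        return "validated"


model = CheckheapsModel()
print(model.validate_buffers())                    # normal run
model.ch_is_validating = 1                         # the gdb write
print(model.validate_buffers(heap_is_sane=False))  # corruption goes unnoticed
```

Clearing the flag again while the heap is still corrupted makes the next call raise, which mirrors the final step of the test.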

If you are working on a heap-based exploit that provides multiple write primitives and you can't corrupt the heap in a way that evades Checkheaps, setting ch_is_validating is the best approach. However, on newer 64-bit systems that have ASLR enabled, you will need to bypass ASLR first.

Checkheaps on a 64-bit Cisco ASA device

Overall the theory of Checkheaps is very similar on 64-bit, so we won't talk about how you might evade it.

One important point is to distinguish older 64-bit versions that use dlmalloc-2.8.x from newer 64-bit versions that use ptmalloc2, as detailed in this table [8]. Below we will focus on the newer 64-bit versions that are based on ptmalloc2 and see that Checkheaps doesn't really protect the primary heap as much as it should anyway. The tests below were done running a 9.6.2(7) 64-bit firmware inside GNS3 (asav962-7.qcow2), although we confirmed the results on other 64-bit builds using ptmalloc2 as well. We disabled ASLR to make the results easier for others to replicate, hence the default 0x000055555XXXXXXX address range shown.

The 64-bit MEMPOOL_GLOBAL_SHARED mempool

We touched on the state of this mempool in the libmempool blog post, but it's worth revisiting briefly. A dlmalloc mspace contains a dlmalloc mstate followed by custom mempool-specific book-keeping information. The former tracks free chunks while the latter tracks allocated chunks.

On 64-bit ptmalloc-based firmware, there is interestingly still a dlmalloc mstate in use. However, because all of the allocations are serviced by ptmalloc2, we will see that the MEMPOOL_GLOBAL_SHARED mempool is effectively always empty. Let's take a look:

(gdb) python m = libmempool.mempool_list(addr=0x000055555A9404C0)
(gdb) python print(m)
struct mempool_list @ 0x55555a9404c0 {
offset = 0x90
head = 0x55555e631320
unk = 0x0
struct mempool @ 0x55555e631290 {
dlmstate = 0x7fffdda00010
pool_name = MEMPOOL_DMA
field_58 = 0x0
mempool_id = 0x7
next = 0x55555e631170
struct mempool @ 0x55555e6310e0 {
dlmstate = 0x7ffff7ff7010
pool_name = MEMPOOL_GLOBAL_SHARED
field_58 = 0x0
mempool_id = 0x1
next = 0x55555a9404c8
(gdb) dlmstate 0x7ffff7ff7010
struct dl_mstate @ 0x7ffff7ff7010 {
smallmap = 0b000000000000000000000000000000
treemap = 0b000000000000000000000000000000
dvsize = 0x0
topsize = 0xee0
least_addr = 0x7ffff7ff7000
dv = 0x0
top = 0x7ffff7ff80c0
trim_check = 0x200000
magic = 0x2900d4d8
smallbin[00] (sz 0x0) = 0x7ffff7ff7050, 0x7ffff7ff7050 [EMPTY]
smallbin[01] (sz 0x8) = 0x7ffff7ff7060, 0x7ffff7ff7060 [EMPTY]
smallbin[02] (sz 0x10) = 0x7ffff7ff7070, 0x7ffff7ff7070 [EMPTY]
smallbin[03] (sz 0x18) = 0x7ffff7ff7080, 0x7ffff7ff7080 [EMPTY]
[...]
smallbin[30] (sz 0xf0) = 0x7ffff7ff7230, 0x7ffff7ff7230 [EMPTY]
smallbin[31] (sz 0xf8) = 0x7ffff7ff7240, 0x7ffff7ff7240 [EMPTY]
treebin[00] (sz 0x180) = 0x0 [EMPTY]
treebin[01] (sz 0x200) = 0x0 [EMPTY]
treebin[02] (sz 0x300) = 0x0 [EMPTY]
treebin[03] (sz 0x400) = 0x0 [EMPTY]
[...]
treebin[30] (sz 0xc00000) = 0x0 [EMPTY]
treebin[31] (sz 0xffffffff) = 0x0 [EMPTY]
footprint = 0x2000
max_footprint = 0x2000
mflags = 0x7
mutex = 0x0,0x0,0x55555e506260,0x0,0x0,0x7ffff7ff7000,
seg = struct malloc_segment @ 0x7ffff7ff73a0 {
base = 0x7ffff7ff7000
size = 0x2000
next = 0x0
sflags = 0x8
struct mp_mstate @ 0x7ffff7ff73c0 {
[...]
mp_smallbin[08] - sz: 0x00000040 cnt: 0x00d3, mh_fd_link: 0x7fffc85e8620
mp_smallbin[09] - sz: 0x00000048 cnt: 0x0000, mh_fd_link: 0x0
mp_smallbin[10] - sz: 0x00000050 cnt: 0x043e, mh_fd_link: 0x7fffb509f3b0
mp_smallbin[11] - sz: 0x00000058 cnt: 0x0000, mh_fd_link: 0x0
mp_smallbin[12] - sz: 0x00000060 cnt: 0x335a, mh_fd_link: 0x7fffc0005210
mp_smallbin[13] - sz: 0x00000068 cnt: 0x0000, mh_fd_link: 0x0
mp_smallbin[14] - sz: 0x00000070 cnt: 0x0748, mh_fd_link: 0x7fffc8586a40
mp_smallbin[15] - sz: 0x00000078 cnt: 0x0000, mh_fd_link: 0x0
mp_smallbin[16] - sz: 0x00000080 cnt: 0x0305, mh_fd_link: 0x7fffc0000fe0
mp_smallbin[17] - sz: 0x00000088 cnt: 0x0000, mh_fd_link: 0x0
mp_smallbin[18] - sz: 0x00000090 cnt: 0x0c97, mh_fd_link: 0x7fffc0004bc0
mp_smallbin[19] - sz: 0x00000098 cnt: 0x0000, mh_fd_link: 0x0
mp_smallbin[20] - sz: 0x000000a0 cnt: 0x0168, mh_fd_link: 0x7fffc0004b20
mp_smallbin[21] - sz: 0x000000a8 cnt: 0x0000, mh_fd_link: 0x0
mp_smallbin[22] - sz: 0x000000b0 cnt: 0x0094, mh_fd_link: 0x7fffb5494320
mp_smallbin[23] - sz: 0x000000b8 cnt: 0x0000, mh_fd_link: 0x0
mp_smallbin[24] - sz: 0x000000c0 cnt: 0x0121, mh_fd_link: 0x7fffc8586b00
mp_smallbin[25] - sz: 0x000000c8 cnt: 0x0000, mh_fd_link: 0x0
mp_smallbin[26] - sz: 0x000000d0 cnt: 0x00bc, mh_fd_link: 0x7fffbc1926e0
mp_smallbin[27] - sz: 0x000000d8 cnt: 0x0000, mh_fd_link: 0x0
mp_smallbin[28] - sz: 0x000000e0 cnt: 0x005d, mh_fd_link: 0x7fffc0005490
mp_smallbin[29] - sz: 0x000000e8 cnt: 0x0000, mh_fd_link: 0x0
mp_smallbin[30] - sz: 0x000000f0 cnt: 0x016e, mh_fd_link: 0x7fffc85858d0
mp_smallbin[31] - sz: 0x000000f8 cnt: 0x0000, mh_fd_link: 0x0
mp_treebin[00] - sz: 0x00000180 cnt: 0x054b, mh_fd_link: 0x7fffc85f8680
mp_treebin[01] - sz: 0x00000200 cnt: 0x00f0, mh_fd_link: 0x7fffc85fe740
mp_treebin[02] - sz: 0x00000300 cnt: 0x0158, mh_fd_link: 0x7fffc8630af0
mp_treebin[03] - sz: 0x00000400 cnt: 0x013e, mh_fd_link: 0x7fffb0000f80
mp_treebin[04] - sz: 0x00000600 cnt: 0x017e, mh_fd_link: 0x7fffc000b030
mp_treebin[05] - sz: 0x00000800 cnt: 0x009b, mh_fd_link: 0x7fffc846d1d0
mp_treebin[06] - sz: 0x00000c00 cnt: 0x0084, mh_fd_link: 0x7fffc000b470
mp_treebin[07] - sz: 0x00001000 cnt: 0x002c, mh_fd_link: 0x7fffc85f11d0
mp_treebin[08] - sz: 0x00001800 cnt: 0x0329, mh_fd_link: 0x7fffc0009ff0
mp_treebin[09] - sz: 0x00002000 cnt: 0x0038, mh_fd_link: 0x7fffb55bd860
mp_treebin[10] - sz: 0x00003000 cnt: 0x00ba, mh_fd_link: 0x7fffc0002ac0
mp_treebin[11] - sz: 0x00004000 cnt: 0x006d, mh_fd_link: 0x7fffc85ab2f0
mp_treebin[12] - sz: 0x00006000 cnt: 0x023b, mh_fd_link: 0x7fffbc18b160
mp_treebin[13] - sz: 0x00008000 cnt: 0x0018, mh_fd_link: 0x7fffc85c3420
mp_treebin[14] - sz: 0x0000c000 cnt: 0x0030, mh_fd_link: 0x7fffc84604a0
mp_treebin[15] - sz: 0x00010000 cnt: 0x0019, mh_fd_link: 0x7fffc836ed30
mp_treebin[16] - sz: 0x00018000 cnt: 0x0073, mh_fd_link: 0x7fffc85475a0
mp_treebin[17] - sz: 0x00020000 cnt: 0x0018, mh_fd_link: 0x7fffc858adc0
mp_treebin[18] - sz: 0x00030000 cnt: 0x000c, mh_fd_link: 0x7fffd4002010
mp_treebin[19] - sz: 0x00040000 cnt: 0x001d, mh_fd_link: 0x7fffc85fe8f0
mp_treebin[20] - sz: 0x00060000 cnt: 0x000d, mh_fd_link: 0x7fffa0c2f010
mp_treebin[21] - sz: 0x00080000 cnt: 0x001d, mh_fd_link: 0x7fffa0ff9010
mp_treebin[22] - sz: 0x000c0000 cnt: 0x0006, mh_fd_link: 0x7fffa107c010
mp_treebin[23] - sz: 0x00100000 cnt: 0x000a, mh_fd_link: 0x7fffa815e010
mp_treebin[24] - sz: 0x00180000 cnt: 0x000b, mh_fd_link: 0x7fffa0b1d010
mp_treebin[25] - sz: 0x00200000 cnt: 0x000e, mh_fd_link: 0x7fffa36d7010
mp_treebin[26] - sz: 0x00300000 cnt: 0x0007, mh_fd_link: 0x7fffa8222010
mp_treebin[27] - sz: 0x00400000 cnt: 0x0002, mh_fd_link: 0x7fffa07f0010
mp_treebin[28] - sz: 0x00600000 cnt: 0x0003, mh_fd_link: 0x7fffa03e4010
mp_treebin[29] - sz: 0x00800000 cnt: 0x0001, mh_fd_link: 0x7fffa38ac010
mp_treebin[30] - sz: 0x00c00000 cnt: 0x0001, mh_fd_link: 0x7fffd706d010
mp_treebin[31] - sz: 0xffffffff cnt: 0x0003, mh_fd_link: 0x7fffa16d5010 [UNSORTED]

What we can see above is that we have a dlmalloc mstate that is basically tracking no free chunks, because it is managing only a 0x2000-byte memory segment, which consists of only one in-use chunk which holds the mstate structure itself.

(gdb) dlchunk -c 5 0x7ffff7ff7000
0x7ffff7ff7000 M sz:0x010c0 fl:CP alloc_pc:0x00000000,-
0x7ffff7ff80c0 F sz:0x00ee0 fl:-P free_pc:0x00000000,-
0x7ffff7ff8fa0 F sz:0x00060 fl:-- free_pc:0x00000000,-
<<>>
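As a quick sanity check, the three chunks listed above should sit back to back and exactly fill the 0x2000-byte segment. The sketch below uses only the addresses and sizes from the dlchunk output; tiles_segment() is a hypothetical helper name.

```python
# (address, size) pairs copied from the dlchunk -c 5 output above.
chunks = [
    (0x7ffff7ff7000, 0x10c0),  # M: in-use chunk holding the mstate
    (0x7ffff7ff80c0, 0x0ee0),  # F: matches top/topsize in the mstate dump
    (0x7ffff7ff8fa0, 0x0060),  # F: trailing chunk
]
segment_base, segment_size = 0x7ffff7ff7000, 0x2000

def tiles_segment(chunks, base, size):
    """Check that the chunks are contiguous and exactly cover [base, base + size)."""
    cursor = base
    for addr, sz in chunks:
        if addr != cursor:
            return False
        cursor += sz
    return cursor == base + size

print(tiles_segment(chunks, segment_base, segment_size))
```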

Despite this, you can see that the mempool tracking structures still hold references to in-use chunks, yet those chunks don't fall within the mempool itself. If we take a look at the memory mappings for lina we can confirm that they are in totally different locations. For instance, the actual MEMPOOL_GLOBAL_SHARED mempool's mapping starts at 0x7ffff7ff7000, which corresponds to the 0x2000-byte mapping shown below:

(gdb) info proc mapping 1687
process 1687
Mapped address spaces:

Start Addr End Addr Size Offset objfile
0x555555554000 0x555559a4b000 0x44f7000 0x0 /asa/bin/lina
0x555559c4b000 0x55555aa9a000 0xe4f000 0x44f7000 /asa/bin/lina
0x55555aa9a000 0x55555ea75000 0x3fdb000 0x0 [heap]
[...]
0x7fffc8000000 0x7fffc8632000 0x632000 0x0
[...]
0x7ffff7ff7000 0x7ffff7ff9000 0x2000 0x0 /dev/zero (deleted)
[...]
(gdb)

But we can see in the mp_mstate book-keeping bins that the allocated chunks fall into a different memory range. For instance we have a mempool header at 0x7fffc85e8620, which corresponds to a chunk at 0x7fffc85e8610.

mp_smallbin[08] - sz: 0x00000040 cnt: 0x00d3, mh_fd_link: 0x7fffc85e8620

This corresponds to another address range taken from the info proc mapping command run above:

      0x7fffc8000000     0x7fffc8632000   0x632000        0x0 
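The lookup we just did by hand can be written down as follows. The rows are copied from the info proc mapping output above; the "(anonymous)" label is ours, since that row has no objfile, and find_mapping() and chunk_from_mp_header() are hypothetical helper names. The 0x10 offset is simply the difference between the two addresses quoted above.

```python
# (start, end, label) rows copied from the info proc mapping output above.
MAPPINGS = [
    (0x555555554000, 0x555559a4b000, "/asa/bin/lina"),
    (0x555559c4b000, 0x55555aa9a000, "/asa/bin/lina"),
    (0x55555aa9a000, 0x55555ea75000, "[heap]"),
    (0x7fffc8000000, 0x7fffc8632000, "(anonymous)"),
    (0x7ffff7ff7000, 0x7ffff7ff9000, "/dev/zero (deleted)"),
]

MP_HEADER_OFFSET_64 = 0x10  # prev_size + size, 8 bytes each on 64-bit

def find_mapping(addr):
    """Return the (start, end, label) row containing addr, or None."""
    for start, end, label in MAPPINGS:
        if start <= addr < end:
            return (start, end, label)
    return None

def chunk_from_mp_header(mp_header_addr):
    """Step back from a mempool header to the chunk that contains it."""
    return mp_header_addr - MP_HEADER_OFFSET_64

dlmstate = 0x7ffff7ff7010        # from the mempool dump
tracked_header = 0x7fffc85e8620  # mh_fd_link of mp_smallbin[08]

print(find_mapping(dlmstate))
print(find_mapping(tracked_header))
print(hex(chunk_from_mp_header(tracked_header)))
```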

What this all means is that the actual core allocations that used to be present on the MEMPOOL_GLOBAL_SHARED mempool in earlier ASA releases are now all handled by ptmalloc2 arenas via glibc.so. What does this mean for Checkheaps?

It appears that Checkheaps wasn't retrofitted to support ptmalloc2 arenas and it therefore uses the same historical logic to scan the memory segments of each mempool in the mempool list. This means that it is incapable of detecting memory corruption within ptmalloc2 arenas and is effectively no longer a hurdle for exploitation on newer 64-bit systems.

To test this observation, as with earlier examples, we set Checkheaps to run every second.

ciscoasa(config)# checkheaps check-interval 1
ciscoasa(config)# show checkheaps
Checkheaps stats from buffer validation runs
--------------------------------------------
Time elapsed since last run : 0 secs
Duration of last run : 0 millisecs
Number of buffers created : 32
Number of buffers allocated : 32
Number of buffers free : 0
Total memory in use : 87157344 bytes
Total memory in free buffers : 0 bytes
Total number of runs : 55
ciscoasa(config)# show checkheaps
Checkheaps stats from buffer validation runs
--------------------------------------------
Time elapsed since last run : 0 secs
Duration of last run : 0 millisecs
Number of buffers created : 32
Number of buffers allocated : 32
Number of buffers free : 0
Total memory in use : 87157344 bytes
Total memory in free buffers : 0 bytes
Total number of runs : 64

An interesting thing you might note right away about the Checkheaps output above, unlike 32-bit, is that it reports zero free buffers. Now, let's corrupt the mh_magic field in a ptmalloc2 chunk and confirm that it will go undetected.

(gdb) ptchunk -v 0x7fffc85e8610
struct malloc_chunk @ 0x7fffc85e8610 {
prev_size = 0x7fffc85e8660
size = 0x40 (PREV_INUSE|NON_MAIN_ARENA)
struct mp_header @ 0x7fffc85e8620 {
mh_magic = 0xa11c0123
mh_len = 0x3
mh_refcount = 0x10000
mh_unused = 0x0
mh_fd_link = 0x7fffc85e8360 (OK)
mh_bk_link = 0x7ffff7ff7540 (-)
alloc_pc = 0x55555849e260 (-)
free_pc = 0x7fffc83ab621 (-)
(gdb) set *(unsigned int *)0x7fffc85e8620=0xa11c0123
(gdb) set *(unsigned int *)0x7fffc85e8620=0xdeadbeef
(gdb) ptchunk -v 0x7fffc85e8610
struct malloc_chunk @ 0x7fffc85e8610 {
prev_size = 0x7fffc85e8660
size = 0x40 (PREV_INUSE|NON_MAIN_ARENA)
struct mp_header @ 0x7fffc85e8620 {
mh_magic = 0xdeadbeef
mh_len = 0x3
mh_refcount = 0x10000
mh_unused = 0x0
mh_fd_link = 0x7fffc85e8360 (OK)
mh_bk_link = 0x7ffff7ff7540 (-)
alloc_pc = 0x55555849e260 (-)
free_pc = 0x7fffc83ab621 (-)

We can test that Checkheaps is running and not detecting any corruption as we'd expect:

ciscoasa(config)# show checkheaps
Checkheaps stats from buffer validation runs
--------------------------------------------
Time elapsed since last run : 0 secs
Duration of last run : 0 millisecs
Number of buffers created : 32
Number of buffers allocated : 32
Number of buffers free : 0
Total memory in use : 87157344 bytes
Total memory in free buffers : 0 bytes
Total number of runs : 358
ciscoasa(config)# show checkheaps
Checkheaps stats from buffer validation runs
--------------------------------------------
Time elapsed since last run : 0 secs
Duration of last run : 0 millisecs
Number of buffers created : 32
Number of buffers allocated : 32
Number of buffers free : 0
Total memory in use : 87157344 bytes
Total memory in free buffers : 0 bytes
Total number of runs : 360
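One way to picture this result is a validator that only inspects addresses inside the segments it has been handed. The sketch below is a deliberately reduced model, not the real scan: scan() is a hypothetical function, and the second call shows what a hypothetical validator that also covered the ptmalloc2 mapping would report.

```python
MH_MAGIC = 0xa11c0123

def scan(segments, memory):
    """Return addresses with a bad magic, looking only inside the given segments.

    segments: list of (base, size) pairs.
    memory:   dict mapping an mp_header address to the magic stored there.
    """
    bad = []
    for addr, magic in sorted(memory.items()):
        in_scope = any(base <= addr < base + size for base, size in segments)
        if in_scope and magic != MH_MAGIC:
            bad.append(addr)
    return bad

mempool_segments = [(0x7ffff7ff7000, 0x2000)]  # the only segment in the mstate
memory = {0x7fffc85e8620: 0xdeadbeef}          # the header we corrupted above

print(scan(mempool_segments, memory))
print([hex(a) for a in scan([(0x7fffc8000000, 0x632000)], memory)])
```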

Conclusion

In this blog post we discussed the Checkheaps stability and security mechanism on Cisco ASA devices. We also explored how it works internally, the risks it poses to reliable exploitation and how it can be bypassed. We highlighted how it is largely ineffective for validating the primary heap on 64-bit systems, as it is not directly tied to a mempool in the traditional sense.

We would appreciate any feedback or corrections. If you would like to contact us we can be reached by email or twitter: aaron(dot)adams(at)nccgroup(dot)trust / @fidgetingbits and cedric(dot)halbronn(at)nccgroup(dot)trust / @saidelike.

Read all posts in the Cisco ASA series

References

[1] http://phrack.org/issues/60/7.html

[2] https://www.blackhat.com/presentations/bh-usa-09/LINDNER/BHUSA09-Lindner-RouterExploit-SLIDES.pdf

[3] http://www.lemuria.org/mirrors/lynn-cisco.pdf

[4] https://web.archive.org/web/20090902030652/http://www.irmplc.com/downloads/whitepapers/Cisco_IOS_Exploitation_Techniques.pdf

[5] https://www.blackhat.com/presentations/bh-usa-08/Chawdhary_Uppal/BH_US_08_Chawdhary_Uppal_Cisco_IOS_Shellcodes.pdf

[6] http://cir.recurity.com/

[7] http://gee.cs.oswego.edu/pub/misc/malloc-2.8.3.c

[8] https://github.com/nccgroup/asafw/blob/master/README.md#mitigation-summary

Appendix A: Checkheaps validate_buffers()

This is an approximation of what the C implementation of validate_buffers() and the sanity checks it calls look like. Some portions are omitted for brevity. For a complete understanding of some of the macros and functions, see the open source dlmalloc 2.8.3 source file malloc-2.8.3.c [7].

int ch_is_validating = 0;
int ch_num_runs;
int ch_stat_inuse_count;
int ch_stat_free_count;
// 64-bit
long long ch_total_allocated;
long long ch_total_free;

#define FENCEPOST_HEAD (INUSE_BITS|SIZE_T_SIZE)
#define cinuse(p) ((p)->head & CINUSE_BIT)
#define pinuse(p) ((p)->head & PINUSE_BIT)
#define chunksize(p) ((p)->head & ~(INUSE_BITS))
#define clear_pinuse(p) ((p)->head &= ~PINUSE_BIT)
#define clear_cinuse(p) ((p)->head &= ~CINUSE_BIT)

void
validate_buffers()
{
    int first;
    mstate m;
    int unk;
    mempool_t * mp;

    /* Re-entrancy guard: return immediately if a run is already in progress */
    if (ch_is_validating) return;
    ch_is_validating = 1;
    memset(&ch_stats, 0, sizeof(ch_stats));

    /* Walk the circular mempool list and check every mempool's mstate */
    mp = &mempool_list_ptr.next;
    do {
        mp = (char *)mp - head.offset;
        if (mparams.page_size == 0) init_mparams();
        custom_traverse_and_check(mp->mstate);
        mp = mp->next;
    } while (mp != &mempool_list_ptr.next);

    ch_num_runs++;
    ch_is_validating = 0;
}

static inline void
custom_traverse_and_check(mstate m)
{
    size_t sum = 0;
    if (is_initialized(m)) {
        msegmentptr s = &m->seg;
#ifndef CHECKHEAPS
        sum += m->topsize + TOP_FOOT_SIZE;
#endif
        while (s != 0) {
            mchunkptr q = align_as_chunk(s->base);
            mchunkptr lastq = 0;
            assert(pinuse(q));
            while (segment_holds(s, q) &&
                   q != m->top && q->head != FENCEPOST_HEAD) {
                sum += chunksize(q);
                if (cinuse(q)) {
#ifndef CHECKHEAPS
                    assert(!bin_find(m, q));
#endif
                    do_check_inuse_chunk(m, q);
#ifdef CHECKHEAPS
                    ch_stat_inuse_count++;
                    /* Check the mempool header magic and the footer magic */
                    if (*(unsigned int *)((char *)q + 8) != 0xa11c0123
                        || *(unsigned int *)(((char *)q + chunksize(q)) + 0x20) != 0xa11ccdef) {
                        /* The first chunk of the segment is exempt */
                        if (q != align_as_chunk(s->base)) {
                            print_checkheaps_failure("Allocated buffer corrupted");
                            assert(0);
                        }
                    }
                    ch_total_allocated += chunksize(q);
#endif
                }
                else {
#ifndef CHECKHEAPS
                    assert(q == m->dv || bin_find(m, q));
#endif
                    assert(lastq == 0 || cinuse(lastq)); /* Not 2 consecutive free */
                    do_check_free_chunk(m, q);
#ifdef CHECKHEAPS
                    if (chunksize(q) >= 0x18) {
                        size_t free_magic_offset;
                        /* The magic follows the list pointers (small) or tree fields (large) */
                        if (chunksize(q) < MAX_SMALL_SIZE) {
                            free_magic_offset = 0x10;
                        }
                        else {
                            free_magic_offset = 0x20;
                        }
                        if (*(unsigned int *)((char *)q + free_magic_offset) != 0xf3ee0123
                            || *(unsigned int *)(((char *)q + chunksize(q)) - 4) != 0xf3eecdef) {
                            print_checkheaps_failure("Allocated buffer corrupted");
                            assert(0);
                        }
                    }
                    ch_stat_free_count++;
                    ch_total_free += chunksize(q);
#endif
                }
                /* Remember the chunk just checked before advancing, as dlmalloc does */
                lastq = q;
                q = next_chunk(q);
                if (check_depth != 0) {
                    //...
                }
            }
            s = s->next;
        }
    }
    assert(m->footprint <= m->max_footprint);
    return;
}

static inline void do_check_any_chunk(mstate m, mchunkptr p) {
    assert((is_aligned(chunk2mem(p))) || (p->head == FENCEPOST_HEAD));
    assert(ok_address(m, p));
}

static inline void do_check_inuse_chunk(mstate m, mchunkptr p) {
    do_check_any_chunk(m, p);
    assert(cinuse(p));
    assert(next_pinuse(p));
    /* If not pinuse and not mmapped, previous chunk has OK offset */
    assert(is_mmapped(p) || pinuse(p) || next_chunk(prev_chunk(p)) == p);
    if (is_mmapped(p))
        do_check_mmapped_chunk(m, p);
}

/* Check properties of free chunks */
static void do_check_free_chunk(mstate m, mchunkptr p) {
    size_t sz = p->head & ~(PINUSE_BIT|CINUSE_BIT);
    mchunkptr next = chunk_plus_offset(p, sz);
    do_check_any_chunk(m, p);
    assert(!cinuse(p));
    assert(!next_pinuse(p));
    assert(!is_mmapped(p));
    if (p != m->dv && p != m->top) {
        if (sz >= MIN_CHUNK_SIZE) {
            assert((sz & CHUNK_ALIGN_MASK) == 0);
            assert(is_aligned(chunk2mem(p)));
            assert(next->prev_foot == sz);
            assert(pinuse(p));
            assert(next == m->top || cinuse(next));
            assert(p->fd->bk == p);
            assert(p->bk->fd == p);
        }
        else /* markers are always of size SIZE_T_SIZE */
            assert(sz == SIZE_T_SIZE);
    }
}
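To make the in-use magic check concrete, the following sketch applies the first half of it to a packed 32-bit chunk. It is an illustration only: make_chunk() and inuse_head_magic_ok() are hypothetical helpers, the chunk is reduced to its first three little-endian dwords, and the footer check is left out. The values reuse the prev_foot, size and mh_magic fields from the 32-bit dlchunk output shown earlier.

```python
import struct

MH_MAGIC_INUSE = 0xa11c0123  # expected at chunk + 8 on 32-bit, per the listing above
CHUNK_HDR_32 = 8             # prev_foot + head, 4 bytes each
PINUSE_BIT = 0x1             # dlmalloc flag bits stored in the low bits of head
CINUSE_BIT = 0x2

def make_chunk(prev_foot, head, magic):
    """Hypothetical helper: pack only the three dwords this check looks at."""
    return struct.pack("<III", prev_foot, head, magic)

def inuse_head_magic_ok(chunk_bytes):
    """Mirror the first half of the in-use check: the magic after the chunk header."""
    (magic,) = struct.unpack_from("<I", chunk_bytes, CHUNK_HDR_32)
    return magic == MH_MAGIC_INUSE

head = 0xf8 | CINUSE_BIT | PINUSE_BIT
good = make_chunk(0x8140d4d0, head, 0xa11c0123)
bad = make_chunk(0x8140d4d0, head, 0xdeadbeef)  # the corruption used in the tests

print(inuse_head_magic_ok(good), inuse_head_magic_ok(bad))
```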

Published date:  26 October 2017

Written by:  Aaron Adams and Cedric Halbronn
