Reading the Source in Detail

"Public wrappers" are the functions or classes that wrap an implementation and expose its public interface. They hide the details of the underlying implementation and present a simpler, easier-to-use interface to other developers.
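
As a toy illustration of the idea (hypothetical names, nothing to do with glibc itself), a public wrapper can hide an internal allocation routine behind a simpler, checked interface:

#include <stdlib.h>
#include <string.h>

/* Internal routine whose details callers should not depend on. */
static void *internal_alloc(size_t n)
{
    return malloc(n);
}

/* Public wrapper: normalizes the argument, hides the internal entry point,
   and adds a small convenience (zeroing) on top of it. */
void *xmalloc_zeroed(size_t n)
{
    if (n == 0)
        n = 1;                      /* normalize the request */
    void *p = internal_alloc(n);
    if (p != NULL)
        memset(p, 0, n);
    return p;
}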

glibc-2.23

Before reading the source, it helps to understand a few core structures.

mchunkptr is a pointer to a chunk.

malloc_state

struct malloc_state
{
/* Serialize access. */
__libc_lock_define (, mutex);

/* Flags (formerly in max_fast). */
int flags;

/* Set if the fastbin chunks contain recently inserted free blocks. */
/* Note this is a bool but not all targets support atomics on booleans. */
int have_fastchunks;

/* Fastbins */
mfastbinptr fastbinsY[NFASTBINS];

/* Base of the topmost chunk -- not otherwise kept in a bin */
mchunkptr top;

/* The remainder from the most recent split of a small request */
mchunkptr last_remainder;

/* Normal bins packed as described above */
mchunkptr bins[NBINS * 2 - 2];

/* Bitmap of bins */
unsigned int binmap[BINMAPSIZE];

/* Linked list */
struct malloc_state *next;

/* Linked list for free arenas. Access to this field is serialized
by free_list_lock in arena.c. */
struct malloc_state *next_free;

/* Number of threads attached to this arena. 0 if the arena is on
the free list. Access to this field is serialized by
free_list_lock in arena.c. */
INTERNAL_SIZE_T attached_threads;

/* Memory allocated from the system in this arena. */
INTERNAL_SIZE_T system_mem;
INTERNAL_SIZE_T max_system_mem;
};

mstate is a pointer to an arena (a malloc_state).

Linux provides a mutual-exclusion lock, the mutex (also called a mutual-exclusion variable).

Before operating on a shared resource, each thread first tries to take the lock; only after acquiring it may the thread operate, and it releases the lock when it is done.

The lock turns access to the resource into a mutually exclusive operation, so timing-dependent (race) errors can no longer occur.
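
A minimal pthread sketch of that lock/operate/unlock discipline (hypothetical worker function, not glibc code):

#include <pthread.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static long shared_counter;            /* the shared resource */

static void *worker(void *arg)
{
    (void) arg;
    for (int i = 0; i < 100000; i++)
    {
        pthread_mutex_lock(&lock);      /* acquire before touching the resource */
        shared_counter++;               /* critical section */
        pthread_mutex_unlock(&lock);    /* release when done */
    }
    return NULL;
}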

__libc_malloc

void *
__libc_malloc (size_t bytes)
{
mstate ar_ptr;
void *victim;

void *(*hook) (size_t, const void *)
= atomic_forced_read (__malloc_hook);
if (__builtin_expect (hook != NULL, 0))
return (*hook)(bytes, RETURN_ADDRESS (0));

arena_get (ar_ptr, bytes);

victim = _int_malloc (ar_ptr, bytes);
/* Retry with another arena only if we were able to find a usable arena
before. */
if (!victim && ar_ptr != NULL)
{
LIBC_PROBE (memory_malloc_retry, 1, bytes);
ar_ptr = arena_get_retry (ar_ptr, bytes);
victim = _int_malloc (ar_ptr, bytes);
}

if (ar_ptr != NULL)
(void) mutex_unlock (&ar_ptr->mutex);

assert (!victim || chunk_is_mmapped (mem2chunk (victim)) ||
ar_ptr == arena_for_chunk (mem2chunk (victim)));
return victim;
}
libc_hidden_def (__libc_malloc)

This can be split into three main parts: the hook callback, arena_get, and _int_malloc. We can treat the callback and arena_get as initialization and _int_malloc as the actual allocation; the following goes through them step by step.

callback

void *(*hook) (size_t, const void *)
= atomic_forced_read (__malloc_hook);
if (__builtin_expect (hook != NULL, 0))
return (*hook)(bytes, RETURN_ADDRESS (0));

The macro atomic_forced_read(__malloc_hook) is defined as:

# define atomic_forced_read(x) \
({ __typeof (x) __x; __asm ("" : "=r" (__x) : "0" (x)); __x; })

It uses inline assembly to force an atomic-style read of the value, so the read cannot be interfered with by other threads.

In __builtin_expect(x, y), x is an expression and y is its expected value; the hint lets the compiler lay out more efficient code. y = 0 means x is expected to be false almost always, although it can still happen.
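
For example (a made-up function, just to show the hint), the compiler keeps the likely path as straight-line code and pushes the unlikely branch out of the hot path:

int first_byte(const char *ptr)
{
    if (__builtin_expect(ptr == NULL, 0))   /* expected to be false */
        return -1;                          /* rarely-taken, "cold" branch */
    return ptr[0];                          /* likely, "hot" path */
}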

hook is a function-pointer variable that is assigned the value of __malloc_hook, which is defined as follows:

void *weak_variable (*__malloc_hook)
(size_t __size, const void *) = malloc_hook_ini;

__malloc_hook is initialized to malloc_hook_ini, which is defined as follows:

void* malloc_hook_ini (size_t sz, const void *caller) {
__malloc_hook = NULL;
ptmalloc_init ();
return __libc_malloc (sz);
}

Here __malloc_hook is set back to NULL and __libc_malloc is called again, which guarantees that even if __libc_malloc is called many times, the hook callback in the snippet above only ever runs once.
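
The pattern is a self-replacing hook: the pointer initially targets an init routine, and the init routine clears the pointer and re-enters the normal path. A stand-alone sketch with hypothetical names (my_alloc / first_time_alloc play the roles of __libc_malloc / malloc_hook_ini):

#include <stdlib.h>

static void *first_time_alloc(size_t n);               /* forward declaration */
static void *(*alloc_hook)(size_t) = first_time_alloc;
static int initialized;

static void *my_alloc(size_t n)                         /* role of __libc_malloc */
{
    void *(*hook)(size_t) = alloc_hook;
    if (hook != NULL)
        return hook(n);                                 /* only the very first call goes here */
    return malloc(n);                                   /* normal path afterwards */
}

static void *first_time_alloc(size_t n)                 /* role of malloc_hook_ini */
{
    alloc_hook = NULL;                                  /* clear the hook: run once only */
    initialized = 1;                                    /* one-time setup (think ptmalloc_init) */
    return my_alloc(n);                                 /* re-enter the normal path */
}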


A stripped-down version of ptmalloc_init looks like this:

void ptmalloc_init (void) {
if (__malloc_initialized >= 0)
return;
__malloc_initialized = 0;

thread_arena = &main_arena;

__malloc_initialized = 1;
}

The global flag __malloc_initialized records whether initialization has already happened. If not, main_arena is installed as the current thread_arena, because the first initialization is necessarily done by the main thread, which uses main_arena. Control then returns to malloc_hook_ini, which calls __libc_malloc again to continue the allocation.

The call path of the very first malloc call is:

malloc -> __libc_malloc -> __malloc_hook(即malloc_hook_ini) -> ptmalloc_init -> __libc_malloc -> _int_malloc

For every later malloc call the path becomes:

malloc -> __libc_malloc -> _int_malloc

arena_get

void *
__libc_malloc (size_t bytes)
{
mstate ar_ptr;
void *victim;

void *(*hook) (size_t, const void *)
= atomic_forced_read (__malloc_hook);
if (__builtin_expect (hook != NULL, 0))
return (*hook)(bytes, RETURN_ADDRESS (0)); //over

arena_get (ar_ptr, bytes);

The next thing executed is arena_get.

/* arena_get() acquires an arena and locks the corresponding mutex.
First, try the one last locked successfully by this thread. (This
is the common case and handled with a macro for speed.) Then, loop
once over the circularly linked list of arenas. If no arena is
readily available, create a new one. In this latter case, `size'
is just a hint as to how much memory will be required immediately
in the new arena. */

#define arena_get(ptr, size) do { \
ptr = thread_arena; \
arena_lock (ptr, size); \
} while (0)

#define arena_lock(ptr, size) do { \
if (ptr && !arena_is_corrupt (ptr)) \
(void) mutex_lock (&ptr->mutex); \
else \
ptr = arena_get2 ((size), NULL); \
} while (0)

The comment above explains the strategy: first try the arena this thread locked successfully last time (the common case, handled by a macro for speed). If that is not available, loop once over the circular linked list of arenas looking for a usable one. If still nothing is available, create a new arena; in that case the size argument is only a hint about how much memory will be needed immediately in the new arena.

The most important part of this path is arena_get2, which relies on three key functions: get_free_list, _int_new_arena, and reused_arena.

In CTF challenges you will normally only ever see main_arena (the one built into libc); main_arena has no heap_info structure of its own, and no new arena needs to be created.

_int_malloc

void *
__libc_malloc (size_t bytes)
{
mstate ar_ptr;
void *victim;

void *(*hook) (size_t, const void *)
= atomic_forced_read (__malloc_hook);
if (__builtin_expect (hook != NULL, 0))
return (*hook)(bytes, RETURN_ADDRESS (0));

arena_get (ar_ptr, bytes);

victim = _int_malloc (ar_ptr, bytes);

After that, _int_malloc is called.

First, the local variables are set up:

static void *
_int_malloc (mstate av, size_t bytes)
{
INTERNAL_SIZE_T nb; /* normalized request size */
unsigned int idx; /* associated bin index */
mbinptr bin; /* associated bin */

mchunkptr victim; /* inspected/selected chunk */
INTERNAL_SIZE_T size; /* its size */
int victim_index; /* its bin index */

mchunkptr remainder; /* remainder from a split */
unsigned long remainder_size; /* its size */

unsigned int block; /* bit map traverser */
unsigned int bit; /* bit map traverser */
unsigned int map; /* current word of binmap */

mchunkptr fwd; /* misc temp for linking */
mchunkptr bck; /* misc temp for linking */

const char *errstr = NULL;

Next, bytes is normalized into a legal internal size; nb is the converted size.

/*
Convert the request size into the internal form by adding SIZE_SZ bytes of overhead, plus possibly more, to obtain the necessary alignment and a size of at least MINSIZE (the smallest allocatable size). checked_request2size also detects (and reports as 0) request sizes that would overflow to zero once padded and aligned.
*/
checked_request2size (bytes, nb);
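
On a typical 64-bit build (SIZE_SZ = 8, MALLOC_ALIGNMENT = 16, MINSIZE = 32) the normalization amounts to the following; this is a rough sketch of the effect with made-up _DEMO names, not the actual glibc macro:

#include <stddef.h>

#define SIZE_SZ_DEMO           8UL
#define MALLOC_ALIGNMENT_DEMO  16UL
#define MALLOC_ALIGN_MASK_DEMO (MALLOC_ALIGNMENT_DEMO - 1)
#define MINSIZE_DEMO           32UL

/* Add room for the size field, round up to the alignment,
   and never return less than MINSIZE. */
static size_t request2size_demo(size_t req)
{
    size_t nb = (req + SIZE_SZ_DEMO + MALLOC_ALIGN_MASK_DEMO) & ~MALLOC_ALIGN_MASK_DEMO;
    return nb < MINSIZE_DEMO ? MINSIZE_DEMO : nb;
}
/* request2size_demo(24) == 32, request2size_demo(25) == 48, request2size_demo(100) == 112 */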

If there is no usable arena at all, the code falls straight back to sysmalloc to obtain a chunk (from mmap).

/* There are no usable arenas.  Fall back to sysmalloc to get a chunk from
mmap. */
if (__glibc_unlikely (av == NULL))
{
void *p = sysmalloc (nb, av);
if (p != NULL)
alloc_perturb (p, bytes);
return p;
}

malloc_consolidate (consolidating chunks)

Summary

  1. If get_max_fast() returns 0, do the heap initialization work, then go to step 7
  2. Take a free chunk from a fastbin
  3. Try to merge it backward with the previous chunk
  4. If the chunk in front of it is the top chunk, merge it directly into the top chunk, then go to step 6
  5. Otherwise try to merge it forward, then insert the result into the unsorted bin
  6. Take the next free chunk and go back to step 2, until every fastbin is empty, then go to step 7
  7. Return from the function

fastbin(FILO)

/*
If the size qualifies as a fastbin, first check corresponding bin.
This code is safe to execute even if av is not yet initialized, so we
can try it without checking, which saves some time on this fast path.
*/

if ((unsigned long) (nb) <= (unsigned long) (get_max_fast ()))
{
idx = fastbin_index (nb);
mfastbinptr *fb = &fastbin (av, idx);
mchunkptr pp = *fb;
do
{
victim = pp;
if (victim == NULL)
break;
}
while ((pp = catomic_compare_and_exchange_val_acq (fb, victim->fd, victim))
!= victim);
if (victim != 0)
{
if (__builtin_expect (fastbin_index (chunksize (victim)) != idx, 0))
{
errstr = "malloc(): memory corruption (fast)";
errout:
malloc_printerr (check_action, errstr, chunk2mem (victim), av);
return NULL;
}
check_remalloced_chunk (av, victim, nb);
void *p = chunk2mem (victim);
alloc_perturb (p, bytes);
return p;
}
}
do
{
victim = pp;
if (victim == NULL)
break;
}
while ((pp = catomic_compare_and_exchange_val_acq (fb, victim->fd, victim))
!= victim);

The atomic operation it uses:

#define atomic_compare_and_exchange_val_acq(mem, newval, oldval) \
({ __typeof (mem) __gmemp = (mem); \
__typeof (*mem) __gret = *__gmemp; \
__typeof (*mem) __gnewval = (newval); \
\
if (__gret == (oldval)) \
*__gmemp = __gnewval; \
__gret; })

It compares the value pointed to by mem with oldval; if they are equal it stores newval into *mem. In either case it returns the value *mem held before the call (which equals oldval exactly when the exchange happened).
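
The same semantics can be reproduced with the GCC __sync builtin; the pop_head sketch below (hypothetical node type) mirrors how the fastbin head is popped, retrying if another thread changed the head between the read and the swap:

struct node { struct node *next; };

/* __sync_val_compare_and_swap returns the value *head held before the call;
   the store to *head only happens if that value was still the expected one. */
static struct node *pop_head(struct node **head)
{
    struct node *victim, *pp = *head;
    do
    {
        victim = pp;
        if (victim == NULL)
            break;
    }
    while ((pp = __sync_val_compare_and_swap(head, victim, victim->next)) != victim);
    return victim;
}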

If victim is not NULL, the index computed from its size must match the fastbin index it was taken from.

Finally there is one more check, the call check_remalloced_chunk (av, victim, nb);

if (__builtin_expect (fastbin_index (chunksize (victim)) != idx, 0))
{
errstr = "malloc(): memory corruption (fast)";
errout:
malloc_printerr (check_action, errstr, chunk2mem (victim), av);
return NULL;
}

This is the first check: the fastbin chunk's size must be legal, i.e. it must map back to the same fastbin index.

The second check is check_remalloced_chunk (av, victim, nb); which is normally an empty macro.

What it would check:

  • essentially, that the chunk's size is sane

do_check_remalloced_chunk

do_check_remalloced_chunk -> do_check_inuse_chunk -> do_check_chunk -> do_check_free_chunk

However, check_remalloced_chunk does not actually expand to do_check_remalloced_chunk here, because:

#ifndef MALLOC_DEBUG
#define MALLOC_DEBUG 0
#endif

MALLOC_DEBUG=0

#if !MALLOC_DEBUG

# define check_chunk(A, P)
# define check_free_chunk(A, P)
# define check_inuse_chunk(A, P)
# define check_remalloced_chunk(A, P, N)
# define check_malloced_chunk(A, P, N)
# define check_malloc_state(A)

#else

# define check_chunk(A, P) do_check_chunk (A, P)
# define check_free_chunk(A, P) do_check_free_chunk (A, P)
# define check_inuse_chunk(A, P) do_check_inuse_chunk (A, P)
# define check_remalloced_chunk(A, P, N) do_check_remalloced_chunk (A, P, N)
# define check_malloced_chunk(A, P, N) do_check_malloced_chunk (A, P, N)
# define check_malloc_state(A) do_check_malloc_state (A)

The precondition is MALLOC_DEBUG = 1, so strictly speaking the analysis below is redundant for normal (release) builds.

static void
do_check_remalloced_chunk (mstate av, mchunkptr p, INTERNAL_SIZE_T s)
{
INTERNAL_SIZE_T sz = p->size & ~(PREV_INUSE | NON_MAIN_ARENA);

if (!chunk_is_mmapped (p))
{
assert (av == arena_for_chunk (p)); // the chunk must belong to this arena
if (chunk_non_main_arena (p))
assert (av != &main_arena);
else
assert (av == &main_arena); // double-checked via the NON_MAIN_ARENA flag
}

do_check_inuse_chunk (av, p);

/* Legal size ... */
assert ((sz & MALLOC_ALIGN_MASK) == 0); // size is properly aligned
assert ((unsigned long) (sz) >= MINSIZE); // size >= MINSIZE
/* ... and alignment */
assert (aligned_OK (chunk2mem (p))); // the user pointer is aligned
/* chunk is less than MINSIZE more than request */
assert ((long) (sz) - (long) (s) >= 0); // allocated size >= requested size
assert ((long) (sz) - (long) (s + MINSIZE) < 0); // allocated size < requested size + MINSIZE
}

The do_check_inuse_chunk (av, p) it calls performs more detailed checks on p:

static void
do_check_inuse_chunk (mstate av, mchunkptr p)
{
mchunkptr next;

do_check_chunk (av, p); // defined below

if (chunk_is_mmapped (p))
return; /* mmapped chunks have no next/prev */ // mmapped chunks skip the remaining checks

/* Check whether it claims to be in use ... */
assert (inuse (p)); // the in-use bit lives in the next chunk (p + size); freed fastbin chunks keep it set

next = next_chunk (p);

/* ... and is surrounded by OK chunks.
Since more things can be checked with free chunks than inuse ones,
if an inuse chunk borders them and debug is on, it's worth doing them.
*/
if (!prev_inuse (p))
{
/* Note that we cannot even look at prev unless it is not inuse */
mchunkptr prv = prev_chunk (p);
assert (next_chunk (prv) == p); // if the previous chunk is free, its next_chunk must be p
do_check_free_chunk (av, prv); // defined below
}

if (next == av->top)
{
assert (prev_inuse (next)); // if next is the top chunk, its prev_inuse bit must be set
assert (chunksize (next) >= MINSIZE); // and its size must be >= MINSIZE
}
else if (!inuse (next)) // if the next chunk is free, run the free-chunk checks on it
do_check_free_chunk (av, next);
}
static void
do_check_chunk (mstate av, mchunkptr p)
{
unsigned long sz = chunksize (p);
/* min and max possible addresses assuming contiguous allocation */
char *max_address = (char *) (av->top) + chunksize (av->top);
char *min_address = max_address - av->system_mem;

if (!chunk_is_mmapped (p))
{
/* Has legal address ... */
if (p != av->top)
{
if (contiguous (av))
{
assert (((char *) p) >= min_address);
assert (((char *) p + sz) <= ((char *) (av->top)));
}
}
else
{
/* top size is always at least MINSIZE */
assert ((unsigned long) (sz) >= MINSIZE);
/* top predecessor always marked inuse */
assert (prev_inuse (p));
}
}
else
{
/* address is outside main heap */
if (contiguous (av) && av->top != initial_top (av))
{
assert (((char *) p) < min_address || ((char *) p) >= max_address);
}
/* chunk is page-aligned */
assert (((p->prev_size + sz) & (GLRO (dl_pagesize) - 1)) == 0);
/* mem is aligned */
assert (aligned_OK (chunk2mem (p)));
}
}

For the fastbin case only the if (p != av->top) branch matters:

  • assert that the chunk's address is at or above the minimum valid address.
  • assert that the end of the chunk is at or below the top chunk.
static void
do_check_free_chunk (mstate av, mchunkptr p)
{
INTERNAL_SIZE_T sz = p->size & ~(PREV_INUSE | NON_MAIN_ARENA);
mchunkptr next = chunk_at_offset (p, sz);

do_check_chunk (av, p);

/* Chunk must claim to be free ... */
assert (!inuse (p)); // must be free
assert (!chunk_is_mmapped (p)); // must not be mmapped

/* Unless a special marker, must have OK fields */
if ((unsigned long) (sz) >= MINSIZE)
{
assert ((sz & MALLOC_ALIGN_MASK) == 0); // size is aligned
assert (aligned_OK (chunk2mem (p))); // the user pointer is aligned
/* ... matching footer field */
assert (next->prev_size == sz); // the next chunk's prev_size == this chunk's size
/* ... and is fully consolidated */

assert (next == av->top || inuse (next)); // the next chunk is the top chunk or is in use

/* ... and has minimally sane links */
assert (p->fd->bk == p); // forward/backward links are consistent
assert (p->bk->fd == p);
}
else /* markers are always of size SIZE_SZ */
assert (sz == SIZE_SZ);
}

MALLOC_DEBUG exists so that a developer chasing a memory bug can turn it on and locate the error faster.

Normally MALLOC_DEBUG = 0.

smallbin(FIFO)

if (in_smallbin_range (nb))
{
idx = smallbin_index (nb);
bin = bin_at (av, idx);

if ((victim = last (bin)) != bin)
{
if (victim == 0) /* initialization check */
malloc_consolidate (av);
else
{
bck = victim->bk;
if (__glibc_unlikely (bck->fd != victim))
{
errstr = "malloc(): smallbin double linked list corrupted";
goto errout;
}
set_inuse_bit_at_offset (victim, nb); // set the prev_inuse bit of the following chunk
bin->bk = bck;
bck->fd = bin;

if (av != &main_arena)
victim->size |= NON_MAIN_ARENA;
check_malloced_chunk (av, victim, nb);
void *p = chunk2mem (victim);
alloc_perturb (p, bytes);
return p;
}
}
}

If the request falls in the smallbin range, the code checks:

  • whether the corresponding list is non-empty
    • if victim is 0 because the arena is not yet initialized, call malloc_consolidate (av);
    • otherwise take the chunk out

There are two checks here. The first is:

bck = victim->bk;
if (__glibc_unlikely (bck->fd != victim))
{
errstr = "malloc(): smallbin double linked list corrupted";
goto errout;
}

The chunk to be handed out must satisfy victim->bk->fd == victim; note that no such requirement is placed on the fd side.
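
The same bck->fd == victim consistency test can be shown on a generic circular doubly linked list; a hedged sketch with made-up names, mirroring what the smallbin path does before unlinking:

#include <stdio.h>
#include <stdlib.h>

struct dnode { struct dnode *fd, *bk; };

/* Take the node at head->bk (FIFO order), but only after verifying that
   victim->bk->fd still points back at victim. */
static struct dnode *take_last(struct dnode *head)
{
    struct dnode *victim = head->bk;
    if (victim == head)
        return NULL;                       /* ring is empty */
    struct dnode *bck = victim->bk;
    if (bck->fd != victim)
    {
        fprintf(stderr, "corrupted double linked list\n");
        abort();
    }
    head->bk = bck;                        /* unlink victim from the ring */
    bck->fd = head;
    return victim;
}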

The second check is check_malloced_chunk (av, victim, nb);

static void
do_check_malloced_chunk (mstate av, mchunkptr p, INTERNAL_SIZE_T s)
{
/* same as recycled case ... */
do_check_remalloced_chunk (av, p, s);

/*
... plus, must obey implementation invariant that prev_inuse is
always true of any allocated chunk; i.e., that each allocated
chunk borders either a previously allocated and still in-use
chunk, or the base of its memory arena. This is ensured
by making all allocations from the `lowest' part of any found
chunk. This does not necessarily hold however for chunks
recycled via fastbins.
*/

assert (prev_inuse (p));
}

The do_check_remalloced_chunk it calls has already appeared above:

static void
do_check_remalloced_chunk (mstate av, mchunkptr p, INTERNAL_SIZE_T s)
{
INTERNAL_SIZE_T sz = p->size & ~(PREV_INUSE | NON_MAIN_ARENA);

if (!chunk_is_mmapped (p))
{
assert (av == arena_for_chunk (p)); // the chunk must belong to this arena
if (chunk_non_main_arena (p))
assert (av != &main_arena);
else
assert (av == &main_arena); // double-checked via the NON_MAIN_ARENA flag
}

do_check_inuse_chunk (av, p);

/* Legal size ... */
assert ((sz & MALLOC_ALIGN_MASK) == 0); // size is properly aligned
assert ((unsigned long) (sz) >= MINSIZE); // size >= MINSIZE
/* ... and alignment */
assert (aligned_OK (chunk2mem (p))); // the user pointer is aligned
/* chunk is less than MINSIZE more than request */
assert ((long) (sz) - (long) (s) >= 0); // allocated size >= requested size
assert ((long) (sz) - (long) (s + MINSIZE) < 0); // allocated size < requested size + MINSIZE
}

The do_check_inuse_chunk (av, p) it contains, in short, first makes sure the chunk is not free, and then, if a neighbouring chunk is free, runs the validity checks on that neighbour.

The do_check_chunk called inside do_check_inuse_chunk (av, p), in short, makes sure the chunk's address lies within a legal range.

assert (prev_inuse (p)); holds because if the previous chunk had been free at free time it would have been merged, so for an allocated chunk the previous chunk must be in use.

largebin(FIFO)

else
{
idx = largebin_index (nb);
if (have_fastchunks (av))
malloc_consolidate (av);
}
  1. Check whether there are fast chunks; if so, trigger malloc_consolidate (av);
  2. Enter the unsorted-bin traversal loop; if a fit is found there, return it
  3. Otherwise fall through to largebin allocation

The main loop

Next comes the well-known big for(;;) loop; the next several sections are all inside it.

for (;; )       // the main loop
{
int iters = 0;

unsortedbin (FIFO) traversal

while ((victim = unsorted_chunks (av)->bk) != unsorted_chunks (av))   // while the unsorted bin still has chunks
{
bck = victim->bk; // bck is the bk of the first unsorted chunk
if (__builtin_expect (victim->size <= 2 * SIZE_SZ, 0)
|| __builtin_expect (victim->size > av->system_mem, 0)) //first check
malloc_printerr (check_action, "malloc(): memory corruption",
chunk2mem (victim), av);
size = chunksize (victim);

/*
If a small request, try to use last remainder if it is the
only chunk in unsorted bin. This helps promote locality for
runs of consecutive small requests. This is the only
exception to best-fit, and applies only when there is
no exact fit for a small chunk.
*/

if (in_smallbin_range (nb) &&
bck == unsorted_chunks (av) &&
victim == av->last_remainder &&
(unsigned long) (size) > (unsigned long) (nb + MINSIZE)) // the request is smallbin-sized and the chunk is big enough
{ // victim is the only chunk in the unsorted bin and is the remainder of the last split
/* split and reattach remainder */ // allocate via the last_remainder path
remainder_size = size - nb;
remainder = chunk_at_offset (victim, nb);
unsorted_chunks (av)->bk = unsorted_chunks (av)->fd = remainder;
av->last_remainder = remainder;
remainder->bk = remainder->fd = unsorted_chunks (av);
if (!in_smallbin_range (remainder_size))
{
remainder->fd_nextsize = NULL;
remainder->bk_nextsize = NULL;
}

set_head (victim, nb | PREV_INUSE |
(av != &main_arena ? NON_MAIN_ARENA : 0));
set_head (remainder, remainder_size | PREV_INUSE);
set_foot (remainder, remainder_size);

check_malloced_chunk (av, victim, nb);
void *p = chunk2mem (victim);
alloc_perturb (p, bytes);
return p;
}

/* remove from unsorted list */
unsorted_chunks (av)->bk = bck;
bck->fd = unsorted_chunks (av); // take victim out of the unsorted bin

/* Take now instead of binning if exact fit */

if (size == nb) // exact fit: return it directly
{
set_inuse_bit_at_offset (victim, size);
if (av != &main_arena)
victim->size |= NON_MAIN_ARENA;
check_malloced_chunk (av, victim, nb);
void *p = chunk2mem (victim);
alloc_perturb (p, bytes);
return p;
}

/* place chunk in bin */

if (in_smallbin_range (size)) // size is in the smallbin range
{
victim_index = smallbin_index (size);
bck = bin_at (av, victim_index);
fwd = bck->fd;
}
else // otherwise it belongs in a largebin
{
victim_index = largebin_index (size);
bck = bin_at (av, victim_index);
fwd = bck->fd;

/* maintain large bins in sorted order */
if (fwd != bck) // the largebin already has chunks
{
/* Or with inuse bit to speed comparisons */
size |= PREV_INUSE;
/* if smaller than smallest, bypass loop below */
assert ((bck->bk->size & NON_MAIN_ARENA) == 0);
if ((unsigned long) (size) < (unsigned long) (bck->bk->size)) // smaller than the smallest chunk in the bin
{
fwd = bck;
bck = bck->bk;

victim->fd_nextsize = fwd->fd;
victim->bk_nextsize = fwd->fd->bk_nextsize;
fwd->fd->bk_nextsize = victim->bk_nextsize->fd_nextsize = victim;
}
else
{
assert ((fwd->size & NON_MAIN_ARENA) == 0);
while ((unsigned long) size < fwd->size)
{
fwd = fwd->fd_nextsize;
assert ((fwd->size & NON_MAIN_ARENA) == 0);
}

if ((unsigned long) size == (unsigned long) fwd->size)
/* Always insert in the second position. */
fwd = fwd->fd;
else
{
victim->fd_nextsize = fwd;
victim->bk_nextsize = fwd->bk_nextsize;
fwd->bk_nextsize = victim;
victim->bk_nextsize->fd_nextsize = victim;
}
bck = fwd->bk;
}
}
else
victim->fd_nextsize = victim->bk_nextsize = victim;
}

mark_bin (av, victim_index); // mark the corresponding binmap bit
victim->bk = bck;
victim->fd = fwd;
fwd->bk = victim;
bck->fd = victim; // link victim into its bin

#define MAX_ITERS 10000
if (++iters >= MAX_ITERS) // at most 10000 iterations
break;
}

Process

  • victim's size is checked
  • when the request is smallbin-sized && victim is the only chunk in the unsorted bin && victim is last_remainder && its size exceeds the request plus MINSIZE, allocation goes through the last_remainder split path
  • victim is removed from the unsorted bin
  • if the size matches exactly, it is returned immediately
  • otherwise victim is prepared for insertion into its bin
  • the binmap bit is marked and the chunk is fully linked in; this unsorted-bin traversal is the only place where mark_bin is called

largebin allocation (after the unsorted-bin scan)

Chunks in the same largebin do not all have the same size. The bin head's bk points to the smallest chunk and its fd points to the largest; starting from any chunk other than the head, following bk leads to progressively larger chunks and following fd to progressively smaller ones. bk_nextsize and fd_nextsize link chunks of different sizes: only the first chunk of each size has valid nextsize pointers (the nextsize fields of the others hold leftover garbage); bk_nextsize points to the next larger size and fd_nextsize to the next smaller size.

if (!in_smallbin_range (nb))
{
bin = bin_at (av, idx);

/* skip scan if empty or largest chunk is too small */
if ((victim = first (bin)) != bin && // victim starts as the largest chunk in this bin
(unsigned long) (victim->size) >= (unsigned long) (nb)) // the bin is non-empty and the request fits its largest chunk
{
victim = victim->bk_nextsize;
while (((unsigned long) (size = chunksize (victim)) < // walk to the first chunk whose size >= nb
(unsigned long) (nb)))
victim = victim->bk_nextsize;

/* Avoid removing the first entry for a size so that the skip
list does not have to be rerouted. */
if (victim != last (bin) && victim->size == victim->fd->size)
victim = victim->fd; // prefer a chunk without nextsize pointers so the skip list need not be rerouted

remainder_size = size - nb;
unlink (av, victim, bck, fwd); // take victim out; this also fixes up the nextsize links

/* Exhaust */
if (remainder_size < MINSIZE)
{
set_inuse_bit_at_offset (victim, size);
if (av != &main_arena)
victim->size |= NON_MAIN_ARENA;
}
/* Split */
else
{
remainder = chunk_at_offset (victim, nb);
/* We cannot assume the unsorted list is empty and therefore
have to perform a complete insert here. */
bck = unsorted_chunks (av);
fwd = bck->fd;
if (__glibc_unlikely (fwd->bk != bck)) // integrity check on the unsorted bin
{
errstr = "malloc(): corrupted unsorted chunks";
goto errout;
}
remainder->bk = bck; // put the remainder into the unsorted bin
remainder->fd = fwd;
bck->fd = remainder;
fwd->bk = remainder;
if (!in_smallbin_range (remainder_size))
{
remainder->fd_nextsize = NULL;
remainder->bk_nextsize = NULL;
}
set_head (victim, nb | PREV_INUSE |
(av != &main_arena ? NON_MAIN_ARENA : 0));
set_head (remainder, remainder_size | PREV_INUSE);
set_foot (remainder, remainder_size);
}
check_malloced_chunk (av, victim, nb);
void *p = chunk2mem (victim);
alloc_perturb (p, bytes);
return p;
}
}

Process:

  • Preconditions: the largebin is not empty and the request is no larger than the biggest chunk in it
  • Find the first chunk whose size is at least the requested size
  • If that size class has more than one chunk, take the one its fd points to (avoid the chunk holding the nextsize pointers)
  • Unlink that chunk, producing a remainder
  • If the remainder is smaller than MINSIZE, it is not split off and the whole chunk is handed out
  • If the remainder >= MINSIZE, it is split off and put into the unsorted bin; before the insertion there is a check that the unsorted bin's head->fd->bk == head
  • The chunk is then returned

The binmap case

When neither the exact smallbin nor the exact largebin yielded a chunk (those bins were empty), the search reaches this stage.

/*
Search for a chunk by scanning bins, starting with next largest
bin. This search is strictly by best-fit; i.e., the smallest
(with ties going to approximately the least recently used) chunk
that fits is selected.

The bitmap avoids needing to check that most blocks are nonempty.
The particular case of skipping all bins during warm-up phases
when no chunks have been returned yet is faster than it might look.
*/
   ++idx;            // no exact fit exists at this point, so start searching from the next larger index
bin = bin_at (av, idx);
block = idx2block (idx); // which binmap word (block) this idx belongs to
map = av->binmap[block]; // the word of bits for that block
bit = idx2bit (idx); // the bit corresponding to idx

for (;; )
{
/* Skip rest of block if there are no more set bits in this block. */
if (bit > map || bit == 0) // no suitable bin is marked in this word
{
do
{
if (++block >= BINMAPSIZE) /* out of bins */
goto use_top;
}
while ((map = av->binmap[block]) == 0); // once this loop exits, the block is guaranteed to contain a marked bin

bin = bin_at (av, (block << BINMAPSHIFT));
bit = 1;
}

/* Advance to bin with set bit. There must be one. */
while ((bit & map) == 0) // walk to the marked bin
{
bin = next_bin (bin);
bit <<= 1;
assert (bit != 0); //bit!=0
}

/* Inspect the bin. It is likely to be non-empty */
victim = last (bin);

/* If a false alarm (empty bin), clear the bit. */
if (victim == bin)
{
av->binmap[block] = map &= ~bit; /* Write through */
bin = next_bin (bin);
bit <<= 1;
}

else // from here on it is the usual split-and-return logic
{
size = chunksize (victim);

/* We know the first chunk in this bin is big enough to use. */
assert ((unsigned long) (size) >= (unsigned long) (nb));

remainder_size = size - nb;

/* unlink */
unlink (av, victim, bck, fwd);

/* Exhaust */
if (remainder_size < MINSIZE)
{
set_inuse_bit_at_offset (victim, size);
if (av != &main_arena)
victim->size |= NON_MAIN_ARENA;
}

/* Split */
else
{
remainder = chunk_at_offset (victim, nb);

/* We cannot assume the unsorted list is empty and therefore
have to perform a complete insert here. */
bck = unsorted_chunks (av);
fwd = bck->fd;
if (__glibc_unlikely (fwd->bk != bck))
{
errstr = "malloc(): corrupted unsorted chunks 2";
goto errout;
}
remainder->bk = bck;
remainder->fd = fwd;
bck->fd = remainder;
fwd->bk = remainder;

/* advertise as last remainder */
if (in_smallbin_range (nb))
av->last_remainder = remainder;
if (!in_smallbin_range (remainder_size))
{
remainder->fd_nextsize = NULL;
remainder->bk_nextsize = NULL;
}
set_head (victim, nb | PREV_INUSE |
(av != &main_arena ? NON_MAIN_ARENA : 0));
set_head (remainder, remainder_size | PREV_INUSE);
set_foot (remainder, remainder_size);
}
check_malloced_chunk (av, victim, nb);
void *p = chunk2mem (victim);
alloc_perturb (p, bytes);
return p;
}
}

Process:

  • First ++idx, because at this point there is certainly no exact fit, then compute block, map, and bit (a simplified sketch of this scan follows below)
  • Then test bit > map (no marked bin in this word) || bit == 0 (out of range); if so, move on to the next block; if block >= BINMAPSIZE the bins are exhausted and control goes straight to use_top; otherwise the block found is guaranteed to contain a non-empty bin
  • Next, scan that word bit by bit until the corresponding bin is found
  • If that bin turns out to be empty (its chunks were used up but the bit was never cleared), clear the bit and continue the scan
  • Once everything matches, the usual chunk-splitting logic takes over
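
A simplified model of this scan, with made-up _DEMO constants instead of the glibc macros (idx2block/idx2bit are folded into plain arithmetic); it returns the index of the first marked bin at or above start, or -1 for the use_top case:

#define NBINS_DEMO    128
#define BITS_PER_WORD 32
#define NWORDS_DEMO   (NBINS_DEMO / BITS_PER_WORD)

static int next_marked_bin(const unsigned int binmap[NWORDS_DEMO], int start)
{
    int block = start / BITS_PER_WORD;
    unsigned int bit = 1u << (start % BITS_PER_WORD);
    unsigned int map = binmap[block];

    if (bit > map || bit == 0)              /* nothing at or above start in this word */
    {
        do
        {
            if (++block >= NWORDS_DEMO)
                return -1;                  /* out of bins: the use_top case */
        }
        while ((map = binmap[block]) == 0); /* skip empty words */
        bit = 1;
    }
    while ((bit & map) == 0)                /* walk to the set bit; one must exist now */
        bit <<= 1;
    return block * BITS_PER_WORD + __builtin_ctz(bit);
}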

use_top

victim = av->top;
size = chunksize (victim);

if ((unsigned long) (size) >= (unsigned long) (nb + MINSIZE))
{
remainder_size = size - nb;
remainder = chunk_at_offset (victim, nb);
av->top = remainder;
set_head (victim, nb | PREV_INUSE |
(av != &main_arena ? NON_MAIN_ARENA : 0));
set_head (remainder, remainder_size | PREV_INUSE);

check_malloced_chunk (av, victim, nb);
void *p = chunk2mem (victim);
alloc_perturb (p, bytes);
return p;
}

/* When we are using atomic ops to free fast chunks we can get
here for all block sizes. */
else if (have_fastchunks (av))
{
malloc_consolidate (av);
/* restore original bin index */
if (in_smallbin_range (nb))
idx = smallbin_index (nb);
else
idx = largebin_index (nb);
}
  • If the top chunk is big enough, the request is carved out of the top chunk
  • If it is not big enough but there are fast chunks, run malloc_consolidate, recompute idx, and go back to the start of the main loop; in practice this path is usually only reached when a small-chunk request enters the main loop
  • If it is not big enough and there are no fast chunks, fall back to sysmalloc

sysmalloc

else
{
void *p = sysmalloc (nb, av);
if (p != NULL)
alloc_perturb (p, bytes);
return p;
}
}
}

__libc_free

void
__libc_free (void *mem)
{
mstate ar_ptr;
mchunkptr p; /* chunk corresponding to mem */

void (*hook) (void *, const void *)
= atomic_forced_read (__free_hook);
if (__builtin_expect (hook != NULL, 0)) // the __free_hook call
{
(*hook)(mem, RETURN_ADDRESS (0));
return;
}

if (mem == 0) /* free(0) has no effect */
return;

p = mem2chunk (mem);

if (chunk_is_mmapped (p)) /* release mmapped memory. */
{
/* see if the dynamic brk/mmap threshold needs adjusting */
if (!mp_.no_dyn_threshold
&& p->size > mp_.mmap_threshold
&& p->size <= DEFAULT_MMAP_THRESHOLD_MAX)
{
mp_.mmap_threshold = chunksize (p);
mp_.trim_threshold = 2 * mp_.mmap_threshold;
LIBC_PROBE (memory_mallopt_free_dyn_thresholds, 2,
mp_.mmap_threshold, mp_.trim_threshold);
}
munmap_chunk (p);
return;
}

ar_ptr = arena_for_chunk (p);
_int_free (ar_ptr, p, 0);
}

Process:

  • If __free_hook is set, call it and return
  • free(0) has no effect and returns immediately
  • mmapped chunks get special handling
  • Otherwise _int_free is called as usual

_int_free

Initialization

static void
_int_free (mstate av, mchunkptr p, int have_lock)
{
INTERNAL_SIZE_T size; /* its size */
mfastbinptr *fb; /* associated fastbin */
mchunkptr nextchunk; /* next contiguous chunk */
INTERNAL_SIZE_T nextsize; /* its size */
int nextinuse; /* true if nextchunk is used */
INTERNAL_SIZE_T prevsize; /* size of previous contiguous chunk */
mchunkptr bck; /* misc temp for linking */
mchunkptr fwd; /* misc temp for linking */

const char *errstr = NULL;
int locked = 0;

size = chunksize (p);

Validity checks on the chunk

if (__builtin_expect ((uintptr_t) p > (uintptr_t) -size, 0)   // the pointer would wrap past the end of the address space
|| __builtin_expect (misaligned_chunk (p), 0)) // or p is misaligned
{
errstr = "free(): invalid pointer";
errout:
if (!have_lock && locked)
(void) mutex_unlock (&av->mutex);
malloc_printerr (check_action, errstr, chunk2mem (p), av);
return;
}
/* We know that each chunk is at least MINSIZE bytes in size or a
multiple of MALLOC_ALIGNMENT. */
if (__glibc_unlikely (size < MINSIZE || !aligned_OK (size))) // size < MINSIZE or size is not aligned
{
errstr = "free(): invalid size";
goto errout;
}

check_inuse_chunk(av, p);

fastbin

  if ((unsigned long)(size) <= (unsigned long)(get_max_fast ())   // size is in the fastbin range

#if TRIM_FASTBINS // 0 by default, so this clause is not compiled in
/*
If TRIM_FASTBINS set, don't place chunks
bordering top into fastbins
*/
&& (chunk_at_offset(p, size) != av->top)
#endif
) {

if (__builtin_expect (chunk_at_offset (p, size)->size <= 2 * SIZE_SZ, 0)
|| __builtin_expect (chunksize (chunk_at_offset (p, size)) // the next chunk's size must be sane
>= av->system_mem, 0))
{
/* We might not have a lock at this point and concurrent modifications
of system_mem might have let to a false positive. Redo the test
after getting the lock. */
if (have_lock
|| ({ assert (locked == 0);
mutex_lock(&av->mutex);
locked = 1; // lock taken
chunk_at_offset (p, size)->size <= 2 * SIZE_SZ
|| chunksize (chunk_at_offset (p, size)) >= av->system_mem;
}))
{
errstr = "free(): invalid next size (fast)";
goto errout;
}
if (! have_lock)
{
(void)mutex_unlock(&av->mutex);
locked = 0;
} // the caller did not want the lock held, so unlock and clear locked
}

free_perturb (chunk2mem(p), size - 2 * SIZE_SZ);

set_fastchunks(av);
unsigned int idx = fastbin_index(size); // compute the fastbin index
fb = &fastbin (av, idx); // fb is the list head

/* Atomically link P to its fastbin: P->FD = *FB; *FB = P; */
mchunkptr old = *fb, old2; // read the current head; old2 is for the CAS loop
unsigned int old_idx = ~0u;
do
{
/* Check that the top of the bin is not the record we are going to add
(i.e., double free). */
if (__builtin_expect (old == p, 0)) // the list head must not be the chunk being freed (double-free check)
{
errstr = "double free or corruption (fasttop)";
goto errout;
}
/* Check that size of fastbin chunk at the top is the same as
size of the chunk that we are adding. We can dereference OLD
only if we have the lock, otherwise it might have already been
deallocated. See use of OLD_IDX below for the actual check. */
if (have_lock && old != NULL)
old_idx = fastbin_index(chunksize(old));
p->fd = old2 = old;
}
while ((old = catomic_compare_and_exchange_val_rel (fb, p, old2)) != old2); // overall effect: atomically push p onto the fastbin

if (have_lock && old != NULL && __builtin_expect (old_idx != idx, 0))
{
errstr = "invalid fastbin entry (free)";
goto errout;
}
}

Further checks

 /*
Consolidate other non-mmapped chunks as they arrive.
*/

else if (!chunk_is_mmapped(p)) {
if (! have_lock) {
(void)mutex_lock(&av->mutex); // take the lock
locked = 1;
}

nextchunk = chunk_at_offset(p, size);

/* Lightweight tests: check whether the block is already the
top block. */
if (__glibc_unlikely (p == av->top)) // p must not be the top chunk
{
errstr = "double free or corruption (top)";
goto errout;
}
/* Or whether the next chunk is beyond the boundaries of the arena. */
if (__builtin_expect (contiguous (av)
&& (char *) nextchunk
>= ((char *) av->top + chunksize(av->top)), 0)) // with a contiguous arena, nextchunk must not lie beyond the arena
{
errstr = "double free or corruption (out)";
goto errout;
}
/* Or whether the block is actually not marked used. */
if (__glibc_unlikely (!prev_inuse(nextchunk))) // the prev_inuse bit of nextchunk must be set
{
errstr = "double free or corruption (!prev)";
goto errout;
}

nextsize = chunksize(nextchunk);
if (__builtin_expect (nextchunk->size <= 2 * SIZE_SZ, 0)
|| __builtin_expect (nextsize >= av->system_mem, 0)) // nextchunk's size must be sane
{
errstr = "free(): invalid next size (normal)";
goto errout;
}

free_perturb (chunk2mem(p), size - 2 * SIZE_SZ);

Consolidation

   /* consolidate backward */
if (!prev_inuse(p)) { // the previous chunk is free: merge backward with it
prevsize = p->prev_size;
size += prevsize;
p = chunk_at_offset(p, -((long) prevsize));
unlink(av, p, bck, fwd);
}

if (nextchunk != av->top) { // nextchunk is not the top chunk
/* get and clear inuse bit */
nextinuse = inuse_bit_at_offset(nextchunk, nextsize); // read whether nextchunk is in use (the prev_inuse bit of the chunk after it)

/* consolidate forward */
if (!nextinuse) { // the next chunk is free: merge forward with it
unlink(av, nextchunk, bck, fwd);
size += nextsize;
} else
clear_inuse_bit_at_offset(nextchunk, 0);

unsorted bin

     /*
Place the chunk in unsorted chunk list. Chunks are
not placed into regular bins until after they have
been given one chance to be used in malloc.
*/

bck = unsorted_chunks(av);
fwd = bck->fd;
if (__glibc_unlikely (fwd->bk != bck)) //bck->fd->bk==bck
{
errstr = "free(): corrupted unsorted chunks";
goto errout;
}
p->fd = fwd; // insert p into the unsorted bin
p->bk = bck;
if (!in_smallbin_range(size))
{
p->fd_nextsize = NULL;
p->bk_nextsize = NULL;
}
bck->fd = p;
fwd->bk = p;

set_head(p, size | PREV_INUSE);
set_foot(p, size);

check_free_chunk(av, p);
}

/*
If the chunk borders the current high end of memory,
consolidate into top
*/

else { // nextchunk is the top chunk: merge p into the top chunk
size += nextsize;
set_head(p, size | PREV_INUSE);
av->top = p;
check_chunk(av, p);
}

Trimming and consolidation

    /*
If freeing a large space, consolidate possibly-surrounding
chunks. Then, if the total unused topmost memory exceeds trim
threshold, ask malloc_trim to reduce top.

Unless max_fast is 0, we don't know if there are fastbins
bordering top, so we cannot tell for sure whether threshold
has been reached unless fastbins are consolidated. But we
don't want to consolidate on each free. As a compromise,
consolidation is performed if FASTBIN_CONSOLIDATION_THRESHOLD
is reached.
*/

if ((unsigned long)(size) >= FASTBIN_CONSOLIDATION_THRESHOLD) { // the freed (possibly merged) chunk is large enough
if (have_fastchunks(av))
malloc_consolidate(av); // consolidate the fastbins

if (av == &main_arena) {
#ifndef MORECORE_CANNOT_TRIM
if ((unsigned long)(chunksize(av->top)) >=
(unsigned long)(mp_.trim_threshold))
systrim(mp_.top_pad, av); // shrink the heap
#endif
} else {
/* Always try heap_trim(), even if the top chunk is not
large, because the corresponding heap might go away. */
heap_info *heap = heap_for_ptr(top(av));

assert(heap->ar_ptr == av);
heap_trim(heap, mp_.top_pad);
}
}

if (! have_lock) {
assert (locked);
(void)mutex_unlock(&av->mutex);
}
}
/*
If the chunk was allocated via mmap, release via munmap().
*/

else { // the chunk came from mmap, so release it with munmap_chunk
munmap_chunk (p);
}
}

Unless an error aborts it, _int_free normally runs straight through from start to finish.

Flow: fastbin -> consolidation -> unsorted bin -> trimming

glibc-2.27

The biggest difference is that glibc 2.26 introduced the tcache.

First, a look at the tcache helpers.

tcache

The tcache structures

/* We overlay this structure on the user-data portion of a chunk when
the chunk is stored in the per-thread cache. */
typedef struct tcache_entry
{
struct tcache_entry *next;
} tcache_entry;

/* There is one of these for each thread, which contains the
per-thread cache (hence "tcache_perthread_struct"). Keeping
overall size low is mildly important. Note that COUNTS and ENTRIES
are redundant (we could have just counted the linked list each
time), this is for performance reasons. */
typedef struct tcache_perthread_struct
{
char counts[TCACHE_MAX_BINS];
tcache_entry *entries[TCACHE_MAX_BINS];
} tcache_perthread_struct;

static __thread bool tcache_shutting_down = false;
static __thread tcache_perthread_struct *tcache = NULL;


tcache_get

/* Caller must ensure that we know tc_idx is valid and there's
available chunks to remove. */
static __always_inline void *
tcache_get (size_t tc_idx)
{
tcache_entry *e = tcache->entries[tc_idx];
assert (tc_idx < TCACHE_MAX_BINS);
assert (tcache->entries[tc_idx] > 0);
tcache->entries[tc_idx] = e->next;
--(tcache->counts[tc_idx]);
return (void *) e;
}

Takes one chunk of the corresponding size class out of the tcache.

tcache_put

/* Caller must ensure that we know tc_idx is valid and there's room
for more chunks. */
static __always_inline void
tcache_put (mchunkptr chunk, size_t tc_idx)
{
tcache_entry *e = (tcache_entry *) chunk2mem (chunk);
assert (tc_idx < TCACHE_MAX_BINS);
e->next = tcache->entries[tc_idx];
tcache->entries[tc_idx] = e;
++(tcache->counts[tc_idx]);
}

Puts a chunk into the corresponding tcache bin.
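
A user-space model of the pair (greatly simplified, hypothetical names, single-threaded, no glibc internals) makes the LIFO behaviour of each size class obvious:

#include <assert.h>
#include <stddef.h>

#define DEMO_BINS 64

typedef struct demo_entry { struct demo_entry *next; } demo_entry;

typedef struct {
    char        counts[DEMO_BINS];
    demo_entry *entries[DEMO_BINS];
} demo_cache;

/* Like tcache_put: push the freed block onto the list for its size class. */
static void cache_put(demo_cache *tc, void *block, size_t idx)
{
    demo_entry *e = (demo_entry *) block;
    e->next = tc->entries[idx];
    tc->entries[idx] = e;
    ++tc->counts[idx];
}

/* Like tcache_get: pop the most recently cached block (LIFO). */
static void *cache_get(demo_cache *tc, size_t idx)
{
    demo_entry *e = tc->entries[idx];
    assert(e != NULL);
    tc->entries[idx] = e->next;
    --tc->counts[idx];
    return e;
}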

__libc_malloc

#if USE_TCACHE
/* int_free also calls request2size, be careful to not pad twice. */
size_t tbytes;
checked_request2size (bytes, tbytes);
size_t tc_idx = csize2tidx (tbytes);

MAYBE_INIT_TCACHE ();

DIAG_PUSH_NEEDS_COMMENT;
if (tc_idx < mp_.tcache_bins
/*&& tc_idx < TCACHE_MAX_BINS*/ /* to appease gcc */
&& tcache
&& tcache->entries[tc_idx] != NULL)
{
return tcache_get (tc_idx);
}
DIAG_POP_NEEDS_COMMENT;
#endif

if (SINGLE_THREAD_P)
{
victim = _int_malloc (&main_arena, bytes);
assert (!victim || chunk_is_mmapped (mem2chunk (victim)) ||
&main_arena == arena_for_chunk (mem2chunk (victim)));
return victim;
}

__libc_malloc now contains tcache handling, so a chunk can be returned directly from __libc_malloc without entering _int_malloc.

There is also a new SINGLE_THREAD_P test: if the process is single-threaded, _int_malloc is called on main_arena directly and the result is then checked with an assert.

_int_malloc

fastbin

#define REMOVE_FB(fb, victim, pp)			\
do \
{ \
victim = pp; \
if (victim == NULL) \
break; \
} \
while ((pp = catomic_compare_and_exchange_val_acq (fb, victim->fd, victim)) != victim);


if ((unsigned long) (nb) <= (unsigned long) (get_max_fast ()))
{
idx = fastbin_index (nb);
mfastbinptr *fb = &fastbin (av, idx);
mchunkptr pp;
victim = *fb;

if (victim != NULL)
{
if (SINGLE_THREAD_P)
*fb = victim->fd;
else
REMOVE_FB (fb, pp, victim);
if (__glibc_likely (victim != NULL))
{
size_t victim_idx = fastbin_index (chunksize (victim));
if (__builtin_expect (victim_idx != idx, 0))
malloc_printerr ("malloc(): memory corruption (fast)");
check_remalloced_chunk (av, victim, nb);
#if USE_TCACHE
/* While we're here, if we see other chunks of the same size,
stash them in the tcache. */
size_t tc_idx = csize2tidx (nb);
if (tcache && tc_idx < mp_.tcache_bins)
{
mchunkptr tc_victim;

/* While bin not empty and tcache not full, copy chunks. */
while (tcache->counts[tc_idx] < mp_.tcache_count
&& (tc_victim = *fb) != NULL)
{
if (SINGLE_THREAD_P)
*fb = tc_victim->fd;
else
{
REMOVE_FB (fb, pp, tc_victim);
if (__glibc_unlikely (tc_victim == NULL))
break;
}
tcache_put (tc_victim, tc_idx);
}
}
#endif
void *p = chunk2mem (victim);
alloc_perturb (p, bytes);
return p;
}
}
}
  • There are now two ways a chunk is removed from a fastbin:

    • in the single-threaded case (SINGLE_THREAD_P), *fb = victim->fd; removes it directly
    • otherwise the REMOVE_FB macro is used, which is the same removal loop as in glibc 2.23
  • New fastbin-to-tcache stashing: after a chunk is taken from a fastbin, if that fastbin still has chunks and the matching tcache bin has room, the remaining fastbin chunks are moved into the tcache bin

smallbin

First, the old path where an uninitialized smallbin triggered malloc_consolidate was removed.

#if USE_TCACHE
/* While we're here, if we see other chunks of the same size,
stash them in the tcache. */
size_t tc_idx = csize2tidx (nb);
if (tcache && tc_idx < mp_.tcache_bins) // the tcache exists and tc_idx is a valid tcache bin
{
mchunkptr tc_victim;

/* While bin not empty and tcache not full, copy chunks over. */
while (tcache->counts[tc_idx] < mp_.tcache_count // the tcache bin is not full
&& (tc_victim = last (bin)) != bin) // and the smallbin still has chunks
{
if (tc_victim != 0)
{
bck = tc_victim->bk;
set_inuse_bit_at_offset (tc_victim, nb);
if (av != &main_arena)
set_non_main_arena (tc_victim);
bin->bk = bck;
bck->fd = bin;

tcache_put (tc_victim, tc_idx); // move the chunk into the tcache
}
}
}
#endif

The smallbin path gains the same tcache-stashing mechanism: after a chunk is taken from a smallbin, if the smallbin still has chunks and the matching tcache bin has room, the remaining smallbin chunks are moved into the tcache bin.

Note that during this stashing no integrity check is performed on the smallbin links.

The main loop

#if USE_TCACHE
INTERNAL_SIZE_T tcache_nb = 0;
size_t tc_idx = csize2tidx (nb);
if (tcache && tc_idx < mp_.tcache_bins)
tcache_nb = nb;
int return_cached = 0;

tcache_unsorted_count = 0;
#endif

Some variables are set up first.

unsorted bin

#if USE_TCACHE
/* Fill cache first, return to user only if cache fills.
We may return one of these chunks later. */
if (tcache_nb
&& tcache->counts[tc_idx] < mp_.tcache_count) // the exact-fit chunk from the unsorted bin fits the tcache and the bin has room
{
tcache_put (victim, tc_idx); // stash it into the tcache
return_cached = 1;
continue;
}
else
{
#endif

If an exact-fit chunk found in the unsorted bin also fits the tcache, it is first stashed into the tcache (return_cached = 1) instead of being returned immediately.

#if USE_TCACHE
/* If we've processed as many chunks as we're allowed while
filling the cache, return one of the cached ones. */
++tcache_unsorted_count;
if (return_cached
&& mp_.tcache_unsorted_limit > 0
&& tcache_unsorted_count > mp_.tcache_unsorted_limit)
{
return tcache_get (tc_idx);
}
#endif

After enough unsorted chunks have been processed (mp_.tcache_unsorted_limit) — and again once the traversal finishes — if a suitable chunk has already been cached, it is returned directly via tcache_get.

__libc_free

1
MAYBE_INIT_TCACHE ();

Adds a tcache initialization check: if the tcache does not exist yet, it is initialized.

_int_free

#if USE_TCACHE
{
size_t tc_idx = csize2tidx (size);

if (tcache
&& tc_idx < mp_.tcache_bins
&& tcache->counts[tc_idx] < mp_.tcache_count)
{
tcache_put (p, tc_idx);
return;
}
}
#endif

If the matching tcache bin has room, the freed chunk goes into the tcache first (the tcache has the highest priority).

fastbin

The fastbin path itself is essentially the same as in 2.23; because of the tcache_put above, a freed fast-sized chunk normally only reaches the fastbins once its tcache bin is already full.

glibc-2.29

tcache

The tcache_entry struct gains a new member, key.

typedef struct tcache_entry
{
struct tcache_entry *next;
/* This field exists to detect double frees. */
struct tcache_perthread_struct *key;
} tcache_entry;

__libc_malloc

No changes.

_int_malloc

Some new checks were added.

The main loop

At the start of the unsorted-bin loop

if (__glibc_unlikely (chunksize_nomask (next) < 2 * SIZE_SZ)
|| __glibc_unlikely (chunksize_nomask (next) > av->system_mem)) // next's size must be sane
malloc_printerr ("malloc(): invalid next size (unsorted)");
if (__glibc_unlikely ((prev_size (next) & ~(SIZE_BITS)) != size)) // next's prev_size (flags masked off) must equal victim's size
malloc_printerr ("malloc(): mismatching next->prev_size (unsorted)");
if (__glibc_unlikely (bck->fd != victim) // victim->bk->fd == victim
|| __glibc_unlikely (victim->fd != unsorted_chunks (av)))
malloc_printerr ("malloc(): unsorted double linked list corrupted");
if (__glibc_unlikely (prev_inuse (next)))
malloc_printerr ("malloc(): invalid next->prev_inuse (unsorted)");

When removing a chunk from the unsorted list (after the last_remainder split check)

if (__glibc_unlikely (bck->fd != victim))     //victim->bk->fd==victim
malloc_printerr ("malloc(): corrupted unsorted chunks 3");

__libc_free

No changes.

_int_free

if (__glibc_unlikely (e->key == tcache))
{
tcache_entry *tmp;
LIBC_PROBE (memory_tcache_double_free, 2, e, tc_idx);
for (tmp = tcache->entries[tc_idx];
tmp;
tmp = tmp->next)
if (tmp == e)
malloc_printerr ("free(): double free detected in tcache 2");
/* If we get here, it was a coincidence. We've wasted a
few cycles, but don't abort. */
}

When freeing, if e->key already equals tcache, the corresponding tcache bin is walked to see whether the chunk being freed is already in it; if so, a double free is reported.

glibc-2.32

tcache

tcache_put

e->next = PROTECT_PTR (&e->next, tcache->entries[tc_idx]);

The next pointer is mangled when a chunk is put into the tcache.

tcache_get

tcache->entries[tc_idx] = REVEAL_PTR (e->next);

It is demangled when a chunk is taken out of the tcache.

__libc_malloc

No changes.

_int_malloc

     if (SINGLE_THREAD_P)
*fb = REVEAL_PTR (tc_victim->fd);
*fb = REVEAL_PTR (tc_victim->fd);

When chunks are moved from the fastbins (including into the tcache), the mangled fd pointers are demangled with REVEAL_PTR.

__libc_free

No changes.

_int_free

p->fd = PROTECT_PTR (&p->fd, old);

When a chunk is pushed onto a fastbin its fd pointer is mangled; the tcache double-free scan likewise demangles the next pointers while walking the list.

tcache && fastbin pointer mangling (safe-linking)

/* Safe-Linking:
Use randomness from ASLR (mmap_base) to protect single-linked lists
of Fast-Bins and TCache. That is, mask the "next" pointers of the
lists' chunks, and also perform allocation alignment checks on them.
This mechanism reduces the risk of pointer hijacking, as was done with
Safe-Unlinking in the double-linked lists of Small-Bins.
It assumes a minimum page size of 4096 bytes (12 bits). Systems with
larger pages provide less entropy, although the pointer mangling
still works. */
#define PROTECT_PTR(pos, ptr) \
((__typeof (ptr)) ((((size_t) pos) >> 12) ^ ((size_t) ptr)))
#define REVEAL_PTR(ptr) PROTECT_PTR (&ptr, ptr)

tcache and fastbin gain this new singly-linked-list mangling.

The mangling works as follows: take the address of the chunk's fd (next) field, shift it right by 12 bits, and XOR it with the value being stored.

Demangling is simply the inverse of the same operation.

Note also that for a given fastbin or tcache bin list, looking from the list head, the pointer to the first chunk is not mangled; only the links from the first chunk onward (i.e. starting with the pointer to the second chunk) are mangled.
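
A tiny demo of the arithmetic, reusing the two macros quoted above on a fake two-chunk list (the actual values printed depend on where the stack happens to be on a given run):

#include <stdio.h>
#include <stddef.h>

#define PROTECT_PTR(pos, ptr) \
  ((__typeof (ptr)) ((((size_t) pos) >> 12) ^ ((size_t) ptr)))
#define REVEAL_PTR(ptr)  PROTECT_PTR (&ptr, ptr)

struct fake_chunk { struct fake_chunk *fd; };

int main(void)
{
    struct fake_chunk a, b;

    /* Store a mangled link a -> b: the pointer is XORed with the address
       of the location that holds it, shifted right by 12 bits. */
    a.fd = PROTECT_PTR(&a.fd, &b);
    printf("mangled fd stored in a: %p\n", (void *) a.fd);

    /* Reading it back demangles with the same location and recovers &b. */
    struct fake_chunk *next = REVEAL_PTR(a.fd);
    printf("demangled: %p, &b = %p\n", (void *) next, (void *) &b);
    return 0;
}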

glibc-2.34

In this version all of the hook functions (__malloc_hook, __free_hook, ...) were removed.

__libc_malloc

Mainly, the hook call was removed.

_int_malloc

No changes.

__libc_free

The hook call was removed.

_int_free

No changes.

tcache

key check

The key field of tcache_entry, used to detect double frees, used to be filled with the tcache pointer; it is now filled with tcache_key.

tcache_key is generated as follows:

/* Process-wide key to try and catch a double-free in the same thread.  */
static uintptr_t tcache_key;

/* The value of tcache_key does not really have to be a cryptographically
secure random number. It only needs to be arbitrary enough so that it does
not collide with values present in applications. If a collision does happen
consistently enough, it could cause a degradation in performance since the
entire list is checked to check if the block indeed has been freed the
second time. The odds of this happening are exceedingly low though, about 1
in 2^wordsize. There is probably a higher chance of the performance
degradation being due to a double free where the first free happened in a
different thread; that's a case this check does not cover. */
static void
tcache_key_initialize (void)
{
if (__getrandom (&tcache_key, sizeof(tcache_key), GRND_NONBLOCK)
!= sizeof (tcache_key))
{
tcache_key = random_bits ();
#if __WORDSIZE == 64
tcache_key = (tcache_key << 32) | random_bits ();
#endif
}
}

glibc-2.39

__libc_malloc

No changes.

_int_malloc

No changes.

__libc_free

No changes.

_int_free

Essentially unchanged; part of the work is now delegated to _int_free_merge_chunk (av, p, size).