glibc中的strlen实现 – 对不起我不是针对你

近来打算阅读glibc源码, 便从最简单的string.h开始. 说道string库, 第一个想到的是什么函数? 当然是strlen

于是翻开strlen.c , 大牛写的代码就是不一样, 总像是故意把原本简单的东西写得复杂, 仔细阅读才发现简直是神作.

对任何c程序员来说以下代码可以说是信手拈来:

size_t strlen(const char *str) {
    const char *str_iter = str;
    while(*str_iter++);
    return str_iter - str - 1;
}

原以为strlen无非就是做循环, 逐个字节做判断, 看了glibc的strlen实现才发现我之前简直是too young too simple!

先贴代码:

size_t
strlen (str)
     const char *str;
{
  const char *char_ptr;
  const unsigned long int *longword_ptr;
  unsigned long int longword, himagic, lomagic;

  /* Handle the first few characters by reading one character at a time.
     Do this until CHAR_PTR is aligned on a longword boundary.  */
  for (char_ptr = str; ((unsigned long int) char_ptr
            & (sizeof (longword) - 1)) != 0;
       ++char_ptr)
    if (*char_ptr == '')
      return char_ptr - str;

  /* All these elucidatory comments refer to 4-byte longwords,
     but the theory applies equally well to 8-byte longwords.  */

  longword_ptr = (unsigned long int *) char_ptr;

  /* Bits 31, 24, 16, and 8 of this number are zero.  Call these bits
     the "holes."  Note that there is a hole just to the left of
     each byte, with an extra at the end:

     bits:  01111110 11111110 11111110 11111111
     bytes: AAAAAAAA BBBBBBBB CCCCCCCC DDDDDDDD

     The 1-bits make sure that carries propagate to the next 0-bit.
     The 0-bits provide holes for carries to fall into.  */
  himagic = 0x80808080L;
  lomagic = 0x01010101L;
  if (sizeof (longword) > 4)
    {
      /* 64-bit version of the magic.  */
      /* Do the shift in two steps to avoid a warning if long has 32 bits.  */
      himagic = ((himagic << 16) << 16) | himagic;
      lomagic = ((lomagic << 16) << 16) | lomagic;
    }
  if (sizeof (longword) > 8)
    abort ();

  /* Instead of the traditional loop which tests each character,
     we will test a longword at a time.  The tricky part is testing
     if *any of the four* bytes in the longword in question are zero.  */
  for (;;)
    {
      longword = *longword_ptr++;

      if (((longword - lomagic) & ~longword & himagic) != 0)
    {
      /* Which of the bytes was the zero?  If none of them were, it was
         a misfire; continue the search.  */

      const char *cp = (const char *) (longword_ptr - 1);

      if (cp[0] == 0)
        return cp - str;
      if (cp[1] == 0)
        return cp - str + 1;
      if (cp[2] == 0)
        return cp - str + 2;
      if (cp[3] == 0)
        return cp - str + 3;
      if (sizeof (longword) > 4)
        {
          if (cp[4] == 0)
        return cp - str + 4;
          if (cp[5] == 0)
        return cp - str + 5;
          if (cp[6] == 0)
        return cp - str + 6;
          if (cp[7] == 0)
        return cp - str + 7;
        }
    }
    }
}

注释得挺详细, 可以看出这段代码高效就在于循环时不是逐字节判断, 而是逐字长判断

为什么快, 首先这符合循环展开(CSAPP p348), 这样一次循环就可以判断4个/8个字节. 其次, 从机器角度讲, 操作数为机器字长的指令相对较快

但一开始需要先处理一些没对齐的情况, 直到指针指向第一个地址对齐的字节

(i386没有强制要求边界对齐, 但对齐了可以提高处理效率. 这里应该是出于可移植性的考虑, 因为某些cpu需要对齐)

代码最关键的地方在于50行中: (longword – lomagic) & ~longword & himagic

(在旧版本中是(longword – lomagic) & himagic, 但存在一些缺陷)

这个表达式是为了判断longword中是否存在某字节为0, 如果存在则返回0, 否则不为0

对于这行代码, 想了很久, 终于理解了它的正确性.

一: 假设机器字长只有一个字节, longword 也是一个字节

lomagic: 0000 0001
himagic: 1000 0000

(longword – lomagic) & ~longword & himagic 等效为 (longword – 1) & (~longword) & 0x80

只需要关注最高位, (longword-1) 和(~longword) 中任何一个最高位等于0, 则整个表达式为0

1. 如果(longword >= 0x80), 那么~longword 最高位则为0, 于是表达式的值为0

2. 如果(0 < longword < 0x80), 那么 0 <= (longword -1) <= (0x80-2) , 即(longword -1) 最高位为0, 表达式的值为0

3. 如果 (longword == 0) 有 ongword-1 == 255, ~longword == 255, 表达式结果为0x80, 为非0

也就是当(longword == 0) 时结果是非0, (longword != 0) 时结果为0

那么对于单个字节, 表达式的正确性已经得到验证

二. 再假设机器字长为32位, 那么两个魔数位模式如下

          high                                   low
            byte3,     byte2,     byte1,     byte0
lomagic:  0000 0001, 0000 0001, 0000 0001, 0000 0001
himagic:  1000 0000, 1000 0000, 1000 0000, 1000 0000

longword: xxxx xxxx, xxxx xxxx, xxxx xxxx, xxxx xxxx
result:   xxxx xxxx, xxxx xxxx, xxxx xxxx, xxxx xxxx

以result表示表达式计算结果

1. 如果任何字节都不等于0, 也就是每个字节都大于等于1

那么 longword-lomagic 相当于每个字节分别减去1, 不存在借位

后面的取反以及按位与操作, 都可以看作是4个独立的字节单独计算的, 互不影响, 此时表达式的计算结果还是0

2. 如果有一个或者多个字节等于0, 考察值为0的最低字节, 比如longword中byte3, byte1都等于0, 考察其中的byte1

由于从低到高, byte1是第一个值等于0的字节, byte0不会向byte1借位

所以byte1对应的字节仍然可以看作单独计算 (longword_byte1 – 0x01) & ~longword_byte1 & 0x80

于是result中byte1字节处的值不等于0, 整个结果也就非0

第一步中 longword – lomagic 的时候, longword的byte1会向byte2借一位, 如果单独考察byte2做减法时相当于(byte2 – 0x02).

但这不会影响整体的结果, 因为结果中byte1处是非0, 无论byte2结果如何, 总体结果都是非0

32位如此, 64位也是如此

而当遇到字当中存在值为0的字节, 便继续判断是具体哪个字节, 后面的都比较简单了

仔细阅读注释, 可以看到很多精辟的技巧.

字长操作代替字节操作也可以应用到其他函数, 比如strcpy, memcpy, memset