Redis之字典

2022-06-14 02:00:41

字典结构

字典(dict)是Redis中使用最频繁的数据结构，除了hash会使用到字典(dict)，整个Redis数据库中的所有key/value也组成了一个全局字典，

带有过期时间的key集合也是一个字典，zset集合中的value和score值映射也是通过字典实现的，Set也是字典实现，只是Value都是NULL。

字典结构中包含两个hashtable，一般情况下只有一个hashtable有值，在字典拓容缩容时，需要分配新的hashtable, 此时需要进行渐进式搬迁，

这时候两个hashtable一个存储旧数据，另一个存储新数据，搬迁结束后，旧hashtable被删除，新hashtable取而代之。

struct dict {

  dictht ht[2];
  ....

}

hashtable结构与Java的HashMap基本类似，都通过分桶的方式解决hash冲突，一维结构是数组，二维结构是链表，数组每个节点存储链表的第一个元素指针。

struct dictht {
    dictEntry** table; //链表
    long size; // 一维数组长度
    long used; // hash表中的元素个数
    ....
}


struct dictEntry {
    void* key;
    void* val;
    dictEntry* next; //下一个entry
}

渐进式hash

字典比较大的情况下，扩容非常耗费时间，因为要重新申请新的数组，然后将旧字典中的所有链表重新hash到新数组中，是O(n)级别的操作，会引起Redis阻塞，因此redis使用渐进式rehash，一点一点扩容。

dictEntry *dictAddRaw(dict *d, void *key, dictEntry **existing)
{
    long index;
    dictEntry *entry;
    dictht *ht;

    // 渐进式rehash
    if (dictIsRehashing(d)) _dictRehashStep(d);

    /* Get the index of the new element, or -1 if
     * the element already exists. */
    if ((index = _dictKeyIndex(d, key, dictHashKey(d,key), existing)) == -1)
        return NULL;

    /* Allocate the memory and store the new entry.
     * Insert the element in top, with the assumption that in a database
     * system it is more likely that recently added entries are accessed
     * more frequently. */
    // 如果字典处于搬迁中，将新元素挂在新的数组下面
    ht = dictIsRehashing(d) ? &d->ht[1] : &d->ht[0];
    entry = zmalloc(sizeof(*entry));
    entry->next = ht->table[index];
    ht->table[index] = entry;
    ht->used++;

    /* Set the hash entry fields. */
    dictSetKey(d, entry, key);
    return entry;
}

搬迁操作可以在对字典操作的hset、hdel等指令中进行，如果没有指令到来，Redis继续在定时任务中对字典进行主动搬迁，直到搬迁完毕。

// 定时任务
void databasesCron(void) {
    ......
        /* Rehash 继续搬迁 */
        if (server.activerehashing) {
            for (j = 0; j < dbs_per_call; j++) {
                int work_done = incrementallyRehash(rehash_db);
                if (work_done) {
                    /* If the function did some work, stop here, we'll do
                     * more at the next cron loop. */
                    break;
                } else {
                    /* If this db didn't need rehash, we'll try the next one. */
                    rehash_db++;
                    rehash_db %= server.dbnum;
                }
            }
        }
}

查找过程

新增和删除操作都必须先定位元素所在的链表上，然后对链表元素进行操作，redis中的hash函数会将key映射为一个整数，不同的key会被分布成比较均匀散乱的整数，只有hash值均匀，字典才是稳定平衡的，链表中的元素不会过多，查找起来速度也就更快。

func get(key) {
    let index = hash_func(key) % size;
    let entry = table[index];
    while(entry != NULL) {
        if entry.key == target {
            return entry.value;
        }
        entry = entry.next;
    }

}

hash函数与hash攻击

字典的性能好不好取决于hash函数，key被分布的越均匀性能就越好，redis默认的hash函数是siphash, 该算法在输入的key很小的情况下，也可以产生随机性很好的结果。

如果hash函数存在偏向性，此时客户端大量输入特定的key，将会导致字典的二维链表元素长度过长，当很多的元素在链表中时，字典的查找复杂度由O(1)退化的O(N)，服务器可能会被拖垮，这种就是hash攻击。

拓容条件与缩容条件

正常情况下，当字典中的元素个数超过第一维数组时，将进行拓容，扩容的新数组大小是原来的2倍，但是redis如果正在做bgsave，为了减少内存页过多的COW，则尽量不去拓容，但是如果字典中的元素个数达到了一维数组的5倍，此时将会强制拓容。

static int _dictExpandIfNeeded(dict *d)
{
    /* Incremental rehashing already in progress. Return. */
    if (dictIsRehashing(d)) return DICT_OK;

    /* If the hash table is empty expand it to the initial size. */
    if (d->ht[0].size == 0) return dictExpand(d, DICT_HT_INITIAL_SIZE);

    /* If we reached the 1:1 ratio, and we are allowed to resize the hash
     * table (global setting) or we should avoid it but the ratio between
     * elements/buckets is over the "safe" threshold, we resize doubling
     * the number of buckets. */
    if (d->ht[0].used >= d->ht[0].size &&
        (dict_can_resize ||
         d->ht[0].used/d->ht[0].size > dict_force_resize_ratio))
    {
        return dictExpand(d, d->ht[0].used*2);
    }
    return DICT_OK;
}

当字典中的元素被删除的越来越稀疏时，redis将对字典进行缩容操作，减少一维数组空间占用，缩容条件是元素个数低于数组长度的10%，并且缩容并不会考虑Redis是否在bgsave。

int htNeedsResize(dict *dict) {
    long long size, used;

    size = dictSlots(dict);
    used = dictSize(dict);
    return (size > DICT_HT_INITIAL_SIZE &&
            (used*100/size < HASHTABLE_MIN_FILL));
}

码农公寓

字典结构

渐进式hash

查找过程

hash函数与hash攻击

拓容条件与缩容条件

相关文章