Redis数据类型-hashes总结

2022-10-14 13:27:34

Redis的hashes类型是用来存储行记录的数据类型，一个key可以存储多条记录。

一、基本使用

HSET key field value

1、HSET是新增数据语法

2、key 是存储的数据key

3、field 是hash表中的某条记录名称

4、value是hash表某条数据的值

HGET key field

1、 hget是获取行数据的语法

2、根据key和field获取某行记录值

二、使用特点

1、field和value都是字符串，一个key对应多个field和value

2、某个field不能单独设置过期时间，只能对某个key设置过期时间

3、存在数据量大的时候，数据分布不均匀的情况

三、存储实现

hash类型总共有两种编码类型，一种是ziplist,一种是hashtable。

什么情况下使用ziplist呢？redis提供了两个配置项，如果hash表中

小于512个元素或者元素值小于64字节时使用ziplist。

//小于512个元素hash-max-ziplist-entries 512//元素值小于64字节hash-max-ziplist-value 64

总结：元素个数小，元素值内容占用小，用ziplist。

四、ziplist存储结构

在源码ziplist.c中，说明了ziplist的整体结构，如下：

<zlbytes> <zltail> <zllen> <entry> <entry> ... <entry> <zlend>

zlbytes:是一个32位无符号整数，用于保存 ziplist 占用的字节数。

zltail:是列表中最后一个fiedl的偏移量，32位无符号整数

zllen:field的数量，16位无符号整数

entry:具体的field内容

zllen:是一个特殊的条目，代表 ziplist 的结尾

具体field结构

<prevlen> <encoding> <entry-data>

prevlen：代表前一个元素长度，如果该长度小于 254 字节，它将仅消耗一个字节，表示该长度为未编码的8位整数。当长度大于等于254时，会消耗5个字节。第一个字节设置为 254 (FE) 以指示后面有更大的值。剩下的 4 个字节取前一个条目的长度作为值

encoding：是编码类型，字符串或者整型，编码属性和内容息息相关。

当entry是字符串时，编码第一个字节的前2位会保存用于存储字符串长度的编码类型，后面是字符串的实际长度
当条目是整数时，前2位都设置为1。接下来的2位用于指定将在此标头后存储哪种整数。

field的存储结构如下图，ziplist是一个字节数组，重复考虑每一个字节的使用，不浪费空间，充分提高内存利用率。

五、hashtable存储结构

在redis的dict.h中定义了hashtable的结构，它和java中的hashmap类似，也是适应数组+链表实现

整体结构图，有两个hashtable,但是只会有一个使用到，另外一个是扩容的时候使用。

扩容条件分析

既然是hashtable，那他和hashmap一样也会存在扩容情况，在redis中由一个标记位和负载因子决定是否扩容

//是否需要扩容static int dict_can_resize = 1;//扩容负载因子static unsigned int dict_force_resize_ratio = 5;

/* 扩容条件判断 Expand the hash table if needed */static int _dictExpandIfNeeded(dict *d){    /* Incremental rehashing already in progress. Return. */    if (dictIsRehashing(d)) return DICT_OK;    /* 初始化If the hash table is empty expand it to the initial size. */    if (d->ht[0].size == 0) return dictExpand(d, DICT_HT_INITIAL_SIZE);    /* If we reached the 1:1 ratio, and we are allowed to resize the hash     * table (global setting) or we should avoid it but the ratio between     * elements/buckets is over the "safe" threshold, we resize doubling     * the number of buckets. */     //使用的空间大于分配的空间    if (d->ht[0].used >= d->ht[0].size &&    //扩容开关打开或者使用的空间大小大于5倍        (dict_can_resize ||         d->ht[0].used/d->ht[0].size > dict_force_resize_ratio))    {        return dictExpand(d, d->ht[0].used*2);    }    return DICT_OK;}```

什么情况下不能扩容呢？

void updateDictResizePolicy(void) {//rdb,aof子进程正在进行备份数据，禁止扩容    if (server.rdb_child_pid == -1 && server.aof_child_pid == -1)        dictEnableResize();    else        dictDisableResize();}  void dictEnableResize(void) {    dict_can_resize = 1;}  void dictDisableResize(void) {    dict_can_resize = 0;}

扩容是否会影响吞吐量和性能？

答案是不会的，redis采用了渐近性的扩容方式，每次add,del,get操作都会对1个数组下标元素进行迁移，直到所有元素迁移完，将dict1指向dict0，重新分配内存给dict0。

例如新增一个元素：

dictEntry *dictAddRaw(dict *d, void *key, dictEntry **existing){    long index;    dictEntry *entry;    dictht *ht;    //尝试帮助扩容一个数组下标位置    if (dictIsRehashing(d)) _dictRehashStep(d);    /* Get the index of the new element, or -1 if     * the element already exists. */    //获取下标，并尝试扩容    if ((index = _dictKeyIndex(d, key, dictHashKey(d,key), existing)) == -1)        return NULL;    /* Allocate the memory and store the new entry.     * Insert the element in top, with the assumption that in a database     * system it is more likely that recently added entries are accessed     * more frequently. */     //如果在扩容中，则新元素加到第2个hashtable数组里    ht = dictIsRehashing(d) ? &d->ht[1] : &d->ht[0];    entry = zmalloc(sizeof(*entry));    entry->next = ht->table[index];    ht->table[index] = entry;    ht->used++;    /* Set the hash entry fields. */    dictSetKey(d, entry, key);    return entry;}

hashtable总结

通过`渐近性扩容方式`，每个新增，删除，查询操作都协助迁移一个元素从第一个hash表到第2个hahs表。

对于新增，修改，删除，查询操作都不会影响，因为有标记，判断是否在扩容中，扩容中则操作第2个hash表，不在扩容中则操作第1个hash表。

六、hash类型总结

hashes存储类型是用来存储多行记录的存储类型，底层使用了两种编码类型来存储hash类型的数据，在设计ziplist时候，充分利用了每一个字节的作用，也是为了精细化的设计这个结构，提升hash结构的存储性能。

hashtable从扩容方式上对性能有充分的考量，基于分而治之的思想，扩容操作分担给多个客户端操作（多个线程），避免长时间的扩容等待。

码农公寓

相关文章