7. Python3源码—Dict对象

7.1. 散列表

散列表的基本思想,是通过一定的函数将需搜索的键值映射为一个整数,将这个整数视为索引值去访问某片连续的内存区域。理论上,在最优情况下,散列表能提供O(1)复杂度的搜索效率。

用于映射的函数称为散列函数(hash function),而映射后的值称为元素的散列值(hash value)。在散列表的实现中,所选择的散列函数的优劣将直接决定所实现的散列表的搜索效率的高低。

在使用散列表的过程中,不同的对象经过散列函数的作用,可能被映射为相同的散列值。而且随着需要存储的数据的增多,这样的冲突就会发生得越来越频繁。散列冲突是散列技术与生俱来的问题。

装载率是散列表中已使用空间和总空间的比值。如果散列表一共可以容纳10个元素,而当前已经装入了6个元素,那么装载率就是6/10。研究表明,当散列表的装载率大于2/3时,散列冲突发生的概率就会大大增加。

当产生散列冲突时,Python会通过一个二次探测函数f,计算下一个候选位置addr,如果位置addr可用,则可将待插入元素放到位置addr;如果位置addr不可用,则Python会再次使用探测函数f,获得下一个候选位置,如此不断探测,总会找到一个可用的位置。

当需要删除某条探测链上的某个元素时,问题就产生了。假如这条链的首元素位置为a,尾元素的位置为c,现在需要删除中间的某个位置b上的元素。如果直接将位置b上的元素删除,则会导致探测链的断裂,从而找不到c。所以,在采用开放定址的冲突解决策略的散列表中,删除某条探测链上的元素时不能进行真正的删除,而是进行一种“伪删除”操作,必须要让该元素还存在于探测链上,担当承前启后的重任。

7.2. Dict对象数据结构

7.2.1. Dict对象的C源码

PyDictObject的C源码如下:

// dictobject.h
typedef struct _dictkeysobject PyDictKeysObject;

/* The ma_values pointer is NULL for a combined table
 * or points to an array of PyObject* for a split table
 */
typedef struct {
    PyObject_HEAD

    /* Number of items in the dictionary */
    Py_ssize_t ma_used;

    /* Dictionary version: globally unique, value change each time the dictionary is modified */
    uint64_t ma_version_tag;

    PyDictKeysObject *ma_keys;

    /* If ma_values is NULL, the table is "combined": keys and values are stored in ma_keys.
       If ma_values is not NULL, the table is splitted: keys are stored in ma_keys and values are stored in ma_values */
    PyObject **ma_values;
} PyDictObject;

PyDictObject结构里最重要的一个属性是ma_keys,在combined table模式下ma_keys存着key和value。ma_keys为PyDictKeysObject 类型,PyDictKeysObject的C源码如下:

// dict-common.h
typedef struct {
    /* Cached hash code of me_key. */
    Py_hash_t me_hash;
    PyObject *me_key;
    PyObject *me_value; /* This field is only meaningful for combined tables */
} PyDictKeyEntry;

/* dict_lookup_func() returns index of entry which can be used like DK_ENTRIES(dk)[index].
 * -1 when no entry found, -3 when compare raises error.
 */
typedef Py_ssize_t (*dict_lookup_func)
    (PyDictObject *mp, PyObject *key, Py_hash_t hash, PyObject **value_addr);

/* See dictobject.c for actual layout of DictKeysObject */
struct _dictkeysobject {
    Py_ssize_t dk_refcnt;

    /* Size of the hash table (dk_indices). It must be a power of 2. */
    Py_ssize_t dk_size;

    /* Function to lookup in the hash table (dk_indices): */
    dict_lookup_func dk_lookup;

    /* Number of usable entries in dk_entries. */
    Py_ssize_t dk_usable;

    /* Number of used entries in dk_entries. */
    Py_ssize_t dk_nentries;

    /* Actual hash table of dk_size entries. It holds indices in dk_entries,
       or DKIX_EMPTY(-1) or DKIX_DUMMY(-2).

       Indices must be: 0 <= indice < USABLE_FRACTION(dk_size).

       The size in bytes of an indice depends on dk_size:

       - 1 byte if dk_size <= 0xff (char*)
       - 2 bytes if dk_size <= 0xffff (int16_t*)
       - 4 bytes if dk_size <= 0xffffffff (int32_t*)
       - 8 bytes otherwise (int64_t*)

       Dynamically sized, 8 is minimum. */
    union {
        int8_t as_1[8];
        int16_t as_2[4];
        int32_t as_4[2];
#if SIZEOF_VOID_P > 4
        int64_t as_8[1];
#endif
    } dk_indices;
};

7.2.2. PyDictKeysObject内存布局

内存布局如下:

+---------------+
| dk_refcnt     |
| dk_size       |
| dk_lookup     |
| dk_usable     |
| dk_nentries   |
+---------------+
| dk_indices    |
|               |
+---------------+
| dk_entries    |
|               |
+---------------+

其中:

  • dk_indices是实际的hash表,它保存dk_entries中的index,或DKIX_EMPTY(-1),或DKIX_DUMMY(-2)。dk_indices的size为dk_size。
  • dk_size不同,index的类型也不同,如下表。值得注意的是,由于-1和-2有特殊的含义,所以索引必须是有符号整数,所以dk_size <= 128的时候使用int8,而不是dk_size <=256的时候使用int8。
int8  for          dk_size <= 128
int16 for 256   <= dk_size <= 2**15
int32 for 2**16 <= dk_size <= 2**31
int64 for 2**32 <= dk_size
  • dk_entries是PyDictKeyEntry类型的数组,数组的size是USABLE_FRACTION(dk_size),即可dk_size * 2/3。通过DK_ENTRIES(dk)可以获取dk_entries的指针。

7.2.3. PyDictKeysObject的两种形式

分为combined table和split table:

  • combined table:ma_values == NULL,dk_refcnt == 1,ma_keys存储key和值;
  • split table:ma_values != NULL,dk_refcnt >= 1,ma_keys存储key,me_value数组里存储值;

下面的分析基于combined table。

7.3. Dict对象

Dict对象是“变长对象”。

7.3.1. Python中的创建

Python中Dict对象最重要的创建方法为PyDict_New,如下Python语句最终会调用到PyDict_New:

test = {1:'hello'}

7.3.2. PyDict_New的C调用栈

// compile.c
PyAST_CompileObject

// symtable.c
=>PySymtable_BuildObject
=>symtable_new

// dictobject.c
=> PyDict_New

7.3.3. PyDict_New源码

// dictobject.c
#define USABLE_FRACTION(n) (((n) << 1)/3)
#define PyDict_MINSIZE 8

#define DK_SIZE(dk) ((dk)->dk_size)

#define DK_IXSIZE(dk)                          \
    (DK_SIZE(dk) <= 0xff ?                     \
        1 : DK_SIZE(dk) <= 0xffff ?            \
            2 : DK_SIZE(dk) <= 0xffffffff ?    \
                4 : sizeof(int64_t))

#define DK_ENTRIES(dk) \
    ((PyDictKeyEntry*)(&(dk)->dk_indices.as_1[DK_SIZE(dk) * DK_IXSIZE(dk)]))

PyObject *
PyDict_New(void)
{
    PyDictKeysObject *keys = new_keys_object(PyDict_MINSIZE);
    if (keys == NULL)
        return NULL;
    return new_dict(keys, NULL);
}

new_keys_object用于创建PyDictKeysObject对象。计算usable,如果缓冲区没有可用的PyDictKeysObject对象则计算PyDictKeysObject对象的实际大小并分配内存,然后各种设置,最后设置查找函数lookdict_unicode_nodummy,并memset内存。源码如下:

// dictobject.c
static PyDictKeysObject *new_keys_object(Py_ssize_t size)
{
    PyDictKeysObject *dk;
    Py_ssize_t es, usable;

    assert(size >= PyDict_MINSIZE);
    assert(IS_POWER_OF_2(size));

    usable = USABLE_FRACTION(size);
    if (size <= 0xff) {
        es = 1;
    }
    else if (size <= 0xffff) {
        es = 2;
    }
#if SIZEOF_VOID_P > 4
    else if (size <= 0xffffffff) {
        es = 4;
    }
#endif
    else {
        es = sizeof(Py_ssize_t);
    }

    if (size == PyDict_MINSIZE && numfreekeys > 0) {
        dk = keys_free_list[--numfreekeys];
    }
    else {
        dk = PyObject_MALLOC(sizeof(PyDictKeysObject)
                             - Py_MEMBER_SIZE(PyDictKeysObject, dk_indices)
                             + es * size
                             + sizeof(PyDictKeyEntry) * usable);
        if (dk == NULL) {
            PyErr_NoMemory();
            return NULL;
        }
    }
    DK_DEBUG_INCREF dk->dk_refcnt = 1;
    dk->dk_size = size;
    dk->dk_usable = usable;
    dk->dk_lookup = lookdict_unicode_nodummy;
    dk->dk_nentries = 0;
    memset(&dk->dk_indices.as_1[0], 0xff, es * size);
    memset(DK_ENTRIES(dk), 0, sizeof(PyDictKeyEntry) * usable);
    return dk;
}

new_dict创建PyDictObject 对象,PyDictKeysObject对象的壳。如果缓冲区没有可用的PyDictObject对象则分配内存,并且各种设置。源码如下:

// dictobject.c
static PyObject *
new_dict(PyDictKeysObject *keys, PyObject **values)
{
    PyDictObject *mp;
    assert(keys != NULL);
    if (numfree) {
        mp = free_list[--numfree];
        assert (mp != NULL);
        assert (Py_TYPE(mp) == &PyDict_Type);
        _Py_NewReference((PyObject *)mp);
    }
    else {
        mp = PyObject_GC_New(PyDictObject, &PyDict_Type);
        if (mp == NULL) {
            DK_DECREF(keys);
            free_values(values);
            return NULL;
        }
    }
    mp->ma_keys = keys;
    mp->ma_values = values;
    mp->ma_used = 0;
    mp->ma_version_tag = DICT_NEXT_VERSION();
    assert(_PyDict_CheckConsistency(mp));
    return (PyObject *)mp;
}

可以看到:

  • PyDictKeysObject对象缓冲:通过操作numfreekeys实现,numfreekeys在free_keys_object和dictresize方法中进行调整。
// dictobject.c
#define PyDict_MAXFREELIST 80

static PyDictKeysObject *keys_free_list[PyDict_MAXFREELIST];
static int numfreekeys = 0;
  • PyDictObject对象缓冲:通过操作numfree实现,numfree在dict_dealloc方法中进行调整。
// dictobject.c
#define PyDict_MAXFREELIST 80

static PyDictObject *free_list[PyDict_MAXFREELIST];
static int numfree = 0;

7.4. Dict对象的查找

Dict对象的查找是Dict对象最重要的方法。Python Dict对象默认的查找方法为lookdict_unicode_nodummy,在lookdict_unicode_nodummy方法里会判断如果key不是unicode类型,则将查找方法设置为lookdict,并调用lookdict进行查找。

lookdict_unicode_nodummy与lookdict最重要的区别在于,hash相同的情况下对key的比对,lookdict_unicode_nodummy调用的unicode_eq,而lookdict调用的PyObject_RichCompareBool。由于Python源码实现中大量的使用string作为key的dict对象,所以这是一项优化。但是对于整个查找机制而言,只要分析明白其中一个方法即可。

lookdict_unicode_nodummy源码如下:

// dictobject.c
#define DK_MASK(dk) (((dk)->dk_size)-1)

static Py_ssize_t _Py_HOT_FUNCTION
lookdict_unicode_nodummy(PyDictObject *mp, PyObject *key,
                   Py_hash_t hash, PyObject **value_addr)
{
    assert(mp->ma_values == NULL);
    if (!PyUnicode_CheckExact(key)) {
        mp->ma_keys->dk_lookup = lookdict;
        return lookdict(mp, key, hash, value_addr);
    }

    PyDictKeyEntry *ep0 = DK_ENTRIES(mp->ma_keys);
    size_t mask = DK_MASK(mp->ma_keys);
    size_t perturb = (size_t)hash;
    size_t i = (size_t)hash & mask;

    for (;;) {
        Py_ssize_t ix = dk_get_index(mp->ma_keys, i);
        assert (ix != DKIX_DUMMY);
        if (ix == DKIX_EMPTY) {
            *value_addr = NULL;
            return DKIX_EMPTY;
        }
        PyDictKeyEntry *ep = &ep0[ix];
        assert(ep->me_key != NULL);
        assert(PyUnicode_CheckExact(ep->me_key));
        if (ep->me_key == key ||
            (ep->me_hash == hash && unicode_eq(ep->me_key, key))) {
            *value_addr = ep->me_value;
            return ix;
        }
        perturb >>= PERTURB_SHIFT;
        i = mask & (i*5 + perturb + 1);
    }
    Py_UNREACHABLE();
}

lookdict_unicode_nodummy中通过:

i = (size_t)hash & mask;
Py_ssize_t ix = dk_get_index(dk, i);

计算i在dk_entries中的索引。

如果该索引:

1. DKIX_EMPTY,即没有找到,则value=NULL,并返回索引值(DKIX_EMPTY);
2. DKIX_DUMMY,跳转到4;
3. 索引值ix >= 0:
    3.1. 如果key对象完全一致,则返回value和索引值;
    3.2. 如果hash值一致,则调用unicode_eq或PyObject_RichCompareBool 比较key
        3.2.1. 如果key相等,则返回value和索引;
        3.2.2. 否则跳转到4;
    3.3. 否则跳转到4
4. 根据探测函数(i = mask & (i*5 + perturb + 1))计算下一个ix;

所以lookdict_unicode_nodummy方法:

1. 要么返回NULL和DKIX_EMPTY;
2. 要么返回value和索引;

7.5. Dict对象的维护

7.5.1. 设置/添加元素

如下Python语句最终会调用到PyDict_SetItem:

test = {100: 200}

PyDict_SetItem的C调用栈:

// pystate.c
PyInterpreterState_New

// ceval.c
=>_PyEval_EvalFrameDefault (case BUILD_MAP)

// dictobject.c
=> PyDict_SetItem

PyDict_SetItem源码:

// dictobject.c
int
PyDict_SetItem(PyObject *op, PyObject *key, PyObject *value)
{
    PyDictObject *mp;
    Py_hash_t hash;
    if (!PyDict_Check(op)) {
        PyErr_BadInternalCall();
        return -1;
    }
    assert(key);
    assert(value);
    mp = (PyDictObject *)op;
    if (!PyUnicode_CheckExact(key) ||
        (hash = ((PyASCIIObject *) key)->hash) == -1)
    {
        hash = PyObject_Hash(key);
        if (hash == -1)
            return -1;
    }

    /* insertdict() handles any resizing that might be necessary */
    return insertdict(mp, key, hash, value);
}

PyDict_SetItem方法中做了一下检查,调用insertdict方法:

// dictobject.c
static int
insertdict(PyDictObject *mp, PyObject *key, Py_hash_t hash, PyObject *value)
{
    PyObject *old_value;
    PyDictKeyEntry *ep;

    Py_INCREF(key);
    Py_INCREF(value);
    if (mp->ma_values != NULL && !PyUnicode_CheckExact(key)) {
        if (insertion_resize(mp) < 0)
            goto Fail;
    }

    Py_ssize_t ix = mp->ma_keys->dk_lookup(mp, key, hash, &old_value);
    if (ix == DKIX_ERROR)
        goto Fail;

    assert(PyUnicode_CheckExact(key) || mp->ma_keys->dk_lookup == lookdict);
    MAINTAIN_TRACKING(mp, key, value);

    /* When insertion order is different from shared key, we can't share
     * the key anymore.  Convert this instance to combine table.
     */
    if (_PyDict_HasSplitTable(mp) &&
        ((ix >= 0 && old_value == NULL && mp->ma_used != ix) ||
         (ix == DKIX_EMPTY && mp->ma_used != mp->ma_keys->dk_nentries))) {
        if (insertion_resize(mp) < 0)
            goto Fail;
        ix = DKIX_EMPTY;
    }

    if (ix == DKIX_EMPTY) {
        /* Insert into new slot. */
        assert(old_value == NULL);
        if (mp->ma_keys->dk_usable <= 0) {
            /* Need to resize. */
            if (insertion_resize(mp) < 0)
                goto Fail;
        }
        Py_ssize_t hashpos = find_empty_slot(mp->ma_keys, hash);
        ep = &DK_ENTRIES(mp->ma_keys)[mp->ma_keys->dk_nentries];
        dk_set_index(mp->ma_keys, hashpos, mp->ma_keys->dk_nentries);
        ep->me_key = key;
        ep->me_hash = hash;
        if (mp->ma_values) {
            assert (mp->ma_values[mp->ma_keys->dk_nentries] == NULL);
            mp->ma_values[mp->ma_keys->dk_nentries] = value;
        }
        else {
            ep->me_value = value;
        }
        mp->ma_used++;
        mp->ma_version_tag = DICT_NEXT_VERSION();
        mp->ma_keys->dk_usable--;
        mp->ma_keys->dk_nentries++;
        assert(mp->ma_keys->dk_usable >= 0);
        assert(_PyDict_CheckConsistency(mp));
        return 0;
    }

    if (_PyDict_HasSplitTable(mp)) {
        mp->ma_values[ix] = value;
        if (old_value == NULL) {
            /* pending state */
            assert(ix == mp->ma_used);
            mp->ma_used++;
        }
    }
    else {
        assert(old_value != NULL);
        DK_ENTRIES(mp->ma_keys)[ix].me_value = value;
    }

    mp->ma_version_tag = DICT_NEXT_VERSION();
    Py_XDECREF(old_value); /* which **CAN** re-enter (see issue #22653) */
    assert(_PyDict_CheckConsistency(mp));
    Py_DECREF(key);
    return 0;

Fail:
    Py_DECREF(value);
    Py_DECREF(key);
    return -1;
}

insertdict中调用lookdict_unicode_nodummy或lookdict方法寻找Dice对象:

1. 没有找到key
    1.1. 检查是否已经用完空间,如果用完,调用insertion_resize调整dict大小;
    1.2. 调用find_empty_slot方法寻找探测链上第一个为DKIX_DUMMY或DKIX_EMPTY的,设置索引值,各种设置和调整
2. 否则设置value即可

find_empty_slot方法源码如下:

// dictobject.c
static Py_ssize_t
find_empty_slot(PyDictKeysObject *keys, Py_hash_t hash)
{
    assert(keys != NULL);

    const size_t mask = DK_MASK(keys);
    size_t i = hash & mask;
    Py_ssize_t ix = dk_get_index(keys, i);
    for (size_t perturb = hash; ix >= 0;) {
        perturb >>= PERTURB_SHIFT;
        i = (i*5 + perturb + 1) & mask;
        ix = dk_get_index(keys, i);
    }
    return i;
}

7.5.2. 删除元素

如下Python语句最终会调用到PyDict_DelItem:

test = {100: 200}
del test[100]

PyDict_SetItem的C调用栈:

// dictobject.c
dict_ass_sub
=> PyDict_DelItem

PyDict_SetItem源码:

// dictobject.c
int
PyDict_DelItem(PyObject *op, PyObject *key)
{
    Py_hash_t hash;
    assert(key);
    if (!PyUnicode_CheckExact(key) ||
        (hash = ((PyASCIIObject *) key)->hash) == -1) {
        hash = PyObject_Hash(key);
        if (hash == -1)
            return -1;
    }

    return _PyDict_DelItem_KnownHash(op, key, hash);
}

PyDict_SetItem方法中做了一下检查,调用_PyDict_DelItem_KnownHash方法:

// dictobject.c
int
_PyDict_DelItem_KnownHash(PyObject *op, PyObject *key, Py_hash_t hash)
{
    Py_ssize_t ix;
    PyDictObject *mp;
    PyObject *old_value;

    if (!PyDict_Check(op)) {
        PyErr_BadInternalCall();
        return -1;
    }
    assert(key);
    assert(hash != -1);
    mp = (PyDictObject *)op;
    ix = (mp->ma_keys->dk_lookup)(mp, key, hash, &old_value);
    if (ix == DKIX_ERROR)
        return -1;
    if (ix == DKIX_EMPTY || old_value == NULL) {
        _PyErr_SetKeyError(key);
        return -1;
    }

    // Split table doesn't allow deletion.  Combine it.
    if (_PyDict_HasSplitTable(mp)) {
        if (dictresize(mp, DK_SIZE(mp->ma_keys))) {
            return -1;
        }
        ix = (mp->ma_keys->dk_lookup)(mp, key, hash, &old_value);
        assert(ix >= 0);
    }

    return delitem_common(mp, hash, ix, old_value);
}

查找key,查找到了调用delitem_common。需要注意的是查找到了,只是把状态设置为DKIX_DUMMY, 并没有从探测链上摘除。delitem_common源码如下:

// dictobject.c
static int
delitem_common(PyDictObject *mp, Py_hash_t hash, Py_ssize_t ix,
               PyObject *old_value)
{
    PyObject *old_key;
    PyDictKeyEntry *ep;

    Py_ssize_t hashpos = lookdict_index(mp->ma_keys, hash, ix);
    assert(hashpos >= 0);

    mp->ma_used--;
    mp->ma_version_tag = DICT_NEXT_VERSION();
    ep = &DK_ENTRIES(mp->ma_keys)[ix];
    dk_set_index(mp->ma_keys, hashpos, DKIX_DUMMY);
    ENSURE_ALLOWS_DELETIONS(mp);
    old_key = ep->me_key;
    ep->me_key = NULL;
    ep->me_value = NULL;
    Py_DECREF(old_key);
    Py_DECREF(old_value);

    assert(_PyDict_CheckConsistency(mp));
    return 0;
}

static Py_ssize_t
lookdict_index(PyDictKeysObject *k, Py_hash_t hash, Py_ssize_t index)
{
    size_t mask = DK_MASK(k);
    size_t perturb = (size_t)hash;
    size_t i = (size_t)hash & mask;

    for (;;) {
        Py_ssize_t ix = dk_get_index(k, i);
        if (ix == index) {
            return i;
        }
        if (ix == DKIX_EMPTY) {
            return DKIX_EMPTY;
        }
        perturb >>= PERTURB_SHIFT;
        i = mask & (i*5 + perturb + 1);
    }
    Py_UNREACHABLE();
}

7.5.3. 调整大小

无论是设置、插入还是删除,在满足一定条件下(参见上面的代码分析),都会调用insertion_resize。insertion_resize方法会重新创建PyDictKeysObject对象,在这个过程中会拷贝旧的对象所有的数据。

// dictobject.c
#define GROWTH_RATE(d) (((d)->ma_used*2)+((d)->ma_keys->dk_size>>1))

static int
insertion_resize(PyDictObject *mp)
{
    return dictresize(mp, GROWTH_RATE(mp));
}

static int
dictresize(PyDictObject *mp, Py_ssize_t minsize)
{
    Py_ssize_t newsize, numentries;
    PyDictKeysObject *oldkeys;
    PyObject **oldvalues;
    PyDictKeyEntry *oldentries, *newentries;

    /* Find the smallest table size > minused. */
    for (newsize = PyDict_MINSIZE;
         newsize < minsize && newsize > 0;
         newsize <<= 1)
        ;
    if (newsize <= 0) {
        PyErr_NoMemory();
        return -1;
    }

    oldkeys = mp->ma_keys;

    /* Allocate a new table. */
    mp->ma_keys = new_keys_object(newsize);
    if (mp->ma_keys == NULL) {
        mp->ma_keys = oldkeys;
        return -1;
    }
    // New table must be large enough.
    assert(mp->ma_keys->dk_usable >= mp->ma_used);
    if (oldkeys->dk_lookup == lookdict)
        mp->ma_keys->dk_lookup = lookdict;

    numentries = mp->ma_used;
    oldentries = DK_ENTRIES(oldkeys);
    newentries = DK_ENTRIES(mp->ma_keys);
    oldvalues = mp->ma_values;
    if (oldvalues != NULL) {
        /* Convert split table into new combined table.
         * We must incref keys; we can transfer values.
         * Note that values of split table is always dense.
         */
        for (Py_ssize_t i = 0; i < numentries; i++) {
            assert(oldvalues[i] != NULL);
            PyDictKeyEntry *ep = &oldentries[i];
            PyObject *key = ep->me_key;
            Py_INCREF(key);
            newentries[i].me_key = key;
            newentries[i].me_hash = ep->me_hash;
            newentries[i].me_value = oldvalues[i];
        }

        DK_DECREF(oldkeys);
        mp->ma_values = NULL;
        if (oldvalues != empty_values) {
            free_values(oldvalues);
        }
    }
    else {  // combined table.
        if (oldkeys->dk_nentries == numentries) {
            memcpy(newentries, oldentries, numentries * sizeof(PyDictKeyEntry));
        }
        else {
            PyDictKeyEntry *ep = oldentries;
            for (Py_ssize_t i = 0; i < numentries; i++) {
                while (ep->me_value == NULL)
                    ep++;
                newentries[i] = *ep++;
            }
        }

        assert(oldkeys->dk_lookup != lookdict_split);
        assert(oldkeys->dk_refcnt == 1);
        if (oldkeys->dk_size == PyDict_MINSIZE &&
            numfreekeys < PyDict_MAXFREELIST) {
            DK_DEBUG_DECREF keys_free_list[numfreekeys++] = oldkeys;
        }
        else {
            DK_DEBUG_DECREF PyObject_FREE(oldkeys);
        }
    }

    build_indices(mp->ma_keys, newentries, numentries);
    mp->ma_keys->dk_usable -= numentries;
    mp->ma_keys->dk_nentries = numentries;
    return 0;
}

由于扩/缩容,所以要调整索引,调用build_indices方法:

// dictobject.c
static void
build_indices(PyDictKeysObject *keys, PyDictKeyEntry *ep, Py_ssize_t n)
{
    size_t mask = (size_t)DK_SIZE(keys) - 1;
    for (Py_ssize_t ix = 0; ix != n; ix++, ep++) {
        Py_hash_t hash = ep->me_hash;
        size_t i = hash & mask;
        for (size_t perturb = hash; dk_get_index(keys, i) != DKIX_EMPTY;) {
            perturb >>= PERTURB_SHIFT;
            i = mask & (i*5 + perturb + 1);
        }
        dk_set_index(keys, i, ix);
    }
}

7.6. Dict对象的特性

支持tp_as_sequence、tp_as_mapping两种操作。

7.6.1. 序列操作

// dictobject.c
&dict_as_sequence,                      /* tp_as_sequence */
// dictobject.c
static PySequenceMethods dict_as_sequence = {
    0,                          /* sq_length */
    0,                          /* sq_concat */
    0,                          /* sq_repeat */
    0,                          /* sq_item */
    0,                          /* sq_slice */
    0,                          /* sq_ass_item */
    0,                          /* sq_ass_slice */
    PyDict_Contains,            /* sq_contains */
    0,                          /* sq_inplace_concat */
    0,                          /* sq_inplace_repeat */
};

其中:

  • PyDict_Contains
test = {200:100}
200 in test    # True
100 in test    # False

7.6.2. 关联操作

// dictobject.c
&dict_as_mapping,                          /* tp_as_mapping */
// dictobject.c
static PyMappingMethods dict_as_mapping = {
    (lenfunc)dict_length, /*mp_length*/
    (binaryfunc)dict_subscript, /*mp_subscript*/
    (objobjargproc)dict_ass_sub, /*mp_ass_subscript*/
};

其中:

  • dict_length
test = {200:100}
len(test)
  • dict_subscript
test = {200:100}
test[200]
  • dict_ass_sub
test = {}
test[200] = 100

7.6.3. to string

// dictobject.c
(reprfunc)dict_repr,                        /* tp_repr */
0,                                          /* tp_str */

7.6.4. hash

// dictobject.c
PyObject_HashNotImplemented,                /* tp_hash */

7.6.5. 比较

// dictobject.c
dict_richcompare,                           /* tp_richcompare */

7.6.6. 内置方法

// dictobject.c
mapp_methods,                              /* tp_methods */

7.7. 参考

  • Python源码剖析
上一篇:0. Python3源码—编译


下一篇:5. Python3源码—字符串(str)对象