1、数据结构
// A header for a Go map.
type hmap struct {
// Note: the format of the hmap is also encoded in cmd/compile/internal/reflectdata/reflect.go.
// Make sure this stays in sync with the compiler's definition.
count int // # live cells == size of map. Must be first (used by len() builtin)
flags uint8
B uint8 // log_2 of # of buckets (can hold up to loadFactor * 2^B items)
noverflow uint16 // approximate number of overflow buckets; see incrnoverflow for details
hash0 uint32 // hash seed
buckets unsafe.Pointer // array of 2^B Buckets. may be nil if count==0.
oldbuckets unsafe.Pointer // previous bucket array of half the size, non-nil only when growing
nevacuate uintptr // progress counter for evacuation (buckets less than this have been evacuated)
extra *mapextra // optional fields
}
// mapextra holds fields that are not present on all maps.
type mapextra struct {
// If both key and elem do not contain pointers and are inline, then we mark bucket
// type as containing no pointers. This avoids scanning such maps.
// However, bmap.overflow is a pointer. In order to keep overflow buckets
// alive, we store pointers to all overflow buckets in hmap.extra.overflow and hmap.extra.oldoverflow.
// overflow and oldoverflow are only used if key and elem do not contain pointers.
// overflow contains overflow buckets for hmap.buckets.
// oldoverflow contains overflow buckets for hmap.oldbuckets.
// The indirection allows to store a pointer to the slice in hiter.
overflow *[]*bmap
oldoverflow *[]*bmap
// nextOverflow holds a pointer to a free overflow bucket.
nextOverflow *bmap
}
// A bucket for a Go map.
type bmap struct {
// tophash generally contains the top byte of the hash value
// for each key in this bucket. If tophash[0] < minTopHash,
// tophash[0] is a bucket evacuation state instead.
tophash [abi.MapBucketCount]uint8
// Followed by bucketCnt keys and then bucketCnt elems.
// NOTE: packing all the keys together and then all the elems together makes the
// code a bit more complicated than alternating key/elem/key/elem/... but it allows
// us to eliminate padding which would be needed for, e.g., map[int64]int8.
// Followed by an overflow pointer.
}
map
的底层结构包含以下几个重要的元素:
-
hmap
:hmap
是 Go 中map
类型的实际数据结构,它包含了哈希表的核心信息。例如,桶的指针、大小、负载因子、哈希函数等。 -
buckets
:buckets
数组存储了map
中的哈希桶,每个桶内存储着一部分键值对。如果哈希冲突较多,桶会存储多个键值对。 -
overflow
: 为了处理大量的哈希冲突,Go 中的哈希桶通常还会有一个overflow
字段,表示某些冲突较严重的元素会被溢出到其他位置。
2、底层实现
(1)哈希表和桶的结构
Go 语言的 map
底层实现是一个哈希表,通过链地址法(数组加链表)来解决冲突,它的每个单元是一个桶。它的数组是由一组桶组成,每个桶是一个链式结构,可以存储8个键值对,当发生hash冲突时首先存在桶内,如果桶满了,就在桶后面连接溢出桶来存储键值对。
(2)哈希函数
Go 使用一个高效的哈希函数将键映射到桶的位置。不同类型的键使用不同的哈希函数,例如字符串和整数会有不同的哈希计算方法。
(3)扩容和负载因子
负载因子为6.5,代表键值对数量与桶数量比值的上限。如果达到负载因子值或溢出桶的数量超过正常桶的数量(或者超过上限值1<<15),就会发生扩容,新的桶数量是原来的2倍数。(负载因子的计算和桶的倍数不考虑溢出桶的数量)
Go 的 map
底层哈希表设计采用了渐进式扩容的策略,当扩容时,并不是所有的键值对都会立刻重新哈希。Go 会 分批次地、渐进地将老桶中的元素迁移到新桶中,避免在扩容过程中产生性能瓶颈。