MongoDB集群分片管理之哈希分片

检查数据库分片
mongos> use config
switched to db config
mongos> db.databases.find()
{ "_id" : "recommend", "primary" : "hdshard1", "partitioned" : true, "version" : { "uuid" : UUID("cb833b8e-cc4f-4c52-83c3-719aa383bac4"), "lastMod" : 1 } }
{ "_id" : "db1", "primary" : "hdshard3", "partitioned" : true, "version" : { "uuid" : UUID("71bb472c-7896-4a31-a77c-e3aaf723be3c"), "lastMod" : 1 } }

 
可以看到数据库分片已经打开。
 

查看集合分片情况

mongos> db.rcmd_1_min_tag_mei_rong.getShardDistribution()
Collection recommend.rcmd_1_min_tag_mei_rong is not sharded.

接下来我们对该集合进行分片。
 

查看索引

mongos> db.rcmd_1_min_tag_mei_rong.getIndexes()
[
    {
        "v" : 2,
        "key" : {
            "_id" : 1
        },
        "name" : "_id_",
        "ns" : "recommend.rcmd_1_min_tag_mei_rong"
    }
]

 

创建哈希索引

mongos> db.rcmd_1_min_tag_mei_rong.createIndex({_id:‘hashed‘})
{
    "raw" : {
        "hdshard1/172.16.254.136:40001,172.16.254.137:40001,172.16.254.138:40001" : {
            "createdCollectionAutomatically" : false,
            "numIndexesBefore" : 1,
            "numIndexesAfter" : 2,
            "ok" : 1
        }
    },
    "ok" : 1,
    "operationTime" : Timestamp(1618801258, 1),
    "$clusterTime" : {
        "clusterTime" : Timestamp(1618801258, 1),
        "signature" : {
            "hash" : BinData(0,"Ls7hIrcqTdWwfI49kxXo6NQtAk4="),
            "keyId" : NumberLong("6941260985399246879")
        }
    }
}

 

检查索引

mongos> db.rcmd_1_min_tag_mei_rong.getIndexes()
[
    {
        "v" : 2,
        "key" : {
            "_id" : 1
        },
        "name" : "_id_",
        "ns" : "recommend.rcmd_1_min_tag_mei_rong"
    },
    {
        "v" : 2,
        "key" : {
            "_id" : "hashed"
        },
        "name" : "_id_hashed",
        "ns" : "recommend.rcmd_1_min_tag_mei_rong"
    }
]

可以看到id字段的哈希索引已经创建。
 

进行哈希分片

mongos> sh.shardCollection( "recommend.rcmd_1_min_tag_mei_rong", { _id: "hashed" } )
{
    "collectionsharded" : "recommend.rcmd_1_min_tag_mei_rong",
    "collectionUUID" : UUID("ccac34f4-59e4-4048-971b-644b454adc13"),
    "ok" : 1,
    "operationTime" : Timestamp(1618801293, 5),
    "$clusterTime" : {
        "clusterTime" : Timestamp(1618801293, 5),
        "signature" : {
            "hash" : BinData(0,"bs4MPk31IZlYGMw3c2JN6eJ3pSc="),
            "keyId" : NumberLong("6941260985399246879")
        }
    }
}

 

查看分片情况

mongos> db.rcmd_1_min_tag_mei_rong.getShardDistribution()

Shard hdshard1 at hdshard1/172.16.254.136:40001,172.16.254.137:40001,172.16.254.138:40001
 data : 72.94MiB docs : 166775 chunks : 3
 estimated data per chunk : 24.31MiB
 estimated docs per chunk : 55591

Shard hdshard2 at hdshard2/172.16.254.136:40002,172.16.254.137:40002,172.16.254.138:40002
 data : 96.13MiB docs : 219788 chunks : 3
 estimated data per chunk : 32.04MiB
 estimated docs per chunk : 73262

Shard hdshard3 at hdshard3/172.16.254.136:40003,172.16.254.137:40003,172.16.254.138:40003
 data : 64.11MiB docs : 146526 chunks : 2
 estimated data per chunk : 32.05MiB
 estimated docs per chunk : 73263

Totals
 data : 233.18MiB docs : 533089 chunks : 8
 Shard hdshard1 contains 31.27% data, 31.28% docs in cluster, avg obj size on shard : 458B
 Shard hdshard2 contains 41.22% data, 41.22% docs in cluster, avg obj size on shard : 458B
 Shard hdshard3 contains 27.49% data, 27.48% docs in cluster, avg obj size on shard : 458B

哈希分片已经完成。
 

范围分片和哈希分片对比

1)基于范围的分片方式提供了更高效的范围查询,给定一个片键的范围,分发路由可以很简单地确定哪个数据块存储了请求需要的数据,并将请求转发到相应的分片中。

2)不过,基于范围的分片会导致数据在不同分片上的不均衡,有时候,带来的消极作用会大于查询性能的积极作用.比如,如果片键所在的字段是线性增长的,一定时间内的所有请求都会落到某个固定的数据块中,最终导致分布在同一个分片中.在这种情况下,一小部分分片承载了集群大部分的数据,系统并不能很好地进行扩展。

3)与此相比,基于哈希的分片方式以范围查询性能的损失为代价,保证了集群中数据的均衡.哈希值的随机性使数据随机分布在每个数据块中,因此也随机分布在不同分片中.但是也正由于随机性,一个范围查询很难确定应该请求哪些分片,通常为了返回需要的结果,需要请求所有分片。

MongoDB集群分片管理之哈希分片

上一篇:MySQL聚合函数


下一篇:ef linq方式插入+sql操作数据注意事项