Ceph Distributed Storage Deployment: Lab Notes
一、Lab Environment
| | Deploy | Node1 | Node2 | Node3 | Client |
| --- | --- | --- | --- | --- | --- |
| Hostname | deploy.ceph.local | node1.ceph.local | node2.ceph.local | node3.ceph.local | client.ceph.local |
| CPU | 2C | 2C | 2C | 2C | 2C |
| Memory | 4GB | 4GB | 4GB | 4GB | 2GB |
| Disk | 32G | 32G+3*20G | 32G+3*20G | 32G+3*20G | 32G |
| NIC | NIC1: 192.168.0.10 | NIC1: 192.168.0.11, NIC2: 10.0.0.11 | NIC1: 192.168.0.12, NIC2: 10.0.0.12 | NIC1: 192.168.0.13, NIC2: 10.0.0.13 | NIC1: 192.168.0.100 |
All nodes run a minimal installation of CentOS 7.x.
二、Building the Ceph Storage Cluster
1、Basic System Configuration
Unless otherwise noted, the following steps are performed on all nodes.
(1) Install basic packages
yum -y install wget vim
(2) Add software repositories
mkdir /etc/yum.repos.d/bak
mv /etc/yum.repos.d/*.repo /etc/yum.repos.d/bak
wget -O /etc/yum.repos.d/CentOS-Base.repo http://mirrors.aliyun.com/repo/Centos-7.repo
wget -O /etc/yum.repos.d/epel.repo http://mirrors.aliyun.com/repo/epel-7.repo
vim /etc/yum.repos.d/ceph.repo
[ceph]
name=ceph
baseurl=http://mirrors.aliyun.com/ceph/rpm-nautilus/el7/x86_64/
gpgcheck=0
priority =1
[ceph-noarch]
name=cephnoarch
baseurl=http://mirrors.aliyun.com/ceph/rpm-nautilus/el7/noarch/
gpgcheck=0
priority =1
[ceph-source]
name=Ceph source packages
baseurl=http://mirrors.aliyun.com/ceph/rpm-nautilus/el7/SRPMS
gpgcheck=0
priority=1
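After saving the repo file, it may help to rebuild the yum cache and confirm the new repositories are visible; a minimal check, assuming the repo IDs above:
yum clean all
yum makecache
yum repolist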
(3) Update the system
yum -y update
systemctl reboot
(4) Disable the firewall and SELinux
systemctl stop firewalld
systemctl disable firewalld
setenforce 0
sed -i 's/SELINUX=enforcing/SELINUX=disabled/g' /etc/selinux/config
(5) Edit the hosts file
Deploy node:
vim /etc/hosts
192.168.0.10 deploy.ceph.local
192.168.0.11 node1.ceph.local
192.168.0.12 node2.ceph.local
192.168.0.13 node3.ceph.local
ping deploy.ceph.local -c 1
ping node1.ceph.local -c 1
ping node2.ceph.local -c 1
ping node3.ceph.local -c 1
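The same hosts entries are needed on every node. One way to distribute the file from the Deploy node, assuming root SSH access (a sketch, not part of the original procedure):
for host in node1.ceph.local node2.ceph.local node3.ceph.local; do scp /etc/hosts root@$host:/etc/hosts; done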
(6) Configure passwordless SSH
Deploy node:
ssh-keygen
for host in deploy.ceph.local node1.ceph.local node2.ceph.local node3.ceph.local; do ssh-copy-id -i ~/.ssh/id_rsa.pub $host; done
(7) Configure NTP
Node hosts:
yum -y install ntp ntpdate ntp-doc
systemctl start ntpd
systemctl status ntpd
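Optionally, confirm that ntpd is synchronizing and enable it at boot (a hedged addition; the peers listed depend on the default CentOS pool servers):
ntpq -p
systemctl enable ntpd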
2、Creating the Ceph Storage Cluster
Deploy node:
(1) Install ceph-deploy
yum -y install ceph-deploy python-setuptools
(2) Create a working directory
Create a directory to hold the configuration files and keys generated by ceph-deploy.
mkdir cluster
cd cluster/
(3) Create the cluster
ceph-deploy new deploy.ceph.local
(4) Edit the configuration file
Specify the public (front-end) and cluster (back-end) networks.
vim ceph.conf
public network = 192.168.0.0/24
cluster network = 10.0.0.0/24
[mon]
mon allow pool delete = true
If the configuration file is changed later, push it to every node and restart the affected daemons:
ceph-deploy --overwrite-conf config push deploy.ceph.local
ceph-deploy --overwrite-conf config push node1.ceph.local
ceph-deploy --overwrite-conf config push node2.ceph.local
ceph-deploy --overwrite-conf config push node3.ceph.local
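The push only copies ceph.conf to /etc/ceph/ on each node; the daemons must be restarted to pick up the change. A hedged example (run on each node, restarting only the daemon types it actually hosts):
systemctl restart ceph-mon.target
systemctl restart ceph-osd.target
systemctl restart ceph-mgr.target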
(5) Install Ceph
ceph-deploy install deploy.ceph.local
ceph-deploy install node1.ceph.local
ceph-deploy install node2.ceph.local
ceph-deploy install node3.ceph.local
(6) Initialize the MON node
ceph-deploy mon create-initial
(7) Gather keys
ceph-deploy gatherkeys deploy.ceph.local
(8) List node disks
ceph-deploy disk list node1.ceph.local
ceph-deploy disk list node2.ceph.local
ceph-deploy disk list node3.ceph.local
(9) Zap node disks
ceph-deploy disk zap node1.ceph.local /dev/sdb
ceph-deploy disk zap node1.ceph.local /dev/sdc
ceph-deploy disk zap node1.ceph.local /dev/sdd
ceph-deploy disk zap node2.ceph.local /dev/sdb
ceph-deploy disk zap node2.ceph.local /dev/sdc
ceph-deploy disk zap node2.ceph.local /dev/sdd
ceph-deploy disk zap node3.ceph.local /dev/sdb
ceph-deploy disk zap node3.ceph.local /dev/sdc
ceph-deploy disk zap node3.ceph.local /dev/sdd
(10) Create OSDs
ceph-deploy osd create node1.ceph.local --data /dev/sdb
ceph-deploy osd create node1.ceph.local --data /dev/sdc
ceph-deploy osd create node1.ceph.local --data /dev/sdd
ceph-deploy osd create node2.ceph.local --data /dev/sdb
ceph-deploy osd create node2.ceph.local --data /dev/sdc
ceph-deploy osd create node2.ceph.local --data /dev/sdd
ceph-deploy osd create node3.ceph.local --data /dev/sdb
ceph-deploy osd create node3.ceph.local --data /dev/sdc
ceph-deploy osd create node3.ceph.local --data /dev/sdd
Check the disk and partition information:
ceph-deploy disk list node1.ceph.local
ceph-deploy disk list node2.ceph.local
ceph-deploy disk list node3.ceph.local
lsblk
(11) Copy the configuration and keys
Copy the configuration file and admin keyring to the Ceph nodes.
ceph-deploy admin node1.ceph.local node2.ceph.local node3.ceph.local
(12) Initialize the MGR nodes
ceph-deploy mgr create node1.ceph.local
ceph-deploy mgr create node2.ceph.local
ceph-deploy mgr create node3.ceph.local
cp ~/cluster/*.keyring /etc/ceph/
(13) Check cluster status
Check the Ceph status:
ceph -s
ceph health
Check the OSD status:
ceph osd stat
View the OSD tree:
ceph osd tree
三、Ceph File System
1、Creating a Ceph File System
(1) Initialize the MDS node
ceph-deploy mds create deploy.ceph.local
(2) Create the storage pools
ceph osd pool create cephfs_data 128
ceph osd pool create cephfs_metadata 128
(3) Create the file system
ceph fs new cephfs cephfs_metadata cephfs_data
(4) List file systems
ceph fs ls
(5) Check MDS status
ceph mds stat
2、Using the Ceph File System
(1) View the key
cat /etc/ceph/ceph.client.admin.keyring
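The secret used in the mount commands below is the key field of client.admin; on a node with admin credentials it can also be printed directly (a convenience, not required by the procedure):
ceph auth get-key client.admin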
(2) Method 1: kernel driver
Kernel driver mount with the secret passed on the command line; it does not persist across reboots.
mkdir /mycephfs/
mount -t ceph 192.168.0.10:/ /mycephfs -o name=admin,secret=AQCEYt9cJJnrNhAALEpPjJNSCZLyEoV477UvSw==
df -h
Alternatively:
Kernel driver mount with the secret stored in a file; it does not persist across reboots.
yum -y install ceph
vim admin.secret
AQCEYt9cJJnrNhAALEpPjJNSCZLyEoV477UvSw==
mkdir /mycephfs/
mount -t ceph 192.168.0.10:/ /mycephfs -o name=admin,secretfile=/root/admin.secret
df -h
Alternatively:
Kernel driver mount with the secret stored in a file; it persists across reboots.
vim /etc/fstab
192.168.0.10:6789:/ /mycephfs ceph name=admin,secretfile=/root/admin.secret,noatime 0 0
mkdir /mycephfs/
mount -a
df -h
(3) Method 2: FUSE (userspace file system)
FUSE mount; it does not persist across reboots.
yum -y install ceph-fuse
mkdir /etc/ceph
scp root@192.168.0.10:/etc/ceph/ceph.conf /etc/ceph/
scp root@192.168.0.10:/etc/ceph/ceph.client.admin.keyring /etc/ceph/
mkdir /mycephfs
ceph-fuse -m 192.168.0.10:6789 /mycephfs
df -h
Alternatively:
FUSE mount; it persists across reboots.
yum -y install ceph-fuse
mkdir /etc/ceph
scp root@192.168.0.10:/etc/ceph/ceph.conf /etc/ceph/
scp root@192.168.0.10:/etc/ceph/ceph.client.admin.keyring /etc/ceph/
mkdir /mycephfs
vim /etc/fstab
id=admin,conf=/etc/ceph/ceph.conf /mycephfs fuse.ceph defaults 0 0
mount -a
df -h
四、Ceph Block Device
Ceph block device operations can be performed on the Client node.
1、Creating a Ceph Block Device
(1) Install ceph-common
yum -y install ceph-common
(2) Copy the configuration and keyring
scp root@192.168.0.10:/etc/ceph/ceph.conf /etc/ceph/
scp root@192.168.0.10:/etc/ceph/ceph.client.admin.keyring /etc/ceph/
(3) Create a storage pool
ceph osd pool create cephpool 128
(4) Create a block device image
rbd create --size 1024 cephpool/cephimage --image-feature layering
(5) List images
rbd list cephpool
(6) View image information
rbd info cephpool/cephimage
(7) Map the block device
rbd map cephpool/cephimage --id admin
(8) View mapped block devices
rbd showmapped
ll /dev/rbd/cephpool
2、Using the Ceph Block Device
(1) Format the block device
mkfs.xfs /dev/rbd0
(2) Mount the block device
mkdir /mycephrdb
mount /dev/rbd0 /mycephrdb
df -h
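The rbd map and the mount above do not survive a reboot. One hedged way to make the mapping persistent is the rbdmap service shipped with ceph-common (the keyring path assumes the admin keyring copied earlier):
vim /etc/ceph/rbdmap
cephpool/cephimage id=admin,keyring=/etc/ceph/ceph.client.admin.keyring
systemctl enable rbdmap
The filesystem can then be mounted from /dev/rbd/cephpool/cephimage, for example via an /etc/fstab entry with the _netdev option.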
五、Ceph Object Storage
1、Creating the Ceph Object Gateway
(1) Install the RGW packages
ceph-deploy install --rgw node1.ceph.local
ceph-deploy install --rgw node2.ceph.local
ceph-deploy install --rgw node3.ceph.local
(2) Push the admin credentials to the nodes
ceph-deploy admin node1.ceph.local
ceph-deploy admin node2.ceph.local
ceph-deploy admin node3.ceph.local
(3) Initialize the RGW nodes
ceph-deploy rgw create node1.ceph.local
ceph-deploy rgw create node2.ceph.local
ceph-deploy rgw create node3.ceph.local
(4) Verify the service
curl http://node1.ceph.local:7480 -v
2、Using Ceph Object Storage
(1) Create an S3 user
radosgw-admin user create --uid="mengshicheng" --display-name="First User"
Record the keys:
"access_key": "AV70Y84NMT02S3EP7GU0",
"secret_key": "8bwvtvJFINWMdPN98bE3VBRufgsasnMEBBpHgQn0"
(2) Create a Swift subuser
radosgw-admin subuser create --uid=mengshicheng --subuser=mengshicheng:swift --access=full
Record the key:
"user": "mengshicheng:swift",
"secret_key": "W5G58A7iO2lchWZka9PGWt287FzR4sDksqQDBAqv"
(3) Install the S3 client
yum -y install python-boto
(4) Create a bucket through the S3 API
vim s3.py
import boto.s3.connection

access_key = 'AV70Y84NMT02S3EP7GU0'
secret_key = '8bwvtvJFINWMdPN98bE3VBRufgsasnMEBBpHgQn0'
# Connect to the RGW S3 endpoint on node1 over plain HTTP
conn = boto.connect_s3(
    aws_access_key_id=access_key,
    aws_secret_access_key=secret_key,
    host='node1.ceph.local', port=7480,
    is_secure=False, calling_format=boto.s3.connection.OrdinaryCallingFormat(),
)
# Create a bucket, then list every bucket with its creation date
bucket = conn.create_bucket('s3_create_bucket')
for bucket in conn.get_all_buckets():
    print "{name} {created}".format(
        name=bucket.name,
        created=bucket.creation_date,
    )
python s3.py
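Beyond creating a bucket, the same connection can upload an object and read it back; a minimal sketch that could be appended to s3.py (the object name hello.txt is illustrative):
bucket = conn.get_bucket('s3_create_bucket')
key = bucket.new_key('hello.txt')
key.set_contents_from_string('Hello Ceph RGW')
print key.get_contents_as_string()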
(5) Install the Swift client
To test the Swift interface, install the dependency packages and the Swift command-line client:
yum -y install python-setuptools
easy_install pip
pip install python-swiftclient
(6) List containers through the Swift API
swift -A http://node1.ceph.local:7480/auth/1.0 -U mengshicheng:swift -K W5G58A7iO2lchWZka9PGWt287FzR4sDksqQDBAqv list
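The same Swift credentials also work from Python through python-swiftclient; a minimal sketch (the container name swift-bucket is illustrative):
import swiftclient
conn = swiftclient.Connection(
    authurl='http://node1.ceph.local:7480/auth/1.0',
    user='mengshicheng:swift',
    key='W5G58A7iO2lchWZka9PGWt287FzR4sDksqQDBAqv',
)
conn.put_container('swift-bucket')
for container in conn.get_account()[1]:
    print container['name']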
六、Ceph Operations and Maintenance
1、Adding a Node (Scale-Out)
The steps mirror those for building a new cluster; for the basic node configuration, see the earlier sections.
(1) State before expansion
ceph osd tree
(2) Edit the hosts file
vim /etc/hosts
192.168.0.14 node4.ceph.local
(3) Configure passwordless SSH
ssh-copy-id -i ~/.ssh/id_rsa.pub node4.ceph.local
(4) Configure NTP
yum -y install ntp ntpdate ntp-doc
systemctl start ntpd
systemctl status ntpd
(5) Install Ceph
ceph-deploy install node4.ceph.local
(6) List node disks
ceph-deploy disk list node4.ceph.local
(7) Zap node disks
ceph-deploy disk zap node4.ceph.local /dev/sdb
ceph-deploy disk zap node4.ceph.local /dev/sdc
ceph-deploy disk zap node4.ceph.local /dev/sdd
(8) Create OSDs
ceph-deploy osd create node4.ceph.local --data /dev/sdb
ceph-deploy osd create node4.ceph.local --data /dev/sdc
ceph-deploy osd create node4.ceph.local --data /dev/sdd
(9) Copy the configuration and keys
Copy the configuration file and admin keyring to the new Ceph node.
ceph-deploy admin node4.ceph.local
(10) State after expansion
ceph osd tree
2、Removing a Node (Scale-In)
(1) State before removal
ceph osd tree
(2) Mark the OSDs out
ceph osd out 9
ceph osd out 10
ceph osd out 11
(3) Stop the OSD daemons
systemctl stop ceph-osd@9
systemctl stop ceph-osd@10
systemctl stop ceph-osd@11
(4) Remove the OSDs from the CRUSH map
ceph osd crush remove osd.9
ceph osd crush remove osd.10
ceph osd crush remove osd.11
(5) Delete the OSD authentication keys
ceph auth del osd.9
ceph auth del osd.10
ceph auth del osd.11
(6) Remove the OSDs
ceph osd rm 9
ceph osd rm 10
ceph osd rm 11
(7) Unmount the OSD directories
df -h
umount /var/lib/ceph/osd/ceph-9
umount /var/lib/ceph/osd/ceph-10
umount /var/lib/ceph/osd/ceph-11
(8) Release the disks
dmsetup remove_all
(9) Uninstall the Ceph packages
ceph-deploy purge node4.ceph.local
(10) Remove the node from the CRUSH map
ceph osd crush rm node4
(11) State after removal
ceph osd tree
3、Adding a Disk
(1) State before expansion
ceph osd tree
(2) List node disks
ceph-deploy disk list node1.ceph.local
(3) Zap the node disk
ceph-deploy disk zap node1.ceph.local /dev/sde
(4) Create the OSD
ceph-deploy osd create node1.ceph.local --data /dev/sde
(5) State after expansion
ceph osd tree
4、Removing a Disk
(1) State before removal
ceph osd tree
(2) Mark the OSD out
ceph osd out 9
(3) Stop the OSD daemon
systemctl stop ceph-osd@9
(4) Remove the OSD from the CRUSH map
ceph osd crush remove osd.9
(5) Delete the OSD authentication key
ceph auth del osd.9
(6) Remove the OSD
ceph osd rm 9
(7) Unmount the OSD directory
df -h
umount /var/lib/ceph/osd/ceph-9
(8) Release the disk
dmsetup remove_all
(9) State after removal
ceph osd tree
Pull out the disk.
5、Block Device Snapshots
(1) Create a snapshot
Before taking the snapshot, create file1:
mount /dev/rbd0 /mycephrdb
cd /mycephrdb
touch file1
ll
cd
umount /mycephrdb
rbd snap create cephpool/cephimage@cephsnapshot
(2) List snapshots
rbd snap ls cephpool/cephimage
(3) Roll back to the snapshot
Before rolling back, create file2:
mount /dev/rbd0 /mycephrdb
cd /mycephrdb
touch file2
ll
cd
umount /mycephrdb
rbd snap rollback cephpool/cephimage@cephsnapshot
Verify the result of the snapshot rollback:
mount /dev/rbd0 /mycephrdb
cd /mycephrdb
ll
(4) Delete a snapshot
rbd snap rm cephpool/cephimage@cephsnapshot
(5) Purge all snapshots
rbd snap purge cephpool/cephimage
(6) Protect a snapshot
rbd snap protect cephpool/cephimage@cephsnapshot
(7) Clone a snapshot
Only a protected snapshot can be cloned.
rbd clone cephpool/cephimage@cephsnapshot cephpool/newimage
(8) List children of the snapshot
rbd children cephpool/cephimage@cephsnapshot
(9) Flatten the cloned image
rbd flatten cephpool/newimage
(10) Unprotect the snapshot
rbd snap unprotect cephpool/cephimage@cephsnapshot
6、Expanding a Block Device
(1) State before expansion
rbd info cephpool/cephimage
(2) Expand the block device
rbd resize cephpool/cephimage --size 2048
(3) State after expansion
rbd info cephpool/cephimage
(4) Resize the file system
df -h
xfs_growfs /dev/rbd0
xfs_growfs is for XFS file systems; the syntax is xfs_growfs /dev/rbd0.
resize2fs is for ext2/ext3/ext4 file systems; the syntax is resize2fs -f /dev/rbd0.
7、Shrinking a Block Device
(1) State before shrinking
rbd info cephpool/cephimage
(2) Shrink the block device
rbd resize cephpool/cephimage --size 1024 --allow-shrink
(3) State after shrinking
rbd info cephpool/cephimage
(4) Resize the file system
After shrinking the block device, the disk must be reformatted before it can be used again.
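XFS cannot be shrunk in place, so after the rbd resize the filesystem is recreated; a hedged example for the XFS setup used above (this destroys the data on the image; unmount first if it is still mounted):
umount /mycephrdb
mkfs.xfs -f /dev/rbd0
mount /dev/rbd0 /mycephrdb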
8、Erasure Coding
(1) Create a profile
ceph osd erasure-code-profile set myprofile k=3 m=2 crush-failure-domain=osd
The crush-failure-domain option (osd/host/chassis/rack/row, etc.) specifies that no two chunks of the same object may be stored on the same osd/host/chassis/rack/row.
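As a quick sizing check for this profile: with k=3 and m=2 each object is split into 3 data chunks plus 2 coding chunks, so raw usage is (k+m)/k = 5/3 ≈ 1.67 times the logical size, any 2 of the 5 chunks can be lost without losing data, and with crush-failure-domain=osd at least k+m = 5 OSDs are needed.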
(2) View profiles
ceph osd erasure-code-profile ls
default is the built-in profile; it is equivalent to RAID 5, requires at least 3 nodes, and uses k=2, m=1.
ceph osd erasure-code-profile get default
ceph osd erasure-code-profile get myprofile
(3) Create erasure-coded pools
Create an erasure-coded pool from the default profile:
ceph osd pool create ecpool1 18 12 erasure
Create an erasure-coded pool from the custom profile:
ceph osd pool create ecpool2 18 12 erasure myprofile
Here 18 is pg_num and 12 is pgp_num; pgp_num must be less than or equal to pg_num.
(4) Test data reads and writes
echo ABCDEFG | rados --pool ecpool1 put NYAN -
rados --pool ecpool1 get NYAN -
echo ABCDEFG | rados --pool ecpool2 put NYAN -
rados --pool ecpool2 get NYAN -
(5) Delete a profile
ceph osd erasure-code-profile rm myprofile
9、Cache Tiering
(1) Create the backing storage pool
ceph osd pool create cold-storage 16
(2) Create the cache pool
ceph osd pool create hot-storage 128
(3) Associate the cache pool with the storage pool
ceph osd tier add cold-storage hot-storage
(4) Set the cache mode
writeback mode:
ceph osd tier cache-mode hot-storage writeback
readonly mode:
ceph osd tier cache-mode hot-storage readonly --yes-i-really-mean-it
(5) Set the cache pool as an overlay
ceph osd tier set-overlay cold-storage hot-storage
(6) Tear down the cache tier
Tearing down a read-only cache tier:
Change the cache mode to none to disable it:
ceph osd tier cache-mode hot-storage none
Remove the overlay so that clients are no longer directed to the cache pool:
ceph osd tier remove-overlay cold-storage
Detach the cache pool from the storage pool:
ceph osd tier remove cold-storage hot-storage
Tearing down a writeback cache tier:
Change the cache mode to forward so that new and modified objects are flushed directly to the storage pool:
ceph osd tier cache-mode hot-storage forward --yes-i-really-mean-it
Make sure the cache pool has been flushed; this may take several minutes:
rados -p hot-storage ls
If the cache pool still contains objects, flush them manually:
rados -p hot-storage cache-flush-evict-all
Remove the overlay so that clients are no longer directed to the cache pool:
ceph osd tier remove-overlay cold-storage
Detach the cache pool from the storage pool:
ceph osd tier remove cold-storage hot-storage
(7) Cache pool parameter settings
1、hit_set_type
The hit_set_type of a cache pool can only use the Bloom filter:
ceph osd pool set hot-storage hit_set_type bloom
2、hit_set_count
hit_set_count defines how many HitSets the cache pool stores:
ceph osd pool set hot-storage hit_set_count 1
3、hit_set_period
hit_set_period defines how long, in seconds, each HitSet covers:
ceph osd pool set hot-storage hit_set_period 3600
4、min_read_recency_for_promote/min_write_recency_for_promote
min_read_recency_for_promote (and its write counterpart) defines how many HitSets are checked when handling a read (or write) on an object; the result is used to decide whether to promote the object asynchronously. Its value must be between 0 and hit_set_count.
If it is set to 0, the object is always promoted.
If it is set to 1, only the current HitSet is checked; if the object is in the current HitSet it is promoted, otherwise it is not.
For any other value, that many historical HitSets are checked; if the object appears in any of the most recent min_read_recency_for_promote HitSets, it is promoted.
ceph osd pool set hot-storage min_read_recency_for_promote 1
ceph osd pool set hot-storage min_write_recency_for_promote 1
5、cache_target_dirty_ratio
cache_target_dirty_ratio defines the fraction of the cache pool capacity at which dirty objects start being flushed to the backing pool, e.g.:
ceph osd pool set hot-storage cache_target_dirty_ratio 0.4
6、cache_target_dirty_high_ratio
cache_target_dirty_high_ratio defines the fraction of the cache pool capacity at which dirty objects start being flushed more aggressively; its value should lie between cache_target_dirty_ratio and cache_target_full_ratio, e.g.:
ceph osd pool set hot-storage cache_target_dirty_high_ratio 0.6
7、cache_target_full_ratio
cache_target_full_ratio defines the fraction of the total capacity at which clean objects start being evicted from the cache pool, e.g.:
ceph osd pool set hot-storage cache_target_full_ratio 0.8
8、target_max_bytes
target_max_bytes defines the number of bytes at which the cache pool starts flushing or evicting objects, e.g.:
ceph osd pool set hot-storage target_max_bytes 1000000000000
9、target_max_objects
target_max_objects defines the number of objects at which the cache pool starts flushing or evicting, e.g.:
ceph osd pool set hot-storage target_max_objects 1000000
10、cache_min_flush_age
cache_min_flush_age defines the minimum age, in seconds, that a modified (dirty) object must reach before it can be flushed to the backing pool, e.g.:
ceph osd pool set hot-storage cache_min_flush_age 600
11、cache_min_evict_age
cache_min_evict_age defines the minimum age, in seconds, before an object can be evicted from the cache pool, e.g.:
ceph osd pool set hot-storage cache_min_evict_age 1800
10、CRUSH Map
(1) Editing the CRUSH Map
1、Get the CRUSH map
ceph osd getcrushmap -o crushmap
2、Decompile the CRUSH map
crushtool -d crushmap -o decompiled_crushmap
3、Edit the CRUSH map
See the CRUSH map parameters below.
4、Compile the CRUSH map
crushtool -c decompiled_crushmap -o crushmap
5、Inject the CRUSH map
ceph osd setcrushmap -i crushmap
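Before injecting a hand-edited map, the compiled map can be checked offline with crushtool; a hedged example that prints the placements rule 0 would produce for ten sample inputs with 3 replicas:
crushtool -i crushmap --test --rule 0 --num-rep 3 --show-mappings --min-x 0 --max-x 9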
(2) CRUSH Map Parameters
1、CRUSH map device list
The devices section lists every OSD in the cluster. In a Ceph cluster the device list of the CRUSH map is updated automatically and managed by Ceph itself. To add a new device, append a line at the end of the devices section giving the OSD a unique device number:
device {num} {osd.name} class {class.name}, for example:
device 0 osd.0 class hdd
2、CRUSH map bucket types
This section defines the bucket types used in the CRUSH map. The default CRUSH map already contains several bucket types, which is enough for most Ceph clusters; types can be added or removed here as needed. To add a new bucket type, add a line in the bucket-type section of the CRUSH map file with a type number followed by the bucket type name:
type {num} {bucket-name}, for example:
type 0 osd
3、CRUSH map bucket definitions
Once the bucket types are declared, buckets are defined for the hosts and the other failure domains. Bucket definitions let you shape the overall hierarchy of the Ceph cluster, for example by defining host, row, rack, chassis, room, and datacenter buckets, and they also specify the algorithm each bucket uses. A bucket definition has several parameters:
[bucket-type] [bucket-name] {
id [a unique negative numeric ID]
weight [the relative capacity/capability of the item(s)]
alg [the bucket type: uniform | list | tree | straw ]
hash [the hash type: 0 by default]
item [item-name] weight [weight]
}
Parameter description:
(a) Bucket algorithm (alg)
uniform: this bucket type aggregates devices with exactly the same weight. When all storage devices have the same weight, the uniform bucket type can be used, allowing CRUSH to map replicas into uniform buckets in constant time.
list: this bucket type aggregates its content as a linked list and is based on the RUSH P algorithm. A list is a natural, intuitive way to expand a cluster: objects are either relocated to the newest devices with some probability or remain on the older devices as before, which gives optimal data migration when new items are added to the bucket. Removing items from the middle or end of the list, however, causes a large amount of unnecessary movement, so this bucket type suits clusters that never (or only rarely) shrink.
tree: this bucket type uses a binary search tree and is more efficient than a list bucket when the bucket contains many items. Based on the RUSH R algorithm, a tree bucket reduces placement time to O(log n), which makes it suitable for managing larger sets of devices or nested buckets.
straw: list and tree buckets use a divide-and-conquer strategy that gives certain items priority (for example, those at the head of a list) or avoids considering whole subtrees of items. That improves the performance of replica placement, but it also leads to suboptimal results when the bucket is reorganized by adding, removing, or reweighting an item. The straw bucket type lets all items compete fairly with one another for replica placement.
(b) Hash algorithm (hash)
Each bucket uses a hash algorithm. Currently Ceph supports only rjenkins1; enter 0 to set the hash algorithm to rjenkins1.
(c) Bucket weight (weight)
Ceph expresses bucket weights as double-precision values. Weight is one-dimensional and should be set according to capacity and data transfer rate: a relative weight of 1.00 for a 1 TB storage device is recommended, with lower weights for slower devices and higher weights for faster ones. The weight of a higher-level bucket is the sum of the weights of its leaf buckets, for example:
host node1 {
id -1
alg straw
hash 0
item osd.0 weight 1.00
item osd.1 weight 1.00
}
host node2 {
id -2
alg straw
hash 0
item osd.2 weight 1.00
item osd.3 weight 1.00
}
rack rack1 {
id -3
alg straw
hash 0
item node1 weight 2.00
item node2 weight 2.00
}
4、CRUSH map rules
Rules define how suitable buckets are selected for placing a pool's data. A larger cluster may have many pools, and each pool has its own CRUSH rule. A CRUSH map rule has several parameters:
rule <rulename> {
ruleset <ruleset>
type [ replicated | erasure ]
min_size <min-size>
max_size <max-size>
step take <bucket-type>
step [choose|chooseleaf] [firstn|indep] <N> <bucket-type>
step emit
}
Parameter description:
(a) ruleset: identifies which ruleset a rule belongs to; the rule takes effect once the ruleset is assigned to a pool.
(b) type: the rule type; currently only replicated and erasure are supported, and the default is replicated.
(c) min_size: the minimum number of replicas a pool may have and still select this rule.
(d) max_size: the maximum number of replicas a pool may have and still select this rule.
(e) step take <bucket-name>: takes the named bucket as the starting point and iterates down its tree.
(f) step choose firstn {num} type {bucket-type}: selects a number of buckets of the given type; this number is usually the pool's replica count (the pool size). If {num} == 0, choose pool-num-replicas buckets (all available); if {num} > 0 && < pool-num-replicas, choose that many buckets; if {num} < 0, it means pool-num-replicas - {num} buckets.
step chooseleaf firstn {num} type {bucket-type}: selects a set of buckets of type {bucket-type} and chooses a leaf node from the subtree of each, where the number of buckets is usually the pool's replica count (the pool size). If {num} == 0, choose pool-num-replicas buckets (all available); if {num} > 0 && < pool-num-replicas, choose that many buckets; if {num} < 0, it means pool-num-replicas - {num} buckets.
(g) step emit: outputs the current values and empties the stack; it is normally used at the end of a rule, and also when the same rule is applied to different trees, for example:
rule ssd {
ruleset 4
type replicated
min_size 0
max_size 4
step take ssd
step chooseleaf firstn 0 type host
step emit
}
List the CRUSH rules:
ceph osd crush rule ls
(3) Primary Affinity
When a client reads or writes data, it always contacts the primary OSD in the acting set. If an OSD is less suited than the others to act as a primary, its primary affinity can be lowered so that CRUSH tends not to use it as the primary OSD of an acting set:
ceph osd primary-affinity <osd-id> <weight>
The weight ranges from 0 to 1 and defaults to 1; 0 means the OSD may not be used as a primary, and 1 means it may. When the weight is less than 1, CRUSH is less likely to select the OSD as a primary, for example:
ceph osd primary-affinity osd.8 0.5
(4) Creating a Pool on Specific OSDs
1、View the OSD classes
ceph osd tree
2、View the CRUSH classes
ceph osd crush class ls
3、Remove the OSD classes
for i in 0 3 6;do ceph osd crush rm-device-class osd.$i;done
4、Set the OSD classes
for i in 0 3 6;do ceph osd crush set-device-class ssd osd.$i;done
5、Create a CRUSH rule
ceph osd crush rule create-replicated rule-ssd default host ssd
6、Create the pool
ceph osd pool create ssdpool 128 128 rule-ssd
7、Test data reads and writes
echo ABCDEFG | rados --pool ssdpool put NYAN -
rados --pool ssdpool get NYAN -
The class and rule operations above can also be performed by editing, compiling, and injecting the CRUSH map.
(5) Adjusting OSD Weights
1、State before adjustment
ceph osd tree
2、Adjust the OSD weight
The syntax is ceph osd crush reweight {name} {weight}, for example:
ceph osd crush reweight osd.0 1
3、State after adjustment
ceph osd tree
The weight operations above can also be performed by editing, compiling, and injecting the CRUSH map.