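cassandra-stress is the load-testing tool that ships with Cassandra. It can simulate a variety of workloads against a specific table schema and measure the cluster's read and write performance. The tool is powerful, but little material about it is available online (in Chinese), and descriptions of its YAML profile are especially scarce. This article gives a brief introduction to the tool and focuses on the format of the YAML profile.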
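Basic usage
The cassandra-stress tool lives in the tools/bin directory of the Cassandra distribution. Its command-line format is cassandra-stress <command> [options]. For details on commands and options, run cassandra-stress help <command|option> or consult the Cassandra documentation. Some commonly used commands and options are described below.
Common commands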
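read:
Concurrent read test. A write test must be run first to populate the data.
write:
Concurrent write test.
mixed:
Mixed read/write test. The read/write ratio and the data distribution are configurable.
counter_write:
Counter write test.
counter_read:
Counter read test. As with read, a counter write test must be run first.
user:
Stress test using user-supplied queries. Described in detail later in this article.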
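The ordering constraint above (write before read) can be sketched as a short script. This is a minimal sketch rather than a tuned benchmark: the node address, the operation count, and the run helper are assumptions introduced for illustration, and run is not part of cassandra-stress. By default the script only prints the commands; set DRY_RUN=0 to execute them against a real cluster.

```shell
#!/bin/sh
# Hypothetical settings; adjust for your cluster.
NODE="localhost"
OPS=100000

# Illustrative helper (not part of cassandra-stress): print the command,
# and execute it only when DRY_RUN=0.
run() {
  echo "+ $*"
  if [ "${DRY_RUN:-1}" = "0" ]; then
    "$@"
  fi
}

# 1. Populate the default stress table first.
run cassandra-stress write "n=${OPS}" -node "${NODE}"

# 2. Read back the rows written in step 1.
run cassandra-stress read "n=${OPS}" -node "${NODE}"

# 3. Mixed workload: one write for every three reads.
#    The argument is quoted so the shell does not interpret the parentheses.
run cassandra-stress mixed "ratio(write=1,read=3)" "n=${OPS}" -node "${NODE}"
```

The same write-then-read ordering applies to counter_write and counter_read.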
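Common options
cl=
Consistency level to use during the test. Possible values include ONE, QUORUM, LOCAL_QUORUM, EACH_QUORUM, ALL, and ANY, in addition to the default, LOCAL_ONE.
n=
Number of operations to perform.
profile=
Path to the YAML profile, which is described in detail later in this article. This option only takes effect with the user command.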
-node
Specifies the nodes of the target cluster. Usage: -node [<host>] [file=<files>] [whitelist <whitelist>]
The host argument gives the nodes of the target cluster. For the file and whitelist options, see the help output.
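-rate
Sets the rate-related parameters of the test. Usage:
-rate threads=<threads> [throttle=<throttle>] [fixed=<fixed rate>]
or
-rate [threads>=<min threads>] [threads<=<max threads>] [auto]
Here, threads is the number of client threads issuing operations concurrently; throttle caps the read/write rate (op/s); fixed runs the operations at a fixed rate.
In auto mode, the tool keeps increasing the thread count until throughput saturates (that is, until throughput shows no improvement across 3 runs).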
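The three rate modes described above can be sketched as follows. The thread counts, the rates, and the run dry-run helper are illustrative assumptions, and the N/s value format for throttle and fixed is based on the help output of recent versions, so check cassandra-stress help -rate for your release. Note that threads>= and threads<= must be quoted, otherwise the shell treats > and < as redirections.

```shell
#!/bin/sh
# Illustrative helper (not part of cassandra-stress): print the command,
# and execute it only when DRY_RUN=0.
run() {
  echo "+ $*"
  if [ "${DRY_RUN:-1}" = "0" ]; then
    "$@"
  fi
}

# Fixed thread count, capped at 5000 operations per second.
run cassandra-stress write n=100000 -rate threads=64 throttle=5000/s -node localhost

# Fixed thread count, operations issued at a fixed rate of 5000 per second.
run cassandra-stress write n=100000 -rate threads=64 fixed=5000/s -node localhost

# Auto mode: step the thread count up between 16 and 256
# until throughput stops improving.
run cassandra-stress write n=100000 -rate "threads>=16" "threads<=256" auto -node localhost
```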
-schema
Specifies the replication strategy, compression algorithm, compaction strategy, and so on. Usage:
-schema [replication(<replication>)] [keyspace=<keyspace>] [compaction(<compaction>)] [compression=<compression>]
-col
Column settings, such as the distribution of value sizes and the distribution of the number of columns read or written per operation. Usage:
-col names=<names> [slice <slice>] [super=<super>] [comparator=<comparator>] [timestamp=<timestamp>] [size=<size dist>]
or
-col [n=<count dist>] [slice <slice>] [super=<super>] [comparator=<comparator>] [timestamp=<timestamp>] [size=<size dist>]
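The descriptions above are fairly abstract, so here is an example. The following command writes 1 million rows at LOCAL_QUORUM, with 32 columns per row and exactly 2048 bytes per column, using 500 client threads and a replication factor of 3: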
cassandra-stress write n=1000000 cl=LOCAL_QUORUM -rate threads=500 \
-col "size=fixed(2048)" "n=fixed(32)" -schema "replication(factor=3)" -node localhost
YAML profile
We use cqlstress-example.yaml from the tools directory as an example to explain the format of the YAML profile.
#
# This is an example YAML profile for cassandra-stress
#
# insert data
# cassandra-stress user profile=/home/jake/stress1.yaml ops(insert=1)
#
# read, using query simple1:
# cassandra-stress profile=/home/jake/stress1.yaml ops(simple1=1)
#
# mixed workload (90/10)
# cassandra-stress user profile=/home/jake/stress1.yaml ops(insert=1,simple1=9)
#
# Keyspace info
#
keyspace: stresscql
#
# The CQL for creating a keyspace (optional if it already exists)
#
keyspace_definition: |
  CREATE KEYSPACE stresscql WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1};
#
# Table info
#
table: typestest
#
# The CQL for creating a table you wish to stress (optional if it already exists)
#
table_definition: |
  CREATE TABLE typestest (
        name text,
        choice boolean,
        date timestamp,
        address inet,
        dbl double,
        lval bigint,
        ival int,
        uid timeuuid,
        value blob,
        PRIMARY KEY((name,choice), date, address, dbl, lval, ival, uid)
  )
    WITH compaction = { 'class':'LeveledCompactionStrategy' }
#    AND compression = { 'sstable_compression' : '' }
#    AND comment='A table of many types to test wide rows'
#
# Optional meta information on the generated columns in the above table
# The min and max only apply to text and blob types
# The distribution field represents the total unique population
# distribution of that column across rows. Supported types are
#
# EXP(min..max) An exponential distribution over the range [min..max]
# EXTREME(min..max,shape) An extreme value (Weibull) distribution over the range [min..max]
# GAUSSIAN(min..max,stdvrng) A gaussian/normal distribution, where mean=(min+max)/2, and stdev is (mean-min)/stdvrng
# GAUSSIAN(min..max,mean,stdev) A gaussian/normal distribution, with explicitly defined mean and stdev
# UNIFORM(min..max) A uniform distribution over the range [min, max]
# FIXED(val) A fixed distribution, always returning the same value
# SEQ(min..max) A fixed sequence, returning values in the range min to max sequentially (starting based on seed), wrapping if necessary.
# Aliases: extr, gauss, normal, norm, weibull
#
# If preceded by ~, the distribution is inverted
#
# Defaults for all columns are size: uniform(4..8), population: uniform(1..100B), cluster: fixed(1)
#
columnspec:
  - name: name
    size: uniform(1..10)
    population: uniform(1..10)     # the range of unique values to select for the field (default is 100Billion)
  - name: date
    cluster: uniform(20..40)
  - name: lval
    population: gaussian(1..1000)
    cluster: uniform(1..4)
insert:
  partitions: uniform(1..50)       # number of unique partitions to update in a single operation
                                   # if batchcount > 1, multiple batches will be used but all partitions will
                                   # occur in all batches (unless they finish early); only the row counts will vary
  batchtype: LOGGED                # type of batch to use
  select: uniform(1..10)/10        # uniform chance any single generated CQL row will be visited in a partition;
                                   # generated for each partition independently, each time we visit it
#
# A list of queries you wish to run against the schema
#
queries:
  simple1:
    cql: select * from typestest where name = ? and choice = ? LIMIT 100
    fields: samerow             # samerow or multirow (select arguments from the same row, or randomly from all rows in the partition)
  range1:
    cql: select * from typestest where name = ? and choice = ? and date >= ? LIMIT 100
    fields: multirow            # samerow or multirow (select arguments from the same row, or randomly from all rows in the partition)
#
# A list of bulk read queries that analytics tools may perform against the schema
# Each query will sweep an entire token range, page by page.
#
token_range_queries:
  all_columns_tr_query:
    columns: '*'
    page_size: 5000
  value_tr_query:
    columns: value
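The configuration entries in the file have the following meanings:
keyspace/keyspace_definition/table/table_definition
These name the keyspace and table to test and give the CQL used to create them. The two definitions can be omitted if the keyspace and table already exist.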
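columnspec
Configures the generated columns. It serves a similar purpose to the -col command-line option.
name is the column name.
size is the distribution of the column's length. Here, for example, the length of the name column is uniformly distributed over 1..10, so 10% of the rows have length 1, 10% have length 2, and so on.
population is the distribution of the column's values. For example, the values of the lval column follow a Gaussian distribution. Specifying population for the text column name is less intuitive: the code draws a random seed from the uniform distribution and then uses that seed to generate the string content.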
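insert
Options for the insert (write) operation.
partitions is the distribution of the number of partitions updated in a single operation.
batchtype is the type of batch to use.
select is the ratio of rows within each partition that a single operation visits.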
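queries
The list of queries to run. For each query, cql is the statement text with ? placeholders, and fields is either samerow or multirow (take the bind arguments from the same row, or randomly from all rows in the partition).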
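token_range_queries
Defines bulk queries that scan large ranges.
Taking all_columns_tr_query as an example, the query is translated into a series of queries of the form select * from typestest where token(name,choice) > ? and token(name,choice) < ?, which together sweep the entire token range page by page. columns selects the columns to read, and page_size sets the page size.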
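To show how these entries fit together, the script below writes a much smaller profile to a temporary file. It is only a sketch under stated assumptions: the stress_sketch keyspace, the events table, its columns, the latest query, and every distribution value are hypothetical and chosen for illustration; only the entry names (keyspace, table, columnspec, insert, queries, and so on) come from the example file above.

```shell
#!/bin/sh
# Hypothetical output path for the sketch profile.
PROFILE="${TMPDIR:-/tmp}/stress-sketch.yaml"

# Write a minimal profile; the quoted 'EOF' keeps the shell from
# expanding anything inside the YAML.
cat > "${PROFILE}" <<'EOF'
keyspace: stress_sketch
keyspace_definition: |
  CREATE KEYSPACE stress_sketch WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1};
table: events
table_definition: |
  CREATE TABLE events (
    device text,
    ts timestamp,
    payload blob,
    PRIMARY KEY (device, ts)
  )
columnspec:
  - name: device
    size: fixed(8)                # every device name is 8 characters long
    population: uniform(1..1000)  # about 1000 distinct partition keys
  - name: ts
    cluster: uniform(10..20)      # 10 to 20 rows per partition
  - name: payload
    size: uniform(64..256)        # blob size in bytes
insert:
  partitions: fixed(1)            # one partition per insert operation
  batchtype: UNLOGGED
  select: fixed(1)/1              # visit every row of the partition
queries:
  latest:
    cql: select * from events where device = ? LIMIT 10
    fields: samerow
EOF

echo "wrote ${PROFILE}"
```

Such a file would then be passed to the user command through profile=.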
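Run the stress test with the following command. The ops(...) argument is quoted so that the shell does not interpret the parentheses; the weights give 3 inserts for every simple1 query:
cassandra-stress user profile=tools/cqlstress-example.yaml n=1000000 "ops(insert=3,simple1=1)"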
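The insert-only, read-only, and mixed variants listed in the comments at the top of the example profile can be sketched in the same way. The weights in ops(...) are relative, so insert=3,simple1=1 means 75% inserts. The run helper and the variable names are assumptions added for illustration; by default nothing is executed, only printed.

```shell
#!/bin/sh
PROFILE="tools/cqlstress-example.yaml"
INSERT_WEIGHT=3
QUERY_WEIGHT=1

# Illustrative helper (not part of cassandra-stress): print the command,
# and execute it only when DRY_RUN=0.
run() {
  echo "+ $*"
  if [ "${DRY_RUN:-1}" = "0" ]; then
    "$@"
  fi
}

# Share of inserts in the mixed workload, in percent (integer arithmetic).
INSERT_PCT=$(( 100 * $INSERT_WEIGHT / ($INSERT_WEIGHT + $QUERY_WEIGHT) ))
echo "insert share: ${INSERT_PCT}%"   # prints "insert share: 75%"

# Insert only: populate the table.
run cassandra-stress user "profile=${PROFILE}" n=1000000 "ops(insert=1)"

# Read only, using the simple1 query from the profile.
run cassandra-stress user "profile=${PROFILE}" n=1000000 "ops(simple1=1)"

# Mixed workload with the weights above.
run cassandra-stress user "profile=${PROFILE}" n=1000000 "ops(insert=${INSERT_WEIGHT},simple1=${QUERY_WEIGHT})"
```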