OpenTSDB: Writing Data

Writing Data

You may want to jump right in and start throwing data into your TSD, but to really take advantage of OpenTSDB's power and flexibility, you should first pause and think about your naming schema. After you've done that, you can proceed to pushing data over the Telnet or HTTP APIs, or use an existing tool with OpenTSDB support such as tcollector.

Naming Schema

Many metrics administrators are used to supplying a single name for their time series. For example, systems administrators used to RRD-style systems may name their time series webserver01.sys.cpu.0.user. The name tells us that the time series is recording the amount of time in user space for cpu 0 on webserver01. This works great if you want to retrieve just the user time for that cpu core on that particular web server later on.

But what if the web server has 64 cores and you want to get the average time across all of them? Some systems allow you to specify a wild card such as webserver01.sys.cpu.*.user that would read all 64 files and aggregate the results. Alternatively, you could record a new time series called webserver01.sys.cpu.user.all that represents the same aggregate, but you must now write 64 + 1 different time series. What if you had a thousand web servers and you wanted the average cpu time for all of your servers? You could craft a wild card query like *.sys.cpu.*.user and the system would open all 64,000 files, aggregate the results and return the data. Or you could set up a process to pre-aggregate the data and write it to webservers.sys.cpu.user.all.

OpenTSDB handles things a bit differently by introducing the idea of 'tags'. Each time series still has a 'metric' name, but it's much more generic, something that can be shared by many unique time series. Instead, the uniqueness comes from a combination of tag key/value pairs that allows for flexible queries with very fast aggregations.

Note

Every time series in OpenTSDB must have at least one tag.

Take the previous example where the metric was webserver01.sys.cpu.0.user. In OpenTSDB, this may become sys.cpu.user host=webserver01,cpu=0. Now if we want the data for an individual core, we can craft a query like sum:sys.cpu.user{host=webserver01,cpu=42}. If we want all of the cores, we simply drop the cpu tag and ask for sum:sys.cpu.user{host=webserver01}. This will give us the aggregated results for all 64 cores. If we want the results for all 1,000 servers, we simply request sum:sys.cpu.user. The underlying data schema will store all of the sys.cpu.user time series next to each other so that aggregating the individual values is very fast and efficient. OpenTSDB was designed to make these aggregate queries as fast as possible since most users start out at a high level, then drill down for detailed information.

Aggregations

While the tagging system is flexible, some problems can arise if you don't understand how the querying side of OpenTSDB works, hence the need for some forethought. Take the example query above: sum:sys.cpu.user{host=webserver01}. We recorded 64 unique time series for webserver01, one time series for each of the CPU cores. When we issued that query, all of the time series for metric sys.cpu.user with the tag host=webserver01 were retrieved, summed, and returned as one series of numbers. Let's say the resulting sum was 50 for timestamp 1356998400. Now suppose we were migrating from another system to OpenTSDB and had a process that pre-aggregated all 64 cores so that we could quickly get the total, and that process simply wrote a new time series sys.cpu.user host=webserver01. If we run the same query, we'll get a value of 100 at 1356998400. What happened? OpenTSDB aggregated all 64 per-core time series and the pre-aggregated time series to get to that 100. In storage, we would have something like this:

sys.cpu.user host=webserver01        1356998400  50
sys.cpu.user host=webserver01,cpu=0 1356998400 1
sys.cpu.user host=webserver01,cpu=1 1356998400 0
sys.cpu.user host=webserver01,cpu=2 1356998400 2
sys.cpu.user host=webserver01,cpu=3 1356998400 0
...
sys.cpu.user host=webserver01,cpu=63 1356998400 1

OpenTSDB will automatically aggregate all of the time series for the metric in a query if no tags are given. If one or more tags are defined, the aggregate will include all time series that match on those tags, regardless of any other tags they carry. With the query sum:sys.cpu.user{host=webserver01}, we would include sys.cpu.user host=webserver01,cpu=0 as well as sys.cpu.user host=webserver01,cpu=0,manufacturer=Intel, sys.cpu.user host=webserver01,foo=bar and sys.cpu.user host=webserver01,cpu=0,datacenter=lax,department=ops.

The moral of this example is: be careful with your naming schema.
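To make the matching rule concrete, here is a small in-memory model of the sum:sys.cpu.user{host=webserver01} query over the series listed above. It is a sketch and not OpenTSDB code. The per-core values are invented so that they total 50, only a single timestamp is modelled (so interpolation is ignored), and `matches` and `sum_query` are hypothetical helpers.

```python
# Hypothetical model of OpenTSDB tag matching at a single timestamp.
# Not OpenTSDB code: values are invented so the per-core series total 50.


def matches(series_tags, query_tags):
    """A series matches when it carries every queried tag with the same
    value; extra tags on the series are ignored."""
    return all(series_tags.get(k) == v for k, v in query_tags.items())


def sum_query(series, query_tags):
    """Sum the values of every series whose tags match the query."""
    return sum(value for tags, value in series if matches(tags, query_tags))


# 64 per-core series: 50 cores report 1 and 14 cores report 0 (total 50).
per_core = [
    ({"host": "webserver01", "cpu": str(i)}, 1 if i < 50 else 0)
    for i in range(64)
]
# The pre-aggregated series, written under the same metric name.
pre_aggregated = ({"host": "webserver01"}, 50)

print(sum_query(per_core, {"host": "webserver01"}))                     # 50
print(sum_query(per_core + [pre_aggregated], {"host": "webserver01"}))  # 100
```

Adding cpu=0 to the query would exclude the pre-aggregated series because that series has no cpu tag. Writing the pre-aggregate under a separate metric name avoids the double counting entirely.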

Time Series Cardinality

A critical aspect of any naming schema is to consider the cardinality of your time series. Cardinality is defined as the number of unique items in a set. In OpenTSDB's case, this means the number of items associated with a metric, i.e. all of the possible tag name and value combinations, as well as the number of unique metric names, tag names and tag values. Cardinality is important for two reasons outlined below.

(1) Limited Unique IDs (UIDs)

There is a limited number of unique IDs to assign for each metric, tag name and tag value. By default there are just over 16 million possible IDs per type. If, for example, you ran a very popular web service and tried to track the IP address of clients as a tag, e.g. web.app.hits clientip=38.26.34.10, you may quickly run into the UID assignment limit, as there are over 4 billion possible IP version 4 addresses. Additionally, this approach would lead to creating a very sparse time series, as the user at address 38.26.34.10 may only use your app sporadically, or perhaps never again from that specific address.

The UID limit is usually not an issue, however. A tag value is assigned a UID that is completely disassociated from its tag name. If you use numeric identifiers for tag values, the number is assigned a UID once and can be used with many tag names. For example, if we assign a UID to the number 2, we could store time series with the tag pairs cpu=2, interface=2, hdd=2 and fan=2 while consuming only one tag value UID (for 2) and four tag name UIDs (cpu, interface, hdd and fan).

If you think that the UID limit may impact you, first think about the queries that you want to execute. If we look at the web.app.hits example above, you probably only care about the total number of hits to your service and rarely need to drill down to a specific IP address. In that case, you may want to store the IP address as an annotation. That way you still benefit from low cardinality, and if you need to, you can search the results for that particular IP using external scripts. (Note: Support for annotation queries is expected in a future version of OpenTSDB.)

If you desperately need more than 16 million values, you can increase the number of bytes that OpenTSDB uses to encode UIDs from 3 bytes up to a maximum of 8 bytes. This change would require modifying the value in source code, recompiling, deploying your customized code to all TSDs which will access this data, and maintaining this customization across all future patches and releases.
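The "just over 16 million" figure is simply the number of distinct values a 3-byte identifier can hold. The sketch below is plain arithmetic, not OpenTSDB code, and `uid_space` is a hypothetical helper. It shows why the IPv4 example overflows the default width and how the space grows with wider UIDs. It counts raw bit patterns and ignores any values OpenTSDB might reserve.

```python
IPV4_ADDRESSES = 2 ** 32  # every possible IPv4 address


def uid_space(width_bytes: int) -> int:
    """Number of distinct values a UID of `width_bytes` bytes can encode."""
    if not 3 <= width_bytes <= 8:
        raise ValueError("this sketch covers widths from 3 (default) to 8 bytes")
    return 2 ** (8 * width_bytes)


for width in (3, 4, 8):
    space = uid_space(width)
    print(f"{width} bytes: {space:,} UIDs, covers all IPv4 addresses: {space >= IPV4_ADDRESSES}")
```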

(2) Query Speed

Cardinality also affects query speed a great deal, so consider the queries you will be performing frequently and optimize your naming schema for those. OpenTSDB creates a new row per time series per hour. If we have the time series sys.cpu.user host=webserver01,cpu=0 with data written every second for 1 day, that would result in 24 rows of data. However, if we have 8 possible CPU cores for that host, we now have 192 rows of data. This looks good because we can easily get a sum or average of CPU usage across all cores by issuing a query like start=1d-ago&m=avg:sys.cpu.user{host=webserver01}.

However what if we have 20,000 hosts, each with 8 cores? Now we will have 3.8 million rows per day due to a high cardinality of host values. Queries for the average core usage on host webserver01 will be slower as it must pick out 192 rows out of 3.8 million.
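The row counts above follow directly from the rule of one row per time series per hour. Here is a quick sketch of the arithmetic. The function name is hypothetical.

```python
def rows_per_day(hosts: int, series_per_host: int, hours: int = 24) -> int:
    """OpenTSDB writes one row per time series per hour."""
    return hosts * series_per_host * hours


one_core = rows_per_day(hosts=1, series_per_host=1)      # 24
one_host = rows_per_day(hosts=1, series_per_host=8)      # 192
fleet = rows_per_day(hosts=20_000, series_per_host=8)    # 3,840,000

# Share of the metric's rows that a single-host query actually needs.
print(f"{one_host:,} of {fleet:,} rows = {one_host / fleet:.4%}")
```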

The benefit of this schema is that you have very fine granularity in your data, e.g., storing usage metrics on a per-core basis. You can also easily craft a query to get the average usage across all cores and all hosts: start=1d-ago&m=avg:sys.cpu.user. However, queries against that particular metric will take longer as there are more rows to sift through.

Here are some common means of dealing with cardinality:

Pre-Aggregate - In the example above with sys.cpu.user, you generally care about the average usage on the host, not the usage per core. While the data collector may send a separate value per core with the tagging schema above, the collector could also send one extra data point such as sys.cpu.user.avg host=webserver01. Now you have a completely separate time series that would only have 24 rows per day and, with 20K hosts, only 480K rows to sift through. Queries will be much more responsive for the per-host average, and you still have per-core data to drill down to separately.

Shift to Metric - What if you really only care about the metrics for a particular host and don't need to aggregate across hosts? In that case you can shift the hostname to the metric. Our previous example becomes sys.cpu.user.websvr01 cpu=0. Queries against this schema are very fast as there would only be 192 rows per day for the metric. However, to aggregate across hosts you would have to execute multiple queries and aggregate outside of OpenTSDB. (Future work will include this capability.)
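For the Pre-Aggregate approach, a collector can emit the per-core points and the per-host average in the same pass. This is a minimal sketch that builds Telnet-style put lines. The function name and the sample per-core values are hypothetical. The average is written to its own metric, sys.cpu.user.avg, so it is never summed together with the per-core sys.cpu.user series.

```python
def cpu_user_lines(host: str, per_core: list[float], timestamp: int) -> list[str]:
    """Build one put line per core plus one pre-aggregated average line."""
    lines = [
        f"put sys.cpu.user {timestamp} {value} host={host} cpu={core}"
        for core, value in enumerate(per_core)
    ]
    average = sum(per_core) / len(per_core)
    # Separate metric name, so queries on sys.cpu.user never include it.
    lines.append(f"put sys.cpu.user.avg {timestamp} {average} host={host}")
    return lines


for line in cpu_user_lines("webserver01", [10, 20, 30, 40], 1356998400):
    print(line)
```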

Naming Conclusion

When you design your naming schema, keep these suggestions in mind:

  • Be consistent with your naming to reduce duplication. Always use the same case for metrics, tag names and values.
  • Use the same number and type of tags for each metric. E.g. don't store my.metric host=foo and my.metric datacenter=lga.
  • Think about the most common queries you'll be executing and optimize your schema for those queries.
  • Think about how you may want to drill down when querying.
  • Don't use too many tags, keep it to a fairly small number, usually up to 4 or 5 tags (By default, OpenTSDB supports a maximum of 8 tags).

Data Specification

Every time series data point requires the following data:

  • metric - A generic name for the time series such as sys.cpu.user, stock.quote or env.probe.temp.
  • timestamp - A Unix/POSIX epoch timestamp in seconds or milliseconds, defined as the number of seconds (or milliseconds) that have elapsed since January 1st, 1970 at 00:00:00 UTC. Only positive timestamps are supported at this time.
  • value - A numeric value to store at the given timestamp for the time series. This may be an integer or a floating point value.
  • tag(s) - A key/value pair consisting of a tagk (the key) and a tagv (the value). Each data point must have at least one tag.

Timestamps

Data can be written to OpenTSDB with second or millisecond resolution. Timestamps must be integers and be no longer than 13 digits. Millisecond timestamps must be of the format 1364410924250, where the final three digits represent the milliseconds. Applications that generate timestamps with more than 13 digits (i.e., finer than millisecond resolution) must round them to a maximum of 13 digits before submitting, or an error will be generated.

Timestamps with second resolution are stored on 2 bytes, while those with millisecond resolution are stored on 4. Thus, if you do not need millisecond resolution or all of your data points fall on 1-second boundaries, we recommend that you submit timestamps with 10 digits for second resolution so that you can save on storage space. It's also a good idea to avoid mixing second and millisecond timestamps for a given time series. Doing so will slow down queries, as iteration across mixed timestamps takes longer than if you only record one type or the other. OpenTSDB will store whatever you give it.
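Here is a small sketch of the timestamp rules above. It rounds anything finer than milliseconds down to 13 digits, and it falls back to a 10-digit second timestamp when the millisecond part is zero, which saves storage. The helper names are hypothetical. Integer arithmetic is used so that 16- or 19-digit inputs do not lose precision.

```python
def to_millis(ts: int) -> int:
    """Round a positive epoch timestamp with more than 13 digits
    (e.g. microseconds or nanoseconds) to 13 digits (milliseconds)."""
    if ts <= 0:
        raise ValueError("only positive timestamps are supported")
    extra_digits = len(str(ts)) - 13
    if extra_digits <= 0:
        return ts
    divisor = 10 ** extra_digits
    return (ts + divisor // 2) // divisor  # round half up


def prefer_seconds(ts: int) -> int:
    """Use a 10-digit second timestamp when a millisecond one lands on a whole second."""
    if len(str(ts)) == 13 and ts % 1000 == 0:
        return ts // 1000
    return ts


print(to_millis(1364410924250123))    # microseconds -> 1364410924250
print(prefer_seconds(1356998400000))  # -> 1356998400
print(prefer_seconds(1364410924250))  # unchanged: real milliseconds
```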

Metrics and Tags

The following rules apply to metric and tag values:

  • Strings are case sensitive, i.e. "Sys.Cpu.User" will be stored separately from "sys.cpu.user"
  • Spaces are not allowed
  • Only the following characters are allowed: a to z, A to Z, 0 to 9, -, _, ., / or Unicode letters (as per the specification)

Metric and tags are not limited in length, though you should try to keep the values fairly short.
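The character rules can be checked on the client before a write is sent. This sketch assumes that Python's `str.isalpha()` is a close enough stand-in for "Unicode letters". The TSD's own check may differ at the edges, so treat this as a pre-filter, not a definitive validator. The function name is hypothetical.

```python
import string

# ASCII characters explicitly allowed by the rules above.
ASCII_ALLOWED = frozenset(string.ascii_letters + string.digits + "-_./")


def is_valid_name(name: str) -> bool:
    """True if a non-empty metric, tag key or tag value uses only allowed
    characters: the ASCII set above or a Unicode letter (no spaces)."""
    return bool(name) and all(ch in ASCII_ALLOWED or ch.isalpha() for ch in name)


for candidate in ("sys.cpu.user", "Sys.Cpu.User", "sys cpu user", "temp°C", "données"):
    print(repr(candidate), is_valid_name(candidate))
```

"Sys.Cpu.User" passes the check but, because names are case sensitive, it would still be stored as a different series from "sys.cpu.user".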

Integer Values

If the value from a put command is parsed without a decimal point (.), it will be treated as a signed integer. Integers are stored, unsigned, with variable length encoding so that a data point may take as little as 1 byte of space or up to 8 bytes. This means a data point can have a minimum value of -9,223,372,036,854,775,808 and a maximum value of 9,223,372,036,854,775,807 (inclusive). Integers cannot have commas or any character other than digits and the dash (for negative values). For example, in order to store the maximum value, it must be provided in the form 9223372036854775807.
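The sketch below mirrors the parsing rule (no decimal point means integer). It then picks the smallest of the 1, 2, 4 or 8 byte widths, mentioned under Duplicate Data Points, that can hold a given integer. It assumes a signed two's-complement range for each width, which matches the minimum and maximum stated above. The actual on-disk encoding is up to OpenTSDB, and the helper names are hypothetical.

```python
import math
import re

INTEGER_RE = re.compile(r"-?[0-9]+")  # digits and an optional leading dash only


def parse_value(text: str):
    """Return an int when there is no decimal point, otherwise a float."""
    if "." in text:
        value = float(text)
        if not math.isfinite(value):
            raise ValueError("Infinity and NaN are not supported")
        return value
    if not INTEGER_RE.fullmatch(text):
        raise ValueError(f"invalid integer: {text!r}")  # e.g. "9,223"
    value = int(text)
    if not -(2 ** 63) <= value < 2 ** 63:
        raise ValueError("integer outside the signed 64-bit range")
    return value


def integer_width(value: int) -> int:
    """Smallest width in bytes (1, 2, 4 or 8) whose signed range holds value."""
    for width in (1, 2, 4, 8):
        limit = 1 << (8 * width - 1)
        if -limit <= value < limit:
            return width
    raise OverflowError("value does not fit in 8 bytes")


for raw in ("42", "-129", "70000", "9223372036854775807"):
    print(raw, integer_width(parse_value(raw)))
```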

Floating Point Values

If the value from a put command is parsed with a decimal point (.) it will be treated as a floating point value. Currently all floating point values are stored on 4 bytes, single-precision, with support for 8 bytes planned for a future release. Floats are stored in IEEE 754 floating-point "single format" with positive and negative value support. Infinity and Not-a-Number values are not supported and will throw an error if supplied to a TSD. See Wikipedia and the Java Documentation for details.

Note

Because OpenTSDB stores non-integer values only as floating point numbers, it is not suitable for storing measurements that require exact decimal values, like currency. This is why, when storing a value like 15.2, the database may return 15.199999809265137.
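The 15.2 example can be reproduced by round-tripping a value through IEEE 754 single precision, the 4-byte format the text says floats are currently stored in. This uses only Python's `struct` module. `as_stored_float` is a hypothetical name.

```python
import struct


def as_stored_float(value: float) -> float:
    """Round-trip a value through 4-byte IEEE 754 single precision."""
    return struct.unpack("<f", struct.pack("<f", value))[0]


print(as_stored_float(15.2))  # 15.199999809265137
print(as_stored_float(0.5))   # 0.5 (exactly representable)
```

If exact values matter and the quantity has a natural smallest unit, one option is to store an integer count of that unit (for example cents), since integer values are not subject to this rounding.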

Ordering

Unlike other solutions, OpenTSDB allows for writing data for a given time series in any order you want. This enables significant flexibility in writing data to a TSD, allowing for populating current data from your systems, then importing historical data at a later time.

Duplicate Data Points

Writing data points in OpenTSDB is generally idempotent within an hour of the original write. This means you can write the value 42 at timestamp 1356998400 and then write 42 again for the same time and nothing bad will happen. However, if you have compactions enabled to reduce storage consumption and write the same data point after the row of data has been compacted, an exception may be returned when you query over that row. If you attempt to write two different values with the same timestamp, a duplicate data point exception may be thrown during query time. This is due to a difference in encoding integers on 1, 2, 4 or 8 bytes and floating point numbers. If the first value was an integer and the second a floating point, the duplicate error will always be thrown. However, if both values were floats, or they were both integers that could be encoded on the same length, then the original value may be overwritten if a compaction has not occurred on the row.

In most situations, if a duplicate data point is written, it is an indication that something went wrong with the data source, such as a process restarting unexpectedly or a bug in a script. OpenTSDB will fail "safe" by throwing an exception when you query over a row with one or more duplicates, so you can track down the issue.

With OpenTSDB 2.1 you can enable last-write-wins by setting the tsd.storage.fix_duplicates configuration value to true. With this flag enabled, at query time, the most recent value recorded will be returned instead of throwing an exception. A warning will also be written to the log file noting a duplicate was found. If compaction is also enabled, then the original compacted value will be overwritten with the latest value.
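The two behaviours can be modelled as follows. By default, a conflicting duplicate is an error. With tsd.storage.fix_duplicates enabled, the latest write wins and a warning is logged. This is a toy in-memory sketch of the semantics described above, not how the TSD implements them. It ignores compaction, and `resolve_duplicates` is hypothetical. It treats an integer and a float as conflicting even when they are numerically equal, mirroring the rule that mixing the two always raises the duplicate error.

```python
import logging


def _same_point(a, b) -> bool:
    """Identical type and value; 42 and 42.0 count as different."""
    return type(a) is type(b) and a == b


def resolve_duplicates(points, fix_duplicates=False):
    """Collapse (timestamp, value) pairs given in write order.

    Rewriting an identical value is harmless. A conflicting value raises
    unless fix_duplicates is True, in which case the last write wins.
    """
    resolved = {}
    for timestamp, value in points:
        if timestamp in resolved and not _same_point(resolved[timestamp], value):
            if not fix_duplicates:
                raise ValueError(f"duplicate data point at {timestamp}")
            logging.warning("duplicate at %d: %r replaced by %r",
                            timestamp, resolved[timestamp], value)
        resolved[timestamp] = value
    return sorted(resolved.items())


print(resolve_duplicates([(1356998400, 42), (1356998400, 42)]))
print(resolve_duplicates([(1356998400, 42), (1356998400, 43)], fix_duplicates=True))
```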

Input Methods

There are currently three main methods to get data into OpenTSDB: Telnet API, HTTP API and batch import from a file. Alternatively you can use a tool that provides OpenTSDB support, or if you're extremely adventurous, use the Java library.

Telnet

The easiest way to get started with OpenTSDB is to open up a terminal or telnet client, connect to your TSD and issue a put command and hit 'enter'. If you are writing a program, simply open a socket, print the string command with a new line and send the packet. The telnet command format is:

put <metric> <timestamp> <value> <tagk1=tagv1[ tagk2=tagv2 ...tagkN=tagvN]>

For example:

put sys.cpu.user 1356998400 42.5 host=webserver01 cpu=0

Each put can only send a single data point. Don't forget the newline character, e.g. \n at the end of your command.
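Here is a minimal sketch of the "open a socket and print the command" approach in Python, using only the standard library. The host name is a placeholder, and 4242 is assumed to be your TSD's listening port (it is OpenTSDB's default, but check your configuration). Several put lines are written over one connection, each terminated by a newline.

```python
import socket


def format_put(metric: str, timestamp: int, value, tags: dict) -> str:
    """Build one newline-terminated Telnet put command."""
    if not tags:
        raise ValueError("every data point needs at least one tag")
    tag_str = " ".join(f"{k}={v}" for k, v in tags.items())
    return f"put {metric} {timestamp} {value} {tag_str}\n"


def send_puts(lines, host: str = "localhost", port: int = 4242, timeout: float = 5.0) -> None:
    """Send pre-formatted put lines over a single TCP connection."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall("".join(lines).encode("utf-8"))


line = format_put("sys.cpu.user", 1356998400, 42.5, {"host": "webserver01", "cpu": 0})
print(repr(line))
# send_puts([line])  # requires a TSD listening on localhost:4242
```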

HTTP API

As of version 2.0, data can be sent over HTTP in formats supported by 'Serializer' plugins. Multiple, unrelated data points can be sent in a single HTTP POST request to save bandwidth. See /api/put for details.
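Here is a sketch of building a batched /api/put request with Python's standard library. It assumes the JSON shape used by the default 2.x serializer: a list of objects with metric, timestamp, value and tags. The base URL is a placeholder. The request is only constructed here, and sending it is left as a commented-out line.

```python
import json
import urllib.request


def build_put_request(points, base_url: str = "http://localhost:4242") -> urllib.request.Request:
    """Build one POST to /api/put carrying several data points."""
    return urllib.request.Request(
        base_url + "/api/put",
        data=json.dumps(points).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


points = [
    {"metric": "sys.cpu.user", "timestamp": 1356998400, "value": 42.5,
     "tags": {"host": "webserver01", "cpu": "0"}},
    {"metric": "sys.cpu.user", "timestamp": 1356998400, "value": 38,
     "tags": {"host": "webserver01", "cpu": "1"}},
]
request = build_put_request(points)
# with urllib.request.urlopen(request) as response:
#     print(response.status)
```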

Batch Import

If you are importing data from another system or you need to backfill historical data, you can use the import CLI utility. See import for details.
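For backfills, a script can write data points to a file and hand it to the import utility. This sketch assumes two things you should verify in the import documentation for your version: the import format is one data point per line, laid out like the Telnet put command without the leading put, and gzip-compressed files are accepted. The file name and sample data are hypothetical.

```python
import gzip
import os
import tempfile


def write_import_file(path: str, points) -> int:
    """Write (metric, timestamp, value, tags) tuples, one per line, gzip-compressed.

    Returns the number of lines written.
    """
    count = 0
    with gzip.open(path, "wt", encoding="utf-8") as f:
        for metric, timestamp, value, tags in points:
            tag_str = " ".join(f"{k}={v}" for k, v in tags.items())
            f.write(f"{metric} {timestamp} {value} {tag_str}\n")
            count += 1
    return count


history = [
    ("sys.cpu.user", 1356998400 + 60 * i, i % 100, {"host": "webserver01", "cpu": "0"})
    for i in range(1440)  # one day of per-minute points
]
path = os.path.join(tempfile.gettempdir(), "backfill.gz")  # hypothetical file name
written = write_import_file(path, history)
# Then, on a machine with OpenTSDB installed, something like: tsdb import <path>
```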

Write Performance

OpenTSDB can scale to writing millions of data points per second on commodity servers with regular spinning hard drives. However, users who fire up a VM with HBase in stand-alone mode and try to slam millions of data points at a brand new TSD are disappointed when they can only write data in the hundreds of points per second. Here's what you need to do to scale for brand new installs or testing, and for expanding existing systems.

UID Assignment

The first sticking point folks run into is UID assignment. Every string for a metric, tag key and tag value must be assigned a UID before the data point can be stored. For example, the metric sys.cpu.user may be assigned a UID of 000001 the first time it is encountered by a TSD. This assignment takes a fair amount of time as it must fetch an available UID, write a UID to name mapping and a name to UID mapping, then use the UID to write the data point's row key. The UID will be stored in the TSD's cache so that the next time the same metric comes through, it can find the UID very quickly.

Therefore, we recommend that you 'pre-assign' UIDs to as many metrics, tag keys and tag values as you can. If you have designed a naming schema as recommended above, you'll know most of the values to assign. You can use the CLI tools mkmetric and uid, or the HTTP API /api/uid, to perform pre-assignments. Any time you are about to send a bunch of new metrics or tags to a running OpenTSDB cluster, try to pre-assign them, or the TSDs will bog down a bit when they get the new data.
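Here is a sketch of pre-assigning UIDs over HTTP. It assumes the /api/uid/assign endpoint of OpenTSDB 2.x, which takes a JSON object with metric, tagk and tagv lists. Check the endpoint against the API documentation for your version. The base URL and the names are placeholders drawn from the examples above. As with the /api/put sketch, the request is built but not sent.

```python
import json
import urllib.request


def build_uid_assign_request(metrics, tagks, tagvs, base_url: str = "http://localhost:4242"):
    """Build one POST asking the TSD to assign UIDs for all names at once."""
    payload = {
        "metric": sorted(set(metrics)),
        "tagk": sorted(set(tagks)),
        "tagv": sorted(set(tagvs)),
    }
    return urllib.request.Request(
        base_url + "/api/uid/assign",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


hosts = [f"webserver{n:02d}" for n in range(1, 4)]
cores = [str(c) for c in range(8)]
request = build_uid_assign_request(
    metrics=["sys.cpu.user", "sys.cpu.user.avg"],
    tagks=["host", "cpu"],
    tagvs=hosts + cores,
)
# urllib.request.urlopen(request)
```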

Note

If you restart a TSD, it will have to look up the UID for every metric and tag, so performance will be a little slow until the cache is filled.

Pre-Split HBase Regions

For brand new installs you will see much better performance if you pre-split the regions in HBase, regardless of whether you're testing on a stand-alone server or running a full cluster. HBase regions handle a defined range of row keys and are essentially a single file. When you create the tsdb table and start writing data for the first time, all of those data points are being sent to this one file on one server. As a region fills up, HBase will automatically split it into different files and move it to other servers in the cluster, but when this happens, the TSDs cannot write to the region and must buffer the data points. Therefore, if you can pre-allocate a number of regions before you start writing, the TSDs can send data to multiple files or servers and you'll be taking advantage of the linear scalability immediately.

The simplest way to pre-split your tsdb table regions is to estimate the number of unique metric names you'll be recording. If you have designed a naming schema, you should have a pretty good idea. Let's say that we will track 4,000 metrics in our system. That's not to say 4,000 time series, as we're not counting the tags yet, just the metric names such as "sys.cpu.user". Data points are written in row keys where the metric's UID comprises the first bytes, 3 bytes by default. The first metric will be assigned a UID of 000001 as a hex encoded value. The 4,000th metric will have a UID of 000FA0 in hex. You can use these as the start and end keys in the script from the HBase Book to split your table into any number of regions. 256 regions may be a good place to start depending on how many time series share each metric.
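The split points can be computed rather than typed by hand. This sketch assumes that row keys start with a 3-byte metric UID (the default, with no salting) and that metric UIDs run from 000001 to 000FA0, as in the 4,000-metric example. It spaces 255 split keys evenly, so the table starts with 256 regions. `metric_split_keys` and `shell_literal` are hypothetical helpers. The rendered keys are meant for the SPLITS list of an HBase shell create command.

```python
def metric_split_keys(num_metrics: int, num_regions: int, uid_width: int = 3) -> list[bytes]:
    """Evenly spaced split keys between the first and last metric UID."""
    first, last = 1, num_metrics
    uids = sorted({first + (i * (last - first)) // num_regions for i in range(1, num_regions)})
    return [uid.to_bytes(uid_width, "big") for uid in uids]


def shell_literal(key: bytes) -> str:
    """Render key bytes as a double-quoted, \\x-escaped string for the HBase shell."""
    return '"' + "".join(f"\\x{b:02X}" for b in key) + '"'


keys = metric_split_keys(num_metrics=4000, num_regions=256)
print(len(keys), shell_literal(keys[0]), shell_literal(keys[-1]))
print(f"{4000:06X}")  # the 4,000th metric's UID in hex
```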

But don't worry too much about splitting. As stated above, HBase will automatically split regions for you so over time, the data will be distributed fairly evenly.

Distributed HBase

HBase can run in stand-alone mode, where it uses the local file system for storing files. It will still use multiple regions and perform as well as the underlying disk or RAID array will let it. You'll definitely want a RAID array under HBase so that if a drive fails, you can replace it without losing data. This kind of setup is fine for testing or very small installations, and you should be able to get into the low thousands of data points per second.

However if you want serious throughput and scalability you have to setup a Hadoop and HBase cluster with multiple servers. In a distributed setup HDFS manages region files, automatically distributing copies to different servers for fault tolerance. HBase assigns regions to different servers and OpenTSDB's client will send data points to the specific server where they will be stored. You're now spreading operations amongst multiple servers, increasing performance and storage. If you need even more throughput or storage, just add nodes or disks.

There are a number of ways to set up a Hadoop/HBase cluster and a ton of tuning tweaks to make, so Google around and ask user groups for advice. Some general recommendations include:

  • Dedicate a pair of high memory, low disk space servers for the Name Node. Set them up for high availability using something like Heartbeat and Pacemaker.
  • Set up ZooKeeper on at least 3 servers for fault tolerance. They must have a lot of RAM and a fairly fast disk for log writing. On small clusters, these can run on the Name Node servers.
  • JBOD for the HDFS data nodes
  • HBase region servers can be co-located with the HDFS data nodes
  • At least 1 Gbps links between servers, 10 Gbps preferable.
  • Keep the cluster in a single data center
 
Multiple TSDs

A single TSD can handle thousands of writes per second. But if you have many sources it's best to scale by running multiple TSDs and using a load balancer (such as Varnish or DNS round robin) to distribute the writes. Many users colocate TSDs on their HBase region servers when the cluster is dedicated to OpenTSDB.
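When a dedicated load balancer is not available, a collector can spread its writes itself. The sketch below is a client-side round-robin picker. It is a simpler stand-in for the Varnish or DNS round-robin setups mentioned above, not an OpenTSDB feature, and the TSD host names are hypothetical.

```python
import itertools


class RoundRobinTSDs:
    """Hand out TSD endpoints in turn so writes are spread evenly."""

    def __init__(self, endpoints):
        endpoints = list(endpoints)
        if not endpoints:
            raise ValueError("need at least one TSD endpoint")
        self._cycle = itertools.cycle(endpoints)

    def next_endpoint(self):
        return next(self._cycle)


tsds = RoundRobinTSDs([
    ("tsd1.example.com", 4242),
    ("tsd2.example.com", 4242),
    ("tsd3.example.com", 4242),
])
for _ in range(4):
    print(tsds.next_endpoint())
```

A real client would typically keep one open connection per endpoint rather than reconnecting for each write.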

Persistent Connections

Enable keep-alives in the TSDs and make sure that any applications you are using to send time series data keep their connections open instead of opening and closing them for every write. See Configuration for details.
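On the client side, keeping the connection open mostly means reusing one connection object for many writes. This sketch uses `http.client.HTTPConnection`, which speaks HTTP/1.1. It reuses the underlying socket as long as each response is fully read and the server keeps the connection alive. The host, port and class name are assumptions.

```python
import http.client
import json


class TsdHttpWriter:
    """Send batches to /api/put over one persistent HTTP/1.1 connection."""

    def __init__(self, host: str = "localhost", port: int = 4242, timeout: float = 10.0):
        # No network traffic happens until the first request.
        self._conn = http.client.HTTPConnection(host, port, timeout=timeout)

    def put(self, points) -> int:
        self._conn.request(
            "POST", "/api/put",
            body=json.dumps(points).encode("utf-8"),
            headers={"Content-Type": "application/json"},
        )
        response = self._conn.getresponse()
        response.read()  # drain the body so the socket can be reused
        return response.status

    def close(self) -> None:
        self._conn.close()


writer = TsdHttpWriter()
# status = writer.put([{"metric": "sys.cpu.user", "timestamp": 1356998400,
#                       "value": 42.5, "tags": {"host": "webserver01"}}])
writer.close()
```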

Disable Meta Data and Real Time Publishing

OpenTSDB 2.0 introduced meta data for tracking the kinds of data in the system. When tracking is enabled, a counter is incremented for every data point written and new UIDs or time series will generate meta data. The data may be pushed to a search engine or passed through tree generation code. These processes require greater memory in the TSD and may affect throughput. Tracking is disabled by default so test it out before enabling the feature.

2.0 also introduced a real-time publishing plugin where incoming data points can be emitted to another destination immediately after they're queued for storage. This is disabled by default, so test any plugins you are interested in before deploying in production.

References

1. http://opentsdb.net/docs/build/html/user_guide/writing.html

2. http://en.wikipedia.org/wiki/IPv4_address_exhaustion

3. Pacemaker: http://asram.blog.51cto.com/1442164/351135
