The author selected the Computer History Museum to receive a donation as part of the Write for DOnations program.
Introduction
Many applications, such as monitoring systems and data collection systems, accumulate data for further analysis. These analyses often look at the way a piece of data or a system changes over time. In these instances, data is represented as a time series, with every data point accompanied by a timestamp. An example would look like this:
2019-11-01 09:00:00 server.cpu.1 0.9
2019-11-01 09:00:00 server.cpu.15 0.8
2019-11-01 09:01:00 server.cpu.1 0.9
2019-11-01 09:01:00 server.cpu.15 0.8
...
The relevance of time series data has recently grown thanks to the new deployments of the Internet of Things (IoT) and Industrial Internet of Things. There are more and more devices that collect various time series information: fitness trackers, smart watches, home weather stations, and various sensors, to name a few. These devices collect a lot of information, and all this data must be stored somewhere.
Classic relational databases are most often used to store data, but they don’t always fit when it comes to the huge data volumes of time series. When you need to process a large amount of time series data, relational databases can be too slow. Because of this, specially optimized databases, called NoSQL databases, have been created to avoid the problems of relational databases.
TimescaleDB is an open-source database optimized for storing time series data. It is implemented as an extension of PostgreSQL and combines the ease-of-use of relational databases and the speed of NoSQL databases. As a result, it allows you to use PostgreSQL for both storing business data and time series data in one place.
By following this tutorial, you’ll set up TimescaleDB on CentOS 7, configure it, and learn how to work with it. You’ll run through creating time series databases and making simple queries. Finally, you’ll see how to remove unnecessary data.
Prerequisites
To follow this tutorial, you will need:
- One CentOS 7 server set up by following our Initial Server Setup with CentOS 7 guide, including a non-root user with sudo privileges and a firewall set up with firewalld. To set up firewalld, follow the “Configuring a Basic Firewall” section of the Additional Recommended Steps for New CentOS 7 Servers tutorial.
- PostgreSQL installed on your server. Follow our How To Install and Use PostgreSQL on CentOS 7 guide to install and configure it.
Step 1 — Installing TimescaleDB
TimescaleDB is not available in the CentOS default package repositories, so in this step you will install it from TimescaleDB’s third-party repository.
First, create a new repository file:
sudo vi /etc/yum.repos.d/timescaledb.repo

Enter insert mode by pressing i and paste the following configuration into the file:

/etc/yum.repos.d/timescaledb.repo
[timescale_timescaledb]
name=timescale_timescaledb
baseurl=https://packagecloud.io/timescale/timescaledb/el/7/$basearch
repo_gpgcheck=1
gpgcheck=0
enabled=1
gpgkey=https://packagecloud.io/timescale/timescaledb/gpgkey
sslverify=1
sslcacert=/etc/pki/tls/certs/ca-bundle.crt
metadata_expire=300
When you’re finished, press ESC to leave insert mode, then type :wq and press ENTER to save and exit the file. To learn more about the text editor vi and its successor vim, check out our Installing and Using the Vim Text Editor on a Cloud Server tutorial.
You can now proceed with the installation. This tutorial uses PostgreSQL version 11; if you are using a different version of PostgreSQL (9.6 or 10, for example), replace the value in the following command and run it:

sudo yum install -y timescaledb-postgresql-11
TimescaleDB is now installed and ready to be used. Next, you will turn it on and adjust some of the settings associated with it in the PostgreSQL configuration file to optimize the database.
Step 2 — Configuring TimescaleDB
The TimescaleDB module works fine with the default PostgreSQL configuration settings, but to improve performance and make better use of processor, memory, and disk resources, the developers of TimescaleDB suggest configuring some individual parameters. This can be done automatically with the timescaledb-tune tool or by manually editing your server’s postgresql.conf file.
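If you prefer the manual route, the one setting TimescaleDB strictly requires is preloading its library at server start. A minimal hand edit to postgresql.conf would look like this sketch (the value matches what timescaledb-tune writes later in this step; the other tuning parameters it recommends are hardware-dependent and are better left to the tool):

```ini
# /var/lib/pgsql/11/data/postgresql.conf
# TimescaleDB must be preloaded when the server starts; memory and
# parallelism parameters can then be adjusted by hand if desired.
shared_preload_libraries = 'timescaledb'
```

As with the tool-driven configuration, a PostgreSQL restart is needed afterwards for the change to take effect.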
In this tutorial, you will use the timescaledb-tune tool. It reads the postgresql.conf file and interactively suggests changes.
Run the following command to start the configuration wizard:

sudo timescaledb-tune --pg-config=/usr/pgsql-11/bin/pg_config
First, you will be asked to confirm the path to the PostgreSQL configuration file:
Output
Using postgresql.conf at this path:
/var/lib/pgsql/11/data/postgresql.conf

Is this correct? [(y)es/(n)o]:
The utility automatically detects the path to the configuration file, so confirm this by entering y:

Output
...
Is this correct? [(y)es/(n)o]: y
Writing backup to:
/tmp/timescaledb_tune.backup201912191633
Next, enable the TimescaleDB module by typing y at the next prompt and pressing ENTER:

Output
shared_preload_libraries needs to be updated
Current:
#shared_preload_libraries = ''
Recommended:
shared_preload_libraries = 'timescaledb'
Is this okay? [(y)es/(n)o]: y
success: shared_preload_libraries will be updated
Based on the characteristics of your server and the PostgreSQL version, the utility will then offer to tune your settings. Press y to start the tuning process:

Output
Tune memory/parallelism/WAL and other settings? [(y)es/(n)o]: y
Recommendations based on 7.64 GB of available memory and 4 CPUs for PostgreSQL 11

Memory settings recommendations
Current:
shared_buffers = 128MB
#effective_cache_size = 4GB
#maintenance_work_mem = 64MB
#work_mem = 4MB
Recommended:
shared_buffers = 1955MB
effective_cache_size = 5865MB
maintenance_work_mem = 1001121kB
work_mem = 5005kB
Is this okay? [(y)es/(s)kip/(q)uit]:
timescaledb-tune will automatically detect the server’s available memory and calculate recommended values for the shared_buffers, effective_cache_size, maintenance_work_mem, and work_mem settings. If you’d like to learn more about how this is done, check out the GitHub page for timescaledb-tune.

If these settings look OK, enter y:
Output
...
Is this okay? [(y)es/(s)kip/(q)uit]: y
success: memory settings will be updated

At this point, if your server has multiple CPUs, you will find recommendations for parallelism settings. However, if you have one CPU, timescaledb-tune will send you directly to the WAL settings.

Those with multiple CPUs will encounter recommendations like this:
Output
Parallelism settings recommendations
Current:
missing: timescaledb.max_background_workers
#max_worker_processes = 8
#max_parallel_workers_per_gather = 2
#max_parallel_workers = 8
Recommended:
timescaledb.max_background_workers = 8
max_worker_processes = 15
max_parallel_workers_per_gather = 2
max_parallel_workers = 4
Is this okay? [(y)es/(s)kip/(q)uit]:

These settings regulate the number of workers that process requests and background tasks. You can learn more about these settings from the TimescaleDB and PostgreSQL documentation.
Type y then ENTER to accept these settings:

Output
...
Is this okay? [(y)es/(s)kip/(q)uit]: y
success: parallelism settings will be updated
Next, you will find recommendations for the Write-Ahead Log (WAL):

Output
WAL settings recommendations
Current:
#wal_buffers = -1
#min_wal_size = 80MB
#max_wal_size = 1GB
Recommended:
wal_buffers = 16MB
min_wal_size = 4GB
max_wal_size = 8GB
Is this okay? [(y)es/(s)kip/(q)uit]:
WAL preserves data integrity, but the default settings can cause inefficient I/O that slows down write performance. Type and enter y to optimize these settings:

Output
...
Is this okay? [(y)es/(s)kip/(q)uit]: y
success: WAL settings will be updated
You’ll now find some miscellaneous recommendations:
Output
Miscellaneous settings recommendations
Current:
#default_statistics_target = 100
#random_page_cost = 4.0
#checkpoint_completion_target = 0.5
#max_locks_per_transaction = 64
#autovacuum_max_workers = 3
#autovacuum_naptime = 1min
#effective_io_concurrency = 1
Recommended:
default_statistics_target = 500
random_page_cost = 1.1
checkpoint_completion_target = 0.9
max_locks_per_transaction = 64
autovacuum_max_workers = 10
autovacuum_naptime = 10
effective_io_concurrency = 200
Is this okay? [(y)es/(s)kip/(q)uit]:
All of these parameters are aimed at increasing performance. For example, SSDs can process many concurrent requests, so the best value for effective_io_concurrency might be in the hundreds. You can find more info about these options in the PostgreSQL documentation.

Press y then ENTER to continue:
Output
...
Is this okay? [(y)es/(s)kip/(q)uit]: y
success: miscellaneous settings will be updated
Saving changes to: /var/lib/pgsql/11/data/postgresql.conf

As a result, you will have a ready-made configuration file at /var/lib/pgsql/11/data/postgresql.conf.
Note: If you are doing the installation from scratch, you could also run the initial command with the --quiet and --yes flags, which will automatically apply all the recommendations and make changes to the postgresql.conf configuration file:

sudo timescaledb-tune --pg-config=/usr/pgsql-11/bin/pg_config --quiet --yes
In order for the configuration changes to take effect, you must restart the PostgreSQL service:
sudo systemctl restart postgresql-11.service
Now the database is running with optimal parameters and is ready to work with the time series data. In the next steps, you’ll try out working with this data: creating new databases and hypertables and performing operations.
Step 3 — Creating a New Database and Hypertable
With your TimescaleDB setup optimized, you are ready to work with time series data. TimescaleDB is implemented as an extension of PostgreSQL, so operations with time series data are not much different from relational data operations. At the same time, the database allows you to freely combine data from time series and relational tables in the future.
First, you will create a new database and turn on the TimescaleDB extension for it. Log in to your PostgreSQL database:
sudo -u postgres psql
Now create a new database and connect to it. This tutorial will name the database timeseries:

CREATE DATABASE timeseries;
\c timeseries
You can find additional information about working with the PostgreSQL database in our How To Create, Remove & Manage Tables in PostgreSQL on a Cloud Server tutorial.
Finally, enable the TimescaleDB extension:
CREATE EXTENSION IF NOT EXISTS timescaledb CASCADE;
You will see the following output:
Output
WARNING:
WELCOME TO
 _____ _ _ ____________
|_ _(_) | | | _ \ ___ \
 | | _ _ __ ___ ___ ___ ___ __ _| | ___| | | | |_/ /
 | | | | _ ` _ \ / _ \/ __|/ __/ _` | |/ _ \ | | | ___ \
 | | | | | | | | | __/\__ \ (_| (_| | | __/ |/ /| |_/ /
 |_| |_|_| |_| |_|\___||___/\___\__,_|_|\___|___/ \____/
 Running version 1.5.1
For more information on TimescaleDB, please visit the following links:
 1. Getting started: https://docs.timescale.com/getting-started
 2. API reference documentation: https://docs.timescale.com/api
 3. How TimescaleDB is designed: https://docs.timescale.com/introduction/architecture
Note: TimescaleDB collects anonymous reports to better understand and assist our users.
For more information and how to disable, please see our docs https://docs.timescaledb.com/using-timescaledb/telemetry.

CREATE EXTENSION
The primary point of interaction with your time series data is hypertables, an abstraction of many individual tables that hold the data, called chunks.

To create a hypertable, start with a regular SQL table and then convert it into a hypertable via the function create_hypertable.
Make a table that will store data for tracking temperature and humidity across a collection of devices over time:
CREATE TABLE conditions (
  time        TIMESTAMP WITH TIME ZONE NOT NULL,
  device_id   TEXT,
  temperature NUMERIC,
  humidity    NUMERIC
);
This command will create a table called conditions with four columns. The first column will store the timestamp, which includes the time zone and cannot be empty. Next, you will use the time column to transform your table into a hypertable that is partitioned by time:

SELECT create_hypertable('conditions', 'time');
This command calls the create_hypertable() function, which creates a TimescaleDB hypertable from a PostgreSQL table, replacing the latter.
You will receive the following output:
Output
    create_hypertable
-------------------------
 (1,public,conditions,t)
(1 row)
In this step, you created a new hypertable to store timeseries data. Now you can populate it with data by writing to the hypertable, then run through the process of deleting it.
Step 4 — Writing and Deleting Data
In this step, you will insert data using standard SQL commands and import large sets of data from external sources. This will show you the relational database aspects of TimescaleDB.
First, try out the basic commands. Data can be inserted into the hypertable using the standard INSERT SQL command. Insert some sample temperature and humidity data for the theoretical device weather-pro-000000 using the following command:

INSERT INTO conditions(time, device_id, temperature, humidity)
  VALUES (NOW(), 'weather-pro-000000', 84.1, 84.1);
You will receive the following output:
Output
INSERT 0 1
You can also insert multiple rows of data at once. Try the following:
INSERT INTO conditions
  VALUES
    (NOW(), 'weather-pro-000002', 71.0, 51.0),
    (NOW(), 'weather-pro-000003', 70.5, 50.5),
    (NOW(), 'weather-pro-000004', 70.0, 50.2);
You will receive the following:
Output
INSERT 0 3
You can also specify that the INSERT command will return some or all of the inserted data using the RETURNING statement:

INSERT INTO conditions
  VALUES (NOW(), 'weather-pro-000002', 70.1, 50.1) RETURNING *;
You will see the following output:
Output
              time             |     device_id      | temperature | humidity
-------------------------------+--------------------+-------------+----------
 2019-09-15 14:14:01.576651+00 | weather-pro-000002 |        70.1 |     50.1
(1 row)
If you want to delete data from the hypertable, use the standard DELETE SQL command. Run the following to delete any data that has a temperature higher than 80 or a humidity higher than 50:

DELETE FROM conditions WHERE temperature > 80;
DELETE FROM conditions WHERE humidity > 50;
After the delete operation, it is recommended to use the VACUUM command, which will reclaim space still used by data that has been deleted:

VACUUM conditions;
You can find more info about the VACUUM command in the PostgreSQL documentation.
These commands are fine for small-scale data entry, but since time series data often generates huge datasets from multiple devices simultaneously, it’s essential to also know how to insert hundreds or thousands of rows at a time. If you have prepared data from external sources in a structured form, for example in CSV format, this task can be accomplished quickly.
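As a quick illustration of what such a structured file looks like, here is a short Python sketch that writes a few rows in the same column order as the conditions table above. The file name, device IDs, and readings are all invented for the example:

```python
import csv
from datetime import datetime, timedelta, timezone

# Generate a small CSV in the same column order as the conditions table:
# time, device_id, temperature, humidity
start = datetime(2019, 11, 1, 9, 0, tzinfo=timezone.utc)
rows = [
    (start + timedelta(minutes=i), f"weather-pro-{i % 3:06d}",
     70.0 + i * 0.1, 50.0 + i * 0.05)
    for i in range(6)
]

with open("sample_conditions.csv", "w", newline="") as f:
    writer = csv.writer(f)
    for ts, device, temperature, humidity in rows:
        writer.writerow([ts.isoformat(), device, temperature, humidity])
```

A file in this shape can then be bulk-loaded into the hypertable with psql’s \COPY command, as shown with the sample dataset in this step.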
To test this out, you will use a sample dataset that represents temperature and humidity data from a variety of locations. It was created by TimescaleDB developers to allow you to test out their database. You can check out more info about sample datasets in the TimescaleDB documentation.
Let’s see how you can import data from the weather_small sample dataset into your database. First, quit psql:

\q
Then download the dataset and extract it:
cd /tmp
curl https://timescaledata.blob.core.windows.net/datasets/weather_small.tar.gz -o weather_small.tar.gz
tar -xvzf weather_small.tar.gz
Next, import the temperature and humidity data into your database:
sudo -u postgres psql -d timeseries -c "\COPY conditions FROM weather_small_conditions.csv CSV"
This connects to the timeseries database and executes the \COPY command, which copies the data from the chosen file into the conditions hypertable. It will run for a few seconds.
When the data has been entered into your table, you will receive the following output:
Output
COPY 1000000
In this step, you added data to the hypertable manually and in batches. Next, continue on to performing queries.
Step 5 — Querying Data
Now that your table contains data, you can perform various queries to analyze it.
To get started, log in to the database:
sudo -u postgres psql -d timeseries
As mentioned before, to work with hypertables you can use standard SQL commands. For example, to show the last 10 entries from the conditions hypertable, run the following command:

SELECT * FROM conditions LIMIT 10;
You will see the following output:
Output
          time          |     device_id      |    temperature     | humidity
------------------------+--------------------+--------------------+----------
 2016-11-15 12:00:00+00 | weather-pro-000000 |               39.9 |     49.9
 2016-11-15 12:00:00+00 | weather-pro-000001 |               32.4 |     49.8
 2016-11-15 12:00:00+00 | weather-pro-000002 | 39.800000000000004 |     50.2
 2016-11-15 12:00:00+00 | weather-pro-000003 | 36.800000000000004 |     49.8
 2016-11-15 12:00:00+00 | weather-pro-000004 |               71.8 |     50.1
 2016-11-15 12:00:00+00 | weather-pro-000005 |               71.8 |     49.9
 2016-11-15 12:00:00+00 | weather-pro-000006 |                 37 |     49.8
 2016-11-15 12:00:00+00 | weather-pro-000007 |                 72 |       50
 2016-11-15 12:00:00+00 | weather-pro-000008 |               31.3 |       50
 2016-11-15 12:00:00+00 | weather-pro-000009 |               84.4 |     87.8
(10 rows)
This command lets you see what data is in the database. Since the database contains a million records, you used LIMIT 10 to limit the output to 10 entries.
To see the most recent entries, sort the data array by time in descending order:
SELECT * FROM conditions ORDER BY time DESC LIMIT 20;
This will output the 20 most recent entries.
You can also add a filter. For example, to see entries from the weather-pro-000000 device, run the following:

SELECT * FROM conditions WHERE device_id = 'weather-pro-000000' ORDER BY time DESC LIMIT 10;
In this case, you will see the 10 most recent temperature and humidity datapoints recorded by the weather-pro-000000 device.
In addition to standard SQL commands, TimescaleDB also provides a number of special functions that are useful for time series data analysis. For example, to find the median of the temperature values, you can use the following query with the percentile_cont function:

SELECT percentile_cont(0.5)
  WITHIN GROUP (ORDER BY temperature)
  FROM conditions
  WHERE device_id = 'weather-pro-000000';
You will see the following output:
Output
 percentile_cont
-----------------
            40.5
(1 row)
In this way, you’ll see the median temperature over the entire observation period at the location of the weather-pro-000000 sensor.
To show the latest values from each of the sensors, you can use the last function:

SELECT device_id, last(temperature, time)
  FROM conditions
  GROUP BY device_id;
In the output you will see a list of all the sensors and relevant latest values.
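The last(temperature, time) aggregate keeps, for each group, the value paired with the greatest timestamp. The same “latest reading per device” idea can be sketched in plain Python over an in-memory list (all readings are invented):

```python
# (time, device_id, temperature) tuples; ISO-style strings sort chronologically
readings = [
    ("2016-11-15 12:00", "weather-pro-000000", 39.9),
    ("2016-11-15 12:01", "weather-pro-000001", 32.4),
    ("2016-11-15 12:02", "weather-pro-000000", 40.2),
    ("2016-11-15 12:03", "weather-pro-000001", 32.9),
]

# Equivalent of SELECT device_id, last(temperature, time) ... GROUP BY device_id:
# walk the rows in time order and let later readings overwrite earlier ones
latest = {}
for ts, device, temperature in sorted(readings):
    latest[device] = temperature

print(latest)  # {'weather-pro-000000': 40.2, 'weather-pro-000001': 32.9}
```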
To get the first values, use the first function.
The following example is more complex. It will show the hourly average, minimum, and maximum temperatures for the chosen sensor within the last 24 hours:
SELECT time_bucket('1 hour', time) "hour",
  trunc(avg(temperature), 2) avg_temp,
  trunc(min(temperature), 2) min_temp,
  trunc(max(temperature), 2) max_temp
  FROM conditions
  WHERE device_id = 'weather-pro-000000'
  GROUP BY "hour" ORDER BY "hour" DESC LIMIT 24;
Here you used the time_bucket function, which acts as a more powerful version of the PostgreSQL date_trunc function. As a result, you will see during which periods of the day the temperature rises or falls:
Output
          hour          | avg_temp | min_temp | max_temp
------------------------+----------+----------+----------
 2016-11-16 21:00:00+00 |    42.00 |    42.00 |    42.00
 2016-11-16 20:00:00+00 |    41.92 |    41.69 |    42.00
 2016-11-16 19:00:00+00 |    41.07 |    40.59 |    41.59
 2016-11-16 18:00:00+00 |    40.11 |    39.79 |    40.59
 2016-11-16 17:00:00+00 |    39.46 |    38.99 |    39.79
 2016-11-16 16:00:00+00 |    38.54 |    38.19 |    38.99
 2016-11-16 15:00:00+00 |    37.56 |    37.09 |    38.09
 2016-11-16 14:00:00+00 |    36.62 |    36.39 |    37.09
 2016-11-16 13:00:00+00 |    35.59 |    34.79 |    36.29
 2016-11-16 12:00:00+00 |    34.59 |    34.19 |    34.79
 2016-11-16 11:00:00+00 |    33.94 |    33.49 |    34.19
 2016-11-16 10:00:00+00 |    33.27 |    32.79 |    33.39
 2016-11-16 09:00:00+00 |    33.37 |    32.69 |    34.09
 2016-11-16 08:00:00+00 |    34.94 |    34.19 |    35.49
 2016-11-16 07:00:00+00 |    36.12 |    35.49 |    36.69
 2016-11-16 06:00:00+00 |    37.02 |    36.69 |    37.49
 2016-11-16 05:00:00+00 |    38.05 |    37.49 |    38.39
 2016-11-16 04:00:00+00 |    38.71 |    38.39 |    39.19
 2016-11-16 03:00:00+00 |    39.72 |    39.19 |    40.19
 2016-11-16 02:00:00+00 |    40.67 |    40.29 |    40.99
 2016-11-16 01:00:00+00 |    41.63 |    40.99 |    42.00
 2016-11-16 00:00:00+00 |    42.00 |    42.00 |    42.00
 2016-11-15 23:00:00+00 |    42.00 |    42.00 |    42.00
 2016-11-15 22:00:00+00 |    42.00 |    42.00 |    42.00
(24 rows)
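Conceptually, time_bucket('1 hour', time) truncates each timestamp to the start of its hour, and the aggregates then run once per bucket. A self-contained Python sketch of the same grouping, with invented readings spanning two hours:

```python
from collections import defaultdict
from datetime import datetime

# Invented (timestamp, temperature) readings spanning two hours
readings = [
    (datetime(2016, 11, 16, 20, 5), 41.7),
    (datetime(2016, 11, 16, 20, 45), 42.0),
    (datetime(2016, 11, 16, 21, 10), 41.9),
]

# time_bucket('1 hour', time): truncate each timestamp to its hour
buckets = defaultdict(list)
for ts, temperature in readings:
    buckets[ts.replace(minute=0, second=0, microsecond=0)].append(temperature)

# Per-bucket avg/min/max, mirroring the SQL query in this step
for bucket in sorted(buckets, reverse=True):
    temps = buckets[bucket]
    print(bucket, round(sum(temps) / len(temps), 2), min(temps), max(temps))
```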
You can find more useful functions in the TimescaleDB documentation.
Now you know how to handle your data. Next, you will go through how to delete unnecessary data and how to compress data.
Step 6 — Configuring Data Compression and Deletion
As data accumulates, it will take up more and more space on your hard drive. To save space, the latest version of TimescaleDB provides a data compression feature. This feature doesn’t require tweaking any file system settings, and can be used to quickly make your database more efficient. For more information on how this compression works, take a look at this Compression article from TimescaleDB.
First, enable the compression of your hypertable:
ALTER TABLE conditions SET (
  timescaledb.compress,
  timescaledb.compress_segmentby = 'device_id'
);
You will receive the following output:

Output
NOTICE:  adding index _compressed_hypertable_2_device_id__ts_meta_sequence_num_idx ON _timescaledb_internal._compressed_hypertable_2 USING BTREE(device_id, _ts_meta_sequence_num)
ALTER TABLE
Note: You can also set up TimescaleDB to compress data over the specified time period. For example, you could run:
SELECT add_compress_chunks_policy('conditions', INTERVAL '7 days');
In this example, the data will be automatically compressed after a week.
You can see the statistics on the compressed data with the command:
SELECT *
  FROM timescaledb_information.compressed_chunk_stats;
You will then see a list of chunks with their statuses: compression status and how much space is taken up by uncompressed and compressed data in bytes.
If you don’t need to store data for a long period of time, you can delete out-of-date data to free up even more space. There is a special drop_chunks function for this. It allows you to delete chunks holding data older than the specified time:

SELECT drop_chunks(interval '24 hours', 'conditions');
This query will drop all chunks from the hypertable conditions that only include data older than a day ago.
You will receive the following output:
Output
              drop_chunks
----------------------------------------
 _timescaledb_internal._hyper_1_2_chunk
(1 row)
To automatically delete old data, you can configure a cron task. See our tutorial to learn more about how to use cron to automate various system tasks.
Exit from the database:
\q
Next, edit your crontab with the following command, which should be run from the shell:

crontab -e
Now add the following line to the end of the file:
crontab

...
0 1 * * * /usr/bin/psql -h localhost -p 5432 -U postgres -d postgres -c "SELECT drop_chunks(interval '24 hours', 'conditions');" >/dev/null 2>&1
This job will run at 1:00 AM every day and delete obsolete data that is older than one day.
Conclusion
You’ve now set up TimescaleDB on your CentOS server. You also tried out creating hypertables, inserting data into it, querying the data, compressing, and deleting unnecessary records. With these examples, you’ll be able to take advantage of TimescaleDB’s key benefits over traditional relational database management systems for storing time-series data, including:
- Higher data ingest rates
- Quicker query performance
- Time-oriented features
Translated from: https://www.digitalocean.com/community/tutorials/how-to-install-and-use-timescaledb-on-centos-7