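【System Environment】
Operating system: Ubuntu 18.04 LTS (Alibaba Cloud)
System IP
# Internal network, private address
172.18.内.内
# External network, public address
112.74.外.外
Elasticsearch version: 7.2
Kibana version: 7.2
Logstash version: 7.2
【Installation and Configuration】
Official documentation: https://www.elastic.co/guide/en/logstash/current/installing-logstash.html
Unlike Elasticsearch and Kibana, Logstash needs a Java environment installed separately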
~ $ sudo apt install default-jdk
~ $ java -version
openjdk version "11.0.3" 2019-04-16
OpenJDK Runtime Environment (build 11.0.3+7-Ubuntu-1ubuntu218.04.1)
OpenJDK 64-Bit Server VM (build 11.0.3+7-Ubuntu-1ubuntu218.04.1, mixed mode, sharing)
Download Logstash
wget https://artifacts.elastic.co/downloads/logstash/logstash-7.2.0.tar.gz
Extract Logstash
tar -zxf logstash-7.2.0.tar.gz
The remaining steps are run inside the logstash-7.2.0 directory
cd logstash-7.2.0/
Download the MovieLens test dataset from GroupLens
wget http://files.grouplens.org/datasets/movielens/ml-latest-small.zip
Unzip the test dataset
unzip ml-latest-small.zip
Create and edit the logstash.conf file, adding the following content (the syntax resembles Ruby)
input {
  file {
    # Adjust to your own path
    path => "/home/walker/es/ml-latest-small/movies.csv"
    start_position => "beginning"
    sincedb_path => "/dev/null"
  }
}
filter {
  csv {
    separator => ","
    columns => ["id", "content", "genre"]
  }
  mutate {
    split => { "genre" => "|" }
    remove_field => ["path", "host", "@timestamp", "message"]
  }
  mutate {
    split => ["content", "("]
    add_field => { "title" => "%{[content][0]}" }
    add_field => { "year" => "%{[content][1]}" }
  }
  mutate {
    convert => { "year" => "integer" }
    strip => ["title"]
    remove_field => ["path", "host", "@timestamp", "message", "content"]
  }
}
output {
  elasticsearch {
    # Adjust to your own Elasticsearch address
    hosts => "http://112.74.内.内:9200"
    index => "movies"
    document_id => "%{id}"
  }
  stdout {}
}
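To make the filter chain concrete, here is a rough shell sketch of what it does to a single row. `parse_movie` is a hypothetical helper, not part of Logstash, and the sample row is only illustrative. It assumes an unquoted row, so titles that contain commas (which the CSV wraps in quotes) are not handled here.

```shell
# Hypothetical helper: mimic the csv + mutate steps for one unquoted CSV row.
parse_movie() {
  printf '%s\n' "$1" | awk -F',' '{
    # csv filter: name the three columns
    id = $1; content = $2; genre = $3
    # mutate split on "(": text before it becomes the title
    n = index(content, "(")
    title = substr(content, 1, n - 1)
    # mutate strip: trim surrounding whitespace from the title
    sub(/^[ \t]+/, "", title); sub(/[ \t]+$/, "", title)
    # mutate convert: the text after "(" is reduced to its leading integer
    year = substr(content, n + 1) + 0
    # mutate split on "|": shown here as a comma-separated list
    gsub(/[|]/, ",", genre)
    printf "id=%s title=%s year=%d genre=%s\n", id, title, year, genre
  }'
}

parse_movie '1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy'
# prints: id=1 title=Toy Story year=1995 genre=Adventure,Animation,Children,Comedy,Fantasy
```

This only approximates the pipeline for a quick sanity check of the field layout; the real parsing is done by the Logstash csv and mutate filters.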
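Import the data. Note that Logstash does not exit on its own once the import finishes; press Ctrl-C to stop it manually
# Your own path will differ
sudo ./bin/logstash -f /home/walker/es/ml-latest-small/logstash.conf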
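Before starting an import that has to be stopped by hand, it may be worth letting Logstash parse the config and exit. A small sketch, assuming it is run from the logstash-7.2.0 directory: `check_conf` is a hypothetical wrapper, and `--config.test_and_exit` is the Logstash flag that validates the config without starting the pipeline.

```shell
# Hypothetical wrapper: validate a Logstash config file without running it.
check_conf() {
  # Fail early if the path is wrong, before the slow JVM startup
  if [ ! -f "$1" ]; then
    echo "config not found: $1" >&2
    return 1
  fi
  # Assumes the current directory is logstash-7.2.0
  ./bin/logstash -f "$1" --config.test_and_exit
}

# Usage (your own path will differ):
# check_conf /home/walker/es/ml-latest-small/logstash.conf
```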
View the data in Management (Kibana)
Check the total document count in Dev Tools
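The request itself is not shown here. In Dev Tools a count is typically obtained with `GET movies/_count`; the sketch below is a curl equivalent for the command line. `ES_URL` and the helper names are assumptions, so point `ES_URL` at your own Elasticsearch node.

```shell
# Assumption: ES_URL is the address of your own Elasticsearch node.
ES_URL="${ES_URL:-http://localhost:9200}"

# Hypothetical helper: build the _count URL for an index.
count_url() {
  printf '%s/%s/_count?pretty' "$ES_URL" "$1"
}

# Hypothetical helper: ask Elasticsearch how many documents the index holds.
count_docs() {
  curl -s "$(count_url "$1")"
}

# Usage:
# count_docs movies
```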
Delete the movies data (an Elasticsearch index is roughly comparable to a table in a relational database)
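The exact request is not shown here. One way to remove the data is to delete the whole index, which in Dev Tools would be `DELETE movies`. A curl sketch of the same idea follows; `ES_URL` and `delete_index` are assumptions for illustration, and the guard against wildcards is an extra precaution rather than anything from the original steps.

```shell
# Assumption: ES_URL is the address of your own Elasticsearch node.
ES_URL="${ES_URL:-http://localhost:9200}"

# Hypothetical helper: delete one index by its exact name.
delete_index() {
  # Refuse empty names, wildcards, and _all so a typo cannot remove other indices
  case "$1" in
    ''|*'*'*|_all)
      echo "refusing to delete '$1'" >&2
      return 1
      ;;
  esac
  curl -s -X DELETE "$ES_URL/$1"
}

# Usage:
# delete_index movies
```

Deleting the index removes its documents and its mappings together, much like dropping a table, so the import has to be rerun to get the data back.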
【About This Post】
This post is a set of study notes for Ruan Yiming's 《Elasticsearch核心技术与实战》.
*** walker ***