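Elasticsearch Study Notes
This post records how I learned and used Elasticsearch. It covers installation and configuration, and accessing Elasticsearch from Python. Tip: Elasticsearch is installed on a Linux server.
Installing and Configuring Elasticsearch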
-
Download the installation package: Download Elasticsearch;
-
Extract the archive:
    tar -xvf elasticsearch-7.15.2-linux-x86_64.tar.gz
-
Edit elasticsearch.yml in the config directory to allow access from the local network:
    network.host: 0.0.0.0
-
Change to the bin directory and run
    ./elasticsearch
to start Elasticsearch. Startup fails with the following error:
    ERROR: [2] bootstrap checks failed. You must address the points described in the following [2] lines before starting Elasticsearch.
    bootstrap check failure [1] of [2]: max virtual memory areas vm.max_map_count [65530] is too low, increase to at least [262144]
    bootstrap check failure [2] of [2]: the default discovery settings are unsuitable for production use; at least one of [discovery.seed_hosts, discovery.seed_providers, cluster.initial_master_nodes] must be configured
-
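The first failure means that the kernel's limit on the number of memory-mapped areas a process may hold (vm.max_map_count) is too low for Elasticsearch. Add the following line to the system configuration file /etc/sysctl.conf:
    vm.max_map_count=262144
Then reboot, or apply the value immediately as root with
    sysctl -w vm.max_map_count=262144
A value set only with sysctl -w is lost on the next reboot, so keep the entry in /etc/sysctl.conf as well;
-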
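The second failure means that none of the following settings is configured:
- discovery.seed_hosts: the list of cluster hosts to contact for discovery;
- discovery.seed_providers: obtains the list of cluster hosts from a provider, such as a configuration file;
- cluster.initial_master_nodes: the master-eligible nodes that take part in the first master election when the cluster starts; required in production.
Edit elasticsearch.yml and set
    discovery.seed_hosts: ["192.168.1.xx"]
    cluster.initial_master_nodes: ["192.168.1.xx:9300"]
-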
Restart Elasticsearch and open
    http://192.168.1.xx:9200/
in a browser. The node responds with:
    {
      "name" : "xxxx",
      "cluster_name" : "elasticsearch",
      "cluster_uuid" : "4urQVMKyQgGl0oTM_wvgjQ",
      "version" : {
        "number" : "7.15.2",
        "build_flavor" : "default",
        "build_type" : "tar",
        "build_hash" : "93d5a7f6192e8a1a12e154a2b81bf6fa7309da0c",
        "build_date" : "2021-11-04T14:04:42.515624022Z",
        "build_snapshot" : false,
        "lucene_version" : "8.9.0",
        "minimum_wire_compatibility_version" : "6.8.0",
        "minimum_index_compatibility_version" : "6.0.0-beta1"
      },
      "tagline" : "You Know, for Search"
    }
-
Installation and configuration succeeded!
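
The vm.max_map_count bootstrap check from the steps above can also be verified from Python before launching Elasticsearch. The following is only a minimal sketch and not part of the original setup: it assumes a Linux host that exposes the value at /proc/sys/vm/max_map_count, and the helper name max_map_count_ok is made up for illustration.

```python
from pathlib import Path

# Minimum value demanded by the bootstrap check quoted in the error message
REQUIRED_MAX_MAP_COUNT = 262144


def max_map_count_ok(path="/proc/sys/vm/max_map_count",
                     required=REQUIRED_MAX_MAP_COUNT):
    """Return (current_value, is_sufficient) for the kernel setting.

    `path` is a parameter so the function can be pointed at a test file;
    the default is the location assumed for a typical Linux system.
    """
    current = int(Path(path).read_text().strip())
    return current, current >= required


# Example use on the server (left commented out because it reads /proc):
# current, ok = max_map_count_ok()
# print(current, "OK" if ok else "too low")
```

If the second element of the returned tuple is False, the sysctl change described in the steps above still has to be applied.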
Accessing Elasticsearch from Python
-
Install the Python client for Elasticsearch:
    conda install elasticsearch
-
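Connect to Elasticsearch and create an index:
    from elasticsearch import Elasticsearch

    es = Elasticsearch(hosts=['192.168.1.xx'])
    # ignore=400: do not raise when creation fails with status 400
    # because an index with the same name already exists
    result = es.indices.create(index='news_and_events', ignore=400)
    print(result)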
-
Install the elasticsearch-analysis-ik plugin so that Elasticsearch can segment Chinese text (run from the bin directory):
    ./elasticsearch-plugin install https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v7.15.2/elasticsearch-analysis-ik-7.15.2.zip
After the plugin is installed, restart Elasticsearch;
-
Populate the index:
    from elasticsearch import Elasticsearch
    from tqdm import tqdm

    # Local modules
    from database import SessionLocal
    from models import Record

    es = Elasticsearch(hosts=['192.168.1.xx'])

    # Use the IK analyzer for both indexing and searching
    mapping = {
        'properties': {
            'title': {
                'type': 'text',
                'analyzer': 'ik_max_word',
                'search_analyzer': 'ik_max_word'
            },
            'content': {
                'type': 'text',
                'analyzer': 'ik_max_word',
                'search_analyzer': 'ik_max_word'
            }
        }
    }

    # An Elasticsearch index is roughly analogous to a database in a relational system
    es.indices.create(index='news_and_events', ignore=400)
    # doc_type is roughly analogous to a relation schema
    es.indices.put_mapping(index='news_and_events', doc_type='records', body=mapping, include_type_name=True)

    # Query the database and export all news items and announcements
    db = SessionLocal()
    result_set = db.query(Record).all()
    for record in tqdm(result_set):
        data = {
            'record_id': record.record_id,
            'title': record.title,
            'content': record.content
        }
        es.create(index='news_and_events', doc_type='records', id=record.record_id, document=data)

    # Close the database session and the Elasticsearch connection
    db.close()
    es.close()
-
Query the data:
    # Search title and content; a match in title counts twice as much (title^2)
    q = {
        'query': {
            'multi_match': {
                'query': '成都重庆双城经济圈',
                'fields': ['title^2', 'content']
            }
        }
    }
    results = es.search(body=q, index='news_and_events', doc_type='records')
Result returned by Elasticsearch:
    {
      "took": 10,
      "timed_out": false,
      "_shards": {
        "total": 1,
        "successful": 1,
        "skipped": 0,
        "failed": 0
      },
      "hits": {
        "total": {
          "value": 623,
          "relation": "eq"
        },
        "max_score": 57.087215,
        "hits": [
          {
            "_index": "news_and_events",
            "_type": "records",
            "_id": "2508",
            "_score": 57.087215,
            "_source": {
              "record_id": 2508,
              "title": "关于成渝地区双城经济圈创新创业峰会的通知",
              "content": "各学院,各位老师和同学:\n现转发重庆市教育委员会和四川省教育厅等六部门联合发布的《关于举办\"智创巴蜀\"首届成渝地区双城经济圈创新创业峰会的通知》,详见附件。欢迎积极参加。\n联系人:x老师\n联系电话:xxxxxxxx\n教务处\nxxxx年xx月xx日\n附件1-川渝6部门联合发峰会正式文件"
            }
          }
        ]
      }
    }
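
In practice only a few fields of this response are usually needed. As a rough sketch (the helper summarize_hits and the trimmed sample dictionary are illustrative and not part of the client library), the total match count and the id, score, and title of each hit could be extracted like this:

```python
def summarize_hits(response):
    """Return (total, rows); rows is a list of (id, score, title) tuples."""
    hits = response.get("hits", {})
    total = hits.get("total")
    # In 7.x responses "total" is an object such as {"value": 623, "relation": "eq"};
    # versions before 7.0 returned a plain integer, so handle both shapes.
    total_value = total.get("value") if isinstance(total, dict) else total
    rows = [
        (hit["_id"], hit["_score"], hit["_source"].get("title"))
        for hit in hits.get("hits", [])
    ]
    return total_value, rows


# Trimmed copy of the response shown above, used only for demonstration
sample = {
    "hits": {
        "total": {"value": 623, "relation": "eq"},
        "max_score": 57.087215,
        "hits": [
            {
                "_index": "news_and_events",
                "_id": "2508",
                "_score": 57.087215,
                "_source": {
                    "record_id": 2508,
                    "title": "关于成渝地区双城经济圈创新创业峰会的通知",
                },
            }
        ],
    }
}

total, rows = summarize_hits(sample)
print(total)
for doc_id, score, title in rows:
    print(doc_id, score, title)
```

With the real client, the dictionary returned by es.search can be passed to the helper in place of sample.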