ES resources:
https://www.elastic.co/guide/en/elasticsearch/reference/current/getting-started.html
https://www.elastic.co/webinars/getting-started-kibana?baymax=rtp&elektra=docs&storm=top-video&iesrc=ctr
https://www.elastic.co/webinars/getting-started-logstash?baymax=rtp&elektra=docs&storm=top-video&iesrc=ctr
ES listens on port 9200 by default; opening it shows basic information about the node:
http://localhost:9200/
Elasticsearch: The Definitive Guide (the second link is the version of the guide built from the master branch)
https://www.elastic.co/guide/en/elasticsearch/guide/index.html
https://www.elastic.co/guide/en/elasticsearch/guide/master/index.html
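A shard is one partition of an index: the index's documents are split across N shards, and each shard is a self-contained Lucene index that can live on a different node. A single huge index would be hard to store and slow to query, so it is split up. In that sense sharding really is a kind of partitioning.
A replica is a copy of a primary shard, used mainly for high availability (no single point of failure): if the node holding a primary fails, a replica is promoted to primary. Replicas also serve search requests.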
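The Definitive Guide gives a simplified formula for document routing: shard = hash(_routing) % number_of_primary_shards, where _routing defaults to the document's _id. The Python sketch below models that formula. zlib.crc32 is a stand-in hash chosen for this sketch (Elasticsearch itself uses Murmur3), and the function names are hypothetical.

```python
import zlib


def pick_shard(routing: str, number_of_primary_shards: int) -> int:
    """Simplified routing: shard = hash(_routing) % number_of_primary_shards.

    zlib.crc32 is only a stand-in hash for this sketch; Elasticsearch
    hashes the routing value with Murmur3.
    """
    return zlib.crc32(routing.encode("utf-8")) % number_of_primary_shards


def total_shard_copies(primaries: int, replicas: int) -> int:
    """Each primary shard plus `replicas` copies of it."""
    return primaries * (1 + replicas)


if __name__ == "__main__":
    for doc_id in ["1", "2", "3"]:
        # _routing defaults to the document _id
        print(doc_id, "-> shard", pick_shard(doc_id, 5))
    print("shard copies:", total_shard_copies(5, 1))
```

The result depends on number_of_primary_shards. Changing the primary count would therefore send most IDs to different shards, which is why the number of primary shards is fixed when an index is created. The replica count, by contrast, can be changed at any time.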
Get index information (the _cat APIs return compact, human-readable tables; ?v adds a header row)
GET /_cat/indices?v
Create an index
PUT /customer?pretty
GET /_cat/indices?v
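Create a document. PUT indexes the document under an ID you specify; POST without an ID creates the document under an auto-generated ID. The pretty parameter means pretty-print: the response JSON is formatted for readability.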
PUT /customer/_doc/?pretty
{
"name": "John Doe"
}
GET /customer/_doc/?pretty
POST /customer/_doc?pretty
{
"name": "Jane Doe"
}
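The same two kinds of requests can be built from Python with only the standard library. This is a minimal sketch that assumes a node at http://localhost:9200. index_request is a hypothetical helper, and the ID "1" is an illustrative value because the ID in the requests above was lost. The requests are only built here, not sent.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: a local node on the default port


def index_request(index, doc, doc_id=None):
    """Build (but do not send) an index request.

    With doc_id: PUT /<index>/_doc/<doc_id>  (the caller chooses the ID)
    Without:     POST /<index>/_doc          (Elasticsearch generates the ID)
    """
    if doc_id is None:
        method, url = "POST", f"{ES_URL}/{index}/_doc?pretty"
    else:
        method, url = "PUT", f"{ES_URL}/{index}/_doc/{doc_id}?pretty"
    return urllib.request.Request(
        url,
        data=json.dumps(doc).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method=method,
    )


if __name__ == "__main__":
    # "1" is a hypothetical ID chosen for illustration
    for req in (index_request("customer", {"name": "John Doe"}, doc_id="1"),
                index_request("customer", {"name": "Jane Doe"})):
        print(req.get_method(), req.full_url)
        # urllib.request.urlopen(req) would actually send it to the node
```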
Update a document (under the hood the old version is deleted and a new version is indexed)
POST /customer/_doc//_update?pretty
{
"doc": { "name": "Jane Doe" }
}
POST /customer/_doc//_update?pretty
{
"doc": { "name": "Jane Doe", "age": }
}
POST /customer/_doc//_update?pretty
{
"script" : "ctx._source.age += 5"
}
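What _update does to _source can be modelled in a few lines. This sketch is a plain-Python simulation, not Elasticsearch code. apply_partial_update mimics the "doc" form: objects are merged recursively, scalars are overwritten, and new fields are added. increment_field mimics the ctx._source.age += 5 script. Both helper names are hypothetical, and the age of 20 is made up because the value in the request above was lost.

```python
import copy


def apply_partial_update(source, doc):
    """Merge a partial `doc` into a copy of `source`, like the "doc" form of _update.

    Nested objects are merged recursively; scalar fields are overwritten;
    new fields are added. The original `source` is left untouched.
    """
    merged = copy.deepcopy(source)
    for key, value in doc.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = apply_partial_update(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


def increment_field(source, field, delta):
    """Python analogue of the script "ctx._source.<field> += delta"."""
    updated = copy.deepcopy(source)
    updated[field] += delta
    return updated


if __name__ == "__main__":
    stored = {"name": "John Doe"}
    # 20 is a hypothetical age; the value in the note above was lost
    stored = apply_partial_update(stored, {"name": "Jane Doe", "age": 20})
    stored = increment_field(stored, "age", 5)
    print(stored)  # {'name': 'Jane Doe', 'age': 25}
```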
Delete a document
DELETE /customer/_doc/?pretty
Bulk operations (bulk indexing, and bulk update/delete)
POST /customer/_doc/_bulk?pretty
{"index":{"_id":""}}
{"name": "John Doe" }
{"index":{"_id":""}}
{"name": "Jane Doe" }
POST /customer/_doc/_bulk?pretty
{"update":{"_id":""}}
{"doc": { "name": "John Doe becomes Jane Doe" } }
{"delete":{"_id":""}}
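The _bulk body is newline-delimited JSON (NDJSON). Each action line is followed by a source line for index, create and update, and by nothing for delete. The whole body must end with a newline. Below is a minimal Python sketch that builds such a body; bulk_body is a hypothetical helper and the IDs are illustrative.

```python
import json


def bulk_body(actions):
    """Build an NDJSON _bulk body from (action, metadata, source) tuples.

    index/create/update lines are followed by a source line
    (for update, the source is {"doc": {...}} or a script);
    delete has no source line. The body must end with a newline.
    """
    lines = []
    for action, meta, source in actions:
        lines.append(json.dumps({action: meta}))
        if action != "delete":
            lines.append(json.dumps(source))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # "1" and "2" are hypothetical IDs
    print(bulk_body([
        ("update", {"_id": "1"}, {"doc": {"name": "John Doe becomes Jane Doe"}}),
        ("delete", {"_id": "2"}, None),
    ]), end="")
```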
Bulk-import data from a file
curl -H "Content-Type: application/json" -XPOST "localhost:9200/bank/_doc/_bulk?pretty&refresh" --data-binary "@accounts.json"
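For reference, here is a standard-library Python sketch of the same request as the curl command. It POSTs the file's raw bytes, as --data-binary does (newlines intact), with Content-Type: application/json. The refresh parameter makes the imported documents visible to search right away. The helper names and the local host URL are assumptions, and the request is only built, not sent.

```python
import urllib.request


def load_bulk_file(path):
    """Read the file as raw bytes, like curl --data-binary "@file" (newlines kept)."""
    with open(path, "rb") as f:
        return f.read()


def bulk_import_request(data, index="bank", host="http://localhost:9200"):
    """Build (but do not send) the POST that the curl command issues."""
    return urllib.request.Request(
        f"{host}/{index}/_doc/_bulk?pretty&refresh",
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    # a tiny inline stand-in for accounts.json (hypothetical content)
    sample = b'{"index":{"_id":"1"}}\n{"account_number":1}\n'
    req = bulk_import_request(sample)
    print(req.get_method(), req.full_url)
    # for the real file: bulk_import_request(load_bulk_file("accounts.json")),
    # then urllib.request.urlopen(req) to send it
```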
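Searching. Note that this uses the _search endpoint, just as updates used _update. q=* matches all documents, sort=account_number:asc sorts by account_number in ascending order, and pretty was covered above. In the response, hits contains the matching documents. hits.total is the total number of matches, not the number returned: by default only 10 hits are returned, which can be changed with the size parameter.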
GET /bank/_search?q=*&sort=account_number:asc&pretty
The equivalent query with a request body
GET /bank/_search
{
"query": { "match_all": {} },
"sort": [
{ "account_number": "asc" }
]
}
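When reading the response programmatically, the shape of hits.total depends on the version. It is a plain number before Elasticsearch 7.0 and an object with value and relation from 7.0 on. Below is a small Python sketch that handles both; the helper names are hypothetical and the sample response is made up.

```python
def total_hits(response):
    """hits.total is a plain integer before Elasticsearch 7.0 and an object
    {"value": ..., "relation": ...} from 7.0 on; accept both."""
    total = response["hits"]["total"]
    return total["value"] if isinstance(total, dict) else total


def returned_sources(response):
    """The _source of each hit actually returned (at most `size`, default 10)."""
    return [hit["_source"] for hit in response["hits"]["hits"]]


if __name__ == "__main__":
    # made-up response fragment, trimmed to the fields used here
    sample = {"hits": {"total": {"value": 1000, "relation": "eq"},
                       "hits": [{"_source": {"account_number": 0}},
                                {"_source": {"account_number": 1}}]}}
    print(total_hits(sample), len(returned_sources(sample)))  # 1000 2
```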
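To start partway through the results, set from. It is the 0-based offset of the first hit to return; if it is given as 5.98, it is rounded down to 5. Note the scores too. The previous queries sorted by account_number, so no scores were computed and max_score was null. This query has no sort, so match_all gives every hit a score of 1.0 and max_score has a value.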
GET /bank/_search
{
"query": { "match_all": {} },
"from": 10, # skip the first 10 hits (from is an offset, not a document ID)
"size":
}
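Paging is usually expressed as from = (page - 1) * size. Below is a Python sketch that builds such a body. page_body is a hypothetical helper, and the 10,000 cap assumes the default value of the index.max_result_window setting, which limits from + size.

```python
MAX_RESULT_WINDOW = 10_000  # default value of the index.max_result_window setting


def page_body(page, size=10):
    """match_all body for a 1-based page number.

    "from" is an offset into the sorted hit list (how many hits to skip),
    not a document ID.
    """
    if page < 1 or size < 0:
        raise ValueError("page must be >= 1 and size must be >= 0")
    offset = (page - 1) * size
    if offset + size > MAX_RESULT_WINDOW:
        raise ValueError("from + size would exceed index.max_result_window")
    return {"query": {"match_all": {}}, "from": offset, "size": size}


if __name__ == "__main__":
    print(page_body(2))  # hits 11-20 of the result list
```

Deep paging beyond that window needs search_after or the scroll API instead.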
Return only selected fields (like SELECT col1, col2 ...)
GET /bank/_search
{
"query": { "match_all": {} },
"_source": ["account_number", "balance"]
}
Search on a specific field (like WHERE)
GET /bank/_search
{
"query": { "match": { "account_number": } }
}
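Note the difference between the two queries below, match and match_phrase. With match, a document is returned if any of the query terms matches, and results are ranked by score. match_phrase requires the whole phrase: the terms must appear in the same order and next to each other, so mill must come immediately before lane. In practice the scores can be surprising. An address that fully matches mill lane scored only 13.2. An address like 198 Mill2 Lane, which differs by just one token, would score 8.3, the same as other documents that match only Lane (Mill not matching at all). The reason is that matching works on whole terms: mill2 is simply a different term from mill, so there is no partial credit.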
GET /bank/_search
{
"query": { "match": { "address": "198 Mill Lane" } }
}
GET /bank/_search
{
"query": { "match_phrase": { "address": "198 Mill Lane" } }
}
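A toy model of the matching rules (not the scoring) may help. match succeeds if any analyzed query term occurs in the field; match_phrase succeeds only if all terms occur consecutively and in order. The analyzer here is a crude stand-in that lowercases and splits on whitespace, whereas the standard analyzer also splits on punctuation. The function names are hypothetical.

```python
def analyze(text):
    """Crude stand-in for the standard analyzer: lowercase, split on whitespace."""
    return text.lower().split()


def match_query(field_text, query):
    """match (default operator OR): at least one query term occurs in the field."""
    terms = set(analyze(field_text))
    return any(term in terms for term in analyze(query))


def match_phrase_query(field_text, query):
    """match_phrase: all query terms occur consecutively and in the same order."""
    tokens, phrase = analyze(field_text), analyze(query)
    n = len(phrase)
    return any(tokens[i:i + n] == phrase for i in range(len(tokens) - n + 1))


if __name__ == "__main__":
    query = "198 Mill Lane"
    for address in ["198 Mill Lane", "198 Mill2 Lane", "990 Lane Road"]:  # made-up addresses
        print(address, match_query(address, query), match_phrase_query(address, query))
```

Under this model "198 Mill2 Lane" still satisfies match (through 198 and lane) but not match_phrase, because mill2 and mill are different terms.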
bool query with must: equivalent to AND in a WHERE clause
GET /bank/_search
{
"query": {
"bool": {
"must": [
{ "match": { "address": "mill" } },
{ "match": { "address": "lane" } }
]
}
}
}
bool query with should: equivalent to OR in a WHERE clause
GET /bank/_search
{
"query": {
"bool": {
"should": [
{ "match": { "address": "mill" } },
{ "match": { "address": "lane" } }
]
}
}
}
Negating conditions (NOT): must_not excludes documents that match any of its clauses
GET /bank/_search
{
"query": {
"bool": {
"must_not": [
{ "match": { "address": "mill" } },
{ "match": { "address": "lane" } }
]
}
}
}
Clauses can also be combined
GET /bank/_search
{
"query": {
"bool": {
"must": [
{ "match": { "age": "" } }
],
"must_not": [
{ "match": { "state": "ID" } }
]
}
}
}
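The matching rules of the four bool clauses can be summarised in a short Python simulation. This sketches the semantics only, with no scoring. bool_matches and address_has are hypothetical helpers, the filter clause is passed as filters to avoid shadowing Python's built-in filter, and the addresses are made up.

```python
def address_has(word):
    """Hypothetical helper standing in for {"match": {"address": word}}."""
    return lambda doc: word in doc["address"].lower().split()


def bool_matches(doc, must=(), should=(), must_not=(), filters=()):
    """Matching rules of a bool query (scoring ignored).

    must: every clause matches (AND); filters: same, but without scoring;
    must_not: no clause matches, i.e. NOT (a OR b);
    should: with no must/filter clause, at least one should clause must
    match (OR); otherwise should clauses only affect the score.
    """
    if not all(clause(doc) for clause in must):
        return False
    if not all(clause(doc) for clause in filters):
        return False
    if any(clause(doc) for clause in must_not):
        return False
    if should and not must and not filters:
        return any(clause(doc) for clause in should)
    return True


if __name__ == "__main__":
    mill, lane = address_has("mill"), address_has("lane")
    for address in ["198 Mill Lane", "990 Mill Road", "5 Oak Street"]:  # made-up addresses
        doc = {"address": address}
        print(address,
              bool_matches(doc, must=[mill, lane]),
              bool_matches(doc, should=[mill, lane]),
              bool_matches(doc, must_not=[mill, lane]))
```

The sketch encodes one subtlety. should acts as OR only when the bool query has no must or filter clause. Next to a must clause (in query context), should clauses are optional and only raise the score.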
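Filters
The filter clause sits inside the bool query, but it does not contribute to document scores. The score of 1 shown for every hit in this query comes from the match_all clause under must, not from the filter.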
GET /bank/_search
{
"query":{
"bool":{
"must":{"match_all":{}},
"filter":{
"range":{
"balance":{
"gte":,
"lte":
}
}
}
}
}
}
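Here is a sketch of the point about scoring. The range filter (gte and lte are both inclusive) only keeps or drops documents, and the score each hit carries comes from the match_all under must. It is plain Python with hypothetical function names and made-up balances.

```python
def in_range(value, gte=None, lte=None):
    """range query with gte/lte: both bounds inclusive; an omitted bound is open."""
    return (gte is None or value >= gte) and (lte is None or value <= lte)


def match_all_with_range_filter(docs, field, gte=None, lte=None):
    """must: match_all, filter: range on `field`.

    The filter only decides whether a document is kept; it adds nothing to
    the score, so every hit keeps the constant 1.0 that match_all gives it.
    """
    return [{"_score": 1.0, "_source": doc}
            for doc in docs if in_range(doc[field], gte, lte)]


if __name__ == "__main__":
    accounts = [{"balance": b} for b in (100, 500, 900)]  # made-up balances
    for hit in match_all_with_range_filter(accounts, "balance", gte=100, lte=500):
        print(hit)
```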
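Grouping
Grouping is the equivalent of GROUP BY. The example below groups documents by the value of the state field and counts the documents in each group. group_by_state is just a name chosen for the aggregation; it is the terms aggregation that returns a doc_count (the count()) for each bucket, by default for the 10 buckets with the highest counts.
size is set to 0 here because we only want the aggregation results, not the search hits; with size > 0 the matching documents are also included in the response.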
GET /bank/_search
{
"size": 0,
"aggs": {
"group_by_state": {
"terms": {
"field": "state.keyword"
}
}
}
}
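Below is a rough single-node model, in Python, of what the terms aggregation computes. It builds one bucket per distinct value, orders the buckets by doc_count descending, and truncates to size (10 buckets by default). The request aggregates on state.keyword rather than state because text fields are analyzed and not aggregatable by default. The function name and data are hypothetical, and breaking ties by key is a choice made for this sketch.

```python
from collections import Counter


def terms_agg(docs, field, size=10):
    """Rough single-node model of a terms aggregation.

    One bucket per distinct value with its doc_count, ordered by doc_count
    descending and truncated to `size` buckets (10 by default). Ties are
    broken by key to keep the sketch deterministic. On a real multi-shard
    index the counts can be approximate.
    """
    counts = Counter(doc[field] for doc in docs)
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [{"key": key, "doc_count": count} for key, count in ordered[:size]]


if __name__ == "__main__":
    accounts = [{"state": s} for s in ("ID", "TX", "ID", "AL")]  # made-up data
    print(terms_agg(accounts, "state"))
```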
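Something a bit more complex: besides the count per group, this also computes the average of balance. Note that the avg aggregation is nested inside group_by_state, and the groups are then sorted by that average.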
GET /bank/_search
{
"size": 0,
"aggs": {
"group_by_state": {
"terms": {
"field": "state.keyword",
"order": {
"average_balance": "desc"
}
},
"aggs": {
"average_balance": {
"avg": {
"field": "balance"
}
}
}
}
}
}
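The same computation can be sketched in Python. It groups by state, averages balance per group, and orders the buckets by that average, descending. terms_with_avg is a hypothetical helper, the sub-aggregation name average_balance is taken from the request above, and the data is made up.

```python
from collections import defaultdict


def terms_with_avg(docs, group_field, value_field, size=10):
    """terms buckets, each with an avg sub-aggregation named "average_balance",
    ordered by that average descending (like "order": {"average_balance": "desc"})."""
    groups = defaultdict(list)
    for doc in docs:
        groups[doc[group_field]].append(doc[value_field])
    buckets = [
        {"key": key, "doc_count": len(values),
         "average_balance": {"value": sum(values) / len(values)}}
        for key, values in groups.items()
    ]
    buckets.sort(key=lambda bucket: bucket["average_balance"]["value"], reverse=True)
    return buckets[:size]


if __name__ == "__main__":
    accounts = [  # made-up data
        {"state": "ID", "balance": 100},
        {"state": "ID", "balance": 300},
        {"state": "TX", "balance": 500},
    ]
    for bucket in terms_with_avg(accounts, "state", "balance"):
        print(bucket)
```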
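An even more complex one: group by age ranges (from is inclusive, to is exclusive), then sub-group each range by gender (a second-level aggregation), and compute the average balance for each gender within each range.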
GET /bank/_search
{
"size": 0,
"aggs": {
"group_by_age": {
"range": {
"field": "age",
"ranges": [
{
"from": 20,
"to": 30
},
{
"from": ,
"to":
},
{
"from": ,
"to":
}
]
},
"aggs": {
"group_by_gender": {
"terms": {
"field": "gender.keyword"
},
"aggs": {
"average_balance": {
"avg": {
"field": "balance"
}
}
}
}
}
}
}
}
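And here is a Python sketch of the nested version. It builds range buckets (from inclusive, to exclusive) keyed like "20.0-30.0", splits each one by gender, and computes an average balance per gender. The function names are hypothetical, and the ranges and documents in the example are made up; only the 20-30 range is visible in the response excerpt.

```python
from collections import defaultdict


def range_key(lo, hi):
    """Bucket key in the "20.0-30.0" style seen in the response."""
    return f"{float(lo)}-{float(hi)}"


def age_gender_report(docs, ranges):
    """range buckets on age -> terms buckets on gender -> avg of balance."""
    report = []
    for lo, hi in ranges:
        bucket_docs = [d for d in docs if lo <= d["age"] < hi]  # from inclusive, to exclusive
        by_gender = defaultdict(list)
        for d in bucket_docs:
            by_gender[d["gender"]].append(d["balance"])
        gender_buckets = [
            {"key": g, "doc_count": len(vals), "average_balance": {"value": sum(vals) / len(vals)}}
            for g, vals in by_gender.items()
        ]
        # terms default order: doc_count descending (ties by key in this sketch)
        gender_buckets.sort(key=lambda gb: (-gb["doc_count"], gb["key"]))
        report.append({
            "key": range_key(lo, hi), "from": float(lo), "to": float(hi),
            "doc_count": len(bucket_docs),
            "group_by_gender": {"buckets": gender_buckets},
        })
    return report


if __name__ == "__main__":
    people = [  # made-up documents
        {"age": 25, "gender": "M", "balance": 100},
        {"age": 29, "gender": "F", "balance": 300},
        {"age": 22, "gender": "M", "balance": 200},
        {"age": 30, "gender": "M", "balance": 500},
    ]
    for bucket in age_gender_report(people, [(20, 30), (30, 40)]):
        print(bucket["key"], bucket["doc_count"], bucket["group_by_gender"]["buckets"])
```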
Response excerpt
"aggregations": {
"group_by_age": {
"buckets": [
{
"key": "20.0-30.0", # first-level (age range) bucket
"from": 20.0,
"to": 30.0,
"doc_count": ,
"group_by_gender": {
"doc_count_error_upper_bound": ,
"sum_other_doc_count": ,
"buckets": [ # second-level (gender) buckets
{
"key": "M",
"doc_count": ,
"average_balance": {
"value": 27374.05172413793
}
},
{
"key": "F",
"doc_count": ,
"average_balance": {
"value": 25341.260273972603
}
}
]
}
},
... ...