https://www.elastic.co/guide/en/elasticsearch/guide/current/getting-started.html
Elasticsearch is a real-time distributed search and analytics engine. It allows you to explore your data at a speed and at a scale never before possible. It is used for full-text search, structured search, analytics, and all three in combination:
- Wikipedia uses Elasticsearch to provide full-text search with highlighted search snippets, and search-as-you-type and did-you-mean suggestions.
- The Guardian uses Elasticsearch to combine visitor logs with social -network data to provide real-time feedback to its editors about the public’s response to new articles.
- Stack Overflow combines full-text search with geolocation queries and uses more-like-this to find related questions and answers.
- GitHub uses Elasticsearch to query 130 billion lines of code.
But Elasticsearch is not just for mega-corporations. It has enabled many startups like Datadog and Klout to prototype ideas and to turn them into scalable solutions. Elasticsearch can run on your laptop, or scale out to hundreds of servers and petabytes of data.
No individual part of Elasticsearch is new or revolutionary. Full-text search has been done before, as have analytics systems and distributed databases. The revolution is the combination of these individually useful parts into a single, coherent, real-time application. It has a low barrier to entry for the new user, but can keep pace with you as your skills and needs grow.
If you are picking up this book, it is because you have data, and there is no point in having data unless you plan to do something with it.
Unfortunately, most databases are astonishingly inept at extracting actionable knowledge from your data. Sure, they can filter by timestamp or exact values, but can they perform full-text search, handle synonyms, and score documents by relevance? Can they generate analytics and aggregations from the same data? Most important, can they do this in real time without big batch-processing jobs?
This is what sets Elasticsearch apart: Elasticsearch encourages you to explore and utilize your data, rather than letting it rot in a warehouse because it is too difficult to query.
【Lucene】
Elasticsearch is an open-source search engine built on top of Apache Lucene™, a full-text search-engine library. Lucene is arguably the most advanced, high-performance, and fully featured search engine library in existence today—both open source and proprietary.
But Lucene is just a library. To leverage its power, you need to work in Java and to integrate Lucene directly with your application. Worse, you will likely require a degree in information retrieval to understand how it works. Lucene is very complex.
Elasticsearch is also written in Java and uses Lucene internally for all of its indexing and searching, but it aims to make full-text search easy by hiding the complexities of Lucene behind a simple, coherent, RESTful API.
However, Elasticsearch is much more than just Lucene and much more than “just” full-text search. It can also be described as follows:
- A distributed real-time document store where every field is indexed and searchable
- A distributed search engine with real-time analytics
- Capable of scaling to hundreds of servers and petabytes of structured and unstructured data
And it packages up all this functionality into a standalone server that your application can talk to via a simple RESTful API, using a web client from your favorite programming language, or even from the command line.
It is easy to get started with Elasticsearch. It ships with sensible defaults and hides complicated search theory away from beginners. It just works, right out of the box. With minimal understanding, you can soon become productive.
As your knowledge grows, you can leverage more of Elasticsearch’s advanced features. The entire engine is configurable and flexible. Pick and choose from the advanced features to tailor Elasticsearch to your problem domain.
【work on an abstraction layer to make Lucene easier for Java programmers to add search to their applications】
The Mists of Time
Many years ago, a newly married unemployed developer called Shay Banon followed his wife to London, where she was studying to be a chef. While looking for gainful employment, he started playing with an early version of Lucene, with the intent of building his wife a recipe search engine.
Working directly with Lucene can be tricky, so Shay started work on an abstraction layer to make it easier for Java programmers to add search to their applications. He released this as his first open source project, called Compass.
Later Shay took a job working in a high-performance, distributed environment with in-memory data grids. The need for a high-performance, real-time, distributed search engine was obvious, and he decided to rewrite the Compass libraries as a standalone server called Elasticsearch.
The first public release came out in February 2010. Since then, Elasticsearch has become one of the most popular projects on GitHub with commits from over 300 contributors. A company has formed around Elasticsearch to provide commercial support and to develop new features, but Elasticsearch is, and forever will be, open source and available to all.
Shay’s wife is still waiting for the recipe search…
Once you’ve extracted the archive file, Elasticsearch is ready to run. To start it up in the foreground:
cd elasticsearch-<version>
./bin/elasticsearch
Add |
|
If you’re running Elasticsearch on Windows, simply run |
Test it out by opening another terminal window and running the following:
curl 'http://localhost:9200/?pretty'
You should see a response like this:
{
"name" : "Tom Foster",
"cluster_name" : "elasticsearch",
"version" : {
"number" : "2.1.0",
"build_hash" : "72cd1f1a3eee09505e036106146dc1949dc5dc87",
"build_timestamp" : "2015-11-18T22:40:03Z",
"build_snapshot" : false,
"lucene_version" : "5.3.1"
},
"tagline" : "You Know, for Search"
}
This means that you have an Elasticsearch node up and running, and you can start experimenting with it. A node is a running instance of Elasticsearch. A cluster is a group of nodes with the same cluster.name
that are working together to share data and to provide failover and scale. (A single node, however, can form a cluster all by itself.) You can change the cluster.name
in the elasticsearch.yml
configuration file that’s loaded when you start a node.
Elasticsearch is running in the foreground.
Installing Sense
Sense is a Kibana app that provides an interactive console for submitting requests to Elasticsearch directly from your browser. Many of the code examples in the online version of this book include a View in Sense link. When clicked, it opens up a working example of the code in the Sense console. You do not have to install Sense, but it will make this book much more interactive by allowing you to experiment with the code samples on your local Elasticsearch cluster.
To install and run Sense:
To install and run Sense:
-
Run the following command in the Kibana directory to download and install the Sense app:
./bin/kibana plugin --install elastic/sense
Windows:
bin\kibana.bat plugin --install elastic/sense
.You can download Sense from https://download.elastic.co/elastic/sense/sense-latest.tar.gz to install it on an offline machine.
-
Start Kibana.
./bin/kibana
Windows:
bin\kibana.bat
. - Open Sense your web browser by going to
http://localhost:5601/app/sense
.
https://www.elastic.co/guide/en/elasticsearch/guide/current/_talking_to_elasticsearch.html#_restful_api_with_json_over_http
Java APIedit
If you are using Java, Elasticsearch comes with two built-in clients that you can use in your code:
【数据在集群的哪个节点】
- Node client
- The node client joins a local cluster as a non data node. In other words, it doesn’t hold any data itself, but it knows what data lives on which node in the cluster, and can forward requests directly to the correct node.
- 【转发请求】
- Transport client
- The lighter-weight transport client can be used to send requests to a remote cluster. It doesn’t join the cluster itself, but simply forwards requests to a node in the cluster.
Both Java clients talk to the cluster over port 9300, using the native Elasticsearch transport protocol. The nodes in the cluster also communicate with each other over port 9300. If this port is not open, your nodes will not be able to form a cluster.
节点客户端(Node client)节点客户端作为一个非数据节点加入到本地集群中。换句话说,它本身不保存任何数据,但是它知道数据在集群中的哪个节点中,并且可以把请求转发到正确的节点。
传输客户端(Transport client)轻量级的传输客户端可以可以将请求发送到远程集群。它本身不加入集群,但是它可以将请求转发到集群中的一个节点上。
A request to Elasticsearch consists of the same parts as any HTTP request:
curl -X<VERB> '<PROTOCOL>://<HOST>:<PORT>/<PATH>?<QUERY_STRING>' -d '<BODY>'
The parts marked with < >
above are:
|
The appropriate HTTP method or verb: |
|
Either |
|
The hostname of any node in your Elasticsearch cluster, or |
|
The port running the Elasticsearch HTTP service, which defaults to |
|
API Endpoint (for example |
|
Any optional query-string parameters (for example |
|
A JSON-encoded request body (if the request needs one.) |
Document Oriented
Objects in an application are seldom just a simple list of keys and values. More often than not, they are complex data structures that may contain dates, geo locations, other objects, or arrays of values.
Sooner or later you’re going to want to store these objects in a database. Trying to do this with the rows and columns of a relational database is the equivalent of trying to squeeze your rich, expressive objects into a very big spreadsheet: you have to flatten the object to fit the table schema—usually one field per column—and then have to reconstruct it every time you retrieve it.
Elasticsearch is document oriented, meaning that it stores entire objects or documents. It not only stores them, but also indexes the contents of each document in order to make them searchable. In Elasticsearch, you index, search, sort, and filter documents—not rows of columnar data. This is a fundamentally different way of thinking about data and is one of the reasons Elasticsearch can perform complex full-text search.
应用中的一个个对象,不仅仅是键值对的数据结构。
【document oriented】
Elasticsearch是以文档document为中心的,来存储整个应用中的对象、文档。Elasticsearch为每一个文档的内容(contents of each documents)建立索引,使得他们便于被检索到。
【可以执行全文搜索的原因 full-text search】
索引、搜索、排序、过滤的操作对象是文档,而不是传统的列式数据
基础入门 | Elasticsearch: 权威指南 | Elastic https://elasticsearch.cn/book/elasticsearch_definitive_guide_2.x/getting-started.html
Elasticsearch 中没有一个单独的组件是全新的或者是革命性的。全文搜索很久之前就已经可以做到了, 就像早就出现了的分析系统和分布式数据库。 革命性的成果在于将这些单独的,有用的组件融合到一个单一的、一致的、实时的应用中。它对于初学者而言有一个较低的门槛, 而当你的技能提升或需求增加时,它也始终能满足你的需求。
信息检索的概念、分布式系统原理、Query DSL
ElasticSearch 模糊匹配查询 - CSDN博客 http://blog.csdn.net/hemuxiao/article/details/72869105
目前的需求输入:王? 女 济南 20-30
- 能够查询以王开头的人的名字
- 性别为女性
- 地址为济南
- *年龄为20-30
{
"from": 0,
"query": {
"bool": {
"minimum_should_match": 1,
"must": [
{
"match": {
"XB": "2 "
}
},
{
"wildcard": {
"XM": "王*"
}
}
],
"should": [
{
"range": {
"CSRQ": {
"gte": "2010-09-01",
"lte": "2014-09-01"
}
}
}
]
}
},
"size": 10
}
模糊查询
curl '1.2.21.10 : 9200/dvote/kwaddress/_search?pretty=true' -d '
{
"from": ,
"query": {
"bool": {
"minimum_should_match": ,
"must": [
{
"wildcard": {
"title": "*网*"
}
}
]
}
},
"size":
}
'
https://elasticsearch.cn/book/elasticsearch_definitive_guide_2.x/_distributed_nature.html
分布式特性
在本章开头,我们提到过 Elasticsearch 可以横向扩展至数百(甚至数千)的服务器节点,同时可以处理PB级数据。我们的教程给出了一些使用 Elasticsearch 的示例,但并不涉及任何内部机制。Elasticsearch 天生就是分布式的,并且在设计时屏蔽了分布式的复杂性。
Elasticsearch 在分布式方面几乎是透明的。教程中并不要求了解分布式系统、分片、集群发现或其他的各种分布式概念。可以使用笔记本上的单节点轻松地运行教程里的程序,但如果你想要在 100 个节点的集群上运行程序,一切依然顺畅。
Elasticsearch 尽可能地屏蔽了分布式系统的复杂性。这里列举了一些在后台自动执行的操作:
- 分配文档到不同的容器 或 分片 中,文档可以储存在一个或多个节点中
- 按集群节点来均衡分配这些分片,从而对索引和搜索过程进行负载均衡
- 复制每个分片以支持数据冗余,从而防止硬件故障导致的数据丢失
- 将集群中任一节点的请求路由到存有相关数据的节点
- 集群扩容时无缝整合新节点,重新分配分片以便从离群节点恢复
当阅读本书时,将会遇到有关 Elasticsearch 分布式特性的补充章节。这些章节将介绍有关集群扩容、故障转移(集群内的原理) 、应对文档存储(分布式文档存储) 、执行分布式搜索(执行分布式检索) ,以及分区(shard)及其工作原理(分片内部原理) 。
Distributed Nature | Elasticsearch: The Definitive Guide [2.x] | Elastic https://www.elastic.co/guide/en/elasticsearch/guide/current/_distributed_nature.html
Distributed Nature
At the beginning of this chapter, we said that Elasticsearch can scale out to hundreds (or even thousands) of servers and handle petabytes of data. While our tutorial gave examples of how to use Elasticsearch, it didn’t touch on the mechanics at all. Elasticsearch is distributed by nature, and it is designed to hide the complexity that comes with being distributed.
The distributed aspect of Elasticsearch is largely transparent. Nothing in the tutorial required you to know about distributed systems, sharding, cluster discovery, or dozens of other distributed concepts. It happily ran the tutorial on a single node living inside your laptop, but if you were to run the tutorial on a cluster containing 100 nodes, everything would work in exactly the same way.
Elasticsearch tries hard to hide the complexity of distributed systems. Here are some of the operations happening automatically under the hood:
- Partitioning your documents into different containers or shards, which can be stored on a single node or on multiple nodes
- Balancing these shards across the nodes in your cluster to spread the indexing and search load
- Duplicating each shard to provide redundant copies of your data, to prevent data loss in case of hardware failure
- Routing requests from any node in the cluster to the nodes that hold the data you’re interested in
- Seamlessly integrating new nodes as your cluster grows or redistributing shards to recover from node loss
As you read through this book, you’ll encounter supplemental chapters about the distributed nature of Elasticsearch. These chapters will teach you about how the cluster scales and deals with failover (Life Inside a Cluster), handles document storage (Distributed Document Store), executes distributed search (Distributed Search Execution), and what a shard is and how it works (Inside a Shard).
【面向对象编程语言如此流行的原因】
面向对象编程语言如此流行的原因之一是对象帮我们表示和处理现实世界具有潜在的复杂的数据结构的实体,到目前为止,一切都很完美!
但是当我们需要存储这些实体时问题来了,传统上,我们以行和列的形式存储数据到关系型数据库中,相当于使用电子表格。 正因为我们使用了这种不灵活的存储媒介导致所有我们使用对象的灵活性都丢失了。
但是否我们可以将我们的对象按对象的方式来存储? 这样我们就能更加专注于 使用 数据,而不是在电子表格的局限性下对我们的应用建模。 我们可以重新利用对象的灵活性。
一个 对象 是基于特定语言的内存的数据结构。 为了通过网络发送或者存储它,我们需要将它表示成某种标准的格式。 JSON 是一种以人可读的文本表示对象的方法。 它已经变成 NoSQL 世界交换数据的事实标准。当一个对象被序列化成为 JSON,它被称为一个 JSON 文档 。