Apache Lucene is a free and open-source search engine software library, originally written entirely in Java by Doug Cutting.
It is supported by the Apache Software Foundation and is released under the Apache License.
Lucene has been ported to other programming languages, including Object Pascal, Perl, C#, C++, Python, Ruby and PHP.
History
Doug Cutting originally wrote Lucene in 1999.
Lucene was his fifth search engine: he had previously written two at Xerox PARC, one at Apple, and a fourth at Excite.
It was initially available for download from its home on the SourceForge web site.
It joined the Apache Software Foundation's Jakarta family of open-source Java products in September 2001 and became its own top-level Apache project in February 2005.
The name Lucene is Doug Cutting's wife's middle name and her maternal grandmother's first name.
Lucene formerly included a number of sub-projects, such as Lucene.NET, Mahout, Tika and Nutch.
These are now independent top-level projects.
In March 2010, the Apache Solr search server joined as a Lucene sub-project, merging the developer communities.
Version 4.0 was released on October 12, 2012.
Features and common use
While suitable for any application that requires full-text indexing and searching capability, Lucene is recognized for its utility in implementing Internet search engines and local, single-site search.
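To make "full-text indexing and searching" concrete, the following is a minimal sketch of an inverted index, the general data structure that full-text search libraries are built around. It is an illustration only, not Lucene's API or index format: the class name InvertedIndexSketch and its methods tokenize, add and search are hypothetical, and the tokenizer simply lowercases text and splits on non-alphanumeric characters.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical illustration; this is not Lucene's API or index format.
final class InvertedIndexSketch {
    // Maps each term to the sorted ids of the documents that contain it.
    private final Map<String, TreeSet<Integer>> postings = new TreeMap<>();
    private int nextDocId = 0;

    // Lowercases the text and splits it on runs of non-alphanumeric characters.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Indexes a document and returns the id assigned to it (0, 1, 2, ...).
    int add(String text) {
        int id = nextDocId++;
        for (String term : tokenize(text)) {
            postings.computeIfAbsent(term, k -> new TreeSet<>()).add(id);
        }
        return id;
    }

    // Returns the ids of documents that contain every query term (boolean AND).
    List<Integer> search(String query) {
        TreeSet<Integer> result = null;
        for (String term : tokenize(query)) {
            TreeSet<Integer> docs = postings.getOrDefault(term, new TreeSet<>());
            if (result == null) {
                result = new TreeSet<>(docs);
            } else {
                result.retainAll(docs);
            }
        }
        return result == null ? new ArrayList<>() : new ArrayList<>(result);
    }

    public static void main(String[] args) {
        InvertedIndexSketch index = new InvertedIndexSketch();
        index.add("Lucene is a search library");
        index.add("Solr is a search server built on Lucene");
        index.add("Nutch provides web crawling");
        System.out.println(index.search("search lucene"));
        System.out.println(index.search("crawling"));
    }
}
```

The sketch answers a query by intersecting the document sets of its terms; a production library adds analysis, scoring and on-disk storage on top of this basic idea.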
Lucene includes a feature to perform fuzzy searches based on edit distance.
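As an illustration of what matching by edit distance means, the sketch below computes the Levenshtein distance between two strings and keeps the candidate terms that lie within a chosen number of edits of the query. It is a plain dynamic-programming sketch under stated assumptions, not Lucene's implementation: the names EditDistanceSketch, levenshtein and fuzzyMatch, and the maxEdits parameter, are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration; Lucene's own fuzzy matching is implemented differently.
final class EditDistanceSketch {
    // Levenshtein distance: the minimum number of single-character
    // insertions, deletions and substitutions that turn a into b.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // distance from the empty prefix of a
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // distance to the empty prefix of b
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                int insertOrDelete = Math.min(curr[j - 1] + 1, prev[j] + 1);
                curr[j] = Math.min(insertOrDelete, prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Returns the terms whose distance to the query is at most maxEdits,
    // in the order in which they were given.
    static List<String> fuzzyMatch(String query, List<String> terms, int maxEdits) {
        List<String> matches = new ArrayList<>();
        for (String term : terms) {
            if (levenshtein(query, term) <= maxEdits) {
                matches.add(term);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        List<String> terms = Arrays.asList("lucene", "lucine", "lucerne", "solr");
        System.out.println(fuzzyMatch("lucene", terms, 1));
    }
}
```

Scanning every term like this is only practical for small vocabularies; it is meant to show the matching criterion, not an efficient way to evaluate it.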
Lucene has also been used to implement recommendation systems.
For example, Lucene's MoreLikeThis class can generate recommendations for similar documents.
In a comparison of the term vector-based similarity approach of MoreLikeThis with citation-based document similarity measures, such as co-citation and co-citation proximity analysis, Lucene's approach excelled at recommending documents with very similar structural characteristics and narrower relatedness.
In contrast, citation-based document similarity measures tended to be more suitable for recommending more broadly related documents, meaning that citation-based approaches may be better suited to generating serendipitous recommendations, as long as the documents to be recommended contain in-text citations.
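The idea of comparing documents by their term vectors can be sketched as follows: each document is reduced to a map of term frequencies, and two documents are compared by the cosine of the angle between those vectors. This is an assumption-laden illustration of term vector similarity in general, not a description of how MoreLikeThis works internally; the names TermVectorSimilaritySketch, termVector and cosine are hypothetical.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical illustration of term vector similarity; not Lucene's MoreLikeThis.
final class TermVectorSimilaritySketch {
    // Builds a term-frequency vector: term -> number of occurrences in the text.
    static Map<String, Integer> termVector(String text) {
        Map<String, Integer> tf = new HashMap<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                tf.merge(t, 1, Integer::sum);
            }
        }
        return tf;
    }

    // Cosine similarity of two term-frequency vectors, in the range [0, 1].
    // Returns 0 when either vector is empty.
    static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0.0;
        double normA = 0.0;
        double normB = 0.0;
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            int va = e.getValue();
            normA += (double) va * va;
            Integer vb = b.get(e.getKey());
            if (vb != null) {
                dot += (double) va * vb;
            }
        }
        for (int vb : b.values()) {
            normB += (double) vb * vb;
        }
        if (normA == 0.0 || normB == 0.0) {
            return 0.0;
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        Map<String, Integer> query = termVector("full text search engine");
        System.out.println(cosine(query, termVector("full text search library")));
        System.out.println(cosine(query, termVector("web crawling")));
    }
}
```

Documents that share most of their vocabulary score close to 1, which is consistent with the observation above that a term-based approach favors narrowly related documents.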
Lucene-based projects
Lucene itself is just an indexing and search library and does not contain crawling or HTML parsing functionality.
However, several projects extend Lucene's capability:
- Apache Nutch – provides web crawling and HTML parsing
- Apache Solr – an enterprise search server
- Compass – the predecessor to Elasticsearch
- CrateDB – an open-source, distributed SQL database built on Lucene
- DocFetcher – a multiplatform desktop search application
- Elasticsearch – an enterprise search server released in 2010
- Kinosearch – a search engine written in Perl and C, and a loose port of Lucene
  - The Socialtext wiki software and the MojoMojo wiki both use this search engine.
  - It is also used by the Human Metabolome Database (HMDB) and the Toxin and Toxin-Target Database (T3DB).
- Swiftype – an enterprise search startup based on Lucene