[Hands-On Full-Text Search] Lucene Index Operations: Add, Delete, Update, Query

2022-12-01 19:45:26

Preface

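Anyone who works on search has probably come across Lucene. It is open source and easy to pick up, and the official API documentation is enough to write small demos. It is built on an inverted index, which is what makes lookups fast. This post implements a few basic operations: adding documents to an index incrementally, deleting from the index, querying by keyword, and updating the index.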
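To make the inverted-index idea concrete, here is a minimal sketch in plain Java. It is a toy model, not Lucene's actual data structure or on-disk format: the class name `ToyInvertedIndex` and its crude tokenizer are assumptions made for illustration. The point is only that a lookup goes from a term straight to the ids of the documents containing it, without scanning the texts.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

// Toy inverted index: maps each term to the sorted set of document ids containing it.
// Illustration only; this is not how Lucene stores its index.
class ToyInvertedIndex {
    private final Map<String, TreeSet<Integer>> postings = new TreeMap<>();
    private final List<String> documents = new ArrayList<>();

    // Adds a document and returns the id assigned to it.
    int add(String text) {
        int docId = documents.size();
        documents.add(text);
        // Crude tokenizer: lowercase, then split on anything that is not a letter or digit.
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty()) {
                postings.computeIfAbsent(token, k -> new TreeSet<>()).add(docId);
            }
        }
        return docId;
    }

    // Looks up a single term directly in the postings map.
    List<Integer> search(String term) {
        TreeSet<Integer> ids = postings.get(term.toLowerCase(Locale.ROOT));
        return ids == null ? new ArrayList<>() : new ArrayList<>(ids);
    }

    public static void main(String[] args) {
        ToyInvertedIndex index = new ToyInvertedIndex();
        index.add("hello,man!");
        index.add("goodbye,man!");
        index.add("hello,woman!");
        index.add("goodbye,woman!");
        System.out.println(index.search("man"));   // [0, 1]
        System.out.println(index.search("hello")); // [0, 2]
    }
}
```

Note that "woman" is indexed as its own term, so a search for "man" does not match it.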

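What I currently find inconvenient is that, to run full-text search over file contents, you have to write the file-reading code yourself (Solr does this for us for free). Index creation is also fairly slow and leaves a lot of room for optimization, which deserves a careful look some other time.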
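The file-reading step mentioned above could look roughly like the sketch below. It uses only the JDK; the class name `FileContentReader` and the choice of UTF-8 are assumptions, and loading a whole file into memory is only reasonable for small files. The returned string is what would then be handed to a field such as `new TextField("content", text, Store.YES)` in the indexing code later in this post.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper: reads a file so its text can be put into an index field.
class FileContentReader {
    // Reads the whole file as UTF-8 text (assumed encoding). Small files only.
    static String readAll(Path path) throws IOException {
        return new String(Files.readAllBytes(path), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // Write a throwaway file so the example is self-contained.
        Path tmp = Files.createTempFile("lucene-demo", ".txt");
        Files.write(tmp, "hello,man!".getBytes(StandardCharsets.UTF_8));
        System.out.println(readAll(tmp)); // hello,man!
        Files.delete(tmp);
    }
}
```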

Creating the index

The previous post already covered the overall flow of creating an index with Lucene; here is a brief recap:

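    Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_CURRENT);
    // FSDirectory.open takes a File in this Lucene version, not a String
    Directory directory = FSDirectory.open(new File("/tmp/testindex"));
    IndexWriterConfig config = new IndexWriterConfig(Version.LUCENE_CURRENT, analyzer);
    IndexWriter iwriter = new IndexWriter(directory, config);
    Document doc = new Document();
    String text = "This is the text to be indexed.";
    doc.add(new Field("fieldname", text, TextField.TYPE_STORED));
    // Without this call the document is never written to the index
    iwriter.addDocument(doc);
    iwriter.close();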

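1 Create a Directory that points at the index directory

2 Create the analyzer and the IndexWriter object

3 Create a Document object to hold the data, and add it to the writer

4 Close the IndexWriter, which commits the changes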

    /**
     * Build the index.
     *
     * @throws Exception
     */
    public static void index() throws Exception {
        String text1 = "hello,man!";
        String text2 = "goodbye,man!";
        String text3 = "hello,woman!";
        String text4 = "goodbye,woman!";

        Date date1 = new Date();
        analyzer = new StandardAnalyzer(Version.LUCENE_CURRENT);
        directory = FSDirectory.open(new File(INDEX_DIR));
        IndexWriterConfig config = new IndexWriterConfig(
                Version.LUCENE_CURRENT, analyzer);
        indexWriter = new IndexWriter(directory, config);

        // One document per text, each with a "filename" and a "content" field
        Document doc1 = new Document();
        doc1.add(new TextField("filename", "text1", Store.YES));
        doc1.add(new TextField("content", text1, Store.YES));
        indexWriter.addDocument(doc1);

        Document doc2 = new Document();
        doc2.add(new TextField("filename", "text2", Store.YES));
        doc2.add(new TextField("content", text2, Store.YES));
        indexWriter.addDocument(doc2);

        Document doc3 = new Document();
        doc3.add(new TextField("filename", "text3", Store.YES));
        doc3.add(new TextField("content", text3, Store.YES));
        indexWriter.addDocument(doc3);

        Document doc4 = new Document();
        doc4.add(new TextField("filename", "text4", Store.YES));
        doc4.add(new TextField("content", text4, Store.YES));
        indexWriter.addDocument(doc4);

        indexWriter.commit();
        indexWriter.close();

        Date date2 = new Date();
        System.out.println("Index creation took: " + (date2.getTime() - date1.getTime()) + "ms\n");
    }

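Adding to the index incrementally

Lucene supports incremental indexing: new documents can be added without affecting what is already indexed, and Lucene merges the index files automatically at an appropriate time.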

    /**
     * Add a document to the existing index.
     *
     * @throws Exception
     */
    public static void insert() throws Exception {
        String text5 = "hello,goodbye,man,woman";

        Date date1 = new Date();
        analyzer = new StandardAnalyzer(Version.LUCENE_CURRENT);
        directory = FSDirectory.open(new File(INDEX_DIR));
        IndexWriterConfig config = new IndexWriterConfig(
                Version.LUCENE_CURRENT, analyzer);
        indexWriter = new IndexWriter(directory, config);

        Document doc1 = new Document();
        doc1.add(new TextField("filename", "text5", Store.YES));
        doc1.add(new TextField("content", text5, Store.YES));
        indexWriter.addDocument(doc1);

        indexWriter.commit();
        indexWriter.close();

        Date date2 = new Date();
        System.out.println("Adding to the index took: " + (date2.getTime() - date1.getTime()) + "ms\n");
    }
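The "add without touching the old data, merge later" behaviour can be pictured with a small toy model in plain Java. This is only a sketch under simplifying assumptions: the class `ToySegmentedIndex`, the rule "merge everything once there are more than three segments", and the constant `MAX_SEGMENTS` are all hypothetical, and Lucene's real merge policy is considerably more elaborate.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model of incremental indexing: every commit appends a new segment,
// and segments are merged once there are too many of them.
class ToySegmentedIndex {
    private static final int MAX_SEGMENTS = 3; // hypothetical threshold

    private final List<List<String>> segments = new ArrayList<>();

    // Writes the new documents as their own segment; earlier segments are left untouched.
    void commit(List<String> newDocs) {
        segments.add(new ArrayList<>(newDocs));
        if (segments.size() > MAX_SEGMENTS) {
            mergeAll();
        }
    }

    // Combines all segments into a single one, keeping every document.
    private void mergeAll() {
        List<String> merged = new ArrayList<>();
        for (List<String> segment : segments) {
            merged.addAll(segment);
        }
        segments.clear();
        segments.add(merged);
    }

    int segmentCount() {
        return segments.size();
    }

    int docCount() {
        int count = 0;
        for (List<String> segment : segments) {
            count += segment.size();
        }
        return count;
    }

    public static void main(String[] args) {
        ToySegmentedIndex index = new ToySegmentedIndex();
        index.commit(Arrays.asList("hello,man!"));
        index.commit(Arrays.asList("goodbye,man!"));
        index.commit(Arrays.asList("hello,woman!"));
        System.out.println(index.segmentCount()); // 3
        index.commit(Arrays.asList("goodbye,woman!"));
        System.out.println(index.segmentCount()); // 1 (the fourth commit triggered a merge)
        System.out.println(index.docCount());     // 4
    }
}
```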

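Deleting from the index

Deletion also goes through IndexWriter, using its deleteDocuments method. Passing a Term deletes every document that matches that term. If you only want to delete a single document, it is best to define a unique ID field (ideally one that is not tokenized, such as a StringField, so the Term matches the value exactly) and delete by that field.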

    /**
     * Delete documents from the index.
     *
     * @param str value of the "filename" term whose documents are deleted
     * @throws Exception
     */
    public static void delete(String str) throws Exception {
        Date date1 = new Date();
        analyzer = new StandardAnalyzer(Version.LUCENE_CURRENT);
        directory = FSDirectory.open(new File(INDEX_DIR));
        IndexWriterConfig config = new IndexWriterConfig(
                Version.LUCENE_CURRENT, analyzer);
        indexWriter = new IndexWriter(directory, config);

        // Deletes every document whose "filename" field contains this term
        indexWriter.deleteDocuments(new Term("filename", str));

        // close() also commits the pending deletion
        indexWriter.close();

        Date date2 = new Date();
        System.out.println("Deleting from the index took: " + (date2.getTime() - date1.getTime()) + "ms\n");
    }
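The difference between deleting by a shared keyword and deleting by a unique ID can be shown with a toy model in plain Java. This is a sketch only: the class `ToyDocStore`, its `id` field, and the sample values are assumptions for illustration, and matching here is a plain exact string comparison rather than Lucene's term matching.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy document store: each document is a map from field name to an exact value.
class ToyDocStore {
    private final List<Map<String, String>> docs = new ArrayList<>();

    void add(String id, String filename) {
        Map<String, String> doc = new LinkedHashMap<>();
        doc.put("id", id);
        doc.put("filename", filename);
        docs.add(doc);
    }

    // Removes every document whose field equals the value; returns how many were removed.
    int deleteByTerm(String field, String value) {
        int before = docs.size();
        docs.removeIf(doc -> value.equals(doc.get(field)));
        return before - docs.size();
    }

    int size() {
        return docs.size();
    }

    public static void main(String[] args) {
        ToyDocStore store = new ToyDocStore();
        store.add("1", "report");
        store.add("2", "report");
        store.add("3", "notes");
        // A unique id removes exactly one document.
        System.out.println(store.deleteByTerm("id", "3"));            // 1
        // A shared value removes every document that carries it.
        System.out.println(store.deleteByTerm("filename", "report")); // 2
        System.out.println(store.size());                             // 0
    }
}
```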

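Updating the index

Lucene has no true in-place update. updateDocument takes a Term (a field name plus a value) and replaces the documents that match it, but under the hood it deletes those documents first and then adds the new one.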

    /**
     * Update a document in the index.
     *
     * @throws Exception
     */
    public static void update() throws Exception {
        String text1 = "update,hello,man!";

        Date date1 = new Date();
        analyzer = new StandardAnalyzer(Version.LUCENE_CURRENT);
        directory = FSDirectory.open(new File(INDEX_DIR));
        IndexWriterConfig config = new IndexWriterConfig(
                Version.LUCENE_CURRENT, analyzer);
        indexWriter = new IndexWriter(directory, config);

        Document doc1 = new Document();
        doc1.add(new TextField("filename", "text1", Store.YES));
        doc1.add(new TextField("content", text1, Store.YES));

        // Deletes the documents matching filename:text1, then adds doc1
        indexWriter.updateDocument(new Term("filename", "text1"), doc1);

        // close() also commits the pending update
        indexWriter.close();

        Date date2 = new Date();
        System.out.println("Updating the index took: " + (date2.getTime() - date1.getTime()) + "ms\n");
    }
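The "delete first, then add" nature of an update can be sketched in plain Java as follows. This is a toy model, not Lucene code: the class `ToyUpdatableIndex` and its helper `doc` are hypothetical, and the method names merely mirror the IndexWriter calls used in this post.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy index in which an update is literally a delete followed by an add.
class ToyUpdatableIndex {
    private final List<Map<String, String>> docs = new ArrayList<>();

    // Hypothetical helper that builds a two-field document.
    static Map<String, String> doc(String filename, String content) {
        Map<String, String> d = new LinkedHashMap<>();
        d.put("filename", filename);
        d.put("content", content);
        return d;
    }

    void addDocument(Map<String, String> doc) {
        docs.add(doc);
    }

    void deleteDocuments(String field, String value) {
        docs.removeIf(d -> value.equals(d.get(field)));
    }

    // Drops every document matching field=value, then appends the new document.
    void updateDocument(String field, String value, Map<String, String> newDoc) {
        deleteDocuments(field, value);
        addDocument(newDoc);
    }

    List<String> contents() {
        List<String> out = new ArrayList<>();
        for (Map<String, String> d : docs) {
            out.add(d.get("content"));
        }
        return out;
    }

    public static void main(String[] args) {
        ToyUpdatableIndex index = new ToyUpdatableIndex();
        index.addDocument(doc("text1", "hello,man!"));
        index.addDocument(doc("text2", "goodbye,man!"));
        index.updateDocument("filename", "text1", doc("text1", "update,hello,man!"));
        System.out.println(index.contents()); // [goodbye,man!, update,hello,man!]
    }
}
```

Two consequences are visible even in the toy: the updated document ends up as a new entry at the end rather than being modified in place, and an update whose term matches nothing simply behaves like an add.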

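Querying the index by keyword

Lucene supports many kinds of queries, which are not covered in detail here. A search yields an array of ScoreDoc hits, somewhat like a ResultSet; for each hit we load the document and read the stored values we want by field name.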

    /**
     * Keyword query.
     *
     * @param str the query string, parsed against the "content" field
     * @throws Exception
     */
    public static void search(String str) throws Exception {
        directory = FSDirectory.open(new File(INDEX_DIR));
        analyzer = new StandardAnalyzer(Version.LUCENE_CURRENT);
        DirectoryReader ireader = DirectoryReader.open(directory);
        IndexSearcher isearcher = new IndexSearcher(ireader);

        QueryParser parser = new QueryParser(Version.LUCENE_CURRENT, "content", analyzer);
        Query query = parser.parse(str);

        // No filter, at most 1000 hits
        ScoreDoc[] hits = isearcher.search(query, null, 1000).scoreDocs;
        for (int i = 0; i < hits.length; i++) {
            // Load the stored fields of each hit and print them
            Document hitDoc = isearcher.doc(hits[i].doc);
            System.out.println(hitDoc.get("filename"));
            System.out.println(hitDoc.get("content"));
        }

        ireader.close();
        directory.close();
    }
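The shape of that result handling (a list of scored hits, each pointing at a document whose stored fields are then read by name) can be sketched in plain Java. This is a toy under stated assumptions: the class `ToySearcher` and its nested `Hit` are hypothetical, and scoring by raw term frequency is a stand-in, not Lucene's scoring formula.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Toy searcher: ranks documents by how often the query term occurs in "content".
class ToySearcher {
    // Plays the role of a ScoreDoc: a document id plus its score.
    static final class Hit {
        final int doc;
        final int score;

        Hit(int doc, int score) {
            this.doc = doc;
            this.score = score;
        }
    }

    private final List<Map<String, String>> stored = new ArrayList<>();

    int add(String filename, String content) {
        Map<String, String> doc = new LinkedHashMap<>();
        doc.put("filename", filename);
        doc.put("content", content);
        stored.add(doc);
        return stored.size() - 1;
    }

    // Returns at most n hits, best score first; ties keep document order.
    List<Hit> search(String term, int n) {
        String wanted = term.toLowerCase(Locale.ROOT);
        List<Hit> hits = new ArrayList<>();
        for (int id = 0; id < stored.size(); id++) {
            int freq = 0;
            String content = stored.get(id).get("content").toLowerCase(Locale.ROOT);
            for (String token : content.split("[^\\p{L}\\p{N}]+")) {
                if (token.equals(wanted)) {
                    freq++;
                }
            }
            if (freq > 0) {
                hits.add(new Hit(id, freq));
            }
        }
        hits.sort(Comparator.comparingInt((Hit h) -> -h.score).thenComparingInt(h -> h.doc));
        return new ArrayList<>(hits.subList(0, Math.min(n, hits.size())));
    }

    // Loads the stored fields of a hit by document id.
    Map<String, String> doc(int id) {
        return stored.get(id);
    }

    public static void main(String[] args) {
        ToySearcher searcher = new ToySearcher();
        searcher.add("text1", "hello,man!");
        searcher.add("text2", "man,oh man!");
        searcher.add("text3", "hello,woman!");
        for (Hit hit : searcher.search("man", 1000)) {
            System.out.println(searcher.doc(hit.doc).get("filename") + " score=" + hit.score);
        }
        // prints "text2 score=2" and then "text1 score=1"
    }
}
```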

Full code

    package test;

    import java.io.File;
    import java.util.Date;

    import org.apache.lucene.analysis.Analyzer;
    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field.Store;
    import org.apache.lucene.document.TextField;
    import org.apache.lucene.index.DirectoryReader;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.IndexWriterConfig;
    import org.apache.lucene.index.Term;
    import org.apache.lucene.queryparser.classic.QueryParser;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.search.ScoreDoc;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.FSDirectory;
    import org.apache.lucene.util.Version;

    public class TestLucene {
        // Directory where the index is stored
        private static String INDEX_DIR = "D:\\luceneIndex";
        private static Analyzer analyzer = null;
        private static Directory directory = null;
        private static IndexWriter indexWriter = null;

        public static void main(String[] args) {
            try {
    //            index();
                search("man");
    //            insert();
    //            delete("text5");
    //            update();
            } catch (Exception e) {
                e.printStackTrace();
            }
        }

        /**
         * Update a document in the index.
         *
         * @throws Exception
         */
        public static void update() throws Exception {
            String text1 = "update,hello,man!";

            Date date1 = new Date();
            analyzer = new StandardAnalyzer(Version.LUCENE_CURRENT);
            directory = FSDirectory.open(new File(INDEX_DIR));
            IndexWriterConfig config = new IndexWriterConfig(
                    Version.LUCENE_CURRENT, analyzer);
            indexWriter = new IndexWriter(directory, config);

            Document doc1 = new Document();
            doc1.add(new TextField("filename", "text1", Store.YES));
            doc1.add(new TextField("content", text1, Store.YES));

            // Deletes the documents matching filename:text1, then adds doc1
            indexWriter.updateDocument(new Term("filename", "text1"), doc1);

            // close() also commits the pending update
            indexWriter.close();

            Date date2 = new Date();
            System.out.println("Updating the index took: " + (date2.getTime() - date1.getTime()) + "ms\n");
        }

        /**
         * Delete documents from the index.
         *
         * @param str value of the "filename" term whose documents are deleted
         * @throws Exception
         */
        public static void delete(String str) throws Exception {
            Date date1 = new Date();
            analyzer = new StandardAnalyzer(Version.LUCENE_CURRENT);
            directory = FSDirectory.open(new File(INDEX_DIR));
            IndexWriterConfig config = new IndexWriterConfig(
                    Version.LUCENE_CURRENT, analyzer);
            indexWriter = new IndexWriter(directory, config);

            // Deletes every document whose "filename" field contains this term
            indexWriter.deleteDocuments(new Term("filename", str));

            // close() also commits the pending deletion
            indexWriter.close();

            Date date2 = new Date();
            System.out.println("Deleting from the index took: " + (date2.getTime() - date1.getTime()) + "ms\n");
        }

        /**
         * Add a document to the existing index.
         *
         * @throws Exception
         */
        public static void insert() throws Exception {
            String text5 = "hello,goodbye,man,woman";

            Date date1 = new Date();
            analyzer = new StandardAnalyzer(Version.LUCENE_CURRENT);
            directory = FSDirectory.open(new File(INDEX_DIR));
            IndexWriterConfig config = new IndexWriterConfig(
                    Version.LUCENE_CURRENT, analyzer);
            indexWriter = new IndexWriter(directory, config);

            Document doc1 = new Document();
            doc1.add(new TextField("filename", "text5", Store.YES));
            doc1.add(new TextField("content", text5, Store.YES));
            indexWriter.addDocument(doc1);

            indexWriter.commit();
            indexWriter.close();

            Date date2 = new Date();
            System.out.println("Adding to the index took: " + (date2.getTime() - date1.getTime()) + "ms\n");
        }

        /**
         * Build the index.
         *
         * @throws Exception
         */
        public static void index() throws Exception {
            String text1 = "hello,man!";
            String text2 = "goodbye,man!";
            String text3 = "hello,woman!";
            String text4 = "goodbye,woman!";

            Date date1 = new Date();
            analyzer = new StandardAnalyzer(Version.LUCENE_CURRENT);
            directory = FSDirectory.open(new File(INDEX_DIR));
            IndexWriterConfig config = new IndexWriterConfig(
                    Version.LUCENE_CURRENT, analyzer);
            indexWriter = new IndexWriter(directory, config);

            // One document per text, each with a "filename" and a "content" field
            Document doc1 = new Document();
            doc1.add(new TextField("filename", "text1", Store.YES));
            doc1.add(new TextField("content", text1, Store.YES));
            indexWriter.addDocument(doc1);

            Document doc2 = new Document();
            doc2.add(new TextField("filename", "text2", Store.YES));
            doc2.add(new TextField("content", text2, Store.YES));
            indexWriter.addDocument(doc2);

            Document doc3 = new Document();
            doc3.add(new TextField("filename", "text3", Store.YES));
            doc3.add(new TextField("content", text3, Store.YES));
            indexWriter.addDocument(doc3);

            Document doc4 = new Document();
            doc4.add(new TextField("filename", "text4", Store.YES));
            doc4.add(new TextField("content", text4, Store.YES));
            indexWriter.addDocument(doc4);

            indexWriter.commit();
            indexWriter.close();

            Date date2 = new Date();
            System.out.println("Index creation took: " + (date2.getTime() - date1.getTime()) + "ms\n");
        }

        /**
         * Keyword query.
         *
         * @param str the query string, parsed against the "content" field
         * @throws Exception
         */
        public static void search(String str) throws Exception {
            directory = FSDirectory.open(new File(INDEX_DIR));
            analyzer = new StandardAnalyzer(Version.LUCENE_CURRENT);
            DirectoryReader ireader = DirectoryReader.open(directory);
            IndexSearcher isearcher = new IndexSearcher(ireader);

            QueryParser parser = new QueryParser(Version.LUCENE_CURRENT, "content", analyzer);
            Query query = parser.parse(str);

            // No filter, at most 1000 hits
            ScoreDoc[] hits = isearcher.search(query, null, 1000).scoreDocs;
            for (int i = 0; i < hits.length; i++) {
                // Load the stored fields of each hit and print them
                Document hitDoc = isearcher.doc(hits[i].doc);
                System.out.println(hitDoc.get("filename"));
                System.out.println(hitDoc.get("content"));
            }

            ireader.close();
            directory.close();
        }
    }

References

　　http://www.cnblogs.com/xing901022/p/3933675.html
