2021SC@SDUSC
Creating the new segment info object
The code is as follows:
newSegment = new SegmentInfo(segment, flushedDocCount, directory, false, true, docStoreOffset,
    docStoreSegment, docStoreIsCompoundFile, docWriter.hasProx());
segmentInfos.add(newSegment);
Preparing the document deletions
Code:
docWriter.pushDeletes();
--> deletesFlushed.update(deletesInRAM);
Here every entry in deletesInRAM is added to deletesFlushed, and deletesInRAM is then cleared. The reason was explained above.
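The hand-off from deletesInRAM to deletesFlushed can be sketched as below. This is a simplified model, not Lucene's BufferedDeletes class: the class name BufferedDeletesSketch is hypothetical, and terms and queries are represented as plain Strings as an assumption for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified stand-in for a buffer of pending deletes.
public class BufferedDeletesSketch {
    // Term (a String here) -> docID upper bound: only documents whose
    // global docID is below this bound are affected by the delete.
    final Map<String, Integer> terms = new HashMap<>();
    // Same idea for deletes by query (queries are Strings here as well).
    final Map<String, Integer> queries = new HashMap<>();
    // Deletes by explicit global document number.
    final List<Integer> docIDs = new ArrayList<>();

    // Moves every pending delete from 'in' into this buffer, then empties 'in'.
    void update(BufferedDeletesSketch in) {
        terms.putAll(in.terms);
        queries.putAll(in.queries);
        docIDs.addAll(in.docIDs);
        in.clear();
    }

    void clear() {
        terms.clear();
        queries.clear();
        docIDs.clear();
    }

    boolean any() {
        return !terms.isEmpty() || !queries.isEmpty() || !docIDs.isEmpty();
    }

    public static void main(String[] args) {
        BufferedDeletesSketch deletesInRAM = new BufferedDeletesSketch();
        BufferedDeletesSketch deletesFlushed = new BufferedDeletesSketch();
        deletesInRAM.terms.put("id:42", 7);
        deletesInRAM.docIDs.add(3);

        deletesFlushed.update(deletesInRAM);
        System.out.println("flushed has deletes: " + deletesFlushed.any());
        System.out.println("in-RAM has deletes: " + deletesInRAM.any());
    }
}
```

After update() returns, the flushed buffer owns all pending deletes and the in-RAM buffer is empty, so deletes buffered afterwards start from a clean state.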
Generating the cfs (compound file) segment
Code:
docWriter.createCompoundFile(segment);
newSegment.setUseCompoundFile(true);
DocumentsWriter.createCompoundFile(String segment) {
  CompoundFileWriter cfsWriter = new CompoundFileWriter(directory, segment + "." +
      IndexFileNames.COMPOUND_FILE_EXTENSION);
  // Add every file name recorded in flushState.flushedFiles during the flush to the cfs writer.
  for (final String flushedFile : flushState.flushedFiles)
    cfsWriter.addFile(flushedFile);
  // addFile() only registers the names; close() copies the files into the compound file.
  cfsWriter.close();
}
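To illustrate the idea of packing several flushed files into a single compound file, here is a minimal in-memory sketch. It is not Lucene's CompoundFileWriter and does not reproduce Lucene's on-disk .cfs layout: the class CompoundFileSketch, its read() helper, and the entry format (name length, name, data length, data) are all assumptions made for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical packer: collects named files, then writes them into one byte array.
public class CompoundFileSketch {
    // Files queued for packing, kept in insertion order (name -> contents).
    private final Map<String, byte[]> files = new LinkedHashMap<>();

    // Registers a file; nothing is written until close() is called.
    void addFile(String name, byte[] data) {
        if (files.containsKey(name))
            throw new IllegalArgumentException("file already added: " + name);
        files.put(name, data);
    }

    // Writes the file count, then for each file: name length, name, data length, data.
    byte[] close() {
        int size = Integer.BYTES;
        for (Map.Entry<String, byte[]> e : files.entrySet())
            size += 2 * Integer.BYTES
                    + e.getKey().getBytes(StandardCharsets.UTF_8).length
                    + e.getValue().length;
        ByteBuffer out = ByteBuffer.allocate(size);
        out.putInt(files.size());
        for (Map.Entry<String, byte[]> e : files.entrySet()) {
            byte[] name = e.getKey().getBytes(StandardCharsets.UTF_8);
            out.putInt(name.length).put(name);
            out.putInt(e.getValue().length).put(e.getValue());
        }
        return out.array();
    }

    // Scans the packed bytes for one sub-file; returns null if it is absent.
    static byte[] read(byte[] compound, String name) {
        ByteBuffer in = ByteBuffer.wrap(compound);
        int count = in.getInt();
        for (int i = 0; i < count; i++) {
            byte[] entryName = new byte[in.getInt()];
            in.get(entryName);
            byte[] data = new byte[in.getInt()];
            in.get(data);
            if (new String(entryName, StandardCharsets.UTF_8).equals(name))
                return data;
        }
        return null;
    }

    public static void main(String[] args) {
        CompoundFileSketch cfs = new CompoundFileSketch();
        cfs.addFile("_0.fnm", new byte[] {1, 2});
        cfs.addFile("_0.tis", new byte[] {3});
        byte[] packed = cfs.close();
        System.out.println("_0.tis length: " + read(packed, "_0.tis").length);
    }
}
```

The sketch mirrors the two-phase shape of the code above: addFile() only records what should go in, and the single close() call produces the combined output.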
Deleting documents
Code:
applyDeletes();
boolean applyDeletes(SegmentInfos infos) {
  if (!hasDeletes())
    return false;
  final int infosEnd = infos.size();
  int docStart = 0;
  boolean any = false;
  for (int i = 0; i < infosEnd; i++) {
    assert infos.info(i).dir == directory;
    SegmentReader reader = writer.readerPool.get(infos.info(i), false);
    try {
      any |= applyDeletes(reader, docStart);
      docStart += reader.maxDoc();
    } finally {
      writer.readerPool.release(reader);
    }
  }
  deletesFlushed.clear();
  return any;
}
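The loop above walks the segments in order and accumulates docStart, the global document number of each segment's first document. A minimal sketch of that bookkeeping follows; the class SegmentDocBaseSketch and its locate() helper are hypothetical names, and each segment is reduced to its maxDoc value as an assumption.

```java
// Hypothetical helper showing how a running docStart maps global docIDs to segments.
public class SegmentDocBaseSketch {
    // Maps a global docID to {segmentIndex, localDocID}, given each segment's maxDoc.
    // Returns null if the docID does not fall inside any segment.
    static int[] locate(int[] maxDocs, int globalDocID) {
        int docStart = 0; // global docID of the current segment's first document
        for (int i = 0; i < maxDocs.length; i++) {
            int docEnd = docStart + maxDocs[i];
            if (globalDocID >= docStart && globalDocID < docEnd)
                return new int[] { i, globalDocID - docStart };
            docStart = docEnd; // the next segment starts where this one ends
        }
        return null;
    }

    public static void main(String[] args) {
        int[] maxDocs = { 10, 5, 20 };
        int[] hit = locate(maxDocs, 12);
        // prints "segment 1, local docID 2"
        System.out.println("segment " + hit[0] + ", local docID " + hit[1]);
    }
}
```

With segments of 10, 5, and 20 documents, global docID 12 lies in the second segment (which covers 10 to 14), so its segment-local number is 2.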
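In Lucene, documents can be deleted through either a reader or a writer, but in the end the deletion is always carried out by a reader.
A reader can delete documents in three ways:
Delete by term: removes every document that contains the term.
Delete by document number.
Delete by query object: removes every document that matches the query.
All three ultimately reduce to deleting by document number, that is, to the process of writing the .del file.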
private final synchronized boolean applyDeletes(IndexReader reader, int docIDStart)
    throws CorruptIndexException, IOException {
  final int docEnd = docIDStart + reader.maxDoc();
  boolean any = false;
  // Delete by term: remove every document that contains the term.
  TermDocs docs = reader.termDocs();
  try {
    for (Entry<Term, BufferedDeletes.Num> entry: deletesFlushed.terms.entrySet()) {
      Term term = entry.getKey();
      docs.seek(term);
      int limit = entry.getValue().getNum();
      while (docs.next()) {
        int docID = docs.doc();
        // Stop at the first document whose global number reaches the limit.
        if (docIDStart+docID >= limit)
          break;
        reader.deleteDocument(docID);
        any = true;
      }
    }
  } finally {
    docs.close();
  }
  // Delete by document number: docIDs holds global numbers, so convert to segment-local ones.
  for (Integer docIdInt : deletesFlushed.docIDs) {
    int docID = docIdInt.intValue();
    if (docID >= docIDStart && docID < docEnd) {
      reader.deleteDocument(docID-docIDStart);
      any = true;
    }
  }
  // Delete by query object: remove every document that matches the query.
  IndexSearcher searcher = new IndexSearcher(reader);
  for (Entry<Query, Integer> entry : deletesFlushed.queries.entrySet()) {
    Query query = entry.getKey();
    int limit = entry.getValue().intValue();
    Weight weight = query.weight(searcher);
    Scorer scorer = weight.scorer(reader, true, false);
    if (scorer != null) {
      while(true) {
        // nextDoc() returns NO_MORE_DOCS (Integer.MAX_VALUE) when exhausted,
        // so the limit check below also ends the loop.
        int doc = scorer.nextDoc();
        if (((long) docIDStart) + doc >= limit)
          break;
        reader.deleteDocument(doc);
        any = true;
      }
    }
  }
  searcher.close();
  return any;
}
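The limit checks in the term and query loops can be looked at in isolation. The sketch below is a hypothetical helper, not Lucene code: the class DeleteLimitSketch and the method docsToDelete() are assumed names, and a term's postings are modelled as an ascending int array of segment-local docIDs.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the "docIDStart + docID >= limit" check for one term.
public class DeleteLimitSketch {
    // Returns the segment-local docIDs that a buffered delete would remove.
    // postings:   ascending local docIDs of documents containing the term
    // docIDStart: global docID of the segment's first document
    // limit:      global docID bound recorded with the buffered delete
    static List<Integer> docsToDelete(int[] postings, int docIDStart, int limit) {
        List<Integer> deleted = new ArrayList<>();
        for (int docID : postings) {
            // Postings are ascending, so every later document is also at or past the limit.
            if (docIDStart + docID >= limit)
                break;
            deleted.add(docID);
        }
        return deleted;
    }

    public static void main(String[] args) {
        // Local docs 0, 3, 7 are global docs 10, 13, 17; only those below 14 are removed.
        System.out.println(docsToDelete(new int[] {0, 3, 7}, 10, 14)); // prints [0, 3]
    }
}
```

In this model a document that contains the term but has a global number at or above the limit survives, which is the behavior the break statements in the loops above implement.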