
Jackrabbit Made Simple, Part 7: Text Extraction (Part 2)

2009-09-17 00:00:00    Source: WEB开发网


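This method was covered earlier: it uses multiple threads to generate the index data for each document. This time, however, the focus is not on the multithreaded document generation but on the getFinishedDocument() method. Let's start with its Javadoc comment: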

Returns a document that is finished with text extraction and is ready to be added to the index

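In other words, only a document whose text extraction has completed is returned. So what happens to a new document on which extraction has not yet run? The only way to find out is to read the implementation.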

Java code:

private Document getFinishedDocument(Document doc) throws IOException {
    /* Util.isDocumentReady(doc) is the crucial call here; skim past it and you
       miss the interesting details, because this is where the extraction work
       actually begins. Recall TextExtractorReader#isExtractorFinished from the
       previous article: it waits up to 100 milliseconds for the extraction to
       return, and otherwise returns false. That false is what the if statement
       below tests; it means extraction has not finished yet, so we enter the
       if block. */
    if (!Util.isDocumentReady(doc)) {
        /* So once the 100 milliseconds are exceeded, a second Document object
           is created and the values of the original document are copied into
           it. Note that a field that is a LazyTextExtractorField (or that
           carries a reader value) is replaced with an empty reader first. */
        Document copy = new Document();
        for (Iterator fields = doc.getFields().iterator(); fields.hasNext(); ) {
            Fieldable f = (Fieldable) fields.next();
            Fieldable field = null;
            Field.TermVector tv = getTermVectorParameter(f);
            Field.Store stored = getStoreParameter(f);
            Field.Index indexed = getIndexParameter(f);
            if (f instanceof LazyTextExtractorField || f.readerValue() != null) {
                // replace all readers with empty string reader
                field = new Field(f.name(), new StringReader(""), tv);
            } else if (f.stringValue() != null) {
                field = new Field(f.name(), f.stringValue(),
                        stored, indexed, tv);
            } else if (f.isBinary()) {
                field = new Field(f.name(), f.binaryValue(), stored);
            }
            if (field != null) {
                field.setOmitNorms(f.getOmitNorms());
                copy.add(field);
            }
        }
        // schedule the original document for later indexing
        /* Here the producer finally adds the original document object to the
           indexingQueue. */
        Document existing = indexingQueue.addDocument(doc);
        if (existing != null) {
            /* If the JVM exited abnormally while this nodeId was being indexed
               earlier, the nodeId is present in both the redo log and the
               indexing queue log, so this call may return a document that was
               already in the indexing queue. */
            // the queue already contained a pending document for this
            // node. -> dispose the document
            Util.disposeDocument(existing);
        }
        // use the stripped down copy for now
        doc = copy;
    }
    return doc;
}
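The control flow above (wait briefly, fall back to a stripped copy, queue the original) can be sketched without Jackrabbit or Lucene on the classpath. The following is only an illustrative model under stated assumptions: DeferredIndexingSketch, Doc, PendingQueue, isReady and the "fulltext" field name are all hypothetical stand-ins, a java.util.concurrent.Future plays the role of the lazy text extractor, and a failed extraction is simply treated as "not ready".

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical model of the pattern; none of these types are Jackrabbit or Lucene classes. */
class DeferredIndexingSketch {

    /** Stand-in for a document: a node id, cheap string fields, and one slow extracted-text field. */
    static final class Doc {
        final String nodeId;
        final Map<String, String> fields = new LinkedHashMap<>();
        final Future<String> fullText; // null means the document has no lazily extracted field

        Doc(String nodeId, Future<String> fullText) {
            this.nodeId = nodeId;
            this.fullText = fullText;
        }
    }

    /** Stand-in for the indexing queue: at most one pending document per node id. */
    static final class PendingQueue {
        private final Map<String, Doc> pending = new LinkedHashMap<>();

        /** Registers doc and returns the document previously pending for the same node, or null. */
        synchronized Doc addDocument(Doc doc) {
            return pending.put(doc.nodeId, doc);
        }

        synchronized int size() {
            return pending.size();
        }
    }

    /** Waits at most timeoutMillis for the extraction; returns true only if the text is available. */
    static boolean isReady(Doc doc, long timeoutMillis) {
        if (doc.fullText == null) {
            return true;
        }
        try {
            doc.fullText.get(timeoutMillis, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException e) {
            return false; // still extracting
        } catch (ExecutionException e) {
            return false; // assumption of this sketch: a failed extraction counts as "not ready"
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    /** Returns doc itself if it is ready; otherwise queues it and returns a stripped copy. */
    static Doc getFinishedDocument(Doc doc, PendingQueue queue, long timeoutMillis) {
        if (isReady(doc, timeoutMillis)) {
            return doc;
        }
        // Copy the cheap fields and blank out the slow one so indexing is not blocked.
        Doc copy = new Doc(doc.nodeId, null);
        copy.fields.putAll(doc.fields);
        copy.fields.put("fulltext", "");
        // Schedule the original for later indexing. A non-null return value means the
        // node was already pending; the original code disposes that document, while
        // this sketch has no resources to release and just drops the reference.
        queue.addDocument(doc);
        return copy;
    }
}
```

Calling getFinishedDocument with a Doc whose Future never completes returns a different Doc instance with an empty "fulltext" field and leaves the original in the queue; a Doc whose Future is already complete is returned unchanged and the queue is not touched.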