ZoieSystem
ZoieSystem Architecture
Architecture Overview
Zoie is a realtime indexing and search system, and as such needs relatively close coupling between the logically distinct Indexing and Searching subsystems: as soon as a document is made available to be indexed, it must be immediately searchable. The ZoieSystem is the primary component of Zoie; it incorporates both Indexing (by implementing DataConsumer<V>) and Search (by implementing IndexReaderFactory<R extends IndexReader>).
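To illustrate why one component implements both roles, here is a minimal sketch in plain Java. The names SketchDataConsumer, SketchReaderFactory and SketchSystem are hypothetical, simplified stand-ins and not Zoie's real interfaces or signatures; a "reader" is reduced to a read-only snapshot of a list. The only point is that data pushed in through the consumer side is visible to the next reader request.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.List;

// Hypothetical, simplified stand-in for the indexing-side interface.
interface SketchDataConsumer<V> {
    void consume(Collection<V> events);
}

// Hypothetical, simplified stand-in for the search-side interface.
interface SketchReaderFactory<R> {
    List<R> getIndexReaders();
}

// One object plays both roles, so whatever was consumed is visible
// to the very next reader request.
final class SketchSystem
        implements SketchDataConsumer<String>, SketchReaderFactory<List<String>> {

    private final List<String> docs = new ArrayList<>();

    @Override
    public synchronized void consume(Collection<String> events) {
        docs.addAll(events);
    }

    @Override
    public synchronized List<List<String>> getIndexReaders() {
        // A "reader" here is just a read-only snapshot of the documents seen so far.
        List<String> snapshot = Collections.unmodifiableList(new ArrayList<>(docs));
        return Collections.singletonList(snapshot);
    }
}
```

In this sketch, calling consume(...) and then getIndexReaders() on the same SketchSystem returns a snapshot that already contains the new documents, which is the coupling the overview describes.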
Configuration
ZoieSystem can be configured via Spring:
<!-- An instance of a DataProvider:
FileDataProvider recurses through a given directory and provides the DataConsumer
indexing requests built from the gathered files.
In the example, this provider needs to be started manually, and it is done via jmx.
-->
<bean id="dataprovider" class="proj.zoie.impl.indexing.FileDataProvider">
<constructor-arg value="file:${source.directory}"/>
<property name="dataConsumer" ref="indexingSystem" />
</bean>
<!--
an instance of an IndexableInterpreter:
FileIndexableInterpreter converts a text file into a lucene document, for example
purposes only
-->
<bean id="fileInterpreter" class="proj.zoie.impl.indexing.FileIndexableInterpreter" />
<!-- A decorator for an IndexReader instance:
The default decorator is just a pass through, the input IndexReader is returned.
-->
<bean id="idxDecorator" class="proj.zoie.impl.indexing.DefaultIndexReaderDecorator" />
<!-- A zoie system declaration, passed as a DataConsumer to the DataProvider declared above -->
<bean id="indexingSystem" class="proj.zoie.impl.indexing.ZoieSystem" init-method="start" destroy-method="shutdown">
<!-- disk index directory-->
<constructor-arg index="0" value="file:${index.directory}"/>
<!-- sets the interpreter -->
<constructor-arg index="1" ref="fileInterpreter" />
<!-- sets the decorator -->
<constructor-arg index="2">
<ref bean="idxDecorator"/>
</constructor-arg>
<!-- set the Analyzer, if null is passed, Lucene's StandardAnalyzer is used -->
<constructor-arg index="3">
<null/>
</constructor-arg>
<!-- sets the Similarity, if null is passed, Lucene's DefaultSimilarity is used -->
<constructor-arg index="4">
<null/>
</constructor-arg>
<!-- the following parameters indicate how often to trigger batched indexing;
whichever of the following two events happens first will trigger indexing
-->
<!-- Batch size: how many items to put on the queue before indexing is triggered -->
<constructor-arg index="5" value="1000" />
<!-- Batch delay: how long to wait before indexing is triggered -->
<constructor-arg index="6" value="300000" />
<!-- flag turning on/off real time indexing -->
<constructor-arg index="7" value="true" />
</bean>
<!-- a search service -->
<bean id="mySearchService" class="com.mycompany.search.SearchService">
<!-- IndexReader factory that produces index readers to build Searchers from -->
<constructor-arg ref="indexingSystem" />
</bean>
Indexing
Documents get into the ZoieSystem for addition to Lucene indices by way of a decoupled DataProvider abstraction, which indexes via push: ZoieSystem implements the DataConsumer interface, the natural partner to DataProvider. What follows is a brief call-stack walk-through of indexing:
RAM-to-Disk Index Segment Copy: Prior to 1.4.0, indexing for the RAM index and the disk index each tokenized document data and built inverted indexes separately. In 1.4.0, we eliminated this duplicate work. The disk index now copies index segments from the RAM index instead of going through tokenization and inversion again. This greatly reduces CPU load and disk I/O when documents are flushed to the disk index.
Overview: The part of Zoie that enables real-time searchability is the fact that ZoieSystem contains three IndexDataLoader objects:
All write requests that come in through the DataProvider are teed off into a "currentWritable" RAMDirectory and into an in-memory queue of DataEvents, which the BatchedIndexDataLoader collects until it hits a threshold batch size (and a minimum delay time has passed), after which the batch is written to disk (and if the time since the last optimize exceeds the parametrized optimizationDuration, IndexWriter#optimize() is called).
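The tee-and-batch flow above can be sketched roughly as follows. This is a single-threaded illustration only: BatchingSketch and its methods are hypothetical names, plain lists stand in for the RAMDirectory, the DataEvent queue and the disk index, and the clock is passed in explicitly. It assumes the trigger rule stated in the configuration comments (whichever of batch size or batch delay is reached first), and it only checks the delay when a new event arrives, which a real background loader would not be limited to.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the tee-and-batch flow; names are illustrative, not Zoie's API.
final class BatchingSketch {
    private final int batchSize;
    private final long batchDelayMillis;
    private final List<String> ramIndex = new ArrayList<>();  // stands in for the writable RAMDirectory
    private final List<String> queue = new ArrayList<>();     // stands in for the DataEvent queue
    private final List<String> diskIndex = new ArrayList<>(); // stands in for the disk index
    private long lastFlushMillis;

    BatchingSketch(int batchSize, long batchDelayMillis, long nowMillis) {
        this.batchSize = batchSize;
        this.batchDelayMillis = batchDelayMillis;
        this.lastFlushMillis = nowMillis;
    }

    // Tee each event into the RAM index (searchable at once) and into the pending queue.
    void consume(String event, long nowMillis) {
        ramIndex.add(event);
        queue.add(event);
        // Assumption: flush on whichever threshold is reached first.
        boolean sizeReached = queue.size() >= batchSize;
        boolean delayReached = nowMillis - lastFlushMillis >= batchDelayMillis;
        if (sizeReached || delayReached) {
            flush(nowMillis);
        }
    }

    // Move the queued batch to "disk" and drop the now-redundant RAM copy.
    private void flush(long nowMillis) {
        diskIndex.addAll(queue);
        queue.clear();
        ramIndex.clear();
        lastFlushMillis = nowMillis;
    }

    // Everything a search would see: flushed documents plus not-yet-flushed ones.
    List<String> searchable() {
        List<String> all = new ArrayList<>(diskIndex);
        all.addAll(ramIndex);
        return all;
    }

    int ramSize() { return ramIndex.size(); }

    int diskSize() { return diskIndex.size(); }
}
```

The point of the tee is visible in searchable(): an event is returned to searches immediately after consume(...), whether or not its batch has been written to disk yet.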
Searching
ZoieSystem, acting as an IndexReaderFactory, provides an "expert" search API for clients of ZoieSystem who need access to the IndexReader internals (for faceting, caching, etc.). Note that these IndexReader instances will always be subclasses of ReadOnlyIndexReader, and thus are usable only for searching the index, not for modifying it. For clients who do not need or want such an expert API, there will be (in an upcoming Zoie release) a simplified SearcherFactory interface which compartmentalizes the IndexReader internals a bit more by wrapping the IndexReaders in a MultiSearcher. ZoieSystem delegates the getIndexReaders() call to the SearchIndexManager, which returns a triplet: two ZoieIndexReaders backed by RAMDirectories (which are transient, one reader per request; these indices are small, so this is performant) and one ZoieIndexReader which has an IndexReaderDispenser with multiple IndexReader views on the same disk-based FSDirectory.
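As a rough illustration of how a client might use the reader triplet, the sketch below searches two small RAM-backed readers and one disk-backed reader and merges the hits. SketchReader, ListReader and TripletSearchSketch are hypothetical names, and substring matching over lists stands in for real Lucene readers and queries; nothing here is Zoie's or Lucene's actual API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical read-only view over one index; exposes search only, no write methods.
interface SketchReader {
    List<String> search(String term);
}

// A reader over a private copy of the documents, so callers cannot modify the index.
final class ListReader implements SketchReader {
    private final List<String> docs;

    ListReader(List<String> docs) {
        this.docs = new ArrayList<>(docs);
    }

    @Override
    public List<String> search(String term) {
        List<String> hits = new ArrayList<>();
        for (String doc : docs) {
            if (doc.contains(term)) {
                hits.add(doc);
            }
        }
        return hits;
    }
}

final class TripletSearchSketch {
    // Builds the triplet: two readers over RAM-held data and one over disk-held data.
    static List<SketchReader> getIndexReaders(
            List<String> ramFirst, List<String> ramSecond, List<String> disk) {
        List<SketchReader> readers = new ArrayList<>();
        readers.add(new ListReader(ramFirst));
        readers.add(new ListReader(ramSecond));
        readers.add(new ListReader(disk));
        return readers;
    }

    // Searches every reader in turn and concatenates the hits.
    static List<String> searchAll(List<SketchReader> readers, String term) {
        List<String> hits = new ArrayList<>();
        for (SketchReader reader : readers) {
            hits.addAll(reader.search(term));
        }
        return hits;
    }
}
```

A client of the expert API does this merging itself; the planned SearcherFactory described above would hide that step behind a single searcher.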
can u pls explain the role of RAM Index B?
RAM A holds transient indexing data while the batch indexer sleeps; RAM B holds transient indexing data while the batch indexer is working on a batch.
the images in the wiki did not display.