Code Samples
Featured, Phase-Implementation
Updated Nov 15, 2009 by john.w...@gmail.com

Introduction

This wiki page contains a set of code samples to get you started.

Basic Objects

My data definition:

class Data{
  long id;
  String content;
}

Define a ZoieIndexableInterpreter:

A ZoieIndexableInterpreter is a way to convert a data object into a Lucene document:

class DataIndexable implements ZoieIndexable {
    private Data _data;
    public DataIndexable(Data data) {
        _data = data;
    }

    public long getUID() {
        return _data.id;
    }

    public IndexingReq[] buildIndexingReqs() {
        // it is possible we want to map 1 data object to multiple lucene documents
        // but not for this example
        Document doc = new Document();
        doc.add(new Field("content",_data.content,Store.NO,Index.ANALYZED));

        // no need to add the id field, Zoie will manage the id for you
        return new IndexingReq[]{new IndexingReq(doc)};
    }

    // The following methods are somewhat hacky in this example, but they are
    // meant for cases where whether a document should be deleted and/or skipped
    // is only known at runtime.

    public boolean isDeleted() {
        return "_MARKED_FOR_DELETE".equals(_data.content);
    }

    public boolean isSkip(){
        return "_MARKED_FOR_SKIP".equals(_data.content);
    }
}

class DataIndexableInterpreter implements ZoieIndexableInterpreter<Data> {
    public ZoieIndexable interpret(Data src){
        return new DataIndexable(src);
    }
}

Build an IndexReaderDecorator

An IndexReaderDecorator lets clients wrap a given ZoieIndexReader in a custom IndexReader type, e.g. a subclass of Lucene's FilterIndexReader.

This is not mandatory; in most cases clients can just use the returned ZoieIndexReader.

class MyDoNothingFilterIndexReader extends FilterIndexReader {
    public MyDoNothingFilterIndexReader(IndexReader reader) {
        super(reader);       
    }
    public void updateInnerReader(IndexReader inner) {
        in = inner;
    }
}

class MyDoNothingIndexReaderDecorator implements IndexReaderDecorator<MyDoNothingFilterIndexReader> {
    public MyDoNothingFilterIndexReader decorate(ZoieIndexReader indexReader) throws IOException {
        return new MyDoNothingFilterIndexReader(indexReader);
    }

    public MyDoNothingFilterIndexReader redecorate(MyDoNothingFilterIndexReader decorated, ZoieIndexReader copy) throws IOException {
        // underlying segment has not changed, just change the inner reader
        decorated.updateInnerReader(copy);
        return decorated;
    }
}

Build a ZoieSystem

We are now ready to build a ZoieSystem instance:

// index directory
File idxDir = new File("myIdxDir");

// create an analyzer
Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_CURRENT);

// create similarity
Similarity similarity = new DefaultSimilarity();

ZoieIndexableInterpreter<Data> myInterpreter = new DataIndexableInterpreter();

IndexReaderDecorator<MyDoNothingFilterIndexReader> decorator = new MyDoNothingIndexReaderDecorator();

ZoieSystem indexingSystem = new ZoieSystem(idxDir,        // index directory
                                           myInterpreter, // my interpreter
                                           decorator,     // index reader decorator
                                           analyzer,      // my analyzer
                                           similarity,    // my similarity
                                           1000,          // # events to hold in memory before flushing to disk
                                           300000,        // time (ms) to wait before flushing to disk
                                           true);         // true for realtime

indexingSystem.start();     // ready to accept indexing events

Basic Search

This example shows how to set up basic indexing and search, using one thread for indexing and another for searching.

thread 1: (indexing thread)

long batchVersion = 0;
while (true) {
    Data[] data = buildDataEvents(...);       // build a batch of data objects to index

    // construct a collection of indexing events
    ArrayList<DataEvent<Data>> eventList = new ArrayList<DataEvent<Data>>(data.length);
    for (Data datum : data) {
        eventList.add(new DataEvent<Data>(batchVersion, datum));
    }

    // do indexing (pass the list built above)
    indexingSystem.consume(eventList);

    // increment my version
    batchVersion++;
}

thread 2: (search thread)

// get the IndexReaders
List<ZoieIndexReader<MyDoNothingFilterIndexReader>> readerList = indexingSystem.getIndexReaders();

// MyDoNothingFilterIndexReader instances can be obtained by calling
// ZoieIndexReader.getDecoratedReaders()

// combine the readers
MultiReader reader = new MultiReader(readerList.toArray(new IndexReader[readerList.size()]), false);

// do search
IndexSearcher searcher = new IndexSearcher(reader);

Query q = buildQuery("myquery", indexingSystem.getAnalyzer());

TopDocs docs = searcher.search(q, 10);

// return readers
indexingSystem.returnIndexReaders(readerList);

UID/docid mapping

// given a ZoieIndexReader instance:
    ZoieIndexReader zreader = ...

docid to uid

    long uid = zreader.getUID(docid);

    // make sure uid is not deleted in this reader:

    if (uid==ZoieIndexReader.DELETED_UID)
        throw new ZoieException("uid deleted");

uid to docid

    DocIDMapper docidMapper = zreader.getDocIDMapper();
    int docid = docidMapper.getDocID(uid);

    if (docid==DocIDMapper.NOT_FOUND)
        throw new ZoieException("uid does not exist");
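
To make the two lookups above concrete, here is a minimal, self-contained sketch of a two-way uid/docid table. It is not Zoie's implementation: the class name SketchDocIdTable, its fields, and the sentinel values are hypothetical and chosen only for illustration. The one idea taken from the snippets above is that each direction reports a missing entry through a sentinel rather than an exception.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for illustration only; not part of Zoie.
final class SketchDocIdTable {
    static final long DELETED_UID = Long.MIN_VALUE; // assumed sentinel value
    static final int NOT_FOUND = -1;                // assumed sentinel value

    private final long[] uidByDocid;                 // index = docid, value = uid
    private final Map<Long, Integer> docidByUid = new HashMap<Long, Integer>();

    SketchDocIdTable(long[] uidByDocid) {
        this.uidByDocid = uidByDocid.clone();
        // Build the reverse index, leaving out slots marked as deleted.
        for (int docid = 0; docid < uidByDocid.length; docid++) {
            if (uidByDocid[docid] != DELETED_UID) {
                docidByUid.put(uidByDocid[docid], docid);
            }
        }
    }

    // docid to uid: a plain array lookup; may return DELETED_UID.
    long getUID(int docid) {
        return uidByDocid[docid];
    }

    // uid to docid: returns NOT_FOUND when the uid is unknown or deleted.
    int getDocID(long uid) {
        Integer docid = docidByUid.get(uid);
        return docid == null ? NOT_FOUND : docid;
    }
}
```

As in the snippets above, callers are expected to check the returned value against the sentinel before using it.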

Data Providers

Data providers can be many things, e.g.:

  • RDBMS streamer
  • Crawler

Zoie comes out of the box with some useful data providers.

StreamDataProvider

This is the top-level abstraction for stream-based data providers. See the StreamDataProvider javadoc.

To write an implementation, simply override the next() method and return null to indicate end of the stream.
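
The "override next() and return null at the end" contract can be sketched in plain Java as follows. This is only an illustration of the pattern: SketchStreamProvider, SketchListProvider, and drain() are hypothetical names, not Zoie classes or methods, and the real StreamDataProvider has its own signature and lifecycle, which its javadoc describes.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in that mirrors the pull-until-null pattern; NOT Zoie's StreamDataProvider.
abstract class SketchStreamProvider<T> {
    /** Returns the next event, or null once the stream is exhausted. */
    abstract T next();

    /** Pulls events until next() returns null and collects them in order. */
    final List<T> drain() {
        List<T> out = new ArrayList<T>();
        for (T event = next(); event != null; event = next()) {
            out.add(event);
        }
        return out;
    }
}

// A list-backed implementation: hands out each element once, then signals the end with null.
final class SketchListProvider<T> extends SketchStreamProvider<T> {
    private final Iterator<T> it;

    SketchListProvider(List<T> events) {
        // Copy the list so later changes by the caller do not affect iteration.
        this.it = new ArrayList<T>(events).iterator();
    }

    @Override
    T next() {
        return it.hasNext() ? it.next() : null;
    }
}
```

One consequence of using null as the end marker is that null can never be a legitimate event, so implementations have to filter or reject null inputs up front.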

All StreamDataProvider instances can be managed through the JMX MBean DataProviderAdminMBean.

MemoryStreamDataProvider

A very simple stream data provider that is constructed from a list of events and iterates through them. The Zoie unit tests are built on it. See the javadoc.

FileDataProvider

This stream data provider takes a java.io.File object and recursively iterates over all files within it (if it is a directory). It is constructed with just a File instance. See the javadoc.
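
For readers who want to see what recursive iteration over a directory looks like, here is a small standalone sketch using the standard java.nio.file API. It does not show how FileDataProvider is implemented; SketchFileWalk and listFiles are hypothetical names, and the sorting step is an added assumption to make the order predictable.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper for illustration only; not part of Zoie.
final class SketchFileWalk {
    /**
     * Returns every regular file under root, sorted by path.
     * If root is itself a regular file, the result contains just that file.
     */
    static List<Path> listFiles(Path root) throws IOException {
        // Files.walk visits root and everything beneath it; the stream must be closed.
        try (Stream<Path> paths = Files.walk(root)) {
            return paths.filter(Files::isRegularFile) // drop directories
                        .sorted()
                        .collect(Collectors.toList());
        }
    }
}
```

Each returned path could then be turned into one indexing event, which is roughly the role the description above gives to FileDataProvider.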

Comment by goo...@scale.io, Feb 22, 2010

Hey, I think there is an error in

class MyDoNothingIndexReaderDecorator implements IndexReaderDecorator<MyDoNothingFilterIndexReader> {
    public MyDoNothingIndexReaderDecorator decorate(ZoieIndexReader indexReader) throws IOException {
        return new MyDoNothingFilterIndexReader(indexReader);
    }
      public MyDoNothingIndexReaderDecorator redecorate(MyDoNothingIndexReaderDecorator decorated,ZoieIndexReader copy) throws IOException {
        // underlying segment has not changed, just change the inner reader

        decorated.updateInnerReader(copy);
        return decorated;
    }

This needs to read

class MyDoNothingIndexReaderDecorator implements IndexReaderDecorator<MyDoNothingFilterIndexReader> {
        public MyDoNothingFilterIndexReader decorate(ZoieIndexReader indexReader) throws IOException {
            return new MyDoNothingFilterIndexReader(indexReader);
        }

        public MyDoNothingFilterIndexReader redecorate(MyDoNothingFilterIndexReader decorated, ZoieIndexReader copy) throws IOException {
            // underlying segment has not changed, just change the inner reader

            decorated.updateInnerReader(copy);
            return decorated;
        }
    }

So it uses MyDoNothingFilterIndexReader instead of MyDoNothingIndexReaderDecorator in the method signatures.

Comment by goo...@scale.io, Feb 22, 2010

Also, the DataIndexableInterpreter needs to implement a new method, too.

    public ZoieIndexable convertAndInterpret(Data src) {
        return new DataIndexable(src);
    }

That might be a change introduced in the latest version, though.

Comment by jinglong...@gmail.com, Jan 19, 2011

Can you give me an example of updating the index?

Comment by Xiaorong...@gmail.com, May 16, 2011

Is anyone still using this wiki, or is another wiki used now?

Comment by chinmay....@gmail.com, Jul 5, 2011

I'm having trouble creating a ZoieIndexReader. I'm creating it as:

FSDirectory dir1 = FSDirectory.open(new File("index_dir"));
indReader1 = IndexReader.open(dir1, true);
zIndReader1 = ZoieIndexReader.open(indReader1);

After this, however, it seems that delDocIds and docIDMapper are null. Is it normal for these fields to be null, or have I opened it wrong? Please let me know at your convenience.

