Using Flint - A Simple Example
Updated Mar 23, 2010 by tyenki


Flint aims to simplify the task of loading data into Lucene. Although this task is not overly difficult without Flint, it can still involve a steep learning curve, depending on the complexity of the data.

Flint provides an easily understood abstraction, along with code that makes it easy to prepare and "pre-flight" data before indexing it. This helps developers to:

  • understand what the Lucene index will look like.
  • use XML validation and QA technology to check the data before indexing.
  • have a logical point at which to make Lucene-specific improvements to the data before indexing.
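A pre-flight check of this kind can be quite small. The sketch below is not part of Flint: it assumes the data is already in Flint's IXML format (described below) and applies a hypothetical rule that every document must carry exactly one unique, non-empty id field. It uses only the JDK's built-in DOM parser and skips loading the external DTD so that it runs offline; full DTD validation would be a separate step.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Hypothetical pre-flight check: each IXML <document> needs one unique, non-empty "id" field. */
public class PreflightCheck {

    public static List<String> check(String ixml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        // Do not fetch the external DTD: this sketch checks well-formedness and the id rule only.
        factory.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
        Document dom = factory.newDocumentBuilder().parse(new InputSource(new StringReader(ixml)));

        List<String> problems = new ArrayList<>();
        Set<String> seen = new HashSet<>();
        NodeList documents = dom.getElementsByTagName("document");
        for (int i = 0; i < documents.getLength(); i++) {
            Element document = (Element) documents.item(i);
            // Collect the values of all fields named "id" in this document.
            List<String> ids = new ArrayList<>();
            NodeList fields = document.getElementsByTagName("field");
            for (int j = 0; j < fields.getLength(); j++) {
                Element field = (Element) fields.item(j);
                if ("id".equals(field.getAttribute("name"))) {
                    ids.add(field.getTextContent().trim());
                }
            }
            int position = i + 1; // 1-based position, for readable messages
            if (ids.size() != 1 || ids.get(0).isEmpty()) {
                problems.add("document " + position + ": expected exactly one non-empty id field");
            } else if (!seen.add(ids.get(0))) {
                problems.add("document " + position + ": duplicate id " + ids.get(0));
            }
        }
        return problems;
    }

    public static void main(String[] args) throws Exception {
        String ixml = "<documents>"
            + "<document><field name=\"id\">ARTIST-12625</field></document>"
            + "<document><field name=\"id\">ARTIST-12625</field></document>"
            + "<document><field name=\"type\">EVENT</field></document>"
            + "</documents>";
        check(ixml).forEach(System.out::println);
    }
}
```

An empty result means the file passed this particular rule; further rules (allowed field names, required types) could be added in the same loop.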

Hopefully the following example will help developers to understand how Flint can add value and improve productivity when loading data into Apache's Lucene.

Prerequisite

Use a Subversion (SVN) client to check out the example code.

Alternatively, download the zip file and import it into your development environment, such as Eclipse.

How to write data to a Lucene index

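Flint depends on data being converted into an intermediate XML format, which we call Index XML or IXML. IXML is simply an XML representation of how records are stored in Lucene: each document element becomes a Lucene document, and each field element becomes a field of that document.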
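Because IXML spells out each field's storage and indexing options, it can be inspected before indexing to see what the index will contain. The sketch below (hypothetical, using only the JDK) lists each field name with its store and index attributes. These values appear to correspond to Lucene 2.x options such as Field.Store.YES/NO and Field.Index.TOKENIZED/UN_TOKENIZED, but that mapping is an assumption here, not taken from Flint's code.

```java
import java.io.StringReader;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Hypothetical helper: summarises the store/index settings of each IXML field name. */
public class IxmlFieldSummary {

    /** Returns field name -> "store=..., index=...", sorted by field name. */
    public static Map<String, String> summarise(String ixml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        // Skip the external DTD so the sketch works offline.
        factory.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
        NodeList fields = factory.newDocumentBuilder()
            .parse(new InputSource(new StringReader(ixml)))
            .getElementsByTagName("field");

        Map<String, String> summary = new TreeMap<>();
        for (int i = 0; i < fields.getLength(); i++) {
            Element field = (Element) fields.item(i);
            String settings = "store=" + field.getAttribute("store")
                + ", index=" + field.getAttribute("index");
            // Field names can repeat (e.g. several "artist" fields); keep the first settings seen.
            summary.putIfAbsent(field.getAttribute("name"), settings);
        }
        return summary;
    }

    public static void main(String[] args) throws Exception {
        String ixml = "<documents><document>"
            + "<field name=\"id\" store=\"yes\" index=\"un-tokenized\">ARTIST-10642</field>"
            + "<field name=\"about\" store=\"yes\" index=\"tokenized\">bio etc</field>"
            + "<field name=\"fulltext\" store=\"no\" index=\"tokenized\">THE MULES bio etc</field>"
            + "</document></documents>";
        summarise(ixml).forEach((name, settings) -> System.out.println(name + ": " + settings));
    }
}
```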
Here is an example of a source document:
example.xml

<example>
  <artists>
    <artist>
      <artist.id>12625</artist.id>
      <artist.about>bio etc</artist.about>
      <artist.URL>MySpace URL</artist.URL>
      <artist.name>JO MEARES AND THE HONEYRIDERS</artist.name>
      <artist.altname1>JO MEARES</artist.altname1>
      <artist.altid1>5851</artist.altid1>
      <artist.altname2>JO MEARS</artist.altname2>
      <artist.altid2>5852</artist.altid2>
    </artist>
    <artist>
      <artist.id>10642</artist.id>
      <artist.about>bio etc</artist.about>
      <artist.URL>MySpace URL</artist.URL>
      <artist.name>THE MULES</artist.name>
    </artist>
    <artist>
      <artist.id>4937</artist.id>
      <artist.about>bio etc</artist.about>
      <artist.URL>MySpace URL</artist.URL>
      <artist.name>GOLDEN MEAN</artist.name>
      <artist.rel1>18023</artist.rel1>
      <artist.rel1.desc>bass player left Golden Mean and went to this band</artist.rel1.desc>
    </artist>
  </artists>
  <events>
    <event>
      <event.id>27526</event.id>
      <event.date>12022009</event.date>
      <event.venue>493</event.venue>
      <event.artist1>12625</event.artist1>
      <event.artist2>10642</event.artist2>
      <event.artist3>4937</event.artist3>
      <event.title></event.title>
      <event.artwork></event.artwork>
    </event>
  </events>
  <venues>
    <venue>
      <venue.id>493</venue.id>
      <venue.name>Excelsior Hotel</venue.name>
      <venue.www_url>http://www.excelsiorhotel.com.au/</venue.www_url>
      <venue.location_url>http://maps.google.com.au/places/au/surry-hills/foveaux-st/64/-excelsior-hotel</venue.location_url>
      <venue.contact></venue.contact>
      <venue.street_addr></venue.street_addr>
      <venue.postcode></venue.postcode>
    </venue>
  </venues>
</example>

This is the same data transformed into IXML:
example.xml [ixml format]

<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE documents PUBLIC "-//Weborganic//DTD::Flint Index Document 1.0//EN" "http://www.weborganic.org/code/flint/schema/index-documents-1.0.dtd">
<documents>
  <document>
    <field name="id" store="yes" index="un-tokenized">ARTIST-12625</field>
    <field name="type" store="yes" index="un-tokenized">ARTIST</field>
    <field name="name" store="yes" index="un-tokenized">JO MEARES AND THE HONEYRIDERS</field>
    <field name="url" store="yes" index="un-tokenized">MySpace URL</field>
    <field name="about" store="yes" index="tokenized">bio etc</field>
    <field name="fulltext" store="no" index="tokenized">JO MEARES AND THE HONEYRIDERS bio etc</field>
  </document>
  <document>
    <field name="id" store="yes" index="un-tokenized">ARTIST-10642</field>
    <field name="type" store="yes" index="un-tokenized">ARTIST</field>
    <field name="name" store="yes" index="un-tokenized">THE MULES</field>
    <field name="url" store="yes" index="un-tokenized">MySpace URL</field>
    <field name="about" store="yes" index="tokenized">bio etc</field>
    <field name="fulltext" store="no" index="tokenized">THE MULES bio etc</field>
  </document>
  <document>
    <field name="id" store="yes" index="un-tokenized">ARTIST-4937</field>
    <field name="type" store="yes" index="un-tokenized">ARTIST</field>
    <field name="name" store="yes" index="un-tokenized">GOLDEN MEAN</field>
    <field name="url" store="yes" index="un-tokenized">MySpace URL</field>
    <field name="about" store="yes" index="tokenized">bio etc</field>
    <field name="fulltext" store="no" index="tokenized">GOLDEN MEAN bio etc</field>
  </document>
  <document>
    <field name="id" store="yes" index="un-tokenized">EVENT-27526</field>
    <field name="type" store="yes" index="un-tokenized">EVENT</field>
    <field name="name" store="yes" index="un-tokenized"/>
    <field name="date" store="yes" index="un-tokenized" parse-date-as="ddMMyyyy" resolution="day">12022009</field>
    <field name="artist" store="no" index="un-tokenized">12625</field>
    <field name="artist" store="no" index="un-tokenized">10642</field>
    <field name="artist" store="no" index="un-tokenized">4937</field>
    <field name="fulltext" store="no" index="tokenized"/>
  </document>
  <document>
    <field name="id" store="yes" index="un-tokenized">VENUE-493</field>
    <field name="type" store="yes" index="un-tokenized">VENUE</field>
    <field name="name" store="yes" index="un-tokenized">Excelsior Hotel</field>
    <field name="url" store="yes" index="un-tokenized">http://www.excelsiorhotel.com.au/</field>
    <field name="location" store="yes" index="tokenized">http://maps.google.com.au/places/au/surry-hills/foveaux-st/64/-excelsior-hotel</field>
    <field name="fulltext" store="no" index="tokenized"/>
  </document>
</documents>
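The date field is only useful if its parse-date-as pattern matches the layout of the source value. Assuming the pattern uses java.text.SimpleDateFormat syntax (as a pattern like yyyyMMdd suggests) and that resolution="day" reduces the date to a day-level yyyyMMdd string, similar to what Lucene's DateTools produces at day resolution, a strict pre-flight check could look like the sketch below. The class and method names are hypothetical. Read strictly, the value 12022009 does not fit yyyyMMdd (it would give month 20), whereas ddMMyyyy reads it as 12 February 2009; which layout the source data really uses should be confirmed against the data itself.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

/** Hypothetical pre-flight check for IXML date fields. */
public class DateFieldCheck {

    /**
     * Parses value strictly with the given pattern and returns it as a
     * day-resolution string (yyyyMMdd), or null if the value does not fit the pattern.
     */
    public static String toDayString(String value, String pattern) {
        SimpleDateFormat in = new SimpleDateFormat(pattern, Locale.ROOT);
        in.setLenient(false); // reject impossible dates (e.g. month 20) instead of rolling them over
        in.setTimeZone(TimeZone.getTimeZone("UTC"));
        SimpleDateFormat out = new SimpleDateFormat("yyyyMMdd", Locale.ROOT);
        out.setTimeZone(TimeZone.getTimeZone("UTC"));
        try {
            Date date = in.parse(value);
            return out.format(date);
        } catch (ParseException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        System.out.println(toDayString("12022009", "yyyyMMdd")); // prints null
        System.out.println(toDayString("12022009", "ddMMyyyy")); // prints 20090212
    }
}
```

Running such a check over every date value before indexing catches a mismatched pattern early, rather than after the index has been built.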

This transformation can be done with any standard XML processing tool, such as XSLT. The XSLT stylesheet for this example can be found in the example project folder.
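As a minimal sketch of this step, the code below uses the XSLT 1.0 processor built into the JDK (javax.xml.transform) to apply a small stylesheet that turns each artist element into an IXML document. The stylesheet is written only for illustration and is not the one shipped with the example project: it handles a few artist fields and ignores events and venues. For a real run you would load the project's stylesheet from a file with new StreamSource(new File(...)) instead of the inline string.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

/** Sketch: source XML -> IXML with the JDK's built-in XSLT 1.0 processor. */
public class ToIxml {

    // Illustrative stylesheet (not the example project's): artists only, a few fields.
    static final String XSLT =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='xml' indent='yes'/>"
        + "<xsl:template match='/'><documents><xsl:apply-templates select='//artist'/></documents></xsl:template>"
        + "<xsl:template match='artist'><document>"
        + "<field name='id' store='yes' index='un-tokenized'>ARTIST-<xsl:value-of select='artist.id'/></field>"
        + "<field name='type' store='yes' index='un-tokenized'>ARTIST</field>"
        + "<field name='name' store='yes' index='un-tokenized'><xsl:value-of select='artist.name'/></field>"
        // concat(...) with a space keeps the name and bio as separate words in the fulltext field.
        + "<field name='fulltext' store='no' index='tokenized'>"
        + "<xsl:value-of select=\"concat(artist.name, ' ', artist.about)\"/></field>"
        + "</document></xsl:template>"
        + "</xsl:stylesheet>";

    public static String transform(String sourceXml) throws Exception {
        Transformer transformer = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(XSLT)));
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(sourceXml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String source = "<example><artists><artist>"
            + "<artist.id>10642</artist.id><artist.about>bio etc</artist.about>"
            + "<artist.name>THE MULES</artist.name>"
            + "</artist></artists></example>";
        System.out.println(transform(source));
    }
}
```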

Generate the index file

To generate the Lucene index from these IXML files, Flint provides a Java class, org.weborganic.flint.Indexer, which can be run from the command line:

java org.weborganic.flint.Indexer -index [Lucene index file output folder] -file [ixml files used to generate the index]

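Note: You may need to specify the classpath (the Log4j, Lucene core and Flint JARs) to run it from the command line. For example, on Windows (Unix-like systems use : instead of ; as the classpath separator):

java -cp log4j-[version].jar;lucene-core-[version].jar;flint-[version].jar org.weborganic.flint.Indexer -index [Lucene index file output folder] -file [ixml files used to generate the index]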
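The classpath separator is platform-specific (; on Windows, : elsewhere), which is easy to get wrong when copying a command line between systems. The sketch below launches the Indexer in a child JVM from Java and builds the classpath with File.pathSeparator. The class name RunIndexer, the JAR paths and the folder and file names are placeholders to replace with your actual versions and locations; only the -index and -file options documented above are passed.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical launcher: runs org.weborganic.flint.Indexer with a portable classpath. */
public class RunIndexer {

    /** Builds the java command line; jar names and paths are placeholders. */
    public static List<String> command(List<String> jars, String indexDir, String ixmlFile) {
        List<String> cmd = new ArrayList<>();
        cmd.add("java");
        cmd.add("-cp");
        cmd.add(String.join(File.pathSeparator, jars)); // ";" on Windows, ":" on Unix-like systems
        cmd.add("org.weborganic.flint.Indexer");
        cmd.add("-index");
        cmd.add(indexDir);
        cmd.add("-file");
        cmd.add(ixmlFile);
        return cmd;
    }

    public static void main(String[] args) throws Exception {
        List<String> cmd = command(
            List.of("lib/log4j.jar", "lib/lucene-core.jar", "lib/flint.jar"), // placeholder jar paths
            "index", "ixml/example.xml");                                     // placeholder folder and file
        System.out.println(String.join(" ", cmd));
        if (args.length > 0 && args[0].equals("--run")) {
            // Only launch when asked, so the sketch can be tried without the jars present.
            Process p = new ProcessBuilder(cmd).inheritIO().start();
            System.exit(p.waitFor());
        }
    }
}
```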

Try It

An Ant script is provided to make this process simpler to manage. The script converts the source XML into IXML format and then calls the Indexer class to generate the Lucene index from the IXML input.

To run the process, run the default Ant target, named "index". The index files will be generated in the index folder, which can be specified with the -index command-line argument (see above for details).

Comment by project member ciber....@gmail.com, Mar 14, 2010

There are a few things I struggled with the first time I tried to use wo-flint. Hopefully the following will help people who have difficulty using wo-flint.

1. To generate the Lucene index file, remember to specify the classpath (log4j, lucene-core, flint).
2. When you use flint-0.5.0, it is better to use Lucene core 2.3.0 (flint-0.5 doesn't support Lucene 3).
3. Use Luke to check whether you have generated the correct index.
4. It is easier to start with Flint-Example1.zip.

