DocCatalogs  
Updated Feb 4, 2010 by mwolf...@gmail.com

Creating catalogs

What's a catalog?

A catalog is basically a single Lucene search index. You can have multiple separate catalogs containing different groups of documents. When searching for documents, you can specify which catalog to search.

Whenever you don't specify a catalog name, the name "default" is assumed.

IMPORTANT: Catalogs are self-contained. There is no way to search for a phrase across multiple catalogs at once. If you want to do that, you have to obtain a list of catalogs and search each catalog for the same phrase separately.
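
If you really do need the same phrase searched in several catalogs, the loop could look roughly like this Python sketch. It assumes you already have the list of catalog names, and it leaves the actual request to a function you pass in. `search_one` is a hypothetical placeholder and not part of Marjory, since the response format is not covered on this page.

```python
from typing import Callable, Dict, Iterable, TypeVar

T = TypeVar("T")


def search_all_catalogs(
    catalogs: Iterable[str],
    phrase: str,
    search_one: Callable[[str, str], T],
) -> Dict[str, T]:
    """Run the same search once per catalog and keep the results apart."""
    results: Dict[str, T] = {}
    for catalog in catalogs:
        # One request per catalog: a single query cannot span catalogs.
        results[catalog] = search_one(phrase, catalog)
    return results
```

The results stay grouped by catalog name, so merging or ranking them across catalogs is left to the caller.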

What's it good for, then?

Having multiple catalogs is a good thing(tm) if...

  • you're running a shared hosting environment and want to provide full-text search for all those millions of websites you have
  • you have documents in many languages and only need to search over one language at a time. Splitting the languages up into different catalogs will speed up both searching and indexing, as the index files for each language become smaller.
  • add your own creative use case here...

How are catalogs organized?

Within the Marjory directory you will find a subdirectory called data. Within the data directory, there are... right, even more directories! In here, each directory represents a catalog and also has the same name as the catalog. Each of the catalog directories contains a separate Lucene search index.
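
Because every catalog is just a directory below data, a list of catalogs can be obtained by listing that directory. The following Python sketch assumes it runs on the same machine as Marjory with read access to the data directory; the function name `list_catalogs` is made up for this example.

```python
from pathlib import Path
from typing import List


def list_catalogs(data_dir: str) -> List[str]:
    """Return the names of all catalog directories below data_dir."""
    root = Path(data_dir)
    # Every subdirectory counts as a catalog; plain files are ignored.
    return sorted(entry.name for entry in root.iterdir() if entry.is_dir())
```

Note that this only looks at directory names. It does not check whether each directory actually contains a valid Lucene index.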

Searching a specific catalog

Remember, if you don't specify a catalog, the catalog name "default" will be assumed, so it's best to create this catalog when you first install Marjory. Thus, if you search for the term "Marjory" like this:

http://marjory.example.com/rest/select?q=Marjory

...only the documents in the catalog "default" will be searched for this phrase.

If you want to search a different catalog, you have to specify it explicitly with the "catalog" request parameter:

http://marjory.example.com/rest/select?q=Marjory&catalog=MyGloriousCatalog

This will search for the phrase "Marjory" within the... oh well, you get the picture :-)
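
If you build these URLs from code, a small helper keeps the "default" behaviour in one place. This is a minimal Python sketch: the host is the example host used on this page, and `build_select_url` is a hypothetical name. Only the `q` and `catalog` parameters shown above are used.

```python
from typing import Optional
from urllib.parse import urlencode

# Example host from this page; replace it with your own installation.
BASE_URL = "http://marjory.example.com/rest/select"


def build_select_url(query: str, catalog: Optional[str] = None) -> str:
    """Build a search URL; leaving out catalog means "default" is assumed."""
    params = {"q": query}
    if catalog is not None:
        params["catalog"] = catalog
    # urlencode escapes spaces and special characters in the values.
    return BASE_URL + "?" + urlencode(params)
```

Calling `build_select_url("Marjory")` gives the first URL above, and `build_select_url("Marjory", "MyGloriousCatalog")` gives the second.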

Adding, updating and deleting documents in a specific catalog

Specifying the catalog to be affected by add, update and delete operations is just as easy: Just provide a "catalog" attribute within the root tag of the XML snippet you send to Marjory via POST.

<!-- For adding: -->
<add catalog="MyGloriousCatalog"> ... </add>

<!-- For updating: -->
<update catalog="MyGloriousCatalog"> ... </update>

<!-- For deleting: -->
<delete catalog="MyGloriousCatalog"> ... </delete>
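
If you generate these snippets from code, the root tag can be built with Python's standard XML library, as in the sketch below. It only creates the root element with its "catalog" attribute. The contents of the tag (the "..." above) are not described on this page, so they are left for you to append. The helper name `build_request_root` is an assumption for this example.

```python
import xml.etree.ElementTree as ET

OPERATIONS = ("add", "update", "delete")


def build_request_root(operation: str, catalog: str) -> ET.Element:
    """Create the root tag for an add, update or delete request."""
    if operation not in OPERATIONS:
        raise ValueError(f"unknown operation: {operation!r}")
    # The catalog attribute selects the catalog the operation affects.
    return ET.Element(operation, {"catalog": catalog})


root = build_request_root("add", "MyGloriousCatalog")
# Append the elements for your documents to `root` here.
payload = ET.tostring(root, encoding="unicode")
```

Using an XML library instead of string concatenation means the catalog name is escaped properly if it ever contains quotes or ampersands.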

How to add a new catalog

...via the command line

Within the Marjory installation directory, do:

cd scripts
php createIndex.php MyGloriousCatalog

IMPORTANT: Whenever you create a catalog via the command line, make sure the webserver has full read and write access to it!

...via the webservice

Send the following XML snippet to the REST controller's catalog action (e.g. http://marjory.example.com/rest/catalog/) via POST:

<add catalog="MyGloriousCatalog" />
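
From a script, that POST could be prepared like this with Python's standard library. Treat it as a sketch under assumptions: the URL is the example URL from above, the `text/xml` content type is a guess because this page does not say which one Marjory expects, and `build_create_catalog_request` is a made-up name. The function only builds the request and does not send it.

```python
import urllib.request
from xml.sax.saxutils import quoteattr

# Example URL from this page; replace it with your own installation.
CATALOG_URL = "http://marjory.example.com/rest/catalog/"


def build_create_catalog_request(
    name: str, url: str = CATALOG_URL
) -> urllib.request.Request:
    """Prepare (but do not send) the POST that creates a catalog."""
    # quoteattr adds the surrounding quotes and escapes the value.
    body = f"<add catalog={quoteattr(name)} />".encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        # Assumed content type; adjust it if your installation differs.
        headers={"Content-Type": "text/xml"},
    )
```

To actually send it, pass the returned object to `urllib.request.urlopen()`.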

How to remove a catalog

Uh-oh, this one's really hard. There's currently no webservice action defined to do that, so you'll have to go to the command line, change to the data directory and type a complex command sequence:

rm -Rf MyGloriousCatalog
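
If you would rather do this from a script than type rm by hand, the Python sketch below does the same thing with one extra guard: it refuses to delete anything that is not a direct subdirectory of the data directory. The function name `remove_catalog` and the way the data directory is passed in are assumptions for this example.

```python
import shutil
from pathlib import Path


def remove_catalog(data_dir: str, name: str) -> None:
    """Delete one catalog directory, including its Lucene index."""
    root = Path(data_dir).resolve()
    target = (root / name).resolve()
    # Guard against names such as "", "." or "../elsewhere".
    if target.parent != root or not target.is_dir():
        raise ValueError(f"not a catalog directory: {name!r}")
    shutil.rmtree(target)
```

Just like rm, this is permanent, so double-check the catalog name before you run it.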

Disclaimer

At the time of writing, only the Zend_Search_Lucene engine adaptor has been finished. Future engines may or may not handle catalogs slightly differently. Only time will tell. If you know that you will only ever be using Lucene as your search engine, this information can be regarded as mildly accurate :-)
