Basic Concepts

How Marjory works

Marjory is a webservice for indexing and searching documents. As such, it can be used to easily add full-text search to your own applications, pretty much regardless of the language your application is written in.

Marjory does not use any fancy XML-RPC or SOAP protocols that add needless overhead and tend to overcomplicate even the simplest of things. Instead, a lean ReST interface is used to communicate with the outside world. If you have ever worked with XML documents and made an HTTP request in your language of choice, chances are good you'll be able to understand and use Marjory pretty quickly :-)

As Marjory is written in PHP and based on the Zend Framework, it uses Zend's PHP implementation of Lucene as the default search engine. Marjory is designed to work with other search engines as well, but as it's a very young project, no other engines have been implemented yet.

How to search documents

To search documents stored in a Marjory catalog, just point your browser to the URL of the ReST webservice. If you have installed Marjory at marjory.example.com, for example, and want to search for the term "Marjory", this is the URL you would type:

http://marjory.example.com/rest/select?q=Marjory

In your browser, you will see that this URL returns an XML document containing the search result. Easy, right?

Now, if you don't want to view XML documents in your browser, but want to use them within your application, it gets a little more complicated. But just a little. Here's what using the search service looks like in PHP:

$xml = simplexml_load_file('http://marjory.example.com/rest/select?q=Marjory');
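// Each <doc> element in the response is one matching document.
foreach ($xml->xpath('//doc') as $document) {
    printf("\nFound document: %s\n", (string) $document['uri']);
    // Print the name and value of every <str> field of the document.
    foreach ($document->str as $field) {
        printf("Field %s contains value: %s\n", (string) $field['name'], (string) $field);
    }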
}

This prints every <str> field of each document returned by the search above.

How to add documents

Adding documents is a bit more complex, although the basics remain the same: you make an HTTP request to the proper webservice, this time a POST request, and send an XML snippet containing information about the document you wish to add.

The only important thing is that every document must have a unique resource identifier (URI) that can be used to store and retrieve it. This can be a URL for web documents, a file path if the documents are located within a filesystem, a database ID or a hash of some kind. Marjory doesn't really care what it is as long as it uniquely identifies a document.

The easiest way to index a document is to provide a link to it. Just send a POST request to the ReST controller's add action (e.g. http://marjory.example.com/rest/add/) and have it contain this little XML snippet:

<add catalog="default">
    <doc src="http://my.website.tld/my/document.html" />
</add>

Marjory will then fetch the document from the given address and use that address as the document URI. By default, the Zend_Search_Lucene document parser for HTML documents is invoked to extract the document title and body. If you have special parsing needs, this can be overridden by a custom class.

Non-HTML documents are not yet supported by Marjory, but if you need to index those as well, there's another way. All you need to do is obtain the document content in plain text, define a few fields for splitting up its content (e.g. title, abstract, content) and pass it to Marjory as an XML document just as above, but using this format:

<add catalog="default">
    <doc uri="MyUniqueDocumentId">
        <field name="title">Marjory: Search as a service</field>
        <field name="abstract">An epic novel about full-text indexing in an SOA environment</field>
        <field name="content">Lorem ipsum dolor sit amet... (to be continued)</field>
    </doc>
</add>

You can choose the number of fields and their names freely. Keep in mind, though, that the field names "title" and "content" are assumed by default, so if you use different fields, you'll have a bit more work. More work is of course bad, so better stick to the standards :-)

When you have to index a lot of documents at once, you can cut down the number of requests by providing multiple <doc></doc> blocks within one request.
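
To make the batching idea concrete, here is a minimal sketch in PHP that builds one <add> request containing several <doc> blocks and could send it with a POST request. The helper names buildAddRequest and postAddRequest, the sample document IDs and the Content-Type header are assumptions made for illustration. They are not part of Marjory, and how the service expects the request body to be encoded should be checked against your installation.

```php
<?php
// Build the <add> request body for several documents at once.
// $documents maps each document URI to an array of field name => plain-text value.
// Hypothetical helper: the name and the array layout are not part of Marjory.
function buildAddRequest(array $documents, string $catalog = 'default'): string
{
    $dom = new DOMDocument('1.0', 'UTF-8');
    $add = $dom->createElement('add');
    $add->setAttribute('catalog', $catalog);
    $dom->appendChild($add);

    foreach ($documents as $uri => $fields) {
        $doc = $dom->createElement('doc');
        $doc->setAttribute('uri', (string) $uri);
        foreach ($fields as $name => $value) {
            $field = $dom->createElement('field');
            $field->setAttribute('name', (string) $name);
            // A text node escapes &, < and > in the field value.
            $field->appendChild($dom->createTextNode((string) $value));
            $doc->appendChild($field);
        }
        $add->appendChild($doc);
    }

    // Serialise only the root element, without the XML declaration.
    return $dom->saveXML($add);
}

// Send the request body to the add action with an HTTP POST.
// Assumption: the service reads the XML from the raw request body.
function postAddRequest(string $url, string $xml)
{
    $context = stream_context_create([
        'http' => [
            'method'  => 'POST',
            'header'  => "Content-Type: text/xml; charset=UTF-8\r\n",
            'content' => $xml,
        ],
    ]);
    // Returns the response body, or false on failure.
    return file_get_contents($url, false, $context);
}

// Two sample documents with made-up IDs, sent in a single request.
$xml = buildAddRequest([
    'doc-1' => ['title' => 'First document', 'content' => 'Tom & Jerry'],
    'doc-2' => ['title' => 'Second document', 'content' => 'Lorem ipsum'],
]);
// postAddRequest('http://marjory.example.com/rest/add/', $xml);
```

Building the XML with DOMDocument rather than string concatenation keeps special characters in field values from breaking the request. The POST call is left commented out so the sketch runs without a Marjory installation.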