GrassyKnoll: a Search Engine in Python
Updated Mar 17, 2008 by aaron.s.lav
Labels: Featured
Collection  
description of the data model

A Collection is the central concept in Grassyknoll. It saves Documents and retrieves Results. The collection is the common interface enabling communication between the frontend and the backend. A collection is indexed by Id. Documents and results are composed of Fields.

A set of abstract base classes lives in grassyknoll.core.Collection.

Fields

Fields are simply name-value pairs. Names are always strings. Values are basic scalar types, including strings, integers, floats, booleans, dates, times and None. A few fields:

{
  "title": "Japan", 
  "last_modified": "2007-04-02T01:00:00", 
  "link_count": 725,
  "__id__": "pants"
}
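
The same set of fields can be sketched as a plain Python dict. This is only an illustration: the helper validate_fields and the SCALAR_TYPES tuple are hypothetical names, not part of Grassyknoll, and the mapping of "dates" and "times" to the datetime module is an assumption. It is written for Python 3, where str is the unicode type.

```python
import datetime

# Assumed mapping of the scalar types listed above to Python types.
SCALAR_TYPES = (str, int, float, bool, datetime.date, datetime.time,
                datetime.datetime, type(None))


def validate_fields(fields):
    """Hypothetical helper: check that names are strings and values are scalars."""
    for name, value in fields.items():
        if not isinstance(name, str):
            raise TypeError("field name must be a string: %r" % (name,))
        if not isinstance(value, SCALAR_TYPES):
            raise TypeError("field %r has a non-scalar value: %r" % (name, value))
    return fields


fields = validate_fields({
    "title": "Japan",
    "last_modified": datetime.datetime(2007, 4, 2, 1, 0, 0),
    "link_count": 725,
    "__id__": "pants",
})
```

A list value such as {"tags": ["a", "b"]} would be rejected by this sketch, since multivalue fields are not described here.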

XXX describe multivalue fields?

CollectionIds

Source: CollectionIds

An id uniquely identifies a document or result in the collection. It is a field named __id__. The value is always unicode. Note that uniqueness of ids is the only integrity constraint enforced by Grassyknoll.

A CollectionIds object is a list of ids, plus some metadata.

CollectionDocument

Source: CollectionDocument

A document is an item that can be inserted into a collection. Data is stored in the document as a set of fields. It is identified by an id. The document's norman describes how it should be interpreted by the collection.

CollectionResult

Source: CollectionResult

A result is an item that can be retrieved from a collection. A result contains a set of fields, including an __id__ that uniquely identifies it.

Note that the set of fields in a result may differ from those supplied in its corresponding document. Backends may not store all of the fields in a document in a retrievable form and can provide additional fields that were not present in the original document.
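
A minimal sketch of how a result can diverge from its document, assuming a backend that stores only some fields and adds one of its own. The function to_result, the stored_fields argument and the "score" field are all hypothetical and chosen only for illustration.

```python
def to_result(document, stored_fields, extra_fields):
    """Hypothetical helper: build a result dict from a document dict.

    Keeps __id__ plus whichever fields the backend stores retrievably,
    then adds fields the backend supplies itself.
    """
    result = {name: value for name, value in document.items()
              if name == "__id__" or name in stored_fields}
    result.update(extra_fields)
    return result


document = {"__id__": "pants", "title": "Japan", "body": "long text"}
result = to_result(document, stored_fields={"title"},
                   extra_fields={"score": 0.5})
# "body" was indexed but not stored, so it is absent from the result;
# "score" was never part of the document.
```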

CollectionResultSet

Source: CollectionResultSet

A list of Results, plus some metadata.
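
One way to picture "a list plus some metadata" is a list subclass carrying a metadata dict. This is a sketch under that assumption, not the actual CollectionResultSet class; the class name ResultSet and the "total" metadata key are made up for the example.

```python
class ResultSet(list):
    """Hypothetical: a list of result dicts with a metadata dict attached."""

    def __init__(self, results=(), metadata=None):
        super().__init__(results)
        # Copy so callers cannot mutate the metadata through their own dict.
        self.metadata = dict(metadata or {})


rs = ResultSet([{"__id__": "pants", "title": "Japan"}],
               metadata={"total": 1})
```

Because it is still a list, rs supports len(), indexing and iteration as usual.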

Collection

Source: Collection

A collection is a persistent data store that saves documents and retrieves results. It provides a few basic operations. Note that some backends may not implement all of these operations. While create and delete take lists of documents/ids, this is purely for optimization purposes; there is no implication of transactions.

list

List the ids of all documents present in the collection.

__len__

Return the number of items in the collection.

retrieve([ids], [fields])

Retrieve the results identified by the list of ids. Only those fields named will be returned (defaulting to all). It is not an error for an id to be absent.

delete([ids])

Delete the documents identified by the list of ids from the collection. Silently succeeds even if no document exists for a given id.

create([documents])

Adds new documents to the collection. Returns the ids of the documents, generating an id for any document supplied without one. Note that this may overwrite an existing document if the document already contains an id.
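
The operations above can be sketched with a dict-backed toy collection. This is an assumption-laden illustration, not Grassyknoll's implementation: the class name InMemoryCollection is hypothetical, ids are generated with uuid as one possible choice, nothing is persisted, and retrieve always includes __id__ in each result even when fields is given.

```python
import uuid


class InMemoryCollection:
    """Hypothetical dict-backed collection; not a real Grassyknoll backend."""

    def __init__(self):
        self._docs = {}  # maps __id__ -> dict of fields

    def list(self):
        """Return the ids of all documents in the collection."""
        return [id_ for id_ in self._docs]

    def __len__(self):
        return len(self._docs)

    def retrieve(self, ids, fields=None):
        """Return results for the given ids; absent ids are skipped."""
        results = []
        for id_ in ids:
            doc = self._docs.get(id_)
            if doc is None:
                continue  # not an error for an id to be absent
            if fields is None:
                results.append(dict(doc))
            else:
                wanted = set(fields) | {"__id__"}
                results.append({k: v for k, v in doc.items() if k in wanted})
        return results

    def delete(self, ids):
        """Remove documents; silently ignore unknown ids."""
        for id_ in ids:
            self._docs.pop(id_, None)

    def create(self, documents):
        """Store documents and return their ids, generating missing ones."""
        ids = []
        for doc in documents:
            doc = dict(doc)
            id_ = doc.setdefault("__id__", uuid.uuid4().hex)
            self._docs[id_] = doc  # overwrites any document with the same id
            ids.append(id_)
        return ids


coll = InMemoryCollection()
ids = coll.create([{"__id__": "pants", "title": "Japan"}, {"title": "Korea"}])
coll.create([{"__id__": "pants", "title": "Nippon"}])  # overwrites "pants"
coll.delete(["no-such-id"])                            # silently succeeds
```

Each document in a create or delete call is handled on its own, so a failure partway through would leave earlier items applied, in line with the note that list arguments imply no transaction.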

<foo>Query([fields], **kwargs)

Queries are backend-specific means of searching the collection. Queries are named like <foo>Query, and each query is free to define its own arguments. All queries return a resultset. Only those fields named will be returned (defaulting to all fields). If the collection/query does not provide an ordering, results will be returned in random order.
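
As an illustration of the <foo>Query naming pattern, here is a hypothetical TermQuery that matches documents whose fields equal every keyword argument. No such query is named in this page; in a real backend it would be a method on the collection, whereas this standalone sketch takes the documents as an explicit first argument and returns a plain list instead of a resultset.

```python
def TermQuery(documents, fields=None, **kwargs):
    """Hypothetical query: exact match on every keyword argument.

    documents is an iterable of field dicts; fields limits which fields
    are returned (all by default). __id__ is always kept (an assumption).
    """
    results = []
    for doc in documents:
        if all(doc.get(name) == value for name, value in kwargs.items()):
            if fields is None:
                results.append(dict(doc))
            else:
                wanted = set(fields) | {"__id__"}
                results.append({k: v for k, v in doc.items() if k in wanted})
    return results


docs = [
    {"__id__": "a", "title": "Japan", "link_count": 725},
    {"__id__": "b", "title": "Korea", "link_count": 3},
]
hits = TermQuery(docs, fields=["title"], link_count=725)
```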

Metadata

The following metadata is provided. Each item is prefixed with the name of the backend class.

Individual backends may provide additional metadata.

