Introduction
This tutorial is intended to give an overview of how to work with Grassyknoll. To follow along at home, you should already have Grassyknoll installed. You'll want to grab the source tarball as well, as the samples are not installed by default. See issue 74.
The tutorial runs a webserver, the RestFrontend, which provides an interface to a Collection. Accessing the server with your standard web browser provides a basic HTML interface to a subset of the Collection API. Sorry it's so ugly; the last time I made a web page was in 1997. Seriously. The full Collection API is available when talking to the server in a non-HTML format.
Working Environment
If the executables are not on your PATH, you'll need to adjust the commands accordingly.
You'll see lots of examples like:
pfein@brick:~/grassyknoll$ ls -l samples/
total 3944
drwxr-xr-x 5 pfein pfein    8192 2008-03-09 20:04 demo
-rwxr-xr-x 1 pfein pfein     615 2008-03-09 19:00 load_nsf.py
-rw-r--r-- 1 pfein pfein 2170880 2008-03-07 22:25 nsf_ra.tar.gz
-rw-r--r-- 1 pfein pfein 1842253 2008-02-27 19:13 shakespeare.tar.gz
pfein@brick:~/grassyknoll$ is your shell prompt. The command to type follows the prompt; everything after that is the command's output. If you've unpacked the tarball somewhere other than ~/grassyknoll, you'll want to change to that directory. Don't worry if the contents of your tarball don't exactly match the above.
Sample Data
We'll be using the 2003 National Science Foundation award abstracts, which cover basic research in a wide variety of academic disciplines. See data definitions.
Available Demos
Configuration files are provided for several backends.
| File | BackEnds | Notes |
| lucene_config.py | LuceneBackend | Full-text search queries |
| sqlite_config.py | SqliteBackend | Query by choosing among fixed values |
| dict_config.py | DictionaryBackend | No queries. |
| remote_config.py | ClientBackend | Forwards to another (running) server. No queries. Listens on port 8081. Don't start with this! |
This tutorial is written using the LuceneBackend. If you use one of the other backends, some of the output will be slightly different. However, the tutorial is identical for all backends (except for queries, where backend-specific details will be provided).
Running the Server
Fire up the server in a separate terminal:
pfein@brick:~/grassyknoll$ grassyknoll_d ~/grassyknoll/samples/demo/lucene_config.py
WARNING:SmartStorage:Created /home/pfein/grassyknoll/samples/demo/lucene_demo
This will create an empty Lucene index in samples/demo/lucene_demo/. The server takes a single argument, the name of a config file.
The server will print out the requests that come in:
localhost - - [10/Mar/2008 01:39:35] "GET / HTTP/1.1" 200 634
localhost - - [10/Mar/2008 01:39:57] "GET /pants HTTP/1.1" 404 238
You can watch this output to get a sense of what's happening behind the scenes. See RestUrls for more information on the URLs.
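If you'd rather filter the request log than eyeball it, the lines can be picked apart with a regular expression. This is only a sketch based on the two sample lines above; the pattern and the parse_log_line name are assumptions for illustration, not part of Grassyknoll.

```python
import re

# Pattern inferred from the sample log lines in this tutorial (an assumption,
# not a documented format): host, timestamp, request line, status, size.
LOG_LINE = re.compile(
    r'(?P<host>\S+) - - \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<version>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)


def parse_log_line(line):
    """Return a dict describing one request line, or None if it doesn't match."""
    match = LOG_LINE.match(line.strip())
    if match is None:
        return None
    entry = match.groupdict()
    entry["status"] = int(entry["status"])
    # A "-" size means the server did not report a body length.
    entry["size"] = None if entry["size"] == "-" else int(entry["size"])
    return entry
```

For example, feeding it the second sample line gives a path of /pants and a status of 404, which makes it easy to spot failed requests.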
Did it work?
Let's see what's in the server. Point your browser to http://localhost:8080/
Stopping the server
To stop the server, just hit ^C. You can shut down the server at any time and restart it by re-running the above command.
Advanced: Using curl
curl is a command line HTTP client. It's useful for debugging and getting a lower-level view of a webserver than is possible with a web browser.
pfein@brick:~/grassyknoll$ curl -H "Accept: application/json" http://localhost:8080/
{"ids": [],
"metadata": {"LuceneCollection_thread": "MainThread",
"LuceneCollection_pid": 20432,
"LuceneCollection_host": "brick",
"LuceneCollection_time": 0.0011680126190185547}}
By adding the Accept: application/json header, we get results in JSON format.
The curl output has been edited for readability on this wiki.
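The same request can be made from a script. Here's a minimal sketch using only the Python 3 standard library; it assumes the server is at the tutorial's default address, and json_request and fetch_json are names made up for this example, not a Grassyknoll client API.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8080/"  # the tutorial's default server address


def json_request(url):
    """Build a GET request that asks the server for JSON instead of HTML."""
    return urllib.request.Request(url, headers={"Accept": "application/json"})


def fetch_json(url):
    """Send the request and decode the JSON body (needs a running server)."""
    with urllib.request.urlopen(json_request(url)) as response:
        return json.loads(response.read().decode("utf-8"))


# With the demo server running, something like this should list the ids:
#     listing = fetch_json(BASE_URL)
#     print(listing["ids"])
```

Nothing is sent until fetch_json is called, so the request can be inspected first if you want to check the headers.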
Loading Data
Demos aren't very interesting without data. Load some:
pfein@brick:~/grassyknoll$ samples/load_nsf.py
Listing Documents
Going to the server root will give you a list of available document ids.
pfein@brick:~/grassyknoll$ curl -H "Accept: application/json" http://localhost:8080/
{"ids": ["a0300005", "a0300025", "a0300044", "a0300051", "a0300064", "a0300071",
<...>
"a0331381", "a0331387", "a0331497"],
"metadata": {"LuceneCollection_thread": "MainThread",
"LuceneCollection_pid": 20432,
"LuceneCollection_host": "brick",
"LuceneCollection_time": 0.064599037170410156}}
Retrieve a Result
Clicking on an id will retrieve the corresponding result.
http://localhost:8080/a0300005
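Result URLs are just the server root plus a document id, and the curl example in this section also passes a comma-separated fields parameter to limit what comes back. A small sketch of building such URLs follows; result_url is a hypothetical helper, and the assumption that fields takes a plain comma-separated list comes only from that example.

```python
from urllib.parse import urlencode, urljoin


def result_url(base_url, doc_id, fields=None):
    """Build the URL for one document, optionally restricted to some fields."""
    url = urljoin(base_url, doc_id)
    if fields:
        # safe="," keeps the commas readable instead of encoding them as %2C.
        url += "?" + urlencode({"fields": ",".join(fields)}, safe=",")
    return url


print(result_url("http://localhost:8080/", "a0300005"))
print(result_url("http://localhost:8080/", "a0300005",
                 fields=["Title", "Date", "Award_Instr"]))
```

Leaving fields out asks for the whole document, which is what the browser link does.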
pfein@brick:~/grassyknoll$ curl -H "Accept: application/json" http://localhost:8080/a0300005?fields=Title,Date,Award_Instr
{"__id__": "a0300005",
"__url__": "\/a0300005",
"Award_Instr": "Standard Grant",
"Date": "2003-03-26",
"Title": "A Model Analysis of Newly Released Galileo Electron Density Data"}
Deleting Documents
Hitting the Delete button on a result page will delete that document.
http://localhost:8080/a0300005?method=DELETE
Verify that the document is gone by clicking on its id. You should get a 404 Not Found page.
http://localhost:8080/a0300005
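Outside the browser, deleting means sending a real HTTP DELETE request, and checking for the 404 afterwards is easy to script. The following is a sketch with the Python 3 standard library; delete_request and is_gone are illustrative names, not part of Grassyknoll.

```python
import urllib.error
import urllib.request


def delete_request(url):
    """Build an HTTP DELETE request for a document URL, asking for JSON back."""
    return urllib.request.Request(
        url, method="DELETE", headers={"Accept": "application/json"}
    )


def is_gone(url):
    """Return True if a GET for url answers 404 Not Found (needs a running server)."""
    try:
        with urllib.request.urlopen(url):
            return False
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return True
        raise  # any other error status is unexpected, so let it surface


# With the demo server running:
#     urllib.request.urlopen(delete_request("http://localhost:8080/a0300005"))
#     print(is_gone("http://localhost:8080/a0300005"))
```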
pfein@brick:~/grassyknoll$ curl -H "Accept: application/json" -X DELETE http://localhost:8080/a0300005
{"ids": ["a0300005"],
"metadata": {"LuceneCollection_thread": "MainThread",
"LuceneCollection_pid": 20432,
"LuceneCollection_host": "brick",
"LuceneCollection_time": 0.00030088424682617188}}
Queries
Different backends support different methods of querying the Collection.
LuceneBackend
Go back to the server homepage and enter some search terms. The server supports the Lucene query syntax.
For example, we'll search for abstracts about "biochemistry".
http://localhost:8080/__query__/search?q=biochemistry
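Search terms with spaces or quotes need URL encoding before they go into the q parameter. Here's a hedged sketch of building the search URL and pulling ids and scores out of a JSON response shaped like the one in this section; search_url and summarize are names invented for the example.

```python
import json
from urllib.parse import urlencode


def search_url(base_url, query):
    """Build a full-text search URL, encoding the query string safely."""
    return base_url.rstrip("/") + "/__query__/search?" + urlencode({"q": query})


def summarize(response_text):
    """Return (id, score) pairs from a search response, best score first."""
    data = json.loads(response_text)
    hits = sorted(
        data["results"],
        key=lambda hit: hit.get("__score__", 0.0),
        reverse=True,
    )
    return [(hit["__id__"], hit.get("__score__")) for hit in hits]


print(search_url("http://localhost:8080/", "biochemistry"))
print(search_url("http://localhost:8080/", '"gene expression"'))
```

The second call shows a quoted phrase, which is ordinary Lucene query syntax; the quotes and the space are encoded for you.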
pfein@brick:~/grassyknoll$ curl -H "Accept: application/json" http://localhost:8080/__query__/search?q=biochemistry
{"results": [{"__id__": "a0307212",
"__url__": "\/a0307212",
"__score__": 0.380184739828,
"Investigator": "Himadri B. Pakrasi pakrasi@biology2.wustl.edu (Principal Investigator current)\nBijoy K. Ghosh (Co-Principal Investigator current)\nRalph S. Quatrano (Co-Principal Investigator current)",
"Total_Amt": 50000,
<...>}],
"metadata": {"count": 3,
"LuceneCollection_thread": "MainThread",
"LuceneCollection_pid": 20432,
"LuceneCollection_host": "brick",
"LuceneCollection_time": 0.01209712028503418}}
SqliteBackend
Go back to the server homepage and choose an item from the menu.
For example, we'll query for "Cooperative Agreements".
http://localhost:8080/__query__/and?Award_Instr=Cooperative+Agreement
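The menu in the browser just fills in field=value pairs on the and query URL. A sketch of doing the same from Python follows. Only the single-field form appears in this tutorial, so passing several fields at once is an assumption suggested by the URL's name, and and_query_url is a made-up helper.

```python
from urllib.parse import urlencode


def and_query_url(base_url, **criteria):
    """Build a fixed-value query URL from field=value pairs."""
    # Sorting keeps the generated URL stable from run to run.
    pairs = sorted(criteria.items())
    return base_url.rstrip("/") + "/__query__/and?" + urlencode(pairs)


print(and_query_url("http://localhost:8080/", Award_Instr="Cooperative Agreement"))
```

Note that the space in the value becomes a +, matching the URL shown above.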
pfein@brick:~/grassyknoll$ curl -H "Accept: application/json" http://localhost:8080/__query__/and?Award_Instr=Cooperative+Agreement
{"results": [{"__id__": "a0310163",
"__url__": "\/a0307300",
"Expires": "2003-09-30", "Investigator": "Jeffrey T. Kiehl (Principal Investigator current)",
"Total_Amt": 25000, "Award_Instr": "Cooperative Agreement",
<...>}],
"metadata": {"count": 2,
"SqliteCollectionReader_pid": 21778,
"SqliteCollectionReader_time": 0.00316619873046875,
"SqliteCollectionReader_host": "brick",
"SqliteCollectionReader_thread": "MainThread"}}
Other URLs
There are several other URLs that the server understands, but you'll need to type them in by hand.
Distributed Server
The remote_config.py uses a ClientBackend to provide access to another GrassyKnoll server. It's a proof of concept of distributed computing in GrassyKnoll.
While leaving your first server running, open another terminal and run:
pfein@brick:~/grassyknoll$ grassyknoll_d ~/grassyknoll/samples/demo/remote_config.py
This server listens on http://localhost:8081/ (note the different port). It forwards all requests to the original server. You can tell you're talking to the proxying server by the extra metadata.
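Judging by the curl output in this tutorial, each metadata key is prefixed with the name of the collection that produced it, so a proxied response carries both ClientCollection_ and LuceneCollection_ entries. A small sketch for grouping the keys and spotting the proxy; the prefix convention is inferred from the samples rather than documented here, and both function names are hypothetical.

```python
def group_metadata(metadata):
    """Group metadata values by the collection-name prefix of each key."""
    groups = {}
    for key, value in metadata.items():
        prefix, sep, field = key.rpartition("_")
        if not sep:
            # Keys without an underscore (such as "count") get an empty prefix.
            prefix, field = "", key
        groups.setdefault(prefix, {})[field] = value
    return groups


def is_proxied(metadata):
    """Guess whether a response passed through a ClientBackend server."""
    return any(key.startswith("ClientCollection_") for key in metadata)
```

Calling group_metadata on the metadata of a response from port 8081 should give one group per server involved, each with its own pid, host, thread and time.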
By watching the server output, you can see the requests being forwarded:
pfein@brick:~/grassyknoll$ grassyknoll_d ~/grassyknoll/samples/demo/remote_config.py
localhost - - [10/Mar/2008 03:22:10] "GET / HTTP/1.1" 200 30394
pfein@brick:~/grassyknoll$ grassyknoll_d ~/grassyknoll/samples/demo/lucene_config.py
INFO:SmartStorage:Opened /home/pfein/grassyknoll/samples/demo/lucene_demo
localhost - - [10/Mar/2008 03:22:11] "GET / HTTP/1.1" 200 7904
curl
pfein@brick:~/grassyknoll$ curl -H "Accept: application/json" http://localhost:8081/
{"ids": ["a0300025", "a0300044", "a0300051", "a0300064", "a0300071",
<...>
"a0331290", "a0331381", "a0331387", "a0331497"],
"metadata": {"ClientCollection_thread": "MainThread",
"ClientCollection_pid": 22071,
"ClientCollection_host": "brick",
"ClientCollection_time": 0.11742901802062988,
"LuceneCollection_pid": 22055,
"LuceneCollection_host": "brick",
"LuceneCollection_time": 0.059844970703125,
"LuceneCollection_thread": "MainThread"}}
Where Next?
Thanks for coming this far! You can:
- play around with the server more
- try running another of the BackEnds
- read about Collections, the data model
- read about the concurrency model
- browse the source code documentation
- contact the developers and other users

Note: in the deletion example, http://localhost:8080/a0300005?method=DELETE won't work from the browser, because the browser sends the request with a GET method, and GrassyKnoll correctly returns a 405 (GET is meant to be a safe method, and deletion is not safe).