Google Code Archive - Long-term storage for Google Code Project Hosting.

Posted on Jan 17, 2008 by Happy Lion

Taking a simple but complete schema, outlined below, and some sample data, also supplied, use the Storm Python ORM toolkit to perform operations on the data:

https://storm.canonical.com/

It is suggested, but not required, that sqlite be used as the database for the task: its driver is included with Python 2.5+ and all the ORMs support it out of the box.

Schema:

This suggested schema is the basis for a book review site. The focus of the site is the book, but within the book article you can link to other books by the same author, other books in the same series, or other books in the same genres. The schema is very slightly artificial as it tries to include 1:m, m:n, optional 1:1 and extended m:n relationships.

The key entity is the BOOK. It has a title which is unique and is limited to 100 characters and review notes which may be as long as you like. It must be written by an author who may have written other books. It may occupy a specific place in a series. It may belong to one or more genres, of which one is considered the principal genre. The book has a type -- Fiction or Nonfiction. If the book is Nonfiction, it is linked to a set of nonfiction data.

The AUTHOR has a unique name and, optionally, biographical notes. The author must have written at least one book in the database and may have written more.

The SERIES has a name and, optionally, overall review notes for the series. The series must contain at least one book in the database and may contain more.

The GENRE has a name and a brief description. A genre may contain no books, one or some.

The NONFICTION_DATA includes the publisher, publication date and number of pages of the book. Nonfiction data refers to only one book.

Task steps and completion criteria:

The first part of the task would be to use the ORM to create the schema according to its own techniques. Then the data should be loaded from the .csv files supplied.

The second part of the task would involve running a series of sample queries, exercising common data needs. A set of queries to use is listed below.

Completion criteria: upload the source code used for this ORM and the output from the query runs. Include a brief note indicating where the ORM appeared to facilitate a particular structure or query and where it seemed to hamper.

Queries:

Formulate and then print the output of these queries.

List all books (title only) by the author John Buchan
List all books for which Adventure is a genre
List all authors with only one book in the database
List the author and title of all books which don't form part of a series
List the title, author, publisher and publication date for all books whose author's surname begins with 'B'. (For fiction books, the last two will be empty).
List the top 5 most prolific authors
List any genre which doesn't include any books
Create a new genre "Python" and apply it to all books with the word "dark" in their title.
Add a new non-fiction book "The Python Programming Language", published by "Python Press" in 2007.
Update the notes on the book "Scorpia" to add the phrase: "I loved this book".

Data:

A data file has been made available in the ghop-python google group's files section at http://ghop-python.googlegroups.com/web/book-schema.zip. Although the data set comes from a real-world book review site (goodtoread.org) it has been artificially adjusted to suit the task. In particular, two non-fiction books were added with their additional data to be able to have a 1:1 optional relationship. And to extend the m:n between books and genres, the first listed of the genres should be considered the principal genre for a book.

Comment #1

Posted on Jan 17, 2008 by Helpful Giraffe

(No comment was entered for this change.)

Comment #2

Posted on Jan 22, 2008 by Quick Panda

I claim this task.

Comment #3

Posted on Jan 22, 2008 by Quick Panda

Would it be OK if I uploaded the source code, SQLite database file, my report and then a DocTest which should work against it? I just think that DocTests would be handy for something like this, and it could serve as a good tutorial in the future.

Comment #4

Posted on Jan 22, 2008 by Happy Bear

Why not?

Comment #5

Posted on Jan 22, 2008 by Quick Panda

By the way, storm for some reason does not use the built-in sqlite API but needs pysqlite2. I'm guessing that you can get this via easy_install; I used apt-get, so I don't know. Just a caution for when I upload this.

Comment #6

Posted on Jan 23, 2008 by Quick Panda

Made it, and it all works, the doctest features about 118 tests which all pass. I found this an interesting ORM; it works quite well, but at the same time it's not nearly as powerful as SQLAlchemy, which I've used before. Tables had to be created using SQL, and a lot of the stuff that the ORM should have done was left to the user. Nevertheless, it was fast to set up and get working, and I learnt how to use it quite quickly. There is a serious lack of documentation and docstrings in all classes and functions, but I think it certainly has potential. All of the results of tests are written to csv files as the result of the doctest. To run the test, go into the stormdemo/src directory and run 'python runtest.py'. Thanks, Zack

P.S. remember to install pysqlite2.

Attachments

stormdemo.tar.gz 61.28KB

Comment #7

Posted on Jan 24, 2008 by Happy Lion

OK. An excellent job overall. I like the idea of using doctest to run the output. I have a very few quibbles, none of which will prevent me from marking the task as complete. One is that your algorithm for finding the work "dark" misses, for example, "Dark Quetzal" which is not preceded by a space. The output of genres-without-books is using the repr rather than the string, which makes it difficult to see if they're the right ones.

The final thing is that there are a number of place where you could have avoided using a list comprehension (which involves creating an auxiliary list) and used a generator expression instead. To an extent this is a matter of personal taste, but in local environments -- where you're not trying to pass things around -- I find it easier to read and it should be no slower. So where, for example, you pass a list of books by John Buchan to the csv_writer, you could get away with:

csv_writer.writerows( [book.name] for book in a_john_buchan.books )

and so on.

Finally, be aware that the csv module on Windows usually requires a file to be opened in binary mode for reading or writing to avoid a double-space effect which you otherwise have to cope with down the line.

google-highly-open-participation-psf - issue #329

Comment #1

Comment #2

Comment #3

Comment #4

Comment #5

Comment #6

Comment #7