Export to GitHub

google-highly-open-participation-psf - issue #327

Compare Python ORMs - Elixir


Posted on Jan 17, 2008 by Happy Lion

Taking a simple but complete schema, outlined below, and some sample data, also supplied, use the Elixir Python ORM toolkit to perform operations on the data:

http://elixir.ematia.de/trac/wiki

It is suggested, but not required, that sqlite be used as the database for the task: its driver is included with Python 2.5+ and all the ORMs support it out of the box.

Schema:

This suggested schema is the basis for a book review site. The focus of the site is the book, but within the book article you can link to other books by the same author, other books in the same series, or other books in the same genres. The schema is very slightly artificial as it tries to include 1:m, m:n, optional 1:1 and extended m:n relationships.

The key entity is the BOOK. It has a title which is unique and is limited to 100 characters and review notes which may be as long as you like. It must be written by an author who may have written other books. It may occupy a specific place in a series. It may belong to one or more genres, of which one is considered the principal genre. The book has a type -- Fiction or Nonfiction. If the book is Nonfiction, it is linked to a set of nonfiction data.

The AUTHOR has a unique name and, optionally, biographical notes. The author must have written at least one book in the database and may have written more.

The SERIES has a name and, optionally, overall review notes for the series. The series must contain at least one book in the database and may contain more.

The GENRE has a name and a brief description. A genre may contain no books, one or some.

The NONFICTION_DATA includes the publisher, publication date and number of pages of the book. Nonfiction data refers to only one book.

Task steps and completion criteria:

The first part of the task would be to use the ORM to create the schema according to its own techniques. Then the data should be loaded from the .csv files supplied.

The second part of the task would involve running a series of sample queries, exercising common data needs. A set of queries to use is listed below.

Completion criteria: upload the source code used for this ORM and the output from the query runs. Include a brief note indicating where the ORM appeared to facilitate a particular structure or query and where it seemed to hamper.

Queries:

Formulate and then print the output of these queries.

  • List all books (title only) by the author John Buchan
  • List all books for which Adventure is a genre
  • List all authors with only one book in the database
  • List the author and title of all books which don't form part of a series
  • List the title, author, publisher and publication date for all books whose author's surname begins with 'B'. (For fiction books, the last two will be empty).
  • List the top 5 most prolific authors
  • List any genre which doesn't include any books
  • Create a new genre "Python" and apply it to all books with the word "dark" in their title.
  • Add a new non-fiction book "The Python Programming Language", published by "Python Press" in 2007.
  • Update the notes on the book "Scorpia" to add the phrase: "I loved this book".

Data:

A data file has been made available in the ghop-python google group's files section at http://ghop-python.googlegroups.com/web/book-schema.zip. Although the data set comes from a real-world book review site (goodtoread.org) it has been artificially adjusted to suit the task. In particular, two non-fiction books were added with their additional data to be able to have a 1:1 optional relationship. And to extend the m:n between books and genres, the first listed of the genres should be considered the principal genre for a book.

Comment #1

Posted on Jan 17, 2008 by Helpful Giraffe

(No comment was entered for this change.)

Comment #2

Posted on Jan 18, 2008 by Swift Camel

I claim this task.

Comment #3

Posted on Jan 18, 2008 by Swift Dog

Just in case, I'll be on holidays for the next week, so I won't be able to answer any question about Elixir. Hopefully somebody else will answer your questions if you have any.

Comment #4

Posted on Jan 18, 2008 by Helpful Kangaroo

This task is due January 23, 2008 20:40:00 UTC

Comment #5

Posted on Jan 18, 2008 by Helpful Kangaroo

(No comment was entered for this change.)

Comment #6

Posted on Jan 18, 2008 by Helpful Kangaroo

shadytrees, you'll have to finish issue 328 before you can claim this one.

Comment #7

Posted on Jan 19, 2008 by Swift Camel

Would it be all right if I uploaded the files for the task and then claimed it? I'm 2/3rds of the way done with the task; I overlooked the fact that I couldn't claim more than one.

Comment #8

Posted on Jan 19, 2008 by Helpful Kangaroo

As long as the task is open, anyone else can claim it. You should probably not give them the solution. :-)

You can claim this task as soon as the other is marked completed.

Comment #9

Posted on Jan 19, 2008 by Helpful Lion

I claim this task.

Comment #10

Posted on Jan 19, 2008 by Helpful Kangaroo

This task is due January 24, 2008 15:05:00 UTC

Comment #11

Posted on Jan 24, 2008 by Helpful Lion

I didn't finished the task yet because of time problems.

I still have problems with the relations. Strange behaviors with Author.query.first().author.books / Author.get_by(name='..').books etc.

Attached are the current files. (Issue attachment storage quota exceeded. -> http://www.sendspace.com/file/0k34j0)

It would be great if I could get an extension time.

Comment #12

Posted on Jan 24, 2008 by Helpful Kangaroo

I'll try to look at those files tonight, if Tim doesn't beat me to it. In the mean time, since I know you've asked for a help at least once before I'll go ahead and extend the deadline a few days.

Comment #13

Posted on Jan 24, 2008 by Happy Lion

I'm not going to have time tonight; if Doug or someone hasn't had a chance by tomorrow, I'll try to pick it up again then.

Comment #14

Posted on Jan 26, 2008 by Helpful Kangaroo

The queries are all returning errors for me: $ python Python 2.5.1 (r251:54869, Apr 18 2007, 22:08:04) [GCC 4.0.1 (Apple Computer, Inc. build 5367)] on darwin Type "help", "copyright", "credits" or "license" for more information.

import queries print query_author() Traceback (most recent call last): File "", line 1, in NameError: name 'query_author' is not defined print queries.query_author() Traceback (most recent call last): File "", line 1, in File "queries.py", line 8, in query_author return Book.query.filter(Book.author.has(name='John Buchan')).all() AttributeError: type object 'Book' has no attribute 'query' print queries.query_surname() Traceback (most recent call last): File "", line 1, in File "queries.py", line 32, in query_surname return Author.query.filter(Author.name.contains(' B')).all() AttributeError: type object 'Author' has no attribute 'query'

Comment #15

Posted on Jan 26, 2008 by Helpful Lion

Please don't focus on the queries right now. That's not my problem.

But it fails because you didn't call "setup_all(True)" (creates tables and mappers).

Comment #16

Posted on Jan 26, 2008 by Helpful Lion

Now I finished with help of Doug Hellmann, thank you.

To run it, execute "python insert.py", open your IDE and make "from insert import *; setup_all()" and now you can play around with the queries after importing queries.py.

I already had some small experiences with the "raw" SQLAlchemy and I think it's better. But however, I like the philosophy behind SQLAlchemy and it's very powerful.

At coding I had problems with inserting the books and the relations. That's the reason why there is this strange "handle_relation()" function :-) The rest was simple.

Here's the archive link - external again: http://www.sendspace.com/file/d8c97o

Regards, Rafael

Comment #17

Posted on Jan 28, 2008 by Happy Lion

Sorry it's taken so long to get round to this. I'm happy enough with this one although you've missed a couple of the aspects of the data model: the optional one-to-one of nonfiction books; and the priority of the first genre. That said, most of the other students missed the latter at least.

A couple of points: in the insert.py, your're declaring a function within a for loop. While this is technically possible, it's certainly not what you intended to do here. It would have been good to have had the queries.py module self execute ("if name == 'main':") to run the queries automatically, perhaps outputting some kind of result sets rather than requiring the mentors to go in manually and run them one by one. I did get a sqlalchemy error from the query_empty_genre but the code looks right. However, it does highlight the sensitive interdependence of Elixir & SQLAlchemy.

Comment #18

Posted on Jan 29, 2008 by Helpful Lion

Sorry, had no time to answer yesterday.

The function definition in the for-loop was my fault, fixed. I also added the main-clause to queries.py. Now just call "python queries.py > ~/output.txt".

query_empty_genre works for me. Please show me the traceback.

Attachments

Comment #19

Posted on Jan 30, 2008 by Swift Dog

Hi,

I'm one of the authors of Elixir. In comment 16, you say you prefer raw SQLAlchemy. Could you explain why? And if you encountered any problems, could you describe them, so that we can try to fix them (whether they are documentation or code related). Btw: I guess it's too late for this task, but if you have any problem with Elixir, don't hesitate to ask for help on our mailing list.

Thanks,

Comment #20

Posted on Jan 30, 2008 by Helpful Lion

The SQLAlchemy docs are really good and I was a bit irritated with the relationships and had some small problems with them at first. But when you're into it, it's nice I think. The raw SQLAlchemy is clearer to me.

Status: Completed

Labels:
orm thirdparty database Due-20080127.1505 ClaimedBy-rafa...gmx.net