fuxi - issue #4

Switch over to current rdflib


Posted on Sep 24, 2009 by Helpful Cat

Switch FuXi to the latest version of rdflib, which includes changes to the module structure, etc.

Comment #1

Posted on May 13, 2011 by Happy Hippo

I think it is about time to switch to rdflib v3+ and Python 2.7+.

How big is the effort to upgrade it? When will it be taken to that level?

Comment #2

Posted on Oct 4, 2011 by Quick Rabbit

In case it's of interest, I propagated my rdflib3 refactoring of FuXi to the new 1.3 release. It's a relatively trivial refactoring that aims simply to maintain compatibility with layer cake rdflib 2.4.X. It doesn't address (and in fact is pretty much ignorant of) the module-level changes that Chimezie envisaged. Four or five tests are failing, indicative of further work required to get this up to production-level code.

http://code.google.com/r/gjhiggins-fuxi-rdflib3/source/list?name=rdflib3

FWIW.

Comment #3

Posted on Oct 9, 2011 by Quick Rabbit

In response to Chimezie's post to the forum:

++ Excellent! I will take a look and try to get a sense of the effort
++ it would take to take this to its conclusion: switching back to
++ rdflib while maintaining the divergent module-changes and components
++ (such as the pure python parser, the Generic SPARQL Store,
++ the MySQL/SPARQL implementation, etc.). Do you have any sense of this?

I am able to report that all the above-referenced work has been restored, refactored and recently merged back into the default branch of my clone of rdfextras, ready for pushing to the "official" repos:

http://code.google.com/r/gjhiggins-rdfextras/source/browse/#hg%2Frdfextras

There is a Hudson CI build which tracks commits:

http://bel-epa.com/hudson/job/rdfextras-test/

and which maintains reports of test runs, currently standing at 369 tests with 4 failures and 13 skips (of known issues, mostly with SQL stores):

http://bel-epa.com/hudson/job/rdfextras-test/lastCompletedBuild/testReport/

and (fwiw) coverage reports:

http://bel-epa.com/hudson/job/rdfextras-test/Test_coverage_Report/

With respect to detail - the MySQL/SPARQL implementation is available as rdfextras.sparql2sql and shows little or no difference in test results to the extant default rdfextras SPARQL implementation.

Most of the stores have been recovered and refactored but I'm unsure of what you mean by "the Generic SPARQL Store" - the recovered stores are in:

http://code.google.com/r/gjhiggins-rdfextras/source/browse/#hg%2Frdfextras%2Fstore

Whilst the key-value stores required little change other than a mild refactoring, the SQL stores are evincing problems when running tests that involve contexts and Statements.

Many of the tests make assertions about the length of the graph but this seems to be broken for contexts, as this Pdb interaction apparently demonstrates (if the comment formatting screws up the layout, I'll attach a text file):

python run_tests.py --pdb-failure test/test_store/test_sqlite.py:SQLiteContextTestCase.testLenInMultipleContexts
Running nose with: --pdb-failure test/test_store/test_sqlite.py:SQLiteContextTestCase.testLenInMultipleContexts --attr=!performancetest --where=./ --with-doctest --doctest-extension=.doctest --doctest-tests

/usr/lib/python2.7/unittest/case.py(496)_baseAssertEqual()
-> raise self.failureException(msg)
(Pdb) u
/usr/lib/python2.7/unittest/case.py(503)assertEqual()
-> assertion_func(first, second, msg=msg)
(Pdb) u
~rdfextras/test/test_store/test_context.py(146)testLenInMultipleContexts()
-> self.assertEquals(len(self.graph), oldLen + 1)
(Pdb) oldLen
0
(Pdb) self.graph.serialize()
* Exception: Can't split 'hates'
(Pdb) self.graph.serialize(format="n3")
'\n .\n\n'
(Pdb) len(self.graph)
3
(Pdb) self.assertEquals(len([y for y in self.graph.triples((None, None, None))]), oldLen + 1)

The failure to serialize the test statements as XML is rather inconvenient and perhaps even a bug.
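As a rough illustration of how the length mismatch in the Pdb session above could arise, here is a toy model. It is an assumption, not a diagnosis of the SQL stores: the class `ToyContextStore`, its methods, and the sample names are all hypothetical and are not part of rdflib or rdfextras. It only shows that counting stored rows and counting distinct triples disagree once the same triple sits in several contexts, which would be consistent with len(self.graph) returning 3 while iterating triples() yields a single triple.

```python
# Toy model only: a hypothetical in-memory quad store, not rdflib/rdfextras code.
class ToyContextStore(object):
    def __init__(self):
        # Each entry is (subject, predicate, object, context).
        self.quads = set()

    def add(self, triple, context):
        self.quads.add(triple + (context,))

    def triples(self, context=None):
        """Yield distinct triples, optionally restricted to one context."""
        seen = set()
        for s, p, o, c in self.quads:
            if context is not None and c != context:
                continue
            if (s, p, o) not in seen:
                seen.add((s, p, o))
                yield (s, p, o)

    def len_rows(self, context=None):
        """Count stored rows: one triple in three contexts counts three times."""
        return sum(1 for q in self.quads if context is None or q[3] == context)

    def len_triples(self, context=None):
        """Count distinct triples, matching what iterating triples() yields."""
        return sum(1 for _ in self.triples(context))


store = ToyContextStore()
triple = ("ex:a", "ex:hates", "ex:b")  # illustrative names only
for ctx in ("c1", "c2", "c3"):
    store.add(triple, ctx)

rows = store.len_rows()         # counts one row per context
distinct = store.len_triples()  # counts the triple once
in_c1 = store.len_triples("c1")
```

If a store's length is computed the first way, a test that expects oldLen + 1 after adding one triple to three contexts would fail in just this manner.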

Still, even with the limitation of several significant test failures, it is possible to run FuXi's test suite with rdflib 3.2 dev and the "restoration" rdfextras clone.

Again, there is a Hudson CI build:

http://bel-epa.com/hudson/job/fuxi-rdflib3/

similarly tracking commits and maintaining reports of test runs, currently standing at 87 tests and 31 failures:

http://bel-epa.com/hudson/job/fuxi-rdflib3/lastCompletedBuild/testReport/

and (again, fwiw) test coverage:

http://bel-epa.com/hudson/job/fuxi-rdflib3/Test_coverage_Report/

The complete console output is captured here:

http://bel-epa.com/hudson/job/fuxi-rdflib3/17/console

For my own convenience, I adjusted matters so that I could run nose; its --pdb and --pdb-failure options are extremely useful. The existing test/suite.py seems to find 469 doctests, and I can't explain the difference as yet.

I can't detect any significant difference between the result of suite.py run with FuXi+layercake and the same test run with refactoredFuXi + rdflib3/restorationrdfextras; the numbers of tests, passes and fails were pretty much the same (to a casual inspection).

The overwhelming majority of the test failures would seem to be due to simple case mismatches and other format mismatches, e.g.

Expected: ( ex:Fire and ex:Water ) Got: ( ex:Fire AND ex:Water )

If this were any domain other than RDF, I would readily opine that a fix would appear to be trivial - but I've learned to be circumspect, even with what seems obvious.
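One hedged way to separate such cosmetic mismatches from real failures is a doctest output checker that falls back to a case-insensitive comparison. The sketch below uses only the standard library's doctest.OutputChecker; the names `CaseInsensitiveChecker` and `differs_only_by_case` are hypothetical, and nothing here is taken from FuXi's test suite. It is meant for triage, not as a claim that ignoring case is the right fix.

```python
import doctest


class CaseInsensitiveChecker(doctest.OutputChecker):
    """Hypothetical triage helper: accept output that differs only in case."""

    def check_output(self, want, got, optionflags):
        # Try the normal comparison first, then a case-folded one.
        if doctest.OutputChecker.check_output(self, want, got, optionflags):
            return True
        return doctest.OutputChecker.check_output(
            self, want.lower(), got.lower(), optionflags)


def differs_only_by_case(want, got):
    """True when the strict comparison fails but the case-folded one passes."""
    strict = doctest.OutputChecker().check_output(want, got, 0)
    relaxed = CaseInsensitiveChecker().check_output(want, got, 0)
    return relaxed and not strict


want = "( ex:Fire and ex:Water )\n"
got = "( ex:Fire AND ex:Water )\n"
case_only = differs_only_by_case(want, got)
```

A checker like this can be handed to doctest.DocTestRunner or doctest.DocTestSuite through their checker argument, so a second run shows which failures remain once case differences are discounted.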

I have recently removed the rdflib2/rdflib3 import switching because FuXi does not run with either rdflib-2.4.1 or rdflib-2.4.2, only with the "layercake" fork. This is immediately apparent with the failure of imports of a non-existent "parse" function in rdflib.sparql.parser and then an rdflib.OWL module which is missing completely from the 2.4.1/2.4.2 package (that's as far as I got before the realisation settled in).
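The two import failures described above can be probed up front with a small check. This is a minimal sketch under stated assumptions: `missing_requirements` and `LAYERCAKE_REQUIREMENTS` are hypothetical names, and the list holds only the two items mentioned in this comment, so it is not a complete inventory of what FuXi needs from the layercake fork.

```python
import importlib


def missing_requirements(requirements):
    """Return the (module, attribute) pairs that cannot be satisfied."""
    missing = []
    for module_name, attribute in requirements:
        try:
            module = importlib.import_module(module_name)
        except ImportError:
            # The module itself is absent from the installed package.
            missing.append((module_name, attribute))
            continue
        if attribute is not None and not hasattr(module, attribute):
            # The module imports, but the expected name is not defined in it.
            missing.append((module_name, attribute))
    return missing


# Only the two names reported above; any others are not established here.
LAYERCAKE_REQUIREMENTS = [
    ("rdflib.sparql.parser", "parse"),
    ("rdflib.OWL", None),
]

if __name__ == "__main__":
    for module_name, attribute in missing_requirements(LAYERCAKE_REQUIREMENTS):
        print("missing: %s %s" % (module_name, attribute or "(module)"))
```

Run against a given rdflib install, an empty result would suggest these two imports are available; it says nothing about the rest of FuXi's dependencies.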

HTH,

Cheers,

Graham Higgins

Comment #4

Posted on Nov 3, 2012 by Helpful Cat

This is something I'm committed to doing. The next milestone will be about OWL 2 EL reasoning and proof generation, and the milestone after that will be completely focused on switching FuXi over to rdflib3.

Status: Started

Labels:
Type-Defect Priority-Low