WhyDOAP
A discussion of why Simal chose DOAP as its data format

Description of a Project (DOAP) was originally created to describe open source software projects and the people involved in them. However, it is not limited to this; it could be used to describe a great many kinds of project. Because DOAP is an RDF schema, it can be extended where necessary (it uses FOAF to describe contributors, for example).

Simal uses DOAP (as opposed to any other way of describing projects in RDF) because an increasing number of projects are adopting it "out there". That is, projects are starting to record their information as DOAP and other projects are starting to consume it. At the time of writing, Ping The Semantic Web is aware of 19,035 DOAP descriptions (which Simal can import). Simal can now also import the 15,467 projects in Ohloh. Note that these are both pre-alpha tools at present.

We have also just been given a username and password for the FLOSSMetrics database which, whilst it doesn't produce DOAP, does give us the raw data necessary to import the whole of SourceForge (about 160,000 projects), Freshmeat (about 45,000), FSF (we have no exact figure, but it's not a small number) and numerous smaller "forges". We welcome any assistance in writing the code necessary to grab and massage this data from FLOSSMetrics.

By choosing DOAP, Simal has access (at the time of writing) to somewhere between 16,000 and 20,000 project descriptions. With work, we will have access to at least 200,000 projects. The ability to import this data is scheduled for the 0.3 release, with 0.2 due by the end of September 2008. However, it is unlikely the system will be usable with that much data until version 0.6 (unless you need it sooner and help us out). Until 0.6 we'll probably be limited to considerably less data.
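To make the point about DOAP being RDF a little more concrete, here is a minimal sketch of what a DOAP description looks like and how a consumer might pull a few fields out of it. This is not Simal's importer. The sample project, the person, the `example.org` URLs and the `summarise_doap` helper are all invented for illustration; only the DOAP, FOAF and RDF namespace URIs are real. It uses Python's standard XML parser, so it only copes with the simple nested RDF/XML layout shown here; a real consumer would more likely use a proper RDF library.

```python
import xml.etree.ElementTree as ET

# Namespace URIs for RDF, DOAP and FOAF (these are the published ones).
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "doap": "http://usefulinc.com/ns/doap#",
    "foaf": "http://xmlns.com/foaf/0.1/",
}

# A made-up DOAP description. The maintainer is described with FOAF,
# which is how DOAP reuses an existing vocabulary for people.
SAMPLE_DOAP = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:doap="http://usefulinc.com/ns/doap#"
    xmlns:foaf="http://xmlns.com/foaf/0.1/">
  <doap:Project rdf:about="http://example.org/projects/example#project">
    <doap:name>Example Project</doap:name>
    <doap:shortdesc xml:lang="en">A made-up project for illustration.</doap:shortdesc>
    <doap:homepage rdf:resource="http://example.org/projects/example"/>
    <doap:maintainer>
      <foaf:Person>
        <foaf:name>Jane Example</foaf:name>
        <foaf:mbox rdf:resource="mailto:jane@example.org"/>
      </foaf:Person>
    </doap:maintainer>
  </doap:Project>
</rdf:RDF>
"""


def summarise_doap(doap_xml):
    """Return name, homepage and maintainer names for each doap:Project.

    Hypothetical helper: assumes projects are direct children of rdf:RDF
    and maintainers are nested foaf:Person elements, as in SAMPLE_DOAP.
    """
    root = ET.fromstring(doap_xml)
    resource_attr = "{%s}resource" % NS["rdf"]
    projects = []
    for project in root.findall("doap:Project", NS):
        homepage = project.find("doap:homepage", NS)
        projects.append({
            "name": project.findtext("doap:name", default="", namespaces=NS),
            "homepage": homepage.get(resource_attr) if homepage is not None else None,
            "maintainers": [
                person.findtext("foaf:name", default="", namespaces=NS)
                for person in project.findall("doap:maintainer/foaf:Person", NS)
            ],
        })
    return projects


if __name__ == "__main__":
    for entry in summarise_doap(SAMPLE_DOAP):
        print(entry)
```

Because the maintainer is just a FOAF resource hanging off the project, other vocabularies can be added to a description in the same way without changing DOAP itself, which is the extensibility referred to above.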