hibiscusfederation

HiBISCuS: Hypergraph-Based Source Selection for SPARQL Endpoint Federation

Paper (corrected PDF): http://svn.aksw.org/papers/2014/HiBISCuS_ESWC/public.pdf

Source Code

The HiBISCuS source code, along with all three extensions (SPLENDID, FedX, DARQ), can be checked out from https://code.google.com/p/hibiscusfederation/source/checkout

FedBench

FedBench queries can be downloaded from the project website https://code.google.com/p/fbench/

Datasets Availability

All datasets and the corresponding Virtuoso SPARQL endpoints can be downloaded from the links given below. We strongly recommend downloading the endpoints directly, since some of the data dumps are quite large and take a long time to load. You can start a SPARQL endpoint with bin/start.bat (Windows) or bin/start_virtuoso.sh (Linux).

| Dataset | Data-dump | Windows Endpoint | Linux Endpoint | Local Endpoint URL | Live Endpoint URL |
|:------------|:--------------|:---------------------|:-------------------|:----------------------|:---------------------|
| ChEBI | Download | Download | Download | your.system.ip.address:8890/sparql | http://chebi.bio2rdf.org/sparql |
| DBPedia-Subset | Download | Download | Download | your.system.ip.address:8891/sparql | http://dbpedia.org/sparql |
| DrugBank | Download | Download | Download | your.system.ip.address:8892/sparql | http://wifo5-04.informatik.uni-mannheim.de/drugbank/sparql |
| Geo Names | Download | Download | Download | your.system.ip.address:8893/sparql | http://factforge.net/sparql |
| Jamendo | Download | Download | Download | your.system.ip.address:8894/sparql | http://dbtune.org/jamendo/sparql/ |
| KEGG | Download | Download | Download | your.system.ip.address:8895/sparql | http://cu.kegg.bio2rdf.org/sparql |
| Linked MDB | Download | Download | Download | your.system.ip.address:8896/sparql | http://www.linkedmdb.org/sparql |
| New York Times | Download | Download | Download | your.system.ip.address:8897/sparql | - |
| Semantic Web Dog Food | Download | Download | Download | your.system.ip.address:8898/sparql | http://data.semanticweb.org/sparql |

Usage Information

In the following, we explain how to set up the BigRDFBench evaluation framework and measure the performance of a federation engine.

SPARQL Endpoints Setup

The first step is to download the SPARQL endpoints (the portable Virtuoso SPARQL endpoints from the table above) onto different machines. Ideally, host one SPARQL endpoint per machine, which requires 9 machines in total, one per dataset. However, you can also start more than one SPARQL endpoint on a machine.
The next step is to start each SPARQL endpoint with bin/start.bat (Windows) or bin/start_virtuoso.sh (Linux). Make a list of all SPARQL endpoint URLs; index-free SPARQL query federation engines such as FedX require it as input. Note that index-assisted federation engines (e.g., SPLENDID, DARQ, ANAPSID) usually store the endpoint URLs in their index. The local SPARQL endpoint URLs are given in the table above.
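
The endpoint list can be written by hand, but a small helper avoids port typos. The following is a minimal sketch, not part of the HiBISCuS code base: the class name `LocalEndpoints` is hypothetical, and it assumes that all endpoints run on a single host, that they are reachable over plain `http`, and that they use the ports 8890 to 8898 in the order of the dataset table. With one endpoint per machine, each URL would use a different host instead.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: builds the local endpoint URLs from the dataset table. */
final class LocalEndpoints {
    // Dataset order follows the dataset table; ChEBI uses the first port.
    static final String[] DATASETS = {
        "ChEBI", "DBPedia-Subset", "DrugBank", "Geo Names", "Jamendo",
        "KEGG", "Linked MDB", "New York Times", "Semantic Web Dog Food"
    };
    static final int FIRST_PORT = 8890;

    /** Maps each dataset name to http://host:port/sparql (single-host assumption). */
    static Map<String, String> urlsFor(String host) {
        Map<String, String> urls = new LinkedHashMap<>();
        for (int i = 0; i < DATASETS.length; i++) {
            urls.put(DATASETS[i], "http://" + host + ":" + (FIRST_PORT + i) + "/sparql");
        }
        return urls;
    }

    public static void main(String[] args) {
        // Pass your machine's IP address as the first argument; defaults to localhost.
        String host = args.length > 0 ? args[0] : "localhost";
        urlsFor(host).forEach((name, url) -> System.out.println(name + "\t" + url));
    }
}
```

The printed URLs can then be copied into whatever configuration the chosen federation engine expects.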

Running SPARQL Queries

Provide the list of SPARQL endpoint URLs and a FedBench query to the underlying federation engine. The query evaluation start-up files for the selected systems (which you can check out from https://code.google.com/p/hibiscusfederation/source/checkout) are given below.

----------FedX-original-----------------------

Package: org.aksw.simba.start

File: QueryEvaluation.java

----------FedX-HiBISCuS-----------------------

Package: org.aksw.simba.fedsum.startup

File: QueryEvaluation.java

----------SPLENDID-original-----------------------

Package: de.uni_koblenz.west.evaluation

File: QueryProcessingEval.java

----------SPLENDID-HiBISCuS-----------------------

Package: de.uni_koblenz.west.evaluation

File: QueryProcessingEval.java

----------ANAPSID-----------------------

Follow the instructions given at https://github.com/anapsid/anapsid to configure the system and then use anapsid/ivan-scripts/runQuery.sh to run a query.
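
Before handing the endpoint list to any of the engines above, it can help to confirm that each endpoint answers a trivial query. The sketch below only builds the request URL for a query sent via HTTP GET, as defined by the SPARQL protocol (`query` parameter, URL-encoded). The class name `SparqlRequestUrl` is hypothetical and not part of any of the listed systems, and the endpoint URL in `main` assumes a local endpoint on port 8890.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper: builds a SPARQL protocol GET URL for a query. */
final class SparqlRequestUrl {
    /** Appends the URL-encoded query as the "query" parameter of the endpoint URL. */
    static String forQuery(String endpointUrl, String sparql) {
        // URLEncoder uses form encoding: spaces become '+', reserved characters become %XX.
        String encoded = URLEncoder.encode(sparql, StandardCharsets.UTF_8);
        return endpointUrl + "?query=" + encoded;
    }

    public static void main(String[] args) {
        // Assumed local endpoint; replace with a URL from your own endpoint list.
        String url = forQuery("http://localhost:8890/sparql", "ASK { ?s ?p ?o }");
        System.out.println(url);
    }
}
```

Opening the printed URL in a browser or with any HTTP client should return a boolean result if the endpoint is up and holds at least one triple.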

SPARQL Endpoints Specification

The specification of the systems used in our evaluation is given below.

Note: We hosted each SPARQL endpoint on a separate physical machine.

| Endpoint | CPU (GHz, model) | RAM | Hard disk |
|:--------------|:--------|:--------|:------------|
| 1. ChEBI-virtuoso1 | 2.2, i3 | 4 GB | 300 GB |
| 2. DrugBank-virtuoso2 | 2.2, i3 | 4 GB | 300 GB |
| 3. Jamendo-virtuoso3 | 2.6, i5 | 4 GB | 150 GB |
| 4. KEGG-Virtuoso4 | 2.53, i5 | 4 GB | 300 GB |
| 5. NYT (New York Times) | 2.3, i5 | 4 GB | 500 GB |
| 6. LinkedMDB-virtuoso6 | 2.53, i5 | 4 GB | 300 GB |
| 7. SWDF (Semantic Web Dog Food)-virtuoso7 | 2.9, i7 | 8 GB | 450 GB |
| 8. GeoNames-Virtuoso8 | 2.9, i7 | 16 GB | 500 GB |
| 9. DbpediaSubset-virtuoso9 | 2.9, i7 | 16 GB | 500 GB |

Source Pruning Example

Here is the source pruning example for query SSQ1 given in the HiBISCuS paper. ?s is a star node with two outgoing hyperedges. Each hyperedge is labelled with its relevant data sources by step 1 (i.e., Index-dominant or ASK-dominant) of the HiBISCuS source selection algorithm. Since ?s has only outgoing hyperedges, the subject authorities are collected for each data source of each outgoing hyperedge. The intersection set (i.e., auth13) of all the authorities is highlighted in cyan. The sources highlighted in red are the ones finally selected. A more detailed example is given in the paper.
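
The pruning step for a star node with only outgoing hyperedges can be sketched as follows. This is a simplified illustration of the idea described above, not the HiBISCuS implementation: the class name `StarNodePruning`, the source names (`s1` to `s4`) and all authorities except `auth13` are hypothetical. For each hyperedge, the sketch unions the subject authorities of its candidate sources, intersects these unions across all hyperedges, and keeps only the sources that hold at least one authority from the intersection.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Simplified sketch of authority-based pruning for a star node (outgoing edges only). */
final class StarNodePruning {
    /**
     * @param edges one map per outgoing hyperedge: candidate source -> its subject authorities
     * @return per hyperedge, the sources that share an authority with the common set
     */
    static List<Set<String>> prune(List<Map<String, Set<String>>> edges) {
        // Step 1: union the authorities on each edge, then intersect across edges.
        Set<String> common = null;
        for (Map<String, Set<String>> edge : edges) {
            Set<String> union = new TreeSet<>();
            edge.values().forEach(union::addAll);
            if (common == null) {
                common = union;
            } else {
                common.retainAll(union);
            }
        }
        // Step 2: keep a source only if it holds at least one common authority.
        List<Set<String>> selected = new ArrayList<>();
        for (Map<String, Set<String>> edge : edges) {
            Set<String> keep = new TreeSet<>();
            for (Map.Entry<String, Set<String>> entry : edge.entrySet()) {
                if (common != null && !Collections.disjoint(entry.getValue(), common)) {
                    keep.add(entry.getKey());
                }
            }
            selected.add(keep);
        }
        return selected;
    }

    public static void main(String[] args) {
        // Hypothetical data: two outgoing hyperedges of ?s; only auth13 is shared.
        List<Map<String, Set<String>>> edges = List.of(
            Map.of("s1", Set.of("auth11", "auth13"), "s2", Set.of("auth12")),
            Map.of("s1", Set.of("auth13"), "s3", Set.of("auth14"),
                   "s4", Set.of("auth13", "auth15")));
        System.out.println(prune(edges)); // prints [[s1], [s1, s4]]
    }
}
```

In this toy input, `s2` and `s3` are pruned because none of their authorities appears on every hyperedge, so they cannot contribute a binding for ?s that joins across both triple patterns.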

Motivating Examples

Some other motivating examples are shown in the figure below. The relevant sources are shown next to each triple pattern of a query. Sources that actually contribute to the final result set of each query are highlighted in red.

Project Information

The project was created on Jan 4, 2014.

License: GNU GPL v3
svn-based source control

Labels:
Database Java SPARQL RDF SourceSelection QueryFederation