
sig2biopax
Sig2BioPAX: Java tool for converting flat text files to BioPAX Level 3 format
Abstract
The World Wide Web plays a critical role in enabling researchers to exchange, search, process, visualize, integrate and analyze experimental data. Such efforts can be further enhanced through the development of the semantic web. The idea of the semantic web is to enable machines to understand data through protocol-free data exchange formats such as the Resource Description Framework (RDF) and the Web Ontology Language (OWL). These standards provide formal descriptors of objects, object properties and their relationships within a specific knowledge domain. However, converting datasets that are typically stored in data tables, such as Excel or other spreadsheets, into RDF or OWL is not trivial for non-specialists, and this overhead is a barrier to seamless data exchange between researchers, databases and analysis tools. The problem is particularly pressing in network systems biology, where biochemical interactions between genes and their products are abstracted to networks. To convert biochemical interactions into BioPAX, the leading standard developed by the computational systems biology community, we developed an open-source command-line tool that takes as input tabular data describing different types of biochemical interactions and converts them into the BioPAX Level 3 OWL format. We used the tool to convert several existing and novel mammalian networks of protein interactions, signaling pathways and transcriptional regulatory networks into BioPAX, and deposited these into PathwayCommons, a repository for consolidating and organizing biochemical networks. Our command-line tool, sig2biopax, is a useful resource that enables experimental and computational systems biologists to contribute their identified networks for integration and reuse by the research community.
Running Sig2BioPAXv4
Sig2BioPAXv4 is packaged as an executable JAR file, so a Java Virtual Machine (JVM) must be installed on your computer. The JVM is available from http://www.java.com/getjava. To run the GUI (graphical user interface) version of Sig2BioPAXv4, double-click sig2biopaxv4.jar, which is the distribution file. To use the command-line version of the program, open a command prompt, navigate to the folder containing sig2biopaxv4.jar, and enter the command:
java -jar sig2biopaxv4.jar -cmd args
where args are the arguments you wish to supply, as described below. For example, to use input file foo.txt and output file bar.owl with the overwrite option, the command is:
java -jar sig2biopaxv4.jar -cmd -in:foo.txt -out:bar.owl -o
If no arguments are supplied, the program uses the default input file input.txt, the default output file output.owl, the sig input template, and the non-overwriting option.
The command-line tool is accessed with the command-line argument -cmd. The tool accepts four options, which are passed to the program as command-line arguments separated by spaces. The four options are:
1. Input File name. This is specified by the syntax -in:filename, where filename is the path to the input file. If no input file is specified, the program tries the default, input.txt. The file may be given either as a name only or as a directory path plus name. If only a name is given, the program searches for the file in the same directory as the executable JAR file. IMPORTANT: if the directory name contains spaces, this argument must be surrounded by double quotation marks ("").
2. Output File name. This is specified by the syntax -out:filename, where filename is the path to the output file. If no output file is specified, the default, output.owl, is used. The file may be given either as a name only or as a directory path plus name. If only a name is given, the program creates the output file in the same directory as the executable JAR file. If a directory path is given, you must create the directory yourself first or an exception will be thrown. IMPORTANT: if the directory name contains spaces, this argument must be surrounded by double quotation marks ("").
3. Overwrite Output. This is specified by the switch -o. If given, the switch causes the program to overwrite the output file, whether that is the default output file or the one specified with -out:filename. The default is OFF, i.e., don't overwrite; in that case, a number is appended to the end of the output file name.
4. Input Template Name. This is specified by the syntax -t:type, where type is the name of the desired input template. Three templates are currently supported. The first, which is the default, is sig. This option parses files whose lines have the following syntax:
SN SHA SMA ST SL TN THA TMA TT TL E TOI PID
KEY:
SN = SourceName: Name of source molecule
SHA = SourceHumanAccession: Source Swiss-Prot human accession number
SMA = SourceMouseAccession: Source Swiss-Prot mouse accession number
ST = SourceType: Type of source molecule
SL = SourceLocation: Location of source molecule in the cell
TN = TargetName: Name of target molecule
THA = TargetHumanAccession: Target Swiss-Prot human accession number
TMA = TargetMouseAccession: Target Swiss-Prot mouse accession number
TT = TargetType: Type of target molecule
TL = TargetLocation: Location of target molecule in the cell
E = Effect: Effect of source on target. + (activating), _ (deactivating), or 0 (neutral)
TOI = TypeOfInteraction: Reaction type definition
PID = PubMedID: ID of article that identified this reaction
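To make the column order concrete, here is a minimal Python sketch that reads one sig line into named fields. It is not part of Sig2BioPAX. It assumes the columns are separated by whitespace and that no field contains a space; the function name parse_sig_line and the sample values are invented for illustration only.

```python
# Field names in the order given by the sig template key above.
SIG_FIELDS = [
    "SourceName", "SourceHumanAccession", "SourceMouseAccession",
    "SourceType", "SourceLocation",
    "TargetName", "TargetHumanAccession", "TargetMouseAccession",
    "TargetType", "TargetLocation",
    "Effect", "TypeOfInteraction", "PubMedID",
]

# Effect symbols listed in the key: activating, deactivating, neutral.
VALID_EFFECTS = {"+", "_", "0"}


def parse_sig_line(line):
    """Split one sig line into a dict keyed by field name.

    Assumption: fields are whitespace-delimited and contain no spaces.
    """
    values = line.split()
    if len(values) != len(SIG_FIELDS):
        raise ValueError(
            "expected %d columns, found %d" % (len(SIG_FIELDS), len(values))
        )
    record = dict(zip(SIG_FIELDS, values))
    if record["Effect"] not in VALID_EFFECTS:
        raise ValueError("unknown effect symbol: %r" % record["Effect"])
    return record


# Placeholder line with invented values, not real data.
example = ("SRC P00001 Q00001 Kinase Cytosol "
           "TGT P00002 Q00002 Receptor Membrane + Phosphorylation 12345678")
record = parse_sig_line(example)
print(record["SourceName"], record["Effect"], record["TargetName"])
```

A check like this can catch lines with missing columns before the file is handed to the converter.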
The second format is selected with the template name source_target. This option tells the program to parse the input file as having only seven columns:
SN SL TN TL E TOI PID
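Because the source_target columns listed above are a subset of the sig columns, a full sig line can be reduced to this form by dropping the accession and type columns. The Python sketch below illustrates that projection. It is not part of Sig2BioPAX, and it assumes whitespace-delimited fields and that each abbreviation means the same thing in both templates; the function name and sample values are invented.

```python
# Column abbreviations as listed for the two templates.
SIG_COLUMNS = ["SN", "SHA", "SMA", "ST", "SL",
               "TN", "THA", "TMA", "TT", "TL",
               "E", "TOI", "PID"]
SOURCE_TARGET_COLUMNS = ["SN", "SL", "TN", "TL", "E", "TOI", "PID"]


def sig_to_source_target(line):
    """Keep only the source_target columns of a whitespace-delimited sig line."""
    values = line.split()
    if len(values) != len(SIG_COLUMNS):
        raise ValueError(
            "expected %d columns, found %d" % (len(SIG_COLUMNS), len(values))
        )
    record = dict(zip(SIG_COLUMNS, values))
    return " ".join(record[name] for name in SOURCE_TARGET_COLUMNS)


# Placeholder line with invented values, not real data.
sig_line = ("SRC P00001 Q00001 Kinase Cytosol "
            "TGT P00002 Q00002 Receptor Membrane + Phosphorylation 12345678")
print(sig_to_source_target(sig_line))
```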
The third format is tf_target. This format is for converting transcription-factor/target-gene interaction pairs. The field names (columns) are SourceName, TargetName and PubMedID. Sometimes, in this format, the PubMedID comes appended to the SourceName, like this: SourceName-PubMedID. If this is the case, Sig2BioPAX will strip off the PubMedID from the SourceName.
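The stripping step can be pictured with the short Python sketch below. This is not the tool's actual code. It assumes the appended PubMedID is the run of digits after the last hyphen in the name; the function name split_source_name and the sample names are hypothetical.

```python
import re

# Assumption: an appended PubMedID is all digits and follows the last hyphen.
_PMID_SUFFIX = re.compile(r"^(?P<name>.+)-(?P<pmid>\d+)$")


def split_source_name(source_name):
    """Return (name, pmid); pmid is None when no numeric suffix is present."""
    match = _PMID_SUFFIX.match(source_name)
    if match:
        return match.group("name"), match.group("pmid")
    return source_name, None


print(split_source_name("TF1-12345678"))
print(split_source_name("TF1"))
```

Under this assumption, a name that legitimately ends in a hyphen followed by digits would also be split, so a stricter rule (for example, a minimum number of digits) may be preferable for real data.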
Real example
The focal adhesome network sig file from the adhesome.org web site (http://www.adhesome.org) can be converted into BioPAX Level 3 by typing on the command line:
java -jar sig2biopaxv4.jar -cmd -in:fa.sig -out:fa.owl -o
The input and output files can be viewed here: fa.sig (http://sig2biopax.googlecode.com/files/fa.sig) and fa.owl (http://sig2biopax.googlecode.com/files/fa.owl)
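For batch conversions, the same invocation can be assembled from a script. The Python sketch below builds the argument list from the options described earlier (-cmd, -in:, -out:, -t:, -o). The helper names build_command and run_conversion are hypothetical, and running the conversion assumes java is on the PATH and sig2biopaxv4.jar is in the current directory.

```python
import subprocess

JAR_NAME = "sig2biopaxv4.jar"  # distribution file named in the text


def build_command(input_path=None, output_path=None, template=None,
                  overwrite=False):
    """Assemble the argument list for the command-line version of the tool.

    Options left as None are omitted so the program applies its own defaults.
    """
    command = ["java", "-jar", JAR_NAME, "-cmd"]
    if input_path is not None:
        command.append("-in:" + input_path)
    if output_path is not None:
        command.append("-out:" + output_path)
    if template is not None:
        command.append("-t:" + template)
    if overwrite:
        command.append("-o")
    return command


def run_conversion(**options):
    """Hypothetical wrapper: run the tool and raise if it exits with an error."""
    return subprocess.run(build_command(**options), check=True)


fa_command = build_command("fa.sig", "fa.owl", overwrite=True)
print(" ".join(fa_command))
```

Passing the arguments as a list means each option reaches the program as a single argument, so paths containing spaces do not need the manual double quotation marks required at a command prompt.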
Contact
avi dot maayan at mssm dot edu and
ryan dot webb at mssm dot edu
Project Information
- License: GNU GPL v3
- svn-based source control
Labels:
SystemsBiology
BioPAX
Protein-proteinInteractions
NetworkBiology
SignalingNetworks
GeneRegulatryNetworks
InteroperabilityTool
PathwayCommons
SBCNY