OMSSA Parser
News
April 16, 2012: OMSSA Parser v1.5 is now available.
January 31, 2012: OMSSA Parser v1.4.7 is now available.
September 9, 2011: OMSSA Parser v1.4.6 is now available.
What is OMSSA Parser?
OMSSA Parser is a Java-based parser for OMSSA (Open Mass Spectrometry Search Algorithm) omx files. It was initially created by Steffen Huber for Prof. Dr. Albert Sickmann under the guidance of Dr. Lennart Martens, and was further developed by Harald Barsnes, including the addition of a simple, lightweight and platform-independent viewer, OMSSA Viewer.

Downloads
The zip file in the downloads section contains the OMSSA Parser jar file and all the libraries needed to run the parser. An example OMSSA data set (with mods.xml and usermods.xml files) can also be found in the downloads section. Note that you need an XML Pull Parser implementation in the classpath. OMSSA Parser was developed with the Extreme Pull Parser, version 1.1.4C. More information about this XML parser can be found here. The source code for OMSSA Parser can be viewed and downloaded by clicking the Source tab. The code has been developed in NetBeans. Javadoc for OMSSA Parser can be found in the downloads section, or can be generated from the source code.

Using OMSSA Parser
There are two ways of using OMSSA Parser: through OMSSA Viewer, or directly through the OMSSA Parser jar file. Both are described below.
OMSSA Viewer
To visualize and analyze OMSSA results, we now recommend PeptideShaker. OMSSA Viewer was mainly created to show the usefulness of OMSSA Parser and as an example of how to use the parser, but it is also an easy way of visualizing the content of an omx file. It displays the spectrum file details (spectrum number, file name, charge and m/z value), the identification details for a selected spectrum (peptide sequence, E-value, P-value, protein accession number, etc.), and the selected spectrum itself (both as a table and as a graph), including ion coverage, all in the same frame. OMSSA Viewer can also export the different components of an omx file as tab-delimited text files (compatible with Microsoft Excel and other spreadsheets), and all the spectra as dta files. A screenshot of OMSSA Viewer can be found here.

Using OMSSA Viewer is straightforward. Download the latest version of the OMSSA Parser zip file from the downloads section, unzip the file and double-click the jar file. If nothing happens, install Java 1.5 (or newer) and try again; if this does not fix the problem, see the Troubleshooting section below. In the file selection dialog, select the OMSSA omx file you want to view and, if available, the OMSSA modification files (mods.xml and usermods.xml), then click 'OK' to start parsing the files. A graphical user interface will then appear, displaying a table with the details of all the spectra found in the file. Clicking a row in this table brings up the associated identifications and the spectrum details, including ion coverage. (See also the screenshot.)

OMSSA Parser Jar File
Using OMSSA Parser directly requires that you create your own Java project that uses the OMSSA Parser jar file as a library. The library consists of three packages: (i) de.proteinms.omxparser, (ii) de.proteinms.omxparser.util and (iii) de.proteinms.omxparser.tools.
(A complete UML class diagram can be found in the downloads section.) The de.proteinms.omxparser package only contains the OmssaOmxFile class, which manages the parsing of the omx file and provides several methods to easily retrieve the parsed information. To parse an omx file into an OmssaOmxFile object, use the class's constructor with the following input parameters: (i) the omx file, (ii) the OMSSA mods.xml file, and (iii) the OMSSA usermods.xml file. (The latter two are the files where OMSSA stores the properties of the amino acid modifications.) Only the first file is mandatory, but if the modification files are not provided, no information about the modifications besides a modification number can be extracted. Creating an OmssaOmxFile object also creates an OmxParser object that does the actual parsing of the omx file. The OmssaOmxFile then processes the results and builds several data structures that make it easier to extract commonly used information from the omx file. These are accessed through methods such as getParserResult(), which returns all data from the original omx file as an MSSearch object, and getSpectrumToPeptideMap(), which returns a HashMap where every spectrum is mapped to its corresponding peptides. (See the Javadoc for a complete list of the available methods.)

The OmxParser class is located in the de.proteinms.omxparser.util package, along with the other classes needed for the actual parsing. The structure of the OMSSA Parser code closely follows the omx file structure, so when using the parser it is recommended to keep a copy of OMSSA.mod.xsd at hand as a reference. The Example OMX File section below contains excerpts of an omx file.

As an example of how to use OMSSA Parser, the following code extracts the m/z-values of all the spectra in a file (see the Spectra excerpt in the Example OMX File section):
import java.util.*;
import de.proteinms.omxparser.OmssaOmxFile;
import de.proteinms.omxparser.util.*;
// Parse the omx file (the mods.xml and usermods.xml files are optional)
OmssaOmxFile omxFile = new OmssaOmxFile("C:\\OMSSA_Files\\BSA.omx");
// Map every spectrum to its hit set (identifications)
HashMap<MSSpectrum, MSHitSet> results = omxFile.getSpectrumToHitSetMap();
Iterator<MSSpectrum> iterator = results.keySet().iterator();
// One list of (still integer-scaled) m/z-values per spectrum
ArrayList<List<Integer>> allMzValues = new ArrayList<List<Integer>>();
while (iterator.hasNext()) {
    MSSpectrum tempSpectrum = iterator.next();
    allMzValues.add(tempSpectrum.MSSpectrum_mz.MSSpectrum_mz_E);
}
The OMSSA Parser code uses public fields to move from a tag to its subtags, e.g., MSSpectrum_mz.MSSpectrum_mz_E moves from the MSSpectrum_mz tag to its subtag MSSpectrum_mz_E. Whenever a subtag can occur more than once, a linked list is used, e.g., MSSpectrum_mz_E is a linked list of integer values containing all the m/z-values for a given spectrum.

The last package in OMSSA Parser is de.proteinms.omxparser.tools, which contains a simple, lightweight viewer for OMSSA omx files called OMSSA Viewer.

In addition to the packages detailed above, OMSSA Parser requires the following four libraries: xpp3-1.1.4c.jar (for the XML parsing), utilities-2.9.jar (for the spectrum graph), looks-2.2.0.jar (for the graphical user interface) and log4j-1.2.15.jar (for logging). All of these are available in Maven-compliant repositories, and they are also included in the OMSSA Parser zip file.

UML Class Diagram
See the downloads section. Unfortunately, the diagram is too big to display directly in the browser.

Modification Details
In the omx file, the only information present about a given amino acid modification is its location in the peptide and a modification reference number. To extract additional information about a modification, e.g., its type or mass, the two XML files mods.xml and usermods.xml are needed. These are normally located in the OMSSA installation folder and, if provided, allow you to map a given modification reference number to an OmssaModification object containing the details of the modification. Use the method getModifications() in the OmssaOmxFile class to extract this information after parsing an omx file.

OMSSA Enumerations
The OMSSA omx file uses several enumerations. When parsing a tag that refers to an enumeration, an integer ID is returned. To map this ID to the corresponding text element, use the OmssaEnumerators class. This class has maps for all enumerations, e.g., MSEnzymes, MSIonType, etc.
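The OmssaEnumerators class does this lookup for you; its exact method names are not given on this page, so the sketch below does not call it. Instead it illustrates the idea with a small, hypothetical lookup table (IonTypeLookup is not part of OMSSA Parser) for the MSIonType enumeration. The IDs 0 to 5 for a, b, c, x, y and z ions are assumed from OMSSA's MSIonType definition; check OMSSA.mod.xsd or OmssaEnumerators for the authoritative list.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

public class IonTypeLookup {

    // HYPOTHETICAL table, not part of OMSSA Parser: MSIonType IDs -> ion type names.
    // Only IDs 0-5 (a, b, c, x, y, z ions) are included; OmssaEnumerators is the real source.
    private static final Map<Integer, String> ION_TYPES;

    static {
        Map<Integer, String> map = new HashMap<Integer, String>();
        String[] names = {"a", "b", "c", "x", "y", "z"};
        for (int id = 0; id < names.length; id++) {
            map.put(id, names[id]);
        }
        ION_TYPES = Collections.unmodifiableMap(map);
    }

    // Returns the ion type name for an MSIonType ID, or null if the ID is not in this table.
    public static String ionTypeName(int id) {
        return ION_TYPES.get(id);
    }

    public static void main(String[] args) {
        // <MSIonType>1</MSIonType> from the Identifications excerpt below
        System.out.println(ionTypeName(1)); // prints b
    }
}
```

In real code, prefer the maps in OmssaEnumerators over a hand-written table, so that the names stay in sync with the library and cover every enumeration value.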
Providing the integer ID to the relevant map returns the corresponding text element.

Scaled Values
Note that all the m/z and abundance values in the omx file are stored as integers and need to be scaled, i.e., divided by the corresponding scale factor, to obtain the real values. Each spectrum has its own abundance scale, MSSpectrum_iscale, while the m/z scales are given by MSSearchSettings_scale and MSResponse_scale.

Converting OMSSA OMX Files to PRIDE XML
If you want to convert your OMSSA omx file into PRIDE XML for submission to the online PRIDE repository, please check out PRIDE Converter, which uses OMSSA Parser for this purpose.

Result Analysis
To visualize and analyze OMSSA results, we recommend PeptideShaker, a search engine independent platform for visualizing peptide and protein identification results from multiple search engines.

Maven Dependency
OMSSA Parser is available for use in Maven projects:
<dependency>
<groupId>de.proteinms.omxparser</groupId>
<artifactId>omssa-parser</artifactId>
<version>X.Y.Z</version>
</dependency>
<repository>
<id>genesis-maven2-repository</id>
<name>Genesis maven2 repository</name>
<url>http://genesis.UGent.be/maven2</url>
</repository>
Update the version number to the latest released version. Note that OMSSA Parser does not yet build with Maven 3. We are working on fixing this, but in the meantime please use Maven 2 to build OMSSA Parser.

Troubleshooting
Memory Settings
If you are parsing a large omx file (say, bigger than 100 MB), you need to allow the Java Virtual Machine (JVM) more memory. You can do this by adding the following start-up argument to the java command:
java -Xmx999m ...
where you can replace '999' with the maximum number of megabytes you want the JVM to use. When using OMSSA Viewer, the memory limit can be set in the JavaOptions.txt file in the ../Properties folder, using the same syntax as above. Note that most 32-bit platforms only allow memory allocations of up to 1500 megabytes.

MSBioSeq
Note that the OMSSA omx file may contain references to the Bioseq module/schema from NCBI-Sequence. These elements are currently not parsed by OMSSA Parser, but files containing such elements are still parsed without errors.

Screenshot

Example OMX File
For the complete file, see the downloads section.

Spectra
<MSSearch ...>
<MSSearch_request>
<MSRequest>
<MSRequest_spectra>
<MSSpectrumset>
<MSSpectrum>
<MSSpectrum_number>0</MSSpectrum_number>
<MSSpectrum_charge>
<MSSpectrum_charge_E>1</MSSpectrum_charge_E>
</MSSpectrum_charge>
<MSSpectrum_precursormz>815340</MSSpectrum_precursormz>
<MSSpectrum_mz>
<MSSpectrum_mz_E>674700</MSSpectrum_mz_E>
...
<MSSpectrum_mz_E>850500</MSSpectrum_mz_E>
</MSSpectrum_mz>
<MSSpectrum_abundance>
<MSSpectrum_abundance_E>149900000</MSSpectrum_abundance_E>
...
<MSSpectrum_abundance_E>119000000</MSSpectrum_abundance_E>
</MSSpectrum_abundance>
<MSSpectrum_iscale>100000</MSSpectrum_iscale>
<MSSpectrum_ids>
<MSSpectrum_ids_E>LCQ10486.10.10.1.dta</MSSpectrum_ids_E>
</MSSpectrum_ids>
</MSSpectrum>
...
</MSSpectrumset>
</MSRequest_spectra>
…
</MSRequest>
</MSSearch_request>
…
</MSSearch>

Identifications
<MSSearch ...>
…
<MSSearch_response>
<MSResponse>
<MSResponse_hitsets>
<MSHitSet>
<MSHitSet_number>21</MSHitSet_number>
<MSHitSet_hits>
<MSHits>
<MSHits_evalue>3.95569513003838e-006</MSHits_evalue>
<MSHits_pvalue>5.34553395951132e-009</MSHits_pvalue>
<MSHits_charge>2</MSHits_charge>
<MSHits_pephits>
<MSPepHit>
<MSPepHit_start>528</MSPepHit_start>
<MSPepHit_stop>543</MSPepHit_stop>
<MSPepHit_gi>1351907</MSPepHit_gi>
<MSPepHit_accession>P02769</MSPepHit_accession>
<MSPepHit_defline>Serum albumin precursor (Allergen Bos d 6) (BSA)</MSPepHit_defline>
<MSPepHit_protlength>607</MSPepHit_protlength>
<MSPepHit_oid>1064518</MSPepHit_oid>
</MSPepHit>
</MSHits_pephits>
<MSHits_mzhits>
<MSMZHit>
<MSMZHit_ion>
<MSIonType>1</MSIonType>
</MSMZHit_ion>
<MSMZHit_charge>1</MSMZHit_charge>
<MSMZHit_number>4</MSMZHit_number>
<MSMZHit_mz>646335</MSMZHit_mz>
</MSMZHit>
...
</MSHits_mzhits>
<MSHits_pepstring>LFTFHADICTLPDTEK</MSHits_pepstring>
<MSHits_mass>1908857</MSHits_mass>
<MSHits_pepstart>K</MSHits_pepstart>
<MSHits_pepstop>Q</MSHits_pepstop>
<MSHits_theomass>1906913</MSHits_theomass>
</MSHits>
</MSHitSet_hits>
<MSHitSet_ids>
<MSHitSet_ids_E> LCQ10486.10.10.1.dta</MSHitSet_ids_E>
</MSHitSet_ids>
<MSHitSet_settingid>0</MSHitSet_settingid>
</MSHitSet>
...
</MSResponse_hitsets>
...
</MSResponse>
</MSSearch_response>
</MSSearch>
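The Scaled Values section explains that m/z and abundance values are stored as integers. The sketch below applies that scaling to the Spectra excerpt, dividing each stored value by its scale. It is a minimal illustration using only the JDK's built-in DOM parser (javax.xml.parsers), not OMSSA Parser, and the class and method names are hypothetical. The abundance scale is read from MSSpectrum_iscale; the m/z scale of 1000 is an assumption, because neither MSSearchSettings_scale nor MSResponse_scale appears in the excerpts, so read the real value from your own file.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class ScaledSpectrum {

    // Trimmed copy of the MSSpectrum excerpt above, with the "..." elisions removed.
    static final String SPECTRUM_XML = "<MSSpectrum>"
            + "<MSSpectrum_precursormz>815340</MSSpectrum_precursormz>"
            + "<MSSpectrum_mz><MSSpectrum_mz_E>674700</MSSpectrum_mz_E>"
            + "<MSSpectrum_mz_E>850500</MSSpectrum_mz_E></MSSpectrum_mz>"
            + "<MSSpectrum_abundance><MSSpectrum_abundance_E>149900000</MSSpectrum_abundance_E>"
            + "<MSSpectrum_abundance_E>119000000</MSSpectrum_abundance_E></MSSpectrum_abundance>"
            + "<MSSpectrum_iscale>100000</MSSpectrum_iscale>"
            + "</MSSpectrum>";

    // ASSUMPTION: m/z scale of 1000. The real value comes from MSSearchSettings_scale
    // or MSResponse_scale, which the excerpts do not show.
    static final long ASSUMED_MZ_SCALE = 1000;

    // Parses an XML string and returns its root element (plain JDK DOM, not OMSSA Parser).
    static Element parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml))).getDocumentElement();
    }

    // Reads every <tag> element below 'parent' as an integer and divides it by 'scale'.
    static List<Double> scaledValues(Element parent, String tag, long scale) {
        List<Double> values = new ArrayList<Double>();
        NodeList nodes = parent.getElementsByTagName(tag);
        for (int i = 0; i < nodes.getLength(); i++) {
            long stored = Long.parseLong(nodes.item(i).getTextContent().trim());
            values.add((double) stored / scale);
        }
        return values;
    }

    public static void main(String[] args) throws Exception {
        Element spectrum = parse(SPECTRUM_XML);
        // Abundances use the per-spectrum scale MSSpectrum_iscale.
        long iscale = Long.parseLong(spectrum.getElementsByTagName("MSSpectrum_iscale")
                .item(0).getTextContent().trim());
        System.out.println(scaledValues(spectrum, "MSSpectrum_abundance_E", iscale)); // [1499.0, 1190.0]
        // m/z-values use the (assumed) m/z scale.
        System.out.println(scaledValues(spectrum, "MSSpectrum_mz_E", ASSUMED_MZ_SCALE));
    }
}
```

With OMSSA Parser itself, you would read the same integers from the public fields described earlier (e.g., MSSpectrum_mz.MSSpectrum_mz_E) and divide them in the same way; the DOM parsing here only keeps the sketch self-contained.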
