Issue 56: PeptideEvidence in mzIdentML
Status:  Fixed
Owner:  ----
Closed:  Apr 2011


Project Member Reported by johannes...@gmail.com, Feb 11, 2011
This issue focuses on the proposal to move the PeptideEvidence (PE)
object from being a child object of SpectrumIdentificationItem (SII) to
the same level as DBSequence and Peptide.

The current schema forces software and databases to create a vast number
of PE objects, which creates significant computational overhead and
makes working with the file format extremely difficult.

The idea of this proposal is to use PE to represent the link between
Peptides and Protein Sequences. Basically, to represent the phrase "a
certain peptide in a certain protein at a certain position". Therefore,
a Peptide_ref was added to PE and PE was moved to be a child element of SequenceCollection.
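
To illustrate why this reduces the number of PE objects, here is a minimal Python sketch. The class, field names and ids are assumptions made up for the example and are not taken from the schema; only the idea that a PE links a Peptide to a protein sequence at a position comes from the proposal.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PeptideEvidence:
    """Hypothetical model of 'a certain peptide in a certain protein at a certain position'."""
    id: str
    peptide_ref: str      # id of a Peptide
    db_sequence_ref: str  # id of a DBSequence
    start: int
    end: int


# With PE under SequenceCollection, each peptide-in-protein link is stored once.
sequence_collection = [
    PeptideEvidence("PE_1", "pep_1", "prot_A", 10, 18),
    PeptideEvidence("PE_2", "pep_1", "prot_B", 42, 50),
]

# Three spectra identify the same peptide (illustrative data).
sii_peptide_refs = {"SII_1": "pep_1", "SII_2": "pep_1", "SII_3": "pep_1"}

# If PE were nested under SII, every SII would need its own copy of each PE.
pe_count_nested = sum(
    sum(1 for pe in sequence_collection if pe.peptide_ref == peptide)
    for peptide in sii_peptide_refs.values()
)

# With the proposed layout, the count does not depend on the number of SIIs.
pe_count_shared = len(sequence_collection)
```

In this toy data the nested layout needs one PE per SII per protein match, while the shared layout needs one PE per protein match only.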

As PE is no longer enzyme-specific, the enzyme-specific attribute "missedCleavages" was removed. It has to be discussed whether this information should be present in the schema and, if so, where it should be put.

The SII was adapted to now hold a number of 0 .. n references to PE.
These optional references should represent all "peptides in proteins"
that could be inferred from the spectrum without any regard to protein
inference. It is also valid to just provide a Peptide_ref without any PE
refs in a SII. In such a case, the API would generate a list of all
possible PEs associated with this SII.

Protein inference should now be handled completely at the
ProteinDetectionHypothesis (PDH) level. Therefore, the
PeptideDetectionHypothesis was adapted to hold a number of 1..n SII_refs
(together with the PE ref as attribute). This would resemble the
statement that the PDH is backed up by this PE identified through the
following SIIs.
Attachment: mzIdentML1.1.0.xsd (110 KB)
Feb 14, 2011
#1 a...@cuckundoorecords.com
I like the look of PE being in the sequence collection.

It appears to me that if we want to keep missedCleavages (and I would vote that we do), it makes sense to put it on SII, either as an attribute, a new element or a cvParam.

For all simple cases, I think this model holds up. For complex cases with multiple enzymes, we may need to model which enzyme this refers to, although this is probably not covered in the 1.0 schema either.

"It is also valid to just provide a Peptide_ref without any PE
refs in a SII. In such a case, the API would generate a list of all
possible PEs associated with this SII."

Intuitively I don't like the sound of this, some file writers would produce PEs, others would not. If all this can be inferred by an API, the argument goes that PE is not needed at all. However, we do not enforce that the protein sequence be reported (since for some output formats this is not always possible without the searched database) so an API would not be able to infer pre, post or position. I would prefer that PE must be reported for all valid peptide to protein matches by the file writer

"Protein inference should now be handled completely at the
ProteinDetectionHypothesis (PDH) level. Therefore, the
PeptideDetectionHypothesis was adapted to hold a number of 1..n SII_refs
(together with the PE ref as attribute). This would resemble the
statement that the PDH is backed up by this PE identified through the
following SIIs."

Generally I agree with linking to SIIs from PDH. I'm coming round to the idea of also including the PE_ref, as a quick link to get to non redundant peptides identified without going via all SIIs. It makes a bit more work for writers but for some use cases, saves work for file readers. If we stick with this though, again I think PE cannot be optional.
Feb 14, 2011
Project Member #2 johannes...@gmail.com
"Intuitively I don't like the sound of this, some file writers would produce PEs, others would not. If all this can be inferred by an API, the argument goes that PE is not needed at all. However, we do not enforce that the protein sequence be reported (since for some output formats this is not always possible without the searched database) so an API would not be able to infer pre, post or position. I would prefer that PE must be reported for all valid peptide to protein matches by the file writer"

This was meant differently. It is not the PEs that are optional but the references from SII to PE, i.e. the PE_refs. As all PEs link to a Peptide, the PEs that an SII refers to can be inferred from the Peptide(_ref). It is still mandatory to provide all possible PEs.
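
A reader could resolve this roughly as follows. This is only a sketch under assumptions: the dictionary keys (`peptide_ref`, `pe_refs`) and the helper names are hypothetical and do not describe an existing API.

```python
from collections import defaultdict


def index_by_peptide(evidence):
    """Map each peptide id to the ids of all PEs that reference it.

    `evidence` is an iterable of (pe_id, peptide_ref) pairs (assumed input shape).
    """
    index = defaultdict(list)
    for pe_id, peptide_ref in evidence:
        index[peptide_ref].append(pe_id)
    return index


def resolve_pe_refs(sii, index):
    """Return the explicit PE refs of an SII, or infer them from its Peptide_ref."""
    explicit = sii.get("pe_refs")
    if explicit:
        return list(explicit)
    # No PE_refs given: fall back to every PE that links to the same peptide.
    return list(index.get(sii["peptide_ref"], []))


# All PEs are still written to the file; only the SII -> PE references are optional.
evidence = [("PE_1", "pep_1"), ("PE_2", "pep_1"), ("PE_3", "pep_2")]
index = index_by_peptide(evidence)

sii_without_refs = {"id": "SII_1", "peptide_ref": "pep_1"}
sii_with_refs = {"id": "SII_2", "peptide_ref": "pep_1", "pe_refs": ["PE_2"]}
```

The inference works only because every PE is mandatory and carries a Peptide_ref; it does not require the protein sequence itself.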

"Generally I agree with linking to SIIs from PDH. I'm coming round to the idea of also including the PE_ref, as a quick link to get to non redundant peptides identified without going via all SIIs. It makes a bit more work for writers but for some use cases, saves work for file readers. If we stick with this though, again I think PE cannot be optional."
This is exactly our current proposal. The PeptideHypothesis contains the PE_ref as attribute and a list of SII_refs as child elements (since several SIIs can link to the same PE). In this list, only the SIIs that were used for scoring should be included.
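
As a rough illustration of that structure, the following Python sketch models a PeptideHypothesis with a PE_ref and a list of SII_refs. The class layout, the `peptides_for` helper and all ids are assumptions for the example, not part of the schema.

```python
from dataclasses import dataclass, field


@dataclass
class PeptideHypothesis:
    pe_ref: str                                    # PE_ref attribute
    sii_refs: list = field(default_factory=list)   # SIIs used for scoring


@dataclass
class ProteinDetectionHypothesis:
    id: str
    peptide_hypotheses: list = field(default_factory=list)


def peptides_for(pdh, pe_to_peptide):
    """Non-redundant peptide ids behind a PDH, in first-seen order.

    Uses only the PE_refs, so no SII has to be visited.
    """
    seen = []
    for hypothesis in pdh.peptide_hypotheses:
        peptide = pe_to_peptide[hypothesis.pe_ref]
        if peptide not in seen:
            seen.append(peptide)
    return seen


pdh = ProteinDetectionHypothesis("PDH_1", [
    PeptideHypothesis("PE_1", ["SII_1", "SII_2"]),  # two SIIs back the same PE
    PeptideHypothesis("PE_3", ["SII_7"]),
])
pe_to_peptide = {"PE_1": "pep_1", "PE_3": "pep_2"}
```

This is the "quick link" use case from the quoted comment: the writer does a little more work, and the reader gets the peptide list directly from the PE_refs.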

If we want to keep enzyme-specific information, I would not want to put it into SII. Even though this might be convenient at the moment, it does not reflect the nature of the information. Basically, it is a property of a Peptide with respect to a certain protein and thus belongs at the PE level. In my opinion, parameters or sub-elements with a reference to the respective enzymes seem better suited.
Feb 28, 2011
Project Member #3 johannes...@gmail.com
As discussed in the previous mzIdentML conference, we updated the schema
proposal to solve the problem of enzyme-specific information at the
PeptideEvidence (PE) level. A new element called "PeptideEvidenceList" (PEList) was created under "SequenceCollection". These 1:n PELists contain 1:n PEs plus 0:n enzyme references (and optional cv / user parameters). Furthermore, EnzymeType was changed to be an extension of "IdentifiableType".

If a protocol with two enzymes (A and B) is being used, PEs can now be
grouped according to the enzyme(s) they come from. For example: one
PEList for all PEs from enzyme A, one for all PEs from enzyme B, and a
third PEList for all PEs where it is not certain whether they come from
A or B (this list then contains two enzyme references).
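
A writer might build these lists along the following lines. This is a minimal sketch: the input mapping, the function name and the enzyme ids are hypothetical, and a real writer would emit PEList elements rather than a dictionary.

```python
from collections import defaultdict


def group_into_pe_lists(pe_enzymes):
    """Group PE ids by the set of enzymes they may come from.

    `pe_enzymes` maps a PE id to the enzyme ids it is attributed to
    (assumed input shape). Each distinct enzyme set yields one PEList.
    """
    groups = defaultdict(list)
    for pe_id, enzymes in pe_enzymes.items():
        groups[frozenset(enzymes)].append(pe_id)
    return dict(groups)


pe_lists = group_into_pe_lists({
    "PE_1": ["enzyme_A"],
    "PE_2": ["enzyme_B"],
    "PE_3": ["enzyme_A", "enzyme_B"],  # ambiguous: could come from A or B
    "PE_4": ["enzyme_A"],
})
```

Here the key of each group plays the role of the PEList's enzyme references, so the ambiguous group carries two references.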

As enzyme-specific information should no longer be a problem at the
PE level, the previously removed attribute "missedCleavages" was added
again.

Additionally, we simplified the names of several elements by removing,
for example, the "PSI....." part from the beginning of the name. Finally,
we changed "SearchModificationType" as proposed in the last call. ModParam
was removed and all its attributes as well as the cvParam were added to
"SearchModificationType". Furthermore, the multiplicity of cvParam was
changed to 1:n.

The proposed schema was added to the repository:
https://code.google.com/p/psi-pi/source/browse/trunk/schema/mzIdentML1.1.0.xsd
Apr 3, 2011
Project Member #5 eisena...@googlemail.com
(No comment was entered for this change.)
Labels: -Version1.1 Milestone-Release1.1
Apr 12, 2011
Project Member #6 eisena...@googlemail.com
agreed at Heidelberg
Status: Fixed