READ-ONLY: This project has been archived. For more information see this post.
Issue 3: Reverse match database representation
Project Member Reported by rr.weinb...@gmail.com, Mar 29, 2011
We need to fix the way reverse matches are stored in the db. Currently the ProteinAcc for a reverse match is rm, the ProteinDec is rm:########, and the AccType is NCBIAcc. Instead, the ProteinAcc should be rm:##### and the AccType should be some pre-defined type.

Probably the easiest way is to fix the code that parses the hits so that it recognizes rm as a reverse match and stores rm###### as the ProteinAcc and ReverseMatch (instead of NCBIAcc) as the AccType. This is probably in the parse hits script.
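
A minimal sketch of the recognition step described above, written in Python for illustration only (the actual parse hits script is not shown in this issue). The function name `classify_hit`, its arguments, and the handling of non-rm hits are assumptions. The `rm` prefix and the `ReverseMatch` / `NCBIAcc` type names come from the description.

```python
# Hypothetical constants: the type names are taken from the issue text.
# REVERSE_MATCH_TYPE must fit the AccType column's length in the real schema.
REVERSE_MATCH_TYPE = "ReverseMatch"
DEFAULT_TYPE = "NCBIAcc"


def classify_hit(source, identifier, separator=""):
    """Return (protein_acc, acc_type) for one parsed hit.

    source     -- database tag found on the hit, e.g. 'rm' (assumed input shape)
    identifier -- the id that follows the tag
    separator  -- text placed between 'rm' and the id (assumption; the issue
                  mentions more than one form)
    """
    if source.lower() == "rm":
        # Reverse match: keep the id in ProteinAcc and use a dedicated type.
        return ("rm" + separator + identifier, REVERSE_MATCH_TYPE)
    # Everything else keeps the existing behaviour (assumed here).
    return (identifier, DEFAULT_TYPE)
```

For example, `classify_hit("rm", "12345")` yields a ProteinAcc of `rm12345` with the reverse-match type, while other hits are left as NCBI accessions.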

We can do a blanket fix for the records currently in the db:
update TppProtein set AccType = "ReverseMatch" where ProteinAcc = "rm";
update TppProtein set ProteinAcc = ProteinDec where ProteinAcc = "rm";

With the reverse matches represented correctly, we will be able to incorporate them into the Saint output and will be better able to vary the false positives in the input to Saint (and see how it performs).
Mar 29, 2011
Project Member #1 rr.weinb...@gmail.com
Updated records on Tin using the following commands (couldn't use "ReverseMatch" because the field length wasn't long enough):
update TppProtein set AccType="RM" where ProteinAcc="rm";

update TppProtein set ProteinAcc=ProteinDec where AccType="RM";

Reverse Matches are now in the TppProtein table correctly.

Need to update:
  1.  Code that parses the hits to recognize reverse matches and enter them correctly.
  2. TppPeptide view to not have a link to NCBI for reverse matches
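
A minimal sketch of how the view could skip the NCBI link for reverse matches, in Python for illustration only. The function name `protein_link` and the URL prefix are assumptions, since the view's code is not shown here. The only thing taken from this thread is that reverse matches carry AccType "RM".

```python
# Assumed link prefix; the real view may build its NCBI links differently.
NCBI_PROTEIN_URL = "https://www.ncbi.nlm.nih.gov/protein/"


def protein_link(protein_acc, acc_type):
    """Return an NCBI URL for a protein, or None for reverse matches."""
    if acc_type == "RM":
        # Reverse matches are decoys with no NCBI record, so no link.
        return None
    return NCBI_PROTEIN_URL + protein_acc
```

The view would then render plain text whenever `protein_link` returns None.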
Mar 29, 2011
Project Member #2 chocomoo...@gmail.com
I would recommend informing Frank that you want to make this change, since it will affect his code.  I can easily change the code for inserting rm proteins in import_stt.pl; however, I will wait until I get some kind of confirmation that we're all on the same page.
Mar 29, 2011
Project Member #3 rr.weinb...@gmail.com
Did you throw out all rm hits with the import_stt script?
Mar 29, 2011
Project Member #4 chocomoo...@gmail.com
No, I kept them, following the convention that already existed in the database.  In particular, I noted in my script that:

When this script was written, it was found that 'rm' source proteins appeared in the database with their ProteinAcc value as 'rm', and their ProteinDec as 'rm $identifier', where $identifier is some generic and meaningless id.  Also, it was found that marker peptides, which are proteins with a source of 'JH001', were inserted with ProteinAcc = 'JH001'.  As such, we follow this convention.  (I also noted that their AccType wasn't specified).
Mar 29, 2011
Project Member #5 rr.weinb...@gmail.com
Added an additional change so we can get rm data in the correct format (we need to make sure that AccType is checked when matching against the table, because we don't want reverse matches to be mistaken for GIs or EGs):

update TppProtein set ProteinAcc = trim(LEADING 'rm ' from ProteinAcc) where AccType='RM';
Mar 29, 2011
Project Member #6 rr.weinb...@gmail.com
There is a reference to the ProteinAcc in the TppPeptide table that also needs to be updated:

select PR.ProteinAcc,PR.ID, PR.BandID, PEG.ID as GroupID, PEP.ID as PeptideID, PEP.Protein 
from TppProtein PR, TppPeptideGroup PEG, TppPeptide PEP 
where PR.AccType = "RM" and PR.ID = PEG.ProteinID and PEG.ID = PEP.GroupID and PR.BandID = PEP.BandID;

Need indexes on:
  TppProtein: ID, BandID, AccType
  TppPeptideGroup: ID, ProteinID
  TppPeptide: GroupID, BandID
Missing index: TppPeptideGroup ProteinID

create index TPG_protid on TppPeptideGroup(ProteinID) using BTREE;
Apr 13, 2011
Project Member #7 rr.weinb...@gmail.com
Procedure:

update TppProtein set AccType="RM" where ProteinAcc="rm";

update TppProtein set ProteinAcc=ProteinDec where AccType="RM";

update TppProtein set ProteinAcc = replace(ProteinAcc,'rm ','rm|') where AccType='RM';
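
The procedure above can be tried on throwaway data before running it on a live database. This is a minimal sketch in Python using an in-memory SQLite database as a stand-in for the real server. The table definition and the sample rows are assumptions (only the column names ProteinAcc, ProteinDec and AccType come from this thread), and the string literals use single quotes so the statements are portable.

```python
import sqlite3


def run_procedure(conn):
    """Apply the three updates from the procedure, in order."""
    cur = conn.cursor()
    # 1. Tag reverse matches while ProteinAcc still holds the bare 'rm'.
    cur.execute("UPDATE TppProtein SET AccType = 'RM' WHERE ProteinAcc = 'rm'")
    # 2. Copy the 'rm <id>' description into ProteinAcc.
    cur.execute("UPDATE TppProtein SET ProteinAcc = ProteinDec WHERE AccType = 'RM'")
    # 3. Replace the space after 'rm' with a pipe.
    cur.execute(
        "UPDATE TppProtein SET ProteinAcc = replace(ProteinAcc, 'rm ', 'rm|') "
        "WHERE AccType = 'RM'"
    )
    conn.commit()


def demo():
    """Run the procedure on two made-up rows and return the result."""
    conn = sqlite3.connect(":memory:")
    # Assumed minimal schema; the real table has more columns (e.g. BandID).
    conn.execute(
        "CREATE TABLE TppProtein ("
        "ID INTEGER PRIMARY KEY, ProteinAcc TEXT, ProteinDec TEXT, AccType TEXT)"
    )
    conn.executemany(
        "INSERT INTO TppProtein (ProteinAcc, ProteinDec, AccType) VALUES (?, ?, ?)",
        [
            ("rm", "rm 00012345", "NCBIAcc"),       # made-up reverse match
            ("4501885", "some protein", "NCBIAcc"),  # made-up ordinary hit
        ],
    )
    run_procedure(conn)
    rows = conn.execute(
        "SELECT ProteinAcc, AccType FROM TppProtein ORDER BY ID"
    ).fetchall()
    conn.close()
    return rows
```

The order of the first two statements matters: the AccType has to be set while ProteinAcc is still the bare 'rm', because the second statement overwrites that value.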

