Issue 3: Reverse match database representation
We need to fix the way reverse matches are stored in the db. Currently the ProteinAcc for a reverse match is rm, the ProteinDec is rm:########, and the AccType is NCBIAcc. The ProteinAcc should be rm:##### and the AccType should be some pre-defined type.

Probably the easiest way is to fix the code that parses the hits so that it recognizes rm as a reverse match and sets the ProteinAcc to rm###### and the AccType to ReverseMatch (instead of NCBIAcc). This is probably in the parse hits script.

We can do a blanket fix for the records currently in the db:

update TppProtein set AccType = "ReverseMatch" where ProteinAcc = "rm"
update TppProtein set ProteinAcc = ProteinDec where ProteinAcc = "rm"

With the reverse matches represented correctly, we will be able to incorporate them into the Saint output and better vary the false positive results in the input to Saint (and see how it performs).
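As a rough illustration of the parser-side fix described above, here is a minimal Python sketch. The function name `reverse_match_fields`, its two arguments, and the assumption that every non-rm hit keeps the NCBIAcc type are all hypothetical; the real logic lives in the parse hits script, which is not shown in this issue.

```python
def reverse_match_fields(source, identifier):
    """Return (ProteinAcc, AccType) for one parsed hit.

    Hypothetical helper: the issue only says the parser should recognize
    'rm' as a reverse match; the input shape is an assumption.
    """
    if source == "rm":
        # Use the 'rm:<id>' accession form proposed in the issue text.
        return f"rm:{identifier}", "ReverseMatch"
    # Assumption: all other hits keep the NCBIAcc type mentioned in the issue.
    return identifier, "NCBIAcc"


rm_acc, rm_type = reverse_match_fields("rm", "12345")
other_acc, other_type = reverse_match_fields("gi", "67890")
```

The separator and the AccType label are the ones proposed in the issue; whatever format is finally agreed on would need to be substituted here.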
Mar 29, 2011
Project Member
#1
rr.weinb...@gmail.com
Mar 29, 2011
I would recommend informing Frank that you want to make this change, since it will affect his code. I can easily change the code for inserting rm proteins in import_stt.pl; however, I will wait until I get some kind of confirmation that we're all on the same page.
Mar 29, 2011
Did you throw out all rm hits with the import_stt script?
Mar 29, 2011
No, I kept them, following the convention that already existed in the database. In particular, I noted in my script that: When this script was written, it was found that 'rm' source proteins appeared in the database with their ProteinAcc value as 'rm', and their ProteinDec as 'rm $identifier', where $identifier is some generic and meaningless id. Also, it was found that marker peptides, which are proteins with a source of 'JH001', were inserted with ProteinAcc = 'JH001'. As such, we follow this convention. (I also noted that their AccType wasn't specified).
Mar 29, 2011
Added an additional change so we can get the rm data in the correct format (we need to make sure that AccType is checked when matching against the table, because we don't want reverse matches to be mistaken for GIs or EGs):

update TppProtein set ProteinAcc = trim(LEADING 'rm ' from ProteinAcc) where AccType='RM';
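To show why the AccType check matters once the prefix is stripped, here is a small sketch using Python's sqlite3 module with an in-memory table. The table is cut down to two columns and the sample rows are made up. SQLite has no `trim(LEADING ... from ...)`, so `substr` stands in for the MySQL call above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE TppProtein (ID INTEGER PRIMARY KEY, ProteinAcc TEXT, AccType TEXT)"
)
# Made-up rows: a reverse match and a GI that share the same numeric id.
conn.executemany(
    "INSERT INTO TppProtein (ProteinAcc, AccType) VALUES (?, ?)",
    [("rm 12345", "RM"), ("12345", "GI")],
)

# SQLite stand-in for MySQL's trim(LEADING 'rm ' from ProteinAcc):
# drop the three-character 'rm ' prefix from reverse-match rows only.
conn.execute(
    "UPDATE TppProtein SET ProteinAcc = substr(ProteinAcc, 4) "
    "WHERE AccType = 'RM' AND ProteinAcc LIKE 'rm %'"
)

# Matching on ProteinAcc alone now confuses the reverse match with the GI.
by_acc = conn.execute(
    "SELECT COUNT(*) FROM TppProtein WHERE ProteinAcc = '12345'"
).fetchone()[0]

# Adding AccType to the match keeps them apart.
by_acc_and_type = conn.execute(
    "SELECT COUNT(*) FROM TppProtein WHERE ProteinAcc = '12345' AND AccType = 'GI'"
).fetchone()[0]
```

With this sample data the accession-only lookup matches both rows, while the lookup that also checks AccType matches only the GI.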
Mar 29, 2011
There is a reference to the ProteinAcc in the TppPeptide table that also needs to be updated:

select PR.ProteinAcc, PR.ID, PR.BandID, PEG.ID as GroupID, PEP.ID as PeptideID, PEP.Protein
from TppProtein PR, TppPeptideGroup PEG, TppPeptide PEP
where PR.AccType = "RM" and PR.ID = PEG.ProteinID and PEG.ID = PEP.GroupID and PR.BandID = PEP.BandID;

Need indexes on:
TppProtein: ID, BandID, AccType
TppPeptideGroup: ID, ProteinID
TppPeptide: GroupID, BandID

Missing indexes:
TppPeptideGroup: ProteinID

create index TPG_protid on TppPeptideGroup(ProteinID) using BTREE;
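The lookup above can be tried out on toy data with Python's sqlite3 module. This is only a sketch: the table definitions are reduced to the columns the query touches, the rows are invented, and the query is written with explicit joins. SQLite does not accept `using BTREE` (its indexes are B-trees anyway), so that clause is dropped.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE TppProtein (ID INTEGER PRIMARY KEY, BandID INTEGER, ProteinAcc TEXT, AccType TEXT);
CREATE TABLE TppPeptideGroup (ID INTEGER PRIMARY KEY, ProteinID INTEGER);
CREATE TABLE TppPeptide (ID INTEGER PRIMARY KEY, GroupID INTEGER, BandID INTEGER, Protein TEXT);

-- The index reported as missing in the comment above.
CREATE INDEX TPG_protid ON TppPeptideGroup(ProteinID);

-- Invented sample rows: one reverse match and one ordinary protein.
INSERT INTO TppProtein VALUES (1, 10, 'rm|12345', 'RM'), (2, 10, '67890', 'GI');
INSERT INTO TppPeptideGroup VALUES (100, 1), (101, 2);
INSERT INTO TppPeptide VALUES (1000, 100, 10, 'rm'), (1001, 101, 10, '67890');
""")

# Peptide rows whose Protein column still refers to a reverse-match protein.
rows = conn.execute("""
    SELECT PR.ProteinAcc, PEP.ID AS PeptideID, PEP.Protein
    FROM TppProtein PR
    JOIN TppPeptideGroup PEG ON PR.ID = PEG.ProteinID
    JOIN TppPeptide PEP ON PEG.ID = PEP.GroupID AND PR.BandID = PEP.BandID
    WHERE PR.AccType = 'RM'
""").fetchall()
```

Each returned row pairs the corrected ProteinAcc with the peptide record that still carries the old value, which are the records an update of TppPeptide would have to target.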
Apr 13, 2011
Procedure:

update TppProtein set AccType="RM" where ProteinAcc="rm";
update TppProtein set ProteinAcc=ProteinDec where AccType="RM";
update TppProtein set ProteinAcc = replace(ProteinAcc,'rm ','rm|') where AccType='RM';
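For anyone who wants to dry-run the procedure before touching the real database, here is a hedged sketch that applies the same three statements to an in-memory SQLite table via Python's sqlite3 module. The table is reduced to the columns involved and the sample rows are invented. String literals use single quotes, since SQLite treats double quotes as identifiers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE TppProtein ("
    "ID INTEGER PRIMARY KEY, ProteinAcc TEXT, ProteinDec TEXT, AccType TEXT)"
)
# Invented rows: one reverse match in the old format, one ordinary protein.
conn.executemany(
    "INSERT INTO TppProtein (ProteinAcc, ProteinDec, AccType) VALUES (?, ?, ?)",
    [("rm", "rm 12345", "NCBIAcc"), ("67890", "some protein", "NCBIAcc")],
)

# Order matters: step 1 tags the rows while ProteinAcc is still 'rm',
# and steps 2 and 3 then select on that AccType tag.
steps = [
    "UPDATE TppProtein SET AccType = 'RM' WHERE ProteinAcc = 'rm'",
    "UPDATE TppProtein SET ProteinAcc = ProteinDec WHERE AccType = 'RM'",
    "UPDATE TppProtein SET ProteinAcc = replace(ProteinAcc, 'rm ', 'rm|') "
    "WHERE AccType = 'RM'",
]
for statement in steps:
    conn.execute(statement)

result = conn.execute(
    "SELECT ProteinAcc, AccType FROM TppProtein ORDER BY ID"
).fetchall()
```

On this sample the reverse match ends up as `rm|12345` with AccType `RM`, and the ordinary protein is left unchanged.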