GPM Literature References

The Global Proteome Machine Organization

The Global Proteome Machine was set up so that scientists who generate proteomics data by tandem mass spectrometry could use those data to analyze proteomes. The following references to the reviewed literature are suggested citations for scientists preparing publications that use the system.

The Biopolymer Markup Language, David Fenyö, Bioinformatics, 1999, 15, 339-40.
This is the best reference for the underlying XML that is used throughout the GPM system.
Informatics and data management in proteomics, David Fenyö and Ronald C. Beavis, Trends Biotechnol., 2002, 20, S35-8.
This is a good reference for our underlying philosophy of how bioinformatics and proteomics go together.
A Method for Assessing the Statistical Significance of Mass Spectrometry-Based Protein Identifications Using General Scoring Schemes, David Fenyö and Ronald C. Beavis, Anal. Chem., 2003, 75, 768-774.
This reference describes how peptides are scored by X! TANDEM. The expectation values on the individual peptides are calculated using this method.
TANDEM: matching proteins with mass spectra, Robertson Craig and Ronald C. Beavis, Bioinformatics, 2004, 20, 1466-7.
This reference is the official reference for X! TANDEM as open source software.
A Method for Reducing the Time Required to Match Protein Sequences with Tandem Mass Spectra, Robertson Craig and Ronald C. Beavis, Rapid Commun. Mass Spectrom., 2003, 17, 2310-2316.
This contains most of the technical details of how X! TANDEM speeds up searches.
Probity: A Protein Identification Algorithm with Accurate Assignment of the Statistical Significance of the Results, Jan Eriksson and David Fenyö, J. Proteome Res., 2004, 3, 32-36.
This reference describes the statistical model for calculating protein expectation values from a selected group of peptides. It is cited in the fourth and fifth references in this list (the two Craig and Beavis X! TANDEM papers), although it was not yet in print at the time. The expectation values for proteins are calculated with this method, together with the expectation values of the individual peptides.
Evaluation of Multidimensional Chromatography Coupled with Tandem Mass Spectrometry (LC/LC-MS/MS) for Large-Scale Protein Analysis: The Yeast Proteome, J. Peng, J.E. Elias, C.C. Thoreen, L.J. Licklider and S.P. Gygi, J. Proteome Res., 2003, 2, 43-50.
This reference describes the idea of using reversed sequences to validate large collections of protein identifications. The GPM has this method built in as an optional means of validation.
N.B.: We strongly recommend that you do not use this type of method for any purpose other than comparison with other search engines. The "decoy" search methods that have been developed from this manuscript are deeply flawed algorithms for determining the confidence of peptide identification assignments.
An Open Source System for Analyzing, Validating and Storing Protein Identification Data, Robertson Craig, John P. Cortens and Ronald C. Beavis, J. Proteome Res., 2004, 3, 1234-42.
This reference describes the underlying technical aspects of how the GPMDB was constructed and some of its potential uses in proteomics.
An improved model for prediction of retention times of tryptic peptides in ion-pair reverse phase HPLC; its application to protein peptide mapping by off-line HPLC-MALDI MS, O. V. Krokhin, R. Craig, V. Spicer, W. Ens, K. G. Standing, R. C. Beavis and J. A. Wilkins, Mol. Cell Proteomics, 2004, 3, 908-919.
GPM uses calculated reverse phase HPLC retention times to create synthetic HPLC displays. This article describes the theoretical and practical aspects of how the system does this calculation.
Reference points for comparisons of two-dimensional maps of proteins from different human cell types defined in a pH scale where isoelectric points correlate with polypeptide compositions, B. Bjellqvist, B. Basse, E. Olsen and J.E. Celis, Electrophoresis, 1994, 15, 529-39.
GPM uses the method described in this manuscript to calculate protein and peptide pI values.
The use of proteotypic peptide libraries for protein identification, R. Craig, J.P. Cortens and R.C. Beavis, Rapid Commun. Mass Spectrom., 2005, 19, 1844-1850.
The initial publication describing how proteotypic peptide libraries are constructed and used to improve protein identifications in the GPM.
Using Annotated Peptide Mass Spectrum Libraries for Protein Identification, R. Craig, J.P. Cortens, D. Fenyö and R.C. Beavis, J. Proteome Res., 2006, doi:10.1021/pr0602085.
The initial publication describing the use of Annotated Spectrum Libraries (ASLs) and X! Hunter to identify proteins.
Determining the overall merit of protein identification data sets: rho-diagrams and rho-scores, D. Fenyö, B.S. Phinney and R.C. Beavis, J. Proteome Res., 2007, 6, 1997-2004.
The publication describing rho-scoring (ρ-scoring) and how it can be used to evaluate the relative merit of a proteomics data set.
The GPMDB REST interface, D. Fenyö and R.C. Beavis, Bioinformatics, 2015, 31, 2056-8.
Describes the GPMDB JSON-REST interface for accessing information about peptides, proteins and datasets in GPMDB.
g2pDB: A Database Mapping Protein Post-Translational Modifications to Genomic Coordinates, S. Keegan, J.P. Cortens, R.C. Beavis and D. Fenyö, J. Proteome Res., 2016.
Describes the g2pDB PTM annotation system and a JSON-REST interface for accessing PTM information in genomic coordinates.