The idea of using "proteotypic peptides" is a relatively new notion in protein/peptide
identification. It is simply the recognition of the fact that if you cleave a protein into peptides,
not all of the peptides are equally likely to be detected by current mass spectrometry-based
techniques. Some peptides from a particular protein sequence are detected easily, while others
are very difficult to find. The peptides generated from a sequence that are always
detected are called proteotypic, i.e., those peptides alone are indicative of
the presence of a particular protein.
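
One way to make this definition concrete is to count how often each peptide of a protein is seen across the runs in which that protein was identified. The Python sketch below is only an illustration under stated assumptions: the function name `proteotypic_candidates`, the input layout (one set of observed peptides per run) and the `min_fraction` cutoff are all hypothetical, and this is not the selection procedure used by any particular tool.

```python
from collections import Counter

def proteotypic_candidates(runs, min_fraction=0.9):
    """Return peptides seen in at least `min_fraction` of the runs.

    `runs` is a list of sets; each set holds the peptides of ONE protein
    that were observed in one run where that protein was identified.
    Both the layout and the cutoff are assumptions made for illustration.
    """
    if not runs:
        return []
    # Count each peptide at most once per run.
    counts = Counter(pep for run in runs for pep in set(run))
    total = len(runs)
    return sorted(pep for pep, n in counts.items() if n / total >= min_fraction)

# Toy data: peptide "CCR" is seen in every run, the others only sometimes.
example_runs = [{"AAK", "CCR"}, {"CCR"}, {"CCR", "DDK"}, {"CCR", "AAK"}]
strict = proteotypic_candidates(example_runs)
relaxed = proteotypic_candidates(example_runs, min_fraction=0.5)
```

With the strict default cutoff only the peptide observed in every run is kept; lowering the cutoff admits peptides that are detected less consistently.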
This idea suggests that it should be possible to scan through a set of data, for example an
LC/MS/MS run, looking only for the known proteotypic peptides for a particular organism. Finding
those proteotypic peptides is enough to know that the protein was present in the original sample.
Because there will only be a few proteotypic peptides for a protein, it should be possible to
improve both the speed and accuracy of the resultant protein identifications.
The X! P3 (Proteotypic Peptide Profiler) project is the first publicly available search engine that takes
advantage of this idea. Built using the X! TANDEM refinement idea and the open source X! TANDEM
code, X! P3 takes the proteotypic peptide idea to its logical conclusion by adding
a few simple steps. Rather than stopping at protein identification, X! P3 uses the proteotypic
approach to find protein sequences and then applies refinement to the full spectrum data set to find all
of the peptides present, as well as post-translational modifications, point mutations
and unanticipated peptide cleavages. It works this way:
- In the first round, the spectrum data set is examined for the presence of proteotypic peptides.
- The full protein sequences of the proteins identified in the first round are then pulled from
a sequence library.
- Using this small set of full sequences, multiple rounds of refinement are performed to
extract all of the non-proteotypic peptides from the full spectrum data set.
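
The three rounds above can be sketched in Python as a toy model. This is only an illustration and not the X! P3 code: the spectra are assumed to have already been reduced to a set of peptide sequences (a real search engine scores fragment-ion spectra), digestion uses a simplified trypsin rule, the multiple refinement rounds are collapsed into a single pass with no modifications or mutations, and the names `tryptic_peptides`, `p3_style_search`, `proteotypic` and `library` are hypothetical.

```python
import re

def tryptic_peptides(sequence):
    """Simplified trypsin rule: cleave after K or R unless followed by P."""
    # Zero-width split points require Python 3.7 or later.
    return [p for p in re.split(r"(?<=[KR])(?!P)", sequence) if p]

def p3_style_search(observed, proteotypic, library):
    """Toy three-round search.

    observed    -- set of peptide sequences standing in for the spectra
    proteotypic -- dict mapping proteotypic peptide -> protein accession
    library     -- dict mapping protein accession -> full sequence
    """
    # Round 1: look only for the known proteotypic peptides.
    hits = {acc for pep, acc in proteotypic.items() if pep in observed}

    # Round 2: pull full sequences for the proteins found in round 1.
    candidates = {acc: library[acc] for acc in hits if acc in library}

    # Round 3 (single pass here): search the full data set against this
    # small set of sequences to recover the remaining peptides.
    return {
        acc: sorted(p for p in tryptic_peptides(seq) if p in observed)
        for acc, seq in candidates.items()
    }

# Toy inputs; accessions and sequences are made up.
library = {"P1": "AAKCCRDDK", "P2": "EEKFFR"}
proteotypic = {"CCR": "P1", "FFR": "P2"}
observed = {"CCR", "AAK", "DDK", "EEK"}
result = p3_style_search(observed, proteotypic, library)
```

In this toy run only P1 is carried into the last round, because its proteotypic peptide was observed; the stray peptide from P2 is never considered, which is where the reduction in search space comes from.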
A potential problem with this type of approach is clearly the lack of a good set of proteotypic peptides
to use. This has been solved through the GPMDB, which is the largest collection of proteomics data
available to the public. By querying GPMDB to find the peptides that best represent a particular
protein, it is now possible to produce high-quality libraries of these peptides for two model organisms, namely
Homo sapiens and Saccharomyces cerevisiae, as well as several commonly observed experimental
artifacts, such as BSA and trypsin. The sequence libraries are updated daily from GPMDB, so the system
can learn about new proteotypic peptides as they are generated by the overall
Global Proteome Machine.
An X! P3 server has been established for these model
species. Please give it a try. We will be releasing the X! P3 code through our code repository and
the sequence libraries by ftp at ftp://ftp.thegpm.org/proteotypic_peptide_profiles.