The Global Proteome Machine Organization
The home of proteomics crowd-sourced "Big Data"
    How to interpret sequence site assignments

Proteomics frequently involves the use of large lists of protein sequence-related assignments based on the interpretation of tandem mass spectra. This page describes some general rules of thumb for interpreting this type of information when you need to apply it to a specific experimental or theoretical case.

    Some advice on sequence site assignments
  1. Adjacent residues may be hard to distinguish from the measurements.
    It may be very difficult to determine the exact residue to which a particular modification should be assigned. For example, if a peptide has the sequence "ASTYYLFR", it may be easy to determine from a mass spectrum that this peptide sequence is phosphorylated. It may be very difficult, however, to determine whether the phosphorylation is on S[2] as opposed to T[3] based on the fragmentation pattern in that spectrum. Similarly, it may be difficult to distinguish between phosphorylation at Y[4] and Y[5]. In GPMDB, if there is no data in the spectrum that clearly distinguishes the two (or more) cases, all of them will be reported. Therefore, if there are assignments at adjacent or nearly adjacent residues, exercise caution and consult the original data (using pSYT) to determine how well (or poorly) these alternative cases are supported by the original experimental observations.
  2. Splice variants may be difficult to assign unambiguously.
    Experiments that pull down peptides with specific modifications, e.g., metal-oxide columns for phosphopeptides, usually retain only a small number of peptide sequences for a particular protein. Given the very limited sequence coverage associated with a small number of peptides, it is usually not possible to specify which alternate splice variant or protein isoform has been detected. GPMDB reports all protein variants that contain the detected peptide sequence in an individual experiment. If it is important to know which variant has been modified, it will be necessary to examine the data in detail. In practice, it is easier to exclude a variant on the basis of a missing site assignment than it is to distinguish between alternate sequences, all of which contain the same site assignment.
  3. Isobaric interference.
    Assignments made by tandem mass spectrometry can only distinguish between alternatives that result in measurable mass differences. Some modifications are simply too close in mass to be confidently distinguished using the types of measurements that are commonly used in high-throughput proteomics. One important example is tyrosine phosphorylation vs. sulfonation. The two modifications are very similar in mass (79.966331 Da vs. 79.956815 Da), and conclusively measuring one or the other requires high-resolution mass spectrometry. Another example is lysine acetylation vs. trimethylation (42.010565 Da vs. 42.046950 Da), which requires careful measurement to ensure correct assignment. Fortunately, phosphorylation and acetylation are in general much more frequent post-translational modifications than sulfonation or trimethylation, so simply assigning the most common modification is often justified. Some protein sequences in which multiple modifications can occur, such as histones, may also require more careful treatment than is possible when generating high-throughput sequence modification information.
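
To make the isobaric interference point concrete, the mass differences quoted above can be expressed in parts per million (ppm) of the peptide mass and compared with a mass tolerance. The following is only a minimal sketch in Python, not part of GPMDB or any GPM tool. The function names, the example peptide mass of 1500 Da, and the tolerance values are illustrative assumptions. The test for "distinguishable" is a deliberately simple one: two candidates are treated as resolvable only if their +/- tolerance windows do not overlap.

```python
# Monoisotopic mass shifts (Da) as quoted in the text above.
MASS_SHIFTS = {
    "phosphorylation": 79.966331,
    "sulfonation": 79.956815,
    "acetylation": 42.010565,
    "trimethylation": 42.046950,
}


def mass_difference(mod_a, mod_b):
    """Absolute difference in mass (Da) between two modifications."""
    return abs(MASS_SHIFTS[mod_a] - MASS_SHIFTS[mod_b])


def difference_in_ppm(mod_a, mod_b, peptide_mass):
    """Mass difference expressed in ppm of the (hypothetical) peptide mass."""
    return mass_difference(mod_a, mod_b) / peptide_mass * 1e6


def distinguishable(mod_a, mod_b, peptide_mass, tolerance_ppm):
    """Simplifying assumption: the two candidates can be told apart only if
    windows of +/- tolerance_ppm around each candidate mass do not overlap,
    i.e. the difference exceeds twice the tolerance."""
    return difference_in_ppm(mod_a, mod_b, peptide_mass) > 2 * tolerance_ppm


# Hypothetical example: a 1500 Da peptide at two assumed mass tolerances.
PEPTIDE_MASS = 1500.0
for pair in (("phosphorylation", "sulfonation"),
             ("acetylation", "trimethylation")):
    for tolerance in (2.0, 20.0):
        print(pair,
              "difference (ppm):",
              round(difference_in_ppm(*pair, PEPTIDE_MASS), 2),
              "tolerance (ppm):", tolerance,
              "distinguishable:",
              distinguishable(*pair, PEPTIDE_MASS, tolerance))
```

Because the difference in ppm shrinks as the peptide mass grows, the same pair of modifications becomes harder to separate on larger peptides at a fixed ppm tolerance. This sketch considers mass alone and ignores other evidence, such as characteristic fragment ions, that may help in practice.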

Copyright © 2010, The Global Proteome Machine Organization
