The Global Proteome Machine Organization

  QUARTZ Project - Jasper

The goal of the Jasper project is to create a report in XML format that contains a defined set of information about the best-scoring distinct peptides in GPMDB whose expect values fall within a defined window, for charge states 1, 2 and 3. Here, "distinct" means more than a unique sequence (ignoring modifications): the combination of complete and/or potential residue modifications must also be unique.
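The definition of "distinct" above can be illustrated with a minimal Python sketch. This is not the project's code: the Assignment record, its field names, and the representation of a modification as a (position, mass delta) pair are assumptions made for illustration. It keeps, for each combination of sequence and modifications, the assignment with the smallest expect value.

```python
from typing import Dict, FrozenSet, Iterable, NamedTuple, Tuple

# Hypothetical representation of one modification: (residue position, mass delta).
Mod = Tuple[int, float]


class Assignment(NamedTuple):
    seq: str              # peptide sequence without modifications
    mods: FrozenSet[Mod]  # combination of residue modifications
    expect: float         # expectation value; smaller means a better score


def best_distinct(
    assignments: Iterable[Assignment],
) -> Dict[Tuple[str, FrozenSet[Mod]], Assignment]:
    """Keep the best-scoring assignment per (sequence, modifications) pair."""
    best: Dict[Tuple[str, FrozenSet[Mod]], Assignment] = {}
    for a in assignments:
        key = (a.seq, a.mods)
        # A new key is always stored; an existing key is replaced only
        # when the new assignment has a smaller expect value.
        if key not in best or a.expect < best[key].expect:
            best[key] = a
    return best
```

Under this assumption, the same sequence with two different modification sets yields two distinct peptides, while repeated observations of the same combination collapse to the single best-scoring one.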

Creating the Jasper spectra XML files requires two main steps:

  1. Populate the GPMDB DistinctSeq table with the distinct sequences found in the GPMDB Peptide table. This is done from the MySQL command-line client with the query:
    INSERT INTO DistinctSeq (seq) SELECT DISTINCT Peptide.seq FROM Peptide;
  2. Run the Perl script jasper.pl with two parameters, the minimum and maximum expect values, which define the window in which a peptide's expect value must fall for it to be included in the report. These values appear in the file names: for example, jasper_.0001_.001_1.xml is the first file in the data set that contains spectra assigned to peptide sequences with expectation values from 0.0001 to 0.001. See reference 3 for details of how the expectation values were calculated.
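The effect of the query in step 1 can be tried on a toy database. The sketch below uses Python's built-in sqlite3 module with an in-memory database instead of MySQL. The table layouts and sample rows are assumptions made for illustration, since the real GPMDB schema is not described here. Only the INSERT ... SELECT DISTINCT statement is taken from the text.

```python
import sqlite3
from typing import List


def populate_distinct_seq(conn: sqlite3.Connection) -> List[str]:
    """Run the step 1 query and return the resulting DistinctSeq contents."""
    conn.execute(
        "INSERT INTO DistinctSeq (seq) SELECT DISTINCT Peptide.seq FROM Peptide"
    )
    conn.commit()
    rows = conn.execute("SELECT seq FROM DistinctSeq ORDER BY seq")
    return [row[0] for row in rows]


def demo() -> List[str]:
    """Build a toy database (hypothetical schema) and populate DistinctSeq."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical, minimal table layouts for illustration only.
    conn.execute("CREATE TABLE Peptide (seq TEXT, expect REAL)")
    conn.execute("CREATE TABLE DistinctSeq (seq TEXT)")
    conn.executemany(
        "INSERT INTO Peptide (seq, expect) VALUES (?, ?)",
        [("PEPTIDE", 0.0005), ("PEPTIDE", 0.02), ("ELVISK", 0.0002)],
    )
    try:
        return populate_distinct_seq(conn)
    finally:
        conn.close()


if __name__ == "__main__":
    print(demo())
```

Each sequence appears once in DistinctSeq, however many rows of the Peptide table contain it.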

The first dataset used a minimum expect of 0.0001 (log(e) = -4) and a maximum expect of 0.001 (log(e) = -3), producing approximately 75 MB of data spread across 16 files (approximately 16 MB when compressed). Each report file has a defined maximum size of approximately 5 MB. The files are available as zip or tar archives and contain 64,515 assigned peptide spectra:

  • z = 1 — 6,290 spectra;
  • z = 2 — 40,576 spectra; and
  • z = 3 — 17,649 spectra.
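The relationship between the file names, the expect window, and the log(e) values can be sketched in Python. The helper names below are hypothetical, and the file name pattern is inferred only from the single example jasper_.0001_.001_1.xml. Whether the window boundaries are inclusive is not stated, so the half-open interval used here is an assumption.

```python
import math
import re
from typing import Tuple

# Pattern inferred from the example name jasper_.0001_.001_1.xml:
# jasper_<minimum expect>_<maximum expect>_<file index>.xml
_REPORT_NAME = re.compile(r"^jasper_([0-9.]+)_([0-9.]+)_(\d+)\.xml$")


def parse_report_name(name: str) -> Tuple[float, float, int]:
    """Return (minimum expect, maximum expect, file index) from a file name."""
    match = _REPORT_NAME.match(name)
    if match is None:
        raise ValueError(f"unrecognized report file name: {name!r}")
    return float(match.group(1)), float(match.group(2)), int(match.group(3))


def log_e(expect: float) -> float:
    """Base-10 logarithm of an expect value, e.g. 0.0001 gives about -4."""
    return math.log10(expect)


def in_window(expect: float, minimum: float, maximum: float) -> bool:
    """Assumed half-open window: minimum <= expect < maximum."""
    return minimum <= expect < maximum
```

For the example file name, parse_report_name returns the window 0.0001 to 0.001 and the file index 1, and an expect value of 0.0005 falls inside that window.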

The second dataset used a minimum expect of 0.00001 (log(e) = -5) and a maximum expect of 0.0001 (log(e) = -4), producing approximately 58 MB of data spread across 13 files (approximately 14 MB when compressed). Each report file has a defined maximum size of approximately 5 MB. The files are available as zip or tar archives and contain 52,469 assigned peptide spectra:

  • z = 1 — 3,657 spectra;
  • z = 2 — 35,107 spectra; and
  • z = 3 — 13,706 spectra.
Copyright © 2004-2011, The Global Proteome Machine Organization