The goal of the Jasper project is to create a report in XML format that contains a
defined set of information about the best-scoring distinct peptides from GPMDB with
expect values found in defined windows for charge states 1, 2 and 3.
In this case, "distinct peptide" means not only that the sequence itself (without
modifications) was distinct, but also that the combination of complete and/or potential
residue modifications was unique.
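The definition above can be illustrated with a minimal Python sketch. It assumes that a lower expect value is the better score and that a peptide's modifications can be represented as a set of (position, mass shift) pairs. The Identification type and the best_distinct function are hypothetical names used for illustration only, not part of GPMDB or jasper.pl.

```python
from typing import NamedTuple


class Identification(NamedTuple):
    seq: str          # bare residue sequence, without modifications
    mods: frozenset   # hypothetical encoding: set of (position, mass shift) pairs
    expect: float     # expectation value; lower is assumed to be better


def best_distinct(ids):
    """Keep the best-scoring entry for each (sequence, modification set) pair."""
    best = {}
    for ident in ids:
        key = (ident.seq, ident.mods)
        if key not in best or ident.expect < best[key].expect:
            best[key] = ident
    return list(best.values())


ids = [
    Identification("PEPTIDEK", frozenset(), 2e-4),
    Identification("PEPTIDEK", frozenset(), 5e-4),               # same key, worse score
    Identification("PEPTIDEK", frozenset({(3, 79.966)}), 3e-4),  # same sequence, new modification set
]
distinct = best_distinct(ids)
```

In this sketch the two unmodified entries collapse into one, while the modified form is kept as a separate distinct peptide.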
To create the Jasper spectra XML files, two main steps are required:
- Populate the GPMDB DistinctSeq table with the distinct sequences found in the
GPMDB Peptide table. This is done using the MySQL client at the command prompt with
the query:
    INSERT INTO DistinctSeq (seq) SELECT DISTINCT Peptide.seq FROM Peptide;
- Run the Perl script jasper.pl with two parameters (the minimum and maximum expect
values), which define the window in which a peptide's score must fall for it to be
included in the report. These values are indicated in the file names, e.g.
jasper_.0001_.001_1.xml is the first file in the data set that contains spectra
assigned to peptide sequences with expectation values from 0.0001 to 0.001. See
reference 3 for the details of how the expectation values were calculated.
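The two steps above can be sketched in Python as follows. This is an illustration under stated assumptions, not the actual jasper.pl logic: an in-memory SQLite database stands in for MySQL, the Peptide table's expect column is a hypothetical simplification of the real schema, the window is treated as inclusive at both ends, and the helpers fmt_expect, report_name and in_window are invented names.

```python
import sqlite3


def fmt_expect(x):
    """Format 0.0001 as '.0001', matching the file-name style quoted in the text."""
    s = f"{x:.10f}".rstrip("0")
    return s[1:] if s.startswith("0.") else s


def report_name(min_expect, max_expect, index):
    """Build a report file name such as jasper_.0001_.001_1.xml."""
    return f"jasper_{fmt_expect(min_expect)}_{fmt_expect(max_expect)}_{index}.xml"


def in_window(expect, min_expect, max_expect):
    """Assumption: both ends of the expect window are inclusive."""
    return min_expect <= expect <= max_expect


# Step 1: populate DistinctSeq from Peptide (SQLite used as a stand-in for MySQL).
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Peptide (seq TEXT, expect REAL)")  # hypothetical schema
con.execute("CREATE TABLE DistinctSeq (seq TEXT)")
con.executemany(
    "INSERT INTO Peptide VALUES (?, ?)",
    [("AAK", 2e-4), ("AAK", 5e-2), ("GGR", 5e-4)],
)
con.execute("INSERT INTO DistinctSeq (seq) SELECT DISTINCT Peptide.seq FROM Peptide")
distinct_seqs = sorted(row[0] for row in con.execute("SELECT seq FROM DistinctSeq"))

# Step 2: keep only identifications whose expect value falls inside the window.
min_expect, max_expect = 0.0001, 0.001
selected = sorted(
    (seq, expect)
    for seq, expect in con.execute("SELECT seq, expect FROM Peptide")
    if in_window(expect, min_expect, max_expect)
)
first_file = report_name(min_expect, max_expect, 1)
```

Here the duplicate sequence is stored once in DistinctSeq, and the identification with an expect value of 0.05 falls outside the window and is excluded.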
The first dataset used a minimum expect of 0.0001 (log(e) = -4) and a maximum expect
of 0.001 (log(e) = -3) and resulted in approximately 75 MB of data spread across 16 files.
Compressed, the size is approximately 16 MB. Each report file has
a defined maximum size of approximately 5 MB. The files are available as
zip or tar archive files and
contain 64,515 assigned peptide spectra:
- z = 1: 6,290 spectra;
- z = 2: 40,576 spectra; and
- z = 3: 17,649 spectra.
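The figures for the first dataset can be cross-checked with a few lines of Python. The sketch reads log(e) as the base-10 logarithm of the expect value, which is consistent with the numbers quoted, and takes the rounded sizes from the text at face value.

```python
import math

# Per-charge spectrum counts quoted for the first dataset.
counts = {1: 6290, 2: 40576, 3: 17649}
total = sum(counts.values())

# log(e) read as log10 of the expect value; rounded to avoid floating-point noise.
log_min = round(math.log10(0.0001))
log_max = round(math.log10(0.001))

# Average file size from the rounded totals (about 75 MB over 16 files).
avg_file_mb = 75 / 16
```

The per-charge counts sum to the stated 64,515 spectra, and the average of roughly 4.7 MB per file sits just under the stated 5 MB maximum.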
The second dataset used a minimum expect of 0.00001 (log(e) = -5) and a maximum expect
of 0.0001 (log(e) = -4) and resulted in approximately 58 MB of data spread across 13 files.
Compressed, the size is approximately 14 MB. Each report file has
a defined maximum size of approximately 5 MB. The files are available as
zip or tar archive files and
contain 52,469 assigned peptide spectra:
- z = 1: 3,657 spectra;
- z = 2: 35,107 spectra; and
- z = 3: 13,706 spectra.