X! TANDEM Spectrum Modeler
|
|
X! Tandem is open source software that can match tandem mass spectra with
peptide sequences, in a process that has come to be known as protein
identification.
This software has a very simple application programming interface
(API): it simply takes an XML file of instructions on its command line, and
outputs the results into an XML file, which has been specified in the input XML
file. The output format is described here (PDF).
This format is used for all of the X! series search engines, as well as the GPM and GPMDB.
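As a rough illustration of the single-XML-file interface described above, the following Python sketch writes a minimal instruction file and builds the corresponding command line. The `bioml`/`note` layout and the parameter labels follow the convention used in X! Tandem's parameter documentation as best recalled here, and should be checked against that documentation; the file names, the taxon, the helper function names, and the executable name `tandem.exe` are all assumptions made for the example.

```python
import xml.etree.ElementTree as ET


def build_input_xml(spectrum_path, output_path, defaults_path, taxonomy_path, taxon):
    """Return the text of a minimal instruction file (hypothetical helper).

    The labels below are assumed from the X! Tandem parameter documentation;
    verify them before relying on this sketch.
    """
    root = ET.Element("bioml")
    params = {
        "list path, default parameters": defaults_path,
        "list path, taxonomy information": taxonomy_path,
        "protein, taxon": taxon,
        "spectrum, path": spectrum_path,
        # The results file is named inside the instruction file itself.
        "output, path": output_path,
    }
    for label, value in params.items():
        note = ET.SubElement(root, "note", {"type": "input", "label": label})
        note.text = value
    return ET.tostring(root, encoding="unicode")


def tandem_command(input_xml_path, executable="tandem.exe"):
    """Build the command line: the executable plus one XML file argument.

    The executable name is an assumption and differs between platforms.
    """
    return [executable, input_xml_path]


if __name__ == "__main__":
    text = build_input_xml(
        "spectra.mgf", "output.xml", "default_input.xml", "taxonomy.xml", "yeast"
    )
    with open("input.xml", "w", encoding="utf-8") as handle:
        handle.write(text)
    # The command could then be passed to subprocess.run(...).
    print(" ".join(tandem_command("input.xml")))
```

Keeping the output path inside the instruction file means that a run is fully described by one document, which is what lets the command line stay at a single argument.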
Unlike some earlier-generation search engines, all of the X! series search engines
calculate statistical confidence (expectation values) for all of the individual
spectrum-to-sequence assignments. They also reassemble all of the peptide assignments
in a data set onto the known protein sequences and assign the statistical confidence
that this assembly and alignment is non-random. The formula for this confidence can be found
here. Therefore, separate assembly and
statistical analysis software, e.g. PeptideProphet and ProteinProphet, does not need to
be used.
|
| Latest release: 2008.12.01 |
| This is a maintenance release of X! Tandem TORNADO. It includes a change
to the threading mechanism that should improve overall performance when analyzing large datasets on multiprocessor/multicore
computers, and preliminary support for mzML data files. |
| System level changes |
- A preliminary implementation of the mzML file type, compatible with files generated by ReAdW and ProteoWizard,
has been added. No changes to existing data file format implementations have been made.
- The way that spectra are divided up between executing threads has been altered to better balance
processor use for large LC/MS/MS datasets. In previous versions, spectra were divided up into equally-sized contiguous blocks
and distributed to the available threads. For example, if there were 6 spectra and 2 threads:
- thread 1 = spectra #1, #2, #3; and
- thread 2 = spectra #4, #5, #6.
This method works, but it can run into load balancing problems for large datasets where the type of spectra
in the first part of a dataset differs systematically from the last part. This problem often occurs in large LC/MS/MS datasets,
in which spectra with larger parent ion masses tend to fall in the latter half of the data. These larger peptides take longer
to solve: in the example with 2 threads, the first thread would finish before the second, leaving one processor
idle for some period. To address this problem, the new threading system assigns spectra to threads in an alternating pattern:
- thread 1 = spectra #1, #3, #5; and
- thread 2 = spectra #2, #4, #6.
This should have the effect of better balancing the complexity of the calculation between all threads.
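The two assignment schemes can be sketched in a few lines of Python. This is only an illustration of the partitioning logic described above, not the X! Tandem source code; the function names and the per-spectrum cost figures are invented for the example.

```python
def contiguous_blocks(spectra, n_threads):
    """Older scheme: split the spectra into equally sized contiguous blocks."""
    size = -(-len(spectra) // n_threads)  # ceiling division
    return [spectra[i * size:(i + 1) * size] for i in range(n_threads)]


def interleaved(spectra, n_threads):
    """Newer scheme: deal the spectra out to the threads in alternating order."""
    return [spectra[t::n_threads] for t in range(n_threads)]


def thread_loads(assignment, cost):
    """Total (hypothetical) work per thread under a given assignment."""
    return [sum(cost[s] for s in block) for block in assignment]


if __name__ == "__main__":
    spectra = [1, 2, 3, 4, 5, 6]
    # Hypothetical costs: later spectra (larger parent ions) take longer to solve.
    cost = {1: 1, 2: 1, 3: 1, 4: 3, 5: 3, 6: 3}

    old = contiguous_blocks(spectra, 2)  # [[1, 2, 3], [4, 5, 6]]
    new = interleaved(spectra, 2)        # [[1, 3, 5], [2, 4, 6]]

    print(old, thread_loads(old, cost))  # loads [3, 9]
    print(new, thread_loads(new, cost))  # loads [5, 7]
```

With these made-up costs, the contiguous split leaves one thread with three times the work of the other, while the alternating split brings the two loads much closer together.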
|
|