GPM Installation

The Global Proteome Machine Organization

11. Can I rename the result XML files?
12. Why don't my spectra images appear (correctly) in my Firefox browser? (Updated for Firefox 1.5)
13. I updated GPM to version 20040715, where are my results being created?
14. I updated GPM to version 20040715, what is the defines.pl file used for?
15. Why do I get extra peptides identified when I run multiple spectra in one file as opposed to one at a time?
16. If I increase the total peak number (spectrum, total peaks), why does the number of resultant matches decrease?
17. How are the false positives calculated?
18. What do the various diagrams represent?
19. How can I add a custom database?

11. Can I rename the result XML files?

Most labs want to rename the result XML files to something that makes sense for the spectra file or experiment. The files may be renamed, but as of the 20040201 release of The GPM, only alphanumeric characters and the underscore character (_) may be used. Future releases will accept any character in the renamed XML file.
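
A quick way to check a new name before renaming is sketched below in Python. It assumes the restriction applies to the part of the name before the .xml extension; the function name is_valid_result_name is made up for this example and is not part of The GPM.

```python
import re

# Assumption: only the base name is restricted, and the file keeps its .xml extension.
VALID_RESULT_NAME = re.compile(r"[A-Za-z0-9_]+\.xml")

def is_valid_result_name(filename: str) -> bool:
    """Return True if the name uses only letters, digits and underscores before .xml."""
    return VALID_RESULT_NAME.fullmatch(filename) is not None

print(is_valid_result_name("yeast_run_01.xml"))   # True
print(is_valid_result_name("yeast-run 01.xml"))   # False
```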

12. Why don't my spectra images appear (correctly) in my Firefox browser? (Updated for Firefox 1.5)

Windows: Firefox needs version 6.0 of the Adobe SVG Viewer to display SVG images. It is a pre-release version (07/2003), but it also works fine in IE 6.x. Download is available here. If the images still don't show up, copy the files NPSVG6.dll and NPSVG6.zip
from C:\Program Files\Common Files\Adobe\SVG Viewer 6.0\Plugins
to C:\Program Files\Mozilla Firefox\plugins.
Restart the browser and you should be able to see the spectra images.
Linux: You need to copy the file named /usr/local/adobe-svg-3.01/libNPSVG.so to the firefox/plugins folder. Restart Firefox and the spectra images should appear.
Update: If you have upgraded to Firefox 1.5, spectra images may not be rendered correctly. This is because Firefox now has its own SVG viewer embedded in the code. If this is causing problems, disable the embedded viewer by editing the preferences file all.js and changing the directive pref("svg.enabled", true); to pref("svg.enabled", false);
On Windows, the file is probably in the folder C:\Program Files\Mozilla Firefox\greprefs.
On Linux, it might be in /usr/lib64/firefox-1.5/greprefs or /usr/lib/firefox-1.5/greprefs.
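
The edit to all.js can also be scripted. The following Python sketch rewrites the directive in a copy of the file's text; the helper name disable_embedded_svg and the second preference in the sample are illustrative only. Back up the real file before changing it.

```python
import re

def disable_embedded_svg(prefs_text: str) -> str:
    """Return the text of all.js with svg.enabled switched from true to false."""
    # Tolerate any amount of whitespace after the comma.
    return re.sub(r'pref\("svg\.enabled",\s*true\);',
                  'pref("svg.enabled", false);',
                  prefs_text)

# Sample text; "other.setting" is a made-up preference for illustration.
sample = 'pref("svg.enabled", true);\npref("other.setting", 1);\n'
print(disable_embedded_svg(sample))
```

To apply it to the real file, read the contents of all.js from the greprefs folder, pass them through the function and write the result back.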

13. I updated GPM to version 20040715, where are my results being created?

The results are now stored in '/thegpm/gpm/archive/'. The new results also use a new naming scheme, for example GPM00100000001.xml, where GPM is a standard prefix for all GPM results, 001 is the server id (local installation) and 00000001 is the next available GPM accession number. The GPM uses a file called '/gpm/uid.txt' to find the last number assigned to a result and increments that number by one each time a search is run.
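
The naming pattern can be illustrated with a short Python sketch. It only reproduces the layout described above (prefix, 3-digit server id, 8-digit accession number); the function names are hypothetical, and the assumption that the counter is a plain integer is ours, not a description of how uid.txt is actually stored.

```python
from typing import Tuple

def format_accession(server_id: int, number: int) -> str:
    """Build a result file name: 'GPM' + 3-digit server id + 8-digit accession number."""
    return f"GPM{server_id:03d}{number:08d}.xml"

def next_accession(last_number: int, server_id: int = 1) -> Tuple[int, str]:
    """Increment the last assigned number by one and return it with the new file name."""
    number = last_number + 1
    return number, format_accession(server_id, number)

print(next_accession(0))    # (1, 'GPM00100000001.xml')
print(next_accession(41))   # (42, 'GPM00100000042.xml')
```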

14. I updated GPM to version 20040715, what is the defines.pl file used for?

The defines.pl file needs to be updated with the name of your server, unless it is only being run locally, in which case it needs no changes.


sub get_server_name
{
	return "localhost";
}

Change localhost to the name of your server.

15. Why do I get extra peptides identified when I run multiple spectra in one file as opposed to one at a time?

This is because of the refinement step.

Case: Single spectrum.

A single spectrum is run against a protein file. A peptide is identified that belongs to a protein sequence from the file. Refinement is performed using the protein found in step one and the modification(s) that have been specified in the refinement parameters to try and make a match against the input spectrum. Since there is only one input spectrum and it has already been matched, there is nothing more to match the peptides from the protein to.

Case: Multiple spectra.

Multiple spectra are run against a protein file. A peptide is identified that belongs to a protein sequence from the file. Refinement is performed using the protein found in step one and the modification(s) that have been specified in the refinement parameters to try and make a match against the input spectra. Because there are spectra that have not been matched in the first step, they are available to be matched here. This is when you will see an extra peptide being matched.

So in the initial search, spectra are matched to peptides, which can be found in proteins from the file. In refinement, proteins that have been found in the first step are modified using the refinement parameters. The modified peptides are matched, if possible, to the spectra.
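
The effect can be reproduced with a deliberately simplified toy model in Python. This is not how Tandem scores spectra: here a spectrum and a peptide are each reduced to a single mass, a match is just agreement within a made-up tolerance, and all names and numbers are invented for illustration.

```python
from typing import Dict, List, Tuple

TOLERANCE = 0.5  # hypothetical mass tolerance in Daltons

def toy_search(spectra: List[float],
               proteins: Dict[str, List[float]],
               mod_mass: float) -> Dict[int, Tuple[str, float, bool]]:
    """Map spectrum index -> (protein, peptide mass, matched_in_refinement)."""
    matches: Dict[int, Tuple[str, float, bool]] = {}

    # Initial search: unmodified peptides from every protein in the file.
    for i, spectrum_mass in enumerate(spectra):
        for protein, peptides in proteins.items():
            for peptide_mass in peptides:
                if i not in matches and abs(spectrum_mass - peptide_mass) <= TOLERANCE:
                    matches[i] = (protein, peptide_mass, False)

    # Refinement: only proteins found above, with the modification applied,
    # and only against spectra that are still unmatched.
    found = sorted({protein for protein, _, _ in matches.values()})
    for i, spectrum_mass in enumerate(spectra):
        if i in matches:
            continue
        for protein in found:
            for peptide_mass in proteins[protein]:
                if i not in matches and abs(spectrum_mass - (peptide_mass + mod_mass)) <= TOLERANCE:
                    matches[i] = (protein, peptide_mass, True)
    return matches

proteins = {"protein_A": [1000.0, 1200.0]}   # made-up peptide masses
print(len(toy_search([1000.0], proteins, 16.0)))           # 1
print(len(toy_search([1216.0], proteins, 16.0)))           # 0
print(len(toy_search([1000.0, 1216.0], proteins, 16.0)))   # 2
```

Run one at a time, the two spectra give one identification in total, because the second spectrum only fits a modified peptide and its protein is never found in the initial search. Run together, the first spectrum identifies the protein and refinement then matches the second.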

16. If I increase the total peak number (spectrum, total peaks), why does the number of resultant matches decrease?

This is the expected behavior in some cases, depending on the quality of the spectra and the number of peaks in the file. Including more than 50 peaks can increase the number of 'garbage' matches, which will be discarded by Tandem. We have tested this extensively using spectra from different mass spectrometers and have found that 50 peaks is most effective. For more information on this parameter, see the API documentation.
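
As a rough picture of what limiting the peak count means, the Python sketch below keeps only the most intense peaks of a spectrum. It assumes the limit is applied by intensity; the function name and the sample peaks are invented, and the API documentation remains the reference for how the parameter behaves.

```python
from typing import List, Tuple

Peak = Tuple[float, float]  # (m/z, intensity)

def keep_most_intense(peaks: List[Peak], total_peaks: int = 50) -> List[Peak]:
    """Keep the total_peaks most intense peaks and return them sorted by m/z."""
    strongest = sorted(peaks, key=lambda peak: peak[1], reverse=True)[:total_peaks]
    return sorted(strongest, key=lambda peak: peak[0])

peaks = [(100.0, 5.0), (200.0, 50.0), (300.0, 1.0), (400.0, 20.0)]
print(keep_most_intense(peaks, total_peaks=2))   # [(200.0, 50.0), (400.0, 20.0)]
```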

17. How are the false positives calculated?

The false positives are calculated using the number of true positives and the expectation value limit that was used for the search. The plus/minus value is the square root of the false positives value. For example, a false positives value of 9 is reported as 9 ± 3.

18. What do the various diagrams represent?

The coverage diagram is displayed at the top of the protein page, on the accession number page as a list and on the protein validation page. The red sections represent the peptides that have been identified for the result and their placement along the entire protein length. The opacity is based on the expectation value for that peptide. The darker the color, the better the expectation value.

The spectrum diagram is displayed on the peptide page. It represents the mass spectrum data for the current peptide identification, using vertical lines to represent peaks. The y-axis is the relative intensity, calculated by scaling the peaks so that the most intense peak is equal to the value of the spectrum, dynamic range input parameter. See the API documentation for more details on this parameter. The x-axis is the mass range of the spectrum data. The different colors represent the ion type.

Red: y ions
Yellow: y-17 ions
Blue: b ions
Green: b-17 ions
Black: unassigned ions
Mauve: trivial neutral loss ions
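
The relative intensity scaling used for the y-axis of the spectrum diagram can be sketched in Python as follows. The function name is made up, and the value of 100 used for the dynamic range is an assumption for the example; use whatever value your search parameters specify.

```python
from typing import List

def normalize_intensities(intensities: List[float], dynamic_range: float = 100.0) -> List[float]:
    """Scale intensities so that the most intense peak equals dynamic_range."""
    if not intensities or max(intensities) <= 0:
        return list(intensities)  # nothing to scale
    top = max(intensities)
    return [value * dynamic_range / top for value in intensities]

print(normalize_intensities([10.0, 50.0, 200.0]))   # [5.0, 25.0, 100.0]
```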

The diagram above the spectrum diagram (showing the sequence with vertical lines between the residues) represents the b and y ion breakdown from the spectrum for assigned ions. The length of each vertical line represents the intensity of the ion peak at that point in the sequence.

The diagram to the left of the spectrum is the delta scatter diagram. The position of the red and blue 'dots' is based on the identified ion mass (y-axis) and the difference, in Daltons, between the observed and calculated masses (x-axis). Hover the mouse over the 'dots' to see the ion mass.
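
The coordinates of the dots can be pictured with a small Python sketch. It assumes the observed mass is the one plotted on the y-axis, which the description above does not state explicitly; the function name and the masses are invented.

```python
from typing import List, Tuple

def delta_scatter_points(assigned_ions: List[Tuple[float, float]]) -> List[Tuple[float, float]]:
    """Turn (observed mass, calculated mass) pairs into (x, y) points.

    x is the difference in Daltons (observed - calculated), y is the ion mass.
    """
    return [(observed - calculated, observed) for observed, calculated in assigned_ions]

ions = [(500.5, 500.25), (750.0, 750.5)]   # made-up (observed, calculated) masses
print(delta_scatter_points(ions))           # [(0.25, 500.5), (-0.5, 750.0)]
```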

The diagrams on the details|supporting evidence page are:

Hyperscore Expectation Function and Convolution Survival Function. These diagrams represent the values that were used in scoring the spectrum. Details on the formulae used for this can be found in the paper called: A Method for Assessing the Statistical Significance of Mass Spectrometry-Based Protein Identifications Using General Scoring Schemes, David Fenyö and Ronald C. Beavis, Anal. Chem., 2003, 75, 768-774.
y and b ion Histograms. The y-axis is the number of peptides and the x-axis is the number of ions. So in the b ion example below, the first vertical line means that there were 456 peptides with 0 b ions, the second line is 743 peptides with 1 b ion, the third line is 229 peptides with 2 b ions and so on. The last value that is shown is the largest non-zero value. In this case, 3 peptides had 10 b ions.
Spectra Histogram. A simplified version of the spectrum diagram as shown above.
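
The way the y and b ion histograms are read can be sketched in Python: given the number of b (or y) ions found for each peptide, count how many peptides fall at each value, up to the largest value that actually occurs. The function name and the sample counts are made up for illustration.

```python
from collections import Counter
from typing import List

def ion_count_histogram(ions_per_peptide: List[int]) -> List[int]:
    """Return a list where position n holds the number of peptides with n ions."""
    if not ions_per_peptide:
        return []
    counts = Counter(ions_per_peptide)
    largest = max(counts)  # the last value shown is the largest non-zero one
    return [counts.get(n, 0) for n in range(largest + 1)]

print(ion_count_histogram([0, 0, 1, 2, 2, 2, 4]))   # [2, 1, 3, 0, 1]
```

In this sample, two peptides had 0 ions, one had 1 ion, three had 2 ions, none had 3 and one had 4.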

Example:

19. How can I add a custom database?

You can add custom databases to GPM if they are in fasta format by following these steps:

Edit the file called species.js found in the thegpm/tandem/ folder. Copy an entire line and paste it at the end of the file, editing the values to something appropriate for the new entry:
```
  document.writeln("<option value=\"yeast\">S.cerevisiae</option>");
  document.writeln("<option value=\"newdatabase\">New database name</option>");
```

Edit the file called taxonomy.xml found in the thegpm/tandem/ folder. Copy an entire taxon entry, paste it at the end of the file and edit the URL and taxon label to match the file name and the value used in the species.js file:

	<taxon label="human">
		<file format="peptide" URL="../fasta/human_e.fasta.pro" />
		<file format="peptide" URL="../fasta/crap.fasta.pro" />
	</taxon>
	<taxon label="newdatabase">
		<file format="peptide" URL="../fasta/newdatabase.fasta" />
	</taxon>

Refresh The GPM search page and the value 'New database name' will be displayed in the taxonomy drop down list.
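
If you add databases often, the two edits above can be generated with a script. The Python sketch below builds the species.js line and appends a taxon entry to a taxonomy document held in a string. The helper names are hypothetical, and the bioml root element in the sample is an assumption; check your own taxonomy.xml before relying on it.

```python
import xml.etree.ElementTree as ET

def species_option_line(label: str, display_name: str) -> str:
    """Build the document.writeln line to paste at the end of species.js."""
    return f'document.writeln("<option value=\\"{label}\\">{display_name}</option>");'

def add_taxon(taxonomy_xml: str, label: str, fasta_url: str) -> str:
    """Append a <taxon> entry with one peptide <file> and return the new XML text."""
    root = ET.fromstring(taxonomy_xml)
    taxon = ET.SubElement(root, "taxon", {"label": label})
    ET.SubElement(taxon, "file", {"format": "peptide", "URL": fasta_url})
    return ET.tostring(root, encoding="unicode")

# Sample document; the root element name is assumed for this example.
sample_taxonomy = (
    '<bioml>'
    '<taxon label="human"><file format="peptide" URL="../fasta/human_e.fasta.pro" /></taxon>'
    '</bioml>'
)

print(species_option_line("newdatabase", "New database name"))
print(add_taxon(sample_taxonomy, "newdatabase", "../fasta/newdatabase.fasta"))
```

The label passed to both functions must be the same string, since the value in species.js is what selects the taxon entry.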

The fasta files can also be optimized for Tandem by running them through fasta_pro.exe. See here for details on fasta_pro. The source can be downloaded here.