New GPMDB site
Thanks to David Fenyö and the NIH National Research Resource Center at
Rockefeller University, we have been able to upgrade the capabilities of the
GPMDB server system. The new system features some improved navigation and
search pages as well as an improved system architecture to make adding
additional servers easier (NIH Research Resource grant RR00862).
Sequence updates (2005/12/13)
We have updated some of the proteome sequence files, to reflect new data from
our primary sequence sources. These updates are as follows:
- ENSEMBL Bos taurus has been updated to the BTAU 2.0 version of the
genome (this is a significantly better translation than the previous BTAU 1.0);
- ENSEMBL Gallus gallus has been updated to a better build of WASHUC1; and
- SGD S. cerevisiae has been updated to the Dec. 2005 build, which has
changes to several genes.
New release of X! Tandem (2005/12/01)
A new maintenance release of X! Tandem (2005.12.01) is available from the
FTP site. This revision was made to maintain compatibility with the
evolving XML standards for representing mass spectra, as well as to add one new
protein cleavage type. This new version supports the "msRun" variant
of the mzXML, as well as three variants of mzData's specification for parent
- Improved handling of hex-encoded binary information in mzXML and mzData
files for 64-bit processors, and an improved system for detecting XML file
types, added by Steven Wiley (VLST Corp.).
- Addition of testing for N-terminal glutamic acid cyclization, suggested by Oleg
Krokhin (Manitoba Centre for Proteomics and Systems Biology).
- Addition of "semi" enzymatic cleavage (specific enzyme cleavage at
one end of a peptide and non-specific cleavage at the other), suggested by Matt
- Support for variant methods of expressing parent ion charge in mzData v. 1.05,
added by Fredrik Levander (University of Lund).
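As a rough illustration of what "semi" enzymatic cleavage means, the following Python sketch enumerates peptides that keep one tryptic end while the other end is cut anywhere. It is not X! Tandem's implementation: the function names, the simple trypsin rule (cut after K or R unless followed by P), the min_len cutoff and the omission of missed cleavages are all assumptions made for the example.

```python
import re


def tryptic_peptides(protein):
    """Split after K or R, except when the next residue is P (assumed trypsin rule)."""
    return [p for p in re.split(r"(?<=[KR])(?!P)", protein) if p]


def semi_tryptic_peptides(protein, min_len=5):
    """Return fully tryptic peptides plus those with one non-specific end.

    Missed cleavages are ignored to keep the sketch short.
    """
    found = set()
    for pep in tryptic_peptides(protein):
        n = len(pep)
        # Keep the enzymatic C-terminus and vary the N-terminus.
        for start in range(0, n - min_len + 1):
            found.add(pep[start:])
        # Keep the enzymatic N-terminus and vary the C-terminus.
        for end in range(min_len, n + 1):
            found.add(pep[:end])
    return sorted(found)


if __name__ == "__main__":
    for peptide in semi_tryptic_peptides("MKWVTFISLLR"):
        print(peptide)
```

The search space grows roughly with the length of each tryptic peptide, which is why a semi-enzymatic search is slower than a fully specific one.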
New tool from proteomecommons.org (2005/11/28)
The busy folks at the University of Michigan have created an interesting tool
that uses information gathered in GPMDB to improve the confidence of their
protein assignments. In their words:
A new tool has been added to the ProteomeCommons.org collection. This tool will
take a protein id and look up the peptides you'd expect to identify for that
protein using GPMDB, i.e. ask what have others found. You can then restrict the
list of known peptides by a given mass range. Optionally you can add in
peptides from the protein's tryptic digest or you can modify peptides with
known amino-acid modifications or you can add any arbitrary mass shift. When
you are all done the tool will create a plain-text file of the peptide's masses
for inclusion in a MSMS analysis.
You can retrieve the tools and get more information from the
project homepage at proteomecommons.org.
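The workflow described in the quote (collect known peptides, restrict them to a mass range, optionally apply a mass shift, then write a plain-text mass list) can be sketched in a few lines of Python. This is not the ProteomeCommons.org tool itself: the function names are hypothetical, the peptide list is assumed to have been retrieved from GPMDB already, and the one-mass-per-line output layout is an assumption.

```python
# Monoisotopic residue masses in Da (standard values, rounded to 5 decimals).
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056


def peptide_mass(sequence, shift=0.0):
    """Neutral monoisotopic mass of a peptide, plus an arbitrary mass shift."""
    return sum(MONO[aa] for aa in sequence) + WATER + shift


def inclusion_list(peptides, low, high, shift=0.0):
    """Sorted, de-duplicated masses that fall inside [low, high]."""
    masses = sorted({round(peptide_mass(p, shift), 4) for p in peptides})
    return [m for m in masses if low <= m <= high]


def as_plain_text(masses):
    """One mass per line (hypothetical output layout)."""
    return "\n".join(f"{m:.4f}" for m in masses)


if __name__ == "__main__":
    known = ["LVNELTEFAK", "YLYEIAR", "GG"]  # stand-ins for GPMDB peptides
    print(as_plain_text(inclusion_list(known, 500.0, 1500.0)))
```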
Overall system updates (2005/11/3)
We've had a busy month, updating our servers and adding new features to the GPM.
As GPMDB gets closer and closer to the 10,000,000 peptides-assigned mark, we
have been trying to keep up with new information sources that have become
available. Two of the new services available for Homo sapiens proteins
are the Human Protein Atlas and
the Haplotype Mapping Project.
The Human Protein Atlas contains annotated photomicrographs showing
immunologically stained tissue sections from a large set of healthy and
diseased human tissues. The goal of the project is to produce protein
expression information for all of the genes in the human genome. Currently,
they have a full set of data for approximately 1000 genes.
The International HapMap Project is a survey of the differences in haplotype for
a cross-section of the human population (their site gives a fuller
explanation of the project). It has amassed a large
amount of useful information about variations in the human genome.
We have also just added a new server for Mus musculus searches, similar
to those already in place for other species. It can be accessed at
mouse.thegpm.org. This computer is also the first 64-bit server in the
GPM system. We plan to have upgraded all of our search engine systems to 64-bit
processors by the end of February, 2006.
X! Tandem update available (2005/10/19)
Thanks to the tireless efforts of our testers, several problems with the
2005.10.01.3 release version of X! Tandem have been corrected. The chief
problem was that under some rare circumstances, incorrect assignments of
modified peptides could be made, if a particular peptide had a very large
number of residues that could be modified. We'd particularly like to thank
Achim Treumann, at the Royal College of Surgeons in Ireland, who first noticed
The release versions of the GPM and X! Tandem for all platforms have been
updated to the 2005.10.01.5 version of X! Tandem. Our apologies for any
inconvenience this may have caused. This problem did not affect P3, or any of
our other projects.
X! Tandem available on Biowulf (2005/10/6)
The Biowulf MPI cluster at the NIH has
added X! Tandem as an
application for NIH users. This large cluster (2400 Opteron, Xeon, and
XP/Athlon processors with an aggregate floating-point performance of 10 TFLOPS)
is used for bioinformatics calculations.
New releases of X! Tandem, the GPM and GPMDB available (2005/10/5)
New releases of X! Tandem, the GPM and GPMDB are now available
from ftp.thegpm.org. These
new releases contain all of the new features and fixes that have been added
since the 2005/06/15 release, including:
- GO annotation diagrams;
- improved potential modification searching;
- PRIDE 2.0 XML compatibility;
- protein "intersection" searches; and
- multi-window species selection.
In addition, a new service pack
for existing GPM-USB devices
is available. Once the service pack is installed, it is now possible to configure these devices
as full web servers. A CD-installable version of the GPM is also available, for educational and
GPMDB-US comes on-line (2005/09/25)
GPMDB, our proteomics data repository and
experiment validation database, has broadened its connectivity with the
addition of a sister site, GPMDB-US.
This site contains all of the information in GPMDB and it is located at
Rockefeller University, in the Mass Spectrometry and Gaseous Ion Chemistry Laboratory
headed by Brian Chait.
David Fenyö has taken on the task of setting up and maintaining the
servers. This site will receive daily updates of information gathered by GPM. We would
like to thank the National Institutes of Health National Centers for Research
Resources program for providing the funding that made this new site
New look for the GPM (2005/09/14)
We are in the final stages of putting together the October release
of the GPM. As a preview, the public GPM sites will be converted over to
the new interface style over the next few days. These changes include:
- Two taxon entry panes, one with eukaryote proteomes and the other
with prokaryotes. The normal eukaryote sites will have a selection
of prokaryotes, while the dedicated prokaryote site will
have all of the prokaryotes that NCBI provides. Remember that you can select as many entries as you like from
- The ability to select which set of fragment ion series (a, b, c, x, y, or z, on the Advanced search page) you would like to
use for your search. Previously, this had been fixed to only b & y ions.
- You may select to use either monoisotopic or average fragment ion masses for a search (Advanced search page).
- Addition of Apis mellifera (domestic honey bee), Bos taurus (domestic cow) and Silurana tropicalis (western clawed frog)
to the normal eukaryote sites. Silurana tropicalis, previously known as
Xenopus tropicalis, is a close relative of Xenopus laevis (African clawed frog).
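To make the fragment ion series option more concrete, here is a small Python sketch that computes singly charged fragment masses for a chosen set of series. It is an illustration only, not X! Tandem's code: the function name is hypothetical, the offsets are the usual nominal ones (conventions for c, x and z ions vary between tools), and only monoisotopic masses are shown. Switching to average masses would amount to passing a different mass table.

```python
# Monoisotopic residue masses in Da (standard values, rounded to 5 decimals).
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.00728
WATER = 18.01056
# Offsets relative to the b ion (N-terminal series) and y ion (C-terminal series).
N_TERM = {"a": -27.99491, "b": 0.0, "c": 17.02655}
C_TERM = {"x": 25.97926, "y": 0.0, "z": -17.02655}


def fragment_ions(peptide, series=("b", "y"), masses=MONO):
    """Singly charged fragment m/z values for each requested ion series.

    N-terminal lists run from fragment 1 upward; C-terminal lists run from
    the longest fragment down to fragment 1.
    """
    for s in series:
        if s not in N_TERM and s not in C_TERM:
            raise ValueError(f"unknown ion series: {s}")
    ions = {s: [] for s in series}
    for i in range(1, len(peptide)):
        prefix = sum(masses[aa] for aa in peptide[:i]) + PROTON
        suffix = sum(masses[aa] for aa in peptide[i:]) + WATER + PROTON
        for s in series:
            if s in N_TERM:
                ions[s].append(round(prefix + N_TERM[s], 4))
            else:
                ions[s].append(round(suffix + C_TERM[s], 4))
    return ions


if __name__ == "__main__":
    print(fragment_ions("GAK", series=("a", "b", "y")))
```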
A more detailed description of the changes to X! Tandem that allow some of these
new features will be made available, once the code is ready for release.
GPMDB Maintenance (2005/09/13)
GPMDB will be taken offline for maintenance at 6:00 PM on Sept. 13, 2005 and
brought back up by 9:00 AM Sept. 14, 2005. We are performing some maintenance
and testing necessary to bring a new mirror site at Rockefeller University
Peptide spectrum library searches (2005/09/10)
A new GPM application, X! Hunter,
has reached the point where it is
ready for public testing. X! Hunter is a different style of peptide
identification search engine. Rather than predicting spectra from
a peptide sequence, it directly compares an input spectrum to a library
of spectra that have been confidently assigned to a particular peptide
sequence. This type of pattern matching tool is ideal for applications such
as biomarker discovery, molecular scanners and instrument control, where
obtaining a confident match for a single spectrum quickly is important.
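The idea of comparing an input spectrum directly against a library can be sketched with a generic normalized dot product (cosine similarity) over binned peaks. The scoring that X! Hunter actually uses is not described here, so this is an assumed stand-in: the bin width, the function names and the toy library are all hypothetical.

```python
import math
from collections import defaultdict


def bin_spectrum(peaks, width=1.0):
    """Sum peak intensities into fixed-width m/z bins."""
    binned = defaultdict(float)
    for mz, intensity in peaks:
        binned[int(mz / width)] += intensity
    return binned


def cosine(a, b):
    """Normalized dot product of two binned spectra (0.0 if either is empty)."""
    dot = sum(value * b.get(key, 0.0) for key, value in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def best_match(query_peaks, library, width=1.0):
    """Return (score, peptide) for the most similar library spectrum.

    The library must contain at least one entry.
    """
    query = bin_spectrum(query_peaks, width)
    return max(
        (cosine(query, bin_spectrum(peaks, width)), peptide)
        for peptide, peaks in library.items()
    )


if __name__ == "__main__":
    library = {
        "PEPTIDE": [(100.1, 10.0), (200.2, 5.0)],
        "OTHER": [(300.0, 1.0)],
    }
    print(best_match([(100.2, 10.0), (200.4, 5.0)], library))
```

Because no theoretical spectra have to be generated, a lookup of this kind touches only the library entries, which is what makes single-spectrum matching quick.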
Using spectrum libraries is not at all new:
this type of pattern matching strategy has been
used in all forms of analytical spectroscopy (including mass spectrometry) since
the 1950s. The only reason it hasn't been applied to peptide mass spectra is
the obvious difficulty of obtaining exemplar spectra for all of the possible
peptides in a proteome.
Fortunately, we happen to have a database of nine million
examples, GPMDB. To create the libraries for X! Hunter, all of the confident assignments
for human and yeast peptides were extracted from GPMDB. Then spectra that were
replicate observations of the same peptide were averaged together and a final list of about
110,000 averaged peptide spectra was produced.
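A minimal Python sketch of the averaging step might look like the following. It assumes that replicate spectra are grouped by peptide sequence alone and that peaks are combined in fixed-width m/z bins; the function name, the bin width and the input layout are hypothetical, and a real library would likely also key on charge state and modifications.

```python
from collections import defaultdict


def average_spectra(assignments, width=1.0):
    """Average replicate spectra assigned to the same peptide.

    assignments: iterable of (peptide, peaks) pairs, where peaks is a list
    of (m/z, intensity) tuples. Returns {peptide: {bin_index: mean_intensity}}.
    """
    sums = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(int)
    for peptide, peaks in assignments:
        counts[peptide] += 1
        for mz, intensity in peaks:
            sums[peptide][int(mz / width)] += intensity
    # Divide by the number of replicates, so a peak missing from some
    # replicates is averaged in as zero intensity.
    return {
        peptide: {b: total / counts[peptide] for b, total in bins.items()}
        for peptide, bins in sums.items()
    }


if __name__ == "__main__":
    replicates = [
        ("AAK", [(100.0, 10.0)]),
        ("AAK", [(100.4, 20.0), (200.0, 4.0)]),
    ]
    print(average_spectra(replicates))
```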
Please give X! Hunter a try (there are
several examples). Let us know what you think.
Experiments with Gene Ontology (2005/08/22)
Selected Gene Ontology (GO)
terms have been added as a permanent part of the GPM display structure.
On the top of model listing pages for ENSEMBL human and SGD yeast sequences,
a new link to the "GO" page is now available. You can view histograms
or pie charts of your data, classified according to the ENSEMBL GO annotations.
- GPM10100001010, human sample, histogram
- GPM06600002542, yeast sample, pie chart
Communication/cross-posting with PRIDE (2005/08/22)
The European Bioinformatics Institute's entry into the proteomics
repository field, the PRoteomics IDEntification database
(PRIDE), has recently been upgraded. It is now possible to interchange data between
GPMDB and PRIDE, using their newly defined PRIDE 2.0 XML, which can be easily generated
from GPMDB's BIOML data files. We are beginning to transfer selected information
into PRIDE, which can be accessed through the PRIDE experiment number query interface.
The initial entries from GPMDB can be accessed by PRIDE_EXP:0000108 to
New version of X! Tandem available (2005/08/16)
A new version of X! Tandem (v. 2005.08.15.3) has been released
that adds some new features and improves on some older ones. We would
like to thank the following contributors:
- Brendan MacLean (Fred Hutchinson Cancer Research Center) for improving
the internal consistency of high accuracy mass calculations;
- Patrick Lacasse (Laval University) for suggesting a mechanism to
force the selection of a given file format, even if it does not
meet the requirements for automatic detection;
- Rob Craig (Beavis Informatics) for completing the conversion of the
older, custom XML handlers into ExPat-compatible handlers; and
- Torsten Schwede and Michael Podvinec (Biozentrum, University of Basel) for
tracking down a memory access issue that resulted in stability problems when
X! Tandem was deployed across a PC Grid system.
Further Indexing by Google (2005/08/16)
In addition to the earlier indexing, Google has begun indexing individual
results in the GPMDB. Google queries such as:
- "gpmdb clathrin" (protein keyword);
- "gpmdb SNEEGSEEKGPEVR" (tryptic peptide sequence);
- "gpmdb GPM87400000110" (GPM ID number); or
- "gpmdb apolipoprotein haptoglobin" (multiple keywords)
all return results now. This facility should make it easy for users
to quickly enter into the GPMDB to find their own data, as well as
to cross-reference their results with those obtained by other researchers.
Bos taurus ENSEMBL genome available (2005/08/02)
ENSEMBL has recently added the annotation of
Btau 1.0 to its site. We have updated the B.
taurus GPM site to include this new information.
New Human Plasma Data Available (2005/08/02)
Dick Smith's group at Pacific Northwest National Laboratory has kindly
made a large set of measurements on human plasma available to GPMDB. These
measurements are a strong supplement to the Human Plasma Proteome data
deposited by Gil Omenn's HUPO team earlier this year.
The results can be accessed individually (they are numbered sequentially) from
GPM Disruption (2005/07/17)
After recovering gracefully from the power disruption last week, some parts of
the GPM were knocked out by a large thunderstorm in Winnipeg on Sunday morning.
Thanks to Shawn Walbridge of SynAck Hosting for his repairs to the system.
GPM Maintenance Service Disruption (2005/07/08)
Scheduled maintenance of the power system at one of the two main GPM data
centres will occur between 18:00 and 19:00 (CDT) on Sunday July 10, 2005. It is
possible that some service disruption will occur. We will try and get
everything back up and running smoothly as quickly as possible.
S. pombe and T. annulata added to GPM (2005/07/08)
The proteome of the fission yeast S. pombe has been added to the species
list for the eukaryote dedicated mirrors of
GPM. These sequences link through to GeneDB
as the primary source of sequence information. Also from GeneDB, the tick-borne
cattle parasite Theileria annulata, has been added to the
Two new cluster versions of X! Tandem (2005/06/28)
We are very happy to announce the release of two new clustering interfaces for
X! Tandem, designed and implemented by Andy Link's group at Vanderbilt
University. These interfaces use the popular Message Passing Interface (MPI)
and the Parallel Virtual Machine (PVM) standards to tie together multiple
computers to allow a single X! Tandem job to execute on multiple computers.
Initial documentation about the project can be found here
and the code found at our ftp site.
The details of the project have been accepted for publication in the Journal of
A new service pack release of GPM-USB (2005/06/28)
For those people who have purchased a
GPM-USB device from Beavis Informatics, a new service pack (2005.07.01)
has been released. To update your system, click
here and follow the instructions. The service pack includes a number of
- integrated P3 support;
- support for custom amino acid residue mass definitions;
- numerous upgrades to display scripts; and
- the most recent version of GPM Manager.
Bos taurus (domestic cow) now has its own site (2005/06/28)
Due to popular demand, a site dedicated to B. taurus
has been constructed. The bovine genome has not yet been entered into the
ENSEMBL system, so the proteome sequences are derived from the latest version
of the genome held at NCBI. When the ENSEMBL system is available, the site will
be updated to include the more informative genome links.
Aurum data added to GPMDB (2005/06/16)
The Aurum data collection has been analyzed and imported into GPMDB. This data
set was produced from recombinant human proteins and can be used as a set of
high-quality examples of peptide spectra from the ABI 4700 TOF-TOF instrument.
The results, by plate number, are as follows:
A new release of X! Tandem and P3 (2005/06/03)
The 2005.06.01.2 release of X! Tandem and P3 is now available. This
new release brings the code base for the two projects much closer together,
adding the ability to read MSDATA and MSXML files to P3. It also
corrects an issue pointed out by Phillip Wilmarth at OHSU, that could result in
some incorrect protein expectation values in very large MudPIT datasets with
large numbers of redundant identifications.
GPMDB has been googled (2005/06/02)
The popular web server indexing service Google
has indexed a large portion of the GPMDB data collection. Querying Google with
a protein id number (such as an ENSEMBL id number) will now produce links into
GPMDB results for that protein. Thanks to Google for providing this additional
indexing for us.
GPMDB peptide count jumps to over 6.5 million (2005/05/03)
As of today, the number of annotated peptides in GPMDB has reached 6,613,809.
Detailed statistics can be found here.
The addition of a statistics archive link enables users to browse previous
summaries and watch the GPMDB progress.
GPMDB adds HUPO PPP results (2005/4/14)
The GPMDB has added a special range of model accession numbers for the results
generated by the Human Proteome Organization Plasma Proteome Project. The first
set of 611 results, obtained by analyzing publicly available data from the
PPP web site, has been made available. The results can be accessed by
GPM number, in the range
GPM10100000611. We would like to thank David States and Gil Omenn for
their cooperation and for allowing us to add this data to the GPMDB.
A new release of X! TANDEM (2005/3/21)
This is the first release of X! TANDEM to fully support the mzXML and mzData
spectrum input formats. The design and initial implementation for both formats
was done by Patrick Lacasse (Université Laval, Dept. of Medicine,
supported by Genome Québec), with eXpat support and refinement of the code
by Brendan MacLean at Fred Hutchinson Cancer Research Center. We would also
like to thank Patrick Pedrioli from the Institute for Systems Biology, who
wrote the mzXML parser that Patrick Lacasse used as a model for his implementation and
who has allowed us to make this available under the Artistic License. It should
be noted that our support for these standards, much like the standards
themselves, is preliminary and there may be some "flavours" of either
format that do not work as expected.
In addition, this release has a new optional parameter that allows the user to
specify a parameter file that contains masses for one or all of the amino acid
residues. This feature makes it possible to use non-standard amino acids, or
isotopically labelled amino acids. An example of using this feature to find
proteins that were made using all 15N amino acids is available at
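To show how such a set of residue masses could be derived, the Python sketch below adds the 15N/14N mass difference (about 0.997035 Da) for every nitrogen atom in each residue. It only computes the numbers: the layout of the X! Tandem parameter file is not shown here, and the function and variable names are hypothetical.

```python
N15_SHIFT = 0.997035  # mass difference between 15N and 14N, in Da

# Monoisotopic residue masses in Da (standard values, rounded to 5 decimals).
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}

# Number of nitrogen atoms in each residue (backbone plus side chain).
NITROGENS = {
    "G": 1, "A": 1, "S": 1, "P": 1, "V": 1, "T": 1, "C": 1, "L": 1,
    "I": 1, "N": 2, "D": 1, "Q": 2, "K": 2, "E": 1, "M": 1, "H": 3,
    "F": 1, "R": 4, "Y": 1, "W": 2,
}


def n15_residue_masses():
    """Residue masses with every nitrogen replaced by 15N."""
    return {
        aa: round(MONO[aa] + NITROGENS[aa] * N15_SHIFT, 5)
        for aa in MONO
    }


if __name__ == "__main__":
    for aa, mass in sorted(n15_residue_masses().items()):
        print(aa, f"{mass:.5f}")
```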
First release of P3 (2005/2/16)
The X! TANDEM P3 project is the first protein identification system
capable of using proteotypic peptides to accelerate searching and improve the
confidence of results. The system is built out of the X! TANDEM framework and
utilizes the GPM interface for its displays. The necessary proteotypic peptide
libraries are continuously updated from the GPMDB for human and yeast
proteomes. Proteotypic peptide libraries are much smaller than full proteomes,
so this type of searching runs quite a bit quicker than standard searches.
First release of the Jasper spectrum collection (2005/2/16)
The Jasper spectrum collection is a new type of bioinformatics resource, made
available as part of the Quartz spectrum library. Jasper collections contain
the best spectrum-to-peptide assignments from the GPMDB, broken down into
categories based on the reliability of the assignment (based on the measured
expectation value for a peptide). These libraries of spectra are in XML files,
containing the peptide sequences (with PTMs) associated with individual
spectra that were assigned to those peptides. The first library contains about
64,000 high quality spectrum-to-sequence assignments.
New release of X! Tandem available (2005/2/16)
The 2005.02.01 release of X! Tandem is now available. The new features of this
release are mainly for programmers, particularly an improved mechanism for
adding in new scoring systems, elegantly added by Brendan MacLean. Some changes
have also been made to take further advantage of high accuracy parent ion mass
S. tropicalis genome sequence available (2005/2/10)
In addition to the other sequence resources available on the
Xenopus, the newly released protein predictions from the S. tropicalis
genome are now available. They are annotated using information from ENSEMBL.
This genome represents the first full sequence of an amphibian genome.
X! Tandem to use mzData and mzXML input standards (2005/1/21)
We are happy to announce that as the result of the most recent Standards in
Proteomics meeting held by the NIDDK in Washington earlier this month, X!
Tandem will support both MS/MS data representations, as proposed by HUPO-PSI
and the Institute for Systems Biology. The development work to incorporate the
two standards has begun and the finished software should be available by the
end of February. Many thanks to Patrick Lacasse (Université Laval, Dept.
of Medicine, supported by Genome Québec) for generating the mzXML and
mzData parser classes. We would also like to thank Randy Julian (Eli Lilly) for
his co-operation and help with adding mzData.
GPMDB begins collaboration with NIST (2005/1/21)
We are happy to announce that Dr. Steve Stein from the US National Institute of
Standards and Technology is now collaborating with us to produce a standardized
library of peptide MS/MS spectra to be used for the improvement of protein
identification algorithms. The donated entries in GPMDB will be statistically
evaluated and an "average" spectrum for a particular peptide, based
on its modifications and charge state, will be developed. Dr. Stein has worked
on the development of similar spectrum libraries for use with small molecule
identification for many years and we are very happy to be of assistance in
developing similar approaches for proteomics. Dr. Stein expects to announce the
preliminary results of his work at the US-HUPO meeting this spring.
GPM source code now mirrored on Proteome Commons (2005/1/18)
As a result of our collaboration with the Michigan Proteomics Consortium, we are
happy to announce the inclusion of all GPM software in the new
proteomecommons.org open source software archive. Many thanks to Jayson
Falkner, Pete Ulintz and Phil Andrews for creating this new site, which we hope
will be of general value to the proteomics community.
Copyright © 2005, The Global Proteome Machine Organization