The GPM News Archive, 2005

The Global Proteome Machine Organization

News Archive


New GPMDB site launch (2005/12/13)

Thanks to David Fenyö and the NIH National Research Resource Center at Rockefeller University, we have been able to upgrade the capabilities of the GPMDB server system. The new system features some improved navigation and search pages as well as an improved system architecture to make adding additional servers easier (NIH Research Resource grant RR00862).

Sequence updates (2005/12/13)

We have updated some of the proteome sequence files to reflect new data from our primary sequence sources. These updates are as follows:

ENSEMBL Bos taurus has been updated to the BTAU 2.0 version of the genome (this is a significantly better translation than the previous BTAU 1.0);
ENSEMBL Gallus gallus has been updated to a better build of WASHUC1; and
SGD S. cerevisiae has been updated to the Dec. 2005 build, which has changes to several genes.

New release of X! Tandem (2005/12/01)

A new maintenance release of X! Tandem (2005.12.01) is available from the FTP site. This revision was made to maintain compatibility with the evolving XML standards for representing mass spectra, as well as to add one new protein cleavage type. This new version supports the "msRun" variant of mzXML, as well as three variants of mzData's specification for parent ion charges. The changes in this release are:

Improved handling of hex-encoded binary information in mzXML and mzData files on 64-bit processors, and an improved system for detecting XML file types, added by Steven Wiley (VLST Corp.).
Addition of testing for N-terminal glutamic acid cyclization, suggested by Oleg Krokhin (Manitoba Centre for Proteomics and Systems Biology).
Addition of "semi" enzymatic cleavage (specific enzyme cleavage at one end of a peptide and non-specific cleavage at the other), suggested by Matt Monroe (PNNL).
Support for variant methods of expressing parent ion charge in mzData v. 1.05, added by Fredrik Levander (University of Lund).
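
The "semi" enzymatic cleavage described in the list above can be illustrated with a short sketch. This is not X! Tandem's implementation: the function names, the length limits and the trypsin-like rule (cut after K or R unless followed by P) are assumptions chosen for the example. It enumerates every peptide that has at least one enzyme-specific end, so fully specific peptides are included as well.

```python
def tryptic_sites(sequence):
    """Return cut positions (indices between residues) for a trypsin-like rule.

    Assumed rule: cleave after K or R unless the next residue is P.
    Positions 0 and len(sequence), the protein termini, always count as sites.
    """
    sites = {0, len(sequence)}
    for i in range(len(sequence) - 1):
        if sequence[i] in "KR" and sequence[i + 1] != "P":
            sites.add(i + 1)
    return sites


def semi_specific_peptides(sequence, min_len=4, max_len=12):
    """Enumerate peptides with an enzyme-specific cut at one end or both.

    The other end may fall anywhere, which is what makes the search "semi"
    specific. Internal cleavage sites are not restricted in this sketch.
    """
    sites = tryptic_sites(sequence)
    n = len(sequence)
    peptides = set()
    for start in range(n):
        for end in range(start + min_len, min(n, start + max_len) + 1):
            if start in sites or end in sites:
                peptides.add(sequence[start:end])
    return peptides


if __name__ == "__main__":
    # Hypothetical sequence, used only to show the behaviour.
    for peptide in sorted(semi_specific_peptides("AAKCCDEFRGGHK")):
        print(peptide)
```

Because one end is unconstrained, the candidate list grows much faster than for a fully specific digest, which is why this kind of search takes longer to run.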

New tool from proteomecommons.org (2005/11/28)

The busy folks at the University of Michigan have created an interesting tool that uses information gathered in GPMDB to improve the confidence of their protein assignments. In their words:

A new tool has been added to the ProteomeCommons.org collection. This tool will take a protein id and look up the peptides you'd expect to identify for that protein using GPMDB, i.e. ask what have others found. You can then restrict the list of known peptides by a given mass range. Optionally you can add in peptides from the protein's tryptic digest or you can modify peptides with known amino-acid modifications or you can add any arbitrary mass shift. When you are all done the tool will create a plain-text file of the peptide's masses for inclusion in a MSMS analysis.

You can retrieve the tools and get more information from the project homepage at proteomecommons.org.

Overall system updates (2005/11/3)

We've had a busy month, updating our servers and adding new features to the GPM. As GPMDB gets closer and closer to the 10,000,000 peptides-assigned mark, we have been trying to keep up with new information sources that have become available. Two of the new services available for Homo sapiens proteins are the Human Protein Atlas and the Haplotype Mapping Project.

The Human Protein Atlas contains annotated photomicrographs showing immunologically stained tissue sections from a large set of healthy and diseased human tissues. The goal of the project is to produce protein expression information for all of the genes in the human genome. Currently, they have a full set of data for approximately 1000 genes.

The International HapMap Project is a survey of the differences in haplotype for a cross-section of the human population (click here for their explanation of the project). It has amassed a large amount of useful information about variations in the human genome.

We have also just added a new server for Mus musculus searches, similar to those already in place for other species. It can be accessed at mouse.thegpm.org. This computer is also the first 64-bit server in the GPM system. We plan to have upgraded all of our search engine systems to 64-bit processors by the end of February, 2006.

X! Tandem update available (2005/10/19)

Thanks to the tireless efforts of our testers, several problems with the 2005.10.01.3 release version of X! Tandem have been corrected. The chief problem was that under some rare circumstances, incorrect assignments of modified peptides could be made if a particular peptide had a very large number of residues that could be modified. We'd particularly like to thank Achim Treumann, at the Royal College of Surgeons in Ireland, who first noticed this issue.

The release versions of the GPM and X! Tandem for all platforms have been updated to the 2005.10.01.5 version of X! Tandem. Our apologies for any inconvenience this may have caused. This problem did not affect P3, or any of our other projects.

X! Tandem available on Biowulf (2005/10/6)

The Biowulf MPI cluster at the NIH has added X! Tandem as an application for NIH users. This large cluster (2400 Opteron, Xeon, and XP/Athlon processors with an aggregate floating-point performance of 10 TFLOPS) is used for bioinformatics calculations.

New releases of X! Tandem, the GPM and GPMDB available (2005/10/5)

New releases of X! Tandem, the GPM and GPMDB are now available from ftp.thegpm.org. These new releases contain all of the new features and fixes that have been added since the 2005/06/15 release, including:

GO annotation diagrams;
improved potential modification searching;
PRIDE 2.0 XML compatibility;
protein "intersection" searches; and
multi-window species selection.

In addition, a new service pack for existing GPM-USB devices is available. Once the service pack is installed, it is now possible to configure these devices as full web servers. A CD-installable version of the GPM is also available, for educational and laboratory use.

GPMDB-US comes on-line (2005/09/25)

GPMDB, our proteomics data repository and experiment validation database, has broadened its connectivity with the addition of a sister site, GPMDB-US. This site contains all of the information in GPMDB and it is located at Rockefeller University, in the Mass Spectrometry and Gaseous Ion Chemistry Laboratory headed by Brian Chait. David Fenyö has taken on the task of setting up and maintaining the servers. This site will receive daily updates of information gathered by GPM. We would like to thank the National Institutes of Health National Centers for Research Resources program for providing the funding that made this new site possible.

New look for the GPM (2005/09/14)

We are in the final stages of putting together the October release of the GPM. As a preview, the public GPM sites will be converted over to the new interface style over the next few days. These changes include:

Two taxon entry panes, one with eukaryote proteomes and the other with prokaryotes. The normal eukaryote sites will have a selection of prokaryotes, while the dedicated prokaryote site will have all of the prokaryotes that NCBI provides. Remember that you can select as many entries as you like from either pane.
The ability to select which set of fragment ion series (a, b, c, x, y, or z, on the Advanced search page) you would like to use for your search. Previously, this had been fixed to only b & y ions.
You may select to use either monoisotopic or average fragment ion masses for a search (Advanced search page).
Addition of Apis mellifera (domestic honey bee), Bos taurus (domestic cow) and Silurana tropicalis (African clawed frog) to the normal eukaryote sites. Silurana tropicalis, previously known as Xenopus tropicalis, is a close relative of Xenopus laevis.

A more detailed description of the changes to X! Tandem that allow some of these new features will be made available, once the code is ready for release.
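
As an illustration of what the ion-series and mass-type choices mean, here is a minimal sketch that computes singly charged b and y fragment m/z values using either monoisotopic or average residue masses. It is not taken from X! Tandem; the function name `by_ions` and the table layout are assumptions made for the example, and the residue masses are the standard published values rounded to a few decimal places. The a, c, x and z series differ from these by fixed offsets and are left out for brevity.

```python
# Standard amino acid residue masses (Da), monoisotopic and average.
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
AVG = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167,
    "V": 99.1326, "T": 101.1051, "C": 103.1388, "L": 113.1594,
    "I": 113.1594, "N": 114.1038, "D": 115.0886, "Q": 128.1307,
    "K": 128.1741, "E": 129.1155, "M": 131.1926, "H": 137.1411,
    "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = {"mono": 18.010565, "avg": 18.01528}
PROTON = {"mono": 1.007276, "avg": 1.00739}


def by_ions(peptide, mass_type="mono"):
    """Return (b, y) lists of singly charged fragment m/z values.

    b[i] covers the first i+1 residues; y[i] covers the last i+1 residues.
    mass_type is "mono" or "avg".
    """
    table = MONO if mass_type == "mono" else AVG
    masses = [table[aa] for aa in peptide]
    b, y = [], []
    for i in range(1, len(peptide)):
        b.append(sum(masses[:i]) + PROTON[mass_type])
        y.append(sum(masses[-i:]) + WATER[mass_type] + PROTON[mass_type])
    return b, y


if __name__ == "__main__":
    for mass_type in ("mono", "avg"):
        b, y = by_ions("GAK", mass_type)
        print(mass_type, [round(m, 4) for m in b], [round(m, 4) for m in y])
```

Average masses are always somewhat higher than monoisotopic ones, so the choice matters most when the instrument cannot resolve the isotope peaks of a fragment.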

GPMDB Maintenance (2005/09/13)

GPMDB will be taken offline for maintenance at 6:00 PM on Sept. 13, 2005 and brought back up by 9:00 AM on Sept. 14, 2005. We are performing some maintenance and testing necessary to bring a new mirror site at Rockefeller University online.

Peptide spectrum library searches (2005/09/10)

A new GPM application, X! Hunter, has reached the point where it is ready for public testing. X! Hunter is a different style of peptide identification search engine. Rather than predicting spectra from a peptide sequence, it directly compares an input spectrum to a library of spectra that have been confidently assigned to a particular peptide sequence. This type of pattern matching tool is ideal for applications such as biomarker discovery, molecular scanners and instrument control, where obtaining a confident match for a single spectrum quickly is important.

Using spectrum libraries is not at all new: this type of pattern matching strategy has been used in all forms of analytical spectroscopy (including mass spectrometry) since the 1950s. The only reason it hasn't been applied to peptide mass spectra is the obvious difficulty of obtaining exemplar spectra for all of the possible peptides in a proteome.

Fortunately, we happen to have a database of nine million examples, GPMDB. To create the libraries for X! Hunter, all of the confident assignments for human and yeast peptides were extracted from GPMDB. Then spectra that were replicate observations of the same peptide were averaged together and a final list of about 110,000 averaged peptide spectra was produced.
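
The general idea of averaging replicate spectra and matching a query against the resulting library can be sketched as follows. This is only an illustration of library searching in general, not X! Hunter's actual scoring: the 1 Da binning, the normalized dot product and all function names are assumptions made for the example, and the peptide labels and peak lists are invented.

```python
import math
from collections import defaultdict


def _unit(vec):
    """Scale a sparse vector (dict of bin -> intensity) to unit length."""
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {k: v / norm for k, v in vec.items()} if norm else {}


def bin_spectrum(peaks, bin_width=1.0):
    """Turn (m/z, intensity) pairs into a unit-length binned vector."""
    vec = defaultdict(float)
    for mz, intensity in peaks:
        vec[int(mz / bin_width)] += intensity
    return _unit(vec)


def average_spectra(replicates, bin_width=1.0):
    """Average replicate spectra of one peptide into a single library entry."""
    total = defaultdict(float)
    for peaks in replicates:
        for k, v in bin_spectrum(peaks, bin_width).items():
            total[k] += v
    return _unit(total)


def similarity(a, b):
    """Dot product of two unit vectors: 1.0 for identical, 0.0 for disjoint."""
    return sum(v * b.get(k, 0.0) for k, v in a.items())


def best_match(query_peaks, library, bin_width=1.0):
    """Return (score, peptide) for the library entry most similar to the query."""
    query = bin_spectrum(query_peaks, bin_width)
    return max((similarity(query, spec), pep) for pep, spec in library.items())


if __name__ == "__main__":
    # Hypothetical library built from invented replicate observations.
    library = {
        "PEPTIDE_A": average_spectra([[(100.2, 10), (200.4, 5)],
                                      [(100.1, 12), (200.3, 6)]]),
        "PEPTIDE_B": average_spectra([[(150.1, 8), (300.3, 4)]]),
    }
    print(best_match([(100.3, 9), (200.1, 6)], library))
```

A lookup of this kind needs no sequence database or fragment prediction at search time, which is what makes it attractive when a single spectrum must be identified quickly.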

Please give X! Hunter a try (there are several examples). Let us know what you think.

Experiments with Gene Ontology (2005/08/22)

Selected Gene Ontology (GO) terms have been added as a permanent part of the GPM display structure. At the top of model listing pages for ENSEMBL human and SGD yeast sequences, a new link to the "GO" page is now available. You can view histograms or pie charts of your data, classified according to the ENSEMBL GO annotations. For example:

GPM10100001010, human sample, histogram
GPM06600002542, yeast sample, pie chart

Communication/cross-posting with PRIDE (2005/08/22)

The European Bioinformatics Institute's entry into the proteomics repository field, the PRoteomics IDEntification database (PRIDE), has recently been upgraded. It is now possible to interchange data between GPMDB and PRIDE, using their newly defined PRIDE 2.0 XML, which can be easily generated from GPMDB's BIOML data files. We are beginning to transfer selected information into PRIDE, which can be accessed through the PRIDE experiment number query interface. The initial entries from GPMDB can be accessed by PRIDE_EXP:0000108 to PRIDE_EXP:0001620.

New version of X! Tandem available (2005/08/16)

A new version of X! Tandem (v. 2005.08.15.3) has been released that adds some new features and improves on some older ones. We would like to thank the following contributors:

Brendan MacLean (Fred Hutchinson Cancer Research Center) for improving the internal consistency of high accuracy mass calculations;
Patrick Lacasse (Laval University) for suggesting a mechanism to force the selection of a given file format, even if it does not meet the requirements for automatic detection;
Rob Craig (Beavis Informatics) for completing the conversion of the older, custom XML handlers into ExPat-compatible handlers; and
Torsten Schwede and Michael Podvinec (Biozentrum, University of Basel) for tracking down a memory access issue that resulted in stability problems when X! Tandem was deployed across a PC Grid system.

Further Indexing by Google (2005/08/16)

In addition to the earlier indexing, Google has begun indexing individual results in the GPMDB. Google queries such as:

"gpmdb clathrin" (protein keyword);
"gpmdb SNEEGSEEKGPEVR" (tryptic peptide sequence);
"gpmdb GPM87400000110" (GPM ID number); or
"gpmdb apolipoprotein haptoglobin" (multiple keywords)

all return results now. This facility should make it easy for users to quickly enter into the GPMDB to find their own data, as well as to cross-reference their results with those obtained by other researchers.

Bos taurus ENSEMBL genome available (2005/08/02)

ENSEMBL has recently added the annotation of Btau 1.0 to its site. We have updated the B. taurus GPM site to include this new information.

New Human Plasma Data Available (2005/08/02)

Dick Smith's group at Pacific Northwest National Laboratory has kindly made a large set of measurements on human plasma available to GPMDB. These measurements are a strong supplement to the Human Plasma Proteome data deposited by Gil Omenn's HUPO team earlier this year.

The results can be accessed individually (they are numbered sequentially) from

GPM10100000612 - GPM10100001201

GPM Disruption (2005/07/17)

After recovering gracefully from the power disruption last week, some parts of the GPM were knocked out by a large thunderstorm in Winnipeg on Sunday morning. Thanks to Shawn Walbridge of SynAck Hosting for his repairs to the system.

GPM Maintenance Service Disruption (2005/07/08)

Scheduled maintenance of the power system at one of the two main GPM data centres will occur between 18:00 and 19:00 (CDT) on Sunday July 10, 2005. It is possible that some service disruption will occur. We will try to get everything back up and running smoothly as quickly as possible.

S. pombe and T. annulata added to GPM (2005/07/08)

The proteome of the fission yeast S. pombe has been added to the species list for the eukaryote dedicated mirrors of GPM. These sequences link through to GeneDB as the primary source of sequence information. Also from GeneDB, the tick-borne cattle parasite Theileria annulata has been added to the protista site.

Two new cluster versions of X! Tandem (2005/06/28)

We are very happy to announce the release of two new clustering interfaces for X! Tandem, designed and implemented by Andy Link's group at Vanderbilt University. These interfaces use the popular Message Passing Interface (MPI) and Parallel Virtual Machine (PVM) standards to tie together multiple computers, allowing a single X! Tandem job to execute on multiple computers. Initial documentation about the project can be found here, and the code is available from our FTP site. The details of the project have been accepted for publication in the Journal of Proteome Research.

A new service pack release of GPM-USB (2005/06/28)

For those people who have purchased a GPM-USB device from Beavis Informatics, a new service pack (2005.07.01) has been released. To update your system, click here and follow the instructions. The service pack includes a number of updates, including:

integrated P3 support;
support for custom amino acid residue mass definitions;
numerous upgrades to display scripts; and
the most recent version of GPM Manager.

Bos taurus (domestic cow) now has its own site (2005/06/28)

Due to popular demand, a site dedicated to B. taurus has been constructed. The bovine genome has not yet been entered into the ENSEMBL system, so the proteome sequences are derived from the latest version of the genome held at NCBI. When the ENSEMBL system is available, the site will be updated to include the more informative genome links.

Aurum data added to GPMDB (2005/06/16)

The Aurum data collection has been analyzed and imported into GPMDB. This data set was produced from recombinant human proteins and can be used as a set of high-quality examples of peptide spectra from the ABI 4700 TOF-TOF instrument. The results, by plate number, are as follows: T10467; T10475; T10622; T10445; T10707; T10739; and T10761.

A new release of X! Tandem and P³ (2005/06/03)

The 2005.06.01.2 release of X! Tandem and P³ is now available. This new release brings the code base for the two projects much closer together, adding the ability to read MSDATA and MSXML files to P³. It also corrects an issue pointed out by Phillip Wilmarth at OHSU, that could result in some incorrect protein expectation values in very large MudPIT datasets with large numbers of redundant identifications.

GPMDB has been googled (2005/06/02)

The popular web server indexing service Google has indexed a large portion of the GPMDB data collection. Querying Google with a protein ID number (such as an ENSEMBL ID number) will now produce links into GPMDB results for that protein. Thanks to Google for providing this additional indexing for us.

GPMDB peptide count jumps to over 6.5 million (2005/05/03)

As of today, the number of annotated peptides in GPMDB has reached 6,613,809. Detailed statistics can be found here. The addition of a statistics archive link enables users to browse previous summaries and watch the GPMDB progress.

GPMDB adds HUPO PPP results (2005/4/14)

The GPMDB has added a special range of model accession numbers for the results generated by the Human Proteome Organization Plasma Proteome Project. The first set of 611 results, obtained by analyzing publicly available data from the PPP web site, has been made available. The results can be accessed by GPM number, in the range GPM10100000001 to GPM10100000611. We would like to thank David States and Gil Omenn for their cooperation and for allowing us to add this data to the GPMDB.

A new release of X! TANDEM (2005/3/21)

This is the first release of X! TANDEM to fully support the mzXML and mzData spectrum input formats. The design and initial implementation for both formats were done by Patrick Lacasse (Université Laval, Dept. of Medicine, supported by Genome Québec), with eXpat support and refinement of the code by Brendan MacLean at Fred Hutchinson Cancer Research Center. We would also like to thank Patrick Pedrioli from the Institute for Systems Biology, who wrote the mzXML parser that Patrick Lacasse used as a model for his implementation and who has allowed us to make this available under the Artistic License. It should be noted that our support for these standards, much like the standards themselves, is preliminary and there may be some "flavours" of either format that do not work as expected.

In addition, this release has a new optional parameter that allows the user to specify a parameter file that contains masses for one or all of the amino acid residues. This feature makes it possible to use non-standard amino acids, or isotopically labelled amino acids. An example of using this feature to find proteins that were made using all ¹⁵N amino acids is available at the human boutique site.
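
To show the kind of residue mass table such a search needs, here is a small sketch that derives monoisotopic residue masses for fully ¹⁵N-labelled amino acids by adding the ¹⁵N/¹⁴N mass difference once per nitrogen atom. The function and table names are assumptions made for the example, and the sketch deliberately does not reproduce the X! Tandem parameter file format, which is not described here.

```python
# Standard monoisotopic residue masses (Da) for the 20 common amino acids.
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}

# Number of nitrogen atoms in each residue (backbone plus side chain).
NITROGENS = {
    "G": 1, "A": 1, "S": 1, "P": 1, "V": 1, "T": 1, "C": 1, "L": 1,
    "I": 1, "N": 2, "D": 1, "Q": 2, "K": 2, "E": 1, "M": 1, "H": 3,
    "F": 1, "R": 4, "Y": 1, "W": 2,
}

# Mass difference between the 15N and 14N isotopes (Da).
N15_SHIFT = 15.0001089 - 14.0030740


def n15_residue_masses():
    """Return residue masses with every nitrogen replaced by 15N."""
    return {aa: MONO[aa] + NITROGENS[aa] * N15_SHIFT for aa in MONO}


if __name__ == "__main__":
    labelled = n15_residue_masses()
    for aa in sorted(labelled):
        print(f"{aa}\t{MONO[aa]:.5f}\t{labelled[aa]:.5f}")
```

The shift is just under 1 Da per nitrogen, so nitrogen-rich residues such as arginine and histidine move the most.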

First release of P³ (2005/2/16)

The X! TANDEM P³ project is the first protein identification system capable of using proteotypic peptides to accelerate searching and improve the confidence of results. The system is built out of the X! TANDEM framework and utilizes the GPM interface for its displays. The necessary proteotypic peptide libraries are continuously updated from the GPMDB for human and yeast proteomes. Proteotypic peptide libraries are much smaller than full proteomes, so this type of searching runs quite a bit quicker than standard searches.

First release of the Jasper spectrum collection (2005/2/16)

The Jasper spectrum collection is a new type of bioinformatics resource, made available as part of the Quartz spectrum library. Jasper collections contain the best spectrum-to-peptide assignments from the GPMDB, broken down into categories based on the reliability of the assignment (based on the measured expectation value for a peptide). These libraries of spectra are in XML files, containing the peptide sequences (with PTMs) associated with individual spectra that were assigned to those peptides. The first library contains about 64,000 high quality spectrum-to-sequence assignments.

New release of X! Tandem available (2005/2/16)

The 2005.02.01 release of X! Tandem is now available. The new features of this release are mainly for programmers, particularly an improved mechanism for adding in new scoring systems, elegantly added by Brendan MacLean. Some changes have also been made to take further advantage of high accuracy parent ion mass measurements.

S. tropicalis genome sequence available (2005/2/10)

In addition to the other sequence resources available on the Xenopus site, the newly released protein predictions from the S. tropicalis genome are now available. They are annotated using information from ENSEMBL. This genome represents the first full sequence of an amphibian genome.

X! Tandem to use mzData and mzXML input standards (2005/1/21)

We are happy to announce that as the result of the most recent Standards in Proteomics meeting held by the NIDDK in Washington earlier this month, X! Tandem will support both MS/MS data representations, as proposed by HUPO-PSI and the Institute for Systems Biology. The development work to incorporate the two standards has begun and the finished software should be available by the end of February. Many thanks to Patrick Lacasse (Université Laval, Dept. of Medicine, supported by Genome Québec) for generating the mzXML and mzData parser classes. We would also like to thank Randy Julian (Eli Lilly) for his co-operation and help with adding mzData.

GPMDB begins collaboration with NIST (2005/1/21)

We are happy to announce that Dr. Steve Stein from the US National Institute of Standards and Technology is now collaborating with us to produce a standardized library of peptide MS/MS spectra to be used for the improvement of protein identification algorithms. The donated entries in GPMDB will be statistically evaluated and an "average" spectrum for a particular peptide, based on its modifications and charge state, will be developed. Dr. Stein has worked with the development of similar spectrum libraries for use with small molecule identification for many years and we are very happy to be of assistance in developing similar approaches for proteomics. Dr. Stein expects to announce the preliminary results of his work at the US-HUPO meeting this spring.

GPM source code now mirrored on Proteome Commons (2005/1/18)

As a result of our collaboration with the Michigan Proteomics Consortium, we are happy to announce the inclusion of all GPM software in the new proteomecommons.org open source software archive. Many thanks to Jayson Falkner, Pete Ulintz and Phil Andrews for creating this new site, which we hope will be of general value to the proteomics community.