The GPM News Archive, 2004

The Global Proteome Machine Organization

News Archive

GPMDB peptide count breaks 4 million mark (2004/11/22)

As of today, the number of annotated peptides in GPMDB has reached 4,121,723. Detailed statistics can be found here. The addition of a statistics archive link enables users to browse previous summaries and watch the GPMDB progress.

Xenopus sp. site added (2004/11/22)

In response to a request, we have added a new site xenopus.thegpm.org with a set of sequence resources dedicated to the genus Xenopus. It includes the most recent builds of UNIGENE for two Xenopus species (laevis and tropicalis) as well as the nr sequences for the subfamily Xenopodinae.

New features added to GPM (2004/11/12)

The public GPM interface has been updated to allow users to customize their results and to use some of the data clustering ideas that we have developed. The new features will become available in a release of the open source installation versions on Nov. 22. These changes include:

Addition of spectrum prefiltering to remove repeated spectra from the initial set of mass spectra. This feature compares spectra using a dot product calculation and removes spectra that have vector representations that point in the same direction. The most intense spectrum out of a set of repeated spectra is kept and used for analysis. This type of filtering can remove up to 90% of spectra from a MudPIT-style run, making data analysis and interpretation easier.
The protein listing and display pages can be customized to limit the proteins displayed to those with expectation values better than a value set by the user. This feature can be used to simplify reports.
A pseudo-HPLC display has been added, which graphically illustrates the intensity vs. retention time plot expected given the peptide sequences discovered and the relative intensity of the MS/MS spectra. The retention times are calculated using the algorithm described in Reference 9.
Dot product calculations have been added to the spectrum validation routine used by GPMDB to show the best match to a given spectrum-to-sequence assignment. This new routine orders the exemplar spectra drawn from GPMDB on the basis of similarity to the spectrum that is to be validated. Previous versions of this routine simply listed the best spectra (based on expectation value).
A clustering feature has been added to the protein detailed display page, which allows the user to hide repeated peptide sequences, if desired.
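The spectrum prefiltering item above can be sketched in a few lines. This is only an illustration of the general idea (normalized dot product between binned spectra, keeping the most intense member of each group), not the GPM's own code. The 1.0 m/z bin width, the 0.9 similarity threshold, the use of summed intensity to rank spectra, and the names `bin_spectrum`, `cosine` and `prefilter` are all assumptions made for the example.

```python
import math
from collections import defaultdict

def bin_spectrum(peaks, bin_width=1.0):
    """Turn a list of (m/z, intensity) peaks into a sparse vector keyed by m/z bin."""
    vector = defaultdict(float)
    for mz, intensity in peaks:
        vector[int(mz / bin_width)] += intensity
    return dict(vector)

def cosine(a, b):
    """Normalized dot product of two sparse vectors (1.0 means same direction)."""
    dot = sum(value * b.get(key, 0.0) for key, value in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def prefilter(spectra, threshold=0.9):
    """Keep one spectrum per group of near-identical spectra.

    Spectra are visited from most to least intense (by summed intensity),
    so the representative kept for each group is its most intense member.
    The threshold is an assumed value, not one taken from the GPM.
    """
    ranked = sorted(spectra, key=lambda peaks: sum(i for _, i in peaks), reverse=True)
    kept = []  # list of (peaks, binned vector) pairs
    for peaks in ranked:
        vector = bin_spectrum(peaks)
        if all(cosine(vector, kept_vector) < threshold for _, kept_vector in kept):
            kept.append((peaks, vector))
    return [peaks for peaks, _ in kept]
```

A real implementation would presumably also compare parent ion masses before treating two spectra as repeats; that check is left out here to keep the sketch short.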

GPMDB peptide count breaks 2 million mark (2004/09/28)

As of today, the number of indexed peptides in GPMDB has reached 2,010,819. Detailed statistics can be found here. The addition of a statistics archive link enables users to browse previous summaries and watch the GPMDB progress.

GPMO launches message board (2004/09/28)

Visit the message board and post questions, comments, experiences or developments with GPMO software. Feel free to help others in the community by sharing your knowledge of GPMO applications. It's quick and easy to join so sign up today!

New members added to GPM Scientific Advisory Board (2004/09/03)

We are pleased to announce that Brian T. Chait (Rockefeller University), David Fenyö (GE Healthcare) and Stephen B.H. Kent (University of Chicago) have been named to the Scientific Advisory Board of the GPM.

Updates to GPMDB (2004/09/03)

GPMDB is the publicly available index to all of the data that has come in through the GPM's various interfaces. As of today, it has 1.6 × 10⁶ annotated MS/MS spectra, although that is increasing all the time.

It now has some improved database browsing capabilities, such as a dedicated keyword searching interface and a multiple accession number search interface. It has evidence for more than 3,300 yeast ORFs and 10,700 ENSEMBL human protein ids. We have a manuscript describing the technical features of the database, as well as some use cases for answering questions with the system. If you'd like a copy, please contact us, and we will send you a manuscript preprint.

A new release of X! Tandem available (2004/09/01)

This new release of X! Tandem corrects a number of minor problems that have been reported by users. It also adds new functionality:

The ability to use multiple "taxon" names in a single session. This change allows the use of multiple species selection on the GPM sites. It is particularly important for users of the plant and prokaryote sites, where mixing and matching the sequence sources to be used can be quite handy.
Extension of the scoring model to improve scoring for parent ions with z > 2.

We'd like to thank Jimmy Eng and Mike Knierman for pointing out specific problem spectra that helped a lot in improving the code.

This release is available on the ftp site, but it is the first release that is also available through our new code repository. We are now using the Subversion system for code revision control, to co-ordinate our various code projects. A new release of the GPM site installation is in preparation: it should be available on Sept. 10, 2004.

Major new release of X! Tandem available (2004/7/15)

X! TANDEM marks its first year with the release of X! TANDEM 2. Version 2 features improved memory management, fast execution, and better use of multiprocessor machines. It also has a built-in reversed-sequence validation method as well as its own stochastic histogramming validation method. See the release notes for more details. Version 2 has been deployed on all of the GPM sites.
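The general idea behind reversed-sequence validation can be illustrated briefly: each protein sequence is reversed to produce a decoy that should not genuinely match any spectrum, and the number of matches to decoys gives an estimate of how often random matches occur. The sketch below is generic and is not taken from X! TANDEM; the `:reversed` label suffix and both function names are assumptions made for the example.

```python
def add_reversed_decoys(proteins, suffix=":reversed"):
    """Return a new dict holding every target sequence plus its reversed decoy."""
    combined = dict(proteins)
    for label, sequence in proteins.items():
        combined[label + suffix] = sequence[::-1]
    return combined

def estimate_false_rate(matched_labels, suffix=":reversed"):
    """Estimate the fraction of target matches that are random.

    Assumes a random match is equally likely to land on a decoy or a target,
    so the number of decoy hits approximates the number of false target hits.
    """
    decoys = sum(1 for label in matched_labels if label.endswith(suffix))
    targets = len(matched_labels) - decoys
    return decoys / targets if targets else 0.0
```

For example, searching against `add_reversed_decoys({"P1": "MKTAYIAK"})` would consider both `MKTAYIAK` and its reversal `KAIYATKM`.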

Important Note

When updating from previous versions of GPMO software, be sure to back up your current files. This includes result files and any of the web interface or perl script files that may have been customized for your particular installation.

Updated versions of GPM and X! Tandem available (2004/6/1)

New versions of GPM and X! Tandem were made available on June 1, 2004. Thanks to everyone who tested the new versions and suggested new views and features.

The new version of GPM includes a 1D/2D PAGE gel simulation view, an improved tabular view for writing reports and a protein chip view.

The new version of X! Tandem includes the ability to specify PROSITE-style motifs for potential modifications as well as the possibility of specifying potential modifications as having prompt neutral losses (e.g., the loss of 98 from phosphoserine or phosphothreonine).
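To make the two ideas concrete, the sketch below translates a simple PROSITE-style pattern into a regular expression to locate candidate modification sites, and computes the mass of a phosphorylated fragment before and after a prompt neutral loss. It is an illustration only: the exact motif syntax that X! Tandem accepts is not reproduced here, the supported pattern subset (`x`, `[..]`, `{..}` and repeat counts) is an assumption, and the function names and rounded mass constants are chosen for the example.

```python
import re

PHOSPHO_MASS = 79.966   # approximate mass added to S or T by phosphorylation (HPO3)
NEUTRAL_LOSS = 97.977   # approximate mass of H3PO4, the loss of "98" mentioned above

def prosite_to_regex(pattern):
    """Translate a simple PROSITE-style pattern (e.g. 'N-{P}-[ST]') into a regex string."""
    parts = []
    for element in pattern.split("-"):
        # Split off an optional repeat count such as x(2) or x(2,4).
        match = re.fullmatch(r"(.+?)(?:\((\d+(?:,\d+)?)\))?", element)
        core, repeat = match.group(1), match.group(2)
        if core == "x":
            core = "."                      # any residue
        elif core.startswith("{") and core.endswith("}"):
            core = "[^" + core[1:-1] + "]"  # any residue except those listed
        parts.append(core + ("{" + repeat + "}" if repeat else ""))
    return "".join(parts)

def find_sites(sequence, pattern):
    """Return 1-based positions where the motif starts, including overlapping matches."""
    regex = re.compile("(?=" + prosite_to_regex(pattern) + ")")
    return [m.start() + 1 for m in regex.finditer(sequence)]

def phospho_fragment_masses(unmodified_mass):
    """Masses expected for a phosphorylated fragment: intact, and after the neutral loss."""
    intact = unmodified_mass + PHOSPHO_MASS
    return intact, intact - NEUTRAL_LOSS
```

With these definitions, `find_sites("AASPEEKSLLD", "[ST]-x(2)-[DE]")` reports the serines at positions 3 and 8 as potential modification sites.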

Two new Projects available: LiveCD and Quartz (2004/4/15)

The GPMO has added two new projects, LiveCD and Quartz, to the site. LiveCD, a project from the University of Michigan NCRR Center for Proteomics, provides a simple method to install a Linux-based version of X! TANDEM and the GPM on a large number of computers for instructional and demonstration purposes. It also includes some software allowing the use of X! TANDEM on clusters of computers running LiveCD.

Quartz is a GPMO staff project. It is a set of annotated spectrum collections, meant to be used for bioinformatics research. The current collections contain more than 2000 MS/MS spectra, along with XML-formatted annotation files.

X! TANDEM and the GPM release updates (2004/4/10)

New versions of both X! TANDEM and the GPM were released today. These are maintenance releases, including fixes for small problems observed with previous versions. The collections of sequences for the GPM have been updated to include the latest sequence releases from ENSEMBL (1/4/2004).

Probity model published (2004/3/1)

The GPM takes advantage of the "Probity" statistical model to combine the results of multiple peptide identifications into an expectation value for a protein. This model, formulated by Jan Eriksson and David Fenyö, has just been published in the Journal of Proteome Research (Abstract).

GPM and Tandem updated (2004/3/1)

The GPM Perl scripts and Tandem code have been updated. The new scripts allow for more complete viewing of data supporting identifications, particularly the histograms that are used to perform the statistical analysis for distinguishing stochastic results from true ones. Tandem has been altered to correct a few unexpected behaviors and to improve its support for N- and C-terminal post-translational modifications.

GPM sequences updated (2004/2/16)

The sequences available to search have been updated to reflect the Feb. 9, 2004 release of most of the proteomes. The new sequences were downloaded from the ENSEMBL site and tested on the public installations of the GPM. The new databases are available for download from the GPM ftp site, in the "gpm_current_version" folder.

New versions of Tandem and GPM released (2004/2/1)

As of February 1, the 2004.02.01 versions of both the GPM and Tandem have been released. They include the updates necessary to use point mutation analysis in local installations. The GPM has been updated to include a new data view mode: "details". This new mode allows the user to examine the results at a spectrum-by-spectrum level, viewing all of the raw data involved, including all of the scoring histograms and spectrum peak lists.

Over 400,000 served! (2004/1/30)

At the end of January, the total number of spectra modelled using the public version of the GPM reached 400,000.

The GPM identifies its 4000th gene (2004/1/27)

After only 27 days of operation, the GPM has discovered more than 4000 individual genes, using mass spectrum sets sent in by the proteomics community. The GPM only imports information from genomic gene collections as necessary, so this high rate of discovery has meant that the Machine's cached records are improving at a rapid rate. We'd like to thank the proteomics community for using the Machine, helping it learn about this large collection of observed proteins.

Point mutation analysis with GPM (2004/1/18)

The GPM has been updated to include a new modeling feature in the Tandem engine. It now allows modeling of all possible point mutations in a sequence during the sequence refinement process. This new capability is still experimental: see the Tandem project's explanation of this new capability.
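As a rough illustration of what "all possible point mutations" involves, the sketch below enumerates every single-residue substitution of a sequence using the 20 standard amino acids. It is not Tandem's implementation, and whether Tandem enumerates candidates in this way is an assumption; the name `point_mutants` and the tuple layout are chosen for the example.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, one-letter codes

def point_mutants(sequence):
    """Yield (position, original, replacement, mutated_sequence) for every single substitution.

    Positions are 1-based. A sequence of length n made of standard residues
    yields 19 * n candidates.
    """
    for index, original in enumerate(sequence):
        for replacement in AMINO_ACIDS:
            if replacement != original:
                mutated = sequence[:index] + replacement + sequence[index + 1:]
                yield index + 1, original, replacement, mutated
```

Because the number of candidates grows as 19 times the sequence length, it is plausible that restricting this modeling to the refinement stage, where far fewer sequences are considered, keeps the search tractable.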

Modifications have been made to some of the other report pages, in an effort to increase the amount of genomic and proteomic information made available when a valid model sequence has been found.

Updating the GPM (2004/1/10)

After 10 days of operation, the released version of the GPM has been updated to include a set of patches to answer questions that cropped up. Thanks to the many users who used the GPM and sent in helpful suggestions, as well as those enthusiasts who actually installed their own local versions of the GPM.

Opening the Global Proteome Machine (2004/1/1)

As of January 1, 2004, the Global Proteome Machine has become active. It is a simple, open source interface for analyzing tandem mass spectra against eukaryote genomes. Using the GPM is free and available to anyone interested in proteomics. The initial GPM configuration has the capacity to search approximately 10¹⁰ MS/MS spectra per year.