Index of GPMDB lists
Proteomics often requires the assembly of category-wide lists of things. These
categories can be proteins associated with particular sequence or biological
properties, post-translational modifications, or types of experiments. GPMDB can be
used to generate these lists, and this page serves as an index to the lists announced
for the system.
Available lists of things
Post-translational modifications:
Amino acid polymorphisms:
Proteins by classifiers:
Proteotypic peptides and annotated spectrum libraries:
GPMDB Guide to the Saccharomyces cerevisiae Proteome v. 2 (2016/8/4)
The Saccharomyces cerevisiae protein identification
information in GPMDB has been summarized into a collection of spreadsheets,
GPMDB Guide to the S. cerevisiae Proteome (GYP). This guide has the information
organized into separate spreadsheets for each evidence code, as well as
an overall listing. All of the spreadsheets are sorted by chromosome and
the centromere-based naming convention commonly used for yeast ORFs.
The protein accession numbers and other information were obtained from ENSEMBL's
EF4.72 release of the yeast proteome. The
NBS v. 2 algorithm was used to determine the evidence codes for this edition.
This 2nd edition of the Guide (GYP 2016.08.01) is available in the following formats:
The files are also available at the GPM FTP site:
ftp://ftp.thegpm.org/projects/annotation/yeast_proteome_guide/
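The directory above can also be listed from a script. The following is a minimal sketch using Python's standard `ftplib`; it assumes the server accepts anonymous logins and is still reachable, neither of which is confirmed here. The function names `split_ftp_url` and `list_guide_files` are illustrative only.

```python
from ftplib import FTP
from urllib.parse import urlparse


def split_ftp_url(url):
    """Split an ftp:// URL into (host, directory path)."""
    parts = urlparse(url)
    if parts.scheme != "ftp":
        raise ValueError(f"not an FTP URL: {url}")
    return parts.hostname, parts.path


def list_guide_files(url, timeout=30):
    """Return the file names in the directory that the FTP URL points to.

    Assumption: the server accepts anonymous logins (not verified).
    """
    host, path = split_ftp_url(url)
    with FTP(host, timeout=timeout) as ftp:
        ftp.login()        # anonymous login
        ftp.cwd(path)      # change to the guide directory
        return ftp.nlst()  # file names only, no sizes or dates


GUIDE_URL = "ftp://ftp.thegpm.org/projects/annotation/yeast_proteome_guide/"
# Example call (needs network access):
#     for name in list_guide_files(GUIDE_URL):
#         print(name)
```

The same function should work for the human and mouse guide directories listed further down this page, since only the URL changes.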
GPMDB Guide to the Human Proteome v. 22 (2016/8/3)
The human protein identification
information in GPMDB has been summarized into a collection of spreadsheets,
GPMDB Guide to the Human Proteome (GHP). This guide has the information
organized into separate spreadsheets for each chromosome, as well as
mitochondrial DNA. The protein accession numbers, HGNC names and chromosomal
coordinates were taken from ENSEMBL v. 76 (genome assembly GRCh38). The
NBS v. 2 algorithm was used to determine the evidence codes for this edition.
This 22nd edition of the Guide (GHP 2016.7.01) is available in the following formats:
The files are also available at the GPM FTP site:
ftp://ftp.thegpm.org/projects/annotation/human_proteome_guide/
GPMDB Guide to the Mouse Proteome v. 22 (2016/8/4)
The mouse protein identification
information in GPMDB has been summarized into a collection of spreadsheets,
the GPMDB Guide to the Mouse Proteome (GMP). This guide has the information organized into
separate spreadsheets for each chromosome, as well as mitochondrial
DNA. The protein accession numbers, MGI names and chromosomal coordinates were taken
from ENSEMBL v. 76 (genome assembly GRCm38). The new
NBS v. 2 algorithm was used to determine the evidence codes for this edition.
This 22nd edition of the Guide (GMP 2016.7.01) is available in the
following formats:
The files are also available at the GPM FTP site:
ftp://ftp.thegpm.org/projects/annotation/mouse_proteome_guide/
C. elegans protein phosphorylation sites (2010/08/11)
These files represent a comprehensive list of all C. elegans protein
phosphorylation sites represented by good quality data in GPMDB. All of the splice variants listed
by ENSEMBL have been annotated.
The files associated with the annotation for a merged list of all chromosomes are now available
by FTP. A description of the format of these files (README.txt) is in the same directory.
A short summary of the number of phospho-proteins, genes and sites is given
here.
For unique protein sequences in the proteome, the overall totals are as follows:
Fruit fly protein phosphorylation sites (2013/05/13)
These files represent a comprehensive list of all D. melanogaster protein
phosphorylation sites represented by good quality data in GPMDB. All of the splice variants listed
by ENSEMBL have been annotated.
The files associated with the annotation for a merged list of all chromosomes are now available
by FTP. A description of the format of these files (README.txt) is in the same directory.
A short summary of the number of phospho-proteins, genes and sites is given
here.
For unique protein sequences in the proteome, the overall totals are as follows:
Yeast protein acetylation sites (2013/06/17)
These files represent a comprehensive list of all S. cerevisiae protein
N-terminal and lysine acetylation sites represented by good quality data in GPMDB. All of the splice variants listed
by ENSEMBL have been annotated.
The files associated with the annotation and a merged list of all chromosomes are now available by FTP for
lysine & N-terminal acetylation. A description of the format of these files is available in the
associated "README.txt" file in the same directory. A short summary of the number of acetylated
proteins, genes and sites of each type is given in the "stats/stats.txt" file.
Yeast protein phosphorylation sites (2013/05/13)
These files represent a comprehensive list of all S. cerevisiae protein
phosphorylation sites represented by good quality data in GPMDB. All of the splice variants listed
by ENSEMBL have been annotated.
The files associated with the annotation for a merged list of all chromosomes are now available
by FTP. A description of the format of these files (README.txt) is in the same directory.
A short summary of the number of phospho-proteins, genes and sites is given
here.
For unique protein sequences in the proteome, the overall totals are as follows:
Mouse protein acetylation sites (2013/06/17)
These files represent a comprehensive list of all Mus musculus protein
N-terminal and lysine acetylation sites represented by good quality data in GPMDB. All of the splice variants listed
by ENSEMBL have been annotated.
The files associated with the annotation and a merged list of all chromosomes are now available by FTP for
lysine & N-terminal acetylation. A description of the format of these files is available in the
associated "README.txt" file in the same directory. A short summary of the number of acetylated
proteins, genes and sites of each type is given in the "stats/stats.txt" file.
These files represent a comprehensive list of all Homo sapiens protein
N-terminal and lysine acetylation sites represented by good quality data in GPMDB. All of the splice variants listed
by ENSEMBL have been annotated.
The files associated with the annotation and a merged list of all chromosomes are now available by FTP for
lysine & N-terminal acetylation. A description of the format of these files is available in the
associated "README.txt" file in the same directory. A short summary of the number of acetylated
proteins, genes and sites of each type is given in the "stats/stats.txt" file.
Mycobacterium tuberculosis protein phosphorylation sites
(2010/08/10)
This list is a compilation of observed serine/threonine phosphorylation sites for the
Mycobacterium tuberculosis proteome (strain CDC1551), based on the data in
GPMDB. This list is available in Excel spreadsheet, tab-separated text
and HTML
formats. It contains 41 phosphorylation sites on 35 protein sequences, with the
following composition:
Each ENSEMBL splice variant protein accession number has a listing of all observed
sites in a single row that looks like the following:
The columns have the following interpretation:
We have to again thank all of the data contributors who have made these comprehensive
lists possible. When using this type of information, please use normal caution.
Click here for our recommendations for using lists
of site assignments.
Mouse protein phosphorylation sites (2013/05/13)
These files represent a comprehensive list of all mouse protein
phosphorylation sites represented by good quality data in GPMDB. This list has been subdivided on a chromosome-by-chromosome
basis, using ENSEMBL v. 71 as the source of the protein and gene sequences. All of the splice variants listed
by ENSEMBL have been annotated.
The files associated with the annotation for each chromosome (and a merged list of all chromosomes) are now available
by FTP. A description of the format of these files (README.txt) is in the same directory.
A short summary of the number of phospho-proteins, genes and sites is given
here.
For unique protein sequences in the proteome, the overall totals are as follows:
Human protein phosphorylation sites (2013/05/12)
As part of our contribution to the Human Proteome Project, we have compiled a comprehensive list of all human protein
phosphorylation sites represented by good quality data in GPMDB. This list has been subdivided on a chromosome-by-chromosome
basis, using ENSEMBL v. 70 as the source of the protein and gene sequences. All of the splice variants listed
by ENSEMBL have been annotated.
The files associated with the annotation for each chromosome (and a merged list of all chromosomes) are now available
by FTP. A description of the format of these files (README.txt) is in the same directory.
A short summary of the number of phospho-proteins, genes and sites is given
here.
For unique protein sequences in the proteome, the overall totals are as follows:
Human protein ubiquitination sites (2013/09/01)
We have compiled a comprehensive list of all human protein
ubiquitination sites represented by good quality data in GPMDB. This list has been subdivided on a chromosome-by-chromosome
basis, using ENSEMBL v. 70 as the source of the protein and gene sequences. All of the splice variants listed
by ENSEMBL have been annotated.
The files associated with the annotation for each chromosome (and a merged list of all chromosomes) are now available
by FTP. A description of the format of these files (README.txt) is in the same directory.
A short summary of the number of ubiquitin-modified proteins, genes and sites is given
here.
For unique protein sequences in the proteome, the overall totals are as follows:
Amino acid polymorphisms in GPMDB
(2013/1/2)
The GPM has been generating information about amino acid polymorphisms in
model species for the last 5 years. This information has been recorded in
GPMDB, which as of Jan. 1, 2013 had approximately 4.8 million observations of
amino acid polymorphisms. The information about these observations has been
exported to files in either tab-separated value (.txt)
or SQLite (.db) format, available via FTP.
The specific entries in these files are as follows:
If available, the first column corresponds to an identifier for the associated single nucleotide polymorphism. In cases
where there was no associated SNP information, the "HGVS id" information was repeated in this column. The "GPMDB obs. id"
is the unique id for the specific peptide sequence identification that was the evidence for each polymorphism.
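Because the column list is not reproduced on this page, a practical first step with the SQLite (.db) version is to inspect its layout before querying it. The sketch below uses only Python's standard `sqlite3` module and makes no assumption about the real schema; the table `demo_obs` and its columns are made up purely to stand in for the downloaded file.

```python
import sqlite3


def describe_sqlite(conn):
    """Map each table name in the database to its list of column names."""
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    layout = {}
    for table in tables:
        # Quote the table name so unusual names do not break the PRAGMA.
        quoted = '"' + table.replace('"', '""') + '"'
        # PRAGMA table_info returns one row per column; index 1 is the name.
        cols = conn.execute(f"PRAGMA table_info({quoted})").fetchall()
        layout[table] = [col[1] for col in cols]
    return layout


# Stand-in for the downloaded file: an in-memory database with a
# hypothetical table (NOT the real GPMDB schema).
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE demo_obs (snp_id TEXT, hgvs_id TEXT, obs_id INTEGER)")
print(describe_sqlite(demo))
# For the real file: describe_sqlite(sqlite3.connect("path/to/downloaded.db"))
```

Once the actual table and column names are known, ordinary SQL queries can be used to group observations by SNP identifier or by GPMDB observation id.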
Observed proteins categorized by Gene Ontology terms
(2010/05/01)
The ENSEMBL protein accessions used in GPMDB can be readily assigned to specific Gene
Ontology (GO) terms, using ENSEMBL's BioMart utility. These lists for all available GO
terms have been constructed for three species:
The lists are divided up into the three main GO categories: biological process;
cellular component; and molecular function. Each individual term has an entry like:
The first column has a link to the list of proteins associated with the GO term
accession number. The notation following the accession number "[n/m]" indicates that
"n" proteins have been observed in GPMDB out of the "m" proteins in the proteome
assigned to this category. The second column is the controlled vocabulary
description of each GO category.
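The "[n/m]" notation described above can be turned into a coverage figure with a few lines of Python. This is a sketch only: the exact text layout of the first column is assumed to be an accession followed by the bracketed counts (for example `GO:0005737 [50/200]`), and both the example numbers and the name `parse_go_entry` are made up for illustration.

```python
import re

# Assumed layout of the first column: "GO:0005737 [50/200]".
ENTRY = re.compile(r"(GO:\d+)\s*\[(\d+)/(\d+)\]")


def parse_go_entry(text):
    """Return (accession, observed n, proteome total m, percent observed)."""
    match = ENTRY.search(text)
    if match is None:
        raise ValueError(f"no '[n/m]' notation found in: {text!r}")
    accession = match.group(1)
    n, m = int(match.group(2)), int(match.group(3))
    # Guard against an empty category to avoid dividing by zero.
    coverage = 100.0 * n / m if m else 0.0
    return accession, n, m, coverage


print(parse_go_entry("GO:0005737 [50/200]"))
```

Sorting the parsed entries by the percentage gives a quick view of which GO categories are well or poorly represented in GPMDB.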
Observed human proteins by tissue type (2010/05/01)
The lists below were constructed from data supplied by the Normal
Clinical Tissue Alliance. Proteomics data from selected studies of clinical tissue
were analyzed and conservative lists of identified proteins were constructed. The
lists are organized by the best available BRENDA ontology term for the tissue, with the
exception of red blood cells, which are not currently in BRENDA.
The lists given below have the proteins in plasma removed (with the exception of the
plasma list).
The 1,000 most observed human & mouse proteins (updated
2010/07/07)
These spreadsheets (top_1000_human_100707.xls
and top_1000_mouse_100707.xls)
list protein sequences that have been observed most often by GPM users who used the
"human" or "mouse" ENSEMBL proteome sequences. The columns in the spreadsheet are as
follows:
A "dataset" corresponds to a submitted set of MS/MS spectra, which results in a GPM
result file, so it is roughly equivalent to the set of data from an LC/MS/MS run. A
protein can only be observed once in a dataset. The value in Column F was calculated by taking the number of times (n_i) that
the protein was observed in the approximately 24,000 (N) datasets examined and doing
the simple calculation:
p_i = 100 × (n_i / N)
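The Column F calculation above is a plain percentage of datasets in which a protein was seen. A minimal Python sketch follows; the function name and the example count of 6,000 observations are hypothetical, and N is taken as the approximate figure of 24,000 quoted above.

```python
def percent_observed(n_i, n_datasets):
    """Percentage of datasets in which a protein was observed.

    n_i        -- number of datasets containing the protein
    n_datasets -- total number of datasets examined (N)
    """
    if n_datasets <= 0:
        raise ValueError("the number of datasets must be positive")
    if not 0 <= n_i <= n_datasets:
        # A protein can be observed at most once per dataset.
        raise ValueError("n_i must lie between 0 and the number of datasets")
    return 100.0 * n_i / n_datasets


# Hypothetical protein seen in 6,000 of about 24,000 datasets.
print(percent_observed(6000, 24000))
```

The range check reflects the rule stated above that a protein can only be observed once in a dataset, so the count can never exceed N.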
Copyright © 2010-2011, The Global Proteome Machine Organization