CDs in pee (2024/2/21)
Proteins in the "cluster of differentiation" family are frequently found on the surface of blood cells, where they are used for immunophenotyping (hence the name). Many of them are type I membrane proteins, consisting of an extracellular N-terminal domain, a transmembrane domain and an intracellular domain. While this is not the sort of protein you might expect to see in urine, quite a few of them are prominent in urine MS/MS data. These are their stories.

Ramblings on small urinary proteins (2024/2/18)

This protein is secreted, bristling with phosphorylation at >30 S/T acceptors & O-linked glycosylation at 5 T acceptors. There are 3 easily distinguished splice variants (SPP1-201, SPP1-202, SPP1-203), with the mature N-terminus for all at I17. It is the major phosphoprotein in human urine & milk where it keeps divalent cations (esp. Ca²⁺) in solution.

Human secreted phosphoprotein 1 (SPP1:p)―HPA mRNA tissue distributions & GPMDB protein tissue tabulations.

The 'zines & database blurbs have done a particularly bad job trying to come to terms with this abundant protein, probably because anything that can change local Ca²⁺ concentration may have many biological effects. Its traditional name (osteopontin) hasn't helped, as it has tethered the discussion to the tissue where it was originally described.

The big Kahuna of protein hormones tends to be rather elusive in proteomics data. The proprotein version of the sequence is observable in the source of the protein (pancreatic islets). In urine, the mature protein is below its LOD; in its place the peptide removed to generate the mature hormone (57-87, C-peptide) may be observed.

Human insulin (INS:p)―HPA mRNA tissue distributions & GPMDB protein tissue tabulations.

Note: the C-peptide observations in urine are from peptidome-style experiments using methods that deliberately retrieved shorter sequences. It is absent from all studies that employed any variant of the popular filter-aided sample preparation idea.

The rare observations of INS:p in colon and liver are the intact mature protein. The mature protein is also observed in some cell lines (e.g., HeLa), presumably from the use of recombinant human sequence added to the growth medium as a supplement.

This little guy starts out life as a type I plasma membrane protein in many tissues, with a single TM domain (169-191): this is rarely observed. The common form is caused by a proteolytic cleavage at V156 that generates a soluble protein that migrates through CSF & blood to urine. The N-terminus of both forms is either Y20 or S22.

Human phosphoinositide-3-kinase interacting protein 1 (PIK3IP1:p)―HPA mRNA tissue distributions & GPMDB protein tissue tabulations.

This small (260 residue) secreted protein is named for its activity: dismantling DNA chains. Extracellular DNA is an enormous PITA: the resulting sticky nets (snotballs) are weaponized by neutrophils to snare bacteria. Most of the protein's functions have been examined in blood, although pee & poop serve as its main reservoirs in humans. Mature N-terminus: L23.

Human deoxyribonuclease 1 (DNASE1:p)―HPA mRNA tissue distributions & GPMDB protein tissue tabulations.

This very small (60 residue) secreted protein is named for having 3 disulphide bonds. It is produced by the stomach's epithelium and has a proposed role in the preservation of the gastric surface mucus layer. How it ends up in the blood & how it is concentrated in the urine has remained a puzzle. The mature protein's N-terminus is E25.

Human trefoil factor 1 (TFF1:p)―HPA mRNA tissue distributions & GPMDB protein tissue tabulations.

TFF1:r & TFF1:p do spike in some cancers, but the mechanism invoking them & whether the spike is simply an inadvertent ectopic effect or the activation of some mechanism are unknown.

Human trefoil factor 1 (TFF1:p)―HPA mRNA cancer tissue distributions & GPMDB protein cancer tissue tabulations.

Once you get past the awful name, this small secreted protein is produced in epithelial cells in the epidermis & esophagus and sees its highest concentration in urine. It has no agreed upon function in either place, although it is not for lack of trying to find one. The mature protein's N-terminus is L23.

Human secreted LY6/PLAUR domain containing 1 (SLURP1:p)―HPA mRNA tissue distributions & GPMDB protein tissue tabulations.

This small secreted protein is pretty much tailor-made to illustrate the importance of sources-and-sinks when trying to understand mRNA & protein tissue concentration, particularly wrt clinical applications. Mature protein N-terminus is V22.

Human guanylate cyclase activator 2A (GUCA2A:p)―HPA mRNA tissue distributions & GPMDB protein tissue tabulations.

Uncoupling proteins (2024/2/8)
Uncoupling proteins 1-5 are mitochondrial inner membrane proteins that modulate mitochondrial function in specific tissues. Each one has a solute carrier family 25 designation as well as a UCP gene symbol.

This protein is most frequently discussed for its role in the brown fat of obligate hibernator mammals. It is localized in the mitochondrial inner membrane and it functions when the animal is being roused from hibernation, shutting down ATP production in mitochondria and channeling the energy produced by the organelle into heat instead. In non-hibernating mammals its role is a little mysterious.

Human & mouse uncoupling protein 1 (:p | :p)―GPMDB tissue tabulations

Like UCP1:p, this protein seems to be utilized more frequently by mice compared to humans, although its tissue distribution is different. The case for its role in non-shivering thermogenesis is less established than that of UCP1:p.

Human & mouse uncoupling protein 2 (:p | :p)―GPMDB tissue tabulations

UCP3:p is also a mitochondrial inner membrane transport molecule, but it is most frequently observed in striated muscle. The idea that this sequence is used to "uncouple" ATP production to produce heat has been challenged & alternate functions proposed, e.g., Di Marchi, 2011.

Human & mouse uncoupling protein 3 (:p | :p)―GPMDB tissue tabulations

The next one in the series is located in the central nervous system of both species, where it is probably involved in modulating glutamate controlled proton gradients in neuron mitochondrial inner membranes. It does not appear to have any role in thermogenesis.

Human & mouse uncoupling protein 4 (:p | :p)―GPMDB tissue tabulations

UCP5 is also found in the central nervous system of both species, with a surprise appearance in sperm/testis in humans. The protein has been proposed as a hypothetical regulator of reactive oxygen species (Ramsden 2012) & more recently as an anion transporter (Gorgolione 2019).

Human & mouse uncoupling protein 5 (:p | :p)―GPMDB tissue tabulations

Class 2 Propellers? (2024/2/6)

First time I noticed MHC class 2 peptides so tightly bunched around a specific structural feature (in this case a β-propeller domain, annotated on 399-659).

Human low density lipoprotein receptor (LDLR:p), GPMDB coverage diagram of observed MHC 2 peptides.

Propeller circled in red.

AF model of P01130.

Same thing for the presented HLA class II peptides of several of the other lipoprotein receptors that have β-propellers in their structures, e.g.

Human very low density lipoprotein receptor (VLDLR), GPMDB coverage diagram of observed MHC 2 peptides.

The VLDLR:p MHC 2 peptides come from the propeller in the red circle.

AF model of P98155

And another case where these peptides are selected from a single β-propeller domain (in red circle) within a pretty complex structure.

LDL receptor related protein 4 (LRP4), GPMDB coverage diagram of observed MHC class 2 peptides & AF model of O75096.

How to Model PSM-based Proteomics (2024/1/28)

A logistic model for protein-centric PSM collections from MS/MS proteomics data.

Assuming a set of peptide sequences (p) derived from a protein sequence (P), i.e., {Pⱼ | pᵢ}
Nⱼ - number of PSMs for Pⱼ
cⱼ - concentration of Pⱼ
nᵢ - constant for pᵢ
kᵢ - constant for pᵢ
c₀ᵢ - constant for pᵢ

This model can give rise to many types of PSM v concentration curves depending on the number of peptides & the parameter values chosen. The 1st curve is for a single peptide with up to nᵢ = 10, while the 2nd corresponds to 10 peptides with nᵢ = 5 & evenly spread c₀ᵢ values.
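For anyone who wants to play with curves like these, here is a minimal sketch. It assumes the model is the usual sum of per-peptide logistic terms, Nⱼ = Σᵢ nᵢ / (1 + exp(−kᵢ(cⱼ − c₀ᵢ))), which is a guess based on the parameter list above rather than a formula given in the post. It also assumes cⱼ is on an arbitrary (probably log) concentration scale. The kᵢ = 1 and c₀ᵢ values are made up for illustration.

```python
import math


def logistic(x):
    """Numerically stable logistic function 1 / (1 + exp(-x))."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def expected_psms(conc, peptides):
    """Expected PSM count N_j for one protein at concentration `conc`.

    `peptides` is a list of (n_i, k_i, c0_i) tuples, one per peptide:
    n_i is the plateau PSM count, k_i the steepness and c0_i the midpoint.
    ASSUMPTION: the per-peptide terms are logistic and simply add up.
    """
    return sum(n * logistic(k * (conc - c0)) for n, k, c0 in peptides)


# Curve 1: a single peptide that saturates at n_i = 10 (k_i, c0_i hypothetical).
single = [(10.0, 1.0, 5.0)]

# Curve 2: ten peptides, each with n_i = 5 and evenly spread c0_i (hypothetical).
ten = [(5.0, 1.0, float(c0)) for c0 in range(1, 11)]

for c in range(0, 16, 3):
    print(c, round(expected_psms(c, single), 2), round(expected_psms(c, ten), 2))
```

With these made-up values the single-peptide curve is one sharp sigmoid, while the ten-peptide curve climbs more gradually over a wider concentration range before flattening at 50 PSMs.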

Fun with Ubiquitination (2024/1/27)

If you would like to monitor the overall level of ubiquitination in an experiment, but you don't want to go to the trouble & strife of an enrichment experiment, you can get a good feel for the overall level of this PTM by just checking for it on any one of the ubiquitin-containing genes, e.g., RPS27A:p.

This monitoring method takes advantage of the tendency of ubiquitin to seed its own polymerization into various types of chains, triggered by its attachment to a substrate. Therefore, every Ub modified protein will generate multiple Ub-Ub crosslinks, amplifying very small, transitory individual signals into something that easily ends up over the LOD in even single shot LC/MS experiments.
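A rough sketch of how that check could be scripted, assuming a tryptic digest (so each Ub-Ub linkage leaves a GlyGly remnant of about +114.043 Da on the linked lysine) and assuming your search engine results can be boiled down to (peptide, modifications) pairs. That tuple layout, the function names and the example PSMs are all hypothetical; the sequence is the 76-residue ubiquitin found at the N-terminus of RPS27A:p.

```python
# Mature ubiquitin (residues 1-76 of RPS27A:p).
UBIQUITIN = (
    "MQIFVKTLTGKTITLEVEPSDTIENVKAKIQDKEGIPPDQQRLIFAGKQLEDGR"
    "TLSDYNIQKESTLHLVLRLRGG"
)
DIGLY = 114.0429  # Da, GlyGly remnant left on K after trypsin


def lysine_sites(seq):
    """1-based positions of every lysine in `seq`."""
    return [i + 1 for i, aa in enumerate(seq) if aa == "K"]


def has_digly(peptide, mods, tol=0.01):
    """True if any lysine in the peptide carries a GlyGly-sized mass shift."""
    return any(
        peptide[i - 1] == "K" and abs(delta - DIGLY) <= tol
        for i, delta in mods.items()
    )


def digly_site_counts(psms, seq=UBIQUITIN, tol=0.01):
    """Count GlyGly-modified PSMs per ubiquitin lysine.

    `psms` is an iterable of (peptide, mods) pairs, where `mods` maps a
    1-based residue index within the peptide to a mass shift in Da
    (HYPOTHETICAL layout; adapt to whatever your search engine reports).
    """
    counts = {}
    for peptide, mods in psms:
        start = seq.find(peptide)  # 0-based offset, -1 if not a Ub peptide
        if start < 0:
            continue
        for i, delta in mods.items():
            if peptide[i - 1] == "K" and abs(delta - DIGLY) <= tol:
                site = f"K{start + i}"
                counts[site] = counts.get(site, 0) + 1
    return counts


def digly_fraction(psms, seq=UBIQUITIN, tol=0.01):
    """Fraction of ubiquitin PSMs carrying at least one GlyGly lysine."""
    ub = [(p, m) for p, m in psms if seq.find(p) >= 0]
    if not ub:
        return 0.0
    return sum(has_digly(p, m, tol) for p, m in ub) / len(ub)


# Made-up PSMs for illustration only.
psms = [
    ("LIFAGKQLEDGR", {6: 114.0429}),        # K48 linkage
    ("LIFAGKQLEDGR", {6: 114.0429}),        # K48 linkage
    ("TLSDYNIQKESTLHLVLR", {9: 114.0429}),  # K63 linkage
    ("ESTLHLVLR", {}),                      # unmodified Ub peptide
    ("TITLEVEPSDTIENVK", {}),               # unmodified Ub peptide
    ("AAAAAK", {}),                         # not from ubiquitin, ignored
]

counts = digly_site_counts(psms)
fraction = digly_fraction(psms)
print(lysine_sites(UBIQUITIN), counts, fraction)
```

The per-site counts give a feel for which chain types dominate, while the fraction is a crude single-number index to compare between runs. Neither is a substitute for an enrichment experiment if you need actual substrate sites.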

Mediator Complexification (2024/1/26)
The Mediator complex is a large multi-subunit assembly that acts as a "general transcription factor", modulating the activity of RNA polymerase II.

α⧸ω ST phosphorylation diagrams for human & mouse mediator complex subunit 1 (:p). Both sequences have an nward structured domain, followed by a complex, multiple acceptor phosphoIDR that extends to the C-terminus. In addition to phosphorylation, this region features a number of interesting low complexity domains.

What is now :p has gone under many names in the 'zines, making the written accounts of its adventures difficult to navigate. Also known as: Activator-recruited cofactor 205 kDa component; Peroxisome proliferator-activated receptor-binding protein; Thyroid hormone receptor-associated protein complex 220 kDa component; Thyroid receptor-interacting protein; Vitamin D receptor-interacting protein complex component; & p53 regulatory protein RB18A.

α⧸ω ST phosphorylation diagrams for human & mouse mediator complex subunit 29 (:p, aka :p). Unlike :p, this sequence does not use phosphorylation, even though it only has a few, scattered helical domains. Much of the information about this gene is buried in the 'zines under the subunit's older (but salacious) name "Intersex-like, IXL" (e.g., Kuuselo, 2007).

α⧸ω ST phosphorylation diagrams for human & mouse mediator complex subunit 24 (:p, aka :p). This sequence's structure is a brillo-pad-like jumble of helixes, with a loopy phosphoIDR that has paired high occupancy S+phosphoryl acceptors (LxRxxΦ & ΦP).

This view of the AF structure model of human has the phosphoIDR oriented so that it is sticking up on the top-centre. As with many of the Mediator complex subunits, it has been given an impressive collection of sobriquets in the 'zines: TRAP100, KIAA0130, DRIP100, CRSP100, MED5, THRAP4 & CRSP4.

α⧸ω ST phosphorylation diagrams for human & mouse mediator complex subunit 12 (:p). This sequence's structure is a tumbleweed of helixes, with short phosphoIDRs forming fenestrations between the helical domains. It also has some low complexity regions of interest, e.g., (human 2049-2122):


Like so many of the Mediator complex subunits, the 'zines have blessed MED12:p with a cornucopia of names & hypothetical functions: CAGH45, HOPA, OPA1, TRAP230, KIAA0192, OKS, ARC240, Kto, TNRC11 & FGS1.

α⧸ω ST phosphorylation diagrams for human & mouse mediator complex subunit 13 (:p). It has one large, multi-acceptor phosphoIDR between an N-terminal structured region & the second structured region, as well as a single acceptor phosphoIDR adjacent to a C-terminal structured domain. Also known as ARC250, DRIP250, HSPC221, THRAP1, TRAP240 & MRD61.

α⧸ω ST phosphorylation diagrams for human & mouse mediator complex subunit 14 (:p). The low complexity region (980-1150) with multiple S/TP motifs has several occupied acceptors, biased towards the edges of this phosphoIDR. The human acceptor S617+phosphoryl is not present in the mouse sequence. Aka EXLM1, CRSP150, TRAP170, RGR1, CSRP, CXorf4 & CRSP2.

α⧸ω ST phosphorylation diagrams for human & mouse mediator complex subunit 15 (:p). S/TP type acceptors bracket one of the cward structured domains. One of the acceptors (T603 human, T604 mouse) is nearly always occupied. Aka Activator-recruited cofactor 105 kDa component; CTG repeat protein 7a; Positive cofactor 2 glutamine/Q-rich-associated protein; TPA-inducible gene 1 protein; & Trinucleotide repeat-containing gene 7 protein.

:p is one of the mediator subunits with a curious John-de-Lancie-type low complexity region (human 150-262 shown):


α⧸ω ST phosphorylation diagrams for human & mouse mediator complex subunit 16 (:p). The single, low occupancy acceptor (human S326) is in a narrow phosphoIDR linking domains with extended all-strand structures. Also known as Thyroid hormone receptor-associated protein 5; Thyroid hormone receptor-associated protein complex 95 kDa component; & Vitamin D3 receptor-interacting protein complex 92 kDa component.

Ubiquitination seems to be doing some heavy lifting with :p.

α⧸ω ST phosphorylation diagrams for human & mouse mediator complex subunit 19 (MED19:p). Has a dragon's tail phosphoIDR with a single high occupancy SP-type acceptor.

The Four Horsemen of HEK (2024/1/24)

These 4 viral proteins are all observable in HEK-293T proteomics data, to various extents. The SV40 "T" protein is a deliberate addition to aid in molecular biology manipulations, while the mastadenovirus proteins were added as part of the transformation creating the original HEK-293 cell line.

