Human Genome Program, U.S. Department of Energy, Genomics and Its Impact on Medicine and Society: A 2001 Primer, 2001
• Two examples
• A challenge

Liquid Association (LA)
• LA is a generalized notion of association for describing a certain kind of ternary relationship between variables in a system (Li 2002, PNAS).

Liquid Association
[Figure: scatter plot of the profiles of genes X (horizontal axis) and Y (vertical axis), each running from low (-) to high (+), with linear fits for state 1 and state 2]
Green points represent four conditions for cellular state 1. Red points represent four conditions for cellular state 2. Blue points represent the transit state between cellular states 1 and 2. (X, Y) forms an LA.
Important! The correlation between X and Y is 0.

Mathematical Statistics on LA
• $EX = 0$, $EY = 0$, $SD(X) = SD(Y) = 1$
• LA is defined by the following equations. $g(Z)$ is the conditional expectation of $XY$ given $Z$, i.e. the correlation between X and Y at a given level of Z; $LA(X,Y \mid Z)$ is the expected change of that correlation as Z changes.
$g(Z) = E_{X,Y}(XY \mid Z)$
$LA(X,Y \mid Z) = E_Z\big(g'(Z)\big)$

Stein Lemma
• Computing $E(g'(Z))$ directly is not easy. With help from mathematical statistics theory (Stein's lemma), $LA(X,Y \mid Z)$ simplifies to $E(XYZ)$ when Z follows a standard normal distribution:
$LA(X,Y \mid Z) = E\big(g'(Z)\big) = E\big(Z\,g(Z)\big) = E\big(Z\,E(XY \mid Z)\big) = E\big(E(XYZ \mid Z)\big) = E(XYZ)$

Gene-expression data
         cond1   cond2   …   condp
gene1    x11     x12     …   x1p
gene2    x21     x22     …   x2p
…        …       …       …   …
gene n   xn1     xn2     …   xnp

Correlation Coefficient
• Used by Gauss, Bravais, Edgeworth, …
• Its sweeping impact on data analysis is due to Galton (1822-1911), "Typical laws of heredity in man".
• Karl Pearson modified it and popularized its use.
• A building block of multivariate analysis, in which clustering, classification, and dimension reduction are recurrent themes.

An application
• Two-class problem: ALL (acute lymphoblastic leukemia) vs. AML (acute myeloid leukemia)

Why does clustering make sense biologically?
The rationale is that genes with a high degree of expression similarity are likely to be functionally related. Such genes
• may form structural complexes,
• may participate in common pathways,
• may be co-regulated by common upstream regulatory elements.
Simply put, profile similarity implies functional association.

However, the converse is not true: the expression profiles of the majority of functionally associated genes are in fact uncorrelated.
• Microarray data are too noisy.
• Biology is complex.

Why no correlation?
• A protein rarely works alone.
• A protein can have multiple functions.
• Different biological processes or pathways have to be synchronized.
• Finite resources are used competitively: metabolites, hormones, …
• Protein modification: phosphorylation, proteolysis, shuttling, …
• Transcription factors can serve both as activators and as repressors.

Transcription factors
• Proteins that bind to DNA; they act as activators or repressors.

Going subtle: protein modification
• Histones inhibit transcription. To activate transcription, the lysine side chains of the histones must be acetylated (Weaver 2001).
• Example (thyroid hormone): corepressor = histone deacetylase; coactivator = histone acetyltransferase.

Math. modeling: a nightmare
[Diagram of factors behind the observed mRNA level: current and hidden mRNA, protein, kinases, ATP, GTP, cAMP, etc.; localization (cytoplasm, nucleus, mitochondria, vacuole); DNA methylation and chromatin structure; nutrients (carbon and nitrogen sources), temperature, water; next state; fitness function]
Statistical methods become useful.

What is LA? PLA? The concept of a "mediator".

Schematic illustration of LA
[Fig. 2, top: gene Y versus gene X, each from low (-) to high (+), for state 1, the transit state, and state 2, with linear fits for states 1 and 2. Fig. 2, bottom: expression of gene Z, from low (-) to high (+), across conditions]

Example 1. Positive-to-negative
• X = ARP4, Y = LAS17, Z = MCM1
• Corr = 0 in each plot.
• For low Z (marked points in panel A), X and Y are co-expressed.
• For high Z (marked points in panel B), X and Y are contra-expressed.
Arp4: protein that interacts with core histones; member of the NuA4 histone acetyltransferase complex; actin-related protein.
Las17: component of the cortical actin cytoskeleton.
Figure 2 (panels A and B).

Example 2. Negative-to-positive
• X = QCR9, Y = ROX1, Z = MCM1
• Corr = 0 in each plot.
• For low Z (marked points in panel A), X and Y are contra-expressed.
• For high Z (marked points in panel B), X and Y are co-expressed.
Rox1: heme-dependent transcriptional repressor of hypoxic genes, including CYC7 (iso-2-cytochrome c) and ANB1 (translation initiation, ribosome).
Qcr9: ubiquinol cytochrome c reductase subunit 9.
Figure 3 (panels A and B).

A Challenge
• What genes behave like that?
• Can we identify all of them?
• N = 5878 ORFs
• $\binom{N}{3} \approx 33.8$ billion triplets to inspect

Statistical theory for LA
• X, Y, Z are random variables with mean 0 and variance 1.
• $Corr(X,Y) = E(XY) = E\big(E(XY \mid Z)\big) = E\,g(Z)$
• $g(z) = E(XY \mid Z = z)$ is an ideal summary of the association pattern between X and Y when Z = z.
• $g'(z)$ is the derivative of $g(z)$.
• Definition. The LA of X and Y with respect to Z is $LA(X,Y \mid Z) = E\,g'(Z)$.

Statistical theory: LA
• Theorem. If Z is standard normal, then $LA(X,Y \mid Z) = E(XYZ)$.
• Proof. By Stein's lemma, $E\,g'(Z) = E\big(g(Z)\,Z\big) = E\big(E(XY \mid Z)\,Z\big) = E(XYZ)$.
• Additional mathematical properties:
• bounded by the third moment
• equal to 0 if X, Y, Z are jointly normal
• transformation

Normality?
• Convert each gene expression profile by a normal score transformation.
• $LA(X,Y \mid Z)$ = average of the triplet product of the three gene profiles: $(x_1 y_1 z_1 + x_2 y_2 z_2 + \cdots + x_n y_n z_n)/n$

How does LA work in yeast?
Urea cycle/arginine biosynthesis
[Figure: yeast cell cycle (adapted from Molecular Cell Biology, Darnell et al.); most visible event]
[Pathway diagram: glutamate, N-acetylglutamate, L-glutamate-5-semialdehyde, ornithine, proline, glutamine, carbamoyl phosphate, citrulline, aspartate, L-argininosuccinate, fumarate, arginine, urea; genes ARG2, CPA1, CPA2, ARG3, ARG1, ARG4, CAR1, CAR2; annotation "8th place negative"]
Figure 2.
The four enzymes of the urea cycle are encoded by ARG3, ARG1, ARG4, and CAR1 in S. cerevisiae. ARG2 encodes acetylglutamate synthase, which catalyzes the first step of ornithine biosynthesis. CPA1 and CPA2 encode the small and large subunits of carbamoyl phosphate synthetase. CAR2 encodes ornithine aminotransferase. This chart is adapted from KEGG.

• Compute $LA(X,Y \mid Z)$ for all Z.
• Rank and find the leading genes.

Why negative LA?
• High CPA2 signals arginine demand: up-regulation of ARG2 concomitant with down-regulation of CAR2 prevents ornithine from leaving the urea cycle.
• When the demand is relieved, CPA2 is lowered and CAR2 is up-regulated, opening the channel for ornithine to leave the urea cycle.
[Figure: ARG2 and CAR2 expression (each from low to high, roughly -2 to 2), with conditions grouped by low, median, and high CPA2 and linear fits for the low- and high-CPA2 groups]

Other examples (see Li 2002)
• X = GLN3 (transcription factor), Y = CAR1, Z = ARG4 (8th place, negative end)
• Electron transport: X = CYT1 (cytochrome c1) gives ATP1 (11 times), ATP5 (subunits of ATPase)
• Calmodulin: CMD1, NUF1 (binding target of CMD1), CMK1 (calmodulin-regulated kinase), YGL149W
• Glycolysis genes: PFK1, PFK2 (6-phosphofructokinase)
• CYR1 (adenylate cyclase), GSY1 (glycogen synthase), GLC2 (glucan branching), SCH9 (serine/threonine protein kinase; longevity)
• SCH9: protein kinase that regulates signal transduction activity and G1 progression, controls cAPK activity, is required for nitrogen activation of the FGM pathway, is involved in life span regulation, and is homologous to mammalian Akt/PKB (SGD summary).
• Fabrizio P, Pozza F, Pletcher SD, Gendron CM, Longo VD. Regulation of longevity and stress resistance by Sch9 in yeast. Science. 2001 Apr 13;292(5515):288-90.
• The protein kinase Akt/protein kinase B (PKB) is implicated in insulin signaling in mammals and functions in a pathway that regulates longevity and stress resistance in Caenorhabditis elegans.
We screened for long-lived mutants in nondividing yeast Saccharomyces cerevisiae and identified mutations in adenylate cyclase and SCH9, which is homologous to Akt/PKB, that increase resistance to oxidants and extend lifespan by up to threefold. Stress-resistance transcription factors Msn2/Msn4 and protein kinase Rim15 were required for this life-span extension. These results indicate that longevity is associated with increased investment in maintenance and show that highly conserved genes play similar roles in life-span regulation in S. cerevisiae and higher eukaryotes.
[Figure: urea-cycle genes ARG1, ARG3, ARG4, and CAR1 each plotted against ARG2; blue: low SCH9, red: high SCH9]
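The "Normality?" step and the "compute LA(X,Y|Z) for all Z, rank and find leading genes" procedure described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the implementation from Li (2002): the function names (`normal_scores`, `mean_product`, `liquid_association`, `rank_mediators`), the (rank - 0.5)/n plotting position for the normal scores, average ranks for ties, and the re-standardization to mean 0 and SD 1 are all choices made here. Because the scores are standardized, the same averaging with Z dropped gives the ordinary Pearson correlation, which lets the toy example reproduce the "correlation is 0, yet LA is not" situation from the slides.

```python
from statistics import NormalDist, fmean, pstdev


def normal_scores(profile):
    """Normal score transformation of one expression profile (assumed scheme).

    Each value is replaced by the standard-normal quantile of its rank, using the
    (rank - 0.5) / n plotting position and average ranks for ties; the scores are
    then re-standardized to mean 0 and population SD 1.
    """
    n = len(profile)
    order = sorted(range(n), key=lambda i: profile[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over a run of tied values.
        while j + 1 < n and profile[order[j + 1]] == profile[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    std_normal = NormalDist()
    scores = [std_normal.inv_cdf((r - 0.5) / n) for r in ranks]
    mu, sd = fmean(scores), pstdev(scores)
    if sd == 0:
        raise ValueError("profile is constant; normal scores are undefined")
    return [(s - mu) / sd for s in scores]


def mean_product(*profiles):
    """Average over conditions of the elementwise product of the given profiles."""
    if len({len(p) for p in profiles}) != 1:
        raise ValueError("profiles must be non-empty and of equal length")
    products = []
    for values in zip(*profiles):
        prod = 1.0
        for v in values:
            prod *= v
        products.append(prod)
    return fmean(products)


def liquid_association(x, y, z):
    """LA(X,Y|Z) estimated as (x1*y1*z1 + ... + xn*yn*zn) / n on normal scores."""
    return mean_product(normal_scores(x), normal_scores(y), normal_scores(z))


def rank_mediators(x, y, candidates):
    """LA(X,Y|Z) for every candidate Z (dict: name -> profile), largest first.

    The most negative LA values (the "negative end") are at the tail of the list.
    """
    xs, ys = normal_scores(x), normal_scores(y)  # transform X and Y only once
    scores = {name: mean_product(xs, ys, normal_scores(z))
              for name, z in candidates.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy profiles mimicking Example 1: X and Y move together when Z is low
# and in opposite directions when Z is high.
x = [1, -1, 2, -2, 1, -1, 2, -2]
y = [1, -1, 2, -2, -1, 1, -2, 2]
z = [-2, -1.5, -1, -0.5, 0.5, 1, 1.5, 2]

corr_xy = mean_product(normal_scores(x), normal_scores(y))  # Pearson r on scores
print(f"corr(X, Y) on normal scores: {corr_xy:.3f}")
print(f"LA(X, Y | Z): {liquid_association(x, y, z):.3f}")
print(rank_mediators(x, y, {"z_up": z, "z_down": z[::-1]}))
```

With these toy profiles the correlation of X and Y is zero while LA(X,Y|Z) is clearly negative (a positive-to-negative pattern), and reversing Z flips the sign of LA. Scanning every candidate Z for one (X, Y) pair costs one triplet average per gene, which is far cheaper than inspecting all $\binom{N}{3}$ triplets.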