Stages of Lung Development
• Embryonic (E9–E12) – Primitive lung buds emerge from ventral gut epithelium
• Pseudoglandular (E12–E15) – Stereo-specific branching of the lung bronchi. Differentiation of epithelial cells to form prealveolate saccules
• Canalicular (E15–E17) – Formation of terminal sacs and vasculature
• Saccular (E17–Birth) – Expansion in the numbers of terminal sacs and capillaries. Differentiation of Type I and II alveolar cells
• Alveolar (Birth–P30) – Terminal sacs develop into mature alveolar ducts and alveoli
E = embryonic, P = postnatal
http://www.cincinnatichildrens.org/research/div/pulmonarybiology/faculty-research/whitsett-lab/projects.htm

Transcriptional profiling to discover lung development genes in the mouse (C57BL/6J)
[Figure: lung images at E11.5, E13.5, E14.5, E16.5 and P5. E = embryonic, P = postnatal]
Images from Malpel S, Development (2000) 127:3057-67

Temporal gene expression patterns
• Used Short Time-series Expression Miner (STEM)
• STEM first builds model expression profiles based on the number of time points
  – Profiles are complete and distinct
• Clustering algorithm assigns each gene to the profile that most closely matches its expression pattern across the time series
• Permutation tests used to determine significance of the profiles
Ernst et al. (2005) Bioinformatics 21:159.
Ernst and Bar-Joseph (2006) BMC Bioinformatics 7:191.
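The profile-assignment step described above can be illustrated with a minimal sketch. This is not STEM's own code: the four model profiles, the expression values, and the gene labels are made up for illustration, and Pearson correlation is assumed here as the measure of "most closely matches". The permutation test for profile significance is not shown.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # flat series: treat as uncorrelated
    return sxy / sqrt(sxx * syy)

def assign_to_profiles(expression, profiles):
    """Map each gene to the name of the model profile it correlates with best."""
    assignment = {}
    for gene, series in expression.items():
        best = max(profiles, key=lambda name: pearson(series, profiles[name]))
        assignment[gene] = best
    return assignment

# Hypothetical model profiles over three time points (E14.5, E16.5, P5),
# expressed as change relative to the first time point.
profiles = {
    "up": [0, 1, 2],
    "down": [0, -1, -2],
    "up_then_down": [0, 1, 0],
    "down_then_up": [0, -1, 0],
}

# Made-up log-ratio values; the gene names are placeholders, not real data.
expression = {
    "geneA": [0.0, 0.8, 2.1],
    "geneB": [0.0, -0.5, -1.7],
    "geneC": [0.0, 1.2, 0.1],
}

assignment = assign_to_profiles(expression, profiles)
print(assignment)
```

Counting how many genes land in each profile gives the kind of per-profile gene counts reported for the lung time series.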
Gene expression profiles in normal mouse lung development
[Figure: expression change plots for STEM profiles. Data shown for three time points: E14.5, E16.5, P5. Number of genes that match each expression profile: 312, 349, 320, 139, 196, 141, 431]
http://www.cs.cmu.edu/~jernst/stem/

Gene List Interpretation
1110005A23Rik, 1700009P03Rik, 1700020C11Rik, 1810058I14Rik, 2210018M11Rik, 2610301G19Rik, 2810407C02Rik, 4931406I20Rik, 4932432K03Rik, 5730467H21Rik, 5830411E10Rik, 6330581L23Rik, 9030612M13Rik, AI848100, Abca3, Abcc4, Abcd1, Acad10, Acads, Acsbg1, Acsl5, Adam12, Adamts20, Adamts5, Adamts9, Adcy3, Akap2, Alas1, Aldh1a1, Aldh1l1, Aldoc, Alg14, Alg6, Amph, Aox3, Aplp2, Appbp2, Aqp5, Arf2, Arf4, Arhgap6, Art3, Atf6, Atm, Atp1b1, Atp6v0b, Atp6v1e1, Atp7a, Atp8a1, Atp8b2, B230118G17Rik, BC016495, Bbs4, Bcat1, Bcl2l2, Bclaf1, Bid, Bpgm, Bphl, *Braf, Brunol4, Btbd4, Bzw1, C1qtnf3, C730048C13Rik, Cacna1d, Cadps2, Calm2, Camk2d, Camkk2, Cart1, Casp7, Cav1, Ccnb1, Ccni, Cd36, Cdc26, Cdca5, Cdkn1b, Cdkn1c, Cdkn3, Cdx2, Cebpg, Ches1, Cited1, Clca1, Clta, Clu, Cmpk, Cnot6, Cntn4, Col18a1, Col3a1, Col4a1, Col4a6, Col9a1, Cox6b2, Cpm, Cpne5, Crbn, Crls1, Cse1l, *Ctnnb1, Ctps2, Ctse, Cul3, Cyp11a1, D11Ertd333e, D1Ertd161e, D230025D16Rik, D830007F02Rik, Daam2, Dab1, Dach1, Dapk2, Dcamkl1, Dhfr, Dhrs8, Dnajc15, Dtymk, Dusp4, Dyrk1a, E2f7, Eda2r, Ednra, Ell2, Elmo3, Enah, Enpep, Enpp2, Epb4.1, Eps8, Esm1, Etv5, Eya1, Fabp3, Fabp5, Fank1, Fath, Fblim1, Fbxl20, Fbxl3, Fbxw7, Fem1c, Fgfr2, Fhit, Fhl2, Fkbp6, Folr1, Foxp1, Frk, Fusip1, Fxyd6, Fzd9, Gas7, Gata2, Gdpd2, Gja1, Gpc3, Gpx3, Gstk1, Gstp1, H2-Aa, H3f3a, Hdac9, Hel308, Hesx1, Heyl, Hhip, Hif3a, Hipk2, Hist1h2bc, Hnrpf, Hook1, Hoxd8, Hsd17b12, Hsp90b1, Hspa1b, Htra3, Ifitm3, Ifnar2, Igf1, Igfbp2, Igfbp3, Igfbp7, Ing3, Ipo7, Itga4, Itgb1, Itpr2, Jarid1d, Kcnab1, Kcnb1, Kcnip1, Kcnip4, Kcnj16, Kcns2, Kdr, Keap1, Kif2a, Klf6, Klf7, Klk1, Krt2-7, Krt2-8, Lama5, Lass6, Lcn2, Lgals7, Lgtn, Lhx1, Lhx9, Lmo4, Lrrc16, Lrrk1, Lsp1, Lss, Ltf, Madd, Mafa,
Man1a2, Mapk1, Mapre1, Masp1, Mef2c, Mlph, Mmp19, Mod1, Morf4l1, Morf4l2, Mrpl18, Mrpl44, Mt1, Mt2, Mtdh, Mterf, Mthfd1, Mtm1, Mtr, Mtx2, Myef2, Myl1, Mylc2b, Mylk, Myo1b, Myo5b, Narg1, Nedd9, Neo1, Nfe2l2, Npc1, Npepl1, Npr2, Nr2f2, Nrg1, Nusap1, Ogt, Otx2, Pak1, Pak3, Papss2, Pard6b, Parp1,

Nat Genet (2000) 25: 25-29

What is an Ontology?
“Ontologies provide controlled, consistent vocabularies to describe concepts and relationships, thereby enabling knowledge sharing” (Tom Gruber, Stanford University)
GO was started to facilitate comparing biological knowledge across model organisms
Describe molecular and cellular biology of genes & gene products (not about gene names!)
Need a practical solution for implementation & use
Want a unifying, expandable, organism-independent vocabulary
www.geneontology.org

The GO vocabularies
• Molecular Function: What a product ‘does’, precise activity
• Biological Process: Biological objective, accomplished via one or more ordered assemblies of functions
• Cellular Component: ‘is located in’ (‘is a subcomponent of’)

Definitions are the core of any ontology
Term <string>
Synonym(s)
ID <tied to definition, not term>
Definition

GO is a Structured Vocabulary

Expression change plots for normal mouse lung development
312 genes up regulated over time during development
139 genes down regulated over time during development
Aldh1l1, Aldoc, Alg14, Alg6, Amph, Aox3, Aplp2, Appbp2, Aqp5, Arf2, Arf4, Arhgap6, Art3, Atf6, Atm, Atp1b1, Atp6v0b, Atp6v1e1, Atp7a, Atp8a1, Atp8b2, B230118G17Rik, BC016495, Bbs4, Bcat1, Bcl2l2, Bclaf1, Bid, Bpgm, Bphl, *Braf, Brunol4, Btbd4, Bzw1, C1qtnf3, C730048C13Rik, Cacna1d, Cadps2, Calm2, Camk2d, Camkk2, Cart1, Casp7, Cav1, Ccnb1, Ccni, Cd36, Cdc26, Cdca5, Cdkn1b, Cdkn1c, Cdkn3, Cdx2, Cebpg, Ches1, Cited1, Clca1, Clta, Clu,
Cmpk, Cnot6, Cntn4, Col18a1, Col3a1, Col4a1, Col4a6, Col9a1, Cox6b2, Cpm, Cpne5, Crbn, Crls1, Cse1l, *Ctnnb1, Ctps2, Ctse, Cul3, Cyp11a1, D11Ertd333e, D1Ertd161e, D230025D16Rik, D830007F02Rik, Daam2, Dab1, Dach1, Dapk2, Dcamkl1, Dhfr, Dhrs8, Dnajc15, Dtymk, Dusp4, Dyrk1a, E2f7, Eda2r, Ednra, Ell2, Elmo3, Enah, Enpep, Enpp2, Epb4.1, Eps8, Esm1, Etv5, Eya1, Fabp3, Fabp5, Fank1, Fath, Fblim1, Fbxl20, Fbxl3, Fbxw7, Fem1c, Fgfr2, Fhit, Fhl2, Fkbp6, Folr1, Foxp1, Frk, Fusip1, Fxyd6, Fzd9, Gas7, Gata2, Gdpd2, Gja1, Gpc3, Gpx3, Gstk1, Gstp1, H2-Aa, H3f3a, Hdac9, Hel308, Hesx1, Heyl, Hhip, Hif3a, Hipk2, Hist1h2bc, Hnrpf, Hook1, Hoxd8, Hsd17b12, Hsp90b1, Hspa1b, Htra3, Ifitm3, Ifnar2, Igf1, Igfbp2, Igfbp3, Igfbp7, Ing3, Ipo7, Itga4, Itgb1, Itpr2, Jarid1d, Kcnab1, Kcnb1, Kcnip1, Kcnip4, Kcnj16, Kcns2, Kdr, Keap1, Kif2a, Klf6, Klf7, Klk1, Krt2-7, Krt2-8, Lama5, Lass6, Lcn2, Lgals7, Lgtn, Lhx1, Lhx9, Lmo4, Lrrc16, Lrrk1, Lsp1, Lss, Ltf, Madd, Mafa, Man1a2, Mapk1, Mapre1, Masp1, Mef2c, Mlph, Mmp19, Mod1, Morf4l1, Morf4l2, Mrpl18, Mrpl44, Mt1, Mt2, Mtdh, Mterf, Mthfd1, Mtm1, Mtr, Mtx2, Myef2, Myl1, Mylc2b, Mylk, Myo1b, Myo5b, Narg1, Nedd9, Neo1, Nfe2l2, Npc1, Npepl1, Npr2, Nr2f2, Nrg1, Nusap1, Ogt, Otx2, Pak1, Pak3, Papss2, Pard6b, Parp1, Pbx3, Pcbd1, Pcmtd1, Pcsk5, Pctk1, Pctk3, Pdcd6ip, Pdia3, Pfdn4, Pftk1, Phb2, Phca, Phf8, Phka1, Pitx2, Pja1, Pja2, Pnck, Pomgnt1, Porcn, Ppargc1a, Ppfibp1, Ppih, Ppp1r16b, Prc1, Prcp, Prkag2, Prkar2b, Prkcd, Psmb3, *Psrc1, Ptch1, Pten, Ptgds, Ptk2b, Ptp4a1, Ptp4a2, Ptp4a3, Ptpn13, Ptx3, Qscn6, Rab2b, Rab31, Rab3a, Rab3b, Rad51l3, Rec8L1, Ren2, Rims4, Rkhd3, Rnf11, Rnf20, Robo2, Rpl39, Rps6ka3, Runx1, Runx2, Rxrb, Ryr2, S100a6, S100a9, Sat1, Scd1, Scmh1, Scn3a, Scn7a, Scn8a, Scrn1, Sdk2, Sec24a, Sec61a2, Sema3a, Sept11, Serpina3g, Sesn3, Sf4, Sfrs1, Sgk3, Shb, Sin3b, Slc11a2, Slc16a10, Slc16a7, Slc18a2, Slc25a5, Slc26a1, Slc2a13, Slc38a5, Slc39a10, Slc41a2, Slc6a14, Slc6a15, Slc6a6, Slc7a4, Slc9a2, Smc2l1, Smg5, Snapap, Sncaip, Snrk, Soat1, 
Sorl1, Sox10, Sox11, Sox9, Spp1, Srp54, St3gal5, Star, Strbp, Stxbp1, Sulf1, Suv420h1, Sv2b, Sycp3, Syn2, Sypl, Tacc1, Tcea3, Tcf12, Tdgf1, Tesc, Tfrc, Tgfa,

A Brief Statistical Detour…
Thanks to John Quackenbush
http://compbio.dfci.harvard.edu/colon_cancer.html

Diverse Biological Roles
Consider a population of genes representing a diverse set of biological roles or themes, shown below as different colors.
John Q.

Many algorithms can be applied to expression data to partition genes based on expression profiles over multiple conditions. Many of these techniques work solely on expression data and disregard biological information.

Consider a particular gene set…
– What are some of the predominant biological themes represented in the gene set, and how should significance be assigned to a discovered biological theme?

Example
Population size: 40 genes
Gene set size: 12 genes
10 genes, shown in green, have a common biological theme (GO annotation) and 8 occur within the gene set.

Consider the Outcome
[Figure: population of 40 genes, 10 with the theme; gene set of 12 genes, 8 with the theme]
The frequency of the theme in the population is 10/40 = 25%
The frequency of the theme within the cluster is 8/12 = 67%
AND
* 80% of the genes related to the theme in the population ended up within the relatively small cluster.

Contingency Matrix
A 2x2 contingency matrix is typically used to capture the relationship between gene set membership and membership in a biological theme.

                 Gene Set
                 in    out
Theme    in       8      2
         out      4     26

Assigning Significance to the Findings
Fisher’s Exact Test permits us to determine whether there is a non-random association between the two variables: expression-based cluster membership and membership in a particular biological theme.
For the 2x2 contingency matrix above (8, 2 / 4, 26), p ≈ .0002
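As a small illustration of where the four cells of the contingency matrix come from, the counts can be derived from plain set membership. The identifiers g0 to g39 are hypothetical placeholders for the 40-gene population in the example, and the function name is an assumption of this sketch. It also assumes the gene set and the theme are both subsets of the population.

```python
def contingency(population, gene_set, theme):
    """Return (a, b, c, d) for theme membership vs. gene-set membership.

    a: in theme, in gene set        b: in theme, not in gene set
    c: not in theme, in gene set    d: not in theme, not in gene set
    """
    population, gene_set, theme = set(population), set(gene_set), set(theme)
    a = len(theme & gene_set)
    b = len(theme - gene_set)
    c = len(gene_set - theme)
    d = len(population - theme - gene_set)
    return a, b, c, d

# Hypothetical identifiers standing in for the 40-gene population.
population = [f"g{i}" for i in range(40)]
theme = population[:10]                         # 10 genes share the annotation
gene_set = population[:8] + population[10:14]   # 12 genes, 8 of them in the theme

a, b, c, d = contingency(population, gene_set, theme)
print(a, b, c, d)                          # 8 2 4 26

theme_freq_population = (a + b) / len(population)   # 10/40
theme_freq_gene_set = a / (a + c)                   # 8/12
theme_fraction_captured = a / (a + b)               # 8/10
print(theme_freq_population, theme_freq_gene_set, theme_fraction_captured)
```

The three ratios at the end correspond to the 25%, 67% and 80% figures quoted for the example.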
Hypergeometric Distribution

                 in     out    total
         in       a      b      a+b
         out      c      d      c+d
         total   a+c    b+d      n

The probability of any particular matrix occurring by random selection, given no association between the two variables, is given by the hypergeometric rule, where n = a + b + c + d:

$$p = \frac{\dfrac{(a+c)!}{a!\,c!}\cdot\dfrac{(b+d)!}{b!\,d!}}{\dfrac{n!}{(a+b)!\,(c+d)!}} = \frac{(a+b)!\,(c+d)!\,(a+c)!\,(b+d)!}{n!\,a!\,b!\,c!\,d!}$$

Probability Computation
For our matrix,

    8    2
    4   26

we are interested not only in the probability of getting exactly 8 annotation hits in the cluster, but in the probability of having 8 or more hits. In this case the probabilities of each of the possible matrices are summed.

    8    2         9    1        10    0
    4   26         3   27         2   28

$$0.0002207 + 7.27\times10^{-6} + 7.79\times10^{-8} \approx 0.000228$$

Are there biological processes that are enriched in the up and down regulated gene sets during lung development?
Gene list goes here
Exclude annotations made based on “sketchy” evidence
http://proto.informatics.jax.org/prototypes/vlad-1.0.3/

VisuaL Annotation Display (VLAD) at MGI
This is a graph of GO terms, NOT genes.
The deeper the color, the more significant the association of that GO term with the gene set being analyzed.
Nodes with no text are terms in the GO hierarchy that weren’t statistically significant in the analysis.
Some aspects of graph display can be controlled by the user.
http://proto.informatics.jax.org/prototypes/vlad-1.03

There are MANY tools designed to help you with the functional analysis of gene lists…
http://www.geneontology.org/GO.tools.shtml#micro
http://tolfalas.informatics.jax.org/vlad/
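The tail sum from the Probability Computation slide can be checked with a few lines of standard-library Python. This is a sketch of the calculation only, not the code used by VLAD or any other tool listed here, and the function names are made up for illustration.

```python
from math import comb

def table_probability(a, b, c, d):
    """Hypergeometric probability of one 2x2 table with fixed margins."""
    n = a + b + c + d
    # Ways to fill the gene set column (a+c) and the rest (b+d),
    # divided by the ways to choose the a+b theme genes from n.
    return comb(a + c, a) * comb(b + d, b) / comb(n, a + b)

def enrichment_p_value(a, b, c, d):
    """One-sided p-value: probability of a or more theme genes in the gene set."""
    p = 0.0
    # Move one gene at a time from b to a (and from c to d),
    # keeping all margins fixed, until b or c runs out.
    while b >= 0 and c >= 0:
        p += table_probability(a, b, c, d)
        a, b, c, d = a + 1, b - 1, c - 1, d + 1
    return p

exact = table_probability(8, 2, 4, 26)
tail = enrichment_p_value(8, 2, 4, 26)
print(f"{exact:.7f}")   # 0.0002207
print(f"{tail:.6f}")    # 0.000228
```

An enrichment tool repeats this kind of calculation once per GO term, using the term's annotation counts in the gene list and in the background population.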