Download here - Mouse Genome Informatics

Stages of Lung Development
• Embryonic (E9 – E12)
– Primitive lung buds emerge from
ventral gut epithelium
• Pseudoglandular (E12-E15)
– Stereo-specific branching of the lung
bronchi. Differentiation of epithelial
cells to form prealveolate saccules
• Canalicular (E15-E17)
– Formation of terminal sacs and
vasculature
• Saccular (E17 – Birth)
– Expansion in the numbers of terminal
sacs and capillaries. Differentiation of
Type I and II alveolar cells
• Alveolar (Birth-P30)
– terminal sacs develop into mature
alveolar ducts and alveoli
http://www.cincinnatichildrens.org/research/div/pulmonarybiology/faculty-research/whitsett-lab/projects.htm
E = Embryonic
P = Postnatal
Transcriptional profiling to discover lung
development genes in the mouse (C57BL/6J)
Time points shown: E11.5, E13.5, E14.5, E16.5, P5
(E = embryonic, P = postnatal)
Images from Malpel S, Development
(2000) 127:3057-67
Temporal gene expression patterns
• Used Short Time-series Expression Miner (STEM)
• STEM first builds model expression profiles based on
the number of time points
– Profiles are complete and distinct
• Clustering algorithm assigns each gene to the profile
that most closely matches its expression pattern
across the time series
• Permutation tests used to determine significance of
the profiles
Ernst et al. (2005) Bioinformatics 21:159.
Ernst and Bar-Joseph. (2006) BMC Bioinformatics 7:191.
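The assignment step described above can be illustrated with a minimal sketch. It is not STEM's implementation. The candidate profiles (every sequence that starts at 0 and changes by -1, 0, or +1 between consecutive time points), the use of Pearson correlation as the similarity measure, and the example gene values are all assumptions made for illustration. Selecting a distinct subset of profiles and running the permutation test are not shown.

```python
from itertools import product
from math import sqrt


def model_profiles(n_timepoints, max_step=1):
    """Enumerate candidate profiles: start at 0, then change by an integer
    in [-max_step, max_step] between consecutive time points."""
    steps = range(-max_step, max_step + 1)
    profiles = []
    for deltas in product(steps, repeat=n_timepoints - 1):
        profile = [0]
        for d in deltas:
            profile.append(profile[-1] + d)
        profiles.append(tuple(profile))
    return profiles


def pearson(x, y):
    """Pearson correlation; returns 0.0 when either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sqrt(sxx * syy)


def assign_profile(expression, profiles):
    """Return the model profile most correlated with the gene's time series."""
    return max(profiles, key=lambda p: pearson(expression, p))


# Hypothetical expression changes at three time points (e.g. E14.5, E16.5, P5),
# expressed relative to the first time point. These values are made up.
profiles = model_profiles(3)
rising_gene = [0.0, 0.9, 2.1]
falling_gene = [0.0, -1.2, -1.9]
best_up = assign_profile(rising_gene, profiles)
best_down = assign_profile(falling_gene, profiles)
```

With three time points this enumeration yields nine candidate profiles, and each hypothetical gene is matched to the monotonic profile that follows its direction of change.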
Gene expression profiles in normal mouse lung
development
[Figure: expression change plots for STEM profiles. Data shown for three time points: E14.5, E16.5, P5. Number of genes that match each expression profile: 312, 349, 320, 139, 196, 141, 431.]
http://www.cs.cmu.edu/~jernst/stem/
Gene List Interpretation
1110005A23Rik, 1700009P03Rik, 1700020C11Rik, 1810058I14Rik, 2210018M11Rik,
2610301G19Rik, 2810407C02Rik, 4931406I20Rik, 4932432K03Rik, 5730467H21Rik,
5830411E10Rik, 6330581L23Rik, 9030612M13Rik, AI848100, Abca3, Abcc4, Abcd1, Acad10,
Acads, Acsbg1, Acsl5, Adam12, Adamts20, Adamts5, Adamts9, Adcy3, Akap2, Alas1, Aldh1a1,
Aldh1l1, Aldoc, Alg14, Alg6, Amph, Aox3, Aplp2, Appbp2, Aqp5, Arf2, Arf4, Arhgap6, Art3, Atf6,
Atm, Atp1b1, Atp6v0b, Atp6v1e1, Atp7a, Atp8a1, Atp8b2, B230118G17Rik, BC016495, Bbs4,
Bcat1, Bcl2l2, Bclaf1, Bid, Bpgm, Bphl, *Braf, Brunol4, Btbd4, Bzw1, C1qtnf3, C730048C13Rik,
Cacna1d, Cadps2, Calm2, Camk2d, Camkk2, Cart1, Casp7, Cav1, Ccnb1, Ccni, Cd36, Cdc26,
Cdca5, Cdkn1b, Cdkn1c, Cdkn3, Cdx2, Cebpg, Ches1, Cited1, Clca1, Clta, Clu, Cmpk, Cnot6,
Cntn4, Col18a1, Col3a1, Col4a1, Col4a6, Col9a1, Cox6b2, Cpm, Cpne5, Crbn, Crls1, Cse1l,
*Ctnnb1, Ctps2, Ctse, Cul3, Cyp11a1, D11Ertd333e, D1Ertd161e, D230025D16Rik,
D830007F02Rik, Daam2, Dab1, Dach1, Dapk2, Dcamkl1, Dhfr, Dhrs8, Dnajc15, Dtymk,
Dusp4, Dyrk1a, E2f7, Eda2r, Ednra, Ell2, Elmo3, Enah, Enpep, Enpp2, Epb4.1, Eps8, Esm1,
Etv5, Eya1, Fabp3, Fabp5, Fank1, Fath, Fblim1, Fbxl20, Fbxl3, Fbxw7, Fem1c, Fgfr2, Fhit,
Fhl2, Fkbp6, Folr1, Foxp1, Frk, Fusip1, Fxyd6, Fzd9, Gas7, Gata2, Gdpd2, Gja1, Gpc3, Gpx3,
Gstk1, Gstp1, H2-Aa, H3f3a, Hdac9, Hel308, Hesx1, Heyl, Hhip, Hif3a, Hipk2, Hist1h2bc,
Hnrpf, Hook1, Hoxd8, Hsd17b12, Hsp90b1, Hspa1b, Htra3, Ifitm3, Ifnar2, Igf1, Igfbp2, Igfbp3,
Igfbp7, Ing3, Ipo7, Itga4, Itgb1, Itpr2, Jarid1d, Kcnab1, Kcnb1, Kcnip1, Kcnip4, Kcnj16, Kcns2,
Kdr, Keap1, Kif2a, Klf6, Klf7, Klk1, Krt2-7, Krt2-8, Lama5, Lass6, Lcn2, Lgals7, Lgtn, Lhx1,
Lhx9, Lmo4, Lrrc16, Lrrk1, Lsp1, Lss, Ltf, Madd, Mafa, Man1a2, Mapk1, Mapre1, Masp1,
Mef2c, Mlph, Mmp19, Mod1, Morf4l1, Morf4l2, Mrpl18, Mrpl44, Mt1, Mt2, Mtdh, Mterf, Mthfd1,
Mtm1, Mtr, Mtx2, Myef2, Myl1, Mylc2b, Mylk, Myo1b, Myo5b, Narg1, Nedd9, Neo1, Nfe2l2,
Npc1, Npepl1, Npr2, Nr2f2, Nrg1, Nusap1, Ogt, Otx2, Pak1, Pak3, Papss2, Pard6b, Parp1,
Nat Genet (2000) 25: 25-29
What is an Ontology?
“Ontologies provide controlled, consistent
vocabularies to describe concepts and
relationships, thereby enabling knowledge-sharing”
(Tom Gruber, Stanford University)
GO was started to facilitate comparing
biological knowledge across model organisms
• Describe molecular and cellular biology of genes &
gene products (not about gene names!)
• Need a practical solution for implementation & use
• Want a unifying, expandable, organism-independent
vocabulary
www.geneontology.org
The GO vocabularies
• Molecular Function:
What a product ‘does’, precise activity
• Biological Process
Biological objective, accomplished via
one or more ordered assemblies of
functions
• Cellular Component
‘is located in’ (‘is a subcomponent of’ )
Definitions are the core of any ontology
• Term <string>
• Synonym(s)
• ID <tied to definition, not term>
• Definition
GO is a Structured Vocabulary
[Figure: expression change plots for normal mouse lung development. 312 genes upregulated over time during development; 139 genes downregulated over time during development.]
Aldh1l1, Aldoc, Alg14, Alg6, Amph, Aox3, Aplp2, Appbp2, Aqp5, Arf2, Arf4, Arhgap6, Art3, Atf6, Atm,
Atp1b1, Atp6v0b, Atp6v1e1, Atp7a, Atp8a1, Atp8b2, B230118G17Rik, BC016495, Bbs4, Bcat1,
Bcl2l2, Bclaf1, Bid, Bpgm, Bphl, *Braf, Brunol4, Btbd4, Bzw1, C1qtnf3, C730048C13Rik, Cacna1d,
Cadps2, Calm2, Camk2d, Camkk2, Cart1, Casp7, Cav1, Ccnb1, Ccni, Cd36, Cdc26, Cdca5,
Cdkn1b, Cdkn1c, Cdkn3, Cdx2, Cebpg, Ches1, Cited1, Clca1, Clta, Clu, Cmpk, Cnot6, Cntn4,
Col18a1, Col3a1, Col4a1, Col4a6, Col9a1, Cox6b2, Cpm, Cpne5, Crbn, Crls1, Cse1l, *Ctnnb1,
Ctps2, Ctse, Cul3, Cyp11a1, D11Ertd333e, D1Ertd161e, D230025D16Rik, D830007F02Rik,
Daam2, Dab1, Dach1, Dapk2, Dcamkl1, Dhfr, Dhrs8, Dnajc15, Dtymk, Dusp4, Dyrk1a, E2f7, Eda2r,
Ednra, Ell2, Elmo3, Enah, Enpep, Enpp2, Epb4.1, Eps8, Esm1, Etv5, Eya1, Fabp3, Fabp5, Fank1,
Fath, Fblim1, Fbxl20, Fbxl3, Fbxw7, Fem1c, Fgfr2, Fhit, Fhl2, Fkbp6, Folr1, Foxp1, Frk, Fusip1,
Fxyd6, Fzd9, Gas7, Gata2, Gdpd2, Gja1, Gpc3, Gpx3, Gstk1, Gstp1, H2-Aa, H3f3a, Hdac9,
Hel308, Hesx1, Heyl, Hhip, Hif3a, Hipk2, Hist1h2bc, Hnrpf, Hook1, Hoxd8, Hsd17b12, Hsp90b1,
Hspa1b, Htra3, Ifitm3, Ifnar2, Igf1, Igfbp2, Igfbp3, Igfbp7, Ing3, Ipo7, Itga4, Itgb1, Itpr2, Jarid1d,
Kcnab1, Kcnb1, Kcnip1, Kcnip4, Kcnj16, Kcns2, Kdr, Keap1, Kif2a, Klf6, Klf7, Klk1, Krt2-7, Krt2-8,
Lama5, Lass6, Lcn2, Lgals7, Lgtn, Lhx1, Lhx9, Lmo4, Lrrc16, Lrrk1, Lsp1, Lss, Ltf, Madd, Mafa,
Man1a2, Mapk1, Mapre1, Masp1, Mef2c, Mlph, Mmp19, Mod1, Morf4l1, Morf4l2, Mrpl18, Mrpl44,
Mt1, Mt2, Mtdh, Mterf, Mthfd1, Mtm1, Mtr, Mtx2, Myef2, Myl1, Mylc2b, Mylk, Myo1b, Myo5b, Narg1,
Nedd9, Neo1, Nfe2l2, Npc1, Npepl1, Npr2, Nr2f2, Nrg1, Nusap1, Ogt, Otx2, Pak1, Pak3, Papss2,
Pard6b, Parp1, Pbx3, Pcbd1, Pcmtd1, Pcsk5, Pctk1, Pctk3, Pdcd6ip, Pdia3, Pfdn4, Pftk1, Phb2,
Phca, Phf8, Phka1, Pitx2, Pja1, Pja2, Pnck, Pomgnt1, Porcn, Ppargc1a, Ppfibp1, Ppih, Ppp1r16b,
Prc1, Prcp, Prkag2, Prkar2b, Prkcd, Psmb3, *Psrc1, Ptch1, Pten, Ptgds, Ptk2b, Ptp4a1, Ptp4a2,
Ptp4a3, Ptpn13, Ptx3, Qscn6, Rab2b, Rab31, Rab3a, Rab3b, Rad51l3, Rec8L1, Ren2, Rims4,
Rkhd3, Rnf11, Rnf20, Robo2, Rpl39, Rps6ka3, Runx1, Runx2, Rxrb, Ryr2, S100a6, S100a9, Sat1,
Scd1, Scmh1, Scn3a, Scn7a, Scn8a, Scrn1, Sdk2, Sec24a, Sec61a2, Sema3a, Sept11, Serpina3g,
Sesn3, Sf4, Sfrs1, Sgk3, Shb, Sin3b, Slc11a2, Slc16a10, Slc16a7, Slc18a2, Slc25a5, Slc26a1,
Slc2a13, Slc38a5, Slc39a10, Slc41a2, Slc6a14, Slc6a15, Slc6a6, Slc7a4, Slc9a2, Smc2l1, Smg5,
Snapap, Sncaip, Snrk, Soat1, Sorl1, Sox10, Sox11, Sox9, Spp1, Srp54, St3gal5, Star, Strbp,
Stxbp1, Sulf1, Suv420h1, Sv2b, Sycp3, Syn2, Sypl, Tacc1, Tcea3, Tcf12, Tdgf1, Tesc, Tfrc, Tgfa,
A Brief Statistical Detour…
Diverse Biological Roles
Consider a population of genes representing a
diverse set of biological roles or themes shown
below as different colors.
Thanks to John Quackenbush
http://compbio.dfci.harvard.edu/colon_cancer.html
Many algorithms can be applied to expression data to
partition genes based on expression profiles over
multiple conditions.
Many of these techniques work solely on expression
data and disregard biological information.
Consider a particular gene set…
- What are some of the predominant
biological themes represented in the gene
set, and how should significance be assigned
to a discovered biological theme?
Example:
Population Size: 40 genes
Gene Set Size: 12 genes
10 genes, shown in green, have a common
biological theme (GO annotation) and 8 occur
within the gene set.
Consider the Outcome
The frequency of the theme in the population is 10/40 = 25%
The frequency of the theme within the cluster is 8/12 = 67%
AND
* 80% of the genes related to the theme in the population
ended up within the relatively small cluster.
[Figure: 10 of the 40 genes in the population carry the theme; 8 of the 12 genes in the gene set carry it.]
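The three percentages above follow directly from the four counts in the example. The short sketch below reproduces them. The variable names are illustrative, and the fold-enrichment ratio is an extra quantity that the slides do not mention.

```python
population_size = 40      # genes in the population
gene_set_size = 12        # genes in the gene set (cluster)
theme_in_population = 10  # genes annotated to the theme
theme_in_gene_set = 8     # annotated genes that fall in the gene set

# Frequency of the theme in the whole population and within the gene set.
freq_population = theme_in_population / population_size
freq_gene_set = theme_in_gene_set / gene_set_size

# Share of all theme genes that ended up in the gene set.
fraction_captured = theme_in_gene_set / theme_in_population

# Not on the slides: ratio of the two frequencies, often called fold enrichment.
fold_enrichment = freq_gene_set / freq_population

print(f"{freq_population:.0%} {freq_gene_set:.0%} {fraction_captured:.0%}")
# prints: 25% 67% 80%
```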
Contingency Matrix
A 2x2 contingency matrix is typically used to
capture the relationship between gene set membership
and membership in a biological theme.
Contingency Matrix

                Gene Set: in    Gene Set: out
Theme: in            8               2
Theme: out           4              26
Assigning Significance to the Findings
Fisher’s Exact Test lets us determine whether there is a
non-random association between the two variables: expression-based
cluster membership and membership in a particular
biological theme.
2x2 contingency matrix:

                Gene Set: in    Gene Set: out
Theme: in            8               2
Theme: out           4              26

p ≈ .0002
Hypergeometric Distribution
Contingency matrix with marginal totals:

      a      b   |  a+b
      c      d   |  c+d
    -------------+------
     a+c    b+d  |   n

The probability of any particular matrix occurring by random
selection, given no association between the two variables, is given
by the hypergeometric rule, where n = a + b + c + d:

$$p = \frac{\dfrac{(a+c)!}{a!\,c!}\cdot\dfrac{(b+d)!}{b!\,d!}}{\dfrac{n!}{(a+b)!\,(c+d)!}} = \frac{(a+b)!\,(c+d)!\,(a+c)!\,(b+d)!}{n!\,a!\,b!\,c!\,d!}$$
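As a check on the hypergeometric rule, the sketch below evaluates it for the example matrix (a = 8, b = 2, c = 4, d = 26) in two equivalent ways: with binomial coefficients and with the expanded factorial form. The function names are illustrative only.

```python
from math import comb, factorial


def matrix_probability(a, b, c, d):
    """Probability of the 2x2 matrix [[a, b], [c, d]] with all margins fixed,
    written with binomial coefficients."""
    n = a + b + c + d
    return comb(a + c, a) * comb(b + d, b) / comb(n, a + b)


def matrix_probability_factorials(a, b, c, d):
    """The same quantity using the expanded factorial form."""
    n = a + b + c + d
    num = factorial(a + b) * factorial(c + d) * factorial(a + c) * factorial(b + d)
    den = factorial(n) * factorial(a) * factorial(b) * factorial(c) * factorial(d)
    return num / den


# Example matrix: 8 theme genes inside the gene set, 2 outside,
# 4 non-theme genes inside the gene set, 26 outside.
p = matrix_probability(8, 2, 4, 26)
p_check = matrix_probability_factorials(8, 2, 4, 26)
```

Both forms give about 0.0002207 for the example matrix.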
Probability Computation
For our matrix,

      8    2
      4   26

we are interested not only in the probability of getting exactly
8 annotation hits in the cluster but in the probability
of having 8 or more hits. In this case the probabilities
of each of the possible matrices are summed.

      8    2         9    1        10    0
      4   26         3   27         2   28

.0002207 + 7.27x10^-6 + 7.79x10^-8 ≈ .000228
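The summation over the three matrices can be reproduced with a few lines of standard-library Python. This is a minimal sketch of the one-sided ("8 or more hits") calculation, and the function name is an assumption made for illustration.

```python
from math import comb


def fisher_one_sided(a, b, c, d):
    """Probability of seeing at least `a` theme genes in the gene set,
    keeping the row and column totals of [[a, b], [c, d]] fixed."""
    n = a + b + c + d
    theme_total = a + b  # genes annotated to the theme
    set_total = a + c    # genes in the gene set
    max_hits = min(theme_total, set_total)
    total = 0.0
    for k in range(a, max_hits + 1):
        # Hypergeometric probability of exactly k theme genes in the gene set.
        total += (comb(set_total, k)
                  * comb(n - set_total, theme_total - k)
                  / comb(n, theme_total))
    return total


p_value = fisher_one_sided(8, 2, 4, 26)
```

For the example matrix the loop visits k = 8, 9, and 10, matching the three matrices shown, and the sum is about 0.000228. If SciPy is available, `scipy.stats.fisher_exact([[8, 2], [4, 26]], alternative="greater")` should report the same one-sided p-value.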
Exclude annotations
made based on
“sketchy” evidence
Gene list goes here
Are there biological
processes that are
enriched in the up and
down regulated gene
sets during lung
development?
http://proto.informatics.jax.org/prototypes/vlad-1.0.3/
This is a graph of GO terms,
NOT genes.
The deeper the color, the more
significant the association of that
GO term with the gene set being
analyzed.
Nodes with no text are terms in
the GO hierarchy that weren’t
statistically significant in the
analysis.
Some aspects of graph display
can be controlled by the user.
VisuaL Annotation Display (VLAD) at MGI
http://proto.informatics.jax.org/prototypes/vlad-1.03
There are MANY tools
designed to help you
with the functional
analysis of gene
lists…..
http://www.geneontology.org/GO.tools.shtml#micro
http://tolfalas.informatics.jax.org/vlad/