Download MBI-Machiraju-lecture6 - Ohio State Computer Science and

Returning Back …
A Big Thanks Again 
Prof. Jason Bohland
Quantitative Neuroscience Laboratory
Boston University
Prof. Matt Hibbs
Jackson Labs
Large-scale Correlation
• Quality control → set of 3041 genes
• Combine gene volumes into a large matrix
• Decompose the voxel x gene matrix using singular value decomposition (SVD)
M ≈ [voxels x modes: spatial patterns] x [singular values: "weights"] x [modes x genes: gene patterns]
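The decomposition on this slide can be sketched with NumPy. This is only an illustration under assumptions: the matrix `M` below is random stand-in data with toy dimensions, not the ABA expression volumes, and the variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n_voxels, n_genes = 500, 40            # toy sizes (assumption); the slides use 3041 genes
M = rng.random((n_voxels, n_genes))    # stand-in for the voxel x gene matrix

# Thin SVD: M = U @ diag(s) @ Vt
U, s, Vt = np.linalg.svd(M, full_matrices=False)

k = 3                                  # number of modes to keep
spatial_patterns = U[:, :k]            # one column per mode, one value per voxel
weights = s[:k]                        # singular values
gene_patterns = Vt[:k, :]              # one row per mode, one value per gene

# Rank-k approximation of M, and voxel coordinates on the first k modes
M_k = (spatial_patterns * weights) @ gene_patterns
projections = spatial_patterns * weights
```

Plotting the rows of `projections` colored by anatomical label would give the kind of 3-mode scatter described on the following slide.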
Principal modes (SVD)
Cerebral cortex
Olfactory areas
Hippocampus
Retrohippocampal
Striatum
Pallidum
Thalamus
Hypothalamus
Midbrain
Pons
Medulla
Cerebellum
All LH brain voxels plotted as projections on first 3 modes
N=271 before we
get to 90% of the
variance
N=67 before we get to
80% of the variance
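A minimal sketch of how such mode counts could be obtained from the singular values, assuming that "variance" is measured by the squared singular values (the slides do not say whether the matrix was mean-centered first). The function name is hypothetical.

```python
import numpy as np

def modes_needed(singular_values, fraction):
    """Smallest number of leading modes whose squared singular values
    sum to at least `fraction` of the total."""
    energy = np.asarray(singular_values, dtype=float) ** 2
    cumulative = np.cumsum(energy) / energy.sum()
    count = np.searchsorted(cumulative, fraction) + 1
    return int(min(count, energy.size))

# Toy singular values: squared values are 9, 4, 1, 1 (total 15)
s = np.array([3.0, 2.0, 1.0, 1.0])
print(modes_needed(s, 0.80), modes_needed(s, 0.90))
```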
Interpreting gene modes
• Spatial modes are easily visualized. Attempt to annotate eigenmodes using Gene Ontology (GO) annotations:
• Each GO term partitions gene list into two subsets:
IN genes: Genes annotated by that GO term
OUT genes: Genes not annotated by that GO term
• Each singular vector associates each subset above with a set of amplitudes
• Compare these amplitudes, asking whether ‘IN’ genes have larger magnitudes than ‘OUT’ genes
• Use K-S test to test whether the amplitude distributions are different
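The IN/OUT comparison can be sketched as a two-sample Kolmogorov-Smirnov statistic on amplitude magnitudes. The helper names and the toy gene mode below are assumptions for illustration; this computes only the K-S statistic D, not a p-value.

```python
import numpy as np

def ks_statistic(a, b):
    """Two-sample K-S statistic: largest gap between the empirical CDFs."""
    a = np.sort(np.asarray(a, dtype=float))
    b = np.sort(np.asarray(b, dtype=float))
    pooled = np.concatenate([a, b])
    cdf_a = np.searchsorted(a, pooled, side="right") / a.size
    cdf_b = np.searchsorted(b, pooled, side="right") / b.size
    return float(np.max(np.abs(cdf_a - cdf_b)))

def go_term_ks(gene_mode, in_mask):
    """Compare |amplitude| of genes annotated by a GO term (IN) vs. the rest (OUT)."""
    magnitudes = np.abs(np.asarray(gene_mode, dtype=float))
    in_mask = np.asarray(in_mask, dtype=bool)
    return ks_statistic(magnitudes[in_mask], magnitudes[~in_mask])

# Toy gene mode (one amplitude per gene) and a hypothetical GO annotation mask
mode = np.array([0.9, -0.8, 0.1, -0.2, 0.05])
annotated = np.array([True, True, False, False, False])
print(go_term_ks(mode, annotated))
```

If SciPy is available, `scipy.stats.ks_2samp` returns both the statistic and a p-value for the same comparison.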
In this low dimensional
space
• Cerebellum and striatum separated - GABAergic interneurons and glutamatergic projection neurons in adult mouse forebrain
• Other regions are clustered in greatly reduced space, but with considerable overlap
• Anatomical regions do not in general correspond directly to individual SVD modes
• Clustering of gene expression profiles in very low dimensional subspace groups voxels drawn from same brain regions
Component Annotations
Distinctly high amplitude
in the dentate gyrus of the
hippocampus.
Enhanced specificity for
the cerebellum.
Particularly prominent in
the cerebellum and the
striatum.
Decomposition extracts correlated structure in expression profiles that corresponds to
anatomical subdivision
Once again
Gene clustering?
Genes are somewhat less
separable
- and less categorical
Build gene-gene similarity graph
partition, color code each point…
K-Means Segmentation
What does gene expression tell us about regional brain organization?
Use simple cluster analysis.
K-means clustering:
• Dimensionality reduced (to 271) by truncating SVD
• Assign one of K labels to each voxel
• All voxels assigned the same label have more similar expression profiles than voxels with different labels
• Similarity defined by Euclidean distance
Data-driven parcellation of mouse brain anatomy (level of granularity determined by K)
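A minimal sketch of K-means (Lloyd's algorithm) with Euclidean distance, written in plain NumPy. The function name, the random initialization, and the toy data are assumptions; in the setting of the slides, the rows of `X` would be voxel coordinates in the truncated SVD space.

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Assign each row of X to one of k clusters by Euclidean distance."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # Initialize centers at k distinct data points
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    labels = np.zeros(len(X), dtype=int)
    for _ in range(n_iter):
        # Squared Euclidean distance from every point to every center
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        new_centers = centers.copy()
        for j in range(k):
            members = X[labels == j]
            if len(members):               # keep the old center if a cluster is empty
                new_centers[j] = members.mean(axis=0)
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

# Toy data: two well-separated groups of "voxels" in a 2D feature space
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.1, (20, 2)), rng.normal(5.0, 0.1, (20, 2))])
labels, centers = kmeans(X, k=2)
```

Increasing `k` gives a finer parcellation, which is the sense in which K sets the level of granularity. For data at the scale of the slides, a library implementation such as `sklearn.cluster.KMeans` would be the more practical choice.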
K-means clustering results
Spatially Contiguous Clusters
K = 2: clusters separate cerebral cortex and hippocampus (gray) from other areas (white)
K = 8: cerebellum/striatum clearly segmented, cortex is subdivided into distinct layers
K = 16: thalamus has its own cluster; cortical layers further differentiated, midbrain separated from hindbrain
Large K: more anatomical details observed; separation of caudoputamen from the nucleus accumbens; laminar and areal patterns displayed in cortex
Clustering in Cerebral Cortex
Laminar clusters broken into distinct groups along anterior–
posterior direction (bottom) at border between auditory &
somatosensory areas
[Figure: K = 40 clustering (masked), shown with ARA area masks]
Divides aud/vis areas from somatosensory areas
Validation
Relevant Questions
• Determine, for a given structure, at what value of K it emerges as its own cluster?
• Relative prioritization of anatomical areas based on expression pattern similarity
• Dominant clustering of gene expression along cortical layers consistent with those of Ng et al.
Compare with Reference
Atlas
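Similarity index (S)
Reference atlases here are “flat” parcellations with 12 or 94 regions
$$X_{ij} = \max(P_{ij}, P_{ji})$$
$$W_{ij} = \begin{cases} \min(r_i, r_j) & \text{if } X_{ij} > 0 \\ 0 & \text{otherwise} \end{cases}$$
$$S = 1 - 4 \sum_{ij} W_{ij} X_{ij} (1 - X_{ij})$$
S ranges from 0-1
[The definition of P_ij in terms of the overlaps U_ij, and any normalization of W_ij, are not recoverable from the extracted slide text.]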
Overlap saturating at K > 30
Clusters for large K are subdivisions of those for low K
Compare with Reference
Atlas
Clusters 1, 2, 3, and 4 together correspond to the cerebral cortex
Cluster 11 largely corresponds to the thalamus
Cluster 9 is wholly contained in the cerebellum
Cluster 10 is wholly contained in the striatum
K = 12
Classification of Region Members
Supervised learning using linear discriminant (25% test set,
10-fold cross-validation)
94.5% correct overall
What Next ?
• Size of voxels large relative to individual cell bodies
• Voxels will contain a mixture of several cell types.
• Unique expression signature for discrete brain locations with different combinations of cell types.
• Spatial co-expression indicator of functionally-related or interacting genes
Localization of expression
Kullback-Leibler (KL) divergence from (spatial) uniformity:
$$KL(p \,\|\, q) = \sum_{x} p(x) \log \frac{p(x)}{q(x)}$$
$$KL(g) = \sum_{x=1}^{N_{vox}} M(x,g) \log \frac{M(x,g)}{1/N_{vox}}$$
[Figure: normalized expression energy across voxels for a non-localized (least localized) and a well-localized (more localized) expression pattern]
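The KL divergence of a gene's spatial pattern from the uniform distribution can be sketched as follows. Assumptions: `M` is a nonnegative voxels x genes array, each gene is normalized to sum to 1 over voxels, and the natural logarithm is used (the slides do not state the log base). The function name is hypothetical.

```python
import numpy as np

def kl_from_uniform(M):
    """KL(g) for every gene (column) of a nonnegative voxels x genes matrix."""
    M = np.asarray(M, dtype=float)
    n_vox = M.shape[0]
    P = M / M.sum(axis=0, keepdims=True)    # normalized expression energy per gene
    # P * log(P / (1 / n_vox)), with 0 * log(0) treated as 0
    with np.errstate(divide="ignore", invalid="ignore"):
        terms = np.where(P > 0, P * np.log(P * n_vox), 0.0)
    return terms.sum(axis=0)

# Toy example: gene 0 is uniform over 4 voxels, gene 1 sits in a single voxel
M = np.array([[1.0, 1.0],
              [1.0, 0.0],
              [1.0, 0.0],
              [1.0, 0.0]])
kl = kl_from_uniform(M)
```

A uniform pattern scores 0, and a pattern confined to one voxel reaches the maximum, log(N_vox), so larger values mean more localized expression.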
Gene Localization
Select most localized
genes (KL > ~1.56) to
further analyze
Threshold voxels based
on intensity histogram of
summed expressions
Remaining LH mask
(6102 voxels) essentially
excludes cerebral cortex
[Figure panels: summed expression; thresholded mask]
Voxel Uniformity in Gene
Space
Measure KL divergence from uniform
density across gene space at each
voxel
Brighter color indicates lower KL
divergence (more uniform expression
across genes)
Note cortex is generally more uniform
than subcortical areas
And middle cortical layers are notably
more uniform than superficial and
deepest layers
“Expression diversity”
Average KL divergence across all voxels in a particular
anatomical region
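A sketch of the regional average described above: compute, at each voxel, the KL divergence from uniformity across genes, then average within each anatomical label. The normalization of each voxel's profile to sum to 1, the natural log, and the function names are assumptions; the region labels are toy values.

```python
import numpy as np

def voxel_kl_from_uniform(M):
    """KL divergence from a uniform density across genes, one value per voxel (row)."""
    M = np.asarray(M, dtype=float)
    n_genes = M.shape[1]
    P = M / M.sum(axis=1, keepdims=True)
    with np.errstate(divide="ignore", invalid="ignore"):
        terms = np.where(P > 0, P * np.log(P * n_genes), 0.0)
    return terms.sum(axis=1)

def regional_mean_kl(M, region_labels):
    """Average the per-voxel KL divergence over the voxels of each region."""
    voxel_kl = voxel_kl_from_uniform(M)
    labels = np.asarray(region_labels)
    return {str(r): float(voxel_kl[labels == r].mean()) for r in np.unique(labels)}

# Toy data: 4 voxels x 2 genes, two regions
M = np.array([[1.0, 1.0], [1.0, 1.0], [1.0, 0.0], [2.0, 0.0]])
print(regional_mean_kl(M, ["CTX", "CTX", "CB", "CB"]))
```

In this toy case the "CTX" voxels express both genes equally (lower KL, more uniform), while the "CB" voxels express only one gene (higher KL).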
Expression diversity across gross structures
Expression diversity across cortical layers and areas
[Figures: KL-score distributions. Gross structures: OLF, HY, STR, CB, MB, PAL, TH, P, MY, RHP, CTX, HIP. Cortical layers: L2-3, L4, L6, L5. Cortical areas: VIS, SSp, AUD, RSP, MO, ORB. Vertical axis: KL-score.]
Biclustering Genes &
Voxels
Can we group genes that are each highly localized to common
brain regions (sets of voxels)?
Construct a bipartite graph with N (200) genes in vertex set V1 and M (~6000) mask voxels in V2
• Edges are expression levels of each gene at each voxel
Apply graph partitioning methods to cut graph into connected components
• Components contain both voxels and genes
• Here we used the isoperimetric algorithm (Grady and Schwartz, 2006).
[Diagram: bipartite graph linking GENES (V1) to VOXELS (V2)]
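The graph construction can be sketched as below. This is not the isoperimetric algorithm: it only builds the weighted bipartite adjacency matrix and, as a stand-in for a real partitioning step, reads off connected components from a toy matrix that is already block structured. All names are hypothetical.

```python
import numpy as np

def bipartite_adjacency(E):
    """E: genes x voxels nonnegative expression levels (edge weights).
    Returns a symmetric adjacency matrix over the nodes [genes, then voxels]."""
    E = np.asarray(E, dtype=float)
    n_genes, n_vox = E.shape
    A = np.zeros((n_genes + n_vox, n_genes + n_vox))
    A[:n_genes, n_genes:] = E       # gene -> voxel edges
    A[n_genes:, :n_genes] = E.T     # voxel -> gene edges
    return A

def connected_components(A):
    """Label each node by connected component, using edges with weight > 0."""
    n = A.shape[0]
    labels = -np.ones(n, dtype=int)
    current = 0
    for start in range(n):
        if labels[start] >= 0:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            node = stack.pop()
            for nb in np.flatnonzero(A[node] > 0):
                if labels[nb] < 0:
                    labels[nb] = current
                    stack.append(int(nb))
        current += 1
    return labels

# Toy data: gene 0 is expressed only in voxels 0-1, gene 1 only in voxels 2-3
E = np.array([[1.0, 1.0, 0.0, 0.0],
              [0.0, 0.0, 2.0, 3.0]])
labels = connected_components(bipartite_adjacency(E))
```

Each component contains both gene nodes and voxel nodes, which is what makes the result a bicluster. Real expression data is rarely exactly zero, so a partitioning method that cuts weak edges is needed in practice.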
What is Biclustering ?
Finding submatrices in an n x m matrix that
follow a desired pattern*
Row/column order need not be consistent between different biclusters.
Bicluster properties
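Biclustering of Expression Data: Cheng and Church, ISMB 2000
For any submatrix C_IJ, where I and J are subsets of genes and conditions, the mean squared residue score is
$$H(I,J) = \frac{1}{|I|\,|J|} \sum_{i \in I,\, j \in J} \left(a_{ij} - a_{iJ} - a_{Ij} + a_{IJ}\right)^2$$
where a_iJ, a_Ij, and a_IJ are the row mean, column mean, and overall mean of the submatrix.
A bicluster is a submatrix C_IJ that has a low mean squared residue score.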
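As an illustration, the Cheng and Church mean squared residue of a submatrix is the mean of the squared residues a_ij - (row mean) - (column mean) + (overall mean). A short NumPy sketch, with a hypothetical function name and toy data:

```python
import numpy as np

def mean_squared_residue(A, rows, cols):
    """Mean squared residue H(I, J) of the submatrix A[rows, cols]."""
    sub = np.asarray(A, dtype=float)[np.ix_(rows, cols)]
    row_means = sub.mean(axis=1, keepdims=True)
    col_means = sub.mean(axis=0, keepdims=True)
    residue = sub - row_means - col_means + sub.mean()
    return float((residue ** 2).mean())

# An additive pattern (row effect + column effect) has zero residue
additive = np.array([[1.0, 2.0, 3.0],
                     [2.0, 3.0, 4.0],
                     [5.0, 6.0, 7.0]])
print(mean_squared_residue(additive, [0, 1, 2], [0, 1, 2]))

# A checkerboard pattern does not fit the additive model
checker = np.array([[1.0, 0.0],
                    [0.0, 1.0]])
print(mean_squared_residue(checker, [0, 1], [0, 1]))
```

A score near zero means the rows of the submatrix rise and fall together across its columns, up to a constant offset per row.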
Cheng and Church
Greedy Approach
Finds a submatrix that minimizes MSR
Biclusters (a) and (b) fit the definition of MSR
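One greedy step of this kind is single node deletion: repeatedly drop the row or column that contributes most to the mean squared residue until the score falls below a threshold. The sketch below covers only that deletion step, under assumptions (the threshold `delta`, the function names, and the toy matrix are illustrative); the full Cheng and Church procedure also includes multiple node deletion and node addition.

```python
import numpy as np

def residues(sub):
    """Residue of each entry: value - row mean - column mean + overall mean."""
    return (sub - sub.mean(axis=1, keepdims=True)
                - sub.mean(axis=0, keepdims=True) + sub.mean())

def single_node_deletion(A, delta):
    """Greedily remove rows/columns until the mean squared residue is <= delta."""
    A = np.asarray(A, dtype=float)
    rows = list(range(A.shape[0]))
    cols = list(range(A.shape[1]))
    while len(rows) > 1 and len(cols) > 1:
        r2 = residues(A[np.ix_(rows, cols)]) ** 2
        if r2.mean() <= delta:
            break
        row_scores = r2.mean(axis=1)   # contribution of each remaining row
        col_scores = r2.mean(axis=0)   # contribution of each remaining column
        if row_scores.max() >= col_scores.max():
            rows.pop(int(row_scores.argmax()))
        else:
            cols.pop(int(col_scores.argmax()))
    return rows, cols

# Toy matrix: three additive rows plus one row that breaks the pattern
A = np.array([[1.0, 2.0, 3.0],
              [2.0, 3.0, 4.0],
              [3.0, 4.0, 5.0],
              [9.0, 0.0, 7.0]])
rows, cols = single_node_deletion(A, delta=0.01)
```

Here the last row has the largest mean residue, so it is removed first and the remaining 3 x 3 block is returned.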
Biclustering Localized
Genes
Resulting voxel clusters correspond well to individual anatomical
regions, w/ functionally relevant gene lists
[Figure: bicluster panels showing each cluster's expression energy by region (CTX, OLF, HIP, RHP, STR, PAL, TH, HY, MB, P, MY, CB) and GO P-values for GO IDs with p < 0.05 (Clusters 1-3). Panel annotations: "97% of energy in the cerebellum"; "Highly localized to ventricle system". Gene counts shown: 40 genes; 29 genes. Legible GO terms include: phosphorus metabolic process, phosphate metabolic process, biopolymer modification, biopolymer metabolic process, hindbrain development, cerebellum development, metencephalon development, TRPT phosphatase signaling pathway, response to extracellular stimulus, response to nutrient levels, di-, tri-valent inorganic cation transport, cell death, apoptosis, programmed cell death, protein amino acid phosphorylation, phosphorylation, reg. epithelial cell differentiation, reg. cell differentiation, lactation.]
Biclustering Localized Genes
Results shown are for 13 biclusters
[Figure: bicluster panels showing each cluster's expression energy by region (CTX, OLF, HIP, RHP, STR, PAL, TH, HY, MB, P, MY, CB) and GO P-values for GO IDs with p < 0.05 (Clusters 1-4). Panel annotations: "69% of energy in dentate gyrus, 20% Ammon’s horn"; "99% of energy in thalamus". Gene counts shown: 30 genes; 11 genes. Legible GO terms include: reg. cell cycle, + reg. cell cycle, cell recognition, neuron recognition, axon guidance, axonogenesis, neuron morphogenesis during differentiation, neurite morphogenesis, neurite development, small GTPase mediated signal transduction, cell projection org. and biogenesis, cell projection morphogenesis, cell part morphogenesis, branching morphogenesis of a tube.]
Cell-type expression model
Hypothesis: do genes emerging from these biclusters
represent preferential “markers” of cell types localized to
the corresponding regions?
Cell-type specific microarray data are available (Okaty et al.,
2009; 2011) to help answer this question
Compare microarray profiles of these cell types with voxel-based transcriptomic data from ABA
• 2131 overlapping genes (with high quality ABA data)
Cell-type based expression
Spatial patterns reflect organization within brain regions
[Figure panels: (A) Granule cells, (B) Purkinje cells, (C) Stellate cells, (D) Mature oligodendrocytes]
Biclusters Cell Types
Highly localized genes
emerging from biclusters (usually) show
selective expression in
local cell types
CP bi-cluster
Cb bi-cluster
Heritable “Disease
Networks”
Online Mendelian Inheritance in Man (OMIM)
– Contains records of genetic basis for ~4000 disorders
– Manually curated 94 unique entities that are of neurological / neuropsychiatric interest and intersect our gene set
1. For each disorder, calculate the mean expression pattern
across orthologs of implicated genes (MGI orthology)
2. Calculate a distance matrix between disorders by
computing the pairwise cosine distance between
expression profiles
3. Cluster disorders using hierarchical cluster analysis
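Steps 1 and 2 above can be sketched as follows. Everything in the example is hypothetical: the gene names, the disorder-to-gene mapping, and the tiny expression matrix stand in for the OMIM and ABA data, and the ortholog mapping is assumed to have been done already.

```python
import numpy as np

def disorder_profiles(expr, gene_index, disorder_genes):
    """Mean expression pattern (over voxels) of the genes implicated in each disorder.
    expr: genes x voxels array; gene_index: gene name -> row; disorder_genes: name -> gene list."""
    names = sorted(disorder_genes)
    profiles = np.vstack([
        expr[[gene_index[g] for g in disorder_genes[name]]].mean(axis=0)
        for name in names
    ])
    return names, profiles

def cosine_distance_matrix(profiles):
    """Pairwise cosine distance (1 - cosine similarity) between rows.
    Assumes no row is all zeros."""
    unit = profiles / np.linalg.norm(profiles, axis=1, keepdims=True)
    return 1.0 - unit @ unit.T

# Toy data: 3 genes x 2 voxels, 2 hypothetical disorders
expr = np.array([[1.0, 0.0],
                 [0.0, 1.0],
                 [2.0, 0.0]])
gene_index = {"g1": 0, "g2": 1, "g3": 2}
disorder_genes = {"disorder_A": ["g1", "g3"], "disorder_B": ["g2"]}
names, profiles = disorder_profiles(expr, gene_index, disorder_genes)
D = cosine_distance_matrix(profiles)
```

For step 3, one option (an assumption, not necessarily what was used here) is to pass the condensed form of `D` to `scipy.cluster.hierarchy.linkage` with `method="complete"`.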
OMIM Disease Clusters
Complete linkage clustering
Autism Candidate
For a given gene list, embed expression similarity in 2D space
Ex: ASD candidate genes from Wigler lab (CSHL) (16 genes in high quality coronal data set)
Calculate cosine distance matrix, and apply metric MDS
Provide sub-groupings based on expression locus
[Figure: 2D MDS embedding with Cb and Ctx groupings; labeled genes include Fgd3, Lhx1, MapT, Ptpdc1, Doc2a]
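The embedding step can be sketched with classical (Torgerson) MDS, which is one metric MDS variant; the slides do not say which variant was used, so this choice is an assumption. The function name and the toy points are illustrative.

```python
import numpy as np

def classical_mds(D, n_components=2):
    """Embed points in n_components dimensions from a pairwise distance matrix D."""
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n          # centering matrix
    B = -0.5 * J @ (D ** 2) @ J                  # double-centered squared distances
    eigvals, eigvecs = np.linalg.eigh(B)
    order = np.argsort(eigvals)[::-1][:n_components]
    # Negative eigenvalues (non-Euclidean distances) are clipped to zero
    scale = np.sqrt(np.clip(eigvals[order], 0.0, None))
    return eigvecs[:, order] * scale

# Toy check: distances between three points in the plane are reproduced
pts = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 4.0]])
D = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=2)
Y = classical_mds(D, n_components=2)
D_embedded = np.linalg.norm(Y[:, None, :] - Y[None, :, :], axis=2)
```

With a cosine distance matrix between gene expression patterns as input, the rows of `Y` give 2D coordinates in which genes with similar spatial expression sit close together.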
Next ?
[Figure: fMRI time series (volumes sampled every TR over time) and spatial components 1-4]