Download MBI-Machiraju-lecture6 - Ohio State Computer Science and

Returning Back …

A Big Thanks Again
• Prof. Jason Bohland, Quantitative Neuroscience Laboratory, Boston University
• Prof. Matt Hibbs, Jackson Labs

Large-scale Correlation
• Quality control → set of 3041 genes
• Combine gene volumes into a large matrix
• Decompose the voxel x gene matrix using singular value decomposition (SVD):
  M (voxels x genes) ≈ spatial patterns (voxels x modes) x s.v.'s (the "weights") x gene patterns (modes x genes)

Principal modes (SVD)
[Figure: all LH brain voxels plotted as projections on the first 3 modes. Legend: Cerebral cortex, Olfactory areas, Hippocampus, Retrohippocampal, Striatum, Pallidum, Thalamus, Hypothalamus, Midbrain, Pons, Medulla, Cerebellum]
• N=271 before we get to 90% of the variance
• N=67 before we get to 80% of the variance

Interpreting gene modes
• Spatial modes are easily visualized. Attempt to annotate eigenmodes using Gene Ontology (GO) annotations:
• Each GO term partitions the gene list into two subsets:
    • IN genes: genes annotated by that GO term
    • OUT genes: genes not annotated by that GO term
• Each singular vector associates each subset above with a set of amplitudes
• Compare these amplitudes, asking whether 'IN' genes have larger magnitudes than 'OUT' genes
• Use the K-S test to test whether the amplitude distributions are different

In this low dimensional space
• Cerebellum and striatum separated - GABAergic interneurons and glutamatergic projection neurons in adult mouse forebrain
• Other regions are clustered in greatly reduced space, but with considerable overlap
• Anatomical regions do not in general correspond directly to individual SVD modes
• Clustering of gene expression profiles in a very low dimensional subspace groups voxels drawn from the same brain regions

Component Annotations
• Distinctly high amplitude in the dentate gyrus of the hippocampus.
• Enhanced specificity for the cerebellum.
• Particularly prominent in the cerebellum and the striatum.
Decomposition extracts correlated structure in expression profiles that corresponds to anatomical subdivision

Once again

Gene clustering?
Genes are somewhat less separable, and less categorical.
Build gene-gene similarity graph, partition, color code each point…

K-Means Segmentation
What does gene expression tell us about regional brain organization? Use simple cluster analysis.
K-means clustering:
• Dimensionality reduced (to 271) by truncating the SVD
• Assign one of K labels to each voxel
• Voxels assigned the same label have more similar expression profiles than voxels with different labels
• Similarity defined by Euclidean distance
Data-driven parcellation of mouse brain anatomy (level of granularity determined by K)

K-means clustering results
Spatially Contiguous Clusters
• K = 2: clusters separate cerebral cortex and hippocampus (gray) from other areas (white)
• K = 8: cerebellum/striatum clearly segmented; cortex is subdivided into distinct layers
• K = 16: thalamus has its own cluster; cortical layers further differentiated; midbrain separated from hindbrain
• Large K: more anatomical details observed; separation of caudoputamen from the nucleus accumbens; laminar and areal patterns displayed in cortex

Clustering in Cerebral Cortex
• Laminar clusters are broken into distinct groups along the anterior-posterior direction (bottom) at the border between auditory and somatosensory areas
• Divides aud/vis areas from somatosensory areas
[Figure: K = 40 clusters (masked) alongside ARA area masks]

Validation
Relevant Questions
• Determine, for a given structure, at what value of K it emerges as its own cluster
• Relative prioritization of anatomical areas based on expression pattern similarity
• Dominant clustering of gene expression along cortical layers is consistent with the results of Ng et al.
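
The K-means segmentation described above (truncate the SVD, then cluster voxels by Euclidean distance) can be sketched roughly as follows. This is a minimal illustration, not the lecture's actual pipeline: the function names `truncate_svd` and `kmeans` are hypothetical, the toy matrix is made up, no mean-centering is applied, and the "variance" of a mode is assumed to be its squared singular value.

```python
import numpy as np

def truncate_svd(M, variance_fraction=0.9):
    """Return voxel coordinates on the fewest SVD modes whose squared
    singular values reach the requested fraction of the total."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    explained = np.cumsum(s ** 2) / np.sum(s ** 2)
    n_modes = min(int(np.searchsorted(explained, variance_fraction)) + 1, len(s))
    # Each row is one voxel expressed in the reduced mode space.
    return U[:, :n_modes] * s[:n_modes], n_modes

def kmeans(X, k, n_iter=100, seed=0):
    """Plain Lloyd's algorithm with squared Euclidean distance."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()

    def assign(current):
        d2 = ((X[:, None, :] - current[None, :, :]) ** 2).sum(axis=2)
        return d2.argmin(axis=1)

    for _ in range(n_iter):
        labels = assign(centers)
        new_centers = centers.copy()
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:  # keep the old center if a cluster empties
                new_centers[j] = members.mean(axis=0)
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return assign(centers), centers

# Toy voxel x gene matrix (hypothetical data): 40 voxels, 30 genes,
# two groups of voxels that express disjoint halves of the gene list.
rng = np.random.default_rng(0)
pattern_a = np.r_[np.ones(15), np.zeros(15)]
pattern_b = np.r_[np.zeros(15), np.ones(15)]
M = np.vstack([pattern_a + 0.05 * rng.normal(size=(20, 30)),
               pattern_b + 0.05 * rng.normal(size=(20, 30))])

scores, n_modes = truncate_svd(M, variance_fraction=0.9)
labels, centers = kmeans(scores, k=2)
print(n_modes, labels)
```

On the real data the slides report 271 modes at the 90% level; the toy matrix needs only 2. Raising `k` in this sketch plays the role of the granularity parameter K in the slides.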
Compare with Reference Atlas
Similarity index (S)
Reference atlases here are "flat" parcellations with 12 or 94 regions

$X_{ij} = \max(P_{ij}, P_{ji})$
$U_{ij} = \begin{cases} \min(|r_i|, |r_j|) & \text{if } X_{ij} > 0 \\ 0 & \text{otherwise} \end{cases}$
$W_{ij} = \dfrac{U_{ij}}{\sum_{ij} U_{ij}}$
$S = 1 - 4 \sum_{ij} W_{ij} X_{ij} (1 - X_{ij})$ (ranges from 0 to 1)

• Overlap saturates at K > 30
• Clusters for large K are subdivisions of those for low K

Compare with Reference Atlas (K = 12)
• Clusters 1, 2, 3, and 4 together make up the cerebral cortex
• Cluster 11 largely corresponds to the thalamus
• Cluster 9 is wholly contained in the cerebellum
• Cluster 10 is wholly contained in the striatum

Classification of Region Members
Supervised learning using a linear discriminant (25% test set, 10-fold cross-validation): 94.5% correct overall

What Next ?
• Size of voxels is large relative to individual cell bodies
• Voxels will contain a mixture of several cell types
• Unique expression signature for discrete brain locations with different combinations of cell types
• Spatial co-expression is an indicator of functionally related or interacting genes

Localization of expression
Kullback-Leibler (KL) divergence from (spatial) uniformity:

$KL(p \,\|\, q) = \sum_{x} p(x) \log \dfrac{p(x)}{q(x)}$
$KL(g) = \sum_{x=1}^{N_{vox}} M(x, g) \log \dfrac{M(x, g)}{1/N_{vox}}$

[Figure: normalized expression energy versus voxels, contrasting a non-localized expression pattern (least localised) with a well-localized expression pattern (more localised)]

Gene Localization
• Select the most localized genes (KL > ~1.56) to analyze further
• Threshold voxels based on the intensity histogram of summed expressions
• Remaining LH mask (6102 voxels) essentially excludes cerebral cortex
[Figure: summed and thresholded expression]

Voxel Uniformity in Gene Space
• Measure KL divergence from uniform density across gene space at each voxel
• Brighter color indicates lower KL divergence (more uniform expression across genes)
• Note that cortex is generally more uniform than subcortical areas
• Middle cortical layers are notably more uniform than superficial and deepest layers
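
As a rough illustration of the localization score above, the following sketch computes the KL divergence of one gene's expression pattern from spatial uniformity. The function name `kl_from_uniform` is hypothetical, the natural logarithm is an assumption (the slides do not state the base behind the KL > ~1.56 threshold), and the expression values are normalized inside the function so that they sum to 1 over voxels.

```python
import math

def kl_from_uniform(expression):
    """KL divergence of a normalized expression pattern from the uniform
    distribution over the same number of voxels (natural log assumed)."""
    total = sum(expression)
    n_vox = len(expression)
    kl = 0.0
    for value in expression:
        p = value / total          # normalized expression energy at this voxel
        if p > 0:                  # the term 0 * log(0) is taken as 0
            kl += p * math.log(p * n_vox)   # log(p / (1 / n_vox))
    return kl

# Hypothetical patterns over 100 voxels.
non_localized = [1.0] * 100                 # same energy everywhere
well_localized = [0.0] * 99 + [1.0]         # all energy in one voxel

kl_flat = kl_from_uniform(non_localized)
kl_peak = kl_from_uniform(well_localized)
print(kl_flat, kl_peak)
```

The same function applied to one voxel's values across genes, rather than one gene's values across voxels, gives the voxel uniformity measure from the slides.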
"Expression diversity"
Average KL divergence across all voxels in a particular anatomical region
[Figure: KL-score distributions. Left panel, expression diversity across gross structures: OLF, HY, STR, CB, MB, PAL, TH, P, MY, RHP, CTX, HIP. Right panel, expression diversity across cortical layers and areas: L2-3, L4, L6, L5, VIS, SSp, AUD, RSP, MO, ORB. Vertical axis: KL-score]

Biclustering Genes & Voxels
Can we group genes that are each highly localized to common brain regions (sets of voxels)?
Construct a bipartite graph with N (200) genes in vertex set V1 and M (~6000) mask voxels in V2
• Edge weights are the expression levels of each gene at each voxel
Apply graph partitioning methods to cut the graph into connected components
• Components contain both voxels and genes
• Here we used the isoperimetric algorithm (Grady and Schwartz, 2006).
[Figure: bipartite graph with genes (V1) on one side and voxels (V2) on the other]

What is Biclustering ?
• Finding submatrices in an n x m matrix that follow a desired pattern*
• Row/column order need not be consistent between different biclusters.

Bicluster properties
Biclustering of Expression Data: Cheng and Church, ISMB 2000
For any submatrix C_IJ, where I and J are subsets of genes and conditions, the mean squared residue score is

$H(I, J) = \dfrac{1}{|I||J|} \sum_{i \in I,\, j \in J} (c_{ij} - c_{iJ} - c_{Ij} + c_{IJ})^2$

where c_iJ, c_Ij, and c_IJ are the row mean, column mean, and overall mean of the submatrix.
A bicluster is a submatrix C_IJ that has a low mean squared residue score.
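
To make the bicluster criterion concrete, here is a small sketch of the Cheng and Church mean squared residue for a chosen set of rows and columns. The function name `mean_squared_residue` and the toy matrices are illustrative assumptions; only the score itself is computed, not the greedy search.

```python
import numpy as np

def mean_squared_residue(C, rows, cols):
    """Mean squared residue of the submatrix of C picked out by rows and cols."""
    sub = C[np.ix_(rows, cols)]
    row_means = sub.mean(axis=1, keepdims=True)   # mean of each row over the chosen columns
    col_means = sub.mean(axis=0, keepdims=True)   # mean of each column over the chosen rows
    overall = sub.mean()                          # mean of the whole submatrix
    residue = sub - row_means - col_means + overall
    return float((residue ** 2).mean())

# A purely additive matrix (row effect + column effect) has zero residue.
additive = np.add.outer(np.array([0.0, 1.0, 5.0]), np.array([0.0, 2.0, 3.0, 7.0]))
msr_additive = mean_squared_residue(additive, [0, 1, 2], [0, 1, 2, 3])

# Perturbing a single entry breaks the additive pattern.
perturbed = additive.copy()
perturbed[0, 0] += 4.0
msr_perturbed = mean_squared_residue(perturbed, [0, 1, 2], [0, 1, 2, 3])

# Dropping the perturbed row and column recovers a perfect bicluster.
msr_subset = mean_squared_residue(perturbed, [1, 2], [1, 2, 3])
print(msr_additive, msr_perturbed, msr_subset)
```

The last call shows why removing rows or columns can lower the score: once the offending entry is excluded, the remaining submatrix follows the additive pattern exactly.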
Cheng and Church Greedy Approach
• Finds a submatrix that minimizes MSR
• Biclusters (a) and (b) fit the MSR definition

Biclustering Localized Genes
Resulting voxel clusters correspond well to individual anatomical regions, with functionally relevant gene lists
[Figure: for each cluster, expression energy by region (CTX, OLF, HIP, RHP, STR, PAL, TH, HY, MB, P, MY, CB) and GO P-values for GO IDs with p < 0.05. Panel annotations: "97% of energy in the cerebellum"; "Highly localized to ventricle system"; gene counts of 40 and 29 genes]
GO terms shown include: phosphorus metabolic process, phosphate metabolic process, biopolymer modification, biopolymer metabolic process, hindbrain development, cerebellum development, metencephalon development, TRPT phosphatase signaling pathway, response to extracellular stimulus, response to nutrient levels; di-, tri-valent inorganic cation transport, death, cell death, apoptosis, programmed cell death, protein amino acid phosphorylation, phosphorylation, reg. epithelial cell differentiation, reg. cell differentiation, lactation.

Biclustering Localized Genes
Results shown are for 13 biclusters
[Figure: same layout as the previous slide. Panel annotations: "69% of energy in dentate gyrus, 20% Ammon's horn"; "99% of energy in thalamus"; gene counts of 30 and 11 genes]
GO terms shown include: reg. cell cycle, + reg. cell cycle, cell recognition, death, axon guidance, neuron morphogenesis during differentiation, axonogenesis, neurite morphogenesis, neurite development, neuron recognition; small GTPase mediated signal transduction, cell projection org. and biogenesis, cell projection morphogenesis, cell part morphogenesis, branching morphogenesis of a tube.

Cell-type expression model
Hypothesis: do genes emerging from these biclusters represent preferential "markers" of cell types localized to the corresponding regions?
• Cell-type specific microarray data are available (Okaty et al., 2009; 2011) to help answer this question
• Compare microarray profiles of these cell types with voxel-based transcriptomic data from ABA
• 2131 overlapping genes (with high quality ABA data)

Cell-type based expression
Spatial patterns reflect organization within brain regions
[Figure panels: (A) Granule cells (B) Purkinje cells (C) Stellate cells (D) Mature oligodendrocytes]

Biclusters and Cell Types
Highly localized genes emerging from biclusters (usually) show selective expression in local cell types
[Figure: CP bi-cluster and Cb bi-cluster]

Heritable "Disease Networks"
Online Mendelian Inheritance in Man (OMIM)
    • Contains records of the genetic basis for ~4000 disorders
    • Manually curated 94 unique entities that are of neurological / neuropsychiatric interest and intersect our gene set
1. For each disorder, calculate the mean expression pattern across orthologs of implicated genes (MGI orthology)
2. Calculate a distance matrix between disorders by computing the pairwise cosine distance between expression profiles
3. Cluster disorders using hierarchical cluster analysis

OMIM Disease Clusters
Complete linkage clustering

Autism Candidate
• For a given gene list, embed expression similarity in 2D space
• Ex: ASD candidate genes from Wigler lab (CSHL) (16 genes in high quality coronal data set)
• Calculate the cosine distance matrix, and apply metric MDS
• Provide sub-groupings based on expression locus
[Figure: 2D embedding with Cb and Ctx groupings; labeled genes include Fgd3, Lhx1, MapT, Ptpdc1, Doc2a]

Next ? ...
[Figure: fMRI; spatial components and component time courses against time (TR)]
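
The disease-network steps listed earlier (a mean expression profile per disorder, pairwise cosine distance, complete linkage clustering) can be sketched as below. This is an assumption-laden toy: the function names are hypothetical, the four profiles are invented and stand in for already-averaged expression patterns, and the clustering loop is a naive implementation written for readability rather than speed.

```python
import numpy as np

def cosine_distance_matrix(profiles):
    """Pairwise cosine distance (1 - cosine similarity) between row profiles."""
    X = np.asarray(profiles, dtype=float)
    unit = X / np.linalg.norm(X, axis=1, keepdims=True)
    return 1.0 - unit @ unit.T

def complete_linkage(D, n_clusters):
    """Agglomerative clustering; the distance between two clusters is the
    largest pairwise distance between their members (complete linkage)."""
    clusters = [[i] for i in range(len(D))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = max(D[i][j] for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]   # merge the closest pair
        del clusters[b]
    return clusters

# Hypothetical mean expression profiles for four disorders over four regions.
profiles = [
    [5.0, 4.0, 0.0, 0.0],
    [4.0, 5.0, 0.5, 0.0],
    [0.0, 0.0, 3.0, 4.0],
    [0.0, 0.5, 4.0, 3.0],
]
D = cosine_distance_matrix(profiles)
groups = complete_linkage(D, n_clusters=2)
print(groups)
```

Cosine distance ignores overall expression magnitude and compares only the shape of each profile across regions, which is presumably why it suits comparing disorders with different numbers of implicated genes. The same distance matrix could also be passed to a metric MDS routine for the 2D embedding mentioned on the autism slide.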
