Brain architecture and neuroinformatics: applications for speech and language systems
Jason Bohland
Assistant Professor, Health Sciences & Neuroscience, Boston University

Brain Architecture
How is the brain constructed? What are the parts? What are the pathways? How does it develop? How do different functions engage the system? How do different levels of organization come together?

Neuroinformatics
[Figure: citation counts from Google Scholar and references in books from the Google n-gram viewer (http://books.google.com/ngrams)]
Neuroscience + information science: computer-assisted analysis and management of collections of neurobiological data, using databases, tools, and models to integrate and analyze those data. Neuroinformatics follows the path of molecular biology.

Neuroinformatics
• Building tools and resources
• Data analysis and integration
• Developing explanatory models
• Generating new hypotheses

Talk Outline
• GODIVA: A neural model of speech sequencing
• Large-scale analysis of gene expression, and other, data
• Comparing brain atlases and labels
• Wrapup
[Outline graphic also lists the neuroinformatics activities: building tools and resources; data analysis and integration; developing explanatory models; generating new hypotheses]

Modeling speech production
DIVA models speech sensorimotor control mechanisms:
• Learning articulatory-acoustic-somatosensory associations
• Learning sensorimotor chunks
[Figure: the DIVA model; Guenther, Ghosh, and Tourville (2006)]

Speech planning
Psycholinguistic theories based on speech error data and chronometric data, together with neuroscientific data from clinical cases and functional brain imaging, suggest a complex circuit for speech planning that interfaces phonological representations with phonetic/sensorimotor representations.
Speech planning is a parallel process involving short-term working memory.

Competitive queuing architecture
• Parallel "item and order" short-term memory (STM)
• Selection of the next item for performance
[Figure: recordings from macaque area F5 during a serial shape-drawing task, from Averbeck, Chafee, Crowe, and Georgopoulos (2002)]

The GODIVA model
GODIVA extends DIVA to include explicit parallel representations of forthcoming utterances that interface with learned speech sensorimotor programs.
[Figure: GODIVA architecture with "fillers" and "slots"; Bohland JW, Bullock D, and Guenther FH (2009), J Cog Neurosci]

Parallel planning representations
• Left inferior frontal sulcus (IFS): columns categorically encode phonemes (or phoneme-like items) at a given position in the syllable; order is carried by a primacy gradient of activity.
• Pre-SMA: columns encode abstract syllable frames; readout of position-specific cells starts word production.

GODIVA modules
A planning loop through the basal ganglia is used to enable competition within IFS zones, similar in spirit to action selection models (e.g., Mink and Thach, 1993). The loop sequences through syllable slots AND enables competition in the Speech Sound Map (SSM).

GODIVA modules: Speech Sound Map
GODIVA posits new structure in DIVA's Speech Sound Map (i.e., the mental syllabary):
• A set of phonemes chosen in IFS activates phonologically matching SSM representations; the gradient of activity represents the degree of match.
• A GO signal from SMA through the basal ganglia/thalamus, together with an (anticipatory) completion signal, enables selection of the next winning program.
[Figure: "go di va" simulations 1 and 2]

Constraining the model
Neural models solve inverse problems: many models can be built to yield the same results and match the same data, but are they plausible? The approach must be to add constraints:
1. Better-informed representations (multi-voxel pattern analysis?)
2. Anatomical and functional connections (analysis of resting-state and task-driven fMRI datasets)
3. Differences in the brains of healthy speakers and speakers with communication disorders (better databasing efforts from the clinical world)

Relevance for communication disorders
On DIVA: "while this model addresses phenomena that may be relevant to differential diagnosis of motor speech disorders in its current stage of development it has not been extended to make claims about the relationship between disrupted processing and speech errors in motor speech disorders." (McNeil, 2004)
GODIVA makes predictions:
• Apraxia of speech due to destruction or inefficiency of the projections from the IFS choice buffer to SSM plan cells
• Phonological paraphasia due to damage within the IFS plan field or the preSMA / IFS / BG loop
Oren Civier has augmented the model to account for observations in individuals who stutter:
• Dopaminergic hypothesis of stuttering: elevated dopamine levels (e.g., Wu et al., 1997) prevent the basal ganglia from playing their normal role in enabling cortical competition.

Heritability of speech / language disorders
• 6-8 million individuals have a form of language impairment.
• 40% of all pediatric referrals relate to developmental disorders of speech, language, or communication.
• The heritability index for stuttering is estimated above 80% (Fagnani et al., 2011), and at ~50% for Specific Language Impairment and speech-sound-related disorders.
• A FOXP2 single point mutation was identified as the cause of severe developmental dyspraxia in the KE family (Lai et al., 2001).

Brain abnormalities associated with FOXP2
Compare the functional activations in a speech non-word sequencing task with areas of abnormal gray-matter (GM) density in affected KE family members.
[Figure: Bohland (2007), based on Watkins et al. (2002) Brain; Bohland JW and Guenther FH (2006) NeuroImage]

The genotype / phenotype gap
Ultimately we'd like to learn how genetic variability gives rise to behavioral variability.
• Genetic linkage and association studies associate genes or loci with phenotypes.
• Imaging genetics tries to provide intermediate brain-based biomarkers (endophenotypes from neuroimaging) to associate with genotypes.
• Statistical power is severely lacking because the problem is underconstrained.
• Knowing where genes are normally expressed might help to constrain the problem.

Molecular Architecture

Allen Mouse Brain Atlas (http://www.brain-map.org)
• Genome-wide atlas of gene expression throughout the mouse brain (N = 1, 2, or a few mice per gene)
• 56-day-old (young adult) C57BL/6J mice
• High-throughput experiments using non-isotopic in situ hybridization (ISH)
• Pipeline for sectioning, ISH, digital microscopy, image analysis, and atlas registration
• Automatic quantification of gene expression

Lower dimensional data volumes
Large-scale exploratory analysis of the raw image data (1.07 µm resolution) is not feasible, and the cell-by-cell correspondence problem is intractable. Instead we analyze binned expression volumes at 200 µm resolution (200 µm cubic voxels):
• 4,104 unique genes are available from coronally sectioned brains.
• Each volume is 67 x 41 x 58 voxels (about 50k brain voxels), comparable to fMRI resolution.
• Smoothed expression energy = (sum of intensities of expressing cells) / (number of cells in the voxel): an average over many cells of diverse types.
[Figure: raw ISH for Prox1 (sagittal and coronal sections) and maximum-intensity projections of the Prox1 expression-energy volume]

Large-scale data analysis
• How much structure is present across space and across genes?
• How would the brain segment on the basis of gene expression patterns (as opposed to Nissl staining, etc.)?
• What can we learn from the expression patterns of genes implicated in disorders?
See also Bohland et al. (2009) Methods; Ng et al. (2009) Nature Neuroscience.

Large-scale correlation structure
• Quality control yields a set of 3041 genes.
• Combine the gene volumes into one large voxel x gene matrix $M$.
• Decompose $M$ using the singular value decomposition (SVD): $M \approx U \Sigma V^{\top}$, where the columns of $U$ (voxels x modes) are spatial patterns, the diagonal of $\Sigma$ holds the singular values (the "weight" of each mode), and the rows of $V^{\top}$ (modes x genes) are gene patterns.

Principal modes (SVD)
[Figure: all left-hemisphere (LH) brain voxels plotted as projections on the first 3 modes, colored by region: cerebral cortex, olfactory areas, hippocampus, retrohippocampal region, striatum, pallidum, thalamus, hypothalamus, midbrain, pons, medulla, cerebellum]
271 modes are needed to reach 90% of the variance; 67 modes reach 80%.

K-means segmentation of anatomy
• Dimensionality is reduced (to 271) by truncating the SVD.
• K-means assigns one of K labels to each voxel so that voxels sharing a label tend to have more similar expression profiles than voxels with different labels (similarity defined by Euclidean distance).
• The result is a data-driven parcellation of mouse brain anatomy, with the level of granularity determined by K.

K-means clustering results
[Figure: k-means parcellations of the mouse brain]
Clustering in cerebral cortex (K = 40, masked to cortex), compared with Allen Reference Atlas (ARA) area masks, divides auditory/visual areas from somatosensory areas.

Classification of region membership
Supervised learning using a linear discriminant (25% test set, 10-fold cross-validation): 94.5% correct overall.

Heritable "disease networks"
Online Mendelian Inheritance in Man (OMIM) contains records of the genetic basis for ~4000 disorders. We manually curated 94 unique entities that are of neurological / neuropsychiatric interest and intersect our gene set.
1. For each disorder, calculate the mean expression pattern across orthologs of the implicated genes (MGI orthology).
2. Calculate a distance matrix between disorders by computing the pairwise cosine distance between expression profiles.
3. Cluster disorders using hierarchical cluster analysis.

OMIM disease clusters
[Figure: complete-linkage clustering of OMIM disorders]
Example: cerebellum disease cluster
• Dandy-Walker malformation (ZIC1): enlarged 4th ventricle, partial or absent cerebellar vermis
• Lissencephaly syndrome, Norman-Roberts type (RELN) and cerebellar hypoplasia, VLDLR-associated (VLDLR): both forms of cerebellar hypoplasia are caused by mutations in Reelin signaling (RELN and its receptor VLDLR) and lead to coordination disorders
• Spinocerebellar ataxia, autosomal recessive 8 (SYNE1): involves atrophy of the cerebellum
• GABA-transaminase deficiency (ABAT): clinical outcome includes cerebellar hypoplasia
• Cerebral palsy, spastic, symmetric, autosomal recessive (GAD1): cerebellar stimulation is used as treatment for spastic cerebral palsy
• Mental retardation, autosomal recessive, 6 (GRIK2): also associated with autism
There is good agreement between areas expressing disorder genes and the areas implicated in the pathology. Can we help to identify candidate genes?

Human brain gene expression atlas (human.brain-map.org)
• High-density microarrays conducted post-mortem
• Data matrix: ~62k probes x ~1000 brain samples
• Data from 3 adult brains (ages 24, 39, and 57)
• We are also beginning to look at the Human Developing Transcriptome Project (http://brainspan.org)

Genetics of speech / language
We are currently building tools to help integrate expression data with imaging and other results.
Basic underlying hypothesis: a gene's expression pattern in the healthy (developing) brain should be predictive of the phenotype associated with its abnormality.

Literature curation
• 33 genes implicated in language-related phenotypes
• 40+ articles describing structural or functional differences observed in relevant patient groups vs. controls
• ~600 individual annotations, each identified by label and/or MNI coordinate
• Phenotypes: Specific Language Impairment, Persistent Developmental Stuttering, Developmental Verbal Dyspraxia
Curating genes of interest: http://qnl.bu.edu/SLDB

Data processing
• Microarray data are normalized across samples and renormalized across brains.
• Probes are collapsed to a single vector per gene by choosing the probe with the highest correlation to all others.
• Samples are indexed to MNI-space coordinates and also assigned an anatomical label from the Allen Human Brain Atlas and Ontology.
• Comparisons across genes are always differential / relative, because the magnitude of expression is not very meaningful: calculate z-scores within each gene (across samples) and then compare, or look at measures of correlation across genes.

Speech language gene space
Multidimensional scaling (MDS) provides a view of the "landscape":
• Genes with more similar expression patterns appear closer together.
• Speech/language (SL) genes "pile up" at low distances relative to random gene pairs.
[Figure: gene landscape via MDS]

Genetics of stuttering
• GNPTAB: N-acetylglucosamine-1-phosphate transferase, alpha and beta subunits
• GNPTG: N-acetylglucosamine-1-phosphate transferase, gamma subunit
• NAGPA: N-acetylglucosamine-1-phosphodiester alpha-N-acetylglucosaminidase
(Drayna et al., 2011)

Genes linked to stuttering
• It is difficult to ascribe a role in stuttering to this ubiquitous biological pathway.
• Mutations of GNPTAB and GNPTG cause variants of mucolipidosis.
• Stuttering mutations appear to be missense mutations rather than deletions or stop insertions.
• NAGPA was not previously linked to human disease.
Can examining brainwide expression patterns help bring these genes into our contemporary theories of stuttering?
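A minimal sketch of the SVD decomposition and k-means parcellation from the large-scale correlation structure and k-means slides, assuming NumPy and a small synthetic voxel x gene matrix in place of the 3041-gene Allen data. Mean-centering each gene column before the SVD, the farthest-first initialization, and all function names are choices made for this illustration, not details taken from the talk.

```python
import numpy as np


def modes_for_variance(s, fraction):
    """Smallest number of SVD modes whose squared singular values reach `fraction` of the total."""
    energy = np.cumsum(s ** 2) / np.sum(s ** 2)
    return min(int(np.searchsorted(energy, fraction)) + 1, len(s))


def svd_voxel_scores(M, n_modes):
    """Decompose the voxel x gene matrix and return voxel coordinates on the first n_modes modes.

    Each gene column is mean-centered first (an assumption; the slides do not say).
    """
    Mc = M - M.mean(axis=0)
    U, s, Vt = np.linalg.svd(Mc, full_matrices=False)   # Mc = U @ diag(s) @ Vt
    return U[:, :n_modes] * s[:n_modes], s


def kmeans(X, k, n_iter=100):
    """Lloyd's k-means (Euclidean distance) with deterministic farthest-first initialization."""
    centers = [X[0]]
    for _ in range(1, k):
        # next initial center: the point farthest from every center chosen so far
        d = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(n_iter):
        # squared distances ||x||^2 - 2 x.c + ||c||^2, avoiding an N x K x D temporary
        d2 = (X ** 2).sum(1)[:, None] - 2 * X @ centers.T + (centers ** 2).sum(1)[None, :]
        labels = d2.argmin(axis=1)
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                        for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels


# Synthetic example (not Allen data): 40 "voxels" x 10 "genes" with two expression domains.
rng = np.random.default_rng(0)
M = rng.normal(0.0, 0.1, size=(40, 10))
M[:20, :5] += 1.0   # domain A expresses genes 0-4
M[20:, 5:] += 1.0   # domain B expresses genes 5-9

scores, s = svd_voxel_scores(M, n_modes=3)
labels = kmeans(scores, k=2)
print(modes_for_variance(s, 0.9), labels)
```

On the real matrix, the same two steps would truncate to the 271 modes that reach 90% of the variance and then run k-means with the chosen K.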
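The three-step OMIM disease clustering described above (mean profile per disorder, pairwise cosine distance, complete-linkage hierarchical clustering) can be sketched with SciPy's `pdist` and `linkage`. The input dictionaries, toy genes, and disorder names are hypothetical, and the MGI ortholog mapping is assumed to have already been applied.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist


def disorder_profiles(expr, disorder_genes):
    """Step 1: mean spatial expression profile across each disorder's implicated genes.

    expr: dict mapping gene symbol -> 1-D expression vector over voxels (hypothetical format).
    disorder_genes: dict mapping disorder -> gene symbols (ortholog mapping assumed done).
    Genes missing from `expr` are ignored; disorders left with no genes are skipped.
    """
    names, rows = [], []
    for d in sorted(disorder_genes):
        vecs = [expr[g] for g in disorder_genes[d] if g in expr]
        if vecs:
            names.append(d)
            rows.append(np.mean(vecs, axis=0))
    return names, np.array(rows)


def cluster_disorders(profiles, n_clusters):
    """Steps 2-3: pairwise cosine distances, then complete-linkage hierarchical clustering."""
    dist = pdist(profiles, metric="cosine")        # condensed pairwise distance vector
    tree = linkage(dist, method="complete")
    return fcluster(tree, t=n_clusters, criterion="maxclust")


# Toy example with made-up genes and disorders (not OMIM data).
expr = {
    "g1": np.array([5, 4, 5, 0, 0, 1.0]),
    "g2": np.array([4, 5, 4, 1, 0, 0.0]),
    "g3": np.array([0, 1, 0, 5, 4, 5.0]),
    "g4": np.array([1, 0, 0, 4, 5, 4.0]),
}
disorder_genes = {"D1": ["g1"], "D2": ["g2"], "D3": ["g3", "g4"], "D4": ["g4", "gX"]}
names, profiles = disorder_profiles(expr, disorder_genes)
clusters = cluster_disorders(profiles, n_clusters=2)
print(dict(zip(names, clusters)))
```

Cosine distance compares the shape of the spatial profiles rather than their overall magnitude, which suits averages over genes with very different expression levels.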
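The probe collapse and within-gene z-scoring from the data-processing slide might look like the following sketch. Using Pearson correlation, and averaging each probe's correlation with the other probes of the same gene, is one reading of "highest correlation to all others". The function names and toy numbers are illustrative.

```python
import numpy as np


def choose_probe(probes):
    """Index of the probe whose mean Pearson correlation with the gene's other probes is highest.

    probes: (n_probes, n_samples) array for ONE gene. Pearson correlation is an assumption;
    the slide only says "highest correlation to all others".
    """
    if probes.shape[0] == 1:
        return 0
    r = np.corrcoef(probes)        # probe x probe correlation matrix
    np.fill_diagonal(r, np.nan)    # drop self-correlations
    return int(np.nanargmax(np.nanmean(r, axis=1)))


def zscore_within_gene(genes_by_samples):
    """z-score each gene (row) across samples; constant rows become all zeros."""
    mu = genes_by_samples.mean(axis=1, keepdims=True)
    sd = genes_by_samples.std(axis=1, keepdims=True)
    sd = np.where(sd > 0, sd, 1.0)   # avoid division by zero for flat profiles
    return (genes_by_samples - mu) / sd


# Toy example: three probes for one gene over four samples; probe 2 disagrees with the others.
probes = np.array([[1.0, 2.0, 3.0, 4.0],
                   [2.0, 4.0, 6.0, 8.5],
                   [3.0, 1.0, 4.0, 2.0]])
best = choose_probe(probes)
z = zscore_within_gene(np.array([[1.0, 2.0, 3.0], [10.0, 10.0, 10.0]]))
print(best, z.round(3))
```

After z-scoring, each gene's profile has zero mean and unit variance across samples, so genes are compared by where they are relatively high or low rather than by absolute signal.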
Spearman rank correlations between expression patterns [figure panel: brain1]
• GNPTG / GNPTAB: -0.22, -0.31 (8th, 1st percentile)
• GNPTAB / NAGPA: 0.38, 0.14 (88th, 73rd percentile)
• GNPTG / NAGPA: 0.30, 0.10 (82nd, 64th percentile)

Stuttering and the basal ganglia
[Figure: expression maps for GNPTG, GNPTAB, and NAGPA]
• GNPTG has particularly high (relative) expression in the globus pallidus (98th/100th percentile) and in the cingulum bundle and corpus callosum.
• GNPTAB (7th/2nd percentile) and NAGPA (3rd/2nd percentile) have complementary, extreme expression patterns in the pallidum and white-matter (WM) structures.
• The brain areas on the tails of the expression distributions are sensible in current theories of stuttering.
How can we bring molecular mechanisms, preferentially localized to specific brain systems, into our theories?

Genetics of stuttering
Linkage studies find suggestive evidence for a gene within a (relatively large) locus, and pinpointing the gene or genes involved can be expensive and time-consuming. Can we use expression patterns to prioritize the search?
• Look for genes that are preferentially expressed in areas or circuits of interest.
• Look for genes with expression patterns that mirror other genes of interest.

Genetics of stuttering
[Figure labels: 99%, 95%]
Highly correlated pairs:
• STXBP5L / NAGPA
• TIMMDC1 (C3orf1) / GNPTG
• ZDHHC23 / NAGPA
• ATG3 / GNPTG
• ZDHHC23 / GNPTAB

Regional Architecture

Where are the regions?
Functional specialization has been a dominant paradigm in understanding speech / language, but different researchers disagree on, or inconsistently use, region definitions, and in many cases there is no definition.
[Figure: Rauschecker and Scott (2009), Nat Neurosci, supplement]

The brain atlas concordance problem
The nomenclature problem is long-standing in neuroscience: people tend to think (and report) in regions, not coordinates. In human brain imaging, we have a chance to address this quantitatively by comparing different atlases delineated in a common template (MNI-305) space.

Atlas  Regions (LH)  Brief description
AAL    62            Manual parcellation of the Colin27 atlas
CYTO   29            Maximum likelihood cytoarchitectonic atlas in MNI space
H-O    56            Maximum likelihood atlas from manually labeled scans
ICBM   49            Individual parcellation of the Colin27 atlas
LPBA   29            Maximum likelihood atlas from manually labeled scans (SPM registered)
T&G    65            FreeSurfer-classified individual atlas, tweaked by a human expert
TALc   68            Brodmann's area labels mapped to MNI space with icbm2spm
TALg   49            Gyrus-level Talairach atlas mapped to MNI space as above

http://qnl.bu.edu/obart

Comparison methods
Multiple measures of region overlap may be defined:
• Non-symmetric: e.g., the proportion of region $r_i$ from parcellation $R$ contained in region $r'_j$ from parcellation $R'$, $C_{ij} = P(x \in r'_j \mid x \in r_i)$.
• Symmetric: e.g., the spatial overlap relative to the geometric mean of the two region sizes, $|r_i \cap r'_j| / \sqrt{|r_i| \, |r'_j|}$.
Both measures are normalized and bounded between 0 and 1.
The $C_{ij}$ matrix has a non-zero entry for any pair of regions (from the 8 atlases) that overlap.

Single example: the "Superior Temporal" region from the ICBM atlas
Top-ranked overlaps, shown in two lists:
• ICBM: superior temporal gyrus (100%); LPBA: superior temporal gyrus (72%); TALg: superior temporal gyrus (47%); AAL: middle temporal gyrus (36%); AAL: superior temporal gyrus (33%); AAL: temporal pole (22%); TALg: middle temporal gyrus (17%)
• ICBM: superior temporal gyrus (100%); T&G: aSTg (94%); T&G: pdSTs (88%); CYTO: TE-1.2 (87%); H-O: STG anterior division (86%); T&G: adSTs (83%); T&G: pSTg (82%)
Bohland et al. (2009). PLoS ONE

All connections
[Figure: LPBA40 atlas vs. Harvard-Oxford atlas; edges encode $\max(C_{ij}, C_{ji})$]

After pruning
[Figure: the same LPBA40 / Harvard-Oxford graph with edges $E_{ij} < 0.25$ pruned]

Global atlas similarity
Quantify how similar any two atlases are to one another: how able are you to translate from one to the other? Compare against a distribution of similarity for random atlas pairs.
[Figure: AAL atlas example and a random atlas]

$$X_{ij} = \max(C_{ij}, C_{ji}), \qquad U_{ij} = \begin{cases} \min(|r_i|, |r'_j|) & \text{if } X_{ij} > 0 \\ 0 & \text{otherwise} \end{cases}, \qquad W_{ij} = \frac{U_{ij}}{\sum_{ij} U_{ij}}$$

$$S = 1 - 4 \sum_{ij} W_{ij} \, X_{ij} \, (1 - X_{ij})$$

Because $4 X_{ij} (1 - X_{ij})$ is 0 when two regions are disjoint or one contains the other ($X_{ij} = 0$ or $1$) and reaches 1 at $X_{ij} = 0.5$, S equals 1 for atlases whose regions translate perfectly and decreases as partial overlaps accumulate.
1000 random pairs were used in simulations; green values are similarity scores above the 95th percentile.

Mining brain architecture to build theory

Information integration
• Brain architecture can be specified at multiple levels of organization.
• To better understand speech and language systems, we'll need to integrate across levels.
• Common spatial localization (to areas and circuits) can be a mechanism, but we need to be more quantitative and precise.
• Integrating large data sets invariably leads to new hypotheses, which can be tested with more focused experiments.

Future wish list
A common clearinghouse for speech neuroscience. We'd like to lead the way (with your help):
• Integrate experimental, clinical, and modeling datasets, all linked through localization in the brain.
Some items I think we're missing:
• Public aphasia databases with consistent data and metadata
• Large-N datasets with different patient groups, including behavioral, imaging, and genetic data
• Large-N developmental brain imaging data with corresponding linguistic metadata
• Post-mortem analyses in patient groups
• Computational models that treat biological issues of development in parallel with, or in addition to, learning

Acknowledgments
Quantitative Neuroscience Lab: Noah Kelley, Esther Kim, Chris Johnson, Emma Myers, Sara Saperstein
Collaborators: Mike Hawrylycz (Allen Institute), Partha Mitra (CSHL), Pascal Grange (CSHL), Hemant Bokil (Boston Scientific), Leo Grady (Siemens), Frank Guenther (BU), Dan Bullock (BU)
http://qnl.bu.edu

Extra Slides

Localization of expression
Localization is measured by the Kullback-Leibler (KL) divergence from (spatial) uniformity:

$$KL(p \,\|\, q) = \sum_x p(x) \log \frac{p(x)}{q(x)}, \qquad KL(g) = \sum_{x=1}^{N_{vox}} M(x, g) \log \frac{M(x, g)}{1/N_{vox}}$$

where $M(x, g)$ is the normalized expression energy of gene $g$ at voxel $x$ and the reference distribution is uniform over the $N_{vox}$ voxels.
[Figure: normalized expression energy (0 to 0.014) across voxels for a non-localized and a well-localized expression pattern, ordered from least to more localized]

Gene localization filter
• Select the most localized genes (KL > ~1.56) for further analysis.
• Threshold voxels based on the intensity histogram of summed expression.
• The remaining LH mask (6102 voxels) essentially excludes cerebral cortex.
[Figure: summed and thresholded expression]

Biclustering genes and voxels
Can we group genes that are each highly localized to common brain regions (sets of voxels)?
• Construct a bipartite graph with N (200) genes in vertex set V1 and M (~6000) mask voxels in V2.
• Components contain both voxels and genes. Here we used the isoperimetric algorithm (Grady and Schwartz, 2006).
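The localization filter and the gene-voxel bipartite graph described above can be sketched as follows. The KL score follows the slide's formula with each gene's expression normalized over voxels. The natural log, the toy cutoff, and grouping by connected components of a thresholded graph (a deliberately simpler stand-in for the isoperimetric algorithm, which is not implemented here) are assumptions of this sketch.

```python
import numpy as np
from scipy.sparse import bmat, csr_matrix
from scipy.sparse.csgraph import connected_components


def kl_from_uniform(M):
    """KL(g) = sum_x p(x,g) log(p(x,g) / (1/N_vox)) for each gene column of M.

    M: (n_voxels, n_genes) non-negative expression energies; each column is normalized
    to sum to 1 first. Natural log is used (the slide does not state the base).
    """
    P = M / M.sum(axis=0, keepdims=True)
    n_vox = M.shape[0]
    with np.errstate(divide="ignore", invalid="ignore"):
        terms = np.where(P > 0, P * np.log(P * n_vox), 0.0)   # convention: 0 log 0 = 0
    return terms.sum(axis=0)


def bipartite_components(M, edge_threshold):
    """Group voxels and genes via connected components of a gene-voxel bipartite graph.

    An edge joins voxel x and gene g when M[x, g] > edge_threshold. This is a simplified
    stand-in for illustration, NOT the isoperimetric partitioning used in the talk;
    voxels or genes with no edge end up as singleton components.
    """
    n_vox = M.shape[0]
    B = csr_matrix((M > edge_threshold).astype(float))   # voxel x gene biadjacency
    A = bmat([[None, B], [B.T, None]], format="csr")      # (voxels + genes) adjacency
    _, labels = connected_components(A, directed=False)
    return labels[:n_vox], labels[n_vox:]


# Toy example: 6 voxels x 5 genes; genes 0-1 and 2-3 are localized, gene 4 is uniform.
M = np.array([[5, 4, 0, 0, 1],
              [6, 5, 0, 0, 1],
              [5, 6, 0, 0, 1],
              [0, 0, 4, 5, 1],
              [0, 0, 5, 6, 1],
              [0, 0, 6, 4, 1]], dtype=float)
kl = kl_from_uniform(M)
localized = kl > 0.5          # hypothetical cutoff for this toy; the slide uses KL > ~1.56
vox_labels, gene_labels = bipartite_components(M[:, localized], edge_threshold=2.0)
print(kl.round(3), vox_labels, gene_labels)
```

A uniformly expressed gene scores KL = 0 and is filtered out; each remaining component holds a set of voxels together with the genes expressed there.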
[Figure: bipartite graph linking GENES and VOXELS; edges are the expression levels of each gene at each voxel, and graph partitioning methods cut the graph into connected components]

Biclustering localized genes
Resulting voxel clusters correspond well to individual anatomical regions, with functionally relevant gene lists. Results shown are for 13 biclusters.
[Figure: for each cluster, the share of expression energy across regions (CTX, OLF, HIP, RHP, STR, PAL, TH, HY, MB, P, MY, CB) and the Gene Ontology (GO) terms with p < 0.05]
• Cluster 1 (40 genes): 97% of energy in the cerebellum. GO terms: phosphorus metabolic process; phosphate metabolic process; biopolymer modification; biopolymer metabolic process; hindbrain development; TRPT phosphatase signaling pathway; cerebellum development; metencephalon development; response to extracellular stimulus; response to nutrient levels.
• Cluster 2 (29 genes): highly localized to the ventricle system. GO terms: di-, tri-valent inorganic cation transport; death; cell death; apoptosis; programmed cell death; protein amino acid phosphorylation; phosphorylation; reg. epithelial cell differentiation; reg. cell differentiation; lactation.
• Cluster 3 (30 genes): 69% of energy in the dentate gyrus, 20% in Ammon's horn. GO terms: reg. cell cycle; cell recognition; + reg. cell cycle; death; axon guidance; neuron morphogenesis during differentiation; axonogenesis; neurite morphogenesis; neurite development; neuron recognition.
• Cluster 4 (11 genes): 99% of energy in the thalamus. GO terms: small GTPase mediated signal transduction; cell projection organization and biogenesis; cell projection morphogenesis; cell part morphogenesis; branching morphogenesis of a tube; neuron morphogenesis during differentiation; axonogenesis; neurite morphogenesis; neurite development; axon guidance.

Back to Architecture
• Gene expression offers insight into the molecular architecture of the brain.
• Comparative studies may inform evolutionary theories (data from zebra finch and non-human primates are now available).
• Gene expression will help to connect development (cf. learning) with theories of adult brain function: circuits are genetically programmed, then modified during learning.
• There is currently a large push to systematically map brain connections and to use these maps to additionally inform disorders (disconnection syndromes).

ADHD200 Global Competition
Goal: diagnose ADHD (and 3 subtypes) based on MR images of ADHD+ and typically developing control (TDC) children (7-21 yrs).
• T1-weighted anatomical scans
• ~6-minute resting-state fMRI scans
• Phenotypic metadata
• Large multi-site dataset (700+ subjects from 8 sites); a large, consistent database is essential to making this project feasible
With Sara Saperstein (Boston University) & Leo Grady (Siemens Corporate Research)
http://fcon_1000.projects.nitrc.org/indi/adhd200/

ADHD200 Global Competition
Approach: "kitchen sink". Our team finished 5th out of 21 teams.
• Pipeline processing using FreeSurfer, custom software, and WEKA (for feature selection and classification)
• Region-level network formed from AAL atlas ROIs
• Anatomical and network features were commonly selected and used to train the classifier (20k+ features calculated)
• The functional connectivity measure that yielded the best results was a sparse regularized inverse covariance approach, which also performed consistently well in Smith et al. (2011), NeuroImage
• 85% correct for ADHD vs. control in 10-fold cross-validation
• 85% correct for ADHD subtype in 10-fold cross-validation

HIGHER-ORDER SPATIAL RELATIONSHIPS
Although there may not be a one-to-one correspondence between regions from 2 atlases, there may be one-to-many or many-to-many correspondences.
Question: how can these relationships be extracted automatically from our $C_{ij}$ matrix?
For any two atlases, construct a bipartite graph $B = (V_1 + V_2, E)$:
• $V_1$, $V_2$ represent regions in the two atlases.
• Weighted edges represent spatial coupling: $E_{ij} = \max(C_{ij}, C_{ji})$, $i \in V_1$, $j \in V_2$.

Spatial co-expression predicts interactions
These analyses are based on "co-expression" in a local brain area (voxel), but does that predict gene interactions?
• Examine the genes in our set that are found in the Reactome database: a 299 x 299 distance matrix with 471 known interactions.
• Are these genes' profiles more similar than those of non-interacting genes?
[Figure: red line, inter-gene distance density for genes participating in common reactions (Reactome); blue lines, 1st, 50th, and 99th percentile density distributions for randomly sampled distance pairs]

Autism candidate gene expression space
For a given gene list, embed expression similarity in 2D space.
• Example: ASD candidate genes from the Wigler lab (CSHL), 16 genes in the high-quality coronal data set
• Calculate the cosine distance matrix and apply metric MDS
• Provide sub-groupings based on expression locus
[Figure: 2D embedding with Cb and Ctx sub-groupings; labeled genes include Fgd3, Lhx1, MapT, Ptpdc1, Doc2a]

Brainwide connectivity studies
• We assembled a working group to advocate for large-scale connectivity projects and to suggest how they can be performed.
• Funded proposal (NIMH) for the creation of a semi-automated pipeline for high-throughput neuroanatomy
• The project started at CSHL, with collaboration and possible informatics subprojects here at BU.
• Wide-field slide scanning (transgenic + double fluorescent label)
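The atlas-comparison quantities (the non-symmetric overlap C_ij, the coupling max(C_ij, C_ji), the 0.25 edge pruning, and the global similarity S) can be sketched for two label volumes flattened to 1-D arrays. Representing atlases as integer label arrays with 0 as background, and reading off one-to-many or many-to-many correspondences as connected components of the pruned bipartite graph, are assumptions of this illustration rather than the published implementation.

```python
import numpy as np
from scipy.sparse import bmat, csr_matrix
from scipy.sparse.csgraph import connected_components


def overlap_counts(a, b):
    """|r_i ∩ r'_j| plus region sizes, for two label arrays over the same voxels.

    Labels are assumed to be 1..n (0 = background) with every label present.
    """
    na, nb = int(a.max()), int(b.max())
    both = (a > 0) & (b > 0)
    counts = np.zeros((na, nb))
    np.add.at(counts, (a[both] - 1, b[both] - 1), 1.0)
    size_a = np.bincount(a[a > 0] - 1, minlength=na).astype(float)
    size_b = np.bincount(b[b > 0] - 1, minlength=nb).astype(float)
    return counts, size_a, size_b


def coupling(counts, size_a, size_b):
    """X_ij = max(C_ij, C_ji), with C_ij = P(x in r'_j | x in r_i) and C_ji = P(x in r_i | x in r'_j)."""
    return np.maximum(counts / size_a[:, None], counts / size_b[None, :])


def global_similarity(X, size_a, size_b):
    """S = 1 - 4 * sum_ij W_ij X_ij (1 - X_ij), W_ij proportional to min(|r_i|, |r'_j|) where X_ij > 0."""
    U = np.where(X > 0, np.minimum(size_a[:, None], size_b[None, :]), 0.0)
    W = U / U.sum()
    return 1.0 - 4.0 * np.sum(W * X * (1.0 - X))


def correspondence_groups(X, prune_below=0.25):
    """Connected components of the bipartite region graph after pruning edges with X_ij < prune_below.

    Regions sharing a component label form a one-to-one, one-to-many, or many-to-many correspondence.
    """
    E = csr_matrix(np.where(X >= prune_below, X, 0.0))   # pruned edges are not stored
    A = bmat([[None, E], [E.T, None]], format="csr")
    _, labels = connected_components(A, directed=False)
    return labels[: X.shape[0]], labels[X.shape[0]:]


# Toy 1-D "brain" of 12 voxels: atlas B splits region 1 of atlas A into two pieces.
a = np.array([1] * 6 + [2] * 6)
b = np.array([1] * 3 + [2] * 3 + [3] * 6)
counts, sa, sb = overlap_counts(a, b)
X = coupling(counts, sa, sb)
S_nested = global_similarity(X, sa, sb)       # nested regions translate perfectly
groups_a, groups_b = correspondence_groups(X)

# A boundary that cuts across regions lowers the similarity.
b2 = np.array([1] * 4 + [2] * 8)
c2, sa2, sb2 = overlap_counts(a, b2)
S_partial = global_similarity(coupling(c2, sa2, sb2), sa2, sb2)
print(S_nested, S_partial, groups_a, groups_b)
```

In the nested case every X_ij is 0 or 1, so S = 1, and region 1 of atlas A is grouped with two regions of atlas B (a one-to-many correspondence). In the second case one pair overlaps partially (X = 1/3), which gives S = 2/3.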
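The autism candidate-gene embedding (a cosine distance matrix followed by metric MDS) can be sketched with classical (Torgerson) MDS, which is one form of metric MDS. The talk may have used a different variant, such as stress minimization. The four gene profiles below are made up and do not correspond to the ASD genes on the slide.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform


def classical_mds(D, n_dims=2):
    """Classical (Torgerson) MDS: coordinates whose Euclidean distances approximate D."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n        # centering matrix
    B = -0.5 * J @ (D ** 2) @ J                # double-centered squared distances
    vals, vecs = np.linalg.eigh(B)             # eigenvalues in ascending order
    top = np.argsort(vals)[::-1][:n_dims]
    # negative eigenvalues (D not exactly Euclidean) are clipped to zero
    return vecs[:, top] * np.sqrt(np.clip(vals[top], 0.0, None))


# Toy expression profiles over six voxels for four made-up genes.
profiles = np.array([[5, 4, 5, 0, 0, 1],
                     [4, 5, 4, 1, 0, 0],
                     [0, 1, 0, 5, 4, 5],
                     [1, 0, 0, 4, 5, 4]], dtype=float)
D = squareform(pdist(profiles, metric="cosine"))   # gene x gene cosine distance matrix
coords = classical_mds(D, n_dims=2)
print(coords.round(3))
```

Genes with similar spatial profiles (here, genes 0 and 1, and genes 2 and 3) land close together in the 2D embedding, which is what makes sub-groupings by expression locus visible.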