CMSC 838T – Lecture 11 Gene Expression
- Biological networks
  - Gene networks
  - Gene regulation networks
  - Metabolic networks
- DNA microarrays
  - Construction
  - Data analysis
[Image: Affymetrix GeneChip Scanner 3000]

Gene Expression
- Gene expression
  - Genes are expressed when they are transcribed into RNA
  - Amount of mRNA indicates gene activity
    - No mRNA → gene is off
    - mRNA present → gene is on & performing its function
- Biologically
  - Some genes are always expressed in all tissues
    - Estimated 10,000 housekeeping / ubiquitous genes
  - Other genes are selectively on
    - Depending on tissue, disease, and/or environment
  - Change in environment → change in gene expression
    - So the organism can respond

Biological Networks
- Gene expression does not happen in isolation
  - Individual genes code for function
    - Produce mRNA → protein performing function
  - Sets of genes can form pathways
    - Gene products can turn other genes on / off
  - Sets of pathways can form networks
    - When pathways interact
- Biology is a study of networks
  - Genes
  - Proteins
  - Etc.

Biological Networks & DNA Microarrays
- Overview
  - Biological networks
  - DNA microarray construction
  - Microarray data analysis

Types of Biological Networks
- Genetic network
  - Interactions between genes, gene products, small molecules
- Gene regulation network
  - Network of control decisions to turn genes on / off
  - Subset of genetic network
- Metabolic network
  - Network of interactions between proteins
  - Synthesize / break down molecules (enzymes, cofactors)

Gene Regulation Network
[Figure]

Examining Biological Networks – Benefits
- Learn about gene function / regulation
  - Tissue differentiation
  - Response to environmental factors
- Identify / treat diseases
  - Discover genetic causes of disease
  - Evaluate effect of drugs
- Detect impact of DNA sequence variation (mutations)
  - Detection of mutations (e.g., SNPs)
  - Genetic typing

Examining Biological Networks – Approach
- Measure protein / mRNA in cells
  - In different tissues (e.g., brain vs. muscle)
    - Find genes / proteins with tissue-specific function
  - As the environment changes
    - Find genes / proteins responsible for the response
  - In healthy & diseased tissues
    - Find proteins / genes responsible for disease (if any)
    - Help identify diseases based on gene expression
  - In different individuals
    - Detect DNA sequence variation

Examining Biological Networks
- Indirect approach
  - Measure mRNA production (gene expression) in cell
    - Random ESTs
    - DNA microarray
  - Advantages
    - High throughput
    - Can test a large variety of mRNAs simultaneously
  - Disadvantages
    - RNA level not always correlated with protein level / function
    - Misses changes at protein level
    - Results may thus be less precise

Examining Biological Networks
- Direct approach
  - Measure protein production / interaction in cell
    - 2D electrophoresis
    - Mass spectrometry
    - Protein microarray
  - Advantages
    - Precise results on proteins
  - Disadvantages
    - Low throughput (for now)

Biological Networks & DNA Microarrays
- Overview
  - Biological networks
  - DNA microarray construction
  - Microarray data analysis

DNA Microarray – Affymetrix System
[Image: complete Affymetrix GeneChip instrumentation system]

DNA Microarray
- Experimental method for measuring RNA in cell
- Microarray construction
  - Short single-stranded DNA sequences (probes)
    - cDNA sequences (200+ nucleotides)
    - Oligomers (25-80 nucleotides)
  - Probes attached to glass slide at known fixed locations
    - High-precision robotics (spotted cDNA / oligomers)
    - Photolithography (in situ oligomers)
  - Miniaturization is key
    - Measure many (100,000+) genes at once
    - Small amounts of mRNA needed
- Works by hybridization of complementary DNA

DNA Microarray – Hybridization
[Figure: Heat → denaturation (separating DNA into single strands); Cool → hybridization (forming double-stranded DNA, only if strands are complementary)]

DNA Microarray Design & Analysis
- Microarray construction
  - Spotted cDNA arrays, in situ photolithography…
- Array design
  - Choosing probe sequences
- Image processing of scanned images
  - Spot detection, normalization, quantification
- Primary (hybridization) data analysis
  - Basic statistics, reproducibility…
- Higher-level data analysis (of multiple samples)
  - Clustering, self-organizing maps…
- Sample tracking and database of results

DNA Microarray – Spotted Arrays
- Construction
  - Drops (spots) of cDNA fragments as probes
  - Attach to glass slide / nylon array at known locations
  - Use mechanical pins & robotics
- Use
  - Label cDNA with fluorescent dyes (fluors)
  - Apply comparative hybridization
    - More accurate than directly measuring intensity
  - Measure contrast in intensity
  - Use laser / CCD scanner

DNA Microarray – Spotted Arrays
1) Create desired cDNA fragments for use as probes
2) Use high-precision robot spotter
3) Place spots of cDNA on glass microscope slides

DNA Microarray – Automatic Detection
[Figure: free label; DNA microarray; excitation of bound label; imaging of surface-confined fluorescence; CCD camera]

DNA Microarray – Comparative Hybridization
- Goal
  - Measure relative amount of mRNA expressed
- Algorithm
  1. Choose cell populations
  2. mRNA extraction and reverse transcription
  3. Fluorescent labeling of cDNAs (normalized)
  4. Hybridization to microarray
  5. Scan the hybridized array
  6. Interpret scanned image

DNA Microarray – Comparative Hybridization
[Figure]

Comparative Hybridization – Output
- Color determined by relative RNA concentrations
- Brightness determined by total concentration
[Figure legend: gene expressed in A; gene expressed in A & B; gene expressed in B]

Comparative Hybridization – Issues
- Choosing cell populations
  - Find cells with selective gene expression
  - Provides hints of gene function
- Reverse transcription
  - Extract mRNA from cells, purify, transcribe to cDNA
  - mRNA may be partially or selectively transcribed
  - Result = reverse transcription bias
- Fluorescent labeling
  - cDNA bound with fluorescent dyes (fluors)
  - Solutions diluted to normalize brightness
  - Assumes fluorescence level directly proportional to mRNA level

DNA Microarray – Affymetrix Arrays
- Construction
  - Synthesize oligomers in situ using photolithography
  - $500,000 per set of masks, $300 per chip
- Probe set
  - Create multiple oligomers per cDNA
    - Since individual 25-mers are short
  - Place a negative control next to each probe
    - With exactly one mismatched base at the center, to track / calibrate mismatches
- Use
  - Label cDNA, fragment & hybridize
  - Stain labeled cDNA with (single) fluorescent dye
  - Measure intensity using special CCD scanner

DNA Microarray – Photolithography
1) Use photolithography
2) Create 25-mer oligomers on glass slide directly
[Image: Affymetrix DNA microarray, 500,000 oligomers in 1.28 cm²]

[Figure: attach biotin; incubate at 94° with chemicals; stain attaches to biotin; measure level of stain]

DNA Microarray – Probe Set
[Figure]

DNA Microarray – Array Design
- Choice of probe
  - Include genes of interest
    - Examine sequence databases
  - Avoid redundancy
    - No duplicate probes
  - Avoid cross-hybridization
- Can use software to help choose probes
- Or simply buy pre-designed arrays
  - Complete genomes of yeast, Drosophila, C. elegans
  - 33,000+ human genes from GenBank RefSeq on 2 microarrays
  - Expensive but labor-saving

DNA Microarray – Affymetrix Genome Arrays
[Figure]

DNA Microarray – Variability & Errors
- Sources of (undesirable) variability
  - RNA extraction
  - Probe labeling
  - Hybridization kinetics (temperature, time, mixing…)
  - Image analysis
  - Biological variability
- Sources of error
  - Image artifacts
    - Dust / bubbles in array
    - Spillover from bright spots to neighboring dark spots
  - Self / cross-hybridization
    - cDNAs hybridizing with each other or with mismatched probes

DNA Microarray – Image Processing
- Approach
  1. Scan the array
  2. Quantify each spot
  3. Subtract background
  4. Normalize intensity (across samples)
  5. Calculate expression ratios (log scale) vs. control
  6. Export table of fluorescent intensities for each gene
- Affymetrix software
  - Automatic image processing
  - Precision
    - Around 2% variation in measurements
    - Less than normal biological variability

Microarray Image Processing – Expression Ratio
- Calculating expression ratios
  - After filtering, correction, & normalization
  - Find genes with large contrasts in expression level
  - Provides data for a single microarray
[Figure: ratio of signal intensity, Cy5 signal (log2) vs. Cy3 signal (log2), with regions marked ratio < –2x and ratio > +2x]

Biological Networks & DNA Microarrays
- Overview
  - Biological networks
  - DNA microarray construction
  - Microarray data analysis

DNA Microarray – Experiment Data
- Experiment design
  - Measure level of multiple mRNAs (i.e., single microarray test)
  - As one or more experimental conditions vary
    - Time elapsed
    - Pathogen / drug exposure
    - Different tissues
  - Result is multidimensional data
    - mRNA level × tissue × drug exposure × time × …
- Types of questions
  - What genes are up / down-regulated?
  - What genes are over / under-expressed in a diseased state?
  - What gene regulation networks exist?
  - Need rigorous statistical analysis to determine significance!

DNA Microarray – Data Analysis
- Interpreting microarray data (vs. time)
  - Gene A expressed after gene B
    - B positively regulates A
  - Gene A expression stops after gene B
    - B negatively regulates A
  - Genes A & B expressed independently
    - A & B do not regulate each other
  - Genes A & B expressed at same time
    - A & B co-regulated
  - Genes A & B not expressed at same time
    - A & B not co-regulated
  - Etc.

DNA Microarray – Data Analysis
- Higher-level microarray data analysis
  - Clustering and pattern detection
  - Data mining and visualization
  - Controls and normalization of results
  - Statistical validation
  - Linkage between gene expression data and gene sequence / function / metabolic pathway databases
  - Discovery of common sequences in co-regulated genes
  - Meta-studies using data from multiple experiments
- Goals
  - Discover genes and genetic networks
  - Classification of biological processes
  - Infer biological function

DNA Microarray – Multivariate Analysis
- Multivariate analysis
  - Analyzing data with multiple response variables
  - Multidimensional data from multiple experimental factors
- Approaches
  - Hierarchical vs. non-hierarchical
  - Divisive vs. agglomerative
  - Supervised vs. unsupervised
- Clustering
  - Separating data into related groups (clusters)
  - Uses
    - Find genes with similar expression patterns
    - Find relationships between expression patterns

DNA Microarray – Multivariate Analysis
- Clustering approaches
  - Hierarchical clustering
    - Link similar genes, build tree
  - K-means clustering
    - Separate data into exactly K clusters (for predetermined K)
  - Self-organizing maps (SOM)
    - Genes find similar groups (using neural networks)
  - Principal component analysis
    - Treat every gene as a dimension (vector)
    - Separate genes using singular value decomposition (SVD)
  - Support vector machine
    - Train machine on labeled training examples
    - Use trained machine to classify genes

DNA Microarray – Pairwise Distances
- Clustering method may require calculating distance
- Metric distances
  - Satisfy 4 conditions (for all x, y, z)
    - Non-negative → d(x,y) ≥ 0
    - Symmetric → d(x,y) = d(y,x)
    - Zero distance to self → d(x,x) = 0
    - Triangle inequality → d(x,y) ≤ d(x,z) + d(y,z)
  - Example – Euclidean distance
- Semi-metric distance
  - Satisfies first 3 conditions only (not triangle inequality)
  - Example – Pearson correlation distance, d(x,y) = 1 − r(x,y), where r is the Pearson correlation coefficient

DNA Microarray – Cluster Distances
- Merging clusters by minimizing…
  - Minimum distance between members of the two clusters (single linkage)
  - Maximum distance between members of the two clusters (complete linkage)
  - Average distance between members of the two clusters (UPGMA)
  - Distance between cluster centers (centroid)
- Choice depends on desired efficiency & robustness
  - Single linkage less robust
[Figure: single linkage; complete linkage; UPGMA / centroid]

DNA Microarray – Hierarchical Clustering
- Approach
  - Bottom-up approach (agglomerative)
    - Begin with each gene in its own cluster
    - Repeatedly merge the closest clusters
  - Top-down approach (divisive)
    - Begin with all genes in the same cluster
    - Repeatedly split clusters into parts
  - Produces a dendrogram (rooted tree)
[Figure: clustered expression data; axes: genes, time]

Microarray – Iterative Clustering Methods
- K-means clustering
  1. Pick K vectors
  2. Assign genes to closest of K vectors
  3. Pick new K vectors as center of each cluster
  4. Repeat until clusters are stable
- Self-organizing maps (SOM)
  1. Pick K partitions
  2. User defines geometric configuration for partitions
  3. Generate random vector for each partition
  4. Randomly pick gene
  5. Adjust closest vector (and, to a lesser degree, its neighbors in the configuration) to be more similar to vector for gene
  6. Repeat until vectors are stable

DNA Microarray – SOMs from GeneCluster
[Figure]

DNA Microarray – Multivariate Analysis
- Principal component analysis
  - Linear method
  - Treat every gene as a dimension (vector)
  - Separate genes using singular value decomposition (SVD)
    - Finds linear combinations of vectors to separate data
    - Diagonalization of covariance matrix
  - Projects complex data sets onto reduced dimensionality space
  - Easier to pick out clusters (for use with K-means, SOM)
- Support vector machine
  - Supervised learning approach
  - Start with positive / negative examples (training set)
  - Train machine to recognize cluster types
  - Use machine to classify data

DNA Microarray – Multivariate Analysis
- Observations
  - Distance metric very important
  - Clustering method less important
  - Choosing number / sizes of clusters
    - Need to examine data
    - Look for large gaps between data
  - Handling large multidimensional data sets
    - Dimension reduction
    - Visualization techniques
  - Handling noise in data
    - Robust metrics for calculating distance between genes / clusters

DNA Microarray – Multivariate Analysis
- Cluster / TreeView analysis software
  - Permutes gene order (by cluster) for display

DNA Microarray – Experimental Data
- Information needed for each microarray spot
  - Gene ID
  - Signal, background intensity
  - Array characteristics (layout, substrate, date produced)
  - Hybridization conditions (method, buffer composition)
  - Labeling conditions (method, enzyme, fluorochrome)
  - RNA extraction conditions (method, mass of tissue)
  - Tissue treatment conditions (type, duration, intensity)
  - Etc.
- Can store as entries in a relational database (RDBMS)

DNA Microarray – Experimental Data
- Standard for microarray experiments
  - MIAME: Minimum Information About a Microarray Experiment
- Data required includes
  - Experimental design: whole set of hybridisation experiments
  - Array design: each array used, each element (spot) on array
  - Samples: samples used, extract preparation, and labelling
  - Hybridization: procedures and parameters
  - Measurements: images, quantification, specifications
  - Normalisation controls: types, values, specifications

DNA Microarray – Other Uses
- Mutation detection
  - Microarray probes representing all known alleles
  - Mismatch probes detect single-nucleotide mutations (SNPs)
- Disease diagnosis
  - Accurate expression profiles of diseases (especially cancer)
  - Example
    - Diffuse large B-cell vs. follicular lymphomas
    - Microarray analysis of cancer tissue found significant differences in expression level of 30 of 6817 human genes
    - 91% correct diagnosis rate, a substantial improvement
    - Microarray analysis after treatment predicts survival rates
- Gene finding
  - High-throughput sampling of expressed mRNA

DNA Microarray – Summary
- DNA microarray
  - Able to detect / identify a wide variety of mRNAs
  - Much data processing at all levels
    - Image processing
    - Data filtering (for single array)
    - Data analysis (for multiple arrays)
  - Can yield much useful data
  - Collection & storage of microarray data not yet standardized
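
The Image Processing slide (steps 3 to 5) and the Expression Ratio slide can be sketched for a two-channel Cy5/Cy3 array as follows. This is a minimal Python illustration, not the Affymetrix software: it assumes median-centering of the log ratios as the normalization (the lecture does not name a method), a hypothetical intensity floor of 1.0 so that no log is taken of a non-positive value, and the ±2x cutoff from the figure, which is ±1 on the log2 scale.

```python
import math
from statistics import median


def log2_ratios(cy5, cy3, bg5, bg3, floor=1.0):
    """Return normalized log2(Cy5/Cy3) expression ratios, one per spot.

    cy5, cy3: raw spot intensities for the two channels.
    bg5, bg3: local background intensity for each spot.
    floor: hypothetical lower bound applied after background subtraction.
    """
    # Step 3: subtract background, clamping to a small positive floor
    s5 = [max(f - b, floor) for f, b in zip(cy5, bg5)]
    s3 = [max(f - b, floor) for f, b in zip(cy3, bg3)]
    # Log-scale ratio of the two channels for each spot
    raw = [math.log2(a / b) for a, b in zip(s5, s3)]
    # Step 4: normalize by median-centering, assuming most genes are unchanged
    shift = median(raw)
    # Step 5: normalized log2 expression ratios
    return [r - shift for r in raw]


def flag_changed(ratios, fold=2.0):
    """Label each spot 'up', 'down' or '' using a symmetric fold-change cutoff."""
    cut = math.log2(fold)  # a 2x change is 1.0 on the log2 scale
    return ["up" if r >= cut else "down" if r <= -cut else "" for r in ratios]


# Usage with made-up intensities for five spots
ratios = log2_ratios([1100, 300, 500, 900, 500], [300, 1100, 500, 500, 500],
                     [100] * 5, [100] * 5)
print(flag_changed(ratios))  # ['up', 'down', '', 'up', '']
```

Working on the log2 scale makes up- and down-regulation symmetric: a 2x increase and a 2x decrease are +1 and −1, which is why the scatter plot uses log2 axes.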
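
To make the metric / semi-metric distinction on the Pairwise Distances slide concrete, here is a small Python sketch of Euclidean distance and the Pearson correlation distance 1 − r. The three expression profiles are made-up values, chosen so that 1 − r violates the triangle inequality while Euclidean distance does not.

```python
import math


def euclidean(x, y):
    """Euclidean distance between two expression profiles (a metric)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def pearson_distance(x, y):
    """Pearson correlation distance 1 - r (a semi-metric)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    dx = [a - mx for a in x]
    dy = [b - my for b in y]
    den = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    if den == 0:
        raise ValueError("r is undefined for a constant profile")
    r = sum(a * b for a, b in zip(dx, dy)) / den
    return 1.0 - r


# Three made-up profiles (e.g., expression at three time points)
x, y, z = [1, 0, -1], [1, -2, 1], [2, -2, 0]
for d in (euclidean, pearson_distance):
    direct, via_z = d(x, y), d(x, z) + d(z, y)
    print(d.__name__, round(direct, 3), round(via_z, 3), direct <= via_z)
# euclidean 2.828 3.864 True
# pearson_distance 1.0 0.634 False
```

Correlation distance ignores overall expression level and scale, so it groups genes whose profiles rise and fall together; the price is that clustering methods relying on the triangle inequality lose their guarantees.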
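
The bottom-up (agglomerative) approach on the Hierarchical Clustering slide, combined with the single, complete, and average (UPGMA) linkage rules from the Cluster Distances slide, can be sketched in Python as below. It assumes Euclidean distance, omits centroid linkage, and uses a simple exhaustive search over cluster pairs at every step, so it is meant for reading rather than for genome-scale data.

```python
import math


def agglomerative(points, k=1, linkage="average"):
    """Bottom-up hierarchical clustering of expression vectors.

    linkage: 'single' (minimum), 'complete' (maximum) or 'average' (UPGMA)
    distance between members of two clusters. Merging stops when k clusters
    remain; with k=1 the merge list describes the full dendrogram.
    Returns (clusters, merges), with clusters as lists of point indices.
    """
    combine = {
        "single": min,
        "complete": max,
        "average": lambda ds: sum(ds) / len(ds),
    }[linkage]
    n = len(points)
    # Pairwise Euclidean distances between all genes, computed once
    d = [[math.dist(points[i], points[j]) for j in range(n)] for i in range(n)]
    clusters = [[i] for i in range(n)]  # start: every gene in its own cluster
    merges = []
    while len(clusters) > k:
        # Find the pair of clusters with the smallest linkage distance
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                dist_ab = combine([d[i][j] for i in clusters[a] for j in clusters[b]])
                if best is None or dist_ab < best[0]:
                    best = (dist_ab, a, b)
        dist_ab, a, b = best
        merges.append((clusters[a], clusters[b], dist_ab))
        clusters[a] = clusters[a] + clusters[b]  # new list; merges keeps the old ones
        del clusters[b]  # b > a, so index a is unaffected
    return clusters, merges


# Made-up one-dimensional expression profiles
genes = [[0.0], [0.1], [5.0], [5.2], [10.0]]
clusters, merges = agglomerative(genes, k=2)
print(clusters)  # [[0, 1], [2, 3, 4]]
```

Running with `k=1` and reading `merges` in order gives the dendrogram: each tuple is one internal node, and its distance is the node's height.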
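
The four K-means steps on the Iterative Clustering Methods slide translate almost directly into code. Below is a minimal Python sketch, assuming Euclidean distance (`math.dist`, Python 3.8+) and initial vectors drawn at random from the genes themselves; the slide does not say how the first K vectors are picked, and the profiles in the usage lines are made up.

```python
import math
import random


def kmeans(genes, k, max_iter=100, seed=0):
    """K-means clustering of expression vectors, following the slide's steps.

    Returns (labels, centers): labels[i] is the cluster index of genes[i].
    """
    rng = random.Random(seed)
    # Step 1: pick K vectors (assumption: K distinct genes chosen at random)
    centers = [list(g) for g in rng.sample(genes, k)]
    labels = None
    for _ in range(max_iter):
        # Step 2: assign each gene to the closest of the K vectors
        new_labels = [min(range(k), key=lambda c: math.dist(g, centers[c]))
                      for g in genes]
        # Step 4: stop once the clusters are stable
        if new_labels == labels:
            break
        labels = new_labels
        # Step 3: new K vectors = center (mean) of each cluster
        for c in range(k):
            members = [g for g, lab in zip(genes, labels) if lab == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers


genes = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
labels, centers = kmeans(genes, k=2)
print(labels)  # the first three genes share one label, the last three the other
```

Because the result depends on the starting vectors, K-means is often run with several seeds and the tightest clustering kept; K itself must still be chosen beforehand, as the slide notes.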
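
The principal component analysis bullets (diagonalize the covariance matrix, project onto a reduced dimensionality space) can be illustrated with a short, dependency-free Python sketch. It swaps the full SVD for power iteration, which recovers only the first principal component and assumes the largest covariance eigenvalue is strictly dominant; in practice a linear-algebra library's SVD would be used. Rows are treated as observations and columns as dimensions; which axis of the gene × condition table plays which role is a modeling choice.

```python
import math
import random


def first_principal_component(rows, iters=100, seed=0):
    """First principal axis and 1-D scores, via power iteration on the covariance.

    rows: observations, each a list of measurements. Returns (axis, scores):
    a unit vector and the projection of each centered row onto it.
    """
    n, dim = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(dim)]
    centered = [[r[j] - means[j] for j in range(dim)] for r in rows]
    # Sample covariance matrix (dim x dim)
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(dim)]
           for i in range(dim)]
    # Power iteration: multiply by cov and renormalize; this converges to the
    # eigenvector with the largest eigenvalue (the first principal axis)
    rng = random.Random(seed)
    axis = [rng.uniform(0.5, 1.5) for _ in range(dim)]
    for _ in range(iters):
        w = [sum(cov[i][j] * axis[j] for j in range(dim)) for i in range(dim)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            raise ValueError("no variance along the start vector")
        axis = [x / norm for x in w]
    scores = [sum(c[j] * axis[j] for j in range(dim)) for c in centered]
    return axis, scores


# Made-up data lying on the line y = 2x
axis, scores = first_principal_component([[0, 0], [1, 2], [2, 4], [3, 6]])
print([round(a, 4) for a in axis])  # [0.4472, 0.8944]
```

The one-dimensional `scores` are the reduced representation; as the slide suggests, such projections could then be clustered with K-means or a SOM.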