CMSC 838T – Lecture 11
- Biological networks
  - Gene networks
  - Gene regulation networks
  - Metabolic networks
- DNA microarrays
  - Construction
  - Data analysis
[Figure: Affymetrix GeneChip Scanner 3000]
Gene Expression
- Gene expression
  - Genes are expressed when they are transcribed into RNA
  - Amount of mRNA indicates gene activity
    - No mRNA → gene is off
    - mRNA present → gene is on & performing its function
- Biologically
  - Some genes are always expressed, in all tissues
    - Estimated 10,000 housekeeping (ubiquitous) genes
  - Other genes are switched on selectively
    - Depending on tissue, disease, and/or environment
  - Change in environment → change in gene expression
    - Allows the organism to respond
Biological Networks
- Gene expression does not happen in isolation
  - Individual genes code for function
    - Produce mRNA → protein performing function
  - Sets of genes can form pathways
    - Gene products can turn other genes on / off
  - Sets of pathways can form networks
    - When pathways interact
- Biology is a study of networks
  - Genes
  - Proteins
  - Etc.
Biological Networks & DNA Microarrays
- Overview
  - Biological networks
  - DNA microarray construction
  - Microarray data analysis
Types of Biological Networks
- Genetic network
  - Interactions between genes, gene products, and small molecules
- Gene regulation network
  - Network of control decisions to turn genes on / off
  - Subset of the genetic network
- Metabolic network
  - Network of interactions between proteins (enzymes) and metabolites
  - Enzymes (with cofactors) synthesize / break down molecules
Gene Regulation Network
[Figure]
Examining Biological Networks – Benefits
- Learn about gene function / regulation
  - Tissue differentiation
  - Response to environmental factors
- Identify / treat diseases
  - Discover genetic causes of disease
  - Evaluate effect of drugs
- Detect impact of DNA sequence variation (mutations)
  - Detection of mutations (e.g., SNPs)
  - Genetic typing
Examining Biological Networks – Approach
- Measure protein / mRNA in cells
  - In different tissues (e.g., brain vs. muscle)
    - Find genes / proteins with tissue-specific function
  - As environment changes
    - Find genes / proteins responsible for response
  - In healthy & diseased tissues
    - Find genes / proteins responsible for disease (if any)
    - Help identify diseases based on gene expression
  - In different individuals
    - Detect DNA sequence variation
Examining Biological Networks
- Indirect approach
  - Measure mRNA production (gene expression) in cell
    - Random ESTs
    - DNA microarray
  - Advantages
    - High throughput
    - Can test a large variety of mRNAs simultaneously
  - Disadvantages
    - RNA level not always correlated with protein level / function
    - Misses changes at protein level
    - Results may thus be less precise
Examining Biological Networks
- Direct approach
  - Measure protein production / interaction in cell
    - 2D electrophoresis
    - Mass spectrometry
    - Protein microarray
  - Advantages
    - Precise results on proteins
  - Disadvantages
    - Low throughput (for now)
Biological Networks & DNA Microarrays
- Overview
  - Biological networks
  - DNA microarray construction
  - Microarray data analysis
DNA Microarray – Affymetrix System
[Figure: complete Affymetrix GeneChip instrumentation system]
DNA Microarray
- Experimental method for measuring RNA in a cell
- Microarray construction
  - Short single-stranded DNA sequences (probes)
    - cDNA sequences (200+ nucleotides)
    - Oligomers (25–80 nucleotides)
  - Probes attached to glass slide at known fixed locations
    - High-precision robotics (spotted cDNA / oligomers)
    - Photolithography (in situ oligomers)
  - Miniaturization is key
    - Measure many genes at once (100,000+ probes)
    - Only small amounts of mRNA needed
- Works by hybridization of complementary DNA
DNA Microarray – Hybridization
- Denaturation (heat): separating DNA into single strands
- Hybridization (cool): forming double-stranded DNA (only if strands are complementary)
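Hybridization relies on base complementarity: a probe captures a target strand that contains the probe's reverse complement. The sketch below models this as plain string matching with an optional mismatch allowance. It is a deliberate simplification (an assumption, not how real arrays behave), since actual binding also depends on temperature, salt concentration, and GC content; the function names are hypothetical.

```python
# Simplified string model of probe/target hybridization (illustrative only).
# Sequences are written 5'->3'. Real binding also depends on temperature,
# salt concentration and GC content, which this sketch ignores.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def best_match_mismatches(probe: str, target: str) -> int:
    """Fewest mismatched bases between the probe's binding site (its reverse
    complement) and any ungapped window of the target."""
    site = reverse_complement(probe)
    k = len(site)
    target = target.upper()
    best = k  # worst case: every base mismatches (or target is too short)
    for start in range(len(target) - k + 1):
        window = target[start:start + k]
        best = min(best, sum(1 for a, b in zip(site, window) if a != b))
    return best


def hybridizes(probe: str, target: str, max_mismatches: int = 0) -> bool:
    """True if some window of the target pairs with the probe within the allowance."""
    return best_match_mismatches(probe, target) <= max_mismatches


if __name__ == "__main__":
    print(reverse_complement("AACG"))                  # CGTT
    print(hybridizes("AACG", "TTCGTTAA"))              # True: target contains CGTT
    print(best_match_mismatches("AACG", "TTCGATAA"))   # 1
```

Allowing one mismatch is the string-level analogue of the cross-hybridization that mismatch control probes are meant to detect.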
DNA Microarray Design & Analysis
- Array design
  - Choosing probe sequences
- Microarray construction
  - Spotted cDNA arrays, in situ photolithography…
- Image processing of scanned images
  - Spot detection, normalization, quantification
- Primary (hybridization) data analysis
  - Basic statistics, reproducibility…
- Higher-level data analysis (of multiple samples)
  - Clustering, self-organizing maps…
- Sample tracking and database of results
DNA Microarray – Spotted Arrays
- Construction
  - Drops (spots) of cDNA fragments as probes
  - Attached to glass slide / nylon array at known locations
  - Using mechanical pins & robotics
- Use
  - Label cDNA with fluorescent dyes (fluors)
  - Apply comparative hybridization
    - More accurate than directly measuring intensity
  - Measure contrast in intensity
  - Use laser / CCD scanner
DNA Microarray – Spotted Arrays
1) Create desired cDNA fragments for use as probes
2) Use high-precision robot spotter
3) Place spots of cDNA on glass microscope slides
DNA Microarray – Automatic Detection
[Figure: free label; DNA microarray; excitation of bound label; imaging of surface-confined fluorescence with a CCD camera]
DNA Microarray – Comparative Hybridization
- Goal
  - Measure relative amount of mRNA expressed
- Algorithm
  1. Choose cell populations
  2. mRNA extraction and reverse transcription
  3. Fluorescent labeling of cDNAs (normalized)
  4. Hybridization to microarray
  5. Scan the hybridized array
  6. Interpret scanned image
DNA Microarray – Comparative Hybridization
[Figure]
Comparative Hybridization – Output
- Color determined by relative RNA concentrations
- Brightness determined by total concentration
[Figure: spots for a gene expressed in A, a gene expressed in A & B, and a gene expressed in B]
Comparative Hybridization – Issues
- Choosing cell populations
  - Find cells with selective gene expression
  - Provides hints of gene function
- Reverse transcription
  - Extract mRNA from cells, purify, reverse-transcribe to cDNA
  - mRNA may be only partially or selectively reverse-transcribed
  - Result = reverse transcription bias
- Fluorescent labeling
  - cDNA bound with fluorescent dyes (fluors)
  - Solutions diluted to normalize brightness
  - Assumes fluorescence level is directly proportional to mRNA level
DNA Microarray – Affymetrix Arrays
- Construction
  - Synthesize oligomers in situ using photolithography
  - $500,000 per set of masks, $300 per chip
- Probe set
  - Create multiple oligomers per cDNA
    - Since individual 25-mers are short
  - Place negative control next to each probe
    - With exactly one mismatched base at the center, to track / calibrate mismatches
- Use
  - Label cDNA, fragment & hybridize
  - Stain labeled cDNA with (single) fluorescent dye
  - Measure intensity using special CCD scanner
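To make the probe-set idea concrete, the sketch below builds a mismatch (MM) control from a perfect-match (PM) 25-mer and combines a probe set into one number. Two assumptions are labeled here and in the code: the central base is replaced by its complement, and the summary is a naive mean of PM − MM differences. The vendor's own analysis algorithms are more elaborate; the helper names are hypothetical.

```python
# Illustrative perfect-match (PM) / mismatch (MM) probe pairs.
# Assumption: the MM control differs from its PM probe only at the central
# base, which is replaced by its complement (position 13 of a 25-mer).
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def mismatch_probe(pm: str) -> str:
    """Build the MM control for a PM probe by complementing its central base."""
    center = len(pm) // 2  # index 12, i.e. the 13th base, for a 25-mer
    return pm[:center] + COMPLEMENT[pm[center]] + pm[center + 1:]


def average_difference(pm_signals: list[float], mm_signals: list[float]) -> float:
    """Naive probe-set signal (assumption): mean of (PM - MM) over all pairs.
    The MM intensity serves as an estimate of non-specific hybridization."""
    if not pm_signals or len(pm_signals) != len(mm_signals):
        raise ValueError("need equal, non-empty lists of PM and MM intensities")
    diffs = [pm - mm for pm, mm in zip(pm_signals, mm_signals)]
    return sum(diffs) / len(diffs)
```

Averaging over several probe pairs is what makes short 25-mers usable: any single pair may cross-hybridize, but the set as a whole is more reliable.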
DNA Microarray – Photolithography
1) Use photolithography
2) Create 25-mer oligomers directly on glass slide
[Figure: Affymetrix DNA microarray, 500,000 oligomers in 1.28 cm²]
[Figure: labeling and staining steps]
- Attach biotin
- Incubate at 94 °C with chemicals
- Stain attaches to biotin
- Measure level of stain
DNA Microarray – Probe Set
[Figure]
DNA Microarray – Array Design
- Choice of probes
  - Include genes of interest
    - Examine sequence databases
  - Avoid redundancy
    - No duplicate probes
  - Avoid cross-hybridization
- Can use software to help choose probes
- Or simply buy pre-designed arrays
  - Complete genomes of yeast, Drosophila, C. elegans
  - 33,000+ human genes from GenBank RefSeq on 2 microarrays
  - Expensive but labor-saving
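Avoiding redundancy and cross-hybridization can be approximated by keeping only candidate probes whose sequence occurs in a single gene. The sketch below does this for exact k-mer matches on the given strand. It is a hypothetical helper, not any particular design tool: real probe-design software also screens near-matches, the opposite strand, GC content, and melting temperature.

```python
# Hypothetical probe-selection helper: keep only candidate k-mers that occur
# in exactly one gene, so a probe is less likely to cross-hybridize.
# Assumptions: exact matches on the given strand only; real design tools also
# consider near-matches, the other strand, GC content and melting temperature.


def kmers(seq: str, k: int) -> set[str]:
    """All length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def unique_probes(genes: dict[str, str], k: int = 25) -> dict[str, list[str]]:
    """For each gene, return the (sorted) k-mers that appear in no other gene."""
    by_gene = {name: kmers(seq.upper(), k) for name, seq in genes.items()}
    result = {}
    for name, own in by_gene.items():
        others = set().union(*(s for other, s in by_gene.items() if other != name))
        result[name] = sorted(own - others)
    return result


if __name__ == "__main__":
    demo = {"g1": "AAACCC", "g2": "CCCGGG"}
    print(unique_probes(demo, k=3))
    # {'g1': ['AAA', 'AAC', 'ACC'], 'g2': ['CCG', 'CGG', 'GGG']}
```

The shared k-mer `CCC` is dropped from both genes, since a probe with that sequence could bind either transcript.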
DNA Microarray – Affymetrix Genome Arrays
[Figure]
DNA Microarray – Variability & Errors
- Sources of (undesirable) variability
  - RNA extraction
  - Probe labeling
  - Hybridization kinetics (temperature, time, mixing…)
  - Image analysis
  - Biological variability
- Sources of error
  - Image artifacts
    - Dust / bubbles in array
    - Spillover from bright spots to neighboring dark spots
  - Self- / cross-hybridization
    - cDNAs hybridize with each other or with mismatched probes
DNA Microarray – Image Processing
- Approach
  1. Scan the array
  2. Quantify each spot
  3. Subtract background
  4. Normalize intensity (across samples)
  5. Calculate expression ratios (log scale) vs. control
  6. Export table of fluorescent intensities for each gene
- Affymetrix software
  - Automatic image processing
  - Precision
    - Around 2% variation in measurements
    - Less than normal biological variability
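Steps 3–6 of the approach can be sketched for a two-channel experiment (sample vs. control). The sketch assumes spot quantification (step 2) has already produced foreground and background intensities per gene, and it uses global median scaling as the normalization; that is one simple choice among several, not necessarily what any vendor software does. Function names are hypothetical.

```python
import math
from statistics import median

# Sketch of steps 3-6 for a two-channel experiment (sample vs. control).
# Assumptions: spot quantification (step 2) already produced foreground and
# background intensities per gene; normalization is global median scaling.


def process_spots(spots: dict[str, tuple[float, float, float, float]],
                  floor: float = 1.0) -> dict[str, float]:
    """spots maps gene -> (sample_fg, sample_bg, control_fg, control_bg).
    Returns gene -> log2(sample / control) after background subtraction
    and median normalization."""
    # Step 3: subtract background; clip at `floor` so the log stays finite.
    sample = {g: max(s_fg - s_bg, floor) for g, (s_fg, s_bg, _, _) in spots.items()}
    control = {g: max(c_fg - c_bg, floor) for g, (_, _, c_fg, c_bg) in spots.items()}
    # Step 4: scale each channel so its median net intensity becomes 1.
    s_med, c_med = median(sample.values()), median(control.values())
    # Step 5: expression ratio on a log2 scale (0 = no change, +1 = 2x higher).
    return {g: math.log2((sample[g] / s_med) / (control[g] / c_med)) for g in spots}


def export_table(ratios: dict[str, float]) -> str:
    """Step 6: tab-separated table, one row per gene."""
    rows = [f"{gene}\t{ratio:.3f}" for gene, ratio in sorted(ratios.items())]
    return "\n".join(["gene\tlog2_ratio"] + rows)
```

Working on a log2 scale makes up- and down-regulation symmetric: a 2-fold increase is +1 and a 2-fold decrease is −1.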
Microarray Image Processing – Expression Ratio
- Calculating expression ratios
  - After filtering, correction, & normalization
  - Find genes with large contrasts in expression level
  - Provides data for a single microarray
[Figure: ratio of signal intensity, Cy5 signal (log2) vs. Cy3 signal (log2), with cutoffs at ratio > +2x and ratio < –2x]
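The ±2x cutoffs correspond to |log2(Cy5/Cy3)| > 1. A minimal sketch, assuming the intensities passed in are already background-corrected and normalized; the label strings are illustrative.

```python
import math

# Flag genes whose normalized Cy5/Cy3 ratio exceeds +2x or falls below -2x,
# i.e. |log2 ratio| > 1. Assumes inputs are already background-corrected and
# normalized; the label names are illustrative.


def classify_fold_change(cy5: dict[str, float], cy3: dict[str, float],
                         fold: float = 2.0) -> dict[str, str]:
    cutoff = math.log2(fold)  # 1.0 for a 2-fold change
    labels = {}
    for gene in cy5:
        m = math.log2(cy5[gene] / cy3[gene])  # log2 ratio of the two channels
        if m > cutoff:
            labels[gene] = "cy5_up"
        elif m < -cutoff:
            labels[gene] = "cy3_up"
        else:
            labels[gene] = "unchanged"
    return labels
```

A fixed fold cutoff is easy to read off the scatter plot, but it ignores measurement noise, which is larger for dim spots.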
Biological Networks & DNA Microarrays
- Overview
  - Biological networks
  - DNA microarray construction
  - Microarray data analysis
DNA Microarray – Experiment Data
- Experiment design
  - Measure levels of many mRNAs at once (i.e., a single microarray test)
  - As one or more experimental conditions vary
    - Time elapsed
    - Pathogen / drug exposure
    - Different tissues
  - Result is multidimensional data
    - mRNA level × tissue × drug exposure × time × …
- Types of questions
  - Which genes are up- / down-regulated?
  - Which genes are over- / under-expressed in the diseased state?
  - What gene regulation networks exist?
  - Need rigorous statistical analysis to determine significance!
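One way to attach a significance level to "is this gene over-expressed in the diseased state?" is an exact permutation test on the difference in mean expression between two groups of replicate arrays. The sketch below is a swapped-in illustration, not a method from the lecture. Enumerating every relabeling only works for small groups; with 3 vs. 3 replicates the smallest attainable two-sided p-value is 2/20 = 0.1. When thousands of genes are tested at once, the p-values also need a multiple-testing correction.

```python
from itertools import combinations
from statistics import mean

# Exact two-sided permutation test on the difference in mean expression of one
# gene between two groups of replicate arrays (e.g. healthy vs. diseased).
# Enumerating every relabeling is only feasible for small groups.


def permutation_p_value(group_a: list[float], group_b: list[float]) -> float:
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    observed = abs(mean(group_a) - mean(group_b))
    extreme = total = 0
    for chosen in combinations(range(len(pooled)), n_a):
        in_a = set(chosen)
        a = [pooled[i] for i in chosen]
        b = [pooled[i] for i in range(len(pooled)) if i not in in_a]
        total += 1
        # small tolerance so floating-point noise does not drop exact ties
        if abs(mean(a) - mean(b)) >= observed - 1e-12:
            extreme += 1
    return extreme / total
```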
DNA Microarray – Data Analysis
- Interpreting microarray data (vs. time)
  - Gene A expressed after gene B
    - Suggests B positively regulates A
  - Gene A expression stops after gene B is expressed
    - Suggests B negatively regulates A
  - Genes A & B expressed independently
    - Suggests A & B do not regulate each other
  - Genes A & B expressed at the same time
    - Suggests A & B are co-regulated
  - Genes A & B not expressed at the same time
    - Suggests A & B are not co-regulated
  - Etc.
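"Gene A expressed after gene B" can be checked heuristically with a lagged correlation: shift A's time course earlier by a few time points and see at which lag it best matches B. The sketch below is an assumption-laden illustration (equally spaced time points, Pearson correlation, hypothetical function names); a strong lagged correlation only suggests regulation, it does not prove it. A strongly negative correlation would be the analogous hint for negative regulation.

```python
import math

# Heuristic for time-course data: if A's profile tracks B's profile shifted
# later in time, B is a candidate regulator of A. Assumes equally spaced time
# points. Correlation only suggests such a relationship; it does not prove it.


def pearson(x: list[float], y: list[float]) -> float:
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # constant profile: correlation undefined, treat as none
    return sxy / math.sqrt(sxx * syy)


def best_lag(a: list[float], b: list[float], max_lag: int) -> tuple[int, float]:
    """Return (lag, r) maximizing corr(B[t], A[t + lag]) for 0 <= lag <= max_lag.
    A positive best lag with high r suggests A responds after B."""
    best = (0, pearson(b, a))
    for lag in range(1, max_lag + 1):
        r = pearson(b[:len(b) - lag], a[lag:])
        if r > best[1]:
            best = (lag, r)
    return best
```

For example, with `b = [0, 1, 5, 1, 0, 0, 0]` and `a = [0, 0, 0, 1, 5, 1, 0]`, `best_lag(a, b, 3)` picks a lag of 2 time points with r ≈ 1.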
DNA Microarray – Data Analysis
- Higher-level microarray data analysis
  - Clustering and pattern detection
  - Data mining and visualization
  - Controls and normalization of results
  - Statistical validation
  - Linkage between gene expression data and gene sequence / function / metabolic pathway databases
  - Discovery of common sequences in co-regulated genes
  - Meta-studies using data from multiple experiments
- Goals
  - Discover genes, genetic networks
  - Classification of biological processes
  - Infer biological function
DNA Microarray – Multivariate Analysis
- Multivariate analysis
  - Analyzing data with multiple response variables
  - Multidimensional data from multiple experimental factors
- Approaches
  - Hierarchical vs. non-hierarchical
  - Divisive vs. agglomerative
  - Supervised vs. unsupervised
- Clustering
  - Separating data into related groups (clusters)
  - Uses
    - Find genes with similar expression patterns
    - Find relationships between expression patterns
DNA Microarray – Multivariate Analysis
- Clustering approaches
  - Hierarchical clustering
    - Link similar genes, build tree
  - K-means clustering
    - Separate data into exactly K clusters (for predetermined K)
  - Self-organizing maps (SOM)
    - Genes find similar groups (using neural networks)
  - Principal component analysis
    - Treat every gene as a dimension (vector)
    - Separate genes using singular value decomposition (SVD)
  - Support vector machine
    - Train machine on labeled training examples
    - Use trained machine to classify genes
DNA Microarray – Pairwise Distances
- Clustering method may require calculating distances
- Metric distances
  - Satisfy 4 conditions (for all x, y, z)
    - Non-negative → d(x,y) ≥ 0
    - Symmetric → d(x,y) = d(y,x)
    - Zero distance to self → d(x,x) = 0
    - Triangle inequality → d(x,y) ≤ d(x,z) + d(z,y)
  - Example – Euclidean distance
- Semi-metric distances
  - Satisfy first 3 conditions only (not triangle inequality)
  - Example – distance based on the Pearson correlation coefficient r (e.g., 1 − r)
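The difference between a metric and a semi-metric can be shown directly. The sketch below computes Euclidean distance and the correlation-based distance 1 − r (one common choice, assumed here), then uses three hand-picked profiles for which 1 − r violates the triangle inequality. Note that 1 − r compares the shape of expression profiles, not their magnitude: a profile and twice that profile are at distance 0.

```python
import math

# Euclidean distance (a metric) and correlation distance 1 - r (a semi-metric).
# The three profiles at the bottom are chosen so that 1 - r breaks the
# triangle inequality.


def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_distance(x, y):
    # 0 for perfectly correlated profiles, 2 for perfectly anti-correlated ones
    return 1.0 - pearson(x, y)


x, z, y = (1, -1, 0), (1, 0, -1), (0, 1, -1)
d_xy = correlation_distance(x, y)                                   # 1.5
d_xz_zy = correlation_distance(x, z) + correlation_distance(z, y)   # 0.5 + 0.5
print(d_xy > d_xz_zy)  # True: triangle inequality fails
```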
DNA Microarray – Cluster Distances
- Merging clusters by minimizing…
  - Minimum distance between members of the two clusters (single linkage)
  - Maximum distance between members of the two clusters (complete linkage)
  - Average distance between members of the two clusters (UPGMA)
  - Distance between cluster centers (centroid)
- Choice depends on desired efficiency & robustness
  - Single linkage less robust (prone to chaining)
[Figure: single linkage; complete linkage; UPGMA / centroid]
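The four cluster-distance definitions translate directly into code. In this sketch a cluster is a list of expression vectors and `d` is any pairwise distance function (Euclidean by default, an assumption); the function names are illustrative.

```python
import math
from itertools import product
from statistics import mean

# The four cluster-distance definitions. A cluster is a list of expression
# vectors; d is any pairwise distance (Euclidean by default).


def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def single_linkage(A, B, d=euclidean):
    return min(d(a, b) for a, b in product(A, B))    # closest pair


def complete_linkage(A, B, d=euclidean):
    return max(d(a, b) for a, b in product(A, B))    # farthest pair


def average_linkage(A, B, d=euclidean):
    return mean(d(a, b) for a, b in product(A, B))   # UPGMA: mean over all pairs


def centroid_linkage(A, B, d=euclidean):
    def center(C):
        return [mean(col) for col in zip(*C)]        # coordinate-wise mean
    return d(center(A), center(B))
```

For the one-dimensional clusters A = {0, 1} and B = {3, 5}, these give 2 (single), 5 (complete), 3.5 (UPGMA) and 3.5 (centroid).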
DNA Microarray – Hierarchical Clustering
- Approach
  - Bottom-up approach (agglomerative)
    - Begin with each gene in its own cluster
    - Repeatedly merge closest clusters
  - Top-down approach (divisive)
    - Begin with all genes in same cluster
    - Repeatedly split clusters into parts
  - Produces dendrogram (rooted tree)
[Figure: clustered expression data, genes vs. time]
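A naive bottom-up (agglomerative) clustering fits in a few lines. This sketch assumes Euclidean distance and average linkage (UPGMA). It records each merge; read in order, the merges define the dendrogram. It recomputes every cluster distance each round, so it is only suitable for small examples.

```python
import math
from itertools import combinations

# Naive agglomerative clustering with average linkage (UPGMA) and Euclidean
# distance (assumptions). Each merge is recorded; together the merges define
# the dendrogram. Recomputes all distances each round: small inputs only.


def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def agglomerate(profiles: dict[str, tuple[float, ...]]):
    """profiles maps gene -> expression vector.
    Returns a list of (cluster_a, cluster_b, distance) in merge order."""

    def cluster_distance(A, B):
        ds = [euclidean(profiles[a], profiles[b]) for a in A for b in B]
        return sum(ds) / len(ds)

    clusters = [(gene,) for gene in profiles]   # start: one gene per cluster
    merges = []
    while len(clusters) > 1:
        A, B = min(combinations(clusters, 2), key=lambda pair: cluster_distance(*pair))
        merges.append((A, B, cluster_distance(A, B)))
        clusters.remove(A)
        clusters.remove(B)
        clusters.append(A + B)                  # merged cluster
    return merges
```

With profiles `g1 = (0, 0)`, `g2 = (0, 1)` and `g3 = (10, 10)`, the first merge joins g1 and g2 at distance 1.0, and the second merge joins that pair with g3.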
Microarray – Iterative Clustering Methods
- K-means clustering
  1. Pick K vectors
  2. Assign genes to closest of K vectors
  3. Pick new K vectors as center of each cluster
  4. Repeat until clusters are stable
- Self-organizing maps (SOM)
  1. Pick K partitions
  2. User defines geometric configuration for partitions
  3. Generate random vector for each partition
  4. Randomly pick gene
  5. Adjust closest vector (and, to a lesser degree, its neighbors in the configuration) to be more similar to vector for gene
  6. Repeat until vectors are stable
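Both iterative methods can be sketched compactly. Assumptions, labeled here and in the comments: squared Euclidean distance, a fixed random seed, and for the SOM a 1-D chain of nodes as the user-defined geometry, with a neighborhood that covers immediate neighbors only during the first half of training. The SOM runs a fixed number of steps with a decaying learning rate as a stand-in for "repeat until vectors are stable".

```python
import random

# Minimal sketches of K-means and a SOM. Assumptions: squared Euclidean
# distance, a fixed seed, and (SOM) a 1-D chain of nodes as the geometry with
# a neighborhood that shrinks to the winner alone halfway through training.


def sq_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, seed=0, max_iter=100):
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]            # 1. pick K vectors
    assignment = None
    for _ in range(max_iter):
        new = [min(range(k), key=lambda c: sq_dist(p, centers[c]))  # 2. assign genes
               for p in points]
        if new == assignment:                                     # 4. stable: stop
            break
        assignment = new
        for c in range(k):                                        # 3. new centers
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment, centers


def som_1d(points, k, epochs=100, seed=0, lr0=0.5):
    rng = random.Random(seed)
    lo = [min(col) for col in zip(*points)]
    hi = [max(col) for col in zip(*points)]
    # 3. random starting vector per node, drawn inside the data's bounding box
    nodes = [[rng.uniform(l, h) for l, h in zip(lo, hi)] for _ in range(k)]
    steps = epochs * len(points)
    for t in range(steps):
        frac = t / steps
        lr = lr0 * (1.0 - frac)                  # learning rate decays toward 0
        p = rng.choice(points)                   # 4. randomly pick a gene
        win = min(range(k), key=lambda j: sq_dist(p, nodes[j]))
        for j in range(k):
            if j == win:
                h = 1.0                          # 5. closest node moves most
            elif abs(j - win) == 1 and frac < 0.5:
                h = 0.5                          # chain neighbors move less, early on
            else:
                continue
            nodes[j] = [v + lr * h * (x - v) for v, x in zip(nodes[j], p)]
    assignment = [min(range(k), key=lambda j: sq_dist(p, nodes[j])) for p in points]
    return assignment, nodes
```

The neighbor update is what distinguishes a SOM from online K-means: nodes that are adjacent in the chosen geometry end up representing similar expression patterns.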
DNA Microarray – SOMs from GeneCluster
[Figure]
DNA Microarray – Multivariate Analysis
- Principal component analysis
  - Linear method
  - Treat every gene as a dimension (vector)
  - Separate genes using singular value decomposition (SVD)
    - Finds linear combinations of vectors to separate data
    - Diagonalization of covariance matrix
  - Projects complex data sets onto a reduced-dimensionality space
  - Easier to pick out clusters (for use with K-means, SOM)
- Support vector machine
  - Supervised learning approach
  - Start with positive / negative examples (training set)
  - Train machine to recognize classes
  - Use trained machine to classify data
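Two short sketches follow, with the substitutions stated up front. (1) The leading principal component is found by diagonalizing the covariance matrix with power iteration rather than a full SVD; both yield the same top direction. (2) A linear support vector machine is trained by stochastic sub-gradient descent on the hinge loss (a Pegasos-style update), with the bias folded into the weight vector as a simplification. Neither is meant to reproduce any particular package.

```python
import math
import random

# (1) Leading principal component via power iteration on the covariance matrix.
# (2) Linear SVM via stochastic sub-gradient descent on the hinge loss
#     (Pegasos-style); the bias is folded into the weights, a simplification.


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def first_principal_component(rows, iters=200):
    n, m = len(rows), len(rows[0])
    means = [sum(col) / n for col in zip(*rows)]
    centered = [[x - mu for x, mu in zip(r, means)] for r in rows]
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(m)]
           for i in range(m)]
    v = [1.0] * m                        # start vector (assumed not orthogonal to PC1)
    for _ in range(iters):
        w = [dot(row, v) for row in cov]
        norm = math.sqrt(dot(w, w))
        if norm == 0:
            break                        # no variance in the data
        v = [x / norm for x in w]
    scores = [dot(r, v) for r in centered]  # 1-D projection of each sample
    return v, scores


def train_linear_svm(X, y, lam=0.01, epochs=1000, seed=0):
    """X: feature vectors; y: labels in {+1, -1}. Returns weights (last = bias)."""
    rng = random.Random(seed)
    Xb = [list(x) + [1.0] for x in X]    # constant feature acts as bias
    w = [0.0] * len(Xb[0])
    t = 0
    for _ in range(epochs):
        for i in rng.sample(range(len(Xb)), len(Xb)):   # shuffled pass
            t += 1
            eta = 1.0 / (lam * t)
            violates = y[i] * dot(w, Xb[i]) < 1          # inside the margin?
            w = [(1 - eta * lam) * wj for wj in w]       # regularization shrink
            if violates:
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, Xb[i])]
    return w


def predict(w, x):
    return 1 if dot(w, list(x) + [1.0]) >= 0 else -1
```

For perfectly correlated two-column data such as `[[1, 1], [2, 2], [3, 3], [4, 4]]`, the first component is the diagonal direction (1/√2, 1/√2), and the projected scores preserve the samples' order.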
DNA Microarray – Multivariate Analysis
- Observations
  - Distance metric very important
  - Clustering method less important
  - Choosing number / sizes of clusters
    - Need to examine data
    - Look for large gaps between data
  - Handling large multidimensional data sets
    - Dimension reduction
    - Visualization techniques
  - Handling noise in data
    - Robust metrics for calculating distance between genes / clusters
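"Look for large gaps" can be turned into a simple heuristic for hierarchical clustering: sort the merge heights and cut the tree inside the largest jump. This is one possible rule (an assumption, with a hypothetical function name). It presumes merge heights never decrease as merging proceeds, which holds for single, complete and average linkage but not always for centroid linkage.

```python
# Heuristic: cut a hierarchical clustering inside the largest gap between
# consecutive merge heights. Assumes heights do not decrease as merging
# proceeds (true for single, complete and average linkage).


def clusters_from_largest_gap(merge_heights: list[float], n_items: int) -> int:
    """Number of clusters left if merging stops just before the largest gap."""
    hs = sorted(merge_heights)
    if len(hs) < 2:
        raise ValueError("need at least two merges to compare gaps")
    gaps = [hs[i + 1] - hs[i] for i in range(len(hs) - 1)]
    i = max(range(len(gaps)), key=gaps.__getitem__)
    merges_kept = i + 1            # merges performed below the gap
    return n_items - merges_kept   # each merge reduces the cluster count by one
```

For 5 genes whose merge heights are 1.0, 1.1, 1.2 and 9.0, the largest gap lies before the last merge, so the rule suggests 2 clusters.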
DNA Microarray – Multivariate Analysis
- Cluster / TreeView analysis software
  - Permutes gene order (by cluster) for display
DNA Microarray – Experimental Data
- Information needed for each microarray spot
  - Gene ID
  - Signal, background intensity
  - Array characteristics (layout, substrate, date produced)
  - Hybridization conditions (method, buffer composition)
  - Labeling conditions (method, enzyme, fluorochrome)
  - RNA extraction conditions (method, mass of tissue)
  - Tissue treatment conditions (type, duration, intensity)
  - Etc.
- Can store as entries in a relational database (RDBMS)
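A minimal relational layout for some of the fields listed above, using Python's built-in `sqlite3` module. The table and column names are hypothetical, and a full MIAME-style schema would need many more tables (samples, extraction, treatment, protocols); this only shows how per-spot values link back to hybridization and array records.

```python
import sqlite3

# Minimal relational layout for per-spot results plus a few of the
# experimental conditions listed above. Table and column names are
# hypothetical; a full MIAME-style schema would be much larger.
SCHEMA = """
CREATE TABLE microarray (
    array_id           INTEGER PRIMARY KEY,
    layout             TEXT,
    substrate          TEXT,
    date_produced      TEXT
);
CREATE TABLE hybridization (
    hyb_id             INTEGER PRIMARY KEY,
    array_id           INTEGER NOT NULL REFERENCES microarray(array_id),
    method             TEXT,
    buffer_composition TEXT,
    label_method       TEXT,
    fluorochrome       TEXT
);
CREATE TABLE spot (
    hyb_id             INTEGER NOT NULL REFERENCES hybridization(hyb_id),
    gene_id            TEXT NOT NULL,
    signal_int         REAL,
    background_int     REAL,
    PRIMARY KEY (hyb_id, gene_id)
);
"""


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def net_signals(conn: sqlite3.Connection, hyb_id: int) -> list[tuple[str, float]]:
    """Background-subtracted signal for every spot of one hybridization."""
    cur = conn.execute(
        "SELECT gene_id, signal_int - background_int FROM spot "
        "WHERE hyb_id = ? ORDER BY gene_id",
        (hyb_id,),
    )
    return cur.fetchall()
```

Keeping conditions in their own tables means a single hybridization record is shared by thousands of spot rows, rather than repeating the buffer, dye and array details on every spot.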
DNA Microarray – Experimental Data
- Standard for microarray experiments
  - MIAME: Minimum Information About a Microarray Experiment
- Data required includes
  - Experimental design: whole set of hybridization experiments
  - Array design: each array used, each element (spot) on array
  - Samples: samples used, extract preparation, and labeling
  - Hybridization: procedures and parameters
  - Measurements: images, quantitation, specifications
  - Normalization controls: types, values, specifications
DNA Microarray – Other Uses
- Mutation detection
  - Microarray probes representing all known alleles
  - Mismatch probes detect single-nucleotide mutations (SNPs)
- Disease diagnosis
  - Accurate expression profiles of diseases (especially cancer)
  - Example
    - Diffuse large B-cell vs. follicular lymphomas
    - Microarray analysis of cancer tissue found significant differences in expression level of 30 of 6,817 human genes
    - 91% correct diagnosis rate, a substantial improvement
    - Microarray analysis after treatment predicts survival rates
- Gene finding
  - High-throughput sampling of expressed mRNA
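Allele-specific probes can be illustrated with a toy genotype caller. The sketch assumes one probe per possible base at the SNP position (placed at the probe's center), background-corrected intensities keyed by allele, and an arbitrary heterozygote threshold. All of these are illustrative assumptions, and the helper names are hypothetical rather than taken from any genotyping product.

```python
# Toy genotype call from allele-specific probes (hypothetical helpers).
# Assumptions: one probe per possible base at the SNP position (the probe's
# central base), background-corrected intensities keyed by allele, and an
# arbitrary heterozygote threshold.


def allele_probes(probe: str) -> dict[str, str]:
    """Variants of a probe carrying each possible base at its central position."""
    center = len(probe) // 2
    return {base: probe[:center] + base + probe[center + 1:] for base in "ACGT"}


def call_snp(intensity: dict[str, float], het_ratio: float = 0.5) -> str:
    """Call a diploid genotype from per-allele probe intensities."""
    ranked = sorted(intensity.items(), key=lambda kv: kv[1], reverse=True)
    (a1, i1), (a2, i2) = ranked[0], ranked[1]
    if i1 <= 0:
        return "no call"
    if i2 / i1 >= het_ratio:          # two alleles with similar signal
        return "".join(sorted(a1 + a2))
    return a1 + a1                    # one dominant allele: homozygous
```

A strong signal on a single allele's probe indicates a homozygote; two comparably strong signals indicate a heterozygote.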
DNA Microarray – Summary
- DNA microarray
  - Can detect / identify a wide variety of mRNAs in a single experiment
  - Requires much data processing at all levels
    - Image processing
    - Data filtering (for single array)
    - Data analysis (for multiple arrays)
  - Can yield much useful data
  - Collection & storage of microarray data not yet standardized