Introduction to DNA Microarray
Neha Jain
Lecturer
School of Biotechnology
Devi Ahilya University, Indore
Genes can be regulated at many levels.
Usually, when we speak of gene regulation, we are referring to transcriptional regulation. The complete set of transcripts produced from all the genes being transcribed is referred to as the “transcriptome.”
• Transcription (the “transcriptome”)
• Post-transcription (RNA stability)
• Post-transcription (translational control)
• Post-translation (not considered gene regulation)
[Figure: DNA → RNA (transcription) → protein (translation)]
• In the last dozen years, it has become possible to look at the entire transcriptome in a single experiment.
• High throughput: simultaneous analysis of all genes in a genome.
• The high-throughput analysis of all expressed genes is termed transcriptome analysis: the expression analysis of the full set of RNA molecules produced by a cell under a given set of conditions.
• Transcriptome analysis helps us understand how sets of genes work together to form metabolic, regulatory, and signalling pathways within the cell.
Genomic analysis of gene expression
• Methods capable of giving a “snapshot” of the RNA expression of all genes
• Can be used as a diagnostic profile
– Example: cancer diagnosis
• Can show how RNA levels change during development, after exposure to a stimulus, during the cell cycle, etc.
• Provide large amounts of data
• Can help us start to understand how whole systems function
Types of Gene Expression Analysis
• While there are a number of variations, there are essentially two basic ways of doing expressed gene analysis: sequencing-based methods and microarrays.
• These have largely replaced older methods such as subtractive hybridization and differential display.
• Sequencing-based methods are very powerful but have typically been prohibitively expensive.
• However, with recent advances in low-cost, high-throughput next-generation sequencing, these methods, referred to as “RNA-seq,” are becoming more common and may soon be dominant.
RNA-seq
• Although details of the methods vary, the concept
behind RNA-seq is simple:
• Isolate all mRNA
• Convert to cDNA using reverse transcriptase
• Sequence the cDNA
• Map sequences to the genome
• The more times a given sequence is detected, the more
abundantly transcribed it is.
• If enough sequences are generated, a comprehensive
and quantitative view of the entire transcriptome of an
organism or tissue can be obtained.
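The counting idea above can be sketched in a few lines of Python. This is a minimal illustration, not a real RNA-seq pipeline: it assumes reads have already been mapped and that each read is represented only by a hypothetical gene ID, and it scales raw counts to counts per million (CPM), one simple library-size normalization that the slides do not describe.

```python
from collections import Counter

def counts_per_million(mapped_reads):
    """Count reads assigned to each gene and scale to counts per million (CPM).

    mapped_reads: iterable of gene IDs, one per mapped read (hypothetical input format).
    """
    counts = Counter(mapped_reads)
    total = sum(counts.values())
    return {gene: n * 1_000_000 / total for gene, n in counts.items()}

# Hypothetical gene IDs for six mapped reads
reads = ["geneA", "geneB", "geneA", "geneC", "geneA", "geneB"]
cpm = counts_per_million(reads)
print(cpm["geneA"])  # 500000.0: half of all reads map to geneA
```

The more often a gene's sequence is detected, the larger its CPM value, which is the "more detections, more abundant transcript" rule expressed as a number.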
DNA microarrays
• Microarrays may eventually be eclipsed by sequence-based methods, but in the meantime they have become extremely popular since their inception in 1995 (Schena et al. (1995) Science 270:467-70).
• DNA microarrays rely on the hybridization properties of nucleic acids to monitor DNA or RNA abundance on a genomic scale in different types of cells.
• In other words, the principle behind microarrays is the ability of complementary strands of DNA (or DNA and RNA) to hybridize to one another with high specificity.
[Figure: nucleic acid hybridization]
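The base-pairing rule behind hybridization can be illustrated with a toy check in Python. This sketch assumes perfect Watson-Crick matching only (A with T, G with C); real hybridization also tolerates some mismatches and depends on temperature, salt, and sequence composition, none of which is modelled here. The probe and target sequences are made up.

```python
# Complement table for DNA bases (uppercase A, C, G, T only)
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (read 5'->3')."""
    return seq.upper().translate(COMPLEMENT)[::-1]

def hybridizes(probe, target):
    """Perfect-match test: the target must contain the probe's reverse complement."""
    return reverse_complement(probe) in target.upper()

probe = "ATGCGT"                        # hypothetical immobilized probe
print(reverse_complement(probe))        # ACGCAT
print(hybridizes(probe, "TTACGCATGG"))  # True: contains ACGCAT
print(hybridizes(probe, "TTTTTTTTTT"))  # False
```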
Introduction
• A microarray (or gene chip) is a slide bearing a high-density array of immobilized DNA oligomers (sometimes cDNAs) representing the entire genome of the species under study.
• Each DNA probe is attached to a solid support
– Glass, plastic, or nylon
• Each oligomer is spotted on the slide and serves as a probe for binding to a unique, complementary cDNA.
• The cDNA population, labelled with fluorescent dyes or radioisotopes, is allowed to hybridize with the oligo probes on the chip.
• The amount of fluorescent or radioactive label at each spot position reflects the amount of the corresponding mRNA in the cell.
• Sets of genes involved in the same regulatory or metabolic pathways can potentially be identified.
The Process
Building the chip:
• Massive PCR
• PCR purification and preparation
• Preparing slides
• Printing
DNA/RNA preparation:
• Cell culture and harvest
• RNA isolation
• cDNA production
Hybridizing the chip:
• Post-processing
• Probe labeling
• Array hybridization
• Data analysis
• For each spot on the microarray, red and green fluorescence signals are recorded.
• The two fluorescence images from the scanner are then overlaid to create a composite image, which indicates the relative expression level of each gene.
• Thus, the measurement from the composite image reflects the ratio of the two color intensities.
• If a gene is expressed at a higher level in the experimental condition (red) than in the control (green), the spot appears reddish.
• If the gene is expressed at a lower level than in the control, the spot appears greenish.
• Unchanged gene expression, with equal amounts of green and red fluorescence, results in a yellow spot.
• The colored image is stored as a computer file (in TIFF format) for further processing.
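The red/green/yellow reading of a spot amounts to comparing the two channel intensities. The sketch below uses the log2 ratio of red (experimental) to green (control) intensity; the tolerance of 0.5 that still counts as "yellow" is a hypothetical choice for illustration, not a standard value.

```python
import math

def spot_color(red, green, tolerance=0.5):
    """Classify a spot from its red (experimental) and green (control) intensities.

    tolerance is a hypothetical cutoff on |log2(red/green)| below which the
    two signals are treated as equal (a yellow spot).
    """
    log_ratio = math.log2(red / green)
    if log_ratio > tolerance:
        return "red"     # higher expression in the experimental sample
    if log_ratio < -tolerance:
        return "green"   # lower expression than in the control
    return "yellow"      # roughly equal red and green signal

print(spot_color(4000, 1000))  # red    (log2 ratio = 2)
print(spot_color(500, 2000))   # green  (log2 ratio = -2)
print(spot_color(1000, 1100))  # yellow (log2 ratio is about -0.14)
```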
Microarray life cycle
[Figure: biological question → sample preparation → microarray reaction → microarray detection → data analysis, leading back to the biological question. Taken from Schena & Davis]
Steps of Microarray Experiment
A typical DNA microarray experiment involves a
multistep procedure:
• Fabrication of microarrays by fixing properly designed oligonucleotides representing specific genes;
• Hybridization of cDNA populations onto the microarray;
• Scanning of hybridization signals and image analysis;
• Transformation and normalization of data;
• Analysis of the data to identify differentially expressed genes as well as sets of genes that are co-regulated.
Some Important Points about Microarrays
• DNA microarrays are generated by fixing oligonucleotides onto a solid support such as a glass slide using a robotic device.
• The probes should be specific enough to minimize cross-hybridization with non-target genes.
• This requires BLAST searches against genome databases to find sequence regions with the least sequence similarity to non-target genes.
• The probes should be sensitive and devoid of low-complexity regions (such as a string of identical nucleotides).
• The oligonucleotide sequences should not form stable internal secondary structures.
• A number of programs have been developed for designing probe sequences for microarray spotting, for example:
OligoWiz
OligoArray
Image Processing
• The purpose of image processing is to locate and quantify hybridization spots and to separate true hybridization signals from background noise.
• The background noise and artifacts that must be dealt with in this step include nonspecific hybridization, unevenness of the slide surface, and the presence of contaminants such as dust on the surface of the slide.
• Computer programs are used to correctly locate the boundaries of the spots and measure the intensities of the spot images after subtracting the background pixels.
• After the background noise is subtracted, the array signals are converted into numbers and reported as ratios between Cy5 and Cy3 for each spot.
ArrayDB (http://genome.nhgri.nih.gov/arraydb/), ScanAlyze (http://rana.lbl.gov/EisenSoftware.htm), and TIGR Spotfinder (http://www.tigr.org/softlab/) are Windows programs for microarray image processing using the TIFF image format.
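Background subtraction and the per-spot Cy5/Cy3 ratio can be sketched as follows. Summarizing spot and background pixels by their medians and clipping corrected intensities at a floor of 1.0 are assumptions made for this illustration; image-processing programs differ in how they define spot and background regions. The pixel values are invented.

```python
from statistics import median

def background_corrected(spot_pixels, background_pixels, floor=1.0):
    """Median spot intensity minus median local background intensity.

    The result is clipped at a small positive floor (a hypothetical choice)
    so that the later Cy5/Cy3 ratio never divides by zero or a negative number.
    """
    return max(median(spot_pixels) - median(background_pixels), floor)

def cy5_cy3_ratio(cy5_spot, cy5_bg, cy3_spot, cy3_bg):
    """Background-corrected Cy5/Cy3 ratio for one spot."""
    return background_corrected(cy5_spot, cy5_bg) / background_corrected(cy3_spot, cy3_bg)

# Invented pixel intensities for one spot in each channel
ratio = cy5_cy3_ratio(
    cy5_spot=[1200, 1250, 1300], cy5_bg=[200, 210, 190],
    cy3_spot=[700, 650, 600], cy3_bg=[150, 140, 160],
)
print(ratio)  # (1250 - 200) / (650 - 150) = 2.1
```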
Data Transformation and Normalization
• Following image processing, the digitized gene expression data need to be further processed before differentially expressed genes can be identified.
• This processing is referred to as data normalization and is designed to correct bias owing to variations in microarray data collection rather than intrinsic biological differences.
• When the raw Cy5 fluorescence intensity is plotted against Cy3, most of the data are clustered near the bottom left of the plot, showing a non-normal distribution of the raw data.
• One way to improve data discrimination is to transform the raw Cy5 and Cy3 values by taking the logarithm to base 2.
• This has the major advantage that it treats differential up-regulation and down-regulation equally, and it also has a continuous mapping space.
• For example, an expression ratio of 1 gives log2(1) = 0, which represents no change in expression. An expression ratio of 4 gives log2(4) = +2, and an expression ratio of 1/4 gives log2(1/4) = -2.
• Thus, in this transformation the mapping space is continuous, and up-regulation and down-regulation are comparable.
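A quick numerical check of this symmetry: on the raw ratio scale, a fourfold increase (4) and a fourfold decrease (1/4) lie at very different distances from 1, whereas after the log2 transform they lie equally far from 0.

```python
import math

ratios = [1, 4, 1 / 4]
log_ratios = [math.log2(r) for r in ratios]
print(log_ratios)  # [0.0, 2.0, -2.0]

# Distance from "no change" on each scale
print([abs(r - 1) for r in ratios])  # [0, 3, 0.75]: asymmetric
print([abs(m) for m in log_ratios])  # [0.0, 2.0, 2.0]: symmetric
```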
• Normalization: When one compares the expression levels of genes that should not change between the two conditions (say, housekeeping genes), one quite often finds that the average expression ratio of such genes deviates from 1. This may be due to various reasons, for example, variation caused by differential labelling efficiency of the two fluorescent dyes or different amounts of starting mRNA material in the two samples. Thus, in microarray experiments, as in any large-scale experiment, there are many sources of systematic variation that affect measurements of gene expression levels.
• Normalization is the process of eliminating such variations to allow appropriate comparison of the data obtained from the two samples.
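One simple way to remove such a systematic offset, sketched here under the assumption that a set of control genes (for example housekeeping genes) is known not to change, is to shift every log2 ratio by the mean log2 ratio of the controls, so that the controls average a ratio of 1 (log2 ratio 0). This global shift is a simpler alternative to intensity-dependent methods such as LOWESS; which genes count as controls is the experimenter's choice.

```python
from statistics import mean

def global_normalize(log_ratios, control_indices):
    """Shift all log2 ratios so that the control genes have a mean log2 ratio of 0.

    log_ratios: log2(Cy5/Cy3) for every gene on the array.
    control_indices: positions of genes assumed not to change (e.g. housekeeping genes).
    """
    offset = mean(log_ratios[i] for i in control_indices)
    return [m - offset for m in log_ratios]

# Invented data: the two controls (indices 0 and 1) show a spurious +0.5 bias
normalized = global_normalize([0.5, 0.5, 2.5, -1.5], control_indices=[0, 1])
print(normalized)  # [0.0, 0.0, 2.0, -2.0]
```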
One method to normalize the data is LOWESS (locally weighted scatterplot smoother) regression.
The following two freely available software programs specialize in image analysis and data normalization:
Arrayplot
SNOMAD
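The idea behind LOWESS normalization can be sketched in plain Python. Each spot is described by M = log2(Cy5/Cy3) and A = 0.5 * log2(Cy5 * Cy3) (the common "MA" representation, an assumption not spelled out in the slides); a smooth curve of M against A is fitted and subtracted, removing intensity-dependent dye bias. This is a simplified LOWESS: a local linear fit with tricube weights and no robustness iterations. The `frac` default (fraction of spots used in each local fit) is a hypothetical choice; dedicated tools implement the full method.

```python
import math

def lowess_fit(x, y, frac=0.5):
    """Simplified LOWESS: local linear regression with tricube weights.

    For each x[i], the nearest ceil(frac * n) points define a bandwidth h,
    points are weighted by (1 - (d/h)^3)^3, and a weighted straight line is fitted.
    No robustness iterations are performed.
    """
    n = len(x)
    r = min(n, max(2, math.ceil(frac * n)))  # neighbourhood size
    fitted = []
    for xi in x:
        h = sorted(abs(xj - xi) for xj in x)[r - 1] or 1e-12  # bandwidth
        w = [(1 - min(abs(xj - xi) / h, 1.0) ** 3) ** 3 for xj in x]
        sw = sum(w)
        mx = sum(wj * xj for wj, xj in zip(w, x)) / sw
        my = sum(wj * yj for wj, yj in zip(w, y)) / sw
        sxx = sum(wj * (xj - mx) ** 2 for wj, xj in zip(w, x))
        sxy = sum(wj * (xj - mx) * (yj - my) for wj, xj, yj in zip(w, x, y))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(my + slope * (xi - mx))
    return fitted

def lowess_normalize(cy5, cy3, frac=0.5):
    """Return LOWESS-normalized M values (log2 ratios) for paired Cy5/Cy3 intensities."""
    m = [math.log2(r / g) for r, g in zip(cy5, cy3)]
    a = [0.5 * math.log2(r * g) for r, g in zip(cy5, cy3)]
    trend = lowess_fit(a, m, frac)
    return [mi - ti for mi, ti in zip(m, trend)]

# Invented data with a constant dye bias: Cy5 reads twice as bright as Cy3 everywhere
cy3 = [100, 200, 400, 800, 1600, 3200]
cy5 = [2 * g for g in cy3]
print(lowess_normalize(cy5, cy3))  # all values are (close to) 0.0
```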
Statistical Analysis to Identify Differentially Expressed Genes
• One of the reasons to carry out a microarray experiment is to monitor the expression level of genes at a genome scale. The processed data, after the normalization procedure, can then be represented in the form of a matrix, often called the gene expression matrix. Each row in the matrix corresponds to a particular gene, and each column corresponds either to an experimental condition or to a specific time point at which the expression of the genes has been measured. Once we have obtained the gene expression matrix, additional levels of annotation can be added either to the genes or to the samples. For example, the function of the genes can be provided, or additional details on the biology of the samples, such as ‘disease state’ or ‘normal state’.
• Depending on whether the annotation is used or not, the analysis of gene expression data can be classified into two types:
• In supervised learning, we use the annotation of either the genes or the samples to group genes or samples into known classes, in order to identify patterns that are characteristic of each class.
• In unsupervised learning, the expression data are analysed to identify patterns that can group genes or samples into clusters without the use of any form of annotation. For example, genes with similar expression profiles can be clustered together without the use of any annotation.
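The gene expression matrix and its sample annotation can be pictured as a small data structure. The class below is a hypothetical illustration (the names `ExpressionMatrix`, `row`, and `columns_with` are invented here); real analyses usually keep such data in a table or array library. The values are made-up normalized log2 ratios.

```python
from dataclasses import dataclass, field

@dataclass
class ExpressionMatrix:
    genes: list[str]           # row labels: one per gene
    samples: list[str]         # column labels: conditions or time points
    values: list[list[float]]  # values[i][j] = normalized log2 ratio of gene i in sample j
    sample_annotation: dict[str, str] = field(default_factory=dict)  # e.g. "disease" / "normal"

    def row(self, gene: str) -> list[float]:
        """Expression profile of one gene across all samples."""
        return self.values[self.genes.index(gene)]

    def columns_with(self, label: str) -> list[int]:
        """Indices of samples carrying a given annotation label."""
        return [j for j, s in enumerate(self.samples)
                if self.sample_annotation.get(s) == label]

m = ExpressionMatrix(
    genes=["g1", "g2"],
    samples=["s1", "s2", "s3"],
    values=[[0.1, 2.0, 1.8],
            [0.0, -1.2, -1.0]],
    sample_annotation={"s1": "normal", "s2": "disease", "s3": "disease"},
)
print(m.row("g2"))                # [0.0, -1.2, -1.0]
print(m.columns_with("disease"))  # [1, 2]
```

Supervised analyses would use `sample_annotation` (or a gene annotation); unsupervised ones would work from `values` alone.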
Statistical Analysis to Identify Differentially Expressed Genes
• To separate genes that are differentially expressed, a twofold change in the normalized expression ratio is commonly used as a cutoff criterion.
• But a data point above or below the cutoff line could simply be there by chance or because of error.
• The only way to ensure that a gene that appears to be differentially expressed is truly differentially expressed is to perform multiple replicate experiments and to perform statistical testing.
• The repeat experiments provide replicate data points that offer information about the variability of the expression data at a particular condition.
• The main hindrance to obtaining multiple replicate datasets is often the cost: microarray experiments are extremely expensive for regular research laboratories.
• Two tests commonly used for the statistical analysis are ANOVA (analysis of variance) and the t-test.
• Software:
MA-ANOVA
Cyber-T
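Combining the twofold cutoff with replicate-based testing can be sketched as follows for a two-color design, where each replicate array gives one log2 ratio per gene. The sketch computes a one-sample t statistic for the null hypothesis that the mean log2 ratio is 0. The function names and `t_cutoff=4.0` are hypothetical; a real analysis would convert t to a p-value (with n - 1 degrees of freedom) and correct for testing thousands of genes, which dedicated tools handle more carefully. The replicate values are invented.

```python
import math
from statistics import mean, stdev

def one_sample_t(log_ratios):
    """t statistic for H0: mean log2 ratio = 0 across replicate arrays (needs >= 2 values)."""
    n = len(log_ratios)
    return mean(log_ratios) / (stdev(log_ratios) / math.sqrt(n))

def is_candidate(log_ratios, fold=2.0, t_cutoff=4.0):
    """Hypothetical rule: mean change of at least `fold` AND |t| of at least `t_cutoff`."""
    big_change = abs(mean(log_ratios)) >= math.log2(fold)
    return big_change and abs(one_sample_t(log_ratios)) >= t_cutoff

consistent = [1.2, 1.0, 1.1, 0.9]  # invented replicates, mean 1.05 (just over twofold)
noisy = [2.5, -0.5, 1.8, 0.4]      # same mean, much larger spread

print(is_candidate(consistent))  # True  (t is about 16)
print(is_candidate(noisy))       # False (t is about 1.6)
```

Both genes pass the twofold cutoff on average, but only the one with consistent replicates survives the test, which is why replicates matter.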
Microarray Data Clustering
• One of the goals of microarray data analysis is to
cluster genes or samples with similar expression
profiles together, to make meaningful biological
inference about the set of genes or samples.
•The similar expression patterns are often a result of
the fact that the genes involved are in the same
metabolic pathway and have similar functions.
• The genetic basis of the co-regulation could be the result of common promoters and regulatory regions.
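"Similar expression profiles" needs a numerical measure of similarity. The slides do not specify one; the sketch below assumes the Pearson correlation coefficient, a common choice, which is close to +1 for genes that rise and fall together and close to -1 for genes with opposite patterns. The profiles are invented.

```python
import math
from statistics import mean

def pearson(u, v):
    """Pearson correlation between two expression profiles of equal length."""
    mu, mv = mean(u), mean(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    return cov / (su * sv)

gene1 = [1.0, 2.0, 3.0, 4.0]  # expression across four conditions (invented)
gene2 = [2.0, 4.0, 6.0, 8.0]  # same shape, larger amplitude
gene3 = [4.0, 3.0, 2.0, 1.0]  # opposite pattern

print(round(pearson(gene1, gene2), 6))  # 1.0
print(round(pearson(gene1, gene3), 6))  # -1.0
```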
Clustering is one of the unsupervised approaches to classify data into groups of genes or samples with similar patterns that are characteristic of the group.
Clustering methods can be:
• Hierarchical (grouping objects into clusters and specifying relationships among objects in a cluster, resembling a phylogenetic tree). This can be of two types:
– Agglomerative (starting with the assumption that each object is a cluster and grouping similar objects into bigger clusters)
– Divisive (starting from grouping all objects into one cluster and subsequently breaking the big cluster into smaller clusters with similar properties)
• Non-hierarchical (grouping into clusters without specifying relationships between objects in a cluster). Non-hierarchical clustering requires the number of clusters to be determined in advance; it then groups existing objects into these predefined clusters rather than organizing them into a hierarchical structure.
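Agglomerative hierarchical clustering can be sketched naively: start with every gene as its own cluster and repeatedly merge the two closest clusters until one remains, recording the merges as a tree. This illustration assumes Euclidean distance between profiles and average linkage (the mean distance between members of two clusters); both are common choices but are not prescribed by the slides, and the exhaustive pair search is only suitable for small examples.

```python
import math
from itertools import combinations

def euclidean(u, v):
    """Euclidean distance between two expression profiles."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def average_linkage(members_a, members_b, profiles):
    """Mean pairwise distance between the members of two clusters."""
    d = [euclidean(profiles[i], profiles[j]) for i in members_a for j in members_b]
    return sum(d) / len(d)

def agglomerative(profiles):
    """Merge the closest pair of clusters until one is left (needs >= 1 profile).

    Returns the tree of merges as nested tuples of gene indices.
    """
    members = {i: (i,) for i in range(len(profiles))}  # cluster id -> gene indices
    tree = {i: i for i in range(len(profiles))}        # cluster id -> subtree
    next_id = len(profiles)
    while len(members) > 1:
        a, b = min(combinations(members, 2),
                   key=lambda p: average_linkage(members[p[0]], members[p[1]], profiles))
        members[next_id] = members.pop(a) + members.pop(b)
        tree[next_id] = (tree.pop(a), tree.pop(b))
        next_id += 1
    return tree[next_id - 1]

# Invented log2-ratio profiles for four genes across two conditions
profiles = [[1.0, 2.0], [1.1, 2.1], [-1.0, -2.0], [-1.2, -1.9]]
print(agglomerative(profiles))  # ((0, 1), (2, 3))
```

The nested tuples play the role of the phylogenetic-tree-like structure: genes 0 and 1 join first, then 2 and 3, and finally the two groups.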
Experimental Design for Microarrays
There are a number of important experimental
design considerations for a microarray
experiment:
• Technical vs biological replicates
• Amplification of RNA
• Dye swaps
Experimental Design for Microarrays
Technical vs biological replicates
• Technical replicates are repeat hybridizations
using the same RNA isolate
• Biological replicates use RNA isolated from
separate experiments/experimental organisms
Although technical replicates can be useful for reducing variation due to hybridization, imaging, etc., biological replicates are necessary for a properly controlled experiment.
Experimental Design for Microarrays
Amplification of RNA
• Linear amplification methods can be used to increase the amount of RNA so that microarray experiments can be performed using very small numbers of cells. It is not clear to what degree this affects results, especially with respect to rare transcripts, but it seems to be generally acceptable if done correctly.
Experimental Design for Microarrays
Dye swaps
When using two-color arrays, it is important to hybridize replicates using a dye-swap strategy in which the colors (labels) are reversed between the two replicates. This is because there can be biases in hybridization intensity depending on which dye is used (even when the sequence is the same). Normally two dyes are used: Cy5 (red fluorescence, for infected/experimental samples) and Cy3 (green fluorescence, for control samples).
[Figure: dye-swap design with samples S1 and S2]
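The benefit of a dye swap can be seen with a small calculation. Assume, for illustration, that a dye bias adds the same constant to every log2(Cy5/Cy3) ratio. If sample S1 is labelled with Cy5 on the first array and with Cy3 on the second, the forward ratio measures (true + bias) and the reversed ratio measures (-true + bias), so half their difference recovers the true log2(S1/S2) ratio. The function name and the 1.5-fold Cy5 brightening in the example are hypothetical.

```python
import math

def dye_swap_log_ratio(fwd_cy5, fwd_cy3, rev_cy5, rev_cy3):
    """Combine a dye-swap pair into one estimate of log2(S1/S2).

    Forward array: S1 labelled with Cy5, S2 with Cy3.
    Reverse array: S2 labelled with Cy5, S1 with Cy3.
    Assumes the dye bias adds the same constant to both log2 ratios.
    """
    m_forward = math.log2(fwd_cy5 / fwd_cy3)  # = true + bias
    m_reverse = math.log2(rev_cy5 / rev_cy3)  # = -true + bias
    return (m_forward - m_reverse) / 2        # bias cancels

# Invented example: S1 truly has twice the transcript of S2 (true log2 ratio = 1),
# and Cy5 signal reads 1.5 times brighter than Cy3 for the same amount of cDNA.
estimate = dye_swap_log_ratio(fwd_cy5=2 * 1.5, fwd_cy3=1.0, rev_cy5=1 * 1.5, rev_cy3=2.0)
print(round(estimate, 6))  # 1.0
```

The forward array alone would report log2(3), about 1.58, overstating the change; averaging over the swapped pair removes the dye effect under the stated assumption.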