Microarrays
Dr Peter Smooker, [email protected]
Transcription Analysis
• An analysis of transcription rates can inform us
about the activity of a gene: its expression levels,
the tissues it is expressed in, its developmental
expression, etc.
• Traditionally, this was done on a gene-by-gene
basis, once the sequence of that particular gene had
been identified and could be used as a probe. This
was done using Northern blotting (semi-quantitative).
Developments
1. As in almost every field of molecular
biology, PCR revolutionised transcript
analysis. However, it was still done on a
gene-by-gene basis.
2. Genome sequencing projects. These
generated a large number of gene probes
that can be used to analyse global
transcription.
Global transcript analysis
• Theoretically, every gene can be arrayed
and its transcription level analysed.
• Often, a subset is used e.g. immune
response genes.
Microarrays are a discovery technique
• Understanding the genes/proteins involved in
disease
• Bottom-up approach: single genes are analysed. What
does this gene encode? What does the product do? Are
defects in the product involved in disease?
• Top-down approach: identify all genes whose
expression is altered in a particular disease state.
Identify an expression profile.
Microarrays: basic theory
• Spot DNA sequences (genes) onto a chip
• Extract RNA from samples to be analysed
• Convert to cDNA using reverse transcriptase
• Hybridise to chip
• Quantify hybridisation
[Figure labels: Cy3, Cy5]
Discovery….
• Microarrays used to detect yeast genes
regulated in sporulation
• More than 1000 found (many previously
unknown)
• Several were mutated and the phenotype observed:
all strains were defective in sporulation
• Discover function by observing expression
Some applications
• Identify and validate drug targets
• Gene expression in pathogens
• Population genetics
• Disease prognosis
• etc.
Fabricating arrays
• The spots on the array are generally
oligonucleotides or PCR-generated cDNA.
These are arrayed using a robotic arm.
• For RNA expression analysis, glass slides
are used.
• Up to 10,000 spots per slide
Oligonucleotide arrays
• Up to 300,000 oligonucleotides per slide
• Approx. 10 per gene
Scanning
• After hybridisation of the labelled RNA, the
slide is scanned.
• A laser excites each spot. The Cy3 and Cy5
dyes emit fluorescence, which is captured
by a confocal microscope. The classic array
picture is generated (for human perusal).
Data Analysis
• The fluorescence of Cy3 and Cy5 is registered for
each spot, normalised and a ratio between the two
calculated.
• Trivially, greater than 2-fold differences are seen
as significant.
• Often the SD is calculated and used as a measure
of significance.
• As the genes that are often the most interesting are
expressed in low abundance, normalisation and
statistics are important.
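The ratio calculation above can be sketched in a few lines of Python. This is a minimal illustration only: the intensities are made-up values, the global median centring is one simple normalisation scheme among many, and the function names and the `k` multiplier for the SD cut-off are assumptions, not part of the lecture.

```python
import math
from statistics import mean, median, stdev

def log_ratios(cy5, cy3):
    """Return median-centred log2(Cy5/Cy3) ratios for paired spot intensities."""
    raw = [math.log2(r / g) for r, g in zip(cy5, cy3)]
    centre = median(raw)  # assumed normalisation: subtract the global median
    return [x - centre for x in raw]

def fold_change_calls(ratios, fold=2.0):
    """Indices of spots changed by at least `fold` in either direction."""
    cutoff = math.log2(fold)
    return [i for i, x in enumerate(ratios) if abs(x) >= cutoff]

def sd_calls(ratios, k=2.0):
    """Indices of spots lying more than k sample SDs from the mean ratio."""
    m, s = mean(ratios), stdev(ratios)
    return [i for i, x in enumerate(ratios) if abs(x - m) > k * s]

# Hypothetical background-corrected intensities for five spots.
cy5 = [800.0, 200.0, 400.0, 100.0, 1600.0]
cy3 = [200.0, 200.0, 400.0, 400.0, 1600.0]

ratios = log_ratios(cy5, cy3)
two_fold = fold_change_calls(ratios)   # spots with a 2-fold or greater change
by_sd = sd_calls(ratios, k=1.0)        # spots beyond one SD (k is arbitrary here)
```

Working in log2 space makes a 2-fold increase (+1) and a 2-fold decrease (-1) symmetric, which is why the cut-off is applied to the absolute value.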
Expression profile clustering
Cluster genes that give the same
expression pattern over several
experiments/conditions.
Construct a matrix. Each column
is an experiment, each row a gene.
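As a rough sketch of the matrix described above, the following builds a gene-by-experiment table from per-experiment results. The gene names, experiment names and values are hypothetical, and `build_matrix` is an illustrative helper rather than any standard function.

```python
# Hypothetical log2 expression ratios: one dict per experiment, keyed by gene.
experiments = {
    "exp1": {"geneA": 1.0, "geneB": -0.5, "geneC": 0.1},
    "exp2": {"geneA": 2.1, "geneB": -1.2, "geneC": 0.0},
    "exp3": {"geneA": 3.0, "geneB": -1.4, "geneC": 0.2},
}

def build_matrix(experiments):
    """Return (genes, columns, matrix): one row per gene, one column per experiment."""
    columns = list(experiments)
    # Keep only genes measured in every experiment so that all rows are complete.
    genes = sorted(set.intersection(*(set(v) for v in experiments.values())))
    matrix = [[experiments[c][g] for c in columns] for g in genes]
    return genes, columns, matrix

genes, columns, matrix = build_matrix(experiments)
for gene, row in zip(genes, matrix):
    print(gene, row)
```

Each row is then the expression profile of one gene, which is the unit that gets clustered.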
Clustering
• Clustering is the division of the elements of
a set into subsets, by virtue of a distance
metric among the elements
• From a biological perspective, this might
mean clustering all genes that have elevated
transcription in tamoxifen-resistant breast
cancer
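To make the idea of a distance metric concrete, here is a small sketch of two commonly used choices for expression profiles: Euclidean distance and a correlation-based distance (1 minus the Pearson correlation). The lecture does not say which metric is intended, so both are shown as assumptions, and the example profiles are invented.

```python
import math

def euclidean(a, b):
    """Straight-line distance between two expression profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def pearson_distance(a, b):
    """1 - Pearson correlation: 0 for identical shape, 2 for opposite shape.

    Undefined (division by zero) if either profile is completely flat.
    """
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return 1.0 - cov / math.sqrt(var_a * var_b)

# Two hypothetical genes with the same pattern but different magnitude.
gene_x = [1.0, 2.0, 3.0]
gene_y = [2.0, 4.0, 6.0]

d_euclid = euclidean(gene_x, gene_y)        # large: the magnitudes differ
d_pearson = pearson_distance(gene_x, gene_y)  # near zero: the shapes match
```

The choice matters: correlation distance groups genes that rise and fall together regardless of magnitude, while Euclidean distance also separates them by absolute level.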
Clustering
• Some clustering techniques include:
  • Hierarchical clustering
  • Self-organising maps
  • K-means clustering
  • SVM
• Because the elements in a cluster are assigned a
distance, phylogenetic techniques can be used to
determine relationships. Traditional phylogenetic
tools are used (e.g. Phylip)
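Of the techniques listed, hierarchical clustering is the simplest to sketch. The version below is a minimal agglomerative implementation using Euclidean distance and average linkage; both choices, the stopping rule (a fixed number of clusters `k`) and the example profiles are assumptions made for illustration. In practice a library routine would normally be used instead.

```python
import math

def euclidean(a, b):
    """Straight-line distance between two expression profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def average_linkage(c1, c2, profiles):
    """Mean pairwise distance between the members of two clusters."""
    total = sum(euclidean(profiles[i], profiles[j]) for i in c1 for j in c2)
    return total / (len(c1) * len(c2))

def hierarchical_clusters(profiles, k):
    """Repeatedly merge the two closest clusters until k clusters remain."""
    clusters = [[i] for i in range(len(profiles))]  # start: one gene per cluster
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = average_linkage(clusters[a], clusters[b], profiles)
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]  # merge b into a
        del clusters[b]
    return [sorted(c) for c in clusters]

# Hypothetical log2 ratios for four genes over three experiments.
profiles = [
    [2.0, 2.1, 1.9],     # gene 0: induced
    [-1.0, -1.2, -0.9],  # gene 1: repressed
    [1.8, 2.0, 2.2],     # gene 2: induced
    [-1.1, -0.8, -1.0],  # gene 3: repressed
]
groups = hierarchical_clusters(profiles, k=2)
```

Recording the order and distance of each merge, rather than stopping at `k`, yields the tree (dendrogram) that tree-drawing tools display.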
Cancer profiles
• One area of research is the profiling of
tumours. The expression pattern of each
tumour is compared, and the clinical history
of the patient is also known. This can lead
to diagnostic predictions.
An Example
Breast Cancer Res. 2001; 3 (2): 77–80
Molecular profiling of breast cancer: portraits but not physiognomy
James D. Brenton, Samuel A. J. R. Aparicio and Carlos Caldas
• Breast cancers may have different outcomes
despite similar histopathological
appearance.
• The aim is to identify key prognostic markers.
• Used 84 arrays, total over 680,000 data
points. Tested 65 samples.
• Used hierarchical clustering to reveal
groups with similar patterns of gene
expression.