Scenario 6
Distinguishing different types of leukemia to target treatment
Acute Myeloid Leukemia (AML) vs Acute Lymphoblastic Leukemia (ALL)
Golub, T. R., et al. 1999. Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science 286:531-7.
[Diagram: genes 1 to 5 shown for AML and ALL samples, first separately and then combined (AML + ALL)]

Spotted Microarray Process
[Diagram: CTRL and TEST samples]

Microarray Platforms
• Spotted arrays
  – Inserts from cDNA libraries, PCR products, or oligonucleotides
  – Probed with labeled RNA or cDNA from 2 samples
• Affymetrix GeneChip arrays
  – 25-mer oligonucleotides synthesized on a glass wafer
  – Probed with labeled RNA or cDNA from a single sample

Affymetrix Synthesis of Ordered Oligonucleotide Arrays
[Diagram labels: Light (deprotection), Mask, Substrate; nucleotides (T, then C) added at deprotected positions; REPEAT]

Affymetrix GeneChip® Probe Arrays
[Diagram labels: GeneChip Probe Array (1.28 cm); Hybridized Probe Cell (24 µm); single-stranded, fluorescently labeled DNA target; oligonucleotide probe]
• Each probe cell or feature contains millions of copies of a specific oligonucleotide probe
• Over 250,000 different probes complementary to genetic information of interest

Image of Hybridized Probe Array
[Image]

Affymetrix Probe Tiling Strategy
The presence or absence of each gene is determined by a panel of 20 perfect match and 20 mismatch (control) oligonucleotides (25-mer).
Sample output: [Image]

Data Analysis
[Scatter plot: Sample 1 (light units) on one axis, Sample 2 (light units) on the other]

Data source
http://www-genome.wi.mit.edu/cgi-bin/cancer/datasets.cgi
Near the bottom of the page: “Molecular classification of cancer: class discovery and class prediction by gene expression monitoring.”
Paper, data tables, supplemental figures

Golub et al. 1999: methods
• Measured the expression of 6817 human genes using Affymetrix arrays.
• Initially examined 27 ALL and 11 AML samples. Each ALL or AML specimen was used to prepare labeled RNA that was apparently hybridized with a single chip.
• “Samples were subjected to a priori quality control standards regarding the amount of labeled RNA and the quality of the scanned microarray image.” Eight of 80 leukemia samples were discarded.
• The signal strength from each chip was apparently normalized to that of the other chips by multiplying every value in the chip by the multiplication factor listed in the “Rescaling factors” table on the web.

Experimental Goals
1. “Class Prediction”: Determine whether an unknown sample belongs to a predefined class.
  – Find a set of genes whose expression is high in AML and low in ALL or vice versa.
  – Measure the expression of these genes in unknown samples and use the measurements as a class predictor.

P(g,c) = correlation coefficient measuring the degree to which expression of a given gene in the set of samples correlates with assignment to either class (AML or ALL):

$P(g,c) = \frac{\mu_1(g) - \mu_2(g)}{\sigma_1(g) + \sigma_2(g)}$ or $\frac{\mu_2(g) - \mu_1(g)}{\sigma_1(g) + \sigma_2(g)}$

Figure 2. Neighborhood analysis: ALL vs AML.
For the 38 leukemia samples in the initial dataset, the plot shows the number of genes within various 'neighborhoods' of the ALL/AML class distinction, together with curves showing the 5% and 1% significance levels for the number of genes within corresponding neighborhoods of the randomly permuted class distinctions (see notes 16 and 17 in the paper). Genes more highly expressed in ALL than in AML are shown in the left panel; those more highly expressed in AML than in ALL are shown in the right panel. Note the large number of genes highly correlated with the class distinction. In the left panel (higher in ALL), the number of genes with correlation P(g,c) > 0.30 was 709 for the AML-ALL distinction, but had a median of 173 genes for random class distinctions. Note that P(g,c) = 0.30 is the point where the observed data intersect the 1% significance level, meaning that 1% of random neighborhoods contain as many points as the observed neighborhood around the AML-ALL distinction. Similarly, in the right panel (higher in AML), 711 genes with P(g,c) > 0.28 were observed, whereas a median of 136 genes is expected for random class distinctions.

Votes are cast in favor of either AML or ALL for each informative gene. The magnitude of each vote is given by $w_i v_i$, where

$v_i = x_i - \frac{\mu_{AML} + \mu_{ALL}}{2}$ ($x_i$ = expression of gene $i$)

and $w_i$ is a weighting factor that reflects how well the gene is correlated with the class distinction. The class with the most votes wins (either ALL or AML).

Prediction Strength: $PS = \frac{V_{win} - V_{lose}}{V_{win} + V_{lose}}$, and must be > 0.3.

Figure 3b. Genes distinguishing ALL from AML. The 50 genes most highly correlated with the ALL/AML class distinction are shown. Each row corresponds to a gene, with the columns corresponding to expression levels in different samples. Expression levels for each gene are normalized across the samples such that the mean is 0 and the standard deviation is 1.
Expression levels greater than the mean are shaded in red, and those below the mean are shaded in blue. The scale indicates standard deviations above or below the mean. The top panel shows genes highly expressed in ALL; the bottom panel shows genes more highly expressed in AML. Note that while these genes as a group appear correlated with class, no single gene is uniformly expressed across the class, illustrating the value of a multi-gene prediction method.

Supplementary fig. 2. Expression levels of predictive genes in independent dataset. The expression levels of the 50 genes most highly correlated with the ALL-AML distinction in the initial dataset were determined in the independent dataset. Each row corresponds to a gene, with the columns corresponding to expression levels in different samples. The expression level of each gene in the independent dataset is shown relative to the mean of expression levels for that gene in the initial dataset. Expression levels greater than the mean are shaded in red, and those below the mean are shaded in blue. The scale indicates standard deviations above or below the mean. The top panel shows genes highly expressed in ALL; the bottom panel shows genes more highly expressed in AML.

Experimental Goals
2. “Class Discovery”: Determine whether a group of samples can be divided into two or more classes based only on measurement of their gene expression.
  – Employs “self-organizing maps.”
  – Must address two requirements: construction of algorithms to cluster the samples by gene expression, and determining whether the class assignments produced by the algorithm are meaningful.
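
Worked sketch of the class-prediction scheme: the P(g,c) score, the weighted votes, and the prediction strength from the earlier slides can be illustrated with a few lines of Python. This is only a minimal sketch on made-up numbers, not the authors' code. The function names (signal_to_noise, predict), the toy genes g1 and g2, and the label "uncertain" are hypothetical. It also assumes that the weighting factor is P(g,c) itself and that the spread is the sample standard deviation; the slides state neither.

```python
from statistics import mean, stdev


def signal_to_noise(class1, class2):
    """P(g,c) for one gene: (mean1 - mean2) / (sd1 + sd2).

    Assumption: sample standard deviation (n - 1 denominator).
    """
    return (mean(class1) - mean(class2)) / (stdev(class1) + stdev(class2))


def predict(sample, train_all, train_aml, min_ps=0.3):
    """Weighted vote over the informative genes in `sample`.

    sample:    {gene: expression in the unknown sample}
    train_all: {gene: [expression in each ALL training sample]}
    train_aml: {gene: [expression in each AML training sample]}
    Returns (call, prediction_strength).
    """
    v_all = v_aml = 0.0
    for gene, x in sample.items():
        all_vals, aml_vals = train_all[gene], train_aml[gene]
        # Assumption: the weight w_i is P(g,c); it is positive when the
        # gene is more highly expressed in ALL than in AML.
        w = signal_to_noise(all_vals, aml_vals)
        # v_i = x_i - (mean_AML + mean_ALL) / 2
        v = x - (mean(aml_vals) + mean(all_vals)) / 2
        vote = w * v
        if vote > 0:
            v_all += vote      # vote for ALL
        else:
            v_aml += -vote     # vote for AML
    v_win, v_lose = max(v_all, v_aml), min(v_all, v_aml)
    total = v_win + v_lose
    ps = (v_win - v_lose) / total if total else 0.0
    winner = "ALL" if v_all >= v_aml else "AML"
    # A call is made only when PS exceeds the threshold (0.3 on the slide).
    return (winner if ps > min_ps else "uncertain"), ps


# Toy training data (invented for illustration only).
train_all = {"g1": [10.0, 12.0, 11.0], "g2": [2.0, 3.0, 2.5]}
train_aml = {"g1": [3.0, 4.0, 3.5], "g2": [9.0, 8.0, 10.0]}

clear_sample = {"g1": 11.0, "g2": 2.0}   # both genes look like ALL
mixed_sample = {"g1": 11.0, "g2": 9.0}   # genes disagree

print(predict(clear_sample, train_all, train_aml))
print(predict(mixed_sample, train_all, train_aml))
```

In the second toy sample the two genes vote for different classes with similar weight, so the prediction strength falls below 0.3 and no call is made. This is the situation the PS threshold is meant to catch.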