Analysis of High-throughput Gene Expression Profiling

Why Measure Gene Expression?
1. It determines which genes are induced or repressed in response to a developmental phase or an environmental change.
2. Sets of genes whose expression rises and falls under the same conditions are likely to have a related function.
3. Features such as a common regulatory motif can be detected within co-expressed genes.
4. A pattern of gene expression may be used as an indicator of abnormal cellular regulation.
   • A useful tool for cancer diagnosis

Why Measure Gene Expression on a Large Scale?

Traditional vs. High-throughput Approaches

Techniques Used to Detect Gene Expression Level
• Microarray (single or dual channel)
• High-throughput SAGE
• EST/cDNA library
• Northern blots
• Subtractive hybridisation
• Differential hybridisation
• Representational difference analysis (RDA)
• DNA/RNA fingerprinting (RAP-PCR)
• Differential display (DD-PCR)
• aCGH: array CGH (DNA level)

Basic Information on Microarray, SAGE and cDNA Library

(DNA) Microarray
1. Developed around 1987.
2. Employs methods previously exploited in the immunoassay context: specific binding and marking techniques.
3. Two types of probes:
   Format I: probe cDNA (500~5,000 bases long) is immobilized on a solid surface such as glass; widely considered to have been developed at Stanford University; traditionally called DNA microarrays.
   Format II: an array of oligonucleotide (20~80-mer) probes is synthesized either in situ (on-chip) or by conventional synthesis followed by on-chip immobilization; developed at Affymetrix, Inc. Many companies are manufacturing oligonucleotide-based chips using alternative in-situ synthesis or deposition technologies. Historically called DNA chips.

Microarray
• Single channel: sub-type classification
• Dual channel: screening for differentially expressed genes
• Tissue microarray
• Protein microarray
• ……

Array CGH
• Detects DNA copy number variation using a microarray approach
• A hot topic in recent research, especially cancer research

Microarray Analysis
Which genes are up-regulated, down-regulated, co-regulated, or not regulated?
• gene discovery
• pattern discovery
• inferences about biological processes
• classification of biological processes

SAGE
• An experimental technique designed to obtain a quantitative measure of gene expression.
• ~10-20 base "tags" are produced (immediately adjacent to the 3' end of the 3'-most NlaIII restriction site).
• The SAGE technique does not measure the expression level of a gene directly; it quantifies a "tag" that represents the transcription product of a gene.

SAGE
Tags are isolated and concatemerized. Relative expression levels can be compared between cells in different states.

SAGEmap (http://cgap.nci.nih.gov)

SAGE: comparing two relational libraries

EST library (UniGene)

Gene expression info from UniGene Library

An Example of In-house EST Library Analysis

The Algorithms and Challenges of High-throughput Gene Expression Analysis

Seeing is believing? No, errors need to be corrected.

SAGE:
• A typical experiment requires ~30,000 gene expression comparisons between a normal and a diseased cell.
• The results depend on the size and reliability of the SAGE libraries.
• Statistical measures are used to filter candidate genes and reduce the dimensionality of the data, but tuning these measures until a good set is found is tedious and time-consuming.

SAGE
• TPM: a simple normalization method
  TPM = Count × 1,000,000 / TotalCount
• Bayesian approach
  http://cancerres.aacrjournals.org/cgi/content/full/59/21/5403

Microarray: Sources of errors
• systematic
• random
[Figure: log signal intensity plotted against log RNA abundance]

Sources of Errors (Cont.)
• Printing and/or tip problems
• Labeling and dye effects (differing amounts of RNA labeled between the 2 channels)
• Differences in the power of the two lasers (or other scanner problems)
• Differences in DNA concentration on arrays (plate effects)
• Spatial biases in ratios across the surface of the microarray due to uneven hybridization
• cDNA arrays cannot distinguish alternatively spliced forms

Errors that cannot be corrected by statistics
• Competitive hybridization of different targets on the chip
• Failure to distinguish different splicing forms
• Misinterpretation of time course data when there are not enough time points
• Misinterpretation of relative intensity

Does a clustered time course really mean co-expression?
Picture taken from http://genomics.stanford.edu/yeast/additional_figures_link.html
Yes, you can study a known system (such as the cell cycle) this way; but what about unknown systems?

Normalization by iterative linear regression
• fit a line (y = mx + b) to the data set
• set aside outliers (residuals > 2 × s.e.)
• repeat until r² changes by < 0.001
• then apply the slope and intercept to the original dataset
D Finkelstein et al. http://www.camda.duke.edu/CAMDA00/abstracts.asp

Normalization (Curvilinear)
G Tseng et al., NAR 2001
[Figure: ratio {log2 (Cy5 / Cy3)} plotted against average signal {log2 (Cy3 + Cy5)/2}, with a loess function fit line]

After Normalization ……
• Differentially expressed (DE) gene screening
  – T-test
  – T-statistics
  – SVM
• Clustering
  – Hierarchical
  – SOM
  – K-means
• Network (pathway) analysis
  – BioCarta, KEGG, GO databases
  – Bayesian network learning
  – Topology
  – …

Bioinformatics challenges
1. data management
2. utilizing data from multiple experiments
3.
utilizing data from multiple groups
   * with different technologies
   * with only processed data available

Bioinformatics Analysis of Integrated Analysis of Gene Expression Profiling

Large-scale meta-analysis of cancer microarray data identifies common transcriptional profiles of neoplastic transformation and progression
Daniel R. Rhodes et al., PNAS, 2004(101), 9309-9314

T-test
Q values (estimated false discovery rates) were calculated as
  Q = (P × n) / i
where P is the P value, n is the total number of genes, and i is the sorted rank of the P value.

Cont.
Meta-Profiling. The purpose of meta-profiling is to address the hypothesis that a selected set of differential expression signatures shares a significant intersection of genes (a meta-signature), thus inferring a biological relatedness.

67 genes were screened by meta-analysis

Integrated Cancer Gene Expression Map

7 genes were discovered by the system

THANX!!
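
Appendix sketch: Q values from ranked P values
The Q-value step in the meta-analysis slides can be illustrated with a few lines of Python. This is only a sketch, not the authors' code. It assumes the rank-based estimate Q = P × n / i, which matches the definitions of P, n, and i given above. The function name q_values, the cap at 1.0, and the sample P values are illustrative assumptions that are not in the slides.

```python
def q_values(p_values):
    """Estimate Q = P * n / i for each P value (assumed formula).

    n is the total number of genes (P values) and i is the 1-based rank
    of the P value when all P values are sorted in ascending order.
    """
    n = len(p_values)
    # Indices of the P values ordered from smallest to largest.
    order = sorted(range(n), key=lambda k: p_values[k])
    q = [0.0] * n
    for rank, idx in enumerate(order, start=1):
        # Cap at 1.0 because Q is read as a rate (an added assumption).
        q[idx] = min(1.0, p_values[idx] * n / rank)
    return q


if __name__ == "__main__":
    # Hypothetical P values for four genes, for illustration only.
    example = [0.01, 0.04, 0.03, 0.20]
    for p, q in zip(example, q_values(example)):
        print(f"P = {p:.2f}  Q = {q:.4f}")
```

Results are returned in the input order, so each Q value stays aligned with its gene. The sketch does not enforce that Q is non-decreasing with rank, a step some false discovery rate procedures add.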