CAVEAT 1
MICROARRAY EXPERIMENTS ARE EXPENSIVE AND COMPLICATED.
MICROARRAY EXPERIMENTS ARE THE STARTING POINT FOR RESEARCH.
MICROARRAY EXPERIMENTS CANNOT BE THE FINAL GOAL OF A PROJECT.

CAVEAT 2
LISTS OF GENES DON’T GIVE BIOLOGICAL ANSWERS.
STATISTICS CAN BE COMPLETELY DETACHED FROM BIOLOGY.
THE AMOUNT OF RESULTS IS ALWAYS LARGER THAN WE IMAGINE.

CAVEAT 3
WITH MICROARRAYS WE OBSERVE ONLY THE TRANSCRIPTOME.
WE CAN ONLY BUILD HYPOTHESES ABOUT THE GENOME AND THE PROTEOME.
CAREFUL AND EXTENSIVE ANNOTATION OF THE RESULTS IS NEEDED.
Dai M, et al. Nucleic Acids Res. 2005 Nov 10;33(20):e175. PMID: 16284200

THE PROBLEM OF ANNOTATION
WHO: WHO ARE THEY?
WHAT: WHAT DO THEY DO?
WHERE: WHERE ARE THEY AND WHERE DO THEY WORK?
WHEN: WHEN DO THEY WORK?
HOW: HOW DO THEY WORK?

WHO
WE NEED TO GET ALL POSSIBLE INFORMATION ON THE GENES WE OBTAIN FROM MICROARRAYS.
AVAILABLE TOOLS: Gene (EX-LocusLink), OMIM, PubMed

WHAT
THE FUNCTION OF MANY GENES IS ALREADY KNOWN.
AVAILABLE TOOLS: KEGG, GeneOntology (Biological Process, Molecular Function), OMIM, PubMed.

WHERE
LOCATING THE GENES ON THE GENOME IS VERY IMPORTANT IN MANY SITUATIONS:
- a portion of a chromosome can be strongly affected under a certain clinical condition;
- genes close to each other can be regulated by the same mechanisms.
AVAILABLE TOOLS: NCBI-Genome, EnsEMBL.

WHERE
WHERE IN THE CELL DO THE GENE PRODUCTS OPERATE?
AVAILABLE TOOLS: KEGG, GeneOntology (Cellular Component), PubMed.

WHEN
UNDER WHICH CONDITIONS DOES THE EXPRESSION OF A GIVEN GENE CHANGE?
AVAILABLE TOOLS: PubMed, GEO

HOW
HOW DO GENES WORK?
AVAILABLE TOOLS: PubMed, OMIM, Gene, GeneOntology

THE SOCIAL LIFE OF THE GENES
DIFFERENT SOCIAL DIMENSIONS:
DNA LEVEL (GENOMIC POSITION)
RNA LEVEL (RNA PROCESSING)
PROTEIN LEVEL (INTERACTION OF PROTEINS)

Diverse Biological Roles
Consider a population of genes representing a diverse set of biological roles or themes, shown below as different colors.
Many algorithms can be applied to expression data to partition genes based on their expression profiles over multiple conditions. Many of these techniques work solely on expression data and disregard biological information. Consider a particular cluster:
- What are some of the predominant biological themes represented in the cluster, and how should significance be assigned to a discovered biological theme?

Example
Population size: 40 genes
Cluster size: 12 genes
10 genes, shown in green, share a common biological theme, and 8 of them occur within the cluster.

Consider the Outcome
The frequency of the theme in the population is 10/40 = 25%.
The frequency of the theme within the cluster is 8/12 = 67%.
AND 80% (8 of 10) of the genes related to the theme in the population ended up within the relatively small cluster.

Contingency Matrix
A 2x2 contingency matrix is typically used to capture the relationship between cluster membership and membership in a biological theme.

                Cluster in   Cluster out
  Theme in           8             2
  Theme out          4            26

Assigning Significance to the Findings
Fisher’s Exact Test lets us determine whether there is a non-random association between the two variables: expression-based cluster membership and membership in a particular biological theme. For the 2x2 contingency matrix above, p ≈ 0.0002.

Hypergeometric Distribution
For a general 2x2 matrix

                Cluster in   Cluster out   Total
  Theme in           a             b         a+b
  Theme out          c             d         c+d
  Total             a+c           b+d         n

the probability of any particular matrix occurring by random selection, given no association between the two variables, is given by the hypergeometric rule:

$$ p = \frac{\dfrac{(a+c)!}{a!\,c!}\cdot\dfrac{(b+d)!}{b!\,d!}}{\dfrac{n!}{(a+b)!\,(c+d)!}} = \frac{(a+b)!\,(c+d)!\,(a+c)!\,(b+d)!}{n!\,a!\,b!\,c!\,d!} $$

Probability Computation
For our matrix (8 2 / 4 26), we are interested not only in the probability of getting exactly 8 annotation hits in the cluster but in the probability of getting 8 or more hits. In this case the probabilities of all the possible matrices with the same row and column totals and at least 8 hits are summed (a one-sided test):

   8  2        9  1       10  0
   4 26        3 27        2 28

0.0002207 + 7.27 × 10^-6 + 7.79 × 10^-8 ≈ 0.000228
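The computation on the last slides can be reproduced in a few lines. The sketch below is a minimal illustration, not the method of any particular annotation tool. It assumes genes are given as Python sets of identifiers; the function names (`contingency_matrix`, `hypergeom_prob`, `fisher_one_sided`) and the placeholder gene IDs `g0`…`g39` are hypothetical. It builds the 2x2 matrix (rows: theme in/out, columns: cluster in/out), applies the hypergeometric rule to one matrix, and sums the probabilities of all matrices with the same margins and at least as many hits.

```python
from math import comb


def contingency_matrix(population, cluster, theme):
    """Return (a, b, c, d) for the matrix [[a, b], [c, d]].

    Rows: theme in / theme out; columns: cluster in / cluster out.
    Inputs are sets of (hypothetical) gene identifiers.
    """
    cluster = cluster & population
    theme = theme & population
    a = len(theme & cluster)          # theme genes inside the cluster
    b = len(theme - cluster)          # theme genes outside the cluster
    c = len(cluster - theme)          # non-theme genes inside the cluster
    d = len(population) - a - b - c   # non-theme genes outside the cluster
    return a, b, c, d


def hypergeom_prob(a, b, c, d):
    """Probability of exactly this matrix given fixed margins (hypergeometric rule)."""
    n = a + b + c + d
    return comb(a + c, a) * comb(b + d, b) / comb(n, a + b)


def fisher_one_sided(a, b, c, d):
    """One-sided (enrichment) p-value: probability of a or more theme genes in the cluster."""
    total = 0.0
    # Keep all row and column totals fixed while moving one gene at a time
    # into the "theme in / cluster in" cell, until a cell would go negative.
    while b >= 0 and c >= 0:
        total += hypergeom_prob(a, b, c, d)
        a, b, c, d = a + 1, b - 1, c - 1, d + 1
    return total


# Placeholder data matching the slide example: 40 genes, a 10-gene theme,
# and a 12-gene cluster that contains 8 of the theme genes.
population = {f"g{i}" for i in range(40)}
theme = {f"g{i}" for i in range(10)}
cluster = {f"g{i}" for i in range(8)} | {f"g{i}" for i in range(10, 14)}

matrix = contingency_matrix(population, cluster, theme)
p_value = fisher_one_sided(*matrix)
print(matrix)             # (8, 2, 4, 26)
print(f"{p_value:.6f}")   # 0.000228
```

Only the standard library is used (`math.comb`, Python 3.8+). If SciPy is available, `scipy.stats.fisher_exact([[8, 2], [4, 26]], alternative="greater")` should give the same one-sided p-value and can serve as a cross-check.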