Disease Informatics: Brush up the terms describing techniques and ...
R. P. Deolankar

Half knowledge is always dangerous

Wet lab
A laboratory that allows hands-on scientific research and is equipped with appropriate plumbing, ventilation, and equipment.

High-throughput technology
Technology that handles a high volume of data or material: large-scale methods to purify, identify, and characterize DNA, RNA, proteins, and other molecules. These methods are usually automated, allowing rapid analysis of very large numbers of samples.

Microarray
A tool used to sift through and analyze the information contained within a genome. A microarray consists of different nucleic acid probes that are chemically attached to a substrate, which can be a microchip, a glass slide, or a microsphere-sized bead.

DNA microarray
A microarray of immobilized single-stranded DNA fragments of known nucleotide sequence, used especially in the identification and sequencing of DNA samples and in the analysis of gene expression (as in a cell or tissue).

Protein microarray
A piece of glass on which different protein molecules have been affixed at separate locations in an ordered manner, forming a microscopic array.
Mass spectrometry
An instrumental method for identifying the chemical constitution of a substance by separating gaseous ions according to their differing mass and charge; also called mass spectroscopy. An electrical charge is placed on the molecule, and the resulting ions are separated by their mass-to-charge ratio, which gives the masses of the atoms or molecules.

Tandem mass spectrometry
Multiple steps of mass spectrometry selection, with some form of fragmentation occurring between the stages.

Immunofluorescence and immunocytochemistry, ELISA, immunoblotting

Dry lab
A laboratory for making computer simulations or for data analysis, especially by computers (as in bioinformatics); also called dry laboratory.

Gene prioritization
The results of experimental or computational analyses in the post-genomic era (e.g., those from microarrays, proteomics, ChIP-chip, genome-wide in silico searches, genetic linkages, etc.) often consist of long lists of candidate genes. Some methods assign a score to each gene and rank the genes by that score. This process is known as gene prioritization.

PhenoGO
A multiorganism database that adds phenotypic context, such as the cell type, disease, tissue, and organ, to existing associations between gene products and Gene Ontology (GO) terms as specified in the Gene Ontology Annotations (GOA).

BioMedLEE
An existing Natural Language Processing (NLP) system that automatically extracts biological information consisting of bio-molecular substances and phenotypic data.

MeSH (Medical Subject Headings)
The National Library of Medicine's controlled vocabulary thesaurus. It consists of sets of terms naming descriptors in a hierarchical structure that permits searching at various levels of specificity.
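The gene prioritization entry above describes scoring candidate genes and ranking them. A minimal sketch of that idea follows, assuming each gene already has per-source evidence scores between 0 and 1. The gene names, evidence sources, scores, and the weighted-sum rule are all hypothetical and chosen only for illustration; published tools use their own scoring schemes.

```python
from typing import Dict, List, Tuple


def prioritize_genes(evidence: Dict[str, Dict[str, float]],
                     weights: Dict[str, float]) -> List[Tuple[str, float]]:
    """Rank genes by a weighted sum of evidence scores, highest first."""
    ranked = []
    for gene, scores in evidence.items():
        # A gene with no score for a source contributes 0 for that source.
        total = sum(weights[src] * scores.get(src, 0.0) for src in weights)
        ranked.append((gene, total))
    # Sort by descending score, then by name so ties are ordered reproducibly.
    ranked.sort(key=lambda pair: (-pair[1], pair[0]))
    return ranked


# Hypothetical candidate list with made-up scores (illustration only).
evidence = {
    "GENE_A": {"expression": 0.9, "linkage": 0.2},
    "GENE_B": {"expression": 0.4, "linkage": 0.8},
    "GENE_C": {"expression": 0.1},
}
weights = {"expression": 0.5, "linkage": 0.5}

ranking = prioritize_genes(evidence, weights)
for gene, score in ranking:
    print(f"{gene}\t{score:.2f}")
```

With these made-up numbers GENE_B ranks first because its linkage evidence outweighs GENE_A's expression evidence. Changing the weights changes the ranking, which is why the choice of scoring rule matters in practice.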
PhenOS (Phenotype Organizer System)
A system under development by the Lussier research group with the purpose of bridging the gap between heterogeneous biomedical terminologies.

Inparanoid algorithm
The protein interaction networks of two species are aligned by assigning proteins to sequence homology clusters using the Inparanoid algorithm.

POCUS
Prioritization of candidate genes using statistics.
Reference: Turner FS, Clutterbuck DR, Semple CA. POCUS: mining genomic sequence annotation to predict disease genes. Genome Biol. 2003;4(11):R75.

OMIM (Online Mendelian Inheritance in Man)
A catalog of human genes and genetic disorders authored and edited by Dr. Victor A. McKusick and his colleagues at Johns Hopkins and elsewhere, and provided through NCBI. The database contains information on disease phenotypes and genes, including extensive descriptions, gene names, inheritance patterns, map locations, and gene polymorphisms.

TOM (Transcriptomics of OMIM)
A web-based integrated approach for identification of candidate disease genes.
Reference: Rossi S, Masotti D, Nardini C, Bonora E, Romeo G, Macii E, Benini L, Volinia S. TOM: a web-based integrated approach for identification of candidate disease genes. Nucleic Acids Res. 2006 Jul 1;34

Data mining
Data mining (sometimes called data or knowledge discovery) is the process of analyzing data from different perspectives and summarizing it into useful information.

Online Predicted Human Interactions Database (OPHID)
Designed both as a resource for the laboratory scientist to explore known and predicted protein-protein interactions and to facilitate bioinformatics initiatives exploring protein interaction networks.
Single nucleotide polymorphisms (SNPs)
A single nucleotide polymorphism (SNP, pronounced "snip") is a DNA sequence variation occurring when a single nucleotide (A, T, C, or G) in the genome (or other shared sequence) differs between members of a species (or between paired chromosomes in an individual).

Synonymous and nonsynonymous substitutions
Substitutions that result in amino acid replacements are said to be nonsynonymous, while substitutions that do not cause an amino acid replacement (such as a GGG to GGC change, where both codons still encode glycine) are said to be synonymous. Because they differ in their effects on the physiology of the organism, synonymous and nonsynonymous substitutions can have quite different dynamics. For example, synonymous substitutions usually occur at a much faster rate than nonsynonymous substitutions. Hence, for coding sequences it is often desirable to treat the two separately.

Ka/Ks values
In genetics, the Ka/Ks ratio (or dN/dS ratio) is the ratio of the rate of nonsynonymous substitutions (Ka) to the rate of synonymous substitutions (Ks), which can be used as an indication of selection on a protein-coding gene.

dbSNP (database of single nucleotide polymorphisms)
A public-domain archive for a broad collection of single nucleotide polymorphisms (SNPs), hosted at the National Center for Biotechnology Information.

OrthoDisease
A comprehensive database of model organism genes that are orthologous to human disease genes. OrthoDisease is constructed primarily using Inparanoid analysis.
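The distinction between synonymous and nonsynonymous substitutions, and the Ka/Ks ratio built on it, can be illustrated with a short sketch. It assumes the standard genetic code and single-codon comparisons. The function names are illustrative, and the Ka and Ks values in the usage lines are made up. Estimating Ka and Ks from real alignments requires a proper substitution model, which this sketch does not attempt.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order
# (first base varies slowest, third base fastest). "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_substitution(codon_before: str, codon_after: str) -> str:
    """Return 'synonymous' if both codons encode the same amino acid."""
    before, after = codon_before.upper(), codon_after.upper()
    if before == after:
        raise ValueError("codons are identical; there is no substitution")
    # Changes to or from a stop codon count as nonsynonymous in this sketch.
    if CODON_TABLE[before] == CODON_TABLE[after]:
        return "synonymous"
    return "nonsynonymous"


def ka_ks_ratio(ka: float, ks: float) -> float:
    """Ratio of the nonsynonymous rate (Ka) to the synonymous rate (Ks)."""
    if ks <= 0:
        raise ValueError("Ks must be positive for the ratio to be defined")
    return ka / ks


# The glossary's own example: GGG and GGC both encode glycine.
print(classify_substitution("GGG", "GGC"))
# GGG (glycine) to GAG (glutamate) changes the amino acid.
print(classify_substitution("GGG", "GAG"))
# Made-up rates for illustration only.
print(ka_ks_ratio(0.02, 0.10))
```

By the usual convention, a ratio well below 1 is read as a sign of purifying selection, a ratio near 1 as roughly neutral evolution, and a ratio above 1 as a sign of positive selection.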
Inparanoid
A program that automatically detects orthologs (or groups of orthologs) from two species.

Field biology
The biology of organisms living in their natural environments, with applications in ecology and evolutionary biology.

Epidemiology
The study of how often diseases occur in different groups of people and why. It is used for planning and evaluating strategies to prevent illness and as a guide to the management of patients in whom disease has already developed.
Reference: Epidemiology for the Uninitiated, by Coggon, Rose and Barker

Population at risk
The group of people, healthy or sick, who would be counted as cases if they had the disease being studied. It defines the denominator for the calculation of incidence and prevalence rates. It is the number of persons potentially capable of experiencing the event or outcome of interest.

Floating numerator
A numerator reported without its denominator. This is a common error in field investigations: the number of cases is not related to the "at risk" population. Epidemiological conclusions (on risk) cannot be drawn from purely clinical data (on the number of sick people seen).

Target population
The population about which conclusions are to be drawn. Sometimes measurements can be made on the full target population; otherwise study samples are used.

Study population and study sample
The study population is the group of individuals in a study. In a clinical trial, the participants make up the study population. The study sample is chosen from the study population.

Aetiology
The study of the factors that predispose to or precipitate a disease. An external agent, a susceptible host, and an environment that brings the host and agent together form the disease aetiology triad.

Surveillance
Watching over a population and recording data likely to have epidemiological significance, usually with the aim of early detection of disease. Essentially an interventionist exercise compared with monitoring, which is passive.
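The floating numerator error described above can be made concrete with a small sketch. The two settings and all counts are hypothetical; the point is only that identical case counts mean very different things once each is divided by its own population at risk.

```python
def cases_per_1000(cases: int, population_at_risk: int) -> float:
    """Cases per 1,000 persons at risk (numerator tied to its denominator)."""
    if population_at_risk <= 0:
        raise ValueError("population at risk must be positive")
    if cases < 0 or cases > population_at_risk:
        raise ValueError("cases must lie between 0 and the population at risk")
    return 1000 * cases / population_at_risk


# Hypothetical counts: both clinics saw the same number of cases.
village = cases_per_1000(cases=30, population_at_risk=2_000)
town = cases_per_1000(cases=30, population_at_risk=60_000)

print(f"village: {village} per 1,000 at risk")  # 15.0
print(f"town: {town} per 1,000 at risk")        # 0.5
```

Reporting only "30 cases" in each place would hide a thirty-fold difference in frequency between the two hypothetical populations.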
Case
Disease in populations exists as a continuum of severity rather than as an all-or-none phenomenon. The real question in population studies is not "Has the person got the disease?" but "How much of the disease has he or she got?" The diagnostic continuum is dichotomized into "cases" and "non-cases" on the basis of statistical, clinical, prognostic, or operational options. Hence a case definition should be precise and unambiguous. Epidemiological case definitions are narrower and more rigid than clinical ones.

Incidence
The rate at which new cases occur in a population during a specified period.
Incidence = (number of new cases) / ((population at risk) × (time during which cases were ascertained))

Prevalence
Point prevalence: the proportion of a population that are cases at a point in time.
Period prevalence: the proportion of a population that are cases at any time within a stated period.

Attributable risk and relative risk
Attributable risk is the difference between the disease rate in exposed persons and that in people who are unexposed.
Relative risk is the ratio of the disease rate in exposed persons to that in people who are unexposed.
Attributable risk = rate of disease in unexposed persons × (relative risk - 1)

Confounding
Confusion about causation that arises when two or more variables are associated with the disease. Confounding may give rise to spurious associations when in fact there is no causal relation or, at the other extreme, may obscure the effects of a true cause.

Bias
Bias is the deviation of inferences from the truth. Selection bias is the biased selection of individuals into the study. Information bias is the biased collection or biased analysis of the data. The motto of the epidemiologist could well be "dirty hands but a clean mind" (manus sordidae, mens pura).

Chance
A measure of how likely it is that some event will occur; random, unpredictable influences on events. The association between the exposure and the disease is conventionally considered "statistically significant" if the probability of obtaining a test statistic at least as extreme by chance alone (the p-value) is less than 0.05.

Sensitivity
The proportion of persons with the disease who are correctly identified by defined criteria or by a screening test. Applied to a surveillance system, the ability of the system to detect epidemics and other changes in disease occurrence. A sensitive test detects a high proportion of the true cases.

Specificity
The proportion of persons without a disease who are correctly identified by a test: the number of true negative results divided by the total number of all those without the disease.

Randomization
Randomization is used to obtain a similar allocation of individuals to each group; the groups are followed at the same time. Its purpose is to obtain unbiased estimates of differences among treatment responses (means or effects) and an unbiased estimate of the random error variation in the experiment.

Replication and local control
Replication is the repetition of an experiment in order to test the validity of its conclusion. Local control is blocking or grouping to eliminate or control the various sources of variation (error). Replication and local control are necessary to reduce the random variation among treatment effects in the experiment.

Observational (non-experimental) studies
Person-level units of observation
1. Longitudinal measurements
a. Cohort samples
b. Case-control samples
2. Cross-sectional measurements
Aggregate-level units of observation (ecological studies)
Reference: Epidemiology Kept Simple: An Introduction to Traditional and Modern Epidemiology, by B. Burt Gerstman

Personal-level vs. aggregate-level
A personal-level study on smoking might collect information on each person's smoking habits, age, and disease status. An aggregate-level study on smoking might collect information on each region's per capita cigarette consumption, age distribution, and disease rate.

Longitudinal studies
Longitudinal studies are studies in which the sequence of events in individuals can be delineated over time. In cohort studies, the incidence of disease in exposed and non-exposed groups is compared. In case-control studies, people with disease (cases) and people without disease (controls) are sampled from the source population, and the exposure histories of cases and controls are compared.

Longitudinal vs. cross-sectional studies
Longitudinal measurements relate exposures and diseases in individuals at various time references. Cross-sectional measurements are not definitively time-sequenced in individuals. In cross-sectional studies, data are gathered from samples at one point in time. Since both the outcome and the exposure variables are measured at the same time, these studies are not strong at showing cause-effect relationships.

Experimental studies
In experimental studies, the investigator introduces or removes an exposure in order to observe its influence on a health outcome. Such allocations may be based on a chance mechanism (randomized trials) or on other deliberate mechanisms built into the study's protocol (non-randomized trials).

Other disease informatics lectures
Supercourse: Epidemiology, the Internet and Global Health. Lecture numbers 31981, 30331, 28921, 25381, 25371, and 34011
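The quantitative definitions given earlier (incidence, relative risk, attributable risk, sensitivity, and specificity) can be checked against one another with a short sketch. All counts below are hypothetical, and time is assumed to be measured in years. The sketch also checks the identity stated in the glossary: attributable risk equals the rate in unexposed persons times (relative risk - 1).

```python
def incidence_rate(new_cases: int, population_at_risk: int, years: float) -> float:
    """New cases per person at risk per year of observation."""
    return new_cases / (population_at_risk * years)


def relative_risk(rate_exposed: float, rate_unexposed: float) -> float:
    """Ratio of the disease rate in exposed to that in unexposed persons."""
    return rate_exposed / rate_unexposed


def attributable_risk(rate_exposed: float, rate_unexposed: float) -> float:
    """Difference between the disease rates in exposed and unexposed persons."""
    return rate_exposed - rate_unexposed


def sensitivity(true_positives: int, false_negatives: int) -> float:
    """Proportion of persons with the disease whom the test identifies."""
    return true_positives / (true_positives + false_negatives)


def specificity(true_negatives: int, false_positives: int) -> float:
    """Proportion of persons without the disease whom the test identifies."""
    return true_negatives / (true_negatives + false_positives)


# Hypothetical cohort: 1,000 exposed and 1,000 unexposed, followed for 2 years.
rate_exposed = incidence_rate(new_cases=30, population_at_risk=1000, years=2)
rate_unexposed = incidence_rate(new_cases=10, population_at_risk=1000, years=2)

rr = relative_risk(rate_exposed, rate_unexposed)
ar = attributable_risk(rate_exposed, rate_unexposed)

print(f"incidence, exposed:   {rate_exposed:.3f} per person-year")
print(f"incidence, unexposed: {rate_unexposed:.3f} per person-year")
print(f"relative risk:        {rr:.1f}")
print(f"attributable risk:    {ar:.3f} per person-year")
# The two ways of computing attributable risk agree up to rounding.
print(f"unexposed x (RR - 1): {rate_unexposed * (rr - 1):.3f} per person-year")

# Hypothetical screening test results.
sens = sensitivity(true_positives=90, false_negatives=10)
spec = specificity(true_negatives=950, false_positives=50)
print(f"sensitivity: {sens:.2f}, specificity: {spec:.2f}")
```

In this made-up cohort the exposed group has three times the incidence of the unexposed group, and the excess of 0.010 cases per person-year is the part attributable to the exposure.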