Disease Informatics: Brush up the terms describing techniques and

R. P. Deolankar
Half knowledge is always dangerous
Wet lab
A laboratory allowing for hands-on scientific research and equipped with appropriate plumbing, ventilation and equipment
High-throughput technology
 Technology that handles high volumes of data or material
 Large-scale methods to purify, identify, and
characterize DNA, RNA, proteins and other molecules.
These methods are usually automated, allowing rapid
analysis of very large numbers of samples.
Microarray
 A tool used to sift through and analyze the
information contained within a genome. A microarray
consists of different nucleic acid probes that are
chemically attached to a substrate, which can be a
microchip, a glass slide or a microsphere-sized bead.
DNA microarray
 A microarray of immobilized single-stranded DNA
fragments of known nucleotide sequence that is used
especially in the identification and sequencing of DNA
samples and in the analysis of gene expression (as in a
cell or tissue)
Protein microarray
 A protein microarray is a piece of glass on which different protein molecules have been affixed at separate locations in an ordered manner, thus forming a microscopic array.
Mass spectrometry
 An instrumental method for identifying the chemical
constitution of a substance by means of the separation
of gaseous ions according to their differing mass and
charge -- called also mass spectroscopy
 A method used to determine the masses of atoms or molecules, in which an electrical charge is placed on the molecules and the resulting ions are separated by their mass-to-charge ratio
Tandem mass spectrometry
 Multiple stages of mass spectrometric selection, with some form of fragmentation occurring between the stages
 Immunofluorescence and immunocytochemistry,
ELISA, immunoblotting
Dry lab
 A laboratory for making computer simulations or for
data analysis especially by computers (as in
bioinformatics)—called also dry laboratory
Gene prioritization
 The results of experimental or computational analyses in the post-genomic era (e.g., those from microarrays, proteomics, ChIP-chip, genome-wide in silico searches and genetic linkage studies) often consist of long lists of candidate genes. Assigning a score to each candidate gene and ranking the genes by that score is known as gene prioritization.
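As a rough illustration of the score-and-rank idea, the sketch below scores each candidate gene as a weighted sum of evidence values and then sorts the list. The gene names, evidence types and weights are all hypothetical. Published prioritization tools, such as POCUS or TOM, use their own and more elaborate scoring schemes.

```python
from typing import Dict, List, Tuple

def prioritize(candidates: Dict[str, Dict[str, float]],
               weights: Dict[str, float]) -> List[Tuple[str, float]]:
    """Score each candidate gene as a weighted sum of its evidence values
    and return (gene, score) pairs ranked from highest to lowest score."""
    ranked = []
    for gene, evidence in candidates.items():
        # Evidence types missing for a gene contribute nothing to its score.
        score = sum(w * evidence.get(kind, 0.0) for kind, w in weights.items())
        ranked.append((gene, score))
    # Highest score first; ties broken alphabetically so the order is stable.
    ranked.sort(key=lambda item: (-item[1], item[0]))
    return ranked

# Hypothetical candidate list and weights, for illustration only.
candidates = {
    "GENE_A": {"expression": 0.9, "go_match": 1.0},
    "GENE_B": {"expression": 0.2, "linkage": 1.0},
    "GENE_C": {"expression": 0.5},
}
weights = {"expression": 1.0, "go_match": 2.0, "linkage": 1.5}

for rank, (gene, score) in enumerate(prioritize(candidates, weights), start=1):
    print(rank, gene, round(score, 2))
```

The weights encode how much each type of evidence is trusted. Choosing and validating them is the hard part of any real prioritization method.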
PhenoGO
 PhenoGO is a multi-organism database that adds phenotypic context, such as cell type, disease, tissue and organ, to existing associations between gene products and Gene Ontology (GO) terms as specified in the Gene Ontology Annotations (GOA).
BioMedLEE
 BioMedLEE is an existing natural language processing (NLP) system that automatically extracts biological information, such as biomolecular substances and phenotypic data, from text.
MeSH
 Medical Subject Headings
 MeSH is the National Library of Medicine's controlled
vocabulary thesaurus. It consists of sets of terms
naming descriptors in a hierarchical structure that
permits searching at various levels of specificity.
PhenOS
 Phenotype Organizer System (PhenOS) is a system under development by the Lussier research group with the purpose of bridging the gap between heterogeneous biomedical terminologies.
Inparanoid algorithm
 The protein interaction networks of two species are
aligned by assigning proteins to sequence homology
clusters using the Inparanoid algorithm
POCUS
 Prioritization of candidate genes using statistics
 Reference: Turner FS, Clutterbuck DR, Semple CA.
POCUS: mining genomic sequence annotation to
predict disease genes. Genome Biol. 2003;4(11):R75.
OMIM
 Online Mendelian Inheritance in Man
 A catalog of human genes and genetic disorders, authored and edited by Dr. Victor A. McKusick and his colleagues at Johns Hopkins and elsewhere, and provided through NCBI. The database contains information on disease phenotypes and genes, including extensive descriptions, gene names, inheritance patterns, map locations and gene polymorphisms.
TOM
 Transcriptomics of OMIM: a web-based integrated approach for identification of candidate disease genes
 Reference: Rossi S, Masotti D, Nardini C, Bonora E, Romeo G, Macii E, Benini L, Volinia S. TOM: a web-based integrated approach for identification of candidate disease genes. Nucleic Acids Res. 2006 Jul 1;34
Data mining
 Data mining (sometimes called data or knowledge
discovery) is the process of analyzing data from
different perspectives and summarizing it into useful
information
Online Predicted Human Interactions Database (OPHID)
 Designed both as a resource for laboratory scientists to explore known and predicted protein-protein interactions and to facilitate bioinformatics initiatives exploring protein interaction networks.
Single nucleotide polymorphisms (SNPs)
 A single nucleotide polymorphism (SNP, pronounced
snip), is a DNA sequence variation occurring when a
single nucleotide - A, T, C, or G - in the genome (or
other shared sequence) differs between members of a
species (or between paired chromosomes in an
individual).
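To make the definition concrete, the sketch below lists single-base differences between two sequences. It assumes the sequences are already aligned to equal length. Real SNP discovery from sequencing reads also involves alignment, base-quality filtering and population frequency, none of which is modelled here.

```python
from typing import List, Tuple

def find_snps(seq1: str, seq2: str) -> List[Tuple[int, str, str]]:
    """Return (1-based position, base in seq1, base in seq2) for every position
    where two pre-aligned, equal-length DNA sequences carry different bases."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    snps = []
    for pos, (a, b) in enumerate(zip(seq1.upper(), seq2.upper()), start=1):
        # Count only substitutions between A, C, G and T; gaps ('-') and
        # ambiguity codes are skipped because they are not single-base changes.
        if a != b and a in "ACGT" and b in "ACGT":
            snps.append((pos, a, b))
    return snps

print(find_snps("ATGCCGTA", "ATGTCGTA"))  # [(4, 'C', 'T')]
```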
Synonymous and nonsynonymous substitutions
 Substitutions that result in amino acid replacements
are said to be nonsynonymous while substitutions that
do not cause an amino acid replacement (such as a
GGG to GGC change - both codons still encode
glycine) are said to be synonymous substitutions.
Because of the difference in their effects on the
physiology of the organism, synonymous and
nonsynonymous substitutions can have quite different
dynamics. For example, synonymous substitutions
usually occur at a much faster rate than do
nonsynonymous substitutions. Hence, for coding
sequence it is often desirable to separate these two.
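The distinction can be checked mechanically with the standard genetic code. The minimal sketch below translates both codons and compares the amino acids. It handles only single-codon comparisons and uses the standard (nuclear) code. A change to or from a stop codon is counted as nonsynonymous here.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code with codons enumerated in TCAG order (last base varies
# fastest); '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

def classify_substitution(codon_before: str, codon_after: str) -> str:
    """Label a codon change as 'synonymous' or 'nonsynonymous'."""
    aa_before = CODON_TABLE[codon_before.upper()]
    aa_after = CODON_TABLE[codon_after.upper()]
    return "synonymous" if aa_before == aa_after else "nonsynonymous"

print(classify_substitution("GGG", "GGC"))  # synonymous (glycine -> glycine)
print(classify_substitution("GGG", "AGG"))  # nonsynonymous (glycine -> arginine)
```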
Ka/Ks values
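 In genetics, the Ka/Ks ratio (or dN/dS ratio) is the ratio of the rate of nonsynonymous substitutions (Ka) to the rate of synonymous substitutions (Ks). It can be used as an indication of selection on a protein-coding gene: a ratio well above 1 suggests positive selection, a ratio near 1 suggests neutral evolution, and a ratio well below 1 suggests purifying selection.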
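A minimal sketch of the arithmetic follows. It assumes the numbers of nonsynonymous and synonymous sites and differences have already been counted, for example by a Nei-Gojobori style procedure that is not shown. It then applies the Jukes-Cantor correction for multiple hits, which is one common choice among several. The counts are hypothetical. The fixed tolerance around 1 in `interpret` is purely illustrative: in practice the ratio is judged with a statistical test.

```python
import math

def jukes_cantor(p: float) -> float:
    """Correct an observed proportion of differing sites for multiple hits."""
    if not 0 <= p < 0.75:
        raise ValueError("proportion must be in [0, 0.75) for this correction")
    return -0.75 * math.log(1 - 4 * p / 3)

def ka_ks(nonsyn_diffs: float, nonsyn_sites: float,
          syn_diffs: float, syn_sites: float) -> tuple:
    """Return (Ka, Ks, Ka/Ks) from precomputed site and difference counts."""
    ka = jukes_cantor(nonsyn_diffs / nonsyn_sites)
    ks = jukes_cantor(syn_diffs / syn_sites)
    if ks == 0:
        raise ValueError("no synonymous differences: Ka/Ks is undefined")
    return ka, ks, ka / ks

def interpret(ratio: float, tolerance: float = 0.1) -> str:
    # Illustrative cut-offs only; not a substitute for a significance test.
    if ratio > 1 + tolerance:
        return "positive (diversifying) selection"
    if ratio < 1 - tolerance:
        return "purifying (negative) selection"
    return "consistent with neutral evolution"

# Hypothetical counts for one gene compared between two species.
ka, ks, ratio = ka_ks(nonsyn_diffs=5, nonsyn_sites=300, syn_diffs=10, syn_sites=100)
print(f"Ka={ka:.4f} Ks={ks:.4f} Ka/Ks={ratio:.2f}: {interpret(ratio)}")
```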
dbSNP
 Database of Single Nucleotide Polymorphisms
 A public-domain archive of a broad collection of single nucleotide polymorphisms (SNPs), hosted at the National Center for Biotechnology Information (NCBI).
OrthoDisease
 OrthoDisease is a comprehensive database of model organism genes that are orthologous to human disease genes
 OrthoDisease is constructed primarily using Inparanoid analysis. Inparanoid is a program that automatically detects orthologs (or groups of orthologs) between two species.
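Inparanoid begins from two-way (reciprocal) best matches between the proteins of the two species and then adds in-paralogs to form ortholog groups. The sketch below covers only that first seed step, with hypothetical protein IDs and hypothetical similarity scores standing in for sequence-comparison scores. It is not the Inparanoid program itself.

```python
from typing import Dict, List, Tuple

Scores = Dict[Tuple[str, str], float]  # (query, subject) -> similarity score

def best_hits(scores: Scores) -> Dict[str, str]:
    """Map each query protein to its highest-scoring subject protein."""
    best: Dict[str, Tuple[str, float]] = {}
    for (query, subject), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}

def two_way_best_hits(a_vs_b: Scores, b_vs_a: Scores) -> List[Tuple[str, str]]:
    """Pairs (a, b) where b is a's best hit and a is also b's best hit."""
    forward = best_hits(a_vs_b)
    reverse = best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)

# Hypothetical scores between proteins of species X and species Y.
x_vs_y = {("X1", "Y1"): 500, ("X1", "Y2"): 120, ("X2", "Y2"): 300, ("X3", "Y2"): 90}
y_vs_x = {("Y1", "X1"): 480, ("Y2", "X2"): 310, ("Y2", "X1"): 100, ("Y2", "X3"): 95}
print(two_way_best_hits(x_vs_y, y_vs_x))  # [('X1', 'Y1'), ('X2', 'Y2')]
```

Here X3 is dropped. Its best hit, Y2, prefers X2, so the pair is not reciprocal. The real algorithm would then consider whether X3 is an in-paralog of X2.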
Field Biology
 Biology of organisms living in their natural
environments
 Applications in Ecology and Evolutionary Biology
Epidemiology
 Epidemiology is the study of how often diseases occur in different groups of people and why
 Planning and evaluating strategies to prevent illness
 Guide to the management of patients in whom disease has already developed
 Reference: Epidemiology for the uninitiated by
Coggon, Rose and Barker
Population at risk
 The population at risk is the group of people, healthy
or sick, who would be counted as cases if they had the
disease being studied
 It defines the denominator for the calculation of incidence and prevalence rates
 It is the number of persons potentially capable of
experiencing the event or outcome of interest
Floating numerator
 A numerator floating without its denominator
 A common error in field investigations
 The error occurs when the number of cases is not related to the population at risk
 Epidemiological conclusions (on risk) cannot be drawn from purely clinical data (on the number of sick people seen)
Target population
 It is the population about which the conclusions are to
be drawn
 Sometimes measurements can be made on the full target population; otherwise, study samples are used
Study population and study sample
 The group of individuals in a study
 In a clinical trial, the participants make up the study
population
 The study sample is chosen from the study population
Aetiology
 The study of the factors that predispose to or precipitate the disease
 An external agent, a susceptible host and an environment that brings the host and agent together form the triad of disease aetiology (the epidemiologic triad)
Surveillance
 Watching over a population and recording data likely
to have epidemiological significance, usually with the
aim of early detection of disease. Essentially an
interventionist exercise compared with monitoring,
which is passive.
Case
 Disease in populations exists as a continuum of severity rather than as an all-or-none phenomenon
 The real question in population studies is not “has the person got the disease?” but “how much of the disease has he or she got?”
 The diagnostic continuum is dichotomized into “cases” and “non-cases” on the basis of statistical, clinical, prognostic or operational criteria
 Hence the case definition should be precise and unambiguous
 Epidemiological case definitions are narrower and more rigid than clinical ones
Incidence
 It is the rate at which new cases occur in a population during a specified period
 Incidence = (number of new cases) / [(population at risk) × (time during which cases were ascertained)]
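The incidence definition divides the new cases by the product of the population at risk and the observation time. The sketch below assumes the population at risk stays constant over the period, which is a simplification. Studies with varying follow-up sum individual person-time instead. The numbers are hypothetical.

```python
def incidence_rate(new_cases: int, population_at_risk: float, years: float) -> float:
    """Incidence rate = new cases / (population at risk x time), per person-year."""
    if population_at_risk <= 0 or years <= 0:
        raise ValueError("population at risk and time must both be positive")
    return new_cases / (population_at_risk * years)

# Hypothetical survey: 30 new cases among 10,000 people followed for 2 years.
rate = incidence_rate(new_cases=30, population_at_risk=10_000, years=2)
print(f"{rate * 1000:.1f} new cases per 1,000 person-years")  # 1.5 new cases per 1,000 person-years
```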
Prevalence
Point prevalence
 The proportion of a population that are cases at a point
in time
Period prevalence
 The proportion of a population that are cases at any
time within a stated period
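The difference between the two measures is easiest to see with explicit disease episodes. In this sketch each tuple is one hypothetical person's episode, given as a first and last day as a case. The sketch assumes one episode per person and a fixed population size.

```python
from typing import List, Tuple

# (first day as a case, last day as a case), inclusive; one episode per person.
Episode = Tuple[int, int]

def point_prevalence(episodes: List[Episode], population: int, day: int) -> float:
    """Proportion of the population that are cases on a single day."""
    cases = sum(1 for start, end in episodes if start <= day <= end)
    return cases / population

def period_prevalence(episodes: List[Episode], population: int,
                      first_day: int, last_day: int) -> float:
    """Proportion of the population that are cases at any time in the period."""
    # An episode counts if it overlaps the period at all.
    cases = sum(1 for start, end in episodes if start <= last_day and end >= first_day)
    return cases / population

# Hypothetical population of 100 people, 4 of whom are cases at some point.
episodes = [(1, 10), (5, 40), (20, 25), (50, 60)]
print(point_prevalence(episodes, 100, day=22))                     # 0.02
print(period_prevalence(episodes, 100, first_day=1, last_day=30))  # 0.03
```

Period prevalence is always at least as large as the point prevalence on any day inside the period, because it also counts episodes that ended or began at other times within it.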
Attributable risk and relative risk
 Attributable risk is the disease rate in exposed persons minus that in people who are unexposed
 Relative risk is the ratio of the disease rate in exposed
persons to that in people who are unexposed
 Attributable risk = rate of disease in unexposed
persons * (relative risk – 1)
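A short numerical check ties the three statements together. The rates below are hypothetical. The assertion in the code confirms that the difference of rates equals the rate in the unexposed multiplied by (relative risk - 1).

```python
import math

def relative_risk(rate_exposed: float, rate_unexposed: float) -> float:
    """Ratio of the disease rate in exposed to that in unexposed persons."""
    return rate_exposed / rate_unexposed

def attributable_risk(rate_exposed: float, rate_unexposed: float) -> float:
    """Excess disease rate in exposed persons over that in unexposed persons."""
    return rate_exposed - rate_unexposed

# Hypothetical incidence rates per 1,000 person-years.
rate_exposed, rate_unexposed = 30.0, 10.0
rr = relative_risk(rate_exposed, rate_unexposed)
ar = attributable_risk(rate_exposed, rate_unexposed)
# Identity: attributable risk = rate in unexposed * (relative risk - 1).
assert math.isclose(ar, rate_unexposed * (rr - 1))
print(f"RR = {rr:.1f}, AR = {ar:.1f} per 1,000 person-years")  # RR = 3.0, AR = 20.0 per 1,000 person-years
```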
Confounding
 Confusion about causation that arises when the exposure of interest is associated with another variable that independently influences the risk of disease
 Confounding may give rise to spurious associations when in fact there is no causal relation or, at the other extreme, it may obscure the effects of a true cause
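The counts below are entirely hypothetical and were built to show how a confounder can create a spurious association. Age is associated with exposure: 80% of the exposed are older, against 20% of the unexposed. Age also independently raises risk, to 10% in older people versus 1% in younger people. Within each age group the exposure makes no difference, yet the crude comparison suggests an almost threefold risk.

```python
# Each stratum: (cases among exposed, number exposed,
#                cases among unexposed, number unexposed)
strata = {
    "older": (80, 800, 20, 200),
    "younger": (2, 200, 8, 800),
}

def risk_ratio(cases_exp: int, n_exp: int, cases_unexp: int, n_unexp: int) -> float:
    """Risk in the exposed divided by risk in the unexposed."""
    return (cases_exp / n_exp) / (cases_unexp / n_unexp)

# Crude analysis: ignore age by summing each column over the strata.
crude = tuple(sum(counts[i] for counts in strata.values()) for i in range(4))
print(f"crude RR: {risk_ratio(*crude):.2f}")        # crude RR: 2.93
for name, counts in strata.items():
    print(f"{name} RR: {risk_ratio(*counts):.2f}")  # 1.00 in each age group
```

Stratifying by the suspected confounder, as here, is one standard way to detect and control confounding at the analysis stage.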
Bias
 Bias is the deviation of inferences from the truth
 Selection bias is the biased selection of individuals
into the study
 Information bias is the biased collection or biased
analysis of the data
 The motto of the epidemiologist could well be “dirty hands but a clean mind” (manus sordidae, mens pura)
Chance
 A measure of how likely it is that some event will occur
 Random, unpredictable influences on events
 The association between exposure and disease is considered “statistically significant” if the probability of observing an association at least as strong as the one found, assuming there is no true association (the p-value), is less than 0.05
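One common way to obtain that probability for a 2x2 exposure-by-disease table is Pearson's chi-square test. The sketch below uses the standard library only and omits the continuity correction. It assumes expected cell counts are large enough for the chi-square approximation; for small counts an exact test is usually preferred. The data are hypothetical.

```python
import math

def chi_square_2x2(a: int, b: int, c: int, d: int) -> tuple:
    """Pearson chi-square (no continuity correction) and p-value for the table
                     disease   no disease
        exposed         a          b
        unexposed       c          d
    """
    n = a + b + c + d
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    if denominator == 0:
        raise ValueError("every row and column total must be non-zero")
    chi2 = n * (a * d - b * c) ** 2 / denominator
    # With 1 degree of freedom, P(X >= chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Hypothetical data: 30/100 exposed and 15/100 unexposed people developed disease.
chi2, p = chi_square_2x2(30, 70, 15, 85)
print(f"chi-square = {chi2:.2f}, p = {p:.3f}",
      "significant" if p < 0.05 else "not significant")
```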
Sensitivity
 The proportion of persons with the disease who are
correctly identified by defined criteria
 The proportion of persons with the disease who are
correctly identified by a screening test
 The ability of a system to detect epidemics and other
changes in disease occurrence
 A sensitive test detects a high proportion of the true cases
Specificity
 The proportion of persons without a disease who are
correctly identified by a test
 The number of true negative results divided by the
total number of all those without the disease
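Both measures come from the same 2x2 table of test result against true disease status. The sketch below computes them from hypothetical screening counts.

```python
def screening_accuracy(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Sensitivity and specificity from the four cells of a screening 2x2 table."""
    return {
        # True positives / everyone with the disease.
        "sensitivity": tp / (tp + fn),
        # True negatives / everyone without the disease.
        "specificity": tn / (tn + fp),
    }

# Hypothetical results: 100 people with the disease, 200 without.
print(screening_accuracy(tp=90, fp=40, fn=10, tn=160))
# {'sensitivity': 0.9, 'specificity': 0.8}
```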
Randomization
 Randomization is used to obtain a similar allocation of individuals to each group; the groups are then followed up at the same time
 Purpose of randomization: To obtain unbiased
estimates of differences among treatment responses
(means or effects) and to obtain an unbiased estimate
of the random error variation in the experiment
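A minimal sketch of simple random allocation follows. Participants are shuffled with a seeded pseudo-random generator and then dealt out to the groups in turn, so every allocation is left to chance and group sizes differ by at most one. The participant IDs, group names and seed are hypothetical. Real trials typically use dedicated, concealed allocation procedures.

```python
import random
from typing import Dict, Hashable, Iterable, Optional, Sequence

def randomize(participants: Iterable[Hashable],
              groups: Sequence[str] = ("treatment", "control"),
              seed: Optional[int] = None) -> Dict[Hashable, str]:
    """Randomly allocate participants to groups of (near-)equal size."""
    rng = random.Random(seed)  # a fixed seed makes the allocation reproducible
    order = list(participants)
    rng.shuffle(order)
    # Deal the shuffled list out round-robin so group sizes differ by at most one.
    return {person: groups[i % len(groups)] for i, person in enumerate(order)}

allocation = randomize([f"P{n:02d}" for n in range(1, 11)], seed=42)
for person in sorted(allocation):
    print(person, allocation[person])
```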
Replication and Local control
 Replication is the repetition of an experiment in order
to test the validity of its conclusion
 Local control is blocking or grouping to eliminate or to
control the various sources of variation (error)
 Replication and local control are necessary to achieve a
reduction in the random variation among treatment
effects in the experiment
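Replication and local control are combined in a randomized complete block design. Every treatment appears once in every block, so each block is one replicate. Blocks group similar experimental units, which removes between-block variation from the comparison. Treatment order is then randomized separately within each block. The block and treatment names below are hypothetical.

```python
import random
from typing import Dict, List, Optional, Sequence

def randomized_complete_blocks(blocks: Sequence[str], treatments: Sequence[str],
                               seed: Optional[int] = None) -> Dict[str, List[str]]:
    """Assign every treatment once per block, in an independently randomized
    order for each block."""
    rng = random.Random(seed)
    layout = {}
    for block in blocks:
        order = list(treatments)
        rng.shuffle(order)  # fresh random order for this block only
        layout[block] = order
    return layout

# Hypothetical experiment: 3 treatments laid out in 4 field blocks.
design = randomized_complete_blocks(["block1", "block2", "block3", "block4"],
                                    ["A", "B", "C"], seed=7)
for block, order in design.items():
    print(block, order)
```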
Observational (non-experimental) studies
 Person-level unit of observation
1. Longitudinal measurements
   a. Cohort samples
   b. Case-control samples
2. Cross-sectional measurements
 Aggregate-level units of observation (ecological studies)
 Reference: Epidemiology Kept Simple: An
Introduction to Traditional and Modern
Epidemiology; by B. Burt Gerstman
Person-level vs. aggregate-level
 A person-level study on smoking might collect information on each person’s smoking habits, age and disease status
 An aggregate-level study on smoking might collect information on each region’s per capita cigarette consumption, age distribution and disease rate
Longitudinal studies
 Longitudinal studies are studies in which the sequence
of events in individuals can be delineated over time
 In cohort studies, the incidence of disease in exposed and non-exposed groups is compared
 In case-control studies people with disease (cases) and
people without disease (controls) are sampled from
the source population and exposure histories of cases
and controls are compared
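In a case-control study the comparison of exposure histories is commonly summarized as an odds ratio. This is the odds of exposure among cases divided by the odds of exposure among controls. The minimal sketch below uses hypothetical counts.

```python
def odds_ratio(exposed_cases: int, unexposed_cases: int,
               exposed_controls: int, unexposed_controls: int) -> float:
    """Odds of exposure among cases divided by odds of exposure among controls."""
    odds_cases = exposed_cases / unexposed_cases
    odds_controls = exposed_controls / unexposed_controls
    return odds_cases / odds_controls

# Hypothetical study: 100 cases (40 exposed) and 100 controls (20 exposed).
print(round(odds_ratio(40, 60, 20, 80), 2))  # 2.67
```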
Longitudinal vs. cross-sectional studies
 Longitudinal measurements relate exposures and diseases in individuals at various time references
 Cross-sectional measurements are not definitively time-sequenced in individuals
 In cross-sectional studies, data are gathered from samples at one point in time. Since both the outcome and the other variables are measured at the same time, these studies are not strong at showing cause-effect relationships.
Experimental studies
 In experimental studies, the investigator introduces or
removes an exposure in order to observe its influence
on a health outcome. Such allocations may be based
on chance mechanism (randomized trials) or on other
deliberate mechanisms built into the study’s protocol
(non-randomized trials)
Other disease informatics lectures:
Supercourse: Epidemiology, the Internet and Global Health
Lecture numbers 31981, 30331, 28921, 25381, 25371, and 34011