EPIDEMIOLOGY GLOSSARY
A
Absolute effect measures: The difference between incidence rates, incidence proportions or prevalences,
between two exposure groups. Absolute effect measures are important for counseling individual patients and for
understanding the impact of the disease on the population.
Example: Say the absolute risk of developing a disease is 4 in 100 in non-smokers. Say the relative risk
of the disease is 1.5 in smokers compared to non-smokers. The 1.5 relates to the 4 - so the absolute
increase in the risk is 50% of 4, which is 2. So, the absolute risk of smokers developing this disease is 6
in 100.
Related topic: relative effect measures
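The arithmetic in the example above can be sketched in a few lines of Python. The function and variable names are illustrative and not part of the glossary.

```python
def risk_in_exposed(baseline_risk, relative_risk):
    """Absolute risk in the exposed group implied by a baseline risk and a relative risk."""
    return baseline_risk * relative_risk

baseline = 4 / 100   # absolute risk in non-smokers
rr = 1.5             # relative risk, smokers vs. non-smokers

smoker_risk = risk_in_exposed(baseline, rr)
absolute_increase = smoker_risk - baseline   # the absolute effect measure

print(f"risk in smokers: {smoker_risk * 100:.0f} in 100")
print(f"absolute increase: {absolute_increase * 100:.0f} in 100")
```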
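Absolute risk: The probability of an event in a population under study in a given period of time.
Formula: number of events / number of persons at risk. The difference between two absolute risks (risk in
exposed group - risk in unexposed group) is the absolute risk difference.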
Related topic: relative risk
Accuracy: General term denoting the absence of error of all kinds.
Adjustment: The summarizing procedure for a measure of association in which the effects of differences in
composition of the populations being compared have been minimized by statistical methods. There are several
statistical methods for adjustment, including multiple regression analysis, restriction, standardization and
matching.
Alpha error: see type I error
Analysis of variance (ANOVA): Test used to compare the mean values across multiple groups. The null
hypothesis for the ANOVA test is that the mean is the same for all groups, in the population.
Related topics: t-test
ANOVA: see Analysis of variance
Arithmetic mean: The average of a set of numerical values, calculated by adding them together and dividing
by the number of terms in the set.
Arm (of a trial): A group of study participants whose outcome in a study is compared with that of another
group. The arms of a trial are commonly categorized as experimental and control groups.
Ascertainment Bias: Systematic error arising from a failure to select a sample of study participants that
adequately represents the underlying population. This bias may arise because of the nature of the sources from
which participants are chosen, e.g. a specialized clinic or diagnostic process.
Association: Statistical dependence between two or more events, characteristics, or other variables. An
association may be present by chance or may be produced by various other circumstances; the presence of an
association does not necessarily imply a causal relationship.
Related topics: correlation, causation
Association, direct: Association is not via a known third variable.
Association, indirect: Association is via known other variables.
Attributable risk: Given a causal association between an exposure and an outcome, the attributable risk (AR)
is the incidence of disease associated with or due to the exposure among exposed individuals. Equivalently, the
AR is the incidence in the exposed group minus the incidence in the unexposed group. An alternate
interpretation of the attributable risk is the amount of disease in exposed persons that could be eliminated by
eliminating the exposure.
Example: If smoking truly causes kidney transplant failure, then the interpretation of the attributable risk
for smoking would be, “there are 27.3 additional kidney transplant failures per 1000 person-years among
transplant recipients who smoke.” Stated another way, “smokers with a kidney transplant incur an
estimated 27.3 extra kidney transplant failures per 1000 person-years.”
Formula:
In a cohort study:
AR = Incidence of outcome in exposed – incidence in unexposed
In a case-control study:
AR = Overall incidence of outcome in population / (Prevalence of exposure in population + [1/(RR-1)])
Attributable risk percent: Given a causal association between an exposure and an outcome, the proportion of
the occurrence of the disease in exposed individuals that is due to the exposure.
Formula:
In a cohort study:
AR% = ((Incidence of outcome in exposed – incidence in unexposed) / Incidence of outcome in exposed) * 100%
AR% = ((RR – 1)/RR) * 100% (equivalent form)
In a case-control study,
AR% = ((OR – 1)/OR) * 100%
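As a rough check on the cohort-study formulas, the sketch below computes the attributable risk and the attributable risk percent both ways. The incidence values are hypothetical and chosen only so the arithmetic is easy to follow.

```python
def attributable_risk(inc_exposed, inc_unexposed):
    """AR: incidence in exposed minus incidence in unexposed."""
    return inc_exposed - inc_unexposed

def ar_percent_from_incidence(inc_exposed, inc_unexposed):
    """AR% from the two incidences (cohort study)."""
    return (inc_exposed - inc_unexposed) / inc_exposed * 100

def ar_percent_from_ratio(ratio):
    """AR% from a relative risk (or, in a case-control study, an odds ratio)."""
    return (ratio - 1) / ratio * 100

# Hypothetical incidences per 1000 person-years (not from the glossary).
inc_exposed = 60.0
inc_unexposed = 40.0

ar = attributable_risk(inc_exposed, inc_unexposed)
rr = inc_exposed / inc_unexposed
ar_pct_incidence = ar_percent_from_incidence(inc_exposed, inc_unexposed)
ar_pct_rr = ar_percent_from_ratio(rr)

print(f"AR = {ar:.1f} per 1000 person-years")
print(f"AR% = {ar_pct_incidence:.1f}% (from incidences), {ar_pct_rr:.1f}% (from RR)")
```

Both routes to AR% give the same value because dividing the numerator and denominator of the first form by the incidence in the unexposed yields (RR – 1)/RR.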
B
Berkson's bias: A systematic error that occurs when hospital-based cases and controls have different exposures
than the population-based cases and controls. This occurs when the combination of exposure and disease under
study increases the risk of hospital admission, thus leading to a higher exposure rate among the hospital cases
than the hospital controls. In case–control studies, controls are often selected from the same hospital where
cases were found. Such controls are conveniently accessible for purposes of the study. The problem is that
hospitalized individuals are more likely to suffer from many illnesses, as well as more severe illnesses, and
engage in less healthy behaviors.
Beta coefficient: redirect to regression coefficient
Beta error: redirect to type II error
Bias: Deviation of results or inferences from the truth, due to any cause other than sampling variation.
Possible causes of bias include, but are not limited to, factors involved in the choice or recruitment of a
study sample and factors involved in the definition and measurement of study variables. The inverse of
bias is validity.
Bias due to confounding: Systematic error that occurs when exposed and unexposed individuals differ by
characteristics other than the exposure, and those characteristics are also related to the outcome, without being
in the causal pathway between the exposure and the outcome. The bias occurs when these characteristics
influence the study results.
Bias due to instrument error: Systematic error due to faulty calibration, inaccurate measurement by
instruments, contaminated reagents, incorrect dilution of reagents, etc.
Example: Say a weighing scale is not calibrated correctly, and the mass of the reference weight is
overestimated, then all future weights measured on that scale will be underestimated, resulting in a
systematic error in the measured mass of subjects.
Bias due to withdrawals: Systematic error due to the characteristics of those subjects who choose to withdraw
from the study.
Biological plausibility: The criterion that an observed, presumably or putatively causal association fits
previously existing biological knowledge. Associations that support proven biological mechanisms are more
likely to be causal than those not supported by scientific evidence.
Example: an observational study indicating associations of LDL cholesterol levels and heart disease is
supported by evidence from multiple parallel studies: basic science studies demonstrated LDL
cholesterol deposition in the arterial wall and translational studies showed enlargement of atherosclerotic
plaque size by angiography among patients with higher LDL cholesterol levels.
Biomarker, biological marker: substance used as an indicator of a biological state.
Example: Serum creatinine is a biomarker of kidney function
Biostatistics: application of statistics to biological or medical problems.
Blind(ed) study: A study in which observers and/or subjects are kept ignorant of the group to which the
subjects are assigned. Blinding study participants to the treatment assignment attempts to make the intervention
and control groups as similar as possible, including subjects’ expectations of therapy. Blinding study
investigators attempts to remove potential biases that may occur in study measurements and analysis.
Related topic: double-blind study
Block randomization: A sampling technique used to control for factors other than the exposure that may be
related to the outcome, termed nuisance factors or potential confounders. The basic concept is to create blocks
in which the nuisance factors, for example race and gender, are held constant and the factor of interest is
allowed to vary. Within blocks, it is possible to assess the effect of different levels of the factor of interest
without having to worry about variations due to changes of the block factors.
Bonferroni correction: This procedure compensates for the multiple comparison problem by setting a more
stringent p-value threshold for declaring a study result to be ‘significant.’
Example: If an experiment tests 25 risk factors for hypertension, the p-value threshold for declaring each
risk factor to be statistically significant would not be 0.05, but instead would be 0.05/25 = 0.002.
Related topic: Multiple comparison problem
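The threshold in this example can be reproduced with a short sketch. The risk-factor names and p-values below are hypothetical and serve only to show how the corrected threshold is applied.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold after Bonferroni correction."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests

threshold = bonferroni_threshold(0.05, 25)

# Hypothetical p-values for three of the 25 risk factors tested.
p_values = {"sodium intake": 0.0004, "age": 0.001, "coffee": 0.03}
significant = [name for name, p in p_values.items() if p < threshold]

print(f"threshold = {threshold:.3f}")
print("significant after correction:", significant)
```

Note that a p-value of 0.03 would pass the uncorrected 0.05 threshold but not the corrected one.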
Bradford Hill criteria: redirect to Hill Criteria of Causation
C
Case: A person in the population or study group identified as having the particular disease, health disorder, or
condition under investigation.
Case-Control Study: A study design that begins with the identification of one group of persons with the
outcome of interest (cases), and a suitable group of persons without the outcome (controls). The relationship of
an exposure to the outcome is examined by comparing the cases and controls with regard to either how
frequently the exposure is present or the levels of the exposure, in each of the groups.
Case series: A descriptive, observational study of a series of cases, typically describing the clinical course and
prognosis of a condition.
Case report: A description of a single case, typically describing the signs and symptoms, clinical course, and
prognosis of that case.
Categorical variable: A variable (sometimes called a nominal variable) that is grouped into two or more
categories.
Example: Body mass index is often expressed as a categorical variable, where observations are grouped
based on the World Health Organization’s classifications of underweight (BMI < 18.5 kg/m²), normal
(BMI 18.5-25), and overweight (BMI > 25).
Causality: The relating of causes to the effects they produce. A cause is termed “necessary” when it must
always precede an effect. This effect need not be the sole result of the one cause. A cause is termed “sufficient”
when it inevitably produces an effect.
Factors favoring an inference of causation
1. Evidence from RCT
2. Strength of association
3. Temporality
4. Dose-response
5. Biological plausibility
Censoring: The loss of subjects from a follow-up study. The occurrence of the outcome of interest among such
subjects is unknown after a specified time when it was known that the event of interest had not occurred. Such
subjects are described as censored.
Related topics: informative censoring
Census: A sample that includes every individual in a population or group.
Chi-square test: statistical test that is used to compare proportions between two different groups
Example: In order to test whether the proportion of coffee drinkers with hypertension is significantly
different from the proportion of non-coffee drinkers with hypertension, the chi-square test could be used.
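For a 2x2 table, the chi-square statistic can be computed by hand. The following is a minimal sketch, assuming hypothetical counts and no continuity correction; the p-value relies on the fact that a chi-square variable with 1 degree of freedom is the square of a standard normal variable.

```python
import math

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic (no continuity correction) and p-value
    for the 2x2 table [[a, b], [c, d]], which has 1 degree of freedom."""
    n = a + b + c + d
    statistic = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value

# Hypothetical counts: rows are coffee drinkers / non-coffee drinkers,
# columns are hypertension / no hypertension.
statistic, p_value = chi_square_2x2(60, 140, 40, 160)
print(f"chi-square = {statistic:.2f}, p = {p_value:.3f}")
```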
Clinical Trial: A research study that involves the administration of a test regimen to humans to evaluate its
efficacy and safety.
Phase I-IV Clinical Trials
A. Phase I studies seek to determine how well a drug is tolerated in humans and how large a dose
can be given before unacceptable toxicities occur.
B. Phase II studies are designed to evaluate whether a drug has biologic activity and to determine
safety and tolerability.
C. Phase III studies are randomized trials designed to assess the effectiveness and safety of an
intervention. Outcomes of phase III studies are typically clinical events, such as death or tumor-free
survival. Safety assessments occur over a longer period of time compared to phase II
studies.
D. Phase IV studies occur after FDA or other approval and typically focus on long-term safety
surveillance, evaluating outcomes associated with a drug or intervention as it is used in clinical
practice.
Coefficient: redirect to regression coefficient
Coefficient of determination: The square of the sample correlation coefficient, denoted by the Greek letter
rho-squared, ρ². It estimates the proportion of the variance in the dependent variable that is explained by or
accounted for by the independent variable(s) in a linear regression analysis. The coefficient of determination
ranges from 0 to 1, where a value of 1 indicates that the regression line perfectly fits the data.
Cohort: Any designated group of persons who share a common experience or condition and who are followed
or traced over a period of time.
Cohort Study: A study design that begins with the identification of one group of persons with the exposure of
interest, and another group of persons without the exposure. Both groups are followed over time to determine
whether or not they experience the outcome of interest. The purpose is to compare the exposed and unexposed
with regard to either (1) how frequently the outcome is present or (2) the levels of the outcome, in each of the
groups.
Strengths of a cohort study:
- Can directly estimate the incidence of the outcome in exposed and unexposed groups
- Exposure is known to precede the outcome
- Can study multiple outcomes of a single exposure
- Good for study of rare exposures
Limitations of a cohort study:
- Inefficient for rare outcomes
- Prospective studies can be expensive
- Not well-suited to study multiple exposures
Example: The Cardiovascular Health Study is a cohort study of risk factors for cardiovascular disease.
Starting in the late 1980s, investigators recruited 5888 adults over the age of 65 years, measured risk
factors (e.g. blood pressure) and followed the participants for over 20 years to capture outcomes of
interest (e.g. stroke and death) in the sample.
Confidence Interval: If a study is repeated an infinite number of times and a 95% confidence interval placed
around each parameter estimate (e.g. sample mean, sample proportion, sample relative risk) then 95% of the
intervals will contain the true population parameter. The confidence interval will be narrower when the sample
size is large and the variation is small.
Formula: For a 95% confidence interval around a sample mean:
the lower bound of the interval is the sample mean - 1.96*sample standard deviation/sqrt(n)
the upper bound of the interval is the sample mean + 1.96*sample standard deviation/sqrt(n)
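The formula above can be sketched in Python using only the standard library. The blood pressure values are hypothetical. The 1.96 multiplier is the normal approximation used in the formula; for small samples a multiplier from the t distribution is generally more appropriate.

```python
import math
import statistics

def mean_ci_95(values):
    """95% confidence interval around a sample mean (normal approximation)."""
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)            # sample standard deviation (n - 1 denominator)
    half_width = 1.96 * sd / math.sqrt(n)    # 1.96 * standard error of the mean
    return mean - half_width, mean + half_width

# Hypothetical systolic blood pressure measurements (mmHg).
sbp = [118, 126, 131, 122, 140, 135, 128, 124]
lower, upper = mean_ci_95(sbp)
print(f"mean = {statistics.mean(sbp):.1f}, 95% CI = ({lower:.1f}, {upper:.1f})")
```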
Confidence level: If one computes confidence intervals an infinite number of times from independent data, the
fraction of intervals that contain the parameter is the confidence level.
Confounder: A variable that is associated with the exposure of interest and the outcome of interest, and is not
an intermediate variable. Failure to control for such a variable may lead to a distorted and/or biased estimate of
the association between the exposure and the outcome.
Confounding: A situation in which the association between the exposure and outcome is distorted by other
variables that are associated with both the exposure and the outcome of interest.
Confounding can be detected by a substantial change in the coefficient of interest after including the potential
confounding variable in the multiple regression model.
Confounding by indication: A situation in which the reason a particular medication is prescribed and not the
medication itself, may be responsible for an observed association between the use of that medication and the
study outcome.
Example: Observational studies have found associations of calcium channel blockers for treatment of
hypertension with death and myocardial infarctions. Confounding by indication may explain these
findings, in that they may reflect the selective use of these drugs to treat the highest-risk hypertensive
patients and not the adverse effects of CCBs.
Continuous variable: a variable for which, within the range of the variable values, any value is possible.
Control: A person in the population not having the disease or outcome in question. If an outcome is rare,
controls are often selected to estimate the frequency of the exposure in the population.
Control group: As used in the expressions case-control study and randomized controlled trial, describes a
comparison group that differs in disease experience or allocation to treatment, respectively.
Controls, matched: Controls who are selected so that they are similar to the exposed group, or cases, in terms
of specific characteristics. Some commonly used matching variables are age group, sex and race.
Correlation Coefficient: Also known as Pearson product-moment correlation coefficient and denoted by the
Greek letter rho, ρ. Measure of association that indicates the degree to which two variables are linearly related.
The correlation coefficient ranges from −1 to 1. A value of 1 implies that a linear equation describes the
relationship between X and Y perfectly, with all data points lying on a line for which Y increases as X increases.
A value of −1 implies that all data points lie on a line for which Y decreases as X increases. A value of 0 implies
that there is no linear correlation between the variables.
Formula: correlation coefficient of X and Y = covariance(X, Y) / (SD_X * SD_Y)
Related topic: coefficient of determination
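A minimal sketch of the formula above, written without external libraries. The data are hypothetical and chosen to show the two extreme cases described in the definition.

```python
import math

def pearson_r(xs, ys):
    """Correlation coefficient: covariance(X, Y) / (SD_X * SD_Y)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / (n - 1))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / (n - 1))
    return cov / (sd_x * sd_y)

x = [1, 2, 3, 4, 5]
y_increasing = [2, 4, 6, 8, 10]   # Y increases linearly with X
y_decreasing = [10, 8, 6, 4, 2]   # Y decreases linearly with X

r_up = pearson_r(x, y_increasing)
r_down = pearson_r(x, y_decreasing)
print(f"r (increasing) = {r_up:.2f}, r (decreasing) = {r_down:.2f}")
print(f"coefficient of determination = {r_up ** 2:.2f}")
```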
Covariate: a variable that is possibly associated with the outcome. A covariate may be the exposure of interest,
a confounding variable or an interaction variable.
Covariance: A measure of how two variables change together
Related topic: variance
Cox model: redirect to proportional hazards model
Crossover design: longitudinal study in which each patient is randomly assigned to a sequence of treatments,
usually including the treatment of interest and a placebo. In crossover studies, the influence of confounding
variables is reduced because each patient serves as his or her own control.
Cross-sectional study: Study that examines the relationship between an outcome and another variable of
interest as they exist in a defined population at one particular moment in time. An important disadvantage of
cross-sectional studies is, in most scenarios, the inability to discern a temporal relationship between the
exposure and the outcome.
Cumulative Incidence: The number or proportion of a group of people who experience the onset of a
health-related event during a specified time interval.
D
Death rate: redirect to mortality rate
Degrees of freedom: The number of independent comparisons that can be made between the members of a
sample.
Example: If comparing two groups with N1 and N2 individuals, respectively, the degrees of freedom are
equal to N1 + N2 – 2.
Dependent variable: A variable the value of which is dependent on the effect of other variables – independent
variables – in the relationship under study. Often this is the outcome or condition of interest.
Descriptive Study: A study concerned with and designed only to describe the existing distribution of variables,
without regard to causal or other hypotheses.
Design: redirect to study design
Design Bias: The difference between a true value and that actually obtained, occurring as result of faulty design
of a study.
Detection Bias: Systematic error due to methods of ascertainment, diagnosis, or verification of cases.
Differential misclassification: Error in measurement of study data that results from systematic errors that occur
preferentially within a subset of a study population. Differential misclassification can either exaggerate or
underestimate an effect.
Differential misclassification of the exposure: The amount of error in measurement of the exposure differs
among subjects with or without the outcome. Case-control studies are highly susceptible to this form of bias.
Can result in observing a relative risk that is closer to or further from 1.0, depending on the particular situation.
Example: In a case-control study of smoking and lung cancer, participants with lung cancer are more
likely to carefully report their smoking habits, reducing the measurement error in the lung cancer group,
but not the non-diseased group.
Differential misclassification of the outcome: The error in measurement of the outcome differs between
subjects who are exposed or unexposed. Can result in observing a relative risk that is closer to or further from
1.0, depending on the particular situation.
Example: In a cohort study of coffee consumption and MI, if 20% of coffee drinkers are incorrectly
designated as having had an MI, whereas only 5% of non-coffee drinkers are incorrectly designated as
having had an MI, then differential misclassification of the outcome has occurred.
Distribution: the summary of the frequencies of the values or categories
Example: In the United States, in 2009, the gender distribution was: 155 million men and 159 million
women.
Dose-response relationship: Also known as biological gradient. A relationship in which a change in amount,
intensity, or duration of exposure is associated with a directional change in risk of a specified outcome.
Double-blind: A study in which both the observers and subjects are kept ignorant of the group to which the
subjects are assigned. This mitigates the placebo effect among subjects and protects against conscious or
unconscious prejudice for or against the treatment on the part of the observers.
Dropouts: Study participants who are lost to follow-up in a longitudinal study
Dummy variable: Variable that takes the value of 0 or 1 to indicate the absence or presence of a dichotomous
category.
E
Ecologic Fallacy: The bias that may occur when a population-level association is erroneously taken to imply a
similar individual-level association.
Ecologic Study: A study in which the units of analysis are populations or groups of people, rather than
individuals.
Effect Measure: Quantity that measures the magnitude of the association of a factor with the frequency or risk
of a health outcome.
Examples: relative risk, odds ratio, attributable risk, etc.
Effect Modification: The concept that the size of an effect or association differs according to another factor,
the effect modifier.
Effect Modifier: A factor according to which the size of an effect or association between two other factors
differs. Effect modification is assessed by examining the selected effect measure for the association under study across
levels (strata) of the potential effect modifier.
Effectiveness: The capacity for beneficial change (or therapeutic effect) of a given intervention under real-life
circumstances. A study of effectiveness asks the question: “Does an intervention work when given as it would
be in the real world?”
Related topic: efficacy
Efficacy: The capacity for beneficial change (or therapeutic effect) of a given intervention, under ideal
circumstances. A study of efficacy asks the question: “Can an intervention work when given under the most
optimal circumstances?”
Related topic: effectiveness
Eligibility Criteria: Criteria that must be satisfied in order to be selected for participation in a study or
inclusion in a cohort.
Example: In a cohort study of incident kidney transplantation, eligibility criteria could include age over
18 years, no previous kidney transplantation and the ability to provide informed consent.
Epidemic: The occurrence in a community or region of cases of an illness or health-related behavior clearly in
excess of normal expectancy.
Epidemiology: The study of the distribution and determinants of health-related states or events in populations,
and the application of this study to control of health problems.
Error, Type I: The error of rejecting a true null hypothesis.
Error, Type II: The error of failing to reject a false null hypothesis.
Excess risk: redirect to risk difference
Experimental study: A study in which the allocation or assignment of individuals is under the control of an
investigator, in contrast to an observational study, and thus can be randomized.
Explanatory variable: redirect to independent variable.
Exposed: Often used to connote a person or group whose members have been exposed to a supposed cause of a
disease or health state of interest, or possess a characteristic that is a determinant of the health outcome of
interest.
Exposure: characteristic that is a plausible determinant of an outcome of interest.
External validity: The extent to which a study’s findings apply to the population at large.
F
False Negative:
1) Negative test result in a subject who truly has the characteristic for which the test is conducted.
2) The labeling of a diseased person as non-diseased when screening for the disease.
False Positive:
1) Positive test result in a subject who truly does not have the characteristic for which the test is being
conducted.
2) The labeling of a healthy person as diseased when screening for the disease.
Follow-up: Observation over time of an individual, group, or initially defined population in order to observe
changes in health or outcome status.
Follow-up study: redirect to Cohort Study
Frequency: Number of occurrences of an event or characteristic
G
Generalizability: redirect to external validity
Genetic epidemiology: The study of the role of genetic factors in determining health and disease in families
and in populations, and the interplay of such genetic factors with environmental factors.
Geometric mean: The geometric mean of n numbers is the nth root of their product.
"Gold Standard": A method, procedure, or measurement that is widely accepted as being the best available.
Often used to compare against new methods.
H
Harmonic mean: the reciprocal of the arithmetic mean of the reciprocals
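The three means defined in this glossary (arithmetic, geometric and harmonic) can be compared on the same data using Python's standard `statistics` module. The values are hypothetical.

```python
import statistics

values = [1, 2, 4]

arithmetic = statistics.mean(values)            # (1 + 2 + 4) / 3
geometric = statistics.geometric_mean(values)   # cube root of 1 * 2 * 4
harmonic = statistics.harmonic_mean(values)     # 3 / (1/1 + 1/2 + 1/4)

print(f"arithmetic = {arithmetic:.3f}")
print(f"geometric  = {geometric:.3f}")
print(f"harmonic   = {harmonic:.3f}")
```

For positive values that are not all equal, the harmonic mean is smaller than the geometric mean, which in turn is smaller than the arithmetic mean.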
Hawthorne effect: a situation in which subjects are modifying their behavior simply because they are part of a
research study, not because of any intervention or exposure.
Hazard function: function describing how the risk of an outcome changes over time at the baseline levels of
covariates
Hazard ratio: measure of risk obtained from the Cox proportional hazards regression model, which is the ratio
of the average of slopes of two survival functions. It represents the ratio of the instantaneous risk of mortality.
In terms of interpretation, the hazard ratio is similar to the relative risk.
Healthy worker effect: A type of sampling bias which occurs when a study sample is recruited from the
workforce. Because an individual must be relatively healthy in order to be employable in a workforce, both
morbidity and mortality rates within the workforce are usually lower than in the general population. As a result,
the real excesses in both morbidity and mortality due to harmful exposures might be wholly or partially masked.
Hill’s Criteria of Causation: A list of factors pertaining to an epidemiologic study that add credence to (but do
not prove) an inference of causation.
1. Evidence arising from randomized studies
2. Strength of association
3. Temporal relationship between exposure and outcome
4. Exposure-varying association, also known as dose-response
5. Biological plausibility
Histogram: graphical representation of the distribution of a particular variable, with the x-axis being the value
of the variable and the y-axis being the frequency or proportion of participants with that value of the variable.
Hypothesis, null: The hypothesis we wish to falsify on the basis of the data. The null hypothesis is typically
that something is not present, that there is no effect, or that there is no difference between treatment and control,
in the population.
Hypothesis testing: Using a statistical test to make a decision between rejecting or not rejecting a null
hypothesis, on the basis of a sample of observations from the population.
I
Incidence: the number of new cases of disease that develop over time. Can be expressed as incidence
proportion or incidence rate.
Related topics: incidence proportion, incidence rate.
Learning point: Incidence vs. Prevalence
Incidence proportion: the number of new cases within a specified time period divided by the size of the
population initially at risk
Example: If a population initially contains 1,000 non-diseased persons and 42 develop a
condition over two years of observation, the incidence proportion is 42 cases per 1,000 persons, i.e.
4.2%.
Incidence rate: the number of new cases of a disease per unit of person-time at risk; unlike the incidence
proportion, it is a rate rather than a probability. It is calculated by dividing the number of new cases by the
total person-time at risk, which can be approximated by the product of the total number of susceptible people at
the beginning of the study period and the time of observation.
Example: If a population initially contains 1,000 non-diseased persons and 42 develop a
condition over two years of observation, the incidence rate is 21 cases per 1,000 person-years.
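The two worked examples use the same numbers, so they can be compared side by side. In this sketch the person-time denominator is the simple product used in the glossary, which assumes every person is observed for the full two years; the variable names are illustrative.

```python
new_cases = 42
initially_at_risk = 1000
years_observed = 2

# Incidence proportion: new cases divided by the population initially at risk.
incidence_proportion = new_cases / initially_at_risk

# Incidence rate: new cases divided by person-time (persons * years).
person_years = initially_at_risk * years_observed
incidence_rate = new_cases / person_years

print(f"incidence proportion = {incidence_proportion * 100:.1f}%")
print(f"incidence rate = {incidence_rate * 1000:.0f} per 1,000 person-years")
```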
Incidence rate ratio: The incidence rate in the exposed group divided by the incidence rate in the unexposed
group.
Independent variable: The variable hypothesized to explain the dependent variable, often referring to the
exposure variable.
Inference: The process of passing from observations and data to generalization to the population at large.
Information Bias: Systematic errors due to observer or interviewer errors (for example because of lack of
blinding), response errors (for example, because of lack of blinding) or measurement error.
Informative censoring: Occurs when the probability of being lost to follow up is different based on the
probability of failure.
Informed consent: Procedure to ensure that a study participant knows and understands all of the risks involved
in participation in the study. The elements of informed consent include informing the participant of the nature
of the treatment or intervention, possible alternative treatments, and the potential risks and benefits of the
treatment or intervention.
Intent to treat analysis: Analysis based on the initial treatment intent, not on the treatment eventually
administered. Intent to treat analysis preserves the original random allocation to treatment and avoids the effects
of crossover, noncompliance or drop out.
Intention to treat analysis: redirect to intent to treat
Inter-quartile range (IQR): The inter-quartile range of a set of continuous values is the upper quartile minus
the lower quartile.
Interaction: The interdependent association of two or more variables with an outcome
Related topic: effect modification
Internal validity: The extent to which study conclusions represent the truth for the individuals studied because
the results were not likely due to the effects of chance, bias, or confounding, and because the study design,
execution, and analysis were correct. Refers to the absence of systematic error that causes the study findings
(parameter estimates) to differ from the true values as defined in the study objectives.
Related topic: external validity
Intervention: An intentional change in the exposure status of study subjects
Related topic: randomized controlled trials
Interviewer bias: Systematic error due to interviewer’s subconscious or conscious gathering of selective data.
Intraclass correlation: A measure of the extent to which members of a group resemble each other more than
they resemble members of other groups.
K
Kaplan-Meier Estimator: estimator of the survival function from life-time data
Kappa coefficient: A measure of the degree of nonrandom agreement between observers or measurements of
the same categorical variable
Formula: κ = (P0 – Pe) / (1 – Pe) where P0 is the proportion of times the measurements agree, and Pe is
the proportion of times they can be expected to agree by chance alone.
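A minimal sketch of the kappa formula for two raters, assuming the ratings are summarized in a square table of counts. The counts are hypothetical, and the chance-expected agreement is computed from the row and column totals.

```python
def kappa_coefficient(table):
    """Kappa for a square agreement table, where table[i][j] is the number of
    subjects placed in category i by rater A and category j by rater B."""
    k = len(table)
    total = sum(sum(row) for row in table)
    # P0: observed proportion of agreement (diagonal cells).
    p_observed = sum(table[i][i] for i in range(k)) / total
    # Pe: agreement expected by chance, from the marginal totals.
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(k)) for j in range(k)]
    p_expected = sum(row_totals[i] * col_totals[i] for i in range(k)) / total ** 2
    return (p_observed - p_expected) / (1 - p_expected)

# Hypothetical ratings of 100 subjects by two observers (disease present / absent).
ratings = [[40, 10],
           [5, 45]]
kappa = kappa_coefficient(ratings)
print(f"kappa = {kappa:.2f}")
```

Here the observers agree on 85% of subjects, but 50% agreement would be expected by chance alone, so kappa is (0.85 – 0.50) / (1 – 0.50).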
Kurtosis: The extent to which the distribution of a variable is peaked.
L
Lead time: the time by which diagnosis of disease can be advanced by screening, as compared to the time at
which diagnosis would be made by other means (e.g. clinical presentation of overt signs of disease).
Lead time bias: Systematic error that occurs when survival is counted from the point in time when early
diagnosis was made. Even if screening is not effective, the early diagnosis adds lead time to the survival
counted from the time of usual diagnosis, resulting in an apparent longer survival time for screenees compared
to non-screenees.
Least Squares: An estimation principle in which the estimates of a set of parameters in a regression model are
those quantities that minimize the sum of the squared differences between the observed values of the dependent
variable and the values of the dependent variable predicted by the model.
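For a model with a single predictor, the least squares estimates have a closed form. The sketch below is an illustration under that assumption. The data points and function names are hypothetical.

```python
def least_squares_line(x, y):
    """Intercept a and slope b minimizing the sum of squared residuals of y = a + b*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def sum_squared_residuals(x, y, a, b):
    """Sum of squared differences between observed and predicted values."""
    return sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))


# Hypothetical data
x = [1, 2, 3, 4]
y = [2, 4, 5, 8]
a, b = least_squares_line(x, y)
ssr_fit = sum_squared_residuals(x, y, a, b)
# Any other line, e.g. one with a slightly steeper slope, gives a larger sum
ssr_other = sum_squared_residuals(x, y, a, b + 0.1)
```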
Length bias: Also known as length-biased sampling. Systematic error due to selection of disproportionate
numbers of long-duration cases (cases who survive the longest). This can occur when prevalent and not incident
cases are included in a case-control study, particularly when the exposure is also associated with the length of
the disease course.
Likelihood Function: A function constructed from a regression model and a set of observed data, which gives
the probability of the observed data. The regression coefficients that maximize the probability are the maximum
likelihood estimates of the regression coefficients.
Linear Model: A statistical model in which the value of a parameter for a given value of factors, x1, …, xn, is
assumed to be equal to a + b1x1 + … + bnxn, where a and b1, …, bn are constants.
Linear Regression: Regression analysis of data using linear models, usually with a continuous outcome
variable.
Logistic Model: A statistical model for an individual’s risk (the probability of a dichotomous disease state Y) as a
function of risk factors x1, …, xn:
P(Y | x1, …, xn) = 1 / (1 + e^-(α + β1x1 + … + βnxn))
Logistic regression: Statistical modeling approach used to describe the relationship of several variables to a
dichotomous dependent variable, such as a binary disease state.
Logit: The natural logarithm of the odds of a binary outcome.
Logit Model: A linear model for the logit of disease as a function of a quantitative factor X:
Logit (disease given X = x) = a + bx
This model is mathematically equivalent to the logistic model.
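The equivalence between the logit model and the logistic model can be checked numerically: applying the logit to the logistic risk returns the linear predictor a + bx. This is a sketch with hypothetical coefficient values, not estimates from any data set.

```python
import math


def logistic_risk(a, b, x):
    """Risk of disease under the logistic model with one factor x."""
    return 1 / (1 + math.exp(-(a + b * x)))


def logit(p):
    """Natural logarithm of the odds p / (1 - p)."""
    return math.log(p / (1 - p))


# Hypothetical coefficients and factor value
a, b = -3.0, 0.5
x = 4.0
risk = logistic_risk(a, b, x)
linear_predictor = a + b * x

# Under this model, each one-unit increase in x multiplies the odds by exp(b)
odds_at_x = risk / (1 - risk)
risk_next = logistic_risk(a, b, x + 1)
odds_at_x_plus_1 = risk_next / (1 - risk_next)
```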
Longitudinal analysis: A study in which measurements on study participants are made repeatedly over time.
The primary goal of a longitudinal study is to characterize change in the dependent variable over time and the
factors that influence change.
Loss to follow-up: The circumstance that occurs when researchers lose contact with some participants and thus
cannot complete planned data collection. This is a common cause of missing data, especially in cohort studies.
M
Mantel-Haenszel Test: A summary chi-square test for stratified data, used when controlling for
confounding.
Matching: The process of making a study group and a comparison group comparable with respect to
confounding factors. The goal of matching is to create a balanced distribution of cases and controls across strata
of a confounding factor. This enhances the ability to control for confounding and benefits statistical efficiency.
Maximum Likelihood Estimate: The value for an unknown parameter that maximizes the probability of
obtaining exactly the data that were observed.
Measure of Association: A quantity that expresses the strength of association between variables. Commonly
used measures of association are differences between means, proportions or rates, the relative risk, the odds
ratio, and correlation and regression coefficients.
Measurement Bias: Systematic error arising from inaccurate measurement (or classification) of subjects on the
study variables.
Related topics: differential and non-differential misclassification
Measurement error: Difference between the observed value of a quantity and its true value
Meta-Analysis: The process of using statistical methods to combine the results of different studies. A frequent
application has been the pooling of results from a number of small randomized controlled trials, none alone
large enough to demonstrate statistically significant differences, but in aggregate, capable of so doing.
Misclassification: The erroneous classification of an individual into a category other than that to which it
should be assigned. The probability of misclassification may be the same in all study groups (nondifferential
misclassification) or may vary between groups (differential misclassification).
Mortality rate: Number of deaths in a population over a given time interval, typically expressed in deaths per
1000 persons per year.
Multiple comparison problem: A problem that arises from the fact that the greater the number of statistical
tests conducted on a data set, the greater the probability that the tests will falsely reject the null hypothesis,
simply by chance. Because each individual hypothesis test has a type I error rate of 5% under the statistical
significance threshold of 0.05, every 20 hypothesis tests would be expected to yield one significant test due to
chance alone, even if none of the evaluated risk factors are truly associated with the outcome in the population.
Type I errors can arise when performing multiple hypothesis tests on the same data, for example a study
exploring a list of potential hypertension risk factors.
Example: genome-wide association studies (GWAS)
Related topic: see Bonferroni correction
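The arithmetic behind the multiple comparison problem can be sketched as follows. The family-wise calculation assumes the tests are independent and every null hypothesis is true, an assumption added here for illustration. The Bonferroni threshold shown is the usual per-test cutoff of alpha divided by the number of tests.

```python
alpha = 0.05    # per-test type I error rate
n_tests = 20    # number of hypothesis tests on the same data set

# Expected number of falsely significant tests when every null hypothesis is true
expected_false_positives = alpha * n_tests

# Probability of at least one false positive (assumes independent tests)
familywise_error = 1 - (1 - alpha) ** n_tests

# Bonferroni-corrected per-test significance threshold
bonferroni_threshold = alpha / n_tests
```

With 20 tests at the 0.05 level, one false positive is expected, and under independence the chance of at least one is about 64%.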
Multivariable regression: Regression analysis with more than one independent variable.
Multivariate regression: Regression analysis with more than one dependent variable.
N
Natural History of Disease: The course of a disease from onset to resolution. Many diseases have certain well-defined stages such as the presymptomatic stage and several clinically manifest stages that, taken together, are
referred to as the natural history of the disease.
Necessary cause: A causal factor whose presence is required for the occurrence of the effect.
Nested case-control study: A case-control study in which cases and controls are drawn from the population
of a cohort study.
Nomogram: A two-dimensional line chart designed for reading off the result of a two-variable function, where
two scales represent known values and a third scale shows the result.
Nondifferential misclassification: A situation in which the occurrence of measurement error is random, i.e. it
is not related to any other factor.
Nonparticipants: Members of a study sample or population who do not take part in the study for whatever
reason (e.g. refusal to participate, people without telephones who cannot be reached via random digit dialing
sampling). Differences between participants and nonparticipants are often a source of bias and can limit the
generalizability of study findings.
Normal distribution: Distribution of a continuous variable in a bell-shape, that is symmetrical around its mean
and in which the mean, mode and median are identical and its shape is completely determined by the mean and
standard deviation.
Null Hypothesis: The statistical hypothesis that one variable has no association with another variable, in the
population, or that two or more population distributions do not differ from one another. In simple terms, the null
hypothesis states that the results observed in a study are no different from what might have occurred as a result
of chance alone.
Number needed to treat: The number of patients who must be treated, for a given amount of time, in order to
prevent the occurrence of one case of the outcome. The number needed to treat is the reciprocal of the absolute
risk reduction (the difference in risk between the untreated and treated groups).
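A short Python sketch of the calculation, taking the number needed to treat as the reciprocal of the absolute risk reduction, i.e. the risk in the untreated group minus the risk in the treated group. The risks and the function name are hypothetical.

```python
def number_needed_to_treat(risk_untreated, risk_treated):
    """Reciprocal of the absolute risk reduction over a fixed follow-up period."""
    absolute_risk_reduction = risk_untreated - risk_treated
    if absolute_risk_reduction <= 0:
        raise ValueError("treatment does not reduce risk; NNT is undefined")
    return 1 / absolute_risk_reduction


# Hypothetical 5-year risks: 10% without treatment, 6% with treatment
nnt = number_needed_to_treat(0.10, 0.06)
```

With these invented risks the absolute risk reduction is 0.04, so about 25 patients would need to be treated for 5 years to prevent one case.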
O
Observational Study: Study in which nature is allowed to take its course; changes or differences in one
characteristic are studied in relation to changes or differences in other(s), without the intervention of the
investigator.
Observer Bias: Systematic difference between a true value and that actually observed, due to observer
variation.
Odds: The ratio of the probability of occurrence of an event to that of nonoccurrence, or the ratio of the
probability that something is so, to the probability that it is not.
Formula: p / (1 – p)
Odds Ratio: In a case-control study, the odds ratio is the ratio of the odds of exposure in the cases to the odds
of exposure among non-cases. In a cohort study, the odds ratio is the ratio of the odds of disease among the
exposed to the odds of disease among the unexposed. If the disease is rare, the odds ratio approximates the
relative risk.
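The following sketch computes odds and the odds ratio from a 2×2 table and compares the result with the risk ratio when the disease is rare. The cell labels a, b, c, d and the counts are assumptions made for illustration.

```python
def odds(p):
    """Odds corresponding to a probability p."""
    return p / (1 - p)


def odds_ratio(a, b, c, d):
    """Odds ratio from a 2x2 table.

    a: exposed with disease      b: exposed without disease
    c: unexposed with disease    d: unexposed without disease
    The cohort form (a/b)/(c/d) and the case-control form (a/c)/(b/d)
    both reduce to (a*d)/(b*c).
    """
    return (a * d) / (b * c)


def risk_ratio(a, b, c, d):
    """Ratio of the risk in the exposed to the risk in the unexposed."""
    return (a / (a + b)) / (c / (c + d))


# Hypothetical cohort with a rare disease (risks of 2% and 1%)
a, b, c, d = 20, 980, 10, 990
or_estimate = odds_ratio(a, b, c, d)
rr_estimate = risk_ratio(a, b, c, d)
```

Because the disease is rare in this invented cohort, the odds ratio (about 2.02) is close to the risk ratio (2.0).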
One-tailed test: A statistical test based on the assumption that the data have only one possible direction of
variability. This type of test is rarely used in epidemiologic studies.
Outcome: Condition that may stem from an exposure.
Over-adjustment: A situation in which unnecessary variables are included in a regression model. Adjusting for
variables in the pathway between exposure and outcome may obscure evidence of a true causal relationship.
Additionally, adjusting for variables that are not true confounders may reduce precision in the estimation of
measures of association.
Overmatching: A situation in which extraneous variables are matched on. Matching on variables in the
pathway between exposure and outcome may obscure evidence of a true causal relationship. Additionally,
matching on variables that are not true confounders may reduce precision in the estimation of measures of
association.
P
P (Probability) Value: Given a null hypothesis regarding the population, the p-value is the probability of
observing the sample result, or a more extreme result, through chance (sampling variation) alone. In most
epidemiologic work, a study result whose probability value is less than 5% (p<0.05) or 1% (p<0.01) is
considered sufficiently unlikely to have occurred by chance to justify the designation "statistically significant."
The size of the p-value generated in statistical hypothesis testing depends heavily on the size of the study
population: for a given association, the larger the number of subjects, the smaller the p-value. The p-value should
therefore not be used for any purpose other than evaluating the role of chance. The p-value is not a measure of excess disease risk.
Participant: Person upon whom research is conducted.
Peer-review: Process of review of research proposals, manuscripts and abstracts submitted for presentation at
scientific meetings, whereby these are judged for merit by other scientists in the same field.
Percentile: Divisions that produce 100 equal parts in a distribution of continuous values.
Person-Time: A measurement combining persons and time, used as the denominator in incidence rates and
mortality rates.
Placebo: An intervention with no pharmacological effect, intended to give participants the perception that they
are receiving treatment.
Placebo effect: A situation in which the beneficial effects of a placebo are due to the expectation that the
intervention will have an effect, and not to any pharmacological effects of the intervention itself.
Point prevalence: the proportion of a population that has a given characteristic at a single point in time.
Population: The collection of observations from which a sample may be drawn
Population Attributable Risk (PAR): Incidence of a disease in a population that is associated with exposure,
given that the exposure and the disease are causally related. The PAR is useful in determining if resources
should be allocated to controlling the exposure, or, instead, to exposures causing greater health problems in the
population.
Formula: Total incidence in all persons – incidence in non exposed
Population Attributable Risk % (PAR %): The portion of disease in the population that is caused by the
exposure, given that the exposure and the disease are causally related. The PAR% is useful to determine if
resources should be allocated to control of the exposure in question or to other exposures that cause a greater
proportion of the disease in the population.
Formula:
1. ((Total incidence in all persons – incidence in non exposed) / Total incidence in all persons) * 100%
2. ((RR – 1) / RR) * proportion of cases exposed * 100%
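The PAR and both PAR% formulas can be checked against each other on a small worked example. The population sizes and case counts below are invented for illustration.

```python
# Hypothetical cohort
exposed_n, exposed_cases = 3000, 90
unexposed_n, unexposed_cases = 7000, 70

incidence_total = (exposed_cases + unexposed_cases) / (exposed_n + unexposed_n)
incidence_exposed = exposed_cases / exposed_n
incidence_unexposed = unexposed_cases / unexposed_n

# PAR: total incidence in all persons minus incidence in the unexposed
par = incidence_total - incidence_unexposed

# PAR%, formula 1
par_percent_1 = par / incidence_total * 100

# PAR%, formula 2: ((RR - 1) / RR) * proportion of cases exposed * 100
rr = incidence_exposed / incidence_unexposed
proportion_cases_exposed = exposed_cases / (exposed_cases + unexposed_cases)
par_percent_2 = (rr - 1) / rr * proportion_cases_exposed * 100
```

In this example the total incidence is 1.6%, the incidence in the unexposed is 1.0%, the PAR is 0.6 per 100 persons, and both formulas give a PAR% of 37.5%.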
Population-based study: Study in which the subjects are drawn from a defined population in a manner that is
representative of the source population.
Positive Predictive Value: The probability that a person with a positive test is a true positive (truly has the
disease).
Power: The probability that a particular study will not make a type II error. In other words, power represents
the ability of a statistical test to detect some specified difference or effect.
Factors that affect study power:
1) the statistical significance criterion (p-value cutoff) used in the statistical test
2) the magnitude of the association of interest in the population
3) the sample size used to detect the association
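How these three factors move power can be illustrated with a normal approximation for a two-sided, one-sample z-test. This is a simplifying assumption made for the sketch. The function name, effect size and sample sizes are hypothetical, and the approximation ignores the small probability of rejecting in the wrong direction.

```python
from statistics import NormalDist


def approx_power(effect, sd, n, alpha=0.05):
    """Approximate power of a two-sided one-sample z-test.

    effect: true difference from the null value
    sd:     standard deviation of the measurements
    n:      sample size
    alpha:  statistical significance criterion
    """
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return NormalDist().cdf(abs(effect) / sd * n ** 0.5 - z_crit)


base = approx_power(effect=0.5, sd=1.0, n=32)                    # about 0.81
larger_sample = approx_power(effect=0.5, sd=1.0, n=64)           # power rises
larger_effect = approx_power(effect=0.8, sd=1.0, n=32)           # power rises
stricter_alpha = approx_power(effect=0.5, sd=1.0, n=32, alpha=0.01)  # power falls
```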
Precautionary principle: When a research activity raises the threat of harm to human health, precautionary
safety measures should be taken to protect the study participants, even if the risks are not fully understood.
Precision: A measure of the amount of random error surrounding an estimate. Confidence intervals are
computed to demonstrate the precision of relative risk estimates. The narrower the confidence interval, the more
precise the relative risk estimate.
Prevalence: see point prevalence.
Primary Prevention: Actions aimed at reducing the incidence of disease
Probability: A measure of the frequency of an outcome or an exposure in a population
Proportion: A part, share, or number considered in relation to a whole.
Proportional Hazards Model: A statistical model used in survival analysis that yields a hazard ratio, which is
very similar to a relative risk.
Proportional Mortality Ratio: the proportion of deaths from a specific condition in a defined population,
divided by the proportion of deaths expected from this condition in a standard population, expressed either on
an age-specific basis or after age adjustment.
Prospective study: Cohort study in which new data is collected.
Example: Investigators recruit participants who do not have a prior history of stroke, collect blood for
the measurement of a panel of novel serologic stroke markers, and then follow subjects for the
development of incident stroke.
Publication bias: Tendency of editors to publish articles containing positive results, in contrast to reports that
do not present “statistically significant” findings. Can be an important source of bias in meta-analyses.
Q
Quartile:
1) Each of four equal groups into which a population can be divided according to a
particular variable.
2) Each of the three values of the random variable that divide a population into four
such groups.
Quintile:
1) Each of five equal groups into which a population can be divided according to a
particular variable.
2) Each of the four values of the random variable that divide a population into five
such groups.
R
Randomization: Allocation of individuals to groups (e.g., to experimental vs. control regimens) by chance.
Within the limits of chance variation, randomization should make the control and experimental groups similar at
the beginning of a study.
Randomized Controlled Trial: A study in which subjects are randomly allocated into groups, to receive or not
receive an experimental intervention.
Learning point: strengths and limitations of RCTs
Random Sample: A sample that is arrived at by selecting people from a population such that each person has
the same probability of selection.
Rate: The frequency with which an event occurs in a defined population, over a given period of time.
Rate Difference: The absolute difference between two rates.
Rate Ratio: The ratio of two rates, often the ratio of the rate in the exposed group to the rate in the unexposed
group.
Recall Bias: Systematic error due to differences in accuracy or completeness of memory of prior events or
experiences. Recall bias is often a concern in case-control studies.
Regression: Technique used to find the best model to describe the association between two or more variables.
Multiple regression is widely used to adjust for confounding.
Regression coefficient: A measure of how much the dependent variable will change, on average, with each unit
change in the independent variables.
Relative Risk: A measure of association calculated as the ratio of the incidence rate of the outcome among the
exposed to the incidence rate among the unexposed.
An RR = 1 indicates that the incidence in the exposed is the same as that in the unexposed group and the
interpretation is that there is no association between exposure and disease.
RR > 1 denotes a larger incidence in the exposed than in the unexposed group, interpreted as the
exposure is associated with an increased probability of developing the disease.
RR < 1 denotes a smaller incidence in the exposed than in the unexposed group, interpreted as the
exposure is associated with a decreased probability of developing the disease.
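A minimal sketch of the relative risk calculation and its interpretation. The counts (6 cases per 100 exposed, 4 per 100 unexposed) and the function names are hypothetical.

```python
def relative_risk(cases_exposed, n_exposed, cases_unexposed, n_unexposed):
    """Ratio of the incidence in the exposed to the incidence in the unexposed."""
    incidence_exposed = cases_exposed / n_exposed
    incidence_unexposed = cases_unexposed / n_unexposed
    return incidence_exposed / incidence_unexposed


def interpret_rr(rr, tol=1e-9):
    """Translate an RR into the three interpretations listed above."""
    if abs(rr - 1) <= tol:
        return "no association"
    if rr > 1:
        return "increased probability of disease in the exposed"
    return "decreased probability of disease in the exposed"


rr = relative_risk(6, 100, 4, 100)   # 0.06 / 0.04 = 1.5
rr_interpretation = interpret_rr(rr)
```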
Reliability: Also known as test-retest reliability. The degree to which a measurement can be replicated and
produce the same results when it is repeated under identical conditions. Reliability is often measured using
Cronbach’s alpha.
Repeatability: see reliability
Replication: Conducting an experiment more than once to confirm the findings and increase precision.
Reporting Bias: see recall bias
Representative sample: A sample that resembles the underlying population from which it is drawn, in terms of
certain characteristics.
Residual: The difference between the observed values of an outcome and those predicted by the regression
equation.
Residual confounding: Confounding that still remains after applying techniques to control for confounding, for
example, stemming from unmeasured covariates or those measured with error.
Response Bias: Systematic error due to difference in characteristics between those who choose or volunteer to
participate in a study and those who do not.
Response rate: The number of people who completed a survey divided by the number of people who were
eligible and invited to complete the survey.
Retrospective Study: Cohort study in which data collected prior to the study launch is used.
Risk: The probability that an event will occur within a given period of time.
Risk difference: A measure of association calculated as the difference between the incidence rate of the outcome
among the exposed and the incidence rate among the unexposed.
An RD = 0 indicates that the incidence in the exposed is the same as that in the unexposed group and the
interpretation is that there is no association between exposure and disease.
RD > 0 denotes a larger incidence in the exposed than in the unexposed group, interpreted as the
exposure is associated with an increased probability of developing the disease.
RD < 0 denotes a smaller incidence in the exposed than in the unexposed group, interpreted as the
exposure is associated with a decreased probability of developing the disease.
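The risk difference (RD) is computed by subtraction rather than division, so its null value is 0 rather than 1. A short sketch with hypothetical counts:

```python
def risk_difference(cases_exposed, n_exposed, cases_unexposed, n_unexposed):
    """Incidence in the exposed minus incidence in the unexposed."""
    return cases_exposed / n_exposed - cases_unexposed / n_unexposed


# Hypothetical counts: 6 cases per 100 exposed, 4 cases per 100 unexposed
rd = risk_difference(6, 100, 4, 100)

# The same quantity expressed as excess cases per 1000 exposed persons
excess_per_1000 = rd * 1000
```

With these invented counts the RD is 0.02, or 20 excess cases per 1000 exposed persons.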
Risk Factor: A characteristic, either inborn or inherited or of personal behavior or lifestyle, or an
environmental exposure that is known to be associated with health-related outcomes.
Risk Ratio: The ratio of two risks.
S
Sample: A selected subset of a population. A sample may be random or nonrandom and may be representative
or nonrepresentative.
Sampling: The process of selecting a subset of subjects from all the subjects in a particular group.
Sampling Bias: Systematic error that occurs when certain members of the underlying population have a higher
chance of being sampled.
Sampling error: Uncertainty in study findings that occurs by conducting analyses on a sample instead of on the
entire population.
Sampling variability: Random error in the estimate of population-level parameters that occurs because only a
sample of the population is observed.
Sample size: Number of individuals (or groups) included in a sample.
Scatterplot: A graphic plot of data points for two variables.
Screening: Testing for the presence of a disease or other condition
Secondary Prevention: Actions aimed at shortening the duration of a disease
Selection Bias: Error due to systematic differences in characteristics between those who are selected for study
and those who are not. Selection bias threatens the generalizability of conclusions from studies that include only
volunteers from a healthy population.
Sensitivity analysis: A method to determine the robustness of the results by examining the extent to which they
are affected by changes in assumptions or values of variables.
Sensitivity of a test: The proportion of truly diseased people who are identified as diseased by the test.
Formula: True positives / (True positives + False negatives)
Skew: A term used to describe an asymmetrical frequency distribution.
Source population: Individuals who are eligible and can be approached for participation in a study.
Specificity of a test: The proportion of truly non-diseased people who are identified as non-diseased by the test.
Formula: True negatives / (True negatives + False positives)
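Both the sensitivity and the specificity formulas can be applied to the four cells of a diagnostic 2×2 table. The counts below are invented for illustration.

```python
def sensitivity(true_positives, false_negatives):
    """Proportion of truly diseased people whom the test identifies as diseased."""
    return true_positives / (true_positives + false_negatives)


def specificity(true_negatives, false_positives):
    """Proportion of truly non-diseased people whom the test identifies as non-diseased."""
    return true_negatives / (true_negatives + false_positives)


# Hypothetical validation study: 100 diseased and 100 non-diseased people
tp, fn = 80, 20
tn, fp = 90, 10
sens = sensitivity(tp, fn)   # 80 / 100
spec = specificity(tn, fp)   # 90 / 100
```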
Standard deviation: A measure of how dispersed a frequency distribution is around its mean. The larger the
standard deviation, the wider the spread in the distribution of the variable.
Standard error of the mean: The standard deviation of a sample mean estimate of a population mean, or the
standard deviation of the error in the sample mean relative to the true mean.
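The glossary does not give a formula. The usual estimate is the sample standard deviation divided by the square root of the sample size, which the sketch below assumes. The measurements are hypothetical.

```python
import math
import statistics


def standard_error_of_mean(values):
    """Sample standard deviation divided by the square root of the sample size."""
    return statistics.stdev(values) / math.sqrt(len(values))


# Hypothetical measurements
values = [2, 4, 4, 4, 5, 5, 7, 9]
sem = standard_error_of_mean(values)
```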
Standardization: A procedure used to remove the effects of differences in composition (such as age
distribution or gender ratio) when comparing rates for different populations, by using a weighted average in one
population using the weights from the composition of a second “standard” population.
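The weighted average described above corresponds to what is usually called direct standardization. In the sketch below, the stratum-specific rates, the standard population and the function name are hypothetical.

```python
def directly_standardized_rate(stratum_rates, standard_population):
    """Weighted average of stratum-specific rates, weighted by the standard population."""
    total = sum(standard_population)
    weighted_sum = sum(rate * size
                       for rate, size in zip(stratum_rates, standard_population))
    return weighted_sum / total


# Hypothetical age-specific mortality rates (young, middle-aged, old) in two populations
rates_a = [0.001, 0.005, 0.020]
rates_b = [0.001, 0.005, 0.030]
standard = [5000, 3000, 2000]   # hypothetical standard population by age group

adjusted_a = directly_standardized_rate(rates_a, standard)
adjusted_b = directly_standardized_rate(rates_b, standard)
```

Because both populations are weighted by the same standard age structure, the remaining difference between the adjusted rates cannot be due to differences in age composition.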
Standardized mortality ratio: The ratio of the number of deaths in a population to the number of deaths that
would be expected if the population had the same mortality rate as a standard population.
Statistical model: A mathematical representation of the relationship between variables under study.
Statistical significance: The likelihood of observing an association at least as large as the one seen, if in truth
no association is present, in the population.
Related topics: p-value
Statistical test: Procedure used to decide whether a study hypothesis should be rejected or not.
Stratification: Process of separating data into several categories, or strata (e.g. age groups, gender, etc.), in
order to examine associations between an exposure and an outcome within each category, where confounding
by the stratifying variable cannot take place.
Stratified randomization: Procedure of randomization in which strata or categories are identified and subjects
are randomly allocated within each stratum.
Sufficient cause: A minimum set of conditions needed to produce a given outcome.
Sum of squared residuals: Also known as the residual sum of squares, is a measure of discrepancy between
the observed data and that predicted from the model. The smaller the residual sum of squares, the better (more
tightly) the model is fit to the data.
Surveillance: Continuous monitoring of disease occurrence within a group.
Survey: A study in which information is systematically gathered but in which no experiment is conducted.
Survival analysis: Statistical procedure for evaluating the association between exposures and the probability of
survival, over time.
Survival curve: A curve that starts at 100% of the study population and shows the percentage of the population
still surviving at successive times for as long as information is available.
Survival function: A function of time that starts with a population 100% alive (or non-diseased) at a particular
time and provides the percentage of the population still alive at another time.
Systematic error: see bias
Systematic review: Literature review focused on a single question that tries to identify, appraise, select and
synthesize all high quality research evidence relevant to that question.
T
Target population: The group of individuals about which a study aims to make inferences. The term is
sometimes used to indicate the population from which a sample is drawn and sometimes to denote any
"reference" population about which inferences are required.
t-test: Statistical test used to compare mean values between two different groups
Tertiary prevention: Actions aimed at reducing the extent or number of complications of a disease.
Test of significance: see p-value
Time series: A study design in which there is only one exposure group in whom measurements are made at
several different times, allowing trends to be detected.
Time-to-event analysis: see survival analysis
Triple blind study: A study in which the subjects, investigators and analysts are blinded as to which subjects
received what treatment.
Type I error: Occurs when a hypothesis test declares a result to be statistically significant even though the null
hypothesis is true (there is no true effect or association in the population).
Type II error: Occurs when a hypothesis test declares a result to be not statistically significant even though there
is a true difference in the population.
Type III error: Occurs when study design produces the “right answer to the wrong question,” i.e. an error in
selection of the method of studying a problem.
V
Validity, Study: The degree to which the inference drawn from a study is warranted.
Learning point: Internal validity and external validity.
Venn diagram: A diagram representing sets of data pictorially as circles, with common elements of the sets
being represented by intersections of the circles.
Verification bias: see work-up bias.
Vital Statistics: Quantitative data concerning a population, such as the number of births, marriages, and deaths.
W
Washout period: In a crossover study design, the period of time between patients receiving interventions; it is
used to ensure that patients are free of the influence of one intervention before they begin receiving another.
Weighted average: A method of calculating the mean of a set of numbers in which some elements of the set
carry more importance (or weight) than others.
Work-up bias: Systematic error that occurs when the sample used to assess a measurement tool (e.g.,
diagnostic test) is restricted only to those who have the condition or factor being measured, so that the
sensitivity of the measure can be overestimated.
Z
Z score: A standard score that indicates how many standard deviations an observation is above or below the
mean.
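A minimal sketch of the Z score calculation: the observation minus the mean, divided by the standard deviation. The values and the function name are hypothetical.

```python
def z_score(observation, mean, standard_deviation):
    """Number of standard deviations an observation lies above (+) or below (-) the mean."""
    if standard_deviation <= 0:
        raise ValueError("standard deviation must be positive")
    return (observation - mean) / standard_deviation


# Hypothetical measurement scale with mean 100 and standard deviation 15
z_above = z_score(130, 100, 15)   # two standard deviations above the mean
z_below = z_score(85, 100, 15)    # one standard deviation below the mean
```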