EPIDEMIOLOGY GLOSSARY
A
Absolute effect measures: The difference in incidence rates, incidence proportions, or prevalences between two exposure groups. Absolute effect measures are important for counseling individual patients and for understanding the impact of the disease on the population. Example: Say the absolute risk of developing a disease is 4 in 100 in non-smokers, and the relative risk of the disease is 1.5 in smokers compared to non-smokers. The 1.5 applies to the 4, so the absolute increase in risk is 50% of 4, which is 2. The absolute risk of smokers developing this disease is therefore 6 in 100. Related topic: relative effect measures
Absolute risk: The probability of an event in a population under study in a given period of time. Formula (for the difference in absolute risk between two groups): rate in exposed group - rate in unexposed group. Related topic: relative risk
Accuracy: General term denoting the absence of error of all kinds.
Adjustment: The summarizing procedure for a measure of association in which the effects of differences in composition of the populations being compared have been minimized by statistical methods. There are several methods for adjustment, including multiple regression analysis, restriction, standardization, and matching.
Alpha error: see type I error
Analysis of variance (ANOVA): Test used to compare the mean values across multiple groups. The null hypothesis for the ANOVA test is that the population mean is the same for all groups. Related topic: t-test
ANOVA: see Analysis of variance
Arithmetic mean: The average of a set of numerical values, calculated by adding them together and dividing by the number of terms in the set.
Arm (of a trial): A group of study participants whose outcome in a study is compared with that of another group. The arms of a trial are commonly categorized as experimental and control groups.
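The arithmetic in the "Absolute effect measures" example above can be checked with a short sketch. The function names are illustrative assumptions and not part of the glossary; the inputs (a baseline risk of 4 in 100 and a relative risk of 1.5) are the ones given in that example.

```python
def risk_in_exposed(baseline_risk: float, relative_risk: float) -> float:
    """Absolute risk in the exposed group, given the unexposed (baseline) risk."""
    return baseline_risk * relative_risk


def absolute_risk_increase(baseline_risk: float, relative_risk: float) -> float:
    """Absolute difference in risk between the exposed and unexposed groups."""
    return risk_in_exposed(baseline_risk, relative_risk) - baseline_risk


# Values from the glossary example: non-smokers 4 in 100, relative risk 1.5.
baseline = 4 / 100
rr = 1.5

print(f"Risk in smokers: {risk_in_exposed(baseline, rr) * 100:.1f} per 100")
print(f"Absolute increase: {absolute_risk_increase(baseline, rr) * 100:.1f} per 100")
```

The same relative risk applied to a rarer disease would give a much smaller absolute increase, which is why both kinds of measure are usually reported together.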
Ascertainment Bias: Systematic error arising from a failure to select a sample of study participants that adequately represents the underlying population. This bias may arise because of the nature of the sources from which participants are chosen, e.g. a specialized clinic or diagnostic process.
Association: Statistical dependence between two or more events, characteristics, or other variables. An association may be present by chance or may be produced by various other circumstances; the presence of an association does not necessarily imply a causal relationship. Related topics: correlation, causation
Association, direct: An association that is not via a known third variable.
Association, indirect: An association that is via known other variables.
Attributable risk: Given a causal association between an exposure and an outcome, the attributable risk (AR) is the incidence of disease associated with or due to the exposure among exposed individuals. Equivalently, the AR is the incidence in the exposed group minus the incidence in the unexposed group. An alternate interpretation of the attributable risk is the amount of disease in exposed persons that could be eliminated by eliminating the exposure. Example: If smoking truly causes kidney transplant failure, then the interpretation of the attributable risk for smoking would be, “there are 27.3 additional kidney transplant failures per 1000 person-years among transplant recipients who smoke.” Stated another way, “smokers with a kidney transplant incur an estimated 27.3 extra kidney transplant failures per 1000 person-years.” Formula: In a cohort study: AR = incidence of outcome in exposed - incidence in unexposed. In a case-control study: AR = overall incidence of outcome in population / (prevalence of exposure in population + [1/(RR - 1)])
Attributable risk percent: Given a causal association between an exposure and an outcome, the proportion of disease occurrence in exposed individuals that is due to the exposure.
Formula: In a cohort study: (1) AR% = ((incidence of outcome in exposed - incidence in unexposed) / incidence of outcome in exposed) * 100%; (2) AR% = ((RR - 1)/RR) * 100%. In a case-control study: AR% = ((OR - 1)/OR) * 100%
B
Berkson's bias: A systematic error that occurs when hospital-based cases and controls have different exposures than population-based cases and controls. This occurs when the combination of exposure and disease under study increases the risk of hospital admission, thus leading to a higher exposure rate among the hospital cases than the hospital controls. In case-control studies, controls are often selected from the same hospital where cases were found. Such controls are conveniently accessible for purposes of the study. The problem is that hospitalized individuals are more likely to suffer from many illnesses, as well as more severe illnesses, and to engage in less healthy behaviors.
Beta coefficient: see regression coefficient
Beta error: see type II error
Bias: Deviation of results or inferences from the truth, due to any cause other than sampling variation. Possible causes of bias include, but are not limited to, factors involved in the choice or recruitment of a study sample and factors involved in the definition and measurement of study variables. The inverse of bias is validity.
Bias due to confounding: Systematic error that occurs when exposed and unexposed individuals differ by characteristics other than the exposure, and those characteristics are also related to the outcome, without being in the causal pathway between the exposure and the outcome. The bias occurs when these characteristics influence the study results.
Bias due to instrument error: Systematic error due to faulty calibration, inaccurate measurement by instruments, contaminated reagents, incorrect dilution of reagents, etc.
Example: If a weighing scale is calibrated incorrectly such that the mass of the reference weight is overestimated, then all subsequent weights measured on that scale will be underestimated, resulting in a systematic error in the measured mass of subjects.
Bias due to withdrawals: Systematic error due to the characteristics of those subjects who choose to withdraw from the study.
Biological plausibility: The criterion that an observed, presumably or putatively causal association fits previously existing biological knowledge. Associations consistent with proven biological mechanisms are more likely to be causal than those not supported by scientific evidence. Example: An observational study indicating an association between LDL cholesterol levels and heart disease is supported by evidence from multiple parallel studies: basic science studies demonstrated LDL cholesterol deposition in the arterial wall, and translational studies using angiography showed larger atherosclerotic plaques among patients with higher LDL cholesterol levels.
Biomarker, biological marker: Substance used as an indicator of a biological state. Example: Serum creatinine is a biomarker of kidney function.
Biostatistics: Application of statistics to biological or medical problems.
Blind(ed) study: A study in which observers and/or subjects are kept ignorant of the group to which the subjects are assigned. Blinding study participants to the treatment assignment attempts to make the intervention and control groups as similar as possible, including subjects’ expectations of therapy. Blinding study investigators attempts to remove potential biases that may occur in study measurements and analysis. Related topic: double-blind study
Block randomization: A sampling technique used to control for factors other than the exposure that may be related to the outcome, termed nuisance factors or potential confounders.
The basic concept is to create blocks in which the nuisance factors, for example race and gender, are held constant and the factor of interest is allowed to vary. Within blocks, it is possible to assess the effect of different levels of the factor of interest without having to worry about variations due to changes of the block factors.
Bonferroni correction: This procedure compensates for the multiple comparison problem by setting a more stringent p-value threshold for declaring a study result to be ‘significant.’ Example: If an experiment tests 25 risk factors for hypertension, the p-value threshold for declaring each risk factor to be statistically significant would not be 0.05, but instead would be 0.05/25 = 0.002. Related topic: multiple comparison problem
Bradford Hill criteria: see Hill’s Criteria of Causation
C
Case: A person in the population or study group identified as having the particular disease, health disorder, or condition under investigation.
Case-Control Study: A study design that begins with the identification of one group of persons with the outcome of interest (cases), and a suitable group of persons without the outcome (controls). The relationship of an exposure to the outcome is examined by comparing the cases and controls with regard to either how frequently the exposure is present or the levels of the exposure in each of the groups.
Case series: A descriptive, observational study of a series of cases, typically describing the clinical course and prognosis of a condition.
Case report: A description of a single case, typically describing the signs and symptoms, clinical course, and prognosis of that case.
Categorical variable: A variable (sometimes called a nominal variable) that is grouped into two or more categories.
Example: Body mass index is often expressed as a categorical variable, where observations are grouped based on the World Health Organization’s classifications of underweight (BMI < 18.5 kg/m^2), normal (BMI 18.5-25), and overweight (BMI > 25).
Causality: The relating of causes to the effects they produce. A cause is termed “necessary” when it must always precede an effect. This effect need not be the sole result of the one cause. A cause is termed “sufficient” when it inevitably produces an effect. Factors favoring an inference of causation:
1. Evidence from randomized controlled trials (RCTs)
2. Strength of association
3. Temporality
4. Dose-response
5. Biological plausibility
Censoring: The loss of subjects from a follow-up study. The occurrence of the outcome of interest among such subjects is unknown after a specified time when it was known that the event of interest had not occurred. Such subjects are described as censored. Related topic: informative censoring
Census: A sample that includes every individual in a population or group.
Chi-square test: Statistical test that is used to compare proportions between two different groups. Example: To test whether the proportion of coffee drinkers with hypertension is significantly different from the proportion of non-coffee drinkers with hypertension, the chi-square test could be used.
Clinical Trial: A research study that involves the administration of a test regimen to humans to evaluate its efficacy and safety. Phase I-IV clinical trials:
A. Phase I studies seek to determine how well a drug is tolerated in humans and how large a dose can be given before unacceptable toxicities occur.
B. Phase II studies are designed to evaluate whether a drug has biologic activity and to determine safety and tolerability.
C. Phase III studies are randomized trials designed to assess the effectiveness and safety of an intervention. Outcomes of phase III studies are typically clinical events, such as death or tumor-free survival.
Safety assessments occur over a longer period of time than in phase II studies.
D. Phase IV studies occur after FDA or other approval and typically focus on long-term safety surveillance, evaluating outcomes associated with a drug or intervention as it is used in clinical practice.
Coefficient: see regression coefficient
Coefficient of determination: The square of the sample correlation coefficient, denoted by the Greek letter rho squared, ρ². It estimates the proportion of the variance in the dependent variable that is explained by or accounted for by the independent variable(s) in a linear regression analysis. The coefficient of determination ranges from 0 to 1, where a value of 1 indicates that the regression line perfectly fits the data.
Cohort: Any designated group of persons who share a common experience or condition and who are followed or traced over a period of time.
Cohort Study: A study design that begins with the identification of one group of persons with the exposure of interest, and another group of persons without the exposure. Both groups are followed over time to determine whether or not they experience the outcome of interest. The purpose is to compare the exposed and unexposed with regard to either (1) how frequently the outcome is present or (2) the levels of the outcome in each of the groups.
Strengths of a cohort study:
- Can directly estimate the incidence of the outcome in exposed and unexposed groups
- Exposure is known to precede the outcome
- Can study multiple outcomes of a single exposure
- Good for study of rare exposures
Limitations of a cohort study:
- Inefficient for rare outcomes
- Prospective studies can be expensive
- Not well-suited to study multiple exposures
Example: The Cardiovascular Health Study is a cohort study of risk factors for cardiovascular disease. Starting in the late 1980s, investigators recruited 5888 adults over the age of 65 years, measured risk factors (e.g. blood pressure), and followed the participants for over 20 years to capture outcomes of interest (e.g. stroke and death) in the sample.
Confidence Interval: If a study is repeated an infinite number of times and a 95% confidence interval is placed around each parameter estimate (e.g. sample mean, sample proportion, sample relative risk), then 95% of the intervals will contain the true population parameter. The confidence interval will be narrower when the sample size is large and the variation is small. Formula: For a 95% confidence interval around a sample mean:
lower bound = sample mean - 1.96 * sample standard deviation / sqrt(n)
upper bound = sample mean + 1.96 * sample standard deviation / sqrt(n)
Confidence level: If one computes confidence intervals an infinite number of times from independent data, the fraction of intervals that contain the parameter is the confidence level.
Confounder: A variable that is associated with the exposure of interest and the outcome of interest, and is not an intermediate variable. Failure to control for such a variable may lead to a distorted and/or biased estimate of the association between the exposure and the outcome.
Confounding: A situation in which the association between the exposure and outcome is distorted by other variables that are associated with both the exposure and the outcome of interest. Confounding can be detected by a substantial change in the coefficient of interest after including the potential confounding variable in the multiple regression model.
Confounding by indication: A situation in which the reason a particular medication is prescribed, and not the medication itself, may be responsible for an observed association between the use of that medication and the study outcome.
Example: Observational studies have found associations of calcium channel blockers for treatment of hypertension with death and myocardial infarctions.
Confounding by indication may explain these findings, in that they may reflect the selective use of these drugs to treat the highest-risk hypertensive patients and not the adverse effects of calcium channel blockers.
Continuous variable: A variable for which, within the range of the variable values, any value is possible.
Control: A person in the population not having the disease or outcome in question. If an outcome is rare, controls are often selected to estimate the frequency of the exposure in the population.
Control group: As used in the expressions case-control study and randomized controlled trial, describes a comparison group that differs in disease experience or allocation to treatment, respectively.
Controls, matched: Controls who are selected so that they are similar to the exposed group, or cases, in terms of specific characteristics. Some commonly used matching variables are age group, sex, and race.
Correlation Coefficient: Also known as the Pearson product-moment correlation coefficient and denoted by the Greek letter rho, ρ. Measure of association that indicates the degree to which two variables are linearly related. The correlation coefficient ranges from -1 to 1. A value of 1 implies that a linear equation describes the relationship between X and Y perfectly, with all data points lying on a line for which Y increases as X increases. A value of -1 implies that all data points lie on a line for which Y decreases as X increases. A value of 0 implies that there is no linear correlation between the variables. Formula: correlation coefficient of X and Y = covariance(X, Y) / (SD_X * SD_Y). Related topic: coefficient of determination
Covariate: A variable that is possibly associated with the outcome. A covariate may be the exposure of interest, a confounding variable, or an interaction variable.
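The formula in the Correlation Coefficient entry above can be sketched in a few lines. This is a minimal illustration, assuming sample (n - 1) denominators for both the covariance and the standard deviations; the function name `pearson_r` and the data are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Sample correlation: covariance(X, Y) / (SD_X * SD_Y)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sample covariance and standard deviations, all with the n - 1 denominator.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / (n - 1))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / (n - 1))
    return cov / (sd_x * sd_y)


# Hypothetical data, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]
r = pearson_r(x, y)
print(f"r = {r:.2f}, r squared = {r * r:.2f}")
```

Squaring the result gives the coefficient of determination for a simple linear regression of Y on X.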
Covariance: A measure of how two variables change together. Related topic: variance
Cox model: see proportional hazards model
Crossover design: Longitudinal study in which each patient is randomly assigned to a sequence of treatments, usually including the treatment of interest and a placebo. In crossover studies, the influence of confounding variables is reduced because each patient serves as his or her own control.
Cross-sectional study: Study that examines the relationship between an outcome and another variable of interest as they exist in a defined population at one particular moment in time. An important disadvantage of cross-sectional studies is, in most scenarios, the inability to discern a temporal relationship between the exposure and the outcome.
Cumulative Incidence: The number or proportion of a group of people who experience the onset of a health-related event during a specified time interval.
D
Death rate: see mortality rate
Degrees of freedom: The number of independent comparisons that can be made between the members of a sample. Example: If comparing two groups with N1 and N2 individuals, respectively, the degrees of freedom are equal to N1 + N2 - 2.
Dependent variable: A variable whose value depends on the effect of other variables (independent variables) in the relationship under study. Often this is the outcome or condition of interest.
Descriptive Study: A study concerned with and designed only to describe the existing distribution of variables, without regard to causal or other hypotheses.
Design: see study design
Design Bias: The difference between a true value and that actually obtained, occurring as a result of faulty design of a study.
Detection Bias: Systematic error due to methods of ascertainment, diagnosis, or verification of cases.
Differential misclassification: Error in measurement of study data that results from systematic errors that occur preferentially within a subset of a study population.
Differential misclassification can either exaggerate or underestimate an effect.
Differential misclassification of the exposure: The amount of error in measurement of the exposure differs between subjects with and without the outcome. Case-control studies are highly susceptible to this form of bias. It can result in observing a relative risk that is closer to or further from 1.0, depending on the particular situation. Example: In a case-control study of smoking and lung cancer, participants with lung cancer are more likely to carefully report their smoking habits, reducing the measurement error in the lung cancer group, but not in the non-diseased group.
Differential misclassification of the outcome: The error in measurement of the outcome differs between subjects who are exposed and unexposed. It can result in observing a relative risk that is closer to or further from 1.0, depending on the particular situation. Example: In a cohort study of coffee consumption and myocardial infarction (MI), if 20% of coffee drinkers are incorrectly designated as having had an MI, whereas only 5% of non-coffee drinkers are incorrectly designated as having had an MI, then differential misclassification of the outcome has occurred.
Distribution: The summary of the frequencies of the values or categories of a variable. Example: In the United States, in 2009, the gender distribution was: 155 million men and 159 million women.
Dose-response relationship: Also known as biological gradient. A relationship in which a change in amount, intensity, or duration of exposure is associated with a directional change in risk of a specified outcome.
Double-blind: A study in which both the observers and subjects are kept ignorant of the group to which the subjects are assigned. This mitigates the placebo effect among subjects and protects against conscious or unconscious prejudice for or against the treatment on the part of the observers.
Dropouts: Study participants who are lost to follow-up in a longitudinal study.
Dummy variable: Variable that takes the value of 0 or 1 to indicate the absence or presence of a dichotomous category.
E
Ecologic Fallacy: The bias that may occur when a population-level association is erroneously taken to imply a similar individual-level association.
Ecologic Study: A study in which the units of analysis are populations or groups of people, rather than individuals.
Effect Measure: Quantity that measures the magnitude of the association of a factor with the frequency or risk of a health outcome. Examples: relative risk, odds ratio, attributable risk, etc.
Effect Modification: The concept that the size of an effect or association differs according to another factor, the effect modifier.
Effect Modifier: A factor according to which the size of an effect or association between two other factors differs. Effect modification is assessed by examining the selected effect measure for the association under study across levels (strata) of the potential effect modifier.
Effectiveness: The capacity for beneficial change (or therapeutic effect) of a given intervention under real-life circumstances. A study of effectiveness asks the question: “Does an intervention work when given as it would be in the real world?” Related topic: efficacy
Efficacy: The capacity for beneficial change (or therapeutic effect) of a given intervention under ideal circumstances. A study of efficacy asks the question: “Can an intervention work when given under optimal circumstances?” Related topic: effectiveness
Eligibility Criteria: Criteria that must be satisfied in order to be selected for participation in a study or inclusion in a cohort. Example: In a cohort study of incident kidney transplantation, eligibility criteria could include age over 18 years, no previous kidney transplantation, and the ability to provide informed consent.
Epidemic: The occurrence in a community or region of cases of an illness or health-related behavior clearly in excess of normal expectancy.
Epidemiology: The study of the distribution and determinants of health-related states or events in populations, and the application of this study to control of health problems.
Error, Type I: The error of rejecting a true null hypothesis.
Error, Type II: The error of failing to reject a false null hypothesis.
Excess risk: see risk difference
Experimental study: A study in which the allocation or assignment of individuals is under the control of an investigator, in contrast to an observational study, and thus can be randomized.
Explanatory variable: see independent variable
Exposed: Often used to connote a person or group whose members have been exposed to a supposed cause of a disease or health state of interest, or possess a characteristic that is a determinant of the health outcome of interest.
Exposure: Characteristic that is a plausible determinant of an outcome of interest.
External validity: The extent to which a study’s findings apply to the population at large.
F
False Negative: 1) Negative test result in a subject who truly has the characteristic for which the test is conducted. 2) The labeling of a diseased person as non-diseased when screening for the disease.
False Positive: 1) Positive test result in a subject who truly does not have the characteristic for which the test is being conducted. 2) The labeling of a healthy person as diseased when screening for the disease.
Follow-up: Observation over time of an individual, group, or initially defined population in order to observe changes in health or outcome status.
Follow-up study: see Cohort Study
Frequency: Number of occurrences of an event or characteristic.
G
Generalizability: see external validity
Genetic epidemiology: The study of the role of genetic factors in determining health and disease in families and in populations, and the interplay of such genetic factors with environmental factors.
Geometric mean: The geometric mean of n numbers is the nth root of their product.
"Gold Standard": A method, procedure, or measurement that is widely accepted as being the best available. Often used to compare against new methods.
H
Harmonic mean: The reciprocal of the arithmetic mean of the reciprocals.
Hawthorne effect: A situation in which subjects modify their behavior simply because they are part of a research study, not because of any intervention or exposure.
Hazard function: Function describing how the risk of an outcome changes over time at the baseline levels of covariates.
Hazard ratio: Measure of association obtained from the Cox proportional hazards regression model. It represents the ratio of the instantaneous risk (hazard) of the outcome, such as mortality, between two groups. In terms of interpretation, the hazard ratio is similar to the relative risk.
Healthy worker effect: A type of sampling bias which occurs when a study sample is recruited from the workforce. Because an individual must be relatively healthy in order to be employable in a workforce, both morbidity and mortality rates within the workforce are usually lower than in the general population. As a result, the real excesses in both morbidity and mortality due to harmful exposures might be wholly or partially masked.
Hill’s Criteria of Causation: A list of factors pertaining to an epidemiologic study that add credence to (but do not prove) an inference of causation.
1. Evidence arising from randomized studies
2. Strength of association
3. Temporal relationship between exposure and outcome
4. Exposure-varying association, also known as dose-response
5. Biological plausibility
Histogram: Graphical representation of the distribution of a particular variable, with the x-axis being the value of the variable and the y-axis being the frequency or proportion of participants with that value of the variable.
Hypothesis, null: The hypothesis we wish to falsify on the basis of the data. The null hypothesis is typically that something is not present, that there is no effect, or that there is no difference between treatment and control in the population.
Hypothesis testing: Using a statistical test to make a decision between rejecting or not rejecting a null hypothesis, on the basis of a sample of observations from the population.
I
Incidence: The number of new cases of disease that develop over time. Can be expressed as incidence proportion or incidence rate. Related topics: incidence proportion, incidence rate. Learning point: Incidence vs. Prevalence
Incidence proportion: The number of new cases within a specified time period divided by the size of the population initially at risk. Example: If a population initially contains 1,000 non-diseased persons and 42 develop a condition over two years of observation, the incidence proportion is 42 cases per 1,000 persons, i.e. 4.2%.
Incidence rate: The number of new cases of disease per unit of person-time at risk. It is calculated by dividing the number of new cases by the total person-time of observation, approximated here by the product of the total number of susceptible people at the beginning of the study period and the time of observation. Example: If a population initially contains 1,000 non-diseased persons and 42 develop a condition over two years of observation, the incidence rate is 21 cases per 1,000 person-years.
Incidence rate ratio: The incidence rate in the exposed group divided by the incidence rate in the unexposed group.
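The two incidence examples above use the same counts but different denominators, which a short sketch makes explicit. The function names are illustrative assumptions; person-time is approximated as initial population multiplied by years of observation, as in the Incidence rate entry, and the rates passed to the rate ratio are hypothetical.

```python
def incidence_proportion(new_cases: int, population_at_risk: int) -> float:
    """New cases divided by the population initially at risk."""
    return new_cases / population_at_risk


def incidence_rate(new_cases: int, population_at_risk: int, years: float) -> float:
    """New cases per person-year, approximating person-time as population * years."""
    return new_cases / (population_at_risk * years)


def incidence_rate_ratio(rate_exposed: float, rate_unexposed: float) -> float:
    """Incidence rate in the exposed group divided by that in the unexposed group."""
    return rate_exposed / rate_unexposed


# Values from the glossary examples: 42 new cases among 1,000 persons over 2 years.
proportion = incidence_proportion(42, 1000)
rate = incidence_rate(42, 1000, 2)
print(f"Incidence proportion: {proportion * 100:.1f}%")
print(f"Incidence rate: {rate * 1000:.0f} per 1,000 person-years")

# Hypothetical rates (per person-year) for an exposed and an unexposed group.
print(f"Incidence rate ratio: {incidence_rate_ratio(0.042, 0.021):.1f}")
```

In a real cohort, person-time would usually be summed per participant up to the event or censoring date, so this approximation slightly overstates the denominator.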
Independent variable: The variable hypothesized to explain the dependent variable, often referring to the exposure variable.
Inference: The process of passing from observations and data to generalization to the population at large.
Information Bias: Systematic errors due to observer or interviewer errors (for example, because of lack of blinding), response errors (for example, because of lack of blinding), or measurement error.
Informative censoring: Occurs when the probability of being lost to follow-up differs according to the probability of failure.
Informed consent: Procedure to ensure that a study participant knows and understands all of the risks involved in participation in the study. The elements of informed consent include informing the participant of the nature of the treatment or intervention, possible alternative treatments, and the potential risks and benefits of the treatment or intervention.
Intent to treat analysis: Analysis based on the initial treatment intent, not on the treatment eventually administered. Intent to treat analysis preserves the original random allocation to treatment and avoids the effects of crossover, noncompliance, or dropout.
Intention to treat analysis: see intent to treat analysis
Inter-quartile range (IQR): The inter-quartile range of a set of continuous values is the upper quartile minus the lower quartile.
Interaction: The interdependent association of two or more variables with an outcome. Related topic: effect modification
Internal validity: The extent to which study conclusions represent the truth for the individuals studied because the results were not likely due to the effects of chance, bias, or confounding, and because the study design, execution, and analysis were correct. Refers to the absence of systematic error that causes the study findings (parameter estimates) to differ from the true values as defined in the study objectives.
Related topic: external validity
Intervention: An intentional change in the exposure status of study subjects. Related topic: randomized controlled trials
Interviewer bias: Systematic error due to the interviewer’s subconscious or conscious gathering of selective data.
Intraclass correlation: A measure of the extent to which members of a group resemble each other more than they resemble members of other groups.
K
Kaplan-Meier Estimator: Estimator of the survival function from lifetime data.
Kappa coefficient: A measure of the degree of nonrandom agreement between observers or measurements of the same categorical variable. Formula: κ = (P0 - Pe) / (1 - Pe), where P0 is the proportion of times the measurements agree, and Pe is the proportion of times they can be expected to agree by chance alone.
Kurtosis: The extent to which the distribution of a variable is peaked.
L
Lead time: The time by which diagnosis of disease can be advanced by screening, as compared to the time at which diagnosis would be made by other means (e.g. clinical presentation of overt signs of disease).
Lead time bias: Systematic error that occurs when survival is counted from the point in time when early diagnosis was made. Even if screening is not effective, the early diagnosis adds lead time to the survival counted from the time of usual diagnosis, resulting in an apparently longer survival time for screenees compared to non-screenees.
Least Squares: An estimation principle in which the estimates of a set of parameters in a regression model are those quantities that minimize the sum of the squared differences between the observed values of the dependent variable and the values of the dependent variable predicted by the model.
Length bias: Also known as length-biased sampling. Systematic error due to selection of disproportionate numbers of long-duration cases (cases who survive the longest).
This can occur when prevalent and not incident cases are included in a case-control study, particularly when the exposure is also associated with the length of the disease course. Likelihood Function: A function constructed from a regression model and a set of observed data, which gives the probability of the observed data. The regression coefficients that maximize the probability are the maximum likelihood estimates of the regression coefficients. Linear Model: A statistical model in which the value of a parameter for a given value of factors x1, …, xn is assumed to be equal to a + b1x1 + … + bnxn, where a and b1, …, bn are constants. Linear Regression: Regression analysis of data using linear models, usually with a continuous outcome variable. Logistic Model: A statistical model for an individual’s risk (probability of a dichotomous disease state Y) as a function of risk factors x1, …, xn. Formula: $P(Y \mid x_1, \ldots, x_n) = 1 / (1 + e^{-(\alpha + \beta_1 x_1 + \cdots + \beta_n x_n)})$ Logistic regression: Statistical modeling approach used to describe the relationship of several variables to a dichotomous dependent variable, such as a binary disease state. Logit: The natural logarithm of the odds of a binary outcome. Logit Model: A linear model for the logit of disease as a function of a quantitative factor X: Logit (disease given X = x) = a + bx. This model is mathematically equivalent to the logistic model. Longitudinal analysis: A study in which measurements on study participants are made repeatedly over time. The primary goal of a longitudinal study is to characterize change in the dependent variable over time and the factors that influence change. Loss to follow-up: The circumstance that occurs when researchers lose contact with some participants and thus cannot complete planned data collection. This is a common cause of missing data, especially in cohort studies. M Mantel-Haenszel Test: A summary chi-square test for stratified data, used when controlling for confounding.
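The Logistic Model and Logit Model entries above state that the two forms are mathematically equivalent. A minimal Python sketch of that relationship follows; the coefficient values and the function names `logistic_risk` and `logit` are hypothetical, chosen only for illustration and not taken from any study.

```python
import math

def logistic_risk(alpha, betas, xs):
    """Logistic model: P(Y | x1..xn) = 1 / (1 + exp(-(alpha + sum(beta_i * x_i))))."""
    linear_predictor = alpha + sum(b * x for b, x in zip(betas, xs))
    return 1.0 / (1.0 + math.exp(-linear_predictor))

def logit(p):
    """Natural logarithm of the odds p / (1 - p)."""
    return math.log(p / (1.0 - p))

# Hypothetical coefficients and risk-factor values (illustration only).
alpha = -2.0
betas = [0.8, 0.05]
xs = [1.0, 10.0]

risk = logistic_risk(alpha, betas, xs)
# Taking the logit of the modeled risk recovers the linear predictor
# alpha + beta1*x1 + beta2*x2, which is why the two models are equivalent.
recovered = logit(risk)
print(round(risk, 4), round(recovered, 4))
```

The logistic form is convenient for reporting risk on the probability scale, while the logit form is the one in which the coefficients act additively.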
Matching: The process of making a study group and a comparison group comparable with respect to confounding factors. The goal of matching is to create a balanced distribution of cases and controls across strata of a confounding factor. This enhances the ability to control for confounding and benefits statistical efficiency. Maximum Likelihood Estimate: The value for an unknown parameter that maximizes the probability of obtaining exactly the data that were observed. Measure of Association: A quantity that expresses the strength of association between variables. Commonly used measures of association are differences between means, proportions or rates, the relative risk, the odds ratio, and correlation and regression coefficients. Measurement Bias: Systematic error arising from inaccurate measurement (or classification) of subjects on the study variables. Related topics: differential and non-differential misclassification Measurement error: Difference between the observed value of a quantity and its true value Meta-Analysis: The process of using statistical methods to combine the results of different studies. A frequent application has been the pooling of results from a number of small randomized controlled trials, none alone large enough to demonstrate statistically significant differences, but in aggregate, capable of so doing. Misclassification: The erroneous classification of an individual into a category other than that to which it should be assigned. The probability of misclassification may be the same in all study groups (nondifferential misclassification) or may vary between groups (differential misclassification). Mortality rate: Number of deaths in a population over a given time interval, typically expressed in deaths per 1000 persons per year. 
Multiple comparison problem: A problem that arises from the fact that the greater the number of statistical tests conducted on a data set, the greater the probability that the tests will falsely reject the null hypothesis, simply by chance. Because each individual hypothesis test has a type I error rate of 5% under the statistical significance threshold of 0.05, every 20 hypothesis tests would be expected to yield one significant test due to chance alone, even if none of the evaluated risk factors are truly associated with the outcome in the population. Type I errors can arise when performing multiple hypothesis tests on the same data, for example a study exploring a list of potential hypertension risk factors. Example: genome-wide association studies (GWAS). Related topic: Bonferroni correction. Multivariable regression: Regression analysis with more than one independent variable. Multivariate regression: Regression analysis with more than one dependent variable. N Natural History of Disease: The course of a disease from onset to resolution. Many diseases have certain well-defined stages such as the presymptomatic stage and several clinically manifest stages that, taken together, are referred to as the natural history of the disease. Necessary cause: A causal factor whose presence is required for the occurrence of the effect. Nested case-control study: A case-control study in which cases and controls are drawn from the population in a cohort study. Nomogram: A two-dimensional line chart designed for reading off the result of a two-variable function, where two scales represent known values and one scale is where the result is read off from. Nondifferential misclassification: A situation in which the occurrence of measurement error is random, i.e. it is not related to any other factor. Nonparticipants: Members of a study sample or population who do not take part in the study for whatever reason (e.g.
refusal to participate, people without telephones who cannot be reached via random digit dialing sampling). Differences between participants and nonparticipants are often a source of bias and can limit the generalizability of study findings. Normal distribution: Bell-shaped distribution of a continuous variable that is symmetrical around its mean, in which the mean, mode and median are identical and whose shape is completely determined by the mean and standard deviation. Null Hypothesis: The statistical hypothesis that one variable has no association with another variable, in the population, or that two or more population distributions do not differ from one another. In simple terms, the null hypothesis states that the results observed in a study are no different from what might have occurred as a result of chance alone. Number needed to treat: The number of patients who must be exposed, for a given amount of time, in order to prevent the occurrence of one case of the outcome. The number needed to treat is the reciprocal of the absolute risk reduction (the risk difference between the treated and untreated groups). O Observational Study: Study in which nature is allowed to take its course; changes or differences in one characteristic are studied in relation to changes or differences in other(s), without the intervention of the investigator. Observer Bias: Systematic difference between a true value and that actually observed, due to observer variation. Odds: The ratio of the probability of occurrence of an event to that of nonoccurrence, or the ratio of the probability that something is so, to the probability that it is not. Formula: p / (1 – p) Odds Ratio: In a case-control study, the odds ratio is the ratio of the odds of exposure in the cases to the odds of exposure among non-cases. In a cohort study, the odds ratio is the ratio of the odds of disease among the exposed to the odds of disease among the unexposed. If the disease is rare, the odds ratio approximates the relative risk.
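The Odds and Odds Ratio entries above can be illustrated with a small cohort-style 2x2 table. The sketch below uses hypothetical counts (not data from any study) to show that the odds ratio is close to the relative risk when the disease is rare and drifts away from it when the disease is common. The function names are assumptions made for this example.

```python
def odds(p):
    """Odds corresponding to a probability p: p / (1 - p)."""
    return p / (1.0 - p)

def cohort_measures(a, b, c, d):
    """Return (odds ratio, relative risk) for a cohort 2x2 table.

    a: exposed with disease      b: exposed without disease
    c: unexposed with disease    d: unexposed without disease
    """
    risk_exposed = a / (a + b)
    risk_unexposed = c / (c + d)
    odds_ratio = odds(risk_exposed) / odds(risk_unexposed)
    relative_risk = risk_exposed / risk_unexposed
    return odds_ratio, relative_risk

# Hypothetical counts: a rare disease (risks of 0.2% and 0.1%) ...
rare = cohort_measures(20, 9980, 10, 9990)
# ... and a common disease (risks of 40% and 20%).
common = cohort_measures(400, 600, 200, 800)

print("rare:   OR=%.3f RR=%.3f" % rare)
print("common: OR=%.3f RR=%.3f" % common)
```

In both tables the relative risk is 2, but only in the rare-disease table does the odds ratio stay close to that value.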
One-tailed test: A statistical test in which the alternative hypothesis specifies a single direction of effect, based on the assumption that the data can depart from the null hypothesis in only one direction. This type of test is rarely used in epidemiologic studies. Outcome: Condition that may stem from an exposure. Over-adjustment: A situation in which unnecessary variables are included in a regression model. Adjusting for variables in the pathway between exposure and outcome may obscure evidence of a true causal relationship. Additionally, adjusting for variables that are not true confounders may reduce precision in the estimation of measures of association. Overmatching: A situation in which extraneous variables are matched on. Matching on variables in the pathway between exposure and outcome may obscure evidence of a true causal relationship. Additionally, matching on variables that are not true confounders may reduce precision in the estimation of measures of association. P P (Probability) Value: Given a null hypothesis regarding the population, the p-value is the probability of observing the sample result, or a more extreme result, through chance (sampling variation) alone if the null hypothesis is true. In most epidemiologic work, a study result whose probability value is less than 5% (p<0.05) or 1% (p<0.01) is considered sufficiently unlikely to have occurred by chance to justify the designation "statistically significant." Because the size of the p-value generated in statistical hypothesis testing depends heavily on the size of the study population (the larger the number of subjects, the smaller the p-value), the p-value should not be used for any purpose other than evaluating the role of chance. The p-value is not a measure of excess disease risk. Participant: Person upon whom research is conducted. Peer-review: Process of review of research proposals, manuscripts and abstracts submitted for presentation at scientific meetings, whereby these are judged for merit by other scientists in the same field.
Percentile: Divisions that produce 100 equal parts in a distribution of continuous values. Person-Time: A measurement combining persons and time, used as the denominator in incidence rates and mortality rates. Placebo: An intervention with no pharmacological effect, intended to give participants the perception that they are receiving treatment. Placebo effect: A situation in which the beneficial effects of a placebo are due to the expectation that the intervention will have an effect, and not to any pharmacological effects of the intervention itself. Point prevalence: The proportion of a population that has a given characteristic at a single point in time. Population: The collection of observations from which a sample may be drawn. Population Attributable Risk (PAR): Incidence of a disease in a population that is associated with exposure, given that the exposure and the disease are causally related. The PAR is useful in determining if resources should be allocated to controlling the exposure, or, instead, to exposures causing greater health problems in the population. Formula: Total incidence in all persons – incidence in non-exposed. Population Attributable Risk % (PAR %): The portion of disease in the population that is caused by the exposure, given that the exposure and the disease are causally related. The PAR% is useful to determine if resources should be allocated to control of the exposure in question or to other exposures that cause a greater proportion of the disease in the population. Formula: 1. ((Total incidence in all persons – incidence in non-exposed) / Total incidence in all persons) × 100% 2. ((RR – 1) / RR) × proportion of cases exposed × 100% Population-based study: Study in which the subjects are drawn from a defined population in a manner that is representative of the source population. Positive Predictive Value: The probability that a person with a positive test is a true positive (truly has the disease).
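The PAR entry and the two PAR % formulas above can be checked against each other with a short calculation. The following is a sketch using hypothetical cohort counts (3,000 exposed with 180 cases, 7,000 unexposed with 140 cases); the function name `par_measures` is an assumption for this example.

```python
def par_measures(exposed_cases, exposed_total, unexposed_cases, unexposed_total):
    """Return (PAR, PAR% by formula 1, PAR% by formula 2)."""
    incidence_exposed = exposed_cases / exposed_total
    incidence_unexposed = unexposed_cases / unexposed_total
    total_incidence = (exposed_cases + unexposed_cases) / (exposed_total + unexposed_total)

    # PAR: total incidence in all persons minus incidence in the non-exposed.
    par = total_incidence - incidence_unexposed

    # Formula 1: PAR as a percentage of total incidence.
    par_percent_1 = par / total_incidence * 100

    # Formula 2: ((RR - 1) / RR) * proportion of cases exposed * 100.
    rr = incidence_exposed / incidence_unexposed
    proportion_cases_exposed = exposed_cases / (exposed_cases + unexposed_cases)
    par_percent_2 = (rr - 1) / rr * proportion_cases_exposed * 100

    return par, par_percent_1, par_percent_2

# Hypothetical counts, for illustration only.
par, pct1, pct2 = par_measures(180, 3000, 140, 7000)
print(par, pct1, pct2)
```

With these counts the two percentage formulas give the same value, which is the expected behavior when both are computed from the same cohort.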
Power: The probability that a particular study will not make a type II error. In other words, power represents the ability of a statistical test to detect some specified difference or effect. Factors that affect study power: 1) the statistical significance criterion (p-value cutoff) used in the statistical test 2) the magnitude of the association of interest in the population 3) the sample size used to detect the association Precautionary principle: When a research activity raises the threat of harm to human health, precautionary safety measures should be taken to protect the study participants, even if the risks are not fully understood. Precision: A measure of the amount of random error surrounding an estimate. Confidence intervals are computed to demonstrate the precision of relative risk estimates. The narrower the confidence interval, the more precise the relative risk estimate. Prevalence: see point prevalence. Primary Prevention: Actions aimed at reducing the incidence of disease Probability: A measure of the frequency of an outcome or an exposure in a population Proportion: A part, share, or number considered in relation to a whole. Proportional Hazards Model: A statistical model used in survival analysis that yields a hazard ratio, which is very similar to a relative risk. Proportional Mortality Ratio: the proportion of deaths from a specific condition in a defined population, divided by the proportion of deaths expected from this condition in a standard population, expressed either on an age-specific basis or after age adjustment. Prospective study: Cohort study in which new data is collected. Example: Investigators recruit participants who do not have a prior history of stroke, collect blood for the measurement of a panel of novel serologic stroke markers, and then follow subjects for the development of incident stroke. 
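The three factors listed under Power above (significance criterion, magnitude of the association, sample size) can be seen in a simple calculation. The sketch below assumes a two-sided z-test comparing the means of two equal-sized groups with a known common standard deviation; this specific test, the function name, and all numeric inputs are assumptions chosen for illustration, not a method prescribed by the glossary.

```python
from statistics import NormalDist

def power_two_sample_z(delta, sigma, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample z-test.

    delta: true difference in means; sigma: common standard deviation;
    n_per_group: subjects in each group; alpha: significance criterion.
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    standard_error = sigma * (2 / n_per_group) ** 0.5
    shift = delta / standard_error
    # Probability that the test statistic falls in either rejection region.
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)

# Hypothetical scenarios varying one factor at a time.
power_small_n = power_two_sample_z(0.5, 1.0, 30)
power_large_n = power_two_sample_z(0.5, 1.0, 100)
power_small_effect = power_two_sample_z(0.3, 1.0, 64)
power_large_effect = power_two_sample_z(0.8, 1.0, 64)
power_strict_alpha = power_two_sample_z(0.5, 1.0, 64, alpha=0.01)
power_usual_alpha = power_two_sample_z(0.5, 1.0, 64, alpha=0.05)

print(power_small_n, power_large_n)
print(power_small_effect, power_large_effect)
print(power_strict_alpha, power_usual_alpha)
```

Power rises with a larger sample, a larger true difference, or a less strict significance criterion; when the true difference is zero, the function returns the type I error rate.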
Publication bias: Tendency of editors to publish articles containing positive results, in contrast to reports that do not present “statistically significant” findings. Can be an important source of bias in meta-analyses. Q Quartile: 1) Each of four equal groups into which a population can be divided according to a particular variable. 2) Each of the three values of the random variable that divide a population into four such groups. Quintile: 1) Each of five equal groups into which a population can be divided according to a particular variable. 2) Each of the four values of the random variable that divide a population into five such groups. R Randomization: Allocation of individuals to groups (e.g., to experimental vs. control regimens) by chance. Within the limits of chance variation, randomization should make the control and experimental groups similar at the beginning of a study. Randomized Controlled Trial: A study in which subjects are randomly allocated into groups, to receive or not receive an experimental intervention. Learning point: strengths and limitations of RCTs Random Sample: A sample that is arrived at by selecting people from a population such that each person has the same probability of selection. Rate: The frequency with which an event occurs in a defined population, over a given period of time. Rate Difference: The absolute difference between two rates. Rate Ratio: The ratio of two rates, often the ratio of the rate in the exposed group to the rate in the unexposed group. Recall Bias: Systematic error due to differences in accuracy or completeness of memory of prior events or experiences. Recall bias is often a concern in case-control studies. Regression: Technique used to find the best model to describe the association between two or more variables. Multiple regression is widely used to adjust for confounding. 
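The second sense of Quartile and Quintile above (the cut points that divide a population into four or five groups) can be computed with Python's standard library. This is a minimal sketch on hypothetical values; `statistics.quantiles` uses its default "exclusive" method here, and other methods or software packages may give slightly different cut points.

```python
import statistics

# Hypothetical continuous measurements (for example, ages in years).
values = [21, 25, 30, 34, 38, 42, 47, 53, 60, 68, 75]

# Three cut points divide the data into four groups (quartiles).
quartiles = statistics.quantiles(values, n=4)
# Four cut points divide the data into five groups (quintiles).
quintiles = statistics.quantiles(values, n=5)

# Inter-quartile range: upper quartile minus lower quartile.
iqr = quartiles[2] - quartiles[0]

print(quartiles, quintiles, iqr)
```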
Regression coefficient: A measure of how much the dependent variable will change, on average, with each unit change in an independent variable. Relative Risk: A measure of association calculated as the ratio of the incidence rate of the outcome among the exposed to the incidence rate among the unexposed. An RR = 1 indicates that the incidence in the exposed is the same as that in the unexposed group and the interpretation is that there is no association between exposure and disease. RR > 1 denotes a larger incidence in the exposed than in the unexposed group, interpreted as the exposure is associated with an increased probability of developing the disease. RR < 1 denotes a smaller incidence in the exposed than in the unexposed group, interpreted as the exposure is associated with a decreased probability of developing the disease. Reliability: Also known as test-retest reliability. The degree to which a measurement can be replicated and produce the same results when it is repeated under identical conditions. Reliability is often measured using Cronbach’s alpha. Repeatability: see reliability. Replication: Conducting an experiment more than once to confirm the findings and increase precision. Reporting Bias: see recall bias. Representative sample: A sample that resembles the underlying population from which it is drawn, in terms of certain characteristics. Residual: The difference between the observed values of an outcome and those predicted by the regression equation. Residual confounding: Confounding that still remains after applying techniques to control for confounding, for example, stemming from unmeasured covariates or those measured with error. Response Bias: Systematic error due to difference in characteristics between those who choose or volunteer to participate in a study and those who do not. Response rate: The number of people who completed a survey divided by the number of people who were eligible and invited to complete the survey.
Retrospective Study: Cohort study in which data collected prior to the study launch is used. Risk: The probability that an event will occur within a given period of time. Risk difference: Measure of association calculated as the difference between the incidence rate of the outcome among the exposed and the incidence rate among the unexposed. An RD = 0 indicates that the incidence in the exposed is the same as that in the unexposed group and the interpretation is that there is no association between exposure and disease. RD > 0 denotes a larger incidence in the exposed than in the unexposed group, interpreted as the exposure is associated with an increased probability of developing the disease. RD < 0 denotes a smaller incidence in the exposed than in the unexposed group, interpreted as the exposure is associated with a decreased probability of developing the disease. Risk Factor: A characteristic, either inborn or inherited or of personal behavior or lifestyle, or an environmental exposure that is known to be associated with health-related outcomes. Risk Ratio: The ratio of two risks. S Sample: A selected subset of a population. A sample may be random or nonrandom and may be representative or nonrepresentative. Sampling: The process of selecting a subset of subjects from all the subjects in a particular group. Sampling Bias: Systematic error that occurs when certain members of the underlying population have a higher chance of being sampled. Sampling error: Uncertainty in study findings that occurs by conducting analyses on a sample instead of on the entire population. Sampling variability: Random error in the estimate of population-level parameters that occurs because only a sample of the population is observed. Sample size: Number of individuals (or groups) included in a sample. Scatterplot: A graphic plot of data points for two variables.
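The Risk difference and Risk Ratio entries above describe an absolute and a relative comparison of the same two risks. The sketch below computes both from hypothetical cohort counts; the function name and the numbers are assumptions for illustration.

```python
def risk_difference_and_ratio(exposed_cases, exposed_total,
                              unexposed_cases, unexposed_total):
    """Return (risk difference, risk ratio) for exposed versus unexposed."""
    risk_exposed = exposed_cases / exposed_total
    risk_unexposed = unexposed_cases / unexposed_total
    risk_difference = risk_exposed - risk_unexposed
    risk_ratio = risk_exposed / risk_unexposed
    return risk_difference, risk_ratio

# Hypothetical cohort: 30 cases per 1,000 exposed, 10 cases per 1,000 unexposed.
rd, rr = risk_difference_and_ratio(30, 1000, 10, 1000)

# The null values differ by scale: RD = 0 and a ratio of 1 both mean
# "no association"; here RD > 0 and the ratio is above 1.
print(rd, rr)
```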
Screening: Testing for the presence of a disease or other condition Secondary Prevention: Actions aimed at shortening the duration of a disease Selection Bias: Error due to systematic differences in characteristics between those who are selected for study and those who are not. Selection bias threatens the generalizability of conclusions from studies that include only volunteers from a healthy population. Sensitivity analysis: A method to determine the robustness of the results by examining the extent to which they are affected by changes in assumptions or values of variables. Sensitivity of a test: The proportion of truly diseased people who are identified as diseased by the test. Formula: True positives / (True positives + False negatives) Skew: A term used to describe an asymmetrical frequency distribution. Source population: Individuals who are eligible and can be approached for participation in a study. Specificity of a test: The proportion of truly non-diseased people who are identified as non-diseased by the test. Formula: True negatives / (True negatives + False positives) Standard deviation: A measure of how dispersed a frequency distribution is around its mean. The larger the standard deviation, the wider the spread in the distribution of the variable. Standard error of the mean: The standard deviation of a sample mean estimate of a population mean, or the standard deviation of the error in the sample mean relative to the true mean. Standardization: A procedure used to remove the effects of differences in composition (such as age distribution or gender ratio) when comparing rates for different populations, by using a weighted average in one population using the weights from the composition of a second “standard” population. Standardized mortality ratio: The ratio of the number of deaths in a population to the number of deaths that would be expected if the population had the same mortality rate as a standard population. 
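The sensitivity and specificity formulas above can be applied directly to the four cells of a test-versus-truth table. This is a minimal sketch with hypothetical counts; the function names are assumptions for this example.

```python
def sensitivity(true_positives, false_negatives):
    """True positives / (true positives + false negatives)."""
    return true_positives / (true_positives + false_negatives)

def specificity(true_negatives, false_positives):
    """True negatives / (true negatives + false positives)."""
    return true_negatives / (true_negatives + false_positives)

# Hypothetical screening results: 100 truly diseased, 1,000 truly non-diseased.
tp, fn = 90, 10
tn, fp = 950, 50

sens = sensitivity(tp, fn)
spec = specificity(tn, fp)
print(sens, spec)
```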
Statistical model: A mathematical representation of the relationship between variables under study. Statistical significance: The likelihood of observing an association at least as large as the one seen, if in truth no association is present, in the population. Related topics: p-value. Statistical test: Procedure used to decide whether a study hypothesis should be rejected or not. Stratification: Process of separating data into several categories, or strata (e.g. age groups, gender, etc.), in order to examine associations between an exposure and an outcome within each category, where confounding by the stratifying variable cannot take place. Stratified randomization: Procedure of randomization in which strata or categories are identified and subjects are randomly allocated within each stratum. Sufficient cause: A minimum set of conditions needed to produce a given outcome. Sum of squared residuals: Also known as the residual sum of squares. A measure of the discrepancy between the observed data and the values predicted from the model. The smaller the residual sum of squares, the better (more tightly) the model is fit to the data. Surveillance: Continuous monitoring of disease occurrence within a group. Survey: A study in which information is systematically gathered but in which no experiment is conducted. Survival analysis: Statistical procedure for evaluating the association between exposures and the probability of survival, over time. Survival curve: A curve that starts at 100% of the study population and shows the percentage of the population still surviving at successive times for as long as information is available. Survival function: A function of time that starts with a population 100% alive (or non-diseased) at a particular time and provides the percentage of the population still alive at another time.
Systematic error: see bias. Systematic review: Literature review focused on a single question that tries to identify, appraise, select and synthesize all high quality research evidence relevant to that question. T Target population: The group of individuals about which a study aims to make inferences. The term is sometimes used to indicate the population from which a sample is drawn and sometimes to denote any "reference" population about which inferences are required. t-test: Statistical test used to compare mean values between two different groups. Tertiary prevention: Actions aimed at reducing the extent or number of complications of a disease. Test of significance: see p-value. Time series: A study design in which there is only one exposure group in whom measurements are made at several different times, allowing trends to be detected. Time-to-event analysis: see survival analysis. Triple blind study: A study in which the subjects, investigators and analysts are blinded as to which subjects received what treatment. Type I error: Occurs when a hypothesis test declares a result to be statistically significant even though the null hypothesis is true (there is no true effect or association in the population). Type II error: Occurs when a hypothesis test declares a result to be not statistically significant even though there is a true difference in the population. Type III error: Occurs when study design produces the “right answer to the wrong question,” i.e. an error in selection of the method of studying a problem. V Validity, Study: The degree to which the inference drawn from a study is warranted. Learning point: Internal validity and external validity. Venn diagram: A diagram representing sets of data pictorially as circles, with common elements of the sets being represented by intersections of the circles. Verification bias: see work-up bias.
Vital Statistics: Quantitative data concerning a population, such as the number of births, marriages, and deaths. W Washout period: In a crossover study design, the period of time between patients receiving interventions; it is used to ensure that patients are free of the influence of one intervention before they begin receiving another. Weighted average: A method of calculating the mean of a set of numbers in which some elements of the set carry more importance (or weight) than others. Work-up bias: Systematic error that occurs when the sample used to assess a measurement tool (e.g., diagnostic test) is restricted only to those who have the condition or factor being measured, so that the sensitivity of the measure can be overestimated. Z Z score: A standard score that indicates how many standard deviations an observation is above or below the mean.
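The Weighted average and Z score entries above both reduce to short calculations. The sketch below uses hypothetical numbers and, as an assumption, treats the data as a complete population when computing the standard deviation (`statistics.pstdev`); a sample standard deviation would give slightly different z scores.

```python
import statistics

def z_score(observation, mean, standard_deviation):
    """Number of standard deviations an observation lies above or below the mean."""
    return (observation - mean) / standard_deviation

def weighted_average(values, weights):
    """Mean in which each value counts in proportion to its weight."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

# Hypothetical measurements, treated here as the whole population.
data = [2, 4, 4, 4, 5, 5, 7, 9]
mean = statistics.mean(data)
sd = statistics.pstdev(data)

z_high = z_score(9, mean, sd)   # observation above the mean
z_low = z_score(2, mean, sd)    # observation below the mean

# Hypothetical stratum-specific values weighted by stratum size.
wavg = weighted_average([10.0, 20.0], [1, 3])

print(z_high, z_low, wavg)
```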