Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Continuous Data – Symmetrically Distributed Summarize with: Mean = arithmetic average SD = standard deviation =~ sq root (avg squared deviations from mean) used for variability among individuals, reference “normal” ranges SEM = std error of mean = SD(mean) = SD/√N used for precision of mean applied to “all”, but based on N subjects Mean, SD, SEM all have the same units, e.g., nmol/ml. If normally distributed (bell curve), then: 95% reference (normal) range =~ mean ± 2SD 95% confidence interval (CI) =~ mean ± 2SEM 95% CI → 95% sure that mean for “all” is in the interval → 95% sure that mean from sample of N is within 2SEM of the mean for “all” that the N represent. As Outcome Compared Among Groups: - Use t-test if 2 groups Use ANOVA if >2 groups Compares absolute, not %, differences For Association with Another Continuous Measure in one Group: - Use Pearson correlation As Outcome to be Predicted from Other Measures: - Use (ordinary) regression. 1 Continuous Data – Not Symmetrically Distributed Summarize with: Median = middle ranked value 25th – 75th percentiles Geometric Mean = antilog of mean of log(value). Used for common skewness, often lab data Normal ranges, CIs found on log scale, and antilogged, so are not symmetric about the geometric mean. As Outcome Compared Among Groups: Based on ranks: - Wilcoxon test = Mann-Whitney test if 2 groups - Kruskal-Wallis test if >2 groups or Use transformed data, e.g., by log, that is ~ normally distributed - t-test, ANOVA on log(value). - Compares % difference or ratio if log-transformed. For Association with Another Continuous Measure in one Group: - Use Spearman correlation As Outcome to be Predicted from Other Measures: - Use (ordinary) regression on log(values) or other transform. Generalized regression (GEE) may be used. 2 Proportions, Percentages, Rates Percentage = 100 x proportion. Rates involve time; % and proportions do not. Prevalence = 100 x (# of cases) / (# of persons), at one time. Incidence rate = # of new cases / person-years of observation Cumulative incidence = 100 x (# of new cases in a fixed time) / (# of persons) RR = risk ratio = relative risk = Prevalence(Group A) / Prevalence(Group B) e.g., RR = 1.6 → 60% greater risk in A, compared to B OR = odds ratio = odds(Group A) / odds(Group B) = [Prev/(100-Prev) for Group A] / [Prev/(100-Prev) for Group B] n.b, OR ~ RR if prevalence is low. Precision of a Proportion p: 95% CI ~ p ± 2SE(p), where SE(p) = square root[p(1-p)/N]. 2SE(p) is always ≤ 1/sqrt(N) → rule of thumb: if N=100, est’d % is within about 10% pts of % for “all” Compare Groups on Proportions or %s: - Chi-square test for large studies, not low prevalence - Fisher’s exact test for any size, any prevalence. As Outcome to be Predicted from Other Measures: - Use logistic regression. 3 Diagnostic Tests The symbol | means “given” or “among those with”. P [ ] means “probability of”. Sensitivity = P [T+ | D+] = Probability test is +, given Disease. Specificity = P [T- | D-] = Probability test is -, given no Disease. Do not depend on prevalence. More generalizable, less clinically useful. Predictive Values: Positive (PPV) = P [D+ | T+] Negative (NPV) = P [D- | T- ] Do depend on prevalence. Less generalizable, more clinically useful. Likelihood Ratios: Positive (LR+) = Odds of D+ | T+ Odds of D+ Negative (LR+) = Odds of D+ | TOdds of D+ Expresses increased knowledge given by the test. Do not depend on prevalence. More generalizable, more clinically useful. Easiest way to convert between statistics is to fill in cells of table: Disease + - + Test Result - 4 Hypothesis Testing Concepts are similar to diagnostic testing: - Need to claim either (1) association (effect), or (2) no association (no effect) from limited information in a study. - Truth – real effect or not – corresponds to true, unknown disease status in a diagnostic test. - Statistical procedure corresponds to the diagnostic test. - α = P[type I error] = P[claim effect | no real effect] = 1 – specificity. - β = P[type II error] = P[claim no effect | real effect] = 1 – sensitivity. - Power = 1 – β = sensitivity. As α ↓, β ↑ for fixed study size N. Increasing N decreases α and/or β. P-value: Probability of as extreme a difference as observed in the study, if really there is no difference in “all” to whom the study applies. Usually, α = 0.05, and p< α → “significant” result, i.e., beyond “reasonable” (5%) doubt. N is chosen large enough to maintain α = 0.05, but also to ↓ β, i.e., to increase power to desired level, usually 80%. 5 Other Terms Precision vs. Accuracy: Precision: refers to reproducibility, lack of variability. Accuracy: refers to lack of bias. Accurate Imprecise Inaccurate Precise Case-control vs. Cohort Studies: The distinction is primarily the comparability of the control group, not whether prospective or retrospective: Cohort Study: A single group from a common source is selected that includes both diseased and non-diseased. Case-control Study: A group of diseased subjects is selected and then a set of non-diseased controls is chosen as a comparison group to compare a risk factor. 6