H. James Norton, PhD
Website: www.jimnortonphd.com

Introduction to Biostatistics

A young fellow from Texas had committed a grievous crime. He had murdered his horse. In Texas there is no excuse for murdering a horse. If the jury finds you guilty there is only one punishment: hanging.

On the day of his hanging the warden talked to the prisoner and said, “I am one of the few wardens that follows the law of 1889. The law states that I am to allow you a last meal of anything you want. And I did that, didn’t I?” The prisoner said, “Yes, steak and eggs.”

The warden said, “And the law requires that I gather a random sample of 100 Texans.” The warden dragged the prisoner to the auditorium and sure enough there were 100 Texans sitting there. The warden said, “The law requires that I now give you an hour to speak to these people. You can discuss whatever you want, for instance your guilt or innocence, or the Texas prison system.” The prisoner said, “I have nothing to say.”

Whereupon a professor in the audience stood up and said, “I don’t think the people of Texas know enough about statistics. If you are not going to use the hour, would you mind if I use it to educate these people about statistics?” The prisoner said, “No, go ahead.” But the prisoner turned to the warden and said, “BUT HANG ME FIRST!”

3 Types of Measurement Scales

• Interval Scale – the size of the difference between 2 values on the scale has a meaning. For example, the difference between 1 °F and 2 °F is identical to the difference between 100 °F and 101 °F.
• What are other examples of interval measurements a physician would collect or make on a patient?
• Examples – age, blood pressure, heart rate, weight.
• Nominal (categorical) Scale – places the person or thing in a category that is not ordered.
• Example – eye color puts people in a category, but there is no natural ordering.
• What are other examples of nominal measurements that would be collected about a patient?
• Examples – gender, race, country of birth.
• Ordinal Scale – observations are ordered, but the differences between scores may not have a meaning or may not be equal.
• Example #1 – Cancer stages. The differences between Stage 0, Stage I, Stage II, Stage III, and Stage IV are not the same.
• Example #2 – the scale with the choices strongly disagree, disagree, neutral, agree, and strongly agree.
• What is the name of this scale?
• The Likert Scale.
• Example #3 – What is the name of the scale that is used to evaluate the health of a baby a few minutes after birth?

APGAR SCORING

SIGN | 0 | 1 | 2
A – COLOR | Blue, pale | Body pink, extremities blue | Completely pink
P – HEART RATE | Absent | Slow (below 100/min) | Over 100/min
G – REFLEX RESPONSE, 1. Response to catheter in nostril (tested after oropharynx is clear) | No response | Grimace | Cough or sneeze
G – REFLEX RESPONSE, 2. Tangential foot slap | No response | Grimace | Cry and withdrawal of foot
A – MUSCLE TONE | Limp | Some flexion of extremities | Well flexed
R – RESPIRATORY EFFORT | Absent | Weak cry, hypoventilation | Good strong cry

Hierarchy for scientific evidence in medical studies

• Case series
• Retrospective case-control studies
• Prospective observational studies
• Randomized clinical trials

“Adenocarcinoma of the vagina: Association of ?? with tumor appearance in young women”

• Retrospective study of 8 young women from Boston, ages 15-22, with adenocarcinoma of the vagina
• Each case matched to 4 control women born in the same hospital during the same week
• Ask students, as if they were epidemiologists, “What questions would you ask the cases and controls?”
• Family history of disease, sexual history, drug use, co-morbid conditions, treatment with chemotherapy or radiation.
• What caused the disease?
• Mom’s use of diethylstilbestrol (DES) during pregnancy (7/8 cases, 0/32 controls)

Doll & Hill’s two famous studies

• Case-control study of smoking and lung cancer (1950)
• Prospective study of 34,000 British physicians (1954).
Detailed questionnaires on smoking habits
• Student workbook published by CDC at www.cdc.gov/eis/casestudies/xsmoke.student.731-703.pdf

Requirements for Gold Standard Clinical Trial

• Randomized
• Placebo controlled (if ethical), else standard of care
• Double blind
• Sufficient power (adequate sample size)

From: Statistics Concepts and Controversies by David Moore

Contrast Nurses’ Health Study (NHS) & Women’s Health Initiative (WHI)

• NHS – observational cohort of 127,000 nurses ages 30 to 55
• WHI – randomized clinical trial of 50,000 women ages 50 to 79
• NHS – women taking estrogen after menopause had reduced risk of heart disease.
• WHI – study stopped prematurely. Women taking Prempro had increased risk of heart disease.

Descriptive Statistics

Suppose our sample values Xᵢ are: 2, 7, 1, 11, 2, 5, 2, 10

Compute the mode, median, mean, range, variance, standard deviation, and standard error of the mean.

Mode – the value that occurs most often. Mode = 2.

Range = (minimum value, maximum value). Range = (1, 11).

Sample mean = ∑ Xᵢ / N, where N = sample size.
Mean = (2 + 7 + 1 + 11 + 2 + 5 + 2 + 10) / 8 = 40/8 = 5.

Median – the middlemost value of the data set after the data are ordered from low to high. If there is an odd number of data points, such as 7, 5, -1, 10, 11, then order them as -1, 5, 7, 10, 11 and Median = 7. With an even number of values, order the data and then take the average of the two middle points. For our sample the ordered data are 1, 2, 2, 2, 5, 7, 10, 11. Take the average of 2 and 5. Median = 3.5.

Sample variance = S² = ∑ (Xᵢ − mean)² / (N − 1)

Xᵢ | Xᵢ − mean | (Xᵢ − mean)²
2 | 2 − 5 = −3 | (−3)² = 9
7 | 7 − 5 = 2 | 2² = 4
1 | 1 − 5 = −4 | (−4)² = 16
11 | 11 − 5 = 6 | 6² = 36
2 | 2 − 5 = −3 | (−3)² = 9
5 | 5 − 5 = 0 | 0² = 0
2 | 2 − 5 = −3 | (−3)² = 9
10 | 10 − 5 = 5 | 5² = 25
Total = 108

S² = 108/7 = 15.43

Standard deviation = S = √S²
S = √15.43 = 3.93

Standard error of the mean = SEM = S / √N
SEM = 3.93 / √8 = 1.39

Suppose the data are 5, 5, 5, 5, 5, 5, 5, 5.
What is the mean? Mean = (5 + 5 + 5 + 5 + 5 + 5 + 5 + 5)/8 = 5.
What is the variance?
Variance = 0.
When will the variance of a set of numbers equal 0? Only if all the numbers are the same.

Suppose the data are 2, 7, 1, 11, 2, 5, 2, 10 and we add an outlier, a 9th number equal to 1000. What changes the most: the mode, the median, or the mean?
Mode stays the same = 2.
Ordered data: 1, 2, 2, 2, 5, 7, 10, 11, 1000 (median now 5; it was 3.5).
Mean now 1040/9 = 115.56 (it was 5).

From: Statistics Concepts and Controversies by David Moore

Conclusion of Test | REALITY: Null Hypothesis True | REALITY: Null Hypothesis False
Do not reject H0 | NO ERROR | TYPE II ERROR
Reject H0 | TYPE I ERROR | NO ERROR

H0 is the null hypothesis
α = Probability of a Type I error (most often chosen = .05)
β = Probability of a Type II error
Power = 1 − β

Examples of H0 & H1

Comparing 2 groups where the outcome is on an interval scale:
H0: μ1 = μ2 (the means are equal)
H1: μ1 ≠ μ2 (the means are not equal)
Statistical test employed is Student’s t-test. (Data for both groups must be normally distributed.)
Example: the outcome variable is systolic blood pressure after 6 months of treatment. Patients are randomized to a diuretic or a new drug.

Comparing 2 groups where the outcome is dichotomous:
H0: P1 = P2 (the proportions are equal)
H1: P1 ≠ P2 (the proportions are not equal)
(or equivalently)
H0: The two variables are independent
H1: The two variables are not independent
Statistical test employed is the chi-square test.
Example: the outcome variable is whether or not the infection is cured after 2 weeks of treatment. Children are randomized to a once-daily or twice-daily antibiotic.

Comparing 2 groups where the outcome is on an ordinal scale:
H0: The two distributions are identical
H1: The two distributions are not identical
Statistical test employed is the Wilcoxon rank sum test.
Example: the outcome variable is stage of colon cancer upon diagnosis. Do whites and African-Americans present with similar stages of colon cancer at diagnosis?
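The hand calculations in the Descriptive Statistics slides (mode, median, mean, range, variance, standard deviation, SEM, and the effect of adding the outlier 1000) can be checked with a few lines of Python. This is only a minimal sketch using the standard library; the helper name `describe` is made up for illustration and is not part of the original slides.

```python
import math
import statistics

# Sample from the Descriptive Statistics slides.
data = [2, 7, 1, 11, 2, 5, 2, 10]


def describe(values):
    """Return the summary statistics computed by hand on the slides.

    `describe` is a hypothetical helper written for this sketch.
    """
    n = len(values)
    sd = statistics.stdev(values)  # sample SD, divides by N - 1
    return {
        "mode": statistics.mode(values),
        "median": statistics.median(values),
        "mean": statistics.mean(values),
        "range": (min(values), max(values)),
        "variance": statistics.variance(values),  # sample variance
        "sd": sd,
        "sem": sd / math.sqrt(n),  # standard error of the mean
    }


summary = describe(data)
with_outlier = describe(data + [1000])  # add the 9th value, 1000
all_fives = describe([5] * 8)           # every value identical

for name, result in [("original", summary),
                     ("with outlier", with_outlier),
                     ("all fives", all_fives)]:
    print(name, result)
```

Comparing `summary` with `with_outlier` shows the point of the outlier slide: the mode is unchanged, the median moves a little, and the mean moves a great deal.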
Sampling

• Simple random sample – a sample of n elements selected from a population in such a manner that each combination of n elements has the same chance or probability of being selected as every other combination. Examples – flipping a coin, lottery drawings, random number tables, computer generated “random numbers”.
• Systematic sample – choosing every nth person in a list of N people. Example – Suppose the number 3 is chosen from a random number table. A sample of 100 people is chosen from a list of 1000 people by taking the 3rd, 13th, 23rd, 33rd, … persons from the list. Is this a simple random sample?
• Stratified sample – divide the population into groups and then take a random sample from within each of the groups. For instance, divide the population into males and females and then, within each group, randomly assign them to a treatment or placebo.
• Haphazard sample or sample of convenience.

Hypothesis Testing

H0: Lucy is not a liar
H1: Lucy is a liar
What’s the p-value?
http://vignette1.wikia.nocookie.net/peanuts/images/a/ab/Pe660925.jpg/revision/latest?cb=20131208184939
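Returning to the Sampling slide, the difference between a systematic sample and a simple random sample can be sketched in Python. This is an illustration under stated assumptions, not part of the original lecture: the population is simply the numbers 1 to 1000 standing in for a list of 1000 people, and the function names and the fixed seed are hypothetical choices made for the example.

```python
import random

# Hypothetical stand-in for a list of 1000 people, numbered 1..1000.
population = list(range(1, 1001))


def systematic_sample(items, start, step):
    """Take the start-th item (1-based) and every step-th item after it."""
    return items[start - 1::step]


def simple_random_sample(items, n, seed=None):
    """Draw n items without replacement; every combination is equally likely."""
    rng = random.Random(seed)  # seed only makes the example repeatable
    return rng.sample(items, n)


# Systematic sample from the slide: 3rd, 13th, 23rd, 33rd, ... person.
systematic = systematic_sample(population, start=3, step=10)

# Simple random sample of the same size.
srs = simple_random_sample(population, 100, seed=0)

print(systematic[:5], "...", systematic[-1])
print(sorted(srs)[:5], "...")
```

With a step of 10 there are only 10 possible systematic samples, one for each starting position, so most combinations of 100 people can never be selected. That is why the systematic sample does not meet the definition of a simple random sample given on the slide.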