Lecture 4 Outline: Tue, Sept 16
• Chapter 1.4.2, Chapter 1.5, additional material on sampling units and meaningful comparisons
  – Review of probability models for randomized experiments and random samples; probability models for observational studies
  – Graphical methods (histograms, stem-and-leaf diagrams, box plots)
  – Random assignment and random sampling in JMP
  – Sampling units
  – Meaningful comparisons (use of a control group and use of rates)

Review
• Hypothesis testing examples:
  – (i) H_0: There is no causal effect of a treatment on the outcome vs. H_1: There is a causal effect.
  – (ii) H_0: Two populations have the same mean vs. H_1: The two populations do not have the same mean.
• Statistical inference about hypotheses is based on a probability model for how the sample (observed data) was taken.
  – P-value: the probability, if the null hypothesis is true, of observing a value of the test statistic at least as large as the one actually observed. It is a measure of evidence against H_0 (p < .05: moderate evidence against H_0; p < .01: strong evidence against H_0).

P-value for Randomized Experiment
• Additive Potential Outcomes Model: For each unit, $Y$ = outcome if assigned to group I (control group), and $Y^* = Y + \delta$ = outcome if assigned to group II (treatment group).
• $\delta$ = causal effect of treatment.
• P-value for testing $H_0: \delta = 0$ vs. $H_1: \delta \neq 0$
  – Test statistic: $T = |\bar{Y}_2 - \bar{Y}_1|$, the absolute difference between the group II and group I sample means.
  – Calculate T for every possible grouping of the observed outcomes into groups of size n_1 (size of control group) and n_2 (size of treatment group).
  – The p-value is the proportion of regroupings with $T \geq T_O$, where $T_O = |\bar{Y}_2 - \bar{Y}_1|$ is the value of T for the observed grouping.

Probability Models
• Probability model for a randomized experiment: random assignment of units to groups.
• Probability model for a random sample: a random sample from each population.
• The probability model for an observational study or nonrandom sample is unknown. We can assume random assignment or a random sample, but any inference is substantially weaker because we do not know the real probability model by which the data were obtained.
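The regrouping procedure from the "P-value for Randomized Experiment" slide can be sketched in a few lines of Python. This is only an illustration, not part of the JMP workflow used in the course: the function name `randomization_p_value` and the six outcome values are invented for the example, and full enumeration is practical only for small groups.

```python
from itertools import combinations
from statistics import mean


def randomization_p_value(control, treatment):
    """Exact randomization p-value for T = |mean(group II) - mean(group I)|."""
    outcomes = list(control) + list(treatment)
    n1 = len(control)
    observed_t = abs(mean(treatment) - mean(control))

    indices = range(len(outcomes))
    at_least_as_large = 0
    total = 0
    # Enumerate every way of choosing which n1 units form group I.
    for control_idx in combinations(indices, n1):
        chosen = set(control_idx)
        group1 = [outcomes[i] for i in indices if i in chosen]
        group2 = [outcomes[i] for i in indices if i not in chosen]
        t = abs(mean(group2) - mean(group1))
        # Small tolerance so floating-point ties count as T >= T_O.
        if t >= observed_t - 1e-12:
            at_least_as_large += 1
        total += 1
    return at_least_as_large / total


# Hypothetical outcomes, invented for illustration only.
control = [1, 2, 3]      # group I
treatment = [4, 5, 6]    # group II
p_value = randomization_p_value(control, treatment)
print(f"p-value = {p_value}")
```

With these invented outcomes there are 20 possible regroupings, and only the two most extreme ones give a T at least as large as the observed value.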
Relative Frequency Histograms
• A histogram is a graph that shows the relative frequency per unit of measurement.
• The areas of the blocks represent the percentage of observations in the blocks.
• The heights of the blocks represent relative frequency per unit of measurement, i.e., crowding: the percentage of observations per unit of measurement.
• Histograms show broad features, particularly the center, spread and shape of the distribution (symmetric or skewed, light tailed or heavy tailed).

Histograms in JMP
• Click Analyze, then Distribution.
• Click the red triangle next to Distributions and choose Stack to see a horizontal layout.
• Click Tools, then the hand; click on the histogram and drag to change the position of the bars.
• To make histograms by group (e.g., sex discrimination), put Salaries in Y and Sex in the By box.

Stem and leaf diagrams
• Cross between a graph and a table.
• Gives a quick idea of the distribution.
• Shows center, spread and shape as a histogram does, but also shows exact values; it is easy to construct by hand, and the median can be computed from it.
• Stem and leaf plots in JMP
  – Click Analyze, Distribution.
  – Put the variable of interest in Y and click OK.
  – Click the red triangle next to the variable of interest (e.g., salaries) and click Stem and Leaf.
  – Back-to-back stem and leaf plots are not available in JMP but are useful (see page 17).

Box plots
• The middle 50% of a group of measurements is represented by a box.
  – The line in the middle of the box is the median.
• Various features of the upper and lower 25% are represented by other symbols.
  – The whiskers extend to the farthest points that are within 1.5 interquartile ranges of the upper and lower quartiles (IQR = third quartile – first quartile).
  – Points farther away are shown individually as outliers.
  – The width of a box plot is chosen to make the box look nice; it does not represent any aspect of the data.

Box plots in JMP
• To draw one box plot
  – Click Analyze, Distribution.
• To draw side-by-side box plots
  – Click Analyze, Fit Y by X, putting the outcome in Y and the group variable in X.
  – Click the red triangle next to One Way Analysis, click Display Options and then click Box Plot.

Random Assignment in JMP
• To randomly assign units to two groups of size n_1 and n_2 in JMP:
  – Right-click at the top of the random column (a column named random), click Formula, click the Random function and then click Random Uniform.
  – Click Tables, Sort and then sort by random.
  – Create a column named group. Label the first n_1 units in the table as Group I and the rest of the units as Group II.

Simple Random Sample
• A simple random sample (of size n) is a subset of a population obtained by a procedure that gives all sets of n distinct items in the population an equal chance of being chosen.
• Need a frame: a numbered list of all subjects.
• Simple random sample: Generate a random number for each subject. Choose the subjects with the n smallest numbers.
• Simple random sample in JMP:
  – Click Tables, Subset, then put the number n in the box “Sampling Rate or Sample Size.”

Sampling units
• In conducting a random sample, it is important that we randomly sample the units of interest. Otherwise we may create a selection bias.
• Sampling families
  – If we want the mean number of children per family, we should sample by family, and we need to make a correction if we sample by person.
  – If we want to know the mean opinion about building a new school in a community, have a frame of families available, and plan to sample one person per family, we need to use variable probability sampling, giving larger families a larger probability of being sampled.

The clinician’s illusion
• For several diseases such as schizophrenia, alcoholism and opiate addiction, clinicians think that the long-term prognosis is much worse than researchers do.
• Part of the disagreement may arise from differences in the populations they sample.
  – Clinicians: “Prevalence” sample, i.e., a sample from the population currently suffering from the disease, which contains a disproportionate number of people who have suffered from the disease for a long time.
  – Researchers: “Incidence” sample, i.e., a sample from the population of people who have ever contracted the disease.

Meaningful Comparisons
• Main lesson of the chapter: The best way to compare two (or more) groups is to do a randomized experiment or take a random sample. This avoids systematic bias due to confounding variables and selection bias.
• But if this is not possible, we should generally try to make the groups as “comparable” as possible by adjusting for known confounding variables and selection biases. Often, important first steps are to use an appropriate control group and to compare appropriate rates rather than absolute numbers.

Control Group
• In a randomized experiment, we want the treatment and control groups to be similar in every way except that one takes the treatment and the other doesn’t, i.e., we use a placebo and double blinding.
• Similarly, in an observational study, we want to compare the treatment group to a control group that is as similar as possible.
• Explain the need for a control group by criticizing the statement “A study on the benefits of vitamin C showed that 90% of the people suffering from a cold who take vitamin C get over their cold within a week.”

Use of Rates
• An article in This Week magazine says that if you went “hurtling down the highway at 70 miles an hour, careening from side to side,” you would have four times as good a chance of staying alive if the time were seven in the morning rather than seven at night.
• The evidence: “Four times more fatalities occur on the highways at 7 p.m. than 7 a.m.”
• Does the conclusion follow from the evidence?
• More accidents occur in clear weather than in foggy weather. Is clear weather safer to drive in?
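To see why rates rather than absolute numbers are the meaningful comparison in the highway example, here is a minimal Python sketch. All numbers are invented for illustration; the article quoted above gives no traffic volumes, and the helper name `rate_per` is an assumption of this sketch.

```python
def rate_per(events, exposure, scale=100_000):
    """Return the number of events per `scale` units of exposure."""
    if exposure <= 0:
        raise ValueError("exposure must be positive")
    return events * scale / exposure


# Hypothetical figures, invented for illustration only.
fatalities = {"7 a.m.": 10, "7 p.m.": 40}
vehicles_on_road = {"7 a.m.": 100_000, "7 p.m.": 400_000}

rates = {
    hour: rate_per(fatalities[hour], vehicles_on_road[hour])
    for hour in fatalities
}

for hour, rate in rates.items():
    print(f"{hour}: {fatalities[hour]} fatalities, "
          f"{rate:.1f} per 100,000 vehicles on the road")
```

In this invented scenario, four times as many fatalities occur at 7 p.m., yet the fatality rate per vehicle is the same at both hours because four times as many vehicles are on the road. The raw counts alone cannot tell us which hour is safer.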

Polio Example
• Using figure 1 as an example, explain why a contemporaneous control group is needed in experiments where the effectiveness of a drug or vaccine is being tested.
• Comment on the use of the number of cases. What would be a more appropriate indicator of whether polio incidence was increasing?
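
For readers working outside JMP, the random-number-and-sort procedures described earlier in the lecture (the "Random Assignment in JMP" and "Simple Random Sample" slides) can be sketched in Python. This is an illustrative sketch only: the function names, the seed, and the frame of ten numbered units are assumptions made for the example.

```python
import random


def randomly_assign(units, n1, rng):
    """Give each unit a uniform random number, sort by it, and split:
    the first n1 units form Group I, the rest form Group II."""
    if not 0 <= n1 <= len(units):
        raise ValueError("n1 must be between 0 and the number of units")
    keyed = [(rng.random(), unit) for unit in units]
    keyed.sort(key=lambda pair: pair[0])
    ordered = [unit for _, unit in keyed]
    return ordered[:n1], ordered[n1:]


def simple_random_sample(frame, n, rng):
    """Give each subject a random number and keep the n smallest."""
    if not 0 <= n <= len(frame):
        raise ValueError("n must be between 0 and the frame size")
    keyed = [(rng.random(), subject) for subject in frame]
    keyed.sort(key=lambda pair: pair[0])
    return [subject for _, subject in keyed[:n]]


rng = random.Random(112)          # fixed seed so the run is repeatable
frame = list(range(1, 11))        # hypothetical frame of 10 numbered units

group_one, group_two = randomly_assign(frame, n1=4, rng=rng)
sample = simple_random_sample(frame, n=3, rng=rng)

print("Group I: ", group_one)
print("Group II:", group_two)
print("Simple random sample:", sample)
```

Sorting by an independent uniform random number puts the units in a random order, so every set of n_1 units is equally likely to become Group I, and every set of n subjects is equally likely to be the sample.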