Feb. 6

Statistic for the day: Number of Florida high school students who take physical education courses online: 1204

Assignment: Continue to review for test on Monday!

These slides were created by Tom Hettmansperger and in some cases modified by David Hunter.

Friday, Feb. 6: Review
Exam #1 (100 points): Monday, Feb. 9, in class
60 multiple-choice questions
Responsible for:
• Anything in lecture (except SFD)
• Anything in book, Chapters 1, 4, 5, 7, 8, 9
Bring ID! Bring pencils! Bring 1 sheet of notes!

2 types of studies to obtain data relevant to your research:
• Randomized Experiment
• Observational Study

Literary Digest Survey Results:
• 2.4 million responded!
• 43% were for Roosevelt
• Literary Digest predicted a landslide victory for Alf Landon

Turning Data into Information: The distribution of the data
• The shape of the distribution: Is it skewed or is it symmetric?
• What is a typical value? Should we use the mean or the median?
• What is the spread of the distribution? Should we use the standard deviation or the interquartile range? What are the quartiles?

Mean vs. Median: Which is more “typical” in this (right-skewed) case?
[Figure: Histogram of CD ownership, Stat 100.2 S04. Horizontal axis: CDs (0 to 1000); vertical axis: Frequency. Mean = 89, Median = 50.]

Age at Death of English Rulers
60, 50, 47, 53, 48, 33, 71, 43, 65, 34, 56, 59, 49, 81, 67, 68, 49, 16, 86, 67
Turn these data into information.

Shape: Stem and Leaf Display
1 | 6
2 |
3 | 34
4 | 37899
5 | 0369
6 | 05778
7 | 1
8 | 16

The Median and the Quartiles
16 33 34 43 47 * 48 49 49 50 53 ** 56 59 60 65 67 *** 67 68 71 81 86
(* marks Q1, ** marks M, *** marks Q3; the markers split the sorted data into four groups of 5 values.)
The first quartile is the number that divides the data into the first quarter and the last three quarters. The median divides the data into halves.
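The median and quartile construction for the rulers' ages can be checked with a short Python sketch. It assumes the convention the slides appear to use: each quartile is the median of one half of the sorted data. The function name `five_number_summary` is illustrative and not part of the course materials, and other software may use slightly different quartile rules.

```python
from statistics import median

# Ages at death of the 20 English rulers listed on the slide.
ages = [60, 50, 47, 53, 48, 33, 71, 43, 65, 34,
        56, 59, 49, 81, 67, 68, 49, 16, 86, 67]


def five_number_summary(values):
    """Return (lowest, Q1, median, Q3, highest).

    Q1 and Q3 are the medians of the lower and upper halves of the
    sorted data. When the count is odd, the middle value is left out
    of both halves (an assumed convention, not stated on the slides).
    """
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]
    upper = data[half:] if n % 2 == 0 else data[half + 1:]
    return data[0], median(lower), median(data), median(upper), data[-1]


low, q1, m, q3, high = five_number_summary(ages)
print("Lowest:", low, "Q1:", q1, "M:", m, "Q3:", q3, "Highest:", high)
print("IQR = Q3 - Q1 =", q3 - q1)
```

With 20 values, each half holds 10 values, so every quartile here is the average of two neighboring ages.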
5 Number Summary
Median M = 54.5
First Quartile Q1 = 47.5
Third Quartile Q3 = 67
Lowest = 16
Highest = 86

Anatomy of a Boxplot
[Figure: Boxplot of age at death of a sample of 20 rulers of England. Vertical axis: age (10 to 90). Labeled parts: Q1, M, Q3, IQR = Q3 - Q1, reasonable range of data (whiskers), outlier.]

Shape: Histogram
[Figure: Histogram of age at death of a sample of 20 rulers of England. Horizontal axis: age (10 to 90); vertical axis: Frequency (0 to 5).]

Rough way to approximate the standard deviation: Look at the histogram and estimate the range of the middle 95% of the data. The standard deviation is about ¼ of this range.

Research Question 1: How high should I build my doorways so that 99% of the people will not have to duck? (Assume a normal distribution with mean 68, st. dev. 4.)
Secondary Question 2: If I built my doors 75 inches (6 feet 3 inches) high, what percent of the people would have to duck?

Z-Scores: Measurement in Standard Deviations
Given the mean (68), the standard deviation (4), and a value (a height of, say, 75), compute
Z = (75 - mean) / SD = (75 - 68) / 4 = 1.75
This says that 75 is 1.75 standard deviations above the mean.

Morals of the story:
• Whenever you meet a graph that is very far from square, it is likely to produce an impression different from what you would have obtained from the data themselves.
• Almost any graph in which the vertical scale does not start at zero is deceptive.

BAD: Bogus vertical scale. Hard to say what the graph should look like.
[Figure: Portion of income taken by the government. Top: spending equal to the income in western states. Bottom: more densely populated east.]

A perplexing polling paradox
• People generally believe the results of polls.
• People do not believe in the scientific principles on which polls are based.
According to Gallup, most Americans said that a survey of 1500 to 2000 respondents (a larger-than-average sample size for national polls) CANNOT represent the views of all Americans.

How are Gallup Opinion Polls Taken?
Telephone interviews: Random digit dialing
• At random, pick:
  Exchange (area code + first three digits; e.g., 814 865)
  Next two digits, e.g., 22
  Last two digits, e.g., 11
• Up to three callbacks (why callbacks?)
• Evenings and weekends
• This catches unlisted numbers

Designed to be a random sample from the POPULATION of people with telephones. All members of the population are equally likely to be in the sample. Called a SIMPLE RANDOM SAMPLE.

Polls typically take roughly 1500 or 1600 people.

Margin of error: 2 standard deviations
We generally will NOT have the benefit of a histogram to get the standard deviation or the margin of error of the sample percentage.

SECRET FORMULA FOR THE MARGIN OF ERROR OF A SAMPLE PERCENTAGE:
margin of error = 1 / (square root of sample size)
For example, with 1600 respondents this is 1/40 = 0.025, or 2.5 percentage points.

The Morning After Pill
Do you think that the ‘morning-after’ contraceptive pill should be available over the counter?
Yes: 59.1%
No: 37.1%
Not sure: 3.8%
USA Today call-in poll (http://www.usatoday.com/quick/health/qh1206a.htm)

Volunteer response vs. volunteer sample
Contraceptive call-in poll? Volunteer sample!
1936 Literary Digest poll? Volunteer response!
Which is worse? Volunteer sample!

Do you have a tattoo?
         Yes    No
Men      15%    85%
Women    23%    77%
Based on: 100 men, 136 women (Stat 100.2 S04)

Sampling methods
• (Simple) random sampling
• Stratified random sampling
• Cluster sampling
• Systematic sampling
Bad: Haphazard or convenience sampling (as in tattoo survey)

Stratified random sampling
• Divide population into subgroups, or strata
• From each stratum, select a random sample
Example: Select a random sample from each of four groups of students (in-state non-minority, in-state minority, out-of-state non-minority, out-of-state minority) to ensure adequate representation of each group.
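The two stratified-sampling steps above (split the population into strata, then draw a random sample within each stratum) can be sketched in Python. This is only an illustration under stated assumptions: the student roster is made up, the function name `stratified_sample` is hypothetical, and drawing the same number of students from every stratum is one possible plan, not something the slides prescribe.

```python
import random


def stratified_sample(population, stratum_of, per_stratum, seed=None):
    """Draw a simple random sample of up to `per_stratum` members from each stratum.

    `stratum_of` is a function mapping a member to its stratum label.
    Equal sample sizes per stratum are an assumption made for this sketch.
    """
    rng = random.Random(seed)

    # Step 1: divide the population into strata.
    strata = {}
    for member in population:
        strata.setdefault(stratum_of(member), []).append(member)

    # Step 2: select a random sample (without replacement) from each stratum.
    return {
        label: rng.sample(members, min(per_stratum, len(members)))
        for label, members in strata.items()
    }


# Hypothetical roster of 200 students: (id, residency, minority status).
roster = [
    (i,
     "out-of-state" if i % 3 == 0 else "in-state",
     "minority" if i % 4 == 0 else "non-minority")
    for i in range(200)
]

sample = stratified_sample(roster, lambda s: (s[1], s[2]), per_stratum=5, seed=1)
for label, students in sorted(sample.items()):
    print(label, [s[0] for s in students])
```

Because every stratum contributes its own sample, even the smallest of the four groups is guaranteed to be represented, which a single simple random sample of 20 students would not ensure.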
Cluster sampling
• Divide population into subgroups, or clusters
• Select a random sample of clusters
• Measure individuals within selected clusters according to some plan
Example: To study high schoolers, first take a random sample of schools and then look in depth at all students in selected schools.

Systematic sampling
From a list of individuals in the population, select every kth individual.
Grisly example: “Decimation”, a term originally used for a punishment for mutinous Roman legions in which the legion was lined up and every tenth person killed.

Comparisons: Randomized Experiments and Observational Studies
EXPLANATORY VARIABLE says which population we sampled from.
RESPONSE VARIABLE says what we measured or counted.
The key to a good observational study or a good randomized experiment is RANDOMIZATION in both cases.
• In observational studies we need a random sample from each population.
• In randomized experiments we must randomize the subjects to the different treatments (or treatment and control groups).

Randomized Experiment
Associated concepts and ideas:
• Control group (provides a benchmark)
• Blinding: single or double (reduces bias)
• Placebo (benchmark, blinding)
• Confounding (a lurking third variable)
• Pairing or blocking (reduces noise in data)

The Hawthorne effect
Imagine the following study, intended to determine the prevalence of cheating: Individual students taking an exam in a particular course are filmed and observed closely by a team of extra observers, who then record the number of instances of cheating they observe.
Named for Elton Mayo’s famous study (1924-1932) of workers at the Hawthorne, Illinois plant of the Western Electric Company.

Research question: Do cell phones cause cancer?
What sort of a study could be used to answer this?
• Observational Study?
• Randomized Experiment?
If we cannot establish cause and effect, perhaps we can establish an association between cell phones and cancer using an observational study.
Possible Observational Study:
Response Variable: whether or not a subject gets cancer.
Explanatory Variable: whether or not the subject uses a cell phone.
This may require a very long time.

A special kind of observational study: SWITCH RESPONSE AND EXPLANATORY VARIABLES
Response Variable: whether a subject uses a cell phone or not.
Explanatory Variable: whether a subject has cancer or not.
1. Select a sample of cancer patients. (Cancer Case)
2. Develop a group of people who match the cancer patients but do not have cancer. (Control)
3. Compute the % who use cell phones in each group.
Called a retrospective Case-Control Study.

Research question: How does putting a smiley face on the bill influence a waitperson’s tip?
Response variable: Size of tip
Explanatory variable: Smiley face or not
Interacting variable: Sex of waitperson
Female waitress: Drawing a smiley face increased tip
Male waiter: Drawing a smiley face decreased tip
Source: Journ. Appl. Soc. Psych, 1996