Inferential Statistics

Hypothesis Testing

We are going to reject a hypothesis if what we actually observe to occur would be very unlikely (have low probability) were the hypothesis true.

• “What we actually observe to occur” must not be too finely detailed [we must throw away some information].
• We will pay attention to the values of sample statistics, not to the full information in a sample.
• Nor will we want to say that a single specific observed value of a sample statistic is “very unlikely”, because of course it would be.

We are going to reject a hypothesis if the value of a relevant sample statistic in an observed sample is in a region that is surprising: unlikely to occur were the hypothesis true.

Example: Let p be the proportion of all SU undergrads who have played Angry Birds. Chancellor Cantor hypothesizes that p = .9, where the only plausible alternative is that p > .9.

The obvious thing to do is to collect a (random!) sample of, say, 100 SU undergrads and look at the sample fraction f of students in the sample who have played Angry Birds.

If f = .9, that would be strong support for the hypothesis that p = .9. BUT the chances are small that EXACTLY 90 of those students have played Angry Birds. Variations in sampling will give different numbers out of 100 even if the hypothesis is true.

Would f = .91 lead us to reject the hypothesis that p = .9? How about f = .99?

We are going to reject the hypothesis p = .9 if the value of a relevant sample statistic, f, in an observed sample is in a region that is surprising: unlikely to occur were the hypothesis true.

How unlikely? Suppose for the moment we say we want to reject the hypothesis p = .9 if f > c, where the event f > c has probability less than or equal to .05. I.e., we take .05 as the threshold for “surprising”, and we want to find a c for which P(f > c) is at .05 or just below.

We need a probability distribution for f to calculate this: a sampling distribution!
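One way to get a feel for a sampling distribution is to simulate it. The sketch below is an illustration only, not part of the original example: it assumes the hypothesis p = .9 is true, draws many random samples of 100 students, and records how many in each sample have played. The function name, the number of repetitions, and the seed are arbitrary choices made for this sketch.

```python
import random

def simulate_success_counts(p=0.9, n=100, repetitions=10_000, seed=0):
    """Simulate `repetitions` samples of size n; return the success count of each.

    Each student in a sample is treated as an independent trial that is a
    "success" (has played) with probability p.
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    counts = []
    for _ in range(repetitions):
        successes = sum(1 for _ in range(n) if rng.random() < p)
        counts.append(successes)
    return counts

counts = simulate_success_counts()

# Work with integer counts; a count of k corresponds to f = k / 100.
share_exactly_90 = sum(1 for k in counts if k == 90) / len(counts)
share_above_95 = sum(1 for k in counts if k > 95) / len(counts)

print(f"share of samples with f = .90 exactly: {share_exactly_90:.3f}")
print(f"share of samples with f > .95:         {share_above_95:.3f}")
```

Most simulated samples do not land on exactly 90 even though p = .9 holds by construction, which is the sampling variation the example describes. The simulated shares are only approximations and will change slightly with a different seed.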
For this problem we are lucky, because f has a known distribution, the binomial distribution: P(f = a) is the same as P(# = 100a), where # is the number of “successes” in the 100 trials.

Here, assuming p = .9:

P(f = .90) = .132
P(f ≥ .95) = .057
P(f ≥ .96) = .024

Since P(f ≥ .96) is below .05 while P(f ≥ .95) is above it, we would agree here to accept the hypothesis p = .9 if f is .95 or less and reject it if f is .96 or more (that is, c = .95).
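The binomial probabilities above can be checked directly. The following is a minimal sketch using only Python's standard library; the helper names (binomial_pmf, upper_tail, smallest_rejection_count) are made up for this illustration, and the search simply looks for the smallest success count whose upper-tail probability is at or below .05.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(# = k) for n independent trials, each a success with probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def upper_tail(k, n, p):
    """P(# >= k): probability of k or more successes in n trials."""
    return sum(binomial_pmf(j, n, p) for j in range(k, n + 1))

def smallest_rejection_count(n, p, alpha):
    """Smallest k with P(# >= k) <= alpha, or None if no such k exists."""
    for k in range(n + 1):
        if upper_tail(k, n, p) <= alpha:
            return k
    return None

n, p, alpha = 100, 0.9, 0.05

print(f"P(f = .90)  = {binomial_pmf(90, n, p):.4f}")
print(f"P(f >= .95) = {upper_tail(95, n, p):.4f}")
print(f"P(f >= .96) = {upper_tail(96, n, p):.4f}")

k_reject = smallest_rejection_count(n, p, alpha)
print(f"reject p = .9 when f >= {k_reject / n:.2f}")
```

The printed values should agree with the probabilities quoted above to within rounding, and the search should land on the same rule: reject when 96 or more of the 100 sampled students have played.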