Download Week 8 Feb. 28-29

Inference: Probabilities and Distributions Feb 28 - 29, 2012 A funny little thing called probability • As we noted earlier, when we take a sample and conduct a study we want to generalize (or infer) the results of our study to the wider population our sample is drawn from. • If 40 percent of our sample say they will vote Conservative we would like to estimate that this is the situation among the general population • However, we know there is a chance that if we had drawn two separate samples and done two simultaneous studies, we might have gotten different results for each sample • If the variation of results between samples is too great, then we cannot generalize our results to the wider population. • Therefore we need to know about probability to estimate the chance that our results will vary from the actual situation in the population at large. Because there is always a probability we could be wrong… • We always state our confidence interval and our margin of error • For example, a pollster might tell us there is a 95% chance that our survey result is accurate plus or minus 3% (meaning there is a 95% chance that the real support for the Conservatives among the general population is between 37% and 43%) • 95% is our confidence interval • +/- 3% is our margin of error • More will be said about these terms in later weeks In the Long-Run • The law of probability is based on a regularly documented observation that while chance can produce erratic results over the short-term or when small numbers are looked at, it generates regular and predictable outcomes over the long-term and as numbers increase • Random – Outcomes are uncertain but there is nonetheless a regular distribution of outcomes in a large number of repetitions (this is not a pattern). • Probability – The probability of any outcome of a random phenomenon is the proportion of times the outcome would occur in a very long series of repetitions. Examples of Commonly Known Probable Long-term Results • Coin Tossing – Fifty/Fifty • Stockmarket returns – Reversion to the mean • The fall of cards in a game (if you are able to count them properly and quickly enough) Randomness • Perfect Randomness is very rare and the ability to select numbers totally at random (so that each and every other number had just as good a chance of being selected and no pattern can ever be predicted) is valuable Probability Models • Sample Space “S” of a random phenomenon is the set of all possible outcomes • An Event is an outcome or a set of outcomes of a random phenomenon (the roll of the dice, the flip of the coin, etc.) • A Probability Model is a mathematical description of a random phenomenon consisting of – A sample space S – A way of assigning probability to events As in figure 10.2 in the book: There are 36 possible combinations if you roll two standard dice. If we wanted to define a sample space S for (5) it would be comprised of the four possible ways to roll 5 (i.e. the four “events” that result in 5 A ={ roll 1 & 4, roll 2 & 3, roll 3 & 2, roll 4 &1} Graphics: Moore 2009 Some Formal Probability Rules • A probability is a number between 0 and 1 – An event with a probability of 0 ought never to occur, An event with a probability of 1 out to always occur • All possible outcomes together must equal 1 • If two events have no outcomes in common, the probability that one or the other occurs is the sum of their individual probabilities. Eg. If one event occurs in 40% of cases, and the other in 25% and the two cannot occur together then the probability of one or the other occurring is 65% • The probability that an event does not occur is 1 minus the probability that it does occur Discrete vs. Continuous Models • Discrete Probability Models – Assume that the sample space is finite – To assign probabilities list the probabilities of all the individual outcomes (must be between 1 and 0 and add up to 1).The probability of an event is the sum of the outcomes making up the event. – Think of our dice example: What is the probability of rolling a five? • • • • • roll 1 & 4 = 1/36 roll 2 & 3 = 1/36 roll 3 & 2 = 1/36 roll 4 &1 = 1/36 Total Probability = 4/36 = 1/9 = 0.111 • Continuous Probability Models – Assign probabilities as areas under a density curve (such as the normal curve) – The area under the curve and above any range of values is the probability of an outcome in that range – This is what we did in chapter 3! Break Time Sampling Distributions • Some Key Words – Parameter: a number that describes the population. We often can only speculate on this as we only have data for a sample. – Statistic: is a number that can be computed from the sample data without making use of any unknown parameters. We often use statistics to estimate parameters. • The Law of Large Numbers: – Draw observations at random from any population with finite mean µ . – As the number of observations drawn  increases, the mean xof the observed values gets closer and closer to the mean µ of the population . Two types of distribution of variables (be careful) • The Population Distribution of a variable is the distribution of values of the variable in the population • The Sampling Distribution of a statistic is the distribution of values taken by the statistic in all possible samples of the same size from the same population (in other words, how a statistic varies in many samples drawn from the same population) For those who like math Suppose that X is the mean of a SRS of size “n” drawn from a large population with mean and standard deviation   Then the sampling distribution of and standard deviation  n X has mean  The Central Limit Theorem • In fact life gets better still • As we saw earlier, the mean of a sampling distribution will approach the mean in the population if you draw a big enough sample often enough • Something better happens with the shape of this sampling distribution – Even if the population distribution is not normal, when the sample is large enough, the distribution of the mean changes shape so as to approach normal (provided the population has a finite standard deviation). Bottom Line and Caution • If we can compute the average for a large random sample we have a decent guess as to what the average is in the population • The average of a sample is generally a better guess of the average of the population than any one case in the population. In other words, based on a survey of incomes, I can tell you with a reasonable level of certainty what the average income of Canadians is. I cannot tell you just from that what the income of any specific Canadian is.

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Download Week 8 Feb. 28-29