Lecture 13, Wednesday, October 10, 2008

CHAPTER 10 PROBABILITY

Probability is the chance that an event will occur. It is a decimal number between 0 and 1 inclusive. If probability = 0, there is no chance the event will occur. If probability = 1, it is certain that the event will occur. Probabilities may also be expressed in percent format, 0% to 100%.

A random phenomenon, or random process, like a coin toss, is a process where nobody can predict what will occur on the next trial, but the long-run result is predictable. Probability is the proportion of times that the event will occur in a very long series of trials. It may take thousands of trials for the proportion to converge on the true value for the random process.

The sample space is the set of all possible outcomes. An event is an outcome or a set of outcomes in a random process. A probability model is a listing of the sample space and the probability for each outcome.

Four probability rules:
1. The probability of any event is always between 0 and 1 inclusive.
2. The probabilities of all the outcomes in the sample space must sum to 1.0.
3. Two events are disjoint if they have no outcomes in common and therefore can never occur together. In this case, Prob(A or B) = Prob(A) + Prob(B).
4. For any event A, Prob(A does not occur) = 1 - Prob(A).

A random variable is a variable whose value is a numerical outcome of a random process.

CHAPTER 11 SAMPLING DISTRIBUTIONS

A PARAMETER is a number which describes some characteristic of a POPULATION. A STATISTIC is a number which describes some characteristic of a SAMPLE.

We usually do not know the actual value of a parameter, because populations are usually too large to measure every member. Instead, we have to estimate the value of the parameter. A statistic is calculated from actual data obtained by taking a sample from the population. We use the statistic to estimate the value of the parameter.
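The difference between a parameter and a statistic can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the population of 10,000 values (normal, mean 50, standard deviation 10), the sample size of 100, and the function name `sample_mean_vs_population_mean` are all made up for the example and do not come from the lecture. In practice the population mean would be unknown; it is computed here only because the population is simulated.

```python
import random
import statistics


def sample_mean_vs_population_mean(population, n, seed=0):
    """Return (mu, xbar) for one simple random sample of size n.

    mu   is the parameter: the mean of the whole population.
    xbar is the statistic: the mean of the sample only.
    """
    rng = random.Random(seed)
    mu = statistics.fmean(population)
    sample = rng.sample(population, n)  # drawn without replacement
    xbar = statistics.fmean(sample)
    return mu, xbar


# Hypothetical population, invented for this illustration.
pop_rng = random.Random(1)
population = [pop_rng.gauss(50, 10) for _ in range(10_000)]

mu, xbar = sample_mean_vs_population_mean(population, n=100)
print(f"parameter mu   = {mu:.2f}")
print(f"statistic xbar = {xbar:.2f}")
```

Changing `seed` gives a different sample and therefore a different value of xbar, while mu stays fixed.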
Examples: The sample mean, xbar, is used to estimate the population mean, mu. The sample proportion, phat, is used to estimate the population proportion, p.

The LAW OF LARGE NUMBERS says that the sample statistic, xbar, gets closer and closer to the population parameter, mu, as the sample size increases. This is similar to the way the observed proportion approaches the true probability when the series of trials becomes very long.

SAMPLING DISTRIBUTIONS

Every statistic, such as the sample mean or the sample proportion, has a pattern of variation. Because each sample is different, the value of the sample statistic will fluctuate from sample to sample. The sampling distribution of a statistic is the distribution of values taken by the statistic in all possible samples of the same size from the same population.

SAMPLING DISTRIBUTION OF XBAR

The sample mean, xbar, has a distribution of values whose mean = population mean, and whose standard deviation = population standard deviation / square root of n. The XBAR distribution will be normal if the samples are taken from a normal distribution.

CENTRAL LIMIT THEOREM

The XBAR distribution will be approximately normal even if the samples are taken from a non-normal distribution, provided the sample size is large. Large is not defined in the book, but most books say that n=30 is large enough. We will use 40.

Figure 11.4 on page 282 shows a right-skewed distribution of earned income in the top panel. The second panel shows the distribution of XBARs when n=100. The third panel shows the second panel with the horizontal scale expanded. It shows a distribution which is quite symmetric and very close to a normal distribution.

Figure 11.5 on page 283 shows schematically how the distribution of XBAR approaches a symmetric distribution as the sample size changes from 2 to 10 to 25.

Sample means are less variable than individual observations. Sample means are more normally distributed than individual observations.
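The claims above about the XBAR distribution can be checked with a small simulation. The sketch below is not the earned-income data from the book. It assumes a made-up right-skewed population (exponential with mean 1, so mu = 1 and sigma = 1), uses n = 40 to match the cutoff chosen in class, and repeats the sampling 5,000 times. The helper names `simulate_sample_means` and `draw_exponential` are hypothetical.

```python
import random
import statistics


def simulate_sample_means(draw, n, num_samples, seed=0):
    """Draw num_samples samples of size n and return the list of their means."""
    rng = random.Random(seed)
    return [
        statistics.fmean(draw(rng) for _ in range(n))
        for _ in range(num_samples)
    ]


def draw_exponential(rng):
    """One observation from the assumed skewed population (mu = 1, sigma = 1)."""
    return rng.expovariate(1.0)


n = 40
xbars = simulate_sample_means(draw_exponential, n=n, num_samples=5000)

mean_of_xbars = statistics.fmean(xbars)   # should be close to mu = 1
sd_of_xbars = statistics.stdev(xbars)     # should be close to sigma / sqrt(n)
expected_sd = 1.0 / n ** 0.5

# Rough normality check: share of sample means within 2 standard deviations of mu
# (about 95% for a normal distribution).
within_2sd = sum(abs(x - 1.0) <= 2 * expected_sd for x in xbars) / len(xbars)

print(f"mean of xbars = {mean_of_xbars:.3f}")
print(f"sd of xbars   = {sd_of_xbars:.3f}  (sigma / sqrt(n) = {expected_sd:.3f})")
print(f"share within 2 sd of mu = {within_2sd:.3f}")
```

Rerunning with a smaller n, such as 2, should give a visibly larger spread of sample means, in line with Figure 11.5.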
IMPORTANT IMPLICATIONS
1. The statistic XBAR has a distribution which is centered on the population mean, mu. Therefore XBAR is an unbiased estimator of the population mean.
2. The variation of the statistic XBAR is always less than the variation of individual observations (for any sample size greater than 1).
3. The larger the sample size, n, the smaller the variation of XBAR becomes.
4. Regardless of the shape of the population, the distribution of the sample mean will be approximately normal, centered on the population mean, if the sample size is at least 40, and the larger the better.

Examples: