
Chapter 7 Distributions of Sampling Statistics

7.1 A Preview
7.2 Introduction
7.3 Sample Mean
7.4 Central Limit Theorem
7.5 Sampling Proportions from a Finite Population
7.6 Distribution of the Sample Variance of a Normal Population

A Preview

• If you bet $1 on a number at a roulette table in a U.S. casino, then either you win $35 if your number appears on the roulette wheel or you lose $1 if it does not.
• Since the wheel has 38 slots (numbered 0, 00, and each of the integers from 1 to 36), the probability that your number appears is 1/38.
• As a result, your expected gain on the bet is
  \[ 35 \cdot \frac{1}{38} - 1 \cdot \frac{37}{38} = -\frac{2}{38} \approx -0.0526 \]
  That is, your expected loss on each spin of the wheel is approximately 5.3 cents.
• Suppose you continually place bets at the roulette table. How lucky do you have to be in order to be winning money at the end of your play?
• It depends on how long you continue to play.
  ◦ After 100 plays you will be ahead with probability 0.4916.
  ◦ After 1000 plays your chance of being ahead drops to 0.39.
  ◦ After 100,000 plays not only will you almost certainly be losing (your probability of being ahead is approximately 0.002), but also you can be 95 percent certain that your average loss per play will be 5.26 ± 1.13 cents.
  ◦ …
  ◦ If you play long enough, you will learn that the average loss per game is around 5.26 cents.

Introduction

• One of the key concerns of statistics is the drawing of conclusions from a set of observed data.
  ◦ These data consist of a sample of certain elements of a population.
  ◦ The objective is to use the sample to draw conclusions about the entire population.
• Using sample data to make inferences about the entire population rests on two assumptions:
  ◦ There is an underlying probability distribution for the population values.
  ◦ The sample data are independent values from this distribution.
• Definition (sample): If \(X_1, \ldots, X_n\) are independent random variables having a common probability distribution, we say they constitute a sample from that distribution.
• In most applications, the population distribution will not be completely known, and one will attempt to use the sample data to make inferences about it.
• Two important statistics that we will consider are
  ◦ the sample mean and
  ◦ the sample variance.

Sample Mean

• The value associated with any member of the population can be regarded as the value of a random variable having expectation \(\mu\) and variance \(\sigma^2\). The quantities \(\mu\) and \(\sigma^2\) are called the population mean and the population variance, respectively.
• Let \(X_1, X_2, \ldots, X_n\) be a sample of values from this population.
• The sample mean:
  \[ \bar{X} = \frac{X_1 + X_2 + \cdots + X_n}{n} \]
• Its expectation:
  \[ E[\bar{X}] = \mu \]
• The variance of the sample mean:
  \[ \mathrm{Var}(\bar{X}) = \frac{\sigma^2}{n} \]

Example 7.1

• Let us check the preceding formulas for the expected value and variance of the sample mean by considering a sample of size 2 from a population whose values are equally likely to be either 1 or 2.
• The pair \(X_1, X_2\) can assume any of the four possible pairs of values (1, 1), (1, 2), (2, 1), (2, 2), each with probability 1/4.
• Hence \(\bar{X}\) equals 1, 1.5, or 2 with probabilities 1/4, 1/2, and 1/4. This gives \(E[\bar{X}] = 1.5\) and \(\mathrm{Var}(\bar{X}) = 1/8\), in agreement with \(\mu = 1.5\) and \(\sigma^2/n = (1/4)/2\).
• Figure 7.2 plots the population probability distribution alongside the probability distribution of the sample mean of a sample of size 2.

Discussion

• The standard deviation of the sample mean is equal to the population standard deviation divided by the square root of the sample size: \(\mathrm{SD}(\bar{X}) = \sigma/\sqrt{n}\).

Exercises (p. 303, 2, 3)

• (Take home) Suppose that \(X_1\) and \(X_2\) constitute a sample of size 2 from a population in which a typical value \(X\) is equal to either 1 or 2 with respective probabilities \(P\{X = 1\} = 0.7\) and \(P\{X = 2\} = 0.3\).
  (a) Compute \(E[X]\).
  (b) Compute \(\mathrm{Var}(X)\).
  (c) What are the possible values of \(\bar{X} = (X_1 + X_2)/2\)?
  (d) Determine the probabilities that \(\bar{X}\) assumes the values in (c).
  (e) Using (d), directly compute \(E[\bar{X}]\) and \(\mathrm{Var}(\bar{X})\).
• Consider a population whose probabilities are given by p(1) = p(2) = p(3) = 1/3.
  (a) Determine \(E[X]\).
  (b) Determine \(\mathrm{SD}(X)\).
  (c) Let \(\bar{X}\) denote the sample mean of a sample of size 2 from this population. Determine the possible values of \(\bar{X}\) along with their probabilities.
  (d) Use the result of part (c) to compute \(E[\bar{X}]\) and \(\mathrm{SD}(\bar{X})\).

Central Limit Theorem

• The central limit theorem states that the sum of a large number of independent random variables is approximately normally distributed.
• The central limit theorem partially explains why many data sets related to biological characteristics tend to be approximately normal.
  ◦ For example, the central limit theorem can be used to explain why the heights of the many daughters of a particular pair of parents will follow a normal curve.
• PS. The central limit theorem says that, whatever the distribution of the population, the distribution of its sample mean tends toward a normal distribution. For this reason it is also known as the normal convergence theorem.

Example 7.2

• An insurance company has 10,000 (= 10^4) automobile policyholders.
• If the expected yearly claim per policyholder is $260 with a standard deviation of $800, approximate the probability that the total yearly claim exceeds $2.8 million (= $2.8 × 10^6).
• Solution: Let \(X\) denote the total yearly claim, the sum of the \(10^4\) individual claims, which are taken to be independent. Then \(E[X] = 10^4 \times 260 = 2.6 \times 10^6\) and \(\mathrm{SD}(X) = 800\sqrt{10^4} = 8 \times 10^4\). With \(Z\) a standard normal random variable, the central limit theorem gives
  \[ P\{X > 2.8 \times 10^6\} \approx P\left\{Z > \frac{2.8 \times 10^6 - 2.6 \times 10^6}{8 \times 10^4}\right\} = P\{Z > 2.5\} \approx 0.0062 \]

Distribution of the Sample Mean

• Let \(X_1, \ldots, X_n\) be a sample from a population having mean \(\mu\) and variance \(\sigma^2\), and let \(\bar{X}\) be the sample mean.
• Since \(\bar{X}\) has expectation \(\mu\) and standard deviation \(\sigma/\sqrt{n}\), the standardized variable
  \[ \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \]
  has an approximately standard normal distribution when \(n\) is large.

Example 7.3

• The blood cholesterol levels of a population of workers have mean 202 and standard deviation 14.
  (a) If a sample of 36 workers is selected, approximate the probability that the sample mean of their blood cholesterol levels will lie between 198 and 206.
  (b) Repeat (a) for a sample size of 64.
• Solution:
  (a) With n = 36, \(\mathrm{SD}(\bar{X}) = 14/\sqrt{36} = 7/3\), so \(P\{198 \le \bar{X} \le 206\} \approx P\{|Z| \le 4/(7/3)\} = P\{|Z| \le 1.714\} \approx 0.913\).
  (b) With n = 64, \(\mathrm{SD}(\bar{X}) = 14/\sqrt{64} = 1.75\), so \(P\{198 \le \bar{X} \le 206\} \approx P\{|Z| \le 2.286\} \approx 0.978\).

Example 7.4

• An astronomer is interested in measuring, in units of light-years, the distance from her observatory to a distant star.
• However, the astronomer knows that due to differing atmospheric conditions and normal errors, each time a measurement is made, it will yield not the exact distance, but an estimate of it.
• As a result, she is planning on making a series of 10 measurements and using the average of these measurements as her estimated value for the actual distance.
• If the values of the measurements constitute a sample from a population having mean d (the actual distance) and a standard deviation of 3 light-years, approximate the probability that the astronomer's estimated value of the distance will be within 0.5 light-years of the actual distance.
• Solution: Here \(\mathrm{SD}(\bar{X}) = 3/\sqrt{10} \approx 0.949\), so \(P\{|\bar{X} - d| \le 0.5\} \approx P\{|Z| \le 0.5/0.949\} = P\{|Z| \le 0.527\} \approx 0.402\).

How Large a Sample Is Needed?

• The central limit theorem leaves open the question of how large the sample size n needs to be for the normal approximation to be valid. The answer depends on the population distribution of the sample data.
  ◦ For instance, if the underlying population distribution is normal, then the sample mean will also be normal, no matter what the sample size is.
  ◦ A general rule of thumb is that you can be confident of the normal approximation whenever the sample size n is at least 30.
• Figure 7.3 presents the distribution of the sample means from a certain underlying population distribution (known as the exponential distribution) for sample sizes n = 1, 5, and 10.

Exercises (p. 311, 4; p. 312, 10)

• If you place a $1 bet on a number of a roulette wheel, then either you win $35, with probability 1/38, or you lose $1, with probability 37/38. Let \(X\) denote your gain on a bet of this type.
  (a) Find \(E[X]\) and \(\mathrm{SD}(X)\).
  Suppose you continually place bets of the preceding type. Show that
  (b) the probability that you will be winning after 1000 bets is approximately 0.39;
  (c) the probability that you will be winning after 100,000 bets is approximately 0.002.
• A six-sided die, in which each side is equally likely to appear, is repeatedly rolled until the total of all rolls exceeds 400. What is the approximate probability that this will require more than 140 rolls? (Hint: Relate this to the probability that the sum of the first 140 rolls is less than 400.)
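
The normal-approximation calculations in this section can be checked numerically. The following is a minimal sketch, not part of the original slides: the helper names `normal_cdf`, `prob_mean_between`, and `prob_ahead_roulette` are hypothetical, and the standard normal distribution function is evaluated with the standard library's `math.erf`. It applies the central limit theorem to Example 7.3 and to the roulette exercise, without any continuity correction.

```python
import math


def normal_cdf(z):
    """Standard normal distribution function, computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def prob_mean_between(mu, sigma, n, low, high):
    """CLT approximation of P{low <= sample mean <= high} for a sample of size n."""
    sd_mean = sigma / math.sqrt(n)  # SD of the sample mean is sigma / sqrt(n)
    return normal_cdf((high - mu) / sd_mean) - normal_cdf((low - mu) / sd_mean)


def prob_ahead_roulette(n_bets):
    """CLT approximation of P{total gain > 0} after n_bets $1 bets on one number."""
    p_win = 1.0 / 38.0
    mean = 35 * p_win - 1 * (1 - p_win)                # E[X] = -2/38
    second_moment = 35 ** 2 * p_win + 1 * (1 - p_win)  # E[X^2]
    sd = math.sqrt(second_moment - mean ** 2)          # SD(X)
    # The total gain has mean n * E[X] and standard deviation sqrt(n) * SD(X).
    z = (0 - n_bets * mean) / (sd * math.sqrt(n_bets))
    return 1.0 - normal_cdf(z)


if __name__ == "__main__":
    print(prob_mean_between(202, 14, 36, 198, 206))  # Example 7.3(a)
    print(prob_mean_between(202, 14, 64, 198, 206))  # Example 7.3(b)
    print(prob_ahead_roulette(1000))                 # roulette exercise, part (b)
    print(prob_ahead_roulette(100000))               # roulette exercise, part (c)
```

Under these assumptions the two roulette values should land close to the 0.39 and 0.002 quoted in the exercise. For a small number of plays, such as 100, the plain normal approximation is rougher and need not reproduce the 0.4916 quoted in the preview.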

Sampling Proportions from a Finite Population

• Consider a population of size N in which certain elements have a particular characteristic of interest.
• Let p denote the proportion of the population having this characteristic. So Np elements of the population have it and N(1 − p) do not.
• Example 7.5: Suppose that 60 out of a total of 900 students of a particular school are left-handed. If left-handedness is the characteristic of interest, then N = 900 and p = 60/900 = 1/15.
• Definition: A sample of size n selected from a population of N elements is said to be a random sample if it is selected in such a manner that the sample chosen is equally likely to be any of the subsets of size n.
• Suppose that a random sample of size n has been chosen from a population of size N. For i = 1, ..., n, let
  \[ X_i = \begin{cases} 1 & \text{if the } i\text{th member of the sample has the characteristic} \\ 0 & \text{otherwise} \end{cases} \]
• Consider now the sum of the \(X_i\), namely \(X = X_1 + X_2 + \cdots + X_n\).
  ◦ \(X_i\) contributes 1 to the sum if the ith member of the sample has the characteristic and contributes 0 otherwise.
  ◦ The sum is equal to the number of members of the sample that possess the characteristic.
• Similarly, the sample mean
  \[ \bar{X} = \frac{X}{n} = \frac{1}{n}\sum_{i=1}^{n} X_i \]
  will equal the proportion of members of the sample who possess the characteristic.
• Let us consider the probabilities associated with the statistic \(\bar{X}\). Since the ith member of the sample is equally likely to be any of the N members of the population, of which Np have the characteristic, each \(X_i\) is equal to either 1 or 0 with respective probabilities p and 1 − p:
  \[ P\{X_i = 1\} = p, \qquad P\{X_i = 0\} = 1 - p \]
• Note that the random variables \(X_1, X_2, \ldots, X_n\) are not independent.
  ◦ For instance, without any knowledge of the outcome of the first selection, \(P\{X_2 = 1\} = p\).
  ◦ However, given the outcome of the first selection, the conditional probabilities that \(X_2 = 1\) are \(P\{X_2 = 1 \mid X_1 = 1\} = (Np - 1)/(N - 1)\) and \(P\{X_2 = 1 \mid X_1 = 0\} = Np/(N - 1)\).
• Thus, knowing whether the first element of the random sample has the characteristic changes the probability for the next element. However, when the population size N is large in relation to the sample size n, this change will be very slight.
• When the population size N is large with respect to the sample size n, then \(X_1, X_2, \ldots, X_n\) are approximately independent.
• Let \(X = X_1 + \cdots + X_n\) denote the number of members of the sample who have the characteristic.
  ◦ If the population size N is large in relation to the sample size n, then the distribution of \(X\) is approximately a binomial distribution with parameters n and p.
• For the remainder of this text we will suppose that the underlying population is large in relation to the sample size, and we will take the distribution of \(X\) to be binomial.
• The mean and standard deviation of a binomial random variable:
  \[ E[X] = np, \qquad \mathrm{SD}(X) = \sqrt{np(1 - p)} \]
  Since \(\bar{X} = X/n\), the sample proportion has
  \[ E[\bar{X}] = p, \qquad \mathrm{SD}(\bar{X}) = \sqrt{\frac{p(1 - p)}{n}} \]

Example 7.6

• Suppose that 50 percent of the population is planning on voting for candidate A in an upcoming election. If a random sample of size 100 is chosen, then the proportion of those in the sample who favor candidate A has expected value \(E[\bar{X}] = 0.50\) and standard deviation
  \[ \mathrm{SD}(\bar{X}) = \sqrt{\frac{0.50(1 - 0.50)}{100}} = 0.05 \]

Exercise (p. 319, 1)

• Suppose that 60 percent of the residents of a city are in favor of teaching evolution in high school. Determine the mean and the standard deviation of the proportion of a random sample of size n that is in favor when
  (a) n = 10
  (c) n = 1,000

Probabilities Associated with Sample Proportions: The Normal Approximation to the Binomial Distribution

• From a historical point of view, one of the most important applications of the central limit theorem was in computing binomial probabilities.
• Let \(X\) denote a binomial random variable having parameters n and p. Because \(X\) is the sum of n independent variables that each equal 1 with probability p and 0 otherwise, the central limit theorem implies that, for large n, the standardized variable
  \[ \frac{X - np}{\sqrt{np(1 - p)}} \]
  has an approximately standard normal distribution.

Example 7.7

• Suppose that exactly 46 percent of the population favors a particular candidate. If a random sample of size 200 is chosen, what is the probability that at least 100 favor this candidate?
• Solution: If \(X\) is the number who favor the candidate, then \(X\) is a binomial random variable with parameters n = 200 and p = 0.46. The desired probability is \(P\{X \ge 100\}\).
  ◦ Note that since the binomial is a discrete and the normal is a continuous random variable, it is best to compute \(P\{X = i\}\) as \(P\{i - 0.5 \le X \le i + 0.5\}\) when applying the normal approximation (this is called the continuity correction).
  ◦ Here np = 92 and np(1 − p) = 49.68, so
  \[ P\{X \ge 100\} = P\{X \ge 99.5\} \approx P\left\{Z \ge \frac{99.5 - 92}{\sqrt{49.68}}\right\} = P\{Z \ge 1.064\} \approx 0.144 \]

Exercises (p. 322, 15; p. 321, 8)

• Let \(X\) be a binomial random variable with parameters n = 100 and p = 0.2. Approximate the following probabilities.
  (a) \(P\{X \le 25\}\)
  (b) \(P\{X > 30\}\)
  (c) \(P\{15 < X < 22\}\)
• If 65 percent of the population of a certain community is in favor of a proposed increase in school taxes, find the approximate probability that a random sample of 100 people will contain
  (a) at least 45 who are in favor of the proposition;
  (b) fewer than 60 who are in favor;
  (c) between 55 and 75 who are in favor.

Distribution of the Sample Variance of a Normal Population

• If \(Z_1, \ldots, Z_n\) are independent standard normal random variables, then \(\sum_{i=1}^{n} Z_i^2\) is said to have a chi-squared distribution with n degrees of freedom.
• The expected value of a chi-squared random variable is equal to its number of degrees of freedom.
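
The statement about the expected value can be checked by simulation. The sketch below is an illustration only: it relies on the definition given in the Key Terms (a chi-squared variable with n degrees of freedom is the sum of the squares of n independent standard normals), and the function names, number of trials, and seed are arbitrary choices, not part of the original material.

```python
import random


def chi_squared_draw(df, rng):
    """One simulated value: the sum of squares of df independent standard normals."""
    return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(df))


def simulated_mean(df, trials=20000, seed=0):
    """Average of many simulated chi-squared values with df degrees of freedom."""
    rng = random.Random(seed)  # fixed seed so repeated runs give the same result
    total = sum(chi_squared_draw(df, rng) for _ in range(trials))
    return total / trials


if __name__ == "__main__":
    for df in (1, 5, 10):
        # Each average should be close to df, the number of degrees of freedom.
        print(df, simulated_mean(df))
```

With enough trials each printed average should sit near its number of degrees of freedom, though a simulation only suggests the result and does not prove it.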
• If \(X_1, \ldots, X_n\) is a sample from a normal population having variance \(\sigma^2\), and \(S^2 = \frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar{X})^2\) is the sample variance, then
  \[ \frac{(n - 1)S^2}{\sigma^2} = \sum_{i=1}^{n} \frac{(X_i - \bar{X})^2}{\sigma^2} \]
  has a chi-squared distribution with n − 1 degrees of freedom.

Explanation

• Consider the standardized variables \((X_i - \mu)/\sigma\), i = 1, ..., n, where \(\mu\) is the population mean. The sum of their squares
  \[ \sum_{i=1}^{n} \frac{(X_i - \mu)^2}{\sigma^2} \]
  has a chi-squared distribution with n degrees of freedom.
• If the sample mean is substituted for the population mean, the resulting sum
  \[ \sum_{i=1}^{n} \frac{(X_i - \bar{X})^2}{\sigma^2} \]
  remains a chi-squared random variable. However, it loses 1 degree of freedom because the population mean \(\mu\) is replaced by its estimator, the sample mean \(\bar{X}\).

KEY TERMS

• A sample from a population distribution: If \(X_1, \ldots, X_n\) are independent random variables having a common distribution F, we say that they constitute a sample from the population distribution F.
• Statistic: A numerical quantity whose value is determined by the sample.
• Sample mean: \(\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i\)
• Sample variance: \(S^2 = \frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar{X})^2\)
• Central limit theorem: A theorem stating that the sum of a sample of size n from a population will approximately have a normal distribution when n is large.
• Random sample: A sample of n members of a population is a random sample if it is obtained in such a manner that each of the possible subsets of n members is equally likely to be the chosen sample.
• Chi-squared distribution with n degrees of freedom: The distribution of the sum of the squares of n independent standard normals.
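
The loss of one degree of freedom described in the Explanation can be illustrated with a small simulation. This is a hedged sketch rather than a derivation: the population parameters (mean 10, standard deviation 2), the sample size, and the function names are arbitrary assumptions for illustration. It uses the fact stated earlier that a chi-squared variable has expected value equal to its degrees of freedom, so the average of the sum of squares taken around the population mean should be near n, and around the sample mean near n − 1.

```python
import random


def scaled_sum_of_squares(sample, center, sigma):
    """Sum of ((x - center) / sigma)^2 over the sample."""
    return sum(((x - center) / sigma) ** 2 for x in sample)


def average_scaled_sums(n, mu=10.0, sigma=2.0, trials=20000, seed=1):
    """Average the statistic centered at the population mean and at the sample mean."""
    rng = random.Random(seed)  # fixed seed for reproducibility
    total_with_mu = 0.0
    total_with_xbar = 0.0
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]  # normal population
        xbar = sum(sample) / n                             # sample mean
        total_with_mu += scaled_sum_of_squares(sample, mu, sigma)
        total_with_xbar += scaled_sum_of_squares(sample, xbar, sigma)
    return total_with_mu / trials, total_with_xbar / trials


if __name__ == "__main__":
    around_mu, around_xbar = average_scaled_sums(n=5)
    print(around_mu)    # expected to be near n = 5
    print(around_xbar)  # expected to be near n - 1 = 4
```

The second average is the simulated mean of \((n - 1)S^2/\sigma^2\), since that quantity equals the sum of squares around the sample mean divided by \(\sigma^2\).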
