6 Sampling Distributions

Statisticians use the word population to refer to the totality of (potential) observations under consideration. The population is just the set of all possible outcomes in our sample space (chapter 3). Therefore a population may be finite (e.g. the number of households in the US) or effectively infinite (e.g. the number of stars in the universe).

e.g. question: average number of TV sets per household in the US
population: number of TV sets in each household in the US

e.g. question: average number of TV sets per household in North America
population: number of TV sets in each household in Canada, the US and Mexico

e.g. question: probability that a star has planets
population: number of planets per star for all stars (past, present, future) in all galaxies in the universe

In answering questions (e.g. what is the mean, what is the variance, what is the probability?) for a given population, one seldom answers the questions using the entire population. In practice the questions are answered from a subset (a sample) of the population. It is important to choose the sample in a way that does not bias the answers. This is the subject of an area of statistics referred to as experimental design: how to design the sample so that it adequately reflects the entire population.

e.g. In determining the probability of getting a pair in a poker hand, you would not sample only poker hands that contained two pairs. (Technically this would be an attempt to determine the probability P(pair) for the entire population by approximating it with the conditional probability P(pair | two pair).)

e.g. To determine the average length of logs moving on a conveyor belt at constant speed, one might decide to measure only the logs that pass a certain point on the belt every 10 minutes. Upon reflection, you realize that longer logs have a greater probability of being at the measuring point at the selected times, so the sample would give a biased average length that is too large.

e.g. To determine the expected lifetime of a tire, you test it only on smooth, paved roads?

e.g. To determine fuel ratings for cars, the EPA presumes that every car is driven 55 percent of the time in the city and 45 percent of the time on the highway!?

One way to ensure unbiased sampling is to ensure your subset is a random sample. Suppose our sample is to consist of n observations $x_1, x_2, \ldots, x_n$. We have to select the first observation $x_1$, the second $x_2$, etc. We think of the procedure for picking $x_i$ as selecting a value for a random variable $X_i$; that is, we think of picking the values $x_1, x_2, \ldots, x_n$ for our sample as the process of picking values for n random variables $X_1, X_2, \ldots, X_n$. Using this thinking, we can define a random sample as follows:

finite population: A set of observations $X_1, X_2, \ldots, X_n$ constitutes a random sample of size n from a finite population of size N if the values for the set are chosen so that each subset of n of the N elements of the population has the same probability of being selected.

infinite population: A set of observations $X_1, X_2, \ldots, X_n$ constitutes a random sample of size n from the infinite population described by a distribution (discrete) or density (continuous) $f(x)$ if
1. each $X_i$ is a RV whose distribution/density is given by $f(x)$, and
2. the n RVs are independent.

The phrase random sample is applied both to the RVs $X_1, X_2, \ldots, X_n$ and to their values $x_1, x_2, \ldots, x_n$.
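To make the finite-population definition concrete, here is a minimal Python sketch (the population values and sizes are hypothetical, chosen only for illustration). The standard-library `random.sample` draws without replacement, so every size-n subset of the N elements is equally likely, which is exactly the defining property above:

```python
import random

# Hypothetical finite population: number of TV sets in each of N = 12 households
population = [1, 2, 0, 3, 1, 2, 2, 4, 1, 0, 2, 3]

n = 4
# random.sample draws n elements without replacement, giving every size-n
# subset of the N population elements the same probability of being selected,
# i.e. a random sample in the finite-population sense defined above.
sample = random.sample(population, n)
print(sample, sum(sample) / n)  # the sample and its sample mean
```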
How to achieve a random sample?

e.g. the population is finite (and relatively small): label each element of the population 1, 2, ..., N and draw numbers sequentially, in groups of n, from a random digits table.

When the population size is large or infinite, this process can become practically impossible, and careful thought must be given to an (at least approximately) random sampling design.
e.g. areal sampling using a regular grid works if the underlying population (e.g. a chemical contaminant concentration) is relatively homogeneous; it doesn't work if the underlying population is spatially concentrated.
e.g. replicate sampling in "anomalous" areas.

6.2 Sampling Distribution of the Mean

For each sample $x_1, x_2, \ldots, x_n$ of n observations we can compute a mean $\bar x$. The mean value will vary with each of our samples. Thus we can think of the sample mean (the mean value for each sample) as a random variable $\bar X$ obeying some distribution function $f(\bar x; n)$. The distribution $f(\bar x; n)$ is referred to as the theoretical sampling distribution.

We put aside for the moment the question of the form of $f(\bar x; n)$ and note that, in chapter 5.10, we have already computed the mean and variance of $f(\bar x; n)$ in the case of continuous RVs.

Theorem 6.1: If a random sample $X_1, X_2, \ldots, X_n$ of size n is taken from a population having mean $\mu$ and variance $\sigma^2$, then $\bar X$ is a RV whose distribution $f(\bar x; n)$ has

infinite population: mean value $E(\bar X) = \mu$ and variance $\mathrm{Var}(\bar X) = \dfrac{\sigma^2}{n}$

finite population: mean value $E(\bar X) = \mu$ and variance $\mathrm{Var}(\bar X) = \dfrac{\sigma^2}{n}\cdot\dfrac{N-n}{N-1}$

Note: The appearance of the factor $\frac{N-n}{N-1}$ in the variance of $\bar X$ for the finite-population case is unexpected based on the calculation in 5.10. The calculations in 5.10, when applied to a finite population, assume that $n \ll N$. This correction factor, called the finite population correction (fpc) factor, is included to account for cases in which n is not negligible relative to N. Note that the fpc factor is 0 for $n = N$ (i.e. $\mathrm{Var}(\bar X) = 0$ when $n = N$). This implies that, when one sample is taken using the entire population, $\bar X$ exactly measures the population mean with no error (variance).

e.g. For N = 1,000 and n = 10, the fpc factor is $\frac{990}{999} = 0.991$.

Note that the results in Theorem 6.1 are independent of what $f(\bar x; n)$ may actually be!

Apply Chebyshev's theorem to the RV $\bar X$:
$$P\left(\left|\bar X - \mu\right| > k\,\sigma_{\bar X}\right) < \frac{1}{k^2}.$$
Since $\sigma_{\bar X} = \sigma/\sqrt{n}$, let $\epsilon = k\sigma/\sqrt{n}$, i.e. $k = \epsilon\sqrt{n}/\sigma$, giving
$$P\left(\left|\bar X - \mu\right| > \epsilon\right) < \frac{1}{k^2} = \frac{\sigma^2}{n\epsilon^2}.$$
Therefore, for any (arbitrarily small but) non-zero value of $\epsilon$, the probability that $\bar X$ differs from $\mu$ by more than $\epsilon$ can be made arbitrarily small by making n large enough. (We need $n \gg \sigma^2/\epsilon^2$, which means n must get very large as $\epsilon$ gets small.) This observation is known as the law of large numbers: if you make the sample size large enough, a single sample is sufficient to give a value $\bar x$ arbitrarily close to the population mean.

Theorem 6.2: Let $X_1, X_2, \ldots, X_n$ be a random sample, each having the same mean value $\mu$ and variance $\sigma^2$. Then for any $\epsilon > 0$,
$$P\left(\left|\bar X - \mu\right| > \epsilon\right) \to 0 \quad\text{as } n \to \infty.$$
As the sample size gets large, the probability that the average from a single random sample differs from the true mean goes to zero. Again, this result on $\bar X$ is independent of what $f(\bar x; n)$ may actually be.
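The law of large numbers is easy to see numerically. A minimal Monte Carlo sketch, assuming numpy is available (the exponential population, its mean, and the tolerance are arbitrary choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
mu = 2.0   # mean of the (illustrative) exponential population
eps = 0.1  # tolerance epsilon

# For growing n, estimate P(|Xbar - mu| > eps) from 10,000 replicate samples.
for n in [10, 100, 1000, 10000]:
    xbars = rng.exponential(mu, size=(10_000, n)).mean(axis=1)
    prob = np.mean(np.abs(xbars - mu) > eps)
    print(f"n = {n:5d}   P(|Xbar - mu| > {eps}) ~ {prob:.4f}")
```

The estimated probability shrinks toward zero as n grows, as Theorem 6.2 asserts, and the Chebyshev bound $\sigma^2/(n\epsilon^2)$ always sits above it.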
e.g. In an experiment, event A occurs with probability p. Repeat the experiment n times and compute
$$\text{relative frequency of occurrence of A} = \frac{\text{number of times A occurs in } n \text{ trials}}{n}.$$
Show that the relative frequency of A $\to p$ as $n \to \infty$.

Consider each trial as an independent RV, $X_1, X_2, \ldots, X_n$. Each $X_i$ takes on the two values $x_i = 0, 1$ depending on whether A does not or does occur in experiment i. $X_i$ has mean value
$$E(X_i) = 0\cdot(1-p) + 1\cdot p = p$$
and variance
$$\mathrm{Var}(X_i) = E(X_i^2) - \left[E(X_i)\right]^2 = 0^2\cdot(1-p) + 1^2\cdot p - p^2 = p(1-p).$$
Then $X_1 + X_2 + \cdots + X_n$ records the number of times A occurs in n trials, and
$$\bar X = \frac{X_1 + X_2 + \cdots + X_n}{n}$$
is in fact the relative frequency of occurrence of A. From Theorem 6.2 we have
$$P\left(\left|\bar X - p\right| > \epsilon\right) < \frac{p(1-p)}{n\epsilon^2} \to 0 \quad\text{for any } p \in [0,1] \text{ as } n \to \infty.$$

$$\sqrt{\mathrm{Var}(\bar X)} \equiv \sigma_{\bar X} = \frac{\sigma}{\sqrt{n}}$$
is referred to as the standard error of the mean. To reduce the standard error by a factor of two, it is necessary to increase n to 4n. Thus (unfortunately) increasing the sample size decreases the standard error at a relatively slow rate. (e.g. if n goes from 25 to 2,500, a factor of 100, the standard error decreases only by a factor of 10.)

While the results in Theorems 6.1 and 6.2 are independent of the form of the theoretical sampling distribution/density $f(\bar x; n)$, the actual form of $f(\bar x; n)$ depends on knowing the probability distribution which governs the population. In general it can be very difficult to compute the form of $f(\bar x; n)$. Two results are known, both presented as theorems.

Theorem 6.3 (central limit theorem): Let $\bar X$ be the mean of a random sample of size n taken from a population having mean $\mu$ and variance $\sigma^2$. Then the associated RV, the standardized sample mean
$$Z \equiv \frac{\bar X - \mu}{\sigma/\sqrt{n}},$$
is a RV whose distribution function approaches the standard normal distribution as $n \to \infty$.

The central limit theorem says that, as $n \to \infty$, the theoretical sampling distribution $f(\bar x; n)$ approaches a normal distribution (i.e. $\bar X$ is normally distributed) with mean $\mu$ and variance $\sigma^2/n$.

[Figures: the distribution $f(\bar x; n)$ of $\bar X$ for samples of size n from a population with an exponential distribution, and from a population with a uniform distribution.]

In practice, the distribution of $\bar X$ is well approximated by a normal distribution for n as small as 25 to 30.

Practical use of the central limit theorem: You have a population whose mean $\mu$ and standard deviation $\sigma$ you assume that you know (but whose density function $f(x)$ you do not know). You sample the population with a sample of size n. From the sample you compute a mean value $\bar x$. If the sample size is sufficiently large, the central limit theorem will tell you the probability of getting the value $\bar x$ given your assumptions on the values of $\mu$ and $\sigma$. To test your assumptions, compute the standardized sample mean z using the measured $\bar x$ and the assumed values of $\mu$ and $\sigma$. The central limit theorem states that the probability of getting the value $\bar x$ is the same as the probability of getting the z-score z in a standard normal distribution.

Theorem (normal populations): Let $\bar X$ be the mean of a random sample of size n taken from a population that is normally distributed with mean $\mu$ and variance $\sigma^2$. Then the standardized sample mean
$$Z \equiv \frac{\bar X - \mu}{\sigma/\sqrt{n}}$$
has the standard normal distribution regardless of the size of n (i.e. $f(\bar x; n)$ for $\bar X$ is the normal density with mean $\mu$ and variance $\sigma^2/n$).

Practical use of this theorem: You have a population whose distribution is (assumed to be) normal and whose mean $\mu$ and standard deviation $\sigma$ you assume that you know. You sample the population with a sample of size n. From the sample you compute a mean value $\bar x$. This theorem will tell you the probability of getting the value $\bar x$ given your assumptions on normality and the values of $\mu$ and $\sigma$. To test your assumptions, compute the standardized sample mean z using the measured $\bar x$ and the assumed values of $\mu$ and $\sigma$. This theorem states that the probability of getting the value $\bar x$ is the same as the probability of getting the z-score z in a standard normal distribution.
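Returning to the central limit theorem, its convergence can be watched directly in a small simulation. A sketch assuming numpy, using an exponential population (one of the cases pictured above; the sample sizes are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
mu = sigma = 1.0  # an exponential population with scale 1 has mean = std dev = 1

# Standardized sample mean Z = (Xbar - mu)/(sigma/sqrt(n)). Z always has
# mean 0 and variance 1; the CLT says its *shape* approaches N(0,1).
for n in [2, 5, 30]:
    xbar = rng.exponential(mu, size=(100_000, n)).mean(axis=1)
    z = (xbar - mu) / (sigma / np.sqrt(n))
    # Compare a tail probability with the standard normal value F(1.34) = 0.9099
    print(f"n = {n:2d}   P(Z <= 1.34) ~ {np.mean(z <= 1.34):.3f}")
```

Even for this strongly skewed population, the estimate is close to the normal value by n = 30, consistent with the 25-to-30 rule of thumb above.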
e.g. 1-gallon paint cans (the population) from a particular manufacturer cover, on average, 513.3 sq. ft., with a standard deviation of 31.5 sq. ft. What is the probability that the mean area covered by a sample of 40 1-gallon cans will lie within 510.0 to 520.0 sq. ft.?

Find the standardized sample means for the two limits of the range:
$$z_1 = \frac{510.0 - 513.3}{31.5/\sqrt{40}} = -0.66, \qquad z_2 = \frac{520.0 - 513.3}{31.5/\sqrt{40}} = 1.34.$$
Assuming the central limit theorem, we have from Table 3
$$P(510.0 < \bar X < 520.0) = P(-0.66 < Z < 1.34) = F(1.34) - F(-0.66) = 0.9099 - 0.2546 = 0.6553.$$

6.3 The Sampling Distribution of the Mean when $\sigma$ is unknown (the usual case)

In 6.2 we discussed aspects of the distribution of the sample mean $\bar X$: it has a distribution with mean $\mu$ and variance $\sigma^2/n$ (for continuous RVs), and the related RV, the standardized sample mean
$$Z \equiv \frac{\bar X - \mu}{\sigma/\sqrt{n}},$$
approaches the standard normal distribution as $n \to \infty$. In practice $\sigma$ is not known and we have to deal with the values
$$t \equiv \frac{\bar x - \mu}{s/\sqrt{n}},$$
where s is the sample standard deviation $s = \sqrt{s^2}$ and $s^2$ is the sample variance
$$s^2 = \frac{\sum_{i=1}^n (x_i - \bar x)^2}{n-1}.$$
Similar to $\bar X$, we define the random variable $S^2$, called the sample variance,
$$S^2 = \frac{\sum_{i=1}^n (X_i - \bar X)^2}{n-1},$$
which has values $s^2$. In this section and the next we are interested in the behavior of t and $S^2$ thought of as random variables.

Little is known about the behavior of the distribution of t when n is small unless we are sampling from a population governed by the normal distribution (a "normal population").

Theorem 6.4: If $\bar X$ is the sample mean of a random sample of size n taken from a normal population having mean $\mu$, then
$$t \equiv \frac{\bar X - \mu}{S/\sqrt{n}}$$
is a random variable having the t distribution with parameter $v = n - 1$.

Note: it is conventional to use a small "t" for the RV of the t distribution (breaking the convention of using capital letters for a RV and small letters for its values). We will use a small "t" to stand for both the RV and its values.

The t distribution: a one-parameter family of RVs, with values defined on $(-\infty, \infty)$

density function: $f(t; v) = \dfrac{\Gamma\!\left(\frac{v+1}{2}\right)}{\sqrt{\pi v}\,\Gamma\!\left(\frac{v}{2}\right)}\left(1 + \dfrac{t^2}{v}\right)^{-\frac{v+1}{2}}$

mean value: 0 (for $v > 1$), otherwise undefined

variance: $\dfrac{v}{v-2}$ (for $v > 2$); $\infty$ for $1 < v \le 2$; otherwise undefined

The t distribution is symmetric about 0 and very close to the standard normal distribution; in fact, the t distribution approaches the standard normal distribution as $v \to \infty$. The t distribution has "heavier" tails than the standard normal distribution (i.e. there is higher probability in the tails of the t distribution). It is often referred to as "Student's t distribution".

[Figure: t distribution densities for various values of v.]

The parameter v in the t distribution is referred to as the (number of) degrees of freedom (df). Recall that the sum of the sample deviations $x_i - \bar x$ is 0, hence only n − 1 of the deviations are independent of each other. Thus the RVs $S^2$ and, by the same reasoning, t both have n − 1 degrees of freedom.

Similar to $z_\alpha$ for the standard normal distribution, we define $t_\alpha$ for the t distribution. Because of the symmetry of the standard normal and t distributions we have
$$z_{1-\alpha} = -z_\alpha, \qquad t_{1-\alpha} = -t_\alpha.$$
Recall that Table 3 lists values of the cumulative standard normal distribution $F(z)$ for various values of z. In contrast, Table 4 lists values of $t_\alpha$ for various values of $\alpha$ and v.
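For values of $\alpha$ and v not in Table 4, $t_\alpha$ can be computed directly. A minimal sketch assuming scipy is available (the df values are those used in the surrounding examples):

```python
from scipy.stats import norm, t

# t_alpha is the point with right-tail probability alpha (what Table 4 lists),
# i.e. the (1 - alpha) quantile of the t distribution.
print(t.ppf(1 - 0.005, df=19))  # 2.861 (used in the fuse example below)
print(t.ppf(1 - 0.05, df=5))    # 2.015

# For large v, t_alpha approaches z_alpha, the standard normal point.
print(t.ppf(1 - 0.05, df=10_000), norm.ppf(1 - 0.05))  # both ~1.645
```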
(Recall, $\alpha$ is the probability in the right-hand tail above $t_\alpha$.) By symmetry, the probability in the left-hand tail below $-t_\alpha$ is also $\alpha$. Note that for $n \to \infty$, $t_\alpha = z_\alpha$. The standard normal distribution provides a good approximation to the t distribution for samples of size 30 or more.

Practical use of Theorem 6.4: You have a population whose distribution is (assumed to be) normal and whose mean $\mu$ you assume that you know (but whose standard deviation you do not know). You sample the population with a sample of size n. From the sample you compute a sample mean $\bar x$ and the sample standard deviation s. Theorem 6.4 will tell you the probability of getting the values $\bar x$ and s given your assumptions on normality and the value of $\mu$. To test your assumption, compute the value t using the measured $\bar x$ and s and the assumed value of $\mu$. Theorem 6.4 states that the probability of getting the values $\bar x$, s is the same as the probability of getting the value t in a t distribution with $v = n - 1$.

e.g. A manufacturer's fuses (the population) will blow in 12.40 minutes on average when subjected to a 20% overload. A sample of 20 fuses is subjected to a 20% overload. The sample average and standard deviation were observed to be, respectively, 10.63 and 2.48 minutes. What is the probability of this observation given the manufacturer's claim?
$$t = \frac{10.63 - 12.40}{2.48/\sqrt{20}} = -3.19, \qquad v = 20 - 1 = 19.$$
From Table 4, for v = 19, we see that a t value of 2.861 already has only 0.5% probability ($\alpha = 0.005$) of being exceeded. Consequently there is less than a 0.5% probability that a t value smaller than −2.861 will occur. Since the t value obtained from our sample of 20 is −3.19, we conclude that there is less than a 0.5% probability of getting this result. We therefore suspect that the manufacturer's claim is incorrect, and that the manufacturer's fuses will blow in less than 12.40 minutes on average when subjected to a 20% overload.

If the population is not normal, studies have shown that the distribution of $\frac{\bar X - \mu}{S/\sqrt{n}}$ is fairly close to that of the t distribution as long as the population distribution is relatively bell-shaped and not too skewed. This can be checked using a normal scores plot of the population.

6.4 The Distribution of the Sample Variance $S^2$

Theorem 6.5: Consider a random sample of size n taken from a normal population having variance $\sigma^2$. Then the RV
$$\chi^2 \equiv \frac{(n-1)S^2}{\sigma^2} = \frac{\sum_{i=1}^n (X_i - \bar X)^2}{\sigma^2}$$
has the chi-square distribution with parameter $v = n - 1$.

The chi-square distribution: a one-parameter family of RVs, with values defined on $(0, \infty)$

density function: $f(x; v) = \dfrac{1}{2^{v/2}\,\Gamma\!\left(\frac{v}{2}\right)}\, x^{\frac{v}{2}-1} e^{-\frac{x}{2}}$

mean value: v

variance: 2v

The chi-square distribution is just the gamma distribution with $\alpha = \frac{v}{2}$, $\beta = 2$. Again, the parameter v is referred to as the (number of) degrees of freedom (df). We define the $\chi^2_\alpha$ notation similarly to that of $z_\alpha$ and $t_\alpha$. Just as for Table 4, Table 5 lists values of $\chi^2_\alpha$ for various values of $\alpha$ and v.

[Figure: chi-square densities for various values of v.]

e.g. (The population) glass "blanks" from an optical firm, suitable for grinding into lenses. The variance of the refractive index of the glass is $1.26 \times 10^{-4}$. A random sample of size 20 is selected from each shipment, and if the sample variance of the refractive index exceeds $2 \times 10^{-4}$, the shipment is rejected. What is the probability of rejection, assuming the underlying population is normal?

For the measured sample of 20,
$$\chi^2 = \frac{(20-1)\left(2 \times 10^{-4}\right)}{1.26 \times 10^{-4}} = 30.2.$$
From Table 5, for $v = 19$, 30.2 corresponds to a value $\alpha = 0.05$. There is therefore a 5% probability of rejecting a shipment.
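Both tail probabilities in these two examples can be reproduced numerically. A minimal sketch assuming scipy, using the same numbers as above:

```python
from math import sqrt

from scipy.stats import chi2, t

# Fuse example: P(t <= -3.19) for v = 19 degrees of freedom
tval = (10.63 - 12.40) / (2.48 / sqrt(20))
print(tval)                # ~ -3.19
print(t.cdf(tval, df=19))  # ~0.0024, i.e. less than the 0.5% bound from Table 4

# Glass-blank example: P(chi2 >= 30.2) for v = 19 degrees of freedom
x2 = (20 - 1) * 2e-4 / 1.26e-4
print(x2)                  # ~30.2
print(chi2.sf(x2, df=19))  # ~0.05, the 5% rejection probability
```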
Practical use of Theorem 6.5: You have a population whose distribution is (assumed to be) normal and whose variance $\sigma^2$ you assume that you know. You sample the population with a sample of size n. From the sample you compute a sample variance $s^2$. Theorem 6.5 will tell you the probability of getting the value $s^2$ given your assumptions on normality and the value of $\sigma^2$. To test your assumption, compute the chi-square value $\chi^2$ using the measured $s^2$ and the assumed value $\sigma^2$. Theorem 6.5 states that the probability of getting the value $s^2$ is the same as the probability of getting the value $\chi^2$ in a chi-square distribution with $v = n - 1$.

Recap

From the sample space (N outcomes, $y_1, \ldots, y_N$, if finite) we draw sample 1, sample 2, ..., sample j, each consisting of n outcomes that provide values $x_1, \ldots, x_n$ for the RVs (e.g. n throws each of k dice, giving n k-dice sums). Think of each $x_i$ value as resulting from a RV $X_i$ such that
1. each $X_i$ has the same density $f(x)$, mean $\mu$, and variance $\sigma^2$, and
2. the $X_i$ are independent
(a random sample). The population of outcomes in the sample space generates values for the RVs. Each sample generates a sample mean $\bar x$ and a sample variance $s^2 = \frac{\sum_{i=1}^n (x_i - \bar x)^2}{n-1}$. Think of the sample means and variances as values for the RVs $\bar X$ and $S^2$.

What are $F(\bar X)$, $E(\bar X)$, $\mathrm{Var}(\bar X)$, $F(S^2)$, $E(S^2)$, $\mathrm{Var}(S^2)$?

Chapter 5 states: $E(\bar X) = \mu$ and $\mathrm{Var}(\bar X) = \sigma^2/n$ for an infinite population; $\mathrm{Var}(\bar X) = \frac{\sigma^2}{n}\cdot\frac{N-n}{N-1}$ for a finite population.

Chapter 6 addresses the questions on $F(\bar X)$ and $F(S^2)$:

Law of large numbers (for a single sample and a single value of $\bar x$):
$$P\left(\left|\bar X - \mu\right| > \epsilon\right) < \frac{\sigma^2}{n\epsilon^2}.$$

Central limit theorem: $Z \equiv \frac{\bar X - \mu}{\sigma/\sqrt{n}}$ is a RV whose distribution $F(Z) \to$ the standard normal $N(0,1)$ as $n \to \infty$ (i.e. $\bar X$ is a RV whose distribution $F(\bar X) \to$ normal with mean $\mu$ and variance $\sigma^2/n$ as $n \to \infty$).

If the $X_i$ are normally distributed with mean $\mu$ and variance $\sigma^2$: $Z \equiv \frac{\bar X - \mu}{\sigma/\sqrt{n}}$ is a RV whose distribution $F(Z) = N(0,1)$ for all n (i.e. $\bar X$ is a RV whose distribution is normal with mean $\mu$ and variance $\sigma^2/n$ for all n).

If the $X_i$ are normally distributed with mean $\mu$: $t \equiv \frac{\bar X - \mu}{S/\sqrt{n}}$ is a RV whose distribution $F(t)$ is the t distribution with df $v = n - 1$.

If the $X_i$ are normally distributed with variance $\sigma^2$:
$$\chi^2 \equiv \frac{(n-1)S^2}{\sigma^2} = \frac{\sum_{i=1}^n (X_i - \bar X)^2}{\sigma^2}$$
is a RV whose distribution $F(\chi^2)$ is the chi-square distribution with df $v = n - 1$.

Assume we have two populations. We may wish to inquire whether they have the same variance. Assume $s_1^2$ and $s_2^2$ are measured sample variances for each population.

Theorem 6.6: If $S_1^2$ and $S_2^2$ are the sample variances of independent random samples of respective sizes $n_1$ and $n_2$ taken from two normal populations having the same variance, then
$$F = \frac{S_1^2}{S_2^2}$$
is a RV having the F distribution with parameters $v_1 = n_1 - 1$ and $v_2 = n_2 - 1$.

The F distribution: a two-parameter family of RVs, with values defined on $(0, \infty)$

density function: $f(x; v_1, v_2) = \dfrac{1}{B\!\left(\frac{v_1}{2}, \frac{v_2}{2}\right)} \left(\dfrac{v_1}{v_2}\right)^{\frac{v_1}{2}} x^{\frac{v_1}{2}-1} \left(1 + \dfrac{v_1}{v_2}x\right)^{-\frac{v_1+v_2}{2}}$

mean value: $\dfrac{v_2}{v_2-2}$ for $v_2 > 2$

variance: $\dfrac{2v_2^2(v_1+v_2-2)}{v_1 (v_2-2)^2 (v_2-4)}$ for $v_2 > 4$

The F distribution is similar to the beta distribution;
$$B(x, y) = \int_0^1 t^{x-1}(1-t)^{y-1}\,dt$$
is the beta function, here evaluated at $\left(\frac{v_1}{2}, \frac{v_2}{2}\right)$.

[Figure: F distribution densities for various values of $v_1$ and $v_2$.]

The parameter $v_1$ is referred to as the numerator degrees of freedom (df of numerator); the parameter $v_2$ is referred to as the denominator degrees of freedom (df of denominator). As with $z_\alpha$, $t_\alpha$, etc., we define $F_\alpha$.
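Theorem 6.6 admits a quick Monte Carlo check. A sketch assuming numpy and scipy, under the theorem's hypotheses of normal populations with equal variance (the population parameters and replicate count are arbitrary; the sample sizes anticipate the example below):

```python
import numpy as np
from scipy.stats import f

rng = np.random.default_rng(2)
n1, n2 = 7, 13  # sample sizes, as in the example below

# Draw many pairs of samples from the same normal population and form the
# ratio of sample variances (ddof=1 gives the n-1 denominator).
s1 = rng.normal(0.0, 1.0, size=(50_000, n1)).var(axis=1, ddof=1)
s2 = rng.normal(0.0, 1.0, size=(50_000, n2)).var(axis=1, ddof=1)
ratio = s1 / s2

# The empirical upper 5% point should match F_0.05 with (6, 12) df.
print(np.quantile(ratio, 0.95))     # ~3.00
print(f.ppf(0.95, n1 - 1, n2 - 1))  # 3.00, the Table 6(a) value
```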
Values of $F_\alpha$ are given in Table 6 for various values of $v_1$ and $v_2$, for $\alpha = 0.05$ (Table 6(a)) and $\alpha = 0.01$ (Table 6(b)).

Practical use of Theorem 6.6: You have two populations whose distributions are (assumed to be) normal and whose variances you assume to be equal. You sample population 1 with a sample of size $n_1$ and population 2 with a sample of size $n_2$. From each sample you compute the sample variances $s_1^2$ and $s_2^2$. Theorem 6.6 will tell you the probability of getting the ratio $s_1^2/s_2^2$ given your assumptions on normality and equality of variance. To test your assumptions, compute the value F. Theorem 6.6 states that the probability of getting the ratio $s_1^2/s_2^2$ is the same as the probability of getting the value F in an F distribution with $v_1 = n_1 - 1$, $v_2 = n_2 - 1$.

e.g. Two random samples of sizes $n_1 = 7$ and $n_2 = 13$ are taken from the same normal population. What is the probability that the variance of the first sample will be at least 3 times that of the second? For $v_1 = 6$ and $v_2 = 12$, Table 6(a) shows an F value of 3.00 for $\alpha = 0.05$. Therefore there is a 5% probability that the variance of the first sample will be at least 3 times that of the second.

6.5 Representations of normal distributions

Defining new random variables in terms of others is referred to as a representation.

chi-square: Let $Z_1, Z_2, \ldots, Z_v$ be independent standard normal RVs. Define the RV
$$\chi_v^2 = \sum_{i=1}^{v} Z_i^2.$$
Then $\chi_v^2$ has a chi-square distribution with v df. Thus we also see that the square of a standard normal RV is a chi-square RV.

Let
$$\chi_1^2 = \sum_{i=1}^{v_1} Z_i^2 \quad\text{and}\quad \chi_2^2 = \sum_{i=v_1+1}^{v_1+v_2} Z_i^2,$$
where the $Z_i$ are independent standard normal RVs (and thus $\chi_1^2$ and $\chi_2^2$ are independent of each other). Then $\chi_1^2 + \chi_2^2$ has a chi-square distribution with $v_1 + v_2$ df. Thus we see that the sum of two independent chi-square RVs is also a chi-square RV, with df the sum of the individual df.

t distribution: Let $Z$ be a standard normal RV and $\chi^2$ be a chi-square RV with v df. Assume $Z$ and $\chi^2$ are independent. Then
$$t \equiv \frac{Z}{\sqrt{\chi^2/v}}$$
has a t distribution with v df.

F distribution: Let $\chi_1^2$ and $\chi_2^2$ be chi-square RVs with df $v_1$ and $v_2$ respectively. Assume $\chi_1^2$ and $\chi_2^2$ are independent. Then
$$F_{v_1, v_2} \equiv \frac{\chi_1^2/v_1}{\chi_2^2/v_2}$$
has an F distribution with $v_1$, $v_2$ df. Thus we see that
$$t^2 = \frac{Z^2/1}{\chi^2/v}$$
is a RV with an $F_{1,v}$ distribution.

e.g. Let $X_1, X_2, \ldots, X_n$ be n independent normal RVs, all having mean $\mu$ and standard deviation $\sigma$. Then
$$Z_i = \frac{X_i - \mu}{\sigma}$$
is a standard normal RV for each i, and
$$\sqrt{n}\,\bar Z = \frac{1}{\sqrt{n}}\sum_{i=1}^n Z_i = \frac{\bar X - \mu}{\sigma/\sqrt{n}}$$
is also a standard normal RV. Consider
$$\sum_{i=1}^n (Z_i - \bar Z)^2 = \sum_{i=1}^n Z_i^2 - 2\bar Z\sum_{i=1}^n Z_i + n\bar Z^2 = \sum_{i=1}^n Z_i^2 - n\bar Z^2,$$
i.e.
$$\sum_{i=1}^n Z_i^2 = \sum_{i=1}^n (Z_i - \bar Z)^2 + n\bar Z^2 = \frac{\sum_{i=1}^n (X_i - \bar X)^2}{\sigma^2} + \left(\frac{\bar X - \mu}{\sigma/\sqrt{n}}\right)^2.$$
Note that the LHS is chi-square distributed with n df. The last term on the RHS is chi-square with 1 df. This implies that the first term on the RHS is chi-square with n − 1 df. Thus we see that
$$\frac{(n-1)S^2}{\sigma^2} = \frac{\sum_{i=1}^n (X_i - \bar X)^2}{\sigma^2}$$
has a chi-square distribution with n − 1 df (as claimed in Theorem 6.5).

Let $X_i$ be $N(\mu_i, \sigma_i^2)$ for $i = 1, \ldots, n$, n independent normal RVs. Then
$$Y = \sum_{i=1}^n X_i$$
is normal with
$$E(Y) = \sum_{i=1}^n \mu_i, \qquad \mathrm{Var}(Y) = \sum_{i=1}^n \sigma_i^2.$$
A sum of independent normal RVs is a normal RV.

Let $X_i$ be a chi-square RV with df $v_i$ for $i = 1, \ldots, n$; assume the $X_i$ are independent. Then
$$Y = \sum_{i=1}^n X_i$$
is a chi-square RV with df
$$v = \sum_{i=1}^n v_i.$$
A sum of independent chi-square RVs is chi-square.

Let $X_i$ be a Poisson RV with parameter $\lambda_i$ for $i = 1, \ldots, n$; assume the $X_i$ are independent. Then
$$Y = \sum_{i=1}^n X_i$$
is a Poisson RV with parameter
$$\lambda = \sum_{i=1}^n \lambda_i.$$
A sum of independent Poisson RVs is Poisson.
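The representations above are easy to verify by simulation. A minimal sketch assuming numpy and scipy (the df value v and the replicate count are arbitrary choices):

```python
import numpy as np
from scipy.stats import chi2, f, t

rng = np.random.default_rng(3)
v, m = 5, 200_000

# chi-square as a sum of v squared independent standard normals:
# the upper 5% tail should match the chi-square distribution with v df.
z2sum = (rng.standard_normal((m, v)) ** 2).sum(axis=1)
print(np.mean(z2sum > chi2.ppf(0.95, v)))        # ~0.05

# t as Z / sqrt(chi2/v), with Z independent of the chi-square term
tvals = rng.standard_normal(m) / np.sqrt(z2sum / v)
print(np.mean(tvals > t.ppf(0.95, v)))           # ~0.05

# t^2 should follow an F distribution with (1, v) df
print(np.mean(tvals ** 2 > f.ppf(0.95, 1, v)))   # ~0.05
```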