SAMPLING DISTRIBUTIONS

POPULATION AND SAMPLES, PARAMETERS AND STATISTICS

RECALL!
A POPULATION is the set of all possible subjects of a given experiment or study.
A SAMPLE is a specially chosen, relatively small subset of the population that is used for the actual measurements. Usually, a RANDOM SAMPLE is obtained.

POPULATION: MEAN (μ), ST.DEV. (σ)
SAMPLE 1: MEAN (X̄1), ST.DEV. (s1)
SAMPLE 2: MEAN (X̄2), ST.DEV. (s2)
SAMPLE 3: MEAN (X̄3), ST.DEV. (s3)

NOTE!
For a given experiment, the measurements on the population are fixed quantities (constants). These measurements are called PARAMETERS (of the population).
For each sample, these measurements are variable quantities (they vary from sample to sample). These measurements are called (sample) STATISTICS.

DISTRIBUTION OF A STATISTIC (MEANS)

EXPLAIN PLEASE!
Suppose we have the following population data: 2, 4, 6, 8
Computing the mean and standard deviation, we get: μ = 5, σ = 2.24

Now, we list every possible sample of size N = 2 (drawn with replacement, with order counted), compute their means, and make a frequency table and histogram for this list of sample means.

SAMPLE  MEAN    SAMPLE  MEAN    SAMPLE  MEAN    SAMPLE  MEAN
2, 2    2       4, 2    3       6, 2    4       8, 2    5
2, 4    3       4, 4    4       6, 4    5       8, 4    6
2, 6    4       4, 6    5       6, 6    6       8, 6    7
2, 8    5       4, 8    6       6, 8    7       8, 8    8

SAMPLE MEAN   FREQ.
2             1
3             2
4             3
5             4
6             3
7             2
8             1

[Histogram: frequency (0 to 4) of the sample means 2 through 8]

THE SAMPLING DISTRIBUTION OF MEANS

SAMPLING DISTRIBUTION OF THE MEANS
If we can obtain all samples of a fixed size N ≥ 30 from ANY POPULATION with mean μ and standard deviation σ, then the distribution of the sample means is approximately normal with

mean $\mu_{\bar{X}} = \mu$ and standard deviation $\sigma_{\bar{X}} = \sigma / \sqrt{N}$

For a NORMAL POPULATION, the samples can be of any size!

For example, six samples (each of SIZE N = 35) drawn from one population might have these means:
SAMPLE 1: X̄1 = 12.3
SAMPLE 2: X̄2 = 11.7
SAMPLE 3: X̄3 = 12.8
SAMPLE 4: X̄4 = 10.9
SAMPLE 5: X̄5 = 13.8
SAMPLE 6: X̄6 = 10.2
...
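The relations $\mu_{\bar{X}} = \mu$ and $\sigma_{\bar{X}} = \sigma / \sqrt{N}$ can be checked by brute force on the small population 2, 4, 6, 8 from the enumeration earlier in this section. The sketch below is only an illustration, assuming ordered samples drawn with replacement. All variable names are made up for this example. With N = 2 the sample means are not normally distributed, so only the mean and standard deviation relations are checked, not the shape.

```python
from collections import Counter
from itertools import product
from math import sqrt
from statistics import mean, pstdev

population = [2, 4, 6, 8]
N = 2  # sample size

# Every ordered sample of size N, drawn with replacement (4**2 = 16 samples).
samples = list(product(population, repeat=N))
sample_means = [sum(s) / N for s in samples]

# Frequency table of the sample means.
freq = Counter(sample_means)
for value in sorted(freq):
    print(f"mean {value:g}: frequency {freq[value]}")

mu = mean(population)        # population mean
sigma = pstdev(population)   # population st.dev. (divides by the population size)
mu_xbar = mean(sample_means)         # mean of the sampling distribution
sigma_xbar = pstdev(sample_means)    # st.dev. of the sampling distribution

print(f"mu = {mu}, mean of sample means = {mu_xbar}")
print(f"sigma/sqrt(N) = {sigma / sqrt(N):.4f}, st.dev. of sample means = {sigma_xbar:.4f}")
```

The two printed pairs should agree, because the list of sample means here is the complete sampling distribution rather than a random subset of it.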
[Figure: a population with MEAN μ = 12.2 and ST.DEV. σ = 2.41, from which SAMPLE 1, SAMPLE 2, ... (each of SIZE N = 35) are drawn]

The DATA SET of all the SAMPLE MEANS (each sample of fixed size N = 35): X̄1, X̄2, X̄3, X̄4, X̄5, X̄6, X̄7, X̄8, X̄9, X̄10, ... has (approximately) a NORMAL DISTRIBUTION. Also, for this DATA SET of SAMPLE MEANS:

MEAN: $\mu_{\bar{X}} = \mu = 12.2$
STANDARD DEVIATION: $\sigma_{\bar{X}} = \sigma / \sqrt{N} = 2.41 / \sqrt{35} \approx 0.41$

NOTE!
This DATA SET of the MEANS of all possible samples (of fixed size N) that can be drawn from a given population is called the SAMPLING DISTRIBUTION OF MEANS.

EXAMPLE 1. The sardines delivered to Eugenio's Cannery have a mean length of 4.54 in. and a standard deviation of 1.03 in.

A. Suppose these lengths are found to be normally distributed. What percentage of sardines delivered to the cannery are longer than 5 in.?

Distribution of lengths: MEAN = 4.54, ST.DEV. = 1.03
Convert the data value to a z-score:
X = 5: $Z = (5 - 4.54) / 1.03 \approx 0.45$
Shaded section: P(Z > 0.45) = 1 - P(Z < 0.45) = 1 - 0.6736 = 0.3264 or 32.64%

B. If the sardines are delivered in plastic bags (15 per bag), what percentage of these bags contain sardines with a mean length less than 4.9 in.?

THINK!
The sardines are packed in plastic bags (15 each), so the bags are samples of fixed size N = 15.
Length of sardines = NORMAL, so the sampling distribution of means (of lengths) = NORMAL
with mean $\mu_{\bar{X}} = \mu = 4.54$ and st.dev. $\sigma_{\bar{X}} = \sigma / \sqrt{N} = 1.03 / \sqrt{15} \approx 0.27$

Distribution of mean lengths: MEAN = 4.54, ST.DEV. = 0.27
Convert the data value to a z-score:
X̄ = 4.9: $Z = (4.9 - 4.54) / 0.27 \approx 1.33$
Shaded section: P(Z < 1.33) = 0.9082 or 90.82%

C. If someone claims that he has found a plastic bag of rather large sardines with mean length above 5.2 in., would you believe his claim?

Distribution of mean lengths: MEAN = 4.54, ST.DEV. = 0.27
Convert the data value to a z-score:
X̄ = 5.2: $Z = (5.2 - 4.54) / 0.27 \approx 2.44$
Shaded section: P(Z > 2.44) = 1 - P(Z < 2.44) = 1 - 0.9927 = 0.0073 or 0.73%
The probability of finding a bag of fish with mean length above 5.2 in. is almost 0 (nearly impossible!), so the claim is hard to believe.

EXAMPLE 2. A. C. Nielsen reported that children between the ages of 2 and 5 watch an average of 25 hours of television per week, with a standard deviation of 3 hours per week.

A. If 40 children (ages 2 to 5) are randomly selected, what is the probability that the mean no. of hours they watch television is less than 24.6 hours?

THINK!
40 children (ages 2 to 5) are randomly selected, so we have samples (of children) of size N = 40.
The sampling distribution of the mean no. of hours watching TV = NORMAL (sample size N = 40 ≥ 30)
with mean $\mu_{\bar{X}} = \mu = 25$ and st.dev. $\sigma_{\bar{X}} = \sigma / \sqrt{N} = 3 / \sqrt{40} \approx 0.47$

Distribution of mean no. of hours on TV: MEAN = 25, ST.DEV. = 0.47
Convert the data value to a z-score:
X̄ = 24.6: $Z = (24.6 - 25) / 0.47 \approx -0.85$
Shaded section: P(Z < -0.85) = 0.1977 or 19.77%

B. If 35 children (ages 2 to 5) are randomly selected, what is the probability that the mean no. of hours they watch television is more than 25.9 hours?

Distribution of mean no. of hours on TV: MEAN = 25, ST.DEV. = $3 / \sqrt{35} \approx 0.51$
Convert the data value to a z-score:
X̄ = 25.9: $Z = (25.9 - 25) / 0.51 \approx 1.76$
Shaded section: P(Z > 1.76) = 1 - P(Z < 1.76) = 1 - 0.9608 = 0.0392 or 3.92%

NOTE!
If we get P(X) ≤ 0.1000, it means that the event X is UNUSUAL (unlikely to happen by chance). Conditions that lead to unusual events can be rejected (like a contradiction).

ESTIMATION OF PARAMETERS USING INTERVALS

EXAMPLE 1. In 36 sea water samples, the mean salt concentration was 23 cm3/m3. The st.dev. of salt concentration in all sea waters is known to be approximately 2.83 cm3/m3. How can we estimate the mean salt concentration in all sea waters?

THINK!
In this case, we know the population st.dev.: σ = 2.83 cm3/m3
We want to estimate the population mean: μ = ?? cm3/m3
We know that one sample mean is: X̄ = 23 cm3/m3
So, assume that the population mean is: μ = 23 cm3/m3
But we are not 100% sure that μ = 23! The actual mean is somewhere around μ = 23!
To find our level of 'sureness', we can use the SAMPLING DISTRIBUTION OF THE MEANS.
fixed sample size: N = 36 (>30)
mean of the sample means: $\mu_{\bar{X}} = 23$
st.dev. of the sample means: $\sigma_{\bar{X}} = 2.83 / \sqrt{36} \approx 0.47$

Now, suppose I want, say, a 90% probability that a mean value will fall within a specific interval (?1 to ?2) around the assumed μ = 23. The middle 0.90 of the standard normal lies between the z-scores -1.65 and 1.65, so:

$-1.65 = (?_1 - 23) / 0.47$, which gives ?1 = 22.22
$1.65 = (?_2 - 23) / 0.47$, which gives ?2 = 23.78

Finally! We can say: I am 90% certain that the population mean is within 22.22 to 23.78!

TESTING HYPOTHESIS: WHEN TO REJECT A CLAIM

EXAMPLE 1. A manufacturer advertises that its new hybrid car has a mean gas mileage of at least 50 mi/gal. To test this, you drove a random sample of 33 such vehicles and computed a mean of 47 mi/gal.
If the standard deviation of the gas mileages of all such cars is 5.8 mi/gal, how do we know if we can reject the ad?

THINK!
In this case, we know the population st.dev.: σ = 5.8 mi/gal
We assume that the ad is true: (A) μ ≥ 50 mi/gal (μ is AT LEAST 50)
But, if the ad is not true: (B) μ < 50 mi/gal
We have to test if the sample we have found is 'possible' under the assumption. We use the SAMPLING DISTRIBUTION OF THE MEANS.
fixed sample size: N = 33 (>30)
mean of the sample means: $\mu_{\bar{X}} = 50$
st.dev. of the sample means: $\sigma_{\bar{X}} = 5.8 / \sqrt{33} \approx 1.01$

Here, I admit that I may be wrong in rejecting the assumption! So, I say: there is a small probability (say 5%) that a mean value can be found to be too low for the stated assumption μ ≥ 50 mi/gal. The lowest 0.05 of the standard normal lies below the z-score -1.65.

Find the z-score of the sample mean:
$Z = (47 - 50) / 1.01 \approx -2.97$

Since Z = -2.97 is below -1.65, the sample mean falls in that lowest 5%. With the assumption and my own admission of a possible mistake at 5% probability, I can reject the assumption! (Even allowing for that 5% probability, I have still found a sample with a mean too low!)

OTHER SAMPLE STATISTICS BESIDES THE MEAN

NOTE!
For EACH SAMPLE (of FIXED SIZE N) of A GIVEN POPULATION, the most common SAMPLE STATISTICS that can be calculated are the following:

MEAN
X̄ is simply the mean of the sample.

Z-STATISTIC
$z = \dfrac{\bar{X} - \mu}{\sigma / \sqrt{N}}$
where X̄ is the mean of the sample, μ is the mean of the population, and σ is the st.dev. of the population.

T-STATISTIC
$t = \dfrac{\bar{X} - \mu}{s / \sqrt{N}}$
where X̄ is the mean of the sample, μ is the mean of the population, and s is the st.dev. of the sample.

χ2-STATISTIC
$\chi^2 = \dfrac{(N - 1) s^2}{\sigma^2}$
where σ is the st.dev. of the population and s is the st.dev. of the sample.

NOTE!
Each of these SAMPLE STATISTICS possesses a specific SAMPLING DISTRIBUTION. We have discussed the SAMPLING DISTRIBUTION OF THE MEANS before. As for the Z-STATISTIC, if a SAMPLING DISTRIBUTION OF A STATISTIC is NORMAL, then we just convert that STATISTIC to a Z-SCORE and use the standard normal.
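The conversion of a sample mean to a z-score can be sketched in Python with the standard library's `statistics.NormalDist` in place of a printed z-table. The helper names (`standard_error`, `z_statistic`) are invented for this illustration. The numbers are taken from the worked examples: the hybrid-car test (σ = 5.8, N = 33), the sardine bags (σ = 1.03, N = 15), and the sea-water interval with σ = 2.83 as used in its calculation. Because nothing is rounded to two decimals along the way, the results can differ slightly from the table-based values.

```python
from math import sqrt
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, st.dev. 1


def standard_error(sigma, n):
    """St.dev. of the sampling distribution of means: sigma / sqrt(N)."""
    return sigma / sqrt(n)


def z_statistic(xbar, mu, sigma, n):
    """z = (xbar - mu) / (sigma / sqrt(N))."""
    return (xbar - mu) / standard_error(sigma, n)


# Hybrid-car claim: assume mu = 50, observe a sample mean of 47 with N = 33.
z_car = z_statistic(47, 50, 5.8, 33)
z_critical = STANDARD_NORMAL.inv_cdf(0.05)   # cut-off for the lowest 5%
reject_claim = z_car < z_critical
print(f"car: z = {z_car:.2f}, critical z = {z_critical:.2f}, reject = {reject_claim}")

# Sardine bags: probability that a bag of 15 has a mean length below 4.9.
z_bag = z_statistic(4.9, 4.54, 1.03, 15)
p_bag = STANDARD_NORMAL.cdf(z_bag)
print(f"bag: z = {z_bag:.2f}, P(mean < 4.9) = {p_bag:.4f}")

# Sea water: 90% interval around the sample mean of 23 with N = 36.
half_width = STANDARD_NORMAL.inv_cdf(0.95) * standard_error(2.83, 36)
interval = (23 - half_width, 23 + half_width)
print(f"90% interval: {interval[0]:.2f} to {interval[1]:.2f}")
```

Using `inv_cdf(0.95)` for a 90% interval reflects that the remaining 10% is split evenly between the two tails.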
THE SAMPLING DISTRIBUTION OF THE t-STATISTIC

t-STATISTIC
$t = \dfrac{\bar{X} - \mu}{s / \sqrt{N}}$
where X̄ is the mean of the sample, μ is the mean of the population, and s is the st.dev. of the sample.

NOTE!
The t-statistic is used for questions about the population mean where the (population) standard deviation is not given, which is usually the case!!

SAMPLING DISTRIBUTION OF THE t-STAT
If the distribution of sample means is normal, then the data set of the t-statistic
$t = \dfrac{\bar{X} - \mu}{s / \sqrt{N}}$
has the so-called t-distribution (with df = N - 1).

[Figure: t-distribution curves for N = 10 (df = 9) and N = 65 (df = 64)]

NOTE!
The t-distribution is symmetric around the mean, at t = 0. It is very similar to the standard normal! The standard normal has exactly one form for all sorts of normal data, while the t-distribution has one form for each value of df (degrees of freedom).

From the previous section, we learned that, if the population is normal, or if the sample size is N ≥ 30, then the distribution of the sample means is normal! And so, the t-stat has the t-distribution!

THE SAMPLING DISTRIBUTION OF THE χ2-STATISTIC

χ2-STATISTIC
$\chi^2 = \dfrac{(N - 1) s^2}{\sigma^2}$
where σ2 is the variance of the population (σ is its st.dev.) and s is the st.dev. of the sample.

NOTE!
The χ2-statistic is used for questions about the variance and standard deviation of the population data.

SAMPLING DISTRIBUTION OF THE χ2-STAT
If the distribution of sample means is normal, then the data set of the χ2-statistic
$\chi^2 = \dfrac{(N - 1) s^2}{\sigma^2}$
has the so-called χ2-distribution (with df = N - 1).

[Figure: χ2-distribution curves for N = 10 (df = 9) and N = 15 (df = 14)]

NOTE!
The χ2-distribution is asymmetric, unlike the t-distribution and the standard normal. Like the t-distribution, the χ2-distribution has one form for each value of df (degrees of freedom).

From the previous section, we learned that, if the population is normal, or if the sample size is N ≥ 30, then the distribution of the sample means is normal! And so, the χ2-stat is treated as having the χ2-distribution! (Strictly, this result relies on the population itself being normal.)
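A small simulation can illustrate how the t-statistic and the χ2-statistic behave from sample to sample. This is a rough sketch, not a derivation. It assumes a normal population, borrows μ = 12.2 and σ = 2.41 from the earlier illustration, and uses N = 10 (df = 9). The function name `simulate_statistics`, the seed, and the number of repetitions are arbitrary choices made for this example. It relies on two standard properties: a t-distribution is centred at 0, and a χ2-distribution has mean equal to df and is skewed to the right.

```python
import random
from math import sqrt
from statistics import mean, median, stdev


def simulate_statistics(n, mu, sigma, reps, seed=0):
    """Draw `reps` samples of size n from a normal population and return
    the t-statistic and chi-square statistic of each sample."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    t_values, chi2_values = [], []
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = mean(sample)
        s = stdev(sample)  # sample st.dev. (divides by n - 1)
        t_values.append((xbar - mu) / (s / sqrt(n)))
        chi2_values.append((n - 1) * s ** 2 / sigma ** 2)
    return t_values, chi2_values


N = 10
df = N - 1
t_values, chi2_values = simulate_statistics(n=N, mu=12.2, sigma=2.41, reps=5000)

# The t-values should be centred near 0 (symmetric distribution).
print(f"mean of t-values: {mean(t_values):.3f}")

# The chi-square values should average close to df, with the median below
# the mean because the distribution is skewed to the right.
print(f"mean of chi-square values: {mean(chi2_values):.3f} (df = {df})")
print(f"median of chi-square values: {median(chi2_values):.3f}")
```

Changing `N` regenerates both sets of values for a different df, which is one way to see that each df has its own curve.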