Chapter 8 Estimation

ESTIMATING $\mu$ WHEN $\sigma$ IS KNOWN

Note:
• Because of time and money constraints, difficulty in finding population members, and so forth, we usually do not have access to all measurements of an entire population. Therefore, we rely on information from a sample.
• We will learn techniques for estimating the population mean using sample data.

Assumptions about the random variable x
• 1. We have a simple random sample of size n drawn from a population of x values.
• 2. The value of $\sigma$, the population standard deviation of x, is known.
• 3. If the x distribution is normal, then our methods work for any sample size n.
• 4. If x has an unknown distribution, then we require a sample size $n \ge 30$. However, if the x distribution is distinctly skewed and definitely not mound-shaped, a sample size of 50 or even 100 or higher may be necessary.

Point estimate
• An estimate of a population parameter given by a single number is called a point estimate.
• $\bar{x}$ is the point estimate for $\mu$.

Note:
• Even with a large random sample, the value of $\bar{x}$ is usually not exactly equal to the population mean $\mu$.
• We then need to calculate the margin of error.

Margin of Error
• When using $\bar{x}$ as a point estimate for $\mu$, the margin of error is the magnitude of $\bar{x} - \mu$, or $|\bar{x} - \mu|$.

Note:
• We cannot say exactly how close $\bar{x}$ is to $\mu$ when $\mu$ is unknown. Therefore, the exact margin of error is unknown when the population parameter is unknown.
• Therefore, we use a confidence interval, which attaches a probability to the estimate and tells us how reliable it is.

Confidence Interval
• For a confidence level c, the critical value $z_c$ is the number such that the area under the standard normal curve between $-z_c$ and $z_c$ equals c.
• $P(-z_c < z < z_c) = c$

Review!
• Find the z value such that 99% of the area under the standard normal curve lies between $-z$ and $z$.
• Another way to say this:
• $P(-z_{0.99} < z < z_{0.99}) = 0.99$

Group work
• Find the z value such that 95% of the area under the standard normal curve lies between $-z$ and $z$.
• $P(-z_{0.95} < z < z_{0.95}) = 0.95$

Levels of confidence and their corresponding critical values

Level of Confidence c    Critical Value $z_c$
0.70, or 70%             1.04
0.75, or 75%             1.15
0.80, or 80%             1.28
0.85, or 85%             1.44
0.90, or 90%             1.645
0.95, or 95%             1.96
0.98, or 98%             2.33
0.99, or 99%             2.58

Let's put everything together
• From the central limit theorem, we know $\bar{x}$ is approximately normal with mean $\mu_{\bar{x}} = \mu$ when n is large.
• Also from the central limit theorem, $z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$
• Combined with $P(-z_c < z < z_c) = c$
• We get:
• $P\left(-z_c \frac{\sigma}{\sqrt{n}} < \bar{x} - \mu < z_c \frac{\sigma}{\sqrt{n}}\right) = c$

E is known as the maximal margin of error
• $E = z_c \frac{\sigma}{\sqrt{n}}$
• E is also known as the error tolerance. It is the bound on the margin of error.

Let's do more math manipulation
• We get:
• $P(-E < \bar{x} - \mu < E) = c$
• Followed by
• $P(\bar{x} - E < \mu < \bar{x} + E) = c$
• This is called a c confidence interval for $\mu$.

Confidence Interval for $\mu$
• $\bar{x} - E$ to $\bar{x} + E$
• It is an interval computed from sample data in such a way that c is the probability of generating an interval containing the actual value of $\mu$. In other words, c is the proportion of confidence intervals, based on random samples of size n, that actually contain $\mu$.

Example:
• Julia enjoys jogging. She has been jogging over a period of several years, during which time her physical condition has remained constantly good. Usually, she jogs 2 miles per day. The standard deviation of her times is $\sigma = 1.80$ minutes. During the past year, Julia has recorded her times to run 2 miles. She has a random sample of 90 of these times. For these 90 times, the mean was $\bar{x} = 15.60$ minutes. Let $\mu$ be the mean jogging time for the entire distribution of Julia's 2-mile running times.
Find a 0.95 confidence interval for $\mu$.

Answer
• $E = z_c \frac{\sigma}{\sqrt{n}} = 1.96 \cdot \frac{1.80}{\sqrt{90}}$
• E is approximately equal to 0.37
• $\bar{x} - E < \mu < \bar{x} + E$
• $15.60 - 0.37 < \mu < 15.60 + 0.37$
• $15.23 < \mu < 15.97$
• We can conclude with 95% confidence that the interval from 15.23 minutes to 15.97 minutes is one that contains the population mean $\mu$ of jogging times for Julia.

Group work
• Mr. Liu enjoys talking to people and getting to know them. Usually he talks to 5 people per day. The standard deviation of his talk time is $\sigma = 3.20$ minutes. During the past year, he has recorded his time to talk to 5 people. He has a random sample of n = 150 of these times. For those 150 times, the mean was $\bar{x} = 12.46$ minutes. Let $\mu$ be the mean talking time for the entire distribution. Find a 0.95 confidence interval for $\mu$.

Group Work
• Walter meets Julia at the track. He prefers to jog 3 miles. He knows that $\sigma = 3.67$ minutes. For a random sample of 90 times, the mean time was $\bar{x} = 23.45$ minutes. Let $\mu$ be the mean jogging time for the entire distribution of Walter's 3-mile running times. Find a 0.99 confidence interval for $\mu$.

How to find the sample size n for estimating $\mu$ when $\sigma$ is known
• Assume $\bar{x}$ is approximately normal
• $n = \left(\frac{z_c \sigma}{E}\right)^2$
• E = specified maximal error of estimate
• $\sigma$ = population standard deviation
• $z_c$ = critical value from the normal distribution for the desired confidence level c
• If n is not a whole number, increase n to the next higher whole number. Note that n is the minimal sample size for a specified confidence level and maximal error of estimate E.

Example:
• A wildlife study is designed to find the mean weight of salmon caught by an Alaskan fishing company. A preliminary study of a random sample of 50 salmon showed $\sigma \approx 2.15$ pounds. How large a sample should be taken to be 99% confident that the sample mean $\bar{x}$ is within 0.20 pound of the true mean weight $\mu$?
Answer
• The preliminary sample of 50 fish is large enough to permit a good approximation of $\sigma$ (50 > 30)
• $n = \left(\frac{z_c \sigma}{E}\right)^2$
• $n = \left(\frac{2.58 \cdot 2.15}{0.20}\right)^2 = 769.2$
• So we need a sample of 770 fish or more.

Group Work
• A study is designed to show the mean number of boyfriends/girlfriends a person has in his/her lifetime. A study with n = 60 showed that $\sigma \approx 5.16$ people. How large a sample should be taken to be 99% confident that the sample mean $\bar{x}$ is within 0.50 people of the true mean $\mu$?

Homework Practice
• Pg 338 #1-20 eoo (check answers in the back)

ESTIMATING $\mu$ WHEN $\sigma$ IS UNKNOWN

Well… here is the situation
• We have just learned how to estimate $\mu$ when $\sigma$ is known. But much of the time, when $\mu$ is unknown, $\sigma$ is also unknown.
• In such cases, we use the sample standard deviation s to approximate $\sigma$.
• When we use s to approximate $\sigma$, the sampling distribution for $\bar{x}$ follows a new distribution called a Student's t distribution.

Note:
• What we are about to learn is the most common way to calculate a confidence interval for $\mu$.
• The situation from the last section ($\sigma$ known) almost never happens!

Student's t distribution
• Assume that x has a normal distribution with mean $\mu$. For samples of size n with sample mean $\bar{x}$ and sample standard deviation s, the t variable
• $t = \frac{\bar{x} - \mu}{s/\sqrt{n}}$
• has a Student's t distribution with degrees of freedom d.f. = n − 1

What is degrees of freedom?
• d.f. = n − 1
• Degrees of freedom is the number of values that are free to vary when a statistic or parameter is fixed.
• Example: if a student needs a 90 average based on three tests, and the first two scores are 82 and 95, then the last score is fixed. It must be a 93; in other words, only the first two scores were "free to vary".

Properties of a Student's t distribution
• 1) The distribution is symmetric about the mean 0.
• 2) The distribution depends on the degrees of freedom, d.f. (d.f. = n − 1 for $\mu$ confidence intervals).
• 3) The distribution is bell-shaped, but has thicker tails than the standard normal distribution.
• 4) As the degrees of freedom increase, the t distribution approaches the standard normal distribution.

Now you have to be careful
• When you look up the critical values for confidence intervals, you don't want to use the wrong one.

Confidence Interval
• $P(-t_c < t < t_c) = c$

Note!!!!
• If the degrees of freedom d.f. you need are not in the table, use the closest d.f. in the table that is smaller. This procedure results in a critical value that is more conservative in the sense that it is larger. The resulting confidence interval will be longer and have a probability that is slightly higher than c.

Example: Activity time!
• Using the t-chart
• Go to Table 6 of Appendix II pg. A24
• Find the critical value $t_c$ for a 0.99 confidence level for a t distribution with sample size n = 5
• Procedure:
– First, we find the column with c heading 0.990
– Then we compute the number of degrees of freedom: d.f. = n − 1 = 5 − 1 = 4
– Last, we read down the column under the heading c = 0.99 until we reach the row headed by 4.
– The answer should be: 4.604

Group Activity
• A) Find the critical value $t_c$ for a 0.95 confidence level for a t distribution with sample size n = 13
• B) Find the critical value $t_c$ for a 0.99 confidence level for a t distribution with sample size n = 32
• C) Find the critical value $t_c$ for a 0.90 confidence level for a t distribution with sample size n = 7

Answer
• A) 2.179
• B) 2.750
• C) 1.943

Maximal margin of error, E
• $E = t_c \frac{s}{\sqrt{n}}$

Confidence Interval
• $P\left(-t_c \frac{s}{\sqrt{n}} < \bar{x} - \mu < t_c \frac{s}{\sqrt{n}}\right) = c$
• $P(\bar{x} - E < \mu < \bar{x} + E) = c$
• See the last section for the proof.

Summary:
• Confidence interval for $\mu$ when $\sigma$ is unknown
• $\bar{x} - E < \mu < \bar{x} + E$
• Where $\bar{x}$ = sample mean of a simple random sample
• $E = t_c \frac{s}{\sqrt{n}}$
• c = confidence level (0 < c < 1)
• $t_c$ = critical value for confidence level c and degrees of freedom d.f. = n − 1

Example:
• Suppose an archaeologist discovers only 7 fossil skeletons from a previously unknown species of miniature horse. Reconstructions of the skeletons of these 7 miniature horses show the shoulder heights (in cm) to be:
• 45.3 47.1 44.2 46.8 46.5 45.5 47.6
• A) Find the mean and the sample standard deviation
• B) Find a 99% confidence interval for $\mu$

Answer
• A) $\bar{x} = 46.14$, $s = 1.19$
• B) d.f. = n − 1 = 7 − 1 = 6
• $t_{0.99} = 3.707$
• $E = t_c \frac{s}{\sqrt{n}} = 3.707 \cdot \frac{1.19}{\sqrt{7}} = 1.67$
• $\bar{x} - E < \mu < \bar{x} + E$
• $46.14 - 1.67 < \mu < 46.14 + 1.67$
• $44.47 < \mu < 47.81$

Group Work
• A company has a new process for manufacturing large artificial sapphires. In a trial run, 37 sapphires are produced. The mean weight for these 37 gems is 6.75 carats and the sample standard deviation is 0.33 carat. Let $\mu$ be the mean weight for the distribution of all sapphires produced by the new process. Find a 95% confidence interval for $\mu$ and interpret it.

Answer
• $6.64 < \mu < 6.86$
• The company can be 95% confident that the interval from 6.64 to 6.86 carats is an interval that contains the population mean weight of sapphires produced by the new process.

Group Work
• See's Candies uses a new process to create their chocolate candies.
In a trial run, the weights per box are (in lbs): 12.1 10.9 15.2 11.3 12.5 11.8
• Find a 90% confidence interval for the population mean weight per box.

Homework Practice
• Pg 349 #1-14 eoe

ESTIMATING p IN THE BINOMIAL DISTRIBUTION

Note:
• Remember that a binomial distribution is completely determined by the number of trials n and the probability p of success on a single trial.
• For most experiments, the number of trials is chosen in advance, so the distribution is then completely determined by p.

The point estimates for p and q are
• $\hat{p} = \frac{r}{n}$
• $\hat{q} = 1 - \hat{p}$
• Where n = number of trials and r = number of successes.

Margin of Error for Binomial Distribution
• $|\hat{p} - p|$
• Or
• $E = z_c \sqrt{pq/n}$

Summary on how to find a confidence interval for a proportion p
• p is the population probability of success and q is the population probability of failure. Let r be a random variable that represents the number of successes out of the n binomial trials.
• The point estimates for p and q are
• $\hat{p} = \frac{r}{n}$ and $\hat{q} = 1 - \hat{p}$
• The number of trials n should be sufficiently large so that both $np > 5$ and $nq > 5$
• Confidence interval for p
• $\hat{p} - E < p < \hat{p} + E$
• $E \approx z_c \sqrt{\hat{p}\hat{q}/n}$
• c = confidence level (0 < c < 1)
• $z_c$ = critical value for confidence level c based on the standard normal distribution

Example:
• Suppose that 800 students are selected at random from a student body of 20000 and that they are each given a shot to prevent a certain type of flu. These 800 students are then exposed to the flu, and 600 of them do not get the flu. Let p represent the probability that the shot will be successful for any single student selected at random from the entire population of 20000.
• A) What is the number of trials n? What is the value of r?
• B) What are the point estimates for p and q?
• C) Would it seem that the number of trials is large enough to justify a normal approximation to the binomial?
• D) Find a 99% confidence interval for p

Answer
• A) n = 800, r = 600
• B) $\hat{p} = \frac{600}{800} = 0.75$
• $\hat{q} = 0.25$
• C) $np \approx 800(0.75) = 600 > 5$ and $nq \approx 800(0.25) = 200 > 5$, so a normal approximation is justified
• D) $E \approx z_{0.99} \sqrt{\frac{\hat{p}\hat{q}}{n}} \approx 2.58 \sqrt{\frac{(0.75)(0.25)}{800}} \approx 0.0395$
• The 99% confidence interval is then
• $\hat{p} - E < p < \hat{p} + E$
• $0.75 - 0.0395 < p < 0.75 + 0.0395$
• $0.71 < p < 0.79$

Group Work
• A random sample of 190 books purchased at a local bookstore showed that 71 of the books were science fiction. Let p represent the proportion of books sold by this store that are science fiction.
• A) What is a point estimate for p?
• B) Find a 90% confidence interval for p
• C) Interpret the confidence interval
• D) Can a normal approximation be justified?

Group Work
• A random sample of 260 hand sanitizers showed that 102 of them kill the bacteria. Let p represent the proportion of hand sanitizers that kill the bacteria.
• A) What are the point estimates?
• B) Find a 95% confidence interval for p
• C) Can a normal approximation be justified?
• D) Interpret the confidence interval

General interpretation of poll results
• 1) When a poll states the results of a survey, the proportion reported to respond in the designated manner is $\hat{p}$, the sample estimate of the population proportion
• 2) The margin of error is the maximal error E of a 95% confidence interval for p
• 3) A 95% confidence interval for the population proportion p is
• poll report $\hat{p}$ − margin of error E < p < poll report $\hat{p}$ + margin of error E

Example:
• A) What confidence level corresponds to the phrase "chances are 19 of 20 that if…"
• B) What margin of error corresponds to the phrase "results by no more than 2.6 percentage points in either direction"?
Answer
• A) 19/20 = 0.95, or 95%
• B) 2.6%

How to find the sample size n for estimating a proportion p
• $n = p(1 - p)\left(\frac{z_c}{E}\right)^2$ if you have a preliminary estimate for p
• $n = \frac{1}{4}\left(\frac{z_c}{E}\right)^2$ if you do not have a preliminary estimate for p
• If n is not a whole number, increase n to the next higher whole number.
• Also, if necessary, increase the sample size n to ensure that both np > 5 and nq > 5. Note that n is the minimal sample size for a specified confidence level and maximal error of estimate.

Example
• A company is in the business of selling wholesale popcorn to grocery stores. The company buys directly from farmers. A buyer for the company is examining a large amount of corn from a certain farmer. Before the purchase is made, the buyer wants to estimate p, the probability that a kernel will pop.
• Suppose a random sample of n kernels is taken and r of these kernels pop. The buyer wants to be 95% sure that the point estimate $\hat{p} = \frac{r}{n}$ will be in error either way by less than 0.01.
• A) If no preliminary study is made to estimate p, how large a sample should the buyer use?
• B) A preliminary study showed that p was approximately 0.86. If the buyer uses the results of the preliminary study, how large a sample should be used?

Answer
• $z_{0.95} = 1.96$
• A) $n = \frac{1}{4}\left(\frac{z_c}{E}\right)^2 = \frac{1}{4}\left(\frac{1.96}{0.01}\right)^2 = 0.25(38416) = 9604$
• B) $n = p(1 - p)\left(\frac{z_c}{E}\right)^2 = (0.86)(0.14)\left(\frac{1.96}{0.01}\right)^2 = 4625.29$, so use n = 4626

Group Work
• Blah blah blah blah… 99% sure that the point estimate $\hat{p} = \frac{r}{n}$ will be in error either way by less than 0.48
• A) If no preliminary study is made to estimate p, how large a sample should we use?
• B) A preliminary study showed that p was approximately 0.42. If we use the results of the preliminary study, how large a sample should be used?

Homework practice
• Pg 361 #1-21 odd

ESTIMATING $\mu_1 - \mu_2$ AND $p_1 - p_2$

Note:
• How can we tell if two populations are different?
• One way is to compare the difference in population means or the difference in population proportions.
• Two samples are independent if sample data drawn from one population are completely unrelated to the selection of sample data from the other population.
• Two samples are dependent if each data value in one sample can be paired with a corresponding data value in the other sample.

Dependent Samples
• Dependent samples and data pairs occur very naturally in situations in which the same object or item is measured twice.

Independent Samples
• Independent samples occur very naturally when we draw two random samples, one from the first population and one from the second population. Since both samples are random samples, there is no pairing of measurements between the two populations.

1st situation: confidence intervals for $\mu_1 - \mu_2$ when $\sigma_1$ and $\sigma_2$ are known
• Let $\sigma_1$ and $\sigma_2$ be the population standard deviations of populations 1 and 2. Obtain two independent random samples from populations 1 and 2, where
• $\bar{x}_1$ and $\bar{x}_2$ are sample means from populations 1 and 2
• $n_1$ and $n_2$ are sample sizes from populations 1 and 2
• If you can assume that both population distributions 1 and 2 are normal, any sample sizes $n_1$ and $n_2$ will work. If you cannot assume this, then use sample sizes greater than or equal to 30: $n_1 \ge 30$ and $n_2 \ge 30$
• Confidence interval for $\mu_1 - \mu_2$
• $(\bar{x}_1 - \bar{x}_2) - E < \mu_1 - \mu_2 < (\bar{x}_1 - \bar{x}_2) + E$
• Where $E = z_c \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$
• c = confidence level (0 < c < 1)
• $z_c$ = critical value for confidence level c

Example
• Suppose you are a biologist studying fishing data from Yellowstone streams before and after a major disaster. Fishing reports include the number of trout caught per day per fisherman. A random sample of $n_1 = 167$ reports from the period before the fire showed that the average catch was $\bar{x}_1 = 5.2$ trout per day. Assume the standard deviation of daily catch per fisherman was $\sigma_1 = 1.9$.
Another random sample of $n_2 = 125$ fishing reports 5 years after the disaster showed that the average catch per day was $\bar{x}_2 = 6.8$ trout. Assume the s.d. during this period was $\sigma_2 = 2.3$.
• A) What is the population for each sample? Are the samples independent or dependent?
• B) Compute a 95% confidence interval for $\mu_1 - \mu_2$
• C) Interpret the result

Answer
• A) The populations are the daily catches per fisherman before the fire and 5 years after it. The samples are independent because they come from 2 different periods.
• B) Since $n_1 = 167$, $\bar{x}_1 = 5.2$, $\sigma_1 = 1.9$, $n_2 = 125$, $\bar{x}_2 = 6.8$, $\sigma_2 = 2.3$, $z_{0.95} = 1.96$
• $E = z_c \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} = 1.96 \sqrt{\frac{1.9^2}{167} + \frac{2.3^2}{125}} = 0.4955 \approx 0.50$
• Therefore the 95% CI is
• $(\bar{x}_1 - \bar{x}_2) - E < \mu_1 - \mu_2 < (\bar{x}_1 - \bar{x}_2) + E$
• $(5.2 - 6.8) - 0.50 < \mu_1 - \mu_2 < (5.2 - 6.8) + 0.50$
• $-2.10 < \mu_1 - \mu_2 < -1.10$
• C) Since the interval contains only negative values, we are 95% confident that $\mu_1 < \mu_2$; that is, we are 95% sure that the average catch before the fire was less than the average catch after the fire.

Situation 2: (Most common) confidence intervals for $\mu_1 - \mu_2$ when $\sigma_1$ and $\sigma_2$ are unknown
• Obtain two independent random samples from populations 1 and 2, where
• $\bar{x}_1$ and $\bar{x}_2$ are sample means from populations 1 and 2
• $s_1$ and $s_2$ are sample standard deviations from populations 1 and 2
• $n_1$ and $n_2$ are sample sizes from populations 1 and 2
• If you can assume that both population distributions 1 and 2 are normal or at least mound-shaped and symmetric, then any sample sizes $n_1$ and $n_2$ will work. If not, use sample sizes greater than or equal to 30: $n_1 \ge 30$ and $n_2 \ge 30$
• Confidence interval for $\mu_1 - \mu_2$
• $(\bar{x}_1 - \bar{x}_2) - E < \mu_1 - \mu_2 < (\bar{x}_1 - \bar{x}_2) + E$
• Where $E = t_c \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$
• c = confidence level (0 < c < 1)
• $t_c$ = critical value for confidence level c
• d.f. = degrees of freedom, the smaller of $n_1 - 1$ and $n_2 - 1$
• Example: if you have $n_1 - 1 = 25$ and $n_2 - 1 = 15$, you use 15 as the d.f.

Example
• Suppose that a random sample of 29 college students was randomly divided into two groups. The first group of $n_1 = 15$ people was given ½ liter of red wine before going to sleep.
The second group of $n_2 = 14$ people was given no alcohol before going to sleep. Everyone in both groups went to sleep at 11 P.M. The average brain wave activity was determined for each individual in the groups. The results follow:
• Group 1: 16.0 19.6 19.9 20.9 20.3 20.1 16.4 20.6 20.1 22.3 18.8 19.1 17.4 21.1 22.1
• Group 2: 8.2 5.4 6.8 6.5 4.7 5.9 2.9 7.6 10.2 6.4 8.8 5.4 8.3 5.1
• A) Do you think the samples are independent or dependent? Explain
• B) What assumptions are we making about the data?
• C) Compute a 90% confidence interval for $\mu_1 - \mu_2$

Answer
• A) Since a random sample of 29 students was randomly divided into two groups, it is reasonable to say that the samples are independent.
• B) We are assuming the populations of $x_1$ and $x_2$ are approximately normally distributed.
• C) $\bar{x}_1 = 19.65$, $s_1 = 1.86$
• $\bar{x}_2 = 6.59$, $s_2 = 1.91$
• $E = t_c \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} = 1.771 \sqrt{\frac{1.86^2}{15} + \frac{1.91^2}{14}} = 1.24$
• $(19.65 - 6.59) - 1.24 < \mu_1 - \mu_2 < (19.65 - 6.59) + 1.24$
• $11.82 < \mu_1 - \mu_2 < 14.30$

3rd situation: confidence intervals for $\mu_1 - \mu_2$ when $\sigma_1$ and $\sigma_2$ are unknown but we believe that $\sigma_1 = \sigma_2$
• $\bar{x}_1$ and $\bar{x}_2$ are sample means from populations 1 and 2
• $s_1$ and $s_2$ are sample standard deviations from populations 1 and 2
• $n_1$ and $n_2$ are sample sizes from populations 1 and 2
• If you can assume that both population distributions 1 and 2 are normal or at least mound-shaped and symmetric, then any sample sizes $n_1$ and $n_2$ will work. If not, use sample sizes greater than or equal to 30: $n_1 \ge 30$ and $n_2 \ge 30$
• Confidence interval for $\mu_1 - \mu_2$ when $\sigma_1 = \sigma_2$
• $(\bar{x}_1 - \bar{x}_2) - E < \mu_1 - \mu_2 < (\bar{x}_1 - \bar{x}_2) + E$
• Where $E = t_c \, s \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$ and $s = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}$ is the pooled standard deviation
• c = confidence level (0 < c < 1)
• $t_c$ = critical value for confidence level c
• d.f. = degrees of freedom, d.f. = $n_1 + n_2 - 2$

Example (when you find the s.d. they seem very close to each other)
• Height of Asians in Asia (in feet):
• 5.14 5.75 5.29 5.86 5.92 6.12 5.77 5.81 5.80 5.78
• Height of Asians in US (in feet):
• 5.16 5.72 5.30 5.84 5.95 6 5.79 5.80 5.85 5.81
• A) Create an 85% confidence interval for the difference in population mean height for Asians in Asia and in the USA.

How to find a confidence interval for $p_1 - p_2$
• Binomial Experiment 1
• $n_1$ = number of trials
• $r_1$ = number of successes out of $n_1$ trials
• $\hat{p}_1 = \frac{r_1}{n_1}$
• $p_1$ = population probability of success
• Binomial Experiment 2
• $n_2$ = number of trials
• $r_2$ = number of successes out of $n_2$ trials
• $\hat{p}_2 = \frac{r_2}{n_2}$
• $p_2$ = population probability of success

The number of trials should be sufficiently large so that all four of the following are true:
$n_1 \hat{p}_1 > 5$; $n_1 \hat{q}_1 > 5$; $n_2 \hat{p}_2 > 5$; $n_2 \hat{q}_2 > 5$

Confidence interval for $p_1 - p_2$
$(\hat{p}_1 - \hat{p}_2) - E < p_1 - p_2 < (\hat{p}_1 - \hat{p}_2) + E$
$E = z_c \hat{\sigma} = z_c \sqrt{\frac{\hat{p}_1 \hat{q}_1}{n_1} + \frac{\hat{p}_2 \hat{q}_2}{n_2}}$

Example:
• Suppose two groups of subjects were randomly chosen for a sleep study. In group I, before going to sleep, the subjects spent 1 hour watching a comedy movie. In this group there were a total of 175 dreams recorded, of which 49 were dreams with feelings of anxiety, fear, or aggression. In group II, the subjects just went to sleep. There were a total of 180 dreams recorded, of which 63 were dreams with feelings of anxiety, fear, or aggression.
• A) Why could groups I and II be considered independent binomial distributions? Are the samples large enough?
• B) Compute a 95% confidence interval for $p_1 - p_2$

Answer
• A) Yes, because the groups were chosen randomly and they don't overlap. Also, the samples are large enough because all four of the required products come out greater than 5 (do the work!!)
• B) $E = z_c \sqrt{\frac{\hat{p}_1 \hat{q}_1}{n_1} + \frac{\hat{p}_2 \hat{q}_2}{n_2}} = 1.96 \sqrt{\frac{(0.28)(0.72)}{175} + \frac{(0.35)(0.65)}{180}}$
• E = 0.096
• $(\hat{p}_1 - \hat{p}_2) - E < p_1 - p_2 < (\hat{p}_1 - \hat{p}_2) + E$
• $(0.28 - 0.35) - 0.096 < p_1 - p_2 < (0.28 - 0.35) + 0.096$
• $-0.166 < p_1 - p_2 < 0.026$

Homework Practice
• Pg 377 #1-23 eoo
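
Checking the sleep study answer with code
• The two-proportion interval from the sleep study example can be reproduced with a few lines of Python. This is only a minimal sketch: the function name two_proportion_interval is made up for illustration (it is not from the textbook), and the critical value 1.96 is taken from the table of critical values rather than computed.

```python
from math import sqrt


def two_proportion_interval(r1, n1, r2, n2, z_c):
    """Return (lower, upper, E) for a confidence interval for p1 - p2.

    Hypothetical helper for illustration: r1, r2 are success counts,
    n1, n2 are numbers of trials, z_c is the critical value from the table.
    """
    p1_hat = r1 / n1          # point estimate for p1
    p2_hat = r2 / n2          # point estimate for p2
    q1_hat = 1 - p1_hat
    q2_hat = 1 - p2_hat

    # Sample size check: all four products must be greater than 5.
    products = (n1 * p1_hat, n1 * q1_hat, n2 * p2_hat, n2 * q2_hat)
    if min(products) <= 5:
        raise ValueError("samples too small for the normal approximation")

    # Maximal margin of error E = z_c * sqrt(p1*q1/n1 + p2*q2/n2).
    e = z_c * sqrt(p1_hat * q1_hat / n1 + p2_hat * q2_hat / n2)
    diff = p1_hat - p2_hat
    return diff - e, diff + e, e


# Sleep study: 49 of 175 dreams in group I, 63 of 180 in group II, c = 0.95.
lower, upper, e = two_proportion_interval(49, 175, 63, 180, 1.96)
print(round(e, 3), round(lower, 3), round(upper, 3))
```

• The printed values should agree with the hand calculation (E about 0.096, interval from about −0.166 to 0.026). Because the interval contains 0, this sample alone does not let us say at the 95% level which group has the larger proportion.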