In this chapter we consider questions about population means similar to those we have studied about proportions in the last couple of chapters. We would now like to construct confidence intervals and conduct hypothesis tests for population means. We will mimic the procedures used for proportions, but we will find ourselves unable to use the z-curve anymore.

Recall that for the distribution of sample means (i.e. the distribution of $\bar{x}$) we had:

$$\sigma(\bar{x}) = \frac{\sigma}{\sqrt{n}}$$

where σ is the population’s standard deviation. However, we typically will not know the value of σ. Thus we use s (the sample’s standard deviation) instead to get the standard error of the distribution of $\bar{x}$:

$$\sigma(\bar{x}) \approx SE(\bar{x}) = \frac{s}{\sqrt{n}}$$

When we use this “distribution” of $\bar{x}$, with standard error instead of standard deviation, and calculate “z-scores”, the distribution shape is no longer the z-curve (even if n is more than 30). Rather, we get a curve similar to the z-curve. It is centered at 0, unimodal and symmetric, but it is shorter and wider than the z-curve, with more area in the tails. We end up on one of the t-curves. Each one of these curves is defined by a parameter called the “degrees of freedom” or df. For every positive number, there is a t-curve for that many degrees of freedom. For problems involving only one population mean, we will use the t-curve with df = n - 1, where n is the sample size.

We can use these t distributions for problems involving means if the population is known to be normal, the sample size is large (C.L.T.), or the sample data has a fairly linear normal plot (this can be created in the TI). Put the data in L1. Open “STAT PLOT”, choose Plot1, turn it on, and select the last plot option. Then graph it to see the plot. If the plot is approximately linear, then the sample is consistent with having come from a normal population.

We have a properly selected random sample of size n that has mean $\bar{x}$ and standard deviation s.
The population is known to be normal, the sample size is large (C.L.T.), or the sample data produces a fairly linear normal plot.

Then μ is in the interval:

$$\bar{x} \pm t^* \left( \frac{s}{\sqrt{n}} \right)$$

where t* is the value from the t-curve with df = n - 1 corresponding to the level of confidence (this is the equivalent of the z* values used in the confidence intervals for a population proportion).

This formula is awkward in practice, because each t-curve has its own value of t* (too many to memorize). So we will, in practice, construct these intervals using technology. Go into the STAT menu, over to TESTS, and select TInterval… If you have the actual sample data in L1, choose “Data”; if you have the summary statistics for the sample ($\bar{x}$ and s), choose “Stats” and enter the values.

A father is concerned that his teenage son is watching too much television each day, since his son watches an average of 2 hours per day. His son says that his TV habits are no different than those of his friends. Since this father has taken a stats class, he knows that he can actually test to see whether or not his son is watching more TV than his peers. The father collects a random sample of television watching times from boys at his son’s high school and gets the following data:

1.9 2.3 2.2 1.9 1.6 2.6 1.4 2.0 2.0 2.2

Construct and interpret a 97% confidence interval based on this data. Be sure to justify the method used.

In a random sample of 50 of a new brand of battery, the average lifespan was 952 hours with a standard deviation of 18 hours. Construct and interpret a 98% confidence interval based on this sample.

As we did for proportions, suppose we want to control the width of the confidence interval (likely make it narrower) while at the same time having a fairly high level of confidence. The margin of error for these confidence intervals is:

$$ME = t^* \left( \frac{s}{\sqrt{n}} \right)$$

Solving this for n gives:

$$n = \left( \frac{t^* s}{ME} \right)^2$$

Now, this formula has 2 problems.
We cannot know with which t-curve we are working without knowing the sample size (because df = n - 1), and we do not have a value for s until we have taken a sample. So, to determine n we need to know t* and s, but to know t* and s we need to have a sample. We “fix” this by replacing the t* values with the z* values from our previous confidence intervals (because z* is less than t* for every df, this gives a somewhat smaller value of n than the t-based formula would) and we use a value for s that comes from previous studies about the variable of interest. Recall, z* = 1.645 for 90% confidence, 1.96 for 95% confidence, and 2.33 for 98% confidence. We then have:

$$n \geq \left( \frac{z^* s}{ME} \right)^2$$

Suppose we wish to construct a 98% confidence interval for the average body temperature of people testing positive for a new strain of influenza to within 0.14. Suppose also that previous studies support that the standard deviation of human body temperature is 0.45. How many subjects must be tested?

Hypotheses
H0: μ = #
Ha: one of
(a) μ > # (upper tail test)
(b) μ < # (lower tail test)
(c) μ ≠ # (two-tail test)

Test Statistic

$$t = \frac{\bar{x} - \#}{s / \sqrt{n}}$$

where “#” is the hypothesized value of μ

P-value
Depends on the alternative hypothesis:
(a) tcdf(t, ∞, df) (upper tail test)
(b) tcdf(-∞, t, df) (lower tail test)
(c) 2 * tcdf(|t|, ∞, df) (two-tailed test)
where df = n - 1

Validity/Assumptions
• We have a properly collected, random sample
• Sample size is not more than 10% of the population
One of:
• population known to be normal
• large sample size (C.L.T.): n ≥ 30, or
• approximately linear normal plot of sample data

Find the P-value for each of the following, assuming samples are from a normal population.
(a) H0: μ = 100, Ha: μ < 100, t = -1.48, n = 15
(b) H0: μ = 58, Ha: μ ≠ 58, t = -2.64, n = 10

The posted speed limit on a certain residential road is 30mph. The residents believe that drivers are speeding on this road on average. They observe 20 randomly selected drivers on this road and find the mean speed to be 31.8mph with a standard deviation of 4.2mph.
Is the residents’ belief accurate?

We can use the T-Test in the TI calculator to do this as well. Go into the STAT menu, choose TESTS, then choose T-Test… If you have the actual sample data in L1, choose “Data”; if you have the summary statistics for the sample, choose “Stats”. For Example 5 (the speed limit problem), choose “Stats” and enter μ0 = 30, $\bar{x}$ = 31.8, Sx = 4.2, n = 20, with the alternative μ > μ0.

A certain university advertises that the GPA of all of its Business students is 3.6. An employee at another institution believes the value is actually lower than this. He randomly gathers the GPA of 6 Business students from this university (results below). Test his claim at the α = 0.05 level.

3.4 3.6 3.3 3.6 3.5 3.5

We can replace the test statistic and P-value with a confidence interval for μ calculated from the sample using the TInterval function in the calculator.
• If the hypothesized value is not in the interval, then we reject H0
• If the hypothesized value is in the interval, then we fail to reject H0
All other “pieces” of the hypothesis test are the same.

The posted speed limit on a certain residential road is 30mph. The residents believe that drivers are speeding on this road on average. They observe 20 randomly selected drivers on this road and find the mean speed to be 31.8mph with a standard deviation of 4.2mph. Is the residents’ belief accurate? Test the relevant hypotheses using a 90% confidence interval.
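For readers without a TI calculator, the test statistic, the tcdf-style P-value, and the t-interval described above can be sketched in Python. This is only an illustration under stated assumptions: the helper names (`t_pdf`, `t_cdf`, `t_star`, `one_sample_t`) are made up for this sketch, and the numerical methods (Simpson's rule for the area, bisection for t*) are choices made here, not a description of how the calculator computes its values. The inputs are the summary statistics from the speed limit problem.

```python
import math

def t_pdf(x, df):
    """Density of the t-curve with df degrees of freedom."""
    log_coef = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    coef = math.exp(log_coef) / math.sqrt(df * math.pi)
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=1000):
    """Area under the t-curve to the left of x.

    Uses Simpson's rule on [0, |x|] and the symmetry of the curve about 0
    (an assumption of this sketch, not the calculator's algorithm).
    """
    a = abs(x)
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # area between 0 and |x|
    return 0.5 + area if x >= 0 else 0.5 - area

def t_star(confidence, df):
    """Critical value t* whose central area equals the confidence level."""
    target = 1 - (1 - confidence) / 2
    lo, hi = 0.0, 100.0
    for _ in range(50):  # bisection on the left-tail area
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def one_sample_t(xbar, s, n, mu0, confidence):
    """Return (t, upper-tail P-value, confidence interval) for one mean."""
    df = n - 1
    se = s / math.sqrt(n)            # standard error of x-bar
    t = (xbar - mu0) / se            # test statistic
    p_upper = 1 - t_cdf(t, df)       # plays the role of tcdf(t, infinity, df)
    me = t_star(confidence, df) * se # margin of error
    return t, p_upper, (xbar - me, xbar + me)

# Speed limit problem: n = 20, x-bar = 31.8, s = 4.2, H0: mu = 30, Ha: mu > 30
t, p, (low, high) = one_sample_t(31.8, 4.2, 20, 30, 0.90)
print(f"t = {t:.3f}, P-value = {p:.4f}, 90% CI = ({low:.2f}, {high:.2f})")
```

With these inputs the P-value falls between 0.025 and 0.05, and the lower limit of the 90% interval is about 30.2, so the hypothesized value 30 lies outside the interval. A lower-tail P-value would use `t_cdf(t, df)` directly, and a two-tailed one `2 * (1 - t_cdf(abs(t), df))`.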