Chapter 18 Inference about a Population Mean

Conditions for inference

Previously, when making inferences about the population mean, μ, we assumed the following simple conditions:

(1) Our data (observations) are a simple random sample (SRS) of size n from the population of interest.
(2) The variable we measure has an exactly normal distribution with parameters μ and σ.
(3) The population standard deviation σ is known.

We then constructed a confidence interval for the population mean μ based on the _________ distribution (one-sample z statistic):

$$z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$$

This holds approximately for large samples even if assumption (2) is not satisfied. Why?

Issue: In a more realistic setting, assumption (3) is not satisfied, i.e., the standard deviation σ is unknown. So what can we do to handle real-life problems?

We replace the population standard deviation σ by its estimate, the sample standard deviation s:

$$s = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2}$$

When σ is known, the standard deviation of the sample mean x̄ is $\sigma / \sqrt{n}$.

When σ is unknown, we estimate the standard deviation of x̄ by $s / \sqrt{n}$. (This quantity is called the _____________ of the sample mean x̄.)

We get the one-sample t statistic:

$$t = \frac{\bar{x} - \mu}{s / \sqrt{n}}$$

When making inferences about the population mean μ with unknown σ, we use the one-sample t statistic (note that we still need assumptions 1 and 2). But the one-sample t statistic does not have a normal distribution; it has a t-distribution with n − 1 degrees of freedom.

The t-distributions

We specify a particular t-distribution by giving its degrees of freedom (d.f.).

How does the t-distribution compare with the standard normal distribution?

Similarities:

Difference:

As the d.f. k increases, the $t_k$ distribution approaches the Normal(0,1) distribution.

Notation: $t_k$ represents the t-distribution with k d.f.
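The comparison between the $t_k$ and Normal(0,1) distributions can be explored numerically. The following is a minimal sketch, not part of the original notes: the function names `t_pdf` and `normal_pdf` are hypothetical, and the densities are written out from their standard formulas using only the Python standard library.

```python
import math


def t_pdf(x, k):
    """Density of the t-distribution with k degrees of freedom at x."""
    # lgamma keeps the normalising constant numerically stable for large k.
    log_c = (math.lgamma((k + 1) / 2) - math.lgamma(k / 2)
             - 0.5 * math.log(k * math.pi))
    return math.exp(log_c - (k + 1) / 2 * math.log1p(x * x / k))


def normal_pdf(x):
    """Density of the Normal(0, 1) distribution at x."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)


# Compare the height at the centre (x = 0) and in the tail (x = 3)
# for several degrees of freedom (the values of k are arbitrary choices).
for k in (2, 10, 30, 100):
    print(f"t_{k}: centre={t_pdf(0, k):.4f}  tail={t_pdf(3, k):.4f}")
print(f"N(0,1): centre={normal_pdf(0):.4f}  tail={normal_pdf(3):.4f}")
```

Reading the printed rows from small k to large k should show the centre and tail heights moving toward the Normal(0,1) values, which is one way to check the limiting statement above.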
Confidence Intervals for a Population Mean (when standard deviation σ is unknown)

Confidence interval for μ when σ is unknown (t-CI)

A level C confidence interval for μ is given by

$$\bar{x} \pm t^* \frac{s}{\sqrt{n}}$$

where t* is the upper (1 − C)/2 critical value for the $t_{n-1}$ distribution, i.e., the value with area (1 − C)/2 to its right under the $t_{n-1}$ density curve:

$$P(T \ge t^*) = \frac{1 - C}{2}$$

Ex: What critical value t* from Table C would you use to make a CI for the population mean in each of the following situations?
a) A 95% CI based on n = 10 observations.
b) A 90% CI based on n = 26 observations.
c) An 80% CI from a sample of size 7.

Ex: Suppose JC-Penney wishes to know the average income of the households in the Dallas area before deciding to open another store there. A random sample of 21 households is taken, and the income of these sampled households turns out to average $45,000 with a standard deviation of $15,000. Give a 90% confidence interval for the unknown average income of the households in the Dallas area.

Steps for doing a test of significance about μ when σ is unknown (t-test)

1. State the null and the alternative hypotheses.
2. Calculate the one-sample t statistic (assuming H0: μ = μ0 to be true):

$$t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}$$

3. Calculate the P-value and use it to draw a conclusion.

P-values for the t-test

Ha          P-value                                        Area under curve
μ > μ0      Pr(T ≥ t)                                      right tail beyond t
μ < μ0      Pr(T ≤ t)                                      left tail below t
μ ≠ μ0      Pr(T ≥ |t|) + Pr(T ≤ −|t|) = 2 Pr(T ≥ |t|)     both tails beyond ±|t|

Here T has the $t_{n-1}$ distribution and t is the observed value of the one-sample t statistic.

Exact P-values cannot be obtained using Table C. Instead:
- Locate the row corresponding to n − 1 degrees of freedom.
- Slide across until you find the two critical values that your test statistic falls between.
- Use the corresponding upper-tail probabilities at the top of the table to get an interval in which the P-value falls.

Ex: Suppose testing H0: μ = 0 vs Ha: μ > 0 yields a one-sample t statistic of 1.82 from a sample of size 15.
a) What are the degrees of freedom for this statistic?
b) Give two critical values t* from Table C that bracket t. What are the right-tail probabilities for these two entries?
c) Between what two values does the P-value fall?
d) Is t = 1.82 statistically significant at the 5% level? At 1%?
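Table C only brackets critical values and P-values, so it can be useful to check table work numerically. The sketch below is one possible way to do that with the Python standard library alone; it is not part of the original notes. All function names (`t_pdf`, `t_cdf`, `t_critical`, `t_interval`) are hypothetical, and the tail areas are approximated with Simpson's rule and bisection rather than a statistics library routine.

```python
import math


def t_pdf(x, df):
    """Density of the t-distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def t_cdf(x, df, steps=2000):
    """Approximate Pr(T <= x) using Simpson's rule on [0, |x|] and symmetry.

    `steps` must be even; 2000 is an arbitrary choice for this sketch.
    """
    a = abs(x)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # approximates Pr(0 <= T <= |x|)
    return 0.5 + area if x >= 0 else 0.5 - area


def t_critical(conf_level, df):
    """Upper (1 - C)/2 critical value t*, found by bisection on t_cdf."""
    target = 1 - (1 - conf_level) / 2
    lo, hi = 0.0, 1.0
    while t_cdf(hi, df) < target:  # widen the bracket until it contains t*
        hi *= 2
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def t_interval(xbar, s, n, conf_level):
    """Level-C interval xbar +/- t* s / sqrt(n), with n - 1 d.f."""
    margin = t_critical(conf_level, n - 1) * s / math.sqrt(n)
    return xbar - margin, xbar + margin


# Household income example: n = 21, mean 45000, sd 15000, 90% confidence.
print(t_interval(45000, 15000, 21, 0.90))

# One-sided P-value Pr(T >= 1.82) for the t = 1.82, n = 15 example.
print(1 - t_cdf(1.82, 14))
```

The printed values are approximations; they are meant to be compared against the brackets read from Table C, not to replace the table exercise.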
Ex: Bottles of Boca Cola are supposed to contain 300 ml of cola. A consumer group suspects that the Boca Cola bottles contain less cola than advertised. To check this suspicion, they measure the contents of 9 randomly selected bottles. They find an average cola content of 299.6 ml and a standard deviation of 0.4 ml. Carry out the appropriate test of significance.
a) State the null and the alternative hypotheses.
b) Compute the one-sample t statistic.
c) What are the degrees of freedom for the above statistic?
d) Between what two probabilities p from Table C does the P-value of the test fall? What do you conclude at the 1% level?

Matched Pairs t Procedures

As we mentioned in Chapter 9, comparative studies are more convincing than single-sample investigations. For that reason, one-sample inference is less common than comparative inference.

In a matched pairs design, subjects are matched in pairs and each treatment is given to one subject in each pair. The experimenter can toss a coin to assign the two treatments to the two subjects in each pair.

Example 1. Suppose a college placement center wants to estimate µ, the difference in mean starting salaries for men and women graduates who seek jobs through the center. If it independently samples men and women, the starting salaries may vary because of their different college majors and differences in grade point averages. To eliminate these sources of variability, the placement center could match male and female job-seekers according to their majors and GPAs. Then the differences between the starting salaries of each pair in the sample could be used to make an inference about µ.

Example 2. Suppose you wish to estimate the difference in mean absorption rate into the bloodstream for two drugs that relieve pain. If you independently sample people, the absorption rates might vary because of age, weight, sex, etc. It may be possible to obtain two measurements on the same person.
First, we administer one of the two drugs and record the time until absorption. After a sufficient amount of time, the other drug is administered and a second measurement of absorption time is obtained. The differences between the measurements for each person in the sample could then be used to estimate µ.

Another situation calling for matched pairs is before-and-after observations on the same subjects.

Example 3. Suppose you wish to estimate the difference in mean blood pressure before and after taking a drug. We obtain the first measurement before a patient starts taking the drug and the second measurement after the patient has been taking the drug for a sufficient amount of time. The differences between the measurements for each person in the sample could then be used to estimate µ.

If the samples are matched pairs, find the difference between the responses within each pair, then apply one-sample t procedures to those differences of observed responses.

Example. An experiment is conducted to compare the starting salaries of male and female college graduates who find jobs. Pairs are formed by choosing a male and a female with the same major and similar GPA. Suppose a random sample of 10 pairs is formed in this manner and the starting annual salary of each person is recorded. Let µ1 be the mean starting salary for males and let µ2 be the mean starting salary for females.

Pair    Male (in $)    Female (in $)    Difference (male − female)
1       29300          28800             500
2       41500          41600            −100
3       40400          39800             600
4       38500          38500               0
5       43500          42600             900
6       37800          38000            −200
7       69500          69200             300
8       41200          40100            1100
9       38400          38200             200
10      59200          58500             700

(a) Compute a 95% confidence interval for the mean difference µ = µ1 − µ2.
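The paired-difference calculation asked for in part (a) can be sketched in a few lines of Python. This is an illustrative sketch rather than the notes' own solution: the variable names are hypothetical, and the critical value 2.262 is an assumption taken from a standard t table for 9 d.f. at 95% confidence.

```python
import math
import statistics

# Starting salaries from the table above, in pair order 1..10.
male = [29300, 41500, 40400, 38500, 43500, 37800, 69500, 41200, 38400, 59200]
female = [28800, 41600, 39800, 38500, 42600, 38000, 69200, 40100, 38200, 58500]

# Reduce the two samples to one sample of within-pair differences.
diffs = [m - f for m, f in zip(male, female)]

n = len(diffs)
xbar = statistics.mean(diffs)   # sample average of the differences
s = statistics.stdev(diffs)     # sample standard deviation (divisor n - 1)
se = s / math.sqrt(n)           # estimated standard deviation of xbar

# Assumed table value: upper 0.025 critical value of t with n - 1 = 9 d.f.
T_STAR = 2.262

lower = xbar - T_STAR * se
upper = xbar + T_STAR * se
print(f"mean difference = {xbar}, s = {s:.2f}")
print(f"95% CI for mu1 - mu2: ({lower:.1f}, {upper:.1f})")
```

Once the differences are formed, the rest is exactly the one-sample t interval applied to `diffs`; the original male and female columns are no longer needed.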
The sample average of the paired differences, x̄, and the sample standard deviation of the paired differences, s:

The 95% paired-difference CI for µ = µ1 − µ2 is:

Robustness of t procedures

A confidence interval is called robust if the confidence level does not change very much when the conditions for use of the procedure are violated.

The t confidence interval is exact when the distribution of the population is exactly _________. However, no real data are exactly ________. The usefulness of the t procedures in practice therefore depends on how strongly they are affected by lack of normality.

Here are some practical guidelines for inference on population means:

***Always make a plot to check for skewness and outliers before using the t procedures for small samples.***

Using the t procedures

Except in the case of small samples, the condition that the data are an SRS from the population of interest is more important than the condition that the population distribution is normal.

Sample size less than 15: Use t procedures if the data appear close to normal (roughly symmetric, single peak, no outliers). If the data are clearly skewed or if outliers are present, do not use t procedures.

Sample size at least 15: The t procedures can be used except in the presence of outliers or strong skewness.

Large samples: The t procedures can be used even for clearly skewed distributions when the sample size is large, say n ≥ 40.