Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
• Inference about the mean of a population of measurements (m) is based on the standardized value of the sample mean (Xbar). • The standardization involves subtracting the mean of Xbar and dividing by the standard deviation of Xbar – recall that – Mean of Xbar is m ; and – Standard deviation of Xbar is s/sqrt(n) • Thus we have (Xbar - m )/(s/sqrt(n)) which has a Z distribution if: – Population is normal and s is known ; or if – n is large so CLT takes over… • But what if s is unknown?? Then this standardized Xbar doesn’t have a Z distribution anymore, but a so-called t-distribution with n-1 degrees of freedom… • Since s is unknown, the standard deviation of Xbar, s/sqrt(n), is unknown. We estimate it by the so-called standard error of Xbar, s/sqrt(n), where s=the sample standard deviation. • There is a t-distribution for every value of the sample size; we’ll use t(k) to stand for the particular t-distribution with k degrees of freedom. There are some properties of these tdistributions that we should note… • Every t-distribution looks like a N(0,1) distribution; i.e., it is centered and symmetric around 0 and has the same characteristic “bell” shape… however, the standard deviation of t(k) {sqrt(k/(k-2))} is greater than 1, the s.d. of Z so the t-distribution density curve is more spread out than Z. Probabilities involving r.v.s that have the t(k) distributions are given by areas under the t(k) density curve … Table D in the back of our book gives us the probabilities we need… • The good news is that everything we’ve already learned about constructing confidence intervals and testing hypotheses about m carries through under the assumption of unknown s … • So e.g., a 95% confidence interval for m based on a SRS from a population with unknown s is Xbar +/- t*(s.e.(Xbar)) Recall that s.e.(Xbar) = s/sqrt(n). Here t* is the appropriate tabulated value from Table D so that the area between –t* and +t* is .95 • As we did before, if we change the level of confidence then the value of t* must change appropriately… • Similarly, we may test hypotheses using this tdistributed standardized Xbar… e.g., to test the H0: m =m0 against Ha: m >m0 we use (Xbar - m0)/(s/sqrt(n)) which has a tdistribution with n-1 df, assuming the null hypothesis is true. See page 422 (7.1, 3/7) for a complete summary of hypothesis testing in the case of “the one-sample t-test” … • HW: Read section 7.1 thru p. 433; go over all the examples carefully and answer the HW questions following them: #7.1-7.9 Work on the following problems (p.441 ff) (use software as needed): #7.15-7.22, 7.25, 7.32, 7.35-7.37, 7.41. Is there a difference in aggressive behavior of patients on "moon days" compared with "non-moon days"? •To summarize the analysis: – when the data comes in matched pairs, the analysis is performed on the differences between the paired measurements – then use the t-statistic with n-1 d.f. (n = # of pairs) to construct confidence intervals and test hypotheses on the true mean difference. • In a matched pairs design, subjects are matched in pairs and the outcomes are compared within each matched pair. A coin toss could determine which of the two subjects gets the treatment and which gets the control… One special kind of matched pairs design is when a subject acts as his/her own control, as in a before/after study… See example 7.7 on page 428ff (7.1, 4/7). Note that the paired observations (# of agressive behaviors) are subtracted and the difference in scores becomes the single number analyzed with a one-sample t-statistic with n-1 df, where n=the number of pairs… see the top of page 431 and the next page for a summary of the process. • HW Read through p.433. Go over Example 7.7 then do #7.32, 7.35, 7.41. • Read the section on Robustness of the t procedures (starting p.432 (7.1, 5/7))… note the definition of the statistical term robust – essentially, a statistic is robust if it is insensitive to violations of the assumptions made when the statistic is used. For example, the t-statistic requires normality of the population… how sensitive is the t-statistic to violations of normality?? Look at the practical guidelines for inference on a single mean at bottom of p.432… – If the sample size is < 15, use the t procedures if the data are close to normal. – If the sample size is >= 15 then unless there is strong non-normality or outliers, t procedures are OK – If the sample size is large (say n >= 40) then even if the distribution is skewed, t procedures are OK