Section 9.3 When Things Aren’t Normal

Center, Spread, and Shape

Center: The goal is to estimate the population mean. What is usually true about the population mean, $\mu$? It is not known; otherwise we would not have to estimate it.

Spread: What is usually true about the population standard deviation, $\sigma$? It is usually unknown. What do we do? Substitute s for $\sigma$ and use critical values of t* instead of critical values of z*.

Shape: Because t-procedures are robust (not very sensitive) to departures from normality, you can usually get away with less than a normally distributed population. You still have to check a condition about shape.

Why Check Conditions?

If the sample size is small and the underlying population is highly skewed rather than normal, or has extreme outliers, then:

1) The capture rate for an interval of the form $\bar{x} \pm t^{*} \cdot s/\sqrt{n}$ might be substantially lower than the advertised capture rate, and
2) A significance test based on the normal distribution will falsely reject $H_0$ substantially more often than the advertised rate (for example, 5%).

Before you conduct any tests or construct any confidence intervals, always plot your data to see their shape.

If the plot looks like the data came from a normal population, then do not worry about shape. If the plot shows any major deviations from a normal shape, you may try another approach.

Try a Transformation

Skewed distributions can almost always be made more nearly symmetric by transforming them to a new scale. If a change of scale makes the data look roughly normal, again you do not need to worry about shape. Outliers may now look like “part of the herd.”

Common Transformations

1) For a distribution that is skewed right, try a log transformation.
2) When the data are ratios, try a reciprocal transformation.

Brain Weights for Selection of 68 Species of Animals (Page 603)

Always plot your data!

Log Transformation

Plot: logarithms of brain weights.
Plot: distribution of sample means (sample means for 100 samples of size 5 from the logarithms of brain weights).

Reciprocal Transformation

Whenever data come in the form of a ratio, think about what would happen if you invert the ratio. Always plot your data! (Page 606)

Outliers

If changing the scale does not take care of outliers, then do two analyses:
• one with all the data
• one without the outliers

Then what? If both analyses yield the same conclusion, then you are in good shape. What if you get a “split decision”?
Get more data!

15/40 Guideline

The worst cases are small samples with extreme skewness or extreme outliers. To be safe in using the t-procedure, you can rely on the 15/40 guideline.

First, plot your data. A modified boxplot helps identify shape and outliers.

If your random sample looks like it reasonably could have come from a normally distributed population, then you can proceed with t-procedures for confidence intervals and significance tests.

If you suspect the data did not come from a normally distributed population, follow the 15/40 guideline.

If the sample size is less than 15: Be very careful. Your data or transformed data must look as if they came from a normally distributed population: little skewness, no outliers.

If the sample size is between 15 and 40: Proceed with caution. If you do not transform the data, or if outliers remain even after a change of scale, do two analyses of the test or confidence interval, one with and one without the outliers. Do not rely on any conclusions that depend on whether or not the outliers are included.

If the sample size is at least 40: You are in good shape. Skewness will not reduce capture rates or alter significance levels enough to matter.
If outliers are present, you should still do two analyses.

Page 608, D12

a) A sample of size 10 is a small sample from a population that may be strongly skewed toward the higher-priced houses. Check a plot of the data for skewness and outliers. A transformation may take care of skewness.

b) This is a large sample (n = 100) from a population that may be strongly skewed toward the higher prices. Now you need not be so concerned about skewness, but you should still look for outliers that might affect the results.

c) The population of SAT scores is generally quite normal in shape, so there is little cause for concern here. The t-procedure should work fine, so no transformation is needed.

d) Waiting times are notoriously skewed. A typical distribution of data of this type would show many small to moderate times, but a few very long ones. With a sample size of 20, a transformation would be necessary to bring the data into the normal fold.

Page 611, E44

a) Yes, it is appropriate to construct a CI without transforming the data. The lengths of stay from the sample are slightly skewed toward the larger values, but the sample size of 396 is so large that a confidence interval based on t should work fine.

b) Calculator entry:
   8: TInterval
   Inpt: Data Stats (choose Stats)
   x̄: 2.91
   Sx: 1.58
   n: 396
   C-Level: .90
   Calculate
   Result: (2.7791, 3.0409)

c) No, we should not be more concerned about constructing a CI without a transformation if the sample size were 40 instead of nearly 400. For a sample size of 4, a transformation should be used to see if the transformed data look as if they came from a normally distributed population.

Page 612, E47

a) No, the situation looks even worse. When the original three outliers are removed, still more outliers are created. This result is typical of strongly skewed-right data.
b) All 68 species: (102.34, 686.66)
   3 original outliers removed: (69.315, 229.49)

c) The center of the interval after removing the outliers is much lower than the center with all 68 brain weights (149.4 vs. 394.5). The width of the second interval is also much smaller. Confidence intervals are apparently highly variable when the distribution is highly skewed.

d) (2.327, 3.627). I am 95% confident that the mean natural logarithm of the weight of animals’ brains (weights measured in grams) is between 2.327 and 3.627.

Questions?
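Worked check for Page 611, E44 b)

The calculator interval in E44 b) can be recomputed by hand from the summary statistics using x̄ ± t*·s/√n. The sketch below is only an illustration: the function names `t_critical` and `t_interval` are hypothetical, and t* is approximated with a standard series expansion around the normal quantile (chosen here to avoid third-party libraries; it is not what the calculator does internally and is not suitable for very small df).

```python
from math import sqrt
from statistics import NormalDist


def t_critical(conf_level, df):
    """Approximate two-sided t* critical value (assumed helper, not from the slides).

    Uses the first two correction terms of the series expansion of Student's t
    quantiles around the normal quantile z. Reasonable for moderate-to-large df.
    """
    p = 1 - (1 - conf_level) / 2  # e.g. 0.95 for a 90% interval
    z = NormalDist().inv_cdf(p)
    g1 = (z**3 + z) / 4
    g2 = (5 * z**5 + 16 * z**3 + 3 * z) / 96
    return z + g1 / df + g2 / df**2


def t_interval(xbar, s, n, conf_level):
    """Return (low, high) for xbar +/- t* * s / sqrt(n)."""
    t_star = t_critical(conf_level, n - 1)
    margin = t_star * s / sqrt(n)
    return xbar - margin, xbar + margin


# Summary statistics from E44 b): x-bar = 2.91, s = 1.58, n = 396, 90% confidence.
low, high = t_interval(2.91, 1.58, 396, 0.90)
print(f"({low:.4f}, {high:.4f})")
```

With n = 396 the approximation should agree closely with the TInterval result (2.7791, 3.0409) shown in E44 b).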
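Worked check for Page 612, E47 c)

The centers quoted in E47 c) are the midpoints of the two intervals from E47 b), and the widths are the differences of the endpoints. A minimal sketch of that arithmetic follows; the helper name `center_and_width` is hypothetical.

```python
def center_and_width(low, high):
    """Return the midpoint and the width of a confidence interval."""
    return (low + high) / 2, high - low


# Intervals reported in E47 b).
all_center, all_width = center_and_width(102.34, 686.66)   # all 68 species
cut_center, cut_width = center_and_width(69.315, 229.49)   # 3 outliers removed

print(f"all 68 species:   center {all_center:.1f}, width {all_width:.2f}")
print(f"outliers removed: center {cut_center:.1f}, width {cut_width:.2f}")
```

The midpoints come out near 394.5 and 149.4, matching the values compared in E47 c), and the second interval is less than a third as wide as the first.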
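The 15/40 guideline as a decision sketch

The 15/40 guideline from the slides can be written as a small decision function. This is only a sketch under stated assumptions: the function name `guideline_15_40` and its return format are hypothetical, and the two boolean inputs are judgment calls you make from a plot of the data (or transformed data). The function does not inspect the data itself.

```python
def guideline_15_40(n, looks_normal, has_outliers):
    """Return (advice, do_two_analyses) following the 15/40 guideline.

    n            -- sample size
    looks_normal -- True if a plot suggests the (possibly transformed) data
                    could reasonably have come from a normal population
    has_outliers -- True if outliers remain after any change of scale
    """
    if n < 1:
        raise ValueError("sample size must be positive")

    # Two analyses (with and without outliers) whenever outliers remain.
    do_two_analyses = has_outliers

    if looks_normal and not has_outliers:
        return "proceed with t-procedures", do_two_analyses
    if n < 15:
        return ("be very careful: data must look normal, "
                "with little skewness and no outliers"), do_two_analyses
    if n < 40:
        return "proceed with caution", do_two_analyses
    return "in good shape", do_two_analyses


# Example: the n = 20 waiting-time sample from D12 d), judged skewed, no outliers.
print(guideline_15_40(20, looks_normal=False, has_outliers=False))
```

Returning the "two analyses" flag separately reflects the slides' advice that outliers call for a with-and-without comparison at any sample size, even when n is at least 40.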