Today’s lesson
• Case study involving a quality control (QC) application using the material that we have covered to date.
• In-class demonstration of how to use SPSS to get the numbers out.
• Discussion of the meaning of the results.

Today’s lesson
• Confidence interval for the mean of a normal distribution using the standard normal and t distributions.
• On Thursday, we finish Chapter Eleven with one-sample t and z tests and a review (yet again) of the structure of tests of statistical hypotheses.

Case Study
• QC application in which the objective is to specify a probability model for the “manufacturing” process.
• That is, specify the A part in an A vs. B comparison.
• See Finch et al., Statistics in Medicine, 1999, pages 1279-1289.

Context
• The application is filtering red blood cell units to remove white blood cells.
• The white blood cells in your own blood are good for you.
• My white blood cells, in the unit of blood that I donated to be transfused to you, are bad for you.
  – Transfusion reaction
  – Possible infection vector

Practical Context
• Health regulations specify that a unit of filtered blood cannot contain more than a set number of “residual white blood cells” (RWBC).
  – The US standard is 5×10⁶ RWBC in a unit.
  – The European standard is 1×10⁶ RWBC in a unit.

Practical Context
• Measured RWBC is the product of three numbers:
  – a constant scaling factor,
  – the volume of blood donated in the unit,
  – the Nageotte count, which is the number of white blood cells observed in a small volume of sampled filtered blood.

First Question
• What should be the dependent variable monitored in the QC application?
• The answer is to use the Nageotte count rather than the RWBC.
  – The American standard then corresponds to a Nageotte count of 167.
  – The European standard then corresponds to a Nageotte count of 33.

Components of Variance
• Here we used Fisher’s fundamental idea of finding “components of variance.”
• Specifically, the variance of RWBC has a non-quality-related component that comes from the variation in the volume of blood in the donated unit.
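The components-of-variance point can be illustrated with a short simulation. This is a minimal sketch, not the analysis in Finch et al.: it assumes the scaling factor is constant and that the unit volume V and the Nageotte count N are independent, and the volume and count distributions below are hypothetical choices made only for illustration. Under independence, the squared coefficients of variation (variance divided by squared mean) of a product satisfy 1 + CV²(V·N) = (1 + CV²(V))(1 + CV²(N)), so any spread in volume inflates the relative variability of RWBC without saying anything about filter quality. A constant scaling factor cancels out of CV² entirely.

```python
import math
import random
import statistics


def cv2(values):
    """Squared coefficient of variation: population variance / mean**2."""
    m = statistics.fmean(values)
    var = statistics.fmean((x - m) ** 2 for x in values)
    return var / m ** 2


def cv2_of_product(cv2_v, cv2_n):
    """Exact CV^2 of V*N when V and N are independent."""
    return (1 + cv2_v) * (1 + cv2_n) - 1


rng = random.Random(1999)
n_units = 100_000

# Hypothetical unit volumes (mL): mean 450, sd 75, so CV^2 is about 0.028.
volumes = [rng.gauss(450, 75) for _ in range(n_units)]

# Hypothetical Nageotte counts: geometric on {0, 1, 2, ...} with mean 5,
# i.e. a negative binomial with size 1; floor(Exponential(rate)) is geometric.
rate = math.log(6 / 5)
counts = [int(rng.expovariate(rate)) for _ in range(n_units)]

SCALE = 1.0  # hypothetical constant scaling factor; it cancels out of CV^2
rwbc = [SCALE * v * n for v, n in zip(volumes, counts)]

print(f"CV^2 of Nageotte count  : {cv2(counts):.3f}")
print(f"CV^2 of RWBC, simulated : {cv2(rwbc):.3f}")
print(f"CV^2 of RWBC, identity  : {cv2_of_product(cv2(volumes), cv2(counts)):.3f}")
```

The RWBC line comes out larger than the Nageotte line by roughly the volume term, which is the extra, non-quality-related variation that monitoring the Nageotte count avoids.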
Second Question
• What actually happened in the QC process when the manufacturer’s staff did the work?
• That is, use the descriptive options in a statistical package to describe the data.

Getting numbers out
• Enter SPSS.
• Access the correct file (CAREFUL, CAREFUL, CAREFUL!!!).
• Statistics menu
• Descriptive submenu
• Frequencies option

Third Question
• Next, specify a probability model that fits the data reasonably well so that predictions can be made.
• The Nageotte count is measured on a ratio scale.
• The Nageotte count is discrete, not continuous.
• The variance is very much larger than the mean.
  – Hence we focused on the negative binomial distribution (NBD), whose variance can exceed its mean, rather than the Poisson, whose variance equals its mean.

Fourth Question
• ASS-U-ME a negative binomial distribution.
• How well does it fit the observed data?
• Use a goodness-of-fit test (we won’t cover this in detail until after your first exam).

Observed and Expected Nageotte Counts

Nageotte count i    Observed Oi    Expected Ei
0                   283            292.2
1                   54             30.5
2                   18             15.6
3                   15             10.1
4                   0              7.3
5                   4              5.5
6 or more           12             24.7

Interpretation
• One observation violated the American standard, and two violated the European standard.
• The NBD model does not predict the maximum observed value of 205.
• The NBD model does not fit well, but it captures the rough order of variation (up to a Nageotte count of 33).
• Choose a nonparametric test procedure because the null distribution is not obvious.

My Three Most Common Mistakes in Making Predictions
• Eliminate “outliers” from the historical data that I am using to make my prediction.
• Predict a ratio-scale variable without anticipating that its variance will increase when its mean increases.
• ASS-U-ME independence of observations when predicting a time series with autocorrelation.

Fifth Question
• How large a sample is necessary to determine whether a user of this product is “in control”?
• A simulation study suggested that, under optimistic conditions, 20 is a minimal sample size, but that 80 may be required.
• The client’s practice has evolved to use about 50.
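The goodness-of-fit question from the Fourth Question can be made concrete with the observed and expected Nageotte counts in the table. The sketch below computes Pearson’s chi-square statistic, which is one common goodness-of-fit test; the slides do not say which test was actually used. It assumes that both NBD parameters were estimated from these same data, so the reference distribution is taken as chi-square with 7 - 1 - 2 = 4 degrees of freedom, compared against the tabled upper 5% point of that distribution, 9.488.

```python
# Observed and expected Nageotte counts, transcribed from the table in the slides.
CATEGORIES = ["0", "1", "2", "3", "4", "5", "6 or more"]
OBSERVED = [283, 54, 18, 15, 0, 4, 12]
EXPECTED = [292.2, 30.5, 15.6, 10.1, 7.3, 5.5, 24.7]

# Assumption: both NBD parameters (mean and shape) were estimated from these data.
N_ESTIMATED_PARAMS = 2
CHI2_UPPER_5PCT_DF4 = 9.488  # tabled 95th percentile of chi-square with 4 df


def pearson_contributions(observed, expected):
    """Per-cell (O - E)^2 / E terms of Pearson's chi-square statistic."""
    return [(o - e) ** 2 / e for o, e in zip(observed, expected)]


contributions = pearson_contributions(OBSERVED, EXPECTED)
chi2 = sum(contributions)
df = len(OBSERVED) - 1 - N_ESTIMATED_PARAMS

print(f"n observed = {sum(OBSERVED)}, n expected = {sum(EXPECTED):.1f}")
for label, term in zip(CATEGORIES, contributions):
    print(f"count {label:>9}: contribution {term:6.2f}")
print(f"chi-square = {chi2:.2f} on {df} df")
if chi2 > CHI2_UPPER_5PCT_DF4:
    print("Poor fit: the NBD model is rejected at the 0.05 level.")
else:
    print("No evidence of lack of fit at the 0.05 level.")
```

The statistic is about 35.4, far above 9.488, and most of it comes from the cells for counts of 1, 4, and 6 or more. That agrees with the slide’s reading that the NBD captures only the rough order of variation.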
Chapter Eleven: Testing a Hypothesis about a Single Mean
• Definition of Student’s t distribution
• Using tables of Student’s t distribution
• Using the observed significance level from Student’s t
• Using the confidence limit from Student’s t

Historical Background of Student’s t
• Its origin is quality control in the brewing industry (Guinness).
• How can statistical procedures be applied with very small samples?
• The nature of the advance is to describe the null distribution of the statistic that is actually used, in which the unknown standard deviation is replaced by its sample estimate.

Definition of the Student’s t distribution
• The pdf is also a bell-shaped curve.
• It is a continuous, unimodal, symmetric distribution whose probability falls off less rapidly than the normal’s for values far from the mean (heavier tails).
• The Appendix C table (pages 546-547) gives two-sided tail probabilities by degrees of freedom.
• Most statistics texts give percentile points instead.

Basic numeric facts of Student’s t percentiles
• Student’s t 95th percentiles are larger than the 95th standard normal percentile.
• The same holds for any percentile above the 50th.
• The difference becomes larger as the degrees of freedom become smaller.

Example One-Sample Z test Problem
• Test the null hypothesis H0: E(Y) = 500 with level of significance 0.01 against the alternative hypothesis H1: E(Y) < 500. ASS-U-ME Y is normal with known standard deviation 100, and use the sample mean of a random sample of four observations. The observed value of this statistic is 360.

Solution
• Determine the side of the test, here left-sided.
• Determine the standard error of the statistic, here 100/√4 = 100/2 = 50.
• Determine the critical value of the test statistic.
  – In original units, 500 - 2.326(50) = 383.7, where 2.326 is the 99th percentile of the standard normal.
  – In standard-unit form, -2.326.

Solution Continued
• Compare the statistic to the critical value:
  – In original units, the observed mean of 360 is to the left of the critical value of 383.7.
  – In standard-score form, the z value of the mean is (statistic - hypothesized expected value)/(se of statistic) = (360 - 500)/50 = -2.8, to the left of the critical value -2.326.
• Make decision: reject H0 at the 0.01 level of significance.

Example One-Sample T test Problem
• Test the null hypothesis H0: E(Y) = 500 with level of significance 0.01 against the alternative hypothesis H1: E(Y) < 500. ASS-U-ME Y is normal with unknown standard deviation. The mean of a random sample of four observations is 360, and the unbiased estimate of the variance is 6400, so the corresponding estimate of the standard deviation is 80.

Solution
• Determine the side of the test, here left-sided.
• Determine the estimated standard error of the statistic, here 80/√4 = 40.
• Student’s contribution: determine the degrees of freedom. For a one-sample t test, it is the number of observations minus one, here 4 - 1 = 3.

Solution Continued
• Determine the critical value of the test statistic. Don’t forget Student’s stretch of the critical value.
  – In original units, 500 - t(3; 0.99)(40) = 500 - 4.541(40) = 318.36, where t(3; 0.99) = 4.541 is the 99th percentile of Student’s t with 3 degrees of freedom.
  – In standard-unit form, -4.541.

Solution Continued
• Compare the statistic to the critical value:
  – In original units, the observed mean of 360 is to the right of the critical value of 318.36.
  – In standard-score form, the t value of the mean is (statistic - hypothesized expected value)/(estimated se of statistic) = (360 - 500)/40 = -3.5, to the right of the critical value -4.541.
• Make decision: do not reject (retain) H0 at the 0.01 level of significance.

Major points covered
• Review of material using a case study that applies descriptive statistics.
• Introduction (review) of Student’s t.
• The one-sample standard normal test.
• The one-sample Student’s t test.

To come
• Finish Chapter 11 with one-sample confidence intervals.
• Begin Chapter 12, the paired t test.
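The two worked examples can be reproduced in a few lines. This is a minimal sketch using only the Python standard library: the standard normal percentile and tail probability come from `statistics.NormalDist`, while the Student’s t critical value 4.541 is copied from the slides. For the observed significance level of the t test, it uses the closed-form CDF of Student’s t with exactly 3 degrees of freedom, F(t) = 1/2 + (θ + sin θ cos θ)/π with θ = arctan(t/√3); that formula holds only for 3 df and is an illustration choice, not something from the course text. The helper names `t3_cdf` and `left_sided_test` are hypothetical. The sketch also reports the upper one-sided 99% confidence limit, mean + critical value × se; H0 is rejected exactly when 500 lies above that limit.

```python
import math
from statistics import NormalDist

ALPHA = 0.01
MU0 = 500.0   # hypothesized E(Y)
N_OBS = 4
XBAR = 360.0  # observed sample mean in both examples

T_CRIT_3DF_99 = 4.541  # 99th percentile of Student's t with 3 df (from the slides)


def t3_cdf(x):
    """CDF of Student's t with exactly 3 df (closed form; valid only for 3 df)."""
    theta = math.atan(x / math.sqrt(3))
    return 0.5 + (theta + math.sin(theta) * math.cos(theta)) / math.pi


def left_sided_test(stat, se, crit, cdf):
    """Return (standardized statistic, critical value in original units,
    observed significance level, upper one-sided confidence limit)."""
    score = (stat - MU0) / se
    return score, MU0 - crit * se, cdf(score), stat + crit * se


# One-sample z test: known standard deviation 100.
z_crit = NormalDist().inv_cdf(1 - ALPHA)   # about 2.326
se_z = 100 / math.sqrt(N_OBS)              # 50
z, z_cut, p_z, z_upper = left_sided_test(XBAR, se_z, z_crit, NormalDist().cdf)

# One-sample t test: variance estimate 6400, df = N_OBS - 1 = 3.
se_t = math.sqrt(6400) / math.sqrt(N_OBS)  # 40
t, t_cut, p_t, t_upper = left_sided_test(XBAR, se_t, T_CRIT_3DF_99, t3_cdf)

for name, score, cut, p, upper in [("z", z, z_cut, p_z, z_upper),
                                   ("t", t, t_cut, p_t, t_upper)]:
    decision = "reject H0" if p < ALPHA else "do not reject H0"
    print(f"{name}: score={score:.2f}, cutoff={cut:.2f}, p={p:.4f}, "
          f"99% upper limit={upper:.2f} -> {decision}")
```

The z test gives an observed significance level of about 0.003 and an upper limit of about 476.3, both pointing to rejection. The t test gives about 0.02 and an upper limit of 541.64, so the same data do not reject H0 once Student’s stretch is applied.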