REVIEW: CONFIDENCE INTERVALS AND HYPOTHESIS TESTS
VINEYARD SOIL DATA: POTASSIUM IN 2004 COMPARED TO 2007
STAT 513 - Schaffner Ch00-1.Docx

BACKGROUND
Soil potassium was measured at 10 randomly sampled locations in a vineyard in 2004 and again in 2007.
• Has soil potassium changed over time?
• If so, how much has it changed?
We’ll begin by assuming we essentially have two independently chosen random samples of soil: one sample from 2004 and another from 2007. (As a matter of fact, the data were collected differently, and we shall see the impact of this on the analysis later.)

LOOK AT THE DATA
WINESOILS UNLINKED.MTW
We start with the “Unlinked” worksheet. The potassium data are stored in one column and the year of sampling in another.

Graph > Individual Value Plot
‘One Y’, ‘With Groups’
Graph variables: K
Categorical variables: Year
Data View: Individual Symbols, Mean Symbols
[Figure: individual value plot of K (ppm) by Year, 2004 and 2007]

Graph > Histogram
‘Simple’
Graph variables: K
Multiple Graphs > By variables: Year (separate panels)
[Figure: histograms of K (ppm), one panel per Year (2004 and 2007); vertical axis: Frequency]

Comments on shape?

A common plot is the mean with an error bar. How is the error bar computed? What does it tell us?
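The error bar in the interval plot is one estimated standard error of the mean, SE = s/√n, drawn above the sample mean. As a minimal sketch, assuming only the rounded N, Mean and StDev values that Minitab prints in the descriptive statistics (the ten raw measurements per year are not listed in this handout), the Python below computes the bar for each year. The helper names are illustrative, not part of any library.

```python
import math

# Rounded summary statistics for K (ppm) from the descriptive-statistics
# output: (n, sample mean, sample standard deviation) for each year.
summary = {
    2004: (10, 118.9, 47.5),
    2007: (10, 313.5, 149.6),
}


def standard_error(sd: float, n: int) -> float:
    """Estimated standard error of a sample mean: s / sqrt(n)."""
    return sd / math.sqrt(n)


def error_bar_top(mean: float, sd: float, n: int) -> float:
    """Upper end of a one-standard-error bar drawn above the mean."""
    return mean + standard_error(sd, n)


if __name__ == "__main__":
    for year, (n, mean, sd) in summary.items():
        se = standard_error(sd, n)
        top = error_bar_top(mean, sd, n)
        print(f"{year}: mean = {mean:.1f}, SE = {se:.1f}, bar top = {top:.1f}")
```

The SE values it produces (about 15.0 and 47.3) match the “SE Mean” column, which shows that the bar describes the precision of each sample mean, not the spread of individual soil measurements.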
Graph > Interval Plot
‘One Y’, ‘With Groups’
Data View: Interval bar, Bar
Make the graph, then double-click the intervals and change the interval type to “Standard Error” and side to “Upper”.
[Figure: interval plot of K (ppm) by Year (2004, 2007); bars are one standard error from the mean]

PRODUCE BASIC SUMMARY STATISTICS
Stat > Basic Statistics > Display Descriptive Statistics
Variables: K
By variables: Year

Descriptive Statistics: K

Variable  Year   N  N*   Mean  SE Mean  StDev  Minimum     Q1  Median     Q3  Maximum
K         2004  10   0  118.9     15.0   47.5     82.0   91.8   101.5  129.3    216.0
K         2007  10   0  313.5     47.3  149.6    154.0  175.5   275.0  445.3    566.0

RECALL THE ONE SAMPLE T-INTERVAL
Formula: $\bar{x} \pm t^{*}_{n-1}\, s/\sqrt{n}$, where $t^{*}_{n-1}$ is the t critical value with n − 1 degrees of freedom for the chosen confidence level.

Consider separate intervals for each year. First, we’ll split the worksheet in two, one for each year’s data.
Data > Split Worksheet
By variables: Year
Window > 2004 worksheet (or use )
Stat > Basic Statistics > 1-sample t
Samples in columns: K

Results for: Unlinked Soils(Year = 2004)
One-Sample T: K
Variable   N   Mean  StDev  SE Mean  95% CI
K         10  118.9   47.5     15.0  (84.9, 152.9)

Results for: Unlinked Soils(Year = 2007)
One-Sample T: K
Variable   N   Mean  StDev  SE Mean  95% CI
K         10  313.5  149.6     47.3  (206.5, 420.5)

CONDUCT TWO-SAMPLE T-TEST
When is this test appropriate? What are the hypotheses?
We will work with the full “Unlinked” worksheet.
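The one-sample intervals in the output above can be checked by hand from N, Mean and StDev. This Python sketch assumes the t-table value t* = 2.262 (95%, 9 degrees of freedom) rather than calling a statistics library; `t_interval` is a hypothetical helper name.

```python
import math

# Assumed t-table value: upper 2.5% point of the t distribution, 9 df.
T_975_DF9 = 2.262


def t_interval(mean: float, sd: float, n: int, t_star: float) -> tuple[float, float]:
    """Return (lower, upper) for mean +/- t* * s / sqrt(n)."""
    margin = t_star * sd / math.sqrt(n)
    return mean - margin, mean + margin


if __name__ == "__main__":
    # (n, mean, sd) per year, copied from the one-sample T output.
    for year, (n, mean, sd) in {2004: (10, 118.9, 47.5), 2007: (10, 313.5, 149.6)}.items():
        lo, hi = t_interval(mean, sd, n, T_975_DF9)
        print(f"{year}: 95% CI = ({lo:.1f}, {hi:.1f})")
```

With these rounded inputs the sketch reproduces (84.9, 152.9) and (206.5, 420.5). Note that the two intervals do not overlap, although separate intervals are not the same as an interval for the difference.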
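For reference, the two-sample output that follows tests “difference = 0 (vs not =)” for mu (2004) − mu (2007). Its reported DF = 10 (rather than the pooled 10 + 10 − 2 = 18) indicates that the variances are not pooled. Written out with the rounded summary statistics above:

```latex
H_0:\ \mu_{2004} - \mu_{2007} = 0 \qquad H_a:\ \mu_{2004} - \mu_{2007} \neq 0

t = \frac{\bar{x}_{2004} - \bar{x}_{2007}}{\sqrt{s_{2004}^2/n_{2004} + s_{2007}^2/n_{2007}}}
  = \frac{118.9 - 313.5}{\sqrt{47.5^2/10 + 149.6^2/10}}
  \approx \frac{-194.6}{49.6} \approx -3.92
```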
Stat > Basic Statistics > Two Sample t
Samples in one column
Samples: K
Subscripts: Year

Two-Sample T-Test and CI: K, Year

Two-sample T for K
Year   N   Mean  StDev  SE Mean
2004  10  118.9   47.5       15
2007  10    314    150       47

Difference = mu (2004) - mu (2007)
Estimate for difference: -194.6
95% CI for difference: (-305.2, -84.0)
T-Test of difference = 0 (vs not =): T-Value = -3.92  P-Value = 0.003  DF = 10

ASSESSING NORMALITY
Graph > Probability Plot
[Figure: normal probability plot of K with 95% CI bands, one panel per Year (2004, 2007); x-axis: K (ppm); y-axis: Percent]

Statistics shown in the plot legend:
Year   Mean   StDev   N     AD  P-Value
2004  118.9   47.47  10  1.566   <0.005
2007  313.5   149.6  10  0.572    0.102

Do these data meet the conditions required for the two-sample t-test?

CONDUCT PAIRED T-TEST
In actuality, the data were collected at the same sites in each year. Thus there aren’t really two independently chosen random samples of size 10 each, but rather one sample of 10 locations measured twice: once in 2004 and again in 2007.
What are the hypotheses of the paired t-test? What data are actually analyzed? What conditions are needed?

Here we work with the “wine soils” worksheet.
Wine Soils.MTW
Stat > Basic Statistics > Paired t
Samples in columns
First sample: 2004.S.K
Second sample: 2007.S.K

Paired T-Test and CI: 2004.S.K, 2007.S.K

Paired T for 2004.S.K - 2007.S.K
             N    Mean  StDev  SE Mean
2004.S.K    10   118.9   47.5     15.0
2007.S.K    10   313.5  149.6     47.3
Difference  10  -194.6  119.4     37.7

95% CI for mean difference: (-280.0, -109.2)
T-Test of mean difference = 0 (vs not = 0): T-Value = -5.16  P-Value = 0.001
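A sketch that reproduces the two-sample output from the rounded summary statistics, assuming the unpooled (Welch) procedure. The Welch–Satterthwaite formula gives about 10.8 degrees of freedom; the printed DF = 10 and the printed interval are both consistent with that value being rounded down, so the t-table value t* = 2.228 (95%, 10 df) is assumed here. `welch_t` is an illustrative name; if a library is preferred, SciPy’s `scipy.stats.ttest_ind_from_stats` with `equal_var=False` performs a comparable unpooled test from summary statistics.

```python
import math

# Assumed t-table value: upper 2.5% point of the t distribution, 10 df.
T_975_DF10 = 2.228


def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Unpooled two-sample t: returns (t, SE of the difference, Welch df)."""
    v1 = sd1 ** 2 / n1  # squared SE of the first mean
    v2 = sd2 ** 2 / n2  # squared SE of the second mean
    se = math.sqrt(v1 + v2)
    t = (mean1 - mean2) / se
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, se, df


if __name__ == "__main__":
    # mu(2004) - mu(2007), using the rounded summary statistics.
    t, se, df = welch_t(118.9, 47.5, 10, 313.5, 149.6, 10)
    diff = 118.9 - 313.5
    margin = T_975_DF10 * se
    print(f"t = {t:.2f}, SE = {se:.1f}, Welch df = {df:.1f} (truncated: {int(df)})")
    print(f"95% CI = ({diff - margin:.1f}, {diff + margin:.1f})")
```

The result, t ≈ −3.92 with a 95% interval of about (−305.2, −84.0), agrees with the Minitab output above.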
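The paired test is a one-sample t procedure applied to the 10 within-site differences (2004 minus 2007). The sketch below assumes the printed difference summary (mean −194.6, StDev 119.4, N = 10) and the t-table value t* = 2.262 for 9 df. Because the inputs are rounded, it gives t ≈ −5.15 and SE ≈ 37.8 rather than the printed −5.16 and 37.7. It also backs out the correlation between years implied by s_D² = s₁² + s₂² − 2·r·s₁·s₂. That correlation, about 0.73, is why the paired standard error is smaller than the unpooled two-sample one (about 49.6). Both helper names are hypothetical.

```python
import math

# Assumed t-table value: upper 2.5% point of the t distribution, 9 df.
T_975_DF9 = 2.262


def paired_t(mean_diff: float, sd_diff: float, n: int, t_star: float):
    """One-sample t on the differences: returns (t, SE, (lower, upper))."""
    se = sd_diff / math.sqrt(n)
    t = mean_diff / se
    margin = t_star * se
    return t, se, (mean_diff - margin, mean_diff + margin)


def implied_correlation(sd1: float, sd2: float, sd_diff: float) -> float:
    """Solve s_D^2 = s1^2 + s2^2 - 2*r*s1*s2 for the sample correlation r."""
    return (sd1 ** 2 + sd2 ** 2 - sd_diff ** 2) / (2 * sd1 * sd2)


if __name__ == "__main__":
    # Difference = 2004.S.K - 2007.S.K, from the paired output (rounded).
    t, se, (lo, hi) = paired_t(-194.6, 119.4, 10, T_975_DF9)
    print(f"t = {t:.2f}, SE = {se:.1f}, 95% CI = ({lo:.1f}, {hi:.1f})")
    print(f"implied r = {implied_correlation(47.5, 149.6, 119.4):.2f}")
```

The interval it prints, about (−280.0, −109.2), is narrower than the two-sample interval because the site-to-site variation shared by both years cancels in the differences.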