Module 7: Comparing Datasets and Comparing a Dataset with a Standard
How different is enough?

Concepts
• Independence of each data point
• Test statistics
• Central Limit Theorem
• Standard error of the mean
• Confidence interval for a mean
• Significance levels
• How to apply in Excel

Independent Measurements
• Each measurement must be independent (shake up the basket of tickets)
• Examples of non-independent measurements:
  – Public responses to questions (one result affects the next person's answer)
  – Samplers placed too close together, so air flows are affected

Test Statistics
• A test statistic is a number calculated from the data
• In Student's t test, for example, it is t
• If t >= 1.96 and the population is normally distributed:
  – you are in the right tail of the curve,
  – where 95% of the data lies in the inner portion, symmetrically between t = -1.96 on the left and t = 1.96 on the right

Test Statistics Correspond to Significance Levels
• "P" stands for percentile
• The Pth percentile is the value that a fraction p of the data falls below and 1 - p falls above

Two Major Types of Questions
• Comparing a mean against a standard
  – Does air quality here meet NAAQS?
• Comparing two datasets
  – Is air quality different in 2006 than 2005?
  – Better?
  – Worse?

Comparing Mean to a Standard
• Did air quality meet the CARB annual standard of 12 µg/m³?

Year   Ft Smith avg   Ft Smith Min   Ft Smith Max   N_Fort Smith
'05    14.78          0.1            37.9           77

Central Limit Theorem (magic!)
• Even if the underlying population is not normally distributed,
• if we repeatedly take datasets,
• the means of these different datasets cluster around the true mean
• The distribution of these means is approximately normal!

Magic Concept #2: Standard Error of the Mean
• Represents uncertainty around the mean
• As sample size N gets bigger, the error gets smaller!
• The bigger the N, the more tightly you can estimate the mean
• LIKE the standard deviation for a population, but this is for YOUR sample
• Standard error of the mean = $\sigma / \sqrt{N}$

For a "Large" Sample (N > 60), or When Very Close to a Normal Distribution...
• Confidence interval for the population mean is: $\bar{x} \pm Z \cdot \frac{s}{\sqrt{n}}$
• The choice of Z determines 90%, 95%, etc.

For a "Small" Sample
• Replace the Z value with a t value to get: $\bar{x} \pm t \cdot \frac{s}{\sqrt{n}}$
• ...where t comes from Student's t distribution and depends on sample size

Student's t Distribution vs. Normal Z Distribution
[Figure: "T-distribution and Standard Normal Z distribution". Density vs. value (-5 to 5) for the Z distribution and for t with 5 d.f.]

Compare t and Z Values

Confidence level   t value with 5 d.f.   Z value
90%                2.015                 1.65
95%                2.571                 1.96
99%                4.032                 2.58

What happens as sample gets larger?
[Figure: "T-distribution and Standard Normal Z distribution". Density vs. value (-5 to 5) for the Z distribution and for t with 60 d.f.]

What happens to CI as sample gets larger?
• For large samples, Z and t values become almost identical, so the CIs are almost identical: $\bar{x} \pm Z \cdot \frac{s}{\sqrt{n}} \approx \bar{x} \pm t \cdot \frac{s}{\sqrt{n}}$

First, graph and review data
• Use the box plot add-in
• Evaluate spread
• Evaluate how far apart the mean and median are
• (assume sampling design and QC are good)

Excel Summary Stats
1. Use the box-plot add-in
2. Calculate summary stats
[Figure: box plot of the Ft Smith data, y-axis from 0 to 40]

Ft Smith, N = 77

Min   25th   Median   75th   Max    Mean   SD
0.1   7.5    13.7     18.1   37.9   14.8   8.7

Our Question
• Can we be 95%, 90%, or how confident that this mean of 14.78 is really greater than the standard of 12?
• We saw that N = 77, and the mean and median are not too different
• So use Z (normal) rather than t

The mean is 14.8 +- what?
• We know the equation for the CI is $\bar{x} \pm Z \cdot \frac{s}{\sqrt{n}}$
• The width of the confidence interval represents how sure we want to be that this CI includes the true mean
• Now, decide how confident we want to be

CI Calculation
• For 95%, z = 1.96 (often rounded to 2)
• Standard error (sigma / square root of N) = 8.66 / square root of 77 = 0.987, or about 0.99
• CI half-width = 2 x 0.99, or about 2
• We can be 95% sure that the true mean is included in (mean +- 2), that is, from 14.8 - 2 = 12.8 at the low end to 14.8 + 2 = 16.8 at the high end
• This does NOT include 12!

Excel can also calculate a confidence interval around the mean
• Mean, plus and minus 1.93, is a 95% confidence interval that does NOT include 12!

We know we are more than 95% confident, but how confident can we be that Ft Smith mean > 12?
• Calculate where on the curve our mean of 14.8 is, in terms of a z (normal) score...
• ...or if N is small, use a t score

To find where we are on the curve, calculate the test statistic...
• Ft Smith mean = 14.8, sigma = 8.66, N = 77
• Calculate the test statistic, in this case the z factor (we decided we can use the z rather than the t distribution): $z = \frac{\bar{x} - \mu}{\sigma / \sqrt{N}}$, where $\bar{x}$ is the data's mean and $\mu$ is the standard of 12
• If N were < 60, the test statistic would be t, but it is calculated the same way

Calculate z Easily
• Numerator: our mean of 14.8 minus the standard of 12 (treat the real mean µ as the standard) = 2.8
• Denominator: standard error = sigma / square root of N = 0.987 (same as for the CI)
• So z = 2.8 / 0.987 = 2.84
• So where is this z on the curve?
• Remember, at z = 3 we are to the right of more than 99% of the curve

Where on the curve?
[Figure: normal curve with Z = 2 and Z = 3 marked]
• Our z falls between 2 and 3, so we are somewhere between 95% and better than 99% confident that the true mean is above 12

You can calculate exactly where on the curve, using Excel
• Use the NORMSDIST function, with z
• If z (or t) = 2.84, NORMSDIST in Excel yields 0.998: a 99.8% probability that the true mean is above 12
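
The Ft Smith calculation can also be checked outside Excel. The following is a minimal Python sketch, not part of the original module, assuming the slide values (mean 14.8, sigma 8.66, N = 77, standard of 12). The function names `z_confidence_interval` and `z_statistic` are made up for illustration. `NormalDist().cdf(z)` from Python's standard library plays the role that NORMSDIST plays in Excel.

```python
from math import sqrt
from statistics import NormalDist

def z_confidence_interval(mean, sd, n, confidence=0.95):
    """Return (low, high) for mean +- Z * sd / sqrt(n), using the normal Z value."""
    standard_error = sd / sqrt(n)
    # Two-sided interval: 95% confidence leaves 2.5% in each tail, so Z is about 1.96.
    z_value = NormalDist().inv_cdf(0.5 + confidence / 2)
    half_width = z_value * standard_error
    return mean - half_width, mean + half_width

def z_statistic(mean, standard, sd, n):
    """Return z = (mean - standard) / (sd / sqrt(n))."""
    return (mean - standard) / (sd / sqrt(n))

# Values taken from the slides (Ft Smith, 2005).
mean, sd, n, standard = 14.8, 8.66, 77, 12.0

low, high = z_confidence_interval(mean, sd, n)
z = z_statistic(mean, standard, sd, n)
# Fraction of the standard normal curve lying below z (Excel: NORMSDIST(z)).
fraction_below = NormalDist().cdf(z)

print(f"95% CI: {low:.2f} to {high:.2f}")
print(f"z = {z:.2f}")
print(f"fraction of curve below z = {fraction_below:.3f}")
```

Changing `confidence` to 0.90 or 0.99 shows how the interval narrows or widens. For a small sample, the Z value would be replaced by a t value, which the standard library does not provide.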
