Statistics: Data Analysis and Presentation
Fr Clinic II

Overview
– Tables and Graphs
– Populations and Samples
– Mean, Median, and Standard Deviation
– Standard Error & 95% Confidence Interval (CI)
– Error Bars
– Comparing Means of Two Data Sets
– Linear Regression (LR)

Warning
Statistics is a huge field; I’ve simplified considerably here. For example:
– Mean, Median, and Standard Deviation: there are alternative formulas.
– Standard Error and the 95% Confidence Interval: there are other ways to calculate CIs (e.g., the z statistic instead of t; the difference between two means rather than a single mean…).
– Error Bars: don’t go beyond the interpretations I give here!
– Comparing Means of Two Data Sets: we cover only the t test for two means when the variances are unknown but equal; there are other tests.
– Linear Regression: we look only at simple LR and calculate only the intercept, slope, and R². There is much more to LR!

Tables
Table 1: Average Turbidity and Color of Water Treated by Portable Water Filters
Columns: (1) Water, (2) Turbidity (NTU), (3) True Color (Pt-Co), (4) Apparent Color (Pt-Co)
Pond Water    10   13   30
Sweetwater    4    55   12   12
Hiker         3    8    11
MiniWorks     2    3    5
Standard      5a   15   15
a Level at which humans can visually detect turbidity
– Consistent format, title, units, big fonts
– Differentiate headings, number columns

Figures
– Consistent format, title, units
– Good axis titles, big fonts
[Figure 1: Turbidity of Pond Water, Treated and Untreated. Bar chart of turbidity (NTU) by filter: Pond Water, Sweetwater, Miniworks, Hiker, Pioneer, Voyager.]

Populations and Samples
Population: all of the possible outcomes of an experiment or observation
– e.g., the US population; a particular type of steel beam
Sample: a finite number of outcomes measured or observations made
– e.g., 1000 US citizens; 5 beams
We use samples to estimate population properties: mean, variability (e.g.,
standard deviation), and distribution. For example, the heights of 1000 US citizens can be used to estimate the mean height of the US population.

Mean and Median
Turbidity of treated water (NTU): 1, 3, 3, 6, 8, 10
Mean = sum of the values divided by the number of samples = (1 + 3 + 3 + 6 + 8 + 10)/6 = 31/6 ≈ 5.2 NTU
Median = the middle number of the ranked values
Rank     1   2   3   4   5   6
Number   1   3   3   6   8   10
For an even number of sample points, average the middle two: (3 + 6)/2 = 4.5 NTU
Excel: mean – AVERAGE; median – MEDIAN

Variance
A measure of variability: the sum of the squared deviations about the mean, divided by the degrees of freedom:
$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$
where n = number of data points.
Excel: variance – VAR

Standard Deviation, s
The square root of the variance:
$$s = \sqrt{s^2}$$
For phenomena following a Normal Distribution (bell curve), 95% of population values lie within 1.96 standard deviations of the mean. The area under the curve is the probability of getting a value within the specified range.
[Figure: Normal Distribution; x-axis: standard deviations from mean (−4 to 4); the central 95% of the area lies between −1.96 and +1.96.]
Excel: standard deviation – STDEV

Standard Error of Mean
The standard deviation of the mean of a sample of size n taken from a population with standard deviation s:
$$s_{\bar{x}} = \frac{s}{\sqrt{n}}$$
– The estimate of the mean depends on the sample selected.
– As n → ∞, the variance of the mean estimate goes down, i.e., the estimate of the population mean improves.
– As n → ∞, the distribution of the mean estimate approaches normal, regardless of the population distribution.

95% Confidence Interval (CI) for Mean
The interval within which we are 95% confident the true mean lies:
$$\bar{x} \pm t_{95\%,\,n-1} \, s_{\bar{x}}$$
where t95%,n-1 is the t-statistic for a 95% CI with sample size n (n = number of data points).
– If n ≥ 30, let t95%,n-1 = 1.96 (Normal Distribution).
– Otherwise, use the Excel formula TINV(0.05, n-1).

Error Bars
Show data variability on a plot of mean values. Types of error bars include:
– ± standard deviation
– ± standard error
– ± 95% CI
– maximum and minimum values
[Figure: mean turbidity (NTU) vs. filter type (Filter 1, Filter 2, Filter 3) with error bars.]

Using Error Bars to Compare Data
– Standard deviation: demonstrates data variability, but no comparison is possible.
– Standard error: if the
bars overlap, any difference in means is not statistically significant; if the bars do not overlap, this indicates nothing!
– 95% confidence interval: if the bars overlap, this indicates nothing! If the bars do not overlap, the difference is statistically significant.
We’ll use the 95% CI.

Example 1
Turbidity data (all values in NTU except n and t95%,2):
Filter     Meas. 1  Meas. 2  Meas. 3  Mean  St Dev  n  St Error  t95%,2  ±95% CI
Filter 1   2.1      2.1      2.2      2.1   0.06    3  0.03      4.30    0.14
Filter 2   3.2      4.4      5.0      4.2   0.92    3  0.53      4.30    2.28
Filter 3   4.3      4.2      4.5      4.3   0.15    3  0.09      4.30    0.38
Create a bar chart of filter name vs. mean. Right-click on the data and select “Format Data Series”.
[Figure: mean turbidity (NTU) by portable water filter: Filter 1 = 2.1, Filter 2 = 4.2, Filter 3 = 4.3.]

Example 2
Turbidity over time (all values in NTU except time, n, and t95%,2):
Time (min)  Meas. 1  Meas. 2  Meas. 3  Mean  St Dev  n  St Error  t95%,2  ±95% CI
1           4.3      4.5      4.6      4.5   0.15    3  0.09      4.30    0.38
2           4.4      4.4      4.5      4.4   0.06    3  0.03      4.30    0.14
3           4.3      4.2      4.2      4.2   0.06    3  0.03      4.30    0.14
[Figure: mean turbidity (NTU) vs. time (min) with error bars.]

What Can We Do?
– Plot mean water quality data for various filters with error bars.
– Plot mean water quality over time with error bars.

Comparing Filter Performance
Use the t test to determine whether the means of two populations are different.
– It is based on two data sets, e.g., the turbidity produced by two different filters.

Comparing Two Data Sets Using the t Test
Example: you pump 20 gallons of water through Filter 1 and Filter 2. After every gallon, you measure the turbidity.
– Filter 1: mean = 2 NTU, s = 0.5 NTU, n = 20
– Filter 2: mean = 3 NTU, s = 0.6 NTU, n = 20
You ask the question: do the filters make water with a different mean turbidity?

Do the Filters Make Different Water?
Use TTEST (Excel):
– It gives the fractional probability of being wrong if you answer yes.
– We want this probability to be small, 0.01 to 0.10 (1 to 10%).
– Use 0.01.
Turbidity (NTU) measured after each gallon:
Filter 1: 1.5, 2, 2.2, 1.8, 3, 1.6, 1.2, 2.1, 1.9, 2.2, 2.6, 1.7, 1.8, 1.5, 2.4, 2.5, 2.7, 1.4, 1.5, 2.6
Filter 2: 3, 2.4, 2.2, 2.6, 3.4, 3.6, 3.8, 3.5, 2.7, 2.4, 3.5, 3.8, 2.1, 2.5, 3.4, 3.3, 2.4, 3.6, 2.3, 3.7

“t Test” Questions
– Do two filters make different water?
  – Take multiple measurements of a particular water quality parameter for the two filters.
– Do two filters treat different amounts of water between cleanings?
  – Measure the amount of water filtered between cleanings for the two filters.
– Does the amount of water a filter treats between cleanings differ after a certain amount of water has been treated?
  – For a single filter, measure the amount of water treated between cleanings before and after a certain total amount of water is treated.

Linear Regression
Fit the best straight line to a data set.
[Figure: Grade Point Average vs. Height (m), with fitted trendline y = 1.897x + 0.8667, R² = 0.9762.]
Right-click on a data point and use the “trendline” option. Use the “options” tab to get the equation and R².

R² – Coefficient of Determination
$$R^2 = 1 - \frac{\sum_i (\hat{y}_i - y_i)^2}{\sum_i (y_i - \bar{y})^2} = \frac{\sum_i (\hat{y}_i - \bar{y})^2}{\sum_i (y_i - \bar{y})^2}$$
where
ŷi = predicted y values, from the regression equation
yi = observed y values
ȳ = mean of the observed y values
R² is the fraction of the variance explained by the regression (variance = standard deviation squared); R² = 1 if the data lie along a straight line.
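The Mean and Median, Variance, Standard Deviation, Standard Error, and 95% CI formulas in these slides can be checked with a short Python sketch. It assumes Python's standard-library `statistics` module, whose `variance` and `stdev` divide by n − 1 as Excel's VAR and STDEV do. The helper name `summarize` is hypothetical. The t-statistic is passed in rather than computed: 4.30 is the t95%,2 value listed in Example 1, i.e., TINV(0.05, 2).

```python
import math
import statistics


def summarize(values, t95):
    """Hypothetical helper: mean, median, sample variance, sample standard
    deviation, standard error of the mean, and +/- half-width of the 95% CI.

    t95 is the t-statistic for a 95% CI with n - 1 degrees of freedom
    (e.g., Excel TINV(0.05, n-1)); the caller supplies it.
    """
    n = len(values)
    mean = statistics.mean(values)          # Excel AVERAGE
    median = statistics.median(values)      # Excel MEDIAN
    var = statistics.variance(values)       # Excel VAR: divides by n - 1
    s = statistics.stdev(values)            # Excel STDEV: sqrt of variance
    se = s / math.sqrt(n)                   # standard error of the mean
    return {"mean": mean, "median": median, "variance": var,
            "stdev": s, "stderr": se, "ci95": t95 * se}


# Example 1 turbidity data (NTU), three measurements per filter.
example1 = {
    "Filter 1": [2.1, 2.1, 2.2],
    "Filter 2": [3.2, 4.4, 5.0],
    "Filter 3": [4.3, 4.2, 4.5],
}

for name, data in example1.items():
    r = summarize(data, t95=4.30)
    print(f"{name}: mean {r['mean']:.1f}, St Dev {r['stdev']:.2f}, "
          f"St Error {r['stderr']:.2f}, +/- 95% CI {r['ci95']:.2f}")
```

The rounded outputs reproduce the St Dev, St Error, and ±95% CI columns of the Example 1 table. For the six-point set in Mean and Median, `statistics.mean` gives 31/6 (about 5.2 NTU) and `statistics.median` gives 4.5 NTU.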
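The TTEST step can be mirrored with a sketch of the pooled (equal-variance) two-sample t test, written in standard-library Python. It assumes the Excel call being reproduced is TTEST(array1, array2, 2, 2), i.e., two tails and type 2 (two-sample equal variance), which matches the "variances unknown but equal" case in the Warning. The standard library has no t distribution, so the two-tailed p-value is approximated by integrating the t density with Simpson's rule. That numerical shortcut and the helper names `t_pdf`, `t_two_tailed_p`, and `pooled_t_test` are illustrative assumptions, not a library API.

```python
import math
import statistics


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    c = math.exp(log_c) / math.sqrt(df * math.pi)
    return c * (1 + x * x / df) ** (-(df + 1) / 2)


def t_two_tailed_p(t, df, steps=2000):
    """Two-tailed p-value: 1 - 2 * (integral of the density from 0 to |t|),
    approximated with Simpson's rule (steps must be even)."""
    b = abs(t)
    if b == 0:
        return 1.0
    h = b / steps
    total = t_pdf(0, df) + t_pdf(b, df)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * t_pdf(k * h, df)
    area = total * h / 3
    return max(0.0, 1 - 2 * area)


def pooled_t_test(a, b):
    """Two-sample t test assuming unknown but equal variances."""
    n1, n2 = len(a), len(b)
    df = n1 + n2 - 2
    # Pooled variance: each sample variance weighted by its degrees of freedom.
    sp2 = ((n1 - 1) * statistics.variance(a)
           + (n2 - 1) * statistics.variance(b)) / df
    se_diff = math.sqrt(sp2 * (1 / n1 + 1 / n2))
    t = (statistics.mean(a) - statistics.mean(b)) / se_diff
    return t, df, t_two_tailed_p(t, df)


filter1 = [1.5, 2, 2.2, 1.8, 3, 1.6, 1.2, 2.1, 1.9, 2.2,
           2.6, 1.7, 1.8, 1.5, 2.4, 2.5, 2.7, 1.4, 1.5, 2.6]
filter2 = [3, 2.4, 2.2, 2.6, 3.4, 3.6, 3.8, 3.5, 2.7, 2.4,
           3.5, 3.8, 2.1, 2.5, 3.4, 3.3, 2.4, 3.6, 2.3, 3.7]

t, df, p = pooled_t_test(filter1, filter2)
print(f"t = {t:.2f}, df = {df}, p = {p:.2g}")
print("Different mean turbidity at the 0.01 level:", p < 0.01)
```

With the 20 measurements per filter, t is about −5.7 with 38 degrees of freedom, and the p-value is far below 0.01. By the 0.01 rule in the slides, we would answer yes: the filters make water with a different mean turbidity.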
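The trendline's intercept, slope, and R² come from ordinary least squares. Below is a minimal Python sketch of those three numbers, with R² computed as 1 − Σ(ŷi − yi)²/Σ(yi − ȳ)². The slope is Σ(x − x̄)(y − ȳ)/Σ(x − x̄)² and the intercept is ȳ − slope·x̄. The helper name `fit_line` is hypothetical. Applying it to the Example 2 turbidity-versus-time measurements is only an illustration; the slides do not regress those data.

```python
import statistics


def fit_line(x, y):
    """Hypothetical helper: least-squares line y_hat = intercept + slope * x,
    returned together with R^2."""
    x_bar, y_bar = statistics.mean(x), statistics.mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    y_hat = [intercept + slope * xi for xi in x]        # predicted values
    ss_res = sum((yh - yi) ** 2 for yh, yi in zip(y_hat, y))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    r2 = 1 - ss_res / ss_tot      # fraction of variance explained
    return intercept, slope, r2


# Example 2: three turbidity measurements (NTU) at each time (min).
minutes = [1, 1, 1, 2, 2, 2, 3, 3, 3]
turbidity = [4.3, 4.5, 4.6, 4.4, 4.4, 4.5, 4.3, 4.2, 4.2]

b0, b1, r2 = fit_line(minutes, turbidity)
print(f"y = {b1:.3f}x + {b0:.3f}, R2 = {r2:.3f}")
```

For these data, the sketch prints y = -0.117x + 4.611, R2 = 0.525. A data set lying exactly on a straight line gives R² = 1, as the slides state.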