Statistical analysis

Why? (besides making your life difficult …)
Scientists must collect data AND analyze it.
Does your data support your hypothesis? Is it valid?
Statistics helps us find relationships between sets of data.
You are the scientist now; you must be comfortable with analysis of your data.

Let’s look at two sets of data
Sample 1: -10, 0, 10, 20, 30
Sample 2: 8, 9, 10, 11, 12
What can you tell me about this data?

Mean: the “average” of the data, or the central tendency
Sample 1: (-10 + 0 + 10 + 20 + 30) / 5, Mean = 10
Sample 2: (8 + 9 + 10 + 11 + 12) / 5, Mean = 10
Is this analysis complete? NO!

Range: how far is the spread? Largest # - smallest #
Sample 1: 30 - (-10), Range = 40
Sample 2: 12 - 8, Range = 4
Does this data help? Yes, Sample 1 is more dispersed.
Obvious? Perhaps, but now shown mathematically.

Something more … standard deviation
SD is a measure that shows how individual data points are dispersed around the mean.
Assuming a normal data distribution (bell curve):
About 68% of all collected values lie within +/- 1 SD of the mean
About 95% of all collected values lie within +/- 2 SD of the mean
So what?

Standard deviation
A small SD indicates the data values are clustered around the mean
    May also indicate few extreme data points
A large SD indicates the data values are spread out
    May also indicate extreme data points. Outliers?
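The mean and range comparison for the two samples above can be checked with a few lines of Python. This is only a minimal sketch: the function names `mean` and `value_range` are made up for illustration and are not part of the original slides.

```python
# The two samples from the slides.
sample_1 = [-10, 0, 10, 20, 30]
sample_2 = [8, 9, 10, 11, 12]


def mean(values):
    """Add up all the values and divide by how many there are."""
    return sum(values) / len(values)


def value_range(values):
    """Largest # minus smallest #."""
    return max(values) - min(values)


for name, data in (("Sample 1", sample_1), ("Sample 2", sample_2)):
    print(f"{name}: mean = {mean(data)}, range = {value_range(data)}")
```

Both samples give the same mean, but the ranges differ by a factor of ten, which is why the mean alone is not a complete analysis.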
Standard deviation
$s = \sqrt{\dfrac{\sum (x - \bar{x})^2}{n - 1}}$
x = each data point
x̄ = the mean
n = the total number of data points
Σ = the sum of all the values

Let’s practice … Sample 1: -10, 0, 10, 20, 30
Remember x̄ = 10
(-10 - 10)² + (0 - 10)² + (10 - 10)² + (20 - 10)² + (30 - 10)²
= (-20)² + (-10)² + (0)² + (10)² + (20)²
= 400 + 100 + 0 + 100 + 400
= 1000
Divide by n - 1 (5 - 1 = 4): 1000/4 = 250
Now √250 = 15.8

Let’s practice … Sample 2: 8, 9, 10, 11, 12
Remember x̄ = 10
(8 - 10)² + (9 - 10)² + (10 - 10)² + (11 - 10)² + (12 - 10)²
= (-2)² + (-1)² + (0)² + (1)² + (2)²
= 4 + 1 + 0 + 1 + 4
= 10
Divide by n - 1 (5 - 1 = 4): 10/4 = 2.5
Now √2.5 = 1.58

Let’s compare …
Sample 1: SD = 15.8
Sample 2: SD = 1.58

How can I use this in my lab? Error bars
Error bars represent the variability of your data. They can show:
STANDARD DEVIATION
range
measurement uncertainties

Error bars
On a bar graph, the bar represents the mean of your data and the error bars represent +/- 1 sd.
[Figure: bar graph with the mean and sd labeled]

Error bars
On a line graph, the point represents the mean of your data and the error bars represent +/- 1 sd.
[Figure: line graph with the mean and sd labeled]

t-test
A t-test determines statistical significance between 2 sample means. Key word!
Is the difference significant?
Is the difference due to your variable? Or is it random chance?
How valid is your data?
A t-test determines the probability that the difference is due to random chance.
A p value (probability) of 0.05 (5%) means there is only a 5% chance of seeing a difference this large from random chance alone, so you can be 95% confident that your difference IS DUE TO YOUR VARIABLE.
You want 95% or higher!

t-test
For tests, you do NOT need to calculate t-values, but you must be able to read a t-chart!
For internal assessments, you may use calculators or Excel to calculate t-values.
You need to be able to calculate degrees of freedom.
[Figure: t-table] This is the range you are hoping for: the difference between your samples has a HIGH probability of being due to your variable (and not chance).

Calculating degrees of freedom
df = (n1 + n2) - 2
n1 = size of sample 1
n2 = size of sample 2
2 = # of samples

Population 1: -10, 0, 10, 20, 30 (n1 = 5)
Population 2: 8, 9, 10, 11, 12 (n2 = 5)
df = (5 + 5) - 2
df = 8

Using the t-table
If df = 8 and t = 3.5, is this a significant difference?
Less than 1% probability that the difference in data is due to chance
Therefore, greater than 99% probability that the difference in data is due to our variable

Other options, less commonly used in our class
Median: the middle #, when arranged in numeric order
Sample 1: -10, 0, 10, 20, 30, Median = 10
Sample 2: 8, 9, 10, 11, 12, Median = 10
Mode: the # that occurs most often
Sample 1: -10, 0, 10, 20, 30, No mode
Sample 2: 8, 9, 10, 11, 12, No mode

Some practice: looking at plant height

Height in sun (cm)    Height in shade (cm)
124                   131
120                   60
153                   131
98                    160
124                   212
141                   117
156                   131
128                   95
139                   145
117                   118

Calculate the mean for both samples
Sun = 130 cm, Shade = 130 cm

Calculate the range for both samples
Sun = 58 cm, Shade = 152 cm

Calculate the median for both samples
If even # of samples, find the average of the two middle numbers
Sun = 126 cm, Shade = 131 cm

Calculate the mode for both samples
Sun = 124 cm, Shade = 131 cm

Calculate the sd for both samples
Sun = 17.56 cm, Shade = 39.85 cm

Sun: sd = 17.56 cm
Low sd indicates even (close) distribution of data points
More valid
Shade: sd = 39.85 cm
High sd indicates wide spread of data points
MAY indicate a problem with your experimental design

If t = 1.5, is this a significant difference?
No. With df = (10 + 10) - 2 = 18, t = 1.5 is too small to reach p = 0.05.

Be careful: correlation vs. cause
Observations (and carefully chosen data) may imply a CORRELATION, but do NOT necessarily demonstrate a cause.
The average global temperature has increased over the past 100 years.
The number of pirates in the world has decreased over the past 100 years.
Therefore, a decreased number of pirates causes increased global temperatures?
No, no, no!

Be careful: correlation vs. cause
To discern a CAUSE, a valid EXPERIMENT must be done.
Other scientists must also be able to repeat your experiment.

Last word …
Remember, it is ALWAYS better to PROVE your experiment failed to support your hypothesis than to lie about it being a success!
Any questions?
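If you want to check your answers to the plant height practice, here is a minimal Python sketch. It makes some assumptions that are not in the slides: the helper names (`sample_sd`, `summarize`, `degrees_of_freedom`, `is_significant`) are made up for illustration, and the critical t values are the standard two-tailed p = 0.05 entries for df = 8 and df = 18, which assumes your class t-chart is a two-tailed table.

```python
import math
from statistics import median, mode

# Plant heights (cm) from the practice table.
sun = [124, 120, 153, 98, 124, 141, 156, 128, 139, 117]
shade = [131, 60, 131, 160, 212, 117, 131, 95, 145, 118]


def sample_sd(values):
    """Standard deviation with n - 1 in the denominator, as in the slides."""
    m = sum(values) / len(values)
    squared_deviations = [(x - m) ** 2 for x in values]
    return math.sqrt(sum(squared_deviations) / (len(values) - 1))


def summarize(values):
    """Collect the statistics asked for in the practice questions."""
    return {
        "mean": sum(values) / len(values),
        "range": max(values) - min(values),
        "median": median(values),  # averages the two middle numbers if n is even
        "mode": mode(values),
        "sd": round(sample_sd(values), 2),
    }


def degrees_of_freedom(n1, n2):
    """df = (n1 + n2) - 2"""
    return (n1 + n2) - 2


# Assumption: two-tailed critical t values at p = 0.05 from a standard t-table.
CRITICAL_T_P05 = {8: 2.306, 18: 2.101}


def is_significant(t, df):
    """True if t is larger than the p = 0.05 critical value for this df."""
    return abs(t) > CRITICAL_T_P05[df]


print("Sun:  ", summarize(sun))
print("Shade:", summarize(shade))

df = degrees_of_freedom(len(sun), len(shade))
print("df =", df, "| t = 1.5 significant?", is_significant(1.5, df))
```

The t-value itself is still something you would get from a calculator or Excel; the sketch only mirrors the "read the t-chart" step by comparing a given t against the table value for your df.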