Statistics for A2 Biology

• Standard deviation
• Student’s t-test
• Chi-squared
• Spearman’s rank

Why?
Statistical tests allow us to draw conclusions about data based on statistical significance.
e.g. Is there a significant difference between the mean heights of students in different year groups?
e.g. Is there significant correlation between the height and age of students in a school?

Standard Deviation
• A measure of how ‘spread out’ data is around a mean.
• Allows us to compare two or more sets of data to see whether the means are significantly different.
Limitations:
- doesn’t give you the range of data
- can be affected by outliers/chance results

Calculating Standard Deviation
1. Subtract the mean of the data from each data point.
2. Square your answers.
3. Add them all together.
4. Divide your answer by the number of data points minus 1.
5. Take the square root.
OR use the calculator method!

What next?
• Compare the mean ± standard deviation for the data sets.
• If these ranges overlap, there is no significant difference between the means of the data sets.
• If the ranges do not overlap, there is a significant difference between the means.

Statistical Tests
• Student’s t-test: are two mean results significantly different?
• Spearman’s Rank Correlation Coefficient: is there significant correlation between data sets?
• χ² (Chi-Squared): is there a significant difference between observed and expected results?

Null hypothesis
• Results of an experiment could be due to random chance.
• The only way to support your hypothesis is to reject a null hypothesis.
• The null hypothesis states there is no link/correlation/difference between results; the exact wording depends on the statistical test used.

Statistics
- State null hypothesis
- Which test will you use?
- Why?
- Calculate test statistic
- Interpret the test statistic in relation to your null hypothesis. Use the words probability and chance in your answer.

Probabilities
• We normally work at the 5% probability level (P = 0.05).
• To reject the null hypothesis (and accept your own hypothesis), you must be sure that there is ≤5% probability that the results are due to chance.

William Gosset (aka ‘Student’) (1876-1937)
• Worked in quality control at the Guinness brewery and could not publish under his own name.
• Former student of Karl Pearson.

The t-test
What can this test tell you?
Whether there is a statistically significant difference between two means, when:
• the sample size is less than 25
• the data is normally distributed
BETTER THAN STANDARD DEVIATION!

t-test

$$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$$

$$SD = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$$

$\bar{x}_1$ = mean of first sample
$\bar{x}_2$ = mean of second sample
$s_1$ = standard deviation of first sample
$s_2$ = standard deviation of second sample
$n_1$ = number of measurements in first sample
$n_2$ = number of measurements in second sample

Worked example
Does the pH of soil affect seed germination of a specific plant species?
• Group 1: eight pots with soil at pH 5.5
• Group 2: eight pots with soil at pH 7.0
• 50 seeds were planted in each pot and the number that germinated in each pot was recorded.

What is the null hypothesis (H0)?
H0 = there is no statistically significant difference between the germination success of seeds in two soils of different pH
HA = there is a significant difference between the germination success of seeds in two soils of different pH
If the value for t exceeds the critical value (P = 0.05), then you can reject the null hypothesis.
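The SD and t formulas above translate directly into code. The following is a minimal Python sketch, not part of the original slides: the function name `t_statistic` and the variable names are made up for illustration, and the germination counts are the ones tabulated for this worked example. Python's `statistics.stdev` divides by n - 1, which matches the SD formula. Because the sketch keeps the unrounded mean (39.125), it gives t of about -2.98 rather than the -2.99 reached by hand from rounded values.

```python
from math import sqrt
from statistics import mean, stdev


def t_statistic(sample1, sample2):
    """Return t for two samples using the unpaired formula given above."""
    s1, s2 = stdev(sample1), stdev(sample2)  # sample SDs (n - 1 divisor)
    n1, n2 = len(sample1), len(sample2)
    return (mean(sample1) - mean(sample2)) / sqrt(s1**2 / n1 + s2**2 / n2)


# Number of seeds germinating in each of the eight pots per group.
ph_5_5 = [38, 41, 43, 39, 37, 38, 41, 36]
ph_7_0 = [39, 45, 41, 46, 48, 39, 46, 44]

t = t_statistic(ph_5_5, ph_7_0)
print(f"SD group 1 = {stdev(ph_5_5):.2f}")  # 2.36
print(f"SD group 2 = {stdev(ph_7_0):.2f}")  # 3.42
print(f"t = {t:.2f}")                       # -2.98
```

The sign of t only reflects which group was entered first, so it is the size of the value that gets compared with the critical value.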
Construct the following table…

| Pot | Group 1 (pH 5.5) | $(x - \bar{x})^2$ | Group 2 (pH 7.0) | $(x - \bar{x})^2$ |
|---|---|---|---|---|
| 1 | 38 | 1.27 | 39 | 20.25 |
| 2 | 41 | 3.52 | 45 | 2.25 |
| 3 | 43 | 15.02 | 41 | 6.25 |
| 4 | 39 | 0.02 | 46 | 6.25 |
| 5 | 37 | 4.52 | 48 | 20.25 |
| 6 | 38 | 1.27 | 39 | 20.25 |
| 7 | 41 | 3.52 | 46 | 6.25 |
| 8 | 36 | 9.77 | 44 | 0.25 |
| Mean | 39.1 | | 43.5 | |
| $\sum (x - \bar{x})^2$ | | 38.88 | | 82.0 |

Calculate standard deviation for both groups

Group 1:
$$SD = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} = \sqrt{\frac{38.88}{8 - 1}} = 2.36$$

Group 2:
$$SD = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} = \sqrt{\frac{82.0}{8 - 1}} = 3.42$$

Using your means and SDs, calculate the value for t

$$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} = \frac{39.1 - 43.5}{\sqrt{\dfrac{2.36^2}{8} + \dfrac{3.42^2}{8}}} = \frac{-4.4}{\sqrt{0.696 + 1.462}} = -2.99$$

t = -2.99 BUT we can ignore the - sign.

Compare your value of t with the appropriate critical value:

If your value is lower than the critical value:
- there is no significant difference between data sets
- accept null hypothesis
- >5% probability the difference in results is due to chance.

If your value is higher than the critical value:
- there is a significant difference between data sets
- reject null hypothesis
- ≤5% probability the difference in results is due to chance.

Compare our calculated value of t with the relevant critical value in the stats table of critical values.
Our value of t = 2.99
Degrees of freedom = n1 + n2 - 2 = 14

| D.F. | Critical value (P = 0.05) |
|---|---|
| 14 | 2.15 |
| 15 | 2.13 |
| 16 | 2.12 |
| 17 | 2.11 |
| 18 | 2.10 |

Our value for t exceeds the critical value, so we can reject the null hypothesis.
We can conclude that there is a significant difference between the two means, so pH does affect the germination rate for this plant.

Now try the examples on the sheet. Remember:
1. State your null hypothesis.
2. Calculate the mean and standard deviation.
3. Calculate the value of t.
4. Compare the value of t with the critical value.
5. Write a conclusion, stating:
   1. Whether or not the confidence limits overlap
   2. Whether you accept or reject the null hypothesis
   3. What the probability is that the differences between means occurred by chance.

Why use Spearman’s rank?

Spearman’s Rank
What can this test tell you?
Whether there is a statistically significant correlation between two sets of measurements from the same sample.

What is the null hypothesis?
There is no correlation between ………….

Critical values.
We use the 0.05 significance (probability) level; this is all you will be given in the EMPA.

Calculating Spearman’s Rank
1. State null hypothesis
2. Rank the data.
3. Calculate the correlation coefficient.
4. Compare rs with table of critical values.

Worked example
The table shows the mass of nitrogen in fertiliser added to fields and the mean concentration of nitrates in nearby streams. Is there a significant correlation between the two variables?

| Mass of N on fields (kg ha⁻¹) | Concentration of nitrates in stream (mg dm⁻³) |
|---|---|
| 41 | 1.2 |
| 41 | 1.3 |
| 51 | 1.5 |
| 56 | 1.8 |
| 63 | 1.6 |
| 69 | 1.9 |
| 72 | 2.0 |

What is the null hypothesis?
There is no statistically significant correlation between the mass of nitrogen in fertiliser added to fields and the mean concentration of nitrates in nearby streams.

Step 2: Ranking the data
• Make sure you rank both sets of data in the same direction (i.e. lowest to highest OR highest to lowest, NOT one each way).
• Always keep the data in its original pairs.
• Tied values share the mean of the ranks they would occupy: the two fields with 41 kg ha⁻¹ take ranks 1 and 2, so each is given 1.5.

| Mass of N (kg ha⁻¹) | Rank | Conc. of nitrates (mg dm⁻³) | Rank | Difference in rank (D) | D² |
|---|---|---|---|---|---|
| 41 | 1.5 | 1.2 | 1 | 0.5 | 0.25 |
| 41 | 1.5 | 1.3 | 2 | -0.5 | 0.25 |
| 51 | 3 | 1.5 | 3 | 0 | 0 |
| 56 | 4 | 1.8 | 5 | -1 | 1 |
| 63 | 5 | 1.6 | 4 | 1 | 1 |
| 69 | 6 | 1.9 | 6 | 0 | 0 |
| 72 | 7 | 2.0 | 7 | 0 | 0 |

Step 3: Calculate rs
• Add up the D² values to give ΣD²: ΣD² = 0.25 + 0.25 + 1 + 1 = 2.5
• Calculate rs:

$$r_s = 1 - \frac{6 \sum D^2}{n^3 - n} = 1 - \frac{6 \times 2.5}{7^3 - 7} = 1 - 0.045 = 0.955$$

• rs is always between -1 and 1.

Step 4: Compare your value of rs with the appropriate critical value:
Find the critical value for 7 pairs of measurements. Is your answer higher or lower than the critical value?
There is a statistically significant positive correlation between the mass of nitrogen and the concentration of nitrates.
0.955 > 0.79, therefore we reject the null hypothesis.
There is ≤5% probability that the correlation in results is due to chance.

If your value is lower than the critical value:
- there is no significant correlation
- accept null hypothesis
- >5% probability the correlation in results is due to chance.

If your value is higher than the critical value:
- there is significant correlation
- reject null hypothesis
- ≤5% probability the correlation in results is due to chance.

rs values below 0 → negative correlation.
rs values above 0 → positive correlation.

Calculating Spearman’s Rank
1. State null hypothesis
2. Rank the data.
3. Calculate the correlation coefficient.
4. Compare rs with table of critical values.
5. Write your conclusion, referring to the critical value, the null hypothesis, probability and chance.

Why use χ² (chi-squared)?

χ²
What can this test tell you?
Whether there is a statistically significant difference between observed and expected results.

What is the null hypothesis?
There is no difference between observed and expected results.

Critical values.
We use the 0.05 significance (probability) level; this is all you will be given in the EMPA.

Calculating χ²
1. State null hypothesis
2. Calculate χ²
3. Compare χ² with table of critical values.

Worked example
• The table shows the number of people living on each side of a river who died from cancer in one year.
• Is there a significant difference in the death rates?

| Side of river | Death rate (per 100 people) |
|---|---|
| North | 26 |
| South | 12 |

What is the null hypothesis?
There is no statistically significant difference between the death rates on each side of the river.

Step 2: Calculate χ²
Calculate expected results according to your null hypothesis. For this data, that means take the mean!

| Deaths from cancer in people living | Observed results (O) | Expected results (E) | O - E | (O - E)² | (O - E)²/E |
|---|---|---|---|---|---|
| North of river | 26 | 19 | 7 | 49 | 2.6 |
| South of river | 12 | 19 | -7 | 49 | 2.6 |

Add up the values for (O - E)²/E:

$$\chi^2 = \sum \frac{(O - E)^2}{E} = 2.6 + 2.6 = 5.2$$

Step 3: Compare your value of χ² with the appropriate critical value:
Number of degrees of freedom = n - 1, where n is the number of categories (here 2 - 1 = 1).
Is your answer higher or lower than the critical value?
There is a statistically significant difference between the death rates on each side of the river.
5.2 > 3.84, therefore we reject the null hypothesis.
There is ≤5% probability that the difference in results is due to chance.

If your value is lower than the critical value:
- there is no significant difference between results
- accept null hypothesis
- >5% probability the difference in results is due to chance.

If your value is higher than the critical value:
- there is a significant difference between results
- reject null hypothesis
- ≤5% probability the difference in results is due to chance.

Calculating χ²
1. State null hypothesis
2. Calculate χ²
3. Compare χ² with table of critical values.
4. Write your conclusion, referring to the critical value, the null hypothesis, probability and chance.

Assessment
• Solve the six problems on the sheet.
• Each problem is marked as follows:
  - Null hypothesis: 1 mark
  - Give your choice of test: 1 mark
  - Say why you’ll use that test: 1 mark
  - Calculate the test statistic: 1 mark
  - Interpret the test statistic in relation to your null hypothesis. Use the words probability and chance in your answer: 2 marks
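
When checking hand calculations for problems like these, a short script can recompute the test statistic. The following is a minimal Python sketch, not part of the original sheet: the function names (`average_ranks`, `spearman_rs`, `chi_squared`) are made up for illustration, and the data are taken from the Spearman's rank and chi-squared worked examples. It keeps unrounded intermediate values, so χ² comes out at about 5.16 rather than the 5.2 obtained by rounding each term to 2.6; the conclusion is the same either way.

```python
def average_ranks(values):
    """Rank lowest to highest; tied values share the mean of their ranks."""
    ordered = sorted(values)
    ranks = []
    for v in values:
        first = ordered.index(v)   # 0-based position of the first copy of v
        count = ordered.count(v)   # number of tied copies of v
        # The ties occupy ranks first+1 .. first+count; take their mean.
        ranks.append((first + 1 + first + count) / 2)
    return ranks


def spearman_rs(xs, ys):
    """Spearman's rank coefficient: 1 - 6 * sum(D^2) / (n^3 - n)."""
    n = len(xs)
    sum_d2 = sum((rx - ry) ** 2
                 for rx, ry in zip(average_ranks(xs), average_ranks(ys)))
    return 1 - (6 * sum_d2) / (n ** 3 - n)


def chi_squared(observed, expected):
    """Chi-squared: sum of (O - E)^2 / E over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Spearman's rank worked example (values kept in their original pairs).
mass_n = [41, 41, 51, 56, 63, 69, 72]            # kg per hectare
nitrates = [1.2, 1.3, 1.5, 1.8, 1.6, 1.9, 2.0]   # mg per dm^3
rs = spearman_rs(mass_n, nitrates)
print(f"rs = {rs:.3f}")  # 0.955

# Chi-squared worked example: expected values are the mean of the observed.
observed = [26, 12]
expected = [sum(observed) / len(observed)] * len(observed)  # 19 and 19
chi2 = chi_squared(observed, expected)
print(f"chi-squared = {chi2:.2f}")  # 5.16
```

The script only produces the test statistic; the critical value still has to be looked up in the table and the conclusion written in terms of the null hypothesis, probability and chance.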