Lecture 2: Null Hypothesis Significance Testing Continued
Laura McAvinue
School of Psychology
Trinity College Dublin

Null Hypothesis Significance Testing
• Previous lecture, Steps of NHST
  – Specify the alternative/research hypothesis
  – Set up the null hypothesis
  – Collect data
  – Run the appropriate statistical test
  – Obtain the test statistic and associated p value
  – Decide whether to reject or fail to reject the null hypothesis on the basis of the p value

Null Hypothesis Significance Testing
• Decision to reject or fail to reject Ho
  – P value
  – Probability of obtaining results at least as extreme as those observed if Ho is true
  – By convention, use the significance level of p < .05
  – Conclude that it is highly unlikely that we would obtain these results by chance, so we reject Ho
  – Caveat! The fact that there is a significance level does not mean that there is a simple ‘yes’ or ‘no’ answer to your research question

Null Hypothesis Significance Testing
• If you obtain results that are not statistically significant (p > .05), this does not necessarily mean that the relationship you are interested in does not exist
• There are a number of factors that affect whether your results come out as statistically significant
  – One and two-tailed tests
  – Type I and Type II errors
  – Power

One and Two-tailed Tests
• One-tailed / Directional Test
  – Run this when you have a prediction about the direction of the results
• Two-tailed / Non-Directional Test
  – Run this when you don’t have a prediction about the direction of the results

Recall previous example…
• Research Qu
  – Do anxiety levels of students differ from anxiety levels of young people in general?
• Prediction
  – Due to the pressure of exams and essays, students are more stressed than young people in general
• Method
  – You know the mean score for the normal young population on the anxiety measure = 50
  – You predict that your sample will have mean > 50
  – Run a one-tailed one-sample t test at p < .05 level

One-tailed Test
• Compare the mean of your sample to the sampling distribution for the population mean
• Decide to reject Ho if your sample mean falls into the highest 5% of the sampling distribution

Dilemma
• But! What if your prediction is wrong?
  – Perhaps students are less stressed than the general young population
    • Their own bosses, summers off, no mortgages
  – With the previous one-tailed test, you could only reject Ho if you got an extremely high sample mean
  – What if you get an extremely low sample mean?
• Run a two-tailed test
  – Hedge your bets
  – Reject Ho if you obtain scores at either extreme of the distribution, very high or very low sample mean

Two-tailed Test
• You will reject Ho when a score appears in the highest 2.5% of the distribution or the lowest 2.5%
• Note that it’s not the highest 5% and the lowest 5%, as then you’d be operating at the p = .1 level, rejecting Ho for 10% of the distribution
• So, we gain the ability to reject Ho for extreme values at either end, but values must be more extreme

Errors in NHST
• Howell (2008) p. 157
  – “Whenever we reach a decision with a statistical test, there is always a chance that our decision is the wrong one”
• Misleading nature of NHST
  – Because there is a significance level (p = .05), people interpret NHST as a definitive exercise
  – Results are statistically significant or not
  – We reject Ho or we don’t
  – The Ho is wrong or right

Errors in NHST
• Remember we are dealing with probabilities
  – We make our decision on the basis of the likelihood of obtaining the results if Ho is true
  – There is always the chance that we are making an error
• Two kinds of Error
  – We reject Ho when it is true (Type I error)
    • We say there’s a significant difference when there’s not
  – We fail to reject Ho when it is false (Type II error)
    • We say there is no significant difference when there is

Type I Error
• Our anxiety example
• Predict that students will have a greater anxiety score than young people in general
• Test Ho that students’ anxiety levels do not differ from young people
• One-tailed one-sample t-test at p < .05
• Compare sample mean with sampling distribution of the mean for the population (Ho)

Type I Error
• Decide to reject Ho if your sample mean falls in the top 5% of the distribution
• But!
• This 5%, even though at the extreme end, still belongs to the distribution
• If your sample mean falls within this top 5%, there is still a chance that your sample came from the Ho population

Type I Error
• For example, if p = .04, a sample mean this extreme would occur only 4% of the time if Ho were true
  – But it can still occur, so you could be rejecting Ho when it is in fact true
• Researchers are willing to accept this small risk (5%) of making a Type I error, of rejecting Ho when it is in fact true
• Probability of making a Type I error = alpha = the significance level that you chose
  – .05, .01

Type II Error
• So why not set a very low significance level to minimise your risk of making a Type I error?
  – Set p < .01 rather than p < .05
• As you decrease the probability of making a Type I error, you increase the probability of making a Type II error (all else being equal)
• Type II Error
  – Fail to reject Ho when it is false
  – Fail to detect a significant relationship in your data when a true relationship exists
• For argument’s sake, imagine that H1 is correct
• Sampling Distribution under Ho
• Sampling Distribution under H1
• Reject Ho if sample mean equals any value to the right of the critical value (red region)
  – Correct Decision
• Fail to reject Ho if sample mean equals any value to the left of the critical value
  – Type II Error

Four Outcomes of Decision Making

                        True State of Nature
Decision                Ho is True          Ho is False
Fail to reject Ho       Correct Decision    Type II Error
Reject Ho               Type I Error        Correct Decision

Power
• You should minimise both Type I and Type II errors
  – In reality, people are often very careful about Type I (i.e. strict about α) but ignore Type II altogether
• If you ignore Type II error, your experiment could be doomed before it begins
  – Even if a true effect exists (i.e. H1 is correct), if β is high, the results may not show a statistically significant effect
• How do you reduce the probability of a Type II error?
  – Increase the power of the experiment

Power
• Power
  – The probability of correctly rejecting a false Ho
  – A measure of the ability of your experiment to detect a significant effect when one truly exists
  – 1 − β

How do we increase the power of our experiment?
• Factors affecting power
  – The significance level (α)
  – One-tailed v two-tailed test
  – The true difference between Ho and H1 (μ0 − μ1)
  – Sample Size (n)

The Influence of α on Power
• Reduce the significance level (α)…
  – Reduce the probability of making a Type I error
    • Rejecting the Ho when it is true
  – Increase the probability of making a Type II error
    • Failing to reject the Ho when it is false
  – Reduce the power of the experiment to detect a true effect as statistically significant
• Reduce α and reduce power
• Increase α and increase power
• But! You increase the probability of a Type I error!

The Influence of One v Two-tailed Tests on Power
• We lose power with a two-tailed test
  – α is divided across the two tails of the distribution
  – Values must be more extreme to be statistically significant

The Influence of the True Difference between Ho and H1
• The bigger the difference between μ0 and μ1, the easier it is to detect it

The Influence of Sample Size on Power
• The bigger the sample size, the more power you have
• A big sample provides a better estimate of the population mean
• With bigger sample sizes, the sampling distribution for the mean clusters more tightly around the population mean
• The standard deviation of the sampling distribution, known as the standard error of the mean, is reduced
• There is less overlap between the sampling distributions under Ho and H1
• The power to detect a significant difference increases

Sample Size Exercise
• Open the following dataset
  – Software / Kevin Thomas / Power dataset (revised)
  – Explores the effects of Therapy on Depression
• Perform two Independent Samples t-tests
  – Analyse / Compare means / Independent Samples t test
  – Group represents Therapy v Control
  – Score represents post-treatment depression
  – 1. Group 1 & Score 1
  – 2. Group 2 & Score 2

Complete the following table

                              Analysis 1   Analysis 2
Size of sample
Therapy mean score
Therapy standard deviation
Control mean score
Control standard deviation
Mean difference
T statistic
df
P-value

What explains these results?

                              Analysis 1   Analysis 2
Size of sample                20           200
Therapy mean score            5.5          5.5
Therapy standard deviation    3.03         2.89
Control mean score            6.3          6.3
Control standard deviation    2.75         2.62
Mean difference               -.8          -.8
T statistic                   -.618        -2.051
df                            18           198
P-value                       .54          .042

So, how do I increase the power of my study?
• You can’t manipulate the true difference between Ho and H1
• You could increase your significance level (α), but then you would increase the risk of a Type I error
• If you have a strong prediction about the direction of the results, you should run a one-tailed test
• The factor that is most under your control is sample size
  – Increase it!
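
The points above can be checked numerically. The following is a minimal Python sketch, not part of the original lecture. The first function recomputes the independent-samples t statistics of the Sample Size Exercise from the summary values in the table, assuming equal group sizes (10 per group and 100 per group, which is consistent with the reported df but is an assumption). The second function approximates power with a one-sample z test with known standard deviation, which is a simplification of the t tests used in the lecture. The function names and the example values passed to `z_test_power` (a true difference of 0.8 and a standard deviation of 2.8) are hypothetical and chosen only for illustration.

```python
from math import sqrt
from statistics import NormalDist


def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Independent-samples t statistic (equal variances assumed) and its df,
    computed from summary statistics only."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    standard_error = sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / standard_error, df


def z_test_power(true_diff, sigma, n, alpha=0.05, two_tailed=True):
    """Approximate power (1 - beta) of a one-sample z test.

    Assumes a known population standard deviation `sigma` and, for the
    one-tailed case, a prediction in the same direction as the true difference.
    """
    z = NormalDist()
    # How far the H1 sampling distribution sits from the Ho one, in standard errors.
    shift = abs(true_diff) * sqrt(n) / sigma
    if two_tailed:
        crit = z.inv_cdf(1 - alpha / 2)  # alpha is split across both tails
        return (1 - z.cdf(crit - shift)) + z.cdf(-crit - shift)
    crit = z.inv_cdf(1 - alpha)  # all of alpha sits in one tail
    return 1 - z.cdf(crit - shift)


if __name__ == "__main__":
    # Same mean difference, different sample sizes (values from the exercise table).
    t1, df1 = pooled_t(5.5, 3.03, 10, 6.3, 2.75, 10)
    t2, df2 = pooled_t(5.5, 2.89, 100, 6.3, 2.62, 100)
    print(f"Analysis 1: t({df1}) = {t1:.3f}")
    print(f"Analysis 2: t({df2}) = {t2:.3f}")

    # Each factor affecting power, varied one at a time (hypothetical values).
    for n in (20, 200):
        for alpha in (0.01, 0.05):
            for two_tailed in (True, False):
                power = z_test_power(0.8, 2.8, n, alpha, two_tailed)
                tails = "two-tailed" if two_tailed else "one-tailed"
                print(f"n={n:3d} alpha={alpha:.2f} {tails}: power={power:.2f}")
```

Under these assumptions the loop shows the pattern described in the lecture: power rises with a larger sample, a larger α, and a one-tailed test, and the two t statistics differ only because the standard error shrinks as the sample grows.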