Practicals, Methodology & Statistics II
Laura McAvinue
School of Psychology
Trinity College Dublin

My details
• Laura McAvinue
• TCIN, Rm. 3.08
• 896 8414
• [email protected]

Aims of Course
• To provide students with practical skills needed to analyse quantitative data from empirical research, such as final year projects.
• Includes tuition on the most commonly used statistical procedures

Course Outline
• Null hypothesis significance testing & statistical power
• Correlation
• Regression
  – Simple linear regression
  – Multiple linear regression
• Factor analysis
• Analysis of variance (ANOVA)
  – Between groups ANOVA
  – Repeated measures ANOVA
  – Factorial ANOVA

Course Details
• Computer room
• Monday 4pm
• Thursday 11am
• Week 9: Two hour assessment
  – Wed 4th March 10-12
  – Thurs 5th March 11-1

Null Hypothesis Significance Testing
• Inferential Statistics
• Basis of most statistical tests that you will do
• Let’s begin with an example

Who likes sport more?
• Suppose you are interested in Irish men and women’s attitudes towards sport. You want to find out who likes sport more, men or women?
• You can’t possibly sample all Irish men and women
• So, you take a sample
  – Randomly select 20 Irish men and women
• You collect your data
  – Ask them to rate out of 10 how much they like sport

Descriptive Statistics
Men’s mean rating: 7.4 (SD = 1.3)
Women’s mean rating: 3.3 (SD = 1.5)

Descriptive Statistics
[Bar chart: mean rating (0 to 8) for men and women]

Ok…
• So, you have a difference
• Men = 7.4 & Women = 3.3
• Can you stop there and conclude that men like sport more than women?

Inferential Statistics
• No!
There are two possible explanations for this difference
• True difference
  – The difference in the samples represents a true or real difference in the populations
• Random Sampling Error
  – The difference in the samples does not reflect a true difference but is due to random sampling error or variation

Random Sampling Error: Variability of a statistic from sample to sample due to chance
[Figure: relative frequency distribution of sample means; x-axis marked 3, 5, 7]

[Figure: relative frequency distributions under the two hypotheses]
• Ho: Both the female and male means are estimates of the same population value
• Ha: The female and male means are estimates of the means of different populations (female mean = 3, male mean = 7)

Logic of statistical testing
• How do we distinguish between these two possibilities?
• Null Hypothesis Significance Testing
  – Proceeds in a series of steps

Steps of NHST
1. Specify an alternative/research hypothesis (Ha)
   There is a true difference between the groups
2. Set up the Null Hypothesis (Ho)
   There is no real difference between the groups; any difference is due to random sampling error
3. Collect your data

Steps of NHST (continued)
4. Run the appropriate statistical test
5. Obtain the test statistic and associated p-value
   The probability of obtaining these results (i.e. a test statistic at least this extreme) if the null hypothesis is true
6. Decide whether to reject or fail to reject Ho on the basis of the p-value

Decision Making
• P-value
  – Ranges from 0 to 1
  – Conditional probability
  – The probability of obtaining these results if Ho were true
• If p-value is small…
  – It is highly unlikely that we would obtain these results if Ho were true, so we can reject Ho in favour of Ha
• If p-value is large…
  – It is quite likely that we would obtain these results if Ho were true, so we cannot reject Ho

Decision Making
• But what is large and what is small?
• Convention
• Significance Level / Rejection Level
• p < .05
  – Reject Ho if there is less than a 5%, or less than a 1 in 20, chance of obtaining these results if Ho were true

Let’s go back to our example
• Do men and women differ in terms of how much they like sport?
• Open the following dataset in SPSS
  – Software/Kevin Thomas/NHST data set
• Run an independent samples t-test on the data
  – SPSS, Analyse, Compare Means, Independent samples t-test
  – Test variable: attitude
  – Grouping factor: sex
    • 1 = male
    • 2 = female

Example
• State the following…
  – Research / Alternative Hypothesis
  – Null Hypothesis
  – T-value
  – P-value
  – Is the p-value < or > .05?
  – Can we reject the null hypothesis?
  – What can you conclude?

SPSS Analysis
Independent Samples Test (SCORE)

Levene's Test for Equality of Variances
  F      Sig.
  .209   .653

t-test for Equality of Means
                             t      df      Sig. (2-tailed)  Mean Diff.  Std. Error Diff.  95% CI Lower
Equal variances assumed      6.622  18      .000             4.1000      .61914            2.79924
Equal variances not assumed  6.622  17.522  .000             4.1000      .61914            2.79669

• The Sig. (2-tailed) value indicates the probability of getting this t statistic if the null hypothesis is true
• We reject the null hypothesis if this value is less than 0.05

P-values
• For many researchers, attaining the “sig” p-value = Holy Grail
• Joy at p = 0.049 v Despair at p = 0.051
  – Difference between the values = .002
• Beware: an implication of NHST is that there can always be a simple “yes” or “no” answer as a result of a study
• Statistical significance does not equal clinical/practical significance or importance
  – A sufficiently large sample size can make even a trivial difference statistically significant

Logic of approach?
• Something seems odd about this approach…
  – Testing a hypothesis that is opposite to the one you wish to test
• Falsification
  – Fisher
  – It is difficult to prove a statement but you can disprove it
    • “All dogs have one tail”
    • Can’t be sure even after viewing 10,000 dogs with one tail
    • But view one dog with two tails…
• Provides a useful starting point for statistical tests

Underlying Theory
• Sampling Distributions
• Recall Random Sampling Error
  – Statistics obtained from samples will differ from the true population parameters and will vary from sample to sample
• Sampling Distribution of the Mean
  – Take all possible combinations of samples from a population
  – Compute the mean for each sample and plot them
  – Normal Distribution
    • Cluster of samples with mean values close to the population mean
    • Smaller and smaller no. of samples with means far from the population mean

Sampling Distribution of the Mean
You can use the sampling distribution of a population to determine the likelihood that a sample you have collected came from that population

• Sampling distribution for a population of scores on an anxiety questionnaire for normal young people (mean = 50)
• Data on same questionnaire for young students (mean = 60)
• Qu: Do students’ levels of anxiety differ significantly from levels of young people in general?
• Qu: How likely is it that this sample of students came from this population?
• Ans: Calculate the % of samples with a mean of 60 or more
• If the % is very small (< 5%), conclude that it is unlikely that this sample came from this population. This sample of students shows significantly higher anxiety levels than normal participants

Sampling Distribution of a Test Statistic
• Sample statistics
  – mean, median, variance, etc.
• Test statistics
  – Results of statistical procedures
  – t, F, chi-square, etc.
• All have their own sampling distributions
  – Can be used just like the sampling distribution of the mean

Let’s take the T Distribution
• Recall earlier T-test
  – Investigate if there was a statistically significant difference between two groups
  – Calculated t for the two groups
    • T represented the difference between the two means
    • (Observed difference / standard error of the difference)
  – P-value
    • Probability of obtaining a t statistic this size if Ho were true
• In the background…
  – P-value was calculated by comparing our computed t statistic with the sampling distribution for t when Ho is true

T Distribution
• Sampling Distribution for t when Ho is true
  – Created by taking pairs of samples from one population, calculating t for each pair and plotting the resulting sampling distribution
• Compare t statistic with sampling distribution
  – What percentage of samples in the sampling distribution have a t value as big as or bigger than the t statistic we found?
• Make your decision
  – If the % of samples is very small (< 5%), you can conclude that it is highly unlikely that this t statistic came from a t distribution where Ho is true
  – You can reject Ho and say that it is highly unlikely that your two samples came from the same population; their means are significantly different

Summary
• Purpose of Inferential Statistics
  – Making statements about populations based on samples
• Steps of Null Hypothesis Significance Testing
• Underlying Theory
  – Sampling distributions
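
Worked sketch (not part of the SPSS practical)
The logic of the T Distribution slides can be tried out in a few lines of Python. This is only a sketch under stated assumptions: the 20 participants are assumed to be split 10 men / 10 women (consistent with df = 18), the population is assumed normal, and the function names (pooled_t, null_t_distribution, two_tailed_p) are made up for illustration. Because it works from the rounded means and SDs on the Descriptive Statistics slide, it gives t of roughly 6.5 rather than the 6.622 reported by SPSS, presumably because the slide's SDs are rounded.

```python
import math
import random


def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """t = observed difference / standard error of the difference
    (independent samples, equal variances assumed)."""
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)
    se_diff = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / se_diff


def mean_sd(xs):
    """Sample mean and sample standard deviation (n - 1 denominator)."""
    m = sum(xs) / len(xs)
    var = sum((x - m) ** 2 for x in xs) / (len(xs) - 1)
    return m, math.sqrt(var)


def null_t_distribution(n1, n2, n_pairs=20000, seed=1):
    """Sampling distribution of t when Ho is true: draw pairs of samples
    from ONE (assumed normal) population and compute t for each pair."""
    rng = random.Random(seed)
    ts = []
    for _ in range(n_pairs):
        a = [rng.gauss(0, 1) for _ in range(n1)]
        b = [rng.gauss(0, 1) for _ in range(n2)]
        (m1, s1), (m2, s2) = mean_sd(a), mean_sd(b)
        ts.append(pooled_t(m1, s1, n1, m2, s2, n2))
    return ts


def two_tailed_p(observed_t, null_ts):
    """Proportion of null t values at least as extreme as the observed t."""
    extreme = sum(1 for t in null_ts if abs(t) >= abs(observed_t))
    return extreme / len(null_ts)


# Rounded summary statistics from the slides; 10 per group is an assumption.
t_obs = pooled_t(7.4, 1.3, 10, 3.3, 1.5, 10)
null_ts = null_t_distribution(10, 10)
p_sim = two_tailed_p(t_obs, null_ts)

print(f"t = {t_obs:.2f}, simulated two-tailed p = {p_sim:.4f}")
print("Reject Ho" if p_sim < 0.05 else "Fail to reject Ho")
```

The simulated p-value is only as fine-grained as the number of simulated pairs, so a very small true p will simply show up as 0 out of 20,000; SPSS instead reads the p-value off the theoretical t distribution with 18 degrees of freedom.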