Download Inferential statistics - Moodle

Statistics
• Intro to statistics
• Presentations
• More on how to do qualitative analysis
• Tutorial time

Inferential statistics

Descriptive vs inferential statistics
• Descriptive statistics such as totals (how many people came?), percentages (what proportion of the total were adolescents?) and averages (how much did they enjoy it?) use numbers to describe things that happen.
• Inferential statistics infer or predict the differences and relationships between things. They also tell us how certain or confident we can be about those predictions.

Descriptive data page

Why statistics are important
Statistics are concerned with difference: how much does one feature of an environment differ from another?
[Figure: suicide rates per 100,000 people]

Why statistics are important
Relationships: how much does one feature of the environment change as another measure changes?
[Figure: the response of the fear centre of white people to black faces, depending on their exposure to diversity as adolescents]

The two tasks of statistics
• Magnitude: what is the size of the difference, or the strength of the relationship?
• Reliability: to what degree can the measures of magnitude be replicated with other samples drawn from the same population?

Magnitude: what's our measure?
• A raw number?
• Some aggregate of numbers? Mean, median, mode?
[Figure: suicide rates per 100,000 people]

Arithmetic mean or average

Overall rating (A)   General (B)   A×B   Unitec (C)   A×C
2                    1             2     1
3                    0             0     2
4                    3             12    0
5                    4                   7
6                    3                   6
7                    12                  8
8                    38                  16
9                    28                  10
10                   57                  14
N / total            146           ___   64           ___

The mean (M, or X̄) is the sum (ΣX) of all the sample values divided by the sample size (N):
$$M = \frac{\sum X}{N} = \frac{X_1 + X_2 + X_3 + \dots + X_N}{N}$$

Compute the mean
               General   Unitec
Total (ΣX)     1262      493
N              146       64
Mean           8.64      7.70

The median
• The median is the "middle" value of the sample: there are as many sample values above the median as below it.
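As a quick check of the "Compute the mean" arithmetic, here is a minimal Python sketch that multiplies each rating by how many people gave it (the A×B and A×C columns), sums the products to get ΣX and divides by N. The counts are transcribed from the slide's frequency table; `mean_from_frequencies` is a hypothetical helper name used only for illustration.

```python
# Frequency tables transcribed from the slides: overall rating -> number of people.
RATINGS = [2, 3, 4, 5, 6, 7, 8, 9, 10]
GENERAL = [1, 0, 3, 4, 3, 12, 38, 28, 57]
UNITEC = [1, 2, 0, 7, 6, 8, 16, 10, 14]


def mean_from_frequencies(values, counts):
    """Return (sum of X, N, mean) for grouped data.

    Each rating X is multiplied by its frequency (the A*B or A*C column),
    the products are summed to give sum(X), and sum(X) is divided by N.
    """
    products = [x * n for x, n in zip(values, counts)]
    total = sum(products)
    n = sum(counts)
    return total, n, total / n


for name, counts in [("General", GENERAL), ("Unitec", UNITEC)]:
    total, n, mean = mean_from_frequencies(RATINGS, counts)
    print(f"{name}: sum(X) = {total}, N = {n}, mean = {mean:.2f}")
```

Run on the slide data, this reproduces the totals 1262 and 493 and the means 8.64 and 7.70.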
• If the number in the sample (N) is odd, the median is the value at position (N-1)/2 + 1 when the sample is ordered from smallest to largest. E.g. if N = 45, the median is the value at position (45-1)/2 + 1 = 23.
• If N is even, the median is the average of the values at positions N/2 and N/2 + 1. E.g. if N = 64, the median is the average of the values at positions 32 (64/2) and 33 (64/2 + 1).

Other measures of central tendency
• The mode is the single most frequently occurring data value. If two or more values occur equally often, the data set is called bimodal, tri-modal, etc.
• The midrange is the midpoint of the sample: the average of the smallest and largest data values. Here it is (2 + 10)/2 = 6 for both groups.
• The geometric mean (log transformation) = 8.46 (General) and 7.38 (Unitec).
• The harmonic mean (inverse transformation) = 8.19 (General) and 6.94 (Unitec).
• Both of these last measures give less weight to very high scores (and correspondingly more weight to low ones) than the arithmetic mean does.

Overall rating   2   3   4   5   6   7    8    9    10   N
General          1   0   3   4   3   12   38   28   57   146
Unitec           1   2   0   7   6   8    16   10   14   64

Compute the median and mode

Means, median, mode
                  General   Unitec
N                 146       64
Mean              8.64      7.70
Median            9         8
Mode              10        8
Geometric mean    8.46      7.38
Harmonic mean     8.19      6.94

[Figure: the underlying distribution of the data. y-axis: proportion of scores; x-axis: overall adults OAP rating; mean = median = mode = 8.36]

Normal distribution
[Figure: data that looks like a normal distribution]

Three things we must know before we can say events are different
1. The difference in mean scores of two or more events: the bigger the gap between means, the greater the difference.
2. The degree of variability in the data: the less variability the better, as it suggests that differences between means are reliable.

Variance and standard deviation
These are estimates of the spread of data.
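The median position rules and the other averages can be checked with a short Python sketch. It assumes the frequency tables above are transcribed correctly and expands them into individual scores; the function names are illustrative, not taken from the slides. On those tables it should give a median of 9 and 8, a mode of 10 and 8, a midrange of 6, geometric means of 8.46 and 7.38 and harmonic means of 8.19 and 6.94 (General and Unitec respectively).

```python
import math

# Frequency tables from the slides: rating -> number of people giving it.
RATINGS = [2, 3, 4, 5, 6, 7, 8, 9, 10]
GENERAL = [1, 0, 3, 4, 3, 12, 38, 28, 57]
UNITEC = [1, 2, 0, 7, 6, 8, 16, 10, 14]


def expand(values, counts):
    """Turn a frequency table into the full list of individual scores."""
    return [x for x, n in zip(values, counts) for _ in range(n)]


def median_by_position(scores):
    """Median using the slide's position rules (positions counted from 1)."""
    ordered = sorted(scores)
    n = len(ordered)
    if n % 2 == 1:
        # value at position (N-1)/2 + 1, i.e. index (N-1)/2 in 0-based Python
        return ordered[(n - 1) // 2]
    # average of the values at positions N/2 and N/2 + 1 (indices N/2 - 1 and N/2)
    return (ordered[n // 2 - 1] + ordered[n // 2]) / 2


def modes(values, counts):
    """All values with the highest frequency (more than one means bimodal etc.)."""
    top = max(counts)
    return [x for x, n in zip(values, counts) if n == top]


def central_tendency(values, counts):
    scores = expand(values, counts)
    n = len(scores)
    return {
        "median": median_by_position(scores),
        "mode": modes(values, counts),
        "midrange": (min(scores) + max(scores)) / 2,
        # geometric mean: average the logs, then transform back with exp
        "geometric": math.exp(sum(math.log(x) for x in scores) / n),
        # harmonic mean: N divided by the sum of the reciprocals 1/X
        "harmonic": n / sum(1 / x for x in scores),
    }


for name, counts in [("General", GENERAL), ("Unitec", UNITEC)]:
    print(name, central_tendency(RATINGS, counts))
```

Python's `statistics` module also provides `median`, `multimode`, `geometric_mean` and `harmonic_mean`, which should agree with these hand-written versions; they are spelled out here to mirror the rules on the slides.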
They are calculated by measuring the distance between each data point and the mean.
• The variance (s²) is the average of the squared deviations of each sample value from the mean (strictly, the sum of squared deviations is divided by N - 1 rather than N).
• The standard deviation (s) is the square root of the variance.
$$s^2 = \frac{\sum (X - M)^2}{N - 1}, \qquad s = \sqrt{s^2}$$

Calculating the variance (s²) and the standard deviation (s) for the Unitec sample
Mean Unitec (Mu) = 7.70
Overall rating (X)   n (Unitec)   X - Mu   (X - Mu)² × n
2                    1            -5.70    32.5
3                    2            -4.70    44.2
4                    0            -3.70    0.0
5                    7            -2.70    51.1
6                    6            -1.70    17.4
7                    8            -0.70    4.0
8                    16           0.30     1.4
9                    10           1.30     16.8
10                   14           2.30     73.9
Total                64                    241.4
Variance s² = 241.4 / (64 - 1) = 3.83
Standard deviation s = √3.83 = 1.96

All normal distributions have similar properties: the percentage of scores lying between one standard deviation (s) below the mean and one standard deviation above it is always about 68.27%.

Is there a difference between Unitec and General overall OAP rating scores?
[Figure]

Is there a significant difference between Unitec and General OAP rating scores?
[Figure]

Three things we must know before we can say events are different
3. The extent to which the sample is representative of the population from which it is drawn: the bigger the sample, the greater the likelihood that it represents that population. Small samples have unstable means; big samples have stable means.

Estimating difference
The measure of the stability of the mean is the standard error of the mean (SEM): the standard deviation divided by the square root of the number in the sample.
$$SEM = \frac{s}{\sqrt{N}}$$
So the stability of the mean is determined by the variability in the sample (which can be affected by the consistency of measurement) and by the size of the sample. The SEM is the standard deviation of the normal distribution of the mean that we would get if we measured it again and again.

Yes, it's significant. The mean of the smaller sample (Unitec) is not too variable: its standard error of the mean is 1.96/√64 = 1.96/8 ≈ 0.24. 1.96 × SEM = 0.48, so the 95% confidence interval for the Unitec mean is 7.70 ± 0.48, i.e. 7.22 to 8.18. The General mean (8.64) falls outside this confidence interval.
[Figure]

Is the difference between means significant?
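The variance table, the standard error and the 95% confidence interval can be reproduced with the minimal Python sketch below. It works from the transcribed frequency tables and uses the unrounded Unitec mean (7.703...), which is what the (X - Mu)² × n column appears to have been computed with; `describe` is an illustrative name, not something from the slides.

```python
import math

# Frequency tables from the slides: rating -> number of people giving it.
RATINGS = [2, 3, 4, 5, 6, 7, 8, 9, 10]
GENERAL = [1, 0, 3, 4, 3, 12, 38, 28, 57]
UNITEC = [1, 2, 0, 7, 6, 8, 16, 10, 14]


def describe(values, counts):
    """Return N, mean, sum of squares, variance, SD and SEM for grouped data."""
    n = sum(counts)
    mean = sum(x * c for x, c in zip(values, counts)) / n
    # sum of the (X - M)^2 * n column: total squared deviation from the mean
    ss = sum(c * (x - mean) ** 2 for x, c in zip(values, counts))
    variance = ss / (n - 1)        # s^2 = sum (X - M)^2 / (N - 1)
    sd = math.sqrt(variance)       # s = square root of the variance
    sem = sd / math.sqrt(n)        # standard error of the mean = s / sqrt(N)
    return n, mean, ss, variance, sd, sem


n_u, mean_u, ss_u, var_u, sd_u, sem_u = describe(RATINGS, UNITEC)
mean_g = describe(RATINGS, GENERAL)[1]

margin = 1.96 * sem_u  # half-width of the 95% confidence interval
low, high = mean_u - margin, mean_u + margin

print(f"Unitec: SS = {ss_u:.1f}, s^2 = {var_u:.2f}, s = {sd_u:.2f}, SEM = {sem_u:.2f}")
print(f"95% CI for the Unitec mean: {low:.2f} to {high:.2f}")
print("General mean outside the CI:", not (low <= mean_g <= high))
```

The same function applied to the General table gives the variance of 2.34 used in the t-test output.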
What is clear is that the mean of the General group lies outside the range in which there is a 95% chance that the mean of the Unitec group will fall, so it is likely that the General mean comes from a different population than the Unitec mean.

The convention is that if mean 2 falls outside the range (the confidence interval) where 95% of estimates of mean 1 are expected to fall, then mean 2 is significantly different from mean 1. We say that the probability of getting a difference this large if mean 1 and mean 2 were really the same is less than 0.05 (p < 0.05), and the difference is significant.

The significance of significance
• Not an opinion
• A sign that very specific criteria have been met
• A standardised way of saying that:
  there is a difference between two groups (p < 0.05);
  there is no significant difference between two groups (p > 0.05);
  there is a predictable relationship between two measures (p < 0.05); or
  there is no predictable relationship between two measures (p > 0.05).
• A way of getting around the problem of variability

One- and two-tailed tests
[Figure: 1-tailed vs 2-tailed test. In the 2-tailed test, 95% of the distribution lies between -1.96 and +1.96 standard deviations, with 2.5% of the distribution in each tail.]
If you argue for a one-tailed test, saying the difference can only be in one direction, then you can move the 2.5% error from the tail where no difference is expected to the tail where it is. The whole 5% then sits in one tail, so the cut-off is about 1.645 standard deviations instead of 1.96.

T-test result
t-Test: Two-Sample Assuming Unequal Variances (critical values for p < 0.05)
                      General adults   Unitec adults
Mean                  8.64             7.70
Variance              2.34             3.83
Observations          146              64
t Stat                3.41
p one-tail            0.00
t Critical one-tail   1.66
p two-tail            0.00
t Critical two-tail   1.98

                      Massey   Unsworth Heights
Mean                  9.23     8.33
Variance              1.20     4.24
Observations          52       15
t Stat                1.62
p one-tail            0.06
t Critical one-tail   1.75
p two-tail            0.12
t Critical two-tail   2.12

                      Male     Female
Mean                  8.94     8.65
Variance              1.55     2.28
Observations          83       125
t Stat                1.52
p one-tail            0.07
t Critical one-tail   1.65
p two-tail            0.13
t Critical two-tail   1.97
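The t-test outputs above can be approximated with a minimal Python sketch of the unequal-variances (Welch) t statistic, assuming the spreadsheet uses the standard Welch formula with Welch-Satterthwaite degrees of freedom. It works from the rounded means, variances and sample sizes printed in the tables, so t may differ from the slides in the second decimal place. The critical values are copied from the slides rather than computed, because the Python standard library has no t distribution; `welch_t` and `COMPARISONS` are illustrative names. The last lines use the normal distribution to show the 1.96 (two-tailed) and 1.645 (one-tailed) cut-offs.

```python
import math
from statistics import NormalDist


def welch_t(mean1, var1, n1, mean2, var2, n2):
    """t statistic and Welch-Satterthwaite degrees of freedom for two samples
    whose variances are not assumed to be equal."""
    a, b = var1 / n1, var2 / n2  # squared standard error of each mean
    t = (mean1 - mean2) / math.sqrt(a + b)
    df = (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))
    return t, df


# (mean, variance, n) for each group, plus the two-tailed critical t from the slides
COMPARISONS = {
    "General vs Unitec": ((8.64, 2.34, 146), (7.70, 3.83, 64), 1.98),
    "Massey vs Unsworth Heights": ((9.23, 1.20, 52), (8.33, 4.24, 15), 2.12),
    "male vs female": ((8.94, 1.55, 83), (8.65, 2.28, 125), 1.97),
}

results = {}
for label, (g1, g2, t_crit) in COMPARISONS.items():
    t, df = welch_t(*g1, *g2)
    significant = abs(t) > t_crit  # two-tailed test at p < 0.05
    results[label] = (t, df, significant)
    verdict = "p < 0.05" if significant else "p > 0.05"
    print(f"{label}: t = {t:.2f}, df = {df:.0f}, critical t = {t_crit} -> {verdict}")

# For large samples the t distribution approaches the normal distribution,
# whose 5% cut-offs are about 1.96 SDs (two-tailed) and 1.645 SDs (one-tailed).
z_two = NormalDist().inv_cdf(0.975)
z_one = NormalDist().inv_cdf(0.95)
print(f"Normal cut-offs: two-tailed {z_two:.2f}, one-tailed {z_one:.3f}")
```

Only the General vs Unitec comparison exceeds its critical value, matching the p-values in the tables. For exact p-values, SciPy's `scipy.stats.ttest_ind_from_stats(..., equal_var=False)` accepts the same summary statistics, though it expects standard deviations rather than variances.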
