Inferential statistics

Why statistics are important
• Statistics are concerned with difference: how much does one feature of an environment differ from another?
• Magnitude: the comparative strength of two variables.
• Reliability: the degree to which the measure of the magnitude of a variable can be replicated with other samples drawn from the same population.
• Relationships: how much does one feature of the environment change as another measure changes?
  Correlation or regression: r = 0.73, N = 20, p < 0.01

Arithmetic mean or average
The mean (M or X̄) is the sum (ΣX) of all the sample values (X1 + X2 + X3 + … + X22) divided by the sample size (N). With ΣX = 45 and N = 22, M = ΣX/N = 45/22 = 2.05.

The median
• The median is the "middle" value of the sample. There are as many sample values above the sample median as below it.
• If the sample size is odd (say, 2a + 1), then the median is the (a+1)st largest data value. If the sample size is even (say, 2a), then the median is defined as the average of the ath and (a+1)st largest data values.

Other measures of central tendency
• The mode is the single most frequently occurring data value.
• The midrange is the midpoint of the sample: the average of the smallest and largest data values in the sample.
• Exercise: find the mean, median and mode.

[Figure: ecological footprint histogram; x-axis: ecological footprint score, in bins from 61-65 to 91-95; y-axis: frequency]

The underlying distribution of the data
[Figure: normal distribution of the ecological footprint; x-axis: ecological footprint, 50 to 100; y-axis: proportion of scores; Mean = 77.48, SD = 7.15, N = 62]

Normal distribution
All normal distributions have similar properties.
The percentage of scores lying between one standard deviation (s) below the mean and one standard deviation above it is always 68.26%.

[Figure: normal curve with Mean = 77.48, SD = 7.15, N = 62; axis marked at -2SD (-14.30), -1SD (-7.15), 0, +1SD (+7.15), +2SD (+14.30)]

Is there a difference between Rich and Poor scores?
[Figure: histogram of Rich vs Poor ecological footprint scores; x-axis: ecological footprint scores, in bins from 61-65 to 91-95; y-axis: frequency]

Is there a significant difference between Polynesian and "other" scores?
Mean = 75.0, SD = 6.8, N = 20 versus Mean = 81.9, SD = 6.5, N = 20

Three things we must know before we can say events are different
1. The difference in mean scores of two or more events: the bigger the gap between means, the greater the difference.
2. The degree of variability in the data: the less variability the better.

Variance and standard deviation
These are estimates of the spread of data. They are calculated by measuring the distance between each data point and the mean. The variance (s²) is the average of the squared deviations of each sample value from the mean: s² = Σ(X − M)² / (N − 1). The standard deviation (s) is the square root of the variance.

Calculating the variance and the standard deviation for the Rich sample (Nx = 20)

  Rich (X)    X − M    (X − M)²
  72          -9.85     97.02
  75          -6.8      46.9
  75          -6.8      46.9
  76          -5.8      34.2
  76          -5.8      34.2
  76          -5.8      34.2
  77          -4.8      23.5
  77          -4.8      23.5
  78          -3.8      14.8
  80          -1.8       3.4
  80          -1.8       3.4
  82           0.2       0.0
  87           5.2      26.5
  87           5.2      26.5
  87           5.2      26.5
  88           6.2      37.8
  89           7.2      51.1
  89           7.2      51.1
  91           9.2      83.7
  95          13.2     172.9

  Total                      1637      838.55
  Mean (Mx)                  81.9
  Variance(x)                41.9
  Standard deviation (Sx)    6.5

Note: the tabulated variance is 838.55/20 = 41.9, that is, the sum of squares divided by N. Dividing by N − 1 = 19, as in the formula above, gives 44.1 and a standard deviation of 6.6.

Three things we must know before we can say events are different
3. The extent to which the sample is representative of the population from which it is drawn: the bigger the sample, the greater the likelihood that it represents the population from which it is drawn. Small samples have unstable means. Big samples have stable means.
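The Rich-sample calculation can be re-checked with a short Python sketch. It uses the 20 Rich scores exactly as listed in the table; the variable names are illustrative only, and both divisors (N − 1 and N) are computed so the two sets of figures can be compared.

```python
import math
import statistics

# The 20 Rich scores as listed in the table.
rich = [72, 75, 75, 76, 76, 76, 77, 77, 78, 80,
        80, 82, 87, 87, 87, 88, 89, 89, 91, 95]

n = len(rich)
mean = statistics.mean(rich)

# Sum of squared deviations from the mean: sum of (X - M)^2.
sum_sq = sum((x - mean) ** 2 for x in rich)

# Sample variance divides by N - 1; population variance divides by N.
var_sample = sum_sq / (n - 1)
var_population = sum_sq / n

# The standard deviation is the square root of the variance.
sd_sample = math.sqrt(var_sample)
sd_population = math.sqrt(var_population)

print(f"N = {n}, mean = {mean:.2f}, sum of squares = {sum_sq:.2f}")
print(f"divide by N - 1: variance = {var_sample:.1f}, SD = {sd_sample:.1f}")
print(f"divide by N:     variance = {var_population:.1f}, SD = {sd_population:.1f}")
```

The divide-by-N line reproduces the 41.9 and 6.5 shown in the table, while the divide-by-(N − 1) line follows the formula for s².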
Estimating difference
The measure of stability of the mean is the standard error of the mean (SEM) = standard deviation / the square root of the number in the sample. So the stability of the mean is determined by the variability in the sample (which can be affected by the consistency of measurement) and by the size of the sample. The SEM is the standard deviation of the normal distribution of the mean if we were to measure it again and again.

Is the difference between means significant?
Yes, it is significant. The standard errors of the mean are 1.45 and 1.53, so the 95% confidence interval will be about 3 points (1.96 × 1.5) either side of each mean. The means fall outside each other's confidence intervals.

What is clear is that the mean of the Rich group is well outside the area where there is a 95% chance that the mean for the Poor group will fall, so it is likely that the Rich mean comes from a different population than the Poor mean. The convention is to say that if mean 2 falls outside the area (the confidence interval) where 95% of mean 1 scores are estimated to be, then mean 2 is significantly different from mean 1. We say the probability of mean 1 and mean 2 being the same is less than 0.05 (p < 0.05) and the difference is significant.

The significance of significance
• Not an opinion
• A sign that very specific criteria have been met
• A standardised way of saying that:
  There is a difference between two groups (p < 0.05);
  There is no difference between two groups (p > 0.05);
  There is a predictable relationship between two groups (p < 0.05); or
  There is no predictable relationship between two groups (p > 0.05).
• A way of getting around the problem of variability

1-tailed test and 2-tailed test
[Figure: M1 distribution with 95% of the distribution in the centre and 2.5% in each tail]
If you argue for a one-tailed test, saying the difference can only be in one direction, then you can add the 2.5% error from the side where no data is expected to the side where it is.

T-test results
t-Test: Two-Sample Assuming Equal Variances

                                  Poor     Rich
  Mean                            75       81.9
  Variance                        49.1     44.1
  Observations                    20       20
  Pooled Variance                 46.6
  Hypothesized Mean Difference    0
  df                              38
  t Stat                          -3.2
  P(T<=t) one-tail                0
  t Critical one-tail             1.69
  P(T<=t) two-tail                0
  t Critical two-tail             2.02

Tests of significance
• Tests of difference: t-tests, analysis of variance, chi-square, odds ratios
• Tests of relationship: correlation, regression analysis
• Tests of difference and relationship: analysis of covariance, multiple regression analysis

Chi-squared (χ²) comparison of age in the sample vs the Waitakere population

  Age             Observed       Expected         O − E    (O − E)²   (O − E)²/E
                  (sample) O     (Waitakere) E
  16-34 years     26             23.35             2.65     7.00       0.30
  35-54           23             23.85            -0.85     0.72       0.03
  55-74           10             11.52            -1.52     2.30       0.20
  75 and older    3              3.29             -0.29     0.09       0.03
  Total           62             62.01                                 χ² = 0.56

N = 4, DF = 3. Criterion at p = 0.05: χ² = 7.82. NS = not significant.

Values of chi-square for the research project
Only two of the six comparisons are significant, which means that there is no significant difference between the sample and the Waitakere population except for culture and qualifications.

  Group             Chi-squared    Chi-squared    P          Significance
                    obtained       criterion
  Occupation        15.56          21.03          p>0.05     NS
  Age               0.56           7.82           p>0.05     NS
  Family context    0.39           7.82           p>0.05     NS
  Culture           20.13          11.07          p<0.05     Significant
  Gender            0.01           3.84           p>0.05     NS
  Qualifications    6.12           5.99           p<0.05     Significant

Height and self-esteem

  Person    Height (inches) X    Self-esteem score/5 Y
  1         68                   4.1
  2         71                   4.6
  3         62                   3.8
  4         75                   4.4
  5         58                   3.2
  6         60                   3.1
  7         67                   3.8
  8         68                   4.1
  9         71                   4.3
  10        69                   3.7
  11        68                   3.5
  12        67                   3.2
  13        63                   3.7
  14        62                   3.3
  15        60                   3.4
  16        63                   4.0
  17        65                   4.1
  18        67                   3.8
  19        63                   3.4
  20        61                   3.6

r = Σ(X − MX)(Y − MY) / (N × SX × SY)

where r = correlation coefficient, X = height, Y = self-esteem, MX = mean of X, MY = mean of Y, SX = standard deviation of X, SY = standard deviation of Y.

r = 0.73, N = 20

Level of significance (two-tailed probabilities)

  Probability of error              0.1            0.05          0.01           0.001
  Chance of not being correlated    10% or 1/10    5% or 1/20    1% or 1/100    0.1% or 1/1000
  r value when n = 20               0.378          0.444         0.561          0.679

• One or two tails?
• What degrees of freedom?
• What level of significance should be chosen?

Correlations
[Figures, plotted as y against x: the perfect positive correlation; the perfect negative correlation; no correlation at all; a perfect relationship, but not a correlation]

How correlation is used and misused
Normality of residuals, linearity, and homoscedasticity
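The test statistics quoted in this section can be re-derived with a short Python sketch. It takes the summary figures and tables as given in the slides; the function names (pooled_t, chi_square, pearson_r) are illustrative and not part of the original material. One assumption is worth flagging: pearson_r uses standard deviations computed with N in the denominator, which is what the formula with N × SX × SY requires; with N − 1 standard deviations the divisor would be N − 1 instead.

```python
import math

def pooled_t(mean1, var1, n1, mean2, var2, n2):
    """Two-sample t statistic assuming equal variances.

    Returns (t, pooled variance, degrees of freedom).
    """
    df = n1 + n2 - 2
    pooled = ((n1 - 1) * var1 + (n2 - 1) * var2) / df
    std_err = math.sqrt(pooled * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / std_err, pooled, df

def chi_square(observed, expected):
    """Chi-squared statistic: sum of (O - E)^2 / E over categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def pearson_r(xs, ys):
    """Correlation coefficient r = sum((X-MX)(Y-MY)) / (N * SX * SY).

    SX and SY are computed with N in the denominator (assumption).
    """
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs) / n)
    sy = math.sqrt(sum((y - my) ** 2 for y in ys) / n)
    cross = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cross / (n * sx * sy)

# Poor vs Rich, from the t-test output table.
t, pooled, df = pooled_t(75.0, 49.1, 20, 81.9, 44.1, 20)

# Age categories: observed sample counts vs expected Waitakere counts.
chi2 = chi_square([26, 23, 10, 3], [23.35, 23.85, 11.52, 3.29])

# Height (inches) and self-esteem (score out of 5) for persons 1 to 20.
heights = [68, 71, 62, 75, 58, 60, 67, 68, 71, 69,
           68, 67, 63, 62, 60, 63, 65, 67, 63, 61]
esteem = [4.1, 4.6, 3.8, 4.4, 3.2, 3.1, 3.8, 4.1, 4.3, 3.7,
          3.5, 3.2, 3.7, 3.3, 3.4, 4.0, 4.1, 3.8, 3.4, 3.6]
r = pearson_r(heights, esteem)

print(f"t = {t:.1f}, pooled variance = {pooled:.1f}, df = {df}")
print(f"chi-squared = {chi2:.2f}")
print(f"r = {r:.2f}")
```

Each statistic is then compared with its criterion value from the tables: |t| against 2.02 (two-tailed, df = 38), chi-squared against 7.82 (DF = 3), and r against the n = 20 thresholds.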