Why statistics is important

Inferential statistics
Why statistics are important
• Statistics are concerned with difference –
how much does one feature of an
environment differ from another
• Magnitude: The comparative strength of
two variables.
• Reliability: The degree to which the
measure of the magnitude of a variable can
be replicated with other samples drawn
from the same population.
Why statistics are important
• Relationships – how much does one feature of
the environment change as another measure
changes
Correlation or regression
r=0.73
N=20
p<0.01
Arithmetic mean or average
The mean (M or X̄) is the sum (ΣX) of all the
sample values (X1 + X2 + X3 + … + X22)
divided by the sample size (N).
ΣX = 45, N = 22. M = ΣX/N = 45/22 = 2.05
The median
• The median is the "middle" value of the sample.
There are as many sample values above the
sample median as below it.
• If the sample size is odd (say, 2a + 1), then
the median is the (a+1)st largest data value.
If the sample size is even (say, 2a), then the
median is defined as the average of the ath
and (a+1)st largest data values.
Other measures of central
tendency
• The mode is the single most frequently
occurring data value.
• The midrange is the midpoint of the sample
-- the average of the smallest and largest
data values in the sample.
• Find the Mean, Median and Mode
[Figure: ecological footprint histogram. x-axis: ecological footprint score in bins 61-65, 66-70, 71-75, 76-80, 81-85, 86-90, 91-95; y-axis: frequency, 0 to 18]
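The four measures of central tendency defined above can be sketched on raw scores. The histogram's underlying frequencies are not recoverable here, so this example assumes the 20 Rich-group scores from the worked variance table later in the deck; the variable names are illustrative only.

```python
import statistics

# Rich-group ecological footprint scores, copied from the worked table in this deck.
rich = [72, 75, 75, 76, 76, 76, 77, 77, 78, 80,
        80, 82, 87, 87, 87, 88, 89, 89, 91, 95]

mean = sum(rich) / len(rich)            # sum of X divided by N
median = statistics.median(rich)        # N is even, so the average of the two middle values
modes = statistics.multimode(rich)      # every value tied for most frequent
midrange = (min(rich) + max(rich)) / 2  # midpoint of the smallest and largest values

print(mean, median, modes, midrange)
```

In this sample two values tie for most frequent, so "the single most frequently occurring value" does not exist and `multimode` reports both.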
The underlying distribution of the data
[Figure: Normal distribution of the ecological footprint. Mean = 77.48, SD = 7.15, N = 62. x-axis: ecological footprint, 50 to 100; y-axis: proportion of scores, 0 to 0.075]
Normal distribution
All normal distributions have similar properties. The
percentage of the scores that is between one standard
deviation (s) below the mean and one standard deviation
above is always about 68.27%.
[Figure: normal curve, Mean = 77.48, SD = 7.15, N = 62, with the x-axis marked at -2SD (-14.30), -1SD (-7.15), 0, +1SD (+7.15) and +2SD (+14.30)]
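The share of scores within one or two standard deviations of the mean can be checked numerically. This is a minimal sketch using Python's standard-library `NormalDist`, assuming the scores really are normally distributed with the mean and SD given on the slide; `share_within` is a helper name introduced here for illustration.

```python
from statistics import NormalDist

# Mean and SD of the ecological footprint scores, as given on the slide.
footprint = NormalDist(mu=77.48, sigma=7.15)

def share_within(k):
    """Proportion of the distribution lying within k SDs of the mean."""
    lo = footprint.mean - k * footprint.stdev
    hi = footprint.mean + k * footprint.stdev
    return footprint.cdf(hi) - footprint.cdf(lo)

within_1sd = share_within(1)  # roughly two thirds of scores
within_2sd = share_within(2)  # roughly 95% of scores
print(round(within_1sd, 4), round(within_2sd, 4))
```

The result does not depend on the particular mean and SD, which is the point of the slide: any normal distribution gives the same proportions.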
Is there a difference between Rich and
Poor scores?
[Figure: Histogram of rich vs poor ecological footprint scores. Series: Rich, Poor. x-axis: ecological footprint scores in bins 61-65 to 91-95; y-axis: frequency, 0 to 10]
Is there a significant difference between
Polynesian and “other” scores
Mean = 75.0, SD = 6.8, N = 20
Mean = 81.9, SD = 6.5, N = 20
Three things we must know before
we can say events are different
1. the difference in mean scores of two or
more events
- the bigger the gap between means the
greater the difference
2. the degree of variability in the data
- the less variability the better
Variance and Standard Deviation
These are estimates of the spread of data.
They are calculated by measuring the
distance between each data point and the
mean.
The variance (s²) is the average of the squared
deviations of each sample value from the
mean: s² = Σ(X − M)²/(N − 1)
The standard deviation (s) is the square root
of the variance.
Calculating the variance and the standard deviation for the Rich sample
Rich (X)    X-M      (X-M)²
72          -9.85    97.02
75          -6.8     46.9
75          -6.8     46.9
76          -5.8     34.2
76          -5.8     34.2
76          -5.8     34.2
77          -4.8     23.5
77          -4.8     23.5
78          -3.8     14.8
80          -1.8     3.4
80          -1.8     3.4
82          0.2      0.0
87          5.2      26.5
87          5.2      26.5
87          5.2      26.5
88          6.2      37.8
89          7.2      51.1
89          7.2      51.1
91          9.2      83.7
95          13.2     172.9
Total       1637              838.55
Mean (Mx) = 81.9, Nx = 20
Variance(x) = 41.9 (838.55/20; dividing by N − 1 = 19 gives 44.1)
Standard deviation (Sx) = 6.5
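The Rich-sample calculation can be reproduced in a few lines. This is a sketch rather than the author's spreadsheet: it computes the sum of squared deviations and then divides both ways, because the tabulated variance of 41.9 corresponds to dividing by N, while the formula s² = Σ(X − M)²/(N − 1) gives the 44.1 reported in the t-test output.

```python
import math

# Rich-group scores from the table above.
rich = [72, 75, 75, 76, 76, 76, 77, 77, 78, 80,
        80, 82, 87, 87, 87, 88, 89, 89, 91, 95]

n = len(rich)
mean = sum(rich) / n
sum_sq = sum((x - mean) ** 2 for x in rich)  # sum of squared deviations from the mean

var_n = sum_sq / n         # divide by N: matches the table's 41.9
var_n1 = sum_sq / (n - 1)  # divide by N - 1: the sample variance formula

print(round(sum_sq, 2), round(var_n, 1), round(var_n1, 1))
print(round(math.sqrt(var_n), 1), round(math.sqrt(var_n1), 1))
```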
Three things we must know before
we can say events are different
3. The extent to which the sample is
representative of the population from
which it is drawn
- the bigger the sample the greater the
likelihood that it represents the population
from which it is drawn
- small samples have unstable means. Big
samples have stable means.
Estimating difference
The measure of stability of the mean is the Standard
Error of the Mean = standard deviation/the square root
of the number in the sample.
So stability of mean is determined by the variability in
the sample (this can be affected by the consistency of
measurement) and the size of the sample.
The standard error of the mean (SEM) is the standard
deviation of the normal distribution of the mean if we
were to measure it again and again.
Yes, it’s significant. The Standard Errors of the Mean are
1.45 and 1.53, so the 95% confidence interval will be
about 3 points (1.96*1.5) either side of each mean. The
means fall outside each other’s confidence intervals.
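The standard error and the 95% confidence interval can be sketched from the summary statistics alone. The example assumes the group with mean 75.0 and SD 6.8 is the Poor group and the group with mean 81.9 and SD 6.5 is the Rich group (the labels are inferred from the t-test output, not stated on the slide); `sem` and `ci95` are illustrative helper names. With these rounded SDs the second standard error comes out slightly below the 1.53 quoted above.

```python
import math

def sem(sd, n):
    """Standard error of the mean: SD divided by the square root of N."""
    return sd / math.sqrt(n)

def ci95(mean, sd, n):
    """Approximate 95% confidence interval: mean plus or minus 1.96 SEMs."""
    half_width = 1.96 * sem(sd, n)
    return (mean - half_width, mean + half_width)

# Summary statistics from the slides; the group labels are an assumption.
poor = {"mean": 75.0, "sd": 6.8, "n": 20}
rich = {"mean": 81.9, "sd": 6.5, "n": 20}

poor_ci = ci95(**poor)
rich_ci = ci95(**rich)

# True when each mean lies outside the other group's interval.
separated = (not poor_ci[0] <= rich["mean"] <= poor_ci[1]
             and not rich_ci[0] <= poor["mean"] <= rich_ci[1])
print(poor_ci, rich_ci, separated)
```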
Is the difference between means
significant?
What is clear is that the mean of the Rich group is well
outside of the area where there is a 95% chance that the
mean for the Poor Group will fall, so it is likely that the
Rich mean comes from a different population than the
Poor mean.
The convention is to say that if mean 2 falls outside of the
area (the confidence interval) where 95% of mean 1
scores are estimated to fall, then mean 2 is significantly
different from mean 1. We say the probability of seeing a
difference this large when mean 1 and mean 2 come from the
same population is less than 0.05 (p<0.05), and the
difference is significant.
The significance of significance
• Not an opinion
• A sign that very specific criteria have been met
• A standardised way of saying one of the following:
There is a difference between two groups – p<0.05;
There is no difference between two groups – p>0.05;
There is a predictable relationship between two
groups – p<0.05; or
There is no predictable relationship between two
groups - p>0.05.
• A way of getting around the problem of variability
1-tailed test vs 2-tailed test
[Figure: M1 distribution, with the central 95% of the distribution and 2.5% in each tail]
If you argue for a one-tailed test, saying the difference can
only be in one direction, then you can add the 2.5% error from
the side where no data is expected to the side where it is.
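Moving all of the 5% error into one tail lowers the cut-off a result has to clear. The sketch below illustrates this with the standard normal distribution, which is an approximation chosen here to stay within the standard library; for small samples the t distribution gives somewhat larger cut-offs.

```python
from statistics import NormalDist

alpha = 0.05            # total error allowed
z = NormalDist()        # standard normal: mean 0, SD 1

two_tailed_cut = z.inv_cdf(1 - alpha / 2)  # 2.5% in each tail
one_tailed_cut = z.inv_cdf(1 - alpha)      # all 5% in one tail

print(round(two_tailed_cut, 2), round(one_tailed_cut, 2))
```

The two-tailed value is the 1.96 used for the confidence intervals earlier; the one-tailed value is smaller, which is why a one-tailed test needs to be justified before the data are examined.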
T-test results
t-Test: Two-Sample Assuming Equal Variances
                               Poor     Rich
Mean                           75       81.9
Variance                       49.1     44.1
Observations                   20       20
Pooled Variance                46.6
Hypothesized Mean Difference   0
df                             38
t Stat                         -3.2
P(T<=t) one-tail               0
t Critical one-tail            1.69
P(T<=t) two-tail               0
t Critical two-tail            2.02
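The t statistic in this output can be recomputed from the means, variances and group sizes. This is a sketch of the standard equal-variance (pooled) two-sample formula, not a reproduction of the spreadsheet tool that produced the output; `pooled_t` is a helper name introduced here, and the critical value 2.02 is taken from the output rather than calculated.

```python
import math

def pooled_t(mean1, var1, n1, mean2, var2, n2):
    """Two-sample t statistic assuming equal variances."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / df
    se_diff = math.sqrt(pooled_var * (1 / n1 + 1 / n2))  # standard error of the difference
    return (mean1 - mean2) / se_diff, pooled_var, df

# Poor group first, Rich group second, as in the output.
t_stat, pooled_var, df = pooled_t(75.0, 49.1, 20, 81.9, 44.1, 20)

# Two-tailed decision: compare |t| with the tabulated critical value.
significant = abs(t_stat) > 2.02
print(round(t_stat, 2), round(pooled_var, 1), df, significant)
```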
Tests of significance
• Tests of difference – t-tests, analysis of
variance, chi-square, odds ratios
• Tests of relationship – correlation,
regression analysis
• Tests of difference and relationship –
analysis of covariance, multiple regression
analysis.
Chi-squared (χ²) comparison of age in
the sample vs the Waitakere population
Participants in each category
Age             Observed      Expected         O-E      (O-E)²   (O-E)²/E
                (Sample) O    (Waitakere) E
16-34 years     26            23.35            2.65     7.00     0.30
35-54           23            23.85            -0.85    0.72     0.03
55-74           10            11.52            -1.52    2.30     0.20
75 and older    3             3.29             -0.29    0.09     0.03
Total           62            62.01                              χ² = 0.56
N=4, DF=3
Criterion at p=0.05: χ²=7.82
Result: NS (NS = not significant)
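The chi-squared value for age is the sum of (O − E)²/E over the four age bands. A minimal sketch using the observed and expected counts from the table; the criterion 7.82 is taken from the slide, not computed.

```python
# Observed sample counts and expected Waitakere counts for the four age bands.
observed = [26, 23, 10, 3]
expected = [23.35, 23.85, 11.52, 3.29]

# Sum of squared differences, each scaled by its expected count.
chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

df = len(observed) - 1   # categories minus one
criterion = 7.82         # tabulated value for DF=3 at p=0.05, from the slide
significant = chi2 > criterion

print(round(chi2, 2), df, significant)
```

Because the obtained value is far below the criterion, the sample's age profile is not significantly different from the population's.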
Values of chi-square for the
research project
Four of the six groups are not significant, which means that there is
no significant difference between the sample and the Waitakere
population except for culture and qualifications.
Chi-squared
Group            obtained   criterion   P         significance
Occupation       15.56      21.03       p>0.05    NS
Age              0.56       7.82        p>0.05    NS
Family context   0.39       7.82        p>0.05    NS
Culture          20.13      11.07       p<0.05    Significant
Gender           0.01       3.84        p>0.05    NS
Qualifications   6.12       5.99        p<0.05    Significant
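The decision rule behind this table is simple: a group differs significantly from the population when its obtained chi-squared exceeds the criterion value. A short sketch applying that rule to the six pairs of values above; the dictionary layout is just an illustration.

```python
# (obtained, criterion) chi-squared values for each group, from the table.
results = {
    "Occupation": (15.56, 21.03),
    "Age": (0.56, 7.82),
    "Family context": (0.39, 7.82),
    "Culture": (20.13, 11.07),
    "Gender": (0.01, 3.84),
    "Qualifications": (6.12, 5.99),
}

# Significant (p<0.05) when the obtained value is larger than the criterion.
significant = [group for group, (obtained, criterion) in results.items()
               if obtained > criterion]
not_significant = [group for group in results if group not in significant]

print(significant)
print(not_significant)
```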
Person   Height (inches) X   Self Esteem score/5 Y
1        68                  4.1
2        71                  4.6
3        62                  3.8
4        75                  4.4
5        58                  3.2
6        60                  3.1
7        67                  3.8
8        68                  4.1
9        71                  4.3
10       69                  3.7
11       68                  3.5
12       67                  3.2
13       63                  3.7
14       62                  3.3
15       60                  3.4
16       63                  4.0
17       65                  4.1
18       67                  3.8
19       63                  3.4
20       61                  3.6
r = Σ((X − MX)(Y − MY)) / (N × SX × SY)
r = correlation coefficient
X = Height
Y = Self Esteem
MX = Mean of X
MY = Mean of Y
SX = Standard deviation of X
SY = Standard deviation of Y
r = 0.73
N = 20
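The correlation can be recomputed from the 20 height and self-esteem pairs. This sketch follows the formula as written, so SX and SY are assumed to be standard deviations calculated by dividing by N (with N − 1 the denominator would need N − 1 as well).

```python
import math

# Height in inches (X) and self-esteem score out of 5 (Y) for persons 1 to 20.
height = [68, 71, 62, 75, 58, 60, 67, 68, 71, 69,
          68, 67, 63, 62, 60, 63, 65, 67, 63, 61]
esteem = [4.1, 4.6, 3.8, 4.4, 3.2, 3.1, 3.8, 4.1, 4.3, 3.7,
          3.5, 3.2, 3.7, 3.3, 3.4, 4.0, 4.1, 3.8, 3.4, 3.6]

n = len(height)
mx = sum(height) / n
my = sum(esteem) / n

# Standard deviations using N in the denominator, to match the formula's N.
sx = math.sqrt(sum((x - mx) ** 2 for x in height) / n)
sy = math.sqrt(sum((y - my) ** 2 for y in esteem) / n)

# Sum of cross-products of deviations, scaled by N * SX * SY.
r = sum((x - mx) * (y - my) for x, y in zip(height, esteem)) / (n * sx * sy)
print(round(r, 2))
```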
Level of Significance
Two-Tailed Probabilities
Probability of error   Chance of not being correlated   r value when n=20
0.1                    10% or 1/10                      0.378
0.05                   5% or 1/20                       0.444
0.01                   1% or 1/100                      0.561
0.001                  0.1% or 1/1000                   0.679
One or two tails?
What degrees of freedom?
What level of significance should be chosen?
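Reading the table amounts to finding the strictest probability level whose critical r is still exceeded. A small sketch using only the four two-tailed values listed for n=20; `strictest_level` is a helper name introduced here for illustration.

```python
# Two-tailed critical values of r for n=20, keyed by probability of error.
critical_r = {0.1: 0.378, 0.05: 0.444, 0.01: 0.561, 0.001: 0.679}

def strictest_level(r, table):
    """Smallest probability of error whose critical r is met, or None."""
    passed = [p for p, crit in table.items() if abs(r) >= crit]
    return min(passed) if passed else None

level = strictest_level(0.73, critical_r)
print(level)
```

For the height and self-esteem data, r = 0.73 exceeds even the largest critical value in the table, so the chance of error is below 1 in 1000.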
Correlations
[Figure: four scatterplots of y against x]
The perfect positive correlation
The perfect negative correlation
No correlation at all
A perfect relationship, but not a correlation
How correlation is used and
misused
Normality of residuals, Linearity,
& Homoscedasticity