Download Inferential statistics - Moodle

Statistics
• Intro to statistics
• Presentations
• More on how to do qualitative analysis
• Tutorial time
Inferential statistics
Descriptive vs Inferential statistics
• Descriptive statistics like totals (how many people
came?), percentages (what proportion of the total
were adolescents?) and averages (how much did
they enjoy it?) use numbers to describe things that
happen.
• Inferential statistics infer or predict the differences
and relationships between things. They also tell us
how certain or confident we can be about the
predictions.
Descriptive data page
Why statistics are important
Statistics are concerned with difference: how much
does one feature of an environment differ from
another?
Suicide rates/100,000 people
Why statistics are important
Relationships: how much does one feature of
the environment change as another measure
changes?
The response of the fear centre of white people to
black faces depending on their exposure to
diversity as adolescents
The two tasks of statistics
Magnitude: What is the size of the difference or the
strength of the relationship?
Reliability: To what degree can the measures of the
magnitude of variables be replicated with other
samples drawn from the same population?
Magnitude – what’s our measure?
• Raw number?
• Some aggregate of numbers? Mean, median, mode?
Suicide rates/100,000 people
Arithmetic mean or average
A (Overall rating)   B (General)   A*B   C (Unitec)   A*C
2                    1             2     1            ___
3                    0             0     2            ___
4                    3             12    0            ___
5                    4             ___   7            ___
6                    3             ___   6            ___
7                    12            ___   8            ___
8                    38            ___   16           ___
9                    28            ___   10           ___
10                   57            ___   14           ___
N                    146                 64
The mean (M or X̄) is the sum (ΣX) of all the sample
values (X1 + X2 + X3 + … + XN) divided by the
sample size (N).
Mean/average = ΣX/N
Compute the mean
              General   Unitec
Total (ΣX)    1262      493
N             146       64
mean          8.64      7.70
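The arithmetic above can be sketched in a few lines of Python. This is only an illustration: the dictionary layout (rating mapped to number of respondents) and the function name `weighted_mean` are assumptions made for the example, while the counts are the ones shown in the slides.

```python
# Frequency tables from the slides: rating -> number of respondents.
general = {2: 1, 3: 0, 4: 3, 5: 4, 6: 3, 7: 12, 8: 38, 9: 28, 10: 57}
unitec = {2: 1, 3: 2, 4: 0, 5: 7, 6: 6, 7: 8, 8: 16, 9: 10, 14 - 4: 14}


def weighted_mean(freq):
    """Return (sum of X, N, mean) for a rating -> count table."""
    total = sum(rating * count for rating, count in freq.items())  # the A*B column, summed
    n = sum(freq.values())                                         # sample size N
    return total, n, total / n


for name, freq in (("General", general), ("Unitec", unitec)):
    total, n, mean = weighted_mean(freq)
    print(f"{name}: sum = {total}, N = {n}, mean = {mean:.2f}")
```

Multiplying each rating by its count and summing is the same as adding up every individual score, which is why the A*B and A*C columns are used.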
The median
• The median is the "middle" value of the sample. There are
as many sample values above the sample median as
below it.
• If the number (N) in the sample is odd, then the
median = the value of the piece of data that is at the
(N-1)/2+1 position of the sample ordered from
smallest to largest value. E.g. if N=45, the median is
the value of the data at the (45-1)/2+1 = 23rd position.
• If the sample size is even, then the median is defined as
the average of the values at positions N/2 and N/2+1. If
N=64, the median is the average of the values at the
64/2 (32nd) and the 64/2+1 (33rd) positions.
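The position rule above can be tried out on the two rating tables. The sketch below is illustrative only: the function name `median_from_counts` and the dictionary layout are assumptions for the example, and the counts are those shown in the slides.

```python
# Frequency tables from the slides: rating -> number of respondents.
general = {2: 1, 3: 0, 4: 3, 5: 4, 6: 3, 7: 12, 8: 38, 9: 28, 10: 57}
unitec = {2: 1, 3: 2, 4: 0, 5: 7, 6: 6, 7: 8, 8: 16, 9: 10, 10: 14}


def median_from_counts(freq):
    """Median of a rating -> count table using the odd/even position rule."""
    # Expand the table into the full sample, ordered smallest to largest.
    ordered = [rating for rating in sorted(freq) for _ in range(freq[rating])]
    n = len(ordered)
    if n % 2 == 1:
        position = (n - 1) // 2 + 1        # 1-based middle position
        return ordered[position - 1]
    lower, upper = n // 2, n // 2 + 1      # 1-based middle pair
    return (ordered[lower - 1] + ordered[upper - 1]) / 2


print("General median:", median_from_counts(general))
print("Unitec median:", median_from_counts(unitec))
```

Both samples have an even N, so the even-sample branch is the one exercised here (positions 73 and 74 for General, 32 and 33 for Unitec).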
Other measures of central tendency
• The mode is the single most frequently occurring
data value. If two or more values occur equally
frequently, the data set is called bimodal, tri-modal, etc.
• The midrange is the midpoint of the sample: the
average of the smallest and largest data values in the
sample (= (2+10)/2 = 6 for both groups).
• The geometric mean (log transformation) = 8.46
(General) and 7.38 (Unitec)
• The harmonic mean (inverse transformation) = 8.19
(General) and 6.94 (Unitec)
• Both these last measures give less weight to very high
scores (and more to very low ones) than the arithmetic
mean does
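A rough sketch of how these other measures could be computed from the rating tables follows. The helper names are made up for the example, and the mode helper simply returns the first most frequent rating, so it would not report a bimodal data set as such.

```python
import math

# Frequency tables from the slides: rating -> number of respondents.
general = {2: 1, 3: 0, 4: 3, 5: 4, 6: 3, 7: 12, 8: 38, 9: 28, 10: 57}
unitec = {2: 1, 3: 2, 4: 0, 5: 7, 6: 6, 7: 8, 8: 16, 9: 10, 10: 14}


def mode(freq):
    """Most frequent rating (first one found if there is a tie)."""
    return max(freq, key=freq.get)


def midrange(freq):
    """Average of the smallest and largest ratings that actually occur."""
    observed = [rating for rating, count in freq.items() if count > 0]
    return (min(observed) + max(observed)) / 2


def geometric_mean(freq):
    """Mean of the logs of the scores, transformed back with exp."""
    n = sum(freq.values())
    return math.exp(sum(count * math.log(rating) for rating, count in freq.items()) / n)


def harmonic_mean(freq):
    """N divided by the sum of the inverses of the scores."""
    n = sum(freq.values())
    return n / sum(count / rating for rating, count in freq.items())


for name, freq in (("General", general), ("Unitec", unitec)):
    print(name, mode(freq), midrange(freq),
          round(geometric_mean(freq), 2), round(harmonic_mean(freq), 2))
```

For positive scores the harmonic mean is never larger than the geometric mean, which in turn is never larger than the arithmetic mean, which matches the ordering of the values in the slides.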
Compute the median and mode
Overall rating   General   Unitec
2                1         1
3                0         2
4                3         0
5                4         7
6                3         6
7                12        8
8                38        16
9                28        10
10               57        14
N                146       64
Means, median, mode
                  General   Unitec
N                 146       64
mean              8.64      7.70
median            9         8
mode              10        8
geometric mean    8.46      7.38
harmonic mean     8.19      6.94
The underlying distribution of the data
[Figure: bell-shaped curve with Mean = 8.36, Median = 8.36, Mode = 8.36. x-axis: Overall adults OAP rating; y-axis: Proportion of scores]
Normal distribution
Data that looks like a normal distribution
Three things we must know before
we can say events are different
1. the difference in mean scores of two or
more events
- the bigger the gap between means the
greater the difference
2. the degree of variability in the data
- the less variability the better, as it
suggests that differences between means are
reliable
Variance and Standard Deviation
These are estimates of the spread of data.
They are calculated by measuring the
distance between each data point and the
mean
The variance (s²) is the average of the squared
deviations of each sample value from the
mean: s² = Σ(X-M)²/(N-1)
The standard deviation (s) is the square root
of the variance.
Calculating the Variance (s²) and the Standard Deviation (s) for the Unitec sample
Mean Unitec (Mu) = 7.70
X (Overall rating)   n (Unitec)   (X-Mu)   (X-Mu)²*n
2                    1            -5.70    32.5
3                    2            -4.70    44.2
4                    0            -3.70    0.0
5                    7            -2.70    51.1
6                    6            -1.70    17.4
7                    8            -0.70    4.0
8                    16           0.30     1.4
9                    10           1.30     16.8
10                   14           2.30     73.9
N / Sum              64                    241.4
Variance = 3.83
SD or s = 1.96
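The Unitec calculation can be reproduced with a short sketch like the one below. The function name `variance_from_counts` is an assumption for the example. It uses the unrounded mean, so individual rows may differ from hand-worked values in the last decimal place.

```python
import math

# Unitec frequency table from the slides: rating -> number of respondents.
unitec = {2: 1, 3: 2, 4: 0, 5: 7, 6: 6, 7: 8, 8: 16, 9: 10, 10: 14}


def variance_from_counts(freq):
    """Return (sum of squared deviations, sample variance) with N-1 in the divisor."""
    n = sum(freq.values())
    mean = sum(rating * count for rating, count in freq.items()) / n
    # Each rating's squared distance from the mean, weighted by how often it occurs.
    squared_devs = sum(count * (rating - mean) ** 2 for rating, count in freq.items())
    return squared_devs, squared_devs / (n - 1)


sum_sq, variance = variance_from_counts(unitec)
sd = math.sqrt(variance)
print(f"Sum of squared deviations = {sum_sq:.1f}")
print(f"Variance = {variance:.2f}, SD = {sd:.2f}")
```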
All normal distributions have similar properties. The
percentage of the scores that lies between one standard
deviation (s) below the mean and one standard deviation
above is always about 68.27%
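This property can be checked numerically. The sketch below relies on the standard result that, for a normal distribution, the share of scores within k standard deviations of the mean is erf(k/√2). The function name `share_within` is made up for the example. The exact figure is about 68.27%, and values such as 68.26% come from rounded tables.

```python
import math


def share_within(k):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))


for k in (1, 1.96, 2, 3):
    print(f"within {k} SD: {share_within(k) * 100:.2f}%")
```

The same function shows where the 1.96 used for 95% confidence intervals comes from: about 95% of scores lie within 1.96 standard deviations of the mean.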
Is there a difference between Unitec and
General overall OAP rating scores?
Is there a significant difference between
Unitec and General OAP rating scores?
Three things we must know before
we can say events are different
3. The extent to which the sample is
representative of the population from
which it is drawn
- the bigger the sample the greater the
likelihood that it represents the population
from which it is drawn
- small samples have unstable means. Big
samples have stable means.
Estimating difference
The measure of stability of the mean is the Standard
Error of the Mean = standard deviation/the square root
of the number in the sample.
So stability of mean is determined by the variability in
the sample (this can be affected by the consistency of
measurement) and the size of the sample.
The standard error of the mean (SEM) is the standard
deviation of the normal distribution of the mean if we
were to measure it again and again
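A small sketch of the Standard Error of the Mean formula follows. The function name `standard_error` is an assumption for the example. The Unitec standard deviation (1.96) and sample size (64) come from the slides, while the other sample sizes are hypothetical and only included to show how the mean stabilises as N grows.

```python
import math


def standard_error(sd, n):
    """Standard Error of the Mean: standard deviation / square root of sample size."""
    return sd / math.sqrt(n)


unitec_sem = standard_error(1.96, 64)   # SD and N from the Unitec sample
print(f"Unitec SEM = {unitec_sem:.3f}")

# Hypothetical sample sizes with the same SD: quadrupling N halves the SEM.
for n in (16, 64, 256):
    print(f"N = {n:3d}: SEM = {standard_error(1.96, n):.4f}")
```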
Yes, it's significant. The mean of the smaller sample (Unitec) is
not too variable. Its Standard Error of the Mean = 0.24. 1.96 * SEM
= 0.48, so the 95% confidence interval is 7.70 ± 0.48 (7.22 to
8.18). The General mean (8.64) falls outside this confidence interval
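The check described on this slide can be sketched as follows. It is a simplified illustration of the slide's reasoning (comparing one group's mean against the other group's confidence interval), not a full two-sample test. The function name `confidence_interval_95` is an assumption for the example, and the inputs are the rounded values from the slides.

```python
import math


def confidence_interval_95(mean, sd, n):
    """95% confidence interval for a mean: mean ± 1.96 * SEM."""
    sem = sd / math.sqrt(n)
    half_width = 1.96 * sem
    return mean - half_width, mean + half_width


# Unitec sample: mean 7.70, SD 1.96, N 64 (values from the slides).
low, high = confidence_interval_95(7.70, 1.96, 64)
general_mean = 8.64

outside = not (low <= general_mean <= high)
print(f"Unitec 95% CI: {low:.2f} to {high:.2f}")
print("General mean outside the interval:", outside)
```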
Is the difference between means
significant?
What is clear is that the mean of the General group is
outside the area where there is a 95% chance that the
mean for the Unitec group will fall, so it is likely that
the General mean comes from a different population than
the Unitec mean.
The convention is to say that if mean 2 falls outside the
area (the confidence interval) where 95% of mean 1
scores are estimated to be, then mean 2 is significantly
different from mean 1. We say the probability of seeing a
difference this large if mean 1 and mean 2 came from the
same population is less than 0.05 (p<0.05), and the
difference is significant
The significance of significance
• Not an opinion
• A sign that very specific criteria have been met
• A standardised way of saying one of the following:
There is a difference between two groups – p<0.05;
There is no difference between two groups – p>0.05;
There is a predictable relationship between two
groups – p<0.05; or
There is no predictable relationship between two
groups - p>0.05.
• A way of getting around the problem of variability
One and two tailed tests
[Figure: normal curve, x-axis in standard deviations, comparing a 1-tailed test and a 2-tailed test. For the 2-tailed test: 2.5% of distribution below -1.96, 95% of distribution between -1.96 and +1.96, 2.5% of distribution above +1.96]
If you argue for a one tailed test, saying the difference can
only be in one direction, then you can add the 2.5% error
from the side where no data is expected to the side where
it is
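The cut-off points for the two kinds of test can be looked up with Python's standard library. This sketch uses the standard normal distribution, which is a reasonable approximation for large samples. Small samples would use the t distribution instead, which gives slightly larger critical values.

```python
from statistics import NormalDist

standard_normal = NormalDist()   # mean 0, standard deviation 1
alpha = 0.05                     # total error allowed

# Two-tailed: split the 5% error between both tails (2.5% each side).
two_tailed_cutoff = standard_normal.inv_cdf(1 - alpha / 2)

# One-tailed: put the whole 5% error in the one tail where a difference is expected.
one_tailed_cutoff = standard_normal.inv_cdf(1 - alpha)

print(f"2-tailed cut-off: ±{two_tailed_cutoff:.2f} standard deviations")
print(f"1-tailed cut-off: {one_tailed_cutoff:.3f} standard deviations")
```

Because the one-tailed cut-off is lower, a given difference reaches significance more easily with a one-tailed test, which is why the choice has to be argued for before looking at the data.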
T-test result
t-Test: Two-Sample Assuming Unequal Variances
                       General adults   Unitec adults
Mean                   8.64             7.7
Variance               2.34             3.83
Observations           146              64
t Stat for p<0.05      3.41
p one-tail             0.00
t Critical one-tail    1.66
p two-tail             0.00
t Critical two-tail    1.98

                       Massey           Unsworth Heights
Mean                   9.23             8.33
Variance               1.20             4.24
Observations           52               15
t Stat for p<0.05      1.62
p one-tail             0.06
t Critical one-tail    1.75
p two-tail             0.12
t Critical two-tail    2.12

                       male             female
Mean                   8.94             8.65
Variance               1.55             2.28
Observations           83               125
t Stat for p<0.05      1.52
p one-tail             0.07
t Critical one-tail    1.65
p two-tail             0.13
t Critical two-tail    1.97
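The t statistic in the first table can be approximately reproduced from the summary values alone. The sketch below assumes the unequal-variances test is the usual Welch form, with degrees of freedom from the Welch-Satterthwaite formula. The function name `welch_t` is made up for the example. It does not compute p-values, which need the t distribution (for instance from a statistics package).

```python
import math


def welch_t(mean1, var1, n1, mean2, var2, n2):
    """t statistic and approximate degrees of freedom for two samples with unequal variances."""
    se1_sq = var1 / n1                      # squared standard error of mean 1
    se2_sq = var2 / n2                      # squared standard error of mean 2
    t = (mean1 - mean2) / math.sqrt(se1_sq + se2_sq)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (se1_sq + se2_sq) ** 2 / (se1_sq ** 2 / (n1 - 1) + se2_sq ** 2 / (n2 - 1))
    return t, df


# General adults vs Unitec adults, using the rounded values from the table.
t_stat, df = welch_t(8.64, 2.34, 146, 7.70, 3.83, 64)
print(f"t = {t_stat:.2f}, df = {df:.0f}")
print("Exceeds the two-tail critical value of 1.98:", abs(t_stat) > 1.98)
```

Running the same function on the other two comparisons gives t values close to those in the tables, with small differences due to the rounded inputs. Both fall below their critical values, so those differences are not significant.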