Summary descriptive statistics:
means and standard deviations:
Measures of central tendency ("averages")
Measures of dispersion (spread of scores)
1. The Mode:
The most frequent score in a set of scores.
6, 11, 22, 22, 96, 98. Mode = 22
Advantages of the mode:
(i) Simple to calculate, easy to understand.
(ii) The only average which can be used with
nominal data.
Disadvantages of the mode:
(i) May be unrepresentative and hence misleading.
e.g.: 3, 4, 4, 5, 6, 7, 8, 8, 96, 96, 96.
Mode is 96 - but most of the scores are low
numbers.
(ii) May be more than one mode in a set of scores.
e.g.: 3, 3, 3, 4, 4, 4, 6, 6, 6 has three modes!
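The mode examples above can be checked with a short sketch. It assumes Python 3.8 or later, where the standard library's `statistics.multimode` returns every value tied for the highest frequency; the variable names are illustrative only.

```python
from statistics import multimode

# Example sets taken from the text above.
single = [6, 11, 22, 22, 96, 98]
misleading = [3, 4, 4, 5, 6, 7, 8, 8, 96, 96, 96]
three_modes = [3, 3, 3, 4, 4, 4, 6, 6, 6]

# multimode returns all values tied for the highest frequency,
# in the order they first appear in the data.
print(multimode(single))       # [22]
print(multimode(misleading))   # [96]
print(multimode(three_modes))  # [3, 4, 6]
```

Returning a list rather than a single value makes the "more than one mode" case visible instead of hiding it.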
2. The Median:
When scores are arranged in order of size, the
median is either
(a) the middle score (if there is an odd number of
scores).
4, 5, 6, 7, 8, 8, 96. Median = 7.
or
(b) the average of the middle two scores (if there is
an even number of scores).
4, 5, 6, 7, 8, 8, 96, 96. Median = (7+8)/2 = 7.5.
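As a rough check of the two cases above, the following sketch uses `statistics.median` from the Python standard library, which sorts the scores itself and averages the middle two when the count is even. The list names are illustrative only.

```python
from statistics import median

odd_count = [4, 5, 6, 7, 8, 8, 96]        # 7 scores: take the middle one
even_count = [4, 5, 6, 7, 8, 8, 96, 96]   # 8 scores: average the middle two

print(median(odd_count))   # 7
print(median(even_count))  # 7.5
```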
Advantages of the median:
(i) Resistant to the distorting effects of extreme
high or low scores.
Disadvantages of the median:
(i) Ignores scores' numerical values, which is
wasteful if data are interval or ratio.
(ii) More susceptible to sampling fluctuations than
the mean.
(iii) Less mathematically useful than the mean.
3. The Mean:
Add all the scores together and divide by the total
number of scores.
e.g. (3+4+4+5+6) / 5 =
22 / 5 = 4.4
$\bar{X} = \frac{\sum X}{N}$
Advantages of the mean:
(i) Uses information from every single score.
(ii) Resistant to sampling fluctuation - i.e., varies the
least from sample to sample. (Important since we
normally want to extrapolate from samples to
populations).
Disadvantages of the mean:
(i) Susceptible to distortion from extreme scores.
e.g.: 4, 5, 5, 6 : mean = 5. 4, 5, 5, 106: mean = 30.
(ii) Can only be used with interval or ratio data, not
with ordinal or nominal data.
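A minimal sketch of the calculation and of the distortion caused by one extreme score, using the example sets from the text. Printing the median alongside the mean is an addition for comparison; it is not part of the original example.

```python
from statistics import mean, median

worked_example = [3, 4, 4, 5, 6]
typical = [4, 5, 5, 6]
with_outlier = [4, 5, 5, 106]

# Mean = sum of the scores divided by the number of scores.
print(f"{mean(worked_example):.1f}")  # 4.4

# One extreme score drags the mean from 5 to 30; the median stays at 5.
print(f"{mean(typical):.1f} {median(typical):.1f}")            # 5.0 5.0
print(f"{mean(with_outlier):.1f} {median(with_outlier):.1f}")  # 30.0 5.0
```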
1. The Range:
The difference between the highest and lowest
scores. (i.e. range = highest - lowest).
Advantages:
Quick and easy to calculate, easy to understand.
Disadvantages:
Unduly influenced by extreme scores.
3, 4, 4, 5, 100. Range = (100-3) = 97.
3, 4, 4, 5, 5. Range = (5-3) = 2.
Conveys no information about the spread of scores
between the highest and lowest scores.
e.g. 2, 2, 2, 2, 2, 20 and 2, 20, 20, 20, 20, 20 have
exactly the same range (18) but very different
distributions.
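The range examples above can be reproduced with a small helper. `score_range` is a hypothetical name chosen for this sketch.

```python
def score_range(scores):
    """Return the highest score minus the lowest score."""
    return max(scores) - min(scores)

# A single extreme score dominates the range.
print(score_range([3, 4, 4, 5, 100]))  # 97
print(score_range([3, 4, 4, 5, 5]))    # 2

# Same range, very different distributions.
print(score_range([2, 2, 2, 2, 2, 20]))      # 18
print(score_range([2, 20, 20, 20, 20, 20]))  # 18
```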
2. The Standard Deviation (SD):
The "average difference of scores from the mean". The
bigger the SD, the more scores differ from the mean
and between themselves, and the less satisfactory the
mean becomes as a summary of the data.
Set A: 7, 5, 6, 8, 5, 6, 5, 6. Mean = 6, SD = 1.07.
Set B: 1, 14, 11, 1, 1, 9, 2, 9. Mean = 6, SD = 5.32.
Advantages:
Like the mean, uses information from every score.
Disadvantages:
Not intuitively easy to understand!
Can only be used with interval or ratio data.
$s = \sqrt{\frac{\sum (X - \bar{X})^2}{n}}$
How to calculate the standard deviation:
For the set of scores 5, 6, 7, 9, 11:
(a) Work out the mean:
$\bar{X} = 38 / 5 = 7.6$
(b) Subtract the mean from each score:
5 - 7.6 = -2.6
6 - 7.6 = -1.6
7 - 7.6 = -0.6
9 - 7.6 = 1.4
11- 7.6 = 3.4
(c) Square the differences just obtained:
$(-2.6)^2 = 6.76$
$(-1.6)^2 = 2.56$
$(-0.6)^2 = 0.36$
$1.4^2 = 1.96$
$3.4^2 = 11.56$
(d) Add up the squared differences:
6.76 + 2.56 + 0.36 + 1.96 + 11.56 = 23.20
(e) Divide this by the total number of scores, to get the
variance:
23.20 / 5 = 4.64
(f) Standard deviation is the square root of the
variance (we do this to get back to the original units):
4.64 = 2.15.
2.15 is our sample standard deviation.
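Steps (a) to (f) can be followed line by line in a short sketch. The variable names are illustrative, and the division is by n, as in the worked example.

```python
from math import sqrt

scores = [5, 6, 7, 9, 11]

mean = sum(scores) / len(scores)          # (a) work out the mean
deviations = [x - mean for x in scores]   # (b) subtract the mean from each score
squared = [d ** 2 for d in deviations]    # (c) square the differences
sum_of_squares = sum(squared)             # (d) add up the squared differences
variance = sum_of_squares / len(scores)   # (e) divide by the number of scores
sd = sqrt(variance)                       # (f) square root of the variance

print(round(mean, 2), round(sum_of_squares, 2), round(variance, 2), round(sd, 2))
# 7.6 23.2 4.64 2.15
```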
Complications in using the mean and SD:
We usually obtain the mean and SD from a sample, and only very rarely from the parent population.
Sometimes we are content to describe our sample per
se, but usually we want to extrapolate to the population
from our sample.
A sample mean is a good estimate of the population
mean.
A sample SD tends to underestimate the population SD.
Hence, when using the sample SD as a description of
the sample, divide by n.
When using the sample SD as an estimate of the
population SD, divide by n-1 (to make the SD larger than
it would otherwise have been).
Sample SD as a description of a sample ($\sigma_n$, "sigma n", on calculators):
$s = \sqrt{\frac{\sum (X - \bar{X})^2}{n}}$
Sample SD as an estimate of the population SD ($\sigma_{n-1}$ on calculators):
$s = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}}$
In most cases, we use the n-1 version of the SD formula.
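To see the difference between the two versions, the sketch below applies both to the scores from the worked example (5, 6, 7, 9, 11). It relies on the Python standard library, where `statistics.pstdev` divides by n and `statistics.stdev` divides by n-1.

```python
from statistics import pstdev, stdev

scores = [5, 6, 7, 9, 11]

sd_n = pstdev(scores)         # divides by n: describes this sample
sd_n_minus_1 = stdev(scores)  # divides by n - 1: estimates the population SD

print(f"{sd_n:.2f}")          # 2.15
print(f"{sd_n_minus_1:.2f}")  # 2.41
```

The n-1 version is always the larger of the two, and the gap shrinks as the sample size grows.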
The Standard Error of the Mean:
This is the standard deviation of a set of sample means.
Shows how much variation there is within a set of
sample means, and hence how likely our particular
sample mean is to be in error, as an estimate of the true
population mean.
[Figure: the means of different samples, scattered around the actual population mean.]
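The idea of "a set of sample means" can be illustrated with a small simulation. Everything in it is an assumption made for the sketch: a hypothetical normally distributed population with mean 50 and SD 10, a sample size of 25, and 2000 repeated samples. None of these numbers come from the slides.

```python
import random
from statistics import mean, stdev

random.seed(0)  # fixed seed so the sketch is repeatable

POP_MEAN, POP_SD = 50, 10  # hypothetical population, not from the text
SAMPLE_SIZE = 25
N_SAMPLES = 2000

# Draw many samples and record the mean of each one.
sample_means = []
for _ in range(N_SAMPLES):
    sample = [random.gauss(POP_MEAN, POP_SD) for _ in range(SAMPLE_SIZE)]
    sample_means.append(mean(sample))

# The SD of these sample means is what the standard error describes.
spread_of_means = stdev(sample_means)
print(f"SD of the sample means: {spread_of_means:.2f}")
print(f"Population SD / sqrt(n): {POP_SD / SAMPLE_SIZE ** 0.5:.2f}")
```

The two printed values should be close to each other, which is the relationship expressed by the standard error formula.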
Formula for the standard error:
SE = standard deviation / square root of n
(where n = sample size)
(NB: we usually estimate this from our available data,
so use the n-1 version of the SD formula)
$SE = \frac{\sigma_{n-1}}{\sqrt{n}}$
If the SE is small, our obtained sample mean is
more likely to be similar to the true population
mean than if the SE is large.
Increasing n (the sample size) reduces the size
of the SE:
A sample mean based on 100 scores is
probably closer to the population mean than a
sample mean based on 10 scores.
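A minimal sketch of the standard error formula, using the n-1 version of the SD as the text recommends. `standard_error` is a hypothetical helper name, the scores are reused from the SD worked example, and the SD of 10 in the second part is an arbitrary value chosen only to show the effect of sample size.

```python
from math import sqrt
from statistics import stdev

def standard_error(scores):
    """Estimate the standard error of the mean: SD (n-1 version) / sqrt(n)."""
    return stdev(scores) / sqrt(len(scores))

scores = [5, 6, 7, 9, 11]
print(f"{standard_error(scores):.2f}")  # 1.08

# For the same SD, a larger sample gives a smaller standard error.
sd = 10  # hypothetical SD, for illustration only
se_10 = sd / sqrt(10)
se_100 = sd / sqrt(100)
print(f"{se_10:.2f} {se_100:.2f}")  # 3.16 1.00
```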
[Figure: Arithmetic anxiety in relation to degree subject studied. Y-axis: mean anxiety level (+/- 1 s.d.), 0 to 100. X-axis: type of student (maths students, psychology students).]
Error bars show the mean plus and minus 1 standard deviation.
This graph shows variability of scores within each group.
[Figure: Arithmetic anxiety in relation to degree subject studied. Y-axis: mean anxiety level (+/- 1 s.e.m.), 0 to 100. X-axis: type of student (maths students, psychology students).]
Error bars show the mean plus and minus 1 standard error of the
mean. This graph shows how much each mean is likely to vary if
you did the study many times over – it indicates the reliability of
each sample mean as an estimate of the true population mean.
Conclusions:
Mean shows "typical" performance - but that is only half the story!
Need to also know about the spread of scores - how
representative is the mean?
Standard deviation - spread of scores around a sample mean.
Tells us how well the mean summarises the sample.
Standard error - spread of sample means around a "true"
population mean.
Tells us how reliable our sample mean is likely to be as an
estimate of the population mean.