Measures of central tendency are statistics that express the
most typical or average scores in a distribution
These measures are:
• The Mode
• The Median
• The Mean
The simplest measure of central tendency is the mode; the
mode is the value that occurs with the greatest frequency
within a data set
Students weighed two different samples of broad beans and
obtained the data shown below

Sample A (g)   Sample B (g)
1.42           1.36
1.42           1.37
1.43           1.40
1.44           1.43
1.44           1.44
1.44           1.44
1.44           1.47
1.45           1.48
1.46           1.49
1.46           2.01
The most frequently occurring value (i.e. the mode) in both
Sample A and Sample B is 1.44
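As a minimal sketch, the mode of each sample can be checked with Python's standard `statistics` module. The variable names are illustrative and not part of the original slides; the data values are copied from the bean table.

```python
from statistics import multimode

# Bean masses in grams, copied from the table in the text
sample_a = [1.42, 1.42, 1.43, 1.44, 1.44, 1.44, 1.44, 1.45, 1.46, 1.46]
sample_b = [1.36, 1.37, 1.40, 1.43, 1.44, 1.44, 1.47, 1.48, 1.49, 2.01]

# multimode returns every value tied for the highest frequency,
# so a data set with two modes would give a two-element list.
mode_a = multimode(sample_a)
mode_b = multimode(sample_b)

print("Mode(s) of Sample A:", mode_a)
print("Mode(s) of Sample B:", mode_b)
```

Using `multimode` rather than `mode` makes ties visible, which matters for small samples such as Sample B, where the mode occurs only twice.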
The Median is the central or middle value of a set of values
when placed in order
Sample A (g): 1.42  1.42  1.43  1.44  [1.44  1.44]  1.44  1.45  1.46  1.46
Median: the midpoint of the two bracketed middle values
As Sample A includes an even number of values, the median is
halfway between the middle two, i.e. 1.44 and 1.44; these
values are the same, so the median is 1.44
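The rule for odd and even numbers of values can be sketched in a few lines of Python. The helper name `median_by_hand` is hypothetical; it is compared against the standard library's `statistics.median` as a cross-check.

```python
from statistics import median

sample_a = [1.42, 1.42, 1.43, 1.44, 1.44, 1.44, 1.44, 1.45, 1.46, 1.46]
sample_b = [1.36, 1.37, 1.40, 1.43, 1.44, 1.44, 1.47, 1.48, 1.49, 2.01]

def median_by_hand(values):
    """Middle value of the ordered data; for an even count,
    the value halfway between the two middle values."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

median_a = median_by_hand(sample_a)
median_b = median_by_hand(sample_b)

print("Median of Sample A:", median_a, "library check:", median(sample_a))
print("Median of Sample B:", median_b, "library check:", median(sample_b))
```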
The Mean is obtained by adding up all of the values and then
dividing their sum by the number of values
The formula for calculating the mean is:
$$\bar{x} = \frac{\sum x}{n}$$
where $\bar{x}$ = the mean
$\sum$ = the sum of
$x$ = any value
$n$ = number of values
Calculate the means for samples A and B
For Sample A, the mean is:
(1.42 + 1.42 + 1.43 + 1.44 + 1.44 + 1.44 + 1.44 + 1.45 + 1.46 + 1.46) / 10
Sample A Mean = 14.40 / 10 = 1.44 g
For Sample B, the mean is:
(1.36 + 1.37 + 1.40 + 1.43 + 1.44 + 1.44 + 1.47 + 1.48 + 1.49 + 2.01) / 10
Sample B Mean = 14.89 / 10 = 1.489 g
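The same arithmetic can be reproduced in Python. This is a small sketch with illustrative names; it applies the definition directly (sum of the values divided by the number of values).

```python
sample_a = [1.42, 1.42, 1.43, 1.44, 1.44, 1.44, 1.44, 1.45, 1.46, 1.46]
sample_b = [1.36, 1.37, 1.40, 1.43, 1.44, 1.44, 1.47, 1.48, 1.49, 2.01]

def mean(values):
    """Sum of the values divided by the number of values."""
    return sum(values) / len(values)

mean_a = mean(sample_a)
mean_b = mean(sample_b)

print(f"Sample A mean: {mean_a:.3f} g")
print(f"Sample B mean: {mean_b:.3f} g")
```

Sample B has the same mode and median as Sample A, but its mean is higher because the single value of 2.01 g pulls the sum upwards.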
The Mode and Grouped Data
When data is grouped, it is not possible to quote the mode
precisely; the ‘modal class’ is used to describe the data
The modal class for this height data is 1.51 – 1.58 m
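A short sketch of how the modal class can be picked out of grouped data follows. The class intervals and frequencies are the human height data tabulated later in this document; the variable names are illustrative.

```python
# (class interval in metres, frequency) pairs from the height data table
height_classes = [
    ("1.30 - 1.37", 3),
    ("1.37 - 1.44", 12),
    ("1.44 - 1.51", 14),
    ("1.51 - 1.58", 24),
    ("1.58 - 1.65", 23),
    ("1.65 - 1.72", 22),
    ("1.72 - 1.79", 16),
    ("1.79 - 1.86", 6),
]

# The modal class is the interval with the highest frequency
modal_class, modal_frequency = max(height_classes, key=lambda pair: pair[1])

print("Modal class:", modal_class, "with frequency", modal_frequency)
```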
Rule of Thumb
In general, the mean is used as a measure of central
tendency with quantitative (interval) data, unless the
distribution is markedly skewed
When summarising qualitative data, the mode or median
are the most appropriate measures of central tendency
When the distribution of interval data is highly
skewed, then the most appropriate measure of central
tendency is the median
Measures of central tendency alone are insufficient for
characterisation of the distribution of data
A measure of how much the data are dispersed or
‘spread out’ is also needed
Four statistics can be used to indicate dispersion:
• The range
• The variance
• The standard deviation
• The interquartile range
In most cases, the mean and standard deviation
are used with quantitative (interval) data, while
the mode or median and the interquartile
range are used for qualitative variables
Standard Deviation
The Standard Deviation (s) of a set of values is a measure of
the spread of the values from the mean
A formula for calculating the standard deviation is:
$$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n}}$$
where
$s$ = standard deviation
$x - \bar{x}$ = the deviation of a value from the mean
$\sum$ = the sum of
$n$ = number of values
A quicker method for calculating the standard deviation is to
use the equation shown below – this method
is less tedious and less prone to error
$$s = \sqrt{\frac{\sum x^2}{n} - \bar{x}^2}$$
where
$s$ = standard deviation
$x$ = any individual value
$\bar{x}$ = the mean of a set of values
$\sum$ = the sum of
$n$ = number of values
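The quicker formula is an algebraic rearrangement of the definition of the standard deviation rather than a separate statistic. A brief derivation, using only the definition of the mean (the sum of the values divided by n), is sketched below for the quantity under the square root.

```latex
\begin{aligned}
\frac{\sum (x - \bar{x})^2}{n}
  &= \frac{\sum x^2 - 2\bar{x}\sum x + n\bar{x}^2}{n} \\
  &= \frac{\sum x^2}{n} - 2\bar{x}\cdot\frac{\sum x}{n} + \bar{x}^2 \\
  &= \frac{\sum x^2}{n} - 2\bar{x}^2 + \bar{x}^2 \\
  &= \frac{\sum x^2}{n} - \bar{x}^2
\end{aligned}
```

Taking the square root of the first and last expressions gives the two formulas for s.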
Calculate the standard deviation for the bean samples A and B
The standard deviation for Sample A is 0.013 g
The standard deviation for Sample B is 0.179 g
Sample B data displays greater variation than Sample A data
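Both standard deviation formulas can be checked against the bean data with a few lines of Python. This is a sketch with hypothetical function names; it divides by n, as the formulas in the text do, and uses `statistics.pstdev` (which also divides by n) as an independent check.

```python
from math import sqrt
from statistics import pstdev

sample_a = [1.42, 1.42, 1.43, 1.44, 1.44, 1.44, 1.44, 1.45, 1.46, 1.46]
sample_b = [1.36, 1.37, 1.40, 1.43, 1.44, 1.44, 1.47, 1.48, 1.49, 2.01]

def sd_definition(values):
    """Square root of the mean squared deviation from the mean."""
    n = len(values)
    mean = sum(values) / n
    return sqrt(sum((x - mean) ** 2 for x in values) / n)

def sd_shortcut(values):
    """Square root of (mean of the squares minus the square of the mean)."""
    n = len(values)
    mean = sum(values) / n
    # max() guards against a tiny negative value caused by rounding
    return sqrt(max(0.0, sum(x * x for x in values) / n - mean ** 2))

sd_a = sd_definition(sample_a)
sd_b = sd_definition(sample_b)

print(f"Sample A: {sd_a:.3f} (definition), {sd_shortcut(sample_a):.3f} (shortcut)")
print(f"Sample B: {sd_b:.3f} (definition), {sd_shortcut(sample_b):.3f} (shortcut)")
print(f"Library check: {pstdev(sample_a):.3f}, {pstdev(sample_b):.3f}")
```

Note that many statistics packages divide by n - 1 by default (as `statistics.stdev` does), which gives slightly larger values than the ones quoted here.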
The standard error of the mean provides an estimate of
how close a sample mean is likely to be to
the true mean of a whole population
The standard error is calculated using a formula that takes
into account the standard deviation of the sample (s) and the
sample size (n)
$$SE = \frac{s}{\sqrt{n}}$$
The formula shows that the larger the sample size,
the smaller the standard error of the mean
A graph of standard error of the mean against sample
size reveals an interesting trend
Increasing the sample
size by a few subjects
makes a large difference
to the standard error
when the sample size is
small, but makes much
less of a difference when
the sample size is large
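The trend can be illustrated numerically. In the sketch below the standard deviation is held fixed at 0.129, an illustrative value chosen only for this example, and the sample sizes are arbitrary; the point is the size of the change in SE when five subjects are added.

```python
from math import sqrt

s = 0.129  # fixed standard deviation, an illustrative assumption

def standard_error(s, n):
    """Standard error of the mean: s divided by the square root of n."""
    return s / sqrt(n)

# Effect of adding 5 subjects to a small sample and to a large sample
drop_small = standard_error(s, 5) - standard_error(s, 10)
drop_large = standard_error(s, 100) - standard_error(s, 105)

for n in (5, 10, 100, 105):
    print(f"n = {n:3d}  SE = {standard_error(s, n):.4f}")
print(f"Reduction from n=5 to n=10:    {drop_small:.4f}")
print(f"Reduction from n=100 to n=105: {drop_large:.4f}")
```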
The standard error of the mean can be used to
define confidence limits or intervals
A student measured the heights of 62 individuals and
found the mean height to be 1.64 metres
The standard deviation of this sample was found to be 0.129
The student then estimated the standard error of the mean:
$$SE = \frac{s}{\sqrt{n}} = \frac{0.129}{\sqrt{62}} = 0.016$$
The student can be 68% confident that the true mean of the
population falls within the range ± 0.016 of the mean
of the sample, i.e. 1.64 ± 0.016 (mean ± 1 SE)
What this means is that the interval between 1.624 and 1.656
(confidence limits) has a 68% probability
of containing the true mean
Researchers more commonly use the 95% confidence interval
$$SE = \frac{s}{\sqrt{n}} = \frac{0.129}{\sqrt{62}} = 0.016$$
In this case, the student can be 95% confident that the true
mean of the population falls within approximately two
standard errors of the mean of the sample,
i.e. 1.64 ± 0.032 (mean ± 2 SE)
This means that the interval between 1.608 and 1.672
(confidence limits) has a 95% probability
of containing the true mean
More accurate calculations make use of z scores for a
normal distribution to estimate confidence intervals – for
95% confidence intervals, the standard error
is multiplied by 1.96; mean ± 1.96 SE
Broad Bean Samples

Sample A (g)   Sample B (g)
1.42           1.36
1.42           1.37
1.43           1.40
1.44           1.43
1.44           1.44
1.44           1.44
1.44           1.47
1.45           1.48
1.46           1.49
1.46           2.01
Estimate the standard
error for Sample A
and Sample B
You may check your
answers by entering
data into a suitable
statistics programme
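One way of checking the answers is a short Python script such as the sketch below. It assumes the standard deviation is calculated with n in the denominator, as in the formulas earlier in this document (`statistics.pstdev`); a package that divides by n - 1 will give slightly different values.

```python
from math import sqrt
from statistics import pstdev

sample_a = [1.42, 1.42, 1.43, 1.44, 1.44, 1.44, 1.44, 1.45, 1.46, 1.46]
sample_b = [1.36, 1.37, 1.40, 1.43, 1.44, 1.44, 1.47, 1.48, 1.49, 2.01]

def standard_error(values):
    """Standard error of the mean: s / sqrt(n), with s computed using n."""
    return pstdev(values) / sqrt(len(values))

se_a = standard_error(sample_a)
se_b = standard_error(sample_b)

print(f"Standard error, Sample A: {se_a:.4f} g")
print(f"Standard error, Sample B: {se_b:.4f} g")
```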
The two students obtained very different statistical values from their
data even though the beans had been drawn from the same population
– can you suggest reasons for these differences?
Samples A and B are sub-groups of the total population of
broad beans and may not therefore be truly representative
of the population as a whole
Variations between the samples and the original population
may arise as a consequence of:
• Bias in sampling – the students may have unknowingly been
selective when choosing beans to weigh – Random sampling
methods should be used to eliminate bias from the results
• Chance – the students may have, by chance, selected a particular
set of beans – this is more likely to be the case when only one
sample is taken and when the sample size is small – taking at least
three samples (replication), choosing appropriate sample sizes and
obtaining mean results from these different samples, helps to
eliminate chance effects from experimental values
• Measurement error – errors arising from taking any form of
measurement are not uncommon – when the same material is
measured or weighed on a different occasion, different values are
often obtained
More Data
A group of students
measured the masses
of individual French
bean seeds and their
results are shown
in the table
Present these results
in graphical form
Calculate the mean,
median and mode
for these results
Calculate the standard
deviation
Estimate the standard
error of the mean
Define the confidence
limits for the mean of
this set of data
Check your answers
with a suitable
statistics programme
[Table: Mass of Bean Seeds (g)]
A knowledge of the shape of the distribution of values
obtained in an investigation is crucially important for
choosing an appropriate statistical test for analysis
The normal distribution is theoretically determined by
the value of the mean and the standard deviation
When the value of the mean is zero and the standard
deviation is one, the normal curve is said to be in
‘standard form’
A characteristic ‘bell’ shape graph is obtained
The characteristic bell-shaped curve of a normal distribution
has the following characteristics:
• It is symmetrical about the mean, so that equal numbers of
values fall above and below the mean (mean = median = mode)
• Relatively few values fall into the high or low categories of
the distribution; about 68% of its values are within one
standard deviation of the mean
• About 95% of its values are within two standard deviations
of the mean
• About 99.7% of its values are within three standard deviations
of the mean
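These proportions can be checked from the cumulative distribution function of the standard normal curve. The sketch below uses Python's built-in `statistics.NormalDist`; the helper name `proportion_within` is illustrative.

```python
from statistics import NormalDist

# Normal curve in 'standard form': mean 0, standard deviation 1
standard_normal = NormalDist(mu=0, sigma=1)

def proportion_within(k):
    """Proportion of values lying within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"Within {k} SD of the mean: {100 * proportion_within(k):.1f}%")
```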
Many investigations generate data which approximate to
the normal distribution
Skewed distributions deviate from the ‘normal’ distribution
curve - their distributions are asymmetrical
The mean, mode and median differ in a skewed distribution;
the mean and median values are less than the mode for a
negatively skewed distribution, and greater than the
mode for a positively skewed distribution
The degree of skewness can be determined by calculating the
coefficient of skewness (Sk) where s = the standard deviation
$$Sk = \frac{\text{mean} - \text{mode}}{s}$$
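Applied to the broad bean samples from earlier in this document, the coefficient can be computed as in the sketch below. The function name is hypothetical, and the standard deviation is calculated with n in the denominator (`statistics.pstdev`) to match the earlier formulas.

```python
from statistics import mean, mode, pstdev

sample_a = [1.42, 1.42, 1.43, 1.44, 1.44, 1.44, 1.44, 1.45, 1.46, 1.46]
sample_b = [1.36, 1.37, 1.40, 1.43, 1.44, 1.44, 1.47, 1.48, 1.49, 2.01]

def skewness_coefficient(values):
    """Sk = (mean - mode) / s, with s calculated using n."""
    return (mean(values) - mode(values)) / pstdev(values)

sk_a = skewness_coefficient(sample_a)
sk_b = skewness_coefficient(sample_b)

print(f"Sk for Sample A: {sk_a:.3f}")
print(f"Sk for Sample B: {sk_b:.3f}")
```

Sample A gives a coefficient of essentially zero (mean equals mode), while Sample B gives a positive value because the 2.01 g bean pulls the mean above the mode.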
When the distribution of interval data is
highly skewed, then the median and
interquartile range should be used as
measures of central tendency and dispersion
A useful, visual method for
assessing whether a set of data
can be assumed to have come
from a normal distribution is to
plot the data against their
cumulative frequency distribution
on special graph paper
The graph paper used for
this plot is called normal
probability paper
When the graphed data lies
close to a straight line, we may
assume that the distribution is
approximately normal
Class Interval (m)   Frequency   Cumulative Frequency   Percentage Cumulative Frequency
1.30 – 1.37          3
1.37 – 1.44          12
1.44 – 1.51          14
1.51 – 1.58          24
1.58 – 1.65          23
1.65 – 1.72          22
1.72 – 1.79          16
1.79 – 1.86          6
Use the human height data above to obtain the cumulative frequencies and
the percentage cumulative frequencies
Percentage cumulative frequencies are obtained by dividing each cumulative
frequency by the total frequency (the final cumulative frequency) and
multiplying by 100
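The calculation can be sketched in Python using a running total. The frequencies are those of the human height data; the variable names are illustrative.

```python
from itertools import accumulate

# Frequencies for the eight height classes, 1.30 - 1.37 m up to 1.79 - 1.86 m
frequencies = [3, 12, 14, 24, 23, 22, 16, 6]

# Running total of the frequencies
cumulative = list(accumulate(frequencies))

# The final cumulative frequency equals the total number of observations
total = cumulative[-1]
percentage_cumulative = [round(100 * c / total, 2) for c in cumulative]

for f, c, p in zip(frequencies, cumulative, percentage_cumulative):
    print(f"{f:3d}  {c:4d}  {p:6.2f}")
```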
Class Interval (m)   Frequency   Cumulative Frequency   Percentage Cumulative Frequency
1.30 – 1.37          3           3                      2.50
1.37 – 1.44          12          15                     12.50
1.44 – 1.51          14          29                     24.17
1.51 – 1.58          24          53                     44.17
1.58 – 1.65          23          76                     63.33
1.65 – 1.72          22          98                     81.67
1.72 – 1.79          16          114                    95.00
1.79 – 1.86          6           120                    100.00
Plot a graph of percentage cumulative frequency against the upper class
boundary of the height data using the provided probability graph paper
Assess the normality of the distribution for the height data
The probability plot for the
height data shows that the
points lie close to a straight
line, and we may assume
that the distribution is
approximately normal
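Where probability paper is not to hand, a rough numerical analogue is possible. Normal probability paper spaces its percentage axis according to normal z scores, so the sketch below converts each percentage cumulative frequency to a z score and measures how closely the points follow a straight line using a correlation coefficient. The correlation step is an assumption of this sketch, not part of the original method, and the final class (100%) is left out because it has no finite z score.

```python
from math import sqrt
from statistics import NormalDist

# Upper class boundaries (m) and percentage cumulative frequencies
# from the height table, omitting the final class (100%)
upper_boundaries = [1.37, 1.44, 1.51, 1.58, 1.65, 1.72, 1.79]
percent_cumulative = [2.50, 12.50, 24.17, 44.17, 63.33, 81.67, 95.00]

# z score corresponding to each cumulative percentage
z_scores = [NormalDist().inv_cdf(p / 100) for p in percent_cumulative]

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

r = pearson_r(upper_boundaries, z_scores)
print(f"Correlation between upper boundary and z score: {r:.3f}")
```

A value of r close to 1 corresponds to points lying close to a straight line on the probability plot; how close is close enough remains a matter of judgement, as it is when reading the graph by eye.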
Using the same method,
test the bean data on
the following slide
for normality
BEAN DATA

Class Interval (g)   Frequency   Cumulative Frequency   Percentage Cumulative Frequency
0.91 - 1.04          2           2                      2.86
1.04 - 1.17          5           7                      10.00
1.17 - 1.30          5           12                     17.14
1.30 - 1.43          14          26                     37.14
1.43 - 1.56          13          39                     55.71
1.56 - 1.69          11          50                     71.43
1.69 - 1.82          9           59                     84.29
1.82 - 1.95          7           66                     94.29
1.95 - 2.08          1           67                     95.71
2.08 - 2.21          3           70                     100.00
The probability plot for the
bean data shows that the
points lie close to a straight
line, and we may assume
that the distribution is
approximately normal