Download f - Hinchingbrooke

Statistics – What is it?
Torture numbers, and they'll confess to anything. ~Gregg Easterbrook
98% of all statistics are made up. ~Author Unknown
Statistics are like bikinis. What they reveal is suggestive, but what they conceal is
vital. ~Aaron Levenstein
Statistics can be made to prove anything - even the truth. ~Author Unknown
Lottery: A tax on people who are bad at math. ~Author Unknown
He uses statistics as a drunken man uses lampposts - for support rather than for
illumination. ~Andrew Lang
The theory of probabilities is at bottom nothing but common sense reduced to
calculus. ~Laplace, Théorie analytique des probabilités, 1820
I could prove God statistically. Take the human body alone - the chances that all the
functions of an individual would just happen is a statistical monstrosity. ~George
Gallup
Statistics are just a way for the mathematician to evangelize his faith. ~Hunter
Brinkmeier
There are three kinds of lies: lies, damned lies, and statistics. ~Benjamin Disraeli
Statistics is the science of using mathematical tools to interpret data.
Lesson Objective
Understand the different ways of describing data
Understand the importance of different sampling techniques when
collecting data
The Different Ways of Describing Data
Discrete data
Continuous data
Categorical data
Numerical data
Qualitative data
Quantitative data
The Different Ways of Describing Data
Discrete data
Data that is digital and has specific values with gaps in between. A slight improvement in the accuracy of the measuring device does not alter the data.
Continuous data
Data that is analogue and takes a range of values. A slight improvement in the accuracy of the measuring device alters the data collected.
Categorical data
Data that falls into different labelled groups. If the labels are numerical then they have no numerical worth, so calculating a mean is meaningless.
Numerical data
Data that is based on the size of numbers, where the size of the numbers has some meaning.
Qualitative data
Data that has been collected based on some quality or categorization that in some cases may be 'informal' or may use relatively ill-defined characteristics such as warmth and flavour. Data that can be observed but not measured.
Quantitative data
Data that has been collected using a measuring scale: data measured or identified on a numerical scale.
Give 3 examples of each type of data:
Discrete data
Continuous data
Categorical data
Numerical data
Qualitative data
Quantitative data
The Different Ways of Describing Data
Discrete data
Eg Shoe Size, Dice score, Type of Pet
Continuous data
Eg Time to run a mile, length of a hair
Categorical data
Eg Types of Pet, House Number, Colour
Numerical data
Eg Score on a dice, Weight of a lemon
Qualitative data
Eg I feel happy, The weather is good today
Quantitative data
Eg The score obtained in a test, the height of a tree
Decide whether each of the following sets of data is categorical or numerical,
and if numerical whether it is discrete or continuous.
1) Cards drawn from a set of playing cards:
{2 of diamonds, ace of spades, 3 of hearts etc…}
2) Number of aces in a hand of 13 cards:
{1, 2, 3, 4}
3) Time in seconds for 100 metre sprint:
{10.05, 12.31, 11.20, 10.67, 11.56, …etc}
4) Fraction of coin tosses which were Heads after 1, 2, 3, … tosses for the following
sequence: H T H T T T H H …
{1, ½, 2/3, ½, 2/5, 1/3, 3/7, ½, …}
5) Number of spectators at a football match:
{23 456, 40 132, 28 320, 18 214, …etc}
6) Day of week when people were born:
{Wednesday, Monday, Sunday, Sunday, Saturday, etc…}
7) Times in seconds between ‘blips’ of a Geiger counter in a physics experiment:
{0.23, 1.23, 3.03, 0.21, 4.51, …etc}
8) Percentages gained by students for a test out of 60:
{20, 78.33, 80, 75, 53.33, …etc}
9) Number of weeds in a 1 m by 1 m square in a biology experiment:
{2, 8, 12, 3, 5, 8, …}
Solution
1 and 6 are categorical data, all the others are numerical.
2 - discrete
3 - continuous
4 - discrete, as the possible fractions can be listed
5 - discrete
7 - continuous
8 - discrete, as there are only 61 possible percentage scores (one for each mark from 0 to 60).
9 - discrete, as there must be a whole number of weeds.
Different Sampling Techniques
There are many different ways to generate a sample for data collection.
Four of the most common are:
Random Sampling
Systematic Sampling
Stratified Sampling
Convenience Sampling
Look at the cards on the next slide and decide which sampling technique is being
described. Think of an advantage and a disadvantage for the technique
described.
A pollster stands in Huntingdon market square and asks the first 30 people that will listen to her their opinions on a market revamp.
In a survey to assess opinions about Year 10 uniform a school list is printed and every 10th pupil on the list selected.
At a local club it is known that ¾ of the membership is female. A sample of 21 females and 7 males is drawn by randomly picking names from a hat.
To find out opinions about a web site you ask the first 30 people to visit the site to complete a questionnaire using their browser.
To select a sample of 6 people from a class of 30 to do a maths test, the class are lined up in height order and every 5th pupil selected.
In a class of 20 pupils each pupil is assigned a number and 4 members are selected for a competition by using the random number generator on a calculator.
A Secondary school has 3 Key Stages with pupils split between them in the ratio 3:2:3. To survey opinions about the school canteen they interview 30 students from KS3, 20 from KS4 and 30 from KS5.
To investigate the health of whales a marine biology charity decide to estimate the length of whales in the South Atlantic by measuring the first 10 whales they find.
A bag contains 100 names. It is shaken and 30 names are drawn from the bag without looking.
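Three of the four techniques on the cards can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the pupil and club member names are hypothetical, the helper function names are invented here, and a fixed seed is used so the run is repeatable. Convenience sampling is left out because it simply takes whoever is easiest to reach.

```python
import random

def simple_random_sample(population, k, rng):
    """Random sampling: every member has the same chance (names from a hat)."""
    return rng.sample(population, k)

def systematic_sample(population, step, rng):
    """Systematic sampling: random starting point, then every `step`-th member."""
    start = rng.randrange(step)
    return population[start::step]

def stratified_sample(strata, total, rng):
    """Stratified sampling: each group is sampled in proportion to its size."""
    population_size = sum(len(members) for members in strata.values())
    sample = {}
    for name, members in strata.items():
        k = round(total * len(members) / population_size)
        sample[name] = rng.sample(members, k)
    return sample

rng = random.Random(1)  # fixed seed so the sketch is repeatable
pupils = [f"pupil{i}" for i in range(1, 31)]  # hypothetical class of 30

random_pick = simple_random_sample(pupils, 6, rng)
every_fifth = systematic_sample(pupils, 5, rng)

# Hypothetical club of 120 members in which 3/4 are female
club = {"female": [f"F{i}" for i in range(90)],
        "male": [f"M{i}" for i in range(30)]}
club_sample = stratified_sample(club, 28, rng)

print(len(random_pick), len(every_fifth))
print({group: len(chosen) for group, chosen in club_sample.items()})
```

With a sample of 28 the stratified version picks 21 females and 7 males, matching the club card.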
Lesson Objective
Understand the three key things required to analyse data
In an experiment pupils were selected randomly from their maths lessons and asked to estimate the area of a triangle and a rectangle. The area of both shapes was 15 cm². The results are shown below:
age  gender  Rec:15  Tr:15
11   f       12      11
11   f       10      50
11   m       15      10
11   f       15      16
11   f       18      64
11   f       30      5
11   m       16      25
11   f       18      15
11   f       16      16
11   m       3       4.5
11   f       15      20
11   m       8       12
11   m       8       9
11   f       14      13
11   f       15      11

age  gender  Rec:15  Tr:15
17   f       13      8
17   m       14      16
17   f       15      18
17   m       12      20
17   f       13      12
17   f       16      12
17   m       16      14
17   m       14      10
17   f       15      12
17   f       12      13
17   f       15      13
17   f       10      20
18   m       13      30
18   f       14      15
18   m       18      12
19   f       15      15
19   m       18      15
Analyse this data.
What things could we investigate?
Some nuggets of wisdom:
1) “This shows that the boys had a greater spread of data, meaning that the girls
were more accurate” so spread implies accuracy?
2) “I predict that the girls will be more accurate than the boys at estimating the area
as there are more of them and so a greater chance that more will correctly
estimate the area” so the more people you have guessing the more
accurate they will be?
3) “I predict that the boys will be better at estimating as there are fewer, meaning
that there is less chance for anomalous results” so you get the best results
by having a small sample size?
Mode – generally useless for this exercise
Calculating how many got it exactly right is generally useless as the data is
continuous - the fact that some people guessed it correctly has more to do with
Psychology than good estimating skills.
Averaging averages to get an all embracing average is NEVER a good idea:
Data set 1: 1 and 8
Data set 2: 6
The two means are 4.5 and 6, which average to 5.25, but the mean of all three values is 5.
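The warning about averaging averages can be checked directly. A minimal sketch using the two small data sets from the slide (the variable names are chosen here for illustration):

```python
from statistics import mean

data_set_1 = [1, 8]
data_set_2 = [6]

# Average of the two separate means
mean_of_means = mean([mean(data_set_1), mean(data_set_2)])

# Mean of all the values pooled together
overall_mean = mean(data_set_1 + data_set_2)

print(mean_of_means)  # 5.25
print(overall_mean)   # 5
```

The two answers differ because the data sets are different sizes, so the single value 6 is given as much weight as the pair 1 and 8.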
Things to consider:
1) Is what they have tried to analyse clearly stated? Is there a hypothesis or some alternate statement explaining what they are trying to achieve?
(1 mark)
2) Have they attempted to find an average?
Is it the most appropriate average for the task?
Is the average calculated properly?
(1 mark relevant average, 1 mark accuracy)
3) Have they attempted to look at the consistency of the data?
Have they used an appropriate method to measure consistency?
Is their measure of consistency (range, IQR) calculated properly?
(1 mark relevant measure, 1 mark accuracy)
4) Have they drawn a graph or chart to help show the distribution of the data?
(1 mark relevant graph, 1 mark accuracy)
5) Have they written a final comment that refers to their initial statement/hypothesis and that attempts to provide a conclusion?
Does the final comment agree with their actual maths?
Have they referred to/tied their maths to the conclusion? (Eg the mean of …. for boys was greater than the mean for girls ….. therefore …)
Does the conclusion comment on both consistency and averages?
Is there anything in the conclusion to suggest deeper analysis?
Is there anything that makes you go: that's clever, I like that!
(3 marks: you judge!)
When we are analysing numerical data we are interested in 3 things:
1) The Location (Size) of the data
2) The variation (Spread) of the data
3) The shape (Distribution) of the data
1) The Location (Size) of the data
We use averages for this purpose:
Mean
Mode
Median
Mid Range
2) The variation (Spread) of the data
Range
Inter-quartile Range
Standard deviation/Root Mean Squared Deviation
3) The shape (Distribution) of the data
We use graphs for this purpose:
Stem and Leaf diagrams
Box and Whisker Plots
Bar Charts
Histograms
Lesson Objective
Revise basic graph types and their uses
Focus on drawing and interpreting histograms
This data set is the heights of a group of 38 'A' level students.
[Diagram comparing the heights of the GIRLS and the BOYS]
1) How tall is the shortest person in the sample?
2) How many girls are in the sample?
3) What is the range of the boys' heights?
4) What is the median height of the girls?
5) What is the inter-quartile range of the boys' heights?
The Pie Charts show how Year 10 and 11 students travel to school.
From the Pie Chart
a) Can you tell if more Boys or Girls walk to school?
b) If the angle for walking in the girls section is 18 degrees and represents 10 pupils, how many girls were surveyed?
This histogram illustrates the time students in a form group take to get to school in the morning.
a) Find the number of students in the class.
b) Estimate the probability that a randomly chosen pupil takes between 10
and 20 minutes to get to school.
Question 1
The table below shows the heights, to the nearest centimetre, of a group of students.
height (cm)   110-119  120-129  130-134  135-139  140-149  150-159  160-179  180-189
frequency     2        4        3        5        6        5        5        1
a) Draw a histogram for this data.
b) Use your histogram to estimate the number of students taller than 153cm.
c) Estimate the number of students between 127 and 143 cm tall.
The class width of the first bar would appear to be 9, but it is not. Because the heights are
measured to the nearest centimetre, the first class embraces all heights between 109.5cm and
119.5cm. This is a class width of 10, and also involves labelling 109.5, 119.5 etc. on the
horizontal axis of the histogram. Adding the frequency density row to the table...
height (cm)         110-119  120-129  130-134  135-139  140-149  150-159  160-179  180-189
frequency           2        4        3        5        6        5        5        1
frequency density   0.2      0.4      0.6      1        0.6      0.5      0.25     0.1
b) To find how many students are above 153cm in height, we would add the frequencies of the last two bars to the correct proportion of the previous bar:
(6.5/10) × 5 + 5 + 1 = 9.25 ≈ 9 students
So there are approximately 9 students above 153 cm.
c) The number of students between 127 and 143 cm tall is given by the correct proportions of the 119.5 to 129.5 and 139.5 to 149.5 bars plus the frequencies of the two bars in between:
(2.5/10) × 4 + 3 + 5 + (3.5/10) × 6 = 11.1 ≈ 11 students
[Histogram: frequency density (0 to 1.0) against height (cm), with the horizontal axis labelled from 109.5 to 189.5, the value 153 marked, and a key showing the area that represents one person]
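The frequency densities and the two estimates can be reproduced with a short Python sketch. It assumes, as the histogram does, that students are spread evenly within each class; the function names are invented for this illustration.

```python
# Class boundaries allow for heights measured to the nearest centimetre
boundaries = [109.5, 119.5, 129.5, 134.5, 139.5, 149.5, 159.5, 179.5, 189.5]
frequencies = [2, 4, 3, 5, 6, 5, 5, 1]

def frequency_densities():
    """Frequency density = frequency / class width."""
    return [f / (hi - lo)
            for lo, hi, f in zip(boundaries, boundaries[1:], frequencies)]

def estimate_count(low, high):
    """Estimate how many students lie between `low` and `high`.

    Each bar contributes its frequency times the fraction of the bar
    that overlaps the interval (students assumed evenly spread).
    """
    total = 0.0
    for lo, hi, f in zip(boundaries, boundaries[1:], frequencies):
        overlap = min(hi, high) - max(lo, low)
        if overlap > 0:
            total += f * overlap / (hi - lo)
    return total

densities = frequency_densities()
above_153 = estimate_count(153, 189.5)
between_127_and_143 = estimate_count(127, 143)

print(densities)
print(round(above_153, 2), round(between_127_and_143, 2))
```

Using the frequencies in the table, the sketch gives about 9.25 students above 153 cm and about 11.1 students between 127 cm and 143 cm.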
2) Complete the table and histogram below.
time (minutes)   frequency
0-15             90
15-20            40
20-25
25-35
Completed table:
time (minutes)   frequency   frequency density
0-15             90          6
15-20            40          8
20-25            80          16
25-35            100         10
For each graph type below, decide on the most suitable data type(s) (discrete or continuous, numerical or categorical), the advantages and the disadvantages:
Bar Chart
Pie Chart
Stem and Leaf
Box and Whisker
Histogram
Bar Chart
Most suitable data type: Categorical; Discrete
Advantages: Easy to see how many are in each category. Shows shape well.
Disadvantages: Can't see proportions so easily.
Pie Chart
Most suitable data type: Categorical; Discrete
Advantages: Shows proportions clearly.
Disadvantages: Can't see how many are in each category. Not good if there are too many categories.
Stem and Leaf
Most suitable data type: Numerical; small data sets, continuous or discrete
Advantages: Keeps the raw data. Shape of data clear. Ordered data helps with medians etc.
Disadvantages: Not good for large data sets.
Box and Whisker
Most suitable data type: Numerical; Continuous data
Advantages: Good for showing/comparing the spread of data.
Disadvantages: Loses raw data.
Histogram
Most suitable data type: Numerical; Continuous data
Advantages: Good for showing the shape of the data and the proportions.
Disadvantages: Can't read actual frequencies for the groups easily.
Lesson Objective
Be able to calculate measures of Location/Averages
Understand summation notation for the mean
What is an average and why
do we have more than one
way of calculating them?
These quotes might help you consider the answer to this
question:
“Say you were standing with one foot in the oven and one foot in an
ice bucket. According to the percentage people, you should be
perfectly comfortable. ” ~Bobby Bragan, 1963
“The average human has one breast and one testicle.” ~Des McHale
“I abhor averages. I like the individual case. A man may have six
meals one day and none the next, making an average of three meals
per day, but that is not a good way to live.” ~Louis D. Brandeis
Averages for raw/untabulated data
The data shows the number of chocolates gratefully provided to a
particular maths teacher from his sixth form classes over a 3
week period:
Find the mean, mode, median and mid-range of the number of
gifts received:
1, 2, 0, 3, 5, 1, 2, 0, 0, 4, 1, 1, 2, 1, 3
In order: 0, 0, 0, 1, 1, 1, 1, 1, 2, 2, 2, 3, 3, 4, 5
Median: 1
Mode: 1
Mean: $\bar{x} = \frac{\sum xf}{\sum f} = \frac{26}{15} \approx 1.73$
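As a check on the hand calculation, the four averages can be found with Python's standard `statistics` module. This is a small sketch; the variable names are chosen here, and the mid-range is computed as the mean of the smallest and largest values.

```python
from statistics import mean, median, mode

gifts = [1, 2, 0, 3, 5, 1, 2, 0, 0, 4, 1, 1, 2, 1, 3]

gifts_mean = mean(gifts)                    # total of 26 shared over 15 values
gifts_median = median(gifts)                # middle (8th) value once sorted
gifts_mode = mode(gifts)                    # most common value
gifts_mid_range = (min(gifts) + max(gifts)) / 2

print(round(gifts_mean, 2), gifts_median, gifts_mode, gifts_mid_range)
```

The mean, median and mode agree with the values above, and the mid-range comes out as 2.5.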
Averages for tabulated data
Find the mean, mode, median and mid-range for this data, showing shoe size
shoe size (x)    5    6    7    8    9    10   Total
frequency (f)    3    14   13   21   16   8    75
shoe size (x)                 5    6    7    8     9     10   Total
frequency (f)                 3    14   13   21    16    8    75
frequency × shoe size (fx)    15   84   91   168   144   80   582
Median: 75 items of data, so the median is at the (75 + 1)/2 = 38th position.
Counting through the list, the median shoe size is 8.
Mode: 8
Mean: $\bar{x} = \frac{\sum xf}{\sum f} = \frac{582}{75} = 7.76$
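The same working for a frequency table can be sketched in Python. The dictionary layout and the helper `median_from_table` are choices made for this illustration, and the median helper is only intended for an odd total such as 75.

```python
# Shoe-size frequency table from the slide: size -> frequency
shoe_sizes = {5: 3, 6: 14, 7: 13, 8: 21, 9: 16, 10: 8}

n = sum(shoe_sizes.values())                                # total frequency
mean_size = sum(x * f for x, f in shoe_sizes.items()) / n   # sum of fx over sum of f
modal_size = max(shoe_sizes, key=shoe_sizes.get)            # size with the largest frequency
mid_range = (min(shoe_sizes) + max(shoe_sizes)) / 2

def median_from_table(table):
    """Count through the cumulative frequencies to the (n + 1)/2-th item.

    Sketch for an odd total only; an even total may need the mean of two values.
    """
    position = (sum(table.values()) + 1) / 2
    running = 0
    for value in sorted(table):
        running += table[value]
        if running >= position:
            return value

median_size = median_from_table(shoe_sizes)
print(n, round(mean_size, 2), median_size, modal_size, mid_range)
```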
Averages for tabulated data
Find the mean, mode, median and mid-range for this data, showing speeds of
vehicles along a road:
speed, s (mph)   number of vehicles (f)
20 ≤ s < 25      7
25 ≤ s < 30      11
30 ≤ s < 35      31
35 ≤ s < 40      20
40 ≤ s < 45      14
45 ≤ s < 50      9
Total            92
speed, s (mph)   number of vehicles (f)   mid-point (x)   frequency × midpoint (fx)
20 ≤ s < 25      7                        22.5            157.5
25 ≤ s < 30      11                       27.5            302.5
30 ≤ s < 35      31                       32.5            1007.5
35 ≤ s < 40      20                       37.5            750
40 ≤ s < 45      14                       42.5            595
45 ≤ s < 50      9                        47.5            427.5
Total            92                                       3240
Median (estimate): 92 items of data, so the median is at the (92 + 1)/2 = 46.5th position.
Counting through the list this will be in the 30 to 35 interval.
Use a Cumulative Frequency Curve instead for better accuracy!
Modal interval: 30 ≤ s < 35
Mean: can only be estimated, as the raw data is not available: $\bar{x} = \frac{\sum xf}{\sum f} = \frac{3240}{92} \approx 35.22$ mph
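For grouped data the mean is estimated from the class mid-points. A minimal Python sketch of that estimate follows; the tuple layout `(lower, upper, frequency)` and the helper name `median_class` are assumptions made for this example.

```python
# Grouped speed data from the slide: (lower bound, upper bound, frequency)
classes = [(20, 25, 7), (25, 30, 11), (30, 35, 31),
           (35, 40, 20), (40, 45, 14), (45, 50, 9)]

total = sum(f for _, _, f in classes)
# Each class is represented by its mid-point, so this is only an estimate
sum_fx = sum(f * (lo + hi) / 2 for lo, hi, f in classes)
estimated_mean = sum_fx / total

# The classes all have the same width, so the modal class has the largest frequency
modal_class = max(classes, key=lambda c: c[2])[:2]

def median_class(classes):
    """Return the class containing the (n + 1)/2-th item."""
    position = (sum(f for _, _, f in classes) + 1) / 2
    running = 0
    for lo, hi, f in classes:
        running += f
        if running >= position:
            return (lo, hi)

print(total, sum_fx, round(estimated_mean, 2), modal_class, median_class(classes))
```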
Lesson Objective
Be able to calculate Interquartile Range for a list of data
Drawing and Interpreting Box and Whisker Plots
Understanding Skewness and identifying outliers
Two classes did a test (out of 100).
Here are the results:
Class A: 50 82 40 51 45 50 48 49 47 10 43 58 56 52 39 16
Class B: 20 34 50 48 62 70 39 47 12 38 40
a) Find the median and interquartile range of the set of marks for each class.
b) Draw a box and whisker plot to compare the results for each class.
The results in order:
Class A: 10 16 39 40 43 45 47 48 49 50 50 51 52 56 58 82
Class B: 12 20 34 38 39 40 47 48 50 62 70
[Box and whisker plots for CLASS A and CLASS B drawn on a common scale from 0 to 100]
Class A Median: 48.5, IQ Range = 10, Negatively Skewed
Class B Median: 40, IQ Range = 16, Positively Skewed
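The medians and interquartile ranges quoted for the two classes can be reproduced with a short sketch. It assumes one common classroom convention: the quartiles are the medians of the lower and upper halves of the ordered data, leaving out the middle value when the number of items is odd. Other conventions (including those built into some software) can give slightly different quartiles. Class A is taken as the 16 marks in the ordered list.

```python
from statistics import median

def quartiles(data):
    """Return (lower quartile, median, upper quartile) using medians of halves."""
    ordered = sorted(data)
    half = len(ordered) // 2          # the middle value is left out when n is odd
    lower_half = ordered[:half]
    upper_half = ordered[-half:]
    return median(lower_half), median(ordered), median(upper_half)

class_a = [50, 82, 40, 51, 45, 50, 48, 49, 47, 10, 43, 58, 56, 52, 39, 16]
class_b = [20, 34, 50, 48, 62, 70, 39, 47, 12, 38, 40]

for name, marks in (("Class A", class_a), ("Class B", class_b)):
    lq, med, uq = quartiles(marks)
    print(name, "median:", med, "IQR:", uq - lq)
```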
A piece of data is generally considered an outlier if it is:
more than 1.5 × IQR below the lower quartile
OR
more than 1.5 × IQR above the upper quartile
Class A: 10 16 39 40 43 45 47 48 49 50 50 51 52 56 58 82
Class B: 12 20 34 38 39 40 47 48 50 62 70
Are there any outliers in each class?
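The 1.5 × IQR rule can be applied mechanically. The sketch below assumes the quartiles are found as the medians of the lower and upper halves of the ordered data (one common convention); the function name `outliers` is chosen for this example.

```python
from statistics import median

def outliers(data):
    """Return the values lying more than 1.5 x IQR outside the quartiles."""
    ordered = sorted(data)
    half = len(ordered) // 2
    lq, uq = median(ordered[:half]), median(ordered[-half:])
    iqr = uq - lq
    low_fence, high_fence = lq - 1.5 * iqr, uq + 1.5 * iqr
    return [x for x in ordered if x < low_fence or x > high_fence]

class_a = [10, 16, 39, 40, 43, 45, 47, 48, 49, 50, 50, 51, 52, 56, 58, 82]
class_b = [12, 20, 34, 38, 39, 40, 47, 48, 50, 62, 70]

print(outliers(class_a))
print(outliers(class_b))
```

For Class A the boundaries are 41.5 − 15 = 26.5 and 51.5 + 15 = 66.5, so 10, 16 and 82 are flagged. For Class B the boundaries are 10 and 74, so nothing is flagged.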
Design a data set for one of the box and whisker charts on the next page
Swap with a partner
They must design a data set to recreate your graph as best as possible
Compare at the end
Lesson Objective
Be able to calculate the Standard Deviation for a set of data
Use calculator to find the Standard Deviation for a set of data
Write down some statements to compare these two sets of data.
Which features are the same and which are different?
Here is the actual data?
How does this clash with your previous assumptions?
Consider the following sets of numbers.
Find the Range, the Interquartile Range and the Mean.
What are the limitations of the Range and the Interquartile Range in measuring consistency in a data set?
4, 5, 9, 6, 6, 10, 10, 10, 11, 19
The Root Mean Squared Deviation
(Commonly called the Standard Deviation of a Sample)
R.M.S. deviation = $\sqrt{\frac{\sum (x - \bar{x})^2}{n}}$
The value of this equation before you square root is referred to as the VARIANCE
The Standard Deviation for a Population
(It can be shown the Root Mean Squared Deviation formula when calculated on a
sample taken from a population generally produces a result that is lower than the
actual Standard Deviation of the Population – this is S3 + S4). The formula can
therefore be adjusted as follows to take this into account:
S.D. of Population = $\sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$
The value of this equation before you square root is still referred to as the VARIANCE
NOTE: FOR OUR SYLLABUS IT IS EXPECTED THAT YOU WILL ALWAYS USE THE
BOTTOM FORMULA WHEN ASKED TO CALCULATE STANDARD DEVIATION!!
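The difference between the two formulae (dividing by n or by n − 1) can be seen numerically. The sketch below uses the set of numbers 4, 5, 9, 6, 6, 10, 10, 10, 11, 19 from the earlier slide purely as an illustration, and compares the hand formulae with `pstdev` (divides by n) and `stdev` (divides by n − 1) from Python's `statistics` module.

```python
from math import sqrt
from statistics import pstdev, stdev

data = [4, 5, 9, 6, 6, 10, 10, 10, 11, 19]

n = len(data)
mean = sum(data) / n
sum_sq_dev = sum((x - mean) ** 2 for x in data)  # sum of squared deviations from the mean

rmsd = sqrt(sum_sq_dev / n)        # root mean squared deviation (divide by n)
sd = sqrt(sum_sq_dev / (n - 1))    # standard deviation (divide by n - 1)

print(round(rmsd, 2), round(pstdev(data), 2))
print(round(sd, 2), round(stdev(data), 2))
```

Dividing by n − 1 always gives the slightly larger answer, which is the adjustment described above.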
Find the standard deviation for this set of data
Lesson Objective
Understand the concept of ‘Coding’
Be able to find the mean and standard deviation of ‘coded’ data and
related data sets
Here is some data.
We will call this data the ‘x’ data:
Find the mean and the standard deviation of this data?
Check your results on your calculator.
Investigation
Suppose you multiply each of the data you just used by 2 and add 3. Write down
the new set of data. Call it the y-data.
Now calculate the mean and the standard deviation of the y-data.
What do you notice? How is it related to the original x-data?
What if you multiply it by 2 and add 5?
What if you multiply by 3 and add 5?
Can you predict what will happen if you multiply by ‘a’ and add ‘b’?
Can you justify your results?
Suppose you have a set of values (x-data) x1, x2, x3, x4, x5 ……….
Let the mean of the set of data be ‘m’ and the standard deviation ‘s’
Let another set of values (y-data) be so related to the x-data by a linear
formula of the form yi = a × xi + b (‘a’ and ‘b’ are constants)
Then:
The mean of the y values = a × mean of ‘x-data’ + b
The standard deviation of the y values = a × standard deviation of ‘x-data’ (for positive ‘a’; in general the multiplier is the size of ‘a’, ignoring its sign)
We can use this to find the mean of related sets of data.
This process is called ‘Coding’
Eg Consider the values
1002, 1004, 1006, 1008, 1010
This data set is merely the data set 1, 2, 3, 4, 5 multiplied by 2 and with 1000 added. The mean of 1, 2, 3, 4, 5 is 3 and the sd of 1, 2, 3, 4, 5 is 1.58,
so the mean of the original data is 2 × 3 + 1000 = 1006
and the sd of the original data is 2 × 1.58 = 3.16
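The coding result can be checked numerically for this example. A minimal sketch, with `a` and `b` standing for the multiplier and the amount added, and `stdev` being the n − 1 version of the standard deviation:

```python
from statistics import mean, stdev

x_data = [1, 2, 3, 4, 5]
a, b = 2, 1000
y_data = [a * x + b for x in x_data]   # 1002, 1004, 1006, 1008, 1010

# Mean and standard deviation found directly from the y-data
print(mean(y_data), round(stdev(y_data), 2))

# The same values predicted from the x-data by the coding rules
print(a * mean(x_data) + b, round(a * stdev(x_data), 2))
```

Adding b shifts every value by the same amount, so it moves the mean but leaves the spread unchanged.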
Ex 50 Book S1 Third Edition
Lesson Objective
Recognise and be able to use the alternative formula for standard deviation.
75 adults were asked their shoe size. The results are recorded in the table below. Calculate the standard deviation in the shoe sizes using the formula:
$\sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$
Check your result using your calculator.
shoe size (x)    5    6    7    8    9    10   Total
frequency (f)    3    14   13   21   16   8    75
Mean = 582 ÷ 75 = 7.76
shoe size (x)     5         6         7        8        9         10        Total
frequency (f)     3         14        13       21       16        8         75
x × f             15        84        91       168      144       80        582
(x − x̄)² × f      22.8528   43.3664   7.5088   1.2096   24.6016   40.1408   139.68
sd = √(139.68 ÷ 74) = 1.37
An alternative form (a rearrangement) of the formula
$\sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$
is:
$\sqrt{\frac{\sum x^2 - n\bar{x}^2}{n - 1}}$
This gives the same answer but is slightly easier to use when the data is
in a frequency table:
shoe size (x)    5    6     7     8      9      10    Total
frequency (f)    3    14    13    21     16     8     75
x × f            15   84    91    168    144    80    582
x² × f           75   504   637   1344   1296   800   4656
Mean = 582 ÷ 75 = 7.76
sd = √((4656 − 75 × 7.76²) ÷ 74) = 1.37
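Both versions of the formula can be run on the shoe-size table to confirm that they agree. This is a sketch only; the dictionary layout and the variable names are choices made for the illustration.

```python
from math import sqrt

# Shoe-size frequency table: size -> frequency
table = {5: 3, 6: 14, 7: 13, 8: 21, 9: 16, 10: 8}

n = sum(table.values())
sum_fx = sum(f * x for x, f in table.items())        # total of the x × f row
sum_fx2 = sum(f * x * x for x, f in table.items())   # total of the x² × f row
mean = sum_fx / n

# Definition: squared deviations from the mean, weighted by frequency
sd_definition = sqrt(sum(f * (x - mean) ** 2 for x, f in table.items()) / (n - 1))

# Rearranged form: only the two column totals and the mean are needed
sd_alternative = sqrt((sum_fx2 - n * mean ** 2) / (n - 1))

print(sum_fx, sum_fx2, round(sd_definition, 2), round(sd_alternative, 2))
```

The rearranged form avoids working out a deviation for every row, which is why it is quicker by hand.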
50 female students had their heights measured. The results were put into
the table below. Find the mean height and the standard deviation in the
heights:
Check your result using your calculator.
Height, h (cm) mid-points   158.5   160.5   162.5   164.5   166.5   168.5   Total
frequency (f)               4       11      19      8       5       3       50
Using the mid-points and frequencies in the table:
Mean = 8141 ÷ 50 = 162.82 cm
sd = √(322.88 ÷ 49) = 2.57 cm
Different style of exam question
Standard deviation formulae:
$\sqrt{\frac{\sum x^2 - n\bar{x}^2}{n - 1}}$ and $\sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$
Given the following information relating to data placed in a frequency distribution.
Find the mean and the standard deviation of the data
Mean = 6.1
sd = 2.25 (3 sig fig)
Lesson Objective
Understand what cumulative frequency curves represent
Be able to draw a cumulative frequency curve
Use a cumulative frequency curve to find medians, quartiles and
percentiles
An egg farmer wants to grade his eggs in terms of size.
Grade A will be the biggest size of egg
Grade B the next biggest, etc., with Grade D the smallest.
Each grading should contain the same proportion of eggs.
The table shows the weight of his first batch of eggs.
What ‘boundaries’ should he choose for each egg Grade?
Weight of the Egg, w (grams)   Frequency
30 ≤ w < 40                    15
40 ≤ w < 50                    25
50 ≤ w < 60                    50
60 ≤ w < 70                    40
70 ≤ w < 80                    10
Weight of the Egg, w (grams)   Cum. Freq.
0 ≤ w < 40                     15
0 ≤ w < 50                     40
0 ≤ w < 60                     90
0 ≤ w < 70                     130
0 ≤ w < 80                     140
The quartiles will be roughly at the 35th (LQ), 70th (MEDIAN) and 105th (UQ) eggs.
LQ could be found by saying 40 + 20/25 of 10 = 48
MEDIAN: 50 + 30/50 of 10 = 56
UQ: 60 + 15/40 of 10 = 63.75
But this approach assumes a linear growth in the frequency across each interval
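The interpolation used for the three grade boundaries can be written as one small function. This sketch makes the same assumption as the hand method, that the eggs are spread evenly (linearly) across each interval; the function name `interpolate` is chosen for this example.

```python
# Egg weights from the table: (lower bound, upper bound, frequency)
classes = [(30, 40, 15), (40, 50, 25), (50, 60, 50), (60, 70, 40), (70, 80, 10)]

def interpolate(position):
    """Estimate the weight of the egg at a given cumulative position."""
    cumulative = 0
    for lo, hi, f in classes:
        if cumulative + f >= position:
            # Go the right fraction of the way across this interval
            return lo + (position - cumulative) / f * (hi - lo)
        cumulative += f
    raise ValueError("position is beyond the total frequency")

total = sum(f for _, _, f in classes)
lower_quartile = interpolate(total / 4)
median_weight = interpolate(total / 2)
upper_quartile = interpolate(3 * total / 4)

print(total, lower_quartile, median_weight, upper_quartile)
```

The three values returned are the boundaries between Grades D, C, B and A under this assumption.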
[Cumulative frequency graph to complete: cumulative frequency (0 to 140) against weight (30 to 80 grams), using the frequency table above and a cumulative frequency table with rows 0 ≤ w < 40, 0 ≤ w < 50, 0 ≤ w < 60, 0 ≤ w < 70 and 0 ≤ w < 80]
a) How many eggs did the farmer harvest on this particular day?
b) Estimate the Median weight of the eggs collected.
c) Estimate the Inter-quartile range in the Eggs collected.
Graph shows how long people waited to be seen at an eye clinic.
[Cumulative frequency graph: cumulative frequency (0 to 100) against time in mins (30 to 60)]
Waiting Time   Cum. Freq.
0 ≤ w < 35     10
0 ≤ w < 40     21
0 ≤ w < 45     46
0 ≤ w < 50     73
……etc.         …etc
Key Points:
You plot Cumulative Frequency at the end of the interval: (35,10), (40,21) etc.
Cumulative Frequency goes up the side.
Horizontal axis has a continuous scale.
A Cumulative frequency graph tells you how many items are below each value. Here 80 people waited for less than 53 mins. It is mainly used to estimate medians and percentiles for grouped data.
There were 100 people. The median waiting time is that obtained by the 50th person (half of 100) = 46 mins.
To find the Upper quartile, read the time at 75. For the lower quartile read the time at 25.
Can you find data sets to match these cumulative frequency curves?
Summary of what we have learned:
When comparing data we are interested in the location of the data (averages), the consistency of the data (measures of spread) and the shape of the data (graphs).
Averages: A single item of data that represents the whole data set
Mean, Mode, Median, Mid Range
Spread: Range, Interquartile Range, Root Mean Squared Deviation, Standard Deviation
Shape: Bar Charts, Frequency Charts, Histograms, Frequency Polygons
Can also draw Box and Whisker Plots (Good for showing skewness and spread),
Pie Charts (Good for showing proportions) and
Cumulative Frequency Curves (Good for finding Interquartile Range for grouped data)
Standard deviation formulae:
$\sqrt{\frac{\sum x^2 - n\bar{x}^2}{n - 1}}$ and $\sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$
Root Mean Squared Deviation formulae:
$\sqrt{\frac{\sum x^2}{n} - \bar{x}^2}$ and $\sqrt{\frac{\sum (x - \bar{x})^2}{n}}$
The formula for the Variance is that for standard deviation without the square root
Outliers are defined as being either: more than 1.5 × IQR above the UQ or below the LQ,
or more than 2 standard deviations above or below the mean