Download 1 - Education Scotland

Statistics
What are statistics and why do we use them?
Statistics help to make sense of numbers that have been collected.
For example, if there was a survey into the size of feet at your
school, after asking everyone their size you would end up with
hundreds of random numbers!!
Using statistics you could sort the numbers out to find:
• the most common size
• the mean
• the range of sizes.
There are many other, more complex ways to evaluate data, which
are covered in Intermediate 2.
Statistics
At Intermediate 1 level you covered basic statistics including:
• finding the mean, median, range and mode from a set of numbers
• finding the mean, median, range and mode from a frequency table
• finding the probability of an event occurring.
In Intermediate 2 you build on this work to calculate other values
that are used to evaluate data.
Before we go any further into the new work, we will go over the
Intermediate 1 work.
Statistics
The range is used to measure how widely spread a set of values is:
range = highest value – lowest value
Example: Stephen played 12 holes at his local golf club and
recorded his scores. What was his range of scores?
4, 3, 4, 6, 5, 3, 8, 6, 9, 2, 3, 7
range = highest value – lowest value
= 9 – 2
= 7
Example: The next day he played another 12 holes. What is
his range of scores now?
2, 9, 10, 9, 13, 12, 12, 11, 1, 3, 4, 2
range = highest value – lowest value
= 13 – 1
= 12
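The two golf examples above can be checked with a short Python sketch. The function name value_range is only an illustrative choice and is not part of the course notes.

```python
def value_range(values):
    """Return the highest value minus the lowest value."""
    return max(values) - min(values)


# Stephen's scores from the two examples above
first_round = [4, 3, 4, 6, 5, 3, 8, 6, 9, 2, 3, 7]
second_round = [2, 9, 10, 9, 13, 12, 12, 11, 1, 3, 4, 2]

print(value_range(first_round))   # 9 - 2
print(value_range(second_round))  # 13 - 1
```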
Statistics
The average of a set of numbers can be calculated in three
different ways.
1. The mode is the most common number.
Example: A group of pupils were asked how many kilometres
they could run.
4, 3, 2, 4, 4, 4, 3, 1, 5, 6, 4
mode = 4
Statistics
2. The mean is found by adding all the numbers together, then
dividing by the number of pieces of data.
Example: 11 people were asked how much pocket money they got.
What is the mean amount?
3, 4, 2, 5, 3, 6, 8, 5, 5, 7, 7
$\text{mean} = \frac{3 + 4 + 2 + 5 + 3 + 6 + 8 + 5 + 5 + 7 + 7}{11} = \frac{55}{11} = 5$
Statistics
3. The median is the middle number.
To see which number is in the middle you have to put them in order.
Example: Julie saved up some of her pocket money over 11
weeks for an iPod Touch. What is the median amount
she saved each week?
3, 4, 2, 5, 3, 6, 8, 4, 5, 6, 7
Rearrange:
2, 3, 3, 4, 4, 5, 5, 6, 6, 7, 8
median = 5
Once you’ve rearranged the numbers, count them to make sure you
haven’t missed any of them out.
There are 11 numbers here, so the median will be the 6th number.
Statistics
If you have an even number of amounts, the median will be between
two numbers. To calculate this value, find the mean of the two middle
numbers.
Example: Linda saved up some of her pocket money over a
10-week period for a Wii. What was the median
amount that she saved?
3, 4, 2, 6, 3, 6, 8, 4, 6, 7
Rearrange:
2, 3, 3, 4, 4, 6, 6, 6, 7, 8
There are 10 numbers here, so the median will be between the
5th and 6th numbers.
The median lies between £4 and £6. To calculate this value, find
the mean of these two numbers.
median = (£4 + £6) ÷ 2 = £5
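As a rough check of the three averages, Python's built-in statistics module can be used on the data from the examples above. This is only a sketch; the variable names are invented for illustration.

```python
from statistics import mean, median, mode

distances_km = [4, 3, 2, 4, 4, 4, 3, 1, 5, 6, 4]   # mode example
pocket_money = [3, 4, 2, 5, 3, 6, 8, 5, 5, 7, 7]   # mean example
julie_savings = [3, 4, 2, 5, 3, 6, 8, 4, 5, 6, 7]  # 11 values (odd count)
linda_savings = [3, 4, 2, 6, 3, 6, 8, 4, 6, 7]     # 10 values (even count)

print(mode(distances_km))     # most common value
print(mean(pocket_money))     # total divided by the number of values
print(median(julie_savings))  # middle value after sorting
print(median(linda_savings))  # mean of the two middle values after sorting
```

Note that median sorts the values for you, so there is no need to rearrange the list first.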
Statistics
Write in your jotters the range, mode, median and mean of the
following sets of numbers.
1) The following are the distances jumped in a school sports day
in metres.
4, 3, 5, 6, 4, 5, 6, 7, 8, 12, 6
2) The following numbers are the maths scores in an S2 class.
17, 12, 13, 12, 14, 15, 16, 17, 18, 17, 17, 20
3) The following are times for the 100m sprint (in seconds).
24.5, 19.86, 21.15, 15.04, 15.10, 16.80, 20, 19.86, 14.22
4) The following are minimum temperatures (°C) in Glasgow
measured over one week.
0, 3, 1, 0, 0, 0, 4, 3, 2, 3
Statistics
Now check that your answers are correct.
1) Range: 9       Mode: 6       Median: 6       Mean: 6
2) Range: 8       Mode: 17      Median: 16.5    Mean: 15.67
3) Range: 10.28   Mode: 19.86   Median: 19.86   Mean: 18.5
4) Range: 4       Mode: 0       Median: 1.5     Mean: 1.6
Frequency tables
It is possible to calculate the mean, mode and median from a
frequency table by adding a third column to it.
The values for the third column are found by multiplying
the values in column 1 (x) by the values in column 2 (f).

Number of items (x)   Frequency (f)   fx
        1                  7          1 × 7 = 7
        2                  4          2 × 4 = 8
        3                  3          3 × 3 = 9
        4                  4          4 × 4 = 16
        5                  4          5 × 4 = 20

The mode is the most common number = 1
The number 1 appears seven times.
Because there are 22 numbers (7 + 4 + 3 + 4 + 4), the median will be
between the 11th and 12th numbers. The 11th number is 2 and the 12th
is 3, so the median is 2.5.
Frequency tables
To calculate the mean you use this formula:
$\text{mean} = \frac{\sum fx}{\sum f}$
∑ stands for ‘sum of’
$\text{mean} = \frac{60}{22} \approx 2.73$

Number of items (x)   Frequency (f)   fx
        1                  7          1 × 7 = 7
        2                  4          2 × 4 = 8
        3                  3          3 × 3 = 9
        4                  4          4 × 4 = 16
        5                  4          5 × 4 = 20
     Totals               22          60
Frequency tables
Copy and complete the frequency table to work out the mean, mode
and median of the number of cars in a group of pupils’ homes.
Number of cars (x)   Frequency (f)   fx
        1                 4
        2                 9
        3                10
        4                 6
        5                 1
     Totals
Frequency tables
Now check that your answers are correct.
Number of cars (x)   Frequency (f)   fx
        1                 4           4
        2                 9          18
        3                10          30
        4                 6          24
        5                 1           5
     Totals              30          81
Mean: 2.7
Mode: 3
Median: 3
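The answers for the car table can be reproduced with a small Python sketch. Storing the table as a dictionary that maps each value x to its frequency f is an assumption made for illustration; any layout that keeps x and f together would work.

```python
from statistics import median

# Number of cars (x) mapped to frequency (f), taken from the table above
cars = {1: 4, 2: 9, 3: 10, 4: 6, 5: 1}

total_f = sum(cars.values())                     # sum of the f column
total_fx = sum(x * f for x, f in cars.items())   # sum of the fx column
mean_cars = total_fx / total_f                   # mean = (sum of fx) / (sum of f)

# The mode is the x value with the largest frequency
mode_cars = max(cars, key=cars.get)

# Writing every value out f times gives the full sorted list for the median
expanded = [x for x, f in sorted(cars.items()) for _ in range(f)]
median_cars = median(expanded)

print(total_f, total_fx, mean_cars, mode_cars, median_cars)
```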
Frequency tables
Copy and complete the frequency table to work out the mean, mode
and median of the shoe sizes of S1 pupils.
Shoe size (x)   Frequency (f)   fx
      3               1
      4              12
      5              11
      6              14
      7               8
   Totals
Frequency tables
Now check that your answers are correct.
Shoe size (x)   Frequency (f)   fx
      3               1           3
      4              12          48
      5              11          55
      6              14          84
      7               8          56
   Totals            46         246
Mean: 5.3
Mode: 6
Median: 5
Cumulative frequency
A cumulative frequency column can be added to a frequency table
to keep a running total of the frequencies.
A group of parents were asked how many children they each had.
Number of children (x)   Frequency (f)   Cumulative frequency
          1                   6            6
          2                   7           13  (6 + 7)
          3                   5           18  (13 + 5)
          4                   3           21  (18 + 3)
          5                   1           22  (21 + 1)
       Total                 22
The 21 tells you that 21 parents had 4 or fewer children.
You can easily work out the median from a cumulative frequency column.
There were 22 parents asked so the median is between the 11th and 12th
people asked.
6 parents had 1 child and 13 had 2 or fewer.
Both the 11th and 12th parents are therefore in the group with 2 children,
so the median is 2 children.
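The running total and the median for the parents' table can be sketched in Python as follows. The helper value_at is a hypothetical name introduced here; it simply walks down the cumulative frequency column until it reaches the requested position.

```python
from itertools import accumulate

children = [1, 2, 3, 4, 5]     # x column
frequency = [6, 7, 5, 3, 1]    # f column

# Running total of the frequencies
cumulative = list(accumulate(frequency))


def value_at(position):
    """Return the x value of the person at the given position (counting from 1)."""
    for x, running_total in zip(children, cumulative):
        if position <= running_total:
            return x
    raise ValueError("position is larger than the total frequency")


n = cumulative[-1]  # total number of parents asked
if n % 2 == 0:
    # Even count: take the mean of the two middle positions
    median_children = (value_at(n // 2) + value_at(n // 2 + 1)) / 2
else:
    median_children = value_at((n + 1) // 2)

print(cumulative, median_children)
```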
Cumulative frequency
The number of pairs of shoes owned by 5th year girls is shown
below.
Copy and complete the cumulative frequency table.
Number of shoes (x)   Frequency (f)   Cumulative frequency
        4                  9
        5                 12
        6                  8
        7                 11
        8                  6
1) How many girls owned fewer than 6 pairs of shoes?
2) What was the median number of shoes owned by the girls?
Cumulative frequency
Now check that your answers are correct.
Number of shoes (x)   Frequency (f)   Cumulative frequency
        4                  9            9
        5                 12           21
        6                  8           29
        7                 11           40
        8                  6           46
     Total                46
1) 21 girls
2) Median = 6
Cumulative frequency
A group of 4th year boys were asked how many hours a week
they spent playing computer games. Copy and complete
the cumulative frequency table.
Number of hours (x)   Frequency (f)   Cumulative frequency
        5                  2
        6                  5
        7                 12
        8                 17
        9                 20
1) How many boys played games for less than 8 hours a week?
2) What was the median number of hours spent playing computer games?
Cumulative frequency
Now check that your answers are correct.
Number of hours (x)   Frequency (f)   Cumulative frequency
        5                  2            2
        6                  5            7
        7                 12           19
        8                 17           36
        9                 20           56
     Total                56
1) 19 boys
2) Median = 8
Quartiles
To order a set of numbers into quartiles, we first of all
have to put the numbers in order from the lowest to the highest.
10, 11, 11, [12], 12, 12, 13, [13], 13, 14, 14, [14], 15, 15, 30
Q1 = 12, Q2 = 13, Q3 = 14 (shown in brackets)
The median splits the numbers into two equal parts and is the second
quartile, Q2.
To find the other two quartiles, Q1 and Q3, calculate the medians
of the lower and upper halves.
The median of the lower half is called Q1.
The median of the upper half is called Q3.
The quartiles must divide the numbers into four groups with the
same number of values in each group, in this case groups of three.
Quartiles
If you have a larger group of numbers, it might not be so easy
to find which number to look to for the median and the quartiles.
The following rule will help you decide, no matter how many numbers
you have.
1. Divide the number of values by 4.
2. Your answer will tell you how many numbers will be in each group.
3. The remainder will tell you how many extra values there are.
This will be 0, 1, 2 or 3.
Example 1
12 numbers
2, 3, 3, 3, 4, 6, 7, 7, 8, 8, 8, 9
12 ÷ 4 = 3 r 0, therefore there will be 3 in each quarter, with 0 extra
values to be fitted in.
2, 3, 3 | 3, 4, 6 | 7, 7, 8 | 8, 8, 9
Each quartile lies on a dividing line, midway between the numbers on either side.
Q1 = 3
Q2 = 6.5
Q3 = 8
Quartiles
Example 2
13 numbers
0, 1, 2, 2, 2, 2, 3, 5, 6, 7, 7, 7, 9
13 ÷ 4 = 3 r 1, therefore there will be 3 in each quarter, with 1 extra
value to be fitted in symmetrically.
0, 1, 2 | 2, 2, 2 | [3] | 5, 6, 7 | 7, 7, 9
The extra value, shown in brackets, is the median Q2.
Q1 = 2
Q2 = 3
Q3 = 7
Example 3
14 numbers
0, 0, 0, 1, 1, 2, 2, 3, 4, 4, 5, 5, 5, 6
14 ÷ 4 = 3 r 2, therefore there will be 3 in each quarter, with 2 extra
values to be fitted in symmetrically.
0, 0, 0, [1], 1, 2, 2 | 3, 4, 4, [5], 5, 5, 6
The extra values, shown in brackets, are Q1 and Q3.
Q1 = 1
Q2 = 2.5
Q3 = 5
Quartiles
Example 4
15 numbers
0, 1, 2, 3, 3, 3, 3, 4, 5, 5, 6, 7, 7, 8, 8
15 ÷ 4 = 3 r 3, therefore there will be 3 in each quarter, with 3
extra values to be fitted in symmetrically.
0, 1, 2, [3], 3, 3, 3, [4], 5, 5, 6, [7], 7, 8, 8
The extra values, shown in brackets, are Q1, Q2 and Q3.
Q1 = 3
Q2 = 4
Q3 = 7
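The four examples above can be checked with a short Python sketch of the method described earlier: Q2 is the median, Q1 is the median of the lower half and Q3 is the median of the upper half. The function name quartiles is an illustrative choice. Other quartile conventions exist, so some calculators and software may give slightly different values for the same data.

```python
from statistics import median


def quartiles(values):
    """Return (Q1, Q2, Q3) using the median-of-halves method."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    # With an odd count the middle value is Q2 itself, so it is left out of both halves
    upper = ordered[half + 1:] if n % 2 else ordered[half:]
    return median(lower), median(ordered), median(upper)


example_1 = [2, 3, 3, 3, 4, 6, 7, 7, 8, 8, 8, 9]           # 12 numbers
example_2 = [0, 1, 2, 2, 2, 2, 3, 5, 6, 7, 7, 7, 9]        # 13 numbers
example_3 = [0, 0, 0, 1, 1, 2, 2, 3, 4, 4, 5, 5, 5, 6]     # 14 numbers
example_4 = [0, 1, 2, 3, 3, 3, 3, 4, 5, 5, 6, 7, 7, 8, 8]  # 15 numbers

for data in (example_1, example_2, example_3, example_4):
    print(quartiles(data))
```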
Five-figure summary
A five-figure summary is a summary of a set of numbers.
The five figures are the three quartiles (Q1, Q2 and Q3) together with
the highest and lowest numbers.
10, 11, 11, [12], 12, 12, 13, [13], 13, 14, 14, [14], 15, 15, 30
(Q1, Q2 and Q3 are shown in brackets.)
Using the previous example showing how to calculate the quartiles,
the five-figure summary is as follows:
highest: 30
lowest: 10
Q1: 12
Q2: 13
Q3: 14
Five-figure summary
In your jotters, write down the five-figure summary for each set
of numbers.
1) 14 pupils in S4 were asked their shoe size.
4, 5, 3, 6, 4, 7, 4, 5, 6, 6, 4, 7, 8, 9
2) A group of paper boys were asked how much they earn a week.
13, 14, 15, 12, 16, 17, 18, 18, 18, 19, 29, 39, 38, 37, 36, 37
3) English exam scores
   0 | 0 1 2 3
   1 | 1 2 4 5 5 6 7
   2 | 2 2 3 4 4 6 8 9
   3 | 0 0 1
   4 | 2
   n = 23        1 | 1 represents a score of 11
4) A group at the swimming pool were asked their ages.
12, 8, 7, 19, 23, 25, 20, 14
Five-figure summary
Now check that your answers are correct.
1) Minimum: 3    Maximum: 9    Q1: 4      Q2: 5.5    Q3: 7
2) Minimum: 12   Maximum: 39   Q1: 15.5   Q2: 18     Q3: 36.5
3) Minimum: 0    Maximum: 42   Q1: 12     Q2: 22     Q3: 28
4) Minimum: 7    Maximum: 25   Q1: 10     Q2: 16.5   Q3: 21.5
The range
Up until now, when we calculated the range of a set of numbers,
we took the lowest number from the highest.
In certain situations, however, this will not give an accurate
reflection of the spread of the numbers.
For example, here are the ages of a group of children in the
scouts and their leader.
10, 11, 12, 14, 13, 15, 13, 12, 11, 12, 14, 15, 14, 13, 30
The range here is the highest take away the lowest:
30 – 10 = 20
All of the children are aged between 10 and 15.
The leader of the group is 30 and this gives a false impression of how
widely spread the ages are.
The range only uses the two end ages and disregards all the others.
Another measure of spread is the semi-interquartile range, which
takes into account more of the numbers to give a more accurate
and relevant result.
The semi-interquartile range
Now that we know how to work out the quartiles, we can calculate
the semi-interquartile range.
Using the example of the scout group, we found that the
range was 20 years.
However, because one person was so much older than the rest, this
was not an accurate reflection of the range of ages.
10, 11, 11, [12], 12, 12, 13, [13], 13, 14, 14, [14], 15, 15, 30
(Q1, Q2 and Q3 are shown in brackets.)
To calculate the semi-interquartile range, you find the difference
between the upper quartile Q3 and the lower quartile Q1, and then
halve your answer:
$\text{semi-interquartile range} = \frac{\text{upper quartile} - \text{lower quartile}}{2} = \frac{14 - 12}{2} = 1$
The semi-interquartile range
This measure of spread is often preferred to the range, which only
takes the lowest value from the highest.
The reason for this is that it takes into account more of the
numbers in the data and it also disregards what can sometimes
be extreme high or low numbers that are not typical of the data.
If you ever forget the formula for calculating the
semi-interquartile range, you could construct it by breaking down the
words.
Interquartile range is the range between the upper and lower
quartiles.
To calculate the ‘semi’, divide your answer by 2.
The semi-interquartile range
1) A group of 20 pupils were asked how much pocket money they got
each week.
7, 7, 2, 2, 3, 4, 5, 9, 10, 3, 4, 5, 6, 6, 7, 7, 7, 6, 3, 3
In your jotters, write down the five-figure summary, range and
semi-interquartile range.
2) A group of 15 pupils were asked what their shoe size was.
12, 9, 2, 3, 7, 8, 5, 5, 6, 8, 9, 10, 5, 4, 6
In your jotters, write down the five-figure summary, range and
semi-interquartile range.
The semi-interquartile range
Now check that your answers are correct.
1) Minimum: 2   Maximum: 10   Q1: 3   Q2: 5.5   Q3: 7
   Range: 8     Semi-interquartile range: 2
2) Minimum: 2   Maximum: 12   Q1: 5   Q2: 6     Q3: 9
   Range: 10    Semi-interquartile range: 2
Comparing sets of data
Very often the reason for using statistics is to compare
two or more sets of results.
Once you have statistics for two or more sets of data, you can make
statements based on the results.
Example:
As part of a school project, pupils from two schools were
asked how much pocket money they received each week.
Quahog School: 8, 6, 5, 4, 6, 5, 7, 10, 10, 7, 8, 9, 7
Springfield Elementary: 12, 7, 3, 4, 2, 3, 4, 4, 5, 2, 6, 3, 4
Quahog: Mean = £7.08   Median = £7.00
Springfield: Mean = £4.54   Median = £4.00
Comparing sets of data
By calculating the mean and median from each set of data, what
statements can be made about how much each child receives?
Quahog: Mean = £7.08   Median = £7.00
Springfield: Mean = £4.54   Median = £4.00
By looking at the mean and median of both sets of data, we can see
that the children at Quahog are given more pocket money on average
than the children that go to Springfield Elementary.
The mean and median are similar in each school, which suggests that
they are both a good indication of the average given to each child.
Comparing sets of data
By comparing the mean, median and range of the following
sets of data, what statements can be made about the data?
1) Two companies that produce boxes of paper clips claim that they
provide their customers with more paper clips in each box.
The boxes cost the same from each company.
Clips R Us: 102, 106, 101, 100, 99, 92, 96, 100, 101, 110, 90
Pippa’s Clippas: 87, 120, 104, 102, 100, 98, 97, 100, 101, 102, 95
2) A teacher wanted to compare the marks of her two first-year
classes. What conclusions can you make about the scores?
Class 2A: 18, 19, 17, 18, 17, 17, 18, 18, 19, 17, 20, 16, 13, 12
Class 2B: 20, 20, 19, 3, 2, 4, 6, 10, 11, 3, 2, 15, 16, 17
Comparing sets of data
Now check that your answers are correct.
1) Clips R Us: Mean: 99.7 Median: 100 Range: 20
Pippa’s Clippas: Mean: 100.5 Median: 100 Range: 33
By comparing the median we can see no difference in the results.
The mean shows that Pippa’s Clippas have slightly more on
average in each box. However, the range is much bigger, meaning
that the amount in each box could vary by a fairly large amount in
comparison to Clips R Us.
2) Class 2A: Mean: 17.1 Median: 17.5 Range: 8
Class 2B: Mean: 10.6 Median: 10.5 Range: 18
The mean for each class tells us that class 2A achieved a higher
mark on average than 2B and the median backs this up. The range
in 2B is very high, suggesting that while some people did very well,
others did very poorly. The range in 2A is much smaller, showing that
the scores in this class were closer together and suggesting that its
pupils are closely matched in ability.
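When comparing sets of data it can help to compute the same statistics for each set side by side. The sketch below does this for the two paper clip companies; the function name summarise and the rounding of the mean to 1 decimal place are choices made here for illustration.

```python
from statistics import mean, median


def summarise(values):
    """Return the mean (to 1 decimal place), median and range of a data set."""
    return {
        "mean": round(mean(values), 1),
        "median": median(values),
        "range": max(values) - min(values),
    }


clips_r_us = [102, 106, 101, 100, 99, 92, 96, 100, 101, 110, 90]
pippas_clippas = [87, 120, 104, 102, 100, 98, 97, 100, 101, 102, 95]

print("Clips R Us:", summarise(clips_r_us))
print("Pippa's Clippas:", summarise(pippas_clippas))
```

The numbers alone do not make the comparison for you; you still have to say what the difference in mean or range suggests about the data.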
Box plots
Once you have a five-figure summary you can represent the
information on a box plot.
Using the example earlier, on the scout trip we calculated that
the quartiles were 12, 13 and 14.
10, 11, 11, [12], 12, 12, 13, [13], 13, 14, 14, [14], 15, 15, 30
(Q1, Q2 and Q3 are shown in brackets.)
We can see from the list that the lowest number is 10
and the highest number is 30.
This information can be represented on a box plot.
[Box plot drawn on a scale from 0 to 30: lowest = 10, Q1 = 12, median Q2 = 13, Q3 = 14, highest = 30]
Standard deviation
So far we have looked at two methods for checking the spread of
numbers: the range and the semi-interquartile range.
The last measure of spread of data we are going to look at is called
standard deviation.
The reason that we need to use another method is because of the
limitations of the range and semi-interquartile range, which are:
• the range only uses the two end values, ignoring every other value
• the semi-interquartile range totally disregards the two end-values.
The standard deviation is often a more useful measure of spread
because it takes into account all of the numbers.
When you work out the standard deviation you obtain a number.
This number tells you how far away, on average, each of the values is
from the mean.
Standard deviation
To work out the standard deviation of a group of numbers we are
going to divide the calculation into four steps.
Example: The following group of numbers is how late the bus was
(in minutes) each day as George went to work one week.
23, 15, 7, 8, 7
Calculate the mean and the standard deviation.
Step 1 Calculate the mean.
Each value when we use standard deviation is represented with an x.
mean = (the sum of all the x values) ÷ (the number of values)
We are going to be using some new notation for this:
$\bar{x}$ (pronounced x bar) is the mean
$\sum x$ is the sum of all the x values
$n$ is the number of values used
23, 15, 7, 8, 7
$\bar{x} = \frac{\sum x}{n} = \frac{23 + 15 + 7 + 8 + 7}{5} = \frac{60}{5} = 12$ is the mean
Step 2 Now we draw a table to see how far each value is from
the mean.
We now need to find the mean of these $(x - \bar{x})$ values but if we
add them together we get zero!
To get round this problem we square each value. The
negatives disappear and we add an extra column to the
table.

 x     $x - \bar{x}$      $(x - \bar{x})^2$
 23    23 – 12 = 11      121
 15    15 – 12 = 3         9
 7     7 – 12 = -5        25
 8     8 – 12 = -4        16
 7     7 – 12 = -5        25

Step 3 We now find the mean of the numbers in the last
column. For standard deviation we divide the total by the
number of values minus 1. In this case 5 – 1 = 4.
(121 + 9 + 25 + 16 + 25) ÷ 4 = 196 ÷ 4 = 49
Step 4 Remember that we squared the numbers in step 2 so
now we must find the square root of 49.
$\sqrt{49} = 7$
This number is called the standard deviation and is a measure
of how far the values are, on average, from the mean.
The formula for standard deviation is:
$\text{standard deviation} = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$
in this case = 7
When the standard deviation is low it means the scores are
close to the mean. When it is high it means they are spread
out from the mean. In this case it is a high number in relation
to the mean, so the numbers are spread out from the mean.
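The four steps can be written out as a short Python sketch. The function name sample_standard_deviation is an illustrative choice; the calculation divides by n – 1, as in the steps above.

```python
from math import sqrt


def sample_standard_deviation(values):
    """Work through the four steps for a list of numbers."""
    n = len(values)
    mean = sum(values) / n                                # Step 1: the mean
    squared_gaps = [(x - mean) ** 2 for x in values]      # Step 2: squared distances from the mean
    mean_of_squares = sum(squared_gaps) / (n - 1)         # Step 3: divide the total by n - 1
    return sqrt(mean_of_squares)                          # Step 4: square root


# How late George's bus was (in minutes) each day
minutes_late = [23, 15, 7, 8, 7]

print(sample_standard_deviation(minutes_late))
```

Python's statistics.stdev function uses the same n – 1 divisor, so it can serve as an independent check on a hand calculation.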
Standard deviation
$\text{standard deviation} = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$
Use this formula to calculate the standard deviation of the
following sets of data in your jotters.
1) The ages of four people who climbed Everest are:
28, 43, 50, 27
2) The following times show the 0 to 60 acceleration of different
BMWs:
6.0, 5.2, 10.7, 9.6, 8.3, 11.5, 7.5
3) The following scores were recorded at a golf competition:
68, 72, 70, 71, 69
Standard deviation
Now check that your answers are correct.
1) 11.3
2) 2.4
3) 1.6
Standard deviation
There is one final formula that can be used to find the standard
deviation from a set of numbers.
You will have noticed that in the previous examples when you
calculated the mean at the beginning, it gave an easy-to-use number,
i.e. the mean was either a whole number or a decimal number to
1 decimal place.
If you calculate the mean and you have a number with many decimal
places, you can use an alternative formula.
This still gives the same answer as the one we found before,
but this formula is easier to use for numbers that have more decimal places.
$\text{standard deviation} = \sqrt{\frac{\sum x^2 - (\sum x)^2 / n}{n - 1}}$
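To check that the two formulas really do agree, both can be applied to the same data. The sketch below uses the bus times from the earlier example, where the answer is already known to be 7; the function names are invented for illustration.

```python
from math import sqrt


def sd_from_definition(values):
    """Standard deviation using the sum of (x - mean) squared."""
    n = len(values)
    mean = sum(values) / n
    return sqrt(sum((x - mean) ** 2 for x in values) / (n - 1))


def sd_from_totals(values):
    """Standard deviation using only the totals of x and x squared."""
    n = len(values)
    sum_x = sum(values)
    sum_x_squared = sum(x * x for x in values)
    return sqrt((sum_x_squared - sum_x ** 2 / n) / (n - 1))


minutes_late = [23, 15, 7, 8, 7]

print(sd_from_definition(minutes_late))
print(sd_from_totals(minutes_late))
```

The second version never needs the mean, which is why it is convenient when the mean has many decimal places.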
Standard deviation
Example: Calculate the mean and standard deviation of the
following numbers.
22, 23, 21, 20, 20.4, 21.3
$\bar{x} = \frac{\sum x}{n} = \frac{22 + 23 + 21 + 20 + 20.4 + 21.3}{6} = 21.28333\ldots$

 x       x²
 22      484
 23      529
 21      441
 20      400
 20.4    416.16
 21.3    453.69
 $\sum x = 127.7$     $\sum x^2 = 2723.85$

$\text{standard deviation} = \sqrt{\frac{\sum x^2 - (\sum x)^2 / n}{n - 1}} = \sqrt{\frac{2723.85 - 127.7^2 / 6}{5}} = \sqrt{\frac{5.9683\ldots}{5}} \approx 1.093$
Standard deviation
$\text{standard deviation} = \sqrt{\frac{\sum x^2 - (\sum x)^2 / n}{n - 1}}$
Use this formula to calculate the standard deviation of the
following sets of data in your jotters.
1) The reaction times of four drivers were tested:
0.23, 0.85, 0.42, 0.94
2) The BMI values of a group of S5 pupils were recorded as
follows:
17.7, 22.42, 21.2, 23, 16.99, 18.4
3) A group of S6 students were asked at what age they thought
they would get married:
33, 32, 34, 34, 35
Standard deviation
Now check that your answers are correct.
1) 0.3
2) 2.6
3) 1.1
Probability
Probability is the likelihood of an event happening.
To calculate the probability of an event happening, the following
formula can be used.
P stands for probability
$P(\text{event}) = \frac{\text{number of favourable outcomes}}{\text{number of possible outcomes}}$
Example: If you were to roll a dice, what would the probability be
that it would land on a 2?
$P(2) = \frac{\text{number of 2s on the dice}}{\text{total numbers on the dice}} = \frac{1}{6}$
Probability
Example: If you were to roll a dice, what is the probability that
you would roll an odd number?
$P(\text{odd}) = \frac{\text{number of odd numbers on the dice}}{\text{total numbers on the dice}} = \frac{3}{6} = \frac{1}{2}$
Example: If you were to pick a random card out from a set of
cards, what is the probability that you would pick out
the number 4?
$P(4) = \frac{\text{number of 4s in a pack of cards}}{\text{total number of cards in a pack}} = \frac{4}{52} = \frac{1}{13}$
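The probability formula can be tried out in Python using the Fraction type, which keeps answers as fractions and simplifies them automatically. The function name probability is an illustrative choice, and the sketch assumes every outcome is equally likely, as in the examples above.

```python
from fractions import Fraction


def probability(favourable, possible):
    """Number of favourable outcomes over number of possible outcomes."""
    return Fraction(favourable, possible)


dice_faces = [1, 2, 3, 4, 5, 6]

twos = [face for face in dice_faces if face == 2]
odds = [face for face in dice_faces if face % 2 == 1]

p_two = probability(len(twos), len(dice_faces))
p_odd = probability(len(odds), len(dice_faces))
p_four_card = probability(4, 52)   # four 4s in a pack of 52 cards

print(p_two, p_odd, p_four_card)
```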
Probability
In your jotters, calculate the probability of the following events
happening.
1) There are 52 cards in a pack. What is the probability
that you pick out a red card?
2) A bag full of bank notes has 14 £1 notes, 6 £5 notes,
3 £10 notes and 1 £20 note. What is the probability
that a £5 note would be randomly picked out?
3) There are 49 numbers in the National Lottery. What is
the probability that the first ball that rolls out is a
multiple of 4?
Probability
Now check that your answers are correct.
1) $\frac{26}{52} = \frac{1}{2}$
2) $\frac{6}{24} = \frac{1}{4}$
3) $\frac{12}{49}$