Chapter 2 Summarising Numerical Data
2B The Median, Range and Interquartile Range (IQR)
The most versatile statistical tools for numerically describing the centre and spread of a
distribution use the median (the middle value) as the measure of centre, and two measures of spread:
o the range (the maximum spread of the data values)
o the interquartile range (the spread of the middle half of the data values)
Determining the Median (M)
To find the median:

First order the data.

For n data values the median is located at the $\left(\frac{n+1}{2}\right)$th position.
When:
o n is odd, the median will be the middle data value
o n is even, the median will be the average of the two middle data values
Example 3 (pg 47)
a) 2, 9, 1, 8, 3, 5, 3, 8, 1
Median =
b) 10, 1, 3, 4, 8, 6, 10, 1, 2, 9
Median =
Note – you should always check that you are correct by counting the number of data values each side of
the median. They should be equal.
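The steps for finding the median can also be checked with a few lines of Python. This is only a sketch of the position rule described above, applied to the two data sets from Example 3; the function name `median` is our own choice and is not from the textbook.

```python
def median(values):
    """Return the median using the (n + 1) / 2 position rule."""
    ordered = sorted(values)      # first order the data
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # n odd: position (n + 1) / 2 is a whole number, so take the middle value
        return ordered[mid]
    # n even: the position falls halfway between the two middle values
    return (ordered[mid - 1] + ordered[mid]) / 2


median_a = median([2, 9, 1, 8, 3, 5, 3, 8, 1])        # n = 9 (odd)
median_b = median([10, 1, 3, 4, 8, 6, 10, 1, 2, 9])   # n = 10 (even)
print(median_a, median_b)
```

Use it to check your answers after you have worked them out by hand.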
Using a dot plot or stem plot to help locate medians
Locating the median on a dot or stem plot can be done on a calculator but because the data is already in
order it is relatively simple to do manually.
Look at Example 4 and Example 5 in the textbook. (pg 47 – 48)
The Range
The range, R, is the simplest measure of spread of a distribution. It is the difference between the
largest and smallest values in the dataset. Look at Example 6 in the textbook. (pg 49)
Range = largest value in data – smallest value in data
A problem with the range as a measure of spread
Because the range depends only on the two extreme values in the data, it is not always an informative measure
of spread. For example, one or both of these two values may be outliers. A more refined measure of spread that
overcomes this limitation of the range is the interquartile range (IQR).
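A short Python sketch shows how much one extreme value can change the range. The data values here are made up purely for illustration and are not from the textbook.

```python
def data_range(values):
    """Range = largest value in data - smallest value in data."""
    return max(values) - min(values)


typical = [21, 22, 23, 24, 26, 27, 28]   # hypothetical data, no outliers
with_outlier = typical + [95]            # same data plus one extreme value

range_typical = data_range(typical)
range_with_outlier = data_range(with_outlier)
print(range_typical, range_with_outlier)
```

A single extra value moves the range from 7 to 74, even though the rest of the data has not changed.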
The Interquartile Range (IQR)
The median divides the distribution in half and the quartiles are the points that divide the distribution
into quarters.
[Figure: a scale from 0% to 100% marked at 25% (Q1, lower quartile), 50% (Q2, median) and 75% (Q3, upper quartile). The IQR spans Q1 to Q3; the range spans 0% to 100%.]
The Interquartile Range (IQR) is the spread of the middle 50% of the data values so that:
IQR = Q3 – Q1
Example 7 (pg 50)
[Figure: stem plot from the textbook with Q1, the median, Q3, the IQR and the range marked.]
*******************Clarification *********************
When finding the quartiles of an odd number of data values, disregard the median before determining the
lower and upper quartiles.
3, 4, 4, 4, 5, 5, 6, 7, 9, 10, 11, 12, 14
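The clarification can be sketched in Python using the 13 data values listed above. This is one way of coding the rule from these notes (leave out the median when n is odd, then take the median of each half); the helper names are our own. Be aware that calculators and software packages sometimes use slightly different quartile rules, so their answers may not always match.

```python
def median(values):
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def quartiles(values):
    """Return (Q1, median, Q3), disregarding the median when n is odd."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]                                     # values below the median
    upper = ordered[half + 1:] if n % 2 == 1 else ordered[half:]  # values above the median
    return median(lower), median(ordered), median(upper)


data = [3, 4, 4, 4, 5, 5, 6, 7, 9, 10, 11, 12, 14]
q1, med, q3 = quartiles(data)
iqr = q3 - q1
print(q1, med, q3, iqr)
```

Here the median is the 7th value, so Q1 is the median of the six values below it and Q3 is the median of the six values above it.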
Why is the IQR a more useful measure of spread than the range?
The IQR is a good measure of spread because it is generally not affected by the presence of
any outliers.
Exercise 2B pg 51 – 53 Q’s 1 – 8
2C The five-number summary and the box plot
The five-number summary
The five-number summary consists of the following:
Min value, Q1, M, Q3, Max Value
This data can then be used to construct a new graph known as a box plot.
The box plot
The box plot (or box-and-whisker plot) is a graphical display of the five-number summary.
A box plot is a very compact way to display the location, spread and general shape of a distribution. It is
also a very useful tool for comparing distributions of related subgroups. Box plots can be drawn either
vertically or horizontally.
[Figure: a vertical axis and a horizontal axis, each scaled from 5 to 80.]
Example 8 (pg 54)
Constructing a box plot from a five-number summary
Min = 52, Q1 = 66, M = 73, Q3 = 75, Max = 77
[Figure: axis for the box plot, labelled Life Expectancy (Years) and scaled from 5 to 80.]
Box plot with Outliers
To display outliers on a box plot, we must first determine the location of what we call the upper and
lower fences. These are imaginary lines drawn one and a half interquartile ranges above and below the
box ends (Q1 and Q3). Data values outside these fences are then classified as possible outliers and are
plotted separately.
The whiskers will end at the highest and lowest data values that lie within 1.5 IQRs of the ends of
the box.
Upper Fence = Q3 + 1.5 × IQR
Lower Fence = Q1 - 1.5 × IQR
Use the data below to make a box plot with outliers manually, then use the graphics calculator to check your
work.
3, 21, 21, 22, 23, 24, 26, 27, 28, 30, 31, 31, 33, 34, 35, 35, 36, 43, 48
Min =
Q1 =
Med =
Q3 =
Max =
IQR =
Range =
Lower Fence = Q1 – 1.5 × IQR =
Upper Fence = Q3 + 1.5 × IQR =
[Blank axis for the box plot, scaled from 0 to 50.]
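Once you have done the working by hand, a short Python sketch like the one below can be used as another check alongside the calculator. It applies the fence formulas above to the 19 data values and uses the quartile rule from 2B (disregard the median when n is odd). The variable names are our own, and other software may use a slightly different quartile rule.

```python
def median(values):
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


data = sorted([3, 21, 21, 22, 23, 24, 26, 27, 28, 30,
               31, 31, 33, 34, 35, 35, 36, 43, 48])
n = len(data)
half = n // 2

# Five-number summary
minimum, maximum = data[0], data[-1]
q1 = median(data[:half])
med = median(data)
q3 = median(data[half + 1:] if n % 2 == 1 else data[half:])

# Fences sit 1.5 IQRs beyond the box ends
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Possible outliers lie outside the fences; whiskers stop at the
# lowest and highest data values that lie inside the fences
outliers = [x for x in data if x < lower_fence or x > upper_fence]
inside = [x for x in data if lower_fence <= x <= upper_fence]
whiskers = (min(inside), max(inside))

print(minimum, q1, med, q3, maximum)
print(iqr, lower_fence, upper_fence, outliers, whiskers)
```

Compare the printed values with the table you filled in before drawing the box plot.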
How to construct a box plot with outliers using the TI-Nspire CAS – do example in textbook. (pg 56)
Interpreting Box Plots
Do Example 9 and 10 from textbook. (pg 58 – 59)
Exercise 2C pg 59 – 61
Q’s 1 – 10
2D Relating a box plot to shape
Symmetric Distributions – are evenly spread out around the median. There is also a strong tendency for
the data values to cluster around the centre of the distribution, and the whiskers are approximately equal in length.
Positively Skewed Distributions – are characterised by a cluster of data values at the left-hand end
of the distribution. These distributions are said to gradually tail off to the right. The left-hand whisker will be
short, while the right-hand whisker will be long.
Negatively Skewed Distributions – are characterised by a cluster of data values at the right-hand
end of the distribution. These distributions are said to gradually tail off to the left. The right-hand whisker
will be short, while the left-hand whisker will be long.
Distributions with Outlier(s) – are characterised by gaps between the main body and data values
in the tails.
Exercise 2D pg 63
2E Using box plots to describe and compare distributions
Box plots are a very powerful tool for describing a distribution in terms of shape*, centre and
spread. They can also be used to help us identify outliers.
* Beware of using a box plot to identify the shape of a distribution when the data set has fewer than
20-30 values. In this case, use a dot plot to check instead.
Look at Example 11, 12 and 13 on pages 64 – 65.
Exercise 2E pg 65 – 66
Q’s 1 – 3
2F Describing the centre and spread of symmetric distributions
We can use the median, IQR and range for describing the centre and spread of distributions
of any shape.
If you have a symmetric distribution without outliers, there is an alternative:
o measure of centre – the mean
o measure of spread – the standard deviation
The mean
Most people think of the mean as the “average”.
Mean = (sum of data values) / (total number of data values)
$$\bar{x} = \frac{\sum x}{n}$$
where
$\bar{x}$ = mean
$\sum x$ = sum of data values
n = total number of data values
Example 14 pg 68
n =
$\sum x$ =
$\bar{x}$ =
The relationship between the mean and the median
The median lies at the midpoint of the distribution and the mean is the balance point. (pg 68)
When to use the median rather than the mean
Because the median is relatively unaffected by the presence of extreme values, it is said to be a
resistant statistic. For this reason it is frequently used if the data is clearly skewed or is likely to
contain an outlier.
However, if a distribution is symmetric, both the mean and the median will be approximately equal in
value. The mean is often preferred because it is familiar to most people.
If the data is skewed or there is an outlier the mean will be dragged towards the tail or the direction of
the outlier.
Choosing between the mean and the median:


Symmetric w/no Outliers – either the mean or the median can be used.
Skewed and/or Outliers – the median is more reliable to indicate the centre.
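The effect of an outlier on the mean and the median can be seen in a quick Python sketch. The data values are made up for illustration only; `statistics.mean` and `statistics.median` come from Python's standard library.

```python
from statistics import mean, median

symmetric = [4, 5, 5, 6, 7]        # hypothetical data, roughly symmetric
with_outlier = symmetric + [40]    # same data plus one extreme value

mean_before, median_before = mean(symmetric), median(symmetric)
mean_after, median_after = mean(with_outlier), median(with_outlier)

print(mean_before, median_before)
print(round(mean_after, 2), median_after)
```

The mean is dragged from 5.4 up to about 11.17 by the single outlier, while the median only moves from 5 to 5.5.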
Exercise 2F-1 pg 69 – 70
Q’s 1 – 8
The standard deviation
To measure the spread around the median we use the IQR.
To measure the spread around the mean we use the standard deviation (s).
We will usually use the calculator to work out the standard deviation but it is also important to
understand what the calculator is doing.
The formula for standard deviation, s, is:
$$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$$
Practice - How to calculate the mean and standard deviation using the TI-Nspire CAS (pg 71 – 72)
Exercise 2F-2 pg 73 – 74
Q’s 1 – 6
2G The normal distribution and the 68-95-99.7% rule
The normal distribution
Many data sets are roughly symmetric and have a bell-shaped distribution. Distributions of this type are
called normal distributions.
The 68-95-99.7% rule
In normal distributions, the percentage of data that lie within a certain number of standard deviations
(SD) away from the mean can always be determined.
For a normal distribution, approximately:



68% of the data lies within one standard deviation of the mean.
95% of the data lies within two standard deviations of the mean.
99.7% of the data lies within three standard deviations of the mean.
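The three percentages in the rule can be checked against an exact normal distribution. The sketch below uses `NormalDist` from Python's standard `statistics` module (available from Python 3.8); the function name `share_within` is our own.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)   # mean 0, standard deviation 1


def share_within(k):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


shares = {k: share_within(k) for k in (1, 2, 3)}
for k, share in shares.items():
    print(f"within {k} SD: {share:.1%}")
```

The values come out at about 68.3%, 95.4% and 99.7%, which is why the rule is stated as approximate.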
It is helpful to view this graphically – assuming the data is approximately normal.
Look at examples and graphs on pg 75 - 77
Example 15 pg 76
Exercise 2G pg 78 - 79
Q’s 1 – 4 (Use WS on the next page.)
2H Standard scores (z-scores)
It is common to transform normally distributed data into a new set of units which show the number of
standard deviations each data value lies from the mean of the distribution. This is called standardising
and these transformed data values are called standard or z-scores.
Standard score = (data value - mean) / (standard deviation)
or
$$z = \frac{x - \bar{x}}{s}$$
Standard scores can be both positive and negative:



a positive z-score indicates that the data value lies above the mean.
a zero standardised score indicates that the data value is equal to the mean.
a negative z-score indicates that the data value lies below the mean.
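The three cases can be illustrated with a short Python sketch. The mean of 65 and standard deviation of 10 are made-up values for illustration, and `z_score` is our own function name.

```python
def z_score(x, mean, sd):
    """Standard score = (data value - mean) / standard deviation."""
    return (x - mean) / sd


mean, sd = 65, 10                 # hypothetical mean and standard deviation

z_above = z_score(80, mean, sd)   # data value above the mean
z_equal = z_score(65, mean, sd)   # data value equal to the mean
z_below = z_score(50, mean, sd)   # data value below the mean
print(z_above, z_equal, z_below)
```

A score of 80 is 1.5 standard deviations above the mean, and a score of 50 is 1.5 standard deviations below it.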
Study Example 16 pg 79 – 80
Using standard scores to compare performance
Standard scores are useful for making comparisons across data distributions which
have different means and/or standard deviations.
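As a sketch of this kind of comparison, suppose a student's marks in two subjects are standardised. All of the marks, means and standard deviations below are made up for illustration and are not from the textbook.

```python
def z_score(x, mean, sd):
    return (x - mean) / sd


# Hypothetical results: (student's mark, class mean, class standard deviation)
english = (72, 60, 8)
maths = (75, 70, 10)

z_english = z_score(*english)
z_maths = z_score(*maths)
better_subject = "English" if z_english > z_maths else "Maths"
print(z_english, z_maths, better_subject)
```

The raw Maths mark is higher, but relative to the rest of the class the English mark is the stronger result because its standard score is higher.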
Study Example 17 pg 81
Converting standardised scores into actual scores
By making the actual score the subject of the rule for calculating standard scores, we arrive at:
Actual score = mean + standard score × standard deviation
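The conversion back to an actual score is the standard score rule rearranged, which can be checked with a short Python sketch. The mean of 65 and standard deviation of 10 are made-up values, and the function names are our own.

```python
def z_score(x, mean, sd):
    return (x - mean) / sd


def actual_score(z, mean, sd):
    """Actual score = mean + standard score x standard deviation."""
    return mean + z * sd


mean, sd = 65, 10                         # hypothetical mean and standard deviation

score = actual_score(-1.5, mean, sd)      # convert a standard score to an actual score
round_trip = z_score(score, mean, sd)     # standardising again recovers the z-score
print(score, round_trip)
```

Standardising the answer returns the z-score you started with, which is a handy way to check your working.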
Study Example 18 pg 81
Exercise 2H pg 82
Q’s 1 – 4
Chapter 2 REVISION


Multiple-choice questions pg 85 - 88 Q’s 1 – 29
Extended-response questions pg 89 – 90 Q’s 1 - 5
x xzs