Year 12 Further Maths (Core)

THE NORMAL DISTRIBUTION
Many distributions exhibit a symmetrical shape, that is, an even spread about the mean. The
mean, $\bar{x}$, and standard deviation, $s$, are thus used to fully describe the distribution.
The following facts hold for a normal distribution:
• The middle 68% of observations lie within one standard deviation either side of the mean, or
within: $\bar{x} - s \le x \le \bar{x} + s$ (or $\bar{x} \pm s$)
• The middle 95% of observations lie within two standard deviations either side of the mean, or
within: $\bar{x} - 2s \le x \le \bar{x} + 2s$ (or $\bar{x} \pm 2s$)
• The middle 99.7% of observations lie within three standard deviations either side of the
mean, or within: $\bar{x} - 3s \le x \le \bar{x} + 3s$ (or $\bar{x} \pm 3s$)
We can use these properties to predict what percentage of a given set of data lies within 1, 2 or 3
standard deviations of the mean. This is sometimes called the 68–95–99.7% rule.
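The rule above can be sketched in a few lines of Python. This is only an illustration: the function name `empirical_rule_intervals` and the sample mean and standard deviation (100 and 15) are assumptions made for the example, not values from these notes.

```python
def empirical_rule_intervals(mean, sd):
    """Return the ranges holding about 68%, 95% and 99.7% of a normal distribution."""
    # Number of standard deviations either side of the mean -> approximate coverage
    coverage = {1: 68.0, 2: 95.0, 3: 99.7}
    return {pct: (mean - k * sd, mean + k * sd) for k, pct in coverage.items()}

# Hypothetical values, for illustration only: mean 100, standard deviation 15
for pct, (low, high) in empirical_rule_intervals(100, 15).items():
    print(f"about {pct}% of values lie between {low} and {high}")
```

The same function can be called with the mean and standard deviation from any of the questions that follow.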
The School For Excellence 2015
The Essentials – Further Mathematics – Core Materials
Page 35
QUESTION 35
The heights of female students in a Year 11 level have a normal distribution with a mean of
162 cm and a standard deviation of 12 cm. Under these circumstances, give the range of
heights that would include approximately 95% of the Year 11 female students.
Solution
$\bar{x} - 2s \le x \le \bar{x} + 2s$
$\bar{x} \pm 2s = 162 \pm 2 \times 12$ cm
= 162 − 24 and 162 + 24
= 138 cm and 186 cm
QUESTION 36
The volume of a particular tomato juice carton is normally distributed with a mean of 250 ml
and a standard deviation of 5 ml. In a sample of 400 cartons, how many would be expected
to have a volume of more than 245 ml?
Solution
Draw a normal distribution graph, insert the values for the distribution at the appropriate
places, then evaluate the answer(s).
$\bar{x} + s = 250 + 5 = 255$
$\bar{x} + 2s = 250 + 10 = 260$
$\bar{x} + 3s = 250 + 15 = 265$
$\bar{x} - s = 250 - 5 = 245$
$\bar{x} - 2s = 250 - 10 = 240$
$\bar{x} - 3s = 250 - 15 = 235$
245 ml = $\bar{x} - s$
Percentage with a volume above 245 ml = 34% + 34% + 13.5% + 2.35% + 0.15% = 84%
84% of 400 = 336 cartons
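The band-adding step in this solution can be checked with a short Python sketch. The band percentages are the ones implied by the 68–95–99.7% rule; the names `BANDS`, `percent_above` and `expected_count` are illustrative assumptions, and the sketch only handles boundaries that fall on a whole number of standard deviations.

```python
# Approximate percentage of a normal distribution in each band between whole
# standard deviations, read from the 68-95-99.7% rule:
# below -3s, -3s..-2s, -2s..-1s, -1s..mean, mean..+1s, +1s..+2s, +2s..+3s, above +3s
BANDS = [0.15, 2.35, 13.5, 34.0, 34.0, 13.5, 2.35, 0.15]

def percent_above(z):
    """Approximate percentage of values above mean + z*s, for whole z from -3 to 3."""
    if not isinstance(z, int) or not -3 <= z <= 3:
        raise ValueError("z must be a whole number from -3 to 3")
    # Bands to the right of the boundary start at index z + 4
    return round(sum(BANDS[z + 4:]), 2)

def expected_count(z, n):
    """Expected number of observations, out of n, above mean + z*s."""
    return round(percent_above(z) / 100 * n)

# Question 36: 245 ml is one standard deviation below the mean (z = -1)
print(percent_above(-1), expected_count(-1, 400))
```

A percentage below a boundary is simply 100 minus the percentage above it.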
QUESTION 37
The distribution of the weights of Easter eggs is normally distributed with a mean of 20 g and
a standard deviation of 2 g. Easter eggs weighing less than 16 g are rejected.
The percentage of eggs that would be rejected is closest to:
A  95%
B  68%
C  99%
D  5%
E  2.5%
QUESTION 38
The distribution of systolic blood pressure of a large group of teenagers is approximately
bell-shaped with a mean of 122 and a standard deviation of 9. The percentage of these
students with a systolic blood pressure less than 131 is closest to:
A  5%
B  16%
C  68%
D  84%
E  95%
STANDARD (Z) SCORES
When comparing, for example, a male student’s height and a female student’s height, it is
not possible to compare them directly, as they come from different distributions with different
means and standard deviations. One possible method is to use standard scores, or z-scores.
To calculate a standard z-score, you use the following rule:
$z = \dfrac{\text{actual data} - \text{mean}}{\text{standard deviation}} = \dfrac{x - \bar{x}}{s}$
QUESTION 39
At Tooshort High Year 11 girls’ heights are normally distributed with a mean of 162 cm and
standard deviation of 12 cm. Janet is 174 cm tall.
Her brother, Damon is 188 cm tall. His Year 9 classmates’ heights are normally distributed
with a mean of 176 cm and standard deviation of 16 cm.
Which of the two is taller for their sex and age?
Solution
To compare the brother and sister’s heights, calculate their individual z-scores.
Janet: $z = \dfrac{x - \bar{x}}{s} = \dfrac{174 - 162}{12} = +1$
Damon: $z = \dfrac{x - \bar{x}}{s} = \dfrac{188 - 176}{16} = +0.75$
Based on their z-scores, Janet is taller for her sex and age than her brother Damon.
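As a quick check of the z-score comparison in Question 39, here is a minimal Python sketch. The function name `z_score` is an assumption made for illustration; the numbers are the ones given in the question.

```python
def z_score(value, mean, sd):
    """Standard score: how many standard deviations a value lies from the mean."""
    return (value - mean) / sd

# Question 39: each sibling is compared with their own group
janet = z_score(174, 162, 12)   # Year 11 girls: mean 162 cm, s = 12 cm
damon = z_score(188, 176, 16)   # Damon's classmates: mean 176 cm, s = 16 cm

taller_for_group = "Janet" if janet > damon else "Damon"
print(janet, damon, taller_for_group)
```

The larger z-score identifies the person who is further above their own group's mean, measured in that group's standard deviations.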
QUESTION 40
A summary of the results for Horace Cope’s classes in the French and Mathematics exams
is shown in the table below.
              Class mean    Class standard deviation
French        62%           7%
Mathematics   60%           5%
Horace obtained 65% in both exams. The difference in his z-scores is:
A  0.02
B  0.36
C  0.57
D  1.0
E  1.63
QUESTION 41
A football player whose height was 198 cm was informed at the summer draft camp that he
had a z-score of 1.4 for height for that group. If the standard deviation is 6.5 cm then the
average height of football players at the camp (in centimetres to one decimal place) is:
A  207.1
B  193.4
C  191.5
D  189.0
E  188.9
QUESTION 42
Tim has a z-score for height of 1.7. This means that:
A  He is in the top 50% of heights, but not in the top 16% of heights.
B  He is in the top 16% of heights, but not in the top 2.5% of heights.
C  He is in the top 2.5% of heights, but not in the top 0.15% of heights.
D  He is in the top 34% of heights, but not in the top 5% of heights.
E  He is in the top 50% of heights, but not in the top 34% of heights.
BIVARIATE DATA
Bivariate data explores the relationship between two variables. The study of bivariate data
attempts to:
• Determine whether any relationship actually exists between the two variables.
• Describe the nature of the relationship if one exists, quantifying the
relationship if possible.
Bivariate data can explore the relationship between variables:
• That are both categorical.
• Where one is categorical and the other numerical.
• That are both numerical.
The type of data being examined will determine the type of analysis and the type of display
that is appropriate for the data.
BIVARIATE DATA — TWO CATEGORICAL VARIABLES
Two categorical variables are displayed using:
• Two way frequency tables OR
• Two way percentaged frequency tables OR
• Comparative segmented bar charts.
Two way frequency tables are displayed with the independent (explanatory) variable filling
the columns.
Percentaged two way frequency tables are used more commonly because they allow us to
compare values that may come from different sample sizes.
Comparative segmented bar charts allow a visual comparison between two or more
categories. They are also usually percentaged to allow accurate and meaningful
comparison.
QUESTION 43
The comparative segmented bar chart shows the Twitter usage of students in Years 7–8,
9–10 and 11–12.
Describe any relationship observed between Twitter use and year level, including
percentages to support your answer.
Solution
QUESTION 44
A survey was conducted to determine whether males or females read novels more.
Respondents were asked whether they read novels regularly, sometimes or never. The
results are shown in the two way frequency table below.
                   Male    Female    Total
Reads regularly    23      32        55
Reads sometimes    34      35        69
Reads never        9       2         11
Total              66      69        135
Complete the percentaged two way table below correct to the nearest percent:
                   Male    Female
Reads regularly
Reads sometimes
Reads never
Total
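One way to percentage a two way table is to divide each count by its column total, so that the columns (here, the explanatory variable) can be compared even when the group sizes differ. The sketch below applies that to the counts in Question 44; the dictionary layout and the name `column_percentages` are assumptions made for illustration.

```python
# Counts from the two way frequency table in Question 44, stored by column
counts = {
    "Male": {"Reads regularly": 23, "Reads sometimes": 34, "Reads never": 9},
    "Female": {"Reads regularly": 32, "Reads sometimes": 35, "Reads never": 2},
}

def column_percentages(table):
    """Convert each column of counts to percentages of that column's total."""
    result = {}
    for column, rows in table.items():
        total = sum(rows.values())
        # Round to the nearest whole percent, as the question requests
        result[column] = {row: round(100 * n / total) for row, n in rows.items()}
    return result

percentaged = column_percentages(counts)
for column, rows in percentaged.items():
    print(column, rows)
```

Because each entry is rounded separately, a column of rounded percentages need not add to exactly 100.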
QUESTION 45
Compare the reading behaviour of the males and females in this survey.
Solution
BIVARIATE DATA — ONE CATEGORICAL AND ONE
NUMERICAL VARIABLE
Two displays are available for one categorical and one numerical variable:
• If there are two categories we can use a back to back stem and leaf plot.
• If there are two or more categories we use parallel boxplots.
Back to back stem and leaf plots are displayed as shown below:
Note that the data on the left hand side of the back to back stem and leaf plot is ordered
backwards, that is from the centre outwards.
When comparing these two categories we describe each data set as a univariate set,
describing shape, centre, spread and the presence or otherwise of outliers. Comparisons
must include comparative words such as “greater than”, “less than”, “similar to”, etc.
A good comparison of the data shown in the back to back stem and leaf plot would be:
The distribution of reading times in this sample is symmetric for females, but the male
distribution is positively skewed. The median male reading time of 18 hours is less than the
median female reading time of 27 hours. There is a larger range of female reading times
(48 hours) than male reading times (39 hours). Neither data set has outliers.
Parallel boxplots can be used for two or more categories. The box plots are displayed on the
one scale so that any differences or similarities are visually apparent.
QUESTION 46
The three parallel boxplots suggest that gestation time and size of mammal (small, medium
and large) are positively related. Explain why, giving reference to an appropriate statistic.
Solution
QUESTION 47
The parallel boxplots show the distributions of sodium content of beef and poultry based
sausages.
Compare the sodium content of beef and poultry sausages.
Solution
QUESTION 48
A survey was conducted that explored the relationship between preferred mode of transport
(car, bus or train) and gender (male or female). An appropriate display for this data would
be:
A  A histogram
B  Parallel boxplots
C  Back to back stem and leaf plots
D  Percentaged two way frequency tables
E  A scatterplot
EXAM 2 STYLE QUESTIONS
QUESTION 49
Daniel owns a house in a popular Melbourne suburb. His workplace has moved location and
so Daniel decides to sell his house and move. Before he does, he does some research to
find out what price he should sell his house for. He finds out that the average price for a
house in his suburb is $350,000 and the distribution of selling prices is bell shaped. The
standard deviation of sales prices is $25,000.
(a) What percentage of houses in this suburb would sell for above $300,000?
_____________________________________________________________________
_____________________________________________________________________
(b) In one particular month there are 63 houses sold in this suburb. How many houses
would have sold for less than $375,000?
_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
(c) Daniel sells his house for $370,000. What is the z-score for this sale?
_____________________________________________________________________
_____________________________________________________________________
(d) Daniel then buys a house in another suburb nearer his new workplace where the
average house price is $410,000 with a standard deviation of $15,000. He pays
$420,000 for this new house. What is the z-score associated with his purchase?
_____________________________________________________________________
_____________________________________________________________________
(e) Did Daniel have more success with buying or selling? Use your answers to (c) and (d)
above to compare his two transactions.
_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
(f) Daniel’s workmate actually sold his house for a value that had a standard score of –0.3.
What price did he sell his house for?
_____________________________________________________________________
_____________________________________________________________________
____________________________________________________________________
QUESTION 50
A crop scientist has been trialling different types of wheat. He believes that there are several
factors involved in the amount of production, including the variety of the wheat grown, as
well as the type and amount of fertiliser used and the climate where the crop is grown.
(a) What kind of variable is the variety of wheat?
_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
(b) The scientist wanted to examine the climate in one region where he wishes to grow the
wheat. He recorded the average temperature every month for a year. The results are
listed below:
Month       Jan  Feb  Mar  Apr  May  Jun  Jul  Aug  Sep  Oct  Nov  Dec
Temp (ºC)   30   31   32   21   21   7    9    21   25   25   25   28
Display this data as an ordered stem and leaf plot.
(c) Are there any outliers in this data set? Explain your answer.
_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
(d) The scientist next looks at the effect of two different fertilisers on the growth of one
variety of wheat, recording the percentage of plants of each variety that grow in excess
of 40 cm in one month and the percentage of plants that grow less than 40 cm in one
month. He calls these categories “fast growth” and “slow growth”. He displays this data
in a two-way percentaged frequency table. If the display is as shown below, label the
correct positions of the variables “fertiliser type” and “growth rate”. (Note: it is usual to
put the independent variable in the columns and the dependent variable in the rows.)
Variable _________________________
Variable _________________________
(e) The scientist is concerned that his data isn’t specific enough when he uses the
categories “fast growth” and “slow growth”, so instead he decides to record the actual
height of each plant with each of the fertilisers. He displays this data on a parallel
boxplot as shown below.
Compare the results of the two fertilisers.
_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
(f) The scientist then decides to try another type of fertiliser, Fertiliser C. He records the
results of the growth in one month for this fertiliser in the stem and leaf plot below:
Display this data as a boxplot on the grid below:
BIVARIATE ANALYSIS — NUMERICAL AND NUMERICAL
The prime objective of numerical bivariate analysis is to determine the existence of a
relationship between two variables and, if a relationship exists, to state the:
• Strength of the relationship
• Form
• Direction
The data is numerical bivariate data if both of the variables are numerical.
Bivariate statistics enable us to measure how strongly two variables are connected or
associated.
(EXPLANATORY) INDEPENDENT and (RESPONSE) DEPENDENT
VARIABLES
It is most important that the variables are correctly identified.
Variables: If the value of y depends on x:
• y is the (response) dependent variable (DV). It is plotted on the vertical axis.
• x is the (explanatory) independent variable (IV). It is plotted on the horizontal
axis.
In some circumstances:
• If the variable is controlled by us, it is called the (explanatory) independent variable.
• If neither variable is controlled by us, we may choose to place either variable on the
horizontal axis.
(Warning! If you do not correctly decide which is the IV and which is the DV then your
analysis will be wrong!)
Tests for independence:
1. Does one variable affect the other? If so, the one being affected is dependent
(response).
2. Did one variable occur before the other one? If so, the one that came first is
independent (explanatory).
3. What are we trying to predict? The variable that is being predicted is dependent
(response).
Note:
Use exam cues! During reading time look for a graph with axes labelled with variables or an
equation written in the form y = mx + c. These may help determine which variable is
independent or dependent (explanatory or response).
Initially with two numerical bivariate analysis we focus on:
• Strength – Is the relationship between the two variables:
STRONG, MODERATE, WEAK or NO correlation
• Form – Is the relationship between the two variables:
LINEAR or NON-LINEAR
• Direction – Is the relationship between the two variables:
POSITIVE or NEGATIVE
Scatterplots:
• To determine whether a relationship exists between two variables, we draw a scatterplot.
• Data may group about a well-defined curve such as a line, parabola etc.
• In some cases, there is absolutely no relationship between the two variables and the
scatterplot will not display any clear association or pattern.
• Relationships in scatterplots are preliminarily judged by eye for STRENGTH, FORM
and DIRECTION.
[Figure: example scatterplots showing strong, moderate and weak relationships, for both positive linear and negative linear data, and an example of no relationship.]
QUESTION 1
A scientist recorded the effect of different amounts of one fertiliser on one particular variety
of wheat. His results are shown in the scatterplot below.
Describe the relationship in terms of strength, direction and form.
_________________________________________________________________________
_________________________________________________________________________
_________________________________________________________________________
QUESTION 2
For the following surveys or studies conducted decide which of the pairs of variables is the
dependent and independent variable.
(a) Relationship between height and weights of 100 Year 12 male students.
(b) Temperature each hour and time of the day.
(c) Mathematics test mark and English test mark of 30 students from a Year 9 homegroup
class.
(d) Study whether English influences Mathematics given the English and Mathematics test
marks of 30 students.
QUESTION 3
The number of people attending an outdoor event was recorded on a series of days of
different temperature. Which of the following graphs is correctly labelled and titled?
[Figure: five candidate graphs, A to E, not reproduced here. The recoverable titles and axis labels are listed below.]
A  Title: Attendance vs Temperature. Vertical axis: Attendance. Horizontal axis: Temperature (Celsius).
B  Title: Temperature vs Attendance. Vertical axis: Attendance. Horizontal axis: Temperature (Celsius).
C  Title: Temperature vs Attendance. Vertical axis: Temperature (Celsius), scaled 0 to 6000. Horizontal axis: Attendance, scaled 0 to 35.
D  Title: Attendance vs Temperature. Vertical axis: Temperature (Celsius). Horizontal axis: Attendance.
E  Title: Attendance vs Temperature. Vertical axis: Temperature (Celsius). Horizontal axis: Attendance.
BIVARIATE ANALYSIS —
THE STRENGTH OF A LINEAR RELATIONSHIP
Bivariate statistics enables us to measure how strongly two variables are connected or
associated.
While visual inspection of a scatterplot gives us an indication of the strength and direction of
a relationship, a more accurate measure of the relationship is given by calculating the value
of Pearson’s product moment correlation coefficient, also known as the r value.
PEARSON’S PRODUCT MOMENT CORRELATION COEFFICIENT
or the r VALUE
This value can be found on the calculator by entering the data values in the statistics lists
and calculating the least squares regression line.
Pearson’s r can take values between −1 and 1, the sign being an indication of the direction of
the relationship and the magnitude indicating the strength.
Pearson’s r tells us nothing about the form of the relationship, because it is calculated from
the assumption that the relationship is linear. Using Pearson’s r to indicate the strength and
direction of a non-linear relationship is therefore not reliable.
The interpretation of this value is as follows:
QUESTION 4
The relationship between the number of hours of study done by a Year 12 Further Maths
student and their final study score has an r value of 0.7. Interpret the meaning of this value.
Solution
THE COEFFICIENT OF DETERMINATION
• The coefficient of determination is given by $r^2$. It is very easy to calculate: we merely
square Pearson’s product-moment correlation coefficient (r). It also appears on
the same calculator screen as r. Make sure that you bracket a negative value of Pearson’s r
before squaring!
• The value of the coefficient of determination ranges from 0 to 1. It is often expressed as
a percentage.
• The coefficient of determination is useful when we have two variables which have a
linear relationship. It tells us the proportion of variation in one variable which can be
explained by the variation in the other variable.
• The coefficient of determination provides a measure of how well the linear rule linking
the two variables (x and y) predicts the value of y when we are given the value of x.
Standard statement for the coefficient of determination:
($r^2 \times 100$)% of the variation in the (dependent/response variable) can be explained by the
variation in the (independent/explanatory variable).
An additional part to this statement is also used occasionally:
The other ($100 - r^2 \times 100$)% of the variation in the (dependent/response variable) can be
explained by other factors.
In each case the part of the statement(s) in brackets needs to be replaced with an
appropriate value or variable.
EXAMPLE
In Question 4 we looked at an r value of 0.7 for the relationship between study score and
hours of study.
The value of the coefficient of determination would be $(0.7)^2 = 0.49$.
49% of the variation in the study score can be explained by the variation in the hours of
study.
The other 51% of the variation in the study score can be explained by other factors.
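The standard statement can be filled in mechanically once r is known. The sketch below is one possible way to do this in Python; the function name `determination_statement` and its parameters are assumptions made for illustration, and the percentage is rounded to the nearest whole percent.

```python
def determination_statement(r, response, explanatory):
    """Build the standard coefficient of determination statement from r."""
    # Squaring inside the function avoids the bracketing mistake with negative r
    percent = round(r ** 2 * 100)
    return (
        f"{percent}% of the variation in the {response} can be explained by "
        f"the variation in the {explanatory}. "
        f"The other {100 - percent}% of the variation in the {response} "
        f"can be explained by other factors."
    )

# The example above: r = 0.7 for study score against hours of study
statement = determination_statement(0.7, "study score", "hours of study")
print(statement)
```

Passing a negative r, such as −0.7, gives the same percentages, since the sign is lost on squaring.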
QUESTION 5
A study to find a relationship between lung capacity in litres per minute (dependent
(response) variable) and years of cigarette smoking (independent (explanatory) variable)
produced a value of Pearson’s product moment correlation coefficient of 0.9.
(a) Interpret the meaning of Pearson’s product moment correlation coefficient for the
relationship between lung capacity and years of smoking.
_____________________________________________________________________
(b) Complete the following statements.
The coefficient of determination is calculated to be _________.
This means that ________% of the variation in _________________ can be explained
by the variation in ___________________. The other _______% of the variation in
__________________ can be explained by other factors.
QUESTION 6
A set of data comparing blood alcohol level (BAL) and a driver’s ability to control a car is
found to have a coefficient of determination of 64%. A competent driver with zero BAL would
score high in ability to control a car.
The Pearson’s correlation coefficient r is most likely to be:
A  0.64
B  +0.8
C  ±0.8
D  −0.8
E  −0.64
QUESTION 7
For the following data set:
x    25    36    45    78    89    99    110
y    78    153   267   456   891   1020  1410
The coefficient of determination (to two decimal places) is closest to:
A  14.14
B  –381.97
C  0.91
D  0.95
E  0.94
Solution
TI-Nspire
1. Add Lists & Spreadsheet, enter the data and name the lists.
2. Menu, then 4. Statistics, then 1. Stat Calculations, then 4. Linear Regression (a+bx).
3. Set the lists as x list and y list, then OK. r and $r^2$ are displayed.
Casio ClassPad
1. In the Statistics menu, enter the data into Lists 1 and 2.
2. Calc, then Linear Reg. Set the lists as x list and y list.
3. r and $r^2$ are displayed.
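For readers without a CAS calculator, the same values can be computed in Python. The sketch below uses the standard textbook definition of Pearson's r (the sum of products of deviations divided by the square root of the product of the sums of squared deviations); that formula is not stated in these notes, and the function name `pearson_r` is an assumption made for illustration.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson's product moment correlation coefficient for paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products and squares of deviations from the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Data from Question 7
x = [25, 36, 45, 78, 89, 99, 110]
y = [78, 153, 267, 456, 891, 1020, 1410]

r = pearson_r(x, y)
r_squared = r ** 2
print(round(r, 2), round(r_squared, 2))
```

Note that r and its square are different quantities, so it matters which one the question asks for.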
CORRELATION AND CAUSATION
While we are entitled to say that there is a strong association between, say, the height of a
footballer and the number of marks he takes, we cannot assert that the height of a footballer
causes him to take a lot of marks. Being tall might assist in taking marks, but there will be
many other factors which come into play — for example skill level, accuracy of passes from
team mates, abilities of the opposing team, and so on.
So, while establishing a high degree of correlation between two variables may be interesting
and can often flag the need for further, more detailed investigation, it in no way gives us any
basis to comment on whether or not one variable causes particular values in another
variable.
QUESTION 8
The correlation between two variables x and y is −0.98. Which of the following statements
is true?
A  x increases, causing y to increase.
B  x increases, causing y to decrease.
C  There is a poor fit between x and y.
D  As x increases, y tends to decrease.
E  As x decreases, y tends to decrease.
QUESTION 9
A study was conducted recording the number of hours of television students watched on
average each night during Year 12 and their final ATAR score. The value of Pearson’s
product moment correlation coefficient was found to be −0.72.
From this information it could be concluded that:
A  Watching a lot of television is detrimental to a student’s performance.
B  Approximately 52% of students watch too much television.
C  Television watching should be limited during Year 12.
D  Students who watched more television tended to have lower ATAR scores.
E  Television watching improved 72% of students’ ATAR scores.
INTRODUCTION TO REGRESSION
In Further Maths three different regression lines are used to quantify bivariate relationships.
They are:
• Line by eye or line of best fit.
• The three median line.
• The least squares regression line.
Each of these lines has its advantages and disadvantages:
Line by eye is variable between individuals, particularly if the relationship is moderate or
weak, but it is relatively unaffected by outliers.
The three median line is unaffected by outliers, but is mainly suitable for a smaller number of
data values.
The least squares regression line is the most commonly used, but it is affected by outliers
and can be very inaccurate when they are present.
INTRODUCTION TO REGRESSION — LINE BY EYE
Step 1: Draw a line of best fit through the data. This is the line that follows the direction of
the data, has approximately the same number of points above the line as below
and ignores outlying data points.
Step 2: Choose two coordinate points on the line (preferably at either end) and work out
the gradient, m, using the formula: $m = \dfrac{y_2 - y_1}{x_2 - x_1}$
Step 3: Either find the y-intercept, c, from the graph (but make sure the x value is zero and
it is the y-axis) and substitute m and c into the equation $y = mx + c$,
or
substitute a point and the gradient directly into the formula $y - y_1 = m(x - x_1)$.
Note: An alternative to using the algebraic approach above is to enter the two points into the
calculator statistics menu and perform a regression on those two points.
QUESTION 10
Calculate the equation to the line below.
Solution
Choose two points on the line, (65, 20) and (40, 80), and work out the gradient, m, using the
formula:
$m = \dfrac{y_2 - y_1}{x_2 - x_1} = \dfrac{80 - 20}{40 - 65} = -2.4$
Substitute a point (from the line) and the gradient directly into the formula $y - y_1 = m(x - x_1)$:
$y - 20 = -2.4(x - 65)$
$y - 20 = -2.4x + 156$
$y = -2.4x + 156 + 20$
$y = -2.4x + 176$
Note: The y-intercept appears to be too high, but a check of the x-axis reveals that it does
not start at zero. Be careful that you do not try to extrapolate the line to the y-axis and read
the intercept from the graph.
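Alternatively, enter the two points into the calculator statistics menu and perform a regression on those two points.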
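The gradient and intercept found in the worked solution can also be checked with a short Python sketch. The function name `line_through` is an assumption made for illustration; it applies the gradient formula and then rearranges $y - y_1 = m(x - x_1)$ to find c.

```python
def line_through(point1, point2):
    """Return (m, c) for the line y = mx + c through two points."""
    (x1, y1), (x2, y2) = point1, point2
    m = (y2 - y1) / (x2 - x1)   # gradient: rise over run
    c = y1 - m * x1             # from y - y1 = m(x - x1) with x = 0
    return m, c

# The two points chosen in Question 10
m, c = line_through((65, 20), (40, 80))
print(f"y = {m:.1f}x + {c:.0f}")
```

Swapping the order of the two points gives the same line, because the signs of the rise and the run change together.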
QUESTION 11
A line by eye drawn on a data set is shown in the figure below.
The best estimate of the regression equation is:
A  y = 4.5x + 30
B  y = −3x + 30
C  y = 0.25x + 9.5
D  y = 8x + 2
E  y = −2x + 26
THE THREE MEDIAN METHOD
This method is not affected by outliers and is often used when there are outliers in the
data set. It is also the most suitable method for small data sets (e.g. up to 20 points).
Step 1: Plot the points on a scatterplot.
Step 2: Divide the data into three even groups, L (lower), M (middle) and U (upper),
according to the order of the x-values. The number of points in a data set will not
always be exactly divisible by 3. If there is one extra value it goes in the middle
group, and if there are two extra values they go into the lower and upper groups.
Step 3: Find the median of the x and y values in each of the groups: $(x_L, y_L)$, $(x_M, y_M)$
and $(x_U, y_U)$.
Step 4: Use the lower and upper medians to find the gradient, m, using the rule
$m = \dfrac{y_U - y_L}{x_U - x_L}$
Step 5: Find the y-intercept, c, using the formula:
$c = \dfrac{1}{3}\left[(y_L + y_M + y_U) - m(x_L + x_M + x_U)\right]$
Step 6: Substitute m and c into the rule $y = mx + c$.
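Steps 2 to 6 can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the function name `three_median_line` is hypothetical, the points are passed as (x, y) pairs, at least three points are supplied, and the sample data (points lying exactly on y = 2x + 1) is made up purely to check the arithmetic.

```python
from statistics import median

def three_median_line(points):
    """Return (m, c) for the three median line through (x, y) points."""
    pts = sorted(points)                 # Step 2: order by x-value
    n = len(pts)
    if n < 3:
        raise ValueError("at least three points are needed")
    base, extra = divmod(n, 3)
    # One extra point goes to the middle group; two extras go to lower and upper
    outer = base + (1 if extra == 2 else 0)
    groups = [pts[:outer], pts[outer:n - outer], pts[n - outer:]]

    # Step 3: median x and median y of each group
    x_l, x_m, x_u = (median([x for x, _ in g]) for g in groups)
    y_l, y_m, y_u = (median([y for _, y in g]) for g in groups)

    m = (y_u - y_l) / (x_u - x_l)                        # Step 4
    c = ((y_l + y_m + y_u) - m * (x_l + x_m + x_u)) / 3  # Step 5
    return m, c                                          # Step 6: y = mx + c

# Made-up check data: nine points lying exactly on y = 2x + 1
sample = [(x, 2 * x + 1) for x in range(1, 10)]
m, c = three_median_line(sample)
print(m, c)
```

Because only group medians are used, a single extreme y-value inside a group usually leaves the fitted line unchanged, which is the sense in which the method resists outliers.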