The Harry Potter Guide To S1
Last updated - 8th April 2014
Chapter 0 - Guide To Using Your Casio Calculator
Practise calculating these with your calculator. You should not rely on it to calculate your answer in the first place, but it allows you to check whether you've got your answer correct. We'll use the following data sets:
Set 1
Lengths of wands: 12.3cm, 15.7cm, 20.4cm, 21.3cm, 29.2cm

Set 2
Weights of owls w (kg)    Frequency
3 ≤ w < 5                 5
5 ≤ w < 6                 8
6 ≤ w < 10                2
10 ≤ w < 15               1

Set 3
Hours Revised         7     10    17    8
Potions test score    75    86    92    76

"In terms of wizarding prowess, you're three standard deviations above the mean, Harry."
Harry Tip #1: Be careful to note how many variables there are in the problem.
The first data set is obviously just one variable (length). The second table (where the data is grouped) is still one variable, but has frequency information. On your calculator, press MODE then choose 2 for STAT, and finally 1-VAR for "1 variable". To get the frequency column, go to SETUP (SHIFT -> MODE), press the down key to get to the next page of the menu, choose 3 for STAT, then finally select 1 for 'FREQUENCY ON'. You may as well leave this mode on, since the frequency value defaults to 1 when you don't specify it (i.e. you just have one instance of each listed piece of data).
For the second data set, you must not choose the 'A + BX' mode and use your second (Y) column for the frequencies, as you can't treat the frequency data as just a second variable.
For the last data set, since there are two variables, from SETUP -> STAT, choose 2 for 'A + BX' for linear regression with two variables. 'A + BX' is the equation of a straight line: for the purposes of S1, your variables are assumed to have a linear relationship (i.e. roughly follow a line of best fit) when calculating both your Product Moment Correlation Coefficient (which measures how well your data fits a straight line) and your regression line. The other modes, which you won't use at A Level, allow your variables to have different relationships, e.g. for population growth (which grows/falls exponentially), the 'A.B^X' mode would be more appropriate, as $y = a \cdot b^x$ is an exponential function relating the two variables $x$ and $y$.
Entering your data
Enter each value for your first variable, pressing = each time to get to the next row. If you have a
second variable or frequencies to enter, use the arrow keys to navigate back to the top right of your
table. Once done, press the AC key to 'bank' your table. It's now stored in memory.
Calculating a statistic
Press SHIFT -> 1 for 'STATISTIC'. Choose 'VAR' to calculate either the mean or standard deviation of either of your variables ($x$ and, where relevant, $y$). This inserts the quantity into your current calculation, which you can further manipulate if you like. Then press = to get the value.
Here's a summary of what you can calculate:
www.drfrostmaths.com
VAR: Standard deviation $\sigma_x$ (don't use $s_x$), number of data values $n$, and the means $\bar{x}$, $\bar{y}$.
SUM: The sum of the values of your variable, $\Sigma x$ or $\Sigma y$, the sum of the squares $\Sigma x^2$, or the sum of the products $\Sigma xy$. The second and third of these are obviously crucial when you're calculating $S_{xx}$, $S_{yy}$ and $S_{xy}$, which your calculator is unable to do directly.
Important note: when you have a frequency row/column, $\Sigma x$ is actually calculating $\Sigma fx$, and $\Sigma x^2$ is actually calculating $\Sigma fx^2$, because the values are being duplicated so that each copy of the value is effectively treated as a separate value of $x$. We'll discuss this later.
REG: Only available when you're in 2-variable mode. Allows you to calculate the y-intercept and gradient of your line of best fit, $a$ and $b$ in $y = a + bx$, and the Product Moment Correlation Coefficient $r$.
Now have a go at calculating the following for the 3 data sets above:
Data Set 1
The mean, variance and standard deviation of the length of wands.
Answers: $\bar{x} = 19.78$cm, $\sigma^2 = 32.81$, $\sigma = 5.73$cm
Data Set 2
An estimate of the mean and standard deviation of the weight (recall that you need to use the midpoints of the class intervals).
Answers: $\bar{w} = 5.78$kg, $\sigma_w = 2.11$kg
Try calculating the standard deviation by typing in the full formula for variance, $\sigma^2 = \frac{\Sigma x^2}{n} - \bar{x}^2$, by repeated use of the 'STATISTIC' button before pressing =. Verify that this gives you the same value as when you use your calculator to calculate $\sigma^2$ directly.
Data Set 3
Try to find $\Sigma x^2$, $\Sigma y^2$, $\Sigma xy$, $\Sigma x$ and $\Sigma y$. Hence find $S_{xx}$, $S_{yy}$ and $S_{xy}$ using these values. Find $r$ directly and the linear regression line.
Answers: $\Sigma x = 42$, $\Sigma x^2 = 502$, $\Sigma y = 329$, $\Sigma y^2 = 27261$, $\Sigma xy = 3557$, $r = 0.926$, $y = 64.607 + 1.680x$
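If you don't have the calculator to hand, the answers above can be cross-checked with a short Python sketch. This isn't part of the original guide: the function and variable names are made up for illustration, and it assumes Data Set 3 pairs the hours 7, 10, 17, 8 with the scores 75, 86, 92, 76 in that order.

```python
from math import sqrt

def mean_and_variance(values, freqs=None):
    """Return (mean, variance) via 'mean of the squares minus square of the mean'."""
    if freqs is None:
        freqs = [1] * len(values)          # frequency defaults to 1, as on the calculator
    n = sum(freqs)
    mean = sum(f * x for x, f in zip(values, freqs)) / n
    mean_of_squares = sum(f * x * x for x, f in zip(values, freqs)) / n
    return mean, mean_of_squares - mean ** 2

# Data Set 1: wand lengths (cm)
wand_mean, wand_var = mean_and_variance([12.3, 15.7, 20.4, 21.3, 29.2])

# Data Set 2: owl weights, using class midpoints and frequencies
owl_mean, owl_var = mean_and_variance([4, 5.5, 8, 12.5], [5, 8, 2, 1])

# Data Set 3: hours revised (x) against Potions test score (y)
x = [7, 10, 17, 8]
y = [75, 86, 92, 76]
n = len(x)
s_xx = sum(v * v for v in x) - sum(x) ** 2 / n
s_yy = sum(v * v for v in y) - sum(y) ** 2 / n
s_xy = sum(p * q for p, q in zip(x, y)) - sum(x) * sum(y) / n
r = s_xy / sqrt(s_xx * s_yy)
b = s_xy / s_xx                      # gradient of the regression line
a = sum(y) / n - b * sum(x) / n      # y-intercept of the regression line

print(round(wand_mean, 2), round(wand_var, 2), round(sqrt(wand_var), 2))
print(round(owl_mean, 2), round(sqrt(owl_var), 2))
print(round(r, 3), round(a, 3), round(b, 3))
```

The same helper handles both the ungrouped and grouped cases, which mirrors leaving 'FREQUENCY ON' switched on.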
Chapters 2/3 - Data: Location and Spread
• Hagrid Tip #1: When you have a discrete list of items, to find the median/quartiles, find $\frac{1}{4}$, $\frac{1}{2}$ or $\frac{3}{4}$ of the number of items $n$, then round up and use that numbered item. The one exception is when you have a whole number after dividing, in which case use the midpoint of this item and the one after.
Example: 2, 4, 6, 8, 10, 12
There are 6 items. For the median, 6/2 = 3. This is a whole number, so use the 3rd and 4th items (midpoint is 7). For the LQ, 6/4 = 1.5. This rounds up to 2, so use the second item (4).
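The rule in Hagrid Tip #1 can be sketched in Python as below. The function name is hypothetical, and it assumes the list is already sorted and the fraction is strictly between 0 and 1.

```python
from math import ceil

def discrete_quantile(sorted_items, fraction):
    """Quartile/median of a discrete list using the 'round up, or average two' rule."""
    position = len(sorted_items) * fraction
    if position == int(position):
        k = int(position)                       # whole number: use this item and the next
        return (sorted_items[k - 1] + sorted_items[k]) / 2
    return sorted_items[ceil(position) - 1]     # otherwise round up (items are 1-indexed)

data = [2, 4, 6, 8, 10, 12]
median = discrete_quantile(data, 1 / 2)
lower_quartile = discrete_quantile(data, 1 / 4)
upper_quartile = discrete_quantile(data, 3 / 4)
```

The fractions 1/4, 1/2 and 3/4 are exact in floating point, so the whole-number test is safe for quartiles; other percentiles might need more care.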
• Hagrid Tip #2: When you have grouped continuous data, and you're finding the quartiles/percentiles, DO NOT ROUND to find the item number - just keep it as it is. Use linear interpolation to find an estimate for your quartile/percentile.
Example: Consider our second data set again. We add a cumulative frequency column:

Weights of owls w (kg)    Frequency    Cumulative Frequency
3 ≤ w < 5                 5            5
5 ≤ w < 6                 8            13
6 ≤ w < 10                2            15
10 ≤ w < 15               1            16

To calculate the Lower Quartile: $n = 16$, so use the 16/4 = 4th item. The 4th item occurs within the first 5 items. We can put the cumulative frequency at the start and end of the matching interval, as well as the item we're interested in, on the top side of a line. We can also put the class boundaries on the bottom side of the line.

Cumulative frequency:    0      4      5
Weight:                  3kg    ?      5kg

We're clearly 4/5 of the way along the line, so we go 4/5 of the way along from 3kg to 5kg:
$Q_1 = 3 + \left(\frac{4}{5} \times 2\right) = 4.6$kg

To calculate the 72nd Percentile, $P_{72}$:
72% of 16 = 16 × 0.72 = 11.52th item. This doesn't occur within the first 5 items but does occur within the first 13 items, so we know $P_{72}$ is in the $5 \le w < 6$ weight interval.

Cumulative frequency:    5      11.52    13
Weight:                  5kg    ?        6kg

Thus $P_{72} = 5 + \left(\frac{6.52}{8} \times 1\right) = 5.815$kg.

• Hagrid Tip #3: Be vigilant of gaps in class intervals vs no gaps, and of dark wizards.
Suppose we instead had the following data:

Weights of owls w (kg)    Frequency    Cumulative Frequency
3 − 5                     3            3
6 − 8                     4            7
9 − 13                    5            12

We now have gaps! If you don't adjust the class intervals accordingly (so that the class boundaries are 2.5, 5.5, 8.5 and 13.5, i.e. the class widths would be 3, 3 and 5), you'll get absolutely no marks for your linear interpolation. On the plus side however, Voldemort would consider enlisting you as a Death Eater.
For example, to find the median:
$n = 12$ so use the 12 ÷ 2 = 6th item.

Cumulative frequency:    3        6      7
Weight:                  5.5kg    ?      8.5kg

$Q_2 = 5.5 + \left(\frac{3}{4} \times 3\right) = 7.75$kg
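All three interpolation examples above follow the same pattern, which can be sketched as one small Python function. The function and parameter names are illustrative only; for data with gaps it assumes you pass in the adjusted class boundaries yourself.

```python
def interpolate_quantile(lower_bound, upper_bound, cum_before, cum_after, item):
    """Linear interpolation inside the class interval that contains the item."""
    fraction_along = (item - cum_before) / (cum_after - cum_before)
    return lower_bound + fraction_along * (upper_bound - lower_bound)

q1 = interpolate_quantile(3, 5, 0, 5, 16 / 4)            # lower quartile, no gaps
p72 = interpolate_quantile(5, 6, 5, 13, 16 * 0.72)       # 72nd percentile, no gaps
q2_gaps = interpolate_quantile(5.5, 8.5, 3, 7, 12 / 2)   # median, class 6-8 widened to 5.5-8.5
```

Note that the item number is never rounded, exactly as Hagrid Tip #2 demands.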
• Hagrid Tip #4: Try to memorise the mnemonic for the formula for variance, and how the formula results from it, rather than memorising the formula itself.
• Hagrid Tip #5: Make sure you understand the difference between $\Sigma x^2$ and $(\Sigma x)^2$.
• Hagrid Tip #6: Use your calculator to check your value of the variance (see Calculator Tips above).
The mnemonic for variance: "The mean of the squares minus the square of the mean" ("msmsm"). This gives:
1. Ungrouped data: $\sigma^2 = \frac{\Sigma x^2}{n} - \left(\frac{\Sigma x}{n}\right)^2$ or $\sigma^2 = \frac{\Sigma x^2}{n} - \bar{x}^2$ (since $\bar{x} = \frac{\Sigma x}{n}$)
2. Grouped data: $\sigma^2 = \frac{\Sigma fx^2}{\Sigma f} - \left(\frac{\Sigma fx}{\Sigma f}\right)^2$. Don't get these two mixed up!
You should not however think of these formulae as different. $\Sigma f$ clearly means the same as $n$. And $\Sigma fx$ still means the (estimated) sum of the values (using the midpoints $x$ of the class intervals).
Confusingly, when exam questions use a variable for the values, say $w$, but the data is grouped, $\Sigma w$ still refers to the total of all the values with the frequencies factored in. This is likely to be different to $\Sigma fx$, because the latter is an estimate of the total using the midpoints of the grouped data, whereas $\Sigma w$ is the exact total of the values before the data was grouped and information was lost (see Edexcel Jan 2011 Q5 for example). This just means that if you wanted the mean of $w$ and you were given $\Sigma w$, then use $\bar{w} = \frac{\Sigma w}{n}$, and ignore the grouped frequency table you were given.
• Hagrid Tip #7: Don't forget to square root when you're finding the standard deviation from the variance!
• Hagrid Tip #8: Check that your standard deviation looks sensible. Standard deviation roughly means "the average distance from the mean". So if your standard deviation is, say, 10 times too large, then you know you've gone wrong.
Coding:
1. However you code your variable (adding, dividing, etc.) you do the same to the mean.
2. Adding or subtracting to your variable doesn't affect the spread (variance/standard deviation). This intuitively makes sense: were everyone to get exactly 50cm taller by standing on a chair, the heights are just as spread out.
3. Multiplying or dividing affects the standard deviation in the same way. If you double the heights, you double the standard deviation. If you halve the heights, you halve the standard deviation.
4. For variance though, we have to square the scale factor. For example, if the values are tripled and hence $\sigma$ becomes $3\sigma$, then the variance is $(3\sigma)^2 = 9\sigma^2$, i.e. the variance becomes 9 times larger.
• Hagrid Tip #9: Make sure you check whether you're finding the new mean/variance/standard deviation after coding, or the original mean/variance/standard deviation before coding.
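To convince yourself of the coding rules, here is a small Python sketch. The heights and the coding $y = \frac{x - 100}{5}$ are invented purely for illustration; `pstdev` and `pvariance` are the standard library's population versions, matching $\sigma$ rather than $s$.

```python
from statistics import mean, pstdev, pvariance

heights = [150.0, 160.0, 165.0, 175.0]           # hypothetical heights in cm
coded = [(h - 100) / 5 for h in heights]         # coding y = (x - 100) / 5

# Mean: apply the same coding. Standard deviation: only the division matters.
mean_matches = abs(mean(coded) - (mean(heights) - 100) / 5) < 1e-9
sd_matches = abs(pstdev(coded) - pstdev(heights) / 5) < 1e-9
# Variance: divide by the factor squared.
var_matches = abs(pvariance(coded) - pvariance(heights) / 5 ** 2) < 1e-9

# Undoing the coding recovers the original statistics (Hagrid Tip #9).
original_mean = mean(coded) * 5 + 100
original_sd = pstdev(coded) * 5
```

Going back from coded to original values just reverses each step, which is usually where marks get lost.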
Chapter 4 - Representation of Data
Box Plots
Remember that you need to calculate your outlier boundaries, which are generally 1.5 interquartile ranges above the UQ or below the LQ. You will however always be told in the exam question how the outlier boundary is defined.
• Buckbeak Tip #1: There are two possibilities for the end points of the whiskers when there's an outlier on that end, and mark schemes accept both: either use the outlier boundary itself, or the smallest/greatest value which is not an outlier.
• Buckbeak Tip #2: You must explicitly show your calculation for the outlier boundaries. There are marks specifically for this in the mark scheme, and if you display your whiskers slightly incorrectly, you'll risk losing all marks.
Stem and Leaf Diagrams
You may be asked to calculate the interquartile range. In which case, just remember that you have a
discrete list of items, and hence choose the items to use for the quartiles in the correct way.
When asked to compare the two sets of data in a back-to-back stem and leaf diagram, they're expecting things like "the boys' scores tend to be higher than the girls'".
Histogram
Pretty much all histogram questions boil down to this simple diagram:
Area --(× $k$)--> Frequency
i.e. You're identifying the scaling ($k$) from area to frequency. At GCSE you could always assume that $k = 1$, i.e. area is EQUAL to frequency. Identifying $k$ may come from either using the total area and total frequency given (when frequencies of individual intervals are not available) or from the known frequency and area of a particular bar.
Once $k$ is known, you can use it to calculate frequencies for any area.
Use my slides for practice: http://www.drfrostmaths.com/resource.php?id=11371
• Buckbeak Tip #3: If you're not given a frequency density scale on the $y$-axis of the histogram, and only know the total frequency, just add any frequency density scale. If you know the frequency of a particular bar, it's generally easiest to set the scale such that $\text{frequency density} = \frac{\text{frequency}}{\text{class width}}$.
• Buckbeak Tip #4: As before, you must check if the class intervals have gaps! If so, ensure you use the correct class intervals when calculating frequencies/frequency densities.
• Buckbeak Tip #5: When asked to find the mean, median, quartiles or variance from a histogram, first use the histogram to generate a grouped frequency table. Then use this table as you usually would to calculate these statistics.
• Buckbeak Tip #6: When asked why a histogram is an appropriate means of displaying the data, the words they're looking for are 'continuous data/variable', and nothing else.
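The 'area times $k$ gives frequency' idea can be sketched in Python as follows. The bar dimensions and frequencies are hypothetical, as are the function names; the point is only that one known bar fixes $k$ for the whole histogram.

```python
def frequency_scale(known_frequency, bar_width, bar_height):
    """Return k such that frequency = k * area, from one bar whose frequency is known."""
    return known_frequency / (bar_width * bar_height)

def bar_frequency(k, bar_width, bar_height):
    """Frequency represented by a bar (or part of a bar) of the given dimensions."""
    return k * bar_width * bar_height

# Hypothetical histogram: a bar 5 units wide and 4 units high represents 10 items.
k = frequency_scale(known_frequency=10, bar_width=5, bar_height=4)
other_bar = bar_frequency(k, bar_width=10, bar_height=3)
```

If you only know the total frequency, pass the total area as a single 'bar' (e.g. width equal to the total area and height 1) to get the same $k$.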
Skew
Remember that there are 3 ways in which you can determine skew:
1. For histograms or probability distributions, just observe the shape. If the 'tail' is in the positive direction, you have positive skew. If it's in the negative direction, you have negative skew.
2. Use the quartiles. If the right box of the (implied) box plot is wider, i.e. $Q_3 - Q_2 > Q_2 - Q_1$, then you have positive skew. If the left box is wider, you have negative skew. If they're the same width, you have no skew.
3. Use the mean and median. The way to remember which order means which type of skew is to think of salaries: large salaries in the positive tail drag up the mean but not the median, hence mean > median means we have positive skew.
On the rare occasion, you have both the quartiles and the mean available. In that case, choose either (2) or (3) to find what type of skew you have. Otherwise, the choice should be clear based on the data available.
• Buckbeak Tip #7: For 2 mark questions which ask you to comment on skew, you get one mark for saying 'negative/positive/no skew', and the other mark for giving a valid reason (e.g. mean > median).
Chapter 5 - Probability
• If $A$ and $B$ are independent, then $A$ does not affect $B$ and vice-versa.
• If $A$ and $B$ are mutually exclusive, then $A$ and $B$ can't happen at the same time.
• These are completely separate things - one is not the opposite of the other!
Laws
If $A$ and $B$ are independent:
• $P(A|B) = P(A)$ (as the probability of $A$ is not affected by $B$)
• $P(A \cap B) = P(A)P(B)$ (If you're asked to show that two events are independent, then show this equality holds)
If $A$ and $B$ are mutually exclusive:
• $P(A \cap B) = 0$
• $P(A \cup B) = P(A) + P(B)$
In general:
• $P(A|B) = \frac{P(A \cap B)}{P(B)}$ (I remember this by 'the intersection divided by the thing I'm conditioning on'). Notice that if $A$ and $B$ are independent, then the RHS simplifies to $P(A)$.
• Addition Rule: $P(A \cup B) = P(A) + P(B) - P(A \cap B)$ (Remember this by thinking about two overlapping circles - we need to subtract the overlap). Notice that if $A$ and $B$ are mutually exclusive, $P(A \cap B) = 0$, so we get our earlier formula. If $A$ and $B$ are independent, this reduces to $P(A \cup B) = P(A) + P(B) - P(A)P(B)$.
• McGonagall Tip #1: Mutually exclusive events are indicated by separated non-overlapping circles in a Venn Diagram. Independence does not affect the Venn Diagram.
• McGonagall Tip #2: If you're not told two events are mutually exclusive, then for the purposes of the Venn Diagram, you have to assume that they are not mutually exclusive, i.e. they overlap.
• McGonagall Tip #3: You can often determine probabilities by constructing a Venn Diagram and filling in the missing probabilities in regions by simple adding/subtracting. Other times, this approach doesn't work.
• McGonagall Tip #4: You can treat probabilities algebraically. e.g. $P(A) - 0.7P(A) = 0.3P(A)$
• McGonagall Tip #5: If you're told that $A$ and $B$ are independent, immediately write out that $P(A \cap B) = P(A)P(B)$ using whatever probabilities you're given. Same for mutual exclusivity. It'll help you visualise the probabilities you have available to determine others you don't know.
• McGonagall Tip #6: Do you have a mixture of $P(A \cap B)$, $P(A \cup B)$ and $P(A)$ (or $P(B)$)? You should write out the Addition Rule and see if it helps.
• McGonagall Tip #7: Note that given some event, the probabilities add up to 1. So: $P(A|B) + P(A'|B) = 1$ and $P(A|B') + P(A'|B') = 1$. Some people incorrectly assume: $P(A|B) + P(A'|B') = 1$
• McGonagall Tip #8: As per GCSE, a suitable tree diagram can work wonders. Remember that the second level of branching and onwards are conditional probabilities.
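The conditional probability formula, the Addition Rule and the independence test can be tried out with a few lines of Python. The probabilities below are hypothetical, chosen only so the arithmetic is clean, and the function names are made up for this sketch.

```python
from fractions import Fraction

def conditional(p_a_and_b, p_b):
    """P(A|B): the intersection divided by the thing being conditioned on."""
    return p_a_and_b / p_b

def union(p_a, p_b, p_a_and_b):
    """Addition Rule: P(A or B) = P(A) + P(B) - P(A and B)."""
    return p_a + p_b - p_a_and_b

def are_independent(p_a, p_b, p_a_and_b):
    """Events are independent exactly when P(A and B) = P(A)P(B)."""
    return p_a_and_b == p_a * p_b

# Hypothetical probabilities, chosen only for illustration.
p_a, p_b, p_a_and_b = Fraction(2, 5), Fraction(1, 4), Fraction(1, 10)

p_a_given_b = conditional(p_a_and_b, p_b)
p_a_or_b = union(p_a, p_b, p_a_and_b)
independent = are_independent(p_a, p_b, p_a_and_b)
```

Using `Fraction` rather than decimals keeps the equality test for independence exact, which is roughly what you do on paper anyway.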
Dealing with more complicated tree questions
Suppose you have a tree like the one below, where the first level of branching is $A$ or $A'$, the second is $B$ or $B'$, and the third is $C$ or $C'$:

A
    B
        C
        C'
    B'
        C
        C'
A'
    B
        C
        C'
    B'
        C
        C'

Then how would we calculate the following?
$P(C)$: As per GCSE, we just find all the paths which match this description (i.e. where $C$ is true) and add the probabilities of each path:
$P(C) = P(A \cap B \cap C) + P(A \cap B' \cap C) + P(A' \cap B \cap C) + P(A' \cap B' \cap C)$
$P(A \cap C')$:
$P(A \cap C') = P(A \cap B \cap C') + P(A \cap B' \cap C')$
$P(B)$: Note that we need not even consider the event $C$ because it occurs after $B$:
$P(B) = P(A \cap B) + P(A' \cap B)$
$P(C|B)$: Seeing the conditional probability, you should immediately go for your formula for conditional probability!
$P(C|B) = \frac{P(B \cap C)}{P(B)} = \frac{P(A \cap B \cap C) + P(A' \cap B \cap C)}{P(A \cap B) + P(A' \cap B)}$
Chapter 6 - Correlation
Recall that $r = 1$ is 'perfect positive correlation', $r = 0$ is 'no correlation' and $r = -1$ is 'perfect negative correlation'. Anything below -0.7 or above 0.7 is considered to be strong correlation.
I'm going to presume here you can plug values into your $S_{xx}$, $S_{yy}$, $S_{xy}$ and $r$ formulae. But here are things I see go wrong:
• Lupin Tip #1: In $S_{xx} = \Sigma x^2 - \frac{(\Sigma x)^2}{n}$, I've seen people forget to square the $\Sigma x$, or mix the formula up with the one for variance, and do $S_{xx} = \Sigma x^2 - \left(\frac{\Sigma x}{n}\right)^2$. This formula is clearly wrong because the $n$ in the denominator gets squared when it shouldn't.
• Lupin Tip #2: You can use your calculator to directly calculate $r$ if you're given the original data (see the beginning of this guide). Make sure however you still show your calculations for $S_{xx}$ and so on for the purposes of evidencing working. However, you're generally provided with certain sums in the exam to save you time, so you may not be able to enter the original data directly.
Here are some potential 'explaining' questions you might encounter:
• Lupin Tip #3: If you're asked whether your correlation coefficient supports some assertion, just comment on whether your value is close to -1, 0 or 1. If someone is claiming that house prices fall with distance from central London, then their assertion is justified if you have a correlation coefficient close to -1 (i.e. negative correlation).
• Lupin Tip #4: If you're asked to give an 'interpretation' of your correlation coefficient, this doesn't mean to say whether it's negative or positive, but to say what it actually means in words, e.g. "Higher towns have lower temperature/temperature decreases as height increases". i.e. You're asked to state what happens to one variable as the other increases.
Coding
$r$: Completely unaffected by any multiplications, divisions, additions or subtractions in the coding (provided you multiply or divide by a positive number; a negative factor on just one variable flips the sign of $r$).
$S_{xx}$, $S_{yy}$: Since $S_{xx} = n\sigma^2$, $S_{xx}$ is affected by coding in the same way as variance. So if the values are doubled in coding, $S_{xx}$ becomes $2^2 = 4$ times bigger. The same applies to $S_{yy}$.
$S_{xy}$: As above, addition and subtraction in the coding have no effect. If $x$ is scaled by a factor of $k$ and $y$ by a factor of $q$, then $S_{xy}$ is scaled by a factor of $kq$. e.g. If $S_{xy} = 10$ and $a = 4x + 1$ and $b = 3y - 2$, then $S_{ab} = S_{xy} \times 4 \times 3 = 120$.
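These coding rules can be checked numerically. The sketch below is an illustration rather than anything from the exam board: it reuses the Data Set 3 values from Chapter 0 (hours 7, 10, 17, 8 and scores 75, 86, 92, 76) with the coding $a = 4x + 1$, $b = 3y - 2$, and the helper names are invented.

```python
from math import sqrt

def s_value(u, v):
    """S_uv = sum(uv) - sum(u)sum(v)/n; passing the same list twice gives S_uu."""
    n = len(u)
    return sum(p * q for p, q in zip(u, v)) - sum(u) * sum(v) / n

def pmcc(u, v):
    """Product Moment Correlation Coefficient r = S_uv / sqrt(S_uu * S_vv)."""
    return s_value(u, v) / sqrt(s_value(u, u) * s_value(v, v))

x = [7, 10, 17, 8]                 # hours revised (Data Set 3)
y = [75, 86, 92, 76]               # Potions test scores
a = [4 * v + 1 for v in x]         # coding a = 4x + 1
b = [3 * v - 2 for v in y]         # coding b = 3y - 2

s_xy, s_ab = s_value(x, y), s_value(a, b)
s_xx, s_aa = s_value(x, x), s_value(a, a)
r_xy, r_ab = pmcc(x, y), pmcc(a, b)
```

Here `s_ab` comes out as 12 times `s_xy`, `s_aa` as 16 times `s_xx`, and the two values of $r$ agree, as the table says they should.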
Chapter 7 - Regression
Regression in general means to find the parameters of a model which best explains the data. In the case of linear regression, the model is a straight line, and the parameters are its y-intercept and gradient, which are set so as to minimise the total (squared) error between the predicted y-value on the line and the actual y-value of each data point.
The formulae for the equation of the least squares regression line are given in the formula booklet, but they're worth memorising:
• $y = a + bx$. Notice that unlike $y = mx + c$, we put the y-intercept term first on the RHS. This is so that when we extend say to quadratic regression, i.e. fitting a curve $y = a + bx + cx^2$, $a$ still means the y-intercept.
• $b = \frac{S_{xy}}{S_{xx}}$. I remember this by the fact that 'xy' are the sex chromosomes for a man, and 'xx' for a woman, and the man comes first (a little bit misogynistic I know, but easy to remember!)
• $a = \bar{y} - b\bar{x}$. This is easily remembered by just rearranging $y = a + bx$ above to make $a$ the subject, and then replacing $x$ with $\bar{x}$ and $y$ with $\bar{y}$. This suggests that the point $(\bar{x}, \bar{y})$ is on your regression line. Remember that $\bar{x}$ just means the mean of $x$ and is calculated using $\bar{x} = \frac{\Sigma x}{n}$.
Snape Tip #1: If asked to interpret your gradient, say something like "As [your x variable] increases by 1, [your y variable] increases/decreases by [the gradient]", e.g. "as the height increases by 1m, the temperature decreases by 3 degrees".
Snape Tip #2: It's good to be clear from the outset which is your explanatory (independent) variable and which is your dependent variable. This is crucial when you calculate your gradient, since if you get your variables mucked up, you might do $b = \frac{S_{xy}}{S_{yy}}$ by mistake. These will rarely be labelled using the variable letters $x$ and $y$, so ensure you work out which is which. If for example you had the water depth $d$ and the pressure $p$, clearly $p$ depends on $d$, so $d$ is your $x$ (explanatory) variable and $p$ your $y$ (dependent) variable. Don't write $S_{xy}$ (since $x$ and $y$ don't exist in this context); use $S_{dp}$ instead.
Snape Tip #3: Remember that for coding, you just replace your variables using the expressions given. e.g. If you have the line $y = 3 + 2x$ and you used the coding $x = 3a + 4$, $y = b - 1$, then just substitute: $b - 1 = 3 + 2(3a + 4)$ and simplify to get $b = 12 + 6a$. Piece of cake.
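Since exam questions usually hand you the sums rather than the raw data, here is a sketch of going from those sums to the regression line. It is only an illustration: the function name is invented, and the sums are those of the hours-revised data from Chapter 0, with $\Sigma y = 329$ being the total of the four listed scores.

```python
def regression_from_sums(n, sum_x, sum_y, sum_x2, sum_xy):
    """Return (a, b) for the least squares line y = a + bx from summary sums."""
    s_xx = sum_x2 - sum_x ** 2 / n
    s_xy = sum_xy - sum_x * sum_y / n
    b = s_xy / s_xx                    # gradient
    a = sum_y / n - b * sum_x / n      # intercept, since (x-bar, y-bar) lies on the line
    return a, b

# Sums for the hours-revised data (Data Set 3) from Chapter 0.
a, b = regression_from_sums(n=4, sum_x=42, sum_y=329, sum_x2=502, sum_xy=3557)
predicted_score = a + b * 12           # 12 hours is inside the data range of 7 to 17
```

The prediction at 12 hours is interpolation, so it would count as reliable; plugging in 40 hours would be extrapolating.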
Chapter 6-7 - Interpretation Questions for Correlation/Regression
Interpretation questions are a bit of a pain if you don't know what the examiner is looking for, and are often worth a lot of marks (anything up to 4!). Here's everything you need to know:

Question: Interpret r
Answer format: "The graph shows a ___ correlation. As ____ increases, ____ increases/decreases."
Comments: The key here is saying how the dependent variable changes as the explanatory variable increases. The question asks you to 'interpret', so it's key here that you describe what it means using NON-STATISTICAL LANGUAGE. Saying "Positive correlation" alone would be DESCRIBING the correlation.

Question: Explain why this diagram would support the fitting of a regression line of y onto x
Answer format: "The variables have a linear relationship, i.e. the points are close to the implied straight line of best fit."

Question: Interpret the gradient/slope of the line/interpret b
Answer format: "As (x) increases by 1, (y) increases/decreases by ___."
Comments: Again, the key word here is 'interpret'. Example: "As the height increases by 1m, the temperature decreases by 0.1°C".

Question: Interpret the y-intercept/interpret a
Answer format: "The value (y) takes when (x) is 0."
Comments: e.g. "A score of 55 would be obtained if no hours of revision are done".

Question: Extrapolation
Answer format: "Reliable [1 mark] because the value is within the range of the data [1 mark]." or "Reliable because the value is close to the range of the data". or "Unreliable because the value is outside the range of the data, i.e. we are extrapolating."
Comments: The 'tricksey' case here is when the value is just outside the data range. The examiner is expecting you to state that your regression equation is still reliable.

Question: Which is the explanatory variable? Explain your answer.
Answer format: "(x) is the explanatory variable because (x) influences (y)".

Question: Explain method of least squares
Answer format: "We minimise the sum of the squares of the residuals" (draw a diagram)
Comments: You may wish to go back to your textbook to understand what's going on here. But the principle is that we're finding the error between each data value and the predicted value (based on the model), squaring the error (to ensure the error is positive) then adding them up. The gradient and y-intercept of the straight line are set to minimise this total error.

Chapter 8 - Discrete Random Variables
A discrete random variable has two ingredients:
1. A set of outcomes.
2. A probability distribution mapping outcomes to probabilities. This has two flavours:
   a. A 'probability distribution' is represented either as a table or as a graph.
   b. A 'probability function' allows you to have a more complex expression that calculates the probability, as opposed to a probability distribution where each outcome and its probability are explicitly stated. These might be specified as a 'piecewise function', where different functions are used for different ranges of outcomes, e.g.:
$p(x) = \begin{cases} k(x + 1) & x = 1, 2, 3, 4 \\ 0 & \text{otherwise} \end{cases}$
$P(X = x)$ is the longhand for writing a probability. It means "the probability that a random variable $X$ has the value $x$". $p(x)$ is the shorthand. Note the use of lowercase $p$.
A cumulative distribution function meanwhile gives the probability up to a certain value, i.e. $F(x) = P(X \le x)$.
Finding the value of a missing variable using a probability function or cumulative distribution function
Given a probability function:
$p(x) = \begin{cases} k(x + 1) & x = 1, 2, 3, 4 \\ 0 & \text{otherwise} \end{cases}$
Probabilities must add up to 1. So just plug in each value of $x$ into your probability function:
$2k + 3k + 4k + 5k = 1$
So $k = \frac{1}{14}$
Given a cumulative distribution function:
$F(x) = \begin{cases} k(x + 1) & x = 1, 2, 3, 4 \\ 0 & \text{otherwise} \end{cases}$
It's clear $F(4) = 1$, because it's certain that we'll have an outcome up to 4.
This gives us $5k = 1$, so $k = \frac{1}{5}$
Finding a probability distribution from a cumulative distribution function
Since the cumulative distribution function is the running total of the probability, finding the difference between consecutive values tells us what we added to the running total each time.
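Both ways of finding $k$, and the 'difference of the running total' trick, can be sketched in Python using the $k(x + 1)$ example above. The variable names are made up for illustration, and exact fractions are used so nothing gets lost to rounding.

```python
from fractions import Fraction

outcomes = [1, 2, 3, 4]

# Probability function p(x) = k(x + 1): the probabilities must sum to 1.
k_pf = 1 / Fraction(sum(x + 1 for x in outcomes))
pf = {x: k_pf * (x + 1) for x in outcomes}

# Cumulative distribution function F(x) = k(x + 1): F at the largest outcome is 1.
k_cdf = 1 / Fraction(max(outcomes) + 1)
cdf = {x: k_cdf * (x + 1) for x in outcomes}

# Recover P(X = x) from F by differencing the running total.
from_cdf = {}
previous = Fraction(0)
for x in outcomes:
    from_cdf[x] = cdf[x] - previous
    previous = cdf[x]
```

For the cumulative version this gives $P(X = 1) = \frac{2}{5}$ and $\frac{1}{5}$ for each of the other outcomes, which do add up to 1.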
Expected Value
Expected value just means the 'mean' of the random variable. So for a fair die, $E(X) = 3.5$ because we expect to see an outcome of 3.5 on average. For this reason $E(X)$ is sometimes represented as $\mu$. We just sum the product of each outcome with its probability:
$E(X) = \Sigma x\,p(x)$
Finding the value of missing variables using a probability distribution and a given expected value
We form two simultaneous equations by:
1. Noting that the probabilities of a probability distribution add up to 1.
2. Using the provided expected value.
e.g. Given that $E(X) = 3$ and that $X$ has the probability distribution below, determine $a$ and $b$.

x           1    2    3    4
P(X = x)    a    a    b    0.1

$2a + b + 0.1 = 1$
$1a + 2a + 3b + 0.4 = 3$
Then just simplify and solve.
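For anyone wanting to check their algebra on this example, here is one possible way to solve the two equations in Python. It is just a sketch of elimination by hand, not a general solver, and the variable names are invented.

```python
from fractions import Fraction

# Probabilities sum to 1:  2a + b = 1 - 0.1
# Expected value is 3:     3a + 3b = 3 - 0.4
rhs_total = 1 - Fraction(1, 10)
rhs_mean = 3 - Fraction(4, 10)

# Subtract 3 x (first equation) from the second: (3 - 6)a = rhs_mean - 3 * rhs_total
a = (rhs_mean - 3 * rhs_total) / (3 - 6)
b = rhs_total - 2 * a

probabilities = {1: a, 2: a, 3: b, 4: Fraction(1, 10)}
expected_value = sum(x * p for x, p in probabilities.items())
```

This gives $a = \frac{1}{30}$ and $b = \frac{5}{6}$, and the final line confirms that the expected value really is 3.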
Variance
The mnemonic "the mean of the squares minus the square of the mean" still applies!
$\text{Var}(X) = E(X^2) - E(X)^2$
If you can't distinguish between $E(X^2)$ and $E(X)^2$, then it's likely you're one of those people who mix up $\Sigma x^2$ and $(\Sigma x)^2$. $E(X^2)$ means you find the expected value as before, except you replace each outcome with itself squared - the probability is always unaffected, i.e. $E(X^2) = \Sigma x^2 p(x)$.
Coding
The same rules apply as before. For expected value, whatever we did to the random variable, we do to the expected value. Adding/subtracting has no effect on variance, and when we multiply or divide, we multiply or divide the variance by the square of that value.
Examples:
$E[1 - 3X] = 1 - 3E[X]$
$\text{Var}[1 - 3X] = 9\text{Var}[X]$
Chapter 9 - Normal Distribution
A normal distribution is perfect for data which is symmetrically distributed about some mean, where the probability tails off as we get further from the mean.
Some preamble
z-values and z-tables
The $z$ value means the number of standard deviations above the mean. For IQ for example, where $X \sim N(100, 15^2)$, an IQ of 130 corresponds with a $z$-value of 2, and an IQ of 85 with a $z$-value of -1.
$z = \frac{x - \mu}{\sigma}$
This makes sense: $x - \mu$ is how far above the mean a value is, so dividing it by $\sigma$ tells us how many standard deviations above the mean we are.
If $X$ is the original random variable, converting it to a new random variable $Z$ using the formula above is known as standardisation. $Z$ is a normally distributed variable where $Z \sim N(0, 1^2)$, known as the standard normal distribution. The reason $\mu = 0$ and $\sigma = 1$ is that if the $z$ value is say 3, then this IS the number of standard deviations above the mean, because 3 is 3 lots of 1 above 0.
Thus in converting $X$ to $Z$, each value of the random variable now represents the number of standard deviations above the mean. So an IQ of 130 gets converted to 2, and so on.
A z-table allows us to find out the probability of being up to a certain $z$ value, i.e. $P(Z < z)$, where $z$ is a specific value of $Z$. Looking up $z = 3$ in the table would determine the probability of being up to 3 standard deviations above the mean; in the case of IQ, it would tell us the probability of having an IQ less than 145.
Manipulating z-probabilities
Just think about the graph of a normal distribution when manipulating probabilities. $P(Z < -2)$ represents the probability of being more than two standard deviations below the mean, e.g. having an IQ less than 70. Due to symmetry, this is the same probability as being more than 2 standard deviations above the mean, e.g. having an IQ greater than 130. i.e. $P(Z < -2) = P(Z > 2)$.
Similarly, since the probability of having an IQ above 130 is one minus the probability of having an IQ below 130, we have that $P(Z > 2) = 1 - P(Z < 2)$.
In order to be able to use the z-table, we need (a) our z-value to be positive and (b) the inequality to be <. Examples:
$P(Z > -3) = P(Z < 3)$
$P(Z > 4) = 1 - P(Z < 4)$
$P(Z < -0.5) = P(Z > 0.5) = 1 - P(Z < 0.5)$
Practise this manipulation!
Finding a probability
"Let $X$ represent the IQ of a randomly chosen person, where $X \sim N(100, 15^2)$. Find the probability that a randomly chosen person has an IQ below 96".
Step 1: Express your information as a probability.
$P(X < 96)$
Step 2: Standardise, i.e. convert your X value to a Z value, using the formula (or common sense). Ensure you get the sign of z correct.
$P\left(Z < \frac{96 - 100}{15}\right) = P(Z < -0.27)$
Step 3: Manipulate your z probability so that the z value is positive and you're using <.
$= P(Z > 0.27) = 1 - P(Z < 0.27)$
Step 4: Look up in z table.
$= 1 - 0.6064 = 0.3936$
Step 5: Check that your answer looks sensible.
(Yes, the answer looks sensible because we expected slightly less than half the population)
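If you want to check a z-table answer, Python's standard library has a `NormalDist` class that plays the role of the table. The sketch below is only a cross-check under that assumption (you obviously can't use it in the exam): it computes the probability both exactly and by the table-style route with $z$ rounded to 2 decimal places.

```python
from statistics import NormalDist

iq = NormalDist(mu=100, sigma=15)

# Exact probability, using the unrounded z = (96 - 100) / 15.
p_exact = iq.cdf(96)

# Table-style route: round z to 2 d.p., then apply P(Z < z) = 1 - P(Z < -z) for negative z.
z = round((96 - iq.mean) / iq.stdev, 2)
p_table_style = 1 - NormalDist().cdf(-z)
```

The two answers differ slightly in the third decimal place, purely because the table forces you to round $z$ first.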
Finding a value corresponding to a probability
Example 1: "Find the lower quartile for IQ, given that $X \sim N(100, 15^2)$ again."
Step 1: As before, express your information as a probability.
$P(X < Q_1) = 0.25$
Step 2: Standardise. We don't like the probability of 0.25 because it won't be in our z-table (it's less than 0.5). A quick fix is to make the z-value negative, which allows us to do 1 minus the probability.
$P(Z < z) = 0.25$
$P(Z < -z) = 0.75$
Step 3: Ordinarily, we'd look up the closest z-value that gives that probability, which turns out to be 0.67. For certain 'nice' probabilities (e.g. 2.5%, 5%, 10%), we MUST look in the second z-table, otherwise you lose marks. However, in this case, 0.75 is not there, so use the first table.
$-z = 0.67$
$z = -0.67$
This makes sense as we expect our $z$ value to be negative as it's below the mean.
Step 4: Convert back into an X value by using the formula for z:
$\frac{Q_1 - 100}{15} = -0.67 \rightarrow Q_1 = 89.95$
I personally find it easier to write $Q_1 = 100 - (0.67 \times 15)$ as it expresses the fact that $Q_1$ is 0.67 standard deviations below the mean (i.e. 0.67 lots of 15 below 100). However, since the form above is what's listed for the method mark, it's probably safer to do it that way.
Example 2: "Find the IQ which 10% of the population exceeds, given that $X \sim N(100, 15^2)$ again."
Step 1: As before, express your information as a probability. Clearly if 10% of people have an IQ higher than this value, then there's 90% below (recall that we always want the probability of being below some value on the right half of the graph in order to use the z-table).
$P(X < x) = 0.9$
Step 2: Standardise. No manipulation of our z probability is required as we have < and a probability greater than 0.5.
$P(Z < z) = 0.9$
Step 3: This time we can use the second z-table because 0.1 is on it (note that, confusingly, the second table gives us the probability of being ABOVE some z-value, i.e. $P(Z > z)$).
$z = 1.2816$
Step 4: As before, use the formula for z to convert back into an IQ.
$1.2816 = \frac{x - 100}{15} \rightarrow x = 119.224$
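Both 'work backwards from a probability' examples can be cross-checked with the inverse of the cumulative distribution, which Python's `NormalDist` exposes as `inv_cdf`. This is a sketch for checking only; expect small differences from the table answers, because the first z-table only gives $z$ to 2 decimal places.

```python
from statistics import NormalDist

iq = NormalDist(mu=100, sigma=15)

lower_quartile = iq.inv_cdf(0.25)      # value with 25% of the population below it
top_ten_percent = iq.inv_cdf(0.90)     # value that only 10% of the population exceeds

# The z-values behind those answers, for comparison with the tables.
z_quartile = NormalDist().inv_cdf(0.25)
z_top_ten = NormalDist().inv_cdf(0.90)
```

The lower quartile comes out a little under the 89.95 found from the table, since the unrounded $z$ is closer to -0.6745 than -0.67.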
Finding the missing mean or standard deviation
You're highly advised to do lots of practice questions on this until it's second nature!
Example: "Only 10% of maths teachers live more than 80 years. Triple that number live less than 75 years. Given that life expectancy of maths teachers is normally distributed, calculate the standard deviation and mean life expectancy."
Step 1: As before, express your information in terms of probabilistic statements. Notice we've turned the "10% above" into "90% below".
$P(X < 80) = 0.9$
$P(X < 75) = 0.3$
Step 2: Deal with the first statement.
$P(X < 80) = 0.9$
$P(Z < z) = 0.9$
$z = 1.2816$
$1.2816 = \frac{80 - \mu}{\sigma}$
Step 3: Deal with the second statement.
$P(X < 75) = 0.3$
$P(Z < z) = 0.3$
$P(Z < -z) = 0.7$
$-z = 0.5244$
$z = -0.5244$
$-0.5244 = \frac{75 - \mu}{\sigma}$
Step 4: We have two simultaneous equations, so solve!
$80 - \mu = 1.2816\sigma$
$75 - \mu = -0.5244\sigma$
Subtracting the two equations:
$5 = 1.806\sigma \rightarrow \sigma = 2.769$
Substituting back into the first equation:
$\mu = 80 - 1.2816 \times 2.769 = 76.451$
I find your S1 efforts most... a-μ-sing.
Harry Potter stats puns (c) Lord Voldemort.
Copyright infringement punishable by Cruciatus.