M.E.I. STRUCTURED MATHEMATICS
UNIT S1
Unit S1: Scheme of Work
For first teaching in September 2004
Unit Title
Statistics 1 (4766) is an AS unit.
Objectives
To enable students to build on and extend the data handling and sampling techniques they
have learned at GCSE.
To enable students to apply theoretical knowledge to practical situations using simple
probability models.
To give students insight into the ideas and techniques underlying hypothesis testing.
Assessment
Examination (72 marks)
1 hour 30 minutes.
The examination paper has two sections:
Section A:
five questions, each worth at most 8 marks.
Section Total: 36 marks.
Section B:
two questions, each worth about 18 marks.
Section Total: 36 marks.
Calculators
A graphical calculator is allowed in the examination for this module.
Assumed Knowledge
Candidates are expected to know the content of Intermediate Tier GCSE. In addition, they
need to know the binomial expansion as covered in C1.
Textbook
“Statistics 1” by Eccles, Francis, Graham and Porkess.
Scheme of Work
Each of the blocks below is roughly one double period. The module should take about 34
double periods to cover, plus time spent on past papers for revision.
Exercises from the textbook are suggestions only. These will usually be started in class and
students expected to finish them at home. Lessons should start with reference to the last
exercise.
This scheme of work is a “working document” and comments and alterations are very
welcome, especially if you have a new or exciting way to teach a topic.
Topic and learning objectives
S1/1 EXPLORING DATA
1.1. Introduction
Motivation: the idea of a statistical
investigation. Mention the idea of
population and (random) sample (p5,
p6)
Types of data: categorical, discrete,
continuous (D1)
Idea of a distribution: classification
using skewness (D9)
References
Teaching points
Ch.1 p.1-6, 12-13
1.2. Stem-and-leaf diagrams
Sorted stem-and-leaf diagrams: how
to “expand”, keys (D6)
Ch.1 p.6-8
Ex.1A Q1,2,5 (orally),
Q7,9
1.3. Measures of central tendency
Mean, median and mode; midrange
(new); their relative usefulness (D10,
D11)
Ch.1 p.13-16
Ex.1B (self-study)
1.4. Frequency distributions
Understanding and constructing
frequency tables for ungrouped data
(D2): calculating measures of central
location for such data (D10)
Grouped data: choice of class
boundaries for discrete and
continuous data: estimating the mean
(D2,D10)
Ch.1 p.17-19
Ex.1C (self-study)
What is statistics? A possible definition is “collection, organisation and analysis of information”. Discuss
some of the processes behind statistics: we may collect data to settle a question, given in the form of a
hypothesis. We could try to collect data from every element of a population (a census) but we are much
more likely to take a sample: discuss the process of taking a random sample.
Use the Internet and find a large data file, e.g. Mayfield High School, and discuss the variables. Ask the
students to use the file to provide examples of categorical data (e.g. favourite TV programme), continuous
data (e.g. height) and discrete data (e.g. KS2 levels).
The data file contains a great deal of raw data, e.g. the KS2 levels in Mathematics. How might we analyse
this data? Organise it by tallying, and then draw a vertical line chart: this shows the distribution of the
data. Discuss the possible shapes of distributions: symmetrical, unimodal, negative skew (mostly high
values), positive skew (mostly low), bimodal, uniform.
Take another data set, e.g. % examination scores (Autograph can be persuaded to generate something, or
there are files in the Maths area from which the names can be removed). How could we display this
without losing the information? Hopefully someone will suggest a stem-and-leaf diagram: the MEI
definition requires a “scale” or key, and that the leaves are sorted. If there are too many values to each
“stem”, the diagram can be “stretched” (see p.8).
This should not need much time: review mean (mention mean age: age is always rounded down, so add
0.5), median and mode, and introduce the idea of midrange. Discuss the relative merits of each, and the
relative positions in skewed distributions: mean is good for symmetric distributions but will be skewed if
the data set contains very large or very small values, e.g. Cookson Enterprises has many workers earning
£10,000 per year, and the Managing Director who earns £1m: calculate mean, median, mode and
midrange.
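The Cookson Enterprises calculation can be checked quickly. This is only a sketch: the text says "many workers", so the figure of 99 workers is an assumption made for illustration, as are the variable names.

```python
from statistics import mean, median, mode

# Assumed payroll (hypothetical): 99 workers on £10,000 and one
# Managing Director on £1,000,000.
salaries = [10_000] * 99 + [1_000_000]

salary_mean = mean(salaries)
salary_median = median(salaries)
salary_mode = mode(salaries)
salary_midrange = (min(salaries) + max(salaries)) / 2

print(salary_mean, salary_median, salary_mode, salary_midrange)
```

With these assumed numbers the median and mode stay at the typical salary, while the mean and especially the midrange are dragged upwards by the single extreme value.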
This example is a small, positively skewed distribution. Perhaps use the Mayfield KS2 levels to construct
some discrete distributions (or e.g. survey no. of brothers and sisters) and calculate means, medians
(perhaps via a column of cumulative frequencies, which Excel could put in), modes and midranges:
discuss their usefulness.
1.5. Measures of spread
Range: disadvantages (D12)
Calculating and interpreting mean
absolute deviation, mean square
deviation and root mean square
deviation (D13)
Ch.1 p.31-36
Ch.1 p.22-29
Ex.1D Q2,6
Discuss need for grouping (see p.23): take a fairly large discrete data set, e.g. the examination marks
mentioned above, and discuss choice of class boundaries (relate to stem-and-leaf diagram above, if
drawn): estimate the mean using mid-interval values, and discuss estimating the median via linear
interpolation. Now move on to continuous data: we must be careful about class boundaries. Why not use
the example on p.27?
The range will need little introduction, e.g. Cookson Enterprises salaries above have a range of £990000.
Candidates will be familiar with standard deviation from previous work but now need to learn some new
names. Take a fairly small data set, e.g. a sample of the Mayfield KS2 levels. We aim at something that
shows how far on average data points are from the middle, i.e. the mean. Mean deviation is always zero:
this motivates MAD, but MAD involves the modulus function which is not easy to work with. How else
can we make the deviations positive? Square them: this motivates msd and rmsd. The new MEI notation
involves Sxx which is
Calculating and interpreting variance
and standard deviation: notation s²
and s, divisor n – 1 (D13)
Statistical functions on a calculator
(D14)
Ch.1 p.36-39
Ex.1E Q1,4,7,10
Completion of above, including
combining sets of data
Ex.1E Q2,3,6,11,13
Identifying outliers using 2s (D16)
Ch.1 p.40-41
Ex.1E Q8,9
1.6. Linear coding
Effect of a linear transformation of
the form y = a + bx on mean and
standard deviation (D14)
Completion of above
Ch.1 p.45-48
Ex.1F Q2;
Q6 [MEI]
S1/2 DATA PRESENTATION
2.1. Charts
Display of discrete data using a
vertical line chart (D4)
Pie charts and bar charts for
categorical data (D3)
Comparative pie charts
2.2. Histograms
Displaying continuous data using
$S_{xx} = \sum (x - \bar{x})^2$: it is not difficult to show that this is equivalent to $\sum x^2 - n\bar{x}^2$.
Then msd = Sxx ÷ n (the mean of the squares minus the mean squared) and rmsd is the square root. Yes, I
know we think this is standard deviation, but from now on, it isn’t…
…because this has a new (and strictly more correct) definition at A level. We should not really divide by
n, because (as we discovered when discussing mean deviation) $\sum (x - \bar{x}) = 0$, and so once we have found
n – 1 of the deviations, the nth one follows on because of this fact. So we should divide by n – 1. Using
the MEI notation, then, variance $s^2 = \frac{S_{xx}}{n - 1}$ and standard deviation $s = \sqrt{\frac{S_{xx}}{n - 1}}$. Calculate these for the
small data set above, and then ensure that the class can use their own calculators to produce the right
answers.
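A short check of the two forms of Sxx and of the divisors n and n – 1 may be useful alongside the calculator work. The data set below is hypothetical (it is not taken from the Mayfield file), and the variable names are chosen only for this sketch.

```python
from math import isclose, sqrt
from statistics import pvariance, variance

data = [3, 4, 4, 5, 5]  # hypothetical small sample of KS2-style levels
n = len(data)
x_bar = sum(data) / n

# Two equivalent forms of Sxx
s_xx = sum((x - x_bar) ** 2 for x in data)
s_xx_alt = sum(x * x for x in data) - n * x_bar ** 2

msd = s_xx / n            # mean square deviation (divisor n)
rmsd = sqrt(msd)          # root mean square deviation
var_s2 = s_xx / (n - 1)   # variance s^2 (divisor n - 1)
sd_s = sqrt(var_s2)       # standard deviation s

# The standard library uses the same two divisors:
# pvariance divides by n, variance divides by n - 1.
print(isclose(msd, pvariance(data)), isclose(var_s2, variance(data)))
```

The comparison with `pvariance` and `variance` is one way to see which divisor a given calculator or package is using.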
Provide examples of the following: rmsd and standard deviation for a large data set, where summary
statistics are given (perhaps do this in advance for one of the Mayfield variables); using this data set,
investigate the truth of the statement “standard deviation ≈ range ÷ 6”; rmsd and standard deviation for a
distribution and a grouped distribution (investigate use of calculator); combining sets of data whose
summary statistics are given (find combined rmsd and standard deviation).
Outliers are unusual or “freak” values which differ greatly in magnitude from the majority of the data
values. But just how large or small does a value have to be to be an outlier? There is no simple answer to
this question, but one rule derives from considering the fact that, in a symmetric population, about 95% of
the values will be within two standard deviations of the mean. Any values outside this range are outliers
and should be investigated as possibly not belonging to the data set.
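The two-standard-deviation rule can be sketched as follows. The function name and the sample values are assumptions for illustration, and the standard deviation used is s (divisor n – 1).

```python
from statistics import mean, stdev

def outliers_2sd(values):
    """Return the values lying more than two standard deviations from the mean."""
    m = mean(values)
    s = stdev(values)  # divisor n - 1
    return [x for x in values if abs(x - m) > 2 * s]

sample = [10, 12, 11, 13, 12, 11, 12, 30]  # hypothetical data
print(outliers_2sd(sample))
```

Note that an extreme value inflates both the mean and s, so in very small data sets this rule can fail to flag a value that looks unusual by eye.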
Calculate the mean and standard deviation of the data set 1, 2, 3, 4, 5: then repeat the calculation for 1001,
1002, 1003, 1004, 1005; 2, 4, 6, 8, 10; 12, 14, 16, 18, 20, etc. Generalise this: if a new data set is obtained
via a linear relationship of the form y = a + bx, the mean transforms in the same way (ȳ = a + bx̄), but the
standard deviation is multiplied by |b| and is not affected by the value of a.
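The effect of linear coding can be confirmed numerically before generalising. This is a minimal sketch; the helper `code` is a hypothetical name, not something from the textbook.

```python
from math import isclose
from statistics import mean, stdev

def code(values, a, b):
    """Apply the linear coding y = a + b*x to every value."""
    return [a + b * x for x in values]

x = [1, 2, 3, 4, 5]
y = code(x, 10, 2)        # gives 12, 14, 16, 18, 20
shifted = code(x, 1000, 1)  # gives 1001, ..., 1005

print(mean(x), stdev(x))
print(mean(y), stdev(y))
print(mean(shifted), stdev(shifted))
```

The shift a moves the mean but leaves the spread alone, while the multiplier b rescales both.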
Ex.1G (mixed)
Q5-11 [MEI]
Assessment 1
Ch.2 p.56-60
Ex.2A Q1 (orally),
rest as self-study
Ch.2 p.62-69
Ex.2B Q3,4,5 (no need
E.g. discrete data set above (Mayfield KS2 levels). What is the most appropriate way to display it?
Hopefully a vertical line chart (stick graph) will be suggested: it shows quite clearly that the levels can
only take integer values.
Categorical data (e.g. favourite subject) is better displayed using a bar chart. Take a look at the examples
on p.58/59 illustrating compound and multiple bar charts.
Pie charts may be used for many types of data and are used to show the size of constituent parts relative to
the whole: comparative pie charts (area proportional to total frequency) may be used to show a change in
the “whole” over time.
Histograms are used to display continuous data. Brainstorm the differences between histograms and bar
charts/frequency charts (p.64): no gaps between the bars; area proportional to frequency; vertical axis
histograms with equal class intervals
and unequal class intervals (D5)
2.3. Quartiles
Quartiles for large and small data
sets: interquartile range
to worry about means)
Ch.2 p.71-74
Ex.2C Q1 (some), Q3
Displaying and interpreting data on a
box and whisker plot (D7)
Outliers as data more than 1.5 × IQR
beyond quartiles (D16)
Displaying and interpreting a
cumulative frequency distribution
(D8)
Percentiles (D12)
Completion of above
S1/3 PROBABILITY
3.1. Introduction
Terminology: trial; event; classical
definition; experimental probability
(u1)
Ch.2 p.74-77
Ex.2C Q5
Ch.3 p.86-91
Ch.3 p.92-95
Ex.3A Q6 (at least)
General rule for P(A ∪ B) (u9)
Tree diagrams (u8)
Independent events: AND rule (u5,
labelled “frequency density” (see example on p.65). Discuss using histograms to model discrete data via a
“continuity correction”. Take care with class boundaries: what’s different about age?
Standard deviation is an appropriate measure of spread when the distribution is symmetrical. What if it is
not? Discuss how to find IQR: the method used to find the quartiles in the textbook is the one students
should have met, which is to find the medians of the two halves of the data set (ignoring the median if
there is an odd number of pieces of data). Investigate this using a small data set and a graphical calculator
and Excel: do they get it right? Also investigate the truth of the statement “standard deviation ≈ ¾ × IQR”.
This should be familiar: mention its obvious use in comparing two data sets, and the different shapes that
can be expected from different distributions (negative and positive skew, etc.).
If we have described a data set via its median and IQR, rather than via mean and s.d., we need to establish
a new definition of outlier: the rule described here was developed by a statistician called John Tukey and
should be familiar: outliers are more than 1.5 × IQR beyond the nearest quartile.
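The quartile method described above (medians of the two halves, ignoring the overall median when the number of values is odd) and the 1.5 × IQR rule can be sketched like this. Function names and the data are assumptions for illustration; calculators and Excel may use a different quartile convention, which is the point of the suggested investigation.

```python
from statistics import median

def quartiles(values):
    """Lower and upper quartiles as the medians of the two halves."""
    data = sorted(values)
    half = len(data) // 2
    lower = data[:half]
    # With an odd number of values the middle value is left out of both halves.
    upper = data[half + 1:] if len(data) % 2 else data[half:]
    return median(lower), median(upper)

def tukey_outliers(values):
    """Values more than 1.5 x IQR beyond the nearest quartile."""
    q1, q3 = quartiles(values)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [x for x in values if x < low or x > high]

sample = [1, 3, 4, 5, 6, 7, 8, 9, 30]  # hypothetical data
print(quartiles(sample), tukey_outliers(sample))
```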
Although the construction of a cumulative frequency curve should be familiar, it bears repeating in capital
letters that the points should be plotted at the UPPER CLASS BOUNDARIES. A box plot can be drawn
from a cumulative frequency curve, but the extreme values will be unknown, so the whiskers will end in
arrows drawn to the 10th and 90th percentiles.
Ex.2D (mixed)
Q4,6,7,10 [MEI]
Test 1 (chs 1 & 2)
The idea of the complementary event
A′ (u2)
Expected frequency of an event (u4)
3.2. Combined events
Use of sample space diagrams (u3)
Use of Venn diagrams: ∪, ∩ (u10)
Mutually exclusive events: OR rule
(u5,u6)
Ch.3 p.98-103
Ex.3B Q2,4,5,8;
It would be a good idea to run through the precise terminology: we perform an experiment or trial, which
has a finite number of outcomes: an event is a subset of the set of outcomes; the probability of an event is
the number of outcomes it contains (“favourable”) divided by the total number of outcomes. This is easy
to do with a fair dice, but what if the dice is not fair? The best we can do is do an experiment, where we
throw the dice a large number of times, and produce relative frequencies or experimental probabilities.
We can use the classical definition of probability to prove some results taken for granted: probabilities lie
between 0 and 1 (mention certainty and impossibility, and give examples of events) and P(A′) = 1 – P(A)
(draw two cards without replacement from an ordinary pack: find the probability that they are not both
kings).
How many sixes would you expect to throw if you threw a fair dice 1000 times? The formula is np.
Simple combined events: what is the probability of getting a total score of more than 7 when two dice are
thrown? Most efficient method: sample space diagram, and count outcomes.
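The sample space diagram for two dice can be mirrored by listing all 36 ordered outcomes and counting. A minimal sketch, with variable names chosen for illustration:

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely ordered outcomes for two fair dice
sample_space = list(product(range(1, 7), repeat=2))

# Outcomes whose total is more than 7
favourable = [pair for pair in sample_space if sum(pair) > 7]

p_more_than_7 = Fraction(len(favourable), len(sample_space))
print(len(favourable), len(sample_space), p_more_than_7)
```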
Venn diagrams provide an efficient way of dealing with probability problems: discuss the general idea
first, e.g. by considering the sets of listeners to Saga 105.7 and Radio WM. Some will listen to both, and
most young people to neither. Introduce the universal set and the notation ∪ and ∩, and the idea of
mutually exclusive events, e.g. universal set = “integers between 11 and 20”, A = “even numbers”, B =
“prime numbers”. What is P(A ∪ B)? This motivates the OR rule.
Now change the universal set to “integers between 1 and 10”. Are A and B still mutually exclusive? Does
the OR rule still apply? Investigate a replacement which takes the “double counting” into account.
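Both universal sets can be explored with Python sets. This sketch assumes "between" is inclusive (11 to 20 and 1 to 10), and the helper names are hypothetical.

```python
from fractions import Fraction

def is_prime(n):
    return n > 1 and all(n % d for d in range(2, int(n ** 0.5) + 1))

def or_rule_check(universe):
    """Return P(A or B), P(A) + P(B) and P(A and B) for A = evens, B = primes."""
    evens = {n for n in universe if n % 2 == 0}
    primes = {n for n in universe if is_prime(n)}
    size = len(universe)
    p_union = Fraction(len(evens | primes), size)
    p_sum = Fraction(len(evens), size) + Fraction(len(primes), size)
    p_both = Fraction(len(evens & primes), size)
    return p_union, p_sum, p_both

teens = or_rule_check(set(range(11, 21)))   # mutually exclusive case
units = or_rule_check(set(range(1, 11)))    # 2 is both even and prime
print(teens, units)
```

In the second case the simple sum overcounts by exactly P(A ∩ B), which is the correction the general rule makes.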
A bag contains 7 red marbles and 4 white marbles. Two marbles are selected with replacement. What is
the probability they are both the same colour? This motivates the AND rule. This can be combined with
u7)
Completion of above
3.3. Conditional probability
Idea of conditional probability:
calculation by formula. Examples
involving Venn diagrams, tree
diagrams and sample space diagrams
(u11)
Q13,14 [MEI]
“at least” situations, e.g. I throw a fair dice 20 times. What is the probability that I get at least one six?
These events are independent: we can probably have a stab at a rather unsatisfactory definition of
independence, but we’ll be able to do better shortly. What if the marbles were selected without
replacement?
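The marble and "at least one six" questions can be worked exactly with fractions. A minimal sketch; the variable names are chosen for illustration.

```python
from fractions import Fraction

red, white = 7, 4
total = red + white

# Both the same colour, selecting WITH replacement (independent draws)
same_with = Fraction(red, total) ** 2 + Fraction(white, total) ** 2

# Both the same colour, selecting WITHOUT replacement
same_without = (Fraction(red, total) * Fraction(red - 1, total - 1)
                + Fraction(white, total) * Fraction(white - 1, total - 1))

# At least one six in 20 throws: 1 - P(no sixes)
p_at_least_one_six = 1 - Fraction(5, 6) ** 20

print(same_with, same_without, float(p_at_least_one_six))
```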
Ch.3 p.107-113
Ex.3C Q1,3,7;
Q5,8,11,12,13 [MEI]
Throw a dice behind a screen: who thinks the score was odd? Now inform the class that the score is prime.
Who now thinks the score was odd? More, hopefully. Illustrate via a Venn diagram: events O (odd) and P
(prime). P(O) = ½, but once we know that P has occurred, we only have three outcomes left (2, 3 and 5),
of which two are odd. This is P(O|P), or “P of O given P”, and it is ⅔. These events are not independent:
two events A and B are independent if P(A|B) = P(A), and if P(B|A) = P(B).
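The dice-behind-a-screen example can be checked by counting outcomes. This is a sketch using the counting form P(A|B) = P(A ∩ B) ÷ P(B); the helper names are assumptions.

```python
from fractions import Fraction

outcomes = set(range(1, 7))   # a fair dice
odd = {1, 3, 5}
prime = {2, 3, 5}

def p(event):
    return Fraction(len(event & outcomes), len(outcomes))

def p_given(a, b):
    """P(a | b), found by restricting attention to the outcomes in b."""
    return p(a & b) / p(b)

p_odd = p(odd)
p_odd_given_prime = p_given(odd, prime)
independent = p_odd_given_prime == p_odd
print(p_odd, p_odd_given_prime, independent)
```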
Go back to the example of the marbles in the bag, and express the probabilities in terms of events such as
R1 (a red marble the first time), R2, W1 and W2. The probabilities on the second set of branches are such
things as P(R2|R1). The probability that both marbles drawn are red is P(R1 ∩ R2) = P(R1) × P(R2|R1),
multiplying along the branches. Writing this in terms of simpler events, P(A ∩ B) = P(A) × P(B|A), which
gives a formula for P(B|A) and a definition of independence: P(B|A) = P(B) ⇔ P(A ∩ B) = P(A) × P(B).
Do plenty of examples: this topic is difficult. The students should look for the cue words “if” or “given”.
Ch.4 p.118-124
Ex.4A Q2,3,5,8
Let’s start simple, with some board games. Game A: throw a dice and move that number of squares. “The
number of squares moved in a turn” is a variable because it can take different values, which depend on
chance. Hence “random variable”. Call it X. What is P(X = 3)? Tabulate the possible values of X and their
probabilities: this is the distribution of X. Display it using a vertical line graph, and describe the shape.
Game B: as above, except that a player is allowed a second throw of the dice if a six is thrown: tabulate
and draw as above. Game C: throw two dice and add the scores (Ex.4A Q2 covers differences).
Formalising: X is a discrete random variable if it takes values r1, r2,..., rn with probabilities p1, p2,..., pn. It
has to take at least one of these values, and they are mutually exclusive, so p1 + p2 + ... + pn = 1.
Other examples: using the bags as above, X = “no. of red marbles drawn”; giving the probability
distribution algebraically, e.g. P(X = r) = cr², r = 1, 2, 3. This is the probability function. Find c.
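Finding c uses only the condition that the probabilities sum to 1. A minimal sketch for P(X = r) = cr², r = 1, 2, 3:

```python
from fractions import Fraction

values = [1, 2, 3]

# c * (1^2 + 2^2 + 3^2) = 1, so c is the reciprocal of the sum of squares
c = Fraction(1, sum(r ** 2 for r in values))

distribution = {r: c * r ** 2 for r in values}
print(c, distribution)
```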
Ch.4 p.126-130
Ex.4B Q1 (orally?),
Q3,6,7 (at least)
Expectation: play roulette: the wheel is numbered 1 to 37. If the ball lands on 10, you win £10: if it lands
on a number ending in 5, you win £5; otherwise nothing. How much should you be charged for a turn?
Play 3700 times, work out the expected frequencies and hence the mean, and demonstrate independence
from the number of turns. Or extend Ex.4A Q5 using three (or four) fair coins, throw 2000 times, etc. All
this generalises into $E(X) = \sum r_i p_i$ or $E(X) = \sum r\,P(X = r)$. It need not be a possible value of X, and can
Definition of independence (u12)
Completion of above
S1/4 DISCRETE RANDOM
VARIABLES
4.1. Introduction
The idea of a d.r.v.: display of values
in a vertical line chart; notation and
conditions; probability from tables or
algebraic definition (R1,R2)
Completion of above
4.2. Expectation and variance
Simple cases of calculations of the
expectation E(X) and variance
Var(X): the meaning of E(X) (R3,
R4)
often be arrived at “by symmetry”.
How spread out is our probability distribution? By analogy with the work on data sets, we can introduce
the variance of X as the msd of X, i.e. the mean of $(x - \mu)^2$ where $\mu = E(X)$. As with data sets, this can be
proved to be equivalent to $\sum r_i^2 p_i - \mu^2$ or $E(X^2) - \mu^2$. Read the examples on p.128/129 for methods of
calculation, and provide plenty for the students to do, although former S2 examples are likely to be
beyond the standard required.
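As one worked case, the roulette game described earlier (wheel numbered 1 to 37, £10 for landing on 10, £5 for a number ending in 5) can be put through both formulae. This is a sketch; the function and variable names are assumptions.

```python
from fractions import Fraction

def winnings(number):
    """Prize in pounds for a given wheel number."""
    if number == 10:
        return 10
    if number % 10 == 5:
        return 5
    return 0

# Build the probability distribution of X = winnings on one turn
distribution = {}
for number in range(1, 38):
    prize = winnings(number)
    distribution[prize] = distribution.get(prize, 0) + Fraction(1, 37)

expectation = sum(r * p for r, p in distribution.items())      # E(X)
e_x_squared = sum(r * r * p for r, p in distribution.items())  # E(X^2)
variance_x = e_x_squared - expectation ** 2                    # E(X^2) - mu^2

print(distribution, expectation, variance_x)
```

The expectation, a little over 81p, is the fair charge per turn, and it is not itself a possible value of X.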
Completion of above
Completion of above
S1/5 FURTHER PROBABILITY
5.1. Successive events
n! as the number of ways of
arranging n distinct objects in a line
(H5)
5.2. Permutations and
combinations
nPr as the number of ways to order r
things selected from n
nCr as the number of ways to select r
things from n (H4)
Investigation of binomial coefficients
Calculating probabilities in less
simple cases
S1/6 BINOMIAL DISTRIBUTION
6.1. Introduction
Recognising binomial situations and
the parameter p; notation (H1,H2)
Deriving the probability function:
calculations of probabilities (H3)
“At least” situations
6.2. Using the distribution
Calculation of expected frequencies
(H7)
Ex.4C (mixed)
Q3,6,7-10 [MEI]
Assessment 2 (chs 3 &
4); Test 2 (chs 3 & 4)
Ch.5 p.138-140
Ex.5A (some orally)
Pick a few names: rearrange their letters. How many ways? Short names make this easy (Ke, Gao, Chan)
and it won’t take long to establish the well-known fact that n unlike objects can be arranged in n! different
ways. Evaluate a few factorials on calculators (how far can you go?) and do some cancelling, e.g. 6! ÷ 4!.
Ch.5 p.142-146
Ex.5B Q2,4,6 (orally)
Sports Day: your commentator has to read out the first three runners in the 100m. In how many ways can
this be done? Easy: 8 × 7 × 6 = 336. But can we write this using factorials? The expression is the start of
8!, and our work on cancelling above suggests 8! ÷ 5!. Generalise: the number of ways to order r things
selected from n is nPr = n! ÷ (n – r)!.
Sports Day: the 100m is about to start, when along comes GHC, who chooses three runners at random to
help him do something very important. Then order does not matter, so ABC = ACB = BAC etc. and the
number of permutations found above must be divided by 3!, which is the number of ways of ordering A,
B, C. Hence we find $^nC_r = \frac{n!}{(n - r)!\,r!}$.
Ex.5B Q9,11;
Q12 [MEI]
Ch.6 p.153-157
Ex.6A Q1,3,5,7,9
Ch.6 p.158-162
Ex.6B
Q8-13 [MEI]
There are plenty of possible examples, e.g. selecting people from a squad for a football team, committees.
Optional: see p.145/146.
Use the classical definition of probability to tackle problems such as these by counting outcomes:
A box of one dozen eggs contains one that is bad. If three eggs are chosen at random, what is the
probability that one of them is bad?
Five cards are dealt without replacement from a standard pack of 52 cards. Find the probability that
exactly 3 of the 5 cards are hearts.
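Both problems reduce to counting selections. A minimal sketch, reading "one of them is bad" as "the bad egg is among the three chosen":

```python
from fractions import Fraction
from math import comb

# Eggs: choose the 1 bad egg and 2 of the 11 good ones, out of all
# selections of 3 from 12.
p_bad_egg = Fraction(comb(1, 1) * comb(11, 2), comb(12, 3))

# Cards: choose 3 of the 13 hearts and 2 of the 39 non-hearts, out of all
# selections of 5 from 52.
p_three_hearts = Fraction(comb(13, 3) * comb(39, 2), comb(52, 5))

print(p_bad_egg, float(p_three_hearts))
```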
Rats run down a maze, looking suspiciously like Pascal’s Triangle. At each junction they can turn either
left or right with equal probability. Label the exits 0,1,2,3,4 so that the exit number counts the number of
right turns. This defines a discrete random variable: find the probabilities P(X = r), which are 1/16 × 4Cr
(choosing r right turns from the possible 4). Now genetically modify the rats so that they turn right with
probability ⅔, and find P(X = r). Generalise further: let ⅔ become p, and the maze become n deep.
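The rat maze probabilities can be tabulated for any p. This sketch assumes a maze 4 junctions deep, as in the text; `binomial_pmf` is a hypothetical helper name.

```python
from fractions import Fraction
from math import comb

def binomial_pmf(n, p):
    """List of P(X = r) for r = 0..n, where X counts right turns."""
    return [comb(n, r) * p ** r * (1 - p) ** (n - r) for r in range(n + 1)]

fair_rats = binomial_pmf(4, Fraction(1, 2))      # equal chance left or right
modified_rats = binomial_pmf(4, Fraction(2, 3))  # turn right with probability 2/3

print(fair_rats)
print(modified_rats)
```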
Define the binomial distribution X ~ B(n, p) and present some binomial situations, e.g. tossing four coins,
throwing four dice and counting sixes, picking hearts from a pack of cards with replacement, defective
light bulbs in a batch, “at least” situations, etc. (leave cumulative binomial tables until S1/7).
Take one of the situations above, e.g. throwing the four dice. Repeat the experiment 7776 times, and find
the expected frequencies by multiplying the probabilities by 7776.
Find the mean number of sixes per throw: the answer should be ⅔. Do this via a calculation for E(X).
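The expected frequencies for the four-dice experiment, and the mean number of sixes, can be produced in a few lines. A sketch assuming X ~ B(4, 1/6) and 7776 repetitions, with variable names chosen for illustration:

```python
from fractions import Fraction
from math import comb

n, p = 4, Fraction(1, 6)
repetitions = 7776

pmf = [comb(n, r) * p ** r * (1 - p) ** (n - r) for r in range(n + 1)]
expected_frequencies = [repetitions * prob for prob in pmf]

# E(X) as the sum of r * P(X = r)
mean_sixes = sum(r * prob for r, prob in enumerate(pmf))

print(expected_frequencies, mean_sixes)
```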
The expectation of B(n, p) (H6)
Completion of above
S1/7 HYPOTHESIS TESTING
7.1. Introduction
The process of hypothesis testing and
the associated vocabulary and
notation: null and alternative
hypotheses (H8,H9)
Significance levels (H10)
Drawing the correct conclusion
(H12)
Generalise (p.159 issues a challenge for a proof). What is the modal number of sixes? Why?
Further examples will be required. Variance is not in the specification at this stage.
Assessment 3
(chs 5 & 6)
Ch.7 p.167-175
Ex.7A (all);
Q2,7 [MEI]
We could begin with an experiment, e.g. “Mind reading” or “Smarties” (p.179), which can lead into the
correct structure of the test (H0: p = … and H1: p < … or p > …): the idea of testing the tail (test a coin by
throwing it 2000 times: 1000 heads are obtained: find the probability: “test the result obtained and all
other results at least as favourable to the alternative hypothesis”); use of cumulative binomial tables (and
use of e.g. P(X ≥ 11) = 1 – P(X ≤ 10)); significance level (statistical “proof”: what strength of evidence do
we require before we take the big step of rejecting H0? Why would a drug company use 0.01%? Why
should we set the significance level before we collect the data?); the idea of rejecting H0 if the probability
is “significantly small”; the correct language in the conclusion (“sufficient evidence that the coin is biased
at the 5% level”, “insufficient evidence…”).
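The "test the tail" idea can be sketched with a small one-tail example. The numbers here are hypothetical and chosen only for illustration: a coin thrown 20 times giving 15 heads, with H0: p = ½, H1: p > ½ and a 5% significance level.

```python
from fractions import Fraction
from math import comb

def upper_tail(n, p, observed):
    """P(X >= observed) for X ~ B(n, p): the observed result and all
    results at least as favourable to H1: p > p0."""
    return sum(comb(n, r) * p ** r * (1 - p) ** (n - r)
               for r in range(observed, n + 1))

def lower_cumulative(n, p, k):
    """P(X <= k), the quantity read from cumulative binomial tables."""
    return sum(comb(n, r) * p ** r * (1 - p) ** (n - r) for r in range(k + 1))

significance_level = Fraction(5, 100)   # fixed before collecting the data
p_value = upper_tail(20, Fraction(1, 2), 15)
reject_h0 = p_value <= significance_level

print(float(p_value), reject_h0)
```

With tables the same tail is found as P(X ≥ 15) = 1 – P(X ≤ 14), which is what `lower_cumulative` reproduces.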
Completion of above
7.2. Critical values and regions
Identifying the critical and
acceptance regions (H11)
Ch.7 p.177-180
Ex.7B Q1-4 [MEI]
7.3. Two-tail tests
When to apply a 2-tail test (H13)
Symmetrical and asymmetrical cases
Ch.7 p.182-185
Ex.7C (as many as
possible); Q11 [MEI]
Imagine you are setting up a hypothesis test for a non-statistician. You will not use the language and
terminology, but instead specify sets of results from which the conclusions “sufficient evidence…” and
“insufficient evidence…” can be drawn. The set of values for which H0 is rejected is the critical region,
and the complement is the acceptance region. Exam questions sometimes require these to be marked on a
number line. The Bob Francis spreadsheet “Hypothesis Tester” illustrates this very well.
Is a coin biased? Set the test up – but we do not know if the coin is biased in favour of heads or tails.
Therefore if p = P(head), our alternative hypothesis is H1: p ≠ ½, and this is a two-tail test. We can
establish the probability here by symmetry, but what about an asymmetric case, e.g. p ≠ 0.2? We test the
one tail we have against half the significance level. Cover finding critical regions in this case as well.
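Critical regions for a two-tail test can be found by comparing each tail with half the significance level. This is a sketch with assumed values (n = 20 and a 5% level); the function names are hypothetical.

```python
from fractions import Fraction
from math import comb

def pmf(n, p, r):
    return comb(n, r) * p ** r * (1 - p) ** (n - r)

def two_tail_critical_region(n, p, level):
    """Lower and upper parts of the critical region, each tail tested
    against half the significance level."""
    half = level / 2
    probs = [pmf(n, p, r) for r in range(n + 1)]
    lower = [k for k in range(n + 1) if sum(probs[:k + 1]) <= half]  # P(X <= k)
    upper = [k for k in range(n + 1) if sum(probs[k:]) <= half]      # P(X >= k)
    return lower, upper

level = Fraction(5, 100)
symmetric = two_tail_critical_region(20, Fraction(1, 2), level)
asymmetric = two_tail_critical_region(20, Fraction(1, 5), level)

print(symmetric)
print(asymmetric)
```

For p = ½ the two parts mirror each other, while for p = 0.2 the lower part shrinks to the single value 0 and the upper part starts at 9.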
Completion of above
Assessment 4