M.E.I. STRUCTURED MATHEMATICS UNIT S1
Unit S1: Scheme of Work
For first teaching in September 2004

Unit Title
Statistics 1 (4766) is an AS unit.

Objectives
To enable students to build on and extend the data handling and sampling techniques they have learned at GCSE.
To enable students to apply theoretical knowledge to practical situations using simple probability models.
To give students insight into the ideas and techniques underlying hypothesis testing.

Assessment
Examination (72 marks) 1 hour 30 minutes. The examination paper has two sections:
Section A: five questions, each worth at most 8 marks. Section Total: 36 marks.
Section B: two questions, each worth about 18 marks. Section Total: 36 marks.

Calculators
A graphical calculator is allowed in the examination for this module.

Assumed Knowledge
Candidates are expected to know the content of Intermediate Tier GCSE. In addition, they need to know the binomial expansion as covered in C1.

Textbook
“Statistics 1” by Eccles, Francis, Graham and Porkess.

Scheme of Work
Each of the blocks below is roughly one double period. The module should take about 34 double periods to cover, plus time spent on past papers for revision. Exercises from the textbook are suggestions only. These will usually be started in class and students expected to finish them at home. Lessons should start with reference to the last exercise. This scheme of work is a “working document” and comments and alterations are very welcome, especially if you have a new or exciting way to teach a topic.

Topic and learning objectives; References; Teaching points

S1/1 EXPLORING DATA
1.1. Introduction Motivation: the idea of a statistical investigation. Mention the idea of population and (random) sample (p5, p6) Types of data: categorical, discrete, continuous (D1) Idea of a distribution: classification using skewness (D9)
Ch.1 p.1-6, 12-13
1.2.
Stem-and-leaf diagrams Sorted stem-and-leaf diagrams: how to “expand”, keys (D6) Ch.1 p.6-8 Ex.1A Q1,2,5 (orally), Q7,9 1.3. Measures of central tendency Mean, median and mode; midrange (new); their relative usefulness (D10, D11) Ch.1 p.13-16 Ex.1B (self-study) 1.4. Frequency distributions Understanding and constructing frequency tables for ungrouped data (D2): calculating measures of central location for such data (D10) Grouped data: choice of class boundaries for discrete and continuous data: estimating the mean (D2,D10) Ch.1 p.17-19 Ex.1C (self-study) What is statistics? A possible definition is “collection, organisation and analysis of information”. Discuss some of the processes behind statistics: we may collect data to settle a question, given in the form of a hypothesis. We could try to collect data from every element of a population (a census) but we are much more likely to take a sample: discuss the process of taking a random sample. Use the Internet and find a large data file, e.g. Mayfield High School, and discuss the variables. Ask the students to use the file to provide examples of categorical data (e.g. favourite TV programme), continuous data (e.g. height) and discrete data (e.g. KS2 levels). The data file contains a great deal of raw data, e.g. the KS2 levels in Mathematics. How might we analyse this data? Organise it by tallying, and then draw a vertical line chart: this shows the distribution of the data. Discuss the possible shapes of distributions: symmetrical, unimodal, negative skew (mostly high values), positive skew (mostly low), bimodal, uniform. Take another data set, e.g. % examination scores (Autograph can be persuaded to generate something, or there are files in the Maths area from which the names can be removed). How could we display this without losing the information? Hopefully someone will suggest a stem-and-leaf diagram: the MEI definition requires a “scale” or key, and that the leaves are sorted. 
If there are too many values to each “stem”, the diagram can be “stretched” (see p.8).
Measures of central tendency should not need much time: review mean (mention mean age: age is always rounded down, so add 0.5), median and mode, and introduce the idea of midrange. Discuss the relative merits of each, and the relative positions in skewed distributions: mean is good for symmetric distributions but will be distorted if the data set contains very large or very small values, e.g. Cookson Enterprises has many workers earning £10,000 per year, and the Managing Director who earns £1m: calculate mean, median, mode and midrange. This example gives a positively skewed distribution. Perhaps use the Mayfield KS2 levels to construct some discrete distributions (or e.g. survey no. of brothers and sisters) and calculate means, medians (perhaps via a column of cumulative frequencies, which Excel could put in), modes and midranges: discuss their usefulness.
1.5. Measures of spread Range: disadvantages (D12) Calculating and interpreting mean absolute deviation, mean square deviation and root mean square deviation (D13) Ch.1 p.31-36 Ch.1 p.22-29 Ex.1D Q2,6
Discuss need for grouping (see p.23): take a fairly large discrete data set, e.g. the examination marks mentioned above, and discuss choice of class boundaries (relate to stem-and-leaf diagram above, if drawn): estimate the mean using mid-interval values, and discuss estimating the median via linear interpolation. Now move on to continuous data: we must be careful about class boundaries. Why not use the example on p.27? The range will need little introduction, e.g. Cookson Enterprises salaries above have a range of £990,000. Candidates will be familiar with standard deviation from previous work but now need to learn some new names. Take a fairly small data set, e.g. a sample of the Mayfield KS2 levels. We aim at something that shows how far on average data points are from the middle, i.e. the mean.
Mean deviation is always zero: this motivates MAD, but MAD involves the modulus function, which is not easy to work with. How else can we make the deviations positive? Square them: this motivates msd and rmsd.
Calculating and interpreting variance and standard deviation: notation s² and s, divisor n – 1 (D13) Statistical functions on a calculator (D14) Ch.1 p.36-39 Ex.1E Q1,4,7,10 Completion of above, including combining sets of data Ex.1E Q2,3,6,11,13 Identifying outliers using 2s (D16) Ch.1 p.40-41 Ex.1E Q8,9 1.6. Linear coding Effect of a linear transformation of the form y = a + bx on mean and standard deviation (D14) Completion of above Ch.1 p.45-48 Ex.1F Q2; Q6 [MEI] S1/2 DATA PRESENTATION 2.1. Charts Display of discrete data using a vertical line chart (D4) Pie charts and bar charts for categorical data (D3) Comparative pie charts 2.2. Histograms Displaying continuous data using
The new MEI notation involves Sxx, which is Σ(x – x̄)²: it is not difficult to show that this is equivalent to Σx² – nx̄². Then msd = Sxx ÷ n (the mean of the squares minus the mean squared) and rmsd is the square root. Yes, I know we think this is standard deviation, but from now on, it isn’t, because standard deviation has a new (and strictly more correct) definition at A level. We should not really divide by n, because (as we discovered when discussing mean deviation) Σ(x – x̄) = 0 and so once we have found n – 1 of the deviations, the nth one follows on because of this fact. So we should divide by n – 1. Using the MEI notation, then, variance s² = Sxx ÷ (n – 1) and standard deviation s = √(Sxx ÷ (n – 1)). Calculate these for the small data set above, and then ensure that the class can use their own calculators to produce the right answers.
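For checking calculator answers, the measures of spread above can be sketched in a few lines of Python. This is only an illustration: the function name spread_measures and the sample of KS2 levels are made up for the purpose, and the code assumes Sxx means the sum of squared deviations from the mean.

```python
import math


def spread_measures(data):
    """Return Sxx, msd, rmsd, variance and standard deviation for a data set.

    Assumes Sxx is the sum of squared deviations from the mean,
    msd divides it by n, and the variance divides it by n - 1.
    """
    n = len(data)
    mean = sum(data) / n
    # Definition: sum of squared deviations from the mean.
    sxx = sum((x - mean) ** 2 for x in data)
    # Equivalent form: sum of squares minus n times the mean squared.
    sxx_alt = sum(x * x for x in data) - n * mean ** 2
    msd = sxx / n
    variance = sxx / (n - 1)
    return {
        "mean": mean,
        "Sxx": sxx,
        "Sxx_alt": sxx_alt,
        "msd": msd,
        "rmsd": math.sqrt(msd),
        "variance": variance,
        "sd": math.sqrt(variance),
    }


# Hypothetical small sample of KS2 levels, invented for illustration.
levels = [3, 4, 4, 5, 5, 5, 6]
for name, value in spread_measures(levels).items():
    print(f"{name}: {value:.4f}")
```

The two versions of Sxx should agree up to rounding, and rmsd will always come out a little smaller than the standard deviation because of the different divisors.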
Provide examples of the following: rmsd and standard deviation for a large data set, where summary statistics are given (perhaps do this in advance for one of the Mayfield variables); using this data set, investigate the truth of the statement “standard deviation ≈ range ÷ 6”; rmsd and standard deviation for a distribution and a grouped distribution (investigate use of calculator); combining sets of data whose summary statistics are given (find combined rmsd and standard deviation). Outliers are unusual or “freak” values which differ greatly in magnitude from the majority of the data values. But just how large or small does a value have to be to be an outlier? There is no simple answer to this question, but one rule derives from considering the fact that, in a symmetric population, about 95% of the values will be within two standard deviations of the mean. Any values outside this range are outliers and should be investigated as possibly not belonging to the data set. Calculate the mean and standard deviation of the data set 1, 2, 3, 4, 5: then repeat the calculation for 1001, 1002, 1003, 1004, 1005; 2, 4, 6, 8, 10; 12, 14, 16, 18, 20, etc. Generalise this: if a new data set is obtained via a linear relationship of the form y = a + bx, the mean scales in the same way, but the standard deviation is not affected by the value of a. Ex.1G (mixed) Q5-11 [MEI] Assessment 1 Ch.2 p.56-60 Ex.2A Q1 (orally), rest as self-study Ch.2 p.62-69 Ex.2B Q3,4,5 (no need E.g. discrete data set above (Mayfield KS2 levels). What is the most appropriate way to display it? Hopefully a vertical line chart (stick graph) will be suggested: it shows quite clearly that the levels can only take integer values. Categorical data (e.g. favourite subject) is better displayed using a bar chart. Take a look at the examples on p.58/59 illustrating compound and multiple bar charts. 
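The linear coding investigation described earlier (1, 2, 3, 4, 5 and the data sets derived from it) can be checked quickly by machine. The sketch below uses Python's statistics module, whose stdev function uses the n – 1 divisor; the helper name coded_summary is an invention for this example.

```python
import statistics


def coded_summary(data, a, b):
    """Mean and standard deviation (divisor n - 1) of y = a + b*x."""
    coded = [a + b * x for x in data]
    return statistics.mean(coded), statistics.stdev(coded)


base = [1, 2, 3, 4, 5]
base_mean = statistics.mean(base)
base_sd = statistics.stdev(base)

# (a, b) pairs giving 1001..1005, then 2, 4, ..., 10, then 12, 14, ..., 20.
for a, b in [(1000, 1), (0, 2), (10, 2)]:
    mean_y, sd_y = coded_summary(base, a, b)
    print(f"y = {a} + {b}x: mean {mean_y}, sd {sd_y:.4f}")
    print(f"  predicted: mean {a + b * base_mean}, sd {abs(b) * base_sd:.4f}")
```

The predicted lines use mean(y) = a + b × mean(x) and sd(y) = |b| × sd(x), which is the generalisation the class is being led towards.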
Pie charts may be used for many types of data and are used to show the size of constituent parts relative to the whole: comparative pie charts (area proportional to total frequency) may be used to show a change in the “whole” over time. Histograms are used to display continuous data. Brainstorm the differences between histograms and bar charts/frequency charts (p.64): no gaps between the bars; area proportional to frequency; vertical axis labelled “frequency density” (see example on p.65). Discuss using histograms to model discrete data via a “continuity correction”. Take care with class boundaries: what’s different about age?
histograms with equal class intervals and unequal class intervals (D5) 2.3. Quartiles Quartiles for large and small data sets: interquartile range to worry about means) Ch.2 p.71-74 Ex.2C Q1 (some), Q3 Displaying and interpreting data on a box and whisker plot (D7) Outliers as data more than 1.5 × IQR beyond quartiles (D16) Displaying and interpreting a cumulative frequency distribution (D8) Percentiles (D12) Completion of above S1/3 PROBABILITY 3.1. Introduction Terminology: trial; event; classical definition; experimental probability (u1) Ch.2 p.74-77 Ex.2C Q5 Ch.3 p.86-91 Ch.3 p.92-95 Ex.3A Q6 (at least) General rule for P(A ∪ B) (u9) Tree diagrams (u8) Independent events: AND rule (u5,
Standard deviation is an appropriate measure of spread when the distribution is symmetrical. What if it is not? Discuss how to find IQR: the method used to find the quartiles in the textbook is the one students should have met, which is to find the medians of the two halves of the data set (ignoring the median if there is an odd number of pieces of data). Investigate this using a small data set and a graphical calculator and Excel: do they get it right? Also investigate the truth of the statement “standard deviation ≈ ¾ × IQR”.
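The “medians of the two halves” method can be written down precisely, which helps when comparing it with what a calculator or spreadsheet produces. The following is a minimal sketch, assuming the data set has at least two values; the function name quartiles_by_halves and the marks are hypothetical.

```python
import statistics


def quartiles_by_halves(data):
    """Quartiles as the medians of the lower and upper halves of the data.

    For an odd number of values the overall median is left out of both
    halves, as in the method described above.
    """
    values = sorted(data)
    n = len(values)
    half = n // 2
    lower = values[:half]
    upper = values[half + 1:] if n % 2 else values[half:]
    q1 = statistics.median(lower)
    q2 = statistics.median(values)
    q3 = statistics.median(upper)
    return q1, q2, q3


# Hypothetical examination marks, invented for illustration.
marks = [34, 41, 45, 52, 58, 63, 67, 71, 88]
q1, q2, q3 = quartiles_by_halves(marks)
print("Q1, median, Q3:", q1, q2, q3)
print("IQR:", q3 - q1)
print("0.75 x IQR:", 0.75 * (q3 - q1))
print("standard deviation:", statistics.stdev(marks))
```

Software packages may use other conventions for quartiles, so small disagreements on small data sets are to be expected and are worth discussing.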
The box and whisker plot should be familiar: mention its obvious use in comparing two data sets, and the different shapes that can be expected from different distributions (negative and positive skew, etc.). If we have described a data set via its median and IQR, rather than via mean and s.d., we need to establish a new definition of outlier: the rule described here was developed by a statistician called John Tukey and should be familiar: outliers are more than 1.5 × IQR beyond the nearest quartile. Although the construction of a cumulative frequency curve should be familiar, it bears repeating in capital letters that the points should be plotted at the UPPER CLASS BOUNDARIES. A box plot can be drawn from a cumulative frequency curve, but the extreme values will be unknown, so the whiskers will end in arrows drawn to the 10th and 90th percentiles.
Ex.2D (mixed) Q4,6,7,10 [MEI] Test 1 (chs 1 & 2) The idea of the complementary event A′ (u2) Expected frequency of an event (u4) 3.2. Combined events Use of sample space diagrams (u3) Use of Venn diagrams: ∪, ∩ (u10) Mutually exclusive events: OR rule (u5,u6) Ch.3 p.98-103 Ex.3B Q2,4,5,8;
It would be a good idea to run through the precise terminology: we perform an experiment or trial, which has a finite number of outcomes: an event is a subset of the set of outcomes; the probability of an event is the number of outcomes it contains (“favourable”) divided by the total number of outcomes. This is easy to do with a fair dice, but what if the dice is not fair? The best we can do is to carry out an experiment, where we throw the dice a large number of times, and produce relative frequencies or experimental probabilities. We can use the classical definition of probability to prove some results taken for granted: probabilities lie between 0 and 1 (mention certainty and impossibility, and give examples of events) and P(A′) = 1 – P(A) (draw two cards without replacement from an ordinary pack: find the probability that they are not both kings).
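The two-kings example is small enough to count outright, which makes the classical definition and the complement rule concrete. This is a sketch only: the way the pack is represented here (rank and suit pairs) is an assumption made for the example.

```python
from fractions import Fraction
from itertools import combinations

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["hearts", "diamonds", "clubs", "spades"]
deck = [(rank, suit) for rank in ranks for suit in suits]

# Every unordered pair of distinct cards is an equally likely outcome.
hands = list(combinations(deck, 2))
both_kings = sum(1 for first, second in hands
                 if first[0] == "K" and second[0] == "K")

p_both_kings = Fraction(both_kings, len(hands))
p_not_both_kings = 1 - p_both_kings  # complement rule P(A') = 1 - P(A)

print("outcomes:", len(hands))
print("P(both kings) =", p_both_kings)
print("P(not both kings) =", p_not_both_kings)
```

Counting the “not both kings” hands directly would give the same answer, but the complement is far less work, which is the point of the rule.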
How many sixes would you expect to throw if you threw a fair dice 1000 times? The formula is np. Simple combined events: what is the probability of getting a total score of more than 7 when two dice are thrown? Most efficient method: sample space diagram, and count outcomes. Venn diagrams provide an efficient way of dealing with probability problems: discuss the general idea first, e.g. by considering the sets of listeners to Saga 105.7 and Radio WM. Some will listen to both, and most young people to neither. Introduce the universal set and the notation ∪ and ∩, and the idea of mutually exclusive events, e.g. universal set = “integers between 11 and 20”, A = “even numbers”, B = “prime numbers”. What is P(A ∪ B)? This motivates the OR rule. Now change the universal set to “integers between 1 and 10”. Are A and B still mutually exclusive? Does the OR rule still apply? Investigate a replacement which takes the “double counting” into account. A bag contains 7 red marbles and 4 white marbles. Two marbles are selected with replacement. What is the probability they are both the same colour? This motivates the AND rule. This can be combined with “at least” situations, e.g. I throw a fair dice 20 times. What is the probability that I get at least one six? These events are independent: we can probably have a stab at a rather unsatisfactory definition of independence, but we’ll be able to do better shortly. What if the marbles were selected without replacement?
u7) Completion of above 3.3. Conditional probability Idea of conditional probability: calculation by formula. Examples involving Venn diagrams, tree diagrams and sample space diagrams (u11) Q13,14 [MEI] Ch.3 p.107-113 Ex.3C Q1,3,7; Q5,8,11,12,13 [MEI]
Throw a dice behind a screen: who thinks the score was odd? Now inform the class that the score is prime. Who now thinks the score was odd? More, hopefully. Illustrate via a Venn diagram: events O (odd) and P (prime).
P(O) = ½, but once we know that P has occurred, we only have three outcomes left (2, 3 and 5), of which two are odd. This is P(O|P), or “P of O given P”, and it is ⅔. These events are not independent: two events A and B are independent if P(A|B) = P(A), and if P(B|A) = P(B). Go back to the example of the marbles in the bag, and express the probabilities in terms of events such as R1 (a red marble the first time), R2, W1 and W2. The probabilities on the second set of branches are such things as P(R2|R1). The probability that both marbles drawn are red is P(R1 ∩ R2) = P(R1) × P(R2|R1), multiplying along the branches. Writing this in terms of simpler events, P(A ∩ B) = P(A) × P(B|A), which gives a formula for P(B|A), namely P(B|A) = P(A ∩ B) ÷ P(A), and a definition of independence: P(B|A) = P(B) ⇔ P(A ∩ B) = P(A) × P(B). Do plenty of examples: this topic is difficult. The students should look for the cue words “if” or “given”.
Ch.4 p.118-124 Ex.4A Q2,3,5,8
Let’s start simple, with some board games. Game A: throw a dice and move that number of squares. “The number of squares moved in a turn” is a variable because it can take different values, which depend on chance. Hence “random variable”. Call it X. What is P(X = 3)? Tabulate the possible values of X and their probabilities: this is the distribution of X. Display it using a vertical line graph, and describe the shape. Game B: as above, except that a player is allowed a second throw of the dice if a six is thrown: tabulate and draw as above. Game C: throw two dice and add the scores (Ex.4A Q2 covers differences). Formalising: X is a discrete random variable if it takes values r1, r2,..., rn with probabilities p1, p2,..., pn. It has to take at least one of these values, and they are mutually exclusive, so p1 + p2 + ... + pn = 1. Other examples: using the bags as above, X = “no. of red marbles drawn”; giving the probability distribution algebraically, e.g. P(X = r) = cr², r = 1, 2, 3. This is the probability function. Find c.
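Finding c comes down to making the probabilities add up to 1. The sketch below reads the probability function as P(X = r) = cr² for r = 1, 2, 3 (c times r squared), which is an interpretation of the notation above, and uses exact fractions so that the answer is not blurred by rounding.

```python
from fractions import Fraction

values = [1, 2, 3]

# Each probability is c times r squared, so the r-squared terms act as weights.
weights = {r: r ** 2 for r in values}

# The probabilities must sum to 1, so c is 1 over the total weight.
c = Fraction(1, sum(weights.values()))

distribution = {r: c * w for r, w in weights.items()}

print("c =", c)
for r, p in distribution.items():
    print(f"P(X = {r}) = {p}")
print("total =", sum(distribution.values()))
```

Here the weights are 1, 4 and 9, so c = 1/14; the same approach works for any probability function given up to an unknown constant.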
Ch.4 p.126-130 Ex.4B Q1 (orally?), Q3,6,7 (at least)
Expectation: play roulette: the wheel is numbered 1 to 37. If the ball lands on 10, you win £10; if it lands on a number ending in 5, you win £5; otherwise nothing. How much should you be charged for a turn? Play 3700 times, work out the expected frequencies and hence the mean, and demonstrate independence from the number of turns. Or extend Ex.4A Q5 using three (or four) fair coins, throw 2000 times, etc. All this generalises into E(X) = Σ rᵢpᵢ or E(X) = Σ r P(X = r). It need not be a possible value of X, and can often be arrived at “by symmetry”. How spread out is our probability distribution? By analogy with the work on data sets, we can introduce the variance of X as the msd of X, i.e. the mean of (x – μ)² where μ = E(X). As with data sets, this can be proved to be equivalent to Σ rᵢ²pᵢ – μ² or E(X²) – μ². Read the examples on p.128/129 for methods of calculation, and provide plenty for the students to do, although former S2 examples are likely to be beyond the standard required.
Definition of independence (u12) Completion of above S1/4 DISCRETE RANDOM VARIABLES 4.1. Introduction The idea of a d.r.v.: display of values in a vertical line chart; notation and conditions; probability from tables or algebraic definition (R1,R2) Completion of above 4.2. Expectation and variance Simple cases of calculations of the expectation E(X) and variance Var(X): the meaning of E(X) (R3, R4) Completion of above Completion of above S1/5 FURTHER PROBABILITY 5.1. Successive events n! as the number of ways of arranging n distinct objects in a line (H5) 5.2. Permutations and combinations nPr as the number of ways to order r things selected from n nCr as the number of ways to select r things from n (H4) Investigation of binomial coefficients Calculating probabilities in less simple cases S1/6 BINOMIAL DISTRIBUTION 6.1.
Introduction Recognising binomial situations and the parameter p; notation (H1,H2) Deriving the probability function: calculations of probabilities (H3) “At least” situations 6.2. Using the distribution Calculation of expected frequencies (H7) Ex.4C (mixed) Q3,6,7-10 [MEI] Assessment 2 (chs 3 & 4); Test 2 (chs 3 & 4)
Ch.5 p.138-140 Ex.5A (some orally)
Pick a few names: rearrange their letters. How many ways? Chinese people have it easy (Ke, Gao, Chan) and it won’t take long to establish the well-known fact that n unlike objects can be arranged in n! different ways. Evaluate a few factorials on calculators (how far can you go?) and do some cancelling, e.g. 6! ÷ 4!.
Ch.5 p.142-146 Ex.5B Q2,4,6 (orally)
Sports Day: your commentator has to read out the first three runners in the 100m. In how many ways can this be done? Easy: 8 × 7 × 6 = 336. But can we write this using factorials? The expression is the start of 8!, and our work on cancelling above suggests 8! ÷ 5!. Generalise: the number of ways to order r things selected from n is nPr = n! ÷ (n – r)!. Sports Day: the 100m is about to start, when along comes GHC, who chooses three runners at random to help him do something very important. Then order does not matter, so ABC = ACB = BAC etc. and the number of permutations found above must be divided by 3!, which is the number of ways of ordering A, B, C. Hence we find nCr = n! ÷ ((n – r)! r!).
Ex.5B Q9,11; Q12 [MEI] Ch.6 p.153-157 Ex.6A Q1,3,5,7,9 Ch.6 p.158-162 Ex.6B Q8-13 [MEI]
There are plenty of possible examples, e.g. selecting people from a squad for a football team, committees. Optional: see p.145/146. Use the classical definition of probability to tackle problems such as these by counting outcomes: A box of one dozen eggs contains one that is bad. If three eggs are chosen at random, what is the probability that one of them is bad? Five cards are dealt without replacement from a standard pack of 52 cards. Find the probability that exactly 3 of the 5 cards are hearts.
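The counting arguments above can be checked with Python's math.perm and math.comb (available from Python 3.8). This is a sketch of one way to set the two problems up by counting favourable and total selections; the variable names are chosen for this example.

```python
from fractions import Fraction
from math import comb, factorial, perm

# Sports Day: ordered and unordered choices of 3 runners from 8.
print("8P3 =", perm(8, 3), "=", factorial(8) // factorial(5))
print("8C3 =", comb(8, 3), "=", perm(8, 3) // factorial(3))

# Eggs: choose 3 from 12, where exactly 1 of the 12 is bad.
# Favourable selections take the bad egg and 2 of the 11 good ones.
p_bad_egg = Fraction(comb(1, 1) * comb(11, 2), comb(12, 3))
print("P(bad egg among the three) =", p_bad_egg)

# Cards: 5 from 52 with exactly 3 hearts.
# Favourable hands take 3 of the 13 hearts and 2 of the 39 other cards.
p_three_hearts = Fraction(comb(13, 3) * comb(39, 2), comb(52, 5))
print("P(exactly 3 hearts) =", p_three_hearts, "or about",
      round(float(p_three_hearts), 4))
```

The egg answer can also be seen by symmetry: each egg is equally likely to be among the three chosen, so the bad one is picked with probability 3 ÷ 12.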
Rats run down a maze that looks suspiciously like Pascal’s Triangle. At each junction they can turn either left or right with equal probability. Label the exits 0,1,2,3,4 so that the exit number counts the number of right turns. This defines a discrete random variable: find the probabilities P(X = r), which are 1/16 × 4Cr (choosing r right turns from the possible 4). Now genetically modify the rats so that they turn right with probability ⅔, and find P(X = r). Generalise further: let ⅔ become p, and the maze become n deep. Define the binomial distribution X ~ B(n, p) and present some binomial situations, e.g. tossing four coins, throwing four dice and counting sixes, picking hearts from a pack of cards with replacement, defective light bulbs in a batch, “at least” situations, etc. (leave cumulative binomial tables until S1/7). Take one of the situations above, e.g. throwing the four dice. Repeat the experiment 7776 times, and find the expected frequencies by multiplying the probabilities by 7776. Find the mean number of sixes per throw: the answer should be ⅔. Do this via a calculation for E(X). Generalise (p.159 issues a challenge for a proof). What is the modal number of sixes? Why? Further examples will be required. Variance is not in the specification at this stage.
The expectation of B(n, p) (H6) Completion of above S1/7 HYPOTHESIS TESTING 7.1. Introduction The process of hypothesis testing and the associated vocabulary and notation: null and alternative hypotheses (H8,H9) Significance levels (H10) Drawing the correct conclusion (H12) Assessment 3 (chs 5 & 6) Ch.7 p.167-175 Ex.7A (all); Q2,7 [MEI]
We could begin with an experiment, e.g.
“Mind reading” or “Smarties” (p.179), which can lead into the correct structure of the test (H0: p = … and H1: p < … or p > …): the idea of testing the tail (test a coin by throwing it 2000 times: 1000 heads are obtained: find the probability: “test the result obtained and all other results at least as favourable to the alternative hypothesis”); use of cumulative binomial tables (and use of e.g. P(X ≥ 11) = 1 – P(X ≤ 10)); significance level (statistical “proof”: what strength of evidence do we require before we take the big step of rejecting H0? Why would a drug company use 0.01%? Why should we set the significance level before we collect the data?); the idea of rejecting H0 if the probability is “significantly small”; the correct language in the conclusion (“sufficient evidence that the coin is biased at the 5% level”, “insufficient evidence…”).
Completion of above 7.2. Critical values and regions Identifying the critical and acceptance regions (H11) Ch.7 p.177-180 Ex.7B Q1-4 [MEI] 7.3. Two-tail tests When to apply a 2-tail test (H13) Symmetrical and asymmetrical cases Ch.7 p.182-185 Ex.7C (as many as possible); Q11 [MEI]
Imagine you are setting up a hypothesis test for a non-statistician. You will not use the language and terminology, but instead specify sets of results from which the conclusions “sufficient evidence…” and “insufficient evidence…” can be drawn. The set of values for which H0 is rejected is the critical region, and the complement is the acceptance region. Exam questions sometimes require these to be marked on a number line. The Bob Francis spreadsheet “Hypothesis Tester” illustrates this very well. Is a coin biased? Set the test up – but we do not know if the coin is biased in favour of heads or tails. Therefore if p = P(head), our alternative hypothesis is H1: p ≠ ½, and this is a two-tail test. We can establish the probability here by symmetry, but what about an asymmetric case, e.g. p ≠ 0.2?
We test the one tail we have against half the significance level. Cover finding critical regions in this case as well.
Completion of above Assessment 4
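Critical regions for a two-tail binomial test can be found by computing each tail directly rather than reading cumulative tables. The sketch below is an illustration under stated assumptions: the function names are invented here, the sample size of 20 is a hypothetical choice, and each tail is compared with half the significance level as described above.

```python
from math import comb


def binomial_pmf(n, p, r):
    """P(X = r) for X ~ B(n, p)."""
    return comb(n, r) * p ** r * (1 - p) ** (n - r)


def lower_tail(n, p, k):
    """P(X <= k)."""
    return sum(binomial_pmf(n, p, r) for r in range(0, k + 1))


def upper_tail(n, p, k):
    """P(X >= k)."""
    return sum(binomial_pmf(n, p, r) for r in range(k, n + 1))


def two_tail_critical_values(n, p, level):
    """Largest k with P(X <= k) <= level/2 and smallest k with P(X >= k) <= level/2.

    Either value is None if that tail has no critical region.
    """
    half = level / 2
    lower = [k for k in range(n + 1) if lower_tail(n, p, k) <= half]
    upper = [k for k in range(n + 1) if upper_tail(n, p, k) <= half]
    return (max(lower) if lower else None, min(upper) if upper else None)


# Hypothetical tests with n = 20 at the 5% level.
for p0 in (0.5, 0.2):
    low, high = two_tail_critical_values(20, p0, 0.05)
    print(f"H0: p = {p0}: reject H0 if X <= {low} or X >= {high}")
```

With p = 0.5 the two critical values are symmetrical about 10, while with p = 0.2 they are not, which is the asymmetric case the class needs to see.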