S1: Chapters 2-3
Data: Location and Spread
Dr J Frost
Last modified: 5th September 2014

Types of variables

In statistics, we can use a variable $x$ to represent some quantity, e.g. height, age. This could be qualitative (e.g. favourite colour) or quantitative (i.e. numerical).

Variables are often used differently in statistics than they are in algebra. Take $\Sigma x$: in statistics, this would mean "Sum over the values of the variable we've collected (i.e. our data)."

2 types of variable:
Discrete variables: Have specific values. e.g. shoe size, colour, website visits in an hour period, number of siblings, ...
Continuous variables: Can have any value in a range. e.g. height, distance, weight, time, wavelength, ...

Quartiles for large numbers of items

What item do we use for each quartile when $n = \dots$?

Rule: Find $\frac{1}{4}$, $\frac{1}{2}$ or $\frac{3}{4}$ of $n$. Then:
• If not whole, round up.
• If whole, use this item and the one after.

n     LQ     Median         UQ
31    8th    16th           24th
19    5th    10th           15th
6     2nd    3rd and 4th    5th
14    4th    7th and 8th    11th

Under what circumstances do we not round? When we have a grouped frequency table involving a continuous variable.

Quickfire Quartiles

Data                LQ     Median   UQ
1, 2, 3             1      2        3
1, 2, 3, 4          1.5    2.5      3.5
1, 2, 3, 4, 5       2      3        4
1, 2, 3, 4, 5, 6    2      3.5      5

(For 1, 2, 3, 4, 5 the rule above gives $\frac{5}{4} = 1.25$, which rounds up to the 2nd item, and $\frac{3 \times 5}{4} = 3.75$, which rounds up to the 4th item.)

Notation for quartiles/percentiles

Lower Quartile: $Q_1$
Median: $Q_2$
Upper Quartile: $Q_3$
57th Percentile: $P_{57}$

Grouped Frequency Data Recap

This type of data is continuous.

Height h of bear (in metres)    Frequency
0 ≤ h < 0.5                     4
0.5 ≤ h < 1.2                   20
1.2 ≤ h < 1.5                   5
1.5 ≤ h < 2.5                   11

Estimate of mean: $\bar{x} = \frac{\Sigma fx}{\Sigma f} = \frac{46.75}{40} = 1.17$ m

What does the variable $x$ represent? The midpoints of each interval. They're effectively a sensible single value used to represent each interval.

Why the "bar" (horizontal line) over the $x$? It's the sample mean of $x$. It indicates that our mean is just based on a sample, rather than the whole population.

Why is our mean just an estimate?
Because we don't know the exact heights within each group. Grouping data loses information.

Grouped Frequency Data Recap

Height h of bear (in metres)    Frequency
0 ≤ h < 0.5                     4
0.5 ≤ h < 1.2                   20
1.2 ≤ h < 1.5                   5
1.5 ≤ h < 2.5                   11

Modal class interval: 0.5 ≤ h < 1.2 ("modal" means "most")
Median class interval: There are 40 items, so determine where the 20th item is: 0.5 ≤ h < 1.2

Using STATS mode on your calculator

Height h of bear (in metres)    Frequency
0 ≤ h < 0.5                     4
0.5 ≤ h < 1.2                   20
1.2 ≤ h < 1.5                   5
1.5 ≤ h < 2.5                   11

Work out the mean for this example first using proper workings.

1. Go to SETUP (SHIFT → MODE). Press down for the second page of the menu, and select STAT. You want Frequency "ON". (Note that you won't have to do this again in future.)
2. MODE → STAT
3. Select 1-VAR (as there is only "1 variable" here; frequency is not a variable!)
4. Enter your x values, pressing = after each one. Navigate to the top of your table to enter your frequencies.
5. Press AC to "bank" your table.
6. SHIFT → 1 for "STAT". Select either "Sum" or "Var". Once you've selected a statistic to use, it'll appear in your calculation. Once you want to calculate the value, press =. Try entering $\Sigma x \div n$. (For this example: 1.16875)
7. MODE → COMP to go back to normal computation mode.

Important note: Confusingly, your calculator means $\Sigma fx$ when you enter $\Sigma x$, and $n = \Sigma f$, i.e. it's interpreting the data as if it was listed out with duplicates.

Warning: You still need to show working in the exam.

What's different about the intervals here?

Weight of cat to nearest kg    Frequency
10 - 12                        7
13 - 15                        2
16 - 18                        9
19 - 20                        4

There are GAPS between intervals! What interval does this actually represent?

10 - 12 represents 9.5 - 12.5
Lower class boundary: 9.5
Upper class boundary: 12.5
Class width = 3

Identify the class width

Distance d travelled (in m)    Time t taken (in seconds)
...                            0 - 3
0 ≤ d < 150                    4 - 6
150 ≤ d < 200                  7 - 11
200 ≤ d < 210                  ...

Distance, 200 ≤ d < 210:
Lower class boundary = 200
Class width = 10

Time, 4 - 6:
Lower class boundary = 3.5
Class width = 3

Weight w in kg    Speed s (in mph)
...               10 ≤ s < 20
10 - 20           20 ≤ s < 29
21 - 30           29 ≤ s < 31
31 - 40           ...

Weight, 31 - 40:
Lower class boundary = 30.5
Class width = 10

Speed, 29 ≤ s < 31:
Lower class boundary = 29
Class width = 2

S2 - Chapters 2/3
Interpolation

RECAP: Quartiles of Frequency Table

Age of squirrel    Frequency    Cumulative Freq
1                  5            5
2                  8            13
3                  11           24
4                  5            29

$Q_1$: 29 squirrels. $\frac{29}{4} = 7.25$, so look at the 8th squirrel. Occurs within the second group, so $Q_1 = 2$.
$Q_2$: $\frac{29}{2} = 14.5$, so use the 15th squirrel. Occurs in the third group, so $Q_2 = 3$.
$Q_3$: $\frac{3}{4} \times 29 = 21.75$, so use the 22nd squirrel. Still in the third group, so $Q_3 = 3$.

Estimating the median

GCSE Question (figure not reproduced)
Answer = 13.5 + 8 = 21.5

Estimating the median

At GCSE, you were only required to give the median class interval when dealing with grouped data. Now, we want to estimate a value within that class interval.

Weight of cat to nearest kg    Frequency
10 - 12                        7
13 - 15                        2
16 - 18                        9
19 - 20                        4

Item number we're interested in: 11 (Why not the 11.5 item?)
Frequency up until this interval: 9
Frequency at end of this interval: 18
Weight at start of interval: 15.5kg
Weight at end of interval: 18.5kg

Median $= 15.5 + \frac{2}{9} \times 3 = 16.17$ kg

Estimating other values

Weight of cat to nearest kg    Frequency
10 - 12                        7
13 - 15                        2
16 - 18                        9
19 - 20                        4

LQ $= 9.5 + \frac{5.5}{7} \times 3 = 11.86$ kg
UQ $= 15.5 + \frac{7.5}{9} \times 3 = 18$ kg
34th Percentile $= 12.5 + \frac{0.48}{2} \times 3 = 13.22$ kg

You should have a sheet in front of you

1a: $1000.5 + \frac{1}{29} \times 500 = 1017.74$ years
1b: $1000.5 + \frac{26}{29} \times 500 = 1448.78$ years
1c: $1700.5 + \frac{10}{35} \times 300 = 1786.21$ years
1d: Interquartile Range: $1786.21 - 1017.74 = 768.47$ years
2a: $40 + \frac{5.2}{17} \times 60 = 58.35$ cm
2b: $300 + \frac{6.8}{8} \times 300 = 555$ cm
2c: $555 - 58.35 = 496.65$ cm

Exercises
Page 34 Exercise 3A Q4, 5, 6
Page 36 Exercise 3B Q1, 3, 5

S2 - Chapters 2/3
Variance and Standard Deviation

What is variance?

[Figure: two plots, "Distribution of IQs in L6Ms5" and "Distribution of IQs in L6Ms4". Axes: IQ and Frequency, with 110 marked on the IQ axis.]

Here are the distributions of IQs in two classes.
What's the same, and what's different?

Variance

Variance is how spread out data is.
Variance, by definition, is the average squared distance from the mean.

$\sigma^2 = \frac{\Sigma (x - \bar{x})^2}{n}$

Distance from mean: $x - \bar{x}$
Squared distance from mean: $(x - \bar{x})^2$
Average squared distance from mean: $\frac{\Sigma (x - \bar{x})^2}{n}$

Simpler formula for variance

Variance: "The mean of the squares minus the square of the mean ('msmsm')"

$\text{Variance} = \frac{\Sigma x^2}{n} - \left(\frac{\Sigma x}{n}\right)^2$

Standard Deviation

$\sigma = \sqrt{\text{Variance}}$

The standard deviation can "roughly" be thought of as the average distance from the mean.

Starter

Calculate the variance and standard deviation of the following heights:
2cm 3cm 3cm 5cm 7cm

Variance $= 19.2 - 4^2 = 3.2$ cm²
Standard Deviation $= \sqrt{3.2} = 1.79$ cm

Practice

Find the variance and standard deviation of the following sets of data.

2, 4, 6: Variance = 2.67, Standard Deviation = 1.63
1, 2, 3, 4, 5: Variance = 2, Standard Deviation = 1.41

Extending to frequency/grouped frequency tables

We can just mull over our mnemonic again:
Variance: "The mean of the squares minus the square of the mean ('msmsm')"

$\text{Variance} = \frac{\Sigma fx^2}{\Sigma f} - \left(\frac{\Sigma fx}{\Sigma f}\right)^2$

Bro Tip: It's better to try and memorise the mnemonic than the formula itself. You'll understand what's going on better, and the mnemonic will be applicable when we come onto random variables in Chapter 8.

Example

Height h of bear (in metres)    Frequency
0 ≤ h < 0.5                     4
0.5 ≤ h < 1.2                   20
1.2 ≤ h < 1.5                   5
1.5 ≤ h < 2.5                   11

$\Sigma fx = 46.75$
$\Sigma fx^2 = 67.81$
$\Sigma f = 40$
$\text{Variance} = \frac{67.81}{40} - \left(\frac{46.75}{40}\right)^2 = 0.33$

Sometimes we're helpfully given summed data:

Shoe Size t    Frequency
10             7
11             2
12             9
13             4

$\Sigma ft = 252$
$\Sigma ft^2 = 2914$
$\Sigma f = 22$
$\text{Variance} = \frac{2914}{22} - \left(\frac{252}{22}\right)^2 = 1.25$

Exercises
Page 40 Exercise 3C Q1, 2, 4, 6
Page 44 Exercise 3D Q1, 4, 5

Recap

1. $\Sigma x = 10$, $\Sigma x^2 = 50$, $n = 5$: $\sigma^2 = 6$
2. $\Sigma y = 20$, $\Sigma y^2 = 100$, $n = 5$: $\sigma^2 = 4$
3. $\Sigma fx = 100$, $\Sigma fx^2 = 1000$, $\Sigma f = 10$: $\sigma^2 = 0$
4. $\Sigma ft = 20$, $\Sigma ft^2 = 400$, $\Sigma f = 4$: $\sigma^2 = 75$

S2 - Chapters 2/3
Coding

Starter

What do you reckon is the mean height of people in this room?
Now, stand on your chair, as per the instructions below.

[Instructional video]

Is there an easy way to recalculate the mean based on your new heights? And the variance of your heights?

Starter

Suppose now after a bout of "stretching you to your limits", you're now all 3 times your original height.

What do you think happens to the standard deviation of your heights? It becomes 3 times larger (i.e. your heights are 3 times as spread out!)

What do you think happens to the variance of your heights? It becomes 9 times larger. (Can you prove the latter using the formula for variance?)

The point of coding

Cost x of diamond ring (£): £1010 £1020 £1030 £1040 £1050

We "code" our variable using the following: $y = \frac{x - 1000}{10}$

New values y: £1 £2 £3 £4 £5

Standard deviation of y ($\sigma_y$): $\sqrt{2}$
therefore...
Standard deviation of x ($\sigma_x$): $10\sqrt{2}$

Finding the new mean/variance

Old mean $\bar{x}$    Old variance    Coding                       New mean $\bar{y}$    New variance
36                    4               $y = x - 20$                 16                    4
36                    4               $y = 2x$                     72                    16
35                    4               $y = 3x - 20$                85                    36
40                    6               $y = \frac{x}{2}$            20                    $\frac{3}{2}$
11                    27              $y = \frac{x + 10}{3}$       7                     3
300                   125             $y = \frac{x - 100}{5}$      40                    5

Exercises
Page 26 Exercise 2E Q3, 4
Page 47 Exercise 3E Q2, 3, 5, 7

Chapters 2-3 Summary

I have a list of 30 heights in the class. What item do I use for:
• $Q_1$? 8th
• $Q_2$? Between 15th and 16th
• $Q_3$? 23rd

For the following grouped frequency table, calculate:

Height h of bear (in metres)    Frequency
0 ≤ h < 0.5                     4
0.5 ≤ h < 1.2                   20
1.2 ≤ h < 1.5                   5
1.5 ≤ h < 2.5                   11

a) The estimated mean: $\bar{h} = \frac{0.25 \times 4 + 0.85 \times 20 + \dots}{40} = \frac{46.75}{40} = 1.17$ m to 3sf
b) The estimated median: $0.5 + \frac{16}{20} \times 0.7 = 1.06$ m
c) The estimated variance (you're given $\Sigma fh^2 = 67.8125$): $\sigma^2 = \frac{67.8125}{40} - \left(\frac{46.75}{40}\right)^2 = 0.329$ to 3sf

Chapters 2-3 Summary

What is the standard deviation of the following lengths: 1cm, 2cm, 3cm

$\sigma^2 = \frac{14}{3} - 2^2 = \frac{2}{3}$
$\sigma = \sqrt{\frac{2}{3}}$

The mean of a variable $x$ is 11 and the variance 4. The variable is coded using $y = \frac{x + 10}{3}$. What is:
a) The mean of $y$? $\bar{y} = 7$
b) The variance of $y$? Variance $= \frac{4}{9}$
A variable $x$ is coded using $y = 4x - 5$. For this new variable $y$, the mean is 15 and the standard deviation 8. What is:
a) The mean of the original data? $\bar{x} = 5$
b) The standard deviation of the original data? $\sigma_x = 2$
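
The summary answers above can be checked numerically. The following is a minimal Python sketch, not part of the original slides: the function names (grouped_mean_variance, interpolate, decode) and the way each table is stored as (lower boundary, upper boundary, frequency) triples are illustrative assumptions. It uses midpoints for the estimated mean and variance, linear interpolation without rounding for quartiles and percentiles, and reverses a linear coding of the form y = ax + b.

```python
# Hypothetical helper names; the tables are the bear and cat tables from the slides,
# stored as (lower class boundary, upper class boundary, frequency).
bears = [(0.0, 0.5, 4), (0.5, 1.2, 20), (1.2, 1.5, 5), (1.5, 2.5, 11)]
cats = [(9.5, 12.5, 7), (12.5, 15.5, 2), (15.5, 18.5, 9), (18.5, 20.5, 4)]


def grouped_mean_variance(classes):
    """Estimate mean and variance using class midpoints ("msmsm")."""
    n = sum(f for _, _, f in classes)
    midpoints = [((lo + hi) / 2, f) for lo, hi, f in classes]
    sum_fx = sum(f * x for x, f in midpoints)
    sum_fx2 = sum(f * x * x for x, f in midpoints)
    mean = sum_fx / n
    variance = sum_fx2 / n - mean ** 2  # mean of squares minus square of mean
    return mean, variance


def interpolate(classes, fraction):
    """Estimate the value a given fraction of the way through grouped data.

    fraction = 0.5 gives the median, 0.25 the lower quartile, and so on.
    The item number is not rounded, because the data is treated as continuous.
    """
    n = sum(f for _, _, f in classes)
    target = fraction * n
    cumulative = 0
    for lo, hi, f in classes:
        if f > 0 and cumulative + f >= target:
            return lo + (target - cumulative) / f * (hi - lo)
        cumulative += f
    raise ValueError("fraction must be between 0 and 1")


def decode(mean_y, sd_y, a, b):
    """Recover the mean and standard deviation of x from the coding y = a*x + b."""
    return (mean_y - b) / a, sd_y / abs(a)


mean, variance = grouped_mean_variance(bears)
print(round(mean, 2), round(variance, 3))   # 1.17 0.329
print(round(interpolate(bears, 0.5), 2))    # 1.06
print(round(interpolate(cats, 0.5), 2))     # 16.17
print(decode(15, 8, 4, -5))                 # (5.0, 2.0)
```

Note that decode divides the standard deviation by the multiplier only: adding or subtracting a constant shifts the mean but leaves the spread unchanged, which is why the 5 in y = 4x - 5 affects only the first value returned.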