Download General maths: Univariate statistics

General Maths
Chapter 1 – Univariate Data

Chapter One – Univariate Data
• Assigned textbook questions to be up to date by 2nd Feb
• Exercise 1A
• Exercise 1B
• Exercise 1D
• Holiday Homework Sheet

1A – Categorical Data

Types of Data
Data can be divided into two major groups:
• Categorical Data (Qualitative)
• Numerical Data (Quantitative)

Categorical Data can be placed into one of 2 categories:
• Nominal: named categories with no natural order
• Ordinal: categories that have a natural order

Numerical Data is in the form of numbers and can be either:
• Discrete: values that are counted
• Continuous: values that are measured

What type of data is...?
• The number of students who walk to school. Numerical – Discrete
• The types of vehicles that each of your parents drive. Categorical – Nominal
• The sizes of pizza available at a pizza shop. Categorical – Ordinal
• The varying temperature outside throughout the day. Numerical – Continuous

1A – Working with Categorical Data
Once Categorical Data has been collected, it is important to be able to summarise and display the data using:
• Frequency Tables
• Graphs: Bar Graphs or Dot Plots

It is also important to be able to calculate:
• Frequency: the number of times that a particular thing has occurred
• Relative Frequency: the number of times that a particular thing has occurred ÷ the total amount of all data recorded
• % Frequency: the relative frequency × 100

Class Hair Colour Survey
Gather data from the students in the classroom and use it to:
1. Summarise data using a frequency distribution table
2. Represent data using a bar chart
3. Find the frequency of those with brown hair
4. Find the relative frequency of those with brown hair
5. Find the % frequency of those with brown hair

1. Summarise data using a frequency distribution table

Hair Colour   Tally   Total
Brown
Blonde
Black
Red
Other

2. Represent data using a bar chart
[Bar chart: "Class Hair Colours", with bars for Brown, Blonde, Red, Black and Other]
Remember: in a bar chart the bars don't touch. Leave gaps!

3. Find the frequency of those with brown hair
This is just the total number of people with brown hair.

4. Find the relative frequency of those with brown hair
Find this using: the total number of people with brown hair ÷ the total number of people surveyed

5. Find the % frequency of those with brown hair
Find this using: the relative frequency × 100

1A – Working with Categorical Data
Your Turn: Eye Colour Survey
Gather data from the students in the classroom and use it to:
1. Summarise data using a frequency distribution table
2. Represent data using a bar chart
3. Find the frequency of those with brown eyes
4. Find the relative frequency of those with brown eyes
5. Find the % frequency of those with brown eyes

1. Summarise data using a frequency distribution table

Eye Colour   Tally   Total

Now complete the rest in your workbooks.

Now do Chapter 1A
Questions from your work record: Q1a,b Q2b,c Q3a,b,c Q6 Q8 Q9 Q10

1B – Working with Numerical Data
The remainder of this topic is concerned with Numerical Data. With Numerical Data, each data point is known as a score.

Grouping Data
Numerical Data can be presented as either Ungrouped Data or Grouped Data.

When we have a large amount of data, it's useful to group the scores into groups or classes. When grouping raw data in a frequency distribution table, the choice of class (group) size matters. As a general rule, try to choose a class size so that 5 – 10 groups are formed. Find the lowest and the highest scores to decide what numbers need to be included in the groups.

We use an open-ended class such as '10 –' to include all values from that number up to (but not including) the number in the next column.

eg1. Group the following data appropriately.
12 17 10 24 18 13 24 8 5 9
7 22 2 3 21 22 20 10 6 11

Class       0–   5–   10–   15–   20–
Tally
Frequency   2    5    5     2     6

eg2. Group the following data appropriately.
10.1 17.0 15.2 24.9 16.7 25 24.4 12.2 30.2 20
29 16.1 31.6 12.1 36.7 21 39.3 10 11.5 28.1

Class       10–   15–   20–   25–   30–   35–
Tally
Frequency   5     4     4     3     2     2

1B – Working with Numerical Data
Histograms
Similar to a bar chart with a few very important changes:
• Columns are drawn right against each other
• A gap is left at the very start of the chart
• If coloured in, use the same colour for all columns
• A polygon may be drawn to link the columns

Ungrouped Data: data labels appear directly under the centre of each column.
Grouped Data: end points of each class appear under the edge of each column.

1B – Working with Numerical Data
Data Distribution
We can name data according to how it's distributed. Is it all crammed together, or is there more data in certain areas? We associate certain names with different shapes of distribution:
• Normal – Most common score in the centre of the data
• Skewed – Most common score is toward one end of the data
• Bimodal – More than one score that is most frequent
• Spread – Data is spread over a wide range
• Clustered – Most of the data is confined to a small range

Normally Distributed Data
• The most common score is in the centre of the data.
• The graph is symmetrical.

Skewed Data
• The most common score is toward one end of the data.
• Most data toward the left – Positively Skewed
• Most data toward the right – Negatively Skewed

Bimodal Data
• More than one score that is most frequent
• This looks like two peaks on the graph

Spread Data
• Data is rather evenly spread over a wide range

Clustered Data
• Most of the data is confined to a small range

Now do Chapter 1B
Questions from your work record: Q1 Q2 Q4 Q6 Q7 Q8 Q9 Q10 Q11 Q12 Q13

1D – Measures of Centre
One of the main things statisticians do with a set of data is find the average, the middle or the most commonly occurring score. We call these values:
• The Mean – The average of all scores.
• The Median – The middle score in a set of ordered data.
• The Mode – The score which occurs most often.
We can find these values as follows.

The Mean
The average of the scores.
Mean = $\bar{x} = \frac{\sum x_i}{n}$ = (the sum of all scores) ÷ (the total number of scores)

eg. Find the mean of the data set: 4 2 6 7 10 3 7 3 6 7
$\bar{x} = \frac{4+2+6+7+10+3+7+3+6+7}{10} = \frac{55}{10} = 5.5$

The Median
The middle score of an ordered data set.
Median position = $\frac{n+1}{2}$th score
For an ODD number of scores, the median is a score in the data.
For an EVEN number of scores, the median is halfway between 2 scores.

eg. Find the median of the data set: 4 2 6 7 10 3 7 3 6 7
Write the scores in order, smallest to largest: 2 3 3 4 6 6 7 7 7 10
Median position = $\frac{10+1}{2} = \frac{11}{2}$ = 5.5th score
The 5th and 6th scores are both 6, so Median = 6

The Mode
The score which occurs most frequently.
There can be more than one score that occurs most frequently. In that case each of them is a mode: list them all.

eg. Find the mode of the data set: 4 2 6 7 10 3 7 3 6 7
You may wish to write the scores in order to ensure all data is accounted for, but this is not necessary.
2 3 3 4 6 6 7 7 7 10
Mode = 7

eg.
Find the Mean, Median and Mode of the following set of data: 3 4 6 9 10 3 4 5 1 7 8

Mean
$\bar{x} = \frac{\sum x_i}{n} = \frac{3+4+6+9+10+3+4+5+1+7+8}{11} = \frac{60}{11} \approx 5.45$

Median
Order the scores: 1 3 3 4 4 5 6 7 8 9 10
Find the position of the median: Median position = $\frac{n+1}{2} = \frac{11+1}{2} = \frac{12}{2}$ = 6th score
Median = 5

Mode
Two numbers each occur the most frequently, so Mode = 3 and 4

Now do Chapter 1D
Questions from your work record: Q1b,c Q2 Q3 Q4 Q8 Q9a-f Q10 Q11a-d Q14 Q15 Q16b

Holiday Homework Reminder!
• Assigned textbook questions to be up to date by 2nd Feb
• Exercise 1A
• Exercise 1B
• Exercise 1D
• Holiday Homework Sheet

Review Question
I surveyed 15 students and asked them the current age of their mothers in whole years. The following data was obtained.
35, 45, 41, 46, 38, 56, 59, 43, 46, 45, 51, 53, 43, 46, 50
• What type of data is this called?
• Find the Mean, Mode and Median.
• Group this data in an appropriate class size and represent this using a frequency table.
• Draw a histogram of the data.
• What do we call the distribution of this data?

Answers
• What type of data is this called?
Numerical – Discrete
• Find the Mean, Mode and Median.
Mean: $\bar{x} = \frac{\sum x_i}{n} = \frac{697}{15} \approx 46.47$
Median: Order the data smallest to largest. The median is the midpoint (the 8th of 15 scores).
35, 38, 41, 43, 43, 45, 45, 46, 46, 46, 50, 51, 53, 56, 59
Median = 46
Mode: Number which occurs most often = 46
• Group this data in an appropriate class size and represent this using a frequency table.
Smallest number = 35, Largest number = 59

Class       35–   40–   45–   50–   55–
Frequency   2     3     5     3     2

• Draw a histogram of the data.
• What do we call the distribution of this data?
Normally Distributed

I surveyed 15 students and asked them the current age of their mothers in whole years. The following data was obtained.
35, 45, 41, 46, 38, 56, 59, 43, 46, 45, 51, 53, 43, 46, 50
Now let's try to find the Mean, Mode and Median using our calculators.
* Main Menu: choose Statistics
* Enter the data into List 1
* Choose Calc, then One-Variable
* We have entered each number individually, so each occurs once. Choose XList: list1 and Freq: 1
* Read off the list: Mean = $\bar{x}$, Mode = Mode, Median = Med

Now Do
Review Worksheet, then problems from the text:
1A – 1e, 3d, 13
1B – 13
1D – 5, 6, 7

1E – Measures of Variability
Data can be viewed in ways other than just finding the middle of the data (median/mode/mean). To give us a truer representation of a set of data, we can calculate how spread out our data is using:
• The Range
• The Interquartile Range (IQR)
• The Standard Deviation
• The Variance

Range
1. Find the largest value in the set of data, Xmax
2. Find the smallest value in the set of data, Xmin
3. Subtract the smallest value from the largest value
Range = Xmax – Xmin
eg. 2 3 4 4 4 5 6 8
What would the range be? 8 – 2 = 6

Interquartile Range (IQR)
To overcome problems due to extreme values, we can exclude the top and bottom quarters of the data and find the range of the remaining data.
The lower quartile (Q1) is the number occurring ¼ of the way through the data: the 25th percentile.
The upper quartile (Q3) is the number occurring ¾ of the way through the data: the 75th percentile.
The IQR is the difference between these values, so it can be found using:
IQR = Q3 – Q1

Steps to calculate the IQR
1. Arrange the data in order of size
2. Divide the data in two by finding the median
3. Using the lower half of the data, find the lower quartile (Q1) by dividing this half in two and finding its midpoint
4. Repeat this for the upper half of the data to find Q3
5. Calculate the IQR by finding Q3 – Q1

eg. Calculate the IQR of the data: 4 7 2 1 10 2 7 6 9 5
1. Arrange in size order: 1 2 2 4 5 6 7 7 9 10
2. Divide in half to find the median: 1 2 2 4 5 | 6 7 7 9 10
3. Find Q1: the midpoint of 1 2 2 4 5 is 2
4. Find Q3: the midpoint of 6 7 7 9 10 is 7
5. Find IQR = Q3 – Q1 = 7 – 2 = 5

Standard Deviation
Shows how much variation there is from the average. A low standard deviation indicates that the data points tend to be very close to the mean. A high standard deviation indicates that the data points are spread out over a large range of values.
Standard Deviation = $\sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}}$

1. Find the mean $\bar{x}$
2. Find the difference between each piece of data and the mean
3. Square the differences
4. Add the squared differences
5. Divide by the number of scores, less one
6. Take the square root

Variance
The variance is simply the standard deviation squared.

Information Overload?
Relax! We can use our calculator to find these values too. Make sure you know how to use it!
From the One-Variable results:
• Use minX and maxX to find the Range
• Use Q1 and Q3 to find the IQR
• Read off the Standard Deviation
• Variance = (Standard Deviation)²

1E – Working with Grouped Data
We can find the measures of centre (Mean, Mode, Median) and the measures of spread (Range, IQR, Standard Deviation, Variance) of grouped data by first finding the midpoint of each group.

Data        20–   30–   40–   50–   60–
Frequency   4     5     10    7     6
Midpoint    25    35    45    55    65

The Midpoint and Frequency rows are the columns we enter into our calculator.

eg. Find the mean and the standard deviation of the grouped data

Data        35–   40–   45–   50–   55–
Frequency   2     3     5     3     2

Find the midpoint of each group.

Midpoint    37.5  42.5  47.5  52.5  57.5
Frequency   2     3     5     3     2

Enter this new table into our calculator.
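As a cross-check on the calculator result, the grouped-data mean and standard deviation can also be sketched in Python. This is only an illustration: the function name `grouped_mean_and_sd` is made up for this sketch, and it assumes the "divide by the number of scores, less one" version of the standard deviation described in the steps above, with each midpoint counted as many times as its frequency.

```python
from math import sqrt

def grouped_mean_and_sd(midpoints, frequencies):
    """Mean and standard deviation of grouped data (hypothetical helper).

    Each class midpoint stands in for every score in that class.
    """
    n = sum(frequencies)  # total number of scores
    mean = sum(m * f for m, f in zip(midpoints, frequencies)) / n
    # Sum of squared differences from the mean, weighted by frequency
    squared_diffs = sum(f * (m - mean) ** 2 for m, f in zip(midpoints, frequencies))
    sd = sqrt(squared_diffs / (n - 1))  # divide by n - 1, then take the square root
    return mean, sd

midpoints = [37.5, 42.5, 47.5, 52.5, 57.5]
frequencies = [2, 3, 5, 3, 2]
mean, sd = grouped_mean_and_sd(midpoints, frequencies)
print(round(mean, 2), round(sd, 2))  # 47.5 6.27
```

The variance is the square of the second value returned.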
Menu → Statistics → Enter the data into 2 lists
Calc → One-Variable
Choose XList: list1 (where your values are)
And Freq: list2 (where your frequencies are)
Mean → $\bar{x}$
Standard Deviation → $S_x$

Now Do Exercise 1E
1c, 1e, 4a, 4b, 4c, 10, 12, 13, 14, 15, 16, 19

1F – Stem and Leaf Plots
Instead of using a frequency table, we can also display our data using a stem and leaf plot. As with frequency tables, we choose an appropriate group size in which to represent our data, usually a class size of 5 or 10.

Example: Stem & Leaf representation of the following data (class size of 10):
6, 7, 8, 10, 12, 13, 14, 17, 17, 17, 18, 19, 21, 23, 24, 24, 25, 27, 31, 31, 32, 36, 36, 39, 41, 45, 45, 46, 49, 50

Key: 1 | 2 = 12
Stem | Leaf
0    | 6 7 8
1    | 0 2 3 4 7 7 7 8 9
2    | 1 3 4 4 5 7
3    | 1 1 2 6 6 9
4    | 1 5 5 6 9
5    | 0

Now let's try using a class size of 5.
• Separate each class using * beside the stem number for the upper end of the group
• Include a key, this time with an example for both the lower end (no *) and the upper end (with *)

Key: 1 | 2 = 12, 1* | 7 = 17
Stem | Leaf
0*   | 6 7 8
1    | 0 2 3 4
1*   | 7 7 7 8 9
2    | 1 3 4 4
2*   | 5 7
3    | 1 1 2
3*   | 6 6 9
4    | 1
4*   | 5 5 6 9
5    | 0

1F – Stem and Leaf Plots
Your Turn: Organise the following data onto a stem and leaf plot using a class size of 10.
10, 12, 16, 21, 24, 27, 29, 31, 33, 34
Now try again using a class size of 5.

Now Do Exercise 1F
1, 3, 4, 7, 8, 9, 10, 11, 12

1G – 5-Number Summary
We can summarise a set of data using a 5-Number summary. This set of 5 numbers represents the spread of the set of data. The 5-Number summary includes (in order):
• Xmin (Lowest Score)
• Q1 (The score ¼ of the way through the data)
• Median (The score ½ way through the data)
• Q3 (The score ¾ of the way through the data)
• Xmax (Highest Score)

1G – 5-Number Summary
eg1. Write the 5-Number summary for the following set of data.
3 4 4 6 8 9 9 10 13 15 16 18 19 19 20

Lower half: 3 4 4 6 8 9 9
Median: 10
Upper half: 13 15 16 18 19 19 20

Xmin = 3 (Lowest Score)
Q1 = 6 (The score ¼ of the way through)
Median = 10 (The score ½ way through)
Q3 = 18 (The score ¾ of the way through)
Xmax = 20 (Highest Score)
So the 5-Number summary is: 3, 6, 10, 18, 20

1G – 5-Number Summary
eg2. Write the 5-Number summary for the following set of data.
5 8 2 10 13 8 9 3 4 4 16 18 7 3
Arrange in order: 2 3 3 4 4 5 7 8 8 9 10 13 16 18

Lower half: 2 3 3 4 4 5 7
Median: halfway between 7 and 8, which is 7.5
Upper half: 8 8 9 10 13 16 18

Xmin = 2 (Lowest Score)
Q1 = 4 (The score ¼ of the way through)
Median = 7.5 (The score ½ way through)
Q3 = 10 (The score ¾ of the way through)
Xmax = 18 (Highest Score)
So the 5-Number summary is: 2, 4, 7.5, 10, 18

Eg2 cont'd. Check your answer using the calculator.
5 8 2 10 13 8 9 3 4 4 16 18 7 3
Our answer was 2, 4, 7.5, 10, 18
Menu → Statistics
Enter data into list1
Calc → One-Variable
Select the list your data is in as XList
Read off minX, Q1, Med, Q3, maxX

1G – Boxplots
We can represent the 5-number summary on a boxplot. Boxplots are:
• Always drawn to scale
• Drawn with labels (Xmin, Q1 etc.) or with a scaled and labelled axis running alongside the plot
[Diagram: boxplot above a scale, labelled Xmin (Lowest Score), Q1 (¼ way through), Median (½ way through), Q3 (¾ way through), Xmax (Highest Score)]

1G – Boxplots
eg3. Using the 5-Number summary from example 2, sketch its boxplot.
The 5-Number summary is: 2, 4, 7.5, 10, 18
[Diagram: boxplot with Xmin = 2, Q1 = 4, Median = 7.5, Q3 = 10, Xmax = 18]

Eg3 cont'd. We can also use the calculator to help sketch the plot.
Once the data has been input to the calculator, do the following:
SetGraph → Check StatGraph1 box → Choose Setting
Choose 'Type' → Select 'MedBox'
Check that selections are OK → Then click 'Set'
To select a point on the plot, click: [calculator button shown on the slide]

Now Do Exercise 1G
2, 4, 5, 8, 9, 10, 11, 12, 13, 14, 15

1H – Comparing sets of data
Back-to-back Stem & Leaf Plots
• Used to compare 2 similar sets of data
• The two sets of data share the same central stem
• Data is ordered from smallest to largest around the central stem

Create a back-to-back Stem & Leaf for the two sets of data (using a class size of 10):
Sample A: 4, 6, 7, 10, 12, 15, 19, 24
Sample B: 5, 7, 9, 9, 13, 16, 20, 22
Remember to start each line from the centre and work your way out.

Sample A   | Stem | Sample B
7, 6, 4    | 0    | 5, 7, 9, 9
9, 5, 2, 0 | 1    | 3, 6
4          | 2    | 0, 2

Always include a key!

Create a back-to-back Stem & Leaf for the two sets of data (using a class size of 5):
Sample A: 4, 6, 7, 10, 12, 15, 19, 24
Sample B: 5, 7, 9, 9, 13, 16, 20, 22
Remember to start each line from the centre and work your way out.

Sample A | Stem | Sample B
4        | 0    |
7, 6     | 0*   | 5, 7, 9, 9
2, 0     | 1    | 3
9, 5     | 1*   | 6
4        | 2    | 0, 2

Always include a key!

1H – Comparing sets of data
Back-to-back Stem & Leaf Plots
Find the 5-number summary for each set of data (Xmin, Q1, Median, Q3, Xmax):
Group One: 2, 10, 19, 24, 27
Group Two: 1, 7, 14, 23, 27

1H – Comparing sets of data
Side-by-Side Box Plots
• Recall the boxplot, which marks Xmin, Q1, Median, Q3 and Xmax.
• Two or more sets of data can be compared using side-by-side boxplots.
• The boxplots share a common scale so they can be compared appropriately.

Compare the two box plots. What can be said about the data?

eg. Two sets of data gave the following 5-number summaries.
Sample A: 8, 10, 15, 21, 23
Sample B: 5, 12, 18, 22, 25
Compare the two using side-by-side box plots.
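Alongside the sketch, the two summaries can be compared numerically. The short Python sketch below is an illustration only: the function name `spread_from_summary` is invented here, and it simply applies Range = Xmax – Xmin and IQR = Q3 – Q1 to each 5-number summary.

```python
def spread_from_summary(summary):
    """Median, range and IQR from a 5-number summary (hypothetical helper).

    summary is (Xmin, Q1, Median, Q3, Xmax).
    """
    xmin, q1, med, q3, xmax = summary
    return {"median": med, "range": xmax - xmin, "iqr": q3 - q1}

sample_a = (8, 10, 15, 21, 23)
sample_b = (5, 12, 18, 22, 25)

print(spread_from_summary(sample_a))  # {'median': 15, 'range': 15, 'iqr': 11}
print(spread_from_summary(sample_b))  # {'median': 18, 'range': 20, 'iqr': 10}
```

Read this way, Sample B has the higher median and the wider overall range, while its middle 50% of scores is slightly more tightly packed than Sample A's.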
[Diagram: side-by-side boxplots of Sample A and Sample B on a common scale]

Now Do Exercise 1H
1 – 10; 14

Revision Problems
Univariate Data

Revision Question One
The following table shows the dinner bookings from a local restaurant over an evening.

Group Size   1   2    3    4    5
Frequency    2   14   10   13   8

• What is the frequency of a group having 3 people?
• What is the relative frequency of a group with 3 people?
• What is the percentage frequency of a group with 3 people?
• What is the total number of people who attended the restaurant that evening?
• Draw a histogram of the data.
• What is the average group size?

Revision Question Two
The stem and leaf plot below shows the height of a group of 20 students.

Key: 15* | 8 = 158cm, 16 | 0 = 160cm
Stem | Leaf
15*  | 8, 9
16   | 0, 2, 4
16*  | 5, 6, 6, 8, 9
17   | 1, 3, 4, 4, 4
17*  | 5, 8, 9
18   | 1, 4

• State the minimum height.
• State the median height.
• State the Mode.
• State the IQR.
• State the Standard Deviation.
• How many people are over 172cm tall?
• What is the relative frequency of a person who is 166cm tall?
• What type of distribution is this?

Revision Question Three
The batting scores of two batsmen were collected over a cricket season. Their results are compared on the boxplots below.
• Which batsman had the highest score? What was this score?
• Write a 5-number summary for each of Batsman A and Batsman B.
• Which batsman had the best median performance?
• Which batsman had the smallest range?
• What scores made up the top 50% of the runs by Batsman A?
• What scores made up the bottom 25% of the runs of Batsman B?
• Which batsman had the best overall result? Explain.

Revision Question Four
Consider the following data that shows the heights (in cm) of 40 girls who are competing in trials to form a basketball squad.
• Using your calculator
* Find the points of central tendency, that is the Mean, Mode and Median.
* Find the measures of variability, that is the Range, IQR, Standard Deviation & Variance.
* Find the 5-number summary and use this to draw the boxplot of the data.
• Draw a frequency table of the data, using a class size of 5.
• Represent the data on a histogram.
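For checking answers without the calculator, the measures of centre from 1D and the 5-Number summary from 1G can be sketched in Python. This is a rough sketch under some assumptions: the helper names `measures_of_centre` and `five_number_summary` are made up for illustration, and the quartiles use the "split the ordered data at the median" method from the worked examples, leaving the median itself out of both halves when there is an odd number of scores. A calculator or another package may use a slightly different quartile rule.

```python
from statistics import mean, median, multimode

def measures_of_centre(scores):
    """Return (mean, median, list of modes) for a list of scores."""
    return mean(scores), median(scores), multimode(scores)

def five_number_summary(scores):
    """Return (Xmin, Q1, Median, Q3, Xmax) using the median-split method."""
    ordered = sorted(scores)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]          # scores below the median position
    upper = ordered[half + n % 2:]  # scores above the median position
    return (ordered[0], median(lower), median(ordered), median(upper), ordered[-1])

# Data from the 1D worked example
avg, med, modes = measures_of_centre([3, 4, 6, 9, 10, 3, 4, 5, 1, 7, 8])
print(round(avg, 2), med, modes)  # 5.45 5 [3, 4]

# Data from 1G eg2
summary = five_number_summary([5, 8, 2, 10, 13, 8, 9, 3, 4, 4, 16, 18, 7, 3])
print(summary)                  # (2, 4, 7.5, 10, 18)
print(summary[4] - summary[0])  # Range = Xmax - Xmin = 16
print(summary[3] - summary[1])  # IQR = Q3 - Q1 = 6
```

`multimode` returns every score that ties for most frequent, which matches the instruction in 1D to list all modes.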
