Statistics – What is it?

Torture numbers, and they'll confess to anything. ~Gregg Easterbrook
98% of all statistics are made up. ~Author Unknown
Statistics are like bikinis. What they reveal is suggestive, but what they conceal is vital. ~Aaron Levenstein
Statistics can be made to prove anything - even the truth. ~Author Unknown
Lottery: A tax on people who are bad at math. ~Author Unknown
He uses statistics as a drunken man uses lampposts - for support rather than for illumination. ~Andrew Lang
The theory of probabilities is at bottom nothing but common sense reduced to calculus. ~Laplace, Théorie analytique des probabilités, 1820
I could prove God statistically. Take the human body alone - the chances that all the functions of an individual would just happen is a statistical monstrosity. ~George Gallup
Statistics are just a way for the mathematician to evangelize his faith. ~Hunter Brinkmeier
There are three kinds of lies: lies, damned lies, and statistics. ~Benjamin Disraeli

Statistics is the science of using mathematical tools to interpret data.

Lesson Objective
Understand the different ways of describing data
Understand the importance of different sampling techniques when collecting data

The Different Ways of Describing Data

Discrete data: Data that is digital and has specific values with gaps in between. A slight improvement in the accuracy of the measuring device does not alter the data.
Continuous data: Data that is analogue and takes a range of values. A slight improvement in the accuracy of the measuring device alters the data collected.
Categorical data: Data that falls into different labelled groups. If the labels are numerical then they have no numerical worth, so calculating a mean is meaningless.
Numerical data: Data that is based on the size of numbers, where the size of the numbers has some meaning.
Qualitative data: Data that has been collected based on some quality or categorization that in some cases may be 'informal' or may use relatively ill-defined characteristics such as warmth and flavour; data that can be observed but not measured.
Quantitative data: Data that has been collected using a measuring scale; data measured or identified on a numerical scale.

Give 3 examples of each type of data: discrete, continuous, categorical, numerical, qualitative, quantitative.

The Different Ways of Describing Data
Discrete data: Eg Shoe size, Dice score, Type of pet
Continuous data: Eg Time to run a mile, Length of a hair
Categorical data: Eg Types of pet, House number, Colour
Numerical data: Eg Score on a dice, Weight of a lemon
Qualitative data: Eg I feel happy, The weather is good today
Quantitative data: Eg The score obtained in a test, The height of a tree

Decide whether each of the following sets of data is categorical or numerical, and if numerical whether it is discrete or continuous.
1) Cards drawn from a set of playing cards: {2 of diamonds, ace of spades, 3 of hearts etc…}
2) Number of aces in a hand of 13 cards: {1, 2, 3, 4}
3) Time in seconds for 100 metre sprint: {10.05, 12.31, 11.20, 10.67, 11.56, …etc}
4) Fraction of coin tosses which were Heads after 1, 2, 3, … tosses for the following sequence: H T H T T T H H … {1, ½, 2/3, ½, 2/5, 1/3, 3/7, ½, …}
5) Number of spectators at a football match: {23 456, 40 132, 28 320, 18 214, …etc}
6) Day of week when people were born: {Wednesday, Monday, Sunday, Sunday, Saturday, etc…}
7) Times in seconds between 'blips' of a Geiger counter in a physics experiment: {0.23, 1.23, 3.03, 0.21, 4.51, …etc}
8) Percentages gained by students for a test out of 60: {20, 78.33, 80, 75, 53.33, …etc}
9) Number of weeds in a 1 m by 1 m square in a biology experiment: {2, 8, 12, 3, 5, 8, …}

Solution
1 and 6 are categorical data, all the others are numerical.
2 - discrete
3 - continuous
4 - discrete, as the possible fractions can be listed
5 - discrete
7 - continuous
8 - discrete, as there are only 61 possible percentage scores (whole marks from 0 to 60)
9 - discrete, as there must be a whole number of weeds

Different Sampling Techniques
There are many different ways to generate a sample for data collection. Four of the most common are:
Random Sampling
Systematic Sampling
Stratified Sampling
Convenience Sampling

Look at the cards on the next slide and decide which sampling technique is being described. Think of an advantage and a disadvantage for the technique described.

A pollster stands in Huntingdon market square and asks the first 30 people that will listen to her for their opinions on a market revamp.

In a survey to assess opinions about Year 10 uniform, a school list is printed and every 10th pupil on the list is selected.

At a local club it is known that ¾ of the membership is female. A sample of 21 females and 7 males is drawn by randomly picking names from a hat.

To find out opinions about a web site, you ask the first 30 people to visit the site to complete a questionnaire using their browser.

To select a sample of 6 people from a class of 30 to do a maths test, the class are lined up in height order and every 5th pupil is selected.

In a class of 20 pupils each pupil is assigned a number and 4 members are selected for a competition by using the random number generator on a calculator.

A Secondary school has 3 Key Stages with pupils split between them in the ratio 3:2:3. To survey opinions about the school canteen they interview 30 students from KS3, 20 from KS4 and 30 from KS5.

To investigate the health of whales, a marine biology charity decides to estimate the length of whales in the South Atlantic by measuring the first 10 whales they find.

A bag contains 100 names.
It is shaken and 30 names are drawn from the bag without looking.

Lesson Objective
Understand the three key things required to analyse data

In an experiment pupils were selected randomly from their maths lessons and asked to estimate the area of a triangle and a rectangle. The area of both shapes was 15 cm². The results are shown below (Rec = rectangle estimate, Tr = triangle estimate):

age:       11   11   11   11   11   11   11   11   11   11   11   11   11   11   11
gender:    f    f    m    f    f    f    m    f    f    m    f    m    m    f    f
Rec:       12   10   15   15   18   30   16   18   16   3    15   8    8    14   15
Tr:        11   50   10   16   64   5    25   15   16   4.5  20   12   9    13   11

age:       17   17   17   17   17   17   17   17   17   17   17   17   18   18   18   19   19
gender:    f    m    f    m    f    f    m    m    f    f    f    f    m    f    m    f    m
Rec:       13   14   15   12   13   16   16   14   15   12   15   10   13   14   18   15   18
Tr:        8    16   18   20   12   12   14   10   12   13   13   20   30   15   12   15   15

Analyse this data. What things could we investigate?

Some nuggets of wisdom:
1) "This shows that the boys had a greater spread of data, meaning that the girls were more accurate"
   So spread implies accuracy?
2) "I predict that the girls will be more accurate than the boys at estimating the area as there are more of them and so a greater chance that more will correctly estimate the area"
   So the more people you have guessing, the more accurate they will be?
3) "I predict that the boys will be better at estimating as there are fewer, meaning that there is less chance for anomalous results"
   So you get the best results by having a small sample size?
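The first nugget treats spread as if it were accuracy. A minimal sketch of keeping the two apart, using the rectangle estimates from the experiment above. It assumes that the first block of Rec values belongs to the age-11 pupils and the second block to the pupils aged 17 to 19; the names `rect_age_11`, `rect_age_17_plus` and `summarise` are illustrative only and do not come from the slides.

```python
from statistics import mean

TRUE_AREA = 15  # cm^2, the actual area of both shapes

# Rectangle estimates in the order they appear on the slide (grouping is assumed).
rect_age_11 = [12, 10, 15, 15, 18, 30, 16, 18, 16, 3, 15, 8, 8, 14, 15]
rect_age_17_plus = [13, 14, 15, 12, 13, 16, 16, 14, 15, 12, 15, 10, 13, 14, 18, 15, 18]


def summarise(estimates):
    """Return (mean estimate, mean error from the true area, range)."""
    centre = mean(estimates)
    return centre, centre - TRUE_AREA, max(estimates) - min(estimates)


for label, data in [("age 11", rect_age_11), ("age 17+", rect_age_17_plus)]:
    centre, bias, spread = summarise(data)
    print(f"{label}: mean {centre:.2f}, mean error {bias:+.2f}, range {spread}")
```

Under these assumptions both groups are centred a little below 15, while the ranges are 27 and 8. Location (how close the typical estimate is) and spread (how consistent the estimates are) therefore need to be reported separately.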
Mode – generally useless for this exercise.
Calculating how many got it exactly right is generally useless as the data is continuous - the fact that some people guessed it correctly has more to do with Psychology than good estimating skills.
Averaging averages to get an all-embracing average is NEVER a good idea:
Data set 1: 1 and 8
Data set 2: 6

Things to consider:
1) Is what they have tried to analyse clearly stated? Is there a hypothesis or some alternate statement explaining what they are trying to achieve? (1 mark)
2) Have they attempted to find an average? Is it the most appropriate average for the task? Is the average calculated properly? (1 mark relevant average, 1 mark accuracy)
3) Have they attempted to look at the consistency of the data? Have they used an appropriate method to measure consistency? Is their measure of consistency (range, IQR) calculated properly? (1 mark relevant measure, 1 mark accuracy)
4) Have they drawn a graph or chart to help show the distribution of the data? (1 mark relevant graph, 1 mark accuracy)
5) Have they written a final comment that refers to their initial statement/hypothesis and that attempts to provide a conclusion? Does the final comment agree with their actual maths? Have they referred to/tied their maths to the conclusion? (Eg the mean of …. for boys was greater than the mean for girls ….. therefore …) Does the conclusion comment on both consistency and averages? Is there anything in the conclusion to suggest deeper analysis? Is there anything that makes you go – that's clever, I like that! (3 marks – you judge!)
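The warning about averaging averages can be checked numerically. The sketch below assumes the slide's example is meant to be read as data set 1 = {1, 8} and data set 2 = {6}; that reading, and all variable names, are assumptions made for illustration.

```python
# Assumed reading of the slide's example: two data sets of unequal size.
data_set_1 = [1, 8]
data_set_2 = [6]

mean_1 = sum(data_set_1) / len(data_set_1)
mean_2 = sum(data_set_2) / len(data_set_2)

# Averaging the two averages gives each data set equal weight,
# regardless of how many values it contains.
mean_of_means = (mean_1 + mean_2) / 2

# Pooling the raw data gives each value equal weight.
pooled = data_set_1 + data_set_2
pooled_mean = sum(pooled) / len(pooled)

# Weighting each average by its sample size recovers the pooled mean.
weighted_mean = (len(data_set_1) * mean_1 + len(data_set_2) * mean_2) / len(pooled)

print(mean_of_means)  # 5.25
print(pooled_mean)    # 5.0
print(weighted_mean)  # 5.0
```

The two answers differ because the data sets are different sizes. If only the averages are available, they have to be weighted by the number of values behind each one.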
When we are analysing numerical data we are interested in 3 things:
1) The Location (Size) of the data
2) The Variation (Spread) of the data
3) The Shape (Distribution) of the data

1) The Location (Size) of the data
We use averages for this purpose: Mean, Mode, Median, Mid Range

2) The Variation (Spread) of the data
Range, Inter-quartile Range, Standard deviation/Root Mean Squared Deviation

3) The Shape (Distribution) of the data
We use graphs for this purpose: Stem and Leaf diagrams, Box and Whisker Plots, Bar Charts, Histograms

Lesson Objective
Revise basic graph types and their uses
Focus on drawing and interpreting histograms

This data set is the heights of a group of 38 A-level students.
[Diagram of the heights, labelled GIRLS and BOYS]
1) How tall is the shortest person in the sample?
2) How many girls are in the sample?
3) What is the range of the boys' heights?
4) What is the median height of the girls?
5) What is the inter-quartile range of the boys' heights?

The Pie Charts show how Year 10 and 11 students travel to school. From the Pie Chart:
a) Can you tell if more Boys or Girls walk to school?
b) If the angle for walking in the girls' section is 18 degrees and represents 10 pupils, how many girls were surveyed?

This histogram illustrates the time students in a form group take to get to school in the morning.
a) Find the number of students in the class.
b) Estimate the probability that a randomly chosen pupil takes between 10 and 20 minutes to get to school.

Question 1
The table below shows the heights, to the nearest centimetre, of a group of students.

height (cm)   110-119  120-129  130-134  135-139  140-149  150-159  160-179  180-189
frequency     2        4        3        5        6        5        5        1

a) Draw a histogram for this data.
b) Use your histogram to estimate the number of students taller than 153 cm.
c) Estimate the number of students between 127 and 143 cm tall.

The class width of the first bar would appear to be 9, but it is not.
Because the heights are measured to the nearest centimetre, the first class embraces all heights between 109.5 cm and 119.5 cm. This is a class width of 10, and also involves labelling 109.5, 119.5 etc. on the horizontal axis of the histogram.

Adding the frequency density row to the table...

height (cm)         110-119  120-129  130-134  135-139  140-149  150-159  160-179  180-189
frequency           2        4        3        5        6        5        5        1
frequency density   0.2      0.4      0.6      1        0.6      0.5      0.25     0.1

[Histogram: frequency density (0 to 1.0) against height (cm), class boundaries from 109.5 to 189.5, with a line marked at 153]

b) To find how many students are above 153 cm in height, we add the frequencies of the last two bars to the correct proportion of the previous bar:
$\frac{6.5}{10} \times 5 + 5 + 1 = 9.25$
So there are approximately 9 students above 153 cm.

c) The number of students between 127 and 143 cm tall is given by:
$\frac{2.5}{10} \times 4 + 3 + 5 + \frac{3.5}{10} \times 6 = 11.1$
So there are approximately 11 students between 127 and 143 cm.

2) Complete the table and histogram below.

time (minutes)   frequency
0-15             90
15-20            40
20-25
25-35

Completed table:

time (minutes)   frequency   frequency density
0-15             90          6
15-20            40          8
20-25            80          16
25-35            100         10

For each of Bar Chart, Pie Chart, Stem and Leaf, Box and Whisker and Histogram, give the most suitable data type(s) (discrete or continuous, numerical or categorical), the advantages and the disadvantages.

Bar Chart
  Most suitable data type: Categorical, Discrete
  Advantages: Easy to see how many are in each category. Shows shape well.
  Disadvantages: Can't see proportions so easily.
Pie Chart
  Most suitable data type: Categorical, Discrete
  Advantages: Shows proportions clearly.
  Disadvantages: Can't see how many are in each category.
  Not good if there are too many categories.
Stem and Leaf
  Most suitable data type: Numerical; small data sets, continuous or discrete
  Advantages: Keeps the raw data. Shape of data clear. Ordered data helps with medians etc.
  Disadvantages: Not good for large data sets.
Box and Whisker
  Most suitable data type: Numerical, Continuous data
  Advantages: Good for showing/comparing the spread of data.
  Disadvantages: Loses raw data.
Histogram
  Most suitable data type: Numerical, Continuous data
  Advantages: Good for showing the shape of the data and the proportions.
  Disadvantages: Can't read actual frequencies for the groups easily.

Lesson Objective
Be able to calculate measures of Location/Averages
Understand summation notation for the mean

What is an average and why do we have more than one way of calculating them? These quotes might help you consider the answer to this question:

"Say you were standing with one foot in the oven and one foot in an ice bucket. According to the percentage people, you should be perfectly comfortable." ~Bobby Bragan, 1963
"The average human has one breast and one testicle." ~Des McHale
"I abhor averages. I like the individual case. A man may have six meals one day and none the next, making an average of three meals per day, but that is not a good way to live." ~Louis D.
Brandeis

Averages for raw/untabulated data
The data shows the number of chocolates gratefully provided to a particular maths teacher by his sixth form classes over a 3 week period. Find the mean, mode, median and mid-range of the number of gifts received:
1, 2, 0, 3, 5, 1, 2, 0, 0, 4, 1, 1, 2, 1, 3

In order: 0, 0, 0, 1, 1, 1, 1, 1, 2, 2, 2, 3, 3, 4, 5
Median: 1
Mode: 1
Mean: $\bar{x} = \frac{\sum xf}{\sum f} = \frac{26}{15} \approx 1.73$

Averages for tabulated data
Find the mean, mode, median and mid-range for this data, showing shoe size:

shoe size (x)                 5    6    7    8    9    10   Total
frequency (f)                 3    14   13   21   16   8    75
frequency × shoe size (fx)    15   84   91   168  144  80   582

Median: 75 items of data, so the median is at the (75 + 1)/2 = 38th position. Counting through the list, the median shoe size is 8.
Mode: 8
Mean: $\bar{x} = \frac{\sum xf}{\sum f} = \frac{582}{75} = 7.76$

Averages for tabulated data
Find the mean, mode, median and mid-range for this data, showing speeds of vehicles along a road:

speed, s (mph)   mid-point (x)   number of vehicles (f)   frequency × midpoint (fx)
20 ≤ s < 25      22.5            7                        157.5
25 ≤ s < 30      27.5            11                       302.5
30 ≤ s < 35      32.5            31                       1007.5
35 ≤ s < 40      37.5            20                       750
40 ≤ s < 45      42.5            14                       595
45 ≤ s < 50      47.5            9                        427.5
Total                            92                       3240

Median (an estimate; use a Cumulative Frequency Curve instead for better accuracy!): 92 items of data, so the median is at the (92 + 1)/2 = 46.5th position. Counting through the list, this will be in the 30 to 35 interval.
Modal interval: 30 ≤ s < 35
Mean: can only be estimated, as the raw data is not available.
$\bar{x} = \frac{\sum xf}{\sum f} = \frac{3240}{92} \approx 35.22$ mph

Lesson Objective
Be able to calculate Interquartile Range for a list of data
Drawing and Interpreting Box and Whisker Plots
Understanding Skewness and identifying outliers

Two classes did a test (out of 100). Here are the results:
Class A: 50 82 40 51 45 50 48 49 47 10 43 58 56 52 39 16
Class B: 20 34 50 48 62 70 39 47 12 38 40
a) Find the median and interquartile range of the set of marks for each class.
b) Draw a box and whisker plot to compare the results for each class.

In order:
Class A: 10 16 39 40 43 45 47 48 49 50 50 51 52 56 58 82
Class B: 12 20 34 38 39 40 47 48 50 62 70

[Box and whisker plots for CLASS A and CLASS B on a common scale from 0 to 100]
Class A: Median 48.5, IQ Range = 10, Negatively Skewed
Class B: Median 40, IQ Range = 16, Positively Skewed

A piece of data is generally considered an outlier if it is more than:
1.5 × IQR below the lower quartile OR
1.5 × IQR above the upper quartile
Are there any outliers in each class?

Design a data set for one of the box and whisker charts on the next page. Swap with a partner. They must design a data set to recreate your graph as best as possible. Compare at the end.

Lesson Objective
Be able to calculate the Standard Deviation for a set of data
Use calculator to find the Standard Deviation for a set of data

Write down some statements to compare these two sets of data. Which features are the same and which are different?
Here is the actual data. How does this clash with your previous assumptions?
Consider the following sets of numbers.
Find the Range, the Interquartile Range and the Mean. What are the limitations of the Range and the Interquartile Range in measuring consistency in a data set?
4, 5, 9, 6, 6, 10, 10, 10, 11, 19

The Root Mean Squared Deviation (commonly called the Standard Deviation of a Sample)
R.M.S.D. = $\sqrt{\frac{\sum (x - \bar{x})^2}{n}}$
The value of this expression before you square root is referred to as the VARIANCE.

The Standard Deviation for a Population
It can be shown that the Root Mean Squared Deviation formula, when calculated on a sample taken from a population, generally produces a result that is lower than the actual Standard Deviation of the Population (this is S3 + S4). The formula can therefore be adjusted as follows to take this into account:
S.D. of Population = $\sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$
The value of this expression before you square root is still referred to as the VARIANCE.

NOTE: FOR OUR SYLLABUS IT IS EXPECTED THAT YOU WILL ALWAYS USE THE BOTTOM FORMULA WHEN ASKED TO CALCULATE STANDARD DEVIATION!!

Find the standard deviation for this set of data.

Lesson Objective
Understand the concept of 'Coding'
Be able to find the mean and standard deviation of 'coded' data and related data sets

Here is some data. We will call this data the 'x' data:
Find the mean and the standard deviation of this data. Check your results on your calculator.

Investigation
Suppose you multiply each of the data you just used by 2 and add 3. Write down the new set of data. Call it the y-data.
Now calculate the mean and the standard deviation of the y-data.
What do you notice? How is it related to the original x-data?
What if you multiply it by 2 and add 5? What if you multiply by 3 and add 5?
Can you predict what will happen if you multiply by 'a' and add 'b'? Can you justify your results?

Suppose you have a set of values (x-data) x1, x2, x3, x4, x5 ……….
Let the mean of the set of data be 'm' and the standard deviation 's'.
Let another set of values (y-data) be related to the x-data by a linear formula of the form $y_i = a \times x_i + b$ ('a' and 'b' are constants).
Then:
The mean of the y values = a × mean of 'x-data' + b
The standard deviation of the y values = a × standard deviation of 'x-data' (for a negative 'a', use the size of 'a', since a standard deviation cannot be negative)

We can use this to find the mean of related sets of data. This process is called 'Coding'.

Eg Consider the values 1002, 1004, 1006, 1008, 1010.
This data set is merely the data set 1, 2, 3, 4, 5 multiplied by 2 and with 1000 added.
The mean of 1, 2, 3, 4, 5 is 3 and the sd of 1, 2, 3, 4, 5 is 1.58,
so the mean of the original data is 2 × 3 + 1000 = 1006
and the sd of the original data is 2 × 1.58 = 3.16

Ex 50 Book S1 Third Edition

Lesson Objective
Recognise and be able to use the alternative formula for standard deviation.

75 adults were asked their shoe size. The results are recorded in the table below.
Calculate the standard deviation in the shoe sizes using the formula:
$\sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$
Check your result using your calculator.

shoe size (x)      5        6        7       8       9        10       Total
frequency (f)      3        14       13      21      16       8        75
x × f              15       84       91      168     144      80       582
(x - x̄)² × f      22.8528  43.3664  7.5088  1.2096  24.6016  40.1408  139.68

Mean = 582 ÷ 75 = 7.76
sd = √(139.68 ÷ 74) = 1.37

An alternative form (a rearrangement) of the formula $\sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$ is:
$\sqrt{\frac{\sum x^2 - n\bar{x}^2}{n - 1}}$
This gives the same answer but is slightly easier to use when the data is in a frequency table:

shoe size (x)      5    6    7    8     9     10   Total
frequency (f)      3    14   13   21    16    8    75
x × f              15   84   91   168   144   80   582
x² × f             75   504  637  1344  1296  800  4656

Mean = 582 ÷ 75 = 7.76
sd = $\sqrt{\frac{4656 - 75 \times 7.76^2}{74}}$ = 1.37

50 female students had their heights measured. The results were put into the table below. Find the mean height and the standard deviation in the heights. Check your result using your calculator.

mid-point of height, h (cm)   158.5  160.5  162.5  164.5  166.5  168.5  Total
frequency (f)                 4      11     19     8      5      3      50

Mean = 162.82 cm
sd = 2.57 cm

Different style of exam question
Standard deviation formulae:
$\sqrt{\frac{\sum x^2 - n\bar{x}^2}{n - 1}} = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$
Given the following information relating to data placed in a frequency distribution.
Find the mean and the standard deviation of the data.
Mean = 6.1
sd = 2.25 (3 sig fig)

Lesson Objective
Understand what cumulative frequency curves represent
Be able to draw a cumulative frequency curve
Use a cumulative frequency curve to find medians, quartiles and percentiles

An egg farmer wants to grade his eggs in terms of size. Grade A will be the biggest size of egg, Grade B the next biggest etc, with Grade D the smallest. Each grading should contain the same proportion of eggs. The table shows the weight of his first batch of eggs. What 'boundaries' should he choose for each egg Grade?

Weight of the Egg, w (grams)   Frequency
30 ≤ w < 40                    15
40 ≤ w < 50                    25
50 ≤ w < 60                    50
60 ≤ w < 70                    40
70 ≤ w < 80                    10

Weight of the Egg, w (grams)   Cum. Freq.
0 ≤ w < 40                     15
0 ≤ w < 50                     40
0 ≤ w < 60                     90
0 ≤ w < 70                     130
0 ≤ w < 80                     140

With 140 eggs, the quartiles lie at roughly the 35th (LQ), 70th (MEDIAN) and 105th (UQ) eggs.
LQ could be found by saying 40 + 20/25 of 10 = 48
MEDIAN: 50 + 30/50 of 10 = 56
UQ: 60 + 15/40 of 10 = 63.75
But this approach assumes a linear growth in the frequency across each interval.

[Cumulative frequency graph: cumulative frequency (0 to 140) against weight (30 to 80 grams), with a blank cumulative frequency table to complete]
a) How many eggs did the farmer harvest on this particular day?
b) Estimate the Median weight of the eggs collected.
c) Estimate the Inter-quartile range in the Eggs collected.

Graph shows how long people waited to be seen at an eye clinic.

Key Points:
You plot Cumulative Frequency at the end of the interval: (35,10), (40,21) etc

Waiting Time   Cum. Freq.
0 ≤ w < 35     10
0 ≤ w < 40     21
0 ≤ w < 45     46
0 ≤ w < 50     73
……etc.         …etc

There were 100 people.
The median waiting time is that obtained by the 50th person (half of 100) = 46 mins.
Cumulative Frequency goes up the side.
A Cumulative frequency graph tells you how many items are below each value. Here 80 people waited for less than 53 mins. It is mainly used to estimate medians and percentiles for grouped data.
To find the Upper quartile, read the time at 75. For the lower quartile read the time at 25.
Horizontal axis has a continuous scale.
[Cumulative frequency graph: cumulative frequency (0 to 100) against time in mins (30 to 60)]

Can you find data sets to match these cumulative frequency curves?

Summary of what we have learned:
When comparing data we are interested in the location of the data (averages), the consistency of the data (measures of spread) and the shape of the data (graphs).

Averages: A single item of data that represents the whole data set
Mean, Mode, Median, Mid Range

Spread: Range, Interquartile Range, Root Mean Squared Deviation, Standard Deviation

Shape: Bar Charts, Frequency Charts, Histograms, Frequency Polygons
Can also draw:
Box and Whisker Plots (Good for showing skewness and spread)
Pie Charts (Good for showing proportions)
Cumulative Frequency Curves (Good for finding Interquartile Range for grouped data)

Standard deviation formulae:
$\sqrt{\frac{\sum x^2 - n\bar{x}^2}{n - 1}} = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$

Root Mean Squared Deviation formulae:
$\sqrt{\frac{\sum x^2}{n} - \bar{x}^2} = \sqrt{\frac{\sum (x - \bar{x})^2}{n}}$

The formula for the Variance is that for standard deviation without the square root.

Outliers are defined as being either:
more than 1.5 × IQR above the UQ or below the LQ
or more than 2 standard deviations above or below the mean
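The summary formulae can be checked against the worked examples from the earlier slides. The sketch below is one possible implementation, not the syllabus method: the function names are hypothetical, the quartiles are taken as the medians of the lower and upper halves of the ordered data (an assumption, chosen because it reproduces the IQ Ranges of 10 and 16 quoted for Class A and Class B), and only the 1.5 × IQR outlier rule is shown.

```python
from math import sqrt


def spread_from_table(values, freqs, use_n_minus_1=True):
    """sqrt((sum of f*x^2 - n * mean^2) / d) for a frequency table.

    d = n - 1 gives the standard deviation,
    d = n gives the root mean squared deviation.
    """
    n = sum(freqs)
    sum_fx = sum(f * x for x, f in zip(values, freqs))
    sum_fx2 = sum(f * x * x for x, f in zip(values, freqs))
    mean = sum_fx / n
    d = n - 1 if use_n_minus_1 else n
    return sqrt((sum_fx2 - n * mean ** 2) / d)


def median(ordered):
    """Median of a list that is already in ascending order."""
    n = len(ordered)
    mid = n // 2
    if n % 2:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def quartiles(data):
    """Lower and upper quartiles as medians of the two halves
    (the middle value is left out when the count is odd)."""
    ordered = sorted(data)
    half = len(ordered) // 2
    return median(ordered[:half]), median(ordered[-half:])


def iqr_outliers(data):
    """Values more than 1.5 x IQR below the LQ or above the UQ."""
    lq, uq = quartiles(data)
    fence = 1.5 * (uq - lq)
    return [x for x in data if x < lq - fence or x > uq + fence]


# Shoe size table from the standard deviation slides.
shoe_sizes = [5, 6, 7, 8, 9, 10]
shoe_freqs = [3, 14, 13, 21, 16, 8]
print(round(spread_from_table(shoe_sizes, shoe_freqs), 2))         # 1.37
print(round(spread_from_table(shoe_sizes, shoe_freqs, False), 2))  # 1.36

# Ordered test marks from the box and whisker slides.
class_a = [10, 16, 39, 40, 43, 45, 47, 48, 49, 50, 50, 51, 52, 56, 58, 82]
class_b = [12, 20, 34, 38, 39, 40, 47, 48, 50, 62, 70]
print(iqr_outliers(class_a))  # [10, 16, 82]
print(iqr_outliers(class_b))  # []
```

With this quartile convention, Class A has fences at 26.5 and 66.5, so 10, 16 and 82 are flagged, while Class B has fences at 10 and 74 and no outliers. A calculator or software package that uses a different quartile convention may place the fences slightly differently.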