3/14/2013

Data and Variation

Ways to Represent Data…
There are quite a few! Let's look at a few that we have seen, along with some that we have seen in previous years.
• Pie Charts (We will not cover these!)
• Venn Diagrams
• Raw Data
• Here are all the first quiz scores for the 200 students enrolled in Algebra I. How'd they do?
• Put them in order. How'd they do?
(These were in day 4 packet.)
• Frequency Histogram: How'd they do?
• Stem-and-Leaf Plot: How'd they do?
• Same data, different interval widths: How'd they do?

Measures of Central Tendency
• What is the "average" versus the average?
• Average can mean different things!
– MEAN: the sum of all the data values divided by the number of values
– MEDIAN: the data point in the middle when a data set is ordered from lowest to highest
– MODE: the most commonly occurring data value(s)
How'd they do?

Variation
What do you see?
• 2000 Batting Averages
• Highest was 0.372
• 1920 Batting Averages
• Highest was over 0.400 and 2 players were in the 0.380s

• 2000 Batting Averages
• Not much variation in data
• 1920 Batting Averages
• More variation in data

Measuring Variation
• Five-Number Summary
– Minimum Value
– Maximum Value
– Median Value of all data
– Median of Bottom Half of Data (1st quartile, Q1)
– Median of Top Half of Data (3rd quartile, Q3)

Measuring Variation
Box and Whisker Plots
• Here is a plot of the exam data from before.
• Dots are outliers (values more than 1.5 times the distance from Q1 to Q3 below Q1 or above Q3).
• How'd they do?

Standard Deviation
• Calculate the mean.
• Find out how far each value is from the mean.
• How far on average is each value from the mean?
• Roughly speaking, this typical distance from the mean is the standard deviation.
Don't worry! You do NOT need to know this equation! I will show you how to find this using the graphing calculator!

Look back at our data…
• The standard deviation of 1920 batting averages is 0.050 and of 2000 batting averages is 0.038. A smaller standard deviation means the data are more tightly grouped.
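The five-number summary and the standard deviation steps above can be sketched in a few lines of Python. This is only an illustration, not the calculator procedure from class. The function names and the `scores` list are made up for the example, and two conventions are assumed: the middle value is left out of both halves when the number of data points is odd, and the standard deviation divides by the number of values (the "population" version). Graphing calculators typically also report a "sample" version that divides by one less, so their answer may differ slightly.

```python
import math


def median(sorted_values):
    """Middle value of an already-sorted list (mean of the two middle values if the count is even)."""
    n = len(sorted_values)
    mid = n // 2
    if n % 2 == 1:
        return sorted_values[mid]
    return (sorted_values[mid - 1] + sorted_values[mid]) / 2


def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) using the median-of-halves rule."""
    data = sorted(values)
    n = len(data)
    lower_half = data[: n // 2]        # bottom half (assumption: middle value excluded when n is odd)
    upper_half = data[(n + 1) // 2:]   # top half
    return (data[0], median(lower_half), median(data), median(upper_half), data[-1])


def standard_deviation(values):
    """Population standard deviation: root of the average squared distance from the mean."""
    mean = sum(values) / len(values)                 # step 1: calculate the mean
    squared = [(x - mean) ** 2 for x in values]      # step 2: squared distance of each value from the mean
    return math.sqrt(sum(squared) / len(values))     # step 3: average them, then take the square root


# Made-up scores for illustration only (not the class data).
scores = [2, 4, 4, 4, 5, 5, 7, 9]
print(five_number_summary(scores))  # (2, 4.0, 4.5, 6.0, 9)
print(standard_deviation(scores))   # 2.0
```

The squaring step is why a few outliers can raise the standard deviation so much: a value far from the mean contributes its distance squared.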
Look back at our data…
• The standard deviation of exam scores is 14.782. (The higher deviation is due to outliers and the skew of the data. Outliers affect the mean as well.)

Shapes of Graphs
• Graphs can be skewed in one direction or the other.
• Graphs of batting averages and height were symmetrical around the central value.
• Exam scores were not symmetrical, since most students scored high. This graph is skewed to the left (where the tail is). This is called a negative skew.
• A graph skewed to the right has its tail on the right side of the graph. This is called a positive skew.

• Housing Prices
• Skewed to the right.
• Mean is pulled in the direction of the skew relative to the median.
• Mean is HIGHER than median.

• Exam scores
• Data is skewed to the left.
• Mean is LOWER than median.

Example #3
• The following histogram shows the exam scores for 30 students in a freshman accounting class. Estimate the mean of these scores. Is the standard deviation of these scores likely to be closer to 12 or to 25?

Answer to Example #3
• The mean score is approximately 70. The standard deviation is more likely to be closer to 12: about half of the scores are within 10 of 70, and the other half are more than 10 but less than 30 away, so the typical distance from the mean should come out close to 12 rather than 25.

The Bell Curve
• Most famous of the shapes is the bell-shaped curve, aka normal curve, aka normal distribution, aka Gaussian distribution.
• Appears often in nature and in mathematics.
• Lots of formulas to describe it and analyze it.
• Let's look at some examples!

SAT Scores
• What do you see?
• Bimodal distribution – often seen in test scores: students who know what they are doing come exam time and students who do NOT know.

Why should we expect bells?
• Around the mean, there should be an expected amount of variation above and below. The larger a deviation from the mean, the less likely it is.
Thus we have a cluster in the middle and approximately the same number of values at the high and low ends.

Normal Curves and Standard Deviation
• About 68% of the data differ from the mean by less than one standard deviation.
• About 95% of the data differ from the mean by less than two standard deviations.
• About 99.7% of the data differ from the mean by less than three standard deviations.
***You MUST memorize this chart!!!

Example #1
• All freshmen entering NHS have their heads measured for the beanies they are required to wear. One year the head circumference data had a normal distribution with mean 55 cm and standard deviation 1.7 cm. What percentage of the students that year had a head circumference between 53.3 cm and 56.7 cm? What percentage had a circumference above 58.4 cm?

Example #2
• The average high temperature in Anchorage, Alaska, in January is 21°F with a standard deviation of 10°. The average high temperature in Honolulu in January is 80°F with a standard deviation of 8°. In which location would it be more unusual to have a day in January with a high of 57°F?

Answer to Example #1
• For data with a normal distribution, about 68% of the values differ from the mean by less than one standard deviation. The normally distributed head measurements have mean 55 cm and standard deviation 1.7 cm, so heads within one standard deviation of the mean will measure between 55 - 1.7 = 53.3 cm and 55 + 1.7 = 56.7 cm. Thus approximately 68% of the freshmen have head circumferences between 53.3 and 56.7 cm.
• For the second question, a head measuring more than 58.4 cm is more than 3.4 cm, or two standard deviations, above the mean. Recall that approximately 95% of the values in a normal distribution are within two standard deviations of the mean, so only about 5% lie outside those limits, split evenly between the high and low ends. Thus roughly 5%/2 = 2.5% of the freshmen will have head circumferences measuring more than 58.4 cm.

Answer to Example #2
• A January temperature of 57° would be more unusual in Anchorage.
This temperature is within three standard deviations (3 * 8° = 24°) of the mean (80°) in Honolulu, since it is 80° - 57° = 23° below the mean, but it is outside the range of three standard deviations (3 * 10° = 30°) of the mean (21°) in Anchorage, since it is 57° - 21° = 36° above the mean.
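The answers to Examples #1 and #2 can be checked with a short Python sketch. It is offered only as a cross-check on the memorized 68/95/99.7 chart, not as the method expected in class. It assumes Python 3.8 or later, where the standard library provides `statistics.NormalDist`; the `z_score` helper and the variable names are made up for this illustration.

```python
from statistics import NormalDist

# Example #1: head circumference, normal with mean 55 cm and standard deviation 1.7 cm.
heads = NormalDist(mu=55, sigma=1.7)
within_one_sd = heads.cdf(56.7) - heads.cdf(53.3)   # share between 53.3 cm and 56.7 cm
above_two_sd = 1 - heads.cdf(58.4)                  # share above 58.4 cm

print(round(within_one_sd, 3))  # 0.683
print(round(above_two_sd, 3))   # 0.023


def z_score(value, mean, sd):
    """How many standard deviations a value lies above (+) or below (-) the mean."""
    return (value - mean) / sd


# Example #2: a January high of 57 degrees F in each city.
z_anchorage = z_score(57, 21, 10)
z_honolulu = z_score(57, 80, 8)

print(z_anchorage)  # 3.6
print(z_honolulu)   # -2.875
```

The computed share above 58.4 cm is a little under the 2.5% from the chart because "95% within two standard deviations" is itself rounded. The z-scores show the same conclusion as the answer above: 57° is more than three standard deviations from the mean in Anchorage and fewer than three in Honolulu.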