Curves and Histograms, The Normal Distribution, The 68–95–99.7 Rule, Two Variable Statistics

Mathematics 1000, Winter 2008
Lecture 4
Sheng Zhang
Department of Mathematics
Wayne State University
January 16, 2008

Announcement

Monday is Martin Luther King Day: NO CLASS.

Today’s Topics

1 Curves and Histograms
2 The Normal Distribution
3 The 68–95–99.7 Rule
4 Two Variable Statistics
    Scatterplots

From histograms to curves

A histogram consists of several rectangles, each of the same width, all based on a line. The height of each rectangle represents the number of data points in a given range.

For large data sets, we can have many rectangles, each quite narrow. If they are narrow enough, the histogram smoothes out, and the top of it resembles a curve, not merely a jagged line.

[Figure: Number of finishers per hour, New York Marathon, 2005. Histogram with one-hour bins from 2–2:59 to 8–8:59.]

[Figure: Number of finishers per half-hour, New York Marathon, 2005. Histogram with half-hour bins from 2 to 8 hours.]

[Figure: Number of finishers per ten minutes, New York Marathon, 2005. Histogram with ten-minute bins from 2–2:09 to 8–8:09.]

[Figure: New York Marathon results, five minute by five minute histogram.]

[Figure: Per capita GDP by country in 115 poor countries. Vertical axis: number of countries. Horizontal axis: per capita GDP in dollars.]

[Figure: Per capita GDP by country in 115 poor countries. Vertical axis: number of countries. Horizontal axis: per capita GDP in thousands of dollars.]

[Figure: Per capita GDP by country in 115 poor countries. Vertical axis: number of countries. Horizontal axis: per capita GDP in hundreds of dollars, ticks every 5.]

[Figure: Per capita GDP by country in 115 poor countries. Vertical axis: number of countries. Horizontal axis: per capita GDP in hundreds of dollars, ticks every 4.]

Interpreting curves

When a distribution is given by a curve, the rectangles have vanished. The areas below the curve represent the proportion of the data that lie within a given horizontal range.

Special types of curves

With a large number of data points, distributions tend to resemble curves. There are many possible curves that can serve as “models” for how the data should lie. We will only consider one of them.

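The idea from "Interpreting curves", that area stands for proportion, can be tried out on a histogram before the rectangles vanish. The sketch below is only an illustration: the counts are made-up numbers (not the 2005 marathon data), and the function name proportion_between is hypothetical. Because every rectangle has the same width, the share of the area in a range equals the share of the counts in that range.

```python
# Hypothetical finisher counts keyed by the hour in which the runner finished.
# These numbers are invented for illustration; they are NOT the 2005 results.
counts = {2: 500, 3: 8000, 4: 15000, 5: 9000, 6: 3000, 7: 400, 8: 100}


def proportion_between(counts, start, end):
    """Share of all data points whose bin label b satisfies start <= b < end."""
    total = sum(counts.values())
    inside = sum(n for b, n in counts.items() if start <= b < end)
    return inside / total


# Share of runners finishing in the 3-hour or 4-hour bins.
share = proportion_between(counts, 3, 5)
print(f"Share finishing between 3:00 and 4:59: {share:.1%}")
```

With a smooth curve the same question is answered by the area under the curve between the two endpoints, divided by the total area.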
[Figure: Example of a normal distribution.]

Properties of the normal distribution

    symmetric
    mean and median are the same
    quartiles lie about 2/3 of a standard deviation from the mean
    satisfies the 68–95–99.7 rule

The 68–95–99.7 Rule

In a normal distribution with mean x̄ and standard deviation s, approximately:

    68% of the observations lie between x̄ − s and x̄ + s
    95% of the observations lie between x̄ − 2s and x̄ + 2s
    99.7% of the observations lie between x̄ − 3s and x̄ + 3s

Application

ACT scores approximately follow a normal distribution with mean 20.8 and standard deviation 4.8.

This means that 68% of such scores lie between 16 (= 20.8 − 4.8) and 25.6 (= 20.8 + 4.8). So 32% lie outside that range.

Since the distribution is symmetric, roughly half of the 32% will lie below 16. That is, 16% of scores will lie below 16.

Exercise

If adult women’s heights are distributed normally with a mean of 64.5 inches and a standard deviation of 2.5 inches, what proportion of women will be under 59.5 inches tall?

Answer: 59.5 inches is 2 standard deviations below the mean (64.5 − 2 × 2.5 = 59.5). So 5% (= 100% − 95%) of women will have a height farther from the mean than that. Half of them will be short, and half will be tall. So 2.5% will be shorter than 59.5 inches.

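The rule and the two worked answers above can be checked numerically. This is a minimal sketch using NormalDist from Python's standard statistics module (Python 3.8 or later); the helper name share_within is an assumption made for this example, and the lecture itself relies on the rule and a calculator rather than on code.

```python
from statistics import NormalDist


def share_within(k):
    """Fraction of a normal distribution within k standard deviations of the mean."""
    z = NormalDist()  # standard normal: mean 0, standard deviation 1
    return z.cdf(k) - z.cdf(-k)


# The three numbers behind the 68-95-99.7 rule, as percentages.
for k in (1, 2, 3):
    print(f"within {k} s: {share_within(k) * 100:.1f}%")

# ACT application: mean 20.8, standard deviation 4.8, share of scores below 16.
act = NormalDist(mu=20.8, sigma=4.8)
act_below_16 = act.cdf(16)

# Heights exercise: mean 64.5 inches, standard deviation 2.5 inches.
heights = NormalDist(mu=64.5, sigma=2.5)
rule_estimate = (1 - 0.95) / 2      # answer obtained from the rule
exact_below = heights.cdf(59.5)     # value from the normal curve itself

# Upper quartile of the standard normal, in standard deviations above the mean.
upper_quartile = NormalDist().inv_cdf(0.75)

print(f"ACT below 16: {act_below_16:.3f}")
print(f"height below 59.5: rule {rule_estimate:.3f}, curve {exact_below:.4f}")
print(f"upper quartile: {upper_quartile:.3f} standard deviations")
```

The rule's percentages are rounded, so the curve's values differ slightly from the hand answers (for example, a little under 2.5% for the heights exercise). The quartile value also shows where the "about 2/3 of a standard deviation" property comes from.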
WSU application

If there are 10,000 women at this university, how many would we expect to be under 59.5 inches tall?

Answer: About 250, 2.5% of 10,000.

Calculators

The Chapter 6 material will require the calculator more intensively than the material we covered up to now.

If you need help with using the calculator, then you should be sure to get the one that we support.

The quiz instructors and the tutors in the Mathematics Resource Center may well be able to help you with calculators.

Some applications of two-variable statistics

    Do students who spend more time studying get better grades?
    Do people who smoke tend to die earlier?
    What is the relationship between the amount of carbon dioxide in the atmosphere and the average global temperature?

Two variable statistics

One variable statistics: Study shape, center, spread, look for outliers. Step 1: Draw a picture.

Two variable statistics: Study patterns, relationship (correlation and regression), look for outliers. Step 1: Draw a picture.

Overview of next four lectures

    Scatterplots (today)
    Regression lines
    Correlation and regression
    Interpretation

Example 1 from the text

    Student  Beers  BAC        Student  Beers  BAC
    1        5      0.10       9        3      0.02
    2        2      0.03       10       5      0.05
    3        9      0.19       11       4      0.07
    4        8      0.12       12       6      0.10
    5        3      0.04       13       5      0.085
    6        7      0.095      14       7      0.09
    7        3      0.07       15       1      0.01
    8        5      0.06       16       4      0.05

Scatterplots

We plot one variable on the horizontal axis, another on the vertical axis.

The variable on the horizontal axis is called the explanatory variable.

The variable on the vertical axis is called the response variable.

Key question: How much does the explanatory variable explain the response variable?

[Figure: Beer and Blood Alcohol. Scatterplot of the 16 students. Horizontal axis: beers, 0 to 10. Vertical axis: blood alcohol content, 0.00 to 0.20.]

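As a rough way to "draw a picture" of the Example 1 data without graphing software, the sketch below places each student on a coarse text grid, with beers (the explanatory variable) on the horizontal axis and BAC (the response variable) on the vertical axis. The function name text_scatter and the grid resolution are assumptions made for this illustration; students who share a grid cell appear as a single mark.

```python
# (beers, BAC) pairs for students 1 through 16, copied from Example 1.
data = [
    (5, 0.10), (2, 0.03), (9, 0.19), (8, 0.12),
    (3, 0.04), (7, 0.095), (3, 0.07), (5, 0.06),
    (3, 0.02), (5, 0.05), (4, 0.07), (6, 0.10),
    (5, 0.085), (7, 0.09), (1, 0.01), (4, 0.05),
]


def text_scatter(points, x_max=10, y_max=0.20, rows=10):
    """Return the lines of a text scatterplot; 'o' marks a cell holding a point."""
    grid = [[" "] * (x_max + 1) for _ in range(rows + 1)]
    for x, y in points:
        row = round(y / y_max * rows)  # row 0 is the bottom of the plot
        grid[rows - row][x] = "o"      # flip so that large y prints at the top
    lines = []
    for i, cells in enumerate(grid):
        label = (rows - i) * y_max / rows
        lines.append(f"{label:4.2f} |" + " ".join(cells))
    # Horizontal axis: last digit of each beer count, so columns stay aligned.
    lines.append("      " + " ".join(str(x % 10) for x in range(x_max + 1)))
    return lines


plot = text_scatter(data)
print("\n".join(plot))
```

Even at this resolution the marks drift upward from left to right: students who drank more beers tend to have higher blood alcohol content.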
State versus national standards

There is a national test to measure the proficiency of fourth graders in mathematics.

Most students are not proficient in mathematics according to this measure.

Each of the states has its own separate proficiency test.

Let’s compare how students did on their two tests.

[Figure: State versus national standards.]

Smoking and mortality

A much debated example: Does smoking cause health problems, or do people likely to have bad health just tend to smoke more?

Smoking and mortality scatterplot

[Figure: Mortality rate from coronary heart disease and number of cigarettes smoked per day, 40-49 year old males. Horizontal axis: cigarettes per day, 0 to 50. Vertical axis: death rate, 0 to 375. Source: 1969 Surgeon General’s Report.]