MAT 1000 Mathematics in Today's World

Last Time
We looked at the standard deviation, a measurement of the spread of a distribution. We introduced a special type of distribution, the normal distribution. These highly symmetric distributions are very common. We saw how, using only the mean and standard deviation, we can find the first and third quartiles of a normal distribution.

Today
Using the mean and standard deviation, we can find out much more about a normal distribution. In particular, we will be able to easily find any of the percentiles of the distribution. To do so, we need to find standard scores, or z-scores. First, we address the question of why normal distributions are so common.
Note: Today’s material is not in the textbook.

Examples of normal distributions
• Physical characteristics like height or weight.
• The annual returns on the S&P 500 over the last 50 years.
• Cars in the parking lot of a mall.
• How long it takes a kernel of popcorn to pop in the microwave.

Why are normal distributions so common?
Normal distributions are “bell” shaped, so most of the data is close to the center (the mean), and only rarely are there numbers far from the mean. We expect this distribution whenever there are many conflicting forces that tend to cancel each other out. This is the case in lots of situations.

Example: What forces can affect stock returns during a year? Usually lots of little things: new products, bad publicity, changing government regulations, even the weather. If we combine the returns of 500 companies, then all of these small factors tend to cancel out. This means the S&P 500 return will usually be close to the average.

Example: What forces determine a person’s height? Lots of them. There are genetic factors, but things like childhood nutrition or illness also play a role.
With lots of small forces that tend to conflict, it’s no surprise that most people are close to average height.

Isn’t it true that in any data set most of the data will be close to the mean? Absolutely not! Suppose 10 people take a test. Five score 0, and five score 100. The mean is 50, but nobody is close to that. Why do people believe that most of the data in a distribution should be close to the mean? Precisely because normal distributions are so common.

Percentiles
The median and the first and third quartiles are examples of what are called percentiles. For any number P between 0 and 100, we can find the Pth percentile of a distribution. By definition, P percent of the data is less than the Pth percentile. For example, Q1 could also be called the 25th percentile: 25% of the data is less than Q1.

Example: The heights of adult men in the US are normally distributed with mean 69.3 in. (5′ 9") and standard deviation 2.9 in. We will see that the 90th percentile of this distribution is 73.1 in. (6′ 1"). This tells us that a man who is 6′ 1" is taller than 90% of the men in the US.

Example: On the other hand, a man who is 66 in. tall (5′ 6") has a height equal to about the 14th percentile. So 14% of the adult men in the US are less than 5′ 6" tall.

Percentiles tell us what percent of the data is below a number. What if we want to know what percent is above that number?

Example: If 14% of adult men in the US are shorter than 5′ 6", what percent are taller than 5′ 6"? The percent of men shorter than 5′ 6" plus the percent of men taller than 5′ 6" adds up to 100%.

Why? Think of it this way: any man is either taller than 5′ 6" or he is not.
(With an accurate enough ruler we can assume no one is exactly 5′ 6".)

(% of men shorter than 5′ 6") + (% of men taller than 5′ 6") = 100%
14% + (% of men taller than 5′ 6") = 100%
% of men taller than 5′ 6" = 100% − 14%
% of men taller than 5′ 6" = 86%

Standard scores
For a data value in a normally distributed data set, we can find its percentile by first finding its standard score. Let’s call the data value $x_i$. With our usual notation $\bar{x}$ for the mean and $s$ for the standard deviation, the standard score (also called the z-score) is:

$$z = \frac{x_i - \bar{x}}{s}$$

Example: As I said earlier, the heights of adult men in the US are normally distributed with mean 69.3 in. (5′ 9") and standard deviation 2.9 in. What is the standard score for a man who is 73.1 in. (6′ 1") tall?

$$z = \frac{73.1 - 69.3}{2.9} \approx 1.31$$

Finding percentiles
Example: The standard score for a man who is 73.1 in. (6′ 1") tall is $z = 1.31$. Using the standard score we can consult a table to tell us the percentile. The table uses standard scores rounded to the nearest tenth, so we need to look up the percentile corresponding to $z = 1.3$.

Table of percentiles

Standard Score  Percentile    Standard Score  Percentile    Standard Score  Percentile
     -3.4          0.03            -1.1         13.57             1.2         88.49
     -3.3          0.05            -1.0         15.87             1.3         90.32
     -3.2          0.07            -0.9         18.41             1.4         91.92
     -3.1          0.10            -0.8         21.19             1.5         93.32
     -3.0          0.13            -0.7         24.20             1.6         94.52
     -2.9          0.19            -0.6         27.42             1.7         95.54
     -2.8          0.26            -0.5         30.85             1.8         96.41
     -2.7          0.35            -0.4         34.46             1.9         97.13
     -2.6          0.47            -0.3         38.21             2.0         97.73
     -2.5          0.62            -0.2         42.07             2.1         98.21
     -2.4          0.82            -0.1         46.02             2.2         98.61
     -2.3          1.07             0.0         50.00             2.3         98.93
     -2.2          1.39             0.1         53.98             2.4         99.18
     -2.1          1.79             0.2         57.93             2.5         99.38
     -2.0          2.27             0.3         61.79             2.6         99.53
     -1.9          2.87             0.4         65.54             2.7         99.65
     -1.8          3.59             0.5         69.15             2.8         99.74
     -1.7          4.46             0.6         72.58             2.9         99.81
     -1.6          5.48             0.7         75.80             3.0         99.87
     -1.5          6.68             0.8         78.81             3.1         99.90
     -1.4          8.08             0.9         81.59             3.2         99.93
     -1.3          9.68             1.0         84.13             3.3         99.95
     -1.2         11.51             1.1         86.43             3.4         99.97

Example: The percentile is about 90, so a height of 73.1 in. (6′ 1")
is the 90th percentile of the heights of American men. In other words, a man who is 6′ 1" is taller than 90% of the men in the US.

Note: Use the same table for standard scores from any normally distributed data set.

Example: Scores on the SAT math exam are normally distributed with a mean of 500 points and a standard deviation of 100 points. What percent of test takers score below 450? What percent are below 600? We need to find the percentiles. Start with the standard scores:

$$z = \frac{450 - 500}{100} = -0.5 \qquad z = \frac{600 - 500}{100} = 1$$

Now find the percentiles in the table of percentiles. The percentile corresponding to a standard score of −0.5 is 30.85, and the percentile corresponding to a standard score of 1 is 84.13. This means that (roughly) 31% of test takers score below 450 on the SAT math exam, and 84% score below 600.

What percent of test takers score between 450 and 600? We can find this using the fact that 31% of test takers are below 450 and 84% are below 600. Take the number of people who score below 600 and subtract the number who score below 450. The result is the number who score between 450 and 600.
The same is true for percentages:

(% below 600) − (% below 450) = (% between 450 and 600)

So the percent who score between 450 and 600 is:

84% − 31% = 53%

Comparing percentiles
Using the mean and standard deviation of a normal distribution, we can find the percentile of any data value from that distribution. Percentiles are also very useful for comparing data values from different distributions.

Is a 600 on the SAT math test better or worse than a 26 on the ACT math test? We can’t compare the numbers directly: the SAT is out of 800 and the ACT is out of 36. Instead of comparing the numbers, we compare these test scores using percentiles.

Scores on the SAT are normally distributed with a mean of 500 and standard deviation of 100. Scores on the ACT are normally distributed with a mean of 18 and standard deviation of 6. Find the standard scores:

$$\frac{600 - 500}{100} = 1 \qquad \frac{26 - 18}{6} = \frac{8}{6} \approx 1.3$$

From the table, a standard score of 1 is the 84th percentile, while a standard score of 1.3 is the 90th percentile. So a 26 on the ACT is better than a 600 on the SAT.

In what sense is it a better score? Percentiles describe these scores relative to all the other test takers. Scoring higher than 90% of the people who took a test is better than scoring higher than 84%.

Another normal distribution
One of the most important examples of a normal distribution is the sampling distribution of statistics. In a sample survey, we choose a sample and compute a statistic. A different sample would have given a different statistic. If we consider every possible sample, we would have a distribution of statistics (which numbers occur, and how often they occur).

It turns out that if our sample size is large enough, the distribution of statistics will be approximately normal. What is a large enough sample? A general rule of thumb is a sample size of 30.

Example: In 2012 Barack Obama won the presidential election with 51.1% of the vote.
In the run-up to the election, there were many polls of likely voters. These polls were producing statistics to estimate a parameter: the proportion of all voters who were going to vote for Obama. Now we know the value of this parameter: 51.1%.

Suppose a polling company sampled 100 voters before the election. It turns out that the distribution of statistics for a sample size of 100 is normal with mean 51.1% and standard deviation 5%. (We’ll see the formulas for these later in the course.) As decimals these are 0.511 and 0.05.

In what percent of samples of 100 would more than 50% of the sample support Obama? This is a question about percentiles. As a decimal, 50% is 0.5. Find the standard score and round it to the nearest tenth:

$$\frac{0.5 - 0.511}{0.05} = -0.22 \approx -0.2$$

From the table, the corresponding percentile is 42.07. This means that in 42.07% of samples of 100 voters, the proportion who supported Obama would have been less than 50%. To answer our question we must subtract the percentile from 100:

100 − 42.07 = 57.93

So, in about 58% of the possible samples of 100 voters, we would have seen more than 50% of the sample supporting Obama.

This graphic illustrates the idea of a sampling distribution of statistics (denoted $\hat{p}$). Imagine taking many samples of size 100 from a population with parameter 0.511.
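
The thought experiment above, taking many samples of size 100 from a population with parameter 0.511, can be tried out with a short Python sketch. This is only an illustration and not part of the lecture: the function names (standard_score, percentile_from_z, simulate_sample_proportions), the choice of 10,000 samples, and the fixed random seed are all assumptions made here. Instead of the printed table, the sketch uses the error function from Python's math module to compute normal percentiles, and it does not round the standard score to the nearest tenth, so its normal-curve answer will differ slightly from the 57.93% found from the table.

```python
import math
import random
import statistics


def standard_score(x, mean, sd):
    """Standard score (z-score): how many standard deviations x lies from the mean."""
    return (x - mean) / sd


def percentile_from_z(z):
    """Percent of a normal distribution lying below standard score z.

    Uses the error function rather than the lecture's table (an assumption
    of this sketch), so it also works for unrounded z values.
    """
    return 50.0 * (1.0 + math.erf(z / math.sqrt(2.0)))


def simulate_sample_proportions(p, sample_size, num_samples, seed=0):
    """Draw num_samples random samples and return each sample's proportion of supporters."""
    rng = random.Random(seed)  # fixed seed so repeated runs give the same samples
    proportions = []
    for _ in range(num_samples):
        # Each simulated voter supports the candidate with probability p.
        supporters = sum(1 for _ in range(sample_size) if rng.random() < p)
        proportions.append(supporters / sample_size)
    return proportions


# Many hypothetical polls of 100 voters from a population with parameter 0.511.
proportions = simulate_sample_proportions(p=0.511, sample_size=100, num_samples=10_000)

sim_mean = statistics.mean(proportions)
sim_sd = statistics.pstdev(proportions)
share_above_half = sum(1 for q in proportions if q > 0.5) / len(proportions)

# The lecture's normal-curve calculation, using mean 0.511 and standard deviation 0.05.
z = standard_score(0.5, mean=0.511, sd=0.05)
normal_share_above_half = (100.0 - percentile_from_z(z)) / 100.0

print(f"mean of simulated sample proportions: {sim_mean:.3f}")
print(f"standard deviation of simulated sample proportions: {sim_sd:.3f}")
print(f"share of simulated samples above 50%: {share_above_half:.3f}")
print(f"share above 50% from the normal curve: {normal_share_above_half:.3f}")
```

The simulated mean and standard deviation should land close to 0.511 and 0.05. The simulated share of samples above 50% tends to come out somewhat lower than the normal-curve figure, because with only 100 voters the sample proportion moves in steps of 0.01 and samples that land exactly on 50% are not counted as "more than 50%".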