Math 3307 Module 1: Representations of Data
Descriptive Statistics: Central Tendency, Spread, Fractiles, Rates of change, Z score
Representations: Dot diagrams, Charts, Stem and Leaf Plots, Box and Whisker Plots, Scatter Plots

Descriptive Statistics
Descriptive statistics take raw data and present it in a way that highlights the important material WITHOUT drawing inferences or generalizations for the viewer. Predictive statistics provide a means to make a judgment, a prediction, or an inference about a situation.

For example, taking the 2010 census data, noting that the population of the United States is now 307,006,550 people, and reporting this is descriptive. Even noting that in 2000 the population was 281,421,906 and that the new number is 1.091 times the 2000 population is descriptive. Taking these numbers, plotting a linear regression line, and predicting the population of the USA in 2015 is NOT descriptive; it is making an estimate, a prediction. Making generalizations is NOT descriptive either. If you come to a generalization about a situation, you are out of the area of descriptive statistics.

Problem DS1
Which of the following conclusions may be obtained from the following data by purely descriptive methods, and which require generalizations? A student in my Spring Pre-calculus class took 4 consecutive daily quizzes and got the following scores: 3, 8, 10, and 12.
a.) On only 1 day did he get fewer than 5 right.
b.) The student's number correct increased on each successive quiz.
c.) The student got better at guessing what I was going to ask each day.
d.) On the last day the student copied his answers from his neighbor.

Problem DS2
Smith and Jones are hairdressers. On a recent day, Smith cut the hair of 4 male clients and 2 female clients, while Jones cut the hair of 3 male and 3 female clients.
a.) The amount of time it takes Smith and Jones to do a haircut is approximately the same.
b.) Smith always has more male clients than female clients.
c.) The two always have the same number of clients per day.
d.) Over a week, Smith averages 6 clients a day.

Problem DS3
Which of the following conclusions can be obtained by descriptive methods, and which require generalizations? Driving the same model of car, 5 different drivers averaged 15.5, 14.7, 16.0, 15.5, and 14.8 mpg.
a.) None of the drivers averaged more than 16 mpg.
b.) The second driver must have driven on rural roads.
c.) 15.5 is the average mpg most often achieved.
d.) The third driver drove faster than the other 4.

Typical types of summaries of data
Measures of Central Tendency: these are the numbers that describe what is normal, usual, in the middle, or at the center. These terms are very loose and, of course, need firming up mathematically.

The most popular measure of "centeredness" is the Mean (sometimes called the average). The mean of n numbers is the sum of the numbers divided by n. If you are working with a data set of measurements, the mean is denoted \(\bar{x}\): \(\bar{x} = \frac{\sum x}{n}\).

There are some very cogent reasons for its popularity:
- It can always be calculated, and it is easy to calculate.
- It is unique: there is only ONE mean for a data set.
- It uses EVERY data point; nothing is eliminated.
- It doesn't depend on chance or luck.

There are some equally important reasons to take the mean with a grain of salt:
- It is heavily affected by outliers! Recall the data on the number of pets owned by the 3307 population!

Problem CT1
An elevator in PGH is designed to carry a maximum load of 3,200 pounds. If it is loaded with 18 people with a mean weight of 166 pounds, is it in any danger of being overloaded?

Weighted Mean
Sometimes the data points are not "equal" in weight, meaning some have more importance than others. For example, in my Math 3379 class there are 4 papers; the first 3 are each worth 10% of the grade and the fourth is a term paper worth 20% of the grade.
In order to calculate a student's average on this 50% of the course grade, I would add the 3 paper grades to TWICE the term paper grade and divide by 5. Note that you use "proportional" multiplication to even things up! In general, if measurement \(x_i\) has weight \(w_i\), the weighted mean is \(\bar{x}_w = \frac{\sum w_i x_i}{\sum w_i}\).

Problem CT2
Having received a bonus of $20,000 for accepting early retirement, a company's sales representative invested $6,000 in a bond paying 3.75%, $10,000 in a mutual fund paying 3.96%, and $4,000 in a CD paying 3.25%. Find the weighted mean of these percentages.

Problem CT3
A lecturer counts the final exam in a course 4 times as much as each of the 3 small exams during the semester. Which of the following students has the higher average?

| Exam | Mikey | Lizbeth |
|---|---|---|
| Test 1 | 72 | 81 |
| Test 2 | 80 | 87 |
| Test 3 | 65 | 75 |
| Final | 82 | 78 |

Problem CT4
A home appliance store has the following inventory:

| Refrigerator | # in stock | Size (cu ft) | Price ($) |
|---|---|---|---|
| A | 18 | 15 | 416 |
| B | 12 | 21 | 549 |
| C | 9 | 19 | 649 |
| D | 14 | 21 | 716 |
| E | 25 | 24 | 799 |

A. What is the average size of these refrigerators?
B. What will the average income per unit be if they sell them all?
C. What is the average price for a refrigerator?

Another measure of central tendency is the Median. With the data arranged in order by size, the median is the value at the numerical middle of the data if there is an odd number of data points; if the number of data points is even, it is the mean of the 2 middle data points. The formula for finding the location of the median for n data points is 0.5(n + 1). The process is to order the data and then find the measurement at that location.

Problem CT5
In golf the holes are rated for a recommended number of strokes needed to sink the golf ball into the hole. A score of par means the golfer used the recommended number, a birdie is one fewer than recommended, a bogey is one more than recommended, and an eagle is 2 fewer strokes. At a recent televised tournament, 7 golfers had the following scores, listed alphabetically by last name: par, birdie, par, par, birdie, bogey, and eagle.
What was the median score?

Problem CT6
Find the median location for:
A. n = 19
B. n = 52

The final measure of central tendency is the Mode. This is the measurement that occurs most frequently in a data set.

Problem CT7
What is the mode for the data in Problem CT5?

Problem CT8
Which of the following bars shows the mode in this histogram?
[Figure: histogram "Age and saying No": number of No's per hour (vertical axis, 0 to 6) vs. Age (horizontal axis, 1 to 6)]

Relationships among Mean, Median, and Mode:

Problem CT 9
The table gives the frequency of each x-axis value for three distributions: STTR (skewed to the right), STTL (skewed to the left), and Symm (symmetric).

| x | STTR | STTL | Symm |
|---|---|---|---|
| 1 | 1 | 1 | 1 |
| 2 | 2 | 2 | 2 |
| 3 | 4 | 3 | 3 |
| 4 | 5 | 4 | 4 |
| 5 | 4 | 5 | 5 |
| 6 | 3 | 6 | 5 |
| 7 | 2 | 8 | 4 |
| 8 | 2 | 5 | 3 |
| 9 | 1 | 4 | 2 |
| 10 | 1 | 3 | 1 |

Calculate the mean, median, and mode for these 3 charts. Mark on the x-axis where each goes.
[Figures: bar charts "Skewed to the right", "Skewed to the left", and "Symmetric" of the frequencies above, for x = 1 to 10]
Summarize your results with a mnemonic device.

Which measurement is more sensitive to outliers, the mean or the median? What does it mean to say "more sensitive"? Discuss this idea using the salaries of baseball players.

Problem CT 10
The data shown in the table are the median prices of existing homes in the USA from 1981 through 1986. If the average (mean) prices of existing homes were calculated for each of these years, how do you think these values would compare to the median prices shown? Would the average price be higher, lower, or the same?

| Year | 1981 | 1982 | 1983 | 1984 | 1985 | 1986 |
|---|---|---|---|---|---|---|
| Median price ($) | 66,460 | 67,800 | 70,300 | 72,400 | 75,500 | 80,300 |

Problem CT 11

| Car | Trial 1 | Trial 2 | Trial 3 | Trial 4 | Trial 5 |
|---|---|---|---|---|---|
| Car A | 27.9 | 30.4 | 30.6 | 31.4 | 31.7 |
| Car B | 31.2 | 28.7 | 31.3 | 28.7 | 31.3 |
| Car C | 28.6 | 29.1 | 28.5 | 32.1 | 29.7 |

The table above gives mileage data for 3 compact cars on 5 trials each. Each car was manufactured by a different car company. If the manufacturers of Car A want to advertise how fuel efficient their car is, what statistics might they use to substantiate their claim?
If the manufacturers of Car B want to advertise how fuel efficient their car is, what statistics might they use to substantiate their claim? What about the maker of Car C?

Measures of Variability
A measure of variability is a number that describes the spread or the variety of measurements in a data set.

The range of a data set is equal to the largest measurement minus the smallest measurement.

The sample variance for n data points is calculated with the following formula:
\[ s^2 = \frac{\sum (x - \bar{x})^2}{n - 1} \]
First calculate the sample mean, then subtract the mean from each measurement individually and square the answer. Add up all the squares and divide by n − 1.

The standard deviation for a set of data is the square root of the variance: \(s = \sqrt{s^2}\).

Problem MV 1
Calculate the mean for each sample below. Calculate the variance for each sample. Discuss the information available in the variance.
[Figure: two bar charts, each labeled N = 5, showing the two samples at positions 1 to 5]

Problem MV 2
Here is a data set: (8, 2, 2, 7, 4, 6, 5, 3, 4). Describe this data set using the mean, median, mode, range, and standard deviation.

Problem MV 3
Three sets of data are shown below. How many data points are in each set? What is the mean for each set (do this WITHOUT a calculator!)? Rank the sets from the most variable to the least variable and tell why you made those choices (again: calculator free). Hint: use the formula for variance to help you reason it out!
\[ s^2 = \frac{\sum (x - \bar{x})^2}{n - 1} \]
[Figures: frequency bar charts for Data Set 1, Data Set 2, and Data Set 3 (Frequency vs. Measurement, 1 to 11)]

Problem MV 4
Consider the following 2 samples:
Sample A: 10, 0, 1, 9, 10, 0
Sample B: 0, 5, 10, 5, 5, 5
Describe these data sets using the mean, median, mode, range, and variance. Which statistics are the same and which are different?
Which data set is more variable, and why? Which is the better predictor of variability: range or variance?

Grouped Data for Variance calculations
If \(f_i\) is the frequency of the measurement \(x_i\), then the following formula calculates the variance for the data:
\[ s^2 = \frac{\sum_{i=1}^{k} f_i (x_i - \bar{x})^2}{n - 1} \]
where the sum runs over the k distinct measurements, \(n = \sum f_i\) is the total number of data points, and \(\bar{x} = \frac{\sum f_i x_i}{n}\).

Problem MV 5
The data in the following table are the inner diameters of some tubes manufactured by a machine. This table is called a "distribution" because it gives the values and their frequencies. Find the mean diameter and the variance for the tubes.

| Diameter (inches) | Frequency |
|---|---|
| 2.0 | 2 |
| 2.2 | 4 |
| 2.3 | 6 |
| 2.8 | 3 |
| 3.0 | 5 |

Problem MV 7
The following table is a distribution of the top speeds, in mph, at which 30 racers were clocked in an auto race. Find the mean and variance for the race.

| Top speed (mph) | Number of racers |
|---|---|
| 145 | 9 |
| 150 | 8 |
| 160 | 11 |
| 170 | 2 |

Fractiles and Percentages
A fractile ranking means that a given number of measurements lie below the given measurement and a given number lie above it. Suppose your child comes home to tell you that she's in the 90th percentile of her class on a particular test. This means that 90% of the children have lower scores than hers or the same score, and 10% have higher scores.

You do need to be a little careful with these measures of relative ranking, though. It could be that 91% of the children failed the test and 9% passed. In this scenario, of course, being in the 90th percentile isn't much to brag about. You need absolute measures AND relative measures to evaluate a situation involving fractiles.

Deciles divide the measurements into tenths and quartiles divide the measurements into quarters. The median is both a decile (the 5th) and a quartile (the 2nd). Let's look at quartiles:
Q1 is the median of all measurements less than the median of the data set.
Q3 is the median of all measurements greater than the median of the data set.
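The central tendency and variability measures defined so far can be checked by machine. Below is a minimal Python sketch (standard library only) that follows the definitions in these notes: the median is read off at location 0.5(n + 1), the sample variance divides by n − 1, grouped data use \(n = \sum f_i\), and Q1/Q3 are medians of the lower and upper halves. The function names are hypothetical, and for an odd number of data points the sketch assumes the median itself is left out of both halves (the notes don't say). The sample data are the six die tosses from the dot-diagram section; the paper grades at the end are made up for illustration.

```python
from collections import Counter
from math import sqrt

def mean(data):
    return sum(data) / len(data)

def weighted_mean(values, weights):
    """Sum of weight * value, divided by the sum of the weights."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)

def median(data):
    """Order the data and read the value at (1-based) location 0.5(n + 1).
    A location ending in .5 means: average the two neighbouring values."""
    xs = sorted(data)
    loc = 0.5 * (len(xs) + 1)
    i = int(loc)                      # whole-number part of the location
    if loc == i:
        return xs[i - 1]
    return (xs[i - 1] + xs[i]) / 2

def modes(data):
    """Every value that occurs most often (a data set can have more than one mode)."""
    counts = Counter(data)
    top = max(counts.values())
    return sorted(v for v, c in counts.items() if c == top)

def sample_variance(data):
    """s^2 = sum of (x - mean)^2, divided by n - 1."""
    xbar = mean(data)
    return sum((x - xbar) ** 2 for x in data) / (len(data) - 1)

def grouped_mean_variance(values, freqs):
    """Mean and sample variance from a frequency table; n is the total count sum(f)."""
    n = sum(freqs)
    xbar = sum(f * x for x, f in zip(values, freqs)) / n
    s2 = sum(f * (x - xbar) ** 2 for x, f in zip(values, freqs)) / (n - 1)
    return xbar, s2

def quartiles(data):
    """Q1 and Q3 as medians of the lower and upper halves of the ordered data.
    Assumption: with an odd count, the middle value belongs to neither half."""
    xs = sorted(data)
    half = len(xs) // 2
    return median(xs[:half]), median(xs[len(xs) - half:])

tosses = [1, 4, 5, 6, 1, 2]
print(round(mean(tosses), 2))                  # 3.17
print(median(tosses))                          # 3.0
print(modes(tosses))                           # [1]
print(max(tosses) - min(tosses))               # 5 (the range)
s2 = sample_variance(tosses)
print(round(s2, 3), round(sqrt(s2), 3))        # 4.567 2.137
print(quartiles(tosses))                       # (1, 5)

# The same tosses written as a frequency table give the same mean and variance.
xbar, s2_grouped = grouped_mean_variance([1, 2, 4, 5, 6], [2, 1, 1, 1, 1])
print(round(xbar, 2), round(s2_grouped, 3))    # 3.17 4.567

# Made-up grades: three papers at 10% each and a term paper at 20%.
print(weighted_mean([80, 90, 70, 85], [10, 10, 10, 20]))   # 82.0
```

Using the percentages 10, 10, 10, 20 as weights gives the same answer as the "3 grades plus TWICE the term paper, divided by 5" rule, since only the ratios of the weights matter.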
Problem FP 1
The 21 meetings of the West U Orchid Breeders club had the following attendances:
22, 24, 23, 24, 27, 25, 24, 19, 24, 26, 28, 32, 21, 24, 25, 23, 26, 25, 18, 24
Find all 3 measures of central tendency, Q1, Q3, and the standard deviation for the data set.

Problem FP 2
Find the positions of the median, Q1, and Q3 for:
A. n = 32
B. n = 35

Problem FP 3
The following numbers are weekly lumber production (in millions of board feet) for a company in Oregon. Find the first quartile and the 90th percentile for the data.
390 406 447 410 370 338 410 320 359 392 315 480

Percentage change in a measurement:
The percent change in a measurement is often of interest to managers, doctors, and teachers. It is used as a measure of efficacy. The calculation is
\[ \text{percent change} = \frac{\text{final} - \text{initial}}{\text{initial}} \times 100\% \]
Suppose you have a student who was reading poorly, at 15 words a minute. You train the student using your favorite method and test him again to find him reading 27 words a minute. The percent change is
\[ \frac{27 - 15}{15} = \frac{12}{15} = 0.80, \]
which is 80%. You would then report an 80% improvement in speed.

Problem PC 1
You've been looking at a sweater in the store, but it costs $135 and that's too much. BUT one day you go and check and it's been marked down to $65… what is the percent change?

Problem PC2
A student has been working with a tutor on his math skills. His weekly quiz average was 65% when he started with the help program. His quizzes are 30 points each. During the program his weekly grades are 20, 23, 21, 28, 27, 29. What is the percent change in his average? Would you say that the tutoring helped?

Z score
Z scores are used on data that are collected from populations that have a normal distribution for the property under scrutiny. A z-score tells you how far from the mean a particular measurement is, measured in standard deviations. A z-score is calculated with the following formula:
\[ z = \frac{x - \mu}{\sigma} \]
where x is a particular data measurement, and μ and σ stand for the mean and standard deviation of a particular population.
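The percent-change and z-score formulas above translate directly into code. Below is a minimal Python sketch; the function names are hypothetical, and the z-score inputs in the usage lines are made-up numbers, not data from these notes. The extra helper `fraction_within` (also an assumed name) is a tool for checking the Empirical Rule stated just below: it reports the share of a data set lying within k sample standard deviations of the sample mean.

```python
import random
from math import sqrt

def percent_change(initial, final):
    """Percent change = (final - initial) / initial * 100."""
    return (final - initial) / initial * 100

def z_score(x, mean, sd):
    """Number of standard deviations x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

def fraction_within(data, k):
    """Share of the data lying within k sample standard deviations of the sample mean."""
    n = len(data)
    xbar = sum(data) / n
    s = sqrt(sum((x - xbar) ** 2 for x in data) / (n - 1))
    inside = sum(1 for x in data if abs(x - xbar) <= k * s)
    return inside / n

# Reading-speed example from the text: 15 -> 27 words per minute.
print(round(percent_change(15, 27), 1))   # 80.0

# Hypothetical score of 80 on a test with mean 70 and standard deviation 5.
print(z_score(80, 70, 5))                 # 2.0

# Simulated normal data (assumed mean 100, sd 15). If the Empirical Rule holds,
# the three shares should come out near 0.68, 0.95 and 0.997.
random.seed(1)
sample = [random.gauss(100, 15) for _ in range(10_000)]
for k in (1, 2, 3):
    print(k, round(fraction_within(sample, k), 3))
```

A negative percent change (a markdown, for instance) simply comes out as a negative number, and a negative z-score means the measurement is below the mean.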
Note that standard deviation is the square root of variance.

Sketch a normal distribution here:

The Empirical Rule for normally distributed data:
- Approximately 68.3 percent of the observations will fall within one standard deviation of the mean (\(\bar{x} \pm s\)).
- Approximately 95.4 percent of the observations will fall within 2 standard deviations of the mean (\(\bar{x} \pm 2s\)).
- Approximately 99.7 percent of the observations will fall within 3 standard deviations of the mean (\(\bar{x} \pm 3s\)).

A rough estimate of the range is therefore the mean ± 3 standard deviations, that is, a spread of about 6s. Why is this true?

ZS Problem 1
If you have 2 students applying for entrance to a G&T program and you have room for only one, which one will you pick based on the following test information? Gina got a 78 on a test with an average of 72 and a standard deviation of 5. Mike got an 87 on a test with an average of 85 and a standard deviation of 1.5. Who is the stronger student, and how do you know?

ZS Problem 2
Given the following distribution:

| Measurement | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Number | 0 | 3 | 1 | 5 | 2 | 7 | 5 | 6 | 3 | 0 | 1 | 0 | 2 |

Discuss:
- the measures of central tendency: mean, median, mode
- the measures of variability: range, variance, standard deviation

and give the z score for the measurement 7. Verify the Empirical Rule by making a dot or bar chart of the data and marking off where each of the standard deviations from the mean falls (s, 2s, 3s).

ZS Problem 3
The mean salary of the employees at a high school in Missouri is $28,500 with a standard deviation of $2,100. Discuss the Empirical Rule and who might fit where on a bar chart of employee salaries. The state announces a flat raise of $500 per employee for the next year. Find the mean and standard deviation of the new salaries. Who will benefit the most in a percentage change analysis?

ZS Problem 4
Given that the mean is 90 and the standard deviation is 1.4, give the numbers of the 2,000 data points that should be within 1, 2, and 3 standard deviations of the mean. Then count the numbers that actually ARE within these bounds.
| Value | Frequency |
|---|---|
| 0 | 1 |
| 1 | 2 |
| 2 | 4 |
| 3 | 8 |
| 4 | 20 |
| 5 | 35 |
| 6 | 60 |
| 7 | 120 |
| 8 | 25 |
| 9 | 500 |
| 10 | 1000 |

ZS Problem 5
For 50 days, the number of vehicles using a particular road was tracked by a city engineer. She found that the mean was 385 and the standard deviation was 15 vehicles. Suppose you are interested in opening a franchise shop along the road and you know you need traffic between 340 and 430 cars per day to be successful. How many days have this much traffic? Is this a good location or a marginal location?

ZS Problem 6: Analyze the following nuclear reactor data (as of 2010).

| Country | In operation: number | In operation: net electrical output (MW) | Under construction: number | Under construction: net electrical output (MW) |
|---|---|---|---|---|
| Argentina | 2 | 935 | 1 | 692 |
| Armenia | 1 | 375 | – | – |
| Belgium | 7 | 5,926 | – | – |
| Brazil | 2 | 1,884 | 1 | 1,245 |
| Bulgaria | 2 | 1,906 | 2 | 1,906 |
| Canada | 18 | 12,569 | – | – |
| China, Mainland | 13 | 10,048 | 27 | 27,230 |
| China, Taiwan | 6 | 4,980 | 2 | 2,600 |
| Czech Republic | 6 | 3,722 | – | – |
| Finland | 4 | 2,716 | 1 | 1,600 |
| France | 58 | 63,130 | 1 | 1,600 |
| Germany | 17 | 20,490 | – | – |
| Hungary | 4 | 1,889 | – | – |
| India | 20 | 4,391 | 5 | 3,564 |
| Iran | – | – | 1 | 915 |
| Japan | 54 | 46,823 | 2 | 2,650 |
| Korea, Republic | 21 | 18,665 | 5 | 5,560 |
| Mexico | 2 | 1,300 | – | – |
| Netherlands | 1 | 487 | – | – |
| Pakistan | 2 | 425 | 1 | 300 |
| Romania | 2 | 1,300 | – | – |
| Russian Federation | 32 | 22,693 | 11 | 9,153 |
| Slovakian Republic | 4 | 1,792 | 2 | 782 |
| Slovenia | 1 | 666 | – | – |
| South Africa | 2 | 1,800 | – | – |
| Spain | 8 | 7,514 | – | – |
| Sweden | 10 | 9,303 | – | – |
| Switzerland | 5 | 3,238 | – | – |
| Taiwan | 6 | 4,980 | 2 | 2,600 |
| Ukraine | 15 | 13,107 | 2 | 1,900 |
| United Kingdom | 19 | 10,137 | – | – |
| USA | 104 | 100,747 | 1 | 1,165 |
| Total | 442 | 374,958 | 65 | 62,862 |

(Taiwan appears twice, once under China and once on its own; the totals count it only once.)

Work:

Some thoughts:
- A histogram for the number per country?
- Calculate the measures of center and the variability.
- Check the Empirical Rule?
- An average output for each reactor?
- A z-score for the USA?

Representations: Dot diagrams, Charts, Stem and Leaf Plots, Box and Whisker Plots, Scatter Plots

A. Dot diagrams: These summarize data visually and quickly. Put one dot for each observation. Note that you don't need to sort the data to make a dot diagram.
For example: if I toss a die 6 times and get 1 4 5 6 1 2, I'd draw a horizontal line, mark off the 6 possible numbers, and then put a dot above each recorded value:

DD Problem 1
2 1 5 0 1 3 2 0 7 1 3 4 2 4 1 2 2 5 1 3 4 3 1 1 0 2 4 1 1 3 2 3 5 2 2 4 4
These data give, for a small regional airport with 48 flights per day, the number of delayed takeoffs in each week (one value per week). Make a dot diagram and analyze the data completely.

Dot diagrams are also useful with qualitative or categorical data.

DD Problem 2
At a recent televised tournament, 7 golfers had the following scores, listed alphabetically by last name: par, birdie, par, par, birdie, bogey, and eagle. Analyze this with a dot diagram.

B. Charts
Example: Here is a distribution of information about Americans aged 18 or older:

| Marital status | Count (millions) | Percent |
|---|---|---|
| Single | 41.8 | 22.6 |
| Married | 113.3 | 61.1 |
| Widowed | 13.9 | 7.5 |
| Divorced | 16.3 | 8.8 |

There are a couple of ways to display this information graphically. One is a histogram or bar chart and another is a pie chart.
[Figure: pie chart of marital status]
[Figure: histogram (bar chart) of marital status]
Why was it important to use the percentages and not the raw counts?

Charts Problem 1
Here's some 2000 Census data: the percent of the population by state. Note that it is not quite in strictly descending order. The data were in descending order in the 1990 census, and when I cut out the intervening years, the states that now have lower percentages than in 1990 got "out of order." How would you display this data in a small box in the middle of a report? You don't want visual distortion; you do want to avoid a histogram with 51 bars, though!
| Rank | State | Population (April 1, 2000) | % of U.S. |
|---|---|---|---|
|  | United States | 281,421,906 |  |
| 1 | California | 33,871,648 | 12.04 |
| 2 | Texas | 20,851,820 | 7.41 |
| 3 | New York | 18,976,457 | 6.74 |
| 4 | Florida | 15,982,378 | 5.68 |
| 5 | Illinois | 12,419,293 | 4.41 |
| 6 | Pennsylvania | 12,281,054 | 4.36 |
| 7 | Ohio | 11,353,140 | 4.03 |
| 8 | Michigan | 9,938,444 | 3.53 |
| 9 | Georgia | 8,186,453 | 2.91 |
| 10 | North Carolina | 8,049,313 | 2.86 |
| 11 | New Jersey | 8,414,350 | 2.99 |
| 12 | Virginia | 7,078,515 | 2.52 |
| 13 | Washington | 5,894,121 | 2.09 |
| 14 | Arizona | 5,130,632 | 1.82 |
| 15 | Massachusetts | 6,349,097 | 2.26 |
| 16 | Indiana | 6,080,485 | 2.16 |
| 17 | Tennessee | 5,689,283 | 2.02 |
| 18 | Missouri | 5,595,211 | 1.99 |
| 19 | Maryland | 5,296,486 | 1.88 |
| 20 | Wisconsin | 5,363,675 | 1.91 |
| 21 | Minnesota | 4,919,479 | 1.75 |
| 22 | Colorado | 4,301,261 | 1.53 |
| 23 | Alabama | 4,447,100 | 1.58 |
| 24 | South Carolina | 4,012,012 | 1.43 |
| 25 | Louisiana | 4,468,976 | 1.59 |
| 26 | Kentucky | 4,041,769 | 1.44 |
| 27 | Oregon | 3,421,399 | 1.22 |
| 28 | Oklahoma | 3,450,654 | 1.23 |
| 29 | Connecticut | 3,405,565 | 1.21 |
| 30 | Iowa | 2,926,324 | 1.04 |
| 31 | Mississippi | 2,844,658 | 1.01 |
| 32 | Arkansas | 2,673,400 | 0.95 |
| 33 | Kansas | 2,688,418 | 0.96 |
| 34 | Utah | 2,233,169 | 0.79 |
| 35 | Nevada | 1,998,257 | 0.71 |
| 36 | New Mexico | 1,819,046 | 0.65 |
| 37 | West Virginia | 1,808,344 | 0.64 |
| 38 | Nebraska | 1,711,263 | 0.61 |
| 39 | Idaho | 1,293,953 | 0.46 |
| 40 | New Hampshire | 1,235,786 | 0.44 |
| 41 | Maine | 1,274,923 | 0.45 |
| 42 | Hawaii | 1,211,537 | 0.43 |
| 43 | Rhode Island | 1,048,319 | 0.37 |
| 44 | Montana | 902,195 | 0.32 |
| 45 | Delaware | 783,600 | 0.28 |
| 46 | South Dakota | 754,844 | 0.27 |
| 47 | Alaska | 626,932 | 0.22 |
| 48 | North Dakota | 642,200 | 0.23 |
| 49 | Vermont | 608,827 | 0.22 |
| 50 | District of Columbia | 572,059 | 0.20 |
| 51 | Wyoming | 493,782 | 0.18 |
|  | Puerto Rico | 3,808,610 | 1.35 |

Charts Problem 2
[Figure: United States age distribution, drawn as a population pyramid]
When drawn as a "population pyramid," age distribution can hint at patterns of growth. A top-heavy pyramid, like the one for Grant County, North Dakota, suggests negative population growth that might be due to any number of factors, including high death rates, low birth rates, and increased emigration from the area.
A bottom-heavy pyramid, like the one drawn for Orange County, Florida, suggests high birthrates, falling or stable death rates, and the potential for rapid population growth. But most areas fall somewhere between these two extremes and have a population pyramid that resembles a square, indicating slow and sustained growth with the birth rate exceeding the death rate, though not by a great margin.

Discuss this representation of ages from the census 10 years ago. What kinds of difficulties did the authors overcome with this particular version of a histogram? What kinds of ancillary information can be drawn from this data?

Charts Problem 3
Although there have been advances in medical technology and donation, the demand for organ, eye, and tissue donation still vastly exceeds the number of donors. More than 100,000 men, women, and children currently need life-saving organ transplants. Every 10 minutes another name is added to the national organ transplant waiting list. An average of 18 people die each day from the lack of available organs for transplant. In 2009, there were 8,021 deceased organ donors and 6,610 living organ donors, resulting in 28,465 organ transplants. Last year, more than 42,000 grafts were made available for transplant by eye banks within the United States. According to research, 98% of all adults have heard about organ donation and 86% have heard of tissue donation. 90% of Americans say they support donation, but only 30% know the essential steps to take to be a donor.
Statistics
- 110,541 Patients Waiting*
- 60,758 Multicultural Patients*
- 1,785 Pediatric Patients*
- 28,663 Organ Transplants Performed in 2010
- 14,502 Organ Donors in 2010

Waiting list candidates as of 5 pm, 6/13/11:

| Organ | Candidates |
|---|---|
| All | 111,671 |
| Kidney | 89,060 |
| Pancreas | 1,369 |
| Kidney/Pancreas | 2,191 |
| Liver | 16,291 |
| Intestine | 266 |
| Heart | 3,178 |
| Lung | 1,770 |
| Heart/Lung | 66 |

The "All" count is less than the sum of the rows because some candidates are waiting for multiple organs.

Transplants performed January–March 2011 (based on OPTN data as of 06/03/2011): Total 6,709; Deceased Donor 5,276; Living Donor 1,433.

Donors recovered January–March 2011 (based on OPTN data as of 06/03/2011): Total 3,346; Deceased Donor 1,921; Living Donor 1,425.

Let's try to think of a more compelling way to present this data. How would you arrange this information in a more visual style?

Presentation:

Charts Problem 4
Fifty-four candidates entering an astronaut training program were given a psychological profile test measuring bravery. NASA grouped the data to make it more compact. Note that the scores are grouped into intervals of the SAME length. Why is this important? Would you present this as a pie chart? A dot diagram? A bar chart or histogram?

| Score (points) | # of candidates |
|---|---|
| 60–79 | 8 |
| 80–99 | 16 |
| 100–119 | 18 |
| 120–139 | 8 |
| 140–159 | 6 |

What do you think about the extreme values in the results?

C. Stem and Leaf Plots
An improvement on dot diagrams, stem and leaf plots work on data with many different measurements. They are fairly low tech and can be done quickly in a meeting or on the fly. I find them exceptionally useful in small classes (n < 50) for a quick grade analysis. The stems are the leading (tens) digits and the leaves are the units digits of each score. It can be useful to put the leaves in order, too. Here is one of my classes, a final:

10 | 123
09 | 45779
08 | 327758
07 | 459
06 | 78
BELOW | 1111

Turn the page sideways (clockwise)… note the resemblance to a dot diagram! What does this tell you about my class?
Note that in each case, there was somebody pretty close to the next level. What grade is "BELOW"?

Sometimes if the data is unusually condensed, you might split the stems, making more rows rather than fewer rows. Here are some quiz grades out of 130 points:
112 114 114 116 118 119 120 121 122 123 124 125 125 126 127 127 129
The best data presentation is to show 110–114, 115–119, 120–124, and 125–129 rather than just 2 stems with LOOOOONG leaf lines:

11 | 244
11 | 689
12 | 01234
12 | 556779

Note that the stems are now both a hundreds and a tens digit!

SL Problem 1
A hotel has 85 rooms. In February of last year they had the following rental statistics:
75 79 37 57 60 64 35 73 62 81 43 72 78 54 69 75 78 49 59 80 58 76 52 49 42 62 81 77
Produce a stem and leaf plot of this data.

SL Problem 2
The following are the weights, in ounces, of 30 one-pound bags. Display the data and analyze it.
15.6 15.9 16.2 16.0 15.6 15.9 16.0 15.6 15.6 16.0
15.6 15.9 16.2 15.6 16.2 16.0 15.8 15.9 16.2 15.8
15.8 16.2 16.2 16.0 16.2 15.9 16.2 15.8 16.2 16.0

SL Problem 3
Decide which representation you'd like to use with this data to show the age of the presidents at inauguration. Consider doing a time plot*, too. Are we electing younger people than earlier in our history? How could you present the categorical data? Party affiliation, home state, religion…
*A time plot is a chronological presentation with time on the x axis.

Presidents

| No. | Name (party) [1] | Term | State of birth | Born | Died | Religion [2] | Age at inaug. | Age at death |
|---|---|---|---|---|---|---|---|---|
| 1 | Washington (F) [3] | 1789–1797 | Va. | 2/22/1732 | 12/14/1799 | Episcopalian | 57 | 67 |
| 2 | J. Adams (F) | 1797–1801 | Mass. | 10/30/1735 | 7/4/1826 | Unitarian | 61 | 90 |
| 3 | Jefferson (DR) | 1801–1809 | Va. | 4/13/1743 | 7/4/1826 | Deist | 57 | 83 |
| 4 | Madison (DR) | 1809–1817 | Va. | 3/16/1751 | 6/28/1836 | Episcopalian | 57 | 85 |
| 5 | Monroe (DR) | 1817–1825 | Va. | 4/28/1758 | 7/4/1831 | Episcopalian | 58 | 73 |
| 6 | J. Q. Adams (DR) | 1825–1829 | Mass. | 7/11/1767 | 2/23/1848 | Unitarian | 57 | 80 |
| 7 | Jackson (D) | 1829–1837 | S.C. | 3/15/1767 | 6/8/1845 | Presbyterian | 61 | 78 |
| 8 | Van Buren (D) | 1837–1841 | N.Y. | 12/5/1782 | 7/24/1862 | Reformed Dutch | 54 | 79 |
| 9 | W. H. Harrison (W) [4] | 1841 | Va. | 2/9/1773 | 4/4/1841 | Episcopalian | 68 | 68 |
| 10 | Tyler (W) | 1841–1845 | Va. | 3/29/1790 | 1/18/1862 | Episcopalian | 51 | 71 |
| 11 | Polk (D) | 1845–1849 | N.C. | 11/2/1795 | 6/15/1849 | Methodist | 49 | 53 |
| 12 | Taylor (W) [4] | 1849–1850 | Va. | 11/24/1784 | 7/9/1850 | Episcopalian | 64 | 65 |
| 13 | Fillmore (W) | 1850–1853 | N.Y. | 1/7/1800 | 3/8/1874 | Unitarian | 50 | 74 |
| 14 | Pierce (D) | 1853–1857 | N.H. | 11/23/1804 | 10/8/1869 | Episcopalian | 48 | 64 |
| 15 | Buchanan (D) | 1857–1861 | Pa. | 4/23/1791 | 6/1/1868 | Presbyterian | 65 | 77 |
| 16 | Lincoln (R) [5] | 1861–1865 | Ky. | 2/12/1809 | 4/15/1865 | Liberal | 52 | 56 |
| 17 | A. Johnson (U) [6] | 1865–1869 | N.C. | 12/29/1808 | 7/31/1875 | [7] | 56 | 66 |
| 18 | Grant (R) | 1869–1877 | Ohio | 4/27/1822 | 7/23/1885 | Methodist | 46 | 63 |
| 19 | Hayes (R) | 1877–1881 | Ohio | 10/4/1822 | 1/17/1893 | Methodist | 54 | 70 |
| 20 | Garfield (R) [5] | 1881 | Ohio | 11/19/1831 | 9/19/1881 | Disciples of Christ | 49 | 49 |
| 21 | Arthur (R) | 1881–1885 | Vt. | 10/5/1829 | 11/18/1886 | Episcopalian | 50 | 56 |
| 22 | Cleveland (D) | 1885–1889 | N.J. | 3/18/1837 | 6/24/1908 | Presbyterian | 47 | 71 |
| 23 | B. Harrison (R) | 1889–1893 | Ohio | 8/20/1833 | 3/13/1901 | Presbyterian | 55 | 67 |
| 24 | Cleveland (D) [8] | 1893–1897 | N.J. | 3/18/1837 | 6/24/1908 | Presbyterian | 55 | 71 |
| 25 | McKinley (R) [5] | 1897–1901 | Ohio | 1/29/1843 | 9/14/1901 | Methodist | 54 | 58 |
| 26 | T. Roosevelt (R) | 1901–1909 | N.Y. | 10/27/1858 | 1/6/1919 | Reformed Dutch | 42 | 60 |
| 27 | Taft (R) | 1909–1913 | Ohio | 9/15/1857 | 3/8/1930 | Unitarian | 51 | 72 |
| 28 | Wilson (D) | 1913–1921 | Va. | 12/28/1856 | 2/3/1924 | Presbyterian | 56 | 67 |
| 29 | Harding (R) [4] | 1921–1923 | Ohio | 11/2/1865 | 8/2/1923 | Baptist | 55 | 57 |
| 30 | Coolidge (R) | 1923–1929 | Vt. | 7/4/1872 | 1/5/1933 | Congregationalist | 51 | 60 |
| 31 | Hoover (R) | 1929–1933 | Iowa | 8/10/1874 | 10/20/1964 | Quaker | 54 | 90 |
| 32 | F. D. Roosevelt (D) [4] | 1933–1945 | N.Y. | 1/30/1882 | 4/12/1945 | Episcopalian | 51 | 63 |
| 33 | Truman (D) | 1945–1953 | Mo. | 5/8/1884 | 12/26/1972 | Baptist | 60 | 88 |
| 34 | Eisenhower (R) | 1953–1961 | Tex. | 10/14/1890 | 3/28/1969 | Presbyterian | 62 | 78 |
| 35 | Kennedy (D) [5] | 1961–1963 | Mass. | 5/29/1917 | 11/22/1963 | Roman Catholic | 43 | 46 |
| 36 | L. B. Johnson (D) | 1963–1969 | Tex. | 8/27/1908 | 1/22/1973 | Disciples of Christ | 55 | 64 |
| 37 | Nixon (R) [9] | 1969–1974 | Calif. | 1/9/1913 | 4/22/1994 | Quaker | 56 | 81 |
| 38 | Ford (R) | 1974–1977 | Neb. | 7/14/1913 | 12/26/2006 | Episcopalian | 61 | 93 |
| 39 | Carter (D) | 1977–1981 | Ga. | 10/1/1924 | – | Southern Baptist | 52 | – |
| 40 | Reagan (R) | 1981–1989 | Ill. | 2/6/1911 | 6/5/2004 | Disciples of Christ | 69 | 93 |
| 41 | G. H. W. Bush (R) | 1989–1993 | Mass. | 6/12/1924 | – | Episcopalian | 64 | – |
| 42 | Clinton (D) | 1993–2001 | Ark. | 8/19/1946 | – | Baptist | 46 | – |
| 43 | G. W. Bush (R) | 2001–2009 | Conn. | 7/6/1946 | – | Methodist | 54 | – |
| 44 | Obama (D) | 2009– | Hawaii | 8/4/1961 | – | United Church of Christ | 47 | – |

NOTES:
[1] F: Federalist; DR: Democratic-Republican; D: Democratic; W: Whig; R: Republican; U: Union.
[2] Religious affiliation at election. Several presidents changed religions during their lifetimes.
[3] No party for first election. The party system in the U.S. made its appearance during Washington's first term.
[4] Died in office.
[5] Assassinated in office.
[6] The Republican National Convention of 1864 adopted the name Union Party. It renominated Lincoln for president; for vice president it nominated Johnson, a War Democrat. Although frequently listed as a Republican vice president and president, Johnson undoubtedly considered himself strictly a member of the Union Party. When that party broke apart after 1868, he returned to the Democratic Party.
[7] Johnson was not a professed church member; however, he admired the Baptist principles of church government.
[8] Second nonconsecutive term.
[9] Resigned Aug. 9, 1974.

D. Box and Whisker plots:
Disability Adjusted Life Expectancy, 1999

| Age | Number of countries |
|---|---|
| 25 | 1 |
| 30 | 9 |
| 35 | 11 |
| 40 | 11 |
| 45 | 10 |
| 50 | 40 |
| 55 | 37 |
| 60 | 33 |
| 65 | 20 |
| 70 | 15 |
| 75 | 4 |
| Total | 191 |

Lowest: Sierra Leone, 29.5. Highest: Japan, 73.8.

The UN calculated the Disability Adjusted Life Expectancy for citizens of the 191 member countries in 1999. The table above summarizes their findings.
Five number summary: Max, Q3, Median, Q1, Min
IQR: interquartile range (upper quartile − lower quartile), that is, Q3 − Q1
Upper fence: 1.5 IQR above Q3; Lower fence: 1.5 IQR below Q1. [Never actually show these in your box plot.]

Box: Q3, Median, Q1. Make a rectangle; any width works fine.
Whiskers: lines to the most extreme value inside the fences, topped with a horizontal "stop."
Show asterisks as outlier values.
Vertical display: [Figure: vertical box and whisker plot]
Horizontal display under histogram: [Figure: horizontal box and whisker plot drawn under a histogram]

BW01
Here are some pre-lesson grades (10 points) plotted with a "double stem": each tens category is broken into 2 parts, 0–4 and 5–9.

3 | 57
4 | 0023
4 | 5666899
5 | 234
5 | 56789
6 | 1224
6 | 9
7 | 23
7 | 8
8 | 1

- Recreate the first 4 measurements from either end.
- Find Q1, Q3, and the median.
- Find the "fences."
- Do a horizontal box and whisker plot.
- Are there any outliers in this data?

BW02 Comparing groups with box and whisker plots.
A student designed an experiment to test the efficiency of 4 coffee containers from different manufacturers by pouring coffee at 180°F into each container and then measuring the temperature difference after 30 minutes. She did the experiment 5 times, using different cups of the same type each time (she didn't reuse any of the cups). So she used 20 cups total, 5 from each manufacturer. The five-number summaries of the temperature differences (°F) are in the table below.

| Cup | Min | Q1 | Median | Q3 | Max | IQR |
|---|---|---|---|---|---|---|
| Cup 1 | 6 | 6 | 8.33 | 14.25 | 18.5 | 8.25 |
| Cup 2 | 0 | 1 | 2 | 4.5 | 7 | 3.5 |
| Cup 3 | 9 | 11.5 | 14.25 | 21.75 | 24.5 | 10.25 |
| Cup 4 | 6 | 6.50 | 8.50 | 14.25 | 17.5 | 7.75 |

Using VERTICAL box and whisker diagrams and a vertical axis of Temperature Change, compare the data. Which cup has the best heat retention property?

Scatter Plots, Time Plots, and Line Plots

Plotting: Problem 1
The Bureau of Labor Statistics tracks the buying power of our currency by using a fixed basket of goods and services. It prices the items and records how much the same items cost over time. The base period is a chosen period of time within the series, and the average cost of the basket over that period serves as the reference value.
The base period for the following data is 1982–1984; the basket costs about $100 for that time period.

| Year | 1970 | 1973 | 1976 | 1979 | 1982 | 1985 | 1988 |
|---|---|---|---|---|---|---|---|
| Cost of basket | $39 | $44 | $57 | $73 | $97 | $108 | $118 |

Plot the data and see if you can discuss both the trend and the rate of change of the purchasing power of $100.

Plotting: Problem 2
Deaths from cancer:

| Year | 1940 | 1945 | 1950 | 1955 | 1960 | 1965 | 1970 | 1975 | 1980 | 1985 | 1990 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Deaths per 100,000 people | 120 | 137 | 140 | 147 | 148 | 153 | 160 | 170 | 182 | 190 | 201 |

Plot the data and think critically about whether cancer is getting much, much more prevalent or if there's something else going on socially, too!

Plotting: Problem 3
Sometimes 2 different views can each provide information for decisions. Here are 20 measurements, taken over 20 hours IN ORDER. They measure the tension on a wire grid behind an electronic display. If the tension is too high or too low, the display quits working for safety reasons.
265.5 297.0 269.6 283.3 304.8 280.4 283.5 257.4 317.5 327.4
264.7 307.7 310.0 343.3 328.1 342.6 338.8 340.1 374.6 336.1
Make a stem plot using 2 digits for the stem PLUS make a time plot. What items of interest to the managers do you see in EACH display? Describe the distributions and what the management might need to do.

Scatter plots
Here is some data taken after an airport opened near a neighborhood. The first column is the number of weeks since the airport opened and the second column is the sound frequency range to which the person's hearing will respond. The third column gives the value of the regression line below at each x.

| Weeks (x) | Range | Line value |
|---|---|---|
| 47 | 15.1 | 14.4775 |
| 56 | 14.1 | 14.32 |
| 116 | 13.2 | 13.27 |
| 178 | 12.7 | 12.185 |
| 19 | 14.6 | 14.9675 |
| 75 | 13.8 | 13.9875 |
| 160 | 11.9 | 12.5 |
| 31 | 14.8 | 14.7575 |
| 12 | 15.3 | 15.09 |
| 164 | 12.6 | 12.43 |
| 43 | 14.7 | 14.5475 |
| 74 | 14.0 | 14.005 |

We'll graph these with the vertical axis going from 12 to 15 and the horizontal axis going from 0 to 200.
Linear regression line: \(y = -0.0175x + 15.3\)
If you plot this line THROUGH your scatter plot, it will be APPROXIMATELY the line the data points are going in.
(The points on the line are the values in the right-hand column of the table.) Naturally, the data points will be off the plotted line. How far off is summarized by a statistic called r, the correlation coefficient. An r of 0 is very bad: your data is basically a cloud. An r of 1 (or −1) is PERFECT: every point is on the line. The r for this data is about −0.95: the line has a negative slope and the points are quite close to it.
[Figure: scatter plot of Hearing (vertical axis) vs. Weeks (horizontal axis, 0 to 200)]
[Figure: the same scatter plot with its linear trend line]
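The regression line and r quoted above come from the least-squares formulas. The minimal Python sketch below (standard library only; the function name `least_squares` is a hypothetical choice) computes all three numbers from the hearing data in the table: the slope is \(S_{xy}/S_{xx}\), the intercept is \(\bar{y} - \text{slope}\cdot\bar{x}\), and \(r = S_{xy}/\sqrt{S_{xx}S_{yy}}\), where each S is a sum of products of deviations from the means.

```python
from math import sqrt

# Hearing data from the table: weeks since the airport opened, and hearing range.
weeks = [47, 56, 116, 178, 19, 75, 160, 31, 12, 164, 43, 74]
hearing = [15.1, 14.1, 13.2, 12.7, 14.6, 13.8, 11.9, 14.8, 15.3, 12.6, 14.7, 14.0]

def least_squares(xs, ys):
    """Return (slope, intercept, r) for the least-squares line y = slope*x + intercept."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    sxx = sum((x - xbar) ** 2 for x in xs)
    syy = sum((y - ybar) ** 2 for y in ys)
    slope = sxy / sxx
    intercept = ybar - slope * xbar
    r = sxy / sqrt(sxx * syy)          # correlation coefficient, always between -1 and 1
    return slope, intercept, r

slope, intercept, r = least_squares(weeks, hearing)
print(f"y = {slope:.4f}x + {intercept:.2f}, r = {r:.2f}")
# prints: y = -0.0175x + 15.32, r = -0.95
```

Rounded, the slope and intercept reproduce the predicted values in the table's last column (for example, 15.3 − 0.0175 × 47 = 14.4775), and the printed r can be compared with the value quoted in the text. The sign of r always matches the sign of the slope, because both share the numerator \(S_{xy}\).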