What to put on the board: Job Title, Salaries, Mean, Median

Describing Distributions with Numbers
CHAPTER 2

Travel times (mins.) for 15 workers in North Carolina
30 20 10 40 25 20 10 10 60 15 40 5 30 12 10

Stemplot (stem = tens, leaf = ones):
0 | 5
1 | 000025
2 | 005
3 | 00
4 | 00
5 |
6 | 0

Shape, Center, Spread
Shape: Skewed right
Center: 20
Spread: 5 to 60

Measuring Center
MEAN AND MEDIAN

Finding the Mean = Average
$\bar{x} = \frac{x_1 + x_2 + \dots + x_n}{n}$ OR $\bar{x} = \frac{\sum x_i}{n}$
$\bar{x}$ = sample mean
∑ = "the sum of"
n = # in sample

Example: Find the mean travel time in North Carolina
$\bar{x} = \frac{30 + 20 + \dots + 10}{15} = \frac{337}{15} \approx 22.5$ minutes

The mean is the most common measure of center.
It is sensitive to the influence of extreme observations.
Outliers pull the mean toward the outlier.
Skewed data pull the mean toward the longer tail (in the direction of the skew).
Because of this sensitivity, the mean is not a resistant measure of center.

Example (find the mean of both)
A) 1, 2, 3, 4, 5
B) 1, 2, 3, 4, 50

Median
Median = midpoint
Symbol: M
Arrange the numbers in increasing order and count to the center.
If n is odd, M is located at position $\frac{n+1}{2}$ on the list.
If n is even, the location of M is also at position $\frac{n+1}{2}$, but since $\frac{n+1}{2}$ will give you a decimal (.5) value, you must find the mean of the two center numbers on the list.

Example (find the median of both)
1, 2, 8, 9, 15
1, 2, 8, 9

The median is a resistant measure (not influenced by extreme observations).
1, 2, 3, 4, 5
1, 2, 3, 4, 50

Find the median for the North Carolina drivers
REFER BACK TO EXAMPLE 1
5 10 10 10 10 12 15 20 20 25 30 30 40 40 60
Location of M = $\frac{n+1}{2} = \frac{15+1}{2} = 8$, so M is the 8th value on the ordered list: M = 20.
Note: the formula does not give you the median, just its location.

Comparing mean and median
In a symmetric distribution, the mean and median are equal: $\bar{x} = M$.
If the distribution is roughly symmetric, the mean and median are close together.
If the distribution is skewed, the mean is farther out in the long tail than the median (the mean is past the median in the direction of the skew).

Cont…
Mean and median both give a measure of center.
Often one is a better choice than the other, depending on the situation and the data set.
"Average value" usually implies the mean.
"Typical value" usually implies the median.

Using the Graphing Calculator
When given a larger data set, it might be easier to find the mean and median using your TI-83/84.
Steps
1. Hit STAT, Edit (make sure the lists are all clear).
2. Enter values in L1 (type a number, hit ENTER or ▼).
3. Hit STAT, over to CALC. Choose 1: 1-Var Stats. Hit ENTER.
The mean is $\bar{x}$ (at the top of the screen).
Use the down arrow to scroll down and see the median (Med=).

Practice Problem: Exit Ticket
The Major League Baseball single-season home run record is held by Barry Bonds, who hit 73 in 2001. Below are Bonds's home run totals from 1986 to 2004:
16 19 24 25 25 33 33 34 34 37 37 40 42 45 45 46 46 49 73
Bonds's record year is a high outlier. How do his career mean and median number of home runs change when we drop the 73? What general fact about the mean and median do your results illustrate?

Measuring Spread
THE QUARTILES

Actuaries
Actuaries earned a mean annual wage of $95,420 in 2007, with the top tier hauling down a tasty $145,600. Most (60 percent) are employed in the insurance industry, crunching numbers to determine risks in pension planning, insurance coverage, or investment strategies. That means they need a high math aptitude and financial savvy. However, many hold only a bachelor's degree in math, business, or statistics.

Dental Hygienists
Not all dental hygienists' earnings skyrocket into the six figures, but enough do to make this a surprisingly rich opportunity for someone who holds only an associate degree. The Labor Department reports that while the median earnings are in the high $60k range, the top-end hygienists found themselves in the $90k range last year. And you can prep for this career in a two-year online career training program and be loving life in a matter of a few years with experience.

What do you think is the median household income?
Average household income: 2004
The Census Bureau reported:
Median: $44,389
Mean: $60,528
Bottom 10%: less than $10,927
Upper 5%: above $157,185

Median household income (2014), 1 earner to 4 people…

Why spread?
Mean and median are useful for center but can be misleading or not tell us "the whole story."
We also need to know about the spread and variability in the data.
The simplest useful numerical description of a distribution requires both a measure of center and a measure of spread.

Range
Range = difference between the largest and smallest numbers = max - min (SINGLE NUMBER ANSWER)
It shows the full spread of the data, but it can be determined by outliers, so it is not a great choice.
Improve the description by also looking at the spread of the middle half of the data: Quartiles

Use Quartiles
Put the numbers in increasing order and find M (also called Q2, or the 2nd quartile).
Find the median of the first half (numbers to the left of M) = Q1 (1st quartile).
Find the median of the second half (numbers to the right of M) = Q3 (3rd quartile).

Quick Facts
25% of the data is below Q1
25% of the data is above Q3
75% of the data is below Q3
75% of the data is above Q1
50% of the data is between Q1 & Q3
50% of the data is below (or above) Q2

Examples
5, 10, 10, 10, 10, 12, 15, 20, 20, 25, 30, 30, 40, 40, 60
n = 15
**If n is odd, then M is not included when counting to find Q1 & Q3

Examples cont…
5, 10, 10, 15, 15, 20, 20, 40, 45, 60, 65, 85
n = 12
**If n is even, all values are used when counting to find Q1 & Q3

Five Number Summary
Gives a reasonably complete description of center and spread
Consists of minimum, Q1, median, Q3, maximum
Can be used to make a Box Plot (or Box and Whisker Plot)

Try this one
Examples: Finding Five Number Summaries
5, 10, 10, 10, 10, 12, 15, 20, 20, 25, 30, 30, 40, 40, 60

Examples cont…
5, 10, 10, 15, 15, 20, 20, 40, 45, 60, 65, 85

Constructing Box and Whisker Plots
1. Draw a number line (usually by 5s or 10s). Place dots above the line at each of the five values from your Five Number Summary.
2. Draw a box from Q1 to Q3.
3. Draw a vertical line through M.
4. Draw "whiskers" out to the max and min.

You can draw more than one plot over the same axis to do a side-by-side comparison of multiple data sets. This is called a Stacked Box Plot.
Shape can be discussed as with histograms:
Symmetric = Q1 to Med to Q3 evenly spaced
Skewed = Q1 & Q3 not evenly spaced, or whiskers uneven in length (which also shows possible outliers)

Example
Examples: Constructing Box Plots (use the data sets above, make a stacked plot)

Assignment
Construct a box plot for each offensive position in the following table and stack them.
10 point homework assignment
Page 59 in textbook (just do offense)

Spotting Suspected Outliers

Warm Up
Give the five-number summary of the following 19 #s
12, 14, 15, 15, 15, 15, 18, 22, 30, 33, 33, 34, 35, 39, 40, 41, 72, 78, 91
Min-
Q1-
M-
Q3-
Max-

Finding Outliers
Interquartile Range = the spread of the quartiles
IQR = Q3 - Q1
Use this value when finding the boundaries for outliers:
Upper Bound = Q3 + (IQR x 1.5)
Lower Bound = Q1 - (IQR x 1.5)
Any data values beyond the boundaries on either end of your list are outliers.

Examples
NC: 30, 20, 10, 40, 25, 20, 10, 60, 15, 40, 5, 30, 12, 10, 10
Are there any outliers?
NY: 10, 30, 5, 25, 40, 20, 10, 15, 30, 20, 15, 20, 85, 15, 65, 15, 60, 60, 40, 45
On a Box Plot, outliers should be marked with a star (*); then end the whisker on that side at the most extreme non-outlier value.

Why Spot Outliers?
The town of Manhattan, Kansas, is sometimes called the "Little Apple" to distinguish it from the other Manhattan. A few years ago, a house there appeared in the county appraiser's records valued at $200,059,000.00. That would be quite a house, even on Manhattan Island. As you might guess, the entry was wrong: the true value was $59,000.00. But before the error was discovered, the county, the city, and the school board had based their budgets on the total appraised value of real estate, which the one outlier jacked up by 6.5%. It can pay to check for outliers!
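The quartile method and the 1.5 x IQR rule described above can be sketched in Python. This is only a minimal sketch under stated assumptions: the function names (median, five_number_summary, find_outliers) are hypothetical and chosen for illustration, and the quartiles follow the convention on these slides (M is left out of both halves when n is odd), which may differ from what a calculator or software package reports.

```python
def median(sorted_vals):
    """Median of a list that is already in increasing order."""
    n = len(sorted_vals)
    mid = n // 2
    if n % 2 == 1:
        return sorted_vals[mid]                      # odd n: the single middle value
    return (sorted_vals[mid - 1] + sorted_vals[mid]) / 2  # even n: mean of the two middle values


def five_number_summary(data):
    """Return (min, Q1, M, Q3, max) using the median-of-halves method."""
    vals = sorted(data)
    n = len(vals)
    half = n // 2
    lower = vals[:half]                              # values to the left of M
    # If n is odd, skip M itself; if n is even, all values are used.
    upper = vals[half + 1:] if n % 2 == 1 else vals[half:]
    return vals[0], median(lower), median(vals), median(upper), vals[-1]


def find_outliers(data):
    """Values beyond Q1 - 1.5*IQR or Q3 + 1.5*IQR."""
    _, q1, _, q3, _ = five_number_summary(data)
    iqr = q3 - q1
    lower_bound = q1 - 1.5 * iqr
    upper_bound = q3 + 1.5 * iqr
    return [x for x in sorted(data) if x < lower_bound or x > upper_bound]


nc = [30, 20, 10, 40, 25, 20, 10, 60, 15, 40, 5, 30, 12, 10, 10]
ny = [10, 30, 5, 25, 40, 20, 10, 15, 30, 20, 15, 20, 85, 15, 65, 15, 60, 60, 40, 45]

print(five_number_summary(nc), find_outliers(nc))  # (5, 10, 20, 30, 60) []
print(five_number_summary(ny), find_outliers(ny))  # (5, 15.0, 22.5, 42.5, 85) [85]
```

For the NC data the upper bound is 30 + 1.5 x 20 = 60, so 60 sits exactly on the boundary and is not flagged; for the NY data the upper bound is 42.5 + 1.5 x 27.5 = 83.75, so 85 is flagged.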
Measuring Spread: The Standard Deviation
Standard deviation = measures spread by looking at how far the observations are from the mean.
Formula: $s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}}$
Variance = square of the standard deviation

Example
Use the following data set to complete the steps below:
41 38 39 45 47 41

Steps
1. To find these measures, the first step is always to find the mean of the data set.
2. Make a chart to complete the rest of the calculations (see below).
3. Subtract the mean from each number in the data set. Make sure to include the positive or negative sign. This is called finding the deviation. It shows us how much each value varies from the mean of the set. For every data set, the sum of this column will always be zero, so we need to take other steps.
4. Square each value from step 3 in the next column. Add this column to get a total.
5. Divide this total by one less than the number of entries in the data set, n - 1. This is called the sample variance. Because it involves a total of squared values, we need to take one last step.
6. Take the square root of the answer from step 5. This is called the sample standard deviation. This is the final answer of the problem.

The standard deviation is used to:
1. show how much, in general, a data set varies from its average
2. show consistency when comparing data sets (lower SD = more consistent values = closer as a whole to the mean)

Example:
Mean = ______
Data Values    Value − Mean    Squares
41
38
39
45
47
41
Total of Squares = ___________
Total of Squares / (n - 1) = ________ (Variance)
Square root = ______ = standard deviation

Properties
s measures spread about the mean and should be used only when the mean is chosen as the measure of center.
s ≥ 0 always. s = 0 only when there is no spread (when all observations have the same value).
More spread out = greater s
s has the same units as the original data values.
s is not resistant to outliers and skew (like the mean).

Cont…
Because $\bar{x}$ and s are sensitive to extreme observations, they can be misleading when the data are strongly skewed or have outliers.
Because of this:
If the distribution is skewed or has outliers, describe the data set with the Five-Number Summary.
If the distribution is reasonably symmetric and free of outliers, describe it with $\bar{x}$ and s.

Choosing Measure of Center and Spread
Five-Number Summary: Better for describing a skewed distribution or a distribution with strong outliers
Mean and Standard Deviation: Best for reasonably symmetric distributions that are free of outliers

Reminder!
Remember that a graph gives the best overall picture of a distribution. Numerical measures of center and spread report specific facts about a distribution, but they do not describe its entire shape.
Always plot your data!

Find the standard deviation (hand in, 3 pts)
10, 8, 12, 14, 16, 8
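The six steps for the sample standard deviation can be checked with a short Python sketch. It is only an illustration under stated assumptions: the function name sample_std_dev is hypothetical, and the data are the earlier worked example set (41 38 39 45 47 41), not the hand-in problem.

```python
import math


def sample_std_dev(data):
    """Follow the six steps: mean, deviations, squares, variance, square root."""
    n = len(data)
    mean = sum(data) / n                        # step 1: mean of the data set
    deviations = [x - mean for x in data]       # step 3: value minus mean (sums to about zero)
    squares = [d ** 2 for d in deviations]      # step 4: square each deviation
    variance = sum(squares) / (n - 1)           # step 5: divide the total by n - 1
    return mean, variance, math.sqrt(variance)  # step 6: square root gives s


values = [41, 38, 39, 45, 47, 41]
mean, variance, s = sample_std_dev(values)
print(round(mean, 2), round(variance, 2), round(s, 2))  # 41.83 12.17 3.49
```

Dividing by n - 1 rather than n is what makes this the sample variance used on these slides; s comes out in the same units as the original data.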