
AP Stats / Topic TWO “Summarizing Distributions”

Contents
1. Measuring the center / Measures of central tendency
2. Measuring spread / Measures of variation
3. Empirical rule
4. Measuring position

Objectives / SWBAT
• Find the mean, median, and mode of a set of data, including the weighted mean and the mean of a frequency distribution.
• Describe the shape of a distribution as symmetric, skewed, or uniform.
• Compare the mean and median for each shape of a distribution.
• Find the range, the variance, and the standard deviation of a set of data.
• Find an approximation of the sample standard deviation for grouped data.
• Find the first, second, and third quartiles of a set of data.
• Find the interquartile range.
• Display data by using a box plot / box-and-whisker plot.
• Find, interpret, and compare z-scores (standard scores).

Measuring the center / Measures of central tendency: Mean and Median / Notes
• Population mean (µ is a Greek letter): µ = ∑x / N
• Sample mean (x̄, read as “x bar”): x̄ = ∑x / n
• Median / the value that lies in the middle of the data when the data are ORDERED.
• Mode / the data entry that occurs with the greatest frequency. If no entry is repeated, the data set has no mode. A data set can also have two modes (bimodal) or more (multimodal). Ordering the data helps to find the mode.
• Outlier / a data entry that is far removed from the other entries in the data set. Outliers cause gaps.
• Weighted mean / the mean of a data set whose entries have varying weights: x̄ = ∑(x·w) / ∑w, where w is the weight of each entry x.
• The mean of a frequency distribution for a sample is approximated by x̄ = ∑(x·f) / n, where x and f are the midpoint and frequency of a class, respectively. Note that n = ∑f.
• Shape of distributions:
  o Symmetric: a vertical line can be drawn through the middle of a graph of the distribution, and the resulting halves are APPROXIMATELY mirror images.
  o Uniform (or rectangular): all entries or classes have equal or APPROXIMATELY equal frequencies. A uniform distribution is also symmetric.
  o Skewed: the distribution has a tail extended to the left or to the right.
  o Bell-shaped (mound-shaped).
• In general, the median is at the (n + 1)/2 position of the ordered data. With 28 entries in order, the median is at the (28 + 1)/2 = 14.5th position, that is, halfway between the 14th and 15th entries.
• The median and the mean are both measures of center, but sometimes we must choose which one to use to describe the distribution. The choice depends on the SHAPE of the distribution:
  o Symmetric and bell-shaped: the mean and median will be close.
  o Outliers or strong skew: the median is probably the better choice to describe the center because the MEDIAN is a resistant statistic; it is not dramatically affected by extreme values. The mean is not resistant; it is dramatically affected by extreme values.

Class Examples
1. The prices (in dollars) for a sample of roundtrip flights from Chicago, Illinois to Cancun, Mexico are listed below.
   872 432 397 427 388 782 397
   a) Find the mean, median, and mode of the data set.
2. The ages of students in a college class are listed below.
   20 20 20 20 20 20 21 21 21 21 22 22 22 23 23 23 23 24 24 65
   a) Find the mean, median, and mode.
   b) Make a histogram. Indicate the measures of central tendency.
   c) Which measure of central tendency best describes a typical entry of the data set?
   d) Are there any outliers?
   e) Remove the data entry of 65 from the preceding data set. Find the mean, median, and mode.
   f) How does the absence of this outlier change each of the measures? Compare these measures with those found in part a).
   The median is not affected by outliers; it is a particularly useful measure to describe a distribution that has outliers (extreme values). The mean is affected by outliers.
3. Suppose that the numbers of unnecessary procedures recommended by five doctors in a 1-month period are given by the set {2, 2, 8, 20, 33}.
   a. Find the mean and the median.
   b. If it is discovered that the fifth doctor also recommended an additional 25 unnecessary procedures, how will the median and mean be affected?
4. You are taking a class in which your grade is determined from five sources: 50% from your test mean, 15% from your midterm, 20% from your final exam, 10% from your computer lab work, and 5% from your homework. Your scores are 86 (test mean), 96 (midterm), 82 (final exam), 96 (computer lab), and 100 (homework). What is the weighted mean of your scores? If the minimum average for an A is 90, did you get an A?
5. Approximate the mean of the following frequency distribution. The data represent the number of minutes that a sample of Internet subscribers spent online during their last session.

   Class midpoint   Frequency
   12.5             6
   24.5             10
   36.5             13
   48.5             8
   60.5             5
   72.5             6
   84.5             2

6. Suppose the salaries of six employees are listed below.
   $3 000   $7 000   $15 000   $22 000   $23 000   $38 000
   a) What is the mean salary?
   b) What will the new mean salary be if everyone receives a $3 000 increase?
   c) What will the new mean salary be if everyone receives a 10% raise?
   d) Draw conclusions.
   Adding the same constant to each value increases the mean and median by that constant. Multiplying each value by the same constant multiplies the mean and median by that constant.

Measuring spread / Measures of variation
• Range / the difference between the largest and smallest values. The range gives some impression of the dispersion (spread) of the data, but it depends only on the two extreme values and is not sensitive to the values in the middle. The range can be used to evaluate samples with very few entries.
• Interquartile range / IQR = Q3 - Q1. The IQR is useful for removing the influence of extreme values or outliers on the range. It leaves out the upper and lower quarters of the values and represents the range of the middle 50% of the values (or entries).
• The numerical rule for identifying outliers is to calculate 1.5 x IQR and then call a value an outlier if it is more than 1.5 x IQR below the first quartile or more than 1.5 x IQR above the third quartile.
• Deviation / the deviation of an entry x in a population data set is the difference between the entry and the mean µ of the data set, that is, the signed distance between the observation and the mean. The sum of the deviations is always zero, so it cannot be used to measure spread. To overcome this problem, we square each deviation. In a sample, the differences (x - x̄) are called residuals.
• Variance is, by definition, the average squared deviation from the mean. It is a measure of spread because the more distant a value is from the mean, the larger the square of the difference between it and the mean.
• Population variance: σ² = ∑(x - µ)² / N
• Population standard deviation: σ = √( ∑(x - µ)² / N )
• Sample variance: s² = ∑(x - x̄)² / (n - 1)
• Sample standard deviation: s = √( ∑(x - x̄)² / (n - 1) ). The divisor n - 1, not n, represents the number of independent values: if you know n - 1 of the values and the mean x̄, then the nth value is determined.
• Standard deviation is a measure of the typical (usual, representative) amount by which an entry deviates from the mean. The more the entries are spread out, the greater the standard deviation. s measures the spread of the x-values around the sample mean; it is a “typical value” of the residuals (x - x̄).
• Variance is measured in squared units. Standard deviation is measured in the same units as the data.
• Standard deviation for grouped data: s = √( ∑(x - x̄)²·f / (n - 1) )
• On the calculator, Sx gives the sample standard deviation.

Class Examples
7. Suppose that the starting salaries (in $1 000) for college graduates who took AP Stats in high school have the following characteristics: the smallest value is 18.8, 10% of the values are below 25.6, 25% are below 41.1, the median is 59.3, 60% are below 84.3, 75% are below 101.9, 90% are below 118.0, and the top value is 201.7.
   a. What is the range?
   b. What is the IQR?
   c. When the numerical rule for outliers is used, should either the smallest or the largest value be called an outlier?
8. The numbers of calories in 12-ounce servings of five popular beers are {95, 152, 188, 205, 131}. Use a calculator to find the mean (x̄), the sample standard deviation (Sx), and the variance of the data.
9. Sample office rental rates (in dollars per square foot per year) for Seattle’s central business district are listed. Use a calculator to find the mean rental rate and the sample standard deviation.
   40.00 43.00 46.00 40.50 35.75 39.75 32.75 36.75 35.75 38.75
   38.75 36.75 38.75 39.00 29.00 35.00 42.75 32.75 40.75 35.25
10. The following frequency distribution shows the results of a survey in which 1000 adults were asked how much they spend in preparation for personal travel each year.

   Class      x (midpoint)   f      xf    x - x̄    (x - x̄)²    (x - x̄)²·f
   0-99       49.5           380
   100-199    149.5          230
   200-299    249.5          210
   300-399    349.5          50
   400-499    449.5          60
   500-600    549.5          70
                         ∑ = 1000

   a) Find the sample mean and the sample standard deviation of the set of data.

Empirical rule
• The empirical rule (also called the 68-95-99.7 rule) applies to symmetric, bell-shaped data. In this case, about 68% of the values lie within 1 standard deviation of the mean, about 95% of the values lie within 2 standard deviations of the mean, and about 99.7% of the values lie within 3 standard deviations of the mean.

11. Suppose that taxicabs in New York City have driven an average of 75 000 miles per year with a standard deviation of 12 000 miles. What information does the empirical rule give us? Assume that the distribution is bell-shaped. Use a graphical representation to illustrate your answer.
   Answer:
   About 68% of taxicabs in New York City have driven between 63 000 and 87 000 miles per year.
   About 95% of taxicabs in New York City have driven between 51 000 and 99 000 miles per year.
   About 99.7% of taxicabs in New York City have driven between 39 000 and 111 000 miles per year.
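The intervals in example 11 can be checked with a few lines of Python. This is only a minimal sketch and not part of the original handout: the function name empirical_rule_intervals and the COVERAGE table are made up for illustration, and the percentages are approximations that apply only when the data are roughly bell-shaped.

```python
def empirical_rule_intervals(mean, sd):
    """Return {k: (low, high)} for k = 1, 2, 3 standard deviations from the mean."""
    return {k: (mean - k * sd, mean + k * sd) for k in (1, 2, 3)}


# Approximate share of values within k standard deviations (bell-shaped data only).
COVERAGE = {1: "about 68%", 2: "about 95%", 3: "about 99.7%"}

# Taxicab figures from example 11: mean 75 000 miles, standard deviation 12 000 miles.
intervals = empirical_rule_intervals(75_000, 12_000)

for k, (low, high) in intervals.items():
    print(f"{COVERAGE[k]} of taxicabs: between {low:,} and {high:,} miles per year")
```

The same function works for any mean and standard deviation, so it can be reused to sketch the three bands on a bell curve for other data sets.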
Measuring position: simple ranking, percentile ranking, and z-score
• To describe data, we also need to be able to talk about the position of any value.
• For describing position, there are three procedures:
  o Simple ranking / involves arranging the values in some order and noting where in that order a particular value falls.
  o Percentile ranking / indicates what percentage of all values fall below the value under consideration. For Q1 and Q3 the percentile ranks are 25% and 75%, respectively.
  o z-score (standard score) / states by how many standard deviations a particular value differs from the mean: z = (x - µ) / σ

Class Examples
12. Suppose the average price of gasoline in a large city is $3.80 per gallon with a standard deviation of $0.05. Calculate the z-score of each of the following values:
   a. $3.90
   b. $3.65
   c. For a z-score of +2.2, what is the raw score?
13. The water capacities (in gallons) of some major solid-fuel boilers in the USA are
   6.3 21 7.4 8.6 12.1 16.1 23 23 28 33 26 21 65 65 56 66 70 34 35 50
   a. What is the position of the Passat HO-45, which has a capacity of 34 gallons?
   Answer: Enter the data in L1 and sort the list in descending order. There are 7 boilers with a higher capacity on the list, so the Passat HO-45 has a SIMPLE RANKING of 8th. Twelve of the 20 boilers have a lower capacity, so the Passat has a PERCENTILE RANKING of 12/20 = 0.60 = 60%. The list has a mean of 33.325 and a sample standard deviation of 21.244, so the Passat has a z-score of (34 - 33.325) / 21.244 ≈ 0.032.
14. A small used car dealer wanted to get an idea of how many cars her dealership sells per day. Listed below are the numbers of cars sold per day over a two-week period.
   14 9 23 7 11 23 17 11 3 24 21 2 20 20
   a. Find the mean number of cars sold per day.
   b. Find the range of cars sold per day.
   c. Find the standard deviation of the number of cars sold per day.
   d. Find the median number of cars sold per day.
   e. Find the first and third quartiles of the number of cars sold per day. Find the IQR.
   f. Find the 90th percentile of the number of cars sold per day.

Calculator Tips
To summarize a distribution / find the five-number summary of data:
• STAT → Edit → enter the entries in a list (for example, L1)
• STAT → CALC → 1: 1-Var Stats → 1-Var Stats L1 (the list of entries)
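For readers working without a graphing calculator, the Python sketch below approximates a one-variable summary for the car-sales data in example 14. It is an illustration under stated assumptions, not the calculator's own routine: the function name one_var_stats and the dictionary keys are hypothetical, and the quartiles are taken as the medians of the lower and upper halves of the ordered data. Other software may use a different quartile convention and report slightly different Q1 and Q3.

```python
import statistics


def one_var_stats(data):
    """Summarize one list of numbers: mean, sample SD, five-number summary, IQR, fences."""
    xs = sorted(data)
    n = len(xs)
    half = n // 2
    # Assumed quartile convention: split the ordered data into halves,
    # leaving out the middle entry when n is odd.
    lower = xs[:half]
    upper = xs[half + 1:] if n % 2 else xs[half:]
    q1 = statistics.median(lower)
    q3 = statistics.median(upper)
    iqr = q3 - q1
    return {
        "n": n,
        "mean": statistics.mean(xs),
        "Sx": statistics.stdev(xs),      # sample standard deviation (divides by n - 1)
        "min": xs[0],
        "Q1": q1,
        "median": statistics.median(xs),
        "Q3": q3,
        "max": xs[-1],
        "IQR": iqr,
        # 1.5 x IQR rule: values outside these fences would be called outliers.
        "lower_fence": q1 - 1.5 * iqr,
        "upper_fence": q3 + 1.5 * iqr,
    }


# Cars sold per day over a two-week period (example 14).
cars = [14, 9, 23, 7, 11, 23, 17, 11, 3, 24, 21, 2, 20, 20]
stats = one_var_stats(cars)

for name, value in stats.items():
    print(f"{name}: {value}")
```

Under this quartile convention the sketch gives Q1 = 9, median = 15.5, and Q3 = 21 for the car data, so the IQR is 12 and no day falls outside the outlier fences.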
