AP STATISTICS LESSON 1-2 (DAY 1)
Describing Distributions with Numbers

Essential Question: What are the measures of center, and when and how are they used?

To find the mean of a data set.
To find the median of a data set.
To compare the mean and the median of a data set.

∑ (capital Greek letter sigma) in the formula for the mean is short for "add them all up."

The Mean $\bar{x}$
To find the mean of a set of observations, add their values and divide by the number of observations. If the n observations are $x_1, x_2, \dots, x_n$, their mean is
$$\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n}$$
or, in more compact notation,
$$\bar{x} = \frac{1}{n}\sum x_i$$

Input the following data sets into your calculator:
Aaron's: 13, 27, 26, 44, 30, 39, 40, 34, 45, 44, 24, 32, 44, 39, 29, 44, 38, 47, 34, 40, and 20.
Bond's: 16, 19, 24, 25, 25, 33, 33, 34, 34, 37, 37, 40, 42, 46, 49, and 73.
Ruth's: 54, 59, 35, 41, 46, 25, 47, 60, 54, 46, 49, 46, 41, 34, and 22.

Warm-up
Describe or write a formula for the mean and median of a data set.

The Median M
The median M is the midpoint of a distribution, the number such that half the observations are smaller and the other half are larger. To find the median of a distribution:
1. Arrange all observations in order of size, from smallest to largest.
2. If the number of observations n is odd, the median M is the center observation in the ordered list.
3. If the number of observations n is even, the median M is the mean of the two center observations in the ordered list.

Comparing Mean and Median
The median is a resistant measure and the mean is not. (Resistant means that the measure is not strongly influenced by extreme observations.)
The mean and median of a roughly symmetric distribution are close together.
In a skewed distribution, the mean is farther out in the long tail than the median.
The situation and the intentions of the reporter may influence which measure of center is used.

Measures of Spread
Spread can be thought of as variability.
Range: the difference between the largest and smallest observations.
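As a rough check on calculator work, the mean, median, and range described above can be sketched in a few lines of Python. The function names (`mean`, `median`, `data_range`) are illustrative choices, not part of the lesson; the data are the Bond's and Ruth's lists given earlier. The last two lines drop the value 73 from Bond's data to show how the mean moves while the median stays put.

```python
def mean(values):
    """Add up all observations and divide by how many there are."""
    return sum(values) / len(values)


def median(values):
    """Midpoint of the ordered list.

    Odd n: the center observation.
    Even n: the mean of the two center observations.
    """
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def data_range(values):
    """Largest observation minus smallest observation."""
    return max(values) - min(values)


bonds = [16, 19, 24, 25, 25, 33, 33, 34, 34, 37, 37, 40, 42, 46, 49, 73]
ruth = [54, 59, 35, 41, 46, 25, 47, 60, 54, 46, 49, 46, 41, 34, 22]

print(mean(bonds))        # 35.4375
print(median(bonds))      # 34.0  (n = 16, even: mean of the 8th and 9th values)
print(median(ruth))       # 46    (n = 15, odd: the 8th value)
print(data_range(bonds))  # 57

# Resistance: remove the extreme value 73 and compare.
bonds_without_73 = [x for x in bonds if x != 73]
print(mean(bonds_without_73))    # about 32.93, noticeably lower
print(median(bonds_without_73))  # 34, unchanged
```

The one extreme season pulls the mean up by about 2.5 but leaves the median where it was, which is what "resistant" means in practice.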
The Quartiles Q1 and Q3
To calculate the quartiles:
1. Arrange the observations in increasing order and locate the median M in the ordered list of observations.
2. The first quartile Q1 is the median of the observations whose position in the ordered list is to the left of the location of the overall median.
3. The third quartile Q3 is the median of the observations whose position in the ordered list is to the right of the location of the overall median.

The Interquartile Range
The interquartile range (IQR) is the distance between the first and third quartiles:
$$\mathrm{IQR} = Q_3 - Q_1$$

Outliers: The 1.5 × IQR Criterion
Call an observation an outlier if it falls more than 1.5 × IQR above the third quartile or below the first quartile.

The Five-Number Summary
The five-number summary of a data set consists of the smallest observation, the first quartile, the median, the third quartile, and the largest observation, written in order from smallest to largest. In symbols, the five-number summary is:
Minimum   Q1   M   Q3   Maximum

Boxplot (Modified)
A modified boxplot is a graph of the five-number summary, with outliers plotted individually.
[Figure: Box Plot and Modified Box Plot]
A central box spans the quartiles.
A line in the box marks the median.
Observations more than 1.5 × IQR outside the central box are plotted individually.
Lines extend from the box out to the smallest and largest observations that are not outliers.

Measure of Spread: Standard Deviation s
The variance $s^2$ of a set of observations is the average of the squares of the deviations of the observations from their mean, computed with a divisor of n − 1 rather than n. In symbols, the variance of n observations $x_1, x_2, \dots, x_n$ is
$$s^2 = \frac{(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + \cdots + (x_n - \bar{x})^2}{n - 1}$$
or, more compactly,
$$s^2 = \frac{1}{n - 1}\sum (x_i - \bar{x})^2$$

The Standard Deviation
The standard deviation s is the square root of the variance $s^2$:
$$s = \sqrt{\frac{1}{n - 1}\sum (x_i - \bar{x})^2}$$
The quantity n − 1 is referred to as the degrees of freedom. It is found by subtracting one from the number of observations in the data set.
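The quartile steps, the IQR, and the 1.5 × IQR criterion from this part of the lesson can be sketched in Python as follows. The function names (`five_number_summary`, `outliers`) are illustrative assumptions, and the quartiles follow the lesson's rule (median of the observations to the left and right of the overall median's location). Calculators and software packages sometimes use slightly different quartile rules, so small differences from this sketch are possible.

```python
def median(ordered):
    """Median of a list that is already sorted."""
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def five_number_summary(values):
    """Return (minimum, Q1, M, Q3, maximum) using the lesson's quartile rule."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]           # observations left of the median's location
    upper = ordered[half + n % 2:]   # observations right of it (skips the center value when n is odd)
    return (ordered[0], median(lower), median(ordered), median(upper), ordered[-1])


def outliers(values):
    """Observations more than 1.5 x IQR below Q1 or above Q3."""
    _, q1, _, q3, _ = five_number_summary(values)
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    return [x for x in values if x < low_fence or x > high_fence]


bonds = [16, 19, 24, 25, 25, 33, 33, 34, 34, 37, 37, 40, 42, 46, 49, 73]
ruth = [54, 59, 35, 41, 46, 25, 47, 60, 54, 46, 49, 46, 41, 34, 22]

print(five_number_summary(bonds))  # (16, 25.0, 34.0, 41.0, 73)
print(outliers(bonds))             # [73]  (IQR = 16, so the upper fence is 41 + 24 = 65)
print(five_number_summary(ruth))   # (22, 35, 46, 54, 60)
print(outliers(ruth))              # []
```

In a modified boxplot of Bond's data, the value 73 would be plotted as an individual point and the upper line would stop at 49, the largest observation that is not an outlier.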
Properties of the Standard Deviation
s measures spread about the mean and should be used only when the mean is chosen as the measure of center.
s = 0 only when there is no spread. This happens only when all observations have the same value. Otherwise, s > 0. As the observations become more spread out about their mean, s gets larger.
s, like the mean $\bar{x}$, is not resistant. Strong skewness or a few outliers can make s very large.
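These properties can be checked numerically with a small Python sketch. The helper names (`variance`, `std_dev`) are illustrative, and the constant list `[5, 5, 5, 5]` is made up purely to show the s = 0 case; the other data are Bond's values from earlier in the lesson.

```python
import math


def variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    x_bar = sum(values) / n
    return sum((x - x_bar) ** 2 for x in values) / (n - 1)


def std_dev(values):
    """Standard deviation s: the square root of the variance."""
    return math.sqrt(variance(values))


# No spread: every observation has the same value, so s = 0.
print(std_dev([5, 5, 5, 5]))  # 0.0

# s is not resistant: one extreme observation inflates it.
bonds = [16, 19, 24, 25, 25, 33, 33, 34, 34, 37, 37, 40, 42, 46, 49, 73]
bonds_without_73 = [x for x in bonds if x != 73]
print(std_dev(bonds))             # about 13.6
print(std_dev(bonds_without_73))  # about 9.6
```

Dropping the single value 73 cuts s by roughly 30 percent, which illustrates why s is best paired with the mean for reasonably symmetric data without outliers.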