AP Statistics: 5E Section 1.3

In Section 1.2 we discussed the shape of a distribution and briefly touched on outliers. In Section 1.3 we discuss outliers in more detail, as well as measures of center and spread.

The two most common measures of center are the mean and the median.

To find the mean of a set of data values, find the sum of the values and divide by the number of observations. (n will always be used to represent the number of data values.)

$$\text{mean} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{\sum x_i}{n}$$

NOTATION:
Mean of a sample: $\bar{x}$
Mean of a population: $\mu$

The median of a set of data values is a number such that roughly half the values are smaller and roughly half are larger.

To find the median:
1. Arrange the values from smallest to largest.
2. If n is odd, the median is the center value.
3. If n is even, the median is the mean of the two center values.

EXAMPLE: It is commonly believed that “normal” human body temperature is 98.6°F (or 37°C). In fact, “normal” temperature can vary from person to person, and for a given person it can vary over the course of a day. The table below gives a set of temperature readings of a healthy woman taken over a two-day period. Find the mean and the median.

[Table of temperature readings not reproduced.]

mean: 97.911
median: 97.7

The lowest recorded temperature was 97.2. Assume this temperature was changed to 96.8, and recompute
the mean: 97.867
and the median: 97.7

The mean changes, but the median does not.

A statistic is resistant if it is relatively unaffected by outliers.

• In a roughly symmetric distribution, the mean and median are approximately equal.
• In a right-skewed distribution, the mean is larger than the median.
• In a left-skewed distribution, the mean is smaller than the median.

The most common measures of spread, or variability, are the range, IQR and standard deviation.

The range of a set of data is simply the difference between the largest and smallest values.

Before we can define the IQR we need to discuss the 1st and 3rd quartiles (Q1 and Q3).
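
Before turning to quartiles, the resistance of the median (and the lack of resistance of the mean) described above can be checked with a short Python sketch. The data values are made up for illustration (they are not the temperature readings from the example), and the helper name `describe` is an assumption of this sketch, not something from the course materials.

```python
from statistics import mean, median

# Made-up illustrative values (NOT the temperature readings from the example).
values = [2, 4, 4, 5, 7, 8, 12]


def describe(data):
    """Return (mean, median) of a list of numbers."""
    return mean(data), median(data)


original = describe(values)

# Swap the largest value for an extreme one to mimic an outlier.
with_outlier = describe(values[:-1] + [120])

print("original:     mean = %.2f, median = %s" % original)
print("with outlier: mean = %.2f, median = %s" % with_outlier)
```

With these made-up values the mean moves from 6 to about 21.43 while the median stays at 5, which is the behavior the word "resistant" describes.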
Q1 is the median of the values that are to the left of the median in the ordered list.
Q3 is the median of the values that are to the right of the median in the ordered list.

Note: The quartiles (including Q2, the median) divide a set of data into roughly 4 equal parts with 25% in each part.

The IQR, or interquartile range, is Q3 - Q1.

Like the median, Q1 and Q3 are resistant to outliers. Thus the IQR is also resistant.

The IQR Rule for Outliers: An observation in a set of data is an outlier if it is
less than Q1 - 1.5 × IQR
OR
greater than Q3 + 1.5 × IQR

Example: The data at the right shows the amount of fat (in grams) for McDonald’s beef sandwiches. Determine whether there are any outliers in this data.

[Data table not reproduced.]

IQR = 29 - 21 = 8
21 - 1.5(8) = 9
29 + 1.5(8) = 41
Since 43 > 41, it is an outlier.

The five-number summary consists of
Minimum, Q1, Median, Q3, Maximum

A boxplot, or box-and-whisker plot, is a graph for displaying the five-number summary.

Procedure for constructing a boxplot:
1. Construct a horizontal scale that includes the minimum and maximum values.
2. Construct a rectangle that extends from Q1 to Q3 and draw a line in the rectangle at the median.
3. Draw lines extending out from the rectangle to the most extreme values that are not outliers.
4. Identify each outlier individually with a symbol such as an asterisk.

Example: Barry Bonds set the major league record for homeruns in a season when he hit 73 HRs in 2001. Here are the data on the number of HRs Bonds hit in each of his 21 complete seasons.

16, 25, 24, 19, 33, 25, 34, 46, 37, 33, 42, 40, 37, 34, 49, 73, 46, 45, 45, 26, 28

Construct a boxplot for these values.

Min = 16, Q1 = 25.5, Med = 34, Q3 = 45, Max = 73
IQR = 45 - 25.5 = 19.5
25.5 - 1.5(19.5) = -3.75
45 + 1.5(19.5) = 74.25
No outliers

TI-83/84: Put data in L1. Press STATPLOT (2nd function of Y=) and press ENTER. Select ON and arrow down. Select the boxplot type that is first in the 2nd row by using the right arrow, press ENTER, and arrow down.
Be sure the Xlist is where you have your data and the freq. is 1. Press GRAPH. Press ZOOM, select option 9 for ZoomStat and press ENTER. The boxplot should be displayed. If you press the TRACE key and use the arrows, the calculator will give you the values in the five-number summary and any outliers.

The final measure of spread we need to consider is standard deviation.

A deviation is the difference between a data value and the mean.

Note: The sum of the deviations should be zero.

The standard deviation of a set of data values is the typical distance between a data value and the mean.

$$\text{Standard deviation} = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}}$$

NOTATION:
Standard deviation of a sample: $s$
Standard deviation of a population: $\sigma$

• The standard deviation will have the same unit of measure as the data.
• The standard deviation is always ≥ 0, and equals zero when all the data values are equal (i.e. no variability).

The variance of a set of data is equal to the square of the standard deviation.

NOTATION:
Variance of a sample: $s^2$
Variance of a population: $\sigma^2$

TI-83/84: Standard deviation is found in the same way as the mean or five-number summary. Always use Sx!

Example: Find the standard deviation for the data on Bonds's HRs.
12.687

Example: Rank the measures of spread (range, IQR or standard deviation) as to their resistance to outliers, from least to most resistant.
Range, Standard deviation, IQR

Since the standard deviation measures spread about the mean, it should always be used as the measure of spread when the mean is used as the measure of center. In the same way, the IQR should be used as the measure of spread when the median is used as the measure of center. Because of their resistance to outliers, the median and the IQR are usually better when describing a skewed distribution or a distribution with outliers. The mean and the standard deviation are best used when the distribution is roughly symmetric.
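
As a cross-check on the calculator work, the five-number summary, the IQR rule, and the sample standard deviation for the Bonds data can be sketched in Python. This is a minimal sketch, not the course's required method: the function names `five_number_summary` and `iqr_outliers` are assumptions made here, and the quartiles are computed with the median-of-halves rule from these notes (other software may use slightly different quartile conventions).

```python
from statistics import mean, median, stdev

# Bonds's home runs per season, copied from the example in these notes.
home_runs = [16, 25, 24, 19, 33, 25, 34, 46, 37, 33, 42,
             40, 37, 34, 49, 73, 46, 45, 45, 26, 28]


def five_number_summary(data):
    """Return (min, Q1, median, Q3, max) using the median-of-halves rule.

    Assumes at least two data values.
    """
    ordered = sorted(data)
    half = len(ordered) // 2
    lower = ordered[:half]     # values to the left of the median
    upper = ordered[-half:]    # values to the right of the median
    return (ordered[0], median(lower), median(ordered),
            median(upper), ordered[-1])


def iqr_outliers(data):
    """Return (low fence, high fence, outliers) under the 1.5 x IQR rule."""
    _, q1, _, q3, _ = five_number_summary(data)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return low, high, [x for x in data if x < low or x > high]


summary = five_number_summary(home_runs)
low_fence, high_fence, outliers = iqr_outliers(home_runs)

# stdev() divides by n - 1, matching Sx on the TI-83/84.
s = stdev(home_runs)

# The deviations from the mean should sum to (essentially) zero.
x_bar = mean(home_runs)
deviation_sum = sum(x - x_bar for x in home_runs)

print("five-number summary:", summary)
print("fences:", low_fence, high_fence, "outliers:", outliers)
print("sample standard deviation: %.3f" % s)
print("sum of deviations: %.10f" % deviation_sum)
```

Run on the 21 seasons, this reproduces the values worked out above: a five-number summary of 16, 25.5, 34, 45, 73, fences at -3.75 and 74.25 with no outliers, and a sample standard deviation of about 12.687.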