Chapter 3 Statistics for Describing, Exploring and Comparing Data

Section 3.2 Measures of Center

A measure of center is the value at the center or middle of a data set.
-mean, median, mode and midrange

Notation:
$\sum$ denotes the sum of a set of values.
$x$ is the variable usually used to represent the individual data values.
$n$ represents the number of data values in a sample.
$N$ represents the number of data values in a population.

Note: When rounding the value of a measure of center, carry one more decimal place than is present in the original data set.
-Round only the final answer, not the calculations in the middle.

Arithmetic Mean
Definition: the measure of center obtained by adding the values and dividing the total by the number of values.
Formula:
$\bar{x} = \frac{\sum x}{n}$ (mean of sample values)
$\mu = \frac{\sum x}{N}$ (mean of all values in a population)
Comment:
-Advantages: is relatively reliable; takes every data value into account.
-Disadvantage: is sensitive to every data value, so one extreme value can affect it dramatically; it is not a resistant measure of center.
Example:
Find the mean of 12, 14, 10, 8, 16, 8, 16
Find the mean of 6, 9, 15, 12, 5, 4, 7, 3

Median
Definition: the middle value when the original data values are arranged in order of increasing (or decreasing) magnitude.
First sort the values (arrange them in order), then follow one of these two procedures:
-If the number of data values is odd, the median is the number located in the exact middle of the list.
-If the number of data values is even, the median is found by computing the mean of the two middle numbers.
Comment: The median is not affected by an extreme value; it is a resistant measure of center.
Example:
Find the median of 12, 14, 10, 8, 16, 8, 16
Find the median of 6, 9, 15, 12, 5, 4, 7, 3

Mode
Definition: the value that occurs with the greatest frequency.
-Bimodal: two data values occur with the same greatest frequency.
-Multimodal: more than two data values occur with the same greatest frequency.
-No mode: no data value is repeated.
Comment: The mode is the only measure of central tendency that can be used with nominal data.
Example:
Find the mode of 12, 14, 10, 8, 16, 8, 16
Find the mode of 6, 9, 15, 12, 5, 4, 7, 3

Midrange
Definition: the value midway between the maximum and minimum values in the original data set.
Formula: $\text{midrange} = \frac{\text{max value} + \text{min value}}{2}$
Comment: Sensitive to extremes because it uses only the maximum and minimum values, so it is rarely used.
Example:
Find the midrange of 12, 14, 10, 8, 16, 8, 16
Find the midrange of 6, 9, 15, 12, 5, 4, 7, 3

Mean from a frequency distribution:
Assume that all sample values in each class are equal to the class midpoint. Use the class midpoint of each class as the variable $x$ and weight it by the class frequency $f$:
$\bar{x} = \frac{\sum (f \cdot x)}{\sum f}$
Example: Find the mean.

Section 3.3 Measures of Variation

A measure of variation describes how much the data values vary.
-range, standard deviation and variance

Note: When rounding the value of a measure of variation, carry one more decimal place than is present in the original data set.
-Round only the final answer, not the calculations in the middle.

Range
Definition: the difference between the maximum and minimum values.
Formula: $\text{range} = (\text{max value}) - (\text{min value})$
Comment: Very sensitive to extreme values; therefore it is not as useful as other measures of variation.
Example: Find the range
a) 12, 14, 10, 8, 16
b) 6, 9, 15, 12, 5, 4

Standard deviation
Definition: a measure of variation of values about the mean.
Formula:
$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$ (sample s.d.)
$s = \sqrt{\frac{n \sum x^2 - (\sum x)^2}{n(n - 1)}}$ (sample s.d. shortcut)
$\sigma = \sqrt{\frac{\sum (x - \mu)^2}{N}}$ (population s.d.)
Because we generally deal with sample data, we will usually use the formula for the sample standard deviation.
Comment:
-Most commonly used measure of variation in statistics.
-The value of the standard deviation can increase dramatically with the inclusion of one or more outliers (data values far away from all others).
-Values close together have a small standard deviation, but values with much more variation have a larger standard deviation.
-The units of the standard deviation are the same as the units of the original data values.
Example: Find the standard deviation
a) 12, 14, 10, 8, 16
b) 6, 9, 15, 12, 5, 4

Variance
Definition: a measure of variation equal to the square of the standard deviation.
Formula:
$s^2$ (sample variance)
$\sigma^2$ (population variance)
Example: Find the variance
a) 12, 14, 10, 8, 16
b) 6, 9, 15, 12, 5, 4
The sample variance is an unbiased estimator of the population variance, which means values of the sample variance tend to target the value of the population variance.

Standard deviation from a frequency distribution:
-$x$ represents the class midpoint
-$f$ represents the frequency
-$n$ represents the total number of sample values (add up all the frequencies)
$s = \sqrt{\frac{n \left[ \sum (f \cdot x^2) \right] - \left[ \sum (f \cdot x) \right]^2}{n(n - 1)}}$
Example: Find the standard deviation.

Section 3.4 Relative Standing and Boxplots

This section introduces measures of relative standing, which are numbers showing the location of data values relative to the other values within a data set. They can be used to compare values from different data sets, or to compare values within the same data set. The most important concept is the z score. We will also discuss percentiles and quartiles, as well as a new statistical graph called the boxplot.

Z-score (or standardized value): the number of standard deviations that a given value $x$ is above or below the mean. (Round z scores to 2 decimal places.)
$z = \frac{x - \bar{x}}{s}$ (sample)
$z = \frac{x - \mu}{\sigma}$ (population)
Example: For men, the heights yield a sample mean of 68.34 in. and a sample standard deviation of 3.02 in.; the weights yield a sample mean of 172.55 lb and a sample standard deviation of 26.33 lb. Which value is more extreme: a 76.2 in. man or a 237.1 lb man?

Interpreting z-scores:
Whenever a value is less than the mean, its corresponding z score is negative.
Ordinary values: $-2 \le z \text{ score} \le 2$
Unusual values: $z \text{ score} < -2$ or $z \text{ score} > 2$
Example: The U.S. Army requires women's heights to be between 58 inches and 80 inches. Women have heights with a mean of 63.6 inches and a standard deviation of 2.5 inches. Find the z-scores corresponding to the minimum and maximum height requirements. Determine whether the minimum and maximum heights are unusual.

Percentiles: Percentiles are measures of location. There are 99 percentiles, denoted P1, P2, P3, ..., P99, which divide a set of data into 100 groups with about 1% of the values in each group. For example, the 40th percentile, denoted P40, has about 60% of the data values above it and about 40% below it.
Finding the percentile of a data value:
$\text{percentile of } x = \frac{\text{number of values less than } x}{\text{total number of values}} \cdot 100\%$
Example: 34, 36, 39, 43, 51, 53, 62, 63, 73, 79
What is the percentile of the 51?
What is the percentile of the 73?

Quartiles: Quartiles are measures of location, denoted Q1, Q2 and Q3, which divide a set of data into four groups with about 25% of the values in each group.
Q1 (First Quartile) separates the bottom 25% of sorted values from the top 75%.
Q2 (Second Quartile) is the same as the median; it separates the bottom 50% of sorted values from the top 50%.
Q3 (Third Quartile) separates the bottom 75% of sorted values from the top 25%.
Example: 34, 36, 39, 43, 51, 53, 62, 63, 73, 79
What is Q2?
What is Q1?
What is Q3?

Boxplots: A boxplot (or box-and-whisker diagram) is a graph of a data set that consists of a line extending from the minimum value to the maximum value, and a box with lines drawn at the first quartile, Q1; the median; and the third quartile, Q3.
[Figure: boxplot labeled, from left to right, minimum, Q1, Q2 (median), Q3, maximum]
Example: A simple random sample of pages from Merriam Webster Dictionary was obtained. Listed below are the numbers of defined words on those pages, arranged in order. Construct a boxplot and include the values of the five-number summary (min value, Q1, median, Q3 and max value). Also determine if there are any outliers.
34, 36, 39, 43, 51, 53, 62, 63, 73, 79
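
The measures of center and variation from Sections 3.2 and 3.3 can be checked with a few lines of Python. The sketch below is one possible way to do it, not part of the original notes: the function names are made up for illustration, and the data lists are the ones used in the "Find the ..." examples. It computes the sample standard deviation twice, once from the defining formula and once from the shortcut formula, so the two can be compared.

```python
import math
from collections import Counter


def mean(values):
    """Arithmetic mean: add the values, then divide by how many there are."""
    return sum(values) / len(values)


def median(values):
    """Middle value of the sorted data (mean of the two middle values if n is even)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def modes(values):
    """All values tied for the greatest frequency; empty list if no value repeats."""
    counts = Counter(values)
    top = max(counts.values())
    if top == 1:
        return []  # "no mode" in the sense used in the notes
    return sorted(v for v, c in counts.items() if c == top)


def midrange(values):
    """Value midway between the maximum and the minimum."""
    return (max(values) + min(values)) / 2


def value_range(values):
    """Maximum minus minimum."""
    return max(values) - min(values)


def sample_sd(values):
    """Sample standard deviation from the defining formula (divide by n - 1)."""
    x_bar = mean(values)
    n = len(values)
    return math.sqrt(sum((x - x_bar) ** 2 for x in values) / (n - 1))


def sample_sd_shortcut(values):
    """Sample standard deviation from the shortcut formula."""
    n = len(values)
    sum_x = sum(values)
    sum_x2 = sum(x * x for x in values)
    return math.sqrt((n * sum_x2 - sum_x ** 2) / (n * (n - 1)))


# Data sets from the examples in Sections 3.2 and 3.3.
center_a = [12, 14, 10, 8, 16, 8, 16]
center_b = [6, 9, 15, 12, 5, 4, 7, 3]
variation_a = [12, 14, 10, 8, 16]
variation_b = [6, 9, 15, 12, 5, 4]

for data in (center_a, center_b):
    print(data, mean(data), median(data), modes(data), midrange(data))

for data in (variation_a, variation_b):
    s = sample_sd(data)
    # Round only at the end, as the rounding note recommends.
    print(data, value_range(data), round(s, 4), round(s ** 2, 4))
```

Returning an empty list from `modes` when nothing repeats is a design choice made here to match the "no mode" case; Python's built-in `statistics.multimode` would instead return every value in that situation.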
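
The z-score and percentile-of-a-value formulas from Section 3.4 translate almost directly into code. The following is a minimal sketch under the definitions given in the notes; the function names (`z_score`, `is_unusual`, `percentile_of`) are hypothetical and chosen only for this illustration. The numbers are taken from the men's height and weight example, the Army height example, and the ten-value percentile example.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd


def is_unusual(z):
    """Unusual values have z < -2 or z > 2; everything else is ordinary."""
    return z < -2 or z > 2


def percentile_of(x, values):
    """Percentage of the data values that are strictly less than x."""
    below = sum(1 for v in values if v < x)
    return 100 * below / len(values)


# Men's heights (in.) and weights (lb): which value is more extreme?
z_height = z_score(76.2, 68.34, 3.02)
z_weight = z_score(237.1, 172.55, 26.33)
print(round(z_height, 2), round(z_weight, 2))
print("more extreme:", "height" if abs(z_height) > abs(z_weight) else "weight")

# Army height requirement for women: 58 in. minimum, 80 in. maximum.
z_min = z_score(58, 63.6, 2.5)
z_max = z_score(80, 63.6, 2.5)
print(round(z_min, 2), is_unusual(z_min))
print(round(z_max, 2), is_unusual(z_max))

# Percentile of a data value in the ten-value example.
data = [34, 36, 39, 43, 51, 53, 62, 63, 73, 79]
print(percentile_of(51, data), percentile_of(73, data))
```

Comparing absolute z-scores is what makes values measured in different units (inches and pounds) comparable: the value with the larger absolute z-score is farther from its own mean in standard-deviation units.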
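
The five-number summary for the dictionary example can also be sketched in code, with two caveats. The notes do not say which rule to use for locating quartiles or for flagging outliers, so both rules below are assumptions: quartiles are found with the locator $L = \frac{k}{100} \cdot n$ (if $L$ is a whole number, average the $L$th and $(L+1)$th sorted values; otherwise round $L$ up), and outliers are values more than 1.5 interquartile ranges below Q1 or above Q3. Other textbooks and software packages use different quartile conventions and may give slightly different values. All function names are hypothetical.

```python
import math


def percentile_value(sorted_values, k):
    """k-th percentile (1 <= k <= 99) using the locator L = (k / 100) * n.

    Assumed convention: if L is a whole number, average the L-th and
    (L+1)-th values; otherwise round L up and take that value.
    """
    n = len(sorted_values)
    locator = k * n / 100
    if locator == int(locator):
        i = int(locator)
        return (sorted_values[i - 1] + sorted_values[i]) / 2
    return sorted_values[math.ceil(locator) - 1]


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) for the data."""
    ordered = sorted(values)
    return (
        ordered[0],
        percentile_value(ordered, 25),
        percentile_value(ordered, 50),
        percentile_value(ordered, 75),
        ordered[-1],
    )


def outliers(values):
    """Values beyond the assumed 1.5 * IQR fences below Q1 or above Q3."""
    _, q1, _, q3, _ = five_number_summary(values)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [v for v in values if v < low or v > high]


# Numbers of defined words per page from the dictionary example.
words_per_page = [34, 36, 39, 43, 51, 53, 62, 63, 73, 79]

print(five_number_summary(words_per_page))
print(outliers(words_per_page))
```

The five values returned by `five_number_summary` are exactly the positions needed to draw the boxplot: the ends of the line are the minimum and maximum, and the box is drawn from Q1 to Q3 with a line at the median.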