Chapter 2: Describing Data with Numerical Measures

Notation: A data set consisting of n measurements will be denoted by $x_1, x_2, \ldots, x_n$. The sum of all these measurements will be written as $x_1 + x_2 + \cdots + x_n = \sum_{i=1}^{n} x_i$, where Σ, the summation sign, is called sigma.

Section 2.2 Measures of Center

We will consider the mean, the median, and the mode.

• The sample mean of n measurements is the sum of the measurements divided by n:
  $\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$
  The population mean is denoted by µ.
• The sample median of n measurements is the middle value when the measurements are ordered. If n is odd, the median is the middle value; its position is $(n+1)/2$. If n is even, the median is the average of the middle two values; their positions are $n/2$ and $n/2 + 1$.
• The sample mode is the measurement with the highest number of occurrences. For grouped data, the modal class is the class with the highest frequency.

Example: Find the mean, median, and mode of 1, 2, 2, 2, 3, 3, 3, 3, 3, 5, 5, 4, 4.

Basic Shapes (md denotes the median)
• Skewed left ($\bar{x}$ < md)
• Symmetric ($\bar{x}$ ≈ md)
• Skewed right ($\bar{x}$ > md)

Section 2.3 Measures of Variability or Spread

These measure the extent of variation (or spread) around the center. Examples of such measures are the range, the variance, and the standard deviation.

• The sample range of a data set is the largest observation minus the smallest observation.
• The sample variance of n measurements is the sum of squared deviations from the mean divided by n – 1:
  $s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2$
  For calculation, use
  $s^2 = \frac{1}{n-1} \left[ \sum_{i=1}^{n} x_i^2 - \frac{1}{n} \left( \sum_{i=1}^{n} x_i \right)^2 \right]$
• The sample standard deviation is the positive square root of the variance, $s = \sqrt{s^2}$.

The population variance is denoted by $\sigma^2$ and the population standard deviation is denoted by σ.

Example: Ex. 2.14, page 61.
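The measures of center and spread defined in Sections 2.2 and 2.3 can be checked numerically. The sketch below is only an illustration, not part of the original notes: it uses Python's standard `statistics` module on the Section 2.2 example data, and the variable names (`var_definition`, `var_shortcut`, and so on) are hypothetical. It computes the variance twice, once from the definition and once from the calculation formula, to show that the two agree.

```python
import math
import statistics

# Data from the Section 2.2 example.
data = [1, 2, 2, 2, 3, 3, 3, 3, 3, 5, 5, 4, 4]
n = len(data)

# Measures of center.
mean = statistics.mean(data)      # sum of the measurements divided by n
median = statistics.median(data)  # middle value of the ordered data
mode = statistics.mode(data)      # most frequently occurring measurement

# Measures of spread.
data_range = max(data) - min(data)  # largest minus smallest observation

# Definition formula: sum of squared deviations from the mean, over n - 1.
var_definition = sum((x - mean) ** 2 for x in data) / (n - 1)

# Calculation formula: needs only the sum of x and the sum of x squared.
sum_x = sum(data)
sum_x2 = sum(x * x for x in data)
var_shortcut = (sum_x2 - sum_x ** 2 / n) / (n - 1)

# Standard deviation: positive square root of the variance.
std_dev = math.sqrt(var_shortcut)

print(f"n = {n}, mean = {mean:.4f}, median = {median}, mode = {mode}")
print(f"range = {data_range}")
print(f"s^2 (definition)  = {var_definition:.4f}")
print(f"s^2 (calculation) = {var_shortcut:.4f}")
print(f"s = {std_dev:.4f}")
```

For this data set the script reports a mean of about 3.08, a median and mode of 3, a range of 4, and $s^2 \approx 1.41$ under both formulas, so $s \approx 1.19$.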
Discussion: the effect on the mean and standard deviation of adding or deleting observations from the data.

Section 2.4 Interpreting the Standard Deviation

Tchebyshev’s Rule (for all data sets):
• The interval $(\bar{x} - 2s, \bar{x} + 2s)$ contains at least 3/4 of the data set.
• The interval $(\bar{x} - 3s, \bar{x} + 3s)$ contains at least 8/9 of the data set.
• It is possible that the interval $(\bar{x} - s, \bar{x} + s)$ will contain very few of the measurements.

Empirical Rule (for mound-shaped frequency distributions):
• Approximately 68% of the measurements fall within 1 standard deviation of the mean, $(\bar{x} - s, \bar{x} + s)$.
• Approximately 95% of the measurements fall within 2 standard deviations of the mean, $(\bar{x} - 2s, \bar{x} + 2s)$.
• Essentially all the measurements fall within 3 standard deviations of the mean, $(\bar{x} - 3s, \bar{x} + 3s)$.

Note that Range = R ≈ 4s, or s ≈ R/4.

Example: Ex. 2.17, page 68.

Can the sample variance be greater than the sample standard deviation? Explain.

Section 2.6 Measures of Position (or Measures of Relative Standing)

Examples are the z-scores and the percentiles.

• The z-score is a standardized score for an observation x, defined as
  $z = \frac{x - \bar{x}}{s}$

Example: The average height of men is 69 inches with a standard deviation of 2.8 inches. The average height of women is 63.6 inches with a standard deviation of 2.5 inches. Michael Jordan is 78 inches tall. Rebecca Lobo is 76 inches tall. Calculate the z-scores for Michael and Rebecca.

z-scores can be used to identify outliers. If z < −3 or z > 3, the observation is an outlier.

Example: Body temperatures of healthy human children have mean = 98.60°F and standard deviation = 0.62°F. Your child has a temperature of 101°F. What should you do?

• The pth percentile of n measurements is the value such that p percent of the measurements are less than that value and (100 – p) percent are greater.
  25th percentile ≡ lower quartile or 1st quartile. This is denoted $Q_1$.
  50th percentile ≡ median or 2nd quartile. This is denoted $Q_2$.
  75th percentile ≡ upper quartile or 3rd quartile. This is denoted $Q_3$.

Note that $Q_1$ is in position $(n+1)/4$ and $Q_3$ is in position $3(n+1)/4$ when values are ordered from smallest to largest.

Supplementary #2, #3
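The z-score examples and the quartile position rule from Section 2.6 can be worked through in a few lines of Python. This is a minimal sketch, not part of the original notes: the function names (`z_score`, `is_outlier`, `quartile_positions`, `value_at_position`) are hypothetical, and two choices are assumptions of this sketch rather than statements from the notes, namely resolving a fractional position by linear interpolation between the two neighbouring ordered values and requiring n ≥ 3 so that both positions fall between 1 and n.

```python
import math

def z_score(x, mean, std):
    """Standardized score: number of standard deviations x lies from the mean."""
    return (x - mean) / std

def is_outlier(z, cutoff=3.0):
    """Outlier rule from the notes: z < -3 or z > 3."""
    return abs(z) > cutoff

def quartile_positions(n):
    """Positions of Q1 and Q3 in the ordered data: (n+1)/4 and 3(n+1)/4."""
    if n < 3:
        # ASSUMPTION of this sketch: for n < 3 the positions fall outside 1..n.
        raise ValueError("need at least 3 measurements")
    return (n + 1) / 4, 3 * (n + 1) / 4

def value_at_position(ordered, pos):
    """Value at a 1-based position in ordered data.

    ASSUMPTION: a fractional position is resolved by linear interpolation
    between the two neighbouring ordered values.
    """
    lower = math.floor(pos)
    frac = pos - lower
    if frac == 0:
        return ordered[lower - 1]
    return ordered[lower - 1] + frac * (ordered[lower] - ordered[lower - 1])

# z-scores for the height and body-temperature examples.
z_jordan = z_score(78, 69, 2.8)
z_lobo = z_score(76, 63.6, 2.5)
z_child = z_score(101, 98.6, 0.62)

for name, z in [("Jordan", z_jordan), ("Lobo", z_lobo), ("child", z_child)]:
    print(f"{name}: z = {z:.2f}, outlier = {is_outlier(z)}")

# Quartiles of the Section 2.2 example data, ordered smallest to largest.
ordered = sorted([1, 2, 2, 2, 3, 3, 3, 3, 3, 5, 5, 4, 4])
q1_pos, q3_pos = quartile_positions(len(ordered))
q1 = value_at_position(ordered, q1_pos)
q3 = value_at_position(ordered, q3_pos)
print(f"Q1 at position {q1_pos}: {q1}")
print(f"Q3 at position {q3_pos}: {q3}")
```

With these numbers the three z-scores are about 3.21, 4.96, and 3.87, so each observation is flagged by the |z| > 3 rule. For the 13-value example data, the quartile positions are 3.5 and 10.5; the neighbouring values are equal in both cases, so $Q_1 = 2$ and $Q_3 = 4$ regardless of how fractional positions are handled.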