Chapter 4
Numerical Methods for Describing Data

Section 4.1 Describing the Center of a Data Set

Population characteristic: a fixed value about a population that is typically unknown.
Suppose we want to know the MEAN length of all the fish in Lake Sam Rayburn . . .
Is this a value that is known? Can we find it out?
At any given point in time, how many values are there for the mean length of fish in the lake?

Statistic: a value calculated from a sample.
Suppose we want to know the MEAN length of all the fish in Lake Sam Rayburn. What can we do to estimate this unknown population characteristic?

Measures of Central Tendency
Mode: the observation that occurs most often
• There can be more than one mode
• If all values occur only once, there is no mode
• Not used as often as the mean & median

Measures of Central Tendency
The mean of a set of numerical observations is just the familiar arithmetic average: the sum of the observations divided by the number of observations.

Important Notations
• \(x\) = the variable for which we have sample data
• \(n\) = the number of observations in the sample (the sample size)
• \(x_1\) = the first observation in the sample
• \(x_2\) = the second observation in the sample
  …
• \(x_n\) = the nth (last) observation in the sample

Battery Life Example
We might have a sample consisting of n = 4 observations on x = battery lifetime (in hours):
\(x_1 = 5.9,\; x_2 = 7.3,\; x_3 = 6.6,\; x_4 = 5.7\)
• \(x_1\) is just the first observation in the data set and not necessarily the smallest observation
• \(x_n\) is the last observation but not necessarily the largest

More Notation
The sum of \(x_1, x_2, \ldots, x_n\) can be denoted by \(x_1 + x_2 + \cdots + x_n\), but writing this out could be a daunting task for a large sample. The Greek letter Σ (capital sigma) is traditionally used in mathematics to denote summation.
• In particular, \(\sum x\) denotes the sum of all the x values in the data set under consideration.

Sample Mean
The sample mean of a numerical sample \(x_1, x_2, x_3, \ldots, x_n\), denoted \(\bar{x}\), is
\[ \bar{x} = \frac{\text{sum of all observations in the sample}}{\text{number of observations in the sample}} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{\sum x}{n} \]

Fancytown Example
During a two-week period, 10 houses were sold in Fancytown. Calculate the sample mean.

House Price in Fancytown (x)
  225,000
  311,000
  299,000
  310,000
  285,000
  315,000
  291,000
  287,000
  300,000
  287,000
\(\sum x\) = 2,910,000

\[ \bar{x} = \frac{\sum x}{n} = \frac{2{,}910{,}000}{10} = 291{,}000 \]
The average (or mean) price for this sample of 10 houses in Fancytown is $291,000.

Lowtown Example
During a two-week period, 10 houses were sold in Lowtown. Calculate the sample mean.

House Price in Lowtown (x)
  97,000
  93,000
  110,000
  121,000
  113,000
  95,000
  100,000
  122,000
  99,000
  2,000,000   (outlier)
\(\sum x\) = 2,950,000

\[ \bar{x} = \frac{\sum x}{n} = \frac{2{,}950{,}000}{10} = 295{,}000 \]
The average (or mean) price for this sample of 10 houses in Lowtown is $295,000.

Reflections on the Sample Mean Calculations
Looking at the dotplots of the samples for Fancytown and Lowtown, we can see that the Fancytown mean of $291,000 appears to accurately represent the "center" of the Fancytown data, but the Lowtown mean of $295,000 is not representative of the Lowtown data: nine of the ten Lowtown houses sold for $122,000 or less. Clearly, the mean can be greatly affected by the presence of even a single outlier.

[Figure: dotplots for Fancytown and Lowtown on a common price scale; the $2,000,000 Lowtown sale is marked as an outlier.]

Describing the Center of a Data Set with the arithmetic mean
The population mean, denoted by µ, is the average of all x values in the entire population.

Important Note
• The value of \(\bar{x}\) varies from sample to sample.
• There is only one value for µ.

Drawback with the Mean
One potential drawback to the mean as a measure of center for a data set is that its value can be greatly affected by the presence of even a single outlier (an unusually large or small observation) in the data set.
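The effect of a single outlier on the mean can be checked numerically. The following is a minimal Python sketch using the Fancytown and Lowtown prices from the examples above; the helper name `sample_mean` is introduced here for illustration only.

```python
# House prices from the Fancytown and Lowtown examples.
fancytown = [225_000, 311_000, 299_000, 310_000, 285_000,
             315_000, 291_000, 287_000, 300_000, 287_000]
lowtown = [97_000, 93_000, 110_000, 121_000, 113_000,
           95_000, 100_000, 122_000, 99_000, 2_000_000]


def sample_mean(xs):
    """Sample mean: the sum of the observations divided by their number."""
    return sum(xs) / len(xs)


fancy_mean = sample_mean(fancytown)
low_mean = sample_mean(lowtown)

# Recompute the Lowtown mean with the $2,000,000 sale left out,
# to see how much that one observation pulls the mean upward.
low_mean_no_outlier = sample_mean([x for x in lowtown if x != 2_000_000])

print(f"Fancytown mean: {fancy_mean:,.0f}")
print(f"Lowtown mean: {low_mean:,.0f}")
print(f"Lowtown mean without the outlier: {low_mean_no_outlier:,.0f}")
```

Dropping the one extreme sale moves the Lowtown mean from $295,000 to roughly $105,600, which is much closer to where most of the Lowtown prices sit.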
Describing the Center of a Data Set with the median
The sample median is obtained by first ordering the n observations from smallest to largest (with any repeated values included, so that every sample observation appears in the ordered list). Then
\[ \text{sample median} = \begin{cases} \text{the single middle value} & \text{if } n \text{ is odd} \\ \text{the mean of the middle two values} & \text{if } n \text{ is even} \end{cases} \]

Population Median
The population median is the middle value in the ordered list consisting of all population observations. The population median plays the same role for the population as the sample median plays for the sample.

The Median
The stability of the median is what justifies its use as a measure of center in some situations. Income distributions are commonly summarized by reporting the median rather than the mean, because otherwise a few very high salaries could result in a mean that is not representative of a typical salary.

Median Calculation
Consider the Fancytown data. Calculate the median house value for Fancytown.

House Price in Fancytown (x): 225,000  311,000  299,000  310,000  285,000  315,000  291,000  287,000  300,000  287,000

First, we put the data in increasing numerical order to get:
225,000  285,000  287,000  287,000  291,000  299,000  300,000  310,000  311,000  315,000

Since there is an even number of data values, the median is the mean of the two values in the middle.
\[ \text{median} = \frac{291{,}000 + 299{,}000}{2} = 295{,}000 \]
The median house price in Fancytown is $295,000.

Another Median Calculation
Consider the Lowtown data. Calculate the median house value for Lowtown.

House Price in Lowtown (x): 97,000  93,000  110,000  121,000  113,000  95,000  100,000  122,000  99,000  2,000,000

We put the data in increasing numerical order to get:
93,000  95,000  97,000  99,000  100,000  110,000  113,000  121,000  122,000  2,000,000
\[ \text{median} = \frac{100{,}000 + 110{,}000}{2} = 105{,}000 \]
The median house price in Lowtown is $105,000.

Imagine a ruler with pennies placed at 3", 4", 5", 6", 8" and 10". To balance the ruler on your finger, you would need to place your finger at the mean of 6.
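The two median calculations above follow the same recipe: sort, then take the middle value or the mean of the two middle values. A short Python sketch of that recipe is given here; `sample_median` is a name chosen for this illustration.

```python
def sample_median(xs):
    """Median as defined above: order the data, then take the single
    middle value (n odd) or the mean of the two middle values (n even)."""
    ordered = sorted(xs)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


fancytown = [225_000, 311_000, 299_000, 310_000, 285_000,
             315_000, 291_000, 287_000, 300_000, 287_000]
lowtown = [97_000, 93_000, 110_000, 121_000, 113_000,
           95_000, 100_000, 122_000, 99_000, 2_000_000]

fancy_median = sample_median(fancytown)
low_median = sample_median(lowtown)

print(f"Fancytown median: {fancy_median:,.0f}")
print(f"Lowtown median: {low_median:,.0f}")
```

Unlike the Lowtown mean of $295,000, the Lowtown median of $105,000 is barely influenced by the $2,000,000 sale: replacing that price with any other value above $110,000 leaves the median unchanged.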
The mean is the balance point of a distribution.

Comparing the Sample Mean & Sample Median
Notice from the preceding pictures that the median splits the area in the distribution in half and the mean is the point of balance. Typically,
1. when a distribution is skewed positively, the mean is larger than the median,
2. when a distribution is skewed negatively, the mean is smaller than the median, and
3. when a distribution is symmetric, the mean and the median are equal.

Mean vs. Median
• In a skewed distribution, the mean is pulled in the direction of the skewness.
• In a symmetrical distribution, you should report the mean!
• In a skewed distribution, the median should be reported as the measure of center!

The Trimmed Mean
A trimmed mean is computed by first ordering the data values from smallest to largest, deleting a selected number of values from each end of the ordered list, and finally computing the mean of the remaining values.
The trimming percentage is the percentage of values deleted from each end of the ordered list.

The Trimmed Mean
The purpose is to remove outliers from a data set.
To calculate a trimmed mean:
• Multiply the percent to trim by n
• Truncate that many observations from BOTH ends of the distribution (when listed in order)
• Calculate the mean with the shortened data set

Find the mean of the following set of data:
12  14  19  20  22  24  25  26  26  50
Mean = 23.8
Find the 10% trimmed mean.
10%(10) = 1, so remove one observation from each side!
14  19  20  22  24  25  26  26
\[ \bar{x}_T = \frac{176}{8} = 22 \]

Fancytown Trimmed Mean
Calculate the 10% trimmed mean for Fancytown.

House Price in Fancytown (ordered)
  231,000
  285,000
  287,000
  294,000
  297,000
  299,000
  312,000
  313,000
  315,000
  317,000
\(\sum x\) = 2,950,000

The sum of the eight middle values is 2,402,000. Divide this value by 8 to obtain the 10% trimmed mean.

  \(\bar{x}\) = 295,000
  median = 298,000 (the mean of the two middle values, 297,000 and 299,000)
  10% trimmed mean = 2,402,000 / 8 = 300,250

Summary of Trimmed Means
A trimmed mean with a small to moderate trimming percentage (between 5% and 25%) is less affected by outliers than the mean, but it is not as insensitive as the median.
Is the median affected by extreme values? NO
Is the mean affected by extreme values? YES

Sample Proportion for Categorical Data
The sample proportion of successes, denoted by \(\hat{p}\), is
\[ \hat{p} = \frac{\text{number of S's in the sample}}{n} \]
where S is the label used for the response designated as success.
The population proportion of successes is denoted by p.

Tampering with Automobile Antipollution Equipment Example
The use of antipollution equipment on automobiles has substantially improved air quality in certain areas. Unfortunately, many car owners have tampered with smog control devices to improve performance. Suppose that a sample of 15 cars is selected and that each car is classified as S or F, according to whether or not tampering has taken place. The resulting data are:
S F S S S F F S S F S S S F F
Counting tampering (S) as a success, there are 9 S's among the 15 cars, so the sample proportion of successes is
\[ \hat{p} = \frac{9}{15} = 0.60 \]
That is, 60% of the sample responses are S's. In 60% of the cars sampled, there has been tampering with the air pollution control devices.

Section 4.2 Describing Variability in a Data Set

Why is the study of variability important?
• Does this can of soda contain exactly 12 ounces?
• There is variability in virtually everything.
• It allows us to distinguish between usual & unusual values.
• Reporting only a measure of center doesn't provide a complete picture of the distribution.

[Figure: three dotplots on a common scale from 20 to 70.]
Notice that these three data sets all have the same mean and median (at 45), but they have very different amounts of variability.
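Looking back at the trimmed mean from Section 4.1, the three-step procedure (multiply the trimming percent by n, drop that many values from both ends, average the rest) can be sketched in Python as follows. The function name `trimmed_mean` and the choice to raise an error when nothing would be left are assumptions of this sketch, not part of the slides.

```python
def trimmed_mean(xs, trim_percent):
    """Trimmed mean: drop trim_percent% of the ordered values from EACH end,
    then average what is left."""
    ordered = sorted(xs)
    n = len(ordered)
    k = int(n * trim_percent / 100)      # truncate to a whole number of values
    if 2 * k >= n:
        raise ValueError("trimming percentage removes every observation")
    kept = ordered[k:n - k]
    return sum(kept) / len(kept)


data = [12, 14, 19, 20, 22, 24, 25, 26, 26, 50]
fancytown = [231_000, 285_000, 287_000, 294_000, 297_000,
             299_000, 312_000, 313_000, 315_000, 317_000]

plain_mean = sum(data) / len(data)
data_trimmed = trimmed_mean(data, 10)
fancy_trimmed = trimmed_mean(fancytown, 10)

print(plain_mean, data_trimmed, fancy_trimmed)
```

For the small data set, trimming one value from each end removes the 50 and lowers the measure of center from 23.8 to 22.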
Describing Variability
The simplest numerical measure of the variability of a numerical data set is the range, which is defined to be the difference between the largest and smallest data values.
range = maximum - minimum

Calculating Range
Calculate the range for each data set from the previous example.
[Figure: three dotplots on a common scale from 20 to 70.]
The first two data sets have a range of 50 (70 - 20), but the third data set has a much smaller range of 10.

Describing Variability
The n deviations from the sample mean are the differences
\[ x_1 - \bar{x},\; x_2 - \bar{x},\; \ldots,\; x_n - \bar{x} \]
Note: The sum of all of the deviations from the sample mean will be equal to 0, except possibly for the effects of rounding the numbers. This means that the average deviation from the mean is always 0 and cannot be used as a measure of variability.

Calculating Deviations from the Sample Mean
Suppose we caught a sample of 6 fish from the lake with the following lengths:
3", 4", 5", 6", 8", 10"
Calculate the deviations from the sample mean.
What must we find first? The mean, which is 6 inches. Now find how each observation deviates from the mean.

  x      (x - x̄)
  3      -3     (3 - 6 = -3; this is the deviation from the mean)
  4      -2
  5      -1
  6       0
  8       2
  10      4
  Sum     0

Find the rest of the deviations from the mean.
What is the sum of the deviations from the mean? 0
Will this sum always equal zero? YES
The mean is considered the balance point of the distribution because it "balances" the positive and negative deviations.

Notes on Deviations
A particular deviation is positive if the x value exceeds \(\bar{x}\) and negative if the x value is less than \(\bar{x}\).
In general, the greater the amount of variability in the sample, the larger the magnitudes (ignoring the signs) of the deviations.

Measures of Variability
Another measure of the variability in a data set uses the deviations from the mean, \(x - \bar{x}\).
Suppose that we caught a sample of 6 fish from the lake with the following lengths:
3", 4", 5", 6", 8", 10"
The mean length is 6 inches. Recall that we calculated the deviations from the mean.
What was the sum of these deviations? Can we find an average deviation? What can we do to the deviations so that we could find an average?

The customary way to prevent negative and positive deviations from counteracting one another is to square them before combining.
The estimated average of the squared deviations is called the variance:
\[ s^2 = \frac{\sum (x - \bar{x})^2}{n - 1} \]
The denominator, n - 1, is the number of degrees of freedom. The population variance is denoted by \(\sigma^2\).

Suppose that everyone in the class caught a sample of 6 fish from the lake. Would each of our samples contain the same fish? Would our mean lengths be the same? The samples would also have different ranges!
When calculating the sample variance, we use the degrees of freedom (n - 1) in the denominator instead of n because this tends to produce better estimates. Degrees of freedom will be revisited in Chapter 8.

Remember the sample of 6 fish that we caught from the lake . . .
Find the variance of the length of the fish. First square the deviations.

  x      (x - x̄)    (x - x̄)²
  3      -3          9
  4      -2          4
  5      -1          1
  6       0          0
  8       2          4
  10      4          16
  Sum     0          34

Finding the average of the deviations would always equal 0!
What is the sum of the deviations squared? 34. Divide this by 5.
\(s^2 = 34 / 5 = 6.8\)

Sample Standard Deviation
The sample standard deviation, denoted s, is the positive square root of the sample variance.
\[ s = \sqrt{s^2} = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} = \sqrt{\frac{S_{xx}}{n - 1}} \]
The population standard deviation is denoted by \(\sigma\).

Sample Variance
A large amount of variability in the sample is indicated by a relatively large value of \(s^2\) or s, whereas a value of \(s^2\) or s close to 0 indicates a small amount of variability.
For most statistical purposes, s is the desired quantity, but \(s^2\) must be computed first.
The most commonly used measures of center and variability are the mean and standard deviation, respectively.

Measures of Variability
Calculate the standard deviation for the fish sample.
\(s^2 = 6.8\) square inches, so \(s = \sqrt{6.8} \approx 2.608\) inches.
The fish lengths in our sample typically deviate from the mean of 6 inches by about 2.608 inches.
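The fish calculation above can be reproduced step by step. This is a minimal Python sketch that mirrors the deviation table; the variable names (`sxx`, `s2`) are chosen here to match the notation \(S_{xx}\) and \(s^2\).

```python
import math

lengths = [3, 4, 5, 6, 8, 10]          # fish lengths in inches

n = len(lengths)
x_bar = sum(lengths) / n               # sample mean

# Deviations from the mean; their sum is 0, so they cannot be averaged directly.
deviations = [x - x_bar for x in lengths]

# Square the deviations so positives and negatives no longer cancel.
sxx = sum(d ** 2 for d in deviations)

s2 = sxx / (n - 1)                     # sample variance, n - 1 degrees of freedom
s = math.sqrt(s2)                      # sample standard deviation

print("mean:", x_bar)
print("deviations:", deviations, "sum:", sum(deviations))
print("Sxx:", sxx, "variance:", s2, "std dev:", round(s, 3))
```

Dividing by n instead of n - 1 would give 34 / 6 ≈ 5.67, a smaller value; the slides use n - 1 because it tends to produce better estimates of the population variance.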
Apple Weight Example
A sample of 10 Macintosh apples was randomly selected and weighed (in ounces). Calculate the standard deviation of the sample.
\[ s^2 = \frac{\sum (x - \bar{x})^2}{n - 1} = \frac{5.5398}{10 - 1} = \frac{5.5398}{9} \approx 0.61554 \]
\[ s = \sqrt{0.61554} \approx 0.78456 \]

Interquartile Range
Interquartile range (iqr): the range of the middle half of the data.
What advantage does the interquartile range have over the standard deviation? The iqr is resistant to extreme values.

Quartiles
The iqr is based on quantities called quartiles. The lower quartile separates the bottom 25% of the data set from the upper 75%, and the upper quartile separates the top 25% of the data set from the bottom 75%.

Finding Quartiles
The quartiles for sample data are obtained by dividing the n ordered observations into a lower half and an upper half; if n is odd, the median is excluded from both halves.

Quartiles and the Interquartile Range
Lower quartile (Q1) = median of the lower half of the data set.
Upper quartile (Q3) = median of the upper half of the data set.
The interquartile range (iqr), a resistant measure of variability, is given by
iqr = upper quartile - lower quartile = Q3 - Q1
Note: If n is odd, the median is excluded from both the lower and upper halves of the data.

Quartiles and IQR Example
A sample of 15 students with part-time jobs was randomly selected and the number of hours worked last week was recorded. Find the interquartile range for this set of data.
19, 12, 14, 10, 12, 10, 25, 9, 8, 4, 2, 10, 7, 11, 15
The data are put in increasing order to get
2, 4, 7, 8, 9, 10, 10, 10, 11, 12, 12, 14, 15, 19, 25

Quartiles and IQR Example
With 15 data values, the median is the 8th value. Specifically, the median is 10.
Lower half: 2, 4, 7, 8, 9, 10, 10
Median: 10
Upper half: 11, 12, 12, 14, 15, 19, 25

Lower quartile (Q1) = 8
Upper quartile (Q3) = 14
iqr = 14 - 8 = 6

The Chronicle of Higher Education (2009-2010 issue) published the accompanying data on the percentage of the population with a bachelor's or higher degree in 2007 for each of the 50 states and the District of Columbia.

21 27 35 25 22 26 27 30 38 32
25 26 24 25 26 29 19 29 31 24
33 30 22 19 22 34 35 24 24 28
30 35 29 27 26 17 26 20 27 30
25 47 20 23 23 23 26 27 34 25
34

Find the interquartile range for this set of data.
First put the data in order & find the median.

17 19 19 20 20 21 22 22 22 23
23 23 24 24 24 24 25 25 25 25
25 26 26 26 26 26 26 27 27 27
27 27 28 29 29 29 30 30 30 30
31 32 33 34 34 34 35 35 35 38
47

With 51 values, the median is the 26th value, which is 26.
Find the lower quartile (Q1) by finding the median of the lower half: Q1 = 24.
Find the upper quartile (Q3) by finding the median of the upper half: Q3 = 30.
iqr = 30 - 24 = 6

Quartiles and iqr
The resistant nature of the interquartile range follows from the fact that up to 25% of the smallest sample observations and up to 25% of the largest sample observations can be made more extreme without affecting the value of the interquartile range.

Special Note on Rounding
Protection against adverse rounding effects can almost always be achieved by using four digits of decimal accuracy.

Section 4.3 Summarizing a Data Set: Boxplots

Boxplots
A boxplot is a picture that conveys information about the most important features of a data set: center, spread, extent of skewness, and presence of outliers.

Boxplots
What are some advantages of boxplots?
• ease of construction
• convenient handling of outliers
• construction is not subjective (like histograms)
• used with medium or large size data sets (n > 10)
• useful for comparative displays

Boxplots
When to use: univariate numerical data. Use for moderate to large data sets. Don't use with data sets of n < 10.
The five-number summary is the smallest observation, first quartile, median, third quartile, and largest observation.
How to construct a skeleton boxplot:
• Calculate the five-number summary
• Draw a horizontal (or vertical) scale
• Construct a rectangular box from the lower quartile (Q1) to the upper quartile (Q3)
• Draw a line inside the rectangle at the median value
• Draw lines from the lower quartile to the smallest observation and from the upper quartile to the largest observation
To describe: comment on the center, spread, and shape of the distribution and on any unusual features.

Remember the data on the percentage of the population with a bachelor's or higher degree in 2007 for each of the 50 states and the District of Columbia.

17 19 19 20 20 21 22 22 22 23
23 23 24 24 24 24 25 25 25 25
25 26 26 26 26 26 26 27 27 27
27 27 28 29 29 29 30 30 30 30
31 32 33 34 34 34 35 35 35 38
47

First draw a scale. Draw a box from Q1 to Q3. Draw a line for the median. Draw lines for the whiskers.
[Figure: skeleton boxplot of the data; horizontal axis: Percentages, 10 to 50.]

Outliers
An observation is an outlier if it is more than 1.5(iqr) away from the closest end of the box (less than the lower quartile minus 1.5(iqr) or more than the upper quartile plus 1.5(iqr)).
An outlier is extreme if it is more than 3(iqr) from the closest end of the box, and it is mild otherwise.

Modified Boxplots
A modified boxplot represents mild outliers by shaded circles and extreme outliers by open circles. Whiskers extend on each end to the most extreme observations that are not outliers.
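The quartile and outlier rules stated above can be applied mechanically. The following Python sketch does so for the bachelor's degree data, using the slides' convention that the median is left out of both halves when n is odd; the function names are introduced for this illustration. Note that library routines for quartiles often use different interpolation conventions and may give slightly different values.

```python
def median(ordered):
    """Median of an already-sorted list."""
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def quartiles(values):
    """Q1 and Q3 as medians of the lower and upper halves.
    When n is odd, the median is excluded from both halves."""
    ordered = sorted(values)
    half = len(ordered) // 2
    return median(ordered[:half]), median(ordered[len(ordered) - half:])


# Percent with a bachelor's or higher degree, 50 states and DC (ordered).
percents = [17, 19, 19, 20, 20, 21, 22, 22, 22, 23,
            23, 23, 24, 24, 24, 24, 25, 25, 25, 25,
            25, 26, 26, 26, 26, 26, 26, 27, 27, 27,
            27, 27, 28, 29, 29, 29, 30, 30, 30, 30,
            31, 32, 33, 34, 34, 34, 35, 35, 35, 38, 47]

q1, q3 = quartiles(percents)
iqr = q3 - q1

# Fences: beyond 1.5(iqr) is an outlier, beyond 3(iqr) is an extreme outlier.
lower_fence, upper_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
lower_extreme, upper_extreme = q1 - 3 * iqr, q3 + 3 * iqr

outliers = [x for x in percents if x < lower_fence or x > upper_fence]
extreme = [x for x in outliers if x < lower_extreme or x > upper_extreme]
mild = [x for x in outliers if x not in extreme]

# Whiskers of a modified boxplot stop at the most extreme non-outliers.
inside = [x for x in percents if lower_fence <= x <= upper_fence]
whiskers = (min(inside), max(inside))

print("Q1, Q3, iqr:", q1, q3, iqr)
print("fences:", lower_fence, upper_fence, "extreme fences:", lower_extreme, upper_extreme)
print("mild outliers:", mild, "extreme outliers:", extreme, "whiskers:", whiskers)
```

Under these rules the single value 47 lies beyond the upper fence of 39 but inside the extreme fence of 48, so it is a mild outlier, and the upper whisker stops at 38.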
Remember the data on the percentage of the population with a bachelor's or higher degree in 2007 for each of the 50 states and the District of Columbia.

17 19 19 20 20 21 22 22 22 23
23 23 24 24 24 24 25 25 25 25
25 26 26 26 26 26 26 27 27 27
27 27 28 29 29 29 30 30 30 30
31 32 33 34 34 34 35 35 35 38
47

First, draw the scale, the box, and the line for the median.
Next calculate the fences for the outliers.
  24 - 1.5(6) = 15
  30 + 1.5(6) = 39
  30 + 3(6) = 48
There is one outlier at the upper end of the distribution (47), but none at the lower end. Is it extreme? No: 47 is below 48, so it is a mild outlier.
Draw lines for the whiskers. Place a solid dot for the outlier.
[Figure: modified boxplot of the data; horizontal axis: Percentages, 10 to 50.]

To describe:
The distribution of the percent of the population with a bachelor's degree or higher for the U.S. states and District of Columbia is positively skewed with an outlier at 47%. The median percentage is 26%, with a range of 30%.

Symmetrical boxplots
Notice that all 3 boxplots are identical, but their corresponding histograms are very different. Can you determine the number of modes from a boxplot?

Approximately symmetrical boxplot
Notice that the range of the lower half and the range of the upper half of this distribution are approximately equal, so we can say that it is approximately symmetrical.

Skewed boxplot
However, the ranges of the two halves of this distribution are definitely different sizes, so it would be skewed in the direction of the longest side.

The 2009-2010 salaries of NBA players published on the web site hoopshype.com were used to construct the comparative boxplot of salary data for five teams. Discuss the similarities and differences.

Modified Boxplot Example
Consider the ages of the 79 students from the classroom data set from Chapter 3. Create a modified boxplot for the data below.
Student ages (in increasing order):
17 18 18 18 18 18 19 19 19 19
19 19 19 19 19 19 19 19 19 19
19 19 19 19 19 19 20 20 20 20
20 20 20 20 20 20 21 21 21 21
21 21 21 21 21 21 21 21 21 21
22 22 22 22 22 22 22 22 22 22
22 23 23 23 23 23 23 24 24 24
25 26 28 28 30 37 38 44 47

Lower quartile = 19, median = 21, upper quartile = 22
iqr = 22 - 19 = 3
Lower quartile - 1.5(iqr) = 14.5
Upper quartile + 1.5(iqr) = 26.5
Lower quartile - 3(iqr) = 10
Upper quartile + 3(iqr) = 31
Mild outliers: 28, 28, 30
Extreme outliers: 37, 38, 44, 47

Modified Boxplot Example
Here is the modified boxplot for the student age data.
[Figure: modified boxplot of student age on a horizontal scale from 15 to 50, with labels for the smallest data value that isn't an outlier, the largest data value that isn't an outlier, the mild outliers, and the extreme outliers.]

Modified Boxplot Example
Here is the same boxplot reproduced with a vertical orientation.
[Figure: the same modified boxplot on a vertical scale from 15 to 50.]

Comparative Boxplot Example
By putting boxplots of two separate groups or subgroups on a common scale, we can compare their distributional behaviors.
Describe the similarities and differences between the two groups.
The distributions of female and male student weights have similar shapes, although the females are roughly 20 pounds lighter (as a group).
[Figure: comparative boxplots of Student Weight by Gender (Males, Females); horizontal axis: Student Weight, 100 to 240.]

Comparative Boxplot Example
[Figure: Boxplots of Age by Gender (Male, Female); vertical axis: Age, 20 to 50; means are indicated by solid red circles.]

Section 4.4 Interpreting Center and Variability: Chebyshev's Rule, the Empirical Rule, and z Scores

Interpreting Center & Variability
Chebyshev's Rule: The percentage of observations that are within k standard deviations of the mean is at least
\[ 100\left(1 - \frac{1}{k^2}\right)\% \quad \text{where } k > 1 \]
This rule can be used with any distribution, no matter its shape!
If k = 2, then
\[ 100\left(1 - \frac{1}{2^2}\right)\% = 75\% \]
so at least 75% of the observations are within 2 standard deviations of the mean.

Interpreting Variability: Chebyshev's Rule
For specific values of k, Chebyshev's Rule reads
• At least 75% of the observations are within 2 standard deviations of the mean.
• At least 89% of the observations are within 3 standard deviations of the mean.
• At least 90% of the observations are within 3.16 standard deviations of the mean.
• At least 94% of the observations are within 4 standard deviations of the mean.
• At least 96% of the observations are within 5 standard deviations of the mean.
• At least 99% of the observations are within 10 standard deviations of the mean.

For a sample of families with one preschool child, it was reported that the mean child care time per week was approximately 36 hours with a standard deviation of approximately 12 hours.
Using Chebyshev's Rule, at least 75% of the sample observations must be between 12 and 60 hours (within 2 standard deviations of the mean).
At most, what percent of the observations are greater than 72 hours?
At least 89% of the observations are between 0 & 72 hours (within 3 standard deviations of the mean). Since time can't be negative, at most 11% of the observations are above 72 hours.

Example - Chebyshev's Rule
Consider the student age data

17 18 18 18 18 18 19 19 19 19
19 19 19 19 19 19 19 19 19 19
19 19 19 19 19 19 20 20 20 20
20 20 20 20 20 20 21 21 21 21
21 21 21 21 21 21 21 21 21 21
22 22 22 22 22 22 22 22 22 22
22 23 23 23 23 23 23 24 24 24
25 26 28 28 30 37 38 44 47

[Figure: the data values are color coded by whether they fall within 1, 2, 3, 4, or 5 standard deviations of the mean.]

Example - Chebyshev's Rule
Summarizing the student age data

  Interval                                     Chebyshev's (at least)    Actual
  within 1 standard deviation of the mean      0%                        72/79 = 91.1%
  within 2 standard deviations of the mean     75%                       75/79 = 94.9%
  within 3 standard deviations of the mean     88.9%                     76/79 = 96.2%
  within 4 standard deviations of the mean     93.8%                     77/79 = 97.5%
  within 5 standard deviations of the mean     96.0%                     79/79 = 100%

Notice that Chebyshev gives very conservative lower bounds and the values aren't very close to the actual percentages.

What's my area?
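Before turning to the calculator activity, the Chebyshev comparison table for the student age data can be checked numerically. This Python sketch rebuilds the 79 ages from a frequency table (a reconstruction of the slide's data listing, so treat the counts as an assumption), computes the sample mean and standard deviation, and compares the guaranteed percentage with the actual one; `chebyshev_bound` is a name chosen for this illustration.

```python
import math

# Frequency table reconstructed from the student age listing (age: count).
age_counts = {17: 1, 18: 5, 19: 20, 20: 10, 21: 14, 22: 11, 23: 6, 24: 3,
              25: 1, 26: 1, 28: 2, 30: 1, 37: 1, 38: 1, 44: 1, 47: 1}
ages = [age for age, count in age_counts.items() for _ in range(count)]

n = len(ages)
mean = sum(ages) / n
s = math.sqrt(sum((x - mean) ** 2 for x in ages) / (n - 1))


def chebyshev_bound(k):
    """Chebyshev's Rule: at least 100(1 - 1/k^2)% lie within k standard
    deviations of the mean (the bound is uninformative, 0%, for k = 1)."""
    return max(0.0, 100 * (1 - 1 / k ** 2))


counts_within = []
for k in range(1, 6):
    within = sum(1 for x in ages if abs(x - mean) <= k * s)
    counts_within.append(within)
    print(f"k={k}: at least {chebyshev_bound(k):.1f}%, "
          f"actual {within}/{n} = {100 * within / n:.1f}%")
```

The actual percentages are well above the guaranteed minimums at every k, which is the sense in which Chebyshev's bounds are conservative.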
Input the following command into a graphing calculator in order to graph a normal curve with a mean of 20 and standard deviation of 3:
• Y1 = normalpdf(X,20,3)   (Window x: [10,30]  y: [0,0.2])
• Use the command 2nd TRACE, 7 to find the area under the curve for each pair of limits below. (Round to 4 decimal places.)
  • Lower limit: 17   Upper limit: 23   Area: ____________________
  • Lower limit: 14   Upper limit: 26   Area: ____________________
  • Lower limit: 11   Upper limit: 29   Area: ____________________

What's my area?
Graph a normal curve with a mean of 50 and standard deviation of 5.
• Y1 = normalpdf(X,50,5)   (Window x: [30,70]  y: [0,0.1])
• Find the area under the curve for the following:
  • Lower limit: 45   Upper limit: 55   Area: ________
  • Lower limit: 40   Upper limit: 60   Area: ________
  • Lower limit: 35   Upper limit: 65   Area: ________
What pattern do you notice?

Chebyshev's Rule
Chebyshev's Rule states that at least 75% of the observations in a data set are within 2 standard deviations of the mean. However, in many data sets substantially more than 75% of the values satisfy this condition.

Interpreting Center & Variability
Empirical Rule
• Approximately 68% of the observations are within 1 standard deviation of the mean
• Approximately 95% of the observations are within 2 standard deviations of the mean
• Approximately 99.7% of the observations are within 3 standard deviations of the mean
Can ONLY be used with distributions that are mound shaped!
[Figure: normal curve with the central 68%, 95%, and 99.7% regions marked.]

The height of male students at PWSH is approximately normally distributed with a mean of 71 inches and standard deviation of 2.5 inches.
a) What percent of the male students are shorter than 66 inches? About 2.5%
b) Taller than 73.5 inches? About 16%
c) Between 66 & 73.5 inches? About 81.5%

Empirical Rule vs. Chebyshev's Rule
The Empirical Rule makes "approximately" instead of "at least" statements, and the percentages for k = 1, 2, and 3 standard deviations are much higher than those guaranteed by Chebyshev's Rule.

Empirical Rule vs.
Chebyshev's Rule
In contrast to Chebyshev's Rule, dividing the percentages in half is permissible because a normal curve is symmetric.

Empirical Rule
Another reminder!! The Empirical Rule can only be used if the histogram of values in a data set is reasonably symmetric and unimodal (specifically, is reasonably approximated by a normal curve).

Empirical Rule
It is unusual to see an observation from a normally distributed population that is farther than 2 standard deviations from the mean (only 5%), and it is very surprising to see one that is more than 3 standard deviations away.

z Scores
The z score is how many standard deviations the observation is from the mean.
A positive z score indicates the observation is above the mean and a negative z score indicates the observation is below the mean.
The z score corresponding to a particular observation in a data set is calculated as
\[ z\text{ score} = \frac{\text{observation} - \text{mean}}{\text{standard deviation}} \]

What do these z scores mean?
  -2.3    2.3 standard deviations below the mean
   1.8    1.8 standard deviations above the mean
  -4.3    4.3 standard deviations below the mean

z Scores
Computing the z score is often referred to as standardization, and the z score is called a standardized score.
The formula used with sample data is
\[ z\text{ score} = \frac{x - \bar{x}}{s} \]

Sally is taking two different math achievement tests with different means and standard deviations. The mean score on test A was 56 with a standard deviation of 3.5, while the mean score on test B was 65 with a standard deviation of 2.8. Sally scored a 62 on test A and a 69 on test B. On which test did Sally score the best?

z-score on test A:
\[ z = \frac{62 - 56}{3.5} = 1.714 \]
z-score on test B:
\[ z = \frac{69 - 65}{2.8} = 1.429 \]
She did better on test A.

Measures of Relative Standing
Percentiles: a value in the data set where r percent of the observations fall AT or BELOW that value.

In addition to weight and length, head circumference is another measure of health in newborn babies.
The National Center for Health Statistics reports the following summary values for head circumference (in cm) at birth for boys.

  Head circumference (cm)    Percentile
  32.2                       5
  33.2                       10
  34.5                       25
  35.8                       50
  37.0                       75
  38.2                       90
  38.6                       95

What percent of newborn boys had head circumferences greater than 37.0 cm? 25%
10% of newborn boys have head circumferences bigger than what value? 38.2 cm
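Both measures of relative standing in this section reduce to short calculations. The Python sketch below repeats Sally's z score comparison and the two percentile questions about head circumference; the function name `z_score` and the dictionary layout are assumptions made for this illustration.

```python
def z_score(observation, mean, standard_deviation):
    """Number of standard deviations the observation lies from the mean."""
    return (observation - mean) / standard_deviation


# Sally's two tests: (score, test mean, test standard deviation).
z_a = z_score(62, 56, 3.5)
z_b = z_score(69, 65, 2.8)
better_test = "A" if z_a > z_b else "B"
print(f"test A: z = {z_a:.3f}, test B: z = {z_b:.3f}, better on test {better_test}")

# Head circumference at birth for boys: value in cm -> percentile.
percentile_of = {32.2: 5, 33.2: 10, 34.5: 25, 35.8: 50,
                 37.0: 75, 38.2: 90, 38.6: 95}

# Percent of boys ABOVE a tabled value is 100 minus its percentile.
percent_above_37 = 100 - percentile_of[37.0]

# The value exceeded by 10% of boys is the 90th percentile.
value_top_10 = next(v for v, p in percentile_of.items() if p == 100 - 10)

print(f"{percent_above_37}% are above 37.0 cm; 10% are above {value_top_10} cm")
```

Standardizing puts the two test scores on a common scale, which is why the raw score of 69 on test B does not settle the question by itself.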