Describing Location in a Distribution

Text 2.1 Measures of Relative Standing and Density Curves

Sample Data
Consider the following test scores for a small class:

79 81 80 77 73 83 74 93 78 80 75 67 77 83 86 90 79 85 83 89 84 82 77 72 73

Julia's score (86) is noted in red. How did she perform on this test relative to her peers?

6 | 7
7 | 2334
7 | 5777899
8 | 00123334
8 | 569
9 | 03

Her score is "above average"... but how far above average is it?

Standardized Value
One way to describe relative position in a data set is to tell how many standard deviations above or below the mean the observation is.

Standardized Value: "z-score"
If the mean and standard deviation of a distribution are known, the "z-score" of a particular observation, x, is:

$$z = \frac{x - \text{mean}}{\text{standard deviation}}$$

Calculating z-scores
Consider the test data and Julia's score. According to Minitab, the mean test score was 80 while the standard deviation was 6.07 points. Julia's score was above average. Her standardized score is:

$$z = \frac{x - 80}{6.07} = \frac{86 - 80}{6.07} = 0.99$$

Julia's score was almost one full standard deviation above the mean. What about Kevin (x = 72)?

Calculating z-scores
Julia: z = (86 - 80)/6.07 = 0.99 {above average = +z}
Kevin: z = (72 - 80)/6.07 = -1.32 {below average = -z}
Katie: z = (80 - 80)/6.07 = 0 {average: z = 0}

Comparing Scores
Standardized values can be used to compare scores from two different distributions.

Statistics Test: mean = 80, std dev = 6.07
Chemistry Test: mean = 76, std dev = 4

Jenny got an 86 in Statistics and 82 in Chemistry. On which test did she perform better?

Statistics: $z = \frac{86 - 80}{6.07} = 0.99$
Chemistry: $z = \frac{82 - 76}{4} = 1.5$

Although she had a lower score, she performed relatively better in Chemistry.

Percentiles
Another measure of relative standing is a percentile rank.
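Before turning to the percentile definition, the z-score arithmetic from the preceding slides can be checked with a short script. This is only a sketch: the function name `z_score` is made up for illustration, and it assumes the 6.07 reported by Minitab is the sample standard deviation (dividing by n - 1), which is what Python's `statistics.stdev` computes.

```python
from statistics import mean, stdev

# Test scores for the small class, as listed on the slide.
scores = [79, 81, 80, 77, 73, 83, 74, 93, 78, 80, 75, 67, 77,
          83, 86, 90, 79, 85, 83, 89, 84, 82, 77, 72, 73]


def z_score(x, mu, sigma):
    """Number of standard deviations x lies above (+) or below (-) mu."""
    return (x - mu) / sigma


mu = mean(scores)      # class mean
sigma = stdev(scores)  # sample standard deviation (assumed to match Minitab)

# Julia, Kevin, and Katie on the same test.
for name, x in [("Julia", 86), ("Kevin", 72), ("Katie", 80)]:
    print(f"{name}: z = {z_score(x, mu, sigma):.2f}")

# Jenny's two tests, using the summary values quoted on the slide.
z_stats = z_score(86, 80, 6.07)
z_chem = z_score(82, 76, 4)
print(f"Statistics z = {z_stats:.2f}, Chemistry z = {z_chem:.2f}")
```

Because each z-score is unit-free, the two values for Jenny can be compared directly even though the tests have different means and spreads.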
pth percentile: Value with p% of observations below it.

median = 50th percentile {mean = 50th %ile if symmetric}
Q1 = 25th percentile
Q3 = 75th percentile

6 | 7
7 | 2334
7 | 5777899
8 | 00123334
8 | 569
9 | 03

Jenny got an 86. 22 of the 25 scores are ≤ 86. Jenny is in the 22/25 = 88th %ile.

Density Curve
In Chapter 1, you learned how to plot a dataset to describe its shape, center, spread, etc. Sometimes, the overall pattern of a large number of observations is so regular that we can describe it using a smooth curve.

Density Curve: An idealized description of the overall pattern of a distribution. Area underneath = 1, representing 100% of observations.

Density Curves
Density curves come in many different shapes: symmetric, skewed, uniform, etc. The area of a region of a density curve represents the % of observations that fall in that region. The median of a density curve cuts the area in half. The mean of a density curve is its "balance point."

Example
• Pretend you are rolling a die. The numbers 1, 2, 3, 4, 5, 6 are the possible outcomes. In 120 rolls, how many of each number would you expect to roll?
• The calculator can do a simulation:
• Clear L1 in your calc. Use the random integer generator to generate 120 random whole numbers between 1 and 6, then store them in L1.
• RandInt (1, 6, 120) STO-> L1
• Set viewing window: X (1,7) by Y (-5,25).
• Specify a histogram using the data in L1.
• Repeat the simulation several times. 2nd Enter will recall/reuse the previous command.

In theory we should expect a uniform outcome...

Summary
We can describe the overall pattern of a distribution using a density curve. The area under any density curve = 1. This represents 100% of observations. Areas on a density curve represent % of observations over certain regions. An individual observation's relative standing can be described using a z-score or percentile rank.

$$z = \frac{x - \text{mean}}{\text{standard deviation}}$$

Normal Distributions
• Normal curves: symmetric, single-peaked, bell-shaped.
• Mean and median are the same.
• The size of the standard deviation affects the spread of the normal curve.

Example
• Scores on the SAT verbal test in recent years follow approximately the N(505, 110) distribution. How high must a student score in order to place in the top 10% of all students taking the SAT?
• 1. State the problem and draw a picture. Shade the area we're looking for.
• 2. Find the z-score with the table.
• 3. Convert to raw score.

Assessing Normality
• Method 1: Construct a histogram and see if the graph is approximately bell-shaped and symmetric. Median and mean should be close. Then mark off the -2, -1, +1, +2 SD points and check the 68-95-99.7 rule.

Normal Probability Plot
• Method 2: Construct a normal probability plot.
• 1. Arrange the observed data values from smallest to largest. Record what percentile of the data each value occupies (for example, the smallest observation in a set of 20 is at the 5% point, the second is at 10%, etc.).
• 2. Use Table A to find the z's at these same percentiles (for example, -1.645 is at 5%, -1.28 is at 10%).
• 3. Plot each data point against the corresponding z (x-values on the horizontal axis, z-scores on the vertical axis is what I do; either is fine).

[Figure: example normal probability plots. Panels: Normal w/ Outliers, Right Skew, Normal]

Interpretation: draw your X = Y line with a straight edge; the points shouldn't vary too much from it.

Constructing Probability Plot on Calculator
79 81 80 77 73 83 74 93 78 80 75 67 77 83 86 90 79 85 83 89 84 82 77 72 73
• Students in math class
• X values on horizontal axis
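The SAT example and the normal probability plot steps above can also be sketched in code, using Python's standard-library `statistics.NormalDist` in place of Table A. Two assumptions are worth flagging. First, the helper name `normal_plot_points` is hypothetical. Second, the slide's i/n percentile convention would place the largest observation at 100%, where no finite z exists, so this sketch swaps in the common (i - 0.5)/n adjustment instead; that is a substitution, not the slide's exact method.

```python
from statistics import NormalDist

# --- SAT verbal example: top 10% of N(505, 110) ---
std_normal = NormalDist()            # standard normal, mean 0 and sd 1
z_cut = std_normal.inv_cdf(0.90)     # z with 90% of the area to its left
cutoff = 505 + z_cut * 110           # convert the z-score back to a raw score
print(f"z = {z_cut:.2f}, raw score = {cutoff:.0f}")

# The same answer straight from the N(505, 110) distribution.
sat = NormalDist(mu=505, sigma=110)
direct_cutoff = sat.inv_cdf(0.90)


# --- Normal probability plot points for the class scores ---
def normal_plot_points(data):
    """Pair each sorted observation with the z-score at its percentile.

    Assumption: percentiles are taken as (i - 0.5) / n rather than the
    slide's i / n, so the largest value does not land on 100%.
    """
    n = len(data)
    ordered = sorted(data)
    return [(x, std_normal.inv_cdf((i - 0.5) / n))
            for i, x in enumerate(ordered, start=1)]


scores = [79, 81, 80, 77, 73, 83, 74, 93, 78, 80, 75, 67, 77,
          83, 86, 90, 79, 85, 83, 89, 84, 82, 77, 72, 73]

points = normal_plot_points(scores)
for x, z in points:
    print(f"x = {x:3d}   z = {z:+.2f}")  # x on horizontal axis, z on vertical
```

Plotting the (x, z) pairs and checking how closely they follow a straight line gives the same visual check described under Method 2.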