Percentiles

The pth percentile of a data set is the data value such that p percent of the data is less than or equal to it. If you scored at the 90th percentile on the SAT, 90% of people scored at or below your score.

The median is the 50th percentile.
The First Quartile, or Q1, is the 25th percentile.
The Third Quartile, or Q3, is the 75th percentile.
You guessed it, Q2 = Median!

The difference of Q3 and Q1 is called the Interquartile Range (IQR) and gives the range of the middle 50% of the data. It is a good measure of spread for skewed data.

IQR = Q3 − Q1

• Actually, you need to say a bit more to specify a percentile, because of repeated values and because the number of data points may not divide evenly. Conventions are inconsistent here, and we will not worry about those details.
• People also talk about deciles (10th, 20th, etc.). I have seen books define octiles, and even once nanile (ninths!!!), though I don’t believe I have ever seen these used in real life.

The Five Number Summary

The five numbers Min, Q1, Median, Q3, Max divide the data into quarters and are a convenient summary of the data set.

Ex: For grades on a test:

Min   Q1   Med   Q3   Max
37    70   82    88   99

• 25% of grades are between 37 and 70
• 25% of grades are between 70 and 82
• 25% of grades are between 82 and 88
• 25% of grades are between 88 and 99

We will represent the 5 number summary with a beautiful graphical representation, the Boxplot.

• The 5 number summary gives a similar amount of information for skewed data as the mean and standard deviation give for bell-shaped data. Why do you need 5 numbers instead of 2? Because skewed data is more complicated!

The Boxplot

Popularized by the great 20th century statistician and proponent of exploratory data analysis, John Tukey.
• Draw a number line covering the range of the data
• Draw a rectangle from Q1 to Q3
• Draw a vertical line at the Median
• Draw horizontal lines out to the Min and Max

[Figure: boxplot of the test grades on a number line from 30 to 100, with Min 37, Q1 70, Median 82, Q3 88, Max 99]

• I will usually give you the three quartiles after a test, but not the max and min, because each of those is someone’s grade!
• Notice you can see at a glance that the data is skewed left.
• Boxplots are particularly useful for comparing multiple distributions (test scores throughout the semester, perhaps).
• Boxplots are sometimes called box-and-whisker plots, I guess because the lines out to the extremes look like whiskers. I hope my whiskers don’t look like that!
• People will often stop the lines at the last score that they do not consider an outlier, and mark each outlier with a *.

z-score

Just as the percentile tells you where a particular data value fits in a skewed distribution, the z-score tells you where a value fits in a bell-shaped distribution. The z-score of a data value is the number of standard deviations it lies above (or below) the mean. Specifically,

z = (x − µ) / σ

Ex: Women’s heights are bell-shaped with a mean of µ = 65.5 in. and a standard deviation of σ = 2.5 in. The z-score of a woman whose height is 69 inches would be

z = (69 − 65.5) / 2.5 = 3.5 / 2.5 = 1.4

so she is 1.4 s.d.s above the mean.

Examples of z-score

z = (x − µ) / σ

Women’s heights are bell-shaped with a mean of 65.5 in. and an s.d. of 2.5 in.

Ex: What would be the z-score of a woman who is 60.5 inches tall?

z = (60.5 − 65.5) / 2.5 = −5 / 2.5 = −2

so she is 2 s.d.s below the mean.

Ex: What would be the z-score of a woman who is 72 inches tall?

z = (72 − 65.5) / 2.5 = 6.5 / 2.5 = 2.6

so she is 2.6 s.d.s above the mean.

z-score as Universal Measure of Distance

The z-score gives units in which every bell-shaped distribution looks approximately the same. No matter what the variable is (if symmetric, unimodal), a z-score . . .

a z-score . . .                             . . . is . . .
between −1 and 1 (about 2/3 of the data)    typical
between −2 and 2 (95%)                      usual
over 2 or under −2 (2.5% each)              unusual
over 3 or under −3 (about 1 in 1000 each)   shocking

• If you see a woman whose height z-score is between 2 and 3, you would call her tall, but you would not be shocked. If your z-score on the test was between 2 and 3, you are doing well, but you are not blowing your teacher away. If the z-score for how much TV you watch per week is between 2 and 3, everyone would say you watch a lot of TV, but no one is going to stage an intervention.

Lecture 8 Key Points

After this lecture you should be able to
• Interpret and use the quartiles and the IQR
• Interpret box plots and relate them to the shape of the histogram
• Know when to use mean and s.d. versus median, quartiles, etc.
• Compute z-scores.

After processing this lecture you should be able to
• Use the z-score to give a universal picture of the place of a data value in the data set.
• Compute percentiles, quartiles, and z-scores in Excel.
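
The lecture does these computations in Excel. As a stand-in, here is a minimal Python sketch of the five number summary, the IQR, and the z-score formula. Several parts are assumptions rather than anything from the lecture: the function names, the choice of the "inclusive" quartile convention (one of the inconsistent conventions mentioned above), and the list of grades, which is invented for illustration and is not the actual class data.

```python
import statistics


def five_number_summary(data):
    """Return (min, Q1, median, Q3, max) for a list of numbers."""
    # method="inclusive" is only one of several quartile conventions;
    # other conventions can give slightly different Q1 and Q3 on small data sets.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return min(data), q1, median, q3, max(data)


def z_score(x, mean, sd):
    """Number of standard deviations x lies above (+) or below (-) the mean."""
    return (x - mean) / sd


# Hypothetical grades, invented for illustration (not the real test scores).
grades = [37, 64, 70, 78, 82, 85, 88, 93, 99]
low, q1, med, q3, high = five_number_summary(grades)
iqr = q3 - q1  # IQR = Q3 - Q1, the range of the middle 50% of the data

# Women's heights from the lecture: mean 65.5 in., s.d. 2.5 in.
z_values = [z_score(height, 65.5, 2.5) for height in (69, 60.5, 72)]

print("Five number summary:", low, q1, med, q3, high)
print("IQR:", iqr)
print("z-scores for 69, 60.5, 72 inches:", z_values)
```

With this made-up list, the summary comes out to 37, 70, 82, 88, 99, the same five numbers as the test example, and the z-scores match the hand calculations (1.4, −2, 2.6). A different quartile convention could shift Q1 and Q3 a little, which is exactly the detail the lecture chooses not to worry about.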