Measures of Position
Where does a certain data value fit in relative to the other data values?
To accompany Hawkes lesson 3.3. Original content by D.R.S.

Nth Place
• The highest and the lowest
• 2nd highest, 3rd highest, etc.
• “If I made $60,000, I would be 6th richest.”

Another view: “How does my 𝑥 compare to the mean?”
• “Am I in the middle of the pack?”
• “Am I above or below the middle?”
• “Am I extremely high or extremely low?”
• The 𝑧 score is the measuring stick

𝑧 Score: 𝑥 is how many standard deviations away from the mean?
If you know the 𝑥 value
• Population: \(z = \frac{x - \mu}{\sigma}\)
• Sample: \(z = \frac{x - \bar{x}}{s}\)
To work backward from 𝑧 to 𝑥
• Population: \(x = z \cdot \sigma + \mu\)
• Sample: \(x = z \cdot s + \bar{x}\)

𝑧 score is also called “Standard Score”
• No matter what 𝑥 is measured in or how large or small the 𝑥 values are:
• The 𝑧 score of the mean will be 0
– Because the numerator \(x - \bar{x}\) turns out to be 0.
• If 𝑥 is above the mean, its 𝑧 is positive.
– Because the numerator \(x - \bar{x}\) turns out to be positive.
• If 𝑥 is below the mean, its 𝑧 is negative.
– Because the numerator \(x - \bar{x}\) turns out to be negative.

𝑧 score values
• Typically round to two decimal places.
– Don’t say “0.2589”, say “0.26”
• If there are fewer than two decimal places, pad with zeros
– Don’t say “2”, say “2.00”
– Don’t say “-1.1”, say “-1.10”
• 𝑧 scores are almost always in the interval \(-4 < z < 4\). Be very suspicious if you calculate a 𝑧 score that’s not a small number.

Practice: Given x, compute z
Find the 𝑧 scores corresponding to the 𝑥 salary values, given that the mean \(\bar{x}\) = $51,168 and the standard deviation 𝑠 = $16,291.
• 𝑥 = $90,000
• 𝑥 = $70,000
• 𝑥 = $50,000
• 𝑥 = $30,000
• 𝑥 = $10,000

Practice: Given z, compute x
Find the 𝑥 scores (salaries) corresponding to these 𝑧 standard scores, given that the mean \(\bar{x}\) = $51,168 and the standard deviation 𝑠 = $16,291.
• 𝑧 = 0
• 𝑧 = 1 and 𝑧 = −1
• 𝑧 = 2 and 𝑧 = −2
• 𝑧 = 3 and 𝑧 = −3

Two parallel axes (scales), 𝑥 and 𝑧

Example: Using 𝑧 scores to compare unlike items
The Literature test
• The mean score was 77 points.
• The standard deviation was 11 points
• Sue earned 91 points
• Find her 𝑧 score for this test
The Biology test
• The mean score was 47 points
• The standard deviation was 6 points
• Sue earned 55 points
• Find her 𝑧 score for this test
• On which test did she have the “better” performance?

𝑧 scores caution with negatives
• Example: compare test scores on two different tests to ascertain “Which score was the more outstanding of the two?”
• Be careful if the 𝑧 scores turn out to be negative. Which is the better performance? 𝑧 = −1.99 or 𝑧 = −0.34?
• Stop and think back to your basic number line and the meaning of “<” and “>”

Percentiles
• “What percent of the values are lower than my value?”
– 90th percentile is pretty high
– 50th percentile is right in the middle
– 10th percentile is pretty low
• If you scored in the 99th percentile on your SAT, I hope you got a scholarship.

Salary data for our percentile examples
• With these salary values again
• What’s the percentile for a salary of $59,000?
• You can see it’s going to be higher than the 50th, because it’s in the top half.

Example: Given x, find the percentile
• Count 𝑥 = how many values are below $59,000
• Count 𝑛 = how many values are in the data set
• Formula for percentile: \(p = \frac{x}{n} \cdot 100\%\)
• Here we have 𝑥 = 15 values lower than our $59,000
• Here we have 𝑛 = 20 values in the data set.
• \(p = \frac{15}{20} \cdot 100\%\), so 𝑝 = 75, “75th percentile”

Continued: Given x, find the percentile
• \(p = \frac{15}{20} \cdot 100\%\), so 𝑝 = 75
• Do not say “75%”, but say “the 75th percentile”
• Other sources use different formulas, beware!
– Some other books use 𝑥 + 0.5 in the numerator.
– Excel has two different answers, PERCENTILE.EXC and PERCENTILE.INC functions.

Given Percentile 𝑝, find the 𝑥 value
• Formula: position from bottom \(c = \frac{n \cdot p}{100}\)
– Again, 𝑛 = how many data values in the set
– and 𝑝 = the percentile rank that’s given.
• Is there a decimal remainder in position 𝑐?
– If so, then BUMP UP to the next highest whole number and take the value in that position.
– Or if 𝑐 is an exact whole number, take the average of the values in positions 𝑐 and 𝑐 + 1.
• Note: Book uses lowercase 𝑙 instead of 𝑐.

Given Percentile 𝑝, find the 𝑥 value
• Example: What is the 31st percentile in the salary data?
• 31st percentile: plug in 𝑛 = 20, 𝑝 = 31
• Compute \(c = \frac{20 \cdot 31}{100} = 6.2\). It has a remainder.
• Bump it up! 𝑐 = 7.
– Not rounding, but rather bumpety-upping
• So we look 7 positions from the bottom
• “The 31st percentile is $44,476”

Given Percentile 𝑝, find the 𝑥 value
• Example: What is the 40th percentile in the salary data? Plug in 𝑛 = 20, 𝑝 = 40
• Compute \(c = \frac{20 \cdot 40}{100} = 8\). Exact integer!
• So count the 𝑐 = 8th and 𝑐 + 1 = 9th values from the bottom.
• 40th percentile = \(\frac{47043 + 47692}{2}\)
• “The 40th percentile is $47,367.50, or $47,368.”

Excel gives different answers
• Excel does some fancy interpolation

Quartiles Q1, Q2, Q3
• Data values are arranged from low to high.
• The Quartiles divide the data into four groups.
• Q2 is just another name for the Median.
• Q1 = Find the Median of Lowest to Q2 values
• Q3 = Find the Median of Q2 to Highest values
• It gets tricky, depending on how many values.

Quartiles example
• 10, 20, 30, 40, 50, 60, 70, 80, 90
• The Second Quartile, Q2 = median = 50
• Find the medians of the subsets left and right.
• Keep the 50 in each of those subsets.
• The First Quartile, Q1 = median of {10, 20, 30, 40, 50} = 30
• The Third Quartile, Q3 = median of {50, 60, 70, 80, 90} = 70

Quartiles example
• 10, 20, 30, 40, 50, 60, 70, 80, 90, 100
• Q2 = median = \(\frac{50 + 60}{2} = 55\) (two middle #s)
• Leave the 50 and 60 in place; do not reuse 55
• Q1 = median of {10, 20, 30, 40, 50} = 30
• Q3 = median of {60, 70, 80, 90, 100} = 80

Quartiles example
• 0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110
• Q2 = median = \(\frac{50 + 60}{2} = 55\) (two middle #s).
• 55 isn’t really there so you can’t remove it!
• Leave the 50 and 60 in place
• Q1 = median of {0, 10, 20, 30, 40, 50} = 25
• Q3 = median of {60, 70, 80, 90, 100, 110} = 85
• Two middle numbers happened again!

Interquartile Range
• Definition: IQR = Q3 – Q1
• In the previous example, 85 – 25 = 60.
• Interquartile Range measures how spread out the middle of the data are
– The lowest quartile (x < Q1) is not involved
– And the highest quartile (x > Q3) is not involved.

Quartiles with TI-84
• 0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110
• Put values into a TI-84 List
• Use STAT, CALC, 1-Var Stats
• Scroll down, down, down to get to the quartiles.

There is disagreement about Quartiles
• The TI-84 sometimes gives different answers than the method we use in the Hawkes materials
• Excel might give answers that differ from both Hawkes and the TI-84.
• Use the Hawkes method in this course’s work
• Be aware of the others
– You should know how to use TI-84 and Excel
– You should be aware that differences can occur.

Quartiles with TI-84 vs. Hawkes
• 10, 20, 30, 40, 50, 60, 70, 80, 90
• We got Q1 = 30 and Q3 = 70 before.
• Hawkes keeps the 50, using 10, 20, 30, 40, 50 to compute Q1.
• But the TI-84 throws out 50 and uses 10, 20, 30, 40.
• Hawkes says the TI-84 is computing “hinges”.

Quartiles in Excel
• =QUARTILE.INC(cells, 1 or 2 or 3) seems to give the same results as the old QUARTILE function
• There’s a new =QUARTILE.EXC(cells, 1 or 2 or 3)
• Excel does fancy interpolation stuff and may give different Q1 and Q3 answers compared to the TI-84 and our by-hand methods.
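
The two by-hand conventions described above can be sketched in Python. This is an illustration only, not part of the original slides: the function names are made up, and the "drop the middle value" variant is just an assumed reading of what the slides say the TI-84 does. Excel's interpolation is not modeled.

```python
from statistics import median

def quartiles_keep_median(values):
    """Q1, Q2, Q3 by the slides' by-hand rule (hypothetical helper name).

    When n is odd, the middle value stays in BOTH halves.
    When n is even, the halves are simply the lower and upper n/2 values.
    """
    data = sorted(values)
    n = len(data)
    half = (n + 1) // 2               # includes the middle value when n is odd
    lower = data[:half]
    upper = data[n - half:]
    return median(lower), median(data), median(upper)

def quartiles_drop_median(values):
    """Q1, Q2, Q3 when the middle value is thrown out of both halves
    (assumed to mirror the TI-84 behavior described in the slides)."""
    data = sorted(values)
    n = len(data)
    half = n // 2                     # excludes the middle value when n is odd
    lower = data[:half]
    upper = data[n - half:]
    return median(lower), median(data), median(upper)

nine = [10, 20, 30, 40, 50, 60, 70, 80, 90]
twelve = [0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110]

print(quartiles_keep_median(nine))    # Q1 = 30, Q2 = 50, Q3 = 70
print(quartiles_drop_median(nine))    # Q1 = 25, Q2 = 50, Q3 = 75
q1, q2, q3 = quartiles_keep_median(twelve)
print(q3 - q1)                        # interquartile range, 85 - 25 = 60
```

With an even number of values the two functions agree, since there is no single middle value to keep or drop; the difference shows up only when n is odd.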

The Five Number Summary
• Again: 0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110
• Q2 = median = \(\frac{50 + 60}{2} = 55\), Q1 = 25 and Q3 = 85
• “The Five Number Summary” is defined as: the minimum, then Q1, Q2, Q3, then the maximum
• For this set of numbers, the Five Number Summary is “0, 25, 55, 85, 110”

The Five Number Summary
• Again: 0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110
• Q2 = 55, Q1 = 25, Q3 = 85
• Min is 0, Max is 110
• For this set of numbers, the Five Number Summary is “0, 25, 55, 85, 110”
• Box Plot: Min 0, Q1 25, Q2 55, Q3 85, Max 110
• TI-84 can do Box Plot too, but again its quartiles disagree with the way Hawkes defines quartiles.

Why Box Plot?
• Don’t lose sight of the big picture here:
– We have a data set
– It’s a bunch of numbers
– We want to summarize the data
• Summarize means make it into a sound bite
– We must be Concise: don’t say too much
– We must be Informative: don’t say too little

We must be Concise
• Bad: “Here is a report that tells you the mean and the variance and the standard deviation and the quartiles and the percentiles from 0 to 100… and the marketing survey analyzed by demographic subgroups …” (there is a place for that, but not right now)
• Good: “Got fifteen seconds? Here’s what we found.”

Notice the pieces of the boxplot:
• Horizontal scale, maybe a little beyond the min and the max. A generic number line.
• The five numbers.
• The box holds the quartiles
– With a line in the middle at the median.
• The whiskers extend out to the min and the max.

TI-84 Boxplot
• See instructions on separate handout.
• Caution again that TI-84 computes quartiles differently from Hawkes and differently from Excel, so the results aren’t always going to agree.

Additional Topics
• Might not be needed for Hawkes homework
• But you should be aware of them
• Quintiles and Deciles
• Interquartile Range and Outliers
• TI-84 Box Plot

Quintiles and Deciles
• You might also encounter
– Quintiles, dividing data set into 5 groups.
– Deciles, dividing data set into 10 groups.
• Reconcile everything back with percentiles:
– Quartiles correspond to percentiles 25, 50, 75
– Deciles correspond to percentiles 10, 20, …, 90
– Quintiles correspond to percentiles 20, 40, 60, 80

Interquartile Range and Outliers
• Concept: An OUTLIER is a wacky far-out abnormally small or large data value compared to the rest of the data set.
• We’d like something more precise.
• Define: IQR = Interquartile Range = Q3 – Q1.
• Define: If \(x < \bar{x} - 1.5 \cdot IQR\), 𝑥 is an Outlier.
• Define: If \(x > \bar{x} + 1.5 \cdot IQR\), 𝑥 is an Outlier.
• (Other books might make different definitions)

Outliers Example
• Here’s a quick elementary example:
• Data values 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 20
• Mean \(\bar{x} = 6.8\) and 𝐼𝑄𝑅 = 9 – 3 = 6
• Or in Hawkes method, 𝑄1 = 3.5, 𝑄3 = 9.5, and we still get interquartile range = 9.5 – 3.5 = 6 (it won’t always work out the same but in this case the IQR is the same either way)

Outliers Example
• Data values 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 20
• We found IQR = 6 and the mean is 6.8
• One definition uses 𝐼𝑄𝑅 ∗ 1.5 to define outliers
• Here, 6 ∗ 1.5 = 9
• Anything more than 9 units away from \(\bar{x}\) is then considered to be abnormally small or large.
• 6.8 – 9 = −2.2, and nothing is smaller than −2.2
• 6.8 + 9 = 15.8: the 20 is an outlier.

No-Outliers Example
• Data values 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 10
• Mean \(\bar{x} = 5.9\) and 𝐼𝑄𝑅 = 9 – 3 = 6 (it is a coincidence that \(\bar{x}\) and 𝐼𝑄𝑅 are nearly equal; it is not significant)
• 𝐼𝑄𝑅 ∗ 1.5 = 9
• Anything more than 9 units away from \(\bar{x}\) is abnormal. 5.9 − 9 = −3.1; 5.9 + 9 = 14.9
• This data set has No Outliers.

Outliers: Good or Bad?
• “I have an outlier in my data set. Should I be concerned?”
– Could be bad data. A bad measurement. Somebody not being honest with the pollster.
– Could be legitimately remarkable data, genuine true data that’s extraordinarily high or low.
• “What should I do about it?”
– The presence of an outlier is shouting for attention. Evaluate it and make an executive decision.
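
The outlier rule used in the two examples above (more than 1.5 ∙ IQR away from the mean) can be checked with a short Python sketch. This is an illustration only, not part of the original slides: the function names are hypothetical, and the IQR helper assumes the version of the quartiles that gives Q1 = 3 and Q3 = 9 for these data, where the middle value is left out of both halves. As the slides note, other books define outliers differently.

```python
from statistics import mean, median

def iqr_drop_median(values):
    """IQR = Q3 - Q1, with the middle value left out of both halves
    (an assumption; this reproduces Q1 = 3 and Q3 = 9 from the examples)."""
    data = sorted(values)
    half = len(data) // 2
    q1 = median(data[:half])
    q3 = median(data[len(data) - half:])
    return q3 - q1

def outliers_about_mean(values, k=1.5):
    """Values more than k * IQR away from the mean, per the slides' definition."""
    center = mean(values)
    reach = k * iqr_drop_median(values)
    low, high = center - reach, center + reach
    return [x for x in values if x < low or x > high]

with_outlier = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 20]
without_outlier = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 10]

print(outliers_about_mean(with_outlier))     # the 20 is flagged
print(outliers_about_mean(without_outlier))  # nothing is flagged
```

The code uses the unrounded mean (about 6.82 rather than 6.8), so its cutoffs differ slightly from the hand-rounded ones, but the conclusions for these two data sets are the same.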