There are three kinds of lies - lies, damned lies and statistics.
~Benjamin Disraeli, commonly misattributed to Mark Twain

APSTAT PART ONE
Exploring and Understanding Data

What is Statistics?
Chapters 1-3

What is Stat?
• Book Says:
  – A way of reasoning
  – Collection of tools and methods
  – Helps us understand the world
  – Statistics is about variation

Stat Basics
• Individuals
  – Object described by a set of data
  – People (#1), cars, animals, groups…
• Variables
  – Categorical (Qualitative) – Usually involves words
    • Examples: sex, advisor, social security #...
  – Quantitative – Involve #’s
    • Examples: age, height, income, test score…

Displaying Categorical Data
• Frequency tables:

  Favorite Color    Count
  Red                  15
  Yellow               78
  Blue                382
  Burnt Sienna         25

Displaying Categorical Data
• Relative Frequency tables:
  – Just roll up the %’s

  Favorite Color    Percent
  Red                  3%
  Yellow              15.6%
  Blue                76.4%
  Burnt Sienna         5%

Displaying Categorical Data
• Contingency Table
  – Two Way table

  Age at first “Real Kiss” (ahhhhhhhhhhhh…)
            10-14   15-19   20-24
  Male        12      13       5
  Female      18       8       2

Marginal Distribution

  Age at first “Real Kiss” (ahhhhhhhhhhhh…)
            10-14   15-19   20-24   Total
  Male        12      13       5      30
  Female      18       8       2      28
  Total       30      21       7      58

• Conditional Distribution:
  – % of males whose first kiss came when they were 10-14
  – % of 20-24 year old first kissers who were male

The Rest of Chapters 1-3
• Displaying the data
  – Pie Charts
  – Bar Charts
  – Blah Blah Blah….
• Simpson’s Paradox
  – AP MC
• Being Skeptical
  – Important for real life
  – 5 W’s + 1H
    • Ex: 4 out of 5 dentists….
  – Displaying data
    • Lies, Damned Lies, and Statistics

Showing Off Your Data
Chapters 4-5

Histograms
• Remember bar graphs? Same, but different.
• Think of sorting boxes…
  – Same size boxes
• ON TI-83
  – Enter Data into L1 (STAT>EDIT)
  – Go to STAT PLOT (2ND Y=)
  – Change Options
  – Go to ZOOM, choose Stat (ZoomStat)
    OR
    Go to WINDOW, Change Options, Go to GRAPH

Histograms
• Make a histogram of the following data:
• Age of Teachers At WPS
  25, 34, 37, 42, 51, 43, 49, 35, 37, 65

Outliers
• An observation that is outside the pattern
  – For example, ages in this classroom
    16, 17, 16, 17, 18, 17, 17, 16, 18, 36
• Formula to determine (l8r, sk8r)
  – For now “potential” or “possible” outlier

Describing a distribution
• Center
  – Mean - Average
  – Median - Middle
• Shape
  – Symmetric
  – Skewed
  – Uniform
  – Bell Shaped
  – Bi- or Multi-modal
• Spread
  – Standard Deviation
  – Range
  – IQR
• Weird-ness
  – Outliers
  – Gaps

Stemplots
• Basic
• Split Stems
• Back-To-Back

Basic Stemplot
Boys Weight in class (pounds)

  10 |
  11 |
  12 |
  13 |
  14 | 34699
  15 | 025788
  16 | 001344589
  17 | 1
  18 | 9

  KEY: 10 | 8 = 108 pounds

Split Stem Stemplot
Boys Weight in class (pounds)

  14 | 34
  14 | 699
  15 | 02
  15 | 5788
  16 | 001344
  16 | 589
  17 | 1
  17 |
  18 | 9

  KEY: 10 | 8 = 108 pounds

Back to Back Stemplot
Girls vs. Boys Weight in class (pounds)

  Girls |    | Boys
      8 | 10 |
     93 | 11 |
   8773 | 12 |
    940 | 13 |
      2 | 14 | 34699
      1 | 15 | 025788
        | 16 | 001344589
        | 17 | 1
        | 18 | 9

  KEY: 10 | 8 or 8 | 10 = 108 pounds

Mean
• Average! Add ‘em up and divide by n
• Sample Mean denoted as x̄ (x-bar)
• Not resistant to extreme values
  – e.g. Ages in Mrs. Smith’s Kindergarten Class
  – 4,5,4,4,4,5,5,4,4,4,5,5,4,4,5,39

Median
• Middle! Line ‘em up (in order) and find the middle. If two share it, find their mean.
• Resistant to extreme values
  – e.g. Ages in Mrs. Smith’s Kindergarten Class
  – 4,4,4,4,4,4,4,4,4,5,5,5,5,5,5,39

Quartiles
• Median cuts data in half, Quartiles cut the Halves in Half!
Recall Teacher Ages: 25, 34, 35, 37, 37, 42, 43, 49, 51, 65
  – 1st Quartile Q1 = 35 (median of the lower half)
  – Median = 39.5
  – 3rd Quartile Q3 = 49 (median of the upper half)

5-Number Summary
• Low-Q1-Median-Q3-High
• Shows Spread of Data
Recall Teacher Ages: 25, 34, 35, 37, 37, 42, 43, 49, 51, 65
• 5-Number Summary: 25  35  39.5  49  65

Boxplot
• Graphical Representation of 5-Number Summary
• Shows Shape, Spread, and Center
• Always draw to scale:
  [Figure: boxplot marked at 25, 35, 39.5, 49, 65]

Outliers
• First off, IQR
  – InterQuartile Range
  – Distance between Quartiles…
Recall Teacher Ages: 25, 34, 35, 37, 37, 42, 43, 49, 51, 65
• IQR is 49-35=14
• Outlier is anything more than 1.5 times IQR below Q1 or above Q3
• Sooo…. 1.5 × 14 = 21, so an outlier would have to be more than 21 below 35 or more than 21 above 49… Below 14 or above 70. Nothing in our data is an outlier!

Boxplot Using TI-83
Enter Teacher Ages into L1 (clear old stuff first):
25, 34, 35, 37, 37, 42, 43, 49, 51, 65
• ON TI-83
  – Go to STAT PLOT (2ND Y=)
  – Change Options
  – Go to ZOOM, choose Stat (ZoomStat)
    OR
    Go to WINDOW, Change Options, Go to GRAPH

Variance & Standard Deviation
• Variance – σ²
  – Average of Squared distances from mean
  – In example 26/5 = 5.2
• Standard Deviation – σ
  – Square Root of Variance
  – In example, about 2.28
  – Dividing by n as here gives the population value (σx on the TI-83); the sample version divides by n − 1 instead: 26/4 = 6.5, so Sx is about 2.55
• Standard Deviation
  – Measure of Spread
  – Use with Mean
  – Non-Resistant
• On TI-83 Now…..
  Raw Data    Distance From Mean    Distance Squared
      3               3                     9
      5               1                     1
      6               0                     0
      6               0                     0
     10               4                    16

  STAT>CALC>1-Var Stats
  Mean = 6

It’s Normal to Deviate
Chapter 6 – The Normal Model

Density Curve
• Area under a density curve is always 1
• Symmetric density curve: Mean, Median and Mode

Density Curve Continued
• Density curves are often skewed
• Recall Median is “resistant” while Mean is not
  – Skewed to the Left (tail trails to the left)
  – Skewed to the Right (tail trails to the right)
  [Figure: left-skewed and right-skewed density curves with Mean, Median, and Mode marked]

Histograms
• Median is “equal areas” point
• Mean is “balance point”
  – “think Physics”
  [Figure: curve split at the median, 50% of Population on each side]

Normal Distributions (bell shaped)
• Center is mean μ
  – (population mean)
• Spread is Standard Deviation σ
  – (population standard deviation)
  – To find, look for inflection points
  [Figure: normal curve, concave down between μ − σ and μ + σ, concave up outside them]

68 – 95 – 99.7 Rule
• Also called EMPIRICAL RULE
  – Probability = 68% within 1σ
  – Probability = 95% within 2σ
  – Probability = 99.7% within 3σ

  Raw-Score (X)   μ − 3σ   μ − 2σ   μ − 1σ    μ    μ + 1σ   μ + 2σ   μ + 3σ
  z-Score (z)       −3       −2       −1      0      1        2        3

Percentiles (and quartiles)
• Think standardized tests or class rankings
• Percent of observations to the LEFT of an observation
• Quartiles:
  – First is at 25th percentile
  – Median is at 50th percentile
  – Third is at 75th percentile

Z-SCORE
• Number of Standard Deviations (σ) away from the Mean (μ)

  Raw-Score (X)   μ − 3σ   μ − 2σ   μ − 1σ    μ    μ + 1σ   μ + 2σ   μ + 3σ
  z-Score (z)       −3       −2       −1      0      1        2        3

Z-SCORE Continued

  z = (X − μ) / σ

  μ = population mean, σ = population standard deviation, X = Raw-Score, z = z-Score

Normal Distribution Notation
N(μ, σ)
• Example: You have an IQ of 148. The IQ test you took has a distribution N(105, 20). What is your Z-Score? What does this mean?

Using Tables
• Ex.
  – Your IQ ZSCORE was 2.15. What does it mean now?

Using Tables
• Ex.
  – If someone’s IQ was at the 10th percentile, what would their ZSCORE be?

Using TI-83
• Normalcdf(Xlower, Xupper, μ, σ): use to convert Raw-Score directly to probability.
• Normalcdf(Zlower, Zupper): use to convert z-Score to probability
***For Graphics use ShadeNorm (GTANG notes)

Using TI-83
• Test Empirical Rule (68-95-99.7)
  – Find Normalcdf(-1,1), Normalcdf(-2,2), Normalcdf(-3,3)
• Ex. What percent of IQ Scores would fall between 100 and 110 using N(105, 20)? What percent would be above 150?
  – Normalcdf(100,110,105,20)
  – Normalcdf(150,1000000000,105,20)

Normality
• Just check Box and Whisker plot or Histogram on TI-83
• ALWAYS do this if raw data is given
  – Sketch result and comment on it!

Distributions: Column 1
[Figure: distribution plot for Column 1, axis from 20 to 70]

  Quantiles
  100.0%   maximum    65.000
   99.5%              65.000
   97.5%              65.000
   90.0%              63.600
   75.0%   quartile   45.000
   50.0%   median     39.000
   25.0%   quartile   36.250
   10.0%              25.900
    2.5%              25.000
    0.5%              25.000
    0.0%   minimum    25.000
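
The arithmetic on the quartile, outlier, variance, z-score, and Normalcdf slides can be cross-checked away from the calculator. What follows is only a minimal sketch using Python's standard library (the `statistics` module, Python 3.8 or later), not part of the original slides. The helper names `five_number_summary`, `outlier_fences`, and `normalcdf` are made up for illustration; `normalcdf` just mimics the TI-83 command of the same name. Quartiles are taken as the medians of the lower and upper halves, which is an assumption chosen to match the numbers on the slides; other software may use a different quartile rule and give slightly different values.

```python
from statistics import NormalDist, median, pstdev, pvariance, stdev


def five_number_summary(data):
    """Return (low, Q1, median, Q3, high); quartiles are medians of each half."""
    xs = sorted(data)
    half = len(xs) // 2               # size of each half; needs at least 2 values
    lower = xs[:half]                 # values below the median position
    upper = xs[len(xs) - half:]       # values above the median position
    return xs[0], median(lower), median(xs), median(upper), xs[-1]


def outlier_fences(q1, q3):
    """Return the (low, high) cutoffs that sit 1.5 * IQR beyond the quartiles."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def normalcdf(lower, upper, mu=0.0, sigma=1.0):
    """Area under N(mu, sigma) between lower and upper (hypothetical helper)."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(upper) - dist.cdf(lower)


# Teacher ages from the histogram and boxplot slides
teacher_ages = [25, 34, 37, 42, 51, 43, 49, 35, 37, 65]
summary = five_number_summary(teacher_ages)
low_fence, high_fence = outlier_fences(summary[1], summary[3])
outliers = [age for age in teacher_ages if age < low_fence or age > high_fence]
print("5-number summary:", summary)            # slides: 25 35 39.5 49 65
print("fences:", low_fence, high_fence)        # slides: below 14 or above 70
print("outliers:", outliers)

# Raw data from the variance slide
raw = [3, 5, 6, 6, 10]
print("population variance:", pvariance(raw))  # slides: 26/5 = 5.2
print("population std dev:", pstdev(raw))      # slides: about 2.28
print("sample std dev:", stdev(raw))           # divides by n - 1 instead of n

# IQ example with N(105, 20)
iq_z = (148 - 105) / 20
print("z-score for IQ 148:", iq_z)             # slides: 2.15
print("percentile for that z:", NormalDist().cdf(iq_z))
print("z at 10th percentile:", NormalDist().inv_cdf(0.10))

# Empirical rule and the two Normalcdf examples
for k in (1, 2, 3):
    print(f"within {k} standard deviations:", normalcdf(-k, k))
print("IQ between 100 and 110:", normalcdf(100, 110, 105, 20))
print("IQ above 150:", normalcdf(150, 1_000_000_000, 105, 20))
```

The huge upper bound in the last call copies the slide's trick of standing in for infinity; `1 - NormalDist(105, 20).cdf(150)` would give the same area without it.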