Statistics: An Introduction

Learning Objectives
1. Define Statistics
2. Describe the Uses of Statistics
3. Distinguish Descriptive & Inferential Statistics
4. Define Population, Sample, Parameter, & Statistic
5. Identify Data Types

What is Statistics?
The practice (science?) of data analysis
Summarizing data and drawing inferences about the larger population from which it was drawn

Statistical Methods
  Descriptive Statistics
  Inferential Statistics

Descriptive Statistics
1. Involves
   Collecting Data
   Presenting Data
   Characterizing Data
2. Purpose
   Describe Data
[Figure: chart of dollars by quarter (Q1 to Q4), vertical axis 0 to 50, annotated with $\bar{X} = 30.5$ and $S^2 = 113$]

Inferential Statistics
1. Involves
   Estimation
   Hypothesis Testing
2. Purpose
   Make Decisions About Population Based on Sample Characteristics
[Figure: Population?]

Key Terms
1. Population (Universe): All Items of Interest
2. Sample: Portion of Population
3. Parameter: Summary Measure about Population
4. Statistic: Summary Measure about Sample
Memory aid: P in Population & Parameter, S in Sample & Statistic

Data Types
Quantitative
  Discrete
  Continuous
Qualitative
  Nominal (categorical)
  Ordinal (rank ordered categories)

Sampling
Representative sample: Same characteristics as the population
Random sample: Every subset of the population of a given size has an equal chance of being selected

Review
Descriptive vs. Inferential Statistics
Vocabulary
  Population
  (Random, representative) sample
  Parameter
  Statistic
Data types

Methods for Describing Data

Learning Objectives
1. Describe Qualitative Data Graphically
2. Describe Numerical Data Graphically
3. Create & Interpret Graphical Displays
4. Explain Numerical Data Properties
5. Describe Summary Measures
6. Analyze Numerical Data Using Summary Measures

Data Presentation
Qualitative Data
  Summary Table
  Bar Chart
  Pie Chart
Numerical Data
  Stem-&-Leaf Display
  Dot Chart
  Frequency Distribution
  Histogram

Presenting Qualitative Data

Student Specializations

Specialization |      Freq.     Percent        Cum.
---------------+-----------------------------------
HCI            |          9       39.13       39.13
IEMP           |          9       39.13       78.26
LIS            |          3       13.04       91.30
Undecided      |          2        8.70      100.00
---------------+-----------------------------------
Total          |         23      100.00

[Figure: bar chart of Student Specializations; categories HCI, IEMP, LIS, Undecided; frequency axis 0 to 10]

Undergrad Majors

UG major                  |      Freq.     Percent        Cum.
--------------------------+-----------------------------------
American Studies          |          1        4.76        4.76
Cog Sci                   |          1        4.76        9.52
Comp Sci                  |          3       14.29       23.81
Economics                 |          3       14.29       38.10
English                   |          5       23.81       61.90
Environmental Engineering |          1        4.76       66.67
Graphic Design            |          1        4.76       71.43
Math                      |          2        9.52       80.95
Mechanical Engineering    |          1        4.76       85.71
Nutrition                 |          1        4.76       90.48
Sci and Tech Policy       |          1        4.76       95.24
Telecommunications        |          1        4.76      100.00
--------------------------+-----------------------------------
Total                     |         21      100.00

Favorite Colors

color       |      Freq.     Percent        Cum.
------------+-----------------------------------
black       |          2        8.70        8.70
blue        |         12       52.17       60.87
green       |          1        4.35       65.22
orange      |          1        4.35       69.57
purple      |          1        4.35       73.91
red         |          5       21.74       95.65
white       |          1        4.35      100.00
------------+-----------------------------------
Total       |         23      100.00

Calculus Knowledge

integrals   |      Freq.     Percent        Cum.
------------+-----------------------------------
1           |          3       13.04       13.04
2           |          1        4.35       17.39
3           |         11       47.83       65.22
4           |          6       26.09       91.30
5           |          2        8.70      100.00
------------+-----------------------------------
Total       |         23      100.00

Presenting Numerical Data

Student Age (Reported) Data
Stem-and-leaf plot for age

2* | 22233444555777899
3* | 01257
4* |
5* |
6* |
7* | 6

[Figure: histogram of age; horizontal axis age 20 to 70, vertical axis Frequency 0 to 10]

Starting Salaries (in $K)

3* | 8
4* | 000025
5* | 0000
6* | 0000005
7* | 5
8* | 0

Numerical Data Properties

Thinking Challenge
Salaries shown: $400,000; $70,000; $50,000; $30,000; $20,000
... employees cite low pay: most workers earn only $20,000.
... President claims average pay is $70,000!

Standard Notation

Measure     | Sample      | Population
------------+-------------+------------
Mean        | $\bar{x}$   | $\mu$
Stand. Dev. | $s$         | $\sigma$
Variance    | $s^2$       | $\sigma^2$
Size        | $n$         | $N$

Numerical Data Properties
  Central Tendency (Location)
  Variation (Dispersion)
  Shape

Numerical Data Properties & Measures
Central Tendency
  Mean
  Median
  Mode
Variation
  Range
  Interquartile Range
  Variance
  Standard Deviation
Shape
  Skew

Central Tendency

What's wrong with this?
Measurements: 1 4 2 9 8
Middle measurement is 2, so that's the median

Mean of the same measurements:

$$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n} = \frac{X_1 + X_2 + \cdots + X_n}{n} = \frac{1 + 4 + 2 + 9 + 8}{5} = \frac{24}{5} = 4.8$$

Ages
Mean = 29
Median = 27

2* | 22233444555777899
3* | 01257
4* |
5* |
6* |
7* | 6

Summary of Central Tendency Measures

Measure | Equation                  | Description
--------+---------------------------+--------------------------
Mean    | $\sum X_i / n$            | Balance Point
Median  | Position $(n+1)/2$        | Middle Value When Ordered
Mode    | none                      | Most Frequent

Shape
1. Describes How Data Are Distributed
2. Measures of Shape
   Skew = Symmetry
   Left-Skewed: Mean < Median < Mode
   Symmetric: Mean = Median = Mode
   Right-Skewed: Mode < Median < Mean

Variation

Quartiles
1. Measure of Noncentral Tendency
2. Split Ordered Data into 4 Quarters

   25% | Q1 | 25% | Q2 | 25% | Q3 | 25%

3. Position of i-th Quartile

$$\text{Positioning Point of } Q_i = \frac{i\,(n+1)}{4}$$

Ages
Range
Quartiles

2* | 22233444555777899
3* | 01257
4* |
5* |
6* |
7* | 6

Box Plots - Age and Salary
Salary
  Quartiles: 41K, 50K, 60K
  Inner fences: ??
  Outer fences: ??
Age
  Quartiles: 24, 27, 30
  Inner fences: (15, 39)
  Outer fences: (6, 48)
[Figure: box plots; salary axis 40,000 to 80,000, age axis 20 to 80]

Variance & Standard Deviation
1. Measures of Dispersion
2. Most Common Measures
3. Consider How Data Are Distributed
4. Show Variation About Mean ($\bar{X}$ or $\mu$)
[Figure: data points on an axis from 4 to 12, with $\bar{X} = 8.3$]

Sample Variance Formula

$$S^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1} = \frac{(X_1 - \bar{X})^2 + (X_2 - \bar{X})^2 + \cdots + (X_n - \bar{X})^2}{n - 1}$$

n - 1 in denominator! (Use N if Population Variance)

Equivalent Formula

$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$
$$= \frac{\sum_{i=1}^{n} \left( x_i^2 - 2 x_i \bar{x} + \bar{x}^2 \right)}{n - 1}$$
$$= \frac{\sum x_i^2 - 2 \bar{x} \sum x_i + n \bar{x}^2}{n - 1}$$
$$= \frac{\sum x_i^2 - 2 \bar{x} \, (n \bar{x}) + n \bar{x}^2}{n - 1}$$
$$= \frac{\sum x_i^2 - n \bar{x}^2}{n - 1}$$

Another Equivalent Formula

$$s^2 = \frac{\sum x_i^2 - n \bar{x}^2}{n - 1} = \frac{\sum x_i^2 - n \left( \frac{\sum x_i}{n} \right)^2}{n - 1} = \frac{\sum x_i^2 - \frac{\left( \sum x_i \right)^2}{n}}{n - 1}$$

Empirical Rule
If x has a "symmetric, mound-shaped" distribution

$$\Pr(|x_i - \mu| > \sigma) \approx 32\%$$
$$\Pr(|x_i - \mu| > 2\sigma) \approx 5\%$$
$$\Pr(|x_i - \mu| > 3\sigma) \approx 0.3\%$$

Justification: Known properties of the "normal" distribution, to be studied later in the course

Preview of Statistical Inference
You observe one data point
Make a hypothesis about the mean and standard deviation of the distribution from which it was drawn
Empirical Rule tells you how (un)likely the data point is
If very unlikely, you are suspicious of the hypothesis about mean and standard deviation, and reject it

Summary of Variation Measures

Measure                         | Equation                                   | Description
--------------------------------+--------------------------------------------+-------------------------------------
Range                           | $X_{largest} - X_{smallest}$               | Total Spread
Interquartile Range             | $Q_3 - Q_1$                                | Spread of Middle 50%
Standard Deviation (Sample)     | $\sqrt{\sum (X_i - \bar{X})^2 / (n - 1)}$  | Dispersion about Sample Mean
Standard Deviation (Population) | $\sqrt{\sum (X_i - \mu)^2 / N}$            | Dispersion about Population Mean
Variance (Sample)               | $\sum (X_i - \bar{X})^2 / (n - 1)$         | Squared Dispersion about Sample Mean

Z-scores
Number of standard deviations from the mean

$$z_i = \frac{x_i - \mu}{\sigma}$$

Conclusion
1. Described Qualitative Data Graphically
2. Described Numerical Data Graphically
3. Created & Interpreted Graphical Displays
4. Explained Numerical Data Properties
5. Described Summary Measures
6. Analyzed Numerical Data Using Summary Measures
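
Worked example (a sketch, not part of the original slides)
The summary measures above can be checked against the reported student ages read off the stem-and-leaf plot. The Python sketch below rests on several assumptions: the helper names (quartile, var_definition, var_shortcut) are illustrative; fractional quartile positions are handled by linear interpolation, which the slides do not specify; the fence multipliers 1.5 and 3 times the interquartile range are inferred from the age fences quoted in the slides; and the z-scores use the sample mean and standard deviation as stand-ins for the population values.

```python
import math
import statistics

# Reported student ages, read off the stem-and-leaf plot in the slides.
ages = [22, 22, 22, 23, 23, 24, 24, 24, 25, 25, 25, 27, 27, 27, 28, 29, 29,
        30, 31, 32, 35, 37, 76]


def quartile(data, i):
    """Value at the positioning point i*(n+1)/4 of the ordered data.

    Assumption: fractional positions are resolved by linear interpolation
    between the two neighbouring ordered values.
    """
    xs = sorted(data)
    n = len(xs)
    pos = i * (n + 1) / 4          # 1-based position in the ordered data
    lower = math.floor(pos)
    frac = pos - lower
    if lower < 1:
        return float(xs[0])
    if lower >= n:
        return float(xs[-1])
    return xs[lower - 1] + frac * (xs[lower] - xs[lower - 1])


def var_definition(data):
    """Sample variance from the definition: sum of squared deviations / (n - 1)."""
    n = len(data)
    mean = sum(data) / n
    return sum((x - mean) ** 2 for x in data) / (n - 1)


def var_shortcut(data):
    """Sample variance from the equivalent formula (sum x^2 - (sum x)^2 / n) / (n - 1)."""
    n = len(data)
    return (sum(x * x for x in data) - sum(data) ** 2 / n) / (n - 1)


mean_age = statistics.mean(ages)
median_age = statistics.median(ages)
q1, q2, q3 = (quartile(ages, i) for i in (1, 2, 3))
iqr = q3 - q1

# Assumed fence multipliers (1.5 and 3 times the IQR).
inner_fences = (q1 - 1.5 * iqr, q3 + 1.5 * iqr)
outer_fences = (q1 - 3.0 * iqr, q3 + 3.0 * iqr)

s2 = var_definition(ages)
s = math.sqrt(s2)

# z-scores relative to the sample mean and sample standard deviation.
z_scores = [(x - mean_age) / s for x in ages]

print("mean:", mean_age, "median:", median_age)
print("quartiles:", q1, q2, q3)
print("inner fences:", inner_fences, "outer fences:", outer_fences)
print("variance (definition):", s2, "variance (shortcut):", var_shortcut(ages))
print("largest z-score:", max(z_scores))
```

Under these assumptions the sketch reproduces the values quoted in the slides for the age data: mean 29, median 27, quartiles 24, 27 and 30, inner fences (15, 39) and outer fences (6, 48). The reported age of 76 lies beyond the outer fence and more than 3 standard deviations above the mean, which the Empirical Rule marks as very unlikely for a mound-shaped distribution.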