Descriptive Statistics
A.A. Elimam
College of Business
San Francisco State University

Statistics
The science of collecting, organizing, analyzing, interpreting and presenting data.

Topics
• Descriptive Statistics
• Frequency Distributions and Histograms: Relative / Cumulative Frequency
• Measures of Central Tendency: Mean, Median, Mode, Midrange
• Measures of Dispersion (Variation): Range, Standard Deviation, Variance and Coefficient of Variation
• Shape: Symmetric, Skewed, using Box-and-Whisker Plots
• Quartile
• Statistical Relationships: Correlation, Covariance

Descriptive Statistics
A collection of quantitative measures and ways of describing data. This includes frequency distributions and histograms, measures of central tendency, and measures of dispersion.

Descriptive Statistics
• Collect data, e.g. survey
• Present data, e.g. tables and graphs
• Characterize data, e.g. mean $\bar{x} = \frac{\sum x_i}{n}$

A characteristic of a population is a parameter. A characteristic of a sample is a statistic.

Summary Measures
• Central Tendency: Mean, Median, Mode, Midrange
• Quartile
• Variation: Range, Variance, Standard Deviation, Coefficient of Variation

Measures of Central Tendency
Mean ($\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$), Median, Mode, Midrange

The Mean (Arithmetic Average)
• It is the arithmetic average of the data values.
• Sample mean: $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{x_1 + x_2 + \cdots + x_n}{n}$
• The most common measure of central tendency
• Affected by extreme values (outliers)
[Figure: two dot plots, one on an axis from 0 to 10 with Mean = 5, one on an axis from 0 to 14 with Mean = 6]

The Median
• Important measure of central tendency
• In an ordered array, the median is the "middle" number.
• If n is odd, the median is the middle number.
• If n is even, the median is the average of the 2 middle numbers.
• Not affected by extreme values
[Figure: two dot plots, one on an axis from 0 to 10 and one on an axis from 0 to 14; Median = 5 in both]

The Mode
• A measure of central tendency
• Value that occurs most often
• Not affected by extreme values
• There may not be a mode
• There may be several modes
• Used for either numerical or categorical data
[Figure: dot plot on an axis from 0 to 14 with Mode = 9; dot plot on an axis from 0 to 6 with no mode]

Midrange
• A measure of central tendency
• Average of the smallest and largest observations: $\text{Midrange} = \frac{x_{largest} + x_{smallest}}{2}$
• Affected by extreme values
[Figure: two dot plots on an axis from 0 to 10; Midrange = 5 in both]

Quartiles
• Not a measure of central tendency
• Split ordered data into 4 quarters of 25% each, separated by $Q_1$, $Q_2$ and $Q_3$
• Position of the i-th quartile: $\text{position of } Q_i = \frac{i(n+1)}{4}$
Data in ordered array: 11 12 13 16 16 17 18 21 22
• Position of $Q_1 = \frac{1 \cdot (9+1)}{4} = 2.50$, halfway between the 2nd and 3rd values, so $Q_1 = \frac{12 + 13}{2} = 12.5$
• Position of $Q_3 = \frac{3 \cdot (9+1)}{4} = 7.50$, halfway between the 7th and 8th values, so $Q_3 = \frac{18 + 21}{2} = 19.5$

Measures of Dispersion (Variation)
• Range
• Variance: population variance, sample variance
• Standard deviation: population standard deviation, sample standard deviation
• Coefficient of variation: $CV = \left(\frac{S}{\bar{X}}\right) \cdot 100\%$

Understanding Variation
• The more spread out or dispersed the data, the larger the measures of variation.
• The more concentrated or homogeneous the data, the smaller the measures of variation.
• If all observations are equal, the measures of variation are zero.
• All measures of variation are nonnegative.

The Range
• Measure of variation
• Difference between the largest and smallest observations: $\text{Range} = x_{largest} - x_{smallest}$
• Ignores how the data are distributed
[Figure: two dot plots on an axis from 7 to 12 with different distributions; Range = 12 - 7 = 5 in both]
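The central-tendency measures, the quartile position rule and the range can be checked with a short Python sketch on the ordered array from the quartile example. The helper name `quartile` is hypothetical, and the linear interpolation it applies to fractional positions is an assumption: the slides only show positions ending in .5, where interpolation and averaging the two neighbours agree. `statistics.multimode` needs Python 3.8 or later.

```python
import statistics


def quartile(ordered, i):
    """i-th quartile using the position rule i(n+1)/4 on an ordered list.

    Fractional positions are resolved by linear interpolation between the
    two neighbouring values (an assumption; see the lead-in).
    """
    n = len(ordered)
    pos = i * (n + 1) / 4      # 1-based position in the ordered array
    lower = int(pos)           # whole part of the position
    frac = pos - lower         # fractional part of the position
    if lower < 1:              # position falls before the first value
        return ordered[0]
    if lower >= n:             # position falls at or after the last value
        return ordered[-1]
    return ordered[lower - 1] + frac * (ordered[lower] - ordered[lower - 1])


# Ordered array from the quartile example
data = sorted([11, 12, 13, 16, 16, 17, 18, 21, 22])

mean = statistics.mean(data)             # arithmetic average
median = statistics.median(data)         # middle value of the ordered array
modes = statistics.multimode(data)       # list of all most frequent values
midrange = (data[0] + data[-1]) / 2      # average of smallest and largest
data_range = data[-1] - data[0]          # largest minus smallest
q1 = quartile(data, 1)
q3 = quartile(data, 3)

print(f"mean={mean:.4f} median={median} modes={modes}")
print(f"midrange={midrange} range={data_range} Q1={q1} Q3={q3}")
```

Returning the modes as a list keeps the "no single mode" and "several modes" cases visible instead of silently picking one value.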
Variance
• Important measure of variation
• Shows variation about the mean
• For the population: $\sigma^2 = \frac{\sum (X_i - \mu)^2}{N}$
• For the sample: $s^2 = \frac{\sum (X_i - \bar{X})^2}{n - 1}$
For the population, use N in the denominator. For the sample, use n - 1 in the denominator.

Standard Deviation
• Most important measure of variation
• Shows variation about the mean
• For the population: $\sigma = \sqrt{\frac{\sum (X_i - \mu)^2}{N}}$
• For the sample: $s = \sqrt{\frac{\sum (X_i - \bar{X})^2}{n - 1}}$
For the population, use N in the denominator. For the sample, use n - 1 in the denominator.

Sample Standard Deviation
Data $X_i$: 10 12 14 15 17 18 18 24
n = 8, Mean = 16
$s = \sqrt{\frac{(10-16)^2 + (12-16)^2 + (14-16)^2 + (15-16)^2 + (17-16)^2 + (18-16)^2 + (18-16)^2 + (24-16)^2}{8 - 1}} = \sqrt{\frac{130}{7}} \approx 4.3095$

Comparing Standard Deviations
Data $X_i$: 10 12 14 15 17 18 18 24
N = 8, Mean = 16
• As a sample: $s = \sqrt{\frac{\sum (X_i - \bar{X})^2}{n - 1}} = \sqrt{\frac{130}{7}} \approx 4.3095$
• As a population: $\sigma = \sqrt{\frac{\sum (X_i - \mu)^2}{N}} = \sqrt{\frac{130}{8}} \approx 4.0311$
The value of the standard deviation is larger when the data are treated as a sample.

Comparing Standard Deviations
[Figure: three dot plots on an axis from 11 to 21, each with Mean = 15.5. Data A: s = 3.338. Data B: s = 0.9258. Data C: s = 4.57]

Coefficient of Variation
• Measure of relative variation
• Always a %
• Shows variation relative to the mean
• Used to compare 2 or more groups
• Formula (for a sample): $CV = \left(\frac{S}{\bar{X}}\right) \cdot 100\%$

Comparing Coefficient of Variation
• Stock A: average price last year = $50, standard deviation = $5
• Stock B: average price last year = $100, standard deviation = $5
Coefficient of variation:
• Stock A: CV = (5 / 50) × 100% = 10%
• Stock B: CV = (5 / 100) × 100% = 5%

Shape
• Describes how data are distributed
• Measures of shape: symmetric or skewed
• Symmetric (skewness coefficient between -0.5 and 0.5): Mean = Median = Mode
• Left-skewed (skewness coefficient below -1): Mean < Median < Mode
• Right-skewed (skewness coefficient above 1): Mode < Median < Mean

Box-and-Whisker Plot
Graphical display of data using the 5-number summary: $X_{smallest}$, $Q_1$, Median, $Q_3$, $X_{largest}$
[Figure: box-and-whisker plot on an axis from 4 to 12]

Distribution Shape & Box-and-Whisker Plots
[Figure: three box-and-whisker plots (left-skewed, symmetric, right-skewed), each marking $Q_1$, Median and $Q_3$]

Correlation
A measure of the strength of the linear relationship between two variables X and Y, measured by the (population) correlation coefficient:
$\rho_{xy} = \frac{\text{cov}(X, Y)}{\sigma_x \sigma_y}$
The numerator is the covariance.

Covariance
The average of the products of the deviations of each observation from its respective mean:
$\text{cov}(X, Y) = \frac{\sum_{i=1}^{N} (x_i - \mu_x)(y_i - \mu_y)}{N}$

Sample Correlation Coefficient
$r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{(n - 1)\, s_x s_y}$
The correlation coefficient ranges from -1 to +1:
• +1: perfect positive correlation
• 0: no linear correlation
• -1: perfect negative correlation

Summary
• Discussed measures of central tendency: Mean, Median, Mode, Midrange
• Quartiles
• Addressed measures of variation: the Range, Interquartile Range, Variance, Standard Deviation, Coefficient of Variation
• Determined shape of distributions: Symmetric, Skewed, Box-and-Whisker Plot
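The dispersion and correlation formulas can be tried out with the following Python sketch. It uses the eight data values from the standard deviation example and sums all eight squared deviations. The second variable `y` is hypothetical and exists only to exercise the correlation formula, and the helper names `coefficient_of_variation`, `population_covariance` and `sample_correlation` are illustrative rather than taken from the slides.

```python
import statistics

# Data from the standard deviation example
x = [10, 12, 14, 15, 17, 18, 18, 24]
# Hypothetical second variable, invented for illustration only
y = [3, 4, 4, 5, 6, 6, 7, 9]

s = statistics.stdev(x)        # sample standard deviation: n - 1 in the denominator
sigma = statistics.pstdev(x)   # population standard deviation: N in the denominator


def coefficient_of_variation(std_dev, mean):
    """Standard deviation relative to the mean, as a percentage."""
    return std_dev / mean * 100


def population_covariance(xs, ys):
    """Average product of deviations from the respective means (divides by N)."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    return sum((a - mx) * (b - my) for a, b in zip(xs, ys)) / len(xs)


def sample_correlation(xs, ys):
    """Sample correlation: sum of cross-deviations over (n - 1) * s_x * s_y."""
    n = len(xs)
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cross = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    return cross / ((n - 1) * statistics.stdev(xs) * statistics.stdev(ys))


cv_stock_a = coefficient_of_variation(5, 50)    # Stock A from the CV example
cv_stock_b = coefficient_of_variation(5, 100)   # Stock B from the CV example
cov_xy = population_covariance(x, y)
r = sample_correlation(x, y)

print(f"s={s:.4f} sigma={sigma:.4f}")
print(f"CV A={cv_stock_a:.1f}% CV B={cv_stock_b:.1f}%")
print(f"cov={cov_xy:.4f} r={r:.4f}")
```

Dividing the population covariance by the two population standard deviations gives the same value as `sample_correlation`, because the N and n - 1 factors cancel in each ratio.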