Descriptive Statistics
Prepared by Masood Amjad Khan, GCU, Lahore

Index

1. Index
2. Index
3. Statistics (Definitions)
4. Descriptive Statistics
5. Inferential Statistics
6. Examples of 4 and 5
7. Data, Level of Measurement
8. Variable
9. Discrete Variable
10. Continuous Variable
11. Frequency Distribution
12. Constructing a Frequency Distribution
13. Example of 12
14. Displaying the Data
15. Bar Chart, Pie Chart
16. Stem and Leaf Plot
17. Graph
18. Histogram
19. Frequency Polygon
20. Cumulative Frequency Polygon
21. Summary Measures
22. Goals
23. Arithmetic Mean
24. Characteristics of the Mean
25. Examples of 23
26. Weighted Mean
27. Example: Weighted Mean
28. Geometric Mean
29. Example: Geometric Mean
30. Median
31. Example of Median
32. Properties of Median
33. Mode
34. Examples of Mode
35. Positions of Mean, Median and Mode
36. Dispersion
37. Range and Mean Deviation
39. Example of Mean Deviation
40. Variance
41. Examples of Variance
42. Moments
43. Examples of Moments
44. Skewness
45. Types of Skewness
46. Coefficient of Skewness
47. Example of Skewness
48. Empirical Rule
49. Exercise

STATISTICS

Numerical facts (common usage)
1. Number of children born in a hospital in some specified time.
2. Number of students enrolled in GCU in 2007.
3. Number of road accidents on the motorway.
4. Amount spent on Research and Development in GCU during 2006-2007.
5. Number of shutdowns of the computer network on a particular day.

Field or discipline of study
Definition: The science of collecting, presenting, analyzing and interpreting data to make decisions and forecasts.
Statistics divides into Descriptive Statistics and Inferential Statistics. Probability provides the transition between the two.

Descriptive Statistics
Consists of methods for organizing, displaying, and describing data by using tables, graphs, and summary measures.

Data, Types of Data
A data set is a collection of observations on one or more variables.

Organizing the Data: Construction of Frequency Distribution Tables

Frequency Table
A grouping of qualitative data into mutually exclusive classes showing the number of observations in each class.
Example: preference for four types of beverage among 100 customers.

  Beverage     Number
  Cola-Plus    40
  Coca-Cola    25
  Pepsi        20
  7-UP         15

Frequency Distribution
A grouping of quantitative data into mutually exclusive classes showing the number of observations in each class.
Example: selling price of 80 vehicles.

  Vehicle selling price   Number of vehicles
  15000 to 24000          48
  24000 to 33000          30
  33000 to 42000          2

Displaying the Data
Diagrams/Charts: Bar Chart, Pie Chart
Stem and Leaf Plot
Graph: Histogram, Frequency Polygon

Variable
A characteristic under study that assumes different values for different elements (e.g. height of persons, number of students in GCU).

Qualitative or categorical variable: a variable that cannot assume a numerical value but can be classified into two or more non-numeric categories.
Examples: educational achievement, marital status, brand of PC.

Quantitative variable: a variable that can be measured numerically. A quantitative variable is either continuous or discrete.

Continuous variable
A variable whose observations can assume any value within a specific range.
Examples:
Amount of income tax paid.
Weight of a student.
Yearly rainfall in Murree.
Time elapsed between successive network breakdowns.

Discrete variable
A variable that can assume only certain values, with gaps between the values.
Examples of discrete variables:
Children in a family.
Strokes on a golf hole.
TV sets owned.
Cars arriving at GCU in an hour.
Students in each section of a statistics course.

Inferential Statistics
Consists of methods that use sample results to help make decisions or predictions about a population.
Topics: Selecting a Sample, Estimation (Point Estimation, Interval Estimation), Testing of Hypothesis.

Sample
1. A portion of the population selected for study.
2. A subset of data selected from a population.

Population
1. Consists of all individual items or objects whose characteristics are being studied.
2. A collection of data that describes some phenomenon of interest.

Examples
Finite population:
Length of fish in a particular lake.
Number of students in the Statistics course in BCS.
Number of traffic violations on some specific holiday.
Infinite population:
Depth of a lake from any conceived position.
Length of life of a certain brand of light bulb.
Stars in the sky.

Descriptive and Inferential Statistics: Examples
Descriptive
1. At least 5% of all fires reported last year in Lahore were deliberately set.
2. Next to colonial homes, more residents in a specified locality prefer a contemporary design.
Inferential
1. As a result of a recent poll, most Pakistanis are in favor of an independent and powerful parliament.
2. As a result of recent cutbacks by the oil-producing nations, we can expect the price of gasoline to double in the next year.

Types of Data
Data can be classified according to level of measurement. The level of measurement dictates the calculations that can be done to summarize and present the data. It also determines the statistical tests that should be performed.

Level of measurement
Nominal: data may only be classified. Examples: jersey numbers of football players, make of car.
Ordinal: data are ranked, with no meaningful difference between values. Examples: your rank in class, team standings.
Interval: meaningful difference between values.
Examples of interval data: temperature, dress size.
Ratio: meaningful zero point and meaningful ratio between values. Examples: number of patients seen, number of sales calls made, distance students travel to class.

Diagrams/Charts

Bar Chart
A graph in which the classes are reported on the horizontal axis and the class frequencies on the vertical axis. The class frequencies are proportional to the heights of the bars.

Pie Chart
A chart that shows the proportion or percent that each class represents of the total number of frequencies.

Example: covers for cell phones (variable of interest: cover color; class frequency: number of covers)

  Cover color   No. of covers (f)   Angle (degrees)   Percent
  White         130                 36                10%
  Black         104                 29                8%
  Lime          325                 90                25%
  Orange        455                 126               35%
  Red           286                 79                22%
  Total         n = 1300            360

Angle = (f/n) × 360

Graphs
Histogram
Frequency Polygon
Cumulative Frequency Polygon

Describing the Data: Summary Measures
Measures of Location: Arithmetic Mean, Weighted Arithmetic Mean, Geometric Mean, Median, Mode
Measures of Dispersion: Range, Mean Deviation, Variance, Standard Deviation
Moments: Moments about Origin, Moments about Mean
Skewness

Summary Measures: Goals
Calculate the arithmetic mean, weighted mean, median, mode, and geometric mean.
Explain the characteristics, uses, advantages, and disadvantages of each measure of location.
Identify the position of the mean, median, and mode for both symmetric and skewed distributions.
Compute and interpret the range, mean deviation, variance, and standard deviation.
Understand the characteristics, uses, advantages, and disadvantages of each measure of dispersion.
Understand Chebyshev's theorem and the Empirical Rule as they relate to a set of observations.

Characteristics of the Mean
The arithmetic mean is the most widely used measure of location. It requires the interval scale. Its major characteristics are:
All values are used.
It is unique.
The sum of the deviations from the mean is 0.
It is calculated by summing the values and dividing by the number of values.
Every set of interval-level and ratio-level data has a mean.
All the values are included in computing the mean.
A set of data has a unique mean.
The mean is affected by unusually large or small data values.
The arithmetic mean is the only measure of central tendency where the sum of the deviations of each value from the mean is zero.

Selecting a Sample: Use of Tables of Random Numbers
Random numbers are randomly produced digits from 0 to 9. A table of random numbers contains rows and columns of these randomly produced digits. In using the table:
Choose the starting point at random.
Read off the digits in groups of one, two, three, or more digits, in any predetermined direction (rows or columns).

Example
Choose a sample of size 7 from a group of 80 objects.
Label the objects 01, 02, 03, …, 80 in any order.
Arbitrarily enter the table on any line and read out the pairs of digits in any two consecutive columns.
Ignore numbers that recur and those greater than 80.

Construction of Frequency Distribution

Step 1: Decide the number of groups (classes).
Use just enough classes to reveal the shape of the distribution. Let k be the desired number of classes; k should be the smallest number such that \(2^k > n\). If n = 80 and we choose k = 6, then \(2^6 = 64\), which is < 80, so k = 6 is not desirable. If we take k = 7, then \(2^7 = 128\), which is > 80, so the number of classes should be 7.

Step 2: Determine the class interval (width).
The class interval should be the same for all classes. The formula to determine the class width is
\[ i \ge \frac{H - L}{k} \]
where i is the class width, H is the highest observed value, L is the lowest observed value, and k is the number of classes.

Step 3: Set the individual class limits.
Class limits should be very clear.
Class limits should not overlap.
Sometimes the class width is rounded up, so that the classes cover a range larger than H − L.
Make the lower limit of the first class a multiple of the class width.

Step 4: Tally the observations falling in each class.

Step 5: Count the number of items in each class (the class frequency).

Construction of Frequency Distribution (Example)
Raw data (ungrouped data):

  23197 23372 20454 23591 24220 30655 22442 17891 18021 28683
  30872 19587 21558 21639 24296 15935 20047 24285 24324 24609
  26651 29076 20642 19889 19873 25251 25277 28034 23169 28337
  17399 20895 20004 17357 20155 19688 28670 20818 19766 21981
  20203 23765 25783 26661 24533 27453 32492 17968 24052 25799
  15794 18263 23657 35851 20642 20633 20356 21442 21722 19331
  32277 15546 29237 18890 20962 22845 26285 27896 35925 27443
  17266 23613 21740 22374 24571 25449 22817 26613 19251 20445

Construction of Frequency Distribution (Example Continued)
Following Step 1, with n = 80, k should be 7.
Following Step 2, the class width should be at least (35925 − 15546)/7 ≈ 2911. The width is usually rounded up to a convenient multiple of 10 or 100; here it is taken as i = 3000.
Following Step 3, with i = 3000 and k = 7, the classes cover a range of 7 × 3000 = 21000, whereas the actual range is H − L = 35925 − 15546 = 20379. The lower limit of the first class should be a multiple of the class width, so it is taken as 15000.
Following Step 4 and Step 5:

  Selling price         Frequency
  15000 up to 18000     8
  18000 up to 21000     23
  21000 up to 24000     17
  24000 up to 27000     18
  27000 up to 30000     8
  30000 up to 33000     4
  33000 up to 36000     2
  Total                 80

Histogram
A graph in which the classes are marked on the horizontal axis and the class frequencies on the vertical axis. The class frequencies are represented by the heights of the bars, and the bars are drawn adjacent to each other.
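Steps 1 to 3 of the construction above can be sketched in Python. This is a minimal illustration rather than part of the original slides: the function names and the `round_to` parameter (the rounding unit for the class width) are assumptions chosen for this example, and "up to" is read as "lower limit included, upper limit excluded".

```python
import math

def number_of_classes(n):
    """Step 1: smallest k such that 2**k > n."""
    k = 1
    while 2 ** k <= n:
        k += 1
    return k

def class_width(high, low, k, round_to=1000):
    """Step 2: raw width (H - L) / k, then rounded up to a multiple of round_to.

    round_to is an assumed convenience parameter, not something the slides fix.
    """
    raw = (high - low) / k
    return raw, math.ceil(raw / round_to) * round_to

def class_limits(low, width, k):
    """Step 3: the first lower limit is the largest multiple of width not above L."""
    start = (low // width) * width
    return [(start + j * width, start + (j + 1) * width) for j in range(k)]

def tally(values, limits):
    """Steps 4 and 5: count values with lower <= value < upper."""
    return [sum(1 for v in values if lo <= v < hi) for lo, hi in limits]

# Figures taken from the vehicle selling price example.
n, high, low = 80, 35925, 15546
k = number_of_classes(n)                 # 7
raw, width = class_width(high, low, k)   # about 2911.3, rounded up to 3000
limits = class_limits(low, width, k)     # (15000, 18000) ... (33000, 36000)
```

Calling `tally(prices, limits)` on a list of raw prices would then return one count per class, in the same order as `limits`.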
Histogram: Example 1, k = 6

  Group        H     cf    f
  1.6 - 2.2    2.1   2     2
  2.2 - 2.8    2.7   6     4
  2.8 - 3.4    3.3   19    13
  3.4 - 4.0    3.9   32    13
  4.0 - 4.6    4.5   38    6
  4.6 - 5.2    5.1   40    2

[Figure: Histogram (Example 1); horizontal axis: Groups, 1.60 to 5.20]

Histogram: Example 1, k = 7

  Group        H     cf    f
  1.5 - 2.0    2     2     2
  2.0 - 2.5    2.5   4     2
  2.5 - 3.0    3     9     5
  3.0 - 3.5    3.5   24    15
  3.5 - 4.0    4     32    8
  4.0 - 4.5    4.5   38    6
  4.5 - 5.0    5     40    2

[Figure: Histogram (Example 1); vertical axis: Percent; horizontal axis: Groups, 1.5 to 5.0]

Frequency Polygon
A graph in which the points formed by the intersections of the class midpoints and the class frequencies are connected by line segments.

Midpoint = \((L_i + H_i)/2\)

Frequency Polygon: Example 1, k = 6

  Group        Midpoint   cf    f
  1.6 - 2.2    1.9        2     2
  2.2 - 2.8    2.5        6     4
  2.8 - 3.4    3.1        19    13
  3.4 - 4.0    3.7        32    13
  4.0 - 4.6    4.3        38    6
  4.6 - 5.2    4.9        40    2

[Figure: Frequency Polygon (Example 1); vertical axis: Percent; points labeled with the class midpoints 1.90 to 4.90]

Frequency Polygon (Continued): Example 1, k = 7

  Group        Midpoint   cf    f
  1.5 - 2.0    1.75       2     2
  2.0 - 2.5    2.25       3     1
  2.5 - 3.0    2.75       7     4
  3.0 - 3.5    3.25       22    15
  3.5 - 4.0    3.75       32    10
  4.0 - 4.5    4.25       37    5
  4.5 - 5.0    4.75       40    3

[Figure: Frequency Polygon (Example 1); vertical axis: Percent; points labeled with the class midpoints 1.75 to 4.75]

Cumulative Frequency Polygon
A graph in which the points formed by the intersections of the class midpoints and the class cumulative frequencies are connected by line segments. A cumulative frequency polygon portrays the number or percent of observations below a given value.
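The class midpoints, cumulative frequencies and cumulative percentages behind these graphs can be computed directly from the class limits and frequencies. The following Python sketch uses the k = 6 grouping of Example 1; the variable names are illustrative assumptions, not taken from the slides.

```python
from itertools import accumulate

# Class limits and frequencies of Example 1 with k = 6.
classes = [(1.6, 2.2), (2.2, 2.8), (2.8, 3.4), (3.4, 4.0), (4.0, 4.6), (4.6, 5.2)]
freqs = [2, 4, 13, 13, 6, 2]

# Midpoint = (L_i + H_i) / 2, rounded to hide floating-point noise.
midpoints = [round((lo + hi) / 2, 2) for lo, hi in classes]

# Running totals of the frequencies (the cf column).
cum_freqs = list(accumulate(freqs))

# Cumulative frequencies as a percentage of the total, as on an ogive's vertical axis.
cum_percent = [100 * cf / cum_freqs[-1] for cf in cum_freqs]

for (lo, hi), mid, f, cf, pct in zip(classes, midpoints, freqs, cum_freqs, cum_percent):
    print(f"{lo:.1f} - {hi:.1f}  mid={mid:.2f}  f={f:2d}  cf={cf:2d}  cum%={pct:5.1f}")
```

Plotting `freqs` against `midpoints` gives the frequency polygon, and plotting `cum_freqs` or `cum_percent` gives the cumulative frequency polygon.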
Cumulative Frequency Polygon: Example 1, k = 6

  Group        Midpoint   cf    f
  1.6 - 2.2    1.9        2     2
  2.2 - 2.8    2.5        6     4
  2.8 - 3.4    3.1        19    13
  3.4 - 4.0    3.7        32    13
  4.0 - 4.6    4.3        38    6
  4.6 - 5.2    4.9        40    2

[Figure: Ogive (Example 1); vertical axis: Cumulative Percent; points labeled 2.20 to 5.20]

Cumulative Frequency Polygon (Continued): Example 1, k = 7

  Group        Midpoint   cf    f
  1.5 - 2.0    1.75       2     2
  2.0 - 2.5    2.25       3     1
  2.5 - 3.0    2.75       7     4
  3.0 - 3.5    3.25       22    15
  3.5 - 4.0    3.75       32    10
  4.0 - 4.5    4.25       37    5
  4.5 - 5.0    4.75       40    3

[Figure: Ogive (Example 1); vertical axis: Cumulative Percent; points labeled 2.00 to 5.00]

Stem and Leaf Plot

What is a Stem and Leaf Plot?
A Stem and Leaf Plot is a type of graph that is similar to a histogram but shows more information. It summarizes the shape of a set of data and provides extra detail regarding individual values. The data are arranged by place value. The digits in the largest place are referred to as the stem. The digits in the smallest place are referred to as the leaf. The leaves are always displayed to the right of the stem.

What are they used for?
Stem and Leaf Plots are great organizers for large amounts of information. Series of scores of sports teams, series of temperatures or rainfall over a period of time, and series of classroom test scores are examples of data for which Stem and Leaf Plots could be used.

Constructing a Stem and Leaf Plot
Make a Stem and Leaf Plot with the following temperatures for June.

  77 80 82 68 65 59 61 57 50 62
  61 70 69 64 67 70 62 65 65 73
  76 87 80 82 83 79 79 71 80 77

Temperature

  Stem (tens)   Leaf (ones)
  5             0 7 9
  6             1 1 2 2 4 5 5 5 7 8 9
  7             0 0 1 3 6 7 7 9 9
  8             0 0 0 2 2 3 7

Begin with the lowest temperature. The lowest temperature of the month was 50. Enter the 5 in the tens column and a 0 in the ones.
The next lowest is 57. Enter a 7 in the ones.
Next is 59; enter a 9 in the ones.
Find all of the temperatures that were in the 60's, 70's and 80's.
Enter the rest of the temperatures sequentially until your Stem and Leaf Plot contains all of the data.

Stem and Leaf Example
Make a Stem and Leaf Plot for the following data.

  2.4 0.7 3.9 2.8 1.3 1.6 2.9 2.6 3.7 2.1
  3.2 3.5 1.8 3.1 0.3 4.6 0.9 3.4 2.3 2.5
  0.4 2.1 2.3 1.5 4.3 1.8 2.4 1.3 2.6 1.8
  2.7 0.4 2.8 3.5 1.4 1.7 3.9 1.1 5.9 2.0
  5.3 6.3 0.2 2.0 1.9 1.2 2.5 2.1 1.2 1.7

  Freq   Stem   Leaf
  6      0      2 3 4 4 7 9
  14     1      1 2 2 3 3 4 5 6 7 7 8 8 8 9
  17     2      0 0 1 1 1 3 3 4 4 5 5 6 6 7 8 8 9
  8      3      1 2 4 5 5 7 9 9
  2      4      3 6
  2      5      3 9
  1      6      3
  50

Stem and Leaf Plot Example
The following are car battery life data. Make a Stem and Leaf Plot.

  2.2 4.1 3.5 4.5 3.2 3.7 3.0 2.6 3.1 1.6
  3.1 3.3 3.8 3.1 4.7 3.7 2.5 4.3 3.4 3.6
  2.9 3.3 3.9 3.1 3.3 3.1 3.7 4.4 3.2 4.1
  1.9 3.4 4.7 3.8 3.2 2.6 3.9 3.0 4.2 3.5

  f     S   L
  2     1   6 9
  5     2   2 5 6 6 9
  25    3   0 0 1 1 1 1 1 2 2 2 3 3 3 4 4 5 5 6 7 7 7 8 8 9 9
  8     4   1 1 2 3 4 5 7 7
  40

Stem and Leaf Plot Example (each stem split in two)

  Frequency   Stem   Leaf
  2           1      6 9
  1           2      2
  4           2      5 6 6 9
  15          3      0 0 1 1 1 1 1 2 2 2 3 3 3 4 4
  10          3      5 5 6 7 7 7 8 8 9 9
  5           4      1 1 2 3 4
  3           4      5 7 7
  40

Measures of Location

Arithmetic Mean (Point of Equilibrium)

Ungrouped data
Population: N observations \(X_1, X_2, \dots, X_N\) in the population.
\[ \mu = \frac{X_1 + X_2 + \dots + X_N}{N} = \frac{1}{N}\sum_{i=1}^{N} X_i \]
Sample: n observations \(X_1, X_2, \dots, X_n\) in the sample.
\[ \bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i \]

Grouped data
Population: let \(X_i\) and \(f_i\) be the midpoint and frequency respectively of the ith of k groups in the population. The mean is defined as
\[ \mu = \frac{\sum_{i=1}^{k} f_i X_i}{\sum_{i=1}^{k} f_i} \]
Sample: let \(X_i\) and \(f_i\) be the midpoint and frequency respectively of the ith of k groups in the sample. The mean is defined as
\[ \bar{X} = \frac{\sum_{i=1}^{k} f_i X_i}{\sum_{i=1}^{k} f_i} \]

Numerical Examples of Arithmetic Mean: Ungrouped Data

Example of sample mean
The following is a random sample of 12 clients showing the number of minutes each client used on a particular cell phone last month.

Example of population mean
There are 12 automobile manufacturing companies in the U.S.A. Listed below is the number of patents granted by the US Government to each company.
Sample mean example, minutes used:

  90 110 89 113 91 94 100 112 77 92 119 83

What is the mean number of minutes used?
\[ \bar{X} = \frac{\sum X}{n} = \frac{90 + 110 + \dots + 83}{12} = \frac{1170}{12} = 97.5 \]

Population mean example, patents granted:

  Company           Patents granted     Company      Patents granted
  General Motors    511                 Mazda        210
  Nissan            385                 Chrysler     97
  DaimlerChrysler   275                 Porsche      50
  Toyota            257                 Mitsubishi   36
  Honda             249                 Volvo        23
  Ford              234                 BMW          13

Is this information a sample or a population?
\[ \mu = \frac{\sum X}{N} = \frac{511 + 385 + \dots + 13}{12} = \frac{2340}{12} = 195 \]

Numerical Examples of Arithmetic Mean: Grouped Data
The following is the frequency distribution of selling prices of vehicles at Whitner Autoplex last month. Find the arithmetic mean.

  Selling price ($ thousands)   Frequency f   Midpoint X   fX
  15 - 18                       8             16.5         132.0
  18 - 21                       23            19.5         448.5
  21 - 24                       17            22.5         382.5
  24 - 27                       18            25.5         459.0
  27 - 30                       8             28.5         228.0
  30 - 33                       4             31.5         126.0
  33 - 36                       2             34.5         69.0
  Total                         80                         1845.0

\[ \bar{X} = \frac{\sum fX}{\sum f} = \frac{1845}{80} \approx 23.1 \]
So the mean vehicle selling price is about $23100.

Point of Equilibrium
[Figure: frequencies \(f_1, \dots, f_6\) placed at positions \(X_1, \dots, X_6\), balanced at \(\bar{X}\)]
An object is balanced at \(\bar{X}\) when
\[ (X_1 - \bar{X})f_1 + (X_2 - \bar{X})f_2 + (X_3 - \bar{X})f_3 = (\bar{X} - X_4)f_4 + (\bar{X} - X_5)f_5 + (\bar{X} - X_6)f_6 \]
\[ f_1X_1 + f_2X_2 + f_3X_3 - (f_1 + f_2 + f_3)\bar{X} = (f_4 + f_5 + f_6)\bar{X} - (f_4X_4 + f_5X_5 + f_6X_6) \]
\[ f_1X_1 + f_2X_2 + f_3X_3 + f_4X_4 + f_5X_5 + f_6X_6 = (f_1 + f_2 + f_3 + f_4 + f_5 + f_6)\bar{X} \]
\[ \bar{X} = \frac{f_1X_1 + f_2X_2 + f_3X_3 + f_4X_4 + f_5X_5 + f_6X_6}{f_1 + f_2 + f_3 + f_4 + f_5 + f_6} = \frac{\sum_{i=1}^{6} f_i X_i}{\sum_{i=1}^{6} f_i} \]

Summary Measures: Weighted Mean
A special case of the arithmetic mean, used when the values of the variable are associated with weights, e.g. the prices of medium, large, and big soft drinks.

  Soft drink   Price    Weight
  Medium       $0.90    3
  Large        $1.25    4
  Big          $1.50    3

The weighted mean of a set of numbers \(X_1, X_2, \dots, X_n\) with corresponding weights \(w_1, w_2, \dots, w_n\) is computed from the following formula:
\[ \bar{X}_w = \frac{w_1X_1 + w_2X_2 + \dots + w_nX_n}{w_1 + w_2 + \dots + w_n} = \frac{\sum_{i=1}^{n} w_i X_i}{\sum_{i=1}^{n} w_i} \]

EXAMPLE: Weighted Mean
The Carter Construction Company pays its hourly employees $16.50, $19.00, or $25.00 per hour.
There are 26 hourly employees, 14 of whom are paid at the $16.50 rate, 10 at the $19.00 rate, and 2 at the $25.00 rate. What is the mean hourly rate paid to the 26 employees?
\[ \bar{X}_w = \frac{14(16.50) + 10(19.00) + 2(25.00)}{14 + 10 + 2} = \frac{471.00}{26} \approx 18.12 \]
The mean hourly rate is about $18.12.

Summary Measures: Geometric Mean
The geometric mean of a set of n positive numbers is defined as the nth root of the product of the n values. The formula for the geometric mean is written:
\[ GM = \sqrt[n]{X_1 X_2 \cdots X_n} = (X_1 X_2 \cdots X_n)^{1/n} \]
The geometric mean used as the average percent increase over n periods is calculated as:
\[ GM = \sqrt[n]{\frac{\text{Value at the end of period}}{\text{Value at the start of period}}} - 1 \]
The geometric mean is useful in finding the average change of percentages, ratios, indexes, or growth rates over time. It has wide application in business and economics because we are often interested in finding the percentage changes in sales, salaries, or economic figures, such as the GDP, which compound or build on each other.
The geometric mean will always be less than or equal to the arithmetic mean.

Example of Geometric Mean
The return on investment earned by a certain company for four successive years was 30%, 20%, -40%, and 200%. Find the geometric mean rate of return on investment.
Solution: The 1.3 represents the 30 percent return on investment, i.e. the original investment of 1.0 plus the return of 0.3. So
\[ GM = \sqrt[4]{(1.3)(1.2)(0.6)(3.0)} = 1.294 \]
which shows that the average return is 29.4 percent.

If you earned $30000 in 1997 and $50000 in 2007, what is your annual rate of increase over the period?
\[ GM = \sqrt[10]{\frac{50000}{30000}} - 1 = 0.0524 \]
The annual rate of increase is 5.24 percent.

Median
The median is the midpoint of the values after they have been ordered from the smallest to the largest, or from the largest to the smallest.
If the number of observations n is odd, the median is the ((n+1)/2)th observation.
If n is even, the median is the average of the (n/2)th and (n/2 + 1)th observations.
Example: Determine the median for each set of data.
(1) 41 15 39 54 31 15 33
(2) 15 16 27 28 41 42
Arrange each set of data in order:
(1) 15 15 31 33 39 41 54
(2) 15 16 27 28 41 42
1) n = 7, so the median is the 4th observation, that is 33.
2) n = 6, so the median is the average of the 3rd and 4th observations, that is (27 + 28)/2 = 27.5.

Median for Grouped Data
The median is obtained by using the formula:
\[ \tilde{X} = L_m + \frac{I_m}{f_m}\left(\frac{n}{2} - cf_{m-1}\right) \]
where m is the group containing the (n/2)th observation; \(L_m\), \(I_m\), and \(f_m\) are the lower limit, class width, and frequency of the mth group; and \(cf_{m-1}\) is the cumulative frequency of the group before it.

Example (Median)
Find the median for the following data.

  L       H       f     cf
  1.60 <  2.20    2     2
  2.20 <  2.80    4     6
  2.80 <  3.40    13    19
  3.40 <  4.00    13    32
  4.00 <  4.60    6     38
  4.60 <  5.20    2     40

n/2 = 20, so the median group is 3.40 - 4.00.
\(L_m = 3.40\), \(I_m = 0.6\), \(f_m = 13\), \(cf_{m-1} = 19\)
\[ \tilde{X} = 3.40 + \frac{0.6}{13}(20 - 19) = 3.45 \approx 3.5 \]

Properties of the Median
There is a unique median for each data set.
It is not affected by extremely large or small values and is therefore a valuable measure of central tendency when such values occur.
It can be computed for ratio-level, interval-level, and ordinal-level data.
It can be computed for an open-ended frequency distribution if the median does not lie in an open-ended class.

Mode
The mode is the value of the observation that appears most frequently.

[Figure: bar chart of No. of Seniors by region (New England, Middle Atlantic, E.N. Central, W.N. Central, S. Atlantic, E.S. Central, W.S. Central, Mountain, Pacific) with the mode marked; vertical axis 0 to 900. Values listed on the slide: 679, 196, 436, 346, 783, 367, 815, 818, 524]

Mode (Example)

Mode: Grouped Data
Calculating the mode for grouped data:
\[ \text{Mode} = L_m + \frac{f_m - f_{m-1}}{(f_m - f_{m-1}) + (f_m - f_{m+1})} \times I_m \]
Calculate the mode of the following distribution.
  Group        f
  1.6 - 2.2    2
  2.2 - 2.8    4
  2.8 - 3.4    14
  3.4 - 4.0    12
  4.0 - 4.6    6
  4.6 - 5.2    2

Solution: The modal group is 2.8 - 3.4, with \(f_m = 14\), \(f_{m-1} = 4\), \(f_{m+1} = 12\) and \(I_m = 0.6\).
\[ \text{Mode} = L_m + \frac{f_m - f_{m-1}}{(f_m - f_{m-1}) + (f_m - f_{m+1})} \times I_m = 2.8 + \frac{14 - 4}{(14 - 4) + (14 - 12)} \times 0.6 = 3.3 \]

The Relative Positions of the Mean, Median and the Mode
[Figure: relative positions of the mean, median and mode]

Dispersion

Why study dispersion?
A measure of location, such as the mean or the median, only describes the center of the data. It is valuable from that standpoint, but it does not tell us anything about the spread of the data.
For example, if your nature guide told you that the river ahead averaged 3 feet in depth, would you want to wade across on foot without additional information? Probably not. You would want to know something about the variation in the depth.
A second reason for studying the dispersion in a set of data is to compare the spread in two or more distributions.
Dispersion can also be studied through display.

Range and Mean Deviation

Range
Range = Largest value − Smallest value

Mean Deviation
\[ M.D. = \frac{\sum_{i=1}^{n} |X_i - \bar{X}|}{n} \]

Example
The numbers of cappuccinos sold at the Starbucks location in the Orange County Airport between 4 and 7 p.m. for a sample of 5 days last year were 20, 40, 50, 60, and 80. Determine the mean deviation for the number of cappuccinos sold.
Range = Largest value − Smallest value = 80 − 20 = 60
Mean Deviation Example: Solution
\[ M.D. = \frac{\sum_{i=1}^{n} |X_i - \bar{X}|}{n} \]
With \(\bar{X} = 250/5 = 50\):

  Number of cappuccinos sold daily (X)   X − X̄           Absolute deviation |X − X̄|
  20                                     20 − 50 = −30    30
  40                                     40 − 50 = −10    10
  50                                     50 − 50 = 0      0
  60                                     60 − 50 = 10     10
  80                                     80 − 50 = 30     30
  Total                                                   80

\[ M.D. = \frac{80}{5} = 16 \]

Mean Deviation (Grouped Data)
Mean deviation for grouped data:
\[ MD = \frac{\sum_{i=1}^{k} f_i |X_i - \bar{X}|}{\sum_{i=1}^{k} f_i} \]
For the vehicle selling prices, \(\bar{X} = \sum fX / \sum f = 1845/80 \approx 23.1\).

  Selling price ($ thousands)   f     X      X − X̄    f|X − X̄|
  15 - 18                       8     16.5   -6.6     52.8
  18 - 21                       23    19.5   -3.6     82.8
  21 - 24                       17    22.5   -0.6     10.2
  24 - 27                       18    25.5   2.4      43.2
  27 - 30                       8     28.5   5.4      43.2
  30 - 33                       4     31.5   8.4      33.6
  33 - 36                       2     34.5   11.4     22.8
  Total                         80                    288.6

\[ MD = \frac{288.6}{80} = 3.61 \]

Variance and Standard Deviation

Population variance and standard deviation
Let \(X_1, X_2, \dots, X_N\) be N observations in the population. The variance is defined as:
\[ \sigma^2 = \frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N} \]
The standard deviation is defined as:
\[ \sigma = \sqrt{\frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}} \]

Sample variance and standard deviation
Let \(X_1, X_2, \dots, X_n\) be n observations in the sample. The variance is defined as:
\[ s^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1} \]
The standard deviation is defined as:
\[ s = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}} \]

Example: Variance and Standard Deviation
The numbers of traffic citations issued during the last five months in Beaufort County, South Carolina, are 38, 26, 13, 41, and 22. What is the population variance?

The hourly wages for a sample of part-time employees at Home Depot are: $12, $20, $16, $18, and $19. What is the sample variance?
\[ \bar{X} = \frac{\sum X}{n} = \frac{85}{5} = 17.0 \]

  Hourly wage ($) X   X − X̄   (X − X̄)²
  12                  -5      25
  20                  3       9
  16                  -1      1
  18                  1       1
  19                  2       4
  Total 85            0       40

\[ s^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1} = \frac{40}{5 - 1} = 10.0 \]

Example: Grouped Data
The sample standard deviation for grouped data is defined as:
\[ s = \sqrt{\frac{\sum f (X - \bar{X})^2}{\left(\sum f\right) - 1}} \]
Example: For the following frequency distribution of prices of vehicles, compute the standard deviation of the prices.
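The variance formulas above translate directly into code. The Python sketch below is illustrative only: the function names are assumptions, and the grouped example assumes that the frequency distribution meant here is the vehicle selling price table used earlier (midpoints in $ thousands), since the table itself does not appear on this slide.

```python
import math

def population_variance(xs):
    """Sum of squared deviations from mu, divided by N."""
    mu = sum(xs) / len(xs)
    return sum((x - mu) ** 2 for x in xs) / len(xs)

def sample_variance(xs):
    """Sum of squared deviations from the sample mean, divided by n - 1."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs) / (len(xs) - 1)

def grouped_sample_sd(midpoints, freqs):
    """s = sqrt( sum f (X - mean)^2 / (sum f - 1) ) for grouped data."""
    n = sum(freqs)
    mean = sum(f * x for x, f in zip(midpoints, freqs)) / n
    ss = sum(f * (x - mean) ** 2 for x, f in zip(midpoints, freqs))
    return math.sqrt(ss / (n - 1))

citations = [38, 26, 13, 41, 22]     # traffic citations (population)
wages = [12, 20, 16, 18, 19]         # hourly wages (sample)

# Assumed to be the vehicle price distribution from the earlier slides.
midpoints = [16.5, 19.5, 22.5, 25.5, 28.5, 31.5, 34.5]
freqs = [8, 23, 17, 18, 8, 4, 2]

print(population_variance(citations))        # population variance of the citations
print(sample_variance(wages))                # matches the 10.0 worked out above
print(grouped_sample_sd(midpoints, freqs))   # in $ thousands
```

Under that assumption the grouped standard deviation comes out at roughly 4.40, i.e. about $4400.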
Example (continued)
An alternate method of computing the variance is:
\[ s^2 = \frac{1}{n - 1}\left(\sum fX^2 - \frac{\left(\sum fX\right)^2}{n}\right) \]

  Group        Midpoint X   f     fX      fX²
  1.5 - 2.0    1.75         2     3.5     6.125
  2.0 - 2.5    2.25         2     4.5     10.13
  2.5 - 3.0    2.75         5     13.75   37.81
  3.0 - 3.5    3.25         15    48.75   158.4
  3.5 - 4.0    3.75         8     30      112.5
  4.0 - 4.5    4.25         6     25.5    108.4
  4.5 - 5.0    4.75         2     9.5     45.13
  Total                     40    135.5   478.5

\[ s^2 = \frac{1}{40 - 1}\left(478.5 - \frac{(135.5)^2}{40}\right) = 0.5 \]

Moments

Moments about origin
The rth moment about origin 'a' is defined as:
\[ m'_r = \frac{\sum (X - a)^r}{n} \]
For grouped data:
\[ m'_r = \frac{\sum f (X - a)^r}{\sum f} \]

Moments about mean
The rth moment about the mean is defined as:
\[ m_r = \frac{\sum (X - \bar{X})^r}{n} \]
For grouped data:
\[ m_r = \frac{\sum f (X - \bar{X})^r}{\sum f} \]
The first moment about the mean is zero.

Example of Moments
Moments about the mean, with \(\bar{X} = \sum fX / \sum f = 135.5/40 \approx 3.4\):

  Group        Midpoint X   f     fX      f(X − X̄)²   f(X − X̄)³    f(X − X̄)⁴
  1.5 - 2.0    1.75         2     3.5     5.445       -8.98425     14.824013
  2.0 - 2.5    2.25         2     4.5     2.645       -3.04175     3.4980125
  2.5 - 3.0    2.75         5     13.75   2.1125      -1.373125    0.8925313
  3.0 - 3.5    3.25         15    48.75   0.3375      -0.050625    0.0075937
  3.5 - 4.0    3.75         8     30      0.98        0.343        0.12005
  4.0 - 4.5    4.25         6     25.5    4.335       3.68475      3.1320375
  4.5 - 5.0    4.75         2     9.5     3.645       4.92075      6.6430125
  Total                     40    135.5   19.5        -4.50125     29.11725

\[ m_2 = \frac{19.5}{40} \approx 0.5, \qquad m_3 = \frac{-4.50125}{40} = -0.1125, \qquad m_4 = \frac{29.11725}{40} = 0.7279 \]

Example of Moments (Continued)

  Class       f     X     fX      X − X̄   f(X − X̄)   f(X − X̄)²   f(X − X̄)³   f(X − X̄)⁴
  0.0 - 0.8   5     0.4   2       -1.97   -9.84      19.37       -38.11      75.00
  0.8 - 1.6   9     1.2   10.8    -1.17   -10.51     12.28       -14.34      16.75
  1.6 - 2.4   15    2     30      -0.37   -5.52      2.03        -0.75       0.28
  2.4 - 3.2   10    2.8   28      0.43    4.32       1.87        0.81        0.35
  3.2 - 4.0   6     3.6   21.6    1.23    7.39       9.11        11.22       13.82
  4.0 - 4.8   2     4.4   8.8     2.03    4.06       8.26        16.78       34.10
  4.8 - 5.6   1     5.2   5.2     2.83    2.83       8.02        22.71       64.32
  5.6 - 6.4   2     6     12      3.63    7.26       26.38       95.82       348.03
  Total       50          118.4           0          87.31       94.14       552.65

\[ \bar{X} = \frac{\sum fX}{\sum f} = \frac{118.4}{50} = 2.37 \]
\[ m_2 = \frac{\sum f (X - \bar{X})^2}{\sum f} = \frac{87.31}{50} = 1.75, \qquad m_3 = \frac{\sum f (X - \bar{X})^3}{\sum f} = \frac{94.14}{50} = 1.88, \qquad m_4 = \frac{\sum f (X - \bar{X})^4}{\sum f} = \frac{552.65}{50} = 11.05 \]

Skewness
The mean, median and mode are measures of central location for a set of observations, and the range and the standard deviation are measures of dispersion. Another characteristic of a set of data is its shape. There are four shapes commonly observed: symmetric, positively skewed, negatively skewed, and bimodal.
The coefficient of skewness can range from -3 up to 3.
A value near -3, such as -2.57, indicates considerable negative skewness.
A value such as 1.63 indicates moderate positive skewness.
A value of 0, which will occur when the mean and median are equal, indicates that the distribution is symmetrical and that there is no skewness present.

Types of Skewness
[Figure: types of skewness]

Coefficient of Skewness
The Pearson coefficient of skewness is defined as:
\[ sk = \frac{3(\bar{X} - \tilde{X})}{s} \]
where \(\tilde{X}\) is the median.

Example
The following are the earnings per share for a sample of 15 software companies for the year 2005. The earnings per share are arranged from smallest to largest. Compute the mean, median, and standard deviation. Find the coefficient of skewness using Pearson's estimate. What is your conclusion regarding the shape of the distribution?

Solution
\[ \bar{X} = \frac{\sum X}{n} = \frac{\$74.26}{15} = \$4.95 \]
\[ s = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}} = \sqrt{\frac{(\$0.09 - \$4.95)^2 + \dots + (\$16.40 - \$4.95)^2}{15 - 1}} = \$5.22 \]
\[ sk = \frac{3(\bar{X} - \text{Median})}{s} = \frac{3(\$4.95 - \$3.18)}{\$5.22} = 1.017 \]
The shape is moderately positively skewed.
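Pearson's coefficient is simple enough to check by machine. The Python sketch below plugs in the summary figures from the solution above; the function names are illustrative assumptions, and the second helper, which starts from raw observations, relies on the standard library's `statistics` module, whose `stdev` uses the same n − 1 divisor as the sample formula.

```python
import statistics

def pearson_skewness(mean, median, sd):
    """sk = 3 (mean - median) / s."""
    return 3 * (mean - median) / sd

def pearson_skewness_from_data(xs):
    """Same coefficient computed from raw observations (sample standard deviation)."""
    return pearson_skewness(statistics.mean(xs), statistics.median(xs), statistics.stdev(xs))

# Summary figures from the earnings-per-share solution.
sk = pearson_skewness(4.95, 3.18, 5.22)
print(round(sk, 3))   # 1.017

# A symmetric data set has mean equal to median, so the coefficient is zero.
print(pearson_skewness_from_data([1, 2, 3, 4, 5]))
```

A positive result means the mean lies above the median, which is the moderately positive skew reported in the solution.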
Example of Skewness (Continued)

  Class       f     cf    fX      f(X − X̄)²
  0.0 - 0.8   5     5     2       20.65
  0.8 - 1.6   8     13    9.6     12.14
  1.6 - 2.4   14    27    28      2.61
  2.4 - 3.2   11    38    30.8    1.49
  3.2 - 4.0   7     45    25.2    9.55
  4.0 - 4.8   2     47    8.8     7.75
  4.8 - 5.6   1     48    5.2     7.66
  5.6 - 6.4   2     50    12      25.46
  Total       50          121.6   87.31

\[ \bar{X} = \frac{\sum fX}{\sum f} = \frac{121.6}{50} = 2.43 \]
\[ s = \sqrt{\frac{\sum f (X - \bar{X})^2}{n - 1}} = \sqrt{\frac{87.31}{49}} = 1.3348 \]
\[ \tilde{X} = L_m + \frac{I_m}{f_m}\left(\frac{n}{2} - cf_{m-1}\right) = 1.6 + \frac{0.8}{14}\left(\frac{50}{2} - 13\right) = 2.29 \]
\[ sk = \frac{3(\bar{X} - \tilde{X})}{s} = \frac{3(2.43 - 2.29)}{1.3348} = 0.3147 \]
The skewness can also be measured with moments as:
\[ b = \frac{m_3^2}{m_2^3} \]
With \(m_2 = 1.75\) and \(m_3 = 1.62\), b = 0.492.
The shape is slightly positively skewed.

[Figure: Example Skewness Histogram; vertical axis: Percent; horizontal axis: Data, 0.00 to 6.40; the mode, median and mean are marked]

Empirical Rule
For a symmetrical, bell-shaped frequency distribution:
Approximately 68% of the observations will lie within plus and minus one standard deviation of the mean (mean ± s.d.).
About 95% of the observations will lie within plus and minus two standard deviations of the mean (mean ± 2 s.d.).
Practically all (99.7%) will lie within plus and minus three standard deviations of the mean (mean ± 3 s.d.).
Let the mean of a symmetric distribution be 100 and the standard deviation be 10. Then the empirical rule gives:
68% of the observations between 90 and 110
95% of the observations between 80 and 120
99.7% of the observations between 70 and 130

Example: Empirical Rule
Consider the following distribution.

Raw data:

  1.6 2.5 3.0 3.4 3.8
  1.8 2.6 3.2 3.5 4.1
  2.0 2.6 3.2 3.6 4.1
  2.3 2.6 3.2 3.6 4.2
  2.3 2.8 3.3 3.6 4.3
  2.3 2.8 3.3 3.7 4.3
  2.4 2.9 3.4 3.7 4.5
  2.5 3.0 3.4 3.8 4.6

Grouped data:

  Group        f     X      fX     fX²
  1.5 - 2.0    2     1.75   3.5    6.13
  2.0 - 2.5    5     2.25   11.3   25.3
  2.5 - 3.0    8     2.75   22     60.5
  3.0 - 3.5    10    3.25   32.5   106
  3.5 - 4.0    8     3.75   30     113
  4.0 - 4.5    5     4.25   21.3   90.3
  4.5 - 5.0    2     4.75   9.5    45.1
  Total        40           130    446

Check the empirical rule.
From the raw data: Mean = 3.2, s.d. = 0.75
Mean ± s.d. = (2.45 – 3.95) (67.5%)
Mean ± 2 s.d. = (1.7 – 4.7) (97.5%)
Mean ± 3 s.d. = (0.95 – 5.45) (100%)

From the grouped data: Mean = 3.25, s.d. = 0.77
Mean ± s.d. = (2.48 – 4.02) (67.5%)
Mean ± 2 s.d. = (1.71 – 4.79) (97.5%)
Mean ± 3 s.d. = (0.94 – 5.56) (100%)

Exercise
For the following data of examination marks, find the mean, median, mode, mean deviation and variance. Also find the skewness.

  Marks      No. of students
  30 – 39    8
  40 – 49    87
  50 – 59    190
  60 – 69    304
  70 – 79    211
  80 – 89    85
  90 – 99    20

The following is the distribution of daily wages per thousand employees in a certain factory.

  Daily wages   No. of employees
  22            3
  24            13
  26            43
  28            102
  30            175
  32            220
  34            204
  36            139
  38            69
  40            25
  42            6
  44            1

Calculate the modal and median wages. Why do the two differ?
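The grouped-data formulas needed for exercises of this kind (mean, median and mode) can be sketched in Python as follows. The function names are illustrative assumptions, and the checks at the bottom reuse worked examples from the earlier slides rather than the exercise data, so the exercises themselves are left to the reader.

```python
def grouped_mean(midpoints, freqs):
    """Mean = sum(f X) / sum(f)."""
    return sum(f * x for x, f in zip(midpoints, freqs)) / sum(freqs)

def grouped_median(lowers, width, freqs):
    """Median = L_m + (I_m / f_m) (n/2 - cf_{m-1}), m being the first group with cf >= n/2."""
    half = sum(freqs) / 2
    cum = 0  # cumulative frequency of the groups before the current one
    for lower, f in zip(lowers, freqs):
        if cum + f >= half:
            return lower + width / f * (half - cum)
        cum += f

def grouped_mode(lowers, width, freqs):
    """Mode = L_m + (f_m - f_{m-1}) / ((f_m - f_{m-1}) + (f_m - f_{m+1})) * I_m."""
    m = max(range(len(freqs)), key=freqs.__getitem__)   # index of the modal group
    before = freqs[m - 1] if m > 0 else 0
    after = freqs[m + 1] if m < len(freqs) - 1 else 0
    d1, d2 = freqs[m] - before, freqs[m] - after
    return lowers[m] + d1 / (d1 + d2) * width

# Checks against worked examples from the slides.
lowers = [1.6, 2.2, 2.8, 3.4, 4.0, 4.6]
median = grouped_median(lowers, 0.6, [2, 4, 13, 13, 6, 2])    # about 3.45
mode = grouped_mode(lowers, 0.6, [2, 4, 14, 12, 6, 2])        # 3.3
mean = grouped_mean([16.5, 19.5, 22.5, 25.5, 28.5, 31.5, 34.5],
                    [8, 23, 17, 18, 8, 4, 2])                 # 1845 / 80
```

For classes written with gaps, such as 30 – 39 and 40 – 49, a common convention is to use the class boundaries (29.5, 39.5, and so on) as the limits passed to these functions; that convention is an assumption here, not something the slides specify.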