Describing Data: Frequency Tables, Frequency Distributions, and Graphic Presentation

GOALS
1. Organize qualitative data into a frequency table.
2. Present a frequency table as a bar chart or a pie chart.
3. Organize quantitative data into a frequency distribution.
4. Present a frequency distribution for quantitative data using histograms, frequency polygons, and cumulative frequency polygons.

Why we describe data
Descriptive statistics organize data to show the general shape of the data, where values tend to concentrate, and which values are extreme or unusual.

Remember
Quantitative data ≠ Qualitative data

Frequency Table

Relative Class Frequencies
Class frequencies can be converted to relative class frequencies to show the fraction of the total number of observations in each class. A relative frequency captures the relationship between a class total and the total number of observations.

Bar Charts

Pie Charts

How to construct charts with SPSS
Each group uses “employee data.sav”.
Make bar charts/pie charts from:
– Gender
– Employment category
– Educational level
– Minority
Graphs → Bar and/or Pie
Describe the output (charts).

Frequency Distribution
A frequency distribution is a grouping of data into mutually exclusive categories showing the number of observations in each class.

EXAMPLE – Constructing Frequency Distributions: Quantitative Data
Ms. Kathryn Ball of AutoUSA wants to develop tables, charts, and graphs to show the typical selling price on various dealer lots. The table on the right reports only the price of the 80 vehicles sold last month at Whitner Autoplex.

Constructing a Frequency Table Example
Step 1: Decide on the number of classes.
A useful recipe to determine the number of classes (k) is the “2 to the k rule”: choose the smallest k such that \(2^k > n\). There were 80 vehicles sold, so n = 80. If we try k = 6, which means we would use 6 classes, then \(2^6 = 64\), somewhat less than 80. Hence, 6 is not enough classes. If we let k = 7, then \(2^7 = 128\), which is greater than 80.
So the recommended number of classes is 7.

Step 2: Determine the class interval or width.
The formula is \(i \ge (H - L)/k\), where i is the class interval, H is the highest observed value, L is the lowest observed value, and k is the number of classes.
($35,925 - $15,546)/7 = $2,911
Round up to some convenient number, such as a multiple of 10 or 100. Use a class width of $3,000.

Constructing a Frequency Table Example
Step 3: Set the individual class limits.

Constructing a Frequency Table
Step 4: Tally the vehicle selling prices into the classes.
Step 5: Count the number of items in each class.

Practice
There were 200 tables sold. The lowest value was 10,000 baht and the highest value was 100,000 baht (use the 2 to the k rule). Construct the class limits.

Class Intervals and Midpoints
Class midpoint: A point that divides a class into two equal parts. This is the average of the upper and lower class limits.
Class frequency: The number of observations in each class.
Class interval: The class interval is obtained by subtracting the lower limit of a class from the lower limit of the next class.

Class Intervals and Midpoints Example
Referring to the AutoUSA example:
Class midpoint: For the first class the lower class limit is $15,000 and the next limit is $18,000. The class midpoint is $16,500, found by ($15,000 + $18,000)/2.
Class interval: The class interval of the vehicle selling price data is $3,000. It is found by subtracting the lower limit of the first class, $15,000, from the lower limit of the next class: $18,000 - $15,000.

Relative Frequency Distribution
To convert a frequency distribution to a relative frequency distribution, each of the class frequencies is divided by the total number of observations.
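
The five construction steps and the relative-frequency conversion above can be sketched in Python. This is a minimal illustration under stated assumptions: the function names, the rounding unit of $1,000, and the four sample prices are made up for the sketch. Only n = 80, the lowest and highest prices, and the resulting $3,000 width and $15,000 first limit come from the AutoUSA example.

```python
import math

def number_of_classes(n):
    """Smallest k such that 2**k > n (the "2 to the k" rule)."""
    k = 1
    while 2 ** k <= n:
        k += 1
    return k

def class_lower_limits(lowest, highest, k, round_to=1000):
    """Return (width, lower limits). round_to is an assumed 'convenient' unit."""
    # Round the raw width (H - L) / k UP to a multiple of round_to.
    width = math.ceil((highest - lowest) / k / round_to) * round_to
    # Start the first class at a convenient value at or below the lowest observation.
    start = math.floor(lowest / round_to) * round_to
    return width, [start + i * width for i in range(k)]

def tally(values, lower_limits, width):
    """Count the values falling in each class [lower, lower + width)."""
    counts = [0] * len(lower_limits)
    for v in values:
        counts[int((v - lower_limits[0]) // width)] += 1
    return counts

def relative_frequencies(counts):
    """Divide each class frequency by the total number of observations."""
    total = sum(counts)
    return [c / total for c in counts]

k = number_of_classes(80)                            # n = 80 vehicles
width, lowers = class_lower_limits(15546, 35925, k)  # L and H from the example
sample_prices = [15546, 17999, 18000, 35925]         # hypothetical, NOT the real data
counts = tally(sample_prices, lowers, width)
rel = relative_frequencies(counts)
print(k, width, lowers)
print(counts, rel)
```

Each class includes its lower limit and excludes its upper limit, so a price of exactly $18,000 falls in the second class. Because the starting point is rounded down, it is worth checking that the last class still covers the highest value; here 7 classes of $3,000 starting at $15,000 reach $36,000, which covers $35,925.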
Graphic Presentation of a Frequency Distribution
The three commonly used graphic forms are:
• Histograms
• Frequency polygons
• Cumulative frequency distributions

Histogram (for quantitative data)
• A frequency distribution based on quantitative data.
• A graph in which the classes are marked on the horizontal axis and the class frequencies on the vertical axis.
• The class frequencies are represented by the heights of the bars, and the bars are drawn adjacent to each other.

SPSS guide
Use “employee data.sav”.
Graphs → Histogram
– Variable: Current salary
– Panel by row: Gender
Describe the output.

Frequency Polygon
A frequency polygon also shows the shape of a distribution and is similar to a histogram. It consists of line segments connecting the points formed by the intersections of the class midpoints and the class frequencies.

Cumulative Frequency Distribution

Class Practice
Show how to apply the descriptive statistics that we studied today:
– Frequency table (Analyze → Descriptive Statistics → Frequencies)
– Bar chart or pie chart (Graphs → Bar/Pie)
– Histogram (Graphs → Histogram)
Interpret the results in one paragraph for each table or chart.

The End

Describing Data: Numerical Measures

GOALS
1. Calculate the arithmetic mean, weighted mean, median, mode, and geometric mean.
2. Explain the characteristics, uses, advantages, and disadvantages of each measure of location.
3. Identify the position of the mean, median, and mode for both symmetric and skewed distributions.
4. Compute and interpret the range, mean deviation, variance, and standard deviation.
5. Understand the characteristics, uses, advantages, and disadvantages of each measure of dispersion.
Numerical Descriptive Measures
Measures of Location
• Arithmetic Mean
• Weighted Mean
• Median
• Mode
• Geometric Mean
Measures of Dispersion
• Range
• Mean Deviation
• Variance
• Standard Deviation

Population Mean
For ungrouped data, the population mean is the sum of all the population values divided by the total number of population values:
\(\mu = \frac{\sum X}{N}\)

EXAMPLE – Population Mean

Sample Mean
For ungrouped data, the sample mean is the sum of all the sample values divided by the number of sample values:
\(\bar{X} = \frac{\sum X}{n}\)

EXAMPLE – Sample Mean

Properties of the Arithmetic Mean
1. Every set of interval-level and ratio-level data has a mean.
2. All the values are included in computing the mean.
3. The mean is unique.
4. The sum of the deviations of each value from the mean is zero.

Weighted Mean
The weighted mean of a set of numbers \(X_1, X_2, \ldots, X_n\), with corresponding weights \(w_1, w_2, \ldots, w_n\), is computed from the following formula:
\(\bar{X}_w = \frac{w_1 X_1 + w_2 X_2 + \cdots + w_n X_n}{w_1 + w_2 + \cdots + w_n}\)

EXAMPLE – Weighted Mean
The Carter Construction Company pays its hourly employees $16.50, $19.00, or $25.00 per hour. There are 26 hourly employees, 14 of whom are paid at the $16.50 rate, 10 at the $19.00 rate, and 2 at the $25.00 rate. What is the mean hourly rate paid to the 26 employees?

The Median
The median is the midpoint of the values after they have been ordered from the smallest to the largest. There are as many values above the median as below it in the data array. For an even set of values, the median will be the arithmetic average of the two middle numbers.

Properties of the Median
1. There is a unique median for each data set.
2. It is not affected by extremely large or small values and is therefore a valuable measure of central tendency when such values occur.
3. It can be computed for ratio-level, interval-level, and ordinal-level data.
4. It can be computed for an open-ended frequency distribution if the median does not lie in an open-ended class.
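
The Carter Construction weighted-mean question above can be worked through with a short Python sketch. The function name `weighted_mean` is an assumption chosen for illustration; the rates and head counts are the ones given in the example, with the number of employees at each rate serving as the weight.

```python
def weighted_mean(values, weights):
    """Sum of weight * value divided by the sum of the weights."""
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(w * x for x, w in zip(values, weights)) / total_weight

hourly_rates = [16.50, 19.00, 25.00]   # dollars per hour
employees = [14, 10, 2]                # weights: employees paid at each rate

mean_rate = weighted_mean(hourly_rates, employees)
print(round(mean_rate, 2))             # 471 / 26, about 18.12
```

A plain average of the three rates would give about $20.17, which overstates the typical wage because only 2 of the 26 employees earn the top rate.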
EXAMPLES – Median
The ages for a sample of five college students are: 21, 25, 19, 20, 22
Arranging the data in ascending order gives: 19, 20, 21, 22, 25. Thus the median is 21.

The heights of four basketball players, in inches, are: 76, 73, 80, 75
Arranging the data in ascending order gives: 73, 75, 76, 80. Thus the median is 75.5.

The Mode
The mode is the value of the observation that appears most frequently.

Example – Mode

The Relative Positions of the Mean, Median and the Mode

Skewness
Measures of central location for a set of observations (the mean, median, and mode) and measures of data dispersion (e.g. the range and the standard deviation) were introduced earlier. Another characteristic of a set of data is the shape. There are four shapes commonly observed:
1. symmetric,
2. positively skewed,
3. negatively skewed,
4. bimodal.

Skewness – Formulas for Computing
The coefficient of skewness can range from -3 up to 3.
– A value near -3, such as -2.57, indicates considerable negative skewness.
– A value such as 1.63 indicates moderate positive skewness.
– A value of 0, which will occur when the mean and median are equal, indicates the distribution is symmetrical and that there is no skewness present.

Commonly Observed Shapes

Skewness – SPSS example
• From “employee data.sav”
• Calculate skewness, mean, median, maximum and minimum
• Analyze → Descriptive Statistics → Frequencies → Statistics
• Select “beginning salary” and “current salary”, then discuss: which one is more skewed?

The Geometric Mean
• Useful in finding the average change of percentages, ratios, indexes, or growth rates over time.
• Has a wide application in business and economics because we are often interested in finding the percentage changes in sales, salaries, or economic figures, such as the GDP, which compound or build on each other.
• Will always be less than or equal to the arithmetic mean.
• Defined as the nth root of the product of n values.
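
The definition above (the nth root of the product of n values) can be sketched in Python. The function name `geometric_mean` and the conversion of percentage returns into growth factors (1 + rate) are written out here for illustration; the four yearly returns are those of the Atkins Construction example in this section.

```python
import math

def geometric_mean(factors):
    """nth root of the product of n positive values."""
    if not factors or any(f <= 0 for f in factors):
        raise ValueError("need at least one value, and all values must be positive")
    return math.prod(factors) ** (1 / len(factors))

returns_pct = [30, 20, -40, 200]                 # yearly returns, in percent
factors = [1 + r / 100 for r in returns_pct]     # 1.3, 1.2, 0.6, 3.0

gm = geometric_mean(factors)
arithmetic = sum(factors) / len(factors)

print(round(gm, 3))          # about 1.294, i.e. roughly 29.4 percent per year
print(round(arithmetic, 3))  # 1.525, larger than the geometric mean
```

Working with growth factors rather than raw percentages keeps every value positive, which the nth root requires; a return of -40 percent becomes the factor 0.6.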
The formula for the geometric mean is written:
\(GM = \sqrt[n]{(X_1)(X_2)\cdots(X_n)}\)

EXAMPLE – Geometric Mean
The return on investment earned by Atkins Construction Company for four successive years was: 30 percent, 20 percent, -40 percent, and 200 percent. What is the geometric mean rate of return on investment?
\(GM = \sqrt[4]{(1.3)(1.2)(0.6)(3.0)} = \sqrt[4]{2.808} = 1.294\)
The geometric mean rate of return is therefore about 29.4 percent per year.

Dispersion
Why Study Dispersion?
A measure of location, such as the mean or the median, only describes the center of the data; it does not tell us anything about the spread of the data. For example, if your nature guide told you that the river ahead averaged 3 feet in depth, would you want to wade across on foot without additional information? Probably not. You would want to know something about the variation in the depth.
A second reason for studying the dispersion in a set of data is to compare the spread in two or more distributions.

Samples of Dispersions

Measures of Dispersion
• Range
• Mean Deviation
• Variance and Standard Deviation

EXAMPLE – Range
The number of cappuccinos sold at the Starbucks location in the Orange County Airport between 4 and 7 p.m. for a sample of 5 days last year were 20, 40, 50, 60, and 80. Determine the range for the number of cappuccinos sold.
Range = Largest value – Smallest value = 80 – 20 = 60

EXAMPLE – Mean Deviation
The number of cappuccinos sold at the Starbucks location in the Orange County Airport between 4 and 7 p.m. for a sample of 5 days last year were 20, 40, 50, 60, and 80. Determine the mean deviation for the number of cappuccinos sold.

EXAMPLE – Variance and Standard Deviation
The numbers of traffic citations issued during the last five months in Beaufort County, South Carolina, are 38, 26, 13, 41, and 22. What is the population variance?

EXAMPLE – Sample Variance
The hourly wages for a sample of part-time employees at Home Depot are: $12, $20, $16, $18, and $19. What is the sample variance?

Describing Data: Displaying and Exploring Data

GOALS
1. Develop and interpret a dot plot.
2. Construct and interpret box plots.
3. Compute and understand the coefficient of skewness.
4. Draw and interpret a scatter diagram.
5. Construct and interpret a contingency table.

Dot Plots
• A dot plot groups the data as little as possible, and the identity of an individual observation is not lost.
• To develop a dot plot, each observation is simply displayed as a dot along a horizontal number line indicating the possible values of the data.
• If there are identical observations or the observations are too close to be shown individually, the dots are “piled” on top of each other.
• Dot plots are most useful for smaller data sets, whereas histograms tend to be most useful for large data sets.

Dot Plot – SPSS Example
• Use “employee data.sav”
• Graphs → Scatter/Dot…
• Simple Dot
• Define
• Select “current salary” for the X-Axis variable

Boxplot
In descriptive statistics, a box plot or boxplot is a convenient way of graphically depicting groups of numerical data through their five-number summaries: the smallest observation (sample minimum), lower quartile (Q1), median (Q2), upper quartile (Q3), and largest observation (sample maximum). A boxplot may also indicate which observations, if any, might be considered outliers.

Boxplot – Example

Describing Relationship between Two Variables
One graphical technique we use to show the relationship between variables is called a scatter diagram. To draw a scatter diagram we need two variables. We scale one variable along the horizontal axis (X-axis) of a graph and the other variable along the vertical axis (Y-axis).
Describing Relationship between Two Variables – Scatter Diagram Examples

Scatter Diagram – SPSS
• From “employee data.sav”
• Graphs → Scatter/Dot…
• Simple Scatter
• Define
• Select “current salary” for the Y-Axis variable
• Select “months since hired” for the X-Axis variable

Contingency Tables
A scatter diagram requires that both of the variables be at least interval scale. What if we wish to study the relationship between two variables when one or both are nominal or ordinal scale? In this case we tally the results in a contingency table.

Contingency Tables – An Example
A manufacturer of preassembled windows produced 50 windows yesterday. This morning the quality assurance inspector reviewed each window for all quality aspects. Each was classified as acceptable or unacceptable and by the shift on which it was produced. The two variables are shift and quality. The results are reported in the following table.

Contingency Tables – An Example
Usefulness of the Contingency Table: By organizing the information into a contingency table we can compare the quality on the three shifts. For example, on the day shift, 3 out of 20 windows, or 15 percent, are defective. On the afternoon shift, 2 of 15, or 13 percent, are defective, and on the night shift 1 out of 15, or 7 percent, are defective. Overall, 12 percent of the windows are defective. Observe also that 40 percent of the windows are produced on the day shift, found by (20/50)(100).

Contingency Table – SPSS
• From “employee data.sav”
• Analyze → Descriptive Statistics → Crosstabs…
• Row: Gender
• Column: Employment Category
• In “Cells”, Percentages: Row

The End