Descriptive Statistics
Essential Ideas for Chapter 14: Statistics and Data Analysis

The Process of Statistics

Step 1: Identify a Research Objective
• The researcher must determine the question to be answered. The question must be detailed.
• Identify the group to be studied. This group is called the population.
• An individual is a person or object that is a member of the population being studied.

Variables are the characteristics of the individuals within the population.

Qualitative variables allow for classification of individuals based on some attribute or characteristic (gender, eye color, etc.). Numbers can be qualitative if they are used for identification purposes (phone number, SSN, etc.).

Quantitative variables provide numerical measures of individuals.
• Discrete variables: values are derived from counting.
• Continuous variables: values are derived from measuring.

Stem-and-Leaf Plots (Discrete Data)

The stem of the graph consists of the leading digits. The leaf of the graph is the rightmost digit. The choice of stem depends on the class width desired.

Stem-and-Leaf Plot of Ages of Best Actor, 1928 - 2002

2 | 9
3 | 011122334455556777888888899
4 | 000011111222333334455666777888999
5 | 11223556669
6 | 012
7 | 6

Note: This stem-and-leaf plot has a class width of 10.

Split Stem-and-Leaf Plot of Ages of Best Actor, 1928 - 2002

2 |
2 | 9
3 | 0111223344
3 | 55556777888888899
4 | 0000111112223333344
4 | 55666777888999
5 | 11223
5 | 556669
6 | 012
6 |
7 | 6

Note: This stem-and-leaf plot has a class width of 5.

Bar Graphs

Bar graphs are used to summarize both qualitative and quantitative data. A bar graph is constructed by labeling each category or class of data on the horizontal axis and the frequency or relative frequency of the category on the vertical axis. A rectangle of equal width is drawn for each category, with height equal to the category's frequency or relative frequency. There should be gaps between the bars when summarizing qualitative data.
Bar graphs summarizing quantitative data are called histograms. There should be no gaps between the bars in a histogram. A frequency distribution lists the number of occurrences for each category of data.

Constructing a Frequency Distribution and Bar Graph for Qualitative Data: M&M Colors

Yellow  Orange  Brown   Green   Green   Blue    Brown   Red     Brown
Brown   Orange  Brown   Red     Brown   Red     Green   Brown   Red
Green   Yellow  Yellow  Red     Red     Brown   Orange  Yellow  Orange
Red     Orange  Blue    Brown   Red     Yellow  Brown   Red     Brown
Yellow  Yellow  Brown   Yellow  Yellow  Blue    Green   Yellow  Orange

Frequency Distribution (counts tallied from the 45 observations above)

Color    Frequency
Brown    12
Yellow   10
Red      9
Orange   6
Green    5
Blue     3

Constructing a Frequency Distribution and Histogram for Discrete Quantitative Data

The following data represent the number of available cars in a household, based on a random sample of 50 households. Construct a frequency distribution and a relative frequency distribution.

3 4 1 3 2 0 2 1 3 3
1 2 3 2 2 2 2 2 1 1
1 1 4 2 2 1 2 1 2 2
1 2 2 0 1 2 0 1 3 1
0 2 2 2 3 2 4 2 2 5

Frequency Distribution

Number of Available Cars   Tally                          Frequency
0                          IIII                           4
1                          IIIII IIIII III                13
2                          IIIII IIIII IIIII IIIII II     22
3                          IIIII II                       7
4                          III                            3
5                          I                              1

Constructing a Frequency Distribution and Histogram for Continuous Quantitative Data

Frequency distributions and histograms for continuous data are constructed by separating the data into intervals of numbers called classes. Our Excel project will demonstrate how to create a frequency distribution and histogram for continuous data.

Line Graphs

A graph that uses a broken line to illustrate how one quantity changes with respect to another is called a line graph. Line graphs are often used to represent changes over time. Time is plotted on the horizontal axis and the corresponding values of the variable on the vertical axis. Lines are then drawn connecting the points.

[Figure: Line Graph]

Circle Graphs

A circle graph, or pie chart, is a circle divided into sectors. Each sector represents a category of data. The area of each sector is proportional to the frequency of the category.
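As a rough check on the available-cars example, the frequency and relative frequency distributions can be tallied in a few lines of Python. This is only a sketch: the name `frequency_distribution` and the output layout are illustrative choices, not part of the slides. The relative frequency is taken to be each frequency divided by the number of observations (50 here).

```python
from collections import Counter

# Number of available cars in each of the 50 sampled households (data from the text).
cars = [
    3, 4, 1, 3, 2, 0, 2, 1, 3, 3,
    1, 2, 3, 2, 2, 2, 2, 2, 1, 1,
    1, 1, 4, 2, 2, 1, 2, 1, 2, 2,
    1, 2, 2, 0, 1, 2, 0, 1, 3, 1,
    0, 2, 2, 2, 3, 2, 4, 2, 2, 5,
]


def frequency_distribution(data):
    """Return {value: (frequency, relative frequency)}, ordered by value.

    Hypothetical helper for illustration; relative frequency = frequency / n.
    """
    counts = Counter(data)
    n = len(data)
    return {value: (counts[value], counts[value] / n) for value in sorted(counts)}


distribution = frequency_distribution(cars)

print("Cars  Frequency  Relative frequency")
for value, (freq, rel) in distribution.items():
    print(f"{value:>4}  {freq:>9}  {rel:>18.2f}")
```

The same helper works on qualitative data that can be sorted, such as the list of M&M colors, because `Counter` accepts any hashable values.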
Measures of Central Tendency

The mean of a data set is the sum of all the values of the variable in the data set divided by the number of observations. We use x̄ to denote the sample mean.

The median of a data set is the value that lies in the middle of the data when arranged in ascending order. That is, half the data is below the median and half the data is above the median. We use M to represent the median.

The mode of a data set is the most frequent observation of the variable in the data set. If no observation occurs more often than the others, we say the data has no mode.

Example: Find the mean, median, and mode of the data set {5, 3, 8, 5, 9}.

Mean: x̄ = (5 + 3 + 8 + 5 + 9)/5 = 30/5 = 6
Median:
1) Arrange the data in ascending order: 3 5 5 8 9
2) Locate the middle value, which is 5, so M = 5.
Mode: The most frequently occurring data value is 5 (it occurs twice), so the mode = 5.

Example: Find the mean, median, and mode of the data set {10, 5, 4, 7, 1, 9}.

Mean: x̄ = (10 + 5 + 4 + 7 + 1 + 9)/6 = 36/6 = 6
Median:
1) Arrange the data in ascending order: 1 4 5 7 9 10
2) Locate the middle value. Since the middle falls between two data values (5 and 7), the median is the mean of these two values. Thus, M = (5 + 7)/2 = 12/2 = 6.
Mode: There is no mode.

Measures of Dispersion

The range, R, of a variable is the difference between the largest data value and the smallest data value. That is,

Range = R = Largest Data Value – Smallest Data Value

The variance and standard deviation are measures that use all the numbers in the data set to give information about the dispersion. The steps for finding the sample variance and standard deviation are given on page 233.

Example: Find the range, variance, and standard deviation for the data set {5, 3, 8, 5, 9}.

Range = 9 – 3 = 6

To find the variance, we must first find the mean.
From a previous slide, we found the mean x̄ = 6.

Subtracting the mean from each data value, squaring the differences, summing the squares, and dividing by one less than the number of data values gives the sample variance:

Sample Variance = [(5 – 6)² + (3 – 6)² + (8 – 6)² + (5 – 6)² + (9 – 6)²] ÷ 4 = (1 + 9 + 4 + 1 + 9) ÷ 4 = 24 ÷ 4 = 6

Taking the square root of the sample variance gives the sample standard deviation:

s = √6 ≈ 2.449

Note: It is coincidental that the mean, range, and sample variance for this data set are all equal to 6.

Measures of Position

The median divides the data into two equal parts, with half the values above the median and half the values below it, so the median is called a measure of position.

Percentiles divide the data into 100 equal parts. For example, when a person takes the SAT, the score is reported as a percentile. A person who scores in the 92nd percentile scored better than 92% of those who took the SAT.

The most common percentiles are quartiles. Quartiles divide a data set into fourths, or four equal parts.
• The 1st quartile, denoted Q1, divides the bottom 25% of the data from the top 75%.
• The 2nd quartile, denoted Q2, is the median; it divides the bottom 50% of the data from the top 50%.
• The 3rd quartile, denoted Q3, divides the bottom 75% of the data from the top 25%.

The Five-Number Summary

Min   Q1   Median   Q3   Max

Steps for Drawing a Boxplot

Step 1: Draw vertical lines at Q1, M, and Q3. Enclose these vertical lines in a box.
Step 2: Draw a line from Q1 to the smallest data value (minimum). Draw a line from Q3 to the largest data value (maximum).
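The worked examples for {5, 3, 8, 5, 9} and {10, 5, 4, 7, 1, 9} can be checked with Python's standard `statistics` module, as in the sketch below. Two points are assumptions rather than anything stated in the slides. First, the helper names `mode_or_none` and `summarize` are made up for illustration, and "no mode" is interpreted as "every distinct value occurs equally often". Second, the slides do not say how quartiles are computed and several conventions exist, so the sketch uses the module's "exclusive" method. Other textbooks or software may give slightly different values for Q1 and Q3.

```python
import statistics


def mode_or_none(data):
    """Return the list of modes, or [] when all distinct values tie.

    Assumption: this is how the text's "no mode" case is interpreted here.
    """
    modes = statistics.multimode(data)
    distinct = set(data)
    if len(distinct) > 1 and len(modes) == len(distinct):
        return []
    return modes


def summarize(data):
    """Collect the measures of central tendency, dispersion, and position."""
    # Quartile convention is an assumption; the slides do not specify one.
    q1, _, q3 = statistics.quantiles(data, n=4, method="exclusive")
    median = statistics.median(data)
    return {
        "mean": statistics.mean(data),
        "median": median,
        "modes": mode_or_none(data),
        "range": max(data) - min(data),
        "variance": statistics.variance(data),  # sample variance (divides by n - 1)
        "std_dev": statistics.stdev(data),      # square root of the sample variance
        "five_number": (min(data), q1, median, q3, max(data)),
    }


first = summarize([5, 3, 8, 5, 9])
second = summarize([10, 5, 4, 7, 1, 9])

for label, summary in (("{5, 3, 8, 5, 9}", first), ("{10, 5, 4, 7, 1, 9}", second)):
    print(label)
    for name, value in summary.items():
        print(f"  {name}: {value}")
```

For the first data set this reproduces the mean, range, and sample variance of 6 and a standard deviation of about 2.449. For the second it reports an empty list of modes, matching "There is no mode."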