Chapter 3 Data Description
• You can describe a human being by physical and intellectual measures.
• You can describe a sample data set by using two types of measures:
  1. Measures of central tendency
  2. Measures of dispersion or variation

Measures of Central Tendency
• There are three measures of central tendency:
  1. Arithmetic mean
  2. Mode
  3. Median

Arithmetic mean
• The formula for unorganized data (formula 3-1, page 40):
  \( \bar{X} = \frac{\sum X_i}{n} \)
• The formula for organized data (formula 3-2, p. 41):
  \( \bar{X} = \frac{\sum X_j \cdot Fr(X_j)}{n} \)

Exercises from book
• Example problem 3-1 (page 40)
• The mean of 11 observations:
  \( \bar{X} = \frac{1 + 1 + 2 + 2 + 2 + 3 + 4 + 5 + 5 + 9 + 10}{11} = \frac{44}{11} = 4 \)

Median
• The median is the middle value after a sample data set is arranged in order (descending or ascending).

Median for unorganized data
• If there are n observations in a data set, then the ((n+1)/2)th observation of the ordered data is the median. When n is even, this position falls between two observations, and the median is the average of those two values.

Median for organized data
• The formula for calculating the median of an organized data set is formula 3-5 on page 46:
  \( MD = L_m + \frac{(n/2) - CF(x_{m-1})}{Fr(x_m)} \, w \)
  where \( L_m \) is the lower limit of the median class, \( CF(x_{m-1}) \) is the cumulative frequency up to the class before it, \( Fr(x_m) \) is the frequency of the median class, and \( w \) is the class width.

Quartiles
• There are three quartiles (Q1, Q2, and Q3) that divide a data set into four equal parts.
• Q1: one-fourth of the data are below Q1
  Q2: half of the data are below Q2; Q2 is the same as the median (it equals the mean only when the distribution is symmetrical)
  Q3: three-fourths of the data are below Q3
  [Figure: number line marked Q1, Q2, Q3]

Percentiles
• There are 99 percentiles; they are the points that divide a set of data into 100 equal parts.
  1st percentile: 1% of the data are below it
  10th percentile: 10% of the data are below it
  75th percentile: 75% of the data are below it

The mode
• The value that appears most frequently is the mode of a data set.
• 20 students are classified according to the colors of their eyes:

  Eye color    # of students
  blue         6
  brown        8
  dark         4
  green        2

• Which value appears most frequently?
The mode
• Brown (not 8). The mode is the value itself, not its frequency, so brown is the mode.

Measures of dispersion
• A human being can’t be described fully by height only. Weight is another measure that we use to describe someone. Similarly, a data set can’t be described fully by measures of central tendency. We need additional measures, called measures of dispersion (also known as variation or variability).
• Review the table on page 53.
• There are three measures of dispersion:
  1. Range
  2. Average deviation
  3. Variance

Range/Average Deviation
• Range: the difference between the highest and lowest values of a data set.
• Average deviation: the arithmetic mean of the absolute values of the deviations from the mean (formula 3-6, page 54):
  \( AD = \frac{\sum |\bar{X} - X_i|}{n} \)
• Consider table 3-3 on page 55: the average deviation is calculated as 1.6.

Variance and Standard deviation
• Variance is similar to average deviation, except that the individual deviations from the mean are squared instead of being taken in absolute value. The squared deviations are summed and divided by n - 1 (formula 3-11 on page 59):
  \( S^2 = \frac{\sum (\bar{X} - X_i)^2}{n - 1} \)
• Alternative formula:
  \( S^2 = \frac{\sum X_i^2 - (\sum X_i)^2 / n}{n - 1} \)
• The square root of \( S^2 \) gives the standard deviation, \( S \).
• Exercises from the book: Problem 2 (page 61)

Uses of the standard deviation
• What is the standard deviation? It is a measure of how much the observations in a data set deviate, on average, from the mean.
• The farther we move from the mean in both directions, the more observations fall between the two endpoints.
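The measures of central tendency and dispersion defined so far can be sketched in Python as follows. This is a minimal illustration rather than the book's own procedure: the function names are hypothetical, and the sample `data` is an illustrative set of eleven values (chosen to sum to 44, as in the mean exercise). The eye-color counts are the ones from the mode example.

```python
import math

def mean(xs):
    """Arithmetic mean: sum of the observations divided by n."""
    return sum(xs) / len(xs)

def median(xs):
    """Middle value of the ordered data; average of the two middle values if n is even."""
    s = sorted(xs)
    n = len(s)
    mid = n // 2
    return s[mid] if n % 2 == 1 else (s[mid - 1] + s[mid]) / 2

def mode(counts):
    """Category with the highest frequency (the category, not the count)."""
    return max(counts, key=counts.get)

def value_range(xs):
    """Difference between the highest and lowest values."""
    return max(xs) - min(xs)

def average_deviation(xs):
    """Mean of the absolute deviations from the mean."""
    m = mean(xs)
    return sum(abs(x - m) for x in xs) / len(xs)

def sample_variance(xs):
    """Sum of squared deviations from the mean, divided by n - 1."""
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

def sample_variance_alt(xs):
    """Alternative form: (sum of X^2 - (sum of X)^2 / n) / (n - 1)."""
    n = len(xs)
    return (sum(x * x for x in xs) - sum(xs) ** 2 / n) / (n - 1)

# Illustrative sample (an assumption for this sketch): 11 values summing to 44.
data = [1, 1, 2, 2, 2, 3, 4, 5, 5, 9, 10]
# Eye-color frequencies from the mode example.
eye_colors = {"blue": 6, "brown": 8, "dark": 4, "green": 2}

print("mean:", mean(data))
print("median:", median(data))
print("mode:", mode(eye_colors))
print("range:", value_range(data))
print("average deviation:", round(average_deviation(data), 3))
print("variance:", round(sample_variance(data), 3))
print("standard deviation:", round(math.sqrt(sample_variance(data)), 3))
```

Both variance functions give the same result on the same data; the alternative form only avoids computing each deviation separately.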
Look at the following:

Uses of the standard deviation
• [Figure: three progressively wider intervals, each centered on the mean]
• There are two theorems that tell us what proportion of observations lies within a specified number of standard deviations from the mean:

Tchebycheff’s Theorem
• The proportion of any set of values that will lie within k standard deviations from the mean is at least \( 1 - \frac{1}{k^2} \), where k is greater than 1.
• Exercises from the book:
  Example problem 3-15 (page 65)
  Example problem 5 (p. 67)

The Normal Rule
• If a data set follows a symmetrical, bell-shaped distribution, then about 68 percent of the individual observations fall within one standard deviation from the mean; about 95 percent of the observations fall within two standard deviations from the mean; and almost 100 percent of the observations fall within three standard deviations from the mean.
• Example Problem 3-17 (page 66)
  A) How many members earn between $1250 and $1550?
  B) How many members earn between $1100 and $1700?
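The two rules above can be compared numerically with a short Python sketch. The function names are hypothetical, and the normal-distribution proportions are computed from the standard error function as an assumption about how one might check the 68/95/almost-100 figures; the book states them only as rounded percentages.

```python
import math

def chebyshev_lower_bound(k):
    """Minimum proportion of any data set within k standard deviations of the mean (k > 1)."""
    if k <= 1:
        raise ValueError("k must be greater than 1")
    return 1 - 1 / k ** 2

def normal_proportion(k):
    """Proportion of a normal (bell-shaped) distribution within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

# Tchebycheff's bound holds for any distribution, so it is more conservative
# than the proportions that apply to a bell-shaped distribution.
for k in (2, 3):
    print(f"k = {k}: at least {chebyshev_lower_bound(k):.1%} (any data), "
          f"about {normal_proportion(k):.1%} (bell-shaped)")

# The theorem says nothing for k = 1, but the normal rule does.
print(f"k = 1: about {normal_proportion(1):.1%} (bell-shaped)")
```

To answer a "how many observations" question, multiply the relevant proportion by the number of observations in the data set.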