Chapter 3 Data Description
• You can describe a human being by
physical and intellectual measures.
• You can describe a sample data set by
using two types of measures:
1. Measures of central tendency
2. Measures of dispersion or variation
Measures of Central Tendency
• There are three measures of central
tendency:
1. Arithmetic mean
2. Mode
3. Median
Arithmetic mean
• The formula for unorganized data (formula 3-1, page 40):
\[ \bar{X} = \frac{\sum X_i}{n} \]
• The formula for organized data (formula 3-2, page 41):
\[ \bar{X} = \frac{\sum X_j \cdot \mathrm{Fr}(X_j)}{n} \]
Exercises from book
• Example problem 3-1 (page 40)
• The mean of 11 observations:
\[ \bar{X} = \frac{1 + 1 + 2 + 2 + 2 + 3 + 4 + 5 + 5 + 9 + 10}{11} = \frac{44}{11} = 4 \]
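The two mean formulas (3-1 and 3-2) can be sketched in Python as follows. The function names and the small data set are illustrative assumptions, not taken from the textbook; the point is only that both formulas give the same answer for the same data.

```python
def mean_unorganized(values):
    """Formula 3-1: sum of the observations divided by their count."""
    return sum(values) / len(values)


def mean_organized(freq_table):
    """Formula 3-2: each distinct value times its frequency, divided by n."""
    n = sum(freq_table.values())
    return sum(x * f for x, f in freq_table.items()) / n


# Hypothetical data, not from the textbook.
raw = [2, 3, 3, 5, 7]
table = {2: 1, 3: 2, 5: 1, 7: 1}  # the same data as value -> frequency

print(mean_unorganized(raw))  # 4.0
print(mean_organized(table))  # 4.0
```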
Median
• The median is the middle value after a
sample data set is arranged in order
(descending or ascending).
Median for unorganized data
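• If there are n observations in a data set
arranged in order, the ((n+1)/2)th
observation is the median. When n is even,
(n+1)/2 falls between two positions, and the
median is the average of the two middle
observations.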
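A minimal sketch of the (n+1)/2 rule, assuming the usual convention that an even-sized data set takes the average of its two middle values. The function name and the sample values are hypothetical.

```python
def median_unorganized(values):
    """Return the middle value of the data after sorting it."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2  # zero-based index of the ((n+1)/2)th value when n is odd
    if n % 2 == 1:
        return ordered[mid]
    # Even n: (n+1)/2 lies between two positions, so average them.
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median_unorganized([9, 1, 5, 3, 7]))  # 5 (the 3rd of 5 ordered values)
print(median_unorganized([1, 3, 5, 7]))     # 4.0
```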
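Median for organized data
• The formula for calculating the median of an
organized data set is formula 3-5 (page 46):
\[ MD = L_m + \frac{\frac{n}{2} - \mathrm{CF}(x_{m-1})}{\mathrm{FR}(x_m)} \cdot w \]
where L_m is the lower limit of the median class,
CF(x_{m-1}) is the cumulative frequency of the class
before it, FR(x_m) is the frequency of the median
class, and w is the class width.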
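The grouped-data median formula can be sketched as below. This assumes equal-width classes and takes the median class to be the first class whose cumulative frequency reaches n/2; the function name, its arguments, and the frequency table are hypothetical.

```python
def grouped_median(lower_limits, width, freqs):
    """Median of grouped data: L + ((n/2 - CF_before) / F_median) * w."""
    n = sum(freqs)
    if n <= 0:
        raise ValueError("frequency table is empty")
    cum_before = 0  # cumulative frequency of the classes before this one
    for lower, f in zip(lower_limits, freqs):
        if cum_before + f >= n / 2:  # this is the median class
            return lower + (n / 2 - cum_before) / f * width
        cum_before += f
    raise ValueError("lower_limits and freqs have different lengths")


# Hypothetical classes 0-10, 10-20, 20-30, 30-40 with these frequencies.
md = grouped_median([0, 10, 20, 30], 10, [2, 3, 5, 2])
print(md)  # 22.0, i.e. 20 + ((6 - 5) / 5) * 10
```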
Quartiles
• There are three quartiles (Q1, Q2, and Q3) that divide a data
set into four equal parts.
• Q1: one-fourth of the data are below Q1
• Q2: half of the data are below Q2 (same as the median)
• Q3: three-fourths of the data are below Q3
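As a quick illustration, Python's standard `statistics.quantiles` returns the three quartile cut points. Its default interpolation method is one of several conventions and may not match the textbook's, and the data set here is hypothetical.

```python
import statistics

data = [1, 2, 3, 4, 5, 6, 7]  # hypothetical data

# n=4 asks for the cut points that split the data into four parts.
q1, q2, q3 = statistics.quantiles(data, n=4)

print(q1, q2, q3)               # 2.0 4.0 6.0
print(statistics.median(data))  # 4, the same value as Q2
```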
Percentiles
• There are 99 percentiles that divide a set of
data into 100 equal parts.
Each dividing point is called a percentile.
1st percentile: 1% of the data are below it
10th percentile: 10% of the data are below it
75th percentile: 75% of the data are below it
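The same standard-library function gives the 99 percentile cut points when asked for 100 parts. This is a sketch on hypothetical data (the integers 1 to 99), using the library's default interpolation, which is only one of several conventions.

```python
import statistics

data = list(range(1, 100))  # hypothetical data: 1, 2, ..., 99

# n=100 asks for the 99 cut points that split the data into 100 parts.
cuts = statistics.quantiles(data, n=100)

print(len(cuts))                   # 99
print(cuts[0], cuts[9], cuts[74])  # 1.0 10.0 75.0 (1st, 10th, 75th percentiles)
```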
The mode
• The value that appears most frequently is the
mode of a data set.
• 20 students are classified according to the color
of their eyes:
Eye color        Blue   Brown   Dark   Green
# of students    6      8       4      2
• Which value appears most frequently? Brown, with 8
students. The mode is the value itself (brown), not
its frequency (8).
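The eye-color example can be checked with a short sketch using `collections.Counter`. The variable names are illustrative; the counts are the ones given in the table.

```python
from collections import Counter

# Frequencies from the eye-color table (20 students in total).
eye_colors = Counter({"blue": 6, "brown": 8, "dark": 4, "green": 2})

# most_common(1) returns the (value, frequency) pair with the highest count.
mode_value, mode_count = eye_colors.most_common(1)[0]

print(mode_value)  # brown (the mode is the category)
print(mode_count)  # 8 (its frequency, which is not the mode)
```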
Measures of dispersion
• A human being can’t be described fully by height
alone. Weight is another measure that we use to
describe someone. Similarly, a data set can’t be
described fully by measures of central tendency.
We need additional measures, called measures of
dispersion, variation, or variability.
• Review the table on page 53.
• There are three measures of dispersion:
1. range
2. average deviation
3. variance
Range/Average Deviation
• Range
The difference between the highest and
lowest values of a data set is the range.
• Average deviation
The arithmetic mean of the absolute values
of the deviations from the mean (formula 3-6, page 54):
\[ AD = \frac{\sum |\bar{x} - x_i|}{n} \]
• For the data in table 3-3 (page 55), the
average deviation is 1.6.
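Both measures are simple to compute directly. The sketch below uses hypothetical data, not the values of table 3-3, which are not reproduced in these notes; the function names are also illustrative.

```python
def value_range(values):
    """Highest value minus lowest value."""
    return max(values) - min(values)


def average_deviation(values):
    """Formula 3-6: mean of the absolute deviations from the mean."""
    n = len(values)
    mean = sum(values) / n
    return sum(abs(mean - x) for x in values) / n


data = [2, 4, 6, 8, 10]  # hypothetical data with mean 6

print(value_range(data))        # 8
print(average_deviation(data))  # 2.4, i.e. (4 + 2 + 0 + 2 + 4) / 5
```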
Variance and Standard deviation
• Variance is similar to average deviation.
If the individual deviations from the mean
are squared, summed, and divided by n - 1,
the result is the variance of a sample data
set (formula 3-11, page 59):
\[ s^2 = \frac{\sum (\bar{x} - x_i)^2}{n - 1} \]
• Alternative formula:
\[ s^2 = \frac{\sum x_i^2 - \frac{(\sum x_i)^2}{n}}{n - 1} \]
• The square root of s^2 gives the standard
deviation, s.
• Exercises from the book
Problem 2 (page 61)
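A short sketch comparing the definitional and alternative variance formulas on hypothetical data. The function names are illustrative; `statistics.variance` from the standard library is included as a cross-check because it also divides by n - 1.

```python
import math
import statistics


def variance_definition(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    mean = sum(values) / n
    return sum((mean - x) ** 2 for x in values) / (n - 1)


def variance_shortcut(values):
    """Alternative formula: (sum of x^2 - (sum of x)^2 / n) / (n - 1)."""
    n = len(values)
    return (sum(x * x for x in values) - sum(values) ** 2 / n) / (n - 1)


data = [2, 4, 6, 8, 10]  # hypothetical data

s2 = variance_definition(data)
print(s2)                         # 10.0
print(variance_shortcut(data))    # 10.0
print(statistics.variance(data))  # 10
print(round(math.sqrt(s2), 4))    # 3.1623, the standard deviation s
```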
Uses of the standard deviation
• What is the standard deviation? It is a
measure of how much the values in a data
set deviate, on average, from the
mean.
• The farther we move from the mean in
each direction, the more observations
fall between the two points. Look at
the following:
          ------------ Mean ------------
     ----------------- Mean -----------------
---------------------- Mean ----------------------
• There are two theorems that tell us
what proportion of observations lies
within a specified number of standard
deviations from the mean:
Tchebycheff’s Theorem
• The proportion of any set of values that will lie
within k standard deviations from the mean is at
least
\[ 1 - \frac{1}{k^2} \]
where k is greater than 1.
• Exercises from the book:
Example problem 3-15 (page 65)
Example problem 5 (p.67)
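The bound can be evaluated for a few values of k and compared against an actual data set. This is a sketch on hypothetical data that includes one extreme value; the function names are illustrative and the textbook's example data are not used.

```python
import statistics


def tchebycheff_bound(k):
    """Minimum proportion of values within k standard deviations (k > 1)."""
    if k <= 1:
        raise ValueError("k must be greater than 1")
    return 1 - 1 / k**2


def proportion_within(values, k):
    """Observed share of values within k standard deviations of the mean."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return sum(1 for x in values if abs(x - mean) <= k * sd) / len(values)


data = [1, 2, 2, 3, 3, 3, 4, 4, 5, 20]  # hypothetical, skewed by one outlier

print(tchebycheff_bound(2))            # 0.75
print(round(tchebycheff_bound(3), 4))  # 0.8889
print(proportion_within(data, 2))      # 0.9, which is at least 0.75
```

The theorem gives only a lower bound, so the observed proportion is usually higher than the bound.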
The Normal Rule
• If a data set follows a symmetrical, bell-shaped
distribution, then 68 percent of the
individual observations fall within one
standard deviation from the mean; 95
percent of the observations fall within two
standard deviations from the mean; and
almost 100 percent of the observations fall
within three standard deviations from the
mean.
• Example Problem 3-17 (page 66)
A) How many members earn between $1250
and $1550?
B) How many members earn between $1100
and $1700?
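The 68/95/almost-100 percentages can be reproduced with `statistics.NormalDist` from the Python standard library. In the second half of the sketch, the mean of $1400 and standard deviation of $150 are assumptions chosen so that the two intervals in the example sit one and two standard deviations from the mean; the textbook's actual figures and the number of members are not given in these notes.

```python
from statistics import NormalDist


def share_within(k):
    """Share of a normal distribution within k standard deviations."""
    z = NormalDist()  # standard normal: mean 0, standard deviation 1
    return z.cdf(k) - z.cdf(-k)


for k in (1, 2, 3):
    print(k, round(share_within(k), 4))  # 0.6827, 0.9545, 0.9973

# Assumed parameters (hypothetical, not from the textbook).
incomes = NormalDist(mu=1400, sigma=150)
share_a = incomes.cdf(1550) - incomes.cdf(1250)  # within one sd
share_b = incomes.cdf(1700) - incomes.cdf(1100)  # within two sd
print(round(share_a, 4), round(share_b, 4))
```

Multiplying each share by the number of members would give the approximate counts asked for in parts A and B.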