Describing Data Using
Numerical Measures
Mean
The mean is a numerical measure of
the center of a set of quantitative
measures computed by dividing the
sum of the values by the number of
values in the data set.
Population Mean
\[ \mu = \frac{\sum_{i=1}^{N} x_i}{N} \]
where:
μ = population mean (mu)
N = number of data values
x_i = ith individual value of variable x
Population Mean
Example 3-1
Table 3-1: Foster City Hotel Data
Week   Rooms Rented   Revenue      Complaints
1      22             $1,870.00    0
2      13             $1,590.00    2
3      10             $1,760.00    1
4      16             $2,345.00    0
5      23             $4,563.00    2
6      13             $1,630.00    1
7      11             $2,156.00    0
8      13             $1,756.00    0
Population Mean
Example 3-1
The population mean for the number of
rooms rented is computed as follows:
\[ \mu = \frac{\sum x}{N} = \frac{22 + 13 + 10 + 16 + 23 + 13 + 11 + 13}{8} = \frac{121}{8} = 15.125 \]
Sample Mean
\[ \bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} \]
where:
x̄ = sample mean (pronounced "x-bar")
n = sample size
x_i = ith individual value of variable x
Sample Mean
Housing Prices Example
{xi} = {house prices} = {$144,000; 98,000; 204,000;
177,000; 155,000; 316,000; 100,000}
\[ \bar{x} = \frac{\sum x}{n} = \frac{144{,}000 + 98{,}000 + 204{,}000 + \dots + 100{,}000}{7} = \frac{\$1{,}194{,}000}{7} \approx \$170{,}571 \]
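As a quick check of the housing prices example, the sketch below uses the mean function from Python's standard statistics module. The variable names are illustrative assumptions.

```python
import statistics

# The seven house prices from the example, in dollars.
house_prices = [144_000, 98_000, 204_000, 177_000, 155_000, 316_000, 100_000]

n = len(house_prices)
total = sum(house_prices)
x_bar = statistics.mean(house_prices)  # total divided by n

print(f"n = {n}, sum = ${total:,}, x-bar = ${x_bar:,.0f}")
```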
Median
The median is the center value
that divides data that have been
arranged in numerical order (i.e.
an ordered array) into two halves.
Median
Housing Prices Example
{xi} = {house prices} = {$144,000; 98,000; 204,000;
177,000; 155,000; 316,000; 100,000}
Ordered array:
$98,000; 100,000; 144,000; 155,000; 177,000; 204,000; 316,000
Middle Value
Median = 155,000
Median
Another Housing Prices Example
{xi} = {house prices} = {$144,000; 98,000; 204,000;
177,000; 155,000; 316,000; 100,000;
177,000; 177,000; 170,000}
Ordered array:
$98,000; 100,000; 144,000; 155,000; 170,000; 177,000;
177,000; 177,000; 204,000; 316,000
Middle Values
Median = (170,000 + 177,000)/2
= 173,500
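Both median examples follow the same procedure: sort the data, then take the middle value (odd count) or the average of the two middle values (even count). A minimal sketch, with a hand-written median function whose name and structure are assumptions made for illustration:

```python
def median(values):
    """Middle value of the ordered array; average of the two middle values if the count is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


# First example: seven prices (odd count).
seven_prices = [144_000, 98_000, 204_000, 177_000, 155_000, 316_000, 100_000]
# Second example: the same prices plus three more (even count).
ten_prices = seven_prices + [177_000, 177_000, 170_000]

print(median(seven_prices))
print(median(ten_prices))
```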
Skewed Data
Right-skewed data: Data are
right skewed if the mean for the
data is larger than the median.
Left-skewed data: Data are left
skewed if the mean for the data is
smaller than the median.
Skewed Data
(Figure 3-3: a) Right-skewed: the median lies to the left of the mean; b) Left-skewed: the mean lies to the left of the median; c) Symmetric: mean = median)
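The mean-versus-median comparison can be applied directly to data. The sketch below is a rule-of-thumb classifier based only on the definitions above; the function name skew_direction is hypothetical, and the second data set is invented purely to show the symmetric case.

```python
import statistics


def skew_direction(values):
    """Classify data by comparing its mean with its median."""
    mean = statistics.mean(values)
    median = statistics.median(values)
    if mean > median:
        return "right-skewed"
    if mean < median:
        return "left-skewed"
    return "symmetric"


# House prices from the earlier example: mean is about 170,571, median is 155,000.
house_prices = [144_000, 98_000, 204_000, 177_000, 155_000, 316_000, 100_000]
print(skew_direction(house_prices))

# Invented data where the mean equals the median.
print(skew_direction([1, 2, 3, 4, 5]))
```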
Percentiles
The pth percentile in a data array is
a value that divides the data into
two parts. The lower segment
contains at least p% and the upper
segment contains at least (100 - p)%
of the data.
The median is the 50th percentile.
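The percentile definition can be turned into a direct check: count how much of the data lies at or below a candidate value and how much lies at or above it. The helper satisfies_percentile below is a hypothetical name used for illustration. It tests the definition rather than computing a percentile, and more than one value can pass the test for a given p.

```python
def satisfies_percentile(data, value, p):
    """True if at least p% of data is <= value and at least (100 - p)% is >= value."""
    n = len(data)
    at_or_below = sum(1 for x in data if x <= value)
    at_or_above = sum(1 for x in data if x >= value)
    # Integer arithmetic avoids floating-point rounding in the comparisons.
    return at_or_below * 100 >= p * n and at_or_above * 100 >= (100 - p) * n


# The ten house prices from the second median example.
ten_prices = [144_000, 98_000, 204_000, 177_000, 155_000,
              316_000, 100_000, 177_000, 177_000, 170_000]

print(satisfies_percentile(ten_prices, 173_500, 50))  # the median
print(satisfies_percentile(ten_prices, 98_000, 50))   # the minimum
```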
Quartiles
Quartiles in a data array are those
values that divide the data set into
four equal-sized groups.
The median corresponds to the second quartile.
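Quartiles can be computed with statistics.quantiles from Python's standard library. Textbooks and software use several slightly different quartile conventions, so the first and third quartiles below reflect the library's "inclusive" interpolation method, which is an assumption and not necessarily the convention used in these slides. The second quartile matches the median under either convention.

```python
import statistics

# The ten house prices from the second median example.
ten_prices = [144_000, 98_000, 204_000, 177_000, 155_000,
              316_000, 100_000, 177_000, 177_000, 170_000]

# n=4 asks for the three cut points that split the data into four groups.
q1, q2, q3 = statistics.quantiles(ten_prices, n=4, method="inclusive")

print(f"Q1 = {q1:,.0f}, Q2 = {q2:,.0f}, Q3 = {q3:,.0f}")
print("Q2 equals the median:", q2 == statistics.median(ten_prices))
```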
Measures of Variation
A set of data exhibits variation if
the data values are not all the
same.
Range
The range is a measure of variation
that is computed by finding the
difference between the maximum
and minimum values in the data set.
R = Maximum Value - Minimum Value
Interquartile Range
The interquartile range is a measure
of variation that is determined by
computing the difference between
the first and third quartiles.
Interquartile Range = Third Quartile - First Quartile
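Both measures can be sketched in a few lines. The range follows directly from the formula above. The interquartile range depends on how quartiles are computed, and the "inclusive" method of statistics.quantiles used here is an assumption that may differ from the textbook's convention.

```python
import statistics

# The ten house prices from the second median example.
ten_prices = [144_000, 98_000, 204_000, 177_000, 155_000,
              316_000, 100_000, 177_000, 177_000, 170_000]

# R = Maximum Value - Minimum Value
data_range = max(ten_prices) - min(ten_prices)

# Interquartile Range = Third Quartile - First Quartile
q1, _, q3 = statistics.quantiles(ten_prices, n=4, method="inclusive")
iqr = q3 - q1

print(f"range = {data_range:,}, IQR = {iqr:,.0f}")
```

The range is driven entirely by the two extreme prices, while the interquartile range ignores them and describes the spread of the middle half of the data.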
Variance & Standard Deviation
The population variance is the
average of the squared distances
of the data values from the mean.
The standard deviation is the
positive square root of the
variance.
Population Variance
\[ \sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N} \]
where:
μ = population mean
N = population size
σ² = population variance (sigma squared)
Population Variance
(Bryce Lumber Example)
x      (x - μ)           (x - μ)²
15     15 - 25 = -10     100
25     25 - 25 = 0       0
35     35 - 25 = 10      100
20     20 - 25 = -5      25
30     30 - 25 = 5       25
       Σ = 0             Σ = 250

\[ \sigma^2 = \frac{\sum (x - \mu)^2}{N} = \frac{250}{5} = 50 \]
Population Standard Deviation
(Bryce Lumber Example)
\[ \sigma = \sqrt{\sigma^2} = \sqrt{\frac{\sum (x - \mu)^2}{N}} = \sqrt{50} \approx 7.07 \text{ products} \]
Sample Variance
\[ s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} \]
where:
x̄ = sample mean
n = sample size
s² = sample variance
Sample Standard Deviation
\[ s = \sqrt{s^2} = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}} \]
where:
x̄ = sample mean
n = sample size
s = sample standard deviation
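The only difference from the population formulas is the divisor n - 1. The sketch below applies the sample formulas to the seven house prices from the sample mean example and cross-checks the result against the standard library's statistics.variance and statistics.stdev. The function name sample_variance is an illustrative assumption.

```python
import math
import statistics


def sample_variance(values):
    """Sum of squared deviations from the sample mean, divided by n - 1."""
    n = len(values)
    x_bar = sum(values) / n
    return sum((x - x_bar) ** 2 for x in values) / (n - 1)


# The seven house prices from the sample mean example, in dollars.
house_prices = [144_000, 98_000, 204_000, 177_000, 155_000, 316_000, 100_000]

s2 = sample_variance(house_prices)
s = math.sqrt(s2)

print(f"s^2 = {s2:,.0f}, s = {s:,.0f}")
print("matches statistics.variance:", math.isclose(s2, statistics.variance(house_prices)))
```

Dividing by n - 1 instead of n always gives a slightly larger value than the population formula would for the same numbers.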
The Empirical Rule
If the data distribution is bell-shaped,
then the interval:
μ ± 1σ contains approximately 68% of
the values in the population or the sample
μ ± 2σ contains approximately 95% of
the values in the population or the sample
μ ± 3σ contains virtually all of the data
values in the population or the sample
The Empirical Rule
(Figure 3-11: bell-shaped curve over X, with about 68% of the values within x̄ ± 1σ and about 95% within x̄ ± 2σ)
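The 68% and 95% figures correspond to areas under a normal curve. The sketch below assumes that "bell-shaped" means exactly normal, which is an idealization, and uses NormalDist from Python's statistics module to compute the area within k standard deviations of the mean. The helper name area_within is illustrative.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
standard_normal = NormalDist(mu=0, sigma=1)


def area_within(k):
    """Proportion of a normal distribution within k standard deviations of its mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {area_within(k):.4f}")
```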
Tchebysheff’s Theorem
Regardless of how the data are distributed, at
least (1 - 1/k²) of the values will fall within k
standard deviations of the mean. For example:
At least (1 - 1/1²) = 0% of the values will fall
within k = 1 standard deviation of the mean
At least (1 - 1/2²) = 3/4 = 75% of the values will
fall within k = 2 standard deviations of the mean
At least (1 - 1/3²) = 8/9 ≈ 89% of the values will
fall within k = 3 standard deviations of the mean
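The bound can be compared with what actually happens in a data set. The sketch below computes the guaranteed minimum for k = 1, 2, 3 and the observed fraction for the rooms-rented data from Table 3-1, treated as a population. The function names are hypothetical.

```python
import statistics


def tchebysheff_bound(k):
    """Minimum fraction of values within k standard deviations of the mean."""
    return 1 - 1 / k ** 2


def fraction_within(values, k):
    """Observed fraction of values within k population standard deviations of the mean."""
    mu = statistics.mean(values)
    sigma = statistics.pstdev(values)
    inside = [x for x in values if abs(x - mu) <= k * sigma]
    return len(inside) / len(values)


# Rooms rented per week, from Table 3-1.
rooms_rented = [22, 13, 10, 16, 23, 13, 11, 13]

for k in (1, 2, 3):
    bound = tchebysheff_bound(k)
    observed = fraction_within(rooms_rented, k)
    print(f"k = {k}: at least {bound:.0%}, observed {observed:.1%}")
```

The observed fraction is never below the bound, but it is often well above it, because the theorem makes no assumption about the shape of the distribution.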
6 Sigma Quality
The specification for a quality
characteristic is six standard deviations
away from the mean of the process
distribution.
This translates into process output that
fails to meet specifications about two
times out of one billion.
Sigma Quality Levels
Sigma (σ) Quality Level   Defects per Million Opportunities for Defects
1                         317,400
2                         45,400
3                         2,700
4                         63
5                         0.57
6                         0.002
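The table values can be roughly reproduced under an assumption the slides do not state: the process output is normally distributed, centered on the mean, and a defect is any value more than the given number of standard deviations from the mean on either side. The sketch below uses NormalDist from Python's statistics module; the function name defects_per_million is illustrative. Its results are close to the table but do not match every entry exactly.

```python
from statistics import NormalDist


def defects_per_million(sigma_level):
    """Two-tailed normal area beyond +/- sigma_level, scaled to one million opportunities."""
    # Using the lower tail directly keeps precision for very small probabilities.
    one_tail = NormalDist().cdf(-sigma_level)
    return 2 * one_tail * 1_000_000


for level in range(1, 7):
    print(f"{level} sigma: {defects_per_million(level):,.3f} defects per million")
```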
Sigma Quality Level Concepts
Sigma (σ) Quality Level   Equated to Relative Area
1                         Floor space of a typical factory
2                         Floor space of a typical supermarket
3                         Floor space of a small hardware store
4                         Floor space of a typical living room
5                         Area under a typical desk telephone
6                         Top surface of a typical diamond
7                         Point of a sewing needle
Standardized Data Values
A standardized data value refers to the
number of standard deviations a value
is from the mean. The standardized
data values are sometimes referred to
as z-scores.
Standardized Data Values
STANDARDIZED SAMPLE DATA
xx
z
s
where:
x = original data value
x
= sample mean
s = sample standard deviation
z = standard score
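As an illustration, the sketch below standardizes the seven house prices from the sample mean example. The function name z_scores is hypothetical; statistics.stdev is the standard library's sample standard deviation (divisor n - 1).

```python
import statistics


def z_scores(sample):
    """Number of sample standard deviations each value lies from the sample mean."""
    x_bar = statistics.mean(sample)
    s = statistics.stdev(sample)
    return [(x - x_bar) / s for x in sample]


# The seven house prices from the sample mean example, in dollars.
house_prices = [144_000, 98_000, 204_000, 177_000, 155_000, 316_000, 100_000]
zs = z_scores(house_prices)

for price, z in zip(house_prices, zs):
    print(f"${price:,}: z = {z:+.2f}")
```

Values above the mean get positive z-scores and values below it get negative ones. The standardized values themselves always have mean 0 and sample standard deviation 1.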