Chapter 2: Describing Data with Numerical Measures
Notation: A data set consisting of n measurements will be denoted by $x_1, x_2, \ldots, x_n$. The sum of
all these measurements will be written as $x_1 + x_2 + \cdots + x_n = \sum_{i=1}^{n} x_i$, where $\Sigma$, the summation sign,
is called sigma.
•
Section 2.2 Measures of Center
We will consider the mean, the median, and the mode.
•
The sample mean of n measurements is the sum of the measurements divided by n.
$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$
The population mean is denoted by µ.
•
The sample median of n measurements is the middle value when the measurements are
ordered. If n is odd, the median is the middle value. Its position is $(n+1)/2$. If n is even, the
median is the average of the middle two values. Their positions are $n/2$ and $n/2 + 1$.
•
The sample mode is the measurement that occurs most often. For
grouped data the modal class is the class with the highest frequency.
Example: Find the mean, median, and mode of 1, 2, 2, 2, 3, 3, 3, 3, 3, 5, 5, 4, 4
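As a quick check of this example, the three measures of center can be computed with Python's standard `statistics` module. This is only a sketch for verifying hand calculations; the variable names are illustrative and not part of the notes.

```python
import statistics

# Data from the example above.
data = [1, 2, 2, 2, 3, 3, 3, 3, 3, 5, 5, 4, 4]

mean = statistics.mean(data)      # sum of the measurements divided by n
median = statistics.median(data)  # middle value of the ordered data (n = 13 is odd)
mode = statistics.mode(data)      # most frequent measurement

print(f"mean = {mean:.4f}, median = {median}, mode = {mode}")
```

By hand, the sum is 40 and n = 13, so the mean is 40/13 ≈ 3.08; the median is the 7th ordered value, 3; and the mode is 3, which occurs five times.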
Basic Shapes (md denotes the median)
Skewed left ($\bar{x}$ < md)
Symmetric ($\bar{x}$ ≈ md)
Skewed right ($\bar{x}$ > md)
•
Section 2.3 Measures of Variability or Spread
These measure the extent of variation (or spread) around the center. Examples of such measures
are range, variance, and standard deviation.
•
The sample range of a data set is the largest observation minus the smallest observation.
•
The sample variance of n measurements is the sum of squared deviations from the mean
divided by n – 1.
$s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2$.
For calculation, use
$s^2 = \frac{1}{n-1} \left[ \sum_{i=1}^{n} x_i^2 - \frac{1}{n} \left( \sum_{i=1}^{n} x_i \right)^2 \right]$
•
The sample standard deviation is the positive square root of the variance, $s = \sqrt{s^2}$.
The population variance is denoted by $\sigma^2$ and the population standard deviation is
denoted by $\sigma$.
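The definition of the sample variance and the calculation formula should give the same number. The sketch below checks this on the data set from the Section 2.2 example; the data choice and the variable names are assumptions made for illustration.

```python
import math

# Data reused from the Section 2.2 example (an illustrative choice).
data = [1, 2, 2, 2, 3, 3, 3, 3, 3, 5, 5, 4, 4]
n = len(data)
mean = sum(data) / n

# Definition: sum of squared deviations from the mean, divided by n - 1.
s2_definition = sum((x - mean) ** 2 for x in data) / (n - 1)

# Calculation formula: (sum of squares - (sum)^2 / n) / (n - 1).
s2_shortcut = (sum(x * x for x in data) - sum(data) ** 2 / n) / (n - 1)

# Sample standard deviation: positive square root of the variance.
s = math.sqrt(s2_definition)

print(f"s^2 (definition) = {s2_definition:.4f}")
print(f"s^2 (shortcut)   = {s2_shortcut:.4f}")
print(f"s                = {s:.4f}")
```

Here the sum of the measurements is 40 and the sum of their squares is 140, so both formulas give $s^2 = (140 - 1600/13)/12 = 55/39 \approx 1.41$.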
Example: Ex. 2.14 page 61.
Discussion (the effect on mean and standard deviation when adding or deleting observations
from data)
•
Section 2.4 Interpreting the standard deviation
Tchebyshev’s Rule (For all data sets):
•
The interval $(\bar{x} - 2s, \bar{x} + 2s)$ contains at least 3/4 of the data set.
•
The interval $(\bar{x} - 3s, \bar{x} + 3s)$ contains at least 8/9 of the data set.
•
It is possible that the interval $(\bar{x} - s, \bar{x} + s)$ will contain very few of the measurements.
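The bounds 3/4 and 8/9 above are the k = 2 and k = 3 cases of the general statement that at least $1 - 1/k^2$ of the measurements lie within k standard deviations of the mean. The sketch below compares the observed fraction with that bound on the Section 2.2 example data; the helper `fraction_within` and the data choice are illustrative assumptions, not part of the notes.

```python
import statistics

# Data reused from the Section 2.2 example (an illustrative choice).
data = [1, 2, 2, 2, 3, 3, 3, 3, 3, 5, 5, 4, 4]
mean = statistics.mean(data)
s = statistics.stdev(data)  # sample standard deviation (divides by n - 1)


def fraction_within(values, center, spread, k):
    """Fraction of values strictly inside (center - k*spread, center + k*spread)."""
    inside = [v for v in values if abs(v - center) < k * spread]
    return len(inside) / len(values)


results = {}
for k in (2, 3):
    bound = 1 - 1 / k**2  # Tchebyshev lower bound for this k
    observed = fraction_within(data, mean, s, k)
    results[k] = (observed, bound)
    print(f"k = {k}: observed {observed:.3f}, guaranteed at least {bound:.3f}")
```

The rule only gives a lower bound, so the observed fraction is often much larger than 3/4 or 8/9, as it is for this data set.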
Empirical Rule (For mound-shaped frequency distributions):
•
Approximately 68% of the measurements fall within 1 standard deviation of the mean,
$(\bar{x} - s, \bar{x} + s)$
•
Approximately 95% of the measurements fall within 2 standard deviations of the mean,
$(\bar{x} - 2s, \bar{x} + 2s)$
•
Essentially all the measurements fall within 3 standard deviations of the mean,
$(\bar{x} - 3s, \bar{x} + 3s)$
Note that Range = R ≈ 4s or s ≈ R/4
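To see the Empirical Rule at work, the sketch below builds a hypothetical mound-shaped data set (the value k, for k = 0 to 10, repeated C(10, k) times) and counts the share of measurements within 1, 2, and 3 standard deviations of the mean. The data set and the helper `fraction_within` are made up for illustration and do not come from the textbook.

```python
import math
import statistics

# Hypothetical mound-shaped data: value k (0..10) repeated C(10, k) times.
freqs = [math.comb(10, k) for k in range(11)]
data = [value for value, count in enumerate(freqs) for _ in range(count)]

mean = statistics.mean(data)
s = statistics.stdev(data)  # sample standard deviation


def fraction_within(values, center, spread, k):
    """Fraction of values strictly inside (center - k*spread, center + k*spread)."""
    return sum(1 for v in values if abs(v - center) < k * spread) / len(values)


within = {k: fraction_within(data, mean, s, k) for k in (1, 2, 3)}
for k, frac in within.items():
    print(f"within {k} standard deviation(s): {frac:.3f}")
```

For this data set the shares come out near 66%, 98%, and 99.8%, which is close to, but not exactly, the 68%, 95%, and "essentially all" of the rule. The rule is an approximation, not a guarantee.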
Example: Ex. 2.17 page 68
Can the sample variance be greater than the sample standard deviation? Explain
•
Section 2.6 Measures of Position (or Measures of Relative standing)
Examples are the z-scores and the percentiles.
•
The z-score is a standardized score for an observation x and it is defined as
$z = \frac{x - \bar{x}}{s}$.
Example: The average height of men is 69 inches with a standard deviation of 2.8 inches. The average height
of women is 63.6 inches with a standard deviation of 2.5 inches.
Michael Jordan is 78 inches tall. Rebecca Lobo is 76 inches tall. Calculate the z-score for
Michael and Rebecca.
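One way to check the arithmetic in this example is a small helper that applies the z-score definition directly. The function name `z_score` is an illustrative choice, and each height is standardized against the mean and standard deviation of that person's own group.

```python
def z_score(x, mean, std):
    """Standardized score: distance of x from the mean, in standard deviations."""
    return (x - mean) / std


# Each person is compared with the figures for their own group.
z_michael = z_score(78, mean=69, std=2.8)
z_rebecca = z_score(76, mean=63.6, std=2.5)

print(f"Michael: z = {z_michael:.2f}")
print(f"Rebecca: z = {z_rebecca:.2f}")
```

This gives z ≈ 3.21 for Michael and z = 4.96 for Rebecca, so Rebecca is taller relative to women than Michael is relative to men, even though he is the taller of the two in inches.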
z-scores can be used to identify outliers. If z < − 3 or z > 3, such an observation is an outlier.
Example: Body temperatures of healthy human children have mean = 98.60°F and standard
deviation = 0.62°F. Your child has a temperature of 101°F. What should you do?
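The outlier rule can be applied to this example with a short sketch. The function `is_outlier` and its default cutoff of 3 simply encode the rule stated above; the names are illustrative.

```python
def is_outlier(x, mean, std, cutoff=3):
    """Return the z-score of x and whether it lies outside (-cutoff, cutoff)."""
    z = (x - mean) / std
    return z, (z < -cutoff or z > cutoff)


# Figures from the body-temperature example above.
z_temp, flagged = is_outlier(101, mean=98.60, std=0.62)

print(f"z = {z_temp:.2f}, outlier: {flagged}")
```

The z-score is about 3.87, which is beyond 3, so a temperature of 101°F is unusually high for a healthy child.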
•
The pth percentile of n measurements is the value such that p percent of the measurements
are less than that value and (100 – p) percent are greater.
25th percentile ≡ lower quartile or 1st quartile. This is denoted Q1
50th percentile ≡ median or 2nd quartile. This is denoted Q2
75th percentile ≡ upper quartile or 3rd quartile. This is denoted Q3
Note that Q1 is in position $(n+1)/4$ and Q3 is in position $3(n+1)/4$ when values are ordered from
smallest to largest.
Supplementary #2, #3
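The quartile positions $(n+1)/4$ and $3(n+1)/4$ can be turned into values with a short sketch. When a position is not a whole number, the code below interpolates linearly between the two neighboring ordered values. That interpolation step is an assumption, since the notes give only the positions, and the helper `value_at_position` and the reuse of the Section 2.2 data are illustrative.

```python
def value_at_position(sorted_values, position):
    """Value at a 1-based, possibly fractional, position in sorted data.

    Fractional positions are resolved by linear interpolation between the
    neighboring ordered values (an assumed convention).
    """
    n = len(sorted_values)
    lower = int(position)          # whole part of the position
    fraction = position - lower    # fractional part of the position
    if lower < 1:
        return float(sorted_values[0])
    if lower >= n:
        return float(sorted_values[-1])
    below = sorted_values[lower - 1]
    above = sorted_values[lower]
    return below + fraction * (above - below)


# Data reused from the Section 2.2 example, ordered from smallest to largest.
data = sorted([1, 2, 2, 2, 3, 3, 3, 3, 3, 5, 5, 4, 4])
n = len(data)

q1 = value_at_position(data, (n + 1) / 4)       # position 3.5
q3 = value_at_position(data, 3 * (n + 1) / 4)   # position 10.5

print(f"Q1 = {q1}, Q3 = {q3}")
```

With n = 13, Q1 sits halfway between the 3rd and 4th ordered values (both 2) and Q3 halfway between the 10th and 11th (both 4), so Q1 = 2 and Q3 = 4.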