Download Math 165-01 Ch.2 Lecture Note

Chapter 2: Describing Distributions with Numbers
Descriptive Statistics
Describe the important characteristics of a set of measurements.
Analyzing Data
Finding the center of a data set, describing its variation, and describing its shape.
Two Basic Concepts of Measures of Center
Mean ($\bar{x}$), also called the arithmetic mean or the average:
Found by adding the data values and dividing the total by the number of data values.
Sample mean: $\bar{x} = \dfrac{\sum x}{n}$
Population mean: $\mu = \dfrac{\sum x}{N}$
Median (M): The middle value when the original data values are arranged in increasing (or decreasing) order.
(The center of the ordered data.)
When the number of values is even, the median is the mean of the two middle values.
Round-off Rule: Carry one more decimal place than is present in the original set of values.
Ex. 1
17, 19, 21, 18, 20, 18, 19, 20, 20
Ex. 2
17, 19, 21, 18, 20, 18, 19, 20, 20, 21
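As a quick check of Ex. 1 and Ex. 2, the sketch below computes the mean and median with Python's standard `statistics` module. The variable names `ex1` and `ex2` are illustrative and not part of the lecture. The mean is printed with one more decimal place than the data, following the round-off rule.

```python
import statistics

ex1 = [17, 19, 21, 18, 20, 18, 19, 20, 20]       # n = 9 (odd)
ex2 = [17, 19, 21, 18, 20, 18, 19, 20, 20, 21]   # n = 10 (even)

for name, data in (("Ex. 1", ex1), ("Ex. 2", ex2)):
    mean = statistics.mean(data)      # sum of the values divided by n
    median = statistics.median(data)  # middle value, or mean of the two middle values
    print(f"{name}: mean = {mean:.1f}, median = {median}")
```

With an odd number of values (Ex. 1) the median is a single data value. With an even number (Ex. 2) it falls midway between the two middle values.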
Comparing the mean and median
The mean and median of a roughly symmetric distribution are close together.
If the distribution is exactly symmetric, the mean and median are exactly the same.
In a skewed distribution, the mean is usually farther out in the long tail than is the median.
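To illustrate the last point, here is a minimal sketch using a small made-up data set (the values in `skewed` are hypothetical, chosen only to produce a long right tail). The single large value pulls the mean toward the tail, while the median stays near the bulk of the data.

```python
import statistics

# Hypothetical right-skewed data: one large value forms a long right tail.
skewed = [1, 2, 3, 4, 100]

mean = statistics.mean(skewed)      # 110 / 5 = 22
median = statistics.median(skewed)  # middle of the sorted values = 3
print(mean, median)
```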
Percentiles: Position measures used in educational and health-related fields to indicate the position of an individual in a group.
The Pth percentile has about P% of the data values below it and about (100 − P)% of the data values above it.
Median = P50 = Q2
The 50th percentile, denoted P50, has about 50% of the data values below it and about 50% of the data values above it.
Measuring variability: the quartiles
First quartile (Q1): also called the lower quartile or the 25th percentile, P25.
Second quartile (Q2): also called the median or the 50th percentile, P50.
Third quartile (Q3): also called the upper quartile or the 75th percentile, P75.
Measuring variability: The five-number summary and boxplots
Boxplot: a graph of a data set that consists of a line extending from the minimum value to
the maximum value, and a box with lines drawn at the first quartile, Q1, the median, and the
third quartile, Q3.
5-Number Summary and Boxplot
1. Minimum data value
2. First quartile (Q1 )= P25 :
At least 25% of the sorted values are less than or equal to Q1 , and at least 75% of
the values are greater than or equal to Q1 .
3. Second quartile (Q2 )= P50 :
Same as the median; separates the bottom 50% of the sorted values from the top 50%.
4. Third quartile (Q3 )= P75 :
At least 75% of the sorted values are less than or equal to Q3 , and at least 25% of
the values are greater than or equal to Q3 .
5. Maximum data value
Locating Q1, Q2, and Q3: if the quartile is the rth value in the sorted data, then r satisfies
Q1: $\dfrac{r}{\text{total number of values}} = 0.25 \Rightarrow r = 0.25 \times (\text{total number of values})$
Q2: $\dfrac{r}{\text{total number of values}} = 0.50 \Rightarrow r = 0.50 \times (\text{total number of values})$
Q3: $\dfrac{r}{\text{total number of values}} = 0.75 \Rightarrow r = 0.75 \times (\text{total number of values})$
If r is a whole number:
The quartile is the value midway between the rth and the (r + 1)th values.
If r is not a whole number:
Round r up to the next larger whole number, then use the value in that position.
Ex.
Construct a boxplot of the data set:
34, 36, 39, 43, 51, 53, 62, 63, 73, 79
Minimum data value ⇒ 34
Q1 = P25 ⇒ $\dfrac{r}{10} = 0.25$ ⇒ r = 2.5 ⇒ the 3rd value ⇒ 39
Q2 = P50 ⇒ $\dfrac{r}{10} = 0.50$ ⇒ r = 5 ⇒ the value midway between the 5th and 6th values ⇒ $\dfrac{51 + 53}{2} = 52$
Q3 = P75 ⇒ $\dfrac{r}{10} = 0.75$ ⇒ r = 7.5 ⇒ the 8th value ⇒ 63
Maximum data value ⇒ 79
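The locating rule can be sketched in a few lines of Python. This follows the rule stated in these notes (average two neighbors when r is whole, otherwise round up). The function name `quartile` is an assumption made for illustration. Statistical software often uses other interpolation conventions, so library results may differ slightly from this rule.

```python
import math

def quartile(sorted_values, fraction):
    """Locate a quartile with the r = fraction * n rule from the notes."""
    n = len(sorted_values)
    r = fraction * n
    if r == int(r):
        # r is a whole number: take the value midway between the
        # r-th and (r + 1)-th values (positions counted from 1).
        r = int(r)
        return (sorted_values[r - 1] + sorted_values[r]) / 2
    # r is not a whole number: round up and use the value in that position.
    return sorted_values[math.ceil(r) - 1]

data = sorted([34, 36, 39, 43, 51, 53, 62, 63, 73, 79])
five_number = (data[0], quartile(data, 0.25), quartile(data, 0.50),
               quartile(data, 0.75), data[-1])
print(five_number)  # (34, 39, 52.0, 63, 79)
```

The printed tuple matches the five-number summary worked out by hand in the example.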
Spotting suspected outliers (Median as the center)
Using a median and the Interquartile Range (IQR) to analyze data.
Interquartile Range (IQR) : (Q3 − Q1 )
Outliers with IQR: a data value below the lower fence or above the upper fence is a suspected outlier.
Lower fence: Q1 − 1.5 · (IQR)
Upper fence: Q3 + 1.5 · (IQR)
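As a sketch of how the fences are used, the code below applies them to the boxplot data set from the earlier example, taking Q1 = 39 and Q3 = 63 from that example. It assumes the usual convention that values outside the fences are flagged as suspected outliers. The variable names are illustrative.

```python
data = [34, 36, 39, 43, 51, 53, 62, 63, 73, 79]
q1, q3 = 39, 63                    # quartiles found in the boxplot example

iqr = q3 - q1                      # 63 - 39 = 24
lower_fence = q1 - 1.5 * iqr       # 39 - 36 = 3
upper_fence = q3 + 1.5 * iqr       # 63 + 36 = 99

# Values outside the fences are flagged as suspected outliers.
suspected = [x for x in data if x < lower_fence or x > upper_fence]
print(iqr, lower_fence, upper_fence, suspected)
```

Every value lies between 3 and 99, so this data set has no suspected outliers.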
Measuring spread: Variance and the Standard Deviation
These tools describe the variation in the data.
Range = (maximum data value) − (minimum data value)
Variance ($s^2$): The average of the squared distances of the values from the mean.
Standard Deviation (s):
A measure of how much data values deviate away from the mean.
The square root of the variance.
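              Mean                              Variance                                           Standard Deviation
—————————————————————————————————————————————————————————————————
Sample        $\bar{x} = \dfrac{\sum x}{n}$     $s^2 = \dfrac{\sum (x - \bar{x})^2}{n - 1}$        $s = \sqrt{\dfrac{\sum (x - \bar{x})^2}{n - 1}}$
Population    $\mu = \dfrac{\sum x}{N}$         $\sigma^2 = \dfrac{\sum (x - \mu)^2}{N}$           $\sigma = \sqrt{\dfrac{\sum (x - \mu)^2}{N}}$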
Ex.
5, 7, 1, 2, 4
Range = (maximum data value) − (minimum data value) = 7 − 1 = 6
Mean: $\bar{x} = \dfrac{\sum x}{n} = \dfrac{5 + 7 + 1 + 2 + 4}{5} = \dfrac{19}{5} = 3.8$
Steps
Step 1: Compute the mean $\bar{x}$.
Step 2: Subtract the mean from each individual value: $(x - \bar{x})$.
Step 3: Square each of the deviations obtained from Step 2: $(x - \bar{x})^2$.
Step 4: Add all of the squares obtained from Step 3: $\sum (x - \bar{x})^2$.
Step 5: Divide the total from Step 4 by $n - 1$, which is 1 less than the total number of sample values present. The result is the variance.
Step 6: Find the square root of the result of Step 5. The result is the standard deviation.
$x$      $x - \bar{x}$      $(x - \bar{x})^2$      $x^2$
—————————————————————————————————
5        1.2                1.44                   25
7        3.2                10.24                  49
1        −2.8               7.84                   1
2        −1.8               3.24                   4
4        0.2                0.04                   16
—————————————————————————————————
19       0.0                22.80                  95
Variance: $s^2 = \dfrac{\sum (x - \bar{x})^2}{n - 1} = \dfrac{22.80}{5 - 1} = 5.7$
Standard Deviation: $s = \sqrt{5.7} = 2.387 \approx 2.4$
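The six steps translate directly into code. The sketch below reproduces the hand calculation for 5, 7, 1, 2, 4 and cross-checks it against the sample formulas in Python's standard `statistics` module. The variable names are illustrative.

```python
import math
import statistics

data = [5, 7, 1, 2, 4]
n = len(data)

mean = sum(data) / n                     # Step 1: the mean, 3.8
deviations = [x - mean for x in data]    # Step 2: x minus the mean
squares = [d ** 2 for d in deviations]   # Step 3: squared deviations
total = sum(squares)                     # Step 4: sum of squares, about 22.80
variance = total / (n - 1)               # Step 5: divide by n - 1
std_dev = math.sqrt(variance)            # Step 6: square root

print(round(variance, 2), round(std_dev, 1))

# Cross-check against the standard library's sample variance and deviation.
assert math.isclose(variance, statistics.variance(data))
assert math.isclose(std_dev, statistics.stdev(data))
```

Rounding happens only when printing, in line with the rule of rounding only the final answer.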
Shortcut formula
Sample Variance
$s^2 = \dfrac{n \sum (x^2) - \left( \sum x \right)^2}{n(n - 1)}$
Sample Standard Deviation
$s = \sqrt{\dfrac{n \sum (x^2) - \left( \sum x \right)^2}{n(n - 1)}}$
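As a check that the shortcut formula agrees with the definition, the sketch below applies it to the same data set (5, 7, 1, 2, 4), for which the sum of the values is 19 and the sum of their squares is 95. The variable names `sum_x` and `sum_x2` are illustrative.

```python
import math

data = [5, 7, 1, 2, 4]
n = len(data)
sum_x = sum(data)                    # sum of the values: 19
sum_x2 = sum(x * x for x in data)    # sum of the squared values: 95

# Shortcut formula: (n * sum of squares - square of the sum) / (n * (n - 1))
variance = (n * sum_x2 - sum_x ** 2) / (n * (n - 1))   # (475 - 361) / 20
std_dev = math.sqrt(variance)
print(variance, round(std_dev, 3))
```

The shortcut needs only two running totals, so the mean and the individual deviations never have to be computed.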
Round-Off Rule for Measures of Variation
When rounding the value of a measure of variation, carry one more decimal place than is present
in the original set of data.
Round only the final answer, not values in the middle of a calculation.
Properties of the Standard Deviation (1)
1. n − 1 is called the degrees of freedom.
2. s measures variability about the mean and should be used only when the mean is chosen
as the measure of center.
3. s is always zero or greater than zero. s = 0 only when there is no variability. This happens
only when all observations have the same value. Otherwise, s > 0.
4. As the observations become more variable about their mean, s gets larger.
5. s has the same units of measurement as the original observations. For example, if you
measure weight in kilograms, both the mean $\bar{x}$ and the standard deviation s are also
in kilograms. This is one reason to prefer s to the variance $s^2$, which would be in squared
kilograms.
6. Like the mean $\bar{x}$, s is not resistant. A few outliers can make s very large.
Properties of the Standard Deviation (2)
The standard deviation measures the variation among data values.
The standard deviation is a measure of variation of all values from the mean.
Data values close together ⇒ A small standard deviation
Data values spread farther apart ⇒ A larger standard deviation
Ex.
4.2, 3.5, 3.2, 4.0, 4.1
S.D : 0.430116
Ex.
5, 7, 1, 2, 4
S.D : 2.387467
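The two standard deviations above can be reproduced with the sample standard deviation from Python's standard `statistics` module. This is a minimal sketch, and the variable names are illustrative.

```python
import statistics

close_together = [4.2, 3.5, 3.2, 4.0, 4.1]   # values clustered near the mean
spread_out = [5, 7, 1, 2, 4]                 # values farther from the mean

sd_close = statistics.stdev(close_together)  # sample standard deviation (n - 1)
sd_spread = statistics.stdev(spread_out)
print(round(sd_close, 6), round(sd_spread, 6))
```

Both data sets have the same mean (3.8), so the difference in standard deviation reflects only how far the values lie from that mean.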
Spotting suspected outliers (Mean as the center)
Using a mean and the standard deviation to analyze data.
Range Rule of Thumb:
For many data sets, the vast majority (such as 95%) of sample values lie within two standard deviations of the mean.
Minimum "usual" value = mean − 2 × standard deviation
Maximum "usual" value = mean + 2 × standard deviation
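A minimal sketch of the rule of thumb, applied to the earlier data set 5, 7, 1, 2, 4 (mean 3.8, standard deviation about 2.387). The variable names are illustrative. With only five values the rule is a rough guide at best.

```python
import statistics

data = [5, 7, 1, 2, 4]
mean = statistics.mean(data)
s = statistics.stdev(data)        # sample standard deviation

min_usual = mean - 2 * s          # about 3.8 - 4.775
max_usual = mean + 2 * s          # about 3.8 + 4.775

# Values outside the "usual" range are suspected outliers.
unusual = [x for x in data if x < min_usual or x > max_usual]
print(round(min_usual, 2), round(max_usual, 2), unusual)
```

All five values fall inside the "usual" range, so none is flagged.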