Lecture 5:
Measures of Center and Variability for
Distributions (Population);
Quartiles, Boxplots for Data (Sample)
Chapter 2
2.1
Measures of Center (Distributions)
•  Mean for continuous distributions
–  Let f(x) be the density function for a
continuous random variable X; then the mean
of X is:
$\mu_X = \int_{-\infty}^{\infty} x \cdot f(x)\,dx$
•  Mean for discrete distributions
–  Let p(x) be the mass function for a discrete
random variable X; then the mean of X is:
$\mu_X = \sum_x x \cdot p(x)$
Examples—Means
•  Continuous distributions
–  Normal (µ, σ): µ
–  Exponential (λ): 1/λ
–  Uniform (a, b): (a + b)/2
•  Discrete distributions
–  Binomial (n, π): nπ
–  Poisson (λ): λ
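The discrete entries in this list can be checked by summing x · p(x) directly. The sketch below is only an illustration: the parameter values (n = 10, π = 0.3, λ = 4) and the helper names are assumptions, not part of the lecture, and the Poisson sum is truncated at 100 terms because its support is infinite.

```python
import math

def discrete_mean(pmf, support):
    """Mean of a discrete distribution: sum of x * p(x) over the support."""
    return sum(x * pmf(x) for x in support)

def binomial_pmf(n, pi):
    """Return the Binomial(n, pi) mass function p(x)."""
    return lambda x: math.comb(n, x) * pi**x * (1 - pi) ** (n - x)

def poisson_pmf(lam):
    """Return the Poisson(lam) mass function p(x)."""
    return lambda x: math.exp(-lam) * lam**x / math.factorial(x)

# Illustrative parameters (assumed, not from the lecture).
n, pi, lam = 10, 0.3, 4.0

binomial_mean = discrete_mean(binomial_pmf(n, pi), range(n + 1))
# Truncating the Poisson support at 100 leaves a negligible tail for lam = 4.
poisson_mean = discrete_mean(poisson_pmf(lam), range(100))

print(binomial_mean, n * pi)  # the two values should agree closely
print(poisson_mean, lam)      # the two values should agree closely
```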
Example 1
•  Find the mean value of a variable x with
density function:
f(x) = 1.5(1 - x^2) for 0 < x < 1, and f(x) = 0 otherwise.
•  What is the median of the distribution of X?
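For the mean in Example 1, the integral can be done by hand: 1.5 ∫ (x - x^3) dx over (0, 1) = 1.5 (1/2 - 1/4) = 3/8. As a numerical cross-check, here is a minimal sketch using a midpoint-rule approximation; the helper `integrate` and the step count are illustrative choices, not something prescribed in the lecture.

```python
def density(x):
    """Density from Example 1: f(x) = 1.5(1 - x^2) on (0, 1), 0 otherwise."""
    return 1.5 * (1 - x**2) if 0 < x < 1 else 0.0

def integrate(g, a, b, steps=100_000):
    """Approximate the integral of g over [a, b] with the midpoint rule."""
    width = (b - a) / steps
    return width * sum(g(a + (i + 0.5) * width) for i in range(steps))

# The density is zero outside (0, 1), so integrating over [0, 1] is enough.
total_area = integrate(density, 0.0, 1.0)             # a valid density has area 1
mean = integrate(lambda x: x * density(x), 0.0, 1.0)  # exact value is 3/8

print(round(total_area, 6), round(mean, 6))
```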
Example - Medians
•  Median for continuous distributions
–  Let f(x) be the density function for a continuous
random variable X; then the median of X is
the value $\tilde{\mu}$ that satisfies:
$\int_{-\infty}^{\tilde{\mu}} f(x)\,dx = 0.5$
•  What is the median of a Normal Distribution?
•  If a continuous distribution is perfectly
symmetric, mean = median.
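•  This definition of the median does not carry over to discrete
distributions, since no value need make the cumulative probability exactly 0.5.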
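Applied to the density from Example 1, f(x) = 1.5(1 - x^2) on (0, 1), the median condition becomes F(m) = 1.5m - 0.5m^3 = 0.5, which has no tidy closed form. A hedged sketch of solving it numerically by bisection follows; the function names and tolerance are illustrative assumptions.

```python
def cdf(x):
    """CDF for Example 1: F(x) = 1.5x - 0.5x^3 on [0, 1]."""
    if x <= 0:
        return 0.0
    if x >= 1:
        return 1.0
    return 1.5 * x - 0.5 * x**3

def median_by_bisection(cdf, lo, hi, tol=1e-10):
    """Find m in [lo, hi] with cdf(m) = 0.5, assuming cdf is increasing there."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if cdf(mid) < 0.5:
            lo = mid   # the median lies to the right of mid
        else:
            hi = mid   # the median lies at or to the left of mid
    return (lo + hi) / 2

median = median_by_bisection(cdf, 0.0, 1.0)
print(round(median, 4))  # about 0.3473
```

The median (about 0.3473) is smaller than the mean of this distribution (3/8 = 0.375), which is what one would expect for a density with a longer right tail.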
2.2
Measures of Variability (Distributions)
•  Variance for continuous distributions
–  Let f(x) be the density function for a continuous
random variable X; then the variance of X is:
$\sigma_X^2 = \int_{-\infty}^{\infty} (x - \mu_X)^2 \cdot f(x)\,dx$
•  Variance for discrete distributions
–  Let p(x) be the mass function for a discrete
random variable X; then the variance of X is:
$\sigma_X^2 = \sum_x (x - \mu_X)^2 \cdot p(x)$
•  $\sigma_X$, the standard deviation of X, is the square root of the
variance $\sigma_X^2$
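The discrete variance formula can be evaluated term by term once the mean is known. The sketch below does this for a Binomial distribution and compares the result with the standard closed form nπ(1 - π); the parameter values (n = 10, π = 0.3) and helper names are assumptions chosen for illustration.

```python
import math

def discrete_mean(pmf, support):
    """Mean: sum of x * p(x) over the support."""
    return sum(x * pmf(x) for x in support)

def discrete_variance(pmf, support):
    """Variance: sum of (x - mean)^2 * p(x) over the support."""
    mu = discrete_mean(pmf, support)
    return sum((x - mu) ** 2 * pmf(x) for x in support)

# Illustrative Binomial(n, pi) parameters (assumed, not from the lecture).
n, pi = 10, 0.3

def pmf(x):
    """Binomial(n, pi) mass function."""
    return math.comb(n, x) * pi**x * (1 - pi) ** (n - x)

variance = discrete_variance(pmf, range(n + 1))
std_dev = math.sqrt(variance)  # standard deviation is the square root of the variance

print(variance, n * pi * (1 - pi))  # the two values should agree closely
print(std_dev)
```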
Examples—Variances
•  Continuous distributions
–  Normal (µ, σ): σ^2
–  Exponential (λ): 1/λ^2
–  Uniform (a, b): (b - a)^2/12
•  Discrete distributions
–  Binomial (n, π): nπ(1 - π)
–  Poisson (λ): λ
•  Self-reading: “empirical rule” (68-95-99.7
rule) in the middle of Pg 76, “unbiasedness”
on top of Pg 77
Example 2
•  What’s the standard deviation of X from
Example 1?
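One way to approach Example 2 is to apply the continuous variance formula to f(x) = 1.5(1 - x^2) on (0, 1). By hand, the mean is 3/8 and ∫ x^2 f(x) dx = 1.5(1/3 - 1/5) = 0.2, so the variance is 0.2 - (3/8)^2 = 0.059375. A minimal numerical check, with the midpoint-rule helper being an illustrative assumption:

```python
import math

def density(x):
    """Density from Example 1: f(x) = 1.5(1 - x^2) on (0, 1), 0 otherwise."""
    return 1.5 * (1 - x**2) if 0 < x < 1 else 0.0

def integrate(g, a, b, steps=100_000):
    """Approximate the integral of g over [a, b] with the midpoint rule."""
    width = (b - a) / steps
    return width * sum(g(a + (i + 0.5) * width) for i in range(steps))

# Mean first, then the average squared distance from the mean.
mean = integrate(lambda x: x * density(x), 0.0, 1.0)
variance = integrate(lambda x: (x - mean) ** 2 * density(x), 0.0, 1.0)
std_dev = math.sqrt(variance)

print(round(variance, 6), round(std_dev, 4))  # standard deviation is about 0.2437
```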
2.3
Other measures—Quartiles
•  The median is the midpoint of the data (Sample)
•  Quartiles break the data into quarters
–  1st Quartile (Q1) = lower quartile = 25th percentile
–  2nd Quartile = median = 50th percentile
–  3rd Quartile (Q3) = upper quartile = 75th percentile
•  How to find the quartiles?
–  They are just medians of the two halves of the data
•  Interquartile Range (or IQR) = Q3 – Q1
•  Self reading: Percentiles, Pg 85
Example
•  Scores for 10 students are:
78 80 80 81 82 83 85 85 86 87
•  Find the median and quartiles:
1. Median = Q2 = M = (82+83)/2 = 82.5
2. Q1 = median of the lower half (78 80 80 81 82) = 80
3. Q3 = median of the upper half (83 85 85 86 87) = 85
Therefore, IQR = Q3 – Q1 = 85 – 80 = 5
•  Additionally, find Min and Max
Min = 78, and Max = 87
–  We get a five-number summary!
–  Min   Q1   Median   Q3   Max
   78    80   82.5     85   87
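The "medians of the two halves" procedure can be sketched in a few lines. One assumption to flag: when the number of observations is odd, this sketch leaves the overall median out of both halves, which is one common convention; check the textbook for the convention used in this course. The function names are illustrative.

```python
def median(sorted_values):
    """Median of an already sorted list."""
    n = len(sorted_values)
    mid = n // 2
    if n % 2 == 1:
        return sorted_values[mid]
    return (sorted_values[mid - 1] + sorted_values[mid]) / 2

def five_number_summary(data):
    """Min, Q1, median, Q3, max, with quartiles as medians of the two halves."""
    values = sorted(data)
    half = len(values) // 2
    lower = values[:half]
    # Assumption: for odd n, the middle value is excluded from both halves.
    upper = values[half:] if len(values) % 2 == 0 else values[half + 1:]
    return {
        "min": values[0],
        "q1": median(lower),
        "median": median(values),
        "q3": median(upper),
        "max": values[-1],
    }

scores = [78, 80, 80, 81, 82, 83, 85, 85, 86, 87]
summary = five_number_summary(scores)
iqr = summary["q3"] - summary["q1"]
print(summary, iqr)
```

For the ten scores this reproduces the values worked out by hand: Q1 = 80, median = 82.5, Q3 = 85, and IQR = 5.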
Boxplots; Modified Version
•  Visual representation of the five-number summary
–  Central box: Q1 to Q3
–  Line inside box: Median
–  Extended straight lines: from each end of the box to lowest and
highest observation.
•  Modified Boxplots: only extend the lines to the smallest and largest
observations that are not outliers. Each mild outlier* is
represented by a closed circle and each extreme outlier** by an
open circle.
*Any observation farther than 1.5 × IQR from the closest quartile
is an outlier.
**An outlier is extreme if it is more than 3 × IQR from the nearest quartile,
and mild otherwise.
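The two outlier rules translate directly into a distance check against the nearest quartile. In the sketch below, the values 70 and 105 are hypothetical additions used only to trigger each rule, the quartiles use the medians-of-halves method (leaving out the middle value when n is odd, an assumed convention), and the function names are illustrative.

```python
def median(sorted_values):
    """Median of an already sorted list."""
    n = len(sorted_values)
    mid = n // 2
    if n % 2 == 1:
        return sorted_values[mid]
    return (sorted_values[mid - 1] + sorted_values[mid]) / 2

def quartiles(data):
    """Q1 and Q3 as medians of the lower and upper halves of the data."""
    values = sorted(data)
    half = len(values) // 2
    lower = values[:half]
    upper = values[half:] if len(values) % 2 == 0 else values[half + 1:]
    return median(lower), median(upper)

def classify_outliers(data):
    """Split observations into mild and extreme outliers using the IQR rules."""
    q1, q3 = quartiles(data)
    iqr = q3 - q1
    mild, extreme = [], []
    for x in data:
        # Distance beyond the nearest quartile (0 for points inside the box).
        distance = max(q1 - x, x - q3, 0)
        if distance > 3 * iqr:
            extreme.append(x)
        elif distance > 1.5 * iqr:
            mild.append(x)
    return mild, extreme

scores = [78, 80, 80, 81, 82, 83, 85, 85, 86, 87]
no_outliers = classify_outliers(scores)
# Hypothetical extra observations, not from the lecture data.
with_outliers = classify_outliers(scores + [70, 105])
print(no_outliers)    # ([], [])
print(with_outliers)  # ([70], [105])
```

Note that adding observations changes the quartiles themselves: with the two extra values, Q3 moves from 85 to 85.5, so the fences are recomputed from the enlarged data set.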
Example
•  Five-number summary is:
–  Min: 78
–  Q1: 80
–  Median: 82.5
–  Q3: 85
–  Max: 87
•  Draw a boxplot:
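As a rough stand-in for a hand-drawn plot, the five-number summary can be rendered as a single line of text. This is only a sketch: the function name, the characters used ('+' for min and max, '[' and ']' for the quartiles, '|' for the median), and the default width are all arbitrary choices, and a plotting library would normally be used instead.

```python
def ascii_boxplot(minimum, q1, med, q3, maximum, width=46):
    """Render a five-number summary as one line of text, scaled to `width`."""
    if maximum <= minimum:
        raise ValueError("maximum must be greater than minimum")
    span = maximum - minimum

    def column(value):
        # Map a data value to a character position between 0 and width - 1.
        return round((value - minimum) * (width - 1) / span)

    left, right = column(q1), column(q3)
    # Whiskers are '-', the central box is '='.
    cells = ["=" if left <= i <= right else "-" for i in range(width)]
    cells[0] = "+"
    cells[-1] = "+"
    cells[left] = "["
    cells[right] = "]"
    cells[column(med)] = "|"
    return "".join(cells)

plot = ascii_boxplot(78, 80, 82.5, 85, 87)
print(plot)
```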
More on Boxplots
•  Much more compact than histograms
•  “Quick and Dirty” visual picture
•  Gives a rough idea of how the data are distributed
–  Shows the center/typical value (the median)
–  Position of the median line indicates whether the data are symmetric
or positively/negatively skewed
–  IQR gives the spread of the middle 50% of the data
–  Min to Max gives the entire range
•  Side-by-side boxplots are very useful for
comparisons
–  See from slide 10
Describe a Boxplot
•  Symmetric? If not, positively or negatively
skewed (based on the median line)
•  Outliers? Based on the 1.5 × IQR rule (and the 3 × IQR
rule for extreme outliers)
•  Overall range = Max - Min
•  IQR = length of the central box
•  Similar procedure for side-by-side
comparison
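The checklist above can be turned into a small helper that reports range, IQR, and a rough shape judgment from the position of the median inside the box. This is a sketch under assumptions: the 10% tolerance for calling a box "roughly symmetric" is an arbitrary illustrative threshold, not a rule from the lecture, and the second summary is hypothetical.

```python
def describe_boxplot(minimum, q1, med, q3, maximum, tolerance=0.1):
    """Summarize a boxplot from its five-number summary."""
    iqr = q3 - q1
    lower_part = med - q1  # box length below the median line
    upper_part = q3 - med  # box length above the median line
    # Assumed threshold: parts within tolerance * IQR count as symmetric.
    if abs(upper_part - lower_part) <= tolerance * iqr:
        shape = "roughly symmetric"
    elif upper_part > lower_part:
        shape = "positively skewed"  # median sits closer to Q1
    else:
        shape = "negatively skewed"  # median sits closer to Q3
    return {"range": maximum - minimum, "iqr": iqr, "shape": shape}

lecture_example = describe_boxplot(78, 80, 82.5, 85, 87)
# Hypothetical summary with the median close to Q1.
skewed_example = describe_boxplot(0, 1, 2, 6, 10)
print(lecture_example)
print(skewed_example)
```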
Examples--MPG
After Class …
•  Review sections 2.1 and 2.2, especially Pg
63 – 68, 74 – 77
•  Review section 2.3, self-reading Pg 95
•  Read section 2.4 and 5.1
•  Hw#2, due by 5pm, next Monday
•  Start Lab#2 now, you don’t want to wait
till the last minute of next Thursday pm…