ST 311- Introduction to Statistics
Instructor: Judith Canner
Learning Objectives D
Spring 2010
THE NORMAL DISTRIBUTION
Probability Definitions and Notation
P ( X > 10) = the probability that a value is greater than 10 (based on a given distribution)
Bell-shaped Histogram
Patterns are common in nature; one of the most common is the bell-shaped curve. Most
individuals are clumped near the center, and the number of individuals falls off as the distance
from the center increases.
Column       n      Mean         Std. Dev.
textbooks    534    348.10675    143.71208
Empirical Rule
For any bell-shaped curve, approximately
• 68% of the values fall within 1 standard deviation of the mean in either direction
• 95% of the values fall within 2 standard deviations of the mean in either direction
• 99.7% of the values fall within 3 standard deviations of the mean in either direction
http://www.stat.tamu.edu/~west/applets/empiricalrule.html
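The three percentages in the Empirical Rule can be checked against an exact normal curve. The sketch below assumes Python's standard-library `statistics.NormalDist`; the helper name `within` is made up for illustration and is not part of the course materials.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
Z = NormalDist(mu=0, sigma=1)

def within(k):
    """Probability that a normal value lies within k standard deviations of the mean."""
    return Z.cdf(k) - Z.cdf(-k)

for k in (1, 2, 3):
    # Prints roughly 0.6827, 0.9545, 0.9973: the "68-95-99.7" rule.
    print(f"within {k} SD of the mean: {within(k):.4f}")
```

The rule's round numbers (68%, 95%, 99.7%) are approximations of these values.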
Consider the textbook data:
What are the minimum and maximum?
How many standard deviations from the mean are the min and the max?
In general, how many standard deviations are the max and the min from the mean?
Approximation of the standard deviation for relatively large samples (>200):
Will this approximation work for skewed data? Why?
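One way to explore these questions is with simulated data. The sketch below does NOT use the textbook data (its minimum and maximum are not reproduced in these notes). It draws a hypothetical bell-shaped sample of the same size, borrowing the mean and standard deviation from the summary table only as convenient parameters, and reports how far the extremes sit from the mean in standard deviations.

```python
import random
from statistics import mean, stdev

rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical bell-shaped sample; these are simulated values, not the textbook data.
sample = [rng.gauss(348.1, 143.7) for _ in range(534)]

m, s = mean(sample), stdev(sample)
z_min = (min(sample) - m) / s   # standard deviations below the mean (negative)
z_max = (max(sample) - m) / s   # standard deviations above the mean (positive)
range_in_sds = (max(sample) - min(sample)) / s

print(f"min is {z_min:.2f} SDs from the mean, max is {z_max:.2f} SDs from the mean")
print(f"range spans {range_in_sds:.2f} standard deviations")
```

Swapping `rng.gauss` for a skewed generator such as `rng.expovariate` is a quick way to investigate the last question above.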
Characteristics of a Normal Distribution
1. Continuous random variable: P ( X = k ) = 0 for any single value k
2. Symmetric and bell-shaped: The normal distribution is a model for bell-shaped curves. All
normal distributions are bell-shaped, but not all bell-shaped curves are normal.
3. Empirical Rule holds
4. Equal probability for a measurement to be less than the mean or greater than the mean
P ( X ≤ μ ) = P ( X ≥ μ ) = 0.5
5. The probability that a value is so many units (d) below the mean is equal to the probability
that the value is the same number of units above the mean.
P( X ≤ μ − d ) = P( X ≥ μ + d )
Question: Why are 4 and 5 true (what is the characteristic that determines this property)?
Example: The Empirical Rule and the Normal Distribution
Consider a population with mean 75 and standard deviation of 10. If each tick mark represents a
standard deviation, note the mean and the values of 1, 2, and 3 standard deviations from the
mean.
Answer the following questions (it may help to shade/draw each problem on the normal curve):
1. P ( X < 65) =
2. P (55 < X < 85) =
3. Find x: P ( X > x ) = 0.16
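After working these by hand with the Empirical Rule, the results can be compared with exact normal probabilities. This is a minimal sketch assuming Python's standard-library `statistics.NormalDist`; the variable names are illustrative only.

```python
from statistics import NormalDist

# Population from the example: mean 75, standard deviation 10.
X = NormalDist(mu=75, sigma=10)

# 1. P(X < 65): 65 is one standard deviation below the mean.
#    Empirical Rule estimate: (1 - 0.68) / 2 = 0.16
p_below_65 = X.cdf(65)

# 2. P(55 < X < 85): from two SDs below the mean to one SD above it.
#    Empirical Rule estimate: 0.95 / 2 + 0.68 / 2 = 0.815
p_55_to_85 = X.cdf(85) - X.cdf(55)

# 3. Find x with P(X > x) = 0.16, i.e. P(X <= x) = 0.84.
#    Empirical Rule estimate: one SD above the mean, x = 85
x_cut = X.inv_cdf(1 - 0.16)

print(f"P(X < 65)      = {p_below_65:.4f}")
print(f"P(55 < X < 85) = {p_55_to_85:.4f}")
print(f"x with P(X > x) = 0.16: {x_cut:.2f}")
```

The exact values differ slightly from the hand estimates because 68% and 95% are themselves rounded.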
Standardized Score
________________________: the distance (number of standard deviations) between a specified
value and the mean
z = (Observed Value − Mean) / Standard Deviation = ( x − μ ) / σ
How does it work?
When we convert values for any normal random variable to z-scores, it is equivalent to
converting the random variable of interest to a standard normal random variable, i.e.,
P ( X ≤ x ) = P ( ( X − μ ) / σ ≤ ( x − μ ) / σ ) = P ( Z ≤ z )
where Z ~ Normal ( μ = 0, σ = 1)
Example: Consider the shoe size for the female population. The population mean is 8 and the
standard deviation is 1.5. What is the probability of selecting a woman with a shoe size less than
6 from the population?
P ( X ≤ __ ) = P ( ( __ − __ ) / __ ≤ ( __ − __ ) / __ ) = P ( Z ≤ __ )
[Figure: normal curve with two horizontal axes, X and Z; the Z axis is marked at -2, -1, 0, 1, 2]
Now that we know the z-score, how do we find the probability associated with that score?
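As a cross-check on the table lookup, the same probability can be computed in software. The sketch below assumes Python's standard-library `statistics.NormalDist` and uses the shoe-size numbers from the example. It finds the probability two ways, through the z-score and directly from the original distribution, to show that standardizing does not change the answer.

```python
from statistics import NormalDist

mu, sigma = 8, 1.5   # population mean and standard deviation of shoe size
x = 6                # shoe size of interest

# Standardize: number of standard deviations between x and the mean.
z = (x - mu) / sigma

# P(Z <= z) using the standard normal distribution (what a z-table tabulates).
p_via_z = NormalDist(mu=0, sigma=1).cdf(z)

# P(X <= x) computed directly on the original scale.
p_direct = NormalDist(mu=mu, sigma=sigma).cdf(x)

print(f"z = {z:.2f}")
print(f"P(Z <= z) = {p_via_z:.4f}")
print(f"P(X <= 6) = {p_direct:.4f}")
```

A printed z-table rounds z to two decimal places, so a table lookup may differ from the software value in the third or fourth decimal.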
How to use the Z-Table
Notice that the table only gives probabilities associated with values less than the calculated
z-score. Fortunately, there are several properties of the Normal distribution that allow us to use
these values for different and more complicated calculations of probability.
Probability Relationships for Normal Random Variables
1. P ( X > a ) = 1 − P ( X ≤ a )
In words:
2. P ( a < X < b) = P ( X ≤ b) − P ( X ≤ a )
In words:
3. P ( X > μ + d ) = P ( X < μ − d )
In words:
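The three relationships can be tried out numerically. The sketch below assumes Python's standard-library `statistics.NormalDist` and borrows the shoe-size distribution (mean 8, standard deviation 1.5) as a convenient example; the values of `a`, `b`, and `d` are arbitrary choices for illustration, not values from the handout.

```python
from math import isclose
from statistics import NormalDist

X = NormalDist(mu=8, sigma=1.5)
a, b, d = 7, 10, 2   # arbitrary illustrative values

# 1. Complement: the area above a is 1 minus the area at or below a.
p_greater_a = 1 - X.cdf(a)

# 2. Interval: the area between a and b is the difference of two "less than" areas.
p_between = X.cdf(b) - X.cdf(a)

# 3. Symmetry: the tail d units above the mean matches the tail d units below it.
upper_tail = 1 - X.cdf(X.mean + d)
lower_tail = X.cdf(X.mean - d)

print(f"P(X > {a})        = {p_greater_a:.4f}")
print(f"P({a} < X < {b})   = {p_between:.4f}")
print(f"upper tail = {upper_tail:.4f}, lower tail = {lower_tail:.4f}")
print("tails equal:", isclose(upper_tail, lower_tail, abs_tol=1e-12))
```

Every quantity is built from `cdf`, the "less than" probability, which is the only kind of value the z-table supplies.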
Examples: Consider again the shoe size for the female population. The population mean is 8
and the standard deviation is 1.5. Answer the following:
*EXTRA CHALLENGE*
Try only using the first page of the attached z-table to answer the questions
1. What is the probability that a woman has a shoe size greater than size 11?
2. What is the probability that a woman’s shoe size is between size 7.5 and size 9?
3. What is the probability that a woman’s shoe size is less than 5 (can you figure this out
without looking at the table? Hint: look at the 3rd relationship)?
4. What is the probability that a woman’s shoe size is less than 5 or greater than 11?
Percentiles and Probabilities
A percentile is another way of describing a value’s cumulative probability.
• That is P ( X ≤ x) = 0.75 , where x is the value corresponding to the 75th percentile of a
distribution.
• Essentially, instead of finding the probability a particular value occurs, you are given the
probability and must find the value.
Step 1: Find the z value that corresponds to the specified percentile rank (or cumulative
probability).
Step 2: Compute x = μ + zσ . This is the percentile value.
Example: Recall the previous example. What would my shoe size have to be for me to be in the
80th percentile of women’s shoe sizes?
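The two steps can be sketched in software, with an inverse-CDF call standing in for the reverse z-table lookup. This assumes Python's standard-library `statistics.NormalDist`; the variable names are illustrative.

```python
from statistics import NormalDist

mu, sigma = 8, 1.5    # shoe-size population from the earlier example
percentile = 0.80     # cumulative probability for the 80th percentile

# Step 1: find the z value whose cumulative probability is 0.80.
z = NormalDist(mu=0, sigma=1).inv_cdf(percentile)

# Step 2: convert the z value back to the original scale.
x = mu + z * sigma

print(f"z = {z:.4f}")
print(f"80th percentile shoe size = {x:.2f}")

# Sanity check: the cumulative probability at x should be the percentile we started with.
print(f"P(X <= x) = {NormalDist(mu, sigma).cdf(x):.4f}")
```

With a printed table, Step 1 means scanning the body of the table for the probability closest to 0.80 and reading off the corresponding z.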
If you know a population is (approximately) normally distributed, you can describe that
population completely using only the mean and standard deviation!
In Class Exercise: Consider the heights for NFL football players. Assume the heights are
normally distributed with a mean of 74 inches and a standard deviation of 2.5. On each of the
following subjects, spend a minute or two and write/draw as much as you possibly can about
NFL players’ heights with regard to:
1. Location/Center (mean, median)
2. Spread (~Range for a sample, Q1, Q3, IQR)
3. Shape (symmetry, mode)
4. Empirical Rule
5. Draw a Boxplot or a Histogram that represents a sample from the population
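After attempting the exercise by hand, the population quantities can be checked in software. The sketch below assumes Python's standard-library `statistics.NormalDist` and the stated model (mean 74 inches, standard deviation 2.5). It describes the population, so the values from any actual sample of players would differ somewhat.

```python
from statistics import NormalDist

heights = NormalDist(mu=74, sigma=2.5)

# Center: for a normal distribution the mean, median, and mode coincide.
center = heights.mean

# Spread: quantiles(n=4) returns the three cut points Q1, median, Q3.
q1, median, q3 = heights.quantiles(n=4)
iqr = q3 - q1

print(f"mean = median = mode = {center}")
print(f"Q1 = {q1:.2f}, Q3 = {q3:.2f}, IQR = {iqr:.2f}")

# Empirical Rule: intervals of 1, 2, and 3 standard deviations around the mean.
for k, pct in ((1, "68%"), (2, "95%"), (3, "99.7%")):
    low = heights.mean - k * heights.stdev
    high = heights.mean + k * heights.stdev
    print(f"about {pct} of heights fall between {low} and {high} inches")
```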