Normal distributions
Normal curves provide a simple, compact way to describe
symmetric, bell-shaped distributions.
Normal curve
(Figures: SAT math scores for CS students; money spent in a supermarket.)
Is the normal curve a good approximation?
SAT math scores for CS students
The area under the histogram, i.e. the percentages of the observations, can be
approximated by the corresponding area under the normal curve.
If the histogram is symmetric and bell-shaped, we say that the data are
approximately normal (or normally distributed).
We then need to know only the average and the standard deviation of the
observations!
SAT math scores for CS students
The variable SAT math scores is approximately normally distributed with
mean m = 595.28 and standard deviation s = 86.40.
The standard normal curve
The standard normal distribution has mean =0 and
standard deviation =1
The curve is perfectly symmetric around 0
Any value on a normal curve can be converted to a value
on the standard normal curve (a z-score) using this formula:
z = (value - mean) / standard deviation
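As a small illustration of the conversion formula, here is a sketch in Python. The function name `z_score` is made up for this example; the mean and standard deviation are the SAT math values quoted in the slides.

```python
def z_score(value, mean, sd):
    """Convert a value on a normal curve to the standard normal scale."""
    return (value - mean) / sd

# SAT math example: mean 595.28, standard deviation 86.40
z_600 = z_score(600, 595.28, 86.40)
z_750 = z_score(750, 595.28, 86.40)
print(round(z_600, 3), round(z_750, 3))
```

A score of 600 sits just above the mean (z close to 0), while 750 is almost 1.8 standard deviations above it.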
Benchmarks under the standard normal curve
(Figure: 50% of the area under the standard normal curve lies on each side of 0.)
Graphing the normal curve
using Excel
Excel function NORMDIST = area under the normal curve, or height of the density
Syntax (m = average, s = standard deviation)
NORMDIST(x, m, s, 1) = area under the normal curve to the left of x
NORMDIST(x, m, s, 0) = value of the normal density function at x
Excel function NORMSDIST(x) = area to the left of x under the standard
normal curve (m=0, s=1)
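For readers working outside Excel, the same two quantities can be sketched in Python with the standard library's `statistics.NormalDist`. The wrapper names `normdist` and `normsdist` below are hypothetical, chosen only to mirror the Excel functions; they are not part of any library.

```python
from statistics import NormalDist

def normdist(x, m, s, cumulative):
    """Rough analogue of Excel's NORMDIST(x, m, s, cumulative)."""
    dist = NormalDist(mu=m, sigma=s)
    # cumulative = 1: area to the left of x; cumulative = 0: density at x
    return dist.cdf(x) if cumulative else dist.pdf(x)

def normsdist(z):
    """Rough analogue of NORMSDIST(z): standard normal area left of z."""
    return NormalDist().cdf(z)

print(normdist(600, 595.28, 86.40, 1))  # area to the left of 600
print(normsdist(0))                     # half the area lies left of 0
```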
Graphing the standard normal
density curve
• Open a new workbook
• Enter the labels z and f(z) in cells A2 and B2
• Enter -3.5 and -3.4 in cells A3 and A4, select both cells, then click and
drag the fill handle down to A73 to create the sequence of values from -3.5
to 3.5 in steps of 0.1.
• Select B3 and enter =NORMDIST(A3,0,1,0)
• Select B3 and drag the fill handle down to B73
• Open the Chart Wizard, select XY (Scatter)
• The data range should already be indicated.
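The worksheet built in the steps above can be mimicked in Python as a rough sketch: one list plays the role of column A (the z values) and another the role of column B (the density). Plotting is left out to keep the example dependency-free; the variable names are illustrative.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

# z values from -3.5 to 3.5 in steps of 0.1 (71 rows, like A3:A73)
z_values = [round(-3.5 + 0.1 * i, 1) for i in range(71)]
# density f(z) at each z, like =NORMDIST(A3,0,1,0) filled down to B73
density = [std_normal.pdf(z) for z in z_values]

# Print every tenth row as a quick check of the table
for z, fz in list(zip(z_values, density))[::10]:
    print(f"{z:5.1f}  {fz:.4f}")
```

The two lists can be passed to any plotting tool as the x and y series of a scatter chart.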
Normal distribution function F(z)
F(z) is defined as the area under the standard normal curve to the left
of z, that is, F(z) = P(Z <= z).
(Figure: cumulative distribution function F(z) plotted against z, for z from -4 to 4.)
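The slides define F(z) only as an area. One standard way to evaluate it numerically, shown here as a sketch rather than as the method used in the slides, is through the error function: F(z) = (1 + erf(z / sqrt(2))) / 2.

```python
import math

def F(z):
    """Standard normal distribution function P(Z <= z), via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# By symmetry around 0, half the area lies to the left of 0,
# and F(-z) + F(z) = 1 for any z.
print(F(0.0))
print(F(-1.0) + F(1.0))
```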
Application of the normal
distribution to the data
Mean = 595.28
Std Dev. s = 86.40
The distribution of the SATM scores for the CS students is
approximately normal with mean 595.28 and s.d. 86.40:
N(595.28, 86.40)
Problem: What is the percentage of CS students that had SAT math scores between 600
and 750?
Answer: Use the normal approximation. It is the area under the normal density curve
between 600 and 750.
How do we compute it?
We use the values of the normal distribution function
F(x) = P(X <= x), where X follows N(595.28, 86.40).
Approximate answer:
The percentage of students with SATM between 600 and 750 is computed
as
P(600 <= X <= 750) = P(X <= 750) - P(X <= 600),
where X follows N(595.28, 86.40).
Using Excel
• Select a cell, say A1
• Compute the area to the left of 600 as
=NORMDIST(600, 595.28, 86.40, 1).
• Compute the area to the left of 750 as
=NORMDIST(750, 595.28, 86.40, 1).
• The area under the curve between 600 and 750 is
=NORMDIST(750, 595.28, 86.40, 1) - NORMDIST(600, 595.28, 86.40, 1).
• The answer is 0.44: approximately 44% of CS students
in the survey have SATM between 600 and 750.
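The same subtraction of two left-hand areas can be cross-checked outside Excel. A minimal sketch using Python's `statistics.NormalDist`, with the mean and standard deviation taken from the slides:

```python
from statistics import NormalDist

# Normal approximation to the SAT math scores: N(595.28, 86.40)
sat_math = NormalDist(mu=595.28, sigma=86.40)

# Area between 600 and 750 = (area left of 750) - (area left of 600)
area = sat_math.cdf(750) - sat_math.cdf(600)
print(round(area, 2))  # 0.44
```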
In summary
Follow these steps:
1. State the problem. Calculate the sample average and the s.d., and
define the interval you are interested in.
2. Compute the area under the approximate normal density curve, with the
mean and s.d. calculated above, over that interval.
Example Problem
Problem: What is the lowest SAT math score that a student must have to be
in the top 25% of all CS students in the sample?
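Mean = 595.28
Std Dev. s = 86.40
(Figure: the top 25% of the area under the normal curve, to the right of the unknown score marked "?". For comparison, the sample Q3 = 650.)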
Find the value x, such that 25% of observations fall at or above it.
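The slides leave the answer open. One way to find x under the normal approximation, sketched here as an illustration, is to invert the distribution function: the score with 25% of the area at or above it is the one with 75% of the area below it. The example uses `inv_cdf` from Python's `statistics.NormalDist`.

```python
from statistics import NormalDist

# Normal approximation to the SAT math scores: N(595.28, 86.40)
sat_math = NormalDist(mu=595.28, sigma=86.40)

# 25% at or above x is the same as 75% of the area to the left of x
cutoff = sat_math.inv_cdf(0.75)
print(round(cutoff, 1))
```

Under these assumptions the cutoff comes out at roughly 653.6, close to the sample Q3 of 650 shown on the slide.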
Beware!
Is the normal approximation appropriate for these data?
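(Figure annotations: in some regions the normal curve underestimates the area under the histogram, and in others it overestimates it.)
Use the normal approximation only when the histogram of the observations is bell-shaped!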
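One hedged way to check the approximation on a given data set is to compare the observed fraction of observations in an interval with the area the normal curve assigns to it. The function name `empirical_vs_normal` and the `spending` values below are invented for illustration; they are not data from the slides.

```python
from statistics import NormalDist, mean, stdev

def empirical_vs_normal(sample, low, high):
    """Return (observed fraction, normal-approximation area) for [low, high]."""
    observed = sum(low <= x <= high for x in sample) / len(sample)
    approx = NormalDist(mean(sample), stdev(sample))
    return observed, approx.cdf(high) - approx.cdf(low)

# Made-up, right-skewed amounts (NOT data from the slides), illustration only
spending = [3, 5, 6, 8, 9, 11, 12, 14, 15, 18, 20, 22, 25, 30, 38, 45, 60, 95]

observed, normal = empirical_vs_normal(spending, 0, 10)
print(f"observed {observed:.2f}, normal approximation {normal:.2f}")
```

For this skewed toy sample the normal curve noticeably underestimates the share of small amounts, which is the kind of mismatch the warning is about.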