Modeling Continuous
Variables
Lecture 19
Section 6.1 - 6.3.1
Fri, Oct 6, 2006
Models


Mathematical model – An abstraction and,
therefore, a simplification of a real situation, one
that retains the essential features.
Real situations are usually much too complicated
to deal with in all their details.
Example

The “bell curve” is a model (an abstraction) of
many populations.
Real populations have all sorts of bumps and twists
and irregularities.
 The bell curve is smooth and perfectly symmetric.


In statistics, the bell curve is called the normal
curve, or normal distribution.
Models

Our models will be models of distributions,
presented either as histograms or as continuous
distributions.
Histograms and Area


In a histogram, frequency is represented by
area.
Consider the following distribution of test
scores.
Grade      Frequency
60 – 69    3
70 – 79    8
80 – 89    9
90 – 99    5
[Figure: histogram of the test scores; vertical axis Frequency (0 to 10), horizontal axis Grade (60 to 100)]
Histograms and Area


What is the total area of this histogram?
We will rescale the vertical scale so that the total
area equals 1, representing 100%.
Histograms and Area

To achieve this, we divide the frequencies by the
original area to get the density.
Grade      Frequency    Density
60 – 69    3            0.012
70 – 79    8            0.032
80 – 89    9            0.036
90 – 99    5            0.020
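The rescaling step can be checked with a short calculation. This is a minimal sketch, assuming each class is 10 points wide (60 to 70, 70 to 80, and so on), as the histogram's horizontal axis suggests; the variable names are chosen here for illustration.

```python
# Class boundaries and frequencies from the test-score table.
# Assumption: each class is treated as 10 points wide, matching the
# tick marks on the histogram's horizontal axis.
classes = [(60, 70), (70, 80), (80, 90), (90, 100)]
frequencies = [3, 8, 9, 5]

# Area of the original histogram: sum of width * height over all bars.
total_area = sum((hi - lo) * f for (lo, hi), f in zip(classes, frequencies))

# Dividing each frequency by the original area gives the density.
densities = [f / total_area for f in frequencies]

# After rescaling, the bar areas should add up to 1 (up to rounding).
rescaled_area = sum((hi - lo) * d for (lo, hi), d in zip(classes, densities))

print(total_area)                        # 250
print([round(d, 3) for d in densities])  # [0.012, 0.032, 0.036, 0.02]
print(round(rescaled_area, 6))           # 1.0
```

The original area works out to 250, so each density is the frequency divided by 250, which reproduces the Density column.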
[Figure: the same histogram rescaled; vertical axis Density (0 to 0.040), horizontal axis Grade (60 to 100); Total area = 1]
Histograms and Area


This histogram has the special property that the
proportion of grades in a class can be found by
computing the area of its rectangle.
For example, what proportion of the grades are
less than 80?

Compute: (10 × 0.012) + (10 × 0.032)
= 0.12 + 0.32 = 0.44 = 44%.
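The same area calculation can be written as a small function. This is a sketch only: the function name proportion_between is made up for illustration, and when a or b falls inside a class it assumes the grades are spread evenly across that class, which the table itself cannot confirm.

```python
# Density of each class, keyed by the lower class boundary.
# Assumption: every class is 10 points wide.
WIDTH = 10
density = {60: 0.012, 70: 0.032, 80: 0.036, 90: 0.020}

def proportion_between(a, b):
    """Area of the density histogram between a and b (with a <= b)."""
    area = 0.0
    for lo, d in density.items():
        hi = lo + WIDTH
        # Width of the part of this bar that lies between a and b.
        overlap = max(0.0, min(b, hi) - max(a, lo))
        area += overlap * d
    return area

print(round(proportion_between(60, 80), 2))   # 0.44
print(round(proportion_between(60, 100), 2))  # 1.0
```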
Density Functions

This is the fundamental property that connects
the graph of a continuous model to the
population that it represents, namely:

The area under the graph between two numbers a and b
on the x-axis represents the proportion of the population
that lies between a and b.
AREA = PROPORTION
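In symbols, the area property can be written as an integral. The name f for the density function is notation introduced here for illustration; the slides state the property in words only.

```latex
\text{proportion of the population between } a \text{ and } b
  \;=\; \int_a^b f(x)\,dx
```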
Density Functions


Now consider an arbitrary distribution.
The area under the curve between a and b is the
proportion of the values of x that lie between a
and b.
[Figure: a density curve over the x-axis with the region between a and b shaded; Area = Proportion]
Density Functions

Again, the total area under the curve must be 1,
representing a proportion of 100%.
[Figure: the same density curve with the entire region under it shaded and labeled 100%]
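Both properties can be checked numerically for a simple curve. The sketch below uses a hypothetical density, f(x) = 2x on the interval from 0 to 1, chosen only because its areas are easy to verify by hand; it is not a distribution from the lecture.

```python
def density(x):
    """Hypothetical density: f(x) = 2x on [0, 1], and 0 elsewhere."""
    return 2.0 * x if 0.0 <= x <= 1.0 else 0.0

def area_under(f, a, b, steps=1000):
    """Approximate the area under f between a and b (midpoint rule)."""
    width = (b - a) / steps
    return sum(f(a + (i + 0.5) * width) for i in range(steps)) * width

total = area_under(density, 0.0, 1.0)       # total area under the curve
upper_half = area_under(density, 0.5, 1.0)  # proportion of values in [0.5, 1]

print(round(total, 6))       # 1.0
print(round(upper_half, 6))  # 0.75
```

The total area is 1, and the area between 0.5 and 1 says that 75% of the values lie in that interval.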
The Normal Distribution


Normal distribution – The statistician’s name for
the bell curve.
It is a density function in the shape of a “bell.”
Symmetric.
Unimodal.
Extends over the entire real line (no endpoints).
“Main part” lies within 3σ of the mean.

The Normal Distribution

The curve has a bell shape, with infinitely long
tails in both directions.
The Normal Distribution

The mean μ is located in the center, at the peak.

The Normal Distribution

The “main” part of the curve is 6 standard
deviations wide (3 standard deviations
each way from the mean).
[Figure: normal curve with μ – 3σ, μ, and μ + 3σ marked on the axis]
The Normal Distribution


The area under the entire curve is 1.
(The area outside of 3 st. dev. is approx.
0.0027.)
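The figure of 0.0027 can be reproduced from the standard normal cumulative distribution function. This is a minimal sketch using the error function from Python's math module; the helper name standard_normal_cdf is chosen here for illustration.

```python
import math

def standard_normal_cdf(z):
    """Area under the standard normal curve to the left of z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Area within 3 standard deviations of the mean, and the area outside it.
inside = standard_normal_cdf(3) - standard_normal_cdf(-3)
outside = 1.0 - inside

print(round(inside, 4))   # 0.9973
print(round(outside, 4))  # 0.0027
```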
[Figure: normal curve with μ – 3σ, μ, and μ + 3σ marked on the axis; Area = 1]
The Normal Distribution


The normal distribution with mean μ and
standard deviation σ is denoted N(μ, σ).
For example, if X is a variable whose
distribution is normal with mean 30 and
standard deviation 5, then we say that “X is
N(30, 5).”
The Normal Distribution

If X is N(30, 5), then the distribution of X
looks like this:
[Figure: normal curve centered at 30, with 15 and 45 marked on the axis]
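The heights of this curve can be computed from the standard formula for the normal density, which the slides do not write out. The sketch below evaluates it for N(30, 5) at the three marked values; the function name normal_density is chosen here for illustration.

```python
import math

def normal_density(x, mean, sd):
    """Height of the normal curve N(mean, sd) at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

# The marked values 15 and 45 are 3 standard deviations from the mean 30.
for x in (15, 30, 45):
    print(x, round(normal_density(x, 30, 5), 6))
```

The curve is highest at the mean, 30, and has the same, much smaller, height at 15 and 45, which reflects its symmetry.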
Some Normal Distributions
[Figure: four normal curves on a common axis from 0 to 8: N(3, 1), N(5, 1), N(2, ½), and N(3½, 1½)]
Bag A vs. Bag B



Suppose we have two bags, Bag A and Bag B.
Each bag contains millions of vouchers.
In Bag A, the values of the vouchers have
distribution N(50, 10).
Normal with μ = $50 and σ = $10.
In Bag B, the values of the vouchers have
distribution N(80, 15).
Normal with μ = $80 and σ = $15.
Bag A vs. Bag B
H0: Bag A
H1: Bag B
[Figure: the two normal curves on a common axis from 30 to 110]
Bag A vs. Bag B


We are presented with one of the bags.
We select one voucher at random from that bag.
Bag A vs. Bag B

If its value is less than or equal to $65, then we
will decide that it was from Bag A.
[Figure: the H0: Bag A and H1: Bag B curves with the cutoff at 65 marked; values up to 65 form the Acceptance Region and values above 65 form the Rejection Region]
Bag A vs. Bag B

What is α?
[Figure: the H0: Bag A and H1: Bag B curves with the cutoff at 65; α is labeled]
Bag A vs. Bag B

What is β?
[Figure: the H0: Bag A and H1: Bag B curves with the cutoff at 65; β is labeled]
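Both questions can be answered numerically. The sketch below assumes the usual definitions, which are not spelled out at this point in the slides: α is the probability of deciding Bag B when the bag is really Bag A (a value above $65 drawn from N(50, 10)), and β is the probability of deciding Bag A when the bag is really Bag B (a value of at most $65 drawn from N(80, 15)). The helper name normal_cdf is chosen here for illustration.

```python
import math

def normal_cdf(x, mean, sd):
    """Area under the N(mean, sd) curve to the left of x."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))

CUTOFF = 65  # decide "Bag A" when the voucher is worth at most $65

# Assumed definition: alpha = P(value > 65) when the bag is Bag A, N(50, 10).
alpha = 1.0 - normal_cdf(CUTOFF, 50, 10)

# Assumed definition: beta = P(value <= 65) when the bag is Bag B, N(80, 15).
beta = normal_cdf(CUTOFF, 80, 15)

print(round(alpha, 4))  # 0.0668
print(round(beta, 4))   # 0.1587
```

Under these assumptions, α is the area under the Bag A curve to the right of 65, and β is the area under the Bag B curve to the left of 65.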
Bag A vs. Bag B

If the distributions are very close together, then
α and β will be large.
[Figure: the H0: Bag A and H1: Bag B curves; axis from 30 to 110 with the cutoff at 65 marked]
Bag A vs. Bag B

Similarly, if the distributions are far apart, then
α and β will both be very small.
[Figure: the H0: Bag A and H1: Bag B curves; axis from 30 to 110 with the cutoff at 65 marked]
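The effect of separation can be illustrated by moving Bag B's mean while keeping everything else fixed. This is a sketch under assumptions that are not in the slides: the alternative means 55 and 150 are hypothetical, and the cutoff is placed at the midpoint of the two means (for the original means, 50 and 80, that midpoint is the $65 cutoff used above). As before, α is taken to be the area under the Bag A curve above the cutoff and β the area under the Bag B curve at or below it.

```python
import math

def normal_cdf(x, mean, sd):
    """Area under the N(mean, sd) curve to the left of x."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))

def error_rates(mean_a, sd_a, mean_b, sd_b):
    """Return (alpha, beta) for a cutoff at the midpoint of the means."""
    cutoff = (mean_a + mean_b) / 2.0  # hypothetical decision rule
    alpha = 1.0 - normal_cdf(cutoff, mean_a, sd_a)  # Bag A value above cutoff
    beta = normal_cdf(cutoff, mean_b, sd_b)         # Bag B value at or below cutoff
    return alpha, beta

# Bag A stays N(50, 10); Bag B keeps sd 15 while its mean moves.
# The means 55 and 150 are made up; 80 is the value from the lecture.
for mean_b in (55, 80, 150):
    alpha, beta = error_rates(50, 10, mean_b, 15)
    print(mean_b, round(alpha, 4), round(beta, 4))
```

With the means only 5 apart, both error probabilities are around 0.4; with the means 100 apart, both are below 0.001.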