15.Math-Review
Statistics
Central Limit Theorem
Let us consider \(X_1, X_2, \dots, X_n\), n independent, identically distributed random variables with mean \(\mu\) and standard deviation \(\sigma\).
And define:
\[ S_n = \sum_{i=1}^{n} X_i \]
Central Limit Theorem
The Central Limit Theorem (CLT) states:
If n is large (say \(n \ge 30\)), then \(S_n\) approximately follows a normal distribution with mean \(n\mu\) and standard deviation \(\sigma\sqrt{n}\).
If n is large (say \(n \ge 30\)), then \(\frac{1}{n} S_n\) approximately follows a normal distribution with mean \(\mu\) and standard deviation \(\sigma/\sqrt{n}\).
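As a rough illustration of the CLT statement, the following Python sketch simulates sums of independent two-point random variables and compares the simulated mean and standard deviation of \(S_n\) with \(n\mu\) and \(\sigma\sqrt{n}\). The distribution used here (values 1 and 2, with probability 0.8 of a 2), the function name `simulate_sums`, the seed and the trial count are all assumptions made for this sketch; they are not details stated in the slides.

```python
import random
import statistics


def simulate_sums(n, trials=10_000, p_two=0.8, seed=0):
    """Return `trials` simulated values of S_n = X_1 + ... + X_n.

    Each X_i is 2 with probability p_two and 1 otherwise
    (an assumed distribution, chosen only for illustration).
    """
    rng = random.Random(seed)
    return [
        sum(2 if rng.random() < p_two else 1 for _ in range(n))
        for _ in range(trials)
    ]


# Mean and standard deviation of a single X_i under the assumed distribution
mu = 1 * 0.2 + 2 * 0.8          # 1.8
sigma = (0.8 * 0.2) ** 0.5      # 0.4, because X_i = 1 + Bernoulli(0.8)

for n in (1, 10, 30, 50):
    sums = simulate_sums(n)
    print(
        f"n={n:2d}  simulated mean={statistics.mean(sums):7.3f}  n*mu={n * mu:7.3f}  "
        f"simulated sd={statistics.pstdev(sums):6.3f}  sigma*sqrt(n)={sigma * n ** 0.5:6.3f}"
    )
```

Plotting a histogram of `sums` for each n would show the shape moving toward a bell curve as n grows.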
Central Limit Theorem
• Example: sums of a Bernoulli random variable
[Figure: four frequency charts of the simulated sum, 10,000 trials each. Panels: "Forecast: n=1", "Forecast: n=10", "Forecast: n=30", "Forecast: n=50".]
Central Limit Theorem
• Example: averages of a Bernoulli random variable
[Figure: four frequency charts of the simulated average, 10,000 trials each. Panels: "Forecast: n=1", "Forecast: mean, n=10", "Forecast: mean, n=30", "Forecast: mean, n=50".]
Central Limit Theorem
• Example: Compare a binomial random variable \(X \sim B(40, 0.2)\) with its normal approximation:
  What is the normal approximation?
  Compare \(P(X \le 10)\), \(P(X \le 20)\), \(P(X \le 30)\) for the binomial and the normal approximation.

x     BINOMIAL: X<=x   BINOMIAL: X<x   AVERAGE:   NORMAL:
5     0.16133          0.07591         0.11862    0.11784
10    0.83923          0.73178         0.78550    0.78540
20    0.99999          0.99998         0.99999    1.00000
30    1.00000          1.00000         1.00000    1.00000
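The comparison in this example can be recomputed with a few lines of Python. This is a minimal sketch using only the standard library: the helper names `binom_cdf` and `normal_cdf` are illustrative, and the normal approximation is taken to have mean np = 8 and standard deviation \(\sqrt{np(1-p)} = \sqrt{6.4}\). The AVERAGE figures are the midpoint of P(X <= x) and P(X < x), which is what the tabulated numbers are consistent with.

```python
from math import comb, erf, sqrt


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ B(n, p), summed directly from the probability mass function."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(0, k + 1))


def normal_cdf(x, mean, sd):
    """P(Z <= x) for Z ~ N(mean, sd^2), computed with the error function."""
    return 0.5 * (1.0 + erf((x - mean) / (sd * sqrt(2.0))))


n, p = 40, 0.2
mean = n * p                  # mean of the approximating normal
sd = sqrt(n * p * (1 - p))    # standard deviation of the approximating normal

for x in (5, 10, 20, 30):
    le = binom_cdf(x, n, p)        # P(X <= x)
    lt = binom_cdf(x - 1, n, p)    # P(X < x)
    print(
        f"x={x:2d}  P(X<=x)={le:.5f}  P(X<x)={lt:.5f}  "
        f"average={(le + lt) / 2:.5f}  normal={normal_cdf(x, mean, sd):.5f}"
    )
```

Averaging the two binomial probabilities splits the probability mass at x in half, which is why that column sits closest to the normal value.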
Sampling
Let us consider the following example.
We work at a phone company and we would like to be able to estimate the shape of the demand.
We assume that monthly household telephone bills follow a certain continuous probability distribution.
We have obtained the following data on monthly household telephone bills by interviewing 70 randomly chosen households (or rather their inhabitants) for the month of October.
Sampling
Table:

Respondent  October       Respondent  October       Respondent  October
Number      Phone Bill    Number      Phone Bill    Number      Phone Bill
 1          $  95.67      25          $  79.32      49          $  90.02
 2          $  82.69      26          $  89.12      50          $  61.06
 3          $  75.27      27          $  63.12      51          $  51.00
 4          $ 145.20      28          $ 145.62      52          $  97.71
 5          $ 155.20      29          $  37.53      53          $  95.44
 6          $  80.53      30          $  97.06      54          $  31.89
 7          $  80.81      31          $  86.33      55          $  82.35
 8          $  60.93      32          $  69.83      56          $  60.20
 9          $  86.67      33          $  77.26      57          $  92.28
10          $  56.31      34          $  64.99      58          $ 120.89
11          $ 151.27      35          $  57.78      59          $  35.09
12          $  96.93      36          $  61.82      60          $  69.53
13          $  65.60      37          $  74.07      61          $  49.85
14          $  53.43      38          $ 141.17      62          $  42.33
15          $  63.03      39          $  48.57      63          $  50.09
16          $ 139.45      40          $  76.77      64          $  62.69
17          $  58.51      41          $  78.78      65          $  58.69
18          $  81.22      42          $  62.20      66          $ 127.82
19          $  98.14      43          $  80.78      67          $  62.47
20          $  79.75      44          $  84.51      68          $  79.25
21          $  72.74      45          $  93.38      69          $  76.53
22          $  75.99      46          $ 139.23      70          $  74.13
23          $  80.35      47          $  48.06
24          $  49.42      48          $  44.51
Sampling
• From this information we would like to be able to estimate, for example:
  What is an estimate of the shape of the distribution of October household telephone bills?
  What is an estimate of the percentage of households whose October telephone bill is below $45.00?
  What is an estimate of the percentage of households whose October telephone bill is between $60.00 and $100.00?
  What is an estimate of the mean of the distribution of October household telephone bills?
  What is an estimate of the standard deviation of the distribution of October household telephone bills?
Sampling
• A population (or “universe”) is the set of all units of interest.
• A sample is a subset of the units of a population.
• A random sample is a sample collected in such a way that every unit in the population is equally likely to be selected.
• It is hard to ensure that a sample will be random.
Sampling
• In our example the population corresponds to all the households in our area of coverage.
• The random sample selected consisted of the 70 households (or their inhabitants) interviewed.
• For the random variables \(X_1, X_2, \dots, X_n\) corresponding to households 1, 2, …, n we observed \(x_1\) = $95.67, \(x_2\) = $82.69, …, \(x_n\) = $74.13.
• Note that if we had chosen a different random set of households we would have observed a different collection of values.
Sampling
• To fix notation:
  n will be our random sample size.
  \(X_1, X_2, \dots, X_n\) are random variables with a common unknown distribution f(x), which is what we want to study.
  \(x_1, x_2, \dots, x_n\) are the observations obtained by observing the outcome of our random sample. These are numbers!
  We try to use these numbers to estimate the characteristics of f(x), for example what the distribution is, what its mean and variance are, etc.
Sampling
• To “look” at the shape of the distribution of X it is useful to create a frequency table and histogram of the sample values \(x_1, x_2, \dots, x_n\).
[Figure: "Histogram of Sample of October Telephone Bills". Horizontal axis: Range for Oct. Bill; vertical axis: Number of households.]

Interval Limit   Frequency   %         Cumulative %
-30              0           0.00%     0.00%
30-40            3           4.29%     4.29%
40-50            6           8.57%     12.86%
50-60            7           10.00%    22.86%
60-70            13          18.57%    41.43%
70-80            12          17.14%    58.57%
80-90            11          15.71%    74.29%
90-100           9           12.86%    87.14%
100-110          0           0.00%     87.14%
110-120          0           0.00%     87.14%
120-130          2           2.86%     90.00%
130-140          2           2.86%     92.86%
140-150          3           4.29%     97.14%
150-160          2           2.86%     100.00%
160-             0           0.00%     100.00%
Sampling
• A histogram can be obtained from Excel; the output looks something like this:

Bin    Frequency  Cumulative %    Bin    Frequency  Cumulative %
30     0          .00%            70     13         18.57%
40     3          4.29%           80     12         35.71%
50     6          12.86%          90     11         51.43%
60     7          22.86%          100    9          64.29%
70     13         41.43%          60     7          74.29%
80     12         58.57%          50     6          82.86%
90     11         74.29%          40     3          87.14%
100    9          87.14%          150    3          91.43%
110    0          87.14%          130    2          94.29%
120    0          87.14%          140    2          97.14%
130    2          90.00%          160    2          100.00%
140    2          92.86%          30     0          100.00%
150    3          97.14%          110    0          100.00%
160    2          100.00%         120    0          100.00%
More   0          100.00%         More   0          100.00%
Sampling
• From this analysis we can give the following (qualitative) description of the shape of this distribution:
  An estimate of the shape of the distribution of October telephone bills in the site area is that it is shaped like a Normal distribution, with a peak near $65.00, except for a small but significant group in the range between $125.00 and $155.00.
Sampling
• To answer the other relevant questions we can use the original data: count the favorable outcomes and divide by the total number of possible outcomes (70):
  \(P(X \le 45.00) = 5/70 = 0.07\)
  \(P(60.00 \le X \le 100.00) = 45/70 = 0.64\)
• Here we are approximating the unknown continuous distribution by the discrete distribution given by the outcomes of the sample.
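The counting behind these two proportions, and behind the frequency table of the sample, can be sketched in Python as follows. The list `bills` holds the 70 October phone bills in respondent order, copied from the data table; the variable names and the choice of bins (lower limit excluded, upper limit included, in steps of $10 from $30 to $160) are assumptions of this sketch.

```python
# October phone bills of the 70 sampled households, respondents 1 to 70
bills = [
    95.67, 82.69, 75.27, 145.20, 155.20, 80.53, 80.81, 60.93, 86.67, 56.31,
    151.27, 96.93, 65.60, 53.43, 63.03, 139.45, 58.51, 81.22, 98.14, 79.75,
    72.74, 75.99, 80.35, 49.42, 79.32, 89.12, 63.12, 145.62, 37.53, 97.06,
    86.33, 69.83, 77.26, 64.99, 57.78, 61.82, 74.07, 141.17, 48.57, 76.77,
    78.78, 62.20, 80.78, 84.51, 93.38, 139.23, 48.06, 44.51, 90.02, 61.06,
    51.00, 97.71, 95.44, 31.89, 82.35, 60.20, 92.28, 120.89, 35.09, 69.53,
    49.85, 42.33, 50.09, 62.69, 58.69, 127.82, 62.47, 79.25, 76.53, 74.13,
]
n = len(bills)

# Empirical proportions: favorable outcomes divided by the sample size
below_45 = sum(1 for x in bills if x <= 45.00)
between_60_100 = sum(1 for x in bills if 60.00 <= x <= 100.00)
print(f"P(X <= 45)        ~ {below_45}/{n} = {below_45 / n:.2f}")
print(f"P(60 <= X <= 100) ~ {between_60_100}/{n} = {between_60_100 / n:.2f}")

# Frequency table with bins (30, 40], (40, 50], ..., (150, 160]
edges = list(range(30, 170, 10))
counts = [
    sum(1 for x in bills if lo < x <= hi)
    for lo, hi in zip(edges[:-1], edges[1:])
]
cumulative = 0
for (lo, hi), count in zip(zip(edges[:-1], edges[1:]), counts):
    cumulative += count
    print(f"{lo:3d}-{hi:3d}  {count:2d}  {count / n:7.2%}  {cumulative / n:7.2%}")
```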
Sampling
• Sample mean, variance and standard deviation:
• From our observed values \(x_1, x_2, \dots, x_n\), we can compute:
The observed sample mean,
\[ \bar{x} = \frac{x_1 + \cdots + x_n}{n} = \frac{1}{n}\sum_{i=1}^{n} x_i \]
The observed sample variance,
\[ s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2 \]
The observed sample standard deviation,
\[ s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2} \]
Sampling
• In our example we have:
The observed sample mean,
\[ \bar{x} = \frac{x_1 + \cdots + x_n}{n} = \frac{95.67 + 82.69 + \cdots + 74.13}{70} = \$79.40 \]
and the observed sample standard deviation,
\[ s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2} = \sqrt{\frac{(95.67 - 79.40)^2 + \cdots + (74.13 - 79.40)^2}{69}} = \$28.79 \]
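These two numbers can be checked with Python's standard `statistics` module, whose `stdev` function uses the same n - 1 divisor as the sample standard deviation formula. The list `bills` is the 70-value sample copied from the data table; the variable names are assumptions of this sketch.

```python
import statistics

# October phone bills of the 70 sampled households, respondents 1 to 70
bills = [
    95.67, 82.69, 75.27, 145.20, 155.20, 80.53, 80.81, 60.93, 86.67, 56.31,
    151.27, 96.93, 65.60, 53.43, 63.03, 139.45, 58.51, 81.22, 98.14, 79.75,
    72.74, 75.99, 80.35, 49.42, 79.32, 89.12, 63.12, 145.62, 37.53, 97.06,
    86.33, 69.83, 77.26, 64.99, 57.78, 61.82, 74.07, 141.17, 48.57, 76.77,
    78.78, 62.20, 80.78, 84.51, 93.38, 139.23, 48.06, 44.51, 90.02, 61.06,
    51.00, 97.71, 95.44, 31.89, 82.35, 60.20, 92.28, 120.89, 35.09, 69.53,
    49.85, 42.33, 50.09, 62.69, 58.69, 127.82, 62.47, 79.25, 76.53, 74.13,
]

x_bar = statistics.mean(bills)     # observed sample mean
s = statistics.stdev(bills)        # observed sample standard deviation (n - 1 divisor)
s_squared = statistics.variance(bills)  # observed sample variance

print(f"n = {len(bills)}, sample mean = {x_bar:.2f}, sample sd = {s:.2f}")
```

Note that `statistics.pstdev` would divide by n instead of n - 1 and give a slightly smaller value.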
Sampling
• We will use these observed values to estimate the unknown mean \(\mu\) and standard deviation \(\sigma\) of our unknown underlying distribution.
• In other words:
• \(\bar{x}\) will estimate \(\mu\)
• s will estimate \(\sigma\)
• Also note that if we pick a different sample of the population, our observed values will be different.
• We can define the random variables sample mean and sample standard deviation, of which \(\bar{x}\) and s are observations.
Sampling
• Before the sample is collected, the random variables \(X_1, X_2, \dots, X_n\) can be used to define:
The sample mean,
\[ \bar{X} = \frac{X_1 + \cdots + X_n}{n} = \frac{1}{n}\sum_{i=1}^{n} X_i \]
The sample variance,
\[ S^2 = \frac{1}{n-1}\sum_{i=1}^{n} (X_i - \bar{X})^2 \]
The sample standard deviation,
\[ S = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n} (X_i - \bar{X})^2} \]
Sampling
• \(\bar{X}\) and S are random variables.
• We distinguish between the sample mean \(\bar{X}\), which is a random variable, and the observed sample mean \(\bar{x}\), which is a number.
• Similarly, the sample standard deviation S is a random variable, and the observed sample standard deviation s is a number.
Sampling
• Distribution of \(\bar{X}\)
From the formula that defines the sample mean we see that, according to the CLT, it should follow approximately a normal distribution (if \(n \ge 30\)).
The mean is \(E(\bar{X}) = \mu\)
The standard deviation is \(SD(\bar{X}) = \sigma/\sqrt{n}\)
• In summary:
\[ \bar{X} \sim N\left(\mu, \left(\sigma/\sqrt{n}\right)^2\right) \approx N\left(\bar{x}, \left(s/\sqrt{n}\right)^2\right) \]
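The mean and standard deviation of the sample mean follow from linearity of expectation and from the independence of the observations. A short derivation, assuming only that \(X_1, \dots, X_n\) are independent with common mean \(\mu\) and standard deviation \(\sigma\):

```latex
\[
E(\bar{X}) = E\!\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right)
           = \frac{1}{n}\sum_{i=1}^{n} E(X_i)
           = \frac{1}{n}\, n\mu = \mu
\]
\[
\mathrm{Var}(\bar{X}) = \frac{1}{n^2}\sum_{i=1}^{n} \mathrm{Var}(X_i)
                      = \frac{n\sigma^2}{n^2} = \frac{\sigma^2}{n},
\qquad
\mathrm{SD}(\bar{X}) = \sqrt{\mathrm{Var}(\bar{X})} = \frac{\sigma}{\sqrt{n}}
\]
```

The variance step is where independence is used: without it, covariance terms would appear in the sum.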
Sampling
• Example: At two different branches of the G-Mart department store, they randomly sampled 100 customers on August 13. At Store 1, the average amount purchased was $41.25 per customer, with a sample standard deviation of $24.00. At Store 2, the average amount purchased was $45.75, with a sample standard deviation of $34.00. Let X denote the amount of a random purchase by a single customer at Store 1 and let Y denote the amount of a random purchase by a single customer at Store 2. Assuming that X and Y follow a joint normal distribution, what is the distribution of \(X - Y\)? What is the probability that the mean of X exceeds the mean of Y?
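One possible way to approach this example numerically is sketched below. It is not the slides' own solution. It assumes that X and Y are independent (the slide only says jointly normal), that 100 customers were sampled at each store, and that the unknown means and standard deviations can be replaced by their sample estimates, in the spirit of the approximation \(\bar{X} \approx N(\bar{x}, (s/\sqrt{n})^2)\). The helper `normal_cdf` and all variable names are illustrative.

```python
from math import erf, sqrt


def normal_cdf(x, mean=0.0, sd=1.0):
    """P(Z <= x) for Z ~ N(mean, sd^2)."""
    return 0.5 * (1.0 + erf((x - mean) / (sd * sqrt(2.0))))


n = 100                      # customers sampled per store (assumed to apply to each store)
x_bar, s_x = 41.25, 24.00    # Store 1: sample mean and sample standard deviation
y_bar, s_y = 45.75, 34.00    # Store 2: sample mean and sample standard deviation

# Difference of two single purchases, X - Y, under independence:
diff_mean = x_bar - y_bar                   # estimated mean of X - Y
diff_sd_single = sqrt(s_x**2 + s_y**2)      # estimated sd of X - Y

# Difference of the two sample means: the variances shrink by a factor of n
diff_se_means = sqrt(s_x**2 / n + s_y**2 / n)

# Approximate probability that the mean of X exceeds the mean of Y
p_mean_x_exceeds = 1.0 - normal_cdf(0.0, diff_mean, diff_se_means)

print(f"X - Y: mean {diff_mean:.2f}, sd {diff_sd_single:.2f}")
print(f"difference of means: standard error {diff_se_means:.3f}")
print(f"approx. P(mean of X > mean of Y) = {p_mean_x_exceeds:.3f}")
```

If X and Y were correlated, a covariance term would have to be subtracted inside both square roots.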
Sampling
• Example: In the quality control department of our company, knobs are inspected to make sure that they meet quality standards. Since it is not practical to test every knob, we draw a random sample to test. It is essential that our knobs weigh at least 0.45 pounds. If we know that the average weight is less than 0.45 pounds, we stop the production line and reset all the machines. In a day we produce 300,000 knobs, and draw a random sample of 1,000 knobs to test. If yesterday (Wednesday) the observed sample mean was 0.42 pounds, and the observed sample standard deviation was 0.2,
• how confident are you that the average weight of knobs is less than 0.45 pounds?
• If the average weight of knobs produced is 0.45 pounds, with a standard deviation of 0.2, what is the probability that the average weight of the sample will be 0.42 or lower?
• Are these questions the same?
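The second question in this example is a direct application of the distribution of the sample mean, and can be sketched in Python as follows. The sketch assumes the sample mean is approximately \(N(\mu, (\sigma/\sqrt{n})^2)\) with \(\mu = 0.45\), \(\sigma = 0.2\) and n = 1,000, and it ignores any finite-population correction for sampling 1,000 knobs out of 300,000. The function and variable names are illustrative.

```python
from math import erfc, sqrt


def normal_lower_tail(z):
    """P(Z <= z) for a standard normal Z, using erfc for accuracy in the far tail."""
    return 0.5 * erfc(-z / sqrt(2.0))


n = 1_000          # sample size
mu = 0.45          # hypothesised average weight (pounds)
sigma = 0.2        # hypothesised standard deviation (pounds)
x_bar = 0.42       # observed sample mean (pounds)

standard_error = sigma / sqrt(n)       # standard deviation of the sample mean
z = (x_bar - mu) / standard_error      # standardised distance of 0.42 from 0.45
p_low_sample_mean = normal_lower_tail(z)

print(f"standard error = {standard_error:.5f}")
print(f"z = {z:.3f}")
print(f"P(sample mean <= {x_bar}) = {p_low_sample_mean:.2e}")
```

The first question reverses the roles: it treats the observed 0.42 and 0.2 as given and asks about the unknown average, so it is worth working through separately before deciding whether the two questions coincide.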