Engr/Math/Physics 25
Chp7
Statistics-1
Bruce Mayer, PE
Licensed Electrical & Mechanical Engineer
[email protected]
Engineering/Math/Physics 25: Computational Methods
1
Bruce Mayer, PE
[email protected] • ENGR-25_Lec-18_Statistics-1.pptx
Learning Goals
 Use MATLAB to solve Problems in
• Statistics
• Probability
 Use Monte Carlo (random) Methods to
Simulate Random processes
 Properly Apply InterPolation or
ExtraPolation to Estimate values
between or outside of known data points
Histogram
 Histograms are
COLUMN Plots that
show the
Distribution of Data
• Height Represents
Data Frequency
 Some General
Characteristics
• Used to represent
continuous grouped,
or BINNED, data
– BIN ≡ SubRange
within the Data
• Usually Does not
have any gaps
between bars
• Areas represent
%-of-Total Data
HistoGram ≡ Frequency Chart
 A HistoGram shows how OFTEN some
event Occurs
• Histograms are
often constructed
using Frequency
Tables
Histograms In MATLAB
 MATLAB has 6
Forms of the
Histogram Cmd
 The Simplest
hist(y)
• Generates a
Histogram with
10 bins
 Example: HI Temps
at Oakland AirPort in
Jul-Aug08
TmaxOAK = [70, 75, 63, 64, 65, 66, ...
65, 65, 67, 78, 75, 73, 79, ...
71, 72, 67, 69, 69, 70, 74, ...
71, 72, 71, 74, 77, 77, 86, ...
90, 90, 70, 71, 66, 66, 72, ...
68, 73, 72, 82, 91, 82, 76, ...
75, 72, 72, 69, 70, 68, 65, ...
67, 65, 63, 64, 72, 70, 68, ...
71, 77, 65, 63, 69, 69, 67];
 The Plot Statement
hist(TmaxOAK), ylabel('No. Days'), xlabel('Max. Temp (°F)'), ...
title('Oakland Airport - Jul-Aug08')
hist Result for Oakland
[Figure: histogram titled “Oakland Airport - Jul-Aug08”; x-axis: Max. Temp (°F), 60 to 95; y-axis: No. Days, 0 to 15]
 It was COLD in Summer 08
 Bin Width = (91-63)/10 = 2.8 °F
Histograms In MATLAB
 Next Example: Max
Temp at Stockton
AirPort in Jul-Aug08
hist(y)
• Generates a
Histogram with
10 bins
TmaxSTK = [94, 98, 93, 94, ...
91, 96, 93, 87, 89, 94, ...
100, 99, 103, 103, 103, 97, ...
91, 83, 84, 90, 89, 95, 94, ...
99, 97, 94, 102, 103, 107, ...
98, 86, 89, 95, 91, 84, 93, ...
98, 104, 105, 107, 103, 91, ...
90, 96, 93, 86, 92, 93, 95, ...
95, 86, 81, 93, 97, 96, 97, ...
101, 92, 89, 92, 93, 94];
 The Plot Statement
hist(TmaxSTK), ylabel('No. Days'), xlabel('Max. Temp (°F)'), ...
title('Stockton Airport - Jul-Aug08')
hist Result for Stockton
[Figure: histogram titled “Stockton Airport - Jul-Aug08”; x-axis: Max. Temp (°F), 80 to 110; y-axis: No. Days, 0 to 16]
 It was HOT in Summer 08
 Bin Width = (107-81)/10 = 2.6 °F
hist Command Refinements
 Adjust The number and width of the bins using
hist(y,N)
hist(y,x)
• Where
– N ≡ an integer specifying the NUMBER of Bins
– x ≡ A vector that Specifies Bin CENTERs
 Consider Summer 08 HI-Temp Data from Oakland and Stockton
 Make 2 Histograms
• 17 bins
• 60 °F → 110 °F by 2.5’s
hist Plots → 17 Bins
>> hist(TmaxSTK,17), ylabel('No. Days'), xlabel('Max. Temp (°F)'), title('Stockton, CA - Jul-Aug08')
>> hist(TmaxOAK,17), ylabel('No. Days'), xlabel('Max. Temp (°F)'), title('Oakland, CA - Jul-Aug08')
[Figure: two 17-bin histograms. Left: “Stockton, CA - Jul-Aug08”, x-axis: Max. Temp (°F), 80 to 110. Right: “Oakland, CA - Jul-Aug08”, x-axis: Max. Temp (°F), 60 to 95. y-axis on both: No. Days, 0 to 10]
hist Plots → Same Scale
>> x = [60:2.5:110];
>> hist(TmaxSTK,x), ylabel('No. Days'), xlabel('Max. Temp (°F)'), title('Stockton, CA - Jul-Aug08')
>> hist(TmaxOAK,x), ylabel('No. Days'), xlabel('Max. Temp (°F)'), title('Oakland, CA - Jul-Aug08')
[Figure: two histograms on the same bin centers. Left: “Stockton, CA - Jul-Aug08”. Right: “Oakland, CA - Jul-Aug08”. x-axis on both: Max. Temp (°F), 60 to 110; y-axis on both: No. Days, 0 to 16]
hist Numerical Output
 Hist can also
provide numerical
Data about the
Histogram
n = hist(y)
• Gives the number of
values in each of the
(default) 10 Bins
 For the Stockton
data
>> k = hist(TmaxSTK)
k =
2 5 1 10 16 7 9 2 7 3
 We can also spec
the number and/or
Width of Bins
>> k13 = hist(TmaxSTK,13)
k13 =
2 2 4 4 6 10 10 7 5 2 6 2 2
>> k2_5s = hist(TmaxOAK,x)
hist Numerical Output
 Bin-Count and Bin-Locations
(Frequency Table) for the Oakland Data
>> [u, v] = hist(TmaxOAK,x)
u =
0 3 11 7 15 9 6 4 1 2 1 0 3 0 0 0 0 0 0 0 0
v =
60.0000 62.5000 65.0000 67.5000 70.0000 72.5000 75.0000
77.5000 80.0000 82.5000 85.0000 87.5000 90.0000 92.5000
95.0000 97.5000 100.0000 102.5000 105.0000 107.5000 110.0000
Histogram Commands - 1
Command      Description
bar(x,y)     Creates a bar chart of y versus x.
hist(y)      Aggregates the data in the vector y into 10 bins evenly spaced between the minimum and maximum values in y.
hist(y,n)    Aggregates the data in the vector y into n bins evenly spaced between the minimum and maximum values in y.
hist(y,x)    Aggregates the data in the vector y into bins whose center locations are specified by the vector x. The bin widths are the distances between the centers.
Histogram Commands - 2
Command              Description
[z,x] = hist(y)      Same as hist(y) but returns two vectors z and x that contain the frequency count and the 10 bin locations.
[z,x] = hist(y,n)    Same as hist(y,n) but returns two vectors z and x that contain the frequency count and the n bin locations.
[z,x] = hist(y,x)    Same as hist(y,x) but returns two vectors z and x that contain the frequency count and the bin locations. The returned vector x is the same as the user-supplied vector x.
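The two-output forms in the table can be tied back to the plots with a short sketch. It assumes the Stockton vector from the earlier slide; the names z, ctr, binWidth, and relFreq are illustrative and do not come from the lecture files.

```matlab
% Stockton Jul-Aug08 max temps (same data as the earlier slide)
TmaxSTK = [94, 98, 93, 94, 91, 96, 93, 87, 89, 94, ...
           100, 99, 103, 103, 103, 97, 91, 83, 84, 90, ...
           89, 95, 94, 99, 97, 94, 102, 103, 107, 98, ...
           86, 89, 95, 91, 84, 93, 98, 104, 105, 107, ...
           103, 91, 90, 96, 93, 86, 92, 93, 95, 95, ...
           86, 81, 93, 97, 96, 97, 101, 92, 89, 92, 93, 94];
% Two-output form: returns counts z and bin centers ctr, draws nothing
[z, ctr] = hist(TmaxSTK, 13);
binWidth = diff(ctr);      % spacing between adjacent bin centers
relFreq  = z / sum(z);     % fraction of the 62 days that fall in each bin
% bar() with unit relative width redraws the histogram from the table
bar(ctr, z, 1), xlabel('Max. Temp (°F)'), ylabel('No. Days')
```

With 13 bins between 81 °F and 107 °F every entry of binWidth should come out as 2 °F, and the counts in z should add up to the 62 days in the record.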
Bar vs. Hist
 Bar is Sequential while HIST is GROUPED
[Figure, left panel (BAR): bar chart titled “Tmax in Stockton, CA • Jul-Aug08”; x-axis: day, 0 to 60; y-axis: MaxTemp (°F), 80 to 110]
[Figure, right panel (HIST): histogram titled “Stockton Airport - Jul-Aug08”; x-axis: Max. Temp (°F), 80 to 110; y-axis: No. Days, 0 to 16]
BAR construction file
% Bruce Mayer, PE • 06Apr16
% ENGR25
clear, close, clc
% The data
% '...' continues the row vector onto the next line
TmaxSTK = [94, 98, 93, 94, 91, 96, 93, 87, 89, ...
94, 100, 99, 103, 103, 103, 97, 91, 83, 84, 90, ...
89, 95, 94, 99, 97, 94, 102, 103, 107, 98, 86, ...
89, 95, 91, 84, 93, 98, 104, 105, 107, 103, 91, ...
90, 96, 93, 86, 92, 93, 95, 95, 86, 81, 93, 97, ...
96, 97, 101, 92, 89, 92, 93, 94];
%
% the BAR graph: one bar per day, in calendar order
bar(TmaxSTK), axis([ 1 62 80 110]), grid
xlabel('day'); ylabel('MaxTemp (°F)')
title('Tmax in Stockton, CA • Jul-Aug08')
whitebg([0.8 1 1])
Check Default Bin Widths
• The previous
HandCalc of 2.8 °F
CONFIRMED by
MATLAB
– Note use of diff
command
 Oakland
>> Tlo = min(TmaxOAK)
Tlo =
63
>> Thi = max(TmaxOAK)
Thi =
91
>> [n,BinCtr] = hist(TmaxOAK)
n =
11 10 15 11 7 2 2 0 1 3
BinCtr =
64.4000 67.2000 70.0000 72.8000 75.6000 78.4000 81.2000 84.0000 86.8000 89.6000
>> DelBC = diff(BinCtr)
DelBC =
2.8000 2.8000 2.8000 2.8000 2.8000 2.8000 2.8000 2.8000 2.8000
Check Default Bin Widths
• The previous
HandCalc of 2.6 °F
CONFIRMED by
MATLAB
– Note use of diff
command
 Stockton
>> Tlo = min(TmaxSTK)
Tlo =
81
>> Thi = max(TmaxSTK)
Thi =
107
>> [n,BinCtr] = hist(TmaxSTK)
n =
2 5 1 10 16 7 9 2 7 3
BinCtr =
82.3000 84.9000 87.5000 90.1000 92.7000 95.3000 97.9000 100.5000 103.1000 105.7000
>> DelBC = diff(BinCtr)
DelBC =
2.6000 2.6000 2.6000 2.6000 2.6000 2.6000 2.6000 2.6000 2.6000
Data Statistics Tool - 1
 Make LinePlot of Temp
Data for
Stockton, CA
 Use the Tools
Menu to find
the Data
Statistics Tool
Data Statistics Tool - 2
 Use the
Tool to Add
Plot Lines
for the
Temp Data
• The Mean
• ±StdDev
Data Statistics Tool - 3
 Quite a Nice
Tool,
Actually
 The Result
 The Avg
Max Temp
Was
96.97 °F
Probability
 Probability ≡ The LIKELIHOOD that a
Specified OutCome Will be Realized
• The “Odds” Run from 0% to 100%
 Class Question: What are the
Odds of winning the California
MEGA-MILLIONS Lottery?
258 890 850 ... EXACTLY???!!!
 To Win the MegaMillions Lottery
• Pick five numbers from 1 to 75
• Pick a MEGA number from 1 to 15
 The Odds for the 1st ping-pong Ball
= 5 out of 75
 The Odds for the 2nd ping-pong Ball
= 4 out of 74, and so On
 The Odds for the MEGA are 1 out of 15
258 890 850... Calculated
 Calc the OverAll Odds as the
PRODUCT of each of the Individual
OutComes
$$\text{Odds} = \left(\frac{5}{75}\cdot\frac{4}{74}\cdot\frac{3}{73}\cdot\frac{2}{72}\cdot\frac{1}{71}\right)\cdot\frac{1}{15} = \frac{5!\,70!}{75!}\cdot\frac{1}{15} = \frac{120}{31{,}066{,}902{,}000} = \frac{1}{258{,}890{,}850}$$
• This is Technically a COMBINATION
258 890 850... is a DEAL!
 The ORDER in Which the Ping-Pong
Balls are Drawn Does NOT affect the
Winning Odds
 If we Had to Match the Pull-Order:
$$\text{Odds} = \frac{1}{75}\cdot\frac{1}{74}\cdot\frac{1}{73}\cdot\frac{1}{72}\cdot\frac{1}{71}\cdot\frac{1}{15} = \frac{70!}{75!}\cdot\frac{1}{15} = \frac{1}{31{,}066{,}902{,}000}$$
 120X the Current
• This is a PERMUTATION
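The combination and permutation counts can be checked in a few lines. This is a minimal sketch using the built-in nchoosek and prod; the variable names are illustrative only.

```matlab
% Order does NOT matter: C(75,5) ways to pick five balls, times 15 MEGA numbers
nCombos = nchoosek(75, 5) * 15;
% Same odds written as the product of the individual draws
pWin = prod((5:-1:1) ./ (75:-1:71)) / 15;
% Order DOES matter: 75*74*73*72*71 ordered draws, times 15 MEGA numbers
nPerms = prod(75:-1:71) * 15;
% The two counts differ by 5! = 120, the number of orderings of five balls
ratio = nPerms / nCombos;
fprintf('1 in %d (combination), 1 in %d (permutation)\n', nCombos, nPerms)
```

Both counts are whole numbers far below the limit for exact integers in double precision, so no round-off enters the comparison.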
Normal Distribution - 1
 Consider Data (Freq Tab) on the Height
of a sample group of 20 year old Men
 Plot this Frequency Data using bar
>> y_abs=[1,0,0,0,2,4,5,4,8,11,12,10,9,8,7,5,4,4,3,1,1,0,1];
>> xbins = [64:0.5:75];
>> bar(xbins, y_abs), ylabel('No.'), xlabel('Height (Inches)'), title('Height of 20 Yr-Old Men')
[Figure: bar chart titled “Height of 20 Yr-Old Men”; x-axis: Height (Inches), 62 to 76; y-axis: No., 0 to 12]
Ht (in)   No.
64        1
64.5      0
65        0
65.5      0
66        2
66.5      4
67        5
67.5      4
68        8
68.5      11
69        12
69.5      10
70        9
70.5      8
71        7
71.5      5
72        4
72.5      4
73        3
73.5      1
74        1
74.5      0
75        1
Normal Distribution - 2
 We can also SCALE the Bar/Hist such that
the AREA UNDER the CURVE equals 1.00,
exactly
 The Game Plan for Scaling
• Calc the Height of Each Bar To Get the
Total Area = Σ([Bin Width] x [individual counts])
$$A = \sum_k \Delta A_k = \sum_k BW_k \times IC_k$$
• The individual Bar Area =
[Bin Width] x [individual count]
• %-Area any one bar → [Bar Area]/[Total Area]
Normal Distribution - 3
 Use bar to construct the Scaled
Histogram with Area of 1.0000
• See File Scaled_Histogram_1206.m
 Would need to enter all 100 raw data pts to use hist
– Again, use bar to construct histogram
[Figure: bar chart titled “Height of 20 Yr-Old Men”; x-axis: Height (Inches), 62 to 76; y-axis: Frequency, 0 to 0.12]
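The scaling game plan can be sketched as follows. This is an independent illustration built from the frequency table on the earlier slide, not a copy of Scaled_Histogram_1206.m (which is not reproduced here); the names TotArea, y_scaled, fracArea, and AreaChk are assumptions.

```matlab
% Frequency table from the slides: 23 bins, each 0.5 in wide, 100 men total
xbins = 64:0.5:75;
y_abs = [1,0,0,0,2,4,5,4,8,11,12,10,9,8,7,5,4,4,3,1,1,0,1];
BW = 0.5;                          % bin width (in)
TotArea  = sum(BW * y_abs);        % area of the unscaled bars: 0.5*100 = 50
y_scaled = y_abs / TotArea;        % bar heights that make the total area 1
fracArea = BW * y_scaled;          % fractional area carried by each bar
AreaChk  = sum(fracArea);          % should be 1 to within round-off
bar(xbins, y_scaled, 1), ylabel('Frequency'), xlabel('Height (Inches)'), ...
    title('Height of 20 Yr-Old Men')
```

Here y_scaled holds the bar heights and fracArea the share of the total area in each bar; the tallest bar (69 in, 12 men) carries a fractional area of 0.12.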
Probability Distribution Fcn (PDF)
 Because the Area
Under the Scaled
Plot is 1.00, exactly,
The FRACTIONAL
Area under any bar,
or set-of-bars gives
the probability that
any randomly
Selected 20 yr-old
man will be that
height
 e.g., from the Plot
we Find
• 67.5 in → 4%
• 68 in → 8%
• 68.5 in → 11%
 Summing → 23 %
 Thus by this dataset 23% of 20 yr-old
men are 67.25–68.75 inches tall
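The 23% figure can be recovered directly from the frequency table. A minimal sketch, assuming the xbins and y_abs vectors from the earlier slide; inRange and P are illustrative names.

```matlab
xbins = 64:0.5:75;
y_abs = [1,0,0,0,2,4,5,4,8,11,12,10,9,8,7,5,4,4,3,1,1,0,1];
% Bars centered at 67.5, 68, and 68.5 in together span 67.25 to 68.75 in
inRange = (xbins >= 67.5) & (xbins <= 68.5);
% Equal bin widths cancel, so the fractional area is a ratio of counts
P = sum(y_abs(inRange)) / sum(y_abs);
fprintf('Fraction between 67.25 in and 68.75 in: %.0f%%\n', 100*P)
```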
Random Variable
 A random variable x takes on a defined set of
values with different probabilities; e.g.:
• If you roll a die, the outcome is random (not fixed)
and there are 6 possible outcomes, each of which
occurs with equal probability of one-sixth.
• If you poll people about their voting preferences,
the percentage of the sample that responds “Yes
on Proposition 101” is also a random variable
– the %-age will be slightly different every time you poll.
 Roughly, probability is how frequently we
expect different outcomes to occur if we
repeat the experiment over and over
(“frequentist” view)
Random variables can be Discrete
or Continuous
 Discrete random variables have a
countable number of outcomes
• Examples: Dead/Alive, Red/Black,
Heads/Tails, dice, deck of cards, etc.
 Continuous random variables have an
infinite continuum of possible values.
• Examples: Battery Current, human weight,
Air Temperature, the speed of a car, the
real numbers from 7 to 11.
Probability Distribution Functions
 A Probability Distribution Function
(PDF) maps the possible values of x
against their respective probabilities of
occurrence, p(x)
 p(x) is a number from 0 to 1.0, or
alternatively, from 0% to 100%.
 The area under a probability
distribution function curve is
always 1 (or 100%).
Discrete Example: Roll The Die
x     p(x)
1     p(x=1)=1/6
2     p(x=2)=1/6
3     p(x=3)=1/6
4     p(x=4)=1/6
5     p(x=5)=1/6
6     p(x=6)=1/6
[Figure: plot of p(x) vs x; every outcome x = 1 to 6 has the same height, 1/6]
$$\sum_{\text{all } x} p(x) = \frac{1}{6}+\frac{1}{6}+\frac{1}{6}+\frac{1}{6}+\frac{1}{6}+\frac{1}{6} \quad\text{or}\quad \sum_{\text{all } x} p(x) = 6\cdot\frac{1}{6} \quad\text{so}\quad \sum_{\text{all } x} p(x) = 1$$
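The frequentist reading of the die table can be tried with a small Monte Carlo run. This is a sketch only: the number of rolls is an arbitrary assumption, and the names Nroll, rolls, counts, and p_est are illustrative.

```matlab
Nroll  = 60000;                         % number of simulated rolls (assumed)
rolls  = randi(6, Nroll, 1);            % each face 1..6 equally likely
counts = accumarray(rolls, 1, [6 1]);   % tally how often each face came up
p_est  = counts / Nroll;                % relative frequency of each face
disp([(1:6)', p_est])                   % each estimate should sit near 1/6
```

The estimates change slightly from run to run, but they always sum to 1, and they tighten around 1/6 as Nroll grows.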
Continuous Case
 The probability function that accompanies a
continuous random variable is a continuous
mathematical function that integrates to 1.
 The Probabilities associated with
continuous functions are just areas under a
Region of the curve (→ Definite Integrals)
 Probabilities are given for a range of
values, rather than a particular value
• e.g., the probability of getting a math SAT
score between 700 and 800 is 2%.
Continuous Case PDF Example
 Recall the negative exponential function
(in probability, this is called
an “exponential distribution”):
$$f(x) = e^{-x}$$
 This Function Integrates to 1 (as
required for all PDF’s) for limits of 0 → ∞
$$\int_0^{\infty} e^{-x}\,dx = \left.\frac{e^{-x}}{-1}\right|_0^{\infty} = 0 + 1 = 1$$
Continuous Case PDF Example
 The probability that
x is any exact value
(e.g.: 1.9476) is 0
• we can ONLY assign
Probabilities
to possible
RANGES of x
[Figure: p(x) = e^{-x} with a single vertical line at one x value; NO Area Under a LINE]
 For example, the
probability of x
falling within 1 to 2:
[Figure: p(x) = e^{-x} with the area between x = 1 and x = 2 shaded]
$$p(1 \le x \le 2) = \int_1^2 e^{-x}\,dx = \left.-e^{-x}\right|_1^2 = -e^{-2} + e^{-1} = -.135 + .368 = .23\ (23\%)$$
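Both integrals for the exponential PDF can be confirmed numerically. A minimal sketch using MATLAB's integral function; the variable names are illustrative.

```matlab
f = @(x) exp(-x);                 % exponential PDF from the slides
AreaAll   = integral(f, 0, Inf);  % total area, expected to be 1
P12       = integral(f, 1, 2);    % P(1 <= x <= 2)
P12_exact = exp(-1) - exp(-2);    % closed-form value from the hand calc
fprintf('Total area = %.4f, P(1<=x<=2) = %.4f\n', AreaAll, P12)
```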
Gaussian Curve
 The Man-Height
HistoGram had
some Limited, and
thus DISCRETE,
Data
 If we were to
Measure 10,000 (or
more) young men
we would obtain a
HistoGram like this
 As We increase the
number and
fineness of the
measurements The
PDF approaches a
CONTINUOUS
Curve
Gaussian Distribution
 A Distribution that
Describes Many Physical
Processes is called the
GAUSSIAN or NORMAL
Distribution
 Gaussian (Normal) distribution
• Gaussian → famous “bell-shaped curve”
– Describes IQ scores, how fast horses can run, the no. of
Bees in a hive, wear profile on old stone stairs...
• All these are cases where:
– deviation from mean is equally probable in either direction
– Variable is continuous (or large enough integers
to look continuous)
Normal Distribution
 Real-valued PDF: f(x) → −∞ < x < +∞
 2 independent fitting parameters:
µ , σ (central location and width)
 Properties:
• Symmetrical about Mode at µ
• Median = Mean = Mode
• Inflection points (IP) at µ ± σ
 Area (probability of observing event) within:
• ± 1σ = 0.683
• ± 2σ = 0.955
 For larger σ, bell shaped curve becomes
wider and lower (since area =1 for any σ)
Normal Distribution
 Mathematically
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x-\mu)^2/(2\sigma^2)}$$
• Where
– σ² = Variance
– µ = Mean (& Median, Mode)
 The Area Under the Curve
$$\int_{-\infty}^{+\infty} f(x)\,dx = \int_{-\infty}^{+\infty} \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x-\mu)^2/(2\sigma^2)}\,dx = 1$$
68-95-99.7 Rule for Normal Dist
[Figure: normal curve with three nested bands about the mean: 68% of the data within ±σ, 95% of the data within ±2σ, 99.7% of the data within ±3σ]
68-95-99.7 Rule in Math terms…
 Using Definite-Integral Numerical Calculus
$$\int_{\mu-\sigma}^{\mu+\sigma} \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}dx = .68\ (68\%)$$
$$\int_{\mu-2\sigma}^{\mu+2\sigma} \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}dx = .95\ (95\%)$$
$$\int_{\mu-3\sigma}^{\mu+3\sigma} \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}dx = .997\ (99.7\%)$$
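The three definite integrals can be evaluated numerically. This sketch assumes µ = 0 and σ = 1 for convenience (the areas do not depend on that choice); f and P are illustrative names.

```matlab
mu = 0; sigma = 1;        % assumed values; any mu and sigma > 0 give the same areas
f = @(x) exp(-0.5*((x - mu)/sigma).^2) / (sigma*sqrt(2*pi));   % normal PDF
P = zeros(1, 3);
for k = 1:3
    % area under the PDF within k standard deviations of the mean
    P(k) = integral(f, mu - k*sigma, mu + k*sigma);
end
disp(P)                   % roughly 0.68, 0.95, 0.997
```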
Error Function (erf) & Probability
 Gauss’s Defining
Eqn for the erf
$$\mathrm{erf}(z) = \frac{2}{\sqrt{\pi}}\int_0^z e^{-y^2}\,dy$$
 This looks a lot Like
the normal dist
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x-\mu)^2/(2\sigma^2)}$$
 Consider the
Gaussian integral
$$I_G = \int \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x-\mu)^2/(2\sigma^2)}\,dx$$
 Or
$$I_G = \int \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\left(\frac{x-\mu}{\sigma\sqrt{2}}\right)^2}dx$$
 Now Let
$$y = \frac{x-\mu}{\sigma\sqrt{2}} \;\Rightarrow\; \frac{dy}{dx} = \frac{1}{\sigma\sqrt{2}} \;\Rightarrow\; dy = \frac{dx}{\sigma\sqrt{2}} \quad\text{Or}\quad dx = \sigma\sqrt{2}\,dy$$
Error Function (erf) & Probability
 Subbing for 𝑥 & 𝑑𝑥
$$I_G = \int \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\left(\frac{x-\mu}{\sigma\sqrt{2}}\right)^2}dx \;\Rightarrow\; I_G = \int \frac{1}{\sigma\sqrt{2}\sqrt{\pi}}\, e^{-y^2}\,\sigma\sqrt{2}\,dy$$
$$I_G = \frac{1}{\sqrt{\pi}}\int e^{-y^2}\,dy$$
 As
$$\mathrm{erf}(z) = \frac{2}{\sqrt{\pi}}\int_0^z e^{-y^2}\,dy$$
 ReArranging
$$I_G = \frac{1}{2}\left[\frac{2}{\sqrt{\pi}}\int e^{-y^2}\,dy\right] \;\rightarrow\; \tfrac{1}{2}\cdot\mathrm{erf}\ \text{Form}$$
Error Function (erf) & Probability
 Now the Limits
 Plotting
$$f(y) = e^{-y^2}$$
[Figure: plot of f(y) = exp(-y²) for −3 ≤ y ≤ 3; bell shape with peak value 1 at y = 0]
 This Fcn is
Symmetrical about
y=0
 Recall
$$\mathrm{erf}(z) = \frac{2}{\sqrt{\pi}}\int_0^z e^{-y^2}\,dy$$
 And the erf properties
• erf(0) = 0
• erf(∞) = 1
Error Function (erf) & Probability
 By Symmetry about y = 0 for $e^{-y^2}$ the AUC’s
$$\frac{2}{\sqrt{\pi}}\int_{-\infty}^{0} e^{-y^2}\,dy = 1 = \frac{2}{\sqrt{\pi}}\int_{0}^{\infty} e^{-y^2}\,dy$$
 Thus for Positive 𝐵
$$\frac{2}{\sqrt{\pi}}\int_{-\infty}^{B} e^{-y^2}\,dy = \frac{2}{\sqrt{\pi}}\int_{-\infty}^{0} e^{-y^2}\,dy + \frac{2}{\sqrt{\pi}}\int_{0}^{B} e^{-y^2}\,dy$$
 So Finally integrating: −∞ → 𝐵
$$\frac{2}{\sqrt{\pi}}\int_{-\infty}^{B} e^{-y^2}\,dy = 1 + \mathrm{erf}(B)$$
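The final result can be spot-checked numerically. A minimal sketch: the test value B = 0.73 is an arbitrary assumption, and g, lhs, and rhs are illustrative names.

```matlab
B = 0.73;                                % any test value (assumed)
g = @(y) (2/sqrt(pi)) * exp(-y.^2);      % erf integrand with its 2/sqrt(pi) factor
lhs = integral(g, -Inf, B);              % area from -Inf up to B
rhs = 1 + erf(B);                        % closed form from the derivation
fprintf('integral = %.6f, 1 + erf(B) = %.6f\n', lhs, rhs)
```

Because erf is an odd function, the same check also passes for a negative B.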
Error Function (erf) & Probability
 Note That for a
Continuous PDF
• Probability that x is
Less or Equal to b
$$P(x \le b) = \int_{-\infty}^{b} f(x)\,dx$$
• Probability that x is
between a & b
$$P(a \le x \le b) = \int_{a}^{b} f(x)\,dx$$
 The probability for
the Normal Dist
$$P(x \le b) = \int_{-\infty}^{b} \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x-\mu)^2/(2\sigma^2)}\,dx$$
$$P(a \le x \le b) = \int_{a}^{b} \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x-\mu)^2/(2\sigma^2)}\,dx$$
 But
$$\int \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x-\mu)^2/(2\sigma^2)}\,dx = I_G \quad\text{so}\quad I_G \;\rightarrow\; \frac{1}{2}\,\mathrm{erf}\!\left(\frac{x-\mu}{\sigma\sqrt{2}}\right)$$
Error Function (erf) & Probability
 If We Scale this
Properly we can
Cast these Eqns
into the ½∙erf Form
$$P(x \le b) = \frac{1}{2}\left[1 + \mathrm{erf}\!\left(\frac{b-\mu}{\sigma\sqrt{2}}\right)\right]$$
$$P(a \le x \le b) = \frac{1}{2}\left[\mathrm{erf}\!\left(\frac{b-\mu}{\sigma\sqrt{2}}\right) - \mathrm{erf}\!\left(\frac{a-\mu}{\sigma\sqrt{2}}\right)\right]$$
 MATLAB has the erf built-in, so if we have
the POPULATION Mean & StdDev We can
Calc Probabilities for Normally Distributed
Quantities
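The two ½∙erf forms translate directly into anonymous functions. This is a sketch, not the course's own file: the function names Ple and Pab are hypothetical, and the mean and standard deviation are borrowed from the runner-weight example in the appendix.

```matlab
% P(x <= b) and P(a <= x <= b) for a normal distribution, via erf
Ple = @(b, mu, sigma) 0.5 * (1 + erf((b - mu) ./ (sigma*sqrt(2))));
Pab = @(a, b, mu, sigma) 0.5 * (erf((b - mu) ./ (sigma*sqrt(2))) ...
                              - erf((a - mu) ./ (sigma*sqrt(2))));
mu = 127.8; sigma = 15.5;                           % lbs, from the appendix example
P_below_mean = Ple(mu, mu, sigma);                  % 0.5 by symmetry
P_1sigma = Pab(mu - sigma, mu + sigma, mu, sigma);  % about 0.683
fprintf('P(x<=mu) = %.3f, P(within 1 sigma) = %.3f\n', P_below_mean, P_1sigma)
```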
MATLAB and Gaussian Prob
 Thus MATLAB has the tools
needed to Calculate any
Gaussian Probability for
• −∞ < 𝑏 < +∞
• 𝑎 < 𝑏
$$P(x \le b) = \frac{1}{2}\left[1 + \mathrm{erf}\!\left(\frac{b-\mu}{\sigma\sqrt{2}}\right)\right]$$
$$P(a \le x \le b) = \frac{1}{2}\left[\mathrm{erf}\!\left(\frac{b-\mu}{\sigma\sqrt{2}}\right) - \mathrm{erf}\!\left(\frac{a-\mu}{\sigma\sqrt{2}}\right)\right]$$
erf(𝒛) can be NEGATIVE
 For the previous erf calcs to work the erf
must be NEGATIVE when 𝑏 is negative;
e.g.:
$$P(x \le -0.73) = \frac{1}{2}\left[1 + \mathrm{erf}\!\left(\frac{-0.73-\mu}{\sigma\sqrt{2}}\right)\right] \quad\text{where the erf term MUST be} < 0$$
 A quick Check
>> erfM73 = erf(-0.73)
erfM73 =
-0.6981
>> erfP73 = erf(+0.73)
erfP73 =
0.6981
Estimating µ & σ (1)
 The Location &
Width Parameters, µ
& σ, are Calculated
from the ENTIRE
POPULATION
• Mean, µ
$$\mu = \frac{1}{n}\sum_{k=1}^{n} x_k$$
• Variance, σ²
$$\sigma^2 = \frac{1}{n}\sum_{k=1}^{n} (x_k - \mu)^2$$
• Standard Deviation, σ
$$\sigma = \sqrt{\sigma^2}$$
 For LARGE
Populations it is
usually impractical to
measure all the xk
 In this case we take a
Finite SAMPLE to
ESTIMATE µ & σ
Estimating µ & σ (2)
 Say we want to
characterize
Miles/Yr driven by
Every Licensed
Driver in the USA
 We assume that this
quantity is Normally
Distributed, so we
take a Sample of
N = 1013 Drivers
 We Take the Mean of
the SAMPLE
$$\bar{x} = \frac{1}{N}\sum_{k=1}^{N} x_k$$
 Use the SAMPLE-Mean to Estimate the
POPULATION-Mean
$$\mu \approx \bar{x} = \frac{1}{N}\sum_{k=1}^{N} x_k$$
Estimating µ & σ (3)
 Now Calc the
SAMPLE Variance &
StdDev
$$S^2 = \frac{\sum_{k=1}^{N} (x_k - \bar{x})^2}{N-1}$$
• Number decreased
from N to (N – 1) To
Account for case
where N = 1
– In this case x̄ = x₁,
and the S² result is
meaningless
 Estimate σ ≈ S
• standard deviation:
positive square root of
the variance
– small std dev:
observations are
clustered tightly around
a central value
– large std dev:
observations are
scattered widely about
the mean
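The sample formulas can be compared against MATLAB's built-in mean, var, and std. A minimal sketch: the nine-point sample (the first nine Stockton temperatures) is chosen only for illustration, and the variable names are assumptions.

```matlab
x = [94, 98, 93, 94, 91, 96, 93, 87, 89];  % small sample: first 9 Stockton temps
N = numel(x);
xbar = sum(x) / N;                     % sample mean, the estimate of mu
S2 = sum((x - xbar).^2) / (N - 1);     % sample variance, N-1 in the denominator
S  = sqrt(S2);                         % sample standard deviation
% The built-ins use the same N-1 normalization by default
check = [xbar - mean(x), S2 - var(x), S - std(x)];
% A second argument of 1 switches var to the divide-by-N (population) form
sigma2_N = var(x, 1);
fprintf('xbar = %.3f, S = %.3f\n', xbar, S)
```

The entries of check should all be zero to within round-off, and sigma2_N is smaller than S2 by the factor (N-1)/N.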
All Done for Today
Gaussian? Or Normal?
 Recall De Moivre’s Theorem
$$z = R\cos\theta + jR\sin\theta \;\Rightarrow\; z^k = R^k\left(\cos k\theta + j\sin k\theta\right)$$
 Normal distribution was
introduced by French
mathematician
A. De Moivre in 1733.
• Used to approximate
probabilities of coin tossing
• Called it the exponential
bell-shaped curve
 1809, K.F. Gauss, a German
mathematician, applied it to
predict the locations of astronomical entities… it
became known as the Gaussian
distribution.
 Late 1800s, most believed the majority
of physical data would follow the
distribution → called normal
distribution
Engr/Math/Physics 25
Appendix
Some Normal Dist
Examples
Bruce Mayer, PE
Licensed Electrical & Mechanical Engineer
[email protected]
How Good is the Rule for Real?
 Check some example data:
 The mean, µ, of the weight of a large
group of women
Cross Country
Runners = 127.8 lbs
 The standard
deviation (σ)
for this Group
= 15.5 lbs
68% of 120 = .68x120 = ~ 82 runners
In fact, 79 runners fall within 1σ (15.5 lbs) of the mean
[Figure: histogram of runner weights with the band 112.3 to 143.3 lbs (mean 127.8 ± 1σ) marked; x-axis: POUNDS, 80 to 160; y-axis: Percent, 0 to 25]
95% of 120 = .95 x 120 = ~ 114 runners
In fact, 115 runners fall within 2σ of the mean
[Figure: histogram of runner weights with the band 96.8 to 158.8 lbs (mean 127.8 ± 2σ) marked; x-axis: POUNDS, 80 to 160; y-axis: Percent, 0 to 25]
99.7% of 120 = .997 x 120 = 119.6 runners
In fact, all 120 runners fall within 3σ of the mean
[Figure: histogram of runner weights with the band 81.3 to 174.3 lbs (mean 127.8 ± 3σ) marked; x-axis: POUNDS, 80 to 160; y-axis: Percent, 0 to 25]
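The three comparisons can be collected in one short sketch. The observed counts are the ones quoted on the slides; the expected counts come from erf and carry more digits than the rounded 68-95-99.7 percentages, so they differ slightly from the slide arithmetic. Variable names are illustrative.

```matlab
Nrun = 120;                           % runners in the group
k = 1:3;                              % band half-width, in standard deviations
expected = Nrun * erf(k / sqrt(2));   % counts predicted by the normal model
observed = [79, 115, 120];            % counts reported on the slides
disp([k; expected; observed])         % rows: k, expected, observed
```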
Engr/Math/Physics 25
Appendix
f x   2 x  7 x  9 x  6
3
2
Bruce Mayer, PE
Licensed Electrical & Mechanical Engineer
[email protected]
Basic Fitting Demo File
% Bruce Mayer, PE
% ENGR25 * 29Jun12 * Rev 27Oct14
% file = Demo_Basic_Fitting_Stockton_Temps_1410.m
%
% Data for Stockton AirPort from
% http://www7.ncdc.noaa.gov/IPS/cd/cd.html;jsessionid=1926FA20901D9A52D64FC06A0A449C00
%
% '...' continues each row vector onto the next line
TmaxSTK1107 = [93 99 100 100 102 101 98 97 90 88 82 82 79 78 80 81 81 ...
86 89 96 96 93 91 88 89 91 95 98 93 87 92]
N07 = length(TmaxSTK1107)
TmaxSTK1108 = [89 93 93 86 92 91 88 91 94 95 91 92 95 95 92 94 94 95 88 ...
86 86 90 97 97 94 96 95 96 94 89 89]
N08 = length(TmaxSTK1108)
%
% join the two months and plot against day number
TmaxSTK11 = [TmaxSTK1107,TmaxSTK1108]
Ntot = length(TmaxSTK11)
nday = [1:Ntot];
plot(nday, TmaxSTK11, '-dk', 'LineWidth', 2), xlabel('No. Days after 30Jun11'),...
ylabel('Max. Temp (°F)'), title('Stockton, CA - Jul-Aug11'), grid
Normal or Gaussian?
 Normal distribution was introduced by French
mathematician A. De Moivre in 1733.
• Used to approximate probabilities of coin tossing
• Called it exponential bell-shaped curve
 1809, K.F. Gauss, a German mathematician,
applied it to predict astronomical entities… it
became known as Gaussian distribution.
 Late 1800s, most believed the majority of data would
follow the distribution → called normal
distribution
Carl Friedrich Gauss
Gaussian/Normal Distribution Eqn
f x  
1
2 
e
 ( x   ) 2
2
2
 Calculate in MATLAB using the
Error Function, 𝑒𝑟𝑓 𝑧
>> TestPerf = erf(0.41)
>> TestNerf = erf(-0.41)
TestPerf =
0.4380
TestNerf =
-0.4380
Engineering/Math/Physics 25: Computational Methods
65
Bruce Mayer, PE
[email protected] • ENGR-25_Lec-18_Statistics-1.pptx
Normal Dist Data
Ht (in)   No.   Area (BW*No.)   No./TotArea   BW*(No./TotArea)
64        1     0.5             0.0200        1.00%
64.5      0     0               0.0000        0.00%
65        0     0               0.0000        0.00%
65.5      0     0               0.0000        0.00%
66        2     1               0.0400        2.00%
66.5      4     2               0.0800        4.00%
67        5     2.5             0.1000        5.00%
67.5      4     2               0.0800        4.00%
68        8     4               0.1600        8.00%
68.5      11    5.5             0.2200        11.00%
69        12    6               0.2400        12.00%
69.5      10    5               0.2000        10.00%
70        9     4.5             0.1800        9.00%
70.5      8     4               0.1600        8.00%
71        7     3.5             0.1400        7.00%
71.5      5     2.5             0.1000        5.00%
72        4     2               0.0800        4.00%
72.5      4     2               0.0800        4.00%
73        3     1.5             0.0600        3.00%
73.5      1     0.5             0.0200        1.00%
74        1     0.5             0.0200        1.00%
74.5      0     0               0.0000        0.00%
75        1     0.5             0.0200        1.00%
Σ               50.0                          100.00%
SPICE Circuit