Random Variables &
Probability Distributions

Outcomes of experiments are, in part, random

• E.g. let X7 be the gender of the 7th randomly selected student. In this case, the sample space is S = {M, F}.
• Probability distributions are used to understand, model, and predict outcomes of random experiments.
• Many distributions are useful for describing random processes in environmental science & mgt.
Example: Hazardous Waste

• Hazardous waste depository: test wells monitor groundwater for leaks.
• Aldicarb limit = 30 ppb.
• Aldicarb occurs naturally (but concentration is variable).
• What is the probability of exceeding the limit even if there is no leak? (Prob. of measuring > 30 ppb even if no leak?)

Evidence and Data

• "Natural" distribution of aldicarb: 500 readings from sites known to not be contaminated.

Evidence Cont'd

• Based on this distribution, we will assume these data are normally distributed with:
  • Mean = 20 ppb
  • Standard deviation = 4 ppb

Definitions

• Random Variable: the unknown outcome of an experiment. The particular outcome is a realization of the random variable.
  • E.g. (1) rain Tues., (2) aldicarb measurement.
  • An r.v. takes different values, each with a different probability.
• Histogram: plot of the frequency of observation of a random variable over discrete intervals.
• Discrete vs. Continuous Random Variable

Frequency of Outcomes

• Probability Density (Mass) Function: histogram of outcomes resulting from an infinite # of samples (Prob = area under).
  • For continuous r.v.'s, the bar width approaches 0.
• Cumulative Distribution Function: probability that the r.v. ≤ x.
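The CDF in the last bullet can be evaluated directly for the aldicarb example. A minimal sketch, assuming Python with scipy (not part of the original slides):

```python
# Evaluate F(x) = P(X <= x) for the "natural" aldicarb distribution N(20, 4)
# (illustrative sketch; scipy is an assumption, the slides use S-Plus later on).
from scipy import stats

aldicarb = stats.norm(loc=20, scale=4)   # mean 20 ppb, standard deviation 4 ppb
print(aldicarb.cdf(20))   # 0.5: half of natural readings fall at or below the mean
print(aldicarb.cdf(30))   # ~0.994: nearly all natural readings fall at or below 30 ppb
```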
Examples on board:

• # Grizzly cubs per sow (1,2,1,2,2,2,2,3,1,2)
  • Histogram vs. known prob. mass (.13, .70, .17)
• Natural aldicarb concentration
  • Histogram (of data) vs. pdf N(20,4)
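A minimal plotting sketch of the two board examples, assuming Python with numpy/scipy/matplotlib. The grizzly data and the known probability mass are taken from the slide; the 500 aldicarb readings are simulated from N(20, 4) because the raw data are not reproduced in this transcript.

```python
# Histogram of observed outcomes vs. a known distribution (illustrative sketch).
import numpy as np
from scipy import stats
import matplotlib.pyplot as plt

cubs = np.array([1, 2, 1, 2, 2, 2, 2, 3, 1, 2])    # cubs per sow (from the slide)
pmf = {1: 0.13, 2: 0.70, 3: 0.17}                   # known probability mass (from the slide)

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))

# Discrete case: empirical frequencies vs. known probability mass
values, counts = np.unique(cubs, return_counts=True)
ax1.bar(values - 0.15, counts / counts.sum(), width=0.3, label="observed frequency")
ax1.bar(np.array(list(pmf)) + 0.15, list(pmf.values()), width=0.3, label="known prob. mass")
ax1.set_xlabel("# cubs per sow"); ax1.legend()

# Continuous case: histogram of readings vs. the N(20, 4) pdf
readings = stats.norm(loc=20, scale=4).rvs(size=500, random_state=1)  # simulated stand-in
ax2.hist(readings, bins=20, density=True, label="500 readings")
x = np.linspace(5, 35, 200)
ax2.plot(x, stats.norm.pdf(x, loc=20, scale=4), label="N(20, 4) pdf")
ax2.set_xlabel("aldicarb (ppb)"); ax2.legend()
plt.show()
```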
Known vs. Unknown Distributions

• The true distribution may not be a known distribution (e.g. dist'n of students' heights in this classroom).
• Often, knowing how a process works will point us to a particular (known) distribution.
• Advantages of known distributions:
  • Can usually be described by 1 or 2 parameters.
  • Well studied, so most properties are known.
  • Easy to ask questions like the aldicarb question.
Discrete Random Variables

1. Bernoulli: 2 outcomes: "success" (prob. = p) or "failure" (prob. = 1 − p).
2. Binomial: Number of successes in n independent Bernoulli trials.
3. Multinomial: Extends the Binomial to more than 2 outcomes.
4. Geometric: Number of Bernoulli trials until the first success.
5. Poisson: Counting r.v. (takes integer values). Number of events that occur in a given time interval.
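A minimal sketch of the five discrete families above, assuming Python with scipy; the parameter values are illustrative, not from the slides.

```python
# Probability mass functions for the discrete random variables listed above
# (illustrative parameters only).
from scipy import stats

p, n = 0.3, 10
print(stats.bernoulli.pmf(1, p))                   # Bernoulli: P(success) = p
print(stats.binom.pmf(4, n, p))                    # Binomial: P(4 successes in n trials)
print(stats.multinomial.pmf([2, 3, 5], n=10, p=[0.2, 0.3, 0.5]))  # Multinomial: 3 outcomes
print(stats.geom.pmf(3, p))                        # Geometric: P(first success on trial 3)
print(stats.poisson.pmf(2, mu=1.5))                # Poisson: P(2 events), rate 1.5 per interval
```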
Normal Random Variable

1. Normal: "Bell Shaped", "Gaussian". Symmetric. Takes + and − values.
   1. Central Limit Theorem: the sum or average of several independent r.v.'s is approximately normal (often used as justification for the Normal); see the sketch after this list.
   2. "Standard Normal": N(0,1).
   3. Convert X ~ N(m, s) to the Standard Normal (Z): Z = (X − m)/s.
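A minimal simulation of the Central Limit Theorem point above, assuming Python with numpy; a sketch, not part of the original slides.

```python
# Averages of independent Uniform(0, 1) draws pile up in a bell shape,
# even though a single uniform draw is not normal (illustrative sketch).
import numpy as np

rng = np.random.default_rng(0)
avgs = rng.uniform(0, 1, size=(10_000, 30)).mean(axis=1)  # 10,000 averages of 30 draws
print(avgs.mean())   # close to 0.5, the mean of a single draw
print(avgs.std())    # close to sqrt(1/12)/sqrt(30) ~ 0.053
```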
Continuous Random Variables

1. Uniform: every possible outcome equally likely (there is also a discrete version).
2. Log-Normal: r.v. whose logarithm is normally distributed.
3. Gamma: Non-negative values.
4. Extreme Value: Maximum or minimum of many draws from some other distribution.
5. Exponential: Inter-arrival times, "memoryless".
6. χ² (chi-squared): Closely related to the Normal. Non-negative. Skewed.
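A minimal sketch drawing a few values from each continuous family above, assuming Python with scipy; the parameters are illustrative only.

```python
# Samples from the continuous random variables listed above (illustrative parameters).
from scipy import stats

print(stats.uniform(loc=0, scale=1).rvs(size=3))   # Uniform on [0, 1]
print(stats.lognorm(s=0.5).rvs(size=3))            # Log-Normal: the log of these is normal
print(stats.gamma(a=2.0).rvs(size=3))              # Gamma: non-negative values
print(stats.gumbel_r().rvs(size=3))                # an Extreme Value (maximum) distribution
print(stats.expon(scale=2.0).rvs(size=3))          # Exponential: inter-arrival times
print(stats.chi2(df=3).rvs(size=3))                # Chi-squared: non-negative, skewed
```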
Answer

• Question: What is the probability that the measured aldicarb level ≥ 30 ppb, if there is no leak?
• Let X be a random variable describing the aldicarb level of a given test.
• P(X ≥ 30) = area under N(20,4) above 30 ppb.

Integrate Under N(20,4)

• Normal pdf:

  f(x) = (1 / (s·√(2π))) · exp( −(1/2)·((x − m)/s)² )

• Draw on board…Ouch! Isn't there another way?
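The "ouch" can be made concrete with a minimal sketch that integrates the pdf numerically, assuming Python with scipy (the slides do this on the board; the next slide shows two easier routes).

```python
# Brute-force numerical integration of the N(20, 4) pdf above 30 ppb (illustrative sketch).
import numpy as np
from scipy import integrate

def normal_pdf(x, m=20.0, s=4.0):
    return np.exp(-0.5 * ((x - m) / s) ** 2) / (s * np.sqrt(2 * np.pi))

area, _ = integrate.quad(normal_pdf, 30, np.inf)
print(area)   # ~0.006
```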
2 Ways to Answer

1. Ask S-Plus (nicely): P(X<30)=0.994, so P(X>30)=0.006.
2. Convert to N(0,1):
   1. Standard Normal Z = (30 − 20)/4 = 2.5. Table gives Pr(0 < Z < z):

      z      .00    .01    .02    .03
      0.0    .000   .004   .008   .012
      0.3    .118   .122   .126   .129
      1.8    .464   .465   .466   .466
      2.5    .494   .494   .494   .494
Answer

• Pr(X > 30) when X ~ N(20,4) = Pr(Z > 2.5) when Z ~ N(0,1)
• Pr(0 < Z < 2.5) = .494
• Pr(−∞ < Z < 0) = .500
• So, Pr(Z > 2.5) = 1 − .494 − .5 = .006
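Both routes above can be reproduced in a few lines. A minimal sketch assuming Python with scipy (the slides used S-Plus):

```python
# Checking the aldicarb answer two ways (illustrative sketch, not the original S-Plus call).
from scipy import stats

# Way 1: work directly with X ~ N(20, 4)
p_direct = 1 - stats.norm.cdf(30, loc=20, scale=4)

# Way 2: standardize, Z = (30 - 20) / 4 = 2.5, and use the standard normal N(0, 1)
p_standardized = 1 - stats.norm.cdf(2.5)

print(p_direct, p_standardized)   # both ~ 0.006
```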