Random Variables & Probability Distributions
• Outcomes of experiments are, in part, random.
• E.g. let X7 be the gender of the 7th randomly selected student. In this case, the sample space is S = {M, F}.
• Probability distributions are used to understand, model, and predict outcomes of random experiments.
• Many distributions are useful for describing random processes in environmental science & management.

Example: Hazardous Waste
• Hazardous waste depository: test wells monitor groundwater for leaks.
• Aldicarb limit = 30 ppb.
• Aldicarb occurs naturally (but its concentration is variable).
• What is the probability of exceeding the limit even if there is no leak? (Prob. of measuring > 30 ppb even if no leak?)

Evidence and Data
• “Natural” distribution of aldicarb: 500 readings from sites known to not be contaminated.

Evidence Cont’d
• Based on this distribution, we will assume these data are normally distributed with:
  Mean = 20 ppb
  Standard Deviation = 4 ppb

Definitions
• Random Variable: the unknown outcome of an experiment. The particular outcome is a realization of the random variable.
• E.g. (1) rain on Tuesday, (2) aldicarb measurement.
• An r.v. takes different values, each with a different probability.
• Histogram: plot of the frequency of observation of a random variable over discrete intervals.
• Discrete vs. Continuous Random Variable

Frequency of Outcomes
• Probability Density (Mass) Function: histogram of outcomes resulting from an infinite number of samples (prob. = area under the curve). For continuous r.v.’s, bar width approaches 0.
• Cumulative Distribution Function: probability that the r.v. is ≤ x.
• Examples on board:
  # Grizzly cubs per sow (1,2,1,2,2,2,2,3,1,2)
    • Histogram vs. known prob. mass (.13, .70, .17)
  Natural aldicarb concentration
    • Histogram (of data) vs. pdf N(20,4)

Known vs. Unknown Distributions
• True distribution may not be a known distribution (e.g. dist’n of students’ heights in this classroom).
• Often, knowing how a process works will point us to a particular (known) distribution.
• Advantages of known distributions:
  • Can usually be described by 1 or 2 parameters.
  • Well studied, so most properties are known.
  • Easy to ask questions like the aldicarb question.

Discrete Random Variables
1. Bernoulli: 2 outcomes: “success” (prob. = p) or “failure” (prob. = 1-p).
2. Binomial: Number of successes in n independent Bernoulli trials.
3. Multinomial: Extends Binomial to more than 2 outcomes.
4. Geometric: Number of Bernoulli trials until first success.
5. Poisson: Counting r.v. (takes integer values). Number of events that occur in a given time interval.

Normal Random Variable
1. Normal: “Bell Shaped”, “Gaussian”. Symmetric. Takes + and - values.
   1. Central Limit Theorem: the sum or average of many independent r.v.’s is approximately normal (often used as justification for Normal).
   2. “Standard Normal”: N(0,1).
   3. Convert X~N(μ,σ) to Standard Normal (Z): Z = (X-μ)/σ

Continuous Random Variables
1. Uniform: every possible outcome equally likely (also exists as a discrete r.v.).
2. Log-Normal: r.v. whose logarithm is normally distributed.
3. Gamma: Non-negative values.
4. Extreme Value: Maximum or minimum of many draws from some other distribution.
5. Exponential: Inter-arrival times, “memoryless”.
6. χ²: Closely related to Normal. Nonnegative. Skewed.

Answer Question
• What is the probability that the measured aldicarb level exceeds 30 ppb, if there is no leak?
• Let X be a random variable describing the aldicarb level of a given test.
• P(X > 30) = area under N(20,4) above 30 ppb.

Integrate Under N(20,4)
• Normal pdf:
  $$ f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left[-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^{2}\right] $$
• Draw on board… Ouch! Isn’t there another way?

2 Ways to Answer
1. Ask S-Plus (nicely): P(X<30) = 0.994, so P(X>30) = 0.006.
2. Convert to N(0,1) Standard Normal: Z = (30-20)/4 = 2.5.

Table gives Pr(0<Z<z):

z     .00   .01   .02   .03
0.0   .000  .004  .008  .012
0.3   .118  .122  .126  .129
1.8   .464  .465  .466  .466
2.5   .494  .494  .494  .494

Answer
• Pr(X>30) when X~N(20,4) = Pr(Z>2.5) when Z~N(0,1)
• Pr(0<Z<2.5) = .494
• Pr(-∞<Z<0) = .500
• So, Pr(Z>2.5) = 1 - .494 - .5 = .006
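
The slides get this number from S-Plus or from a printed table. As a cross-check, here is a minimal Python sketch of the same two routes. Python is not used in the original slides, and the helper name `normal_cdf` is hypothetical. It builds the normal CDF from the standard library's error function and, following the slides' convention, treats the second argument of N(20,4) as the standard deviation.

```python
import math


def normal_cdf(x, mean=0.0, sd=1.0):
    """P(X <= x) for a normal r.v. with the given mean and standard deviation."""
    z = (x - mean) / sd
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Values taken from the slides: natural aldicarb ~ N(20, 4), limit = 30 ppb.
MEAN_PPB = 20.0
SD_PPB = 4.0
LIMIT_PPB = 30.0

# Route 1: ask the software directly for the upper-tail area.
p_direct = 1.0 - normal_cdf(LIMIT_PPB, MEAN_PPB, SD_PPB)

# Route 2: standardize, then mimic the table lookup Pr(0 < Z < z).
z = (LIMIT_PPB - MEAN_PPB) / SD_PPB
table_value = normal_cdf(z) - 0.5      # the table entry Pr(0 < Z < 2.5)
p_table = 1.0 - 0.5 - table_value      # subtract Pr(Z < 0) and Pr(0 < Z < 2.5)

print(f"z = {z}")
print(f"Pr(0 < Z < z) = {table_value:.3f}")
print(f"P(X > 30), direct = {p_direct:.3f}")
print(f"P(Z > 2.5), table route = {p_table:.3f}")
```

Both routes round to 0.006, in agreement with the table-based answer. That is about a 0.6% chance of a reading above the limit even when there is no leak.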