Probability and Probability Distributions
● Experiment – a process that results in an observation (often called an outcome or sample point) which cannot be determined with certainty in advance of the experiment.
● Sample Space (S) – the set of all possible outcomes of the experiment.
● Event (A, B, C, etc.) – a subset of the sample space S.
● Probability of an Event A:
– P(A) = sum of the probabilities of all outcomes that are in the event A.
● Examples:
– Throw a die once: S = {1, 2, 3, 4, 5, 6}. Event A: even number; A = {2, 4, 6}.
– Throw a die twice: S = {(1,1), (1,2), ..., (1,6), (2,1), ..., (2,6), (3,1), ..., (3,6), (4,1), ..., (4,6), (5,1), ..., (5,6), (6,1), ..., (6,6)}, 36 outcomes in all. Event A: sum is 10; A = {(4,6), (5,5), (6,4)}. Event B: sum < 4; B = {(1,1), (1,2), (2,1)}.

Probability of an Event: P(A)
● Classical Definition:
– If an experiment results in N equally likely outcomes, then P(A) = N_A / N, where N_A is the number of outcomes in the event A. (Note: random selection of k units from N distinct units implies that every possible group of k units is equally likely.)
● Relative Frequency or Empirical Definition:
– If an event occurs n_A times in n repetitions of an experiment, then P(A) = n_A / n whenever n is sufficiently large.
– Example: when a fair coin is tossed a large number of times, you expect to observe 50% heads and 50% tails. We use this proportion (i.e., relative frequency) as P(Head) in a single toss: P(H) = 1/2.
● Axiomatic Approach to Probability:
– This approach builds up probability theory from a number of axioms and will not be discussed here.
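The classical definition P(A) = N_A / N can be checked by brute-force enumeration of the dice examples above. This is a minimal Python sketch; the helper name `classical_prob` is ours, not a standard function.

```python
from fractions import Fraction
from itertools import product

def classical_prob(event, sample_space):
    """Classical definition: P(A) = N_A / N for equally likely outcomes."""
    return Fraction(sum(1 for o in sample_space if event(o)), len(sample_space))

# One die: P(even number) = |{2, 4, 6}| / 6
one_die = range(1, 7)
p_even = classical_prob(lambda x: x % 2 == 0, one_die)           # 1/2

# Two dice: 36 equally likely ordered pairs
two_dice = list(product(range(1, 7), repeat=2))
p_sum10 = classical_prob(lambda o: o[0] + o[1] == 10, two_dice)  # 3/36 = 1/12
p_sum_lt4 = classical_prob(lambda o: o[0] + o[1] < 4, two_dice)  # 3/36 = 1/12

print(p_even, p_sum10, p_sum_lt4)  # 1/2 1/12 1/12
```

Using `Fraction` keeps the counts exact, so the answers match the hand counts {(4,6), (5,5), (6,4)} and {(1,1), (1,2), (2,1)} directly.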
Examples of Empirical Probabilities
● Proportions and percentages found in samples, journal articles, newspapers, polls, etc., are used as empirical probabilities.
● Examples:
– "Five percent of all items produced in a factory are defective" means P(a randomly selected item is defective) = 0.05.
– "NBA player Mr. W makes 80% of his free throws" means P(Mr. W will make his next free throw) = 0.80.
– "5% of ICU patients die within 15 days of hospital admission" means P(a randomly selected ICU patient will die within 15 days) = 0.05.

Laws of Probability
● Addition Law: P(A or B) = P(A) + P(B) - P(A and B)
● Complementary Law: P(A) + P(not A) = 1
● Conditional Probability: the probability of an event A given that B has already occurred is P(A|B) = P(A and B) / P(B)
● Multiplicative Law: P(A ∩ B) = P(A) P(B|A) = P(B) P(A|B)
● Independent Events: two events A and B are independent if and only if P(A and B) = P(A) P(B)
● Remarks:
– If A and B are disjoint (mutually exclusive), then P(A ∩ B) = 0.
– If S is the sample space of an experiment, then P(S) = 1.
– If A is any event of S, then A ∪ (not A) = S.

Example: Understanding Classical Probability

Employment Status    Fr     So     Jr     Sr    Total
Full-Time           100     20     50     70     240
Part-Time           200     80     75     40     395
No Job               10     50     25    280     365
Total               310    150    150    390    1000

Select a student at random from these 1000 students.
● What is the probability that the selected student is a Sr? Unemployed?
– P(Sr) = 390/1000; P(unemployed) = 365/1000
● Sr and unemployed? Sr or unemployed?
– P(Sr and unemployed) = 280/1000; P(Sr or unemployed) = 390/1000 + 365/1000 - 280/1000 = 475/1000
● A student selected at random is found to be a Sr. What is the probability that the student is unemployed?
– P(unemployed|Sr) = (280/1000) / (390/1000) = 280/390

Example: Understanding Relative Frequency or Empirical Probability
● An insurance company divides its policy holders into three categories: low risk, moderate risk, and high risk. The low-risk policy holders account for 60% of the total number of people insured by the company.
The moderate-risk policy holders account for 30%, and the high-risk policy holders account for 10%. The probabilities that a low-risk, moderate-risk, and high-risk policy holder will file a claim within a given year are, respectively, 0.01, 0.10, and 0.50.
– If a policy holder is selected at random, what is the probability that a low-risk policy holder will be selected? (need the proportion of low-risk policy holders)
– If a policy holder is selected at random, what is the probability of selecting a policy holder who will file a claim? (need the proportion of all policy holders who will file a claim)
– Given that a policy holder files a claim, what is the probability that the person is a high-risk policy holder? (need the proportion of high-risk policy holders only among those who filed a claim)

Random Variables and Probability Distributions
● A discrete random variable (X, say) can assume only a countable number of values.
● Probability distribution of X:
– The values of X and the corresponding probabilities together form the probability distribution of X.
● A continuous random variable X can assume any numerical value within some interval or intervals.
● The graph of the probability distribution of a continuous random variable is a smooth curve, the probability density function f, which satisfies
– (i) f(x) ≥ 0 for all x, and (ii) the total area under the graph is 1.
● The probability that a randomly selected value of X falls between a and b is defined as the area between a and b under the graph of f:
– P(a < X < b) = area under f between a and b.

Binomial Probability Distribution (a discrete distribution)
● Binomial Experiment:
– consists of n ≥ 1 independent and identical trials, where each trial has two possible outcomes, S (Success) and F (Failure), such that P(S) = p is the same for each of the n trials.
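The contingency-table questions in the classical-probability example reduce to counting cells and applying the addition and conditional-probability laws. A short sketch (the dictionary layout mirrors the table; the variable names are ours):

```python
from fractions import Fraction

# Counts from the employment-status table (rows: status, columns: class rank)
counts = {
    "Full-Time": {"Fr": 100, "So": 20, "Jr": 50, "Sr": 70},
    "Part-Time": {"Fr": 200, "So": 80, "Jr": 75, "Sr": 40},
    "No Job":    {"Fr": 10,  "So": 50, "Jr": 25, "Sr": 280},
}
total = sum(sum(row.values()) for row in counts.values())          # 1000

p_sr = Fraction(sum(row["Sr"] for row in counts.values()), total)  # 390/1000
p_unemp = Fraction(sum(counts["No Job"].values()), total)          # 365/1000
p_sr_and_unemp = Fraction(counts["No Job"]["Sr"], total)           # 280/1000

# Addition law: P(Sr or unemployed) = P(Sr) + P(unemployed) - P(both)
p_sr_or_unemp = p_sr + p_unemp - p_sr_and_unemp                    # 475/1000

# Conditional probability: P(unemployed | Sr) = P(both) / P(Sr)
p_unemp_given_sr = p_sr_and_unemp / p_sr                           # 280/390

print(p_sr_or_unemp, p_unemp_given_sr)
```

The fractions reduce (475/1000 = 19/40, 280/390 = 28/39), but they are the same numbers computed by hand above.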
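The three insurance questions above are, in order, a simple proportion, the law of total probability, and Bayes' rule. A numeric sketch using the stated percentages (the variable names are hypothetical):

```python
# Proportions of policy holders in each risk class, and P(claim | class),
# taken from the insurance example in the text.
priors = {"low": 0.60, "moderate": 0.30, "high": 0.10}
p_claim_given = {"low": 0.01, "moderate": 0.10, "high": 0.50}

# Q1: P(low-risk holder selected) is just the class proportion
p_low = priors["low"]                                            # 0.60

# Q2: law of total probability: P(claim) = sum of P(class) * P(claim | class)
p_claim = sum(priors[c] * p_claim_given[c] for c in priors)      # 0.006 + 0.03 + 0.05

# Q3: Bayes' rule: P(high | claim) = P(high) * P(claim | high) / P(claim)
p_high_given_claim = priors["high"] * p_claim_given["high"] / p_claim

print(p_claim, round(p_high_given_claim, 4))
```

Note how the high-risk group, only 10% of policy holders, accounts for well over half of the expected claims.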
● Binomial random variable: X = number of successes in the n trials.
● The probability distribution of X is the binomial distribution:
– p(x) = C(n, x) p^x (1 - p)^(n - x), x = 0, 1, ..., n, where n is the number of trials, x is the number of successes, and p is the probability of success in each trial.

Geometric Probability Distribution (a discrete distribution)
● Recall that the binomial random variable is the number of successes in n independent Bernoulli trials.
● Suppose now that we do not fix the number of Bernoulli trials n in advance but instead continue to observe the sequence of Bernoulli trials until we observe a success. The random variable of interest here is X, the number of failures observed before the first success (the number of trials is then X + 1).
● The probability distribution of X is called the geometric distribution and is given by
– p(x) = p(1 - p)^x, x = 0, 1, ..., where p is the probability of success.
● The mean and variance of this distribution are
– μ = (1 - p)/p and σ² = (1 - p)/p².

Poisson Probability Distribution (a discrete distribution)
● The Poisson random variable X is the observed number of rare events in a unit of measurement (e.g., time, area, volume, weight, distance), and its probability distribution is given by
– p(x) = e^(-λ) λ^x / x!, x = 0, 1, ..., where λ is the expected number of events during the given unit of measurement.
● For this distribution, both the mean and the variance are equal to λ (i.e., μ = σ² = λ).
● Examples of some events and units:
– Number of accidents (event) per month (unit)
– Number of cancer deaths (event) per year (unit)
– Number of diseased trees (event) per acre (unit)
– Number of airline fatalities (event) per month (unit)
– Number of hurricanes (event) per season (unit)
– Number of misprints (event) per page (unit) of a book

Normal Distribution (continuous)
● A bell-shaped symmetric distribution with mean μ and standard deviation σ.
● The Standard Normal Distribution is the special case with mean μ = 0 and standard deviation σ = 1; it is often called the Z-distribution.
● X and Z are related by X = μ + Zσ, or equivalently Z = (X - μ)/σ.
● P(a < X < b) = area under the curve between a and b = P((a - μ)/σ < Z < (b - μ)/σ).

Sampling Distribution of the Sample Mean
● As the sample size n increases, the distribution of the sample mean gets closer to a normal distribution.

Standard Normal and t-Distributions
● Standard normal (mean = 0, variance = 1) and t-distribution with df = η (mean = 0, variance = η/(η - 2) for η > 2).

F-Distribution
● The F-distribution depends on two degrees of freedom (df), d1 and d2.

Chi-Squared (χ²) Distribution
● The χ²-distribution for some values of the degrees of freedom k.

Identifying Probability Distributions for Observed Data
● Compare summary results of observed data with properties of distributions:
– Calculate the mean, median, variance, percentiles, etc., of your observed data and identify the properties and/or relationships of these summary results.
– A probability distribution that has properties similar to those found in the observed data is a good candidate to represent the observed data.
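The geometric and Poisson pmfs above can be checked numerically: each should sum to (essentially) 1 over its support, and the truncated sums should reproduce the stated means and variances. The truncation limits 500 and 60 below are arbitrary cutoffs chosen so the neglected tail mass is negligible for the chosen p and λ.

```python
import math

def geometric_pmf(x, p):
    """p(x) = p(1-p)^x: probability of x failures before the first success."""
    return p * (1 - p) ** x

def poisson_pmf(x, lam):
    """p(x) = e^(-lam) * lam^x / x!"""
    return math.exp(-lam) * lam ** x / math.factorial(x)

p, lam = 0.3, 2.5  # illustrative parameter values

# Each pmf should sum to 1 over its (truncated) support
geo_total = sum(geometric_pmf(x, p) for x in range(500))
poi_total = sum(poisson_pmf(x, lam) for x in range(60))

# Check the stated moments: geometric mean (1-p)/p and variance (1-p)/p^2;
# Poisson mean = variance = lam
geo_mean = sum(x * geometric_pmf(x, p) for x in range(500))
geo_var = sum((x - geo_mean) ** 2 * geometric_pmf(x, p) for x in range(500))
poi_mean = sum(x * poisson_pmf(x, lam) for x in range(60))
poi_var = sum((x - lam) ** 2 * poisson_pmf(x, lam) for x in range(60))

print(round(geo_mean, 4), round(geo_var, 4), round(poi_mean, 4), round(poi_var, 4))
```

With p = 0.3 this gives a geometric mean of (1 - 0.3)/0.3 ≈ 2.3333 and variance 0.7/0.09 ≈ 7.7778, matching the formulas on the slide.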
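The standardization identity P(a < X < b) = P((a - μ)/σ < Z < (b - μ)/σ) can be verified with the standard library's `NormalDist`. The values μ = 100, σ = 15, a = 85, b = 130 are illustrative only.

```python
from statistics import NormalDist

mu, sigma = 100, 15        # illustrative mean and standard deviation
X = NormalDist(mu, sigma)
Z = NormalDist(0, 1)       # standard normal (Z-distribution)

a, b = 85, 130
# P(a < X < b) computed directly from X
p_direct = X.cdf(b) - X.cdf(a)
# The same probability after standardizing: Z = (X - mu) / sigma
p_standardized = Z.cdf((b - mu) / sigma) - Z.cdf((a - mu) / sigma)

print(round(p_direct, 6), round(p_standardized, 6))
```

Here (a - μ)/σ = -1 and (b - μ)/σ = 2, so both expressions equal P(-1 < Z < 2) ≈ 0.8186.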
● Examples:
– Count data of rare events with mean equal to variance are likely a good fit for the Poisson distribution.
– Quantitative measurement data with independent mean and variance that satisfy the empirical-rule percentages (68%, 95%, and 99.7%) may fit well with a normal distribution.
– Quantitative measurement data for which the mean is equal to the standard deviation may fit well with an exponential distribution.

Identifying Probability Distributions for Observed Data
● Three commonly used methods:
● Histogram with overlaid probability density curve:
– Construct a histogram of your data and overlay the desired probability density curve. Visually check whether the density curve fits the histogram.
● Probability plots:
– Q-Q plot: a scatterplot of the ranked (ordered) data values on one axis against the corresponding expected theoretical distribution scores (often standardized scores) on the other axis.
– P-P plot: a scatterplot of the observed cumulative proportions on one axis against the corresponding expected theoretical cumulative proportions on the other axis.
● For the Q-Q and P-P plots, the scatterplot points are expected to be close to a straight line if the theoretical distribution fits the observed data well.

Example: Reading Scores and Normal Distribution
● Histogram and normal curve
● Normal Quantile-Quantile plot (Q-Q plot)
● Normal Probability plot (P-P plot)
● For Q-Q and P-P plots, points close to the solid line indicate that the data fit the theoretical distribution well.
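A normal Q-Q plot pairs the ordered data with theoretical normal quantiles evaluated at plotting positions such as (i + 0.5)/n. A sketch of how those pairs are computed, using a made-up sample of reading scores (the data values and the plotting-position convention are illustrative assumptions):

```python
from statistics import NormalDist, mean, stdev

# Hypothetical small sample of reading scores
scores = sorted([52, 61, 66, 70, 72, 75, 78, 81, 85, 93])
n = len(scores)

Z = NormalDist()  # standard normal
# Theoretical standard-normal quantiles at plotting positions (i + 0.5)/n
theoretical = [Z.inv_cdf((i + 0.5) / n) for i in range(n)]

# Q-Q points are (theoretical quantile, ordered data value).
# If the data are roughly normal, they fall near the line y = mu + sigma * x.
mu, sigma = mean(scores), stdev(scores)
fitted = [mu + sigma * q for q in theoretical]
for q, obs, fit in zip(theoretical, scores, fitted):
    print(f"z = {q:6.3f}  observed = {obs:3d}  reference line = {fit:6.1f}")
```

Plotting `theoretical` against `scores` (with the reference line) gives the Q-Q plot; a P-P plot would instead compare observed cumulative proportions with `Z.cdf` values.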