Quantitative Techniques – Class II
Quantifying Randomness – Probability and Probability Distributions

Probability - Definitions
• Sample Space – Like a population: the entire set of possible outcomes
• Event – An outcome, or set of outcomes, that may be realized
• Union – The event that at least one of several events occurs
• Intersection – The event that all of the events occur together
• Complement – Everything in the sample space that is not in the event
• Mutual Exclusivity – If one event occurs, then the other cannot
• Independence – The events are not related to each other; that is, the occurrence of one does not affect the probability of the other
• Permutations – The number of ways to arrange some objects, when order matters
• Combinations – The number of ways to select some objects, when order is not important

Probability
• Quantifying randomness
• The context: An “experiment” that admits several possible outcomes
– Some outcome will occur
– The observer is uncertain which (or what) before the experiment takes place
• Event space = the set of possible outcomes. (Also called the “sample space.”)
• Probability = a measure of “likelihood” attached to the events in the event space. (Try to define probability without using a word that means probability.)

Rules (Axioms) of Probability
• An “event” E will occur or not occur
• P(E) is a number that equals the probability that E will occur.
• By convention, 0 ≤ P(E) ≤ 1.
• E' = the event that E does not occur
• P(E') = the probability that E does not occur.

Essential Results for Probability
• If P(E) = 0, then E cannot (will not) occur
• If P(E) = 1, then E must (will) occur
• E and E' are exhaustive – either E or E' will occur.
• Something will occur: P(E) + P(E') = 1
• Only one thing can occur. If E occurs, then E' will not occur – E and E' are exclusive.
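
The definitions and rules above can be tried out on a small example. The sketch below is only an illustration, not part of the original slides: it assumes a hypothetical experiment (one roll of a fair six-sided die), and the event names `even` and `low` are made up for the purpose. It uses Python's standard `fractions` module and `math.perm` / `math.comb` (available from Python 3.8).

```python
from fractions import Fraction
from math import comb, perm

# Hypothetical experiment (an assumption for illustration):
# one roll of a fair six-sided die, all outcomes equally likely.
sample_space = {1, 2, 3, 4, 5, 6}
even = {2, 4, 6}   # event E: the roll is even
low = {1, 2}       # a second event: the roll is 1 or 2


def prob(event):
    """Probability of an event when every outcome is equally likely."""
    return Fraction(len(event & sample_space), len(sample_space))


# Complement: everything in the sample space that is not in E.
not_even = sample_space - even
total = prob(even) + prob(not_even)   # E and E' are exhaustive and exclusive

# Union (at least one event occurs) and intersection (both occur).
p_union = prob(even | low)
p_intersection = prob(even & low)

# Permutations count ordered arrangements; combinations ignore order.
n_permutations = perm(5, 2)   # ordered pairs chosen from 5 objects
n_combinations = comb(5, 2)   # unordered pairs chosen from 5 objects

print("P(E) + P(E') =", total)
print("P(union) =", p_union, " P(intersection) =", p_intersection)
print("permutations:", n_permutations, " combinations:", n_combinations)
```

Exact fractions are used instead of floating-point numbers so that the check P(E) + P(E') = 1 holds exactly rather than approximately.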
Joint Events
• Pairs (or groups) of events: A and B
– One or the other occurs: A or B ≡ $A \cup B$
– Both events occur: A and B ≡ $A \cap B$
• Independent events: Occurrence of A does not affect the probability of B
• An addition rule: $P(A \cup B) = P(A) + P(B) - P(A \cap B)$
• The product rule for independent events: $P(A \cap B) = P(A)P(B)$

Using Conditional Probabilities: Bayes' Theorem
Bayes’ Theorem gives the probability of an event A once a result B (such as a test outcome) has been observed. It is therefore very important for interpreting survey results or testing results of any kind.
The Theorem: $P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)}$

Random Variable
• Definition: A variable that will take a value assigned to it by the outcome of a random experiment.
• Realization of a random variable: The outcome of the experiment after it occurs. The value that is assigned to the random variable is the realization. X = the variable, x = the realization
• Use random variables to organize the information about a random occurrence.
• Can be continuous or discrete

Probability Distribution
• Range of the random variable = the set of values it can take
– Discrete: A set of integers. May be finite or infinite
– Continuous: A range of values
• Probability distribution: Probabilities associated with values in the range.

Binary Random Variable
• Like a coin toss – or any event that has only 2 alternatives
• Event occurs: X = 1
• Event does not occur: X = 0
• Probabilities: P(X = 1) = θ
• P(X = 0) = 1 – θ

Bernoulli Random Variable
• X = 0 or 1
• Probabilities: P(X = 1) = θ
• P(X = 0) = 1 – θ
• (X = 0 or 1 corresponds to an event)

Probability Function
For a discrete variable such as this one, the function is called a probability mass function (PMF); its counterpart for a continuous variable is the Probability Density Function (PDF).
• Define the probabilities as a function of X
• Bernoulli random variable
– Probabilities: P(X = 1) = θ
– P(X = 0) = 1 – θ
• Function: $P(X = x) = \theta^{x} (1 - \theta)^{1 - x}$, x = 0, 1

Mean and Variance
• $E[X] = 0(1 - \theta) + 1(\theta) = \theta$
• Variance = $[0^2 (1 - \theta) + 1^2\, \theta] - \theta^2 = \theta(1 - \theta)$
• Application: If X is the number of male children in a family with 1 child, what is E[X]?
With θ = 0.5, E[X] = 0.5, so this is the expected number of male children in families with one child.

Models
• Settings in which the probabilities can only be approximated
• Models “describe” reality but don’t match it exactly
– Assumptions are descriptive
– Outcomes are not limited to a finite range

Poisson Model
The Poisson distribution is a model that fits situations such as
• Number of accidents in a location
• Number of people with a disease in a population
$P[X = k] = \frac{e^{-\lambda} \lambda^{k}}{k!}, \quad k = 0, 1, 2, \ldots$ (not limited)
e is the base of the natural logarithms, approximately equal to 2.7183. e^something is often written as the exponential function, exp(something)

Poisson Variable
• X is the random variable
• λ is the mean of X
• $\sqrt{\lambda}$ is the standard deviation
The figure shows P[X=x] for a Poisson variable with λ = 4.
[Figure: Poisson Probabilities with Lambda = 4]

Application
• The arrival rate of customers at a pizza store is 3.2 people per hour.
• What is the probability of exactly 5 customers walking in during a particular hour? And of at most 5 people walking in (less than or equal to 5)?
• We can create a table like this in Excel very easily (will learn in Analytics)
• Probability of exactly 0 customers walking in is 0.0408 (or a 4% chance)
– Of exactly 1 customer = 0.1304 (13%)
– Of exactly 2 customers = 20.8%
– Of at most 2 customers = 4% + 13% + 21% = 38%
and so on…
----------------------------------------------
Probability = Exp(-3.2) × 3.2^customers / customers!
----------------------------------------------
Customers   Probability
0           0.0407622
1           0.130439
2           0.208702
3           0.222616
4           0.178093
5           0.113979
6           0.060789
7           0.0277893
8           0.0111157
9           0.00395225
10          0.00126472

The Normal Distribution
• The most useful distribution in all branches of statistics and econometrics.
• Strikingly accurate model for elements of human behavior and interaction
• Strikingly accurate model for any random outcome that comes about as a sum of small influences.
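
The claim that a sum of small influences behaves like a normal variable can be explored with a short simulation. The sketch below is an illustration under stated assumptions, not something taken from the slides: each "influence" is assumed to be an independent draw from a uniform distribution on [-1, 1], each outcome adds up 50 of them, and the seed, sample size, and function names are arbitrary choices. For a normal variable, roughly 68%, 95%, and 99.7% of values lie within 1, 2, and 3 standard deviations of the mean, so the simulated shares should come out close to those figures.

```python
import random
from statistics import mean, pstdev

rng = random.Random(0)  # fixed seed (arbitrary) so the run is repeatable


def sum_of_small_influences(n_terms=50):
    """One outcome built as the sum of many small independent shocks.

    Each shock is assumed uniform on [-1, 1]; this is an illustrative choice.
    """
    return sum(rng.uniform(-1.0, 1.0) for _ in range(n_terms))


# Simulate many outcomes and summarize them.
outcomes = [sum_of_small_influences() for _ in range(20000)]
mu = mean(outcomes)
sigma = pstdev(outcomes)


def share_within(k):
    """Share of simulated outcomes within k standard deviations of the mean."""
    return sum(abs(x - mu) <= k * sigma for x in outcomes) / len(outcomes)


for k in (1, 2, 3):
    print(f"within {k} sd: {share_within(k):.3f}")
```

No single uniform shock looks anything like a bell curve; the near-normal shape appears only once many of them are added together.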
Applications
• Biological measurements of all sorts (not just human mental and physical)
• Accumulated errors in experiments
• Numbers of events accumulated in time
– Amount of rainfall per interval
– Number of stock orders per (longer) interval. (We used the Poisson for short intervals)
– Economic aggregates of small terms.
• And on and on…

The Empirical Rule and the Normal Distribution
In the figure, dark blue marks values less than one standard deviation from the mean. For the normal distribution, this accounts for about 68% of the set, while two standard deviations from the mean (medium and dark blue) account for about 95% and three standard deviations (light, medium, and dark blue) account for about 99.7%.

The Logistic Distribution
• Used when the distribution has fat tails (high kurtosis)
• Many instances in real-world data, including marketing surveys and financial markets data
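
The empirical-rule percentages and the "fat tail" description can both be checked with closed-form formulas. The sketch below is an illustration rather than part of the slides: it assumes a standard normal variable and a logistic variable rescaled to mean 0 and standard deviation 1 (the logistic variance is s²π²/3, so the scale is taken as s = √3/π). The function names are made up for this example.

```python
from math import erf, exp, pi, sqrt


def normal_within(k):
    """P(|Z| <= k) for a standard normal variable Z."""
    return erf(k / sqrt(2))


def logistic_within(k):
    """P(|X| <= k) for a logistic variable with mean 0 and standard deviation 1."""
    s = sqrt(3) / pi                      # scale that makes the variance equal 1
    cdf = 1.0 / (1.0 + exp(-k / s))       # logistic CDF evaluated at k
    return 2.0 * cdf - 1.0                # symmetric around 0


for k in (1, 2, 3):
    print(f"{k} sd: normal {normal_within(k):.4f}, "
          f"logistic {logistic_within(k):.4f}")

# Probability of landing more than 3 standard deviations from the mean.
normal_tail_3sd = 1.0 - normal_within(3)
logistic_tail_3sd = 1.0 - logistic_within(3)
print(f"beyond 3 sd: normal {normal_tail_3sd:.4f}, "
      f"logistic {logistic_tail_3sd:.4f}")
```

With the same mean and standard deviation, the logistic variable puts noticeably more probability beyond three standard deviations than the normal does, which is what "fat tail" means in practice.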