Introduction to probability theory
Probability Theory
Thommy Perlinger

The stabilization of relative frequencies

The basis of probability theory is the probability space. The key idea behind the probability space is the stabilization of relative frequencies.

Intuition. A random experiment is repeated over and over. Let f_n(A) be the number of occurrences of the event A in the first n trials, and let r_n(A) = f_n(A)/n be its relative frequency. If

r_n(A) → a as n → ∞,

it seems reasonable/intuitive to define a as the probability of the event A. This probability will be denoted Pr(A) or P(A).

Properties of relative frequencies

Important properties of relative frequencies should also hold for probabilities.

1. Since 0 ≤ r_n(A) ≤ 1 it follows that 0 ≤ Pr(A) ≤ 1.
2. Since r_n(∅) = 0 and r_n(Ω) = 1 it follows that Pr(∅) = 0 and Pr(Ω) = 1.
3. Let B be the complement of A. Since r_n(A) + r_n(B) = 1 it follows that Pr(A) + Pr(B) = 1.
4. Let A be contained in B. Since r_n(A) ≤ r_n(B) it follows that Pr(A) ≤ Pr(B).
5. Let A and B be disjoint and C their union. Since r_n(C) = r_n(A) + r_n(B) it follows that Pr(C) = Pr(A) + Pr(B).
6. Let A and B be (arbitrary) events, C their union, and D their intersection. Since r_n(C) = r_n(A) + r_n(B) − r_n(D) it follows that Pr(C) = Pr(A) + Pr(B) − Pr(D).

The probability space

The probability space is the triple (Ω, ℱ, P):

Ω is the sample space, that is, the set of elementary events, or outcomes, {ω}.
ℱ is the collection of events, that is, a collection of subsets of Ω.
P is a probability measure that satisfies the Kolmogorov axioms, that is:

i. For any A ∈ ℱ there exists a number P(A), or Pr(A), the probability of A, such that P(A) ≥ 0.
ii. P(Ω) = 1.
iii. Let {A_n, n ≥ 1} be a collection of pairwise disjoint events and let A be their union. Then P(A) = Σ_{n=1}^∞ P(A_n).
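The stabilization of relative frequencies can be illustrated with a small simulation (an illustrative sketch added here, not part of the original slides; the die-rolling experiment and the function names are my own choices): for a fair die and the event A = "the roll is even", the relative frequency r_n(A) settles near Pr(A) = 1/2 as n grows.

```python
import random

def relative_frequency(n, event, experiment, seed=1):
    """r_n(A) = f_n(A) / n: the fraction of the first n trials in which A occurs."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(n) if event(experiment(rng)))
    return hits / n

# Experiment: roll a fair die.  Event A: the roll is even, so Pr(A) = 1/2.
def roll(rng):
    return rng.randint(1, 6)

def is_even(outcome):
    return outcome % 2 == 0

for n in (10, 100, 10_000, 1_000_000):
    print(n, relative_frequency(n, is_even, roll))
```

As n increases, the printed relative frequencies cluster ever more tightly around 0.5, which is exactly the stabilization that motivates defining Pr(A) as the limiting value.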
Independence and conditional probabilities

Conditional probability. Let A and B be two events, with Pr(B) > 0. The conditional probability of A given B is defined by

Pr(A | B) = Pr(A ∩ B) / Pr(B).

Example. We know that a family has two children and that (at least) one of these is a boy. Determine the probability that the other child is also a boy.

Example (cont.). We know that a family has two children and that (at least) one of these is a boy born on a Tuesday. Determine the probability that the other child is also a boy.

Independence. Two events A and B are independent iff the probability of their intersection equals the product of their marginal probabilities, that is, iff

Pr(A ∩ B) = Pr(A) Pr(B).

Random variables

For a random experiment we are in general not interested in the events of ℱ themselves, but rather in some real-valued function of them.

Definition. A random variable X is a (measurable) real-valued function X : Ω → ℝ.

Example (supporters of the royal house). A random sample (of n individuals) is taken from the Swedish electorate in order to estimate the proportion of supporters of the royal house. Let

X = the number of supporters of the royal house in the sample.

It follows that Ω is the set of all possible n-samples from the Swedish electorate, and X(ω) is the number of supporters in the sample ω. Since X is a discrete random variable we seek the probability function of X. Because X : Ω → {0, 1, 2, …, n} it is sufficient to find Pr(X = x) for x = 0, 1, 2, …, n.

Probability distributions

Let x be a real value. It then follows that {X ≤ x} = {ω ∈ Ω : X(ω) ≤ x} is an event, so Pr(X ≤ x) is well defined. To properly describe the probability distribution of X we need Pr(X ∈ B) for all (Borel) sets B ⊆ ℝ. However, it suffices to know Pr(X ∈ B) for all B = (−∞, x] where x ∈ ℝ.

Definition.
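The two "two children" examples can be checked directly against the definition of conditional probability by enumerating equally likely outcomes (a sketch added here, not part of the original slides; it assumes sexes and weekdays of birth are independent and uniform, with each child encoded as a (sex, weekday) pair). Conditioning on "at least one boy" gives 1/3, while the seemingly similar condition "at least one boy born on a Tuesday" gives 13/27.

```python
from fractions import Fraction
from itertools import product

# Each child: (sex, weekday), with all 2 * 7 = 14 combinations equally likely.
# Weekday 0 stands for Tuesday.
children = list(product("BG", range(7)))
families = list(product(children, children))  # 14 * 14 equally likely families

def cond_prob(event, given):
    """Pr(event | given) = Pr(event ∩ given) / Pr(given), by counting outcomes."""
    given_fams = [f for f in families if given(f)]
    hits = [f for f in given_fams if event(f)]
    return Fraction(len(hits), len(given_fams))

both_boys = lambda f: all(child[0] == "B" for child in f)
at_least_one_boy = lambda f: any(child[0] == "B" for child in f)
boy_on_tuesday = lambda f: any(child == ("B", 0) for child in f)

print(cond_prob(both_boys, at_least_one_boy))  # 1/3
print(cond_prob(both_boys, boy_on_tuesday))    # 13/27
```

The counts behind the Tuesday answer: 14² − 13² = 27 families contain a Tuesday boy, of which 7² − 6² = 13 have two boys; the extra piece of information shifts the probability from 1/3 toward 1/2.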
The distribution function, F_X, of the random variable X is given by

F_X(x) = Pr(X ≤ x), −∞ < x < ∞.

Closely related to the distribution function is the probability function, p_X, where p_X(x) = Pr(X = x), for discrete random variables, and the probability density function, f_X, for continuous random variables. We have that

F_X(x) = Σ_{k ≤ x} p_X(k) (discrete case)

and

F_X(x) = ∫_{−∞}^{x} f_X(t) dt (continuous case).

Example (cont.). Supporters of the royal house

Let p be the (unknown) proportion of supporters of the royal house in the Swedish electorate (population). For practical purposes we consider the size of the population to be infinite. Combinatorial results then help us to derive the following probabilities for X:

Pr(X = x) = C(n, x) p^x (1 − p)^(n − x), x = 0, 1, 2, …, n.

This probability function is a representation of a probability distribution called the binomial distribution.

Expectation and variance

Often we do not require a complete description of the probability distribution, but rather a summary of it. The most common measure of location is the mean or expected value of X,

E[X] = Σ_x x p_X(x) (discrete case), E[X] = ∫ x f_X(x) dx (continuous case).

The most common measure of dispersion is the variance of X,

Var(X) = E[(X − E[X])²].

A (very) brief description of some common (families of) probability distributions

A common situation for (important) discrete probability distributions is that their probability functions can be derived using sampling from an urn with only two types of balls (e.g. white and black balls). This includes: the Bernoulli distribution, the Binomial distribution, the Geometric distribution, the First success distribution, the Negative binomial distribution, and the Hypergeometric distribution.

The Poisson process gives rise to the important Poisson distribution. However, Poisson processes also give rise to a number of important continuous probability distributions (or at least special cases of them). This includes: the Exponential distribution, the Gamma distribution, and the Beta distribution.
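As a numerical check of the binomial probability function and of the mean and variance as summaries of a distribution (a sketch added here, not part of the original slides; n = 10 and p = 0.3 are made-up illustration values, not estimates from any real sample), one can verify that the probability function sums to one and that E[X] = np and Var(X) = np(1 − p) for X ~ Bin(n, p).

```python
from math import comb

def binom_pmf(x, n, p):
    """Pr(X = x) = C(n, x) * p^x * (1 - p)^(n - x) for X ~ Bin(n, p)."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

n, p = 10, 0.3  # hypothetical sample size and support proportion
pmf = [binom_pmf(x, n, p) for x in range(n + 1)]

total = sum(pmf)                                      # should be 1
mean = sum(x * q for x, q in enumerate(pmf))          # E[X] = n * p
var = sum((x - mean) ** 2 * q for x, q in enumerate(pmf))  # Var(X) = n * p * (1 - p)

print(total, mean, var)  # ≈ 1.0, 3.0, 2.1
```

The same summation pattern gives the distribution function: F_X(x) is just `sum(pmf[:k + 1])` for the largest integer k ≤ x, which is the discrete case of the F_X formulas above.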