ESS011 Mathematical statistics and signal processing
Lecture 6: The random variable, its probability law and first two moments
Tuomas A. Rajala
Chalmers TU
March 27, 2014

Course ESS011 (2014) Lecture 6: The random variable, its probability law and first two moments

Where are we

Over the past three lectures we have learned how to talk about randomness, collected the basic tools of probability theory, and learned how to manipulate given probabilities to form new ones.

Now let's talk about modeling the events. This lecture is a crash course in the basic definitions and terminology of models for random outcomes.

Motivation

How would you describe the roll of a die? "1 out of 6", "equally likely", ... Perhaps in a figure: $p_k = 1/6$ for $k = 1, \dots, 6$.

[Figure: bar chart of P(outcome is x) against the outcome of a die = x]

How would you describe the coin toss? "50-50", "equally likely", ... $p = 0.5$.

How would you describe an experiment with outcomes 1, ..., 25 that has the probabilities given in the figure?

[Figure: bar chart of P(outcome is x) against outcome = x, for x = 1, ..., 25]

Random variable: layman's definition

We have been describing the outcome of an experiment with events. Now let's describe events using variables.

Variable = an element, feature, or factor that is liable to vary or change.

Random variable (r.v.): A variable which describes the outcome of a random experiment as a number.

We denote random variables before the experiment by uppercase letters, e.g. $X$, $Y$, and their observed values after the experiment by lowercase letters, e.g. $x = 0$, $y = 4.5$.

One must encode the outcomes into numbers, e.g. heads = 0, tails = 1.
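The encoding step can be illustrated with a minimal Python sketch. It treats the random variable as a plain function from outcomes to numbers, using the heads = 0, tails = 1 convention from the slide. The names `ENCODING`, `X` and `toss`, the seed, and the number of repetitions are illustrative choices, not part of the lecture.

```python
import random

# Encoding of the coin outcomes as numbers (heads = 0, tails = 1),
# following the convention on the slide.
ENCODING = {"heads": 0, "tails": 1}

def X(outcome):
    """Random variable: maps an outcome of the experiment to a number."""
    return ENCODING[outcome]

def toss(rng):
    """Perform the random experiment once and return the raw outcome."""
    return rng.choice(["heads", "tails"])

rng = random.Random(2014)  # fixed seed (arbitrary) so the run is repeatable
observed = [X(toss(rng)) for _ in range(10)]  # realisations x of X
print(observed)
```

Before the experiment we talk about `X`; each entry of `observed` plays the role of a lowercase value `x`.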
Random variable: mathematical definition *

Mathematical definition: Let $S$ be a non-empty sample space, $(S, \mathcal{S}, P)$ a probability space, and $\mathcal{B}(\mathbb{R})$ the set of all Borel sets of $\mathbb{R}$ (all "nice" subsets).

Random variable (r.v.): A function $X : S \to \mathbb{R}$ is called a random variable if

$$X^{-1}(B) = \{s : X(s) \in B\} \in \mathcal{S} \quad \text{for all } B \in \mathcal{B}(\mathbb{R}).$$

The mathematical definition says that $X$ is $\mathcal{S}$-measurable, which means that whatever $X$ does, $P$ can handle it.

Discrete and continuous random variables

We classify random variables based on their outcome properties.

Discrete random variable: If a r.v. $X$ takes values on a finite or countably infinite set (one that can be mapped to $\mathbb{N}$), it is called a discrete r.v.

Continuous random variable: If a r.v. $X$ takes values on an uncountably infinite set, it is called a continuous r.v.

E.g.: the number of people in a queue (discrete), the time we need to wait (continuous).

We study only these two. Others include: processes, graphs, sequences, fields, point patterns...

Examples of random variables

Die cast: $X(s) = s$. Then $P(\{\text{score } 5\}) = P(\{s : X(s) = 5\})$, or $P(X = 5)$ for short.

Coin sequence (see book example 3.1.2): Let's toss a coin repeatedly and observe when the first tails comes up. Denote this random number by $Y = 1, 2, 3, \dots$. Then

$$P(\text{first tails at toss } k) = P(Y = k).$$

Continuous: $X$ = the height of a male Swede. The graph could depict a rough model for its probabilities.

[Figure: curve labelled P(height = x) against height = x (in meters), from 1.4 to 2.2]

Distribution: discrete

From possible to probable: the likelihood of the outcomes a r.v. can have.
The set of all probabilities $\{P(X = s_i) : \forall s_i\}$ is called the distribution of a discrete r.v. $X$.

For example, dice: $\{1, 2, 3, 4, 5, 6\} \to \{1/6, 1/6, \dots, 1/6\}$.

The coin sequence: As we have $P(\text{tails}) = P(\text{not tails})$, and the tosses are independent,

$$P(Y = 1) = \frac{1}{2}, \qquad P(Y = 2) = P(\text{1st not tails})\,P(\text{2nd is tails}) = \frac{1}{2} \cdot \frac{1}{2},$$

and so forth (see later), the distribution is

$$\{1, 2, 3, \dots\} \to \left\{\frac{1}{2}, \frac{1}{4}, \frac{1}{8}, \dots\right\}.$$

Density: discrete

The listing of the distribution is practical only for small sample spaces. Often we can (and want to) describe the distributions by special functions: define

Discrete density: Let $X$ be a discrete r.v. The function $f$ such that

$$f(x) = P(X = x), \quad x \in \mathbb{R},$$

is called the density function of $X$.

Also known as the probability mass function (pmf) or the point mass function.

The density defines the distribution, and vice versa.

Density: continuous

For a continuous random variable $X$ this is a bit trickier: For any single value $x \in \mathbb{R}$ we unfortunately have $P(X = x) = 0$ (height exactly 1.8?). We define the density as follows:

Continuous density: The function $f$ is called the density of a continuous r.v. $X$, iff

1. $f(x) \ge 0$ for all $x \in \mathbb{R}$
2. $\int_{-\infty}^{\infty} f(x)\,dx = 1$
3. $P(a \le X \le b) = \int_a^b f(x)\,dx$ for $a, b \in \mathbb{R}$.

The density of continuous $X$ is equated to its distribution.

Remember: For continuous $X$ we have $P(X = x) = 0$ for every single value $x$, even if $f(x) > 0$!

Density: notes

Here is a check-list for a function to be a density.
Discrete density: A real valued function $f$ is a discrete density, iff

1. $f(x) \ge 0$ for all $x$
2. $\sum_{\text{all } x} f(x) = 1$

Continuous density: A real valued function $f$ is a continuous density, iff

1. $f(x) \ge 0$ for all $x$
2. $\int_{-\infty}^{\infty} f(x)\,dx = 1$

Notes:

We will only consider sample points for which $f(x) > 0$, and where $f$ is not defined we assume $f(x) = 0$.

Continuous $X$ $\Leftrightarrow$ continuous function $f$.

We often write $X \sim f$ when $X$ has density $f$.

Example continues

For the coin sequence: Due to the independence, $f(k) := P(Y = k) = 1/2^k$, $k = 1, 2, \dots$. Is it a density? The values are positive, so we only need to check that it sums to 1:

$$\sum_{k=1}^{\infty} f(k) = \sum_{k=1}^{\infty} (1/2)^k$$

is a converging geometric series with limit

$$\frac{1/2}{1 - 1/2} = 1,$$

so $f$ is a density for $Y$.

[Figure: bar chart of P(Y=k) against Y = k, for k = 1, ..., 26]

Cumulative distribution function

Another useful function:

Cumulative distribution function (CDF): Let $X$ be a r.v. with density $f$. Then the function $F$ with

$$F(a) := P(X \le a), \quad a \in \mathbb{R},$$

is called the cumulative distribution function (CDF) of $X$.

The connection between density and CDF is

For discrete $X$:

$$F(a) = \sum_{x \le a} f(x)$$

For continuous $X$:

$$F(a) = \int_{-\infty}^{a} f(x)\,dx, \qquad f(x) = F'(x)$$

Note that for continuous $X$, $P(X \le a) = P(X < a)$, as $P(X = a) = 0$.

Examples contd.

Coin toss: How many times must we toss the coin so that the probability of having seen tails is $\ge 99\%$? I.e. for what $k$ is $P(Y \le k) \ge 0.99$? First,

$$F(a) = \sum_{k \le a} f(k) = \sum_{k \le a} 1/2^k.$$

It is a truncated geometric series and, for integer $a \ge 1$, has the finite form $F(a) = 1 - 1/2^a$. The solution is given by $F(a) \ge 0.99 \Leftrightarrow 2^a \ge 100 \Leftrightarrow a \ge \log_2 100 \approx 6.64$, so 7 tosses are needed.

Heights of men: What is the probability that a randomly picked Swedish man is between $a$ and $b$ meters tall?
$$P(a < X < b) = \int_{-\infty}^{b} f(x)\,dx - \int_{-\infty}^{a} f(x)\,dx = F(b) - F(a).$$

***

Expectation

We are often interested in only some aspects of distributions. Most often we want to know the average behaviour: what we would get on average if we repeated the experiment.

Example of heights: You are selling life jackets, one size only. What is the average size of a male Swede?

Butterflies example (i.e. coins): You are a lepidopterist and want to study at least one male of a special butterfly species. Your trap catches males and females with ratio 50:50, but only one at a time. How many do you need to catch on average to get that one male?

Expectation: general

Note: If $X$ is a random variable and $h$ is a real valued function, $h(X)$ is also a random variable. E.g. for the life jackets the chest diameter could be some $h(x)$, where $x$ is height. Then we can ask for the expected value of $h(X)$.

Expectation (general): For a random variable $X$ with a density $f$, and a function $h$, the expected value of $h(X)$ is defined as

$$E[h(X)] = \sum_x h(x) f(x) \quad \text{(discrete)}$$

$$E[h(X)] = \int_{-\infty}^{\infty} h(x) f(x)\,dx \quad \text{(continuous)}$$

Technical condition: $E|h(X)| < \infty$.

Expected value

The quantity most often asked for describes the expected outcome of $X$, i.e.
$h(x) = x$:

Expected value: For a random variable $X$ with a density $f$ the expected value is defined as

$$E(X) = \sum_x x f(x) \quad \text{(discrete)}$$

$$E(X) = \int_{-\infty}^{\infty} x f(x)\,dx \quad \text{(continuous)}$$

Notes:

Theoretical average value: where the distribution is concentrated.

Called "expected value", "mean" or "average"; when data is involved, most often "average".

Usually denoted by $\mu$.

Useful equations: $Ec = c$ and $E(cX) = cE(X)$ for any constant $c$.

Example of expectation

The butterflies (i.e. coins): $Y$ is the time of the first male. We need on average

$$E(Y) = \sum_{k=1}^{\infty} k f(k) = \sum_{k=1}^{\infty} k/2^k = \dots = 2$$

samples (proof later).

Consider the following distribution (book Table 3.6) of heartbeat rates $S = \{40, 60, 68, 70, 72, 80, 100\}$:

$$S \to \{.01, .04, .05, .8, .05, .04, .01\}$$

[Figure: bar chart of P(X=x) against heart beat rate X = x, with a dashed line at the mean]

The average is $\mu = 70$ (dashed line).

But another distribution could be

$$S \to \{.4, .05, .04, .02, .04, .05, .4\}$$

[Figure: bar chart of P(X=x) against heart beat rate X = x]

The average is also $\mu = 70$.

Variance

Expected value is only one descriptor of the distribution. A measure of variability provides more information. The most used one is

Variance: Let $X$ be a random variable with density $f$. Then

$$\mathrm{Var}(X) := E\big[(X - E(X))^2\big]$$

is called the variance.

It measures the expected squared deviation from the mean.

Common notation: $\mathrm{Var}(X) = \sigma^2$.

Large variance = large deviations from the mean are to be expected.

Useful formulas: $\mathrm{Var}(c) = 0$ and $\mathrm{Var}(cX) = c^2\,\mathrm{Var}(X)$ for a constant $c$.

Variance example

Example: For butterflies $\mathrm{Var}(Y) = 2$ (details later).

The following formula is useful for computation:

$$\mathrm{Var}(X) = E(X^2) - [E(X)]^2$$

Proof: Expand the square inside the definition.
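The proof hint can be spelled out in a few lines. This is a sketch that writes $\mu = E(X)$ and assumes that expectation is additive, $E(A + B) = E(A) + E(B)$; additivity follows from the sum or integral definition of the expectation but is not stated explicitly on the slides.

```latex
\begin{align*}
\mathrm{Var}(X) &= E\big[(X - \mu)^2\big] \\
 &= E\big[X^2 - 2\mu X + \mu^2\big] \\
 &= E(X^2) - 2\mu E(X) + \mu^2 \\
 &= E(X^2) - 2\mu^2 + \mu^2 \\
 &= E(X^2) - [E(X)]^2 .
\end{align*}
```

The third line uses the two useful equations for the expected value, $E(cX) = cE(X)$ with $c = -2\mu$ and $Ec = c$ with $c = \mu^2$.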
Example: For the heartbeats 1, $E(X^2) = 4926.4$, so

$$\mathrm{Var}_1(X) = 4926.4 - 70^2 = 26.4.$$

For the heartbeats 2, $E(X^2) = 5630.32$, so

$$\mathrm{Var}_2(X) = 5630.32 - 70^2 = 730.32.$$

Definitely not the same distribution.

Standard deviation

Variance is a squared difference, so it does not share the units of the random variable. More common in practice is to talk about

Standard deviation (sd): For a r.v. $X$ with variance $\sigma^2$ the quantity

$$\sigma = \sqrt{\mathrm{Var}(X)} = \sqrt{\sigma^2}$$

is called the standard deviation.

Example:

heartbeats 1: sd $\sigma_1 = 5.14$ beats per minute

heartbeats 2: sd $\sigma_2 = 27.02$ beats per minute

Rule of thumb: roughly 95% of the time the outcome falls within two $\sigma$ of $\mu$.

Summary

Many things today:

Random variable: a model for events

Distribution: probabilities of random variables

Density and cumulative distribution function: compact expressions for a distribution

Expectation/mean: theoretical average number

Variance/standard deviation: scale of variation around the mean

The figure on the first slide: Density of a random variable with a Poisson distribution, mean 9 and sd 3.

[Figure: the bar chart of P(outcome is x) against outcome = x, for x = 1, ..., 25]

Next time we look at some of the commonly used families of distributions.
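As a numerical cross-check of the worked examples in this lecture, here is a minimal Python sketch. It recomputes the mean, variance and standard deviation of the two heartbeat distributions, and the coin-sequence (butterflies) results. The helper names `mean`, `variance` and `geometric_pmf` are illustrative and not from the course material. Two assumptions are made: the third heartbeat rate is taken to be 68, the value that reproduces $E(X^2) = 4926.4$, and the infinite sums are truncated at 200 terms, which only affects digits far beyond those shown.

```python
import math

def mean(values, probs):
    """E(X) = sum of x * f(x) over all outcomes (discrete case)."""
    return sum(x * p for x, p in zip(values, probs))

def variance(values, probs):
    """Var(X) = E(X^2) - [E(X)]^2, the computational formula."""
    second_moment = sum(x * x * p for x, p in zip(values, probs))
    return second_moment - mean(values, probs) ** 2

# Heartbeat rates and the two densities from the expectation example.
# Assumption: the third rate is 68 (see the lead-in).
rates = [40, 60, 68, 70, 72, 80, 100]
heartbeats_1 = [0.01, 0.04, 0.05, 0.80, 0.05, 0.04, 0.01]
heartbeats_2 = [0.40, 0.05, 0.04, 0.02, 0.04, 0.05, 0.40]

summary = {}
for label, probs in (("heartbeats 1", heartbeats_1),
                     ("heartbeats 2", heartbeats_2)):
    assert math.isclose(sum(probs), 1.0)  # density check: sums to 1
    var = variance(rates, probs)
    summary[label] = (mean(rates, probs), var, math.sqrt(var))
    print(label, [round(v, 2) for v in summary[label]])

# Coin sequence / butterflies: f(k) = 1/2^k for k = 1, 2, ...
def geometric_pmf(k):
    return 0.5 ** k

ks = range(1, 201)  # truncation of the infinite sums (assumption)
geo_mean = sum(k * geometric_pmf(k) for k in ks)
geo_var = sum(k * k * geometric_pmf(k) for k in ks) - geo_mean ** 2

# Smallest number of tosses k with F(k) = 1 - 1/2^k >= 0.99.
tosses = next(k for k in ks if 1 - geometric_pmf(k) >= 0.99)
print(round(geo_mean, 6), round(geo_var, 6), tosses)
```

Under these assumptions the rounded results should agree with the slides: mean 70 for both heartbeat distributions, variances 26.4 and 730.32, standard deviations 5.14 and 27.02, and for the coin sequence a mean of 2, a variance of 2, and 7 tosses for the 99% question.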