Student Lecture Notes
IV. Random Variables
PBAF 527, Winter 2005

Learning Objectives
1. Distinguish Between the Two Types of Random Variables
2. Discrete Random Variables
   1. Describe Discrete Random Variables
   2. Compute the Expected Value & Variance of Discrete Random Variables
3. Continuous Random Variables
   1. Describe Normal Random Variables
   2. Introduce the Normal Distribution
   3. Calculate Probabilities for Continuous Random Variables
   4. Assessing Normality

Random Variables
• A variable defined by the probabilities of each possible value in the population.

Data Types
• Data
  • Numerical
    • Discrete
    • Continuous
  • Qualitative

Types of Random Variables
Discrete Random Variable
• Whole Number (0, 1, 2, 3, etc.)
• Countable, Finite Number of Values
  • Jumps from one value to the next and cannot take any values in between.
Continuous Random Variable
• Whole or Fractional Number
• Obtained by Measuring
• Infinite Number of Values in Interval
  • Too Many to List Like a Discrete Variable

Discrete Random Variable Examples
Experiment                                 Random Variable    Possible Values
Children of One Gender in Family           # Girls            0, 1, 2, ..., 10?
Open Check-in Lines                        # Open             0, 1, 2, ..., 8
Answer 33 Questions                        # Correct          0, 1, 2, ..., 33
Count Cars at Toll Between 11:00 & 1:00    # Cars Arriving    0, 1, 2, ..., ∞

Discrete Probability Distribution
1. List of All Possible [x, p(x)] Pairs
   • x = Value of Random Variable (Outcome)
   • p(x) = Probability Associated with Value
2. Mutually Exclusive (No Overlap)
3. Collectively Exhaustive (Nothing Left Out)
4. 0 ≤ p(x) ≤ 1
5. Σ p(x) = 1

Marilyn says: It may sound strange, but more families of 4 children have 3 of one gender and one of the other than any other combination. Explain this.
Construct a sample space, count the number of ways each event can occur out of the total number of possible combinations, and calculate the relative frequencies.
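The counting described above can be sketched in a few lines of Python. This is only an illustration, assuming each child is independently a boy or a girl with probability 1/2; the names `sample_space` and `split` are made up for this example.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# All 2**4 = 16 birth orders, assumed equally likely (B = boy, G = girl).
sample_space = ["".join(kids) for kids in product("BG", repeat=4)]


def split(outcome):
    """Return the gender split as (larger count, smaller count), e.g. 'BGBB' -> (3, 1)."""
    girls = outcome.count("G")
    boys = len(outcome) - girls
    return (max(girls, boys), min(girls, boys))


# Number of birth orders that produce each split.
counts = Counter(split(outcome) for outcome in sample_space)

# Relative frequency of each split out of the 16 outcomes.
probabilities = {s: Fraction(n, len(sample_space)) for s, n in counts.items()}

for s in sorted(probabilities, reverse=True):
    print(s, counts[s], probabilities[s])
```

The 3-and-1 split wins because eight of the sixteen birth orders produce it, against six for 2-and-2 and two for 4-and-0.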
Sample Space
BBBB
GBBB  BGBB  BBGB  BBBG
GGBB  GBGB  GBBG  BGGB  BGBG  BBGG
BGGG  GBGG  GGBG  GGGB
GGGG

• Are all 16 combinations equally likely? Is the sex of each child independent of the other three?
  P(girl) = 1/2 and P(boy) = 1/2, so P(BBBB) = ½ × ½ × ½ × ½ = 1/16.
• If you have a family of four children, what is the probability of…
  P(all girls or all boys) = 2/16 = 1/8
  P(2 boys, 2 girls) = 6/16 = 3/8 (six different ways to have 2 boys and 2 girls)
  P(3 boys, 1 girl or 3 girls, 1 boy) = 8/16 = 1/2 (eight ways to have 3 of one and 1 of the other)

Assume the random variable X represents the number of girls in a family of 4 kids. (Lower case x is a particular value of X, i.e., x = 3 girls in the family.)

Sample Space                              Random Variable X
BBBB                                      x = 0
GBBB, BGBB, BBGB, BBBG                    x = 1
GGBB, GBGB, GBBG, BGGB, BGBG, BBGG        x = 2
BGGG, GBGG, GGBG, GGGB                    x = 3
GGGG                                      x = 4

Number of Girls, x    Probability, P(x)
0                     1/16
1                     4/16
2                     6/16
3                     4/16
4                     1/16
Total                 16/16 = 1.00

What is the probability of exactly 3 girls in 4 kids? P(X = 3) = 4/16
What is the probability of at least 3 girls in 4 kids? P(X ≥ 3) = P(X = 3) + P(X = 4) = 4/16 + 1/16 = 5/16

Visualizing Discrete Probability Distributions
• Listing: {(0, 1/16), (1, 1/4), (2, 3/8), (3, 1/4), (4, 1/16)}
• Table: the Number of Girls table above
• Graph: [Figure: bar chart of Probability, P(x), against Number of Girls, x, with bars 1/16, 4/16, 6/16, 4/16, 1/16]

X is random and x is fixed. We can calculate the probability that different values of X will occur and make a probability distribution.

Probability Distributions
• Probability distributions can be drawn as probability histograms.
• Cumulative probabilities: adding up the probabilities of a range of values.
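As a rough check on the table and on the idea of cumulative probabilities, the distribution of X can be held in a dictionary and summed over any range of values. The names `pmf`, `cdf`, and `prob` are illustrative only.

```python
from fractions import Fraction
from itertools import accumulate

# Distribution of X = number of girls among 4 children, copied from the table.
pmf = {
    0: Fraction(1, 16),
    1: Fraction(4, 16),
    2: Fraction(6, 16),
    3: Fraction(4, 16),
    4: Fraction(1, 16),
}

# A valid discrete distribution has 0 <= p(x) <= 1 and probabilities summing to 1.
assert all(0 <= p <= 1 for p in pmf.values())
assert sum(pmf.values()) == 1

# Cumulative probabilities P(X <= x), built by adding up the p(x) values in order.
cdf = dict(zip(pmf, accumulate(pmf.values())))


def prob(event):
    """Add up p(x) over every value x for which event(x) is true."""
    return sum(p for x, p in pmf.items() if event(x))


p_exactly_3 = prob(lambda x: x == 3)   # P(X = 3)
p_at_least_3 = prob(lambda x: x >= 3)  # P(X >= 3) = P(X = 3) + P(X = 4)

print(p_exactly_3, p_at_least_3, cdf[2])
```

Using `Fraction` keeps the sixteenths exact, so the results can be compared directly with the fractions on the slide.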
Washington State Population Survey and Random Variables
A telephone survey of households throughout Washington State. But some households don’t have phones.

Number of Telephones, x    P(x)
0                          0.03500
1                          0.70553
2                          0.21769
3                          0.02966
4                          0.00775
5                          0.00332
6                          0.00088
7                          0.00002
8                          0.00000
9                          0.00015
Total                      1.00000

Probabilities about Telephones in Washington State
• What is the probability that a household will have no telephone?
• What is the probability that a household will have 2 or more telephone lines?
• What is the probability that a household will have 2 to 4 phone lines?
• What is the probability that a household will have no phone lines or more than 4 phone lines?
• Who do you think is in that 3.5% of the population?
• What are the implications of this for the quality of the survey?

Probability Histogram of Telephone Lines, 1998
[Figure: probability histogram of P(x) against Number of Telephone Lines (x), 0 through 9; labeled bars 0.04, 0.71, 0.22, 0.03, 0.01, 0.00, 0.00]

Summary Measures
1. Expected Value (µ, "mu")
   • Mean of Probability Distribution
   • Weighted Average of All Possible Values
   • µ = E(X) = Σ x p(x)
2. Variance (σ², "sigma-squared")
   • Weighted Average Squared Deviation about Mean
   • σ² = V(X) = E[(x − µ)²] = Σ (x − µ)² p(x)
   • σ² = V(X) = E(X²) − [E(X)]²
3. Standard Deviation
   • σ = √σ² = SD(X)

What is the average number of telephones in Washington households, and how much does the number vary from the average?
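One way to work this out is to apply the Summary Measures formulas directly to the P(x) column of the survey table. The sketch below assumes the P(x) values read 0.03500, 0.70553, 0.21769, and so on for x = 0 through 9; the variable names are illustrative.

```python
import math

# P(x) for x = 0..9 telephone lines, as read from the survey table (assumed ordering).
p = [0.03500, 0.70553, 0.21769, 0.02966, 0.00775,
     0.00332, 0.00088, 0.00002, 0.00000, 0.00015]

# Expected value: mu = sum of x * p(x).
mean = sum(x * px for x, px in enumerate(p))

# Approach 1: variance as the weighted average squared deviation about the mean.
var_definition = sum((x - mean) ** 2 * px for x, px in enumerate(p))

# Approach 2: variance as E(X^2) - [E(X)]^2.
var_shortcut = sum(x ** 2 * px for x, px in enumerate(p)) - mean ** 2

# Standard deviation is the square root of the variance.
sd = math.sqrt(var_definition)

print(round(mean, 2), round(var_definition, 2), round(var_shortcut, 2), round(sd, 2))
```

Both variance formulas give the same number up to floating-point error, which is why the notes treat them as interchangeable approaches.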
# of                                        Approach 1: Variance               Approach 2: Variance
Phones x    Frequency    P(x)    xP(x)     (x−µ)    (x−µ)²    (x−µ)²P(x)     x²     x²P(x)
0             198,286    0.04    0.00      -1.3      1.65      0.06           0     0.00
1           4,142,030    0.71    0.71      -0.3      0.08      0.06           1     0.71
2           1,278,026    0.22    0.44       0.7      0.51      0.11           4     0.87
3             174,110    0.03    0.09       1.7      2.94      0.09           9     0.27
4              45,499    0.01    0.03       2.7      7.38      0.06          16     0.12
5              19,473    0.00    0.02       3.7     13.81      0.05          25     0.08
6               5,170    0.00    0.01       4.7     22.24      0.02          36     0.03
7                 118    0.00    0.00       5.7     32.67      0.00          49     0.00
8                   -    0.00    0.00       6.7     45.10      0.00          64     0.00
9                 897    0.00    0.00       7.7     59.53      0.01          81     0.01
Sum         5,863,609    1.00    µ=1.28    32.16               σ²=0.45              2.10

The average is µ = 1.28 telephones per household. Both approaches give σ² ≈ 0.45 when computed from unrounded values, so σ = √0.45 ≈ 0.67 telephones.

Chebyshev’s Rule and Empirical Rule for a Discrete Random Variable
Let x be a discrete random variable with probability distribution p(x), mean µ, and standard deviation σ. Then, depending on the shape of p(x), the following probability statements can be made:

                            Chebyshev’s Rule                        Empirical Rule
                            Applies to any probability              Applies to probability distributions
                            distribution (e.g., telephones          that are mound-shaped and symmetric
                            in Washington State)                    (e.g., girls born of 4 children)
P(µ − σ < x < µ + σ)        ≥ 0                                     ≈ .68
P(µ − 2σ < x < µ + 2σ)      ≥ 3/4                                   ≈ .95
P(µ − 3σ < x < µ + 3σ)      ≥ 8/9                                   ≈ 1.00

Continuous Random Variable
• A variable with many possible values at all intervals

Continuous Random Variable Examples
Experiment                       Random Variable       Possible Values
Weigh 100 People                 Weight                45.1, 78, ...
Measure Part Life                Hours                 900, 875.9, ...
Ask Food Spending                Spending              54.12, 42, ...
Measure Time Between Arrivals    Inter-Arrival Time    0, 1.3, 2.78, ...

Continuous Probability Density Function
1. Mathematical Formula
2. Shows All Values, x, & Frequencies, f(x)
   • f(x) Is Not Probability
3. Properties
   • Area under curve sums to 1
   • Can add up areas of function to get probability less than a specific value
[Figure: density curve of Frequency, f(x), against Value, x, with points a and b marked on the x-axis]

Continuous Random Variable Probability
Probability Is Area Under Curve!
[Figure: density curve f(x) with the area between c and d shaded, representing P(c ≤ x ≤ d)]

Continuous Probability Distribution Models
• Uniform
• Normal
• Exponential

Importance of Normal Distribution
1. Describes Many Random Processes or Continuous Phenomena
2. Can Be Used to Approximate Discrete Probability Distributions
   • Example: Binomial
3. Basis for Classical Statistical Inference

Normal Distribution
1. ‘Bell-Shaped’ & Symmetrical
2. Mean, Median, Mode Are Equal
3. ‘Middle Spread’ (interquartile range) Is 1.33σ
4. Random Variable Has Infinite Range
[Figure: bell curve f(X) against X, with Mean, Median, and Mode at the center]

Normal Distribution Useful Properties
• About half of the “weight” is below the mean (because the curve is symmetrical)
• About 68% of probability is within 1 standard deviation of the mean (at the inflection points of the curve)
• About 95% of probability is within 2 standard deviations
• More than 99% of probability is within 3 standard deviations
[Figure: bell curve f(X) with the x-axis marked at µ − 3σ, µ − 2σ, µ − σ, µ, µ + σ, µ + 2σ, µ + 3σ]

Probability Density Function
$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^{2}}$
x = Value of Random Variable (−∞ < x < ∞)
σ = Population Standard Deviation
π = 3.14159
e = 2.71828
µ = Mean of Random Variable x
Don’t memorize this!

Notation
X is N(µ, σ)
The random variable X has a normal distribution (N) with mean µ and standard deviation σ.
Examples: X is N(40, 1); X is N(10, 5); X is N(50, 3)

Effect of Varying Parameters (µ & σ)
[Figure: three normal curves, A, B, and C, with different means and standard deviations]

Normal Distribution Probability
Probability is area under curve!
$P(c \le x \le d) = \int_{c}^{d} f(x)\, dx$

Infinite Number of Tables
Normal distributions differ by mean & standard deviation. Each distribution would require its own table. That’s an infinite number!
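To make "probability is area under the curve" concrete, the density formula can be integrated numerically. This sketch uses a plain trapezoid rule rather than a table; `normal_pdf` and `area` are names chosen for this example, and N(50, 3) is borrowed from the notation examples.

```python
import math


def normal_pdf(x, mu, sigma):
    """Normal density f(x) for mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))


def area(c, d, mu, sigma, steps=10_000):
    """Approximate P(c <= X <= d) as the area under f(x), using the trapezoid rule."""
    width = (d - c) / steps
    total = 0.5 * (normal_pdf(c, mu, sigma) + normal_pdf(d, mu, sigma))
    total += sum(normal_pdf(c + i * width, mu, sigma) for i in range(1, steps))
    return total * width


mu, sigma = 50, 3  # X is N(50, 3)

# Probability within 1, 2, and 3 standard deviations of the mean.
within = {k: area(mu - k * sigma, mu + k * sigma, mu, sigma) for k in (1, 2, 3)}

for k, p in within.items():
    print(k, round(p, 4))
```

Changing `mu` and `sigma` leaves these three areas unchanged, since they depend only on how many standard deviations the limits are from the mean.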
Standardize the Normal Distribution
Z = (X − µ)/σ
Z is N(0,1)
[Figure: a normal distribution of X with mean µ and standard deviation σ is mapped to the standardized normal distribution of Z with µ = 0 and σ = 1]
One table!

Standardizing Example
Normal distribution with µ = 5 and σ = 10; X = 6.2
Z = (X − µ)/σ = (6.2 − 5)/10 = .12

Obtaining the Probability
Standardized Normal Probability Table (Portion). Entries are the area between 0 and Z.

Z      .00      .01      .02
0.0    .0000    .0040    .0080
0.1    .0398    .0438    .0478
0.2    .0793    .0832    .0871
0.3    .1179    .1217    .1255

For Z = .12 (row 0.1, column .02), the probability is .0478.

Example: P(3.8 ≤ X ≤ 5), with µ = 5 and σ = 10
Z = (X − µ)/σ = (3.8 − 5)/10 = −.12
By symmetry, the area between Z = −.12 and Z = 0 is .0478.

Example: P(2.9 ≤ X ≤ 7.1), with µ = 5 and σ = 10
Z = (2.9 − 5)/10 = −.21
Z = (7.1 − 5)/10 = .21
P(−.21 ≤ Z ≤ .21) = .0832 + .0832 = .1664

Example: P(X ≥ 8), with µ = 5 and σ = 10
Z = (8 − 5)/10 = .30
P(Z ≥ .30) = .5000 − .1179 = .3821

Example: P(7.1 ≤ X ≤ 8), with µ = 5 and σ = 10
Z = (7.1 − 5)/10 = .21
Z = (8 − 5)/10 = .30
P(.21 ≤ Z ≤ .30) = .1179 − .0832 = .0347

Travel Time and the Normal Distribution
To help people plan their travel, WSDOT estimates that the average trip from Seattle to Bellevue at 5:40 pm (at peak) takes 11 minutes, with a standard deviation of 10 minutes. They also believe this travel time is approximately normally distributed.
What proportion of trips take less than 27 minutes?

Process
1. Draw a picture and write down the probability you need.
2. Convert the probability to standard scores.
3. Find the cumulative probability in the table.
For the 27-minute question: Z = (27 − 11)/10 = 1.6, and P(Z < 1.6) = .5 + .4452 = .9452, so about 94.5% of trips take less than 27 minutes.

More Travel Time
Suppose we have only 10 to 15 minutes to travel to Seattle from Bellevue.
What proportion of trips will make it in that time?
P(10 < X < 15) = P((10 − 11)/10 < Z < (15 − 11)/10)
               = P(−0.1 < Z < .4)
               = 1 − P(Z < −0.1) − P(Z > .4)
Since normal curves are symmetrical:
               = 1 − P(Z > .1) − P(Z > .4)
               = 1 − (.5 − .0398) − (.5 − .1554)
               = 1 − .4602 − .3446
               = .1952
19.5% of trips will take between 10 and 15 minutes.

Finding Z Values for Known Probabilities
What is Z given that the area between 0 and Z is .1217?
Standardized Normal Probability Table (Portion)

Z      .00      .01      .02
0.0    .0000    .0040    .0080
0.1    .0398    .0438    .0478
0.2    .0793    .0832    .0871
0.3    .1179    .1217    .1255

Reading the table in reverse, .1217 sits in row 0.3, column .01, so Z = .31.

Finding X Values for Known Probabilities
Normal distribution with µ = 5 and σ = 10; the area between µ and X is .1217, so Z = .31.
X = µ + Z·σ = 5 + (.31)(10) = 8.1

Travel Times Take 3
How much time will the trip take 99% of the time?

Finding Z Values for Known Probabilities
1. Write down the probability statement and draw a picture: P(Z < ____) = .99
2. Look up the Z value in the table: P(Z < 2.325) = .99
3. Convert the Z value (SD units) to the variable (X) by using the mean and SD: X = µ + Zσ, so X = 11 + (2.325)(10) = 34.25
So, 99% of trips can be made in 34.25 minutes or less.

Assessing Normality
1. A histogram of the data is mound-shaped and symmetrical about the mean.
2. Determine the percentage of measurements falling in each of the intervals x̄ ± s, x̄ ± 2s, and x̄ ± 3s. If the data are approximately normal, the percentages will be approximately equal to 68%, 95%, and 100%, respectively.
3. Find the interquartile range, IQR, and standard deviation, s, for the sample, then calculate the ratio IQR/s. If the data are approximately normal, then IQR/s ≈ 1.3.
4. Construct a normal probability plot for the data. If the data are approximately normal, the points will fall (approximately) on a straight line.

Assessing Normality: Is Class Height Normally Distributed?
1. How does the histogram look?
SPSS can draw the normal curve for you. In SPSS select GRAPH, HISTOGRAM. After you choose the variable you want, click on the box “Display Normal Curve” and you’ll get something that looks like this.
[Figure: histogram of Height 527 2005 (Frequency against Height, 60 to 72) with a normal curve overlaid; Mean = 66.52, Std. Dev. = 3.117, N = 23]

Assessing Normality: Is Class Height Normally Distributed?
2. Compute the intervals.
SPSS: ANALYZE, DESCRIPTIVE STATISTICS, FREQUENCIES

Height 527 2005
Height    Frequency    Percent    Valid Percent    Cumulative Percent
60        1            4.3        4.3              4.3
62        1            4.3        4.3              8.7
63        3            13.0       13.0             21.7
64        2            8.7        8.7              30.4
65        1            4.3        4.3              34.8
66        3            13.0       13.0             47.8
67        2            8.7        8.7              56.5
68        2            8.7        8.7              65.2
69        5            21.7       21.7             87.0
70        1            4.3        4.3              91.3
71        1            4.3        4.3              95.7
72        1            4.3        4.3              100.0
Total     23           100.0      100.0

Interval    Range             Anticipated Percent    Actual Percent
x̄ ± s      [63.40, 69.64]    68%                    43%
x̄ ± 2s     [60.29, 72.75]    95%                    96%
x̄ ± 3s     [57.17, 75.87]    100%                   100%

Note: in the frequency table, 15 of the 23 heights (64 through 69) fall inside x̄ ± s, which is about 65%. The 43% shown on the slide matches heights 64 through 68 only.

Assessing Normality: Is Class Height Normally Distributed?
3. Does IQR/s ≈ 1.3?
IQR = 69 − 64 = 5
IQR/s = 5/3.117 = 1.6
SPSS: ANALYZE, DESCRIPTIVE STATISTICS, FREQUENCIES, then click on STATISTICS and choose the ones you want.

Statistics: Height 527 2005
N Valid           23
N Missing         0
Std. Deviation    3.117
Percentile 25     64.00
Percentile 50     67.00
Percentile 75     69.00

Assessing Normality: Is Class Height Normally Distributed?
4. What does the normal probability plot look like?
SPSS: Graphs > Q-Q. Set the test distribution to normal and click “estimate distribution parameters from data.”
[Figure: Normal Q-Q Plot of Height 527 2005, Expected Normal Value against Observed Value]
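Checks 2 and 3 from the Assessing Normality slides can also be reproduced outside SPSS. The sketch below rebuilds the 23 class heights from the frequency table and uses Python's standard `statistics` module. The quartiles are taken from the SPSS output rather than recomputed, because quartile conventions differ between packages; the helper name `share_within` is made up for this example.

```python
import statistics

# Heights and their frequencies, copied from the SPSS frequency table.
freq = {60: 1, 62: 1, 63: 3, 64: 2, 65: 1, 66: 3,
        67: 2, 68: 2, 69: 5, 70: 1, 71: 1, 72: 1}
heights = [h for h, n in freq.items() for _ in range(n)]

mean = statistics.mean(heights)
s = statistics.stdev(heights)  # sample standard deviation (n - 1 in the denominator)


def share_within(k):
    """Fraction of heights inside the interval mean +/- k standard deviations."""
    lo, hi = mean - k * s, mean + k * s
    return sum(lo <= h <= hi for h in heights) / len(heights)


# Check 2: compare these with the anticipated 68%, 95%, and 100%.
for k in (1, 2, 3):
    print(k, round(100 * share_within(k), 1))

# Check 3: IQR / s, using the 25th and 75th percentiles reported by SPSS.
q1, q3 = 64, 69
iqr_ratio = (q3 - q1) / s
print(round(iqr_ratio, 1))
```

With these data the one-standard-deviation interval holds about 65% of the heights, and IQR/s comes out near 1.6 rather than 1.3, so the class heights are only roughly normal.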