IV. Random Variables
PBAF 527 Winter 2005

Learning Objectives
1. Distinguish Between the Two Types of Random Variables
2. Discrete Random Variables
   1. Describe Discrete Random Variables
   2. Compute the Expected Value & Variance of Discrete Random Variables
3. Continuous Random Variables
   1. Describe Normal Random Variables
   2. Introduce the Normal Distribution
   3. Calculate Probabilities for Continuous Random Variables
   4. Assessing Normality

Random Variables
• A variable defined by the probabilities of each possible value in the population.

Data Types
Data
  Numerical
    Discrete
    Continuous
  Qualitative

Types of Random Variables
Discrete Random Variable
• Whole Number (0, 1, 2, 3, etc.)
• Countable, Finite Number of Values
• Jumps from one value to the next and cannot take any values in between
Continuous Random Variable
• Whole or Fractional Number
• Obtained by Measuring
• Infinite Number of Values in Interval
• Too Many to List Like a Discrete Variable

Discrete Random Variable Examples
Experiment                                 Random Variable    Possible Values
Children of One Gender in Family           # Girls            0, 1, 2, ..., 10?
Open Check-in Lines                        # Open             0, 1, 2, ..., 8
Answer 33 Questions                        # Correct          0, 1, 2, ..., 33
Count Cars at Toll Between 11:00 & 1:00    # Cars Arriving    0, 1, 2, ...

Discrete Probability Distribution
1. List of All Possible [x, p(x)] Pairs
   x = Value of Random Variable (Outcome)
   p(x) = Probability Associated with Value
2. Mutually Exclusive (No Overlap)
3. Collectively Exhaustive (Nothing Left Out)
4. 0 ≤ p(x) ≤ 1
5. Σ p(x) = 1

Marilyn says: It may sound strange, but more families of 4 children have 3 of one gender and one of the other than any other combination.
Explain this. Construct a sample space, count the number of ways each event can occur out of the total number of possible combinations, and calculate the relative frequencies.

Sample Space
• Are all 16 combinations equally likely? Is the sex of each child independent of the other three?
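The sample-space construction suggested above can be sketched in a few lines of Python. This is only an illustration, assuming each child is independently a boy or a girl with probability 1/2; the variable names are made up for the example.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Assumption: each child is independently a boy (B) or a girl (G),
# each with probability 1/2, so all outcomes are equally likely.
sample_space = ["".join(kids) for kids in product("BG", repeat=4)]

# Count how many outcomes have 0, 1, 2, 3, or 4 girls.
girls_count = Counter(outcome.count("G") for outcome in sample_space)
n_outcomes = len(sample_space)

# Probability distribution of X = number of girls.
distribution = {x: Fraction(girls_count[x], n_outcomes) for x in sorted(girls_count)}

# "3 of one gender and 1 of the other" means exactly 1 girl or exactly 3 girls.
p_three_one = distribution[1] + distribution[3]
p_two_two = distribution[2]
p_all_same = distribution[0] + distribution[4]

for x, p in distribution.items():
    print(f"P(X={x}) = {p}")
print("3-and-1 split:", p_three_one, "| 2-and-2 split:", p_two_two, "| all same:", p_all_same)
```

Under these assumptions the 3-and-1 split covers 8 of the 16 outcomes, which is more than any other combination.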
P(girl) = 1/2 and P(boy) = 1/2, so P(BBBB) = ½ × ½ × ½ × ½ = 1/16.
• If you have a family of four children, what is the probability of…
  P(all girls or all boys) = 2/16 = 1/8
  P(2 boys, 2 girls) = 6/16 = 3/8 (six different ways to have 2 boys and 2 girls)
  P(3 boys, 1 girl or 3 girls, 1 boy) = 8/16 = 4/8 = 1/2 (eight ways to have 3 of one and 1 of the other)

BBBB GBBB BGBB BBGB BBBG GGBB GBGB GBBG BGGB BGBG BBGG BGGG GBGG GGBG GGGB GGGG

Assume the random variable X represents the number of girls in a family of 4 kids. (Lower-case x is a particular value of X, e.g., x = 3 girls in the family.)

Sample Space    Random Variable X
BBBB            x=0
GBBB            x=1
BGBB            x=1
BBGB            x=1
BBBG            x=1
GGBB            x=2
GBGB            x=2
GBBG            x=2
BGGB            x=2
BGBG            x=2
BBGG            x=2
BGGG            x=3
GBGG            x=3
GGBG            x=3
GGGB            x=3
GGGG            x=4

Number of Girls, x    Probability, P(x)
0                     1/16
1                     4/16
2                     6/16
3                     4/16
4                     1/16
Total                 16/16 = 1.00

What is the probability of exactly 3 girls in 4 kids? P(X=3) = 4/16
What is the probability of at least 3 girls in 4 kids? P(X≥3) = P(X=3) + P(X=4) = 5/16

Visualizing Discrete Probability Distributions
Listing: {(0, 1/16), (1, .25), (2, 3/8), (3, .25), (4, 1/16)}
Table: the Number of Girls, x and Probability, P(x) columns shown above
Graph: [Probability histogram of P(x) against Number of Girls, x; bar heights 1/16, 4/16, 6/16, 4/16, 1/16]

X is random and x is fixed. We can calculate the probability that different values of X will occur and make a probability distribution.

Probability Distributions
[Graph: the same probability histogram of P(x) against Number of Girls, x]
Probability distributions can be drawn as probability histograms.
Cumulative probabilities: adding up the probabilities of a range of values.

Washington State Population Survey and Random Variables
A telephone survey of households throughout Washington State. But some households don't have phones.
Number of telephones, x    P(x)
0                          0.03500
1                          0.70553
2                          0.21769
3                          0.02966
4                          0.00775
5                          0.00332
6                          0.00088
7                          0.00002
8                          0.00000
9                          0.00015
Total                      1.00000

[Graph: Probability Histogram of Telephone Coverage in Washington; P(x) against Number of Telephone Lines (x); labeled bars 0.04, 0.71, 0.22, 0.03, 0.01, 0.00, 0.00 for x = 0 to 6]

Probabilities about Telephones in Washington State
• What is the probability that a household will have no telephone?
• What is the probability that a household will have 2 or more telephone lines?
• What is the probability that a household will have 2 to 4 phone lines?
• What is the probability that a household will have no phone lines or more than 4 phone lines?
• Who do you think is in that 3.5% of the population?
• What are the implications of this for the quality of the survey?

[Graph: Probability Histogram of Telephone Lines, Washington, 1998; same bars as above]

Summary Measures
1. Expected Value (μ, "mu")
   Mean of Probability Distribution
   Weighted Average of All Possible Values
   $\mu = E(X) = \sum x\,p(x)$
2. Variance (σ², "sigma-squared")
   Weighted Average Squared Deviation about Mean
   $\sigma^2 = V(X) = E[(X-\mu)^2] = \sum (x-\mu)^2\,p(x)$
   $\sigma^2 = V(X) = E(X^2) - [E(X)]^2$
3. Standard Deviation
   $\sigma = \sqrt{\sigma^2} = SD(X)$

What is the average number of telephones in Washington households, and how much does the number vary from the average?
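As a rough check on the summary measures defined above, the following Python sketch computes the mean, variance (both ways), and standard deviation from the telephone probabilities. The dictionary simply copies the P(x) values from the survey table; the variable names are illustrative only.

```python
import math

# P(x) for x = 0..9 telephone lines, copied from the survey table.
p = {
    0: 0.03500, 1: 0.70553, 2: 0.21769, 3: 0.02966, 4: 0.00775,
    5: 0.00332, 6: 0.00088, 7: 0.00002, 8: 0.00000, 9: 0.00015,
}

# Expected value: weighted average of all possible values.
mean = sum(x * px for x, px in p.items())

# Variance, approach 1: weighted average squared deviation about the mean.
var_definition = sum((x - mean) ** 2 * px for x, px in p.items())

# Variance, approach 2: E(X^2) - [E(X)]^2.
e_x_squared = sum(x ** 2 * px for x, px in p.items())
var_shortcut = e_x_squared - mean ** 2

sd = math.sqrt(var_definition)

print(round(mean, 2), round(var_definition, 2), round(var_shortcut, 2), round(sd, 2))
```

Both variance formulas should agree up to floating-point rounding; rounded to two decimals the mean comes out near 1.28 lines per household and the variance near 0.45.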
Approach 1 (Variance) uses the columns (x-μ), (x-μ)², and (x-μ)²P(x). Approach 2 (Variance) uses the columns x² and x²P(x).

# of Phones
x     Frequency    P(x)   xP(x)   (x-μ)   (x-μ)²   (x-μ)²P(x)   x²   x²P(x)
0       198,286    0.04    0.00    -1.3     1.65         0.06    0     0.00
1     4,142,030    0.71    0.71    -0.3     0.08         0.06    1     0.71
2     1,278,026    0.22    0.44     0.7     0.51         0.11    4     0.87
3       174,110    0.03    0.09     1.7     2.94         0.09    9     0.27
4        45,499    0.01    0.03     2.7     7.38         0.06   16     0.12
5        19,473    0.00    0.02     3.7    13.81         0.05   25     0.08
6         5,170    0.00    0.01     4.7    22.24         0.02   36     0.03
7           118    0.00    0.00     5.7    32.67         0.00   49     0.00
8             -    0.00    0.00     6.7    45.10         0.00   64     0.00
9           897    0.00    0.00     7.7    59.53         0.01   81     0.01
Sum   5,863,609    1.00  μ=1.28   32.16               σ²=0.45          2.10

Approach 1: σ² = Σ (x-μ)² P(x) = 0.45
Approach 2: σ² = Σ x² P(x) - μ² = 2.10 - (1.28)² ≈ 0.45 (using unrounded values)

Chebyshev's Rule and Empirical Rule for a Discrete Random Variable
Let x be a discrete random variable with a probability distribution p(x), mean μ, and standard deviation σ. Then, depending on the shape of p(x), the following probability statements can be made:
• Chebyshev's Rule applies to any probability distribution (e.g., telephones in Washington State).
• The Empirical Rule applies to probability distributions that are mound-shaped and symmetric (e.g., girls born of 4 children).

Probability                  Chebyshev's Rule   Empirical Rule
P(μ - σ < x < μ + σ)         ≥ 0                ≈ .68
P(μ - 2σ < x < μ + 2σ)       ≥ 3/4              ≈ .95
P(μ - 3σ < x < μ + 3σ)       ≥ 8/9              ≈ 1.00

Data Types
Data: Numerical (Discrete, Continuous); Qualitative

Continuous Random Variable
• A variable with many possible values at all intervals

Continuous Random Variable Examples
Experiment                        Random Variable       Possible Values
Weigh 100 People                  Weight                45.1, 78, ...
Measure Part Life                 Hours                 900, 875.9, ...
Ask Food Spending                 Spending              54.12, 42, ...
Measure Time Between Arrivals     Inter-Arrival Time    0, 1.3, 2.78, ...

Continuous Probability Density Function
1. Mathematical Formula
2. Shows All Values, x, & Frequencies, f(x)
   f(x) Is Not Probability
3. Properties
   Area under the curve sums to 1
   Can add up areas under the function to get the probability of being less than a specific value
[Graph: density curve f(x), plotted as (Value, Frequency), with points a and b marked on the Value axis x]

Continuous Random Variable Probability
Probability Is Area Under Curve!
P(c ≤ x ≤ d)
[Graph: density curve f(x) with the area between c and d shaded]
Continuous Probability Distribution Models
Continuous Probability Distribution
  Uniform
  Normal
  Exponential

Importance of Normal Distribution
1. Describes Many Random Processes or Continuous Phenomena
2. Can Be Used to Approximate Discrete Probability Distributions
   Example: Binomial
3. Basis for Classical Statistical Inference

Normal Distribution
1. 'Bell-Shaped' & Symmetrical
2. Mean, Median, Mode Are Equal
3. 'Middle Spread' (interquartile range) Is About 1.33σ (the exact figure is closer to 1.35σ)
4. Random Variable Has Infinite Range
[Graph: bell curve f(X) against X, with Mean, Median, and Mode at the center]

Normal Distribution Useful Properties
• About half of "weight" below mean (because symmetrical)
• About 68% of probability within 1 standard deviation of mean (at change in curve)
• About 95% of probability within 2 standard deviations
• More than 99% of probability within 3 standard deviations
[Graph: bell curve f(X) with μ-3σ, μ-2σ, μ-σ, the mean, μ+σ, μ+2σ, and μ+3σ marked on the X axis]

Probability Density Function
$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}$
x = Value of Random Variable (-∞ < x < ∞)
σ = Population Standard Deviation
π = 3.14159
e = 2.71828
μ = Mean of Random Variable x
Don't memorize this!

Notation
X is N(μ,σ): The random variable X has a normal distribution (N) with mean μ and standard deviation σ.
X is N(40,1)
X is N(10,5)
X is N(50,3)

Effect of Varying Parameters (μ & σ)
[Graph: three normal curves A, B, and C with different means and standard deviations, f(X) against X]

Normal Distribution Probability
Probability is area under curve!
$P(c \le x \le d) = \int_c^d f(x)\,dx$ ?
[Graph: normal curve f(x) with the area between c and d shaded]

Infinite Number of Tables
Normal distributions differ by mean & standard deviation. Each distribution would require its own table. That's an infinite number!

Standardize the Normal Distribution
$Z = \frac{X-\mu}{\sigma}$
Normal Distribution (mean μ, standard deviation σ) becomes the Standardized Normal Distribution: Z is N(0,1), with μ = 0 and σ = 1.
One table!
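To make the "area under the curve" and "one table" ideas concrete, here is a small Python sketch that evaluates the density formula and approximates areas numerically. The helper names (`normal_pdf`, `area`, `z_score`) and the midpoint-rule integration are choices made for this illustration, not part of the original slides.

```python
import math

def normal_pdf(x, mu, sigma):
    """Density f(x) of a normal distribution with mean mu and standard deviation sigma."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def area(c, d, mu, sigma, steps=10_000):
    """Approximate P(c <= X <= d) as the area under f(x), using the midpoint rule."""
    width = (d - c) / steps
    return sum(normal_pdf(c + (i + 0.5) * width, mu, sigma) for i in range(steps)) * width

def z_score(x, mu, sigma):
    """Standardize x: number of standard deviations above or below the mean."""
    return (x - mu) / sigma

# The three example distributions give the same areas within 1, 2, and 3
# standard deviations, which is why a single standardized table is enough.
for mu, sigma in [(40, 1), (10, 5), (50, 3)]:
    within = [area(mu - k * sigma, mu + k * sigma, mu, sigma) for k in (1, 2, 3)]
    print(f"N({mu},{sigma}):", [round(a, 4) for a in within])
```

Each distribution should report roughly 68%, 95%, and more than 99% of its probability within 1, 2, and 3 standard deviations, matching the useful properties listed above.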
Standardizing Example
$Z = \frac{X-\mu}{\sigma} = \frac{6.2-5}{10} = .12$
Normal Distribution: μ = 5, σ = 10, X = 6.2
Standardized Normal Distribution: μ = 0, σ = 1, Z = .12

Obtaining the Probability
Standardized Normal Probability Table (Portion); entries are the area between 0 and Z

Z     .00     .01     .02
0.0   .0000   .0040   .0080
0.1   .0398   .0438   .0478
0.2   .0793   .0832   .0871
0.3   .1179   .1217   .1255

For Z = .12 (row 0.1, column .02), the probability is .0478.
[Graph: standardized normal curve with the area between 0 and .12 shaded; shaded area exaggerated]

Example P(3.8 ≤ X ≤ 5)
$Z = \frac{X-\mu}{\sigma} = \frac{3.8-5}{10} = -.12$
Normal Distribution: μ = 5, σ = 10. Standardized Normal Distribution: μ = 0, σ = 1.
By symmetry, the area between -.12 and 0 is .0478.
[Graph: shaded area exaggerated]

Example P(2.9 ≤ X ≤ 7.1)
$Z = \frac{2.9-5}{10} = -.21$ and $Z = \frac{7.1-5}{10} = .21$
Area = .0832 + .0832 = .1664
[Graph: shaded area exaggerated]

Example P(X ≥ 8)
$Z = \frac{8-5}{10} = .30$
Area = .5000 - .1179 = .3821
[Graph: shaded area exaggerated]

Example P(7.1 ≤ X ≤ 8)
$Z = \frac{7.1-5}{10} = .21$ and $Z = \frac{8-5}{10} = .30$
Area = .1179 - .0832 = .0347
[Graph: shaded area exaggerated]

Travel Time and the Normal Distribution
To help people plan their travel, WSDOT estimates that the average trip from Seattle to Bellevue at 5:40 pm (at peak) takes 11 minutes, with a standard deviation of 10 minutes. They also believe this travel time approximates a normal distribution. What proportion of trips take less than 27 minutes?

Process
1. Draw a picture and write down the probability you need.
2. Convert the probability statement to standard scores.
3. Find the cumulative probability in the table.

More Travel Time
Suppose we have only 10-15 minutes to travel to Seattle from Bellevue. What proportion of trips will make it in that time?
$P(10 \le X \le 15) = P\left(\frac{10-11}{10} \le Z \le \frac{15-11}{10}\right) = P(-0.1 \le Z \le .4)$
$= 1 - P(Z < -0.1) - P(Z > .4)$
Since normal curves are symmetrical:
$= 1 - P(Z > .1) - P(Z > .4)$
$= 1 - (.5 - .0398) - (.5 - .1554)$
$= 1 - .4602 - .3446 = .1952$
19.5% of trips will take between 10 and 15 minutes.
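The table lookups in the examples above can be cross-checked in Python with `statistics.NormalDist` from the standard library (Python 3.8+). This is only a sketch: it uses the cumulative distribution function instead of the printed table, and the variable names are illustrative.

```python
from statistics import NormalDist

# Examples with X ~ N(5, 10), i.e. mean 5 and standard deviation 10.
x_dist = NormalDist(mu=5, sigma=10)
p_38_to_5 = x_dist.cdf(5) - x_dist.cdf(3.8)     # P(3.8 <= X <= 5)
p_29_to_71 = x_dist.cdf(7.1) - x_dist.cdf(2.9)  # P(2.9 <= X <= 7.1)
p_above_8 = 1 - x_dist.cdf(8)                   # P(X >= 8)
p_71_to_8 = x_dist.cdf(8) - x_dist.cdf(7.1)     # P(7.1 <= X <= 8)

# Travel time example: mean 11 minutes, standard deviation 10 minutes.
travel = NormalDist(mu=11, sigma=10)
p_under_27 = travel.cdf(27)                     # P(X < 27)
p_10_to_15 = travel.cdf(15) - travel.cdf(10)    # P(10 <= X <= 15)

for name, value in [
    ("P(3.8<=X<=5)", p_38_to_5), ("P(2.9<=X<=7.1)", p_29_to_71),
    ("P(X>=8)", p_above_8), ("P(7.1<=X<=8)", p_71_to_8),
    ("P(X<27)", p_under_27), ("P(10<=X<=15)", p_10_to_15),
]:
    print(f"{name} = {value:.4f}")
```

Small differences from the table-based answers are expected, because the table rounds Z to two decimal places and probabilities to four.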
Finding Z Values for Known Probabilities
What is Z given that the area between 0 and Z is .1217?
Standardized Normal Probability Table (Portion)

Z     .00     .01     .02
0.0   .0000   .0040   .0080
0.1   .0398   .0438   .0478
0.2   .0793   .0832   .0871
0.3   .1179   .1217   .1255

The probability .1217 sits in row 0.3, column .01, so Z = .31.
[Graph: standardized normal curve, μ = 0, σ = 1, with the area .1217 between 0 and .31 shaded; shaded area exaggerated]

Finding X Values for Known Probabilities
Normal Distribution: μ = 5, σ = 10. Standardized Normal Distribution: μ = 0, σ = 1, Z = .31.
$X = \mu + Z\sigma = 5 + (.31)(10) = 8.1$
[Graph: shaded areas exaggerated]

Travel Times Take 3
How much time will the trip take 99% of the time?

Finding Z Values for Known Probabilities
1. Write down the probability statement and draw a picture: P(Z < ____) = .99
2. Look up the Z value in the table: P(Z < 2.325) = .99
3. Convert the Z value (SD units) to the variable (X) by using the mean and SD:
   X = μ + Zσ, so X = 11 + (2.325)(10) = 34.25
So, 99% of the time the trip can be made in 34.25 minutes or less.

Assessing Normality
1. A histogram of the data is mound shaped and symmetrical about the mean.
2. Determine the percentage of measurements falling in each of the intervals x̄ ± s, x̄ ± 2s, and x̄ ± 3s. If the data are approximately normal, the percentages will be approximately equal to 68%, 95%, and 100%, respectively.
3. Find the interquartile range, IQR, and standard deviation, s, for the sample, then calculate the ratio IQR/s. If the data are approximately normal, then IQR/s ≈ 1.3.
4. Construct a normal probability plot for the data. If the data are approximately normal, the points will fall (approximately) on a straight line.

Assessing Normality: Is Class Height Normally Distributed?
1. How does the histogram look?
SPSS can produce the line of the normal curve for you. In SPSS select GRAPH, HISTOGRAM. After you choose the variable you want, click on the box "Display Normal Curve" and you'll get something that looks like this.
[Graph: histogram of Height 527 2005 with normal curve overlaid; Frequency against Height from 60 to 72; Mean = 66.52, Std. Dev. = 3.117, N = 23]

Assessing Normality: Is Class Height Normally Distributed?
2. Compute the intervals:

Interval   Range             Anticipated Percent   Actual Percent
x̄±s       [63.40, 69.64]    68%                   65%
x̄±2s      [60.29, 72.75]    95%                   96%
x̄±3s      [57.17, 75.87]    100%                  100%

(15 of the 23 heights lie within x̄±s, and 22 of the 23 lie within x̄±2s.)

Height 527 2005
Height   Frequency   Percent   Valid Percent   Cumulative Percent
60       1             4.3       4.3             4.3
62       1             4.3       4.3             8.7
63       3            13.0      13.0            21.7
64       2             8.7       8.7            30.4
65       1             4.3       4.3            34.8
66       3            13.0      13.0            47.8
67       2             8.7       8.7            56.5
68       2             8.7       8.7            65.2
69       5            21.7      21.7            87.0
70       1             4.3       4.3            91.3
71       1             4.3       4.3            95.7
72       1             4.3       4.3           100.0
Total    23          100.0     100.0

SPSS: ANALYZE, DESCRIPTIVE STATISTICS, FREQUENCIES

Assessing Normality: Is Class Height Normally Distributed?
3. Does IQR/s ≈ 1.3?
IQR = 69 - 64 = 5
IQR/s = 5/3.117 = 1.6

Statistics: Height 527 2005
N Valid           23
N Missing         0
Std. Deviation    3.117
Percentiles 25    64.00
Percentiles 50    67.00
Percentiles 75    69.00

SPSS: ANALYZE, DESCRIPTIVE STATISTICS, FREQUENCIES, then click on STATISTICS and choose the ones you want.

Assessing Normality: Is Class Height Normally Distributed?
4. What does the normal probability plot look like?
SPSS: Graphs > Q-Q. Set the test distribution to Normal and click "estimate distribution parameters from data."
[Graph: Normal Q-Q Plot of Height 527 2005; Expected Normal Value against Observed Value, both axes from 60 to 74]
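The interval and IQR/s checks for the class heights can also be reproduced outside SPSS. The Python sketch below rebuilds the 23 heights from the frequency table; `share_within` is a helper invented for this example, and `statistics.quantiles` is used with its default "exclusive" method, which is assumed here to stand in for the SPSS percentiles.

```python
import statistics

# Class heights rebuilt from the frequency table (height: frequency).
freq = {60: 1, 62: 1, 63: 3, 64: 2, 65: 1, 66: 3,
        67: 2, 68: 2, 69: 5, 70: 1, 71: 1, 72: 1}
heights = [h for h, n in freq.items() for _ in range(n)]

mean = statistics.mean(heights)
s = statistics.stdev(heights)  # sample standard deviation

def share_within(k):
    """Fraction of heights inside the interval mean +/- k standard deviations."""
    lo, hi = mean - k * s, mean + k * s
    return sum(lo <= h <= hi for h in heights) / len(heights)

# Quartiles and the IQR/s ratio (about 1.3 for normal data).
q1, median, q3 = statistics.quantiles(heights, n=4)
iqr_ratio = (q3 - q1) / s

print(f"mean = {mean:.2f}, s = {s:.3f}")
for k in (1, 2, 3):
    print(f"within {k}s: {share_within(k):.0%}")
print(f"Q1 = {q1}, median = {median}, Q3 = {q3}, IQR/s = {iqr_ratio:.1f}")
```

For these data, 15 of the 23 heights fall within one standard deviation of the mean and 22 of the 23 fall within two, while the IQR/s ratio of about 1.6 is somewhat above the 1.3 expected for normal data.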