Random Variables and Probability Distributions
How can a sample be used to determine critical information about the population when all subgroups of samples are considered?

Random Variables 7-1
What types of random variables exist?
• Random variable: a numeric variable whose value depends on the outcome of a chance experiment
• Discrete: the possible values are isolated points on a number line (often integer values)
• Continuous: the possible values form an entire interval of real numbers
Examples: Determine whether each of the following is discrete or continuous:
• The number of defective tires on a car
• Someone's body temperature
• The number of pages in a book
• The lifetime of a lightbulb

Probability Distributions for Discrete Random Variables 7-2
What is the probability distribution for discrete random variables?
• Probability distribution: gives the probability associated with each value of x
• Here x represents each possible result of the experiment

Example
What is the sample space for flipping a coin 4 times? The probability distribution can be based on the number of heads or the number of tails; in this case we will count the number of tails.
Note the pattern used to help list all the possible outcomes ($2^4 = 16$): the 1st column is 8 H's then 8 T's; the 2nd column is 4 H's then 4 T's, then repeat; the 3rd column alternates in groups of 2; the 4th column alternates one at a time.

1st  2nd  3rd  4th   Tail count
H    H    H    H     0
H    H    H    T     1
H    H    T    H     1
H    H    T    T     2
H    T    H    H     1
H    T    H    T     2
H    T    T    H     2
H    T    T    T     3
T    H    H    H     1
T    H    H    T     2
T    H    T    H     2
T    H    T    T     3
T    T    H    H     2
T    T    H    T     3
T    T    T    H     3
T    T    T    T     4

Probability of each outcome: .5·.5·.5·.5 = .0625

# of tails   # of outcomes   Prob. per outcome   Overall probability
0            1               .0625               .0625
1            4               .0625               .25
2            6               .0625               .375
3            4               .0625               .25
4            1               .0625               .0625
Total        16                                  1

∑ p(x) = 1 for all distributions.

What is the probability of getting exactly one tail?
What is the probability of getting at most two tails?
What is the probability of getting at least two tails?
What is the probability of getting more than two tails?
What is the probability of getting at least one head?
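The tail-count table above can be reproduced by brute-force enumeration. The following is a minimal Python sketch, not part of the original notes; the function name `tail_count_distribution` is a hypothetical helper, and it assumes a fair coin so that all sequences are equally likely.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def tail_count_distribution(n_flips):
    """Enumerate all 2**n_flips equally likely H/T sequences and tally tails.

    Assumes a fair coin, so each sequence has probability 1 / 2**n_flips.
    """
    counts = Counter(seq.count("T") for seq in product("HT", repeat=n_flips))
    total = 2 ** n_flips
    return {tails: Fraction(c, total) for tails, c in sorted(counts.items())}


dist = tail_count_distribution(4)
for tails, prob in dist.items():
    print(tails, float(prob))

# Cumulative questions are sums over the relevant values of x.
at_most_two = sum(p for t, p in dist.items() if t <= 2)
at_least_one_head = 1 - dist[4]  # complement of "all four flips are tails"
print(float(at_most_two), float(at_least_one_head))
```

Using `Fraction` keeps the probabilities exact, so the check that they sum to 1 does not depend on floating-point rounding.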
Of all the airline flight requests received by a certain discount ticket broker, 70% are for domestic flights (D). The rest are for international flights (I). Determine the probability distribution of the number of domestic flights among the next three requests.

1st  2nd  3rd   Dom. flights
D    D    D     3
D    D    I     2
D    I    D     2
D    I    I     1
I    D    D     2
I    D    I     1
I    I    D     1
I    I    I     0

x       # of outcomes   Prob. per outcome   P(x)
0       1               .027                .027
1       3               .063                .189
2       3               .147                .441
3       1               .343                .343
Total                                       1

Pg 356: 1, 5, 7
Pg 361: 9, 11, 12, 15, 17, 19

Probability Distributions for Continuous Random Variables 7.3
What is the probability distribution for continuous random variables?
The Mean Value and Standard Deviation of a Random Variable 7.4
What is the mean and the standard deviation of a random variable?

• The probability distribution for a continuous random variable is given by a function f(x), called the density function, whose graph is a smooth curve called the density curve.
• f(x) ≥ 0, so the curve does not go below the x-axis.
• The total area under the density curve is equal to 1.
• Uniform distribution: the density is constant over an interval (a "flat" density curve), so Area = base × height.
• Probability for a continuous random variable is often calculated using cumulative area:
  P(a < x < b) = (cumulative area left of b) - (cumulative area left of a)

Example: A particular professor never dismisses class early. Let x denote the amount of time past the hour (in minutes) that elapses before the professor dismisses class. Suppose that x has a uniform distribution on the interval from 0 to 10, as in the density "curve" below.

[Figure: flat density curve over the interval from 0 to 10]

A) What is the probability that at most 5 min elapsed?
B) What is the probability that between 3 and 5 min elapsed before dismissal?
C) What is the probability that between 3 and 5 min inclusive elapsed before dismissal?

[Figure: number line from 0 to 7 for shading]

Shade the probability:
1) that between 2 and 6 min elapsed before dismissal
2) that between 3 and 5 min inclusive elapsed before dismissal
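The uniform-distribution questions above all reduce to a difference of cumulative areas. The sketch below is an illustration added to these notes, assuming the uniform density on 0 to 10 from the professor example; `uniform_cdf` and `uniform_prob` are hypothetical helper names.

```python
def uniform_cdf(x, a, b):
    """Cumulative area to the left of x under a flat density on [a, b]."""
    if x <= a:
        return 0.0
    if x >= b:
        return 1.0
    # Area = base * height, with height 1 / (b - a)
    return (x - a) / (b - a)


def uniform_prob(lo, hi, a=0.0, b=10.0):
    """P(lo < x < hi) = (area left of hi) - (area left of lo)."""
    return uniform_cdf(hi, a, b) - uniform_cdf(lo, a, b)


print(uniform_prob(0, 5))  # A) at most 5 minutes
print(uniform_prob(3, 5))  # B) between 3 and 5 minutes
print(uniform_prob(3, 5))  # C) inclusive endpoints give the same area
print(uniform_prob(2, 6))  # shading exercise 1
```

Questions B and C have the same answer because a single point has zero area under a density curve, so including or excluding an endpoint does not change the probability.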
Pg 366-67: 20, 22, 23, 24, 26

Mean value of a random variable: describes where the probability distribution is centered (sometimes called the expected value)
$\mu_x = E(x) = \sum x\,p(x)$

Example: Individuals applying for a certain license are allowed up to four attempts to pass the licensing exam. Let x denote the number of attempts made by a randomly selected applicant. The probability distribution is as follows.

x      1    2    3    4
p(x)   .1   .2   .3   .4

Calculate the expected value (the average number of attempts made):
E(x) = µ = 1·.1 + 2·.2 + 3·.3 + 4·.4 = 3
What does this mean?

Standard deviation of a random variable: describes the variability in a probability distribution.
$\sigma_x = \sqrt{\sum (x - \mu_x)^2\,p(x)}$
The more observations you have, the closer $\bar{x}$ gets to $\mu_x$ and $s$ gets to $\sigma_x$.

Example: Use the data and the answer from the last example to calculate the standard deviation.

x      1    2    3    4
p(x)   .1   .2   .3   .4

$\sigma_x = \sqrt{(1-3)^2(.1) + (2-3)^2(.2) + (3-3)^2(.3) + (4-3)^2(.4)} = \sqrt{1} = 1$

Since µ = 3 and $\sigma_x$ = 1: What is the probability that x exceeds the mean value?

Pg 378-379: 27, 29, 30, 33, 34, 35, 37

The Binomial and Geometric Distributions 7.5
What is the difference between a binomial and a geometric distribution?
Binomial distribution:
• Has a predetermined number of trials; each trial is dichotomous (2 possible results) and independent, with the same probability of success on every trial (that is, you sample with replacement or from a large enough population)
• The label of success is arbitrary and based on the question
• For example, in counting the number of female births: n = the number of trials, success = a female is born, x = the number of successes

$p(x) = \frac{n!}{x!\,(n-x)!}\,p^x (1-p)^{n-x}$, where p = the probability of success on any one trial
You may have seen this as ${}_nC_x\,p^x(1-p)^{n-x}$ or $\binom{n}{x}p^x(1-p)^{n-x}$

Ex. 60% of watches sold by a large discount store have digital faces; the remaining 40% are analog. Twelve watches are sold today.
A) What is the probability that exactly 4 are digital?
n = 12, x = 4, p = .6
$P(4) = {}_{12}C_4\,(.6)^4(.4)^8$
B) What is the probability that between 4 and 7 inclusive were digital?
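The watch example above can be evaluated directly from the binomial formula. This is a minimal Python sketch added for illustration (it is not the calculator procedure); `binom_pmf` and `binom_range` are hypothetical helper names, and `math.comb` requires Python 3.8 or later.

```python
from math import comb


def binom_pmf(x, n, p):
    """p(x) = nCx * p**x * (1 - p)**(n - x) for x successes in n trials."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)


def binom_range(lo, hi, n, p):
    """P(lo <= x <= hi), summing the pmf over the inclusive range."""
    return sum(binom_pmf(k, n, p) for k in range(lo, hi + 1))


n, p = 12, 0.6  # twelve watches sold, 60% digital
print(round(binom_pmf(4, n, p), 4))       # A) exactly 4 digital
print(round(binom_range(4, 7, n, p), 4))  # B) between 4 and 7 inclusive
```

Part B is a sum of four pmf values, which is the same quantity as P(x ≤ 7) - P(x ≤ 3) computed from cumulative probabilities.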
Calculator version: 2nd Vars (dist)
• binomcdf (returns Prob ≤ x)
• binompdf (returns Prob for x)
Trials: (how many trials)
p = (prob. of success)
x value: (# of successful outcomes)
Enter, Enter
NOTE: there is a table on p 776 that assists by giving pre-calculated values based on the values of n and π (the textbook's symbol for the probability of success, written p in these notes)

• When sampling without replacement, the events are no longer independent
• However, if $n/N \le .05$, where n = number sampled and N = number in the population, then the differences are so small that they can be, and often are, ignored, since the exact calculations would be difficult to make

$\mu_x = np$ and $\sigma_x = \sqrt{np(1-p)}$

Example: It has been reported that one-third of all credit card users pay their bills in full each month. This figure is, of course, an average across different cards and issuers. Suppose that 30% of all individuals holding Visa cards issued by a certain bank pay in full each month. A random sample of n = 25 cardholders is to be selected. The bank is interested in the variable x = number in the sample who pay in full each month. Even though the sampling is done without replacement, the sample size of 25 is most likely very small compared to the total number of credit cardholders, so we can approximate the probability distribution of x using a binomial distribution with n = 25 and p = .3. We have defined "paid in full" as a success. Find the mean and standard deviation.

Geometric distributions do not have a specific number of trials; instead, they terminate when we have a successful outcome.
x = number of trials to achieve a success
p = probability of success
$p(x) = (1-p)^{x-1}\,p$

Calculator version: 2nd Vars (dist)
• geometcdf (returns Prob ≤ x)
• geometpdf (returns Prob for x)
p = (prob. of success)
x value: (# of trials until a success)
Enter, Enter

Example: You ask people, one at a time, whether they have jumper cables until someone says yes. Anyone who has them will let you use them. Assume that p = .4, since 40% of those driving have jumper cables in the car.
Let x = the number of students who must be asked in order to find someone who has jumper cables. What is the probability that x will be at most 3?

Homework pg 390-392: 44, 48, 49, 52, 54, 56, 58, 60, 61

The Normal Distribution 7.6
What is the normal distribution?
Characteristics:
• The smaller the σ, the taller and narrower the distribution
• The curve changes concavity ±1σ from the mean; that is, µ ± 1σ are the points of inflection
• A standard normal curve has µ = 0 and σ = 1; this is normally called the z-curve (it relates to z-scores)
• The largest or smallest x% refers to an area on one side (one tail) of the curve
• The most extreme x% refers to x/2% at each end
• Any normal distribution can be converted to the standard normal curve using the z-score formula (called standardizing):
  $P(a < x < b) = P(a^* < z < b^*)$
  a and b are the given values; a* and b* are the converted values

Z-score formula: $z = \frac{x - \mu}{\sigma}$
Z values are inside the back cover or on your formula sheet. Please note that the value given in the chart is the proportion of the data that falls BELOW the z value.

Given that µ = 10 and σ = 2:
1. What % of the data would fall below 8?
2. What % of the data would fall above 11?
3. What % of the data would fall between 8 and 11?
4. What value separates the bottom 86%?

1. What % falls to the left of a z-score of 2.18?
2. Determine the z* value that separates the largest 4% of the population. (This means the area from .96 to 1.)
3. Determine the z* values that separate the most extreme 4% of the population. (This means the areas from 0 to .02 and from .98 to 1.)

Homework pg 406-409: 64, 66, 68, 70, 72, 74, 75

Checking for Normality and Normalizing Transformations 7-7
How do you check for normality and apply normalizing transformations?
Used when you have univariate data that you want to normalize.
1) The normal probability plot is a scatterplot of (z-score, observed value).
2) To check whether normality is plausible:
- place the observed data in L2 and run 1-Var Stats on L2
- calculate the z-scores in L1 = (L2 - x̄)/σ
- plot L1 vs L2 and look for a linear pattern
- turn on diagnostics and calculate the regression line
- normality is plausible if r is greater than or equal to the critical r from the table below, based on n (the number of items in the data set)

n            5     10    15    20    25    30    40    50    60    75
Critical r   .832  .880  .911  .929  .941  .949  .960  .966  .971  .976

r < critical r casts doubt on the plausibility of normality.
3) Many tests do not go to the trouble of the above; they simply plot a histogram and decide whether it appears approximately normal. If so, no transformation is needed. If not:
- transform L2 by taking the square root, cube root, log, etc., and store the result in L3
- plot the histogram of L3 to determine whether it is approximately normal, using a window setting of .25 to 3.5 with x-scale .25
- if it is approximately normal, run 1-Var Stats on L3 to get x̄ and σx
- let L4 = (L3 - x̄)/σ
- plot (L4, L2)
- get the LinReg with r and compare it with the critical r

Together pg 374 #80
The following observations are DDT concentrations in the blood of 20 people.
24 26 30 35 35 38 39 40 40 41 42 52 56 58 61 75 79 88 102 42
Assume no transformation is needed. Calculate the normal scores, find your r value, and check it against the critical r.

Homework pg 415-418: 84, show the work!

REVIEW
Chapter 7 review
What formulas are needed for chapter 7?
$\mu_x = \sum x\,p(x)$
$\sigma_x = \sqrt{\sum (x - \mu_x)^2\,p(x)}$
For a linear function $y = a + bx$:
$\mu_y = a + b\mu_x$
$\sigma_y^2 = b^2\sigma_x^2$
$\sigma_y = |b|\,\sigma_x$

Discrete random variables: set up your chart and calculate p(x)
Continuous random variables: many convert using z-scores to determine amounts in a given range
Binomial distribution (only 2 outcomes, set number of trials):
$p(x) = {}_nC_x\,p^x(1-p)^{n-x}$, with $\mu_x = np$ and $\sigma_x = \sqrt{np(1-p)}$
Geometric distribution (experiment goes on until there is a success; outcomes are success or failure):
$p(x) = (1-p)^{x-1}\,p$
Hypergeometric distribution (no replacement): if $n/N \le .05$, i.e. at most 5% of the population is sampled, µx and σx are the same as above
Z-scores: $z = \frac{x - \mu}{\sigma}$
Normal plot: (z-score, observed value)

Review Problems Pg 426-429: 106, 107, 108, 110, 114, 119, 121, 124
SHOW ALL WORK; just answers is not acceptable
TEST: 15 multiple choice, 4 computational; total about 60 points
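As a study aid, several of the review formulas can be checked numerically against examples from earlier in the chapter (the licensing exam table, the jumper cables question with p = .4, and the normal questions with µ = 10 and σ = 2). This is only a sketch added to the notes: the function names are hypothetical helpers, and `statistics.NormalDist` (Python 3.8 or later) stands in for the z table.

```python
from math import sqrt
from statistics import NormalDist


def discrete_mean_sd(xs, ps):
    """Mean and standard deviation of a discrete random variable."""
    mean = sum(x * p for x, p in zip(xs, ps))
    variance = sum((x - mean) ** 2 * p for x, p in zip(xs, ps))
    return mean, sqrt(variance)


def geometric_cdf(x, p):
    """P(first success on or before trial x) = 1 - (1 - p)**x."""
    return 1 - (1 - p) ** x


def normal_below(x, mu, sigma):
    """Proportion of a normal(mu, sigma) population that falls below x."""
    return NormalDist(mu, sigma).cdf(x)


# Licensing exam: number of attempts and their probabilities
print(discrete_mean_sd([1, 2, 3, 4], [0.1, 0.2, 0.3, 0.4]))

# Jumper cables: probability that at most 3 people must be asked
print(round(geometric_cdf(3, 0.4), 3))

# Normal questions with mu = 10 and sigma = 2
print(round(normal_below(8, 10, 2), 4))       # below 8
print(round(1 - normal_below(11, 10, 2), 4))  # above 11
print(round(NormalDist(10, 2).inv_cdf(0.86), 2))  # value separating the bottom 86%
```

The geometric shortcut works because "at most x trials" is the complement of "the first x trials are all failures", which avoids summing $p(1) + p(2) + \dots + p(x)$ term by term.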