Ex St 801 Statistical Methods
Probability and Distributions

a graphic:
[Figure: the population (parameters, e.g. $\mu$, $\sigma^2$, $\sigma$) yields a sample through "gather data"; the sample (statistics, e.g. $\bar{y}$, $S^2$, $S$, $p$, $b$, or Greek letters with a hat) leads back to the population through "make inferences".]

some definitions.
Population: Set of all measurements of interest to the investigator.
Parameter: Characteristic of the population; Greek letter, usually unknown.
Sample: Subset of measurements selected (usually at random) from the population.
Statistic: Characteristic of the sample; Latin letter or Greek letter with a hat ( ^ ).
Random Variable: The characteristic we measure, usually called ‘y’.

a graphic, again
[Figure: the same population/sample diagram; the "gather data" direction (population to sample) is labeled Probability and the "make inferences" direction (sample to population) is labeled Statistics.]

More terms:
Probability: Likelihood of occurrence; we know the population, and we predict the outcome or the sample.
Statistics: We observe the sample and use the statistics to describe the unknown population.

When we make an inference about the population, it is desirable that we give a measure of ‘confidence’ in our being correct (or incorrect). This is done by giving a statement of the ‘probability’ of being correct. Hence, we need to discuss probability.

So, Probability: more terms:
event: a single run of an experiment
y: the random variable; the outcome from one event
sample space: the list of all possible outcomes
probability: often written P(Y=y); the chance that Y will take a certain value ‘y’. It can be computed as
$p = \frac{\text{number of successes}}{\text{number of trials}}$

8 things to say about probability:
1. $0 \le p \le 1$; probabilities are between 0 and 1.
2. $\sum_i p_i = 1$; the sum of the probabilities of all the possible outcomes is 1.

Event relations
3. Complement: If P(A=a) = p, then the complement of A has probability P(A not a) = 1-p (sometimes written q). Note: p + (1-p) = 1 (p+q=1). The complement of A is also written $\bar{A}$ (A-bar; not the mean).

8 things continued
4. Mutually exclusive: two events that cannot happen together. If P(AB)=0, then A and B are M.E.
5. Conditional: Probability of A given that B has already happened, written P(A|B).
6. Independent: Event A has no influence on the outcome of event B. If P(A|B) = P(A) or P(B|A) = P(B), then A and B are independent.

8 things continued again
Two laws about events:
7. Multiplicative Law (Intersection; AND): $P(AB) = P(A \cap B) = P(A \text{ and } B) = P(A) \cdot P(B|A) = P(B) \cdot P(A|B)$. If A and B are independent, then $P(AB) = P(A) \cdot P(B)$.
8. Additive Law (Union; OR): $P(A \cup B) = P(A \text{ or } B) = P(A) + P(B) - P(AB)$.

The Venn diagram below can be used to explain the 8 ‘things’.
[Venn diagram: two overlapping circles labeled A and B; the overlap is AB.]

An example: Consider the deck of 52 playing cards:
sample space:
(A, 2, 3, … , J, Q, K) spades
(A, 2, 3, … , J, Q, K) diamonds
(A, 2, 3, … , J, Q, K) hearts
(A, 2, 3, … , J, Q, K) clubs
Now, consider the following events:
J = draw a J: P(J) = 4/52 = 1/13
F = draw a face card (J, Q, K): P(F) = 12/52 = 3/13
H = draw a heart: P(H) = 13/52

An example.2: Compute the following:
1. P(F complement) = 52/52 - 12/52 = 40/52
2. Are J and F mutually exclusive? No: P(JF) = 4/52 is not 0.
3. Are J and F complement M.E.? Yes: P(J and $\bar{F}$) = 0.
4. Are J and H independent? Yes: P(J) = 4/52 = 1/13 = P(J|H).

An example.3: Compute the following:
5. Are J and F independent? No: P(J) = 4/52 but P(J|F) = 4/12.
6. P(J and H) = P(J) * P(H|J) = 4/52 * 1/4 = 1/52
7. P(J or H) = P(J) + P(H) - P(JH) = 4/52 + 13/52 - 1/52 = 16/52.

Two sorts of random variables are of interest:
DISCRETE: the number of outcomes is countable
CONTINUOUS: the number of outcomes is infinite (not countable).

Random variables are often described with probability distribution functions. These are graphs, tables or formulas which allow for the computation of probabilities.
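
The card-deck example above can be checked by brute-force enumeration. The following is a minimal sketch, not part of the original slides: the helper name `prob` and the tuple representation of a card are illustrative choices, and exact fractions are used so the results can be compared directly with the hand calculations.

```python
from fractions import Fraction
from itertools import product

# Build the 52-card sample space as (rank, suit) pairs.
ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["spades", "diamonds", "hearts", "clubs"]
deck = list(product(ranks, suits))

def prob(event, given=None):
    """P(event), or P(event | given), by counting equally likely cards.

    `event` and `given` are predicates on a card (hypothetical helper API).
    """
    space = deck if given is None else [c for c in deck if given(c)]
    hits = sum(1 for c in space if event(c))
    return Fraction(hits, len(space))

def is_jack(card):
    return card[0] == "J"

def is_face(card):
    return card[0] in {"J", "Q", "K"}

def is_heart(card):
    return card[1] == "hearts"

p_j = prob(is_jack)
p_f = prob(is_face)
p_h = prob(is_heart)
p_not_f = 1 - p_f                                             # complement
p_j_and_h = prob(lambda c: is_jack(c) and is_heart(c))        # intersection
p_j_or_h = p_j + p_h - p_j_and_h                              # additive law
j_h_independent = prob(is_jack, given=is_heart) == p_j        # P(J|H) == P(J)
j_f_independent = prob(is_jack, given=is_face) == p_j         # P(J|F) == P(J)

print(p_j, p_f, p_not_f, p_j_and_h, p_j_or_h)
print(j_h_independent, j_f_independent)
```

Fractions are printed in lowest terms, so 40/52 appears as 10/13 and 16/52 as 4/13.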
A few common Discrete probability distributions are:

Uniform: P(Y=y) = 1/(number of outcomes) (all outcomes are equally likely)

Binomial: $P(Y=y) = {}_nC_y \, p^y (1-p)^{n-y}$
where:
${}_nC_y$ is the combination of n things taken y at a time
n is the number of trials
y is the number of successes
p is the probability of succeeding in one trial
in each trial, the only outcomes are success and failure (1, 0).

Poisson: $P(Y=y) = \frac{\lambda^y e^{-\lambda}}{y!}$; y = 0, 1, 2, … (for example, number of people waiting in line at a teller)
$\lambda$ = the population mean of Y.

These are a few of many and are used when there are only a few possible outcomes: number of defects on a circuit board, number of tumors in a mouse, pregnant or not, dead or alive, number of accidents at an intersection and so on.

PROBABILITY DEFINITIONS
• An EXPERIMENT is a process by which an observation is obtained.
• A SAMPLE SPACE (S) is the set of possible outcomes or results of an experiment.
• A SIMPLE EVENT ($E_i$) is the smallest possible element of the sample space.
• A COMPOUND EVENT is a collection of two or more simple events.
• Two events are INDEPENDENT if the occurrence of one event does not affect the occurrence of the other event.

SAMPLE SPACE FOR THE COINS IN A JAR EXAMPLE
(P, N1, N2) (P, N1, D) (P, N1, Q) (P, N2, D) (P, N2, Q)
(P, D, Q) (N1, N2, D) (N1, N2, Q) (N1, D, Q) (N2, D, Q)

INTERPRETATIONS OF PROBABILITY
Classical: $P(E) = \frac{\text{the number of equally likely ways E can occur}}{\text{the total number of equally likely events}}$
Relative Frequency: $P(E) = \frac{\text{the number of times E occurred}}{\text{the maximum number of times E could have occurred}}$

PROPERTIES OF PROBABILITIES
$0 \le P(E_i) \le 1$ for all $E_i$
$\sum_i P(E_i) = 1$, summing over all simple events in the sample space.

RANDOM VARIABLES
• A RANDOM VARIABLE (r.v.) is a numerical valued function defined on a sample space.
• A DISCRETE RANDOM VARIABLE is a random variable that can assume a countable number of values.
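
As a rough illustration of the binomial and Poisson formulas given earlier in this section, the sketch below evaluates both probability functions with the Python standard library. The function names `binomial_pmf` and `poisson_pmf` and the parameter name `lam` (for the Poisson mean) are illustrative assumptions, not names from the slides.

```python
import math

def binomial_pmf(y, n, p):
    """P(Y = y) for y successes in n independent trials, success probability p."""
    return math.comb(n, y) * p**y * (1 - p) ** (n - y)

def poisson_pmf(y, lam):
    """P(Y = y) for a Poisson random variable with population mean lam."""
    return lam**y * math.exp(-lam) / math.factorial(y)

# Each set of probabilities should sum to (approximately) 1.
binomial_total = sum(binomial_pmf(y, 10, 0.3) for y in range(11))
poisson_total = sum(poisson_pmf(y, 2.0) for y in range(60))

print(binomial_pmf(2, 4, 0.5))   # two successes in four fair-coin trials
print(poisson_pmf(0, 2.0))       # no arrivals when the mean is 2
print(binomial_total, poisson_total)
```

The Poisson sum is truncated at 60 terms because the support is infinite; for a mean of 2 the omitted tail is negligible.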
RANDOM VARIABLES (CONT.)
• A CONTINUOUS RANDOM VARIABLE is a random variable that can assume an infinitely large number of values corresponding to the points on a line interval.

A RANDOM VARIABLE FOR THE COINS IN A JAR EXAMPLE
Let Y be the amount of money taken out of the jar.

Outcome        Y
(P, N1, N2)    11
(P, N1, D)     16
(P, N1, Q)     31
(P, N2, D)     16
(P, N2, Q)     31
(P, D, Q)      36
(N1, N2, D)    20
(N1, N2, Q)    35
(N1, D, Q)     40
(N2, D, Q)     40

PROBABILITY DISTRIBUTIONS
• A DISCRETE PROBABILITY DISTRIBUTION is a formula, table or graph that shows the probability associated with each value of the discrete random variable.
• A CONTINUOUS PROBABILITY DISTRIBUTION is given by an equation f(y) (probability density function) that shows the density of probability as it varies with the continuous random variable.

COINS IN A JAR EXAMPLE PROBABILITY DISTRIBUTION

Y     P(Y)
11    .1
16    .2
20    .1
31    .2
35    .1
36    .1
40    .2

EXPECTED VALUE (MEAN): E(Y) = 27.6
VARIANCE: Var(Y) = 105.84

EXPECTATION AND VARIANCE FOR A DISCRETE RANDOM VARIABLE
$E(Y) = \mu = \sum_{\text{all } y} y \, p(y)$
$Var(Y) = \sigma^2 = E[(Y - \mu)^2] = \sum_{\text{all } y} (y - \mu)^2 \, p(y)$

EXPECTATION FORMULA
Suppose $E(Y) = \mu_Y$ and $E(X) = \mu_X$.
• $E(aY) = a E(Y) = a \mu_Y$
• $E(Y + X) = E(Y) + E(X) = \mu_Y + \mu_X$
• $E(aY + bX) = a E(Y) + b E(X) = a \mu_Y + b \mu_X$

VARIANCE FORMULA
Suppose $Var(Y) = \sigma_Y^2$ and $Var(X) = \sigma_X^2$.
• $Var(aY) = a^2 Var(Y)$
• If Y and X are independent, then
  – $Var(Y + X) = Var(Y) + Var(X)$
  – $Var(aY + bX) = a^2 Var(Y) + b^2 Var(X)$

THE NORMAL DISTRIBUTION
Some properties
1. The area under the entire curve is always 1.
2. The distribution is symmetric about the mean.
3. The mean and the median are equal.
4. Probabilities may be found by determining the appropriate area under the curve.

SAMPLING DISTRIBUTIONS
• The SAMPLING DISTRIBUTION of a statistic is the probability distribution for the values of the statistic that results when random samples of size n are repeatedly drawn from the population.
• The STANDARD ERROR is the standard deviation of the sampling distribution of a statistic.
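
The coins-in-a-jar figures quoted earlier (E(Y) = 27.6 and Var(Y) = 105.84) can be reproduced by enumerating the sample space. This is a sketch under stated assumptions: the slides do not list the coin values, so the values below (penny 1, nickels 5, dime 10, quarter 25 cents, three coins drawn without replacement) are inferred from the totals in the example.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

# Assumed coin values in cents, inferred from the totals 11, 16, ..., 40.
coins = {"P": 1, "N1": 5, "N2": 5, "D": 10, "Q": 25}

# Every way to draw three of the five coins is one equally likely simple event.
totals = [sum(coins[name] for name in draw) for draw in combinations(coins, 3)]

# Probability distribution of Y: each total's share of the 10 outcomes.
counts = Counter(totals)
dist = {y: Fraction(k, len(totals)) for y, k in sorted(counts.items())}

# E(Y) = sum of y * p(y); Var(Y) = sum of (y - mu)^2 * p(y).
mean = sum(y * p for y, p in dist.items())
variance = sum((y - mean) ** 2 * p for y, p in dist.items())

for y, p in dist.items():
    print(y, float(p))
print(float(mean), float(variance))
```

Using `Fraction` keeps the arithmetic exact, so the comparison with the tabulated mean and variance is not affected by floating-point rounding.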
DIAGRAM FOR OBTAINING A SAMPLING DISTRIBUTION
[Diagram: from the population (POP.), SAMPLE 1, SAMPLE 2, …, SAMPLE 499, SAMPLE 500 are drawn; each sample gives a sample mean $\bar{Y}_1$, $\bar{Y}_2$, …, $\bar{Y}_{499}$, $\bar{Y}_{500}$, and together these values form the distribution of $\bar{Y}$.]

THE CENTRAL LIMIT THEOREM
• If random samples of n observations are drawn from a population with a finite mean, $\mu$, and a finite variance, $\sigma^2$, then, when n is large (usually greater than 30), the SAMPLE MEAN, $\bar{Y}$, will be approximately normally distributed with mean $\mu$ and variance $\sigma^2/n$.
• The approximation becomes more accurate as n becomes large.

THE CENTRAL LIMIT THEOREM
If $Y \sim (\mu, \sigma^2)$, then for large n, $\bar{Y} \sim N\left(\mu, \frac{\sigma^2}{n}\right)$.

AN APPLICATION OF THE SAMPLING DISTRIBUTION
• The $\bar{Y}$ CONTROL CHART can be used to detect shifts in the mean of a process.
• The chart looks at a sequence of sample means, and the process is assumed to be “IN CONTROL” as long as the sample means are within the control limits.

$\bar{Y}$ CONTROL CHART
[Figure: control chart with horizontal lines for the UPPER CONTROL LIMIT, CENTER LINE, and LOWER CONTROL LIMIT; the horizontal axis is the SAMPLE number.]

GLASS BOTTLE EXAMPLE
• A glass-bottle manufacturing company wants to maintain a mean bursting strength of 260 PSI.
• Past experience has shown that the standard deviation for the bursting strength is 36 PSI.
• The company periodically pulls 36 bottles off the production line to determine if the mean bursting strength has changed.

GLASS BOTTLE EXAMPLE (CONT)
• Construct a control chart so that 95% of the sample means will fall within the control limits when the process is “IN CONTROL.”
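
The slides pose the glass-bottle control chart as an exercise without giving the answer. One way to work it, sketched here under assumptions, is to treat the sample mean as approximately normal (by the Central Limit Theorem, with n = 36), take the center line to be the target mean of 260 PSI, and place the limits at the two-sided 95% normal quantile times the standard error. The variable names and the `in_control` helper are illustrative, not from the source.

```python
from math import sqrt
from statistics import NormalDist

target_mean = 260.0   # desired mean bursting strength (PSI), used as center line
sigma = 36.0          # process standard deviation from past experience (PSI)
n = 36                # bottles per sample
coverage = 0.95       # share of in-control sample means inside the limits

# Standard error of the sample mean: sigma / sqrt(n).
standard_error = sigma / sqrt(n)

# Two-sided normal quantile for the requested coverage (about 1.96 for 95%).
z = NormalDist().inv_cdf(1 - (1 - coverage) / 2)

lower_limit = target_mean - z * standard_error
upper_limit = target_mean + z * standard_error

def in_control(sample_mean):
    """Return True when a sample mean falls within the control limits."""
    return lower_limit <= sample_mean <= upper_limit

print(round(standard_error, 2), round(z, 3))
print(round(lower_limit, 2), round(upper_limit, 2))
print(in_control(265.0), in_control(275.0))
```

Under these assumptions the limits come out at roughly 260 ± 11.76 PSI, so a sample mean of 265 would be read as in control and 275 as a possible shift in the process mean.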