1 Probability

A CM (chance mechanism), when operated, produces an outcome. The sample space, S, is the collection of all possible outcomes of the CM. An event is any collection of outcomes. Before the CM is operated, the probability of the event A, denoted by P(A), is a number that measures the likelihood that A will occur.

How are probabilities assigned to events? If we assume the ELC (the equally likely case), then each possible outcome is equally likely to occur, and we assign probabilities to events as follows. For any event A,

P(A) = (the number of outcomes in A) / (the number of outcomes in S).

If we do not assume the ELC, there are two possibilities.

1. Suppose that the sample space is finite and consists of k possible outcomes: 1, 2, ..., k. The probability of outcome j is denoted by pj. Each pj ≥ 0 and they sum to one.

2. Suppose that the sample space is an infinite sequence and consists of the possible outcomes 0, 1, 2, .... The probability of outcome j is denoted by pj. Each pj ≥ 0 and they sum to one.

If A and B are events, then (A or B) is the event that contains all elements that are in A and/or B; (AB) is the event that contains all elements that are in both A and B. Two events, A and B, are called disjoint or mutually exclusive if they have no elements in common; in other words, if AB is the empty set.

Rules of Probability

1. The probability of the sample space equals 1.
2. For any event A, 0 ≤ P(A) ≤ 1.
3. If A and B are disjoint events, then P(A or B) = P(A) + P(B).
4. P(A^c) = 1 − P(A).
5. If A is a subset of B, then P(A) ≤ P(B).
6. For any events A and B, P(A or B) = P(A) + P(B) − P(AB).

2 Trials

Consider repeated operations of a CM. Each operation is called a trial and yields the value of a random variable. The random variables are denoted by X1 for the first trial, X2 for the second trial, and so on. Trials are i.i.d.
if, and only if, the Xi's are identically distributed (i.d.), meaning they all have the same probability distribution, and they are independent. The major consequence of independence is the multiplication rule. For example,

P(X1 = 3, X2 = 1, X3 = 4) = P(X1 = 3)P(X2 = 1)P(X3 = 4).

Let A be an event for which, on every trial, A either occurs or does not occur; i.e., A is a function of a single operation of the CM. If the trials are i.i.d., then the relative frequency of occurrence of event A in n trials converges to P(A) as n tends to infinity. In other words, for large n the relative frequency of occurrence of A is approximately equal to P(A).

If each trial has two possible outcomes, 1 (S, success) and 0 (F, failure), then we have Bernoulli Trials. Let X denote the total number of successes in n BT. The probability distribution of X is given by the following equation:

P(X = x) = [n! / (x!(n − x)!)] p^x q^(n−x), for x = 0, 1, ..., n.   (1)

Equation 1 is called the binomial probability distribution with parameters n and p; it is denoted by Bin(n, p).

Suppose that a random variable X takes on values that are equally spaced. Let δ denote the distance between successive values of X. Then the probability histogram for the probability distribution is drawn as follows.

1. Place the possible values, x, of X on a horizontal number line.
2. Draw rectangles centered at the x's. The base of each rectangle is δ and the height is P(X = x)/δ.

Every probability distribution has a mean (µ), variance (σ²) and standard deviation (σ). For Bin(n, p), µ = np, σ² = npq and σ = √(npq).

Let X be any random variable with mean µ and standard deviation σ. Then the standardized version of X is denoted by Z and is given by the equation:

Z = (X − µ)/σ.

If X ∼ Bin(n, p), we can approximate P(X ≥ x) by standardizing to P(Z ≥ (x − 0.5 − np)/√(npq)). We can then approximate this probability for Z by using the snc (standard normal curve).

A finite population is a well-defined collection of individuals. We visualize a finite population as a box of cards. Let p denote the proportion of the cards in the box marked '1' and q = 1 − p.
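The binomial probability in Equation 1 and its snc approximation with the continuity correction can be sketched in Python. This is a minimal illustration using only the standard library; the function names are my own, not part of the notes.

```python
import math

def binom_pmf(x, n, p):
    """P(X = x) for X ~ Bin(n, p), as in Equation 1."""
    q = 1 - p
    return math.comb(n, x) * p**x * q**(n - x)

def binom_tail_exact(x, n, p):
    """Exact P(X >= x), by summing the pmf."""
    return sum(binom_pmf(k, n, p) for k in range(x, n + 1))

def snc_tail(z):
    """Area under the standard normal curve to the right of z."""
    return 0.5 * math.erfc(z / math.sqrt(2))

def binom_tail_approx(x, n, p):
    """Approximate P(X >= x) via Z = (x - 0.5 - np)/sqrt(npq)."""
    q = 1 - p
    z = (x - 0.5 - n * p) / math.sqrt(n * p * q)
    return snc_tail(z)

# For n = 100, p = 0.5, x = 60: z = (60 - 0.5 - 50)/5 = 1.9.
print(binom_tail_exact(60, 100, 0.5))   # about 0.0284
print(binom_tail_approx(60, 100, 0.5))  # about 0.0287
```

Note how close the approximation is here: the exact tail is about 0.0284 and the snc approximation gives about 0.0287.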
We refer to a population box as Box(N, p) to denote a box with N cards, of which N × p cards are marked '1.' Consider the CM: select one card at random from Box(N, p). After operating this CM, place the selected card back into the population box. Repeat this process n times. This operation is referred to as selecting n cards at random with replacement. Viewing each selection as a trial, we can see that we have BT.

3 Estimation of p

Assume that we have n BT, but we don't know the value of p. Nature knows the value of p. We collect data and observe x successes in the n trials. The point estimate of p is p̂ = x/n. The point estimate is correct if p̂ = p. Only Nature knows whether a particular p̂ is correct. Point estimates are rarely correct.

A confidence interval (CI) estimate of p is correct if the interval includes p. Only Nature knows whether a particular CI is correct, but, before collecting data, the researcher knows the probability that an interval will be correct. If this probability is, say, 95%, then after collecting data we call the resultant interval a 95% confidence interval. Intervals can be 2-sided (the default if not stated) or 1-sided. In this exam, a 1-sided CI is always an upper bound interval.

There are two possible CIs: exact and approximate. The exact CI is obtained from a website. The approximate CI is

p̂ ± z√(p̂q̂/n)  (2-sided)   and   p̂ + z√(p̂q̂/n)  (1-sided).

The value of z is determined by the confidence level (CL) as shown in the following table.

   z     2-sided CL   1-sided CL
 1.282      80%          90%
 1.645      90%          95%
 1.960      95%          97.5%
 2.326      98%          99%
 2.576      99%          99.5%

4 The Poisson Distribution

If the sample space is the sequence 0, 1, ..., then a researcher might want to use the Poisson distribution. The Poisson has one parameter, θ > 0. Exact Poisson probabilities can be obtained from a website. For a Poisson(θ), µ = σ² = θ. The snc can be used to obtain approximate probabilities.
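Exact Poisson probabilities can also be computed directly from the Poisson pmf, θ^x e^(−θ)/x!. The formula is standard, though the notes defer to a website; the function names below are my own.

```python
import math

def poisson_pmf(x, theta):
    """P(X = x) for X ~ Poisson(theta): theta^x * e^(-theta) / x!."""
    return theta**x * math.exp(-theta) / math.factorial(x)

def poisson_tail(x, theta):
    """P(X >= x) = 1 - P(X <= x - 1)."""
    return 1 - sum(poisson_pmf(k, theta) for k in range(x))

# Example with theta = 4 (so mu = sigma^2 = 4):
print(poisson_tail(6, 4))  # P(X >= 6), about 0.215
```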
In P(X ≥ x), just standardize x to z = (x − 0.5 − θ)/√θ and proceed as for the binomial.

If θ is unknown, we can use the value of X to estimate it. The point estimate is x and, following the presentation for the binomial, we can use the snc to obtain an approximate confidence interval for θ. The results for 2- and 1-sided intervals are:

x ± z√x   and   x + z√x.

It is also possible to use a website to get an exact CI for θ.

If X ∼ Bin(n, p) with n very large, p very small and np moderate, then probabilities for X can be approximated by using Poisson(θ), where θ = np.

Below are the assumptions for a Poisson Process.

1. The numbers of successes in disjoint intervals are independent of each other.
2. The probability distribution of the number of successes counted in any time interval depends only on the length of the interval.
3. Successes cannot be simultaneous.

With these assumptions, it turns out that the probability distribution of the number of successes in any interval of time is the Poisson distribution with parameter θ = λ × w, where w > 0 is the length of the interval and λ > 0 is a feature of the process, often called its rate.

5 The Goodness of Fit Test

A Chi-Squared curve is characterized by the value of one parameter, called its degrees of freedom (df). The degrees of freedom can be any positive integer: 1, 2, 3, .... Our symbol for this curve is χ²(df). For example, χ²(5) is the Chi-Squared curve with df = 5.

We assume that we have a CM with a finite number, k, of possible outcomes that can be operated repeatedly and, when so operated, yields i.i.d. trials. Whether the outcomes are categories or numbers, we assign numbers to each outcome: 1, 2, ..., k or 0, 1, 2, ..., (k − 1). The probability of outcome i is denoted by pi.

The Goodness of Fit Test is used when we have a theory about the values of the pi's and we want to evaluate whether or not the theory is reasonable. In general, let pi0 denote the theory's value of pi. This makes the hypotheses:

• H0: pi = pi0 for all i
• H1: Not H0; i.e., pi ≠ pi0 for at least one i.

A test can result in a correct decision, a Type 1 error (rejecting a true null) or a Type 2 error (failing to reject a false null). The researcher must specify the significance level of the test. It is denoted by α and is the probability of a Type 1 error.

The data consist of the observed frequencies of the various possible outcomes, denoted by O's. Each O has its E, calculated as Ei = n·pi0. The observed value of the test statistic is

χ² = Σ (Oi − Ei)²/Ei,

summed over all cells. We reject the null if, and only if, χ² ≥ χ²α(k − 1). The P-value is the area under the χ²(k − 1) curve to the right of the observed value of the test statistic.
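The goodness-of-fit computation can be sketched in Python. This is a minimal illustration of the test statistic only; the critical value χ²α(k − 1) or the P-value would still come from a table or website. The observed counts below are hypothetical, not from the notes.

```python
# Goodness of Fit test statistic for a hypothetical fair-die theory:
# H0: pi0 = 1/6 for each of the k = 6 faces.
observed = [12, 8, 13, 9, 6, 12]   # hypothetical O's from n = 60 rolls
n = sum(observed)
p0 = [1 / 6] * 6                   # the theory's values pi0
expected = [n * p for p in p0]     # Ei = n * pi0, here 10 for every cell

chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1             # k - 1 degrees of freedom

# chi2 works out to 3.8 with df = 5; compare it to chi-squared_alpha(5).
print(chi2, df)
```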
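Separately, the Poisson approximation to the binomial described in Section 4 is easy to check numerically. A sketch under the stated conditions (n very large, p very small, θ = np moderate); the values chosen are illustrative.

```python
import math

n, p = 1000, 0.003     # n very large, p very small
theta = n * p          # theta = np = 3, which is moderate

def binom_pmf(x):
    """Exact Bin(n, p) probability from Equation 1."""
    return math.comb(n, x) * p**x * (1 - p)**(n - x)

def poisson_pmf(x):
    """Poisson(theta) probability: theta^x * e^(-theta) / x!."""
    return theta**x * math.exp(-theta) / math.factorial(x)

for x in range(5):
    # the two probabilities agree to about three decimal places
    print(x, round(binom_pmf(x), 4), round(poisson_pmf(x), 4))
```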