Probability and Statistics – PDFs and CDFs

The study of econometrics begins with the concept of a random variable. Any variable X whose value has yet to be determined is called a random variable. How long you will live is a random variable. How many times you will be married is a random variable. What your highest monthly income will be during the next 20 years is a random variable. The price of a share of China Airlines stock ten years from today minus NT$25 is a random variable. No one can tell you what these numbers will be. They have not yet been determined.

We can ask a simple question about the random variable X. Namely, what is the probability that X is less than or equal to some number α? We write this as P[ X ≤ α]. We assume this probability is greater than or equal to zero and less than or equal to 1. We write this as

0 ≤ P[ X ≤ α] ≤ 1

The probability is always between these two limits. That means we can describe probability as an area under a curve, so long as every part of the area is positive or zero and the total area is one. Here are some examples of areas under curves that are positive and might sum to 1.

[Figure: example curves with the area under each shaded in gray]

Just imagine that each of the gray areas above is equal to 1. Then any part of the area must be greater than or equal to zero and less than or equal to one. Thus, it can be a probability. Try drawing a couple of curves below that are non-negative and have areas that might equal 1. Make them different from the ones above.

The function f(x) that traces out the curve is very special, since it is what we use to determine the probabilities of a random variable. Each random variable has a curve assigned to it, and the function f(x) is called a probability density function (PDF). We see now that any function that (1) is non-negative and (2) has area under it equal to 1 can be a PDF. A very famous PDF, one that is used often in econometrics, is the standard normal PDF.
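The two conditions for a PDF can be checked numerically for any candidate curve. The sketch below does this for the standard normal PDF just mentioned, assuming its usual formula f(x) = exp(-x²/2)/√(2π). The helper names `std_normal_pdf` and `area_under`, the midpoint rule, and the cutoff at ±10 are illustrative choices made here, not part of the notes.

```python
import math

def std_normal_pdf(x):
    """Standard normal density: exp(-x^2 / 2) / sqrt(2 * pi)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def area_under(f, a, b, n=100_000):
    """Approximate the area under f between a and b with the midpoint rule."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

# Condition (1): the curve is never negative (checked on a grid of points
# from -10 to 10; a grid check is evidence, not a proof).
grid = [-10 + 0.01 * i for i in range(2001)]
non_negative = all(std_normal_pdf(x) >= 0 for x in grid)

# Condition (2): the total area is 1. ASSUMPTION: the tails beyond +/-10 are
# negligibly small, so the range [-10, 10] stands in for the whole real line.
total_area = area_under(std_normal_pdf, -10.0, 10.0)

print(non_negative)
print(round(total_area, 6))
```

Swapping in a different function for `std_normal_pdf` (and a range that covers where it is non-zero) gives a quick sanity check of whether a curve you have drawn could be a PDF.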
Here is a drawing.

[Figure: the standard normal PDF]

The equation for this function is somewhat complicated and is written in the following way

$$ f(x) = \frac{1}{\sqrt{2\pi}} e^{-x^2/2} $$

Note that this function uses two very special irrational numbers (called transcendental irrationals), π ≈ 3.142 and e ≈ 2.718. Clearly, f(x) is maximized at x = 0. That is where it is highest. The maximum is equal to f(0) = 1/√(2π) ≈ 0.40. The standard normal PDF is symmetric, which means that the left side is the mirror image of the right side. Some PDFs are not symmetric.

How do we define probability using the PDF concept? Easy, we just mark off the limits and compute the area. For example, suppose we have a random variable X with PDF equal to f(x) and we want to compute the probability P[ X ≤ α]. We just find the area under the curve f(x) over -∞ < x ≤ α:

$$ P[X \le \alpha] = \int_{-\infty}^{\alpha} f(x)\,dx $$

Here is an example using the standard normal density.

[Figure: the area under the standard normal density to the left of α]

This probability must be less than 1 since the total area under f(x) equals 1.

In order to find probabilities, as you can see, we must be able to integrate and find the areas under f(x). Sometimes this is very difficult, and sometimes no closed-form answer exists. When it is impossible to integrate to a closed-form function, we approximate the area numerically and make tables which we can read. Luckily, high-speed computers make it easy to integrate numerically, and therefore software like GRETL can easily find probabilities for us, even when f(x) is a very complicated function. In the past it was very hard to find these integrals (areas), and therefore people used tables. We still use tables occasionally, but most of that is in the past. You can find the standard normal table on the internet using the following link.

http://www.sjsu.edu/faculty/gerstman/EpiInfo/z-table.htm

Most of the PDFs in econometrics and statistics are very complicated. But we can practice on some simpler examples. Here are some problems.

Problems:
(1) Show the following function is a PDF. for
(2) Show the following function is a PDF. for
(3) Show the following function is a PDF.
for
(4) Show the following function is a PDF. for
(5) Note that we define the cumulative distribution function (CDF) as the following

$$ F(x) = P[X \le x] = \int_{-\infty}^{x} f(t)\,dt $$

Find the CDFs for each of the PDFs in problems 1 - 4 above.
(6) Explain why the derivative of any CDF is never negative, i.e., F'(x) ≥ 0 always.
(7) If f(x) is a PDF and g(x) is a PDF, then is f(x) + g(x) a PDF?
(8) Find the exact value for β such that
(9) Note that is a PDF when is a PDF for . Suppose that we had some observations on a random variable X that had this PDF. The observations are as follows:

x1 = 0.20
x2 = 0.85
x3 = 0.75
x4 = 0.35
x5 = 0.95

How could we guess the value of ? A good guess would be

Here is how to solve the problem. Consider the fact that the theoretical mean of X is defined using the PDF as

$$ E[X] = \int_{-\infty}^{\infty} x f(x)\,dx $$

and this should be close to the sample mean defined using the data as

$$ \bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i . $$
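The matching of the theoretical mean to the sample mean can be sketched in a few lines. The sample mean below uses the five observations from problem (9). The density, however, is purely an assumption made here for illustration and is not necessarily the PDF of problem (9): f(x) = θ·x^(θ-1) on 0 < x < 1, whose theoretical mean works out to θ/(θ+1). The names `theoretical_mean` and `theta_guess` are likewise hypothetical.

```python
# Observations from problem (9).
data = [0.20, 0.85, 0.75, 0.35, 0.95]

# Sample mean: the average of the observations.
sample_mean = sum(data) / len(data)

# ASSUMPTION (illustration only): f(x) = theta * x**(theta - 1) on 0 < x < 1.
# Its theoretical mean is the integral of x * f(x) over (0, 1),
# which equals theta / (theta + 1).
def theoretical_mean(theta):
    return theta / (theta + 1.0)

# Set the theoretical mean equal to the sample mean m and solve for theta:
# theta / (theta + 1) = m  =>  theta = m / (1 - m).
theta_guess = sample_mean / (1.0 - sample_mean)

print(round(sample_mean, 2))
print(round(theta_guess, 3))
```

With these data the sample mean is 0.62, so under the assumed density the guess for θ is about 1.63. For a different PDF the same steps apply: compute the theoretical mean as a function of the unknown parameter, set it equal to the sample mean, and solve.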