Probability and Statistics – PDFs and CDFs
The study of econometrics begins with the concept of a random variable. Any variable X whose
value has yet to be determined is called a random variable.
How long you will live is a random variable. How many times you will be married is a random
variable. What your highest monthly income will be during the next 20 years is a random
variable. The price of a share of China Airlines stock ten years from today minus NT$25 is a
random variable. No one can tell you what these numbers will be. They have not yet been
determined.
We can ask a simple question about the random variable X: what is the probability that
X is less than or equal to some number α? We write this as P[ X ≤ α]. We assume this
probability is greater than or equal to zero and less than or equal to 1. We write this as
0 ≤ P[ X ≤ α] ≤ 1
The probability is always between these two limits. That means we can describe probability as
an area under a curve, so long as no part of the area is negative and the total area is one.
Here are some examples of areas under curves that are non-negative and might sum to 1.
[Figure: several curves with the area beneath each shaded gray.]
Just imagine that each of the areas in gray above is equal to 1. Then any part of the area must be
between zero and one, inclusive. Thus, it can be a probability.
Try drawing a couple of curves below that are non-negative and have areas that might equal 1.
Make them different from the ones above.
The function f(x) is very special since it is what we use to determine the probability of a random
variable. Each random variable has a curve assigned to it – the function f(x) is called a
probability density function (PDF).
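To make the two requirements concrete (the curve never dips below zero, and the total area under it is 1), here is a minimal sketch in Python. The candidate function f(x) = 2x on [0, 1] is a hypothetical example chosen for illustration and is not one of the curves from the lesson. The helper name `area` and the midpoint rule are likewise assumptions of this sketch.

```python
# Hypothetical candidate density (not from the lesson):
# f(x) = 2x on [0, 1], and zero everywhere else.
def f(x):
    return 2.0 * x if 0.0 <= x <= 1.0 else 0.0


def area(func, a, b, n=10_000):
    """Approximate the area under func on [a, b] with the midpoint rule."""
    h = (b - a) / n
    return sum(func(a + (i + 0.5) * h) for i in range(n)) * h


# Condition (1): the function is non-negative on a grid of test points.
grid = [i / 1000 for i in range(-500, 1501)]
is_non_negative = all(f(x) >= 0.0 for x in grid)

# Condition (2): the total area under the curve equals 1.
total_area = area(f, 0.0, 1.0)

print(is_non_negative, round(total_area, 6))
```

A grid check like this can only suggest non-negativity; a proof still requires looking at the formula itself.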
We see now that any function that is (1) non-negative and (2) has area under it equal to 1 can be
a PDF. A very famous PDF, one that is used often in econometrics, is the standard normal PDF.
Here is a drawing.
[Figure: the standard normal PDF.]
The equation for this function is somewhat complicated and is written in the following way:
$$ f(x) = \frac{1}{\sqrt{2\pi}} \, e^{-x^2/2} $$
Note that this function uses two very special irrational numbers (called transcendental numbers),
π ≈ 3.142 and e ≈ 2.718. Clearly, f(x) is maximized at x = 0. That is where it is highest.
The maximum is equal to f(0) ≈ 0.40. The standard normal PDF is symmetric about zero, which means that
the left side is a mirror image of the right side. Some PDFs are not symmetric.
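The claims above can be checked numerically. The sketch below assumes the usual formula for the standard normal density, f(x) = exp(-x²/2) / √(2π), and uses only Python's standard library. The function name `std_normal_pdf` and the integration range [-8, 8] are choices made for this illustration.

```python
import math


def std_normal_pdf(x):
    """Standard normal density: exp(-x^2 / 2) / sqrt(2 * pi)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


# Height of the curve at x = 0, where it is highest.
peak = std_normal_pdf(0.0)

# Symmetry: f(x) and f(-x) agree at a few test points.
symmetric = all(
    math.isclose(std_normal_pdf(x), std_normal_pdf(-x))
    for x in (0.5, 1.0, 2.0, 3.0)
)

# Midpoint-rule area over [-8, 8]; the tails beyond that are negligible.
n = 16_000
h = 16.0 / n
total_area = sum(std_normal_pdf(-8.0 + (i + 0.5) * h) for i in range(n)) * h

print(round(peak, 4), symmetric, round(total_area, 6))
```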
How do we define probability using the PDF concept?
Easy: we just mark off the limits and compute the area. For example, suppose we have a random
variable X with PDF equal to f(x) and we want to compute the probability P[ X ≤ α]. We just
find the area under the curve f(x) over the interval -∞ < x ≤ α. Here is an example using the standard normal
density. This probability cannot exceed 1 since the total area under f(x) equals 1.
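For the standard normal density, this area can be computed without drawing anything. The sketch below relies on the known identity P[X ≤ α] = ½ (1 + erf(α / √2)) and on `math.erf` from Python's standard library. The function name `std_normal_cdf` and the sample values of α are chosen here for illustration.

```python
import math


def std_normal_cdf(alpha):
    """P[X <= alpha] for a standard normal X, via the error function."""
    return 0.5 * (1.0 + math.erf(alpha / math.sqrt(2.0)))


# Area under the standard normal curve to the left of a few limits.
for alpha in (-1.0, 0.0, 1.0, 1.96):
    print(alpha, round(std_normal_cdf(alpha), 4))
```

Because the density is symmetric about zero, the areas to the left of -α and to the left of α always add up to 1, which is a quick sanity check on any such calculation.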
In order to find probabilities, as you can see, we must be able to integrate and find the areas
under f(x). Sometimes this is very difficult and sometimes this is impossible. When it is
impossible to integrate to a closed-form function, we approximate and make tables which we can
read. Luckily, high-speed computers make it easy to integrate numerically, and therefore
software like GRETL can easily find probabilities for us, even when f(x) is a very complicated
function. In the past it was very hard to find these integrals (areas) and therefore people used
tables. We still use tables occasionally, but most of that is in the past. You can find the standard
normal table on the internet using the following link.
http://www.sjsu.edu/faculty/gerstman/EpiInfo/z-table.htm
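The numerical integration mentioned above can be sketched in a few lines. This is only an illustration of the idea, using composite Simpson's rule, and not a description of how GRETL computes probabilities. The function names, the number of subintervals, and the use of -10 as a stand-in for -∞ are all assumptions of this sketch.

```python
import math


def std_normal_pdf(x):
    """Standard normal density: exp(-x^2 / 2) / sqrt(2 * pi)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def simpson(func, a, b, n=2000):
    """Composite Simpson's rule on [a, b]; n must be even."""
    if n % 2:
        raise ValueError("n must be even")
    h = (b - a) / n
    total = func(a) + func(b)
    for i in range(1, n):
        # Interior points alternate between weight 4 (odd i) and 2 (even i).
        total += (4 if i % 2 else 2) * func(a + i * h)
    return total * h / 3.0


def prob_le(alpha, lower=-10.0):
    """Approximate P[X <= alpha]; lower = -10 stands in for -infinity."""
    return simpson(std_normal_pdf, lower, alpha)


approx = prob_le(1.0)
# Reference value from the error function, for comparison.
exact = 0.5 * (1.0 + math.erf(1.0 / math.sqrt(2.0)))
print(round(approx, 6), round(exact, 6))
```

The same `simpson` helper works for any density that can be evaluated point by point, which is the situation the tables were originally built for.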
Most of the PDFs in econometrics and statistics are very complicated. But, we can practice on
some simpler examples. Here are some problems.
Problems:
(1) Show the following function is a PDF.
for
(2) Show the following function is a PDF.
for
(3) Show the following function is a PDF.
for
(4) Show the following function is a PDF.
for
(5) Note that we define the cumulative distribution function (CDF) as
$$ F(\alpha) = P[X \le \alpha] = \int_{-\infty}^{\alpha} f(x)\,dx $$
Find the CDF for each of the PDFs in problems 1-4 above.
(6) Explain why the derivative of any CDF F(x) is non-negative, i.e.,
$$ \frac{dF(x)}{dx} \ge 0 $$
always.
(7) If f(x) is a PDF and g(x) is a PDF, then is f(x) + g(x) a PDF?
(8) Find the exact value for β such that
(9) Note that
is a PDF when
is a PDF for
. Suppose that we had some
observations on a random variable X that had this PDF. The observations are as
follows:
x1 = 0.20
x2 = 0.85
x3 = 0.75
x4 = 0.35
x5 = 0.95
How could we guess the value of ? A good guess would be
Here is how to solve the problem. Consider the fact that the theoretical mean of X is
defined using the PDF as
$$ E[X] = \int_{-\infty}^{\infty} x \, f(x)\,dx $$
and this should be close to the sample mean defined using the data as
$$ \bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i . $$
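The two quantities being compared can be computed as follows. The sample mean uses the five observations listed above. The theoretical mean is computed here for a hypothetical density, f(x) = 2x on [0, 1], which is NOT the PDF from the problem; it only shows how the integral of x·f(x) would be evaluated numerically. The helper name `theoretical_mean` and the midpoint rule are assumptions of this sketch.

```python
# The five observations listed in the problem.
data = [0.20, 0.85, 0.75, 0.35, 0.95]
sample_mean = sum(data) / len(data)


def theoretical_mean(pdf, a, b, n=10_000):
    """Approximate the integral of x * pdf(x) over [a, b] (midpoint rule)."""
    h = (b - a) / n
    total = 0.0
    for i in range(n):
        x = a + (i + 0.5) * h
        total += x * pdf(x)
    return total * h


# Hypothetical density for illustration only (NOT the one in the problem).
def example_pdf(x):
    return 2.0 * x


example_mean = theoretical_mean(example_pdf, 0.0, 1.0)
print(round(sample_mean, 2), round(example_mean, 4))
```

With the actual PDF from the problem, the theoretical mean would depend on the unknown parameter, and the guess comes from choosing the parameter value that makes the theoretical mean equal to the sample mean.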