School of Economics, Mathematics and Statistics
Birkbeck College
Malet Street, London WC1E 7HX, UK
Weeks 2 to 4
September Statistics
Ali C. Tasiran
September 2007
Contents

1 Introduction
  1.1 Course Aims
  1.2 Course Objectives
  1.3 Outline of Topics
  1.4 Teaching arrangements and Assessment
  1.5 Textbooks
  1.6 Some preliminaries
  Problems

2 Probability
  2.1 Probability definitions and concepts
    2.1.1 Classical definition of probability
    2.1.2 Frequency definition of probability
    2.1.3 Subjective definition of probability
    2.1.4 Axiomatic definition of probability
  Problems

3 Random variables and probability distributions
  3.1 Random variables, densities, and cumulative distribution functions
    3.1.1 Discrete Distributions
    3.1.2 Continuous Distributions
    3.1.3 Example
  3.2 Problems

4 Expectations and moments
  4.1 Mathematical Expectation and Moments
    4.1.1 Mathematical Expectation
    4.1.2 Moments
  Problems

5 Some univariate distributions
  5.1 Discrete Distributions
    5.1.1 The Bernoulli Distribution
    5.1.2 The Binomial Distribution
    5.1.3 Example
    5.1.4 Simple Random Walk
    5.1.5 Geometric Distribution
    5.1.6 Hypergeometric Distribution
    5.1.7 Negative Binomial Distribution
    5.1.8 Poisson Distribution
  5.2 Continuous Distributions
    5.2.1 Uniform Distribution on an Interval
    5.2.2 Beta Distribution
    5.2.3 Cauchy Distribution
    5.2.4 Chi-Square Distribution
    5.2.5 The Exponential Distribution
    5.2.6 Extreme Value Distribution (Gompertz Distribution)
    5.2.7 F Distribution
    5.2.8 Gamma Distribution
    5.2.9 Geometric Distribution
    5.2.10 Logistic Distribution
    5.2.11 Lognormal Distribution
    5.2.12 The Normal Distribution
    5.2.13 Pareto Distribution
    5.2.14 Student's t Distribution
    5.2.15 Weibull Distribution
  Problems

6 Multivariate distributions
  6.1 Bivariate Distributions
    6.1.1 The Bivariate Normal Distribution
    6.1.2 Mixture Distributions
  6.2 Multivariate Density Functions
    6.2.1 The Multivariate Normal Distribution
    6.2.2 Standard multivariate normal density
    6.2.3 Marginal and Conditional Distributions of N(μ, Σ)
    6.2.4 The Chi-Square Distribution
  Problems

7 Sampling, sample moments, sampling distributions, and simulation
  7.1 Independent, Dependent, and Random Samples
  7.2 Sample Statistics
  7.3 Sampling Distributions
  Problems

8 Large sample theory
  8.1 Different Types of Convergence
  8.2 The Weak Law of Large Numbers
  8.3 The Strong Law of Large Numbers
  8.4 The Central Limit Theorem
  Problems

9 Estimation and properties of estimators
  9.1 Point Estimation
    9.1.1 Small Sample Criteria for Estimators
    9.1.2 Large Sample Properties of Estimators
  9.2 Interval Estimation
    9.2.1 Pivotal-quantity method of finding CI
    9.2.2 CI for the mean of a normal population
    9.2.3 CI for the variance of a normal population
  9.3 Problems

10 Tests of statistical hypotheses
  10.1 Basic Concepts in Hypothesis Testing
    10.1.1 Null and Alternative Hypotheses
    10.1.2 Simple and Composite Hypotheses
    10.1.3 Statistical Test
    10.1.4 Type I and Type II Errors
    10.1.5 Power of a Test
    10.1.6 Operating Characteristics
    10.1.7 Level of Significance and the Size of a Test
  Problems

11 Examination 1
  11.1 Definition Questions
  11.2 Calculation questions
  11.3 Discussion questions
  11.4 Multiple choice questions

12 Examination 2
  12.1 Definition Questions
  12.2 Calculation questions
  12.3 Discussion questions
  12.4 Multiple choice questions
Chapter 1
Introduction
1.1 Course Aims
This is a refresher course in mathematical statistics; it also contains some new modules in preparation for the forthcoming econometric courses.
1.2 Course Objectives
The course is an introduction to mathematical statistics, covering the foundations of the theory of probability and statistical inference. It is intended to provide the necessary statistical background for the Econometrics courses. It begins with basic facts about random variables and their distributions, and then provides an introduction to statistical inference. These tools are arranged with a view to applying them to econometric methodology; thus, the emphasis is on probability and distribution theories together with estimation and hypothesis testing involving several parameters.
1.3 Outline of Topics
There are two main parts to the course.
Probability and Distribution Theories
1. Probability
2. Random variables and probability distributions
3. Expectations and moments
4. Some univariate distributions
5. Multivariate distributions
Statistical Inference
6. Sampling, sample moments and sampling distributions
7. Large sample theory
8. Estimation and properties of estimators
9. Tests of statistical hypotheses
1.4 Teaching arrangements and Assessment
There will be two lectures and one problem-solving class each week over three weeks.
Performance in this course is assessed through a written examination. You are required to
pass this examination to continue on the MSc programme. No resits are held.
1.5 Textbooks
Lecture notes are provided. However, these are not a substitute for a textbook. I do not recommend any particular text, but in the past students have found the following useful.
• Greene, W.H. (2004) Econometric Analysis, 5th edition, Prentice-Hall. A good summary of much of the material can be found in the appendices.
• Hogg, R.V. and Craig, A.T. (1995) Introduction to Mathematical Statistics, 5th edition, Prentice Hall. A popular textbook, even though it is slightly dated.
• Mittelhammer, R.C. (1999) Mathematical Statistics for Economics and Business, Springer Verlag. A good mathematical statistics textbook for economists, especially useful for further econometric studies.
• Mood, A.M., Graybill, F.A., and Boes, D.C. (1974) Introduction to the Theory of Statistics, 3rd edition, McGraw-Hill.
• Spanos, A. (1999) Probability Theory and Statistical Inference: Econometric Modeling with Observational Data, Cambridge University Press.
• Wackerly, D., Mendenhall, W., and Scheaffer, R. (1996) Mathematical Statistics with Applications, 5th edition, Duxbury Press.
Those who plan to take forthcoming courses in Econometrics may buy the book by Greene (2004).
Ali Tasiran
[email protected]
1.6 Some preliminaries
Statistics is the science of observing data and making inferences about the characteristics of a random mechanism that has generated the data. It is also called the science of uncertainty.
In Economics, theoretical models are used to analyze economic behavior. Economic theoretical models are deterministic functions, but in the real world the relationships are not exact and deterministic; rather, they are uncertain and stochastic. We thus employ distribution functions to make approximations to the actual processes that generate the observed data. The process that generates data is known as the data generating process (DGP or Super Population). In Econometrics, to study economic relationships, we estimate statistical models, which are built under the guidance of theoretical economic models and by taking into account the properties of the data generating process.
Using the parameters of estimated statistical models, we make generalisations about the characteristics of a random mechanism that has generated data. In Econometrics, we use observed data in samples to draw conclusions about populations. Populations are either real, from which the data came, or conceptual, i.e., processes by which the data were generated. The inference in the first case is called design-based (for experimental data) and is used mainly to study samples from populations with known frames. The inference in the second case is called model-based (for observational data) and is used mainly to study stochastic relationships.
The statistical theory used for such analyses is called Classical inference, and it is the one that will be followed in this course. It is based on two premises:
1. The sample data constitute the only relevant information.
2. The construction and assessment of the different procedures for inference are based on long-run behavior under similar circumstances.
The starting point of an investigation is an experiment. An experiment is a random
experiment if it satisfies the following conditions:
- all possible distinct outcomes are known ahead of time
- the outcome of a particular trial is not known a priori
- the experiment can be duplicated.
The totality of all possible outcomes of the experiment is referred to as the sample
space (denoted by S) and its distinct individual elements are called the sample points or
elementary events. An event is a subset of a sample space and is a set of sample points that represents several possible outcomes of an experiment.
A sample space with a finite or countably infinite number of sample points (with a one-to-one correspondence to the positive integers) is called a discrete space.
A continuous space is one with an uncountably infinite number of sample points (that is, it has as many elements as there are real numbers).
Events are generally represented by sets, and some important concepts can be explained
by using the algebra of sets (known as Boolean Algebra).
Definition 1 The sample space is denoted by S. A = S implies that the events in A must always occur. The empty set is a set with no elements and is denoted by ∅. A = ∅ implies that the events in A do not occur.
The set of all elements not in A is called the complement of A and is denoted by Ā.
Thus, Ā occurs if and only if A does not occur.
The set of all points in either a set A or a set B or both is called the union of the two
sets and is denoted by ∪. A ∪ B means that either the event A or the event B or both
occur. Note: A ∪ Ā = S.
The set of all elements in both A and B is called the intersection of the two sets and
is represented by ∩. A ∩ B means that both the events A and B occur simultaneously.
A ∩ B = ∅ means that A and B cannot occur together. A and B are then said to be disjoint or mutually exclusive. Note: A ∩ Ā = ∅.
A ⊂ B means that A is contained in B or that A is a subset of B, that is, every
element of A is an element of B. In other words, if an event A has occurred, then B must
have occurred also.
Sometimes it is useful to divide the elements of a set A into several subsets that are disjoint. Such a division is known as a partition. If A1 and A2 are such partitions, then A1 ∩ A2 = ∅ and A1 ∪ A2 = A. This can be generalized to n partitions: A = ∪_{i=1}^{n} Ai with Ai ∩ Aj = ∅ for i ≠ j.
Some postulates according to the Boolean Algebra:
Identity: There exist unique sets ∅ and S such that, for every set A, A ∩ S = A and A ∪ ∅ = A.
Complementation: For each A we can define a unique set Ā such that A ∩ Ā = ∅ and A ∪ Ā = S.
Closure: For every pair of sets A and B, we can define unique sets A ∪ B and A ∩ B.
Commutative: A ∪ B = B ∪ A; A ∩ B = B ∩ A.
Associative: (A ∪ B) ∪ C = A ∪ (B ∪ C). Also (A ∩ B) ∩ C = A ∩ (B ∩ C).
Distributive: A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C). Also, A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C).
De Morgan's Laws: the complement of A ∪ B equals Ā ∩ B̄, and the complement of A ∩ B equals Ā ∪ B̄.
Problems
1. Let the set S contain the ordered combinations of the sexes of two children: S = {FF, FM, MF, MM}.
Let A denote the subset of possibilities containing no males, B the subset of two
males, and C the subset containing at least one male. List the elements of A, B, C,
A ∩ B, A ∪ B, A ∩ C, A ∪ C, B ∩ C, B ∪ C, and C ∩ B̄.
2. Verify De Morgan's Laws by drawing Venn diagrams: the complement of A ∪ B equals Ā ∩ B̄, and the complement of A ∩ B equals Ā ∪ B̄.
Chapter 2
Probability
2.1 Probability definitions and concepts
2.1.1 Classical definition of probability
If an experiment has n (n < ∞) mutually exclusive and equally likely outcomes, and if nA of these outcomes have an attribute A (that is, the event A occurs in nA possible ways), then the probability of A is nA/n, denoted as P (A) = nA/n.
2.1.2 Frequency definition of probability
Let nA be the number of times the event A occurs in n trials of an experiment. If there exists a real number p such that p = lim_{n→∞} (nA/n), then p is called the probability of A and is denoted as P (A). (Examples are histograms for the frequency distributions of variables.)
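As a small illustration of this definition, the sketch below estimates P (A) by the relative frequency nA/n for increasing n, where A is the event "an even face" when rolling a fair die; the choice of event, the seed, and the use of numpy are assumptions made only for the example.

import numpy as np

rng = np.random.default_rng(0)

# Event A: an even face when rolling a fair six-sided die; true P(A) = 1/2.
for n in (100, 10_000, 1_000_000):
    rolls = rng.integers(1, 7, size=n)   # n trials of the experiment
    n_A = np.sum(rolls % 2 == 0)         # number of times A occurs
    print(n, n_A / n)                    # relative frequency approaches P(A) as n grows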
2.1.3 Subjective definition of probability
Probabilities based on our personal judgments about the relative likelihood of various outcomes. They rest on our "educated guesses" or intuitions, e.g., "The weather will be rainy tomorrow with probability 0.6."
2.1.4 Axiomatic definition of probability
The probability of an event A ∈ ℱ is a real number such that
1) P (A) ≥ 0 for every A ∈ ℱ,
2) the probability of the entire sample space S is 1, that is, P (S) = 1, and
3) if A1, A2, ..., An are mutually exclusive events (that is, Ai ∩ Aj = ∅ for all i ≠ j), then P (A1 ∪ A2 ∪ ... ∪ An) = Σi P (Ai), and this holds for n = ∞ also.
Here ℱ is the set of all subsets of the sample space S. The triple (S, ℱ, P (·)) is referred to as the probability space, and P (·) is a probability measure.
We can derive the following theorems by using the axiomatic definition of probability.
Theorem 1 P (Ā) = 1 − P (A).
Theorem 2 P (A) ≤ 1.
Theorem 3 P (∅) = 0.
Theorem 4 If A ⊂ B, then P (A) ≤ P (B).
Theorem 5 P (A ∪ B) = P (A) + P (B) − P (A ∩ B).
Definition 2 Let A and B be two events in a probability space (S, ℱ, P (·)) such that P (B) > 0. The conditional probability of A given that B has occurred, denoted by P (A | B), is given by P (A ∩ B)/P (B). (It should be noted that the original probability space (S, ℱ, P (·)) remains unchanged even though we focus our attention on the subspace; the conditional probabilities define the new space (S, ℱ, P (· | B)).)
Theorem 6 Bonferroni’s Theorem: Let A and B be two events in a sample space S. Then
P (A ∩ B) ≥ 1 − P (Ā) − P (B̄).
Theorem 7 Bayes Theorem: If A and B are two events with positive probabilities, then
P (A | B) = P (A) P (B | A) / P (B)
Law of total probability
Assume that S = A1 ∪ A2 ∪ ... ∪ An where Ai ∩ Aj = ∅ for i ≠ j. Then for any event B ⊂ S,
P (B) = Σ_{i=1}^{n} P (Ai) P (B | Ai).
Theorem 8 Extended Bayes Theorem: If A1, A2, ..., An constitute a partition of the sample space, so that Ai ∩ Aj = ∅ for i ≠ j and ∪i Ai = S, and P (Ai) ≠ 0 for any i, then for a given event B with P (B) > 0,
P (Ai | B) = P (Ai) P (B | Ai) / Σ_{j} P (Aj) P (B | Aj)
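A small numerical sketch of the law of total probability and the extended Bayes theorem; the three-part partition (e.g., three suppliers of an item) and the probabilities below are hypothetical values chosen only for illustration.

# Partition A1, A2, A3 with hypothetical prior probabilities P(Ai).
P_A = [0.5, 0.3, 0.2]
# Hypothetical conditional probabilities P(B | Ai) of the event B (e.g., "item is defective").
P_B_given_A = [0.01, 0.02, 0.05]

# Law of total probability: P(B) = sum_i P(Ai) P(B | Ai)
P_B = sum(pa * pb for pa, pb in zip(P_A, P_B_given_A))

# Extended Bayes theorem: P(Ai | B) = P(Ai) P(B | Ai) / P(B)
posterior = [pa * pb / P_B for pa, pb in zip(P_A, P_B_given_A)]

print(P_B)        # 0.021
print(posterior)  # the posterior probabilities sum to 1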
Definition 3 Two events A and B with positive probabilities are said to be statistically
independent if and only if P (A | B) = P (A). Equivalently, P (B | A) = P (B) and
P (A ∩ B) = P (A)P (B).
The other type of statistical inference is called Bayesian inference, where sample information is combined with prior information. The prior information is expressed as a probability distribution known as the prior distribution. When it is combined with the sample information, a posterior distribution of the parameters is obtained. It can be derived by using Bayes Theorem.
If we substitute Model (the model that generated the observed data) for A and Data (the observed data) for B, then we have
P (Model | Data) = P (Data | Model) P (Model) / P (Data)   (2.1)
where P (Data | Model) is the probability of observing the data given that the Model is true. This is usually called the likelihood (sample information). P (Model) is the probability that the Model is true before observing the data (usually called the prior probability). P (Model | Data) is the probability that the Model is true after observing the data (usually called the posterior probability). P (Data) is the unconditional probability of observing the data (whether the Model is true or not). Hence, the relation can be written
P (Model | Data) ∝ P (Data | Model) P (Model)   (2.2)
That is, the posterior probability is proportional to the likelihood (sample information) times the prior probability. The inverse of an estimator's variance is called the precision. In Classical Inference, we use only the parameters' variances, but in Bayesian Inference, we have both sample precision and prior precision. Also, the precision (or inverse of the variance) of the posterior distribution of a parameter is the sum of the sample precision and the prior precision. For example, the posterior mean will lie between the sample mean and the prior mean, and the posterior variance will be less than both the sample and prior variances. These are the reasons behind the increasing popularity of Bayesian Inference in practical econometric applications.
When we speak in econometrics of models to be estimated or tested, we refer to sets of DGPs in the Classical Inference context. In design-based inference, we restrict our attention to a particular sample size and characterize a DGP by the law of probability that governs the random variables in a sample of that size. In model-based inference, we refer to a limiting process in which the sample size goes to infinity, and it is clear that such a restricted characterization will no longer suffice. When we indulge in asymptotic theory, the DGPs
in question must be stochastic processes. A stochastic process is a collection of random
variables indexed by some suitable index set. This index set may be finite, in which case we
have no more than a vector of random variables, or it may be infinite, with either a discrete
or a continuous infinity of elements. In order to define a DGP, we must be able to specify the
joint distribution of the set of random variables corresponding to the observations contained
in a sample of arbitrarily large size. This is a very strong requirement. In econometrics,
or any other empirical discipline for that matter, we deal with finite samples. How then
can we, even theoretically, treat infinite samples? We must in some way create a rule that
allows one to generalize from finite samples to an infinite stochastic process. Unfortunately,
for any observational framework, there is an infinite number of ways in which such a rule can be constructed, and different rules can lead to widely different asymptotic conclusions. In the process of estimating an econometric model, what we are trying to do is obtain some estimated characterization of the DGP that actually did generate the data. Let us denote
an econometric model that is to be estimated, tested, or both, as M and a typical DGP
belonging to M as μ.
The simplest model in econometrics is the linear regression model; one possibility is to write
y = Xβ + u,  u ∼ N(0, σ²In)   (2.3)
where y and u are n-vectors, X is a nonrandom n×k matrix, and y follows the N(Xβ, σ²In) distribution. This distribution is unique if the parameters β and σ² are specified. We may therefore say that the DGP is completely characterized by the model parameters. In other words, knowledge of the model parameters β and σ² uniquely identifies an element μ of M.
On the other hand, the linear regression model can also be written as
y = Xβ + u,  u ∼ IID(0, σ²In)   (2.4)
with no assumption of normality. Many aspects of the theory of linear regressions are still applicable: the OLS estimator is unbiased, and its covariance matrix is σ²(X′X)⁻¹. But the distribution of the vector u, and hence also that of y, is now only partially characterized even when β and σ² are known. For example, the errors u could be skewed to the left or to the right, or could have fourth moments larger or smaller than 3σ⁴. Let us call the sets of DGPs associated with these regressions M1 and M2, respectively, M1 being in fact a proper subset of M2. For a given β and σ² there is an infinite number of DGPs in M2 (only one of which is in M1) that all correspond to the same β and σ². Thus we must consider these models as different models even though the parameters used in them are the same. In either case, it must be possible to associate a parameter vector in a unique way to any DGP μ in the model M, even if the same parameter vector is associated with many DGPs. We call the model M with its associated parameter-defining mapping θ a parametrized model. The main
task in our practical work is to build the association between the DGPs of a model and the
model parameters. For example, in the Generalized Method of Moments (GMM) context,
there are many possible ways of choosing the econometric model, i.e., the underlying set of
DGPs. One of the advantages of GMM as an estimation method is that it permits models
which consist of a very large number of DGPs. In striking contrast to Maximum Likelihood
estimation, where the model must be completely specified, any DGP is admissible if it
satisfies a relatively small number of restrictions or regularity conditions. Sometimes, the
existence of the moments used to define the parameters is the only requirement needed for
a model to be well defined.
Problems
1. A sample space consists of five simple events E1 , E2 , E3 , E4 , and E5 .
(a) If P (E1 ) = P (E2 ) = 0.15, P (E3 ) = 0.4 and P (E4 ) = 2P (E5 ), find the probabilities of E4 and E5 .
(b) If P (E1) = 3P (E2) = 0.3, find the probabilities of the remaining simple events, given that they are equally probable.
2. A business office orders paper supplies from one of three vendors, V1 , V2 , and V3 .
Orders are to be placed on two successive days, one order per day. Thus (V2 , V3 )
might denote that vendor V2 gets the order on the first day and vendor V3 gets the
order on the second day.
(a) List the sample points in this experiment of ordering paper on two successive
days.
(b) Assume the vendors are selected at random each day and assign a probability to
each sample point.
(c) Let A denote the event that the same vendor gets both orders and B the event
that V2 gets at least one order. Find P (A), P (B), P (A ∩ B), and P (A ∪ B) by
summing probabilities of the sample points in these events.
Chapter 3
Random variables and probability distributions
3.1 Random variables, densities, and cumulative distribution functions
A random variable X is a function whose domain is the sample space and whose range is a set of real numbers.
Definition 4 In simple terms, a random variable (also referred to as a stochastic variable) is a real-valued set function whose value is a real number determined by the outcome of an experiment. The range of a random variable is the set of all the values it can assume. The particular values observed are called realisations x. If these are countable, x1, x2, ..., it is said to be discrete with associated probabilities
P (X = xi) = p(xi) ≥ 0,   Σi p(xi) = 1;   (3.1)
and cumulative distribution P (X ≤ xj) = Σ_{i=1}^{j} p(xi).
For a continuous random variable, defined over the real line, the cumulative distribution function is
F (x) = P (X ≤ x) = ∫_{−∞}^{x} f (u) du,   (3.2)
where f (x) denotes the probability density function
f (x) = dF (x)/dx   (3.3)
and ∫_{−∞}^{∞} f (x) dx = 1.
Also note that the cumulative distribution function satisfies lim_{x→∞} F (x) = 1 and lim_{x→−∞} F (x) = 0.
Definition 5 The real-valued function F (x) such that F (x) = Px{(−∞, x]} for each x ∈ ℝ is called the distribution function, also known as the cumulative distribution (or cumulative density) function, or CDF.
Theorem 9 P (a < X ≤ b) = F (b) − F (a)
Theorem 10 For each x ∈ ℝ, F (x) is continuous to the right of x.
Theorem 11 If F (x) is continuous at x ∈ ℝ, then P (X = x) = 0.
Although f (x) is defined at a point, P (X = x) = 0 for a continuous random variable.
The support of a distribution is the range over which f (x) ≠ 0.
Let f be a function from ℝ^k to ℝ. Let x0 be a vector in ℝ^k and let y = f (x0) be its image. The function f is continuous at x0 if whenever {xn}_{n=1}^{∞} is a sequence in ℝ^k which converges to x0, the sequence {f (xn)}_{n=1}^{∞} converges to f (x0). The function f is said to be continuous if it is continuous at each point in its domain.
All polynomial functions are continuous. As an example of a function that is not continuous, consider
f (x) = 1 if x > 0, and f (x) = 0 if x ≤ 0.
If both g and f are continuous functions, then g(f (x)) is continuous.
3.1.1 Discrete Distributions
Definition 6 For a discrete random variable X, let f (x) = Px(X = x). The function f (x) is called the probability function (or the probability mass function).
The Bernoulli Distribution
f (x; θ) = f (x; p) = p^x (1 − p)^(1−x) for x = 0, 1 (failure, success) and 0 ≤ p ≤ 1.
The Binomial Distribution
f (x; θ) = B(x; n, p) = (n choose x) p^x (1 − p)^(n−x) = [n!/(x!(n − x)!)] p^x (1 − p)^(n−x)   (3.4)
for x = 0, 1, ..., n (X is the number of successes in n trials) and 0 ≤ p ≤ 1.
3.1.2 Continuous Distributions
Definition 7 For a random variable X, if there exists a nonnegative function f (x), defined on the real line, such that for any interval B,
P (X ∈ B) = ∫_B f (x) dx   (3.5)
then X is said to have a continuous distribution and the function f (x) is called the
probability density function or simply density function (or pdf).
The following can be written for continuous random variables:
F (x) = ∫_{−∞}^{x} f (u) du   (3.6)
f (x) = F′(x) = ∂F (x)/∂x   (3.7)
∫_{−∞}^{+∞} f (u) du = 1   (3.8)
F (b) − F (a) = ∫_{a}^{b} f (u) du   (3.9)
Uniform Distribution on an Interval
A random variable X with the density function
f (x; a, b) = 1/(b − a)   (3.10)
in the interval a ≤ X ≤ b is called the uniform distribution on an interval.
The Normal Distribution
A random variable X with the density function
f (x; μ, σ) = (1/(σ√(2π))) exp[−(x − μ)²/(2σ²)]   (3.11)
is called a Normal (Gaussian) distributed variable.
3.1.3 Example
1. Toss of a single fair coin. X = number of heads.
F (x) = 0 if x < 0;  F (x) = 1/2 if 0 ≤ x < 1;  F (x) = 1 if x ≥ 1.
The cumulative distribution function (cdf) of a discrete random variable is always a step function, because the cdf increases only at a countable number of points.
f (x) = 1/2 if x = 0;  f (x) = 1/2 if x = 1.
F (x) = Σ_{xj ≤ x} f (xj)
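A minimal sketch of this step-function cdf; the function name F and the evaluation points are chosen only for the example.

def F(x):
    """Cdf of X = number of heads in one toss of a fair coin."""
    if x < 0:
        return 0.0
    elif x < 1:
        return 0.5
    else:
        return 1.0

# The cdf jumps only at the countable support points 0 and 1.
print([F(x) for x in (-0.5, 0.0, 0.5, 1.0, 2.0)])  # [0.0, 0.5, 0.5, 1.0, 1.0]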
3.2 Problems
1. Write P (a ≤ x ≤ b) in terms of integrals and draw a picture for it.
2. Assume the probability density function for x is: f (x) = cx if 0 ≤ x ≤ 2, and f (x) = 0 elsewhere.
(a) Find the value of c for which f (x) is a pdf.
(b) Compute F (x).
(c) Compute P (1 ≤ x ≤ 2).
3. A large lot of electrical fuses is supposed to contain only 5 percent defectives. Assuming a binomial model, if n = 20 fuses are randomly sampled from this lot, find the probability that at least three defectives will be observed.
4. Let the distribution function of a random variable X be given by
F (x) = 0 for x < 0;  F (x) = x/8 for 0 ≤ x < 2;  F (x) = x²/16 for 2 ≤ x < 4;  F (x) = 1 for x ≥ 4.
(a) Find the density function (i.e., pdf) of x.
(b) Find P (1 ≤ x ≤ 3)
(c) Find P (x ≤ 3)
(d) Find P (x ≥ 1 | x ≤ 3).
Chapter 4
Expectations and moments
4.1 Mathematical Expectation and Moments
The probability density and cumulative distribution functions determine the probabilities of random variables at various points or in different intervals. Very often we are
interested in summary measures of where the distribution is located, how it is dispersed
around some average measure, whether it is symmetric around some point, and so on.
4.1.1 Mathematical Expectation
Definition 8 Let X be a random variable with f (x) as the PMF or PDF, and let g(x) be a single-valued function. Then E[g(X)], defined below, is the expected value (or mathematical expectation) of g(X). In the case of a discrete random variable this takes the form E[g(X)] = Σi g(xi) f (xi), and in the continuous case, E[g(X)] = ∫_{−∞}^{+∞} g(x) f (x) dx.
Mean of a Distribution
For the special case of g(X) = X, the mean of a distribution is μ = E(X).
Theorem 12 If c is a constant, E(c) = c.
Theorem 13 If c is constant, E[cg(X)] = cE[g(X)].
Theorem 14 E[u(X) + v(X)] = E[u(X)] + E[v(X)].
Theorem 15 E(X − μ) = 0, where μ = E(X).
Examples:
Ex1: Let X have the probability density function

  x      1      2      3      4
  f (x)  4/10   1/10   3/10   2/10

E(x) = Σx x f (x) = 1(4/10) + 2(1/10) + 3(3/10) + 4(2/10) = 23/10.
Ex2: Let X have the pdf f (x) = 4x³ for 0 < x < 1, and f (x) = 0 elsewhere.
E(x) = ∫_{−∞}^{+∞} x f (x) dx = ∫_0^1 x(4x³) dx = 4 ∫_0^1 x⁴ dx = 4 [x⁵/5]_0^1 = 4(1/5) = 4/5.

Moments of a Distribution
The mean of a distribution is the expected value of the random variable X. If the following integral exists,
μ′m = E(X^m) = ∫_{−∞}^{+∞} x^m dF   (4.1)
it is called the mth moment around the origin, and it is denoted by μ′m. Moments can also be obtained around the mean; these are the central moments (denoted by μm):
μm = E[(X − μ)^m] = ∫_{−∞}^{+∞} (x − μ)^m dF   (4.2)
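As a quick numerical check of these definitions, the sketch below recomputes the first moments of the density f (x) = 4x³ on (0, 1) from Ex2 by numerical integration; the availability of scipy's quad routine is an assumption of the sketch.

from scipy.integrate import quad

f = lambda x: 4 * x**3                                 # pdf from Ex2, supported on (0, 1)

mean, _ = quad(lambda x: x * f(x), 0, 1)               # first moment about the origin, E(X)
m2, _ = quad(lambda x: x**2 * f(x), 0, 1)              # second moment about the origin, E(X^2)
var, _ = quad(lambda x: (x - mean)**2 * f(x), 0, 1)    # second central moment

print(mean)                # 0.8  (= 4/5)
print(m2 - mean**2, var)   # both ~0.02667  (= 2/75)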
Variance and Standard Deviation
The central moment of a distribution that corresponds to m = 2 is called the variance of
this distribution, and is denoted by σ² or Var(X). The positive square root of the variance is called the standard deviation and is denoted by σ or Std(x).
of the squared deviation from the mean. There are many deviations from the mean but
only one standard deviation. The variance shows the dispersion of a distribution and by
squaring deviations one treats positive and negative deviations symmetrically.
Mean and Variance of a Normal Distribution
If a random variable X is normally distributed as N(μ, σ²), its mean is μ and its variance is σ². The operation of subtracting the mean and dividing by the standard deviation is called standardizing. The standardized variable Z = (X − μ)/σ is then SN(0, 1), the standard normal.
Mean and Variance of a Binomial Distribution
A binomially distributed random variable X ∼ B(n, p) has mean np and variance np(1 − p). (Show this!)
Theorem 16 If E(X)=μ and Var(X)=σ 2 , and a and b are constants, then V ar(a + bX) =
b2 σ 2 . (Show this!)
Example:
Ex3: Let X have the probability density function f (x) = 4x³ for 0 < x < 1, and f (x) = 0 elsewhere.
E(x) = 4/5.
Var(x) = E(x²) − E²(x) = ∫_0^1 x²(4x³) dx − [4/5]² = 4 [x⁶/6]_0^1 − (4/5)² = 4/6 − 16/25 = 2/75 ≈ 0.0266.

Expectations and Probabilities
Any probability can be interpreted as an expectation. Define the variable Z which is equal
to 1 if event A occurs, and equal to zero if event A does not occur. Then it is easy to see
that P r(A) = E(Z).
How much information about the probability distribution of a random variable X is
provided by the expectation and variance of X? There are three useful theorems here.
Theorem 17 Markov's Inequality: If X is a nonnegative random variable, that is, if Pr(X < 0) = 0, and k is any positive constant, then Pr(X ≥ k) ≤ E(X)/k.
Theorem 18 Chebyshev's Inequality: Let b be a positive constant and h(X) be a nonnegative measurable function of the random variable X. Then
Pr(h(X) ≥ b) ≤ (1/b) E[h(X)]
For any constant c > 0 and σ² = Var(X),
Corollary 19 Pr(| X − μ | ≥ c) ≤ σ²/c²
Corollary 20 Pr(| X − μ | ≤ c) ≥ 1 − σ²/c²
Corollary 21 Pr(| X − μ | ≥ kσ) ≤ 1/k²
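A quick simulation check of Corollary 21; the exponential distribution, sample size and seed below are arbitrary choices for the sketch, since the bound holds for any distribution with finite variance.

import numpy as np

rng = np.random.default_rng(1)
x = rng.exponential(scale=1.0, size=1_000_000)   # any distribution with finite variance will do
mu, sigma = x.mean(), x.std()

for k in (2, 3, 4):
    empirical = np.mean(np.abs(x - mu) >= k * sigma)   # estimated Pr(|X - mu| >= k*sigma)
    print(k, empirical, 1 / k**2)                      # the Chebyshev bound 1/k^2 is always larger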
For linear functions the expectation of the function is the function of the expectation.
But if Y = h(X) is nonlinear, then in general E(Y ) ≠ h[E(X)]. The direction of the
inequality may depend on the distribution of X. For certain functions, we can be more
definite.
Theorem 22 Jensen’s Inequality If Y = h(X) is concave and E(X) = μ, then E(Y ) ≤
h(μ).
For example, the logarithmic function is concave, so E[log(X)] ≤ log[E(X)] regardless
of the distribution of X. Similarly, if Y = h(X) is convex, so that it lies everywhere
above its tangent line, then E(Y ) ≥ h(μ). For example, the square function is convex, so
E(X 2 ) ≥ [E(X)]2 regardless of the distribution of X.
Approximate Mean and Variance of g(X)
Suppose X is a random variable defined on (S, ℱ, P (·)) with E(X) = μ and Var(X) = σ², and let g(X) be a differentiable and measurable function of X. We first take a linear approximation of g(X) in the neighborhood of μ. This is given by
g(X) ≈ g(μ) + g′(μ)(X − μ)   (4.3)
provided g(μ) and g′(μ) exist. Since the second term has zero expectation, E[g(X)] ≈ g(μ), and the variance is Var[g(X)] ≈ σ²[g′(μ)]².
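The sketch below checks this linear approximation by simulation for g(X) = ln X with X normal; the particular μ, σ and seed are assumptions of the example, not part of the result.

import numpy as np

rng = np.random.default_rng(2)
mu, sigma = 10.0, 0.5
x = rng.normal(mu, sigma, size=1_000_000)

g = np.log(x)
# Approximations: E[g(X)] ~ g(mu), Var[g(X)] ~ sigma^2 [g'(mu)]^2 with g'(mu) = 1/mu.
approx_mean, approx_var = np.log(mu), sigma**2 / mu**2

print(g.mean(), approx_mean)   # ~2.3013 vs 2.3026
print(g.var(), approx_var)     # ~0.00251 vs 0.0025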
Mode of a Distribution
The point(s) for which f (x) is maximum are called the mode(s). It is the most frequently observed
value of X.
Median, Upper and Lower Quartiles, and Percentiles
A value of x such that P (X < x) ≤ 1/2 and P (X ≤ x) ≥ 1/2 is called a median
of the distribution. If the point is unique, then it is the median. Thus the median is the
point on either side of which lies 50 percent of the distribution. We often prefer median as
an ”average” measure because the arithmetic average can be misleading if extreme values
are present.
The point(s) with an area 1/4 to the left is (are) called the lower quartile(s), and the
point(s) corresponding to 3/4 is (are) called upper quartile(s).
For any probability p, the values of X, for which the area to the right is p are called the
upper pth percentiles (also referred to as quantiles).
Coefficient of Variation
The coefficient of variation is defined as the ratio (σ/μ)100, where the numerator is the
standard deviation and the denominator is the mean. It is a measure of the dispersion of a
distribution relative to its mean and is useful in the estimation of relationships. We usually say that the variable X does not vary much if the coefficient of variation is less than 5 percent. It is also helpful for comparing two variables that are measured on different scales.
Skewness and Kurtosis
If a continuous density f (x) has the property that f (μ + a) = f (μ − a) for all a (μ being the mean of the distribution), then f (x) is said to be symmetric around the mean. If a distribution is not symmetric about the mean, it is called skewed. A commonly used measure of skewness is α3 = E[(X − μ)³/σ³]. For a symmetric distribution such as the normal, this is zero (α3 = 0). A positively skewed distribution (α3 > 0) is skewed to the right with a long right tail; a negatively skewed distribution (α3 < 0) is skewed to the left with a long left tail.
The peakedness of a distribution is called kurtosis. One measure of kurtosis is α4 = E[(X − μ)⁴/σ⁴]. A normal distribution is called mesokurtic (α4 = 3). A narrow distribution is called leptokurtic (α4 > 3) and a flat distribution is called platykurtic (α4 < 3). The value E[(X − μ)⁴/σ⁴] − 3 is often referred to as excess kurtosis.
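As a sketch, sample analogues of α3 and α4 can be computed directly from their definitions; the normal and exponential samples below are only illustrative choices (the exponential has α3 = 2 and α4 = 9).

import numpy as np

def skewness(x):
    z = (x - x.mean()) / x.std()
    return np.mean(z**3)           # sample analogue of alpha_3

def kurtosis(x):
    z = (x - x.mean()) / x.std()
    return np.mean(z**4)           # sample analogue of alpha_4 (3 for the normal)

rng = np.random.default_rng(3)
z = rng.normal(size=1_000_000)
e = rng.exponential(size=1_000_000)
print(skewness(z), kurtosis(z))    # ~0 and ~3: symmetric, mesokurtic
print(skewness(e), kurtosis(e))    # ~2 and ~9: skewed to the right, leptokurtic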
4.1.2 Moments
Mathematical Expectation
The concept of mathematical expectation is easily extended to bivariate random variables.
We have
E[g(X, Y )] = ∫∫ g(x, y) dF (x, y)   (4.4)
where the integral is over the (X, Y ) space.
Moments
The rth moment of X is
E(X^r) = ∫ x^r dF (x)   (4.5)
Joint Moments
E(X^r Y^s) = ∫∫ x^r y^s dF (x, y)
Let X and Y be independent random variables and let u(X) be a function of X only and v(Y ) be a function of Y only. Then,
E[u(X)v(Y )] = E[u(X)] E[v(Y )]   (4.6)
Covariance
The covariance between X and Y is defined as
σXY = Cov(X, Y ) = E[(X − μx)(Y − μy)] = E(XY ) − μx μy   (4.7)
In the continuous case this takes the form
σXY = ∫_{−∞}^{∞} ∫_{−∞}^{∞} (x − μx)(y − μy) f (x, y) dx dy   (4.8)
and in the discrete case it is
σXY = Σx Σy (x − μx)(y − μy) f (x, y)   (4.9)
Although the covariance measure is useful in identifying the nature of the association
between X and Y , it has a serious problem, namely, the numerical value is very sensitive
to the units of measurement. To avoid this problem, a "normalized" covariance measure is
used. This measure is called the correlation coefficient.
Correlation
The quantity
ρXY = σXY/(σX σY) = Cov(X, Y )/√(Var(X) Var(Y ))   (4.10)
is called the correlation coefficient between X and Y. If Cov(X, Y ) = 0, then Cor(X, Y ) = 0, in which case X and Y are said to be uncorrelated. If two random variables are independent, then σXY = 0 and ρXY = 0. The converse need not be true.
Theorem 23 | ρXY |≤ 1 that is, −1 ≤ ρXY ≤ 1.
The inequality [Cov(X, Y )]² ≤ Var(X) Var(Y ) is called the Cauchy-Schwarz Inequality; equivalently, ρ²XY ≤ 1, that is, −1 ≤ ρXY ≤ 1. It should be emphasized that ρXY measures only
a linear relationship between X and Y . It is possible to have an exact relation but a
correlation less than 1, even 0.
Example:
To illustrate, consider a random variable X which is distributed as Uniform[−θ, θ] and the transformation Y = X². Cov(X, Y ) = E(X³) − E(X)E(X²) = 0 because the distribution
is symmetric around the origin and hence all the odd moments about the origin are zero.
It follows that X and Y are uncorrelated even though there is an exact relation between
them. In fact, this result holds for any distribution that is symmetric around the origin.
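This zero-correlation-despite-exact-dependence case is easy to verify by simulation; taking θ = 1 and the seed below are arbitrary choices for the sketch.

import numpy as np

rng = np.random.default_rng(4)
theta = 1.0
x = rng.uniform(-theta, theta, size=1_000_000)
y = x**2                                     # exact (nonlinear) relation between X and Y

print(np.corrcoef(x, y)[0, 1])               # ~0: X and Y are uncorrelated
print(np.cov(x, y)[0, 1])                    # ~0: Cov(X, Y) = E(X^3) - E(X)E(X^2) = 0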
Definition 9 Conditional Expectation: Let X and Y be continuous random variables and g(Y ) be a continuous function. Then the conditional expectation (or conditional mean) of g(Y ) given X = x, denoted by EY |X [g(Y ) | X], is given by ∫_{−∞}^{∞} g(y) f (y | x) dy, where f (y | x) is the conditional density of Y given X.
Note that E[g(Y ) | X = x] is a function of x and is not a random variable because x is
fixed. The special case of E(Y | X) is called the regression of Y on X.
Theorem 24 Law of Iterated Expectation: EXY [g(Y )] = EX [EY |X {g(Y ) | X}]. That
is, the unconditional expectation is the expectation of the conditional expectation.
Definition 10 Conditional Variance: Let μY |X = E(Y | X) = μ∗ (X) be the conditional
mean of Y given X. Then the conditional variance of Y given X is defined as V ar(Y |
X) = EY |X [(Y − μ∗)² | X]. This is a function of X.
Theorem 25 Var(Y ) = EX [Var(Y | X)] + VarX [E(Y | X)], that is, the variance of
Y is the mean of its conditional variance plus the variance of its conditional mean.
Theorem 26 V ar(aX + bY ) = a2 V ar(X) + 2abCov(X, Y ) + b2 V ar(Y ).
Approximate Mean and Variance for g(X, Y )
After obtaining a linear approximation of the function g(X, Y )
g(X, Y ) ≈ g(μX, μY) + [∂g/∂X](X − μX) + [∂g/∂Y ](Y − μY)   (4.11)
its mean can be written E[g(X, Y )] ≈ g(μX, μY).
Its variance is
Var[g(X, Y )] ≈ σ²X [∂g/∂X]² + σ²Y [∂g/∂Y ]² + 2σXY [∂g/∂X][∂g/∂Y ]   (4.12)
Note that approximations may be grossly in error. You should be especially careful with
the variance and covariance approximations.
Problems
1. For certain ore samples the proportion Y of impurities per sample is a random variable
with density function given by
f (y) = (3/2)y² + y for 0 ≤ y ≤ 1, and f (y) = 0 elsewhere.
The dollar value of each sample is W = 5 − 0.5Y . Find the mean and variance of W.
2. The random variable Y has the following probability density function
f (y) = (3/8)(7 − y)² for 5 ≤ y ≤ 7, and f (y) = 0 elsewhere.
(a) Find E(Y ) and V ar(Y ).
(b) Find an interval shorter than (5, 7) in which at least 3/4 of the Y values must lie.
(c) Would you expect to see a measurement below 5.5 very often? Why?
Chapter 5
Some univariate distributions
5.1 Discrete Distributions
A random variable X is said to have a discrete distribution if it can take only a finite
number of different values x1 , x2 , ..., xn , or a countably infinite number of distinct points.
5.1.1 The Bernoulli Distribution
We have this distribution when there are only two possible outcomes to an experiment, one labeled a success (p) and the other labeled a failure (1 − p = q). If there is only one trial of the experiment, then we have the Bernoulli Distribution with the probability density function
f (x; θ) = f (x; p) = p^x (1 − p)^(1−x)   (5.1)
for x = 0, 1 (failure, success) and 0 ≤ p ≤ 1.
E(x) = Σ_{x=0}^{1} x f (x) = 0·(1 − p) + 1·p = p.   (5.2)
Var(x) = E(x²) − E²(x) = 0²·(1 − p) + 1²·p − p² = p(1 − p) = pq.   (5.3)

5.1.2 The Binomial Distribution
This distribution arises when a Bernoulli experiment is repeated over n independent trials.
f (x; θ) = B(x; n, p) = (n choose x) p^x (1 − p)^(n−x) = [n!/(x!(n − x)!)] p^x (1 − p)^(n−x)
for x = 0, 1, ..., n (X is the number of successes in n trials) and 0 ≤ p ≤ 1.
E(x) = np.   (5.4)
Var(x) = npq.   (5.5)
5.1.3 Example
Ex1: Assume a student is given a test with 10 true-false questions. Also assume that the
student is totally unprepared for the test and guesses at the answer to every question. What
is the probability that the student will answer 7 or more questions correctly?
Let X be the number of questions answered correctly. The test represents a binomial experiment with n = 10 and p = 1/2, so X ∼ Bin(n = 10, p = 1/2).
P (x ≥ 7) = P (x = 7) + P (x = 8) + P (x = 9) + P (x = 10)
= Σ_{k=7}^{10} (10 choose k) (1/2)^k (1/2)^(10−k) = Σ_{k=7}^{10} (10 choose k) (1/2)^10
≈ 0.17
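The same probability can be checked with a binomial routine; the availability of scipy is an assumption of this sketch.

from scipy.stats import binom

n, p = 10, 0.5
# P(X >= 7) = P(X > 6); sf(6) is the survival function 1 - F(6).
print(binom.sf(6, n, p))                               # 0.171875
print(sum(binom.pmf(k, n, p) for k in range(7, 11)))   # same value, summing the pmf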
5.1.4 Simple Random Walk
This is a process often used to describe the behavior of stock prices. Suppose that {εt} is a purely random series with mean μ and variance σ². Then a process {Xt} is said to be a random walk if
Xt = Xt−1 + εt   (5.6)
Let us assume that X0 is equal to zero. Then the process evolves as follows:
X1 = ε1   (5.7)
X2 = X1 + ε2 = ε1 + ε2   (5.8)
and so on. We have by successive substitution
Xt = Σ_{i=1}^{t} εi   (5.9)
Hence E(Xt) = tμ and Var(Xt) = tσ². Since the mean and variance change with t, the process is nonstationary, but its first difference is stationary. Referring to share prices, this says that the changes in a share price will be a purely random process.
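A minimal simulation of such a random walk, illustrating that Var(Xt) grows linearly with t; the N(0, 1) innovations, horizon and number of paths are assumptions of the sketch.

import numpy as np

rng = np.random.default_rng(5)
T, n_paths = 200, 10_000

eps = rng.normal(0.0, 1.0, size=(n_paths, T))   # purely random series with mu = 0, sigma^2 = 1
X = eps.cumsum(axis=1)                          # X_t = X_{t-1} + eps_t, starting from X_0 = 0

# Var(X_t) ~ t * sigma^2, so the ratio below stays close to 1 for each t.
for t in (10, 50, 200):
    print(t, X[:, t - 1].var() / t)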
5.1.5 Geometric Distribution
Let X be the number of the trial at which the first success occurs. The distribution of
X is known as the geometric distribution. It has the density function
f (x; p) = p(1 − p)^(x−1),  x = 1, 2, 3, ...   (5.10)
5.1.6 Hypergeometric Distribution
The binomial distribution is often referred to as sampling with replacement, which is needed to maintain the same probabilities across the trials.
Let there be a objects in a certain class A (defective) and b objects in another class B (nondefective). If we draw a random sample of size n without replacement, then there are (a choose x) possible ways to get x objects from class A. For each such outcome, there are (b choose n − x) possible ways of drawing from B. Thus the probability density function of a hypergeometrically distributed variable X is:
f (x; n, a, b) = (a choose x)(b choose n − x) / (a + b choose n)   (5.11)
5.1.7 Negative Binomial Distribution
In a binomial experiment, let Y be the number of trials needed to get exactly k successes. To get exactly k successes, there must be k − 1 successes in y − 1 trials and the next outcome must be a success. Let X = y − k be the number of failures until k successes have been obtained. The density function of X is known as the negative binomial:
f (x; k, p) = (x + k − 1 choose k − 1) p^k (1 − p)^x,  x = 0, 1, 2, ...   (5.12)

5.1.8 Poisson Distribution
When n → ∞ and p → 0 in a binomially distributed variable, with np = λ (> 0) held fixed, the probability of success is very small and the number of trials is large. The limiting distribution is known as the Poisson distribution:
f (x; λ) = e^(−λ) λ^x / x!,  x = 0, 1, 2, ...   (5.13)
We use this distribution in queuing theory, for example to model the arrival of the next customer at a checkout line or the occurrence of a phone call in a specific small interval.
5.2 Continuous Distributions
Definition 11 For a random variable X, if there exists a nonnegative function f (x), defined on the real line, such that for any interval B,
P (X ∈ B) = ∫_B f (x) dx
then X is said to have a continuous distribution and the function f (x) is called the
probability density function or simply density function (or PDF).
5.2.1 Uniform Distribution on an Interval
A random variable X with the density function
f (x; a, b) = 1/(b − a)   (5.14)
in the interval a ≤ X ≤ b is called the uniform distribution on an interval.
5.2.2 Beta Distribution
The density function for this distribution has the form
f (x) = x^(m−1)(1 − x)^(n−1) / ∫_0^1 x^(m−1)(1 − x)^(n−1) dx,  0 < x < 1,  m, n > 0   (5.15)
The denominator is known as the Beta Function. This distribution, B(m, n), reduces to
the uniform distribution for m = n = 1.
5.2.3 Cauchy Distribution
The standard Cauchy distribution has the density function
f (x) = 1/[π(1 + x²)],  −∞ < x < ∞   (5.16)
The Cauchy distribution arises when the ratio of two independent standard normal variates is computed.
5.2.4 Chi-Square Distribution
If Z1, Z2, ..., Zn are independent N(0, 1) variables, and X = Σ_{i=1}^{n} Zi², then the probability density function of a chi-square distributed variable X is
f (x) = (x/2)^(n/2 − 1) e^(−x/2) / [2 Γ(n/2)],  x > 0,  n = 1, 2, ...   (5.17)
where Γ(n) is the Gamma function:
Γ(1/2) = √π   (5.18)
Γ(1) = 1   (5.19)
Γ(n) = (n − 1)Γ(n − 1) = ∫_0^∞ u^(n−1) e^(−u) du   (5.20)
Γ(n + 1) = n!   (5.21)
5.2.5 The Exponential Distribution
The distribution
f (x; θ) = (1/θ) e^(−x/θ),  x > 0,  θ > 0   (5.22)
is called the exponential distribution.
5.2.6 Extreme Value Distribution (Gompertz Distribution)
For modeling extreme values such as the peak electricity demand in a day, maximum rainfall,
and so on, we can use the extreme value distribution which, in its standard form, has the
following density.
f (x) = e^(−x) exp[−e^(−x)],  −∞ < x < ∞   (5.23)

5.2.7 F Distribution
If x = (w1/m)/(w2/n) where w1 ∼ χ²(m) and w2 ∼ χ²(n) are independent, then x ∼ F (m, n) with the following density function
f (x) = (m/n)^(m/2) [Γ((m + n)/2) / (Γ(m/2) Γ(n/2))] x^((m−2)/2) / [1 + (m/n)x]^((m+n)/2),  x > 0,  m, n = 1, 2, ...   (5.24)
That is, the ratio of two independent chi-square variables, each divided by its degrees of freedom, has the Snedecor F distribution, with numerator and denominator degrees of freedom equal to those of the respective chi-squares.
5.2.8 Gamma Distribution
This distribution has the density function
f (x; α, β) = [1/(β^α Γ(α))] x^(α−1) e^(−x/β),  x > 0,  β > 0   (5.25)
When α = 1, the Gamma Distribution reduces to the Exponential distribution.
5.2.9 Geometric Distribution
The density function for this distribution is
f (x) = θ x^(θ−1),  0 < x < 1,  θ > 0   (5.26)
5.2.10 Logistic Distribution
This distribution has the following density function:
f (x) = e^(−x) / (1 + e^(−x))²,  −∞ < x < ∞   (5.27)

5.2.11 Lognormal Distribution
A random variable X is said to have the standard lognormal distribution if Z = ln X has the standard normal distribution
fz(z) = (1/√(2π)) e^(−z²/2),  −∞ < z < ∞   (5.28)
By transforming the variable to x, we can write its density function as
fx(x) = (1/(x√(2π))) exp(−(ln x)²/2),  0 < x < ∞   (5.29)
Because lnX is defined only for positive X and most economic variables take only positive
values, this distribution is very popular in economics. It has been used to model the size
of firms, stock prices at the end of a trading day, income distributions, expenditure on
particular commodities, and certain commodity prices.
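A sketch of this relationship by simulation: exponentiating standard normal draws produces standard lognormal draws, and taking logs recovers normality; the sample size and seed are arbitrary.

import numpy as np

rng = np.random.default_rng(6)
z = rng.normal(0.0, 1.0, size=1_000_000)   # Z ~ N(0, 1)
x = np.exp(z)                              # X = e^Z is standard lognormal, X > 0

logx = np.log(x)
print(logx.mean(), logx.std())             # ~0 and ~1: ln X is standard normal again
print(x.mean())                            # ~exp(1/2) = 1.6487, the mean of the standard lognormal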
5.2.12 The Normal Distribution
A random variable X with the density function
f (x; μ, σ) = (1/(σ√(2π))) exp[−(x − μ)²/(2σ²)]   (5.30)
is called a Normal (Gaussian) distributed variable. The integral in the cdf of the standard
normal distribution does not have a closed form solution but requires numerical integration.
5.2.13 Pareto Distribution
The density function for this distribution is
f (x) = (θ/x0)(x0/x)^(θ+1),  x > x0,  θ > 0   (5.31)
Although the lognormal distribution is often used to model the distribution of incomes, it
has been found to approximate incomes in the middle range very well but to fail in the
upper tail. A more appropriate distribution for this purpose is the Pareto distribution.
5.2.14 Student's t Distribution
If Z ∼ N(0, 1) and W ∼ χ²(n), with Z and W being independent, and X = Z/√(W/n), then the probability density function of X is
f (x) = (1/√n) {Γ[(n + 1)/2] / (Γ[n/2] Γ(1/2))} [1 + x²/n]^(−(n+1)/2)   (5.32)
The probability density function is symmetric, centered at zero, and similar in shape to
a standard normal probability density function.
5.2.15 Weibull Distribution
In some more general situations the conditions for the exponential distribution are not met. An exponential distribution provides an appropriate model for the lifetime of a piece of equipment, but it is not suitable for the lifetime of a human population; this is because the exponential distribution is memoryless. The Weibull distribution has the density function
f (x; a, b) = a b x^(b−1) e^(−a x^b),  x > 0,  a, b > 0   (5.33)
Note that when b = 1, this reduces to the exponential distribution.
Problems
1. Find the values z0 in the following probabilities.
(a) P (Z > z0 ) = 0.50
(b) P (Z < z0 ) = 0.8643
(c) P (−z0 < Z < z0 ) = 0.90
(d) P (−z0 < Z < z0 ) = 0.99
2. A soft drink machine can be regulated so that it discharges an average of μ ounces per
cup. If the ounces of fill are normally distributed with σ 2 = (0.3)2 , give the setting
for μ so that 8-ounce cups will overflow only 1 percent of the time.
(Note: P (Z > 2.33) = 0.01).
3. Let f1 (y) and f2 (y) be density functions, and let a be a constant such that 0 ≤ a ≤ 1.
Consider the function f (y) = af1 (y) + (1 − a)f2 (y).
(a) Show that f (y) is a density function. Such a density function is often referred to
as a mixture of two density functions.
(b) Suppose that Y1 is a random variable with density function f1 (y), and that
E(Y1 ) = μ1 and V ar(Y1 ) = σ12 , and similarly suppose that Y2 is a random
variable with density function f2 (y), and that E(Y2 ) = μ2 and V ar(Y2 ) = σ22 .
Assume that Y is a random variable whose density is a mixture of the densities
corresponding to Y1 and Y2 .
(i) Show that E(Y ) = aμ1 + (1 − a)μ2 .
(ii) Show that Var(Y ) = aσ1² + (1 − a)σ2² + a(1 − a)[μ1 − μ2]².
[Hint: E(Yi²) = μi² + σi², i = 1, 2]
Chapter 6
Multivariate distributions
6.1 Bivariate Distributions
In most cases, the outcome of an experiment may be characterized by more than one
variable. For instance, X may be the income, Y the total expenditures of a household, and Z the family size. We observe (X, Y, Z).
Definition 12 Joint Distribution Function: Let X and Y be two random variables. Then
the function FXY (x, y) = P (X ≤ x and Y ≤ y) is called the joint distribution function.
1) FXY (x, ∞) = F (x) and FXY (∞, y) = F (y).
2) FXY (−∞, y) = FXY (x, −∞) = 0.
Definition 13 Joint Probability Density Function
Discrete probability function:
fXY (x, y) = P (X = x, Y = y)   (6.1)
Continuous probability function:
fXY (x, y) = ∂²F (x, y)/(∂x ∂y)   (6.2)
and hence
FXY (x, y) = ∫_{−∞}^{x} ∫_{−∞}^{y} fXY (u, v) du dv   (6.3)
In the univariate case, if ∆x is a small increment of x, then fX (x)∆x is the approximate
probability that x − (1/2)∆x < X ≤ x + (1/2)∆x. Similarly, in a bivariate distribution,
fXY (x, y)∆x∆y is the approximate probability that x − (1/2)∆x < X ≤ x + (1/2)∆x and
y − (1/2)∆y < Y ≤ y + (1/2)∆y. The bivariate density function satisfies the conditions fXY (x, y) ≥ 0 and ∫_{−∞}^{∞} ∫_{−∞}^{∞} dF (x, y) = 1, where dF (x, y) is the bivariate analog of dF (x).
Definition 14 Marginal Density Function: If X and Y are discrete random variables, then fX (x) = Σy fXY (x, y) is the marginal density of X, and fY (y) = Σx fXY (x, y) is the marginal density of Y. In the continuous case, fX (x) = ∫ fXY (x, y) dy is the marginal density of X and fY (y) = ∫ fXY (x, y) dx is the marginal density of Y.
Definition 15 Conditional Density Function: The conditional density of Y given X = x
is defined as f (y | x) = f (x, y)/f (x), provided f (x) ≠ 0. The conditional density of X given Y = y is defined as f (x | y) = f (x, y)/f (y), provided f (y) ≠ 0. This definition holds
for both discrete and continuous random variables.
Definition 16 Statistical Independence: The random variables X and Y are said to be
statistically independent if and only if f (y | x) = f (y) for all values of X and Y for which
f (x, y) is defined. Equivalently, f (x | y) = f (x) and f (x, y) = f (x)f (y).
Theorem 27 Random variables X and Y with joint density function f (x, y) will be statistically independent if and only if f (x, y) can be written as a product of two nonnegative
functions, one in X alone and another in Y alone.
Theorem 28 If X and Y are statistically independent and a, b, c, d are real constants with
a < b and c < d, then P (a < X < b, c < Y < d) = P (a < X < b)P (c < Y < d).
6.1.1 The Bivariate Normal Distribution
Let (X, Y ) have the joint density
f (x, y) = [1/(2πσx σy √(1 − ρ²))] exp{ −[1/(2(1 − ρ²))] [((x − μx)/σx)² − 2ρ(x − μx)(y − μy)/(σx σy) + ((y − μy)/σy)²] }   (6.4)
for −∞ < x < ∞, −∞ < y < ∞, −∞ < μX < ∞, −∞ < μY < ∞, σX, σY > 0 and −1 < ρ < 1. Then (X, Y ) is said to have the bivariate normal distribution.
Theorem 29 If (X, Y) is bivariate normal, then the marginal distribution of X is N(μX, σX²) and that of Y is N(μY, σY²). The converse of this theorem need not be true: even if the marginal distributions of X and Y are univariate normal, the joint density of X and Y need not be bivariate normal.
Theorem 30 For a bivariate normal, the conditional density of Y given X = x is univariate normal with mean μY + (ρσY/σX)(x − μX) and variance σY²(1 − ρ²). The conditional density of X given Y = y is also normal with mean μX + (ρσX/σY)(y − μY) and variance σX²(1 − ρ²).
In the case of the bivariate normal density, the conditional expectation E(Y | X) is of the form α + βX, where α and β depend on the respective means, standard deviations, and the correlation coefficient. This is a simple linear regression in which the conditional expectation is a linear function of X.
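The linearity of E(Y | X) for the bivariate normal can be checked by simulation. The sketch below uses arbitrary illustrative parameter values (not taken from the text) and compares the average of Y over draws with X near a chosen point x0 against μY + (ρσY/σX)(x0 − μX).

import numpy as np

rng = np.random.default_rng(0)
mu_x, mu_y, sig_x, sig_y, rho = 1.0, 2.0, 1.5, 0.8, 0.6   # assumed parameter values

cov = [[sig_x**2, rho*sig_x*sig_y],
       [rho*sig_x*sig_y, sig_y**2]]
x, y = rng.multivariate_normal([mu_x, mu_y], cov, size=200_000).T

x0 = 2.0
near = np.abs(x - x0) < 0.05                        # draws whose X is close to x0
print(y[near].mean())                               # simulated conditional mean of Y
print(mu_y + rho * sig_y / sig_x * (x0 - mu_x))     # theoretical conditional mean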
6.1.2 Mixture Distributions
If the distribution of a random variable depends on parameters or variables which themselves depend on other random variables, then we say that we have a mixture distribution. This might take the form f(x; θ), where θ depends on a random variable, or the form f(x | y), where Y is another random variable. The unobserved heterogeneity in hazard models is an example of the latter: the density function for the duration t, f(t | v), is conditional on an unobserved heterogeneity term v which in turn is itself a random variable.
6.2 Multivariate Density Functions
The joint density function of X1, X2, ..., Xn has the form f(x1, x2, ..., xn). If the Xs are continuous random variables,

    fX(x1, x2, ···, xn) = ∂ⁿFX(x1, x2, ···, xn) / (∂x1 ∂x2 ··· ∂xn)    (6.5)

6.2.1 The Multivariate Normal Distribution
Definition 17 Mean Vector: Let X′ = (X1, X2, ..., Xn) be an n-dimensional vector random variable defined in Rⁿ with density function f(x), E(Xi) = μi, and μ′ = (μ1, μ2, ..., μn). Then the mean of the distribution is μ = E(X), where μ and E(X) are nx1 vectors, and hence E(X − μ) = 0.
Definition 18 Covariance (Variance) Matrix: The covariance between Xi and Xj is defined as σij = E[(Xi − μi)(Xj − μj)], where μi = E(Xi). The matrix

    Σ = [ σ11  σ12  ...  σ1n
          σ21  σ22  ...  σ2n
           .    .   ...   .
          σn1  σn2  ...  σnn ]    (6.6)

also denoted as Var(X), is called the covariance matrix of X. In matrix notation, this can be expressed as Σ = E[(X − μ)(X − μ)′], where (X − μ) is nx1.
Note that the diagonal elements are variances.
Properties:
1) If Y(mx1) = A(mxn) X(nx1) + b(mx1), then E(Y) = Aμ + b.
2) Σ is a symmetric positive semi-definite matrix.
3) Σ is positive definite if and only if it is nonsingular.
4) Σ = E[XX′] − μμ′.
5) If Y = AX + b, then the covariance matrix of Y is AΣA′.
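Property 5 can be illustrated numerically. In the hedged sketch below, μ, Σ, A, and b are arbitrary illustrative choices; the simulated covariance of Y = AX + b is compared with AΣA′.

import numpy as np

rng = np.random.default_rng(1)
mu = np.array([1.0, 0.0, -1.0])                 # assumed mean vector
sigma = np.array([[2.0, 0.5, 0.3],
                  [0.5, 1.0, 0.2],
                  [0.3, 0.2, 1.5]])             # assumed covariance matrix
A = np.array([[1.0, 2.0, 0.0],
              [0.0, 1.0, -1.0]])
b = np.array([0.5, 0.5])

x = rng.multivariate_normal(mu, sigma, size=500_000)
y = x @ A.T + b                                 # Y = AX + b for each draw

print(np.cov(y, rowvar=False))                  # simulated Var(Y)
print(A @ sigma @ A.T)                          # theoretical A Sigma A'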
6.2.2 Standard multivariate normal density
Let X1 , X2 , ...Xn be n independent random variables each of which is N (0, 1). Then their
joint density function is the product of individual density functions and is the standard
multivariate normal density.
    fX(x1, x2, ···, xn) = [1/√(2π)]ⁿ exp[ −(Σ_{i=1}^{n} xi²)/2 ] = [1/(2π)]^{n/2} exp[ −x′x/2 ]    (6.7)
We have the density function of the general multivariate normal distribution N(μ, Σ) as

    fY(y) = [1 / ((2π)^{n/2} |Σ|^{1/2})] exp[ −(1/2)(y − μ)′Σ⁻¹(y − μ) ]    (6.8)
Properties:
1) If Y is multivariate normal, then Y1 , Y2 , ...., Yn will be independent if and only if Σ is
diagonal.
2) A linear combination of multivariate normal random variables is also multivariate normal. More specifically, let Y ∼ N(μ, Σ). Then Z = AY ∼ N(Aμ, AΣA′), where A is an nxn matrix.
3) If Y ∼ N(μ, Σ) and Σ has rank k < n, then there exists a nonsingular kxk matrix A such that the kx1 vector X = [A⁻¹ O](Y − μ) is k-variate normal with zero mean and covariance matrix Ik, where O is a kx(n − k) matrix of zeros.
6.2.3 Marginal and Conditional Distributions of N(μ, Σ)
Let Y ∼ N (μ, Σ), and consider the following partition
    Y = [ Y1 ],   μ = [ μ1 ],   Σ = [ Σ11  Σ12 ]    (6.9)
        [ Y2 ]        [ μ2 ]        [ Σ21  Σ22 ]
where the n random variables are partitioned into n1 and n2 variates (n1 + n2 = n).
Theorem 31 Given the above partition, the marginal distribution of Y1 is N(μ1, Σ11) and the conditional density of Y2 given Y1 is multivariate normal with mean μ2 + Σ21Σ11⁻¹(Y1 − μ1) and covariance matrix Σ22 − Σ21Σ11⁻¹Σ12.
6.2.4 The Chi-Square Distribution
Theorem: If Z1, Z2, ···, Zn are independent N(0, 1) variables, and X = Σ_{i=1}^{n} Zi², then the probability density function of the chi-square distributed variable X is

    f(x) = (x/2)^{n/2 − 1} e^{−x/2} / [2Γ(n/2)],   x > 0,   n = 1, 2, ...    (6.10)

where Γ(·) is the Gamma function.
The chi-square distribution has the additive property: if X ∼ χ²m, Y ∼ χ²n, and X and Y are independent, then their sum (X + Y) ∼ χ²(m+n). Thus the sum of independent chi-squares is also chi-square, with degrees of freedom equal to the sum of the degrees of freedom.
Theorem 32 If Xi ∼ N(μi, σi²) for i = 1, 2, ..., n and X1, X2, ..., Xn are all independent, then Y = Σ_{i=1}^{n} [(Xi − μi)/σi]² has the chi-square distribution with n degrees of freedom.
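Theorem 32 is easy to verify by simulation; in the sketch below the means and standard deviations are arbitrary illustrative values, and a few simulated quantiles of Σ[(Xi − μi)/σi]² are compared with the χ²n quantiles from scipy.

import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
mu = np.array([1.0, -2.0, 0.5])                 # assumed means
sd = np.array([2.0, 1.0, 0.5])                  # assumed standard deviations
n = len(mu)

x = rng.normal(mu, sd, size=(100_000, n))       # independent normal draws
y = (((x - mu) / sd) ** 2).sum(axis=1)          # sum of squared standardized values

for p in (0.5, 0.9, 0.99):
    print(p, np.quantile(y, p), stats.chi2.ppf(p, df=n))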
Problems
1. Let Y1 and Y2 have the bivariate uniform distribution

    f(y1, y2) = 1,   0 ≤ y1 ≤ 1, 0 ≤ y2 ≤ 1
              = 0,   otherwise
(a) Sketch the probability density surface
(b) Find F (0.2, 0.4).
(c) Find P (0.1 ≤ Y1 ≤ 0.3, 0 ≤ Y2 ≤ 0.5).
2. Let

    f(y1, y2) = 2y1,   0 ≤ y1 ≤ 1, 0 ≤ y2 ≤ 1
              = 0,     otherwise
(a) Sketch f (y1 , y2 ).
(b) Find the marginal density functions for Y1 , and Y2 .
Chapter 7
Sampling, sample moments, sampling distributions, and simulation

7.1 Independent, Dependent, and Random Samples
The totality of elements about which some information is desired is called a population. We often use a small proportion of a population (known as a sample), measure its attributes, and draw conclusions or make policy decisions based on the data obtained. By statistical inference, we estimate the unknown parameters underlying statistical distributions, measure their precision, test hypotheses on them, and use them to generate forecasts of random variables.
Definition 19 Independent Sample: The observations x1, x2, ···, xn are said to form an independent sample if the joint density function of the xi's has the form

    fX(x1, x2, ···, xn) = Π_{i=1}^{n} fXi(xi; θi)    (7.1)

The fXi might differ across i; here we are not assuming that the x's have the same distribution.
Definition 20 Random Sample: A random sample from a population is a set of independent, identically distributed (abbreviated as iid) random variables x1, x2, ···, xn, each of which has the same distribution as X.

    fX(x1, x2, ···, xn) = f(x1)f(x2)···f(xn) = Π_{i=1}^{n} f(xi)    (7.2)
Definition 21 Dependent Sample: If the observations are obtained over time, or if there is dependency between observations in cross-section data, then we have a dependent sample. The joint density fX(x1, x2, ···, xn; θ) can be factored as f(xn | x1, x2, ..., xn−1; θ) f(x1, x2, ..., xn−1; θ).
7.2 Sample Statistics
Definition 22 Statistics: A statistic is a function of the observable random variable(s)
that does not contain any unknown parameters. Examples are: Sample mean, sample variance, sample moments, sample covariance, sample correlation coefficient.
Theorem 33 If x1, x2, ···, xn is a random sample from a population with mean μ and variance σ², and the ci's are constants, then Y = c1x1 + c2x2 + ··· + cnxn = c′x has the following expectation and variance:

    E(Y) = (Σ_{i=1}^{n} ci) μ = (c1 + c2 + ··· + cn)μ    (7.3)
    Var(Y) = (c1² + c2² + ··· + cn²)σ² = σ²c′c    (7.4)
Corollary 34

    E(x̄) = μ    (7.5)
    Var(x̄) = σ²/n    (7.6)

7.3 Sampling Distributions
Because a sample statistic is a function of random variables, it has a statistical distribution.
This is called sampling distribution of the statistic. If we obtain a sample of n observations and compute the statistic, we obtain a numerical value. By repeating this process we
get a sequence of values of the statistic. This can be tabulated in the form of a frequency
distribution.
As an example, take the normal case: linear combinations of normal variates are also normally distributed, so Z = √n(x̄ − μ)/σ ∼ N(0, 1) exactly. Even when the parent distribution is not normal, Z converges to the N(0, 1) distribution as n → ∞.
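The idea of a sampling distribution can be made concrete with a short simulation. In the sketch below the parent distribution is deliberately non-normal (an exponential with mean 1, an arbitrary choice), yet the standardized sample mean behaves approximately like N(0, 1) for large n.

import numpy as np

rng = np.random.default_rng(3)
n, reps = 100, 50_000
mu, sigma = 1.0, 1.0                      # mean and sd of the Exponential(1) parent

samples = rng.exponential(scale=1.0, size=(reps, n))
z = np.sqrt(n) * (samples.mean(axis=1) - mu) / sigma

print(z.mean(), z.std())                  # should be close to 0 and 1
print(np.mean(z < 1.96))                  # should be close to 0.975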
Problems
1. Suppose that X1 , X2 , ..., Xm and Y1 , Y2 , ..., Yn are independent random samples, with
the variables Xi normally distributed with mean μ1 and variance σ1², and the variables Yi normally distributed with mean μ2 and variance σ2². The difference between the
sample means, X̄ − Ȳ , is then a linear combination of m + n normal random variables,
and is itself normally distributed.
(a) Find E(X̄ − Ȳ ).
(b) Find V ar(X̄ − Ȳ ).
(c) Suppose that σ1² = 2, σ2² = 2.5, and m = n. Find the sample size so that (X̄ − Ȳ )
will be within one unit of (μ1 − μ2 ) with probability 0.95.
2. If Y is a random variable that has an F distribution with ν1 numerator and ν2 denominator degrees of freedom, show that U = 1/Y has an F distribution with ν2
numerator and ν1 denominator degrees of freedom.
3. If T has a t distribution with ν degrees of freedom, then show that U = T 2 has an
F distribution with 1 numerator degree of freedom and ν denominator degrees of
freedom.
Chapter 8
Large sample theory
8.1 Different Types of Convergence
In many situations it is not possible to derive exact distributions of several statistics based
on a random sample of observations. The problem disappears in most cases, however, if the
sample size is large, because we can then derive approximate distributions. Hence the need for large sample, or asymptotic, distribution theory.
Definition 23 Limit of a sequence: Suppose a1, a2, ..., an constitute a sequence of real numbers. If there exists a real number a such that for every real ε > 0 there exists an integer N(ε) with the property that for all n > N(ε) we have |an − a| < ε, then we say that a is the limit of the sequence {an} and write lim_{n→∞} an = a.
Intuitively, if an lies in an ε-neighborhood of a, (a − ε, a + ε), for all n > N(ε), then a is said to be the limit of the sequence {an}. Examples where limits exist are
    lim_{n→∞} [1 + (1/n)] = 1    (8.1)

    lim_{n→∞} [1 + (a/n)]ⁿ = eᵃ    (8.2)
The notion of convergence is easily extended to that of a function f (x).
Definition 24 Limit of a function: The function f(x) has the limit A at the point x0 if for every ε > 0 there exists a δ(ε) > 0 such that |f(x) − A| < ε whenever 0 < |x − x0| < δ(ε).
Definition 25 Convergence in Distribution: Given a sequence of random variables Xn whose CDF is Fn(x), and a CDF FX(x) corresponding to the random variable X, we say that Xn converges in distribution to X, and write Xn →d X, if lim_{n→∞} Fn(x) = FX(x) at all points x at which FX(x) is continuous.
Intuitively, convergence in distribution occurs when the distribution of Xn comes closer and closer to that of X as n increases indefinitely. Thus, FX(x) can be taken to be an approximation to the distribution of Xn when n is large. Recall the Poisson approximation to the binomial distribution.
Definition 26 Convergence in Probability: The sequence of random variables Xn is said to converge in probability to the real number x if lim_{n→∞} P[|Xn − x| > ε] = 0 for each ε > 0. Thus it becomes less and less likely that the difference Xn − x lies outside the interval (−ε, +ε). Equivalent definitions are given below.
1. lim_{n→∞} P[|Xn − x| < ε] = 1, ε > 0.
2. Given ε > 0 and δ > 0, there exists N(ε, δ) such that P[|Xn − x| > ε] < δ for all n > N.
3. P[|Xn − x| < ε] > 1 − δ for all n > N, that is, P[|X_{N+1} − x| < ε] > 1 − δ, P[|X_{N+2} − x| < ε] > 1 − δ, and so on.
We write Xn →p x or plim Xn = x.
The sequence of random variables Xn is said to converge in probability to the random variable X if the sequence of differences (Xn − X) converges in probability to 0. Applied to the sample mean and the population mean, this type of result is known as the weak law of large numbers.
Definition 27 Convergence in Mean of Order r: The sequence of random variables Xn is said to converge in mean of order r to X (r ≥ 1), designated Xn →(r) X, if E[|Xn − X|^r] exists and lim_{n→∞} E[|Xn − X|^r] = 0, that is, if the r-th moment of the difference tends to zero. The most commonly used version is mean square convergence, the case r = 2. For example, the sample mean x̄n converges in mean square to μ, because Var(x̄n) = E[(x̄n − μ)²] = σ²/n tends to zero as n goes to infinity.
Definition 28 Almost Sure Convergence: The sequence of random variables Xn is said to converge almost surely to the real number x, written Xn →a.s. x, if P[lim Xn = x] = 1. In other words, the sequence Xn may not converge everywhere to x, but the points where it does not converge form a set of measure zero in the probability sense. More formally, given ε, δ > 0, there exists N such that P[|X_{N+1} − x| < ε, |X_{N+2} − x| < ε, ...] > 1 − δ, that is, the probability of these events jointly occurring can be made arbitrarily close to 1. Xn is said to converge almost surely to the random variable X if (Xn − X) →a.s. 0.
Theorem 35 If Xn →p X and Yn →d c (≠ 0), where c is a constant, then (a) (Xn + Yn) →d (X + c), and (b) (Xn/Yn) →d (X/c).

Theorem 36 If Xn →p X and Yn →p Y, then (a) (Xn + Yn) →p (X + Y), (b) (Xn Yn) →p XY, and (c) if Yn ≠ 0 and Y ≠ 0, then (Xn/Yn) →p X/Y.
Theorem 37 If g(·) is a continuous function, then Xn →p X implies that g(Xn) →p g(X). In other words, convergence in probability is preserved under continuous transformations.
Theorem 38 Convergence in probability implies convergence in distribution, that is, Xn →p X =⇒ Xn →d X, but the converse need not be true.
Theorem 39 Convergence in mean of order r implies convergence in mean of any order less than r, that is, Xn →(r) X =⇒ Xn →(s) X (r > s), but the converse need not be true.
Theorem 40 Convergence in mean of order r ≥ 1 implies convergence in probability, but the converse need not be true.
Theorem 41 Almost sure convergence implies convergence in probability, but the converse
need not be true.
Relationships Among Modes of Convergence:

    Xn →a.s. X  =⇒  Xn →p X  =⇒  Xn →d X
    Xn →(r) X  =⇒  Xn →(s) X (r > s)  =⇒  Xn →p X
Theorem 42 Xn →a.s. X if and only if P[sup_{j≥n} |Xj − X| > ε] → 0 as n → ∞ for any ε > 0.

Theorem 43 If Σ_{n=1}^{∞} P(|Xn − X| > ε) is finite for each ε > 0, then Xn →a.s. X.

Theorem 44 If Σ_{n=1}^{∞} E(|Xn − X|^r) is finite for some r > 0, then Xn →a.s. X.

Theorem 45 If the function g(·) is continuous at X and Xn →a.s. X, then g(Xn) →a.s. g(X).
8.2 The Weak Law of Large Numbers
We know that the sample mean approaches the population mean when the sample size
becomes large. This is called the weak law of large numbers (WLLN), which holds
under a variety of different assumptions.
Theorem 46 Khinchin's theorem: Let {Xn, n ≥ 1} be a sequence of iid random variables with finite mean μ, and let X̄n = (Σ_{i=1}^{n} Xi)/n. Then lim_{n→∞} P[|X̄n − μ| > ε] = 0 or, equivalently, lim_{n→∞} P[|X̄n − μ| ≤ ε] = 1. In other words, plim(X̄n) = μ.
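Khinchin's theorem can be illustrated by estimating P[|X̄n − μ| > ε] for increasing n; the uniform parent on (0, 2) with μ = 1 and ε = 0.05 below are arbitrary illustrative choices.

import numpy as np

rng = np.random.default_rng(4)
mu, eps, reps = 1.0, 0.05, 2_000

for n in (10, 100, 1_000, 5_000):
    xbar = rng.uniform(0.0, 2.0, size=(reps, n)).mean(axis=1)
    print(n, np.mean(np.abs(xbar - mu) > eps))   # estimate of P(|Xbar_n - mu| > eps)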
8.3 The Strong Law of Large Numbers
The WLLN stated that under certain conditions the sample mean converges in probability
to the population mean. We can in fact derive a stronger result, namely, that the sample
mean converges almost surely to the population mean. This is the strong law of large numbers (SLLN).
As before, let X1, X2, ..., Xn be a sequence of random variables with E(Xi) = μi < ∞, and let X̄n = (Σ_{i=1}^{n} Xi)/n and μ̄n = (Σ_{i=1}^{n} μi)/n. Then under certain conditions we can show that (X̄n − μ̄n) →a.s. 0.

Theorem 47 If the Xi's are iid, then (X̄n − μ̄n) →a.s. 0.
Theorem 48 Kolmogorov's Theorem on the SLLN: If the Xi's are independent with finite variances, and if Σ_{n=1}^{∞} Var(Xn)/n² < ∞, then (X̄n − μ̄n) →a.s. 0.
Theorem 49 If the X's are iid, then a necessary and sufficient condition for (X̄n − μ̄n) →a.s. 0 is that E|Xi − μi| < ∞ for all i.
8.4 The Central Limit Theorem
Perhaps the most important theorem in large sample theory is the central limit theorem, which states that, under quite general conditions, the standardized mean of a sequence of random variables (such as the sample mean) converges in distribution to a normal even though the population is not normal. Thus, even if we do not know the statistical distribution of the population from which a sample is drawn, with a large sample we can approximate the distribution of the sample mean quite well by the normal distribution.
Theorem 50 Central Limit Theorem: Let X1, X2, ..., Xn be a sequence of random variables, let Sn = Σ_{i=1}^{n} Xi be their sum, and let X̄n = Sn/n be their mean. Define the standardized mean

    Zn = [X̄n − E(X̄n)] / √Var(X̄n) = [Sn − E(Sn)] / √Var(Sn)    (8.3)

Then, under a variety of alternative assumptions (stated below), Zn →d N(0, 1).
Problems
1. Explain the relationship among modes of convergence.
2. The service times for customers coming through a checkout counter in a retail store are
independent random variables with mean 1.5 minutes and variance 1.0. Approximate
the probability that 100 customers can be serviced in less than 2 hours of total service
time.
Chapter 9
Estimation and properties of estimators
The formula for obtaining the estimate of a parameter is referred to as an estimator; it is a function of the observations x1, x2, ···, xn, and the numerical value associated with it is called an estimate. There are two types of parametric estimation: point and interval estimation.
9.1 Point Estimation
A point estimation procedure uses the information in the sample to arrive at a single number
that is intended to be close to the true value of the target parameter in the population. For
example, the sample mean
    Ȳ = (Σ_{i=1}^{n} Yi)/n    (9.1)
is one possible point estimator of the population mean μ.
9.1.1 Small Sample Criteria for Estimators
The standard notation for an unknown parameter is θ and an estimator of θ is denoted by
θ̂. The parameter space is denoted by Θ. A function g(θ) is called estimable if there exists
a statistic u(x) such that E[u(x)] = g(θ).
Unbiasedness
An estimator θ̂ is called an unbiased estimator of θ if E(θ̂) = θ. If E(θ̂) − θ = b(θ) and b(θ) is nonzero, b(θ) is called the bias.
Mean Square Error
A commonly used measure of the adequacy of an estimator is E[(θ̂ − θ)2 ], which is called
the mean square error (MSE). It is a measure of how close θ̂ is, on average, to the true
θ. It can also be written as follows:
    MSE = E[(θ̂ − θ)²] = E[(θ̂ − E(θ̂) + E(θ̂) − θ)²] = Var(θ̂) + bias²(θ)    (9.2)
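The decomposition (9.2) can be checked numerically. The sketch below compares two estimators of a normal mean — the sample mean and a deliberately shrunk, biased version — with illustrative values of θ, n, and the shrinkage factor.

import numpy as np

rng = np.random.default_rng(6)
theta, n, reps = 5.0, 20, 100_000

x = rng.normal(theta, 2.0, size=(reps, n))
est_unbiased = x.mean(axis=1)            # the sample mean
est_shrunk = 0.9 * est_unbiased          # a biased alternative (assumed shrinkage)

for est in (est_unbiased, est_shrunk):
    mse = np.mean((est - theta) ** 2)
    var = est.var()
    bias = est.mean() - theta
    print(mse, var + bias ** 2)          # the two numbers should agree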
Relative Efficiency
Let θ̂1 and θ̂2 be two alternative estimators of θ. Then the ratio of the respective M SEs,
E[(θ̂1 − θ)2 ]/E[(θ̂2 − θ)2 ], is called the relative efficiency of θ̂1 with respect to θ̂2 .
Uniformly Minimum Variance Estimators
An estimator θ̂ of θ is called a uniformly minimum variance unbiased (UMVU) estimator if E(θ̂) = θ and, for any other unbiased estimator θ∗, Var(θ̂) ≤ Var(θ∗) for every θ. Thus, among the class of unbiased estimators, a UMVU estimator has the smallest variance.
Sufficiency
Definition: Let θ̂ be a sample statistic and θ∗ any other statistic that is not a function of θ̂. Also, let f(x; θ) be the density function. θ̂ is said to be a sufficient statistic for θ if and only if the conditional density of θ∗ given θ̂ is independent of θ, for every choice of θ∗. Equivalently, the conditional density of the sample given θ̂, that is f(x1, x2, ···, xn | θ̂), is independent of θ.
Minimal Sufficiency
θ̂ is minimal sufficient if, for any other sufficient statistic θ∗, we can find a function h(·) such that θ̂ = h(θ∗).
9.1.2 Large Sample Properties of Estimators
Asymptotic Unbiasedness
If an estimator θ̂n has the property that its bias, E(θ̂n) − θ, tends to zero as the sample size increases, then it is said to be asymptotically unbiased.
Consistency
Another desirable property of θ̂ is that as the sample size n increases, θ̂ must approach the
true θ. This property is called consistency. We have three types of consistency measures.
Simple Consistency
Let θ̂1, θ̂2, ···, θ̂n be a sequence of estimators of θ. This sequence is a simple consistent estimator of θ if, for every ε > 0,

    lim_{n→∞} P(|θ̂n − θ| < ε) = 1,   θ ∈ Θ    (9.3)

Thus θ̂n is a simple consistent estimator if plim θ̂n = θ. This is convergence in probability, (θ̂n →p θ).
Squared-error Consistency
The sequence (θ̂n) is a squared-error consistent estimator if

    lim_{n→∞} E[(θ̂n − θ)²] = 0    (9.4)

This is convergence in mean square, (θ̂n →m.s. θ).
Strong Consistency
θ̂n is said to be strongly consistent if

    P[lim_{n→∞} θ̂n = θ] = 1    (9.5)

This is almost sure convergence, (θ̂n →a.s. θ).
Asymptotic Efficiency
Definition: Let θ̂n be a consistent estimator of θ. θ̂n is said to be asymptotically efficient
if there is no other consistent estimator θn∗ for which
    lim sup_{n→∞} { E[(θ̂n − θ)²] / E[(θn∗ − θ)²] } > 1    (9.6)
for all θ in some open interval.
Best Asymptotic Normality
The sequence of estimators (θ̂n ) is a best asymptotically normal (BAN) estimator if all
the following conditions are satisfied.
1. θ̂n →p θ for every θ ∈ Θ, that is, θ̂n is consistent.
2. The distribution of √n(θ̂n − θ) →d N[0, σ²(θ)], where σ²(θ) = lim Var[√n(θ̂n − θ)].
3. There is no other sequence (θn∗) that satisfies (1) and (2) and is such that σ²(θ) > σ∗²(θ) for every θ in some open interval, where σ∗²(θ) = lim Var[√n(θn∗ − θ)].
9.2 Interval Estimation
Instead of obtaining a point estimate of a parameter, we estimate an interval within which
the value of the parameter is contained with some probability.
Let X1 , X2 , ..., Xn be a random sample from f (x; θ). Let T1 = t1 (X1 , X2 , ..., Xn ) and
T2 = t2 (X1, X2, ..., Xn) be two statistics satisfying T1 ≤ T2 for which
Pr(T1 < τ (θ) < T2 ) = α.
(9.7)
(T1 , T2 ) is a 100α percent confidence interval for τ (θ), T1 , T2 are the lower and upper
confidence limits, and α is the confidence coefficient.
    Pr[T1 < τ(θ)] = α    (9.8)

gives a one-sided lower CI for τ(θ), and

    Pr[τ(θ) < T2] = α    (9.9)

gives a one-sided upper CI for τ(θ).
9.2.1 Pivotal-quantity method of finding CI
Let Q = q(X1 , X2 , ..., Xn ). If the distribution of Q does not depend on θ, Q is defined to be
a pivotal quantity. e.g., X1 , X2 , ..., Xn is a random sample from N (μ, 1).
x̄ − μ is a pivotal quantity since x̄ − μ ∼ N(0, 1/n); x̄/μ is not a pivotal quantity since x̄/μ ∼ N(1, 1/(μ²n)).
If Q = q(X1, X2, ..., Xn; θ) is a pivotal quantity with a pdf, then for any fixed α ∈ (0, 1) there will exist q1 and q2 such that P(q1 < Q < q2) = α.
But P (cq1 < cQ < cq2 ) = α = P (d + cq1 < d + cQ < d + cq2 ), that is the probability
of the event P (q1 < Q < q2 ) is unaffected by a change of scale or a translation of Q. Thus,
if we know the pdf of Q, it may be possible to use these operations to form the desired
confidence interval.
For example, assume that f(x; μ) ≡ N(μ, 1), so that x̄ ∼ N(μ, 1/n). Then

    Q = (x̄ − μ)/(1/√n) ∼ N(0, 1)    (9.10)

is a pivotal quantity, and

    α = P(q1 < (x̄ − μ)/(1/√n) < q2)
      = P((1/√n)q1 < x̄ − μ < (1/√n)q2)
      = P((1/√n)q1 − x̄ < −μ < (1/√n)q2 − x̄)
      = P(x̄ − (1/√n)q2 < μ < x̄ − (1/√n)q1)    (9.11), (9.12)

So (x̄ − (1/√n)q2, x̄ − (1/√n)q1) is a 100α percent confidence interval for μ.

9.2.2 CI for the mean of a normal population
Consider the case where both μ and σ² are unknown.

    (x̄ − μ)/(σ/√n) ∼ N(0, 1)    (9.13)

Although this is a pivotal quantity, the presence of σ means that we cannot compute a CI. We know, however, that

    (x̄ − μ)/(s/√n) ∼ t(n−1)    (9.14)

for all σ² > 0.
Using the table of the t-distribution, we can always find a number b such that

    P(−b < (x̄ − μ)/(s/√n) < b) = α    (9.15)

which can be written as

    P(x̄ − b s/√n < μ < x̄ + b s/√n) = α    (9.16)

where

    b = t((1−α)/2, (n−1))    (9.17)
Example
Ex1: n = 10, x̄ = 3.22, s = 1.17, α = 0.95. The 95 percent CI for μ is
    (3.22 − (2.262)(1.17)/√10,  3.22 + (2.262)(1.17)/√10)   or   (2.38, 4.06)
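The same interval can be reproduced in a few lines; note that the text's critical value t((1−α)/2, n−1) = 2.262 corresponds to scipy's lower-tail quantile t.ppf((1 + α)/2, n − 1).

import numpy as np
from scipy import stats

n, xbar, s, conf = 10, 3.22, 1.17, 0.95
b = stats.t.ppf((1 + conf) / 2, df=n - 1)          # about 2.262 for n = 10
half_width = b * s / np.sqrt(n)
print(xbar - half_width, xbar + half_width)        # roughly (2.38, 4.06)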
9.2.3 CI for the variance of a normal population
We know that

    Q = (n − 1)s²/σ² ∼ χ²(n−1)    (9.18)

hence Q is a pivotal quantity. So

    α = P(q1 < Q < q2)
      = P(q1 < (n − 1)s²/σ² < q2)
      = P((n − 1)s²/q2 < σ² < (n − 1)s²/q1)    (9.19)

and ((n − 1)s²/q2, (n − 1)s²/q1) is a 100α percent CI for σ².
Example
Ex2: n = 10, x̄ = 3.22, s = 1.17, α = 0.95. The 95 percent CI for σ 2 is
    ((n − 1)s²/χ²((1−α)/2, (n−1)),  (n − 1)s²/χ²((1+α)/2, (n−1)))

With χ²(0.025, (9)) = 19.02 and χ²(0.975, (9)) = 2.70, the interval is (0.65, 4.56).
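The corresponding computation for Ex2 in code; the text's χ²(0.025, 9) = 19.02 and χ²(0.975, 9) = 2.70 are upper-tail critical values, i.e. scipy's chi2.ppf(0.975, 9) and chi2.ppf(0.025, 9) respectively.

import numpy as np
from scipy import stats

n, s, conf = 10, 1.17, 0.95
q_low = stats.chi2.ppf((1 - conf) / 2, df=n - 1)        # about 2.70
q_high = stats.chi2.ppf((1 + conf) / 2, df=n - 1)       # about 19.02
print((n - 1) * s**2 / q_high, (n - 1) * s**2 / q_low)  # roughly (0.65, 4.56)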
9.3 Problems
1. Suppose that Y1, Y2, ..., Yn denote a random sample with density function

    f(yi) = (1/θ) e^(−yi/θ),   yi > 0
          = 0,                 elsewhere

with mean θ. Find the MLE of the population variance θ².
2. Suppose that Y is normally distributed with mean 0 and unknown variance σ 2 . Then
Y 2 /σ 2 has a χ2 distribution with 1 degree of freedom. Use this distribution to find:
(a) a 95 percent confidence interval for σ 2 .
(b) a 95 percent upper confidence limit for σ 2
(c) a 95 percent lower confidence limit for σ 2 .
3. Suppose that E(θ̂1) = E(θ̂2) = θ, Var(θ̂1) = σ1², and Var(θ̂2) = σ2². A new unbiased
estimator θ̂3 is to be formed by
θ̂3 = aθ̂1 + (1 − a)θ̂2 .
(a) How should a constant a be chosen in order to minimise the variance of θ̂3 ?
Assume that θ̂1 and θ̂2 are independent.
(b) How should a constant a be chosen in order to minimise the variance of θ̂3 ?
Assume that θ̂1 and θ̂2 are not independent but are such that Cov(θ̂1, θ̂2) = c ≠ 0.
Chapter 10
Tests of statistical hypotheses
The testing of statistical hypotheses on the unknown parameters of a probability model
is one of the most important steps of any empirical study. Three test situations can be
mentioned:
- test of alternative models, for drawing conclusions that are not model sensitive
- test of policy change effects, and
- test of the validity of an economic theory.
10.1 Basic Concepts in Hypothesis Testing
Consider a family of distributions represented by the density function f (x; θ), θ ∈ Θ. The
term hypothesis stands for a statement or conjecture regarding the values that θ might take.
The testing of a hypothesis consists of three basic steps:
1) formulate two opposing hypotheses,
2) derive a test statistic and identify its sampling distribution, and
3) derive a decision rule and choose one of the opposing hypotheses.
10.1.1 Null and Alternative Hypotheses
A hypothesis can be thought of as a binary partition of the parameter space Θ into two sets, Θ0 and Θ1, such that Θ0 ∩ Θ1 = ∅ and Θ0 ∪ Θ1 = Θ. The set Θ0, which corresponds
to the statement of the hypothesis, is called the null hypothesis and denoted by H0 ,
and Θ1 , which is the class of alternatives to the null hypothesis, is called the alternative
hypothesis and denoted by H1 .
10.1.2 Simple and Composite Hypotheses
If the null hypothesis is of the form H0: θ = θ0 and the alternative is H1: θ = θ1, then we have a simple hypothesis and a simple alternative. If either H0 or H1 specifies a range of values for θ (for example, H1: θ ≠ θ0), then we have a composite hypothesis. If we have a simple hypothesis and a simple alternative, the problem reduces to one of choosing between the two density functions f(x; θ0) and f(x; θ1).
10.1.3 Statistical Test
A decision rule that selects one of the inferences ”accept the null hypothesis” or ”reject the
null hypothesis” is called a statistical test or simply test. A test procedure is usually
described by a sample statistic T (x) = T (x1 , x2 , · · · , xn ), which is called the test statistic. The range of values of T for which the test procedure recommends the rejection of
a hypothesis is called the critical region, and the range for accepting the hypothesis is
called the acceptance region.
10.1.4 Type I and Type II Errors
In performing a test one may arrive at the correct decision or commit one of two types of errors. The errors can be classified into two groups, labeled Type I and Type II errors.
Type I error : Rejecting H0 when it is true
Type II error: Accepting H0 when it is false
10.1.5 Power of a Test
The probability of rejecting the null hypothesis H0 based on a test procedure is called the
power of the test. This probability would obviously depend on the value of the parameter
θ about which the hypothesis is formulated. It is a function of θ. This power function is
denoted by π(θ).
10.1.6 Operating Characteristics
The probability of accepting the null hypothesis is known as the operating characteristic
and is represented by 1 − π(θ). This concept is widely used in statistical quality control
theory.
10.1.7 Level of Significance and the Size of a Test
When θ is in Θ0 , π(θ) gives the probability of Type I error. This probability, denoted by
P (I), will also depend on θ. The maximum value of P (I) when θ ∈ Θ0 is called the level
of significance of a test, denoted by α. It is also known as the size of a test. Thus,
    α = max_{θ∈Θ0} P(I) = max_{θ∈Θ0} π(θ)    (10.1)
The level of significance is hence the largest probability of a Type I error. The commonly used sizes are 0.01, 0.05, and 0.10. The probability of a Type II error is denoted by β(θ). It is readily seen to be 1 − π(θ) when θ ∈ Θ1. Thus,

    β(θ) = P(II) = 1 − π(θ),  θ ∈ Θ1    (10.2)
Ideally we would want to keep both P(I) and P(II) to a minimum no matter what the value of θ is. But this is impossible because an attempt to reduce P(I) generally increases
P (II). For instance, the decision rule ”reject H0 always” regardless of x, has P (II) = 0
but P (I) = 1 if θ ∈ Θ0 . Similarly, the rule ”always accept H0 ” implies that P (I) = 0
but P (II) = 1 when θ ∈ Θ1 . Thus for some values of θ, one decision rule will be better
than another. The classical decision procedure chooses an acceptable value for α and then
selects a decision rule (that is, a test procedure) that minimizes P (II). In other words,
given α, among the class of decision rules for which P (I) ≤ α, choose the one for which
P (II) is minimized or, equivalently, for which π(θ) is maximized. Thus the test procedure
selects the decision rule that maximizes π(θ) subject to P (I) ≤ α. Such a test is called a
most powerful (MP) test. If the critical region obtained this way is independent of the
alternative H1 , then we have a uniformly most powerful (UMP) test.
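To make the power function π(θ) concrete, the following hedged sketch computes the power of a one-sided z-test of H0: μ = μ0 against H1: μ > μ0 for a normal population with known σ; the numerical values (μ0 = 0, σ = 1, n = 25, α = 0.05) are illustrative assumptions, not taken from the text.

import numpy as np
from scipy import stats

mu0, sigma, n, alpha = 0.0, 1.0, 25, 0.05
crit = mu0 + stats.norm.ppf(1 - alpha) * sigma / np.sqrt(n)   # reject H0 if xbar > crit

def power(mu1):
    # pi(mu1) = P(reject H0) = P(xbar > crit) when the true mean is mu1
    return 1 - stats.norm.cdf((crit - mu1) / (sigma / np.sqrt(n)))

for mu1 in (0.0, 0.2, 0.4, 0.6):
    print(mu1, round(power(mu1), 4))   # power at mu0 equals the size alpha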
Problems
1. The output voltage for a certain electric circuit is specified to be 130. A sample of 40
independent readings on the voltage for this circuit gave a sample mean of 128.6 and
a standard deviation of 2.1. Test the hypothesis that the average output voltage is
130 against the alternative that it is less than 130. Use a test with level 0.05.
2. Let Y1, Y2 , ..., Yn be a random sample of size n = 20 from a normal distribution with
unknown mean μ and known variance σ 2 = 5. We wish to test H0 : μ = 7 versus H1 :
μ > 7.
(a) Find the uniformly most powerful test with significance level 0.05.
(b) For the test in (a), find the power at each of the following alternative values for
μ : μ1 = 7.5, μ1 = 8.0, μ1 = 8.5, and μ1 = 9.0.
Chapter 11
Examination 1
11.1 Definition Questions
Define the following statistical terms.
1. (5 marks)
(3) a. Statistical modelling.
(2) b. Variable and random variable.
2. (5 marks)
The Bayes Theorem.
3. (5 marks)
a. The Weak Law of Large Numbers.
b. The Strong Law of Large Numbers.
4. (5 marks)
The Central Limit Theorem.
5. (5 marks)
a. The property of unbiasedness of an estimator.
b. The property of sufficiency of an estimator.
11.2 Calculation questions
Compute the following values.
6. (10 marks)
Let X1 , X2 , . . . , Xn , be random sample with the following probability density function
f(x) = 2x⁻³ for x > 1
Compute E(X) and M edian(X).
7. (10 marks) Suppose a manufacturer of TV tubes draws a random sample of 10
tubes. The probability that a single tube is defective is 10 percent. Calculate
a. the probability of having exactly 3 defective tubes,
b. the probability of having no more than 2 defectives.
8. (10 marks) Let A and B be two events such that P (A∪B) = 0.9, P (A | B) = 0.625,
and P (A | B̄) = 0.5. Calculate P (A)
9. (10 marks) The ages of a group of executives attending a convention are uniformly distributed between 35 and 65 years. If X denotes age in years, the probability density function is
    f(x) = 1/30,   35 < X < 65
         = 0,      otherwise
(2) a. Draw the probability density function for this random variable.
(2) b. Find and draw the cumulative distribution function for this random variable.
(3) c. Find the probability that the age of a randomly chosen executive in this group
is between 40 and 50 years.
(4) d. Find the mean age of executives in the group.
11.3 Discussion questions
Determine whether the following statements are true, false or uncertain. For full marks an explanation is needed.
10. (5 marks) The probability of union of the events A and B can be written
P (A ∪ B) = P (A) + P (B)[1 − P (A | B)]
11. (5 marks) In a sample, the observations of a random variable can be seen as degenerate values of its marginal distribution (not all equal to each other, of course).
12. (5 marks) Consider a bivariate sample of size n drawn independently from a population. Independence in this process runs across the n observations, not within each observation.
13. (5 marks) Unbiasedness is related to the number of observations in each sample,
while consistency is related to the number of samples.
11.4 Multiple choice questions
Select the correct answer for each of the following questions and write it in your paper.
14. (5 marks) If P (A ∩ B) ≥ 0 then the following inequality can be written
a. P (A ∪ B) ≥ P (A) + P (B)
b. P (A) + P (B) ≥ P (A ∪ B)
c. P (A ∪ B) − P (A) ≥ P (B)
d. P (A) ≤ P (A ∪ B) − P (B).
15. (5 marks) For a negatively skewed distributed random variable, X, the following can be written
a. M ean(X) < M edian(X)
b. M ean(X) > M edian(X)
c. M ean(X) > M ode(X)
d. M edian(X) > M ode(X).
16. (5 marks) For a conditional probability, P (A|B), the following expressions can be
written
a. P (A|B) = P (A ∩ B|B)
b. P (A|B) = P (A ∩ B|B) + P (A ∩ B̄|B)
c. P (A|B) > P (A ∩ B̄|B)
d. All of them above.
Chapter 12
Examination 2
12.1 Definition Questions
Define the following statistical terms.
1. (5 marks)
a. Statistical model.
b. Probability space.
2. (5 marks)
Axiomatic definition of probability.
3. (5 marks)
a. Limiting distribution of the sample mean, X̄.
b. Approximative distribution of the sample mean, X̄.
4. (5 marks)
Central Limit Theorem.
5. (5 marks)
a. The property of sufficiency of an estimator.
b. Convergence in probability.
12.2 Calculation questions
Compute the following values.
6. (10 marks)
Let X1, X2, ..., Xn be a random sample from a Poisson distribution with probability function

    f(x) = e^(−λ) λ^x / x!   for λ > 0 and x = 0, 1, 2, ...

Determine λ by using the maximum likelihood estimation method and discuss its unbiasedness and sufficiency. Note that E(X) = Var(X) = λ for a Poisson distributed variable X.
7. (10 marks) The following conditional density function is given:

    f(y | x) = c1 y/x²   for 0 ≤ x ≤ 1 and 0 ≤ y ≤ x
Calculate
a. c1 and
b. P[(1/4) < Y < (1/2) | X = (5/8)].
8. (10 marks) Below is a table for random variables X and Y; calculate
a. E(X | Y = 2)
b. E(Y | X = 40).
Frequency

            Y = 1   Y = 2   Y = 3   Y = 4   Total
  X = 20      218     302     198     660    1378
  X = 30      125     411     305     310    1151
  X = 40      201     256     287     327    1071
  Total       544     969     790    1297    3600

12.3 Discussion questions
Determine whether the following statements are true, false or uncertain. For full marks an explanation is needed.
9. (5 marks) The following equality is correct:

    F(y) = P(Y ≤ y | X ≤ x)F(x) + P(Y ≤ y | X > x)(1 − F(x))
10. (5 marks) One can try to minimise the probability of committing a Type II error after choosing an acceptable value for the probability of a Type I error in the testing procedure.
11. (5 marks) The Cramer-Rao Inequality establishes a lower bound for the variance of an unbiased estimator of θ. However, it does not necessarily imply that the variance of the minimum variance unbiased estimator of θ has to be equal to the Cramer-Rao Lower Bound.
12. (5 marks) Non-random samples may not be used in scientific inference.
12.4 Multiple choice questions
Select the correct answer for each of the following questions and write it in your paper.
13. (5 marks) Choose the correct expression(s) among the following:
a. Use harmonic means for ratios
b. Use arithmetic means for values including extremely low or high values
c. Use geometric means for proportions
d. None of the above.
14. (5 marks) A ∩ (B ∪ C) can be expressed as
a. (A ∪ B) ∩ (A ∪ C)
b. (A ∪ B) ∪ (A ∪ C)
c. (A ∩ B) ∪ (A ∩ C)
d. (A ∩ B) ∩ (A ∩ C).
15. (5 marks) For a positively skewed distributed random variable, X, the following can be written
a. M ean(X) < M edian(X)
b. M ean(X) > M edian(X)
c. M ean(X) < M ode(X)
d. M edian(X) < M ode(X).
16. (5 marks) For a pair of correlated random variables, X and Y, the following
relationship is correct
a. V ar(Y ) < V ar(X)Cov(X, Y )
b. Cov 2 (X, Y ) < V ar(X)V ar(Y )
c. Cov(X, Y ) < V ar(X)V ar(Y )
d. Cov 2 (X, Y ) > V ar(X)V ar(Y ).
17. (5 marks) For jointly distributed random variables, X and Y, the following can be
written
a. F (y) = FXY (∞, y)
b. FXY (X, −∞) = 0
c. FXY (X, ∞) = F (x)
d. All of them above.