QT1 exam answers
Quantitative Techniques 1
Exam questions for January 2004
Time allowed: two hours
Attempt ALL questions
1. What are the main sources of data errors? What can be done to reduce errors in data you did not collect yourself?
Sources of data error
• At the collection source: clerical error, misunderstood question, conceptual error
• The incentive to make the figures look good or bad
• Wrong units ('000s, millions, etc.) or wrong currency ($, £)
• Sampling error
• Transcription error
• Calculation error
• Rounding
Lesson
• Assume data are error-ridden
• Use checking techniques:
– descriptive statistics, graphs
– eyeballing: do the data follow the expected pattern?
• Make corrections where feasible
• Omit data that are probably wrong
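As one illustration of these checking techniques, the sketch below flags values that lie outside Tukey's interquartile-range fences, a common screening rule that the answer itself does not name. The income figures and the function name `flag_suspect_values` are hypothetical; flagged values are candidates for checking, not for automatic deletion.

```python
import statistics


def flag_suspect_values(data, k=1.5):
    """Return (index, value) pairs lying outside Tukey's fences.

    Values below Q1 - k*IQR or above Q3 + k*IQR are flagged for checking.
    """
    q1, _, q3 = statistics.quantiles(data, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [(i, x) for i, x in enumerate(data) if x < low or x > high]


# Hypothetical incomes in £'000s; one entry appears to have been typed in £ instead
incomes = [21.5, 23.0, 19.8, 22.4, 20.9, 21800.0, 22.7, 20.3]
print("mean:", statistics.mean(incomes), "median:", statistics.median(incomes))
print("suspect:", flag_suspect_values(incomes))
```

The single wrong-units entry pulls the mean far above the median, which is the kind of pattern that descriptive statistics and eyeballing are meant to catch.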
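2. For each of the following statements about the probabilities of outcomes A and B:
(a) say whether they are true, false, or uncertain;
(b) if uncertain, spell out the conditions under which they are true.
(i) $P(A \cup B) = P(A) + P(B)$
(a) Uncertain; (b) true if A and B are mutually exclusive.
(ii) $P(A \cap B) = P(A) + P(B)$
Never true (except in the trivial case $P(A) = P(B) = 0$).
(iii) $P(A \cup B) = P(A) \cdot P(B)$
Never true (except in the trivial cases $P(A) = P(B) = 0$ or $P(A) = P(B) = 1$).
(iv) $P(A \cap B) = P(A) \cdot P(B)$
True if A and B are independent.
(v) $P(A \mid B) = P(A \cap B)/P(B)$
Always true (provided $P(B) > 0$).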
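The rules in (i), (iv) and (v) can be checked by counting outcomes on a small, equally likely sample space. This is a minimal sketch using two fair dice; the events A, B and C are hypothetical examples chosen so that A and B are independent and A and C are mutually exclusive.

```python
from fractions import Fraction
from itertools import product

# Sample space: the 36 equally likely outcomes of rolling two fair dice
OMEGA = list(product(range(1, 7), repeat=2))


def prob(event):
    """Exact probability of an event, given as a predicate on outcomes."""
    return Fraction(sum(1 for w in OMEGA if event(w)), len(OMEGA))


def union(e, f):
    return lambda w: e(w) or f(w)


def intersection(e, f):
    return lambda w: e(w) and f(w)


def conditional(e, f):
    """P(e | f), found by counting only the outcomes inside f."""
    inside_f = [w for w in OMEGA if f(w)]
    return Fraction(sum(1 for w in inside_f if e(w)), len(inside_f))


def A(w): return w[0] % 2 == 0   # first die even
def B(w): return w[1] >= 5       # second die shows 5 or 6: independent of A
def C(w): return w[0] == 1       # first die shows 1: mutually exclusive with A


print(prob(union(A, C)) == prob(A) + prob(C))                   # (i) exclusive: addition rule holds
print(prob(union(A, B)) == prob(A) + prob(B))                   # (i) overlapping: addition rule fails
print(prob(intersection(A, B)) == prob(A) * prob(B))            # (iv) independent: product rule holds
print(conditional(A, B) == prob(intersection(A, B)) / prob(B))  # (v) holds whenever P(B) > 0
```

Exact fractions are used so that the equalities are tested without floating-point rounding.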
3. The question was meant to be: Suppose x is a continuous random variable with the probability density function (pdf)
$$f(x) = \begin{cases} x & \text{for } 0 \le x \le 1 \\ 2 - x & \text{for } 1 \le x \le 2 \\ 0 & \text{elsewhere} \end{cases}$$
a) Draw a graph of this function
[Figure: Question 3, pdf(x) plotted against x. The density rises linearly from 0 at x = 0 to a peak of 1 at x = 1, then falls back to 0 at x = 2.]
b) Explain how you know this is a valid pdf.
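i) f(x) ≥ 0 for all x (not true for 2 − 2x, which is negative for 1 < x ≤ 2).
ii) The area under the curve is 1: the triangle has base 2 and height 1, so its area is ½ × 2 × 1 = 1 (not true for 2 − 2x either).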
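Both conditions can be checked numerically. This is a minimal sketch, assuming a midpoint-rule integral and a coarse grid of x values; it compares the intended pdf with the misprinted 2 − 2x version.

```python
def f(x):
    """Intended pdf: x on [0, 1], 2 - x on (1, 2], 0 elsewhere."""
    if 0 <= x <= 1:
        return x
    if 1 < x <= 2:
        return 2 - x
    return 0.0


def g(x):
    """Misprinted version with 2 - 2x on the second piece."""
    if 0 <= x <= 1:
        return x
    if 1 < x <= 2:
        return 2 - 2 * x
    return 0.0


def area(func, a=0.0, b=2.0, n=100_000):
    """Midpoint-rule approximation to the integral of func over [a, b]."""
    h = (b - a) / n
    return sum(func(a + (i + 0.5) * h) for i in range(n)) * h


grid = [i / 100 for i in range(-50, 251)]  # x from -0.5 to 2.5
for name, func in (("2 - x", f), ("2 - 2x", g)):
    print(name, "non-negative:", min(func(x) for x in grid) >= 0,
          "area:", round(area(func), 6))
```

The intended version passes both tests; the misprinted one goes negative and has total area −0.5.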
c) Comment on the relative position of the mean, median and mode.
The distribution is symmetric about x = 1 and has a single peak there, so the mean, median and mode all equal 1 (this does not apply to 2 − 2x).
d) Calculate the probability that 0.5 ≤ x ≤ 1.5.
Area to the left of 0.5 = ½ × base × height = 0.5 × 0.5 × 0.5 = 0.125
Area to the right of 1.5 = 0.125 (by symmetry)
Total area in the two tails = 0.250
Therefore P(0.5 ≤ x ≤ 1.5) = 1 − 0.25 = 0.75
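The answers to parts c) and d) can be cross-checked with the distribution function obtained by integrating each piece of f: $F(x) = x^2/2$ on [0, 1] and $F(x) = 1 - (2 - x)^2/2$ on [1, 2]. This is a minimal sketch; the mean uses a midpoint-rule integral and the mode a grid search, both assumptions of this illustration.

```python
def pdf(x):
    """Intended pdf: x on [0, 1], 2 - x on (1, 2], 0 elsewhere."""
    if 0 <= x <= 1:
        return x
    if 1 < x <= 2:
        return 2 - x
    return 0.0


def cdf(x):
    """F(x) from integrating x and 2 - x piece by piece."""
    if x <= 0:
        return 0.0
    if x <= 1:
        return x * x / 2
    if x <= 2:
        return 1 - (2 - x) ** 2 / 2
    return 1.0


n = 100_000
h = 2 / n
mean = sum((i + 0.5) * h * pdf((i + 0.5) * h) for i in range(n)) * h  # midpoint rule for E[x]
mode = max((i / 1000 for i in range(2001)), key=pdf)                   # highest density on a grid
p = cdf(1.5) - cdf(0.5)                                                # part d)

print(round(mean, 6), cdf(1.0), mode, p)
```

Because F(1) = 0.5, half the probability lies below x = 1, so the median is 1, agreeing with the mean and the mode.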
4. You are organising a concert and believe that attendance will depend on the weather. You believe the following possibilities are appropriate:

Weather            Probability   Attendance
Terrible weather   0.2           500
Mediocre weather   0.6           1200
Good weather       0.2           2000

a) What is the expected attendance?
b) Suppose each ticket costs £5 and the fixed costs are £2,000. What are the expected profits?
c) Graph the probability distribution for profits.
d) What is the most you could pay for the fixed costs and still have an 80% chance of making a profit on the event (to the nearest £)?
Costs: £2,000

Weather            Probability   Attendance   Probability × attendance   Revenue
Terrible weather   0.2           500          100                        £2,500.00
Mediocre weather   0.6           1200         720                        £6,000.00
Good weather       0.2           2000         400                        £10,000.00

a) Expected attendance: 1220
Ticket price: £5.00
b) Expected revenue: £6,100.00
less costs: £2,000.00
Expected profits: £4,100.00
d) Maximum fixed costs: £5,999.99
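The expected values in the table above can be reproduced in a few lines. This is a minimal sketch; exact fractions are an implementation choice so that the probabilities 0.2 and 0.6 carry no rounding error, and the names `SCENARIOS`, `TICKET_PRICE` and `FIXED_COSTS` are illustrative.

```python
from fractions import Fraction

# Question 4 scenarios: weather -> (probability, attendance)
SCENARIOS = {
    "terrible": (Fraction(2, 10), 500),
    "mediocre": (Fraction(6, 10), 1200),
    "good": (Fraction(2, 10), 2000),
}
TICKET_PRICE = 5    # £ per ticket
FIXED_COSTS = 2000  # £

expected_attendance = sum(p * a for p, a in SCENARIOS.values())
# Profit in each state of the world: the probability distribution asked for in part c)
profits = {w: a * TICKET_PRICE - FIXED_COSTS for w, (_, a) in SCENARIOS.items()}
expected_profit = sum(SCENARIOS[w][0] * profits[w] for w in SCENARIOS)

print(expected_attendance, expected_profit)
print(profits)
```

The `profits` dictionary holds the three outcomes (£500, £4,000, £8,000) that are plotted, with probabilities 0.2, 0.6 and 0.2, in part c).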
c) [Figure: probability distribution of profits, a bar chart with profit £500 at probability 0.2, £4,000 at probability 0.6 and £8,000 at probability 0.2.]
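d) £5,999. With mediocre weather, revenue is £6,000, so we make a profit of £1; with good weather we make £4,001. Only terrible weather (revenue £2,500) produces a loss, so the probability of a loss is 0.2 and the probability of a profit is 0.8.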
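The search for the largest fixed cost can be made explicit by trying each whole-pound value. This is a minimal sketch; the search range of £0 to £10,000 and the helper name `prob_profit` are assumptions of this illustration.

```python
from fractions import Fraction

# (probability, attendance) for terrible, mediocre and good weather
SCENARIOS = [(Fraction(2, 10), 500), (Fraction(6, 10), 1200), (Fraction(2, 10), 2000)]
TICKET_PRICE = 5  # £


def prob_profit(fixed_costs):
    """Probability that revenue strictly exceeds the fixed costs."""
    return sum(p for p, a in SCENARIOS if a * TICKET_PRICE > fixed_costs)


# Largest whole-pound fixed cost that keeps at least an 80% chance of a profit
max_costs = max(c for c in range(10_001) if prob_profit(c) >= Fraction(8, 10))
print(max_costs, prob_profit(max_costs), prob_profit(max_costs + 1))
```

At £6,000 the mediocre-weather revenue no longer covers the costs, and the chance of a profit drops from 0.8 to 0.2.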
5. Suppose that heights in a population are normally distributed with a mean of 78
inches and a standard deviation of 5 inches.
a) What is the probability that an individual selected at random will have a height
between 68.2 and 79.8 inches?
b) Construct a 95% confidence interval for the average height in a random sample
of four individuals.
In general, $z = (x - \mu)/\sigma$.
a)
Upper tail: $z_1 = (79.8 - 78)/5 = 1.8/5 = 0.36$, so the area in the upper tail = 1 − 0.6406 = 0.3594
Lower tail: $z_2 = (68.2 - 78)/5 = -9.8/5 = -1.96$, so the area in the lower tail = 0.025
Total area in the tails = 0.3594 + 0.025 = 0.3844
Total central area = 1 − 0.3844 = 0.6156
Answer: 0.6156, or 61.56%
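The table look-ups in part a) can be checked with `statistics.NormalDist` from the Python 3.8+ standard library. This is a minimal sketch; the small difference from the hand answer comes only from rounding the table values.

```python
from statistics import NormalDist

heights = NormalDist(mu=78, sigma=5)         # heights in inches
z1 = (79.8 - heights.mean) / heights.stdev   # upper limit as a z-score
z2 = (68.2 - heights.mean) / heights.stdev   # lower limit as a z-score
p = heights.cdf(79.8) - heights.cdf(68.2)    # central area between the two limits
print(f"z1 = {z1:.2f}, z2 = {z2:.2f}, P = {p:.4f}")
```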
b) The interval extends $z\sigma_{\bar{x}}$ on each side of the mean, with 2.5% in each tail.
$z_{0.025} = 1.96$
$\sigma_{\bar{x}} = \sigma/\sqrt{n} = 5/\sqrt{4} = 5/2 = 2.5$
Therefore $z\sigma_{\bar{x}} = 1.96 \times 2.5 = 4.9$
The interval ranges from 78 − 4.9 to 78 + 4.9.
Answer: lower bound = 73.1 inches; upper bound = 82.9 inches.
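The same interval can be computed directly, taking z from the standard normal inverse CDF rather than from tables. This is a minimal sketch using the Python 3.8+ standard library.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 78, 5, 4
z = NormalDist().inv_cdf(0.975)   # about 1.96: 2.5% in each tail
se = sigma / sqrt(n)              # standard error of the mean: 5 / 2 = 2.5
lower, upper = mu - z * se, mu + z * se
print(f"{lower:.1f} to {upper:.1f} inches")
```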
6. Suppose we wanted to conduct a survey. We want an interval estimate of the population mean that is within 5 units of the true population mean with 99% confidence. Based on a historical planning value of 15 for the population standard deviation, how big should the sample be?
$\sigma_x = 15$. The interval extends $z\sigma_{\bar{x}}$ on each side of the mean, with 0.5% in each tail.
$z_{0.005} = 2.575$, $\sigma_{\bar{x}} = \sigma_x/\sqrt{n} = 15/\sqrt{n}$.
Set this half-width equal to the desired value of 5:
$2.575 \times 15/\sqrt{n} = 5$
$\sqrt{n} = 2.575 \times 15/5 = 2.575 \times 3 = 7.725$
Therefore $n = 7.725^2 = 59.675625$, which must be rounded up to 60.
Answer: 60 is the minimum sample size that produces this result.
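Solving $z\sigma/\sqrt{n} \le E$ for n gives $n \ge (z\sigma/E)^2$, rounded up. This is a minimal sketch of that formula; the function name `min_sample_size` is illustrative, and it can use either the table value 2.575 or the exact z from the Python 3.8+ standard library.

```python
import math
from statistics import NormalDist


def min_sample_size(sigma, margin, confidence, z=None):
    """Smallest n for which z * sigma / sqrt(n) <= margin."""
    if z is None:
        z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil((z * sigma / margin) ** 2)


print(min_sample_size(15, 5, 0.99, z=2.575))  # table value used in the answer
print(min_sample_size(15, 5, 0.99))           # exact z, about 2.5758
```

Rounding up rather than to the nearest integer guarantees that the half-width does not exceed 5.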
7. Explain in simple terms the differences (and similarities, if any) between the following approaches to estimation:
a) method of moments
b) maximum likelihood
c) least squares
These were covered in chapter 7 of Ashenfelter et al., to which you were specifically referred in Lecture 8.
a)
"Moments" refer to the mean, variance, skewness, kurtosis, etc. The method of moments equates the moments implied by the statistical model of the population distribution with the corresponding moments calculated from the sample.
MOM estimators proceed, as it were, by analogy. For example, if we are interested in the population mean, we use the sample mean. If our model of the population (or data generation process) says that the disturbance term has expected value zero, then we set the mean of the residuals equal to zero, which implies $\sum \hat{u} = 0$. If our model says that the disturbance term is uncorrelated with the regressors (X's), i.e. $E(Xu) = 0$, then we can base our estimator on the condition $\sum X\hat{u} = 0$.
Since they reflect the underlying properties of the population, they approach the population values as $n \to \infty$. This means they are consistent.
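For a simple regression $y = a + bx + u$, the two moment conditions $\sum \hat{u} = 0$ and $\sum x\hat{u} = 0$ are two equations in the two unknowns a and b. This is a minimal sketch with hypothetical data; solving the conditions yields the familiar formulas $b = \sum(x - \bar{x})(y - \bar{y}) / \sum(x - \bar{x})^2$ and $a = \bar{y} - b\bar{x}$.

```python
def mom_simple_regression(x, y):
    """Choose a, b so that the residuals u = y - a - b*x satisfy
    sum(u) = 0 and sum(x*u) = 0, the sample analogues of E(u) = 0 and E(xu) = 0."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    b = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
         / sum((xi - x_bar) ** 2 for xi in x))
    a = y_bar - b * x_bar
    return a, b


x = [1, 2, 3, 4, 5]              # hypothetical regressor
y = [2.1, 3.9, 6.2, 7.8, 10.1]   # hypothetical outcome
a, b = mom_simple_regression(x, y)
u = [yi - a - b * xi for xi, yi in zip(x, y)]
print(a, b, sum(u), sum(xi * ui for xi, ui in zip(x, u)))
```

Both printed sums are zero up to floating-point rounding, confirming that the estimates satisfy the moment conditions.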
b)
Before sampling, the probability of a sample (x1, x2, x3, …) depends on the population parameter, say θ, as defined by the probability density function f(x1, x2, x3, … | θ). But in an estimation situation the x's are known and θ is unknown. If we treat the x's as fixed and θ as the unknown, the function f becomes a likelihood function, denoted L(θ | x1, x2, x3, …). The ML estimator is the value of θ that maximises L, i.e. the parameter value under which the observed sample is most probable.
ML estimators are also generally consistent.
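The idea of holding the data fixed and varying θ can be shown with a grid search over the mean of a normal distribution. This is a minimal sketch with a hypothetical sample; σ is held at 1 for simplicity (an assumption that does not affect where the maximum in μ lies).

```python
import math


def normal_log_likelihood(mu, sigma, data):
    """log L(mu, sigma | x1, ..., xn) for independent normal observations."""
    n = len(data)
    return (-n / 2 * math.log(2 * math.pi * sigma ** 2)
            - sum((x - mu) ** 2 for x in data) / (2 * sigma ** 2))


data = [4.2, 5.1, 3.8, 6.0, 5.4, 4.9]      # hypothetical sample, held fixed
grid = [i / 1000 for i in range(3000, 7001)]  # candidate values of mu
mu_hat = max(grid, key=lambda m: normal_log_likelihood(m, 1.0, data))
print(mu_hat, sum(data) / len(data))
```

The maximising value coincides with the sample mean, 4.9.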
c) Least squares estimators are based on the criterion of minimising the sum of squared "errors", these being defined in some appropriate way to reflect the deviations of the sample from the implied population characteristics. Squaring does two things: i) it treats positive and negative values equally; ii) it penalises large departures from the hypothetical population parameter more than small departures.
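Similarities: Under some conditions, all three approaches produce the same estimate, e.g. the sample mean as an estimator of the population mean, and the OLS regression coefficients, which are also the ML estimates (if the disturbances are normal) and the MOM estimates (if E(u) = 0 and E(Xu) = 0).
Differences: The ML estimator of the variance is not the same as the LS estimator: ML divides the sum of squared residuals by n, whereas the usual least-squares estimator divides by the degrees of freedom, n − k. If the other conditions are violated, the estimates will not generally be the same.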
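The variance difference is easiest to see in the simplest case, where the only "regressor" is a constant (k = 1), so the least-squares degrees of freedom are n − 1. This is a minimal sketch with a hypothetical sample.

```python
def variance_estimates(data):
    """Return (ML estimate, degrees-of-freedom-corrected LS estimate) of the variance."""
    n = len(data)
    mean = sum(data) / n
    ss = sum((x - mean) ** 2 for x in data)  # sum of squared residuals about the mean
    return ss / n, ss / (n - 1)


ml, unbiased = variance_estimates([2, 4, 4, 4, 5, 5, 7, 9])  # hypothetical sample
print(ml, unbiased)
```

Here the sum of squared residuals is 32, so ML gives 32/8 = 4.0 while the corrected estimator gives 32/7; the gap shrinks as n grows.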
8. Under what conditions will the ordinary least squares estimator be
a) unbiased
b) efficient?
c) What does it mean to say an estimate is consistent?
a) $E(u) = 0$; $E(Xu) = 0$
b) $E(u_i^2) = \sigma^2$ (a constant, i.e. homoscedasticity); $E(u_i u_j) = 0$ for $i \ne j$ (no autocorrelation)
c) As the sample size increases without limit, the estimate converges on the population parameter.
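Consistency can be illustrated by simulation: draw ever larger samples from a population with a known mean and watch the sample mean settle down near it. This is a minimal sketch; the population values (mean 10, standard deviation 5) and the seed are hypothetical, and a single run need not shrink the error at every step.

```python
import random

rng = random.Random(42)  # fixed seed so the run is reproducible
MU, SIGMA = 10.0, 5.0    # hypothetical population parameters

errors = {}
for n in (10, 1_000, 100_000):
    sample = [rng.gauss(MU, SIGMA) for _ in range(n)]
    x_bar = sum(sample) / n
    errors[n] = abs(x_bar - MU)
    print(f"n = {n:>7}: sample mean = {x_bar:.4f}, error = {errors[n]:.4f}")
```

With n = 100,000 the standard error of the mean is 5/√100,000 ≈ 0.016, so the estimate is very close to the true value of 10.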
9. A regression of the cost of water delivered (Y) on the number of customers and the volume of water delivered yields the following regression:
$Y = 20{,}000{,}000 + 75 X_1 + 0.25 X_2$, $R^2 = 0.5123$
where
Y = operating cost (£)
X1 = number of customers
X2 = total volume of water delivered (cubic metres)
a) What is the predicted average cost per cubic metre when X1 = 600,000 and the consumption per customer is 300 cubic metres? [4]
b) If we change the units of measurement so that Y is now in millions of £ and customers are in thousands, what will happen
i) to each of the coefficients in the equation? [4]
ii) to the significance levels reported by the econometric software? [4]
a) Y = 20,000,000 + 75 × 600,000 + 0.25 × 300 × 600,000
= 20 + 45 + 45 (£ million) = £110 million
Total water consumption = 600,000 × 300 = 180,000,000 cubic metres
Cost per cubic metre = £110 million / 180 million = £0.6111 per cubic metre
= 61.11 pence
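The arithmetic in part a) can be reproduced by plugging the values into the fitted equation. This is a minimal sketch; the function name `predicted_cost` is illustrative.

```python
def predicted_cost(customers, m3_per_customer):
    """Predicted operating cost (£) and total volume (m³) from the fitted equation."""
    volume = customers * m3_per_customer
    cost = 20_000_000 + 75 * customers + 0.25 * volume
    return cost, volume


cost, volume = predicted_cost(600_000, 300)
print(f"cost = £{cost:,.0f}, volume = {volume:,} m³, cost per m³ = £{cost / volume:.4f}")
```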
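b) i) Because Y is now in millions of £, the intercept and the coefficient on X2 are divided by $10^6$ (becoming 20 and 0.00000025). The coefficient on X1 is also divided by $10^6$, but is multiplied by 1,000 because X1 is now in thousands, so overall it is divided by 1,000 (becoming 0.075).
ii) The significance levels will be unchanged: each coefficient and its standard error are rescaled by the same factor, so the t-statistics and p-values are unaffected.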
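One way to check the rescaled coefficients in part b) i) is to confirm that both versions of the equation give the same prediction once the units are converted. This is a minimal sketch using the values from part a); it checks the coefficients only, not the significance levels.

```python
# Original equation: Y (£) = 20,000,000 + 75 X1 (customers) + 0.25 X2 (cubic metres)
b0, b1, b2 = 20_000_000, 75, 0.25

# Rescaled: Y in £ million, X1 in thousands of customers, X2 still in cubic metres
b0_new = b0 / 1e6        # intercept divided by 10^6
b1_new = b1 * 1e3 / 1e6  # divided by 10^6, multiplied by 1,000
b2_new = b2 / 1e6        # divided by 10^6

x1, x2 = 600_000, 180_000_000
y_millions_from_original = (b0 + b1 * x1 + b2 * x2) / 1e6
y_millions_from_rescaled = b0_new + b1_new * (x1 / 1e3) + b2_new * x2
print(b0_new, b1_new, b2_new)
print(y_millions_from_original, y_millions_from_rescaled)
```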