Solutions to Sample Test #1
1 Descriptive Statistics
1.1 Small Sample
A very small sample is collected, with the following observations:
2.1   −1.2   1.1   0.3
Calculate:
• the mean
• the median
• the standard deviation
• the range
• the maximum, the minimum, and the midrange
Solutions
A spreadsheet's solution is
Mean                             0.575
Median                           0.7
Population Standard Deviation    1.207
Population Variance              1.457
Sample Standard Deviation        1.394
Sample Variance                  1.943
Range                            3.3
Minimum                          −1.2
Maximum                          2.1
Midrange                         0.45
The numbers are rounded to three decimal digits.
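For readers who prefer to check these values with a short program rather than a spreadsheet, here is a minimal sketch using Python's standard statistics module; the printed values match the table above, up to rounding:

    import statistics

    data = [2.1, -1.2, 1.1, 0.3]

    print("Mean:                ", statistics.mean(data))        # ≈ 0.575
    print("Median:              ", statistics.median(data))      # ≈ 0.7
    print("Population st. dev.: ", statistics.pstdev(data))      # ≈ 1.207
    print("Population variance: ", statistics.pvariance(data))   # ≈ 1.457
    print("Sample st. dev.:     ", statistics.stdev(data))       # ≈ 1.394
    print("Sample variance:     ", statistics.variance(data))    # ≈ 1.943
    print("Range:               ", max(data) - min(data))        # ≈ 3.3
    print("Minimum:             ", min(data))                    # -1.2
    print("Maximum:             ", max(data))                    # 2.1
    print("Midrange:            ", (max(data) + min(data)) / 2)  # ≈ 0.45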
1.2 Large Sample
A data set consists of the following (sorted, and rounded) 40 values:
−40.1  −14.1   −8.9   −1.3    0.3    1.7    2.4    3.7    3.8    3.8
  4.1    4.2    4.3    4.6    4.7    4.8    4.8    5.0    5.4    5.5
  5.6    5.7    5.8    5.8    5.9    6.1    6.1    6.2    6.3    6.3
  6.5    8.3    9.2    9.9   10.6   10.7   12.4   26.9   27.2   30.9
We find the following summaries:

Number of data points:   40
Sum of values:           210.64
Sum of their squares:    5610.44
1. Compute the sample mean and the sample standard deviation.
2. From the table, extract the following values:
   • Minimum, Maximum, Range
   • 1st Quartile, Median, 3rd Quartile
Solutions
A spreadsheet's solution is
Mean                             5.277
Median                           5.550
Population Variance              112.562
Population Standard Deviation    10.61
Sample Variance                  115.448
Sample Standard Deviation        10.745
Range                            71.0
Minimum                          −40.1
Maximum                          30.9
1st Quartile                     4.025
3rd Quartile                     6.350
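Question 1 can be answered from the three reported summaries alone, using the usual shortcut formula s² = (Σx² − (Σx)²/n)/(n − 1). A minimal Python sketch of that computation (the last digits differ slightly from the spreadsheet table, presumably because the listed data were rounded while the summaries were computed at a different precision):

    from math import sqrt

    n, sum_x, sum_x2 = 40, 210.64, 5610.44           # summaries given in the problem

    mean = sum_x / n                                 # ≈ 5.27
    sample_var = (sum_x2 - sum_x**2 / n) / (n - 1)   # ≈ 115.4
    sample_sd = sqrt(sample_var)                     # ≈ 10.74

    print(mean, sample_var, sample_sd)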
A Note on Quartiles
You will notice that the median, which can be chosen as any number between 5.5 and 5.6, is picked by the
computer to be the midpoint between these two values. As for the first quartile, which could be any number
between 3.8 and 4.1, and the third, which could be any number between 6.3 and 6.5, the program chose,
respectively, 4.025, and 6.350. As you can see, the first is closer to the upper bound, and the second to the lower
bound, i.e. they are pushed towards “the middle”. As you can check for yourself on various web sites, this is only
one of several “rules” in the literature for picking the “right” percentile. Of course, there is no “right” choice.
Percentiles (quartiles being the special cases of the 25th and 75th percentiles) are uniquely determined only when
dealing with a continuum of data, as in probability distributions (e.g. the first quartile of the standard normal
distribution is −0.67448975019608, since the probability that such a variable is less than that number
is 0.25). The various rules are attempts to “interpolate” a continuum between the actual data points, and
depending on which (arbitrary) rule you choose, you get different outcomes. In any case, notice how irrelevant
that is in any practical use of the information.
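To see this rule-dependence concretely, here is a small sketch using the quantiles function from Python's standard statistics module; its two built-in methods are only two of the many conventions found in software and textbooks:

    import statistics

    data = [-40.1, -14.1, -8.9, -1.3, 0.3, 1.7, 2.4, 3.7, 3.8, 3.8,
            4.1, 4.2, 4.3, 4.6, 4.7, 4.8, 4.8, 5.0, 5.4, 5.5,
            5.6, 5.7, 5.8, 5.8, 5.9, 6.1, 6.1, 6.2, 6.3, 6.3,
            6.5, 8.3, 9.2, 9.9, 10.6, 10.7, 12.4, 26.9, 27.2, 30.9]

    # The "inclusive" rule (the one many spreadsheets use) reproduces the values above.
    print(statistics.quantiles(data, n=4, method="inclusive"))  # ≈ [4.025, 5.55, 6.35]

    # The "exclusive" rule interpolates differently and gives different quartiles.
    print(statistics.quantiles(data, n=4, method="exclusive"))  # ≈ [3.875, 5.55, 6.45]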
2 Probability: Normal Distribution
2.1 Constructed Normality
“Grading on a curve” is a procedure by which student grades are changed to conform, approximately, to a
normal distribution with a pre-assigned “true mean” (expectation, usually denoted by μ) and standard
deviation (usually denoted by σ). Suppose an instructor takes as pre-assigned values μ = 2.2, σ = 0.8.
With these choices, what is the probability of a student
1. failing to get a passing grade of 2.0, i.e., calling the grade G, what is P[G ≤ 1.9]?
2. scoring 4.0 or higher (a normal distribution will, in theory, allow any real number as a grade), that is,
   what is P[G ≥ 4.0]?
Solutions
1. The z-score of 1.9, with our numbers for expectation and standard deviation, is
   (1.9 − 2.2)/0.8 = −0.375. This is the difference (negative, since 1.9 is less than the expected
   value) between our data point and the expected value, using the standard deviation as the
   unit. From tables, we can find, in the (by now) familiar way, that the probability of a standard
   normal variable being less than this is approximately 0.354.
2. Similarly, the z-score for 4 is (4 − 2.2)/0.8 = 2.25. The probability of exceeding this value
   is given by 1 minus the probability of a standard normal variable being less than 2.25, so that
   the final answer is approximately 0.012.
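These two probabilities can also be checked directly from a cumulative distribution function, without tables; a minimal sketch using NormalDist from Python's standard statistics module:

    from statistics import NormalDist

    G = NormalDist(mu=2.2, sigma=0.8)   # the pre-assigned grading distribution

    print(G.cdf(1.9))       # P[G <= 1.9] ≈ 0.354
    print(1 - G.cdf(4.0))   # P[G >= 4.0] ≈ 0.012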
Note: In practice, this would work as follows: the sample mean and standard deviation are used to
compute empirical “z-scores” for the exam results, and the published grades would follow the
theoretical distribution so that, for example, to get a 4.0 a student would have to have an empirical
z-score of 2.25 or better. If the actual distribution of the exam turned out to be anything close to normal,
more than 35% of the students would fail, and about 1% would get a 4.0 (regardless of how well or
badly they did in absolute terms). Maybe this instructor should adjust the parameters, or, even better,
just drop the idea of grading on a curve.
2.2 Quality Control
Normal models are generally not really good for “time to failure” issues that arise in quality control.
Nonetheless, let's assume that a company has decided that the “lifetime” of a gadget can be described by a
normal variable with parameters μ = 1.1, σ = 0.2 (time is measured in years). The company offers a
1-year warranty on its product. If the product fails before 1 year is over, the company will have to refund
the buyer and lose $100 on the transaction. On the other hand, the company is also betting on customers
wanting to “upgrade” to a newer version within 2 years of their purchase.
1. What is the probability of a random gadget failing in the first year, and thus costing the company $100?²
2. What is the probability that a random gadget will last longer than 2 years, and thus make the
   owner think twice about “upgrading”?

² If we call p the probability of the warranty kicking in, and c the associated cost (in our case c = $100), the product
pc is called the expected cost. In a cost-benefit analysis, this quantity would be compared to the expected profit
in order to determine whether the failure rate was acceptable, financially speaking, or not.
Solutions
Call T the time to failure of our gadget. We are assuming that T ∼ N(1.1, 0.04), and we ask:
1. What is P[T < 1]? To answer using tables, we “normalize” our variable, so the question
   becomes “what is P[(T − 1.1)/0.2 < (1 − 1.1)/0.2]?” The random variable on the left is a standard
   normal random variable, let's call it Z, so we are looking for P[Z < −0.5] ≈ 0.3085 (−0.5 is the
   z-score of 1). Thus, the average cost to the company would be about $30.85 (this last result was
   not asked for in the text, but we might as well take note of it). Note that a 30% failure rate is
   substantial, but if the average profit for the company is higher than that, a cost-benefit analysis
   might conclude that it is in its best interest to produce such a shoddy product.
2. What is P[T > 2]? Similarly, since the z-score of 2 is (2 − 1.1)/0.2 = 4.5, our answer is given by
   P[Z > 4.5] = 1 − P[Z ⩽ 4.5] ≈ 0.0000034. 4.5 is such a large z-score that it is not reported in
   standard tables. An answer of “practically 0” would be perfect. Extreme z-scores are easily
   handled by a computer, or one can refer to less common tables, reporting very small “tail
   probabilities”, like the ones attached at the end of the file (a table of the normal distribution,
   together with a table of small tail probabilities put in the public domain and available on the web).
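For completeness, both tail probabilities, as well as the expected warranty cost pc from the footnote, can be checked with NormalDist from Python's standard statistics module:

    from statistics import NormalDist

    T = NormalDist(mu=1.1, sigma=0.2)   # assumed lifetime distribution (years)

    p_fail = T.cdf(1)        # P[T < 1]  ≈ 0.3085
    p_last = 1 - T.cdf(2)    # P[T > 2]  ≈ 0.0000034

    print(p_fail, 100 * p_fail)   # probability of a warranty claim, and expected cost ≈ $30.85
    print(p_last)                 # "practically 0"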