Student’s t Distribution (also simply called the t distribution)

“Student” is the pen name of its discoverer, William Gosset.

Definition: If $Z \sim N(0,1)$, $W \sim \chi^2_\nu$, and $Z, W$ are independent, then we say that
$$T = \frac{Z}{\sqrt{W/\nu}}$$
has a t distribution with $\nu$ degrees of freedom.

$t_n \to N(0,1)$ as $n \to \infty$ (they are sufficiently close when $n \ge 30$). See the slides on the web regarding the t distribution.

Recall: If $X_i \overset{i.i.d.}{\sim} N(\mu, \sigma^2)$, $i = 1, 2, \ldots, n$, then
by (2) $\frac{\bar X - \mu}{\sigma/\sqrt n} \sim N(0,1)$;
by (3) $\frac{(n-1)S^2}{\sigma^2} \sim \chi^2_{n-1}$, and $\bar X$ and $S^2$ are independent.

Let $Z = \frac{\bar X - \mu}{\sigma/\sqrt n}$, $W = \frac{(n-1)S^2}{\sigma^2}$, and $\nu = n-1$. Since $\bar X$ and $S^2$ are independent, $Z$ and $W$ are independent, so we may obtain a t-distributed statistic:
$$T = \frac{Z}{\sqrt{W/(n-1)}} = \frac{(\bar X - \mu)/(\sigma/\sqrt n)}{\sqrt{S^2/\sigma^2}} = \frac{\bar X - \mu}{S/\sqrt n} \sim t_{n-1}.$$

(4) $T = \dfrac{\bar X - \mu}{S/\sqrt n} \sim t_{n-1}$.

This statistic $T$ is similar to $\frac{\bar X - \mu}{\sigma/\sqrt n}$, but with $\sigma$ replaced by its estimator $S$. When $\sigma^2$ is unknown, $T = \frac{\bar X - \mu}{S/\sqrt n}$ is very useful in making inference for $\mu$.

Application of Student’s t Distribution

Example 2: An automobile manufacturer wishes to estimate his new model’s mileage (miles per gallon), so he carries out a fuel-efficiency test. Six non-professional drivers are randomly selected, and each drives a new-model car from St. Catharines to Toronto. Assume the mileage is normally distributed with unknown mean $\mu$ and unknown variance $\sigma^2$. Find the probability that $\bar X$ will be within $2S/\sqrt n$ of the true mean $\mu$.

Given $X_i \overset{i.i.d.}{\sim} N(\mu, \sigma^2)$, $i = 1, 2, \ldots, 6$, we ask: $P\left(|\bar X - \mu| < \frac{2S}{\sqrt n}\right) = \,?$

Solution: Let $T = \frac{\bar X - \mu}{S/\sqrt n}$. Then
$$P\left(|\bar X - \mu| < \frac{2S}{\sqrt n}\right) = P\left(\frac{|\bar X - \mu|}{S/\sqrt n} < 2\right) = P(|T| < 2).$$
According to the definition of Student’s t distribution, $T \sim t_5$. Looking at Table 5 (p. 849), we see that $P(T > 2.015) = 0.05$, which implies $P(|T| < 2.015) = 0.90$. Hence $P(|T| < 2)$ is slightly less than $0.90$.

Note: If $\sigma^2$ were known, we would use the normal distribution rather than the t distribution; by the empirical rule, $P(|Z| < 2) \approx 95\%$.
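As a numerical check of Example 2, $P(|T| < 2)$ under $t_5$ can be computed by integrating the t density. A minimal stdlib-only Python sketch (with SciPy available, `scipy.stats.t.cdf` would give the same value in one call):

```python
# Numerically verify P(|T| < 2) for T ~ t_5, as in Example 2.
# Integrates the t density over [-2, 2] with composite Simpson's rule.
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def simpson(f, a, b, n=10_000):
    """Composite Simpson's rule with n (even) subintervals."""
    h = (b - a) / n
    s = f(a) + f(b)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * f(a + i * h)
    return s * h / 3

p = simpson(lambda x: t_pdf(x, 5), -2.0, 2.0)
print(f"P(|T| < 2) for t_5: {p:.4f}")  # about 0.898, slightly below 0.90
```

The result agrees with the table-based argument: $P(|T| < 2) \approx 0.898$, just under the $0.90$ obtained at the critical value $2.015$.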
Another distribution related to the normal distribution: the F distribution.

Definition: If $W_1 \sim \chi^2_{\nu_1}$, $W_2 \sim \chi^2_{\nu_2}$, and $W_1, W_2$ are independent, then we say that
$$F = \frac{W_1/\nu_1}{W_2/\nu_2}$$
has an F distribution with $\nu_1$ numerator degrees of freedom and $\nu_2$ denominator degrees of freedom. Denote: $F \sim F(\nu_1, \nu_2)$. (An F random variable is defined as a ratio of two independent chi-square random variables, each divided by its degrees of freedom.) See the slides on the web regarding the F distribution.

A practical example: Suppose we have two independent random samples:
Sample 1: $X_{1i} \overset{i.i.d.}{\sim} N(\mu_1, \sigma_1^2)$, $i = 1, 2, \ldots, n_1$;
Sample 2: $X_{2i} \overset{i.i.d.}{\sim} N(\mu_2, \sigma_2^2)$, $i = 1, 2, \ldots, n_2$;
with sample variances $S_1^2$ and $S_2^2$. Then by (3) we have
$$\frac{(n_1-1)S_1^2}{\sigma_1^2} \sim \chi^2_{n_1-1}, \qquad \frac{(n_2-1)S_2^2}{\sigma_2^2} \sim \chi^2_{n_2-1}.$$
So we can construct an F-distributed statistic:
$$F = \frac{\left.\frac{(n_1-1)S_1^2}{\sigma_1^2}\right/(n_1-1)}{\left.\frac{(n_2-1)S_2^2}{\sigma_2^2}\right/(n_2-1)} = \frac{S_1^2/\sigma_1^2}{S_2^2/\sigma_2^2} \sim F(n_1-1,\, n_2-1). \tag{5}$$
Specially, when $\sigma_1^2 = \sigma_2^2$,
$$F = \frac{S_1^2}{S_2^2} \sim F(n_1-1,\, n_2-1).$$
In summary: if there are two independent random samples from $N(\mu_1, \sigma_1^2)$ and $N(\mu_2, \sigma_2^2)$, then $F = \frac{S_1^2/\sigma_1^2}{S_2^2/\sigma_2^2} \sim F(n_1-1, n_2-1)$.

An application of the F distribution: Two independent samples of sizes $n_1 = 6$ and $n_2 = 10$ are drawn from two normal populations with equal population variances. Find $b$ such that
$$P\left(\frac{S_1^2}{S_2^2} \le b\right) = 0.95.$$
Solution: $F = \frac{S_1^2}{S_2^2} \sim F(5, 9)$, and
$$P\left(\frac{S_1^2}{S_2^2} \le b\right) = 0.95 \iff P\left(\frac{S_1^2}{S_2^2} > b\right) = 0.05.$$
Looking up Table 7 (p. 852): $b = 3.48$. Since the sample sizes are relatively small, even when the two populations have equal variances, the probability that the ratio of their sample variances exceeds 3.48 is still 5%.

Review: Sampling Distributions Related to the Normal Distribution

If $X_i \overset{i.i.d.}{\sim} N(\mu, \sigma^2)$, $i = 1, 2, \ldots, n$, then:
(1) $\frac{X_i - \mu}{\sigma} \overset{i.i.d.}{\sim} N(0,1)$, and $\sum_{i=1}^n \left(\frac{X_i - \mu}{\sigma}\right)^2 \sim \chi^2_n$;
(2) with $\bar X = \frac1n \sum_{i=1}^n X_i$: $\bar X \sim N\left(\mu, \frac{\sigma^2}{n}\right)$, and $\frac{\bar X - \mu}{\sigma/\sqrt n} \sim N(0,1)$;
(3) with $S^2 = \frac{1}{n-1} \sum_{i=1}^n (X_i - \bar X)^2$: $\frac{(n-1)S^2}{\sigma^2} \sim \chi^2_{n-1}$, and $\bar X$ and $S^2$ are independent.
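The table value $b = 3.48$ for $F(5, 9)$ can be checked by Monte Carlo simulation. A sketch, assuming (without loss of generality, since the ratio is scale-free under equal variances) that both populations are standard normal:

```python
# Monte Carlo check: with n1 = 6, n2 = 10 and equal population variances,
# S1^2/S2^2 ~ F(5, 9), so the ratio should exceed b = 3.48 about 5% of the time.
import random
import statistics

random.seed(42)
n1, n2, reps = 6, 10, 100_000
exceed = 0
for _ in range(reps):
    s1 = statistics.variance([random.gauss(0, 1) for _ in range(n1)])
    s2 = statistics.variance([random.gauss(0, 1) for _ in range(n2)])
    if s1 / s2 > 3.48:
        exceed += 1
rate = exceed / reps
print(f"Estimated P(S1^2/S2^2 > 3.48): {rate:.4f}")  # near 0.05
```

The empirical exceedance rate lands near 5%, matching the Table 7 look-up.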
(4) $T = \dfrac{\bar X - \mu}{S/\sqrt n} \sim t_{n-1}$;
(5) For two independent samples with sample variances $S_1^2, S_2^2$:
$$F = \frac{S_1^2/\sigma_1^2}{S_2^2/\sigma_2^2} \sim F(n_1-1,\, n_2-1);$$
specially, when $\sigma_1^2 = \sigma_2^2$, $F = \frac{S_1^2}{S_2^2} \sim F(n_1-1,\, n_2-1)$.

The assumption of the normal distribution is required for (1) to (5). For example, Result (2) (it is actually Theorem 7.1) states: if $X_i \overset{i.i.d.}{\sim} N(\mu, \sigma^2)$, $i = 1, 2, \ldots, n$, with $\bar X = \frac1n \sum_{i=1}^n X_i$, then exactly
$$\bar X \sim N\left(\mu, \frac{\sigma^2}{n}\right), \qquad \frac{\bar X - \mu}{\sigma/\sqrt n} \sim N(0,1).$$

What if the assumption of the normal distribution is invalid? Let’s take a look at a non-normal distribution, the exponential distribution $E(10)$, for example:
$$f(x) = \begin{cases} \frac{1}{10} e^{-x/10}, & x > 0; \\ 0, & \text{otherwise.} \end{cases}$$
[Images from top: density of $E(10)$; histograms of sample means $\bar X$ with sample sizes 5, 10, and 120 respectively.]

Central Limit Theorem (CLT) (Thm 7.4): If $X_i \overset{i.i.d.}{\sim}$ any distribution with mean $\mu$ and variance $\sigma^2$, $i = 1, 2, \ldots, n$, and $\bar X = \frac1n \sum_{i=1}^n X_i$, then asymptotically
$$\bar X \sim N\left(\mu, \frac{\sigma^2}{n}\right), \qquad \frac{\bar X - \mu}{\sigma/\sqrt n} \sim N(0,1).$$
Namely, $\bar X$ is approximately $N(\mu, \sigma^2/n)$ and $\frac{\bar X - \mu}{\sigma/\sqrt n}$ is approximately $N(0,1)$ when $n$ is sufficiently large.

Compare this to Result (2) in Thm 7.1: when the $X_i$ are themselves normal, these distributions hold exactly for every $n$.

Reading pages: 353-364; 370-373.

Example 7.9: The service times for customers coming through a checkout counter in a retail store are independent random variables with mean 1.5 minutes and variance 1. Approximate the probability that 100 customers can be served in less than 2 hours of total service time.

Let $X_i$ be the service time for the $i$th customer. Note that $X_i$ has an unknown distribution with mean $\mu = 1.5$ and variance $\sigma^2 = 1$; however, we have a large sample, $n = 100$. We want to find
$$P\left(\sum_{i=1}^{100} X_i < 120 \text{ minutes}\right) = \,?$$
According to the CLT, $\bar X$ has an approximately normal distribution:
$$\bar X \overset{approx.}{\sim} N\left(1.5, \frac{1}{100}\right), \qquad Z = \frac{\bar X - 1.5}{1/\sqrt{100}} \overset{approx.}{\sim} N(0,1).$$
So we have
$$P\left(\sum_{i=1}^{100} X_i < 120\right) = P\left(\bar X < \frac{120}{100}\right) = P(\bar X < 1.2) = P\left(Z < \frac{1.2 - 1.5}{1/\sqrt{100}}\right) = P(Z < -3) = 0.00135.$$
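Example 7.9 can be sanity-checked by simulation. The true service-time distribution is not specified in the problem, so the sketch below *assumes* a Gamma distribution with shape 2.25 and scale 2/3 (chosen so the mean is 1.5 and the variance 1); the simulated probability should then be of the same small order as the CLT value 0.00135, though not identical to it:

```python
# Simulation of Example 7.9 under an ASSUMED Gamma(2.25, 2/3) service time
# (mean 1.5 min, variance 1). Estimates P(total time of 100 customers < 120).
import random

random.seed(1)
reps, n = 50_000, 100
hits = 0
for _ in range(reps):
    total = sum(random.gammavariate(2.25, 2 / 3) for _ in range(n))
    if total < 120:
        hits += 1
rate = hits / reps
print(f"Estimated P(total < 120): {rate:.5f}")  # a very small probability
```

With a different assumed service-time distribution the exact value shifts slightly, but the CLT guarantees it stays close to the normal-approximation answer for $n = 100$.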
Therefore, the probability that 100 customers can be served in less than 2 hours is very, very small. Perhaps another cashier line is needed.

Suppose the manager decides to add one more line, with the same service speed, and each service line shares the service load equally. What is the probability that 100 customers can be served in less than 2 hours now? The mean service time becomes $1.5/2 = 0.75$ minutes, assuming the variance is unchanged. We then have
$$\bar X \overset{approx.}{\sim} N\left(0.75, \frac{1}{100}\right), \qquad Z = \frac{\bar X - 0.75}{1/\sqrt{100}} \overset{approx.}{\sim} N(0,1),$$
$$P\left(\sum_{i=1}^{100} X_i < 120\right) = P(\bar X < 1.2) = P\left(Z < \frac{1.2 - 0.75}{1/\sqrt{100}}\right) = P(Z < 4.5) = 1 - P(Z \ge 4.5) \approx 1.$$
Surely the 100 customers would be served within 2 hours.

Normal approximation to other distributions has been very useful in making inferences. One of the important cases is the normal approximation to the binomial distribution. For $Y \sim b(n, p)$, finding
$$P(Y \le b) = \sum_{y=0}^{b} \binom{n}{y} p^y (1-p)^{n-y} = \,?$$
Tables are available only for some sample sizes ($n = 5, 10, 15, 20, 25$ in your text); other cases are not tabulated, and their calculation is tedious, especially for large $n$.

Consider $Y$ as the number of successes in $n$ trials, so it can be expressed as
$$Y = \sum_{i=1}^n Y_i, \qquad Y_i = \begin{cases} 1, & \text{if the } i\text{th trial results in success}; \\ 0, & \text{otherwise.} \end{cases}$$
The distribution of $Y_i$ is the Bernoulli distribution with $P(Y_i = 1) = p$ and $P(Y_i = 0) = 1 - p$, so $E(Y_i) = p$ and $Var(Y_i) = p(1-p)$. When $n$ is large, according to the CLT,
$$\frac{Y}{n} = \frac1n \sum_{i=1}^n Y_i = \bar Y \overset{approx.}{\sim} N\left(p, \frac{p(1-p)}{n}\right).$$
In addition, we can consider $Y$ itself as having approximately the normal distribution $N(np,\, np(1-p))$. Note that this approximate distribution,
$$\frac{Y}{n} \overset{approx.}{\sim} N\left(p, \frac{p(1-p)}{n}\right),$$
is very important for making inference regarding population proportions.

Example 7.10: Candidate A believes that she can win a city election if she can earn at least 55% of the votes in Precinct 1. She also believes that 50% of the voters of the whole city favor her. If 100 voters show up to vote at Precinct 1, what is the probability that she will receive at least 55% of their votes?
Let $Y$ be the number of voters at Precinct 1 who vote for her. We need to find $P\left(\frac{Y}{n} \ge 0.55\right)$. We assume the 100 voters at Precinct 1 are a random sample from the city, so $Y \sim b(100, 0.5)$. Then its normal approximation is
$$\frac{Y}{n} \overset{approx.}{\sim} N\left(0.5, \frac{0.5(1-0.5)}{100}\right), \quad \text{i.e.} \quad \frac{Y}{n} \overset{approx.}{\sim} N(0.5,\, 0.0025).$$
Hence
$$P\left(\frac{Y}{n} \ge 0.55\right) = P\left(\frac{Y/n - 0.5}{\sqrt{0.0025}} \ge \frac{0.55 - 0.5}{\sqrt{0.0025}}\right) \approx P(Z \ge 1) = 15.87\%$$
(or, applying the Empirical Rule, $\approx \frac{1 - 68\%}{2} = 16\%$).

How good is the normal approximation to the binomial distribution? Check Example 7.11 (see p. 381). For $Y \sim b(25, 0.4)$, find the exact probabilities that $Y \le 8$ and $Y = 8$, and compare these to the corresponding values found using the normal approximation.

From $Y \sim b(25, 0.4)$ and Table 1, exactly
$$P(Y = 8) = P(Y \le 8) - P(Y \le 7) = 0.274 - 0.154 = 0.120.$$
Let $W$ be a random variable having the normal distribution $N(np,\, np(1-p)) = N(10, 6)$; then $Y \overset{approx.}{\sim} W$’s distribution, and
$$P(Y = 8) \approx P(7.5 \le W \le 8.5) = P\left(\frac{7.5 - 10}{\sqrt 6} \le \frac{W - 10}{\sqrt 6} \le \frac{8.5 - 10}{\sqrt 6}\right) = P(-1.02 \le Z \le -0.61)$$
$$= P(Z \le -0.61) - P(Z \le -1.02) = P(Z \ge 0.61) - P(Z \ge 1.02) = 0.2709 - 0.1539 = 0.1170.$$
It is close to the 0.120 from the binomial table.

From $Y \sim b(25, 0.4)$ and Table 1, $P(Y \le 8) = 0.274$; from the approximation by $W$’s distribution $N(10, 6)$,
$$P(Y \le 8) \approx P(W < 8.5) = P\left(\frac{W - 10}{\sqrt 6} < \frac{8.5 - 10}{\sqrt 6}\right) = P(Z < -0.61) = 0.2709.$$
It is close to the 0.274 from the binomial table.

Commonly used continuity correction: If $Y \sim b(n, p)$ and $W$ has $Y$’s approximating distribution $N(np,\, np(1-p))$, then
$$P(Y \le k) \approx P(W < k + 0.5);$$
$$P(Y \ge k) \approx P(W > k - 0.5);$$
$$P(Y = k) \approx P(k - 0.5 < W < k + 0.5).$$
This approximation performs well even when the sample size is only moderately large. One guideline is that it can be used whenever the sample size satisfies
$$n > 9 \cdot \frac{\text{larger of } p \text{ and } 1-p}{\text{smaller of } p \text{ and } 1-p}.$$

Reading pages in Chapter 7: P346-382.

Purpose of study: making inference for a population using sample data. Making inferences: (1) Point Estimation; (2) Interval Estimation; (3) Hypothesis Testing.

Point Estimation

We can compare point estimation to a shooting competition.
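The Example 7.11 comparison can be reproduced directly, computing the exact binomial probabilities with `math.comb` and the continuity-corrected normal approximation with an `erf`-based normal CDF. A stdlib-only sketch:

```python
# Exact binomial vs. continuity-corrected normal approximation for
# Example 7.11: Y ~ b(25, 0.4), so np = 10 and np(1-p) = 6.
import math

def binom_pmf(n, p, k):
    """Exact binomial probability P(Y = k)."""
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

def norm_cdf(x, mu, sigma):
    """Normal CDF via the error function."""
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))

n, p = 25, 0.4
mu, sigma = n * p, math.sqrt(n * p * (1 - p))

exact_le8 = sum(binom_pmf(n, p, k) for k in range(9))       # P(Y <= 8)
approx_le8 = norm_cdf(8.5, mu, sigma)                       # P(W < 8 + 0.5)
exact_eq8 = binom_pmf(n, p, 8)                              # P(Y = 8)
approx_eq8 = norm_cdf(8.5, mu, sigma) - norm_cdf(7.5, mu, sigma)

print(f"P(Y <= 8): exact {exact_le8:.4f}, approx {approx_le8:.4f}")
print(f"P(Y = 8):  exact {exact_eq8:.4f}, approx {approx_eq8:.4f}")
```

The outputs match the worked example up to the rounding of $z$ to two decimals in the table look-ups: roughly 0.274 vs 0.270 for $P(Y \le 8)$, and 0.120 vs 0.116 for $P(Y = 8)$.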
Three players, A, B, and C: say the center of the target is the population parameter (the population mean, for instance) that you are estimating. Each player shoots 5 times, and their scores are recorded. Which player gives the best performance?

A parameter: the target. An estimate: each shooting score. An estimator: a shooter.

Which estimator is the best? How would we judge them? (1) By closeness to the center (the target); and (2) by small variability.

An estimator is a rule or formula that tells how to calculate the value of an estimate based on sample observations. A good estimator has to (1) be targeted at the center and (2) have a small variance. Let $\theta$ be a population parameter, and let $\hat\theta$ be a point estimator for $\theta$. Note that an estimator is a statistic, so its value changes from one sample to another. Property (1) means $E(\hat\theta) = \theta$ (unbiasedness); property (2) means $Var(\hat\theta)$ is small (small variance).

Unbiasedness: If $E(\hat\theta) = \theta$, then $\hat\theta$ is called an unbiased estimator for $\theta$. If $E(\hat\theta) \ne \theta$, then $\hat\theta$ is said to be biased. Let $B(\hat\theta)$ denote $E(\hat\theta) - \theta$; then $B(\hat\theta)$ is called the bias of $\hat\theta$ for estimating $\theta$:
$$B(\hat\theta) = E(\hat\theta) - \theta.$$
When $B(\hat\theta) = 0$, $\hat\theta$ is unbiased; when $B(\hat\theta) > 0$, $\hat\theta$ tends to over-estimate $\theta$; when $B(\hat\theta) < 0$, $\hat\theta$ tends to under-estimate $\theta$.

Another measure of the performance of an estimator is the Mean Square Error (MSE). The MSE of a point estimator $\hat\theta$ is defined as
$$MSE(\hat\theta) = E\left[(\hat\theta - \theta)^2\right].$$
Show that $MSE(\hat\theta) = Var(\hat\theta) + \left[B(\hat\theta)\right]^2$. The first approach uses $E(Y^2) = Var(Y) + [E(Y)]^2$. The second approach is as follows:
$$MSE(\hat\theta) = E\left[(\hat\theta - \theta)^2\right] = E\left[\left(\hat\theta - E(\hat\theta) + E(\hat\theta) - \theta\right)^2\right]$$
$$= E\left[\left(\hat\theta - E(\hat\theta)\right)^2\right] + 2\left(E(\hat\theta) - \theta\right)E\left[\hat\theta - E(\hat\theta)\right] + \left(E(\hat\theta) - \theta\right)^2$$
$$= Var(\hat\theta) + 2\, B(\hat\theta) \cdot 0 + \left[B(\hat\theta)\right]^2 = Var(\hat\theta) + \left[B(\hat\theta)\right]^2.$$

As we know, for a normal population, the two most important estimators are: (i) $\bar X$ for the population mean $\mu$; (ii) $S^2$ for the population variance $\sigma^2$. How good are they?

Example 1: Assume $X_1, X_2, X_3$ are a randomly selected sample from $N(\mu, \sigma^2)$.
(1) Are $\bar X$, $X_1$, $\frac{X_1 + X_2}{2}$, and $\frac{X_1 + 2X_2}{3}$ unbiased estimators for $\mu$?
(2) Which one is the best for estimating $\mu$?
(3) Let $S^{2*} = \frac1n \sum_{i=1}^n (X_i - \bar X)^2$.
Are both $S^{2*}$ and $S^2$ (the sample variance $S^2 = \frac{1}{n-1} \sum_{i=1}^n (X_i - \bar X)^2$) unbiased for $\sigma^2$?
(4) In this example, which one is better for estimating $\sigma^2$?
$$MSE(S^2) = Var(S^2) = \frac{2}{n-1}\,\sigma^4;$$
$$MSE(S^{2*}) = Var(S^{2*}) + \left[B(S^{2*})\right]^2 = \frac{2(n-1)}{n^2}\,\sigma^4 + \frac{1}{n^2}\,\sigma^4 = \frac{2n-1}{n^2}\,\sigma^4.$$

The most commonly used unbiased point estimators are summarized in the text. Reading pages: P390-393, P396-399.

Techniques for constructing an unbiased estimator:
(i) If $E(\hat\theta) = \theta + a$, namely $\hat\theta$ is a biased estimator for $\theta$ with $B(\hat\theta) = a$, then $E(\hat\theta - a) = \theta$, so $\hat\theta - a$ is unbiased for $\theta$;
(ii) If $E(\hat\theta) = k\theta$ with $k \ne 0$ and $k \ne 1$, i.e. $\hat\theta$ is a biased estimator for $\theta$, then $E\left(\frac{\hat\theta}{k}\right) = \theta$, so $\frac{\hat\theta}{k}$ is unbiased for $\theta$.

Reading pages: Chapter 9: P444-452.
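The bias and MSE comparison between $S^2$ and $S^{2*}$ can be illustrated by simulation. A sketch with arbitrary assumed values $\mu = 5$, $\sigma = 2$ and $n = 3$: $S^2$ should average to $\sigma^2 = 4$ (unbiased) while $S^{2*}$ averages to $\frac{n-1}{n}\sigma^2 = 8/3$, yet $S^{2*}$ attains the smaller MSE, since $\frac{2}{n-1}\sigma^4 = 16$ exceeds $\frac{2n-1}{n^2}\sigma^4 = \frac{5}{9} \cdot 16 \approx 8.9$:

```python
# Simulation check of Example 1 (3)-(4): S^2 (divide by n-1) vs
# S*^2 (divide by n) as estimators of sigma^2, for n = 3 normal draws.
# mu = 5 and sigma = 2 are arbitrary assumed values for the demo.
import random
import statistics

random.seed(7)
mu, sigma, n, reps = 5.0, 2.0, 3, 200_000
s2_vals, s2star_vals = [], []
for _ in range(reps):
    x = [random.gauss(mu, sigma) for _ in range(n)]
    s2 = statistics.variance(x)           # divides by n - 1
    s2_vals.append(s2)
    s2star_vals.append(s2 * (n - 1) / n)  # divides by n

var_true = sigma**2  # 4
mse = lambda vals: statistics.fmean((v - var_true) ** 2 for v in vals)

print("mean of S^2:  ", statistics.fmean(s2_vals))      # near 4 (unbiased)
print("mean of S*^2: ", statistics.fmean(s2star_vals))  # near 8/3 (biased low)
print("MSE of S^2:   ", mse(s2_vals))                   # near 16
print("MSE of S*^2:  ", mse(s2star_vals))               # near 80/9, about 8.9
```

This is the (perhaps surprising) answer to question (4): the biased estimator $S^{2*}$ has the smaller mean square error, illustrating that unbiasedness and small MSE are different criteria.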