Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Point and Interval Estimation Mathematics Education Section CDI, EMB Email: [email protected] Normal Distribution Continuous random variables and normal distribution Let X be a continuous random variable defined on an interval Ω. The probability distribution of X is described by a function f(x), where x ∈ Ω, such that (a) f(x) ≥ 0, x ∈ Ω. (b) ∫ Ω f ( x)dx = 1 (c) P(a ≤ X ≤ b) = ∫ b a f ( x)dx This function f(x) is then called the probability density function (pdf) of the random variable X. Point and Interval Estimation Parameter and Statistic ¾ A number that describes a population is called a parameter ¾ A number that describes a sample is a statistic ¾ If we take a sample and calculate a statistic, we often use that statistic to infer something about the population from which the sample was drawn Population Parameter choose estimate Sample Statistic calculate The sample mean = the population mean ? Rolling a fair dice { { Cast a fair dice and take X to be the uppermost number, the population mean is μ = 3.5, and that the population median is also m = 3.5 Take a sample of four throws, the mean may be far from 3.5 The results of 5 such samples of 4 throws X1 X2 X3 X4 X Sample 1 6 2 5 6 4.75 Sample 2 2 3 1 6 3 Sample 3 1 1 4 6 3 Sample 4 6 2 2 1 2.75 Sample 5 1 5 1 3 2.75 The sample size n = 4 Random samples Sampling distribution: Basic theorem Central Limit Theorem (CLT) Central Limit Theorem (CLT) For ANY population (regardless of its shape) the distribution of sample means will approach a normal distribution as n approaches infinity The variability of a sample mean decreases as the sample size increases Central Limit Theorem (CLT) How large is a “large sample”? It depends upon the form of the distribution from which the samples were taken. If the population distribution deviates greatly from normality larger samples will be needed to approximate normality. n ≥ 30 Central Limit Theorem (CLT) The Central Limit Theorem X : N (μ , σ n ) 68% probability that our will be in this region X 95% probability that our will be in this region X POINT AND INTERVAL ESTIMATION { Point Estimate is a single number, calculated from available sample data, that is used to estimate the value of an unknown population parameter. { Interval Estimate (Confidence Interval) is an interval that provides an upper and lower bound for a specific unknown population parameter. -> arguably the most useful type of inference. Estimators 1. An estimator θˆ is said to be unbiased for the parameter θ if E θˆ = θ () 2. If this equality does not hold, θˆ is said to be a biased estimator of θ, with Bias = E θˆ − θ () Questions: { An unfair coin has a 75% chance of landing heads-up. Let X = 1 if it lands heads-up, and X = 0 if it lands tails-up. { Find the sampling distribution of the mean X for samples of size 3. Solution: { The experiment consists of tossing a coin 3 times and measuring the sample mean X Probability X HHH HHT HTH HTT THH THT TTH TTT 27/64 3/64 1/64 1 9/64 9/64 3/64 9/64 3/64 2/3 2/3 1/3 2/3 1/3 1/3 The possible values of X are 0, 1/3, 2/3 and 1. 0 The distribution of the sample mean is a binomial distribution X P( X = x ) 0 1/3 2/3 1 1/64 9/64 27/64 27/64 Is the Sample Mean an Unbiased estimator of the Population Mean? Example 1 An engineer wish to estimate the mean yield of a chemical process based on the yield measurements X1, X2, X3 from three independent runs of an experiment. Consider the following two estimators of the mean yield θ : X1 + X 2 + X 3 ˆ { θ1 = 3 { X1 + 2 X 2 + X 3 ˆ . and θ 2 = 4 Which one should be preferred? Solution: { Since E (θˆ1 ) = E (X ) = θ and E ( X 1 ) + 2 E ( X 2 ) + E ( X 3 ) θ + 2θ + θ ˆ = =θ, E (θ 2 ) = 4 4 bothθˆ1 and θˆ2 are unbiased for θ. Solution: Denoting the population variance by σ2, we have ( ) V θˆ1 = V ( X ) = σ2 3 and 2 3 σ V ( X 1 ) + 4V ( X 2 ) + V ( X 3 ) V (θˆ2 ) = = . 16 8 σ2 3σ 2 ˆ < , θ1 is better than θˆ2 . 3 8 Confidence Interval Confidence Interval ‘… perhaps the most obvious difficulty with confidence intervals lies in how we interpret what the confidence statement means’ (Smithson, 2003, p.16) Explanation: Confidence Interval X ~ N (μ , σ2 n P(− zα < z < zα ) = 1 − α ) 2 ( X − μ) Z= ~ N (0,1) σ 2 P(− z α < 2 n P( x − zα 2 ( x − zα 2 σ n σ n x−μ σ < zα ) = 1 − α 2 n < μ < x + zα σ n 2 , x + zα 2 σ n ) ) = 1−α a 100(1-α)% confidence interval of μ a 100(1-α)% confidence interval of μ { { { Large samples have narrower widths than small samples Higher confidence levels have wider intervals than lower confidence levels Narrow widths and high confidence levels are desirable, but these two things affect each other a 100(1-α)% confidence interval of μ a 100(1-α)% confidence interval of μ a 100(1-α)% confidence interval of μ Confidence interval for a population proportion Confidence interval for a population proportion p Assumptions: 1. The sample is a random sample 2. The conditions for the binomial distribution are satisfied 3. The normal distribution can be used to approximate the binomial distribution Confidence interval for a population proportion p μ pˆ = E( pˆ ) X = E( ) n 1 = E( X ) n np = n =p σ pˆ 2 = σ X 2 n 1 2 = 2 σX n 1 = 2 [np (1 − p )] n p (1 − p ) = n Confidence interval for a population proportion p