Lecture 8

What is a statistical model?

A statistical model for some data is a set of distributions $\{f_\theta : \theta \in \Omega\}$, one of which corresponds to the true unknown distribution that produced the data. The variable $\theta$ is called the parameter of the model, and the set $\Omega$ is called the parameter space. From the definition of a statistical model, we see that there is a unique value $\theta_0 \in \Omega$ such that $f_{\theta_0}$ is the true distribution that generated the data. We refer to this value as the true parameter value.

Example: Suppose we have observations of heights in cm of individuals in a population, and we feel that it is reasonable to assume that the distribution of height in the population is normal with some unknown mean and variance. The statistical model in this case is $\{N(\mu, \sigma^2) : \mu \in \mathbb{R},\ \sigma^2 > 0\}$, with parameter $\theta = (\mu, \sigma^2)$.

Goals of Statistics:
• Estimate unknown parameters of the underlying probability distribution.
• Measure the errors of these estimates.
• Test whether the data give evidence that parameters are (or are not) equal to a certain value, or that the probability distribution has a particular form.

Point Estimation

Most statistical procedures involve estimation of the unknown value of the parameter of the statistical model. A point estimator of the parameter $\theta$ is a function of the underlying random variables, so it is a random variable with a distribution function. A point estimate of the parameter $\theta$ is a function of the data; it is a statistic. For a given sample, an estimate is a number.

Notation: $\hat{\theta}$ denotes a point estimator of $\theta$.

Desirable properties of a point estimator:
• Unbiased
• Consistent
• Minimum variance
• With known probability distribution

Definition: Let $\hat{\theta}$ be a point estimator for a parameter $\theta$. Then $\hat{\theta}$ is an unbiased estimator if $E(\hat{\theta}) = \theta$.

Note: An unbiased estimator for $\theta$ does not always exist. Also, an estimator that is unbiased for $\theta$ does not automatically give an unbiased estimator for $g(\theta)$.

Example (of unbiased estimator): The sample mean is an unbiased estimator of the population mean.

If $E(\hat{\theta}) \neq \theta$, then $\hat{\theta}$ is called biased.

Definition: The bias of a point estimator $\hat{\theta}$ is given by $B(\hat{\theta}) = E(\hat{\theta}) - \theta$.
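As a rough numerical illustration of the statement that the sample mean is unbiased for the population mean, the following sketch simulates many samples and averages their sample means. The population values (mean 170 cm, standard deviation 10 cm), the sample size, the number of repetitions and the function name `simulate_sample_means` are all assumptions chosen for illustration; none of them come from the notes.

```python
import random
import statistics


def simulate_sample_means(mu, sigma, n, reps, seed=0):
    """Draw `reps` normal samples of size `n` and return their sample means."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    means = []
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        means.append(statistics.fmean(sample))
    return means


# Hypothetical population: heights with mean 170 cm and sd 10 cm.
TRUE_MEAN = 170.0
means = simulate_sample_means(mu=TRUE_MEAN, sigma=10.0, n=25, reps=2000)

# The average of the sample means approximates E(sample mean);
# subtracting the true mean gives a simulation estimate of the bias.
estimated_bias = statistics.fmean(means) - TRUE_MEAN
print(f"average of sample means: {statistics.fmean(means):.3f}")
print(f"estimated bias:          {estimated_bias:.3f}")
```

The estimated bias should be close to zero, up to simulation noise. A simulation like this only suggests unbiasedness; the actual argument is the expectation calculation $E(\bar{Y}) = \mu$.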
Definition: The mean square error of a point estimator $\hat{\theta}$ is $MSE(\hat{\theta}) = E[(\hat{\theta} - \theta)^2]$.

Note: $MSE(\hat{\theta}) = V(\hat{\theta}) + [B(\hat{\theta})]^2$.

Proof:

Example: Suppose $E(\hat{\theta}_1) = E(\hat{\theta}_2) = \theta$, $V(\hat{\theta}_1) = \sigma_1^2$, $V(\hat{\theta}_2) = \sigma_2^2$. Consider $\hat{\theta}_3 = a\hat{\theta}_1 + (1 - a)\hat{\theta}_2$.
(a) Show that $\hat{\theta}_3$ is an unbiased estimator for $\theta$;
(b) If $\hat{\theta}_1$ and $\hat{\theta}_2$ are independent, how should the constant a be chosen to minimize the variance of $\hat{\theta}_3$?

Solution:

Examples of Unbiased Point Estimators

We denote by $\sigma^2_{\hat{\theta}}$ the variance of the sampling distribution of the estimator $\hat{\theta}$; $\sigma_{\hat{\theta}} = \sqrt{\sigma^2_{\hat{\theta}}}$ is called the standard error of the estimator.

Claim: Let $Y_1, \ldots, Y_n$ be a random sample of size n from a population with mean $\mu$ and variance $\sigma^2$. Then the sample variance $S^2 = \frac{1}{n-1}\sum_{i=1}^{n}(Y_i - \bar{Y})^2$ is an unbiased estimator of the population variance $\sigma^2$, but $\frac{1}{n}\sum_{i=1}^{n}(Y_i - \bar{Y})^2$ is a biased estimator of $\sigma^2$.

Proof:

Goodness of Point Estimator

Definition: The error of estimation $\varepsilon$ is the distance between an estimator and its target parameter, $\varepsilon = |\hat{\theta} - \theta|$.

Suppose $\hat{\theta}$ is an unbiased estimator of $\theta$ and has a sampling distribution. Select a number b and consider $P(|\hat{\theta} - \theta| < b)$.

Example: A sample of n = 1000 voters, randomly selected from a city, showed y = 560 in favor of candidate Jones. Estimate p, the fraction of voters in the population favouring Jones, and place a 2-standard-error bound on the error of estimation.

Solution:

Example: (#8.24) Results of a public opinion poll reported on the Internet indicated that 69% of respondents rated the cost of gasoline as a crisis or major problem. The article states that 1001 adults, age 18 or older, were interviewed and that the results have a sampling error of 3%. How was the 3% calculated, and how should it be interpreted? Can we conclude that a majority of the individuals in the 18+ age group felt that the cost of gasoline was a crisis or major problem?

Solution:

Confidence Intervals

A point estimate provides no information about the precision and reliability of estimation. For example, the sample mean $\bar{y}$ is a point estimate of the population mean $\mu$, but because of sampling variability, it is virtually never the case that $\bar{y} = \mu$.
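The 2-standard-error bound asked for in the voter example, and the 3% sampling error quoted in the gasoline poll, can both be sketched with the usual estimated standard error of a sample proportion, $\sqrt{\hat{p}(1-\hat{p})/n}$. This is a minimal sketch, not the notes' own solution: the function name `proportion_bound` is hypothetical, and replacing the unknown p by $\hat{p}$ inside the standard error is an approximation that is assumed to be acceptable for samples this large.

```python
import math


def proportion_bound(p_hat, n, k=2.0):
    """Return k estimated standard errors of a sample proportion.

    Assumption: the unknown p in sqrt(p(1-p)/n) is replaced by p_hat.
    """
    standard_error = math.sqrt(p_hat * (1.0 - p_hat) / n)
    return k * standard_error


# Voter example: y = 560 of n = 1000 favour Jones.
p_hat_voters = 560 / 1000
bound_voters = proportion_bound(p_hat_voters, 1000)

# Gasoline poll: 69% of n = 1001 respondents.
bound_gasoline = proportion_bound(0.69, 1001)

print(f"voters:   p_hat = {p_hat_voters:.2f}, 2-SE bound = {bound_voters:.4f}")
print(f"gasoline: 2-SE bound = {bound_gasoline:.4f}")
```

Both bounds come out at roughly 0.03, which is consistent with the 3% sampling error reported for the poll being a 2-standard-error bound.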
A point estimate says nothing about how close it might be to $\mu$. An alternative to reporting a single sensible value for the parameter being estimated is to calculate and report an entire interval of plausible values: a confidence interval (CI).

Properties of the interval:
- It contains the true parameter $\theta$;
- It is relatively narrow.

The upper and lower endpoints of a CI are called the upper and lower confidence limits. The probability that a CI will enclose $\theta$ is called the confidence coefficient, denoted by $1 - \alpha$.

Definition: A $100(1-\alpha)\%$ confidence interval for a parameter $\theta$ is a random interval $[\hat{\theta}_L, \hat{\theta}_U]$ such that $P(\hat{\theta}_L \le \theta \le \hat{\theta}_U) = 1 - \alpha$ regardless of the value of $\theta$.

A confidence level is a measure of the degree of reliability of a confidence interval. It is denoted as $100(1-\alpha)\%$. The most frequently used confidence levels are 90%, 95% and 99%. The higher the confidence level, the more strongly we believe that the true value of the parameter being estimated lies within the interval.

Deriving a Confidence Interval

Suppose $Y_1, \ldots, Y_n$ are a random sample and we observed the data $y_1, \ldots, y_n$, which are the realization of these random variables. We want a CI for some parameter $\theta$.

Pivotal method: To derive this CI we need to find another random variable, typically a function of the estimator of $\theta$, satisfying:
1) It depends on $Y_1, \ldots, Y_n$ and $\theta$.
2) Its probability distribution does not depend on $\theta$ or any other unknown parameter.
Such a random variable is called a "pivot".

Example: Suppose we are to obtain a single observation $Y \sim \text{Exp}(\theta)$. Use Y to form a CI for $\theta$ with confidence coefficient 0.90, or 90% confidence level.

Solution:

Example: [density of Y missing] Show that [quantity missing] is a pivotal quantity. Use it to find a 90% lower confidence limit for $\theta$.

Solution:

Large-Sample Confidence Intervals

Example: Let $\hat{\theta}$ be a statistic with $\hat{\theta} \sim N(\theta, \sigma^2_{\hat{\theta}})$. Find a confidence interval for $\theta$ with a confidence coefficient $1 - \alpha$.

Solution:

Example: (#8.56) In a survey of n = 800 randomly chosen adults, 45% indicated that movies were getting better whereas 43% indicated that movies were getting worse.
(a) Find a 98% CI for p, the overall proportion of adults who say that movies are getting better.
(b) Does the interval include the value p = 0.50? Do you think that a majority of adults say that movies are getting better?

Solution:

Width and Precision of CI:

The precision of an interval is conveyed by the width of the interval. If the confidence level is high and the resulting interval is quite narrow, the interval is more precise (i.e., our knowledge of the value of the parameter is reasonably precise). A very wide CI implies that there is a great deal of uncertainty concerning the value of the parameter we are estimating.

Note: Confidence intervals do not need to be central; any a and b that solve $P\left(a \le \frac{\bar{Y} - \mu}{\sigma/\sqrt{n}} \le b\right) = 1 - \alpha$ define a $100(1-\alpha)\%$ CI for the population mean $\mu$.

Example: The National Student Loan Survey collected data about the amount of money that borrowers owe. The survey selected a random sample of 1280 borrowers who began repayment of their loans four to six months prior to the study. The mean debt for the selected borrowers was $18,900 and the standard deviation was $49,000. Find a 95% CI for the mean debt for all borrowers.

Solution:

Interval Estimation of Variability

In many cases we will be interested in making inference about the population variance.

Theorem: Let $Y_1, \ldots, Y_n$ be a random sample from a normal distribution with mean $\mu$ and variance $\sigma^2$. Then $\frac{(n-1)S^2}{\sigma^2} = \frac{1}{\sigma^2}\sum_{i=1}^{n}(Y_i - \bar{Y})^2 \sim \chi^2_{(n-1)}$.

Proof:

Now let's derive a CI for $\sigma^2$:

Example: An experimenter wanted to check the variability of measurements obtained by using equipment designed to measure the volume of an audio source. Three independent measurements recorded by this equipment for the same sound were 4.1, 5.2, and 10.2. Estimate $\sigma^2$ with confidence coefficient 0.90.

Solution:

The t distribution

Definition: Let Z be a standard normal random variable and let X be an independent chi-squared random variable with n degrees of freedom. The random variable $T = \frac{Z}{\sqrt{X/n}}$ is said to follow a t distribution with n degrees of freedom.
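The interval computations asked for in the movie survey, the student loan survey and the audio-equipment example can be sketched numerically as follows. This is a hedged sketch under stated assumptions, not the notes' own solutions. The two large-sample intervals use the normal approximation $\hat{\theta} \pm z_{\alpha/2}\,\sigma_{\hat{\theta}}$ with the standard error estimated from the sample. The variance interval uses $\left(\frac{(n-1)s^2}{\chi^2_{\alpha/2}}, \frac{(n-1)s^2}{\chi^2_{1-\alpha/2}}\right)$ and relies on the fact that a chi-squared distribution with 2 degrees of freedom is exponential with mean 2, so its quantiles have a closed form; it therefore only handles n = 3. All function names are hypothetical.

```python
import math
import statistics
from statistics import NormalDist


def z_interval(estimate, standard_error, conf):
    """Large-sample CI: estimate +/- z_{alpha/2} * standard error."""
    z = NormalDist().inv_cdf(1.0 - (1.0 - conf) / 2.0)
    half_width = z * standard_error
    return estimate - half_width, estimate + half_width


def chi2_quantile_2df(p):
    """Quantile of chi-squared with 2 df, using F(x) = 1 - exp(-x/2)."""
    return -2.0 * math.log(1.0 - p)


def variance_interval_n3(data, conf):
    """CI for sigma^2 from exactly three normal observations (2 df)."""
    if len(data) != 3:
        raise ValueError("closed-form quantile is only valid for n = 3")
    alpha = 1.0 - conf
    sum_sq = 2 * statistics.variance(data)  # (n - 1) * s^2
    return (sum_sq / chi2_quantile_2df(1.0 - alpha / 2.0),
            sum_sq / chi2_quantile_2df(alpha / 2.0))


# Movie survey: p_hat = 0.45, n = 800, 98% confidence.
movie_ci = z_interval(0.45, math.sqrt(0.45 * 0.55 / 800), 0.98)

# Loan survey: mean 18900, s = 49000, n = 1280, 95% confidence
# (s is used in place of the unknown sigma; n is large).
loan_ci = z_interval(18900, 49000 / math.sqrt(1280), 0.95)

# Audio equipment: three measurements, 90% confidence for sigma^2.
audio_ci = variance_interval_n3([4.1, 5.2, 10.2], 0.90)

print("movies:", movie_ci)
print("loans: ", loan_ci)
print("audio: ", audio_ci)
```

The movie interval runs from roughly 0.41 to 0.49, so it does not include p = 0.50. The variance interval is very wide, which reflects how little information three measurements carry about $\sigma^2$.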
Theorem: Let $Y_1, \ldots, Y_n$ be a random sample from a normal distribution with mean $\mu$ and variance $\sigma^2$. Then $\frac{\bar{Y} - \mu}{S/\sqrt{n}} \sim t_{(n-1)}$.

Proof:

CI for μ when σ is unknown

Suppose $Y_1, \ldots, Y_n$ are a random sample from a normal distribution with mean $\mu$ and variance $\sigma^2$, where both $\mu$ and $\sigma$ are unknown. If $\sigma$ is unknown, we can estimate it by $S$ and use the $t_{(n-1)}$ distribution. A $100(1-\alpha)\%$ confidence interval for $\mu$ in this case is $\bar{Y} \pm t_{\alpha/2,\,n-1}\,\frac{S}{\sqrt{n}}$.

Example: A manufacturer of gunpowder has developed a new powder, which was tested in 8 shells. The resulting muzzle velocities (ft/sec) were:

3005 3925 2935 2965 2995 3005 2939 2905

Find a 95% CI for the true average velocity for shells of this type. Assume that the velocities are approximately normally distributed.

Solution:
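The t-based interval $\bar{Y} \pm t_{\alpha/2,\,n-1}\,S/\sqrt{n}$ can be sketched in a few lines. This is a minimal illustration rather than the notes' solution: the function name `t_interval` is hypothetical, the critical value $t_{0.025,\,7} = 2.365$ is taken from a standard t table instead of being computed, and the velocities are used exactly as printed in the example above.

```python
import math
import statistics


def t_interval(data, t_crit):
    """CI for a normal mean with unknown sigma.

    `t_crit` is the upper alpha/2 critical value of the t distribution
    with len(data) - 1 degrees of freedom, supplied by the caller.
    """
    n = len(data)
    mean = statistics.fmean(data)
    s = statistics.stdev(data)  # sample sd, divisor n - 1
    half_width = t_crit * s / math.sqrt(n)
    return mean - half_width, mean + half_width


# Muzzle velocities (ft/sec) as printed in the example.
velocities = [3005, 3925, 2935, 2965, 2995, 3005, 2939, 2905]

# Table value for 95% confidence with 8 - 1 = 7 degrees of freedom.
T_CRIT_7DF_95 = 2.365

lower, upper = t_interval(velocities, T_CRIT_7DF_95)
print(f"95% CI for the mean velocity: ({lower:.1f}, {upper:.1f})")
```

Passing the critical value in as an argument keeps the sketch free of third-party dependencies; with a statistics library available, the table lookup could be replaced by a quantile function for the t distribution.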