Parameter, Statistic and Random Samples
• A parameter is a number that describes the population. It is a fixed number, but in practice we do not know its value.
• A statistic is a function of the sample data, i.e., it is a quantity whose value can be calculated from the sample data. It is a random variable with a distribution function. Statistics are used to make inferences about unknown population parameters.
• The random variables $X_1, X_2, \ldots, X_n$ are said to form a (simple) random sample of size n if the $X_i$'s are independent random variables and each $X_i$ has the same probability distribution. We say that the $X_i$'s are i.i.d.

STA286 week 8

Example – Sample Mean and Variance
• Suppose $X_1, X_2, \ldots, X_n$ is a random sample of size n from a population with mean $\mu$ and variance $\sigma^2$.
• The sample mean is defined as
$$\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i.$$
• The sample variance is defined as
$$S^2 = \frac{1}{n-1}\sum_{i=1}^{n} \left(X_i - \bar{X}\right)^2.$$
• The sample standard deviation, S, is the square root of the sample variance.

Quantiles
• A quantile of a sample, $x_p$, is a value for which a specified fraction p of the data values is less than or equal to it and a fraction (1 − p) is greater than it.
• The best-known quantile is the median, which is the 50th percentile (the 0.5 quantile).
• Quantiles are often described as percentiles; a sample quantile is an estimate of a characteristic of the theoretical distribution.
• If a data set contains n observations, then the pth percentile is the $\frac{p}{100}(n+1)$th value in the ordered data set.
• We can describe the spread or variability of a distribution by giving several percentiles.

Quartiles
• The 25th percentile is called the first quartile (Q1).
• The 75th percentile is called the third quartile (Q3).
• Note that the median is the second quartile (Q2).
• The distance between the first and third quartiles is called the interquartile range (IQR), i.e., IQR = Q3 − Q1.
• The IQR is another measure of spread that is less sensitive to the influence of extreme values.
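
The definitions above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the function names and the small data set are hypothetical, and linear interpolation between neighbouring ordered values when the position $\frac{p}{100}(n+1)$ is not a whole number is an assumption made here.

```python
import math

def sample_mean(xs):
    """Sample mean: (1/n) * sum of the observations."""
    return sum(xs) / len(xs)

def sample_variance(xs):
    """Sample variance with the (n - 1) divisor."""
    xbar = sample_mean(xs)
    return sum((x - xbar) ** 2 for x in xs) / (len(xs) - 1)

def percentile(xs, p):
    """p-th percentile taken at position (n + 1) * p / 100 (1-based) in the
    ordered data. Assumption: fractional positions are interpolated linearly,
    and positions outside 1..n are clamped to the smallest/largest value."""
    ordered = sorted(xs)
    n = len(ordered)
    pos = min(max((n + 1) * p / 100, 1), n)
    lower = math.floor(pos)
    if lower >= n:
        return ordered[-1]
    frac = pos - lower
    return ordered[lower - 1] + frac * (ordered[lower] - ordered[lower - 1])

# Hypothetical data set of n = 7 observations.
data = [2, 4, 4, 5, 7, 9, 11]

q1, q2, q3 = (percentile(data, p) for p in (25, 50, 75))
print("sample mean     :", sample_mean(data))             # 6.0
print("sample variance :", sample_variance(data))         # 10.0
print("sample std. dev.:", math.sqrt(sample_variance(data)))
print("Q1, median, Q3  :", q1, q2, q3)                    # 4.0 5.0 9.0
print("IQR             :", q3 - q1)                       # 5.0
```

With n = 7 the quartile positions (n + 1)p/100 are 2, 4 and 6, so no interpolation is needed for this particular data set.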
The five-number summary
• The five-number summary of a set of observations consists of the smallest observation, the first quartile, the median, the third quartile and the largest observation.
• These five numbers give a reasonably complete description of both the center and the spread of the distribution.
• MINITAB commands: Stat > Basic Statistics > Display Descriptive Statistics

Example
• The highway mileages of 20 cars, arranged in increasing order, are: 13 15 16 16 17 19 20 22 23 23 23 24 25 25 26 28 28 28 29 32. Give the five-number summary.
• Answer: We have min = 13, Q1 = 18, median = 23, Q3 = 27, max = 32. Here Q1 and Q3 are the medians of the lower and upper halves of the ordered data.
• The MINITAB output using the above commands is as follows:

Variable   N   Minimum     Q1   Median     Q3   Maximum
mileage   20     13.00  17.50    23.00  27.50     32.00

• The MINITAB quartiles 17.50 and 27.50 correspond to interpolating at positions (n+1)/4 = 5.25 and 3(n+1)/4 = 15.75 in the ordered data, which is why they differ slightly from the hand calculation.

Box-plot
• A box-plot is a graph of the five-number summary.
• Example: Make a box-plot for the data in the above example.
[Figure: Boxplot of Mileages; vertical axis: Mileages]
• MINITAB commands: Graph > Boxplot

Quantile Plots
• A quantile plot is a plot of the data values on the vertical axis against an empirical assessment of the fraction of observations exceeded by the data value…
• A very useful quantile plot is the Normal Quantile-Quantile (NQQ) plot. It is often used by analysts to judge whether a data set came from a normal distribution.
• A Normal Quantile-Quantile plot is a plot of the empirical (data) quantiles against the corresponding quantiles of the normal distribution…

Interpreting Normal Quantile Plots
• If the data come from a normal distribution (with any mean and variance), the points on the NQQ plot fall approximately on a straight line.
• If the points on a normal quantile plot lie close to a straight line, the plot indicates that the data are consistent with a normal distribution.
• Systematic deviations from a straight line indicate a non-normal distribution.
• Outliers appear as points that are far away from the overall pattern of the plot.
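
As a rough illustration of how the points on a normal quantile plot can be obtained, the sketch below pairs the ordered highway mileages from the example above with standard normal quantiles. It is not the MINITAB procedure from the slides: the plotting positions (i − 0.375)/(n + 0.25) are one common convention and are an assumption here, as are the helper names. The correlation between the two coordinates is used only as an informal measure of how straight the plot is.

```python
import math
from statistics import NormalDist

# Highway mileages of the 20 cars from the five-number-summary example.
mileage = [13, 15, 16, 16, 17, 19, 20, 22, 23, 23,
           23, 24, 25, 25, 26, 28, 28, 28, 29, 32]

def normal_scores(n):
    """Standard normal quantiles at the plotting positions
    (i - 0.375) / (n + 0.25), i = 1..n (an assumed convention)."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    return [std_normal.inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]

def correlation(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

ordered = sorted(mileage)
scores = normal_scores(len(ordered))

# Each (normal score, data value) pair is one point on the plot.
for z, x in zip(scores, ordered):
    print(f"{z:6.2f}  {x}")

# A value close to 1 means the points lie close to a straight line.
print(f"correlation between normal scores and data: {correlation(scores, ordered):.3f}")
```

Plotting the printed pairs (normal score on one axis, mileage on the other) gives the normal quantile plot; systematic curvature in those points would suggest skewness.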
• Histogram, the nscores plot and the normal quantile plot for data generated from a normal distribution (N(500, 20)).
[Figures: histogram (Frequency vs. value); plot of value vs. nscores; Normal Probability Plot for value (Percent vs. Data), ML Estimates: Mean 500.343, StDev 17.4618]

• Histogram, the nscores plots and the normal quantile plot for data generated from a right-skewed distribution.
[Figures: histogram (Frequency vs. value); plots of value vs. nscores and nscores vs. value; Normal Probability Plot for value (Percent vs. Data), ML Estimates: Mean 2.64938, StDev 2.17848]

• Histogram, the nscores plots and the normal quantile plot for data generated from a left-skewed distribution.
[Figures: histogram (Frequency vs. value); plots of value vs. nscores and nscores vs. value; Normal Probability Plot for value (Percent vs. Data), ML Estimates: Mean 0.8102, StDev 0.161648]

• Histogram, the nscores plots and the normal quantile plot for data generated from a uniform distribution on (0, 5).
[Figures: histogram (Frequency vs. value); plots of value vs. nscores and nscores vs. value; Normal Probability Plot for value (Percent vs. Data), ML Estimates: Mean 2.21603, StDev 1.46678]

Sampling Distribution of a Statistic
• The sampling distribution of a statistic is the distribution of values taken by the statistic in all possible samples of the same size from the same population.
• The distribution function of a statistic is NOT the same as the distribution of the original population that generated the original sample.
• The form of the theoretical sampling distribution of a statistic will depend upon the distribution of the observable random variables in the sample.

Sampling from Normal population
• Often we assume the random sample $X_1, X_2, \ldots, X_n$ is from a normal population with unknown mean $\mu$ and variance $\sigma^2$.
• Suppose we are interested in estimating $\mu$ and testing whether it is equal to a certain value. For this we need to know the probability distribution of the estimator of $\mu$.

Sampling Distribution of Sample Mean
• Suppose $X_1, X_2, \ldots, X_n$ are i.i.d. normal random variables with unknown mean $\mu$ and variance $\sigma^2$. Then
$$\bar{X} \sim N\!\left(\mu, \frac{\sigma^2}{n}\right).$$
• Proof:

The Central Limit Theorem
• Let $X_1, X_2, \ldots$ be a sequence of i.i.d. random variables with mean $E(X_i) = \mu < \infty$ and $\mathrm{Var}(X_i) = \sigma^2 < \infty$. Let
$$S_n = \sum_{i=1}^{n} X_i.$$
Then
$$Z_n = \frac{S_n - n\mu}{\sigma\sqrt{n}}$$
converges in distribution to Z ~ N(0,1).
• Also,
$$Z_n = \frac{\bar{X}_n - \mu}{\sigma/\sqrt{n}}$$
converges in distribution to Z ~ N(0,1).
• Example…

Example
Suppose that the weights of airline passengers are known to have a distribution with a mean of 75 kg and a standard deviation of 10 kg. A certain plane has a passenger weight capacity of 7700 kg. What is the probability that a flight of 100 passengers will exceed the capacity?

Question
State whether the following statements are true or false.
(i) As the sample size increases, the mean of the sampling distribution of the sample mean $\bar{X}$ decreases.
(ii) As the sample size increases, the standard deviation of the sampling distribution of the sample mean $\bar{X}$ decreases.
(iii) The mean $\bar{X}$ of a random sample of size 4 from a negatively skewed distribution is approximately normally distributed.
(iv) The distribution of the proportion of successes $\bar{X}$ in a sufficiently large sample is approximately normal with mean p and standard deviation $\sqrt{np(1-p)}$, where p is the population proportion and n is the sample size.
(v) If $\bar{X}$ is the mean of a simple random sample of size 9 from a N(500, 18) distribution, then $\bar{X}$ has a normal distribution with mean 500 and variance 36.

Question
State whether the following statements are true or false.
o A large sample from a skewed population will have an approximately normal-shaped histogram.
o The mean of a population will be normally distributed if the population is quite large.
o The average blood cholesterol level recorded in an SRS of 100 students from a large population will be approximately normally distributed.
o The proportion of people with incomes over $200 000, in an SRS of 10 people selected from all Canadian income tax filers, will be approximately normal.

Exercise
A parking lot is patrolled twice a day (morning and afternoon). In the morning, the chance that any particular spot has an illegally parked car is 0.02. If the spot contained a car that was ticketed in the morning, the probability the spot is also ticketed in the afternoon is 0.1. If the spot was not ticketed in the morning, there is a 0.005 chance the spot is ticketed in the afternoon.
a) Suppose tickets cost $10. What is the expected value of the tickets for a single spot in the parking lot?
b) Suppose the lot contains 400 spots. What is the distribution of the value of the tickets for a day?
c) What is the probability that more than $200 worth of tickets are written in a day?

Law of Large Numbers - Example
• Toss a coin n times.
• Suppose
$$X_i = \begin{cases} 1 & \text{if the } i\text{th toss came up H} \\ 0 & \text{if the } i\text{th toss came up T} \end{cases}$$
• The $X_i$'s are Bernoulli random variables with p = ½ and $E(X_i)$ = ½.
• The proportion of heads is
$$\bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i.$$
• Intuitively, $\bar{X}_n$ approaches ½ as n → ∞.
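
The coin-tossing intuition can be checked with a short simulation. This is a minimal sketch, not part of the original slides: the function name, the seed and the chosen sample sizes are arbitrary, and a pseudo-random generator stands in for a fair coin.

```python
import random

def proportion_of_heads(n, rng):
    """Toss a simulated fair coin n times and return the proportion of
    heads, i.e. the sample mean of the Bernoulli(1/2) indicators X_i."""
    heads = sum(1 for _ in range(n) if rng.random() < 0.5)
    return heads / n

rng = random.Random(8)  # fixed seed so that the run is repeatable

for n in (10, 100, 1_000, 10_000, 100_000):
    p_hat = proportion_of_heads(n, rng)
    print(f"n = {n:>6}: proportion of heads = {p_hat:.4f}, "
          f"distance from 1/2 = {abs(p_hat - 0.5):.4f}")
```

For small n the proportion can be far from ½; as n grows, the distance from ½ typically shrinks, although it need not decrease at every step.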
Law of Large Numbers
• We are interested in a sequence of random variables $X_1, X_2, X_3, \ldots$ that are independent and identically distributed (i.i.d.). Let
$$\bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i.$$
Suppose $E(X_i) = \mu$ and $V(X_i) = \sigma^2$. Then
$$E\left(\bar{X}_n\right) = E\!\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right) = \frac{1}{n}\sum_{i=1}^{n} E(X_i) = \mu$$
and
$$V\left(\bar{X}_n\right) = V\!\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right) = \frac{1}{n^2}\sum_{i=1}^{n} V(X_i) = \frac{\sigma^2}{n}.$$
• Intuitively, as n → ∞, $V(\bar{X}_n) \to 0$, so $\bar{X}_n \to E(\bar{X}_n) = \mu$.

• Formally, the Weak Law of Large Numbers (WLLN) states the following:
• Suppose $X_1, X_2, X_3, \ldots$ are i.i.d. with $E(X_i) = \mu < \infty$ and $V(X_i) = \sigma^2 < \infty$. Then for any positive number a,
$$P\left(\left|\bar{X}_n - \mu\right| \ge a\right) \to 0 \quad \text{as } n \to \infty.$$
This is called convergence in probability.

Recall - The Chi Square distribution
• If Z ~ N(0,1) then $X = Z^2$ has a Chi-Square distribution with parameter 1, i.e., $X \sim \chi^2_{(1)}$.
• This can be proved using the change of variable theorem for univariate random variables.
• The moment generating function of X is
$$m_X(t) = \left(\frac{1}{1-2t}\right)^{1/2}, \quad t < \tfrac{1}{2}.$$
• If $X_1 \sim \chi^2_{(v_1)}, X_2 \sim \chi^2_{(v_2)}, \ldots, X_k \sim \chi^2_{(v_k)}$, all independent, then
$$T = \sum_{i=1}^{k} X_i \sim \chi^2_{\left(\sum_{i=1}^{k} v_i\right)}.$$
• Proof…

Claim
• Suppose $X_1, X_2, \ldots, X_n$ are i.i.d. normal random variables with mean $\mu$ and variance $\sigma^2$. Then $Z_i = \frac{X_i - \mu}{\sigma}$, i = 1, 2, …, n, are independent standard normal variables, and
$$\sum_{i=1}^{n} Z_i^2 = \sum_{i=1}^{n} \left(\frac{X_i - \mu}{\sigma}\right)^2 \sim \chi^2_{(n)}.$$
• Proof: …

Sampling Distribution of S2
• Suppose $X_1, X_2, \ldots, X_n$ are i.i.d. normal random variables with mean $\mu$ and variance $\sigma^2$. Then
$$\frac{(n-1)S^2}{\sigma^2} = \frac{1}{\sigma^2}\sum_{i=1}^{n} \left(X_i - \bar{X}\right)^2 \sim \chi^2_{(n-1)}.$$
• Further, it can be shown that $\bar{X}$ and $S^2$ are independent.

t distribution
• Suppose Z ~ N(0,1) is independent of $X \sim \chi^2_{(v)}$. Then
$$T = \frac{Z}{\sqrt{X/v}} \sim t_{(v)}.$$
• Proof: using the one-dimensional change of variables theorem.
• The density function of the t-distribution is given by…

Claim
• Suppose $X_1, X_2, \ldots, X_n$ are i.i.d. normal random variables with mean $\mu$ and variance $\sigma^2$. Then
$$\frac{\bar{X} - \mu}{S/\sqrt{n}} \sim t_{(n-1)}.$$
• Proof:

F distribution
• Suppose $X \sim \chi^2_{(n)}$ is independent of $Y \sim \chi^2_{(m)}$.
Then,
$$\frac{X/n}{Y/m} \sim F_{(n,m)}.$$
• The density function of the F distribution is given by…

Properties of the F distribution
• The F-distribution is a right-skewed distribution.
• If $F \sim F_{(n,m)}$ then $\frac{1}{F} \sim F_{(m,n)}$, i.e., for a > 0,
$$P\left(F_{(n,m)} \ge a\right) = P\!\left(\frac{1}{F_{(n,m)}} \le \frac{1}{a}\right) = P\!\left(F_{(m,n)} \le \frac{1}{a}\right).$$
• Table A.6 in the appendix can be used to find percentiles of the F-distribution.
• Example…
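
The reciprocal property can be checked informally by simulation. The sketch below is an illustration under stated assumptions, not a method from the slides: chi-square variables are built as sums of squared standard normals, F ratios are formed as in the definition above, and the degrees of freedom (5 and 10), the seed, the number of draws and all function names are arbitrary choices. If the property holds, the 90th percentile of F(5, 10) should be close to the reciprocal of the 10th percentile of F(10, 5).

```python
import random

def chi_square_draw(df, rng):
    """One chi-square(df) draw, built as a sum of df squared N(0,1) draws."""
    return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(df))

def f_draw(n, m, rng):
    """One F(n, m) draw: (X/n) / (Y/m) with independent chi-squares X, Y."""
    return (chi_square_draw(n, rng) / n) / (chi_square_draw(m, rng) / m)

def empirical_quantile(values, p):
    """Crude empirical p-quantile: the order statistic at index floor(p * N)."""
    ordered = sorted(values)
    index = min(int(p * len(ordered)), len(ordered) - 1)
    return ordered[index]

rng = random.Random(286)  # fixed seed so that the run is repeatable
draws = 20_000

f_5_10 = [f_draw(5, 10, rng) for _ in range(draws)]
f_10_5 = [f_draw(10, 5, rng) for _ in range(draws)]

upper = empirical_quantile(f_5_10, 0.90)  # a with P(F(5,10) >= a) about 0.10
lower = empirical_quantile(f_10_5, 0.10)  # b with P(F(10,5) <= b) about 0.10

print(f"90th percentile of F(5,10)     : {upper:.3f}")
print(f"1 / 10th percentile of F(10,5) : {1 / lower:.3f}")
```

The two printed values should agree up to simulation error. This is also why F tables usually list only upper-tail percentiles: lower-tail percentiles follow from the reciprocal relation with the degrees of freedom swapped.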