Parameter, Statistic and Random Samples
• A parameter is a number that describes the population. It is a fixed
number, but in practice we do not know its value.
• A statistic is a function of the sample data, i.e., it is a quantity
whose value can be calculated from the sample data. It is a random
variable with a distribution function. Statistics are used to make
inference about unknown population parameters.
• The random variables X1, X2,…, Xn are said to form a (simple)
random sample of size n if the Xi's are independent random
variables and each Xi has the same probability distribution. We say
that the Xi's are i.i.d.
STA286 week 8
Example – Sample Mean and Variance
• Suppose X1, X2,…, Xn is a random sample of size n from a population
with mean μ and variance σ².
• The sample mean is defined as
$$\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i.$$
• The sample variance is defined as
$$S^2 = \frac{1}{n-1}\sum_{i=1}^{n}\left(X_i - \bar{X}\right)^2.$$
• The sample standard deviation, S, is the square root of the sample
variance.
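The definitions above can be sketched in a few lines of Python. The data values are hypothetical and chosen only for illustration. The standard library's `statistics.variance` uses the same n − 1 divisor, so it serves as a cross-check.

```python
import math
import statistics

def sample_mean(xs):
    """Sample mean: the sum of the observations divided by n."""
    return sum(xs) / len(xs)

def sample_variance(xs):
    """Sample variance S^2 with the n - 1 divisor."""
    n = len(xs)
    xbar = sample_mean(xs)
    return sum((x - xbar) ** 2 for x in xs) / (n - 1)

# Hypothetical data, not taken from the slides.
data = [2.0, 4.0, 4.0, 5.0, 7.0]

xbar = sample_mean(data)
s2 = sample_variance(data)
s = math.sqrt(s2)  # sample standard deviation S

print(xbar, s2, s)
print(statistics.variance(data))  # library cross-check of S^2
```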
Quantiles
• A quantile of a sample, xp, is the value for which a specific fraction,
p, of the data values is less than or equal to it, and (1-p) is greater
than it.
• The best-known quantile is the median, which is the 50th percentile
(the 0.5 quantile).
• Quantiles are often described as percentiles and represent an
estimate of a characteristic of the theoretical distribution.
• If a data set contains n observations, then the pth percentile is the
$\frac{p}{100}(n+1)$th value in the ordered data set (interpolating
between neighbouring values when this position is not a whole number).
• We can describe the spread or variability of a distribution by giving
several percentiles.
Quartiles
• The 25th percentile is called the first quartile (Q1).
• The 75th percentile is called the third quartile (Q3).
• Note that the median is the second quartile, Q2.
• The distance between the first and third quartiles is called the
interquartile range (IQR), i.e. IQR = Q3 - Q1.
• The IQR is another measure of spread that is less sensitive to the
influence of extreme values.
The five-number summary
• The five-number summary of a set of observations consists of
the smallest observation, the first quartile, the median, the
third quartile and the largest observation.
• These five numbers give a reasonably complete description of
both the center and the spread of the distribution.
• MINITAB commands: Stat > Basic Statistics > Display
Descriptive Statistics
Example
• The highway mileages of 20 cars, arranged in increasing order are:
13 15 16 16 17 19 20 22 23 23 23 24 25 25 26 28 28 28 29 32.
Give the five number summary.
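• Answer
We have min = 13, Q1 = 18, median = 23, Q3 = 27, max = 32.
Here Q1 and Q3 are the medians of the lower and upper halves of the
ordered data; software that interpolates at position (n + 1)p/100
reports 17.5 and 27.5 instead.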
• The MINITAB output using the above commands is as follows:
Variable    N    Minimum    Q1       Median    Q3       Maximum
mileage     20   13.00      17.50    23.00     27.50    32.00
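The sketch below reproduces both sets of quartiles for the mileage data. The helper `percentile` is a hypothetical function written for this illustration. It applies the (n + 1)p/100 position rule with linear interpolation, and the clamping at the ends of the data is an assumption of this sketch. The "halves" quartiles are the medians of the lower and upper halves of the ordered data.

```python
from statistics import median

def percentile(data, p):
    """pth percentile via the (n + 1) * p / 100 position rule,
    interpolating linearly between neighbouring order statistics."""
    xs = sorted(data)
    n = len(xs)
    pos = p / 100 * (n + 1)      # 1-based position in the ordered data
    if pos <= 1:                 # clamp below the smallest observation
        return xs[0]
    if pos >= n:                 # clamp above the largest observation
        return xs[-1]
    k = int(pos)                 # whole part of the position
    frac = pos - k               # fractional part used for interpolation
    return xs[k - 1] + frac * (xs[k] - xs[k - 1])

mileage = [13, 15, 16, 16, 17, 19, 20, 22, 23, 23,
           23, 24, 25, 25, 26, 28, 28, 28, 29, 32]

# Five-number summary using the interpolation rule.
five_number = (min(mileage), percentile(mileage, 25), percentile(mileage, 50),
               percentile(mileage, 75), max(mileage))
iqr = five_number[3] - five_number[1]

# Quartiles as medians of the lower and upper halves of the ordered data.
xs = sorted(mileage)
half = len(xs) // 2
q1_halves = median(xs[:half])
q3_halves = median(xs[-half:])

print(five_number, iqr)
print(q1_halves, q3_halves)
```

Both conventions are in common use, so a quartile should be reported together with the rule that produced it.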
Box-plot
• A box-plot is a graph of the five-number summary.
• Example:
Make a box-plot for the data in the above example.
[Figure: Boxplot of Mileages; vertical axis: Mileages, with ticks from 15 to 30]
• MINITAB commands: Graph > Boxplot
Quantile Plots
• A quantile plot is a plot of the data values on the vertical axis
against an empirical assessment of the fraction of observations
exceeded by the data value….
• A very useful quantile plot is the Normal-Quantile-Quantile plot. It
is often used by analysts to determine whether a data set came from
a normal distribution.
• A Normal Quantile Quantile plot is a plot of the empirical (data)
quantiles against the corresponding quantiles of the normal
distribution…
Interpreting Normal Quantile Plots
• If the data come from any normal distribution, the NQQ plot
produces an approximately straight line.
• If the points on a normal quantile plot lie close to a straight line,
the plot indicates that the data are approximately normal.
• Systematic deviations from a straight line indicate a nonnormal
distribution.
• Outliers appear as points that are far away from the overall pattern
of the plot.
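The points of a normal quantile plot can be computed without any plotting software. The sketch below is an illustration only. It pairs each ordered observation with a normal score taken at the plotting position (i − 0.5)/n, which is one of several conventions and an assumption here. It then summarises straightness with the correlation between the two coordinates. The simulated samples and the seed are hypothetical.

```python
import random
from math import sqrt
from statistics import NormalDist

def normal_scores(n):
    """Standard normal quantiles at plotting positions (i - 0.5) / n."""
    std = NormalDist()
    return [std.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]

def qq_points(data):
    """(normal score, ordered data value) pairs for a normal quantile plot."""
    xs = sorted(data)
    return list(zip(normal_scores(len(xs)), xs))

def qq_correlation(data):
    """Pearson correlation of the plot's coordinates; near 1 means near-straight."""
    pts = qq_points(data)
    n = len(pts)
    mz = sum(z for z, _ in pts) / n
    mx = sum(x for _, x in pts) / n
    szx = sum((z - mz) * (x - mx) for z, x in pts)
    szz = sum((z - mz) ** 2 for z, _ in pts)
    sxx = sum((x - mx) ** 2 for _, x in pts)
    return szx / sqrt(szz * sxx)

rng = random.Random(0)  # fixed seed so the illustration is repeatable
normal_sample = [rng.gauss(500, 20) for _ in range(200)]
skewed_sample = [rng.expovariate(1.0) for _ in range(200)]  # right skewed

r_normal = qq_correlation(normal_sample)
r_skewed = qq_correlation(skewed_sample)
print(r_normal, r_skewed)
```

The correlation is only a rough numerical companion to the plot. The shape of any systematic deviation is still best judged by eye.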
• Histogram, the nscores plot and the normal quantile plot for data
generated from a normal distribution (N(500, 20)).
[Figure: histogram of value (Frequency against value); plot of value against nscores; Normal Probability Plot for value (Percent against Data), with ML estimates Mean: 500.343, StDev: 17.4618]
• Histogram, the nscores plots and the normal quantile plot for
data generated from a right skewed distribution
[Figure: histogram of value (Frequency against value); plot of value against nscores; plot of nscores against value; Normal Probability Plot for value (Percent against Data), with ML estimates Mean: 2.64938, StDev: 2.17848]
• Histogram, the nscores plots and the normal quantile plot for
data generated from a left skewed distribution
[Figure: histogram of value (Frequency against value); plot of value against nscore; plot of nscore against value; Normal Probability Plot for value (Percent against Data), with ML estimates Mean: 0.8102, StDev: 0.161648]
• Histogram, the nscores plots and the normal quantile plot for
data generated from a uniform distribution (0,5)
[Figure: histogram of value (Frequency against value); plot of value against nscores; plot of nscores against value; Normal Probability Plot for value (Percent against Data), with ML estimates Mean: 2.21603, StDev: 1.46678]
Sampling Distribution of a Statistic
• The sampling distribution of a statistic is the distribution of values
taken by the statistic in all possible samples of the same size from
the same population.
• The distribution function of a statistic is NOT the same as the
distribution of the original population that generated the original
sample.
• The form of the theoretical sampling distribution of a statistic will
depend upon the distribution of the observable random variables in
the sample.
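A sampling distribution can be approximated by simulation: draw many samples of the same size, compute the statistic on each, and look at the resulting collection of values. The sketch below does this for the sample mean. The Exponential population with mean 1, the sample size, the number of repetitions and the seed are all assumptions chosen for illustration.

```python
import random
from math import sqrt
from statistics import mean, stdev

def simulate_sample_means(n, reps, rng):
    """Return `reps` sample means, each computed from an i.i.d. sample of
    size n drawn from an Exponential population with mean 1 and sd 1."""
    means = []
    for _ in range(reps):
        sample = [rng.expovariate(1.0) for _ in range(n)]
        means.append(sum(sample) / n)
    return means

rng = random.Random(17)  # fixed seed so the illustration is repeatable
n, reps = 25, 5000
xbars = simulate_sample_means(n, reps, rng)

centre = mean(xbars)   # close to the population mean, 1
spread = stdev(xbars)  # close to sigma / sqrt(n) = 1 / 5
print(centre, spread, 1 / sqrt(n))
```

The simulated means are far less spread out than the individual observations, which is one way to see that the distribution of a statistic differs from that of the population.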
Sampling from Normal population
• Often we assume the random sample X1, X2,…, Xn is from a normal
population with unknown mean μ and variance σ².
• Suppose we are interested in estimating μ and testing whether it is
equal to a certain value. For this we need to know the probability
distribution of the estimator of μ.
Sampling Distribution of Sample Mean
• Suppose X1, X2,…, Xn are i.i.d. normal random variables with
unknown mean μ and variance σ². Then
$$\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right).$$
• Proof:
The Central Limit Theorem
• Let X1, X2,… be a sequence of i.i.d. random variables with mean
E(Xi) = μ < ∞ and Var(Xi) = σ² < ∞. Let $S_n = \sum_{i=1}^{n} X_i$.
Then
$$Z_n = \frac{S_n - n\mu}{\sigma\sqrt{n}}$$
converges in distribution to Z ~ N(0,1).
• Also,
$$Z_n = \frac{\bar{X}_n - \mu}{\sigma/\sqrt{n}}$$
converges in distribution to Z ~ N(0,1).
• Example…
Example
Suppose that the weights of airline passengers are known to have a
distribution with a mean of 75kg and a std. dev. of 10kg. A certain
plane has a passenger weight capacity of 7700kg. What is the
probability that a flight of 100 passengers will exceed the capacity?
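One way to approach this question is the Central Limit Theorem applied to the total weight of the 100 passengers. The sketch below assumes the passengers' weights are independent, which the problem statement does not say explicitly, and uses the normal approximation for the total.

```python
from math import sqrt
from statistics import NormalDist

mu = 75.0          # mean passenger weight (kg)
sigma = 10.0       # standard deviation of passenger weight (kg)
n = 100            # number of passengers
capacity = 7700.0  # passenger weight capacity (kg)

# By the CLT, the total weight S_n is approximately N(n*mu, n*sigma^2).
total_mean = n * mu
total_sd = sigma * sqrt(n)

# Standardise the capacity and take the upper-tail probability.
z = (capacity - total_mean) / total_sd
p_exceed = 1 - NormalDist().cdf(z)

print(z, p_exceed)
```

Under these assumptions the capacity sits two standard deviations above the expected total, so the probability of exceeding it is a little over 2%.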
Question
State whether the following statements are true or false.
(i) As the sample size increases, the mean of the sampling
distribution of the sample mean $\bar{X}$ decreases.
(ii) As the sample size increases, the standard deviation of the
sampling distribution of the sample mean $\bar{X}$ decreases.
(iii) The mean $\bar{X}$ of a random sample of size 4 from a negatively
skewed distribution is approximately normally distributed.
(iv) The distribution of the proportion of successes $\bar{X}$ in a
sufficiently large sample is approximately normal with mean p
and standard deviation $\sqrt{np(1-p)}$ where p is the population
proportion and n is the sample size.
(v) If $\bar{X}$ is the mean of a simple random sample of size 9 from
N(500, 18) distribution, then $\bar{X}$ has a normal distribution with
mean 500 and variance 36.
Question
State whether the following statements are true or false.
o A large sample from a skewed population will have an
approximately normal shaped histogram.
o The mean of a population will be normally distributed if the
population is quite large.
o The average blood cholesterol level recorded in a SRS of 100
students from a large population will be approximately
normally distributed.
o The proportion of people with incomes over $200 000, in a
SRS of 10 people, selected from all Canadian income tax filers
will be approximately normal.
Exercise
A parking lot is patrolled twice a day (morning and afternoon).
In the morning, the chance that any particular spot has an
illegally parked car is 0.02. If the spot contained a car that was
ticketed in the morning, the probability the spot is also ticketed
in the afternoon is 0.1. If the spot was not ticketed in the
morning, there is a 0.005 chance the spot is ticketed in the
afternoon.
a) Suppose tickets cost $10. What is the expected value of the
tickets for a single spot in the parking lot?
b) Suppose the lot contains 400 spots. What is the distribution of
the value of the tickets for a day?
c) What is the probability that more than $200 worth of tickets
are written in a day?
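One possible line of attack is sketched below. It is not the official solution. It assumes that every illegally parked car found on a patrol is ticketed, that the 400 spots behave independently, and that the daily total can be approximated by a normal distribution without a continuity correction.

```python
from math import sqrt
from statistics import NormalDist

p_morning = 0.02             # P(ticket in the morning)
p_aft_given_ticket = 0.1     # P(afternoon ticket | morning ticket)
p_aft_given_none = 0.005     # P(afternoon ticket | no morning ticket)
fine = 10.0                  # dollars per ticket
spots = 400

# Distribution of the number of tickets N (0, 1 or 2) for a single spot.
p2 = p_morning * p_aft_given_ticket
p1 = p_morning * (1 - p_aft_given_ticket) + (1 - p_morning) * p_aft_given_none
p0 = (1 - p_morning) * (1 - p_aft_given_none)

mean_n = 1 * p1 + 2 * p2
var_n = 1 * p1 + 4 * p2 - mean_n ** 2

# a) Expected ticket value for one spot.
expected_per_spot = fine * mean_n

# b) Approximate distribution of the daily total over independent spots.
total_mean = spots * expected_per_spot
total_sd = sqrt(spots * fine ** 2 * var_n)

# c) Normal approximation to P(total > $200).
p_over_200 = 1 - NormalDist(total_mean, total_sd).cdf(200.0)

print(expected_per_spot, total_mean, total_sd, p_over_200)
```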
Law of Large Numbers - Example
• Toss a coin n times.
• Suppose
$$X_i = \begin{cases} 1 & \text{if the } i\text{th toss came up H} \\ 0 & \text{if the } i\text{th toss came up T} \end{cases}$$
• Xi's are Bernoulli random variables with p = ½ and E(Xi) = ½.
• The proportion of heads is $\bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i$.
• Intuitively, $\bar{X}_n$ approaches ½ as n → ∞.
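The coin-tossing intuition can be illustrated with a short simulation. The sketch below tosses a simulated fair coin n times for increasing n and reports the proportion of heads. The seed and the particular values of n are arbitrary choices.

```python
import random

def proportion_of_heads(n, rng):
    """Toss a fair coin n times and return the proportion of heads."""
    heads = sum(1 for _ in range(n) if rng.random() < 0.5)
    return heads / n

rng = random.Random(286)  # fixed seed so the illustration is repeatable
results = {n: proportion_of_heads(n, rng) for n in (10, 100, 1000, 10000, 100000)}

for n, prop in results.items():
    print(n, prop)
```

The proportions for small n can wander noticeably, while those for large n settle close to ½.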
Law of Large Numbers
• Consider a sequence of random variables X1, X2, X3,… that are
independent and identically distributed (i.i.d.).
Let
$$\bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i.$$
Suppose E(Xi) = μ, V(Xi) = σ², then
$$E\left(\bar{X}_n\right) = E\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right) = \frac{1}{n}\sum_{i=1}^{n} E\left(X_i\right) = \mu$$
and
$$V\left(\bar{X}_n\right) = V\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right) = \frac{1}{n^2}\sum_{i=1}^{n} V\left(X_i\right) = \frac{\sigma^2}{n}.$$
• Intuitively, as n → ∞, $V(\bar{X}_n) \to 0$, so $\bar{X}_n \to E(\bar{X}_n) = \mu$.
• Formally, the Weak Law of Large Numbers (WLLN) states the following:
• Suppose X1, X2, X3,… are i.i.d. with E(Xi) = μ < ∞, V(Xi) = σ² < ∞, then for
any positive number a
$$P\left(\left|\bar{X}_n - \mu\right| \ge a\right) \to 0$$
as n → ∞.
This is called Convergence in Probability.
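One standard route to this result, which the slides do not show, is Chebyshev's inequality applied to the sample mean. The sketch below uses only the facts that the mean of n i.i.d. observations has expectation μ and variance σ²/n.

```latex
\[
P\left(\left|\bar{X}_n - \mu\right| \ge a\right)
\;\le\; \frac{V\left(\bar{X}_n\right)}{a^2}
\;=\; \frac{\sigma^2}{n a^2}
\;\longrightarrow\; 0
\quad \text{as } n \to \infty .
\]
```

The bound also gives a crude sample-size guide: taking n ≥ σ²/(a²ε) makes the probability at most ε.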
Recall - The Chi Square distribution
• If Z ~ N(0,1) then X = Z² has a Chi-Square distribution with
parameter 1, i.e., $X \sim \chi^2_{(1)}$.
• This can be proved using the change of variable theorem for univariate
random variables.
• The moment generating function of X is
$$m_X(t) = \left(\frac{1}{1-2t}\right)^{1/2}, \quad t < \tfrac{1}{2}.$$
• If $X_1 \sim \chi^2_{(v_1)}, X_2 \sim \chi^2_{(v_2)}, \ldots, X_k \sim \chi^2_{(v_k)}$, all independent, then
$$T = \sum_{i=1}^{k} X_i \sim \chi^2_{\left(\sum_{i=1}^{k} v_i\right)}.$$
• Proof…
Claim
• Suppose X1, X2,…, Xn are i.i.d. normal random variables with mean μ
and variance σ². Then $Z_i = \frac{X_i - \mu}{\sigma}$, i = 1, 2, …, n, are
independent standard normal variables, and
$$\sum_{i=1}^{n} Z_i^2 = \sum_{i=1}^{n}\left(\frac{X_i - \mu}{\sigma}\right)^2 \sim \chi^2_{(n)}.$$
• Proof: …
Sampling Distribution of S²
• Suppose X1, X2,…, Xn are i.i.d. normal random variables with mean μ
and variance σ². Then,
$$\frac{(n-1)S^2}{\sigma^2} = \frac{1}{\sigma^2}\sum_{i=1}^{n}\left(X_i - \bar{X}\right)^2 \sim \chi^2_{(n-1)}.$$
• Further, it can be shown that $\bar{X}$ and S² are independent.
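The chi-square claim can be checked numerically, though a simulation is of course not a proof. A chi-square distribution with k degrees of freedom has mean k and variance 2k, so the simulated values of (n − 1)S²/σ² should have mean near n − 1 and variance near 2(n − 1). The values of n, μ, σ, the seed and the number of repetitions below are arbitrary choices.

```python
import random
from statistics import mean, variance

def scaled_sample_variance(n, mu, sigma, rng):
    """Draw one normal sample of size n and return (n - 1) * S^2 / sigma^2."""
    xs = [rng.gauss(mu, sigma) for _ in range(n)]
    return (n - 1) * variance(xs) / sigma ** 2  # variance() divides by n - 1

rng = random.Random(30)  # fixed seed so the illustration is repeatable
n, mu, sigma, reps = 10, 5.0, 2.0, 20000
values = [scaled_sample_variance(n, mu, sigma, rng) for _ in range(reps)]

sim_mean = mean(values)      # compare with n - 1 = 9
sim_var = variance(values)   # compare with 2 * (n - 1) = 18
print(sim_mean, sim_var)
```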
t distribution
• Suppose Z ~ N(0,1) independent of $X \sim \chi^2_{(v)}$. Then,
$$T = \frac{Z}{\sqrt{X/v}} \sim t_{(v)}.$$
• Proof: using one dimensional change of variables theorem.
• The density function of the t-distribution is given by…
Claim
• Suppose X1, X2,…, Xn are i.i.d. normal random variables with mean μ
and variance σ². Then,
$$\frac{\bar{X} - \mu}{S/\sqrt{n}} \sim t_{(n-1)}.$$
• Proof:
F distribution
• Suppose $X \sim \chi^2_{(n)}$ independent of $Y \sim \chi^2_{(m)}$. Then,
$$\frac{X/n}{Y/m} \sim F_{(n,m)}.$$
Y /m
• The density function of the F distribution is given by…
Properties of the F distribution
• The F-distribution is a right skewed distribution.
• $F_{(m,n)} = \frac{1}{F_{(n,m)}}$, i.e. for any a > 0,
$$P\left(F_{(n,m)} \ge a\right) = P\left(\frac{1}{F_{(n,m)}} \le \frac{1}{a}\right) = P\left(F_{(m,n)} \le \frac{1}{a}\right).$$
• Table A.6 in the appendix can be used to find percentiles of the F-distribution.
• Example…
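The reciprocal property can also be checked by simulation. The sketch below is an illustration only and does not use Table A.6. It builds F variates from independent chi-square variates, using the fact that a chi-square distribution with k degrees of freedom is a Gamma distribution with shape k/2 and scale 2. The degrees of freedom, the cut-off a, the seed and the number of repetitions are arbitrary choices.

```python
import random

def chi_square(df, rng):
    """Chi-square variate with df degrees of freedom (Gamma: shape df/2, scale 2)."""
    return rng.gammavariate(df / 2, 2.0)

def f_variate(n, m, rng):
    """F variate with (n, m) degrees of freedom from two independent chi-squares."""
    return (chi_square(n, rng) / n) / (chi_square(m, rng) / m)

rng = random.Random(34)  # fixed seed so the illustration is repeatable
n, m, a, reps = 5, 8, 2.5, 20000

# Estimate P(F(n,m) >= a) and P(F(m,n) <= 1/a) from separate simulations.
p_upper = sum(f_variate(n, m, rng) >= a for _ in range(reps)) / reps
p_lower = sum(f_variate(m, n, rng) <= 1 / a for _ in range(reps)) / reps

print(p_upper, p_lower)
```

The two estimates should agree up to simulation error, which is why tables typically list only upper-tail percentiles.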