EML 4550: Engineering Design Methods
Probability and Statistics
in Engineering Design
Class Notes
Probability & Statistics for Engineers & Scientists,
by Walpole, Myers, Myers and Ye
EML4550 2007
Sampling Distributions
• Estimation of population (infinite or very large) properties based on sample
(limited ensemble) information
• Sampling distribution of the mean $\bar{y}$
• Note: we denote the population mean by $\mu$ and the population
standard deviation by $\sigma$
• Central limit theorem:
If $\bar{y}$ is the mean of a random sample of size $n$ taken from a population
(not necessarily normal) with mean $\mu$ and finite variance $\sigma^2$, then
the limiting form of the variable
$$z = \frac{\bar{y} - \mu}{\sigma/\sqrt{n}}$$
as $n \to \infty$ is the standard normal z-distribution.
(Note: use $n \ge 30$ if the original population is not normal.)
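The theorem can be checked numerically. The sketch below is an illustration only and is not part of the original notes: it assumes a Uniform(0, 1) population (clearly not normal), a sample size of 30, and a fixed random seed, then checks how often the standardized mean falls within ±1.96, which should happen about 95% of the time if $z$ is close to standard normal.

```python
import random
from statistics import fmean

def standardized_sample_means(n, trials, seed=0):
    """Return z = (y_bar - mu) / (sigma / sqrt(n)) for repeated samples.

    Assumption for illustration: the population is Uniform(0, 1).
    """
    rng = random.Random(seed)
    mu = 0.5                    # mean of Uniform(0, 1)
    sigma = (1 / 12) ** 0.5     # standard deviation of Uniform(0, 1)
    z_values = []
    for _ in range(trials):
        sample = [rng.random() for _ in range(n)]
        y_bar = fmean(sample)
        z_values.append((y_bar - mu) / (sigma / n ** 0.5))
    return z_values

z_values = standardized_sample_means(n=30, trials=5000)
fraction_within = sum(abs(z) < 1.96 for z in z_values) / len(z_values)
print(f"fraction with |z| < 1.96: {fraction_within:.3f}")
```

Changing the population or the sample size in this sketch is a quick way to see how fast the normal approximation takes hold.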
Example
• An engineering process is producing ball bearings that have an average
diameter of 1 cm and a standard deviation of 0.05 cm. Find the probability
that a random sample of 64 ball bearings will have an average diameter of
less than 0.99 cm.
$$z = \frac{\bar{y} - \mu}{\sigma/\sqrt{n}} = \frac{0.99 - 1}{0.05/\sqrt{64}} = -1.6$$
$$\Pr(\bar{y} < 0.99) = \Pr(z < -1.6) = 0.0548$$
With a confidence level of 99%, what range do you expect for the average of a
random sample of 36 ball bearings?
A 99% probability on a normal z-distribution leaves, by symmetry, 0.005 in each tail,
so we look up the z-value for 0.005 in the table: $z^* = -2.575$.
$$-2.575 < z = \frac{\bar{y} - \mu}{\sigma/\sqrt{n}} < 2.575$$
$$0.979 \text{ cm} < \bar{y} < 1.021 \text{ cm}$$
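As a cross-check of the table lookups, the same two results can be sketched with Python's standard-library `statistics.NormalDist`. The variable names are chosen here for illustration; the numbers are those of the example.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma = 1.0, 0.05        # population mean and standard deviation (cm)
std_normal = NormalDist()    # standard normal z-distribution

# Part 1: Pr(y_bar < 0.99) for a sample of n = 64
z = (0.99 - mu) / (sigma / sqrt(64))
p_below = std_normal.cdf(z)

# Part 2: central 99% range of y_bar for a sample of n = 36
z_star = std_normal.inv_cdf(0.995)   # leaves 0.005 in the upper tail
half_width = z_star * sigma / sqrt(36)
lower, upper = mu - half_width, mu + half_width

print(f"z = {z:.2f}, Pr(y_bar < 0.99) = {p_below:.4f}")
print(f"99% range: {lower:.3f} cm < y_bar < {upper:.3f} cm")
```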
Example
What is the probability that an elevator will overload when 12 randomly
selected people ride at the same time? The average weight of the population is
$\mu = 170$ lbs, the standard deviation is $\sigma = 30$ lbs, and the capacity of the
elevator is rated at 2400 lbs.
The elevator overloads when the average weight exceeds $\bar{y} = 2400/12 = 200$ lbs:
$$z = \frac{\bar{y} - \mu}{\sigma/\sqrt{n}} = \frac{200 - 170}{30/\sqrt{12}} = 3.46$$
According to the central limit theorem, $z$ behaves like a standard normal variable:
$$\Pr(\bar{y} > 200) = \Pr(z > 3.46) = 0.00027 = 0.027\%$$
There is a very low probability that 12 people riding the elevator at the same time
will have an average weight of 200 lbs or more.
Note: this assumes a random sample from the population, excluding conditions
such as members of a football team, a sumo wrestler training camp, etc.
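The overload probability can be reproduced in a few lines. This is a minimal sketch using the standard-library `NormalDist`; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma = 170.0, 30.0      # population mean and standard deviation (lbs)
n = 12                       # number of riders
capacity = 2400.0            # rated capacity (lbs)

y_bar_limit = capacity / n   # average weight at which the elevator overloads
z = (y_bar_limit - mu) / (sigma / sqrt(n))
p_overload = 1 - NormalDist().cdf(z)   # upper-tail probability Pr(z > 3.46)

print(f"z = {z:.2f}, Pr(overload) = {p_overload:.5f}")
```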
Sampling Distribution of $s^2$
Theorem:
If $s^2$ is the variance of a random sample of size $n$ taken from a normal population
having the variance $\sigma^2$, then the variable
$$\chi^2 = \frac{(n-1)s^2}{\sigma^2} = \sum_{i=1}^{n} \frac{(y_i - \bar{y})^2}{\sigma^2}$$
has a chi-squared distribution with $n - 1$ degrees of freedom.
$\alpha$ is the probability that the sample produces a $\chi^2$ value
greater than a specific value $\chi^2_\alpha$.
[Figure: upper-tail area $\alpha$ beyond $\chi^2_\alpha$]
Chi-squared distribution
Chi-square table
As indicated in the table, a result is significant if $\chi^2$ is large for the given degrees of freedom.
For example, with two DOF there is only a 5% probability (a 1-in-20 chance) that $\chi^2$ is
greater than 5.99. A value above 5.99 therefore suggests that some specification
(hypothesis) is not correct. For values whose tail probability is greater than 10%,
one can usually accept the hypothesis (there is no strong contradiction).
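The two-DOF entry can be verified without a table, because for exactly 2 degrees of freedom the chi-squared upper-tail probability has the closed form $\Pr(\chi^2 > x) = e^{-x/2}$. The sketch below applies only to that special case; the function name is chosen here for illustration.

```python
from math import exp, log

def chi2_tail_2dof(x):
    """Upper-tail probability Pr(chi^2 > x), valid for 2 degrees of freedom only."""
    return exp(-x / 2)

tail = chi2_tail_2dof(5.99)        # should be close to 0.05
critical_5pct = -2 * log(0.05)     # value exceeded with 5% probability

print(f"Pr(chi^2 > 5.99) = {tail:.4f}")
print(f"5% critical value = {critical_5pct:.2f}")
```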
Example: check hypothesis
• A manufacturer claims that it produces hard drives that have an
average life of 10 years with a standard deviation of 1 year. If six of these
drives have lifetimes of 9.8, 10.5, 12, 8.8, 11.5, 10.2 years, can we
believe the company's claim based on these samples? Assume a normally
distributed population.
$$\bar{y} = 10.47, \quad s^2 = \frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n-1} = 1.343$$
$$\chi^2 = \frac{(n-1)s^2}{\sigma^2} = \frac{5(1.343)}{1^2} = 6.715$$
Check the chi-square table with 5 degrees of freedom (one DOF is lost in obtaining
the average). One expects most (90%, between the 5% and 95% points) of the $\chi^2$ values
to fall between 1.14 and 11.07. The observed value of 6.715 lies inside this range, so
the company's claim is plausible (at least, there is not enough evidence to dispute it).
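The arithmetic of this check can be sketched with the standard-library `statistics` module. The bounds 1.14 and 11.07 are the table values quoted above, entered as constants rather than computed; the variable names are illustrative.

```python
from statistics import mean, variance

lifetimes = [9.8, 10.5, 12, 8.8, 11.5, 10.2]   # years, from the example
claimed_sigma = 1.0                            # claimed standard deviation (years)

n = len(lifetimes)
y_bar = mean(lifetimes)
s2 = variance(lifetimes)                       # sample variance, divides by n - 1
chi2 = (n - 1) * s2 / claimed_sigma ** 2

# 5% and 95% points for 5 DOF, taken from the chi-square table in the notes
chi2_low, chi2_high = 1.14, 11.07
consistent = chi2_low < chi2 < chi2_high

print(f"y_bar = {y_bar:.2f}, s^2 = {s2:.3f}, chi^2 = {chi2:.3f}")
print("claim not contradicted" if consistent else "claim is doubtful")
```

The unrounded statistic differs from 6.715 in the third decimal place because the notes round $s^2$ to 1.343 before multiplying.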
Example: estimate variance
• A sample set of steel rods is measured to have the following lengths: 46.4, 45.8, 46.1, 46.9,
45.7, 46.0, 45.9 cm. Assuming the entire stock has a normal distribution, estimate the mean
and variance of the entire stock at a confidence level of 90%.
$$\bar{y} = \frac{1}{n}\sum_{i=1}^{7} y_i = 46.11$$
(estimate $\mu$ on your own using the Student t-distribution)
$$s^2 = \frac{1}{n-1}\sum_{i=1}^{7} (y_i - \bar{y})^2 = 0.171$$
To have a 90% confidence level, we would like to have a probability of 90%
that a sample will fall within the given range. Due to symmetry, we expect that
5% will fall below and 5% will exceed the specified range.
From the chi-square table, for six DOF: $\chi^2_{0.05} = 12.59$, $\chi^2_{0.95} = 1.63$.
Therefore, at the 90% confidence level for $\sigma^2$:
$$\chi^2_{0.95} = 1.63 < \frac{(n-1)s^2}{\sigma^2} < \chi^2_{0.05} = 12.59$$
Invert the inequality and multiply everything by $(n-1)s^2$:
$$\frac{(6)(0.171)}{1.63} = 0.629 > \sigma^2 > 0.081 = \frac{(6)(0.171)}{12.59}$$
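The interval for $\sigma^2$ can be reproduced as follows. This is a sketch only: the critical values 12.59 and 1.63 are the table values from the notes, hard-coded as constants, and the variable names are illustrative.

```python
from statistics import mean, variance

lengths = [46.4, 45.8, 46.1, 46.9, 45.7, 46.0, 45.9]   # cm, from the example

n = len(lengths)
y_bar = mean(lengths)
s2 = variance(lengths)            # sample variance, divides by n - 1

# Chi-square table values for 6 DOF quoted in the notes
chi2_upper_5pct = 12.59           # exceeded with probability 0.05
chi2_lower_5pct = 1.63            # exceeded with probability 0.95

# Inverting chi2_lower < (n-1) s^2 / sigma^2 < chi2_upper for sigma^2
var_low = (n - 1) * s2 / chi2_upper_5pct
var_high = (n - 1) * s2 / chi2_lower_5pct

print(f"y_bar = {y_bar:.2f} cm, s^2 = {s2:.3f} cm^2")
print(f"90% interval: {var_low:.3f} < sigma^2 < {var_high:.3f} cm^2")
```

The upper bound comes out slightly above 0.629 because the code does not round $s^2$ to 0.171 first.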
Student t-distribution
• The central limit theorem is useful when the standard deviation $\sigma$ of a population is known.
However, knowledge about $\sigma$ usually is not available. Therefore, an estimate of $\sigma$ (the
sample standard deviation $s$) must be used when estimating the population mean $\mu$. A new
variable is defined to handle this situation:
$$T = \frac{\bar{y} - \mu}{s/\sqrt{n}}$$
• If the sample size is large, one would expect that $s \approx \sigma$ and the T-variable follows
closely a standard normal distribution as defined in the central limit theorem. The
t-distribution is important when the sample size is small ($n < 30$).
It is interesting to explore the relationship between a normal z-variable and
the T-variable:
$$T = \frac{(\bar{y} - \mu)/(\sigma/\sqrt{n})}{\sqrt{s^2/\sigma^2}} = \frac{z}{\sqrt{V/(n-1)}}$$
where $V = \dfrac{(n-1)s^2}{\sigma^2}$ has a chi-squared distribution as defined earlier.
t-distribution
Example: estimation based on small sample size
• A manufacturer collects 8 samples of the years to failure for a newly designed
advanced engine: 15.4, 14.7, 18.1, 16.5, 17.2, 13.5, 15.8, 18.0. It is believed
that the lifetimes of all engines are normally distributed with an unknown standard
deviation. Therefore, we cannot apply the central limit theorem to estimate the average
lifetime of this engine; the t-distribution is used instead. For a 90% confidence
level, estimate the average lifetime of the given engine.
Choose one-tail 0.05 or two-tail 0.10 (1 - 0.9 = 0.1) as the starting point in the t-table.
The DOF is 8 - 1 = 7, which gives a critical value of 1.895.
That is, with 90% probability the T-variable falls within this range:
$$-1.895 < T = \frac{\bar{y} - \mu}{s/\sqrt{n}} < 1.895 \;\Rightarrow\; -1.895 < \frac{\mu - \bar{y}}{s/\sqrt{n}} < 1.895$$
$$\bar{y} - 1.895\, s/\sqrt{n} < \mu < \bar{y} + 1.895\, s/\sqrt{n}; \quad \bar{y} = 16.15, \; s = 1.62$$
$$15.07 < \mu < 17.23 \quad \text{or} \quad \mu = 16.15 \pm 1.08 \text{ (years)}$$
with a 90% confidence level.
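The interval above can be recomputed from the raw data. In this sketch the critical value 1.895 (7 DOF, one-tail 0.05) is the t-table value from the notes, hard-coded as a constant; the variable names are illustrative.

```python
from math import sqrt
from statistics import mean, stdev

lifetimes = [15.4, 14.7, 18.1, 16.5, 17.2, 13.5, 15.8, 18.0]   # years

n = len(lifetimes)
y_bar = mean(lifetimes)
s = stdev(lifetimes)              # sample standard deviation, divides by n - 1

t_crit = 1.895                    # t-table value for 7 DOF, one-tail 0.05
half_width = t_crit * s / sqrt(n)
lower, upper = y_bar - half_width, y_bar + half_width

print(f"y_bar = {y_bar:.2f}, s = {s:.2f}")
print(f"90% interval: {lower:.2f} < mu < {upper:.2f} years")
```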