Statistical Review
• We will be working with two types of
probability distributions:
• Discrete distributions
– If the random variable of interest can take a
countable number of values (e.g., number of
defects) it is modeled with a discrete
distribution.
– We will use two such distributions
• Binomial Distribution
• Poisson Distribution
• Continuous distributions
– If the random variable of interest can take any
value in a continuous range (e.g., the diameter of a
machined part) it is modeled with a continuous
distribution.
– We will use one such distribution
• Normal Distribution
The Binomial Distribution
• If the random variable of interest can take
one of two values (e.g., heads or tails,
defective or not defective) the binomial
distribution is appropriate.
• The binomial distribution is described by
two parameters:
p = probability of a success on a given trial
n = the number of trials
• If we denote A as the number of successes
in n trials then A is said to have a binomial
distribution with:
mean = E[A] = np
Variance[A] = np(1-p)
Standard Deviation [A] = np(1-p)
The Excel Function for the
Binomial Distribution is
BINOMDIST
• BINOMDIST(A,n,p,cumulative)
• A is the number of successes in n trials.
• n is the number of independent trials.
• p is the probability of success on each
trial.
• Cumulative is a logical value that
determines the form of the function. If
cumulative is TRUE, then BINOMDIST
returns the cumulative distribution
function, which is the probability that there
are A successes or less; if FALSE, it
returns the probability mass function,
which is the probability that there are A
successes.
Examples
• A manufacturing process is estimated to
produce 5% nonconforming items. If a
random sample of 5 items is chosen:
– What is the probability of getting 2
nonconforming items in the sample?
– What is the probability of getting between 1 and
3 nonconforming items?
– What is the probability of getting one or more
nonconforming items?
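The three questions above can be checked with a short Python sketch. The helper names binom_pmf and binom_cdf are illustrative (they are not Excel or library functions), and "between 1 and 3" is assumed to include both endpoints.

```python
from math import comb

def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p); plays the role of BINOMDIST(k, n, p, FALSE)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def binom_cdf(k, n, p):
    """P(X <= k); plays the role of BINOMDIST(k, n, p, TRUE)."""
    return sum(binom_pmf(i, n, p) for i in range(k + 1))

n, p = 5, 0.05  # sample size and fraction nonconforming from the example

p_exactly_2 = binom_pmf(2, n, p)
p_1_to_3 = binom_cdf(3, n, p) - binom_cdf(0, n, p)  # assumes 1 and 3 are included
p_one_or_more = 1 - binom_pmf(0, n, p)

print(f"P(X = 2)       = {p_exactly_2:.6f}")
print(f"P(1 <= X <= 3) = {p_1_to_3:.6f}")
print(f"P(X >= 1)      = {p_one_or_more:.6f}")
```

In Excel the same quantities would come from BINOMDIST(2,5,0.05,FALSE), BINOMDIST(3,5,0.05,TRUE) - BINOMDIST(0,5,0.05,TRUE), and 1 - BINOMDIST(0,5,0.05,FALSE).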
Proportion Defective and the
Binomial Distribution
• We often use the proportion defective
rather than the number defective in SPC.
This is expressed as:
p(bar) = x/n
Where X follows a binomial distribution
with parameters p and n and x is an
observed value of X.
The mean of p(bar) is p and the variance is
p(1-p)/n.
The probability that p(bar) <= a equals the
probability that x <= na.
Example
• A process produces an average of 2%
defective units. What is the probability that
a sample of 10 will have more than 5%
defective?
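One way to work this example is to convert the proportion into a count, using the rule that p(bar) <= a is the same event as x <= na. The sketch below assumes that approach; the variable names are illustrative.

```python
from math import comb, floor

n, p = 10, 0.02   # sample size and process fraction defective
a = 0.05          # threshold on the sample proportion defective

# p(bar) <= a is the same event as x <= n*a; x is an integer count,
# so the largest qualifying count is floor(n*a), which is 0 here.
max_count = floor(n * a)

p_at_most = sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(max_count + 1))
p_more_than = 1 - p_at_most

print(f"P(p(bar) > {a}) = P(X > {max_count}) = {p_more_than:.6f}")
```

With n = 10, more than 5% defective simply means at least one defective unit, so the answer reduces to 1 - 0.98^10.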
The Poisson Distribution
• As n becomes large and p becomes small
(with np held fixed) the binomial
distribution approaches the Poisson
distribution. Therefore, the Poisson
distribution is often used to model the
number of defects within a product (e.g.,
where there is a potential for a large
number of defects).
• The Poisson distribution is described by
one parameter:
λ = the average number of defects
• If we denote A as the number of defects on
a given product then A is said to have a
Poisson distribution with:
mean = E[A] = λ
Variance[A] = λ
Standard Deviation [A] = sqrt(λ)
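The link between the binomial and Poisson distributions can be illustrated numerically. The sketch below holds the mean np at 10 while n grows, and compares the two probabilities of exactly 6 events; the values of n and the helper names are chosen only for illustration.

```python
from math import comb, exp, factorial

def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam)."""
    return exp(-lam) * lam**k / factorial(k)

lam, k = 10, 6   # mean held fixed; probability evaluated at 6 events
errors = []
for n in (20, 100, 1000):
    p = lam / n  # shrink p as n grows so that n*p stays equal to lam
    err = abs(binom_pmf(k, n, p) - poisson_pmf(k, lam))
    errors.append(err)
    print(f"n = {n:5d}, p = {p:.3f}, |binomial - Poisson| = {err:.6f}")
```

The gap shrinks as n increases, which is the sense in which the Poisson distribution is the limiting case.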
The Excel Function for the
Poisson Distribution is
POISSON
• POISSON(A,λ,cumulative)
• A is the number of events.
• λ is the mean.
• Cumulative is a logical value that
determines the form of the probability
distribution returned. If cumulative is
TRUE, POISSON returns the cumulative
Poisson probability that the number of
random events occurring will be between
zero and A inclusive; if FALSE, it returns
the Poisson probability mass function that
the number of events occurring will be
exactly A.
Examples
• The average number of defects in a
computer produced by an assembly
process is known to be 10.
• What is the probability of finding 6
defects?
• What is the probability of finding between
2 and 12 defects?
• What is the probability of finding fewer than
3 defects?
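A Python sketch of these three calculations follows. The helper names are illustrative, and "between 2 and 12" is assumed to include both endpoints.

```python
from math import exp, factorial

def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam); plays the role of POISSON(k, lam, FALSE)."""
    return exp(-lam) * lam**k / factorial(k)

def poisson_cdf(k, lam):
    """P(X <= k); plays the role of POISSON(k, lam, TRUE)."""
    return sum(poisson_pmf(i, lam) for i in range(k + 1))

lam = 10  # average number of defects per computer

p_exactly_6 = poisson_pmf(6, lam)
p_2_to_12 = poisson_cdf(12, lam) - poisson_cdf(1, lam)  # assumes 2 and 12 are included
p_less_than_3 = poisson_cdf(2, lam)                     # 0, 1 or 2 defects

print(f"P(X = 6)        = {p_exactly_6:.6f}")
print(f"P(2 <= X <= 12) = {p_2_to_12:.6f}")
print(f"P(X < 3)        = {p_less_than_3:.6f}")
```

The Excel equivalents would be POISSON(6,10,FALSE), POISSON(12,10,TRUE) - POISSON(1,10,TRUE), and POISSON(2,10,TRUE).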
The Normal Distribution
• The normal distribution has two
parameters:
μ = mean
σ² = variance
σ = standard deviation
A number of functions are available in Excel
for working with the normal distribution:
NORMDIST
NORMSDIST
NORMINV
NORMSINV
NORMDIST
• NORMDIST(x,mean,standard_dev,cumulative)
• X is the value for which you want the
distribution.
• Mean is the arithmetic mean of the
distribution.
• Standard_dev is the standard deviation of
the distribution.
• Cumulative is a logical value that
determines the form of the function. If
cumulative is TRUE, NORMDIST returns
the cumulative distribution function; if
FALSE, it returns the probability density
function.
• NORMDIST(42,40,1.5,TRUE) equals
0.908789
NORMSDIST
• NORMSDIST(z)
• Z is the value for which you want the
distribution.
• NORMSDIST(1.333333) equals 0.908789
NORMINV
• NORMINV(probability,mean,standard_dev)
• Probability is a probability corresponding
to the normal distribution.
• Mean is the arithmetic mean of the
distribution.
• Standard_dev is the standard deviation of
the distribution.
• NORMINV(0.908789,40,1.5) equals 42
NORMSINV
• NORMSINV(probability)
• Probability is a probability corresponding
to the normal distribution.
• NORMSINV(0.908789) equals 1.3333
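The four Excel examples above can be reproduced outside Excel. The sketch below uses NormalDist from Python's standard statistics module (Python 3.8 or later); the variable names are illustrative.

```python
from statistics import NormalDist

dist = NormalDist(mu=40, sigma=1.5)   # same mean and standard deviation as the examples
standard = NormalDist()               # standard normal: mean 0, standard deviation 1

cdf_at_42 = dist.cdf(42)              # counterpart of NORMDIST(42,40,1.5,TRUE)
z = (42 - 40) / 1.5                   # standardized value, about 1.3333
cdf_at_z = standard.cdf(z)            # counterpart of NORMSDIST(1.333333)

x_back = dist.inv_cdf(cdf_at_42)      # counterpart of NORMINV(0.908789,40,1.5)
z_back = standard.inv_cdf(cdf_at_z)   # counterpart of NORMSINV(0.908789)

print(f"cdf at 42 = {cdf_at_42:.6f}, cdf at z = {cdf_at_z:.6f}")
print(f"inverse gives x = {x_back:.4f}, z = {z_back:.4f}")
```

NORMDIST and NORMSDIST return the same probability because 42 lies 1.3333 standard deviations above the mean of 40; the inverse functions undo that mapping.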
Enumerative vs. Analytical
• Enumerative Studies: Statistical investigations that lead
to action on static populations (e.g., calculate income
rates by area)
– Time specific and static
– There is no reference to the future
• Analytic Studies: Statistical investigations that lead to
action on dynamic populations (e.g., why productivity is
low and how can it be increased?)
– If a 100% sample of the population answers the question
under investigation, the study is enumerative; otherwise
the study is analytic
– Focuses on causes of patterns and variations that take place
over different areas, over periods of time, etc.
– Focus is on the future not the present
– Since future process output does not exist, it can not be part
of the population.
Sampling Distributions
• Sample statistics are used to draw
conclusions about population parameters
• The sample mean is used to draw
conclusions about the population mean
• The sample variance is used to draw
conclusions about the population variance
• The behavior of these statistics over
repeated samples is referred to as the
sampling distribution of the statistic.
Control Charts are
Representations of Sampling
Distributions
• The center line on a control chart is an
estimate of the mean of the sampling
distribution
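– The mean of the sampling distribution of the
sample mean equals the population mean, so it
is a good point estimate of the mean of the
population. The central limit theorem adds that
this sampling distribution is approximately
normal.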
• Interval Estimation
– An interval estimate (or confidence interval) is
defined by two endpoints such that the
probability of the parameter of interest being
contained in the interval is of some value (e.g.,
99%)
– The control limits on a control chart are an
example of an interval estimate. They are a
function of the point estimates of the mean and
standard deviation of the sampling distribution.
Example
• The mean of the sampling distribution of
the width of a machined shaft is estimated
to be 10 inches with a standard deviation
of 0.003 inches. Construct a confidence
interval such that you would expect 95%
of the means of samples of shafts to be
contained within its limits.
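A sketch of this interval in Python, assuming the 0.003 inches is the standard deviation of the sampling distribution of the mean and that sample means are normally distributed. The variable names are illustrative.

```python
from statistics import NormalDist

center = 10.0        # estimated mean of the sampling distribution (inches)
sigma_xbar = 0.003   # assumed standard deviation of the sample means (inches)
confidence = 0.95

# Two-sided interval: leave (1 - confidence)/2 in each tail of the normal curve.
z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)

lower = center - z * sigma_xbar
upper = center + z * sigma_xbar

print(f"z = {z:.4f}")
print(f"95% interval: ({lower:.5f}, {upper:.5f}) inches")
```

Control limits follow the same pattern; the common 3-sigma limits correspond to replacing z with 3.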
Hypothesis Testing
• Each time we plot a value on a control
chart we are testing a hypothesis.
• Classical hypothesis testing involves 4
steps:
– Formulate the null and alternative hypotheses
– Determine the test statistic
– Determine the rejection region of the null
hypothesis based on a chosen level of
significance, α
– Make a decision.
Formulating the Null
Hypothesis
• In any hypothesis testing problem there are
two hypotheses
– The null hypothesis, Ho , is the proposition
being tested.
– The alternative hypothesis, Ha , is formulated as
a contradiction to the null hypothesis.
• Example: We expect the mean length of a
shaft to be 30 mm. We are interested in
determining whether the mean length of a
sample of shafts differs from 30 mm:
Ho: μ = 30 mm
Ha: μ ≠ 30 mm
Determining the Test Statistic
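• There are many possible test statistics
depending on assumptions made and the
parameter being tested. We will use the
following statistic to test our hypothesis
that the sample mean does not differ from
the population mean:
$$z_o = \frac{\bar{X} - \mu_o}{\sigma_{\bar{X}}}$$
where \bar{X} is the sample mean, \mu_o is the
hypothesized population mean, and \sigma_{\bar{X}} is
the standard deviation of the sample mean.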
Determining the Rejection
Region
• The rejection region is determined by the
choice of α, the significance of the test.
Assuming the sample means are
distributed normally about the population
mean (central limit theorem), the normal
distribution is referenced to determine
which points constitute a (1 - α)
confidence interval about the population
mean. The area beyond these points is
considered the rejection region.
Make a Decision
• If the test statistic falls in the rejection
region: Reject Ho
• If the test statistic does not fall in the
rejection region: Fail to reject Ho
• Example: Suppose the estimate of the
population standard deviation is 2 mm and
a sample produces a mean of 25 mm. At
the 0.05 level of significance what do you
conclude?
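The four steps can be traced in a short Python sketch for this example. No sample size is given, so the sketch assumes the 2 mm figure is used directly as the standard deviation of the sample mean; the variable names are illustrative.

```python
from statistics import NormalDist

mu_0 = 30.0        # hypothesized mean under Ho (mm)
sigma_xbar = 2.0   # assumed standard deviation of the sample mean (mm)
x_bar = 25.0       # observed sample mean (mm)
alpha = 0.05       # level of significance

# Test statistic: distance of the sample mean from mu_0 in standard deviations.
z_0 = (x_bar - mu_0) / sigma_xbar

# Two-sided rejection region: |z| beyond the point leaving alpha/2 in each tail.
z_crit = NormalDist().inv_cdf(1 - alpha / 2)

reject = abs(z_0) > z_crit
print(f"z_0 = {z_0:.2f}, critical value = +/-{z_crit:.3f}")
print("Reject Ho" if reject else "Fail to reject Ho")
```

Under that assumption the test statistic is -2.5, which lies beyond the critical value of about 1.96, so Ho is rejected at the 0.05 level.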
Errors in Hypothesis Testing
• There are two types of errors in hypothesis
testing
– A Type I error is rejecting the null hypothesis
when it is actually true. Its probability is the
level of significance, α.
What is the type I error associated with the
hypothesis we tested?
What are the implications to quality control?
– A Type II error is failing to reject the null
hypothesis when it is actually false. Its
probability is commonly denoted β.
What is the type II error associated with the
hypothesis we tested?
What are the implications to quality control?
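Both error probabilities can be computed for the shaft test with a small Python sketch. The Type II probability depends on what the true mean actually is, which the example does not state, so the value of 25 mm below is purely hypothetical. As before, the 2 mm figure is assumed to be the standard deviation of the sample mean.

```python
from statistics import NormalDist

mu_0, sigma_xbar, alpha = 30.0, 2.0, 0.05

# Region where Ho is not rejected: mu_0 +/- z_crit standard deviations.
z_crit = NormalDist().inv_cdf(1 - alpha / 2)
lower = mu_0 - z_crit * sigma_xbar
upper = mu_0 + z_crit * sigma_xbar

# Type I: Ho is true, yet the sample mean falls outside (lower, upper).
null_dist = NormalDist(mu_0, sigma_xbar)
type_1 = null_dist.cdf(lower) + (1 - null_dist.cdf(upper))

# Type II: the true mean differs (hypothetical value), yet the sample
# mean still falls inside (lower, upper) and Ho is not rejected.
true_mean = 25.0  # hypothetical, not given in the example
alt_dist = NormalDist(true_mean, sigma_xbar)
type_2 = alt_dist.cdf(upper) - alt_dist.cdf(lower)

print(f"P(Type I)  = {type_1:.4f}")
print(f"P(Type II) = {type_2:.4f} if the true mean is {true_mean} mm")
```

In quality control terms, a Type I error is a false alarm on an in-control process, and a Type II error is a missed shift; widening the limits lowers the first and raises the second.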