
Statistical Review
• We will be working with two types of probability distributions:
• Discrete distributions
– If the random variable of interest can take a countable number of values (e.g., number of defects), it is modeled with a discrete distribution.
– We will use two such distributions:
• Binomial Distribution
• Poisson Distribution
• Continuous distributions
– If the random variable of interest can take any value within a continuous range (e.g., the diameter of a machined part), it is modeled with a continuous distribution.
– We will use one such distribution:
• Normal Distribution

The Binomial Distribution
• If each trial can result in one of two outcomes (e.g., heads or tails, defective or not defective), the binomial distribution is appropriate for the number of successes.
• The binomial distribution is described by two parameters:
p = the probability of a success on a given trial
n = the number of trials
• If we denote A as the number of successes in n trials, then A is said to have a binomial distribution with:
Mean = E[A] = $np$
Variance[A] = $np(1-p)$
Standard Deviation[A] = $\sqrt{np(1-p)}$

The Excel Function for the Binomial Distribution is BINOMDIST
• BINOMDIST(A,n,p,cumulative)
• A is the number of successes in n trials.
• n is the number of independent trials.
• p is the probability of success on each trial.
• Cumulative is a logical value that determines the form of the function. If cumulative is TRUE, BINOMDIST returns the cumulative distribution function, which is the probability that there are A or fewer successes; if FALSE, it returns the probability mass function, which is the probability that there are exactly A successes.

Examples
• A manufacturing process is estimated to produce 5% nonconforming items. If a random sample of 5 items is chosen:
– What is the probability of getting 2 nonconforming items in the sample?
– What is the probability of getting between 1 and 3 nonconforming items?
– What is the probability of getting one or more nonconforming items?
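The three binomial questions above can be checked outside Excel. The following is a minimal Python sketch, not part of the original slides: the helper names `binom_pmf` and `binom_cdf` are hypothetical stand-ins for BINOMDIST with cumulative set to FALSE and TRUE, and "between 1 and 3" is assumed to include both endpoints.

```python
from math import comb


def binom_pmf(a: int, n: int, p: float) -> float:
    """P(exactly a successes in n trials); like BINOMDIST(a, n, p, FALSE)."""
    return comb(n, a) * p**a * (1 - p) ** (n - a)


def binom_cdf(a: int, n: int, p: float) -> float:
    """P(a or fewer successes in n trials); like BINOMDIST(a, n, p, TRUE)."""
    return sum(binom_pmf(k, n, p) for k in range(a + 1))


n, p = 5, 0.05  # sample of 5 items, 5% nonconforming

p_exactly_2 = binom_pmf(2, n, p)
# Assumption: "between 1 and 3" means 1 <= A <= 3 (inclusive).
p_1_to_3 = binom_cdf(3, n, p) - binom_cdf(0, n, p)
p_at_least_1 = 1 - binom_cdf(0, n, p)

print(f"P(A = 2)       = {p_exactly_2:.4f}")   # 0.0214
print(f"P(1 <= A <= 3) = {p_1_to_3:.4f}")      # 0.2262
print(f"P(A >= 1)      = {p_at_least_1:.4f}")  # 0.2262
```

The last two answers agree to four decimals because getting 4 or 5 nonconforming items out of 5 is extremely unlikely at p = 0.05.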

Proportion Defective and the Binomial Distribution
• We often use the proportion defective rather than the number defective in SPC. This is expressed as:
$\bar{p} = x/n$
where X follows a binomial distribution with parameters p and n, and x is an observed value of X.
• The mean of $\bar{p}$ is $p$ and the variance is $p(1-p)/n$.
• The probability that $\bar{p} \le a$ equals the probability that $X \le na$.

Example
• A process produces an average of 2% defective units. What is the probability that a sample of 10 will have more than 5% defective?

The Poisson Distribution
• As n becomes large and p becomes small (with np held fixed), the binomial distribution approaches the Poisson distribution. Therefore, the Poisson distribution is often used to model the number of defects within a product (e.g., where there is a potential for a large number of defects).
• The Poisson distribution is described by one parameter:
$\lambda$ = the average number of defects
• If we denote A as the number of defects on a given product, then A is said to have a Poisson distribution with:
Mean = E[A] = $\lambda$
Variance[A] = $\lambda$
Standard Deviation[A] = $\sqrt{\lambda}$

The Excel Function for the Poisson Distribution is POISSON
• POISSON(A,$\lambda$,cumulative)
• A is the number of events.
• $\lambda$ is the mean.
• Cumulative is a logical value that determines the form of the probability distribution returned. If cumulative is TRUE, POISSON returns the cumulative Poisson probability that the number of random events occurring will be between zero and A inclusive; if FALSE, it returns the Poisson probability mass function, that is, the probability that the number of events occurring will be exactly A.

Examples
• The average number of defects in a computer produced by an assembly process is known to be 10.
– What is the probability of finding 6 defects?
– What is the probability of finding between 2 and 12 defects?
– What is the probability of finding fewer than 3 defects?
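The Poisson questions above can be worked the same way. This is a minimal Python sketch, not the slides' own method: `poisson_pmf` and `poisson_cdf` are hypothetical helper names mirroring POISSON with cumulative set to FALSE and TRUE, and "between 2 and 12" is assumed to include both endpoints.

```python
from math import exp, factorial


def poisson_pmf(a: int, lam: float) -> float:
    """P(exactly a events) for mean lam; like POISSON(a, lam, FALSE)."""
    return exp(-lam) * lam**a / factorial(a)


def poisson_cdf(a: int, lam: float) -> float:
    """P(0..a events inclusive) for mean lam; like POISSON(a, lam, TRUE)."""
    return sum(poisson_pmf(k, lam) for k in range(a + 1))


lam = 10  # average number of defects per computer

p_exactly_6 = poisson_pmf(6, lam)
# Assumption: "between 2 and 12" means 2 <= A <= 12 (inclusive).
p_2_to_12 = poisson_cdf(12, lam) - poisson_cdf(1, lam)
# "Fewer than 3" means A <= 2.
p_less_than_3 = poisson_cdf(2, lam)

print(f"P(A = 6)        = {p_exactly_6:.4f}")
print(f"P(2 <= A <= 12) = {p_2_to_12:.4f}")
print(f"P(A < 3)        = {p_less_than_3:.4f}")
```

Note the off-by-one choices: the lower bound subtracts the cumulative probability through 1, and "fewer than 3" stops at 2, because the cumulative form includes its endpoint.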

The Normal Distribution
• The normal distribution has two parameters:
$\mu$ = mean
$\sigma^2$ = variance ($\sigma$ = standard deviation)
• A number of functions are available in Excel for working with the normal distribution:
NORMDIST
NORMSDIST
NORMINV
NORMSINV

NORMDIST
• NORMDIST(x,mean,standard_dev,cumulative)
• X is the value for which you want the distribution.
• Mean is the arithmetic mean of the distribution.
• Standard_dev is the standard deviation of the distribution.
• Cumulative is a logical value that determines the form of the function. If cumulative is TRUE, NORMDIST returns the cumulative distribution function; if FALSE, it returns the probability density function.
• NORMDIST(42,40,1.5,TRUE) equals 0.908789

NORMSDIST
• NORMSDIST(z)
• Z is the value for which you want the distribution.
• NORMSDIST(1.333333) equals 0.908789

NORMINV
• NORMINV(probability,mean,standard_dev)
• Probability is a probability corresponding to the normal distribution.
• Mean is the arithmetic mean of the distribution.
• Standard_dev is the standard deviation of the distribution.
• NORMINV(0.908789,40,1.5) equals 42

NORMSINV
• NORMSINV(probability)
• Probability is a probability corresponding to the normal distribution.
• NORMSINV(0.908789) equals 1.3333

Enumerative vs. Analytical
• Enumerative Studies: Statistical investigations that lead to action on static populations (e.g., calculating income rates by area)
– Time specific and static
– There is no reference to the future
• Analytic Studies: Statistical investigations that lead to action on dynamic populations (e.g., why is productivity low and how can it be increased?)
– If a 100% sample of the population answers the question under investigation, the study is enumerative; otherwise the study is analytic
– Focuses on causes of patterns and variations that take place over different areas, over periods of time, etc.
– Focus is on the future, not the present
– Since future process output does not exist, it cannot be part of the population.
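Returning to the four Excel normal-distribution functions (NORMDIST, NORMSDIST, NORMINV, NORMSINV): their worked values can be reproduced with Python's standard library. This is a minimal sketch for cross-checking only; the variable names are illustrative and not from the slides.

```python
from statistics import NormalDist

dist = NormalDist(mu=40, sigma=1.5)  # the mean and standard deviation used in the slides
standard = NormalDist()              # standard normal: mean 0, standard deviation 1

# NORMDIST(42, 40, 1.5, TRUE): cumulative probability up to x = 42
p_normdist = dist.cdf(42)

# NORMSDIST(1.333333): cumulative probability up to z on the standard normal
p_normsdist = standard.cdf(1.333333)

# NORMINV(0.908789, 40, 1.5): the x whose cumulative probability is 0.908789
x_norminv = dist.inv_cdf(0.908789)

# NORMSINV(0.908789): the z whose cumulative probability is 0.908789
z_normsinv = standard.inv_cdf(0.908789)

print(f"{p_normdist:.6f}")   # 0.908789
print(f"{p_normsdist:.6f}")  # 0.908789
print(f"{x_norminv:.3f}")    # 42.000
print(f"{z_normsinv:.4f}")   # 1.3333
```

The first two results match because z = (42 - 40) / 1.5 = 1.3333, which is how a value from any normal distribution is converted to the standard normal.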

Sampling Distributions
• Sample statistics are used to draw conclusions about population parameters.
• The sample mean is used to draw conclusions about the population mean.
• The sample variance is used to draw conclusions about the population variance.
• The behavior of these statistics over repeated samples is referred to as the sampling distribution of the statistic.

Control Charts are Representations of Sampling Distributions
• The center line on a control chart is an estimate of the mean of the sampling distribution.
– The mean of the sampling distribution of the sample mean equals the population mean, so it is a good point estimate of the population mean; the central limit theorem tells us that this sampling distribution is approximately normal for large samples.
• Interval Estimation
– An interval estimate (or confidence interval) is defined by two endpoints such that the probability of the parameter of interest being contained in the interval is of some value (e.g., 99%).
– The control limits on a control chart are an example of an interval estimate. They are a function of the point estimates of the mean and standard deviation of the sampling distribution.

Example
• The mean of the sampling distribution of the width of a machined shaft is estimated to be 10 inches with a standard deviation of 0.003 inches. Construct a confidence interval such that you would expect 95% of the means of samples of shafts to be contained within its limits.

Hypothesis Testing
• Each time we plot a value on a control chart we are testing a hypothesis.
• Classical hypothesis testing involves 4 steps:
– Formulate the null and alternative hypotheses
– Determine the test statistic
– Determine the rejection region of the null hypothesis based on a chosen level of significance, $\alpha$
– Make a decision

Formulating the Null Hypothesis
• In any hypothesis testing problem there are two hypotheses:
– The null hypothesis, $H_0$, is the proposition being tested.
– The alternative hypothesis, $H_a$, is formulated as a contradiction to the null hypothesis.
• Example: We expect the mean length of a shaft to be 30 mm.
We are interested in determining whether the mean length of a sample of shafts differs from 30 mm:
$H_0: \mu = 30$ mm
$H_a: \mu \ne 30$ mm

Determining the Test Statistic
• There are many possible test statistics depending on the assumptions made and the parameter being tested. We will use the following statistic to test our hypothesis that the sample mean does not differ from the hypothesized population mean:
$z_0 = \frac{\bar{X} - \mu_0}{\sigma_{\bar{X}}}$
where $\bar{X}$ is the sample mean, $\mu_0$ is the hypothesized population mean, and $\sigma_{\bar{X}}$ is the standard deviation of the sample mean.

Determining the Rejection Region
• The rejection region is determined by the choice of $\alpha$, the significance level of the test. Assuming the sample means are distributed normally about the population mean (central limit theorem), the normal distribution is used to determine which points bound a $(1 - \alpha)$ confidence interval about the population mean. The area beyond these points is the rejection region.

Make a Decision
• If the test statistic falls in the rejection region: Reject $H_0$
• If the test statistic does not fall in the rejection region: Fail to reject $H_0$
• Example: Suppose the estimate of the population standard deviation is 2 mm and a sample produces a mean of 25 mm. At the 0.05 level of significance, what do you conclude?

Errors in Hypothesis Testing
• There are two types of errors in hypothesis testing:
– A Type I error is rejecting the null hypothesis when it is actually true; its probability is the significance level $\alpha$. What is the Type I error associated with the hypothesis we tested? What are the implications for quality control?
– A Type II error is failing to reject the null hypothesis when it is actually false. What is the Type II error associated with the hypothesis we tested? What are the implications for quality control?
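The four steps can be traced on the shaft example (hypothesized mean 30 mm, sample mean 25 mm, significance level 0.05). The sketch below is one possible reading, not a definitive solution: because the slides give no sample size, it assumes the 2 mm figure is used directly as the standard deviation of the sample mean. All variable names are illustrative.

```python
from statistics import NormalDist

mu_0 = 30.0         # hypothesized population mean (mm), from H0
x_bar = 25.0        # observed sample mean (mm)
sigma_x_bar = 2.0   # ASSUMPTION: 2 mm is taken as the std. dev. of the sample mean
alpha = 0.05        # level of significance

# Step 2: test statistic z0 = (x_bar - mu_0) / sigma_x_bar
z_0 = (x_bar - mu_0) / sigma_x_bar

# Step 3: two-sided rejection region |z0| > z_crit, with alpha/2 in each tail
z_crit = NormalDist().inv_cdf(1 - alpha / 2)

# The same region expressed in mm: sample means outside these limits reject H0
lower_limit = mu_0 - z_crit * sigma_x_bar
upper_limit = mu_0 + z_crit * sigma_x_bar

# Step 4: decision
reject_h0 = abs(z_0) > z_crit

print(f"z0 = {z_0:.2f}, critical value = {z_crit:.3f}")
print(f"fail-to-reject limits: {lower_limit:.2f} mm to {upper_limit:.2f} mm")
print("Reject H0" if reject_h0 else "Fail to reject H0")
```

Under that assumption z0 is -2.5, which lies beyond the critical value of about 1.96, so the null hypothesis is rejected. If a sample size n were known, the denominator would instead be the population standard deviation divided by the square root of n.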

