Chapter 4: Inferences About Process Quality
By: Tange Awbrey, IET 603

Statistics and Sampling Distributions
The parameters of a process are most likely:
1. Unknown
2. Changing over time

Key Points
• Identify and estimate the parameters of various probability distributions in order to test hypotheses.
• Use observations from samples to make inferences about the population.

Sampling Distribution
The probability distribution of a statistic, i.e., how the statistic varies from sample to sample around the population value. Random samples are used in this type of analysis because:
• Every sample has the same probability of being chosen as any other sample.
• Parameters are unknown.
• Observations are independent.

Statistical inference: drawing conclusions or making decisions about a population based on a sample selected from that population.
Statistic: any function of the sample data that does not contain unknown parameters, e.g., the sample mean, variance, and standard deviation.

Normal Distribution
A continuous distribution whose density is the familiar bell curve. Three sampling distributions derived from the normal are:
1) The chi-squared distribution
2) The t distribution
3) The F distribution

1) Chi-Squared Distribution
If x1, x2, ..., xk are independent standard normal random variables (mean 0, variance 1), then the sum of their squares is distributed as chi-squared with k degrees of freedom, where k counts how many random variables are present. Very commonly used in chi-squared tests to determine the goodness of fit of an observed distribution to a theoretical one.

2) t Distribution
Also indexed by degrees of freedom k. If x is a standard normal random variable, y is a chi-squared random variable with k degrees of freedom, and x and y are independent, then t = x / sqrt(y/k) is distributed as t with k degrees of freedom.
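The chi-squared goodness-of-fit test described above can be worked by hand in a few lines. This is a minimal sketch with assumed numbers: the die-roll counts are invented for illustration, and the critical value 11.070 is the upper 5% point of the chi-squared distribution with k - 1 = 5 degrees of freedom, taken from a standard table.

```python
# Hypothetical chi-squared goodness-of-fit test: is a die fair?
# Observed counts are made up; 11.070 is chi-squared_{0.05, 5} from a table.
observed = [18, 22, 16, 25, 19, 20]    # counts from 120 hypothetical rolls
expected = [sum(observed) / 6] * 6     # fair die: each face expected 20 times

# Test statistic: sum of (observed - expected)^2 / expected over all cells
chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
critical = 11.070                      # upper 5% point, 5 degrees of freedom

print(f"chi-squared statistic = {chi_sq:.3f}")
print("reject fairness" if chi_sq > critical else "fail to reject fairness")
```

Here the statistic is 2.5, well below the critical value, so the fair-die hypothesis is not rejected.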
3) F Distribution
If w and y are two independent chi-squared random variables with u and v degrees of freedom, then the ratio F = (w/u) / (y/v) is distributed as F with u numerator degrees of freedom and v denominator degrees of freedom. Used to make inferences about the variances of two normal distributions.

Sampling from a Bernoulli Distribution
Bernoulli trials use discrete data: a part is pulled from a population and classified as passing or failing.
x = 1 is a success; x = 0 is a failure.
(Image taken from: https://www.google.com/search?q=bernoulli+trials)
The sample mean lies between 0 and 1, and its distribution can be obtained from the binomial distribution.

Sampling from a Poisson Distribution
A Poisson distribution is used to model the number of defects on any given part pulled from a population.
(Image taken from: http://ned.ipac.caltech.edu)
The parameter of a Poisson distribution is λ. The sample sum is distributed as Poisson with parameter nλ. The sample mean is a discrete random variable that takes on the values 0, 1/n, 2/n, etc.

Point Estimation
Parameters describe probability distributions and have to be estimated because they are unknown. For example, the mean and variance of a normal distribution are parameters.
Point estimator: a single numerical value that estimates an unknown parameter. A good point estimator:
• is unbiased
• has minimum variance
Example: a random sample is used to compute the sample mean and variance, which estimate the unknown mean and variance of the population.

Statistical Inference for a Single Sample
Two categories:
1) Parameter estimation
2) Hypothesis testing
Statistical hypothesis: a statement about the values of the parameters of a probability distribution.
Null hypothesis: the statement that is tested and possibly rejected.
Type I error: rejecting the null hypothesis when it is true, i.e., the probability that a good lot will be rejected.
Type II error: failing to reject the null hypothesis when it is false, i.e., the probability of the consumer accepting a lot of poor quality.
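Sampling from a Bernoulli population and point estimation can be illustrated together: the sample mean (the fraction of successes) is an unbiased point estimator of the success probability p. In this sketch the true proportion p = 0.3 and the random seed are assumptions chosen for illustration; in practice p is the unknown parameter being estimated.

```python
# Minimal sketch: estimating a Bernoulli parameter p from a random sample.
# p = 0.3 and the seed are illustrative assumptions; normally p is unknown.
import random

random.seed(42)
p = 0.3                                   # true (normally unknown) parameter
n = 1000

# Each trial: 1 (success) with probability p, else 0 (failure)
sample = [1 if random.random() < p else 0 for _ in range(n)]

p_hat = sum(sample) / n                   # sample mean = point estimate of p
print(f"point estimate of p: {p_hat:.3f}")
```

Because each observation is 0 or 1, the sample mean necessarily falls between 0 and 1, and its exact distribution follows from the binomial, as noted above.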
Alternative hypothesis: the statement that the unknown parameter is less than, greater than, or not equal to the hypothesized value.

Inference on the Mean of a Population, Variance Known
Hypothesis testing: test a hypothesis about the mean when the variance is known, using observations of the random variable taken from a random sample. This is the one-sample Z-test.
Confidence intervals: an interval estimate of an unknown parameter. A 100(1 - α)% two-sided CI satisfies L ≤ μ ≤ U. A one-sided CI is of the form L ≤ μ or μ ≤ U.

P-Values
The traditional way to report the results of a hypothesis test is that the null hypothesis was or was not rejected at a stated level of significance. The P-value is the probability that the test statistic takes on a value at least as extreme as the observed value of the statistic when the null hypothesis is true. With a P-value, a conclusion can be drawn at any level of significance.

Inference on the Mean of a Normal Distribution, Variance Unknown
Both the mean and the variance are unknown parameters. Test the hypothesis that the mean equals a standard value. With the variance unknown, the population is assumed to be normally distributed, and the test is conducted using a one-sample t-test.

Inference on the Variance of a Normal Distribution
Hypothesis testing: the variance of a normal distribution equals a constant. The sample variance is computed from a random sample of observations, and the test statistic follows a chi-squared distribution. If the variance is less than or equal to a target value, the variability of the process will be within specification limits; if it exceeds that value, it will not.

Inference for a Difference in Means of Two Normal Distributions, Variances Unknown
Assume both populations are normally distributed; hypothesis tests and CIs are based on the t distribution.
First test: assume the variances are equal. The two sample variances are combined into a pooled estimator, a weighted average of the two samples.
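The one-sample t-test (mean of a normal distribution, variance unknown) can be worked out by hand. In this sketch the data and the hypothesized mean μ0 = 10.0 are invented, and the critical value 2.262 is t_{0.025, 9} from a standard t table (two-sided test at α = 0.05).

```python
# Hypothetical one-sample t-test with unknown variance.
# Data and mu0 are invented; 2.262 is t_{0.025, 9} from a standard table.
import math

data = [10.2, 9.8, 10.5, 10.1, 9.9, 10.3, 10.0, 9.7, 10.4, 10.1]
mu0 = 10.0                 # hypothesized mean under H0
n = len(data)

x_bar = sum(data) / n                                # sample mean
s2 = sum((x - x_bar) ** 2 for x in data) / (n - 1)   # sample variance

# t statistic: (x_bar - mu0) / (s / sqrt(n))
t_stat = (x_bar - mu0) / math.sqrt(s2 / n)
critical = 2.262                                     # t_{0.025, n-1}, n-1 = 9

print(f"t = {t_stat:.3f}")
print("reject H0" if abs(t_stat) > critical else "fail to reject H0")
```

Here t ≈ 1.22, inside the acceptance region, so the hypothesis that the mean equals 10.0 is not rejected at the 5% level.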
Under the null hypothesis, the resulting statistic follows a t distribution with n1 + n2 - 2 degrees of freedom; this is the two-sample pooled t-test. A two-sided CI uses the pooled estimate of the population standard deviation and the upper percentage point of the t distribution.
Second test: assume the variances are unequal. There is no exact t-test for this situation. The statistic from the method above is used to test equality of means, and it is distributed approximately as t. A two-sided CI uses the upper percentage point of the t distribution with v degrees of freedom, where v is computed from the data.

Inferences on the Variances of Two Normal Distributions
Hypothesis: the variances are equal. The test uses the F distribution, with the ratio of the two sample variances as the statistic. The null hypothesis is rejected when the ratio exceeds the critical value of the F distribution with n1 - 1 and n2 - 1 degrees of freedom. The same test statistic is used for the alternative hypotheses. A 100(1 - α)% two-sided CI uses the percentage points of the F distribution with u and v degrees of freedom.

Inference on Two Population Proportions
Two binomial parameters are of interest. For large samples, the estimators of the two population proportions are approximately normally distributed, and a 100(1 - α)% two-sided CI is constructed on the difference of the proportions. The results are based on the binomial distribution.
(Image taken from: http://trendingsideways.com)

What if There Are More than Two Populations? The Analysis of Variance
In quality control and engineering testing, an experiment often involves several levels of a factor. Example: when testing two heat-treating methods, the single factor of interest (heat-treating method) has two levels (the two methods). Analysis of variance is used to compare means when there are two or more levels of a single factor. Linear statistical models are used to describe the set of observations in a single-factor experiment.
(Image taken from: http://www.philender.com)

More about single-factor experiments: observations are taken in random order.
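The F-test comparing the variances of two normal populations can be sketched directly from its definition: the statistic is the ratio of the two sample variances. The two samples below are invented, and the critical value 4.03 is (approximately) the upper 2.5% point of the F distribution with (9, 9) degrees of freedom from a standard table, giving a two-sided test at α = 0.05.

```python
# Hypothetical F-test for equality of two normal variances.
# Samples are invented; 4.03 is approximately F_{0.025, 9, 9} from a table.
def sample_variance(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

sample1 = [12.1, 11.8, 12.4, 12.0, 11.9, 12.2, 12.3, 11.7, 12.5, 12.1]
sample2 = [11.9, 12.6, 11.4, 12.8, 11.2, 12.9, 11.5, 12.7, 11.3, 12.4]

# Convention: put the larger sample variance in the numerator
f_stat = sample_variance(sample2) / sample_variance(sample1)
critical = 4.03                      # F_{0.025} with (9, 9) degrees of freedom

print(f"F = {f_stat:.2f}")
print("reject equal variances" if f_stat > critical else "fail to reject")
```

Here F ≈ 6.9 exceeds the critical value, so the hypothesis of equal variances would be rejected for these (invented) samples.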
The experimental environment is kept as uniform as possible; this arrangement is called the completely randomized experimental design. The fixed-effects analysis of variance (ANOVA) is used to test the equality of the population means. The variability in the sample data is divided into two parts, and the hypothesis test is based on a comparison of two estimates of the population variance. The total variability in the data is described by the total sum of squares.

Linear Regression Models
It is important to explore relationships between variables, and the true relationships are often unknown. Regression models describe the relationships between variables within a sample of data, and they are widely used in quality and process improvement.
Multiple linear regression model:
- contains two or more independent variables.
- the independent variables are also known as predictor variables or regressors.
- the parameters are called regression coefficients.

Estimation of Parameters
Model fitting: estimating the parameters of a multiple linear regression model.
Least squares: the method used to estimate the regression coefficients in a multiple linear regression model.
- minimizes the sum of the squares of the errors.
- produces an unbiased estimator of the model parameters β.
- each estimate is a linear combination of the observations.
Hypothesis testing: the test for significance of regression determines whether a linear relationship exists between the response and the set of regressors.

Regression Model Diagnostics
Model adequacy checking is important in order to analyze data sets properly. It:
- aids in building regression models.
- ensures the fitted model provides an adequate approximation to the true system.
- verifies that the least squares regression assumptions are not violated.
Scaled residuals and PRESS: residuals provide the most information about model adequacy. One type is the standardized residual:
- used to discover outliers (unusual observations resulting from error or other problems).
- standardized residuals usually fall between -3 and 3.
PRESS = Prediction Error Sum of Squares: a prediction error is calculated for each observation.
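Least squares estimation and residual checking can be illustrated with the simplest case, one regressor. The data points below are invented; with a single regressor the closed-form slope and intercept formulas are the least squares solution, and the residuals (which sum to zero for a least squares fit with an intercept) are the raw material for the diagnostics discussed above.

```python
# Minimal sketch: simple linear regression by least squares, with residuals.
# The five (x, y) points are invented for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n

sxx = sum((xi - x_bar) ** 2 for xi in x)                        # spread of x
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))  # co-variation

beta1 = sxy / sxx              # slope (regression coefficient)
beta0 = y_bar - beta1 * x_bar  # intercept

# Residuals: observed minus fitted values
residuals = [yi - (beta0 + beta1 * xi) for xi, yi in zip(x, y)]

print(f"fit: y = {beta0:.3f} + {beta1:.3f} x")
print("residuals:", [round(r, 2) for r in residuals])
```

Standardizing these residuals (dividing by their estimated standard deviation) gives the standardized residuals used for outlier screening, and refitting with each point withheld in turn yields the PRESS residuals.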
- this produces a set of PRESS residuals, and the PRESS statistic is the sum of squares of the PRESS residuals.
- each observation is withheld in turn: the model is fit to the remaining observations (the estimation data set) and used to predict the withheld observation (the prediction data set).
- PRESS can be used to compute an approximate R² for prediction.
- PRESS can also be calculated from a single least squares fit to all the observations.

References
Montgomery, Douglas C., Introduction to Statistical Quality Control.
http://trendingsideways.com
http://www.philender.com
https://www.google.com/search?q=bernoulli+trials
http://ned.ipac.caltech.edu