Chapter 4: Inferences About
Process Quality
By: Tange Awbrey
IET 603
Statistics and Sampling Distributions
Parameters of a process are most likely:
1. Unknown
2. Subject to change over time
Key Points
• Identify and estimate the parameters of various probability
distributions in order to support or reject a hypothesis.
• Use observations from samples to make inferences about the
population.
Sampling Distribution
 This is the probability distribution of a statistic, i.e. the
distribution of values the statistic takes over repeated random
samples.
 Random samples are frequently used in this type of
analysis.
 Why random samples?
Every possible sample has the same probability of being chosen.
The parameters of the population are unknown.
The observations are independent.
 Statistical Inference: To draw conclusions or make decisions
about a population based on a sample selected from the
population.
 Statistic: any function of the sample data that does not
contain unknown parameters.
e.g. the sample Mean, Variance, and Standard Deviation
Normal Distribution
 This is a continuous distribution.
 The result is a bell curve.
 Distributions derived from the Normal include:
1) Chi Squared Distribution
2) T Distribution
3) F Distribution
1) The Chi-Squared Distribution is built from independently
distributed standard normal random variables (mean 0, variance 1).
 Degrees of freedom are described as k.
 A Chi-Squared random variable with k degrees of freedom is the
sum of the squares of k such random variables.
 Very commonly used in Chi-Squared Tests to determine
goodness of fit of an observed distribution to a theoretical one.
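The goodness-of-fit use case can be sketched with scipy; the observed and expected counts below are invented illustration data, not from the text:

```python
from scipy.stats import chisquare

# Observed counts in four defect categories vs. the counts expected
# under the hypothesized (theoretical) distribution.
observed = [18, 22, 30, 30]
expected = [25, 25, 25, 25]

# stat = sum((O - E)^2 / E), compared to Chi-Squared with
# k - 1 = 3 degrees of freedom.
stat, p = chisquare(f_obs=observed, f_exp=expected)
print(round(stat, 2), round(p, 3))
```

A large p-value here would mean the observed counts are consistent with the theoretical distribution.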
2) T Distribution
 Also uses degrees of freedom, represented as k.
 When:
x = a standard normal random variable
y = a Chi-Squared random variable with k degrees of
freedom
and the two variables are independent,
the random variable t = x / sqrt(y / k) is distributed as t with
k degrees of freedom.
3) F Distribution
 When w and y are two independent Chi-Squared random variables
with u and v degrees of freedom, the ratio (w / u) / (y / v) is
distributed as F with u numerator degrees of freedom and v
denominator degrees of freedom.
 Used to make inferences about the variances of two normal
distributions.
Sampling from a Bernoulli Distribution
Bernoulli Trials
 Uses discrete data
 A part is pulled from a population and recorded as passing or
failing; each pull is one trial.
x = 1 is a success
x = 0 is a failure
 The sample mean will be between 0 and 1.
 The distribution of the sample mean can be obtained from the
binomial distribution.
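A minimal sketch of that last point (the sample size n and pass probability p below are assumed values): the sample mean of n Bernoulli trials takes the values k/n with binomial probabilities.

```python
from scipy.stats import binom

n, p = 10, 0.9  # n parts inspected, each passes with probability p

# The sample mean x-bar = (number of successes)/n, so
# P(x-bar = k/n) = P(X = k) where X ~ Binomial(n, p).
pmf = {k / n: binom.pmf(k, n, p) for k in range(n + 1)}

# Probabilities over all possible sample means sum to 1.
print(round(sum(pmf.values()), 6))
```

The expected value of the sample mean under this distribution is p itself, which is why the sample fraction of passes estimates the pass probability.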
Sampling from A Poisson Distribution
 A Poisson Distribution is used to model the number of defects
on a part pulled from a population.
[Figure: Poisson distribution, taken from http://ned.ipac.caltech.edu]
 The parameter of a Poisson Distribution is λ.
 The sample sum of n observations is Poisson distributed with
parameter nλ.
 The sample mean is a discrete random variable that takes on the
values 0, 1/n, 2/n, etc.
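The sum property can be checked numerically (λ and n here are made-up values): convolving n copies of the Poisson(λ) pmf reproduces the Poisson(nλ) pmf.

```python
from scipy.stats import poisson

lam, n = 2.0, 3  # assumed defect rate per part and sample size
K = 40           # truncation point for the numerical check

# pmf of a single Poisson(lam) observation on 0..K-1
pmf = [poisson.pmf(k, lam) for k in range(K)]

# Convolve the pmf with itself n-1 times: pmf of the sample sum.
total = pmf
for _ in range(n - 1):
    total = [sum(total[j] * pmf[k - j] for j in range(k + 1))
             for k in range(K)]

# Compare with the direct Poisson(n * lam) pmf at a few points.
for k in (0, 5, 10):
    print(round(total[k] - poisson.pmf(k, n * lam), 12))
```

The differences are zero to floating-point precision, confirming that the sum of n Poisson(λ) variables is Poisson(nλ).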
Point Estimation
Parameters
 Describe probability distributions
 Have to be estimated since they are unknown.
Ex. Mean and variance of a normal distribution are parameters.
 Point Estimator: a single numerical value used as the estimate
of an unknown parameter. A good point estimator:
-Should be unbiased
-Should have minimum variance
Ex. A random sample is used to compute the sample mean and
variance, which serve as estimates of the unknown parameters of
the population.
Statistical Inference for a Single Sample
Two categories:
1) Parameter Estimation
2) Hypothesis Testing
 Statistical Hypothesis: statement about values of the
parameters of a probability distribution.
 Null Hypothesis: the hypothesis to be tested, which is rejected
only if the sample gives strong evidence against it.
 Type 1 error: rejecting the hypothesis when it is true,
OR the probability that a good lot will be rejected.
 Type 2 error: failing to reject the hypothesis when it is
false, OR the probability of the consumer accepting a lot of poor
quality.
 Alternative Hypothesis: states that the unknown parameter is
less than, greater than, or not equal to the hypothesized value.
Inference on the Mean of a Population, Variance Known
 Hypothesis Testing: test a hypothesis about the unknown mean
when the variance is known.
 Make observations of the random variable taken from a random
sample.
 This is known as the one-sample Z-test.
 Confidence Intervals:
 Uses an interval estimate to bound an unknown parameter.
 A 100(1 − α)% CI takes the form L ≤ μ ≤ U.
 This represents a two-sided interval.
 A one-sided CI is described as:
L ≤ μ or
μ ≤ U
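A sketch of the one-sample Z-test and the two-sided CI together (the data, hypothesized mean, and known standard deviation are invented):

```python
import math
from scipy.stats import norm

data = [10.2, 9.8, 10.5, 10.1, 9.9, 10.4, 10.0, 10.3]
mu0 = 10.0      # hypothesized mean (assumed value)
sigma = 0.25    # known process standard deviation (assumed value)
alpha = 0.05

n = len(data)
xbar = sum(data) / n
z = (xbar - mu0) / (sigma / math.sqrt(n))   # Z-test statistic
p_value = 2 * norm.sf(abs(z))               # two-sided p-value

# 100(1 - alpha)% two-sided confidence interval L <= mu <= U
z_crit = norm.ppf(1 - alpha / 2)
half = z_crit * sigma / math.sqrt(n)
L, U = xbar - half, xbar + half
print(round(z, 3), round(p_value, 3), round(L, 3), round(U, 3))
```

Here the hypothesized mean falls inside the CI, which matches the test's failure to reject at α = 0.05.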
P- Values
 Traditional way to report the results of a hypothesis test:
 The Null Hypothesis was or was not rejected at a level of
significance.
 P-values give the probability that the test statistic will take
on a value at least as extreme as the observed value of the
statistic when the Null Hypothesis is true.
 Reporting a P-value lets a conclusion be drawn at any level of
significance.
Inference on the Mean of a Normal Distribution, Variance Unknown
 Hypothesis Testing: both the variance and the mean are unknown
parameters.
 Test the hypothesis that the mean equals a standard value.
 With an unknown variance, it is assumed that the population is
normally distributed.
 Conducted using a one-sample t-test.
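A minimal sketch with scipy (the sample data and the standard value 10.0 are invented):

```python
from scipy.stats import ttest_1samp

data = [10.2, 9.8, 10.5, 10.1, 9.9, 10.4, 10.0, 10.3]

# t uses the sample standard deviation, so the reference
# distribution is t with n - 1 = 7 degrees of freedom.
t_stat, p_value = ttest_1samp(data, popmean=10.0)
print(round(t_stat, 3), round(p_value, 3))
```

Compared with the Z-test above, the unknown variance widens the reference distribution, so the same data give a larger p-value.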
Inference on the Variance of a Normal Distribution
 Hypothesis Testing: the variance of a normal distribution equals
a constant.
 The sample variance is computed from a random sample of
observations.
 Uses the Chi-Square statistic (n − 1)s²/σ0².
 If the variance is less than or equal to the target value, the
variability of the process will be within specification limits.
 If it exceeds the value, it will not be within limits.
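A sketch of the upper-tailed version of this test (the data and hypothesized variance σ0² are invented; the statistic (n − 1)s²/σ0² is compared with a Chi-Square critical value):

```python
import statistics
from scipy.stats import chi2

data = [10.2, 9.8, 10.5, 10.1, 9.9, 10.4, 10.0, 10.3]
sigma0_sq = 0.04   # hypothesized variance (assumed value)
alpha = 0.05

n = len(data)
s_sq = statistics.variance(data)        # sample variance, n - 1 divisor
stat = (n - 1) * s_sq / sigma0_sq       # Chi-Square test statistic
crit = chi2.ppf(1 - alpha, df=n - 1)    # upper critical value

# Reject H0: sigma^2 = sigma0^2 (vs. sigma^2 > sigma0^2) if stat > crit.
print(round(stat, 2), round(crit, 2), stat > crit)
```

In this invented example the statistic does not exceed the critical value, so there is no evidence the process variance exceeds the target.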
Inference for a Difference in Means of Two Normal Distributions,
Variances Unknown
 Assume populations are normally distributed.
 Hypothesis tests and CIs are based on the t-distribution.
Hypothesis Tests:
 First Test: assume the variances are equal.
 Combining the two sample variances creates a pooled estimator.
 The pooled estimator is a weighted average of the two sample
variances.
 Under the Null Hypothesis the result is a t-distribution with
n1 + n2 − 2 degrees of freedom.
 This is the two-sample, pooled t-test.
 A two-sided CI uses the pooled estimate of the population
standard deviation and the upper percentage point of the t
distribution.
 Second Test: assume the variances are unequal.
 There is no exact t-test for this situation.
 The same statistic, with each sample variance kept separate, is
approximately distributed as t.
 The approximate degrees of freedom v are computed from the two
sample variances.
 A two-sided CI uses the upper percentage point of the t
distribution with v degrees of freedom.
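Both tests are available in scipy through the equal_var flag of ttest_ind (the two samples below are invented fill-volume data):

```python
from scipy.stats import ttest_ind

sample1 = [16.03, 16.04, 16.05, 16.05, 16.02,
           16.01, 15.96, 15.98, 16.02, 15.99]
sample2 = [16.02, 15.97, 15.96, 16.01, 15.99,
           16.03, 16.04, 16.02, 16.01, 16.00]

# Pooled test: assumes equal variances, n1 + n2 - 2 degrees of freedom.
t_pooled, p_pooled = ttest_ind(sample1, sample2, equal_var=True)

# Unequal-variance test: approximate degrees of freedom from the data.
t_welch, p_welch = ttest_ind(sample1, sample2, equal_var=False)

print(round(p_pooled, 3), round(p_welch, 3))
```

With similar sample variances, as here, the two versions give nearly the same p-value; they diverge when the variances differ markedly.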
Inferences of the Variances of 2 Normal Distributions
Hypothesis: assume the variances are equal.
 Uses the F distribution method.
 The statistic is the ratio of the two sample variances.
 The Null Hypothesis is rejected when the statistic exceeds the
upper percentage point of the F distribution with n1 − 1 and
n2 − 1 degrees of freedom.
 The test statistic remains the same for one-sided alternative
hypotheses; only the rejection region changes.
CI: 100(1 − α)%
 Two-sided, using the upper percentage point of the F distribution
with u and v degrees of freedom.
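The variance-ratio test can be sketched directly (the samples are the same invented data as above; the statistic is formed by hand since scipy's canned variance tests take a different form):

```python
import statistics
from scipy.stats import f

sample1 = [16.03, 16.04, 16.05, 16.05, 16.02,
           16.01, 15.96, 15.98, 16.02, 15.99]
sample2 = [16.02, 15.97, 15.96, 16.01, 15.99,
           16.03, 16.04, 16.02, 16.01, 16.00]

s1_sq = statistics.variance(sample1)
s2_sq = statistics.variance(sample2)
F0 = s1_sq / s2_sq                   # ratio of the two sample variances
df1, df2 = len(sample1) - 1, len(sample2) - 1

# Two-sided p-value for H0: sigma1^2 = sigma2^2.
p = 2 * min(f.sf(F0, df1, df2), f.cdf(F0, df1, df2))
print(round(F0, 3), round(p, 3))
```

A ratio near 1 with a large p-value, as here, is consistent with equal variances, which is what justifies the pooled t-test above.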
Inference on Two Population Proportions
 Involves two binomial parameters of interest.
 Used for large-sample testing with two populations.
 The estimators of the two population proportions have
approximately normal distributions.
 A 100(1 − α)% two-sided CI is constructed on the difference of
the two proportions.
 Results are based on the binomial distribution.
[Figure: comparing two proportions, taken from http://trendingsideways.com]
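A large-sample two-proportion Z-test can be sketched as follows (the defective counts and sample sizes are invented):

```python
import math
from scipy.stats import norm

x1, n1 = 12, 200   # defectives in sample 1 (assumed counts)
x2, n2 = 25, 200   # defectives in sample 2 (assumed counts)

p1, p2 = x1 / n1, x2 / n2
p_pool = (x1 + x2) / (n1 + n2)   # pooled proportion under H0: p1 = p2

se = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
z = (p1 - p2) / se               # approximately standard normal
p_value = 2 * norm.sf(abs(z))
print(round(z, 3), round(p_value, 4))
```

The pooled proportion is used in the standard error because under the Null Hypothesis the two populations share one common proportion.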
What if There are More than Two Populations? The Analysis of
Variance
In quality control and engineering testing, there are often multiple
aspects of an experiment.
Ex. When testing two heat treating methods, there is a single
factor of interest (heat treating method) with two levels of that
factor (the two methods).
Analysis of Variance is used for comparing means when there are
two or more levels of a single factor.
Linear statistical models are used to describe a set of
observations within a single factor experiment.
[Figure: linear statistical model, taken from http://www.philender.com]
More about single factor experiments:
 Observations are taken in random order.
 This experimental design is called the completely randomized
design.
 The fixed effects model analysis of variance (ANOVA) is used to
test the equality of the population means.
 Variability in the sample data is divided into two parts:
variability between treatments and variability within treatments.
 Hypothesis testing is based on a comparison of the two resulting
estimates of the population variance.
 Total variability in the data is described by the total sum of
squares.
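A minimal one-way (single-factor) ANOVA sketch, with three invented treatment groups:

```python
from scipy.stats import f_oneway

# Observations for three levels of a single factor (invented data).
method_a = [84, 86, 85, 87, 83]
method_b = [89, 90, 88, 91, 90]
method_c = [85, 84, 86, 85, 83]

# H0: all level means are equal; the F statistic compares the
# between-treatment variance estimate to the within-treatment one.
F, p = f_oneway(method_a, method_b, method_c)
print(round(F, 2), round(p, 4))
```

Here the between-treatment estimate is much larger than the within-treatment one, so the equal-means hypothesis is rejected.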
Linear Regression Models
 It is important to explore relationships between variables.
 Sometimes, true relationships are unknown.
 Regression models describe relationships between variables
within a sample of data.
 They are widely used in quality and process improvement.
 Multiple linear regression model
- contains two or more independent variables.
- Independent variables are also known as predictor
variables or regressors.
- The parameters are called regression coefficients.
Estimation of Parameters
 Model fitting: method of estimating parameters in multiple
linear regression models.
 Least squares: method to estimate the regression coefficients
in a multiple linear regression model.
-this minimizes the sum of the squares of the errors.
-produces an unbiased estimator of the parameter vector β in
this model.
-each estimated coefficient is a linear combination of the
observations.
 Hypothesis Testing:
 Test for significance of regression (determine a linear
relationship between a set of variables).
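A least-squares fit with two regressors can be sketched with numpy (the observations are invented; the design matrix carries a column of ones for the intercept):

```python
import numpy as np

# Invented observations: y modeled as b0 + b1*x1 + b2*x2 + error.
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0])
y = np.array([5.1, 5.9, 9.2, 9.8, 13.1, 13.0])

X = np.column_stack([np.ones_like(x1), x1, x2])  # design matrix
beta, *_ = np.linalg.lstsq(X, y, rcond=None)     # least-squares coefficients

residuals = y - X @ beta
print(np.round(beta, 3), round(float(residuals @ residuals), 4))
```

Because the model includes an intercept, the residuals sum to zero, a direct consequence of the least-squares normal equations.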
Regression Model Diagnostics
 Model Adequacy Checking is important in order to analyze sets of
data properly.
-Aids in building regression models.
-Ensures fitted model provides an adequate approximation to
the true system.
-Verifies that the least squares regression assumptions are not
violated.
 Scaled Residuals and PRESS:
 Residuals provide the most information.
 One type is the standardized residual.
- used to discover outliers (unusual observations, often the
result of errors or problems).
- standardized residuals are most likely found between −3 and
+3; values outside this range may be outliers.
PRESS = Prediction Error Sum of Squares.
 A prediction error is calculated for each observation.
- each observation in turn is withheld; the remaining
observations form the estimation data set.
- the withheld observation becomes the prediction data set and
is predicted from the fitted model.
- the differences between the observations and their predictions
are the PRESS residuals.
 The PRESS statistic is the sum of squares of the PRESS
residuals.
 PRESS can be used to compute an R² for prediction.
 PRESS can be calculated from a single least squares fit to all
the observations, without refitting.
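That single-fit shortcut can be sketched as follows (the regression data are invented): each PRESS residual equals the ordinary residual divided by 1 − h_ii, where h_ii is the leverage from the hat matrix, so no refitting is needed.

```python
import numpy as np

# Invented regression data: one regressor plus an intercept.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1, 11.9])

X = np.column_stack([np.ones_like(x), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
e = y - X @ beta                            # ordinary residuals

# Hat matrix H = X (X'X)^-1 X'; its diagonal gives the leverages h_ii.
H = X @ np.linalg.inv(X.T @ X) @ X.T
h = np.diag(H)

press_resid = e / (1 - h)                   # PRESS residuals, no refitting
press = float(press_resid @ press_resid)    # PRESS statistic

# Cross-check by actually refitting with each observation withheld.
press_loo = 0.0
for i in range(len(y)):
    Xi = np.delete(X, i, axis=0)
    yi = np.delete(y, i)
    bi, *_ = np.linalg.lstsq(Xi, yi, rcond=None)
    press_loo += float((y[i] - X[i] @ bi) ** 2)

print(round(press, 6), round(press_loo, 6))
```

The two computations agree, which is exactly why PRESS is cheap to obtain from one least-squares fit.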
References
http://trendingsideways.com
http://www.philender.com
http://ned.ipac.caltech.edu
Douglas C. Montgomery, Introduction to Statistical Quality Control.