Six Sigma Training
Dr. Robert O. Neidigh
Dr. Robert Setaputra
Variable Types – Page 235

Attribute Data – a variable is either classified into
categories or used to count occurrences of a
phenomenon, also referred to as classification or
categorical data. Examples: gender, reasons for
defects, and votes for candidates

Measurement Data – results from a measurement
taken on an item or person of interest, also called
continuous or variables data. Examples: height,
weight, temperature, and cycle time
Measures of Central Tendency

Measures that try to describe or quantify the middle
of a data set.
Mean – average of all data points
Median – value such that at least half the data points are less than or equal to the value and at least half the data points are greater than or equal to the value
Mode – value in the data set that occurs most frequently
First Quartile – value such that at least 25% of the data points are less than or equal to the value and at least 75% of the data points are greater than or equal to the value
Third Quartile – value such that at least 75% of the data points are less than or equal to the value and at least 25% of the data points are greater than or equal to the value
Minitab example on Page 249
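As a rough illustration of these definitions, the sketch below computes each measure with Python's standard-library statistics module. The data set is made up for illustration and is not the Minitab example from the text. Quartile conventions differ between software packages, so other tools may report slightly different quartiles on other data.

```python
import statistics

# Hypothetical data set (not from the text), small enough to check by hand
data = [2, 4, 4, 5, 7, 9, 11]

mean = statistics.mean(data)      # sum of the data points divided by their count
median = statistics.median(data)  # middle value of the sorted data
mode = statistics.mode(data)      # value that occurs most frequently

# quantiles(n=4) returns three cut points: first quartile, median, third quartile
first_quartile, _, third_quartile = statistics.quantiles(data, n=4)

print("mean:", mean)
print("median:", median)
print("mode:", mode)
print("first quartile:", first_quartile)
print("third quartile:", third_quartile)
```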
Measures of Variation

Measures that try to describe or quantify the
amount of spread or variation in a data set
Range – distance from the smallest data point to the largest data point
Variance and Standard Deviation – measure of how much the data points fluctuate around the mean
Minitab example on Page 249
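A minimal Python sketch of these measures, assuming a made-up data set rather than the Minitab example from the text. The statistics.variance and statistics.stdev functions treat the data as a sample.

```python
import statistics

# Hypothetical data set (not from the text)
data = [2, 4, 4, 5, 7, 9, 11]

# Range: distance from the smallest data point to the largest
data_range = max(data) - min(data)

# Sample variance and sample standard deviation: spread around the mean
sample_variance = statistics.variance(data)
sample_std_dev = statistics.stdev(data)

print("range:", data_range)
print("variance:", sample_variance)
print("standard deviation:", round(sample_std_dev, 4))
```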
What is standard deviation?
Standard deviation is a measure of variation within a data set.
The larger the standard deviation, the more variation in the data set, and vice versa.
Technically, standard deviation is a measure of variation about the mean.
Roughly speaking, standard deviation is the average distance between each data point and the mean.
Motivate measure of variation through examples of small data sets.
When averaging the squared distances from the mean: population – divide by n, sample – divide by n - 1
Show students Normal.xls file.
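To make the n versus n - 1 distinction concrete, here is a small from-scratch sketch in Python. The function name std_dev and the data set are illustrative assumptions, not something taken from the text or from Normal.xls.

```python
import math

def std_dev(points, sample=True):
    """Standard deviation computed from the definition.

    sample=True divides by n - 1; sample=False divides by n (population).
    """
    n = len(points)
    mean = sum(points) / n
    # Squared distance of every data point from the mean, added up
    squared_deviations = sum((x - mean) ** 2 for x in points)
    divisor = n - 1 if sample else n
    return math.sqrt(squared_deviations / divisor)

# Hypothetical small data set with mean 5
data = [2, 4, 4, 4, 5, 5, 7, 9]

population_sd = std_dev(data, sample=False)
sample_sd = std_dev(data, sample=True)

print("population standard deviation:", population_sd)
print("sample standard deviation:", round(sample_sd, 4))
```

The sample version is always a little larger than the population version because it divides the same total by a smaller number.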
Continuous Probability Distributions
Can assume an infinite number of values within a given range
Probability of any one point is zero
Probabilities are measured over intervals
Area under curve defines probability
Use calculus to calculate probabilities
Ugh!!!
Normal probability distribution is one type
Fortunately, probabilities already calculated and contained in a table for normal distribution
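The "area under the curve" idea can be checked numerically. The sketch below, in Python, approximates the area under a normal curve with thin trapezoids and compares it with the value that the standard library's NormalDist class computes directly. The mean of 70, standard deviation of 10, and the helper name area_under_curve are assumptions made for illustration.

```python
from statistics import NormalDist

# Hypothetical normal distribution used for illustration
dist = NormalDist(mu=70, sigma=10)

def area_under_curve(distribution, low, high, slices=10_000):
    """Approximate the area under the density curve between low and high
    using the trapezoid rule."""
    width = (high - low) / slices
    total = 0.5 * (distribution.pdf(low) + distribution.pdf(high))
    for i in range(1, slices):
        total += distribution.pdf(low + i * width)
    return total * width

# Probability over an interval is the area over that interval
interval_probability = area_under_curve(dist, 60, 80)

# The same probability from the cumulative distribution (what a table provides)
table_probability = dist.cdf(80) - dist.cdf(60)

# An interval of zero width has zero area, so a single point has probability zero
single_point_probability = area_under_curve(dist, 70, 70)

print(round(interval_probability, 4), round(table_probability, 4))
print(single_point_probability)
```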
Characteristics of Normal Probability Distribution
1) Bell-shaped
2) Symmetrical
3) Mean, median, and mode are the same
4) Asymptotic – tails never touch X-axis
5) Completely described by its two parameters – mean (µ) and standard deviation (σ)
6) There are an infinite number of possible normal probability distributions
How do we calculate probabilities?
Since there are an infinite number of normal distributions, how can we
possibly calculate probabilities for all of them? Fortunately, there is a unique
characteristic of all normal distributions that allows us to do so. The
probability of having a value above/below a point that is X standard
deviations above/below the mean is the same for every possible normal
distribution. The probabilities for the standard normal distribution (µ = 0 and
σ = 1) can be used for every other normal distribution. These probabilities can
be found in the standard normal probability table. Our task is to convert every
normal distribution to the standard normal; this is called standardizing.
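A quick way to see this shared characteristic is to take several different normal distributions and ask for the probability of falling below the point that sits the same number of standard deviations above each mean. The Python sketch below does this; the three (mean, standard deviation) pairs and the choice of 1.5 standard deviations are arbitrary assumptions.

```python
from statistics import NormalDist

z = 1.5  # number of standard deviations above the mean (arbitrary choice)
standard_normal = NormalDist(mu=0, sigma=1)

# Hypothetical (mean, standard deviation) pairs for three unrelated normal distributions
parameters = [(70, 10), (8.025, 0.125), (-3, 42)]

# Probability of a value below the point z standard deviations above each mean
probabilities = [
    NormalDist(mu, sigma).cdf(mu + z * sigma) for mu, sigma in parameters
]

for (mu, sigma), p in zip(parameters, probabilities):
    print(f"mean={mu}, sd={sigma}: {p:.4f}")
print(f"standard normal: {standard_normal.cdf(z):.4f}")
```

Every line prints the same probability, which is why one table for the standard normal is enough.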
How do we standardize?
First, find the distance between the point of interest on our normal distribution and the
mean. To put this distance in units of standard deviation, divide it by
our standard deviation. This value is called a Z-value and tells us how many
standard deviations above or below the mean a point is. If the Z-value is
positive, the point is above the mean; if the Z-value is negative, the point is
below the mean.
Z-value = (point minus the mean)/standard deviation
The standard normal table always gives the probability of having a value less
than the Z-value.
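The Z-value formula translates directly into code. This is a minimal Python sketch; the function name z_value and the example numbers are assumptions for illustration.

```python
def z_value(point, mean, std_dev):
    """Number of standard deviations the point lies above (+) or below (-) the mean."""
    return (point - mean) / std_dev

# Hypothetical distribution with mean 70 and standard deviation 10
z_below = z_value(66, mean=70, std_dev=10)  # negative: 66 is below the mean
z_above = z_value(85, mean=70, std_dev=10)  # positive: 85 is above the mean

print(z_below, z_above)
```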
Finding the probability of having a value less than a given point
Find the Z-value for the given point
The Z-value lets us know how many standard deviations above/below the mean the point is
Look up the probability in the standard normal table
This is the probability of having a value less than the given point
μ = 70 and σ = 10, find probability of having a value less than 66
[Figure: normal curve with X-axis values from 40 to 100 and the corresponding Z-values from -3 to 3]
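The "less than" steps can be sketched in Python for the μ = 70, σ = 10 example. The cdf method of the standard library's NormalDist class is used here as a stand-in for the printed standard normal table, which is an assumption of this sketch rather than the method used in class.

```python
from statistics import NormalDist

mean, std_dev, point = 70, 10, 66

# Step 1: Z-value for the given point
z = (point - mean) / std_dev

# Step 2: "look up" the probability of a value less than z on the standard normal
probability_less = NormalDist().cdf(z)

print("Z-value:", z)
print("P(value < 66):", round(probability_less, 4))
```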
Finding the probability of having a value greater than a given point
Find the Z-value for the given point
The Z-value lets us know how many standard deviations above/below the mean the point is
Look up the probability in the standard normal table
This is the probability of having a value less than the given point
Subtract this probability from one to find the probability of having a point greater than the given point
μ = 70 and σ = 10, find probability of having a value greater than 56
[Figure: normal curve with X-axis values from 40 to 100 and the corresponding Z-values from -3 to 3]
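For the "greater than" case, the only extra step is subtracting the table probability from one. A minimal Python sketch for the μ = 70, σ = 10 example, again assuming NormalDist().cdf in place of the printed table:

```python
from statistics import NormalDist

mean, std_dev, point = 70, 10, 56

z = (point - mean) / std_dev

# The table (here, cdf) gives the probability of a value LESS than the point
probability_less = NormalDist().cdf(z)

# Everything that is not below the point is above it
probability_greater = 1 - probability_less

print("Z-value:", z)
print("P(value > 56):", round(probability_greater, 4))
```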
Finding the probability of having a value between two points
Find the Z-values for the given points
The Z-values let us know how many standard deviations above/below the mean the points are
Look up the probabilities in the standard normal table for the two Z-values
These are the probabilities of having a value less than the given point associated with each Z-value
Subtract the probability associated with the smaller Z-value from the probability associated with the larger Z-value
This is the probability of having a value between the two points
μ = 70 and σ = 10, find probability of having a value between 57 and 76
[Figure: normal curve with X-axis values from 40 to 100 and the corresponding Z-values from -3 to 3]
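The "between two points" procedure is two table lookups and one subtraction. The Python sketch below works the μ = 70, σ = 10 example, with NormalDist().cdf assumed as a substitute for the printed table.

```python
from statistics import NormalDist

mean, std_dev = 70, 10
low_point, high_point = 57, 76

standard_normal = NormalDist()

# Z-values for the two points
z_low = (low_point - mean) / std_dev
z_high = (high_point - mean) / std_dev

# Probability below the larger Z-value minus probability below the smaller one
probability_between = standard_normal.cdf(z_high) - standard_normal.cdf(z_low)

print("Z-values:", z_low, z_high)
print("P(57 < value < 76):", round(probability_between, 4))
```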
Finding the point on a normal distribution associated with a given probability
Find the probability in the standard normal table
Find the Z-value associated with the probability
Convert the Z-value to a point on the normal distribution
Mean plus (Z-value times standard deviation)
μ = 70 and σ = 10, find the value such that 70% of the charge amounts will be greater than that amount (so 30% will be less than it, and 0.30 is the probability to find in the table)
[Figure: normal curve with X-axis values from 40 to 100 and the corresponding Z-values from -3 to 3]
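Going from a probability back to a point reverses the lookup. In the Python sketch below, the inv_cdf method of NormalDist plays the role of reading the table "inside out"; using it in place of the printed table is an assumption of this sketch. Because the table works with "less than" probabilities, "70% greater than" is first converted to "30% less than".

```python
from statistics import NormalDist

mean, std_dev = 70, 10
fraction_greater = 0.70

# The table gives "less than" probabilities, so convert first
probability_less = 1 - fraction_greater

# Z-value associated with that probability
z = NormalDist().inv_cdf(probability_less)

# Convert the Z-value to a point: mean plus (Z-value times standard deviation)
point = mean + z * std_dev

print("Z-value:", round(z, 4))
print("point:", round(point, 2))
```

The Z-value is negative, which matches intuition: a point that most values exceed must lie below the mean.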
Sampling Methods
Reasons for sampling:
Too time consuming to check entire population
Too expensive to check entire population
Sample results are adequate
Destructive testing
Impossible to check entire population
Sampling Definitions
Simple random sample – each item in the population has
the same probability of being selected
Sampling error – difference between a sample mean and
the population mean
Sampling distribution of the sample mean – probability
distribution of all possible sample means of a given
sample size
Standard error of the mean – standard deviation of the
sampling distribution of sample means (average
sampling error)
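When is sampling distribution normal?
If population distribution is normal, then sampling distribution is normal for any sample size
If sample size is greater than or equal to thirty, then sampling distribution can generally be treated as approximately normal, whatever the shape of the population distribution (a rule of thumb based on the central limit theorem)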
Properties of normal sampling distribution
Sampling distribution mean (µx-bar) equals population mean (µ)
Standard error (σx-bar) equals population standard deviation (σ) divided by the square root of the sample size (n)
Once we know the mean and standard error of the sampling distribution, and we know it is normally distributed, we are set to compute probabilities
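These properties can be checked with a small simulation. The Python sketch below repeatedly draws samples of size 30 from a skewed (exponential) population and compares the mean and standard deviation of the resulting sample means with the values the two properties predict. The choice of population, the number of repetitions, and the fixed random seed are all assumptions made for illustration.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

n = 30              # sample size
repetitions = 5000  # number of samples drawn

# Exponential population with rate 1: mean 1 and standard deviation 1, strongly skewed
population_mean = 1.0
population_sd = 1.0

# One sample mean per repetition
sample_means = [
    statistics.fmean(random.expovariate(1.0) for _ in range(n))
    for _ in range(repetitions)
]

mean_of_sample_means = statistics.fmean(sample_means)
observed_standard_error = statistics.stdev(sample_means)
predicted_standard_error = population_sd / n ** 0.5

print("mean of sample means:", round(mean_of_sample_means, 3))
print("observed standard error:", round(observed_standard_error, 3))
print("predicted standard error:", round(predicted_standard_error, 3))
```

The simulated values should land close to the population mean and to σ divided by the square root of n, though not exactly on them, since a simulation is itself a sample.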
Notation
$\mu_{\bar{X}} = \mu$
$\sigma_{\bar{X}} = \sigma / \sqrt{n}$
Example
Captain D’s tuna is sold in cans that have a net weight of 8 ounces.
The weights are normally distributed with a mean of 8.025 ounces and a standard deviation of 0.125 ounces.
You take a sample of 36 cans.
Example – Cont.
$\mu_{\bar{X}} = 8.025$
$\sigma_{\bar{X}} = 0.125 / \sqrt{36} = 0.020833$
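The same two numbers for the tuna example can be reproduced in a few lines of Python. The variable names are illustrative; the inputs are the population mean, population standard deviation, and sample size given in the example.

```python
import math

population_mean = 8.025  # ounces
population_sd = 0.125    # ounces
n = 36                   # cans in the sample

# Mean of the sampling distribution equals the population mean
sampling_mean = population_mean

# Standard error: population standard deviation divided by the square root of n
standard_error = population_sd / math.sqrt(n)

print("sampling distribution mean:", sampling_mean)
print("standard error:", round(standard_error, 6))
```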
Example – Cont.

What is the probability of having a sample mean greater than 8.03
ounces?
[Figure: sampling distribution of the sample mean with X-axis values from 7.962 to 8.088 ounces and the corresponding Z-values from -3 to 3]
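A hedged Python sketch of this question, treating the sampling distribution as a normal distribution with mean 8.025 and standard deviation equal to the standard error. NormalDist.cdf is assumed here in place of the printed table.

```python
import math
from statistics import NormalDist

standard_error = 0.125 / math.sqrt(36)
sampling_distribution = NormalDist(mu=8.025, sigma=standard_error)

# Z-value of a sample mean of 8.03 ounces, in units of standard error
z = (8.03 - 8.025) / standard_error

# cdf gives "less than", so subtract from one for "greater than"
probability_greater = 1 - sampling_distribution.cdf(8.03)

print("Z-value:", round(z, 2))
print("P(sample mean > 8.03):", round(probability_greater, 4))
```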
Example – Cont.

What is the probability of having a sample mean less than 7.995
ounces?
[Figure: sampling distribution of the sample mean with X-axis values from 7.962 to 8.088 ounces and the corresponding Z-values from -3 to 3]
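This "less than" question can be sketched in Python by standardizing with the standard error rather than the population standard deviation. The cdf of the standard normal is assumed as a substitute for the printed table.

```python
import math
from statistics import NormalDist

sampling_mean = 8.025
standard_error = 0.125 / math.sqrt(36)

# Standardize the sample mean of interest using the standard error
z = (7.995 - sampling_mean) / standard_error

# Probability of a sample mean below 7.995 ounces
probability_less = NormalDist().cdf(z)

print("Z-value:", round(z, 2))
print("P(sample mean < 7.995):", round(probability_less, 4))
```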
Example – Cont.

What is the probability of having a sample mean between 7.995 ounces
and 8.03 ounces?
[Figure: sampling distribution of the sample mean with X-axis values from 7.962 to 8.088 ounces and the corresponding Z-values from -3 to 3]
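For the "between" question, one possible Python sketch subtracts the two "less than" probabilities of the sampling distribution. As before, NormalDist.cdf stands in for the table lookup, which is an assumption of the sketch.

```python
import math
from statistics import NormalDist

standard_error = 0.125 / math.sqrt(36)
sampling_distribution = NormalDist(mu=8.025, sigma=standard_error)

low, high = 7.995, 8.03  # ounces

# Probability below the higher value minus probability below the lower value
probability_between = sampling_distribution.cdf(high) - sampling_distribution.cdf(low)

print("P(7.995 < sample mean < 8.03):", round(probability_between, 4))
```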
Hypothesis Testing
Hypothesis – a statement about a population developed
for the purpose of testing
Hypothesis test – a procedure based on sample evidence
and probability theory to determine whether the
hypothesis is a reasonable statement
Key Point – Anytime a decision is made about a
population based upon sample data an incorrect decision
may be made
Type I and Type II Errors
Type I Error – rejecting a true null hypothesis
Type II Error – failing to reject (accepting) a false null hypothesis
Unfortunately, in hypothesis testing with a fixed sample size, the probability of a
Type I Error (α) is inversely related to the probability of a
Type II Error (β). If we decrease the probability of a Type
I Error, then the probability of a Type II Error increases
and vice versa.
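The trade-off can be seen numerically. The Python sketch below sets up a hypothetical upper-tailed test on a mean with a known standard error and computes β for several choices of α. The null mean, the assumed true mean, the standard error, and the function name type_ii_probability are all illustrative assumptions, not values from the text.

```python
from statistics import NormalDist

standard_normal = NormalDist()

# Hypothetical upper-tailed test: H0: mean = 8.0 versus H1: mean > 8.0
null_mean = 8.0
true_mean = 8.05                    # assumed true mean, so H0 is actually false
standard_error = 0.125 / 36 ** 0.5  # assumed known standard error

def type_ii_probability(alpha):
    """Probability of failing to reject H0 when the true mean is true_mean."""
    # Reject H0 when the sample mean exceeds this cutoff
    cutoff = null_mean + standard_normal.inv_cdf(1 - alpha) * standard_error
    # Beta: chance the sample mean falls at or below the cutoff under the true mean
    return NormalDist(true_mean, standard_error).cdf(cutoff)

alphas = [0.10, 0.05, 0.01]
betas = [type_ii_probability(alpha) for alpha in alphas]

for alpha, beta in zip(alphas, betas):
    print(f"alpha = {alpha:.2f}  ->  beta = {beta:.4f}")
```

With the sample size held fixed, each smaller α pushes the cutoff further out, so a false null hypothesis survives more often and β grows.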
What are Type I and Type II errors in the U.S. Legal
System?