MAT 135
Introductory Statistics and Data Analysis
Adjunct Instructor
Kenneth R. Martin
Lecture 11
November 9, 2016
Agenda
• Housekeeping
– Exam #2
– Readings
• Chapter 1, 14, 10, 2, 3, 4 & 5
Confidential - Kenneth R. Martin
Housekeeping
• Exam #2
Housekeeping
• Read, Chapter 1.1 – 1.4
• Read, Chapter 14.1 – 14.2
• Read, Chapter 10.1
• Read, Chapter 2
• Read, Chapter 3
• Read, Chapter 4
• Read, Chapter 5
Continuous vs. Discrete vs. Attribute Data
• Continuous: infinite # of possible measurements in a continuum
• Discrete, Count: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10
• Discrete, Ordinal: “low”/“small”/“short”, “medium”/“mid”, “high”/“large”/“tall”
• Discrete, Nominal or Categorical: Group A, Group B, Group C, Group D, Group E, Group F (defines several groups - no order)
• Attribute, Binary: “bad”/“no-go”/“group #1” vs. “good”/“go”/“group #2” (defines TWO groups - no order)
Discrete Probability Distribution
Examples:
Discrete Probability Distribution
Theorem 1:
• Probability of 1.000 means an event is certain to occur.
• Probability of 0 means the event is certain to NOT occur.
Therefore:
0 ≤ P(E) ≤ 1
Discrete Probability Distribution
Theorem 5:
• The total (sum) of the probabilities of all possible outcomes, for any discrete distribution, equals 1.000.
Discrete Probability Distribution
Theoretical Mean:
Discrete Probability Distribution
Theoretical Mean - Example:
Discrete Probability Distribution
Variance:
Discrete Probability Distribution
Variance - Example:
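A minimal sketch of the theoretical mean and variance calculation, assuming the standard definitions mean = Σ x·P(x) and variance = Σ (x − mean)²·P(x). The fair-die distribution and the function names are illustrative choices, not taken from the lecture.

```python
import math

def discrete_mean(dist):
    """Theoretical mean: sum of x * P(x) over all outcomes."""
    return sum(x * p for x, p in dist.items())

def discrete_variance(dist):
    """Variance: sum of (x - mean)^2 * P(x) over all outcomes."""
    mu = discrete_mean(dist)
    return sum((x - mu) ** 2 * p for x, p in dist.items())

# Illustrative distribution (an assumption): one roll of a fair six-sided die.
fair_die = {face: 1 / 6 for face in range(1, 7)}

# The probabilities of all outcomes must sum to 1 (Theorem 5).
assert math.isclose(sum(fair_die.values()), 1.0)

mu = discrete_mean(fair_die)
var = discrete_variance(fair_die)
print(f"mean = {mu:.4f}, variance = {var:.4f}, SD = {math.sqrt(var):.4f}")
```

The distribution is stored as a mapping from outcome to probability, so the same two functions work for any discrete distribution whose probabilities sum to 1.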
Discrete Probability Distributions
Binomial Distribution:
• Used for discrete single point (Integer) probabilities.
• A Binomial probability distribution occurs when there’s a fixed number of “trials” or where there’s a steady stream of items coming from a source.
• Used for data with two outcomes (pass / fail, head / tail, etc.); the events are independent, and the probability of each outcome does not change.
• Uses Combination and Simple Multiplication.
Discrete Probability Distributions
Binomial Distribution:
$$P(d) = \frac{n!}{d!\,(n-d)!}\; p_0^{d}\, q_0^{\,n-d}$$
P(d) = Prob. of d nonconforming or target units in sample size n
n = # units in sample
d = # nonconforming or target units in a sample
p0 = proportion nonconforming / targets in population (lot)
q0 = proportion conforming / not a target (1 - p0) in population (lot)
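A short sketch of the binomial formula in code, using the same symbols (n, d, p0, q0). The sample size and proportion nonconforming below are hypothetical numbers chosen for illustration.

```python
from math import comb

def binomial_pmf(d, n, p0):
    """P(d) = n! / (d! (n-d)!) * p0^d * q0^(n-d), with q0 = 1 - p0."""
    q0 = 1.0 - p0
    return comb(n, d) * p0**d * q0 ** (n - d)

# Hypothetical case: sample of n = 5 units from a steady stream
# in which 10% of units are nonconforming.
n, p0 = 5, 0.10
for d in range(n + 1):
    print(f"P({d}) = {binomial_pmf(d, n, p0):.5f}")

# Probability of at most 1 nonconforming unit: add the single-point values.
print(f"P(d <= 1) = {binomial_pmf(0, n, p0) + binomial_pmf(1, n, p0):.5f}")
```

`math.comb` supplies the Combination term, and the two powers are the Simple Multiplication part.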
Discrete Probability Distributions
Binomial Distribution (example):
Discrete Probability Distributions
Binomial Distribution table:
Discrete Probability Distributions
Binomial Distribution – Mean / Var. & SD:
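A minimal sketch, assuming the standard binomial results mean = n·p0, variance = n·p0·q0, and SD = (n·p0·q0)^(1/2). The values of n and p0 are hypothetical, and the cross-check against the general discrete definitions is included only as a sanity test.

```python
from math import comb, isclose, sqrt

def binomial_pmf(d, n, p0):
    """Single-point binomial probability P(d)."""
    return comb(n, d) * p0**d * (1.0 - p0) ** (n - d)

def binomial_summary(n, p0):
    """Closed-form mean, variance, and SD of a binomial distribution."""
    mean = n * p0
    variance = n * p0 * (1.0 - p0)
    return mean, variance, sqrt(variance)

n, p0 = 20, 0.05  # hypothetical sample size and proportion nonconforming
mean, variance, sd = binomial_summary(n, p0)

# Cross-check with the general definitions: sum of d*P(d) and (d - mean)^2*P(d).
mean_check = sum(d * binomial_pmf(d, n, p0) for d in range(n + 1))
var_check = sum((d - mean_check) ** 2 * binomial_pmf(d, n, p0) for d in range(n + 1))
assert isclose(mean, mean_check) and isclose(variance, var_check)

print(f"mean = {mean:.3f}, variance = {variance:.3f}, SD = {sd:.3f}")
```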
Discrete Probability Distributions
Binomial Distribution (example):
Discrete Probability Distributions
Hypergeometric Distribution:
• Used for discrete single point (Integer) probabilities.
• A Hypergeometric probability distribution occurs when the population is finite, two outcomes are possible, and the random sample is taken without replacement (trials are not independent).
• Uses three Combinations and Simple Multiplication.
Discrete Probability Distributions
Hypergeometric Distribution:
$$P(d) = \frac{C_{d}^{D}\; C_{n-d}^{N-D}}{C_{n}^{N}}$$
P(d) = Prob. of d nonconforming / target units in sample size n
N = # units in the lot (population)
n = # units in the sample
D = # nonconforming / target units in the lot
d = # nonconforming / target units in the sample
N-D = # conforming / not a target in the lot
n-d = # conforming / not a target in the sample
C = Combinations
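A brief sketch of the hypergeometric formula in code. The parameters (a lot of N = 9 with D = 3 target units, a sample of n = 4, and d = 1) are an assumption: they are one set of values that yields the terms 3, 20, and 126, and are not confirmed as the lecture's own example.

```python
from math import comb

def hypergeometric_pmf(d, n, D, N):
    """P(d) = C(D, d) * C(N - D, n - d) / C(N, n): three Combinations."""
    return comb(D, d) * comb(N - D, n - d) / comb(N, n)

# Assumed illustrative values (not confirmed by the source).
N, D, n, d = 9, 3, 4, 1

print(comb(D, d), comb(N - D, n - d), comb(N, n))  # 3 20 126
print(f"P({d}) = {hypergeometric_pmf(d, n, D, N):.4f}")

# The single-point probabilities over all feasible d add up to 1.
total = sum(hypergeometric_pmf(k, n, D, N) for k in range(min(n, D) + 1))
print(f"sum over d = {total:.4f}")
```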
Discrete Probability Distributions
Hypergeometric Distribution (example):
= (3 * 20) / 126 ≈ 0.476
Discrete Probability Distributions
Poisson Distribution:
• Used for discrete single point (Integer) probabilities.
• A Poisson probability distribution occurs when n is large and p0 is small.
• Used for applications of observations per time, or observations per quantity.
Discrete Probability Distributions
Poisson Distribution:
$$P(X) = \frac{\lambda^{X}\, e^{-\lambda}}{X!}$$
X = count of events occurring in a sample.
λ = average count of events occurring per unit.
e = 2.718281
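A minimal sketch of a Poisson single-point probability, assuming the standard form P(X) = λ^X · e^(−λ) / X!. The rate λ = 2 events per unit is a hypothetical value.

```python
from math import exp, factorial

def poisson_pmf(x, lam):
    """P(X = x) = lam^x * e^(-lam) / x!"""
    return lam**x * exp(-lam) / factorial(x)

lam = 2.0  # hypothetical: an average of 2 events per unit (per hour, per sheet, ...)
for x in range(6):
    print(f"P(X = {x}) = {poisson_pmf(x, lam):.4f}")
```

`math.exp` is used rather than a rounded constant for e, so the result does not depend on how many digits of 2.718281 are kept.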
Discrete Probability Distributions
Poisson Distribution (example):
Discrete Probability Distributions
Poisson Distribution Table:
Discrete Probability Distributions
Poisson Distribution (alternate):
$$P(c) = \frac{(np_0)^{c}}{c!}\; e^{-np_0}$$
c = count of events occurring in a sample, i.e. count of non-conformities.
np0 = average count of events occurring in population.
e = constant = 2.718281
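A short sketch comparing the Poisson form with average np0 against the exact binomial probability when n is large and p0 is small. The values n = 200 and p0 = 0.01 are hypothetical; the point is only that the two columns come out close under those conditions.

```python
from math import comb, exp, factorial

def binomial_pmf(c, n, p0):
    """Exact binomial probability of c target units in a sample of n."""
    return comb(n, c) * p0**c * (1.0 - p0) ** (n - c)

def poisson_pmf(c, np0):
    """Poisson probability P(c) = (np0)^c / c! * e^(-np0)."""
    return np0**c / factorial(c) * exp(-np0)

n, p0 = 200, 0.01  # hypothetical: large n, small p0, so np0 = 2
print(" c  binomial  Poisson")
for c in range(5):
    b = binomial_pmf(c, n, p0)
    p = poisson_pmf(c, n * p0)
    print(f"{c:2d}  {b:.5f}   {p:.5f}")
```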
Discrete Probability Distributions
Poisson Distribution:
• The Poisson distribution formula can be used directly to find probability estimates, or Table C can be used.
– The table gives point values, and cumulative values in parentheses (accumulated from the top down).
• Mean = np0
• SD = (np0)^(1/2)
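A minimal sketch of point and cumulative Poisson values, plus the mean and SD, computed directly from the formula. The average np0 = 1.5 is hypothetical, and the printed layout (cumulative value in parentheses) is only an assumption about how such a table is typically arranged, not a copy of Table C.

```python
from math import exp, factorial, sqrt

def poisson_pmf(c, np0):
    """Point value: probability of exactly c events."""
    return np0**c / factorial(c) * exp(-np0)

def poisson_cdf(c, np0):
    """Cumulative value: probability of c or fewer events (sum of point values 0..c)."""
    return sum(poisson_pmf(k, np0) for k in range(c + 1))

np0 = 1.5  # hypothetical average count
print(f"mean = {np0:.3f}, SD = {sqrt(np0):.3f}")
for c in range(6):
    print(f"c = {c}: {poisson_pmf(c, np0):.3f} ({poisson_cdf(c, np0):.3f})")
```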
Discrete Probability Distributions
Poisson Distribution (example):
Discrete Probability Distributions
Poisson Distribution Table:
Discrete Probability Distributions
Poisson Distribution (example):
Probability - Review
Theorem 5:
• The total (sum) of the probabilities of all possible outcomes, for any discrete distribution, equals 1.000.
Probability - Review
Definition, Theorem 5:
• Correspondingly, the total area under a continuous probability distribution (normal curve) is also equal to 1.000. The tails of the curve never touch the x-axis. Thus, area can be used to estimate probabilities.
Statistics
Histogram – by increasing data and thus bins, the
fitted line becomes smoother and more accurate
Statistics
By increasing data, you approach the population,
and get a smooth polygon.
Statistics
Area Under Curve
• We can find the area under any curve by 2 methods.
1. We can make a large quantity of really narrow bins, find each individual bin area / rectangle area (under the curve), and add them all up.
2. We can integrate under the curve, to find the area bounded by the curve and the X-axis.
– This method is simpler, and gives more accurate results.
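The two methods above can be sketched side by side. This is an illustration under stated assumptions: the curve is taken to be the standard normal curve (mean 0, SD 1), the bounds a = -1 and b = 2 are arbitrary, and the integration result uses the error function from Python's `math` module rather than a hand-derived antiderivative.

```python
from math import erf, exp, pi, sqrt

def f(x):
    """Standard normal curve, used here only as a convenient example curve."""
    return exp(-x * x / 2.0) / sqrt(2.0 * pi)

def area_by_rectangles(curve, a, b, bins):
    """Method 1: add up the areas of `bins` narrow rectangles (midpoint heights)."""
    width = (b - a) / bins
    return sum(curve(a + (i + 0.5) * width) * width for i in range(bins))

def area_by_integration(a, b):
    """Method 2: integral of the standard normal curve, via the error function."""
    return 0.5 * (erf(b / sqrt(2.0)) - erf(a / sqrt(2.0)))

a, b = -1.0, 2.0
exact = area_by_integration(a, b)
for bins in (4, 40, 400):
    approx = area_by_rectangles(f, a, b, bins)
    print(f"{bins:4d} bins: {approx:.6f} (difference {abs(approx - exact):.2e})")
print(f"integration: {exact:.6f}")
```

As the bins get narrower, the rectangle total moves toward the integration result, which is why integration is described as the more accurate method.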
Statistics
Equation of a Normal Distribution
$$Y = \frac{1}{\sigma\sqrt{2\pi}}\; e^{-\frac{(X-\mu)^{2}}{2\sigma^{2}}}$$
Statistics
Area Under Curve
• We may wish to find the area under the curve when, for example:
1. We want to find the number of students whose final semester grade falls between standard letter-grade cutoffs, and we have a collection of student scores.
2. We want to find the number of people who arrive at a fast food restaurant chain after 11 am, and we have the associated data.
3. Etc.
Statistics
Continuous Probability Distribution (aka. CRV)
• A function of a Continuous Random Variable that describes the likelihood that the variable falls within a given range of values, found as the integral of its density (prob. density) function over that range (i.e. the corresponding area under the f(x) curve).
– We shall calculate CRV probabilities over ranges
Statistics
Continuous Probability Distribution (aka. CRV)
• So we are seeking to find the area under some curve, y=f(x), bounded by the X-axis, between some values along the x-axis.
Statistics
Probability Density Function (cont. prob. dist.)
f(X) = PDF
[Figure: f(X) plotted against X, with points a and b marked on the X-axis]
Statistics
Cumulative Distribution Function – Cross Section
f(X) = PDF
$$\int_{-\infty}^{+\infty} f(X)\,dX = 1.0$$
• Sum under entire curve = 1.0
[Figure: f(X) plotted against X]
Statistics
Probability Density Function (cont. prob. dist.)
f(X) = PDF
$$P(a \le X \le b) = P(X \le b) - P(X \le a) = F(b) - F(a)$$
= Entire area under curve up to section (b) minus entire area under curve up to section (a)
• Sum under entire curve = 1.0
• Curve typically read left to right
[Figure: f(X) plotted against X, with the area between a and b marked]
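A minimal sketch of the F(b) - F(a) idea for a normal curve. The exam-score mean and SD are hypothetical, and the cumulative function is written with the error function from Python's `math` module, which is one common way to evaluate it.

```python
from math import erf, sqrt

def normal_cdf(x, mean=0.0, sd=1.0):
    """F(x) = P(X <= x): area under the normal curve to the left of x."""
    return 0.5 * (1.0 + erf((x - mean) / (sd * sqrt(2.0))))

def prob_between(a, b, mean=0.0, sd=1.0):
    """P(a <= X <= b) = F(b) - F(a)."""
    return normal_cdf(b, mean, sd) - normal_cdf(a, mean, sd)

# Hypothetical case: exam scores with mean 75 and SD 8.
# Share of scores expected between 80 and 90:
print(f"F(90) = {normal_cdf(90, 75, 8):.4f}")
print(f"F(80) = {normal_cdf(80, 75, 8):.4f}")
print(f"P(80 <= X <= 90) = {prob_between(80, 90, 75, 8):.4f}")
```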
Statistics
Cumulative Distribution Function
f(X) = PDF
$$P(X < t) = \int_{-\infty}^{t} f(X)\,dX = F(t)$$
[Figure: f(X) plotted against X, with the area F(t) to the left of t marked]
Statistics
Cumulative Distribution Function
f(X) = PDF
F(t) + R(t) = 1.0
[Figure: f(X) plotted against X, with the area F(t) to the left of t and the area R(t) to the right of t]
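A short sketch of the relationship F(t) + R(t) = 1.0, where R(t) is the area to the right of t. The normal curve and the value t = 1.96 are illustrative assumptions; the same complement rule applies to any continuous distribution.

```python
from math import erf, sqrt

def F(t, mean=0.0, sd=1.0):
    """Area to the left of t under a normal curve: P(X < t)."""
    return 0.5 * (1.0 + erf((t - mean) / (sd * sqrt(2.0))))

def R(t, mean=0.0, sd=1.0):
    """Area to the right of t: whatever is left of the total area of 1.0."""
    return 1.0 - F(t, mean, sd)

t = 1.96  # illustrative cut point on a standard normal curve
print(f"F({t}) = {F(t):.4f}")
print(f"R({t}) = {R(t):.4f}")
print(f"F + R = {F(t) + R(t):.4f}")
```

Finding an upper-tail probability therefore needs only the left-hand area: look up F(t) and subtract it from 1.0.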
Statistics
Normal Curve
• AKA, Gaussian distribution of CRV.
• Mean, Median, and Mode have approx. the same value.
– Associated with mean (μ) at center and dispersion (σ)
– X ~ N(μ, σ) [when a random variable x is distributed normally]
– Observations have equal likelihood on both sides of mean
*** When normally distributed, Mean is used to describe Central Tendency
• The graph of the associated probability density function is called “Bell Shaped”
Statistics
Normal Curve
• Developed from a frequency histogram: with increasing sample size and decreasing interval (bin) width, the associated curve becomes smooth.
• Typical of much data and distributions in reality.
• The basis for most quality control techniques, formulas, and assumptions.
• However, different Normal Distributions (pdf’s) can have varying means and SD’s.
– The means and SD’s are independent (i.e. the mean does not affect the SD, and vice versa)
Statistics
Various Normal Curves (Different means, common SD)
Statistics
Various Normal Curves (Different SD’s, common means)
Statistics
Various Normal Curves
Statistics
Standardized Normal Value
• There are infinitely many combinations of means and SD’s for normal curves.
– Thus, two normal curves with different means or SD’s will differ in location or shape.
• To find the area under any normal curve, we can use the two methods previously described (rectangles and integration).
– Or, we can use the Standard Normal Approach, thus using tables to find the area under the curve, and thus probabilities.
Standard Normal Distribution: N(0,1)
Statistics
Standardized Normal Value
• Standard Normal Distribution has a Mean=0 and a SD=1
• Standard Normal Transformation (z-Transformation) converts any normal distribution with any mean and any SD to a Standard Normal Distribution with mean 0 and SD 1
• Standard Normal Distribution is distributed in “z-score” units, along the associated x-axis. Z-score specifies the number of SD units a value is above or below the mean (i.e. z = +1 indicates a value 1 SD above the mean).
• A formula is used to convert a value, given its mean and SD, to a z-score.
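A minimal sketch of the z-Transformation, assuming the standard formula z = (x - mean) / SD. The mean of 100, SD of 15, and value of 130 are hypothetical, and the cumulative area is computed with the error function in place of a printed table.

```python
from math import erf, sqrt

def z_score(x, mean, sd):
    """Number of SD units that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

def standard_normal_cdf(z):
    """Area under the N(0,1) curve to the left of z."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

# Hypothetical case: a measurement of 130 from a normal
# distribution with mean 100 and SD 15.
mean, sd, x = 100.0, 15.0, 130.0
z = z_score(x, mean, sd)
print(f"z = {z:.2f}")
print(f"P(X <= {x}) = {standard_normal_cdf(z):.4f}")
```

Once a value is expressed as a z-score, the same N(0,1) areas apply regardless of the original mean and SD.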
Statistics
Normal Curve - Distribution of Data
Statistics
Standard Normal Curve - Distribution of Data (z-scores)
Statistics
Normal Curve - Distribution of Data
Statistics
Standard Normal Distribution (z-scores)
Statistics
Standardized Normal Value
Statistics
Normal distribution example
Statistics
Standard Normal Distribution example
Statistics
Standardized Normal Table
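A brief sketch of how a few rows of a standardized normal table can be generated. The layout is an assumption (rows for z to one decimal, columns for the second decimal, each entry the area to the left of z); the lecture's table may be arranged differently.

```python
from math import erf, sqrt

def standard_normal_cdf(z):
    """Area under the N(0,1) curve to the left of z."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def table_row(z_tenths, columns=10):
    """Areas for z = z_tenths + 0.00, 0.01, ..., 0.09 (one row of the table)."""
    return [standard_normal_cdf(z_tenths + j / 100.0) for j in range(columns)]

# Header: second decimal place of z.
print("  z  " + " ".join(f"{j / 100:6.2f}" for j in range(10)))
for tenths in range(0, 4):  # rows z = 0.0, 0.1, 0.2, 0.3
    z0 = tenths / 10.0
    print(f"{z0:4.1f} " + " ".join(f"{area:6.4f}" for area in table_row(z0)))
```

To read such a table, find the row for the first decimal of z and the column for the second decimal; the entry is the probability of a value at or below that z.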