MAT 135
Introductory Statistics and Data Analysis
Adjunct Instructor
Kenneth R. Martin
Lecture 7
October 12, 2016
Confidential - Kenneth R. Martin
Agenda
• Housekeeping
– Readings
– Exam #1 review
• Chapter 1, 14, 10, 2, & 3
Housekeeping
• Read, Chapter 1.1 – 1.4
• Read, Chapter 14.1 – 14.2
• Read, Chapter 10.1
• Read, Chapter 2
• Read, Chapter 3
Housekeeping
• Exam #1 Review
Statistics – Application to Research
Statistics
• Why collect samples ?
[Diagram: Population and Sample. A sampling scheme draws a SAMPLE from the POPULATION; the sample is measured to produce data. Use data from the SAMPLE to make conclusions about the POPULATION.]
→ Often impractical to collect all the data from the entire population (e.g. U.S. census).
→ Some test methods are destructive: we wouldn’t have any products or services left to ship to a customer!
→ Too expensive to sample the entire population.
→ Don’t have to collect 100% of the population! We can use inferential statistics to make sound conclusions about the population.
Statistics
Describing the Data
• Two methods to summarize the data:
– Graphical - Histogram
– Analytical - Central Tendency
Statistics
Central Tendency
• A statistical measure that describes the central value around which the data is distributed. It includes the Mean, Median, and Mode.
– However, Central Tendency does not tell us about data Variation / spread.
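As a minimal sketch, the three measures can be computed with Python's standard `statistics` module. The data set `scores` is hypothetical and used only for illustration; it does not come from the lecture.

```python
import statistics

# Hypothetical data set, used only for illustration.
scores = [2, 3, 3, 5, 7]

mean = statistics.mean(scores)      # arithmetic average: sum / count
median = statistics.median(scores)  # middle value of the sorted data
mode = statistics.mode(scores)      # most frequently occurring value

print(mean, median, mode)
```

Note that the three values need not agree: here the mean is pulled above the median and mode by the larger values.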
Statistics
Relationship of Central Tendency
*** Normal distribution: Mean = Median = Mode
Statistics
Frequency Distributions
Statistics
Various curves (Different data spreads, common means)
Statistics
Various curves (Different means, common data spreads)
Statistics
Various Normal Curves
Statistics
Measures of Variability - how the data is spread from its central value
• The central tendency does not indicate any levels of variability (dispersion) from the mean.
A = {100, 200, 300, 400, 500}
B = {50, 150, 300, 450, 550}
C = {250, 300, 300, 300, 350}
• All three data sets have the same mean and median (300), but the variability of the data is different in each data set.
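A short check of this claim, sketched with Python's standard `statistics` module. The variable names are illustrative; the data sets A, B, and C are the ones shown on the slide.

```python
import statistics

data_sets = {
    "A": [100, 200, 300, 400, 500],
    "B": [50, 150, 300, 450, 550],
    "C": [250, 300, 300, 300, 350],
}

# Mean and median for each data set.
means = {name: statistics.mean(values) for name, values in data_sets.items()}
medians = {name: statistics.median(values) for name, values in data_sets.items()}

for name in data_sets:
    print(name, "mean =", means[name], "median =", medians[name])
```

Every data set reports a mean and median of 300, so central tendency alone cannot tell them apart.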
Statistics
Measures of Variability:
• Can be values from 0 to ∞ (infinity)
– 0 means no variability of data
– A large value indicates lots of variability of data
• Values can never be negative
• As soon as one value in a data set differs from another, variability exists
Statistics
Measures of Variability (Dispersion) - Range
Range (R) = Max. value – Min. value
= X_H – X_L
→ As data set size increases, the accuracy of using range decreases.
→ Limit the usage of Range to ~ 10 readings.
Statistics
Measures of Variability – Range Example
A = {100, 200, 300, 400, 500}
B = {50, 150, 300, 450, 550}
C = {250, 300, 300, 300, 350}
RA = ?
RB = ?
RC = ?
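One way to check the three answers is a small Python sketch of the range formula R = X_H – X_L. The function name `data_range` is an illustrative choice, not something from the lecture.

```python
def data_range(values):
    """Range R = X_H - X_L: largest value minus smallest value."""
    return max(values) - min(values)

A = [100, 200, 300, 400, 500]
B = [50, 150, 300, 450, 550]
C = [250, 300, 300, 300, 350]

R_A = data_range(A)
R_B = data_range(B)
R_C = data_range(C)

print(R_A, R_B, R_C)
```

Each result depends only on the two extreme values of its data set.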
Statistics
Measures of Variability
• So what is the limitation of all three of these Range
calculations ?
Statistics
Measures of Variability (Dispersion) - Variance
• Variance: a measure of variability equal to the average squared distance that data points deviate from their mean.
• Variance calculations include all data points.
Statistics
Measures of Variability (Dispersion) - Variance
• Sum of Squares (SS): the sum (addition) of the squared deviations of values from their mean. The SS is the numerator of the variance formula.
Variance, σ², for the Population. μ is the population average and N is the population size:
σ² = SS / N = Σ(X – μ)² / N
Variance, S², for a Sample. M is the sample average and n is the sample size:
S² = SS / (n – 1) = Σ(X – M)² / (n – 1)
Statistics
Variance - Example
A = {100, 200, 300, 400, 500}
• In this case, notice that the SS of
both the population and the sample
will be the same
• Remember: PREMDAS
• What is 2 ?
• What is S2 ?
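The calculation for data set A can be sketched in Python as follows. It assumes the usual definitions: SS divided by N for a population and by n – 1 for a sample. The function names are illustrative.

```python
def sum_of_squares(values):
    """SS: sum of squared deviations of the values from their mean."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** 2 for x in values)

def population_variance(values):
    """Population variance: SS / N."""
    return sum_of_squares(values) / len(values)

def sample_variance(values):
    """Sample variance: SS / (n - 1)."""
    return sum_of_squares(values) / (len(values) - 1)

A = [100, 200, 300, 400, 500]

ss = sum_of_squares(A)
pop_var = population_variance(A)
samp_var = sample_variance(A)

print(ss, pop_var, samp_var)
```

The SS is identical in both cases; only the denominator changes, so the sample variance comes out larger than the population variance.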
Statistics
Measures of Variability (Dispersion) - Variance
• What is a big limitation with Variance ?
– What do you notice about the units of the mean, and the
units of Variance ?
Statistics
Measures of Dispersion – Standard Deviation
• Also called the Root Mean Square deviation, it is a measure of the spread (variability) of the data: roughly, the average distance that data points deviate from their mean.
• Calculated by taking the square root of the Variance
Statistics
Measures of Dispersion – Standard Deviation
• When the data comes from the “population”, we shall use “σ” (sigma) to denote the Standard Deviation.
– The mean value will be represented by the Greek symbol μ (mu)
– The denominator does not have “uncertainty”, thus N
• When the data comes from a “sample”, we shall use “SD” to denote the Standard Deviation.
– The mean value will be represented by M or X̄ (X-bar)
– The denominator shows “uncertainty”, thus n-1
Statistics
Measures of Dispersion – Standard Deviation
• We typically want the standard deviation (variance) value to be as small as possible.
– We typically want to minimize variability!
→ Standard deviation generally describes the data distribution more precisely than range, because it uses every data point.
• Other formulas exist for Standard Deviation, but will not be covered.
Statistics
Standard Deviation - Example
A = {100, 200, 300, 400, 500}
• What do we notice about the units
of Standard Deviation and the units
of the mean ?
→ The Mean and Standard Deviation are typically reported together.
Statistics
Standard Deviation - Example
B = {50, 150, 300, 450, 550}
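Both standard deviation examples can be worked through with a short Python sketch. It assumes the definitions given earlier: the square root of SS / N for a population (σ) and of SS / (n – 1) for a sample (SD). The function name `std_dev` and its `sample` flag are illustrative.

```python
import math

def std_dev(values, sample=True):
    """Square root of the variance; divides SS by n-1 (sample) or N (population)."""
    mean = sum(values) / len(values)
    ss = sum((x - mean) ** 2 for x in values)
    divisor = len(values) - 1 if sample else len(values)
    return math.sqrt(ss / divisor)

A = [100, 200, 300, 400, 500]
B = [50, 150, 300, 450, 550]

sigma_A = std_dev(A, sample=False)  # population standard deviation of A
sd_A = std_dev(A, sample=True)      # sample standard deviation of A
sigma_B = std_dev(B, sample=False)  # population standard deviation of B
sd_B = std_dev(B, sample=True)      # sample standard deviation of B

print(round(sigma_A, 2), round(sd_A, 2))  # 141.42 158.11
print(round(sigma_B, 2), round(sd_B, 2))  # 184.39 206.16
```

The results are in the same units as the data and the mean, which is why the two are convenient to report together.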
Statistics
Measures of Dispersion – Coefficient of Variation
• CVar – Allows a comparison of standard deviations when the units of measure are not the same
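The slide's formula did not survive extraction, so the sketch below assumes the usual definition: CVar = (standard deviation / mean) × 100%. The two data sets are hypothetical, chosen only to show a comparison across different units.

```python
import statistics

def coefficient_of_variation(values):
    """Assumed definition: sample standard deviation / mean, as a percentage."""
    return statistics.stdev(values) / statistics.mean(values) * 100

# Hypothetical measurements in different units.
lengths_mm = [10, 12, 14]
weights_g = [100, 150, 200]

cv_length = coefficient_of_variation(lengths_mm)
cv_weight = coefficient_of_variation(weights_g)

print(round(cv_length, 1), round(cv_weight, 1))  # 16.7 33.3
```

Because the units cancel, the two percentages can be compared directly: the weights vary more, relative to their mean, than the lengths do.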
Statistics
Coefficient of Variation - Example
Statistics
Box and Whisker Plot – Boxplot
• Simple graphical tool to summarize data.
• Need to determine 5 values (five-number summary) from data, to generate a boxplot:
1. Median (2nd Quartile)
2. Maximum data value [whisker end]
3. Minimum data value [whisker end]
4. 1st Quartile (value below which 1/4 of the observations fall) [box end]
5. 3rd Quartile (value below which 3/4 of the observations fall) [box end]
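A minimal Python sketch of the five-number summary follows. Several quartile conventions exist; this one takes the median of the lower and upper halves of the sorted data, which is an assumption and may differ slightly from the method used in class. The data set is hypothetical.

```python
import statistics

def five_number_summary(values):
    """Return min, Q1, median, Q3, max using the median-of-halves convention."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]          # values below the median position
    upper = data[half + n % 2:]  # values above the median position
    return {
        "min": data[0],
        "q1": statistics.median(lower),
        "median": statistics.median(data),
        "q3": statistics.median(upper),
        "max": data[-1],
    }

# Hypothetical data, used only for illustration.
summary = five_number_summary([1, 2, 3, 4, 5, 6, 7, 8, 9])
print(summary)
```

The box is drawn from Q1 to Q3 with a line at the median, and the whiskers run out to the minimum and maximum.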
Statistics
Box and Whisker Plot – Boxplot Example
• Process aim = 9.0 minutes
• Spec = ± 1.5 minutes
• n = 125
• R = 1.7
Statistics
Box and Whisker Plot - Boxplot Example
• Inside box is the median value, and approximately 50% of observations
• Whiskers extend from the box to extreme values
• Example:
1. Median; n=125: Median = 63rd value = 9.8
2. Max = 10.7
3. Min = 9.0
4. 1st Quartile = X at position 125 * 0.25 ~ average of the 31st & 32nd values = 9.6
5. 3rd Quartile = X at position 125 * 0.75 ~ average of the 94th & 95th values = 10.0
Statistics
Box and Whisker Plot - Boxplot Example
[Boxplot: Min = 9.0, Q1 = 9.6, Q2 (median) = 9.8, Q3 = 10.0, Max = 10.7]
• Long Whiskers denote the existence of values much larger than other values.
• For this example, mean ≠ median.
• Other variants exist, e.g. whisker ends at ± 1.5*IQR beyond the box; all other points are “outliers”, depicted as asterisks
– IQR = Interquartile Range
Statistics
Box and Whisker Plot - Boxplot Example
Statistics
Measures of Variability (Dispersion) - IQR
• IQR – Interquartile Range
IQR = Q3 – Q1
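As a sketch, the IQR for the boxplot example (Q1 = 9.6, Q3 = 10.0) can be computed in Python, together with the ± 1.5*IQR whisker limits used by the boxplot variant mentioned earlier. The variable names are illustrative; results are rounded when printed because of floating-point arithmetic.

```python
# Quartiles from the boxplot example in this lecture.
q1 = 9.6
q3 = 10.0

iqr = q3 - q1                  # IQR = Q3 - Q1
lower_fence = q1 - 1.5 * iqr   # lower whisker limit in the 1.5*IQR variant
upper_fence = q3 + 1.5 * iqr   # upper whisker limit in the 1.5*IQR variant

print(round(iqr, 2), round(lower_fence, 2), round(upper_fence, 2))  # 0.4 9.0 10.6
```

Unlike the range, the IQR ignores the extreme values and describes only the spread of the middle 50% of the data.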
Statistics
Box and Whisker Plot - Boxplot Example
[Boxplot: Min = 9.0, Q1 = 9.6, Q2 = 9.8, Q3 = 10.0, Max = 10.7]
• For this example, IQR = ?