Download ENGR 610 Applied Statistics Fall 2005

ENGR 610
Applied Statistics
Fall 2007 - Week 1
Marshall University
CITE
Jack Smith
http://mupfc.marshall.edu/~smith1106
Overview for Today
Syllabus
Introductions
Chapters 1-3
  Introduction to Statistics and Quality Improvement
  Tables and Charts
  Describing and Summarizing Data
Homework assignment
Syllabus
Week 1 (Aug 23): Introduction - Descriptive Statistics (Ch 1-3)
Week 2 (Aug 30): Discrete Probability Distributions (Ch 4)
Week 3 (Sept 6): Continuous Probability Distributions (Ch 5)
Week 4 (Sept 13): Estimation Procedures (Ch 8)
Week 5 (Sept 20): Review, Exam 1 (Ch 1-5, 8)
Week 6 (Sept 27): Hypothesis Testing (Ch 9)
Week 7 (Oct 4): Hypothesis Testing (Ch 9)
Week 8 (Oct 11): Design of Experiments (Ch 10)
Week 9 (Oct 18): Design of Experiments (Ch 11)
Week 10 (Oct 25): Review, Exam 2 (Ch 9-11)
Syllabus, cont’d
Week 11 (Nov 1): Simple Linear Regression (Ch 12)
Week 12 (Nov 8): Multiple Regression (Ch 13)
Week 13 (Nov 15): More Regression (Ch 13)
Fall Break (Nov 22): (no class)
Week 14 (Nov 29): Review, Exam 3 (Ch 12-13)
Week 15 (Dec 6): (Exam 3 due)
Text -- Levine, Ramsey, Smidt, “Applied Statistics for Engineers and Scientists: Using Microsoft Excel and MINITAB” (Prentice-Hall, 2001) - with CD-ROM
Grading




25% - Homework and attendance
25% - Exam 1
25% - Exam 2
25% - Exam 3
Introductions







Name
Home town
Undergraduate degree, major, where
Major focus of study at MU
Occupation, if working
Background in statistics
Hopes for this course
Introduction to Statistics (Ch 1)





What is Statistics?
Variables
Operational Definitions
Sampling
Software
What is Statistics?
Descriptive Statistics
  Methods that lead to the collection, tabulation, summarization and presentation of data
Inferential Statistics
  Methods that lead to conclusions, or estimates of parameters, about a population (of size N) based on summary measures (statistics) on a sample (of size n) - in lieu of a census
Why Statistics?
Describe numerical information
Draw conclusions on a large population from sample information only
Derive and test models
Understand and control variation
Improve quality of processes
Design experiments to extract maximum information
Predict or affect future behavior
Variables
Categorical
  Nominal
    Mutually exclusive
    Collectively exhaustive
Numerical
  Discrete or Continuous
  Scale
    Ordered
    Interval - equally spaced
    Ratio - with absolute zero
Operational Definitions






Objective, not subjective
Specific tests, measurements
Specific criteria
Agreed to by all
Consistent between individuals
Stable over time
Sampling
Advantages
  Cost, time, accuracy, feasibility, scope
  Minimize destructive tests
Probability samples
  Simple random
    With or without replacement
  Systematic random
    Random start, but constant increment or rate
Non-probability samples
  Convenience, Judgment, Quota (representative)
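The two probability sampling schemes above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the 100-item sampling frame, and the seed are hypothetical, and the systematic sampler assumes the frame size is at least the sample size.

```python
import random

def simple_random_sample(population, n, replace=False, seed=None):
    """Simple random sample of n items, with or without replacement."""
    rng = random.Random(seed)
    if replace:
        # With replacement: the same item may be drawn more than once.
        return [rng.choice(population) for _ in range(n)]
    # Without replacement: every item appears at most once.
    return rng.sample(population, n)

def systematic_sample(population, n, seed=None):
    """Systematic random sample: random start, then a constant increment."""
    rng = random.Random(seed)
    k = len(population) // n      # constant increment (sampling interval)
    start = rng.randrange(k)      # random start within the first interval
    return [population[start + i * k] for i in range(n)]

# Hypothetical sampling frame: items numbered 1..100.
frame = list(range(1, 101))
srs = simple_random_sample(frame, 10, seed=1)
with_replacement = simple_random_sample(frame, 10, replace=True, seed=1)
systematic = systematic_sample(frame, 10, seed=1)
print(srs)
print(systematic)
```

In the systematic sample every selected item is exactly k = 10 positions after the previous one; only the starting point is random.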
Software
Historical (mainframe, batch)
  SAS, SPSS,…
Specialized (workstations, stand-alone)
  SAS, SPSS, MINITAB, S-PLUS (R*), BMDP,…
Integrated (standard desktops)
  DataDesk, JMP, SYSTAT, MINITAB
  Excel, add-ons (e.g., PHStat - from Prentice-Hall)
  MATLAB (Octave*)
*Open Source
Introduction to Quality Improvement
Quality = fitness for use
  Meeting user/customer needs, expectations, perceptions and experience
Quality of…
  Design - intentional differences, grades
  Conformance - meets/exceeds design
  Performance - long-term consistency
History of Quality Improvement
Middle Ages > Industrial Revolution > Information Age
Smith, Taylor, Ford, Shewhart, Deming
Read text!
Themes of Quality Improvement
The primary focus is on process improvement
  Shewhart-Deming cycle: Plan, Do, Study, Act
Most of the variation in a process is systemic and not due to the individual
Teamwork is an integral part of a quality-management organization
Customer satisfaction - primary organizational goal
Organizational transformation needs to occur to implement quality management
Fear must be removed from organizations
Higher quality costs less, not more, but it requires an investment in training
Tables and Charts (Ch 2)








Process Flow Diagrams
Cause-and-Effect Diagrams
Time-Order Plots
Numerical Data
Concentration Diagrams
Categorical Data
Bivariate Categorical Data
Graphical Excellence
Process Flow Diagrams
Cause-and-Effect Diagrams
Also known as an Ishikawa or a “fishbone” diagram
[Figure: fishbone diagram with cause branches (Procedures or methods; People or personnel; Environment; Materials or supplies; Machinery or equipment) leading to the Effect]
Time-Order Plots
Tables and Charts for Numerical Data
Stem-and-Leaf Displays
  Poor man’s histogram
Frequency Distribution
  “Binning” by range
  Histogram
  Polygon
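As a rough sketch of the two tabulations above, the Python below builds a stem-and-leaf display and a binned frequency distribution. The data values, function names, and bin settings are made up for illustration, and the stem-and-leaf helper assumes non-negative integers with the tens as the stem and the units digit as the leaf.

```python
def stem_and_leaf(values):
    """Group integers by stem (tens) with sorted leaves (units digit)."""
    rows = {}
    for v in sorted(values):
        rows.setdefault(v // 10, []).append(v % 10)
    return rows

def frequency_distribution(values, low, width, bins):
    """Count values in equal-width bins [low, low+width), [low+width, ...)."""
    counts = [0] * bins
    for v in values:
        idx = int((v - low) // width)
        if 0 <= idx < bins:      # values outside the binned range are ignored
            counts[idx] += 1
    return counts

# Hypothetical observations.
data = [12, 15, 21, 23, 23, 28, 31, 34, 37, 45]

display = stem_and_leaf(data)
for stem, leaves in display.items():
    print(f"{stem} | {' '.join(str(leaf) for leaf in leaves)}")

counts = frequency_distribution(data, low=10, width=10, bins=4)
print(counts)
```

The leaf counts per stem match the bin counts, which is why the stem-and-leaf display reads as a quick histogram turned on its side.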
Concentration Diagrams



Data points overlaid on schematic or
picture of object or process of interest
By location
Displayed as individual symbols or
tallies
Tables and Charts for Categorical Data
Bar Chart
Pie Chart
  Almost always in percentages
Pareto Diagram
  Sorted (usually descending)
  Overlaid with cumulative line (polygon) plot
  Separate scales
  Usually in percentages
Examples
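The numbers behind a Pareto diagram can be tabulated as below: categories sorted in descending order, each with its percentage and the running cumulative percentage that the overlaid line would follow. The defect categories and counts are invented for illustration, and `pareto_table` is a hypothetical helper name.

```python
def pareto_table(counts):
    """Return (category, percent, cumulative percent) rows, largest first."""
    total = sum(counts.values())
    ordered = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    rows = []
    cumulative = 0
    for category, count in ordered:
        cumulative += count
        rows.append((category, 100 * count / total, 100 * cumulative / total))
    return rows

# Hypothetical defect tallies.
defects = {"scratch": 10, "dent": 25, "misalignment": 50, "other": 15}

rows = pareto_table(defects)
for category, pct, cum_pct in rows:
    print(f"{category:<14}{pct:6.1f}%{cum_pct:8.1f}%")
```

The bars would use the percent column and the cumulative line the last column, each on its own scale.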
Tables and Charts for Bivariate Categorical Data
Contingency Table
  Cross-classification
  Joint responses
  Percentages by row, column, total

          A   B   C   Total
  1       5   3   2    10
  2       2   3   4     9
  3       0   2   3     5
  Total   7   8   9    24

Side-by-Side (Cluster) Bar Chart
  May prefer stacked bars with percentage data
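Using the 3-by-3 counts from the contingency table above, the three kinds of percentages (by row, by column, by grand total) could be computed as in this sketch. The variable names are illustrative only.

```python
# Cell counts from the contingency table: rows 1-3, columns A-C.
table = [
    [5, 3, 2],
    [2, 3, 4],
    [0, 2, 3],
]

row_totals = [sum(row) for row in table]          # margins on the right
col_totals = [sum(col) for col in zip(*table)]    # margins along the bottom
grand_total = sum(row_totals)

# Each cell as a percentage of its row total, column total, or grand total.
pct_by_row = [[100 * x / rt for x in row] for row, rt in zip(table, row_totals)]
pct_by_col = [[100 * x / ct for x, ct in zip(row, col_totals)] for row in table]
pct_of_total = [[100 * x / grand_total for x in row] for row in table]

print(row_totals, col_totals, grand_total)
for row in pct_by_row:
    print([round(p, 1) for p in row])
```

Row percentages sum to 100 across each row, column percentages down each column, and total percentages over the whole table.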
Graphical Excellence
Tufte, “The Visual Display of Quantitative Information”
  Graphical excellence… gives the viewer the largest number of ideas, in the shortest time, with the least ink - clearly, precisely, efficiently, and truthfully
Data-ink Ratio
  (data-ink)/(total ink used in graphic)
Chartjunk
  Non-data or redundant “ink”
Lie Factor
  (size of effect in graph)/(size of effect in data)
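A small worked example of the lie factor ratio, assuming "size of effect" is measured as the relative change between two values. The numbers are hypothetical: the data rise from 10 to 15, while the bars representing them are drawn with heights 1.0 and 3.0.

```python
def effect_size(first, second):
    """Relative change from the first value to the second (an assumed measure)."""
    return abs(second - first) / abs(first)

def lie_factor(graph_first, graph_second, data_first, data_second):
    """(size of effect in graph) / (size of effect in data)."""
    return effect_size(graph_first, graph_second) / effect_size(data_first, data_second)

# Hypothetical chart: data change of 50% drawn as a 200% change in bar height.
factor = lie_factor(1.0, 3.0, 10, 15)
print(factor)
```

A lie factor near 1 means the graphic shows the effect at its true size; values well above or below 1 indicate exaggeration or understatement.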
Describing and Summarizing Data - Descriptive Statistics (Ch 3)
Measures of…
  Central Tendency
  Variation
  Shape
    Skewness
    Kurtosis
Box-and-Whisker Plots
Measures of Central Tendency
Mean (arithmetic)
  Average value: $\frac{1}{N}\sum_{i=1}^{N} X_i$
Median
  Middle value - 50th percentile (2nd quartile)
Mode
  Most popular (peak) value(s) - can be multi-modal
Midrange
  (Max+Min)/2
Midhinge
  (Q3+Q1)/2 - average of 1st and 3rd quartiles
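The five measures above can be computed with Python's standard library, as in this sketch. The data set is hypothetical, and the quartiles use the `statistics.quantiles` "exclusive" method, which places quantiles at (N+1)-based positions with linear interpolation; the textbook's rounding rules for quartile positions may give slightly different values on other data.

```python
import statistics

# Hypothetical sample of N = 7 observations.
data = [2, 4, 4, 5, 7, 9, 11]

mean = sum(data) / len(data)                 # (1/N) * sum of X_i
median = statistics.median(data)             # middle value, 50th percentile
modes = statistics.multimode(data)           # list, since data can be multi-modal
midrange = (max(data) + min(data)) / 2       # (Max+Min)/2

# Quartiles at positions k(N+1)/4 in the ordered data (interpolated if needed).
q1, q2, q3 = statistics.quantiles(data, n=4, method="exclusive")
midhinge = (q3 + q1) / 2                     # (Q3+Q1)/2

print(mean, median, modes, midrange, midhinge)
```

Here the mean (6.0) sits above the median (5), a first hint that the data have a longer tail on the high side.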
Measures of Variation
Range (max-min)
Inter-Quartile Range (Q3-Q1)
Variance
  Sum of squares (SS) of the deviation from mean divided by the degrees of freedom (df) - see pp 113-5
    df = N, for the whole population
    df = n-1, for a sample
  2nd moment about the mean (dispersion)
  (1st moment about the mean is zero!)
Standard Deviation
  Square root of variance (same units as variable)
  Sample ($s^2$, $s$, $n$) vs Population ($\sigma^2$, $\sigma$, $N$)
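The difference between the population and sample versions is only the degrees of freedom used as the divisor. A minimal sketch follows; the data values and function names are hypothetical.

```python
import math

def sum_of_squares(values):
    """SS: sum of squared deviations from the mean."""
    m = sum(values) / len(values)
    return sum((x - m) ** 2 for x in values)

def variance(values, population=False):
    """SS / df, with df = N for a population and df = n - 1 for a sample."""
    df = len(values) if population else len(values) - 1
    return sum_of_squares(values) / df

# Hypothetical observations (mean = 5, SS = 32).
data = [2, 4, 4, 4, 5, 5, 7, 9]
mean = sum(data) / len(data)

first_moment = sum(x - mean for x in data) / len(data)   # always zero
pop_var = variance(data, population=True)                # sigma^2 = 32 / 8
sample_var = variance(data)                              # s^2 = 32 / 7
pop_sd = math.sqrt(pop_var)                              # sigma
sample_sd = math.sqrt(sample_var)                        # s

print(first_moment, pop_var, sample_var, pop_sd, sample_sd)
```

Dividing by n-1 rather than N makes the sample variance slightly larger, and the gap shrinks as the sample grows.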
Quantiles
Equipartitions of ranked array of observations
  Percentiles - 100
  Deciles - 10
  Quartiles - 4 (25%, 50%, 75%)
  Median - 2
Pn = n(N+1)/100 -th ordered observation
Dn = n(N+1)/10 -th ordered observation
Qn = n(N+1)/4 -th ordered observation
Median = (N+1)/2 -th ordered observation = Q2 = D5 = P50
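The positional rule above can be turned into code as follows. One assumption is added that the slide does not state: when the position k(N+1)/parts is not a whole number, this sketch interpolates linearly between the two neighbouring ordered observations (the textbook may instead round the position). The function names and data are hypothetical.

```python
def value_at_position(ordered, position):
    """Value at a 1-based, possibly fractional, position in ordered data."""
    n = len(ordered)
    if position <= 1:
        return ordered[0]
    if position >= n:
        return ordered[-1]
    lower = int(position)            # whole part of the position
    frac = position - lower          # fractional part, used to interpolate
    return ordered[lower - 1] + frac * (ordered[lower] - ordered[lower - 1])

def quantile(values, k, parts):
    """k-th quantile when the ranked data are split into `parts` equal parts."""
    ordered = sorted(values)
    position = k * (len(ordered) + 1) / parts
    return value_at_position(ordered, position)

# Hypothetical sample of N = 7 observations: positions 2, 4, 6 for Q1, Q2, Q3.
data = [11, 4, 2, 9, 5, 7, 4]
q1, q2, q3 = (quantile(data, k, 4) for k in (1, 2, 3))
d5 = quantile(data, 5, 10)
p50 = quantile(data, 50, 100)
print(q1, q2, q3, d5, p50)
```

The median, Q2, D5 and P50 all land on the same position, (N+1)/2, so they return the same value.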
Measures of Shape
Symmetry
  Skewness - extended tail in one direction
  3rd moment about the mean
Kurtosis
  Flatness, peakedness
    Leptokurtic - highly peaked, long tails
    Mesokurtic - “normal”, triangular, short tails
    Platykurtic - broad, even
  4th moment about the mean
See p 118.
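One common way to turn the 3rd and 4th moments into shape measures is to standardize them by powers of the 2nd moment, as sketched below. This is the simple population-moment form, chosen here as an assumption; software such as Excel applies small-sample corrections, so its values will generally differ. The data sets and function names are hypothetical.

```python
def central_moment(values, k):
    """k-th moment about the mean, dividing by N."""
    m = sum(values) / len(values)
    return sum((x - m) ** k for x in values) / len(values)

def skewness(values):
    """Standardized 3rd moment: 0 if symmetric, > 0 for a long right tail."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5

def kurtosis(values):
    """Standardized 4th moment (about 3 for a normal distribution)."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2

symmetric = [1, 2, 3, 4, 5]       # hypothetical, evenly spread
right_tailed = [1, 1, 1, 2, 10]   # hypothetical, extended tail to the right

print(skewness(symmetric), kurtosis(symmetric))
print(skewness(right_tailed), kurtosis(right_tailed))
```

The evenly spread data give zero skewness and a low kurtosis (broad, flat), while the right-tailed data give positive skewness.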
Box-and-Whisker Plots
Graphical representation of five-number summary
  Min, Max (full range)
  Q1, Q3 (middle 50%)
  Median (50th %-ile)
See pp 123-5
Shows symmetry (skewness) of distribution
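The five numbers behind a box-and-whisker plot can be collected as in this sketch, which also compares the two halves of the box as a rough symmetry check. The data are hypothetical, `five_number_summary` is an illustrative name, and the quartiles again rely on the (N+1)-based "exclusive" method of `statistics.quantiles`.

```python
import statistics

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max)."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="exclusive")
    return min(values), q1, median, q3, max(values)

# Hypothetical sample.
data = [2, 4, 4, 5, 7, 9, 11]

low, q1, median, q3, high = five_number_summary(data)
print(low, q1, median, q3, high)

# Rough symmetry check: compare the two halves of the box and the two whiskers.
lower_box, upper_box = median - q1, q3 - median
lower_whisker, upper_whisker = q1 - low, high - q3
right_skewed = upper_box > lower_box
print(lower_box, upper_box, lower_whisker, upper_whisker, right_skewed)
```

When the upper half of the box is longer than the lower half, as here, the plot suggests a tail extending toward larger values.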
Homework

Ch 1

Appendix 1.2



Problems: 1.25
Ch 2



Excel, Analysis ToolPak, PHStat add-in
Appendix 2.1
Problems: 2.54, 2.55, 2.61
Ch 3


Appendix 3.1
Problems: 3.27, 3.31 (data on CD)
Next Week
Probability and Discrete Probability Distributions (Ch 4)