ENGR 610 Applied Statistics
Fall 2007 - Week 1
Marshall University CITE
Jack Smith
http://mupfc.marshall.edu/~smith1106

Overview for Today
Syllabus
Introductions
Chapters 1-3
  Introduction to Statistics and Quality Improvement
  Tables and Charts
  Describing and Summarizing Data
Homework assignment

Syllabus (week, date, topic, chapters)
Week 1 (Aug 23) Introduction - Descriptive Statistics: Ch 1-3
Week 2 (Aug 30) Discrete Probability Distributions: Ch 4
Week 3 (Sept 6) Continuous Probability Distributions: Ch 5
Week 4 (Sept 13) Estimation Procedures: Ch 8
Week 5 (Sept 20) Review, Exam 1: Ch 1-5, 8
Week 6 (Sept 27) Hypothesis Testing: Ch 9
Week 7 (Oct 4) Hypothesis Testing: Ch 9
Week 8 (Oct 11) Design of Experiments: Ch 10
Week 9 (Oct 18) Design of Experiments: Ch 11
Week 10 (Oct 25) Review, Exam 2: Ch 9-11

Syllabus, cont’d
Week 11 (Nov 1) Simple Linear Regression: Ch 12
Week 12 (Nov 8) Multiple Regression: Ch 13
Week 13 (Nov 15) More Regression: Ch 13
Fall Break (Nov 22) (no class)
Week 14 (Nov 29) Review, Exam 3: Ch 12-13
Week 15 (Dec 6) (Exam 3 due)

Text: Levine, Ramsey, Smidt, “Applied Statistics for Engineers and Scientists: Using Microsoft Excel and MINITAB” (Prentice-Hall, 2001), with CD-ROM

Grading
25% - Homework and attendance
25% - Exam 1
25% - Exam 2
25% - Exam 3

Introductions
Name
Home town
Undergraduate degree, major, where
Major focus of study at MU
Occupation, if working
Background in statistics
Hopes for this course

Introduction to Statistics (Ch 1)
What is Statistics?
Variables
Operational Definitions
Sampling
Software

What is Statistics?
Descriptive Statistics: methods that lead to the collection, tabulation, summarization and presentation of data
Inferential Statistics: methods that lead to conclusions, or estimates of parameters, about a population (of size N) based on summary measures (statistics) on a sample (of size n), in lieu of a census

Why Statistics?
Describe numerical information
Draw conclusions on a large population from sample information only
Derive and test models
Understand and control variation
Improve quality of processes
Design experiments to extract maximum information
Predict or affect future behavior

Variables
Categorical
  Mutually exclusive
  Collectively exhaustive
Numerical
  Discrete or continuous
Scale
  Nominal
  Ordered
  Interval - equally spaced
  Ratio - with absolute zero

Operational Definitions
Objective, not subjective
Specific tests, measurements
Specific criteria
Agreed to by all
Consistent between individuals
Stable over time

Sampling
Advantages: cost, time, accuracy, feasibility, scope
Minimize destructive tests
Probability samples
  Simple random - with or without replacement
  Systematic random - random start, but constant increment or rate
Non-probability samples
  Convenience, judgment, quota (representative)

Software
Historical (mainframe, batch): SAS, SPSS,…
Specialized (workstations, stand-alone): SAS, SPSS, MINITAB, S-PLUS (R*), BMDP,…
Integrated (standard desktops): DataDesk, JMP, SYSTAT, MINITAB; Excel and add-ons (e.g., PHStat, from Prentice-Hall); MATLAB (Octave*)
*Open source

Introduction to Quality Improvement
Quality = fitness for use
Meeting user/customer needs, expectations, perceptions and experience
Quality of…
  Design - intentional differences, grades
  Conformance - meets/exceeds design
  Performance - long-term consistency

History of Quality Improvement
Middle Ages > Industrial Revolution > Information Age
Smith, Taylor, Ford, Shewhart, Deming
Read text!
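The Sampling slide distinguishes simple random sampling (with or without replacement) from systematic random sampling (random start, constant increment). The sketch below illustrates both on a numbered sampling frame; the function names are hypothetical, and it assumes the frame is a Python list and that the systematic interval is k = N // n.

```python
import random

def simple_random_sample(frame, n, replace=False, rng=None):
    """Simple random sample of n items from the frame.

    Without replacement each item can appear at most once;
    with replacement an item may be drawn repeatedly.
    """
    rng = rng or random.Random()
    if replace:
        return rng.choices(frame, k=n)
    return rng.sample(frame, n)

def systematic_sample(frame, n, rng=None):
    """Systematic random sample: random start, then a constant increment.

    Assumes the interval is k = N // n, so the last index used,
    start + (n - 1) * k, always stays inside the frame.
    """
    rng = rng or random.Random()
    k = len(frame) // n              # constant sampling interval
    start = rng.randrange(k)         # random start in [0, k)
    return [frame[start + i * k] for i in range(n)]

if __name__ == "__main__":
    frame = list(range(1, 101))      # e.g. 100 numbered parts
    rng = random.Random(42)          # fixed seed for a repeatable demo
    print(simple_random_sample(frame, 10, rng=rng))
    print(simple_random_sample(frame, 10, replace=True, rng=rng))
    print(systematic_sample(frame, 10, rng=rng))
```

The systematic sample is cheaper to draw in practice (every k-th item off a line), but if the process has a cycle that lines up with k, it can be far less representative than a simple random sample.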
Themes of Quality Improvement
The primary focus is on process improvement
Shewhart-Deming cycle: Plan, Do, Study, Act
Most of the variation in a process is systemic and not due to the individual
Teamwork is an integral part of a quality management organization
Customer satisfaction is the primary organizational goal
Organizational transformation needs to occur to implement quality management
Fear must be removed from organizations
Higher quality costs less, not more, but it requires an investment in training

Tables and Charts (Ch 2)
Process Flow Diagrams
Cause-and-Effect Diagrams
Time-Order Plots
Numerical Data
Concentration Diagrams
Categorical Data
Bivariate Categorical Data
Graphical Excellence

Process Flow Diagrams

Cause-and-Effect Diagrams
Also known as an Ishikawa or “fishbone” diagram
Cause categories (bones) leading to the Effect:
  Procedures or methods
  People or personnel
  Environment
  Materials or supplies
  Machinery or equipment

Time-Order Plots

Tables and Charts for Numerical Data
Stem-and-Leaf Displays - poor man’s histogram
Frequency Distribution - “binning” by range
Histogram
Polygon

Concentration Diagrams
Data points overlaid on a schematic or picture of the object or process of interest, by location
Displayed as individual symbols or tallies

Tables and Charts for Categorical Data
Bar Chart
Pie Chart - almost always in percentages
Pareto Diagram
  Sorted (usually descending)
  Overlaid with cumulative line (polygon) plot
  Separate scales
  Usually in percentages

Tables and Charts for Bivariate Categorical Data
Contingency Table
  Cross-classification
  Joint responses
  Percentages by row, column, total

          A    B    C    Total
  1       5    3    2    10
  2       2    3    4     9
  3       0    2    3     5
  Total   7    8    9    24

Side-by-Side (Cluster) Bar Chart
May prefer stacked bars with percentage data

Graphical Excellence
Tufte, “The Visual Display of Quantitative Information”
“Graphical excellence… gives the viewer the largest number of ideas, in the shortest time, with the least ink - clearly, precisely, efficiently, and truthfully”
Data-ink Ratio = (data-ink)/(total ink used in graphic)
Chartjunk - non-data or redundant “ink”
Lie Factor = (size of effect in graph)/(size of effect in data)

Describing and Summarizing Data
Descriptive Statistics (Ch 3)
Measures of…
  Central Tendency
  Variation
  Shape: Skewness, Kurtosis
Box-and-Whisker Plots

Measures of Central Tendency
Mean (arithmetic) - average value: $\mu = \frac{1}{N}\sum_{i=1}^{N} X_i$ (the sample mean $\bar{X}$ uses n in place of N)
Median - middle value, 50th percentile (2nd quartile)
Mode - most popular (peak) value(s); can be multi-modal
Midrange - (Max + Min)/2
Midhinge - (Q3 + Q1)/2, the average of the 1st and 3rd quartiles

Measures of Variation
Range (Max - Min)
Inter-Quartile Range (Q3 - Q1)
Variance
  Sum of squares (SS) of the deviations from the mean divided by the degrees of freedom (df) - see pp 113-5
  df = N, for the whole population: $\sigma^2 = \frac{1}{N}\sum_{i=1}^{N}(X_i - \mu)^2$
  df = n-1, for a sample: $s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar{X})^2$
  2nd moment about the mean (dispersion); the 1st moment about the mean is zero!
Standard Deviation
  Square root of variance (same units as the variable)
Sample ($s^2$, $s$, $n$) vs Population ($\sigma^2$, $\sigma$, $N$)

Quantiles
Equipartitions of the ranked array of observations
  Percentiles - 100
  Deciles - 10
  Quartiles - 4 (25%, 50%, 75%)
  Median - 2
Positions in the ordered array:
  $P_n$ = the $n(N+1)/100$-th ordered observation
  $D_n$ = the $n(N+1)/10$-th ordered observation
  $Q_n$ = the $n(N+1)/4$-th ordered observation
  Median = the $(N+1)/2$-th ordered observation; Median = Q2 = D5 = P50

Measures of Shape
Symmetry
Skewness - extended tail in one direction; 3rd moment about the mean
Kurtosis - flatness, peakedness; 4th moment about the mean
  Leptokurtic - highly peaked, long tails
  Mesokurtic - “normal”, triangular, short tails
  Platykurtic - broad, even
See p 118.

Box-and-Whisker Plots
Graphical representation of the five-number summary
  Min, Max (full range)
  Q1, Q3 (middle 50%)
  Median (50th %-ile)
Shows symmetry (skewness) of the distribution
See pp 123-5

Homework
Ch 1: Appendix 1.2; Problems: 1.25
Ch 2: Excel, Analysis ToolPak, PHStat add-in; Appendix 2.1; Problems: 2.54, 2.55, 2.61
Ch 3: Appendix 3.1; Problems: 3.27, 3.31 (data on CD)

Next Week
Probability and Discrete Probability Distributions (Ch 4)
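The Chapter 3 measures (central tendency, variation, quantile positions, shape, and the five-number summary) can be computed directly from their definitions. The sketch below is one way to do that, assuming quantiles are located at the n(N+1)/4-th ordered observation and that fractional positions are resolved by linear interpolation between neighbours; the textbook's own rounding rule for fractional positions may differ, so treat `describe` and its helpers as hypothetical illustrations rather than the course's prescribed method.

```python
import math
from collections import Counter

def value_at_position(sorted_x, pos):
    """Value at 1-based position pos in the ordered data.

    Assumption: fractional positions are interpolated linearly
    between the two neighbouring ordered observations.
    """
    n_obs = len(sorted_x)
    pos = min(max(pos, 1.0), float(n_obs))   # clamp to [1, N]
    lo = math.floor(pos)
    frac = pos - lo
    if frac == 0:
        return sorted_x[lo - 1]
    return sorted_x[lo - 1] + frac * (sorted_x[lo] - sorted_x[lo - 1])

def quartile(sorted_x, n):
    """Q_n located at the n(N+1)/4-th ordered observation (n = 1, 2, 3)."""
    return value_at_position(sorted_x, n * (len(sorted_x) + 1) / 4)

def describe(data, sample=True):
    """Descriptive statistics; assumes the data are not all identical."""
    x = sorted(data)
    n_obs = len(x)
    mean = sum(x) / n_obs
    q1, q2, q3 = (quartile(x, k) for k in (1, 2, 3))

    counts = Counter(x)
    top = max(counts.values())
    modes = sorted(v for v, c in counts.items() if c == top)

    ss = sum((xi - mean) ** 2 for xi in x)       # sum of squares about the mean
    df = n_obs - 1 if sample else n_obs          # degrees of freedom
    variance = ss / df

    # Shape: moment ratios built from population (divide-by-N) moments.
    m2 = ss / n_obs
    m3 = sum((xi - mean) ** 3 for xi in x) / n_obs
    m4 = sum((xi - mean) ** 4 for xi in x) / n_obs

    return {
        "mean": mean,
        "median": q2,
        "modes": modes,
        "midrange": (x[0] + x[-1]) / 2,
        "midhinge": (q1 + q3) / 2,
        "range": x[-1] - x[0],
        "iqr": q3 - q1,
        "variance": variance,
        "std_dev": math.sqrt(variance),
        "skewness": m3 / m2 ** 1.5,              # > 0: tail extends to the right
        "kurtosis": m4 / m2 ** 2,                # equals 3 for a normal distribution
        "five_number": (x[0], q1, q2, q3, x[-1]),
    }

if __name__ == "__main__":
    for key, value in describe([2, 3, 3, 5, 7, 8, 14]).items():
        print(key, value)
```

Passing `sample=False` switches the divisor from n-1 to N, matching the population definition of variance. The kurtosis here is the plain 4th-moment ratio, not "excess" kurtosis (which subtracts 3).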