Introduction to Engineering, Fall 2006
Lecture 13: Statistics

Review
Introduction to Dimensions & Units
Other Systems
Dimensions in Equations

Review - Definitions
Dimensions are properties that can be measured, such as length, time, mass, and temperature, or calculated by multiplying or dividing other dimensions, such as velocity (length/time).
Units are means of expressing the dimensions, such as feet or meters for length and hours or seconds for time.
Every valid equation must be dimensionally homogeneous: that is, all additive terms on both sides of the equation must have the same dimensions.

Review - Important???

Outline
Introduction to Statistics
Describing Data
Measures of Central Tendency
Statistics in MatLab

Introduction to Statistics

Introduction 1
Statistics is the science of collecting, analyzing, and drawing conclusions from data.
Population: the collection of all responses, measurements, or counts that are of interest.
Sample: a portion or subset of the population.

Introduction 2
Parameter: a number that describes a population characteristic. For example, the average age of all people in the US.
Statistic: a number that describes a sample characteristic. For example, the average age of people from a sample of three states.

Branches of Statistics
Inferential Statistics: involves using sample data to draw conclusions about a population.
Descriptive Statistics: involves organizing, summarizing, and displaying data.

Use in Engineering
Engineers are often asked to draw conclusions using uncertain, inconsistent, or incomplete sets of data.
Statistics is also useful to describe and understand the variability in the data that could come from differences in process variables such as temperature or time.
Applications in Engineering
Statistical signal processing
Communications
Systems and control
Decision and resource allocation under uncertainty
Reliability (dealing with noise, error control, failures)
Thermodynamics

Describing Data

Example Data
The amount of time (in seconds) that 25 jobs were in control of a large mainframe computer’s central processing unit (CPU):

0.02  0.75  1.17  1.61  2.59
0.15  0.82  1.23  1.94  3.07
0.19  0.92  1.38  2.01  3.35
0.47  0.96  1.40  2.16  3.76
0.71  1.16  1.59  2.41  4.75

We could describe this data using a graphical method or a numerical method.
Graphical methods: histogram, stem and leaf, time series plot
Numerical methods: central tendency, variation, relative standing

Histogram
Steps for constructing a histogram:
Calculate the range of the data: max(data) – min(data)
Divide the range into classes (bins, intervals) of equal width
  If there are fewer than 25 observations, use 5 or 6 classes
  If there are between 25 and 50 observations, use 7 to 14 classes
  If there are more than 50 observations, use 15 to 20 classes
For each class, calculate the class frequency, which is the number of observations in that class
The histogram is a bar graph in which the categories are classes and the heights of the bars are determined by the class frequency

Example
For the 25 CPU times above, the range of the data is: 4.75 – 0.02 = 4.73
Divide the data into 7 intervals of width 0.7 each, beginning at 0.015:

Class             Frequency
0.015 to 0.715    5
0.715 to 1.415    9
1.415 to 2.115    4
2.115 to 2.815    3
2.815 to 3.515    2
3.515 to 4.215    1
4.215 to 4.915    1

[Figure: histogram of the class frequencies, vertical axis from 0 to 9, horizontal axis marked at the class boundaries 0.015 to 4.915]

Stem and Leaf Display
Steps to construct a stem-and-leaf display:
Divide each observation into two parts: stem and leaf
List the stems in order in a column
Place the leaf for each observation in the appropriate stem row, arranging the leaves in each row in ascending order

Example
Student scores on an exam: 12 15 20 27 31 36 37 44 46 48 49 50 51 55
Create the stems (tens digit) and place the leaves (ones digit) in each stem row:

1 | 2 5
2 | 0 7
3 | 1 6 7
4 | 4 6 8 9
5 | 0 1 5

Time Series Plot
Some data sets are a time series, that is, measurements taken at regular intervals over time.
These plots often reveal important features of the data set.
For example, the time series plot of the number of live births per 10,000 23-year-old women in the US between 1917 and 1975:
[Figure: time series plot]

Measures of Central Tendency

Central Tendency
Numerical measures describe the central point about which the numbers vary and how spread out they are.
Some are better descriptions than others.
Measures of spread include:
Range
Variance
Measures of central tendency include:
Mean
Median
Mode

Mean
The mean (or average) is the simplest measure of central tendency to calculate.
Given a set of n measurements x_1, x_2, ..., x_n:
$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$

Median
The median of a sample is defined as the value at which half of the measurements are lower and half are higher.
A simple way to calculate the median is to order all the measurements from lowest to highest. The number in the middle is the median if n is odd. If n is even, the median is the average of the middle two values.
The median is sometimes more useful than the mean, particularly in cases where one or two values are significantly different from the rest of the values.

Example
Given the following data: 3 5 12 17 18 22 25 26 30 31
The mean is: (3 + 5 + 12 + 17 + 18 + 22 + 25 + 26 + 30 + 31)/10 = 189/10 = 18.9
The median is: (18 + 22)/2 = 20
Now add a new data element: 200
The new mean is: 389/11 = 35.36
The new median is: 22

Mode
The mode of the sample is the most probable value of the n measurements, i.e., the one that occurs most frequently.
Mode is not used as often as mean or median because it can be a misleading quantity, especially if the sample size is small and/or the distribution of measurements is not purely random.
If none of the measurements are repeated, the mode is undefined.
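The mean and median example above can be checked numerically. The following is a minimal sketch in Python rather than MatLab, using only the standard `statistics` module. The variable names and the helper `mode_or_none` are hypothetical, introduced here for illustration. The helper follows the slide's convention that the mode is undefined when no measurement repeats.

```python
from statistics import mean, median, multimode

# Example data from the lecture
data = [3, 5, 12, 17, 18, 22, 25, 26, 30, 31]

mean_before = mean(data)
median_before = median(data)

# Add one extreme value and recompute
with_outlier = data + [200]
mean_after = mean(with_outlier)
median_after = median(with_outlier)

print(f"mean:   {mean_before} -> {mean_after:.2f}")   # mean:   18.9 -> 35.36
print(f"median: {median_before} -> {median_after}")   # median: 20.0 -> 22


def mode_or_none(values):
    """Return the most frequent value(s), or None if no value repeats.

    Hypothetical helper: it treats the mode as undefined when every
    measurement occurs exactly once, as the Mode slide does.
    """
    if len(set(values)) == len(values):
        return None
    return multimode(values)


print(mode_or_none(data))          # None
print(mode_or_none([1, 2, 2, 3]))  # [2]
```

The single value 200 nearly doubles the mean but moves the median only from 20 to 22, which is why the median is often preferred when a few values differ greatly from the rest.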
Deviation
The deviation of a measurement is defined as the difference between a particular measurement and the mean, i.e., for measurement i:
$d_i = x_i - \bar{x}$
When considering a group or sample of measurements, the deviation of one particular measurement is the same as the precision error or random error of that measurement.
Deviation is not the same as accuracy error, since accuracy error (inaccuracy) is defined as the difference between a particular measurement and the true value of the quantity being measured.
Because of bias (systematic) error, $x_{\text{true}}$ is often not even known, and the mean is not equal to $x_{\text{true}}$ if there are bias errors.

Average Deviation
To get some feel for how much deviation is represented in the sample, we might first think of averaging all the deviations to obtain some kind of mean or average deviation.
It turns out that the average of all the deviations is zero! By definition, some of the measurements are smaller than the average and some are larger, so the average deviation is a meaningless calculation: it is always zero.
A better kind of average is the average absolute deviation, defined as the average of the absolute value of each deviation.

Sample Standard Deviation
An even better, and more accepted, measure of how much deviation or scatter is in the data is obtained by calculating the sample standard deviation:
$S = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$

Observations
S is kind of like an average of the deviations, but it is constructed by taking the square root of the average of the squared deviations.
Notice that the denominator is n – 1, not simply n. It turns out that for small sample size (n small), n – 1 yields a better estimate of the actual standard deviation than does n itself.
As n gets big, the difference between using n or n – 1 in the denominator becomes negligible.
The sample variance is the square of the sample standard deviation.

Statistics in MatLab

MatLab
Several of the most common statistical operators are directly available in MatLab:
M = mean(x)     Calculates the sample average of a vector or the mean of each column of a matrix
M = median(x)   Calculates the median of a vector or the median of each column of a matrix
Y = std(x)      Calculates the sample standard deviation (n – 1 in the denominator) of a vector or of each column of a matrix
Y = var(x)      Calculates the sample variance (n – 1 in the denominator) of a vector or of each column of a matrix

Sample Program

Sample Run

Possible Quiz
Remember that even though each quiz is worth only 5 to 10 points, the points do add up to a significant contribution to your overall grade.
If there is a quiz, it might cover these issues:
What is a statistic?
Define the mode.
Why is the median sometimes a better measure than the mean?
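The histogram class frequencies and the deviation measures defined earlier can be reproduced for the CPU time data. The following is a minimal sketch in Python, not the lecture's MatLab sample program, which is not preserved in this transcript. The names `cpu_times` and `class_frequencies` are hypothetical. The library calls `mean`, `median`, `stdev`, and `variance` come from Python's standard `statistics` module and play roughly the role of MatLab's `mean`, `median`, `std`, and `var`.

```python
from math import sqrt
from statistics import mean, median, stdev, variance

# CPU times (seconds) for the 25 jobs in the lecture's example data
cpu_times = [
    0.02, 0.75, 1.17, 1.61, 2.59,
    0.15, 0.82, 1.23, 1.94, 3.07,
    0.19, 0.92, 1.38, 2.01, 3.35,
    0.47, 0.96, 1.40, 2.16, 3.76,
    0.71, 1.16, 1.59, 2.41, 4.75,
]


def class_frequencies(values, start, width, n_classes):
    """Count how many values fall in each of n_classes equal-width classes."""
    counts = [0] * n_classes
    for v in values:
        k = int((v - start) // width)  # index of the class containing v
        if 0 <= k < n_classes:
            counts[k] += 1
    return counts


# Seven classes of width 0.7 starting at 0.015, as in the histogram example
freqs = class_frequencies(cpu_times, start=0.015, width=0.7, n_classes=7)

# Deviations from the mean and the sample standard deviation (n - 1 denominator)
x_bar = mean(cpu_times)
deviations = [x - x_bar for x in cpu_times]
n = len(cpu_times)
avg_abs_deviation = sum(abs(d) for d in deviations) / n
s_manual = sqrt(sum(d * d for d in deviations) / (n - 1))

print("class frequencies:", freqs)
print("mean:", round(x_bar, 4))
print("median:", median(cpu_times))
print("sum of deviations:", round(sum(deviations), 10))
print("average absolute deviation:", round(avg_abs_deviation, 4))
print("sample std (formula):", round(s_manual, 4))
print("sample std (library):", round(stdev(cpu_times), 4))
print("sample variance:", round(variance(cpu_times), 4))
```

The class frequencies should match the histogram example (5, 9, 4, 3, 2, 1, 1). The sum of the deviations comes out as zero up to floating-point rounding, and the formula-based standard deviation agrees with the library value because both divide by n – 1.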