Chapter 6. Descriptive Statistics
NIPRL

6.1 Experimentation
6.2 Data Presentation
6.3 Sample Statistics
6.4 Examples

• Data: a mixture of nature and noise.
• Is the noise manageable? Ideally, the noise can be represented by a probability distribution.
• Statistical inference:
  – The science of deducing properties of an underlying probability distribution from data.
• Can we obtain information about the underlying probability distribution? The information is given in the form of (functions of) the data.

Figure 6.1 The relationship between probability theory and statistical inference

6.1 Experimentation

6.1.1 Samples
• Population: the set of all possible observations available from a particular probability distribution.
• Sample: a subset of a population.
• Random sample: a sample whose elements are chosen at random from the population.
• A sample should be representative of the population.
• Types of observations: numerical and nominal.

6.1.2 Examples
• Example 1: Machine breakdowns
Suppose that an engineer in charge of the maintenance of a machine keeps records of the breakdown causes over a period of a year, and that 46 breakdowns were observed (see Figure 6.2). What is the population from which this sample is drawn?
Factors to consider when checking whether the data are representative:
  – Quality of the operators
  – Working load on the machine
  – Particularities of the observation period (e.g., more rainy days than in other years)

Figure 6.2 Data set of machine breakdowns

• Example 2: Defective computer chips
The chip boxes are selected at random from …..
• Points to check on the data:
  – What is the data type?
  – Are the data representative?
  – How is the randomness of the data realized?
• Statistical problem: What is the population from which the data are sampled?
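As a minimal sketch (not part of the original slides), a random sample in the sense of 6.1.1 can be drawn with Python's standard `random` module. The population of 1,000 chip-box identifiers and the helper name `simple_random_sample` are hypothetical, chosen only for illustration. Sampling is done without replacement, so every subset of size n is equally likely to be chosen.

```python
import random


def simple_random_sample(population, n, seed=None):
    """Draw n distinct elements so that every subset of size n is equally likely."""
    if not 0 <= n <= len(population):
        raise ValueError("sample size must be between 0 and the population size")
    rng = random.Random(seed)  # independent generator; a fixed seed makes the draw reproducible
    return rng.sample(population, n)  # sampling without replacement


# Hypothetical population: identifiers of 1,000 chip boxes (illustration only).
box_ids = list(range(1, 1001))
chosen = simple_random_sample(box_ids, 20, seed=6)
print(sorted(chosen))
```

Using a dedicated `random.Random` instance rather than the module-level functions keeps the draw reproducible without disturbing any other code that relies on the global generator.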
Figure 6.4 Data set of defective computer chips

6.2 Data presentation
6.2.1 Bar and Pareto charts
6.2.2 Pie charts
6.2.3 Histograms
6.2.4 Outliers
An outlier is an observation that does not come from the distribution from which the main body of the sample is drawn.

Figure 6.7 Bar chart of machine breakdowns data set
Figure 6.9 Pareto chart of customer complaints for Internet company
Figure 6.12 Pie chart for machine breakdowns data set
Figure 6.14 Histogram of computer chips data set
Figure 6.16 Histograms of metal cylinder diameter data set with different bandwidths
Figure 6.18 A histogram with positive skewness
Figure 6.19 A histogram with negative skewness
Figure 6.21 Histogram of a data set with a possible outlier

6.3 Sample statistics
6.3.1 Sample mean
6.3.2 Sample median
6.3.3 Sample trimmed mean
6.3.4 Sample mode
6.3.5 Sample variance
6.3.6 pth sample quantiles
6.3.7 Boxplots

Cf. Chebyshev's inequality: Let $\mu = E[X]$ and $\sigma^2 = \mathrm{Var}(X)$. Then
$P\{|X - \mu| < c\sigma\} \ge 1 - 1/c^2$, or equivalently $P\{|X - \mu| \ge c\sigma\} \le 1/c^2$.
In general, for any $\varepsilon > 0$, $P\{|X - \mu| \ge \varepsilon\} \le \sigma^2/\varepsilon^2$.

Cf. Theorem (the weak law of large numbers): Let $X_i$, $i = 1, \ldots, n$, be a sequence of i.i.d. random variables, each having mean $\mu$ and variance $\sigma^2$. Then, for any $\varepsilon > 0$,
$\lim_{n \to \infty} P\{|\bar{X} - \mu| \ge \varepsilon\} = 0$, where $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$.

(Proof) $E[\bar{X}] = \mu$ and, by independence, $\mathrm{Var}(\bar{X}) = \sigma^2/n$. Applying Chebyshev's inequality to $\bar{X}$ gives
$P\{|\bar{X} - \mu| \ge \varepsilon\} \le \frac{\sigma^2}{n\varepsilon^2}$.
Since the right-hand side tends to 0 as $n \to \infty$, $\lim_{n \to \infty} P\{|\bar{X} - \mu| \ge \varepsilon\} = 0$.

Figure 6.22 Illustrative data set
Figure 6.23 Relationship between the sample mean, median, and trimmed mean for positively and negatively skewed data sets
Figure 6.20 A histogram for a bimodal distribution
Figure 6.24 Boxplot of a data set
Figure 6.30 Rolling mill process
Figure 6.31 % scrap data set from rolling mill process
Figure 6.32 Histogram of rolling mill scrap data set
Figure 6.33 Boxplot and summary statistics for rolling mill scrap data set
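The sample statistics listed in 6.3 can be sketched with Python's `statistics` module. The data values below are hypothetical, and several conventions are assumptions that may differ from the textbook's: the trimmed mean drops floor(αn) observations from each end, quantiles use linear interpolation between order statistics (one common software default), and possible outliers are flagged with Tukey's 1.5 × IQR boxplot fences. The helper names are illustrative only.

```python
import math
import statistics


def trimmed_mean(data, proportion=0.1):
    """Mean after dropping floor(proportion * n) values from each end (assumed convention)."""
    if not 0 <= proportion < 0.5:
        raise ValueError("proportion must lie in [0, 0.5)")
    xs = sorted(data)
    k = math.floor(proportion * len(xs))
    return statistics.mean(xs[k:len(xs) - k])


def quantile(data, p):
    """p-th sample quantile by linear interpolation between order statistics (assumed convention)."""
    if not 0 <= p <= 1:
        raise ValueError("p must lie in [0, 1]")
    xs = sorted(data)
    h = (len(xs) - 1) * p          # fractional position in the sorted list
    lo = math.floor(h)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (h - lo) * (xs[hi] - xs[lo])


def summarize(data):
    """Sample statistics from Section 6.3, plus boxplot quantities."""
    q1, q3 = quantile(data, 0.25), quantile(data, 0.75)
    iqr = q3 - q1
    lower_fence, upper_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr  # Tukey's rule (assumed)
    return {
        "mean": statistics.mean(data),
        "median": statistics.median(data),
        "trimmed_mean_10": trimmed_mean(data, 0.1),
        "modes": statistics.multimode(data),        # all most-frequent values
        "variance": statistics.variance(data),      # divides by n - 1
        "std_dev": statistics.stdev(data),
        "five_numbers": (min(data), q1, statistics.median(data), q3, max(data)),
        "possible_outliers": [x for x in data if x < lower_fence or x > upper_fence],
    }


# Hypothetical, right-skewed sample (illustration only).
sample = [2.1, 2.4, 2.4, 2.7, 3.0, 3.2, 3.3, 3.8, 4.1, 9.5]
for name, value in summarize(sample).items():
    print(f"{name}: {value}")
```

For this hypothetical right-skewed sample, the mean (3.65) exceeds the 10% trimmed mean (3.1125), which in turn exceeds the median (3.1), the ordering that Figure 6.23 describes for positively skewed data. The sample variance is 4.625, and 9.5 falls above the upper fence and is flagged as a possible outlier.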
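The Chebyshev argument in the proof of the weak law of large numbers can be checked numerically. This is an illustrative simulation, not from the slides: it assumes a Uniform(0, 1) population (so $\mu = 0.5$ and $\sigma^2 = 1/12$), $\varepsilon = 0.05$, and 1,000 simulated samples per sample size; the helper names are hypothetical. For each n it compares the simulated frequency of $|\bar{X} - \mu| \ge \varepsilon$ with the bound $\sigma^2/(n\varepsilon^2)$.

```python
import random


def chebyshev_bound(variance, n, eps):
    """Upper bound sigma^2 / (n * eps^2) on P{|Xbar - mu| >= eps}."""
    return variance / (n * eps ** 2)


def deviation_frequency(n, eps, trials, rng):
    """Fraction of simulated samples of size n whose mean is at least eps from mu = 0.5."""
    mu = 0.5  # mean of Uniform(0, 1) (assumed population)
    hits = 0
    for _ in range(trials):
        xbar = sum(rng.random() for _ in range(n)) / n
        if abs(xbar - mu) >= eps:
            hits += 1
    return hits / trials


def wlln_table(sizes=(10, 100, 1000), eps=0.05, trials=1000, seed=1):
    """Rows of (n, simulated frequency, Chebyshev bound) for each sample size."""
    rng = random.Random(seed)
    variance = 1 / 12  # variance of Uniform(0, 1)
    return [(n, deviation_frequency(n, eps, trials, rng), chebyshev_bound(variance, n, eps))
            for n in sizes]


for n, freq, bound in wlln_table():
    # A probability never exceeds 1, so the bound is capped at 1 for display.
    print(f"n={n:5d}  simulated P={freq:.3f}  Chebyshev bound={min(bound, 1.0):.3f}")
```

For n = 10 the bound exceeds 1 and says nothing, but it shrinks like 1/n, which is exactly what drives the limit in the proof. The simulated frequencies typically sit well below the bound, since Chebyshev's inequality holds for every distribution with finite variance and is therefore conservative for any particular one.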