Exercise I. The average values, variability and distribution of elements in a sample

The course aims to quantify the traits of a statistical sample. This assessment will be carried out using the MS Excel spreadsheet (with the statistical functions included in the package) and the SPSS statistical package.

Consider the following example. Here is a random sample of 10 elements: 1, 3, 1, 3, 1, 4, 3, 3, 4, 3. The analysis starts by ordering these observations, which gives 1, 1, 1, 3, 3, 3, 3, 3, 4, 4. This allows us to read off the extreme values: 1 (minimum) and 4 (maximum). The remaining quantities are calculated using the table below (the first two columns provide the source data):

Observations   Frequency   Cumulative   x_i n_i   (x_i - x̄)^2   (x_i - x̄)^2 n_i
x_i            n_i         frequency
1              3           0.3          3         2.56           7.68
3              5           0.8          15        0.16           0.80
4              2           1            8         1.96           3.92
Sum            ---         ---          26        ---            12.40

The sum of the products x_i n_i allows us to calculate the average value:

$\bar{x} = \frac{\sum_i x_i n_i}{n} = \frac{26}{10} = 2.6$.

The average value can then be used (in the last two columns of the table) to calculate the variance. By definition, the biased sample variance is

$\sigma^2 = \frac{\sum_i (x_i - \bar{x})^2 n_i}{n} = \frac{12.40}{10} = 1.24$

and the unbiased sample variance is

$\sigma_0^2 = \frac{\sum_i (x_i - \bar{x})^2 n_i}{n - 1} = \frac{12.40}{9} = 1.377778$.

Accordingly, we calculate the standard deviations: the biased one, $\sigma = \sqrt{\sigma^2} = \sqrt{1.24} = 1.113553$, or the unbiased one, $\sigma_0 = \sqrt{\sigma_0^2} = \sqrt{1.377778} = 1.173788$.

The same results can be obtained in a spreadsheet via "insert function". In the "Insert" menu select "insert function", then the category "statistics", and look in the list of functions for "average". Excel opens the function dialog and, once the data range is given, returns the result of the calculation. A similar procedure calculates the variance and the standard deviation. Note that in both of these cases the values calculated by the spreadsheet are the unbiased ones.

Among the measures of centrality, it is also useful to know the median and modal values.
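The hand calculation above can be cross-checked with a short script. This is only a minimal sketch in Python (the handout itself works in Excel and SPSS); the variable names are illustrative and not part of the original exercise.

```python
from collections import Counter
from math import sqrt

# The 10-element sample from the worked example.
sample = [1, 3, 1, 3, 1, 4, 3, 3, 4, 3]
n = len(sample)

# Frequency table: each distinct value x_i mapped to its count n_i.
freq = Counter(sample)

# Mean from the column of products x_i * n_i.
mean = sum(x * count for x, count in freq.items()) / n

# Total of the last table column, (x_i - mean)^2 * n_i.
sum_sq = sum((x - mean) ** 2 * count for x, count in freq.items())

var_biased = sum_sq / n          # divide by n
var_unbiased = sum_sq / (n - 1)  # divide by n - 1
std_biased = sqrt(var_biased)
std_unbiased = sqrt(var_unbiased)

print(f"mean              = {mean:.1f}")
print(f"biased variance   = {var_biased:.6f}")
print(f"unbiased variance = {var_unbiased:.6f}")
print(f"biased std dev    = {std_biased:.6f}")
print(f"unbiased std dev  = {std_unbiased:.6f}")
```

The only difference between the two variances is the divisor (n versus n - 1), which is why the spreadsheet values agree with the unbiased column rather than the biased one.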
The median is the middle value of the ordered sample (or the average of the two middle values if the number of observations is even); for the sample 1, 1, 1, 3, 3, 3, 3, 3, 4, 4 it therefore equals 3. The modal value is the value that occurs most often in the sample; for the same sample it is also 3. Both values can be obtained as above using "insert function" in the Excel spreadsheet (and likewise in other spreadsheets).

All these descriptive characteristics of the sample can be obtained simultaneously using the option Tools / data analysis / descriptive statistics. To this end, the data analysis option must first be activated by marking the appropriate option. Using "descriptive statistics", we obtain the following results for the sample 1, 1, 1, 3, 3, 3, 3, 3, 4, 4:

Column1
Mean                  2.6
Standard Error        0.371184
Median                3
Mode                  3
Standard Deviation    1.173788
Sample Variance       1.377778
Kurtosis              -1.18069
Skewness              -0.55651
Range                 3
Minimum               1
Maximum               4
Sum                   26
Count                 10
Largest(1)            4
Smallest(1)           1

Note that on the basis of these data we can also give the coefficient of variation, that is, the ratio of the standard deviation to the mean. For our sample it is 1.173788 / 2.6, or about 45.1%. We are therefore dealing with a sample that should not be regarded as quasi-constant (immutable), since the heuristic criterion for quasi-constancy applies only to samples whose coefficient of variation is less than 10%.

A similar procedure can be followed using the SPSS statistical package (in this case, version 17.0), which produces the corresponding output for the sample 1, 1, 1, 3, 3, 3, 3, 3, 4, 4.

We now come to the sample characteristic that is the histogram.
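Before turning to the histogram, the descriptive statistics listed above can be reproduced outside the spreadsheet. The sketch below uses Python's standard `statistics` module; the skewness and kurtosis lines assume the bias-adjusted formulas documented for Excel's SKEW and KURT functions, which is an assumption about how the spreadsheet output was produced rather than something stated in this exercise.

```python
import statistics as st
from math import sqrt

sample = [1, 3, 1, 3, 1, 4, 3, 3, 4, 3]
n = len(sample)

mean = st.mean(sample)
median = st.median(sample)      # average of the two middle values for even n
mode = st.mode(sample)          # most frequent value
std = st.stdev(sample)          # unbiased (n - 1) standard deviation
variance = st.variance(sample)  # unbiased (n - 1) variance
std_error = std / sqrt(n)       # standard error of the mean
cv = std / mean                 # coefficient of variation

# Standardised deviations, used by the skewness and kurtosis formulas.
z = [(x - mean) / std for x in sample]

# Assumed formulas: the adjusted estimators used by Excel's SKEW and KURT.
skewness = n / ((n - 1) * (n - 2)) * sum(v ** 3 for v in z)
kurtosis = (
    n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * sum(v ** 4 for v in z)
    - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))
)

print(f"mean = {mean}, median = {median}, mode = {mode}")
print(f"std dev = {std:.6f}, variance = {variance:.6f}")
print(f"standard error = {std_error:.6f}")
print(f"skewness = {skewness:.5f}, kurtosis = {kurtosis:.5f}")
print(f"coefficient of variation = {cv:.1%}")
```

The coefficient of variation here is based on the unbiased standard deviation; using the biased one would give a somewhat smaller percentage, still far above the 10% threshold.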
By running the option "Data Analysis / Histogram", we obtain the table

Bin     Frequency   Cumulative %
1       3           30.00%
2       0           30.00%
3       5           80.00%
More    2           100.00%

together with the graph

[Figure: Histogram; horizontal axis: Bin (1, 2, 3, More); vertical axes: Frequency and Cumulative %]

A corresponding histogram can be produced with the SPSS statistical package. Another standard presentation of the characteristics of position can be obtained by using the so-called boxplot.

Additional problems.

1) Let X be an attribute of a population with a Poisson distribution with parameter $\lambda = 3$. The following sample of N = 100 elements was taken:

5 5 2 1 3 2 2 2 3 1
1 2 4 3 2 1 1 3 5 4
5 0 7 0 3 2 3 0 2 3
1 2 5 0 1 3 3 3 3 3
2 2 5 3 3 5 1 3 2 3
1 3 5 3 4 4 1 0 1 3
3 2 1 0 3 2 7 7 2 5
3 2 2 5 2 3 4 2 2 3
1 6 5 4 2 2 2 0 1 3
5 3 2 2 5 2 3 5 5 5

a) Draw a histogram in two ways: i) with the number of classes k equal to the integer part of $1 + 3.322 \log_{10} N$ and ii) with the number of classes $k = \sqrt{N}$.
b) Compare your results with the theoretical distribution.

2) The electrical capacity of titanium plates was measured (in pF × $10^3$) and the following results were obtained:

11.0, 9.2, 9.9, 12.0, 8.0, 8.7, 7.1, 11.8, 11.7, 10.3,
11.2, 8.1, 9.5, 11.5, 11.6, 9.7, 10.2, 11.4, 8.6, 10.0

a) Calculate the sample mean and standard deviation.
b) Draw a histogram.

3) A researcher recorded the number of tiny colloidal gold particles observed under a microscope in various time periods of equal length. The results are presented below, where n_j stands for the number of periods in which j gold particles were observed.

j      0     1     2     3    4    5   6   7
n_j    100   167   120   64   28   5   1   1

a) Calculate the sample mean and variance.
b) Draw a histogram.
c) Compare the empirical (sample) distribution $\hat{p}_j = n_j / n$ with the theoretical Poisson distribution with parameter $\lambda = 1.50$.
4) Observations from a university bookshop show the following expenditure of N = 50 students (in zlotys):

140 150 196 166  55 167 181 218 200 155
210 214 220 221 183 236 215  43 145 178
148 156 249 164 287 191 195 165 221 278
 26 210 188 161 224 214 238 199  87  52
211 156 218 111  61 236 195 250  92  84

a) Calculate the sample mean and the standard deviation (dispersion), both when the sample elements are pre-grouped and when they are not grouped.
b) Draw a histogram.
c) Calculate the median, mode, kurtosis and excess.
d) Discuss the symmetry and "flatness" of the empirical distribution.

5) The lifetime T of electric lamps (in hours) was investigated. The N = 200 observations are presented below:

Limits          Number of observations within limits
0 – 300         52
300 – 600       41
600 – 900       30
900 – 1200      22
1200 – 1500     17
1500 – 1800     11
1800 – 2100     9
2100 – 2400     4
2400 – 2700     6
2700 – 3000     3
3000 – 3300     2
> 3300          1

a) Draw the histogram.
b) Compare this histogram with the density function of the exponential distribution with parameter $\lambda = 0.0011$.
c) Compare the theoretical and sample frequencies for each group of observations.
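As a starting point for problem 1, the two class-count rules and the comparison with the theoretical Poisson distribution can be sketched as follows. This is a hedged illustration in Python, not a prescribed solution: the helper `poisson_pmf` and the variable names are made up for this example, and the plotting itself is left out.

```python
from collections import Counter
from math import exp, factorial, floor, log10, sqrt

# The N = 100 observations from problem 1.
data = [
    5, 5, 2, 1, 3, 2, 2, 2, 3, 1,
    1, 2, 4, 3, 2, 1, 1, 3, 5, 4,
    5, 0, 7, 0, 3, 2, 3, 0, 2, 3,
    1, 2, 5, 0, 1, 3, 3, 3, 3, 3,
    2, 2, 5, 3, 3, 5, 1, 3, 2, 3,
    1, 3, 5, 3, 4, 4, 1, 0, 1, 3,
    3, 2, 1, 0, 3, 2, 7, 7, 2, 5,
    3, 2, 2, 5, 2, 3, 4, 2, 2, 3,
    1, 6, 5, 4, 2, 2, 2, 0, 1, 3,
    5, 3, 2, 2, 5, 2, 3, 5, 5, 5,
]
N = len(data)

# Rule i): integer part of 1 + 3.322 * log10(N).
k_sturges = floor(1 + 3.322 * log10(N))
# Rule ii): square root of N, rounded to a whole number of classes.
k_sqrt = round(sqrt(N))


def poisson_pmf(j, lam):
    """Probability that a Poisson(lam) variable equals j (hypothetical helper)."""
    return exp(-lam) * lam ** j / factorial(j)


# Empirical relative frequency of each observed value.
counts = Counter(data)
empirical = {j: counts[j] / N for j in range(max(data) + 1)}

print(f"N = {N}, k (rule i) = {k_sturges}, k (rule ii) = {k_sqrt}")
print(" j  count  empirical  Poisson(3)")
for j, p_hat in empirical.items():
    print(f"{j:2d}  {counts[j]:5d}  {p_hat:9.3f}  {poisson_pmf(j, 3.0):10.3f}")
```

The same helper could be reused for problem 3 by replacing the parameter 3.0 with 1.50 and the counts with the tabulated n_j.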