Exercise I. The average values, variability and distribution of elements in a sample
The exercise aims to quantify the characteristics of a statistical sample. The calculations will
be carried out using the MS Excel spreadsheet (including its built-in statistical functions)
and the SPSS statistical package.
Consider the following example:
Here's a random sample of 10 elements: 1, 3, 1, 3, 1, 4, 3, 3, 4, 3. Analysis starts by ordering
these observations. We get 1, 1, 1, 3, 3, 3, 3, 3, 4, 4. This allows evaluation of the extreme
values: 1 (minimum) and 4 (maximum value). The rest we will calculate using the table below
(the first two columns - blacked out - provide the source data):
Observations   Frequency   Cumulative   $x_i n_i$   $(x_i - \bar{x})^2$   $(x_i - \bar{x})^2 n_i$
$x_i$          $n_i$       frequency
1              3           0.3          3           2.56                  7.68
3              5           0.8          15          0.16                  0.80
4              2           1            8           1.96                  3.92
---            ---         ---          26          ---                   12.40
The sum of the products $x_i n_i$ allows you to calculate the average value:
$$\bar{x} = \frac{\sum_i x_i n_i}{n} = \frac{26}{10} = 2.6.$$
The average value can then be used (the last two columns of that table) to calculate the
variance. By definition, the biased sample variance is
$$\sigma^2 = \frac{12.40}{10} = 1.24$$
and the unbiased sample variance is
$$\sigma_0^2 = \frac{12.40}{9} = 1.377778.$$
Accordingly, we calculate the standard deviations: the biased one,
$$\sigma = \sqrt{\sigma^2} = \sqrt{1.24} = 1.113553,$$
or the unbiased one,
$$\sigma_0 = \sqrt{\sigma_0^2} = \sqrt{1.377778} = 1.173788.$$
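The table-based calculation above can be cross-checked with a few lines of Python. This is only a minimal sketch using the standard library; the variable names (`freq`, `ss` and so on) are illustrative and do not come from the exercise.

```python
import math

# Frequency table from the example: observation x_i -> frequency n_i
freq = {1: 3, 3: 5, 4: 2}

n = sum(freq.values())                                   # sample size
mean = sum(x * c for x, c in freq.items()) / n           # sum(x_i * n_i) / n
ss = sum(c * (x - mean) ** 2 for x, c in freq.items())   # sum((x_i - mean)^2 * n_i)

var_biased = ss / n            # biased variance: divide by n
var_unbiased = ss / (n - 1)    # unbiased variance: divide by n - 1
sd_biased = math.sqrt(var_biased)
sd_unbiased = math.sqrt(var_unbiased)

print("mean:", round(mean, 6))
print("variance (biased, unbiased):", round(var_biased, 6), round(var_unbiased, 6))
print("std dev  (biased, unbiased):", round(sd_biased, 6), round(sd_unbiased, 6))
```

Working from the frequency table rather than the raw list mirrors the columns of the table: the two sums in the code are the totals of the fourth and sixth columns.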
The same results can be obtained in a spreadsheet using "Insert Function". In the "Insert"
menu select "Function", then the "Statistical" category, and look in the list of functions
for "AVERAGE". We get a screen like the one below:
And the result of the calculation:
A similar procedure calculates the value of the variance:
and the standard deviation:
Note that in both cases the values calculated using the spreadsheet are the unbiased values.
Among the measures of centrality, it is also useful to know the median and modal values. The
median is the middle value of the ordered sample (or the average of the two middle values if
the number of data is even) and, therefore, for a sample of 1, 1, 1, 3, 3, 3, 3, 3, 4, 4, equals
3. The modal value is the value that occurs most often in the sample. Thus, for the sample 1,
1, 1, 3, 3, 3, 3, 3, 4, 4 it takes the value 3.
Both values can be obtained as above using "Insert Function" in the Excel spreadsheet (and
likewise in other spreadsheet programs).
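For comparison, the median and the modal value can also be obtained outside a spreadsheet. The sketch below assumes Python's standard `statistics` module; it is an illustration, not part of the original exercise.

```python
import statistics

sample = [1, 3, 1, 3, 1, 4, 3, 3, 4, 3]

ordered = sorted(sample)              # ordering the observations is the first step
median = statistics.median(sample)    # mean of the two middle values when n is even
mode = statistics.mode(sample)        # most frequent value

print("ordered:", ordered)
print("median:", median)
print("mode:", mode)
```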
All these descriptive characteristics of the sample can be obtained simultaneously using the
option Tools / Data Analysis / Descriptive Statistics.
To this end, the Data Analysis option must first be activated by marking the appropriate option as
shown below.
Using "Descriptive Statistics",
we obtain for the sample 1, 1, 1, 3, 3, 3, 3, 3, 4, 4 the following results:
Column1
Mean                  2.6
Standard Error        0.371184
Median                3
Mode                  3
Standard Deviation    1.173788
Sample Variance       1.377778
Kurtosis              -1.18069
Skewness              -0.55651
Range                 3
Minimum               1
Maximum               4
Sum                   26
Count                 10
Largest(1)            4
Smallest(1)           1
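The less familiar entries in this output (standard error, skewness, kurtosis) can be reproduced by hand. The sketch below assumes the bias-adjusted sample formulas that Excel documents for its SKEW and KURT functions, where KURT is an excess kurtosis; that choice of formulas is an assumption about how the tool computes these values, and the variable names are illustrative.

```python
import math
import statistics

sample = [1, 1, 1, 3, 3, 3, 3, 3, 4, 4]
n = len(sample)
mean = statistics.mean(sample)
s = statistics.stdev(sample)        # unbiased (n - 1) standard deviation
std_error = s / math.sqrt(n)        # standard error of the mean

# Sums of third and fourth powers of the standardized deviations
z3 = sum(((x - mean) / s) ** 3 for x in sample)
z4 = sum(((x - mean) / s) ** 4 for x in sample)

# Assumed formulas: adjusted sample skewness and adjusted excess kurtosis
skewness = n / ((n - 1) * (n - 2)) * z3
kurtosis = (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * z4
            - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))

print("standard error:", round(std_error, 6))
print("skewness:", round(skewness, 5))
print("kurtosis:", round(kurtosis, 5))
print("range:", max(sample) - min(sample), "sum:", sum(sample), "count:", n)
```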
Note that on the basis of these data we can also give the coefficient of variation, i.e. the
ratio of the standard deviation to the mean. For our sample it is 1.173788 / 2.6, or about
45.15%, and therefore the sample should not be regarded as quasi-constant (immutable), since
the heuristic criterion for quasi-constancy applies only to samples for which the coefficient
of variation is less than 10%.
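The quasi-constancy check can be written down in a few lines. This sketch assumes that the coefficient of variation is computed from the unbiased standard deviation, which is what reproduces the value of about 45% quoted for this sample; the name `is_quasi_constant` is illustrative.

```python
import statistics

sample = [1, 1, 1, 3, 3, 3, 3, 3, 4, 4]

# Coefficient of variation: standard deviation relative to the mean
cv = statistics.stdev(sample) / statistics.mean(sample)

# Heuristic criterion from the text: quasi-constant if below 10 %
is_quasi_constant = cv < 0.10

print(f"coefficient of variation: {100 * cv:.2f}%")
print("quasi-constant:", is_quasi_constant)
```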
A similar procedure can be followed using the SPSS statistical package (in this case, version
17.0). For the sample 1, 1, 1, 3, 3, 3, 3, 3, 4, 4, we obtain
We now turn to another characteristic of the sample, the histogram. By running the option "Data
Analysis / Histogram",
we obtain the table

Bin     Frequency   Cumulative %
1       3           30.00%
2       0           30.00%
3       5           80.00%
More    2           100.00%

together with the graph

[Figure: Histogram. Horizontal axis: Bin (1, 2, 3, More); left axis: Frequency; right axis: Cumulative %.]
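The frequency and cumulative-percentage columns of such a histogram table can be recomputed directly. The sketch below assumes that each bin value is an upper limit (an observation is counted in the first bin whose value is greater than or equal to it) and that everything above the last bin goes to "More"; this convention is inferred from the output shown, not stated in the exercise.

```python
from itertools import accumulate

sample = [1, 1, 1, 3, 3, 3, 3, 3, 4, 4]
bins = [1, 2, 3]   # assumed upper bin limits; larger values fall into "More"

counts = [0] * (len(bins) + 1)
for x in sample:
    # Index of the first bin whose upper limit is >= x; len(bins) means "More"
    idx = next((i for i, b in enumerate(bins) if x <= b), len(bins))
    counts[idx] += 1

# Running total of the counts, expressed as a percentage of the sample size
cumulative_pct = [100 * c / len(sample) for c in accumulate(counts)]

labels = [str(b) for b in bins] + ["More"]
for label, c, p in zip(labels, counts, cumulative_pct):
    print(f"{label:>4} {c:>3} {p:7.2f}%")
```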
Similarly, using the SPSS statistical package, we have:
Another standard presentation of the characteristics of position can be obtained by using the
so-called boxplot.
Additional problems.
1) Let X be an attribute of a population with a Poisson distribution with
parameter $\lambda = 3$. The following sample of N = 100 elements was taken:
5 5 2 1 3 2 2 2 3 1
1 2 4 3 2 1 1 3 5 4
5 0 7 0 3 2 3 0 2 3
1 2 5 0 1 3 3 3 3 3
2 2 5 3 3 5 1 3 2 3
1 3 5 3 4 4 1 0 1 3
3 2 1 0 3 2 7 7 2 5
3 2 2 5 2 3 4 2 2 3
1 6 5 4 2 2 2 0 1 3
5 3 2 2 5 2 3 5 5 5
a) Draw a histogram in two ways: i) with the number of classes k equal to the
integer part of $1 + 3.322 \log_{10} N$, and ii) with the number of classes $k = \sqrt{N}$.
b) Compare your results with the theoretical distribution.
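The two rules for the number of classes in part a) can be evaluated as follows. This sketch assumes that the logarithm is taken to base 10 (the factor 3.322 is approximately 1 / log10(2)) and that the square root is also truncated to its integer part when N is not a perfect square; both points are assumptions, and the variable names are illustrative.

```python
import math

N = 100  # sample size from problem 1

# Rule i): integer part of 1 + 3.322 * log10(N)
k_log_rule = int(1 + 3.322 * math.log10(N))

# Rule ii): square root of N, truncated to an integer (assumption for non-squares)
k_sqrt_rule = int(math.sqrt(N))

print("classes by the logarithmic rule:", k_log_rule)
print("classes by the square-root rule:", k_sqrt_rule)
```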
2) The electrical capacity of titanium plates was measured (in units of $10^3$ pF) and the following
results were obtained:
11.0, 9.2, 9.9, 12.0, 8.0, 8.7, 7.1, 11.8, 11.7, 10.3
11.2, 8.1, 9.5, 11.5, 11.6, 9.7, 10.2, 11.4, 8.6, 10.0
a) Calculate the sample mean and standard deviation.
b) Draw a histogram
3) A researcher recorded the number of tiny colloids of gold observed under a
microscope in various time periods of equal length. The results are presented below,
where nj stands for the number of periods in which j gold particles were observed.
j      0     1     2     3     4     5     6     7
n_j    100   167   120   64    28    5     1     1

a) Calculate the sample mean and variance.
b) Draw a histogram.
c) Compare the empirical (sample) distribution $\hat{p}_j = n_j / n$ with the theoretical
Poisson distribution with parameter $\lambda = 1.50$.
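One possible way to set up the comparison in part c) is sketched below. It computes the empirical frequencies $n_j / n$ next to the Poisson probabilities $e^{-\lambda} \lambda^j / j!$; the helper name `poisson_pmf` and the dictionary layout are illustrative choices, not part of the exercise.

```python
import math

# Observed data: j particles -> n_j periods in which j particles were seen
counts = {0: 100, 1: 167, 2: 120, 3: 64, 4: 28, 5: 5, 6: 1, 7: 1}
n = sum(counts.values())   # total number of observed periods
lam = 1.50                 # Poisson parameter given in the problem


def poisson_pmf(j, lam):
    """P(X = j) for a Poisson distribution with parameter lam."""
    return math.exp(-lam) * lam ** j / math.factorial(j)


empirical = {j: nj / n for j, nj in counts.items()}
theoretical = {j: poisson_pmf(j, lam) for j in counts}

print(" j  empirical  Poisson")
for j in counts:
    print(f"{j:>2}  {empirical[j]:9.4f}  {theoretical[j]:7.4f}")
```

The Poisson probabilities for j > 7 are not listed here; they are small but not zero, so the theoretical column does not sum exactly to 1.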
4) Observations from a university book-shop show the following expenditure of N=50
students (in zlotys)
140 150 196 166  55 167 181 218 200 155
210 214 220 221 183 236 215  43 145 178
148 156 249 164 287 191 195 165 221 278
 26 210 188 161 224 214 238 199  87  52
211 156 218 111  61 236 195 250  92  84
a) Calculate the sample mean and standard deviation both when the sample
elements are pre-grouped and when they are not grouped.
b) Draw a histogram
c) Calculate the median, mode, kurtosis and excess.
d) Discuss the symmetry and “flatness” of the empirical distribution.
5) The life-time of electric lamps T (in hours) was investigated. The N=200 observations
are presented below:
Limits          Number of observations within limits
0 – 300         52
300 – 600       41
600 – 900       30
900 – 1200      22
1200 – 1500     17
1500 – 1800     11
1800 – 2100     9
2100 – 2400     4
2400 – 2700     6
2700 – 3000     3
3000 – 3300     2
> 3300          1
a) Draw the histogram.
b) Compare this histogram with the density function of the exponential
distribution with parameter $\lambda = 0.0011$.
c) Compare the theoretical and sample frequencies for each group of
observations.
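For part c), the theoretical frequency of a class [a, b) under an exponential distribution is $N (e^{-\lambda a} - e^{-\lambda b})$. The sketch below evaluates this for the class limits of the table, taking N = 200 as stated in the problem; the function name `expected_count` is hypothetical, and the open-ended last class is handled as the tail probability.

```python
import math


def expected_count(a, b, lam, n):
    """Expected number of n exponential(lam) lifetimes falling in [a, b)."""
    return n * (math.exp(-lam * a) - math.exp(-lam * b))


lam = 0.0011   # rate parameter given in the problem
n = 200        # sample size stated in the problem
edges = list(range(0, 3301, 300))   # class limits 0, 300, ..., 3300

expected = [expected_count(a, b, lam, n) for a, b in zip(edges, edges[1:])]
expected.append(n * math.exp(-lam * edges[-1]))   # open-ended class "> 3300"

for a, b, e in zip(edges, edges[1:], expected):
    print(f"{a:>4} - {b:<4}  {e:6.2f}")
print(f"   > {edges[-1]}   {expected[-1]:6.2f}")
```

These expected counts can then be set against the observed numbers in the table, class by class.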