Download 1 - TU Delft: CiTG

1. Determine the range and estimate a reasonable number of intervals as well as
the class limits.
For a small dataset of size 7, a reasonable number of classes is between 3 and 7
(no more than 7).
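As an illustration of this step, the following minimal sketch computes the range and equally spaced class limits. The seven values in `data` are hypothetical and not taken from the exercise, and Sturges' rule is only one possible guide for the number of classes; it is not prescribed by the exercise.

```python
import math

# Hypothetical sample of 7 measurements (assumption, not the exercise data).
data = [2.1, 3.4, 3.9, 4.4, 5.0, 5.8, 7.2]

def class_limits(values, k):
    """Return k + 1 equally spaced class limits spanning the data range."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    return [lo + i * width for i in range(k + 1)]

data_range = max(data) - min(data)

# Sturges' rule (assumed guide): k = 1 + log2(n), rounded up.
k = math.ceil(1 + math.log2(len(data)))

limits = class_limits(data, k)
print(data_range, k, limits)
```

For seven values this rule suggests 4 classes, which lies inside the 3 to 7 band mentioned above. Rounding the limits to convenient values is usually preferable for a hand-drawn histogram.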
2. Estimate the relative and absolute frequencies, as well as both frequencies
for the cumulative frequency distribution.
Not much can go wrong here.
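A minimal sketch of the four frequency columns, assuming a hypothetical dataset and hypothetical class limits (neither is from the exercise), and assuming classes that are closed on the left and open on the right:

```python
from itertools import accumulate

# Hypothetical sample and class limits (assumptions, not the exercise data).
data = [2.1, 3.4, 3.9, 4.4, 5.0, 5.8, 7.2]
limits = [2.0, 4.0, 6.0, 8.0]  # three classes: [2, 4), [4, 6), [6, 8)

def absolute_frequencies(values, limits):
    """Count how many values fall into each class [limits[i], limits[i+1])."""
    counts = [0] * (len(limits) - 1)
    for v in values:
        for i in range(len(counts)):
            if limits[i] <= v < limits[i + 1]:
                counts[i] += 1
                break
    return counts

absolute = absolute_frequencies(data, limits)
relative = [c / len(data) for c in absolute]
cum_absolute = list(accumulate(absolute))
cum_relative = [c / len(data) for c in cum_absolute]

print(absolute, relative, cum_absolute, cum_relative)
```

A quick sanity check is that the last cumulative absolute frequency equals the sample size and the last cumulative relative frequency equals 1.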
3. Make a histogram and cumulative frequency distribution (figures).
Interpret this briefly (one sentence!)
The interpretation should state whether the histogram looks symmetrical or is
skewed to the left or to the right.
4. Calculate the median, mode, and arithmetic mean.
The mode is the value with the largest frequency, or the average of the measurements in
the class with the largest frequency. If several classes share the same maximum frequency,
the averages of those classes form the modes of the dataset.
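The class-based definition of the mode can be sketched as follows. The dataset and class limits are hypothetical (not from the exercise) and were chosen so that two classes tie for the largest frequency; `class_modes` is an illustrative helper name.

```python
from statistics import mean, median

# Hypothetical sample and class limits (assumptions, not the exercise data).
data = [2.1, 3.4, 3.9, 4.4, 5.0, 5.8, 7.2]
limits = [2.0, 4.0, 6.0, 8.0]  # three classes: [2, 4), [4, 6), [6, 8)

def class_modes(values, limits):
    """Average the measurements of every class that has the largest frequency."""
    groups = [
        [v for v in values if lo <= v < hi]
        for lo, hi in zip(limits, limits[1:])
    ]
    top = max(len(g) for g in groups)
    return [mean(g) for g in groups if len(g) == top]

modes = class_modes(data, limits)
print("median:", median(data))
print("mean:  ", mean(data))
print("modes: ", modes)
```

With these assumed values the first two classes both contain three measurements, so the sketch returns two modes, one per tied class.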
5. Calculate the variance, standard deviation and coefficient of variation.
Not much can go wrong here. For the standard deviation you may use either the expression
that divides by N (for populations) or the one that divides by N-1 (for samples of populations).
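The difference between the two divisors can be seen in a short sketch using Python's standard `statistics` module. The dataset is hypothetical, and the coefficient of variation is computed here from the sample standard deviation, which is one of the two accepted choices.

```python
from statistics import mean, pstdev, pvariance, stdev, variance

# Hypothetical sample of 7 measurements (assumption, not the exercise data).
data = [2.1, 3.4, 3.9, 4.4, 5.0, 5.8, 7.2]

var_pop = pvariance(data)    # divides by N
var_sample = variance(data)  # divides by N - 1
sd_pop = pstdev(data)
sd_sample = stdev(data)

# Coefficient of variation: standard deviation relative to the mean.
cv = sd_sample / mean(data)

print(var_pop, var_sample, sd_pop, sd_sample, cv)
```

For N = 7 the sample variance is larger than the population variance by a factor 7/6, so the choice of divisor is visible in the result for small datasets.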
6. Calculate the skewness and kurtosis. Interpret the results briefly (one
sentence!)
Several expressions are available to calculate the skewness and kurtosis of a dataset, and
in this computer exercise all of them are marked OK.
The equation for skewness in Excel is defined as:
$$\frac{n}{(n-1)(n-2)} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s} \right)^{3}$$
The formula for skewness on slide 223 of the PPT presentation is different and given by:
$$a_3 = \frac{m_3}{m_2^{3/2}}, \qquad \alpha_3 = \frac{n^2}{(n-1)(n-2)}\, a_3$$
However, in both cases you should check whether the skewness is smaller or larger
than 0: a negative value means the tail to the left is longer than the tail to the right,
and a positive value means the opposite.
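The two skewness expressions can be compared with a short sketch. The dataset is hypothetical, and the code assumes that m_k on the slide denotes the k-th central moment with division by n and that s in the Excel formula is the sample standard deviation (division by n-1); neither definition is spelled out above.

```python
from math import sqrt

# Hypothetical sample of 7 measurements (assumption, not the exercise data).
data = [2.1, 3.4, 3.9, 4.4, 5.0, 5.8, 7.2]

def central_moment(values, k):
    """k-th central moment with division by n (assumed meaning of m_k)."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** k for x in values) / n

def skew_excel(values):
    """Skewness as in the Excel expression, with s the sample standard deviation."""
    n = len(values)
    mean = sum(values) / n
    s = sqrt(sum((x - mean) ** 2 for x in values) / (n - 1))
    return n / ((n - 1) * (n - 2)) * sum(((x - mean) / s) ** 3 for x in values)

def skew_slide(values):
    """Skewness as in the slide expression: alpha_3 = n^2 / ((n-1)(n-2)) * a_3."""
    n = len(values)
    a3 = central_moment(values, 3) / central_moment(values, 2) ** 1.5
    return n ** 2 / ((n - 1) * (n - 2)) * a3

g_excel = skew_excel(data)
g_slide = skew_slide(data)
print(g_excel, g_slide)
```

Under these assumptions the two values differ only by the positive factor ((n-1)/n)^(3/2), so they always have the same sign and lead to the same conclusion about which tail is longer.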
The equation for kurtosis in Excel is defined as:
$$\frac{n(n+1)}{(n-1)(n-2)(n-3)} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s} \right)^{4} - \frac{3(n-1)^2}{(n-2)(n-3)}$$
The formula for kurtosis on slide 226 of the PPT presentation is given by:
$$a_4 = \frac{m_4}{m_2^{2}}, \qquad \alpha_4 = \frac{n^3}{(n-1)(n-2)(n-3)}\, a_4$$
These two expressions give very different outcomes, because the Excel formula gives the
so-called excess kurtosis. An excess kurtosis larger than 0 indicates a distribution
that is steeper (more peaked) than a normal distribution, and a value smaller than 0
indicates a distribution that is flatter than a normal distribution.
For the expressions on slide 226, a comparison of α4 with the value 3 should be made.
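A sketch of both kurtosis expressions, under the same assumptions as for the skewness: the dataset is hypothetical, m_k is taken to be the k-th central moment with division by n, and s is the sample standard deviation. Both expressions need at least four data points because of the factor (n-3).

```python
from math import sqrt

# Hypothetical sample of 7 measurements (assumption, not the exercise data).
data = [2.1, 3.4, 3.9, 4.4, 5.0, 5.8, 7.2]

def central_moment(values, k):
    """k-th central moment with division by n (assumed meaning of m_k)."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** k for x in values) / n

def kurt_excel(values):
    """Excess kurtosis as in the Excel expression (compare with 0)."""
    n = len(values)
    mean = sum(values) / n
    s = sqrt(sum((x - mean) ** 2 for x in values) / (n - 1))
    term = sum(((x - mean) / s) ** 4 for x in values)
    return (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * term
            - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))

def kurt_slide(values):
    """Kurtosis as in the slide expression (compare with 3)."""
    n = len(values)
    a4 = central_moment(values, 4) / central_moment(values, 2) ** 2
    return n ** 3 / ((n - 1) * (n - 2) * (n - 3)) * a4

k_excel = kurt_excel(data)
k_slide = kurt_slide(data)
print("Excel (compare with 0):", k_excel)
print("Slide (compare with 3):", k_slide)
```

The printed labels repeat the reference value for each expression, since mixing them up (for example comparing the Excel result with 3) is the most likely mistake in this step.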
7. Calculate the parameters of a normal distribution and plot it as a graph
(figure). Is the normal distribution suitable for this dataset?
The parameters may be calculated with the sample average and sample standard
deviation formulas of questions 4 and 5. The suitability question should be judged on
normal probability paper, where the data are plotted in ascending order on the
horizontal axis against i/(N+1) on the vertical axis for i = 1, …, 7 (that is, 12.5%,
25%, 37.5%, …, 87.5%).
If an acceptable straight line can be drawn through the dataset, it can be concluded that
the data can be modeled by a normal distribution.
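The probability-paper check can be approximated numerically, as in the sketch below. The dataset is hypothetical. Converting the plotting positions i/(N+1) to standard normal quantiles mimics the scaling of the vertical axis of normal probability paper. Using the correlation coefficient as a measure of straightness is an assumption added for illustration; the exercise itself asks for a visual judgement.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

# Hypothetical sample of 7 measurements (assumption, not the exercise data).
data = [2.1, 3.4, 3.9, 4.4, 5.0, 5.8, 7.2]

xs = sorted(data)
n = len(xs)

# Normal distribution parameters from the sample mean and standard deviation.
mu, sigma = mean(xs), stdev(xs)

# Plotting positions i / (N + 1): 12.5%, 25%, ..., 87.5% for N = 7.
positions = [i / (n + 1) for i in range(1, n + 1)]

# Standard normal quantiles of the plotting positions.
z = [NormalDist().inv_cdf(p) for p in positions]

def pearson_r(a, b):
    """Pearson correlation coefficient between two equally long sequences."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    return cov / sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))

r = pearson_r(xs, z)
print(mu, sigma, positions, r)
```

A value of `r` close to 1 suggests that the points lie near a straight line on probability paper. With only seven points this is a rough indication, not a formal test.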