INA: Element 2
Descriptive Statistics
Max Carroll
6/15/2012
...An explanation to the spreadsheet
Contents
Element 2: Descriptive Statistics
Calculating the Mode and the Median
Producing a frequency table (with suitable class intervals)
Calculating the mean based on the frequency table and using Excel's own standard formulae
Calculating the standard deviation based on the frequency table and using Excel's own standard formulae
INA: Element 2
Max Carroll 21010175
Element 2: Descriptive Statistics
Calculating the Mode and the Median
First, the data was copied and pasted into an Excel spreadsheet. The data was given the label
"UNITS" so that it could be referenced from formulas without typing in the cell range each time. In
formulas, the label "UNITS" is effectively synonymous with "K4:O9" and the two expressions are
interchangeable, i.e. the cell range K4:O9 contains all of the raw data.
The picture above shows the selected cells from K4 to O9 containing the data set from the case study. It also shows that they
have the label "UNITS".
The pictures above show how formulas have been used to calculate the median and mode from the data set in K4:O9 ("UNITS"). The left shows the formulas and the right shows the calculated values.
The next step was to create formulas to compute the Mode and the Median. I also calculated
a few other values that I thought might be useful: the mean based on the raw data (to compare with
the mean calculated from the frequency table later) and the minimum and maximum values, in order
to better decide the class interval width and the lower and upper class limits.
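As a rough cross-check outside Excel, the same summary values can be sketched in Python. The data below is a hypothetical stand-in for the 30 values in UNITS (the real case-study figures appear only in the spreadsheet), and the Excel function names in the comments are assumed equivalents, since the report's actual formulas are shown only in the pictures.

```python
import statistics

# Hypothetical stand-in for the 30 values in K4:O9 ("UNITS").
# These are NOT the case-study figures; only the minimum (23) and
# maximum (77) are taken from the report.
units = [
    23, 31, 35, 38, 40, 41,
    42, 42, 43, 44, 45, 45,
    45, 46, 47, 48, 49, 50,
    51, 52, 53, 54, 55, 57,
    59, 61, 64, 68, 72, 77,
]

summary = {
    "median": statistics.median(units),  # assumed equivalent of =MEDIAN(UNITS)
    "mode": statistics.mode(units),      # assumed equivalent of =MODE(UNITS)
    "mean": statistics.mean(units),      # equivalent of =AVERAGE(UNITS)
    "minimum": min(units),               # assumed equivalent of =MIN(UNITS)
    "maximum": max(units),               # assumed equivalent of =MAX(UNITS)
}

for name, value in summary.items():
    print(f"{name}: {value}")
```

The minimum and maximum are the two values that feed into the choice of class limits in the next section.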
Producing a frequency table (with suitable class intervals)
First, the minimum and maximum values were calculated from the data (as shown in the previous
section). The lowest lower class limit was obtained by rounding the minimum value down to the
nearest 10 (23 to 20), and the highest upper class limit was obtained by rounding the maximum value
up to the nearest 10 (77 to 80). A class width of 5 was chosen, which divides the range from 20 to 80
into a table of 12 rows of data.
The picture above shows the frequency table values and the table below shows the formulae used in the same table
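The class construction described above can be sketched as follows. The data is hypothetical (chosen only so that the minimum is 23 and the maximum is 77), and treating each class as running from its lower limit up to, but not including, its upper limit is an assumption; the report does not state which boundary convention the spreadsheet uses.

```python
import math

# Hypothetical data, not the case-study figures (min 23, max 77 as in the report).
units = [
    23, 31, 35, 38, 40, 41,
    42, 42, 43, 44, 45, 45,
    45, 46, 47, 48, 49, 50,
    51, 52, 53, 54, 55, 57,
    59, 61, 64, 68, 72, 77,
]

CLASS_WIDTH = 5
lowest_limit = math.floor(min(units) / 10) * 10   # 23 rounds down to 20
highest_limit = math.ceil(max(units) / 10) * 10   # 77 rounds up to 80

frequency_table = []
for lower in range(lowest_limit, highest_limit, CLASS_WIDTH):
    upper = lower + CLASS_WIDTH
    # Assumption: a value belongs to the class where lower <= value < upper.
    # (A maximum that is an exact multiple of 10 would need an extra class.)
    count = sum(1 for value in units if lower <= value < upper)
    frequency_table.append((lower, upper, count))

for lower, upper, count in frequency_table:
    print(f"{lower:>2} to under {upper:>2}: {count}")
```

With limits of 20 and 80 and a width of 5, the loop produces the 12 rows mentioned in the text, and the frequencies add up to the number of data values.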
Calculating the mean based on the frequency table and using Excel's own
standard formulae
The sums of the (f) Frequency and (fx) columns were taken. The mean was then calculated by dividing
the second sum by the first: Σ(fx) / Σ(f)
The image to the left shows the values used in the calculation, upper right shows the mean value and the lower right shows
the formula used in the cell to calculate the mean.
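The Excel formula =AVERAGE(UNITS) returned an answer of 45.633. Calculating the mean from the
frequency table using the formula Σ(fx) / Σ(f) gave a value of 45.833. The two answers differ by
only 0.200, which is less than 0.5% of either mean, so the difference is negligible in comparison
with the magnitude of the numbers. A small difference is to be expected, because the frequency
table represents every value in a class by that class's single x value.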
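To illustrate how a frequency-table mean can differ slightly from the raw-data mean, here is a minimal sketch. It uses hypothetical data rather than the case-study values, and it assumes that x is the midpoint of each class and that classes include their lower limit but not their upper limit; neither convention is confirmed by the text.

```python
import statistics

# Hypothetical data, not the case-study figures.
units = [
    23, 31, 35, 38, 40, 41,
    42, 42, 43, 44, 45, 45,
    45, 46, 47, 48, 49, 50,
    51, 52, 53, 54, 55, 57,
    59, 61, 64, 68, 72, 77,
]

CLASS_WIDTH = 5

sum_f = 0       # Σ(f)
sum_fx = 0.0    # Σ(fx)
for lower in range(20, 80, CLASS_WIDTH):
    f = sum(1 for value in units if lower <= value < lower + CLASS_WIDTH)
    x = lower + CLASS_WIDTH / 2  # assumed: x is the class midpoint
    sum_f += f
    sum_fx += f * x

grouped_mean = sum_fx / sum_f        # Σ(fx) / Σ(f)
raw_mean = statistics.mean(units)    # equivalent of =AVERAGE(UNITS)

print(f"grouped mean: {grouped_mean:.3f}")
print(f"raw mean:     {raw_mean:.3f}")
print(f"difference:   {grouped_mean - raw_mean:.3f}")
```

Under these assumptions each value lies within half a class width of its midpoint, so the two means can never differ by more than 2.5.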
Calculating the standard deviation based on the frequency table and using
Excel's own standard formulae
Totals were calculated for the (f), (fx) and (fx²) columns of the frequency table. The total of the
(fx) column was then squared to give the [Σ(fx)]² value.
Above left shows the calculated values and above right shows the formulae used to calculate them.
The above values were then plugged into the equation below in order to obtain the answer.
However, the manner in which the equation was entered affected the final value.
$$ s = \sqrt{\dfrac{\sum f x^{2} - \dfrac{\left(\sum f x\right)^{2}}{n}}{n - 1}} $$
The above image shows the calculated standard deviations and to the right shows the formulae used to calculate those
values
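The standard deviation equation above can be sketched in Python as a cross-check. The frequency table here is hypothetical (midpoints and frequencies invented for illustration, with 30 observations in total), and using class midpoints for x is an assumption.

```python
import math
import statistics

# Hypothetical frequency table: (class value x, frequency f).
# Not the case-study table; x is assumed to be the class midpoint.
table = [
    (22.5, 1), (27.5, 0), (32.5, 1), (37.5, 2), (42.5, 6), (47.5, 7),
    (52.5, 5), (57.5, 3), (62.5, 2), (67.5, 1), (72.5, 1), (77.5, 1),
]

n = sum(f for _, f in table)                   # Σ(f)
sum_fx = sum(f * x for x, f in table)          # Σ(fx)
sum_fx2 = sum(f * x ** 2 for x, f in table)    # Σ(fx²)

# s = sqrt( (Σ(fx²) - [Σ(fx)]² / n) / (n - 1) )
s = math.sqrt((sum_fx2 - sum_fx ** 2 / n) / (n - 1))

# Cross-check: expand the table into one value per observation and use
# the library's sample standard deviation.
expanded = [x for x, f in table for _ in range(f)]
print(f"formula:            {s:.6f}")
print(f"statistics.stdev(): {statistics.stdev(expanded):.6f}")
```

The cross-check only confirms that the formula matches the sample standard deviation of the class values. Compared with the raw data, a grouped result can still differ slightly for the same reason the grouped mean does.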
The first value (cell D31) was calculated by soft coding the formula based on the calculated values of
the frequency table. However, when calculated in a slightly different way (cell D35), there is a
slightly different result. Here is a breakdown of the differences:
• In cell D35 the square root is taken in a second stage, as opposed to everything being done in the
same stage.
• The cell references E19 (30) and E19-1 have been replaced with the numbers 30 and 29
respectively.
• Instead of using the [Σ(fx)]² value, the Σ(fx) value is squared in the formula itself (i.e. F23 became
F19^2).
Although the above alterations either refer to or substitute for identical values, they produced
different answers. I assume this is because of the way Excel rounds figures in formulas in different
circumstances. What I can conclude, however, is that the method in which the square root was taken
in a separate step gave an answer closer to Excel's in-house formula for calculating standard deviation.