Download Measures of Central Tendency Your specific assignment is to select

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

Degrees of freedom (statistics) wikipedia , lookup

History of statistics wikipedia , lookup

Bootstrapping (statistics) wikipedia , lookup

Time series wikipedia , lookup

Transcript
Measures of Central Tendency
Your specific assignment is to select a type of quantitative data to collect from your own life. Some
examples could be: How many minutes do you spend on studying every day. How many hours do you
spend at work every day? Number of phone calls you get each day. In a brief paper, describe the data
you are going to collect. Start collecting data today so you have can have at least 10 observations,
preferably more. Note: you only have to choose one variable, and then collect 10 days worth of data on
that one variable. Using the data you have collected in above: Calculate the mean, median, and mode.
Are these numbers higher or lower than you would have expected? Which of these measures of central
tendency do you think most accurately describes the lifestyle variable you are looking at? Write a brief
paper with your calculations and the answers to these questions.
Plan for data collection
I am going to collect information on the number of emails received in my inbox each day. My
habit is to open the inbox at exactly 7 am on each day. Then I open all unread emails and read them and
send a reply if necessary. For data collection purpose, I have decided to note down the number of such
emails found unread in the inbox at exactly 7 am each day.
Collected data
I have collected the data for 13 days. The following table contains the number of emails received
in my inbox each day for 13 days.
Days
1 2 3 4 5 6 7 8 9 10 11 12 13
Number of emails received 17 12 10 17 17 18 11 14 13 16 7 12 8
Statistical Analysis
After collecting the data, the next step is to compute certain numbers which are used to
summarize the data. They are called measures of central tendency. The mean, the median and the mode
are three among the most important measures of central tendency.
Computation of the mean
The mean is defined as the sum of the observations divided by the number of observations.
βˆ‘π‘₯
Mean =
𝑛
, where 𝑛 is the number of observations and βˆ‘ π‘₯ is the sum of the observations.
Number of observations = n = 13
Sum of the observations = βˆ‘ π‘₯ = 17+ 12 + 10 + 17 + 18 + 11 + 14 + 13 + 16 + 7 + 12 + 8 = 172
Mean =
βˆ‘π‘₯
𝑛
=
172
13
= 13.23 (correct to 2 places of decimal)
Computation of the median
The median is defined as the middle most observation when the observations are arranged in the order
of magnitude.
The following are the observations arranged in the increasing order of magnitude.
Order of Observations 1 2 3 4 5 6 7 8 9 10 11 12 13
Value of observations 7 8 10 11 12 12 13 14 16 17 17 17 18
Since there are 13 observations, the middlemost observation is the (13+1)/2 = 7th observation.
Therefore, Median = 7th observation = 13
Computation of the mode
The mode is defined as the most frequently occurring observation. It is observation with maximum
frequency. The following table gives the frequencies of the observations.
Observations 7 8 10 11 12 13 14 16 17 18
Frequency
1 1 1 1 2 1 1 1 3 1
The observation with maximum frequency is 17.
Mode = 17.
Comparison of the three measures
Mean = 13.23
Median = 13
Mode = 17
Here the mean and median appears to be satisfactory as a measure of central tendency. But the mode is
greater than what is expected as a central value.
Both mean and median describes the life style somewhat accurately. But the mean has the disadvantage
that it is not a whole number, even though all observations are whole numbers.
The mode has the advantage that it is most repeating item. Out of 13 observations, the observation 17
repeats 3 times.
Collection of additional data
The process of collecting data is continued for 5 more days. The following table contains the number of
emails received in my inbox each day for 18 days.
Days
1
17
Number of emails received
2
12
3
10
4
17
5
17
6
18
7
11
8
14
9
13
10
16
11
7
12
12
13
8
14
9
15
12
16
17
17
18
18
14
The important measures of central tendency are computed for this larger sample.
Computation of the mean
The mean is defined as the sum of the observations divided by the number of observations.
βˆ‘π‘₯
Mean =
𝑛
, where 𝑛 is the number of observations and βˆ‘ π‘₯ is the sum of the observations.
Number of observations = n = 18
Sum of the observations = βˆ‘ π‘₯ = 17+ 12 + 10 + 17 + 18 + 11 + 14 + 13 + 16 + 7 + 12 + 8 + 9 + 12 + 17 +1 8
+ 14 = 242
Mean =
βˆ‘π‘₯
𝑛
=
242
18
= 13.44 (correct to 2 places of decimal)
Computation of the median
The median is defined as the middle most observation when the observations are arranged in the order
of magnitude.
The following are the observations arranged in the increasing order of magnitude.
Days
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
Number of emails received
7
8
9
10
11
12
12
12
13
14
14
16
17
17
17
17
18
18
Since there are 18 observations, the middlemost observation is the (18+1)/2 = 9.5th observation.
Therefore, Median = (9th Obs + 10th Obs)/2 = (13 + 14)/2 = 13.5
Computation of the mode
The mode is defined as the most frequently occurring observation. It is observation with maximum
frequency. The following table gives the frequencies of the observations.
Observations 7 8 9 10 11 12 13 14 16 17 18
Frequency
1 1 1 1 1 2 1 2 1 4 2
The observation with maximum frequency is 17.
Mode = 17.
Comparison of the three measures
Mean = 13.44
Median = 13.5
Mode = 17
Here the mean and median appears to be satisfactory as a measure of central tendency. But the mode is
greater than what is expected as a central value.
Both mean and median describes the life style somewhat accurately. But the mean has the disadvantage
that it is not a whole number, even though all observations are whole numbers.
The mode has the advantage that it is the most repeating item. Out of 18 observations, the observation
17 repeats 4 times.
When the sample size increased from 13 to 18, there is no change in the value of the mode. The mean
and the median changed only slightly. So far as the present study is concerned, the central value is more
or less stable. There is no necessity of increasing the sample size as far as the present study is
concerned.
Part II
ANALYSIS USING EXCEL
ENTERIG THE DATA IN EXCEL
In cell A1 write β€œDays”.
From cells A2 to A19 enter the numbers of the day on which the observations are taken.
In cell B1 write β€œEmails”.
From cells B2 to B19 enter the number of emails on each day.
Thus the data is entered in to excel.
PREPARATION OF FREQUENCY DISTRIBUTION AND DESCRIPTIVE MEASURES
In cell A21 write β€œMinimum”
In cell B21 write the formula β€œ=MIN(B2:B19)” and press Enter
In cell A22 write β€œMaximum”
In cell B22 write the formula β€œ=MAX(B2:B19)” and press Enter
Thus we have computed the minimum and maximum of the data as 7 and 18
In a similar way mean, median, mode and standard deviation are calculated in cells B23, B24, B25 and
B26 using the formulae β€œ=AVERAGE(B2:B19)”, β€œ=MEDIAN(B2:B19)”, β€œ=MODE(B2:B19)”,
β€œ=STDEV(B2:B19)”.
In cell D1 write β€œValue”.
In cell E1 write β€œFrequency’.
From cells D2 to D13 enter the numbers 7 to 18
Select cells E2 to E13.
Enter the formula β€œ=FREQUENCY(B2:B19,D2:D13)” and press shift + ctrl + Enter
Fitting a normal distribution
The mean is 13.44 and standard deviation is 3.55
For a normal distribution,
approximately 68% of the data is within 1SD of the mean,
approximately 95% of the data is within 2SDs of the mean,
approximately 99% of the data is within 3SDs of the mean.
For our data,
50% of the data is within 1SD of the mean,
100% of the data is within 2SDs of the mean,
100% of the data is within 3SDs of the mean.
We can infer that our data is approximately normally distributed.