Measures of Central Tendency
The Mean
The Aim
By the end of this lecture, students will know the measures of central
tendency and be able to calculate them using SPSS.
The Goals
• Be able to list the measures of central tendency.
• Be able to calculate the measures of central tendency using SPSS.
• Be able to draw a histogram using SPSS and evaluate the shape of the
distribution.
• Be able to explain the weighted mean.
• Be able to weight cases using SPSS.
1- We cannot form an idea about the data as a whole unless we
summarize the data we have obtained in some way.
2- A graphical display can be a good way to summarize.
3- We can also get a general idea by calculating specific
characteristics of the data. We can find a value that
represents our data and then measure how the values are
spread around this value.
• Arithmetic mean
• Median
• Mode
• Geometric mean
• Weighted mean
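Arithmetic Mean
The arithmetic mean is denoted by x with a bar above it ($\bar{x}$) in
the formula.
Using the Greek capital sigma ($\Sigma$) as the summation sign, the formula
is written as follows:
$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$$
Or
$$\bar{x} = \frac{x_1 + x_2 + \dots + x_n}{n}$$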
Person No     1   2   3   4   5   6   7   8   9
Age (years)   10  13  15  16  16  18  19  20  20
The arithmetic mean age in the above data set is
(10+13+15+16+16+18+19+20+20)/9 = 16.33
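The calculation above can be checked with a few lines of Python. This is only a sketch: the variable name `ages` is an assumption for illustration, and the values are the nine ages from the table.

```python
from statistics import mean

# Ages of the nine persons in the table above
ages = [10, 13, 15, 16, 16, 18, 19, 20, 20]

# Arithmetic mean: sum of the values divided by the number of values
arithmetic_mean = sum(ages) / len(ages)

print(round(arithmetic_mean, 2))
# The standard library function gives the same result
print(round(mean(ages), 2))
```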
Median
• When we sort the data in order of size, the middle value is called
the "median".
• If the number of observations is even, the average of the two middle
values is taken.
Person No     1   2   3   4   5   6   7   8   9
Age (years)   10  13  15  16  16  18  19  20  20
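The two rules for the median can be sketched in Python as below. The helper name `middle_value` is an assumption made for this illustration; the first data set is the age table, and the six-value list is a hypothetical example added only to show the even case.

```python
from statistics import median

ages = [10, 13, 15, 16, 16, 18, 19, 20, 20]

def middle_value(values):
    """Median: middle value of the sorted data, or the mean of the two middle values."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                       # odd count: single middle value
    return (ordered[mid - 1] + ordered[mid]) / 2  # even count: average of the two

odd_case = middle_value(ages)                        # 9 values: the 5th sorted value
even_case = middle_value([10, 13, 15, 16, 18, 19])   # hypothetical 6 values

print(odd_case, even_case)
```

The standard library function `statistics.median` follows the same rules and can be used instead of the hand-written helper.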
Mode
• The mode is not a frequently used measure of central tendency.
• The value that occurs most often in the data set is called the "mode".
• A data set can have more than one mode.
• If each value occurs only once, there is no mode.
Person No     1   2   3   4   5   6   7   8   9
Age (years)   10  13  15  16  16  18  19  20  20
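A minimal Python sketch of the rules for the mode, using the age table. The function name `modes` is an assumption for illustration; it returns every most frequent value, and an empty list when no value repeats.

```python
from collections import Counter

ages = [10, 13, 15, 16, 16, 18, 19, 20, 20]

def modes(values):
    """Return all most frequent values; an empty list if no value repeats."""
    counts = Counter(values)
    if not counts:
        return []
    highest = max(counts.values())
    if highest == 1:
        return []  # every value occurs once, so there is no mode
    return sorted(v for v, c in counts.items() if c == highest)

age_modes = modes(ages)          # 16 and 20 both occur twice
no_mode = modes([1, 2, 3])       # hypothetical data without repeats

print(age_modes, no_mode)
```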
Geometric Mean
1- When our data are skewed, using the arithmetic
mean is not appropriate.
2- When the data are skewed to the right (the tail of
the bell curve extends toward the right on a
histogram), taking the logarithm of each value
(base 10 or base e) may give a new data set that
is symmetrical.
3- We can then take the arithmetic mean of these
logarithm values.
• In order to return to the original unit, a
back-transformation (antilog) is required.
• The resulting value is called the geometric
mean.
• In general, the geometric mean is close to
the median and is smaller than the
arithmetic mean.
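The three steps (log-transform, average the logs, take the antilog) can be sketched in Python as follows. The numbers in `values` are hypothetical right-skewed data invented for this illustration, not data from the lecture; the result is compared with the standard library's `statistics.geometric_mean`.

```python
import math
from statistics import geometric_mean, mean, median

# Hypothetical right-skewed, strictly positive values (not from the lecture data)
values = [2, 3, 3, 4, 5, 6, 8, 12, 40]

logs = [math.log10(v) for v in values]  # step 1: take the log of each value
mean_log = mean(logs)                   # step 2: arithmetic mean of the logs
geo_mean = 10 ** mean_log               # step 3: antilog, back to the original unit

print(geo_mean, geometric_mean(values), mean(values), median(values))
```

With these values the geometric mean falls below the arithmetic mean and nearer to the median, which is the pattern described above.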
www.aile.net/agep/istat/08_09/diyabet.sav
• When we examine the "Weight" variable in the data
set, we see in the histogram that the tail of the
bell curve extends toward the right (right-skewed).
• Now let's take the logarithm of "Weight" to
create a new variable.
• Transform > Compute Variable > [Type
"logWeight" into the "Target Variable" field and
"LG10(Weight)" into the "Numeric Expression"
field] > OK
• A new variable named "logWeight" will
appear in our SPSS data set. Now let's look at
the histogram of this variable:
• Graphs > Interactive > Histogram [Drag the
"logWeight" variable onto the X-axis. Then click on
the "Histogram" tab and tick the "Normal curve"
box] > OK
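Outside SPSS, the same transformation can be sketched in Python. Everything here is an assumption for illustration: the `weights` list is hypothetical (it is not read from diyabet.sav), and the `skewness` helper uses the simple moment coefficient to show numerically what the histogram shows visually.

```python
import math

# Hypothetical right-skewed weights in kg (illustrative only, not the diyabet.sav data)
weights = [52, 55, 58, 60, 62, 65, 68, 70, 74, 80, 95, 120, 160]

# Counterpart of Compute Variable: logWeight = LG10(Weight)
log_weights = [math.log10(w) for w in weights]

def skewness(values):
    """Moment coefficient of skewness: positive when the right tail is longer."""
    n = len(values)
    m = sum(values) / n
    m2 = sum((v - m) ** 2 for v in values) / n
    m3 = sum((v - m) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

skew_raw = skewness(weights)
skew_log = skewness(log_weights)

print(round(skew_raw, 2), round(skew_log, 2))
```

A smaller skewness after the transformation indicates that the log values are closer to symmetrical than the raw values.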
[Figure: histogram of LogWeight; x-axis: LogWeight (1.60 to 2.20), y-axis: Count (10 to 50)]
• Let us calculate the arithmetic mean of our "Weight" variable:
• Analyze > Descriptive Statistics > Descriptives [Drag the
"Weight" variable into the "Variable(s)" area] > OK
                     N     Minimum   Maximum   Mean     Std. Deviation
Weight               424   33.0      160.0     74.266   15.1381
Valid N (listwise)   424
Now let us take the arithmetic mean of the "logWeight"
variable:
Analyze > Descriptive Statistics > Descriptives [Drag the
"logWeight" variable into the "Variable(s)" area] > OK
Descriptive Statistics
                     N     Minimum   Maximum   Mean     Std. Deviation
LogWeight            424   1.52      2.20      1.8625   .08428
Valid N (listwise)   424
In order to interpret the value we obtained clinically,
we need to convert it back to the original unit of "Weight".
We must take the antilogarithm of 1.862.
Antilog(1.862) = $10^{1.862}$ ≈ 72.78 kg.
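The back-transformation can be checked numerically. In this sketch the name `mean_log_weight` is an assumption for illustration, and the value 1.862 is the mean of the log values used in the text (base 10).

```python
import math

mean_log_weight = 1.862  # mean of logWeight as used in the text (base 10)

# Antilog: raise the base of the logarithm (10) to the mean of the logs
geometric_mean_kg = 10 ** mean_log_weight

print(round(geometric_mean_kg, 2))

# Taking the log again returns the starting value, confirming the inverse
print(round(math.log10(geometric_mean_kg), 3))
```

Had the natural logarithm been used in the transformation, the antilog would be `math.exp(...)` instead of a power of 10.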
To calculate the median and mode of the
"Weight" variable with SPSS:
Analyze > Descriptive Statistics > Frequencies [Drag
the "Weight" variable into the "Variable(s)" area.
Click on the "Statistics" button. Under the title
"Central Tendency", tick the "Median" and
"Mode" boxes] > Continue > OK
Weighted Mean
• We use the weighted mean when some values of a
variable are more important than others.
• We assign a coefficient (weight) to each value in our
sample. We multiply each value by its coefficient
and add up the products. Then we divide by the sum
of the coefficients.
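Written as a formula, the procedure above can be sketched as follows. The symbols are assumptions introduced here for illustration: $x_i$ is the i-th value, $w_i$ is its coefficient (weight), and $n$ is the number of values. The divisor is taken to be the sum of the coefficients, which is how the weighted mean is usually defined.

```latex
\bar{x}_w = \frac{\sum_{i=1}^{n} w_i x_i}{\sum_{i=1}^{n} w_i}
          = \frac{w_1 x_1 + w_2 x_2 + \dots + w_n x_n}{w_1 + w_2 + \dots + w_n}
```

When all coefficients are equal, this reduces to the ordinary arithmetic mean.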
• Example: we examine the number of daily discharges from the hospitals
in our city.
• Suppose that our variable is: "How many patients were discharged from
your hospital today?" The data obtained for 3 hospitals in our province
are as follows:
                      Hospital 1   Hospital 2   Hospital 3
Discharged patients   20           5            50
We see that the third hospital discharged the most
patients. The mean number of discharges is 25.
However, we cannot judge the workload of the
hospitals without knowing their bed capacities.
Suppose that the bed capacities are as follows:
                      Hospital 1   Hospital 2   Hospital 3
Discharged patients   20           5            50
Bed capacity          50           50           400
We can get a better idea if we weight the number of discharges by
bed capacity.
Let us apply the formula:
(20×50 + 5×50 + 50×400)/(50+50+400) = 42.5 discharges.
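The same calculation as a short Python sketch. The list names `discharges` and `bed_capacity` are assumptions for illustration; the numbers are those of the three hospitals above, with bed capacity used as the weight.

```python
discharges = [20, 5, 50]      # Hospital 1, 2, 3
bed_capacity = [50, 50, 400]  # used as the weights

# Unweighted mean: every hospital counts equally
plain_mean = sum(discharges) / len(discharges)

# Weighted mean: sum of (value x weight) divided by the sum of the weights
weighted_mean = (
    sum(d * w for d, w in zip(discharges, bed_capacity)) / sum(bed_capacity)
)

print(plain_mean, weighted_mean)
```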
                       Hospital 1   Hospital 2   Hospital 3   Mean
Discharged patients    20           5            50           25
Bed capacity           50           50           400          166.7
Weighted discharges    66.7*        16.7         20.8
* 20 × 166.7 / 50
As a result:
we observe that the weighted average (42.5
patients) is much higher than what appears at
first (25 patients), and we see that hospital 1
works at the highest density.
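The per-hospital figures in the table follow the rule shown in the footnote: discharges multiplied by the mean bed capacity and divided by the hospital's own capacity. A minimal Python sketch, with variable names chosen here only for illustration:

```python
hospitals = ["Hospital 1", "Hospital 2", "Hospital 3"]
discharges = [20, 5, 50]
bed_capacity = [50, 50, 400]

mean_capacity = sum(bed_capacity) / len(bed_capacity)

# Discharges rescaled to a hospital of average capacity: d * mean_capacity / capacity
adjusted = [d * mean_capacity / c for d, c in zip(discharges, bed_capacity)]

# The hospital with the largest adjusted value works at the highest density
busiest = hospitals[adjusted.index(max(adjusted))]

for name, value in zip(hospitals, adjusted):
    print(name, round(value, 1))
print("Highest density:", busiest)
```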
• We know that the number of children is affected
by age: as age increases, people tend to have
more children.
www.aile.net/agep/istat/08_09/diyabet.sav
• Let us weight the "children" (number of children)
variable by age.
• Before weighting, note that the arithmetic
mean of the "children" variable is 6.38.
• Data > Weight Cases > Weight cases by > [Drag
the "Age" variable into the "Frequency Variable"
area] > OK.
• When we now take the arithmetic mean of the number
of children ("children"), we will see that it is 6.61.
• This process is also called "adjusting the number
of children for age".
• In international statistics, figures such as
mortality rates are reported adjusted
(weighted) for population or other
variables.
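What weighting cases by a frequency variable does to the mean can be sketched in Python. The `cases` below are hypothetical (age, number of children) pairs invented for illustration, not the diyabet.sav data, so the results differ from the 6.38 and 6.61 reported above. The sketch assumes that each case counts as many times as its weight, which is how frequency weights behave.

```python
# Hypothetical (age, number of children) pairs; not the diyabet.sav data
cases = [(25, 1), (32, 2), (40, 3), (55, 5), (63, 6)]

ages = [age for age, _ in cases]
children = [n for _, n in cases]

# Ordinary mean: every case counts once
unweighted_mean = sum(children) / len(children)

# Weighted by age: each case counts as if it were replicated "age" times
weighted_mean = sum(a * c for a, c in cases) / sum(ages)

print(round(unweighted_mean, 2), round(weighted_mean, 2))
```

Because older people in this toy data have more children, giving them more weight raises the mean, the same direction of change as in the SPSS example.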
Mean type: Arithmetic mean
Positive:
• All values are used
• Defined algebraically and can be used mathematically
• Known sampling distribution
Negative:
• Affected by outliers
• Affected by skewed data (see section: data transformation)

Mean type: Median
Positive:
• Not affected by extreme values
• Not affected by skewed data
Negative:
• A large part of the information is ignored
• It is not defined algebraically
• It is affected by the sampling distribution
Mean type: Mode
Positive:
• It can easily be determined for categorical data.
Negative:
• A large part of the information is ignored.
• It is not defined algebraically.
• Sampling distribution is unknown.

Mean type: Geometric mean
Positive:
• Before back-transformation, it has the same advantages as the
arithmetic mean.
• Suitable for right-skewed data.
Negative:
• Only works if the log transformation produces a symmetrical
distribution.

Mean type: Weighted mean
Positive:
• It has the same advantages as the arithmetic mean.
• The relative importance of each observation is taken into account.
• It is defined algebraically.
Negative:
• The weights must be known or calculated.
Summary