ECON 4630
ECON 5630
TOPIC #2: DESCRIPTIVE STATISTICS
I. Graphical Methods
A. Definitions
1. Frequency distribution
2. Relative frequency distribution
3. Cumulative relative frequency
4. Example: Family Size in Uganda
# of children    frequency (f)    relative frequency (f/n)
 0                1
 1                3
 2                5
 3                1
 4                3
 5                4
 6                6
 7               10
 8                7
 9                3
10                3
11                2
12                1
13                1
n = 50
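The relative frequency column is left for the reader to fill in. A minimal Python sketch of that calculation follows, using the frequencies from the table; the variable names are illustrative only and are not part of the handout.

```python
# Frequency table from the Family Size in Uganda example:
# number of children -> number of families (f)
freq = {0: 1, 1: 3, 2: 5, 3: 1, 4: 3, 5: 4, 6: 6, 7: 10,
        8: 7, 9: 3, 10: 3, 11: 2, 12: 1, 13: 1}

n = sum(freq.values())  # total number of observations

rows = []   # one tuple per row: (children, f, f/n, cumulative f/n)
cum_f = 0   # running total of f
for children in sorted(freq):
    f = freq[children]
    cum_f += f
    rows.append((children, f, f / n, cum_f / n))

for children, f, rel, cum_rel in rows:
    print(f"{children:>2}  {f:>2}  {rel:.2f}  {cum_rel:.2f}")
```

Each relative frequency is f divided by n, and the cumulative relative frequency is the running total of f divided by n, so the last row always ends at 1.00.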
B. Grouping data into classes
1. General
2. Example: Men’s Height (sample of 200)
188.9  176.3  175.9  175.0  185.9  187.3  172.1  190.1  189.4  164.1
187.6  181.3  187.4  188.8  181.2  182.4  193.1  164.9  172.6  184.8
180.9  181.1  173.6  185.9  190.1  180.0  181.4  177.4  179.3  196.4
174.3  178.6  174.5  186.7  180.7  177.1  179.9  166.1  184.1  186.9
179.6  183.0  187.1  185.3  170.6  180.5  183.1  185.8  182.1  186.1
181.5  182.5  183.3  178.9  175.2  175.4  182.9  186.4  190.3  157.2
164.0  188.9  203.1  173.1  189.9  184.3  158.6  197.4  170.2  178.1
173.3  195.2  176.2  188.2  187.3  190.4  174.0  148.5  194.8  174.1
182.5  182.1  175.1  189.5  168.6  185.1  197.2  179.7  174.0  171.8
178.6  151.3  190.2  190.3  172.9  166.1  184.0  185.1  180.0  179.1
177.5  191.1  183.2  172.4  175.1  193.3  159.0  195.4  160.1  191.0
184.1  181.1  181.5  188.0  189.1  172.8  177.2  183.1  186.2  165.9
170.6  170.1  177.7  181.0  174.8  185.3  193.5  187.3  173.5  167.3
190.4  175.1  177.8  200.0  172.1  186.5  171.2  182.4  197.2  180.2
180.1  177.1  178.3  189.3  173.3  179.2  163.6  183.4  171.0  185.4
171.1  196.1  189.3  175.6  176.1  178.1  184.3  186.8  178.4  192.3
183.1  170.3  180.8  190.0  182.7  173.5  182.1  184.2  189.2  183.3
173.7  171.9  183.0  191.9  179.9  171.0  194.0  182.0  180.1  187.9
177.5  171.8  180.5  163.7  177.7  177.0  186.3  168.9  188.1  176.3
188.2  167.5  187.0  189.1  178.8  179.4  184.5  188.0  179.0  170.1
3. Grouping Data Into Classes
a) Select the Number of Classes, k
k    2^k
1
2
3
4
5
6
b) Select the Class Interval, i
c) Set the Class Limits
d) Tally the Data Into Classes
Class Boundaries       Class       f     f/n    Cumulative    Cumulative relative
Limit_L - Limit_U      Midpoint                 frequency     frequency
148.5 -
155.5 -
162.5 -
169.5 -
176.5 -
183.5 -
190.5 -
197.5 -
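Steps a) through d) can be sketched in Python. The sketch assumes the k table refers to the common "2 to the k" rule (take the smallest k with 2^k greater than n) and that the class interval is rounded up to a whole number; neither rule is spelled out in the handout, so treat both as assumptions. The function names are hypothetical, and the 20 sample values are the first 20 heights listed in the example.

```python
import math

def number_of_classes(n):
    """Smallest k with 2**k > n (assumed reading of the k / 2^k table)."""
    k = 1
    while 2 ** k <= n:
        k += 1
    return k

def class_interval(low, high, k):
    """Smallest whole-number width i with i >= (high - low) / k (assumed rounding)."""
    return math.ceil((high - low) / k)

def tally(data, start, width, k):
    """Count observations in k classes [start + j*width, start + (j+1)*width)."""
    counts = [0] * k
    for x in data:
        j = int((x - start) // width)   # index of the class containing x
        counts[j] += 1
    return counts

k = number_of_classes(200)              # sample size of the height example
i = class_interval(148.5, 203.1, k)     # smallest and largest heights in the data

# First 20 heights from the example, used here only to keep the sketch short
sample = [188.9, 176.3, 175.9, 175.0, 185.9, 187.3, 172.1, 190.1, 189.4, 164.1,
          187.6, 181.3, 187.4, 188.8, 181.2, 182.4, 193.1, 164.9, 172.6, 184.8]
counts = tally(sample, 148.5, i, k)
print(k, i, counts)
```

Under these assumptions the sketch gives k = 8 and i = 7, which agrees with the eight lower class limits 148.5, 155.5, ..., 197.5 listed in the tally table.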
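C. Histograms
Definition: A histogram describes a frequency distribution using a series of
adjacent rectangles, where the height of each rectangle is ________
[Figure: blank histogram grid. Vertical axis: frequency (tick labels 2, 4, 6, 8, 10). Horizontal axis: class midpoints 152, 159, 166, 173, 180, 187, 194, 201.]
[Figure: blank histogram grid. Vertical axis: relative frequency (tick labels 0.05 to 0.25). Horizontal axis: class midpoints 152, 159, 166, 173, 180, 187, 194, 201.]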
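Because the plots themselves did not survive, a rough text rendering may help. The sketch below assumes the usual convention for equal-width classes, where each bar is drawn in proportion to its class frequency, and takes the midpoints and frequencies from the men's height class table in this handout. The `bar` helper and the scale factor are illustrative choices.

```python
# Class midpoints and frequencies from the men's height class table
midpoints = [152, 159, 166, 173, 180, 187, 194, 201]
freqs = [2, 4, 12, 44, 64, 56, 16, 2]
n = sum(freqs)

def bar(value, scale):
    """One text 'rectangle' whose length is proportional to value."""
    return "#" * round(value * scale)

for m, f in zip(midpoints, freqs):
    # A scale of 0.5 makes the tallest bar (f = 64) 32 characters long
    print(f"{m:>3} | {bar(f, 0.5):<32} f={f:>2}  f/n={f / n:.2f}")
```

Replacing f with f/n rescales every bar by the same factor, so the frequency histogram and the relative frequency histogram have the same shape.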
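D. Frequency Polygons
Definition: A frequency polygon describes a frequency distribution by using a
series of line segments connecting ________
[Figure: blank grid. Vertical axis: frequency (tick labels 2, 4, 6, 8, 10). Horizontal axis: class midpoints 152, 159, 166, 173, 180, 187, 194, 201.]
[Figure: blank grid. Vertical axis: relative frequency (tick labels 0.05 to 0.25). Horizontal axis: class midpoints 152, 159, 166, 173, 180, 187, 194, 201.]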
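E. Cumulative frequency and relative frequency polygons
[Figure: blank grid. Left vertical axis: cumulative frequency (tick labels 40, 80, 120, 160, 200). Right vertical axis: cumulative relative frequency (tick labels 0.20, 0.40, 0.60, 0.80, 1.00). Horizontal axis: 152, 159, 166, 173, 180, 187, 194, 201.]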
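The points of a cumulative polygon can be computed directly from the class frequencies. The sketch below assumes the common convention of plotting each cumulative total at the upper boundary of its class, starting from zero at the first lower boundary; the handout's own figure may use a different horizontal position. The frequencies come from the men's height class table.

```python
from itertools import accumulate

# Upper class boundaries (assumed plotting positions) and class frequencies
upper_bounds = [155.5, 162.5, 169.5, 176.5, 183.5, 190.5, 197.5, 204.5]
freqs = [2, 4, 12, 44, 64, 56, 16, 2]
n = sum(freqs)

cum_freq = list(accumulate(freqs))      # running total of f
cum_rel = [c / n for c in cum_freq]     # running total of f/n

# Polygon vertices: zero at the first lower boundary, then one per class
points = [(148.5, 0)] + list(zip(upper_bounds, cum_freq))
for x, c in points:
    print(f"{x:>6}  {c:>3}  {c / n:.2f}")
```

The two vertical scales in the figure describe the same curve, since the cumulative relative frequency is the cumulative frequency divided by n = 200.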
F. Excel Commands
(1) Place a table such as that on page 23 into Excel.
(2) To get frequency, use the COUNTIF command minus the previous cumulative value (see below).
(3) To get cumulative frequency, use the COUNTIF command (see below).

      A        B        C
 1    188.9
 2    176.3    range
 3    175.9    148.5    155.4999
 4    175      155.5    162.4999
 5    185.9    162.5    169.4999
 6    187.3    169.5    176.4999
 7    172.1    176.5    183.4999
 8    190.1    183.5    190.4999
 9    189.4    190.5    197.4999
10    164.1    197.5    204.4999
11    187.6
12    etc.

      D (frequency)                              E (cumulative frequency)
 3    =(COUNTIF($A$1:$A$200,"<"&C3))             =(COUNTIF($A$1:$A$200,"<"&C3))
 4    =(COUNTIF($A$1:$A$200,"<"&C4))-E3          =(COUNTIF($A$1:$A$200,"<"&C4))
 5    =(COUNTIF($A$1:$A$200,"<"&C5))-E4          =(COUNTIF($A$1:$A$200,"<"&C5))
      etc.                                       etc.

(4) To create a histogram, click on the “Chart” icon on the Insert tab.
Alternatively, go to Data Analysis (under the Data tab), and select
Histogram. NOTE: if you don’t see Data Analysis, you will need to go to
Add-ins.
II. Location: Numerical Methods
A. Mode: the most frequently-occurring value
1. Advantages
2. Disadvantages
B. Percentiles
1. Definition: the Pth percentile is a point in the data below which ________
2. Calculating Percentiles Using “Raw” Data
a) Rank data in ascending order
148.5  151.3  157.2  158.6  159.0  160.1  163.6  163.7  164.0  164.1
164.9  165.9  166.1  166.1  167.3  167.5  168.6  168.9  170.1  170.1
170.2  170.3  170.6  170.6  171.0  171.0  171.1  171.2  171.8  171.8
171.9  172.1  172.1  172.4  172.6  172.8  172.9  173.1  173.3  173.3
173.5  173.5  173.6  173.7  174.0  174.0  174.1  174.3  174.5  174.8
175.0  175.1  175.1  175.1  175.2  175.4  175.6  175.9  176.1  176.2
176.3  176.3  177.0  177.1  177.1  177.2  177.4  177.5  177.5  177.7
177.7  177.8  178.1  178.1  178.3  178.4  178.6  178.6  178.8  178.9
179.0  179.1  179.2  179.3  179.4  179.6  179.7  179.9  179.9  180.0
180.0  180.1  180.1  180.2  180.5  180.5  180.7  180.8  180.9  181.0
181.1  181.1  181.2  181.3  181.4  181.5  181.5  182.0  182.1  182.1
182.1  182.4  182.4  182.5  182.5  182.7  182.9  183.0  183.0  183.1
183.1  183.1  183.2  183.3  183.3  183.4  184.0  184.1  184.1  184.2
184.3  184.3  184.5  184.8  185.1  185.1  185.3  185.3  185.4  185.8
185.9  185.9  186.1  186.2  186.3  186.4  186.5  186.7  186.8  186.9
187.0  187.1  187.3  187.3  187.3  187.4  187.6  187.9  188.0  188.0
188.1  188.2  188.2  188.8  188.9  188.9  189.1  189.1  189.2  189.3
189.3  189.4  189.5  189.9  190.0  190.1  190.1  190.2  190.3  190.3
190.4  190.4  191.0  191.1  191.9  192.3  193.1  193.3  193.5  194.0
194.8  195.2  195.4  196.1  196.4  197.2  197.2  197.4  200.0  203.1
b) Calculate position of the Pth percentile
L_P = ________
where L_P is the position of the Pth percentile
c) To find the Pth percentile, using the data arranged in ascending
order select the L_Pth observation from the top. If L_P is not an
integer, find a weighted average of the two adjacent observations.
d) Examples:
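Steps a) through c) can be sketched in Python. The handout leaves the position formula blank, so this sketch assumes the common textbook rule L_P = (n + 1)P/100; if the course uses a different rule, only the `pos` line changes. The data are the 50 family sizes from the Uganda example, chosen to keep the listing short, and the function name is hypothetical.

```python
def percentile(sorted_data, p):
    """Pth percentile at position L_P = (n + 1) * P / 100 (assumed convention)."""
    n = len(sorted_data)
    pos = (n + 1) * p / 100      # 1-based position, possibly fractional
    lower = int(pos)             # whole part: observation just below the position
    frac = pos - lower           # fractional part: weight on the next observation
    if lower < 1:
        return sorted_data[0]
    if lower >= n:
        return sorted_data[-1]
    below, above = sorted_data[lower - 1], sorted_data[lower]
    return below + frac * (above - below)   # weighted average of the neighbours

# Family Size in Uganda: number of children -> frequency
freq = {0: 1, 1: 3, 2: 5, 3: 1, 4: 3, 5: 4, 6: 6, 7: 10,
        8: 7, 9: 3, 10: 3, 11: 2, 12: 1, 13: 1}
data = sorted(x for x, f in freq.items() for _ in range(f))   # step a): rank the data

median = percentile(data, 50)   # position 25.5: average of the 25th and 26th values
p20 = percentile(data, 20)      # position 10.2: 20% of the way from the 10th to the 11th
print(median, round(p20, 2))
```

With these assumptions the median is 7 and the 20th percentile is 3.2, the latter showing the weighted average that step c) describes for a non-integer position.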
3. Calculating the qth percentile using grouped data
$$ q\text{th percentile} = L + \frac{q\,n - CF}{f}\,(i) $$
where:
L = the lower limit of the class containing the qth percentile
n = total number of frequencies
f = frequency in the class containing the qth percentile
CF = cumulative number of frequencies in the classes preceding the class containing the qth percentile
i = class interval
(Here q is entered as a proportion, for example 0.25 for the 25th percentile, so that q n is a count of observations.)
Class Boundaries       Class       f      f/n     Cumulative    Cumulative relative
Limit_L - Limit_U      Midpoint                   frequency     frequency
148.5 - 155.4999       152           2    0.01       2          0.01
155.5 - 162.4999       159           4    0.02       6          0.03
162.5 - 169.4999       166          12    0.06      18          0.09
169.5 - 176.4999       173          44    0.22      62          0.31
176.5 - 183.4999       180          64    0.32     126          0.63
183.5 - 190.4999       187          56    0.28     182          0.91
190.5 - 197.4999       194          16    0.08     198          0.99
197.5 - 204.4999       201           2    0.01     200          1.00
Total                              200    1.00
a) Median
b) Lower quartile
c) Upper quartile
d) 33rd percentile
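The four exercises can be worked with a short Python sketch of the grouped-data formula, using the class table for the men's heights. It assumes q is supplied as a proportion and that the class containing the percentile is the first one whose cumulative frequency reaches q times n; the function name is hypothetical.

```python
# Class table for the men's height data: (lower limit L, frequency f)
classes = [(148.5, 2), (155.5, 4), (162.5, 12), (169.5, 44),
           (176.5, 64), (183.5, 56), (190.5, 16), (197.5, 2)]
WIDTH = 7   # class interval i

def grouped_percentile(q):
    """L + ((q*n - CF) / f) * i, with q given as a proportion (0 < q <= 1)."""
    n = sum(f for _, f in classes)
    target = q * n
    cf = 0                              # CF: frequencies in the preceding classes
    for lower, f in classes:
        if cf + f >= target:            # this class contains the percentile
            return lower + (target - cf) / f * WIDTH
        cf += f
    raise ValueError("q must be between 0 and 1")

median = grouped_percentile(0.50)
lower_quartile = grouped_percentile(0.25)
upper_quartile = grouped_percentile(0.75)
p33 = grouped_percentile(0.33)
print(round(median, 2), round(lower_quartile, 2), round(upper_quartile, 2), round(p33, 2))
```

For the median, for instance, q n = 100 falls in the 176.5 class (CF = 62, f = 64), giving 176.5 + (38/64)(7), or about 180.66.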
C. The mean
1. Digression: summation notation
$$ \sum_{i=1}^{n} x_i = x_1 + x_2 + x_3 + \dots + x_n $$
Factoids:
(a) $$ \sum_{i=1}^{n} k = kn $$
(b) $$ \sum_{i=1}^{n} k x_i = k \sum_{i=1}^{n} x_i $$
(c) $$ \sum_{i=1}^{n} (x_i + y_i) = \sum_{i=1}^{n} x_i + \sum_{i=1}^{n} y_i $$
Examples:
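One way to get a feel for the three factoids is to check them numerically. The values of x, y, and k below are hypothetical, picked only for illustration.

```python
# Hypothetical values, chosen only to illustrate the three rules
x = [2, 4, 6]
y = [1, 3, 5]
k = 10
n = len(x)

# (a) summing a constant n times gives k * n
rule_a = sum(k for _ in range(n)) == k * n

# (b) a constant factor can be pulled outside the sum
rule_b = sum(k * xi for xi in x) == k * sum(x)

# (c) the sum of (x_i + y_i) equals the sum of x_i plus the sum of y_i
rule_c = sum(xi + yi for xi, yi in zip(x, y)) == sum(x) + sum(y)

print(rule_a, rule_b, rule_c)
```

Here rule (a) gives 30 on both sides, rule (b) gives 120, and rule (c) gives 21.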
2. The population mean
3. The sample mean in general
4. Calculating the sample mean for “raw” data
5. Calculating the sample mean for grouped data
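The handout leaves the grouped-data formula to be filled in. The sketch below assumes the usual textbook approximation, in which every observation in a class is treated as if it sat at the class midpoint M, so the mean is the sum of f times M divided by n. The midpoints and frequencies come from the men's height class table.

```python
# Class midpoints (M) and frequencies (f) from the men's height class table
midpoints = [152, 159, 166, 173, 180, 187, 194, 201]
freqs = [2, 4, 12, 44, 64, 56, 16, 2]

n = sum(freqs)                                          # total number of observations
weighted_sum = sum(f * m for f, m in zip(freqs, midpoints))
grouped_mean = weighted_sum / n                         # assumed formula: sum(f * M) / n

print(n, weighted_sum, grouped_mean)
```

The result, about 180.21, is an approximation: the mean of the 200 raw heights will generally differ slightly because the midpoints stand in for the actual values.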
6. Geometric mean: useful for finding the average of percentages, ratios,
indexes, or growth rates.
Let x_i be the percentage change in period i, written as a decimal (0.03 for 3%):
$$ GM = \sqrt[n]{(1 + x_1)(1 + x_2)(1 + x_3) \cdots (1 + x_n)} - 1 $$
Example: suppose Jack got a 3% raise in year 1, a 4% raise in year 2, and
a 10% raise in year 3. What’s his average raise?
Similarly, to find the average percent increase over time, use
$$ GM = \sqrt[n]{\frac{X_{end}}{X_{start}}} - 1 $$
where X_end is the value at the end of the period, X_start is the value at the start, and n is the number of periods.
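Jack's example can be worked with a short Python sketch of both formulas. The function names are hypothetical, and the raises are entered as decimals.

```python
import math

def geometric_mean_rate(changes):
    """Average per-period rate of change; changes are decimals (0.03 = 3%)."""
    growth = math.prod(1 + x for x in changes)      # (1 + x_1)(1 + x_2)...(1 + x_n)
    return growth ** (1 / len(changes)) - 1         # nth root, minus 1

def average_growth(start, end, periods):
    """Average per-period increase from a start value to an end value."""
    return (end / start) ** (1 / periods) - 1

jack = geometric_mean_rate([0.03, 0.04, 0.10])

# The same answer from start and end values: a salary of 100 after the three raises
end_salary = 100 * 1.03 * 1.04 * 1.10
jack_again = average_growth(100, end_salary, 3)

print(f"{jack:.4f}  {jack_again:.4f}")
```

Jack's average raise works out to about 5.62% per year, slightly below the arithmetic average of the three raises (about 5.67%).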
D. Relative Positions of the Mean, Mode, and Median
1. Symmetric Distributions
2. Positive-skewed Distributions
3. Negative-skewed Distributions
E. Excel Commands
If your data are in cells A2 to A51,
1. =MODE(A2:A51)
2. =MEDIAN(A2:A51)
3. =AVERAGE(A2:A51)
4. =PERCENTILE(A2:A51,.25) for the 25th percentile
(NOTE: Excel uses a peculiar interpolation method to calculate
percentiles, so your calculations may differ from those calculated by
Excel, especially for smaller data sets.)
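For readers working outside Excel, the four commands have rough Python counterparts. The sketch below uses the standard `statistics` module for the first three. For the percentile it assumes Excel's PERCENTILE interpolates linearly at position 1 + (n - 1)p, which is my understanding of its behaviour rather than something stated in the handout. The data are the 50 family sizes from the Uganda example.

```python
import statistics

# Family Size in Uganda: number of children -> frequency
freq = {0: 1, 1: 3, 2: 5, 3: 1, 4: 3, 5: 4, 6: 6, 7: 10,
        8: 7, 9: 3, 10: 3, 11: 2, 12: 1, 13: 1}
data = sorted(x for x, f in freq.items() for _ in range(f))

def excel_style_percentile(sorted_data, p):
    """Linear interpolation at 1-based position 1 + (n - 1) * p (assumed rule)."""
    n = len(sorted_data)
    pos = 1 + (n - 1) * p
    lower = int(pos)             # observation just below the position
    frac = pos - lower           # weight on the next observation
    if lower >= n:
        return sorted_data[-1]
    below, above = sorted_data[lower - 1], sorted_data[lower]
    return below + frac * (above - below)

mode = statistics.mode(data)                 # counterpart of =MODE(...)
median = statistics.median(data)             # counterpart of =MEDIAN(...)
mean = statistics.mean(data)                 # counterpart of =AVERAGE(...)
p25 = excel_style_percentile(data, 0.25)     # counterpart of =PERCENTILE(..., .25)

print(mode, median, mean, p25)
```

Under the assumed rule the 25th percentile is 4.25, at position 13.25. A hand calculation that places the percentile at position (n + 1)P/100 = 12.75 would give 4 instead, which is the kind of discrepancy the NOTE warns about.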
III. Measures of Spread
A. Why Is Spread Important?
B. Ways to Measure Spread
Example: UNT Women’s Basketball – points scored, last 12 games 2007-08
Game    Points Scored    $X - \bar{X}$    $(X - \bar{X})^2$
 1      87
 2      70
 3      67
 4      70
 5      55
 6      75
 7      73
 8      70
 9      63
10      52
11      75
12      53
        Σ = 810
1. The range
2. The inter-quartile range (IQR)
3. Mean (absolute) deviation
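The range and the mean absolute deviation can be computed for the basketball example with a few lines of Python. The sketch assumes the usual definitions, largest minus smallest value for the range and the average of the absolute deviations from the mean for the mean deviation, since the handout leaves both blank.

```python
# Points scored in the 12 games of the basketball example
points = [87, 70, 67, 70, 55, 75, 73, 70, 63, 52, 75, 53]

n = len(points)
mean = sum(points) / n                          # 810 / 12

data_range = max(points) - min(points)          # largest minus smallest value

deviations = [x - mean for x in points]         # the X minus X-bar column
mean_abs_dev = sum(abs(d) for d in deviations) / n

print(mean, data_range, round(mean_abs_dev, 2))
```

The deviations themselves always sum to zero, which is why absolute values (or squares) are needed before averaging.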
4. Variance and standard deviation
a) Definitions:
Variance: the arithmetic mean of the squared deviations from the
mean.
Standard deviation: the square root of the variance; will be in the
same units as the data.
b) Population Formulae
c) Sample Formulae
i. Formulae for “Raw” Data
ii. Formulae for Grouped Data
d) The “Empirical Rule”
C. Excel Commands
1. Many descriptive statistics are possible using
Tools > Data Analysis > Descriptive Statistics
2. Alternatively, use Excel functions (assuming data are in cells from A2 to
A51):
a) Minimum: =MIN(A2:A51)
b) Maximum: =MAX(A2:A51)
c) Sample standard deviation: =STDEV(A2:A51)
d) Population standard deviation: =STDEVP(A2:A51)
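The two standard deviation commands differ only in the divisor. The sketch below recomputes both from the definitions for the basketball data, assuming the usual formulas (divide the sum of squared deviations by n for a population and by n - 1 for a sample), and cross-checks them against Python's `statistics.pstdev` and `statistics.stdev`.

```python
import math
import statistics

# Points scored in the 12 games of the basketball example
points = [87, 70, 67, 70, 55, 75, 73, 70, 63, 52, 75, 53]

n = len(points)
mean = sum(points) / n
ss = sum((x - mean) ** 2 for x in points)   # sum of the squared-deviation column

pop_var = ss / n                # population variance: divide by n
sample_var = ss / (n - 1)       # sample variance: divide by n - 1 (assumed formula)
pop_sd = math.sqrt(pop_var)     # counterpart of =STDEVP(...)
sample_sd = math.sqrt(sample_var)   # counterpart of =STDEV(...)

print(ss, round(pop_sd, 2), round(sample_sd, 2))
print(round(statistics.pstdev(points), 2), round(statistics.stdev(points), 2))
```

For these 12 games the sum of squared deviations is 1169, giving a population standard deviation of about 9.87 points and a sample standard deviation of about 10.31 points.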
NOTES