Chapter 1
Exercises
Bar Charts and Pie Charts
• In PowerPoint,
o Insert a “Title and Content” slide. Click on the chart icon. Select Bar (or Pie) Chart.
o Paste (or input) the data in the table. Resize the chart data range to fit your data.
o Reformat to make it look interesting and intelligent.
• In Excel,
o Select the data you want to make into a chart.
o In the Insert tab, click on Chart. Select Bar (or Pie) Chart.
o Reformat to make it look interesting and intelligent.
Histograms:
• Use file: MPG.dta
o Type all of these words, exactly as they appear:
use http://amu-chemlab.avemaria.edu/~martinez/ECON303/mpg, clear
o br
o histogram mpg
▪ Pattern? Deviations?
▪ Shape? Center? Spread?
▪ Symmetry?
▪ Outliers?
• Use arch_firms.dta
o use http://amuchemlab.avemaria.edu/~martinez/ECON303/arch_firms, clear
o br
o hist staff
o hist staff, width(10)
▪ Pattern? Deviations?
▪ Shape? Center? Spread?
▪ Symmetry?
▪ Outliers?
• All of the data we’ll use for the class are at http://amuchemlab.avemaria.edu/~martinez/ECON303/index.htm
• Google “FRED” or go to http://research.stlouisfed.org/fred2/
o Follow these links: Business/Fiscal > Household Sector > Series: HOUST > Download Data
o Select Text, Comma Delimited as the File Format.
▪ Select “Excel” if you have Excel on your computer.
o Download and open the file.
o Copy the Date and Value numbers.
o Open the Data Editor in Stata (type clear and then ed in the command window). Paste the numbers.
o hist value
▪ Pattern? Deviations?
▪ Shape? Center? Spread?
▪ Symmetry?
▪ Outliers?
Time plots
• Yield_Tbill
o use http://amuchemlab.avemaria.edu/~martinez/ECON303/yield_tbill, clear
o line rate year
o tsset year
o tsline rate
o tsline rate in 11/21
• Google “FRED” or go to http://research.stlouisfed.org/fred2/
o Run a search within FRED for RSAFSNA.
▪ What is RSAFSNA?
▪ Does RSAFSNA have a trend? What other patterns can you distinguish? What do you think explains the patterns?
▪ Search for RSAFS. How is it different from RSAFSNA?
▪ Click on Download Data. Select Text, Comma Delimited as the File Format.
o Select “Excel” if you have Excel on your computer.
▪ Download and open the file.
▪ Copy the Value column.
▪ Open the Data Editor in Stata (type clear and then ed in the command window). Paste the numbers.
▪ gen obs = _n
▪ line value obs || lfit value obs
▪ Trend? Seasonal Variation?
o Run a search within FRED for DTB3 (which is the “3-Month Treasury Bill: Secondary Market Rate”).
▪ What does the graph tell you?
o Follow these links: Business/Fiscal > Industrial Production > Series: INDPRO, Industrial Production Index
▪ Does INDPRO have a trend? Notice it says that INDPRO is seasonally adjusted. What does that mean?
▪ What do the shaded areas represent? What happens to INDPRO in or around those shaded areas?
o Run a search within FRED for “unemployment rate”. Click on “civilian unemployment rate”.
▪ Trend? Cycle?
Mean and Median
• Stata file: MPG
o use http://amuchemlab.avemaria.edu/~martinez/ECON303/mpg, clear
o describe
o summarize mpg
o hist mpg
• Bank_worker_earnings.dta
o describe
o br
o hist a
o tabstat a, s(mean median)
o by worker: tabstat annualearnings, stat(mean)
o bysort worker: tabstat annualearnings, stat(mean)
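The last two commands differ in one respect: by needs the data to be sorted by worker already, while bysort sorts first and then runs the command for each group. A minimal sketch with made-up numbers (the five rows are hypothetical, not taken from Bank_worker_earnings.dta, and worker is assumed here to be a numeric group code):

```stata
* Hypothetical toy data, deliberately NOT sorted by worker
clear
input worker annualearnings
2 31000
1 25000
2 29000
1 27000
1 26000
end

* by: stops with a "not sorted" error because the groups are interleaved
capture noisily by worker: tabstat annualearnings, stat(mean)
display "return code from by: " _rc

* bysort: sorts by worker first, then runs tabstat once per group
bysort worker: tabstat annualearnings, stat(mean)
```

Once bysort has run, the data stay sorted, so a plain by worker: would also work from that point on.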
• amount_spent.dta
o describe
o br
o tabstat amountspent, stat(mean)
o hist a
o tabstat amountspent, stat(mean median)
o tabstat am if am<50, stat(mean median)
o hist am if am<50
Quartiles, Box Plots
• growth.dta
o describe
o br
o hist g, width(1)
o tabstat g, stat(mean median)
o tabstat g if g<8, stat(mean median)
o tabstat g if g<8, stat(min q max)
• SAT_AVG.dta
o describe
o br
o summarize svavg smavg grad
o hist svavg
o hist smavg
o tabstat svavg smavg grad, stat(mean min q max n)
o graph box svavg smavg
• use http://amuchemlab.avemaria.edu/~martinez/ECON303/growth_EE, clear
o describe
o br
o bysort region: tabstat g, stat(min q max)
o gr box g
o gr box g, over(r)
▪ Outliers? Which country is the outlier?
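graph box draws as separate points the values that lie more than 1.5 times the interquartile range beyond the quartiles. The sketch below does the same check by hand. It uses Stata's built-in auto.dta as a stand-in, which is an assumption made only so the example runs anywhere; with growth_EE you would use g in place of mpg and list whichever variable identifies the country.

```stata
* Stand-in data shipped with Stata (an assumption, not the class file)
sysuse auto, clear

* Quartiles come back in r(p25) and r(p75) after summarize, detail
quietly summarize mpg, detail
local q1  = r(p25)
local q3  = r(p75)
local iqr = `q3' - `q1'
local lo  = `q1' - 1.5*`iqr'
local hi  = `q3' + 1.5*`iqr'
display "Q1 = `q1'   Q3 = `q3'   IQR = `iqr'"
display "fences: `lo' and `hi'"

* Observations outside the fences are the ones the box plot flags
list make mpg if (mpg < `lo' | mpg > `hi') & !missing(mpg)
graph box mpg
```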
Standard Deviation
John’s parents recorded his height at various ages between 36 and 66 months. Below is a record of the results.

Age (months)    Height (inches)
36              34
54              41
66              45

Calculate the standard deviation of John’s age. Show your work in the table below.
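Standard Deviation of x

$$ s_x = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n-1}} $$

Values of a    Deviations of a from the mean of a    Squared deviations of a
__________     __________                            __________
__________     __________                            __________
__________     __________                            __________

sum of squared deviations of a =
Mean of a =
# of obs (n) =
n - 1 =

$$ s_a^2 = \frac{\sum (a_i - \bar{a})^2}{n-1} = $$

$$ s_a = \sqrt{\frac{\sum (a_i - \bar{a})^2}{n-1}} = $$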
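The same steps can be traced in Stata. The sketch below works through John's height rather than his age, so the age worksheet is still yours to complete. The variable names mean_h, dev, dev2, and ss are made up for this illustration.

```stata
* John's data, typed in directly
clear
input age height
36 34
54 41
66 45
end

* Mean, deviations from the mean, and squared deviations
egen mean_h = mean(height)
generate dev = height - mean_h
generate dev2 = dev^2
list height dev dev2

* Sum of squared deviations divided by n - 1, then the square root
egen ss = total(dev2)
display "variance = " ss[1]/(_N - 1)
display "sd       = " sqrt(ss[1]/(_N - 1))

* Stata's built-in answer, for comparison
summarize height
```

The sd reported by summarize should match the one computed by hand, because summarize also divides by n - 1.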
• SAT_AVG.dta
o describe
o br
o tabstat svavg smavg grad, stat(mean median)
o tabstat svavg smavg grad, stat(min q max)
o hist svavg
o hist smavg
o tabstat svavg smavg grad, stat(sd var)
o tabstat svavg smavg grad, stat(mean min q max sd n)
• Bank_worker_earnings.dta
o describe
o br
o hist a
o gr box a, over(w)
o by worker: tabstat annualearnings, stat(sd)
o bysort worker: tabstat annualearnings, stat(mean sd)
Recognizing Outliers
• Resistance to outliers
o clear all
o set obs 100
o generate hprice=uniform()*200000+200000
o generate hprice2=hprice
o Open the Data Editor (type ed). Scroll down to the bottom row. Replace the last observation of hprice2 with one million: 1000000.
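The Data Editor step can also be done by command, which makes the exercise repeatable. A minimal sketch, assuming an arbitrary seed (not part of the original exercise) and using runiform(), the current name for uniform():

```stata
clear all
set seed 2024          // arbitrary seed (an assumption) so the draws repeat
set obs 100
generate hprice = runiform()*200000 + 200000
generate hprice2 = hprice

* Same edit as in the Data Editor: overwrite the last row with one million
replace hprice2 = 1000000 if _n == _N

* Compare how far the mean and the median move between the two variables
tabstat hprice hprice2, stat(mean median max)
```

Only one of the 100 observations differs between hprice and hprice2, so any gap between the two columns of output comes from that single value.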
First Exercise
o hist hprice, width(10000) freq
o graph rename hprice
o hist hprice2, width(10000) freq
o graph rename hprice2
o graph combine hprice hprice2
o graph combine hprice hprice2, rows(2)
o graph combine hprice hprice2, xcommon
Second Exercise
o tabstat hprice hprice2, stat(mean sd min q max)
o graph box hprice2
Third Exercise
o Go to www.realtor.org
▪ Click on Research, then Housing Statistics, then State Existing-Home Sales. Then scroll down to find “State Existing-Home Sales”.
▪ Download and open the Excel file.
▪ Copy the State numbers for the last quarter available, that is, from cells I8 to I58.
▪ Open the Data Editor in Stata. Paste the numbers.
▪ Also copy and paste the names of the states, cells A8 to A58.
o hist var1
▪ What is the shape of the distribution?
▪ Which states are the outliers? What explains their being outliers? What would be a better measure of a “surprising” number of home sales?
• It’s easiest to find this by asking the Browser to display only observations whose values exceed or are below some number, as in br if var1>400
Inputting Data
• Go to www.realtor.org
▪ Click on Research, then Housing Statistics, then State Existing-Home Sales. Then scroll down to find “State Existing-Home Sales”.
▪ Download and open the Excel file.
▪ Select the last column (column J).
▪ Go to the Home tab, Number Area, and click on Comma Style.
o In Office 2003, go to Format Menu | Cells… . In the “Number” tab, select “General”.
o This causes the “percent” signs to disappear. If we kept the percent signs, Stata would think that our numbers are letters.
▪ In column A, select the names of the states (starting with Alabama and ending with Wyoming). Copy this.
▪ Switch to Stata. Type “clear” in the command window.
▪ Open the Data Editor. Paste the state names in the first column.
▪ Go back to the Excel sheet and copy the numbers in the last column that correspond to the states. Paste them in the second column of the Data Editor in Stata.
▪ rename var1 state
o What does this do?
▪ rename var2 change
▪ hist change
▪ dotplot change
▪ dotplot change, mlab(state)
• Search within FRED for RSAFS.
▪ Click on “View Data”.
▪ Copy All. Paste onto an Excel sheet.
▪ Select the cells with the data (from A13 down). Go to the Data Menu, select “Text to Columns …”, check “Delimited” and then hit “Next”. Check “space” and then hit “Finish”.
▪ Copy the data with the RSAFS numbers and paste it into the Data Editor (to open the editor, type ed into the command window).
▪ gen date = _n
▪ format date %tm
▪ br
▪ replace date = date + 383
▪ tsset date
▪ rename var1 rsafs
▪ tsline rsafs
Questions about the commands above:
o What’s wrong with the date?
o Why does this work?
o What does this do?
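A hint for the date questions: Stata stores a monthly date as the number of months since January 1960, so 1960m1 is 0, and the %tm format only changes how that count is displayed. The sketch below shows the idea on a hypothetical series of 12 monthly observations starting in January 2000 (the length and the start date are made up for illustration).

```stata
clear
set obs 12
generate date = _n                 // plain counter: 1, 2, ..., 12
format date %tm
list date in 1/3                   // displayed as 1960m2, 1960m3, 1960m4

* tm(2000m1) is the month count that Stata uses for January 2000
display tm(2000m1)

* Shift the counter so that observation 1 lands on January 2000
replace date = date + tm(2000m1) - 1
list date in 1/3                   // displayed as 2000m1, 2000m2, 2000m3
tsset date
```

Typing display tm() with the first month of your own series gives the number to compare against the shift used in the exercise.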
The Normal Distributions
• use http://amuchemlab.avemaria.edu/~martinez/ECON303/state_unemp.dta, clear
o hist percent
o hist percent, width(0.5)
o hist percent, width(0.5) kdensity
o hist percent if p<8, width(0.5) kdensity
o qnorm p
o qnorm p if p<8
• qnorm plots the quantiles of varname against the quantiles of a Normally distributed variable. If varname were Normally distributed, the points in the qnorm plot would lie along the straight diagonal line, and the histogram would follow the outline of a Normal density curve.
• growth_EE.dta
o hist growth, width(1) kden
o hist g if region=="EA", width(1) kdensity
o hist g if r=="EA", width(1) kden kdenopts(w(.5))
o qnorm g
o qnorm growth if growth>0
If you get PCECC96 (which is Real Personal Consumption Expenditures)
from FRED, and you download that series’ Percentage change from a year
ago, you get a variable with a nearly Normal distribution.
[Figure: Normal quantile plot of pcecc96 (vertical axis) against Inverse Normal (horizontal axis)]
. tabstat pcecc96, s(mean sd)

    variable |      mean        sd
-------------+--------------------
     pcecc96 |  3.511934  1.975904
----------------------------------

• Complete this table
mean - 3*sd    __________
mean - 2*sd    -0.439874
mean - sd      __________
mean           3.511934
mean + sd      5.487838
mean + 2*sd    __________
mean + 3*sd    9.439646

What % of the observations lies within
3 standard deviations of the mean? _____________
2 standard deviations of the mean? _____________
1 standard deviation of the mean? _____________
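Once the mean and sd are stored, these percentages can be counted directly. The sketch below runs on simulated Normal draws with the mean and sd reported above, which is an assumption made so the example is self-contained (the 10,000 draws and the seed are made up). With the real series, skip the first four commands and use pcecc96 in place of x.

```stata
* Simulated stand-in for the real series (an assumption)
clear
set seed 12345
set obs 10000
generate x = rnormal(3.511934, 1.975904)

* Store the sample mean and sd, then count observations within k sd
quietly summarize x
local m = r(mean)
local s = r(sd)
forvalues k = 1/3 {
    quietly count if abs(x - `m') <= `k'*`s'
    display "within `k' sd of the mean: " %5.1f 100*r(N)/_N " percent"
}
```

The results for the actual data may differ from the simulated ones to the extent that the series is not exactly Normal.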
[Figure: density plot of pcecc96]
o If the % change of PCECC96 were truly Normally distributed, the Kernel Density estimate (the smoothed-out distribution) would overlap the Normal curve with the same mean and standard deviation.
▪ kdensity pcecc96, normal normopts(lwidth(medium)) xline(-2.415778 -0.439874 1.53603 3.511934 5.487838 7.463742 9.439646)
[Figure: Kernel density estimate of pcecc96 with the Normal density overlaid; kernel = epanechnikov, bandwidth = 0.5560]
o If the distribution were perfectly symmetric (so mean=median),
▪ what % of the distribution would fall below the mean? _____
▪ what % of the distribution would fall above the mean? _____
▪ what % of the distribution would lie below the curve? _____
[Figure: Kernel density estimate of pcecc96 with the Normal density overlaid]
Standardized value of x:

$$ z = \frac{x_i - \bar{x}}{s_x} $$

With this information

mean    3.511934
sd      1.975904

Complete this table

Observation    Value        Standardized Value
x1             -2.415778    __________
x2             -0.439874    __________
x3             1.53603      __________
x4             3.511934     __________
x5             5.487838     __________
x6             7.463742     __________
x7             9.439646     __________
x8             5.8830188    __________

To say that an observation Y has a standardized value of -1.2 (that is, to say that Z = -1.2) means that it lies 1.2 standard deviations below the mean.
o That means that 11.51% of the observations are smaller than Y,
o and 88.49% of the observations are larger than Y.

[Figure: Kernel density estimate with the Normal density overlaid]
Similarly, we find in TABLE A (inside the front cover of your book) that if an observation W has a standardized value of 0.5 (Z = 0.5), it lies 0.5 standard deviations above the mean.
o That means that ___________% of the observations are smaller than W,
o and ___________% of the observations are larger than W.

[Figure: Normal density curve]
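Table A lookups can be checked in Stata with normal(z), which returns the proportion of a standard Normal distribution lying below z. The brief sketch below covers the Z = -1.2 example, then standardizes a hypothetical observation of 2 (a number chosen only for illustration) using the mean and sd reported earlier.

```stata
* Proportion below and above Z = -1.2
display %6.4f normal(-1.2)        // 0.1151
display %6.4f 1 - normal(-1.2)    // 0.8849

* Standardized value of a hypothetical observation x = 2
scalar z = (2 - 3.511934)/1.975904
display "z = " %6.3f z
display "share below: " %6.4f normal(z)
```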
Problems 1.75, 1.77, 1.79
• use http://amu-chemlab.avemaria.edu/~martinez/ECON303/mpg2.dta, clear
o hist mpg
o tabstat mpg, stat(min q max mean sd n)
o hist mpg if mpg<32
o tabstat mpg if mpg<32, stat(min q max mean sd n)
o tabstat mpg if mpg>=32, stat(min q max mean sd n)
o qnorm mpg
o qnorm mpg if mpg<32
• Normal Curve Statistical Applet on PBS 1e.
o http://bcs.whfreeman.com/pbs/
o Statistical Applets | Normal Curve