Chapter 1 Exercises

Bar Charts and Pie Charts

In PowerPoint,
   o Insert a “Title and Content” slide. Click on the chart icon. Select Bar (or Pie) Chart.
   o Paste (or input) the data in the table. Resize the chart data range to fit your data.
   o Reformat to make it look interesting and intelligent.
In Excel,
   o Select the data you want to make into a chart.
   o In the Insert tab, click on Chart. Select Bar (or Pie) Chart.
   o Reformat to make it look interesting and intelligent.

Histograms

Use file: MPG.dta
   o Type all of these words, exactly as they appear:
     use http://amu-chemlab.avemaria.edu/~martinez/ECON303/mpg, clear
   o br
   o histogram mpg
     Pattern? Deviations? Shape? Center? Spread? Symmetry? Outliers?

Use arch_firms.dta
   o use http://amu-chemlab.avemaria.edu/~martinez/ECON303/arch_firms, clear
   o br
   o hist staff
   o hist staff, width(10)
     Pattern? Deviations? Shape? Center? Spread? Symmetry? Outliers?

All of the data we’ll use for the class are at http://amu-chemlab.avemaria.edu/~martinez/ECON303/index.htm

Google “FRED” or go to http://research.stlouisfed.org/fred2/
   o Follow these links: Business/Fiscal > Household Sector > Series: HOUST > Download Data
   o Select Text, Comma Delimited as the File Format. Select “Excel” if you have Excel on your computer.
   o Download and open the file.
   o Copy the Date and Value numbers.
   o Open the Data Editor in Stata (type clear and then ed in the command window). Paste the numbers.
   o hist value
     Pattern? Deviations? Shape? Center? Spread? Symmetry? Outliers?

Time plots

Yield_Tbill
   o use http://amu-chemlab.avemaria.edu/~martinez/ECON303/yield_tbill, clear
   o line rate year
   o tsset year
   o tsline rate
   o tsline rate in 11/21

Google “FRED” or go to http://research.stlouisfed.org/fred2/
   o Run a search within FRED for RSAFSNA.
        What is RSAFSNA?
        Does RSAFSNA have a trend? What other patterns can you distinguish? What do you think explains the patterns?
        Search for RSAFS. How is it different from RSAFSNA?
        Click on Download Data.
        Select Text, Comma Delimited as the File Format.
           o Select “Excel” if you have Excel on your computer.
        Download and open the file.
        Copy the Value column.
        Open the Data Editor in Stata (type clear and then ed in the command window). Paste the numbers.
        gen obs = _n
        line value obs || lfit value obs
        Trend? Seasonal Variation?
   o Run a search within FRED for DTB3 (which is the “3-Month Treasury Bill: Secondary Market Rate”).
        What does the graph tell you?
   o Follow these links: Business/Fiscal > Industrial Production > Series: INDPRO, Industrial Production Index
        Does INDPRO have a trend?
        Notice it says that INDPRO is seasonally adjusted. What does that mean?
        What do the shaded areas represent? What happens to INDPRO in or around those shaded areas?
   o Run a search within FRED for “unemployment rate”. Click on “civilian unemployment rate”.
        Trend? Cycle?

Mean and Median

Stata file: MPG
   o use http://amu-chemlab.avemaria.edu/~martinez/ECON303/mpg, clear
   o describe
   o summarize mpg
   o hist mpg

Bank_worker_earnings.dta
   o describe
   o br
   o hist a
   o tabstat a, s(mean median)
   o by worker: tabstat annualearnings, stat(mean)
   o bysort worker: tabstat annualearnings, stat(mean)

amount_spent.dta
   o describe
   o br
   o tabstat amountspent, stat(mean)
   o hist a
   o tabstat amountspent, stat(mean median)
   o tabstat am if am<50, stat(mean median)
   o hist am if am<50

Quartiles, Box Plots

growth.dta
   o describe
   o br
   o hist g, width(1)
   o tabstat g, stat(mean median)
   o tabstat g if g<8, stat(mean median)
   o tabstat g if g<8, stat(min q max)

SAT_AVG.dta
   o describe
   o br
   o summarize svavg smavg grad
   o hist svavg
   o hist smavg
   o tabstat svavg smavg grad, stat(mean min q max n)
   o graph box svavg smavg

use http://amu-chemlab.avemaria.edu/~martinez/ECON303/growth_EE, clear
   o describe
   o br
   o bysort region: tabstat g, stat(min q max)
   o gr box g
   o gr box g, over(r)
     Outliers? Which country is the outlier?

Standard Deviation

John’s parents recorded his height at various ages between 36 and 66 months.
Below is a record of the results.

   Age (months)    Height (inches)
        36               34
        54               41
        66               45

Calculate the standard deviation of John’s age. Show your work on the table on the next page.

Standard Deviation of x:  $s_x = \sqrt{\dfrac{\sum (x_i - \bar{x})^2}{n-1}}$

   Values of a    Deviations of a from the mean of a    Squared deviations of a
   ___________    __________________________________    _______________________
   ___________    __________________________________    _______________________
   ___________    __________________________________    _______________________

   Mean of a =
   # of obs (n) =
   n - 1 =
   Sum of squared deviations of a =
   $s_a^2 = \dfrac{\sum (a_i - \bar{a})^2}{n-1}$ =
   $s_a = \sqrt{\dfrac{\sum (a_i - \bar{a})^2}{n-1}}$ =

SAT_AVG.dta
   o describe
   o br
   o hist svavg
   o hist smavg
   o tabstat svavg smavg grad, stat(mean median)
   o tabstat svavg smavg grad, stat(min q max)
   o tabstat svavg smavg grad, stat(sd var)
   o tabstat svavg smavg grad, stat(mean min q max sd n)

Bank_worker_earnings.dta
   o describe
   o br
   o hist a
   o gr box a, over(w)
   o by worker: tabstat annualearnings, stat(sd)
   o bysort worker: tabstat annualearnings, stat(mean sd)

Recognizing Outliers

Resistance to outliers
   o clear all
   o set obs 100
   o generate hprice=uniform()*200000+200000
   o generate hprice2=hprice
   o Open the Data Editor (type ed). Scroll down to the bottom row. Replace the last observation of hprice2 with one million: 1000000.

First Exercise
   o hist hprice, width(10000) freq
   o graph rename hprice
   o hist hprice2, width(10000) freq
   o graph rename hprice2
   o graph combine hprice hprice2
   o graph combine hprice hprice2, rows(2)
   o graph combine hprice hprice2, xcommon

Second Exercise
   o tabstat hprice hprice2, stat(mean sd min q max)
   o graph box hprice2

Third Exercise
   o Go to www.realtor.org, click on Research, then Housing Statistics, then State Existing-Home Sales. Then scroll down to find “State Existing-Home Sales”.
   o Download and open the Excel file.
   o Copy the State numbers for the last quarter available, that is, from cells I8 to I58.
   o Open the Data Editor in Stata. Paste the numbers.
   o Also copy and paste the names of the states, cells A8 to A58.
   o hist var1
        What is the shape of the distribution?
        Which states are the outliers?
        What explains their being outliers?
        What would be a better measure of a “surprising” number of home sales?
        It’s easiest to find the outlier states by asking the Browser to display only observations whose values exceed or are below some number, as in
        br if var1>400

Inputting Data

   o Go to www.realtor.org, click on Research, then Housing Statistics, then State Existing-Home Sales. Then scroll down to find “State Existing-Home Sales”.
   o Download and open the Excel file.
   o Select the last column (column J). Go to the Home tab, Number Area, and click on Comma Style.
        In Office 2003, go to Format Menu | Cells… . In the “Number” tab, select “General”.
        This causes the “percent” signs to disappear. If we kept the percent signs, Stata would think that our numbers are letters.
   o On column A, select the names of the states (starting with Alabama and ending with Wyoming). Copy this.
   o Switch to Stata. Type “clear” in the command window. Open the Data Editor. Paste the state names on the first column.
   o Go back to the Excel sheet and copy the numbers on the last column that correspond to the states. Paste them on the second column of the Data Editor in Stata.
   o rename var1 state          What does this do?
   o rename var2 change
   o hist change
   o dotplot change
   o dotplot change, mlab(state)

   o Search within FRED for RSAFS. Click on “View Data”.
   o Copy All. Paste onto an Excel sheet.
   o Select the cells with the data (from A13 down). Go to the Data Menu, select “Text to Columns …”, check “Delimited” and then hit “Next”. Check “space” and then hit “Finish”.
   o Copy the data with the RSAFS numbers and paste it into the Data Editor (to open the editor, type ed into the command window).
   o gen date = _n
   o format date %tm
   o br                          What’s wrong with the date?
   o replace date = date + 383   Why does this work?
   o tsset date                  What does this do?
   o rename var1 rsafs
   o tsline rsafs
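A hint for the date question above, as a small sketch outside Stata. Stata stores monthly (%tm) dates as the number of months since January 1960, with 0 meaning 1960m1. The Python helper below (tm_to_year_month is a made-up name for illustration, not a Stata command) shows what the first observation looks like before and after adding 383. It assumes that the pasted RSAFS series begins in January 1992, which the handout does not state.

```python
def tm_to_year_month(tm):
    """Convert a Stata %tm value (months since 1960m1, where 0 = 1960m1)
    into a (year, month) pair."""
    years_after_1960, month_index = divmod(tm, 12)
    return 1960 + years_after_1960, month_index + 1


# "gen date = _n" numbers the observations 1, 2, 3, ...
first_obs = 1

# Before the shift, observation 1 is read as the second month of 1960.
print(tm_to_year_month(first_obs))        # (1960, 2)

# After "replace date = date + 383", observation 1 becomes month 384.
print(tm_to_year_month(first_obs + 383))  # (1992, 1)
```

If the series you download starts in a different month, the number to add changes accordingly.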
The Normal Distributions

use http://amu-chemlab.avemaria.edu/~martinez/ECON303/state_unemp.dta, clear
   o hist percent
   o hist percent, width(0.5)
   o hist percent, width(0.5) kdensity
   o hist percent if p<8, width(0.5) kdensity
   o qnorm p
   o qnorm p if p<8
qnorm plots the quantiles of varname against the quantiles of a Normally distributed variable.
If varname were Normally distributed, the histogram would follow the outline of the Normal kernel density plot.

growth_EE.dta
   o hist growth, width(1) kden
   o hist g if region=="EA", width(1) kdensity
   o hist g if r=="EA", width(1) kden kdenopts(w(.5))
   o qnorm g
   o qnorm growth if growth>0

If you get PCECC96 (which is Real Personal Consumption Expenditures) from FRED, and you download that series’ Percentage change from a year ago, you get a variable with a nearly Normal distribution.

[Figure: Normal quantile plot of pcecc96 against Inverse Normal]

   . tabstat pcecc96, s(mean sd)

       variable |      mean        sd
   -------------+--------------------
        pcecc96 |  3.511934  1.975904
   ----------------------------------

Complete this table

   mean - 3*sd    __________
   mean - 2*sd    -0.439874
   mean - sd      __________
   mean           3.511934
   mean + sd      5.487838
   mean + 2*sd    __________
   mean + 3*sd    9.439646

What % of the observations lies within
   3 standard deviations of the mean? _____________
   2 standard deviations of the mean? _____________
   1 standard deviation of the mean?  _____________

[Figure: histogram of pcecc96 (Density against pcecc96)]

   o If the % change of PCECC96 were truly Normally distributed, the Kernel Density estimate (the smoothed out distribution) would overlap the Normal curve with the same mean and standard deviation.
     kdensity pcecc96, normal normopts(lwidth(medium)) xline(-2.415778 -0.439874 1.53603 3.511934 5.487838 7.463742 9.439646)

[Figure: Kernel density estimate of pcecc96 with Normal density overlaid; kernel = epanechnikov, bandwidth = 0.5560]

   o If the distribution were perfectly symmetric (so mean=median),
     what % of the distribution would fall below the mean?
     _____
     what % of the distribution would fall above the mean? _____
     what % of the distribution would lie below the curve? _____

[Figure: Kernel density estimate of pcecc96 with Normal density overlaid]

Standardized value of x:  $z = \dfrac{x_i - \bar{x}}{s_x}$

With this information, complete this table.

   mean = 3.511934     sd = 1.975904

   Observation    Value        Standardized Value
   x1             -2.415778    __________
   x2             -0.439874    __________
   x3             1.53603      __________
   x4             3.511934     __________
   x5             5.487838     __________
   x6             7.463742     __________
   x7             9.439646     __________
   x8             5.8830188    __________

To say that an observation Y has a standardized value of -1.2 (that is, to say that Z = -1.2) means that it lies 1.2 standard deviations below the mean.
   o That means that 11.51% of the observations are smaller than Y,
   o and 88.49% of the observations are larger than Y.

[Figure: Kernel density estimate of pcecc96 with Normal density overlaid]

Similarly, we find in TABLE A (inside the front cover of your book) that if an observation W has a standardized value of 0.5 (Z = 0.5), it lies 0.5 standard deviations above the mean.
   o That means that ___________% of the observations are smaller than W,
   o and ___________% of the observations are larger than W.

[Figure: Kernel density estimate of pcecc96 with Normal density overlaid]

Problems 1.75, 1.77, 1.79

use http://amu-chemlab.avemaria.edu/~martinez/ECON303/mpg2.dta, clear
   o hist mpg
   o tabstat mpg, stat(min q max mean sd n)
   o hist mpg if mpg<32
   o tabstat mpg if mpg<32, stat(min q max mean sd n)
   o tabstat mpg if mpg>=32, stat(min q max mean sd n)
   o qnorm mpg
   o qnorm mpg if mpg<32

Normal Curve Statistical Applet on PBS 1e.
   o http://bcs.whfreeman.com/pbs/
   o Statistical Applets | Normal Curve
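If you would like to check a standardized value or a TABLE A lookup without the book or the applet, the arithmetic can be sketched in a few lines of Python. This is only an illustration, not part of the Stata exercises: the function names and the example observation y are made up here, and y was chosen so that it sits 1.2 standard deviations below the pcecc96 mean, matching the Z = -1.2 example above.

```python
import math


def standardize(x, mean, sd):
    """Standardized value: z = (x - mean) / sd."""
    return (x - mean) / sd


def normal_cdf(z):
    """Proportion of a standard Normal distribution lying below z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Mean and sd of pcecc96, taken from the tabstat output in this handout.
MEAN = 3.511934
SD = 1.975904

# Hypothetical observation, picked to lie 1.2 sd below the mean.
y = 1.1408492

z = standardize(y, MEAN, SD)
below = normal_cdf(z)

print(round(z, 4))                  # -1.2
print(round(100 * below, 2))        # 11.51 (% of observations smaller than y)
print(round(100 * (1 - below), 2))  # 88.49 (% of observations larger than y)
```

The same two functions can be used to check your own answers for the table of standardized values and for the Z = 0.5 question.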