Midterm Review

Econ 240A
Descriptive Statistics
Probability
Inference
Differences between populations
Regression

I. Descriptive Statistics
Telling stories with tables and graphs that are self-explanatory and esthetically appealing
Exploratory data analysis for random variables that are not normally distributed
Stem and leaf diagrams
Box and whisker plots

Stem and Leaf Diagram
Example: Problem 2.24. Prices, in thousands of $, of houses sold in a Los Angeles suburb in a given year.

Problem 2.24: Subsample of prices (thousands of $)
289 208 255 215 270 222 206 221 210 224 209 250 222 213 220 250 209

Problem 2.24: Sorted data (first 17 values, thousands of $)
192 195 198 200 202 205 206 206 208 208 209 209 209 209 209 210 211

Problem 2.24: Summary Statistics, prices in thousands of $
Mean                 237.9882
Standard Error       3.314365
Median               230
Mode                 222
Standard Deviation   30.55693
Sample Variance      933.7261
Kurtosis             1.620493
Skewness             1.164885
Range                149
Minimum              192
Maximum              341
Sum                  20229
Count                85

Problem 2.24: Stem & Leaf Display, prices in thousands of $
Stem   Leaves
19     258
20     025668899999
21     01233457789
22     0012222223346699
23     00336
24     012244467788
25     00002255
26     0569
27     00235689
28     69
29 to 33: one of these stems carries the leaves 68; the others are empty (the stem is not identifiable from the text)
34     01

Box and Whiskers Plots
Example: Problem 4.30. Starting salaries by degree.

Problem 4.50: Subsample of starting salaries by degree
BA      BSc     BBA     Other
26819   28930   38968   34550
25797   36602   35187   30245
29115   35098   29452   31520
32877   36793   30943   26680
30015   36171   31610   29047
25090   28396   39738   35037
23163   26204   37444   26550
28225   37280   38403   35704
25103   37660   36459   32262
29742   24539   37963   34206
24587   27222   34138   26917
20780   39536   42062   26723
30353   32653   32700   36297

Box plot summaries by degree (each box plot is drawn on a salary axis from 0 to 50000)
Degree   Smallest   Q1        Median    Q3         Largest   IQR       Outliers
BA       18719      25730     27765     29835.5    37025     4105.5    37025, 36345, 18719
BSc      23451      29927     33396.5   36745.25   40105     6818.25   none
BBA      23401      31316     34284     39551      47639     8235      none
Other    21994      28253.5   29950.5   32905.25   38812     4651.75   none

II. Probability
Concepts
Elementary outcomes
Bernoulli trials
Random experiments
Events

Probability (Cont.)
Rules or axioms:
Addition rule: \(P(A \cup B) = P(A) + P(B) - P(A \cap B)\)
Conditional probability: \(P(A \mid B) = P(A \cap B)/P(B)\)
Independence: \(P(A)\,P(B) = P(A \cap B)\), so \(P(A \mid B) = P(A)\)

Probability (Cont.)
Discrete Binomial Distribution
\[ P(k) = C_n(k)\, p^k (1-p)^{n-k} \]
n repeated independent Bernoulli trials
k successes and n-k failures

Binomial Random Number Generator
Take 50 states.
Suppose each state was a battleground state, with probability 0.5 of winning that state.
What would the distribution of states look like?
How few could you win?
How many could you win?
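The battleground-state experiment can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the number of simulated elections, and the seed are hypothetical choices, not part of the course materials. It evaluates the binomial formula P(k) for n = 50 and p = 0.5 and draws simulated counts of states won.

```python
import math
import random

N_STATES = 50   # number of Bernoulli trials (states), from the slide
P_WIN = 0.5     # probability of winning each state, from the slide


def binomial_pmf(k, n=N_STATES, p=P_WIN):
    """P(k) = C_n(k) * p^k * (1-p)^(n-k)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def simulate_states_won(n_elections, n=N_STATES, p=P_WIN, seed=0):
    """Return the number of states won in each simulated election.

    The seed is an arbitrary (hypothetical) choice made only so that
    repeated runs give the same draws.
    """
    rng = random.Random(seed)
    return [sum(rng.random() < p for _ in range(n)) for _ in range(n_elections)]


if __name__ == "__main__":
    draws = simulate_states_won(100)
    print("fewest states won:", min(draws))
    print("most states won:  ", max(draws))
    print("P(exactly 25 states):", round(binomial_pmf(25), 4))
```

Although any count from 0 to 50 is possible, the simulated counts cluster around 25, which is what the histogram and density slides illustrate.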
Subsample (states won)
24 24 28 25 18 29 25 24 24 23 25 24 29

[Figure: Histogram of States Won. Horizontal axis: bin (18 to 36); vertical axis: frequency.]

[Figure: Discrete Probability Density, p=0.5. Horizontal axis: States Won; vertical axis: Probability.]

[Figure: Discrete Cumulative Distribution, p=0.5. Horizontal axis: States Won; vertical axis: Probability.]

[Figure: Discrete Cumulative Distribution, p=0.5 and p=0.48. Horizontal axis: States Won; vertical axis: Probability.]

Probability (Cont.)
Continuous normal distribution as an approximation to the binomial
\(np > 5\), \(n(1-p) > 5\)
\[ f(z) = (1/2\pi)^{1/2} \exp[-\tfrac{1}{2} z^2], \qquad z = (x - \mu)/\sigma \]
\[ f(x) = (1/\sigma)(1/2\pi)^{1/2} \exp[-\tfrac{1}{2}\{(x - \mu)/\sigma\}^2] \]

III. Inference
Rates and Proportions
Population Means and Sample Means
Population Variances and Sample Variances
Decision Theory

Decision Theory
In inference, i.e. hypothesis testing and confidence interval estimation, we can make mistakes because we are making guesses about unknown parameters.
The objective is to minimize the expected cost of making errors:
\[ E(C) = \alpha\, C(I) + \beta\, C(II) \]
where \(\alpha\) and \(\beta\) are the probabilities of Type I and Type II errors and \(C(I)\), \(C(II)\) are their costs.

Sample Proportions from Polls
\[ \hat{p} = k/n \]
where n is the sample size and k is the number of successes
\(k \sim B(np,\ np(1-p))\), i.e. binomial with mean \(np\) and variance \(np(1-p)\)

Sample Proportions
\[ E\hat{p} = (1/n)\,Ek = (1/n)\,np = p \]
\[ VAR\,\hat{p} = (1/n^2)\,Var\,k = (1/n^2)\,np(1-p) = p(1-p)/n \]
\[ \hat{p} \sim N(p,\ (1/n)\,p(1-p)) \]
So the estimated p-hat is approximately normal where the sample size is large.

Problem 9.38
A commercial for a household appliances manufacturer claims that less than 5% of all of its products require a service call in the first year. A consumer protection association wants to check the claim by surveying 400 households that recently purchased one of the company's appliances.

Problem 9.38 (Cont.)
What is the probability that more than 10% require a service call in the first year?
What would you say about the commercial's honesty if, in a random sample of 400 households, 10% report at least one service call?
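A rough numerical sketch of the first question in Problem 9.38, assuming the claimed rate p = 0.05 is the true rate and using the large-sample normal distribution for p-hat. The helper names are hypothetical, and the exact binomial tail is included only as a cross-check; it is not part of the original problem.

```python
import math

n = 400        # households surveyed (from the problem)
p0 = 0.05      # claimed service-call rate, taken as the true rate
p_hat = 0.10   # sample proportion asked about in the problem


def normal_upper_tail(z):
    """P(Z > z) for a standard normal variate."""
    return 0.5 * math.erfc(z / math.sqrt(2))


def binomial_upper_tail(k_min, n, p):
    """Exact P(K >= k_min) for K ~ Binomial(n, p)."""
    return sum(math.comb(n, k) * p**k * (1 - p) ** (n - k)
               for k in range(k_min, n + 1))


# Standard deviation of p-hat when p = p0: sqrt(p(1-p)/n)
se = math.sqrt(p0 * (1 - p0) / n)
z = (p_hat - p0) / se

tail_normal = normal_upper_tail(z)
# "More than 10%" of 400 households means at least 41 households.
tail_exact = binomial_upper_tail(41, n, p0)

if __name__ == "__main__":
    print(f"z = {z:.2f}")
    print(f"normal approximation tail: {tail_normal:.2e}")
    print(f"exact binomial tail:       {tail_exact:.2e}")
```

Both tail probabilities are tiny, so observing 10% service calls in a sample of 400 would be very hard to reconcile with a true rate of 5%.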
Problem 9.38 Answer
Null Hypothesis: \(H_0: p = 0.05\)
Alternative Hypothesis: \(H_A: p > 0.05\)
Statistic:
\[ z = (\hat{p} - E\hat{p})/\sigma_{\hat{p}} = (\hat{p} - p)/\sqrt{(1/n)\,p(1-p)} \]
\[ z = (0.10 - 0.05)/\sqrt{(1/400)(0.05)(0.95)} = 4.59 \]

[Figure: Continuous Density of the Standardized Normal Variate, Z. The 5% critical value is z = 1.645; the test statistic z = 4.59 lies far beyond it.]

Sample means and population means where the population variance is known

Problem 9.26, Sample Means
The dean of a business school claims that the average MBA graduate is offered a starting salary of $55,000. The standard deviation of the offers is $4600. What is the probability that in a sample of 38 MBA graduates, the mean starting salary is less than $53,000?

Problem 9.26 (Cont.)
Null Hypothesis: \(H_0: \mu = 55{,}000\)
Alternative Hypothesis: \(H_A: \mu < 55{,}000\)
Statistic:
\[ z = (\bar{x} - E\bar{x})/\sigma_{\bar{x}} = (\bar{x} - \mu)/(\sigma/\sqrt{n}) \]
\[ z = (53000 - 55000)/(4600/\sqrt{38}) = -2000/746.2 = -2.68 \]

[Figure: Continuous Density of the Standardized Normal Variate, Z. The 1% critical value is z = -2.33; the area to the left of z = -2.68 is 0.0037 (0.37%).]

Sample means and population means when the population variance is unknown

Problem 12.33
A federal agency responsible for enforcing laws governing weights and measures routinely inspects packages to determine whether the weight of the contents is at least as great as that advertised on the package. A random sample of 18 containers whose packaging states that the contents weigh 8 ounces was drawn.

Problem 12.33 (Cont.)
Can we conclude that on average the containers are mislabeled? Use \(\alpha = 0.1\).
\[ t = (\bar{x} - E\bar{x})/s_{\bar{x}} = (\bar{x} - \mu)/(s/\sqrt{n}) \]

[Figure: Density Function for Student's t-distribution, 17 Degrees of Freedom. Critical values t = -1.74 and t = 1.74, with 5% in each tail.]

Problem 12.33 (Cont.): sample weights in ounces
7.8 7.97 7.92 7.91 7.95 7.87 7.93 7.79 7.92 7.99 8.06 7.98 7.94 7.82 8.05 7.75 7.89 7.91

Mean                 7.913888889
Standard Error       0.019969567
Median               7.92
Mode                 7.91
Standard Deviation   0.084723695
Sample Variance      0.007178105
Kurtosis             -0.24366084
Skewness             -0.22739254
Range                0.31
Minimum              7.75
Maximum              8.06
Sum                  142.45
Count                18

Problem 12.33 (Cont.)
\[ t = (7.914 - 8)/(0.0847/\sqrt{18}) = -0.086/0.020 = -4.3 \]
Since t = -4.3 lies below the critical value -1.74, the null hypothesis \(\mu = 8\) is rejected.

Confidence Intervals for Variances

Problems 12.33 & 12.55
For the same random sample of 18 containers, estimate with 95% confidence the variance in contents' weight.
A \(\chi^2\) variable with n-1 degrees of freedom is \((n-1)s^2/\sigma^2\).

[Figure: Chi Square Density for 17 Degrees of Freedom. Critical values 7.564 and 30.191, with 2.5% in each tail.]

Problems 12.33 & 12.55 (Cont.)
\[ 7.564 < (n-1)s^2/\sigma^2 < 30.191 \]
\[ 7.564 < 17 \times 0.00718/\sigma^2 < 30.191 \]
\[ (1/7.564) \times 17 \times 0.00718 > \sigma^2 > (1/30.191) \times 17 \times 0.00718 \]
\[ 0.0161 > \sigma^2 > 0.0040 \]

IV. Differences in Populations
Null Hypothesis: \(H_0: \mu_1 = \mu_2\), or \(\mu_1 - \mu_2 = 0\)
Alternative Hypothesis: \(H_A: \mu_1 - \mu_2 \neq 0\)
\[ t = [(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)]/s_{\bar{x}_1 - \bar{x}_2} \]
\[ Var[\bar{x}_1 - \bar{x}_2] = E[(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)]^2 \]
\[ Var[\bar{x}_1 - \bar{x}_2] = E[(\bar{x}_1 - \mu_1) - (\bar{x}_2 - \mu_2)]^2 \]
\[ Var[\bar{x}_1 - \bar{x}_2] = E[(\bar{x}_1 - \mu_1)^2 + (\bar{x}_2 - \mu_2)^2 - 2(\bar{x}_1 - \mu_1)(\bar{x}_2 - \mu_2)] \]
\[ Var[\bar{x}_1 - \bar{x}_2] = Var\,\bar{x}_1 + Var\,\bar{x}_2 - 2\,Cov(\bar{x}_1, \bar{x}_2) \]
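The variance-of-a-difference algebra holds for any two random variables, so it can be checked on sample moments. The sketch below uses two short made-up series; the numbers are purely hypothetical and are not course data. It confirms that the sample variance of the differences equals Var x1 + Var x2 - 2 Cov(x1, x2).

```python
from statistics import mean


def sample_cov(x, y):
    """Sample covariance with the n-1 divisor."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)


def sample_var(x):
    """Sample variance, written as the covariance of x with itself."""
    return sample_cov(x, x)


# Hypothetical paired observations, for illustration only.
x1 = [7.8, 8.1, 7.9, 8.4, 8.0, 7.7]
x2 = [7.5, 8.0, 8.2, 8.1, 7.6, 7.9]

diff = [a - b for a, b in zip(x1, x2)]

lhs = sample_var(diff)
rhs = sample_var(x1) + sample_var(x2) - 2 * sample_cov(x1, x2)

if __name__ == "__main__":
    print("Var[x1 - x2]            =", lhs)
    print("Var x1 + Var x2 - 2 Cov =", rhs)
```

When the two samples are drawn independently the covariance term has expectation zero, which is why it drops out of the denominator of the two-sample t statistic.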
Differences in Populations (Cont.)
\[ t = [(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)]/s_{\bar{x}_1 - \bar{x}_2} \]
\[ Var[\bar{x}_1 - \bar{x}_2] = Var\,\bar{x}_1 + Var\,\bar{x}_2 - 2\,Cov(\bar{x}_1, \bar{x}_2) \]
\[ t = [(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)]/\sqrt{Var\,\bar{x}_1 + Var\,\bar{x}_2} \]
\[ t = [(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)]/\sqrt{(s_1^2/n_1) + (s_2^2/n_2)} \]
Reference: Ch. 9 & Ch. 13

V. Regression
Model: \(y_i = a + b\,x_i + e_i\)
Fitted: \(\hat{y}_i = \hat{a} + \hat{b}\,x_i\)
Estimate: \(\hat{b} = \sum_{i=1}^{n} [y_i - \bar{y}][x_i - \bar{x}] \big/ \sum_{i=1}^{n} [x_i - \bar{x}]^2\)
Estimate: \(\hat{a} = \bar{y} - \hat{b}\,\bar{x}\)
Estimated error: \(\hat{e}_i = y_i - \hat{y}_i\)
Sum of squared residuals: \(\sum_{i=1}^{n} \hat{e}_i^2\)
ANOVA: Total sum of squares \(TSS = \sum_{i=1}^{n} [y_i - \bar{y}]^2\)
TSS = Explained Sum (ESS) + Unexplained Sum (USS)
\[ TSS = \hat{b}^2 \sum_{i=1}^{n} [x_i - \bar{x}]^2 + \sum_{i=1}^{n} \hat{e}_i^2 \]

Lab Five
[Figure: Fortune 500, 1999: Assets vs. Revenue, in Logs. Horizontal axis: Log Revenue (1000 to 1000000); vertical axis: Log Assets (1000 to 1000000). Labeled firms: Citigroup, Bank of America, Fannie Mae, Chase Manhattan, General Electric, Morgan Stanley, Prudential, Merrill Lynch, General Motors, TIAA-CREF, Bank One, American International, Exxon Mobil, State Farm, Allstate, Wal-Mart, Kroger, McKesson HBOC, Ingram Micro, Costco Wholesale.]

The Financials
rank   firm                             industry                            revenue M$   profits M$   assets M$
5      General Electric                 Diversified Financials              111630       10717        405200
7      Citigroup                        Diversified Financials              82005        9867         716900
11     Bank of America Corp.            Commercial Banks                    51392        7882         632574
26     Fannie Mae                       Diversified Financials              36968.6      3911.9       575167.4
31     Chase Manhattan Corp.            Commercial Banks                    33710        5446         406105
48     Prudential Ins. Co. of America   Insurance: Life, Health (stock)     26618        813          285094
50     Bank One Corp.                   Commercial Banks                    25986        3479         269425
30     Morgan Stanley Dean Witter       Securities                          33928        4791         366967
29     Merrill Lynch                    Securities                          34879        2618         328071
19     TIAA-CREF                        Insurance: Life, Health (mutual)    39410.2      1024.07      289247.99
17     American International Group     Insurance: P&C (stock)              40656.08     5055.44      268238

Excel Chart
[Figure: The Financials: Eleven Firms. Horizontal axis: ln Revenue M$ (10 to 11.8); vertical axis: ln Assets M$ (12.4 to 13.6). Fitted line: y = 0.4335x + 8.2535, \(R^2 = 0.3039\).]

Excel Regression
SUMMARY OUTPUT

Regression Statistics
Multiple R          0.5512779
R Square            0.3039073
Adjusted R Square   0.2265636
Standard Error      0.3117374
Observations        11

ANOVA
             df   SS            MS         F          Significance F
Regression   1    0.381851405   0.381851   3.929312   0.078773838
Residual     9    0.874622016   0.09718
Total        10   1.256473421

               Coefficients   Standard Error   t Stat     P-value    Lower 95%      Upper 95%
Intercept      8.2534951      2.33138973       3.540161   0.006313   2.979521108    13.52747
X Variable 1   0.4335105      0.218696259      1.982249   0.078774   -0.061215204   0.928236

Eviews Chart
[Figure: Eleven Financial Firms. Horizontal axis: LNSALES (10.0 to 12.0); vertical axis: LNASSETS (12.4 to 13.6).]

Eviews Regression
[Eviews regression output: not captured in the text.]

Eviews: Actual, Fitted & Residual
[Figure: Actual, fitted and residual series for the eleven observations.]
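The estimator formulas from the Regression section can be applied directly to the eleven financial firms. The sketch below is an illustration, not the lab's own procedure: the function name `ols` and the dictionary layout are hypothetical, while the revenue and assets figures are copied from The Financials table. It regresses ln assets on ln revenue and should come out close to the Excel slope, intercept, and R Square reported above.

```python
import math

# (firm, revenue M$, assets M$), copied from "The Financials" table
FIRMS = [
    ("General Electric", 111630, 405200),
    ("Citigroup", 82005, 716900),
    ("Bank of America Corp.", 51392, 632574),
    ("Fannie Mae", 36968.6, 575167.4),
    ("Chase Manhattan Corp.", 33710, 406105),
    ("Prudential Ins. Co. of America", 26618, 285094),
    ("Bank One Corp.", 25986, 269425),
    ("Morgan Stanley Dean Witter", 33928, 366967),
    ("Merrill Lynch", 34879, 328071),
    ("TIAA-CREF", 39410.2, 289247.99),
    ("American International Group", 40656.08, 268238),
]


def ols(x, y):
    """Bivariate least squares using the slide formulas."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((yi - y_bar) * (xi - x_bar) for xi, yi in zip(x, y))
    b_hat = sxy / sxx                      # slope estimate
    a_hat = y_bar - b_hat * x_bar          # intercept estimate
    resid = [yi - (a_hat + b_hat * xi) for xi, yi in zip(x, y)]
    uss = sum(e ** 2 for e in resid)       # unexplained sum of squares
    tss = sum((yi - y_bar) ** 2 for yi in y)
    ess = b_hat ** 2 * sxx                 # explained sum of squares
    return {"a_hat": a_hat, "b_hat": b_hat, "tss": tss,
            "ess": ess, "uss": uss, "r2": ess / tss}


ln_revenue = [math.log(rev) for _, rev, _ in FIRMS]
ln_assets = [math.log(assets) for _, _, assets in FIRMS]
fit = ols(ln_revenue, ln_assets)

if __name__ == "__main__":
    print(f"slope     = {fit['b_hat']:.4f}")
    print(f"intercept = {fit['a_hat']:.4f}")
    print(f"R squared = {fit['r2']:.4f}")
    print(f"TSS = {fit['tss']:.4f}, ESS + USS = {fit['ess'] + fit['uss']:.4f}")
```

Computing ESS as the squared slope times the sum of squared deviations of x mirrors the ANOVA decomposition on the slide, so the last printed line doubles as a check that TSS = ESS + USS.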