Economics 240A Power One

Outline
• Course Organization
• Course Overview
• Resources for Studying

I. Organization
Lectures are on Tuesdays and Thursdays, 5:00-6:15 PM, in North Hall 1105. Lecture Notes for the class will cover the concepts.
Text: Gerald Keller, Statistics for Management and Economics, Seventh Edition (2005).
The Computer Lab is scheduled for Wednesdays, 3:00-3:50, 4:00-4:50, and 5:00-5:50, in Leadbetter, Phelps 1530. The capacity is 25 stations.
Software: Excel and EViews. Lab Notes will cover the procedures of analysis.
TA: Stephane Verani, Office: NH 2048
Sections: 140A, Fridays 10:00-10:50, Girvetz 2115; 240A, Wednesdays 6:00-7:50, Phelps 3505
Exams: Midterm, Tuesday, Nov. 6; Final, Tuesday, December 13, 7:30-10:30 PM

Organization (Cont.)
Problem Sets, Pre-Midterm:
• #1 Oct. 4, 2007, due Oct. 11, 2007
• #2 Oct. 11, 2007, due Oct. 18, 2007
• #3 Oct. 18, 2007, due Oct. 25, 2007
• #4 Oct. 25, 2007, due Nov. 1, 2007
Problem Set, Post-Midterm:
• #5 Nov. 1, 2007, due Nov. 8, 2007
Exercises: as assigned in the Lab Notes.
Take-home Project: an exercise to test your quantitative and writing skills. You may work collectively, but the 2-3 page report must be your own. Last fall we also did group projects with PowerPoint presentations, and I will probably repeat this format.
Grading: your course grade will be based on your scores on the midterm (18%), the final (37%), and two projects (18% each), plus your effort as indicated by problem sets and lab exercises turned in for credit (9%). Of course, the latter are more important than their weight indicates. I distribute the grades both by letter, weighting the problem sets one third of a grade point, and by total score for the class, and then reconcile the two to set course grades.
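As an arithmetic check on the grading scheme: the five weights add up to 100%, so the numeric course total is a weighted average of the component scores. A minimal sketch, assuming every component is scored out of 100; the component names and example scores are hypothetical, and the total ignores the letter-grade reconciliation described above.

```python
# Course weights in percent, as listed in the grading policy.
WEIGHTS = {"midterm": 18, "final": 37, "project_1": 18, "project_2": 18, "effort": 9}


def weighted_total(scores):
    """Weighted course total on a 0-100 scale, assuming each component is scored 0-100."""
    if sum(WEIGHTS.values()) != 100:
        raise ValueError("weights should add up to 100%")
    return sum(WEIGHTS[part] * scores[part] for part in WEIGHTS) / 100


# Hypothetical component scores, for illustration only.
example = {"midterm": 80, "final": 90, "project_1": 85, "project_2": 75, "effort": 100}
print(weighted_total(example))  # 85.5
```

Keeping the weights as integer percentages and dividing once at the end avoids floating-point drift in the weight sum.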
Course Overview
Topics in Statistics
• Descriptive Statistics
• Exploratory Data Analysis
• Probability and Distributions
• Proportions
• Interval Estimation
• Hypothesis Testing
• Correlation and Regression
• Analysis of Variance

Concepts 1
Two types of data:
• Time series
• Cross section
[Figure: Consumer Price Index (series CPIAUCNS), index 1982-84 = 100, Jan 1921 to Aug 2007. Source: http://research.stlouisfed.org/fred2/]

Examples of:
1. Graphical display of results
2. Cross-section data
3. A survey sample of 12,571 men and women, ages 15-44
What is the message?

Concepts 2
Population versus sample: Iowa Caucuses, New Hampshire Primary
• Population: all eligible voters
• Sample: Field Poll in California
[Figure: a sample drawn from a population]

Concepts 3
Different views of the world (universe):
• Deterministic
• Stochastic

Statistical Inference and Probability
Deterministic
• Newtonian physics, e.g. distance = rate * time
• Einsteinian (relativistic) physics: E = m*c^2
Stochastic (random)
• Quantum mechanics

Statistical Inference and Probability
Probability: a tool to understand chance. What is chancy about the statistical world we will study?
Example:
• Suppose I number everyone in the class from 1 to 65
• and draw one number each meeting to ask a question. What is the likelihood that I will call on you today?

Resources for Studying
Keller
• Text readings
• CD-ROM
• Applets
Instructor
• Lecture Notes
• Lab Notes & Exercises
• Problem Sets
• PowerPoint Slide Shows
Keller CD-ROM and Student Book Companion Site: http://www.duxbury.com/statistics

Concepts 4
Three types of data:
• Cardinal
• Ordinal
• Categorical

Keller & Warrack Slide Show: Excerpts from Ch. 2
Chapter 2: Graphical Descriptive Techniques
2.1 Introduction
Descriptive statistics involves the arrangement, summary, and presentation of data to enable meaningful interpretation and to support decision making.
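Returning to the classroom example under Statistical Inference and Probability: if each of the 65 numbers is equally likely to be drawn (an assumption the slide leaves implicit), the chance of being called in a given meeting is 1/65, about 0.0154. The sketch below computes this exactly and checks it by simulation; the function names, seed, and number of simulated meetings are illustrative choices.

```python
import random
from fractions import Fraction

CLASS_SIZE = 65  # students numbered 1..65, as in the example


def prob_called(class_size):
    """Exact chance that one equally likely draw picks your number."""
    return Fraction(1, class_size)


def simulate_called(class_size, meetings, seed=0):
    """Estimate the same chance by simulating many meetings, one draw per meeting."""
    rng = random.Random(seed)
    my_number = 1  # by symmetry, any fixed number gives the same answer
    hits = sum(rng.randint(1, class_size) == my_number for _ in range(meetings))
    return hits / meetings


exact = prob_called(CLASS_SIZE)
estimate = simulate_called(CLASS_SIZE, meetings=100_000)
print(exact, round(float(exact), 4))  # 1/65 0.0154
print(estimate)  # close to 1/65; the exact digits depend on the seed
```

The simulated frequency settles near the exact value as the number of meetings grows, which is the relative-frequency view of probability.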
Descriptive statistics methods make use of
• graphical techniques
• numerical descriptive measures.
The methods presented apply to both
• the entire population
• a sample drawn from the population.

2.2 Types of data and information
A variable is a characteristic of a population or sample that is of interest to us, for example:
• Cereal choice
• Capital expenditure
• The waiting time for medical services
Data are the actual values of variables:
• Interval data are numerical observations.
• Nominal data are categorical observations.
• Ordinal data are ordered categorical observations.

Types of data: examples
Interval data
Age    Income
55     75000
42     68000
...    ...

Weight gain
+10
+5
...

Nominal data
Person   Marital status
1        married
2        single
3        single
...      ...

Computer   Brand
1          IBM
2          Dell
3          IBM
...        ...

With nominal data, all we can do is calculate the proportion of data that falls into each category:
Brand     Count   Percent
IBM       25      50%
Dell      11      22%
Compaq    8       16%
Other     6       12%
Total     50

Types of data: analysis
Knowing the type of data is necessary to properly select the technique to be used when analyzing data. Type of analysis allowed for each type of data:
• Interval data: arithmetic calculations
• Nominal data: counting the number of observations in each category
• Ordinal data: computations based on an ordering process

Cross-Sectional/Time-Series Data
Cross-sectional data are collected at a certain point in time:
• Marketing survey (observe preferences by gender, age)
• Test scores in a statistics course
• Starting salaries of MBA program graduates
Time-series data are collected over successive points in time:
• Weekly closing price of gold
• Amount of crude oil imported monthly

2.3 Graphical Techniques for Interval Data
Example 2.1: Providing information concerning the monthly bills of new subscribers in the first month after signing on with a telephone company.
• Collect data
• Prepare a frequency distribution
• Draw a histogram

Example 2.1: Providing information
Collect data
Bills: 42.19, 38.45, 29.23, 89.35, 118.04, 110.46, 0.00, 72.88, 83.05, ... (there are 200 data points)

Prepare a frequency distribution
How many classes to use?
Number of observations   Number of classes
Less than 50             5-7
50 - 200                 7-9
200 - 500                9-10
500 - 1,000              10-11
1,000 - 5,000            11-13
5,000 - 50,000           13-17
More than 50,000         17-20

Class width = (Largest observation - Smallest observation) / (Number of classes)
            = (119.63 - 0) / 8 = 14.95, rounded up to 15

Example 2.1: Providing information
Draw a histogram
Bin (upper limit)   Frequency
15                  71
30                  37
45                  13
60                  9
75                  10
90                  18
105                 28
120                 14
[Figure: histogram of Bills (horizontal axis, 0 to 120 in steps of 15) against Frequency (vertical axis, 0 to 80)]

Example 2.1: Providing information
What information can we extract from this histogram?
• About half of all the bills are small: 71 + 37 = 108.
• A few bills are in the middle range: 13 + 9 + 10 = 32.
• There is a relatively large number of large bills: 18 + 28 + 14 = 60.

Class width
It is generally best to use equal class widths, but sometimes unequal class widths are called for. Unequal class widths are used when the frequency associated with some classes is too low. Then,
• several classes are combined to form a wider and “more populated” class, or
• an open-ended class can be formed at the higher or lower end of the histogram.

Shapes of histograms
There are four typical shape characteristics.
[Figure: examples of negatively skewed and positively skewed histograms]

Modal classes
A modal class is the one with the largest number of observations.
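The frequency-distribution steps of Example 2.1 can be sketched in Python. Only the first nine bills appear on the slide, so the counting function is demonstrated on those nine, while the histogram readings use the slide's Bin/Frequency table. The sketch assumes each class includes its upper limit and excludes its lower one, as the "Bin" column suggests, and takes 0 as the lower edge of the first class; the function names are hypothetical.

```python
import math
from bisect import bisect_left

# First nine of the 200 bills shown for Example 2.1 (the full data set is not reproduced).
SAMPLE_BILLS = [42.19, 38.45, 29.23, 89.35, 118.04, 110.46, 0.00, 72.88, 83.05]

# "Bin" (upper class limit) and "Frequency" columns from the histogram slide.
UPPER_LIMITS = [15, 30, 45, 60, 75, 90, 105, 120]
SLIDE_FREQUENCIES = [71, 37, 13, 9, 10, 18, 28, 14]


def class_width(largest, smallest, n_classes):
    """Class width = (largest - smallest observation) / number of classes."""
    return (largest - smallest) / n_classes


def frequency_distribution(data, upper_limits):
    """Count observations per class, each class being (previous limit, upper limit].

    Values at or below the first limit fall in the first class; the extra final
    slot counts any values above the last limit.
    """
    counts = [0] * (len(upper_limits) + 1)
    for x in data:
        # bisect_left returns the index of the first upper limit that is >= x.
        counts[bisect_left(upper_limits, x)] += 1
    return counts


width = class_width(119.63, 0.0, 8)
print(round(width, 2), math.ceil(width))  # 14.95 15

counts = frequency_distribution(SAMPLE_BILLS, UPPER_LIMITS)
print(counts)  # [1, 1, 2, 0, 1, 2, 0, 2, 0]

# Reading the slide's histogram: small, middle-range and large bills.
small = sum(SLIDE_FREQUENCIES[:2])    # classes up to 30
middle = sum(SLIDE_FREQUENCIES[2:5])  # classes from 30 to 75
large = sum(SLIDE_FREQUENCIES[5:])    # classes from 75 to 120
print(small, middle, large, sum(SLIDE_FREQUENCIES))  # 108 32 60 200

# Modal class: the class with the largest frequency.
i = SLIDE_FREQUENCIES.index(max(SLIDE_FREQUENCIES))
lower = UPPER_LIMITS[i - 1] if i > 0 else 0
print(f"modal class: {lower} to {UPPER_LIMITS[i]}")  # modal class: 0 to 15
```

Binary search on the sorted upper limits keeps the class lookup short and makes the "upper limit included" convention explicit in one place.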
[Figure: a unimodal histogram, with the modal class marked]

Descriptive Statistics
Central Tendency
• mode
• median
• mean
Dispersion
• standard deviation
• interquartile range (IQR)

Concepts 5
Normal Distribution
• Central tendency: mean or average
• Dispersion: standard deviation
Non-normal distributions

Concepts 6
What do we mean by central tendency? Possibilities:
• What is the most likely outcome?
• What outcome do we expect?
• What is the outcome in the middle?

Moving from Concepts to Measures
Mode: the most likely value, i.e. the value that occurs most often.
Median: sort the data from largest to smallest. The observation with half of the values larger and half smaller is the median. With an even number of observations, the median is the average of the two middle values.
Mean or average: sum the values of all of the observations and divide by the number of observations.

Concepts 7
What do we mean by dispersion? Possibilities:
• How far, on average, are the values from the mean?
• What is the range of values, from the biggest to the smallest?
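The measures of central tendency and dispersion above translate directly into code. A minimal sketch on a small hypothetical data set; it uses the sample standard deviation (dividing by n - 1), which is an assumption since these slides give no formula, and the function names are illustrative.

```python
import math
import statistics
from collections import Counter


def mode(values):
    """Most frequent value; if several values tie, the first one encountered is returned."""
    return Counter(values).most_common(1)[0][0]


def median(values):
    """Middle of the sorted data; the average of the two middle values when n is even."""
    s = sorted(values)
    mid = len(s) // 2
    return s[mid] if len(s) % 2 else (s[mid - 1] + s[mid]) / 2


def mean(values):
    """Sum of all observations divided by the number of observations."""
    return sum(values) / len(values)


def sample_std(values):
    """Square root of (sum of squared deviations from the mean) / (n - 1)."""
    m = mean(values)
    return math.sqrt(sum((x - m) ** 2 for x in values) / (len(values) - 1))


def value_range(values):
    """Largest minus smallest observation."""
    return max(values) - min(values)


data = [2, 3, 3, 5, 7, 10]  # hypothetical observations
print(mode(data), median(data), mean(data))  # 3 4.0 5.0
print(round(sample_std(data), 3), value_range(data))  # 3.033 8
print(statistics.stdev(data) == sample_std(data) or "compare with statistics.stdev")
```

Python's standard `statistics` module provides `mean`, `median`, `mode`, and `stdev`; writing the measures out by hand simply mirrors the slide definitions.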
Exploratory Data Analysis
• Stem and Leaf Diagrams
• Box and Whiskers Plots

Weight Data
Males: 140 145 160 190 155 165 150 190 195 138 160 155 153 145 170 175 175 170 180 135 170 157 130 185 190 155 170 155 215 150 145 155 155 150 155 150 180 160 135 160 130 155 150 148 155 150 140 180 190 145 150 164 140 142 136 123 155
Females: 140 120 130 138 121 125 116 145 150 112 125 130 120 130 131 120 118 125 135 125 118 122 115 102 115 150 110 116 108 95 125 133 110 150 108

Box Diagram
[Figure: box diagram marking the median, the first (lowest) quartile, with 25% of observations below it, and the third (highest) quartile, with 25% of observations above it]

Whiskers
The whiskers end with points that are not outliers. Outliers lie more than 1.5 times the interquartile range below the first quartile or above the third quartile (in this case IQR = 31, so 1.5 * 31 = 46.5).
• Lower: 1st quartile - 1.5 * IQR = 125 - 46.5 = 78.5, but the minimum is 95, so the lower whisker ends at 95.
• Upper: 3rd quartile + 1.5 * IQR = 156 + 46.5 = 202.5; the first value below this is 195, so the upper whisker ends at 195.

Next Tuesday Only! Meet in Humanities and Social Sciences (HSSB) 1203
web: www.lsit.ucsb.edu
• Exploratory data analysis using JMP
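The whisker arithmetic in the box-plot slides can be checked in code. The slides do not say which data the box plot summarizes; the minimum of 95 and the upper whisker at 195 are consistent with pooling the male and female weights, which this sketch assumes. Quartile conventions also differ between software packages; the sketch uses Tukey's hinges (medians of the lower and upper halves of the sorted data), which reproduces Q1 = 125 and Q3 = 156 for the pooled data. Function names are hypothetical.

```python
# Weight data from the slide.
MALES = [140, 145, 160, 190, 155, 165, 150, 190, 195, 138, 160, 155, 153, 145,
         170, 175, 175, 170, 180, 135, 170, 157, 130, 185, 190, 155, 170, 155,
         215, 150, 145, 155, 155, 150, 155, 150, 180, 160, 135, 160, 130, 155,
         150, 148, 155, 150, 140, 180, 190, 145, 150, 164, 140, 142, 136, 123, 155]
FEMALES = [140, 120, 130, 138, 121, 125, 116, 145, 150, 112, 125, 130,
           120, 130, 131, 120, 118, 125, 135, 125, 118, 122, 115, 102,
           115, 150, 110, 116, 108, 95, 125, 133, 110, 150, 108]


def median(sorted_values):
    """Median of already-sorted values."""
    n, mid = len(sorted_values), len(sorted_values) // 2
    if n % 2:
        return float(sorted_values[mid])
    return (sorted_values[mid - 1] + sorted_values[mid]) / 2


def tukey_hinges(values):
    """Q1 and Q3 as medians of the lower and upper halves (median shared when n is odd)."""
    s = sorted(values)
    half = (len(s) + 1) // 2
    return median(s[:half]), median(s[len(s) - half:])


def box_plot_summary(values, k=1.5):
    """Quartiles, outlier fences (k * IQR beyond the quartiles), whisker ends, outliers."""
    s = sorted(values)
    q1, q3 = tukey_hinges(s)
    iqr = q3 - q1
    low_fence, high_fence = q1 - k * iqr, q3 + k * iqr
    inside = [v for v in s if low_fence <= v <= high_fence]
    return {
        "q1": q1, "median": median(s), "q3": q3, "iqr": iqr,
        "fences": (low_fence, high_fence),
        "whiskers": (inside[0], inside[-1]),
        "outliers": [v for v in s if v < low_fence or v > high_fence],
    }


summary = box_plot_summary(MALES + FEMALES)  # assumption: pooled data
for key, value in summary.items():
    print(key, value)
```

For the pooled data this gives fences of 78.5 and 202.5 and whiskers ending at 95 and 195, matching the slide, with 215 falling beyond the upper fence.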