Sociology 601 (Martin)
Lecture for week 2: September 9 - 11
• Chapter 3.1:
  – Making charts
• Chapters 3.2 – 3.5 (if time permits):
  – Measures of central tendency
  – Measures of variation
• Walk-through of the STATA graphical user interface.

Definitions for charts
• frequency distribution: a graph listing intervals of possible values for a variable (on the x-axis) and the number of observations in each interval (on the y-axis).
• relative frequency distribution: as above, but the y-axis shows the percent or proportion of observations in each interval.
• bar graph: used when the variable is ordinal or nominal scale.
  – The bars should not touch.
• histogram: used when the variable is interval scale.
  – The bars should touch.

General Rules for Relative Frequency Distributions
• Whether you are making a bar graph or a histogram:
  – Make sure each observation is in one and only one category.
  – Use categories of equal width.
  – Choose an appealing number of categories.
  – Decide whether to provide labels.
  – Double-check your graph.
• If you use fewer bars to describe the distribution of a variable, you lose information but gain clarity.

Example from Text, p. 36
• Murders per 100,000 population, by State, for 1993 (Rate = murders per 100,000 population)

State           Rate   State           Rate   State           Rate
Alabama         11.6   Louisiana       20.3   Ohio             6.0
Alaska           9.0   Maine            1.6   Oklahoma         8.4
Arizona          8.6   Maryland        12.7   Oregon           4.6
Arkansas        10.2   Massachusetts    3.9   Pennsylvania     6.8
California      13.1   Michigan         9.8   Rhode Island     3.9
Colorado         5.8   Minnesota        3.4   South Carolina  10.3
Connecticut      6.3   Mississippi     13.5   South Dakota     3.4
Delaware         5.0   Missouri        11.3   Tennessee       10.2
Florida          8.9   Montana          3.0   Texas           11.9
Georgia         11.4   Nebraska         3.9   Utah             3.1
Hawaii           3.8   Nevada          10.4   Vermont          3.6
Idaho            3.5   New Hampshire    2.0   Virginia         8.3
Illinois        11.4   New Jersey       5.3   Washington       5.2
Indiana          7.5   New Mexico       8.0   West Virginia    6.9
Iowa             2.3   New York        13.3   Wisconsin        4.4
Kansas           6.4   North Carolina  11.3   Wyoming          3.4
Kentucky         6.6   North Dakota     1.7

Frequency Distribution
• Murders per 100,000 population for 1993, by State
[Figure: number of states (y-axis, 0 to 3) at each murder rate (x-axis, 0 to 20)]
• What have we lost? What have we gained?

Relative Frequency Distribution
• Murders per 100,000 population, by State
[Figure: relative frequency (y-axis, 0 to 0.06) at each murder rate (x-axis, 0 to 20)]

Collapsed Relative Frequency Distribution
• Murders per 100,000 population, by State
[Figure: relative frequency (y-axis, 0 to 0.3) for murder-rate categories 0-1.9, 2-3.9, 4-5.9, 6-7.9, 8-9.9, 10-11.9, 12-13.9, 14-15.9, 16-17.9, 18-19.9, 20-21.9]
• What have we lost? What have we gained?

3.2: Measuring central tendency - mean
• Mean: the sum of the measurements divided by the number of measurements.
• Equation for the mean of a sample: $\bar{Y} = \frac{\sum_{i=1}^{n} Y_i}{n}$
• or, if you don't have an equation editor: Ybar = SUM(Yi) / n, where
  – Ybar is the sample mean,
  – Yi is the measurement of Y for case i,
  – n is the number of cases in the sample.

Weighted means
• Weighted sample mean: each measurement Yj is multiplied by the number of cases nj it represents, and the sum of these products is divided by the total number of cases:
  $\bar{Y}_{weighted} = \frac{\sum_j n_j Y_j}{\sum_j n_j}$
  – Example: we could weight the state murder rates by the number of persons in each state in 1993 to get the mean murder rate for persons in the US.
• If there are only two groups (j = 1, 2), the equation for the weighted mean is
  $\bar{Y}_{weighted} = \frac{n_1 Y_1 + n_2 Y_2}{n_1 + n_2}$

3.3 Other measures of central tendency
• Median: the measurement that falls in the middle of an ordered sample (when n is even, the average of the two middle measurements).
  – The median is the value of the 50th percentile.
• Percentile: the pth percentile is the number such that p% of scores fall below it and (100-p)% of scores fall above it.
• Mode: the value that occurs most frequently.

3.4: Measures of variation
• range: the difference between the largest and smallest observations.
• interquartile range: the difference between the 75th and 25th percentile observations.
• deviation: for any observation, the difference between that observation and the sample mean: Di = Yi - Ybar
  (one averaged measure of variation for a sample would be to take the mean of the absolute values of all the deviations for the sample)

Variance and standard deviation: the most common measures of variation
  $s^2 = \frac{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}{n - 1}$     $s = \sqrt{\frac{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}{n - 1}}$
• variance: the squared deviations for a sample, summed and divided by n - 1 (approximately their mean when n is large), labeled $s^2$.
• standard deviation: the square root of the variance (approximately the root mean squared deviation), labeled s.

Practice: Calculate the mean, variance, and standard deviation.
• Sample 1: yi = 1, 2, 3, 3, 4, 4, 7, 8
• Sample 2: yi = 1, 2, 3, 3, 4, 4, 7, 48
For each sample, fill in yi - ybar and (yi - ybar)² for every observation, then compute:
  Σyi = ____   ybar = ____   Σ(yi - ybar)² = ____   s² = ____   s = ____

Interpreting the standard deviation.
• s is (formally) the root mean squared deviation, computed with n - 1 rather than n in the denominator.
• s is one version of the typical distance of an observation from the sample mean.
• Because s is based on squared deviations, it is strongly affected by extreme scores.
  – Is this a desirable property?
  – Compare these two sets of deviations from the mean: (-3, -3, +3, +3) vs. (-2, -2, -2, +6)
• For a quantitative variable Y whose distribution is approximately bell-shaped (normal), about 68% of scores fall between Ybar - s and Ybar + s.

Interpreting sample statistics.
• Recall that…
  – A statistic is a single number computed from a sample.
  – A parameter is a single number that summarizes some quality of a variable in a population.
• For means:
  – the population mean is μ (mu).
  – the sample mean Ybar is an estimator of μ.
• For standard deviations:
  – the population standard deviation is σ (sigma).
  – the sample standard deviation s is an estimator of σ.

A conceptual map of STATA (source → interface → output)
• source: .do file, outside data set, interactive data entry
• interface: command window, data editor, pull-down menus, icons
• output: log file, results window, graphics, active data set

The STATA windows environment - icons
  – Open (use)
  – Save
  – Print Results
  – Begin Log
  – Start viewer
  – Bring results window to front
  – Bring graph window to front
  – Do-file editor
  – Data editor
  – Data browser
  – Clear
  – Break

The .do file: interface of choice for social research
• Icons within the .do file editor:
  – New
  – Open
  – Save
  – Print
  – Find
  – Cut
  – Copy
  – Paste
  – Undo
  – Do current file
  – Run current file

Sample commands in a .do file
  use "I:\601Fall08\socy601data.dta", clear
  summarize AGE
  summarize AGE [weight=ADULTS]
  tabulate AGE
  tabulate AGE [weight=ADULTS]
  clear

How to create a log file
• One approach is to use the log icon to start and stop a log.
• Another approach is to type the log-starting command into a .do file:
  log using I:\601Fall08\week01hmwk.txt, replace
  * . . . (your work here) . . .
  log close
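You can also check the hand calculations from sections 3.2 to 3.4 outside STATA. The following is a minimal Python sketch that uses only the standard library. It is not a STATA procedure, and the function names `mean`, `weighted_mean`, `variance` and `describe_sample` are hypothetical. The sketch uses the n - 1 divisor from the variance formula. It runs on the two practice samples and on the two sets of deviations (-3, -3, +3, +3) and (-2, -2, -2, +6).

```python
import math
import statistics


def mean(ys):
    """Sample mean: Ybar = SUM(Yi) / n."""
    return sum(ys) / len(ys)


def weighted_mean(values, counts):
    """Weighted mean: SUM(n_j * Y_j) / SUM(n_j), where counts[j] is n_j."""
    return sum(n * y for n, y in zip(counts, values)) / sum(counts)


def variance(ys):
    """Sample variance s^2: sum of squared deviations divided by n - 1."""
    if len(ys) < 2:
        raise ValueError("need at least two observations")
    ybar = mean(ys)
    return sum((y - ybar) ** 2 for y in ys) / (len(ys) - 1)


def describe_sample(ys):
    """Hypothetical helper collecting the section 3.2-3.4 summaries for one sample."""
    ybar = mean(ys)
    s2 = variance(ys)
    return {
        "n": len(ys),
        "sum": sum(ys),
        "mean": ybar,
        "median": statistics.median(ys),  # averages the two middle values if n is even
        "range": max(ys) - min(ys),
        "mean_abs_dev": mean([abs(y - ybar) for y in ys]),  # mean of |Di|
        "variance": s2,
        "sd": math.sqrt(s2),
    }


if __name__ == "__main__":
    samples = {
        "practice 1": [1, 2, 3, 3, 4, 4, 7, 8],
        "practice 2": [1, 2, 3, 3, 4, 4, 7, 48],
        "deviations A": [-3, -3, 3, 3],
        "deviations B": [-2, -2, -2, 6],
    }
    for label, ys in samples.items():
        d = describe_sample(ys)
        print(f"{label}: mean={d['mean']:.2f} median={d['median']} "
              f"MAD={d['mean_abs_dev']:.2f} s2={d['variance']:.2f} s={d['sd']:.2f}")
    # Hypothetical two-group weighted mean: (3*10.0 + 1*2.0) / (3 + 1) = 8.0
    print(weighted_mean([10.0, 2.0], [3, 1]))
```

Both results below show that squaring makes s sensitive to extreme scores:

* Replacing the 8 with 48 leaves the median at 3.5 but raises s from about 2.39 to about 15.86.
* The two deviation sets share a mean absolute deviation of 3, but their values of s differ (about 3.46 vs. 4.00).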