STUDY GUIDE MIDTERM 2:

CAUSALITY:
example: butterfly ballots in the 2000 election in Palm Beach County. Some claim that the ballots caused Pat Buchanan to receive an excessive number of votes.
o Regularity Approach: did areas where the ballot was used have higher Buchanan vote totals?
o Counterfactual Approach: what happened in areas without the butterfly ballot?
o Manipulation Approach: were the results similar in counties like Palm Beach that did not use the butterfly ballot?
o Mechanisms Approach: how do butterfly ballots record votes that were not the voters' intentions?

RULES FOR CAUSAL THEORIES:
o Must create falsifiable theories…must be testable against real-world evidence…cannot be true by definition (ex. "all bachelors are unmarried men")
o Must make theories internally consistent…cannot be contradictory (ex. conservatives support individual freedom but oppose abortion and gay marriage)…an internally consistent theory often produces a clear hypothesis, as is the case in Downs' spatial model
o Choose the dependent variable carefully…the dependent variable should not cause changes in the explanatory variable (reverse causation)
o Maximize concreteness by choosing observable rather than unobservable concepts…culture, identity, and utility may be useful for formulating theories but are difficult to measure
o State theories as encompassingly as possible…consider the domain of applicability and how it forces us to think about the features of a theory…stating theories encompassingly may run against the maxim to be concrete, so we must strike a balance between concreteness and generality

FOUR CAUSAL HURDLES:
o 1) Is there a credible causal mechanism connecting X and Y?
o 2) Could Y cause X (reverse causality)?
o 3) Is there covariation between X and Y?
o 4) Is there a confounding variable "Z" that is related to both X and Y and makes the observed association between X and Y spurious?
When evaluating another's work, the most frequent objection is that the researcher failed to control for some potentially important cause of the dependent variable…if a credible case can be made
that an uncontrolled-for "Z" might be related to both X and Y, we cannot conclude with full confidence that X indeed causes Y
example: do more hours spent in child care lead to kids having problems in school?
o Credible causal mechanism? Maybe kids are too young to be put in a class with other kids, and they learn aggressive behavior
o Reverse causation? Could kids' behavior problems (Y) lead to more time in child care (X)?
o Covariation? Is there covariation between the time a child spends in child care and aggression in kindergarten?
o Confounder?
Simple bivariate comparisons can be very misleading despite their initial appeal…if the comparisons we make are faulty, then our conclusions about causal relationships will also be faulty

Experimental Designs= the researcher controls and randomly assigns values of the independent variable to subjects
o Randomization
o Researcher must have control over the treatment and control groups
Control= the values subjects receive are NOT determined either by the subjects or by nature…the researcher must randomly assign treatment to subjects, creating a treatment group and a control group
Observational Studies= look out at the world and observe something
o No randomization

TERMS:
Unit of analysis= the sort of phenomena that constitute cases
Population= all cases that an inference is said to apply to
Sample= the cases chosen for study, referred to collectively
Case= any observation intended to provide independent evidence of a proposition
Observation= an element of a case
N/n= total number of observations in a given context
Cross-sectional research design= examines a cross section of social reality, focusing on variation across individual units and explaining the variation in the dependent variable across them
o Ex.) Unit of analysis: countries; Population: all countries; Sample: N = 163 countries; Case: mean turnout in the 1990s
Time-series research design= examines variation within one unit over time
o Ex.)
Unit of analysis: US elections; Population: all presidential elections; Sample: 20th-century presidential elections (N = 25); Case: aggregate turnout in a single presidential election
Comparability= how comparable are the cases to one another? (unit homogeneity) When comparing countries to one another, we assume that the countries are alike in some ways.
Independence= in comparing turnout across countries, we assume that one country is not influencing another country's turnout
Representativeness= is it straightforward to generalize from the sample to the population?
Variation= do the cases offer variation in the Xs and Ys? (If not, broaden the study temporally or geographically, or make Y less specific)
Replicability= can the research design be replicated? (reliable/replicable results are desirable)

EXPERIMENTS: random assignment of treatment, and the researcher must have control
o Randomization is meant to ensure that no confounding variables are present
o To check causality, we must still check the four hurdles:
  - Causal mechanism
  - Reverse causality
  - Covariation
  - Confounding variables
Laboratory Experiment: the researcher has control over the entire experiment, except for the subjects' behavior
Field Experiment: the researcher controls only the treatment…no control over the subjects (but more control than in observational studies)
o Ex.) Get Out the Vote: canvassing was more influential than fliers or phone calls…BUT only certain people respond to surveys → biased sample
Natural Experiment: nature intervenes in the way the researcher would have wanted to…enough randomization occurs without the researcher's intervention…nature provides an intervention that is "as if random"…can be an event studied after the fact
o Ex.) cholera outbreak in London, studied by Snow…showed that cholera was waterborne…the water companies' pipes were intertwined, so which supply a household received was essentially random
o Ex.) Fox News Effect: the arrival of the Fox News channel appeared to influence voting…researchers claimed the new channel converted 200,000 voters and influenced the outcome of Bush v. Gore…the channel was available in only about 20% of households, so researchers compared cities that got it with those that did not…"as if random" claim: people did not control whether they received the channel, BUT researchers could not control whether people actually watched it in their homes…the affluence of an area may be linked to the local cable monopoly (whether the channel was offered) or to voter preference

OBSERVATIONAL STUDIES:
o Cross-Sectional: looking across units…time is held constant
  - Often uses averages to define variables…if not, the cross section could capture an outlier and change the results
o Time Series: time varies and the unit is held constant

Measurement:
o Valid= measures what you claim to measure
  - Face: does it appear to measure what it claims to?
  - Content: does it incorporate all the important elements of the concept?
  - Construct: does it relate to other important measures as expected?
o Reliable= how consistent the measure is…will it always yield the same result?
A MEASURE CAN BE VALID BUT NOT RELIABLE & VICE VERSA (prefer to have both at the same time)
[Figure: four targets labeled "Reliable, not valid", "Not reliable, but average would be valid", "Not reliable, and not valid", and "Reliable and valid"]

CONCEPTUALIZATION & MEASUREMENT:
Empirical= based on observations…data
o We must have precise measurements of X & Y to establish a causal relationship…if not, we cannot be confident in the validity of the theory
Three Issues of Measurement:
o Conceptual clarity: what do we mean by the concept (e.g., democracy)…what is the exact nature of the concept we're trying to measure?
o Reliability: does repeating the measurement yield the same result?…repeatability and consistency.
  - A reliable measure produces identical results each time it is applied
o Validity: does the measure accurately represent the concept? (an invalid measure measures something other than what was originally intended)
  - FACE: does the measure seem to get at the concept?
  - CONTENT: does it capture the essential elements of the concept?
Measurement bias= the systematic overreporting or underreporting of "true" values for the variable
Polity IV Project= measures democracy on a scale from -10 (strongly autocratic) to +10 (strongly democratic)

STATISTICAL INFERENCE:
Think of variables in terms of their label (a description of the variable) and their values (the denominations in which the variable occurs)
THREE TYPES OF VARIABLES:
o Categorical: cases have values that are the same as or different from other cases' values, but the values cannot be naturally ranked from least to greatest (ex. religious identification)…NO RANKING, JUST DESCRIPTIVE CATEGORIES
o Ordinal: cases have values that are the same as or different from other cases' values and can be ranked, but without equal unit differences (ex. researchers assigned numerical values to survey responses about financial situation now versus a year ago)…there is a ranking, but no "equal unit differences"
o Continuous: variables that have both a ranking and equal unit differences (ex. age in years, height, weight)
In analysis, we sometimes treat ordinal variables as if they were continuous
Central tendency= the typical values for a particular variable
o Measures of central tendency:
  - MEAN: the average value of the variable (KNOW EQUATION). Use with continuous variables when the distribution is symmetric.
      $\bar{Y} = \frac{\sum_{i=1}^{n} Y_i}{n}$
  - MEDIAN: the value of the case that sits at the exact center of our cases when we rank them from the smallest to the largest observed value.
    50% of cases fall below it and 50% above…for continuous variables, use it if outliers are present or if the distribution is skewed
  - MODE: the most frequently occurring value…the only measure of central tendency appropriate for categorical variables (also usable for ordinal variables)
Variation (dispersion)= describes the distribution of values a variable takes across the cases for which we measure it
Samples can tell the researcher something about the population as a whole, but a significant correlation in the sample does NOT guarantee a similar one in the population
Statistical Inference= the process of making probabilistic statements about a population characteristic based on our knowledge of sample characteristics
Normal distribution= symmetric, so the mode, median, and mean are all equal
o Normal curves never reach zero (the tails approach the axis but never touch it)
Frequency distribution= how often each value occurs in the observed data…unlike the idealized normal curve, it need not be normally shaped
CENTRAL LIMIT THEOREM: no matter what the underlying shape of the frequency distribution (uniform, normal, etc.), the hypothetical distribution of the sample means (called the sampling distribution) will be approximately normal, with mean equal to the true population mean and standard deviation equal to the standard error of the mean
STANDARD DEVIATION:
    $s_Y = \sqrt{\frac{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}{n - 1}}$
STANDARD ERROR (of the mean):
    $\sigma_{\bar{Y}} = \frac{s_Y}{\sqrt{n}}$
o The sampling distribution of the sample means approaches normality as the sample size increases
CONFIDENCE INTERVALS:
    95% confidence interval: $\bar{Y} \pm 1.96 \times \sigma_{\bar{Y}}$ (roughly $\bar{Y} \pm 2\sigma_{\bar{Y}}$)
o The Central Limit Theorem applies only to randomly selected samples
  - Cannot use it with a sample of convenience, because such a sample is not random and its results cannot reveal anything about the underlying population
o Smaller standard error → tighter confidence interval
o Larger standard error → wider confidence interval
o If we want to estimate population values from our sample with as much precision as possible, tighter confidence intervals are preferable to wider ones

BIVARIATE HYPOTHESIS TEST:
P-VALUE:
o "p" is often loosely described as the probability that the observed relationship is due to random chance; more precisely, it tells us the probability that we would see the
observed relationship between two variables in our sample data if there were truly no relationship between them in the unobserved population
o Values range from 0 to 1…the lower the p-value, the greater our confidence that there is a systematic relationship between the two variables for which we estimated that p-value
o THE RELATIONSHIP MIGHT STILL BE SPURIOUS (CONFOUNDING VARIABLE)
o The p-value relates what we know about a sample to what we can infer about the population as a whole
o It is tempting to assume that a p-value very close to zero indicates that the relationship between X & Y is strong…NOT NECESSARILY TRUE
  - The p-value represents our degree of confidence that there is a relationship in the underlying population, but provides no information on the strength of that relationship
  - We compare the actual relationship between X & Y in the sample data to what we would expect to find if X & Y were not related in the underlying population
o The more the empirically observed relationship (in the sample) differs from what we would expect to find if there were NO relationship in the population, the more confident we are that X & Y are related in the population
  - A lower p-value increases confidence that there is a relationship
o By convention, a relationship is called statistically significant if p < .05
o Finding that X & Y have a statistically significant relationship does NOT mean the relationship between X & Y is strong or causal…must examine the 4 causal hurdles
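The central limit theorem, standard error, and confidence interval notes in this guide can be checked with a small simulation. This is a minimal Python sketch under stated assumptions: the population is a hypothetical skewed (exponential) distribution with a true mean of 10, `draw_sample` and `mean_ci_95` are illustrative helper names, and the 95% interval uses the normal critical value 1.96. It draws many samples, compares the spread of the sample means with σ/√n, and counts how often the interval contains the true mean.

```python
import math
import random
import statistics

TRUE_MEAN = 10.0  # hypothetical population mean (exponential, so its SD is also 10)
REPS = 1000       # number of simulated samples per setting


def draw_sample(n: int, rng: random.Random) -> list[float]:
    """Draw n values from a skewed (exponential) hypothetical population."""
    return [rng.expovariate(1 / TRUE_MEAN) for _ in range(n)]


def mean_ci_95(sample: list[float]) -> tuple[float, float, float]:
    """Return (mean, standard error, half-width of a 95% confidence interval)."""
    n = len(sample)
    y_bar = sum(sample) / n            # mean: sum of the Y_i divided by n
    s_y = statistics.stdev(sample)     # sample SD, uses n - 1 in the denominator
    se = s_y / math.sqrt(n)            # standard error of the mean
    return y_bar, se, 1.96 * se        # 1.96: normal critical value for 95%


rng = random.Random(42)

# CLT: sample means cluster around the true mean, and their SD shrinks like sigma / sqrt(n).
for n in (10, 100, 1000):
    means = [statistics.fmean(draw_sample(n, rng)) for _ in range(REPS)]
    print(f"n={n:4d}  mean of means={statistics.fmean(means):6.2f}  "
          f"SD of means={statistics.stdev(means):5.2f}  "
          f"sigma/sqrt(n)={TRUE_MEAN / math.sqrt(n):5.2f}")

# Coverage: roughly 95% of the intervals should contain the true mean.
hits = 0
for _ in range(REPS):
    y_bar, _, half = mean_ci_95(draw_sample(100, rng))
    hits += (y_bar - half) <= TRUE_MEAN <= (y_bar + half)
print(f"share of 95% intervals containing the true mean: {hits / REPS:.3f}")
```

Because the hypothetical population is skewed, the coverage at n = 100 may land slightly under 95%; the normal approximation improves as n grows.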
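The p-value logic above compares the relationship observed in the sample with what we would expect if X and Y were unrelated in the population. A permutation test makes that comparison literal: shuffle the group labels many times to build the "no relationship" world, then ask how often a shuffled difference in means is at least as large as the observed one. This is one illustrative way to get a p-value, not necessarily the test used in the course; `permutation_p_value` is a hypothetical helper and the turnout numbers are made up.

```python
import random
import statistics


def permutation_p_value(group_a, group_b, n_perm=10_000, seed=0):
    """Two-sided permutation test for a difference in means.

    Null hypothesis: the outcome is unrelated to group membership, so the
    group labels are interchangeable and can be shuffled.
    """
    rng = random.Random(seed)
    observed = abs(statistics.fmean(group_a) - statistics.fmean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    at_least_as_extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # reassign labels at random: the "no relationship" world
        diff = abs(statistics.fmean(pooled[:n_a]) - statistics.fmean(pooled[n_a:]))
        if diff >= observed:
            at_least_as_extreme += 1
    # The +1 terms count the observed split itself, so p is never exactly 0.
    return (at_least_as_extreme + 1) / (n_perm + 1)


# Made-up precinct turnout (%) for a hypothetical canvassing study.
canvassed = [52, 55, 49, 58, 61, 54, 57, 60]
not_canvassed = [48, 50, 47, 53, 51, 49, 46, 52]

p = permutation_p_value(canvassed, not_canvassed)
diff = statistics.fmean(canvassed) - statistics.fmean(not_canvassed)
print(f"observed difference = {diff:.2f} points")
print(f"permutation p-value = {p:.4f}")
```

A small p here says a difference this large would be rare if canvassing had no effect; it says nothing about how large the effect is, and it does not clear the other causal hurdles.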
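The warning that a statistically significant relationship might still be spurious (the fourth causal hurdle) can be illustrated by simulation. In this sketch, with entirely hypothetical data, a binary Z (think of something like the area-affluence concern in the Fox News example) raises both X and Y, while X has no effect on Y at all. The overall correlation between X and Y comes out sizable, yet within each level of Z it is close to zero. Looking within levels of Z is one simple way to control for a confounder; `simulate` and `pearson_r` are illustrative names.

```python
import random
import statistics


def pearson_r(xs, ys):
    """Pearson correlation coefficient, computed from its definition."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def simulate(n=5000, seed=1):
    """Hypothetical data where Z drives both X and Y; X has NO effect on Y."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        z = rng.random() < 0.5           # hypothetical confounder (True/False)
        x = 2.0 * z + rng.gauss(0, 1)    # Z raises X
        y = 3.0 * z + rng.gauss(0, 1)    # Z raises Y; X does not appear here
        rows.append((z, x, y))
    return rows


rows = simulate()
xs = [x for _, x, _ in rows]
ys = [y for _, _, y in rows]
print(f"overall r(X, Y) = {pearson_r(xs, ys):.2f}")  # sizable, but spurious

# "Control" for Z by computing the correlation within each level of Z.
for level in (False, True):
    sub = [(x, y) for z, x, y in rows if z == level]
    r = pearson_r([x for x, _ in sub], [y for _, y in sub])
    print(f"Z={level}: r(X, Y) = {r:.2f}")  # near zero once Z is held fixed
```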