Research Questions, Variables, and Hypotheses
2/21/06

Review
What are research questions? What are variables?
- Definition
- Function
- Measurement scale

Hypotheses
Now that we know how to set up a research project by posing research questions and labeling variables, it is time to move on to a more formal way of structuring and interpreting research.

Hypotheses: Definitions
- Hypotheses are predictions about the relationship among two or more variables or groups, based on a theory or previous research (Pittenger, 2003).
- Hypotheses are assumptions or theories that a researcher makes and tests.
Why are hypotheses important?

Importance of Hypotheses
Hypotheses:
- Direct our observations: they identify the variables to be examined and the data to be collected.
- Describe a relationship among variables: they can state that as one variable increases, the other will decrease; that as one variable increases, the other will increase; and so on.
- Refer to populations: hypotheses help researchers infer that results from a sample will translate to a population.

Four Functions of Hypotheses
Hypotheses can:
- Estimate population characteristics
- Correlate variables
- Display differences among two or more populations
- Show possible cause and effect
What research designs relate to each of these four functions?

Symbols Used in Hypotheses
- M = sample mean
- µ (mu, pronounced "mew") = population mean
- Roman letters (e.g., A, B, C, D) are used to represent statistics; Greek letters (e.g., α, β) are used to represent parameters.
- α = significance level, the probability of committing a Type I error (e.g., α = .05)
- p = probability value (e.g., p = .05)
- Null hypothesis: H0: µ1 - µ2 = 0, or equivalently H0: µ1 = µ2
- Alternative hypothesis: H1: µ1 - µ2 ≠ 0, or equivalently H1: µ1 ≠ µ2 (sometimes written HA)

Types of Hypotheses
- Research hypotheses
- Statistical hypotheses

Research Hypotheses
A research hypothesis is a statement of the relationship among two or more variables or groups, the acceptance or non-acceptance of which is based on resolving a logical alternative with a null hypothesis.
Example: Graduate students who read the text in research methods will score higher on their comprehensive exams than graduate students who did not read their research methods text.

Research Hypotheses, Cont.
Research hypotheses can be stated as directional or non-directional.

Directional hypotheses predict the specific relationship among two or more variables or groups:
- Graduate students who read the text in research methods will score higher on their comprehensive exams than graduate students who did not read their research methods text.
  H0: µ1 ≤ µ2   H1: µ1 > µ2
- IQ scores will correlate positively with self-esteem scores.
- Cats will bark less frequently than dogs.

Non-directional hypotheses predict that there will be differences among two or more groups, but do not specify the direction of the differences:
- Men and women will differ in their recall of phone numbers.
- Scores on the Geriatric Depression Scale will differ between people with stroke and people with Alzheimer's disease.
- IQ scores will correlate with self-esteem scores.
  H0: µ1 = µ2   H1: µ1 ≠ µ2

Your turn! Come up with some directional and non-directional research hypotheses and share them with the class.

Statistical Hypotheses
Statistical hypotheses are mathematical or logical statements that help researchers interpret the results of research. They consist of the null hypothesis (H0), the hypothesis of no difference, and the alternative hypothesis (H1 or HA), which is similar in form to the research hypothesis.
- Null: H0: µ1 - µ2 = 0
- Alternative: H1: µ1 - µ2 ≠ 0

Written-out examples:
- Null: There will be no difference in the comprehensive exam scores of graduate students who read the text in research methods and those who did not read their research methods text.
- Alternative: Graduate students who read the text in research methods will score higher on their comprehensive exams than graduate students who did not read their research methods text.

Statistical Hypotheses, Cont.
Remember, and this is important:
- The null hypothesis always implies that there is no relation or statistical difference between variables or groups.
- The alternative hypothesis implies that there is a meaningful relationship among variables or groups.

Testing Hypotheses
Is it possible that our observations occurred by chance?

A story (borrowed from Gonick & Smith, 1993): In the southern US, several legal cases examined racial bias in jury selection. In the 1950s and 1960s, panels of jurors were drawn from a list of eligible citizens, yet very few African Americans (roughly 4 out of 80) were found on the jury panels, despite the fact that approximately 50% of the eligible citizens were African American. Could this be a result of pure chance?

If the selection of potential jurors was random, then the number of African Americans on an 80-person panel would be X, a binomial random variable with n = 80 trials and p = .5. This means the chance of getting a panel of 80 jurors with only four African Americans is .0000000000000000014, or 1.4 × 10^-18. These odds are similar to getting three consecutive royal flushes in poker.

Testing Hypotheses, Cont.
We use a variety of statistical procedures to test null hypotheses. The choice of procedure depends on several factors, including the research hypothesis, the data, the sampling strategy, and what we want to be able to say as a result of our testing.

Types of Tests
Statistical procedures commonly used for hypothesis testing include correlation, analysis of variance (ANOVA), analysis of covariance (ANCOVA), regression, multivariate analysis of variance (MANOVA), t-tests, and chi-square.
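Before turning to the individual tests, the 1.4 × 10^-18 figure from the jury-panel story above can be checked directly. A minimal sketch in Python, assuming the binomial model stated in the story (the function name is an illustrative choice):

```python
from math import comb

def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for a binomial random variable: n trials, success probability p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

# Panels of 80 drawn from a pool that is ~50% African American:
# probability of 4 or fewer African Americans on the panel by chance alone.
p_panel = binom_cdf(4, 80, 0.5)
print(f"{p_panel:.2e}")  # on the order of 1.4e-18
```

Because this probability is astronomically small, chance is an implausible explanation for the observed panels, which is exactly the logic of rejecting a null hypothesis.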
Each of these procedures has an associated test statistic, which is used to determine significance. For example, ANOVA, ANCOVA, and regression use F statistics and their associated p-values. Multivariate procedures like MANOVA use a variety of test statistics with interesting names, like Wilks' lambda; these are then related to a more common test statistic, like F. The secret here, for the layperson, is that all test statistics are eventually related to a probability distribution and a p-value, and these p-values mean the same thing across test statistics.

Error Types
In hypothesis testing, we must contend with two types of errors, Type I and Type II. Errors are mistakes that we can make when judging the null hypothesis.
- A Type I error occurs when a true tested hypothesis is falsely rejected. (It is when you say you found something, but that something is really an error.) A Type I error is a false positive.
- A Type II error occurs when a false tested hypothesis is not rejected (Hays, 1986). (It is when you don't find something that is, in fact, there.) A Type II error is a false negative.

Error Types, Cont.
- Alpha is the level of probability (pre-set by the researcher) that the tested hypothesis will be falsely rejected. Alpha is the pre-set risk of a Type I error. In other words, alpha is the degree of risk that you accept, in advance of conducting the study, that what you find will be an error.
- Beta is the probability (often neglected by the researcher) that a false null hypothesis will not be rejected. Beta is the probability that you won't find what you are looking for if, in fact, it is really there.

The chart on the next slide, modified from Hays (1986) and Ferguson and Takane (1989), illustrates the relationships among Type I and Type II errors, alpha, and beta.
Error Types Chart

  Decision                                 H0 is true          H1 is true
  Reject H0                                Type I error (α)    Correct (1 - β)
  Fail to reject (decide in favor of H0)   Correct (1 - α)     Type II error (β)

Example
Do we use null hypotheses in the real world? "Innocent until proven guilty":

  Verdict                                             Defendant innocent   Defendant guilty
  Reject presumption of innocence (guilty verdict)    Type I error         Correct
  Fail to reject presumption (not guilty verdict)     Correct              Type II error

Power, Effect Size, and Measurement

STATISTICAL POWER
Statistical power is "the probability of rejecting a null hypothesis that is, in fact, false" (Williams, 1986, p. 67). Put more simply, statistical power is the probability of finding relationships or differences that in fact exist (Cohen, 1988). In our legal example, power is the probability of being found guilty if you committed a crime. In terms of beta (the probability of a Type II error, i.e., the odds of getting off despite being guilty), statistical power = 1 - beta. So, if you have a 20 percent chance of being found not guilty even though you committed the crime, the statistical power of the judicial system is 1 - .20 = .80.

Statistical power is a function of "the preset significance criterion [alpha], the reliability of sample results, and the effect size [the actual size of the difference or strength of the relationship]..." (Cohen, 1988, p. 4). Considering the complex interrelationships of these criteria, one can say that:
- The researcher can easily set alpha, but cannot easily set beta.
- Alpha and beta are inversely, but not perfectly, related: lowering alpha increases beta and lowers power; increasing alpha decreases beta and increases power.

Statistical power is thus related to:
- Sample size
- Effect size
- Statistical design (including the number of groups and 1- vs. 2-tailed tests)
- Significance criterion
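The determinants just listed (alpha, effect size, sample size, desired power) can be tied together numerically. A minimal sketch, using a normal approximation for a two-tailed, two-sample comparison with a standardized effect size d; the function name is an illustrative choice, and exact t-based power tables give slightly larger numbers:

```python
from math import ceil
from statistics import NormalDist

def n_per_group(d: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate sample size per group for a two-tailed two-sample test,
    given standardized effect size d (normal approximation)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical z for two-tailed alpha
    z_beta = NormalDist().inv_cdf(power)           # z corresponding to desired power
    return ceil(2 * ((z_alpha + z_beta) / d) ** 2)

# A medium effect (d = 0.5) at alpha = .05 and power = .80
# needs roughly 63 subjects per group under this approximation.
print(n_per_group(0.5))
```

Note how the sample size explodes as the effect size shrinks: the smaller the difference you are hunting, the more subjects you need to keep beta under control.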
EFFECT SIZE
Effect size (ES) refers to the amount of common variance between the independent variable(s) (IV) and the dependent variable(s) (DV), or the degree to which changes in the IV(s) result in changes in the DV(s).

For example, if I am interested in the differences between male and female UF students in the number of parking tickets received over two years, my effect size would be the size of the difference between the means of the two groups. Or, if I wanted to test a specific intervention for students with learning disabilities, and I had a test that I believed measured the effectiveness of my intervention, then my effect size might be the difference in test scores between an experimental group that received the intervention and a control group that did not. Similarly, if I wanted to examine the impact of a specific course on research anxiety, the effect size could be the difference in mean research-anxiety scores between an experimental group who completed the course and a control group who did not.

Here is a large problem: effect size depends on the measure we use to operationalize the construct. For example, effect size depends on the net we use, the test we select, and so on. Actual effect sizes may be much larger than observed effect sizes. What might be considered a moderate to large effect in a laboratory situation may appear as a small effect in the real world, where you cannot control numerous sources of extraneous variance, e.g., variability in individual characteristics, treatment implementation, and environmental characteristics (Cohen, 1988). Small effect sizes are common and should be expected in ex post facto and quasi-experimental situations (Cohen, 1988).
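The between-group mean differences described above are commonly standardized so they can be compared across measures. A minimal sketch of Cohen's d (pooled-standard-deviation version); the test scores are invented purely for illustration:

```python
from statistics import mean, stdev

def cohens_d(group1, group2):
    """Cohen's d: mean difference divided by the pooled standard deviation."""
    n1, n2 = len(group1), len(group2)
    s1, s2 = stdev(group1), stdev(group2)  # sample standard deviations (n - 1)
    pooled = (((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)) ** 0.5
    return (mean(group1) - mean(group2)) / pooled

# Hypothetical test scores: intervention group vs. control group.
treated = [82, 79, 88, 91, 85, 80]
control = [75, 72, 81, 78, 74, 77]
d = cohens_d(treated, control)  # d ≈ 2, a large effect by Cohen's benchmarks
```

Because d is expressed in standard-deviation units, it does not depend on the raw scale of the test, which is exactly why it is useful when different studies operationalize the same construct with different instruments.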
RELATIONSHIP OF MEASUREMENT, RESEARCH DESIGN, AND STATISTICAL POWER
This is just a conceptual introduction; we will return to validity of measurement in a future lecture. All research depends on an operational definition of the constructs of interest. In intervention research, the operational definitions of both the treatments and the outcomes influence effect size. As we are all aware, there are a variety of frames of reference regarding interventions and outcomes.

Consider the elephant fable, in which researchers mapped different parts of the elephant. Their descriptions of the elephant differed considerably. What we see in research depends, at least in part, on which facet(s) of the construct of interest is (are) operationalized by our outcome measure(s). It is always better to look at the construct in more than one way (more than one facet) in order to limit threats to validity from mono-operation bias. In other words, looking at the elephant from different angles can improve the degree to which our descriptions of the elephant actually describe the elephant.

Now, consider measuring the same elephant with portable X-ray machines. Pictures of each part of the elephant are taken and then compared with each other. Not only do these pictures not resemble each other, but they also don't resemble the descriptions provided by the previous group of researchers. This chapter of the elephant fable shows how what we see is shaped by our method of observation or measurement. Again, a researcher interested in a deeper understanding of the elephant may choose multiple methods of measurement in order to avoid threats to validity from mono-method bias.
The relationship of measurement, research design, and statistical power means that large treatment effects can actually be observed as small effects. In other words, even if an intervention is very effective, measurement and design complications may make the effect appear small and thus require high statistical power for detection. The effect is obscured when we only look at part of the construct of interest. The apparent effect size is attenuated by the extent to which our operational definitions (including our measurement techniques) do not reliably and validly capture the construct of interest (i.e., intervention effectiveness).

Telescope Model
The telescope model depicts this interrelation as a chain of filters between the actual and the observed effect:

  Actual Effect Size -> Measurement -> Research Design -> Statistical Design -> Observed Effect Size

Apparent effect size is further attenuated when the research design does not fully filter out extraneous sources of variation (e.g., counselor or client differences). Violations of the assumptions of statistical procedures can attenuate effect size still further. Interestingly, problems in research design and statistical design can also introduce sources of Type I error (e.g., dust on the lens, or false positive results). The relationship of effect size, measurement, and design is further complicated by the frame of reference or angle from which one approaches or operationalizes the construct, e.g., the researcher who only looks at the leg of the elephant.

Validity is a key element of the relationship of effect size, measurement, and design. Clearly, qualitative methods can further the valid operationalization of constructs.
Multiple operational definitions and multiple methods, as recommended by Cook and Campbell (1979), can enhance the validity of research, including counseling effectiveness research.

PRE-ANALYSIS STATISTICAL POWER
Pre-analysis statistical power estimation is a recommended technique. The following steps will allow you to consider statistical power in research planning:
1. Estimate the effect size from past research and the type of experimental design planned. When you are unsure, underestimate the effect size so as not to overestimate power. Also, in quasi-experimental or ex post facto circumstances, it is usually best to estimate a small effect size unless otherwise indicated.
2. Decide on the exact statistical test and significance criterion.
3. Determine an acceptable level of power; .80 is nice, but .70 may be acceptable in some circumstances.
4. Use power tables for that statistical test, or an appropriate computer program, to determine the number of subjects required for the specified significance criterion and desired level of power.
5. If you have a fixed number of subjects, consider adjusting the significance criterion (alpha) or the statistical design if necessary to obtain adequate power. Recall that the .05 significance criterion is not sacred, especially when it results in a power of less than .30 (i.e., less than a 30% chance of finding differences that actually exist). (Szymanski & Parker, 1992)

ALPHA INFLATION
Multiple comparisons can inflate alpha, the probability of a Type I error. The probability of a Type I error escalates with the number of comparisons made in the study. The experiment-wise alpha is computed as:

  1 - (1 - alpha)^n
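The experiment-wise alpha formula above is easy to evaluate directly. A minimal sketch, including a simple Bonferroni-style split of alpha across comparisons (the numbers are illustrative):

```python
def experimentwise_alpha(alpha: float, n_comparisons: int) -> float:
    """Probability of at least one Type I error across n independent
    comparisons: 1 - (1 - alpha)^n."""
    return 1 - (1 - alpha) ** n_comparisons

# Ten comparisons at alpha = .05 inflate the experiment-wise error rate to ~.40.
inflated = experimentwise_alpha(0.05, 10)

# Splitting alpha across the comparisons (a Bonferroni-style correction)
# holds the experiment-wise rate just under the nominal .05 level.
corrected = experimentwise_alpha(0.05 / 10, 10)
```

This makes the trade-off concrete: running many uncorrected tests almost guarantees a false positive, while splitting alpha restores the nominal error rate at the cost of a higher beta for each individual test.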
ALPHA INFLATION, Cont.
One way to guard against alpha inflation is to use a Bonferroni-type procedure and split alpha by the number of comparisons. A variety of such procedures can be used (see, e.g., Marascuilo & Serlin, 1988), according to the relative importance of the tested hypotheses. The problem with reducing alpha is that doing so inflates beta. In situations in which alpha inflation is accepted because of a problem with power, one must look to replications for confidence in the findings.

See power tables.

Test Statistics, Probability, and Significance
To test a hypothesis, we compare the obtained value of a test statistic (e.g., the obtained F) to a critical value of the test statistic (e.g., a critical F) that is associated with the preset significance level (alpha). If the obtained value of the test statistic is greater than the critical value, we determine that there is a significant difference or relationship.

- Test statistic: the specific statistic (i.e., the tool) chosen to test the null hypothesis. Examples include F, t, and r.
- Obtained value: the actual value obtained when applying the test statistic to the data of interest. The probability value associated with the obtained value is p.
- Critical value: the value of the test statistic associated with the chosen significance level (alpha). If the obtained value is greater than the critical value, the result is significant.
- Probability value: the probability that observed relationships or differences are due to chance.
- Alpha: also known as the significance level or rejection region; the level of probability set by the researcher as grounds for rejection of the null hypothesis (Williams, 1986, p. 58). Alpha is the probability level associated with the critical value of the test statistic.
In other words, alpha is our predetermined risk that differences we declare to be real are actually due to chance.
- Obtained probability (p): the significance of the test statistic; the "probability that the data could have arisen if H0 were true" (Cohen, 1994, p. 998).
- Significance: what happens when the obtained probability p is less than our predetermined alpha. Significance also occurs when the obtained value of the test statistic is greater than the critical value of the test statistic.

Each quantity on the test-statistic scale has a counterpart on the probability scale:

  Test statistic     Probability value
  Critical value     Significance level (alpha)
  Obtained value     Obtained or actual probability (p)

Note that larger obtained values of test statistics are generally associated with smaller values of p.
- If obtained value > critical value, then significance.
- If p < alpha, then significance.

Revisiting the Bell Curve

Whether you are looking at obtained values of test statistics in relation to critical values, or at actual probability levels, it is important to note that test statistics and their associated probabilities only tell us the probability that a difference or relationship occurred by chance. These statistics do NOT tell us the size of the group differences or the strength of the relationships.

Steps in Hypothesis Testing for Quantitative Research Designs
Hypothesis testing is a four-phase procedure:
- Phase I: Research hypotheses, design, and variables
- Phase II: Statistical hypotheses
- Phase III: Hypothesis testing
- Phase IV: Decision/interpretation

Phase I: Research Hypotheses, Design, and Variables
1. State your research hypotheses.
2. Decide on a research design based on your research problem, your hypotheses, and what you really want to be able to say about your results (e.g., if you want to say that A caused B, you will need an experimental or time-series design; if probable cause is sufficient, a quasi-experimental design would be appropriate).
3. Operationally define your variables. Recall that one variable can have more than one operational definition.

Phase II: Statistical Hypotheses
1. Consider your chosen statistical procedures.
2. Write one statistical null hypothesis for each operational definition of each variable, reflecting the statistical operations to be performed.

Phase III: Hypothesis Testing
Complete the following steps for each statistical null hypothesis:
1. Select a significance level (alpha).
2. Compute the value of the test statistic (e.g., F, r, t).
3. Compare the obtained value of the test statistic with the critical value associated with the selected significance level, or compare the obtained p-value with the pre-selected alpha value.
4. If the obtained value of the test statistic is greater than the critical value (or if the obtained p-value is less than the pre-selected alpha value), reject the null hypothesis. If the obtained value is less than the critical value, fail to reject the null hypothesis. Another way of looking at it: if p is less than or equal to alpha, reject the null hypothesis.

Phase IV: Decision/Interpretation
1. For each research hypothesis, consider the decisions regarding the statistical null hypotheses.
2. For each research hypothesis, consider qualitative contextual information relating to potential plausibility.
3. Cautiously explain your findings with respect to the research hypotheses.
4. List and discuss the limitations (threats to valid inference).

Note: Null hypothesis testing is currently under scrutiny (see, e.g., Cohen, 1994; Kirk, 1996).
It is generally recommended that you report the effect size along with the value of the test statistic and the p-value. An alternative is to report confidence intervals.

Points to Consider about Hypothesis Testing

FISHING IN LAKE ALICE
We don't prove the null hypothesis. If you go fishing on Lake Alice and you don't catch fish, you cannot conclude that there are no fish in the lake!

What circumstances might keep us from finding fish in the lake? Possible problems include:
1. wrong or insensitive outcome measures (using a large net for small fish),
2. sampling problems (looking in the wrong part of the lake), or
3. methodological problems (scooping the top of the lake rather than the bottom, where the fish hang out).

Returning to hypothesis testing: failure to reject the null hypothesis cannot be interpreted as proof that no differences or relationships exist. Existing differences or relationships might be obscured by:
1. insensitive outcome measures (the wrong fishnet),
2. inappropriate statistical designs,
3. poor sampling strategies, and
4. low statistical power.

Questions about Hypotheses?
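As a closing illustration, the Phase III decision rule described earlier (reject when p is at or below alpha, or equivalently when the obtained value exceeds the critical value) can be sketched in a few lines; the function names are illustrative:

```python
def decide(p_value: float, alpha: float = 0.05) -> str:
    """Phase III decision rule on the probability scale:
    reject H0 when the obtained p is at or below alpha."""
    return "reject H0" if p_value <= alpha else "fail to reject H0"

def decide_by_statistic(obtained: float, critical: float) -> str:
    """Equivalent rule on the test-statistic scale:
    reject H0 when the obtained value exceeds the critical value."""
    return "reject H0" if obtained > critical else "fail to reject H0"

print(decide(0.03))   # reject H0
print(decide(0.20))   # fail to reject H0
```

Remember that this decision says nothing about the size of the effect, which is why reporting effect sizes or confidence intervals alongside the decision is recommended.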