Tripken 1 Research Methods/Stats

An Exercise in STATISTICS

Steps in Conducting Research

1. Formulating a "testable" hypothesis
2. Designing the study
3. Collecting the data
4. Analyzing the data and drawing conclusions
But we don't stop there . . .
5. Reporting the findings

Formulating a "testable" hypothesis

What is a TESTABLE hypothesis? A prediction that can be precisely measured. To do this we must identify the variables we wish to study and clearly define how we are going to measure or control them. This is what is called an OPERATIONAL DEFINITION.

Exercise 1: Can science assess this question?

Read each question below and decide whether you think science could answer it. You need to consider whether science can measure the important variables in each question.

1. Is the content of dreams the result of unconscious motives and desires?
2. Is there any relationship between birth order and personality?
3. Do college students consume more pizza than any other group?
4. Can humans be innately evil?
5. Do sales of pain relievers increase during periods of economic crisis?
6. Do animals dream?

Designing the study

There are several research methods to choose from. The choice is greatly influenced by the nature of the research question. For example, if you are simply interested in whether certain groups of people endorse a particular attitude, a survey would be the most efficient method. Once a method of study has been chosen, practical questions need to be worked out: how the study will be conducted, who the subjects will be, and how you will obtain them.

Collecting the data

There are several techniques used to collect data. With our survey we could decide whether to "interview" our subjects or have them complete a pencil-and-paper "questionnaire." If we want to assess whether caffeine intake makes people more jittery and anxious, we might use a physiological measure such as heart rate to assess anxiety.
As you can see, the nature of the topic and the design being used influence the choice of data collection technique.

Analyzing the data and drawing conclusions

Researchers use statistics to help them "organize, summarize, and interpret numerical data" (Weiten, 1998, p. 53). With the use of statistics researchers can assess whether their predictions (hypotheses) were or were not supported by the data gathered.

Reporting the findings

Researchers share their findings with other scientists and with the general public. This serves two purposes. First, it informs others about what was found. Second, it allows others to comment on the research. Were its methods sound? Did its conclusions go beyond the established facts? What new questions does it raise for science? How might we use this information to better our lives?

Typically, researchers prepare a written report that is submitted to a journal in the appropriate area of research. For example, if our survey examined the attitudes of mothers and fathers on child care issues, then a journal in Developmental Psychology would be more appropriate than one in Clinical Psychology. Many journals that publish scientific research in psychology (as in most other disciplines) are "refereed journals." This means that before the report is published, experts in the area review the study. They consider the appropriateness of the article for the particular journal, the importance of the issue, and whether there are flaws in the study's design or analysis.

Experimental Designs: Independent & Dependent Variables

Example 1

Dr. Imanut wants to examine whether a new drug increases the maze running performance of older rats. Just like aging humans, older rats show signs of poorer memory for new things. Dr. Imanut teaches two groups of older rats to find a piece of tasty rat chow in the maze. One group of rats is given the new drug while they are learning the maze. The second group is not given the drug.
One week after the rats have learned the maze, he retests them and records how long it takes them to find the rat chow.

What is the independent variable? Hint: What did the researcher manipulate (allow to vary) in this study?
a) age of the rats.
b) type of maze.
c) length of time it took the rats to run the maze.
d) presence or absence of the new drug.

What is the dependent variable? Hint: What was the measure of the research subjects' responses?
a) age of the rats.
b) type of maze.
c) length of time it took the rats to run the maze.
d) presence or absence of the new drug.

Example 2

A researcher wanted to study the effects of sleep deprivation on physical coordination. The researcher selected 25-year-old male college students and deprived some of the subjects of sleep for either 24, 36, or 45 hours.

In the present study the independent variable was:
a) the length of time the subjects were deprived of sleep.
b) the age of the subjects.
c) the gender of the subjects.
d) the physical coordination skills of the subjects.

In the present study the dependent variable was:
a) the length of time the subjects were deprived of sleep.
b) the age of the subjects.
c) the gender of the subjects.
d) the physical coordination skills of the subjects.

Example 3

A researcher wanted to know whether the number of people present would influence subjects' judgments on a simple perceptual task. In each case the other members of the group gave an incorrect answer. The researcher then noted whether the subject conformed to the group decision.

In the present study the independent variable was:
a) the number of people in the group.
b) whether the group members gave the correct or incorrect answer.
c) whether the subjects conformed with the group.
d) the type of perceptual task.

In the present study the dependent variable was:
a) the number of people in the group.
b) whether the group members gave the correct or incorrect answer.
c) whether the subjects conformed with the group.
d) the type of perceptual task.

Example 4

An investigator had 60 subjects watch a videotaped re-enactment of a bank robbery. Half of the subjects were asked by a police investigator to recall the event, while the remaining subjects were interviewed by a police investigator while they were hypnotized.

In the present study the independent variable was:
a) whether a police investigator was used.
b) whether subjects were hypnotized.
c) how much subjects recalled.
d) what subjects watched.

In the present study the dependent variable was:
a) whether a police investigator was used.
b) whether subjects were hypnotized.
c) how much subjects recalled.
d) what subjects watched.

Control & Experimental Groups

In an experiment, researchers are typically concerned about the performance of subjects in the experimental group. If a researcher wants to know if a new drug helps improve memory, the researcher is most interested in how people who are given the drug perform on the memory test. However, in order to conclude that the drug "improves" memory, people who take it must perform better than those who do not take the drug. The CONTROL GROUP serves as the BASELINE performance. The group given the drug serves as the EXPERIMENTAL GROUP.

Confounding/Extraneous Variables

In order to isolate the effect of the independent variable on the dependent variable, researchers must rule out alternative explanations. In other words, only the independent variable can be allowed to vary. The term CONFOUNDING/EXTRANEOUS VARIABLE is used to refer to any other factor that might affect the dependent variable.

Try the following exercise to see if you can spot potential problems in these hypothetical research studies.

EXAMPLE 1

A researcher wanted to assess whether mood influenced people's memory. The researcher hypothesized that positive moods would lead to greater memory performance than would a negative mood state.
On Monday the researcher had 50 subjects learn a list of nonsense syllables and then watch a very humorous comedy film. Their recall of the list of syllables was then assessed. On Tuesday the researcher had a second group of 50 subjects learn the same list of nonsense syllables and then watch an upsetting documentary on World War II. Their recall of the list was assessed after they had watched the film.

EXAMPLE 2

A researcher wanted to see whether a new way of teaching English was superior to a more traditional approach. The researcher selected two Thursday night classes at a local community college. In one class the instructor used a traditional method; the second instructor used the newer approach. The researcher then assessed students' language ability after they had completed the program.

How Researchers Control Sources of Error

To control for potential extraneous variables and other sources of error researchers use:

1. A standardized set of procedures
2. Equivalent Control and Experimental Groups

Standardized procedures means that subjects are treated the same way in all regards except for the independent variable(s).

Researchers also need to ensure that the control and experimental groups are similar on important variables at the outset. To do this researchers can use one of three methods.

1. Use the same subjects in both the control and experimental groups. (This is called a repeated measures design.)
2. Match subjects on important variables (e.g., for every 20-year-old female in the control group there is a 20-year-old female in the experimental group).
3. Random assignment. (Let chance decide who gets placed into which group. Thus, each subject has an equal chance of being placed in either group.)

Why three methods?

Sometimes we cannot use the same subjects in both the control and experimental groups. Sometimes having been in one of the conditions alters the subjects' behavior.
This change may carry over to the next condition and thus serve as an extraneous variable. For example, a researcher wants to study whether a new drug is better than an old drug at reducing anxiety symptoms. If we gave the old drug to the subjects, assessed them, and then gave the new drug, there might still be carry-over effects from the old drug. Thus, we might want to use two different groups of people who suffer from anxiety.

We could match our subjects on important variables such as age, gender, and severity of symptoms. Thus, for every 40-year-old male with mild symptoms in the old drug (control) group there is a similar subject in the new drug (experimental) group. However, in finding perfect matches for our subjects we might have to go through many people. This is not very resource efficient. In order to find 50 people who are perfect matches for another group of 50 we might have to go through a few hundred potential subjects.

A simpler method is random assignment. We let chance determine who is in the control and experimental groups. With a large enough sample of subjects it is highly unlikely that the majority of people with severe symptoms would be in one group.

Advantages & Disadvantages of the Experiment

The advantage of the experimental approach is that it allows investigators enough control to examine cause and effect relationships. Experiments allow us to answer "what causes something to occur?" This is the second goal of science, understanding and prediction.

However, this degree of control can also be a potential weakness for experiments. By controlling features of the subjects' environment the researcher may create too artificial an environment. This means that while the researcher may have accurately understood the "cause" of the subjects' behavior, the findings may apply only under such rigid, artificial conditions and so have limited use in explaining real-world behavior.
(The desire of psychology is to understand this real-world behavior too.)

A second weakness for experiments is that some questions cannot be studied using an experiment, for ethical or technical reasons. An important question is whether people who have had less optimal rearing experiences, such as poverty or abuse, continue to have difficulties in their adult years because of this poor rearing. Yet, we cannot place children in abusive environments just to see if it "causes" damage that persists into adulthood. Thus, we use other research methods, such as correlational studies. We might see whether there is a relationship between childhood poverty or abuse and psychological and behavioral problems of adults, by asking adults about their childhood experiences and their life as an adult.

Descriptive & Correlational Designs

These designs allow us to fulfill the first goal of science, and to isolate possible causes for experiments to then assess. Remember, only experiments can assess cause and effect. No matter how convincing data from descriptive and correlational studies may sound, because they have less control over the variables and the environments that they study, non-experimental designs cannot rule out extraneous variables as the cause of what is being observed.

There are many types of non-experimental methods. We will focus on three approaches:

1. Case Study
2. Naturalistic Observation
3. Survey

CASE STUDIES involve in-depth examination of a single person or a few people. This approach is frequently employed in clinical psychology. Typically the individual or small group of individuals being examined possesses some skill, or has some problem, that is unusual.

STRENGTH: Such cases can expand our knowledge about the variations in human behavior. While most researchers are interested in what is the "general" trend in behavior, those using a case study approach highlight individuality. Considerable information is gathered.
Thus, the conclusions drawn are based on a more complete set of information about the subjects.

WEAKNESS: Despite their strengths, case studies have some very big drawbacks. First, like all non-experimental approaches, they merely describe what is occurring, but cannot tell us "why" it is occurring. Second, there is considerable room in case studies for "researcher bias" to creep in. While no approach, including the experiment, is immune from researcher bias when in the hands of an incompetent or poorly trained researcher, some approaches are at greater risk for this problem even when conducted by capable people.

Why is the case study more at risk? The case study method involves considerably more interaction between the researcher and the subjects than most other research methods. In addition, the data come from the researcher's own journals about his or her subjects. While this might be supplemented by test scores and more objective measures, it is the researcher who brings all this together in the form of a descriptive "case study" of the individual(s) in question.

A final problem with case studies is that the small number of cases examined makes it unlikely that they represent others who have similar problems or abilities. This means we might not be able to generalize (apply) the study's findings to other people with similar problems. Thus, a case study of a single person with schizophrenia is unlikely to be representative of all people who suffer from this disorder.

NATURALISTIC OBSERVATION studies, as their name implies, observe organisms in their natural settings. A researcher who wants to examine aggressive behavior in male and female youngsters may watch children in the school playground, and record the number of aggressive acts boys and girls display.
STRENGTH: The behavior of the subjects is likely to reflect their true behavior, as it takes place in a natural setting where they do not realize that they are being observed.

WEAKNESS: The researcher has no control over the setting. For example, in our playground study, more than a child's gender may be affecting the child's aggressive behavior. In addition, subjects may not have an opportunity to display the behavior the researcher is trying to observe because of factors beyond the researcher's control. For example, some of the children who are usually the most aggressive may be absent from school that day, or in detention because of previous misconduct, and so are not in the sample of children on the playground. Finally, the topics of study are limited to people's overt behavior. A researcher cannot study topics like attitudes or thoughts using a naturalistic observation study.

SURVEY studies ask large numbers of people questions about their behaviors, attitudes, and opinions. Some surveys merely describe what people say they think and do. Other survey studies attempt to find relationships between the characteristics of the respondents and their reported behaviors and opinions. For example, is there a relationship between gender and people's attitudes about some social issue? When surveys have this second purpose we refer to them as CORRELATIONAL STUDIES.

STRENGTH: Surveys allow us to gather information from large groups of people. Surveys also allow us to assess a wider variety of behaviors than can be studied in a naturalistic observation study.

WEAKNESS: Surveys require that the subjects understand the language. Thus, some members of the population may be excluded from survey research. Surveys also rely heavily on subjects' memory and honesty.

CORRELATIONAL STUDIES

Correlational studies look for relationships between variables. Do people who experience divorce have more psychological problems?
Do children who come from economically advantaged families perform better academically? In each case we are asking: is there a relationship between variable X and variable Y?

Correlational studies only tell us that there is a relationship between the two variables. They do not tell us which variable "caused" the other. For example, a researcher measures people's marital status and their psychological adjustment and finds that there is a correlation between the two variables. More people who are no longer married report experiencing psychological problems. It might be tempting to conclude that the stress of experiencing a divorce causes depression and anxiety. However, it is also plausible that people who suffer from psychological problems are harder for partners to live with, and thus more likely to have their marriage end in divorce. The researcher would need to determine which variable came first, the marital breakup or the psychological problems.

Establishing Causality

In order to establish causality we need three things:

1. Correlation. There must be a relationship between the two variables.
2. Time order. The presumed cause must come before the presumed effect.
3. Alternative explanations must be ruled out.

Correlational studies give us the first thing. Certain studies, if they follow subjects over a period of time, may provide us with the second. But correlational studies have less control over the subjects' environment and thus have difficulty ruling out alternative explanations.

Correlation

Some studies are interested in whether two variables are related to each other. Is there a relationship between birth order and IQ scores? Is there a relationship between socioeconomic status (SES) and health?

The CORRELATION COEFFICIENT is a statistic that shows the strength of the relationship between the two variables. The correlation coefficient falls between -1.00 and +1.00.
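As a rough illustration of where such a number comes from, the sketch below computes one common correlation coefficient (Pearson's r) by hand. This is an assumption on our part: the handout does not say which coefficient it has in mind. The function name `pearson_r` and the three small data lists are made up for this example and are not data from any study described here.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length lists of numbers."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of the same length (at least 2 values)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of the products of paired deviations from each mean.
    sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations for each variable.
    sum_xx = sum((x - mean_x) ** 2 for x in xs)
    sum_yy = sum((y - mean_y) ** 2 for y in ys)
    if sum_xx == 0 or sum_yy == 0:
        raise ValueError("correlation is undefined when a variable never varies")
    return sum_xy / math.sqrt(sum_xx * sum_yy)

# Hypothetical data for five students (invented for illustration only).
hours_studied = [1, 2, 3, 4, 5]
grades = [55, 60, 70, 72, 85]
classes_missed = [9, 7, 6, 3, 1]

r_study = pearson_r(hours_studied, grades)     # grades rise with study time
r_absent = pearson_r(classes_missed, grades)   # grades fall as absences rise
print(round(r_study, 2), round(r_absent, 2))
```

With these invented numbers the first coefficient comes out close to +1.00 and the second close to -1.00, so the result always stays inside the range given above.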
The statistic shows both the STRENGTH of the relationship between the variables, and the DIRECTION of the relationship. The numerical value indicates the strength of the relationship. The sign in front of the numerical value indicates the direction of the relationship. Let us consider each of these in more detail.

THE NUMERICAL VALUE: Correlation coefficient values that are close to zero (e.g., -.13, +.08) suggest that there is little or no relationship between the two variables. The closer the correlation is to -1.00 or +1.00 (e.g., -.97, +.83) the stronger the relationship between the two variables. Thus, we might expect that there would be no relationship between the height of college students and their SAT scores, and we would be correct. The correlation coefficient is very close to zero. However, we might expect a correlation between adult height and weight to be stronger, and again we would be correct.

THE SIGN: The sign of the correlation coefficient tells us whether these two variables are directly related or inversely related.

Do the two variables increase and decrease in the same direction? The more time a student spends studying the better their grade; the less time spent studying the lower the grade. Notice how both study time and grade vary in the same direction. As studying increases grades increase, and when studying decreases grades decline. Grade and study time would be POSITIVELY correlated. The term POSITIVE does not necessarily mean it's a good thing (when is getting a poor grade a "good" thing!). It simply means that there is a direct relationship; the variables are varying (changing) in the same direction.

Do the two variables vary in opposing directions? As the number of children in a family increases, the IQ scores of the children decrease. Thus, family size and children's IQ scores vary in opposite directions. As family size increases IQ scores decline; as family size decreases IQ scores increase.
IQ and family size are NEGATIVELY correlated (inversely related).

Try the following exercise to see if you understand the concept of correlation.

INSTRUCTIONS

Read each of the descriptions below. Then determine whether the relationship described suggests a positive or negative correlation (the section on "statistics: correlation" will review what is meant by Positive and Negative correlation). Then consider why we might find this relationship. The more you think about the correlation suggested, the more possible explanations for this relationship you are likely to find. This highlights why causality cannot be established through correlational research. (The section on correlational studies reviews this idea.)

Example 1: A researcher finds that students who have more absences get poorer grades (that is, students who attend fewer classes get poorer grades).
Is this a positive or negative correlation?
Why might we find a relationship between attendance and grades?

Example 2: Cities with more stores selling pornography have higher rates of violence.
Is this a positive or negative correlation?
Why might we find a relationship between the number of such stores and rates of violence?

Example 3: The longer couples have been together the more similar they are in their attitudes and opinions.
Is this a positive or negative correlation?
Why might we find a relationship between time together and similarity of attitudes?

Moral of the Lesson: In each case above there was more than one explanation for why we might find the relationship between the variables. Since we cannot rule out these alternative explanations, we cannot conclude that changes in one variable "caused" changes in the other variable.
The snappy phrase to express this idea is: CORRELATION does not equal CAUSATION

Inferential Statistics

Inferential Statistics allow researchers to draw conclusions (inferences) from the data. There are several types of inferential statistics. The choice of statistic depends on the nature of the study. Covering the different procedures used is beyond the scope of this course. However, understanding why they are used is important.

A researcher asks two groups of children to complete a personality test. The researcher then wants to know whether the males scored differently than the females on certain measures of personality. We will create a fictitious personality trait "Q." Here are the scores for the girls and the boys:

Girls          Boys
23             37
40             56
37             18
41             41
41             42
33             38
28             50
25             22
24             33
13             47
28             25
44             46
Mean = 31.42   Mean = 37.92
SD = 9.03      SD = 11.14

The mean score for the "Q" trait in boys was higher than the mean score for "Q" in the girls. But notice how within the two groups there was considerable fluctuation. By "chance" alone we might have obtained these different values. Thus, in order to conclude that "Q" shows a gender difference, we need to rule out that these differences were just a fluke. This is where inferential statistics come into play.

An important concept in inferential statistics is STATISTICAL SIGNIFICANCE. When an inferential statistic reveals a statistically significant result, the differences between the groups are unlikely to be due to chance. Thus, we can rule out chance with a certain degree of confidence. When the results of the inferential statistic are not statistically significant, chance could still be a reason why we obtained the observations that we did.

In the example above we would use an inferential statistic called a T-TEST. The t-test is used when we are comparing TWO groups. In this instance the t-test does not yield a statistically significant difference.
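The result for the "Q" scores can be checked by hand. The sketch below is only an illustration under stated assumptions, not necessarily the procedure behind the handout's result: it assumes an independent-samples t-test with pooled variance (the handout does not say which version was used), it reads the score table as alternating girl and boy values (which reproduces the reported means), and it compares the result with 2.074, the standard two-tailed critical value of t at the .05 level for 22 degrees of freedom. The function name `pooled_t` is hypothetical.

```python
import math

girls = [23, 40, 37, 41, 41, 33, 28, 25, 24, 13, 28, 44]
boys = [37, 56, 18, 41, 42, 38, 50, 22, 33, 47, 25, 46]

def pooled_t(a, b):
    """Independent-samples t statistic with pooled variance; returns (t, df)."""
    n_a, n_b = len(a), len(b)
    mean_a, mean_b = sum(a) / n_a, sum(b) / n_b
    # Sum of squared deviations from the mean within each group.
    ss_a = sum((x - mean_a) ** 2 for x in a)
    ss_b = sum((x - mean_b) ** 2 for x in b)
    df = n_a + n_b - 2
    pooled_var = (ss_a + ss_b) / df
    # Standard error of the difference between the two group means.
    std_err = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean_a - mean_b) / std_err, df

t_value, df = pooled_t(boys, girls)
T_CRITICAL = 2.074  # assumed cutoff: two-tailed, alpha = .05, df = 22 (t table)
significant = abs(t_value) > T_CRITICAL
print(f"t({df}) = {t_value:.2f}, significant at .05: {significant}")
```

Under these assumptions t comes out at roughly 1.5, which is below the critical value, so the 6.5-point difference between the group means does not reach statistical significance.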
In other words, the differences between the scores for the boys and the scores for the girls are not large enough for us to rule out chance as a possible explanation. We therefore cannot conclude that there is a gender difference for our hypothetical "Q" trait.

Inferential statistics do not tell you whether your study is accurate or whether your findings are important. Statistics cannot make up for an ill-conceived study or theory. They simply assess whether we can rule out the first "extraneous" variable of all research, CHANCE.

Statistics

Statistics are used to organize, summarize, and interpret empirical data. Descriptive Statistics help us to organize and summarize the data. Inferential Statistics help us to interpret the data gathered.

Organizing the Data

Data can be organized using frequency counts and graphs to visually structure the data set. For example, a researcher tallies the following scores on a memory test gathered from 23 subjects.

5 13 11 12 12 11 11 12 8 12 8 11 10 13 7 12 7 5 7 12 9 11 14

Arranged in this way the data set is very confusing. However, we could group the numbers in a frequency count. The data above range from 5 to 14.

Score   Tally     Frequency
 5      xx        2
 6                0
 7      xxx       3
 8      xx        2
 9      x         1
10      x         1
11      xxxxx     5
12      xxxxxx    6
13      xx        2
14      x         1

We can see that the data are grouped more toward the higher end than the lower end, with almost half of the sample scoring 11 or 12.

Summarizing the Data

While a frequency table like the one above helps us to make some sense out of the numbers, it would be nice if we could somehow summarize the scores of 23 subjects with a single score. Two kinds of summary are used:

1. Measures of Central Tendency
2. Measures of Variability

Measures of Central Tendency

There are three measures of central tendency: MEAN, MEDIAN, MODE. Each is a single score used to represent a group of scores. As their collective name suggests, they are looking for the most "central" or typical response.

The MEAN is the arithmetic average.
Sum the data points in the above example and divide by the number of data points.

233 / 23 = 10.13

The MEDIAN is the exact midpoint of the data set. To calculate the median you place the numbers in order.

5 5 7 7 7 8 8 9 10 11 11 11 11 11 12 12 12 12 12 12 13 13 14

The midpoint is the observation that is in the middle of the set. As there were 23 people this would be the 12th data point, which is 11.

What if there had been 24 people? With an even number of scores, the median is the mean of the middle two points. In our example above it would still be 11.

The MODE is the most frequent score in the data set. In the example above this is 12. Six people scored 12 on the memory test.

Why Three Different Types of Measures?

MODE: Survey researchers are often interested in what is the most common score, and thus the MODE is the measure of choice. If I asked students which they prefer, COKE or PEPSI, the mean and median scores are meaningless. What I want to know is which soda is preferred. Thus, I want the modal response.

MEDIAN: Notice in our memory score example the MEAN, MEDIAN, and MODE were not identical. The mean was a little over 10, the median 11, and the mode 12. Why? In our example, the data were SKEWED.

What is "Skewed"? It means that many scores were bunched at one end, with a few scores at the other end of the scale. In what is called a NORMAL DISTRIBUTION the data points are symmetric and the mean, median, and mode are the same value. In our example the data were not strongly skewed, because the three values were at least close together. But consider the following data set.

 0  xxxxxxx
 1  xxxxx
 2  xxxx
 3  xx
 4  x
 5
 6
 7  x
 8
 9
10  xxx

Many scores bunch between 0 and 4, with a few trailing off at 7-10. The MEAN is 60 / 23 = 2.6. The MEDIAN is 1. As you can see from this example, the MEAN is strongly influenced by the more extreme (atypical) scores. When the data are very skewed, the MEAN can be a poor representative of the data set, while the MEDIAN is far less affected by extreme values.

MEAN: The mean is the preferred choice, especially when the data are not highly skewed. The mean is used in the calculation of most Inferential Statistics and is used to calculate variability.

Try the following exercise to see if you understand the concepts of mean, median, and mode.

Exercise - Measures of Central Tendency

1. The measure that is most commonly used by researchers because it is used to calculate inferential statistics is: _____________________________
2. The measure that is least affected by extreme scores is: _________________________
3. The Mode of the following set of data (5 6 6 7 8 8 8 9) is: _______________________
4. The Mean of the data set is: __________________________
5. The Median is: __________________________________

Measures of Variability

The measures of central tendency summarize the data in terms of a single number, but not all scores in the data set reflect that value. Measures of variability allow us to assess how much the scores in the data differ from each other.

The simplest measure of variability is the RANGE (highest and lowest score). However, the range is also strongly influenced by more extreme scores.

0 1 1 1 2 2 2 3 5 12

The range is 0-12. But most scores are really 0-5; thus, the range can be misleading.

Another method looks at how far each observation falls from the mean of the data set. This is the basis of the STANDARD DEVIATION. Below is an example of how to calculate the standard deviation of the data set above. The mean in the above data set is 29 / 10 = 2.9. We will subtract this value from each and every score in our data set.
0 1 1 1 2 2 2 3 5 12 - -2.92 -1.92 -1.92 -1.92 -0.92 -0.92 -0.92 0.12 2.12 9.12 2.9 2.9 2.9 2.9 2.9 2.9 2.9 2.9 2.9 2.9 = = = = = = = = = = -2.9 -1.9 -1.9 -1.9 -0.9 -0.9 -0.9 0.1 2.1 9.1 = 8.41 3.61 = 3.61 = 3.61 = 0.81 = 0.81 = 0.81 = 0.01 = 4.41 = 82.81 If we were to sum up the values in the final column we would have a score of zero. In addition, working with negative numbers is always annoying, so if we square the values in the last column all numbers will end up as positive values (a negative value squared ends up as a positive value). The sum of the final column is 108.9. We need to calculate the Mean of this score (108.9 / 10). This value is 10.89. This is the variance of the data, but as it is based on squared values, it is also a squared value. If the values above were people's reaction times, the value of 10.89 would be in squared reaction times, which is hard to comprehend. Thus, we square root this value. The standard deviation is 3.3. = The larger the standard deviation the greater the degree of variability in the data set. Thus, let us compare the following two groups of responses. 2224 4 5 5 6 1 1 1 2367 9 Group 1 mean = 3.75 Group 2 mean = 3.75 Standard deviation (SD)= 1.48 Standard deviation (SD)= 2.95 Groups 1 and 2 have the same mean score. However, the scores in group 2 are more variable. The SD value reflects this greater variation of the individual scores from the mean. Correlation Some studies are interested in whether two variables are related to each other. Is there a relationship between birth order and IQ scores? Is there a relationship between socioeconomic status (SES) and health? The CORRELATION COEFFICIENT is a statistic that shows the strength of the relationship between the two variables. The correlation coefficient falls between -1.00 and +1.00. The statistic shows both the STRENGTH of the relationship between the variables, and the DIRECTION of the relationship. The numerical value indicates the strength of the relationship. 
The sign in front of the numerical value indicates the direction of the relationship. Let us consider each of these in more detail.

THE NUMERICAL VALUE: Correlation coefficient values that are close to zero (e.g., -.13, +.08) suggest that there is little or no relationship between the two variables. The closer the correlation is to one in either direction (e.g., -.97, +.83), the stronger the relationship between the two variables. Thus, we might expect that there would be no relationship between the height of college students and their SAT scores, and we would be correct. The correlation coefficient is very close to zero. However, we might expect a correlation between adult height and weight to be stronger, and again we would be correct.

THE SIGN: The sign of the correlation coefficient tells us whether these two variables are directly related or inversely related.

Do the two variables increase and decrease in the same direction? The more time a student spends studying the better their grade; the less time spent studying the lower the grade. Notice how both study time and grade vary in the same direction. As studying increases grades increase, and when studying decreases grades decline. Grade and study time would be POSITIVELY correlated. The term POSITIVE does not necessarily mean it's a good thing (when is getting a poor grade a "good" thing!). It simply means that there is a direct relationship: the variables are varying (changing) in the same direction.

Do the two variables vary in opposing directions? As the number of children in a family increases, the IQ scores of the children tend to be lower. Thus, family size and children's IQ scores vary in opposite directions. As family size increases the IQ scores decline; as family size decreases IQ scores increase. IQ and family size are NEGATIVELY correlated (inversely related).

Inferential Statistics

Inferential Statistics allow researchers to draw conclusions (inferences) from the data.
There are several types of inferential statistics. The choice of statistic depends on the nature of the study. Covering the different procedures used is beyond the scope of this course. However, understanding why they are used is important.

A researcher asks two groups of children to complete a personality test. The researcher then wants to know whether the males scored differently than the females on certain measures of personality. We will create a fictitious personality trait "QZ." Here are the scores for the girls and the boys:

        Girls    Boys
         23       37
         40       56
         37       18
         41       41
         41       42
         33       38
         28       50
         25       22
         24       33
         13       47
         28       25
         44       46
Mean    31.42    37.92
SD       9.03    11.14

The mean score for the "QZ" trait in boys was higher than the mean score for "QZ" in the girls. But notice how within the two groups there was considerable fluctuation. By "chance" alone we might have obtained these different values. Thus, in order to conclude that "QZ" shows a gender difference, we need to rule out that these differences were just a fluke. This is where inferential statistics come into play.

An important concept in inferential statistics is STATISTICAL SIGNIFICANCE. When an inferential statistic reveals a statistically significant result, the differences between the groups were unlikely to be due to chance. Thus, we can rule out chance with a certain degree of confidence. When the results of the inferential statistic are not statistically significant, chance could still be a reason why we obtained the observations that we did.

In the example above we would use an inferential statistic called a T-TEST. The t-test is used when we are comparing TWO groups. In this instance the t-test does not yield a statistically significant difference. In other words, the differences between the scores for the boys and the scores for the girls are not large enough for us to rule out chance as a possible explanation.
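The "QZ" comparison can be worked through by hand or in a few lines of code. The sketch below is only one way to do it: it assumes the pooled-variance (equal variances) form of the independent-samples t-test, a two-tailed test at the .05 level, and a critical value of 2.074 for 22 degrees of freedom taken from a standard t table. The handout does not say which form of the t-test was used, and the function and variable names are illustrative.

```python
import math

# QZ scores copied from the girls/boys table above.
girls = [23, 40, 37, 41, 41, 33, 28, 25, 24, 13, 28, 44]
boys = [37, 56, 18, 41, 42, 38, 50, 22, 33, 47, 25, 46]


def mean(scores):
    return sum(scores) / len(scores)


def sum_sq_dev(scores):
    """Sum of squared deviations from the group mean."""
    m = mean(scores)
    return sum((x - m) ** 2 for x in scores)


def population_sd(scores):
    """SD computed by dividing by N, as in the handout's worked example."""
    return math.sqrt(sum_sq_dev(scores) / len(scores))


def pooled_t(group_a, group_b):
    """Independent-samples t statistic (equal variances assumed) and its df."""
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2
    pooled_var = (sum_sq_dev(group_a) + sum_sq_dev(group_b)) / df
    std_err = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(group_a) - mean(group_b)) / std_err, df


t_stat, df = pooled_t(boys, girls)

# Assumption: two-tailed test at alpha = .05; critical t for df = 22
# is taken from a standard t table.
T_CRITICAL = 2.074
significant = abs(t_stat) > T_CRITICAL

print(f"Girls: mean = {mean(girls):.2f}, SD = {population_sd(girls):.2f}")
print(f"Boys:  mean = {mean(boys):.2f}, SD = {population_sd(boys):.2f}")
print(f"t({df}) = {t_stat:.2f}, significant at .05: {significant}")
```

Under these assumptions t comes out at roughly 1.5, which is below the critical value, so the script agrees with the text: the difference of about 6.5 points between the group means is not large enough, given the spread within each group, to rule out chance.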
We would have to conclude then that we have no evidence of a gender difference for our hypothetical "QZ" trait.

Inferential statistics do not tell you whether your study is accurate or whether your findings are important. Statistics cannot make up for an ill-conceived study or theory. They simply assess whether we can rule out the first "extraneous" variable of all research, CHANCE.

How do we know if our research has concluded anything of value?

Tests of statistical significance determine if the difference is too big to be due to chance alone. The tests look at two factors:

1. They look at the size of the difference. The bigger the difference between the groups, the more likely the results are to be statistically significant. For example, if the experimental group averages 95% and the control group averages 45% on our test, that difference would probably be statistically significant. (Intuitively, you do the same thing. If your team gets beaten by one point, you point out that the other team was lucky. You don't have to concede that the other team is better. However, if they beat your team by 30 points, you may have to admit that the other team is better.)

2. They look at the number of participants. The more participants that are used, the more likely the results are to be statistically significant. (Why? Because if you only have a few participants, the groups might be very different at the beginning of the study. However, if you have 100 participants in each group, the groups should be pretty similar before the start of the study. If they are very similar at the start, then, if they are even slightly different at the end, that difference could be due to the treatment. Similarly, in sports, if one team beats another in a seven-game series, that's more convincing evidence of the team's superiority than winning a single game.)

Two possible verdicts from statistical tests
1.
statistically significant: you are sure beyond a reasonable doubt (your doubt is less than 5%, that is, p < .05) that the difference between your groups is too big to be due to chance alone. So, if the difference between the treatment group and the no-treatment group is too big to be due to chance alone, then some of that difference is probably due to treatment. In other words, the treatment probably had an effect.

2. not statistically significant: you are not sure, beyond a reasonable doubt, that the difference between the groups is due to anything more than just chance. So, you can't conclude anything. The results are inconclusive.

T-test - What is it?

The t-test is used to determine whether there's a significant difference between two group means. It helps to answer the underlying question: do the two groups come from the same population, and only appear different because of chance errors, or is there some significant difference between these two groups, such that we can say that they're really from two entirely different populations? For example, if group 1 acts calmer after taking a new anxiety medication, what is the PROBABILITY that the calmness is because of the meds rather than due to chance (was it an accident)?

Three basic factors help determine whether an apparent difference between two groups is a true difference or just an error due to chance:
1. The larger the sample, the less likely that the difference is due to sampling errors or chance.
2. The larger the difference between the two means, the less likely the difference is due to sampling errors.
3. The smaller the variance among the participants, the less likely that the difference was created by sampling errors.

Reporting Data - When t is significant: basically, are your results due to the meds, or are your results due to chance?
** The difference between the means must be statistically significant for you to be able to claim that your experiment created change.

The Z-test, similar to the t-test, is a statistical test used in inference which determines if the difference between a sample mean and the population mean is large enough to be statistically significant, that is, if it is unlikely to have occurred by chance. The Z-test is used primarily with standardized testing to determine if the test scores of a particular sample of test takers are within or outside of the standard performance of test takers.

Definition of a P value

Consider an experiment where you've measured values in two samples, and the means are different. How sure are you that the population means are different as well? There are two possibilities:
1. The populations have different means.
2. The populations have the same mean, and the difference you observed is a coincidence of random sampling.

The P value is a probability, with a value ranging from zero to one. It is the answer to this question: If the populations really have the same mean overall, what is the probability that random sampling would lead to a difference between sample means as large as (or larger than) you observed?

How are P values calculated? There are many methods, and you'll need to read a statistics text, and take some Tylenol, to learn about them. The choice of statistical tests depends on how you express the results of an experiment (measurement, survival time, proportion, etc.), on whether the treatment groups are paired, and on whether you are willing to assume that measured values follow a Gaussian bell-shaped distribution.

We use the t-test and the Z-test to determine PROBABILITY levels in your experiment.

Common misinterpretation of a P value

Many people misunderstand what question a P value answers.
If the P value is 0.03, that means that there is a 3% chance of observing a difference as large as you observed even if the two population means are identical. It is tempting to conclude, therefore, that there is a 97% chance that the difference you observed reflects a real difference between populations and a 3% chance that the difference is due to chance. Wrong. What you can say is that random sampling from identical populations would lead to a difference smaller than you observed in 97% of experiments and larger than you observed in 3% of experiments.

You have to choose. Would you rather believe in a 3% coincidence? Or that the population means are really different?

"Extremely significant" results

Intuitively, you probably think that P=0.0001 is more statistically significant than P=0.04. Using strict definitions, this is not correct. Once you have set a threshold P/alpha value for statistical significance, every result is either statistically significant or is not statistically significant. Some statisticians feel very strongly about this. Many scientists are not so rigid, and refer to results as being "very significant" or "extremely significant" when the P value is tiny.

Often, results are flagged with a single asterisk when the P value is less than 0.05, with two asterisks when the P value is less than 0.01, and three asterisks when the P value is less than 0.001.

P < .05: results are significant
P < .01: results are very significant
P < .001: results are extremely significant

Statistical hypothesis testing

The P value is a fraction. In many situations, the best thing to do is report that number to summarize the results of a comparison. When you need to make a decision based on a single comparison, follow the steps of statistical hypothesis testing:

1. Set a threshold P value (also called the alpha for significance) before you do the experiment. Traditionally, 0.05 is the threshold for significance.
2. Define the null hypothesis. If you are comparing two means, the null hypothesis is that the two populations have the same mean.
3.
Do the appropriate statistical test to compute the P value.
4. Compare the P value to the preset threshold value. If the P value is less than the threshold, state that you "reject the null hypothesis" and that the difference is "statistically significant". If the P value is greater than the threshold, state that you "do not reject the null hypothesis" and that the difference is "not statistically significant".

Answers Only

Answers:

1. Is the content of dreams the result of unconscious motives and desires?
While this was the assumption that Freud made, one of the reasons his theory was challenged was because it was unscientific. We have no objective measure of the "unconscious" to even establish that it exists. So science cannot answer this question at present.

2. Is there any relationship between birth order and personality?
Birth order is fairly straightforward to measure; personality would have to be more precisely defined. However, science can assess whether birth order is related to certain personality characteristics. Asking whether it "causes" such characteristics would be more difficult to assess as there are far too many uncontrolled factors.

3. Do college students consume more pizza than any other group?
While we would have to be more precise in defining who are the college students and who are the other groups, science could assess this more precisely defined question.

4. Can humans be innately evil?
No, science cannot answer whether humans are innately evil. How do you measure "evil"? This is too much of a value judgment, perhaps best left to philosophy and theology.

5. Do sales of pain relievers increase during periods of economic crisis?
While we would have to be more precise about which types of pain relievers and what would be defined as an economic crisis, science could assess this.

6. Do animals dream?
While most mammals do experience REM sleep, we cannot ask Fido and Fluffy what they were experiencing.
We need more objective measures. Science has the same problem with answering whether human infants and fetuses dream. Both groups experience REM sleep (in fact, 50% or more of their sleep time is spent in REM), but we cannot ask either what they were experiencing. Thus, at present science cannot answer this question.

IV and DV Answers

Example 1
D) Independent variable was the presence or absence of the drug. This was the variable being manipulated by the researcher.
C) Dependent variable was the length of time it took the rats to remember where the rat chow was after one week. This was the measure of the subjects' response.

Example 2
Independent variable was the length of time the subjects were sleep deprived.
Dependent variable was the physical coordination skills of the subjects.

Example 3
Independent variable was the number of people in the group.
Dependent variable was whether the subjects conformed with the group.

Example 4
Independent variable was whether the subjects were hypnotized.
Dependent variable was how much subjects recalled.

Confounding Variables / Answers

EXAMPLE 1 ANSWER - In any study where different subjects are being used in the treatment groups it is important that you establish that the groups are the same at the outset. Only then can any differences found at the end be attributed to your manipulation and not to preexisting differences. There is no mention of a pre-test of subjects' mood or of a check that subjects' moods had even been altered by watching the films (post-test). The researcher is assuming that watching a funny film would make someone happy, or that witnessing an upsetting film will produce a negative mood. This is not always a safe assumption, and should always be verified.

Second, the day of the week might be a possible confound. In this case, it might lead to fewer "happy" subjects in the "positive mood group" as this group was assessed on a MONDAY!

EXAMPLE 2 ANSWER - There are a couple of problems with this study.
Although the researcher did attempt to control potential extraneous variables (the subjects were all night students at the same community college), the experimental and control groups were not only taught using two different methods of instruction, but by two different instructors. One teacher may have been a much better instructor than the other. If this was the teacher who used the newer method, the students' superior performance may have had as much to do with the teacher as it did with the new approach.

A second problem is the students. The researcher did not establish the level of ability of the two classes at the outset. One class may have had students with a higher level of language skills than the other. The control and experimental groups should be very similar to one another at the outset. That way, any differences at the end cannot be attributed to preexisting differences.

CORRELATION SECTION

EXAMPLE 1 - ANSWER
NEGATIVE correlation: As the number of absences increases, the grade declines. The variables are changing in opposite directions. We might find this relationship for many reasons. (1) Students who are absent more often miss important pieces of information that would increase their chances of performing better in the class. (2) Students who have difficulty in a class may stop attending because they see no reason for going. Thus, does the greater class absence "cause" the poor grades, or do poor grades "cause" the greater absence? (3) A student who is not highly motivated may be absent more often and may do poorly. Thus, these two variables may both be related to other variables (such as motivation), which may be the real reason for the relationship between class absences and grades.

Example 2 Correlation answer - POSITIVE correlation: As the number of stores selling pornography increases, violence increases. Both variables are varying in the same direction. One possibility is that pornography (which often depicts violence as well as sexuality) "causes" aggression.
It is also possible that people who are aggressive are drawn to images that depict aggression. But there is likely a third factor in this case, the POPULATION of a CITY. As cities increase in population there is an increase in violence, and there is also an increase in the number of stores selling anything (pornography, cars, groceries).

Example 3 Correlation Answer - POSITIVE correlation: As the amount of time spent together increases, similarity increases. Both variables are varying in the same direction. This might happen because over time people influence each other. It might also occur because people who are most similar to each other to start with stay together longer. Thus, does the amount of time together "cause" the similarity, or does initial similarity make it more likely that people will stay together?

Measuring Answers: 1. Mean 2. Median 3. 8 4. 7.125 5. 7.5

Measures of Variation
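The central tendency answers above, and the standard deviation figures from the worked examples earlier in the handout, can be checked with a short script. This is only a sketch using Python's standard `statistics` module. It assumes the population formulas (dividing by N rather than N - 1), because that is what the worked example in the handout does. The variable names are illustrative.

```python
import statistics

# Data set from the Measures of Central Tendency exercise.
exercise = [5, 6, 6, 7, 8, 8, 8, 9]
exercise_mode = statistics.mode(exercise)
exercise_mean = statistics.mean(exercise)
exercise_median = statistics.median(exercise)

# Data set from the worked standard deviation example.
scores = [0, 1, 1, 1, 2, 2, 2, 3, 5, 12]
scores_variance = statistics.pvariance(scores)  # mean of squared deviations
scores_sd = statistics.pstdev(scores)           # square root of the variance

# The two groups that share a mean but differ in variability.
group_1 = [2, 2, 2, 4, 4, 5, 5, 6]
group_2 = [1, 1, 1, 2, 3, 6, 7, 9]

print("Mode:", exercise_mode)
print("Mean:", exercise_mean)
print("Median:", exercise_median)
print("Variance:", round(scores_variance, 2))
print("SD:", round(scores_sd, 2))
for name, group in (("Group 1", group_1), ("Group 2", group_2)):
    print(name, "mean:", statistics.mean(group),
          "SD:", round(statistics.pstdev(group), 2))
```

Swapping `pstdev` for `stdev` would divide by N - 1 instead, which gives slightly larger values than the ones quoted in the handout.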