Psychology 230: Statistics Lecture Notes

PLEASE NOTE: MONDAY'S EXAM WILL START PROMPTLY AT 5:30PM.

10/27/07
Announcements:
1. Review session today
2. Midterm 2 next Monday

Unit 2 started off by introducing you to hypothesis testing using z-scores (which can only be used when you're comparing a sample to a population that has a known mean and std dev). We then moved to the t-test, which can be used when you know the population mean but not the std dev. Today, we're looking at independent-samples t-tests, which are used when you don't know the population mean or std dev, and when you're comparing two samples in a between-subjects design.

Between-subjects design-- There are two treatment conditions in the experimental design and a completely different group of subjects in each condition. This is distinct from a within-subjects or repeated-measures design, in which the same subjects are used in both conditions.

The standard error of the mean is the average amount by which samples drawn from a population deviate from the population mean.

Four steps of hypothesis testing:
1. State the hypotheses. One hypothesis is the null hypothesis, which is that there is no difference between the population and sample means BECAUSE the sample is drawn from the population.
2. Set the criterion level for t. We pick a level of alpha such as .05. When we do that, we're saying that a t value so extreme that 5% or fewer of the samples drawn from the population could have that value is enough to reject the null hypothesis.
3. Collect data and compute t. We compute t and compare it to the critical value of t. If the computed t for our sample is BIGGER than the criterion t, then we can reject the null hypothesis.
4. Make a decision. Do we reject the null hypothesis? YES/NO. If the t we compute is bigger than the critical t, we can reject the null hypothesis. If it's smaller, then we can't.

The two differences between the single-sample t-test and the independent-measures t-test are:
1. The numerator (top) is a difference between two SAMPLE means, rather than the difference between a sample mean and a population mean.
2. The denominator (bottom) is the "standard error of the mean difference" as opposed to the "standard error of the mean."

How are these two terms different? The standard error of the mean difference is the average amount by which two sample means drawn from the same population differ from each other.

The independent-measures t-test can be used even with two samples of unequal size, for example n1=5 and n2=7. To compute the standard error of the mean difference, we need to add together the standard errors of the two samples, which are based on the sample variances. These variances are estimates of the population variance. Which sample will provide a better estimate of the population variance?

When you present the results of an independent-measures t-test in a publication, you will present it in this form: t(18) = 4.00, p<.05. What does that tell you? The 18 is the degrees of freedom, which tells you that 20 people were involved in the study (you just don't know how they were divided into samples). The 4.00 is the computed value of t for your results. The p<.05 tells you that the t value was BIGGER than the criterion t, allowing you to reject the null hypothesis and to claim a statistically significant difference between the two samples.

10/22/07
Announcements:
1. Review session next week after CH.10.
2. Midterm 2 in two weeks.
3. Study the answer key.

CH.9: Introduction to the t Statistic

Z-scores and t-tests are very similar statistics, but z-scores require us to know the population standard deviation, which is unrealistic most of the time. We can estimate the population SD using the sample SD, and we can therefore estimate the population standard error using the sample standard error. The t statistic has a distribution that is similar to the distribution of z-scores in samples drawn from a population.
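The two t statistics covered in these lectures (the single-sample t and the independent-measures t with pooled variance) can be sketched in a few lines of Python. This is a minimal illustration, not a full analysis routine; the sample data are made up for the example.

```python
import math

def single_sample_t(sample, mu):
    """Single-sample t: (sample mean - population mean) over the
    estimated standard error, with df = n - 1."""
    n = len(sample)
    m = sum(sample) / n
    ss = sum((x - m) ** 2 for x in sample)   # sum of squared deviations
    s2 = ss / (n - 1)                        # sample variance (unbiased)
    se = math.sqrt(s2 / n)                   # estimated standard error
    return (m - mu) / se, n - 1

def independent_t(sample1, sample2):
    """Independent-measures t: difference between two sample means
    over the standard error of the mean difference, using the
    pooled variance, with df = n1 + n2 - 2."""
    n1, n2 = len(sample1), len(sample2)
    m1 = sum(sample1) / n1
    m2 = sum(sample2) / n2
    ss1 = sum((x - m1) ** 2 for x in sample1)
    ss2 = sum((x - m2) ** 2 for x in sample2)
    df = n1 + n2 - 2
    pooled_var = (ss1 + ss2) / df
    # Standard error of the mean difference adds the two samples'
    # (pooled) variance contributions before taking the square root
    se_diff = math.sqrt(pooled_var / n1 + pooled_var / n2)
    return (m1 - m2) / se_diff, df

# Made-up samples with n1=5 and n2=7, as in the unequal-n example
t, df = independent_t([1, 2, 3, 4, 5], [3, 4, 5, 6, 7, 8, 9])
print(f"t({df}) = {t:.2f}")   # t(10) = -2.63
```

Note that df = n1 + n2 - 2 = 10 here, which is how a published t(18) tells you 20 people took part.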
But there is a difference: The t distribution is more spread out (see FIGURE 9.1 on p. 278). As sample size gets smaller, the variability among samples is going to be greater, and that's why the t distribution is more spread out.

t Distribution Table -- p. 703
As df approaches infinity, t becomes z. When df are small, the differences between sample and population means tend to be larger, and therefore the critical value of t is bigger than the critical value of z.

Hypothesis Testing with the t Statistic
Four steps:
1. State your hypotheses.
2. Locate the critical region (based on your alpha level).
3. Collect your sample data and compute the t statistic.
4. Evaluate the null hypothesis.

Learning Check #1 on p. 292

Assumptions that we make when we use the t statistic:
1. The values in the sample must consist of independent observations.
2. The population that is sampled must have a normal distribution.

10/15/07
Announcements: The answer keys are posted in the glass cases in the third-floor hallway of the Psychology Bldg.

CH.8: Introduction to Hypothesis Testing

Four Steps in Hypothesis Testing
1. State the hypotheses: The Research Hypothesis (H1) predicts that two groups will be different somehow. Usually, those groups differ in that one receives an experimental treatment and the other does not. The "treatment effect" is the difference in the means of the treatment and control groups. In statistics, we don't test the research hypothesis; instead, we test the Null Hypothesis (H0). This hypothesis says that there is NO difference between groups, that the treatment effect is NOT significant. We can never completely prove a hypothesis to be true. But it's very easy to disprove a hypothesis, because all it takes is one example to show that it's false. So, instead of trying to prove that H1 is true, we show that H0 is false.
2. Set the criteria for a decision. Alpha = .05 or Alpha = .01. Alpha is the probability of getting a result based on chance alone.
The larger the value of z you obtain, the less likely it is that your sample mean differs from the population mean purely by chance. The critical region consists of sample outcomes that are very unlikely to be obtained if the null hypothesis is true. The null hypothesis says that your sample mean does not differ significantly from the population mean.
3. Collect data and compute sample statistics. In this case, the statistic that we're using is the z-score.
4. Make a decision. That means answering the question: Can I reject H0? (Have I successfully disproved H0?)

As alpha gets smaller, the critical region gets smaller. So, for alpha = .05, the critical region is a larger part of the distribution than it is for alpha = .01. Rejecting the null hypothesis means that you have concluded that the treatment has a SIGNIFICANT effect.

Statistics is defined as logic in the face of uncertainty. In hypothesis testing, we always have uncertainty. Alpha is the probability of rejecting the null hypothesis when it is true.

Type I error-- The error we make when we reject a null hypothesis that is actually true (false positive). ALPHA IS THE PROBABILITY OF COMMITTING A TYPE I ERROR.
Type II error-- The error we make when we fail to reject a null hypothesis that is false (false negative). BETA IS THE PROBABILITY OF COMMITTING A TYPE II ERROR. A Type II error is more likely when the treatment effect is small or when the sample size is small.

In the literature, you will see statements of the following type: "The treatment had a significant effect, z = 2.45, p < .05." What does that mean? A significant effect means that you can reject the null hypothesis: there is a big enough difference between the two groups that you can't attribute that difference to chance alone. z = 2.45 tells you that you have a very large difference between your sample mean and the population mean.
p < .05 tells you that alpha was set to .05 and that the value of z is large enough that there is a less than 5% probability that this value could have been obtained from any one sample drawn from the population. When we compute our sample statistic (in this case, it's z), we compare it to a "critical z," and if it's more extreme, then we can reject H0.

There are four assumptions that must hold true for hypothesis testing to work:
1. Random sampling-- The sample was selected in a non-systematic way, so that each sample had an equal chance of being selected from the population.
2. Independent observations-- If you're studying statistics students, you can't take all the students from one class, because there is the possibility of introducing a "confound," which is an unintended source of variability, such as the quality of the instruction provided by the teacher in that class.
3. Normal distributions-- All of the inferential statistics used in this course assume normality.
4. The value of the population standard deviation is unchanged by the treatment. (If a constant is added to every individual's score in a population, the standard deviation will remain unchanged.)

One of the criticisms of hypothesis testing is that it just tells you if the effect is significant; it doesn't tell you anything about the effect size.

One-tailed versus two-tailed tests: A one-tailed test is used when the researcher has a specific prediction about the direction of the treatment effect. Two-tailed tests are more rigorous than one-tailed tests, which is why two-tailed tests are more common.

Cohen's d is a statistic used to determine the size of a treatment effect.

Statistical power is the probability that a test will correctly reject a false null hypothesis. The statistical power of a test is affected by several variables:
1. The size of the effect: the bigger the effect, the greater the power.
2. The alpha level: increasing alpha increases power.
3. The sample size: increasing sample size increases power.
4. Using a one-tailed versus a two-tailed test: A one-tailed test has greater power.

10/8/07
ANNOUNCEMENTS: Midterm 1 scores are now posted. The grade cutoffs are approximately: A = 21-25, B = 18-20, C = 14-17, D = 10-13. The answer keys are posted in the glass cases in the third-floor hallway of the Psychology Bldg.

CH.7: PROBABILITY AND SAMPLING

A sampling distribution is a distribution of sample means around the population mean. The reason that sampling distributions are used in inferential statistics is to decide whether a particular sample is derived from a particular population.

Central Limit Theorem-- For any population, the distribution of sample means for sample size n will have a mean equal to the population mean, and a std dev equal to the population SD divided by the square root of n. The distribution of those sample means will approach a normal distribution as n approaches infinity.

The standard error of the mean (also known as SE or SEM) is the population SD divided by the square root of the sample size, n. One way to think of the standard error is as the average amount by which a sample mean deviates from the population mean.

Standard error serves two functions:
1. It describes the distribution of sample means for any population, regardless of shape, mean, or SD.
2. It lets you know that the distribution of sample means becomes normal as n gets bigger. If a sample has n=30 or bigger, the sampling distribution is going to be normal.

The standard error is affected by two variables:
1. The size of the sample: The bigger the sample size, the smaller the SE.
2. The population standard deviation: The bigger the SD, the bigger the SE.

The primary use of the sampling distribution is to find out the probability associated with any sample mean.

Learning Check 1a, 1c, and 2a on p. 208.

THE UNIT NORMAL TABLE LETS YOU CONVERT A z-SCORE INTO A PROBABILITY, PERCENTAGE, OR PERCENTILE SCORE.
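The standard error formula above (population SD divided by the square root of n) is simple enough to sketch directly. This minimal example just shows, with an assumed population SD of 10, how SE shrinks as the sample size grows:

```python
import math

def standard_error(sigma, n):
    """Standard error of the mean: population SD divided by sqrt(n)."""
    return sigma / math.sqrt(n)

# With a population SD of 10, bigger samples give a smaller SE:
print(standard_error(10, 4))    # 5.0
print(standard_error(10, 25))   # 2.0
print(standard_error(10, 100))  # 1.0
```

Quadrupling the sample size only halves the standard error, which is why very precise estimates require very large samples.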
Sampling distributions can be used in inferential statistics by looking at the probability of a given sample mean coming from a particular population.

10/1/07
CH.6: Probability

The probability of an outcome, A, is a proportion:

probability of A = (number of A outcomes) / (total number of possible outcomes)

Random sample-- A sampling procedure in which each individual has an equal chance of being selected. The assumption in random sampling is that sampling occurs WITH replacement.

Learning Check 1-3 on p. 167
#1. 20 males: 15 white, 5 spotted; 30 females: 15 white, 15 spotted
a) Prob(female) = 30 females / 50 total rats = 0.60
b) Prob(white male) = 15 white males / 50 total rats = 0.30
c) Prob(spotted male) = 5 spotted males / 50 total rats = 0.10
   Prob(spotted female) = 15 spotted females / 50 total rats = 0.30
#2. 10 red, 30 blue
a) Prob(red) = 10/40 = 0.25 = 25%
b) Random sample of three marbles, and the first two are blue. Prob(red) = 0.25, because random sampling means sampling with replacement.
#3. p(X > 2) = 7 boxes greater than 2 / 10 total boxes = 0.7

UNIT NORMAL TABLE
This table allows you to convert a z-score to a proportion in the distribution. Sometimes, you have to convert a raw score, X, to a proportion in the distribution. This requires two steps:
1. Convert X to a z-score.
2. Use the Unit Normal Table to convert the z-score to a proportion.

Learning Check #1 on pp. 180-181.

9/24/07
Announcements:
1. Review session today!
2. First midterm next Monday (10/1).

Z-Scores-- A standardized score that specifies the precise location of each score in a distribution. A z-score is a measure of an individual's "weirdness"--how much s/he deviates from the mean of the population. It lets you compare scores on different scales of measurement. The unit of measurement for a z-score is the standard deviation. If you say that someone's z-score is z = -1.8, you are saying that the score is 1.8 standard deviations below the mean.
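The two-step conversion above (raw score to z-score, then z-score to proportion) can be sketched in code, using the normal CDF via math.erf as a stand-in for looking the value up in the unit normal table. The exam numbers (M = 75, SD = 10, X = 85) are just for illustration:

```python
import math

def z_score(x, mu, sigma):
    """Step 1: standardize a raw score -- how many SDs from the mean."""
    return (x - mu) / sigma

def proportion_below(z):
    """Step 2: stand-in for the unit normal table -- the proportion of
    a normal distribution falling at or below a given z-score."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2)))

# Exam with M = 75, SD = 10: what proportion scored at or below X = 85?
z = z_score(85, 75, 10)              # z = 1.0
print(round(proportion_below(z), 4))  # 0.8413
```

The 0.8413 here matches the familiar unit-normal-table entry for z = 1.00, so about 84% of scores fall at or below one standard deviation above the mean.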
Standard deviation is essentially the average amount of deviation in your sample or population. So, if M = 75 and SD = 10 on an exam, that means that the average amount of deviation from the mean of 75 is plus-or-minus 10 points.

A z-score gives you two pieces of information:
1. The sign tells you if a score is above the mean (+) or below the mean (-).
2. The number tells you how far away from the mean that score lies, in standard deviation units.

When you convert raw scores into z-scores, the mean and standard deviation are transformed as follows: Mean = 0, Std Dev = 1. The advantage of having a mean equal to zero is that you can tell easily if a score is above or below the mean just by looking at its sign. The advantage of having BOTH a mean equal to zero and a std dev equal to one is that it lets you compare "apples to oranges" (i.e., scores in two different distributions).

There are other standardized scales: For IQ, the mean is 100 and the std dev is 15. For SAT scores, the mean is 500 and the std dev is 100. You can convert raw scores into any standardized scale you want, in two steps:
1. Transform the original score to a z-score.
2. Convert the z-score into a score on the new scale.

9/10/07 - Central Tendency (CH.3) & Variability (CH.4)
Announcements:
1. NO CLASS NEXT MONDAY (9/17)
2. Book release party- Thurs, 9/20, 4:30pm, UA Bookstore
3. Review session along with CH.5, Monday, 9/24

Central tendency is the center of a group of scores. Three measures: mean, median, and mode.

The mean of a group of scores is the sum of the scores divided by the number of scores in the sample or population.

Weighted mean-- The combined mean of two samples. Suppose you want to combine two samples: Sample 1 has n=12 and M=6, and Sample 2 has n=8 and M=7. Computing a weighted mean:
1. Determine the grand total of all the scores: (12)(6) + (8)(7) = 128
2. Determine the total sample size: 12 + 8 = 20
3. Divide the total sum of scores by the total sample size to get the new mean: 128/20 = 6.4

Computing a mean from a frequency distribution table.

Characteristics of a mean (from a mathematical perspective):
1. Changing the value of any score in the sample or population will change the value of the mean.
2. Adding or removing a score will change the mean.
3. Adding or subtracting a constant from each score in the sample or population will add or subtract the constant from the mean.
4. Multiplying or dividing each score by a constant will multiply or divide the mean by that constant.

Median-- The middle score in a distribution. Computing the median:
1. When N is an odd number, the median is just the middle score: 3 5 8 10 11. The median is 8. It's the middle of the five scores.
2. When N is even, the median is the halfway point between the two middle scores: 3 5 8 10 11 13. The median is 9, because that's the halfway point between the two middle scores of 8 and 10.
3. When there are several scores with the same value in the middle of the distribution: 1 2 2 3 4 4 4 4 4 5
   a. Count the number of scores below the tied score: 4 (there are four scores below 4).
   b. Form a fraction: (number of tied scores needed to split the distribution in half) / (number of tied scores). Here, that's 1/5.
   c. Add the fraction to the lower real limit of the tied scores. The lower real limit of 4 is 3.5. Add 1/5, or .2, to 3.5: Median = 3.5 + .2 = 3.7

Mode-- The most frequent score in a distribution.
X  f
7  1
6  0
5  3
4  2
3  3
2  5
1  4
0  2
The mode is 2 because a score of X=2 is the most frequent (f = 5).

When do we use mean, median, and mode as our measure of central tendency?
1. Use the median:
   a) With skewed distributions or when there are extreme scores in your distribution.
   b) When there are undetermined or missing values for certain individuals.
   c) With open-ended distributions (e.g., number of absences includes a value of "5 or more").
   d) With ordinal scales.
2. Use the mode:
   a) With nominal scales: Big Mac - 5, Quarter Pounder - 7, Chicken McNuggets - 3. Quarter Pounder is the mode in this sample because it's the most frequently ordered meal.
   b) When dealing with discrete values: the mean number of children per household may be 1.8, but it's better not to chop children into pieces, so report the mode of 2.
   c) When describing the shape of a distribution. Bimodal distributions have two peaks; multimodal distributions have two or more peaks.

VARIABILITY
1. It tells you how spread out or clustered together a set of scores is.
2. Variability lets you look at an individual's score and determine its "weirdness" or "deviance" in terms of how far it is from the center.

Range-- The difference between the upper real limit of the highest score in the distribution and the lower real limit of the lowest score. For the scores 3, 4, 5, 7, 9, 10, 11, 13, the range is 13.5 - 2.5 = 11.

Interquartile range-- Divide the distribution into fourths, and take the difference between Q3, which is three fourths of the way to the highest score, and Q1, which is one fourth of the way to the highest score. In the scores above, Q1 is midway between 4, which is the end of the first quartile, and 5, which is the beginning of the second quartile, so Q1 = 4.5. Q3 = 10.5. Interquartile range = 10.5 - 4.5 = 6. Semi-interquartile range = interquartile range divided by two = 6/2 = 3.

The standard measures of variability in psychology are variance and standard deviation. Why are sample variance and std deviation computed using n-1 instead of n? The answer is that these statistics are generally used to ESTIMATE the population parameters. In general, there is less variability in a sample than in the population from which that sample derives.
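The worked examples in this section (the weighted mean, the tied-scores median, and the interquartile range) can be checked with a short script. This is a sketch of the lecture's hand methods, not a general-purpose statistics routine; in particular, the quartile function assumes the sample size divides evenly by 4, as in the eight-score example:

```python
def weighted_mean(n1, m1, n2, m2):
    """Combined mean of two samples: grand total of all scores
    divided by the total sample size."""
    return (n1 * m1 + n2 * m2) / (n1 + n2)

def median_tied(scores, tied_value):
    """Median via the tied-scores rule: lower real limit of the tied
    value plus (ties needed to reach the halfway point) / (number of ties)."""
    scores = sorted(scores)
    below = sum(1 for s in scores if s < tied_value)
    ties = scores.count(tied_value)
    needed = len(scores) / 2 - below   # tied scores needed to split in half
    return (tied_value - 0.5) + needed / ties

def quartiles_midpoint(scores):
    """Q1 and Q3 as midpoints between adjacent scores, matching the
    lecture's method when the sample size divides evenly by 4."""
    s = sorted(scores)
    n = len(s)
    q1 = (s[n // 4 - 1] + s[n // 4]) / 2
    q3 = (s[3 * n // 4 - 1] + s[3 * n // 4]) / 2
    return q1, q3

print(weighted_mean(12, 6, 8, 7))                       # 6.4
print(median_tied([1, 2, 2, 3, 4, 4, 4, 4, 4, 5], 4))   # 3.7
q1, q3 = quartiles_midpoint([3, 4, 5, 7, 9, 10, 11, 13])
print(q3 - q1)                                          # IQR = 6.0
```

All three printed values match the hand computations above (6.4, 3.7, and 6).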
n-1 is a correction factor that makes the sample variance and std deviation closer to the population variance and std dev. n-1 is called "degrees of freedom" and abbreviated "df." For n=5, there are four degrees of freedom. If you were assigning a score to each individual in the sample, the first four scores would be free to vary, but the fifth score would be predetermined once you have assigned the other four (given a fixed mean). Using df instead of n gives just the right amount of correction to the variance and standard deviation to make them "unbiased" estimates of the population parameters.

Characteristics of the standard deviation:
1. Adding or subtracting a constant from each score in the distribution will have NO effect on the standard deviation (SD).
2. Multiplying or dividing each score by a constant will cause the SD to be multiplied or divided by that constant.

8/27/07
Announcements:
1. Textbook- http://vas.web.arizona.edu, download the Study Guide.
2. Lecture notes- "230 Lecture Notes"
3. Sept 17 class-- Jenne may start class at 6pm.
4. Book release party- UA Bookstore, 4:30-6:30pm, Thurs, Sept 20, refreshments.
5. The "B" list.

CH.2: Freq. Distributions

This distribution of scores, 5, 3, 2, 3, 1, 1, 0, 4, 3, 4, 2, 2, can be arranged into a frequency distribution table:
X  f
5  1
4  2
3  3
2  3
1  2
0  1
This table shows the frequency with which each score occurs in the distribution, where X = score and f = frequency.

Rules for constructing a grouped frequency distribution table:
1. You should have no more than 10 class intervals.
2. The width of each interval should be a simple number: 2, 5, 10, or 20.
3. The bottom score should be a multiple of the interval width.
4. All intervals should be the same width.

Graphing data-- You can present your data in the form of a table or a graph. Three different types of graphs:
1. Histogram-- Can only be used with data measured on an interval or ratio scale.
2. Polygon-- Interval or ratio scale only.
3. Bar graph-- Can only be used with data measured on a nominal or ordinal scale.

Every score has real limits-- there is some amount of error in the score. Assumption: The scores that are shown are accurate to the nearest unit, plus or minus half a unit. A score of 2, for example, is really somewhere between 1.5 and 2.5. Those are the lower and upper real limits for that score, respectively.

Types of distributions:
1. Normal-- Perfectly symmetrical. The greatest frequencies occur in the middle of the distribution, with equal declines in frequency on both ends.
2. Skewed distributions:
   a. Positive skew-- The "tail" of the distribution is on the side of the higher scores.
   b. Negative skew-- The "tail" is on the side of the lower scores.
You can also have a bimodal distribution, which has more than one frequency peak.

Percentile rank-- The percentage of individuals in a distribution with a score at or below a particular value.
Percentile score-- The score at or below which a given percentage of the sample falls.
Cumulative frequency (cf)-- The frequency of scores at or below your score.

In the table discussed in class, a score of 3.5 has a percentile rank of 70% and a score of 2.5 has a percentile rank of 30%, because these are the upper limits of the class intervals shown in the table. What is the percentile rank of 3.0 in this table? The only way to determine this is by a process called "interpolation." If X=3.5 has a percentile rank of 70% and X=2.5 has a percentile rank of 30%, then X=3.0, which is halfway between the other two scores, should have a percentile rank that is halfway between 30 and 70 percent (i.e., 50%).

Rules of interpolation:
1. Find the width of the interval on both scales (one scale is score and the other is percentile rank). In the problem we just looked at, the width of the class interval was 1 unit (3.5 - 2.5) and the width of the percentile rank interval was 40% (70% - 30%).
2. Locate the position of the intermediate value in the interval. This is the fraction of the interval width. In this example, 3.0 is halfway between 2.5 and 3.5, so the fraction is .5/1, because (3.0 - 2.5)/(3.5 - 2.5) = .5.
3. Use the fraction to determine the distance from the bottom of the interval, multiplying the fraction by the width of the interval: (40%)(.5) = 20%.
4. Add this distance to the bottom to determine the position on the unknown scale: 30% (bottom score) + 20% (distance from bottom) = 50%.

Problem 23 on p. 66.

8/20/07
CH.1: Intro to Statistics

Population vs. sample-- The population is the set of all individuals that you are interested in studying. The sample is the subset of that population that you choose to study and to represent the population.

Parameter vs. statistic-- A parameter is a value of a variable that describes a population (e.g., average height). Example: For males in the US, the population parameter for height is 69 inches. A statistic is a value of a variable that describes a sample. Example: For males in this class, the average height is 72 inches.

Descriptive vs. inferential statistics-- In descriptive statistics, we are looking to summarize and describe the characteristics of our sample. In inferential statistics, we study a sample and then make generalizations about the population parameters based on the sample statistics. If we measure the height of all the males in our sample and find that the mean height is 72 inches, then we might infer that the average height in the population is 72 inches. When we use inferential statistics, we have sampling error. In this case, the sampling error is 3 inches, which is the discrepancy between the sample statistic and the population parameter.

Two basic quantitative methods of research in psychology:
1. Correlational method-- Measure two or more variables and look for the numeric relationship between them. Example: The relationship between oat bran consumption and heart health. How would you study this relationship?
In a correlational design, you might give people a survey to find out how much oat bran is in their diet, and then you might test their heart health in a variety of ways, such as measuring blood pressure or oxygen consumption when exercising. Correlation does not imply causation. Why? Because the relationship between two variables, such as oat bran and heart health, can be mediated by any number of factors, such as the fact that eating oat bran keeps you from eating something much worse for your heart.

2. Experimental method-- Here, we manipulate one variable (called the "independent variable") and then measure the effect of that manipulation on a second variable (the "dependent variable"). Example: I'm going to test the hypothesis that writing out the information in the study guide improves students' test performance. The independent variable is "study guide use" and it has two levels: One level is "yes" (meaning that yes, people in that condition are allowed to write out the information) and the other is "no" (these people cannot write the information).

Two groups: The treatment group (or experimental group) receives the treatment that we believe will have an effect on the dependent variable, namely writing out the information in the study guide. The control group receives either no treatment or a neutral level of treatment (e.g., writing out all the information in the daily comic strips).

The dependent variable in this example is test score. It's called dependent because we are assuming that changes in this variable depend on, or are influenced by, changes in the independent variable. In an experimental design, if done properly, we can make inferences about the causal relationship between the independent variable (IV) and the dependent variable (DV). The strength of the causal inference that we can make about this relationship based on our study is called the validity of the study, specifically its internal validity.
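The correlational method described above boils down to computing a numeric measure of association between two variables. The standard measure is the Pearson correlation coefficient; here is a minimal sketch, with entirely hypothetical oat bran and blood pressure numbers invented for the example:

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient: the sum of cross-products of
    deviations, divided by the product of the two sums of squared
    deviations' square roots. Ranges from -1 to +1."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cross = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cross / (sx * sy)

# Hypothetical data: weekly servings of oat bran vs. systolic blood pressure
oat_bran = [0, 2, 4, 6, 8]
blood_pressure = [140, 135, 128, 124, 118]
print(round(pearson_r(oat_bran, blood_pressure), 3))  # about -0.998
```

A strongly negative r like this would describe the association, but, as the notes stress, it would say nothing about why the two variables move together.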
In experimental designs, you must look out for the effects of confounds. These are variables that have an unanticipated but significant effect on the dependent variable. If you let people self-select into the experimental and control conditions, they may do so based on internal factors that you are not considering (e.g., laziness versus self-discipline). In experimental research, your goal is to manipulate the IV and control the possible effect of any confound on your DV.

A third approach not mentioned in your book:
3. Quasi-experimental approach-- This approach looks like experimental research in most ways except for one: there is not an IV that can be manipulated. Example: If you're looking at the effect of gender or age on test performance, the DV is test performance, but gender or age is not a true IV. Why not? You cannot manipulate age or gender. (Flynn effect: In all the time that IQ data have been collected, raw IQ scores have been increasing steadily.)

Constructs and operational definitions. A construct is an internal variable that cannot be measured directly (e.g., intelligence, creativity, psychological adjustment). An operational definition is a way of defining a construct that has a precise method of measurement. One measure of intelligence is verbal fluency: list the number of words associated with "doctor."

Discrete vs. continuous variables-- A discrete variable is one that separates individuals into categories (e.g., gender). A continuous variable is one that may produce an infinite number of possible values (e.g., height, weight).

There are four scales of measurement:
Nominal-- Used with discrete variables, separating individuals into categories that have no quantitative distinctions (e.g., gender, ethnicity).
Ordinal-- Used with discrete variables, in cases where there are perceived differences in ranking or magnitude, such as rankings or standings in sports: 1. Angels, 2. Mariners, 3. Rangers.
Interval-- Used with continuous variables, where there is no absolute zero and where every interval is identical. Example: Temperature in Celsius or Fahrenheit. It's considered interval because the interval separating 30 and 31 degrees is the same as the interval separating 300 and 301 degrees. But there is no absolute zero, because 0 degrees Fahrenheit or Celsius does not mean the absence of temperature (there are definitely lower temperatures than those).
Ratio-- Used with continuous variables, where there IS an absolute zero and where every interval is the same. Examples: Temperature on the Kelvin scale (0 K = -273 C), percentage, precipitation, height, weight, length.

Learning to use statistical notation: