EDFI 641 Statistics in Education Course Packet
Dr. Rachel Vannatta

Table of Contents
Video #1: Introduction to Statistics (page 2)
Video #2: Frequency Distributions (page 6)
Video #3: Central Tendencies & Variability (page 10)
Video #4: Probability & z Score (page 18)
  M-n-M Activity (page 19)
Video #5: Distribution of Sample Means (page 24)
Video #6: Hypothesis Testing (page 28)
Video #7: t Test (page 38)
Video #8: t Test of Independent Samples (page 44)
  Interpreting Research (page 49)
Video #9: t Test of Related Samples (page 50)
  Interpreting Research (page 54)
  Coke vs. Pepsi Experiment (page 55)
Video #10: ANOVA (page 57)
  Interpreting Research (page 63)
Video #11: Correlation & Regression (page 64)
  Interpreting Research (page 73)
Video #12: Chi Square (page 75)
  Interpreting Research (page 81)
  Statistical Test Grid (page 82)
Unit Normal (z-score) Table (page 84)
t Distribution Table (page 88)
F Distribution (ANOVA) Table (page 89)
Pearson Correlation Table (page 90)
Chi Square Distribution Table (page 91)

Video #1: Introduction to Statistics

Population: the entire group of individuals that the researcher WISHES to study.
Sample: a set of individuals selected from a population, intended to represent the population.
Parameter: a value that describes the population.
Statistic: a value that describes the sample.

Two major types of statistical methods
• descriptive stats: summarize, organize, and simplify data (e.g., mean, standard deviation, tables, graphs, distributions)
  • data
  • raw score
• inferential stats: techniques that allow us to study samples and make generalizations about the population from which they were selected (e.g., t test, ANOVA, correlation)
  • sampling error: the amount of error between the sample statistic and the population parameter (the degree to which the sample differs from the population)
  • random sampling: used to minimize the error between sample and population

Inferential statistics also allow us to study relationships between/among variables that the sample holds.
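The definitions of parameter, statistic, and sampling error can be made concrete with a minimal Python sketch. The population of 1,000 scores is randomly generated (hypothetical data, not from the packet). The sketch computes three values:
• the population mean μ, which is a parameter
• the mean of one random sample of 25, which is a statistic
• their difference, which is the sampling error

```python
import random
import statistics

random.seed(641)  # fixed seed so the example is repeatable

# Hypothetical population of 1,000 test scores (made-up data for illustration)
population = [random.randint(40, 100) for _ in range(1000)]

# Parameter: a value that describes the population
mu = statistics.mean(population)

# Statistic: the same measure computed on a random sample of 25 individuals
sample = random.sample(population, 25)
x_bar = statistics.mean(sample)

# Sampling error: the discrepancy between the statistic and the parameter
sampling_error = x_bar - mu

print(f"population mean (parameter): {mu:.2f}")
print(f"sample mean (statistic):     {x_bar:.2f}")
print(f"sampling error:              {sampling_error:+.2f}")
```

Changing the seed draws a different sample and produces a different sampling error. No single sample matches the population exactly.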
• variable: a characteristic or condition that differs among individuals (gender, height, test scores, IQ)
• construct: a hypothetical concept/theory used to organize observations
• operational definition: defines a construct in terms of how it is measured

Types of Variables
• categorical variable (discrete): consists of separate categories (e.g., gender, religion, classification of personality)
• quantitative variable (continuous): can be divided into an infinite number of fractional parts (e.g., height, time, age)
• independent variable: usually a treatment that has been manipulated (control group versus experimental group); usually categorical
• dependent variable: usually the effect; usually quantitative
• confounding variable: an uncontrolled variable that creates a difference between the control and experimental groups

Variables determine the type of relationship being studied
• mutual
• causal
• Groups must be compared to examine cause and effect → groups are created by a categorical variable

Relationship   Independent Variable   Dependent Variable   Key Words
Causal         Categorical            Quantitative         cause, effect, increase/decrease, difference
Mutual         Quantitative           Quantitative         relate, relationship, predict, associate

Class #1: In-Class Practice Problems
For each of the following research questions, identify the independent and dependent variables and indicate whether each is categorical or quantitative.

1. Is there a significant relationship between college GPA and SAT scores among college freshmen?
independent variable:
dependent variable:
research design:

2. Does receiving a special diet of oat bran significantly decrease cholesterol levels among middle-aged adults? Note: The researcher compared a treatment group to a control group. Groups were created using random selection and assignment.
independent variable:
dependent variable:
research design:

3. Does socioeconomic status (low, middle, high) affect reading achievement among preschoolers?
independent variable:
dependent variable:
research design:

4. Does receiving whole-language reading instruction increase reading achievement among elementary students? Note: The researcher compared a treatment group (whole-language) to a control group (traditional). Existing groups were used.
independent variable:
dependent variable:
research design:

Research Designs
• Correlational: studies relationships among 2 or more variables to explain or predict behaviors
  • usually both IV and DV are quantitative
  • example: A teacher studies the relationship between English grades and overall GPA.
• Experimental: examines cause and effect; manipulates a treatment and tests the outcome; compares the experimental and control groups (groups are randomly created)
  • IV = nominal; DV = interval/ratio
  • example: A researcher compares grades of a group of students that receives computer-assisted instruction to a group that receives none. Groups were created through random assignment.
• Quasi-Experimental: examines cause and effect; indirectly manipulates a treatment and tests the outcome; compares the experimental and control groups (uses existing groups)
  • IV = nominal; DV = interval/ratio
  • example: A researcher compares grades of a group of students that receives computer-assisted instruction to a group that receives none. Existing groups were used.
• Causal Comparative: examines cause and effect (cautiously); compares groups created by some categorical characteristic (gender, religion, ethnicity)
  • IV = nominal; DV = interval/ratio
  • example: A researcher compares final grades of male and female students.

Most research is guided by a hypothesis, a prediction about the effect of the treatment.
Measurement Scales
• Nominal: numbers have NO numerical value but represent categories (religion, ethnicity, occupation, gender)
• Ordinal: numbers represent a rank (1 being the best); the intervals between ranks can vary (e.g., class rank, Olympic medal placings)
• Interval: numbers have typical numerical value; intervals are equal; no real zero (e.g., temperature, test score)
• Ratio: same as interval but has a real zero (e.g., money, time)

Identify the measurement scale (nominal, ordinal, interval, ratio) for each.
_________________5. Size of school district (small, medium, large)
_________________6. Rank of faculty on their teaching ratings
_________________7. Social security number
_________________8. Color of person's eyes
_________________9. IQ scores
_________________10. Degrees Fahrenheit
_________________11. Religious affiliation
_________________12. Medalists in an Olympic event
_________________13. Income in actual dollars

Video #2: Frequency Distributions

Frequency distribution: a table/graph of the number of individuals located in each category
• places scores in order from highest to lowest
• groups together all individuals who have the same score

Proportion and Percents of Frequency Distributions
• Proportion: relative frequency; measures the fraction of the total group that is associated with each score; most often appears as a decimal
  • $p = \frac{f}{N}$
• Percentage: the percent of the total group that is associated with each score
  • $\text{percentage} = p(100) = \frac{f}{N}(100)$

X    f    p = f/N       % = p(100)   cum f   cum %
10   1    1/20 = .05    5            1       5
9    4    4/20 = .20    20           5       25
8    5    5/20 = .25    25           10      50
6    6    6/20 = .30    30           16      80
5    2    2/20 = .10    10           18      90
4    2    2/20 = .10    10           20      100

Grouped Frequency Distribution Table
• used when data cover a wide range of values; groups are based on class intervals
• to construct a grouped frequency distribution table, follow these rules:
  • rule 1, number of intervals: shoot for 8-12 intervals, 10 being ideal
  • rule 2, interval width: use a width that produces an appropriate number of intervals
  • rule 3, interval starting point: should be a multiple of the width
  • rule 4: all intervals should be the same width

Helpful Hints
• use the following equation to determine the number of intervals and the interval width that are appropriate for the data:
  $\text{number of intervals} = \frac{\text{highest score} - \text{lowest score} + 1}{\text{interval width}}$
• ALWAYS round up the number of intervals! It is impossible to have a fraction of an interval at the end of the distribution, so even if the formula gives 8.25 intervals, round up to 9.
• try different widths until an appropriate number of intervals is calculated

Example: N = 25
51, 55, 57, 60, 63, 66, 68, 69, 70, 72, 74, 74, 74, 75, 77, 79, 83, 84, 85, 85, 88, 90, 92, 95, 98
• number of intervals = (98 − 51 + 1) / 5 = 48 / 5 = 9.6 (round up to 10)

X       f
95-99   2
90-94   2
85-89   3
80-84   2
75-79   3
70-74   5
65-69   3
60-64   2
55-59   2
50-54   1

* Keep in mind that since a continuous variable contains an infinite number of points, a score is not assigned a single point. Instead it is assigned an interval with boundaries, called real limits, that separate it from the adjacent scores.
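The frequency-table arithmetic and the interval rules above can be checked with a short Python sketch. The function names are hypothetical and chosen for illustration.
• As in the packet's table, scores are listed from highest to lowest.
• The cumulative columns are accumulated down the table in that order.
• Interval starting points are multiples of the width (rule 3).

```python
import math
from collections import Counter

def frequency_table(scores):
    """Rows of (X, f, p, percent, cum_f, cum_percent), highest score first.

    Cumulative columns accumulate down the table in the listed order,
    as in the packet's example table.
    """
    n = len(scores)
    counts = Counter(scores)
    rows, cum_f = [], 0
    for x in sorted(counts, reverse=True):
        f = counts[x]
        cum_f += f
        p = f / n                      # proportion = f / N
        rows.append((x, f, p, p * 100, cum_f, cum_f / n * 100))
    return rows

def number_of_intervals(scores, width):
    """(highest - lowest + 1) / width, ALWAYS rounded up."""
    return math.ceil((max(scores) - min(scores) + 1) / width)

def grouped_frequency(scores, width):
    """(interval label, f) pairs, highest interval first.

    The first interval starts at a multiple of the width (rule 3) and
    every interval has the same width (rule 4).
    """
    start = (min(scores) // width) * width
    counts = Counter((s - start) // width for s in scores)
    rows = []
    for k in range(max(counts), -1, -1):
        low = start + k * width
        rows.append((f"{low}-{low + width - 1}", counts[k]))  # Counter gives 0 if empty
    return rows

# Simple frequency distribution from the packet (N = 20)
table_scores = [10] + [9] * 4 + [8] * 5 + [6] * 6 + [5] * 2 + [4] * 2
for x, f, p, pct, cum_f, cum_pct in frequency_table(table_scores):
    print(f"{x:>3} {f:>3} {p:.2f} {pct:5.1f} {cum_f:>3} {cum_pct:5.1f}")

# Grouped example from the packet (N = 25)
grouped_scores = [51, 55, 57, 60, 63, 66, 68, 69, 70, 72, 74, 74, 74,
                  75, 77, 79, 83, 84, 85, 85, 88, 90, 92, 95, 98]
print("intervals needed:", number_of_intervals(grouped_scores, 5))
for label, f in grouped_frequency(grouped_scores, 5):
    print(f"{label:>6} {f}")
```

Trying other widths is a matter of changing the second argument. For example, a width of 10 gives ceil(48 / 10) = 5 intervals, which falls short of the 8-12 target.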
Example: X = 88
• upper real limit = 88.5
• lower real limit = 87.5
• therefore, a score of 87.75 would fall in the interval of X = 88

Frequency Distribution Graphs
• use an x-axis to represent scores and a y-axis to represent frequencies
• list scores increasing in value from left to right
• list frequencies increasing in value from bottom to top
• the height of the y-axis should be approximately 2/3 to 3/4 of the length of the x-axis

• Creating a Grouped Frequency Histogram: follow the rules for the Grouped Frequency Table
• Histogram: used for interval/ratio data; each bar represents an interval (the real limits of the score or class interval); bars touch each other to represent the continuous nature of the data; bar height corresponds to frequency
• Example: using the data from the Grouped Frequency Table on the previous page
[Figure: grouped frequency histogram; x-axis: class intervals 50-54 through 95-99; y-axis: f, 0 to 5]
  • the starting point (50) is a multiple of the width (5)
  • an interval width of 5 generates 10 intervals
  • 10 intervals meet the 8-12 interval requirement

Other Types of Frequency Distribution Graphs and Polygons
• Bar Graph: used for nominal/ordinal data; each bar represents a category; bars do not touch
• Frequency Distribution Polygon: used for interval/ratio data; a single dot represents the frequency of an individual score or class interval; dots are connected
• Distribution Curve: shows relative frequencies for the population; smooth
  • Normal: symmetrical; greatest frequency in the middle, smallest frequency in the extremes (tails)
  • Positively Skewed: smallest frequencies in the positive (right) end of the distribution
  • Negatively Skewed: smallest frequencies in the negative (left) end of the distribution

Video #2 In-Class Practice Problems
10, 15, 18, 22, 25, 26, 29, 31, 33, 33, 34, 37, 38, 39, 39, 40, 40, 40, 41, 42, 42, 43, 44, 45, 46, 46, 47, 48, 49, 50
1. Using the data above, do the following:
a. construct a histogram based upon the grouped frequency distribution
b. determine the distribution type (normal, positive, negative) from the histogram

Video #3: Central Tendency

Measure of Central Tendency
• describes a group of individuals with a single measurement that is most representative of all individuals
• Types: mean, median, and mode
• Mean: the arithmetic average
  • used for interval/ratio (quantitative) data
  • computed by adding all the scores and dividing by the number of scores
  • population mean: $\mu = \frac{\sum X}{N}$; sample mean: $\bar{X} = \frac{\sum X}{n}$
• Median: the midpoint; the score that divides the distribution exactly in half, with 50% of scores above it and 50% below it
  • used for ordinal data or when there is a skewed distribution, some scores are undetermined, or the distribution is open-ended
  • calculating the median when N is odd: make sure the scores are in order; find the middle score
  • calculating the median when N is even: make sure the scores are in order; find the two middle scores; add them and divide by 2
• Mode: the most frequent score
  • used especially for nominal data
  • represented by the highest point in the frequency distribution

Central Tendency and the Shape of Distributions
• Normal distribution: mean, median, and mode are equal and smack-dab in the middle of the distribution
• Skewed distributions
  • not symmetrical
  • mean, median, and mode are different
  • extreme scores on one end of the distribution
  • the mean is most affected by extreme scores, so it will be furthest out in the tail
• Negatively Skewed: extreme scores are on the low end of the distribution (from left to right: mean, median, mode)
• Positively Skewed: extreme scores are on the high end of the distribution (from left to right: mode, median, mean)

Variability
Variability: a measure that describes how spread out or close together the scores are within the distribution
• Range: the distance between the highest score and the lowest score in the distribution; the easiest measure of variability
  • range = high score − low score

[Figure: Distribution 1, frequency histogram on a 0 to 11 scale; range = 10, mean = 6, median = 6, mode = 6, SD = 2.45]

• Standard Deviation from the Mean
  • the most common measure of variability
  • roughly, the average distance of scores from the mean

[Figure: Distribution 1 and Distribution #2, frequency histograms on a 0 to 11 scale]

Standard Deviation Activity
o You need 16 pieces of candy (M-n-M's, Skittles, etc.).
o You must use all 16 pieces for each distribution.
o Use the Distribution Graph from the Blackboard course site (located in Course Documents).

Steps
1. For distribution A, create a normal distribution like Dr. Vannatta's with your candy. Trace the outline of the distribution.
Now, on your own, complete the following:
2. For distribution B, move the candy around to create a distribution that has greater variability than A. Trace the outline of the distribution.
3. For distribution C, move the candy around to create a distribution that has less variability than A. Trace the outline of the distribution.
4. For distribution D, move the candy around to create a distribution that has the least possible amount of variability.

Variability Key Concepts
• Variability shows how spread out scores are in the distribution.
  • Range only takes into account the two extreme scores (highest and lowest).
  • Standard deviation compares all scores to the mean.
• When scores are close to the mean, variability is smaller.
• When scores are far from the mean (outliers, extreme ends of the distribution), variability is larger.

Calculating Standard Deviation
• standard deviation for a population: $\sigma = \sqrt{\frac{\sum (X - \mu)^2}{N}}$
• standard deviation for a sample: $s = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}}$
• degrees of freedom (df = n − 1): an adjustment for sample bias. To calculate the standard deviation, we must know the sample mean. This places a restriction on sample variability, since only (n − 1) scores are free to vary once the sample mean is known.
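The n − 1 adjustment can be checked with a small simulation. This is a sketch with stated assumptions: a hypothetical normal population with μ = 50 and σ = 10 (variance 100), samples of n = 5, and 20,000 repetitions. Each sample's SS is divided two ways, once by n and once by df = n − 1, and each version is averaged across all samples.

```python
import random
import statistics

random.seed(2024)  # fixed seed so the example is repeatable
MU, SIGMA = 50, 10  # hypothetical population parameters (variance = 100)
N = 5               # size of each sample
REPS = 20000        # number of random samples

biased, unbiased = [], []
for _ in range(REPS):
    sample = [random.gauss(MU, SIGMA) for _ in range(N)]
    x_bar = sum(sample) / N
    ss = sum((x - x_bar) ** 2 for x in sample)  # sum of squares
    biased.append(ss / N)                       # divide by n
    unbiased.append(ss / (N - 1))               # divide by df = n - 1

print(f"true variance:           {SIGMA ** 2}")
print(f"average of SS / n:       {statistics.fmean(biased):.1f}")
print(f"average of SS / (n - 1): {statistics.fmean(unbiased):.1f}")
```

With these settings, the SS / n average should come out near 80, that is σ²(n − 1)/n. The SS / (n − 1) average should land near the true value of 100.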
• Example for calculating the standard deviation for a sample

X   X̄   X − X̄   (X − X̄)²
2   5   −3      9
3   5   −2      4
3   5   −2      4
4   5   −1      1
4   5   −1      1
4   5   −1      1
5   5   0       0
5   5   0       0
5   5   0       0
5   5   0       0
6   5   1       1
6   5   1       1
6   5   1       1
7   5   2       4
7   5   2       4
8   5   3       9
                SS = 40

• Variance: $s^2 = \frac{\sum (X - \bar{X})^2}{n - 1} = \frac{SS}{n - 1} = \frac{40}{15} \approx 2.67$
• Standard deviation: $s = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}} = \sqrt{\frac{SS}{n - 1}} = \sqrt{2.67} \approx 1.63$
• Sum of squares: the sum of squared deviation scores (sum of squared differences)
  • $SS = \sum (X - \bar{X})^2$; also $SS = s^2(n - 1)$
• Variance: the mean of the squared deviation scores; the sum of squares divided by the number of scores minus 1
  • $s^2 = \frac{\sum (X - \bar{X})^2}{n - 1}$

Steps to Calculate Standard Deviation
1. Calculate the mean (X̄).
2. Calculate the difference between each score and the mean (X − X̄).
3. Square each difference, (X − X̄)².
4. Add the squared differences. This is the Sum of Squares: $SS = \sum (X - \bar{X})^2$
5. Divide SS by the degrees of freedom (df = n − 1). This is the Variance: $s^2 = \frac{\sum (X - \bar{X})^2}{n - 1}$
6. Take the square root of the variance. This is the Standard Deviation: $SD = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}}$

Standard Deviation Calculation Practice
a. Calculate the standard deviation for the following data (X̄ = 6).
X    X̄    X − X̄    (X − X̄)²
2
2
8
8
10
SS =

b. Calculate the standard deviation for the following data (X̄ = 6). Notice the mean is the same, but three scores have been changed to 6.
X    X̄    X − X̄    (X − X̄)²
2
6
6
6
10
SS =

How does the change in data affect the SD? Why?
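The six steps can be written as a short Python function. The function name is hypothetical. The data are the scores from the worked example above, so the results should match SS = 40, s² ≈ 2.67, and s ≈ 1.63.

Python's `statistics.variance` and `statistics.stdev` use the same n − 1 denominator, so the sketch uses them as a cross-check. Their counterparts `statistics.pvariance` and `statistics.pstdev` divide by N instead, which is the population formula.

```python
import math
import statistics

def sample_sd_steps(scores):
    """Follow the six steps above; return (mean, SS, variance, SD)."""
    n = len(scores)
    mean = sum(scores) / n                    # Step 1: mean
    deviations = [x - mean for x in scores]   # Step 2: X - mean
    squared = [d ** 2 for d in deviations]    # Step 3: square each difference
    ss = sum(squared)                         # Step 4: sum of squares (SS)
    variance = ss / (n - 1)                   # Step 5: divide by df = n - 1
    sd = math.sqrt(variance)                  # Step 6: square root of variance
    return mean, ss, variance, sd

scores = [2, 3, 3, 4, 4, 4, 5, 5, 5, 5, 6, 6, 6, 7, 7, 8]
mean, ss, variance, sd = sample_sd_steps(scores)
print(f"mean = {mean}, SS = {ss}, variance = {variance:.2f}, SD = {sd:.2f}")
# mean = 5.0, SS = 40.0, variance = 2.67, SD = 1.63

# Cross-check against the standard library's sample (n - 1) formulas
assert math.isclose(variance, statistics.variance(scores))
assert math.isclose(sd, statistics.stdev(scores))
```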
Characteristics of standard deviation
• a small standard deviation indicates that scores are close together
• a large standard deviation indicates that scores are spread out
• adding a constant to each score will not change the standard deviation
• multiplying each score by a constant causes the standard deviation to be multiplied by that same constant
• research articles usually use (SD) to refer to the standard deviation
• Standard deviation and the normal distribution: three standard deviations lie on each side of the mean (−3σ, −2σ, −1σ, mean, +1σ, +2σ, +3σ)

Video #3: In-Class Practice Problems
For the following sample of scores: 1, 2, 3, 3, 4, 4, 5, 5, 5, 5, 6, 6, 7, 7, 8, 9
These data are slightly different from those presented in the video so that a "cleaner" mean would be calculated.
a. Sketch a frequency distribution histogram.
b. Calculate the following:
mean = ____________________
median = ____________________
mode = ____________________
range = ____________________
degrees of freedom = ____________________
standard deviation = ____________________
c. From your calculations, identify the distribution type.
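Several of the characteristics above, and the central tendency measures from earlier, can be illustrated with Python's `statistics` module. This sketch reuses the worked-example scores from the standard deviation calculation, not the practice data. The constants 10 and 3 and the extreme score of 30 are arbitrary values chosen for illustration.

```python
import statistics

scores = [2, 3, 3, 4, 4, 4, 5, 5, 5, 5, 6, 6, 6, 7, 7, 8]  # worked-example data

print("mean   =", statistics.mean(scores))
print("median =", statistics.median(scores))
print("mode   =", statistics.mode(scores))
print("range  =", max(scores) - min(scores))
print("SD     =", round(statistics.stdev(scores), 2))

# Adding a constant shifts every score but does not change the spread
plus_ten = [x + 10 for x in scores]
print("SD after adding 10:       ", round(statistics.stdev(plus_ten), 2))

# Multiplying by a constant multiplies the standard deviation by that constant
times_three = [x * 3 for x in scores]
print("SD after multiplying by 3:", round(statistics.stdev(times_three), 2))

# One extreme score pulls the mean toward the tail more than the median
skewed = scores + [30]
print("with an extreme score: mean =", round(statistics.mean(skewed), 2),
      "median =", statistics.median(skewed))
```

The effects of each change:
• Adding 10 to every score leaves the SD at 1.63.
• Multiplying every score by 3 turns the SD into 4.9 (3 × 1.63).
• Appending the single score of 30 moves the mean from 5 to about 6.47, while the median stays at 5.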
Video #4: Probability

Probability is used to:
• determine the types of samples we are likely to obtain from a population
• make conclusions about the population from the sample

Probability: the fraction, proportion, or percent of selecting a specific outcome out of the total number of possible outcomes
• $p(A) = \frac{\text{number of A's}}{\text{total number of possible outcomes}}$
• probability of selecting a heart out of a deck of cards: $p(\text{heart}) = \frac{13}{52} = \frac{1}{4} = .25$ → 25%

Probability and the Normal Distribution
• A normal distribution holds 100% of the individuals in it.
• The mean, median, and mode are all equal and divide the distribution in half.
• 50% of the distribution is above the mean and 50% is below it.
• When the distribution is divided by standard deviations, the percentages look like this:

Region                         Percent
mean to 1σ (each side)         34.13%
1σ to 2σ (each side)           13.59%
2σ to 3σ (each side)           2.14%
beyond 3σ (each tail)          .13%
within ±1σ                     68%
within ±2σ                     95%
within ±3σ                     99.7%

Cumulative percent below each point:
−3σ: 0.13%   −2σ: 2.28%   −1σ: 15.87%   mean: 50%   +1σ: 84.13%   +2σ: 97.72%   +3σ: 99.87%

z Scores
z score: a measure of relative position; identifies the position of a raw score in terms of the number of standard deviations it falls above or below the mean
• use z scores to convert a raw score into a percentile rank
• $z = \frac{X - \mu}{\sigma}$

Example: Jill gets a raw score of 55 on a standardized math test (μ = 50, σ = 10). What is Jill's z score?
$z = \frac{X - \mu}{\sigma} = \frac{55 - 50}{10} = \frac{5}{10} = .5$
So Jill is .5 standard deviation above the mean.

[Figure: the same normal curve labeled in z units from −3z to +3z, with the percentages and cumulative percents listed above]

View the area under the normal curve in terms of probability and percent:
• What is the probability of selecting a score that falls beyond +1z? p = .1587
• What is the probability of selecting a score that falls below −2z? p = .0228
• What is the percentile rank of someone who has a z score of 2? 98th percentile
• What is the percentile rank of someone who has a z score of 1? 84th percentile
• If we have a z score of 1.2, how can we find the probability or percentile rank?
  • We use the table of z scores provided in your course packet (see the Unit Normal Table on page 84).

Putting it all together
• Suppose Jack receives a raw score of 540 on the SAT-math (μ = 500, σ = 100). What are Jack's z score and percentile rank?
  • $z = \frac{540 - 500}{100} = .4$
  • Proportion (p) = .6554
  • Rank = 65.54 percentile
  [Figure: normal curve with raw scores 200, 300, 400, 500, 600, 700, 800 aligned with −3z through +3z; Jack marked at 540]
• Use a z score to determine an unknown raw score
  • Suppose an individual scored at the 70th percentile on a standardized test (μ = 100, σ = 10), but for some reason we don't know his raw score and need to calculate it.
  1. Use the equation: raw score = μ + zσ
  2. Convert the percentile rank to a probability (example: 70% → .7000).
  3. Use the z table to identify the z score associated with that probability.
     • .7000 corresponds to z = .52. (We could not find a probability of exactly .7000, so we used the closest one, .6985.)
  • Now just plug z, μ, and σ into the equation:
     raw score = μ + zσ = 100 + .52(10) = 105.2

Video #4: In-Class Practice Problems
For problems 1-4, apply the parameters (μ = 50, σ = 5).
1. Draw the distribution. Include z scores, the mean, and the standard deviation.
2. Bebe scored 48. Place Bebe's score on the distribution. What are her z score and percentile rank?
3. Kenny scored 63. Place Kenny's score on the distribution. What are his z score and percentile rank?
4. Sally is at the 71st percentile. Place Sally on the distribution. What are her z score and raw score?

For problems 5-7, use the following parameters from the GRE (μ = 500, σ = 100).
5. Mary scored 570. What are her z score and percentile rank?
6. Dick scored 340. What are his z score and percentile rank?
7. Jill is at the 38th percentile. What is her raw score?

For problems 8-10, use the parameters from an IQ test (μ = 100, σ = 15).
8. Wendy scored at the 90th percentile. What is her raw score?
9. What percent falls between the scores of 100 and 115?
10. Jack scored 80. What are his z score and percentile rank?

Answers for Class #4 In-Class Problems:
5) z = .7, percentile rank = 75.8
6) z = −1.6, percentile rank = 5.48
7) z = −.31, raw score = 469
8) z = 1.28, raw score = 119.2
9) 34.13% fall between the mean and +1z
10) z = −1.33, percentile rank = 9.2

Video #5: Distribution of Sample Means

With statistics, we are usually trying to make conclusions/inferences about the population from the studied sample.
• Consequently, we want to compare the sample to the population of similar samples. In doing so, two issues arise:
  • How do we know if a sample is representative of the population when every sample is different?
  • How can we transform a population distribution of individuals into a population distribution of sample means?
• Every sample differs somewhat from the population. This discrepancy/error between the sample and the population is known as sampling error.
• Random sampling is used to minimize sampling error, which occurs by chance.

If we were to take a population distribution of individuals and . . .
• randomly group the individuals into similar-sized samples,
• then calculate the means of these samples and place them into a frequency distribution,
• a normal curve would form. This distribution is known as the distribution of sample means.
• Any distribution of sample statistics (rather than individual scores) is referred to as a sampling distribution.

Characteristics of the distribution of sample means
• it approaches a normal distribution as sample size increases (with samples larger than n = 30, it is considered normal)
• its mean is equal to the population mean of individuals (μ) and is also known as the expected value of X̄
• its standard deviation is called the standard error of X̄
• standard error ($\sigma_{\bar{X}}$): measures the standard distance between a sample mean (X̄) and the population mean (μ); indicates how good an estimate X̄ will be for μ
  • $\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$
  • as sample size increases, the standard error decreases, which means that the samples are more representative of the population

Probability and the Distribution of Sample Means
We can now use the distribution of sample means to find the probability of obtaining a specific sample mean from the population of samples.
• Example: What is the probability of getting a sample mean of 515 or higher on the SAT-math (μ = 500, σ = 100) with a random sample of n = 400?
  • Calculate the standard error for samples of n = 400:
    $\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} = \frac{100}{\sqrt{400}} = \frac{100}{20} = 5$
  • Draw the distribution of sample means:

z                           −3z   −2z   −1z   0z    +1z   +2z   +3z
pop of individuals          200   300   400   500   600   700   800
pop of samples (n = 400)    485   490   495   500   505   510   515

  • A sample mean of 515 corresponds to +3z.
  • Using the z table, +3z corresponds to a probability of .0013 (.13%).
• What if the sample mean does not correspond to a whole z score?
  • Use $z = \frac{\bar{X} - \mu}{\sigma_{\bar{X}}}$
• Example: What is the probability of getting a sample mean of 104 or higher on an IQ test (μ = 100, σ = 15) with a random sample of n = 36?
  • Calculate the standard error for samples of n = 36:
    $\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} = \frac{15}{\sqrt{36}} = \frac{15}{6} = 2.5$
  • Draw the distribution of sample means:

z                           −3z    −2z   −1z    0z    +1z     +2z   +3z
pop of individuals          55     70    85     100   115     130   145
pop of samples (n = 36)     92.5   95    97.5   100   102.5   105   107.5

  • A sample mean of 104 corresponds to +1.6z: $z = \frac{\bar{X} - \mu}{\sigma_{\bar{X}}} = \frac{104 - 100}{2.5} = \frac{4}{2.5} = 1.6$
  • Using the z table, +1.6z corresponds to a probability of .0548 (5.48%).

Video #5: In-Class Practice Problems
1. A normal population has μ = 70 and σ = 12.
a. Sketch the population distribution. What proportion of the scores have values greater than X = 73?
b. Sketch the distribution of sample means for samples of size n = 16. What proportion of the means have values greater than X̄ = 73?
[Blank distribution template: −3z through +3z, with rows for the population of individuals and the population of samples]

2. For a normal population with μ = 70 and σ = 20, what is the probability of obtaining a sample mean greater than X̄ = 75
a. for a random sample of n = 4?
b. for a random sample of n = 16?
c. for a random sample of n = 100?
[Blank distribution template: −3z through +3z, with rows for samples of n = 4, n = 16, and n = 100]

Video #6: Hypothesis Testing

Hypothesis Testing: using sample data to evaluate a hypothesis (prediction) about the population so that conclusions/inferences can be made about the population from the sample
• We test a hypothesis to determine whether the treatment has caused a significant change in the population.
• The majority of sample means are in the middle of the distribution. For a sample to be significantly different, it should be among the extreme means in the tails of the distribution, where the probability is very low.

Steps in Hypothesis Testing
1. State the hypotheses
2. Establish significance criteria
3. Collect and analyze data
4. Evaluate the null hypothesis
5. Draw a conclusion

Step 1: Stating the Hypotheses
• Hypotheses should be stated in terms of the population.
• Like a research question, your hypothesis should include three parts: variables, relationship, and sample.
• Two hypotheses must be developed: an alternative and a null.
  • alternative hypothesis: the actual prediction about the change or relationship that may occur in the population
  • null hypothesis: a statement that the treatment has no effect on the population
  • Write the alternative hypothesis in statement form.
  • Write notation for both the alternative and the null.
• Hypotheses can also be directional or non-directional.
  • non-directional: just a prediction of a change/effect
    • key words: effect, impact, difference, cause
  • directional: a prediction of an increase or decrease
    • key words: increase, decrease, higher, lower, positive, negative

Summary of Hypotheses Notation (applying the example value μ = 60)
                                Alternative          Null
One-tailed (directional)        H1: μsprog > 60      H0: μsprog ≤ 60
                                H1: μsprog < 60      H0: μsprog ≥ 60
Two-tailed (non-directional)    H1: μsprog ≠ 60      H0: μsprog = 60

• Example: Suppose a local school district implemented an experimental program for science education. After one year, 100 children in the special program obtained a mean score of X̄ = 63 on a national science achievement test (μ = 60, σ = 12). Did the program have an impact on the participants' science achievement?
  • alternative: The science program will significantly affect science achievement among program participants. This is an example of a non-directional hypothesis.
    • H1: μsprog ≠ 60
  • null: The science program will NOT significantly affect science achievement among program participants.
    • H0: μsprog = 60

Step 2: Establish significance criteria
• How much does the population need to change to show a significant effect from the treatment?
• Is the change due to the treatment or to sampling error?
• Typically, to be significantly different, we require the sample mean to differ from 95% or 99% of the population of sample means.
• By setting a criterion that requires the change in the population mean to be quite large, and the probability of this change being due to chance to be very low, we decrease our chance of a Type I error.
  • this criterion is known as the level of significance or alpha level (α)
  • the most commonly used alpha levels are .05 (5%) and .01 (1%)
  • these levels of significance correspond to specific z scores, which depend upon whether the hypothesis is directional or non-directional
• non-directional hypothesis → 2-tailed test
  • .05 level → z_critical = ±1.96
  • .01 level → z_critical = ±2.58
  [Figure: normal curve from −3z to +3z; the middle 95% lies between −1.96z and +1.96z, and the middle 99% between −2.58z and +2.58z]
• directional hypothesis → 1-tailed test
  • .05 level → z_critical = +1.65 or −1.65
  • .01 level → z_critical = +2.33 or −2.33
  [Figure: normal curve from −3z to +3z; 95% of the distribution lies below +1.65z, and 99% below +2.33z]
• When the sample mean falls beyond the critical value (in the critical region), it differs significantly, so we reject the null.

Step 3: Collect & analyze sample data. Random selection is highly recommended so that the sample is representative of the population.
• Recall that when a test statistic is calculated by hand, you need to identify the critical value (z_critical). You then compare it to the test statistic (z_calculated) to determine significance.
• A computer automatically determines the probability of obtaining the test statistic by chance. Consequently, when determining significance you do NOT compare z_calculated to z_critical; instead, you examine the p-value (level of significance).
  • If p (or sig) is less than the alpha level (.05 or .01) → the test statistic is significant → reject the null.
  • If p (or sig) is greater than the alpha level (.05 or .01) → the test statistic is NOT significant → fail to reject the null.

Decision-making Table
Method              Comparison                   Significance?     Decision?             Conclusion
Hand calculations   z_calculated ≥ z_critical    Significant!      Reject null           Restate alternative
Hand calculations   z_calculated < z_critical    Not!              Fail to reject null   Restate null
Computer            p ≤ alpha                    Significant!      Reject null           Restate alternative
Computer            p > alpha                    Not!              Fail to reject null   Restate null

Step 4: Evaluate the null hypothesis
Compare the data with the null:
• if the sample data are significantly different, reject the null
• if the sample data are NOT significantly different, fail to reject the null

Step 5: Draw a conclusion
• If the null is rejected → restate the alternative hypothesis as the conclusion.
• If you fail to reject the null → state the null hypothesis as the conclusion.

Errors in Hypothesis Testing: two types of errors are possible when testing a hypothesis.
• Type I Error: rejecting the null when H0 is actually true, i.e., when there really isn't a significant change due to the treatment
  • this kind of error may be due to sampling error (the sample was above the population mean even before the treatment)
  • minimize Type I error by setting a low alpha (α) level (a low probability of making the error)
  • Type I error is more serious!
• Type II Error: failing to reject the null when we should have, i.e., when there really is a significant change due to the treatment
  • the treatment effect was not big enough to detect, most likely due to sampling error (the sample was below the population mean even before the treatment)

Putting it all together → Example of a two-tailed test
• Let's go back to our previous example of the science program: After one year, 100 children in the special program obtained a mean score of 63 on a national science achievement test (μ = 60, σ = 12). Did the program have an impact on the participants' science achievement? Test at the .05 level.
• Step 1: Develop hypotheses
  • State the alternative: The special science program will significantly affect science achievement among program participants.
  • Determine if it is a one-tailed or two-tailed test.
    • It is a non-directional hypothesis → two-tailed
  • Notation: H1: μsprog ≠ 60; H0: μsprog = 60
• Step 2: Establish significance criteria
  • Computer → α = .05
  • Hand calculations → identify the z scores used for the alpha level and the appropriate test: a two-tailed test at .05 corresponds to zcritical = ±1.96
• Step 3: Collect and analyze sample data
  • Computer → enter and analyze the data
  • Hand calculations → calculate the standard error:
    $\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} = \frac{12}{\sqrt{100}} = \frac{12}{10} = 1.2$
  • Draw the distribution of sample means and shade in the critical region.
    [Figure: distribution of sample means with the critical regions beyond ±1.96z shaded (middle 95% unshaded). z axis from -3z to +3z, with the matching scores for the population of individuals (24, 36, 48, 60, 72, 84, 96) and for the population of sample means, n = 100 (56.4, 57.6, 58.8, 60, 61.2, 62.4, 63.6).]
• Step 4: Compare sample data to null
  • Computer →
    • Identify the test statistic and level of significance (p-value) in the output: z = 2.49, p = .0128 (two-tailed)
    • Compare the level of significance with the alpha level: the p-value of .0128 is less than .05 → it is significant → reject the null
  • Hand calculations → calculate the test statistic: convert the sample mean into a z score to determine whether it falls in the critical region.
    $z = \frac{\bar{X} - \mu}{\sigma_{\bar{X}}} = \frac{63 - 60}{1.2} = \frac{3}{1.2} = 2.5$
    • It exceeds +1.96z, so it is significant; reject the null.
• Step 5: Draw conclusion
  • The null is rejected, so the alternative hypothesis is restated as the conclusion.
  • Participation in the science program did significantly affect science achievement scores among program participants.

Example of a one-tailed test: Suppose we took the same example but hypothesized that the program would cause a significant increase in achievement scores; this would be a directional hypothesis. In addition, let's change the level of significance to .01.
Recall: n = 100, X̄ = 63, μ = 60, σ = 12
• Step 1: Develop hypotheses
  • State the alternative: The special science program will significantly increase science achievement scores among program participants.
  • Determine if it is a one-tailed or two-tailed test.
    • It is a directional hypothesis → one-tailed
  • H1: μsprog > 60; H0: μsprog ≤ 60
• Step 2: Establish significance criteria
  • Computer → α = .01
  • Hand calculations → identify the z score used for the alpha level and the appropriate test: a one-tailed test at .01 corresponds to zcritical = +2.33. Since we are looking for an increase, we focus on the positive end of the distribution.
• Step 3: Collect and analyze sample data
  • Computer → enter and analyze the data
  • Hand calculations → calculate the standard error:
    $\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} = \frac{12}{\sqrt{100}} = \frac{12}{10} = 1.2$
  • Draw the distribution of sample means and shade in the critical region.
    [Figure: distribution of sample means with the critical region above +2.33z shaded (lower 99% unshaded). z axis from -3z to +3z, with the matching scores for the population of individuals (24 to 96) and for the population of sample means, n = 100 (56.4 to 63.6).]
• Step 4: Compare sample data to null
  • Computer →
    • Identify the test statistic and level of significance (p-value) in the output: z = 2.49, p = .0064 (one-tailed)
    • Compare the level of significance with the alpha level: the p-value of .0064 is less than .01 → it is significant → reject the null
  • Hand calculations → calculate the test statistic: convert the sample mean to a z score to determine whether it falls into the critical region.
    $z = \frac{\bar{X} - \mu}{\sigma_{\bar{X}}} = \frac{63 - 60}{1.2} = \frac{3}{1.2} = 2.5$
    • It exceeds +2.33z, so it is significant; reject the null.
• Step 5: Draw conclusion
  • The null is rejected, so the alternative hypothesis is restated as the conclusion.
  • Participation in the science program did significantly increase achievement scores among program participants.
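The arithmetic in both science-program examples can be checked with a short Python sketch that uses only the standard library's `statistics.NormalDist`. The function names `z_test` and `z_critical` are illustrative, not part of StatCrunch or any course software. Because the sketch uses the hand-calculated z = 2.5 rather than the output's z = 2.49, its p-values (about .0124 two-tailed and .0062 one-tailed) differ slightly from the computer values; the exact one-tailed .05 cutoff is 1.645, which the table above rounds to 1.65.

```python
from math import sqrt
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def z_test(sample_mean, mu, sigma, n, alternative="two-sided"):
    """One-sample z test; returns (standard error, z, p-value).

    alternative: "two-sided", "greater" (H1: mu > value) or "less" (H1: mu < value).
    """
    se = sigma / sqrt(n)                  # standard error of sample means
    z = (sample_mean - mu) / se           # z = (sample mean - mu) / standard error
    if alternative == "greater":
        p = 1 - STANDARD_NORMAL.cdf(z)    # area in the upper tail only
    elif alternative == "less":
        p = STANDARD_NORMAL.cdf(z)        # area in the lower tail only
    else:
        p = 2 * (1 - STANDARD_NORMAL.cdf(abs(z)))  # area in both tails
    return se, z, p


def z_critical(alpha, tails=2):
    """Positive critical z for alpha; use +/- for a two-tailed test."""
    return STANDARD_NORMAL.inv_cdf(1 - alpha / tails)


# Two-tailed example: H1: mu != 60, alpha = .05
se, z, p_two = z_test(63, 60, 12, 100)
print(f"SE = {se:.2f}, z = {z:.2f}, p = {p_two:.4f}, z crit = +/-{z_critical(.05):.2f}")

# One-tailed example: H1: mu > 60, alpha = .01
_, _, p_one = z_test(63, 60, 12, 100, alternative="greater")
print(f"p = {p_one:.4f}, z crit = +{z_critical(.01, tails=1):.2f}")
```

The `alternative` argument mirrors the choice between a directional and a non-directional hypothesis: a one-tailed p-value counts only the tail named in H1, so a sample that moves in the opposite direction correctly produces a large p.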
Assumptions for Hypothesis Testing with z Scores
• random sampling and independent observations
• the population standard deviation remains the same after the treatment; the treatment is like adding a constant: the mean changes but σ does not
• a normal sampling distribution

Reporting the Results of the Statistical Test
• The p-value is reported as:
  • reject the null: p < .05
  • fail to reject the null: p > .05
• A z test results statement includes the following parts:
  • the sample mean (M = 63)
  • z calculated, with the degrees of freedom in parentheses (z(99) = 2.5)
    • to calculate the degrees of freedom (df): df = n - 1
    • in our example, n = 100, so df = n - 1 = 100 - 1 = 99
  • the alpha level (p < .05)
  • two-tailed or one-tailed
  • the population mean and SD (μ = 60, σ = 12)

Example from the one-tailed test: Participation (M = 63) in the science program did significantly increase achievement scores; z(99) = 2.5, p < .05, one-tailed, when compared to the population (μ = 60, σ = 12).

Video #6: In-Class Practice Problems
Complete the process of hypothesis testing for each of the scenarios.

1. A high school counselor created a preparation course for the SAT-verbal (μ = 500, σ = 100). A random sample of n = 16 students complete the course and then take the SAT. The sample had a mean score of X̄ = 554. Does the course have a significant effect on SAT scores? Test at the .01 level.

Z-test results: μ: mean of Variable (Std. Dev. = 100)
H0: μ = 500
HA: μ ≠ 500
Variable | n | Sample Mean | Std. Err. | Z-Stat | P-value
var1 | 16 | 554 | 25 | 2.16 | 0.0308

a. Alternative hypothesis in sentence form.
b. Circle: one-tailed or two-tailed
c. Write the alternative and null hypotheses using correct notation.
   H1:
   H0:
d. zcalculated =
e. Level of significance (p) =
f. Circle: reject null or fail to reject null
g. Write your conclusion in sentence form.

2. A researcher believes that children who grow up as an only child develop vocabulary skills at a faster rate than children in large families.
To test this, a sample of n = 25 four-year-old only children are tested on a standardized vocabulary test (μ = 60, σ = 10). The sample obtains a mean of X̄ = 63.8. Test at the .05 level.

Z-test results: μ: mean of Variable (Std. Dev. = 10)
H0: μ = 10
HA: μ > 10
Variable | n | Sample Mean | Std. Err. | Z-Stat | P-value
var1 | 25 | 63.8 | 2 | 26.9 | <0.0001

a. Alternative hypothesis in sentence form.
b. Circle: one-tailed or two-tailed
c. Write the alternative and null hypotheses using correct notation.
   H1:
   H0:
d. zcalculated =
e. Level of significance (p) =
f. Circle: reject null or fail to reject null
g. Write your conclusion in sentence form.

There was an error when conducting this test: the population mean is NOT 10 but rather 60. The result is still significant, but the z statistic would have been z = (63.8 - 60)/2 = 1.90, with p ≈ .03.

3. A psychologist investigates IQ among autistic children to determine if their IQ is significantly different from the norm. Using a standardized IQ test (μ = 100, σ = 10), he tests 10 autistic children, all age 12. The following output was generated using StatCrunch. Test at α = .05.
Sample data are: 105, 110, 130, 150, 185, 100, 125, 95, 85, 120

Z-test results: μ: mean of Variable (Std. Dev. = 10)
H0: μ = 100
HA: μ ≠ 100
Variable | n | Sample Mean | Std. Err. | Z-Stat | P-value
var1 | 10 | 120.5 | 3.1622777 | 6.4826694 | <0.0001

a. Alternative hypothesis in sentence form.
b. Circle: one-tailed or two-tailed
c. Write the alternative and null hypotheses using correct notation.
   H1:
   H0:
d. zcalculated =
e. Level of significance (p) =
f. Circle: reject null or fail to reject null
g. Write your conclusion in sentence form.

Video #7: The t Statistic
To use the z score as a test statistic, we must know the population standard deviation in order to calculate the standard error of sample means. Unfortunately, most of the time we do not know σ, so what do we do?
The t statistic, commonly known as a t test, allows us to compare the sample to the null by using the sample standard deviation to estimate the standard error of sample means.

estimated standard error: $s_{\bar{X}} = \frac{s}{\sqrt{n}}$

The t statistic uses a formula very similar to z but substitutes the estimated standard error:
$t = \frac{\bar{X} - \mu}{s_{\bar{X}}} \qquad z = \frac{\bar{X} - \mu}{\sigma_{\bar{X}}}$

Tip on when to use which:
• if you know σ, use z
• if you don't know σ, use t

Since we are comparing a single sample mean to a population mean, this t test is called the Single Sample t Test or One Sample t Test.

The t Distribution
Since the t statistic uses the estimated standard error ($s_{\bar{X}}$), the t distribution only approximates the normal distribution, and its shape depends on the degrees of freedom (df = n - 1) rather than on the total sample size.
• As df and sample size increase, s represents σ more closely, and the t distribution better approximates the normal (z) distribution.
• Since the t distribution has more variability, it is more spread out and flatter.
• We use the t statistic in much the same way as z, in that we use a t distribution table to find the probability of a t statistic.
• Note: since the t statistic depends on the degrees of freedom, the critical t values corresponding to levels of significance (α) vary with df, unlike the critical z scores (where a two-tailed test at .05 always corresponds to zcritical = ±1.96).

Summary Table of Hypotheses Notation (applies values from the following example)
Test | Alternative | Null
One-tailed | H1: μ > 27 | H0: μ ≤ 27
Two-tailed | H1: μ ≠ 27 | H0: μ = 27

Reporting the Results of the t Test
A t test results statement includes the following parts:
• results with the sample mean and standard deviation (M = 24.58, SD = 3.48)
• t calculated, with the degrees of freedom in parentheses (t(11) = -2.40)
• the alpha level or p-value (p < .05)
• two-tailed or one-tailed

Example: Subjects (M = 24.58, SD = 3.48) spent significantly less time talking to parents than the therapist's claim;
t(11) = -2.40, p < .05, two-tailed.

Assumptions of the t test: independent observations, normal population.

Putting it all together → Example of a two-tailed t test
A family therapist states that parents talk to their teens an average of 27 minutes per week. Surprised by this claim, a counselor collects data on 12 teens and finds the following: X̄ = 24.58, s = 3.48. Does the amount of parent talk for the sample significantly differ from the therapist's claim? Test at the .05 level.
• Step 1: Develop hypotheses
  • State the alternative: The amount of parent talk for the sample will significantly differ from the norm.
  • Determine if it is a one-tailed or two-tailed test: it is a non-directional hypothesis → two-tailed
  • H1: μ ≠ 27 (the sample will be different)
  • H0: μ = 27 (the sample will NOT be different)
• Step 2: Establish significance criteria
  • Computer → α = .05
  • Hand calculations → identify tcritical for the alpha level, the appropriate test, and df: a two-tailed test at .05 (df = 11) corresponds to tcritical = ±2.201
• Step 3: Collect and analyze sample data
  • Computer → enter and analyze the data
  • Hand calculations → calculate the estimated standard error:
    $s_{\bar{X}} = \frac{s}{\sqrt{n}} = \frac{3.48}{\sqrt{12}} = \frac{3.48}{3.46} = 1.01$
• Step 4: Compare sample data to null → calculate the test statistic
  • Computer → identify the test statistic and p-value in the output: t(11) = -2.396, p ≈ .035 (two-tailed)
    • The p-value is less than alpha (.05), so it is significant → reject the null.
  • Hand calculations → convert the sample mean into a t statistic to determine whether it falls into the critical region:
    $t_{calculated} = \frac{\bar{X} - \mu}{s_{\bar{X}}} = \frac{24.58 - 27}{1.01} = \frac{-2.42}{1.01} = -2.396$
    • It falls beyond -2.201 in the lower tail, so it is significant; reject the null.
• Step 5: Draw conclusion
  • The amount of parent talk for the sample (M = 24.58, SD = 3.48) significantly differs from the norm; t(11) = -2.396, p < .05, two-tailed.

Video #7: In-Class Practice Problems
1. On a standardized spatial skills task, normative data reveal that people typically get μ = 15 correct solutions.
A psychologist tests n = 7 individuals who have brain injuries in the right cerebral hemisphere. For the following data, determine whether or not right-hemisphere damage results in reduced performance on the spatial skills task. Test at the .05 level.
Data: 12, 16, 9, 8, 10, 17, 10

T-test results: μ: mean of Variable
H0: μ = 15
HA: μ < 15
Variable | Sample Mean | Std. Err. | DF | T-Stat | P-value
var1 | 11.714286 | 1.3222327 | 6 | -2.4849744 | 0.0237

a. Independent Variable = ______ Scale (circle): Categorical Quantitative
b. Dependent Variable = ______ Scale (circle): Categorical Quantitative
c. Circle: One-tailed Two-tailed
d. Alternative hypothesis in sentence form.
e. Write the alternative and null hypotheses using correct notation.
   H1:
   H0:
f. tcalculated =
g. Level of significance (p) =
h. Circle: reject null or fail to reject null
i. Write your conclusion in sentence form.

2. A researcher would like to examine the effects of humidity on eating behavior. It is known that laboratory rats normally eat an average of μ = 21 grams of food each day. The researcher selects a random sample of n = 25 rats and places them in a controlled-atmosphere room where the relative humidity is maintained at 90%. On the basis of this sample, can the researcher conclude that humidity affects eating behavior? Test at the .05 level.

T-test results: μ: mean of Variable
H0: μ = 21
HA: μ ≠ 21
Variable | Sample Mean | Std. Err. | DF | T-Stat | P-value
var1 | 16.12 | 0.79229623 | 24 | -6.1593122 | <0.0001

a. Independent Variable = ______ Scale (circle): Categorical Quantitative
b. Dependent Variable = ______ Scale (circle): Categorical Quantitative
c. Circle: One-tailed Two-tailed
d. Alternative hypothesis in sentence form.
e. Write the alternative and null hypotheses using correct notation.
   H1:
   H0:
f. tcalculated =
g. Level of significance (p) =
h. Circle: reject null or fail to reject null
i. Write your conclusion in sentence form.

3. Does the average age of students enrolled in EDFI 641 differ significantly from the average age of BGSU grad students (24 years)? Test at the .01 level.

T-test results: μ: mean of Variable
H0: μ = 24
HA: μ ≠ 24
Variable | Sample Mean | Std. Err. | DF | T-Stat | P-value
var1 | 27.125 | 1.4314183 | 15 | 2.1831493 | 0.0453

a. Independent Variable = ______ Scale (circle): Categorical Quantitative
b. Dependent Variable = ______ Scale (circle): Categorical Quantitative
c. Circle: One-tailed Two-tailed
d. Alternative hypothesis in sentence form.
e. Write the alternative and null hypotheses using correct notation.
   H1:
   H0:
f. tcalculated =
g. Level of significance (p) =
h. Circle: reject null or fail to reject null
i. Write your conclusion in sentence form.

Video #8: t Test of Independent Samples
So far, we have only used one sample to draw inferences about one population. What if we want to compare two different groups, such as male vs. female or Treatment A students vs. Treatment B students?

The t Test of Independent Samples draws conclusions about two populations by comparing two samples. Since we are looking at differences between the two samples and between the two populations, the t statistic reflects these multiple comparisons:
$t_{single\,sample} = \frac{\bar{X} - \mu}{s_{\bar{X}}} \qquad t_{ind\,samples} = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{s_{\bar{X}_1 - \bar{X}_2}}, \quad \text{where } s_{\bar{X}_1 - \bar{X}_2} = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$

Recall that for the single sample t test, we calculated the estimated standard error. Since we are now comparing two samples to two populations, we calculate the standard error of sample mean differences.

Standard error of sample mean differences: the total amount of error involved in using two sample means to approximate two population means (it combines the error from the two sources).
• However, the preceding formula for $s_{\bar{X}_1 - \bar{X}_2}$ is only appropriate when the two samples are the same size. To correct for the bias in sample variances, we need to combine the two sample variances into a single value called the pooled variance.
Pooled Variance: a weighted average of the two sample variances, which allows the bigger sample to carry more weight.
$s_p^2 = \frac{SS_1 + SS_2}{df_1 + df_2}$
• Using the pooled variance, we can now calculate an unbiased measure of the standard error of sample mean differences:
$s_{\bar{X}_1 - \bar{X}_2} = \sqrt{\frac{s_p^2}{n_1} + \frac{s_p^2}{n_2}}$

Hypothesis Testing with the t Test of Independent Samples
The t Test of Independent Samples is used to test a hypothesis about the mean difference between two populations:
• the null hypothesis reflects no difference
• the alternative hypothesis reflects a difference

Test | Alternative | Null
One-tailed | H1: μ1 > μ2 OR H1: μ1 - μ2 > 0 | H0: μ1 ≤ μ2 OR H0: μ1 - μ2 ≤ 0
Two-tailed | H1: μ1 ≠ μ2 OR H1: μ1 - μ2 ≠ 0 | H0: μ1 = μ2 OR H0: μ1 - μ2 = 0

• rejection of the null → the data indicate a significant difference between the two populations
• failure to reject the null → the data indicate NO significant difference between the two populations

Assumptions of the t test of independent samples: independent observations; each population must be normal, and the populations must have equal variances (homogeneity of variance).

Putting it all together → Example of a one-tailed t test
A psychologist would like to examine the effects of fatigue on mental alertness. An attention test is prepared that requires subjects to sit in front of a blank TV screen and press a response button each time a dot appears on the screen. A total of 110 dots are presented during a 90-minute period, and the psychologist records the number of errors for each subject. Two groups of subjects are selected. The first group (n = 5) is tested after they have been awake for 24 hours (X̄ = 34, SS = 63). The second group (n = 10) is tested in the morning after a full night's sleep (X̄ = 24, SS = 100). Can the psychologist conclude that fatigue significantly increases errors on an attention task? Test at the .05 level.
• Step 1: Develop hypotheses
  • State the alternative: Fatigue will significantly increase the number of errors on an attention task.
  • It is a directional hypothesis → one-tailed
  • H1: μfatigue > μrested; H0: μfatigue ≤ μrested
• Step 2: Establish significance criteria
  • Computer → α = .05
  • Hand calculations → identify tcritical for the alpha level, the appropriate test, and df: a one-tailed test at .05 (df = 13) corresponds to tcritical = +1.771
• Step 3: Collect and analyze sample data
  • Computer → enter and analyze the data
  • Hand calculations → calculate the pooled variance:
    $s_p^2 = \frac{SS_1 + SS_2}{df_1 + df_2} = \frac{63 + 100}{4 + 9} = \frac{163}{13} = 12.54$
  • Calculate the standard error of sample mean differences:
    $s_{\bar{X}_1 - \bar{X}_2} = \sqrt{\frac{s_p^2}{n_1} + \frac{s_p^2}{n_2}} = \sqrt{\frac{12.54}{5} + \frac{12.54}{10}} = \sqrt{2.51 + 1.25} = 1.94$
• Step 4: Compare sample data to null → calculate the test statistic
  • Computer → review the output:
    Two Sample T-test results (with pooled variances):
    μ1: mean of var2 where var1 = 1
    μ2: mean of var2 where var1 = 2
    H0: μ1 - μ2 = 0
    HA: μ1 - μ2 > 0
    Difference | Sample Mean | Std. Err. | DF | T-Stat | P-value
    μ1 - μ2 | 10 | 1.9360149 | 13 | 5.1652493 | <0.0001
    • Identify the test statistic and p-value in the output → t(13) = 5.17, p < .0001
    • Compare the p-value to the alpha level → p is less than .05 → reject the null
  • Hand calculations → calculate t:
    $t_{ind\,samples} = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{s_{\bar{X}_1 - \bar{X}_2}} = \frac{(34 - 24) - 0}{1.94} = \frac{10}{1.94} = 5.15$
    • tcalculated > tcritical, so reject the null.
• Step 5: Draw conclusion
  • The null is rejected, so the alternative hypothesis is restated as the conclusion.
  • Fatigue significantly increased the number of errors on the attention task; t(13) = 5.17, p < .0001, one-tailed.

Some additional thoughts when comparing groups:
• Create frequency polygons for each group to decide which measure of central tendency is appropriate and whether the groups follow a normal distribution.
• If possible, use information about known groups, such as norms from standardized tests, to compare sample data.
• Calculate effect size as a measure of the magnitude of the difference between the two groups. This has become very important in recent years.
• A t test will not calculate effect size. You must calculate it by hand.
  o A common index of effect size is r², the percentage of variance accounted for:
    $r^2 = \frac{t^2}{t^2 + df}$
  o For the fatigue example: r² = 5.17²/(5.17² + 13) = 26.73/39.73 ≈ .67.
• Typically, an effect size of 0.50 (50%) or larger signifies an important difference.
• Use inferential statistics very cautiously, especially when dealing with non-random samples; be very careful in generalizing your results to the population.

In-Class Practice Problems
1. Extensive data indicate that first-born children develop different characteristics than later-born children. For example, first-borns tend to be more responsible, hard working, higher achieving, and more self-disciplined than their later-born siblings. The following data represent scores on a test measuring self-esteem and pride. Samples of n = 10 first-born college freshmen and n = 20 later-born freshmen were each given the self-esteem test. Do these data indicate a significant difference? Test at the .05 level.

Summary statistics for var2 grouped by var1
var1 | n | Mean | Variance | Std. Dev. | Std. Err. | Median | Range | Min | Max | Q1 | Q3
1 | 10 | 43.1 | 17.211111 | 4.1486278 | 1.3119112 | 43.5 | 14 | 36 | 50 | 40 | 46
2 | 20 | 36.8 | 25.010527 | 5.0010524 | 1.1182693 | 36.5 | 18 | 30 | 48 | 33 | 40

Two Sample T-test results (with pooled variances):
μ1: mean of var2 where var1 = 1
μ2: mean of var2 where var1 = 2
H0: μ1 - μ2 = 0
HA: μ1 - μ2 ≠ 0
Difference | Sample Mean | Std. Err. | DF | T-Stat | P-value
μ1 - μ2 | 6.3 | 1.8372631 | 28 | 3.4290135 | 0.0019

a. Independent Variable = ______ Scale (circle): Categorical Quantitative
b. Dependent Variable = ______ Scale (circle): Categorical Quantitative
c. Circle: One-tailed Two-tailed
d. Alternative hypothesis in sentence form.
e. Write the alternative and null hypotheses using correct notation.
   H1:
   H0:
f. tcalculated =
g. Level of significance (p) =
h. Circle: reject null or fail to reject null
i. Write your conclusion in sentence form.
j. effect size r² =

2. Does level of anxiety (measured on a scale from 1 to 10) when enrolling in a statistics class differ by gender? Test at the .05 level.
Summary statistics for var2 grouped by var1
var1 | n | Mean | Variance | Std. Dev. | Std. Err. | Median | Range | Min | Max | Q1 | Q3
1 | 10 | 7.1 | 8.1 | 2.8460498 | 0.9 | 8 | 7 | 3 | 10 | 4 | 10
2 | 10 | 5.6 | 6.711111 | 2.5905812 | 0.8192137 | 5 | 7 | 3 | 10 | 4 | 7

Two Sample T-test results (with pooled variances):
μ1: mean of var2 where var1 = 1
μ2: mean of var2 where var1 = 2
H0: μ1 - μ2 = 0
HA: μ1 - μ2 ≠ 0
Difference | Sample Mean | Std. Err. | DF | T-Stat | P-value
μ1 - μ2 | 1.5 | 1.2170091 | 18 | 1.2325299 | 0.2336

a. Independent Variable = ______ Scale (circle): Categorical Quantitative
b. Dependent Variable = ______ Scale (circle): Categorical Quantitative
c. Circle: One-tailed Two-tailed
d. Alternative hypothesis in sentence form.
e. Write the alternative and null hypotheses using correct notation.
   H1:
   H0:
f. tcalculated =
g. Level of significance (p) =
h. Circle: reject null or fail to reject null
i. Write your conclusion in sentence form.
j. effect size r² =

Additional Practice: Interpreting Research Articles, t-test of Independent Samples
Read the following excerpt to complete the questions on the next page:
Researchers studied women enlisted in the Navy and examined the impact of sexual harassment on their satisfaction with the military. Among the participants, 436 were sexually harassed and 582 were not. Participants completed a 7-item questionnaire that used a 5-point scale in which higher scores indicate more positive perceptions. Item 3 scores have been reversed to align with the positive nature of the other items.

Table 1. Mean responses and t-test results
Question | Mean Harassed | Mean Not Harassed | t
1. I would recommend the Navy to others. | 3.31 | 3.60 | 3.76*
2. I am satisfied with my rating. | 3.24 | 3.56 | 4.02*
3. I plan to leave the Navy because I am dissatisfied. | 3.17 | 3.67 | 5.89*
4. My experiences have encouraged me to stay in the Navy. | 2.24 | 2.58 | 4.56*
5. This command provides the information people need to make decisions about staying in the Navy. | 2.71 | 3.00 | 3.80*
6. In general, I am satisfied with the Navy. | 3.29 | 3.68 | 5.41*
7. I intend to stay in the Navy for at least 20 years. | 2.66 | 3.22 | 5.63*
* indicates p < .001

Source: Newell, C. E., Rosenfeld, P., & Culbertson, A. L. (1995). Sexual harassment experiences and equal opportunity perceptions of Navy women. Sex Roles, 32, 159-168.

1. Which group of Navy women is more likely to recommend the Navy to others? In other words, which group has the higher mean for item 1?
2. Is the mean difference for item 1 statistically significant?
3. Should we reject the null hypothesis for item 1? Explain.
4. How many items generated statistically significant mean differences?
5. In general, what can we conclude about sexual harassment and Navy satisfaction?

Answers: 1) Those who have NOT been sexually harassed have the higher mean and are more likely to recommend the Navy to others. 2) Yes, it is significant at the p < .001 level. 3) Yes, the t result is significant at p < .001. 4) All seven items were significant. 5) Navy women who have NOT been sexually harassed are more satisfied with the Navy than those who have been sexually harassed.

Video #9: t Test of Related Samples
Research often evaluates the effect of a treatment by using a pretreatment and posttreatment design with a single sample; this is called a repeated measures study.
• Since the test uses the same sample, there is no risk that one group differs from another even before the treatment begins.
• Researchers try to build upon this concept when studying two samples by matching subjects from the two groups; this helps to eliminate pretreatment differences.
• The t test of related samples compares the differences between the pre- and posttreatment scores of the sample to pre-post differences in the population.
• difference score: D = X2 - X1
• mean of the differences: $\bar{D} = \frac{\Sigma D}{n}$

Computing the t of related samples
• Recall: $t_{single\,sample} = \frac{\bar{X} - \mu}{s_{\bar{X}}}$
• For the t of related samples, the sample data are the difference scores (D), and the population value we are interested in is NOT the population mean but the population mean difference (μD). Therefore,
  $t_{related\,samples} = \frac{\bar{D} - \mu_D}{s_{\bar{D}}}, \quad \text{where } s_{\bar{D}} = \frac{s}{\sqrt{n}}$
• We are not comparing the means of the pre and post scores; rather, the pre and post scores for each individual are compared!

Developing the hypotheses:
Test | Alternative | Null
One-tailed | H1: μD > 0 | H0: μD ≤ 0
Two-tailed | H1: μD ≠ 0 | H0: μD = 0

Assumptions of the related samples t test
• independent observations; normal distribution of the population of difference scores

Putting it all together → Example of a one-tailed t test
A researcher is interested in studying the effects of endorphins (the feel-good chemical that is released in the brain at the end of aerobic exercise) on pain tolerance. A sample of 16 subjects is obtained; each person's tolerance for pain is tested before and after a 50-minute session of aerobic exercise. On average, the pain tolerance for the sample was D̄ = 10.5 higher after exercise than it was before. The SS for the sample difference scores was SS = 960. Do these data indicate a significant increase in pain tolerance following exercise? Test at the .01 level.
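Before working through the steps by hand, the related-samples arithmetic for this problem can be previewed with a minimal Python sketch. The function names `related_t_from_ss` and `related_t` are hypothetical helpers, not from StatCrunch. Python's standard library has no t distribution, so the one-tailed critical value 2.602 (α = .01, df = 15) is copied from the t table used in this packet rather than computed.

```python
import math


def related_t_from_ss(d_bar, ss, n, mu_d=0.0):
    """Related-samples t from the mean difference (D-bar) and SS of the D scores."""
    df = n - 1
    s = math.sqrt(ss / df)          # standard deviation of the difference scores
    se = s / math.sqrt(n)           # estimated standard error of D-bar
    t = (d_bar - mu_d) / se         # t = (D-bar - mu_D) / standard error
    return s, se, t, df


def related_t(pre, post, mu_d=0.0):
    """Same test starting from each individual's raw pre and post scores."""
    d = [x2 - x1 for x1, x2 in zip(pre, post)]   # D = X2 - X1 for each person
    n = len(d)
    d_bar = sum(d) / n
    ss = sum((x - d_bar) ** 2 for x in d)       # sum of squared deviations of D
    return related_t_from_ss(d_bar, ss, n, mu_d)


# Endorphin example: D-bar = 10.5, SS = 960, n = 16, one-tailed test at alpha = .01
s, se, t, df = related_t_from_ss(10.5, 960, 16)
T_CRITICAL = 2.602  # one-tailed, alpha = .01, df = 15, copied from the t table
print(f"s = {s}, s_Dbar = {se}, t({df}) = {t}")
print("reject H0" if t > T_CRITICAL else "fail to reject H0")
```

With raw scores, `related_t` first forms each person's difference score, which is why the test compares individuals rather than group means; for instance, the hypothetical data pre = [1, 2, 3], post = [2, 4, 6] give D = [1, 2, 3] and t(2) ≈ 3.46.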
• Step 1: Develop hypotheses
  • State the alternative: Exercise will significantly increase pain tolerance.
  • It is a directional hypothesis → one-tailed
  • H1: μD > 0; H0: μD ≤ 0
• Step 2: Establish significance criteria
  • Computer → α = .01
  • Hand calculations → identify tcritical for the alpha level, the appropriate test, and df: a one-tailed test at .01 (df = 15) corresponds to tcritical = +2.602
• Step 3: Collect and analyze sample data
  • Computer → enter and analyze the data
  • Hand calculations →
    • Calculate the sample mean of the difference scores: D̄ = 10.5
    • Calculate the standard deviation of the D scores: $s = \sqrt{\frac{SS}{n - 1}} = \sqrt{\frac{960}{15}} = \sqrt{64} = 8$
    • Calculate the estimated standard error of D̄: $s_{\bar{D}} = \frac{s}{\sqrt{n}} = \frac{8}{\sqrt{16}} = 2$
• Step 4: Compare sample data to null → calculate the test statistic
  • Computer →
    • Identify the test statistic and p-value: t(15) = 5.25, p < .001
    • Compare the p-value with the alpha level: p < .001 is less than .01 → reject the null
  • Hand calculations → calculate t:
    $t_{related\,samples} = \frac{\bar{D} - \mu_D}{s_{\bar{D}}} = \frac{10.5 - 0}{2} = 5.25$
    • It exceeds tcritical → reject the null.
• Step 5: Draw conclusion
  • Aerobic exercise significantly increased pain tolerance; t(15) = 5.25, p < .001, one-tailed.

In-Class Practice Problems
1. An investigator for NASA examines the effect of cabin temperature on reaction time. A random sample of 10 astronauts and pilots is selected. Each person's reaction time to an emergency light is measured in a simulator where the cabin temperature is maintained at 70 degrees F and again the next day at 95 degrees F. Using the results of this experiment, can the investigator conclude that temperature has a significant effect on reaction time? Test at the .01 level.

Summary statistics
Column | n | Mean | Variance | Std. Dev. | Std. Err. | Median | Range | Min | Max | Q1 | Q3
var1 | 10 | 203 | 381.55554 | 19.533447 | 6.177018 | 205.5 | 55 | 176 | 231 | 183 | 216
var2 | 10 | 223 | 417.1111 | 20.423298 | 6.458414 | 224 | 65 | 190 | 255 | 206 | 240

Paired T-test results:
μD: mean of the differences between var1 and var2
H0: μD = 0
HA: μD ≠ 0
Difference | Sample Diff. | Std. Err. | DF | T-Stat | P-value
var1 - var2 | -20 | 1.67332 | 9 | -11.952286 | <0.0001

a. Independent Variable = ______ Scale (circle): Categorical Quantitative
b. Dependent Variable = ______ Scale (circle): Categorical Quantitative
c. Circle: One-tailed Two-tailed
d. Alternative hypothesis in sentence form.
e. Write the alternative and null hypotheses using correct notation.
   H1:
   H0:
f. tcalculated =
g. Level of significance (p) =
h. Circle: reject null or fail to reject null
i. Write your conclusion in sentence form.

2. Does eating oatmeal decrease cholesterol levels? A researcher implements a 30-day treatment that consists of eating a bowl of oatmeal every day for breakfast. Cholesterol is measured before (var1) and after (var2) the treatment for the 10 participants. An α = .05 was used.

Summary statistics
Column | n | Mean | Variance | Std. Dev. | Std. Err. | Median | Range | Min | Max | Q1 | Q3
var1 | 10 | 258.2 | 192.4 | 13.870832 | 4.3863425 | 257.5 | 40 | 240 | 280 | 245 | 270
var2 | 10 | 222 | 269.33334 | 16.411379 | 5.1897335 | 221 | 56 | 190 | 246 | 210 | 230

Paired T-test results:
μD: mean of the differences between var1 and var2
H0: μD = 0
HA: μD > 0
Difference | Sample Diff. | Std. Err. | DF | T-Stat | P-value
var1 - var2 | 36.2 | 4.319979 | 9 | 8.379669 | <0.0001

a. Independent Variable = ______ Scale (circle): Categorical Quantitative
b. Dependent Variable = ______ Scale (circle): Categorical Quantitative
c. Circle: One-tailed Two-tailed
d. Alternative hypothesis in sentence form.
e. Write the alternative and null hypotheses using correct notation.
   H1:
   H0:
f. tcalculated =
g. Level of significance (p) =
h. Circle: reject null or fail to reject null
i. Write your conclusion in sentence form.

Additional Practice: Interpreting Research Articles, t-test of Related Samples
Read the following excerpt to complete the questions on the next page:
Seventy-four drug users participated in a Behavioral Counseling Program to reduce drug use. Among the participants, 75% were male, 75% were adults, 12% were minority, and 25% were mandated to obtain counseling by a public agency.
With respect to drug use, about 50% used cocaine and 75% used marijuana. The Behavioral Counseling Program consisted of three parts: 1) stimulus control, including competing response training; 2) urge control procedure for interrupting incipient drug use urges, thoughts, and actions; and 3) behavior contracting, especially between youth and parents. Drug use was measured at the beginning of treatment, the end of treatment, and one month after treatment. Drug use decreased substantially from pretreatment to the end of treatment ( t=4.28, p<.001) with slight, nonsignificant decrease from end of treatment to the follow-up month ( t=.92,p=.72). The decrease from pretreatment to follow-up remained statistically significant ( t=4.42, p<.001). Source: Azrin, N. H., Acierno, R., Kogan, E. S., Donohue, B., Besalel, V. A., & McMahon, P.T. (1996). Follow-up results of supportive versus behavioral therapy for illicit drug use. Behavior Research and Therapy, 34, 41-46. 1. As is customary in journal article, the research did not state the null hypothesis. Write the appropriate null hypothesis for the first t-test result reported in the excerpt. 2. Should the null hypothesis written for item 1 be rejected? Explain. 3. Should the null hypothesis be rejected for the second t test reported in the excerpt. Explain. 4. The last difference in the excerpt was statistically significant at the .001 level. Was it also significant at the .05 level? Answers: 1)The treatment of Behavioral Counseling Program will NOT significantly reduce drug use among participants. 2) Yes, since the p-value is less than .05. 3) No, the p-value is greater than .05. 4)Yes, If it is significant at p<.001 then it is also significant at p<.05. Coke vs. Pepsi Experiment: t tests Page 55 We are going to conduct an experiment using the Coke vs. 
Pepsi Taste Test that investigates two research questions:
1) Are diet drinkers (when compared to regular drinkers) more accurate in tasting the difference between Coke and Pepsi?
  • This question will utilize a t-test of independent samples, which you can complete for 5 points of extra credit (Extra Credit #1).
2) When tasting the difference between Coke and Pepsi, is one's prediction of accuracy significantly different from one's actual ability/accuracy?
  • This question will utilize a t-test of related samples, which you will complete for 5 points of extra credit (Extra Credit #2).
In order to complete this experiment, you need at least one other person (who has the same pop preference as you) to participate. It would be great if you can find 2-4 more individuals.

Directions:
1. Identify your pop preference (Diet or Regular).
  • If you prefer diet pop, purchase one can/bottle of Diet Coke and one of Diet Pepsi.
  • If you prefer regular, purchase one can/bottle of Coke and one of Pepsi.
2. In addition to the pop, you will need the following supplies to complete this experiment:
  • 5 small paper cups for each participant
  • Pen or pencil
  • Napkins in case you spill
  • Pretzels or chips for "cleansing one's palate"
3. Once you have your supplies and participants together, record each participant's name in the first column of the data grid below and his or her preference (diet = 1, regular = 2) in the second column.

Data Grid
Name | Preference | Prediction % | Actual %

4. Have each participant predict how accurate he or she will be in identifying the pop as Coke or Pepsi. Since each person will be given 5 cups of pop, predict how many times out of 5 chances you will be correct in the identification process (e.g., 3/5). Then, convert that fraction into a percent (e.g., 3/5 = 60%). Record this percent in the third column of the grid.
5. Determine who will complete the taste test first.
Have that person turn away while another participant fills 5 cups with pop (make sure that some cups have Pepsi and other cups have Coke, and that you know which cups have which pop). Hint: Don't write the name of the pop on the bottom of the cup; it will show through as the person drinks the pop.
6. Have the taste tester proceed in identifying the pop in each cup, while another participant records the accuracy. Don't tell the results to the taster until all 5 cups have been tasted. Calculate the number of correct tastes out of five. Convert that fraction into a percent and record the percent in column 4 of the grid.
7. Once you and your fellow participants have finished the taste test, add your results to the spreadsheet below.
8. Go to StatCrunch and enter ALL the data from the spreadsheet (including the data provided for 15 individuals). You should have a minimum of n = 17 for your sample. Proceed with the t-test directions.
Extra Credit Worksheets are in the Computer Lab Packet!

Video #10: Analysis of Variance
Analysis of Variance (ANOVA) is a hypothesis testing procedure that evaluates mean differences between two or more treatments or groups; a t test can only compare two groups.
Single Factor Design: studies the effect that one factor (independent variable) has on the dependent variable. Note that although there is only one factor, this factor can have several categories (levels), so we can compare two or more groups/treatments.

Hypothesis Testing for ANOVA
• Null hypothesis states that there is no difference among the groups or treatments
  • H0: μ1 = μ2 = μ3
• Alternative hypothesis states that at least one mean is different from the others
  • H1: At least one mean will differ

ANOVA Test Statistic
ANOVA creates a test statistic called an F-ratio that is similar to the t statistic.
• Recall that $t = \frac{\text{obtained difference between sample means}}{\text{difference expected by chance (error)}}$; for a single sample, $t = \frac{\bar{X} - \mu}{s_{\bar{X}}}$.
• F is similar to t, but since there are more than two means to compare, variance is used to represent the differences among all the means being compared:
  $F = \frac{\text{variance (differences) between sample means}}{\text{variance (differences) expected by chance (error)}}$
• Like t, a large F value indicates a treatment effect (mean differences) that is unlikely to be due to chance.
• When the treatment has no effect, so that the means are the same (H0 is true), the F-ratio will be close to 1.00.

Distribution of F-ratios
• Like t, F is not a single distribution; there is a family of F distributions, determined by degrees of freedom.
• But the F distribution is not normal; it is positively skewed, and the degree of skew depends on the degrees of freedom of the two variances.
  • large df → nearly all F-ratios are clustered around 1.00
  • small df → the F-ratios are more spread out
• Since the F distribution is positively skewed, we look in only one tail for the difference. As a result, we don't need to indicate whether the test is one-tailed or two-tailed.
• Recall: we expect F near 1.00 if the null is true and a large F if the null is false
  • therefore, significant F-ratios will be in the upper tail of the F distribution

$F = \frac{\text{variance (differences) between group means}}{\text{variance (differences) expected by chance/error (within groups)}}$

Variance (differences) between groups can be due to:
• treatment effect
• individual differences (subjects within the various groups are different even before the treatment begins)
• experimental error (caused by poor equipment, lack of attention/knowledge on the researcher's part, unpredictable change of events)
Variance within groups can be due to:
• individual differences (subjects within the various groups are different even before the treatment begins)
• experimental error (caused by poor equipment, lack of attention/knowledge on the researcher's part, unpredictable change of events)
Consequently, if we divide the variance between treatments by the variance within treatments, the individual differences and error cancel out, so we can determine the treatment effect.
$F = \frac{\text{variance between groups}}{\text{variance within groups}} = \frac{\text{treatment effect} + \text{individual differences} + \text{error}}{\text{individual differences} + \text{error}}$

The last few steps of ANOVA require the following calculations:
• df between groups = k - 1, where k is the number of groups
• df within groups = N - k, where N is the total number of individuals in all groups
• MS between = variance between treatments = $\frac{SS_{between}}{df_{between}}$
• MS within = variance within treatments = $\frac{SS_{within}}{df_{within}}$
• F-ratio = $\frac{MS_{between}}{MS_{within}}$

Putting it all together
Example: A number of studies on jetlag have found that jetlag seems to be worse when people are traveling east. A researcher examines how many days it takes a person to adjust after taking a long flight. One group flies west across time zones (NY to CA); a second group flies east (CA to NY); and a third group takes a long flight within one time zone (San Francisco to Seattle). Perform an analysis of variance to determine if jetlag varies with the direction of travel. Use the .05 level of significance.

Computer Results
Analysis of Variance results for var2 grouped by var1
Sample means:
Group  n  Mean  Std. Error
1      6  2.5   0.4281744
2      6  6     0.57735026
3      6  0.5   0.2236068

ANOVA table:
Source      df  SS   MS         F-Stat    P-value
Treatments  2   93   46.5       41.02941  <0.0001
Error       15  17   1.1333333
Total       17  110

Step 1: Develop hypotheses
• State alternative: Direction of travel will significantly affect jetlag.
• H0: μ1 = μ2 = μ3
  H1: At least one mean will differ
Step 2: Establish significance criteria
• Computer → α = .05
Step 3: Collect and analyze sample data
• Computer → enter data
Step 4: Compare sample data to null → calculate test statistic
• Computer →
  • Identify test statistic and p-value: F(2, 15) = 41.03, p < .0001
  • Compare p-value with alpha level: .0001 is less than .05 → reject null
Step 5: Draw conclusion
• Direction of travel significantly affected jetlag.
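The df, MS, and F calculations above can be checked from the summary output alone. The Python sketch below is not part of the course's StatCrunch workflow; it assumes each group is described by its n, mean, and standard error exactly as StatCrunch prints them, recovers each group's SS as (n - 1)·n·SE² (since SE = s/√n), and then applies the formulas for df, MS, and F. The function name `anova_from_summary` and its tuple input format are hypothetical choices for illustration.

```python
def anova_from_summary(groups):
    """One-way (single-factor) ANOVA from per-group summary statistics.

    groups: list of (n, mean, std_error) tuples, one per group.
    This input format is an assumption chosen to mirror the
    "Sample means" block of the StatCrunch output.
    """
    k = len(groups)                      # number of groups
    N = sum(n for n, _, _ in groups)     # total number of individuals
    grand_mean = sum(n * m for n, m, _ in groups) / N

    # Between groups: distance of each group mean from the grand mean.
    ss_between = sum(n * (m - grand_mean) ** 2 for n, m, _ in groups)
    # Within groups: SE = s / sqrt(n), so s^2 = n * SE^2 and SS = (n - 1) * s^2.
    ss_within = sum((n - 1) * n * se ** 2 for n, _, se in groups)

    df_between = k - 1
    df_within = N - k
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return {
        "SS_between": ss_between, "SS_within": ss_within,
        "SS_total": ss_between + ss_within,
        "df_between": df_between, "df_within": df_within,
        "MS_between": ms_between, "MS_within": ms_within,
        "F": ms_between / ms_within,
    }


# Jetlag example: 1 = westbound, 2 = eastbound, 3 = same time zone.
jetlag = [(6, 2.5, 0.4281744), (6, 6.0, 0.57735026), (6, 0.5, 0.2236068)]
result = anova_from_summary(jetlag)
print(f"F({result['df_between']}, {result['df_within']}) = {result['F']:.2f}")
```

For the jetlag data this prints F(2, 15) = 41.03, matching the StatCrunch F-Stat of 41.02941. Because the standard errors in the output are rounded, SS within comes back as 17 only to about six decimal places, which is why the comparison is made at two decimals.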
Post Hoc Tests
So far, we have only been able to determine whether there is a significant difference (the treatment had an effect), but we are unable to determine which group is different. We could do a t test for each comparison, but running several hypothesis tests increases the risk of a Type I error. This risk is called the experimentwise alpha level: the overall probability of a Type I error over a series of separate hypothesis tests. Fortunately, there are some tests that are very conservative and allow us to determine which group is different after ANOVA has been conducted and a difference has been found; these are called post hoc tests.
The Scheffé test is the safest post hoc test used to compare two groups/treatments. It is safe because it uses the value of k from the original ANOVA to calculate the df, and the critical F-ratio from the original ANOVA to determine whether the comparison is significant. Unfortunately, StatCrunch is unable to conduct post hoc tests!

Reporting of ANOVA Results
Much of the time, an ANOVA summary table is presented that includes SS, df, and MS for each source as well as the F-ratio; in addition, a table of means and standard deviations (or, as below, standard errors) for each treatment will be presented. Using the previous example, the tables would look like the following:

            M     SE
Westbound   2.5   0.43
Eastbound   6.0   0.58
Same zone   0.5   0.22

ANOVA SUMMARY
Source               SS    df   MS     F
Between treatments   93    2    46.5   F = 41.03
Within treatments    17    15   1.13
Total                110   17

When space is an issue, the results should include the F-ratio with both degrees of freedom in parentheses and the p-value. Do NOT indicate one-tailed or two-tailed!
• Travel direction does affect jetlag; F(2, 15) = 41.03, p < .05.
Assumptions of ANOVA: independent observations; samples are selected from normal populations that also have equal variances.

ANOVA In-Class Practice Problems
1. The extent to which a person's attitude can be changed depends on how big a change you are trying to produce. In a classic study on persuasion, Aronson, et al.
(1985) obtained three groups of subjects. One group listened to a persuasive message that differed only slightly from the subjects' original attitudes. For the second group, there was a moderate discrepancy between the message and the original attitudes. For the third group, there was a large discrepancy between the message and the original attitudes. For each subject, the amount of attitude change was measured. Data were entered for the three groups (small, moderate, large discrepancy), and an ANOVA was used to determine if the amount of discrepancy between the original attitude and the persuasive argument has a significant effect on the amount of attitude change. Test at the .05 level.

Analysis of Variance results for var2 grouped by var1
Group  n  Mean       Std. Error
1      6  1.5        0.4281744
2      6  6.6666665  0.71492034
3      6  1          0.2581989

Source      df  SS          MS         F-Stat    P-value
Treatments  2   118.111115  59.055557  38.79562  <0.0001
Error       15  22.833334   1.5222223
Total       17  140.94444

a. Independent Variable = ________ Scale (circle): Categorical Quantitative
b. Dependent Variable = ________ Scale (circle): Categorical Quantitative
c. Alternative hypothesis in sentence form.
d. Write the alternative and null hypotheses using correct notation. H1:      H0:
e. F calculated =
f. Level of significance (p) =
g. Circle: reject null or fail to reject null
h. Write your conclusion in sentence form.

2. A psychologist would like to examine the relative effectiveness of three therapy techniques for treating mild phobias. A sample of N = 15 individuals who display a moderate fear of spiders is obtained. These individuals are randomly assigned to the three therapies. After a certain amount of therapy, the psychologist measures the degree of fear reported by each individual. ANOVA was conducted to determine if there are any significant differences among the three therapies. Test at the .05 level.

Analysis of Variance results for var2 grouped by var1
Group  n  Mean  Std. Error
1      5  4     0.70710677
2      5  1.6   0.50990194
3      5  1.4   0.50990194

Source      df  SS         MS         F-Stat     P-value
Treatments  2   20.933332  10.466666  6.1568627  0.0145
Error       12  20.4       1.7
Total       14  41.333332

a. Independent Variable = ________ Scale (circle): Categorical Quantitative
b. Dependent Variable = ________ Scale (circle): Categorical Quantitative
c. Alternative hypothesis in sentence form.
d. Write the alternative and null hypotheses using correct notation. H0:      H1:
e. F calculated =
f. Level of significance (p) =
g. Circle: reject null or fail to reject null
h. Write your conclusion in sentence form.

Additional Practice: Interpreting Research Articles
ANOVA
Read the following excerpt to complete the questions on the next page:
Researchers examined the impact of teacher self-efficacy on classroom technology use. Participants included 101 teachers from four elementary (K-6) schools in Northwest Ohio. Of the 101 participants, 13 were male. Teachers were administered the Teacher Attribute Survey (TAS), which measured classroom technology use (teacher, student, and overall). Teacher self-efficacy was also measured in the instrument and represented one's belief in affecting student performance. Low, moderate, and high levels of self-efficacy were created: a teacher with low self-efficacy was defined as scoring 3.29 or below, moderate self-efficacy as 3.30 to 4.60, and high self-efficacy as 4.61 and higher.

Table 1. Means and ANOVA results for Self-Efficacy groups and Technology Use
                   Technology Use Means by Level of Self-Efficacy
Technology Use     Low (n=12)   Moderate (n=78)   High (n=11)   ANOVA Results
Teacher Tech Use   1.73         2.15              2.36          F(2,98) = 3.77, p < .05
Student Tech Use   1.24         1.49              1.81          F(2,98) = 4.52, p < .05
Overall Tech Use   2.08         1.82              2.08          F(2,98) = 4.71, p < .05

1. Which type of technology use is the highest among all levels of self-efficacy?
2. Which group of teachers (low, moderate, or high self-efficacy) reports the highest technology use among their students?
3.
Write the null hypothesis for self-efficacy and overall technology use, where the ANOVA results indicate: F(2,98) = 4.71, p < .05.
4. Considering the null hypothesis that you wrote for item 3, should the null hypothesis be rejected? Explain.
Answers: 1) teacher technology use; 2) teachers with high self-efficacy (M = 1.81); 3) Self-efficacy will NOT significantly impact overall technology use among teachers; 4) Reject the null, F(2,98) = 4.71, p < .05.

Video #11: Correlation and Regression
Correlation: a statistical technique used to measure and describe a relationship between two quantitative variables. Correlation measures 3 characteristics:
• direction of relationship
  • positive: as one variable increases, so does the other (food intake & weight)
  • negative (inverse): as one variable increases, the other decreases (exercise & weight)
  [Figure: scatterplots of y against x showing a positive relationship (r = +.90) and a negative relationship (r = -.90)]
• form of relationship
  • linear: the points relating x and y fall along a straight line
  • curvilinear: the relationship between x and y curves (age across the lifespan is a variable that often creates a curvilinear relationship)
• degree (strength) of relationship
  • degree of relationship is reflected in a correlation coefficient (usually r)
  • r ranges from -1 to +1: 0 indicates no relationship, +1 indicates a perfect positive relationship, and -1 indicates a perfect negative relationship

Pearson Correlation Coefficient
• measures the degree and direction of the linear relationship between two variables
• $r = \frac{\text{degree to which X and Y vary together}}{\text{degree to which X and Y vary separately}} = \frac{SP}{\sqrt{SS_X SS_Y}}$
• since we will be computing variability for each variable as well as their variability together, we will be using SS and a new concept, SP, the sum of products
• the sum of products is used to compute the amount of covariability of two variables: $SP = \sum (X - \bar{X})(Y - \bar{Y})$

Correlation
• does NOT measure cause and effect
• when data have a limited range of scores, the value of the correlation can be distorted (most often weakened)
• interpreting the strength of the coefficient (practical significance), using the absolute value of r:
  • |r| ≥ .80 is very strong
  • |r| = .60 to .79 is strong
  • |r| = .40 to .59 is fair
  • |r| < .40 is weak
• to describe how accurately one variable predicts the other, square r. For example, if r = .60, then r² = .36, which can be interpreted as 36% of the variability in Y scores can be predicted from the relationship with X. r² is called the coefficient of determination because it measures the proportion of variability in one variable that can be determined from the relationship with the other variable.

Hypothesis Testing (hypotheses use the Greek letter rho, ρ, to signify the population correlation estimated by r)
             Alternative   Null
One-tailed   H1: ρ > 0     H0: ρ ≤ 0
Two-tailed   H1: ρ ≠ 0     H0: ρ = 0

Putting it all together
Example: To measure the relationship between anxiety level and test performance, a psychologist obtains a sample of n = 6 college students from an intro stats course. Students arrive fifteen minutes prior to the exam and complete physiological measures of anxiety (heart rate, skin resistance, blood pressure, etc.). Anxiety ratings and exam scores are listed below. Compute the Pearson correlation to determine if a negative relationship exists between anxiety and test performance. Test at the .05 level.
• Step 1: Develop hypotheses.
  • State alternative: Anxiety and test performance will be negatively related.
  • It is a directional hypothesis → one-tailed
    H1: ρ < 0 (population shows negative correlation)
    H0: ρ ≥ 0 (population does not show negative correlation)
• Step 2: Establish significance criteria
  • Computer → StatCrunch does not calculate the p-value for the correlation coefficient.
As a result, we must identify r critical for α, tails, and df.
    • df = n - 2 = 6 - 2 = 4, r critical = -.729
    • Notice that df is n - 2 for correlation, since we need two points to create a line.
  • Hand calculations → Identify r critical for α, tails, and df
• Step 3: Utilize sample data to calculate r
  • Computer → enter data
  • Hand calculations → Calculate SP, SSX, SSY, r

Anxiety (X)  Exam Score (Y)  (X - X̄)  (Y - Ȳ)  (X - X̄)(Y - Ȳ)  (X - X̄)²  (Y - Ȳ)²
5            80              0         -3        0                 0          9
2            88              -3        5         -15               9          25
7            80              2         -3        -6                4          9
7            79              2         -4        -8                4          16
4            86              -1        3         -3                1          9
5            85              0         2         0                 0          4
X̄ = 5        Ȳ = 83                              SP = -32          SSX = 18   SSY = 72

• Step 4: Compare sample data to null → calculate test statistic
  • Computer → Identify test statistic and compare r calculated to r critical
    • Correlation between var2 and var1 is: -0.8888889
    • r falls into the critical region, so it is significant → reject null
  • Hand calculations →
    • Calculate $r = \frac{SP}{\sqrt{SS_X SS_Y}} = \frac{-32}{\sqrt{18(72)}} = \frac{-32}{36} = -.889$
    • Compare r calculated to r critical
    • r falls into the critical region → reject null
• Step 5: Draw conclusion
  • A negative relationship exists between anxiety and test performance, r(4) = -.889, p < .05, one-tailed.

Computer Output
Correlation between var2 and var1 is: -0.8888889

Regression
Regression: a statistical technique for finding the best-fitting straight line for a set of data; used when we want to determine the ability of one variable to predict another variable (e.g., using SAT score to predict freshman college GPA).
Regression line: the line that represents the linear relationship; it is represented by a linear equation
• Ŷ = a + bX, where a = Y-intercept and b = slope
• The least-squares method determines the best-fitting line by minimizing the sum of squared errors between the predicted and actual values of Y, which gives $b = \frac{SP}{SS_X}$ and $a = \bar{Y} - b\bar{X}$
Example: Using the correlation problem we just solved, let's calculate the regression line.
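The correlation and regression quantities for this example can be checked together with a short Python sketch (an illustration, not part of the StatCrunch workflow used in the course). It computes SP, SSX, SSY, r, and the least-squares slope and intercept directly from the definitional formulas above, using the anxiety (X) and exam score (Y) data. The helper name `correlation_and_regression` is hypothetical, and the sketch deliberately does not compute a p-value: significance is still judged by comparing r with r critical.

```python
import math


def correlation_and_regression(x, y):
    """Pearson r and the least-squares line from the definitional formulas.

    SP = sum((X - mean_X) * (Y - mean_Y))
    r  = SP / sqrt(SS_X * SS_Y)
    b  = SP / SS_X
    a  = mean_Y - b * mean_X
    """
    if len(x) != len(y) or len(x) < 3:
        raise ValueError("need paired X, Y data with at least 3 pairs")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    dx = [xi - mean_x for xi in x]      # deviations (X - mean_X)
    dy = [yi - mean_y for yi in y]      # deviations (Y - mean_Y)

    sp = sum(u * v for u, v in zip(dx, dy))   # sum of products
    ss_x = sum(u * u for u in dx)
    ss_y = sum(v * v for v in dy)

    r = sp / math.sqrt(ss_x * ss_y)
    b = sp / ss_x
    a = mean_y - b * mean_x
    return {"SP": sp, "SS_X": ss_x, "SS_Y": ss_y, "r": r,
            "df": n - 2, "b": b, "a": a}


# Anxiety (X) and exam score (Y) data from the example.
anxiety = [5, 2, 7, 7, 4, 5]
exam = [80, 88, 80, 79, 86, 85]
res = correlation_and_regression(anxiety, exam)
print(f"SP = {res['SP']:.0f}, SSX = {res['SS_X']:.0f}, SSY = {res['SS_Y']:.0f}")
print(f"r({res['df']}) = {res['r']:.3f}")
print(f"Y-hat = {res['a']:.2f} + ({res['b']:.2f})X")
print(f"Predicted Y for X = 7: {res['a'] + res['b'] * 7:.2f}")
```

For the anxiety data this prints SP = -32, SSX = 18, SSY = 72, r(4) = -0.889, Y-hat = 91.89 + (-1.78)X, and a predicted score of 79.44 for X = 7; a hand calculation that rounds a and b to two decimals first gives 79.43. Calling the same function on the salary practice data reproduces the StatCrunch slope (1.9723947) and intercept (23.135265).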
• Step 1: Use X̄, Ȳ, SSX, and SP to calculate b and a
  • (previously calculated: X̄ = 5, Ȳ = 83, SP = -32, SSX = 18, SSY = 72)
  • $b = \frac{SP}{SS_X} = \frac{-32}{18} = -1.7778$
  • $a = \bar{Y} - b\bar{X} = 83 - (-1.7778)(5) = 83 + 8.889 = 91.889$
• Step 2: Calculate regression equation
  • Ŷ = a + bX
    Ŷ = 91.89 - 1.78X
We can now use the regression equation to predict Y for a given value of X.
• If X = 7, what is the predicted value of Y?
  Ŷ = 91.89 - 1.78X
  Ŷ = 91.89 - 1.78(7) = 79.43

Computer Output
(The output in the video will appear different, since a different version of StatCrunch was used.)
Simple linear regression results:
Dependent Variable: var2
Independent Variable: var1
var2 = 91.888885 - 1.7777778 var1          <-- Regression equation
Sample size: 6
R (correlation coefficient) = -0.8889      <-- Correlation coefficient
R-sq = 0.79012346
Estimate of error standard deviation: 1.9436506

Parameter estimates:
Parameter  Estimate    Std. Err.   DF  T-Stat     P-Value
Intercept  91.888885   2.4241583   4   37.905483  <0.0001
Slope      -1.7777778  0.45812285  4   -3.88057   0.0178
(Ignore these p-values, since they are for the regression parameters, NOT for the correlation coefficient (r).)

Analysis of variance table for regression model:
Source  DF  SS         MS         F-stat     P-value
Model   1   56.88889   56.88889   15.058824  0.0178
Error   4   15.111111  3.7777777
Total   5   72

Predicted values:                          <-- Predicted value for Y when X = 7
X value  Pred. Y   s.e.(Pred. y)  95% C.I.               95% P.I.
7        79.44444  1.2120792      (76.07917, 82.809715)  (73.08468, 85.80421)

Video #11 In-Class Practice Problems
1. You have probably read about the relationship between years of education and salary potential. The following hypothetical data represent a sample of n = 10 men who have been employed for five years. Do these data indicate a significant relationship between years of higher education and salary? Test at the .05 level. Also find the regression equation for predicting salary from education.
(X) Years of Higher Education: 4, 4, 2, 8, 0, 5, 10, 4, 12, 0
(Y) Salary (in $1000s): 31, 29, 28, 42, 23, 35, 45, 27, 44, 24

Simple linear regression results:
Dependent Variable: salary
Independent Variable: education
salary = 23.135265 + 1.9723947 education
Sample size: 10
R (correlation coefficient) = 0.9601
R-sq = 0.92169785
Estimate of error standard deviation: 2.4466708

Parameter estimates:
Parameter  Estimate   Std. Err.   DF  T-Stat    P-Value
Intercept  23.135265  1.2611643   8   18.34437  <0.0001
Slope      1.9723947  0.20325504  8   9.704039  <0.0001

Analysis of variance table for regression model:
Source  DF  SS         MS         F-stat     P-value
Model   1   563.71045  563.71045  94.168365  <0.0001
Error   8   47.88958   5.9861975
Total   9   611.6

Predicted values:
X value  Pred. Y   s.e.(Pred. y)  95% C.I.               95% P.I.
5        32.99724  0.77397215     (31.212456, 34.78202)  (27.07964, 38.91484)

a. Independent Variable = ________ Scale: Categorical Quantitative
b. Dependent Variable = ________ Scale: Categorical Quantitative
c. Circle: One-Tailed OR Two-Tailed
d. Alternative hypothesis in sentence form.
e. Write the alternative and null hypotheses using correct notation. H1:      H0:
f. r critical =
g. r calculated =
h. Circle: reject null or fail to reject null
i. Write your conclusion in sentence form.
j. Regression equation:
k. If one has 5 years of education, what is the predicted salary?

2. Research has shown that similarity in attitudes, beliefs, and interests plays an important role in interpersonal attraction. A therapist examines the correlation in attitudes between husbands (X) and wives (Y). She administers a questionnaire that measures how liberal or conservative one's attitudes are. Low scores indicate that the person has liberal attitudes, while high scores indicate conservatism (scale 1-10). Ten couples participate. Test at the .01 level.
Simple linear regression results:
Dependent Variable: wife att
Independent Variable: hus att
wife att = 0.7785714 + 0.8035714 hus att
Sample size: 10
R (correlation coefficient) = 0.7869
R-sq = 0.61919034
Estimate of error standard deviation: 1.6673064

Parameter estimates:
Parameter  Estimate   Std. Err.   DF  T-Stat      P-Value
Intercept  0.7785714  1.4370375   8   0.54178923  0.6027
Slope      0.8035714  0.22280319  8   3.6066422   0.0069

Analysis of variance table for regression model:
Source  DF  SS         MS         F-stat     P-value
Model   1   36.160713  36.160713  13.007869  0.0069
Error   8   22.239286  2.7799108
Total   9   58.4

Predicted values:
X value  Pred. Y    s.e.(Pred. y)  95% C.I.                95% P.I.
5        4.7964287  0.57239175     (3.4764907, 6.1163664)  (0.73135275, 8.861505)

a. Independent Variable = ________ Scale: Categorical Quantitative
b. Dependent Variable = ________ Scale: Categorical Quantitative
c. Circle: One-Tailed OR Two-Tailed
d. Alternative hypothesis in sentence form.
e. Write the alternative and null hypotheses using correct notation. H1:      H0:
f. r critical =
g. r calculated =
h. Circle: reject null or fail to reject null
i. Write your conclusion in sentence form.
j. Regression equation:
k. If the husband has a moderate attitude of "5", what is the predicted value of the wife's attitude?

Additional Practice: Interpreting Research Articles
Correlation
Read the following excerpt to complete the questions on the next page:
Boivin and Hymel (1997) examined the relationships among social behavior, peer experiences, and self-perception. A total of 793 French Canadian children participated in the study (393 girls, 400 boys). The participants ranged from third to fifth grade, were from ten elementary schools, and came from a variety of socioeconomic backgrounds. The following variables were measured:
Aggression and withdrawal were measured by showing a picture of all classmates and asking each student to choose two classmates who best fit each descriptor.
For aggression, a score was obtained for each child by summing the number of times he or she was selected for these descriptors: "gets into lots of fights," "loses temper easily," "too bossy," and "picks on other kids." For withdrawal, a score was obtained for each child by summing the number of times he or she was selected for these descriptors: "rather play alone than with others" and "very shy."
Social preference was assessed by asking each child to name three other children they would most like and least like for playing together, inviting to a birthday party, and sitting next to on a bus. (Higher scores indicate greater social preference.)
Victimization by peers was measured by asking each child to nominate up to five other students who could be described as being made fun of, being called names, and getting hit and pushed by other kids. (Higher scores indicate greater victimization.)
Number of affiliative links was measured by asking, "You have probably noticed children in class who often hang around together and others who are more often alone. Could you name children who often hang around together?" (Higher scores indicate a larger number of affiliative links.)
Loneliness was measured with a 16-item questionnaire, with higher scores indicating greater loneliness.
Perceived social acceptance and behavior-conduct were two aspects of self-concept measured with Harter's Self-Perception Profile for Children. Higher scores reflect a better self-concept in each of the two domains.

Table 1. Correlations among the social behavior, peer experience, and self-perception measures
                                  1     2     3     4     5     6     7     8
1. Withdrawal                     --
2. Aggression                   -.10    --
3. Social Preference            -.39  -.44    --
4. Victimization by Peers        .42   .53  -.68    --
5. # of Affiliative Links       -.35   .05   .35  -.21    --
6. Loneliness                    .29   .12  -.34   .34  -.18    --
7. Perceived social acceptance  -.27  -.04   .28  -.26   .18  -.69    --
8. Perceived behavior-conduct    .06  -.32   .17  -.17  -.06  -.35   .39    --

Source: Boivin, M., & Hymel, S. (1997).
Peer experiences and social self-perceptions: A sequential model. Developmental Psychology, 33, 135-143.

Notice that the correlation coefficients are presented in a matrix. The column headers represent the same variables presented in the row headers; however, the column headers use only the number to indicate a certain variable. For example, the (circled) coefficient of .39 represents the correlation between "Perceived Social Acceptance" (column 7) and "Perceived Behavior-Conduct" (row 8).

1. What is the value of the Pearson r for the relationship between withdrawal and loneliness? Describe this value in terms of strength and direction.
2. What is the value of the Pearson r for the relationship between social preference and victimization by peers? Describe this value in terms of strength and direction.
3. Which variable has the strongest relationship with withdrawal?
4. Which variable has the weakest relationship with withdrawal?
5. The Pearson r for the relationship between withdrawal and loneliness indicates that those who tend to be more lonely tend to be:
   A. more withdrawn
   B. less withdrawn
6. Which of the following pairs has the strongest relationship between them?
   A. Perceived social acceptance and loneliness
   B. Withdrawal and victimization by peers
   C. Number of affiliative links and aggression
7. Which of the following pairs has the weakest relationship between them?
   A. Withdrawal and social preference
   B. Withdrawal and perceived social acceptance
   C. Withdrawal and perceived behavior-conduct
Answers: 1) .29, weak and positive; 2) -.68, strong and negative; 3) Victimization by peers, r = .42; 4) Perceived behavior-conduct, r = .06; 5) A, more withdrawn; 6) A; 7) C.

Video #12: Chi Square Test for Independence
So far we have used parametric tests to evaluate a hypothesis about the population.
Parametric tests require certain assumptions about the population parameters, such as a normal distribution, homogeneity of variance, and a quantitative (interval/ratio) dependent variable. When these assumptions for parametric tests cannot be fulfilled, nonparametric tests can be used.
Nonparametric tests
• usually do not state hypotheses in terms of population parameters and make few assumptions about the population distribution, so they are often called distribution-free tests
• are suited for data that utilize a nominal or ordinal scale
• are not as sensitive as parametric tests; they are more likely to fail to detect a real difference between two treatments
• one commonly used nonparametric test is the Chi Square Test for Independence

Chi Square Test for Independence
• Used to test for a relationship (differences) between two categorical variables
• If the variables are independent of one another, then there is no relationship. As a result, the distribution of one variable will have the same shape for all the categories of the second variable.
• The alternative hypothesis for the Chi Square Test for Independence can be written to focus on the relationship or on the differences:
  • H1: Gender is related to learning style.
  • H1: Learning style will differ by gender.
• The Chi Square Test for Independence compares the observed (fo) and expected (fe) frequencies. Our expected frequencies come from our null hypothesis and our observed data.
  $\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e}$
Building on our example of females and males with respect to learning styles, the table below presents the data observed for a sample of 125 males and 75 females.

          Audio  Visual  Kinesthetic  Total
Males     30     30      65           125
Females   30     25      20           75
Total     60     55      85           200

• If the distribution for gender is predicted to be the same for each learning style category, then the same proportion/percent of males and females in each category would be expected.
• To calculate the expected frequency for each cell, this formula is used:
  $f_e = \frac{f_c f_r}{n}$, where fc = column total, fr = row total, and n = sample size
• The table of expected frequencies (rounded to whole numbers) would look something like this:

          Audio              Visual             Kinesthetic        Total
Males     60(125)/200 = 38   55(125)/200 = 34   85(125)/200 = 53   125
Females   60(75)/200 = 22    55(75)/200 = 21    85(75)/200 = 32    75
Total     60                 55                 85

Degrees of freedom are calculated a bit differently
• df = (R - 1)(C - 1), where R = number of rows, C = number of columns
• in our example, df = (2 - 1)(3 - 1) = 1(2) = 2
• using this and α = .05, our χ² critical = 5.99

Putting it all together
Example: Based upon the observed frequencies presented in the table below, can a researcher conclude that learning styles differ by gender? Test at the .05 level.

          Audio  Visual  Kinesthetic  Total
Males     30     30      65           125
Females   30     25      20           75
Total     60     55      85           200

• Step 1: Develop hypotheses.
  • State alternative: Learning style will significantly differ by gender.
• Step 2: Establish significance criteria
  • Computer → α = .05
  • Hand calculations → Identify χ² critical for α and df
    • df = (2 - 1)(3 - 1) = 2, χ² critical = 5.99
• Step 3: Utilize sample data to calculate χ²
  • Computer → enter data
  • Hand calculations → Calculate expected frequencies (fe), fo - fe, (fo - fe)², and (fo - fe)²/fe

Cell                fo   fe   fo - fe   (fo - fe)²   (fo - fe)²/fe
male-audio          30   38   -8        64           1.68
female-audio        30   22   8         64           2.91
male-visual         30   34   -4        16           0.47
female-visual       25   21   4         16           0.76
male-kinesthetic    65   53   12        144          2.72
female-kinesthetic  20   32   -12       144          4.50
                                                     Σ = 13.04

• Step 4: Compare sample data to null → calculate test statistic
  • Computer → Identify test statistic and compare p-value to α level
    Statistic   DF  Value   P-value
    Chi-square  2   13.042  0.0019
    • p-value is less than .05 → reject null
  • Hand calculations →
    • Calculate χ² = 13.04
    • Compare χ² calculated to χ² critical
    • Since χ² = 13.04 exceeds χ² critical = 5.99, the null is rejected
• Step 5: Draw conclusion
  • Males and females differ in learning styles; χ²(2, n = 200) = 13.04, p < .05.
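The expected-frequency and χ² steps above can be sketched in Python as follows. This is an illustration under stated assumptions, not a description of how StatCrunch computes its output: observed counts are passed as a list of rows, fe = (row total)(column total)/n is computed without rounding, and the p-value uses the closed form p = e^(-χ²/2), which holds only when df = 2 (for any other df, use a chi-square table or a statistics library). The function name `chi_square_independence` is hypothetical. With unrounded expected frequencies, the learning-style data give χ² ≈ 12.56 (p ≈ .0019 for df = 2) rather than the hand-calculated 13.04. The difference comes from rounding fe to whole numbers, and the decision to reject the null is unchanged because both values exceed 5.99.

```python
import math


def chi_square_independence(observed):
    """Chi-square test for independence on an R x C table of counts.

    observed: list of rows, e.g. [[30, 30, 65], [30, 25, 20]].
    Returns (chi_square, df, expected); expected frequencies are
    left unrounded: fe = (row total * column total) / n.
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)

    expected = [[r * c / n for c in col_totals] for r in row_totals]
    chi_square = sum(
        (fo - fe) ** 2 / fe
        for obs_row, exp_row in zip(observed, expected)
        for fo, fe in zip(obs_row, exp_row)
    )
    df = (len(observed) - 1) * (len(col_totals) - 1)
    return chi_square, df, expected


# Learning-style example: rows = males, females; columns = audio, visual, kinesthetic.
learning = [[30, 30, 65], [30, 25, 20]]
chi2, df, fe = chi_square_independence(learning)
print(f"chi-square({df}, n=200) = {chi2:.2f}")
print("expected:", [[round(v, 3) for v in row] for row in fe])

# Closed-form p-value, valid ONLY for df = 2: p = exp(-chi2 / 2).
if df == 2:
    print(f"p = {math.exp(-chi2 / 2):.4f}")
```

The same function applied to the Senate prayer-amendment counts in the practice problems, [[19, 26], [37, 18]], gives χ² ≈ 6.30 with df = 1, matching the StatCrunch value of 6.3032928. For that table, the closed-form p-value line does not apply.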
Computer Output
Contingency table results:
Rows: var1 (1=male, 2=female)
Columns: var2 (1=audio, 2=visual, 3=kinesthetic)
Cell format: Count / Row percent / Column percent / Total percent

        1                             2                                3                               Total
1       30 / 24% / 50% / 15%          30 / 24% / 54.55% / 15%          65 / 52% / 76.47% / 32.5%       125 / 100.00% / 62.5% / 62.5%
2       30 / 40% / 50% / 15%          25 / 33.33% / 45.45% / 12.5%     20 / 26.67% / 23.53% / 10%      75 / 100.00% / 37.5% / 37.5%
Total   60 / 30% / 100.00% / 30%      55 / 27.5% / 100.00% / 27.5%     85 / 42.5% / 100.00% / 42.5%    200 / 100.00% / 100.00% / 100.00%

Statistic     DF   Value    P-value
Chi-square     2   13.042   0.0019

Assumptions of Chi Square Tests
• Random sampling
• Independence of observations
• Expected frequency for every cell should be at least 5

Reporting Chi Square Results
The statement should include the chi-square value with df and n in parentheses, and the p-value:
• Males and females differ in learning styles; χ²(2, n=200) = 13.04, p < .05.

Video #12 In-Class Practice Problems
1. The US Senate recently considered a controversial amendment for school prayer. The amendment did not get the required two-thirds majority, but the results of the vote are interesting when viewed in terms of the party affiliation of the senators. Does the vote on the prayer amendment (var2: 1=yes, 2=no) differ by political party (var1: 1=demo, 2=rep)? Test at the .05 level.

Contingency table results:
Rows: var1
Columns: var2
(Cell format: Count / Row percent / Column percent / Total percent)

        1                             2                             Total
1       19 / 42.22% / 33.93% / 19%    26 / 57.78% / 59.09% / 26%    45 / 100.00% / 45% / 45%
2       37 / 67.27% / 66.07% / 37%    18 / 32.73% / 40.91% / 18%    55 / 100.00% / 55% / 55%
Total   56 / 56% / 100.00% / 56%      44 / 44% / 100.00% / 44%      100 / 100.00% / 100.00% / 100.00%

Statistic     DF   Value       P-value
Chi-square     1   6.3032928   0.0121

a. Independent Variable = ________   Scale (circle): Categorical   Quantitative
b. Dependent Variable = ________   Scale (circle): Categorical   Quantitative
c. Alternative hypothesis in sentence form.
d. χ² calculated =
e. Level of significance (p) =
f. Circle: reject null or fail to reject null
g. Write your conclusion in sentence form. (1 pt)
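The four numbers in each cell of the computer output are simple ratios: count over row total, count over column total, and count over n. A minimal sketch, assuming the learning-style counts above; `cell_summary` and `report` are hypothetical helper names. It also checks the expected-frequency assumption (the smallest fe here is 20.625) and formats a reporting statement for a result significant at .05.

```python
# Observed counts (rows: 1=male, 2=female; columns: 1=audio, 2=visual, 3=kinesthetic).
observed = [
    [30, 30, 65],
    [30, 25, 20],
]
row_totals = [sum(row) for row in observed]        # [125, 75]
col_totals = [sum(col) for col in zip(*observed)]  # [60, 55, 85]
n = sum(row_totals)                                # 200


def cell_summary(i, j):
    """Count, row percent, column percent, and total percent for cell (i, j)."""
    count = observed[i][j]
    return (
        count,
        round(100 * count / row_totals[i], 2),
        round(100 * count / col_totals[j], 2),
        round(100 * count / n, 2),
    )


# Assumption check: the smallest expected frequency, fe = (fc * fr) / n.
min_expected = min(fc * fr / n for fr in row_totals for fc in col_totals)


def report(chi2, df, n):
    """Reporting statement for a result significant at the .05 level."""
    return f"χ²({df}, n={n}) = {chi2:.2f}, p < .05"


print(cell_summary(0, 1))  # male, visual -> (30, 24.0, 54.55, 15.0)
print(min_expected >= 5)
print(report(13.042, 2, n))
```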
2. A stats instructor would like to know whether it is worthwhile to require students to do weekly homework assignments. For one section of the course, homework is assigned, collected, and graded each week. For the second section, the same problems are recommended but not required. At the end of the semester, all students complete the same final exam. Letter grades (A, B, C, D, F) are tabulated for each student by section. Do these data indicate significant grade differences for students with homework versus no homework? Test at the .05 level.

Contingency table results:
Rows: var1
Columns: var2
(Cell format: Count / Row percent / Column percent / Total percent)

        1                               2                               3                               4                              5                              Total
1       6 / 30% / 66.67% / 14.29%       5 / 25% / 50% / 11.9%           5 / 25% / 45.45% / 11.9%        2 / 10% / 28.57% / 4.762%      2 / 10% / 40% / 4.762%         20 / 100.00% / 47.62% / 47.62%
2       3 / 13.64% / 33.33% / 7.143%    5 / 22.73% / 50% / 11.9%        6 / 27.27% / 54.55% / 14.29%    5 / 22.73% / 71.43% / 11.9%    3 / 13.64% / 60% / 7.143%      22 / 100.00% / 52.38% / 52.38%
Total   9 / 21.43% / 100.00% / 21.43%   10 / 23.81% / 100.00% / 23.81%  11 / 26.19% / 100.00% / 26.19%  7 / 16.67% / 100.00% / 16.67%  5 / 11.9% / 100.00% / 11.9%    42 / 100.00% / 100.00% / 100.00%

Statistic     DF   Value       P-value
Chi-square     4   2.4870248   0.647

a. Independent Variable = ________   Scale (circle): Categorical   Quantitative
b. Dependent Variable = ________   Scale (circle): Categorical   Quantitative
c. Alternative hypothesis in sentence form.
d. χ² calculated =
e. Level of significance (p) =
f. Circle: reject null or fail to reject null
g. Write your conclusion in sentence form. (1 pt)

Additional Practice: Interpreting Research Articles
Read the following excerpt to complete the questions that follow:
Researchers surveyed 120 college sophomores and juniors enrolled in general education psychology courses. Participants were between the ages of 18 and 23 and completed a survey that measured class absenteeism (cutting class for no valid reason) in the past month, nine negative behaviors, and two positive behaviors, all measured using yes/no responses.
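Both practice outputs can be checked with a short sketch. It assumes the tables exactly as printed (rows var1, columns var2) and computes the uncorrected Pearson χ²; `chi_square_test` and `chi2_upper_tail` are hypothetical helper names. The p-value uses the standard closed forms of the chi-square upper-tail probability for whole-number df, so no statistics library is needed.

```python
import math


def chi2_upper_tail(x, df):
    """P(X > x) for a chi-square variable with integer df >= 1."""
    half = x / 2
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum over i < df/2 of (x/2)^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) plus a finite correction series.
    p = math.erfc(math.sqrt(half))
    term = math.sqrt(2 * x / math.pi) * math.exp(-half)
    for k in range(1, (df + 1) // 2):
        p += term
        term *= x / (2 * k + 1)
    return p


def chi_square_test(table):
    """Pearson chi-square test of independence (no continuity correction).

    Returns (chi2, df, p, smallest expected frequency).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    expected = [[fr * fc / n for fc in col_totals] for fr in row_totals]
    chi2 = sum(
        (fo - fe) ** 2 / fe
        for obs_row, exp_row in zip(table, expected)
        for fo, fe in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df, chi2_upper_tail(chi2, df), min(min(row) for row in expected)


prayer_vote = [[19, 26], [37, 18]]                 # problem 1
homework_grades = [[6, 5, 5, 2, 2], [3, 5, 6, 5, 3]]  # problem 2

for table in (prayer_vote, homework_grades):
    chi2, df, p, min_fe = chi_square_test(table)
    print(f"chi2 = {chi2:.4f}, df = {df}, p = {p:.4f}, smallest fe = {min_fe:.2f}")
```

With these tables the sketch gives about 6.30 (df = 1, p ≈ .012) and 2.49 (df = 4, p ≈ .65), in line with the printed output. For problem 2, most expected frequencies are below 5 (the smallest is about 2.4), which bears on the expected-frequency assumption listed earlier. If SciPy is installed, `scipy.stats.chi2_contingency` runs the same test, but it applies Yates' continuity correction to 2 × 2 tables by default, so `correction=False` is needed to match 6.30.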
Negative behaviors included speeding, slapping/hitting someone, getting drunk, breaking the law, telling a significant lie, thinking about dropping out of school, feeling depressed, getting a tattoo, and piercing one's body. Positive behaviors were reading a book that wasn't required for class and visiting family.

Table 1. Number and percentage of students answering "yes" to behaviors, by groups of students who have cut class (n=68) and not cut class (n=52)

Behavior                        Cutting         Not Cutting      χ²
                                N     %         N     %
Getting drunk                   59    87        24    46         22.79**
Speeding                        63    93        39    75          7.19*
Breaking law                    35    51        10    19         13.07**
Telling significant lie         14    21         8    15          0.53
Thoughts of dropping out         8    12         3     6          0.79
Feeling depressed                7    10         5    10          0.02
Hitting/slapping                 8    12        11    21          1.95
Getting tattoo                  12    19         4     8          3.16
Piercing body                   18    26         7    13          3.17
Reading a non-required book     25    37        15    29          0.83
Visiting family                 62    91        40    77          4.61*
Note: * p<.05, ** p<.002

Source: Trice, A. D., Holland, S. A., & Gagne, P. E. (2000). Voluntary class absences and other behaviors in college students: An exploratory analysis. Psychological Reports, 87, 179-182.

1. What percentage of students who did not cut class report reading a non-required book?
2. Is the difference in frequencies for speeding significant for the two groups? Explain.
3. Write the null hypothesis for group differences in getting drunk.
4. Should the null hypothesis you wrote for item 3 be rejected? Explain.
5. What can you conclude about students who cut class and get drunk?

Answers: 1) 29%; 2) yes, χ² = 7.19, p < .05; 3) Students who cut class will NOT significantly differ in the behavior of getting drunk from students who do not cut class; 4) The null should be rejected since χ² = 22.79, p < .002; 5) Students who cut class are more likely to get drunk and vice versa.
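Question 1 and several entries in the χ² column can be reproduced from the counts in Table 1. This sketch assumes each behavior forms a 2 × 2 table (cut/not cut by yes/no) analyzed with the uncorrected Pearson χ²; that is an assumption about the study's method, not something the excerpt states, and some rows may differ slightly from the published values. The function names are hypothetical. For 2 × 2 tables, the chi-square formula simplifies to n(ad - bc)² / [(a+b)(c+d)(a+c)(b+d)].

```python
def two_by_two_chi_square(yes_a, n_a, yes_b, n_b):
    """Pearson chi-square for a 2 x 2 table built from 'yes' counts in two groups,
    using the shortcut n(ad - bc)^2 / [(a+b)(c+d)(a+c)(b+d)]."""
    a, b = yes_a, n_a - yes_a  # group A: yes, no
    c, d = yes_b, n_b - yes_b  # group B: yes, no
    n = n_a + n_b
    numerator = n * (a * d - b * c) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return numerator / denominator


def percent_yes(yes, group_n):
    """Whole-number percentage, as reported in Table 1."""
    return round(100 * yes / group_n)


CUT_N, NOT_CUT_N = 68, 52  # group sizes from Table 1

print(percent_yes(15, NOT_CUT_N))                                  # reading, not cutting -> 29
print(round(two_by_two_chi_square(59, CUT_N, 24, NOT_CUT_N), 2))  # getting drunk -> 22.79
print(round(two_by_two_chi_square(35, CUT_N, 10, NOT_CUT_N), 2))  # breaking law -> 13.07
```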
Statistical Test Grid
(Independent variable by dependent variable)
• Categorical IV, Categorical DV (1): Chi Square Test of Independence
• Categorical IV, Quantitative DV (2): t test (2 groups): Single Sample, Independent Samples, Related Samples; ANOVA (3+ groups)
• Quantitative IV, Quantitative DV (3): Pearson Correlation (relate); Regression (predict)

Overview Items
1. Does disability category (LD, EBD, none, etc.) differ by gender?
2. Does gender affect GRE scores?
3. Are GRE scores related to graduate GPA?
4. Does SES (low, middle, high) affect reading preparedness (as measured by a test) among preschoolers?
5. Does a seminar on self-esteem increase self-esteem scores? (Self-esteem was measured before and after the seminar.)
6. Does learning style type differ by hand preference?
7. Do ACT scores predict college freshman GPA?
8. Do BGSU's GRE scores for entering graduate students significantly differ from the population norm?
9. Does a reading intervention significantly increase 4th grade reading proficiency scores? Note: one group receives the intervention, while another group receives traditional instruction.
10. Does foot size (small, medium, large) affect IQ?
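The grid can be read as a small decision rule. This is a sketch under the assumption that the grid's three filled cells are the only cases covered (the quantitative IV / categorical DV cell is empty); `choose_test` and its parameter names are hypothetical. It stops at "t test": choosing among the single-sample, independent-samples, and related-samples versions still depends on the study design.

```python
def choose_test(iv_type, dv_type, groups=None, goal=None):
    """Map the Statistical Test Grid to a test name.

    iv_type, dv_type: "categorical" or "quantitative"
    groups: number of IV categories (needed for a categorical IV with a quantitative DV)
    goal: "relate" or "predict" (used when both variables are quantitative)
    """
    if iv_type == "categorical" and dv_type == "categorical":
        return "Chi Square Test of Independence"
    if iv_type == "categorical" and dv_type == "quantitative":
        if groups is None:
            raise ValueError("number of groups is needed")
        return "t test" if groups <= 2 else "ANOVA"
    if iv_type == "quantitative" and dv_type == "quantitative":
        return "Regression" if goal == "predict" else "Pearson Correlation"
    raise ValueError("combination not covered by the grid")


# Learning style (categorical) by gender (categorical), as in this chapter's example.
print(choose_test("categorical", "categorical"))
```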