Statistical Analysis of Data
  Graziano and Raulin
  Research Methods: Chapter 5

  This multimedia product and its contents are protected under copyright law. The following are prohibited by law: (1) Any public performance or display, including transmission of any image over a network; (2) Preparation of any derivative work, including the extraction, in whole or in part, of any images; (3) Any rental, lease, or lending of the program.
  Copyright © Allyn & Bacon (2007)

Individual Differences
  A fact of life
    – People differ from one another
    – People differ from one occasion to another
  Most psychological variables have small effects compared to individual differences
  Statistics give us a way to detect such subtle effects

Descriptive Statistics
  Are used to describe the data
  Many types of descriptive statistics
    – Frequency distributions
    – Summary measures
    – Graphical representations of the data
  A way to visualize the data
  The first step in any statistical analysis

Frequency Distributions
  First step in organization of data
    – Can see how the scores are distributed
  Used with all types of data
  Illustrate relationships between variables in a cross-tabulation
  Simplify distributions by using a grouped frequency distribution

Creating Frequency Distributions
  Create a column with all possible scores
  Count the number of people that fall into each score
    – Some frequencies may be zero (no one had that score)
  Can only do a frequency distribution if:
    – The scores are not continuous
    – The range of scores is not too large (becomes unwieldy)

Creating a Grouped Frequency Distribution
  Start by creating about 10-15 equal sized intervals sufficient to cover the range of scores
  Count the number of people in each interval
  Necessary whenever the distribution is continuous
  Useful when the range of scores is large

Cross-Tabulation
  A way to
  see the relationship between two nominal or ordinal variables
    – When done with score data, it is usually done as a scatter plot (covered later)
  Create a set of cells by listing the values of one variable as columns and the values of the other as rows

Cross-Tabulation Example
                 Males   Females   Total
  Democrats          4         5       9
  Republicans        6         1       7
  Other              7         1       8
  Total             17         7      24

Graphing Data
  Visual displays are often easier to comprehend
  Two types of graphs covered here
    – Histograms
    – Frequency Polygons

Histograms
  A bar graph, as shown at the right
  Can be used to graph either
    – Data representing discrete categories
    – Data representing scores from a continuous variable
  [Figure: Sample Histogram; x-axis: Scores (1 to 6), y-axis: Freq (0 to 60)]

Graphing 2 Distributions
  Possible to graph two or more distributions to see how they compare
  Note that one of the two groups in this histogram was the same group graphed previously
  [Figure: Sample Histogram; x-axis: Scores (1 to 6), y-axis: Freq (0 to 80)]

Frequency Polygon
  Like a histogram except that the frequency is shown with a dot, with the dots connected
  [Figure: Frequency Polygon; x-axis: Scores (1 to 6), y-axis: Frequency (0 to 60)]

Two Frequency Polygons
  Can compare two or more frequency polygons on the same scale
  Easier to compare groups because the graph appears less cluttered than multiple histograms
  [Figure: Frequency Polygons; x-axis: Scores (1 to 6), y-axis: Frequency (0 to 80)]

Shapes of Distributions
  Many psychological variables are distributed normally
  The distribution is skewed if scores bunch up at one end

Measures of Central Tendency
  Mode: the most frequently occurring score
    – Easy to compute from frequency distribution
  Median: the middle score in a distribution
    – Less affected than the mean by a few deviant scores
  Mean: the arithmetic average
    – Most commonly used central tendency measure
    – Used in later inferential statistics

Finding the Mode
  Easiest way to find the mode is to construct a frequency distribution first
  Find the score with the largest frequency
  If there are two or more scores that are tied for the largest frequency, report each of them

Computing the Median
  Order the scores from smallest to largest
  Determine the position of the middle score [(N+1)/2]
    – If 7 scores, the middle is the fourth score [(7+1)/2]=4
    – If 10 scores, the middle score is halfway between the 5th and 6th scores [(10+1)/2]=5.5

Computing the Mean
  Compute the mean of 3, 4, 2, 5, 7, & 5
  Sum the numbers (26)
  Count the numbers (6)
  Plug these values into the equation
    $\bar{X} = \frac{\sum X}{N} = \frac{26}{6} = 4.33$

Measuring Variability
  Range: lowest to highest score
  Average Deviation: average distance from the mean
  Variance: average squared distance from the mean
    – Used in later inferential statistics
  Standard Deviation: square root of variance

The Range
  Computing the Range
    – Find the lowest score
    – Find the highest score
    – Subtract the lowest from the highest score
  Easy to compute, but unstable because it relies on only two scores

The Average Deviation
  Computing the average deviation
    – Compute the mean
    – Compute the distance of each score from the mean (absolute distance, ignore sign)
    – Sum those distances and divide by the number of scores
  Easy to understand conceptually, but rarely used because it does not have good statistical properties

The Variance
  Computing the Variance
    – Compute the mean
    – Compute the distance of each score from the mean
    – Square those distances
    – Sum those squared distances and divide by the degrees of freedom (N - 1)
  Good statistical properties, but this measure of variability is in squared
  units

The Standard Deviation
  Computing the Standard Deviation
    – Compute the variance
    – Take the square root of the variance
  This measure, like the variance, has good statistical properties and is measured in the same units as the mean

Measures of Relationship
  Pearson product-moment correlation
    – Used with interval or ratio data
  Spearman rank-order correlation
    – Used when one variable is ordinal and the second is at least ordinal
  Scatter plots
    – Visual representation of a correlation
    – Helps to identify nonlinear relationships

Correlations
  Range from –1.00 to +1.00
    – –1.00 means a perfect negative relationship (as one score decreases, the other increases a predictable amount)
    – +1.00 means a perfect positive relationship
    – 0.00 means that there is no linear relationship

Linear Relationships
  Correlation coefficients are sensitive only to linear relationships
  Linear relationships mean that the points of a scatter plot cluster around a straight line
  Should always look at the scatter plot to see whether the correlation coefficient is appropriate

Regression
  Using a correlation to predict one variable from knowing the score on the other variable
  Usually a linear regression (finding the best fitting straight line for the data)
  Best illustrated in a scatter plot with the regression line also plotted (see Figure 5.6)

Reliability Indices
  Test-retest reliability and interrater reliability are indexed with a Pearson product-moment correlation
  Internal consistency reliability is indexed with coefficient alpha
  Details on these computations are included on the Student Resource Website

Standard Scores (Z-scores)
  A way to put scores on a common scale
  Computed by subtracting the mean from the score and dividing by the standard deviation
  Interpreting the Z-score
    – Positive Z-scores are above the mean; negative Z-scores are below the mean
    – The larger the absolute value of the Z-score, the further the score is from the mean

Inferential Statistics
  Used to draw inferences about populations on the basis of samples
  Sometimes called “statistical tests”
  Provide an objective way of quantifying the strength of the evidence for a hypothesis

Populations and Samples
  Population: the larger group of all participants of interest
  Sample: a subset of the population
  Samples almost never represent populations perfectly (sampling error)
    – Not really an error
    – Just the natural variability that you can expect from one sample to another

The Null Hypothesis
  States that there is NO difference between the population means
  Compare sample means to test the null hypothesis
  Population parameters & sample statistics
    – Population parameter: a descriptive statistic computed from everyone in the population
    – Sample statistic: a descriptive statistic computed from everyone in your sample

Statistical Decisions
  Either Reject or Fail to Reject the null hypothesis
    – Rejecting the null hypothesis suggests that there is a difference in the populations sampled
    – Failing to reject means that the evidence is insufficient to conclude that a difference exists
    – Decision is based on probability
    – Alpha: the statistical decision criterion used in testing the null hypothesis
    – Traditionally, alpha is set to small values (.05 or .01)
  Always a chance for error in our decision

Statistical Decision Process
                              Reject Null Hypothesis   Retain Null Hypothesis
  Null Hypothesis is True     Type I Error             Correct Decision
  Null Hypothesis is False    Correct Decision         Type II Error

Testing for Mean Differences
  t-test for independent groups: tests mean difference of two independent groups
  Correlated t-test: tests mean difference of two
  correlated groups
  Analysis of Variance: tests mean differences in two or more groups
    – Groups may or may not be independent
    – Also capable of evaluating factorial designs

Power of a Statistical Test
  Sensitivity of the procedure to detect real differences between populations
  A function of both the statistical test and the precision of the research design
  Increasing the sample size increases the power
    – Larger samples estimate the population parameters more precisely

Effect Size
  Indicates the size of the group differences
  Unlike the statistical test, the effect size is NOT affected by the size of the sample
  More details on effect size
    – In Chapter 15
    – On the Student Resource Website

Statistical versus Practical Significance
  Statistical significance: Is the observed group difference unlikely to be due to sampling error?
    – Can get statistical significance, even with very small population differences, if the sample size is large enough
  Practical significance looks at whether the difference is large enough to be of value in a practical sense
    – More concerned with the effect size

Meta-Analysis
  Relatively new statistical technique
  Allows researchers to statistically combine the results of several studies to get a sense of how powerful the effect is
    – Discussed in more detail in Chapter 15

Summary
  Statistics allow us to detect and evaluate group differences that are small compared to individual differences
  Descriptive versus inferential statistics
    – Descriptive statistics describe the data
    – Inferential statistics are used to draw inferences about population parameters on the basis of sample statistics
  Statistics objectify evaluations, but do not guarantee correct decisions
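
Worked recap of the descriptive computations (a sketch, not part of the original slides)
  The descriptive measures covered in this chapter (mean, median, mode, range, average deviation, variance, standard deviation, and Z-scores) can be traced step by step in a few lines of Python. The sketch below reuses the six scores from the Computing the Mean slide. The variable names are illustrative assumptions, and the variance divides by N - 1 as described on The Variance slide.

```python
import math
from statistics import median, multimode

# Scores taken from the "Computing the Mean" slide.
scores = [3, 4, 2, 5, 7, 5]
n = len(scores)

# Central tendency
mean = sum(scores) / n            # 26 / 6
middle = median(scores)           # halfway between the 3rd and 4th ordered scores
modes = multimode(scores)         # every score tied for the largest frequency

# Variability
score_range = max(scores) - min(scores)
average_deviation = sum(abs(x - mean) for x in scores) / n
variance = sum((x - mean) ** 2 for x in scores) / (n - 1)  # degrees of freedom: N - 1
std_dev = math.sqrt(variance)

# Standard scores: subtract the mean, divide by the standard deviation
z_scores = [(x - mean) / std_dev for x in scores]

print(f"mean = {mean:.2f}")                     # mean = 4.33
print(f"median = {middle}")                     # median = 4.5
print(f"mode(s) = {modes}")                     # mode(s) = [5]
print(f"range = {score_range}")                 # range = 5
print(f"average deviation = {average_deviation:.2f}")
print(f"variance = {variance:.2f}")
print(f"standard deviation = {std_dev:.2f}")
print("z-scores =", [round(z, 2) for z in z_scores])
```

  Note that the score of 7 gets the largest positive Z-score and the score of 2 the largest negative one, matching the interpretation rule on the Standard Scores slide.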