Research Methods in Crime and Justice
Chapter 14
Data and Information Analysis

Analysis
• During analysis researchers evaluate the data in order to answer research questions or hypotheses.
• Analysis occurs at the end of the research process, after the data are collected.
• But planning for analysis should start at the very beginning of the research process.

Quantitative Data Analysis
• Why do we not like statistics?
  – Some of us are not particularly fond of math.
  – Statisticians use strange words.
  – We really do not trust statisticians.
• There are two general categories of statistics.
  – Descriptive statistics
  – Inferential statistics

Descriptive Statistics
• Descriptive statistics describe the current state of something.
• These statistics provide us a single number that summarizes a characteristic of an entire sample or population.
  – Measures of central tendency
  – Measures of variability
  – Percentages, percentiles and percent change
  – Rates
  – The normal distribution

Measures of Central Tendency
• In quantitative data sets, data tend to cluster around a central value.
• Measures of central tendency tell us what is usual or typical about the cases in a sample or population.
• There are three commonly used measures of central tendency.
  – The mean
  – The median
  – The mode
• The mean is the average of all the values of a particular variable.
• It is calculated by adding together all of the values for a particular variable and dividing that sum by the total number of cases.
  – The most commonly used measure of central tendency.
  – Outliers, which are extremely high or low numbers, can dramatically change the mean.
• The median refers to the middlemost value.
• It is the value that is situated in the middle, with half the cases equal to or greater than and half the cases equal to or lesser than this value.
• Because the median does not depend on the sum of all values it is less susceptible to outliers.
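The mean and median calculations described above can be sketched in a few lines of Python. This is only an illustration: the function name `central_tendency` and the arrest counts are hypothetical and are not taken from the chapter. It uses the standard library `statistics` module to show how a single outlier moves the mean while leaving the median unchanged.

```python
from statistics import mean, median, multimode


def central_tendency(values):
    """Return the mean, median, and mode(s) of a list of numbers."""
    return {
        "mean": mean(values),        # sum of values divided by number of cases
        "median": median(values),    # middlemost value
        "modes": multimode(values),  # most frequently occurring value(s)
    }


# Hypothetical data: number of prior arrests for seven offenders.
arrests = [1, 2, 2, 3, 4, 5, 6]
# Same data, with the largest value replaced by an extreme outlier.
with_outlier = [1, 2, 2, 3, 4, 5, 60]

print(central_tendency(arrests))
print(central_tendency(with_outlier))
```

In this made-up example the outlier pulls the mean from roughly 3.3 up to 11, while the median stays at 3 and the mode stays at 2.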
Measures of Central Tendency
• The mode is the most frequently occurring value in a population or sample.
• In most cases, there is only one most frequently occurring value.
• The decision about which measure of central tendency to use should be based on:
  – Whether the data are skewed by outliers, and
  – What level the variables are measured at.
• For data skewed by outliers, the median or the mode would be more appropriate.
• The mode is the only measure available for nominal level variables.
• The mode and median are appropriate measures for ordinal level variables.
• The mean, median and mode are appropriate measures for interval and ratio level variables.

Measures of Variability
• Measures of variability tell us how much variation exists between the cases in a sample or population.
• There are two commonly used measures of variability.
  – The range
  – The standard deviation
• The range is the difference between the highest and lowest value in a sample or population.
• The range is computed by subtracting the smallest value from the largest value.
  – The most commonly used measure of variability.
  – Outliers, which are extremely high or low numbers, can dramatically change the range.
• The standard deviation considers how much each value varies from the mean.
• Higher standard deviations indicate higher levels of variation within a sample or population.
• Because the standard deviation considers both the mean and the total number of cases in the sample or population, it is not as susceptible to outliers.

Percentages
• A percentage is a portion of a sample or population.
• All percentages are based on a denominator of 100.
• Percentages are calculated by dividing the number of like cases by the total number of cases, then multiplying that quotient by 100.

Percentile
• A percentile is a statistic that tells us where a value ranks within a distribution.
• Sometimes this is referred to as the percentile rank.
• For example, if your score on an exam was at the 90th percentile, 90 percent of all the people who took the exam scored equal to or less than you.
• The median is at the 50th percentile.

Percent Change
• Percent change is a descriptive statistic that indicates how much something changed from one time to the next.
• We calculate the percent change by subtracting the original number from the new number, dividing that difference by the original number and then multiplying that quotient by 100.

Rates
• A rate is a descriptive statistic that tells us how common an event is within a standard segment of the population.
• In criminal justice and criminological research, we use a lot of rates.
• Rates enable us to compare similar behaviors across multiple locations.

The Normal Distribution
• When data are normally distributed:
  – 68.2 percent of all cases fall within one standard deviation of the mean,
  – 95.4 percent of all cases fall within two standard deviations of the mean,
  – 99.7 percent of all cases fall within three standard deviations of the mean, and
  – The mean, median and mode are equal.
• We can use this information to predict outcomes.

Inferential Statistics
• Inferential statistics provide information that can help us predict (i.e., infer) outcomes.
• There are six inferential statistical techniques commonly used in criminal justice research and practice.
  – t-test
  – Analysis of variance
  – Chi Square
  – Pearson r
  – Spearman rho
  – Multiple regression

Statistical Significance
• The first thing we want to know when looking at inferential statistics is whether the statistic is statistically significant.
• Statistical significance is a measure of the probability that the statistic is due to chance.
• As a general rule, if the statistical significance of a statistic is .05 or less, we can conclude that the results are not due to chance.
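The percent change, rate, and normal distribution slides above describe simple arithmetic that can be sketched as follows. The function names, the burglary counts, and the per-100,000 base are illustrative assumptions, not figures from the chapter. The last function uses the standard normal curve from Python's `statistics.NormalDist` to compute the share of cases within a given number of standard deviations of the mean, which works out to roughly 68.3, 95.4, and 99.7 percent.

```python
from statistics import NormalDist


def percent_change(original, new):
    """(new - original) / original, multiplied by 100."""
    return (new - original) / original * 100


def rate_per(events, population, per=100_000):
    """Events per `per` residents (the base of 100,000 is an assumption)."""
    return events * per / population


def share_within(k):
    """Percent of a normal distribution within k standard deviations of the mean."""
    z = NormalDist()  # standard normal: mean 0, standard deviation 1
    return (z.cdf(k) - z.cdf(-k)) * 100


# Hypothetical example: burglaries rose from 200 to 250 in a city of 150,000.
print(percent_change(200, 250))   # 25.0
print(rate_per(450, 150_000))     # 300.0
for k in (1, 2, 3):
    print(k, round(share_within(k), 1))
```

Because rates factor in population size, the same helper can be applied to cities of different sizes to make their counts comparable.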
t-test
• The t-test is a statistical technique used to determine whether or not two groups are different with respect to a single variable.
• The t-test requires interval or ratio level data.
• A t-test produces a t-score statistic.
• If the statistical significance of the t-score is .05 or less, we can conclude that the difference between the two groups is not due to chance.

Analysis of Variance
• The analysis of variance (ANOVA) can evaluate the difference between two or more groups with respect to a single variable.
• The ANOVA requires interval or ratio level data.
• An ANOVA produces an F-ratio statistic.
• If the statistical significance of the F-ratio is .05 or less, we can conclude that the difference between at least two of the groups is not due to chance.
• Determining which two groups are different requires the use of a post-hoc test.

Chi Square
• The Chi Square test is used to determine whether there is a difference between what we expected to happen and what actually happened.
• This statistical model requires nominal data.
• The operative statistic is called the chi-square statistic.
• If the statistical significance of the chi-square statistic is .05 or less, we can conclude that the difference between what happened and what was supposed to happen was not due to chance.
• A review of the contingency table is required to determine where the difference lies.

Pearson r
• The Pearson r is used to determine whether or not two variables are associated or correlated.
• It measures both the degree of correlation, as well as the nature of the correlation.
• In order to use the Pearson r, the data must be collected at the interval or ratio level.
• Numerically, the Pearson r statistic ranges from -1 to +1.
• The closer it is to -1 or +1, the higher the level of correlation between the two variables.
• The closer it is to 0, the lower the level of correlation between the two variables.
• If the statistic is positive (+), an increase (or decrease) in one variable is associated with an increase (or decrease) in the other.
• If the statistic is negative (-), an increase (or decrease) in one variable is associated with a decrease (or increase) in the other.
• The Pearson r is a useful statistical technique, but it has two important limitations.
  – It cannot be used to determine which variable is the cause and which variable is the effect.
  – It cannot determine whether two variables are related directly or indirectly.

Spearman rho
• The Spearman rho statistic, like the Pearson r, measures the level and nature of correlation between two variables.
• The Spearman rho is used for variables measured at the ordinal level of measurement.
• The range of the Spearman rho statistic is -1 to +1.

Multiple Regression
• Multiple regression enables the analyst to measure the individual and combined effects of various independent variables on a single dependent variable.
• The multiple regression model requires data collected at the interval or ratio levels.
• The primary statistics produced by a multiple regression are called coefficients.
  – The unstandardized coefficient allows the analyst to predict the value of the dependent variable with known values of the independent variable(s).
  – The standardized coefficient allows the analyst to rank order the independent variables in terms of their actual effect on the dependent variable.
• Regression models include numerous diagnostic statistics that indicate how well the independent variable(s) predict the outcome of the dependent variable.
• The most useful is the R².
• This is a measure of how much variation (in percentage form) in the dependent variable is explained by the independent variable(s).

Selecting an Appropriate Statistical Technique
• The decision on which statistical technique would be the most appropriate for the data collected during the research process depends on:
  – The level (nominal, ordinal, interval, ratio) at which the data are measured, and
  – The type (association or difference) of hypothesis.
• Use Chi Square when:
  – The data are collected at the nominal level, and
  – The hypothesis is one of difference.
• Use Spearman rho when:
  – The data are collected at the ordinal level, and
  – The hypothesis is one of association.
• Use Pearson r when:
  – The data are measured at the interval or ratio level,
  – The hypothesis is one of association, and
  – You do not need to use the independent variables to predict the outcome of the dependent variable.
• Use multiple regression when:
  – The data are measured at the interval or ratio level,
  – The hypothesis is one of association, and
  – You need to use the independent variables to predict the outcome of the dependent variable.
• Use a t-test when:
  – The data are measured at the interval or ratio level,
  – The hypothesis is one of difference, and
  – You are comparing the difference (with respect to a single variable) between two groups.
• Use an Analysis of Variance when:
  – The data are measured at the interval or ratio level,
  – The hypothesis is one of difference, and
  – You are comparing the difference (with respect to a single variable) between two or more groups.

Qualitative Data Analysis
• The subjective and interpretive nature of qualitative research produces a challenge in terms of data analysis.
• Human behavior may be interpreted in many different ways depending on the context in which the behavior occurs.
• The challenge of the qualitative researcher is to understand this subjective meaning, how it arises out of a particular social context, and how it relates to broader social patterns.
• There are six commonly used techniques in qualitative data analysis.
  – Transcription
  – Memoing
  – Segmenting
  – Coding
  – Diagramming
  – Matrices

Transcription
• Qualitative researchers often make audio or video recordings of their observations.
• These recordings must be transcribed into a written form prior to analysis.
• The process of producing a written transcript from video and audio recordings is known as transcription.
• The transcription process must capture the subjective elements and contextual nuances of the observation.
• An effective filing system that facilitates cross referencing is essential.

Memoing
• To enhance the quality of their transcripts, qualitative researchers record their thoughts or impressions within the text of the transcript.
• This process is commonly called memoing.
• Memos are often written in the field and added to the transcript later.
• Memos are essentially reminders of what the researcher is thinking at the time.
• Collectively, memos can reveal common patterns in qualitative data.

Segmenting
• Segmenting is a process used by researchers to organize or categorize qualitative data.
• This stage of qualitative data analysis occurs after the researcher is familiar with the data.
• The categories (or natural divisions) within qualitative data are often used to develop typologies.

Coding
• In the process of segmenting the data, researchers usually apply a particular name or descriptive word to the segments of the data that they identify as meaningful.
• This process of marking segments of the data with consistent names and terms is referred to as coding.
• There are two general types of codes in qualitative analysis.
  – A priori codes are names or labels that are established at the outset of the research project, prior to data collection.
  – Grounded codes are names or labels that are discovered within the qualitative data during or shortly after the data collection process.

Diagramming
• Written information can be effectively communicated through a visual image.
• Diagramming is a process whereby researchers develop visual images to illustrate common themes or interactions between qualitative data.
• Flow charts or hierarchical diagrams illustrate relationships within data and tell a ‘visual story’ of how the data ‘fit’ together.

Matrices
• Matrices are tables that illustrate relationships between variables.
• Matrices are similar to diagrams in that they illustrate relationships within qualitative data.
• The difference is that matrices are tabular, while diagrams are more figurative.

Getting to the Point
• During the analysis phase, researchers evaluate the data they gather to answer their research questions or hypotheses.
• Even though analysis occurs near the end of the research process, considerations of analysis should occur earlier in the research process.
• Statistics summarize large amounts of data into a single number and enable us to communicate information efficiently.
• There are two general types of statistics:
  – Descriptive statistics, and
  – Inferential statistics.
• Descriptive statistics describe the current state of something.
• An important set of descriptive statistics is known as the measures of central tendency.
• These measures include the mean, median, and mode.
• The mean is calculated by adding together all of the values for a particular variable and dividing that sum by the total number of cases.
• Although it is a good measure of central tendency, it is sensitive to extreme values, or outliers.
• The median is referred to as the middlemost value because it is the value that is situated in the middle, with half the cases equal to or greater than and half the cases equal to or lesser than this value.
• It is less susceptible to extreme values or outliers than the mean.
• The mode is the most frequently occurring value in a population or sample.
• Like the median, the mode is less susceptible to extreme values or outliers than the mean.
• The decision about which measure of central tendency to use should be based on two factors:
  – whether the data are skewed toward extreme scores, and
  – what level the variables are measured at.
• Measures of variability are descriptive statistics that tell us how much variation exists within a sample or population.
• Among the measures of variability is the range, which is the difference between the highest and lowest value in a sample or population.
• This descriptive statistic, like the mean, is susceptible to extreme scores or outliers.
• The standard deviation is a descriptive statistic that describes how much variability exists within a sample or population.
• Because the standard deviation considers both the mean and the total number of cases in the sample or population, it is a much more stable statistic than the range.
• A percentage is a descriptive statistic that describes a portion of a sample or population.
• Percentages are calculated by dividing the number of like cases by the total number of cases, then multiplying that quotient by 100.
• A percentile is a statistic that tells us where a value ranks within a distribution.
• Sometimes this is referred to as the percentile rank.
• We calculate the percentile rank by dividing the number of cases below the value by the total number of cases and then multiplying that quotient by 100.
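The range, percentage, and percentile rank formulas recapped above can be written out directly. The sketch below follows the wording of the summary (percentile rank counts only the cases below the value); the function names and the exam scores are hypothetical. The standard deviation uses `statistics.stdev`, the sample formula, which is an assumption because the chapter does not say whether the sample or population version is intended.

```python
from statistics import stdev


def value_range(values):
    """Largest value minus smallest value."""
    return max(values) - min(values)


def percentage(like_cases, total_cases):
    """Number of like cases divided by total cases, multiplied by 100."""
    return like_cases * 100 / total_cases


def percentile_rank(values, value):
    """Cases below `value` divided by total cases, multiplied by 100."""
    below = sum(1 for v in values if v < value)
    return below * 100 / len(values)


# Hypothetical exam scores for ten students.
scores = [55, 60, 65, 70, 75, 80, 85, 90, 95, 100]

print(value_range(scores))          # 45
print(percentage(25, 200))          # 12.5
print(percentile_rank(scores, 90))  # 70.0
print(stdev(scores))                # sample standard deviation (assumed variant)
```

A definition that counts cases "equal to or less than" the value, as in the exam example earlier in the chapter, would change `v < value` to `v <= value`.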
Getting to the Point
• Percent change is a descriptive statistic that indicates how much something changed from one time to the next.
• We calculate the percent change by subtracting the original number from the new number, dividing that difference by the original number and then multiplying that quotient by 100.
• Rates are descriptive statistics that enable us to compare similar behaviors across multiple locations.
• Rates factor in population size and report incidents per n units.
• In normally distributed data:
  – the mean, median and mode are equal,
  – 68.2 percent of all cases fall within one standard deviation of the mean,
  – 95.4 percent of all cases fall within two standard deviations of the mean, and
  – 99.7 percent of all cases fall within three standard deviations of the mean.
• Inferential statistics enable analysts to determine the probability of certain outcomes.
• When reading inferential statistics, we are concerned with statistical significance, which is a measure of the probability that the statistic is due to chance.
• If the statistical significance of a statistic is .05 or less, we can conclude that the results are not due to chance.
• The t-test is a statistical technique used to determine whether or not two groups are different with respect to a single variable.
• t-tests require interval or ratio level data.
• If the statistical significance of the t-score is .05 or less, it can be concluded that the difference between the two groups is not due to chance.
• The analysis of variance (ANOVA) model allows analysts to compare two or more groups to see if they are different with respect to a single variable measured at the interval or ratio level.
• An ANOVA produces an F-ratio statistic.
• If the statistical significance of the F-ratio is .05 or less, it can be concluded that the difference between at least two of the groups is not due to chance.
• The Chi Square test is used to determine whether there is a statistically significant difference between what we expect to happen and what actually happens.
• The operative statistic is called the chi-square statistic.
• If the statistical significance of the chi-square statistic is .05 or less, we conclude that the difference between what actually happened and what was expected to happen was not due to chance.
• The Pearson r is used to determine whether two variables measured at the interval or ratio level are correlated.
• The Pearson r coefficient ranges from -1 to +1.
• The closer it is to -1 or +1, the higher the level of correlation between the two variables.
• Positive Pearson r coefficients indicate a positive correlation.
• Negative Pearson r coefficients indicate a negative correlation.
• The Spearman rho statistic is similar to the Pearson r.
• This statistic indicates the level of correlation between variables measured at the ordinal level.
• It ranges from -1 to +1.
• Multiple regression enables the analyst to measure the individual and combined effects of various independent variables on a dependent variable.
• A multiple regression requires data collected at the interval or ratio levels.
• The decision as to which inferential statistical technique to use depends on:
  – the level at which the data are measured, and
  – the type of hypothesis that the study is testing.
• Qualitative researchers focus more on analyzing words than they do numbers; they attempt to explain the ‘how’ and ‘why’ of social processes.
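The chapter's rules for choosing an inferential technique (level of measurement plus type of hypothesis) can be expressed as a small decision function. This is a sketch of those rules only; the function name `select_technique` and its parameters are hypothetical. Where the chapter's rules overlap (a t-test and an ANOVA both cover two groups), the sketch arbitrarily returns the t-test.

```python
def select_technique(level, hypothesis, groups=None, predict=False):
    """Suggest a technique from the chapter's selection rules.

    level:      "nominal", "ordinal", "interval", or "ratio"
    hypothesis: "difference" or "association"
    groups:     number of groups compared (difference hypotheses only)
    predict:    True if independent variables must predict the outcome
    """
    level, hypothesis = level.lower(), hypothesis.lower()
    interval_or_ratio = level in ("interval", "ratio")

    if hypothesis == "difference":
        if level == "nominal":
            return "Chi Square"
        if interval_or_ratio and groups is not None and groups >= 2:
            # Both tests cover two groups; the t-test is chosen by assumption.
            return "t-test" if groups == 2 else "Analysis of Variance"
    elif hypothesis == "association":
        if level == "ordinal":
            return "Spearman rho"
        if interval_or_ratio:
            return "Multiple regression" if predict else "Pearson r"

    raise ValueError("The chapter's rules do not cover this combination.")


print(select_technique("ratio", "difference", groups=3))
print(select_technique("interval", "association", predict=True))
```

Combinations the chapter does not address, such as an ordinal variable with a hypothesis of difference, raise an error instead of guessing.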
Getting to the Point
• The process of producing a written transcript of interviews that have been video- or audiotaped is known as transcription.
• These transcripts provide the written data that qualitative researchers analyze.
• Qualitative researchers use a process called memoing to record their thoughts and ideas on the research data.
• Memoing is typically ongoing throughout the data collection process.
• Segmenting is a process used by researchers to organize or categorize qualitative data.
• This stage of qualitative data analysis occurs after the researcher has become familiar with the data.
• After segmenting the data, qualitative researchers go through their data and code it.
• Coding refers to a process whereby researchers:
  – identify recurring themes,
  – label these themes with a descriptive word or phrase (“codes”), and
  – organize their notes or transcripts according to these themes.
• Diagramming is a process by which researchers develop flow charts or hierarchical diagrams to illustrate relationships between different parts of their qualitative data.
• Researchers also use matrices, or tables, to illustrate such relationships.
• There are a number of software programs specifically designed for qualitative data analysis.
• These programs include ATLAS™, Nvivo™, NUD-IST™, and Ethnograph™.
• Using these and other programs, researchers and practitioners can mine data for patterns and other useful information.