```
Research Methods in Crime and Justice
Chapter 14
Data and Information Analysis
Analysis
• During analysis researchers evaluate the data
in order to answer research questions or
hypotheses.
• Analysis occurs at the end of the research
process, after the data are collected.
• But planning for analysis should start at the
very beginning of the research process.
Quantitative Data Analysis
• Why do we not like statistics?
– Some of us are not particularly fond of math.
– Statisticians use strange words.
– We really do not trust statisticians.
• There are two general categories of statistics.
– Descriptive statistics
– Inferential statistics
Descriptive Statistics
• Descriptive statistics describe the current state of
something.
• These statistics provide us a single number that
summarizes a characteristic of an entire sample
or population.
– Measures of central tendency
– Measures of variability
– Percentages, percentiles and percent change
– Rates
– The normal distribution
Measures of Central Tendency
• In quantitative data sets, data tend to cluster
around a central value.
• Measures of central tendency tell us what is
usual or typical about the cases in a sample or
population.
• There are three commonly used measures of
central tendency.
– The mean
– The median
– The mode
Measures of Central Tendency
• The mean is the average of all the values of a
particular variable.
• It is calculated by adding together all of the
values for a particular variable and dividing
that sum by the total number of cases.
– The most commonly used measure of central
tendency.
– Outliers, which are extremely high or low
numbers, can dramatically change the mean.
Measures of Central Tendency
• The median refers to the middlemost value.
• It is the value that is situated in the middle,
with half the cases equal to or greater than
and half the cases equal to or less than this
value.
• Because the median does not depend on the
sum of all values it is less susceptible to
outliers.
Measures of Central Tendency
• The mode is the most frequently occurring
value in a population or sample.
• In most cases, there is only one most
frequently occurring value.
Measures of Central Tendency
• The decision about which measure of central
tendency to use should be based on:
– Whether the data are skewed by outliers, and
– What level the variables are measured at.
Measures of Central Tendency
• For data skewed by outliers, the median or
the mode would be more appropriate.
• The mode is the only measure available for
nominal level variables.
• The mode and median are appropriate
measures for ordinal level variables.
• The mean, median and mode are appropriate
measures for interval and ratio level variables.
Measures of Variability
• Measures of variability tell us how much
variation exists between the cases in a sample
or population.
• There are two commonly used measures of
variability.
– The range
– The standard deviation
Measures of Variability
• The range is the difference between the
highest and lowest value in a sample or
population.
• The range is computed by subtracting the
smallest value from the largest value.
– The simplest measure of variability to compute.
– Outliers, which are extremely high or low
numbers, can dramatically change the range.
Measures of Variability
• The standard deviation considers how much
each value varies from the mean.
• Higher standard deviations indicate higher
levels of variation within a sample or
population.
• Because the standard deviation considers
both the mean and the total number of cases
in the sample or population, it is not as
susceptible to outliers as the range.
Percentages
• A percentage is a portion of a sample or
population.
• All percentages are based on a denominator
of 100.
• Percentages are calculated by dividing the
number of like cases by the total number of
cases, then multiplying that quotient by 100.
Percentile
• A percentile is a statistic that tells us where a
value ranks within a distribution.
• Sometimes this is referred to as the percentile
rank.
• For example, if your score on an exam was at
the 90th percentile, 90 percent of all the
people who took the exam scored equal to or
less than you.
• The median is at the 50th percentile.
Percent Change
• Percent change is a descriptive statistic that
indicates how much something changed from
one time to the next.
• We calculate the percent change by
subtracting the original number from the new
number, dividing that difference by the
original number and then multiplying that
quotient by 100.
Rates
• A rate is a descriptive statistic that tells us how
common an event is within a standard
segment of the population.
• In criminal justice and criminological research,
we also use a lot of rates.
• Rates enable us to compare similar behaviors
across multiple locations.
The Normal Distribution
• When data are normally distributed:
– About 68.3 percent of all cases fall within one standard
deviation of the mean.
– About 95.4 percent of all cases fall within two standard
deviations of the mean, and
– About 99.7 percent of all cases fall within three standard
deviations of the mean.
– The mean, median and mode are equal.
• We can use this information to predict outcomes.
Inferential Statistics
• Inferential statistics provide information that can
help us predict (i.e. infer) outcomes.
• There are six inferential statistical techniques
commonly used in criminal justice research and
practice.
– t-test
– Analysis of variance
– Chi Square
– Pearson r
– Spearman rho
– Multiple regression
Statistical Significance
• The first thing we want to know when looking
at inferential statistics is whether the statistic
is statistically significant.
• Statistical significance is a measure of the
probability of obtaining a result at least this
extreme if chance alone were at work.
• As a general rule, if the statistical significance
of a statistic is .05 or less, we conclude
that the results are unlikely to be due to chance.
t-test
• The t-test is a statistical technique used to
determine whether or not two groups are
different with respect to a single variable.
• The t-test requires interval or ratio level data.
• A t-test produces a t-score statistic.
• If the statistical significance of the t-score is
.05 or less, we can conclude that the
difference between the two groups is not due
to chance.
Analysis of Variance
• The analysis of variance (ANOVA) can
evaluate the difference between two or more
groups with respect to a single variable.
• The ANOVA requires interval or ratio level
data.
Analysis of Variance
• An ANOVA produces an F-ratio statistic.
• If the statistical significance of the F-ratio is
.05 or less, we can conclude that the
difference between at least two of the groups
is not due to chance.
• Determining which groups are different
requires the use of a post-hoc test.
Chi Square
• The Chi Square test is used to determine
whether there is a difference between what
we expected to happen and what actually
happened.
• This statistical model requires nominal data.
Chi Square
• The operative statistic is called the chi-square
statistic.
• If the statistical significance of the chi square
statistic is .05 or less, we can conclude that
the difference between what happened and
what was supposed to happen was not due to
chance.
• A review of the contingency table is required
to determine where the difference lies.
Pearson r
• The Pearson r is used to determine whether
or not two variables are associated or
correlated.
• It measures both the degree of correlation, as
well as the nature of the correlation.
• In order to use the Pearson r, the data must be
collected at the interval or ratio level.
Pearson r
• Numerically, the Pearson r statistic ranges
from -1 to +1.
• The closer it is to -1 or +1, the higher the level
of correlation between the two variables.
• The closer it is to 0, the lower the level of
correlation between the two variables.
Pearson r
• If the statistic is positive (+), an increase (or
decrease) in one variable is associated with an
increase (or decrease) in the other.
• If the statistic is negative (-), an increase (or
decrease) in one variable is associated with a
decrease (or increase) in the other.
Pearson r
• The Pearson r is a useful statistical technique,
but it has two important limitations.
– It cannot be used to determine which variable is
the cause and which variable is the effect.
– It cannot determine whether two variables are
related directly or indirectly.
Spearman rho
• The Spearman rho statistic, like the Pearson r,
measures the level and nature of correlation
between two variables.
• The Spearman rho is used for variables
measured at the ordinal level of
measurement.
• Like the Pearson r, the Spearman rho statistic
ranges from -1 to +1.
Multiple Regression
• Multiple regression enables the analyst to
measure the individual and combined effects
of various independent variables on a single
dependent variable.
• The multiple regression model requires data
collected at the interval or ratio levels.
Multiple Regression
• The primary statistics produced by a multiple
regression are called coefficients.
– The unstandardized coefficient allows the analyst
to predict the value of the dependent variable
with known values of the independent variable(s).
– The standardized coefficient allows the analyst to
rank order the independent variables in terms of
their relative effect on the dependent variable.
Multiple Regression
• Regression models include numerous
diagnostic statistics that indicate how
well the independent variable(s) predict the
outcome of the dependent variable.
• The most useful is the R².
• This is a measure of how much variation (in
percentage form) in the dependent variable is
explained by the independent variable(s).
Selecting an Appropriate Statistical
Technique
• The decision on which statistical technique
would be the most appropriate for the data
collected during the research process depends
on:
– The level (nominal, ordinal, interval, ratio) at
which the data are measured, and
– The type (association or difference) of hypothesis.
Selecting an Appropriate Statistical
Technique
• Use Chi Square when:
– The data are collected at the nominal level, and
– The hypothesis is one of difference.
• Use Spearman rho when:
– The data are collected at the ordinal level, and
– The hypothesis is one of association.
Selecting an Appropriate Statistical
Technique
• Use Pearson r when:
– The data are measured at the interval or ratio
level,
– The hypothesis is one of association, and
– You do not need to use the independent variables
to predict the outcome of the dependent variable.
Selecting an Appropriate Statistical
Technique
• Use multiple regression when:
– The data are measured at the interval or ratio
level,
– The hypothesis is one of association, and
– You need to use the independent variables to
predict the outcome of the dependent variable.
Selecting an Appropriate Statistical
Technique
• Use a t-test when:
– The data are measured at the interval or ratio
level,
– The hypothesis is one of difference, and
– You are comparing the difference (with respect to
a single variable) between two groups.
Selecting an Appropriate Statistical
Technique
• Use an Analysis of Variance when:
– The data are measured at the interval or ratio
level,
– The hypothesis is one of difference, and
– You are comparing the difference (with respect to
a single variable) between two or more groups.
Qualitative Data Analysis
• The subjective and interpretive nature of
qualitative research produces a challenge in
terms of data analysis.
• Human behavior may be interpreted in many
different ways depending on the context in
which the behavior occurs.
Qualitative Data Analysis
• The challenge of the qualitative researcher is
to understand this subjective meaning, how it
arises out of a particular social context, and
how it relates to broader social patterns.
Qualitative Data Analysis
• There are six commonly used techniques in
qualitative data analysis.
– Transcription
– Memoing
– Segmenting
– Coding
– Diagramming
– Matrices
Transcription
• Qualitative researchers often make audio or
video recordings of their observations.
• These recordings must be transcribed into a
written form prior to analysis.
• The process of producing a written transcript
from video and audio recordings is known as
transcription.
Transcription
• The transcription process must capture the
subjective elements and contextual nuances
of the observation.
• An effective filing system that facilitates cross
referencing is essential.
Memoing
• To enhance the quality of their transcripts,
qualitative researchers record their thoughts
or impressions, within the text of the
transcript.
• This process is commonly called memoing.
• Memos are often written in the field and
Memoing
• Memos are essentially reminders of what the
researcher is thinking at the time.
• Collectively, memos can reveal common
patterns in qualitative data.
Segmenting
• Segmenting is a process used by researchers
to organize or categorize qualitative data.
• This stage of qualitative data analysis occurs
after the researcher is familiar with the data.
• The categories (or natural divisions) within
qualitative data are often used to develop
typologies.
Coding
• In the process of segmenting the data,
researchers usually apply a particular name or
descriptive word to the segments of the data
that they identify as meaningful.
• This process of marking segments of the data
with consistent names and terms is referred to
as coding.
Coding
• There are two general types of codes in
qualitative analysis.
– a priori codes are names or labels that are
established at the outset of the research project,
prior to data collection.
– Grounded codes are names or labels that are
discovered within the qualitative data during or
shortly after the data collection process.
Diagramming
• Written information can be effectively
communicated through a visual image.
• Diagramming is a process whereby
researchers develop visual images to illustrate
common themes or interactions between
qualitative data.
• Flow charts or hierarchical diagrams
illustrate relationships within data and tell a
‘visual story’ of how the data ‘fit’ together.
Matrices
• Matrices are tables that illustrate
relationships between variables.
• Matrices are similar to diagrams in that they illustrate
relationships within qualitative data.
• The difference is that matrices are tabular,
while diagrams are more figurative.
Getting to the Point
• During the analysis phase, researchers
evaluate the data they gather to answer their
research questions or hypotheses.
• Even though analysis occurs near the end of
the research process, considerations of
analysis should occur earlier in the research
process.
Getting to the Point
• Statistics summarize large amounts of data
into a single number and enable us to
communicate information efficiently.
• There are two general types of statistics
– Descriptive statistics, and
– Inferential statistics.
Getting to the Point
• Descriptive statistics describe the current
state of something.
• An important set of descriptive statistics is
known as the measures of central tendency.
• These measures include the mean, median,
and mode.
Getting to the Point
• The mean is calculated by adding together all
of the values for a particular variable and
dividing that sum by the total number of
cases.
• Although it is a good measure of central
tendency, it is sensitive to extreme values, or
outliers.
Getting to the Point
• The median is referred to as the middlemost
value because it is the value that is situated in
the middle, with half the cases equal to or
greater than and half the cases equal to or
less than this value.
• It is less susceptible to extreme values or
outliers than the mean.
Getting to the Point
• The mode is the most frequently occurring
value in a population or sample.
• Like the median, the mode is less susceptible
to extreme values or outliers than the mean.
Getting to the Point
• The decision about which measure of central
tendency to use should be based on two
factors:
– whether the data are skewed toward extreme
scores, and
– what level the variables are measured at.
Getting to the Point
• Measures of variability are descriptive
statistics that tell us how much variation exists
within a sample or population.
• Among the measures of variability is the
range, which is the difference between the
highest and lowest value in a sample or
population.
• This descriptive statistic, like the mean, is
susceptible to extreme scores or outliers.
Getting to the Point
• The standard deviation is a descriptive
statistic that describes how much variability
exists within a sample or population.
• Because the standard deviation considers
both the mean and the total number of cases
in the sample or population, it is a much more
stable statistic than the range.
Getting to the Point
• A percentage is a descriptive statistic that
describes a portion of a sample or population.
• Percentages are calculated by dividing the
number of like cases by the total number of
cases, then multiplying that quotient by 100.
Getting to the Point
• A percentile is a statistic that tells us where a
value ranks within a distribution.
• Sometimes this is referred to as the percentile
rank.
• We calculate the percentile rank by dividing
the number of cases below the value by the
total number of cases and then multiplying
that quotient by 100.
Getting to the Point
• Percent change is a descriptive statistic that
indicates how much something changed from
one time to the next.
• We calculate the percent change by
subtracting the original number from the new
number, dividing that difference by the
original number and then multiplying that
quotient by 100.
Getting to the Point
• Rates are descriptive statistics that enable us
to compare similar behaviors across multiple
locations.
• Rates factor in population size and report
incidents per n units.
Getting to the Point
• In normally distributed data:
– the mean, median and mode are equal.
– about 68.3 percent of all cases fall within one standard
deviation of the mean.
– about 95.4 percent of all cases fall within two standard
deviations of the mean.
– about 99.7 percent of all cases fall within three standard
deviations of the mean.
Getting to the Point
• Inferential statistics enable analysts to
determine the probability of certain
outcomes.
Getting to the Point
• When reading inferential statistics, we are
concerned with statistical significance, which
is a measure of the probability of obtaining a
result at least this extreme if chance alone
were at work.
• If the statistical significance of a statistic is .05
or less, we conclude that the results are
unlikely to be due to chance.
Getting to the Point
• The t-test is a statistical technique used to
determine whether or not two groups are
different with respect to a single variable.
• t-tests require interval or ratio level data.
• If the statistical significance of the t-score is
.05 or less, it can be concluded that the
difference between the two groups is not due
to chance.
Getting to the Point
• The analysis of variance (ANOVA) model allows
analysts to compare two or more groups to see if
they are different with respect to a single variable
measured at the interval or ratio level.
• An ANOVA produces an F-ratio statistic.
• If the statistical significance of the F-ratio is .05 or
less, it can be concluded that the difference
between at least two of the groups is not due to
chance.
Getting to the Point
• The Chi Square test is used to determine whether
there is a statistically significant difference
between what we expect to happen and what
actually happens.
• The operative statistic is called the chi-square
statistic.
• If the statistical significance of the chi square
statistic is .05 or less, we conclude that the
difference between what actually happened and
what was expected to happen was not due to
chance.
Getting to the Point
• The Pearson r is used to determine whether
two variables measured at the interval or ratio
level are correlated.
• The Pearson r coefficient ranges from -1 to +1.
• The closer it is to -1 or +1, the higher the level
of correlation between the two variables.
Getting to the Point
• Positive Pearson r coefficients indicate a
positive correlation.
• Negative Pearson r coefficients indicate a
negative correlation.
Getting to the Point
• The Spearman rho statistic is similar to the
Pearson r.
• This statistic indicates the level of correlation
between variables measured at the ordinal
level.
• It ranges from -1 to +1.
Getting to the Point
• Multiple regression enables the analyst to
measure the individual and combined effects
of various independent variables on a
dependent variable.
• A multiple regression requires data collected
at the interval or ratio levels.
Getting to the Point
• The decision as to which inferential statistical
technique to use depends on:
– the level at which the data are measured, and
– the type of hypothesis that the study is testing.
Getting to the Point
• Qualitative researchers focus more on
analyzing words than they do numbers; they
attempt to explain the ‘how’ and ‘why’ of
social processes.
Getting to the Point
• The process of producing a written transcript
of interviews that have been video- or audiotaped is known as transcription.
• These transcripts provide the written data
that qualitative researchers analyze.
Getting to the Point
• Qualitative researchers use a process called
memoing to record their thoughts and ideas
on the research data.
• Memoing is typically on-going throughout the
data collection process.
Getting to the Point
• Segmenting is a process used by researchers
to organize or categorize qualitative data.
• This stage of qualitative data analysis occurs
after the researcher has familiarized
themselves with the data.
Getting to the Point
• After segmenting the data, qualitative
researchers go through their data and code it.
• Coding refers to a process whereby
researchers:
– identify recurring themes,
– label these themes with a descriptive word or
phrase (“codes”), and
– organize their notes or transcripts according to
these themes.
Getting to the Point
• Diagramming is a process by which
researchers develop flow charts or hierarchical
diagrams to illustrate relationships between
different parts of their qualitative data.
• Researchers also use matrices, or tables, to
illustrate such relationships.
Getting to the Point
• There are a number of software programs
specifically designed for qualitative data
analysis.
• These programs include ATLAS.ti™, NVivo™,
NUD*IST™, and Ethnograph™.
• Using these and other programs, researchers
and practitioners can mine data for patterns
and other useful information.
```
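The descriptive statistics covered in the slides can be sketched in a few lines of Python. This is a minimal illustration and not part of the original chapter. The sample values in `prior_arrests` are made up, and the helper names (`percentage`, `percentile_rank`, `percent_change`, `rate`, `share_within`) are hypothetical. The percentile rank follows the "equal to or less than" wording of the Percentile slide, the standard deviation is the population form, and the rate assumes a base of 100,000 people, which is only one common choice.

```python
import statistics
from statistics import NormalDist

# Hypothetical sample (made-up values): prior arrests for nine cases.
# The last value is a deliberate outlier.
prior_arrests = [0, 1, 1, 2, 2, 2, 3, 4, 30]

# Measures of central tendency
mean = statistics.mean(prior_arrests)      # sum of values / number of cases -> 5
median = statistics.median(prior_arrests)  # middlemost value -> 2
mode = statistics.mode(prior_arrests)      # most frequently occurring value -> 2

# Measures of variability
value_range = max(prior_arrests) - min(prior_arrests)  # largest - smallest -> 30
std_dev = statistics.pstdev(prior_arrests)  # population standard deviation


def percentage(like_cases, total_cases):
    """Number of like cases divided by total cases, times 100."""
    return like_cases / total_cases * 100


def percentile_rank(values, score):
    """Percent of cases with a value equal to or less than `score`."""
    at_or_below = sum(1 for v in values if v <= score)
    return at_or_below / len(values) * 100


def percent_change(original, new):
    """(new - original) divided by original, times 100."""
    return (new - original) / original * 100


def rate(events, population, per=100_000):
    """Events per `per` people; the 100,000 base is an assumption."""
    return events * per / population


def share_within(k):
    """Percent of a normal distribution within k standard deviations."""
    z = NormalDist()  # standard normal: mean 0, standard deviation 1
    return (z.cdf(k) - z.cdf(-k)) * 100


print(f"mean={mean}, median={median}, mode={mode}")
print(f"range={value_range}, standard deviation={std_dev:.2f}")
print(f"percentile rank of 4 arrests: {percentile_rank(prior_arrests, 4):.1f}")
print(f"percent change from 200 to 250: {percent_change(200, 250):.1f}")
print(f"rate for 50 events among 25,000 people: {rate(50, 25_000):.1f}")
for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {share_within(k):.1f} percent")
```

In this made-up sample the single outlier pulls the mean up to 5 while the median and mode both stay at 2, which is the pattern the slides describe when they recommend the median or mode for skewed data.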