Statistical Analysis
Scientists analyze data collected in an experiment to look for patterns or relationships among variables. To determine whether the patterns we observe are real, and not due to chance or our own preconceived notions, we must test them for significance.
Statistical analysis lets scientists carry out that test. We can never be 100% sure, but we can state the level of certainty behind our conclusions. Most scientists accept a 95% level of confidence, which corresponds to a significance level of 0.05. We will be using tests that allow us to make decisions at this 95% level.
Types of Data
Quantitative data – represented by a number
Continuous quantitative – measured on a scale that can be divided into partial units
Example – distance in kilometers, volume in liters
Discrete quantitative – measured on a scale of whole numbers only
Example – number of people who can touch their toes, number of wolves born in a given year
Quantitative data can also be subdivided based on the zero point of the measuring scale.
Ratio data – measured on a standard scale with equal, divisible intervals and an absolute zero
Example – temperature of a gas on the Kelvin scale, velocity of an object in m/s
Interval data – measured on a scale with equal intervals but no absolute zero
Example – temperature of a substance on the Celsius scale, pH
Qualitative data (nominal or ordinal)
Nominal – objects are placed into named categories that cannot be ranked
Example – gender (male/female), hair color (red, black, brown)
Ordinal – objects are placed into categories that can be ranked
Example – activity of an animal on a scale of 1 to 5, Mohs hardness scale for minerals
Describing Data
Statisticians describe a set of data in two ways:
1. Compute a measure of central tendency (the number most typical of the entire data set)
Mode
value that occurs most often (in a tie, report all tied values)
Median
middle value when the data are ranked from highest to lowest
Mean
mathematical average (sum of the values divided by the number of values)
2. Describe variation (the spread within the data: how closely the individual data points cluster around the mean)
For quantitative data – range (difference between the smallest and largest value of the dependent variable), standard deviation (σ), variance (σ²)
Example – plant height
For qualitative data – frequency distribution (number of cases falling into each category of the variable)
Example – color of tomatoes produced with different ground colors
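The measures of central tendency and variation above can be computed with Python's standard `statistics` module. This is a minimal sketch: the plant heights and tomato colors are made-up example data, and `describe` and `frequency_distribution` are hypothetical helper names. Note that `statistics.variance` and `statistics.stdev` use the sample formulas (dividing by n − 1); `statistics.pvariance` and `statistics.pstdev` give the population versions that σ² and σ strictly denote.

```python
import statistics
from collections import Counter


def describe(values):
    """Central tendency and variation for a list of quantitative measurements."""
    return {
        "mode": statistics.multimode(values),  # every most-frequent value, so ties are kept
        "median": statistics.median(values),
        "mean": statistics.mean(values),
        "range": max(values) - min(values),
        "variance": statistics.variance(values),  # sample variance (n - 1 denominator)
        "std_dev": statistics.stdev(values),  # sample standard deviation
    }


def frequency_distribution(categories):
    """Number of cases falling into each category (for qualitative data)."""
    return dict(Counter(categories))


# Hypothetical example data
plant_heights_cm = [12, 15, 15, 18, 20]
tomato_colors = ["red", "red", "yellow", "red", "orange"]

print(describe(plant_heights_cm))
print(frequency_distribution(tomato_colors))
```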
Making Decisions about Descriptive Statistics & Graphs
Quantitative data
Ratio data – data collected using a scale with equal intervals and an absolute zero (distance, velocity)
  Central tendency: mean
  Variation: range, standard deviation, variance
  Degrees of freedom: total # of samples – 2 (ex. 15 + 15 – 2 = 28)
  Level of significance: 0.025
Interval data – data collected using a scale with equal intervals but no absolute zero (temp °C, pH)
  Central tendency: mean
  Variation: range, standard deviation, variance
  Degrees of freedom: total # of samples – 2 (ex. 15 + 15 – 2 = 28)
  Level of significance: 0.025
Qualitative data
Nominal data – objects are placed into categories that cannot be ranked (male/female, or brown, black, red hair)
  Central tendency: mode
  Variation: frequency distribution
  Degrees of freedom: (# rows – 1) × (# columns – 1)
  Level of significance: 0.05
Ordinal data – objects are placed into categories that can be ranked (Mohs hardness scale, or color ranked 1–10)
  Central tendency: median
  Variation: frequency distribution
  Degrees of freedom: (# rows – 1) × (# columns – 1)
  Level of significance: 0.05
Inferential Statistics – used to determine whether the data are statistically significant.
A significance test estimates how likely it is that the differences in the data arose by random chance alone. It cannot rule out an unknown, uncontrolled variable; only a well-controlled experimental design can do that.
If the data are shown to be statistically significant, and the experiment was well controlled, the differences in the data can be attributed to changes in the independent variable.
Statistical Tests and Graphs
1. The t-test (or analysis of variance, ANOVA): when you want to compare the measurements of two groups (t-test) or of more than two groups (ANOVA).
2. The chi-square test: when you have counts that can be placed into yes/no categories or other simple categories, such as quadrants.
3. The Pearson r correlation: to test how the values of one event or object relate to the values of another (for example, nighttime temperatures and the number of patients in an emergency room).
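As an illustration of the t-test, the sketch below computes Student's t statistic for two independent groups using the pooled-variance formula, which assumes the two groups have roughly equal variances. The group data and the function name `students_t` are hypothetical; the resulting t is compared with the table value for df = (n₁ − 1) + (n₂ − 1).

```python
import math
import statistics


def students_t(group_a, group_b):
    """Pooled-variance (Student's) t statistic for two independent groups.

    Returns (t, df), where df = (n1 - 1) + (n2 - 1).
    """
    n1, n2 = len(group_a), len(group_b)
    mean1, mean2 = statistics.mean(group_a), statistics.mean(group_b)
    var1, var2 = statistics.variance(group_a), statistics.variance(group_b)
    df = (n1 - 1) + (n2 - 1)
    # Pool the two sample variances, weighting each by its own degrees of freedom
    pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / df
    standard_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    t = (mean1 - mean2) / standard_error
    return t, df


# Hypothetical plant heights (cm) for a control and an experimental group
control = [10, 12, 14]
experimental = [14, 16, 18]
t, df = students_t(control, experimental)
print(f"t = {t:.3f}, df = {df}")
```

If SciPy is available, `scipy.stats.ttest_ind(control, experimental)` computes the same pooled statistic (its default is `equal_var=True`) and also returns a p-value.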
Is the dependent variable (DV) continuous, ordinal, or nominal? Then choose the test according to the type of independent variable (IV).
Dependent variable (DV) continuous
  Continuous IV: t-test or ANOVA
  Nominal IV: t-test or ANOVA
  Ordinal IV: t-test
  Suggested graphs: scatter plot, scatter plot of means, bar graph of means
Dependent variable (DV) ordinal
  Continuous IV: chi-square
  Nominal IV: Mann-Whitney test
  Ordinal IV: Spearman's test
  Suggested graphs: scatter plot or histogram, bar graph of means, scatter plot
Dependent variable (DV) nominal
  Continuous IV: t-test or F-test
  Nominal IV: McNemar's test (paired data) or chi-square (unpaired data)
  Ordinal IV: Spearman's test
  Suggested graphs: bar graph of means, bar graph of proportions, scatter plot
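The test-selection chart can also be expressed as a lookup table. This is a hypothetical sketch that simply encodes the DV/IV pairings listed in the chart (the names `TEST_CHART` and `suggest_test` are illustrative, not from any library); it does not check whether a test's assumptions hold for your data.

```python
# Hypothetical encoding of the chart: (DV type, IV type) -> suggested test
TEST_CHART = {
    ("continuous", "continuous"): "t-test or ANOVA",
    ("continuous", "nominal"): "t-test or ANOVA",
    ("continuous", "ordinal"): "t-test",
    ("ordinal", "continuous"): "chi-square",
    ("ordinal", "nominal"): "Mann-Whitney test",
    ("ordinal", "ordinal"): "Spearman's test",
    ("nominal", "continuous"): "t-test or F-test",
    ("nominal", "ordinal"): "Spearman's test",
}


def suggest_test(dv_type, iv_type, paired=False):
    """Return the test the chart suggests for a DV/IV combination."""
    dv_type, iv_type = dv_type.lower(), iv_type.lower()
    if (dv_type, iv_type) == ("nominal", "nominal"):
        # The chart distinguishes paired and unpaired nominal data
        return "McNemar's test" if paired else "chi-square"
    try:
        return TEST_CHART[(dv_type, iv_type)]
    except KeyError:
        raise ValueError(f"unknown combination: DV={dv_type!r}, IV={iv_type!r}") from None


print(suggest_test("ordinal", "nominal"))
print(suggest_test("nominal", "nominal", paired=True))
```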
Null Hypothesis (H₀) – states that there is no difference between the mean of your control group and the mean of your experimental group, so any observed difference between the two sample means occurred by chance and is not significant. If you can reject your null hypothesis, then there is a significant difference between your control and experimental groups.
Write your null hypothesis here:
________________________________________________________________________________
__________________________________________________________________________________
Level of significance () -
It communicates probability of error in rejecting Null hypothesis
is affected by sample size.
Each test statistic has associated a p-value that would reflect how ‘comfortable’ is the researcher in
rejecting the null hypothesis in support of the alternative hypothesis.
This ‘comfort zone’ is attained when the p-value of a test statistic is below 0.05. If the p-value is not
in the ‘comfort zone’ then it is concluded that there is not statistical evidence to reject the null state.
We will use p-value < 0.05 which means that the probability of error in the research is 5/100
(95% results have no error).
Degrees of freedom (df) – the number of independent observations in a sample that are free to vary.
For the t-test: df = (n₁ – 1) + (n₂ – 1)
For the chi-square test: df = (# rows – 1) × (# columns – 1)
For the Pearson r correlation: df = n – 2 (subtract 2 from the number of pairs of measurements compared)
The larger the sample (and df), the smaller the difference between means needed for significance.
Scientists have more confidence in experiments with larger sample sizes and repeated trials.
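The three degrees-of-freedom formulas translate directly into code. A minimal sketch with hypothetical function names; the first call reproduces the 15 + 15 − 2 = 28 example for two groups of 15 samples.

```python
def df_t_test(n1, n2):
    """t-test: (n1 - 1) + (n2 - 1)."""
    return (n1 - 1) + (n2 - 1)


def df_chi_square(rows, columns):
    """Chi-square test: (# rows - 1) x (# columns - 1)."""
    return (rows - 1) * (columns - 1)


def df_pearson(n_pairs):
    """Pearson r correlation: number of pairs of measurements minus 2."""
    return n_pairs - 2


print(df_t_test(15, 15))    # two groups of 15 samples
print(df_chi_square(2, 3))  # a 2 x 3 table of counts
print(df_pearson(10))       # 10 paired measurements
```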
The influence of the level of significance (α) can be seen by reading across a single row (one df value) of a t table.
The smaller the level of significance (α), and therefore the error rate, the larger the difference between means required for significance.
Use the tables for the t-test and the chi-square test to find the table (critical) value, using your calculated degrees of freedom and the level of significance of 0.05 (95%).
Determine whether the calculated value is greater or less than the table value.
If the calculated value is smaller than the table value, the null hypothesis is NOT rejected.
If the calculated value equals or exceeds the table value, the null hypothesis is rejected.
For the t-test: if the calculated t (ignoring its sign) equals or exceeds the table value, reject the null hypothesis.
For chi-square: if χ² calculated > χ² table, the null hypothesis is rejected.
For the Pearson r correlation: if the calculated value (ignoring its sign) is greater than the table value, reject the null hypothesis.
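The chi-square decision can be sketched end to end: compute each expected count from the row and column totals, sum (observed − expected)² / expected over every cell, and compare the result with the table value. This sketch assumes a test of independence on a table of counts with no Yates continuity correction; the observed counts and function names are hypothetical, and the critical values are the standard α = 0.05 entries for df = 1 to 4.

```python
def chi_square_statistic(observed):
    """Chi-square statistic and df for a table of observed counts (rows x columns)."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, observed_count in enumerate(row):
            # Expected count if the row and column variables were independent
            expected = row_totals[i] * col_totals[j] / grand_total
            chi2 += (observed_count - expected) ** 2 / expected
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return chi2, df


# Standard chi-square table values at alpha = 0.05
CHI2_TABLE_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488}


def reject_null(calculated, table_value):
    """Reject the null hypothesis when the calculated value equals or exceeds the table value."""
    return calculated >= table_value


# Hypothetical counts: rows = treatment / control, columns = yes / no
observed = [[30, 10], [20, 20]]
chi2, df = chi_square_statistic(observed)
print(f"chi-square = {chi2:.3f}, df = {df}, reject H0: {reject_null(chi2, CHI2_TABLE_05[df])}")
```

`scipy.stats.chi2_contingency` performs the same test but, by default, applies Yates' correction when df = 1, so its statistic will be smaller for a 2 × 2 table.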
If r = 0.00, there is no correlation.
If r = +1.00 or -1.00, there is a perfect correlation.
Values can be positive or negative. Positive values indicate that an increase in X corresponds to an increase in Y.
Negative values indicate that increases in one value are associated with decreases in the other.
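Pearson's r can be computed from the deviations of each value from its mean: r = Σ(x − x̄)(y − ȳ) / √(Σ(x − x̄)² · Σ(y − ȳ)²). A minimal sketch; `pearson_r` is a hypothetical name and the temperature and patient data are made up. On Python 3.10 or later, `statistics.correlation(xs, ys)` returns the same value.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient r for paired measurements xs and ys."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 pairs")
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    # Sum of products of deviations (numerator of r)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data: nighttime temperature (°C) vs. emergency-room patients
night_temp = [2, 5, 8, 11, 14]
er_patients = [30, 27, 25, 20, 18]
r = pearson_r(night_temp, er_patients)
print(f"r = {r:.3f}, df = {len(night_temp) - 2}")
```

The sign of r gives the direction of the relationship; compare its absolute value with the table value for df = n − 2.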
What Does It Mean to Accept or Reject the Null Hypothesis?
The null hypothesis generally states that there is no significant difference between your two sets of data. If it is accepted (strictly, not rejected), any differences in your data are not significant and are probably due to random chance. If the null hypothesis is rejected, there is a significant difference between your two sets of data, and in a controlled experiment these differences can be attributed to the factor you changed (the independent variable).
Rejecting the null hypothesis supports the alternative hypothesis that a true difference exists between the means, and so supports the original research hypothesis.
Decide whether to accept or reject your null hypothesis. Accept _________ Reject ________