Inferential Statistics
Which Statistic Do I Use?
•Dependent Variable Type
•Continuous
•Categorical
•Number of Factors (Independent Variables)
•One
•Two or More (Factorial Analysis)
•Number of Levels (of the Independent Variable)
•Two
•Three or More (Between or Repeated Measures)
Inferential Statistics
Random Error
•will be responsible for some difference in the means
Inferential Statistics
•gives the probability that the difference between means reflects
random error rather than a real difference
Null Hypothesis
Null Hypothesis (H0):
•states that there is no difference between the sample means
•any observed difference is due to random error
•the Null Hypothesis is rejected when there is a low probability that the
attained results are due to random error
Research Hypothesis
Research Hypothesis (H1):
•states that there is a difference between the sample means
•the difference is not due to random error
•the difference is due to the Independent Variable
Probability
•probability is the likelihood of the occurrence of some event or outcome
•if probability is low we reject the possibility of random error
•a significant result is one that is very unlikely if the null hypothesis is correct
•alpha level: probability required for significance
•most common alpha level is p < 0.05
•if there is less than 5% chance that the results were due to random error then the
results are considered to be statistically significant
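As a quick illustration of the alpha-level decision, a standardized test statistic can be converted to a two-tailed p-value and compared against 0.05. This is a minimal sketch using Python's standard library; the z value of 2.1 is an invented example, not data from the text:

```python
from statistics import NormalDist

z = 2.1                                        # hypothetical standardized test statistic
p_two_tailed = 2 * (1 - NormalDist().cdf(abs(z)))  # area in both tails beyond |z|
significant = p_two_tailed < 0.05              # reject H0 only below the alpha level
```

Here p comes out near 0.036, which is below 0.05, so the result would be declared statistically significant.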
Samples and Populations
Samples
are subsets of the population. Inferential statistics reflect what would
happen if you had multiple samples
Sampling
If a dependent variable within the population is normally distributed and we can
calculate or estimate the mean and standard deviation of the population, then we can
use probabilities to determine if the independent variable has caused a significant
change in the dependent variable. This is the essence of the scientific method. To
begin we must collect a representative sample of our much larger population.
Representative Sample
means that all significant subgroups of the population are represented in the sample.
Random Sampling
is used to increase the chances of obtaining a representative sample. It assures that
everyone in the population of interest is equally likely to be chosen as a subject. The
larger the random sample, the more likely it will be representative of the
population.
Randomization
•Assures that any extraneous variable will affect all participants equally
•Uses lists of random numbers or random number generators
•Any variable that cannot be held constant is controlled by
randomization
•Can be used for scheduling the ordering of events
Random Numbers
Step 1: make a numbered list of all of your experimental participants
Step 2: flip through the 4 pages and arbitrarily put your finger on a page
Step 3: read the numbers in sequence, either down or across
Step 4: assign those numbers, in order, to your list of participants; if a number is duplicated then skip it and go to the next one
Step 5: assign the participants with the highest numbers to group 1 and the ones with the lowest numbers to group 2.
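In practice, the random-number-table steps above are usually replaced by a software random number generator. A minimal Python sketch of the same idea, where the participant IDs and the seed are made up for illustration:

```python
import random

def assign_groups(participants, seed=None):
    """Randomly split a list of participants into two equal groups."""
    rng = random.Random(seed)
    shuffled = participants[:]        # copy so the original roster is untouched
    rng.shuffle(shuffled)             # random ordering replaces the number table
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

# 20 hypothetical participant IDs, split 10 and 10
group1, group2 = assign_groups(list(range(1, 21)), seed=42)
```

Seeding the generator makes the assignment reproducible, which is useful when the randomization schedule must be documented.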
Who Needs Random Sampling?
Hite Report Survey on Women's Relationships (nonrandom sample):
[bar chart: percentage of respondents, women's disclosure about relationships]
Had an Affair: 70%
Felt Harassed: 95%
We Need Random Sampling
When the survey was redone with a random sample:
[bar chart: percentage of respondents, women's disclosure about relationships]
Had an Affair: 14%
Felt Harassed: 3%
Sampling Techniques
• Probability Sampling
• Simple Random Sampling
• Stratified Random Sampling
• Cluster Sampling
• Non-Probability Sampling
• Haphazard Sampling
• Quota Sampling
Evaluating Samples
• Sampling Frame
• is defined as the actual population of individuals from which a
random sample will be drawn
• must assess how well the sampling frame matches the overall
population
• Response Rate
• percentage who actually take the survey
• determines the amount of bias in the final data
• a low response rate is less accurate
• must do all you can to increase the response rate
• Reasons for Using Convenience Sampling
• most research in psychology uses non-probability sampling
• it saves time and money, and results are often still generalizable
• the goal is to study relationships between variables, not to estimate population values
Sampling Distribution
If the sample is completely random then the sample mean should be a good
estimate of the population mean.
When we take multiple samples from our population we will get a range of
means.
We can make a Distribution of Sample Means or sometimes called a Sampling
Distribution.
The Central Limit Theorem states that the distribution of sample means
approaches a normal distribution when n is large. In such a distribution of an
unlimited number of sample means, the mean of the sample means will equal
the population mean: µx̄ = µ
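The claim that the mean of the sample means equals the population mean can be checked by simulation. A small sketch using only the standard library; the population parameters (mean 100, standard deviation 15) and sample sizes are hypothetical:

```python
import random
import statistics

rng = random.Random(0)
# hypothetical normally distributed population of 100,000 scores
population = [rng.gauss(100, 15) for _ in range(100_000)]
pop_mean = statistics.mean(population)

# draw 2,000 samples of n = 50 and record each sample mean
sample_means = [statistics.mean(rng.sample(population, 50)) for _ in range(2_000)]
mean_of_means = statistics.mean(sample_means)  # should sit very close to pop_mean
```

With this many samples, `mean_of_means` lands within a small fraction of a point of the population mean, and a histogram of `sample_means` would look approximately normal, as the theorem predicts.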
Sampling Distribution
The Sampling Distribution is a
probability distribution of all possible
outcomes due simply to chance
based on the assumption that the null
hypothesis is true.
When your outcome becomes highly
unlikely based on pure chance we
reject the Null Hypothesis.
Science sets this low probability at
p < 0.05 or only 5% due to chance.
Standard Error of the Mean
The standard deviation of the distribution of sample means is called the standard
error of the mean, or standard error for short. It is represented by the following
formula:
σx̄ = σ / √n
Since the standard deviation of the population is often difficult to get, a
good estimate of the standard error uses the standard deviation of the sample:
sx̄ = s / √n
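As a quick illustration, the estimated standard error can be computed directly from a sample. The scores below are invented for the example:

```python
import math
import statistics

sample = [98, 102, 95, 101, 99, 103, 97, 100]  # hypothetical scores
s = statistics.stdev(sample)       # sample standard deviation (n - 1 denominator)
sem = s / math.sqrt(len(sample))   # estimated standard error of the mean
```

Because the sample size appears under a square root, quadrupling n only halves the standard error.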
Type I and Type II Error
Anytime you observe a difference in behavior between groups, it may exist for two
reasons:
1.) there is a real difference between the groups, or
2.) the results are due to error involved in sampling.
This error can be described in two ways:
Type I error is when you reject the null hypothesis when you shouldn't have, because the null
hypothesis is actually true - there is no difference between your groups.
Type II error is when you fail to reject the null hypothesis when you should have, because
there really is a significant difference between your groups.
The probability of committing a Type I error is designated by alpha.
An alpha level of 0.05 is reasonable and widely accepted by scientists.
The null hypothesis can be rejected if there is less than 0.05 probability of committing a
Type I error ( p < .05 ).
One-Tailed Hypothesis
If the scientific hypothesis predicts a direction of the results, we say it is a OneTailed Hypothesis because it is predicting that alpha will fall only in one specific
directional tail. If the sample mean falls in this area we can reject the null
hypothesis. This is shown below:
Two-Tailed Hypothesis
If the scientific hypothesis does not predict a direction of the results, we say it is a
Two-Tailed Hypothesis because the rejection region (alpha) is split between both
tails of the distribution. If the sample mean falls in either of these areas we can
reject the null hypothesis. This is shown below:
Degrees of Freedom
The term degrees of freedom refers to the number of scores within a data set that
are free to vary.
In any sample with a fixed mean, the sum of the deviation scores is equal to zero.
If your sample has n = 10, the first 9 scores are free to vary, but the 10th
score must take the specific value that makes the sum of the deviations equal to zero.
Therefore, in a single sample the degrees of freedom equal n - 1.
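This constraint is easy to verify numerically: once the mean is fixed, the last deviation score is fully determined by the others. A short sketch with invented scores:

```python
import statistics

scores = [4, 8, 6, 5, 3, 7, 9, 2, 6, 10]   # n = 10 hypothetical scores
mean = statistics.mean(scores)
deviations = [x - mean for x in scores]

# the deviations always sum to zero, so the last one is fixed
# once the first n - 1 are known
last = -sum(deviations[:-1])
df = len(scores) - 1                        # degrees of freedom = n - 1
```

Here `last` equals the tenth deviation exactly, showing only 9 of the 10 scores were free to vary.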
Which Statistic Do I Use?
•Dependent Variable Type
•Continuous
•Categorical
•Number of Factors (Independent Variables)
•One
•Two or More (Factorial Analysis)
•Number of Levels (of the Independent Variable)
•Two
•Three or More (Between or Repeated Measures)
t-test
•examines whether two groups are significantly different from each other
•must specify null hypothesis and significance level (alpha)
•we calculate our t and determine where it lies on the sampling distribution
•the t-test is a ratio between the group mean difference and the variability within
groups
t-test Types
•One Sample t-test
•for comparison of one sample to a population
•Independent t-test
•for comparison of two independent samples
•Paired or Correlated t-test
•for comparison of two correlated samples
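The ratio described above (group mean difference over within-group variability) can be computed by hand for the independent-samples case. This is a minimal equal-variance sketch; the two score lists are invented for illustration:

```python
import math
import statistics

def independent_t(a, b):
    """t = (mean difference) / (pooled standard error), equal-variance form."""
    na, nb = len(a), len(b)
    va, vb = statistics.variance(a), statistics.variance(b)  # n - 1 variances
    pooled = ((na - 1) * va + (nb - 1) * vb) / (na + nb - 2)  # pooled variance
    se = math.sqrt(pooled * (1 / na + 1 / nb))                # standard error of the difference
    t = (statistics.mean(a) - statistics.mean(b)) / se
    return t, na + nb - 2                                     # t and its degrees of freedom

t, df = independent_t([5, 7, 6, 8, 7], [3, 4, 5, 4, 3])
```

The resulting t (about 4.43 on 8 degrees of freedom here) is then located on the sampling distribution to decide whether to reject the null hypothesis.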
Analysis of Variance (ANOVA)
•compares three or more groups
•can compare two or more Independent Variables
•must specify null hypothesis and significance level (alpha)
•One-Way ANOVA
•Two-Way ANOVA
One-Way ANOVA
compares at least three levels
compares only one Independent Variable
Variance Source   Sum of Squares   Degrees of Freedom   Mean Square   F ratio
Between           SSbg             dfbg                 MSbg          F
Within            SSwg             dfwg                 MSwg
Main Effect
The Main Effect is the effect that an independent variable has on the dependent
variable. In a One-Way ANOVA, there is only one Main Effect since there is only one
independent variable (Factor) in the experiment.
The null hypothesis for the Main Effect is:
H0 : µ1 = µ2 = . . . = µk
The research hypothesis for the Main Effect is:
H1 : At least one of the sample means comes from a different population
distribution than the others
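The One-Way ANOVA quantities above (SSbg, SSwg, their degrees of freedom, mean squares, and the F ratio) can be computed directly. A minimal sketch with three invented groups:

```python
import statistics

def one_way_anova(*groups):
    """F = MS_between / MS_within for k independent groups."""
    all_scores = [x for g in groups for x in g]
    grand_mean = statistics.mean(all_scores)
    k, n_total = len(groups), len(all_scores)
    # between-groups sum of squares: group means vs. the grand mean
    ss_bg = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)
    # within-groups sum of squares: scores vs. their own group mean
    ss_wg = sum((x - statistics.mean(g)) ** 2 for g in groups for x in g)
    df_bg, df_wg = k - 1, n_total - k
    ms_bg, ms_wg = ss_bg / df_bg, ss_wg / df_wg
    return ms_bg / ms_wg, df_bg, df_wg

F, df_bg, df_wg = one_way_anova([2, 3, 4], [5, 6, 7], [8, 9, 10])
```

A large F means the variability between group means is big relative to the variability within groups, which is evidence against the null hypothesis that all group means come from the same population.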
Two-Way ANOVA
compares two or more levels per factor
compares two or more Independent Variables
Variance Source   Sum of Squares   Degrees of Freedom   Mean Square   F ratio
Rows              SSr              dfr                  MSr           Fr
Columns           SSc              dfc                  MSc           Fc
Interaction       SSr x c          dfr x c              MSr x c       Fr x c
Within            SSwg             dfwg                 MSwg
Total             SStotal          dftotal
Chi-Square
The Chi-Square (χ2) is used for the analysis of nominal data.
Remember that nominal data are categorical data without any order of value.
Two good examples of nominal data are "yes-no" and "true-false" answers on a
survey.
Chi-Square analyses can be either One-Way, with one independent variable, or
Two-Way, with two independent variables.
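The statistic itself compares observed category counts against the counts expected under the null hypothesis. A minimal sketch; the yes/no counts and the 50/50 expectation are invented for illustration:

```python
def chi_square(observed, expected):
    """X^2 = sum over categories of (O - E)^2 / E."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# hypothetical yes/no survey counts vs. a 50/50 null expectation
x2 = chi_square([30, 20], [25, 25])
```

The resulting χ2 value is then compared against a chi-square distribution with the appropriate degrees of freedom (categories minus one for a One-Way analysis) to decide significance.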