Download Inferential Statistics + Estimation

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

Foundations of statistics wikipedia , lookup

History of statistics wikipedia , lookup

Taylor's law wikipedia , lookup

Bootstrapping (statistics) wikipedia , lookup

German tank problem wikipedia , lookup

Resampling (statistics) wikipedia , lookup

Misuse of statistics wikipedia , lookup

Student's t-test wikipedia , lookup

Transcript
Inferential Statistics
Chapter 5
Keep in Mind!
1) Statistics are useful for figuring out random noise
from real effects.
2) Numbers are not absolute, and they can be easily
manipulated.
3) Always scrutinize data closely, and draw your own
conclusions.
4) 85% of all statistics are made up on the spot: the
rest are all wrong
The End of Descriptive Statistics
• Descriptive statistics’ work stops when the data
are :
– Displayed in an organized manner.
– Reduced to compact indices (e.g. mean and SD).
– Described in terms of distributional shape.
– Evaluated with regard to reliability and validity.
– Examined to reveal the strength and direction of
relationship (in the case of bivariate concern).
Inferential Statistics
• Drawing conclusions with regard to the
subjects from the data.
• Making statement(s) from the data in the
sample about the population.
• Inf. Stats. deals with the issue of
generalization of the study results.
Sample vs. Population
• A population comes of a group larger than the
sample.
• A sample is extracted form the population.
• Measurements are carried over the sample to
make an ‘educated guess’ about the
population.
Why Sampling?
• Cost in terms of money and time.
• Sometimes, the population of interest may
extend into the future. (abstract population to fit
the sample), the idea of ‘not pulled from’.
• Otherwise, the population is called tangible in
nature.
– The researcher begins with the population and ends
with the sample.
– There is a sampling frame in this case.
Statistic and Parameter
• Before making the educated guess or inference,
researchers should deal with:
– What the population is,
– how to extract the sample,
– what characteristics to measure in the sample,
• After deciding on the first and second points
• Reliability and validity are involved
– and, what the study statistical focus will be.
• In what way should the data be summarized?
Statistic and Parameter, cont’d
• Statistic is the numerical value that is
computed from the sample.
• Parameter is the value that is in the
population.
– Can never be computed, why?
• See symbols in Chapter 5, Page 104.
Types of Samples
• The choice of sample will influence:
– The accuracy of the inferential guess
– And/Or the definition of the population toward
which the guess is directed
• There several types of samples.
Probability Sample
• Conditions:
– The number of the population has to be known a
priori and small.
– The probability that every ‘object/person’ in the
population can be drawn in the sample should be
known.
• The resulting sample is probability sample.
• There are several types of probability sample.
Probability Sample, cont’d
• Simple Random Sample
– Population names in a hat, then draw the sample
blindly (randomly).
– The population has to be small to be conceivable.
– Equal opportunity of being in the sample to each
member of the population.
– Can result in unrepresentative samples.
Probability Sample, cont’d
• Stratified Random Sample
– Divide population based on variables (males and females)
– Draw the sample from the segments of population.
– More representative than previous type.
• Oversampling:
– If difficulties are anticipated in getting objects in certain strata.
– Need to make equal size strata from different size segments of
population.
Probability Sample, cont’d
• Systematics Sample
– Go through ordered list of possible sample
members (population)
– Select one from ever certain range (one from
between X and Y) randomly.
Probability Sample, cont’d
• Cluster Sample
– Divide the population into clusters such as schools
or house holds.
– Select the sample items randomly from the
clusters.
Nonprobability Samples
• If you do not have a finite population,
• and no probability of being selected (for each
member/item) is known a priori,
• The sample is called nonprobability sample.
• Extreme care about generalization is needed.
• Hard to determine the population to which
the findings should apply.
Nonprobability Sample, cont’d
• Purposive Sample
– Select a large group initially
– Do screening to satisfy some requirements needed to
answer the research question.
– The final set of items selected make up the purposive
sample.
• To whom/which you should generalize?
– The portion that satisfied the screening criteria
– An abstract population made up of objects similar to those
in the sample
Nonprobability Sample, cont’d
• Convenience Sample
– Select from whoever/whichever is available.
– The population in this case is abstract.
– Please provide detailed description of the sample
in this case.
Nonprobability Sample, cont’d
• Quota Sample
– Predetermined sample slots
– Continue to find objects to satisfy the slots
– Similar to stratified probability sample, but
• The population SPS is known a priori.
– In QS, what goes in the sample is know.
Nonprobability Sample, cont’d
• Snowball Sample
– Like a two-stage convenience or purposive
sample.
– Conveniently find a part of the sample
– That part of the sample (i.e. people you recruit)
should bring others to participate.
Sampling Problems
• Especially with nonprobability samples
• The inference becomes hard and murky if the
data are collected from less than 100% of the
sample.
• Three main problems:
– Low Response Rates
– Refusal to Participate
– Attrition
Low Response Rate
• Happens.
• Do not try to justify it by saying it is normal.
• Tell the reader of your article how hard you
tried to get high response rates (several
methods)
Refusal to Participate
• Attrition
• Participants dropping out of a study.
• Check for attrition bias.
Warnings
• Mismatch may happen between the source of
data and the destination of the inferential claims.
• Some concerns about the book script:
Warnings, cont’d
• The size of the sample does not matter, it is the quality that
does.
• The term ‘random’ must be treated very carefully.
– Choices in the sample must satisfy the concept of ‘randomness’.
– You should clearly describe the process of selecting the sample.
– Usually, computer-generated random numbers are used.
• Statistical inferences are worthless in the empirical
evidence if the sample/population is not described in
detail.
– See example on page 22 in the text book.
Chapter 6
Estimation
Estimation
• Two approaches researchers use to make
educated guesses from sample statistics.
– Estimation
• Interval
• Point
– Hypothesis testing
Interval Estimation
• Three main concepts
– Sampling Error
– Standard Error
– Confidence Interval
Sampling Error
• Is the magnitude of the difference between
statistic and parameter.
• Coin Flipping Exercise.
• SE does not imply that sample was extracted
improperly.
• It does not imply that the summarization was
improper.
• It goes to the natural behavior of the sample.
– The biased coin example.
Sampling Distribution and Standard Error
•
Sampling distribution
– The distribution of sample statistics
– Aka, the distribution of the means of multiple samples from the population.
– Flipping the coin 20 times and repeating the process several times (several samples result)
•
Standard Error
– How variable the sample statistic is when multiple samples of the same size are drawn form
the same population.
– It is equal to the SD of the sampling distribution.
– How much dispersion exists.
– Indicates the extent to which the statistic fluctuates from sample to sample around the value
of the parameter.
Standard Error, cont’d
• Usually one sample is used.
• How to get around the standard error of
multiple samples (sampling distribution)?
– Estimate the standard error.
– The standard error of the mean is: 𝜎𝑥 =
𝜎𝑥
𝑁
Confidence Interval
• More advantageous than SE when trying to
understand the connection between statistic and
parameter.
• Can be built around any statistic.
• Can be represented in three ways:
– In text.
– In a table.
– In a picture (figure).
Confidence Interval, cont’d
• How to construct a confidence interval?
– Can be around the mean, the median, the correlation
coefficient, and so on.
– This YouTube video shows how to construct a CI around
the mean.
• http://www.youtube.com/watch?v=siqx4PbqJ6s
• See this as well:http://office.microsoft.com/en-us/excelhelp/confidence-HP005209021.aspx
– Excel has the Function CONFIDENCE which takes the
confidence level, the SD, and the size of the sample.
Confidence Interval, cont’d
• How to interpret CI?
– The chance that the parameter lies within the
limits of the CI. It depends on the confidence level
(usually 95% is reported).
Confidence Interval vs. Estimated Standard Error
• Should both be supplied to the reader of
research.
• CI is usually supplied with level of confidence.
• SE is usually not. It is usually assumed to be
68%.
Point Estimation
• Interval estimation is more logical to indicate
an educational guess than point estimation.
• In point estimation, the computed statistic is
assumed to be the same as the parameter.
• Reliability and Validity coefficient can support
the point estimation.
Warnings
• The second of two numbers separated by (±) can mean:
– Standard deviation
– OR, estimated standard error
– OR, half the distance to the end point of CI.
• Always use ‘estimated’ before you state the standard error.
• CI means that the sample statistic can fall at any point
within CI.
• For estimation to be proper, the sample has to be randomly
selected from the population OR the population has to
match closely the nature of the sample.