DEM810: RESEARCH METHODS
HANDOUT1
Statistics is a collection of mathematical techniques that help to summarize, analyze, present and
interpret data—to make sense or meaning of our observations. It is also defined as a set of
procedures for gathering, measuring, classifying, computing, describing, synthesizing, analyzing,
and interpreting systematically acquired quantitative data (Seema, 2012).
1a. Discussion of Descriptive Statistics
Definition
Descriptive statistics are procedures and techniques used to summarize, organize, and make
sense of a set of scores or observations. They help us get a picture of what is happening in our
data. Descriptive statistics are typically presented graphically (in pictorial form) or numerically,
either in tabular form (in tables) or as summary statistics (single values). We simply state what
the data show.
Usage/ Characteristics
Descriptive statistics summarize observations: they condense a large set of observations and
give us some idea about the data set. This is done through distributions (to show the kinds of
numbers that we have), such as a frequency distribution table; measures of central tendency, such
as the mean, median, and mode; and measures of dispersion/variability, such as the range,
variance, and standard deviation. Data can also be represented in charts/graphs, such as
histograms, to better understand what is happening in the experiment.
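The measures of central tendency and dispersion named above can be computed in a few lines. The
following is a minimal sketch in Python using only the standard library; the scores are invented
for illustration and do not come from any study discussed in this handout.

```python
import statistics

# Hypothetical test scores, invented for illustration only.
scores = [4, 7, 7, 8, 9, 10, 11]

# Measures of central tendency
mean = statistics.mean(scores)        # arithmetic average
median = statistics.median(scores)    # middle value of the sorted scores
mode = statistics.mode(scores)        # most frequent value

# Measures of dispersion/variability
value_range = max(scores) - min(scores)   # largest minus smallest
variance = statistics.variance(scores)    # sample variance (divides by n - 1)
std_dev = statistics.stdev(scores)        # square root of the sample variance

print(f"mean={mean}, median={median}, mode={mode}")
print(f"range={value_range}, variance={variance:.2f}, std dev={std_dev:.2f}")
```

Note that `statistics.variance` and `statistics.stdev` treat the scores as a sample; if the scores
were the whole population, `statistics.pvariance` and `statistics.pstdev` would be the
corresponding choices.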
Descriptive statistical reports generally include summary statistics (such as the mean and
standard deviation), tables, graphics, and text to explain what the charts and tables are showing,
without drawing conclusions about a population. That is, the analysis is limited to your data, and
you are not extrapolating any conclusions to a full population. However, it remains important in
the sense that large amounts of data are described in a way that is understandable, useful, and, if
need be, convincing.
Descriptive statistics are techniques that take raw scores and organize or summarize them in a
form that is more manageable. Often the scores are organized in a frequency distribution table or
a chart/graph so that it is possible to see the entire set of scores. A frequency distribution lists
each category of data and the number of occurrences for each category of data. Another common
technique is to summarize a set of scores by computing an average/mean. Note that even if the
data set has hundreds of scores, the average provides a single descriptive value for the entire set.
The mean and standard deviation are widely used descriptive statistics: the mean captures the
average, and the standard deviation captures the spread around it.
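A frequency distribution table of the kind described above can be built directly from raw scores.
The sketch below is one possible way to do it in Python; the grades and the helper name
`frequency_table` are hypothetical and serve only to illustrate the idea.

```python
from collections import Counter

# Hypothetical letter grades for 12 students, invented for illustration.
grades = ["B", "A", "C", "B", "B", "D", "A", "C", "B", "C", "A", "B"]

def frequency_table(values):
    """Return (category, count, relative frequency) rows sorted by category."""
    counts = Counter(values)
    total = len(values)
    return [(category, counts[category], counts[category] / total)
            for category in sorted(counts)]

table = frequency_table(grades)

# Print the table: each category with its number of occurrences and share.
print(f"{'Grade':<6}{'Count':>6}{'Share':>8}")
for category, count, share in table:
    print(f"{category:<6}{count:>6}{share:>8.1%}")
```

The table lists each category once with its number of occurrences, so the entire set of scores can
be seen at a glance however long the raw list is.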
Descriptive statistics deal with the collection and presentation of data, which is usually the
first part of a statistical analysis. They therefore demand of the statistician an awareness of
levels of measurement, the design of experiments, the choice of the right focus group, and the
avoidance of biases that so easily creep into an experiment.
Descriptive statistics are also diverse in application and are widely used in different areas of
study, which may require different kinds of analysis. For instance, a physicist studying
turbulence in the laboratory needs the average values of quantities that vary over small intervals
of time. The nature of this problem requires that physical quantities be averaged from a host of
data collected through the experiment. Measures of central tendency like the mean become very
relevant.
Descriptive statistics imply a simple quantitative summary of a data set that has been collected.
They therefore help us understand the experiment or data set in detail and give us the details
needed to put the data in perspective. That is, they allow us to simply state what the data show,
leaving out interpretation of the results and trends (which is the domain of inferential
statistics, a separate branch altogether). For example, when conducting an experiment to
understand the effect of news stories on a person’s risk-taking behavior, the data collected may
be represented and quantified in several ways. The description of this behavior, its mean, and the
corresponding graphical representation of the data are descriptive statistics, while any
conclusion drawn from the data, such as that what one reads in the daily newspaper is likely to
influence one’s future risk-taking behavior, is inferential statistics.
Descriptive statistics give numerical and graphic procedures to summarize a collection of data
in a clear and understandable way. Essentially, they help us to simplify large amounts of data in a
sensible way because each descriptive statistic reduces lots of data into a simpler summary. They
therefore provide a window through which we can begin to appreciate what is going on in our
data, besides being the bedrock on which other statistical techniques are built.
Descriptive statistics can involve the examination of one variable on its own or the relationships
between two or more variables. A variable is a characteristic of the members of a population that
differs in quality or quantity from one member to another, or a condition that changes or has
different values for different individuals (a measurement is usually called a variable). There are
two main kinds of variables: categorical (qualitative) measures and continuous (quantitative)
measures.
Methods of Presenting Descriptive Statistics
There are two basic methods: numerical and graphical.
Numerical approaches are more precise and objective. They are used to compute statistics such
as the mean and standard deviation, which are measures of location and spread/variability
respectively. These statistics convey information about the values of the data and the spread
within the data set. The standard deviation measures the degree of consistency (or lack thereof):
a small standard deviation tells us that the data are consistent and the spread of the data
is small. Numerical summaries of data, when properly used, help us understand the overall
pattern of a data set without getting bogged down in the details.
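To see why the standard deviation is read as a measure of consistency, it helps to compare two
data sets that share the same mean. The sketch below uses two hypothetical classes whose scores
are invented for illustration; the helper name `describe` is likewise an assumption.

```python
import statistics

# Two hypothetical classes, invented for illustration; both average 70.
class_a = [68, 69, 70, 71, 72]     # scores cluster tightly around the mean
class_b = [40, 55, 70, 85, 100]    # scores are widely scattered

def describe(scores):
    """Return the mean and the sample standard deviation of a list of scores."""
    return statistics.mean(scores), statistics.stdev(scores)

mean_a, sd_a = describe(class_a)
mean_b, sd_b = describe(class_b)

print(f"Class A: mean={mean_a}, std dev={sd_a:.1f}")
print(f"Class B: mean={mean_b}, std dev={sd_b:.1f}")
```

Both classes have a mean of 70, but the standard deviation is about 1.6 for class A and about 23.7
for class B. The mean alone would hide this difference, which is why a measure of location is
usually reported together with a measure of spread.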
Graphical methods, though complex compared to numerical ones, present statistical data in
visual form. Plots not only reveal patterns in the data but may also contain detailed information
about the distribution. They strive to give a colourful, easy to read and interpret pictorial
representation of the data, which is more difficult to obtain from a table or a complete listing
of the data. Common examples include histograms, line diagrams/historigrams, pie charts, simple
bar charts, trend lines, and pictographs (pictograms/ideograms). The kind of graph that is most
appropriate for a situation depends on many factors, and creating a good picture of a data set is
as much an art as a science.
Limitations of Descriptive Statistics
i. Descriptive statistics just describe the data that have been given, without telling the reader
anything beyond the data. Inference is left to inferential statistics. For example, in a
study you may find that the Form 2 girls in your sample are more deviant than the Form 4 girls.
You cannot conclude that all Form 2 girls are more deviant than Form 4 ones. This would be
going beyond the information that you had.
ii. The three characteristics of a variable (distribution, central tendency, and dispersion) may
have to be considered together for meaningful data analysis. A measure of central tendency
reported alone, without a measure of dispersion, may be very hard to interpret.
iii. The mean is the most preferred of the measures of central tendency, while the standard
deviation is the most preferred of the measures of dispersion. This means that not all
descriptive statistics carry equal weight in statistical application.
iv. Both the mean and the standard deviation are used under the stated assumption that the data
are normally distributed.
1b. Discussion of Inferential Statistics
Definition
Inferential statistics consist of techniques that allow us to study samples and then make
generalizations about the populations from which they were selected. It is the mathematics and
logic of how generalization from sample to population can be made. The fundamental question
is: can researchers infer the population’s characteristics from the sample’s characteristics?
Usage/ Characteristics
Inferential statistics, as the name suggests, involve drawing the right conclusions from the
statistical analysis that has been performed using descriptive statistics. They reinforce
descriptive statistics by taking the analysis beyond description. They do this through predictions
of the future and generalizations about a population made by studying a smaller sample. This is
very important because studies and experiments need to state and conclude something about
general populations and not just about the sample that was studied.
The methods of inferential statistics center on the process of examining a sample of data about
some set of entities of interest (population) and, through use of the evidence available in the
sample, making an inference about some characteristic of the population.
Inferential statistics seek to make correct inferences and to avoid incorrect ones, while also
giving a clear idea of just how likely it is that a particular inference is correct. This is achieved
foremost by formulating a hypothesis concerning the population characteristic and then applying
a statistical technique to the evidence in the sample in order to reach a decision either to reject
or to fail to reject the hypothesis.
Inferential statistics apply the conclusions obtained from one experimental study to more
general populations. This means inferential statistics try to answer questions about populations
and samples that were never tested in the given experiment. For example, from a given survey, one
can apply the conclusions to a more general population, assuming the sample size is large enough
and the sample represents the views of a general cross section of the public.
With inferential statistics, you are trying to reach conclusions that extend beyond the immediate
data description alone. For instance, we use inferential statistics to try to infer from the sample
data what the population might think.
The inferences are almost always estimates with a confidence interval, though in some cases
there is simply a rejection of a hypothesis, especially where the experiment or study
is designed to refute some claim. One therefore needs to take all precautions in order to arrive at
the right conclusions through inferential statistics.
Inferential statistics is used to make judgments of the probability that an observed difference
between groups is a dependable one or one that might have happened by chance in this study.
Thus, it is used to make inferences from study data to more general conditions.
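One way to judge whether an observed difference between groups might have happened by chance is a
permutation test: shuffle the group labels many times and see how often chance alone produces a
difference at least as large as the one observed. The sketch below is only an illustration of that
idea, not a method prescribed by this handout; the risk-taking scores, the function name
`permutation_p_value`, and its parameters are all hypothetical.

```python
import random

def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)

def permutation_p_value(group_a, group_b, n_shuffles=10_000, seed=0):
    """Estimate how often chance alone gives an absolute mean difference
    at least as large as the observed one."""
    rng = random.Random(seed)              # fixed seed so the run is repeatable
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    at_least_as_large = 0
    for _ in range(n_shuffles):
        rng.shuffle(pooled)                # randomly reassign scores to groups
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed - 1e-12:       # small tolerance for rounding
            at_least_as_large += 1
    return at_least_as_large / n_shuffles

# Hypothetical risk-taking scores, invented for illustration only.
read_news = [12, 15, 14, 16, 18, 17, 15, 16]
no_news = [11, 12, 13, 12, 14, 13, 12, 11]

p_value = permutation_p_value(read_news, no_news)
print(f"Estimated probability of such a difference by chance: {p_value:.4f}")
```

A small value suggests the difference is a dependable one; a large value suggests it could easily
have arisen by chance in this study.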
A particular virtue of inferential statistics is that it calls attention to the fact that many
phenomena are by nature variable, and that observed differences may often be due to nothing
more than chance. It provides tools by which researchers decide whether an observed difference is,
statistically speaking, significant or not.
Source: Geraght (2014), p. 13
Methods of Presenting Inferential Statistics
When it comes to inferential statistics, there are generally two forms: estimation statistics and
hypothesis testing.
Estimation Statistics
Estimation is the branch of inferential statistics in which sample statistics are used to estimate
the values of population parameters, that is, estimating population values based on your sample
data. For example, given a true random sample of 350 students out of a target population of
1 million students, we can say that 30% of students in our sample said that they abuse drugs. Can
we safely extrapolate that 30% of all students in Kenya also abuse drugs? Is that the true
proportion for students in Kenya? Well, we can’t say with 100% confidence, but using inferential
statistical techniques such as the confidence interval, the researcher can provide a range for the
proportion of students who abuse drugs, with some level of confidence. The confidence interval is
an important feature of estimation statistics because from the data we construct an interval of
values so that the process has a certain chance, say a 95% chance (the confidence level), of
generating an interval that contains the actual population value.
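For the example above (30% of a random sample of 350 students), a 95% confidence interval can be
sketched as follows. The handout does not say which interval method to use; this sketch assumes
the common normal-approximation (Wald) interval, and the function name
`proportion_confidence_interval` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def proportion_confidence_interval(p_hat, n, confidence=0.95):
    """Normal-approximation (Wald) interval for a population proportion."""
    # Critical value of the standard normal distribution (about 1.96 for 95%).
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    standard_error = sqrt(p_hat * (1 - p_hat) / n)
    margin = z * standard_error
    return p_hat - margin, p_hat + margin

# Sample proportion and sample size taken from the example in the text.
low, high = proportion_confidence_interval(p_hat=0.30, n=350)
print(f"95% confidence interval: {low:.3f} to {high:.3f}")
```

Under these assumptions the interval runs from roughly 25% to 35%, so the researcher would report
a range rather than claim that exactly 30% of all students abuse drugs.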
Hypothesis Testing
Hypothesis testing is simply another way of drawing conclusions about a population parameter.
It is a statistical procedure in which a choice is made between a null hypothesis and an alternative
hypothesis based on information in a sample. The end result of a hypothesis testing procedure is
a choice of one of the following two possible conclusions:
1. Reject H0 (and therefore accept Ha), or
2. Fail to reject H0 (and therefore fail to accept Ha).
The null hypothesis, denoted H0, is the statement about the population parameter that is assumed
to be true unless there is convincing evidence to the contrary. The alternative hypothesis, denoted
Ha, is a statement about the population parameter that is contradictory to the null hypothesis, and
is accepted as true only if there is convincing evidence in favor of it.
With hypothesis testing, one uses a test such as the t-test, chi-square test, or ANOVA to test
whether a hypothesis about a population parameter (for example, the mean) is supported by the
sample data. Again, the point is that this is an inferential statistical method for reaching
conclusions about a population, based on a sample set of data.
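The choice between H0 and Ha can be illustrated with a small worked test. The sketch below does
not use any of the tests named above; it assumes a simpler one-proportion z-test, which needs only
the Python standard library. The null value of 25% is invented for illustration, and the function
name `one_proportion_z_test` is hypothetical; only the sample figures (30% of 350 students) come
from the estimation example in this handout.

```python
from math import sqrt
from statistics import NormalDist

def one_proportion_z_test(p_hat, n, p_null, alpha=0.05):
    """Two-sided z-test of H0: p = p_null against Ha: p != p_null."""
    standard_error = sqrt(p_null * (1 - p_null) / n)   # computed assuming H0 is true
    z = (p_hat - p_null) / standard_error
    # Probability, under H0, of a z value at least this far from zero.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    decision = "reject H0" if p_value < alpha else "fail to reject H0"
    return z, p_value, decision

# H0 (assumed for illustration): 25% of all students abuse drugs.
z, p_value, decision = one_proportion_z_test(p_hat=0.30, n=350, p_null=0.25)
print(f"z = {z:.2f}, p-value = {p_value:.3f}, decision: {decision}")
```

Here z is about 2.16 and the p-value is about 0.03, which is below the 0.05 significance level, so
the procedure ends in the first of the two possible conclusions: reject H0.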
Most of the major inferential statistics come from a general family of statistical models known as
the General Linear Model. This includes the t-test, Analysis of Variance (ANOVA), Analysis of
Covariance (ANCOVA), regression analysis, and many of the multivariate methods like factor
analysis, multidimensional scaling, cluster analysis, discriminant function analysis, among
others.
Limitations of Inferential Statistics
i. Valid inferences depend on whether the design of the experiment was right. Hence it
is not automatic that the researcher is able to draw conclusions relevant to his/her study using
inferential statistics. For example, several models are available in inferential statistics to
help in the process of analysis, and if a model is not chosen with care, an error in assuming one
model might give wrong conclusions about the experiment.
ii. Though they are packaged as superior to descriptive statistics, inferential statistics go hand
in hand with descriptive statistics, and one cannot exist without the other. Good scientific
methodology needs to be followed in both these steps of statistical analysis, and both
these branches of statistics are equally important for a researcher.
iii. They are vulnerable to data dredging whenever computers are used, because computers can hold
large amounts of information and make it easy, intentionally or unintentionally, to apply the
wrong inferential methods. This may yield wrong or biased conclusions.
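The danger of data dredging can be shown with a small simulation: when many tests are run on data
that contain no real effect, some of them come out "significant" purely by chance. The sketch
below is an illustration under stated assumptions (pure random noise with a true mean of 0 and a
known standard deviation of 1, tested with a z-test); the function name `count_false_positives`
and its parameters are hypothetical.

```python
import random
from math import sqrt
from statistics import NormalDist, mean

def count_false_positives(n_tests=200, sample_size=30, alpha=0.05, seed=1):
    """Run many z-tests on pure noise and count how many look 'significant'."""
    rng = random.Random(seed)          # fixed seed so the run is repeatable
    standard_normal = NormalDist()
    false_positives = 0
    for _ in range(n_tests):
        # Every sample comes from a population whose true mean is 0,
        # so H0 (mean = 0) is true in every single test.
        sample = [rng.gauss(0, 1) for _ in range(sample_size)]
        z = mean(sample) * sqrt(sample_size)   # z statistic with known sigma = 1
        p_value = 2 * (1 - standard_normal.cdf(abs(z)))
        if p_value < alpha:
            false_positives += 1
    return false_positives

hits = count_false_positives()
print(f"{hits} of 200 tests on pure noise came out 'significant'")
```

With a 5% significance level, about 5% of such tests are expected to reject a true H0, so a
researcher who searches through enough comparisons will almost always find something to report.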
Conclusion:
The purpose of descriptive statistical analysis is to describe the data that you have. Hence
descriptive statistics help you to explain to other people what is happening in your data. They are
usually complemented with exploratory data analysis, which ideally helps the researcher to
understand what is happening in his/her data so as to package or present it to readers
in the best way possible, either numerically, graphically, or both. To understand the simple
difference between descriptive and inferential statistics, all you need to remember is that
descriptive statistics summarize your current dataset, while inferential statistics aim to draw
conclusions about a wider population beyond your dataset. In other words, descriptive
statistics remain local to the sample, describing its central tendency and variability, while
inferential statistics focus on making statements about the population.
REFERENCES
DeCaro, S. A. (2003). A student’s guide to the conceptual side of inferential statistics. Retrieved
August 2014, from http://psychology.sdecnet.com/stathelp.htm
Geraght, M. A. (2014). Inferential Statistics and Hypothesis Testing. California, USA: De Anza
College Publication.
Henk, E. (2013). Foundations of Descriptive and Inferential Statistics. Karlsruhe, Germany:
Karlshochschule International University.
Howitt, D. & Cramer, D. (2011). Introduction to Statistics in Psychology (5th ed.). England,
UK: Pearson Education Ltd.
Http://www.Saylor.org/books
Klotz, J. H. (2006). A Computational Approach to Statistics. Madison, Wisconsin, USA: Pearson
Education Ltd.
Seema, J. (2012). Descriptive Statistics and Exploratory Data Analysis. New Delhi, India:
Agricultural Statistics Research Institute.
Wyllys, R. E. (1978). Teaching Descriptive and Inferential Statistics in Library Schools. Journal
of Education for Librarianship, 19(1), 3-20.