DEM810: RESEARCH METHODS HANDOUT 1

Statistics is a collection of mathematical techniques that help to summarize, analyze, present and interpret data, that is, to make sense of our observations. It is also defined as a set of procedures for gathering, measuring, classifying, computing, describing, synthesizing, analyzing, and interpreting systematically acquired quantitative data (Seema, 2012).

1a. Discussion of Descriptive Statistics

Definition

Descriptive statistics are the procedures and techniques used to summarize, organize, and make sense of a set of scores or observations. They help us get a picture of what is happening in our data. Descriptive statistics are typically presented graphically (in pictorial form) or numerically, either in tables or as summary statistics (single values). We simply state what the data show.

Usage/ Characteristics

Descriptive statistics summarize observations: they condense a large set of observations and give us some idea about the data set. This is done through distributions (which show the kinds of numbers we have), such as a frequency distribution table; measures of central tendency, such as the mean, median and mode; and measures of dispersion/variability, such as the range, variance and standard deviation. Data can also be represented in charts/graphs such as histograms to better understand what is happening in the experiment.

Descriptive statistics reports generally include summary statistics (such as the mean and standard deviation), tables, graphics, and text that explains what the charts and tables show, without drawing conclusions about a population. That is, the analysis is limited to your data and you are not extrapolating any conclusions to a full population. Descriptive statistics nevertheless remain important in the sense that large amounts of data are described in a way that is understandable, useful, and, if need be, convincing.
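As a minimal sketch of the summaries named above, the following Python snippet builds a frequency distribution and computes measures of central tendency and dispersion using only the standard library. The scores are hypothetical values made up for illustration; they do not come from any study referred to in this handout.

```python
from collections import Counter
import statistics

# Hypothetical test scores, invented purely for illustration.
scores = [4, 5, 5, 6, 6, 6, 7, 7, 8, 9]

# Frequency distribution: each distinct score and how often it occurs.
frequency = dict(sorted(Counter(scores).items()))

# Measures of central tendency.
mean = statistics.mean(scores)
median = statistics.median(scores)
mode = statistics.mode(scores)

# Measures of dispersion (sample versions, which divide by n - 1).
data_range = max(scores) - min(scores)
variance = statistics.variance(scores)
std_dev = statistics.stdev(scores)

print("Frequency distribution:", frequency)
print("Mean:", mean, "Median:", median, "Mode:", mode)
print("Range:", data_range)
print(f"Variance: {variance:.2f}  Standard deviation: {std_dev:.2f}")
```

Each printed value is a single number (or a short table) that stands in for the whole list of scores, which is the sense in which descriptive statistics reduce a data set to a simpler summary.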
Descriptive statistics are techniques that take raw scores and organize or summarize them in a more manageable form. Often the scores are organized in a frequency distribution table or a chart/graph so that the entire set of scores can be seen. A frequency distribution lists each category of data and the number of occurrences in each category. Another common technique is to summarize a set of scores by computing an average (mean). Even if the data set has hundreds of scores, the average provides a single descriptive value for the entire set. The mean and standard deviation are the most widely used descriptive statistics: the mean captures the average and the standard deviation captures the spread.

Descriptive statistics deals with the collection and presentation of data, which is usually the first part of a statistical analysis. It therefore demands that the statistician be aware of levels of measurement, experimental design, the choice of the right focus group, and the biases that so easily creep into an experiment. Descriptive statistics are also diverse in application and are widely used across areas of study that may require different kinds of analysis. For instance, a physicist studying turbulence in the laboratory needs the averages of quantities that vary over small intervals of time. The nature of this problem requires that physical quantities be averaged over the large body of data collected in the experiment, so measures of central tendency such as the mean become very relevant.

Descriptive statistics implies a simple quantitative summary of a data set that has been collected. It helps us understand the experiment or data set in detail and provides what we need to put the data in perspective. That is, it allows us simply to state what the data show, leaving out interpretation of the results and trends (which is the domain of inferential statistics, a separate branch altogether).
For example, when conducting an experiment to understand the effect of news stories on a person’s risk-taking behavior, the data collected may be represented and quantified in several ways. The description of this behavior, its mean and the corresponding graphical representation of the data are descriptive statistics, while any conclusion drawn from the data, such as that what one reads in the daily newspaper is likely to influence one’s future risk-taking behavior, is inferential statistics.

Descriptive statistics gives numerical and graphical procedures to summarize a collection of data in a clear and understandable way. Essentially, it helps us simplify large amounts of data in a sensible way, because each descriptive statistic reduces a lot of data to a simpler summary. Descriptive statistics therefore provide a window through which we can begin to appreciate what is going on in our data, besides being the bedrock on which other statistical techniques are built.

Descriptive statistics can involve the examination of one variable on its own or the relationships between two or more variables. A variable is a characteristic or condition that differs in quality or quantity from one member of a population to another, that is, one that has different values for different individuals (a measurement is usually called a variable). There are two main kinds of variables: categorical (qualitative) measures and continuous (quantitative) measures.

Methods of Presenting Descriptive Statistics

There are two basic methods: numerical and graphical. Numerical approaches are more precise and objective. They are used to compute statistics such as the mean and standard deviation, which are measures of location and spread/variability respectively. These statistics convey information about the values of the data and the spread within the data set.
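To illustrate why a measure of location is usually reported together with a measure of spread, the sketch below compares two hypothetical sets of scores (invented for this example) that share the same mean but differ greatly in standard deviation.

```python
import statistics

# Two hypothetical groups of scores, invented for illustration.
consistent = [49, 50, 50, 50, 51]
scattered = [10, 30, 50, 70, 90]

for name, data in (("consistent", consistent), ("scattered", scattered)):
    mean = statistics.mean(data)
    spread = statistics.stdev(data)  # sample standard deviation (n - 1 divisor)
    print(f"{name}: mean = {mean}, standard deviation = {spread:.2f}")
```

Both groups have a mean of 50, yet the second group is far more spread out, a difference that the mean alone cannot reveal.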
The standard deviation measures the degree of consistency (or lack thereof): a small standard deviation tells us that the data are consistent and the spread of the data is small. Numerical summaries of data, when properly used, help us understand the overall pattern of a data set without getting bogged down in the details.

Graphical methods, though complex compared to numerical ones, present statistical data in visual form, which not only reveals patterns in the data but can also convey detailed information about the distribution. They strive to give a colourful, easy-to-read pictorial representation of the data, which is more difficult to obtain from a table or a complete listing of the data. Common examples include histograms, line diagrams/historigrams, pie charts, simple bar charts, pictographs, trend lines, and pictograms/ideograms. The kind of graph that is most appropriate for a situation depends on many factors, and creating a good picture of a data set is as much an art as a science.

Limitations of Descriptive Statistics

i. Descriptive statistics only describe the data that have been given, without telling the reader anything more. Inference is left to inferential statistics. For example, in a study you may find that the form 2 girls were more deviant than the form 4 girls. You cannot conclude that all form 2 girls are more deviant than form 4 girls; this would be going beyond the information that you had.
ii. The three characteristics of a variable (distribution, central tendency and dispersion) may have to be considered together for meaningful data analysis. Central tendency reported alone, without a measure of dispersion, is especially hard to interpret.
iii. The mean is the most preferred of the measures of central tendency, while the standard deviation is the most preferred of the measures of dispersion.
This means that not all descriptive statistics carry equal weight in statistical application.
iv. Both the mean and standard deviation are used under the stated assumption that the data have a normal distribution.

1b. Discussion of Inferential Statistics

Definition

Inferential statistics consist of techniques that allow us to study samples and then make generalizations about the populations from which they were selected. It is the mathematics and logic of how generalization from sample to population can be made. The fundamental question is: can researchers infer the population’s characteristics from the sample’s characteristics?

Usage/ Characteristics

Inferential statistics, as the name suggests, involves drawing the right conclusions from the statistical analysis that has been performed using descriptive statistics. It reinforces descriptive statistics by taking the analysis beyond description. It does this through predictions of the future and generalizations about a population made by studying a smaller sample. This is very important because studies and experiments need to conclude something about general populations and not just about the sample that was studied.

The methods of inferential statistics center on examining a sample of data about some set of entities of interest (the population) and, through the evidence available in the sample, making an inference about some characteristic of the population. Inferential statistics seek to make correct inferences and to avoid incorrect ones, while also giving a clear idea of just how likely it is that a particular inference is correct. This is achieved foremost by formulating a hypothesis about the population characteristic and then applying a statistical technique to the evidence in the sample in order to decide whether to reject or fail to reject the hypothesis.
Inferential statistics is the study of how to apply the conclusions obtained from one experimental study to more general populations. This means inferential statistics tries to answer questions about populations and samples that were not tested in the given experiment. For example, the conclusions of a survey can be applied to a more general population, provided the sample is large enough and represents a general cross section of the public. With inferential statistics, you are trying to reach conclusions that extend beyond the immediate description of the data. For instance, we use inferential statistics to try to infer from the sample data what the population might think.

The inferences are almost always an estimate with a confidence interval, though in some cases there is simply a rejection of a hypothesis, especially where the experiment or study is designed to refute some claim. One therefore needs to take every precaution in order to arrive at the right conclusions through inferential statistics.

Inferential statistics is used to judge the probability that an observed difference between groups is a dependable one rather than one that might have happened by chance in this study. Thus, it is used to make inferences from study data to more general conditions. A particular virtue of inferential statistics is that it calls attention to the fact that many phenomena are by nature variable, and that observed differences may often be due to nothing more than chance. It provides tools by which researchers decide whether an observed difference is, statistically speaking, significant or not.

Source: Geraght (2014), p. 13

Methods of Presenting Inferential Statistics

When it comes to inferential statistics, there are generally two forms: estimation statistics and hypothesis testing.
Estimation Statistics

This is the branch of inferential statistics in which sample statistics are used to estimate the value of a population parameter, that is, estimating population values from your sample data. For example, suppose that in a true random sample of 350 students drawn from a target population of 1 million students, 30% of the students said that they abuse drugs. Can we safely extrapolate that 30% of all students in Kenya also abuse drugs? Is that the true value for students in Kenya? We cannot say so with 100% confidence, but using inferential techniques such as the confidence interval, the researcher can provide a range for the proportion of students who abuse drugs with some stated level of confidence. The confidence interval is an important feature of estimation statistics: from the data we construct an interval of values by a process that has a certain chance, say 95%, of generating an interval that contains the actual population value.

Hypothesis Testing

Hypothesis testing is another way of drawing conclusions about a population parameter. It is a statistical procedure in which a choice is made between a null hypothesis and an alternative hypothesis based on information in a sample. The end result of a hypothesis-testing procedure is a choice of one of the following two possible conclusions:
1. Reject H0 (and therefore accept Ha), or
2. Fail to reject H0 (and therefore fail to accept Ha).

The null hypothesis, denoted H0, is the statement about the population parameter that is assumed to be true unless there is convincing evidence to the contrary. The alternative hypothesis, denoted Ha, is a statement about the population parameter that contradicts the null hypothesis, and is accepted as true only if there is convincing evidence in favor of it. With hypothesis testing, one uses a test such as the t-test, the chi-square test, or ANOVA to decide whether a hypothesis about a population parameter (for example, the mean) is supported by the sample.
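The two forms of inference described above can be sketched for the survey example (350 students sampled, 30% reporting drug abuse). The code below uses the normal approximation for a proportion, which is one common choice and not necessarily the method the cited sources intend. The null value of 25% and the 5% significance level are hypothetical, chosen only to illustrate the test.

```python
import math
from statistics import NormalDist

n = 350       # sample size from the example in the text
p_hat = 0.30  # sample proportion from the example in the text

# Estimation: 95% confidence interval using the normal approximation.
z_crit = NormalDist().inv_cdf(0.975)     # two-sided 95% critical value
se = math.sqrt(p_hat * (1 - p_hat) / n)  # standard error of the sample proportion
ci_low = p_hat - z_crit * se
ci_high = p_hat + z_crit * se

# Hypothesis test: H0: p = p0 versus Ha: p != p0.
p0 = 0.25     # hypothetical null value, assumed for illustration only
alpha = 0.05  # significance level, also an assumption
se0 = math.sqrt(p0 * (1 - p0) / n)       # standard error if H0 were true
z_stat = (p_hat - p0) / se0
p_value = 2 * (1 - NormalDist().cdf(abs(z_stat)))
decision = "Reject H0" if p_value < alpha else "Fail to reject H0"

print(f"95% confidence interval: {ci_low:.3f} to {ci_high:.3f}")
print(f"z = {z_stat:.2f}, p-value = {p_value:.3f}: {decision}")
```

With these numbers the interval runs from roughly 25% to 35%, so the sample supports a statement about a range of plausible population values rather than the single figure of 30%. The illustrative test rejects the hypothetical 25% claim at the 5% level.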
Again, the point is that this is an inferential method for reaching conclusions about a population based on a sample of data.

Most of the major inferential statistics come from a general family of statistical models known as the General Linear Model. This includes the t-test, Analysis of Variance (ANOVA), Analysis of Covariance (ANCOVA), regression analysis, and many of the multivariate methods like factor analysis, multidimensional scaling, cluster analysis, discriminant function analysis, among others.

Limitations of Inferential Statistics

i. The validity of its inferences depends on whether the design of the experiment was right. Hence it is not automatic that the researcher is able to draw conclusions relevant to his/her study using inferential statistics. For example, several models are available in inferential statistics to help in the process of analysis, and if they are not chosen with care, wrongly assuming one model might give wrong conclusions about the experiment.
ii. Though it is packaged as superior to descriptive statistics, inferential statistics goes hand in hand with descriptive statistics and one cannot exist without the other. Good scientific methodology needs to be followed in both of these steps of statistical analysis, and both branches of statistics are equally important for a researcher.
iii. It is vulnerable to data dredging whenever computers are used, because they can hold large amounts of information and make it easy to apply the wrong inferential methods, whether intentionally or unintentionally. This may yield wrong or biased conclusions.

Conclusion

The purpose of descriptive statistical analysis is to describe the data that you have. Hence descriptive statistics help you to explain to other people what is happening in your data.
It is usually complemented by exploratory data analysis, which helps the researcher understand what is happening in his/her data so as to present it to readers in the best way possible, whether numerically, graphically or both.

To understand the simple difference between descriptive and inferential statistics, all you need to remember is that descriptive statistics summarize your current data set, while inferential statistics aim to draw conclusions about a wider population outside your data set. Additionally, descriptive statistics remain local to the sample, describing its central tendency and variability, while inferential statistics focus on making statements about the population.

REFERENCES

DeCaro, S. A. (2003). A student’s guide to the conceptual side of inferential statistics. Retrieved August 2014, from http://psychology.sdecnet.com/stathelp.htm
Geraght, M. A. (2014). Inferential Statistics and Hypothesis Testing. California, USA: De Anza College Publication.
Henk, E. (2013). Foundations of Descriptive and Inferential Statistics. Karlsruhe, Germany: Karlshochschule International University.
Howitt, D. & Cramer, D. (2011). Introduction to Statistics in Psychology, 5th Edition. England, UK: Pearson Education Ltd.
Http://www.Saylor.org/books
Klotz, J. H. (2006). A Computational Approach to Statistics. Madison, Wisconsin, USA: Pearson Education Ltd.
Seema, J. (2012). Descriptive Statistics and Exploratory Data Analysis. New Delhi, India: Agricultural Statistics Research Institute.
Wyllys, R. E. (1978). Teaching Descriptive and Inferential Statistics in Library Schools. Journal of Education for Librarianship, Vol. 19 (1), 3-20.