* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Download Inferential Statistics + Estimation
Survey
Document related concepts
Transcript
Inferential Statistics Chapter 5 Keep in Mind! 1) Statistics are useful for figuring out random noise from real effects. 2) Numbers are not absolute, and they can be easily manipulated. 3) Always scrutinize data closely, and draw your own conclusions. 4) 85% of all statistics are made up on the spot: the rest are all wrong The End of Descriptive Statistics • Descriptive statistics’ work stops when the data are : – Displayed in an organized manner. – Reduced to compact indices (e.g. mean and SD). – Described in terms of distributional shape. – Evaluated with regard to reliability and validity. – Examined to reveal the strength and direction of relationship (in the case of bivariate concern). Inferential Statistics • Drawing conclusions with regard to the subjects from the data. • Making statement(s) from the data in the sample about the population. • Inf. Stats. deals with the issue of generalization of the study results. Sample vs. Population • A population comes of a group larger than the sample. • A sample is extracted form the population. • Measurements are carried over the sample to make an ‘educated guess’ about the population. Why Sampling? • Cost in terms of money and time. • Sometimes, the population of interest may extend into the future. (abstract population to fit the sample), the idea of ‘not pulled from’. • Otherwise, the population is called tangible in nature. – The researcher begins with the population and ends with the sample. – There is a sampling frame in this case. Statistic and Parameter • Before making the educated guess or inference, researchers should deal with: – What the population is, – how to extract the sample, – what characteristics to measure in the sample, • After deciding on the first and second points • Reliability and validity are involved – and, what the study statistical focus will be. • In what way should the data be summarized? Statistic and Parameter, cont’d • Statistic is the numerical value that is computed from the sample. • Parameter is the value that is in the population. – Can never be computed, why? • See symbols in Chapter 5, Page 104. Types of Samples • The choice of sample will influence: – The accuracy of the inferential guess – And/Or the definition of the population toward which the guess is directed • There several types of samples. Probability Sample • Conditions: – The number of the population has to be known a priori and small. – The probability that every ‘object/person’ in the population can be drawn in the sample should be known. • The resulting sample is probability sample. • There are several types of probability sample. Probability Sample, cont’d • Simple Random Sample – Population names in a hat, then draw the sample blindly (randomly). – The population has to be small to be conceivable. – Equal opportunity of being in the sample to each member of the population. – Can result in unrepresentative samples. Probability Sample, cont’d • Stratified Random Sample – Divide population based on variables (males and females) – Draw the sample from the segments of population. – More representative than previous type. • Oversampling: – If difficulties are anticipated in getting objects in certain strata. – Need to make equal size strata from different size segments of population. Probability Sample, cont’d • Systematics Sample – Go through ordered list of possible sample members (population) – Select one from ever certain range (one from between X and Y) randomly. Probability Sample, cont’d • Cluster Sample – Divide the population into clusters such as schools or house holds. – Select the sample items randomly from the clusters. Nonprobability Samples • If you do not have a finite population, • and no probability of being selected (for each member/item) is known a priori, • The sample is called nonprobability sample. • Extreme care about generalization is needed. • Hard to determine the population to which the findings should apply. Nonprobability Sample, cont’d • Purposive Sample – Select a large group initially – Do screening to satisfy some requirements needed to answer the research question. – The final set of items selected make up the purposive sample. • To whom/which you should generalize? – The portion that satisfied the screening criteria – An abstract population made up of objects similar to those in the sample Nonprobability Sample, cont’d • Convenience Sample – Select from whoever/whichever is available. – The population in this case is abstract. – Please provide detailed description of the sample in this case. Nonprobability Sample, cont’d • Quota Sample – Predetermined sample slots – Continue to find objects to satisfy the slots – Similar to stratified probability sample, but • The population SPS is known a priori. – In QS, what goes in the sample is know. Nonprobability Sample, cont’d • Snowball Sample – Like a two-stage convenience or purposive sample. – Conveniently find a part of the sample – That part of the sample (i.e. people you recruit) should bring others to participate. Sampling Problems • Especially with nonprobability samples • The inference becomes hard and murky if the data are collected from less than 100% of the sample. • Three main problems: – Low Response Rates – Refusal to Participate – Attrition Low Response Rate • Happens. • Do not try to justify it by saying it is normal. • Tell the reader of your article how hard you tried to get high response rates (several methods) Refusal to Participate • Attrition • Participants dropping out of a study. • Check for attrition bias. Warnings • Mismatch may happen between the source of data and the destination of the inferential claims. • Some concerns about the book script: Warnings, cont’d • The size of the sample does not matter, it is the quality that does. • The term ‘random’ must be treated very carefully. – Choices in the sample must satisfy the concept of ‘randomness’. – You should clearly describe the process of selecting the sample. – Usually, computer-generated random numbers are used. • Statistical inferences are worthless in the empirical evidence if the sample/population is not described in detail. – See example on page 22 in the text book. Chapter 6 Estimation Estimation • Two approaches researchers use to make educated guesses from sample statistics. – Estimation • Interval • Point – Hypothesis testing Interval Estimation • Three main concepts – Sampling Error – Standard Error – Confidence Interval Sampling Error • Is the magnitude of the difference between statistic and parameter. • Coin Flipping Exercise. • SE does not imply that sample was extracted improperly. • It does not imply that the summarization was improper. • It goes to the natural behavior of the sample. – The biased coin example. Sampling Distribution and Standard Error • Sampling distribution – The distribution of sample statistics – Aka, the distribution of the means of multiple samples from the population. – Flipping the coin 20 times and repeating the process several times (several samples result) • Standard Error – How variable the sample statistic is when multiple samples of the same size are drawn form the same population. – It is equal to the SD of the sampling distribution. – How much dispersion exists. – Indicates the extent to which the statistic fluctuates from sample to sample around the value of the parameter. Standard Error, cont’d • Usually one sample is used. • How to get around the standard error of multiple samples (sampling distribution)? – Estimate the standard error. – The standard error of the mean is: 𝜎𝑥 = 𝜎𝑥 𝑁 Confidence Interval • More advantageous than SE when trying to understand the connection between statistic and parameter. • Can be built around any statistic. • Can be represented in three ways: – In text. – In a table. – In a picture (figure). Confidence Interval, cont’d • How to construct a confidence interval? – Can be around the mean, the median, the correlation coefficient, and so on. – This YouTube video shows how to construct a CI around the mean. • http://www.youtube.com/watch?v=siqx4PbqJ6s • See this as well:http://office.microsoft.com/en-us/excelhelp/confidence-HP005209021.aspx – Excel has the Function CONFIDENCE which takes the confidence level, the SD, and the size of the sample. Confidence Interval, cont’d • How to interpret CI? – The chance that the parameter lies within the limits of the CI. It depends on the confidence level (usually 95% is reported). Confidence Interval vs. Estimated Standard Error • Should both be supplied to the reader of research. • CI is usually supplied with level of confidence. • SE is usually not. It is usually assumed to be 68%. Point Estimation • Interval estimation is more logical to indicate an educational guess than point estimation. • In point estimation, the computed statistic is assumed to be the same as the parameter. • Reliability and Validity coefficient can support the point estimation. Warnings • The second of two numbers separated by (±) can mean: – Standard deviation – OR, estimated standard error – OR, half the distance to the end point of CI. • Always use ‘estimated’ before you state the standard error. • CI means that the sample statistic can fall at any point within CI. • For estimation to be proper, the sample has to be randomly selected from the population OR the population has to match closely the nature of the sample.