Lecture 1: Stat Applications, Types of Data and Statistical Inference

Statistics and Data
• What is Statistics?
  – Intuitively, statistics are numbers or summaries of numbers, such as an average, sum, maximum or minimum.
  – In a broader sense, Statistics is the art and science of collecting, analyzing, presenting and interpreting data.
• Statisticians collect and analyze data, then draw conclusions about the truth behind the data.
• Data are the facts and figures collected, analyzed and summarized for presentation and interpretation.
• All the data collected in a particular study are called the data set.

Basic Concepts
• Data: facts and figures collected, analyzed and summarized for presentation and interpretation
• Data Set: all the data collected in a particular study
• Elements: the individual entities of a data set
• Variables: characteristics of interest for the elements
• Observations: the set of measurements collected for a particular element

Types of Data
• There are different classifications.
• Each classification serves a specific purpose.
• Correspondingly, each type of data demands specific techniques of analysis.
• Qualitative: labels or names used to identify an attribute of each element
  – Nominal: order does NOT matter (gender, race, marital status)
  – Ordinal: order DOES matter (level of satisfaction, class standing [fresh, soph, jr, sr])
• Quantitative: numeric values that indicate how much or how many
  – Interval: differences between values are meaningful, but ratios are not (temperature in C/F, IQ score)
  – Ratio: ratios of quantities have meaning (height, weight, age)
  – The difference between interval and ratio is whether ZERO matters, that is, whether zero means "none of the quantity".

Types of Data (cont.)
• Cross-Sectional Data: data collected at the same or approximately the same point in time
• Time Series Data: data collected over several time periods
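As a small illustration of the interval versus ratio distinction under Types of Data, the sketch below compares the same two temperatures and the same two weights in different units. The helper names `c_to_f` and `kg_to_lb` and the example values are assumptions made for this illustration, not part of the lecture.

```python
def c_to_f(celsius):
    """Convert Celsius to Fahrenheit (both are interval scales)."""
    return celsius * 9 / 5 + 32


def kg_to_lb(kg):
    """Convert kilograms to pounds (both are ratio scales)."""
    return kg * 2.20462  # approximate conversion factor


# Interval scale: the ratio depends on the unit, so it carries no meaning.
temp_ratio_c = 20 / 10                  # ratio of 20 C to 10 C
temp_ratio_f = c_to_f(20) / c_to_f(10)  # same two temperatures in F

# Ratio scale: zero means "no weight", so the ratio survives a unit change.
weight_ratio_kg = 80 / 40
weight_ratio_lb = kg_to_lb(80) / kg_to_lb(40)

print("temperature ratio in C:", temp_ratio_c)
print("temperature ratio in F:", round(temp_ratio_f, 2))
print("weight ratio in kg:", weight_ratio_kg)
print("weight ratio in lb:", round(weight_ratio_lb, 2))
```

The temperature ratio changes with the unit because 0 C and 0 F are arbitrary reference points, while the weight ratio stays the same because zero weight is a true zero.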
Sources of Data
• Existing Sources: data that already exist
• Surveys: for example, teaching evaluations
• Experiments: the key feature of an experiment, as opposed to an observational study, is that you manipulate and control what happens to the groups, such as assigning a different treatment (drug) to each one.
• Observational Studies: the key feature of an observational study, as opposed to an experiment, is that you simply observe what happens and do not assign a specific treatment to anything.

How to get data?
• Sampling
  – Used for surveys, experiments and observational studies.

Types of Sampling
• Simple Random Sampling (Finite Population): a sample of size n is selected from a finite population of size N such that each possible sample of size n has the same probability of being selected.
• Simple Random Sampling (Infinite Population): a sample selected such that each element comes from the population and each element is selected independently.
• Sampling With Replacement: elements are put back in the population after being selected, so an element can be selected more than once for a single sample.
• Sampling Without Replacement: elements are not put back after being selected, so each element can appear in a sample only once.

Types of Sampling (cont.)
• Convenience Sample: participants are not selected at random; instead, those who are most convenient to reach are chosen. An example is polling people as they enter a grocery store on a Saturday from 1-2pm, rather than listing everyone who came on Saturday and drawing a sample of the same size in which all people are equally likely to be chosen.
• Stratified Random Sample: the population is divided into different strata (or groups) and people are randomly chosen from each stratum. An example would be to divide Purdue undergraduates into Freshman, Sophomore, Junior and Senior, and select from each group.

Types of Sampling (cont. II)
• Cluster Sample: the population is divided into different groups (clusters), some groups are randomly selected, and every element in a selected group is part of the sample.
• Kth element Random Sample: number the units in the population from 1 to N, decide on the sample size n that you want or need, and set k = N/n, the interval size. Randomly select an integer between 1 and k as the starting unit, then take every kth unit after it.

Statistical Inference
• Population: the set of all elements of interest in a particular study
• Sample: a subset of the population
• Sample Survey: the process of conducting a survey to collect data for a sample
• Census: the process of conducting a survey to collect data for the entire population

What is Statistical Inference?
• Using data from a sample to estimate the characteristics of a population

Why sample rather than population?
• Hard to sample EVERYONE
• Too expensive to sample everyone
• Too much time/effort to sample everyone

Bias
• How a question is asked can affect how a respondent responds.
• Bias is bad: it systematically distorts the results.
• What are the sources of bias?
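The sampling schemes listed under Types of Sampling can be sketched in a few lines of Python using the standard `random` module. This is a minimal illustration, not a prescribed implementation: the function names, the population of 100 numbered units, the four equal-sized class strata and the fixed seed are all assumptions made for the example.

```python
import random


def simple_random_sample(population, n, rng):
    """Without replacement: every subset of size n is equally likely."""
    return rng.sample(population, n)


def sample_with_replacement(population, n, rng):
    """Each draw is independent, so a unit may appear more than once."""
    return rng.choices(population, k=n)


def stratified_sample(strata, n_per_stratum, rng):
    """Draw a simple random sample from every stratum."""
    return {name: rng.sample(units, n_per_stratum)
            for name, units in strata.items()}


def cluster_sample(clusters, n_clusters, rng):
    """Pick whole clusters at random and keep every unit in them."""
    chosen = rng.sample(list(clusters), n_clusters)
    return [unit for name in chosen for unit in clusters[name]]


def kth_element_sample(population, n, rng):
    """Random start between 1 and k, then every kth unit (assumes n <= N)."""
    k = len(population) // n
    start = rng.randint(1, k)  # inclusive on both ends
    return population[start - 1::k][:n]


rng = random.Random(0)            # fixed seed so the run is repeatable
population = list(range(1, 101))  # hypothetical units numbered 1..N, N = 100

# Hypothetical strata: 25 units per class standing.
groups = {
    "Freshman": population[0:25],
    "Sophomore": population[25:50],
    "Junior": population[50:75],
    "Senior": population[75:100],
}

srs = simple_random_sample(population, 10, rng)
with_repl = sample_with_replacement(population, 10, rng)
stratified = stratified_sample(groups, 3, rng)
clustered = cluster_sample(groups, 2, rng)
systematic = kth_element_sample(population, 10, rng)

print("simple random:", srs)
print("with replacement:", with_repl)
print("stratified:", stratified)
print("cluster size:", len(clustered))
print("every kth:", systematic)
```

Here the same four groups serve as strata in one call and as clusters in the other, which highlights the difference: stratified sampling takes a few units from every group, while cluster sampling takes every unit from a few groups.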