Download Data - stat225

Lecture 1
Stat Applications, Types of Data
And Statistical Inference
Statistics and Data
• What is Statistics
– Intuitively, statistics are numbers or summaries of numbers,
• such as the average, sum, maximum, or minimum.
– In a broader sense, Statistics is the art and science of
collecting, analyzing, presenting and interpreting data.
• Statisticians collect and analyze data and then draw conclusions
about the truth behind the data.
• Data are the facts and figures collected, analyzed, and
summarized for presentation and interpretation.
• All the data collected in a particular study are called the data set.
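As a small illustration of "summaries of numbers", the sketch below computes the statistics named above for a made-up data set. The scores are hypothetical and only there to show the idea.

```python
from statistics import mean

# Hypothetical data set: exam scores for five students (invented for illustration).
scores = [72, 85, 90, 66, 87]

# Each entry is a single number that summarizes the whole data set.
summary = {
    "sum": sum(scores),
    "average": mean(scores),
    "maximum": max(scores),
    "minimum": min(scores),
}
print(summary)
```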
Basic Concepts
• Data: facts and figures collected, analyzed and
summarized for presentation and interpretation.
• Data Set: all data collected in a particular study
• Elements: the individual entities of a data set
• Variable: a characteristic of interest for the
elements
• Observation: the set of measurements
collected for a particular element
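The terms above can be mapped onto a tiny table. This is a minimal sketch with invented students; the names and values are assumptions used only to show which part of a data set each term refers to.

```python
# Hypothetical data set: one row (dict) per element.
data_set = [
    {"name": "Ann", "class": "Freshman", "height_cm": 165},
    {"name": "Bob", "class": "Junior", "height_cm": 180},
    {"name": "Cy", "class": "Senior", "height_cm": 172},
]

# Elements: the individual entities (here, the students).
elements = [row["name"] for row in data_set]

# Variables: the characteristics recorded for every element.
variables = [key for key in data_set[0] if key != "name"]

# Observation: the set of measurements collected for one element (Bob).
observation_for_bob = {k: v for k, v in data_set[1].items() if k != "name"}

print(elements)
print(variables)
print(observation_for_bob)
```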
Types of Data
• There are different classifications.
• Each classification serves some
specific purpose.
• Correspondingly, each type of data
calls for specific
techniques of analysis.
Types of Data
• Qualitative: labels or names used to identify an
attribute of each element
– Nominal: order does NOT matter (gender, race, marital status)
– Ordinal: order DOES matter (level of satisfaction, class [fresh, soph,
jr, sr])
• Quantitative: numeric values that indicate
how much or how many
– Interval: differences are meaningful but ratios are not (temperature
in °C/°F, IQ score)
– Ratio: ratios of quantities have meaning (height, weight, age)
– The difference between interval and ratio is whether ZERO
matters: on a ratio scale, zero means none of the quantity is present.
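One way to see why zero matters is to change units and check whether a ratio survives. The sketch below is only an illustration with made-up values: the temperature ratio changes when moving from Celsius to Fahrenheit (interval scale), while the height ratio stays the same when moving from centimeters to inches (ratio scale).

```python
import math

def c_to_f(celsius):
    """Convert degrees Celsius to degrees Fahrenheit."""
    return celsius * 9 / 5 + 32

def cm_to_inches(cm):
    """Convert centimeters to inches."""
    return cm / 2.54

# Interval scale: zero is arbitrary, so the "ratio" depends on the unit.
temp_ratio_c = 20 / 10                   # 2.0 in Celsius
temp_ratio_f = c_to_f(20) / c_to_f(10)   # 68 / 50, about 1.36 in Fahrenheit

# Ratio scale: zero means "no height", so the ratio is unit-free.
height_ratio_cm = 200 / 100
height_ratio_in = cm_to_inches(200) / cm_to_inches(100)

print(temp_ratio_c, temp_ratio_f)
print(height_ratio_cm, height_ratio_in)
print(math.isclose(height_ratio_cm, height_ratio_in))
```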
Types of Data (cont.)
• Cross-Sectional: data collected at
the same or approximately the same
point in time
• Time Series Data: data collected
over several time periods.
Sources of Data
• Existing Sources: data that have already been collected
• Surveys: e.g. teaching evaluations
• Experiments:
– The key thing in an experiment, as opposed to an observational study,
is that you manipulate and control what happens to the groups, such as
by assigning a different treatment (drug) to each one.
• Observational Studies:
– The key thing in an observational study, as opposed to an
experiment, is that you simply observe what happens and do not
give a specific treatment to anything.
How to get data?
• Sampling
• Used for surveys, experiments and
observational studies.
Types of Sampling
• Simple Random Sampling (Finite Population): A sample of
size n from a finite population of size N is selected such that
each possible sample of size n has the same probability of
being selected.
• Simple Random Sampling (Infinite Population): a sample
selected such that each element selected comes from the
population and each element is selected independently.
• Sampling With Replacement: Elements are put back in the
population after being selected for the sample allowing for a
chance of being selected more than once for a single sample
• Sampling Without Replacement: Elements are not replaced
after being selected and are therefore only chosen once to be in
a sample.
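The difference between sampling with and without replacement can be sketched with Python's standard `random` module. The population of numbered units, the sizes N and n, and the seed are all assumptions chosen for illustration.

```python
import random

# Hypothetical finite population: units numbered 1..N.
N = 20
n = 5
population = list(range(1, N + 1))

rng = random.Random(225)  # fixed seed only so the example is repeatable

# Without replacement: no unit can appear twice. random.sample gives every
# possible subset of size n the same chance of being chosen.
without_replacement = rng.sample(population, n)

# With replacement: each draw is made from the full population, so a unit
# may be selected more than once.
with_replacement = rng.choices(population, k=n)

print(without_replacement)
print(with_replacement)
```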
Types of Sampling (cont.)
• Convenience Sample: participants are not selected at random;
instead, those who are most convenient to reach are chosen.
An example is polling people as they come into the grocery
store on a Saturday from 1-2pm, rather than taking all the
people who came on Saturday and choosing a same-size sample
in which all people are equally likely to be chosen.
• Stratified Random Sample: The population is divided into
different strata (or groups) and people are randomly chosen
from each stratum. An example would be to divide Purdue
undergraduates into Freshmen, Sophomores, Juniors and
Seniors, and select from each group.
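A stratified random sample along the lines of the class-level example might look like the sketch below. The function name `stratified_sample`, the student list, and the choice of taking the same number from each stratum are assumptions for illustration; other allocation rules (for example, proportional to stratum size) are also common.

```python
import random
from collections import Counter, defaultdict

def stratified_sample(units, stratum_of, n_per_stratum, seed=None):
    """Draw a simple random sample of up to n_per_stratum units from each stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in units:
        strata[stratum_of(unit)].append(unit)
    sample = []
    for members in strata.values():
        # Sample without replacement inside each stratum.
        sample.extend(rng.sample(members, min(n_per_stratum, len(members))))
    return sample

# Hypothetical population: 40 students, 10 in each class level.
levels = ["Freshman", "Sophomore", "Junior", "Senior"]
students = [(f"student_{i}", levels[i % 4]) for i in range(40)]

sample = stratified_sample(students, lambda s: s[1], n_per_stratum=3, seed=1)
counts = Counter(level for _, level in sample)
print(counts)
```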
Types of Sampling (cont. II)
• Cluster Sample: The population is divided into different
groups, some groups are randomly selected, and every
element in a selected group is part of the sample.
• Kth Element Random Sample (systematic sample): number the
units in the population from 1 to N, decide on the sample
size n that you want or need, and set k = N/n = the interval
size. Randomly select an integer between 1 and k as the
starting unit, then take every kth unit after it.
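Both schemes above can be sketched in a few lines. The function names, the dorm clusters, and the seeds are hypothetical. The systematic sampler assumes N is at least n and uses integer division for k when N is not a multiple of n, which is one possible convention rather than the only one.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly pick whole clusters and keep every unit inside them."""
    rng = random.Random(seed)
    chosen = rng.sample(list(clusters), n_clusters)
    return [unit for name in chosen for unit in clusters[name]]

def systematic_sample(N, n, seed=None):
    """Take every kth unit from units numbered 1..N, starting at a random point."""
    rng = random.Random(seed)
    k = N // n                  # interval size (assumes N >= n)
    start = rng.randint(1, k)   # random integer between 1 and k, inclusive
    return list(range(start, N + 1, k))[:n]

# Hypothetical clusters: residents grouped by dorm.
clusters = {"dorm_A": [1, 2, 3], "dorm_B": [4, 5], "dorm_C": [6, 7, 8, 9]}
picked = cluster_sample(clusters, 2, seed=0)

# Systematic sample of n = 10 units from N = 100, so k = 10.
units = systematic_sample(100, 10, seed=0)

print(picked)
print(units)
```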
Statistical Inference
• Population: the set of all elements of interest in
a particular study
• Sample: a subset of the population
• Sample Survey: the process of conducting a
survey to collect data for a sample
• Census: the process of conducting a survey to
collect data for the entire population
What is Statistical Inference ?
Using data from a sample to estimate
a characteristic of a population
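A minimal sketch of this idea: the sample mean is used as an estimate of the population mean. The population here is simulated (heights drawn from a normal distribution with assumed mean 170 and standard deviation 10), so in this toy setting the true value is known and can be compared with the estimate; in a real study only the sample would be available.

```python
import random
from statistics import mean

rng = random.Random(42)  # fixed seed only so the example is repeatable

# Hypothetical population of 10,000 heights in cm (simulated, not real data).
population = [rng.gauss(170, 10) for _ in range(10_000)]

# Simple random sample of 100 units, drawn without replacement.
sample = rng.sample(population, 100)

population_mean = mean(population)  # the characteristic of interest
sample_mean = mean(sample)          # the estimate computed from the sample

print(round(population_mean, 2), round(sample_mean, 2))
```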
Why sample rather than population?
• Hard to sample EVERYONE
• Too expensive to sample everyone
• Too much time/effort to sample
everyone
Bias
• How a question is asked can have an
effect on how a respondent
responds.
• Bias is bad.
• What are the sources of bias?