Datasets and Variables
  We want to answer questions
  We want to use data for this purpose
  Observations of characteristics of cases
  Case: person, city, organization, etc.
  Characteristic or Variable: age, size, sector of economy, etc.
  Dataset: data arranged in case by variable format

Datasets
  Cases
  Variables

Variables
  Measures or observations of a case's
    traits
    characteristics
    qualities
    attributes
    amounts
    quantities
    etc.

Errors in variables
  Missing values
  Measurement errors
  Mistakes in
    Reporting
    Remembering
    Recording
  Lies, etc.
  Quality of answer ~ quality of data

Types of variables
  Categorical: nominal (name)
  Ordinal: (name and order)
  Measurement: interval and ratio
    Interval (name, order, and unit of measure)
    Ratio (name, order, unit, and true zero point)

Summarizing Data
  Frequency Distributions
    For measurement variables
    For categorical variables

Frequency Distribution
For Categorical variables (table)

  Variable Value   Frequency   Proportion   (Percent)
  Dem              17          .425         42.5%
  Rep               7          .175         17.5
  Ind              16          .400         40.0
                   n=40        1.000        100.0%
                   f           f/n          f/n * 100

Frequency Distribution
For Measurement variables (table)

  Variable Value   Frequency   Proportion   (Percent)
  0                19          .44          44
  1                10          .23          23
  …
  7                 2          .05          5
                   n=43        1.00         100

Frequency Distribution
For Categorical variable (bar chart)
  [Bar chart: frequency (axis 0 to 20) of Pol. Pref. for Dem, Rep, Ind]

Frequency Distribution
For Measurement variable (histogram)
  [Histogram: frequency (axis 0 to 20) of Quiz Score for values 0 through 7]

Frequency Distribution
  Frequency, f - count the number of cases that have the same value of a variable
  Total cases, n - count all the cases
  Proportion, p = f/n
  Percentage, % = 100 * p

Dataset
  Individual cases
    e.g. stats.dta dataset: characteristics of individuals: age, msat, gender
  Aggregate data (groups of individual cases)
    e.g. college1.dta dataset: characteristics of individuals: age, msat, gender averaged for groupings of students by college

Populations and Samples
  Population: all the relevant cases. The entire set.
  Sample: some portion of the population
    haphazard (e.g., whomever you meet)
    systematic (e.g., every 10th person)
    representative (all possibilities included)
    random (every case in the population has a fixed probability of being included in the sample, often equal probability)

Populations
  Best information
  Expensive or impossible to observe

Samples
  Easier to get
  Less expensive
  Less accurate - but, can be very accurate depending upon type of sample

Hints for good grade
  Do all assigned exercises and turn them in on time
  Do all other exercises for yourself to be sure you understand
  Read and study the text, before and after lectures -- several times
  Statistics is a language -- learn (memorize) the vocabulary and concepts
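
Worked example (a sketch, not part of the original slides)
  The frequency-distribution quantities defined earlier (f, n, p = f/n, % = 100 * p) and two of the sample types (systematic, random) can be tried out in a few lines of Python. The counts 17 Dem, 7 Rep, 16 Ind come from the categorical table; everything else is an assumption made for illustration: the function name frequency_distribution, the variable names, the choice to treat those 40 cases as a stand-in population, the sample size of 10, and the fixed seed.

```python
import random
from collections import Counter


def frequency_distribution(values):
    """Return {value: (f, p, pct)} with p = f / n and pct = 100 * f / n."""
    n = len(values)              # total cases, n
    counts = Counter(values)     # frequency f for each distinct value
    return {v: (f, f / n, 100 * f / n) for v, f in counts.items()}


# Cases rebuilt from the political-preference table: 17 Dem, 7 Rep, 16 Ind.
# Treating these 40 cases as the "population" is an assumption of this sketch.
cases = ["Dem"] * 17 + ["Rep"] * 7 + ["Ind"] * 16

table = frequency_distribution(cases)
for value, (f, p, pct) in table.items():
    print(f"{value:<4} f={f:>2}  p={p:.3f}  {pct:.1f}%")

# Systematic sample: every 10th case (the 10th, 20th, 30th, and 40th).
systematic = cases[9::10]

# Simple random sample: every case has the same chance of being included.
rng = random.Random(0)           # fixed seed only so the sketch is repeatable
simple_random = rng.sample(cases, k=10)

print("systematic:", systematic)
print("random:    ", simple_random)
```

  Because the hypothetical list is sorted by group, the systematic sample here depends entirely on that ordering; with real data the result would depend on how the cases happen to be listed.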