Stat303 - EXAM 1 MATERIAL
Exam Date: Feb. 4, 5 or 6

Week 1: Data summary (single variables) IPS - Ch 1
I) Variables - Characteristics of a thing/person
   A) Variable types
      1) Numeric
         a) Discrete
         b) Continuous
      2) Categorical
II) Distribution of a variable - which values occurred and how often each value occurred
III) Numeric variables
   A) Describing distributions
      1) Measures of location
         a) Mean, x̄ - also a measure of center
         b) Median, x̃ (or Q2) - also a measure of center
         c) Q1
         d) Q3
         e) Minimum
         f) Maximum
      2) Measures of spread
         a) Range
         b) Interquartile Range (IQR) = Q3 - Q1
         c) Standard deviation, s
      3) Tables/Graphs
         a) Frequency tables: frequency, relative frequency
         b) Histograms: symmetric, skewed right, skewed left; outliers; unimodal, bimodal
         c) Stem and leaf plot
         d) Box plots - use the 5 number summary = min, Q1, x̃, Q3, max: symmetric, skewed right, skewed left; outliers
         e) Normal quantile plot
   B) Properties of distributions
      1) Skewed data
         a) Skewed right: mean > median (x̄ > x̃)
         b) Skewed left: mean < median (x̄ < x̃)
         c) Use median and IQR to measure center and spread
      2) (Roughly) symmetric data
         a) Mean and median are about equal
         b) Use mean and SD to measure center and spread
         c) Empirical Rule: if the distribution is approximately normal, about 68% of values lie within 1 SD of the mean, about 95% within 2 SD, and about 99.7% within 3 SD
IV) Categorical variables
   A) Describing distributions
      1) Tables/Graphs
         a) Frequency tables
         b) Bar chart
         c) Pie chart
      2) Measures of location
         a) Proportions
         b) Mode

Week 2: Data summary (relationships between 2 variables) IPS - Ch 2
I) If one variable is thought to affect the other
   A) Explanatory variable
   B) Response variable
II) Categorical vs. Numeric
   A) Categorical is the explanatory variable
      1) Compare the numeric variable across the levels of the categorical variable
   B) Graphs
      1) Side by side box plot
         a) All levels of the explanatory variable on the X axis
         b) Box plot of each group in the graph area
   C) Measures
      1) Summary statistics by group
         a) Mean
         b) Standard deviation
III) Categorical vs. Categorical
   A) Either variable may be explanatory (depends on the situation)
      1) Compare percentages in each level of one categorical variable across the levels of the other categorical variable
   B) Table
      1) Two way table
         a) All levels of one variable on one axis
         b) All levels of the other variable on the other axis
      2) What to look for
         a) Associations
      3) Properties of associations
         a) Association does not imply causation
IV) Numeric vs. Numeric
   A) Either variable may be explanatory (depends on the situation)
      1) Compare values of one numeric variable over possible values of the other numeric variable
   B) Graphs
      1) Scatter plot
         a) Explanatory variable on the X axis
         b) Response variable on the Y axis
   C) Measures
      1) Slope, b1
         a) Positive relationship
         b) Negative relationship
      2) Intercept, b0
      3) Correlation, r - measures the strength and direction of a linear relationship
         a) How to calculate
         b) Properties: it doesn't matter which variable is x and which is y; r is between -1 and 1 (positive vs. negative); close to 0 vs. far from 0
         c) Sensitive to outliers and influential points
         d) Does not imply causation
      4) Difference between slope and correlation
         a) Strength of a linear relationship (r) vs. rise/run (slope)
         b) Effect of scale changes

Week 3: Collecting Data (sampling, experimental design) IPS - Ch 3
I) Basic idea of inferential statistics
   A) Ask a question about a population
      1) Questions that do not compare - define one population
      2) Questions that compare - define more than one population
   B) Collect a sample
      1) Questions that do not compare - collect one sample
      2) Questions that compare - collect more than one sample
   C) Summarize the data in the sample
   D) Analyze - make inferences about the population using sample summaries (later)
II) Questions that do not compare populations
   A) Concerns
      1) Samples should be good representations of populations
         a) Unbiased
         b) Members should be independent of one another
      2) Samples should be large enough
   B) (Some) good sampling techniques
      1) Simple Random Sample (SRS)
      2) Stratified Random Sample
      3) Cluster Sample
   C) (Some) ways that bias can enter a study
      1) Voluntary response sampling
      2) Convenience sampling
      3) Dishonesty of the respondent
III) Questions that compare populations
   A) Vocabulary
      1) Explanatory variable
      2) Response variable
   B) Concerns
      1) Samples should be good representations of their populations
      2) Samples should be as alike as possible except for the explanatory variable
         a) Confounding variables
      3) Conclusions should not overgeneralize
         a) Interacting variables
      4) The types of conclusions that can be drawn depend on the sampling technique
   C) Good sampling techniques
      1) Randomized experiments - create members of populations by subjecting units to different treatments
         a) How to run a good randomized experiment: use a good sampling technique from II, B; randomly assign members of the sample to treatment groups
         b) Types of conclusions that can be drawn: if done carefully, can infer causation
         c) Why we can't always run randomized experiments: sometimes difficult, sometimes unethical, sometimes impossible
      2) Observational studies - select samples from pre-existing populations
         a) How to run a good observational study: try to account for any potential confounding variables
         b) Types of conclusions that can be drawn: can never say the explanatory variable caused the response; can only describe populations as they exist

YOU SHOULD KNOW FOR EXAMS/QUIZZES:

Graphs/Tables
1. Frequency tables - Read what percent is in a given category; know when they are appropriate
2. Stem and leaf plots - How to read one (detect skewness, modality, outliers; find the median); know when they are appropriate
3. Histograms - How to read one (detect skewness, modality, outliers; find the median); know when they are appropriate
4. Box plots - How to read one (detect skewness, outliers; find min, Q1, median, Q3, and max); know when they are appropriate
5. Normal quantile plots - Detect whether a distribution is approximately normal
6. Bar graphs - Read what percent is in a given category; know when they are appropriate
7. Pie graphs - Read what percent is in a given category; know when they are appropriate
8. Scatter plots - How to read one (detect direction, strength); know when they are appropriate
9. Two-way tables - Read what percent is in a given category or combination of categories; know when they are appropriate

How to Calculate (by computer or by hand):
1. Measures of center (mean, median)
2. Other measures of location (5 number summary, quartiles, min, max, percentages)
3. Measures of spread (range, IQR, standard deviation)
4. Percent of values between given limits, above one limit, or below one limit for an approximately normal distribution

Facts:
1. The advantages/disadvantages of histograms vs. stem and leaf plots vs. box plots
2. When the mean is less than the median, when it is greater than the median, and when mean = median
3. When the mean and standard deviation are appropriate measures of center and spread
4. When the median and IQR are appropriate measures of center and spread
5. Characteristics of the standard deviation
6. Empirical Rule for approximately normal distributions
7. What correlation measures and what it doesn't imply
8. Properties of correlation
9. How to interpret correlation
10. The difference between correlation and slope
11. How adding an outlier will affect the correlation and slope
12. Sampling concerns when answering questions that do not compare
13. Good sampling techniques for questions that do not compare
14. How we take an SRS
15. Concerns when comparing populations
16. How we deal with confounding in studies that compare populations
17. Conclusions we can draw if we do careful experiments
18. Conclusions we can draw if we do careful observational studies

How to Identify:
1. Population(s) in a study or study question
2. Sample(s) in a study
3. Explanatory and response variables in a study or study question
4. Response variable type (continuous, discrete, categorical)
5. Potential sources of bias in a study
6. Sampling method used in a study
7. Whether an experiment or observational study was used to answer a question
8. Confounding variables in a study
9. Interacting variables in a study

The Definition of:
Variable, Categorical variable, Quantitative variable, Distribution, Histogram, Stem and leaf plot, Box plot, Normal quantile plot, Bar chart, Pie chart, Frequency, Relative frequency, Symmetric, Skewed right, Skewed left, Unimodal, Bimodal, Outlier,
Mean, Median, Mode, First Quartile (Q1), Third Quartile (Q3), Five number summary, Standard deviation, Range, Interquartile Range (IQR), Empirical rule,
Scatter plot, Slope, Intercept, Linear, Negative/positive relationship, Correlation, Side by side box plot, Two way table, Association,
Population, Sample, Bias, SRS, Cluster sample, Stratified random sample, Convenience sample, Voluntary response sample, Explanatory variable, Response variable, Confounding variable, Interacting variable, Randomized experiment, Observational study
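The Week 1 measures of location and spread (Week 1, III A) can be computed by hand or by computer. The Python sketch below shows one way to do it. The quartile rule is an assumption: Q1 and Q3 are taken as the medians of the lower and upper halves, leaving out the overall median when n is odd. Textbooks and software packages do not all use the same rule. The data values are made up for illustration.

```python
import math


def median(values):
    """Middle value of the sorted data, or the mean of the two middle values."""
    s = sorted(values)
    n = len(s)
    mid = n // 2
    if n % 2 == 1:
        return s[mid]
    return (s[mid - 1] + s[mid]) / 2


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max).

    Assumed quartile rule: Q1 and Q3 are the medians of the lower and
    upper halves of the sorted data, excluding the overall median when n is odd.
    """
    s = sorted(values)
    n = len(s)
    lower = s[: n // 2]
    upper = s[(n + 1) // 2:]
    return s[0], median(lower), median(s), median(upper), s[-1]


def sample_sd(values):
    """Sample standard deviation s: square root of (sum of squared deviations / (n - 1))."""
    n = len(values)
    xbar = sum(values) / n
    return math.sqrt(sum((x - xbar) ** 2 for x in values) / (n - 1))


data = [2, 4, 4, 5, 7, 9, 10, 12, 30]  # hypothetical data
mn, q1, med, q3, mx = five_number_summary(data)
print("mean:", round(sum(data) / len(data), 2))
print("five number summary:", (mn, q1, med, q3, mx))
print("range:", mx - mn, " IQR:", q3 - q1)
print("s:", round(sample_sd(data), 2))
```

With this data, the mean (83/9, about 9.22) is larger than the median (7) because the single large value 30 pulls the mean up. This is the skewed-right pattern from Week 1, III B. The median and IQR (11 - 4 = 7) would not change if 30 were replaced by 13, which is why they are preferred for skewed data.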
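A stem and leaf plot keeps every data value while still showing the shape of the distribution. The minimal sketch below assumes non-negative whole numbers, using the tens part as the stem and the ones digit as the leaf. Other stem choices, such as split stems, are also common. The data are hypothetical.

```python
def stem_and_leaf(values):
    """Map each stem (tens part) to its sorted leaves (ones digits).

    Assumes non-negative integers. Stems with no leaves are kept so that
    gaps in the data stay visible, as they would on a hand-drawn plot.
    """
    stems = {}
    for x in sorted(values):
        stems.setdefault(x // 10, []).append(x % 10)
    return {s: stems.get(s, []) for s in range(min(stems), max(stems) + 1)}


def print_stem_and_leaf(values):
    """Print one row per stem in the form 'stem | leaves'."""
    for stem, leaves in stem_and_leaf(values).items():
        print(f"{stem:>3} | {' '.join(str(leaf) for leaf in leaves)}")


print_stem_and_leaf([12, 15, 21, 23, 23, 38, 41, 67])  # hypothetical data
```

The empty 5 stem makes the gap before 67 easy to see, which is how a stem and leaf plot helps flag a possible outlier. With 8 values, the median is the average of the 4th and 5th leaves read in order (23 and 23).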
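The Empirical Rule percentages are rounded areas under a normal curve. The sketch below uses Python's standard-library `statistics.NormalDist` (Python 3.8+) for two purposes. It shows where 68%, 95%, and 99.7% come from, and it finds the percent of values between, above, or below given limits for an approximately normal distribution. The mean of 70 and SD of 10 are hypothetical.

```python
from statistics import NormalDist


def percent_below(limit, mean, sd):
    """Percent of a normal(mean, sd) distribution below the limit."""
    return 100 * NormalDist(mu=mean, sigma=sd).cdf(limit)


def percent_above(limit, mean, sd):
    """Percent of a normal(mean, sd) distribution above the limit."""
    return 100 - percent_below(limit, mean, sd)


def percent_between(low, high, mean, sd):
    """Percent of a normal(mean, sd) distribution between low and high."""
    return percent_below(high, mean, sd) - percent_below(low, mean, sd)


# Exact normal-curve areas behind the Empirical Rule: about 68.27%, 95.45%, 99.73%
for k in (1, 2, 3):
    print(f"within {k} SD of the mean: {percent_between(-k, k, 0, 1):.2f}%")

# Hypothetical scores, roughly normal with mean 70 and SD 10
print(f"between 60 and 80: {percent_between(60, 80, 70, 10):.1f}%")
print(f"above 90: {percent_above(90, 70, 10):.1f}%")
```

Using the Empirical Rule alone, the share of scores above 90 (2 SD above the mean) would be estimated as (100 - 95) / 2 = 2.5%. The exact normal area is about 2.3%.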
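A normal quantile plot pairs each sorted data value with the value a standard normal distribution would be expected to have at the same position. If the points fall close to a straight line, the distribution is approximately normal. The sketch below computes those points with the standard-library `NormalDist.inv_cdf`. It assumes the plotting position (i - 0.5)/n, which is one common convention; statistical software varies. The data are hypothetical.

```python
from statistics import NormalDist


def normal_quantile_points(values):
    """Return (normal score, data value) pairs for a normal quantile plot.

    The i-th smallest of n values is paired with the standard normal
    quantile at probability (i - 0.5) / n (an assumed plotting position).
    """
    z = NormalDist()  # standard normal: mean 0, SD 1
    s = sorted(values)
    n = len(s)
    return [(z.inv_cdf((i - 0.5) / n), x) for i, x in enumerate(s, start=1)]


for score, value in normal_quantile_points([61, 64, 66, 68, 70, 71, 74, 79]):  # hypothetical
    print(f"{score:6.2f}  {value}")
```

Plotting the data values against the normal scores gives the quantile plot. A roughly straight pattern suggests approximate normality, while a curved pattern suggests skewness.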
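For a categorical variable (Week 1, IV), the frequency table, the proportions, and the mode all come straight from counts. The sketch below uses `collections.Counter` from the Python standard library on hypothetical survey responses.

```python
from collections import Counter


def frequency_table(values):
    """Return {category: (frequency, relative frequency)}."""
    counts = Counter(values)
    n = len(values)
    return {cat: (freq, freq / n) for cat, freq in counts.items()}


def mode(values):
    """Most frequent category (Counter breaks ties by first occurrence)."""
    return Counter(values).most_common(1)[0][0]


answers = ["A", "B", "A", "C", "B", "A", "A", "C", "B", "A"]  # hypothetical responses
for cat, (freq, rel) in sorted(frequency_table(answers).items()):
    print(f"{cat}: frequency {freq}, relative frequency {rel:.0%}")
print("mode:", mode(answers))
```

The relative frequencies are the proportions a bar chart or pie chart would display, and they always add up to 100%.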
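Reading a two-way table (Week 2, III) comes down to choosing the right denominator. A cell percent divides by the grand total. A row percent divides by that row's total, and comparing row percents across rows is how you look for an association. The sketch below is a minimal illustration; the group names and counts are hypothetical.

```python
# Hypothetical counts: rows = group, columns = response category
table = {
    "Group 1": {"Yes": 30, "No": 20},
    "Group 2": {"Yes": 45, "No": 105},
}


def grand_total(table):
    """Total count over every cell of the table."""
    return sum(sum(row.values()) for row in table.values())


def cell_percent(table, row, col):
    """Percent of all individuals in one combination of categories."""
    return 100 * table[row][col] / grand_total(table)


def row_percentages(table):
    """Within each row, the percent falling in each column category."""
    return {
        row: {col: 100 * count / sum(counts.values()) for col, count in counts.items()}
        for row, counts in table.items()
    }


print(f"Group 1 and Yes: {cell_percent(table, 'Group 1', 'Yes'):.1f}% of everyone")
for row, pcts in row_percentages(table).items():
    print(row, {col: f"{p:.0f}%" for col, p in pcts.items()})
```

In this table, 60% of Group 1 answered Yes but only 30% of Group 2 did, so the two variables are associated. As Week 2 notes, that association alone does not show causation.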
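The correlation and slope facts from Week 2, IV C can be checked numerically. The sketch below computes r as the average product of standardized values, r = (1/(n - 1)) Σ z_x z_y. It then computes the least-squares slope b1 = r · s_y / s_x and intercept b0 = ȳ - b1 · x̄. Finally, it shows what happens after swapping x and y, after changing units, and after adding one outlier. The height and weight values and the outlier are hypothetical.

```python
import math


def mean(v):
    return sum(v) / len(v)


def sd(v):
    """Sample standard deviation (divides by n - 1)."""
    m = mean(v)
    return math.sqrt(sum((x - m) ** 2 for x in v) / (len(v) - 1))


def correlation(x, y):
    """r = (1 / (n - 1)) * sum of (standardized x) * (standardized y)."""
    mx, my, sx, sy = mean(x), mean(y), sd(x), sd(y)
    return sum((a - mx) / sx * (b - my) / sy for a, b in zip(x, y)) / (len(x) - 1)


def least_squares_line(x, y):
    """Return (b0, b1): slope b1 = r * sy / sx, intercept b0 = ybar - b1 * xbar."""
    b1 = correlation(x, y) * sd(y) / sd(x)
    return mean(y) - b1 * mean(x), b1


# Hypothetical data: height (inches) as explanatory, weight (pounds) as response
height_in = [60, 62, 65, 68, 70, 72]
weight_lb = [115, 120, 140, 150, 165, 170]
height_cm = [2.54 * h for h in height_in]  # same heights on a different scale

r = correlation(height_in, weight_lb)
print("r:", round(r, 3))
print("r with x and y swapped:", round(correlation(weight_lb, height_in), 3))
print("r with height in cm:", round(correlation(height_cm, weight_lb), 3))
print("slope (lb per inch):", round(least_squares_line(height_in, weight_lb)[1], 2))
print("slope (lb per cm):", round(least_squares_line(height_cm, weight_lb)[1], 2))

# One hypothetical outlier far above the linear pattern
with_outlier_x = height_in + [62]
with_outlier_y = weight_lb + [250]
print("r with outlier:", round(correlation(with_outlier_x, with_outlier_y), 3))
print("slope with outlier:", round(least_squares_line(with_outlier_x, with_outlier_y)[1], 2))
```

r stays the same when x and y are swapped or when height is converted to centimeters. The slope, by contrast, changes with the units, since it is rise/run in pounds per inch vs. pounds per cm. The single added point drags both r and the slope down sharply, which is why both are described as sensitive to outliers and influential points.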
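The sampling techniques from Week 3, II B and the random assignment step of a randomized experiment (Week 3, III C) can be simulated with Python's standard `random` module. This is an illustrative sketch only. The student population, the class-year strata, and the section clusters are hypothetical, and the fixed seed is there only so the output is reproducible.

```python
import random


def simple_random_sample(population, n, rng):
    """SRS: every group of n members is equally likely to be the sample."""
    return rng.sample(population, n)


def stratified_random_sample(strata, n_per_stratum, rng):
    """Take a separate SRS from each stratum and combine them."""
    sample = []
    for members in strata.values():
        sample.extend(rng.sample(members, n_per_stratum))
    return sample


def cluster_sample(clusters, n_clusters, rng):
    """Pick whole clusters at random (an SRS of clusters) and include every member."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [member for name in chosen for member in clusters[name]]


def randomly_assign(units, treatments, rng):
    """Shuffle the sample, then deal units out to the treatment groups in turn."""
    shuffled = list(units)
    rng.shuffle(shuffled)
    return {t: shuffled[i::len(treatments)] for i, t in enumerate(treatments)}


rng = random.Random(303)  # fixed seed only so the example is reproducible
students = [f"student{i}" for i in range(1, 101)]  # hypothetical population
print(simple_random_sample(students, 5, rng))

strata = {"freshman": students[:50], "senior": students[50:]}  # hypothetical strata
print(stratified_random_sample(strata, 3, rng))

clusters = {f"section{k}": students[10 * k: 10 * (k + 1)] for k in range(10)}  # hypothetical clusters
print(cluster_sample(clusters, 2, rng))

print(randomly_assign(simple_random_sample(students, 8, rng), ["treatment", "control"], rng))
```

Random assignment is what lets a careful experiment support a causal conclusion. On average it balances potential confounding variables across the treatment groups, which an observational study cannot guarantee.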