Stat303 - EXAM 1 MATERIAL
Exam Date: Feb. 4, 5 or 6
Week 1: Data summary (single variables)
IPS - Ch 1
I) Variables - Characteristics of a thing/person
A) Variable Types
1) Numeric
a) Discrete
b) Continuous
2) Categorical
II) Distribution of a variable - which values occurred and how often each value occurred
III) Numeric Variables
A) Describing distributions
1) Measures of location
a) Mean, x̄ - also a measure of center
b) Median, x̃ (or Q2) - also a measure of center
c) Q1
d) Q3
e) Minimum
f) Maximum
2) Measures of spread
a) Range
b) Interquartile Range (IQR) = Q3 - Q1
c) Standard Deviation, s
3) Tables/Graphs
a) Frequency Tables
frequency
relative frequency
b) Histograms
Symmetric
Skewed right
Skewed left
Outliers
Unimodal
Bimodal
c) Stem and leaf plot
d) Box plots - use 5 number summary = min, Q1, median x̃, Q3, max
Symmetric
Skewed right
Skewed left
Outliers
e) Normal Quantile Plot
B) Properties of distributions
1) Skewed data
a) Skewed right - x̄ > x̃ (mean greater than median)
b) Skewed left - x̄ < x̃ (mean less than median)
c) Use Median and IQR to measure center and spread
2) (roughly) Symmetric Data
a) Mean ≈ median
b) Use mean and SD to measure center and spread
c) Empirical Rule : If distribution approximately normal
~68% within 1 SD of mean
~95% within 2 SD of mean
~99.7% within 3 SD of mean
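The location and spread measures listed in III can be computed as in the following minimal Python sketch. It is an illustration, not part of the course material: the sample values are made up, and the quartiles use the median-of-halves rule (Q1 and Q3 are the medians of the values below and above the median position), which is an assumption here. Statistical software may use a slightly different quartile rule and report slightly different Q1 and Q3.

```python
import statistics


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the median-of-halves rule.

    Needs at least two values, since each half must be non-empty.
    """
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]           # values below the median position
    upper = data[half + n % 2:]   # values above the median position
    return (
        data[0],
        statistics.median(lower),
        statistics.median(data),
        statistics.median(upper),
        data[-1],
    )


# Hypothetical sample with one large value (made up for illustration).
sample = [2, 3, 3, 4, 5, 6, 7, 9, 15]

minimum, q1, median, q3, maximum = five_number_summary(sample)
mean = statistics.mean(sample)
s = statistics.stdev(sample)      # sample standard deviation (divides by n - 1)
data_range = maximum - minimum
iqr = q3 - q1

print("five number summary:", (minimum, q1, median, q3, maximum))
print("mean:", mean, "SD:", round(s, 3))
print("range:", data_range, "IQR:", iqr)

# A mean above the median is consistent with a right-skewed distribution.
if mean > median:
    print("mean > median: consistent with right skew")
```

In this made-up sample the single large value pulls the mean above the median, which is why the median and IQR are the safer summary for skewed data.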
IV) Categorical Variables
A) Describing distributions
1) Tables/Graphs
a) Frequency Tables
b) Bar chart
c) Pie chart
2) Measures of location
a) Proportions
b) Mode
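For a categorical variable, the frequency table, proportions and mode listed in IV can be produced with a short Python sketch like this one. The data are hypothetical and the variable names are chosen only for illustration.

```python
from collections import Counter

# Hypothetical categorical data (made up for illustration).
eye_colors = ["brown", "blue", "brown", "green", "brown", "blue", "hazel", "brown"]

counts = Counter(eye_colors)      # frequency of each category
n = len(eye_colors)

# Relative frequency (proportion) of each category.
proportions = {category: k / n for category, k in counts.items()}

# The mode is the category that occurs most often.
mode, mode_count = counts.most_common(1)[0]

print(f"{'category':<10}{'freq':>6}{'rel freq':>10}")
for category, k in counts.most_common():
    print(f"{category:<10}{k:>6}{k / n:>10.3f}")
print("mode:", mode)
```

The relative frequencies are what a bar chart or pie chart would display, and they always add to 1.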
Week 2: Data summary (relationships between 2 variables)
IPS - Ch 2
I) If one variable is thought to affect the other
A) Explanatory variable
B) Response variable
II) Categorical vs. Numeric
A) Categorical is the explanatory variable
1) Compare numeric variable across the levels of the categorical
variable
B) Graphs
1) Side by side box plot
a) All levels of explanatory on X axis
b) Box plot of each group in graph area
C) Measures
1) Summary statistics by group
a) Mean
b) Standard deviation
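Comparing a numeric variable across the levels of a categorical variable amounts to computing the same summary statistics separately for each group. A minimal Python sketch, with made-up group labels and values:

```python
import statistics
from collections import defaultdict

# Hypothetical (explanatory level, numeric response) observations.
observations = [
    ("A", 10), ("A", 12), ("A", 14),
    ("B", 20), ("B", 22), ("B", 27),
]

# Collect the numeric values for each level of the categorical variable.
groups = defaultdict(list)
for level, value in observations:
    groups[level].append(value)

# Mean and sample standard deviation by group.
summary = {
    level: (statistics.mean(values), statistics.stdev(values))
    for level, values in groups.items()
}

for level, (group_mean, group_sd) in summary.items():
    print(f"group {level}: mean = {group_mean}, SD = {group_sd:.3f}")
```

These per-group numbers are the numeric counterpart of a side by side box plot.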
III) Categorical vs. Categorical
A) Either variable may be explanatory (depends on situation)
1) Compare percentages in each level of one categorical variable
across the levels of the other categorical variable
B) Table
1) Two Way table
a) All levels of one variable on one axis
b) All levels of the other variable on the other axis
2) What to look for
a) Associations
3) Properties of associations
a) Does not imply causation
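A two-way table and the percentages compared across its rows can be sketched in Python as follows. The counts and category names are hypothetical, and treating the row variable as explanatory is an assumption made for this example.

```python
from collections import Counter

# Hypothetical (row level, column level) pairs, one per individual.
pairs = (
    [("first-year", "yes")] * 2 + [("first-year", "no")] * 6
    + [("senior", "yes")] * 5 + [("senior", "no")] * 3
)

table = Counter(pairs)                       # cell counts of the two-way table
rows = sorted({row for row, _ in pairs})
cols = sorted({col for _, col in pairs})

# Row totals, then the percent in each column level within each row.
row_totals = {r: sum(table[(r, c)] for c in cols) for r in rows}
row_percents = {
    r: {c: 100 * table[(r, c)] / row_totals[r] for c in cols}
    for r in rows
}

for r in rows:
    cells = ", ".join(f"{c}: {row_percents[r][c]:.1f}%" for c in cols)
    print(f"{r:<12}{cells}")
```

If the percentages differ noticeably from row to row, the two variables are associated, although that alone says nothing about causation.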
IV) Numeric vs. Numeric
A) Either variable may be explanatory (depends on situation)
1) Compare values of one numeric variable over possible values of
the other numeric variable
B) Graphs
1) Scatter plot
a) Explanatory on X axis
b) Response on Y axis
C) Measures
1) Slope, b1
a) Positive relationship
b) Negative relationship
2) Intercept, b0
3) Correlation, r - Measures the strength of a linear relationship
a) How to calculate
b) Properties
Doesn't matter which is x and which is y
Between -1 and 1 (Positive vs. negative)
Close to 0 vs. far from 0
c) Sensitive to outliers, influential points
d) Does not imply causation
4) Difference between slope and correlation
a) Strong linear relationship vs. rise/run
b) Effect of scale changes
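The difference between correlation and slope is easiest to see numerically. The Python sketch below assumes the slope b1 and intercept b0 refer to the least-squares line (the outline does not name the fitting method), uses made-up data, and computes r as the average product of standardized x and y values.

```python
import statistics


def correlation(x, y):
    """Correlation r: sum of products of standardized values over n - 1."""
    n = len(x)
    mx, my = statistics.mean(x), statistics.mean(y)
    sx, sy = statistics.stdev(x), statistics.stdev(y)
    return sum(((a - mx) / sx) * ((b - my) / sy) for a, b in zip(x, y)) / (n - 1)


def least_squares(x, y):
    """Return (b0, b1) for the least-squares line y = b0 + b1 * x."""
    b1 = correlation(x, y) * statistics.stdev(y) / statistics.stdev(x)
    b0 = statistics.mean(y) - b1 * statistics.mean(x)
    return b0, b1


# Hypothetical data (made up for illustration).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r = correlation(x, y)
b0, b1 = least_squares(x, y)
print(f"r = {r:.4f}, b0 = {b0:.2f}, b1 = {b1:.2f}")

# Swapping x and y leaves r unchanged.
r_swapped = correlation(y, x)

# Changing the units of y (here multiplying by 100) leaves r unchanged
# but multiplies the slope by the same factor.
y_scaled = [100 * value for value in y]
r_scaled = correlation(x, y_scaled)
_, b1_scaled = least_squares(x, y_scaled)
print(f"after rescaling y: r = {r_scaled:.4f}, b1 = {b1_scaled:.2f}")
```

The correlation describes how tightly the points follow a line and has no units, while the slope is rise over run and depends on the units of both variables.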
Week 3: Collecting Data (sampling, experimental design)
IPS - Ch 3
I) Basic idea of inferential statistics
A) Ask question about a population
1) Questions that do not compare - define one population
2) Questions that compare - define more than one population
B) Collect a sample
1) Questions that do not compare - collect one sample
2) Questions that compare - collect more than one sample
C) Summarize data in sample
D) Analyze- Make inference about population using sample summaries (later)
II) Questions that do not compare populations
A) Concerns
1) Samples should be good representations of populations
a) unbiased
b) members should be independent of one another
2) Samples should be large enough
B) (Some) good sampling techniques
1) Simple Random Sample (SRS)
2) Stratified Random Sample
3) Cluster Sample
C) (Some) ways that bias can enter study
1) Voluntary response sampling
2) Convenience sampling
3) Dishonesty of the respondent
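The three sampling techniques in II, B can be sketched with Python's random module. Everything here is hypothetical: the population list, the strata, the cluster size and the seed are made up for illustration, and the fixed seed is used only so the sketch is repeatable.

```python
import random

rng = random.Random(303)   # fixed seed (arbitrary) so results are repeatable

# Hypothetical population of 100 labeled members.
population = [f"student_{i:03d}" for i in range(1, 101)]

# Simple random sample: every set of 10 members is equally likely.
srs = rng.sample(population, 10)

# Stratified random sample: split into strata, then take an SRS in each.
strata = {"first-year": population[:60], "senior": population[60:]}
stratified = {name: rng.sample(members, 5) for name, members in strata.items()}

# Cluster sample: split into clusters, choose clusters at random,
# and include every member of each chosen cluster.
clusters = [population[i:i + 10] for i in range(0, len(population), 10)]
chosen_clusters = rng.sample(clusters, 2)
cluster_sample = [member for cluster in chosen_clusters for member in cluster]

print("SRS:", srs)
print("stratified:", stratified)
print("cluster sample size:", len(cluster_sample))
```

In each case chance, not the researcher or the respondent, decides who is included, which is what protects against the biases listed in II, C.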
III) Questions that compare populations
A) Vocabulary
1) Explanatory variable
2) Response variable
B) Concerns
1) Samples should be good representations of their populations
2) Samples should be as alike as possible except for the explanatory variable
a) Confounding variables
3) Questions should not overgeneralize
a) Interacting variables
4) Types of conclusions that can be drawn depend on sampling
technique
C) Good sampling techniques
1) Randomized experiments - create members of populations by
subjecting units to different treatments
a) How to run a good randomized experiment
Use good sampling technique from II, B
Randomly assign members of sample to treatment groups
b) Types of conclusions that can be drawn
If done carefully, can infer causation
c) Why we can't always run randomized experiments
sometimes difficult
sometimes unethical
sometimes impossible
2) Observational Studies - select samples from pre-existing populations
a) How to run a good observational study
Try to account for any potential confounding variables
b) Types of conclusions that can be drawn
Can never say explanatory caused response
Can only describe populations as they exist
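Random assignment in a randomized experiment can be sketched as shuffling the sampled units and splitting them into groups. This Python example is a minimal illustration with hypothetical unit labels, two equal-sized groups, and an arbitrary fixed seed for repeatability.

```python
import random

rng = random.Random(2024)   # fixed seed (arbitrary) so results are repeatable

# Hypothetical sample of 12 experimental units.
units = [f"unit_{i}" for i in range(1, 13)]

# Shuffle a copy so chance alone decides which group each unit joins.
shuffled = units[:]
rng.shuffle(shuffled)

half = len(shuffled) // 2
treatment = shuffled[:half]   # units that receive the treatment
control = shuffled[half:]     # units that do not

print("treatment:", sorted(treatment))
print("control:  ", sorted(control))
```

Because assignment ignores every characteristic of the units, potential confounding variables tend to balance out between the groups, which is what allows a careful experiment to support a causal conclusion.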
YOU SHOULD KNOW FOR EXAMS/QUIZZES:
Graphs/Tables
1. Frequency Tables - Read what percent in a given category, know when it's appropriate
2. Stem and Leaf Plots - How to read one (detect skewness, modality, outliers, find median),
know when it's appropriate
3. Histograms - How to read one (detect skewness, modality, outliers, find median), know
when it's appropriate
4. Box plots - How to read one (detect skewness, outliers, find min, Q1, med, Q3, and max),
know when it's appropriate
5. Normal Quantile plots - Detect if distribution is approximately normal
6. Bar graphs - Read what percent in a given category, know when it's appropriate
7. Pie graphs - Read what percent in a given category, know when it's appropriate
8. Scatter Plots - How to read one (detect direction, strength), know when it's appropriate
9. Two-way tables - Read what percent in a given category or combination of categories,
know when it's appropriate
How to Calculate (by computer or by hand):
1. Measures of center (Mean, Median)
2. Other measures of location (5 number summary, quartiles, min, max, percentages)
3. Measures of spread (Range, IQR, Standard deviation)
4. Percent of values between given limits, above one limit, below one limit for
approximately normal shaped distribution
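For an approximately normal distribution, the percent of values between, above or below given limits can be computed with `statistics.NormalDist` (Python 3.8 or later). The mean of 100 and SD of 15 below are made-up values, and the helper name `percent_between` is introduced only for this sketch.

```python
from statistics import NormalDist

# Hypothetical normal model: mean 100, standard deviation 15.
dist = NormalDist(mu=100, sigma=15)


def percent_between(model, low, high):
    """Percent of the distribution lying between low and high."""
    return 100 * (model.cdf(high) - model.cdf(low))


within_1_sd = percent_between(dist, 85, 115)    # mean +/- 1 SD
within_2_sd = percent_between(dist, 70, 130)    # mean +/- 2 SD
within_3_sd = percent_between(dist, 55, 145)    # mean +/- 3 SD

percent_below_85 = 100 * dist.cdf(85)           # below one limit
percent_above_130 = 100 * (1 - dist.cdf(130))   # above one limit

print(f"within 1 SD: {within_1_sd:.2f}%")
print(f"within 2 SD: {within_2_sd:.2f}%")
print(f"within 3 SD: {within_3_sd:.2f}%")
print(f"below 85: {percent_below_85:.2f}%, above 130: {percent_above_130:.2f}%")
```

The first three results reproduce the Empirical Rule figures of roughly 68%, 95% and 99.7%.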
Facts:
1. The advantages/disadvantages of Histograms vs. Stem and Leaf plots vs. Box plots
2. When mean is less than median, when mean is greater than median, when mean =
median
3. When mean and standard deviation are appropriate measures of center and spread
4. When median and IQR are appropriate measures of center and spread
5. Characteristics of the standard deviation
6. Empirical Rule for approximately normal distributions
7. What correlation measures and what it doesn't imply
8. Properties of correlation
9. How to interpret correlation
10. The difference between correlation and slope
11. How adding an outlier will affect the correlation and slope
12. Sampling concerns when answering questions that do not compare
13. Good sampling techniques for questions that do not compare
14. How we take an SRS
15. Concerns when comparing populations
16. How we deal with confounding in studies that compare populations
17. Conclusions we can draw if we do careful experiments
18. Conclusions we can draw if we do careful observational studies
How to Identify:
1. Population(s) in a study or study question
2. Sample(s) in a study
3. Explanatory and response variables in a study or study question
4. Response variable type (continuous, discrete, categorical)
5. Potential sources of bias in a study
6. Sampling method used in a study
7. Whether an experiment or observational study was used to answer a question
8. Confounding variables in a study
9. Interacting variables in a study
The Definition of:
Variable
Categorical variable
Quantitative variable
Distribution
Histogram
Stem and Leaf plot
Box Plot
Normal Quantile plot
Bar chart
Pie chart
Frequency
Relative Frequency
Symmetric
Skewed right
Skewed left
Unimodal
Bimodal
Outlier
Mean
Median
Mode
First Quartile, Q1
Third Quartile, Q3
Five number summary
Standard Deviation
Range
Inter Quartile Range, IQR
Empirical rule
Scatter Plot
Slope
Intercept
Linear
Negative/positive relationship
Correlation
Side by side box plot
Two Way table
Association
Population
Sample
Bias
SRS
Cluster sample
Stratified random sample
Convenience sample
Voluntary response sample
Explanatory variable
Response variable
Confounding variable
Interacting variable
Randomized Experiment
Observational Study