```AP Stats
Review day 1
April 2, 2013
Basics
• Two Parts (90 Minutes each part)
– 40 Multiple Choice
• Content Questions (10-15)
• Calculation Questions (25-30)
– 6 Free Response
• Part A: 5 Questions (about 12 minutes each), 75% of Part II Score
• Part B: 1 Question (25 minutes), 25% of Part II Score
Constructing and Interpreting
Graphical Displays of Univariate Data
• Two Types of Data
– Categorical (qualitative) example: colors, gender
– Quantitative (numerical) example: age, scores
• Types of Graphs (bar, pie, dotplot, histogram,
stem and leaf)
• Describing or Comparing Distributions:
– Shape: overall pattern of the data; unimodal,
bimodal, uniform, symmetrical, skewness
– Center: Point that divides the data roughly in half
– Outliers: 1.5 × IQR rule (flag values below Q1 − 1.5 × IQR
or above Q3 + 1.5 × IQR)
Review Questions P. 59 (4 minutes)
• 1. Which of the following are true statements?
• 2. Which of the following is inappropriate for
displaying quantitative data?
• 3. A graphical display of data that shows the
cumulative counts across each of the possible
data values or ranges of data values is a
• 4. The height of Mrs. Clark’s tomato plant is
what type of data?
1. E
2. C
3. C
4. B
Summarizing Distributions of
Univariate Data
• Compared by measures of center: Mean &
Median
– Mean is pulled toward the tail of a skewed distribution
– Median = Mean when the distribution is symmetric
• Measures of Spread: range, IQR, variance and
standard deviation
• Measures of position: quartiles,
percentiles, and z-scores
• Boxplot: Min, Q1, Med, Q3, Max
Review Questions P. 74 & 75
• Multiple Choice Questions
– 1. The mean assessed value of homes in Southern
County is $158,000 with a standard deviation of
$32,000. If the county supervisors decided to
increase everyone’s assessment by $5,000, the
new mean and standard deviation would be
– 2. What was the average
– 3. A distribution is skewed right if
– 4. When a constant is added to every data value
• Free Response Question
1. C
2. B
3. C
4. D
Comparing Distributions
• P. 84 & 85 Review questions
– 1. Which of the following is true?
– 2. Consider the following back-to-back
– 3. Which of the following could not be used to
compare these data graphically?
– 4. Which of the following statements cannot be
justified?
• Free Response Question: The following data
represents the hours of continuous use for two
brands of batteries.
1. E
2. A
3. C
4. A
Free Response
• From my boxplot comparison, Brand A has a
symmetric distribution with a larger mean (67.2)
and median (67) and a range of 10. Brand B is
skewed right with a smaller mean (66.1) and
median (65) and a range of 6.
Exploring Bivariate Data
• Scatterplots
• Linear correlation coefficient (r)
• Least Squares Regression Line
• Residual Plot
• Transforming Data to achieve linearity
Review Questions P. 101
• 1. A scatterplot is obtained by
• 2. A perfect positive correlation means
• 3. In a linear regression model, the slope
• 4. In the regression equation y = 12 + 6x, 12
• Free Response Question P. 102
1. C
2. A
3. D
4. D
Exploring Categorical Data
• 1. Categorical data is exhibited using
frequency tables and bar charts
• 2. Joint frequencies occur where a row
category meets a column category; if the
frequencies appear in the margins of the
table, they are called marginal frequencies.
• 3. Conditional relative frequencies are
expressed in percentages
P. 111 Review Questions
• 1. Relative frequency
• 2. Frequency of a category
• 3. What is the joint relative conditional
frequency for male Republicans if the
marginal row totals are fixed?
• 4. Which of the following is a joint frequency?
1. B
2. A
3. B
4. A
Methods of Data Collection
• For all types of studies, the primary goal of the
investigator is to eliminate bias. Bias is any
systematic tendency to favor certain outcomes
at the expense of others.
• Experiment vs Observational Study
• Observational study
– Census (population/parameter)
– Sample survey (sample/statistic)
Experiments
• Experiments
– Deliberately impose a treatment on a set of
individuals (called subjects if they are people).
– To determine whether the treatment imposed
causes the effects measured in the subjects.
– These measured effects are the response variables
in an experiment.
– Explanatory Variables are often referred to as
factors, each of which can take multiple values
called levels.
Example
• Suppose you are investigating the effects of
various amounts of sunlight and moisture on
the growth of tulip plants. The plants would
be your experimental units, and you could
measure the height of the tulips as your
response variable. You could subject samples
to direct sunlight, partial shade, and full
shade. You could also water them either
every day or every other day. What are the
factors and levels? How many treatments?
Answer: Two factors (sunlight and moisture). Three levels for sunlight,
two levels for moisture. 3 × 2 = 6 treatments.
Review Questions P. 119
• 1. What kind of study is this?
• 2. Which of the following statements about
observational studies is true?
• 3. Which describes the factors, levels, and/or
treatments in this experiment?
• If time permits look at the free response
question on page 120
1. D
2. E
3. A
Sources of Bias
• Sample selection
– Undercoverage
– Nonresponse
– Convenience sampling
– Voluntary response bias
• Incorrect Measurement
– Poorly worded questions
– Response bias
– Deliberate lying by respondents
Sampling
• Simple Random Sample (SRS)
• Stratified Random Sampling
• Systematic Sampling
• Cluster Sampling
• Multistage sampling refers to any random
sampling scheme with at least two steps that
incorporates elements of stratified sampling,
cluster sampling, or SRS
Review Questions P. 129
• 1. What type of sampling?
• 2. Which of the following is not a property of
a large table of random digits?
• 3. Most significant sources of bias
1. A
2. B
3. D
Planning and Conducting Experiments
• Control: Compare multiple treatments
• Randomization: Subjects are randomly assigned to
treatment groups.
• Replication: Use multiple subjects per treatment, or
repeat the experiment, to validate results
Poorly designed experiments
• Confounding variables
• Lurking variables
• Placebo effect
Experimental Design
• Completely Randomized
• Randomized Block
• Matched Pairs
• Blinding/Double Blind
Review Questions P. 139
• 1. Which is the most serious problem with
this design?
• 2. Which of the following is the least
important way in which the designer of an
experiment can guard against confounding?
• 3. Which of the following is the most
appropriate design for this experiment?
1. A
2. A
3. E
Probability
• Events are independent if the outcome of one event
does not influence the outcome of any other event
• Events are mutually exclusive if they cannot occur
together.
• P(A or B) = P(A) + P(B) – P(A and B)
• If A and B are independent: P(A and B) = P(A)P(B)
• P(B|A) = P(A and B)/P(A)
• The most common way to check for independence is
simply to check that P(B) = P(B|A)
Review Questions
• 1. What is the probability?
• 2. What is the probability?
• 3. The two events are
• 4. What is the probability?
• 5. What is the probability that exactly one is
defective?
1. C
2. B
3. D
4. A
5. D
Random Variables
• μ_{a+bX} = a + b·μ_X
• σ²_{a+bX} = b²·σ²_X
• μ_{X+Y} = μ_X + μ_Y
• σ²_{X+Y} = σ²_X + σ²_Y (X and Y independent)
• σ²_{X−Y} = σ²_X + σ²_Y (X and Y independent)
Binomial Distribution
• Each observation is either a success or failure
• The number of observations is the fixed
number n
• The n observations are all independent
• The probability of success, p, is the same for
each observation.
• Mean: np
• Standard deviation: √(np(1 − p))
Geometric Distribution
• Each observation is either a success or failure
• The variable of interest is the number of trials
required to obtain the first success
• The observations are all independent
• The probability of success, p, is the same for
each observation
• Mean: 1/p
• Standard Deviation: √(1 − p) / p
Review Questions P. 182
The Normal Distribution
```
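The formulas in the transcript above (linear transformations of a random variable, the probability rules, and the binomial and geometric summaries) can be checked numerically. The following is a minimal Python sketch, not part of the original slides; the function names are hypothetical and chosen only for illustration. It reuses the home-assessment review question (mean $158,000, standard deviation $32,000, everyone raised by $5,000) as an example input.

```python
import math


def shift_and_scale(mean, sd, a=0.0, b=1.0):
    """Mean and SD of a + b*X, given the mean and SD of X.

    Adding a constant a moves the mean but leaves the SD unchanged;
    multiplying by b scales the mean by b and the SD by |b|.
    """
    return a + b * mean, abs(b) * sd


def prob_union(p_a, p_b, p_a_and_b):
    """General addition rule: P(A or B) = P(A) + P(B) - P(A and B)."""
    return p_a + p_b - p_a_and_b


def is_independent(p_a, p_b, p_a_and_b, tol=1e-9):
    """Check independence by comparing P(B|A) = P(A and B)/P(A) with P(B)."""
    p_b_given_a = p_a_and_b / p_a
    return math.isclose(p_b_given_a, p_b, abs_tol=tol)


def binomial_mean_sd(n, p):
    """Mean np and standard deviation sqrt(np(1 - p)) of a binomial count."""
    return n * p, math.sqrt(n * p * (1 - p))


def geometric_mean_sd(p):
    """Mean 1/p and standard deviation sqrt(1 - p)/p of the number of
    trials needed to get the first success."""
    return 1 / p, math.sqrt(1 - p) / p


if __name__ == "__main__":
    # Home-assessment question: every value is increased by $5,000.
    print(shift_and_scale(158_000, 32_000, a=5_000))  # (163000.0, 32000.0)
    # Illustrative values (assumptions, not from the slides):
    print(binomial_mean_sd(100, 0.5))                 # (50.0, 5.0)
    print(geometric_mean_sd(0.25))
    print(is_independent(0.5, 0.5, 0.25))
```

The first call shows why the assessment question has an unchanged standard deviation: a shift moves every value by the same amount, so the spread around the mean is the same as before.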