AP Stats – Things to Remember
1. a. Mean and standard deviation are good for symmetric distributions.
b. The 5-number summary is good for skewed distributions.
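2. Potential outliers: compute 1.5 × IQR, where IQR = Q3 – Q1. Add it to Q3 and subtract it from Q1; any value above Q3 + 1.5(IQR) or below Q1 – 1.5(IQR) is a potential outlier.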
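The 1.5 × IQR rule can be sketched in a few lines of Python. The function names and the quartile values below are illustrative assumptions, not part of the original notes.

```python
def outlier_fences(q1, q3):
    """Return the (lower, upper) fences given by the 1.5 x IQR rule."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def is_potential_outlier(value, q1, q3):
    """True when the value lies below the lower fence or above the upper fence."""
    lower, upper = outlier_fences(q1, q3)
    return value < lower or value > upper


# Hypothetical quartiles, chosen only for illustration.
lower, upper = outlier_fences(10, 20)
print(lower, upper)
```

Quartiles are passed in directly because calculators and software use slightly different conventions for computing Q1 and Q3.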
3. Graphs:
a. Cumulative Frequency Graph or “ogive”:
b. Histogram: Charts the number of times A, B, C,… occurred
c. Stem and Leaf Plot:
Data: 3, 3, 9, 11, 15, 18, 20, 27, 29, 31, 31, 35, 38, 38, 46, 47, 49
0 | 3 3 9
1 | 1 5 8
2 | 0 7 9
3 | 1 1 5 8 8
4 | 6 7 9
Key: 1 | 5 = 15
d. Scatter plot: plots each (x, y) pair as a point
e. Frequency Distribution
Value:  0   1   2   3
Count:  14  16  42  20
The bottom numbers give the number of times the top numbers occur.
f. Segmented Bar Graph
all categories will add up to 100% in each column
4. r vs. r²
r = Correlation coefficient; closer to 1 or -1 means strong.
* Closer to 0 means little linear correlation
(Describes the strength and direction of the data’s linear correlation)
r² = Percent of variation.
* The percent of the variation in the y values that is explained by the linear relationship with x.
5. Resistant: median, mode, IQR, Q1, Q3.
Non-resistant: x̄, σ, r, linear regression, min/max, r²
6. Testing Normality: Use a normal probability plot (6th graph under stat plot)
*The closer the plot is to a straight line, the more normal the data.

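7. Linear Transformation: $x^* = a + bx$
Mean, median, quartiles: new value $= a + b(\text{old value})$
IQR and $\sigma$: new value $= |b|(\text{old value})$
8. $N(\mu, \sigma)$ uses $z = \dfrac{\bar{x} - \mu}{\sigma / \sqrt{n}}$ (for a single observation, $n = 1$ and $z = \dfrac{x - \mu}{\sigma}$)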
9. 68 - 95 - 99.7= empirical rule
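The z-score and the 68 - 95 - 99.7 rule can be checked numerically. This is a minimal sketch using Python's standard-library `statistics.NormalDist`; the helper names are assumptions made for illustration.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)


def z_score(x, mu, sigma):
    """Standardize a single observation x from N(mu, sigma)."""
    return (x - mu) / sigma


def area_within(k):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


for k in (1, 2, 3):
    print(k, round(area_within(k), 4))
```

The three printed proportions are the exact normal areas that the empirical rule rounds to 68%, 95%, and 99.7%.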

10. Least Squares Regression Line
ŷ = a + bx where
$b = r\left(\dfrac{S_y}{S_x}\right)$
a = plug b, $\bar{x}$, and $\bar{y}$ in and solve: $a = \bar{y} - b\bar{x}$

11. Residual= observed-predicted= y- ŷ
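The slope, intercept, and residual formulas in items 10 and 11 can be sketched as follows. The data set is hypothetical (it lies exactly on y = 1 + 2x) and the function names are assumptions for illustration.

```python
from statistics import mean, stdev


def correlation(xs, ys):
    """Correlation r as the average product of standardized x and y values."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    n = len(xs)
    return sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
               for x, y in zip(xs, ys)) / (n - 1)


def least_squares_line(xs, ys):
    """Return (a, b) for y-hat = a + b x, using b = r * Sy / Sx."""
    b = correlation(xs, ys) * stdev(ys) / stdev(xs)
    a = mean(ys) - b * mean(xs)
    return a, b


def residuals(xs, ys):
    """Residual = observed - predicted for every point."""
    a, b = least_squares_line(xs, ys)
    return [y - (a + b * x) for x, y in zip(xs, ys)]


# Hypothetical data lying exactly on y = 1 + 2x.
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]
a, b = least_squares_line(xs, ys)
print(a, b, residuals(xs, ys))
```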
12. Low bias- estimates are centered on the true value
Low variability- estimates are close together
13. Parameter describes a population
Statistic describes a sample
14. Disjoint events= No outcomes in common
e.g.- Heads and tails on one flip of a coin (You can’t get both by only doing one action)
Independent- if knowing one event occurs does not change the probability of another event.
e.g.- Heads on a coin and a 3 on a die
*Can’t draw Venn diagram for this principle
Mutually Exclusive- another name for disjoint; can’t both happen at same time.
Ex. one roll of a die, getting a 6 and a 3
15. Logs
-Exponential- increases by a fixed % of previous
-Not easy to compare curves
-x, y are the originals
-Take log y and plot against x
-If exponential it will form a straight line with high r and r² value
-Check the residuals
$y = ab^x$
$\log y = \log(ab^x)$ => Rule 1: $= \log a + \log b^x$ => Rule 3: $= \log a + (\log b)x$
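The log transformation can be tried on a small data set. This sketch fits a straight line to (x, log y) and then undoes the logs to recover a and b; the data are hypothetical (generated from a = 3, b = 2) and the function name is an assumption.

```python
import math


def fit_exponential(xs, ys):
    """Fit y = a * b**x by least squares on the points (x, log10 y)."""
    log_ys = [math.log10(y) for y in ys]
    n = len(xs)
    x_bar = sum(xs) / n
    log_y_bar = sum(log_ys) / n
    slope = (sum((x - x_bar) * (ly - log_y_bar) for x, ly in zip(xs, log_ys))
             / sum((x - x_bar) ** 2 for x in xs))
    intercept = log_y_bar - slope * x_bar
    # intercept = log a and slope = log b, so raise 10 to each to undo the log.
    return 10 ** intercept, 10 ** slope


# Hypothetical exponential data: y = 3 * 2**x.
xs = [0, 1, 2, 3, 4]
ys = [3 * 2 ** x for x in xs]
a, b = fit_exponential(xs, ys)
print(a, b)
```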
16. Logarithm Rules
1. log(AB)= log A+ log B
2. log(A/B)=log A-log B
3. $\log x^p = p \log x$
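17. Means and variances of random variables
- $\mu_{a+bx} = a + b\mu_x$
- $\sigma^2_{a+bx} = b^2\sigma^2_x$, so $\sigma_{a+bx} = |b|\sigma_x$
-r² provides info regarding the percent of variation in the dependent variable that
the regression line accounts for
18. If x and y are 2 random variables:
$\mu_{x+y} = \mu_x + \mu_y$
$\sigma^2_{x+y} = \sigma^2_x + \sigma^2_y$ (x and y independent)
$\sigma^2_{x-y} = \sigma^2_x + \sigma^2_y$ (x and y independent)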
18.5 Sampling:
SRS
Systematic
Multistage
Stratified- Divide into groups of similar individuals
Census
Block-Break into groups that are expected to affect response before you even start
Matched Pairs
19. Bias
a. Voluntary response bias- A general appeal is made for responses to one or more
questions, and people choose whether to respond
b. Convenience sampling- Members of the population are chosen based on the
convenience of including them
c. Non-response- Occurs when a selected individual either can’t be contacted or
refuses to cooperate
d. Response bias- Respondents may lie, especially if asked about illegal/antisocial
behavior
e. Wording questions- Wording questions in a biased way to lead the person to a
given answer.
f. Undercoverage- Underrepresenting some group (group left out)
20. Designing experiments:
Randomize- Use SRS to randomly pick subjects
Replicate- Repeat experiment
Control- Use placebo/control group
Random assignment splits the subjects into two groups:
Group 1 (SRS men): TEST
Group 2 (SRS men): Placebo
Then compare the two groups and replicate.
21. Simulating Experiments:
Assign all a number, use table B/random number generator
EX: 50 people, need 5 for experiment, assign #01-50
Table B(09,21,33,41,43)
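A random number generator can stand in for Table B. The sketch below labels 50 people 01 to 50 and picks 5 without replacement; the function name and the seed are assumptions made for illustration.

```python
import random


def pick_subjects(population_size, sample_size, seed=None):
    """Label everyone 1..population_size and draw a simple random sample."""
    rng = random.Random(seed)
    labels = range(1, population_size + 1)
    return sorted(rng.sample(labels, sample_size))


# The seed is fixed only so the example is repeatable.
chosen = pick_subjects(50, 5, seed=1)
print([f"{label:02d}" for label in chosen])
```

`random.sample` never repeats a label, which matches skipping repeated numbers when reading a random digit table.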
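22. Probability
$P(A \cup B) = P(A) + P(B)$ (A and B disjoint)
$P(A \cap B) = P(A) \cdot P(B)$ (A and B independent)
$P(A^C) = 1 - P(A)$
Conditional (The Venn diagram will work well here)
$P(B|A) = P(A \cap B)/P(A)$
$P(A \text{ or } B) = P(A) + P(B) - P(A \cap B)$
$P(A \cap B) = P(A) \cdot P(B|A)$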
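The probability rules can be verified on one roll of a fair die. The events A (even number) and B (greater than 3) are hypothetical choices made only for this sketch.

```python
from fractions import Fraction

outcomes = {1, 2, 3, 4, 5, 6}   # one roll of a fair die
A = {2, 4, 6}                   # hypothetical event: even number
B = {4, 5, 6}                   # hypothetical event: greater than 3


def prob(event):
    """Probability of an event when all outcomes are equally likely."""
    return Fraction(len(event), len(outcomes))


p_a_and_b = prob(A & B)                       # intersection
p_a_or_b = prob(A) + prob(B) - p_a_and_b      # general addition rule
p_b_given_a = p_a_and_b / prob(A)             # conditional probability
p_not_a = 1 - prob(A)                         # complement rule

print(p_a_and_b, p_a_or_b, p_b_given_a, p_not_a)
```

Exact fractions are used so the rules can be compared without rounding error.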
23. Expected mean: $\mu_x = \sum_{i=1}^{n} x_i p_i$
p = probability
x = outcome
n = total # of outcomes
Variance: $\sigma_x^2 = \sum_{i=1}^{n} (x_i - \mu_x)^2 p_i$
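The mean and variance of a discrete random variable can be computed directly from the two sums. A fair six-sided die is used as a hypothetical example; the function names are assumptions.

```python
from fractions import Fraction


def expected_value(outcomes, probs):
    """Sum of each outcome times its probability."""
    return sum(x * p for x, p in zip(outcomes, probs))


def variance(outcomes, probs):
    """Sum of squared deviations from the mean, weighted by probability."""
    mu = expected_value(outcomes, probs)
    return sum((x - mu) ** 2 * p for x, p in zip(outcomes, probs))


# Hypothetical example: one roll of a fair die.
faces = [1, 2, 3, 4, 5, 6]
probs = [Fraction(1, 6)] * 6
mu = expected_value(faces, probs)
var = variance(faces, probs)
print(mu, var)
```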
24. Binomial distribution
B(n,p) n=# of trials, p=probability of success
“What’s the probability of getting 3 successes in 20 trials?”
$P(x=k) = \binom{n}{k} p^k (1-p)^{n-k}$ or binompdf(n, p, k)
Exactly k successes- binompdf(n, p, k); at most k successes- binomcdf(n, p, k)
Rules
1. Success/Failure
2. All independent
3. Set # of trials
4. Probability is the same
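The binomial formula can be written out with `math.comb`. The notes do not give p for the "3 successes in 20 trials" question, so p = 0.1 below is a hypothetical value; the function names mirror the calculator commands but are otherwise assumptions.

```python
from math import comb


def binom_pmf(n, p, k):
    """P(x = k): exactly k successes in n trials."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


def binom_cdf(n, p, k):
    """P(x <= k): at most k successes in n trials."""
    return sum(binom_pmf(n, p, i) for i in range(k + 1))


# p = 0.1 is a hypothetical success probability.
p_exactly_three = binom_pmf(20, 0.1, 3)
p_at_most_three = binom_cdf(20, 0.1, 3)
print(round(p_exactly_three, 4), round(p_at_most_three, 4))
```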
25. To convert binomial distribution to normal:
$\mu_x = np$
$\sigma_x = \sqrt{np(1-p)}$
Then use $z = \dfrac{x - \mu_x}{\sigma_x}$ and use table z
26. Sample mean $\bar{x}$ of an SRS of size n from a large population with mean $\mu$ and standard
deviation $\sigma$ has distribution
$\mu_{\bar{x}} = \mu$
$\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}}$ (used if given a specific sample size)
27. Geometric Distribution
“How many trials until our first success?”
“What is the probability we have five trials before the first success?”
$\mu = 1/p$
$\sigma = \dfrac{\sqrt{1-p}}{p}$
$P(x=k) = p(1-p)^{k-1}$
geometpdf(p, k)
Rules
1. Success/Failure
2. All independent
3. Probability is the same
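The geometric formulas are short enough to check by hand or in code. This is a minimal sketch; the function names and the value p = 0.5 are assumptions for illustration.

```python
def geom_pmf(p, k):
    """P(x = k): the first success happens on trial k."""
    return p * (1 - p) ** (k - 1)


def geom_mean(p):
    """Expected number of trials until the first success."""
    return 1 / p


# Hypothetical success probability p = 0.5.
print(geom_pmf(0.5, 3), geom_mean(0.5))
```

Summing `geom_pmf(p, k)` over k = 1, 2, 3, ... approaches 1, which is a quick sanity check on the formula.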
28. Confidence interval
$\bar{x} \pm z^*\left(\dfrac{\sigma}{\sqrt{n}}\right)$
$\bar{x}$ = SRS mean, $z^*$ = upper critical value
$\sigma$ = population standard deviation, n = sample size
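A z confidence interval can be sketched with the standard library, which supplies the critical value through `NormalDist().inv_cdf`. The numbers (mean 2.7, standard deviation .3, n = 15) are borrowed from the z test example in item 30a; the function name and the 95% default are assumptions.

```python
from math import sqrt
from statistics import NormalDist


def z_interval(x_bar, sigma, n, confidence=0.95):
    """Return (low, high) for x-bar +/- z* (sigma / sqrt(n))."""
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z_star * sigma / sqrt(n)
    return x_bar - margin, x_bar + margin


low, high = z_interval(2.7, 0.3, 15)
print(round(low, 3), round(high, 3))
```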


29. Sample size needed to get a specific margin of error
$n = \left(\dfrac{z^*\sigma}{m}\right)^2$ (1 sample z interval)
m = margin of error
OR
$ME = z^*\left(\dfrac{\sigma}{\sqrt{n}}\right)$ and solve for n
CI gets narrower if:
1. Confidence level decreases
2. Sample size increases
3. Population standard deviation decreases
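The sample size formula always rounds up, since a fractional subject is not possible. A minimal sketch, with hypothetical inputs (z* = 1.96, standard deviation 0.3, margin of error 0.1) and an assumed function name:

```python
from math import ceil


def required_sample_size(z_star, sigma, margin):
    """Smallest n whose margin of error is at most the requested margin."""
    return ceil((z_star * sigma / margin) ** 2)


# Hypothetical inputs for illustration.
n = required_sample_size(1.96, 0.3, 0.1)
print(n)
```

Halving the margin of error roughly quadruples the required sample size, because m is squared in the denominator.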
30a. 1 sample Z Test
Ho: Null hypothesis
Ha: Alternative hypothesis
Ex.
Ho: $\mu = 3$
Ha: $\mu < 3$
$\bar{x} = 2.7$, $\sigma = .3$, n = 15
$z = \dfrac{2.7 - 3}{.3/\sqrt{15}} = -3.87$, p ≈ 0
At $\alpha = .05$, we reject Ho and accept the Ha that $\mu < 3$
b. If the test was two-sided ($\mu \neq 3$), multiply probability by 2
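The example in 30a can be reproduced in a few lines. The function name is an assumption; the data values are the ones given in the example.

```python
from math import sqrt
from statistics import NormalDist


def one_sample_z(x_bar, mu_0, sigma, n):
    """z statistic for a one-sample z test of Ho: mu = mu_0."""
    return (x_bar - mu_0) / (sigma / sqrt(n))


z = one_sample_z(2.7, 3, 0.3, 15)
p_lower = NormalDist().cdf(z)   # one-sided p-value for Ha: mu < 3
p_two_sided = 2 * p_lower       # p-value if Ha were mu != 3
print(round(z, 2), p_lower, p_two_sided)
```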
31. Type 1 error= reject Ho when Ho is true
Type 2 error= fail to reject Ho when Ha is true
Power= probability that the p value is significant when Ha is true (= 1 – probability of a Type 2 error)
32a. 1 sample T statistic: $t = \dfrac{\bar{x} - \mu}{S_x/\sqrt{n}}$
Standard Error= $S_x/\sqrt{n}$
* Remember T test shows upper probability
b. confidence interval= $\bar{x} \pm t^*\left(\dfrac{S_x}{\sqrt{n}}\right)$
df=n-1
robust=if not affected by outliers
33. Matched Pairs T Test
-Evaluates before and after data.
-Compares just two treatments
-One subject
Before   After   Subject differences (Then use 1 sample T test)
2        3       1
4        9       5
5        20      15
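The matched pairs procedure reduces to a one-sample t statistic on the differences. The sketch below uses the before and after values from the table; testing against a mean difference of 0 is an assumption, since the notes do not state the hypotheses.

```python
from math import sqrt
from statistics import mean, stdev

before = [2, 4, 5]
after = [3, 9, 20]

# Work with one list of differences, then treat it as a single sample.
differences = [a - b for a, b in zip(after, before)]

d_bar = mean(differences)
s_d = stdev(differences)
n = len(differences)
t = (d_bar - 0) / (s_d / sqrt(n))   # assumed Ho: mean difference = 0
df = n - 1
print(differences, round(t, 3), df)
```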
34a. Two sample t statistic
$t = \dfrac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{S_1^2}{N_1} + \dfrac{S_2^2}{N_2}}}$
b. Confidence interval=
$(\bar{x}_1 - \bar{x}_2) \pm t^*\sqrt{\dfrac{S_1^2}{N_1} + \dfrac{S_2^2}{N_2}}$
Assumptions
1. Approximately normal
2. SRS
3. Independent
4. Don’t know $\sigma$
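The two-sample t statistic and interval share the same standard error, which the sketch below factors out. All summary statistics are hypothetical, and t* is passed in by the caller because it comes from a t table.

```python
from math import sqrt


def two_sample_se(s1, n1, s2, n2):
    """Standard error of the difference between two sample means."""
    return sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)


def two_sample_t(x_bar1, s1, n1, x_bar2, s2, n2):
    """Two-sample t statistic for the difference in means."""
    return (x_bar1 - x_bar2) / two_sample_se(s1, n1, s2, n2)


def two_sample_interval(x_bar1, s1, n1, x_bar2, s2, n2, t_star):
    """Confidence interval; t_star must be looked up by the caller."""
    margin = t_star * two_sample_se(s1, n1, s2, n2)
    diff = x_bar1 - x_bar2
    return diff - margin, diff + margin


# Hypothetical summary statistics.
t = two_sample_t(10, 2, 8, 8, 2, 8)
print(t)
```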
35a. 1 sample proportion z test
Assumptions
np>10
n(1-p)>10
population>10n
$\hat{p}$ = sample proportion
$z = \dfrac{\hat{p} - p}{\sqrt{\dfrac{p(1-p)}{n}}}$
bottom part of equation equals $\sigma_{\hat{p}}$
b. Confidence interval= $\hat{p} \pm z^*\sqrt{\dfrac{\hat{p}(1-\hat{p})}{n}}$
Sample size: $n = \left(\dfrac{z^*}{m}\right)^2 p^*(1-p^*)$, or $n = \left(\dfrac{z^*}{2m}\right)^2$ if $p^*$ is not given
$p^*$ = estimated population proportion
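The one-proportion formulas can be sketched as three small functions. The inputs are hypothetical; using p* = 0.5 as the default when no estimate is given is what turns the first sample size formula into the second.

```python
from math import ceil, sqrt


def one_prop_z(p_hat, p0, n):
    """z statistic: the standard deviation uses the hypothesized p0."""
    return (p_hat - p0) / sqrt(p0 * (1 - p0) / n)


def one_prop_interval(p_hat, n, z_star):
    """Confidence interval: the standard error uses the sample p-hat."""
    margin = z_star * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


def sample_size_for_proportion(z_star, margin, p_star=0.5):
    """Required n; p_star = 0.5 is the conservative choice when no estimate is given."""
    return ceil((z_star / margin) ** 2 * p_star * (1 - p_star))


# Hypothetical inputs.
z = one_prop_z(0.6, 0.5, 100)
n_needed = sample_size_for_proportion(1.96, 0.03)
print(round(z, 2), n_needed)
```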

36a. 2 sample proportion z test
Assumptions
n1p1>5
n1(1-p1)>5
n2p2>5
n2(1-p2)>5
$z = \dfrac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}(1-\hat{p})\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$
$\hat{p}_1 = \dfrac{x_1}{n_1}$, $\hat{p}_2 = \dfrac{x_2}{n_2}$, $\hat{p} = \dfrac{x_1 + x_2}{n_1 + n_2}$
b. Confidence interval
$(\hat{p}_1 - \hat{p}_2) \pm z^*\sqrt{\dfrac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \dfrac{\hat{p}_2(1-\hat{p}_2)}{n_2}}$
The square root is the standard error; $z^*$ times the standard error is the margin of error.
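The test and the interval for two proportions use different standard errors: the test pools the two samples, the interval does not. A minimal sketch with hypothetical counts (60 of 100 versus 40 of 100) and assumed function names:

```python
from math import sqrt


def two_prop_z(x1, n1, x2, n2):
    """z statistic using the pooled proportion p-hat."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se


def two_prop_interval(x1, n1, x2, n2, z_star):
    """Confidence interval using the unpooled standard error."""
    p1, p2 = x1 / n1, x2 / n2
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    margin = z_star * se
    return (p1 - p2) - margin, (p1 - p2) + margin


# Hypothetical counts.
z = two_prop_z(60, 100, 40, 100)
low, high = two_prop_interval(60, 100, 40, 100, z_star=1.96)
print(round(z, 3), round(low, 3), round(high, 3))
```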

37. X² Test- Used to compare 2 or more proportions.
I. Test of independence-matrix is 2x2 or larger
Df=(r -1)(c -1)
Ho=No association between row and column
Ha=Association between row and column
Expected cell counts= (Row total • Column total)/n
$X^2 = \sum \dfrac{(\text{obs} - \text{exp})^2}{\text{exp}}$
*Can do test on calculator
Assumptions= Expected values > 5
II. Goodness of Fit-Matrix (1 X (#))
Df = # of cells – 1
Ho = Good fit
Ha = Not a good fit
Expected value = given probability × total amount
EX.
Outcome:   1   2   3   4   5   6
Observed:  12  18  13  11  20  16
Prob.=1/6
Total amount= 90
Expected value = 15
$X^2 = \sum \dfrac{(\text{obs} - \text{exp})^2}{\text{exp}}$
*Can’t be done on a calculator
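Since the goodness of fit statistic has to be computed by hand here, a short script is a convenient check. The observed counts are the ones in the example; the function names are assumptions.

```python
def chi_square(observed, expected):
    """Sum of (obs - exp)^2 / exp over all cells."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def expected_count(row_total, column_total, n):
    """Expected cell count for a test of independence."""
    return row_total * column_total / n


observed = [12, 18, 13, 11, 20, 16]
total = sum(observed)
expected = [total * (1 / 6)] * len(observed)   # given probability x total amount
x2 = chi_square(observed, expected)
df = len(observed) - 1
print(total, round(x2, 3), df)
```

The statistic is then compared against a chi-square table with the printed degrees of freedom.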

38. Linear regression T-test
$\hat{y} = b_0 + b_1 x$
Test= Ho: $\beta_1 = 0$ → No Relationship
Ha: $\beta_1 \neq 0$ → There is a relationship
$t = \dfrac{b_1 - 0}{S_{b_1}} = \dfrac{b_1}{S_{b_1}}$
Confidence interval= $b_1 \pm t^* S_{b_1}$
Df = n-2
* Computer printouts are popular

Linear Regression T-test example
FUEL= 10.7 + 2.15RAILCARS
Predictor   Coef     StDev   T       P
Constant    10.677   5.157   2.07    .072
Railcar     2.1495   .1396   15.40   .000
S= 4.361   R-Sq= 96.7%   R-Sq(adj)= 96.3%
The linear regression equation: this is usually printed at the top of the printout. Notice that
they substituted the response and independent variables in the problem, in
this case “FUEL” and “RAILCARS,” for “y” and “x.”
Constant: The y intercept of the regression line. If you are thinking of y= a+bx,
“Constant” is the added number. In this case, 10.7.
Railcar: The slope of the regression line. In y=a+bx, it is the b value. In this case 2.15.
Coef: Refers to the value in the regression equation. These are taken right from the
equation.
StDev: The standard deviation of the values in “Coef.” These are necessary if you need to
show all the work for a linear regression T test, but if you are just asked to make
conclusions then don’t worry about this column.
T: The value of the test statistic (T value) for a linear regression T test of the value in
“Coef” against the null hypothesis B0=0 or B1=0, depending on which row it is in.
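The T column of the printout can be reproduced from the Coef and StDev columns, since each T value is the coefficient divided by its standard deviation. A minimal sketch using the printout's numbers; the function name is an assumption.

```python
def coefficient_t(coef, st_dev, null_value=0):
    """t statistic for testing a regression coefficient against null_value."""
    return (coef - null_value) / st_dev


# Values taken from the Coef and StDev columns of the printout.
t_constant = coefficient_t(10.677, 5.157)
t_railcar = coefficient_t(2.1495, 0.1396)
print(round(t_constant, 2), round(t_railcar, 2))
```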