AP Stats – Things to Remember

1. a. Mean and standard deviation are good for symmetric distributions.
   b. Five-number summaries are good for skewed distributions.
2. Potential outliers: compute $1.5 \times IQR$, where $IQR = Q_3 - Q_1$. Add it to $Q_3$ and subtract it from $Q_1$; values beyond those fences are potential outliers.
3. Graphs:
   a. Cumulative frequency graph, or “ogive”.
   b. Histogram: charts the number of times A, B, C, … occurred.
   c. Stem-and-leaf plot. Data: 3, 3, 9, 11, 15, 18, 20, 27, 29, 31, 31, 35, 38, 38, 46, 47, 49
        0 | 3 3 9
        1 | 1 5 8
        2 | 0 7 9
        3 | 1 1 5 8 8
        4 | 6 7 9
   d. Scatter plot: plots x and y coordinates.
   e. Frequency distribution:
        Value:   0    1    2    3
        Count:  14   16   42   20
      The bottom numbers give the number of times the top numbers occur.
   f. Segmented bar graph: all categories add up to 100% in each column.
4. $r$ vs. $r^2$
   $r$ = correlation coefficient; closer to 1 or -1 means a strong linear relationship.
   * Closer to 0 means little correlation. (Describes the strength of the data's correlation.)
   $r^2$ = percent of variation.
   * The percent of the variation in the y values that is explained by the linear relationship with x.
5. Resistant: median, mode, IQR, $Q_1$, $Q_3$.
   Non-resistant: mean ($\bar{x}$), standard deviation, $r$, linear regression, min/max, $r^2$.
6. Testing normality: use a normal probability plot (6th graph under stat plot).
   * The closer the data are to a straight line, the more normal the data.
7. Linear transformation $x^* = a + bx$: the mean, median, and quartiles transform the same way (new value $= a + b \cdot$ old value). The IQR and the standard deviation are only multiplied by $|b|$: $\sigma_{x^*} = |b|\,\sigma_x$.
8. $N(\mu, \sigma)$ uses $z = \frac{x - \mu}{\sigma}$.
9. 68 - 95 - 99.7 = empirical rule: in a normal distribution, about 68%, 95%, and 99.7% of the data fall within 1, 2, and 3 standard deviations of the mean.
10. Least squares regression line: $\hat{y} = a + bx$, where $b = r\,\frac{s_y}{s_x}$. For $a$, plug $b$, $\bar{x}$, and $\bar{y}$ into the equation and solve: $a = \bar{y} - b\bar{x}$.
11. Residual = observed - predicted $= y - \hat{y}$.
12. Low bias: the estimates are centered on the true value.
    Low variability: the estimates are close together.
13. A parameter describes a population. A statistic describes a sample.
14. Disjoint (mutually exclusive) events: no outcomes in common, so both cannot happen at the same time. Ex.: on one roll of a die, getting a 6 and getting a 3.
    Independent: knowing that one event occurs does not change the probability of the other event. Ex.: heads on a coin and a 3 on a die.
    * A Venn diagram cannot show independence.
15.
Logs
   - Exponential growth increases by a fixed % of the previous value.
   - Curves are not easy to compare.
   - x and y are the original values.
   - Take log y and plot it against x.
   - If the relationship is exponential, the plot forms a straight line with high $r$ and $r^2$ values.
   - Take residuals and check the residual plot.
   $y = ab^x$
   $\log y = \log(ab^x) = \log a + \log b^x$ (Rule 1) $= \log a + (\log b)\,x$ (Rule 3)
16. Logarithm rules
   1. $\log(AB) = \log A + \log B$
   2. $\log(A/B) = \log A - \log B$
   3. $\log x^p = p \log x$
17. Means and variances of random variables
   - $\mu_{a+bX} = a + b\mu_X$
   - $\sigma^2_{a+bX} = b^2\sigma^2_X$, so $\sigma_{a+bX} = |b|\,\sigma_X$
   - $r^2$ gives the percent of variation in the dependent variable that the regression line accounts for.
18. If X and Y are two random variables:
   $\mu_{X+Y} = \mu_X + \mu_Y$
   $\sigma^2_{X+Y} = \sigma^2_X + \sigma^2_Y$ (X and Y independent)
   $\sigma^2_{X-Y} = \sigma^2_X + \sigma^2_Y$ (X and Y independent)
18.5. Sampling:
   - SRS
   - Systematic
   - Multistage
   - Stratified: divide the population into groups of similar individuals.
   - Census
   - Block: before you even start, break the subjects into groups by a variable that is expected to affect the response.
   - Matched pairs
19. Bias
   a. Voluntary response bias: a general appeal is made for responses to one or more questions.
   b. Convenience sampling: members of the population are chosen based on the convenience of including them.
   c. Non-response: occurs when a selected individual either can't be contacted or refuses to cooperate.
   d. Response bias: respondents may lie, especially if asked about illegal or antisocial behavior.
   e. Wording of questions: wording questions in a biased way to lead the person to a given answer.
   f. Undercoverage: under-representing some group (a group is left out).
20. Designing experiments:
   Randomize: use an SRS to randomly pick subjects.
   Replicate: repeat the experiment.
   Control: use a placebo/control group.
   Random allocation:
      Group 1 (SRS, men) -> Test
      Group 2 (SRS, men) -> Placebo
   Then compare the groups and replicate.
21. Simulating experiments: assign every subject a number, then use Table B or a random number generator.
   Ex.: 50 people, need 5 for the experiment. Assign #01-50. Table B gives 09, 21, 33, 41, 43.
22. Probability
   $P(A \cup B) = P(A) + P(B)$ (A and B disjoint)
   $P(A \cap B) = P(A)\,P(B)$ (A and B independent)
   $P(A^c) = 1 - P(A)$
   Conditional (a Venn diagram works well here):
   $P(B \mid A) = \frac{P(A \cap B)}{P(A)}$
   $P(A \text{ or } B) = P(A) + P(B) - P(A \cap B)$
   $P(A \cap B) = P(A)\,P(B \mid A)$
23.
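Expected mean: $\mu_X = \sum_{i=1}^{n} x_i p_i$, where $p_i$ = probability, $x_i$ = outcome, $n$ = total number of outcomes.
   Variance: $\sigma_X^2 = \sum_{i=1}^{n} (x_i - \mu_X)^2\, p_i$
24. Binomial distribution $B(n, p)$: $n$ = number of trials, $p$ = probability of success.
   “What's the probability of getting 3 successes in 20 trials?”
   $P(X = k) = \binom{n}{k} p^k (1-p)^{n-k}$
   Exactly k successes: binompdf(n, p, k). At most k successes: binomcdf(n, p, k).
   Rules:
   1. Success/failure
   2. All trials independent
   3. Set number of trials
   4. Probability of success is the same on every trial
25. To approximate a binomial distribution with a normal distribution: $\mu_X = np$, $\sigma_X = \sqrt{np(1-p)}$. Then use $z = \frac{x - \mu_X}{\sigma_X}$ and the z table.
26. The sample mean $\bar{x}$ of an SRS of size $n$ from a large population with mean $\mu$ and standard deviation $\sigma$ has distribution $\mu_{\bar{x}} = \mu$, $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$ (used if given a specific sample size).
27. Geometric distribution
   “How many trials until our first success?”
   “What is the probability we have five trials before the first success?”
   $\mu = \frac{1}{p}$, $\sigma = \frac{\sqrt{1-p}}{p}$
   $P(X = k) = p(1-p)^{k-1}$, or geometpdf(p, k)
   Rules:
   1. Success/failure
   2. All trials independent
   3. Probability of success is the same on every trial
28. Confidence interval: $\bar{x} \pm z^* \frac{\sigma}{\sqrt{n}}$
   $\bar{x}$ = SRS mean, $z^*$ = upper critical value, $\sigma$ = population standard deviation, $n$ = sample size.
29. Sample size needed to get a specific margin of error (1-sample z interval): $n = \left(\frac{z^* \sigma}{m}\right)^2$, where $m$ = margin of error. Or set $ME = z^* \frac{\sigma}{\sqrt{n}}$ and solve for $n$.
   The confidence interval gets narrower if:
   1. The confidence level decreases
   2. The sample size increases
   3. The population standard deviation decreases
30a. 1-sample z test
   $H_0$: null hypothesis. $H_a$: alternative hypothesis.
   Ex.: $H_0: \mu = 3$, $H_a: \mu < 3$, with $\bar{x} = 2.7$, $\sigma = .3$, $n = 15$.
   $z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} = \frac{2.7 - 3}{.3/\sqrt{15}} = -3.87$, $p \approx 0$
   At $\alpha = .05$, we reject $H_0$; the data support $H_a$: $\mu < 3$.
   b. If the test is two-sided ($H_a: \mu \neq 3$), multiply the probability by 2.
31. Type I error = rejecting $H_0$ when $H_0$ is true.
   Type II error = failing to reject $H_0$ when $H_a$ is true.
   Power = probability of rejecting $H_0$ when $H_a$ is true, that is, the probability that the p-value is significant.
32a. 1-sample t statistic: $t = \frac{\bar{x} - \mu_0}{s_x/\sqrt{n}}$
   Standard error $= \frac{s_x}{\sqrt{n}}$
   * Remember that the t table shows upper-tail probability.
   b. Confidence interval: $\bar{x} \pm t^* \frac{s_x}{\sqrt{n}}$, with df $= n - 1$.
   Robust = not strongly affected by outliers.
33. Matched pairs t test
   - Evaluates before-and-after data
   - Compares just two treatments
   - One subject receives both
      Before   After   Subject difference
         2        3        1
         4        9        5
         5       20       15
   (Then use a 1-sample t test on the differences.)
34a.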
Two-sample t statistic: $t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$
   b. Confidence interval: $(\bar{x}_1 - \bar{x}_2) \pm t^* \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$
   Assumptions:
   1. Approximately normal
   2. SRS
   3. Independent
   4. Don't know $\sigma$
35a. 1-sample proportion z test
   Assumptions: $np > 10$, $n(1-p) > 10$, population $> 10n$.
   $\hat{p}$ = sample proportion
   $z = \frac{\hat{p} - p}{\sqrt{\frac{p(1-p)}{n}}}$
   b. Confidence interval: $\hat{p} \pm z^* \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$ (for the interval, the part under the square root uses $\hat{p}(1-\hat{p})$).
   Sample size: $n = \left(\frac{z^*}{m}\right)^2 p^*(1-p^*)$, where $p^*$ = estimated population proportion. If $p^*$ is not given, use $p^* = .5$, which gives $n = \left(\frac{z^*}{2m}\right)^2$.
36a. 2-sample proportion z test
   Assumptions: $n_1 p_1 > 5$, $n_1(1-p_1) > 5$, $n_2 p_2 > 5$, $n_2(1-p_2) > 5$.
   $z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}(1-\hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$
   where $\hat{p}_1 = \frac{x_1}{n_1}$, $\hat{p}_2 = \frac{x_2}{n_2}$, and the pooled proportion is $\hat{p} = \frac{x_1 + x_2}{n_1 + n_2}$.
   b. Confidence interval: $(\hat{p}_1 - \hat{p}_2) \pm z^* \sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}}$
   The square root is the standard error; $z^*$ times the standard error is the margin of error.
37. $\chi^2$ test: used to compare 2 or more proportions.
   I. Test of independence (matrix is 2x2 or larger)
      df $= (r-1)(c-1)$
      $H_0$: no association between row and column
      $H_a$: association between row and column
      Expected cell count $= \frac{\text{row total} \times \text{column total}}{n}$
      $\chi^2 = \sum \frac{(\text{obs} - \text{exp})^2}{\text{exp}}$
      * Can do the test on the calculator.
      Assumptions: expected values > 5.
   II. Goodness of fit (matrix is 1 x (number of cells))
      df = number of cells - 1
      $H_0$: good fit
      $H_a$: not a good fit
      Expected value = given probability × total amount
      Ex.:
         Outcome:   1    2    3    4    5    6
         Observed: 12   18   13   11   20   16
      Prob. = 1/6, total amount = 90, expected value = 15
      $\chi^2 = \sum \frac{(\text{obs} - \text{exp})^2}{\text{exp}}$
      * Not available as a built-in test on every calculator.
38. Linear regression t test
   $y = b_0 + b_1 x$
   $H_0: \beta_1 = 0$ (no relationship)
   $H_a: \beta_1 \neq 0$ (there is a relationship)
   $t = \frac{b_1 - 0}{s_{b_1}} = \frac{b_1}{s_{b_1}}$
   Confidence interval: $b_1 \pm t^* s_{b_1}$, with df $= n - 2$.
   Computer printouts are popular.
   * Linear regression t test example:
      FUEL = 10.7 + 2.15 RAILCARS

      Predictor    Coef      StDev     T       P
      Constant     10.677    5.157     2.07    .072
      Railcar      2.1495    .1396     15.40   .000

      S = 4.361    R-Sq = 96.7%    R-Sq(adj) = 96.3%

   The linear regression equation: this is usually printed at the top of the printout.
Notice that the names of the response and explanatory variables in the problem, in this case “FUEL” and “RAILCARS,” are substituted for “y” and “x.”
   Constant: The y-intercept of the regression line. If you are thinking of y = a + bx, “Constant” is the added number. In this case, 10.7.
   Railcar: The slope of the regression line. In y = a + bx, it is the b value. In this case, 2.15.
   Coef: Refers to the values in the regression equation. These are taken right from the equation.
   StDev: The standard deviation (standard error) of each estimate in “Coef.” These are necessary if you need to show all the work for a linear regression t test, but if you are just asked to make conclusions, don't worry about this column.
   T: The value of the test statistic (t value) for a linear regression t test of the value in “Coef” against the null hypothesis $\beta_0 = 0$ or $\beta_1 = 0$, depending on which row it is in.
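As a quick check on how the columns of the printout relate, the short Python sketch below recomputes the T column as Coef divided by StDev, using only the numbers shown in the printout. The function names and the 20-railcar input are illustrative assumptions, not part of the original notes.

```python
# Values copied from the FUEL vs. RAILCARS printout in item 38.
# Function names and the example input of 20 railcars are hypothetical.

def t_statistic(coef, std_err, null_value=0.0):
    """Return t = (estimate - hypothesized value) / standard error."""
    return (coef - null_value) / std_err


def predict_fuel(railcars, intercept=10.677, slope=2.1495):
    """Evaluate the fitted line FUEL = intercept + slope * RAILCARS."""
    return intercept + slope * railcars


t_constant = t_statistic(10.677, 5.157)  # "Constant" row: tests beta_0 = 0
t_slope = t_statistic(2.1495, 0.1396)    # "Railcar" row: tests beta_1 = 0

print(f"T for Constant: {t_constant:.2f}")
print(f"T for Railcar:  {t_slope:.2f}")
print(f"Predicted FUEL for 20 railcars: {predict_fuel(20):.3f}")
```

Rounded to two decimals, the two ratios agree with the T column of the printout (2.07 and 15.40). The P column cannot be reproduced from these numbers alone, because it also depends on the degrees of freedom, n - 2.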