Math 10 - Elementary Statistics

The Chi-Square Distribution

The student will be able to:
• Perform a Goodness of Fit hypothesis test
• Perform a Test of Independence hypothesis test

Chi-square is a distribution whose test statistic is used to answer three questions:
• Does our data fit a certain distribution? (Goodness-of-fit)
• Are two factors independent? (Test of independence)
• Does our variance change? (Test of a single variance)

Notation
• New random variable: χ² ~ χ²(df)
• Mean: µ = df
• Variance: σ² = 2df

Facts about Chi-square
• Nonsymmetrical and skewed right
• Value is always > zero
• Curve looks different for different degrees of freedom
• As df gets larger, the curve approaches normal (df > 90)
• Mean is located to the right of the peak

Hypothesis test steps are the same as always, with the following changes:
• Test is always a right-tailed test
• Null and alternate hypotheses are in words rather than equations
• Degrees of freedom = number of intervals - 1
• Test statistic is defined as χ² = Σ (O − E)² / E, summed over the n intervals, where O is the observed frequency and E is the expected frequency

Example: A 6-sided die is rolled 120 times. The results are in the table below. Conduct a hypothesis test to determine if the die is fair.

Face Value   Frequency
1            15
2            29
3            16
4            15
5            30
6            15

Contradictory hypotheses
• Ho: observed data fits a Uniform distribution (die is fair)
• Ha: observed data does not fit a Uniform distribution (die is not fair)

Determine distribution
• Chi-square goodness-of-fit
• Right-tailed test

Perform calculations to find p-value
• Enter observed into L1
• Enter expected into L2

Perform calculations (cont.), TI-83
• Access LIST, MATH, SUM
• Enter sum((L1 - L2)²/L2); this is the test statistic
• For our problem, chi-square = 13.6
• Access DISTR and χ²cdf
• Syntax is (test stat, 1E99, df); this generates the p-value
• For our problem, p-value = 0.0184

Make decision
• Since α > 0.0184, reject the null hypothesis

Concluding statement
• There is sufficient evidence to conclude that the observed data does not fit a uniform distribution. (The die is not fair.)
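
The die example above can be reproduced without a calculator. The following is a minimal sketch using only the Python standard library; the helper name `chi2_sf` and all variable names are assumptions made for illustration, and the helper is a hand-rolled stand-in for the calculator's χ²cdf right-tail area, valid for whole-number degrees of freedom.

```python
import math


def chi2_sf(x, df):
    """Right-tail area P(X > x) for a chi-square variable with integer df."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        tail, k = math.exp(-half), 2               # exact tail for df = 2
    else:
        tail, k = math.erfc(math.sqrt(half)), 1    # exact tail for df = 1
    # Raising df by 2 adds one term:
    # Q(k + 2) = Q(k) + (x/2)^(k/2) * e^(-x/2) / Gamma(k/2 + 1)
    while k < df:
        tail += math.exp((k / 2.0) * math.log(half) - half
                         - math.lgamma(k / 2.0 + 1.0))
        k += 2
    return tail


# Observed frequencies for faces 1..6 (from the table above).
observed = [15, 29, 16, 15, 30, 15]

# A fair die spreads the 120 rolls evenly: 20 expected per face.
n_rolls = sum(observed)
expected = [n_rolls / len(observed)] * len(observed)

# Test statistic: sum of (O - E)^2 / E over the six faces.
chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1
p_value = chi2_sf(chi_square, df)

print(f"chi-square = {chi_square:.1f}, df = {df}, p-value = {p_value:.4f}")
```

The printed test statistic and p-value should agree with the values on the slide (13.6 and 0.0184).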

For a test of independence, the hypothesis testing steps are the same, with the following edits:
• Null and alternate hypotheses are in words
• We have a contingency table
• Expected values are calculated from the table: E = (row total)(column total) / (sample size)
• Test statistic is the same: χ² = Σ (O − E)² / E
• df = (# columns - 1)(# rows - 1)
• Always a right-tailed test

Example: Conduct a hypothesis test to determine whether there is a relationship between an employee's performance in a company's training program and his/her ultimate success on the job. Use a level of significance of 1%.
• Ho: Performance in training and success on job are independent
• Ha: Performance in training and success on job are not independent (or dependent)

Performance on job versus performance in training

                      Performance in training
Performance on job    Below Average   Average   Above Average   TOTAL
Poor                  23              60        29              112
Average               28              79        60              167
Very Good             9               49        63              121
TOTAL                 60              188       152             400

Determine distribution
• Right-tailed chi-square

Perform calculations to find p-value
• Calculator will calculate expected values. We must enter the contingency table as a Matrix (ack!)
• Access MATRIX and edit Matrix A
• Access Chi-square test
  - Matrix A = observed
  - Matrix B = calculator places expected here

Perform calculations (cont.)
• p-value = 0.0005

Make decision
• α = 0.01 > p-value = 0.0005; reject the null hypothesis

Concluding statement
• Performance in training and job success are dependent.

Linear Regression and Correlation

Chapter Objectives
The student should be able to:
• Discuss basic ideas of linear regression and correlation.
• Create and interpret a line of best fit.
• Calculate and interpret the correlation coefficient.
• Find outliers.
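
Returning to the test of independence example above (training versus job performance), the expected counts and test statistic can be checked with a short sketch. This is only an illustration under stated assumptions: the variable names are hypothetical, and the p-value line uses the closed-form right tail of the chi-square distribution that holds specifically for df = 4, which is the df of this 3 x 3 table.

```python
import math

# Observed counts; rows = performance on job, columns = performance in training
# (Below Average, Average, Above Average), taken from the table above.
observed = [
    [23, 60, 29],   # Poor
    [28, 79, 60],   # Average
    [9, 49, 63],    # Very Good
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected count for each cell: (row total)(column total) / sample size.
expected = [[r * c / n for c in col_totals] for r in row_totals]

# Test statistic: sum of (O - E)^2 / E over every cell.
chi_square = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

df = (len(row_totals) - 1) * (len(col_totals) - 1)

# Assumption: this closed form is only valid when df == 4.
assert df == 4
p_value = math.exp(-chi_square / 2) * (1 + chi_square / 2)

print(f"chi-square = {chi_square:.2f}, df = {df}, p-value = {p_value:.4f}")
```

The p-value rounds to the 0.0005 reported on the slide, well below the 1% significance level.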

Linear regression is a method for finding the “best fit” line through a scatterplot of paired data
• Independent variable (x) versus dependent variable (y)

Recall from Algebra
• Equation of a line: y = a + bx, where
  - a is the y-intercept
  - b is the slope of the line
• If b > 0, the line slopes upward to the right
• If b < 0, the line slopes downward to the right
• If b = 0, the line is horizontal

The eye-ball method
• Draw what looks to you to be the best straight line fit
• Pick two points on the line and find the equation of the line

The calculated method
• From calculus, we find the line that minimizes the sum of the squared vertical distances from each point to the line; this is the line that best fits the scatterplot
• Let the calculator do the work using LinRegTTest
• An example

The correlation coefficient is used to determine if the regression line is a “good fit”
• ρ is the population correlation coefficient
• r is the sample correlation coefficient
• Formidable equation: see text
• Calculator does the work
• r positive: upward to right
• r negative: downward to right
• r zero: no correlation
• Graphs

Determining if there is a “good fit”
• Gut method: if the calculated r is close to 1 or -1, there's a good fit
• Hypothesis test (LinRegTTest)
  - Ho: ρ = 0
  - Ha: ρ ≠ 0
  - Ho means there IS NOT a significant linear relationship (correlation) between x and y in the population.
  - Ha means there IS a significant linear relationship (correlation) between x and y in the population.
  - To reject Ho means that there is a linear relationship between x and y in the population.
  - It does not mean that one CAUSES the other.
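
To make the calculated method concrete, here is a minimal sketch of what the calculator computes for the line y = a + bx and for r. The paired data below are hypothetical (the slides do not include a regression data set), and the variable names are assumptions chosen for illustration.

```python
import math

# Hypothetical paired data, NOT from the slides; used only for illustration.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Sums of squares and cross-products about the means.
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_yy = sum((yi - y_bar) ** 2 for yi in y)

# Least-squares slope and intercept for y = a + bx.
b = s_xy / s_xx
a = y_bar - b * x_bar

# Sample correlation coefficient.
r = s_xy / math.sqrt(s_xx * s_yy)

print(f"y = {a:.2f} + {b:.2f}x, r = {r:.4f}")
```

Because b is positive the line slopes upward to the right, and r is close to 1, which the gut method would read as a good fit for this made-up data.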
• Comparison to critical value
  - Use the table at the end of the chapter
  - Determine degrees of freedom: df = n - 2
  - If r < negative critical value, then r is significant and we have a good fit
  - If r > positive critical value, then r is significant and we have a good fit

If the line is determined to be a good fit, the equation can be used to predict y or x values from x or y values
• Plug the numbers into the equation
• The equation is only valid within the DOMAIN of the paired data

Finding outliers: compare 1.9s to |y − ŷ| for each (x, y) pair
• If |y − ŷ| > 1.9s, the point could be an outlier
• LinRegTTest gives us s (the standard deviation of the residuals)
• y − ŷ is put into the RESID list when the LinRegTTest is done
• To see the RESID list: go to STAT, Edit, move the cursor to a blank list name and type RESID; the residuals will show up.

F Distribution and ANOVA

The student should be able to:
• Interpret the F distribution as the number of groups and the sample size change.
• Discuss two uses for the F distribution and ANOVA.
• Conduct and interpret ANOVA.

What is it good for?
• Determines the existence of statistically significant differences among several group means.

Basic assumptions
• Each population from which a sample is taken is assumed to be normal.
• Each sample is randomly selected and independent.
• The populations are assumed to have equal standard deviations (or variances).
• The factor is the categorical variable.
• The response is the numerical variable.

The Hypotheses
• Ho: µ1 = µ2 = µ3 = … = µk
• Ha: At least two of the group means are not equal
• Always a right-tailed test

The F distribution
• Named after Sir Ronald Fisher
• The F statistic is a ratio (i.e. fraction)
  - Two sets of degrees of freedom (numerator and denominator)
  - F ~ F(df(num), df(denom))
• Two estimates of variance are made
• Variation between samples
  - Estimate of σ² that is the variance of the sample means
  - Variation due to treatment (i.e. explained variation)
• Variation within samples
  - Estimate of σ² that is the average of the sample variances
  - Variation due to error (i.e.
    unexplained variation)

Facts about the F distribution
• Curve is skewed right.
• Different curve for each set of degrees of freedom.
• As the dfs for numerator and denominator get larger, the curve approximates the normal distribution.
• F statistic is greater than or equal to zero.
• Other uses
  - Comparing two variances
  - Two-Way Analysis of Variance

Formula
• F = MSbetween / MSwithin
• MSbetween: mean square explained by the different groups
  - MSbetween = SSbetween / df between
• MSwithin: mean square that is due to chance
  - MSwithin = SSwithin / df within
• SSbetween: sum of squares that represents the variation among the different samples
• SSwithin: sum of squares that represents the variation within samples that is due to chance

Using the calculator
• Enter the table data by columns into L1, L2, L3, …
• Do the ANOVA test: ANOVA(L1, L2, ...)
• What the calculator gives
  - F: the F statistic
  - p: the p-value
  - Factor: the between stuff
      df = # groups - 1 = k - 1
      SSbetween
      MSbetween
  - Error: the within stuff
      df = total number of data values - # of groups = N - k
      SSwithin
      MSwithin

Example: Four sororities took a random sample of sisters regarding their grade averages for the past term. The results are shown below:

Sorority 1   Sorority 2   Sorority 3   Sorority 4
2.17         2.63         2.63         3.79
1.85         1.77         3.78         3.45
2.83         3.25         4.00         3.08
1.69         1.86         2.55         2.26
3.33         2.21         2.45         3.18

Using a significance level of 1%, is there a difference in grade averages among the sororities?

Prepare for the Final exam

What's fair game
• Chapter 1, Chapter 2, Chapter 3, Chapter 4, Chapter 5, Chapter 6, Chapter 7, Chapter 8, Chapter 9, Chapter 10, Chapter 11, Chapter 12
• 42 multiple choice questions
• Do problems from each chapter

What to bring with you
• Scantron (#2052), pencil, eraser, calculator, 2 sheets of notes (8.5x11 inches, both sides)

It has been a pleasure having you in class. Good luck and Godspeed with whatever path you take in life.
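
As a check on the sorority ANOVA example earlier in this section, the Factor and Error rows that the calculator reports can be reproduced by hand. This is a minimal sketch with hypothetical variable names; it assumes the flattened table reads across rows, so each sorority is one column of five values. It stops at the F statistic, since the p-value needs the F distribution's right tail (the calculator's ANOVA output or a statistics library).

```python
# Grade averages by sorority, read column by column from the table.
groups = {
    "Sorority 1": [2.17, 1.85, 2.83, 1.69, 3.33],
    "Sorority 2": [2.63, 1.77, 3.25, 1.86, 2.21],
    "Sorority 3": [2.63, 3.78, 4.00, 2.55, 2.45],
    "Sorority 4": [3.79, 3.45, 3.08, 2.26, 3.18],
}

k = len(groups)                                    # number of groups
N = sum(len(values) for values in groups.values()) # total data values
grand_mean = sum(sum(values) for values in groups.values()) / N

# SSbetween: variation of the group means around the grand mean.
ss_between = sum(
    len(values) * (sum(values) / len(values) - grand_mean) ** 2
    for values in groups.values()
)

# SSwithin: variation of each value around its own group mean.
ss_within = sum(
    (v - sum(values) / len(values)) ** 2
    for values in groups.values()
    for v in values
)

df_between = k - 1
df_within = N - k
ms_between = ss_between / df_between
ms_within = ss_within / df_within
f_stat = ms_between / ms_within

print(f"df = ({df_between}, {df_within}), "
      f"MSbetween = {ms_between:.4f}, MSwithin = {ms_within:.4f}, "
      f"F = {f_stat:.2f}")
```

The decision then follows the usual right-tailed rule: reject Ho only if the p-value for this F statistic is below the 1% significance level.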