Explain Standard Deviation
Standard deviation measures spread by giving the "typical" or "average" distance that the observations (context) are away from their (context) mean.

Explain LSRL "b"
For every one-unit change in the x-axis variable (context), the y-axis variable (context) is estimated to increase/decrease by ____ units (context).

Explain LSRL "a"
When the x-axis variable (context) is zero, the y-axis variable (context) is estimated to be ____ (put the value here).

Explain LSRL "SEb"
SEb measures the standard deviation of the estimated slope for predicting the y-axis variable (context) from the x-axis variable (context).

Explain LSRL "ŷ"
ŷ is the "estimated" or "predicted" y-value (context) for a given x-value (context).

Explain LSRL "s"
The value s = ____ is the standard deviation of the residuals. It measures a typical distance between the actual y-values (context) and their predicted y-values (context).

Explain a P-value
Assuming that the null hypothesis is true (context), the p-value measures the chance of observing a statistic (or difference in statistics) (context) as large as or larger than the one actually observed, that is, at least as extreme.

Why use a control group?
A control group gives the researchers a comparison group to be used to evaluate the effectiveness of the treatment(s) (context). It lets them gauge the effect of the treatment compared to no treatment at all.

Goal of Blocking
The goal of blocking is to create groups of homogeneous experimental units.

Benefit of Blocking
The benefit of blocking is the reduction of variation within the experimental units (context).

SRS
An SRS (simple random sample) is a sample taken in such a way that every set of n individuals has an equal chance to be the sample actually selected.

Sampling Techniques
1. SRS: Number the entire population and draw numbers from a hat (every set of n individuals has an equal chance of selection).
2. Stratified: Split the population into homogeneous groups, then select an SRS from each group.
3. Voluntary Response: People choose themselves by responding to a general appeal.
4. Multistage: Select successively smaller groups within the population in stages, resulting in a sample consisting of clusters of individuals.

Bias
The systematic favoring of certain outcomes from flawed sample selection, poor question wording, undercoverage, nonresponse, etc. Bias deals with the center of a sampling distribution being "off"!

Is a Linear Model Appropriate? (Interpreting a Residual Plot)
1. No clear pattern. In particular, check that there is not a curved pattern.
2. No increasing or decreasing spread. (Changing spread is bad for predicting the future or the past.)
3. Are the residuals small? (Notice the units.)
4. No clear outliers (large residuals) or influential observations (pulling the LSRL up or down).
5. Is r close to 1 or -1 (equivalently, is r² close to 1)? The closer the better!
If so, the LSRL is a good model for the data!

The Meaning of 95% Confident
The method used to produce this interval will capture the true population mean/proportion in 95% of all possible samples of this same size from this same population.

Central Limit Theorem
1. If the population distribution is normal, the sampling distribution will also be normal, with the same mean as the population. Additionally, as n increases the sampling distribution's standard deviation will decrease.
2. If the population distribution is not normal, the sampling distribution will become more and more normal as n increases. The sampling distribution will have the same mean as the population, and as n increases the sampling distribution's standard deviation will decrease.

Interpreting r²
r² = ____ means that ___% of the variation in y (context) is explained by the LSRL of y (context) on x (context).
Or: r² = ____ means that ___% of the variation in y (context) is explained by using the linear regression model with x (context) as the explanatory variable.

Experimental Designs
1. CRD (Completely Randomized Design): All experimental units are allocated at random among all treatments.
2. RBD (Randomized Block Design): Experimental units are put into homogeneous blocks. The random assignment of the units to the treatments is carried out separately within each block.
3. Matched Pairs: A form of blocking in which each subject receives both treatments in a random order, or the subjects are matched in pairs as closely as possible and one subject in each pair receives each treatment.

1 Random Variable (Formulas)
Mean (Expected Value): $\mu_X = \sum x_i p_i$ (multiply and add across the table).
Standard Deviation: $\sigma_X = \sqrt{\sum (x_i - \mu_X)^2 p_i}$ (the square root of the sum of: (each x value - the mean)² × (its probability)).

2 Random Variables (Formulas)
Total mean of 2 RVs: $\mu_T = \mu_X + \mu_Y$
Total standard deviation of 2 independent RVs: $\sigma_T = \sqrt{\sigma_X^2 + \sigma_Y^2}$
Total standard deviation of 2 dependent RVs: Cannot be determined from the two standard deviations alone, because it depends on how strongly the variables are correlated.

Binomial Distribution (Conditions)
1. Two Outcomes: Success & Failure
2. Fixed Number of Trials (n)
3. Fixed Probability of Success for Each Trial (p)
4. Trials are Independent

Binomial Distribution (Mean & Standard Deviation)
Mean: $\mu_X = np$
Standard Deviation: $\sigma_X = \sqrt{np(1 - p)}$

Binomial Distribution (Calculator Usage)
Exactly 5: P(X = 5) = Binompdf(n, p, 5)
At Most 5: P(X ≤ 5) = Binomcdf(n, p, 5)
Less Than 5: P(X < 5) = Binomcdf(n, p, 4)
At Least 5: P(X ≥ 5) = 1 - Binomcdf(n, p, 4)
More Than 5: P(X > 5) = 1 - Binomcdf(n, p, 5)

Outlier Rule
IQR = Q3 - Q1
Upper Bound = Q3 + 1.5(IQR)
Lower Bound = Q1 - 1.5(IQR)

What is an Outlier?
When given 1-variable data: An outlier is any value that falls more than 1.5 IQR above Q3 or below Q1.
Regression outlier: Any data point that has a "large" residual.

Type I Error, Type II Error, & Power
1. Type I Error: H0 is innocent (true), but due to unfortunate sample selection that did not represent the population well, we mistakenly reject H0.
2. Type II Error: H0 is guilty (should be rejected), but due to unfortunate sample selection (which did not represent the population well), we fail to reject H0.
3. Power: Probability of rejecting H0 when H0 should be rejected. (Rejecting correctly.)

Interpret r
Correlation measures the strength and direction of the linear relationship between x and y. r is always between -1 and 1.
Close to zero = very weak
Close to 1 or -1 = stronger
Exactly 1 or -1 = perfectly straight line
Positive r = positive correlation
Negative r = negative correlation

Interpret a Z-score
$z = \dfrac{\text{statistic} - \text{mean}}{\text{stdev}}$
A z-score describes how many standard deviations a value or statistic ($x$, $\bar{x}$, $\hat{p}$) falls away from the mean. The further the z-score is from zero, the more "surprising" the value of the statistic is.

P(At Least 1)
P(At least 1) = 1 - P(None)
Ex. P(Get a statmaster on any one test) = 0.02. Tests are independent, and there are 14 chapter tests.
P(Get at least 1 statmaster) = 1 - P(None) = $1 - (0.98)^{14}$ = 0.246

Two Events are Independent If…
$P(A \text{ and } B) = P(A)\,P(B)$, or equivalently $P(B) = P(B \mid A)$

Linear Transformations
Adding "a" to every member of a data set adds "a" to the measures of center, but does not change the measures of spread.
Multiplying every member of a data set by "a" multiplies the measures of center by "a" and multiplies the measures of spread by |a|.

Unbiased Estimator
The data is collected in such a way that there is no systematic tendency to over- or underestimate the true value of the population parameter. (The mean of the sampling distribution equals the true value of the parameter being estimated.)

Why Large Samples Give More Trustworthy Results… (When collected appropriately)
When collected appropriately, large samples yield more precise/accurate results than small samples, because in a large sample the values of the response tend to average out and approach that of the true population parameter.

Does the Sample Represent the Population Well?
Yes, if: They have a large, random sample taken from the same population we hope to draw conclusions about.

Describe the Distribution OR Compare the Distributions
SOCS! Shape, Outliers, Center, Spread.
Only discuss outliers if there are obviously outliers present. You will get full credit for SCS!
If it says "Compare", YOU MUST USE comparison words like "is greater than" or "is less than" for Center & Spread.

SOCS
Shape: Skewed Left (Mean < Median), Skewed Right (Mean > Median), Fairly Symmetric (Mean ≈ Median)
Outliers: Only discuss them if they are obvious
Center: Mean or Median (whichever is easier)
Spread: Range, IQR, or Standard Deviation (whichever is easier)

Experiment or Observational Study?
A study is an experiment ONLY if they IMPOSE a treatment upon the experimental units. In an observational study we make no attempt to influence the results.

Does ___ CAUSE ___?
Association is NOT Causation! An observed association, no matter how strong, is not evidence of causation. Only a well-designed, controlled experiment can lead to conclusions of cause and effect.

How to Set Up a Simulation…
1. Assign digits to represent the outcomes/responses.
2. Scheme: Will you use Table B or RandInt? How many numbers will you read at a time? Skip numbers? Skip repeats?
3. Recording the data: What are you counting? When do you stop?
4. Repeat many trials.
5. Report the results of the simulation.

What is a Residual?
$\text{Residual} = y - \hat{y}$
A residual measures the difference between the actual (observed) y-value in a scatterplot and the y-value that is predicted by the LSRL for any given value of x.
In the calculator: L3 = L2 - Y1(L1)

Extrapolation
Using an LSRL to predict outside the domain of the explanatory variable. (Can lead to ridiculous conclusions if the current linear trend does not continue.)

Carrying out a Two-Sided Test from a CI
We do/do not have enough evidence to reject H0: μ = ? in favor of Ha: μ ≠ ? at the α = 0.05 level (α = 1 - Confidence Level) because ? falls outside/inside the 95% CI (or whatever confidence level was used).

Matched Pairs t-test: Phrasing Hints, H0 and Ha, Conclusion
Key Phrase: MEAN DIFFERENCE
H0: μDiff = 0
Ha: μDiff < 0, > 0, or ≠ 0
μDiff = the mean difference in __ for all __.
We do/do not have enough evidence at the 0.05 level to conclude that the mean difference in __ for all __ is ___.

Two Sample t-test: Phrasing Hints, H0 and Ha, Conclusion
Key Phrase: DIFFERENCE IN THE MEANS
H0: μ1 = μ2 OR μ1 - μ2 = 0
Ha: μ1 < μ2, μ1 > μ2, or μ1 ≠ μ2
μ1 - μ2 = the difference in the mean __ for all __.
We do/do not have enough evidence at the 0.05 level to conclude that the difference in the mean __ for all __ is ___.

Complementary Events
2 disjoint events whose union is the sample space: $A$ and $A^c$.
Ex: Boy/Girl, Rain/Not Rain, Draw at least one heart / Draw NO hearts

Binomial Distribution Conditions
1. Two Outcomes: Success / Failure
2. Fixed # of trials/observations (n)
3. Probability of success is the same for all trials/observations (p)
4. The "n" trials/observations are independent.
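As a rough check on the "1 Random Variable (Formulas)" and "2 Random Variables (Formulas)" cards, the sketch below computes the mean and standard deviation of a discrete random variable from its probability table, and combines two independent variables. The function names and the small example table are illustrative assumptions, not part of the original cards.

```python
import math

def rv_mean(values, probs):
    """Mean (expected value): multiply each value by its probability and add."""
    return sum(x * p for x, p in zip(values, probs))

def rv_stdev(values, probs):
    """Standard deviation: square root of the sum of (x - mean)^2 * probability."""
    mu = rv_mean(values, probs)
    return math.sqrt(sum((x - mu) ** 2 * p for x, p in zip(values, probs)))

def total_mean(mu_x, mu_y):
    """Mean of T = X + Y (holds whether or not X and Y are independent)."""
    return mu_x + mu_y

def total_stdev_independent(sd_x, sd_y):
    """Standard deviation of T = X + Y, valid only when X and Y are independent."""
    return math.sqrt(sd_x ** 2 + sd_y ** 2)

# Hypothetical probability table: number of heads in two fair coin flips.
values = [0, 1, 2]
probs = [0.25, 0.5, 0.25]

print(rv_mean(values, probs))          # mean of the table
print(rv_stdev(values, probs))         # standard deviation of the table
print(total_stdev_independent(3, 4))   # variances add, not standard deviations
```

Note that the standard deviations themselves are never added directly; the variances are added and the square root is taken at the end.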
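The "Binomial Distribution (Calculator Usage)" card can be mirrored in Python. The sketch below defines `binompdf` and `binomcdf` as hypothetical helper functions named after the calculator commands (they are not a standard library API), then evaluates the five phrasings on the card for an assumed example with n = 10 and p = 0.5.

```python
from math import comb, sqrt

def binompdf(n, p, k):
    """P(X = k) for a binomial random variable with n trials and success probability p."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

def binomcdf(n, p, k):
    """P(X <= k): add up the pdf from 0 through k."""
    return sum(binompdf(n, p, i) for i in range(k + 1))

# Assumed example values, chosen only for illustration.
n, p = 10, 0.5

exactly_5 = binompdf(n, p, 5)         # P(X = 5)
at_most_5 = binomcdf(n, p, 5)         # P(X <= 5)
less_than_5 = binomcdf(n, p, 4)       # P(X < 5)
at_least_5 = 1 - binomcdf(n, p, 4)    # P(X >= 5)
more_than_5 = 1 - binomcdf(n, p, 5)   # P(X > 5)

# Mean and standard deviation from the companion card.
mean = n * p
stdev = sqrt(n * p * (1 - p))

print(exactly_5, at_most_5, less_than_5, at_least_5, more_than_5)
print(mean, stdev)
```

The key habit the card teaches is shifting the cutoff by one: "less than 5" uses 4, and "at least 5" subtracts the probability of 4 or fewer.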
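The "Outlier Rule" card reduces to a few lines of arithmetic. This is a minimal sketch that takes Q1 and Q3 as inputs (so it does not commit to any particular quartile method, which varies between calculators and software); the function names and data are assumptions made for illustration.

```python
def outlier_bounds(q1, q3):
    """Return (lower bound, upper bound) using the 1.5 * IQR rule."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

def find_outliers(data, q1, q3):
    """Values falling more than 1.5 * IQR below Q1 or above Q3."""
    lower, upper = outlier_bounds(q1, q3)
    return [x for x in data if x < lower or x > upper]

# Hypothetical data set with quartiles supplied by the user.
data = [2, 12, 15, 18, 40]
q1, q3 = 10, 20

print(outlier_bounds(q1, q3))       # bounds at -5 and 35 for these quartiles
print(find_outliers(data, q1, q3))  # only 40 lies outside the bounds
```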
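To connect the LSRL cards ("a", "b", "ŷ", "s", r², and "What is a Residual?"), the sketch below fits a least-squares line by hand and computes each quantity. The data set is invented for illustration, and dividing by n - 2 when computing s is the usual regression convention, assumed here because the cards do not state the divisor.

```python
import math

def lsrl(xs, ys):
    """Return (a, b): intercept and slope of the least-squares regression line."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b

def residuals(xs, ys, a, b):
    """Residual = actual y minus predicted y (y - y-hat) for each point."""
    return [y - (a + b * x) for x, y in zip(xs, ys)]

def residual_stdev(res):
    """s: typical distance between actual and predicted y (assumes n - 2 divisor)."""
    return math.sqrt(sum(e ** 2 for e in res) / (len(res) - 2))

def r_squared(ys, res):
    """Fraction of the variation in y explained by the LSRL."""
    y_bar = sum(ys) / len(ys)
    sst = sum((y - y_bar) ** 2 for y in ys)
    sse = sum(e ** 2 for e in res)
    return 1 - sse / sst

# Hypothetical data.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

a, b = lsrl(xs, ys)
res = residuals(xs, ys, a, b)
print(a, b)                  # intercept and slope
print(residual_stdev(res))   # s
print(r_squared(ys, res))    # r squared
```

Plotting `res` against `xs` gives the residual plot that the "Is a Linear Model Appropriate?" card asks you to inspect.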
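The "P(At Least 1)" example and the "How to Set Up a Simulation" steps can be tried together. The sketch below computes the exact answer from the card, then estimates the same probability by simulation: the two-digit numbers 00 to 99 are assigned so that 00 and 01 mean "statmaster" (probability 0.02), one repetition reads 14 numbers, and the count recorded is the number of repetitions with at least one success. The function names, seed, and number of repetitions are assumptions chosen for illustration.

```python
import random

def p_at_least_one(p_single, n_trials):
    """Exact answer for independent trials: 1 - P(none)."""
    return 1 - (1 - p_single) ** n_trials

def simulate_at_least_one(n_trials, repetitions, seed=0):
    """Estimate P(at least one statmaster in n_trials tests) by simulation."""
    rng = random.Random(seed)  # seeded so the run is repeatable
    hits = 0
    for _ in range(repetitions):
        # Step 1 and 2: draw a number from 00 to 99 per test; 00 and 01 are successes.
        # Step 3: record whether this repetition had at least one success.
        if any(rng.randint(0, 99) <= 1 for _ in range(n_trials)):
            hits += 1
    # Step 5: report the proportion of repetitions with at least one success.
    return hits / repetitions

exact = p_at_least_one(0.02, 14)
estimate = simulate_at_least_one(14, repetitions=20000, seed=1)

print(round(exact, 3))  # 0.246, matching the card
print(estimate)         # should land near the exact value
```

The simulated value will differ slightly from run to run when the seed changes, which is the point of step 4: many repetitions are needed before the estimate settles near the exact probability.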
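The "Linear Transformations" and "Carrying out a Two-Sided Test from a CI" cards both describe rules that are easy to check mechanically. The sketch below uses Python's `statistics` module for the first and a small hypothetical helper, `two_sided_decision`, for the second; the data, the interval, and the choice to treat a null value exactly on an endpoint as "inside" are all assumptions for illustration.

```python
import statistics

def summarize(data):
    """Return (center, spread) as (mean, population standard deviation)."""
    return statistics.mean(data), statistics.pstdev(data)

def two_sided_decision(ci_low, ci_high, null_value):
    """Reject H0 at alpha = 1 - confidence level when the null value is outside the CI."""
    if ci_low <= null_value <= ci_high:
        return "fail to reject H0"
    return "reject H0"

# Linear transformations on a hypothetical data set.
data = [1, 2, 3, 4, 5]
shifted = [x + 10 for x in data]   # adding a = 10
scaled = [x * -2 for x in data]    # multiplying by a = -2

print(summarize(data))     # original center and spread
print(summarize(shifted))  # center moves by 10, spread unchanged
print(summarize(scaled))   # center multiplied by -2, spread multiplied by |-2|

# Two-sided test from a hypothetical 95% confidence interval (4.2, 6.8).
print(two_sided_decision(4.2, 6.8, 5.0))  # 5.0 is inside the interval
print(two_sided_decision(4.2, 6.8, 7.5))  # 7.5 is outside the interval
```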