Stats I Review Sheet

Design of Experiments (DOE)
1. Population (parameters: μ, σ, p) – all of a group (a census measures every member)
2. Sample (statistics: x-bar, s, p-hat) – a subset of a group
3. Sampling Techniques
   1. Simple random sample – SRS (all subjects equally likely to be chosen)
   2. Stratified sample (similar to blocking – SRS within each subgroup)
   3. Cluster sample (SRS of groups – census within each chosen group)
   4. Systematic sample (every x-th person – from an algorithm)
   5. Convenience sample (BIAS – call-ins, voluntary response)
4. Sampling Errors (introducing bias)
   1. Interviewer error (intimidation, misrepresentation)
   2. Data entry errors (typos)
   3. Question wording (inflammatory wording)
   4. Non-response (small percentage returning survey)
   5. Under-representation (incomplete frame)
5. Observational Study – data are collected, but the researcher imposes no controls
6. Experiments – treatments are varied to isolate effects on the variable of interest
   1. Completely randomized design
   2. Matched-pairs design (before and after; same test subject receives both)
   3. Randomized block design (blocking removes variability)
      1. Blocks must be homogeneous within the group
      2. Examples: male vs. female, age groups, dog breed
   4. Response variable, experimental factors (treatments)
   5. Only way to establish causation
   6. Confounding and extraneous (or lurking) variables
7. Three main areas of experimental design (goal: minimize variability)
   1. Randomization (random assignment: balances out unknown variables)
   2. Replication (within the experiment: the number of subjects receiving each treatment – makes the effects of the treatment easier to distinguish)
   3. Control (applying treatments to subjects; often uses a control group – no treatment)
8. Blindness in an experiment
   1. Single – the person receiving the treatment does not know which treatment is used
   2. Double – neither the person receiving nor the person giving the treatment knows which treatment is used
   3. Placebo effect (patient's favorable response to any treatment)
9. Simulation
   1. Describe it simply enough that a middle school student can repeat the simulation
      1. Scheme: how to use the random numbers (remember 00)
      2. Stopping rule: how long do we run the simulation; what stops us
      3. Duplicate numbers: how do we handle duplicate numbers
   2. Run the simulation and make sure the middle schooler can follow how you used the random number listing

Data Analysis
1. Quantitative – numerical data where addition and subtraction have meaning
2. Qualitative – categorical data, or numerical data where addition/subtraction has no meaning
3. Discrete – variable that takes on integer values or a finite number of values
4. Continuous – variable that can take on any value within a specified range
5. Data graphs – Distributions (CUSS or SOCS)
   1. Shape (uniform, bell, skewed – right or left)
   2. Center (mean, median, mode)
   3. Spread (range, variance, standard deviation, IQR)
   4. Outliers or unusual characteristics (bimodal)
6. Graphs
   1. Histograms (bars touch)
   2. Bar graph (gaps between bars)
   3. Dot plots (can see distribution shape)
   4. Stem-and-leaf plot (can see distribution shape)
   5. Relative frequency (percentage; can be cumulative as well)
   6. Cumulative plots (never decrease!)
7. Measures of central tendency (center; mean is not resistant to outliers)
   1. Mean – uniform or bell-shaped data (sample: x-bar; population: μ)
   2. Median – skewed data
   3. Mode – categorical data
8. Measures of dispersion (first three are not resistant to outliers)
   1. Range
   2. Variance (sample: s²; population: σ²)
   3. Standard deviation (square root of variance)
   4. IQR
9. Boxplots (can also see distribution shape)
   1. IQR (interquartile range) = Q3 – Q1
   2. Upper fence = Q3 + 1.5 × IQR
   3. Lower fence = Q1 – 1.5 × IQR
   4. Outliers lie beyond the upper or lower fence
10. Z-score = (x – μ) / σ (number of standard deviations away from the mean)

Normal Curve
1. Symmetric, mound-shaped distribution – written as: Height ~ N(μ, σ)
2. Mean = Median = Mode
3. Standard Normal (Z in Table A) has a mean of 0 and a standard deviation of 1
4. Empirical Rule (68-95-99.7) – percent of data within 1, 2, and 3 standard deviations of the mean
5. Can estimate the standard deviation from the 99.7 part (range ≈ 6 standard deviations, so σ ≈ range/6)
6. Normality plot – can assume normal if it looks linear; skewed otherwise

Probability
1. Probability = outcomes of interest / total possible outcomes
   0 ≤ P(E) ≤ 1
   ∑P_i = 1
   P(E) = 0: impossible
   P(E) = 1: certainty
   1 – P(E) = P(E^C)
2. Unusual event: P(E) < 0.05 (less than 5%)
3. Mutually exclusive (disjoint): P(E or F) = P(E) + P(F)
4. General Addition Rule: P(E or F) = P(E) + P(F) – P(E and F)
5. Independent events – the first event doesn't affect the second event
6. Multiplication Rule for independent events: P(E and F) = P(E) ∙ P(F)
7. At-least probabilities: P(at least k) = 1 – P(fewer than k); for example, P(at least one) = 1 – P(none)
8. Conditional Probability Rule: P(E | F) = P(E and F) / P(F); can also be determined from tables
9. General Multiplication Rule: P(E and F) = P(E) ∙ P(F | E)
10. Mean of discrete RV = ∑x∙P(x)
    Variance of discrete RV = ∑x²∙P(x) – μ²
11. Discrete RV Distributions
    1. Uniform (like one die)
    2. Binomial (x successes in n trials)
       1. Fixed number of trials
       2. Each trial is independent
       3. Two mutually exclusive outcomes (success or failure)
       4. P(success) is the same for each trial
       • Mean: np   Std Dev: √(np(1-p))
    3. Geometric (number of trials until the first success)
       • Mean: 1/p   Std Dev: √((1-p)/p²)
    4. Hypergeometric (P(success) changes each trial)
    5. Negative Binomial (number of trials until the r-th success)
    6. Poisson (counts over time or distance)
12. Continuous RV: probability = area under the curve
    0 ≤ P(E) ≤ 1
    Total area under the curve = 1
    P(specific value) = 0
    1 – P(E) = P(E^C)
13. Continuous RV Density Functions
    1. Uniform (arrival times)
    2. Normal probability density function (PDF)
       1. P(X = specific value) = 0 (hence don't use PDF on calculator)
       2. Inflection points at μ ± σ
    3. Student t-distribution (similar to normal, but more area in the tails)
    4. Chi-square distribution (positive only, skewed right) (not on exam)
14. Probability of an event is the area under the curve
15. Linear combinations of random variables: Y = aX + b or Z = X ± Y
    1. Addition shifts the mean, but does not change the variance (or standard deviation)
    2. Multiplication affects the mean and the variance
       1. For Y = aX: E(Y) = aE(X), V(Y) = a²V(X), σ(Y) = |a|σ(X)
    3. Adding or subtracting two independent random variables – always add variances!

Sampling Distributions
1. Sampling distribution of the means
   1. Variability decreases as sample size increases: standard error of the mean σ_x-bar = σ/√n
   2. Mean: μ_x-bar = μ, and σ_x-bar = σ/√n
   3. Central Limit Theorem: regardless of the shape of the population, the sampling distribution of x-bar becomes approximately normal as the sample size, n, increases
      1. Rule of thumb: n ≥ 30
2. Sampling distribution of the sample proportion, p-hat
   1. Requirements
      1. SRS
      2. 0.1N ≥ n (to keep from being hypergeometric)
      3. np ≥ 10 and n(1-p) ≥ 10 (to use the normal to estimate the binomial)
   2. Mean = p and σ_p-hat = √(p(1-p)/n)

Calculator Reminders
1. Enter data in L1 and probabilities in L2
2. Use 1-VarStats to get summary statistics (x-bar, sx (sample standard deviation), Q1, median (Q2), Q3, range)
3. Use 1-VarStats L1, L2 to get E(X) and σ(X) (remember V(X) = σ²(X))
4. STATPLOT reminders:
   1. 3rd graph type is a histogram
   2. 4th graph type is a boxplot with outliers
   3. 6th graph type is a normality plot
   4. Use ZOOM-9
5. 2nd VARS – Distributions to get to Normal, invNorm, Binomial and Geometric
6. Use Normalcdf to calculate normal probabilities (area under the normal curve)
   1. Use –E99 for negative infinity
   2. Use E99 for positive infinity
   3. Standard error (σ/√n) can be typed directly into Normalcdf as the standard deviation
7. invNorm gives the Z or X value associated with a percentile
8. Remember binomial and geometric are discrete (watch < > and ≤ ≥)
   1. P(X < 14) is the same as P(X ≤ 13)
   2. P(X ≥ 9) = 1 – P(X ≤ 8)
9. Use Binomialpdf for X = #
10. Use Binomialcdf for X ≤ # (use the complement rule for X > #)
11. Syntax (from Catalog Help – make sure it's turned on!)
    Normalcdf(LB, UB, μ, σ)
    invNorm(area, μ, σ)
    Binomialpdf(n, p, x)
    Binomialcdf(n, p, x) [P(X ≤ x)]
    Geometcdf(p, x)
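
To check calculator answers by hand, the syntax above can be mirrored in a few lines of Python. This is only a sketch under stated assumptions: it uses the Python 3.8+ standard library rather than anything from the calculator, the function names simply echo the sheet's names in lowercase, and `inv_norm` and `discrete_mean_sd` are hypothetical helper names introduced here for illustration. The example numbers (μ = 100, σ = 15, n = 36, a fair die) are made up.

```python
from math import comb, sqrt
from statistics import NormalDist


def normalcdf(lb, ub, mu=0.0, sigma=1.0):
    """Area under the N(mu, sigma) curve between lb and ub."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(ub) - dist.cdf(lb)


def inv_norm(area, mu=0.0, sigma=1.0):
    """Value that has the given area (a proportion, not a percent) to its left."""
    return NormalDist(mu, sigma).inv_cdf(area)


def binomialpdf(n, p, x):
    """P(X = x): exactly x successes in n independent trials."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


def binomialcdf(n, p, x):
    """P(X <= x): add up the pdf from 0 through x."""
    return sum(binomialpdf(n, p, k) for k in range(x + 1))


def geometcdf(p, x):
    """P(first success happens on or before trial x)."""
    return 1 - (1 - p) ** x


def discrete_mean_sd(values, probs):
    """E(X) and sigma(X) from values (L1) and probabilities (L2)."""
    mean = sum(v * pr for v, pr in zip(values, probs))
    variance = sum(v**2 * pr for v, pr in zip(values, probs)) - mean**2
    return mean, sqrt(variance)


# Empirical rule: area within one standard deviation of the mean.
print(round(normalcdf(-1, 1), 4))  # 0.6827

# Sampling distribution of x-bar: pass sigma / sqrt(n) as the standard deviation.
# 1e99 plays the role of E99 (positive infinity).
print(normalcdf(102, 1e99, 100, 15 / sqrt(36)))

# Discrete watch-out: P(X >= 9) = 1 - P(X <= 8).
print(1 - binomialcdf(10, 0.5, 8))

# Fair die entered as L1 = 1..6, L2 = 1/6 each.
print(discrete_mean_sd([1, 2, 3, 4, 5, 6], [1 / 6] * 6))
```

The argument order follows the sheet (n, p, x for the binomial functions; p, x for the geometric), so an expression typed into the calculator can be compared directly against the matching call here.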