Stats I Review Sheet
Design of Experiments (DOE)
1. Population (parameters: μ, σ, ρ) – all members of a group (measuring every member is a census)
2. Sample (statistic: x-bar, s, p-hat) – a subset of a group
3. Sampling Techniques
1. Simple random sample – SRS (all subjects equally likely to be chosen)
2. Stratified sample (similar to blocking – SRS within subgroups)
3. Cluster sample (SRS to groups – census within subgroup)
4. Systematic (every x-th person – from an algorithm)
5. Convenience sample (BIAS – call-ins, voluntary response)
4. Sampling Errors (introducing bias)
1. Interviewer error (intimidation, misrepresentation), Data entry errors (typos),
Question wording (inflammatory wording), Non-response (small percentage
returning survey), Under-representation (incomplete frame)
5. Observational Study -- data is collected, but no controls by researcher
6. Experiments – treatments are varied to isolate effects on variable of interest
1. Completely randomized design
2. Matched-pairs design (before and after; same test subject with both)
3. Randomized block design (Blocking removes variability)
1. Blocks must be homogeneous within the group
2. Examples: Male v Female, Age groups, by dog breed
4. Response variable, experimental factors (treatments)
5. Only way to establish causation
6. Confounding and Extraneous (or lurking) variable
7. Three main areas of experimental design (goals to minimize variability)
1. Randomization (random assignment: balancing out unknown variables)
2. Replication (within experiment: the number receiving each treatment – makes
the effects of the treatment easier to distinguish)
3. Control (Applying treatments to subjects; often uses control group – no treatment)
8. Blindness in an experiment
1. Single – the person receiving the treatment does not know which treatment
2. Double – neither the person receiving nor the person giving the treatment knows
which treatment is used
3. Placebo Effect (patient's favorable response to any treatment)
9. Simulation
1. Describe it in such a simple way that a middle school student can repeat the
simulation
1. Scheme: how to use the random numbers (remember 00)
2. Stopping Rule: how long do we run the simulation; what stops us
3. Duplicate Numbers: how do we handle duplicate numbers
2. Run simulation and make sure the middle schooler can follow how you used the
random number listing
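The scheme, stopping rule, and duplicate rule above can be sketched in Python. This is only an illustration: the class size of 30, the sample of 5, and the function name pick_students are assumptions, and Python's random generator stands in for a printed random number listing.

```python
import random

def pick_students(class_size=30, sample_size=5, seed=None):
    """Choose distinct students by reading two-digit random numbers (00-99)."""
    rng = random.Random(seed)
    chosen = []
    # Stopping rule: stop once sample_size distinct students are chosen.
    while len(chosen) < sample_size:
        number = rng.randint(0, 99)  # one two-digit entry from the "table"
        # Scheme: 01..class_size name a student; 00 and larger numbers are skipped.
        if number < 1 or number > class_size:
            continue
        # Duplicate numbers: skip any student already chosen.
        if number in chosen:
            continue
        chosen.append(number)
    return chosen

print(pick_students(seed=1))
```

Fixing the seed makes the run repeatable, which plays the same role as telling the reader which line of the random number listing to start on.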
Data Analysis
1. Quantitative – numerical data where addition and subtraction have meaning
2. Qualitative – categorical data or numerical data where addition and subtraction have no meaning
3. Discrete – variable that takes on countable values (for example, integers)
4. Continuous – variable that can take on any value within a specified range
5. Data graphs – Distributions (CUSS or SOCS)
1. Shape (uniform, bell, skewed – right or left)
2. Center (mean, median, mode)
3. Spread (range, variance, standard deviation, IQR)
4. Outliers or Unusual characteristics (bimodal)
6. Graphs
1. Histograms (bars connect)
2. Bar graph (gaps between bars)
3. Dot plots (can see Distribution shape)
4. Stem and Leaf plot (can see Distribution Shape)
5. Relative frequency (percentage; can be cumulative as well)
6. Cumulative plots (always increases!)
7. Measures of central tendency (center; mean not resistant to outliers)
1. Mean – uniform or bell shaped data (sample: x-bar; population: μ)
2. Median – skewed data
3. Mode – categorical data
8. Measures of Dispersion (first three are not resistant to outliers)
1. Range
2. Variance (sample: s²; population: σ²)
3. Standard Deviation (square root of variance)
4. IQR
9. Boxplots (distribution shape can also be seen from them)
1. IQR (Interquartile range) = Q3 – Q1
2. Upper Fence = Q3 + 1.5 IQR
3. Lower Fence = Q1 – 1.5 IQR
4. Outliers beyond upper or lower fence
10. Z-score = (x – μ) / σ (number of standard deviations away from the mean)
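The fence and z-score formulas above can be checked with a short Python sketch. The function names and the sample numbers are illustrative assumptions; Q1 and Q3 are passed in directly because calculators and software use slightly different quartile conventions.

```python
def fences(q1, q3):
    """Return (lower fence, upper fence) using the 1.5 * IQR rule."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

def outliers(data, q1, q3):
    """Return the values that fall beyond either fence."""
    lower, upper = fences(q1, q3)
    return [x for x in data if x < lower or x > upper]

def z_score(x, mean, std_dev):
    """Number of standard deviations that x lies from the mean."""
    return (x - mean) / std_dev

print(fences(10, 20))                          # (-5.0, 35.0)
print(outliers([2, 12, 15, 18, 40], 10, 20))   # [40]
print(z_score(85, 70, 10))                     # 1.5
```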
Normal Curve
1. Symmetric mound shape distribution – written as: Height ~ N(μ, σ)
2. Mean = Median = Mode
3. Standard Normal (Z in table A) has mean of 0 and standard deviation of 1
4. Empirical Rule (68-95-99.7) – data within 1, 2, and 3 standard deviations of the mean
5. Can estimate standard deviation from the 99.7 part (range ≈ 6 standard deviations, so σ ≈ range / 6)
6. Normality plot – can assume normal if it looks linear; not normal (for example, skewed) otherwise
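The 68-95-99.7 figures can be reproduced from the normal curve itself. A minimal sketch using Python's standard library; the helper name area_within is an assumption made for this example.

```python
from statistics import NormalDist

def area_within(k, mu=0.0, sigma=1.0):
    """Area under N(mu, sigma) within k standard deviations of the mean."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(mu + k * sigma) - dist.cdf(mu - k * sigma)

for k in (1, 2, 3):
    # Prints roughly 0.6827, 0.9545, 0.9973
    print(k, round(area_within(k), 4))
```

The result does not depend on μ or σ, which is why the rule applies to any normal distribution.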
Probability
1. Probability = outcomes of interest / total possible outcomes; 0 ≤ P(E) ≤ 1; ∑Pi = 1
P(E) = 0: impossible; P(E) = 1: certainty; 1 – P(E) = P(E^C)
2. Unusual event: P(E) < 0.05 (less than 5%)
3. Mutually exclusive (disjoint): P(E or F) = P(E) + P(F)
4. General Addition Rule: P(E or F) = P(E) + P(F) – P(E and F)
5. Independent Events – first event doesn't affect the second event
6. Multiplication Rule for Independent Events: P(E and F) = P(E) ∙ P(F)
7. At-least probabilities: P(at least one) = 1 – P(none)
8. Conditional Probability Rule: P(F | E) = P(E and F) / P(E) (can also be read from two-way tables)
9. General Multiplication Rule: P(E and F) = P(E) ∙ P(F | E)
10. Mean of Discrete RV = ∑E∙P(E) Variance of Discrete RV = ∑E²∙P(E) – μ²
11. Discrete RV Distributions
1. Uniform (like one die)
2. Binomial (x successes in n trials)
1. Fixed number of trials performed
2. Each trial independent
3. Two mutually exclusive outcomes (success or failure)
4. P(success) is the same for each trial
• Mean: np   Std Dev: √(np(1-p))
3. Geometric (number of trials until the first success)
• Mean: 1/p   Std Dev: √((1-p)/p²)
4. Hypergeometric (P(success) changes each trial)
5. Negative Binomial (number of trials until the r-th success)
6. Poisson (counts of events over time or distance)
12. Continuous RV Probability = area under the curve
0 ≤ P(E) ≤ 1
Total area under curve = 1
P(specific value) = 0
1 – P(E) = P(E^C)
13. Continuous RV Density Functions
1. Uniform (arrival times)
2. Normal probability density function (PDF)
1. P(X = specific value) = 0 (hence don’t use PDF on calculator)
2. Inflection points at μ ± σ
3. Student t-distribution (similar to normal, but more area in tails)
4. Chi-square distribution (positive only, skewed right) (not on exam)
14. Probability of an Event is area under curve
15. Linear combinations of random variables: Y = aX + b or Z = X ± Y
1. Addition shifts the mean, but does not change the variance (or standard deviation)
2. Multiplication affects the mean and variance
1. E(Y) = aE(X)   V(Y) = a²V(X)   σ(Y) = |a|σ(X)
3. Adding or subtracting two independent random variables – always add variances!
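The discrete RV formulas (item 10) and the linear combination rules (item 15) can be verified on a fair die. This is a sketch under stated assumptions: the helper names and the choice a = 2, b = 3 are illustrative only, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction

def mean(dist):
    """E(X) = sum of x * P(x) over a {value: probability} table."""
    return sum(x * p for x, p in dist.items())

def variance(dist):
    """V(X) = sum of x^2 * P(x) minus the mean squared."""
    mu = mean(dist)
    return sum(x**2 * p for x, p in dist.items()) - mu**2

def combine(d1, d2, sign):
    """Distribution of X + sign*Y for independent X and Y."""
    out = {}
    for x, px in d1.items():
        for y, py in d2.items():
            out[x + sign * y] = out.get(x + sign * y, 0) + px * py
    return out

die = {face: Fraction(1, 6) for face in range(1, 7)}
a, b = 2, 3
scaled = {a * x + b: p for x, p in die.items()}  # Y = aX + b

print(mean(die), variance(die))                  # 7/2 35/12
print(mean(scaled), variance(scaled))            # 10 35/3
print(variance(combine(die, die, +1)))           # 35/6
print(variance(combine(die, die, -1)))           # 35/6
```

The sum and the difference of two independent dice have the same variance, 35/12 + 35/12, which is the "always add variances" rule.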
Sampling Distributions
1. Sampling distribution of the means
1. Variability decreases as sample size increases:
standard error of the mean σ x-bar = σ/√n
2. Mean: μ x-bar = μ and σ x-bar = σ/√n
3. Central Limit Theorem: regardless of the shape of the population, the sampling
distribution of x-bar becomes approximately normal as sample size, n, increases
1. Rule of thumb: n ≥ 30
2. Sampling distribution of the sample proportion, p-hat
1. Requirements
1. SRS
2. 0.1N ≥ n (to keep from being Hypergeometric)
3. np ≥ 10 and n(1-p) ≥ 10 (to use normal to estimate binomial)
2. Mean = p and σ p-hat = √(p(1-p)/n)
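The standard error formulas and the p-hat requirements can be put together in a short Python sketch. The function names and the example numbers are assumptions for illustration; the SRS requirement cannot be checked in code and is taken as given.

```python
from math import sqrt

def se_mean(sigma, n):
    """Standard error of x-bar: sigma / sqrt(n)."""
    return sigma / sqrt(n)

def p_hat_distribution(p, n, population_size):
    """Return (mean, standard deviation) of p-hat after checking requirements."""
    if n > 0.1 * population_size:
        raise ValueError("sample is more than 10% of the population")
    if n * p < 10 or n * (1 - p) < 10:
        raise ValueError("need np >= 10 and n(1-p) >= 10 for the normal approximation")
    return p, sqrt(p * (1 - p) / n)

print(se_mean(15, 25))                      # 3.0
print(p_hat_distribution(0.5, 100, 10000))  # mean 0.5, std dev about 0.05
```

Quadrupling n halves the standard error, since n sits under a square root.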
Calculator Reminders
1. Enter data in L1 and enter probabilities in L2
2. Use 1-VarStats to get summary statistics (x-bar, sx (standard deviation), Q1,
median (Q2), Q3, range)
3. Use 1-VarStats L1, L2 to get E(X) and σ(X) (remember V(X) = σ²(X))
4. STATPLOT Reminders:
1. 3rd graph type is a histogram
2. 4th graph type is a box-plot with outliers
3. 6th graph type is a normality plot
4. Use ZOOM-9
5. 2nd VARS – Distributions to get to Normal, invNorm, Binomial and Geometric
6. Use Normalcdf to calculate normal probabilities (area under normal curve)
1. Use –E99 for negative infinity
2. Use E99 for positive infinity
3. Standard Error (σ/√n) can be typed directly into Normalcdf as the standard deviation
7. invNorm gives the Z or X value associated with a percentile
8. Remember binomial and geometric are discrete (watch < > and ≤ ≥)
1. P(X < 14) is same as P(X ≤ 13)
2. P(X ≥ 9) = 1 – P(X ≤ 8)
9. Use Binomialpdf for X = #
10. Use Binomialcdf for X ≤ #
(Use complement rule for X > #)
11. Syntax (from Catalog Help – make sure it's turned on!)
Normalcdf(LB, UB, μ, σ)
invNorm(area, μ, σ)
Binomialpdf(n, p, x)
Binomialcdf(n, p, x) [P(X ≤ x)]
Geometcdf(p, x)
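For checking calculator answers away from the calculator, the same functions can be approximated in Python. This is a sketch, not the calculator's own implementation: the lowercase function names are chosen here to mirror the syntax above, and Geometcdf is assumed to mean P(first success on or before trial x).

```python
from math import comb
from statistics import NormalDist

E99 = 1e99  # stand-in for infinity, as on the calculator

def normalcdf(lb, ub, mu=0.0, sigma=1.0):
    """Area under N(mu, sigma) between lb and ub."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(ub) - dist.cdf(lb)

def invnorm(area, mu=0.0, sigma=1.0):
    """Value with the given area (percentile) to its left."""
    return NormalDist(mu, sigma).inv_cdf(area)

def binompdf(n, p, x):
    """P(X = x) for a binomial with n trials and success probability p."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

def binomcdf(n, p, x):
    """P(X <= x) for a binomial."""
    return sum(binompdf(n, p, k) for k in range(x + 1))

def geometcdf(p, x):
    """P(first success occurs on or before trial x)."""
    return 1 - (1 - p)**x

print(normalcdf(-E99, 0))        # 0.5
print(binompdf(4, 0.5, 2))       # 0.375
print(1 - binomcdf(10, 0.5, 8))  # P(X >= 9), via the complement rule
```

The last line follows the discrete reminder above: P(X ≥ 9) is computed as 1 – P(X ≤ 8), not 1 – P(X ≤ 9).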