AMS7: MIDTERM REVIEW
Chapters 1-6
Tuesday May 5, 2015
Introduction
• Important Definitions:
- Data
- Statistic
- A Population
- A census
- A sample
Types of Data
• Parameter (Describing a characteristic of the Population)
• Statistic (Describing a characteristic of the Sample)
-QUALITATIVE DATA (Categorical or Attribute Data)
-QUANTITATIVE DATA:
- Discrete
- Continuous
Levels of Measurement:
- Nominal
- Ordinal
- Interval
- Ratio
Design of Experiments
• An observational study (observe and measure characteristics, but don't attempt to modify the subjects)
• An experiment (treatment group vs. control group)
Types of Observational Studies:
• Cross-sectional
• Retrospective (or case-control)
• Prospective (or longitudinal or cohort)
Problems
• Confounding (the effects of two or more variables cannot be distinguished from each other)
How to address this problem:
• Blinding (placebo effect, single-blind, double-blind)
• Blocking
• Randomization:
• Completely randomized design
• Randomized block design
Sampling strategies
• Random sample
• Simple random sample (assumed throughout the book)
• Systematic sampling
• Convenience sampling
• Stratified sampling
• Cluster sampling
• Sampling error: difference between a sample result and the true population result
• Non-sampling error: sample data incorrectly collected, recorded, or analyzed
Important characteristics of data
• Center
• Variation
• Distribution
• Outliers
• Time
Frequency distribution
• Lists data values (individually or by groups of intervals) along with their counts (frequencies)
• Other forms:
• Relative frequency distribution (divide each class frequency by the
total of all frequencies)
• Cumulative frequency distribution (cumulative totals)
• Histogram: Graphical representation of the frequency
distribution
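A minimal Python sketch of a frequency distribution for individual values, together with its relative and cumulative forms. The function name `frequency_table` and the data are hypothetical; grouping values into class intervals would require binning them first.

```python
from collections import Counter
from itertools import accumulate


def frequency_table(values):
    """Return rows of (value, frequency, relative frequency, cumulative frequency)."""
    counts = sorted(Counter(values).items())          # frequency of each distinct value
    total = sum(freq for _, freq in counts)           # total of all frequencies
    running = accumulate(freq for _, freq in counts)  # cumulative totals
    return [(value, freq, freq / total, cum)
            for (value, freq), cum in zip(counts, running)]


# Hypothetical data: number of pets owned by 10 students
pets = [0, 1, 1, 2, 0, 1, 3, 2, 1, 0]
for row in frequency_table(pets):
    print(row)
```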
Other graphs
• Relative frequency histogram
• Frequency polygon
• Dotplots
• Stem-and-leaf plots
• Pareto chart
• Pie Charts
• Scatter Diagrams
• Time series graphs
Examples: Histogram and Scatter plot
Measures of center
• Sample Mean: $\bar{x} = \frac{\sum x}{n}$
• Median: Middle value of the sorted data
• Mode: Most frequent value
• Bimodal
• Multimodal
• No mode
• Midrange = $\frac{\text{maximum value} + \text{minimum value}}{2}$
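A short Python sketch of the four measures of center on a hypothetical sample. The helper `modes` is illustrative: it returns an empty list when no value repeats ("no mode") and several values for bimodal or multimodal data.

```python
from collections import Counter
import statistics


def modes(values):
    """Most frequent value(s); an empty list means no value repeats ("no mode")."""
    counts = Counter(values)
    top = max(counts.values())
    if top == 1:
        return []
    return sorted(v for v, c in counts.items() if c == top)


data = [2, 3, 3, 5, 7, 10]  # hypothetical sample

mean = sum(data) / len(data)            # x̄ = Σx / n
median = statistics.median(data)        # middle value; mean of the two middle values when n is even
midrange = (max(data) + min(data)) / 2  # (max + min) / 2
print(mean, median, modes(data), midrange)
```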
Skewed distributions
Measures of Variation
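• Range = (maximum value) − (minimum value)
• Sample standard deviation (measures variation of the values about the mean):
$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$
• Population standard deviation:
$\sigma = \sqrt{\frac{\sum (x - \mu)^2}{N}}$
• Sample variance:
$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$
• Population variance:
$\sigma^2 = \frac{\sum (x - \mu)^2}{N}$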
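The formulas above differ only in the divisor (n − 1 for a sample, N for a population). A small Python check on hypothetical data, comparing the hand formulas with the standard library's `statistics.stdev` (sample) and `statistics.pstdev` (population):

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical values

n = len(data)
mean = sum(data) / n
ss = sum((x - mean) ** 2 for x in data)  # Σ(x − mean)²

sample_var = ss / (n - 1)        # s² divides by n − 1
population_var = ss / n          # σ² divides by N (treating the data as the whole population)
sample_sd = math.sqrt(sample_var)
population_sd = math.sqrt(population_var)
data_range = max(data) - min(data)

# The standard library agrees with the hand formulas
assert math.isclose(sample_sd, statistics.stdev(data))
assert math.isclose(population_sd, statistics.pstdev(data))
print(data_range, sample_var, sample_sd, population_var, population_sd)
```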
Measures of Variation (Cont.)
• Sample coefficient of variation: $CV = \frac{s}{\bar{x}} \cdot 100\%$
• Population coefficient of variation: $CV = \frac{\sigma}{\mu} \cdot 100\%$
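Because the CV is unit-free, it can compare variation between data measured in different units. A sketch with hypothetical heights (cm) and weights (kg); the function name `coefficient_of_variation` is illustrative.

```python
import statistics


def coefficient_of_variation(values, sample=True):
    """CV in percent: (s / x̄)·100% for a sample, (σ / μ)·100% for a population."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values) if sample else statistics.pstdev(values)
    return sd / mean * 100


heights_cm = [160, 165, 170, 175, 180]  # hypothetical
weights_kg = [55, 62, 70, 78, 85]       # hypothetical
print(coefficient_of_variation(heights_cm), coefficient_of_variation(weights_kg))
```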
Range Rule of Thumb
$s \approx \frac{\text{range}}{4}$
• Minimum usual value: (mean)-2 x (standard deviation)
• Maximum usual value: (mean)+2 x (standard deviation)
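A sketch of the range rule of thumb and the usual-value limits, using hypothetical pulse rates and an assumed mean of 70. The rule gives only a rough estimate of s.

```python
def range_rule_sd(values):
    """Range rule of thumb: s ≈ range / 4 (a rough estimate only)."""
    return (max(values) - min(values)) / 4


def usual_limits(mean, sd):
    """Minimum and maximum usual values: mean - 2·sd and mean + 2·sd."""
    return mean - 2 * sd, mean + 2 * sd


# Hypothetical pulse rates (beats per minute)
pulses = [50, 62, 68, 70, 74, 78, 90]
s_estimate = range_rule_sd(pulses)   # (90 - 50) / 4 = 10.0
print(usual_limits(70, s_estimate))  # (50.0, 90.0)
```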
Rules of data with Bell-Shaped distribution
• About 68% of all values fall within 1 standard deviation of
the mean
• About 95% of all values fall within 2 standard deviations of
the mean
• About 99.7% of all values fall within 3 standard deviations
of the mean
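The 68-95-99.7 percentages are rounded areas under a normal curve. A sketch using the standard library's `statistics.NormalDist` to compute the exact shares (roughly 0.6827, 0.9545, and 0.9973):

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)


def share_within(k):
    """Exact share of a normal distribution lying within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


for k in (1, 2, 3):
    print(k, round(share_within(k), 4))
```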
Z Scores
• Sample: $z = \frac{x - \bar{x}}{s}$
• Population: $z = \frac{x - \mu}{\sigma}$
Ordinary values: −2 ≤ z score ≤ 2
Unusual values: z score < −2 or z score > 2
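A minimal z-score sketch; the height figures are hypothetical. Pass x̄ and s for a sample, or μ and σ for a population.

```python
def z_score(x, mean, sd):
    """z = (x - mean) / sd; use x̄ and s for a sample, μ and σ for a population."""
    return (x - mean) / sd


def is_unusual(z):
    """Unusual if z < -2 or z > 2; ordinary if -2 <= z <= 2."""
    return z < -2 or z > 2


# Hypothetical: a 190 cm adult in a population with μ = 176 cm and σ = 7 cm
z = z_score(190, 176, 7)
print(z, is_unusual(z))
```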
Quartiles and Percentiles
• Quartiles: Separate a data set into four parts
• Q1 (First): Separates bottom 25% of the sorted values from the top
75%
• Q2 (Second): Same as the median
• Q3 (Third): Separates bottom 75% of the sorted values from the top
25%
• Percentiles: Separate the data into 100 parts (P1, P2, …,
P99)
Percentile of value x = $\frac{\text{number of values less than } x}{\text{total number of values}} \cdot 100$
• Interquartile range (IQR) = Q3 − Q1
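A sketch of the percentile formula and of one common textbook way to find the kth percentile P_k (with Q1 = P25, Q3 = P75). The locator convention L = (k/100)·n is an assumption here: software such as NumPy interpolates by default and can give different quartiles. Function names and data are hypothetical.

```python
def percentile_of(x, values):
    """Percentile of x = (number of values less than x) / (total number of values) · 100.

    Python's round() sends exact halves to the even integer.
    """
    less = sum(1 for v in values if v < x)
    return round(less / len(values) * 100)


def kth_percentile(values, k):
    """P_k using the locator L = (k / 100) · n, for k = 1, ..., 99 (one textbook convention)."""
    data = sorted(values)
    n = len(data)
    if (k * n) % 100 == 0:
        L = k * n // 100
        # L is a whole number: average the Lth and (L + 1)th sorted values (1-based)
        return (data[L - 1] + data[L]) / 2
    L = (k * n + 99) // 100  # L not whole: round it up to the next integer
    return data[L - 1]       # and take the Lth sorted value


scores = [1, 2, 3, 4, 5, 6, 7, 8]  # hypothetical data
q1 = kth_percentile(scores, 25)
q3 = kth_percentile(scores, 75)
print(q1, q3, q3 - q1)  # Q1, Q3, IQR
```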
Boxplots
Probability
• Definitions:
• An event
• A simple event
• The Sample Space
• Notation
• P: Probability
• A,B and C: specific events
• P(A): Probability of event A occurring
Definitions of Probability
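• Relative frequency approximation: $P(A) = \frac{\text{number of times A occurred}}{\text{number of times the procedure was repeated}}$
• Classical approach (equally likely outcomes): $P(A) = \frac{s}{n} = \frac{\text{number of ways A can occur}}{\text{number of different simple events}}$
• Subjective probability
• LAW OF LARGE NUMBERS: As a procedure is repeated many times, the relative frequency probability of an event tends to approach the actual probability.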
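The law of large numbers can be seen by simulation. A sketch with Python's `random` module, seeded for reproducibility; the function name and the true probability 0.5 are hypothetical choices.

```python
import random


def relative_frequency(trials, p=0.5, seed=1):
    """Relative frequency of 'success' in simulated trials with true probability p."""
    rng = random.Random(seed)  # seeded so the run is reproducible
    successes = sum(rng.random() < p for _ in range(trials))
    return successes / trials


# The relative frequency should settle near p as the number of trials grows
for n in (10, 100, 1_000, 100_000):
    print(n, relative_frequency(n))
```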
Properties of probability
• Probability of an impossible event is 0
• Probability of an event that is certain is 1
• For any event A, 0≤P(A)≤1
• P(complement of event A) = $P(\bar{A}) = 1 - P(A)$
• Addition Rule:
P(A or B)=P(in a single trial, event A occurs or event B
occurs or they both occur)= P(A)+P(B)-P(A and B)
Or
P(A∪B)= P(A)+P(B)-P(A∩B)
Events A and B are disjoint (mutually exclusive) if they cannot both occur at the same time; then P(A∩B) = 0 and P(A∪B) = P(A) + P(B)
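A sketch that checks the addition rule by enumerating a standard 52-card deck with exact fractions: P(heart or face card) = 13/52 + 12/52 − 3/52 = 22/52. The helper names are hypothetical.

```python
from fractions import Fraction
from itertools import product

ranks = ["A"] + [str(r) for r in range(2, 11)] + ["J", "Q", "K"]
suits = ["hearts", "diamonds", "clubs", "spades"]
deck = list(product(ranks, suits))  # 52 equally likely cards


def prob(event):
    """Classical probability: favorable cards / total cards."""
    return Fraction(sum(1 for card in deck if event(card)), len(deck))


def is_heart(card):
    return card[1] == "hearts"


def is_face(card):
    return card[0] in {"J", "Q", "K"}


p_direct = prob(lambda c: is_heart(c) or is_face(c))  # count "A or B" directly
p_rule = prob(is_heart) + prob(is_face) - prob(lambda c: is_heart(c) and is_face(c))
print(p_direct, p_rule)
```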
Multiplication Rule
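• P(A and B) = P(event A occurs in a first trial and event B occurs in a second trial)
• $P(A \text{ and } B) = P(A) \cdot P(B|A)$
• Independent events: P(B|A) = P(B)
• If A and B are independent: $P(A \text{ and } B) = P(A) \cdot P(B)$
• Conditional probability: $P(B|A) = \frac{P(A \text{ and } B)}{P(A)}$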
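A sketch of the multiplication rule for dependent events: drawing two aces without replacement. Enumerating all ordered pairs of distinct cards confirms P(A and B) = P(A)·P(B|A) = (4/52)(3/51) = 1/221, and dividing by P(A) recovers P(B|A).

```python
from fractions import Fraction
from itertools import permutations

deck = ["A"] * 4 + ["other"] * 48          # only whether a card is an ace matters here
pairs = list(permutations(range(52), 2))   # ordered draws without replacement
both_aces = sum(1 for i, j in pairs if deck[i] == "A" and deck[j] == "A")
p_enum = Fraction(both_aces, len(pairs))   # P(A and B) by counting

p_first = Fraction(4, 52)                  # P(A): first card is an ace
p_second_given_first = Fraction(3, 51)     # P(B|A): one ace already removed
print(p_enum, p_first * p_second_given_first, p_enum / p_first)
```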
Bayes Theorem
• $P(A|B) = \frac{P(A) \cdot P(B|A)}{[P(A) \cdot P(B|A)] + [P(\bar{A}) \cdot P(B|\bar{A})]}$
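A direct translation of Bayes' theorem into Python. The screening-test numbers (1% prevalence, 95% of true cases test positive, 5% false-positive rate) are hypothetical; the posterior comes out near 0.16, far below the test's 95% sensitivity.

```python
def bayes(prior, p_b_given_a, p_b_given_not_a):
    """P(A|B) = P(A)·P(B|A) / ([P(A)·P(B|A)] + [P(not A)·P(B|not A)])."""
    numerator = prior * p_b_given_a
    return numerator / (numerator + (1 - prior) * p_b_given_not_a)


# Hypothetical screening test: A = has the condition, B = tests positive
posterior = bayes(prior=0.01, p_b_given_a=0.95, p_b_given_not_a=0.05)
print(round(posterior, 4))
```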
Probability distributions
• Definitions:
• Random variable (x): Numerical value assigned to each outcome of a procedure. Example: number of mountain lions seen on the UCSC campus last year
• Probability distribution (P(x)): Gives the probability of each value of the random variable.
• Types of random variables:
• Discrete
• Continuous
Requirements of a Probability distribution
• $\sum P(x) = 1$ (discrete case)
• $0 \le P(x) \le 1$
• Expected value of a discrete random variable:
$E = \sum [x \cdot P(x)]$
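A sketch that checks the two requirements of a discrete probability distribution and then computes E = Σ[x·P(x)]. The dictionary representation and the function name are illustrative choices.

```python
import math


def expected_value(dist):
    """dist maps each value x to P(x); checks the requirements, then returns Σ[x·P(x)]."""
    if not all(0 <= p <= 1 for p in dist.values()):
        raise ValueError("each P(x) must satisfy 0 <= P(x) <= 1")
    if not math.isclose(sum(dist.values()), 1.0):
        raise ValueError("the probabilities must sum to 1")
    return sum(x * p for x, p in dist.items())


# Number of heads in two tosses of a fair coin
two_coins = {0: 0.25, 1: 0.5, 2: 0.25}
print(expected_value(two_coins))
```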
Discrete Distributions:
• Binomial
• Poisson
Binomial distribution
• Requirements:
• Fixed number of trials
• Trials are independent
• Each trial can be a success or a failure
• The probability of success remains the same in all trials
• Random variable: x = number of successes among n trials
• $P(x) = \frac{n!}{(n-x)!\,x!} \cdot p^x \cdot q^{n-x}$ (you can also use the binomial table)
• n= number of trials
• p=probability of success in one trial
• q=probability of failure in one trial (q=1-p)
Mean ,Variance and Standard deviation of the
Binomial distribution
• Mean: $\mu = np$
• Variance: $\sigma^2 = npq$
• Standard deviation: $\sigma = \sqrt{npq}$
• Maximum usual value: $\mu + 2\sigma$
• Minimum usual value: $\mu - 2\sigma$
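A sketch of the binomial formula and its summary measures. `math.comb(n, x)` computes n!/((n − x)!x!) (Python 3.8+); the coin-toss example is hypothetical.

```python
import math


def binomial_pmf(x, n, p):
    """P(x) = n! / ((n - x)! x!) · p^x · q^(n - x)."""
    q = 1 - p
    return math.comb(n, x) * p**x * q**(n - x)


def binomial_summary(n, p):
    """Mean np, variance npq, standard deviation √(npq), and usual-value limits μ ∓ 2σ."""
    q = 1 - p
    mean = n * p
    variance = n * p * q
    sd = math.sqrt(variance)
    return mean, variance, sd, (mean - 2 * sd, mean + 2 * sd)


# Hypothetical example: number of heads in n = 10 tosses of a fair coin
print(binomial_pmf(5, 10, 0.5))
print(binomial_summary(10, 0.5))
```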
Poisson distribution
• Requirements:
• Random variable x is the number of occurrences of an event over
some interval
• The occurrences must be random
• The occurrences must be independent
• $P(x) = \frac{\mu^x \cdot e^{-\mu}}{x!}$, where e ≈ 2.71828
The Poisson distribution depends only on μ (the mean of the process).
Mean, Variance and Standard deviation of the
Poisson distribution
• Mean: μ
• Variance: $\sigma^2 = \mu$
• Standard deviation: $\sigma = \sqrt{\mu}$
• Maximum usual value: $\mu + 2\sigma$
• Minimum usual value: $\mu - 2\sigma$
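A sketch of the Poisson formula; the rate μ = 2 occurrences per interval is hypothetical. Summing x·P(x) over enough values of x recovers the mean μ, and σ = √μ gives the usual-value limits.

```python
import math


def poisson_pmf(x, mu):
    """P(x) = μ^x · e^(-μ) / x!"""
    return mu**x * math.exp(-mu) / math.factorial(x)


# Hypothetical: an event occurs on average μ = 2 times per interval
mu = 2
probs = [poisson_pmf(x, mu) for x in range(6)]
sd = math.sqrt(mu)                  # σ = √μ
usual = (mu - 2 * sd, mu + 2 * sd)  # usual-value limits μ ∓ 2σ
print(probs, usual)
```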
Continuous distributions
• Uniform distribution
• Normal distribution
• Density curve: Graph of a continuous distribution
• Properties:
• The total area under the curve is equal to 1
• Every point on the curve has a height greater than or equal to zero
Uniform and Normal distributions
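For a continuous distribution, a probability is an area under the density curve. A sketch: for a uniform distribution the area is a rectangle, and for a normal distribution `statistics.NormalDist.cdf` gives the area to the left of a value. The function name and interval are hypothetical.

```python
from statistics import NormalDist


def uniform_prob(c, d, a, b):
    """P(c < x < d) for x uniform on [a, b]: rectangle area = width × height 1/(b - a)."""
    lo, hi = max(c, a), min(d, b)  # keep only the part of (c, d) inside [a, b]
    return max(hi - lo, 0) / (b - a)


# Uniform on [0, 10]: P(2 < x < 5) is a rectangle of width 3 and height 1/10
print(uniform_prob(2, 5, 0, 10))

# Standard normal: area under the density curve between z = -1.96 and z = 1.96
z = NormalDist(mu=0, sigma=1)
print(z.cdf(1.96) - z.cdf(-1.96))
```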
Sampling distributions
• Sampling variability: the variation of the value of a statistic from sample to sample
• Sampling distribution of the sample mean
• Sampling distribution of the sample proportion
CENTRAL LIMIT THEOREM:
• The random variable x has a distribution (normal or not) with mean μ and standard deviation σ
• The distribution of the sample means approaches a normal distribution as the sample size increases.
Mean and standard deviation of the
sample mean
• Mean: $\mu_{\bar{x}} = \mu$
• Standard deviation: $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$
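A simulation sketch of the Central Limit Theorem, assuming a uniform [0, 1) population (mean 0.5, standard deviation √(1/12)), which is not normal. The spread of the simulated sample means should track σ/√n; the number of repetitions and the seed are arbitrary choices.

```python
import random
import statistics


def sample_means(n, reps=2000, seed=0):
    """Means of `reps` samples of size n drawn from a uniform [0, 1) population."""
    rng = random.Random(seed)
    return [statistics.fmean([rng.random() for _ in range(n)]) for _ in range(reps)]


mu = 0.5                 # mean of uniform [0, 1)
sigma = (1 / 12) ** 0.5  # standard deviation of uniform [0, 1)

for n in (1, 4, 25):
    means = sample_means(n)
    # Observed mean and spread of the sample means versus μ and σ / √n
    print(n, round(statistics.fmean(means), 3),
          round(statistics.stdev(means), 3), round(sigma / n ** 0.5, 3))
```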
Normal approximation to the Binomial
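• If np ≥ 5 and nq ≥ 5, a binomial random variable x can be approximated by a normal distribution with mean and standard deviation:
• Mean: $\mu = np$
• Standard deviation: $\sigma = \sqrt{npq}$
• Use a continuity correction (represent a single value x by the interval from x − 0.5 to x + 0.5)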
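A sketch comparing an exact binomial probability with its normal approximation, for hypothetical values n = 100 and p = 0.5. The continuity correction replaces P(x ≤ 55) with the normal area to the left of 55.5, which lands closer to the exact value than using 55.

```python
import math
from statistics import NormalDist

n, p = 100, 0.5
q = 1 - p
if n * p < 5 or n * q < 5:
    raise ValueError("normal approximation not appropriate: need np >= 5 and nq >= 5")

# Exact binomial probability P(x <= 55)
exact = sum(math.comb(n, k) * p**k * q**(n - k) for k in range(56))

# Normal with mean np and sd √(npq); continuity correction uses 55.5 instead of 55
normal = NormalDist(mu=n * p, sigma=math.sqrt(n * p * q))
approx = normal.cdf(55.5)
no_correction = normal.cdf(55)
print(round(exact, 4), round(approx, 4), round(no_correction, 4))
```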
Confidence Interval for the Population
Proportion (p)
• p=population proportion
• $\hat{p} = \frac{x}{n}$ = sample proportion of successes
• $\hat{q} = 1 - \hat{p}$ = sample proportion of failures
Procedure to build a CI of confidence level (1 − α) × 100% for p
1) Check the normal approximation to the binomial distribution (np ≥ 5 and nq ≥ 5)
2) Get the critical value $z_{\alpha/2}$
3) Evaluate the margin of error: $E = z_{\alpha/2} \cdot \sqrt{\frac{\hat{p}\hat{q}}{n}}$
4) Confidence interval:
• $\hat{p} - E < p < \hat{p} + E$
• $\hat{p} \pm E$
• $(\hat{p} - E, \hat{p} + E)$
5) Interpret results
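The five steps above can be sketched in Python, with `NormalDist().inv_cdf` supplying z_{α/2}. Because p is unknown, this sketch checks the requirement with the observed counts (x successes and n − x failures, each at least 5), which is an assumption about how the check is applied. The survey numbers are hypothetical.

```python
import math
from statistics import NormalDist


def proportion_ci(x, n, confidence=0.95):
    """CI for p: p̂ ± z_{α/2} · √(p̂q̂ / n)."""
    if x < 5 or n - x < 5:  # step 1: normal approximation check (assumed form)
        raise ValueError("need at least 5 successes and 5 failures")
    p_hat = x / n
    q_hat = 1 - p_hat
    alpha = 1 - confidence
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # step 2: critical value z_{α/2}
    E = z_crit * math.sqrt(p_hat * q_hat / n)      # step 3: margin of error
    return p_hat - E, p_hat + E                    # step 4: (p̂ - E, p̂ + E)


# Hypothetical survey: 540 "yes" answers out of 1000
low, high = proportion_ci(540, 1000)
print(round(low, 3), round(high, 3))
```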
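Sample size for estimating proportion p
• $\hat{p}$ is given: $n = \frac{[z_{\alpha/2}]^2 \cdot \hat{p}\hat{q}}{E^2}$
• $\hat{p}$ is not given ($\hat{p}$ is assumed to be 0.5): $n = \frac{[z_{\alpha/2}]^2 \cdot 0.25}{E^2}$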
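A sketch of both sample-size formulas for a proportion. Following the usual convention, a non-integer result is rounded up so the margin of error is not exceeded; the function name and inputs are hypothetical.

```python
import math
from statistics import NormalDist


def sample_size_proportion(E, confidence=0.95, p_hat=None):
    """n = [z_{α/2}]² · p̂q̂ / E², with p̂q̂ = 0.25 when no estimate p̂ is given; rounded up."""
    z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    pq = 0.25 if p_hat is None else p_hat * (1 - p_hat)
    return math.ceil(z_crit**2 * pq / E**2)


# Hypothetical: margin of error 3 percentage points at 95% confidence
print(sample_size_proportion(0.03))              # no prior estimate of p
print(sample_size_proportion(0.03, p_hat=0.1))   # prior estimate p̂ = 0.1
```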
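Finding $\hat{p}$ (point estimate) and E from the Confidence Interval
Point estimate:
• $\hat{p} = \frac{(\text{upper confidence limit}) + (\text{lower confidence limit})}{2}$
Margin of Error:
• $E = \frac{(\text{upper confidence limit}) - (\text{lower confidence limit})}{2}$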
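Since the interval is symmetric about the point estimate, both quantities come from the two limits. A sketch with a hypothetical reported interval; the same arithmetic applies to a confidence interval for μ.

```python
def estimate_and_margin(lower, upper):
    """Point estimate and margin of error recovered from the confidence limits."""
    point_estimate = (upper + lower) / 2
    margin_of_error = (upper - lower) / 2
    return point_estimate, margin_of_error


# Hypothetical 95% CI for p reported as (0.405, 0.455)
p_hat, E = estimate_and_margin(0.405, 0.455)
print(p_hat, E)
```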
Confidence Interval for the Population Mean (σ known)
• Check Requirements:
• Sample is a simple random sample
• Population standard deviation is known
• Population is normally distributed or n>30
• Procedure
1) Check normality requirements
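2) Get the critical value $z_{\alpha/2}$
3) Evaluate the margin of error: $E = z_{\alpha/2} \cdot \frac{\sigma}{\sqrt{n}}$
4) Confidence interval:
• $\bar{x} - E < \mu < \bar{x} + E$
• $\bar{x} \pm E$
• $(\bar{x} - E, \bar{x} + E)$
5) Interpret results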
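A sketch of the σ-known procedure. It assumes the requirements have already been checked (simple random sample, normal population or n > 30); the body-temperature figures and the function name are hypothetical.

```python
import math
from statistics import NormalDist


def mean_ci_sigma_known(x_bar, sigma, n, confidence=0.95):
    """CI for μ when σ is known: x̄ ± z_{α/2} · σ / √n."""
    alpha = 1 - confidence
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # critical value z_{α/2}
    E = z_crit * sigma / math.sqrt(n)             # margin of error
    return x_bar - E, x_bar + E


# Hypothetical sample: n = 106, x̄ = 98.2 °F, σ assumed known to be 0.62 °F
low, high = mean_ci_sigma_known(98.2, 0.62, 106)
print(round(low, 3), round(high, 3))
```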
Sample size for estimating Mean
$n = \left[\frac{z_{\alpha/2} \cdot \sigma}{E}\right]^2$
Values of σ, $z_{\alpha/2}$, and E are given.
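A sketch of the sample-size formula for a mean, rounding up to the next whole number by the usual convention. The values σ = 15 and E = 3 are hypothetical.

```python
import math
from statistics import NormalDist


def sample_size_mean(E, sigma, confidence=0.95):
    """n = [z_{α/2} · σ / E]², rounded up to the next whole number."""
    z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil((z_crit * sigma / E) ** 2)


# Hypothetical: σ = 15, desired margin of error E = 3
print(sample_size_mean(3, 15))                   # 95% confidence
print(sample_size_mean(3, 15, confidence=0.99))  # 99% confidence needs a larger n
```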
Confidence Interval for the Population Mean (σ unknown)
• In this case we use the Student t distribution with n-1 degrees of
freedom to find the critical value
• Check Requirements:
• Sample is a simple random sample
• Population standard deviation is estimated by s (sample standard dev.)
• Population is normally distributed or n>30
• Procedure
1) Check normality requirements
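2) Get the critical value $t_{\alpha/2}$ with n − 1 degrees of freedom
3) Evaluate the margin of error: $E = t_{\alpha/2} \cdot \frac{s}{\sqrt{n}}$
4) Confidence interval:
• $\bar{x} - E < \mu < \bar{x} + E$
• $\bar{x} \pm E$
• $(\bar{x} - E, \bar{x} + E)$
5) Interpret results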
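A sketch of the σ-unknown procedure. Python's standard library has no t quantile function, so this sketch takes t_{α/2} as an input read from a t table (2.262 for 95% confidence with 9 degrees of freedom); if SciPy is available, `scipy.stats.t.ppf(1 - alpha/2, n - 1)` returns the same critical value. The sample is hypothetical.

```python
import math
import statistics


def mean_ci_t(data, t_crit):
    """CI for μ with σ unknown: x̄ ± t_{α/2} · s / √n.

    t_crit must correspond to n - 1 degrees of freedom (e.g. from a t table).
    """
    n = len(data)
    x_bar = statistics.fmean(data)
    s = statistics.stdev(data)  # sample standard deviation (divides by n - 1)
    E = t_crit * s / math.sqrt(n)
    return x_bar - E, x_bar + E


# Hypothetical sample of n = 10; t_{0.025} with 9 degrees of freedom is 2.262
sample = [12, 15, 11, 14, 13, 16, 12, 15, 14, 13]
low, high = mean_ci_t(sample, t_crit=2.262)
print(round(low, 3), round(high, 3))
```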
Finding point estimate and E from Confidence
Interval
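Point estimate of μ:
$\bar{x} = \frac{(\text{upper confidence limit}) + (\text{lower confidence limit})}{2}$
Margin of Error
• $E = \frac{(\text{upper confidence limit}) - (\text{lower confidence limit})}{2}$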