REVIEW:
Midterm Exam
Spring 2012
Introduction
• Important Definitions:
- Data
- Statistics
- A Population
- A census
- A sample
Types of Data
• Parameter (Describing a characteristic of the Population)
• Statistic (Describing a characteristic of the Sample)
-QUALITATIVE DATA (Categorical or Attribute Data)
-QUANTITATIVE DATA:
- Discrete
- Continuous
Levels of Measurement:
- Nominal
- Ordinal
- Interval
- Ratio
Design of Experiments
• An observational study (don’t attempt to modify the subjects)
• An experiment (treatment group vs. control group)
Types of Observational Studies:
• Cross-sectional
• Retrospective (or case-control)
• Prospective (or longitudinal or cohort)
Problems
• Confounding (confusion of the effects of variables)
How to solve this problem:
• Blinding (placebo effect, single-blind, double-blind)
• Blocking
• Randomization:
• Completely randomized design
• Randomized block design
Sampling strategies
• Random sample
• Simple random sample (assumed throughout the book)
• Systematic sampling
• Convenience sampling
• Stratified sampling
• Cluster sampling
• Sampling error: Difference between a sample result and the true population result
• Nonsampling error: Sample data incorrectly collected
Important characteristics of
data
• Center
• Variation
• Distribution
• Outliers
• Time
Frequency distribution
• Counts of data values individually or by groups of intervals
• Other forms:
• Relative frequency distribution (divide each class frequency by
the total of all frequencies)
• Cumulative frequency distribution (cumulative totals)
• Histogram: Graphical representation of the frequency
distribution
Other graphs
• Relative frequency histogram
• Frequency polygon
• Dotplots
• Stem-and-leaf plots
• Pareto chart
• Pie charts
• Scatter diagrams
• Time series graphs
Examples: Histogram and
Scatter plot
Measures of center
• Sample Mean: $\bar{x} = \frac{\sum x}{n}$
• Median: Middle value
• Mode: Most frequent value
• Bimodal
• Multimodal
• No mode
• Midrange = $\frac{\text{maximum value} + \text{minimum value}}{2}$
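As an illustration, the measures of center above can be computed with Python's standard `statistics` module. This is a minimal sketch: the sample `data` and the helper `midrange` are hypothetical and introduced only for this example.

```python
import statistics

def midrange(values):
    """Midrange: halfway between the largest and smallest values."""
    return (max(values) + min(values)) / 2

# Hypothetical sample, chosen only for illustration.
data = [2, 3, 3, 5, 7, 10]

mean = statistics.mean(data)        # sum of the values divided by n
median = statistics.median(data)    # middle value of the sorted data
modes = statistics.multimode(data)  # list of the most frequent value(s)
mid = midrange(data)

print(mean, median, modes, mid)
```

A bimodal or multimodal sample would make `multimode` return more than one value.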
Skewed distributions
Measures of Variation
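• Range = (Maximum value) − (Minimum value)
• Sample standard deviation (variation from the mean): $s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$
• Population standard deviation: $\sigma = \sqrt{\frac{\sum (x - \mu)^2}{N}}$
• Sample variance: $s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$
• Population variance: $\sigma^2 = \frac{\sum (x - \mu)^2}{N}$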
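The variance and standard deviation formulas can be checked numerically. The sketch below uses a hypothetical data set and two illustrative helpers; the only difference between them is the divisor (n − 1 for a sample, N for a population).

```python
import math
import statistics

def sample_variance(values):
    """s^2: squared deviations from the sample mean, divided by n - 1."""
    x_bar = sum(values) / len(values)
    return sum((x - x_bar) ** 2 for x in values) / (len(values) - 1)

def population_variance(values):
    """sigma^2: squared deviations from the mean, divided by N."""
    mu = sum(values) / len(values)
    return sum((x - mu) ** 2 for x in values) / len(values)

# Hypothetical data, chosen only for illustration.
data = [2, 4, 4, 4, 5, 5, 7, 9]

data_range = max(data) - min(data)
s = math.sqrt(sample_variance(data))          # sample standard deviation
sigma = math.sqrt(population_variance(data))  # population standard deviation

# The standard library offers the same quantities directly.
print(s, statistics.stdev(data))
print(sigma, statistics.pstdev(data))
```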
Measures of Variation (Cont.)
• Sample coefficient of variation: $CV = \frac{s}{\bar{x}} \cdot 100\%$
• Population coefficient of variation: $CV = \frac{\sigma}{\mu} \cdot 100\%$
Range Rule of Thumb
$s \approx \frac{\text{range}}{4}$
• Minimum usual value: (mean)-2 x (standard deviation)
• Maximum usual value: (mean)+2 x (standard deviation)
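A small sketch of the range rule of thumb follows. The function names and the numbers are hypothetical, used only to show the arithmetic.

```python
def estimate_sd(minimum, maximum):
    """Rough estimate of the standard deviation: range divided by 4."""
    return (maximum - minimum) / 4

def usual_limits(mean, sd):
    """Minimum and maximum usual values: mean -/+ 2 standard deviations."""
    return mean - 2 * sd, mean + 2 * sd

# Hypothetical values: observations run from 40 to 100, with mean 70.
sd_guess = estimate_sd(40, 100)
low, high = usual_limits(70, sd_guess)
print(sd_guess, low, high)
```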
Rule for data with a bell-shaped
distribution
• About 68% of all values fall within 1 standard deviation of the
mean
• About 95% of all values fall within 2 standard deviations of the
mean
• About 99.7% of all values fall within 3 standard deviations of
the mean
Z Scores
• Sample: $z = \frac{x - \bar{x}}{s}$
• Population: $z = \frac{x - \mu}{\sigma}$
Ordinary values: -2≤ z score≤2
Unusual value: z score < -2 or z score> 2
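The z score rule can be sketched as below. The helper names and the example values (mean 100, standard deviation 15) are assumptions made for illustration.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies from the mean."""
    return (x - mean) / sd

def is_unusual(x, mean, sd):
    """Unusual values have a z score below -2 or above 2."""
    z = z_score(x, mean, sd)
    return z < -2 or z > 2

# Hypothetical distribution with mean 100 and standard deviation 15.
print(z_score(130, 100, 15), is_unusual(130, 100, 15))  # exactly 2: ordinary
print(z_score(135, 100, 15), is_unusual(135, 100, 15))  # above 2: unusual
```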
Quartiles and Percentiles
• Quartiles: Separate a data set into four parts
• Q1 (First): Separates bottom 25% of the sorted values from the
top 75%
• Q2 (Second): Same as the median
• Q3 (Third): Separates bottom 75% of the sorted values from the
top 25%
• Percentiles: Separate the data into 100 parts (P1, P2, …, P99)
Percentile of value x = $\frac{\text{number of values less than } x}{\text{total number of values}} \cdot 100$
• Interquartile range = Q3 − Q1
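The sketch below computes a percentile and the interquartile range for a hypothetical data set. Note that `statistics.quantiles` uses its own interpolation convention (the default "exclusive" method), which may differ slightly from the procedure used in a given textbook, so treat the quartiles as one reasonable convention rather than the only one.

```python
import statistics

def percentile_of(x, values):
    """Percentile of x: share of values strictly less than x, times 100."""
    below = sum(1 for v in values if v < x)
    return below / len(values) * 100

# Hypothetical sorted data: the integers 1 through 11.
data = list(range(1, 12))

q1, q2, q3 = statistics.quantiles(data, n=4)  # three cut points = quartiles
iqr = q3 - q1

print(q1, q2, q3, iqr)
print(percentile_of(9, data))
```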
Boxplots
Probability
• Definitions:
• An event
• A simple event
• The Sample Space
• Notation
• P: Probability
• A,B and C: specific events
• P(A): Probability of event A occurring
Definitions of Probability
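• Frequency approximation: P(A) = $\frac{\text{number of times A occurred}}{\text{number of times the procedure was repeated}}$
• Classical Approach: P(A) = $\frac{\text{number of ways A can occur}}{\text{number of different simple events}}$ (requires equally likely outcomes)
• Subjective Probability
• LAW OF LARGE NUMBERS: As a procedure is repeated many times, the relative frequency probability tends to the actual probability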
Properties of probability
• Probability of an impossible event is 0
• Probability of an event that is certain is 1
• For any event A, 0≤P(A)≤1
• P(Complement of event A) = $P(\bar{A}) = 1 - P(A)$
• Addition Rule:
P(A or B)=P(in a single trial, event A occurs or event B occurs or
they both occur)= P(A)+P(B)-P(A and B)
Or
P(A∪B)= P(A)+P(B)-P(A∩B)
Events A and B are disjoint if P(A∩B)=0
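To illustrate the addition rule, the sketch below uses one roll of a fair die with two hypothetical events (A: an even number, B: a number greater than 4). Exact fractions avoid rounding.

```python
from fractions import Fraction

# Sample space for one roll of a fair die (equally likely outcomes).
sample_space = {1, 2, 3, 4, 5, 6}
A = {2, 4, 6}  # hypothetical event: even number
B = {5, 6}     # hypothetical event: greater than 4

def prob(event):
    """Classical probability: favourable outcomes over all outcomes."""
    return Fraction(len(event), len(sample_space))

# Addition rule: subtract P(A and B) so the overlap is not counted twice.
p_a_or_b = prob(A) + prob(B) - prob(A & B)

print(p_a_or_b)     # same value as counting the union directly
print(prob(A | B))
```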
Multiplication Rule
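• P(A and B) = P(event A occurs in the first trial and event B occurs in a second trial)
• $P(A \text{ and } B) = P(A) \cdot P(B|A)$
• Independent events: P(B|A)=P(B)
• If A and B are independent: $P(A \text{ and } B) = P(A) \cdot P(B)$
• Conditional probability: $P(B|A) = \frac{P(A \text{ and } B)}{P(A)}$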
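As a worked example of the multiplication rule with dependent events, consider drawing two cards without replacement from a standard 52-card deck and asking for two aces. The scenario is an assumption chosen for illustration.

```python
from fractions import Fraction

# A: the first card is an ace. B: the second card is an ace.
p_a = Fraction(4, 52)          # 4 aces among 52 cards
p_b_given_a = Fraction(3, 51)  # 3 aces left among 51 cards

# Multiplication rule: P(A and B) = P(A) * P(B|A)
p_a_and_b = p_a * p_b_given_a

# Conditional probability recovers P(B|A) from the joint probability.
p_b_given_a_check = p_a_and_b / p_a

print(p_a_and_b, p_b_given_a_check)
```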
Bayes Theorem
• $P(A|B) = \frac{P(A) \cdot P(B|A)}{[P(A) \cdot P(B|A)] + [P(\bar{A}) \cdot P(B|\bar{A})]}$
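Bayes' theorem can be sketched as a short function. The function name and the input probabilities below are hypothetical (a condition with 1% prevalence and a test that is positive for 95% of affected and 5% of unaffected subjects).

```python
def bayes(p_a, p_b_given_a, p_b_given_not_a):
    """P(A|B) from P(A), P(B|A) and P(B|not A)."""
    numerator = p_a * p_b_given_a
    denominator = numerator + (1 - p_a) * p_b_given_not_a
    return numerator / denominator

# Hypothetical inputs: P(A) = 0.01, P(B|A) = 0.95, P(B|not A) = 0.05.
posterior = bayes(0.01, 0.95, 0.05)
print(round(posterior, 3))
```

Even with a fairly accurate test, the posterior stays well below one half here because event A is rare.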
Probability distributions
• Definitions:
• Random Variable (x): Numerical value given to an outcome of a procedure. Example: Number of mountain lions seen on the UCSC campus last year
• Probability distribution (P(x)): Gives the probability of each value of the random variable.
• Types of random variables:
• Discrete
• Continuous
Requirements of a Probability
distribution
• $\sum P(x) = 1$ (Discrete case)
• 0≤P(x)≤1
• Expected value of a discrete random variable: $E = \sum [x \cdot P(x)]$
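The two requirements and the expected value can be checked in a few lines. The distribution below (number of heads in two tosses of a fair coin) and the helper names are assumptions made for illustration.

```python
import math

def is_valid_distribution(dist):
    """Each P(x) lies in [0, 1] and the probabilities sum to 1."""
    probs = dist.values()
    return all(0 <= p <= 1 for p in probs) and math.isclose(sum(probs), 1.0)

def expected_value(dist):
    """E = sum of x * P(x) over all values x."""
    return sum(x * p for x, p in dist.items())

# Hypothetical distribution: number of heads in two fair coin tosses.
heads = {0: 0.25, 1: 0.5, 2: 0.25}

print(is_valid_distribution(heads), expected_value(heads))
```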
Discrete Distributions:
• Binomial
• Poisson
Binomial distribution
• Requirements:
• Fixed number of trials
• Trials are independent
• Each trial can be a success or a failure
• Probabilities remain constant
• Random variable: x = number of successes among n trials
• $P(x) = \frac{n!}{(n - x)!\,x!} \cdot p^x \cdot q^{n - x}$ (You can also use the Binomial Table)
• n = number of trials
• p = probability of success in one trial
• q = probability of failure in one trial (q = 1 − p)
Mean ,Variance and Standard deviation
of the Binomial distribution
• Mean: $\mu = np$
• Variance: $\sigma^2 = npq$
• Standard deviation: $\sigma = \sqrt{npq}$
• Maximum usual value: $\mu + 2\sigma$
• Minimum usual value: $\mu - 2\sigma$
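A minimal sketch of the binomial formulas, assuming a hypothetical setting of n = 10 trials with p = 0.5. `math.comb(n, x)` computes n! / ((n − x)! x!).

```python
import math

def binomial_pmf(x, n, p):
    """P(x) = n! / ((n - x)! x!) * p^x * q^(n - x), with q = 1 - p."""
    q = 1 - p
    return math.comb(n, x) * p ** x * q ** (n - x)

# Hypothetical setting: 10 independent trials, success probability 0.5.
n, p = 10, 0.5

mean = n * p                 # mu = np
variance = n * p * (1 - p)   # sigma^2 = npq
sd = math.sqrt(variance)     # sigma = sqrt(npq)
usual = (mean - 2 * sd, mean + 2 * sd)

print(binomial_pmf(3, n, p))
print(mean, variance, sd, usual)
```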
Poisson distribution
• Requirements:
• Random variable x is the number of occurrences of an event over
some interval
• The occurrences must be random
• The occurrences must be independent
• $P(x) = \frac{\mu^x \cdot e^{-\mu}}{x!}$
The Poisson distribution depends only on $\mu$ (the mean of the process)
Mean, Variance and Standard deviation
of the Poisson distribution
• Mean: $\mu$
• Variance: $\sigma^2 = \mu$
• Standard deviation: $\sigma = \sqrt{\mu}$
• Maximum usual value: $\mu + 2\sigma$
• Minimum usual value: $\mu - 2\sigma$
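The Poisson formulas can be sketched in the same way. The mean of 2 occurrences per interval is a hypothetical value chosen for illustration.

```python
import math

def poisson_pmf(x, mu):
    """P(x) = mu^x * e^(-mu) / x!"""
    return mu ** x * math.exp(-mu) / math.factorial(x)

# Hypothetical process: on average 2 occurrences per interval.
mu = 2.0

sd = math.sqrt(mu)                   # sigma = sqrt(mu)
usual = (mu - 2 * sd, mu + 2 * sd)   # minimum and maximum usual values

print(poisson_pmf(3, mu))
print(sd, usual)
```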
Continuous distributions
• Uniform distribution
• Normal distribution
• Density curve: Graph of a continuous distribution
• Properties:
• Area below the curve is equal to 1
• Every point on the curve has a height greater than or equal to zero
Uniform and Normal
distributions
Sampling distributions
• Variation of the value of a statistic from sample to sample: Sampling variability
• Sampling distribution of the sample mean
• Sampling distribution of the sample proportion
CENTRAL LIMIT THEOREM:
• The random variable x has a distribution (normal or not) with mean $\mu$ and standard deviation $\sigma$
• The distribution of the sample means approaches a normal distribution as the sample size increases.
Mean and standard deviation
of the sample mean
• Mean: $\mu_{\bar{x}} = \mu$
• Standard deviation: $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$
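A small simulation can make these two formulas concrete. The sketch below assumes a uniform population on [0, 1) (mean 0.5, standard deviation √(1/12)), a sample size of 36, and a fixed seed; all of these are illustrative choices, not part of the review.

```python
import math
import random
import statistics

def simulate_sample_means(n, repetitions, seed=0):
    """Draw `repetitions` samples of size n from Uniform[0, 1) and return their means."""
    rng = random.Random(seed)  # fixed seed keeps the run reproducible
    return [
        statistics.fmean([rng.random() for _ in range(n)])
        for _ in range(repetitions)
    ]

n = 36
means = simulate_sample_means(n, repetitions=4000)

mu = 0.5                   # mean of Uniform[0, 1)
sigma = math.sqrt(1 / 12)  # standard deviation of Uniform[0, 1)

print(statistics.fmean(means), mu)                      # close to mu
print(statistics.stdev(means), sigma / math.sqrt(n))    # close to sigma / sqrt(n)
```

Although the population is flat rather than bell-shaped, a histogram of `means` would look approximately normal.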
Normal approximation to the
Binomial
• If np≥5 and nq≥5, a Binomial random variable x can be approximated with a Normal distribution with mean and standard deviation:
• Mean: $\mu = np$
• Standard deviation: $\sigma = \sqrt{npq}$
Confidence Interval for the Population
Proportion (p)
• p=population proportion
• $\hat{p} = \frac{x}{n}$ = sample proportion of successes
• $\hat{q} = 1 - \hat{p}$ = sample proportion of failures
Procedure to build a CI of
confidence level $(1 - \alpha)$
1) Check the normal approximation to the Binomial distribution (np≥5 and nq≥5)
2) Get the critical value $z_{\alpha/2}$
3) Evaluate the margin of error: $E = z_{\alpha/2} \cdot \sqrt{\frac{\hat{p}\hat{q}}{n}}$
4) Confidence Interval:
• $\hat{p} - E < p < \hat{p} + E$
• $\hat{p} \pm E$
• $(\hat{p} - E, \hat{p} + E)$
5) Interpret results
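The five steps can be sketched as follows. This is an illustration under stated assumptions: the function name and the survey numbers (520 successes in 1000 trials) are hypothetical, the requirement check uses the sample proportions in place of the unknown p and q, and the critical value comes from `statistics.NormalDist` rather than a printed table.

```python
import math
from statistics import NormalDist

def proportion_ci(x, n, confidence=0.95):
    """Return (lower, upper, E) for the population proportion p."""
    p_hat = x / n
    q_hat = 1 - p_hat
    # Step 1: check the normal approximation (sample proportions stand in for p, q).
    if n * p_hat < 5 or n * q_hat < 5:
        raise ValueError("normal approximation requirement not met")
    # Step 2: critical value z_{alpha/2}.
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)
    # Step 3: margin of error.
    e = z * math.sqrt(p_hat * q_hat / n)
    # Step 4: confidence interval.
    return p_hat - e, p_hat + e, e

# Hypothetical survey: 520 successes among 1000 respondents.
low, high, e = proportion_ci(520, 1000)
print(round(low, 3), round(high, 3), round(e, 3))
```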
Sample size for estimating
proportion p
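• $\hat{p}$ is given: $n = \frac{[z_{\alpha/2}]^2 \, \hat{p}\hat{q}}{E^2}$
• $\hat{p}$ is not given ($\hat{p}$ is assumed = 0.5): $n = \frac{[z_{\alpha/2}]^2 \cdot 0.25}{E^2}$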
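A sketch of the sample size calculation is given below. The function name and inputs are hypothetical; rounding the result up to the next whole number is the usual convention and is assumed here.

```python
import math
from statistics import NormalDist

def sample_size_for_proportion(margin, confidence=0.95, p_hat=None):
    """Sample size needed to estimate p within the given margin of error E."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Without a prior estimate, p_hat * q_hat is replaced by 0.25 (p_hat = 0.5).
    pq = 0.25 if p_hat is None else p_hat * (1 - p_hat)
    return math.ceil(z ** 2 * pq / margin ** 2)  # always round up

# Hypothetical targets: E = 0.03 at 95% confidence.
print(sample_size_for_proportion(0.03))             # no prior estimate
print(sample_size_for_proportion(0.03, p_hat=0.2))  # prior estimate of 0.2
```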
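Finding $\hat{p}$ (point estimate) and E from
the Confidence Interval
Point estimate:
• $\hat{p} = \frac{(\text{upper confidence limit}) + (\text{lower confidence limit})}{2}$
Margin of Error:
• $E = \frac{(\text{upper confidence limit}) - (\text{lower confidence limit})}{2}$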
Confidence Interval for the
Population Mean ($\sigma$ known)
• Check Requirements:
• Sample is a simple random sample
• Population standard deviation is known
• Population is normally distributed or n>30
• Procedure
1) Check normality requirements
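2) Get the critical value $z_{\alpha/2}$
3) Evaluate the margin of error: $E = z_{\alpha/2} \cdot \frac{\sigma}{\sqrt{n}}$
4) Confidence Interval:
• $\bar{x} - E < \mu < \bar{x} + E$
• $\bar{x} \pm E$
• $(\bar{x} - E, \bar{x} + E)$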
5) Interpret results
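The procedure for a known population standard deviation can be sketched like this. The function name and the numbers (sample mean 50, σ = 10, n = 100) are hypothetical, and the normality requirement is left to the reader to check.

```python
import math
from statistics import NormalDist

def mean_ci_sigma_known(x_bar, sigma, n, confidence=0.95):
    """Return (lower, upper, E) for the population mean when sigma is known."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)  # critical value z_{alpha/2}
    e = z * sigma / math.sqrt(n)             # margin of error
    return x_bar - e, x_bar + e, e

# Hypothetical sample: mean 50 from n = 100 observations, sigma = 10.
low, high, e = mean_ci_sigma_known(50, 10, 100)
print(round(low, 2), round(high, 2), round(e, 2))
```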
Sample size for estimating
Mean
$n = \left[\frac{z_{\alpha/2} \cdot \sigma}{E}\right]^2$
Values of $\sigma$, $z_{\alpha/2}$ and E are given.
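As a quick sketch of this formula, assume σ = 15 and a desired margin of error E = 3 at 95% confidence (hypothetical values). The result is rounded up, which is the usual convention and is assumed here.

```python
import math
from statistics import NormalDist

def sample_size_for_mean(sigma, margin, confidence=0.95):
    """n = (z_{alpha/2} * sigma / E)^2, rounded up to a whole number."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil((z * sigma / margin) ** 2)

# Hypothetical inputs: sigma = 15, E = 3, 95% confidence.
print(sample_size_for_mean(15, 3))
```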
Confidence Interval for the
Population Mean ($\sigma$ unknown)
• In this case we use the Student t distribution with n-1 degrees of
freedom
• Check Requirements:
• Sample is a simple random sample
• Population standard deviation is estimated by s (sample standard dev.)
• Population is normally distributed or n>30
• Procedure
1) Check normality requirements
2) Get the critical value $t_{\alpha/2}$ with n-1 degrees of freedom
3) Evaluate the margin of error: $E = t_{\alpha/2} \cdot \frac{s}{\sqrt{n}}$
4) Confidence Interval:
• $\bar{x} - E < \mu < \bar{x} + E$
• $\bar{x} \pm E$
• $(\bar{x} - E, \bar{x} + E)$
5) Interpret results
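Python's standard library has no Student t quantile function, so the sketch below takes the critical value as an argument, as one would read it from a t table. The function name and sample values are hypothetical; 2.064 is the tabulated two-sided 95% critical value for 24 degrees of freedom.

```python
import math

def mean_ci_sigma_unknown(x_bar, s, n, t_critical):
    """Return (lower, upper, E) using a t critical value with n - 1 degrees of freedom."""
    e = t_critical * s / math.sqrt(n)  # margin of error
    return x_bar - e, x_bar + e, e

# Hypothetical sample: n = 25, mean 98.2, sample standard deviation 0.62.
# t_critical = 2.064 is taken from a t table (95% confidence, 24 degrees of freedom).
low, high, e = mean_ci_sigma_unknown(98.2, 0.62, 25, t_critical=2.064)
print(round(low, 2), round(high, 2), round(e, 3))
```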
Finding point estimate and E from
Confidence Interval
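Point estimate of $\mu$:
• $\bar{x} = \frac{(\text{upper confidence limit}) + (\text{lower confidence limit})}{2}$
Margin of Error
• $E = \frac{(\text{upper confidence limit}) - (\text{lower confidence limit})}{2}$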