REVIEW:
Midterm Exam
Spring 2012
Introduction
• Important Definitions:
- Data
- Statistics
- A Population
- A census
- A sample
Types of Data
• Parameter (Describing a characteristic of the Population)
• Statistic (Describing a characteristic of the Sample)
-QUALITATIVE DATA (Categorical or Attribute Data)
-QUANTITATIVE DATA:
- Discrete
- Continuous
Levels of Measurement:
- Nominal
- Ordinal
- Interval
- Ratio
Design of Experiments
• An observational study (don’t attempt to modify the subjects)
• An experiment (treatment group vs. control group)
Types of Observational Studies:
• Cross-sectional
• Retrospective (or case-control)
• Prospective (or longitudinal or cohort)
Problems
• Confounding (the effects of different variables cannot be told apart)
How to solve this problem:
• Blinding (placebo effect, single-blind, double-blind)
• Blocking
• Randomization:
• Completely randomized design
• Randomized block design
Sampling strategies
• Random sample
• Simple random sample (assumed throughout the book)
• Systematic sampling
• Convenience sampling
• Stratified sampling
• Cluster sampling
• Sampling error: difference between the sample result and the true population result
• Nonsampling error: sample data incorrectly collected
Important characteristics of data
• Center
• Variation
• Distribution
• Outliers
• Time
Frequency distribution
• Counts of data values individually or by groups of intervals
• Other forms:
• Relative frequency distribution (divide each class frequency by
the total of all frequencies)
• Cumulative frequency distribution (cumulative totals)
• Histogram: Graphical representation of the frequency
distribution
Other graphs
• Relative frequency histogram
• Frequency polygon
• Dotplots
• Stem-and-leaf plots
• Pareto chart
• Pie charts
• Scatter diagrams
• Time series graphs
Examples: Histogram and
Scatter plot
Measures of center
• Sample Mean: $\bar{x} = \frac{\sum x}{n}$
• Median: Middle value
• Mode: Most frequent value
• Bimodal
• Multimodal
• No mode
• Midrange = $\frac{\text{maximum value} + \text{minimum value}}{2}$
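The four measures of center can be checked on a small data set. The sketch below uses Python's standard `statistics` module; the data values and the helper name `midrange` are made up for illustration.

```python
from statistics import mean, median, multimode

def midrange(values):
    """Midrange: halfway between the largest and the smallest value."""
    return (max(values) + min(values)) / 2

# Made-up sample, for illustration only.
data = [2, 3, 3, 5, 7, 10]

center = {
    "mean": mean(data),         # sum of the values divided by n
    "median": median(data),     # middle value of the sorted data
    "modes": multimode(data),   # list of the most frequent value(s)
    "midrange": midrange(data),
}
print(center)
```

`multimode` returns a list, so a bimodal or multimodal data set shows up as a list with two or more entries.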
Skewed distributions
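Measures of Variation
• Range = (Maximum value - Minimum value)
• Sample standard deviation: Variation from the mean
$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$
• Population standard deviation:
$\sigma = \sqrt{\frac{\sum (x - \mu)^2}{N}}$
• Sample variance:
$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$
• Population variance:
$\sigma^2 = \frac{\sum (x - \mu)^2}{N}$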
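The difference between the sample formulas (divide by n - 1) and the population formulas (divide by N) is easy to see numerically. A minimal sketch with the standard `statistics` module, on made-up data:

```python
from statistics import pstdev, pvariance, stdev, variance

# Made-up data: mean is 5, sum of squared deviations from the mean is 32.
data = [2, 4, 4, 4, 5, 5, 7, 9]

data_range = max(data) - min(data)   # maximum value - minimum value

sample_variance = variance(data)     # divides by n - 1
sample_sd = stdev(data)              # square root of the sample variance

population_variance = pvariance(data)  # divides by N
population_sd = pstdev(data)           # square root of the population variance

print(data_range, sample_variance, sample_sd)
print(population_variance, population_sd)
```

Treating the same eight values as a sample gives a slightly larger variance (32/7) than treating them as a whole population (32/8).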
Measures of Variation (Cont.)
• Sample Coefficient of Variation: $CV = \frac{s}{\bar{x}} \cdot 100\%$
• Population coefficient of variation: $CV = \frac{\sigma}{\mu} \cdot 100\%$
Range Rule of Thumb
$s \approx \frac{\text{range}}{4}$
• Minimum usual value: (mean) - 2 x (standard deviation)
• Maximum usual value: (mean) + 2 x (standard deviation)
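As a quick illustration of the range rule of thumb (standard deviation roughly equal to range divided by 4, usual values within 2 standard deviations of the mean), here is a small sketch. The function names and the numbers are hypothetical.

```python
def estimate_sd_from_range(minimum, maximum):
    """Rough estimate of the standard deviation: range / 4."""
    return (maximum - minimum) / 4

def usual_value_limits(mean, sd):
    """Return (minimum usual value, maximum usual value)."""
    return mean - 2 * sd, mean + 2 * sd

# Hypothetical exam scores ranging from 40 to 100 with mean 70.
sd_estimate = estimate_sd_from_range(40, 100)
low, high = usual_value_limits(70, sd_estimate)
print(sd_estimate, low, high)
```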
Empirical Rule for data with a Bell-Shaped distribution
• About 68% of all values fall within 1 standard deviation of the
mean
• About 95% of all values fall within 2 standard deviations of the
mean
• About 99.7% of all values fall within 3 standard deviations of
the mean
Z Scores
• Sample: $z = \frac{x - \bar{x}}{s}$
• Population: $z = \frac{x - \mu}{\sigma}$
Ordinary values: -2 ≤ z score ≤ 2
Unusual values: z score < -2 or z score > 2
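A z score is the number of standard deviations a value lies from the mean. The sketch below applies the same formula to either the sample or the population case (pass the matching mean and standard deviation); the function names and numbers are made up.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

def is_unusual(z):
    """Unusual values have a z score below -2 or above 2."""
    return z < -2 or z > 2

# Hypothetical: mean 70, standard deviation 5.
z_high = z_score(85, 70, 5)
z_edge = z_score(60, 70, 5)
print(z_high, is_unusual(z_high))
print(z_edge, is_unusual(z_edge))
```

A z score of exactly -2 or 2 still counts as ordinary under the rule above.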
Quartiles and Percentiles
• Quartiles: Separate a data set into four parts
• Q1 (First): Separates bottom 25% of the sorted values from the
top 75%
• Q2 (Second): Same as the median
• Q3 (Third): Separates bottom 75% of the sorted values from the
top 25%
• Percentiles: Separate the data into 100 parts (P1, P2, …, P99)
Percentile value of x = $\frac{\text{number of values less than } x}{\text{total number of values}} \cdot 100$
• Interquartile range = Q3 - Q1
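The percentile of a value and the interquartile range can be sketched as follows. Note the assumption: software packages and textbooks use slightly different conventions for locating quartiles, so `statistics.quantiles` (default "exclusive" method) may not match a hand calculation exactly. The data and the name `percentile_of_value` are made up.

```python
from statistics import quantiles

def percentile_of_value(values, x):
    """Percentile of x: share of values less than x, times 100."""
    below = sum(1 for v in values if v < x)
    return below * 100 / len(values)

data = list(range(1, 11))  # made-up data: 1, 2, ..., 10

pct = percentile_of_value(data, 8)   # 7 of the 10 values are below 8

q1, q2, q3 = quantiles(data, n=4)    # Q2 is the median
iqr = q3 - q1                        # interquartile range = Q3 - Q1
print(pct, q1, q2, q3, iqr)
```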
Boxplots
Probability
• Definitions:
• An event
• A simple event
• The Sample Space
• Notation
• P: Probability
• A,B and C: specific events
• P(A): Probability of event A occurring
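Definitions of Probability
• Relative frequency approximation: $P(A) = \frac{\text{number of times } A \text{ occurred}}{\text{number of times the procedure was repeated}}$
• Classical approach (equally likely outcomes): $P(A) = \frac{\text{number of ways } A \text{ can occur}}{\text{number of different simple events}}$
• Subjective Probability
• LAW OF LARGE NUMBERS: As a procedure is repeated many times, the relative frequency probability tends to the actual probability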
Properties of probability
• Probability of an impossible event is 0
• Probability of an event that is certain is 1
• For any event A, 0 ≤ P(A) ≤ 1
• P(Complement of event A) = $P(\bar{A}) = 1 - P(A)$
• Addition Rule:
P(A or B)=P(in a single trial, event A occurs or event B occurs or
they both occur)= P(A)+P(B)-P(A and B)
Or
P(A∪B)= P(A)+P(B)-P(A∩B)
Events A and B are disjoint if P(A∩B)=0
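A worked instance of the addition rule, using a standard 52-card deck as an illustrative example (the deck is not part of the original slides): A = "card is an ace", B = "card is a heart". Exact fractions avoid rounding.

```python
from fractions import Fraction

def p_a_or_b(p_a, p_b, p_a_and_b):
    """Addition rule: P(A or B) = P(A) + P(B) - P(A and B)."""
    return p_a + p_b - p_a_and_b

p_ace = Fraction(4, 52)            # 4 aces
p_heart = Fraction(13, 52)         # 13 hearts
p_ace_and_heart = Fraction(1, 52)  # the ace of hearts is counted in both

p_ace_or_heart = p_a_or_b(p_ace, p_heart, p_ace_and_heart)
print(p_ace_or_heart)  # 4/13

# Disjoint events: P(A and B) = 0, so the rule reduces to P(A) + P(B).
p_ace_or_king = p_a_or_b(Fraction(4, 52), Fraction(4, 52), 0)
print(p_ace_or_king)   # 2/13
```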
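Multiplication Rule
• P(A and B) = P(event A occurs in the first trial and event B occurs in a second trial)
• $P(A \text{ and } B) = P(A) \cdot P(B|A)$
• Independent events: P(B|A) = P(B)
• If A and B are independent: $P(A \text{ and } B) = P(A) \cdot P(B)$
• Conditional probability: $P(B|A) = \frac{P(A \text{ and } B)}{P(A)}$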
Bayes Theorem
• $P(A|B) = \frac{P(A) \cdot P(B|A)}{[P(A) \cdot P(B|A)] + [P(\bar{A}) \cdot P(B|\bar{A})]}$
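Bayes' theorem, P(A|B) = P(A)·P(B|A) / [P(A)·P(B|A) + P(not A)·P(B|not A)], can be checked with a small numeric sketch. The function name and the probabilities below are hypothetical, chosen only to show how the pieces combine.

```python
from fractions import Fraction

def bayes(p_a, p_b_given_a, p_b_given_not_a):
    """Return P(A|B) from P(A), P(B|A) and P(B|not A)."""
    p_not_a = 1 - p_a                       # complement rule
    numerator = p_a * p_b_given_a           # multiplication rule: P(A and B)
    denominator = numerator + p_not_a * p_b_given_not_a  # total P(B)
    return numerator / denominator

# Hypothetical numbers: A is rare (1%), B is much more likely when A holds.
p_a_given_b = bayes(
    p_a=Fraction(1, 100),
    p_b_given_a=Fraction(95, 100),
    p_b_given_not_a=Fraction(5, 100),
)
print(p_a_given_b, float(p_a_given_b))
```

Even with P(B|A) = 0.95, P(A|B) stays well below one half here because A itself is rare.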
Probability distributions
• Definitions:
• Random Variable (x): Numerical value assigned to each outcome of a procedure. Example: Number of mountain lions seen on the UCSC campus last year
• Probability distribution (P(x)): Gives the probability of each value of the random variable.
• Types of random variables:
• Discrete
• Continuous
Requirements of a Probability distribution
• $\sum P(x) = 1$ (Discrete case)
• 0 ≤ P(x) ≤ 1
• Expected value of a discrete random variable: $E = \sum [x \cdot P(x)]$
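The two requirements and the expected value E = Σ[x·P(x)] can be verified for a small discrete distribution. The example below (number of heads in two fair coin tosses) and the function names are illustrative assumptions, not taken from the slides.

```python
from fractions import Fraction

def is_valid_distribution(dist):
    """Check that every P(x) is between 0 and 1 and that the P(x) sum to 1."""
    probabilities = dist.values()
    return all(0 <= p <= 1 for p in probabilities) and sum(probabilities) == 1

def expected_value(dist):
    """E = sum of x * P(x) over all values x."""
    return sum(x * p for x, p in dist.items())

# x = number of heads in two tosses of a fair coin.
heads = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}

print(is_valid_distribution(heads))  # True
print(expected_value(heads))         # 1
```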
Discrete Distributions:
• Binomial
• Poisson
Binomial distribution
• Requirements:
• Fixed number of trials
• Trials are independent
• Each trial can be a success or a failure
• Probabilities remain constant
• Random variable: x = number of successes among n trials
• $P(x) = \frac{n!}{(n - x)!\, x!} \cdot p^x \cdot q^{n - x}$ (You can also use the Binomial Table)
• n = number of trials
• p = probability of success in one trial
• q = probability of failure in one trial (q = 1 - p)
Mean, Variance and Standard deviation of the Binomial distribution
• Mean: $\mu = np$
• Variance: $\sigma^2 = npq$
• Standard deviation: $\sigma = \sqrt{npq}$
• Maximum usual value: $\mu + 2\sigma$
• Minimum usual value: $\mu - 2\sigma$
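The binomial formula P(x) = n!/((n - x)! x!) · p^x · q^(n - x) and the summary values (mean np, standard deviation √(npq)) can be computed directly instead of reading the Binomial Table. A minimal sketch; the function names and the choice n = 10, p = 0.5 are illustrative.

```python
from math import comb, sqrt

def binomial_pmf(x, n, p):
    """P(x successes in n independent trials with success probability p)."""
    q = 1 - p
    return comb(n, x) * p**x * q**(n - x)  # comb(n, x) = n! / ((n - x)! x!)

def binomial_summary(n, p):
    """Return (mean, variance, standard deviation)."""
    q = 1 - p
    return n * p, n * p * q, sqrt(n * p * q)

n, p = 10, 0.5
p_three = binomial_pmf(3, n, p)          # 120 / 1024
mu, var, sigma = binomial_summary(n, p)
usual = (mu - 2 * sigma, mu + 2 * sigma)  # minimum and maximum usual values
print(p_three, mu, var, sigma, usual)
```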
Poisson distribution
• Requirements:
• Random variable x is the number of occurrences of an event over some interval
• The occurrences must be random
• The occurrences must be independent
• $P(x) = \frac{\mu^x \cdot e^{-\mu}}{x!}$
The Poisson distribution depends only on $\mu$ (the mean of the process)
Mean, Variance and Standard deviation of the Poisson distribution
• Mean: $\mu$
• Variance: $\sigma^2 = \mu$
• Standard deviation: $\sigma = \sqrt{\mu}$
• Maximum usual value: $\mu + 2\sigma$
• Minimum usual value: $\mu - 2\sigma$
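The Poisson probability P(x) = μ^x · e^(-μ) / x! needs only the mean μ. A short sketch, with a hypothetical mean of 2 occurrences per interval:

```python
from math import exp, factorial, sqrt

def poisson_pmf(x, mu):
    """P(exactly x occurrences) when the mean number of occurrences is mu."""
    return mu**x * exp(-mu) / factorial(x)

mu = 2.0            # hypothetical mean of the process
sigma = sqrt(mu)    # the variance equals mu, so sigma = sqrt(mu)

p_zero = poisson_pmf(0, mu)
p_three = poisson_pmf(3, mu)
usual = (mu - 2 * sigma, mu + 2 * sigma)  # minimum and maximum usual values
print(p_zero, p_three, usual)
```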
Continuous distributions
• Uniform distribution
• Normal distribution
• Density curve: Graph of a continuous distribution
• Properties:
• Area under the curve is equal to 1
• Every point on the curve has a height greater than or equal to zero
Uniform and Normal
distributions
Sampling distributions
• Variation of the value of a statistic from sample to sample: Sampling variability
• Sampling distribution of the sample mean
• Sampling distribution of the sample proportion
CENTRAL LIMIT THEOREM:
• The random variable x has a distribution (normal or not) with mean $\mu$ and standard deviation $\sigma$
• The distribution of the sample means approaches a normal distribution as the sample size increases.
Mean and standard deviation of the sample mean
• Mean: $\mu_{\bar{x}} = \mu$
• Standard deviation: $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$
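The two facts about the sample mean (its mean equals μ, its standard deviation equals σ/√n) can be verified exactly on a tiny population by listing every possible sample. This sketch assumes sampling with replacement; the three-value population is made up.

```python
from itertools import product
from math import sqrt
from statistics import mean, pstdev

population = [1, 2, 5]   # made-up population
n = 2                    # sample size

mu = mean(population)
sigma = pstdev(population)   # population standard deviation

# All 3**2 = 9 equally likely samples of size n, drawn with replacement.
sample_means = [mean(sample) for sample in product(population, repeat=n)]

mean_of_sample_means = mean(sample_means)   # should equal mu
sd_of_sample_means = pstdev(sample_means)   # should equal sigma / sqrt(n)
print(mean_of_sample_means, mu)
print(sd_of_sample_means, sigma / sqrt(n))
```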
Normal approximation to the Binomial
• If np ≥ 5 and nq ≥ 5, a Binomial random variable x can be approximated with a Normal distribution with mean and standard deviation:
• Mean: $\mu = np$
• Standard deviation: $\sigma = \sqrt{npq}$
Confidence Interval for the Population Proportion (p)
• p = population proportion
• $\hat{p} = \frac{x}{n}$ = sample proportion of successes
• $\hat{q} = 1 - \hat{p}$ = sample proportion of failures
Procedure to build a CI of confidence level $(1 - \alpha)$
1) Check the normal approximation to the Binomial distribution (np ≥ 5 and nq ≥ 5)
2) Get the critical value $z_{\alpha/2}$
3) Evaluate the margin of error: $E = z_{\alpha/2} \cdot \sqrt{\frac{\hat{p}\hat{q}}{n}}$
4) Confidence Interval:
• $\hat{p} - E < p < \hat{p} + E$
• $\hat{p} \pm E$
• $(\hat{p} - E, \hat{p} + E)$
5) Interpret results
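The five steps for a confidence interval for a proportion can be sketched in code. Assumptions: the function name `proportion_ci` is hypothetical, the np ≥ 5 and nq ≥ 5 check is done with the sample proportions, and the critical value comes from `statistics.NormalDist` rather than a table, so results can differ slightly from table-based answers.

```python
from math import sqrt
from statistics import NormalDist

def proportion_ci(x, n, confidence=0.95):
    """Confidence interval for p from x successes in n trials."""
    p_hat = x / n
    q_hat = 1 - p_hat
    # Step 1: check the normal approximation (using the sample proportions).
    if n * p_hat < 5 or n * q_hat < 5:
        raise ValueError("normal approximation not justified")
    # Step 2: critical value z_{alpha/2}.
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)
    # Step 3: margin of error E = z * sqrt(p_hat * q_hat / n).
    e = z * sqrt(p_hat * q_hat / n)
    # Step 4: the interval (p_hat - E, p_hat + E).
    return p_hat - e, p_hat + e

# Hypothetical survey: 50 successes among 100 subjects.
lower, upper = proportion_ci(50, 100)
print(round(lower, 3), round(upper, 3))
```

Step 5 is the interpretation: with this procedure, about 95% of such intervals would contain the true population proportion p.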
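Sample size for estimating proportion p
• $\hat{p}$ is given: $n = \frac{[z_{\alpha/2}]^2 \, \hat{p}\hat{q}}{E^2}$
• $\hat{p}$ is not given ($\hat{p}$ is assumed = 0.5): $n = \frac{[z_{\alpha/2}]^2 \cdot 0.25}{E^2}$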
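Both sample-size formulas for a proportion, n = [z_{α/2}]² · p̂q̂ / E² and n = [z_{α/2}]² · 0.25 / E², fit in one small function. Assumptions: the name `sample_size_for_proportion` is hypothetical, and the result is rounded up to the next whole number, which is the usual convention but is not stated on the slide.

```python
from math import ceil
from statistics import NormalDist

def sample_size_for_proportion(margin, confidence=0.95, p_hat=None):
    """Sample size needed to estimate p within the given margin of error E."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # z_{alpha/2}
    # When no estimate of p is available, p_hat * q_hat is taken as 0.25.
    pq = 0.25 if p_hat is None else p_hat * (1 - p_hat)
    return ceil(z**2 * pq / margin**2)  # round up (assumed convention)

n_unknown = sample_size_for_proportion(0.03)            # p_hat not given
n_known = sample_size_for_proportion(0.03, p_hat=0.3)   # p_hat given
print(n_unknown, n_known)
```

Using 0.25 gives the largest possible sample size, since p̂q̂ is at its maximum when p̂ = 0.5.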
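Finding $\hat{p}$ (point estimate) and E from the Confidence Interval
Point estimate:
• $\hat{p} = \frac{(\text{upper confidence limit}) + (\text{lower confidence limit})}{2}$
Margin of Error:
• $E = \frac{(\text{upper confidence limit}) - (\text{lower confidence limit})}{2}$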
Confidence Interval for the Population Mean ($\sigma$ known)
• Check Requirements:
• Sample is a simple random sample
• Population standard deviation $\sigma$ is known
• Population is normally distributed or n > 30
• Procedure
1) Check normality requirements
2) Get the critical value $z_{\alpha/2}$
3) Evaluate the margin of error: $E = z_{\alpha/2} \cdot \frac{\sigma}{\sqrt{n}}$
4) Confidence Interval:
• $\bar{x} - E < \mu < \bar{x} + E$
• $\bar{x} \pm E$
• $(\bar{x} - E, \bar{x} + E)$
5) Interpret results
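When the population standard deviation is known, the interval for the mean uses the margin of error E = z_{α/2} · σ/√n. A minimal sketch; the function name and the numbers are hypothetical, and the requirements (simple random sample, normal population or n > 30) are assumed to have been checked beforehand.

```python
from math import sqrt
from statistics import NormalDist

def mean_ci_known_sigma(x_bar, sigma, n, confidence=0.95):
    """Confidence interval for the population mean when sigma is known."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # z_{alpha/2}
    e = z * sigma / sqrt(n)                             # margin of error
    return x_bar - e, x_bar + e

# Hypothetical: sample mean 100 from n = 36 values, known sigma = 15.
lower, upper = mean_ci_known_sigma(100, 15, 36)
print(round(lower, 2), round(upper, 2))
```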
Sample size for estimating Mean
$n = \left[\frac{z_{\alpha/2} \cdot \sigma}{E}\right]^2$
Values of $\sigma$, $z_{\alpha/2}$ and E are given.
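The sample size needed to estimate a mean, n = [z_{α/2} · σ / E]², can be computed as below. Assumptions: the function name is hypothetical, the critical value is computed from the confidence level instead of being supplied, and the result is rounded up to a whole number (a common convention not stated on the slide).

```python
from math import ceil
from statistics import NormalDist

def sample_size_for_mean(sigma, margin, confidence=0.95):
    """Sample size needed to estimate the mean within margin of error E."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # z_{alpha/2}
    return ceil((z * sigma / margin) ** 2)              # round up (assumed)

# Hypothetical: sigma = 15, desired margin of error E = 2, 95% confidence.
n_needed = sample_size_for_mean(15, 2)
print(n_needed)
```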
Confidence Interval for the Population Mean ($\sigma$ unknown)
• In this case we use the Student t distribution with n-1 degrees of freedom
• Check Requirements:
• Sample is a simple random sample
• Population standard deviation is estimated by s (sample standard dev.)
• Population is normally distributed or n > 30
• Procedure
1) Check normality requirements
2) Get the critical value $t_{\alpha/2}$ with n-1 degrees of freedom
3) Evaluate the margin of error: $E = t_{\alpha/2} \cdot \frac{s}{\sqrt{n}}$
4) Confidence Interval:
• $\bar{x} - E < \mu < \bar{x} + E$
• $\bar{x} \pm E$
• $(\bar{x} - E, \bar{x} + E)$
5) Interpret results
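With σ unknown, the margin of error becomes E = t_{α/2} · s/√n. Python's standard library has no Student t distribution, so this sketch takes the critical value as an argument, read from a t table. The function name and the data are made up, and the population is assumed to be normally distributed because n is small.

```python
from math import sqrt
from statistics import mean, stdev

def mean_ci_unknown_sigma(sample, t_critical):
    """Confidence interval for the mean, using s and a t critical value."""
    n = len(sample)
    x_bar = mean(sample)
    s = stdev(sample)              # sample standard deviation (n - 1)
    e = t_critical * s / sqrt(n)   # margin of error
    return x_bar - e, x_bar + e

sample = [2, 4, 4, 4, 5, 5, 7, 9]   # made-up data, n = 8
# 95% confidence with n - 1 = 7 degrees of freedom: t = 2.365 (t table).
lower, upper = mean_ci_unknown_sigma(sample, t_critical=2.365)
print(round(lower, 3), round(upper, 3))
```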
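Finding point estimate and E from Confidence Interval
Point estimate of $\mu$:
• $\bar{x} = \frac{(\text{upper confidence limit}) + (\text{lower confidence limit})}{2}$
Margin of Error:
• $E = \frac{(\text{upper confidence limit}) - (\text{lower confidence limit})}{2}$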