Part I Data
1.1 Examining distributions.
※ Categorical and Quantitative variables.
※ Displaying distribution with graphs.
Categorical variables: 1. Bar graphs and pie charts; 2. Pareto charts.
Quantitative variables: 1. Histograms; 2. Stemplots.
※ Symmetric and skewed distributions.
Symmetric
Skewed to the right
Skewed to the left
1.2 Describing distributions with numbers.
※ The Mean $\bar{x}$.
If the n observations are $x_1, x_2, \ldots, x_n$, their mean is
$\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{1}{n}\sum_{i=1}^{n} x_i$
※ The Median M.
The median M is the midpoint of a distribution, the number such that half the observations
are smaller and the other half are larger. To find the median of a distribution:
1. Arrange all observations in order of size, from smallest to largest.
2. If the number of observations n is odd, $M = X_{(n+1)/2}$, the center observation.
3. If the number of observations n is even, $M = \frac{X_{n/2} + X_{n/2+1}}{2}$, the mean of the two center observations.
※ The standard deviation S.
The variance of n observations $x_1, x_2, \ldots, x_n$ is $S^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}$.
The standard deviation S is the square root of the variance $S^2$: $S = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}}$.
1.3 The Normal distribution.
※ The 68%, 95%, 99.7% rule.
In the Normal distribution with mean $\mu$ and standard deviation $\sigma$:
1. 68% of the observations fall within $\sigma$ of the mean $\mu$.
2. 95% of the observations fall within $2\sigma$ of the mean $\mu$.
3. 99.7% of the observations fall within $3\sigma$ of the mean $\mu$.
※ The standard Normal distribution is the Normal distribution N(0, 1).
If $X \sim N(\mu, \sigma^2)$, then $Z = \frac{X - \mu}{\sigma} \sim N(0, 1)$.
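Standardization can be sketched as follows (the parameters $\mu = 100$, $\sigma = 15$ are an assumed example); `NormalDist` is in the Python standard library from 3.8 on.

```python
# Sketch: standardizing X ~ N(mu, sigma^2) to Z ~ N(0, 1).
from statistics import NormalDist

mu, sigma = 100.0, 15.0          # assumed example parameters, not from the notes
x = 130.0

z = (x - mu) / sigma             # Z = (X - mu) / sigma
p = NormalDist(0, 1).cdf(z)      # P(Z <= z) under N(0, 1)

# The same probability computed directly from N(mu, sigma^2):
p_direct = NormalDist(mu, sigma).cdf(x)
print(z, p, p_direct)
```

The two probabilities agree, which is exactly what the standardization identity claims.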
※ Normal quantile plot or Normal probability plots
1.1 Arrange the observed data values from smallest to largest.
1.2 Record what percentile of the data each value occupies.
2.1 Find the same percentiles for the Normal distribution using Table A or statistical software.
2.2 Percentiles of the standard Normal distribution are often called Normal scores.
3.1 Plot each data point x against the corresponding Normal score z.
※ If the data distribution is close to any Normal distribution, the plotted points will lie close
to some straight line.
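The quantile-plot steps above can be sketched with a small assumed sample; the $(i - 0.5)/n$ percentile convention is one common choice, not something the notes specify.

```python
# Sketch of the Normal quantile plot steps for a small assumed sample.
from statistics import NormalDist

data = [3.1, 4.5, 2.8, 5.0, 3.9]
data.sort()                                   # step 1: order the observations
n = len(data)

# step 2: percentile occupied by each value (the (i - 0.5)/n convention)
percentiles = [(i - 0.5) / n for i in range(1, n + 1)]

# step 3: the matching standard Normal scores z
scores = [NormalDist(0, 1).inv_cdf(p) for p in percentiles]

# Plotting each x against its z (e.g. with matplotlib) gives the quantile plot;
# points lying near a straight line suggest approximate Normality.
for x, z in zip(data, scores):
    print(x, z)
```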
※ Correlation r
The correlation measures the direction and strength of the linear relationship between two
quantitative variables.
Correlation is usually written as r.
Suppose that we have data on variables x and y for n individuals. The values for the first
individual are x1 and y1, the values for the second individual are x2 and y2, and so on. The
means and standard deviations of the two variables are $\bar{x}$ and $S_x$ for the x-values, and $\bar{y}$
and $S_y$ for the y-values. The correlation r between x and y is:
$r = \frac{1}{n-1} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{S_x} \right) \left( \frac{y_i - \bar{y}}{S_y} \right)$
※ r measures only straight-line relationships.
※ r>0 for a positive association.
r<0 for a negative association.
※ Correlation always satisfies $-1 \le r \le 1$ and indicates the strength of a relationship by
how close it is to $-1$ or $1$. Perfect correlation, $r = \pm 1$, occurs only when the points on a
scatterplot lie exactly on a straight line.
※ Correlation ignores the distinction between explanatory and response variables. The
value of r is not affected by changes in the unit of measurement of either variable.
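The correlation formula translates directly into a sketch (the data pairs are an assumed example); note `statistics.stdev` uses the $n-1$ divisor, matching $S_x$ and $S_y$ above.

```python
# Sketch of r = (1/(n-1)) * sum of standardized-x times standardized-y.
import statistics

xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.1, 5.9, 8.2, 10.1]   # roughly linear in x, so r should be near 1

n = len(xs)
x_bar, y_bar = statistics.mean(xs), statistics.mean(ys)
s_x, s_y = statistics.stdev(xs), statistics.stdev(ys)    # n-1 divisor

r = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
        for x, y in zip(xs, ys)) / (n - 1)
print(r)
```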
※ A binomial experiment possesses the following properties:
1. The experiment consists of a fixed number, n, of identical trials.
2. Each trial results in one of two outcomes. We will call one outcome success, S, and the
other failure, F.
3. The probability of success on a single trial is equal to some value p and remains the
same from trial to trial. The probability of a failure is equal to $q = 1 - p$.
4. The trials are independent.
5. The random variable of interest is Y, the number of successes observed during the n trials.
※ A random variable X is said to have a binomial distribution based on n trials with success
probability p if and only if:
p ( x) C xn p x (1 p ) n x , x 0,1,2,  , n , and 0 p 1 , X ~ bin(n, p )
Ex. Suppose that a lot of 5000 electrical fuses contains 5% defectives. If a sample of five
fuses is tested, find the probability of observing at least one defective. Here $X \sim \mathrm{bin}(5, 0.05)$:
P(at least one defective) $= P(X \ge 1) = 1 - P(X = 0) = 1 - \binom{5}{0} p^0 (1-p)^5 = 1 - (0.95)^5 \approx 0.226$
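The fuse example can be checked numerically with the binomial pmf; `math.comb` (Python 3.8+) supplies the binomial coefficient.

```python
# Checking the fuse example: X ~ bin(5, 0.05), P(at least one defective).
from math import comb

def binom_pmf(n, p, x):
    """P(X = x) = C(n, x) * p^x * (1-p)^(n-x)."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

n, p = 5, 0.05
p_at_least_one = 1 - binom_pmf(n, p, 0)   # 1 - (0.95)^5
print(round(p_at_least_one, 3))
```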
※ Let $X \sim \mathrm{bin}(n, p)$. Then $\mu = E(X) = np$ and $\sigma^2 = \mathrm{Var}(X) = npq$.
※ A random variable Y is said to have a geometric probability distribution if and only if:
$p(y) = q^{y-1} p$, $y = 1, 2, 3, \ldots$, $0 \le p \le 1$.
If Y is a random variable with a geometric distribution, $E(Y) = 1/p$ and
$\sigma^2 = \mathrm{Var}(Y) = \frac{1-p}{p^2}$.
Ex. If the probability of engine malfunction during any 1-hour period is p = 0.02 and Y
denotes the number of 1-hour intervals until the first malfunction, find the mean and standard
deviation of Y.
$Y \sim \mathrm{Geo}(p = 0.02)$: $E(Y) = 1/p = 1/0.02 = 50$, $\mathrm{Var}(Y) = q/p^2 = 0.98/0.0004 = 2450$,
$\sigma = \sqrt{\mathrm{Var}(Y)} = \sqrt{2450} \approx 49.497$.
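The engine example's mean and variance can be verified by summing a long truncation of the geometric pmf (truncating at 5000 terms is an arbitrary choice; the tail beyond it is negligibly small for p = 0.02).

```python
# Verifying Y ~ Geo(p = 0.02): E(Y) = 1/p = 50 and Var(Y) = (1-p)/p^2 = 2450,
# checked by summing a truncated series of the pmf q^(y-1) * p.
p = 0.02
q = 1 - p

mean = sum(y * q**(y - 1) * p for y in range(1, 5001))
second_moment = sum(y**2 * q**(y - 1) * p for y in range(1, 5001))
var = second_moment - mean**2

print(mean, var)   # close to 50 and 2450
```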
※ A random variable Y is said to have a negative binomial probability distribution if and
only if $p(y) = \binom{y-1}{r-1} p^r q^{y-r}$, $y = r, r+1, r+2, \ldots$, $0 \le p \le 1$.
If $Y \sim \mathrm{Nbin}(p, r)$, then $E(Y) = r/p$ and $\sigma^2 = \mathrm{Var}(Y) = r(1-p)/p^2$.
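A small sketch of the negative binomial pmf (the parameters r = 3, p = 0.4 are assumed for illustration); truncated sums should approach total probability 1 and the mean $r/p$.

```python
# Sketch of the negative binomial pmf: p(y) = C(y-1, r-1) p^r q^(y-r),
# the probability that the r-th success occurs on trial y.
from math import comb

def nbinom_pmf(y, r, p):
    return comb(y - 1, r - 1) * p**r * (1 - p)**(y - r)

r, p = 3, 0.4
total = sum(nbinom_pmf(y, r, p) for y in range(r, 500))       # should approach 1
mean = sum(y * nbinom_pmf(y, r, p) for y in range(r, 500))    # should approach r/p
print(total, mean)   # close to 1 and r/p = 7.5
```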
※ A random variable Y is said to have a Poisson probability distribution if and only if:
$P(y) = \frac{\lambda^y}{y!} e^{-\lambda}$, $y = 0, 1, 2, \ldots$, $\lambda > 0$.
Ex. Suppose that $Y \sim \mathrm{bin}(20, 0.1)$; find the exact value of $P(Y \le 3)$ using the table of
binomial probabilities, then approximate it with the corresponding probability
given by the Poisson distribution.
$P(Y \le 3) = 0.867$; with $W \sim \mathrm{Poisson}(\lambda = np = 20 \times 0.1 = 2)$, $P(W \le 3) = 0.857$.
Thus you can see that the Poisson approximation is quite good, yielding a value that differs
from the exact value by only 0.01.
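Both probabilities in the example can be reproduced directly from the two pmfs:

```python
# Exact binomial P(Y <= 3) for Y ~ bin(20, 0.1) versus the Poisson
# approximation P(W <= 3) with lambda = np = 2.
from math import comb, exp, factorial

def binom_cdf(k, n, p):
    return sum(comb(n, x) * p**x * (1 - p)**(n - x) for x in range(k + 1))

def poisson_cdf(k, lam):
    return sum(exp(-lam) * lam**y / factorial(y) for y in range(k + 1))

exact = binom_cdf(3, 20, 0.1)     # about 0.867
approx = poisson_cdf(3, 2.0)      # about 0.857
print(round(exact, 3), round(approx, 3))
```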
※ If Y is a random variable possessing a Poisson distribution with parameter $\lambda$, then
$E(Y) = \lambda$ and $\sigma^2 = \mathrm{Var}(Y) = \lambda$.
※ Chebysheff's Theorem
Let X be a random variable with mean $\mu$ and finite variance $\sigma^2$. Then, for any constant
$k > 0$, $P(|X - \mu| < k\sigma) \ge 1 - \frac{1}{k^2}$, or equivalently $P(|X - \mu| \ge k\sigma) \le \frac{1}{k^2}$.
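The bound can be checked on an assumed example distribution (a fair six-sided die; the choice k = 1.2 is arbitrary): the exact tail probability never exceeds $1/k^2$.

```python
# Checking Chebysheff's bound P(|X - mu| >= k*sigma) <= 1/k^2 on a fair die.
values = [1, 2, 3, 4, 5, 6]
probs = [1 / 6] * 6

mu = sum(v * p for v, p in zip(values, probs))            # 3.5
var = sum((v - mu) ** 2 * p for v, p in zip(values, probs))
sigma = var ** 0.5

k = 1.2
tail = sum(p for v, p in zip(values, probs) if abs(v - mu) >= k * sigma)
print(tail, 1 / k**2)   # the tail probability is below the bound 1/k^2
```

The bound is usually loose (here the tail is 1/3 against a bound of about 0.69), which is the price of holding for every distribution with finite variance.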