Part I Data

1.1 Examining distributions.

※ Categorical and quantitative variables.
※ Displaying distributions with graphs.
Categorical variables: 1. Bar graphs and pie charts. 2. Pareto charts.
Quantitative variables: 1. Histograms. 2. Stemplots.
※ Symmetric and skewed distributions.
[Figure: three distribution shapes: symmetric, skewed to the right, skewed to the left.]

1.2 Describing distributions with numbers.

※ The mean $\bar{x}$.
If the n observations are $x_1, x_2, \ldots, x_n$, their mean is
$\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{1}{n}\sum_{i=1}^{n} x_i$

※ The median M.
The median M is the midpoint of a distribution, the number such that half the observations are smaller and the other half are larger. To find the median of a distribution:
1. Arrange all observations in order of size, from smallest to largest, and write $X_{(k)}$ for the k-th smallest observation.
2. If the number of observations n is odd, $M = X_{((n+1)/2)}$.
3. If the number of observations n is even, $M = \frac{1}{2}\left(X_{(n/2)} + X_{(n/2+1)}\right)$.

※ The standard deviation S.
The variance of n observations $x_1, x_2, \ldots, x_n$ is
$S^2 = \frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n-1}$
The standard deviation S is the square root of the variance $S^2$:
$S = \sqrt{\frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n-1}}$

1.3 The Normal distribution.

※ The 68%, 95%, 99.7% rule.
In the Normal distribution with mean $\mu$ and standard deviation $\sigma$:
1. Approximately 68% of the observations fall within $\sigma$ of the mean $\mu$.
2. Approximately 95% of the observations fall within $2\sigma$ of the mean $\mu$.
3. Approximately 99.7% of the observations fall within $3\sigma$ of the mean $\mu$.

※ The standard Normal distribution is the Normal distribution N(0,1).
If $X \sim N(\mu, \sigma^2)$, then $Z = \frac{X - \mu}{\sigma} \sim N(0,1)$.

※ Normal quantile plots (Normal probability plots).
1. Arrange the observed data values from smallest to largest, and record what percentile of the data each value occupies.
2. Find the same percentiles for the Normal distribution using Table A or statistical software. Percentiles of the standard Normal distribution are often called Normal scores.
3. Plot each data point x against the corresponding Normal score z.
※ If the data distribution is close to any Normal distribution, the plotted points will lie close to some straight line.
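The formulas in sections 1.2 and 1.3 can be checked numerically. The following is a minimal sketch in Python using only the standard library. The data set is made up for illustration, and the percentile convention (i - 0.5)/n used for the Normal scores is an assumption: it is one common choice, and the notes do not say which convention Table A or a given software package uses.

```python
import math
from statistics import NormalDist  # standard library, Python 3.8+


def mean(xs):
    """Mean: sum of the observations divided by n."""
    return sum(xs) / len(xs)


def median(xs):
    """Median M: middle ordered value (n odd) or average of the two middle values (n even)."""
    s = sorted(xs)
    n = len(s)
    if n % 2 == 1:
        return s[(n + 1) // 2 - 1]  # the (n+1)/2-th smallest, converted to a 0-based index
    return 0.5 * (s[n // 2 - 1] + s[n // 2])


def sample_variance(xs):
    """Variance S^2 with the n - 1 divisor."""
    xbar = mean(xs)
    return sum((x - xbar) ** 2 for x in xs) / (len(xs) - 1)


def sample_sd(xs):
    """Standard deviation S: square root of the variance."""
    return math.sqrt(sample_variance(xs))


def within_k_sd(k):
    """Probability that a Normal variable falls within k standard deviations of its mean."""
    z = NormalDist()  # standard Normal N(0, 1)
    return z.cdf(k) - z.cdf(-k)


def normal_scores(xs):
    """Pair each ordered observation with a Normal score.

    Assumption: the i-th smallest of n values sits at percentile (i - 0.5) / n.
    """
    z = NormalDist()
    n = len(xs)
    return [(x, z.inv_cdf((i - 0.5) / n)) for i, x in enumerate(sorted(xs), start=1)]


# Hypothetical data, chosen only so the arithmetic is easy to follow by hand.
data = [2, 4, 4, 4, 5, 5, 7, 9]

print("mean   =", mean(data))
print("median =", median(data))
print("S^2    =", sample_variance(data))
print("S      =", sample_sd(data))
for k in (1, 2, 3):
    print(f"within {k} sd:", round(within_k_sd(k), 4))
for x, score in normal_scores(data):
    print(x, round(score, 3))
```

Plotting the pairs returned by `normal_scores` (x against z) gives the Normal quantile plot described above; a roughly straight pattern suggests the data are close to Normal.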
※ Correlation r.
The correlation measures the direction and strength of the linear relationship between two quantitative variables. Correlation is usually written as r.
Suppose that we have data on variables x and y for n individuals. The values for the first individual are $x_1$ and $y_1$, the values for the second individual are $x_2$ and $y_2$, and so on. The means and standard deviations of the two variables are $\bar{x}$ and $S_x$ for the x-values, and $\bar{y}$ and $S_y$ for the y-values. The correlation r between x and y is:
$r = \frac{1}{n-1}\sum_{i=1}^{n}\left(\frac{x_i - \bar{x}}{S_x}\right)\left(\frac{y_i - \bar{y}}{S_y}\right)$
※ r measures only straight-line relationships.
※ r > 0 for a positive association; r < 0 for a negative association.
※ Correlation always satisfies $-1 \le r \le 1$ and indicates the strength of a relationship by how close it is to $-1$ or $1$. Perfect correlation, $r = \pm 1$, occurs only when the points on a scatterplot lie exactly on a straight line.
※ Correlation ignores the distinction between explanatory and response variables. The value of r is not affected by changes in the unit of measurement of either variable.

※ A binomial experiment possesses the following properties:
1. The experiment consists of a fixed number, n, of identical trials.
2. Each trial results in one of two outcomes. We will call one outcome success, S, and the other failure, F.
3. The probability of success on a single trial is equal to some value p and remains the same from trial to trial. The probability of a failure is equal to $q = 1 - p$.
4. The trials are independent.
5. The random variable of interest is Y, the number of successes observed during the n trials.

※ A random variable X is said to have a binomial distribution based on n trials with success probability p if and only if:
$p(x) = C_x^n \, p^x (1-p)^{n-x}, \quad x = 0, 1, 2, \ldots, n, \quad 0 \le p \le 1$
This is written $X \sim \text{bin}(n, p)$.

Ex. Suppose that a lot of 5000 electrical fuses contains 5% defectives. If a sample of five fuses is tested, find the probability of observing at least one defective.
Solution. Let X be the number of defectives in the sample. Because the lot is large relative to the sample, the five trials are approximately independent, each with defect probability p = 0.05, so $X \sim \text{bin}(5, 0.05)$.
$P(\text{at least one defective}) = P(X \ge 1) = 1 - P(X = 0) = 1 - C_0^5 \, p^0 (1-p)^5 = 1 - (0.95)^5 = 0.226$

※ Let $X \sim \text{bin}(n, p)$. Then $\mu = E(X) = np$ and $\sigma^2 = Var(X) = npq$.

※ A random variable Y is said to have a geometric probability distribution if and only if:
$p(y) = q^{y-1} p, \quad y = 1, 2, 3, \ldots, \quad 0 \le p \le 1$
If Y is a random variable with a geometric distribution, then
$\mu = E(Y) = \frac{1}{p}, \quad \sigma^2 = Var(Y) = \frac{1-p}{p^2}$

Ex. If the probability of engine malfunction during any 1-hour period is p = 0.02 and Y denotes the number of 1-hour intervals until the first malfunction, find the mean and standard deviation of Y.
$Y \sim \text{Geo}(p = 0.02)$
$E(Y) = 1/p = 1/0.02 = 50$
$Var(Y) = q/p^2 = 0.98/0.0004 = 2450$
$\sigma = \sqrt{Var(Y)} = \sqrt{2450} = 49.497$

※ A random variable Y is said to have a negative binomial probability distribution if and only if:
$p(y) = C_{r-1}^{y-1} \, p^r q^{y-r}, \quad y = r, r+1, r+2, \ldots, \quad 0 \le p \le 1$
If $Y \sim \text{Nbin}(p, r)$, then $\mu = E(Y) = \frac{r}{p}$ and $\sigma^2 = Var(Y) = \frac{r(1-p)}{p^2}$.

※ A random variable Y is said to have a Poisson probability distribution if and only if:
$p(y) = \frac{\lambda^y}{y!} e^{-\lambda}, \quad y = 0, 1, 2, \ldots, \quad \lambda > 0$

Ex. Suppose that $Y \sim \text{bin}(20, 0.1)$. Find the exact value of $P(Y \le 3)$ using the table of binomial probabilities, then approximate this probability using the corresponding probability given by the Poisson distribution.
Exact: $P(Y \le 3) = 0.867$.
Approximation: $W \sim \text{Poisson}(\lambda = np = 20 \times 0.1 = 2)$, $P(W \le 3) = 0.857$.
The Poisson approximation is quite good here, yielding a value that differs from the exact value by only 0.01.

※ If Y is a random variable possessing a Poisson distribution with parameter $\lambda$, then $\mu = E(Y) = \lambda$ and $\sigma^2 = Var(Y) = \lambda$.

※ Chebysheff's Theorem.
Let X be a random variable with mean $\mu$ and finite variance $\sigma^2$. Then, for any constant k > 0,
$P(|X - \mu| < k\sigma) \ge 1 - \frac{1}{k^2}$ or, equivalently, $P(|X - \mu| \ge k\sigma) \le \frac{1}{k^2}$
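The worked examples above can be reproduced with a few lines of Python. This is a minimal sketch using only the standard library; the function names are illustrative choices, not part of the notes, and the binomial and Poisson probabilities are computed directly from the formulas rather than read from a table.

```python
import math


def binom_pmf(x, n, p):
    """Binomial probability p(x) = C(n, x) * p^x * (1 - p)^(n - x)."""
    return math.comb(n, x) * p**x * (1 - p) ** (n - x)


def binom_cdf(x, n, p):
    """P(X <= x) for X ~ bin(n, p), summing the pmf term by term."""
    return sum(binom_pmf(k, n, p) for k in range(x + 1))


def poisson_pmf(y, lam):
    """Poisson probability p(y) = lam^y * e^(-lam) / y!."""
    return math.exp(-lam) * lam**y / math.factorial(y)


def poisson_cdf(y, lam):
    """P(Y <= y) for Y ~ Poisson(lam)."""
    return sum(poisson_pmf(k, lam) for k in range(y + 1))


def geometric_mean_var(p):
    """Mean 1/p and variance (1 - p)/p^2 of a geometric random variable."""
    return 1 / p, (1 - p) / p**2


def chebysheff_lower_bound(k):
    """Lower bound 1 - 1/k^2 on P(|X - mu| < k * sigma)."""
    return 1 - 1 / k**2


# Fuse example: X ~ bin(5, 0.05), P(X >= 1) = 1 - P(X = 0).
p_at_least_one = 1 - binom_pmf(0, 5, 0.05)
print("P(at least one defective) =", round(p_at_least_one, 3))

# Engine example: Y ~ Geo(0.02).
mu, var = geometric_mean_var(0.02)
sd = math.sqrt(var)
print("E(Y) =", round(mu, 3), " Var(Y) =", round(var, 3), " sd =", round(sd, 3))

# Poisson approximation to bin(20, 0.1) with lam = n * p = 2.
exact = binom_cdf(3, 20, 0.1)
approx = poisson_cdf(3, 20 * 0.1)
print("exact =", round(exact, 3), " Poisson approx =", round(approx, 3))

# Chebysheff bound for k = 2: at least 75% of the probability lies within 2 sigma.
print("bound for k = 2:", chebysheff_lower_bound(2))
```

Comparing `chebysheff_lower_bound(2)` (0.75) with the 95% figure from the Normal rule shows how conservative the bound is: it must hold for every distribution with finite variance, not only the Normal.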