Location: Chemistry 001
Time: Friday, Nov 20, 7 pm to 9 pm

Make copies of:
o Areas under the Normal Curve: Appendix B.1, page 784
o Student’s t Distribution: Appendix B.2, page 785
o Binomial Probability Distribution: Appendix B.9, pages 794, 798
\[ p(x) = P(X = x) = \binom{n}{x} p^{x} (1 - p)^{n - x} \]

Constructing a Frequency Distribution
Decide on the number of classes to group the data into.
Rule: $2^k$ greater than n, where k = number of classes and n = number of observations.
Ex.: if we have 80 observations, we should use 7 classes ($2^6 = 64 < 80$; $2^7 = 128 > 80$).
Determine the class interval or width. It should be the same for all classes.
Rule:
\[ i \ge \frac{H - L}{k} \]

Constructing a Frequency Distribution
Count the number of items in each class. The number of items in each class is called the class frequency.

Selling Prices ($000)    Frequency
15 up to 18               8
18 up to 21              23
21 up to 24              17
24 up to 27              18
27 up to 30               8
30 up to 33               4
33 up to 36               2
Total                    80

Graphic Presentation
o Histogram
o Frequency Polygon
o Cumulative Frequency Polygon

Histogram
[Figure: histogram of Number of Vehicles (vertical axis) against Selling Price (horizontal axis); bar heights 8, 23, 17, 18, 8, 4, 2]

Frequency Polygon
A Frequency Polygon consists of line segments connecting the points formed by the class midpoint and the class frequency.
[Figure: frequency polygon of Frequencies (vertical axis) against Selling Price (horizontal axis, class midpoints 13.5 to 37.5)]

Cumulative Frequency Distribution
A Cumulative Frequency Distribution is used to determine how many or what proportion of the data values are below or above a certain value.

Selling Prices ($000)    Frequency    Cumulative Frequency
15 up to 18               8            8
18 up to 21              23           31  (8 + 23)
21 up to 24              17           48  (8 + 23 + 17)
24 up to 27              18           66
27 up to 30               8           74
30 up to 33               4           78
33 up to 36               2           80
Total                    80

Cumulative Frequency Polygon
[Figure: cumulative frequency polygon of Number of vehicles sold (left axis, 0 to 80; right axis, 0 to 100%) against Selling Price ($000), 15 to 36]

Chapter 3
Measures of Location
o Arithmetic mean:
\[ \mu = \frac{\sum X}{N} \]
o Weighted mean:
\[ \bar{X}_w = \frac{w_1 X_1 + w_2 X_2 + \dots + w_n X_n}{w_1 + w_2 + \dots + w_n} \]
o Median
o Mode
o If Median < Mean, the distribution is positively skewed.
o If Median > Mean, the distribution is negatively skewed.

Measures of Dispersion
Dispersion refers to the spread or variability in the data.
o Range
o Mean deviation:
\[ MD = \frac{\sum |X - \bar{X}|}{n} \]
o Variance:
Population variance:
\[ \sigma^2 = \frac{\sum (X - \mu)^2}{N} \]
Sample variance:
\[ s^2 = \frac{\sum (X - \bar{X})^2}{n - 1} \]
o Standard deviation:
\[ \sigma = \sqrt{\sigma^2} \]

Mean of Grouped Data (from a frequency distribution)
\[ \bar{X} = \frac{\sum f M}{n}, \qquad s = \sqrt{\frac{\sum f (M - \bar{X})^2}{n - 1}} \]
f is the frequency in each class
M is the midpoint of each class
n is the total number of frequencies

Chapter 4
Percentiles
Location of a percentile:
\[ L_p = (n + 1)\frac{P}{100} \]
n = number of observations
P = desired percentile
o Q1 = L25, Q2 = Median = L50, Q3 = L75.
o 7th decile = L70.
Example: 43, 61, 75, 91, 101, 104
L25 = (6 + 1)(25/100) = 1.75 (the 1.75th position), that is, 0.75 of the distance between the 1st and 2nd observations: 43 + (61 - 43)(0.75) = 56.5.

Box Plot
L = 13, H = 30, Q1 = 15, Q2 = 18, Q3 = 22.
A value is an outlier if value > Q3 + 1.5(Q3 - Q1) or if value < Q1 - 1.5(Q3 - Q1), where Q3 - Q1 is the interquartile range.
[Figure: box plot marking L, Q1, Median, Q3, and H on an axis running from 12 to 32]

Ex.: Using the twelve stock prices, we find the mean to be 84.42, the standard deviation 7.18, and the median 84.5.
Coefficient of variation:
\[ CV = \frac{s}{\bar{X}} (100\%) = 8.5\% \]
Coefficient of skewness:
\[ sk = \frac{3(\bar{X} - \text{Median})}{s} = -0.035 \]

Chapter 5
Classical Probability
Based on the assumption that the outcomes of an experiment are equally likely.
Probability of an event = Number of favorable outcomes / Total number of possible outcomes.
Example: We roll a die. What is the probability of the event “an even number appears face up”?
Possible outcomes are: 1, 2, 3, 4, 5, 6 (6 outcomes)
Favorable outcomes are: 2, 4, 6 (3 outcomes)
Probability of an even number = 3/6 = 0.5

Empirical Probability
Based on relative frequency.
Probability of an event = Number of times the event occurred in the past / Total number of observations.
Example: What is the probability of a future space shuttle mission being successful, given that 2 out of the last 113 missions ended in disaster?
Probability of a successful mission = Number of successful flights / Total number of flights.
P(A) = 111 / 113 = 0.98

Conditional Probability
A conditional probability is the probability of a particular event occurring, given that another event has occurred.
The probability of the event A given that the event B has occurred is written P(A|B).
Two events A and B are independent if the occurrence of one has no effect on the probability of the occurrence of the other: P(A|B) = P(A) or P(B|A) = P(B).

Rules for Computing Probabilities: Addition Rule
Rules of Addition
Special Rule of Addition: If two events A and B are mutually exclusive, the probability of one or the other event occurring equals the sum of their probabilities.
P(A or B) = P(A) + P(B)
The General Rule of Addition: If A and B are two events that are not mutually exclusive, then P(A or B) is given by the following formula:
P(A or B) = P(A) + P(B) - P(A and B)
[Figure: Venn diagram of events A and B; the overlap "A and B" is the joint probability of A and B]

Contingency Table
Example: The Dean of the School of Business at Owens University collected the following information about undergraduate students in her college.

Major         Male    Female    Total
Accounting    170     110       280
Finance       120     100       220
Marketing     160      70       230
Management    150     120       270
Total         600     400       1000

Chapter 6
Constructing a PDF and a CDF for a Discrete Random Variable
Example: Toss a coin three times and let X be the number of heads. What are the PDF and CDF of X?

Outcome    Prob.    X
HHH        1/8      3
HHT        1/8      2
HTH        1/8      2
HTT        1/8      1
THH        1/8      2
THT        1/8      1
TTH        1/8      1
TTT        1/8      0

x    P(X = x)    F(x) = P(X ≤ x)
0    1/8         1/8
1    3/8         1/8 + 3/8 = 1/2
2    3/8         1/2 + 3/8 = 7/8
3    1/8         1

Expected Value (Mean)
Mathematically: The expected value (or mean) of a RV X is
\[ \mu = E(X) = \sum_{\text{all } x} x \, p(x) \]
Sometimes we write $\mu_X$.
Additivity: E(X + Y) = E(X) + E(Y)

Variance and Standard Deviation
A measure of the variability of a RV is its variance.
To compute the variance of a discrete RV X:
o Compute µ
o For each possible x, compute $(x - \mu)^2 p(x)$
o Add up these values
It helps to construct a table.
In a formula:
\[ \sigma^2 = Var(X) = \sum_{\text{all } x} (x - \mu)^2 p(x) \]
OR
\[ \sigma^2 = Var(X) = \sum_{\text{all } x} x^2 p(x) - \mu^2 \]
Standard Deviation (SD):
\[ \sigma = \sqrt{Var(X)} \]

Variance and Standard Deviation
Consider the pdf of the random variable:

x       0      1      2      3
p(x)    1/8    3/8    3/8    1/8

µ = 3/2
Var(X) = $(0 - 3/2)^2 (1/8) + (1 - 3/2)^2 (3/8) + (2 - 3/2)^2 (3/8) + (3 - 3/2)^2 (1/8)$ = 3/4
What is the CDF at 2? F(2) = P(X=2) + P(X=1) + P(X=0) = 7/8

The Binomial Distribution
Let X be the number of “successes” in n independent “trials,” each with success probability p. Such an X is a Binomial R.V. with parameters n and p:
\[ p(x) = P(X = x) = \binom{n}{x} p^{x} (1 - p)^{n - x}, \qquad \text{where } \binom{n}{x} = \frac{n!}{x!\,(n - x)!} \]
n is the number of trials
x is the number of observed successes, x = 0, …, n
p is the probability of success on each trial
What are the mean and variance of a Binomial Random Variable?
In the book, the probability p is denoted by π:
\[ p(x) = {}_{n}C_{x} \, \pi^{x} (1 - \pi)^{n - x}, \qquad \text{where } {}_{n}C_{x} = \frac{n!}{x!\,(n - x)!} \]

The Binomial Distribution
An important part of understanding probability/statistics is recognizing a “binomial situation.”

Binomial example: Number of defective products in a sample of items
    n = number of items, p = probability of a product being defective
Binomial example: Number of students in this class who are in senior year
    n = number of students in this class, p = probability of a student being a senior
Binomial example: Number of no-shows for a flight
    n = number of passengers, p = probability of a no-show
Binomial example: Number of times next week I’ll get stuck in traffic on my way to school
    n = number of work days per week, p = probability I get stuck in traffic

Chapter 7
Continuous Probability Distributions
For a Discrete RV X, the pdf is given by p(x) = P(X = x) for all possible values of x.
For a Continuous RV X, P(X = x) = 0 for all values of x.
Example: If X is the amount of time you wait in line at Starbucks, then P(X = 30.567… seconds) = 0.
The pdf of a continuous RV is represented by a function p(x) for all values of x, where the area under p(x) is 1.
The Uniform and Normal Distributions are commonly used Continuous Distributions.

Uniform Distribution
o The simplest distribution for a continuous random variable.
o Rectangular in shape, with constant (uniform) height.
o Defined by minimum and maximum values a and b.
o Areas within the distribution represent probabilities.
Example: Time to fly on MEA from Beirut to Paris ranges from 4 hrs to 5 hrs. The random variable is flight time; it is continuous.
[Figure: a continuous Uniform Distribution; P(x) (vertical axis) has constant height 1/(b-a) between a and b on the x axis]

Uniform Distribution
Mean:
\[ \mu = \frac{a + b}{2} \]
SD:
\[ \sigma = \sqrt{\frac{(b - a)^2}{12}} \]
Height:
\[ P(x) = \begin{cases} \dfrac{1}{b - a} & \text{if } a \le x \le b, \\ 0 & \text{elsewhere.} \end{cases} \]

The Standard Normal Distribution
The standard normal distribution is a normal distribution with a mean of 0 and a standard deviation of 1. It is also called the z distribution.
A z-value is the signed distance between a selected value, designated X, and the mean µ, divided by the standard deviation σ. The formula is:
\[ z = \frac{X - \mu}{\sigma} \]

The Normal Distribution
We want to know the area under the curve between the mean, 283, and 285.4 grams, that is, P(283 < X < 285.4).
We convert the x values into z values:
z value for 283: z = (x - μ)/σ = (283 - 283)/1.6 = 0
z value for 285.4: z = (285.4 - 283)/1.6 = 1.5
P(283 < weight < 285.4) = P(0 < z < 1.5) = the area under the curve between 0.00 and 1.5 = 0.4332

The Normal Distribution
What is the value of X for which 5% of the values will be larger than X?
What is the value of X below which 95% of the values will fall?
We obtain z from Appendix B.1: z = 1.65.
We then convert to the x value: x = σz + μ.

Chapters 8 and 9
Sampling Error
Samples are used to estimate population characteristics.
It is unlikely that the sample mean (or standard deviation) exactly equals the population mean (or standard deviation).
What error do we make in estimating the population mean based on the sample?
Definition: The difference between a sample statistic and its corresponding population parameter.
Example: The output of each employee is 97, 103, 96, 99, 105 units (population mean µ = 100). Select samples of two and find their means.
Sample 1: {97, 105} with mean = 101. Sampling error = $\bar{X} - \mu = 101 - 100 = 1$
Sample 2: {103, 96} with mean = 99.5. Sampling error = $\bar{X} - \mu = 99.5 - 100 = -0.5$
Sampling errors are random and occur by chance.
To make accurate predictions based on sample results, we first need to develop sampling distributions of the sample means.

Sampling: Distribution of the Sample Mean (Sigma Known)
o If a population follows the normal distribution, the sampling distribution of the sample mean will also follow the normal distribution.
o To determine the probability that a sample mean falls within a particular region, use:
\[ z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}} \]
Note that $\sigma / \sqrt{n}$ is called the Standard Error of the Mean.

Sampling: Distribution of the Sample Mean (Sigma Unknown)
o If the population does not follow the normal distribution, but the sample has at least 30 observations, the sample means will approximately follow the normal distribution.
o To determine the probability that a sample mean falls within a particular region, use:
\[ z = \frac{\bar{X} - \mu}{s / \sqrt{n}} \]

Point Estimate
Definition: The statistic computed from sample information and used to estimate the population parameter.
Examples:
o The sample mean $\bar{X}$ is a point estimate of the population mean µ.
o The sample standard deviation s is a point estimate of the population standard deviation σ.
o The sample proportion p is a point estimate of the population proportion π.

Confidence Interval
Confidence interval equations:
CI Eq1: \[ \bar{X} \pm z \frac{\sigma}{\sqrt{n}} \]
CI Eq2: \[ \bar{X} \pm z \frac{s}{\sqrt{n}} \]
CI Eq3: \[ \bar{X} \pm t \frac{s}{\sqrt{n}} \]

When to Use the t Distribution
Is the population normal?
o No: Is n 30 or more?
    o No: Use a nonparametric test (or assume normal and go through the flow chart again).
    o Yes: Use the z distribution (Eq2 or Eq1).
o Yes: Is the population SD known?
    o No: Use t if n is less than or equal to 30 (Eq3); use z if n is more than 30 (Eq2).
    o Yes: Use the z distribution (Eq1).

Sample Size for Estimating Population Mean
\[ n = \left( \frac{z s}{E} \right)^2 \]
n is the sample size;
z is the standard normal value corresponding to the desired level of confidence;
s is an estimate of the population SD;
E is the maximum allowable error (half the length of the CI).
If the result is not a whole number, round up.

Standard Error of the Sample Proportion
\[ \sigma_p = \sqrt{\frac{p(1 - p)}{n}} \]

Confidence Interval for a Population Proportion
\[ p \pm z \sqrt{\frac{p(1 - p)}{n}} \]

Sample Size for the Population Proportion
Three items need to be specified:
1. The desired level of confidence.
2. The margin of error in the population proportion.
3. An estimate of the population proportion.
\[ n = p(1 - p) \left( \frac{z}{E} \right)^2 \]
If an estimate of π is not available, use p = 0.5 to approximately estimate the sample size.

Finite-Population Correction Factor
If the population size N is not very large, then we use a population correction factor when computing the CI.
If n/N > 0.05, then use:
\[ \bar{X} \pm t \frac{s}{\sqrt{n}} \sqrt{\frac{N - n}{N - 1}}, \qquad \bar{X} \pm z \frac{\sigma}{\sqrt{n}} \sqrt{\frac{N - n}{N - 1}}, \qquad \text{or} \qquad \bar{X} \pm z \frac{s}{\sqrt{n}} \sqrt{\frac{N - n}{N - 1}} \]
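The sample-size and correction-factor formulas above can be sketched in Python as a quick way to check hand calculations. This is only an illustrative sketch: the function names and all numeric inputs are hypothetical and do not come from the course material, and the critical value (z or t) is assumed to be looked up separately in Appendix B.1 or B.2.

```python
import math

def sample_size_mean(z, s, E):
    """n = (z*s/E)^2, rounded up to the next whole number."""
    return math.ceil((z * s / E) ** 2)

def sample_size_proportion(z, p, E):
    """n = p(1-p)(z/E)^2, rounded up to the next whole number."""
    return math.ceil(p * (1 - p) * (z / E) ** 2)

def fpc(n, N):
    """Finite-population correction factor sqrt((N-n)/(N-1))."""
    return math.sqrt((N - n) / (N - 1))

def ci_mean(xbar, sd, n, crit, N=None):
    """Interval xbar +/- crit*sd/sqrt(n).

    crit is the z or t value taken from a table. The correction factor
    is applied only when a population size N is given and n/N > 0.05.
    """
    margin = crit * sd / math.sqrt(n)
    if N is not None and n / N > 0.05:
        margin *= fpc(n, N)
    return xbar - margin, xbar + margin

# Hypothetical inputs, chosen only to exercise the formulas.
print(sample_size_mean(z=1.96, s=20, E=5))            # 62
print(sample_size_proportion(z=1.96, p=0.5, E=0.05))  # 385
print(ci_mean(xbar=100, sd=10, n=25, crit=2.0))       # no correction
print(ci_mean(xbar=100, sd=10, n=25, crit=2.0, N=100))  # with correction
```

With the correction applied, the interval is narrower than the uncorrected one, because the factor is below 1 whenever n > 1.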
For a population proportion, if n/N > 0.05, then use:
\[ p \pm z \sqrt{\frac{p(1 - p)}{n}} \sqrt{\frac{N - n}{N - 1}} \]

Material NOT Included in the Midterm
o Chapter 2:
    o Chebyshev’s Theorem
    o Geometric Mean
o Chapter 4:
    o Software Coefficient of Skewness
    o Stem-and-Leaf Displays
o Chapter 5:
    o Permutation Equation
o Chapter 6:
    o Hypergeometric Probability Distribution
    o Poisson Probability Distribution
    o Covariance
o Chapter 7:
    o The Normal Approximation to the Binomial
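As a self-check on the Chapter 6 coin-toss example from this review, the following Python sketch enumerates the eight outcomes and recomputes the PDF, the CDF, the mean, and the variance with exact fractions. It is an illustration only; the variable names are assumptions and are not part of the course material.

```python
from fractions import Fraction
from itertools import product

# Enumerate the 8 equally likely outcomes of three coin tosses.
outcomes = list(product("HT", repeat=3))

# PDF: probability of each value of X = number of heads.
pmf = {}
for outcome in outcomes:
    x = outcome.count("H")
    pmf[x] = pmf.get(x, Fraction(0)) + Fraction(1, len(outcomes))

# CDF: running total of the PDF, F(x) = P(X <= x).
cdf = {}
running = Fraction(0)
for x in sorted(pmf):
    running += pmf[x]
    cdf[x] = running

# Mean, and variance by both formulas from the review.
mean = sum(x * p for x, p in pmf.items())
var = sum((x - mean) ** 2 * p for x, p in pmf.items())
var_shortcut = sum(x ** 2 * p for x, p in pmf.items()) - mean ** 2

print(mean)          # 3/2
print(var)           # 3/4
print(var_shortcut)  # 3/4
print(cdf[2])        # 7/8
```

Using `Fraction` rather than floats keeps the results in the same form as the hand calculation, so they can be compared directly with the table values.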