Transcript

3/30/2009

Probability Distributions
Normal Probability Distribution
Chapter 6

Random variable
The outcome of each procedure is determined by chance.

Discrete random variables take on a countable number of values (i.e. there are gaps between values).
SPECIAL discrete random variables:
• Binomial distribution (Sections 5.3, 5.4)
• Geometric distribution
• Hypergeometric distribution
• Poisson distribution (Section 5.5)

Continuous random variables: there are an infinite number of values the random variable can take, and they are densely packed together (i.e. there are no gaps between values).
SPECIAL continuous random variables:
• Normal distribution
• Exponential distribution
• Uniform distribution

Binomial distribution
• Fixed number of trials.
• There are only two possible outcomes: success or failure.
• The trials are independent.
• The probabilities of success and failure remain the same from trial to trial.
Example: recording the genders of children in 250 families.
The mean is µ = np.
The standard deviation is σ = √(np(1 − p)) = √(npq), where q = 1 − p.

TI-83 Binomial Probability
• Press 2nd VARS.
• Select the option 0:binompdf(.
• Complete the entry to obtain binompdf(n, p, x), with the appropriate values substituted in.
Example: What is the probability of getting exactly 2 heads when 4 tosses are made?
Solution: Using the TI-83 with binompdf(4, 0.5, 2), it follows that the probability of getting 2 heads in 4 tosses is 0.375.

Poisson distribution
The random variable is the number of occurrences of some event over an interval. Used for describing the behavior of rare events.
Examples:
• Number of industrial accidents per month in a manufacturing plant.
• Number of people arriving at a checkout in a day.
• Number of eagles nesting in a region.
• Number of patients arriving at an emergency room.
The occurrences must be random and independent of each other, and uniformly distributed over the interval.
The mean is µ, and the standard deviation is σ = √µ.

Continuous Random Variables
Continuous sample spaces contain an infinite number of events. They typically are intervals of possible, continuously distributed outcomes.
Ex.: Select ANY number between 0 and 1. What is the sample space? S = {all numbers between 0 and 1}
Ex.: Drink ANY volume of water from a 32-ounce bottle. What is the sample space? S = {all volumes from 0 to 32 ounces}

Continuous Random Variables
A continuous probability distribution function for a random variable X is a continuous function with the property that the area below the graph of the function between any two points a and b equals the probability that a ≤ X ≤ b.
Remember, AREA = PROPORTION = PROBABILITY

Special Continuous Probability Distributions
• Uniform distribution
• Normal distribution
• Exponential distribution

Uniform Distribution
1. Equally likely outcomes.
2. Probability density: f(x) = 1/(b − a) for a ≤ x ≤ b.
3. Mean and standard deviation: µ = (a + b)/2, σ = (b − a)/√12.
[Figure: uniform density f(x) of height 1/(b − a) between a and b, with the mean and median marked.]

Exponential Distribution
1. Describes time or distance between events.
2. Density function: f(x) = λe^(−λx) for x ≥ 0.
3. Parameters: µ = 1/λ, σ = 1/λ.
[Figure: exponential densities f(X) for λ = 0.5 and λ = 2.0.]

Normal Distribution
f(x) = (1/(σ√(2π))) exp(−(x − µ)²/(2σ²))
[Figure: three normal curves A, B, and C.]
A and B have the same center, but different standard deviations (shape). A and C have the same standard deviations (shape), but different means (shifted).

Examples of normal random variables
• testosterone level of male students
• head circumference of adult females
• length of middle finger of Math 225 students
• test scores in Math 225
• height of all kindergarten kids at a school

Characteristics of normal distribution
• Symmetric, bell-shaped curve.
• Shape of curve depends on population mean µ and standard deviation σ.
• Center of distribution is µ.
• Spread is determined by σ.
• Most values fall around the mean, but some values are smaller and some are larger.
[Figure: bell-shaped curves, Density versus Grades, for Mean = 70, SD = 5 and for Mean = 70, SD = 10.]
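The formulas quoted earlier in these notes for the special distributions can be checked numerically. The following is a minimal Python sketch, not part of the original slides; the function names are hypothetical helpers, and only the standard library is assumed. It reproduces the TI-83 result binompdf(4, 0.5, 2) = 0.375 and evaluates the mean and standard deviation formulas.

```python
import math


def binom_pmf(n: int, p: float, x: int) -> float:
    """P(X = x) for n independent trials with success probability p.

    Hypothetical stand-in for the TI-83 call binompdf(n, p, x).
    """
    return math.comb(n, x) * p**x * (1 - p) ** (n - x)


def binom_mean_sd(n: int, p: float) -> tuple[float, float]:
    """Binomial mean np and standard deviation sqrt(np(1 - p))."""
    return n * p, math.sqrt(n * p * (1 - p))


def poisson_sd(mu: float) -> float:
    """Poisson standard deviation sqrt(mu) for mean mu."""
    return math.sqrt(mu)


def uniform_mean_sd(a: float, b: float) -> tuple[float, float]:
    """Uniform(a, b) mean (a + b)/2 and standard deviation (b - a)/sqrt(12)."""
    return (a + b) / 2, (b - a) / math.sqrt(12)


def exponential_mean_sd(lam: float) -> tuple[float, float]:
    """Exponential mean and standard deviation, both equal to 1/lambda."""
    return 1 / lam, 1 / lam


if __name__ == "__main__":
    print(binom_pmf(4, 0.5, 2))       # exactly 2 heads in 4 tosses: 0.375
    print(binom_mean_sd(4, 0.5))      # (2.0, 1.0)
    print(uniform_mean_sd(0, 32))     # the 32-ounce bottle, if every volume were equally likely
    print(exponential_mean_sd(0.5))   # (2.0, 2.0)
```

Treating the 32-ounce bottle as uniformly distributed is an assumption made only for illustration; the slides use it solely as an example of a continuous sample space.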
STANDARD NORMAL DISTRIBUTION:
Mean: µ = 0
Standard deviation: σ = 1

Probabilities for Normal Distributions
Probability is area under the curve!
P(c ≤ x ≤ d) = ∫_c^d f(x) dx

Infinite Number of Tables
Normal distributions differ by mean & standard deviation. Each distribution would require its own table.

Standardize the Normal Distribution
Z = (X − µ)/σ
The normal distribution of X (mean µ, standard deviation σ) becomes the standardized normal distribution of Z (µ = 0, σ = 1). One table!

To find a probability, follow these steps:
1. Draw the normal distribution and shade the area of interest.
2. Find the standardized score (z-score) for the given x: z = (x − µ)/σ.
3. Find the probability using the z-table or calculator.

TI-83, 84: DISTR 2:normalcdf(
• upper tail: normalcdf(z, 9999)
• lower tail: normalcdf(-9999, z)
• between: normalcdf(z1, z2)
[Figure: three normal curves, Density versus Grades, with shaded areas P(X > 75) (probability a student scores higher than 75), P(X < 65), and P(65 < X < 70).]

To find x from a given area, follow these steps:
1. Draw and shade.
2. Find the LOWER tail probability INSIDE the table, and read off the corresponding z-score. OR: use DISTR 3:invNorm(.
3. To find x use the formula: x = z·σ + µ

Parameter versus statistic
Population: the entire group of individuals in which we are interested but can't usually assess directly.
Sample: the part of the population we actually examine and for which we do have data.
A parameter is a number describing a characteristic of the population. Parameters are usually unknown.
A statistic is a number describing a characteristic of a sample. We often use a statistic to estimate an unknown population parameter.

Example
The Environmental Protection Agency took soil samples at 20 locations near a former industrial waste dump and checked each for evidence of toxic chemicals. They found no elevated levels of any harmful substances.
Population: ALL the soil near the waste dump
Sample: the 20 soil samples
Parameter: mean level of toxic chemicals in the ground around the waste dump
Statistic: the mean level of toxic chemicals in the 20 soil samples

Notation
Variable of interest: Quantitative. Then we are interested in the MEAN.
Notation: Population parameter: µ. Sample statistic: x̄.
Variable of interest: Categorical. Then we are interested in the PROPORTION.
Notation: Population parameter: p. Sample statistic: p̂.

Sampling Variability
When we take many samples, the statistics from the samples are usually different from the population figures, and also different from what we got in the first sample. This very intuitive idea, that sample results change from sample to sample, is called sampling variability.

Comments
1. Parameters are usually unknown, because it is impractical or impossible to know exactly what values a variable takes for every member of the population.
2. Statistics are computed from the sample, and vary from sample to sample due to sampling variability.

Sampling Distributions
The sampling distribution is the distribution of a sample statistic over an infinite number of samples.

Sampling distribution of the sample mean, x̄
[Figure: histogram of some sample averages, illustrating the sampling distribution of x̄.]

OK, we have the sampling distribution of the sample means. Then what?
Sampling distributions, like data distributions, are best described by shape, center, and spread.

Shape, Center, and Spread
Shape: Many, but not all, sampling distributions are approximately normal.
Center: The mean will be denoted by µ with a subscript to indicate which sampling distribution is being discussed. For example, the mean of the sampling distribution of the mean is represented by the symbol µ_x̄ (the mean of the sample means).
Spread: The standard deviation of the sampling distribution of the sample means is denoted by σ_x̄.

Mean and standard error of the sampling distribution of the sample means
Suppose that x̄ is the mean of an SRS of size n drawn from a large population with mean µ and standard deviation σ. Then the sampling distribution of x̄ has mean µ_x̄ = µ and standard deviation σ_x̄ = σ/√n.

For any population with mean µ and standard deviation σ:
• The mean, or center, of the sampling distribution of x̄ is equal to the population mean µ.
• The standard deviation of the sampling distribution is σ/√n, where n is the sample size.
[Figure: sampling distribution of x̄, centered at µ with spread σ/√n.]

Mean of a sampling distribution of x̄
There is no tendency for a sample mean to fall systematically above or below µ, even if the distribution of the raw data is skewed. Thus, the sample mean x̄ is an unbiased estimator of the population mean µ: it will be "correct on average" in many samples.

Standard error of a sampling distribution of x̄
The standard deviation of the sampling distribution measures how much the sample statistic x̄ varies from sample to sample. It is smaller than the standard deviation of the population by a factor of √n. Averages are less variable than individual observations.

Generating Sampling Distributions
1. Take a random sample of a fixed size n from a population.
2. Compute the summary statistics (mean, proportion).
3. Repeat steps 1 and 2 many times.
4. Display the distribution of the summary statistics.

Example
Extensive studies have found that the DMS odor threshold of adults follows a roughly normal distribution with mean µ = 25 micrograms per liter and standard deviation σ = 7 micrograms per liter. With this information, we can simulate many runs of our study with different subjects drawn at random from the population. We take 1000 samples of size 10, find the 1000 sample mean thresholds x̄, and make a histogram of these 1000 values.

The results from the 1000 samples
1st SRS of size 10: x̄ = 36, s = 3.2
2nd SRS of size 10: x̄ = 22.8, s = 2.7
3rd SRS of size 10: x̄ = 30.4, s = 4.1
⋮
1000th SRS of size 10: x̄ = 28.9, s = 2.1

The sampling distribution of the statistic x̄
[Figure: histogram of the 1000 sample means, Frequency versus x̄.]
Shape: looks normal.
Center: the mean of the 1000 x̄'s is 25.073. The distribution is centered very close to the population mean µ = 25.
Spread: the standard deviation of the 1000 x̄'s (the standard error) is 2.191, notably smaller than the standard deviation σ = 7 of the population.

For normally distributed populations
When a variable in a population is normally distributed, then the sampling distribution of x̄ for all possible samples of size n is also normally distributed.
If the population is N(µ, σ), then the sample means distribution is N(µ, σ/√n).
[Figure: population distribution N(µ, σ) and the narrower sampling distribution of x̄ for samples of size n.]

IQ scores: population vs. sample
In a large population of adults, the mean IQ is 112 with standard deviation 16. Suppose 100 adults are randomly selected for a market research campaign. The distribution of the sample mean IQ is
A) exactly normal, mean 112, standard deviation 16.
B) approximately normal, mean 112, standard deviation 16.
C) approximately normal, mean 112, standard deviation 1.6.
D) approximately normal, mean 112, standard deviation 4.
Population distribution: N(µ = 112; σ = 16)
Sampling distribution for n = 100 is N(µ = 112; σ/√n = 1.6)

Application
Hypokalemia is diagnosed when blood potassium levels are low, below 3.5 mEq/dl. Let's assume that we know a patient whose measured potassium levels vary daily according to a normal distribution N(µ = 3.8, σ = 0.2).
If only one measurement is made, what's the probability that this patient will be misdiagnosed hypokalemic?
z = (x − µ)/σ = (3.5 − 3.8)/0.2 = −1.5, P(z < −1.5) = 0.0668 ≈ 7%
For the mean of n = 4 measurements:
z = (x̄ − µ)/(σ/√n) = (3.5 − 3.8)/(0.2/√4) = −3, P(z < −3) = 0.0013 ≈ 0.1%
Note: Make sure to standardize (z) using the standard deviation for the sampling distribution.
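The hypokalemia probabilities and the standard errors quoted above can be reproduced without a z-table. This is a minimal Python sketch under stated assumptions: `normalcdf`, `inv_norm`, and `standard_error` are hypothetical helper names chosen to mirror the TI-83 menu entries, and they rely on `statistics.NormalDist` from the Python standard library (3.8 or later).

```python
from math import sqrt
from statistics import NormalDist


def normalcdf(lower: float, upper: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """Area under N(mu, sigma) between lower and upper (hypothetical TI-style helper)."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(upper) - dist.cdf(lower)


def inv_norm(area: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """Value x whose LOWER-tail area under N(mu, sigma) equals `area`."""
    return NormalDist(mu, sigma).inv_cdf(area)


def standard_error(sigma: float, n: int) -> float:
    """Standard deviation of the sampling distribution of the sample mean."""
    return sigma / sqrt(n)


# Potassium example from the slides: N(mu = 3.8, sigma = 0.2), cutoff 3.5.
mu, sigma, cutoff = 3.8, 0.2, 3.5

# One measurement: standardize with sigma itself.
p_one = NormalDist(mu, sigma).cdf(cutoff)

# Mean of four measurements: standardize with sigma / sqrt(4).
p_four = NormalDist(mu, standard_error(sigma, 4)).cdf(cutoff)

if __name__ == "__main__":
    print(round(p_one, 4))    # about 0.0668
    print(round(p_four, 4))   # about 0.0013
    print(standard_error(16, 100))          # IQ example: 1.6
    print(round(standard_error(7, 10), 3))  # odor-threshold example: theoretical value
```

For the odor-threshold simulation, the theoretical standard error 7/√10 is about 2.21, close to the 2.191 observed in the 1000 simulated samples.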
Application (continued)
If instead measurements are taken on four separate days, what is the probability of such a misdiagnosis? Standardize the mean of the four measurements with σ/√n, where n = 4.

IQ scores (answer)
C) approximately normal, mean 112, standard deviation 1.6.

The Central Limit Theorem
But… not all variables are normally distributed. Income is typically strongly skewed, for example. Is x̄ still a good estimator of µ then? The Central Limit Theorem will rescue us!

VERY IMPORTANT!!!
When randomly sampling from any population with mean µ and standard deviation σ, when n is large enough, the sampling distribution of x̄ is approximately normal: N(µ, σ/√n).

Central Limit Theorem
The Central Limit Theorem guarantees that the distribution of the sample mean is approximately normal as long as the sample size is large enough.
We will depend on the Central Limit Theorem again and again in order to take advantage of normal probability calculations when we use the sample mean to draw conclusions about the population mean, even if the population distribution is not normal.

Comments
There is no requirement on the shape of the population distribution. This is where the strength of the Central Limit Theorem lies. It tells us that regardless of the shape of the population distribution, averages that are based on a large enough sample will have an approximately normal distribution.

The central limit theorem
[Figure: population with strongly skewed distribution; sampling distribution of x̄ for n = 2 observations; sampling distribution of x̄ for n = 10 observations; sampling distribution of x̄ for n = 25 observations.]
http://www.ruf.rice.edu/~lane/stat_sim/sampling_dist/index.html

Assessing Normality
A normal probability plot is a graph with the original set of data on the x-axis, and the corresponding z scores for each data value on the y-axis.
[Figure: normal probability plots of data from a normal distribution, from a right-skewed distribution, and from a left-skewed distribution.]
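The Central Limit Theorem statement in these notes can be illustrated by simulation. The sketch below is not from the slides: it assumes an exponential population with λ = 1 as the "strongly skewed" population (so µ = 1 and σ = 1), uses the sample sizes n = 2, 10, 25 named in the figure, and picks 5000 repetitions and a fixed seed arbitrarily. All function names are hypothetical.

```python
import random
from math import sqrt
from statistics import fmean, pstdev


def sample_means(n, reps, rng, lam=1.0):
    """Means of `reps` random samples of size n from an exponential population."""
    return [fmean([rng.expovariate(lam) for _ in range(n)]) for _ in range(reps)]


def skewness(values):
    """Average cubed z-score; 0 for a symmetric distribution, positive for right skew."""
    m, s = fmean(values), pstdev(values)
    return fmean([((v - m) / s) ** 3 for v in values])


def clt_summary(sizes=(2, 10, 25), reps=5000, seed=0, lam=1.0):
    """Center, spread, and skewness of simulated sample means for each sample size."""
    rng = random.Random(seed)   # fixed seed so repeated runs give the same numbers
    sigma = 1.0 / lam           # standard deviation of the exponential population
    rows = {}
    for n in sizes:
        means = sample_means(n, reps, rng, lam)
        rows[n] = {
            "mean": fmean(means),                  # should be near mu = 1/lam
            "sd": pstdev(means),                   # should be near sigma / sqrt(n)
            "sigma_over_root_n": sigma / sqrt(n),  # theoretical standard error
            "skewness": skewness(means),           # should shrink as n grows
        }
    return rows


if __name__ == "__main__":
    for n, row in clt_summary().items():
        print(n, {key: round(value, 3) for key, value in row.items()})
```

In runs of this kind the simulated standard deviation tracks σ/√n for every n, while the skewness of the sample means falls toward 0 as n increases, which is the sense in which the sampling distribution becomes approximately normal.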
Assessing Normality (continued)
If the points of a normal probability plot appear to lie reasonably close to a straight line, and there does not appear to be a systematic pattern that is not a straight line, we can conclude that the data came from a normally distributed population.
[Figure: normal probability plots of data from a short-tailed distribution and from a long-tailed distribution.]
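The coordinates of a normal probability plot can be computed directly. The sketch below is one possible construction, not the one used for the slides: it assumes the plotting positions (i − 0.5)/n for the i-th smallest of n values (conventions differ between calculators and software), and it summarizes "close to a straight line" with a correlation coefficient, which is an added illustration. The data sets are hypothetical and built from exact quantiles so that the result is reproducible.

```python
from math import log
from statistics import NormalDist, fmean


def normal_plot_points(data):
    """Pairs (data value, z-score) for a normal probability plot.

    Assumes plotting positions (i - 0.5) / n; other conventions exist.
    """
    xs = sorted(data)
    n = len(xs)
    zs = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(xs, zs))


def straightness(points):
    """Pearson correlation of the plotted points; values near 1 suggest a straight line."""
    xs, zs = zip(*points)
    mx, mz = fmean(xs), fmean(zs)
    sxz = sum((x - mx) * (z - mz) for x, z in points)
    sxx = sum((x - mx) ** 2 for x in xs)
    szz = sum((z - mz) ** 2 for z in zs)
    return sxz / (sxx * szz) ** 0.5


# Hypothetical data sets built from exact quantiles (no randomness involved).
n = 30
positions = [(i - 0.5) / n for i in range(1, n + 1)]
normal_like = [70 + 5 * NormalDist().inv_cdf(p) for p in positions]   # N(70, 5) quantiles
right_skewed = [-log(1 - p) for p in positions]                       # exponential quantiles

if __name__ == "__main__":
    print(round(straightness(normal_plot_points(normal_like)), 4))
    print(round(straightness(normal_plot_points(right_skewed)), 4))
```

The normal-like data fall on a straight line by construction, while the right-skewed data bend away from it and give a visibly lower correlation. A correlation alone does not replace looking at the plot for a systematic curved pattern.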