Applied Probability and Statistics for Engineering and Science
Chapter 1: Populations, Samples and Processes

Outline of Chapter 1

1.1 Populations, Samples and Variables
1.2 Visual Displays for Univariate Data
1.3 Describing Distributions
1.4 The Normal Distribution
1.5 Other Continuous Distributions
1.6 Several Useful Discrete Distributions

Introduction

Statistical theory and techniques are powerful and indispensable tools for understanding the world around us. They help us make intelligent judgments and decisions in the presence of uncertainty and variation. Without uncertainty or variation, there would be little need for statistical techniques or statisticians.

1.1 Populations, Samples and Variables

Populations

Engineers and scientists are constantly exposed to collections of facts (data) in their work. A population is a well-defined collection of objects.

Examples:
- Students in Class ECE08
- People in Vietnam
- ...
Sample

When the desired information is available for all objects in the population, we have what is called a census. Practical constraints (e.g., money, time and other limited resources) usually make a census impractical or infeasible.

Sample: a (random) subset of the population. For instance, we might select a sample of last year's engineering graduates to obtain feedback about the quality of the engineering curricula.

Sample: variable

Variable: any characteristic whose value may change from one object to another in the population.

Examples:
- X = gender of a graduating engineer
- Y = age of a graduating engineer
- Z = temperature at a certain time instant during a day

Types of data sets:
- Univariate data: observations are made on a single variable.
- Bivariate data: observations are made on each of two variables.
- Multivariate data: observations are made on more than two variables.
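The ideas of population, sample and variable above can be illustrated with a small sketch. The population below is entirely hypothetical (made-up records for 1000 graduating engineers), and `draw_sample` is an illustrative helper name, not something from the slides. It draws a simple random sample without replacement and then extracts univariate and bivariate data from it.

```python
import random

# Hypothetical population: one record per graduating engineer.
# All field values are invented for illustration only.
_rng = random.Random(0)  # fixed seed so the sketch is reproducible
population = [
    {"id": i, "gender": _rng.choice(["F", "M"]), "age": _rng.randint(21, 30)}
    for i in range(1000)
]


def draw_sample(pop, n, seed=None):
    """Return a simple random sample of n objects, drawn without replacement."""
    return random.Random(seed).sample(pop, n)


sample = draw_sample(population, 50, seed=1)

# Univariate data: observations on a single variable (Y = age).
ages = [person["age"] for person in sample]

# Bivariate data: observations on two variables (X = gender, Y = age).
gender_age = [(person["gender"], person["age"]) for person in sample]
```

A census would correspond to using `population` itself; the sample trades completeness for a much smaller collection effort.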
Branches of statistics

Descriptive statistics: methods to summarize and describe important features of the data. Examples:
- Graphical: construction of histograms, stem-and-leaf displays, dotplots
- Numerical: calculation of summary measures such as means, variances, correlations, ...

Inferential statistics: techniques for generalizing from a sample to a population.

1.2 Visual Displays for Univariate Data

Stem-and-leaf displays

A stem-and-leaf display is an effective way to organize numerical data by splitting each observation into two parts:
- Stem: one or more leading digits
- Leaf: the remaining digits

The display can provide the following information:
- Identification of a typical or representative value
- Extent of spread about the typical value
- Presence of any gaps in the data
- Extent of symmetry in the distribution of values
- Number and location of peaks

Stem-and-leaf displays: an example

In a given experiment, the values of the considered variable are: 41, 43, 49, 52, 57, ..., 112, 114, 123

[Figure: stem-and-leaf display of these values]

Dotplots

A dotplot is a useful summary when the data set is reasonably small or there are relatively few distinct data values. Each observation is represented by a dot above the corresponding location on a horizontal measurement scale. When a value occurs more than once, there is a dot for each occurrence, and these dots are stacked vertically. A dotplot provides information about location, spread, extremes, and gaps.

Dotplot: an example

Here is an example to show what a dotplot looks like and how to interpret it. Suppose 30 first graders are asked to pick their favorite color. Their choices can be summarized in a dotplot with one column of dots for each color.

[Figure: dotplot of favorite colors, with columns for Red, Orange, Yellow, Green, Blue, Indigo and Violet]

Each dot represents one student, and the number of dots in a column represents the number of first graders who selected the color associated with that column. For example, Red was the most popular color (selected by 9 students), followed by Blue (selected by 7 students). Selected by only 1 student, Indigo was the least popular color.
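The two displays described in this section can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the function names are made up here, the stem-and-leaf input uses only the values that survive in the transcript (the elided ones are not filled in), the dotplot input is hypothetical, and both functions assume small non-negative integers.

```python
from collections import Counter


def stem_and_leaf(values):
    """Map each stem (all digits but the last) to its sorted leaves (last digit).

    Stems with no observations are kept as empty lists so gaps stay visible.
    """
    stems = {s: [] for s in range(min(values) // 10, max(values) // 10 + 1)}
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        stems[stem].append(leaf)
    return stems


def dotplot(values):
    """Return a text dotplot: one '*' per occurrence, stacked above each value.

    Assumes single-digit integer values so the columns line up.
    """
    counts = Counter(values)
    xs = list(range(min(values), max(values) + 1))  # include gaps in the scale
    rows = []
    for level in range(max(counts.values()), 0, -1):
        row = " ".join("*" if counts[x] >= level else " " for x in xs)
        rows.append(row.rstrip())
    rows.append(" ".join(str(x) for x in xs))  # horizontal measurement scale
    return "\n".join(rows)


# Only the values that are legible in the transcript; the "..." is not filled in.
surviving_values = [41, 43, 49, 52, 57, 112, 114, 123]
for stem, leaves in stem_and_leaf(surviving_values).items():
    print(f"{stem:>2} | {''.join(str(leaf) for leaf in leaves)}")

# Hypothetical data for the dotplot.
print(dotplot([1, 2, 2, 3, 3, 3, 5]))
```

Keeping empty stems and empty scale positions is a deliberate choice: both displays are meant to reveal gaps, which would disappear if unobserved values were skipped.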
Histograms

Constructing a histogram for discrete data:
1. Determine the (relative) frequency of each x value in the sample.
2. Mark the possible x values on a horizontal scale.
3. Above each value, draw a rectangle whose height is the relative frequency of that value.

Constructing a histogram for continuous data:
1. Determine the (relative) frequency of each class.
2. Mark the class boundaries on a horizontal measurement axis.
3. Above each class interval, draw a rectangle whose height is the corresponding frequency.

Histogram: an example

[Figure: Gaussian histogram. Horizontal axis: variable value (−4 to 4). Vertical axis: number of values in each interval (0 to 1500).]

1.3 Describing Distributions

Density function

A density function f(x) is used to describe (approximately) the population distribution of a continuous variable x. The graph of f(x) is called the density curve.
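The histogram construction steps for continuous data listed earlier in this section can be sketched as follows. The data and class boundaries are hypothetical, and `relative_frequencies` is an illustrative name. One assumption is made explicit in the code: each class is treated as a half-open interval, except the last one, which also includes its right endpoint.

```python
def relative_frequencies(data, boundaries):
    """Relative frequency of each class defined by consecutive boundaries.

    Class i is [boundaries[i], boundaries[i+1]); the last class also
    includes its right endpoint. Values outside all classes are ignored
    in the counts but still included in the sample size.
    """
    n_classes = len(boundaries) - 1
    counts = [0] * n_classes
    for x in data:
        for i in range(n_classes):
            lo, hi = boundaries[i], boundaries[i + 1]
            is_last = i == n_classes - 1
            if lo <= x < hi or (is_last and x == hi):
                counts[i] += 1
                break
    return [c / len(data) for c in counts]


# Hypothetical sample and class boundaries.
data = [0.2, 0.5, 1.1, 1.7, 1.9, 2.4, 2.5, 3.0]
boundaries = [0, 1, 2, 3]
heights = relative_frequencies(data, boundaries)

# Crude text rendering: one '#' per 5% of the sample in each class.
for (lo, hi), h in zip(zip(boundaries, boundaries[1:]), heights):
    print(f"[{lo}, {hi}) {'#' * round(h * 20)} {h:.3f}")
```

When every observation falls inside some class, the heights sum to 1, which is what makes relative-frequency histograms comparable across samples of different sizes.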
The following properties of f(x) must be satisfied:
- $f(x) \ge 0$
- $\int_{-\infty}^{\infty} f(x)\,dx = 1$ (i.e., the total area under the density curve is 1)
- For any two numbers a and b with a < b, the proportion of x values between a and b is $\int_{a}^{b} f(x)\,dx$.

Mass function

A mass function p(x) is used to describe (approximately) the population distribution of a discrete variable x. The following properties of p(x) must be satisfied:
- $p(x) \ge 0$
- $\sum_{x} p(x) = 1$

1.4 The Normal Distribution

Definition

A continuous variable x is said to have a normal distribution with parameters µ and σ, where −∞ < µ < ∞ and σ > 0, if the density function of x is

$$f(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-(x-\mu)^2/(2\sigma^2)}, \qquad -\infty < x < \infty \tag{1}$$

The normal distribution is the most important distribution in statistics. Many population and process variables have distributions that can be very closely fit by an appropriate normal curve.
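As a sanity check on the normal density in equation (1) and on the density-function properties, the sketch below evaluates the density and approximates areas under it with the trapezoid rule. The parameter values (µ = 2, σ = 3), the integration limits and the function names are assumptions chosen for illustration.

```python
import math


def normal_pdf(x, mu=0.0, sigma=1.0):
    """Normal density with parameters mu and sigma, as in equation (1)."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (math.sqrt(2.0 * math.pi) * sigma)


def proportion_between(pdf, a, b, steps=10_000):
    """Trapezoid-rule approximation of the area under pdf between a and b."""
    h = (b - a) / steps
    total = 0.5 * (pdf(a) + pdf(b))
    for i in range(1, steps):
        total += pdf(a + i * h)
    return total * h


MU, SIGMA = 2.0, 3.0  # assumed example parameters


def f(x):
    return normal_pdf(x, MU, SIGMA)


# Total area, approximated over mu ± 10 sigma (the tails beyond are negligible).
total_area = proportion_between(f, MU - 10 * SIGMA, MU + 10 * SIGMA)

# Proportion of x values within one sigma of mu.
within_one_sigma = proportion_between(f, MU - SIGMA, MU + SIGMA)

print(f"total area ~ {total_area:.6f}")
print(f"proportion within one sigma ~ {within_one_sigma:.4f}")
```

The first number should be very close to 1 (the total-area property), and the second is the proportion given by the integral of f between a = µ − σ and b = µ + σ.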
The standard normal distribution

The normal distribution with parameters µ = 0 and σ = 1 is called the standard normal distribution. Setting µ = 0 and σ = 1 in the normal density gives

$$f(x) = \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}$$

[Figure: standard normal density curve, f(x) plotted for x from −6 to 6]

1.5 Other Continuous Distributions

The lognormal distribution

A nonnegative variable x is said to have a lognormal distribution if ln(x) has a normal distribution with parameters µ and σ. The density function of the lognormal distribution is

$$f(x) = \begin{cases} \dfrac{1}{\sqrt{2\pi}\,\sigma x}\, e^{-(\ln(x)-\mu)^2/(2\sigma^2)} & x > 0 \\ 0 & x \le 0 \end{cases} \tag{2}$$
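The lognormal density in equation (2) can be evaluated directly, as in the following sketch. The parameter values (µ = 0, σ = 0.5), the integration range and the helper names are assumptions made for illustration; the midpoint rule is used only as a rough numerical check that the area under the density is 1.

```python
import math


def lognormal_pdf(x, mu, sigma):
    """Lognormal density as in equation (2); zero for x <= 0."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    if x <= 0:
        return 0.0
    exponent = -((math.log(x) - mu) ** 2) / (2.0 * sigma**2)
    return math.exp(exponent) / (math.sqrt(2.0 * math.pi) * sigma * x)


def midpoint_area(pdf, a, b, steps=20_000):
    """Midpoint-rule approximation of the area under pdf between a and b."""
    h = (b - a) / steps
    return h * sum(pdf(a + (i + 0.5) * h) for i in range(steps))


MU, SIGMA = 0.0, 0.5  # assumed example parameters

# For these parameters almost all of the area lies in (0, 20).
area = midpoint_area(lambda x: lognormal_pdf(x, MU, SIGMA), 0.0, 20.0)
print(f"area under the lognormal density ~ {area:.5f}")
```

The factor 1/x in the density is what distinguishes it from simply plugging ln(x) into the normal density: it accounts for the change of variable from ln(x) back to x.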
The lognormal distribution: an example

[Figure: lognormal density curve with µ = 4 and σ = 1, plotted for x from 0 to 500]

The Weibull distribution

The distribution was introduced in 1939 by the Swedish physicist Waloddi Weibull. A variable x has a Weibull distribution with parameters α and β if the density function of x is

$$f(x) = \begin{cases} \dfrac{\alpha}{\beta^{\alpha}}\, x^{\alpha-1} e^{-(x/\beta)^{\alpha}} & x > 0 \\ 0 & x \le 0 \end{cases} \tag{3}$$

In recent years, the Weibull distribution has been used to model engine emissions of various pollutants.
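The Weibull density in equation (3) is straightforward to code. The sketch below is illustrative only: the function names and the parameter values are assumptions, and both α and β are taken to be positive. It also checks a well-known special case: with α = 1 the Weibull density reduces to the exponential density (1/β)e^(−x/β).

```python
import math


def weibull_pdf(x, alpha, beta):
    """Weibull density as in equation (3); zero for x <= 0."""
    if alpha <= 0 or beta <= 0:
        raise ValueError("alpha and beta must be positive")
    if x <= 0:
        return 0.0
    return (alpha / beta**alpha) * x ** (alpha - 1) * math.exp(-((x / beta) ** alpha))


def midpoint_area(pdf, a, b, steps=20_000):
    """Midpoint-rule approximation of the area under pdf between a and b."""
    h = (b - a) / steps
    return h * sum(pdf(a + (i + 0.5) * h) for i in range(steps))


# Assumed example parameters: beta = 1 with three different shapes alpha.
for alpha in (1.0, 1.5, 5.0):
    area = midpoint_area(lambda x: weibull_pdf(x, alpha, 1.0), 0.0, 10.0)
    print(f"alpha = {alpha}: area over (0, 10) ~ {area:.4f}")

# Special case alpha = 1: compare with the exponential density at x = 0.7, beta = 2.
x, beta = 0.7, 2.0
print(weibull_pdf(x, 1.0, beta), math.exp(-x / beta) / beta)
```

Varying α while holding β fixed changes the shape of the curve, which is why α is usually called the shape parameter and β the scale parameter.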
The Weibull distribution: an example

[Figure: Weibull density functions for β = 1 with α = 1, α = 1.5 and α = 5, plotted for x from 0 to 3]

Selecting an appropriate distribution

The choice of an appropriate distribution for a continuous variable x is usually based on sample data. An investigator must first decide whether a particular family, such as the Weibull or the normal family, is reasonable. Then the parameters of the chosen family must be estimated to find the particular member of the family that best fits the data.

1.6 Several Useful Discrete Distributions

The Binomial distribution

Suppose that items or entities of some sort come in batches or groups of size n.
Let ρ denote the proportion of all items in the population or process that are satisfactory (S), so that the proportion of all items that are unsatisfactory (F) is 1 − ρ. Assume that the condition of any particular item (S or F) is independent of that of any other item.

The Binomial distribution (cont.)

The binomial variable x is the number of S's in a batch or group. The mass function of x is given by the formula

$$p(x) = \text{proportion of batches with } x \text{ S's} = \frac{n!}{x!\,(n-x)!}\, \rho^{x} (1-\rho)^{n-x}, \qquad x = 0, 1, \ldots, n \tag{4}$$

The binomial distribution is used extensively in genetic applications. Evaluating the binomial mass function by hand can be tedious when n is large.

The Binomial distribution: a histogram

[Figure: binomial histogram. Horizontal axis: x (0 to 8). Vertical axis: proportion (0 to 0.35).]

The Poisson distribution

The Poisson distribution is usually used as a model for the number of times an "event" of some sort occurs during a specific time period or in a particular region of space.
The Poisson mass function is

$$p(x) = \frac{e^{-\lambda} \lambda^{x}}{x!}, \qquad x = 0, 1, 2, 3, \ldots \tag{5}$$

The Poisson distribution is used in telephone engineering.

The Poisson distribution: a histogram

[Figure: Poisson histogram for λ = 2. Horizontal axis: x (0 to 8). Vertical axis: 0 to 0.35.]
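The binomial and Poisson mass functions of this section can be tabulated with Python's standard library, as in the sketch below. The function names are illustrative, and the binomial parameters (n = 8, ρ = 0.25) are assumed values, since the transcript does not state which parameters its binomial histogram used; λ = 2 matches the Poisson histogram.

```python
import math


def binomial_pmf(x, n, rho):
    """Binomial mass function: proportion of batches of size n with x S's."""
    if not 0.0 <= rho <= 1.0:
        raise ValueError("rho must be between 0 and 1")
    if x < 0 or x > n:
        return 0.0
    return math.comb(n, x) * rho**x * (1.0 - rho) ** (n - x)


def poisson_pmf(x, lam):
    """Poisson mass function with parameter lam > 0."""
    if lam <= 0:
        raise ValueError("lam must be positive")
    if x < 0:
        return 0.0
    return math.exp(-lam) * lam**x / math.factorial(x)


N, RHO = 8, 0.25  # assumed binomial parameters
LAM = 2.0         # Poisson parameter, as in the Poisson histogram

binomial_table = [binomial_pmf(x, N, RHO) for x in range(N + 1)]
poisson_table = [poisson_pmf(x, LAM) for x in range(9)]

print(" x  binomial  poisson")
for x in range(9):
    print(f"{x:2d}  {binomial_table[x]:.4f}    {poisson_table[x]:.4f}")

# Mass-function property: the values sum to 1 (the Poisson sum is truncated).
print(sum(binomial_table), sum(poisson_pmf(x, LAM) for x in range(60)))
```

Using `math.comb` avoids computing the three factorials in the binomial coefficient separately, which is part of what makes hand evaluation tedious for large n.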