Lecture 2 (Section 1.3 of the textbook)
Normal distribution

Outline
Density curves
Normal distributions
The 68-95-99.7 rule
The standard Normal distribution
Normal distribution calculations
Normal quantile plots

Density curve
[Figure: Normal density curve; proportion of students with vocabulary score ≤ 6.]
A density curve (the graph of y = f(x)) is always on or above the horizontal axis and has area exactly 1 underneath it.
Mathematically: $f(x) \ge 0$ for all $x$, and $\int_{-\infty}^{\infty} f(x)\,dx = 1$.

Median: the equal-areas point of the density curve.
Mean: the balance point of the density curve. Notation: µ.
[Figure: a right-skewed density curve.]

Parameters of the distribution vs. statistics of the data
We are interested in:
  µ: mean of the idealized distribution (i.e. of the density curve)
  σ: standard deviation of the idealized distribution
We observe/calculate:
  x̄: mean of the actual observations (sample mean)
  s: standard deviation of the actual observations (sample standard deviation)

Normal distributions
[Figure: Normal distributions with different σ and µ.]
Formula for the Normal density:
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^{2}}$$

The 68-95-99.7 rule for Normal data
Approximately 68% of the observations fall within σ of µ.
Approximately 95% of the observations fall within 2σ of µ.
Approximately 99.7% of the observations fall within 3σ of µ.

Example
Heights of young women aged 18 to 24 are Normal with µ = 64.5 inches and σ = 2.5 inches. Find ranges for the middle 99.7%, 95%, and 68% of the population.

Finding probabilities for Normal data
Standardizing
z-score: the standardized value of x (z = how many standard deviations x lies from the mean):
$$z = \frac{x-\mu}{\sigma}$$
Example: Compute the standardized scores for young women 70 inches tall and 60 inches tall.

The standardized values for any distribution always have mean 0 and standard deviation 1. If the original distribution (X) is Normal, then the standardized values have a Normal distribution (Z) with mean 0 and standard deviation 1:
(X) N(µ, σ) --standardization--> N(0,1) (Z)
Standardization: Z = (X - µ)/σ, i.e. X = µ + σZ.
Probabilities for N(0,1) are given in statistical tables.
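The height example above can be checked numerically. The following is a minimal sketch using Python's standard-library `statistics.NormalDist`; the helper names `middle_range` and `z_score` are illustrative choices, not part of the lecture. It computes the µ ± kσ ranges from the 68-95-99.7 rule, compares them with the exact areas under the Normal curve, and standardizes the two heights from the example.

```python
from statistics import NormalDist

# Parameters from the lecture example: heights of young women (inches).
MU, SIGMA = 64.5, 2.5
heights = NormalDist(MU, SIGMA)


def middle_range(k, mu=MU, sigma=SIGMA):
    """Return the interval (mu - k*sigma, mu + k*sigma)."""
    return (mu - k * sigma, mu + k * sigma)


def z_score(x, mu=MU, sigma=SIGMA):
    """Standardize x: how many standard deviations x lies from the mean."""
    return (x - mu) / sigma


# 68-95-99.7 rule: k = 1, 2, 3 standard deviations around the mean.
for k in (1, 2, 3):
    lo, hi = middle_range(k)
    exact_area = heights.cdf(hi) - heights.cdf(lo)
    print(f"within {k} sd: {lo} to {hi} inches, exact area {exact_area:.4f}")

# Standardized scores for women 70 and 60 inches tall.
print("z for 70 inches:", z_score(70))
print("z for 60 inches:", z_score(60))
```

The rule's 68%, 95%, and 99.7% are rounded values; the exact areas printed by the loop differ slightly in the later decimal places.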
Examples
What proportion of observations of a standard Normal variable Z take values
  less than 2.2?
  greater than -2.05?
What is the proportion of these young women who are less than 70 inches tall?

In 2000 the scores of students taking the SAT were approximately Normal with mean 1019 and standard deviation 209. What percent of all students had SAT scores of
  at least 820? (the limit for Division I athletes to compete in their first college year)
  between 720 and 820? (partial qualifiers)

Inverse reading in the Normal table
What is z such that P(Z < z) = 0.95?
What is z such that P(Z > z) = 0.01?
The first z is the 95th percentile, z95, of N(0,1). The second z is the 99th percentile, z99.

Procedure for percentiles of N(µ, σ): xp = µ + σ·zp
Calculate the 95th and 99th percentiles of the distribution of SAT scores.

Example (continued)
How high must a student score in order to be in the top 20% of all students taking the SAT?

Summary: in Normal calculations use z = (x - µ)/σ or x = µ + σz.

Normal quantile plots
If the plotted points lie close to a straight line, then the distribution of the data is close to Normal.
Construction: use software, or
  arrange the data from smallest to largest and record the corresponding percentiles (with respect to the data),
  find the z-scores for these percentiles (for example, the z-score for the 5th percentile is z = -1.645),
  plot each data point against the corresponding z.
Interpretation: tails of the distribution, etc.
[Figures: Normal quantile plots for Newcomb's data; Newcomb's data without outliers; supermarket spending data (skewness, heavy right tail); IQ scores of seventh-grade students (both tails shorter).]
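The table lookups, the percentile procedure, and the quantile-plot construction described above can be sketched in Python with the standard-library `statistics.NormalDist`, which stands in for the printed Normal table. The helper names `percentile` and `normal_scores`, the small `sample` list, and the plotting position (i - 0.5)/n are assumptions made for illustration; the lecture only says to record the percentiles of the sorted data and does not fix a convention.

```python
from statistics import NormalDist

Z = NormalDist(0, 1)          # standard Normal, in place of the printed table
SAT = NormalDist(1019, 209)   # SAT scores from the lecture example

# Forward lookups: proportions below or above a value.
p_z_below = Z.cdf(2.2)                        # P(Z < 2.2)
p_z_above = 1 - Z.cdf(-2.05)                  # P(Z > -2.05)
p_at_least_820 = 1 - SAT.cdf(820)             # P(X >= 820)
p_720_to_820 = SAT.cdf(820) - SAT.cdf(720)    # P(720 < X < 820)


def percentile(p, mu, sigma):
    """Inverse lookup: x_p = mu + sigma * z_p, with z_p from N(0,1)."""
    return mu + sigma * Z.inv_cdf(p)


sat_95 = percentile(0.95, 1019, 209)
sat_99 = percentile(0.99, 1019, 209)
sat_top20 = percentile(0.80, 1019, 209)   # top 20% means the 80th percentile


def normal_scores(data):
    """Pair each sorted observation with the z-score of its percentile.

    Assumption: the i-th smallest of n values is assigned the
    percentile (i - 0.5) / n; other conventions exist.
    """
    ordered = sorted(data)
    n = len(ordered)
    return [(x, Z.inv_cdf((i - 0.5) / n)) for i, x in enumerate(ordered, start=1)]


# Hypothetical sample, made up purely to show the construction.
sample = [12.1, 9.8, 10.4, 11.0, 10.9]
pairs = normal_scores(sample)

print(f"P(Z < 2.2) = {p_z_below:.4f}, P(Z > -2.05) = {p_z_above:.4f}")
print(f"SAT at least 820: {p_at_least_820:.4f}, between 720 and 820: {p_720_to_820:.4f}")
print(f"SAT 95th: {sat_95:.1f}, 99th: {sat_99:.1f}, top 20% cutoff: {sat_top20:.1f}")
for x, z in pairs:
    print(f"data {x:5.1f}  z {z:+.3f}")
```

Plotting the `(x, z)` pairs from `normal_scores` with any charting tool gives the Normal quantile plot; points that fall close to a straight line suggest the data are close to Normal.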