Measures of Variation

Range = Maximum - Minimum = length of an interval containing all of the data.
Example: August 2005 tropical storms: 40, 50, 65, 105, 175. Range = 175 - 40 = 135.

How far from the mean is a typical data point? For a population, the standard deviation is defined as

$$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}}$$

and the variance is $\sigma^2$. Variance is mathematically convenient; the standard deviation is more interpretable. The standard deviation measures the distance from the mean to a typical data point and is the most common measure of spread.

Ex: If we are interested in the wind speed of Aug 2005 storms, what is our population? (Are we interested only in the 2005 storms?)
For the population: $\mu = \frac{1}{N}\sum_{i=1}^{N} x_i = 87$, $\sigma^2 = 2426$, $\sigma \approx 49.25$.

In most cases, we only have a sample, not a population. Then we use $\bar{x}$ to estimate $\mu$, and $s$ to estimate $\sigma$. The formula for the sample standard deviation is

$$s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$$

Empirical rule for data with a bell-shaped distribution:
About 68% of the data will be within 1 s.d. of the mean.
About 95% will be within 2 s.d. of the mean.

Can the following be considered bell-shaped? Is it symmetric? Skewed?
Textbook, Page 48, Figure 2-11 gives a good illustration of Mean, Median, Mode and Skewness.

Is there anything we can say for non-bell-shaped distributions, or for all distributions?

Chebyshev's Theorem
The proportion of any set of data lying within K s.d. of the mean is always at least $1 - 1/K^2$, where K > 1.

Standardized Scores (z Scores)

How extreme is an observation? How do we compare observations from different datasets?
We standardize the data by subtracting its mean and dividing by its s.d. (p. 69)
Sample: $z = (x - \bar{x})/s$. Population: $z = (x - \mu)/\sigma$.

Ex: Hurricane Katrina: (175 - 87)/49.25 = 1.79.
Aug 2004's biggest storm was Karl at 145. The mean and s.d. for Aug 04 were 90.6 and 38.28, so Karl's z score is (145 - 90.6)/38.28 = 1.42.
Conclusion: Even after adjusting for the increased variability in 2005, Katrina stands out as more extreme.

SAT scores (rescaled to 200-800) adjust for the varying difficulty of the exams.
By the Empirical rule, |z| > 2 means the observation is unusual.
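The calculations in this section can be checked with a short script. This is only a sketch using Python's standard `statistics` module; the variable and function names (`aug05`, `z_score`, `chebyshev_lower_bound`) are made up for illustration and are not from the notes or the textbook. It treats the five Aug 2005 storms as the whole population, as the Katrina example does, and takes the Aug 2004 mean and s.d. as stated in the notes rather than recomputing them.

```python
import statistics

# August 2005 tropical storms, as listed in the notes.
aug05 = [40, 50, 65, 105, 175]

# Range = Maximum - Minimum
data_range = max(aug05) - min(aug05)

# Population measures (divide by N).
mu = statistics.mean(aug05)
pop_var = statistics.pvariance(aug05)
pop_sd = statistics.pstdev(aug05)

# Sample standard deviation (divide by n - 1), for comparison.
sample_sd = statistics.stdev(aug05)


def z_score(x, mean, sd):
    """Standardize x: subtract the mean, then divide by the s.d."""
    return (x - mean) / sd


def chebyshev_lower_bound(k):
    """Minimum proportion of data within k s.d. of the mean (requires k > 1)."""
    if k <= 1:
        raise ValueError("Chebyshev's Theorem requires K > 1")
    return 1 - 1 / (k * k)


# Katrina (175) against the Aug 2005 population.
katrina_z = z_score(175, mu, pop_sd)

# Karl (145) against the Aug 2004 mean and s.d. stated in the notes.
karl_z = z_score(145, 90.6, 38.28)

print("range:", data_range)
print("mu:", mu, "variance:", pop_var, "pop s.d.:", round(pop_sd, 2))
print("sample s.d.:", round(sample_sd, 2))
print("Katrina z:", round(katrina_z, 2), "Karl z:", round(karl_z, 2))
print("Chebyshev, K = 2: at least", chebyshev_lower_bound(2))
```

Note that the sample s.d. comes out larger than the population s.d. for the same five numbers, because it divides by n - 1 instead of N. Chebyshev's bound for K = 2 (at least 75%) is weaker than the Empirical rule's 95%, but it holds for any distribution.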
Quartiles and Percentiles

The median separates the data into two equally sized groups: half of the observations are above the median, half are below. Equivalently, 50% are below, so the median is the 50th percentile.
The 99 percentiles divide the data into 100 groups. 1% of the observations are less than the 1st percentile, 2% are less than the 2nd percentile, and so on.

To find the kth percentile (p. 73):
1.
2.
3.
a.
b.

Storm examples:
Aug04: 40, 45, 65, 70, 105, 129, 135, 145
Aug05: 40, 50, 65, 105, 175

The 25th percentile is called the 1st quartile (Q1); the 75th percentile is the 3rd quartile (Q3). The median = Q2 = 50th percentile.

Boxplot
Min, Q1, median, Q3, Max

Probability

Event: any collection of results or outcomes of a procedure.
Examples: Procedures: flipping a fair coin, rolling 2 dice. Outcomes: Head/Tail, ?~12.
Simple event: an outcome that cannot be broken down further. Examples: H/T.
Sample space: the collection of all possible simple events.

Procedure: Flip a coin once
(Some possible) event: Head (simple event)
Sample space: {H, T}

Procedure: Flip a coin 3 times
(Some possible) event: 2 heads, 1 tail (HHT, HTH, THH are all simple events resulting in the "2 heads, 1 tail" event)
Sample space: {HHH, HHT, HTH, HTT, THH, THT, TTH, TTT}

Notation
P denotes a probability.
A, B, or C denote specific events.
P(A) denotes the probability of event A occurring.

Rule 1: Relative frequency approximation of probability
Conduct (or observe) a procedure, and count the number of times that event A actually occurs. Based on these actual results, P(A) is estimated as
P(A) = (# of times A occurred) / (# of times the trial was repeated)

Rule 2: Classical
Assume that a given procedure has n different simple events and that each of those simple events has an equal chance of occurring.
If event A can occur in s of n ways, then P(A) = s/n.
Example: P(2 heads, 1 tail) = 3/8.
Example: rolling 1 die, one 12-faced die, and 2 dice.

Rule 3: Subjective probability
Based on knowledge…

Law of Large Numbers
As a procedure is repeated again and again, the relative frequency probability of an event tends to approach the actual probability.

Complement
The complement of event A, denoted by $\bar{A}$, consists of all outcomes in which event A does not occur. $P(\bar{A}) = 1 - P(A)$.

Review

There are 9 people in a room. I want to know what proportion of them smoke. I randomly pick 3 of them to ask.
Population:
Sample:
Parameter:
Statistic:
Data type:
Experimental/observational:

(Simple) random sampling
To identify people, assign them the numbers 1~9, or equivalently positions in a 3-by-3 grid: 1 (1,1), 2 (1,2), 3 (1,3), 4 (2,1), 5 (2,2), 6 (2,3), 7 (3,1), 8 (3,2), 9 (3,3).
Examples of sampling methods: roll a 3-number die, then choose by row, by column, or by color.
Why random? Each individual member has the same chance of being selected.
Why not simple random? A simple random sample requires that every sample of size n (here 3) has the same chance of being chosen.
What are all the possible samples of size 3? (Sample space)
{(1,2,3), (1,2,4), (1,2,5), (1,2,6), (1,2,7), …, (2,3,4), …, (7,8,9)} (84 possible samples in total)
Among them, (1,2,4), for example, cannot be chosen by row, by column, or by color. So the sampling methods "roll a 3-number die, then choose by row, by column, or by color" are not simple random sampling methods, and the samples selected in these ways are not simple random samples.

How to conduct a simple random sample for this example?
3 random numbers from 1~9.
Rolling twice to pick (a,b). (This is equivalent to the line above, but the proof is beyond the scope of this course.)

Another example: a product line, taking every 100th item from some starting point.
Fixed starting number
Random starting number
Simple random? Use common sense.

Key concepts:
Stem and leaf
Histogram
Frequency distribution
Mean, Median, Mode
Symmetric/Skew
Range
s.d. / variance
Quartile, percentile
Outlier
Probability/event/simple event/sample space
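The probability rules and the review example above can be explored with a small script. This is a sketch only: the names (`sample_space`, `event_a`, `relative_frequency`, `grid`, `reachable`) are illustrative, the simulation seed is arbitrary, and the "by color" scheme is left out because the notes do not say which people share a color. The 3-by-3 grid layout follows the numbering given in the review.

```python
import itertools
import random
from fractions import Fraction

# Sample space for flipping a coin 3 times: HHH, HHT, ..., TTT.
sample_space = ["".join(flips) for flips in itertools.product("HT", repeat=3)]

# Event A: 2 heads and 1 tail (made up of several simple events).
event_a = [outcome for outcome in sample_space if outcome.count("H") == 2]

# Rule 2 (classical): P(A) = s / n when all simple events are equally likely.
p_a = Fraction(len(event_a), len(sample_space))

# Complement: P(not A) = 1 - P(A).
p_not_a = 1 - p_a


def relative_frequency(trials, seed=0):
    """Rule 1: estimate P(A) by repeating the procedure `trials` times."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    hits = 0
    for _ in range(trials):
        flips = [rng.choice("HT") for _ in range(3)]
        if flips.count("H") == 2:
            hits += 1
    return hits / trials


# Review example: every possible sample of 3 people out of 9.
people = list(range(1, 10))
all_samples = list(itertools.combinations(people, 3))

# Samples reachable by picking a whole row or a whole column of the grid.
grid = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
rows = [tuple(row) for row in grid]
columns = [tuple(col) for col in zip(*grid)]
reachable = set(rows) | set(columns)

# One way to draw a simple random sample: 3 distinct numbers from 1~9.
simple_random_sample = random.Random(1).sample(people, 3)

print("P(2H1T) =", p_a, " P(complement) =", p_not_a)
for n in (100, 10_000):
    print(f"relative frequency after {n} trials:", relative_frequency(n))
print("possible samples:", len(all_samples), " by row/column:", len(reachable))
print("(1, 2, 4) reachable by row/column?", (1, 2, 4) in reachable)
print("one simple random sample:", sorted(simple_random_sample))
```

As the number of trials grows, the relative frequency should settle near 3/8, which is the Law of Large Numbers at work. The row-or-column method can produce only 6 of the 84 possible samples, which is why it is not simple random sampling even though each person has the same chance of being picked.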