Survey
* Your assessment is very important for improving the work of artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the work of artificial intelligence, which forms the content of this project
Introduction Randomness Probability Probability rules Probability distributions Discrete STAT 151 Introduction to Statistical Theory August 17, 2016 STAT 151 Class 1 Slide 1 Continuous Introduction Randomness Probability Probability rules Probability distributions Discrete Instructor and TA’s (weekly consultation location and hours, by appointments only) Denis HY LEUNG SOE 5047 email:[email protected] phone: 68280396 Tues 4-7pm TA’s SOE/SOSS GSR 4-9 Mon 9am-12pm AW Jia Yu [email protected] Mon 3pm-6pm Marilyn LIM Xue Yan [email protected] Wed 330-630pm Jefferson LOW Hao Hwee [email protected] Thu 330-630pm YANG Zhi Yao [email protected] STAT 151 Class 1 Slide 2 Continuous Introduction Randomness Probability Probability rules Probability distributions Discrete Continuous Essentials Name card (first few weeks) Course webpage: http://economics.smu.edu.sg/faculty/ profile/9699/Denis%20LEUNG (NOT eLearn!) Understanding of basic Calculus and Algebra – Appendix in course notes Readings before each class Projects vs Homework Do not disturb others in class If you missed a class, it is YOUR responsibility to find out what you have missed from your classmates or course webpage STAT 151 Class 1 Slide 3 Introduction Randomness Probability Probability rules Probability distributions Discrete Assessments Class Participation (10%) Projects (40%) – 1 project with presentation 20% – 1 project without presentation 20% – Each project’s grade includes 8% individual assessment (quizzes) Exam (50%) – Closed book but one 2-sided A-4 “cheat sheet” is allowed STAT 151 Class 1 Slide 4 Continuous Introduction Randomness Probability Probability rules Structure of each class New materials (< 2.5 hrs) Break (15 mins) Any other business (15 mins) STAT 151 Class 1 Slide 5 Probability distributions Discrete Continuous Introduction Randomness Probability Probability rules Probability distributions Discrete Continuous Data Survival time (in years) in 70 cancer patients 0.67 0.01 0.48 1.06 0.37 2.17 0.15 1.19 0.12 0.50 0.15 0.81 0.46 0.56 0.82 0.07 0.45 0.41 0.23 0.16 0.66 0.89 0.99 0.98 0.08 1.39 1.31 0.50 Orange - Subtype I 0.85 0.58 0.64 2.28 0.39 0.95 0.74 0.45 0.19 0.78 1.23 1.22 0.72 0.67 1.42 0.22 0.05 0.55 0.46 0.34 0.64 0.09 0.77 0.29 0.62 1.09 0.14 0.14 0.03 0.01 2.62 1.19 0.15 0.14 1.18 Black - Subtype II 0.18 0.02 0.83 0.26 0.49 0.99 1.53 Outcome (H vs. T) in tossing a coin 10 times HHTHTTHTTT Occurrence of financial crises 82 84 Mexican S&L 87 91 97 98 00 07 12 Dotcom Subprime Euro ? STAT 151 Class 1 Slide 6 Black Mon. Comm. RE AsianLTCM Introduction Randomness Probability Probability rules Probability distributions Discrete Continuous Sample vs. Population (a) Data are a sample from a population that we want to study e.g. Survival time of 70 patients (sample) out of all cancer patients (population) (b) We are interested in some characteristics of the population e.g. Average survival time of patients or percentage of patients who live beyond 2 years (c) Due to randomness, we cannot make definitive statements about a particular unit in the population e.g. We cannot tell how long the next patient would live (d) We use the sample of data to help us answer the questions in (b) STAT 151 Class 1 Slide 7 Introduction Randomness Probability Probability rules Probability distributions Discrete Continuous Randomness Observation from survival time data - Why some live longer than others? Subtype I - 1.06 2.17 1.22 1.42 2.28 2.62 1.39 1.18 1.53 average ≈ 1.65 Subtype II - 0.67 0.01 0.48 0.85 0.45 0.19 0.78 1.23 0.18 0.37 0.15 1.19 0.58 0.72 0.67 0.02 0.12 0.50 0.15 0.81 0.64 0.22 0.05 0.55 0.46 0.83 0.46 0.56 0.82 0.07 0.34 0.64 0.09 0.77 0.26 0.45 0.41 0.23 0.16 0.39 0.29 0.62 1.09 0.14 0.49 0.66 0.89 0.99 0.98 0.95 0.14 0.03 0.01 0.99 0.08 1.31 0.50 0.74 1.19 0.15 0.14 average ≈ 0.50 I has a better chance of survival because it has a higher average than II There are still variations within each subtype - inevitable in the real world We attribute (accommodate) these (unexplained) variations as random Chance and randomness can be described using probability theory STAT 151 Class 1 Slide 8 Introduction Randomness Probability Probability rules Probability distributions Discrete Continuous Definition of probability - Coin tossing data 1 H 2 T 3 T 4 H 5 H 6 T 7 H 8 T 9 H ··· ··· 0.6 0.4 probability = 0.5 0.0 0.2 Proportion of heads 0.8 1.0 Toss Outcome 1 2 3 4 9 50 100 500 1000 Number of tosses The long run proportion (frequency) of heads is the probability of heads. STAT 151 Class 1 Slide 9 Introduction Randomness Probability Probability rules Probability distributions Discrete Continuous Insight from coin tossing experiment Probability is the long run frequency of an outcome Probability cannot predict individual outcomes However, it can be used to predict long run trends Probability always lies between 0 and 1, with a value closer to 1 meaning a higher frequency of occurrence Probability is numeric in value so we can use it to: - compare the relative chance between different outcomes (events) - carry out calculations STAT 151 Class 1 Slide 10 Introduction Randomness Probability Probability rules Probability distributions Discrete Continuous Probability Axioms - Urn model (1)- drawing marbles from an urn (with replacement) Draws 1 1 4 3 2 3 4 5 2 5 Probability 3 5 STAT 151 Class 1 6 Slide 11 2 5 7 8 9 ... ... Introduction Randomness Probability Probability rules Probability distributions Discrete Urn model (2) Five possible Outcomes: 1 2 3 4 5 Interested in Event A: A={ 1 outcomes 4 5 }; hence an event is a collection of P(A) = 35 = 0.6(60%) = Number of marbles in A Total number of marbles STAT 151 Class 1 Slide 12 Continuous Introduction Randomness Probability Probability rules Probability distributions Discrete Complementary events Marbles in urn: 1 Interested in Ā: Ā= 2 3 2 3 4 5 (Not A) Ā, sometimes written as AC , is called the complementary event of A Chance of = 1 − chance of ⇒ P(Ā) = 1 − P(A) = 1 − STAT 151 Class 1 Slide 13 3 2 = 5 5 Continuous Introduction Randomness Probability Probability rules Probability distributions Discrete Continuous Joint probability - independent events - Drawing a blue in draw 1 and a green in draw 2 (with replacement) A and B are independent means the occurrence of one event does not change the chance of the other Draws 1 1 4 3 STAT 151 Class 1 2 5 Slide 14 A={ B ={ in 1st draw} in 2nd draw} 2 3 2 P(A) = ; P(B) = 5 5 3 2 P(A and B) = × 5 5 6 = = P(A)P(B) 25 Introduction Randomness Probability Probability rules Probability distributions Discrete Joint probability and disjoint events Two events A and B are disjoint or sometimes called mutually exclusive if they cannot occur simultaneously: P(A and B)=0 Example A = { in 1st draw} B={ in 1st draw} P(A and B) = 0 STAT 151 Class 1 Slide 15 Continuous Introduction Randomness Probability Probability rules Probability distributions Discrete Continuous Conditional probability Conditional probability is a useful quantification of how the assessment of chance changed due to new information: “If A happened, what is the chance of B?” The conditional probability of “B given A” is written as P(B|A) Example Drawing marbles WITHOUT replacement A = { in 1st draw} B = { in 2nd draw} 2 P(B) = 5 2 P(B|A) = 4 2 2 ⇒ because has been drawn 5 4 2 3 6 P(A and B) = × = P(B|A)P(A) = 6= P(B)P(A) 4 5 20 ? ; STAT 151 Class 1 Slide 16 Introduction Randomness Probability Probability rules Probability distributions Discrete Conditional probability and independence The following relationships hold if A and B are independent P(A|B) = P(A) ⇔ P(B|A) = P(B) ⇔ P(A and B) = P(A|B) | {z } P(B) = P(A)P(B) =P(A) if A and B indep Independence is NOT the same as mutually exclusive (disjoint), which is P(A and B) = 0. STAT 151 Class 1 Slide 17 Continuous Introduction Randomness Probability Probability rules Probability distributions Discrete The multiplication rule P(AB) = P(A|B)P(B) = P(B|A)P(A) “Multiplication Rule” Example Marbles in urn: 1 P( 2 3 4 1 3 1 and Odd) = P( |Odd)P(Odd) = = 3 5 5 1 2 1 = P(Odd| )P( ) = = 2 5 5 Rearranging the multiplication rule: P(A|B) = STAT 151 Class 1 5 Slide 18 P(AB) P(B) Continuous Introduction Randomness Probability Probability rules Probability distributions Partition rule P( ) = P( = P({ 1 2 1 = + 5 5 3 = 5 STAT 151 Class 1 Slide 19 and odd) + P( and even) 5 }) + P({ 4 }) Discrete Continuous Introduction Randomness Probability Probability rules Probability distributions Discrete Continuous Union of events (1) Union of events can sometimes be best visualized using a Venn diagram (John Venn, 1834-1923) Example What is the probability of drawing a or an odd number ? Urn Odd Green 1 4 3 2 5 1 2 3 5 4 ; STAT 151 Class 1 Slide 20 Introduction Randomness Probability Probability rules Probability distributions Discrete Union of events (2) P( or odd) = P( ) + P(Odd) − P( 2 3 1 = + − 5 5 5 4 = 5 and odd) In general, if A and B are: disjoint, then P(A or B) = P(A) + P(B) not disjoint, then P(A or B) = P(A) + P(B) − P(AB) STAT 151 Class 1 Slide 21 Continuous Introduction Randomness Probability Probability rules Probability distributions Discrete Continuous Probability tree Probability tree is useful for studying combination of events. Branches of a tree are conditional probabilities. Example Drawing two marbles from urn without replacement: ∩ )= 2 5 · 1 4 P( ∩ )= 2 5 · 3 4 P( ∩ )= 3 5 · 2 4 ∩ )= 3 5 · 2 4 P( Draw 2 1 4 = P( | ) 3 4 2 5 = P( | ) Draw 1 3 5 Draw 2 2 4 = P( | ) 2 4 = P( | ) STAT 151 Class 1 Slide 22 P( Introduction Randomness Probability Probability rules Probability distributions Discrete Bayes Theorem (Thomas Bayes, 1701-1761) Trees are useful for visualizing P(B|A) when B follows from A in a natural (time) order. Many problems require P(A|B), Bayes Theorem provides an answer. Example Testing for an infectious disease. Test 9 10 1 100 · 9 10 P(D ∩ T̄ ) = 1 100 · 1 10 P(D̄ ∩ T ) = 99 100 · 1 10 99 100 · 9 10 P(D ∩ T ) = T = P(T |D) T̄ D 1 10 1 100 = P(T̄ |D) Disease D̄ T 99 100 Test 1 10 = P(T |D̄) T̄ 9 10 = P(T̄ |D̄) What is P(D|T ) or P(D̄|T̄ )? STAT 151 Class 1 Slide 23 P(D̄ ∩ T̄ ) = Continuous Introduction Randomness Probability Probability rules Probability distributions Discrete Bayes Theorem (2) P(D|T ) = P(D ∩ T ) P(T ) = = P(T |D)P(D) P(T ) P(T |D)P(D) P(T ∩ D) + P(T ∩ D̄) | {z } Partition rule = P(T |D)P(D) P(T |D)P(D) + P(T |D̄)P(D̄) | {z } Multiplication rule = 9 10 · 9 10 1 100 1 · 100 1 + 10 · In general, P(A|B) = STAT 151 Class 1 Slide 24 P(B|A)P(A) P(B) 99 100 = 1 12 Continuous Introduction Randomness Probability Probability rules Probability distributions Discrete Continuous Data revisited- Discrete vs. Continuous Data Discrete - countable number of possible values Examples (1) Coin Tossing: H H T H T T H T T T Two possible values: H or T (2) Financial crisis: 1982, 1984, 1987,... (No. of crises in a decade) Many possible values: 0, 1, 2, 3, 4,... Continuous - values fall in an interval (a, b), a could be −∞ and b could be ∞ Example Survival time in cancer patients: 0.67 0.01 0.48 1.06 0.85 0.45 ... 0 < survival time < a positive number STAT 151 Class 1 Slide 25 Introduction Randomness Probability Probability rules Probability distributions Discrete Continuous First look - summary statistics Naı̈ve methods Numerical summary: Min, max, mean, variance, etc. Graphical summary: Histogram, pie chart, bar graph, etc. Tabular summary: Tables of frequencies Attributes Quick and dirty Identify potential problems, errors Flaws Difficult to generalize especially when the dataset is complicated Not easily portable – cannot describe a chart or graph unless you can see it! STAT 151 Class 1 Slide 26 Introduction Randomness Probability Probability rules Probability distributions Discrete Continuous What do we do about data? Financial crises data 82 84 Mexican S&L 87 91 97 98 00 07 12 Dotcom Subprime Euro ? Black Mon. Comm. RE Asian LTCM We may be interested in X , the number of crises in the next decade. Possible questions: What is the probability of X =0, or 1, or 2, or 3 or...? Probability distribution - tells us the long run frequencies (chances) of the possible outcomes On average, how many crises in a decade? Expectation (Expected value) - tells us the weighted average of the possible outcomes, where the weights are the probabilities How likely will X differ from the expected? Standard deviation (Variance) - tells us the weighted average of deviations of the possible outcomes from the expected value STAT 151 Class 1 Slide 27 Introduction Randomness Probability Probability rules Probability distributions Discrete Continuous Discrete distributions - Tossing a coin (H) or (T) ? Long run frequencies H T 1 2 1 2 The long run frequencies are 1/2 each for H and T – there is equal chance for H or T STAT 151 Class 1 Slide 28 ; Introduction Randomness Probability Probability rules Probability distributions Discrete Continuous Discrete distributions - Drawing a marble 1 1 4 3 2 5 2 3 4 1 2 3 4 5 1 5 1 5 1 5 1 5 1 5 5 ? 3 5 2 5 ; The long run frequencies tell us there is equal chance for 1, 2, 3, 4 and 5 but there is a higher chance for blue than green STAT 151 Class 1 Slide 29 Introduction Randomness Probability Probability rules Probability distributions Discrete Continuous Discrete probability distributions X H T P(X ) 1 2 1 2 Probability distribution function X X 3 5 P(X ) 2 5 a1 a2 a3 P(X ) P(a1) P(a2) P(a3) X 1 2 3 4 5 P(X ) 1 5 1 5 1 5 1 5 1 5 ... ... ; A probability distribution summarizes the long run frequencies (chances) of the possible outcomes of X STAT 151 Class 1 Slide 30 Introduction Randomness Probability Probability rules Probability distributions Discrete Continuous Probability distribution for a discrete random variable X is the unknown outcome. X is called a discrete random variable if its value can only come from a countable number of possible values: a1 , a2 , ..., ak . Examples Coin toss: X = H or T (2 possible values) Marbles: X = 1, 2, 3, 4, or 5 (5 possible values) Financial crisis: X = 0, 1, 2, 3,... (Infinite but countable number of possible values) P(X = ai ) gives the probability X = ai and is called a probability distribution function. A valid probability distribution function must satisfy the following rules: P(X = ai ) must be between 0 and 1 We are certain that one of the values will appear, therefore: P(X = a1 or X = a2 or ...X = ak ) = P(X = a1 ) + P(X = a2 ) + ... + P(X = ak ) = 1 | {z } X =a1 ,X =a2 ,...are disjoint events STAT 151 Class 1 Slide 31 Introduction Randomness Probability Probability rules Probability distributions Discrete Continuous Example: Probability distribution of the number from drawing a marble X P(X ) 1 0.2 2 0.2 3 0.2 4 0.2 5 0.2 P(X = 1) = 0.2 P(X ≥ 3) = P(X = 3) + P(X = 4) + P(X = 5) = 0.6 P(X = 1.5) = 0 P(X > 5) = 0 1 = P(X = 1)+P(X = 2)+P(X = 3)+P(X = 4)+P(X = 5) STAT 151 Class 1 Slide 32 Introduction Randomness Probability Probability rules Probability distributions Discrete Continuous Continuous random variable The survival time (X ) of a cancer patient following diagnosis may be any value in a range (e.g. 0 < X < some positive number). X is called a continuous random variable if its value falls in a range (a, b) ∈ (−∞, ∞). For a continuous random variable X , P(X = x) for any value x is 0, we can only talk about the probability of X falling in a range Examples P(X > 2 years) = Probability of surviving beyond 2 years P(0 < X ≤ 1 year) = Probability of dying within 1 year P(X ≤ 2 years|X > 1 year) = Probability of dying within 2 years given surviving for 1 year? STAT 151 Class 1 Slide 33 Introduction Randomness Probability Probability rules Probability distributions Discrete Continuous Probability distribution for a continuous random variable A continuous random variable, X , is a random variable with outcome that falls within a range or interval, (a, b) Examples X = survival time of a patient: (a, b) = [0, 150] years X = return from an investment of 10000; (a, b) = [−10000, ∞) dollars X = per capita income; (a, b) = [0, 1000000000000] dollars The probability distribution of X is defined by a (probability) density function (PDF), f (x). There are subtle differences between PDF and probability: f (x) ≥ 0 for any value of x in (a, b) f (x) 6= P(X = x) = 0 for any value of x P(r ≤ X ≤ s) = P(r < X < s) probability of X between r and s | {z } since P(X =r )=P(X =s)=0 P(a ≤ X ≤ b) = 1 since X must fall within (a, b) The cumulative distribution function (CDF), written as F (x), is defined as P(X ≤ x) STAT 151 Class 1 Slide 34 Introduction Randomness Probability Probability rules Probability distributions Discrete Continuous Example - survival data f (x) = λe −λx , λ > 0 is called an exponential density Density of survival time f(x) = 1.5e−1.5x Different values of λ can be used to model different populations Density All exponential densities have the same shape but different gradients so it is easy to describe to others Allows us to make probability statements about survival times we have and have not observed 0 1 2 3 4 X (Time in years) STAT 151 Class 1 Slide 35 5 The “area” under f (x) is 1 which means the probability of dying between time 0 and ∞ is 1 Introduction Randomness Probability Probability rules Probability distributions Discrete Some calculations P(X > 2) Density f(x) = 1.5e−1.5x 0 1 2 3 4 5 X (Time in years) P(X > 2) = 0.049 is the red area under the curve and can be obtained by analytical or numerical (computer) analysis STAT 151 Class 1 Slide 36 Continuous Introduction Randomness Probability Probability rules Probability distributions Discrete Continuous Some calculations P(X < 1) Density f(x) = 1.5e−1.5x 0 1 2 3 4 5 X (Time in years) P(X ≤ 1) = P(0 < X ≤ 1) = 0.776 is the red area under the curve and can be obtained by analytical or numerical analysis. STAT 151 Class 1 Slide 37 Introduction Randomness Probability Probability rules Probability distributions Discrete Some calculations P(X ≤ 2|X > 1) = P(1 < X and X ≤ 2) P(X > 1) | {z } conditional probability P(1 < X ≤ 2) P(X > 1) 1 − P(X ≤ 1) − P(X > 2) = 1 − P(X ≤ 1) 1 − 0.776 − 0.049 = 1 − 0.776 = 0.781 = STAT 151 Class 1 Slide 38 Continuous