Slide 1. Quantitative Methods, Topic 5: Probability Distributions

Slide 2. Outline
- Probability distributions
  - For categorical variables
  - For continuous variables
- Concept of making inference

Slide 3. Reading
Chapters 4, 5 and 6 (particularly Chapter 6) of Fundamentals of Statistical Reasoning in Education, Coladarci et al.

Slide 4. Tossing a coin 10 times - 1
If the coin is not biased, we would expect “heads” to turn up 50% of the time. However, in 10 tosses, we will not always get exactly 5 heads. Sometimes it could be 4 heads out of 10 tosses, sometimes 3 heads, and so on.

Slide 5. Tossing a coin 10 times - 2
What is the probability of getting
- no heads in 10 tosses
- 1 head in 10 tosses
- 2 heads in 10 tosses
- 3 heads in 10 tosses
- ……

Slide 6. Do an experiment in EXCEL
See animated demo CoinToss1_demo.swf

Slide 7. Frequencies of 50 sets of coin tosses

Slide 8. Histogram of 50 sets of coin tosses

Slide 9. Some terminology
Random variable: a variable whose values are determined by chance.
Examples of random variables:
- Number of heads in 10 tosses of a coin
- Test score of students
- Height
- Income

Slide 10. Probability distribution (function)
Shows the frequency (or chance) of occurrence of each value of the random variable.

Slide 11. Probability Distribution of Coin Toss - 1
Slide 8 shows the empirical probability distribution. The theoretical one can be computed. See animated demo Binomial Probability_demo.swf

Number of heads in 10 tosses    Probability
0                               0.001
1                               0.010
2                               0.044
3                               0.117
4                               0.205
5                               0.246
6                               0.205
7                               0.117
8                               0.044
9                               0.010
10                              0.001

Slide 12. Probability Distribution of Coin Toss - 2
[Bar chart: theoretical probabilities (vertical axis, 0.000 to 0.300) against number of heads in 10 tosses (horizontal axis, 0 to 10)]

Slide 13. How can we use the probability distribution - 1?
It provides information about “central tendency” (where the middle is, typically captured by the mean or median) and about variation (typically captured by the standard deviation).

Slide 14. How can we use the probability distribution - 2?
Use the distribution as a point of reference.
Example: if we find that we obtain only 1 head in 10 coin tosses 20% of the time, when the theoretical probability is about 1%, we may conclude that the coin is biased (not a 50-50 chance of tossing a head).
The theoretical distribution is a better reference than an empirical distribution, because an empirical distribution fluctuates from one collection of data to the next.

Slide 15. Random variables that are continuous
- Collect a sample of height measurements of people.
- Form an empirical probability distribution. Typically, the probability distribution will be a bell-shaped curve.
- Compute the mean and standard deviation.
- An empirical distribution is obtained. Can we obtain a theoretical distribution?

Slide 16. Normal distribution - 1
[Figure: bell-shaped normal curve; horizontal axis from -4 to 4, vertical axis from 0 to 0.45]

Slide 17. Normal distribution - 2
A random variable, X, that has a normal distribution with mean μ and standard deviation σ can be transformed to a variable, Z, that has the standard normal distribution, where the mean is 0 and the standard deviation is 1.
z-score: z = (x - μ) / σ
We need only discuss properties of the standard normal distribution.

Slide 18. Standard normal distribution - 1
[Figure: standard normal curve; horizontal axis from -4 to 4, vertical axis from 0 to 0.45. 5% of the area lies in the region below -1.64; 2.5% of the area lies in the region above 1.96.]

Slide 19. Standard normal distribution - 2
2.5% of observations lie above 1.96 and, by symmetry, 2.5% lie below -1.96. So around 5% are less than -1.96 or greater than 1.96.
This gives the general statement that around 95% of the observations of a standard normal variable lie between -2 and 2. More generally, for any normal distribution, around 95% of the observations lie within ± 2 standard deviations of the mean.
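The tail areas and coverage figures quoted on these slides can be checked numerically. The following is a minimal sketch in Python, using `statistics.NormalDist` from the standard library as a stand-in for the EXCEL functions used in the course. The helper names `z_score` and `prob_within`, and the example values (a score of 70 with mean 50 and standard deviation 10), are illustrative assumptions and do not come from the slides.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
std_normal = NormalDist(mu=0.0, sigma=1.0)


def z_score(x, mean, sd):
    """Transform a value x from a normal(mean, sd) variable to a z-score."""
    return (x - mean) / sd


def prob_within(k):
    """Probability that a standard normal variable lies between -k and k."""
    return std_normal.cdf(k) - std_normal.cdf(-k)


# Tail areas marked on the standard normal curve.
upper_tail_196 = 1.0 - std_normal.cdf(1.96)  # area above 1.96
lower_tail_164 = std_normal.cdf(-1.64)       # area below -1.64

# Coverage within one and two standard deviations of the mean.
within_1 = prob_within(1.0)
within_196 = prob_within(1.96)
within_2 = prob_within(2.0)

# Hypothetical example: a score of 70 when the mean is 50 and the sd is 10.
example_z = z_score(70, 50, 10)

print(f"P(Z > 1.96)         = {upper_tail_196:.4f}")
print(f"P(Z < -1.64)        = {lower_tail_164:.4f}")
print(f"P(-1 < Z < 1)       = {within_1:.4f}")
print(f"P(-1.96 < Z < 1.96) = {within_196:.4f}")
print(f"P(-2 < Z < 2)       = {within_2:.4f}")
print(f"z-score of 70       = {example_z:.2f}")
```

Running it shows why "± 2 standard deviations" is a rounded version of the stricter ±1.96: the interval from -2 to 2 covers slightly more than 95%.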

Slide 20. Standard normal distribution - 3
Around 95% of the observations lie within ± two standard deviations (strictly, ±1.96).
[Figure: standard normal curve; horizontal axis from -4 to 4, vertical axis from 0 to 0.45. The central region is labelled "95% in this region".]

Slide 21. Standard normal distribution - 4
Around 68% of the observations lie within ± one standard deviation.
[Figure: standard normal curve; horizontal axis from -4 to 4, vertical axis from 0 to 0.45. The central region is labelled "68% in this region".]

Slide 22. Computing normal probabilities in EXCEL
See animated demo NormalProbability_demo.swf

Slide 23. Exercise - 1
For the data set distributed in Week 2, TIMSS2003AUS.sav, take the variable bsmmat01 (second last variable, maths estimated ability) and compute the score range in which the middle 95% of the scores lie, in two ways:
- Use the observed scores and compute the percentiles from the observations.
- Assume the population is normally distributed.

Slide 24. Exercise - 2
Dave scored 538. What percentage of students obtained scores higher than Dave? Answer in two ways:
- Use the observed scores and compute the percentiles from the observations.
- Assume the population is normally distributed.
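
The two approaches asked for in the exercises (percentiles from the observed scores versus a fitted normal distribution) could be sketched as follows in Python instead of EXCEL. This is only an outline of the method under stated assumptions: the list `hypothetical_scores` is made-up illustrative data, not values from TIMSS2003AUS.sav, and the function names are invented for this sketch. With the real data, the bsmmat01 column would replace the list.

```python
from statistics import NormalDist, quantiles

# HYPOTHETICAL scores for illustration only; NOT taken from TIMSS2003AUS.sav.
hypothetical_scores = [
    412, 445, 463, 478, 490, 497, 505, 512, 518, 524,
    531, 538, 544, 551, 559, 568, 580, 595, 613, 640,
]


def middle_95_empirical(scores):
    """Middle 95% range from the observed 2.5th and 97.5th percentiles."""
    # n=40 gives cut points every 2.5%; the first and last are the ones needed.
    cuts = quantiles(scores, n=40, method="inclusive")
    return cuts[0], cuts[-1]


def middle_95_normal(scores):
    """Middle 95% range assuming the scores are normally distributed."""
    fitted = NormalDist.from_samples(scores)  # sample mean and sample sd
    return fitted.inv_cdf(0.025), fitted.inv_cdf(0.975)


def percent_above_empirical(scores, cutoff):
    """Percentage of observed scores strictly higher than the cutoff."""
    count = sum(1 for s in scores if s > cutoff)
    return 100 * count / len(scores)


def percent_above_normal(scores, cutoff):
    """Percentage above the cutoff under a fitted normal distribution."""
    fitted = NormalDist.from_samples(scores)
    return 100 * (1 - fitted.cdf(cutoff))


emp_low, emp_high = middle_95_empirical(hypothetical_scores)
norm_low, norm_high = middle_95_normal(hypothetical_scores)
pct_emp = percent_above_empirical(hypothetical_scores, 538)
pct_norm = percent_above_normal(hypothetical_scores, 538)

print(f"Middle 95% (observed percentiles): {emp_low:.1f} to {emp_high:.1f}")
print(f"Middle 95% (normal assumption):    {norm_low:.1f} to {norm_high:.1f}")
print(f"Scored above 538 (observed):       {pct_emp:.1f}%")
print(f"Scored above 538 (normal):         {pct_norm:.1f}%")
```

The two methods will generally give somewhat different answers; how far apart they are is one indication of how well the normal assumption fits the observed scores.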