26134 Business Statistics, Autumn 2017
Tutorial 5: Continuous Probability Distributions
B MathFin (Hons) M Stat (UNSW) PhD (UTS)
mahritaharahap.wordpress.com/teaching-areas
UTS CRICOS PROVIDER CODE: 00099F
business.uts.edu.au

A random variable, usually written X, is a variable whose possible values are numerical outcomes of a random phenomenon. There are two types of random variables: discrete and continuous.

Highlights: Reminder of Key Steps from Lecture 4 (Discrete Probability)

Step One: Determine the Distribution
• Look at the type of variable the question is talking about.
• Think about the properties of the distribution in the question and match it to one you know about.
• Is it discrete or continuous?
• If it is discrete, does it match the properties of a binomial, hypergeometric or Poisson distribution?

Step Two: What is the question asking for?
• Look at what the question is asking you to do. Is it "find the probability…", or something else?
• Write down exactly what you are finding (e.g., P(X > 2)).
• Can you rewrite this probability (e.g., 1 – P(X ≤ 2))?

Step Three: Get the necessary formula
• Based on the information from steps 1 and 2 you should be able to pick the needed formula from your formula sheet.

Step Four: Apply the formula to the question
• Match up all the terms in the formula to numbers in the question and find the answer needed.

Binomial:
$f(x) = P(X = x) = {}^{n}C_{x}\, p^{x} q^{\,n-x}$
where:
f(x) = probability of x successes in n independent trials
p = probability of success
q = 1 − p = probability of failure
x = number of successful trials
n − x = number of failed trials

Poisson:
$f(x) = P(X = x) = \dfrac{\lambda^{x} e^{-\lambda}}{x!}$
where:
f(x) = probability of x occurrences in a specified interval
λ = mean number of events in a specified interval

Hypergeometric:
$f(x) = P(X = x) = \dfrac{{}^{r}C_{x}\; {}^{N-r}C_{n-x}}{{}^{N}C_{n}}$
where:
f(x) = probability of x successes in n dependent trials
n = number of trials
N = number of elements in the population
r = number of elements in the population labelled success
x = number of elements in the sample labelled success

• Discrete Distributions
  • P(X ≤ x) ≠ P(X < x) in general
  • P(X = x) can be non-zero
  • i.e. P(X ≤ 2) ≠ P(X < 2)
• Continuous Distributions
  • P(X ≤ x) = P(X < x)
  • P(X = x) = 0
  • i.e. P(X ≤ 2) = P(X < 2)

X = rate of return on a proposed investment
$X \sim N(\mu = 0.30,\ \sigma = 0.10)$
$Z \sim N(\mu = 0,\ \sigma = 1)$

$P(X \le 0.23) = P\!\left(Z \le \dfrac{x - \mu}{\sigma}\right) = P\!\left(Z \le \dfrac{0.23 - 0.30}{0.10}\right) = P(Z \le -0.70) = 0.2420$ (from the standard normal table)

We need to do this to standardise the distribution so we can find the probabilities using the tables.

Why do we standardise the normal distribution to find probabilities?
When we are given a problem about a random variable X~N(μ,σ) that follows a normal distribution, finding a cumulative probability P(X < x) directly would require calculating a complex definite integral.
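To see what that definite integral involves, here is a minimal Python sketch using the rate-of-return example above (μ = 0.30, σ = 0.10, cutoff 0.23). It approximates the integral of the normal density with a simple trapezoid rule and compares it with the standardised lookup P(Z ≤ z). The function names, the step count and the lower integration limit of μ − 10σ are illustrative assumptions, not part of the tutorial.

```python
import math
from statistics import NormalDist  # standard library, Python 3.8+

MU, SIGMA = 0.30, 0.10   # parameters from the rate-of-return example
X_CUTOFF = 0.23          # we want P(X <= 0.23)


def normal_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and std dev sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))


def integrate_pdf(upper, mu, sigma, steps=20000):
    """Trapezoid-rule approximation of the area under the density up to `upper`.

    Assumption: the area below mu - 10*sigma is negligible, so we start there.
    """
    lower = mu - 10 * sigma
    h = (upper - lower) / steps
    total = 0.5 * (normal_pdf(lower, mu, sigma) + normal_pdf(upper, mu, sigma))
    for i in range(1, steps):
        total += normal_pdf(lower + i * h, mu, sigma)
    return total * h


# Route 1: the "hard" way, integrating the density of X directly.
p_integral = integrate_pdf(X_CUTOFF, MU, SIGMA)

# Route 2: standardise, then use the standard normal CDF (the "table").
z = (X_CUTOFF - MU) / SIGMA
p_standard = NormalDist(0, 1).cdf(z)

print(f"z-score              = {z:.2f}")
print(f"integral of density  = {p_integral:.4f}")
print(f"P(Z <= z) from CDF   = {p_standard:.4f}")
```

Both routes should agree to four decimal places, which is why a single table for Z is enough for any normal X.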
To make this computation easier, standard practice in statistics is to convert the normal random variable X~N(μ,σ) into a standard normal random variable Z~N(μ=0,σ=1) by computing the associated z-score; we can then look up the tables to find the required probabilities:

$z = \dfrac{x - \mu}{\sigma}$

Calculating probabilities using the normal distribution by applying the complement rule and/or symmetry rule and/or interval rule
• Complement Rule: P(Z > z) = 1 − P(Z < z)
• Symmetry Rule: P(Z < −z) = P(Z > z)
• Interval Rule: P(−z < Z < z) = P(Z < z) − P(Z < −z)

X = time of visits (in minutes)
λ = 1/μ = mean number of events per unit of time
  = two clients to talk to on average per 30 mins
  = one client to talk to on average per 15 mins
  = 1/15 clients on average per minute
  = 0.0666667 clients per minute

$P(X < x) = 1 - e^{-\lambda x}$
$P(X < 10) = 1 - e^{-0.06667 \times 10} \approx 0.4866$

For the Poisson distribution we would instead use:
λ = 2 clients to talk to on average per 30 mins
  = 2/3 client to talk to on average per 10 mins

In statistics we usually want to analyse a population parameter, but collecting data for the whole population is usually impractical, expensive or impossible. That is why we collect samples from the population (sampling) and make conclusions about the population parameters using the statistics of the sample (inference) with some level of confidence (level of significance).

Types of data – Graphical displays
• Categorical (divides the cases into groups/categories): Bar Chart, Pie Chart
  • Nominal (no order), e.g. Nationality, Gender, Month
  • Ordinal (order), e.g. Satisfaction Level or level of education
• Quantitative (measures a numerical quantity for each case): Boxplot, Histogram
  • Discrete: takes whole number values, e.g. number of birds in a tree, shoe size
  • Continuous: e.g. height, temperature

Presenting Data Graphically: Univariate
• Categorical Data
  • Bar Charts: to depict frequencies
  • Pie Charts: to depict proportions
• Quantitative Data
  • Histogram: to look at the distribution
  • Boxplot: graphical summary statistics

Displaying categorical data
• Bar Graph: good for depicting frequencies
• Pie Graph: good for depicting proportions

Displaying quantitative data
• Histogram: good for getting an overall 'picture' of the data
• Boxplot: good for finding unusual observations and looking at the symmetry of the data

Presenting Data Graphically: Bivariate
• Categorical x Categorical
  • Crosstabs: to depict frequencies, row and column proportions
  • Multiple Bar Charts: to compare frequencies
• Categorical x Quantitative
  • Multiple Comparison Boxplots
• Quantitative x Quantitative
  • Scatterplots: relationship between two numerical variables
[Figure: comparison boxplots of body mass index by sex (male, female)]

Types of data – Measures of central tendency
• Categorical
  • Nominal (no order), e.g. Nationality, Gender: Mode
  • Ordinal (order), e.g. S, M, L or level of education: Median or Mode
• Quantitative
  • Discrete (takes whole number values), e.g. number of birds in a tree: Mean, Median or Mode
  • Continuous, e.g. height: Mean, Median or Mode

Types of data – Measures of spread
• Categorical
  • Nominal (no order), e.g. Nationality, Gender: None
  • Ordinal (order), e.g. S, M, L or level of education: IQR
• Quantitative
  • Discrete (takes whole number values), e.g. number of birds in a tree: Range, IQR, SD, CV
  • Continuous, e.g. height: Range, IQR, SD, CV

Measures of dispersion
The coefficient of variation (CV) is used when you are comparing the variability between groups with different means.

$CV = \dfrac{s}{\bar{x}} \times 100$

It is defined as the ratio of the standard deviation to the mean, expressed as a percentage. Dividing the standard deviation by the mean standardises the measure of variability so it is suitable for comparison.
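As a small illustration of the coefficient of variation, the Python sketch below compares two groups with very different means. The data values and the function name `coefficient_of_variation` are hypothetical, chosen only to show the calculation; `statistics.stdev` computes the sample standard deviation s.

```python
from statistics import mean, stdev


def coefficient_of_variation(data):
    """Return the CV as a percentage: (sample SD / mean) * 100."""
    return stdev(data) / mean(data) * 100


# Hypothetical data: two groups measured on very different scales.
group_a = [10, 12, 14]      # mean 12,  sample SD 2
group_b = [100, 110, 120]   # mean 110, sample SD 10

cv_a = coefficient_of_variation(group_a)
cv_b = coefficient_of_variation(group_b)

print(f"Group A: SD = {stdev(group_a):.1f}, CV = {cv_a:.1f}%")
print(f"Group B: SD = {stdev(group_b):.1f}, CV = {cv_b:.1f}%")
```

Group B has the larger standard deviation, but relative to its mean it is less variable than group A, which is the comparison the CV is designed to make.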
Summary: Probability
• Marginal Probability
• Union Probability: $P(A \cup B)$
• Joint Probability: $P(A \cap B)$
• Conditional Probability: $P(A \mid B) = \dfrac{P(A \cap B)}{P(B)}$
• Independent Probability: $P(A \cap B) = P(A) \times P(B)$
@ Dr. Sonika Singh, BSTATS, UTS

Critical Value = CHIINV(0.05, df)
Conclusion:
If χ² > critical value we reject H₀. We conclude that at the 5% level of significance we have enough evidence to show that the two variables are not independent.
If χ² < critical value we do not reject H₀. We conclude that at the 5% level of significance we do not have enough evidence to show that the two variables are not independent.

• Uniform: X~Uniform(a,b)
• Normal: X~Normal(μ,σ)
• Exponential: X~Exp(λ)

SEE YOU ALL NEXT WEEK!