Download Tutorial 1: Introduction to Business Statistics (BStats)

26134 Business Statistics
Autumn 2017
Tutorial 5: Continuous
Probability Distributions
[email protected]
B MathFin (Hons)
M Stat (UNSW)
PhD (UTS)
mahritaharahap.wordpress.com/
teaching-areas
UTS CRICOS PROVIDER CODE: 00099F
business.uts.edu.au
A random variable, usually written X, is
a variable whose possible values are numerical
outcomes of a random phenomenon. There are
two types of random variables, discrete and
continuous.
Highlights:
Reminder of Key Steps from Lecture 4 (Discrete Probability)
Step One: Determine the Distribution
• Look at the type of variable the question is talking about.
• Think about the properties of the distribution in the question and match it to one you know about.
• Is it discrete or continuous?
• If it is discrete, does it match the properties of a binomial, hypergeometric or Poisson distribution?
Step Two: What is the question asking for?
• Look at what the question is asking you to do. Is it “find the probability…”, or something else?
• Write down exactly what you are finding (e.g., P(X>2)).
• Can you rewrite this probability (e.g., 1 – P(X<=2))?
Step Three: Get the necessary formula
• Based on the information from steps 1 and 2 you should be able to pick the needed formula from your formula sheet.
Step Four: Apply the formula to the question
• Match up all the terms in the formula to numbers in the question and find the answer needed.
Binomial distribution:
$$f(x) = P(X = x) = {}^{n}C_{x}\, p^{x} q^{n-x}$$
where:
f(x) = probability of x successes in n independent trials
p = probability of success
q = 1-p = probability of failure
x = number of successful trials
n-x = number of failed trials
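As a minimal sketch of the binomial formula above, the following Python uses `math.comb` from the standard library. The values n = 10 and p = 0.3 are hypothetical and chosen only for illustration; they do not come from the tutorial.

```python
from math import comb

def binomial_pmf(x: int, n: int, p: float) -> float:
    """P(X = x): x successes in n independent trials, success probability p."""
    q = 1 - p  # probability of failure
    return comb(n, x) * p**x * q**(n - x)

# Hypothetical inputs (not from the tutorial)
n, p = 10, 0.3

prob_exactly_2 = binomial_pmf(2, n, p)

# Rewrite from Step Two: P(X > 2) = 1 - P(X <= 2)
prob_more_than_2 = 1 - sum(binomial_pmf(k, n, p) for k in range(3))

print(round(prob_exactly_2, 4), round(prob_more_than_2, 4))
```

The second calculation mirrors the rewrite suggested in Step Two, where a "greater than" probability is obtained from the complement.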
Poisson distribution:
$$f(x) = P(X = x) = \frac{\lambda^{x} e^{-\lambda}}{x!}$$
where:
f(x) = probability of x occurrences in a specified interval
λ = mean number of events in a specified interval
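The Poisson formula above can be sketched in a few lines of Python. The rate λ = 2 is an assumed value for illustration only.

```python
from math import exp, factorial

def poisson_pmf(x: int, lam: float) -> float:
    """P(X = x): x occurrences in an interval with mean rate lam."""
    return lam**x * exp(-lam) / factorial(x)

# Hypothetical rate: on average 2 events per interval
lam = 2.0

prob_none = poisson_pmf(0, lam)                               # P(X = 0)
prob_at_most_2 = sum(poisson_pmf(k, lam) for k in range(3))   # P(X <= 2)

print(round(prob_none, 4), round(prob_at_most_2, 4))
```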
Hypergeometric distribution:
$$f(x) = P(X = x) = \frac{{}^{r}C_{x} \cdot {}^{N-r}C_{n-x}}{{}^{N}C_{n}}$$
where:
f(x) = probability of x successes in n dependent trials
n = number of trials
N = number of elements in the population
r = number of elements in the population labelled success
x = number of elements in the sample labelled success
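A short sketch of the hypergeometric formula above, again using `math.comb`. The population size N = 20, r = 5 successes and sample size n = 4 are hypothetical numbers used only to exercise the formula.

```python
from math import comb

def hypergeometric_pmf(x: int, n: int, N: int, r: int) -> float:
    """P(X = x): x successes in a sample of n drawn without replacement
    from a population of N elements, r of which are labelled success."""
    return comb(r, x) * comb(N - r, n - x) / comb(N, n)

# Hypothetical inputs (not from the tutorial)
N, r, n = 20, 5, 4

prob_exactly_1 = hypergeometric_pmf(1, n, N, r)
print(round(prob_exactly_1, 4))
```

Sampling without replacement is what makes the trials dependent, which is the property that distinguishes this case from the binomial.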
• Discrete Distributions
• P(X=x) ≠ 0
• P(X<=x) ≠ P(X<x)
• e.g. P(X<=2) ≠ P(X<2)
• Continuous Distributions
• P(X=x) = 0
• P(X<=x) = P(X<x)
• e.g. P(X<=2) = P(X<2)
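The contrast between discrete and continuous distributions can be checked numerically. The sketch below assumes a hypothetical binomial (n = 5, p = 0.5) for the discrete case and the standard normal for the continuous case; neither choice comes from the tutorial.

```python
from math import comb
from statistics import NormalDist

# Discrete case: hypothetical binomial with n = 5 trials, p = 0.5
def binomial_pmf(x: int, n: int = 5, p: float = 0.5) -> float:
    return comb(n, x) * p**x * (1 - p)**(n - x)

p_le_2 = sum(binomial_pmf(k) for k in range(0, 3))  # P(X <= 2)
p_lt_2 = sum(binomial_pmf(k) for k in range(0, 2))  # P(X < 2)
discrete_gap = p_le_2 - p_lt_2                      # this is exactly P(X = 2)

# Continuous case: probability of an ever narrower interval around x = 2
Z = NormalDist()  # standard normal
half_widths = [0.1, 0.001, 0.00001]
interval_probs = [Z.cdf(2 + h) - Z.cdf(2 - h) for h in half_widths]

print(discrete_gap, interval_probs)
```

In the discrete case the gap between the two probabilities is the point mass at 2. In the continuous case the interval probability shrinks towards zero as the interval narrows, which is why including or excluding the endpoint makes no difference.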
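X = rate of return on a proposed investment
$X \sim N(\mu = 0.30,\ \sigma = 0.10)$
$Z \sim N(\mu = 0,\ \sigma = 1)$
$$P(X \le 0.23) = P\left(Z \le \frac{x - \mu}{\sigma}\right) = P\left(Z \le \frac{0.23 - 0.30}{0.10}\right) = P(Z \le -0.70) \approx 0.2420$$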
We need to do this to standardise the distribution so we can find the probabilities using the tables.
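The standardisation step can be checked with Python's `statistics.NormalDist`, which plays the role of the printed tables here. This is only a sketch of the calculation above; the variable names are illustrative.

```python
from statistics import NormalDist

mu, sigma = 0.30, 0.10   # parameters of X from the example
x = 0.23

# Standardise: z-score of x
z = (x - mu) / sigma

# Probability from the standard normal (what the tables give)
p_via_z = NormalDist().cdf(z)

# Same probability computed directly on X, without standardising
p_direct = NormalDist(mu, sigma).cdf(x)

print(round(z, 2), round(p_via_z, 4), round(p_direct, 4))
```

Both routes give the same probability, which is the point of standardising: one table for Z serves every normal distribution.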
Why do we standardise the normal
distribution to find probabilities?
When a problem involves a random variable X~N(μ,σ) that follows a
normal distribution, finding a cumulative probability P[X<x] directly would
require calculating a complex definite integral.
To make this computation easier, the standard practice in statistics is to
convert the normal random variable X~N(μ,σ) into a standard normal random
variable Z~N(μ=0,σ=1) by computing the associated z-score. We can then look
up the probability in the standard normal tables:
$$Z = \frac{X - \mu}{\sigma}$$
Calculating Probabilities using normal distribution applying the
complement rule and/or symmetry rule and/or interval rule
• Complement Rule P(Z>z)=1-P(Z<z)
• Symmetry Rule P(Z<-z)=P(Z>z)
• Interval Rule P(-z<Z<z)=P(Z<z)-P(Z<-z)
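The three rules above can be verified numerically. The sketch below uses `statistics.NormalDist` in place of the tables, with z = 1.96 as an assumed example value.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal, mean 0 and standard deviation 1
z = 1.96          # hypothetical cut-off, chosen for illustration

# Complement rule: P(Z > z) = 1 - P(Z < z)
p_greater = 1 - Z.cdf(z)

# Symmetry rule: P(Z < -z) = P(Z > z)
p_less_neg = Z.cdf(-z)

# Interval rule: P(-z < Z < z) = P(Z < z) - P(Z < -z)
p_interval = Z.cdf(z) - Z.cdf(-z)

print(round(p_greater, 4), round(p_less_neg, 4), round(p_interval, 4))
```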
X = time of visits
𝜆 =1/ μ = mean number of events per unit of time
= two clients to talk to on average per 30mins
= one client to talk to on average per 15mins
=1/15 clients on average per minute
= 0.0666667 clients per minute
$P(X < x) = 1 - e^{-\lambda x}$
$P(X < 10) = 1 - e^{-0.06667 \times 10} \approx 0.4866$
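The exponential calculation above can be reproduced in a few lines. The function name `exponential_cdf` is illustrative; the rate of 1/15 clients per minute is the one derived in the example.

```python
from math import exp

def exponential_cdf(x: float, lam: float) -> float:
    """P(X < x) for an exponential distribution with rate lam (x >= 0)."""
    return 1 - exp(-lam * x)

lam = 1 / 15          # clients per minute, from the example
p_within_10 = exponential_cdf(10, lam)

print(round(p_within_10, 4))
```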
We would use this for Poisson:
𝜆 = 2 clients to talk to on average per 30mins
= 2/3 client to talk to on average per 10mins
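One way to see how the two descriptions relate, assuming arrivals follow a Poisson process: the probability of zero clients in a 10-minute window (Poisson with λ = 2/3) equals the probability that the wait exceeds 10 minutes (exponential with λ = 1/15 per minute). The sketch below checks this identity numerically.

```python
from math import exp, factorial

lam_per_minute = 1 / 15          # exponential rate, clients per minute
window = 10                      # minutes
lam_window = lam_per_minute * window   # Poisson mean for a 10-minute window (2/3)

# Poisson: P(no clients arrive in the window)
p_no_clients = lam_window**0 * exp(-lam_window) / factorial(0)

# Exponential: P(X > 10) = 1 - P(X < 10)
p_wait_longer = 1 - (1 - exp(-lam_per_minute * window))

print(round(p_no_clients, 4), round(p_wait_longer, 4))
```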
In statistics we usually want to analyse a population parameter, but collecting data
for the whole population is usually impractical, expensive or impossible. That is
why we collect samples from the population (sampling) and draw conclusions
about the population parameters from the sample statistics (inference) with
a stated level of confidence (equivalently, a chosen level of significance).
Types of data – Graphical displays
• Categorical (divides the cases into groups/categories): Bar Chart, Pie Chart
  • Nominal (no order), e.g. Nationality, Gender, Month
  • Ordinal (order), e.g. Satisfaction Level or level of education
• Quantitative (measures a numerical quantity for each case): Boxplot, Histogram
  • Discrete (takes whole number values), e.g. number of birds in a tree, shoe size
  • Continuous, e.g. height, temp
Presenting Data Graphically
Univariate
• Categorical Data
  • Bar Charts: to depict frequencies
  • Pie Charts: to depict proportions
• Quantitative Data
  • Histogram: to look at the distribution
  • Boxplot: graphical summary statistics
Displaying categorical data
• Bar Graph: Good for depicting frequencies.
• Pie Graph: Good for depicting proportions.
Displaying quantitative data
• Histogram: Good for getting an overall ‘picture’ of the data.
• Boxplot: Good for finding unusual observations and looking at the symmetry of the data.
Presenting Data Graphically
Bivariate
• Categorical x Categorical
  • Multiple Bar Charts: to compare frequencies
  • Crosstabs: to depict frequencies, row and column proportions
• Categorical x Quantitative
  • Multiple Comparison Boxplots
  [Figure: boxplots of body mass index by SEX (male, female)]
• Quantitative x Quantitative
  • Scatterplots: relationship between two numerical variables
Types of data – Measures of central tendency
• Categorical
  • Nominal (no order), e.g. Nationality, Gender: Mode
  • Ordinal (order), e.g. S,M,L or level of education: Median or Mode
• Quantitative
  • Discrete (takes whole number values), e.g. number of birds in a tree: Mean, Median or Mode
  • Continuous, e.g. height: Mean, Median or Mode
Types of data – Measures of spread
• Categorical
  • Nominal (no order), e.g. Nationality, Gender: None
  • Ordinal (order), e.g. S,M,L or level of education: IQR
• Quantitative
  • Discrete (takes whole number values), e.g. number of birds in a tree: Range, IQR, SD, CV
  • Continuous, e.g. height: Range, IQR, SD, CV
Measures of dispersion
The coefficient of variation (CV) is used when you are
comparing the variability between groups with different
means.
$$CV = \frac{s}{\bar{x}} \times 100$$
It is defined as the ratio of the standard deviation to the
mean, expressed as a percentage.
Dividing the standard deviation by the mean standardises
the measure of variability so it is suitable for comparison.
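A small sketch of why dividing by the mean helps. The two data sets below are made up for illustration: the second is the first multiplied by 10, so its standard deviation is ten times larger even though the relative spread is the same.

```python
from statistics import mean, stdev

def coefficient_of_variation(data: list[float]) -> float:
    """Sample standard deviation divided by the mean, as a percentage."""
    return stdev(data) / mean(data) * 100

# Hypothetical groups with very different means
group_a = [10, 12, 14]
group_b = [100, 120, 140]

sd_a, sd_b = stdev(group_a), stdev(group_b)
cv_a = coefficient_of_variation(group_a)
cv_b = coefficient_of_variation(group_b)

print(sd_a, sd_b, round(cv_a, 2), round(cv_b, 2))
```

Comparing the standard deviations alone would suggest the second group is far more variable; the CV shows the relative variability is identical.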
Summary: Probability
• Marginal Probability
• Union Probability
• Joint Probability
• Conditional Probability
• Independent Probability
@ Dr. Sonika Singh, BSTATS,
UTS
Conditional probability:
$$P(A \mid B) = \frac{P(A \cap B)}{P(B)}$$
Independent events:
$$P(A \cap B) = P(A) \times P(B)$$
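The conditional probability and independence formulas can be illustrated with a small table of counts. The counts below are invented for the example; `Fraction` is used so the comparison is exact.

```python
from fractions import Fraction

# Hypothetical counts for two events A and B (total of 100 cases)
count_a_and_b = 30
count_a_not_b = 20
count_b_not_a = 30
count_neither = 20
total = count_a_and_b + count_a_not_b + count_b_not_a + count_neither

# Marginal and joint probabilities
p_a = Fraction(count_a_and_b + count_a_not_b, total)
p_b = Fraction(count_a_and_b + count_b_not_a, total)
p_a_and_b = Fraction(count_a_and_b, total)

# Conditional probability: P(A | B) = P(A and B) / P(B)
p_a_given_b = p_a_and_b / p_b

# Independence check: P(A and B) == P(A) * P(B)
independent = p_a_and_b == p_a * p_b

print(p_a, p_b, p_a_and_b, p_a_given_b, independent)
```

With these counts P(A | B) equals P(A), which is another way of stating that A and B are independent.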
Critical Value = CHIINV(0.05, df)
Conclusion:
If χ² > critical value, we reject H0. We conclude that at the 5% level of
significance we have enough evidence to show that the two variables are
not independent.
If χ² < critical value, we do not reject H0. We conclude that at the 5% level of
significance we do not have enough evidence to show that the two variables
are not independent.
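A minimal sketch of the decision rule above for a 2x2 table. The observed counts are hypothetical, and the critical value 3.841 is hard-coded as the 5% value for df = 1 (what CHIINV(0.05, 1) returns, rounded), since the Python standard library has no inverse chi-square function.

```python
def chi_square_statistic(observed: list[list[float]]) -> float:
    """Chi-square statistic for a table of observed counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            # Expected count under independence
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (obs - expected) ** 2 / expected
    return chi2

# Hypothetical 2x2 table of observed counts
observed = [[30, 20],
            [20, 30]]

chi2 = chi_square_statistic(observed)
df = (len(observed) - 1) * (len(observed[0]) - 1)   # (rows - 1) * (columns - 1)
critical_value = 3.841   # assumed: 5% critical value for df = 1

reject_h0 = chi2 > critical_value
print(chi2, df, reject_h0)
```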
• Uniform X~Uniform(a,b)
• Normal X~Normal(μ,σ)
• Exponential X~Exp(λ)
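Of the three continuous distributions listed, the uniform has the simplest cumulative probability: for a ≤ x ≤ b, P(X < x) = (x − a)/(b − a). This is the standard result rather than a formula quoted from the slides, and the interval a = 0, b = 10 below is a hypothetical choice.

```python
def uniform_cdf(x: float, a: float, b: float) -> float:
    """P(X < x) for X ~ Uniform(a, b)."""
    if x <= a:
        return 0.0
    if x >= b:
        return 1.0
    return (x - a) / (b - a)

# Hypothetical interval
a, b = 0.0, 10.0

p_below = uniform_cdf(2.5, a, b)                             # P(X < 2.5)
p_between = uniform_cdf(7.5, a, b) - uniform_cdf(2.5, a, b)  # P(2.5 < X < 7.5)

print(p_below, p_between)
```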
SEE YOU ALL NEXT WEEK!