Question 1. (Topics 1-3)
A population consists of all the members of a group about which you want
to draw a conclusion (Greek letters (μ, σ) and N are used)
A sample is the portion of the population selected for analysis (Roman letters
(x, s, n) are used for sample data)
A parameter is a numerical measure that describes a characteristic of a
population
A statistic is a numerical measure that describes a characteristic of a sample
Class intervals: $\text{Width of interval} \approx \dfrac{\text{range}}{\text{number of desired class groupings}}$
Numerical data is measured on a natural numerical scale (age)
Continuous – Data that can take on any real number (time/length)
Discrete - Countable number of responses (cannot have 0.5)
Categorical data can only be named or categorised
Nominal – no order, no response is considered better (gender)
Ordinal – There is an order (very good, good, average)
Descriptive Statistics - Collect, Present, Characterise data
Arithmetic Mean: $\bar{X} = \dfrac{X_1 + X_2 + \cdots + X_n}{n}$
Measures of Dispersion: Variance, Standard Deviation, Coefficient of Variation
Reordered data: 3, 4, 7, 9, so $n = 4$ and $\bar{x} = \dfrac{3 + 4 + 7 + 9}{4} = 5.75$
Sample Variance:
$$s^2 = \frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n-1} = \frac{(3-5.75)^2 + (4-5.75)^2 + (7-5.75)^2 + (9-5.75)^2}{4-1}$$
$$= \frac{(-2.75)^2 + (-1.75)^2 + (1.25)^2 + (3.25)^2}{3} = \frac{7.5625 + 3.0625 + 1.5625 + 10.5625}{3} = \frac{22.75}{3} = 7.583$$
Standard deviation: $s = \sqrt{s^2} = \sqrt{7.583} = 2.754$
Coefficient of variation: $CV = \dfrac{s}{\bar{x}} \times 100\% = \dfrac{2.754}{5.75} \times 100\% = 47.9\%$
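The dispersion measures for the data 3, 4, 7, 9 can be checked with a short Python sketch. It uses only the standard library; the variable names are illustrative, and `statistics.variance` is the sample variance (divisor n − 1).

```python
import math
import statistics

data = [3, 4, 7, 9]

mean = statistics.mean(data)          # (3 + 4 + 7 + 9) / 4
variance = statistics.variance(data)  # sample variance, divides by n - 1
std_dev = math.sqrt(variance)         # sample standard deviation
cv = std_dev / mean * 100             # coefficient of variation, in percent

print(f"mean     = {mean:.2f}")
print(f"variance = {variance:.3f}")
print(f"std dev  = {std_dev:.3f}")
print(f"CV       = {cv:.1f}%")
```

The mean of these four values is 5.75, so every deviation is taken from 5.75 and the sum of squares is divided by n − 1 = 3.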
Median (Position): $\dfrac{n+1}{2}$
Find the following probabilities
1. 𝑃(𝑍 < −1.67) = 0.0475
Read straight from the table.
Note: for P(Z < 1.846), we can only look up z values to two decimal
places, so round 1.846 up to 1.85.
2. 𝑃(𝑍 > −2.78) = ?
1 − 𝑃(𝑍 < −2.78) = ?
1 − 0.0027 = 0.9973
3. 𝑃(0.15 < 𝑍 < 1.99) = ?
𝑃(𝑍 < 1.99) = 0.9767
𝑃(𝑍 < 0.15) = 0.5596
0.9767 − 0.5596 = 0.4171
Solve the following inverse problems for the standard normal
distribution
𝑃(𝑍 > ____ ) = 0.01
Look up the Inverse Normal Table
𝑃(𝑍 > 2.3263) = 0.01
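As a cross-check on the table lookups, the same probabilities can be computed with Python's standard-library `statistics.NormalDist`. This is a sketch, not the table method itself; the variable names are illustrative.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, standard deviation 1

p1 = Z.cdf(-1.67)                # P(Z < -1.67)
p2 = 1 - Z.cdf(-2.78)            # P(Z > -2.78)
p3 = Z.cdf(1.99) - Z.cdf(0.15)   # P(0.15 < Z < 1.99)
z_upper = Z.inv_cdf(1 - 0.01)    # z such that P(Z > z) = 0.01

print(f"P(Z < -1.67)        = {p1:.4f}")
print(f"P(Z > -2.78)        = {p2:.4f}")
print(f"P(0.15 < Z < 1.99)  = {p3:.4f}")
print(f"z with upper tail 0.01 = {z_upper:.4f}")
```

`inv_cdf` takes a lower-tail area, so an upper-tail area of 0.01 is passed as 1 − 0.01.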
Range: $X_{max} - X_{min}$
Z Score: $Z = \dfrac{X - \bar{X}}{S}$
Z Outliers: Z > 3.0 or Z < −3.0
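The Z score rule can be sketched in Python as follows. The function names and the `readings` data set are hypothetical, chosen only to show one value beyond the ±3.0 cut-off.

```python
import statistics

def z_scores(values):
    """Return (x - mean) / s for each value, using the sample standard deviation."""
    mean = statistics.mean(values)
    s = statistics.stdev(values)
    return [(x - mean) / s for x in values]

def z_outliers(values, threshold=3.0):
    """Return the values whose Z score is above +threshold or below -threshold."""
    return [x for x, z in zip(values, z_scores(values)) if abs(z) > threshold]

# Hypothetical data: twenty readings of 10 and one reading of 100.
readings = [10] * 20 + [100]
print(z_outliers(readings))
```

With a very small sample no Z score can exceed 3 in absolute value, which is why the illustration uses 21 observations.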
Inter-Quartile Range: 𝐼𝑄𝑅 = 𝑄3 − 𝑄1
Covariance tells us only the direction of association.
The sample coefficient of correlation r tells us the strength of the association.
Sample of n = 4: (2, 3), (7, 9), (4, 5), (4, 6)
$\bar{x} = \dfrac{2 + 7 + 4 + 4}{4} = \dfrac{17}{4} = 4.25$
$\bar{y} = \dfrac{3 + 9 + 5 + 6}{4} = \dfrac{23}{4} = 5.75$

x    y    (x − x̄)    (y − ȳ)    (x − x̄)(y − ȳ)
2    3    -2.25      -2.75       6.1875
7    9     2.75       3.25       8.9375
4    5    -0.25      -0.75       0.1875
4    6    -0.25       0.25      -0.0625

$\sum (x - \bar{x})(y - \bar{y}) = 15.25$
$covariance = \dfrac{\sum (x - \bar{x})(y - \bar{y})}{n-1} = \dfrac{15.25}{4-1} = 5.08$ (Direction)
$correlation = r = \dfrac{covariance}{s_x \times s_y} = \dfrac{5.08}{2.06 \times 2.5} = 0.99$ (Strength)
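The covariance and correlation for this sample can be reproduced with a minimal Python sketch that works from unrounded deviations. The variable names are illustrative.

```python
import math

pairs = [(2, 3), (7, 9), (4, 5), (4, 6)]
n = len(pairs)

x_bar = sum(x for x, _ in pairs) / n
y_bar = sum(y for _, y in pairs) / n

# Sum of cross-products of deviations, then divide by n - 1.
sum_products = sum((x - x_bar) * (y - y_bar) for x, y in pairs)
covariance = sum_products / (n - 1)

# Sample standard deviations of x and y.
s_x = math.sqrt(sum((x - x_bar) ** 2 for x, _ in pairs) / (n - 1))
s_y = math.sqrt(sum((y - y_bar) ** 2 for _, y in pairs) / (n - 1))

r = covariance / (s_x * s_y)

print(f"covariance = {covariance:.2f}")
print(f"s_x = {s_x:.2f}, s_y = {s_y:.2f}")
print(f"r = {r:.2f}")
```

Summing the unrounded products gives 15.25; summing products already rounded to two decimals gives a slightly larger total.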
Question 3 Continuous Probability Distribution
Quartile (Position): 𝑄1 = 0.25(𝑛 + 1), 𝑄2 = 0.50(𝑛 + 1), 𝑄3 = 0.75(𝑛 + 1)
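The quartile positions can be illustrated in Python. The notes give only the positions; interpolating linearly between neighbouring ranked values when a position is not a whole number is an assumption here, and it is the rule that `statistics.quantiles` applies with `method="exclusive"`.

```python
import statistics

data = sorted([3, 4, 7, 9])
n = len(data)

# Positions in the ordered array, counted from 1.
positions = {name: p * (n + 1) for name, p in (("Q1", 0.25), ("Q2", 0.50), ("Q3", 0.75))}

# method="exclusive" places the quartiles at p * (n + 1) and interpolates.
q1, q2, q3 = statistics.quantiles(data, n=4, method="exclusive")
iqr = q3 - q1

print(positions)
print(q1, q2, q3, iqr)
```

Some textbooks round a fractional position to the nearest rank instead of interpolating, so hand answers may differ slightly from this sketch.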
Measures of Central Tendency: Arithmetic Mean, Median, Mode
Numerical Descriptive Measures
Inferential Statistics - Drawing conclusions about a population
based on sample data
Frequency Distributions - summary table in which data are
arranged into numerically ordered classes or intervals
Ordered array: sequence of data in rank order
Time Series – Data collected through time (monthly sales for May)
Cross Sectional – Data collected at a point in time (my height today)
$r = \dfrac{covariance}{s_x \times s_y}$, where $s_x$ and $s_y$ are found with the standard deviation formula
Interpreting Correlation Coefficient
r                    Interpretation
r = -1               PERFECT negative linear
-1 < r ≤ -0.7        STRONG negative linear
-0.7 < r ≤ -0.3      MODERATE negative linear
-0.3 < r < 0         WEAK negative linear
r = 0                No relationship
0 < r < 0.3          WEAK positive linear
0.3 ≤ r < 0.7        MODERATE positive linear
0.7 ≤ r < 1          STRONG positive linear
r = 1                PERFECT positive linear
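The interpretation bands can be written as a small Python lookup. The function name `interpret_r` is hypothetical; the cut-offs are the ones listed above, which are symmetric in the sign of r.

```python
def interpret_r(r):
    """Map a correlation coefficient to the wording used in the bands above."""
    if not -1 <= r <= 1:
        raise ValueError("r must lie between -1 and 1")
    if r == 0:
        return "No relationship"
    direction = "positive" if r > 0 else "negative"
    size = abs(r)
    if size == 1:
        strength = "PERFECT"
    elif size >= 0.7:
        strength = "STRONG"
    elif size >= 0.3:
        strength = "MODERATE"
    else:
        strength = "WEAK"
    return f"{strength} {direction} linear"

print(interpret_r(0.99))
print(interpret_r(-0.3))
```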
Population mean – μ
Sample mean – X̄
Population variance – σ²
Sample proportion – p
Sample standard deviation – S
Sample variance – S²
Continuous Probability Distribution cont.
Question 4. Sampling Distribution
Between what two values of Z (symmetrically distributed around
the mean) will 68.26% of all possible Z values be contained?
Each tail has an area of α = 0.1587, i.e. (1 − 0.6826)/2. If we use
the Cumulative Normal Distribution table and look for the area
0.1587, we find that P(Z < −1) = 0.1587. By symmetry, the right tail
beyond Z = +1 has the same area.
So the two values of Z that we are looking for are −1 and +1,
i.e. P(−1 < Z < 1) = 0.6826, as in the diagram.
Using the Inverse Normal table, we can only look
up an area to two decimal places: 0.16
(i.e. 0.1587 rounded to two decimal
places), and we would conclude that the
two values of Z are Z = 0.9945 and
Z = −0.9945, i.e. P(−0.9945 < Z < 0.9945) = 0.68
Sampling Distribution cont.
The Inverse table only gives the Z values for upper-tail areas,
but because the normal distribution is symmetric about zero, we
find the upper-tail Z value, and the lower-tail Z value that we
need is the same value but negative.
Find the two values of Z (symmetrically distributed around the
mean) such that the following statements are true:
𝑃(____ < 𝑍 < ____ ) = 0.80
Each tail will have an area of 0.10, so looking up the Inverse
table to get the two Z values:
𝑍𝐿𝑂𝑊𝐸𝑅 = −1.2816
𝑍𝑈𝑃𝑃𝐸𝑅 = +1.2816
P(−1.2816 < 𝑍 < 1.2816) = 0.80
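The symmetric lookup can be sketched in Python with the standard library in place of the Inverse Normal table. The function name `symmetric_z_bounds` is hypothetical.

```python
from statistics import NormalDist

def symmetric_z_bounds(central_area):
    """Return (z_lower, z_upper) so that P(z_lower < Z < z_upper) = central_area."""
    tail = (1 - central_area) / 2              # area in each tail
    z_upper = NormalDist().inv_cdf(1 - tail)   # upper-tail Z value
    return -z_upper, z_upper                   # lower value is its negative

low_80, high_80 = symmetric_z_bounds(0.80)
low_68, high_68 = symmetric_z_bounds(0.6826)

print(f"80%:    {low_80:.4f} to {high_80:.4f}")
print(f"68.26%: {low_68:.4f} to {high_68:.4f}")
```

Because the distribution is symmetric about zero, only the upper value needs to be looked up; the lower value is the same number with a minus sign.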
Estimation cont. / Confidence Intervals.
Sampling Distribution cont.
Estimation
Is it for μ?
  No: $\chi^2 = \dfrac{(n-1)S^2}{\sigma^2}$
  Yes: Is σ known?
    No: $t = \dfrac{\bar{X} - \mu}{S/\sqrt{n}}$
    Yes: Quantitative: $Z = \dfrac{\bar{X} - \mu}{\sigma/\sqrt{n}}$
         Qualitative: $Z = \dfrac{p - \pi}{\sqrt{\dfrac{\pi(1-\pi)}{n}}}$
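The four statistics in this decision flow (Z for a mean with σ known, t for a mean with σ unknown, Z for a proportion, and χ² for a variance) can be written as small Python helpers. The function names are hypothetical and the numbers in the usage lines are purely illustrative.

```python
import math

def z_stat_mean(x_bar, mu, sigma, n):
    """Z for a mean when the population standard deviation sigma is known."""
    return (x_bar - mu) / (sigma / math.sqrt(n))

def t_stat_mean(x_bar, mu, s, n):
    """t for a mean when sigma is unknown and the sample value s is used."""
    return (x_bar - mu) / (s / math.sqrt(n))

def z_stat_proportion(p, pi, n):
    """Z for a sample proportion p against a population proportion pi."""
    return (p - pi) / math.sqrt(pi * (1 - pi) / n)

def chi2_stat_variance(s, sigma, n):
    """Chi-square statistic for a variance: (n - 1) * s^2 / sigma^2."""
    return (n - 1) * s ** 2 / sigma ** 2

# Illustrative values only.
print(z_stat_mean(350, 400, 100, 64))
print(chi2_stat_variance(2, 2, 10))
```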
Question 2 Simple Linear Regression & Probability
Probability & Discrete Probability Distributions
Binomial Distribution (the question will provide n, x and the proportion as a %)
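Given n, x and the proportion, a binomial probability follows from the standard formula P(X = x) = C(n, x) p^x (1 − p)^(n − x). The notes do not state this formula, so the sketch below is an assumption about what is intended; the function name and example numbers are hypothetical.

```python
from math import comb

def binomial_pmf(n, x, p):
    """P(X = x) for n independent trials, each with success probability p."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

# Hypothetical question: n = 10 trials, x = 3 successes, proportion 20%.
prob = binomial_pmf(10, 3, 0.20)
print(f"P(X = 3) = {prob:.4f}")
```

The percentage given in the question has to be converted to a proportion (20% becomes 0.20) before it is used as p.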
Hypothesis Testing cont.
Question 5 Hypothesis testing
Two population Proportion Example – Two Sample
(Rejection region use inverse normal table)
Pooled-Variance t Test Example – Two Sample
(Sigma unknown, variances equal, assume n ≥ 30 (Central Limit Theorem))
F Test Example – Two Sample (F table for reject regions)
$df = n_1 + n_2 - 2 = 1000 + 1000 - 2 = 1998$
$t_{0.05,\,1998} = 1.6449$
$F_U = F_{0.025,\,99,\,71} \approx F_{0.025,\,60,\,60} = 1.67$
$F_U^* = F_{0.025,\,71,\,90} \approx F_{0.025,\,60,\,60} = 1.67$
$F_L = \dfrac{1}{F_U^*} = \dfrac{1}{1.67} = 0.599$
Analysis of Variance (ANOVA)
BSB123 Data Analysis
Semester 2 2015
Workshop 8 (Week 10) – Estimation
Question 1
The quality control manager at a light bulb factory needs to estimate the mean life of a large
shipment of light bulbs. The standard deviation is 100 hours. A random sample of 64 light bulbs
indicates a sample mean life of 350 hours.
(a) Construct a 95% confidence interval estimate of the population mean life of light bulbs in this
shipment.
(b) Do you think that the manufacturer has the right to state that the light bulbs last an average
of 400 hours? Explain.
The first approach is simply to say that 400 is outside the confidence interval. The second approach is to
convert the value 400 to a Z value, so you can determine the probability that the statement is
correct.
(c) Must you assume that the population of light bulb life is normally distributed? Explain.
No, because the sample size (n = 64) is greater than 30. By the Central Limit Theorem (CLT), with 30
or more observations the sampling distribution of the sample mean is at least approximately normal.
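For Question 1, a minimal Python sketch of the interval in (a) and the Z value in (b) is given below. It assumes the stated 100 hours is the population standard deviation σ, so a Z interval applies; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

sigma, n, x_bar = 100, 64, 350
confidence = 0.95

z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96
std_error = sigma / sqrt(n)                              # 100 / 8 = 12.5
margin = z_crit * std_error

lower, upper = x_bar - margin, x_bar + margin
print(f"95% CI: {lower:.1f} to {upper:.1f} hours")

# (b) Convert the claimed mean of 400 hours to a Z value.
z_claim = (400 - x_bar) / std_error
tail_area = 1 - NormalDist().cdf(z_claim)  # upper-tail area beyond that Z value
print(f"Z = {z_claim:.1f}, tail area = {tail_area:.5f}")
```

The claimed value sits four standard errors above the sample mean, well outside the interval.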
Question 2
If X̅ = 75, S = 24, n = 36, and assuming that the population is normally distributed, construct a 95%
confidence interval estimate of the population mean μ.
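Because σ is unknown here and S is given, a t interval with n − 1 = 35 degrees of freedom applies. The sketch below hard-codes the critical value 2.0301 as an assumed t-table entry for an upper-tail area of 0.025 with 35 degrees of freedom, since the Python standard library has no t distribution; the variable names are illustrative.

```python
from math import sqrt

x_bar, s, n = 75, 24, 36

t_crit = 2.0301            # assumed table value: t(0.025, df = 35)
std_error = s / sqrt(n)    # 24 / 6 = 4.0
margin = t_crit * std_error

lower, upper = x_bar - margin, x_bar + margin
print(f"95% CI for the mean: {lower:.2f} to {upper:.2f}")
```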
Question 3
A study conducted by the Australian Stock Exchange found that 46% of 2,405 Australian adults
surveyed in 2006 held shares, either directly or indirectly through managed funds or self-managed
superannuation funds (2006 Australian Share Ownership Study, ASX).
(a) Construct a 95% confidence interval for the proportion of Australian adults who held shares
in 2006.
When dealing with population proportions we always use Z.
(b) Interpret the interval constructed in (a).
I am 95% confident that the true proportion of Australian adults who held shares in 2006 is
between 44% and 48%.
(c) To construct a follow-up study to estimate the population proportion of adults who currently
hold shares to within 0.01 with 95% confidence, how many adults would you interview?
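Parts (a) and (c) can be sketched in Python as below. The sample-size formula n = Z² p(1 − p) / e² is the usual one for a proportion. Using the 2006 estimate of 0.46 as the planning value is an assumption, so the conservative choice of 0.5 is shown alongside it; the function name is hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist

p, n = 0.46, 2405
z_crit = NormalDist().inv_cdf(0.975)  # about 1.96 for 95% confidence

# (a) Z interval for a proportion.
std_error = sqrt(p * (1 - p) / n)
lower, upper = p - z_crit * std_error, p + z_crit * std_error
print(f"95% CI: {lower:.4f} to {upper:.4f}")

# (c) Sample size needed for a given sampling error, rounded up.
def required_sample_size(p_guess, error, z):
    return ceil(z ** 2 * p_guess * (1 - p_guess) / error ** 2)

n_prior = required_sample_size(0.46, 0.01, z_crit)        # planning value from 2006
n_conservative = required_sample_size(0.5, 0.01, z_crit)  # worst case p = 0.5
print(n_prior, n_conservative)
```

The result is always rounded up, because rounding down would leave the sampling error slightly above 0.01.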