Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Question 1. (Topics 1-3) A population consists of all the members of a group about which you want to draw a conclusion (Greek letters (μ, σ, Ν) are used) A sample is the portion of the population selected for analysis (Roman letter (x, s, n) are used for sample data) A parameter is a numerical measure that describes a characteristic of a population A statistic is a numerical measure that describes a characteristic of a sample Class intervals: Width of interval ≅ range no.of desired class groupings Numerical data is measured on a natural numerical scale (age) Continuous – Data that can take on any real number (time/length) Discrete - Countable number of responses (cannot have 0.5) Categorical data can only be named or categorised Nominal – no order, no response is considered better (gender) Ordinal – There is an order (very good, good, average) Descriptive Statistics - Collect, Present, Characterise data Arithmetic Mean: 𝑋̅ = Measures of Dispersion: Variance, Standard Deviation, Coefficient of Variation Reordered data: 3, 4, 7, 9 Variance: where SXfirstly & SYfind = S.Dev 𝑥 = formula 5.75 𝑛 (𝑥 − 𝑥 )2 𝑖=1 2 𝑠SXSy = = Sample Variance 𝑛−1 [(3 − 7)2 + (4 − 7)2 + (7 − 7)2 + (9 − 7)2 ] 5.75 − 1 [(−4)2 + (−3)2 + (0)2 + (2)2] = 4.75 16 + 9 + 0 + 4 29 = = = 6.10 4.75 4.75 Standard deviation: 𝑠 = 𝑠 2 = 6.1 = 2.46 Coefficient of variation: 𝑠 2.46 𝐶𝑉 = × 100% = × 100% = 61.7% 𝑥 4 𝑛 Median (Position): Find the following probabilities 1. 𝑃(𝑍 < −1.67) = 0.0475 Read straight from the table. Note: P(Z<1.846) we can only look up z values to two decimal places so round 1.846 up to 1.85 2. 𝑃(𝑍 > −2.78) = ? 1 − 𝑃(𝑍 < −2.78) = ? 1 − 0.27 = 0.9973 3.𝑃(0.15 < 𝑍 < 1.99) = ? 𝑃(𝑍 < 1.99) = 0.9767 𝑃(𝑍 < 0.15) = 0.5596 0.9767 − 0.5596 = 0.4171 Solve the following inverse problems for the standard normal distribution 𝑃(𝑍 > ____ ) = 0.01 Look up the Inverse Normal Table 𝑃(𝑍 > 2.3263) = 0.01 2 Range 𝑿𝒎𝒂𝒙 − 𝑿𝒎𝒊𝒏 Z Score: 𝒁 = ̅ 𝑿−𝑿 𝑺 Z Outliers = > 3.0 or <-3.0 Inter-Quartile Range: 𝐼𝑄𝑅 = 𝑄3 − 𝑄1 Covariance tells us only the direction of association Sample coefficient of correlation r: r = Sample of n = 4: (2, 3), (7, 9), (4, 5), (4, 6) 2 + 7 + 4 + 4 17 𝑥= = = 4.25 4 4 3 + 9 + 5 + 6 23 𝑦̅ = = = 5.75 4 4 (𝑥 − 𝑥 )(𝑦 − 𝑦̅) 𝑥 𝑦 (𝑥 − 𝑥 ) (𝑦 − 𝑦̅) 2 3 -2.25 -2.75 6.19 7 9 2.75 3.25 8.94 4 5 -0.25 -0.75 0.19 4 6 -0.25 0.25 -0.06 (𝑥 − 𝑥 )(𝑦 − 𝑦̅) = 15.26 (𝑥 − 𝑥 )(𝑦 − 𝑦̅) 15.26 𝑐𝑜𝑣𝑎𝑟𝑖𝑎𝑛𝑐𝑒 = = = 5.09 𝑛−1 4−1 (Direction) 𝑐𝑜𝑣𝑎𝑟 5.09 𝑐𝑜𝑟𝑟𝑒𝑙𝑎𝑡𝑖𝑜𝑛 = 𝑟 = = = 0.99 𝑠𝑥 × 𝑠𝑦 2.06 × 2.5 (Strength) Question 3 Continuous Probability Distribution 𝑛+1 Quartile (Position): 𝑄1 = 0.25(𝑛 + 1), 𝑄2 = 0.50(𝑛 + 1), 𝑄3 = 0.75(𝑛 + 1) Measures of Central Tendency: Arithmetic Mean, Median, Mode Numerical Descriptive Measures cov(X,Y) 𝑋1 +𝑋2 +⋯𝑋𝑛 Inferential Statistics - Drawing conclusions about a population based on sample data Frequency Distributions - summary table in which data are arranged into numerically ordered classes or intervals Ordered array: sequence of data in rank order Time Series – Data collect through time (Months sales for May) Cross Sectional – Collected for a point in time (My height today) 𝑐𝑜𝑣𝑎𝑟 𝑠𝑥 ×𝑠𝑦 where 𝑠𝑥 & 𝑠𝑦 = S.Dev formula Interpreting Correlation Coefficient r Interpretation r = -1 PERFECT negative linear -1 < r ≤ -0.7 STRONG negative linear -0.7 < r ≤ -0.3 MODERATE negative linear -0.3 < r < 0 WEAK negative linear r=0 No relationship 0 < r < 0.3 WEAK positive linear 0.3 ≤ r < 0.7 MODERATE positive linear 0.7 ≤ r < 1 STRONG positive linear 1 PERFECT positive linear Population mean – μ Sample mean - 𝑋̅ Population variance - 2 Sample Proportion – p Standard Deviation – S Variance – 𝑆 2 Continuous Probability Distribution cont. Question 4. Sampling Distribution Between what two values of Z (symmetrically distributed around the mean) will 68.26% of all possible Z values be contained? Each tail has an area, α = 0.1587 (i.e. (1 - 0.6826)/2, so if we use the Cumulative Normal Distribution table and look for the area of 0.1587, we find that P(Z < -1) = 0.1587. Therefore the right tail where Z = +1 has the same area. So the two values of Z that we are looking for are -1 and +1. i.e. P( -1 < Z < 1) = 0.6826 as in the diagram. Using Inverse Normal table, only look up an area to two decimal places: 0.16 (i.e. 0.1587 rounded to two decimal places) and we would conclude that the two values of Z were Z = 0.9945 and Z = -0.9945 i.e. P( -0.9945 < Z < 0.9945) = 0.68 Sampling Distribution cont. I The Inverse table only gives the Z values for upper-tail areas, but because the normal distribution is symmetric about zero, we find the upper-tail Z value, and the lower-tail Z value that we need is the same value but negative. Find the two values of Z (symmetrically distributed around the mean) such that the following statements are true: 𝑃(____ < 𝑍 < ____ ) = 0.80 Each tail will have an area of 0.10, so looking up the Inverse table to get the two Z values: 𝑍𝐿𝑂𝑊𝐸𝑅 = −1.2816 𝑍𝑈𝑃𝑃𝐸𝑅 = −1.2816 P(−1.2816 < 𝑍 < 1.2816) = 0.80 Estimation cont. / Confidence Intervals. Sampling Distribution cont. Estimation Is it for μ? No 𝑋2 = (𝑛−1)2 ̅ −𝜇 𝑋 Yes Is known? No 𝑡 = 𝑆 2 ⁄ 𝑋̅−𝜇 Student Name: Student No: Yes Quantitative – 𝑍 = ⁄ 𝑛 Qualitative 𝑍 = 𝑛 𝑝−𝜋 √𝜋(1−𝜋) 𝑛 Question 2 Simple Linear Regression & Probability Probability & Discrete Probability Distributions Probability & Discrete Probability Distributions Binomial Distribution (Question will provide n, x and % (portion) L Hypothesis Testing cont. Question 5 Hypothesis testing Two population Proportion Example – Two Sample (Rejection region use inverse normal table) Pooled-Variance t Test Example – Two Sample (Sigma Unknown, Variance Equal, Assume n =30min (Central Limit T) F Test Example – Two Sample (F table for reject regions) 1.6449 (t0.05, 1998) 𝑑𝑓 = 𝑛1 + 𝑛2 − 2 = 1000 + 1000 − 2 = 1998 FL = 1 𝐹𝑢∗ 1 = 1.67 = 0.599 Fu = F 0.025 , 99 , 71 = F 0.025 , 60 , 60 = 1.67 Fu* = F 0.025 , 71, 90 = F 0.025 , 60 , 60 = 1.67 Analysis of Variance (ANOVA) BSB123 Data Analysis Semester 2 2015 Workshop 8 (Week 10) – Estimation Question 1 The quality control manager at a light bulb factory needs to estimate that mean life of a large shipment of light bulbs. The standard deviation is 100 hours. A random sample of 64 light bulbs indicates a sample mean life of 350 hours. (a) Construct a 95% confidence interval estimate of the population mean life of light bulbs in this shipment. (b) Do you think that the manufacturer has the right to state that the light bulbs last an average of 400 hours? Explain. The first approach is purely to say it’s outside the confidence interval. The second approach is to take that value of 400 convert it to a Z value, so you can determine the probability that the statement is correct. (c) Must you assume that the population of light bulb life is normally distributed? Explain. No because my sample size is >30. Therefore according to the CLT (central limit theorem) at the very least I will end up with approximate normal distribution In other words if we have 30 observations or more, under the CLT we have a ≈ Normal Question 2 If X̅ = 75, S = 24, n = 36, and assuming that the population is normally distributed, construct a 95% confidence interval estimate of the population mean μ. Question 3 A study conducted by the Australian Stock Exchange found that 46% of 2,405 Australian adults surveyed in 2006 held shares, either directly or indirectly through managed funds or self-managed superannuation funds (2006 Australian Share Ownership Study, ASX). (a) Construct a 95% confidence interval for the proportion of Australian adults who held shares in 2006. When dealing with populations proportions we always use a Z. (b) Interpret the interval constructed in (a). As above. I am 95% confident that the true proportion of Australian adults who held shares in 2006 is between 44 and 48% (c) To construct a follow-up study to estimate the population proportion of adults who currently hold shares to within 0.01 with 95% confidence, how many adults would you interview?