Random Processes and Time Series Analysis
Tutorial 1: Probability and Statistics. Normal Distribution

Learning outcomes. By the end of Tutorial 1 you should
• know how to plot a probability density function (PDF), or a distribution, of a discrete random variable
• know the properties of a probability density function (PDF), or a distribution, of a continuous random variable; know how to sketch the PDF and interpret it
• be able to standardize a normal random variable
• be able to find the probability that a normal random variable falls inside a given interval

For Questions 1-4 you are invited to use the "Useful properties of probability density functions" of discrete and continuous random variables in the lecture notes.

1. Consider tossing a fair coin. There are 2 possible outcomes of this experiment, Heads and Tails, and 2 possible values of the discrete random variable X describing them: 0 for Heads and 1 for Tails.
a. Plot the probability density distribution for X. [Hint: a fair coin means that the probabilities of both outcomes are the same. What are they?]
b. Find the mean value and the variance of X. Is the mean value a possible value of X?

2. Consider tossing an unfair coin, in which the probability of getting Heads is 0.7.
a. What is the probability of getting Tails?
b. Plot the probability density distribution for the discrete random variable X.
c. Find the mean value and the variance of X. Is the mean value a possible value of X?

3. Consider a continuous random variable Y and a function $f(y) = cy^2$ for $0 \le y \le 2$, and $f(y) = 0$ elsewhere.
a. Find the value of c for which $f(y)$ is a valid probability density.
b. Find the probability that Y takes a value between 1 and 2. Illustrate your solution with a sketch.
c. Find the mean value and the variance of Y.

4. Consider a continuous random variable X which takes values inside the interval [0, 4] with probability density distribution $f(x) = Ce^{-x}$.
a. Find the value of the constant C.
b. Find the mean value of X to 2 decimal places.
c. What is the probability that X takes a value between 0 and 1? Illustrate your solution with a sketch.

In Questions 5-8, use the Table for Standard Normal Probability Integral in Appendix A of the lecture notes. You might find it helpful to consult the 5 cases in the lecture notes, pages 8-9.

5. For a standard normal random variable Z find the following:
a. $P(0 \le Z \le 1.2)$
b. $P(-0.9 \le Z \le 0)$
c. $P(0.3 \le Z \le 1.56)$
d. $P(-1.56 \le Z \le -0.2)$
e. $P(Z > 0)$
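
A quick way to cross-check the table look-ups in Question 5 is Python's standard-library `statistics.NormalDist` (Python 3.8 or later), whose `cdf` method returns $\Phi(z) = P(Z \le z)$. The sketch below assumes Python is available; the helper names `prob_between` and `prob_greater` are hypothetical, and since the table is rounded to 4 decimal places, expect agreement to about that precision.

```python
from statistics import NormalDist  # Python 3.8+

# Standard normal Z ~ N(0, 1); Z.cdf(z) is Phi(z) = P(Z <= z),
# the quantity tabulated in the Standard Normal Probability Integral table.
Z = NormalDist(mu=0.0, sigma=1.0)


def prob_between(a: float, b: float) -> float:
    """Hypothetical helper: P(a <= Z <= b) = Phi(b) - Phi(a), for a <= b."""
    return Z.cdf(b) - Z.cdf(a)


def prob_greater(a: float) -> float:
    """Hypothetical helper: P(Z > a) = 1 - Phi(a)."""
    return 1.0 - Z.cdf(a)


if __name__ == "__main__":
    # Question 5, parts a-d
    for a, b in [(0, 1.2), (-0.9, 0), (0.3, 1.56), (-1.56, -0.2)]:
        print(f"P({a} <= Z <= {b}) = {prob_between(a, b):.4f}")
    # Question 5, part e
    print(f"P(Z > 0) = {prob_greater(0):.4f}")
```

Each probability is a difference of two $\Phi$ values, which is exactly what the table-based method does by hand.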

6. For a standard normal random variable Z find the value of $z_0$ such that
a. $P(Z > z_0) = 0.5$
b. $P(Z < z_0) = 0.8643$
c. $P(-z_0 < Z < z_0) = 0.90$
d. $P(-z_0 < Z < z_0) = 0.99$
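
Question 6 runs the table in reverse: given a probability, find $z_0$. A minimal sketch, again assuming Python 3.8+ and `statistics.NormalDist`, uses `inv_cdf`, the inverse of $\Phi$; the three helper names are hypothetical. For the symmetric cases c and d, each tail holds $(1-p)/2$, so $\Phi(z_0) = (1+p)/2$.

```python
from statistics import NormalDist  # Python 3.8+

Z = NormalDist()  # standard normal: mu = 0, sigma = 1


def z_for_upper_tail(p: float) -> float:
    """Hypothetical helper: z0 such that P(Z > z0) = p, i.e. Phi(z0) = 1 - p."""
    return Z.inv_cdf(1.0 - p)


def z_for_lower_tail(p: float) -> float:
    """Hypothetical helper: z0 such that P(Z < z0) = p, i.e. Phi(z0) = p."""
    return Z.inv_cdf(p)


def z_for_central(p: float) -> float:
    """Hypothetical helper: z0 > 0 such that P(-z0 < Z < z0) = p.

    By symmetry each tail holds (1 - p) / 2, so Phi(z0) = (1 + p) / 2.
    """
    return Z.inv_cdf((1.0 + p) / 2.0)


if __name__ == "__main__":
    print(f"6a: z0 = {z_for_upper_tail(0.5):.3f}")
    print(f"6b: z0 = {z_for_lower_tail(0.8643):.3f}")
    print(f"6c: z0 = {z_for_central(0.90):.3f}")
    print(f"6d: z0 = {z_for_central(0.99):.3f}")
```

A printed table only lists $\Phi$ to 4 decimal places, so a hand look-up may differ from `inv_cdf` in the third decimal place.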

7. The marks in a certain module are normally distributed with mean 75 and standard deviation 10. What fraction of marks lie between 80 and 90?
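
Question 7 reduces to the standard normal by standardizing, $z = (x - \mu)/\sigma$. The sketch below, assuming Python 3.8+ (`standardize` is a hypothetical helper), performs the two steps and then cross-checks the answer against `NormalDist(75, 10)` directly, which should agree up to floating-point rounding.

```python
from statistics import NormalDist  # Python 3.8+


def standardize(x: float, mu: float, sigma: float) -> float:
    """Hypothetical helper: z = (x - mu) / sigma maps X ~ N(mu, sigma^2) onto Z ~ N(0, 1)."""
    return (x - mu) / sigma


MU, SIGMA = 75.0, 10.0            # Question 7: mean and standard deviation of the marks
z_low = standardize(80, MU, SIGMA)
z_high = standardize(90, MU, SIGMA)

phi = NormalDist().cdf            # standard normal CDF, the tabulated Phi(z)
fraction = phi(z_high) - phi(z_low)

# Cross-check: the same probability computed directly from N(75, 10^2).
marks = NormalDist(MU, SIGMA)
direct = marks.cdf(90) - marks.cdf(80)

if __name__ == "__main__":
    print(f"z-range: [{z_low}, {z_high}]")
    print(f"fraction of marks in [80, 90]: {fraction:.4f} (direct: {direct:.4f})")
```

The same two steps carry over to the next question, with mean £400 and standard deviation £20; for its budgeting part, `NormalDist(400, 20).inv_cdf(0.9)` gives the amount that is exceeded with probability 0.1.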

8. The weekly amount of money spent on maintenance and repairs by a company is approximately normally distributed with mean £400 and standard deviation £20. What is the probability that the actual amount spent in a given week will exceed £450? How much should be budgeted for weekly repairs and maintenance so that the probability of the budgeted amount being exceeded is only 0.1?

Random Processes and Time Series Analysis
Tutorial 2: Probability and Statistics. Central Limit Theorem. Sampling distribution

Learning outcomes. By the end of Tutorial 2 you should
• be able to distinguish between random and non-random parameters in the problem of analysing a random sample
• understand how the variance of the distribution of sample means depends on the sample size
• be able to estimate the population mean and the standard error of the mean from the parameters of a sample
• be able to estimate probabilities from the distribution of sample means

To solve Questions 3 and 4 you will need the Table for Normal distribution in Appendix A of the lecture notes.

1. Which of the following is a random variable?
a) population mean, $\mu$
b) population size, $N$
c) sample size, $n$
d) sample mean, $\bar{X}$
e) sample variance, $s^2$
f) largest value in the sample
g) population variance, $\sigma^2$

2. Explain what is wrong in each of the following statements.
a) If the variance of the population is $\sigma^2 = 10$, then the variance of the means of samples of $n = 30$ observations from this population will be 10 30.
b) When taking random samples from a population, larger sample sizes will result in larger standard deviations of the sample mean.
c) The mean of the sampling distribution of $\bar{X}$ changes when the sample size changes.

3. Consider the lengths of calls to a service centre. Their standard deviation is $\sigma = 184.81$ sec. A sample of 80 calls has a mean length of $\bar{X} = 196.6$ sec. Estimate the true average call length. Find the standard error of the mean call time. Sketch the distribution of sample means with reasonable accuracy. If you wish to reduce this error by a factor of 4, how should you change the sample size?

4. Suppose the time X between text messages arriving at your mobile phone has mean $\mu = 25$ min and standard deviation $\sigma = 25$ min, and is distributed exponentially, i.e. $f(x) = Ce^{-\lambda x}$, $x \in [0, \infty)$, where C and $\lambda$ are some positive constants. You record the times between your next 50 messages. What is the probability that their average exceeds 21 minutes? Hint: Use the Central Limit Theorem and give a sketch of the PDF of sample means.

Random Processes and Time Series Analysis
Tutorial 3: Sampling distribution and confidence interval

Learning outcomes.
By the end of Tutorial 3 you should:
• be able to distinguish between "large" and "small" samples
• be able to obtain an unbiased estimate of a population mean
• be able to obtain biased and unbiased estimates of population variance and standard deviation
• be able to obtain a confidence interval for the estimate of the population mean

You will need the Table for Normal distribution in Appendix A of the lecture notes.

1. The shopping times of n = 64 randomly selected customers ("large sample") at a local supermarket were recorded. The mean and variance of the 64 shopping times were $\bar{y} = 33$ minutes and $s^2 = 256$ minutes squared, respectively. Assume that the sample mean has a normal distribution and that the biased estimate $s^2$ is a good approximation of the population variance, i.e. $\sigma^2 \approx s^2$.
a) What is the probability that the sample mean is greater than 35 minutes?
b) Estimate $\mu$, the true average shopping time per customer, and give a 90% confidence interval for $\mu$.

2. A random sample of 100 incoming freshmen ("large" sample) was taken to look at their college anxiety level. The mean score of the sample was 87.3 out of 100, and the sample variance was 9. Give the 95% confidence interval for the average anxiety level $\mu$ among all freshmen.

3. a) In a certain county, a random sample of 50 eleven-year-old children ("large" sample) is found to have an average weight of $\bar{x} = 36$ kilograms (kg) and a sample variance $s^2 = 81$ kg$^2$.
i. What is the probability that the average weight of the sampled children will be less than 34 kg? Illustrate your calculations with a suitable graph.
ii. Give a 95% confidence interval for the average child weight $\mu$.
b) For a random sample of 15 eleven-year-old children ("small" sample), the sample mean was $\bar{x} = 37$ kg and the sample variance $s^2 = 83$ kg$^2$. Find an unbiased estimate of the population variance.

4. (This is an additional problem for students who would appreciate some extra challenge.) Prove that for a population of size N, the population variance is
$$\sigma^2 = \frac{1}{N}\left(x_1^2 + x_2^2 + \dots + x_N^2\right) - \mu^2,$$
where $\mu$ is the population mean. Hint: The definition of the population variance is
$$\sigma^2 = \frac{1}{N}\left[(x_1 - \mu)^2 + (x_2 - \mu)^2 + \dots + (x_N - \mu)^2\right].$$
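
The large-sample calculations in Questions 1-3 of this tutorial follow one pattern: the standard error $s/\sqrt{n}$, then either a normal probability or an interval $\bar{x} \pm z\, s/\sqrt{n}$. A minimal sketch, assuming Python 3.8+ and treating $s$ as a good approximation of $\sigma$ as Question 1 instructs; the function names are hypothetical. `unbiased_variance` assumes, as Question 1 indicates, that $s^2$ is the biased (divide-by-$n$) sample variance, so the unbiased estimate is $n s^2/(n-1)$.

```python
import math
from statistics import NormalDist  # Python 3.8+
from typing import Tuple

Z = NormalDist()  # standard normal


def standard_error(s: float, n: int) -> float:
    """Hypothetical helper: large-sample standard error sigma / sqrt(n), with sigma ~ s."""
    return s / math.sqrt(n)


def z_interval(mean: float, s: float, n: int, level: float) -> Tuple[float, float]:
    """Hypothetical helper: large-sample interval mean +/- z * s / sqrt(n).

    z satisfies P(-z < Z < z) = level, e.g. about 1.645 for 90% and 1.960 for 95%.
    """
    z = Z.inv_cdf((1.0 + level) / 2.0)
    half_width = z * standard_error(s, n)
    return mean - half_width, mean + half_width


def unbiased_variance(s2: float, n: int) -> float:
    """Hypothetical helper: unbiased estimate n * s^2 / (n - 1) of the population
    variance, where s^2 is the biased (divide-by-n) sample variance."""
    return n * s2 / (n - 1)


if __name__ == "__main__":
    # Question 1: n = 64, sample mean 33 min, s^2 = 256 min^2
    se = standard_error(math.sqrt(256), 64)
    print(f"SE = {se}, P(mean > 35) = {1.0 - Z.cdf((35 - 33) / se):.4f}")
    print("90% CI:", z_interval(33, math.sqrt(256), 64, 0.90))
    # Question 3b: small sample, n = 15, s^2 = 83 kg^2
    print(f"unbiased variance = {unbiased_variance(83, 15):.2f} kg^2")
```

Computing $z$ from `inv_cdf` rather than reading it from Appendix A changes the interval ends only in the third or fourth significant figure.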

Random Processes and Time Series Analysis
Tutorial 4: Hypothesis testing with normal distribution: 2-tailed tests

Learning outcomes. By the end of Tutorial 4, assuming that sample means follow a normal distribution, you should be able to
• perform a two-tailed hypothesis test regarding the estimated value of the population mean
• find a confidence interval for the mean estimate

The problems given in this tutorial sheet are the absolute minimal set of problems all students should solve before the exams. You will need the Table for Normal distribution in Appendix A of the lecture notes.

1. In a certain county, a random sample of 15 eleven-year-old children ("small" sample) is found to have an average weight of $\bar{x} = 37$ kilograms (kg) and a sample variance $s^2 = 83$ kg$^2$.
a) Find an unbiased estimate of the population variance.
b) Assuming that the sample mean follows a normal distribution, test the hypothesis that the average child weight is 36.5 kg against the alternative hypothesis that it is not equal to 36.5 kg. Use a test with significance level 10%. Illustrate your calculations with suitable graphs.

2. A random sample of n = 71 females was found to have a mean cholesterol level of 173 mg/dl with a sample standard deviation of 27 mg/dl. Test the null hypothesis that the mean cholesterol level $\mu$ in the female population is 168 mg/dl against the alternative hypothesis that it is not equal to 168 mg/dl. Take a significance level of 1%. Hint: Since n > 30, we can assume that the sample mean follows a normal distribution and that the sample standard deviation is a good estimate of the population standard deviation $\sigma$.

3. The lead concentration in a water specimen is analysed, and the outcomes of each measurement are described by a random variable. The mean of 3 measurements (the sample mean) is found to be 6.7 units, and it is known that the standard deviation of multiple repeated measurements ("of the underlying population") is $\sigma = 0.25$ units.
Assuming that the sample means follow a normal distribution, do the following:
a) Find the 99% confidence interval for the mean lead concentration $\mu$.
b) Test the hypothesis that the mean lead concentration is equal to 7.1 units against the alternative hypothesis that it is not equal to 7.1 units. Take a significance level of 2%.

4. The standard deviation of the scores in a math exam was 2.4, and a sample of 36 students scored an average of 5.6. With these data, can the hypothesis that the average test score was 6 be confirmed at a significance level of 5%?

Random Processes and Time Series Analysis
Tutorial 5: 1-tailed test with normal distribution. Confidence intervals and hypothesis testing with t-distribution

Learning outcomes. By the end of Tutorial 5 you should:
• be able to perform a one-tailed hypothesis test
• be able to assess for what sample sizes and under what conditions you should assume a t-distribution for sample means
• be able to test a hypothesis regarding the estimated value of the population mean, and to find a confidence interval for the mean estimate, assuming that the sample means follow a t-distribution

The problems given in this tutorial sheet are the absolute minimal set of problems all students should solve before the exams. You will need the Table for t-distribution in Appendix B of the lecture notes.

1. A random sample of n = 71 females was found to have a mean cholesterol level of 173 mg/dl with a sample standard deviation of 27 mg/dl.
a) Test the null hypothesis that the mean cholesterol level $\mu$ in the female population is 168 mg/dl against the alternative hypothesis that $\mu > 168$ mg/dl. Take a significance level of 1%.
b) Test the null hypothesis in a) against the alternative hypothesis that $\mu < 168$ mg/dl. Take a significance level of 5%.

2. A random sample of n = 15 females was found to have a mean cholesterol level of 167 mg/dl with a sample standard deviation of 30 mg/dl.
a) Test the null hypothesis that the mean cholesterol level $\mu$ in the female population is 168 mg/dl against the alternative hypothesis that it is not equal to 168 mg/dl. Take a significance level of 1%.
b) Test the null hypothesis in a) against the alternative hypothesis that $\mu > 168$ mg/dl. Take a significance level of 1%.
c) Test the null hypothesis in a) against the alternative hypothesis that $\mu < 168$ mg/dl. Take a significance level of 5%.
d) Find the 95% confidence interval for the mean cholesterol level.
Hint: Here n < 30, so we can assume that a suitable test statistic follows a t-distribution, and we should use the unbiased estimate $\hat{\sigma} = s\sqrt{\frac{n}{n-1}}$ for the population standard deviation $\sigma$.

3. The lead concentration in a water specimen is analysed, and the outcomes of each measurement are described by a random variable. The mean of 3 measurements (the sample mean) is found to be 6.7 units, and the standard deviation of this sample is 0.28 units. Assuming that the suitable statistic follows a t-distribution, do the following:
a) Find the 99% confidence interval for the mean lead concentration $\mu$.
b) Test the hypothesis that the mean lead concentration is equal to 6.2 units against the alternative hypothesis that it is not equal to 6.2 units. Take a significance level of 5%.
c) Test the hypothesis that the mean lead concentration is equal to 7 units against the alternative hypothesis that it is less than 7 units. Take a significance level of 12.5%.
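
The tests in Tutorials 4 and 5 share one recipe: compute a statistic, compare it with a critical value for the chosen tail(s) and significance level, and build the interval from the same standard error. The sketch below assumes Python 3.8+; all function and constant names are hypothetical. Critical values are passed in rather than computed, mirroring the use of the tables in Appendices A and B. The two $t$ values hard-coded for 14 degrees of freedom are standard t-table entries, and using $n-1$ degrees of freedom is an assumption (the usual convention), not something stated in these sheets. Note that $\hat{\sigma}/\sqrt{n} = s/\sqrt{n-1}$.

```python
import math
from typing import Tuple


def z_statistic(xbar: float, mu0: float, sigma: float, n: int) -> float:
    """Large-sample statistic z = (xbar - mu0) / (sigma / sqrt(n)).
    For n > 30 the sample standard deviation may be used for sigma."""
    return (xbar - mu0) / (sigma / math.sqrt(n))


def sigma_hat(s: float, n: int) -> float:
    """Estimate s * sqrt(n / (n - 1)) of sigma, as in the hint to Tutorial 5, Question 2."""
    return s * math.sqrt(n / (n - 1))


def t_statistic(xbar: float, mu0: float, s: float, n: int) -> float:
    """Small-sample statistic t = (xbar - mu0) / (sigma_hat / sqrt(n))."""
    return (xbar - mu0) / (sigma_hat(s, n) / math.sqrt(n))


def t_interval(xbar: float, s: float, n: int, t_crit: float) -> Tuple[float, float]:
    """Interval xbar +/- t_crit * sigma_hat / sqrt(n); t_crit comes from a t-table."""
    half_width = t_crit * sigma_hat(s, n) / math.sqrt(n)
    return xbar - half_width, xbar + half_width


def decide(stat: float, crit: float, alternative: str) -> bool:
    """Return True if H0 is rejected. crit is the positive table value for the
    chosen significance level; alternative is 'two-sided', 'greater' or 'less'."""
    if alternative == "two-sided":
        return abs(stat) > crit
    if alternative == "greater":
        return stat > crit
    if alternative == "less":
        return stat < -crit
    raise ValueError(f"unknown alternative: {alternative!r}")


# Critical values for 14 degrees of freedom from a standard t-table (assumes n - 1 df).
T_DF14_TWO_SIDED_1PCT = 2.977   # upper 0.5% point
T_DF14_TWO_SIDED_5PCT = 2.145   # upper 2.5% point, used for a 95% interval

if __name__ == "__main__":
    # Tutorial 5, Question 1a (large sample): one-tailed z-test at 1%, z_crit about 2.326
    z = z_statistic(173, 168, 27, 71)
    print(f"z = {z:.3f}, reject H0: {decide(z, 2.326, 'greater')}")
    # Tutorial 5, Questions 2a and 2d (small sample, n = 15)
    t = t_statistic(167, 168, 30, 15)
    print(f"t = {t:.3f}, reject H0: {decide(t, T_DF14_TWO_SIDED_1PCT, 'two-sided')}")
    print("95% CI:", t_interval(167, 30, 15, T_DF14_TWO_SIDED_5PCT))
```

Keeping the critical value as an explicit argument makes the choice of tail and significance level visible at the call site, which is the step most often confused between two-tailed and one-tailed questions.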