Ministry of Education and Science of the Russian Federation
Federal State Budgetary Educational Institution of Higher Professional Education
"National Research Tomsk Polytechnic University"

APPROVED
Vice-Rector, Director of IC
___________ Zamyatin A.V.
«___»________________ 2012

SYLLABUS OF THE COURSE
Probability Theory and Mathematical Statistics

BEP DEGREE PROGRAMME: 230100 “Computer science and technology”
DEGREE: Bachelor
CORE CURRICULUM FOR ADMISSION 2011
YEAR 2; TERM 4; NUMBER OF CREDITS: 4
PREREQUISITES: Calculus, Linear algebra

EDUCATIONAL ACTIVITIES AND TIME RESOURCES:
LECTURES 36 hours (classroom)
PRACTICE 36 hours (classroom)
TOTAL CLASSROOM ACTIVITIES 72 hours
SELF-STUDY 72 hours
TOTAL 144 hours

MODE OF STUDY: Full Time
INTERIM ASSESSMENT: Test
DEPARTMENT: Control System Optimization (CSO)
HEAD OF DEPARTMENT: O.B. Fofanov
HEAD OF BEP: V.I. Reizlin
TUTOR: A.V. Kitaeva

2012

1. Objectives
The key aim is to develop the ability to construct probabilistic models and to use common statistical methods in a manner that combines intuitive understanding and mathematical precision.

Objective code   Objective statement
C1   Preparation of graduates for interdisciplinary scientific research and innovation aimed at meeting professional challenges in the sphere of computer science and technology.
C2   Preparation of graduates for design work aimed at accomplishing professional projects in the sphere of computer science and technology that are competitive on the world market.
C3   Preparation of graduates for design and technological activities in the professional sphere of computer science and technology.
C4   Preparation of graduates for organizational and management activities while accomplishing interdisciplinary projects in the professional sphere, including work in international teams of transnational companies.
C5   Preparation of graduates for science and education activities, and development of their abilities for self-study and professional self-improvement.

2. Course within the structure of the Basic Educational Program (BEP)
The course “Probability Theory and Mathematical Statistics” belongs to the “Mathematical, natural and scientific cycle” of the degree program “Computer science and technology” (the B2 block).
Prerequisites: Calculus, Linear algebra.

3. Outcomes of mastering the course
As a result, a student should:
Know:
- the basic concepts and models of modern probability and statistics: probability space, random event, random variable, distribution, independence, sample space;
- the basic methods of probability calculation and statistical inference: combinatorics, conditioning, sum and product rules, point and interval estimation, hypothesis testing;
- basic discrete and continuous distributions, their properties and fields of application.
Be able to:
- apply probabilistic and statistical methods to solve various theoretical and practical tasks;
- calculate the probabilities of compound events;
- find numerical characteristics of random variables and sample characteristics, and describe their properties;
- formulate and test basic statistical hypotheses;
- construct basic point and interval estimators of a distribution’s parameters and investigate their properties.
Have skills of:
- elementary probability calculation;
- testing classic statistical hypotheses;
- parameter estimation by the maximum likelihood (ML) method and the method of moments (MM).
While mastering the discipline, students develop the following competences (according to the BEP):
1. General
OK-1 To analyze and generalize information
OK-2 To express thoughts clearly
OK-10 To use probabilistic and statistical methods in their professional activity
2. Professional
PK-5 To choose methods for solving management and design tasks in the sphere of computer science and technology
PK-6 To justify decisions and prove their correctness

4. Course structure and content
4.1 Structure of the course by modules, forms of training and academic progress monitoring

Table 1
Classroom activity (Lectures, Practice, Lab), self-study and totals are given in hours.

Module                                                Lectures  Practice  Lab  Self-study  Total  Forms of current monitoring and assessment
1. Basic probability concepts                                3         2    -           2      7  Seminar
2. Combinatorics                                             2         3    -           6     11  Seminar, Task 1
3. Properties of probability                                 2         2    -           6     10  Seminar, Task 2
4. Conditional probability. Independence                     2         3    -           6     11  Seminar, Task 2, Case study 2
5. The Monty Hall problem and other puzzles                  2         3    -           6     11  Seminar, Case study 1
6. Discrete random variables                                 5         4    -           6     15  Seminar, Task 3
7. Continuous random variables (RVs with densities)          5         4    -           6     15  Seminar, Task 3, Calculation task
8. Limit results for sequences of random variables           4         4    -           6     14  Seminar, Test
9. Point estimation                                          4         4    -           6     14  Seminar, Test, Calculation task
10. Interval estimation                                      3         3    -           6     12  Seminar, Test
11. Hypothesis testing                                       4         4    -           6     14  Seminar, Test, Calculation task
12. Histogram, sample distribution function
    and box-and-whisker plot                                 0         0    -          10     10  Test, Calculation task
Total                                                       36        36    0          72    144

4.2. Content of the course modules:
Topic#1. Basic probability concepts
Random experiments. Events. Probability. Historical remarks.
Topic#2. Combinatorics
Topic#3. Properties of probability
Topic#4. Conditional probability. Independence
Topic#5. The Monty Hall problem and other puzzles
Topic#6. Discrete random variables
Distributions. Moments, mean, variance. Multivariate discrete distribution. Independence. Covariance and correlation.
Topic#7. Continuous random variables (RVs with densities)
Density and distribution functions. Bertrand’s paradox. Exponential distribution. Multivariate continuous distribution. Independence. Transformation of random variables. Sums, products, and quotients of random variables. Moment generating function. Distributions related to the normal distribution.
Conditional density and expectation. The median, quartiles, percentiles. Skewness and kurtosis. Simulation of random variables.
Topic#8. Limit results for sequences of random variables
Convergence of random variables. Inequalities. The law of large numbers. Sampling from a distribution. Central limit theorem.
Topic#9. Point estimation
Sample mean and sample variance. Properties of the estimators. Sufficient statistics. Method of moments. Method of maximum likelihood.
Topic#10. Interval estimation
Confidence intervals for the mean. Confidence interval for the difference of two means. Confidence interval for the variance. Confidence interval for a proportion.
Topic#11. Hypothesis testing
Formulating the hypotheses (H0 and H1). Test criterion and rejection region. Two types of errors. Performing a test.

5. Educational technologies
Table 2. Methods and forms of training
Types of learning activities: Lectures, Practice, Self-study.
Methods: Case study; IT methods; Teamwork; Learning based on experience; Leap-ahead self-study; Project method; Searching and investigating.

6. Students' self-study
Self-study is the most productive form of a student's learning and cognitive activity during the course. To develop creative abilities and deepen mastery of the course, two types of self-study are stipulated: 1) current and 2) creative problem-oriented.
6.1 Current self-study
Work with the course book; individual search and review of literature and electronic sources on a given problem; homework; home tests; leap-ahead self-study; self-study of a particular subject; preparation for the test.
6.2 Creative problem-oriented self-study
Research, analysis, structuring and presentation of information; review of publications on a pre-determined subject.
6.3 Self-study topics
Histogram; sample distribution function; box-and-whisker plot.
6.4 Self-study check
Check of the individual calculation task; discussion of the home tasks; presentation of the pre-determined topics.
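As an illustration of the self-study topics listed in 6.3, the following is a minimal Python sketch of a histogram count, a sample distribution function and the five numbers behind a box-and-whisker plot. The sample is the nine weighings from the pre-course survey example in 7.3. The function names, the bin edges and the quartile convention (medians of the lower and upper halves of the ordered sample) are assumptions made for this illustration, not part of the course materials; other quartile conventions give slightly different values.

```python
from bisect import bisect_right
from statistics import median

# Nine weighings (grams) from the pre-course survey example in 7.3.
weights = [6.2, 6.0, 6.0, 15.3, 6.1, 6.3, 6.2, 6.15, 6.2]


def ecdf(sample):
    """Return the sample distribution function F_n(x) = #{x_i <= x} / n."""
    ordered = sorted(sample)
    n = len(ordered)
    return lambda x: bisect_right(ordered, x) / n


def five_number_summary(sample):
    """Return (min, lower quartile, median, upper quartile, max).

    Assumed convention: quartiles are the medians of the lower and upper
    halves of the ordered sample, leaving out the middle point when n is odd.
    """
    ordered = sorted(sample)
    n = len(ordered)
    half = n // 2
    lower, upper = ordered[:half], ordered[n - half:]
    return ordered[0], median(lower), median(ordered), median(upper), ordered[-1]


def histogram(sample, edges):
    """Count observations in the half-open bins [edges[i], edges[i+1])."""
    counts = [0] * (len(edges) - 1)
    for x in sample:
        for i in range(len(counts)):
            if edges[i] <= x < edges[i + 1]:
                counts[i] += 1
                break
    return counts


if __name__ == "__main__":
    F = ecdf(weights)
    print("F_n(6.2) =", F(6.2))
    print("five-number summary:", five_number_summary(weights))
    # Bin edges are an arbitrary choice for this illustration.
    print("histogram counts:", histogram(weights, [6.0, 6.1, 6.2, 6.3, 16.0]))
```

The single reading of 15.3 sits far above the upper quartile, which is the kind of outlier a box-and-whisker plot makes visible at a glance.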
6.5 References for self-study (Internet resources)
http://www.random.org/ (offers true random numbers to anyone on the Internet)
http://lib.mexmat.ru/catalogue.php (books, for reading only)
http://www.dartmouth.edu/~chance/index.html (The goal of Chance is to make students more informed, critical readers of current news stories that use probability and statistics.)
http://www.math.uah.edu/stat/ (The Virtual Laboratories in Probability and Statistics, a set of web-based resources for students and teachers of probability and statistics, where you can run simulations.)
www-history.mcs.st-andrews.ac.uk/history/index.html (historical information about mathematics and mathematicians)
http://oli.web.cmu.edu/openlearning/forstudents/freecourses/statistics (The Open and Free Full Courses. The Probability and Statistics course is comparable to the full-semester course on Statistics taught at Carnegie Mellon University.)
http://www.icoachmath.com (…designed specifically to help students strengthen their math skills through self-paced, guided, and online interactive practice.)
http://plus.maths.org/content/teacher-package-statistics-and-probabilitytheory
http://home.ubalt.edu/ntsbarsh/
See also Additional Resources 1-9 on the last page.

7. PRE-COURSE, INTERMEDIATE AND FINAL TESTS
7.1 Final test questions
1. In the context of a university admission test, discuss the trade-off between Type I and Type II errors.
2. A cereal package should weigh W0. If it weighs less, the consumers will be unhappy and the company may face legal charges. If it weighs more, the company’s profits will fall. Devise the hypothesis testing procedure. Explain the statistical logic behind the decision rule (the choice of hypotheses and the use of the confidence interval).
3. Give examples of null and alternative hypotheses which lead to (i) a one-tail test and (ii) a two-tail test. How are the critical values defined and what are the decision rules in these cases?
4. Normal population versus Bernoulli.
(a) Select two pairs of hypotheses such that in one case you would apply a two-tail test and in the other a one-tail test. Assuming that the population is normal and the variance is unknown, illustrate graphically the procedure of testing the mean.
(b) What changes would you make if the population were Bernoulli?
5. Using the procedure of testing the sample mean (a one-tail test, σ is known), describe in full the procedure of assessing the power of a test. Answer the following questions:
(a) What happens to the power when the real mean moves away from µ0?
(b) What happens to 1 − β when α decreases?
(c) How does the sample size affect the power?
(d) How does 1 − β behave if σ increases?
6. Fill out the following form:
Comparison of population and sample formulas
                                  Population formula   Sample formula
Mean (discrete and continuous)
Variance
Standard deviation
Correlation
7. Suppose we sample from a Bernoulli population. How do we figure out the approximate value of p? How can we be sure that the approximation is good?
8. Define unbiasedness and prove that the sample variance is an unbiased estimator of the population variance.
9. Find the mean and variance of the chi-square distribution.
10. Define the uniformly distributed random variable. Find its mean, variance and distribution function.
11. In one block, give the properties of the standard normal distribution, with proofs where possible, and derive from them the properties of normal variables.
12. What is the relationship between the distribution functions Fz and FX if X = σz + μ? How do you interpret Fz(1) − Fz(−1)?
13. How is the central limit theorem applied to the binomial variable?
14. List and prove all properties of the mean.
15. List and prove all properties of the covariance.
16. List and prove all properties of the variance.
17. List and prove all properties of the standard deviation.
18. List and prove all properties of the correlation coefficient.
19. How and why do you standardize a variable?
20. Define and derive the properties of the Bernoulli variable.
21. Explain the term “independent identically distributed”.
22. Define a binomial distribution and derive its mean and variance with explanations.
23. Define a distribution function, describe its geometric behavior and prove the interval formula.
24. Define the Poisson distribution and describe (without proof) how it is applied to the binomial distribution.
25. What do we mean by set operations?
26. Prove De Morgan's laws.
28. Prove P(A ∪ B) = P(A) + P(B) − P(A ∩ B).
29. Prove the formulas for combinations.
30. Prove Bayes' theorem.

7.2 Intermediate tasks (examples)
1. Suppose that we want to generate a random variable X that is equally likely to be either 0 or 1, and that all we have at our disposal is a biased coin that, when flipped, lands on heads with some (unknown) probability p. Consider the following procedure:
i. Flip the coin, and let O1, either heads or tails, be the result.
ii. Flip the coin again, and let O2 be the result.
iii. If O1 and O2 are the same, return to step i.
iv. If O2 is heads, set X = 0, otherwise set X = 1.
(a) Show that the random variable X generated by this procedure is equally likely to be either 0 or 1.
(b) Could we use a simpler procedure that continues to flip the coin until the last two flips are different, and then sets X = 0 if the final flip is a head, and sets X = 1 if it is a tail?
2. A fair coin is independently flipped n times, k times by A and n − k times by B. Show that the probability that A and B flip the same number of heads is equal to the probability that there are a total of k heads.
3. Consider n independent flips of a coin having probability p of landing heads. Say a changeover occurs whenever an outcome differs from the one preceding it. For instance, if the results of the flips are H H T H T H H T, then there are a total of five changeovers. If p = 1/2, what is the probability there are k changeovers?
4. An individual claims to have extrasensory perception (ESP). As a test, a fair coin is flipped ten times, and he is asked to predict the outcome in advance. Our individual gets seven out of ten correct. What is the probability he would have done at least this well if he had no ESP? (Explain why the relevant probability is P{X ≥ 7} and not P{X = 7}.)
5. Let X be binomially distributed with parameters n and p. Show that as k goes from 0 to n, P(X = k) increases monotonically, then decreases monotonically, reaching its largest value
(a) in the case that (n + 1)p is an integer, when k equals either (n + 1)p − 1 or (n + 1)p,
(b) in the case that (n + 1)p is not an integer, when k satisfies (n + 1)p − 1 < k < (n + 1)p.
Hint: Consider P{X = k}/P{X = k − 1} and see for what values of k it is greater or less than 1.
6. An airline knows that 5 percent of the people making reservations on a certain flight will not show up. Consequently, their policy is to sell 52 tickets for a flight that can hold only 50 passengers. What is the probability that there will be a seat available for every passenger who shows up?
7. Suppose that two teams are playing a series of games, each of which is independently won by team A with probability p and by team B with probability 1 − p. The winner of the series is the first team to win four games. Find the expected number of games that are played, and evaluate this quantity when p = 1/2.
8. Suppose that each coupon obtained is, independent of what has been previously obtained, equally likely to be any of m different types. Find the expected number of coupons one needs to obtain in order to have at least one of each type.
Hint: Let X be the number needed. It is useful to represent X by $X = \sum_{i=1}^{m} X_i$, where each $X_i$ is a geometric random variable.
9. A coin, having probability p of landing heads, is flipped until a head appears for the r-th time. Let N denote the number of flips required. Calculate E(N).
Hint: There is an easy way of doing this.
It involves writing N as the sum of r geometric random variables.
10. An urn contains 2n balls, of which r are red. The balls are randomly removed in n successive pairs. Let X denote the number of pairs in which both balls are red.
(a) Find E(X).
(b) Find var(X).
11. Suppose that X takes on each of the values 1, 2, 3 with probability 1/3. What is the moment generating function? Derive E(X), E(X^2), and E(X^3) by differentiating the moment generating function, and then compare the result with a direct derivation of these moments.
12. Let X and Y be independent normal random variables, each having parameters μ and σ^2. Show that X + Y is independent of X − Y.
Hint: Find their joint moment generating function.
13. Case study. Drug testing procedure
Your company is going to test all staff members for drug use. To estimate the costs (testing equipment and possible psychological problems) and the expected gains (increased productivity), you need to analyze the situation. The testing procedure is not ideal. The laboratory staff have informed you that if a person uses drugs, the test is "positive" with a probability of 90 %. However, if a person does not use drugs, the test shows a "negative" (i.e. “not positive”) result in 95 % of cases. Based on an informal survey of some workers, you can expect about 8 % of all personnel to use drugs. Probability analysis matters here because it lets you turn the available information into a probability that is far more useful for decision-making. Does it make sense to conduct such a test? Give a reasoned response based on a suitable probability analysis.

7.3. Pre-course survey (examples)
1. A small object was weighed on the same scale separately by nine students in a science class. The weights (in grams) recorded by each student are shown below.
6.2  6.0  6.0  15.3  6.1  6.3  6.2  6.15  6.2
The students want to determine as accurately as they can the actual weight of this object.
Of the following methods, which would you recommend they use?
_____a. Use the most common number, which is 6.2.
_____b. Use the 6.15 since it is the most accurate weighing.
_____c. Add up the 9 numbers and divide by 9.
_____d. Throw out the 15.3, add up the other 8 numbers and divide by 8.
3. Which of the following sequences is most likely to result from flipping a fair coin 5 times?
_____a. H H H T T
_____b. T H H T H
_____c. T H T T T
_____d. H T H T H
_____e. All four sequences are equally likely.
4. Select the alternative below that is the best explanation for the answer you gave for the item above.
_____a. Since the coin is fair, you ought to get roughly equal numbers of heads and tails.
_____b. Since coin flipping is random, the coin ought to alternate frequently between landing heads and tails.
_____c. Any of the sequences could occur.
_____d. If you repeatedly flipped a coin five times, each of these sequences would occur about as often as any other sequence.
_____e. If you get a couple of heads in a row, the probability of a tails on the next flip increases.
_____f. Every sequence of five flips has exactly the same probability of occurring.
7. Half of all newborns are girls and half are boys. Hospital A records an average of 50 births a day. Hospital B records an average of 10 births a day. On a particular day, which hospital is more likely to record 80 % or more female births?
_____a. Hospital A (with 50 births a day).
_____b. Hospital B (with 10 births a day).
_____c. The two hospitals are equally likely to record such an event.

8. STUDY SCHEDULE
Course: Probability Theory and Mathematical Statistics
Term: 4
Year: 2
Lecturer: A.V. Kitaeva, professor
Weeks: 18
Credits: 4
Lectures, hrs: 36
Practice, hrs: 36
Labs, hrs: 0
Class work in total, hrs: 72
Self-study, hrs: 72
TOTAL, hrs: 144

Term Schedule

Weeks 1-2. Basic probability concepts
Theoretical material: Random experiments. Events. Probability. Historical remarks.
Practice: Classical probability. Geometric probability (2 points).
Total: 2 points.

Weeks 2-3. Combinatorics
Practice: Arrangement. Permutation. Combination (3 points).
Total: 3 points.

Weeks 3-4. Properties of probability
Practice: Sum and product rules (3 points).
Total: 3 points.

Weeks 4-5. Conditional probability. Independence
Testing: Test 1 (8 points).
Practice: Total probability and Bayes formulas (4 points).
Total: 12 points.

Check point # 1 in total: 20 points.

Weeks 5-6. The Monty Hall problem and other puzzles
Practice: Conditioning. Probability puzzles.

Weeks 6-9. Discrete random variables
Practice: Distribution. Mean. Variance. Basic discrete distributions.

Weeks 9-12. Continuous random variables
Theoretical material: Density and distribution functions. Bertrand’s paradox. Exponential distribution. Multivariate continuous distribution. Independence. Transformation of random variables. Sums, products, and quotients of random variables. Moment generating function. Distributions related to the normal distribution. Conditional density and expectation. The median, quartiles, percentiles. Skewness and kurtosis. Simulation of random variables.
Testing: Test 2 (8 points).
Practice: Density. Basic continuous distributions. MGF. Moments. Conditional distribution. Simulation.

Points for weeks 5-12: 4, 4, 6; totals: 4, 12, 6.

Weeks 12-14. Limit results for sequences of random variables
Theoretical material: Convergence of random variables. Inequalities. The law of large numbers. Sampling from a distribution. Central limit theorem.
Practice: Types of convergence of random variables. Chebyshev’s inequality. De Moivre’s approximation (6 points).
Total: 6 points.

Weeks 14-15. Point estimation
Theoretical material: Sample mean and sample variance. Properties of the estimators. Sufficient statistics. Method of moments. Method of maximum likelihood.
Practice: Method of moments and method of maximum likelihood for estimating the parameters of classic distributions (6 points).
Total: 6 points.

Weeks 15-16. Interval estimation
Theoretical material: Confidence intervals for the mean. Confidence interval for the difference of two means. Confidence interval for the variance. Confidence interval for a proportion.
Practice: Construction of confidence intervals (6 points).
Total: 6 points.

Weeks 17-18. Hypothesis testing
Theoretical material: Formulating the hypotheses (H0 and H1). Test criterion and rejection region. Two types of errors. Performing a test.
Practice: Hypothesis testing. Neyman-Pearson lemma (6 points).
Total: 6 points.

Weeks 17-18. Descriptive statistics
Subject: Histogram, sample distribution function and box-and-whisker plot.
Testing: Test 3 (10 points).
Total: 10 points.

Check point # 2 in total: 56 points.
Check points # 1, 2 in total: 76 points.
Test: 24 points.
Points in total (all course): 100.

9. Resources
Textbook: We will use a (free) text that is available online in pdf form.
Additional Resources
1. Chung K.L. and AitSahlia F. Elementary Probability Theory. With Stochastic Processes and an Introduction to Mathematical Finance. 2003. 402 p.
2. Elias J. Syllabus. //www.montana.edu
3. Johnson R.A. and Bhattacharyya G.K. Statistics: Principles and Methods. 2011. 686 p.
4. Grinstead C.M. and Snell J.L. Introduction to Probability. 1997. 510 p. (available free at http://www.freebookcentre.net/Mathematics/Probability-TheoryBooks.html)
5. Soong T.T. Fundamentals of Probability and Statistics for Engineers. 2004. 408 p.
6. Stirzaker D. Elementary Probability. 2003. 520 p.
7. Suhov Y. and Kelbert M. Probability and Statistics by Example. 2005. 360 p.
8. Vrbik J. Lecture Notes. Probability. //www.brocku.ca/
9. Vrbik J. Lecture Notes. Mathematical Statistics. //www.brocku.ca/

This program was prepared in accordance with TPU Standards and the Federal State Educational Standards (FSES) requirements for the study major 230100 “Computer science and technology”.
This program was approved at a meeting of the Control System Optimization (CSO) department (minutes No. ____ of «___» _______ 2012).
Author ____________________ Kitaeva A.V.
Reviewer _________________________