DIPLOMA IN INFORMATION TECHNOLOGY
MODULE LEARNING GUIDE
STATISTICAL ANALYSIS
Version 2: MAY 2007

1.0 INTRODUCTION
The use of statistics in decision making has become widespread in virtually every occupational field. To support good decision making, this module aims to give students knowledge of statistical principles and skill in applying them. The fundamental principles covered include data collection and sampling, measures of central tendency and location, correlation, regression, probability distributions, hypothesis testing, time series and index numbers. After completing this module, students should be able to collect, present, analyse and interpret data.

2.0 AIMS
The broad aim of this programme is for students to develop the skills of data collection, data presentation, data analysis and interpretation.

3.0 PROGRAMME LEARNING OUTCOMES
Learners who successfully complete this module will be able to:
- Examine changes over time in data from applied science or business, paying attention to trends, cyclical variations and irregular variations.
- Explore and describe data from business or applied science, and apply regression techniques to them, in order to identify relationships, gauge the strength of relationships, make predictions and detect study effectiveness.
- Identify problems associated with producing a particular set of data to measure opinion or identify a trend.
- Apply probability laws and concepts to real-life situations.
- Apply the binomial and normal probability distributions to real-life situations.
- Apply the chi-square distribution to real-life situations.

4.0 AUDIENCE AND PRE-REQUISITES
This module requires students to have a prior understanding of the elementary elements of Business Mathematics and Economics.
5.0 OVERVIEW OF THE MODULE
Students will be exposed to various forms of statistical analysis, such as:
- Data collection methods
- Tabulation of data, data presentation, and data analysis such as averages and dispersion
- Correlation and regression analysis
- Probability and probability distributions
- Hypothesis testing using test criteria such as the Z, t and chi-square tests
- The application of time series and index numbers

6.0 INSTRUCTIONAL PLAN AND RESOURCES
6.1 An electronic calculator must be used in the teaching of this module. The base source of material for designing the teaching-learning schedule will be the print-based module material provided to both staff and students. Other resources will be included to supplement it and fill in gaps, especially recent developments or changes that are not found in the module material.

Class Teaching Schedule (lecture topic, followed by tutorial/activities)

Week 1: Statistical Analysis (An Introduction)
- Discuss the types of statistics to build a basic understanding.
- Conduct an activity on the types of data and data collection methods.
- Divide students into groups for a short discussion on collecting data with various approaches.

Week 2: Survey & Sampling
- Review the relationship between censuses and samples.
- Divide students into groups for a short discussion on surveys.
- Conduct an activity on the application of sampling methods.

Week 3: The Contribution of Graphical Data to Research
- Summarize and present the data that have been collected.
- Divide students into groups to give presentations of data.
- Conduct exercises on the classification of data.

Week 4: Measures of Location and Dispersion
- Learn how to calculate the arithmetic mean, median and mode from grouped and ungrouped data.
- Learn how to calculate the range, quartile deviation, mean deviation and standard deviation.
- Compute the coefficient of variation and skewness.

Week 5: Regression Analysis
- Use graph paper to draw scatter diagrams that show the relationship between two variables.
- Use regression analysis to estimate an equation that predicts future values of the dependent variable.

Week 6: Correlation Analysis
- Learn how correlation analysis describes the degree to which two variables are linearly related.
- Use the coefficient of determination as a measure of the strength of the relationship between two variables.

Week 7: Probability and Discrete Probability Distribution
- Give exercises on using various probability concepts to make decisions.
- Provide more advanced application questions to test students' understanding of the Poisson and binomial distributions.

Week 8: Normal Distribution
- Give application exercises on using the normal distribution.
- Discuss the characteristics of the normal distribution.

Week 9: Sampling Distribution & Estimation
- Apply the t-distribution to small samples.
- Discuss exercises using the t-distribution and Z-distribution methods.
- Review confidence interval methods for means, for proportions, and for means based on small samples.
- Determine the sample size n.

7.0 ASSESSMENT STRATEGY
7.1 AIM
The aim of the assessment strategy is to identify formal practices and procedures for assessing and appraising the performance of participants, so that judgments and decisions can be reached concerning:
- The progression of participants through the programme.
- How well participants have met the programme learning outcomes through the combination of the individual module learning outcomes.
- The provision of feedback to participants on their performance and on how well they adhered to the generic assessment criteria and the module-specific assessment criteria.
The underpinning principles that drive the assessment strategies adopted for this programme are the profile of the target participants and the programme itself (its philosophy and associated learning outcomes).
7.2 ASSESSMENT INSTRUMENTS
Refer to the appendix on assessment instruments.

LEARNING SUGGESTIONS AND GUIDELINES

Week 1: Statistical Analysis (An Introduction)
Over the week of lectures and tutorials, the focus will be on the following:
- Define statistics.
- Explain how statistics can be of value in business.
- Explain the basic concepts of statistics.
- Provide a brief history of statistics.
- Provide a basic understanding of the types of statistics.
- Present a review of the types of data and data collection methods.

Learning outcomes to be attained:
- Understand the need for statistics in business applications.
- Learn about the different types of statistics.
- Identify the types of data.
- Gain an insight into various data collection methods.

Readings and preparation to be undertaken:
a) From the learning material: Section 1a and Section 1c
b) Additional reading: Lower Hutt (0221), Statistical Analysis, The Open Polytechnic of New Zealand, Reading 1
c) Recommended texts:
- Moore, D. S. and McCabe, G. P. Introduction to the Practice of Statistics (IPS) (3rd ed.). New York: W.H. Freeman and Company.
- Waters, D. (1994). Quantitative Methods for Business. USA: Addison-Wesley.
- Ingram, J. A. and Monks, J. G. (1992). Statistics for Business and Economics (2nd ed.). The Dryden Press, Harcourt Brace Jovanovich College.
- Gordon, S. P. and Gordon, F. S. (1994). Contemporary Statistics. USA: McGraw-Hill.
d) Discuss the following questions and explain their rationale:
1. Design a questionnaire on any one of the following topics:
   - AIDS awareness
   - TV habits
   - Study habits
   - Video games
   - Family values
2. Postal questionnaires and interviews are two methods of collecting data. List the advantages of each method.
3. One of the main defects of surveys conducted by interview is the problem of interviewer bias. What is interviewer bias, and how can the problem be minimized?
4. Explain the difference between primary and secondary data, giving suitable examples, and state the advantages and disadvantages of each source.
5. Explain whether each of the following items describes a statistic or a parameter:
   - Your grade-point average
   - The percentage of persons who respond to an opinion survey by agreeing that our president is doing a good job
   - The average age of all U.S. citizens as reported by the 1990 census
   - The average income for all persons who are members of the American Medical Association
   - Sales made by General Foods Corp. in July as an indicator of annual sales
6. Description and inference are the two broad subareas of statistics. Identify which subarea, description or inference, better describes each of the following items:
   - Computing a baseball player's batting average
   - Using the first-quarter summary of production to project second-quarter production
   - A bar chart that displays the corporate revenues for the current year, the preceding year, two years ago, and so on
   - A survey estimate of corporate year-end assets at somewhere between $1.2 million and $1.4 million
   - Estimating an assembly worker's weekly output as 1200 units plus or minus 100 units from eight hours of work performance
7. Comment on each of the following as a potential sample survey question. Is the question clear? Is it slanted toward a desired response?
   - Does your family use food stamps?
   - Which of the following best represents your opinion on gun control? (a) The government should confiscate our guns. (b) We have the right to keep and bear arms.
   - A national system of health insurance should be favored because it would provide health insurance for everyone and would reduce administrative costs.
   - In view of escalating environmental degradation and incipient resource depletion, would you favor economic incentives for recycling of resource-intensive consumer goods?
Week 2: Survey & Sampling
Over the week of lectures and tutorials, the focus will be on the following:
- Discuss guidelines for conducting a survey.
- Define population, sample and sampling frame.
- Define sampling and state the merits and demerits of sampling.
- Describe the major forms of spatial sampling for selecting samples from phenomena that vary across the landscape.
- Explain the various types of sampling methods, such as probability sampling and non-probability sampling.

Learning outcomes to be attained:
- Learn about survey activity through group work.
- Understand the difference between a sample and a population.
- Understand the merits and demerits of sampling.
- Understand the various sampling methods to be used.

Readings and preparation to be undertaken:
a) From the learning material: Section 1b
b) Additional reading: Lower Hutt (0221), Statistical Analysis, The Open Polytechnic of New Zealand, Reading 1
c) Recommended texts:
- Moore, D. S. and McCabe, G. P. Introduction to the Practice of Statistics (IPS) (3rd ed.). New York: W.H. Freeman and Company.
- Waters, D. (1994). Quantitative Methods for Business. USA: Addison-Wesley.
- Ingram, J. A. and Monks, J. G. (1992). Statistics for Business and Economics (2nd ed.). The Dryden Press, Harcourt Brace Jovanovich College.
- Gordon, S. P. and Gordon, F. S. (1994). Contemporary Statistics. USA: McGraw-Hill.
d) Discuss the following questions and explain their rationale:
1. Indicate whether a sample or a population is described in each of the following items:
   - An opinion survey
   - A biannual agriculture survey
   - A quarterly report to stockholders of assets and liabilities
   - The collection of all grades assigned for this class
2. Define the target population for each of the following studies:
   - prior programming experience among first-year tertiary computing students
   - infertility among the wild echidna (native anteater) population
   - variations in the durability of jogging shoes
   - variation in the heavy-metal content of a river downstream from a lead smelter
3. A non-profit organization is conducting a door-to-door opinion poll on municipal day-care centres. The organization has devised a scheme for random sampling of houses, and plans to conduct the poll on weekdays from noon to 5 p.m. Will this scheme produce a random sample?
4. Bob Peterson, public relations manager for Piedmont Power and Light, has implemented an institutional advertising campaign to promote energy consciousness among its customers. Anxious to know whether the campaign has been effective, Peterson plans to conduct a telephone survey of area residents. He plans to look in the telephone book and select random numbers with addresses that correspond to the company's service area. Will Peterson's sample be a random one?
5. At the U.S. Mint in Philadelphia, 10 machines stamp out pennies in lots of 50. These lots are arranged sequentially on a single conveyor belt, which passes an inspection station. An inspector decides to use systematic sampling in inspecting the pennies and is trying to decide whether to inspect every fifth or every seventh lot of pennies. Which is better? Why?
6. The state occupational board has decided to do a study of work-related accidents within the state, to examine some of the variables involved in the accidents, for example, the type of job, the cause of the accident, the extent of the injury, the time of day, and whether the employer was negligent. It has been decided that 250 of the 2500 work-related accidents reported last year in the state will be sampled. The accident reports are filed by date in a filing cabinet.
   Marsha Gulley, a department employee, has proposed that the study use a systematic sampling technique and select every tenth report in the file for the sample. Would her plan of systematic sampling be appropriate here? Explain.
7. Bob Bennett, product manager for Clipper Mowers Company, is interested in looking at the kinds of lawn mowers used throughout the country. Assistant product manager Mary Wilson has recommended a stratified random-sampling process in which the cities and communities studied are separated into substrata, depending on the size and nature of the community. Mary Wilson proposes the following classification:

   Category    Type of community
   Urban       Inner city (population over 100,000)
   Suburban    Outlying areas of cities or smaller communities (population 20,000 to 100,000)
   Rural       Small communities (fewer than 20,000 residents)

   Is stratified random sampling appropriate here?
8. A Senate study on the issue of self-rule for the District of Columbia involved surveying 2000 people from the population of the city regarding their opinions on a number of issues related to self-rule. Washington, D.C., is a city in which many neighborhoods are poor and many are rich, with very few falling between the extremes. The researchers administering the survey had reason to believe that the opinions expressed on the various questions would be highly dependent on income. Which method was more appropriate, stratified sampling or cluster sampling? Explain briefly.

Week 3: The Contribution of Graphical Data to Research
Over the week of lectures and tutorials, the focus will be on the following:
- Define the contribution that graphical displays of data can make to the presentation process.
- Provide students with a comprehensive understanding of data collection methods.
- Distinguish between a diagram and a graph.
- Describe the types of diagrams, such as the bar diagram and the pie diagram.
- Describe the types of graphs, such as the line graph, histogram, frequency polygon and ogive (cumulative frequency curve).
- Describe the stem-and-leaf plot.

Learning outcomes to be attained:
- Learn about data collection activity through group work.
- Understand the sources and approaches to use when collecting data.
- Understand the basic definitions and concepts of data collection and presentation.
- Be able to present data as charts and graphs and to illustrate them with diagrams.

Readings and preparation to be undertaken:
a) From the learning material: Section 2
b) Additional reading: Lower Hutt (0221), Statistical Analysis, The Open Polytechnic of New Zealand, Reading 2
c) Recommended texts:
- Moore, D. S. and McCabe, G. P. Introduction to the Practice of Statistics (IPS) (3rd ed.). New York: W.H. Freeman and Company.
- Waters, D. (1994). Quantitative Methods for Business. USA: Addison-Wesley.
- Billson, J. (2002). The Power of Focus Groups for Social and Policy Research. Skywood Press.
- Wadsworth, Y. (1997). Do It Yourself Social Research (2nd ed.). St. Leonards, NSW, Australia: Allen and Unwin.
- Berry, M. J. A. and Linoff, G. (1997). Data Mining Techniques for Marketing, Sales, and Customer Support. New York: Wiley.
- Keppel, G. (1991). Design and Analysis: A Researcher's Handbook. Englewood Cliffs, NJ: Prentice-Hall.
- Tukey, J. W. (1977). Exploratory Data Analysis. Reading, MA: Addison-Wesley.
- Cox, B. G., et al. (Eds.) (1995). Business Survey Methods. Wiley.
- Groves, R. M. (1988). Telephone Survey Methodology. Wiley.
- Groves, R. M. (1989). Survey Errors and Survey Costs. Wiley.
d) Discuss the following questions and explain their rationale:
1. Construct a stem-and-leaf plot for the following data:
   53, 47, 59, 66, 36, 69, 84, 77, 42, 57, 51, 60, 78, 63, 46, 63, 42, 55, 63, 48, 75, 60, 8, 80, 44, 59, 60, 75, 49, 63
2. The table below gives the weekly take-home pay of 20 semi-skilled laborers. Using the classes $300.00 - $309.99, $310.00 - $319.99, ..., $340.00 - $349.99, set up a frequency distribution table and plot the appropriate histogram and frequency polygon.

   Weekly take-home pay ($)
   319.12  331.50  320.76  325.42  333.98
   326.81  348.39  321.67  315.38  340.89
   324.79  337.24  331.47  304.12  327.02
   313.48  326.67  326.67  321.19  327.11

3. There are five hospitals in a health district, and they classify the number of beds in each hospital as follows:
   Hospitals: Foothills, General, Southern, Healthview, St John
   Bed categories: Maternity, Surgical, Medical, Psychiatric
   Number of beds: 24 86 82 25 38 85 55 22 6 45 30 30 0 30 30 65 0 24 35 76
4. Sales in four regions are given in the following table. Draw a pie chart to represent these.

   Region   Sales
   North    25
   South    10
   East     45
   West     25
   Total    100

5. Construct a frequency distribution from the following data, which show the number of minutes 100 customers occupied their seats in a college cafeteria:

   29 67 34 39 23 66 24 37 45 58
   51 73 31 15 51 47 35 46 72 37
   48 58 31 31 41 45 40 35 45 63
   35 34 56 34 26 41 62 22 37 82
   56 43 47 35 56 28 41 19 28 45
   39 30 67 37 38 55 31 35 27 35
   54 73 51 61 27 38 44 54 23 49
   30 33 33 96 68 40 46 28 34 16
   92 49 22 22 41 62 45 53 52 70
   59 43 35 34 29 48 61 35 63 36

6. The Degree of Reading Power (DRP) test is often used to measure the reading ability of children. Here are the DRP scores of 44 third-grade students, measured during research on ways to improve reading performance:

   40 26 39 14 42 18 25 43 46 27 19
   47 19 26 35 34 15 44 40 38 31 46
   52 25 35 35 33 29 34 41 49 28 52
   47 35 48 22 33 41 51 27 14 54 45

   Make a stemplot of these data. Then make a histogram.
7. In 1994, there were 12,263,000 undergraduate students in U.S. colleges. According to the U.S. Department of Education, there were 117,000 American Indian students, 674,000 Asian, 1,317,000 non-Hispanic black, 968,000 Hispanic, and 8,916,000 non-Hispanic white students. In addition, 269,000 foreign undergraduates were enrolled in U.S. colleges. Present these data in a graph.
8. What is meant by an ogive? Draw two ogive curves from the following data:

   Class interval          Frequency
   0 to less than 5         7
   5 to less than 10        10
   10 to less than 15       16
   15 to less than 20       23
   20 to less than 25       25
   25 to less than 30       13
   30 to less than 35       17
   35 to less than 40       10
   40 to less than 45       14
   45 to less than 50       10
   50 to less than 55       5

9. The hourly wage rates (RM) of 25 workers are as follows:

   4.11  4.25  4.90  5.30  5.20
   5.05  6.15  5.80  4.65  5.60
   4.43  5.25  4.54  4.50  4.25
   4.14  4.85  4.29  4.80  4.40
   5.50  4.40  6.05  5.15  4.90

   Construct a frequency distribution for the above data using equal class intervals, with the first class defined as RM4.00 and under RM4.50. Prepare a histogram as well.

Week 4: Measures of Location and Dispersion
Over the week of lectures and tutorials, the focus will be on the following:
- Define discrete and continuous variables.
- Develop discrete and continuous frequency distributions.
- Find measures of central location and dispersion, such as the average (mean), median, mode, standard deviation and coefficient of variation.

Learning outcomes to be attained:
- Be able to calculate the mean, median and mode from both ungrouped and grouped data.
- Be able to find the variance and standard deviation from both ungrouped and grouped data.

Readings and preparation to be undertaken:
a) From the learning material: Section 2
b) Additional reading: Lower Hutt (0221), Statistical Analysis, The Open Polytechnic of New Zealand, Reading 3
c) Recommended texts:
- Moore, D. S. and McCabe, G. P. Introduction to the Practice of Statistics (IPS) (3rd ed.). New York: W.H. Freeman and Company.
- Waters, D. (1994). Quantitative Methods for Business. USA: Addison-Wesley.
- Billson, J. (2002). The Power of Focus Groups for Social and Policy Research. Skywood Press.
- Ingram, J. A. and Monks, J. G. (1992). Statistics for Business and Economics (2nd ed.). The Dryden Press, Harcourt Brace Jovanovich College.
- Gordon, S. P. and Gordon, F. S. (1994). Contemporary Statistics. USA: McGraw-Hill.
d) Discuss the following questions and explain their rationale:
1. The numbers of resignations received by a certain firm per month during 1988 were:
   8, 3, 5, 3, 4, 3, 1, 0, 3, 4, 0, 7
   Calculate the arithmetic mean, mode and median.
2. Calculate the mean, median and mode for the number of types purchased annually by each individual from the following data:

   No. of types purchased   1  2  3  4  5  6  7  8  9
   No. of people            2  4  8  3  3  2  2  4  6

3. Define the following: mean, median, mode, standard deviation, coefficient of variation.
4. The table below shows the marks obtained by 40 students in a class test. Find the standard deviation and variance.

   Marks             30  40  50  60  70  80  90
   No. of students    4   6  12  10   5   2   1

5. The following data show the wages of a group of employees:

   Wages group (hourly rate in cents)   No. of employees
   50 and under 60                        5
   60 and under 70                       25
   70 and under 80                      134
   80 and under 90                       85
   90 and under 100                       9
   100 and under 110                     43
   110 and under 120                     34

   Calculate the mode, median, mean and standard deviation. Find the coefficient of variation and the quartile deviation.
6. Of the 140 children whose urinary concentrations of lead were investigated, 40 were chosen who were aged at least 1 year but less than 5 years. The following concentrations of copper (in ) were found:

   0.70, 0.45, 0.72, 0.30, 1.16, 0.69, 0.83, 0.74, 1.24, 0.77,
   0.65, 0.76, 0.42, 0.94, 0.36, 0.98, 0.64, 0.90, 0.63, 0.55,
   0.78, 0.10, 0.52, 0.42, 0.58, 0.62, 1.12, 0.86, 0.74, 1.04,
   0.65, 0.66, 0.81, 0.48, 0.85, 0.75, 0.73, 0.50, 0.34, 0.88

   Find the median, range and quartiles.
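The Week 4 calculations can be checked with a short Python sketch. It uses only the standard-library `statistics` module (Python 3.8 or later). The function names are illustrative and are not part of the module material.

Two conventions in the sketch are assumptions that may differ from the one used in class:
- Variance is computed in the population form, dividing by the total frequency N. For a sample estimate, divide by N - 1 instead.
- Quartiles use the (n + 1)p positioning rule (`method="exclusive"`). Some textbooks use slightly different quartile rules.

```python
import math
from statistics import mean, median, multimode, quantiles


def describe_ungrouped(values):
    """Mean, median, mode(s), range and quartiles of raw (ungrouped) data."""
    # "exclusive" places quartile k at position (n + 1) * k / 4 in sorted data
    q1, q2, q3 = quantiles(values, n=4, method="exclusive")
    return {
        "mean": mean(values),
        "median": median(values),
        "modes": multimode(values),  # every value tied for the highest frequency
        "range": max(values) - min(values),
        "quartiles": (q1, q2, q3),
    }


def grouped_mean_var_sd(points, freqs):
    """Mean, variance and standard deviation of a frequency distribution.

    `points` are the class values, or the class mid-points when the data are
    grouped into class intervals; `freqs` are the matching frequencies.
    Population form: divides by the total frequency N (use N - 1 for a
    sample estimate).
    """
    n = sum(freqs)
    m = sum(x * f for x, f in zip(points, freqs)) / n
    var = sum(f * (x - m) ** 2 for x, f in zip(points, freqs)) / n
    return m, var, math.sqrt(var)


def coefficient_of_variation(m, sd):
    """Standard deviation expressed as a percentage of the mean."""
    return 100 * sd / m


# Week 4, question 1: monthly resignations (ungrouped data)
print(describe_ungrouped([8, 3, 5, 3, 4, 3, 1, 0, 3, 4, 0, 7]))

# Week 4, question 4: marks (class values) and numbers of students
m, var, sd = grouped_mean_var_sd([30, 40, 50, 60, 70, 80, 90], [4, 6, 12, 10, 5, 2, 1])
print(m, var)  # 54.0 199.0
print(round(sd, 2), round(coefficient_of_variation(m, sd), 1))
```

For question 5, pass the class mid-points (55, 65, ..., 115 cents) as `points`. The result is then an approximation, because grouping discards the individual values.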
Week 5: Regression Analysis
Over the week of lectures and tutorials, the focus will be on the following:
- Use scatter diagrams to visualize the relationship between two variables.
- Use regression analysis to estimate an equation that predicts future values of the dependent variable.

Learning outcomes to be attained:
- Identify situations where regression analysis is appropriate.
- Be able to draw a scatter diagram.
- Know when to apply regression analysis.

Readings and preparation to be undertaken:
a) From the learning material: Section 3a
b) Additional reading: Lower Hutt (0221), Statistical Analysis, The Open Polytechnic of New Zealand, Reading 5
c) Recommended texts:
- Moore, D. S. and McCabe, G. P. Introduction to the Practice of Statistics (IPS) (3rd ed.). New York: W.H. Freeman and Company.
- Waters, D. (1994). Quantitative Methods for Business. USA: Addison-Wesley.
- Ingram, J. A. and Monks, J. G. (1992). Statistics for Business and Economics (2nd ed.). The Dryden Press, Harcourt Brace Jovanovich College.
- Gordon, S. P. and Gordon, F. S. (1994). Contemporary Statistics. USA: McGraw-Hill.
d) Discuss the following questions and explain their rationale:
1. The following are a company's advertising expenses and sales values over 5 months:

   Month                           Jan  Feb  Mar  Apr  May
   Advertising expenses ($'000)     15   15   11   11   19
   Sales value ($'000)             120  160  140  100  180

   - Construct a linear regression equation showing sales value as dependent on advertising expenses.
   - Estimate the sales value if advertising expenses amounted to $13,000.
   - Draw a scatter diagram showing the line of best fit for the above linear regression equation.
2. A machine runs at different speeds; the higher the speed, the sooner the part has to be replaced. Trial observations produced the following data:

   Speed (revolutions per minute)   Life of drill head (hours)
   18                               162
   20                               154
   20                               171
   21                               165
   23                               128
   26                               138
   26                               129
   31                               125
   32                               106
   32                                97
   40                                95
   41                               103
   42                               109
   43                                69

   - Plot the figures on a scatter diagram.
   - Determine the equation of the regression line.
   - Plot the line on the scatter diagram and estimate the life of the drill head if the machine operates at 30 revolutions per minute.
3. The following data have been collected over eight periods:

   Period   Units of output   Total cost ($)
   1        10,000            32,000
   2        20,000            39,000
   3        40,000            58,000
   4        25,000            44,000
   5        30,000            52,000
   6        40,000            61,000
   7        50,000            70,000
   8        45,000            64,000

   - Draw a scatter diagram.
   - Draw a straight line that best fits the data.
   - Give the equation of the line and estimate the cost likely to be incurred at output levels of 26,000 units and 48,750 units.
4. The following table shows the increase in average earnings of male employees in the United Kingdom between 1975 and 1981:

   Year   Average earnings
   1975    59
   1976    70
   1977    77
   1978    87
   1979    99
   1980   122
   1981   132

   - Find the least squares regression line that would enable you to forecast earnings in future years.
   - Plot the data and draw your regression line on the same graph.
   - Make a forecast of earnings for 1982 and comment.

Week 6: Correlation Analysis
Over the week of lectures and tutorials, the focus will be on the following:
- Identify situations where correlation analysis between two variables is appropriate.
- Calculate the correlation coefficient, perform a statistical test on the coefficient, and interpret it.
- Differentiate between the product moment correlation coefficient and the coefficient of determination.
- Determine rank correlation using Spearman's rank method.

Learning outcomes to be attained:
- Identify situations where correlation analysis is appropriate.
- Understand the difference between the product moment correlation coefficient and the coefficient of determination.
- Be able to perform statistical tests on the correlation coefficient and interpret it.
- Understand Spearman's rank approach.

Readings and preparation to be undertaken:
a) From the learning material: Section 3a
b) Additional reading: Lower Hutt (0221), Statistical Analysis, The Open Polytechnic of New Zealand, Reading 4
c) Recommended texts:
- Moore, D. S. and McCabe, G. P. Introduction to the Practice of Statistics (IPS) (3rd ed.). New York: W.H. Freeman and Company.
- Waters, D. (1994). Quantitative Methods for Business. USA: Addison-Wesley.
- Ingram, J. A. and Monks, J. G. (1992). Statistics for Business and Economics (2nd ed.). The Dryden Press, Harcourt Brace Jovanovich College.
- Gordon, S. P. and Gordon, F. S. (1994). Contemporary Statistics. USA: McGraw-Hill.
d) Discuss the following questions and explain their rationale:
1. The following are the quantities of a product produced and the total production cost:

   Product quantity ('000 units)   10  20  40  25  30  40  50  45
   Production cost ($'000)         32  39  58  44  52  61  70  64

   - Calculate the coefficient of correlation to explain the extent of correlation between product quantity and production cost.
   - Calculate the coefficient of determination to explain the proportion of the variation in production cost that is explained by the variation in product quantity.
2. The following are the marks scored by a group of 10 students in two different progress tests:

   Student            A   B   C   D   E   F   G   H   I   J
   Accounting marks  50  70  60  30  40  90  80  75  65  55
   Costing marks     40  60  50  20  55  65  70  90  80  75

   Calculate the coefficient of rank correlation and comment on how the 10 students' rankings compare between the accounting test and the costing test.
3. Construct a scatter diagram for the data given below. Find the correlation coefficient and the coefficient of determination for these data.

   X  10  12  14  16  18  20  22  24  26  28
   Y  25  24  22  20  19  17  13  12  11  10

4. A farmer has recorded the number of fertilizer applications to each of the fields in one section of the farm and, at harvest time, records the weight of crop per acre. The results are given in the accompanying table:

   X  1  2  4  5   6   8  10
   Y  2  3  4  7  12  10   7

5. Use the following data to calculate the coefficient of correlation and the coefficient of determination. Can you draw conclusions from your results?

   X   4  2   6   7   8   5  2   4
   Y  10  5  15  16  19  14  8  11

Week 7: Probability and Discrete Probability Distribution
Over the week of lectures and tutorials, the focus will be on the following:
- Explain how probability can help us understand the consequences of dependencies.
- Discuss and apply the rules of probability under statistical dependence and statistical independence.
- Identify the range of business situations that require different probability distributions.
- Identify when and how to apply the binomial and Poisson distributions.

Learning outcomes to be attained:
- Understand the rules of probability.
- Solve probability questions.
- Understand conditional probability.
- Understand the concepts of the binomial and Poisson distributions.

Readings and preparation to be undertaken:
a) From the learning material: Section 6a
b) Additional reading: Lower Hutt (0221), Statistical Analysis, The Open Polytechnic of New Zealand, Reading 6 and Reading 7
c) Recommended texts:
- Moore, D. S. and McCabe, G. P. Introduction to the Practice of Statistics (IPS) (3rd ed.). New York: W.H. Freeman and Company.
- Waters, D. (1994). Quantitative Methods for Business. USA: Addison-Wesley.
- Ingram, J. A. and Monks, J. G. (1992). Statistics for Business and Economics (2nd ed.). The Dryden Press, Harcourt Brace Jovanovich College.
- Gordon, S. P. and Gordon, F. S. (1994). Contemporary Statistics. USA: McGraw-Hill.
d) Discuss the following questions and explain their rationale:
1. The following table is based on observing a random sample of n = 200 individuals who entered a gift shop at an airline terminal:

   Sex      Purchase   No purchase
   Male        40          40
   Female      40          80

   - What is the probability that a randomly selected individual is female?
   - What is the probability that a randomly selected individual is female and made a purchase?
   - What is the probability that a randomly selected individual is female or made a purchase?
   - What is the probability that a randomly selected individual is female, given that a purchase was made?
   - What is the probability that a purchase was made, given that a randomly selected individual is female?
2. What is the probability that the sum of the faces in two rolls of a die is:
   - less than 6
   - equal to 6
   - greater than 6
3. A survey of 2000 customers was conducted to determine their purchasing behaviour regarding two products. It was found that during the past summer 500 had purchased brand A, 350 had purchased brand B and 125 had purchased both brands. If a person is selected at random from this group, what is the probability that the person:
   - would have purchased brand A
   - would have purchased brand A but not brand B
   - would have purchased brand A, brand B or both
   - would not have purchased either brand
4. A group of people consists of 30 men, of whom 10 disagree with the proposal, and 70 women, of whom 40 disagree with the proposal.
What is the probability that a person selected from the group is a man or disagrees with the proposal?
5. The probability of a machine breakdown is 0.1 and the probability of a stoppage in material supply is 0.3. What is the probability of both a machine breakdown and a stoppage of material supply?
6. A group of ten people consists of 5 men and 5 women.
(a) What is the probability that the second person selected from the group is a man, given that the first person selected was a man?
(b) What is the probability that the second person selected from the group is a woman, given that the first person selected was a man?
7. A company minibus has 10 passenger seats. On a routine run, it is estimated that the probability of any passenger seat being filled is 0.42. What are the mean and variance of the binomial distribution of the number of passengers on a routine run? Calculate the probability that on a routine run:
(a) there will be no passengers
(b) there will be exactly one passenger
(c) there will be exactly two passengers
(d) there will be at 3 passengers
8. A firm that produces half-inch diameter rubber hose estimates that on average there are 0.4 flaws per 10-meter length. Assuming that flaws occur randomly, what is the probability that:
(a) there are no flaws in a 20-meter length
(b) there is more than 1 flaw in a 10-meter length
(c) there are more than 2 flaws in a 20-meter length
9. Metal components are subjected to rigorous breaking tests. The probability that a component will break during such a test is 0.4. If such components were tested on one particular occasion, what is the probability that:
(a) 3 of them will break
(b) 2, 3 or 4 will break
(c) 0 or 1 will break
10. The number of road accidents at a certain traffic roundabout has been found to have a mean of 0.8 accidents per week.
Calculate the probability that:
(a) there will be at least 2 accidents in a particular week
(b) there will be exactly 3 accidents in a particular three-week period

Week 8: Normal Distribution

Over the week of lectures and tutorials, the focus will be on the following:
- Recognise the normal distribution as the most important probability distribution in statistics
- Discuss the characteristics of the normal distribution
- Apply standard normal (Z) scores

Learning outcomes to be attained:
- Apply the normal distribution
- Understand the structure of the normal distribution
- Calculate standard normal (Z) scores
- Use a table of standard normal probabilities

Readings and preparation to be undertaken:
a) From the Learning Material: Section 6a
b) Additional reading: Lower Hutt (0221), Statistical Analysis, The Open Polytechnic of New Zealand, Reading 8
c) Recommended texts:
- David S. Moore and George P. McCabe, Introduction to the Practice of Statistics (IPS), 3rd ed., W.H. Freeman and Company, New York
- Donald Waters (1994), Quantitative Methods for Business, Addison-Wesley, USA
- John A. Ingram and Joseph G. Monks (1992), Statistics for Business and Economics, 2nd ed., The Dryden Press, Harcourt Brace Jovanovich College
- Sheldon P. Gordon and Florence S. Gordon (1994), Contemporary Statistics, McGraw-Hill, USA
d) Discuss the following questions and explain their rationale:
1. A recent study conducted by the Public Health Service found that males who smoke average 34 cigarettes per day. The number of cigarettes smoked per day is normally distributed with a standard deviation of 8. If a male smoker is selected at random, what is the probability that he smokes:
(a) more than 2 packs (40 cigarettes) per day
(b) less than one pack per day
(c) less than 1½ packs per day
2. Bolts are produced with a mean diameter of 0.8 cm, and any bolt whose diameter is outside the range 0.78 cm to 0.852 cm is considered substandard.
Assuming that the diameter is normally distributed:
(a) Find the standard deviation if 2.2% of the bolts are substandard.
(b) An alteration of the production method changes the standard deviation to 0.005 cm but does not alter the mean. Find the percentage of bolts that are now substandard.
3. It is known from past experience that the life of a machine component is approximately normally distributed, with a mean of 200 hours and a standard deviation of 4 hours. Calculate the probability that a randomly selected component has a life of:
(a) at least 206 hours
(b) less than 198 hours
(c) between 204 and 208 hours
4. An aptitude test, which is marked out of 100, is widely used in colleges. From past experience it is known that the marks are normally distributed, with a mean of 56 and a standard deviation of 6. If, in a particular college, 250 students take the test, calculate:
(a) the percentage of students expected to obtain more than 47 marks
(b) the expected number of students obtaining more than 70 marks
(c) the mark below which 33% of the students are expected to score
5. The amount dispensed by a drink vending machine has a normal distribution with a mean of 205 ml and a standard deviation of 5 ml.
State the proportion of drinks containing:
(a) less than 212 ml
(b) less than 200 ml
(c) between 197.5 and 210 ml
(d) If the standard deviation remains 5 ml, to what value must the mean be changed if approximately 75% of the drinks are to contain more than 200 ml?

Week 9: Sampling Distributions and Estimation

Over the week of lectures and tutorials, the focus will be on the following:
- Calculate confidence intervals for population parameters from small samples
- Work with the t-distribution and the Z-distribution
- Estimate the characteristics of a population by observing the characteristics of a sample
- Use a two-tailed confidence interval
- Find a confidence interval using survey data

Learning outcomes to be attained:
- Apply the t-distribution and the Z-distribution to the estimation of the mean and the proportion for single or multiple samples
- Understand the sampling distributions of the mean and the proportion
- Estimate point and confidence intervals for the mean and the proportion of single and multiple samples

Readings and preparation to be undertaken:
a) From the Learning Material: Section 6a
b) Recommended texts:
- David S. Moore and George P. McCabe, Introduction to the Practice of Statistics (IPS), 3rd ed., W.H. Freeman and Company, New York
- Donald Waters (1994), Quantitative Methods for Business, Addison-Wesley, USA
- John A. Ingram and Joseph G. Monks (1992), Statistics for Business and Economics, 2nd ed., The Dryden Press, Harcourt Brace Jovanovich College
- Sheldon P. Gordon and Florence S. Gordon (1994), Contemporary Statistics, McGraw-Hill, USA
c) Discuss the following questions and explain their rationale:
1. In a random sample of 200 garages it was found that 79% sold car batteries below the list price recommended by the manufacturer. Estimate the proportion of all garages selling below the list price, and calculate a 99% confidence interval for this estimate.
2. In 2000 a simple random sample of 100 sales invoices was taken from a very large population of sales invoices.
The average value of sales was found to be $18.50, with a standard deviation of $6.00.
(a) Obtain the 95% confidence interval for the true average value of sales.
(b) How large a simple random sample would have been required to be 95% confident that the sample mean did not differ from the true mean by more than ±0.05?
3. Using the following data, obtain the 99% confidence limits for the population mean number of children:

No. of children    0   1   2   3   4   5   6   7
No. of families   30  40  45  30  20  15  12   8

4. A random sample of 1000 manufactured items is inspected and 250 are found to contain defects. What is the likely range of the proportion defective in the population of items? (Use 98% confidence limits.)
5. A random sample of 100 student examination scripts has been selected. If 12 were found to have marks below 40, calculate a 95% confidence interval for the true percentage of scripts with marks below 40. What sample size would be necessary to estimate the percentage of scripts with marks below 40 in the whole school to within 2% with 95% confidence?
6. A sample of 40 sardine cans was randomly selected and 8 were found to be defective. The production manager feels that the current percentage of defectives is too high. Construct an appropriate 99% confidence interval.
7. A sample of 80 bottles of brandy was randomly selected and found to have a mean of 650 liters and a standard deviation of 30 liters. A week later a second sample of 60 bottles of brandy was selected and found to have a mean of 655 liters, with a standard deviation of 25 liters. Construct a 92% confidence interval for the change in the average weight of brandy.
8. The following table shows the shoe sizes and the number of shoes of each size in a shoe market:

Shoe size    1   2   3   4   5
Frequency   10  28  42  50  20

Estimate the population mean shoe size with a 95% confidence interval.
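A short Python sketch can be used to check hand calculations for the Week 6 correlation exercises. It is one possible approach, not the module's prescribed method. The names `pearson_r`, `ranks` and `spearman_rs` are illustrative helpers. The code makes three assumptions:

- Rank 1 goes to the highest mark.
- Tied values share the average rank.
- Spearman's coefficient is Pearson's r applied to the ranks.

When no marks are tied, as in the accounting and costing data, this gives the same value as the textbook formula r_s = 1 - 6Σd²/(n(n² - 1)).

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient r for paired observations x and y."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ranks(values):
    """Rank values so the largest gets rank 1; tied values share the average rank."""
    ordered = sorted(values, reverse=True)
    result = []
    for v in values:
        first = ordered.index(v) + 1          # best rank position held by v
        last = first + ordered.count(v) - 1   # worst rank position held by v
        result.append((first + last) / 2)
    return result


def spearman_rs(x, y):
    """Spearman rank correlation: Pearson's r applied to the ranks."""
    return pearson_r(ranks(x), ranks(y))


# Accounting vs costing marks for students A to J
accounting = [50, 70, 60, 30, 40, 90, 80, 75, 65, 55]
costing = [40, 60, 50, 20, 55, 65, 70, 90, 80, 75]
rs = spearman_rs(accounting, costing)

# Scatter-diagram exercise: correlation and coefficient of determination
x = [10, 12, 14, 16, 18, 20, 22, 24, 26, 28]
y = [25, 24, 22, 20, 19, 17, 13, 12, 11, 10]
r = pearson_r(x, y)
r_squared = r ** 2

print(f"r_s = {rs:.4f}, r = {r:.4f}, r^2 = {r_squared:.4f}")
```

The same `pearson_r` call can be applied to the farmer's fertilizer data and to the last X/Y table in Week 6.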
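The first two Week 7 questions (the gift-shop table and the two dice rolls) can be checked with a minimal Python sketch that uses exact fractions. The table is entered as counts keyed by (sex, outcome). The names `prob`, `is_female` and `made_purchase` are hypothetical helpers. The code applies the general addition rule and the definition of conditional probability, P(A | B) = P(A and B) / P(B). For the dice, it enumerates all 36 equally likely outcomes.

```python
from fractions import Fraction
from itertools import product

# Gift-shop table: counts keyed by (sex, outcome)
table = {
    ("male", "purchase"): 40,
    ("male", "no purchase"): 40,
    ("female", "purchase"): 40,
    ("female", "no purchase"): 80,
}
total = sum(table.values())  # n = 200


def prob(event):
    """Exact probability of an event, given as a test on (sex, outcome)."""
    return Fraction(sum(count for cell, count in table.items() if event(*cell)), total)


def is_female(sex, outcome):
    return sex == "female"


def made_purchase(sex, outcome):
    return outcome == "purchase"


p_female = prob(is_female)
p_purchase = prob(made_purchase)
p_female_and_purchase = prob(lambda s, o: is_female(s, o) and made_purchase(s, o))
# General addition rule: P(A or B) = P(A) + P(B) - P(A and B)
p_female_or_purchase = p_female + p_purchase - p_female_and_purchase
# Conditional probability: P(A | B) = P(A and B) / P(B)
p_female_given_purchase = p_female_and_purchase / p_purchase
p_purchase_given_female = p_female_and_purchase / p_female

# Two rolls of a die: enumerate the 36 equally likely outcomes
sums = [a + b for a, b in product(range(1, 7), repeat=2)]
p_sum_lt_6 = Fraction(sum(s < 6 for s in sums), len(sums))
p_sum_eq_6 = Fraction(sum(s == 6 for s in sums), len(sums))
p_sum_gt_6 = Fraction(sum(s > 6 for s in sums), len(sums))

print(p_female, p_female_and_purchase, p_female_or_purchase,
      p_female_given_purchase, p_purchase_given_female)
print(p_sum_lt_6, p_sum_eq_6, p_sum_gt_6)
```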
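The minibus, rubber-hose and road-accident questions in Week 7 use two formulas:

- Binomial: P(X = k) = C(n, k) p^k (1 - p)^(n - k)
- Poisson: P(X = k) = e^(-m) m^k / k!

The Python sketch below needs Python 3.8 or later for `math.comb`. The helper names are illustrative. It makes two assumptions, as is usual for a Poisson process:

- The mean number of flaws is proportional to the length of hose.
- The mean number of accidents over three weeks is 3 × 0.8.

```python
import math


def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def poisson_pmf(k, mean):
    """P(X = k) for X ~ Poisson(mean)."""
    return math.exp(-mean) * mean**k / math.factorial(k)


# Minibus: n = 10 seats, each filled with probability p = 0.42
n, p = 10, 0.42
mean_passengers = n * p                 # binomial mean np
var_passengers = n * p * (1 - p)        # binomial variance np(1 - p)
p_none = binomial_pmf(0, n, p)
p_one = binomial_pmf(1, n, p)
p_two = binomial_pmf(2, n, p)

# Rubber hose: 0.4 flaws per 10 m, so the Poisson mean scales with length
rate_per_metre = 0.4 / 10
mean_10m = rate_per_metre * 10
mean_20m = rate_per_metre * 20
p_no_flaws_20m = poisson_pmf(0, mean_20m)
p_more_than_1_10m = 1 - sum(poisson_pmf(k, mean_10m) for k in range(2))
p_more_than_2_20m = 1 - sum(poisson_pmf(k, mean_20m) for k in range(3))

# Road accidents: mean of 0.8 per week
p_at_least_2_week = 1 - sum(poisson_pmf(k, 0.8) for k in range(2))
p_exactly_3_in_3_weeks = poisson_pmf(3, 3 * 0.8)

print(mean_passengers, var_passengers, p_none, p_one, p_two)
print(p_no_flaws_20m, p_more_than_1_10m, p_more_than_2_20m)
print(p_at_least_2_week, p_exactly_3_in_3_weeks)
```

"More than k" and "at least k" are computed as complements, 1 - P(X ≤ k) and 1 - P(X ≤ k - 1). This avoids summing an infinite Poisson tail.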
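The Week 8 normal-distribution questions can be cross-checked with Python's standard-library `statistics.NormalDist` (Python 3.8 or later). Its `cdf` and `inv_cdf` methods take the place of the table of standard normal probabilities. This is a sketch: `z_score` is an illustrative helper. One pack is taken to be 20 cigarettes, inferred from the question's "2 packs (40 cigarettes)".

```python
from statistics import NormalDist


def z_score(x, mean, sd):
    """Standardise x: the number of standard deviations x lies from the mean."""
    return (x - mean) / sd


Z = NormalDist()  # standard normal: mean 0, standard deviation 1

# Cigarettes: mean 34, sd 8 (one pack assumed to be 20 cigarettes)
p_more_than_40 = 1 - Z.cdf(z_score(40, 34, 8))
p_less_than_20 = Z.cdf(z_score(20, 34, 8))
p_less_than_30 = Z.cdf(z_score(30, 34, 8))

# Machine component life: mean 200 h, sd 4 h
life = NormalDist(mu=200, sigma=4)
p_at_least_206 = 1 - life.cdf(206)
p_less_than_198 = life.cdf(198)
p_204_to_208 = life.cdf(208) - life.cdf(204)

# Aptitude test: mean 56, sd 6, 250 students
marks = NormalDist(mu=56, sigma=6)
pct_above_47 = 100 * (1 - marks.cdf(47))
expected_above_70 = 250 * (1 - marks.cdf(70))
mark_below_33pct = marks.inv_cdf(0.33)  # the 33rd percentile

# Vending machine: mean 205 ml, sd 5 ml
drinks = NormalDist(mu=205, sigma=5)
p_under_212 = drinks.cdf(212)
p_under_200 = drinks.cdf(200)
p_197_5_to_210 = drinks.cdf(210) - drinks.cdf(197.5)
# For P(X > 200) = 0.75, 200 must sit at the 25th percentile:
# 200 = mean + z_0.25 * sd, so mean = 200 - z_0.25 * sd
required_mean = 200 - Z.inv_cdf(0.25) * 5

print(p_more_than_40, p_less_than_20, p_less_than_30)
print(p_at_least_206, p_less_than_198, p_204_to_208)
print(pct_above_47, expected_above_70, mark_below_33pct)
print(p_under_212, p_under_200, p_197_5_to_210, required_mean)
```

Values computed this way may differ slightly in the third or fourth decimal place from answers read off a printed Z table, because the table rounds z to two decimals.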
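For the Week 9 estimation questions, a large-sample interval has the form estimate ± z × standard error. The Python sketch below uses the normal (Z) critical value throughout, with these standard errors:

- Proportion: sqrt(p(1 - p)/n)
- Mean: s/sqrt(n)
- Difference of two independent means: sqrt(s1²/n1 + s2²/n2)

The helper names are illustrative. For the brandy question, the "change" is taken to be the second sample mean minus the first.

For a genuinely small sample, the t-distribution with n - 1 degrees of freedom would replace z. The standard library has no t quantile function, so that case is left out; SciPy's `scipy.stats.t.ppf` is one option.

```python
import math
from statistics import NormalDist


def z_critical(confidence):
    """Two-tailed critical value, e.g. about 1.96 for 95% confidence."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)


def proportion_ci(successes, n, confidence):
    """Large-sample (normal approximation) confidence interval for a proportion."""
    p_hat = successes / n
    margin = z_critical(confidence) * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


def mean_ci(mean, sd, n, confidence):
    """Large-sample confidence interval for a mean, using the sample sd."""
    margin = z_critical(confidence) * sd / math.sqrt(n)
    return mean - margin, mean + margin


def diff_means_ci(mean1, sd1, n1, mean2, sd2, n2, confidence):
    """Large-sample interval for (mean2 - mean1) from two independent samples."""
    se = math.sqrt(sd1**2 / n1 + sd2**2 / n2)
    diff = mean2 - mean1
    margin = z_critical(confidence) * se
    return diff - margin, diff + margin


def sample_size_mean(sd, error, confidence):
    """Smallest n for which z * sd / sqrt(n) <= error."""
    return math.ceil((z_critical(confidence) * sd / error) ** 2)


def sample_size_proportion(p_hat, error, confidence):
    """Smallest n for which z * sqrt(p(1 - p) / n) <= error."""
    return math.ceil(z_critical(confidence) ** 2 * p_hat * (1 - p_hat) / error**2)


garages_99 = proportion_ci(158, 200, 0.99)      # 79% of 200 garages
invoices_95 = mean_ci(18.50, 6.00, 100, 0.95)
invoices_n = sample_size_mean(6.00, 0.05, 0.95)
defective_98 = proportion_ci(250, 1000, 0.98)
scripts_95 = proportion_ci(12, 100, 0.95)
scripts_n = sample_size_proportion(0.12, 0.02, 0.95)
sardines_99 = proportion_ci(8, 40, 0.99)
brandy_92 = diff_means_ci(650, 30, 80, 655, 25, 60, 0.92)

print(garages_99, invoices_95, invoices_n, defective_98)
print(scripts_95, scripts_n, sardines_99, brandy_92)
```

The sample-size helpers round up with `math.ceil`, because rounding down would give a margin of error slightly larger than the one required. Answers worked with z = 1.96 from a table can differ by a unit or two from these results, which use the unrounded critical value.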
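The children-per-family and shoe-size questions in Week 9 start from frequency tables rather than raw observations. The sketch below computes the mean and a confidence interval from such a table. It assumes the sample standard deviation (divisor n - 1) and a Z critical value, since both samples are large (n = 200 and n = 150). `grouped_mean_ci` is an illustrative name.

```python
import math
from statistics import NormalDist


def grouped_mean_ci(values, freqs, confidence):
    """Mean and large-sample confidence interval from a frequency table."""
    n = sum(freqs)
    mean = sum(v * f for v, f in zip(values, freqs)) / n
    # Sample variance with divisor n - 1
    variance = sum(f * (v - mean) ** 2 for v, f in zip(values, freqs)) / (n - 1)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * math.sqrt(variance / n)
    return mean, (mean - margin, mean + margin)


# Children per family (0 to 7) and number of families, 99% confidence
children_mean, children_ci = grouped_mean_ci(
    range(8), [30, 40, 45, 30, 20, 15, 12, 8], 0.99
)

# Shoe sizes 1 to 5 and their frequencies, 95% confidence
shoes_mean, shoes_ci = grouped_mean_ci([1, 2, 3, 4, 5], [10, 28, 42, 50, 20], 0.95)

print(f"children: mean {children_mean:.3f}, 99% CI {children_ci[0]:.3f} to {children_ci[1]:.3f}")
print(f"shoes: mean {shoes_mean:.3f}, 95% CI {shoes_ci[0]:.3f} to {shoes_ci[1]:.3f}")
```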