The Power and Limitations of Statistics in IS Research

The goal is to ask more questions about IS statistics rather than to accept them blindly. These overheads were prepared and made available by Dr. Mary Lacity.

• On average, a company's annual IT operating budget represents 5% of annual revenues.
• 80% of IS projects are delivered late and over budget or fail to deliver requirements.
• The global IT outsourcing market is $120 billion annually.
• There is no discernible relationship between IT investment and productivity.
• 6% of US and UK respondents outsource more than 80% of their IT budget to third-party suppliers.

Statistical Concepts
• Population parameters and how they are estimated: census; sample (random sample, non-random sample)
• Statistical calculations: mean (average), mode, median, standard deviation
• Statistical tests: statistical significance, Type I error (alpha value), Type II error (beta value), correlation, t-test

Population: Census of IS Professionals
[Figure: a population of 40 IS professionals, each marked M (male) or F (female).]
Parameter of interest: sex (percentage of females).
Census results: 20 males (50%) and 20 females (50%).

Sample of IS Professionals
[Figure: a sample of 5 people drawn from the same population: M M M F F.]
Sample results: 3 males (60%) and 2 females (40%).

When Sample Statistics Adequately Approximate Population Parameters
Each population parameter has a matching sample statistic: population mean and sample mean, population variance and sample variance, population median and sample median.
A sample statistic (such as the mean) will be close to the population parameter if:
• the sample size is large enough;
• the measuring instrument is good;
• the sample is random.

IS Professor Salaries: Is the measuring instrument adequate? Is the sample random?
Parameter of interest: average IS professor salary.
Sample: on average, IS professors make $68,702.

How confident are you in this number? It comes from the 1998 salary survey at Http://www.pitt.edu/ galletta/1998sals.html.

The 1999 survey (Http://www.pitt.edu/ galletta/1999sals.html) reports an average of $76,369 so far. What can we learn from actually looking at the data?

1999 IS Professor Salaries
Salary range ($)      Number of professors
40,000                 1
50,000-55,000          4
55,001-60,000          2
60,001-65,000          4
65,001-70,000         11
70,001-75,000         17
75,001-80,000         13
80,001-85,000          8
85,001-90,000          7
90,001-95,000          4
95,001-100,000         2
150,000                1
Total                 74
Mean = $76,369
Median = $75,000 (half of the salaries are above this number, half below)
Mode = $75,000 (the most frequent salary cited)

[Figure: histogram of the 1999 salaries, frequency by salary range.]
Mean, mode, and median are nearly the same because the distribution approximates the normal distribution.

When are mean, median, and mode different? When the population is not normal, as in Huff's salary example (p. 33):
Salary      Number of employees
$45,000      1
$15,000      1
$10,000      2
$5,700       1
$5,000       3
$3,700       4
$3,000       1
$2,000      12
Mean: $5,700   Median: $3,000   Mode: $2,000

Standard Deviation
For a normal distribution, about 68% of the data lie within 1 standard deviation of the mean, and about 95% lie within 2 standard deviations.

Standard Deviation: Does it get bigger or smaller as sample size increases?
[Figure: sampling distributions of the sample mean for small, medium, and large n, centred on the mean.]
As sample size n increases, the sampling distribution of the sample mean clusters more tightly around the population mean, so its standard deviation gets smaller. Also, the sampling distribution gets closer and closer to the normal curve as n increases. What is this called?
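The question about the spread of the sample mean can be explored with a short simulation. The sketch below treats Huff's 25 salaries (p. 33) as the population, a deliberately skewed one, confirms its mean, median, and mode, and then measures how the spread of the sample mean changes with n. The sample sizes, trial count, seed, and sampling with replacement are illustrative assumptions, not part of the slides.

```python
import random
import statistics

# Huff's salary example (p. 33): salary -> number of employees.
HUFF_SALARIES = {45_000: 1, 15_000: 1, 10_000: 2, 5_700: 1,
                 5_000: 3, 3_700: 4, 3_000: 1, 2_000: 12}
population = [salary for salary, count in HUFF_SALARIES.items() for _ in range(count)]

pop_mean = statistics.mean(population)
pop_median = statistics.median(population)
pop_mode = statistics.mode(population)
pop_sd = statistics.pstdev(population)  # population (not sample) standard deviation


def sd_of_sample_means(n, trials=5_000, seed=1):
    """Draw `trials` random samples of size n (with replacement, an assumption)
    and return the standard deviation of their means: the observed standard error."""
    rng = random.Random(seed)
    means = [sum(rng.choices(population, k=n)) / n for _ in range(trials)]
    return statistics.stdev(means)


print(f"mean={pop_mean}, median={pop_median}, mode={pop_mode}, sd={pop_sd:.0f}")
for n in (5, 25, 100):
    print(f"n={n:3d}: SD of sample means={sd_of_sample_means(n):7.0f}, "
          f"sigma/sqrt(n)={pop_sd / n ** 0.5:7.0f}")
```

The summary statistics reproduce the slide's $5,700 / $3,000 / $2,000, and the observed spread of the sample means tracks σ/√n, shrinking as n grows even though the population itself is heavily skewed.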
Central Limit Theorem
[Figure: a population distribution and the sampling distribution of the mean when n is large.]

Type I and Type II Errors
Assume the curve shows the real population mean and standard deviation. When we take a sample, we get a sample mean and a sample standard deviation (or sample error).
[Figure: the actual population (which we usually don't know) and several samples drawn from it.]

Our null hypothesis is: there is no difference between the population mean and the sample mean.
• If the sample selected indicates that the sample mean is different from the population mean (we reject the null): this is a Type I error if, in reality, the population mean equals the sample mean, and no error if it does not.
• If the sample selected indicates that the sample mean is the same as the population mean (we accept the null): this is no error if, in reality, the population mean equals the sample mean, and a Type II error if it does not.

Type I error: the probability of rejecting the null hypothesis when the null is in fact true.
Type II error: the probability of accepting the null hypothesis when the null is in fact false.

[Figure: a sample whose mean is very close to the population mean.]
In this picture, the sample mean is very close to the population mean, so the t statistic is small and indicates: don't reject the null hypothesis.

[Figure: a sample whose mean lies beyond the critical value.]
In this picture, the sample mean is far away from the population mean. If we select a Type I error of .05, we would reject the null hypothesis if the sample mean were greater than the critical value identified by the Type I error selected. Thus, we have about a 5% chance of drawing a sample that indicates "reject" when we should have accepted the null hypothesis.
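The "about a 5% chance" claim can be checked by simulation. This sketch swaps in a one-tailed z-test with a known σ (the slides use a t-test; the z-test keeps the example to Python's standard library) and draws every sample from the null population itself, so each rejection is a Type I error. The population mean, σ, sample size, trial count, and seed are hypothetical values chosen for illustration.

```python
import random
from statistics import NormalDist

# Hypothetical values, for illustration only.
MU0 = 75_000      # population mean under the null hypothesis
SIGMA = 12_000    # population standard deviation, assumed known
N = 30            # sample size
ALPHA = 0.05      # selected probability of a Type I error

# One-tailed critical value: only ALPHA of the standard normal curve lies above it.
Z_CRIT = NormalDist().inv_cdf(1 - ALPHA)


def rejects_null(sample):
    """One-tailed z-test: reject H0 if the sample mean lies too far above MU0."""
    n = len(sample)
    z = (sum(sample) / n - MU0) / (SIGMA / n ** 0.5)
    return z > Z_CRIT


def simulated_type_i_rate(trials=20_000, seed=7):
    """Sample repeatedly from the null population, so every rejection is a
    Type I error; return the fraction of samples that were rejected."""
    rng = random.Random(seed)
    rejections = sum(
        rejects_null([rng.gauss(MU0, SIGMA) for _ in range(N)])
        for _ in range(trials)
    )
    return rejections / trials


print(f"critical z = {Z_CRIT:.3f}")
print(f"simulated Type I error rate = {simulated_type_i_rate():.3f}")
```

The simulated rate lands close to the selected ALPHA; setting ALPHA to 0.025 raises the critical value and should roughly halve the rate, which is exactly the trade-off the Type II error slide picks up next.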
[Figure: the Type II error probability, shaded on the far side of the critical value.]
Type II error: the probability of accepting the null hypothesis when the null is in fact false. In this picture, assume we really sampled the wrong population. By chance, we might get a sample that tells us we sampled the correct population when in fact we did not.

When Sample Statistics Adequately Approximate Population Parameters: Sample Size
How are we supposed to know how large the sample must be? The desired sample size is

$$n = \frac{z^{2}\,\sigma^{2}}{E^{2}}$$

where z is the value from the standard normal table for the selected confidence level (1.96 for 95% confidence), σ² is the population variance, and E is the acceptable error.

Sample size: an example
Assume we want to take a sample of IS professor salaries and that we know the standard deviation is $12,000. If we will accept an error of plus or minus $3,000, how large should the sample be?

$$n = \frac{(1.96)^{2}\,(12{,}000)^{2}}{(3{,}000)^{2}} \approx 61.5$$

so we need a sample of about 62 salaries.

The Semi-Attached Figure: Which country has the highest cell phone adoption rate?
[Figure: worldwide subscriptions to cellular phones in millions for US, Japan, Italy, South Korea, Australia, Sweden, Finland, Norway, Austria, Denmark, Israel, Hong Kong, Singapore, and Portugal. Source: Gartner Group DataQuest, as reported in World Almanac.]
[Figure: the same subscriptions as a percentage of population: Finland 57, Sweden 48, Israel 46, Denmark 43, Portugal 37, Japan 36, Austria 35, Norway 32, Hong Kong 31, Italy 31, Singapore 31, Australia 29, South Korea 26, US 20. Source: Gartner Group DataQuest, as reported in World Almanac.]

The Semi-Attached Figure: Which Internet stock should I invest in?
[Figure: most visited websites, August 1999, in millions of unique visits, for yahoo, aol, msn, geocities, netscape, go, microsoft, lycos, excite, hotmail, passport, angelfire, and amazon (Matrix Media, as reported in World Almanac).]

The One-Dimensional Picture
[Figure: pictures representing Excite and Msn.]
Msn.com had twice as many visitors as Excite.com.

So where did this statistic come from?

"On average, a company's annual IT budget represents 5% of annual revenues."
It was a generally quoted statistic that I heard over and over again. One example is Minoli, Analyzing Outsourcing: Re-engineering Information and Communication Systems, McGraw Hill, 1994. The data were collected by the author, but not much detail is given. My confidence comes from the fact that his results are similar to those of many other studies I have seen.

"80% of IS projects are delivered late and over budget or fail to deliver requirements."
It was a generally quoted statistic that I heard over and over again. Some more formal studies found:
Author (year)      Projects   Findings
Lehman (1979)      57         46% overdue; 59% over budget
Gladden (1982)     ???        75% of systems not used or not completed
Johnson (1995)     365        31% of projects cancelled; 53% cost overrun; 12% delivered on time and to budget
Phan (1995)        143        25% do not meet requirements

"The global IT outsourcing market is $120 billion annually."
This statistic was reported by International Data Corporation on http://www.outsourcing.com last year. However, the site no longer exists. I found the following item on http://www.infoserver.com/ ..
Company: PR Newswire. Date of post: 08-Aug-99. Type of article: Market Trends.
Article title: "IDC Reports Worldwide Outsourcing Spending Approached $100 Billion in 1998 and Will Surge to Over $151 Billion by 2003"
Summary: Worldwide outsourcing services ...
"There is no discernible relation between IT investment and productivity."
Attempts to correlate investments in information technology with productivity have found no correlation or a negative correlation:
• A study of 60 manufacturing firms over the period 1974-1984 failed to show a significant positive relationship between IT expense and productivity.
• A study of 58 mutual savings banks found no relationship between organizational performance and IT expense.
• An evaluation by the US Department of Commerce for the years 1950-1986 showed a negative correlation between information technology and productivity.

A research report by the Gartner Group revealed that firms that invested in office automation systems had exactly the same level of productivity in 1987 as they did in 1967. Japan and Europe have much higher office and service sector productivity than the US, even though they have not computerized nearly as quickly as the US. Peter Drucker observed that the number of office workers and clerical staff grows in proportion to investments in information technology.

How can the paradox be correct? It runs counter to intuition: we see the effects on productivity every day in automated tellers, laser checkouts, fax machines, word processors, and travel reservation systems.
1. Macroeconomic studies have no internal validity, because the information technology/productivity paradox merely captures a correlation, not a causal relationship. Perhaps productivity would have suffered a major decline without investments in IT.
2. Macroeconomics considers worker productivity, not net benefits to society.
   For example, automated tellers may not correlate with higher banking productivity, but society as a whole benefits from convenient, 24-hour banking.
3. IT is like R&D: many projects will fail, but you only need a few to gain a big payoff.
4. Quinn & Baily outline flaws with macroeconomic numbers:
   • Industry productivity captures only 42% of service sector employment.
   • 30% of the productivity figures equate output with input, so measured productivity will be constant! The input is the budget, and the output is assumed to have an equivalent dollar value: if the police department's budget is $5 million, the figures assume it produced $5 million worth of law enforcement.

"6% of US and UK respondents outsource more than 80% of their IT budget to third-party suppliers."
This statistic came from a survey that Leslie Willcocks and I administered to the following sample:
• For the US survey, 500 names of CIOs were obtained from a list maintained by Dun & Bradstreet Information Services. Only 38 people returned the survey.
• For the UK survey, a list of 100 CIOs was compiled from various sources, including the Financial Times top 100 list and members of the Oxford Institute of Information Management. 63 surveys were returned from the UK.

How confident are we in this 6% number?
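One rough way to put numbers on that question is to compute the response rates and a confidence interval for the 6% figure. This sketch assumes the 6% is computed over all 101 US and UK respondents (38 + 63); the slides do not state the base. It uses the simple normal-approximation (Wald) interval, which only measures random sampling error and says nothing about non-response or a non-random sample, which is the bigger concern here.

```python
import math

# Figures from the survey description above.
US_SENT, US_RETURNED = 500, 38
UK_SENT, UK_RETURNED = 100, 63

us_rate = US_RETURNED / US_SENT
uk_rate = UK_RETURNED / UK_SENT


def wald_interval(p_hat, n, z=1.96):
    """Normal-approximation (Wald) 95% confidence interval for a proportion.
    It reflects random sampling error only, not non-response or selection bias."""
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - z * se, p_hat + z * se


# Assumption: the 6% is a share of all US and UK respondents.
n_respondents = US_RETURNED + UK_RETURNED
low, high = wald_interval(0.06, n_respondents)

print(f"US response rate: {us_rate:.1%}, UK response rate: {uk_rate:.1%}")
print(f"6% of {n_respondents} respondents: roughly {low:.1%} to {high:.1%}")
```

Under these assumptions the US response rate is 7.6% against 63% in the UK, and the interval runs from roughly 1.4% to 10.6%: wide even before asking whether the CIOs who replied resemble those who did not.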
Other surveys (which will have their own biases and limitations) found a similarly low level of total outsourcing; most companies pursue selective sourcing:
• In a survey of 300 IT managers in the US, on average less than 10% of the IT budget was outsourced (Caldwell, 1996a).
• A survey of 110 Fortune 500 companies found that 76% spent less than 20% of the IT budget on outsourcing, and 96% spent less than 40% (Collins and Millen, 1995).
• A survey of 365 US companies found that 65% outsourced one or more IT activities, but only 12 outsourced IT completely (Dekleva, 1994).

Statistical Significance: A Few Surprises
Using the same dataset (US and UK respondents to the outsourcing surveys), let's look at average company size.
[Figure: average annual revenues converted to $US millions, n = 113 respondents: United States 10,995; United Kingdom 1,311; Scandinavia 261.]
US: $10,995,000,000. UK: $1,311,000,000.
However, there is no statistically significant difference at p = .025 between US and UK revenues! How can this be, given that US revenues are more than 8 times larger?

Look at the standard deviation!
                      US revenues       UK revenues
                      ($US millions)    ($US millions)
Minimum                   30                  1
Maximum               168,800             12,000
Average                10,995              1,311
Standard deviation     29,158              2,728

"Despite differences in means, a one-tailed t-test assuming heteroscedasticity (unequal variances) at the p = .025 level indicates that US and UK revenues are not statistically different. This finding is explained by the large standard deviation."
[Figure: frequency histogram of US and UK company revenues in $US billions, ranging up to $169 billion.]

Gotta!!!!
The key is the level of significance for the probability of a Type I error.
Type I error = the probability that we reject the null hypothesis when the null is in fact true. With a t-test, we are testing the null hypothesis that US and UK revenues are not different. At a selected p = .025, we are saying that we want the probability of rejecting the null hypothesis when the null is in fact true to be .025.

Gotta!!!!
In reality, the calculated p value was .03. If our selected p value is .025, we reject the null hypothesis only if the calculated p value is less than .025. Thus I cannot conclude that US and UK revenues are different at the .025 level. What do we conclude if the selected probability of a Type I error is .05, the more usual level?

Conclusions
"How to Talk Back to a Statistic", Huff, 1982, pp. 122-142:
• Who says so?
• How does he know?
• Did somebody change the subject?
• Does it make sense?