Artificial Intelligence Opportunities and an End-To-End Data-Driven Solution for Predicting Hardware Failures

by Mario Orozco Gabriel

B.S. Chemical Engineering, Instituto Tecnologico y de Estudios Superiores de Monterrey, 2011

Submitted to the Department of Mechanical Engineering and MIT Sloan School of Management in partial fulfillment of the requirements for the degrees of Master of Science in Mechanical Engineering and Master of Business Administration at the MASSACHUSETTS INSTITUTE OF TECHNOLOGY

June 2016

© 2016 Mario Orozco Gabriel. All rights reserved. The author hereby grants MIT permission to reproduce and to distribute publicly copies of this thesis document in whole or in part in any medium now known or hereafter created.

Author: Department of Mechanical Engineering and MIT Sloan School of Management, May 6, 2016

Certified by: Kalyan Veeramachaneni, Principal Research Scientist, Laboratory for Information and Decision Systems, Thesis Supervisor

Certified by: Tauhid Zaman, KDD Career Development Professor in Communications and Technology, Thesis Supervisor

Certified by: John J. Leonard, Samuel C. Collins Professor of Mechanical and Ocean Engineering, Thesis Supervisor

Accepted by:
Maura Herson, Director of MBA Program, MIT Sloan School of Management

Abstract

Dell's commitment to providing quality products based on reliability, security, and manageability has helped Dell Inc. become one of the largest PC suppliers. Recent developments in Artificial Intelligence (AI), combined with a competitive market situation, have encouraged Dell to research new opportunities. AI research and breakthroughs have accelerated in recent years, bringing along revolutionary technologies and companies that are disrupting all businesses. Over 30 potential concepts for AI integration at Dell Inc. were identified and evaluated to select the ones with the highest potential. The top-ranked concept was the real-time prevention of hardware failures. This concept was investigated using a data science process. A number of machine learning tools currently automate the last stages of the proposed data science process to create predictive models. These tools vary in functionality and evaluation standards, and also provide other services such as data and model storage and visualization options. The proposed solution utilizes the deep feature synthesis algorithm, which automatically generates features from problem-specific data. These engineered features boosted predictive model accuracy by an average of 10% for the AUC and by up to 250% in recall for test (out-of-sample) data. The proposed solution estimates an impact exceeding $407M in the first five years for Dell Inc. and all of the involved suppliers. Conservatively, the direct impact on Dell Inc.
is limited to batteries under warranty and is expected to surpass $2.7M during the first five years. The conclusions show a high potential for implementation.

Thesis Supervisor: Kalyan Veeramachaneni, Principal Research Scientist, Laboratory for Information and Decision Systems
Thesis Supervisor: Tauhid Zaman, KDD Career Development Professor in Communications and Technology
Thesis Supervisor: John J. Leonard, Samuel C. Collins Professor of Mechanical and Ocean Engineering

Acknowledgments

This work was possible thanks to the help and contributions of many friends. First, I would like to thank Dell for giving me the opportunity for this incredible internship in Austin. The team at Client Services comprised some of the most supportive people I have worked with. Many thanks to Nikhil Vichare, Leonard Lo, Steve Herington, Doug Reeder, Chad Skipper, Rick Schuckle, Jim White, Eugene Minh, Kevin Terwilliger, Catherine Dibble, Neil Hand, Charles Brooker, Christophe Daguet, and the rest of the CS team. Everyone's support went beyond their work obligations and played a key part in the development of this project. A special thanks is in order to my supervisor Neal Kohl, who supported me from day one, helped me get familiarized with Dell and Austin, and made this experience a most positive one. He provided me with invaluable guidance, encouragement, and the push to strive for excellence. Neal helped me define the scope, maintain focus, and set a strategy, among other crucial steps in the development of the project. Thanks to my advisors, Kalyan and Tauhid, who guided me through a complex field. Their recommendations helped me navigate this multifaceted area and grasp its key points. Also, thanks to Kalyan, whose support was fundamental, especially when developing this data science process and evaluating different technologies.
Thank you to MIT, LGO, and Mexico for allowing me to enjoy one of the most transformative experiences of my professional and personal life. I hope to someday provide such opportunities to more people and give back as much to the Institute, the program, and my country. I would also like to thank all the LGO staff and the LGO Class of 2016, who have made these two years unforgettable. Finally, I thank my parents, Mario and Julia, for their instrumental advice and the values they have instilled in me through their way of life, and my sisters Julie, Regina, and Natalia for always cheering me on. My family has always been my unconditional foundation of support and inspiration. Through their constant encouragement, I was able to maintain focus, sail through the difficult times, and find the motivation to always continue doing my best. This work is dedicated to them.

Note on Proprietary Information

In order to protect proprietary Dell information, the data presented throughout this thesis have been altered and do not represent actual values used by Dell Inc. Any dollar values, hardware data, and product names have been disguised, altered, or converted to percentages in order to protect competitive information.

Contents

1 Introduction
  1.1 Motivation: if an oven can be smart, why can't a computer be smart?
  1.2 Problem description and goals
  1.3 Hypothesis
  1.4 Thesis overview
2 Background
  2.1 What are the connections between Data Science, Artificial Intelligence and Machine Learning?
  2.2 Three factors that are making AI very relevant now
  2.3 Dell Background
3 Smart Machines: AI Opportunities for Dell
  3.1 What is a Smart Machine?
  3.2 AI taxonomy
  3.3 AI framework
  3.4 Opportunity detection
    3.4.1 Opportunities within Dell
    3.4.2 Why monitor hardware to prevent failures?
    3.4.3 Opportunities outside of Dell
4 Literature Review
  4.1 Statistical methods in hardware failure
  4.2 Statistical and machine learning methods in hardware failure
  4.3 Machine learning methods in hardware failure
5 The Data Science Pipeline
  5.1 Raw data
  5.2 Study-specific data extract
  5.3 Problem-specific data
  5.4 Data slices
  5.5 Labeled training data
  5.6 Models and results
6 Machine Learning Tools
  6.1 Ingesting and preparing the data
  6.2 Modeling and choosing parameters
  6.3 Evaluating the trained models
  6.4 Results
  6.5 Key findings
7 Financial Impact Analysis of the Proposed End-To-End Data-Driven Solution
  7.1 Incurred costs
  7.2 Investment
  7.3 Impact
  7.4 Going big
8 Conclusions
  8.1 Key Findings
  8.2 Contributions
  8.3 Recommendations
  8.4 Future projects
  8.5 Conclusions
Appendix A AI applicable concepts

List of Figures

2-1 Exponential increase in data generation; 50-fold growth from 2010 to 2020. Greatly influenced by IoT.
2-2 Increasing computing power as estimated by Ray Kurzweil. The y-axis describes the calculations per second (cps) per $1,000. As shown, $1,000 will soon buy as much computing power as "one human brain."
2-3 Total venture capital money for pure AI startups. The USA and London lead in start-ups, with many in Canada, India, and China. About one half of the funding has gone into deep learning, one fourth into computer vision, and one eighth into natural language processing (NLP). The most active funds have been Intel Capital, Techstars, 500 Startups, and Y Combinator.
3-1 AI Framework and Market Potential. Today's estimated $4.5B market is growing at 35% CAGR based on IDC's Cognitive software platform forecast.
3-2 The categories and top-concept applications were evaluated and selected using the shown parameters.
3-3 Summarized AI Competitive Analysis. There was a great deal of activity in "cognitive" platforms, "data" machine learning, and analytics. Some of the most relevant companies are shown in the figure.
5-1 Proposed data science process.
5-2 The frequency of dispatches of different components closely follows the Pareto principle. As seen, hard drives are the most frequently dispatched component.
5-3 There are effectively two data generation and acquisition paths, triggered by a daily rule or by an unexpected alert or failure.
5-4 Problem-specific data file (newpd log data large.csv). It has the Computer ID, DataPoint ID, timestamps, and readings from multiple sensors.
5-5 Problem-specific data file (DispatchInfo Parsed.xls). It has the Computer ID, Dispatched Commodity (e.g., hard drive, motherboard), and the date of dispatch.
5-6 Process of forming a slice of data corresponding to a specific computer and then removing the sensor readings after the dispatch date. These sensor readings cannot be used for building the predictive model because they occurred after the dispatch event.
5-7 Labeled training data table (DFSFeatures.csv) has the Computer ID field, 650 engineered feature fields, and a Label.
5-8 Labeled training data table (NaïveFeatures.csv) has the Computer ID field, 110 sensor fields, and a Label.
5-9 Data science process and results. Note the reduction in data size from 3.6 GB to 40 MB and 5 MB by the time it reaches a machine learning modeling method.
5-10 Proposed end-to-end data-driven solution.
6-1 The AUC metric is the area under the ROC curve.
7-1 Summary of the yearly financial implications for both use cases and an estimated global impact. Different variations of these use cases can be built using the given information.
7-2 Sensitivity analysis of the impact of the model's effectiveness. The call center's call reduction impact is very relevant to Dell's case. In general, the performance of the predictive model has a very important financial impact; hence the relevance of the data, generated features, modeling techniques, and tuning settings.

List of Tables

5.1 DDSAC Summary between February 2015 and November 2015.
5.2 A set of databases in Dell's SQL servers: DCSAI (Dell Client Support Assist Internal), DDSAC (Dell Data Support Assist Client), Dispatch Information (dispatched hardware components for DDSAC), SMART (hard drive sensor readings).
5.3 Study-specific data extract description.
5.4 Problem-specific data description.
5.5 Data slice file description.
5.6 Feature table description.
6.1 Data ingestion and preparation. For all tools, labeled training data was uploaded as a CSV file. Some tools offered options to select columns and features, handle missing values, and so on. This table shows the specific steps we took in each tool to prepare the data for machine learning.
6.2 Modeling and parameter choices. The tools allowed different levels of customization for hyperparameters. Highlights: Skytree offered automatic modeling techniques and parameter tuning; IBM Watson did not allow setting parameters.
6.3 Evaluation techniques. These included cross validation for some tools, as well as splitting the data into train and test sets for building and evaluating the predictive models. Data was always split 70% for training and 30% for testing. Only three tools (Nutonian, Skytree, and Azure) offered the ability to do cross validation.
6.4 Performance of different tools for our problem (NaïveFeatures). The predictive models created by each tool were evaluated with different datasets: train data, comprising 70% of the original NaïveFeatures, and test data, comprising the remaining 30% and not used to train the model. Some tools were evaluated with 100% of the data since they did not offer the option to split the data, as seen in Table 6.3.
6.5 Performance of different tools for our problem (DFSFeatures). The predictive models created by each tool were evaluated with different datasets: train data, comprising 70% of the original DFSFeatures, and test data, comprising the remaining 30% and not used to train the model. Some tools were evaluated with 100% of the data since they did not offer the option to split the data, as seen in Table 6.3.
6.6 AUC DFSFeatures vs. AUC NaïveFeatures shows a vast improvement in the models' predictive accuracy when using DFSFeatures instead of NaïveFeatures.

Glossary of Terms

AI Artificial Intelligence
AGI Artificial General Intelligence
ANI Artificial Narrow Intelligence
AML Amazon Machine Learning
API Application Program Interface
ASI Artificial Super Intelligence
AUC Area under the curve
BP Backpropagation
CPU Central processing unit
CSV Comma Separated Value
CT Classification trees
DFS Deep feature synthesis algorithm
DS Data Science
DSM Data Science Machine
FAR False alarm rate, 100 times probability value
GB Gigabytes
GBT Gradient Boosted Trees
GLM Generalized Linear Models
GPU Graphics processing unit
HDFS Hadoop Distributed File System
IoT Internet of Things
IT Information Technology
JSON JavaScript Object Notation
KNN K-nearest neighbor
NLI Natural Language Interaction
NLP Natural Language Processing
MARS Multivariate Adaptive Regression Splines
ML Machine Learning
MLaaS Machine Learning as a Service
OCS Operations and Client Services
PMML Predictive Model Markup Language
RT Regression trees
SGD Stochastic Gradient Descent
SVM Support Vector Machines
RAM Random-access memory
RBF Radial Basis Function networks
RDF Random Decision Forests
RDS Relational Database Service
REST Representational State Transfer
ROC Receiver operating characteristic
SAAS Software as a Service
SAC Support Assist Clients
SMART Self-Monitoring and Reporting Technology
SPSS Statistical Package for the Social Sciences
SQL Structured Query Language
WA Failure warning accuracy (probability)

Chapter 1
Introduction

With the purpose of exploring the exponentially growing field of Artificial Intelligence (AI) applications, this thesis presents an overview of the AI technologies and machine learning tools available today, as well as their specific application to the prevention of hardware failures. AI is enabling the new "smart" hardware and software applications that are disrupting businesses and the way we live our daily lives.
AI is a broad term that will be explained later, but for the purposes of this work, AI applied to pragmatic business ends will be referred to as "Smart Machines" [1]. This is highly relevant today, as barriers to entry have been considerably lowered by the availability of open source AI algorithms and platforms, the rise of economical and scalable cloud computing power, and the surge in data generation, which together can enable almost anyone to become a "citizen data scientist."

1.1 Motivation: if an oven can be smart, why can't a computer be smart?

Today, there is already a "smart" oven called June [2], which can precisely tell the difference between a chocolate and a raisin cookie, gauge its weight, and so on, in order to choose the right cooking time and temperature gradient to follow [3], all while improving its recipes. It has an NVIDIA central processing unit (CPU) and graphics processing unit (GPU), along with two gigabytes (GB) of random access memory (RAM), Wi-Fi connectivity, a camera, and other sensors [2]. This is practically a computer. So, why aren't our computers even smarter?

The overarching goal behind this project is to explore new data science tools that can be applied to different business cases, and for Dell Inc.¹ to take advantage of such opportunities in an extremely demanding market. This research also explores potential concepts that will allow Dell to develop a competitive, higher-quality product. The proposed concepts create value for the customer and provide a better customer experience. Currently, Dell has over 100 million deployed systems comprising desktops and laptops, and it is expected to ship around 20 million systems in the next year. Dell places special emphasis on offering the most secure, manageable, and reliable products. Even given this ambitious goal, there is always potential for improvement.
Today, computer failures are mostly treated after the fact, which creates a major problem for the user and an important monetary impact on the service providers. However, Dell has recently launched Support Assist for Clients (SAC), software that enables the capture of hardware data covering hundreds of attributes pertaining to the hardware components, alert types, and failure types in a desktop or laptop. The uses for this data can be quite varied and can provide valuable insights into, for example, system performance and influential interactions between hardware components and operation. With this data, we can also potentially prevent hardware failures by warning the user, or even by self-correcting based on real-time data that feeds into ever-improving predictive models. Considering only hardware costs and the handling of customer calls, we have estimated these hardware failures to cost over $900M annually worldwide. There is an interesting opportunity to make the products more reliable by making them predictive, or proactive, rather than reactive, through an advanced and dependable process.

¹ Dell Inc. will be referred to as "Dell" throughout this work.

1.2 Problem description and goals

Hardware component failures in desktops and laptops can occur instantly, randomly, and without much warning. This affects Dell's customers globally, and Dell itself. These failures can cause great inconvenience to the customer, going beyond the physical damage to the product and data and extending to immeasurable losses in time, productivity, and critical activities that depended on the reliability of these products. Additionally, Dell allocates resources consisting of people, customer service organizations and facilities, and hardware components to replace the damaged parts.
This entire infrastructure to remedy failures also represents a heavy financial burden that has the potential to be reduced or, in the best scenario, eliminated and transformed into a business opportunity. Rather than conducting a root cause analysis of the hardware failures themselves, this thesis focuses on seeking a solution that makes desktops and laptops effectively robust against hardware failures. This will be done through a thorough analysis of available machine learning tools, testing them with Dell's available hardware failure data. There will be a special focus on the tools' functionality, available machine learning algorithms, hyperparameter tuning, evaluation metrics, and other services, such as visualization options.

1.3 Hypothesis

The hypothesis of this inquiry is that an appropriate model to predict hardware failures can be built using data collected from different hardware sensors in the computer. The expected outcome is a variety of accurate models created with the different available machine learning tools. Additionally, if the first hypothesis is proven true, a second hypothesis will be explored: that labeled training data generated with the deep feature synthesis (DFS) algorithm [4] will result in higher-accuracy models. Finally, based on this analysis, an end-to-end data-driven solution for the problem of preventing hardware failure will be proposed.

1.4 Thesis overview

Chapter 1 introduces the motivation for this project. Some context is given for the problem to be solved: preventing hardware failures. Lastly, the hypotheses are stated: first, that hardware sensor data can be used to prevent different hardware failures, and second, that features created with the "deep feature synthesis" algorithm improve the accuracy of the predictive models.

Chapter 2 establishes the connection between data science, AI, and machine learning.
It also describes the background and the factors that are making AI an increasingly relevant and applied topic today. Dell's background and interest in improving products through AI are explained.

Chapter 3 explores the different AI opportunities available to Dell. For this, a taxonomy and framework for understanding AI are proposed and explained. In the last section, the opportunities within and outside of Dell are presented, and the importance of preventing hardware failure is discussed.

Chapter 4 reviews the literature on preventing hardware failure. Statistical and machine learning methods are reviewed.

Chapter 5 proposes a six-stage data science process that lays out a sound path for solving prediction problems. The stages and their connections are carefully explained. This chapter dives deeply into the available data, and especially its structure, concerning hardware components in Dell's desktops and laptops. The data science process is then applied to Dell's particular problem of preventing hardware failures, focusing on hard drive failures due to their relevance and data availability.

Chapter 6 describes seven different machine learning tools (platforms) currently available to analyze data, create predictive models, evaluate these models, and understand relationships and influences, among other uses. These tools are compared to clarify their different capabilities and are specifically tested by building models to prevent hard drive failures using two different "labeled training data" approaches.

Chapter 7 highlights the financial impact that preventing hardware failure can have across Dell and its suppliers. The last section emphasizes the value of the proposed end-to-end data-driven solution to prevent hardware failure in desktops and laptops on a global scale.
Chapter 8 concludes this thesis with a summary of the key findings and contributions and provides recommendations that resulted from this work. Future potential projects at Dell are discussed and conclusions are presented.

Chapter 2
Background

2.1 What are the connections between Data Science, Artificial Intelligence and Machine Learning?

Data science spans a broad range of interdisciplinary fields, including computer science, mathematics, statistics, modeling, analytics, and information science, among others. The goal of data science is to extract valuable knowledge from data with the aid of automated processes and systems. William Cleveland first introduced the discipline of data science in 2001, when he integrated the advances in "computing with data" into the field of statistics [5].

Like data science, AI is a broad subject, and often a nebulous one, as discussion quickly turns to machines that can think, reason, make decisions, and act like a human, or that have capabilities even beyond humans. Many researchers quickly turn to the psychology and philosophy of human learning and discernment and how this learning applies to machines. More concretely, Tim Urban classifies AI into three categories [6]:

1. Artificial Superintelligence (ASI). ASI ranges from a computer that is somewhat smarter across the board than a human to one that is smarter than any combination of all human society, including scientific creativity, social skills, and general wisdom.

2. Artificial General Intelligence (AGI). AGI is also referred to as "Strong AI" or "human-level AI." It refers to a computer that is as smart across the board as a human: a machine that can perform any intellectual task that a human being can. This category of AI means machines have the ability to plan, reason, solve problems, think abstractly, comprehend the complex, and learn quickly and from experience (inference).

3.
Artificial Narrow Intelligence (ANI). ANI is also referred to as "Weak AI." This type of AI specializes in just one area. It already exists, and we use it in our everyday lives.

All of these capabilities make AI a wildly interesting topic; however, this thesis will focus on ANI technologies that are applicable to current businesses. Machine learning is the technology, or set of algorithms, most commonly associated with AI, given its very broad use. The parallel with AI is clear when Andrew Ng, Chief Scientist at Baidu Research and professor in Stanford's Computer Science Department, describes machine learning as "the science of getting computers to learn, without being explicitly programmed." There are two major machine learning branches: supervised and unsupervised.

1. Supervised learning uses labeled data and focuses on classification and prediction. Example algorithms include artificial neural networks, decision trees, genetic (evolutionary) algorithms, K-nearest neighbor (KNN), multivariate adaptive regression splines (MARS), random forests, and support vector machines (SVM).

2. Unsupervised learning uses unlabeled data and focuses on clustering, dimensionality reduction, and density estimation. Example algorithms include the Eclat algorithm, the K-means algorithm, and the expectation-maximization algorithm.

Each of these branches has a plethora of algorithms that vary in performance depending on the data and end goal. In addition, deep learning has become an increasingly popular variant of machine learning. Deep learning uses neural nets, typically with more than two processing layers. The layers derive a "deeper" level of features from the data, which yields better classification and prediction performance.

2.2 Three factors that are making AI very relevant now

John McCarthy coined the term "AI" in 1956, when the first academic conference on the subject was held [7].
AI then went through an exploratory process, with much of today's theory based on concepts from decades ago. So why is it so significant now? The crossroads of three factors have made AI stir unprecedented interest and activity in the past years and become very relevant in the business world. These factors are available data, increased computing power, and powerful algorithms.

Data generation has reached historically unprecedented levels, and most of the data generated is unstructured. As IBM states, "Every day, we create 2.5 quintillion bytes of data – so much that 90% of the data in the world today has been created in the last two years alone" [8]. Also, with the Internet of Things (IoT) on the rise, data is coming from an increasing variety of sources, such as GPS, online posts, climate sensors, energy meters, transportation routes, the human body, etc. The amount of data and sources is expected to increase considerably. According to Strategy&, "the installed-base of internet-connected devices exceeded 7 billion in early 2014 and is expected to grow to 50 billion by 2020. More than 10 times the amount of personal computers" [9]. IDC likewise projects that by 2020 we will be producing 50 times more data than in 2011, as pictured in Figure 2-1 [10]. Therefore, we need systems that can generate useful insight from all this data.

Figure 2-1: Exponential increase in data generation; 50-fold growth from 2010 to 2020. Greatly influenced by IoT.

Additionally, the computing power available at very low prices has played an important role in processing all the generated data. As seen in Figure 2-2 [11], the growth in computing technology follows an exponential curve, based on Moore's law, and futurist Raymond Kurzweil ties this growth to a part of his "Law of Accelerating Returns" [12]. According to Deloitte, "computing cost-performance has decreased by more than three orders of magnitude in the past 25 years" [13], a decline of roughly 33% year-on-year.
Improvements in hardware, such as CPUs and GPUs, have allowed the efficient processing of this much data. According to Gartner [14], GPUs have improved 10,000-fold since 2008, increasing the number of possible connections from 1x10^7 to 1x10^11. An example is NVIDIA's new "Tesla GPU," which delivers 10-100x the application throughput of traditional CPUs [15]. Moreover, advanced neuromorphic architectures based on field-programmable gate arrays surpass GPUs threefold in energy efficiency; according to Gartner, they show a 70% increase in throughput for comparable energy use [16].

Figure 2-2: Increasing computing power as estimated by Ray Kurzweil. The y-axis describes the calculations per second (cps) per $1,000. As shown, $1,000 will buy computing power equivalent to "one human brain" in the next few years.

Much of the structure of the algorithms powering AI today existed years ago, but sufficient data and computer processing power were not easily available. Now, these algorithms have come into use and proven very effective at tasks such as classification of structured and unstructured data, pattern detection, optimization, and predictive modeling, among others. Some examples of the types of algorithms used in AI are machine learning, deep learning, image recognition, and natural language processing. These algorithms have various uses, such as data mining, text mining and search, expert systems, speech recognition and interaction, medical diagnosis, financial decisions, and fraud detection, among others.
The key differentiator of these algorithms is that they are no longer programmed to "solve a problem," but to "learn how to solve a problem." The algorithms themselves have also improved, and this strategy is visible in the powerful open-source implementations that exist (such as in R or Python) and in how many large companies have open-sourced their tools, such as Google (TensorFlow), Facebook (Torch), Apple (Swift), Microsoft (DMTK – Distributed Machine Learning Toolkit), and Netflix (Spinnaker). With these enabling factors, automated procedures are being developed to gather and process data to derive valuable insights for actionable results. Given these factors, Gartner expects that "by 2017, the number of citizen data scientists will grow five times faster than the number of highly skilled data scientists" [17].

The business community has taken notice. According to data retrieved from Gartner and Venture Scanner [18], over the course of 2015, more than 270 new startups focused on AI were founded and over $2 billion was invested in the field, more than a 700% increase over the previous year, in which investment had itself tripled, as seen in Figure 2-3.

Figure 2-3: Total venture capital money for pure AI startups. The USA and London lead in startups, with many in Canada, India, and China. About one half of the funding has gone into deep learning, one fourth into computer vision, and one eighth into natural language processing (NLP). The most active funds have been Intel Capital, Techstars, 500 Startups, and Y Combinator.

2.3 Dell Background

Looking further back, Dell's history [19] provides useful context for what are now applicable AI opportunities. Michael Dell founded Dell Computer Corporation, then called PC's Limited, in 1984 with just $1,000 in his dorm room at the University of Texas at Austin.
His vision was to change "how technology should be designed, manufactured, and sold." He set out to sell directly to customers with a focus on service in order to truly understand their needs. Just four years later, Dell went public, raising $30 million, and continued growing 80% per year during its first eight years, gaining almost half of the marketplace. During the 1990s, Dell expanded globally to Europe, Asia, and the Americas, while also becoming the number one ranked PC business in the USA and number one worldwide for PCs in medium and large businesses. In the 2000s, dell.com became one of the highest-volume ecommerce sites, with over $40 million in revenue per day. Shipments grew 60%, about four times the typical rate for industry players. Later on, Dell started to focus on offering end-to-end solutions, which it achieved by acquiring over 20 companies for $13 billion from 2006 to 2013. In 2013, Michael Dell bought back Dell; at the time, it was the largest public company ever taken private. Dell focused on four key areas: Operations and Client Services, Enterprise Solutions, Software, and Research. Inspired as always to offer a great product to customers, and facing an ever-changing, fast-paced tech environment, Dell announced in 2015 plans to acquire EMC for $67 billion in the biggest acquisition in tech history [20]. Dell now has over 100,000 employees and is one of the top three largest suppliers of PCs and laptops.

The focus of this work is the Operations and Client Services (OCS) business within Dell. OCS accounted for over half of the company's $60+ billion in revenue in 2013, and it comprises several main categories: notebooks, desktop PCs, thin clients, and client-related peripherals. OCS is the largest business unit within Dell, and given the tough market, Dell has recently been under constant pressure to innovate.
Therefore, because of the nature of the market, any technology or innovation that creates a product with better performance and higher quality, and that can be priced above market, will always be an attractive and indeed crucial opportunity to pursue. This work focuses on potential AI applications for desktops, notebooks, and thin clients.

Chapter 3 Smart Machines: AI Opportunities for Dell

3.1 What is a Smart Machine?

As explained in the introduction, this work focuses on applied AI for businesses. Gartner states, "'Artificial intelligence' and 'cognitive computing' are not synonymous with 'smart machines' and may set unrealistic expectations for development and deployment. 'Smart machines' reflect a more diverse, pragmatic business purpose" [1]. The term "Smart Machines" means that systems will acquire the ability to train themselves, learn from their mistakes, observe the environment, converse with people and other systems, enhance human cognitive capabilities, and replace workers in routine tasks. All of these tasks are accomplished through the applicable AI technologies this work has been referring to.

3.2 AI taxonomy

In order to better understand the opportunities that lie ahead, this work proposes a taxonomy to classify the different AI technologies and algorithms. The basis for this classification lies in the technologies' uses and applications. The proposed taxonomy is the following:

1. Machine Learning is probably the broadest and most applicable of the technologies due to its flexible, interdependent nature. It is a subfield of computer science that grew out of pattern recognition and computational learning theory. As previously explained, with this technology software can derive insights and features from data without being explicitly programmed to do so. The two main branches are supervised and unsupervised learning (see Section 2.1).

2.
Deep Learning is a set of algorithms that overlaps with machine learning. More specifically, it is a technology based on neural nets. The concept of neural nets was inspired by the biological functioning of the brain, which takes multiple inputs while different areas extract specialized parts of the information. Deep learning models abstract data from more complex sources by utilizing multiple "specialized layers," which permit a deeper level of feature abstraction and thereby better prediction accuracy and classification.

3. Image Recognition, also known as computer vision, is the field with specialized methods to gather, process, analyze, and understand images through particular algorithms. It utilizes a combination of physics, geometry, statistics, and learning theories in order to recognize images.

4. Natural Language Processing and Natural Language Interaction (NLP & NLI) is the ability of a computer to understand text and human speech. This can be achieved through a combination of machine learning algorithms and specific rules to follow. It allows a computer to understand structure, extract meaning, process it, and produce natural language.

5. Prescriptive Analytics is the highest of the three levels of analytics. It combines advanced statistics and mathematics with data synthesis and computational science. Prescriptive analytics makes different predictions based on the data and then recommends different courses of action to implement, depending on the expected outcomes and consequences. Second, predictive analytics involves advanced forecasting methods, such as regression, which predict what will happen according to the data we have. Third, descriptive analytics is the term for basic analytics performed to understand past events, generating information such as the average, mean, median, standard deviation, etc.
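As a minimal illustration of the descriptive level, the summary statistics it produces reduce to a few standard-library calls. The failure counts below are invented for illustration:

```python
import statistics

# Hypothetical monthly hardware-failure counts for a small fleet
failures = [12, 15, 11, 19, 14, 13, 22, 16]

# Descriptive analytics: summarize what already happened
print("mean:", statistics.fmean(failures))              # 15.25
print("median:", statistics.median(failures))           # 14.5
print("std dev:", round(statistics.stdev(failures), 2)) # 3.69
```

Predictive and prescriptive analytics build on these same quantities, first forecasting future values and then recommending actions based on those forecasts.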
3.3 AI framework

In order to detect potential opportunities for Dell, a framework was developed to understand and classify AI as well as to gauge market potential. The methodology used to develop this framework was based on interviews and research. The proposed AI framework is layered in the following way:

1. Smart Infrastructure includes the client (personal computers and laptops), storage, servers, networking, databases, and cloud management. This layer provides advanced infrastructure that scales beyond the current capabilities of IT: agile, simple, and automated for dynamic environments.

2. Smart Data includes discovery and analytics software, cognitive platforms, and information management software. This layer is based on platforms and tools such as machine learning, deep learning, prediction, and visualization, and combines features of databases, business intelligence, and comprehensive research.

3. Smart Apps and Services includes end-user apps and services. This layer provides real-time insights and recommendations, automation and enhancement of work, and predictive behavior.

Figure 3-1: AI Framework and Market Potential. Today's estimated $4.5B market is growing at 35% CAGR based on IDC's Cognitive software platform forecast.

Within the proposed framework, Dell clearly plays in the first and second layers, Infrastructure and Data, which together represent a large part of the market. However, it is important to capitalize on the "Smart" opportunities within these fields, as shown in Figure 3-1.

3.4 Opportunity detection

Having defined a taxonomy and framework for AI, two strategies, internal and external, were set to research opportunities for Smart Machines. The internal strategy was to gather all current Smart Machines or related AI initiatives as well as to conduct a brainstorming session with technical experts. The external strategy was to conduct an extensive market and competitive analysis to detect attractive use cases.
3.4.1 Opportunities within Dell

A brainstorming session was conducted; evaluation criteria were then selected and thoroughly researched in order to prioritize implementation of the concepts. The concepts were focused on the Operations and Client Services business at Dell. The brainstorming yielded 30+ potential concepts, which were classified into the following categories: security, serviceability, manageability, and productivity. The criteria used to evaluate these concepts consisted of: technology readiness, including ease of implementation; financial impact; intellectual property (IP) potential; and business alignment within Operations and Client Services and Dell. For more details on the ranking, see Appendix A. Further analysis revealed that 16 were completely new concepts, seven were already under discussion, and the rest were already being developed within Dell in some form.

3.4.2 Why monitor hardware to prevent failures?

After evaluating the different concepts, only brand new ideas were selected, as shown in Figure 3-2, and out of those, the top-ranked concept from each category was selected for further investigation. These were:

1. Security – Environmental and contextual security
2. Serviceability – User self-help smart Q&A
3. Manageability – Self-management and self-healing (hardware failure prevention)
4. Productivity – Personal productivity enhancer

Figure 3-2: The categories and top-concept applications were evaluated and selected using the shown parameters.

Monitoring hardware for failure prevention stood out among the rest for several reasons: technology readiness (including data availability), financial opportunity, and fit with Dell. On the technology side, Dell had started to actively collect sensor data from authorized systems, including desktops and laptops, within the past year. This data consisted of the parameters within hardware sensors, which are collected by the Support Assist for Clients software.
Algorithms, such as machine learning to find failure patterns, were also readily available. Financially, this concept promised a very attractive opportunity and seemed viable to implement in the short term. Lastly, there was alignment with two of Dell's quality drivers, "most manageable" and "most reliable," to give the user a supreme customer experience.

3.4.3 Opportunities outside of Dell

The global market for Smart Machines was also considered when evaluating opportunities and, for this, a thorough market and competitive analysis was performed. Using the established AI taxonomy, research was done to evaluate rising as well as proven concepts from leading companies and startups. To evaluate the potential for opportunities, companies, as shown in Figure 3-3, were assessed against four criteria:

1. Product availability: Low consisted of only trials and demos. Medium consisted of already working with fewer than 10 clients. High consisted of 1,000+ users and/or 10+ use cases.

2. Company or product maturity: Low means the company has been established for zero to three years. Medium refers to a timeframe of four to seven years. High is eight to 14 years, and very high is 15+ years.

3. Size of company or product: Very low is established as less than $5M in revenue or less than $10M in funding. Low is $5-50M in revenue or $10-20M in funding. Medium is $50-500M in revenue or $20-100M in funding. High is $500M-$5B in revenue or $100-500M in funding. Very high consists of $5B+ in revenue or $500M+ in funding. Revenue growth rate was also considered.

4. IP: Very low is established as fewer than 5 patents. Low is 6-10 patents. Medium is 11-50 patents. High is 51-200 patents. Very high consists of 200+ patents.
Figure 3-3: Summarized AI Competitive Analysis. There was a great deal of activity in "cognitive" platforms, "data" machine learning, and analytics. Some of the most relevant companies are shown in the figure, including AWS Machine Learning, Microsoft Azure Machine Learning, Google Prediction API, IBM Watson (BlueMix), HP Haven, GE Predix, and Nuance, along with startups such as Context Relevant, Skytree, RapidMiner, DataRPM, Wise.io, Ayata, Y Hat, River Logic, Shyft, Clarifai, Dextro, Sighthound, Idibon, Cortical.io, and Gridspace.

The conclusions of this competitive analysis show that the majority of the companies or products are very new and have been in the market for less than three years. Additionally, platforms are lowering the barrier to entry as algorithms and technologies for Smart Machines become available on demand to the public. Moreover, startups are focusing on specific verticals to perfect their technology.
The results demonstrated that clear opportunities exist for Dell within "Data" machine learning: already-developed products exist, and the maturity level of the companies means there is room to grow and become an important player in this area.

Chapter 4 Literature Review

In this chapter, seven relevant research papers are reviewed. These papers deal with prediction models and methods that have been used for failure prediction in hardware systems, hard drives, and other computer systems. The prediction methods described involve statistical analysis and a variety of machine learning algorithms. A brief analysis of the work and conclusions of each is presented in order to better understand the progress made in this subject.

4.1 Statistical methods in hardware failure

Elkan and Hamerly [21] utilize naïve Bayes classifiers in order to overcome the difficulty of scarce failure data, since only approximately 1% of hard drives fail per year. The naïve Bayes method is a well-known supervised learning method that produces a classifier able to distinguish between two classes of data. Elkan and Hamerly studied the Self-Monitoring and Reporting Technology (SMART) system in hard drives, which is a failure prediction system intended to predict near-term failure. Typical SMART data includes variables such as power-on hours (POH), contact start-stops (CSS), seek errors in track servo (SKE), spin-up time (SUT), etc. The SMART system can be regarded as a statistical hypothesis test based on thresholds developed by each individual manufacturer. These thresholds are set based on testing and engineering knowledge about the operational parameters. According to Hughes et al., manufacturers estimate the "failure warning accuracy" (WA), or true-positive rate, of these systems to be between 3-10%, with an estimated 0.1% "false alarm rate" (FAR), or false-positive rate [22].
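The WA and FAR metrics recur throughout the papers reviewed in this chapter, so it is worth making them concrete. The sketch below is illustrative only, with invented counts rather than results from any cited study:

```python
def warning_accuracy_and_far(actual, predicted):
    """Compute failure-warning accuracy (true-positive rate) and
    false-alarm rate (false-positive rate) for binary failure labels,
    where 1 = failed and 0 = healthy."""
    tp = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
    fn = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)
    fp = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)
    tn = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 0)
    wa = tp / (tp + fn) if tp + fn else 0.0   # fraction of real failures caught
    far = fp / (fp + tn) if fp + tn else 0.0  # fraction of healthy drives flagged
    return wa, far

# 1,000 hypothetical drives, 10 real failures (~1%, as noted above)
actual = [1] * 10 + [0] * 990
predicted = [1] * 5 + [0] * 5 + [1] * 2 + [0] * 988  # 5 caught, 2 false alarms
wa, far = warning_accuracy_and_far(actual, predicted)
print(f"WA = {wa:.1%}, FAR = {far:.2%}")  # WA = 50.0%, FAR = 0.20%
```

The class imbalance is the central difficulty: even a tiny false-positive rate over a huge healthy population can swamp the small number of true warnings.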
The failure events in hard drives do not happen very often, which makes it hard to establish a known statistical distribution. Elkan and Hamerly achieved a 33-59% WA with a 0.5-0.7% FAR, which is a higher WA than typical SMART performance, but with higher FARs.

Hughes, Kreutz-Delgado, and Murray also analyze [22] the SMART system in hard drives and propose a method that uses the distribution-free Wilcoxon rank-sum statistical test, since the problem deals with a rarely occurring event. This statistical test is recommended when failures are rare and false positives are very costly. The proposed rank-sum method is used in combination with multivariate and ORing tests. The multivariate tests exploit the statistical correlations between attributes, while the ORing test simply uses a single attribute. The researchers achieve a 40-60% WA with 0.2-0.5% FAR for both the multivariate and ORing tests [22].

4.2 Statistical and machine learning methods in hardware failure

Going forward, Murray, Hughes, and Kreutz-Delgado compare statistical and machine learning methods [23] to try to predict hard-drive failures in computers by utilizing selected attributes from the SMART system. They compare non-parametric statistical tests (rank-sum, reverse arrangements, and a proposed algorithm) as well as machine learning methods (SVMs and unsupervised clustering). They propose a new algorithm based on the multiple-instance learning framework, since this study is considered a two-class, semi-supervised problem. Multiple-instance learning deals with objects that generate many instances of data; an object's data is collected in a "bag," which receives a discrete value of 0 or 1 depending on the prediction problem. Additionally, they use a simple Bayesian classifier and pair it with the multiple-instance framework to create the multiple instance-naive Bayes (mi-NB) algorithm.
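The rank-sum test used by Hughes et al. [22], and revisited as a baseline in the comparison that follows, can be sketched in pure Python using the normal approximation. The attribute values below are invented for illustration; the published implementations differ:

```python
import math

def rank_sum_z(healthy, failing):
    """Distribution-free rank-sum test (normal approximation): rank the
    pooled attribute values and ask whether the failing drives' values
    sit unusually high or low in that ranking."""
    pooled = sorted((v, g) for g, vals in (("h", healthy), ("f", failing))
                    for v in vals)
    ranks = {}
    i = 0
    while i < len(pooled):          # assign average ranks to tied values
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        for k in range(i, j):
            ranks[k] = (i + 1 + j) / 2
        i = j
    w = sum(ranks[k] for k, (_, g) in enumerate(pooled) if g == "f")
    n, m = len(healthy), len(failing)
    mean = m * (n + m + 1) / 2      # expected rank sum under the null
    sd = math.sqrt(n * m * (n + m + 1) / 12)
    return (w - mean) / sd

healthy = [2, 3, 3, 4, 2, 3, 4, 2]  # e.g. invented seek-error counts
failing = [7, 9, 8, 10]
print(round(rank_sum_z(healthy, failing), 2))  # ~2.72: failing ranks high
```

A large positive z suggests the attribute distributions differ, without assuming any particular distribution for the rare failure events.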
Feature selection for the models is done through the statistical reverse arrangements test, selecting features according to their relevant z-scores. The prediction models are created with SVMs, unsupervised clustering (Cheeseman and Stutz's Autoclass package [24]), rank-sum tests, and the mi-NB algorithm. They ran experiments using 25 attributes, single attributes, and combinations of attributes. The results show that SVMs provide the best performance with 51-70% WA and 0-6% FAR, followed by the rank-sum tests with 28-35% WA and 0-1.5% FAR, mi-NB with 35-65% WA and 1-8% FAR, and clustering with 10-29% WA and 4.5-14% FAR. Murray et al. highlight that the results of the non-parametric statistical tests are notable, since they come with great computational efficiency, considering that SVMs take 100 times longer in training and testing [23].

Motivated by the growing complexity and dynamism of computer systems, Salfner, Lenk, and Malek [25] report a survey of over 50 methods for failure prediction. They developed a taxonomy of failure prediction approaches for hardware and software, which can be represented with the following stages:

1. Failure Tracking is based on the occurrence of previous failures. Methods included are probability distributions and co-occurrence.

2. Symptom Monitoring is based on periodic analysis of the system. The key concept is that through system monitoring, the side effects of individual hardware degradation, which are very difficult to catch directly, can be detected. Methods included are function approximation (stochastic models, regression, and machine learning), classifiers (Bayesian and fuzzy), system models (clustered models and graph models), and time series analysis (regression, feature analysis, and time series prediction).

3. Detected Error Reporting is a non-active method, driven by events logged in the input data when an error occurs.
Methods included are rule-based approaches, pattern recognition, statistical tests, and classifiers.

4. Undetected Error Auditing actively searches for incorrect states of the system that can potentially predict failures within the system.

Each method applies differently depending on the prediction problem and the user's intended use case. Salfner et al. [25] do not report WA or FAR for the different methods, as their main purpose is to provide a framework for the methods and their use cases. They conclude that proactive fault management will be the key enabler of the next generation of dependability improvements.

4.3 Machine learning methods in hardware failure

Turnbull and Alldrin [26] explore the prediction of hardware failure for servers with the hypothesis that sensor data can be used for such predictions. To predict failures, they use the servers' sensor logs for positive (failed) and negative (did not fail) cases. Each log is composed of individual entries that are recorded approximately every 75 seconds and contain sensor data, such as various board temperatures and voltages. These entries also record any failures. Feature vectors are extracted from the entries during a "sensor window," a specific span of time over which entries are collected. Each feature vector is associated with a "potential failure window" that follows the "sensor window" in time. These vectors are used as input to a Radial Basis Function (RBF) network, a type of neural network, to solve a two-class classification problem. They use 18 hardware sensors to build 72 features that include the mean, standard deviation, range, and slope. They achieve an 87% WA with a 0.1% FAR, and they conclude that sensor data can be used to predict failures in hardware systems. An important remark they also make is to always consider the cost of prediction accuracy in terms of false-positive rates, especially when working in a context of infrequent hardware failure [26]. Zhu et al.
[27] provide a new approach to predicting failure in large-scale storage systems through SMART data analysis. They propose a new backpropagation (BP) neural network and an improved SVM that achieve higher WA with considerably low FARs. They utilize 10 of the 23 SMART attributes, plus their change rates, as features. The SVM is built using LIBSVM [28] and achieves a 68.5-95% WA with a 0.03-3.5% FAR. The BP neural network is composed of three layers with 19, 30, and 1 nodes, respectively, with a target value of 0.9 for healthy drives and 0.1 for failed drives at the output node. This BP neural network model achieves a 95-100% WA with 0.5-2.3% FAR. Their proposed models achieve very high WA with some tradeoffs in the FARs.

One of the most recent works in hard-drive failure prediction, by Li et al. [29], involves Classification Trees (CT) and Regression Trees (RT). They achieve some of the best WAs with low FARs, and they provide additional capabilities for assessing the hardware's health. To build the trees for the models, they utilize the SMART attributes, as well as their change rates, with targets of "good" or "failed" as input. For their models, they utilize "information gain" as the splitting function for the nodes. This split function looks through all the SMART attribute values and finds the split that maximizes the gain in information, which is calculated by comparing the entropy of the information that each node contains. They continue to grow the tree by recursive partitioning until a node contains only one class or no longer satisfies the split conditions. The RT model is built similarly to the CT model, but instead of using information gain to split, it utilizes the minimum sum of squares about the mean. Also, the RT model uses a quantitative target value, rather than a classification of "good" or "failed," to describe the drive's health status.
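The "information gain" splitting criterion just described can be sketched in a few lines. This is a simplified illustration with invented SMART values and labels, not Li et al.'s implementation:

```python
import math

def entropy(labels):
    """Shannon entropy of a set of 'good'/'failed' labels."""
    n = len(labels)
    return -sum((labels.count(c) / n) * math.log2(labels.count(c) / n)
                for c in set(labels))

def information_gain(values, labels, threshold):
    """Gain from splitting an attribute at `threshold`: parent entropy
    minus the weighted entropy of the two child nodes."""
    left = [l for v, l in zip(values, labels) if v <= threshold]
    right = [l for v, l in zip(values, labels) if v > threshold]
    n = len(labels)
    children = sum(len(c) / n * entropy(c) for c in (left, right) if c)
    return entropy(labels) - children

def best_split(values, labels):
    """Scan candidate thresholds and keep the one with maximal gain."""
    return max(set(values), key=lambda t: information_gain(values, labels, t))

# Invented reallocated-sector counts with drive-health labels
counts = [0, 1, 0, 2, 15, 40, 3, 22]
labels = ["good", "good", "good", "good", "failed", "failed", "good", "failed"]
t = best_split(counts, labels)
print(t, round(information_gain(counts, labels, t), 3))  # 3 0.954
```

Here the split at 3 separates the classes perfectly, so the gain equals the full parent entropy; a regression tree would instead score candidate splits by the reduction in squared deviation about the mean.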
The CT model achieves a 94-96% WA with 0.01-0.1% FAR, and the RT model achieves a 63-96% WA with 0.01-0.15% FAR.

As seen in the previous sections, the progress in predicting hardware failure encompasses many areas, and great advances have been made particularly in modeling hard-drive failures. Different statistical and machine learning methods are used depending on data availability, computational efficiency, and the type of prediction problem to solve.

Chapter 5 The Data Science Pipeline

Having developed a solution through a systematic approach, we use this experience to explicate a data science process, in order to understand the different stages and steps required to go from a collection of data to a prediction model. Overall, the proposed process consists of six stages, as shown in Figure 5-1. In the following subsections, we go through each stage and explain it in the context of our particular goal: building a predictive model for hardware failure.

Figure 5-1: Proposed data science process.

5.1 Raw data

In a typical enterprise, data is generated by a diverse array of actors, including sensors, people, and computers, and appears in a variety of formats, ranging from logs to databases to flat files. This data is usually stored with no particular goal in mind. It is often either unorganized, or organized in such a specific manner that it cannot be broadly generalized. In our particular scenario, Dell generates raw data from many sources. We will focus on one in particular: customers who have authorized the collection of sensor data. This data is generated by hundreds of hardware sensors inside computer systems and is then uploaded to a database. One of the main databases containing this type of raw data is called DDSAC, or "Dell Data – Support Assist Client." Table 5.1 lists a summary of the characteristics from a snapshot of the DDSAC database between
February 2015 and November 2015.

Tables: 40    Total Entries (Rows): 23,588,020    Data Space: 3,571 MB
Table 5.1: DDSAC summary between February 2015 and November 2015.

This data is organized in many different tables within the DDSAC database. Table 5.2 lists a few additional databases capturing data in Dell's SQL servers.

Files                  Size (rows)    Size (MB)
DCSAI                  153.3M         10,344
DDSAC                  23.6M          3,571
Dispatch Information   23,904         1
SMART                  1.5M           617
Table 5.2: A set of databases in Dell's SQL servers. DCSAI (Dell Client Support Assist Internal), DDSAC (Dell Data Support Assist Client), Dispatch Information (dispatched hardware components for DDSAC), SMART (hard-drive sensor readings).

5.2 Study-specific data extract

Depending on which problem is of interest, this raw data could have many different uses. For example, one data scientist might be interested in predicting hardware failures, while another might wish to predict usage patterns. Solutions to both of these problems could be derived from the same raw data. Once a particular goal is chosen, a data scientist's first task is to extract data that is applicable to the relevant study or question.

With a specific goal in mind–predicting hardware failures–we selected specific data tables that could contain useful information for solving this problem. In this particular case, we chose DDSAC,1 which contains the authorized external customers' data. For these customers, we also have the Dispatch Information table, which holds specific data on hardware dispatches, or shipments, for these customers. Since these dispatches are usually called in to replace a failed or malfunctioning component, we took this information to imply that the component failed prior to its dispatch date. We then investigated the DDSAC database to select a subset of the 40 available tables. Below, we describe some key points of the data, in order to better understand its structure and characteristics:

1.
ID that uniquely identifies a computer system
Computer ID is an attribute that uniquely identifies a specific system (a desktop or laptop) and its subcomponents across time.2

2. Tables with sensor data associated with each hardware component
Within DDSAC, there is a corresponding table with sensor data for each hardware component being monitored.3 For example, Disk.dbo is a table for the hard drive that contains data from different sensors monitoring its use. Each table within the database contains a variety of fields; combined, they contain about 450 different fields. These tables have either a Computer ID or a DataPoint ID, which allows us to link them to a specific system.

3. DataPoint ID
A DataPoint ID uniquely identifies a collection of data4 at a specific time point. The DataPoint ID is a key identifier and exists in approximately 95% of the tables. In many cases, DataPoint ID is recorded alongside Computer ID, thus identifying exactly which system the data was collected from. When Computer ID is missing in any of the tables, it can be retrieved by finding all the occurrences of the corresponding DataPoint ID and locating an associated Computer ID in any of these occurrences. Thus, we are always able to identify which computer system a particular data point belongs to.

1 In contrast, we could not use an even bigger database–DCSAI–which holds seven times as many data points as DDSAC (over 150M rows vs. 23M rows). The data in DCSAI comes from Dell's internal employees' computer systems. Although it records alerts, there was no dispatch information available for these systems and thus no way to detect an actual failure. We selected DDSAC due to its associated dispatch information, which we considered to be a proxy for failure.
2 This information does not identify the user, only the specific hardware. It is classified as non-personally identifiable information (non-PII) at Dell and widely in the industry.

4.
SensorData table

Even though there are individual tables recording the sensor data corresponding to the usage of each individual hardware component, the SensorData table holds data from the different components' sensors in one place. The SensorData table has 132 different fields, which contain readings from sensors as well as basic sensor and system information. One of these fields is DataPoint ID. In the SensorData table, there are 919,667 DataPoint ID records (258,850 records are from desktops, 660,744 are from laptops, and the rest are unclassified). There are 342,287 unique Computer IDs; 101,649 of these are desktops, 240,607 are laptops, and the remainder are unclassified.

5. Dispatch table

The other essential table for predicting hardware failure is "Dispatch Information." This is a list of parts dispatched to a customer, and includes the Computer ID, the dispatch date, and the dispatched parts. Each Computer ID may have one or multiple dispatched parts (e.g., motherboard and hard drive). The Dispatch Information table contains 7,348 unique[5] Computer IDs. These 23,094 dispatch events come from the 342,287 total tracked computers on DDSAC, which implies around a 2.15% failure rate in about 9 months' time. As can be seen in Figure 5-2, hard drives and motherboards are two of the parts most affected by hardware failure.

Figure 5-2: The frequency of dispatches of different components closely follows the Pareto principle. As seen, hard drives are the most frequently dispatched component.

[3] ePPIDs (Electronic IDs) allow us to uniquely identify a subset of these individual components. For example, hard drives, batteries, wireless cards, and motherboards are tracked, while cables, fans, and monitors are not.

[4] A collection of data is sensor readings at or up until that time point.

Data collection process: As shown in Figure 5-3, hardware sensor data is generated and recorded through two types of events: continuous and discrete.
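As an aside to item 3 above, the recovery of a missing Computer ID via its DataPoint ID can be sketched in pandas. The tables, values, and column names below are illustrative stand-ins, not Dell's actual schema:

```python
import pandas as pd

# A table where some ComputerID values are missing (toy data)
disk = pd.DataFrame({
    "DataPointID": [101, 102, 103],
    "ComputerID":  ["X812114", None, None],
})

# Another table where the same DataPointIDs carry a ComputerID (toy data)
sensor_data = pd.DataFrame({
    "DataPointID": [102, 103],
    "ComputerID":  ["ZK1434", "X432121"],
})

# Build a DataPointID -> ComputerID lookup from every table that has both
lookup = (
    pd.concat([disk, sensor_data])
      .dropna(subset=["ComputerID"])
      .drop_duplicates("DataPointID")
      .set_index("DataPointID")["ComputerID"]
)

# Fill the gaps in the original table using the lookup
disk["ComputerID"] = disk["ComputerID"].fillna(disk["DataPointID"].map(lookup))
print(disk["ComputerID"].tolist())
```

After the fill, every row carries a Computer ID, mirroring the claim that a system can always be identified.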
In a continuous process, data is generated and recorded in the corresponding tables on a daily basis. In the discrete process, data is generated only when there is a trigger, such as an alert or failure, and is recorded in the corresponding tables, such as the SensorData or Alert tables. Currently, all of this recorded data is uploaded to the SQL server only when there is a trigger, leaving much of the data on the systems. Thus, we only have DDSAC data when there have been triggers and alerts.

[5] Out of the 23,094 total dispatch events. Dispatches do not necessarily mean failures, but we will consider hard drive dispatches a proxy for hard drive failures.

Figure 5-3: There are essentially two data generation and acquisition paths, triggered either by a daily rule or by an unexpected alert or failure.

To recap:

• Our goal is to predict hardware failures.
• This study considered all data captured in SQL servers from February 2015 through November 2015.
• We selected the DDSAC database because dispatch information was available for the computers in this database.
• Of the 40 different tables contained within the DDSAC database, we selected just 25 for building predictive models for hardware failures. Table 5.3 shows the characteristics of these tables.
    Stage 2: Study-specific data extract

    File                       Size (rows)   Size (MB)
    Alert.csv                      356,858         121
    AlertDetail.csv                111,392          21
    BTModule.csv                   311,307         112
    Battery.csv                     89,758          14
    Bios Internal Logs.csv         452,049          64
    Cable.csv                    2,346,483         226
    CableChangeHistory.csv          16,475           2
    CrashInfo.csv                  146,819          56
    DIMM.csv                     1,950,698         254
    Disk.csv                     1,333,304         203
    DTFan.csv                      171,976          16
    HardwareChangeLog.csv          732,211          79
    LanAdapt.csv                   917,052       1,024
    LogicalProcessor.csv           907,378         133
    Monitor.csv                  4,271,891         134
    MSBugCodes.csv                 899,053          86
    NBFan.csv                          265           0
    Partition.csv                  320,623          27
    SensorData.csv               4,512,656         293
    SMART.csv                    1,125,679         415
    SystemUnit QITeam.csv          225,635          13
    SystemUnit.csv                 341,285          29
    Thermistor.csv               1,232,045         125
    WlanAdapt.csv                  816,499         126
    WWAN                               557           0

Table 5.3: Study-specific data extract description. Each file corresponds to a table in the dbo schema.

5.3 Problem-specific data

Our next stage is to define a specific predictive problem. We have a number of options: for example, we could predict hard drive, motherboard, or battery failures. We chose to predict hard drive failures because this task fits three criteria:

1. The frequencies of dispatch for different components, as shown in Figure 5-2, show that hard drives are the most commonly dispatched component.

2. The hard drive is a critical system component containing locally-stored data and programs.

3. It provides the maximum number of training examples available for training a model.

For this specific prediction problem, two tables, Dispatch Information and SensorData, are used. The Dispatch table provides data on hardware components shipped due to failure. To reiterate:

1. The Dispatch table gives us the time point when a hard drive was dispatched to a customer.

2.
The SensorData table provides information regarding how the specific system (identified via Computer ID) was operating (through multiple data collection points identified by DataPoint IDs) before it failed.

Table 5.4 describes the files finally used for building predictive models for hard drive failures. Figures 5-4 and 5-5 show a snapshot of these two files.

    Stage 3: Problem-specific data

    File                       Size (rows)   Size (MB)
    newpd log data large.csv       727,740         633
    DispatchInfo Parsed.xls         23,904           1

Table 5.4: Problem-specific data description.

Figure 5-4: Problem-specific data file (newpd log data large.csv). It has the Computer ID, DataPoint ID, timestamps, and readings from multiple sensors.

Figure 5-5: Problem-specific data file (DispatchInfo Parsed.xls). It has the Computer ID, the dispatched commodity (e.g., hard drive, motherboard), and the date of dispatch.

5.4 Data slices

Cleaning and curation: Next, we prepare the data set for feature engineering and machine learning. First, the files need to be cleaned, linked, and curated. This process consists of the following steps:

• Removing columns that had no data.
• Removing columns that had no relation to hard drive failure.
• Standardizing date and time formats.
• Eliminating empty rows.
• Linking, or reconciling, files that had the same attributes under different names (such as Computer ID and ComputerID).

Slicing: After curation and linking, we slice the data to remove unusable data or data that could interfere with the prediction model. Specifically, for each Computer ID we remove the sensor readings that correspond to time after the dispatch date. Thus, for each Computer ID, we construct a data slice that has all the sensor readings prior to the dispatch date.
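A minimal sketch of this slicing step in pandas, with hypothetical column names standing in for the real files:

```python
import pandas as pd

# Toy sensor readings for one computer (column names are illustrative)
sensors = pd.DataFrame({
    "ComputerID": ["X812114"] * 4,
    "Date": pd.to_datetime(["2014-10-03", "2014-10-08",
                            "2014-12-12", "2014-12-24"]),
    "Sensor1": [0.1, 0.2, 0.3, 0.4],
})

# Toy dispatch record: this computer's part shipped on 2014-12-14
dispatch = pd.DataFrame({
    "ComputerID": ["X812114"],
    "DispatchDate": pd.to_datetime(["2014-12-14"]),
})

# Attach each computer's dispatch date, then drop post-dispatch readings
merged = sensors.merge(dispatch, on="ComputerID", how="left")
data_slice = merged[merged["Date"] <= merged["DispatchDate"]]
print(len(data_slice))  # the 2014-12-24 reading is removed
```

The surviving rows form the data slice used in the later stages; readings dated after the dispatch are discarded exactly as described above.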
Figure 5-6 shows the process of slicing and removing the post-dispatch data. Table 5.5 shows a summary of the sliced files, which are then taken through to the next stages.

    Stage 4: Data slice

    File                         Size (rows)   Size (MB)
    Sensor data aggregated.csv        42,313           4
    Outcomes.csv                      12,483           0

Table 5.5: Data slice file description.

Figure 5-6: Process of forming a slice of data corresponding to a specific computer and then removing the sensor readings after the dispatch date. These sensor readings cannot be used for building the predictive model, as they were recorded after the dispatch event.

5.5 Labeled training data

A predictive model needs features to be trained on. With the data selected and refined, features are now generated from the sensor readings. Features can be generated in a broad number of ways, and feature generation is a field unto itself. A few possibilities include using the readings themselves, applying mathematical functions to the readings, and running specialized algorithms. Regardless of the method used to generate these features, the user or program will end up with labeled training data files. Each row in such a file is a labeled example (hardware failure or not) for a specific computer system, and has the features as columns. Traditionally, feature generation is a manual and time-consuming process. Though feature generation tools do exist, different tools offer different capabilities, and some popular data science toolkits don't offer any at all. In this particular case, two different feature generation methods were considered to develop the labeled training data.
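In miniature, the two feature generation strategies described next can be sketched like this. The data and column names are synthetic, and the first snippet only mimics the spirit of deep feature synthesis (aggregation functions over a computer's readings); it is not the actual DFS implementation:

```python
import pandas as pd

# Toy per-computer sensor readings (hypothetical columns)
readings = pd.DataFrame({
    "ComputerID": ["A", "A", "A", "B", "B"],
    "Date": pd.to_datetime(["2015-01-01", "2015-01-05", "2015-01-09",
                            "2015-01-02", "2015-01-06"]),
    "CPU_PCT": [10.0, 30.0, 50.0, 20.0, 40.0],
})

# Engineered features: functions (mean, sum, std, max, min) applied to
# all of a computer's readings, in the spirit of deep feature synthesis
dfs_style = readings.groupby("ComputerID")["CPU_PCT"].agg(
    ["mean", "sum", "std", "max", "min"])

# Naive features: a single reading per computer (here, the earliest one,
# as an illustration of the "first available reading" idea)
naive = readings.sort_values("Date").groupby("ComputerID").first()

print(dfs_style.loc["A", "mean"])   # 30.0
print(naive.loc["A", "CPU_PCT"])    # 10.0
```

The engineered table summarizes a computer's whole history, while the naive table keeps only one snapshot per computer, which is why the former carries more signal.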
The first method uses the deep feature synthesis algorithm, presented by Kanter and Veeramachaneni [4], to create relevant features for the model. Features created with this algorithm include functions applied to the sensor readings: means, sums, standard deviations, and maximum and minimum values, among others. The output file is DFSFeatures.csv. In total, 650 different features were created with this algorithm. Figure 5-7 shows the structure of the created table.

In the second method, Microsoft Excel was used to filter the unique Computer IDs. We selected the first entry for every Computer ID in the file, organized in descending order by time. This data set represents the first reading of data available for prediction. Features generated using these first readings are considered naïve features. The output file is NaïveFeatures.csv, which includes all sensor data, along with a label indicating whether the hardware failed, as pictured in Figure 5-8. Table 5.6 shows a description of the output files with the created features.

    Stage 5: Labeled training data

    File                Size (rows)   Size (MB)
    NaïveFeatures.csv        12,483           5
    DFSFeatures.csv          12,483          40

Table 5.6: Feature table description.

Figure 5-7: Labeled training data table (DFSFeatures.csv) has the Computer ID field, 650 engineered feature fields, and Label.

Figure 5-8: Labeled training data table (NaïveFeatures.csv) has the Computer ID field, 110 sensor fields, and Label.

5.6 Models and results

The last stage consists of employing a machine learning algorithm, which uses the labeled training data to train a model. Machine learning is a broad topic in itself, and as previously mentioned, there exists a plethora of model-building algorithms and approaches, depending on the problem type. Once a model is trained, it is scored and evaluated according to its ability to predict.
Depending on our model's accuracy (WA and FAR, as explained in Chapter 4), we can continue iterating with other features and tuning algorithmic parameters until the desired accuracy is obtained. Different tools were used to create these models. We first used FeatureLab, which generated a tenfold cross-validated model that achieved a 0.72 ± 0.04 WA (AUC). The model used a Stochastic Gradient Descent (SGD) algorithm. Some of the most relevant features were:

• SUM(Readings.CPU 0 PCT)
• LAST(Readings.AC Adapter Type W)
• MEAN(Readings.S4 mins)
• STD(Readings.Avg ThreadCount)

Figure 5-9 shows a summary of the files, tables, and characteristics that result from following the data science process as previously described.

Figure 5-9: Data science process and results. Note the reduction in size of data from 3.6 GB to 40 MB and 5 MB by the time we bring it to a machine learning modeling method.

After finalizing the data science approach, we developed this proposed data science process, and carefully laid out the details to enable it to be applied to any prediction problem. Figure 5-10 shows the characteristics of the proposed end-to-end data science solution that arises from this process.

Figure 5-10: Proposed end-to-end data-driven solution.

Chapter 6

Machine Learning Tools

In this chapter, we utilize different platforms and software, referred to as "tools," to generate predictive models for our problem, while taking note of their differences in capabilities. We utilize Microsoft Azure, Amazon Machine Learning, IBM Watson Analytics, Nutonian Eureqa, BigML, and Skytree. Each of these platforms expects labeled training data as input.[1] The labeled training data are generated using two different methods:

1. We used FeatureLab's Deep Feature Synthesis (DFS) algorithm [4] to generate features from the sensor readings and form labeled training examples. We call these DFSFeatures.

2.
We created features by following a naïve approach. We took the first available sensor readings for each data slice (see Section 5.4). We call these NaïveFeatures.

Training data summary: It is important to note that the data from the data slices is imbalanced: there are 9,878 cases of "0" and 2,604 cases of "1." This means that 79% of the examples in our target, "Label," are non-failures, and 21% are failures.

Goals: Each of these tools offers a variety of services for data in addition to machine learning and evaluation of the trained models. They can be broadly categorized into 5 functions:

1. Ingestion and data visualization.
2. Data preparation.
3. Modeling and tuning parameters for training models.
4. Model evaluation.
5. Model analysis.

With our experiments, our core focus is on loading the data, training a machine learning model, and evaluating the trained model in terms of predictive accuracy. Thus we limit ourselves to stages two, three, and four. Our three goals are to evaluate:

1. Whether machine learning services provided by these tools can overcome the lack of feature engineering.
2. For the purposes of modeling and evaluation, what value these tools add compared to open-source tools like scikit-learn-based machine learning software.
3. Which part of the end-to-end data science process these tools address.

What we are not evaluating:

1. Usability and user interface.
2. Input data format and size.
3. Model analysis.
4. Deployment functionality.
5. Language integration.

[1] Except FeatureLab, which can generate features and could start with the data slices as input.

6.1 Ingesting and preparing the data

Each tool had different settings for uploading the labeled training data and manipulating it afterwards. The data preparation steps we performed in the specific tools are summarized in Table 6.1.

Table 6.1: Data ingestion and preparation. For all tools, labeled training data was uploaded as a CSV file.
Some tools offered options to select columns and features, handle missing values, among others. This table shows the specific steps we took in each tool to prepare the data for machine learning.

    Tool                               Availability   Data Preparation
    Microsoft Azure Machine Learning   SaaS           - Drop columns with entirely missing values
                                                      - Missing values set to mean
                                                      - Feature selection: select top 45% of features using
                                                        Pearson correlation between columns and label (target)
    Amazon Machine Learning            SaaS           - Column selection possible
                                                      - Modify column data type (Computer ID to categorical,
                                                        Label to binary)
    BigML                              SaaS           -
    IBM Watson                         SaaS           - Reduced DFSFeatures to first 5,014 rows (40% of
                                                        original size)[2]
                                                      - Used Azure to identify top 49 influential features using
                                                        Pearson correlation between sensor data and label (target)[3]
    Nutonian Eureqa                    Desktop        - Normalize data[4]
                                                      - Missing values set to mean
    Skytree                            Desktop        - None

[2] The tool imposed a constraint on the data size we could feed to it. This was because we were using a free version.

[3] The limit on the number of features to be used was placed by this tool. However, no feature selection process or guidance was provided. Again, this limitation was perhaps due to the free version we were using.

[4] For automatically-detected columns, based on the data; subtracted the mean and divided by the standard deviation.

6.2 Modeling and choosing parameters

Once the labeled training data was uploaded and prepared, we built predictive models by setting the hyperparameters listed in Table 6.2.

Table 6.2: Modeling and parameter choices. The tools allowed different levels of customization for hyperparameters. Highlights: Skytree offered automatic modeling techniques and parameter tuning. IBM Watson did not allow setting parameters.
    Azure ML (Decision Tree):
      - Resampling method = bagging
      - Number of decision trees = 8
      - Maximum depth of decision tree = 32
      - Number of random splits per node = 128
      - Minimum number of samples per leaf node = 1

    Azure ML (Neural Network):
      - Number of hidden nodes = 100
      - Learning rate = 0.1
      - Number of learning iterations = 100
      - Initial learning weights diameter = 0.1
      - Momentum = 0
      - Normalizer type = MinMax

    Amazon ML (Logistic Regression):
      - Maximum ML model size = 2000 MB
      - Maximum data passes = 100
      - Regularization = L2
      - Regularization amount = 1e-4
      - Data shuffle = Auto

    BigML (Decision Tree):
      - Object weights 1 to 19 (Label 1)
      - Object weights 19 to 1 (Label 0)
      - Node threshold = 512

    IBM Watson (Decision Tree):
      - Automatic

    Nutonian (Symbolic Regression[5]):
      - Chosen formula building blocks: constant, input variable, addition,
        subtraction, multiplication, division, logistic function, step function,
        if-then-else, minimum, maximum

    Skytree (Gradient Boosted Trees):
      - Automodel (can choose GBT, RDF, GLM, or SVM); Skytree chose GBT[6]
      - Search iterations = 10
      - Classifier testing; F-score = 0.1

[5] Symbolic regression is a specialized machine learning algorithm developed to identify non-linear functional relationships in the data.

[6] Model parameters selected by Skytree. Parameters for FirstSensorData: number of trees = 58, tree depth = 0, max splits = 14, learning rate = 0.104905, regularization bins = 141, probability threshold = 0.2152. Parameters for DFSFeatures: number of trees = 234, tree depth = 4, learning rate = 0.126931, regularization bins = 192, probability threshold = 0.1984.

6.3 Evaluating the trained models

Next, we evaluated the trained models. They were evaluated against the data they were trained with (train data) and against data they were not trained with (test data). Different tools offered different options for the evaluations. Some offered the ability to split the data into train and test via a parameter.
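Several of the parameter choices above (for example, BigML's object weights) respond to the label imbalance in the training data (9,878 non-failures vs. 2,604 failures, per the training data summary). A quick sketch of that imbalance and of one common inverse-frequency weighting scheme follows; the derived weights are purely illustrative, not the weights any particular tool used:

```python
# Label counts from the training data summary in Chapter 6
n_neg, n_pos = 9878, 2604
total = n_neg + n_pos

# Share of failures in the data (~21%, matching the text)
pos_share = n_pos / total
print(round(pos_share * 100, 1))

# Inverse-frequency class weights a learner could use to compensate:
# each class is weighted by total / (2 * class_count)
w_pos = total / (2 * n_pos)
w_neg = total / (2 * n_neg)
print(round(w_pos / w_neg, 2))  # positives weighted ~3.79x negatives
```

Under this scheme the minority "failure" class receives roughly 3.8 times the weight of the majority class.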
Some offered the ability to do k-fold cross-validation. The tools offered different settings, as shown in Table 6.3.

Evaluation metrics: The trained predictive models were evaluated with the following metrics:

1. AUC (area under the curve): The AUC is a way to measure the accuracy of the model. As seen in Figure 6-1, it is the total area under the ROC (receiver operating characteristic) curve. The ROC curve is plotted on a graph with the false-positive rate on the x-axis and the true-positive rate on the y-axis. The area measures discrimination, which is the ability of the model to correctly classify one category against another.

Figure 6-1: The AUC metric is the area under the ROC curve.

2. FPR (false-positive rate): The FPR is the fraction of examples belonging to the negative class that have been incorrectly classified as positive. It is calculated as FPR = FP / (FP + TN), where FP is the number of false positives and TN the number of true negatives.

3. Precision: Also known as the "positive predictive value," precision is calculated as Precision = TP / (TP + FP), where TP is the number of true positives and FP the number of false positives.

4. Recall: Also known as the "true-positive rate," recall is calculated as Recall = TP / (TP + FN), where TP is the number of true positives and FN the number of false negatives.

Table 6.3: This table shows the evaluation techniques, which included cross-validation for some tools, as well as splitting the data into train data and test data for building and evaluating the predictive models. Data was always split as 70% for training and 30% for testing. Only three tools (Nutonian, Skytree, and Azure) offered the ability to do cross-validation.
    Tool                               Evaluation technique
    Microsoft Azure Machine Learning   - Cross-validation with 10 folds
                                       - Splitting*
    Amazon Machine Learning            - Splitting**
    BigML                              - Splitting
    IBM Watson                         - Automatic**
    Nutonian                           - Automatic cross-validation
                                       - Splitting*
    Skytree                            - Cross-validation with 10 folds
                                       - Splitting*

    * Offers option to change split settings
    ** Does not offer option to change split settings

6.4 Results

Each predictive model was evaluated against these four metrics. The models were evaluated against the train data (70% of the data) and the test data (30% of the data), as shown in Table 6.4 and Table 6.5.

    Tool (Algorithm)                    Train data (70%)                  Test data (30%)
                                        AUC     FPR    Prec.   Recall    AUC     FPR    Prec.   Recall
    Azure ML (Decision Tree)            0.639   0.010  0.365   0.027     0.641   0.010  0.426   0.037
    Azure ML (Neural Network)           0.617   0.011  0.462   0.046     0.629   0.010  0.486   0.046
    Amazon ML (Logistic Regression)     0.997   0.010  0.961   0.932     0.610   0.010  0.431   0.028
    BigML (Decision Tree)               0.770** 0.011  0.866   0.272     0.592** 0.020  0.263   0.027
    Skytree (Gradient Boosted Trees)    0.822   0.014  0.810   0.235     0.826   0.012  0.844   0.238

Table 6.4: Performance of different tools for our problem with NaïveFeatures. The predictive models created by each tool were evaluated with different datasets for their performance. The datasets were train data, comprising 70% of the data in the original NaïveFeatures, and test data, comprising the remaining 30%, which was not used to train the model. Some tools were evaluated with 100% of the data since they did not offer the option to split the data, as seen in Table 6.3.
The remaining tools in Table 6.4 were evaluated on all (100%) of the data:

    Tool (Algorithm)                               AUC      FPR     Precision   Recall
    IBM Watson Analytics (CHAID Decision Tree)     0.442    0.284   0.664       -
    Nutonian Eureqa (Symbolic Regression)          0.568**  0.015   0.050       -

    * Estimated metrics based on creating an ROC curve from the confusion
      matrices by varying the label weights
    ** Graphically estimated

    Tool (Algorithm)                    Train data (70%)                  Test data (30%)
                                        AUC     FPR    Prec.   Recall    AUC     FPR    Prec.   Recall
    Azure ML (Decision Tree)            0.708   0.011  0.647   0.096     0.688   0.013  0.581   0.087
    Azure ML (Neural Network)           0.674   0.009  0.637   0.075     0.674   0.011  0.619   0.084
    Amazon ML (Logistic Regression)     0.999   0.010  0.963   0.996     0.680   0.010  0.646   0.067
    BigML (Decision Tree)               0.928** 0.003  0.977   0.410     0.611** 0.067  0.391   0.175
    Skytree (Gradient Boosted Trees)    0.974   0.015  0.935   0.814     0.980   0.014  0.944   0.843

    Evaluated on all (100%) of the data:

    Tool (Algorithm)                               AUC      FPR     Precision   Recall
    IBM Watson Analytics (CHAID Decision Tree)     0.3181   0.623   0.346       -
    Nutonian Eureqa (Symbolic Regression)          0.659**  0.015   0.098       -

    * Estimated metrics based on creating an ROC curve from the confusion
      matrices by varying the label weights
    ** Graphically estimated

Table 6.5: Performance of different tools for our problem with DFSFeatures. The predictive models created by each tool were evaluated with different datasets for their performance. The datasets were train data, comprising 70% of the data in the original DFSFeatures, and test data, comprising the remaining 30%, which was not used to train the model. Some tools were evaluated with 100% of the data since they did not offer the option to split the data, as seen in Table 6.3.

6.5 Key findings

Having utilized these different machine learning tools, several notable findings emerge. We divide these findings into two sets: the first provides insights about the model development process, while the second provides an evaluation of the machine learning components of these tools.
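For reference, the metrics defined in Section 6.3 are straightforward to compute directly from confusion-matrix counts; a small sketch with made-up labels, using scikit-learn only for the AUC:

```python
from sklearn.metrics import roc_auc_score

# Made-up ground truth and predictions for illustration
y_true = [0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 0, 1, 1, 1, 1, 0]

# Confusion-matrix counts
tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

fpr = fp / (fp + tn)        # false-positive rate
precision = tp / (tp + fp)  # positive predictive value
recall = tp / (tp + fn)     # true-positive rate
auc = roc_auc_score(y_true, y_pred)

print(fpr, precision, recall, auc)
```

Computing the metrics this way made it easier to compare tools that reported their evaluations in different formats.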
Model development process

1. Feature engineering provides a significant boost in predictive accuracy. Features developed through the DFS algorithm provided valuable information to create more accurate predictive models.[7] When using DFSFeatures as the labeled training data, there was a notable improvement in the AUC for the ROC curve. As seen in Table 6.6, feature engineering increased AUC by an average of 13% when models were evaluated on train data, and by an average of 11% for models evaluated with test data. Arguably, since the feature generation process used multiple sensor readings rather than just the first available sensor readings (as in the naïve features), this finding may be intuitive.

Table 6.6: AUC with DFSFeatures vs. AUC with NaïveFeatures shows a vast improvement in the models' predictive accuracy when using DFSFeatures instead of NaïveFeatures.

    AUC improvement              Train data (70%)   Test data (30%)
    Azure ML - Decision Tree     10.8%              7.3%
    Azure ML - Neural Network    9.2%               7.2%
    Amazon ML                    0.2%               11.5%
    BigML                        20.5%              3.2%
    Skytree                      19.7%              18.7%
    Average                      12.1%              9.6%

                                 All data (100%)
    Nutonian Eureqa              16.1%              16.1%

    Total Average                12.8%              10.7%

[7] Although we used the deep feature synthesis algorithm in this thesis, we envision similar benefits could be achieved with manual feature engineering.

2. Model selection and hyperparameter tuning cannot overcome the lack of feature engineering. Of all the tools, only Skytree offered the ability to automatically select models and tune their hyperparameters, as seen in Table 6.2. The "automodel" option selected a Gradient Boosting Tree algorithm to train the model out of the available classification algorithms.[8] With NaïveFeatures it was able to generate a predictive model with an AUC of 0.83 (on test data; refer to Table 6.4) and a recall of 0.24 at an FPR of 0.01, while other tools, even with the DFSFeatures, were only able to achieve a 0.72 AUC,[9] as shown in Table 6.5.
However, when Skytree's tool was given DFSFeatures it performed even better, achieving a 0.98 AUC and a 0.84 recall at an FPR of 0.01. This means an increase of 19% in the AUC and a 254% boost in recall, while maintaining an FPR of 0.01. Hence, while model tuning can enable better accuracies even from simple NaïveFeatures, there is a possibility that engineered features can enhance the accuracy and evaluation metrics even further.

Evaluation of the tools

1. All these tools combined only solve one small stage of the data science endeavor. All of the machine learning tools needed labeled training data as input to start training a machine learning model.[10] Therefore, these tools can only be used after the completion of the first four stages of the data science process (described in Chapter 5 and shown in Figure 5-10). A majority of our time was spent in these first four stages, and we imagine that most data science endeavors are similar. Further, if we wish to change the prediction problem from predicting hard drive failures to motherboard failures, much of the work has to be done outside of these tools. Arguably, a tool that can enter the process earlier will provide more flexibility and agility.

2. A few lines of Python code can replace the core machine learning functionality provided by these tools. A few lines of code written using scikit-learn, a popular Python package, can perform most of the modeling functions these machine learning tools offer. These include splitting the data into train and test sets, performing cross-validation, training a model, and testing a model.

[8] Skytree offered the option to choose between GBT, RDF, GLM, and SVM algorithms.

[9] The 0.72 AUC was achieved using FeatureLab's modeling method, as reported in Section 5.6.

[10] Except FeatureLab, which can work with data slices and generate labeled training data.
Below we provide a simple code snippet[11] that, when provided a feature matrix in X and a target label in y, can perform all the functions mentioned here.

```python
from sklearn import svm
from sklearn.cross_validation import cross_val_score
from sklearn.metrics import roc_auc_score
from sklearn.cross_validation import train_test_split

# Splitting the data into test and train
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33)

# Choose a classifier and its hyperparameters
clf = svm.SVC(kernel='linear', C=1)

# Perform cross validation
scores = cross_val_score(clf, X_train, y_train, cv=10,
                         scoring="roc_auc", n_jobs=-1)
cv_auc = scores.mean()

# Learn a classifier via the fit method
clf.fit(X_train, y_train)

# Test the classifier
y_predict = clf.predict(X_test)
print(roc_auc_score(y_test, y_predict))
```

3. The tools provided other important services. These tools offered additional services beyond modeling. One consistent service among all SaaS-based offerings was persistent storage of data and models, allowing a user with an account to log in and reuse previously trained models. Additionally, these tools presented intuitive graphical user interfaces, and enabled easy evaluation and execution of models on new data. However, the tools varied a lot in visualization, exploration, and analysis techniques, both for data and models.

[11] The code snippet was provided by Max Kanter.

4. The tools varied in how they evaluated models.
There is general agreement in the machine learning community about how to train and evaluate classification models, and it was surprising to see that these tools did not follow such standards. Some of them did not offer the capability to cross-validate models. Some did not allow the user to specify the train/test split. Others did not allow splitting the data at all, and expected the user to do that outside the tool. This is described in Table 6.3. These variations made it harder to compare the accuracy of solutions developed on them.

5. For most predictive problems, the size of the labeled training data will likely be small. This allows for training a model on a desktop. Arguably, these tools were developed for large/big data sets and offer scalable approaches for training models. In that sense, they cannot be compared directly to open-source machine learning software like the snippet above, which does not provide immediate scaling. However, in our experience, while the raw data and the study-specific data comprised whole gigabytes, by the time we selected the prediction problem, generated the features, and assembled the labeled training data, the data size had shrunk to a couple of megabytes, as seen in Figure 5-9. Data size aside, we only had about 12,000 training examples, a dataset for which we could train models even on an off-the-shelf computer.

Chapter 7

Financial Impact Analysis of the Proposed End-To-End Data-Driven Solution

Deploying an integrated end-to-end solution for hardware failure prevention can reap tangible monetary benefits for Dell, as well as intangible ones such as better brand reputation and a better customer experience. Moreover, Dell's suppliers are also substantially impacted by this problem. In order to quantify the financial opportunities, the following two business cases are presented with a conservative approach:

1.
The first case encompasses Dell and the hardware component suppliers.

2. The second case only considers Dell.

7.1 Incurred costs

Dell and the hardware suppliers currently incur the costs of manufacturing or buying, storing, and shipping replacement hardware components, and of attending to customers' calls and requests, among others. As mentioned, two business cases are developed to estimate the costs incurred by hardware failures.

In the first case, a moderate approach is taken and only two relevant costs are considered: replacing the physical hardware components and handling the calls related to the failures. The weighted average hardware component cost for the analyzed laptops and desktops is assumed to be $34. ("Laptops and desktops" will be referred to as "systems.") Based on the extracted and cleaned data, systems have an 8.2% yearly failure rate, but 33% of these failures are considered to be due to accidents [30]. Therefore, the "effective" failure rate for hardware components is only 5.5%.

The user base for this case is composed of two segments: new users and previous users. The new user base accords with the deployments of new systems, which amount to 18.8M new systems per year. From the previous user base, only an estimated 30% of users from the two previous years are accounted for, as they are considered to have bought an extended warranty [31], which usually lasts three years. This brings the active user base to a total of 30M+ users. The hardware manufacturers, including Dell, cover all these systems under a warranty. Utilizing the previous numbers, the calculated cost for all hardware component suppliers exceeds $56M per year for failed hardware components.

Another important cost in this case is attending to customer calls.
At a calculated rate of 15M calls per year, an average call time of 25 minutes, and a conservative cost of $17 per hour per call [32], the total cost of attending to these failure-related calls exceeds $107M per year. Therefore, as seen in Figure 7-1, the overall estimated expense to Dell and its suppliers is approximately $163M per year for failed hardware components and the handling of the respective calls. In the second case, Dell covers batteries under warranty. The assumed battery cost is $25. Batteries are considered to account for 1.9% of hardware failures, which means that systems fail 0.16% of the time due to a battery failure. Also taking into account that 33% of the failures are due to accidents, the "effective" failure rate considered is 0.103%. This case uses the same new and previous user bases as the previous case, but also assumes a mix of 50% laptops and 50% desktops, making the active user base 15M+ users. With these numbers and assumptions, the hardware costs incurred by Dell for battery failures are estimated at $0.4M per year. The more relevant cost component, however, is the customer service needed to attend to calls when there is a battery failure. Using the same call center assumptions as before, but considering only 0.3M calls for batteries, the cost of handling these failures is approximately $1M per year. Hence, the hardware and customer service costs for Dell to deal with battery failures are estimated at $1.4M per year, as seen in Figure 7-1.

7.2 Investment

The investment necessary to develop and deploy the data-driven tool is estimated at between $0.75M and $2.1M, depending on the reach of the process and service. These amounts are based on previous project experience, taking into account the required tasks and the time needed to complete them by working function. Most of the investment consists of labor costs.
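As a sanity check, the cost figures from Section 7.1 above can be reproduced with a short back-of-the-envelope script. The inputs are exactly the assumptions stated in the text; small differences from the published figures (over $56M, over $107M, approximately $163M) come from rounding in the original analysis.

```python
# Back-of-the-envelope reproduction of the first business case (Section 7.1).
NEW_SYSTEMS_PER_YEAR = 18.8e6   # new system deployments per year
EXTENDED_WARRANTY_SHARE = 0.30  # share of the two prior years' users retained
YEARLY_FAILURE_RATE = 0.082     # system failure rate per year
ACCIDENT_SHARE = 0.33           # failures attributed to accidents (excluded)
COMPONENT_COST = 34.0           # weighted average component cost, USD
CALLS_PER_YEAR = 15e6           # failure-related support calls
CALL_MINUTES = 25               # average call duration
COST_PER_HOUR = 17.0            # cost per agent hour

# Active user base: this year's deployments plus 30% of the two prior years'.
active_users = NEW_SYSTEMS_PER_YEAR * (1 + 2 * EXTENDED_WARRANTY_SHARE)

# The "effective" failure rate excludes accidental damage.
effective_rate = YEARLY_FAILURE_RATE * (1 - ACCIDENT_SHARE)

hardware_cost = active_users * effective_rate * COMPONENT_COST
call_cost = CALLS_PER_YEAR * (CALL_MINUTES / 60) * COST_PER_HOUR

print(f"active users:   {active_users / 1e6:.1f}M")       # ~30.1M
print(f"effective rate: {effective_rate:.2%}")            # ~5.5%
print(f"hardware cost:  ${hardware_cost / 1e6:.0f}M/yr")  # ~$56M
print(f"call cost:      ${call_cost / 1e6:.0f}M/yr")      # ~$106M
print(f"total:          ${(hardware_cost + call_cost) / 1e6:.0f}M/yr")
```

The same structure with a $25 battery cost, the 0.103% effective battery failure rate, a 15M laptop user base, and 0.3M calls yields the second-case figures.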
As seen before, the technology to integrate this solution is already available on the market and could be utilized at very low cost.

7.3 Impact

The proposed end-to-end solution can be utilized to effectively monitor the hardware components in a system and successfully predict failures with high accuracy. In this case, the models developed with FeatureLab provide an accuracy of 70%+ in correctly predicting the failure of a hard drive, which is the main hardware component that fails. For the impact analysis, the accuracy of the developed model is extrapolated to the rest of the components. Taking a conservative approach, the prevention of hardware failure is assumed to have an overall 50% effectiveness, meaning that half of the predicted failures could be prevented with the proposed end-to-end solution. This 50% figure corresponds to roughly 70% of the 72% prediction accuracy achieved by the current, most optimized model in FeatureLab. This prevention would also reduce by 50% the number of calls for these failed components. For the first case, where Dell and all the suppliers are considered since all components are covered by a warranty, the expected yearly savings on hardware components would be $28M, the customer service calls avoided would amount to $54M, and the net savings over a five-year period would total approximately $407M. For the second case, where batteries are covered by Dell's warranty, the yearly hardware savings would reach $0.2M, the customer service calls avoided for batteries would be worth $0.5M, and approximately $2.7M could be obtained in net savings during the first five years. Figure 7-1 shows the yearly impact if the process to prevent failures were as accurate as the model. This described impact is a conservative estimate of the potential hardware and customer service cost savings this solution can deliver.
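The first-case savings above follow directly from the Section 7.1 cost baseline. A minimal sketch, assuming that baseline (~$56M hardware, ~$106M calls per year) and the upper end of the Section 7.2 investment range, lands within a few million of the reported five-year net, with the gap attributable to rounding of the inputs:

```python
# Sketch of the first-case impact calculation (Section 7.3).
YEARLY_HARDWARE_COST = 56.2e6  # failed-component cost, all suppliers
YEARLY_CALL_COST = 106.25e6    # failure-related call handling cost
EFFECTIVENESS = 0.50           # share of predicted failures actually prevented
INVESTMENT = 2.1e6             # upper end of the estimated investment (7.2)
YEARS = 5

hardware_savings = YEARLY_HARDWARE_COST * EFFECTIVENESS  # ~$28M/yr
call_savings = YEARLY_CALL_COST * EFFECTIVENESS          # ~$53M/yr
net_five_year = (hardware_savings + call_savings) * YEARS - INVESTMENT

print(f"hardware savings: ${hardware_savings / 1e6:.0f}M/yr")
print(f"call savings:     ${call_savings / 1e6:.0f}M/yr")
print(f"5-year net:       ${net_five_year / 1e6:.0f}M")  # ~$404M
```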
More realistically, additional savings in the supply chain, storage costs, and inventories, among others, would also need to be considered.

7.4 Going big

The impact this end-to-end solution can have around the globe for this specific use case is worth considering. IDC forecasts shipments of 101M consumer desktops and 40M consumer laptops for 2016 and cites a previous user base of 209M consumer desktops and 89M consumer laptops from the past two years [33]. Taking into account that 30% of previous users have extended warranties, the active user base totals 230M+. An average hardware component cost of $34 is again assumed. Failure rates are considered the same: 8.2% system failures per year, which with a 33% accident rate leaves an effective failure rate of 5.5%. Considering hardware costs alone, the total worldwide impact of hardware failures exceeds $431M per year. Additionally, if we consider the same user base, the same competitive call center costs, and a conservative 30% call rate for hardware failures globally [34] [35], the impact exceeds $490M. Therefore, the worldwide impact exceeds $920M per year in hardware and customer service calls. Utilizing the 72% accurate prediction model created with FeatureLab, the value of this solution for this specific case surpasses $660M per year. As previously mentioned, this does not take into account any saved costs in inventories, the supply chain, working capital, or other support systems. The financial impact that hardware failures have on manufacturers and service providers warrants serious consideration. There are certainly other scenarios worth considering, in which the players' involvement and the solution's scope vary; for example, an implementation of this predictive solution that reduced calls to Dell across all hardware components, not just batteries, would have a much bigger impact.
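The global estimate can be reconstructed the same way from the IDC figures and the earlier cost assumptions. The small differences from the quoted $431M, $490M, $920M, and $660M figures are again rounding:

```python
# Sketch of the global impact estimate (Section 7.4).
NEW_UNITS = 101e6 + 40e6        # 2016 consumer desktop + laptop shipments
PREVIOUS_UNITS = 209e6 + 89e6   # consumer installed base, past two years
EXTENDED_WARRANTY_SHARE = 0.30
COMPONENT_COST = 34.0
EFFECTIVE_FAILURE_RATE = 0.055  # 8.2% yearly failures less accidents
CALL_RATE = 0.30                # share of active users calling per year
CALL_MINUTES = 25
COST_PER_HOUR = 17.0
MODEL_ACCURACY = 0.72           # best FeatureLab model

active_users = NEW_UNITS + PREVIOUS_UNITS * EXTENDED_WARRANTY_SHARE  # ~230M
hardware_cost = active_users * EFFECTIVE_FAILURE_RATE * COMPONENT_COST
call_cost = active_users * CALL_RATE * (CALL_MINUTES / 60) * COST_PER_HOUR
total = hardware_cost + call_cost
captured_value = total * MODEL_ACCURACY

print(f"active users:  {active_users / 1e6:.0f}M")      # ~230M
print(f"hardware cost: ${hardware_cost / 1e6:.0f}M/yr") # ~$431M
print(f"call cost:     ${call_cost / 1e6:.0f}M/yr")     # ~$490M
print(f"total:         ${total / 1e6:.0f}M/yr")         # ~$920M
print(f"value at 72%:  ${captured_value / 1e6:.0f}M/yr")
```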
These different scenarios could be constructed from the stated assumptions and from Figure 7-1, which summarizes the impact of the results of this study on the different players.

Figure 7-1: Summary of the yearly financial implications for both use cases and an estimated global impact. Different variations of these use cases can be built using the given information.

Finally, a sensitivity analysis was performed to understand how the financial impact varies with the accuracy of the model. The results show a substantial impact that depends on the use case and the industrial player. In the first case, a 10% change in the model's ability to prevent hardware failures would affect Dell and its suppliers by $16M per year. In the second case, a 10% change in the model would affect Dell by $0.1M per year. The biggest influence is the call center cost. These results for the model's effectiveness are summarized in Figure 7-2.

Figure 7-2: Sensitivity analysis of the impact of the model's effectiveness. The call center's call reduction impact is very relevant to Dell's case.

Generally, it is worth noting that the performance of the predictive model has a very important financial impact; hence the relevance of the data, the generated features, the modeling techniques, and the tuning settings.

Chapter 8
Conclusions

The analysis, results, and implementation potential of an end-to-end solution demonstrated throughout this work show real promise for the proposed data science process and for the prevention of hardware failure through sensor data. Dell has many of the necessary tools and capabilities to implement such solutions.

8.1 Key Findings

The findings of this work fall into two areas: prevention of hardware failure at Dell and evaluation of machine learning tools.
Prevention of hardware failure at Dell

The "quality of a model is as good as the data used in its construction" [36], and this work has presented a potential solution with promise for a real and tangible impact for Dell, its customers, and its suppliers. Taking the lessons and observations from the current prevention system and the proposed solution, an improved end-to-end solution would require the following characteristics: 1. Real-time integration of SAC. Data is currently stored in summary files in each system and mostly extracted only when a failure happens. The change needs to be implemented in the following two ways: (a) Real-time extraction of data from the systems' hardware sensors, collected on the server. (b) Compressed storage of historical data from systems, uploaded at defined dates. 2. Real-time integration of new data into constructed models. As new data is imported from systems, the previously created models can be re-evaluated for improved prediction accuracy. 3. Real-time feedback to the user. Models can be exported to a program on the users' systems (such as SAC), or run in the cloud, to monitor for potential hardware failures with high accuracy and a fast response time to the user.

Additional work with more historical data needs to be done in order to improve the developed prediction models for hard drive failures. One of the most limiting findings during data selection was that any failure to be analyzed had no more history than the log for the day on which it failed, which limited the insight that could be obtained from this data.

Machine learning tools

Using the different machine learning tools enabled us to compare their functionalities, which makes the choice of tool particularly dependent on the required uses and goals. Among the tools, this study demonstrated the following findings: 1.
Varied predictive modeling performance, non-standardized metrics, and different visualization options. 2. The tools only solve the last stage of the data science process, requiring much manual work in the earlier stages. 3. These tools are ready to scale, as they offer additional services such as data and model storage and updating. 4. The DFS algorithm was demonstrated to improve the accuracy of the built machine learning models compared with a traditional naïve approach. Using DFS-generated features thus translates into added value for solving prediction problems.

The machine learning tools provide a user-friendly and intuitive platform with easy access for anyone to become a "citizen data scientist." However, the tools showed varied performance and functionalities, and it is important for users to know their modeling, metrics, and visualization requirements when choosing one of these tools.

8.2 Contributions

This work proposes new concepts in the subject of AI, a data-driven solution, the prevention of hardware failure, and the improvement of predictive model performance. The contributions of this work are summarized in the following statements: 1. Recommended an AI taxonomy and framework to facilitate the understanding of AI as a tool and as a strategy in the pragmatic business sense of "Smart Machines." The taxonomy divided AI into the following technologies: machine learning, deep learning, image recognition, NLP and NLI, and prescriptive analytics. It represented AI in three layers: smart infrastructure, smart data, and smart apps and services. 2. A competitive analysis of the different AI technologies and business applications was presented, as well as an estimation of the Smart Machines market. This showed potential opportunities specifically for machine learning applications. 3. This thesis proposes a data science process with six stages to follow when the goal is to predict a specific occurrence.
The specific stages are: raw data, study-specific data extract, problem-specific data, data slices, labeled training data, and models and results. 4. We compared seven different machine learning tools and their predictive models on the particular problem of hard drive failures. The most accurate model on test data (out-of-sample data) was achieved using DFS features with Skytree, with an AUC of 0.98 and a recall of 0.84 at an FPR of 0.01. 5. We demonstrated that DFS-generated training data improved the performance of the built machine learning models, increasing the predictive accuracy, measured by the AUC, by an average of 13% on train data and 11% on test data. 6. This work presents a new approach to preventing hard drive failures using sensor data and following the proposed data science process.

8.3 Recommendations

Through this analysis, it is apparent that Dell has the necessary tools to implement such an end-to-end data science solution and thereby have a real impact on its business and its customers' experience. Additionally, once the impact of a desired technology or prediction problem is understood, the proposed data science process can be followed to develop new use cases. This data-driven process is transferable to other parts of the business and other products.

8.4 Future projects

Sixteen new concepts were developed during a brainstorming session at Dell, and a detailed financial and technological analysis was performed for three more. Four immediate alternatives that can be pursued at Dell are: 1. Implementation of the proposed end-to-end solution for hardware failure prevention. 2. Further data validation of ideas, in particular the other top three concepts that were identified:
• Security
• Serviceability
• Productivity
3. Research into additional Smart Machines opportunities for Dell's potential entrance into this market. 4.
Identify a new prediction problem of interest and follow the proposed data science process to develop a new end-to-end solution for it.

8.5 Conclusions

AI is no longer a faraway, futuristic field, but a real and tangible day-to-day technology that we are utilizing to enhance processes, businesses, and our daily lives. The amount of progress and the number of new tools developed during the past year have been fascinating. As seen throughout this work, Smart Machines and the data science process are very applicable concepts for businesses, with great potential across a plethora of use cases. A great deal of technology is readily available through automated platforms and even open source software, reducing the entry barriers for new players that can disrupt traditional businesses. There are, however, many variations in the functionalities of the available machine learning tools and software. Very importantly, these tools provide realistic improvements over conventional processes and are a powerful alternative that can make everyone a citizen data scientist. A deep dive into the wide, complex, and fascinating field of AI provided a great deal of learning and an entrance into a field that is changing the world as we know it. Utilizing the same core technology, such as machine learning, and following the proposed data science process, there is much opportunity for implementation in a variety of cases. These cases are relevant to many areas, such as preventive maintenance across industries, financial applications, disease treatment and prevention in health care, and prediction of the outcome of any process. Dell, like any business, has many detected opportunities, and this detailed analysis covered just one business case. Many more uses eagerly wait to be explored with this data science process and these machine learning tools.
Appendix A
AI applicable concepts

Bibliography
[1] K. F. Brant and T. Austin, "Hype Cycle for Smart Machines, 2015," Gartner, Tech. Rep., Jul. 2015. [Online]. Available: http://www.gartner.com/document/3099920?ref=solrAll&refval=162870108&qid=74f1fe050fe3befc42150d6ba0d1149a
[2] "June Oven." [Online]. Available: https://juneoven.com
[3] D. Pierce, "This Smart Oven Bakes Perfect Cookies Without Your Help," Jun. 2015. [Online]. Available: http://www.wired.com/2015/06/june-oven/
[4] J. M. Kanter and K. Veeramachaneni, "Deep feature synthesis: Towards automating data science endeavors," in Data Science and Advanced Analytics (DSAA), 2015 IEEE International Conference on. IEEE, 2015, pp. 1–10. [Online]. Available: http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=7344858
[5] M. J. Kane, "Cleveland's action plan and the development of data science over the last 12 years," Statistical Analysis & Data Mining, vol. 7, no. 6, pp. 423–424, Dec. 2014.
[6] T. Urban, "The Artificial Intelligence Revolution: Part 1 - Wait But Why," Jan. 2015. [Online]. Available: http://waitbutwhy.com/2015/01/artificial-intelligence-revolution-1.html
[7] C. Smith, B. McGuire, T. Huang, and G. Yang, "The History of Artificial Intelligence," University of Washington, Tech. Rep., Dec. 2006. [Online]. Available: http://courses.cs.washington.edu/courses/csep590/06au/projects/history-ai.pdf
[8] "IBM - What is big data?" [Online]. Available: http://www-01.ibm.com/software/data/bigdata/what-is-big-data.html
[9] R. Viereckl, D. Ahlemann, A. Koster, and S. Jursch, "Connected Car Study 2015: Racing ahead with autonomous cars and digital innovation," Strategy&, Tech. Rep., Sep. 2015. [Online]. Available: http://www.strategyand.pwc.com/reports/connected-car-2015-study
[10] L.
Mearian, "World's data will grow by 50x in next decade, IDC study predicts," Jun. 2011. [Online]. Available: http://www.computerworld.com/article/2509588/data-center/world-s-data-will-grow-by-50x-in-next-decade--idc-study-predicts.html
[11] M. Mackay, "The Future of Artificial Intelligence," Apr. 2009. [Online]. Available: http://www.bit-tech.net/bits/2009/04/29/the-future-of-artificial-intelligence/1
[12] R. Kurzweil, "The Law of Accelerating Returns," Mar. 2001. [Online]. Available: http://www.kurzweilai.net/the-law-of-accelerating-returns
[13] J. Hagel, J. S. Brown, T. Samoylova, and M. Lui, "From exponential technologies to exponential innovation," Deloitte Center for the Edge, Tech. Rep. 2, 2013. [Online]. Available: http://www2.deloitte.com/content/dam/Deloitte/es/Documents/sector-publico/Deloitte_ES_Sector-Publico_From-exponential-technologies-to-exponential-innovation.pdf
[14] D. W. Cearly, M. J. Walker, and M. Blosch, "The Top 10 Strategic Technology Trends for 2015," Gartner, Tech. Rep., Jan. 2015. [Online]. Available: http://www.gartner.com/document/2964518?ref=feed
[15] "Machine Learning Applications for High Performance Computing - NVIDIA." [Online]. Available: http://www.nvidia.com/object/machine-learning.html
[16] D. W. Cearly, M. J. Walker, and B. Burke, "Top 10 Strategic Technology Trends for 2016: At a Glance," Gartner, Tech. Rep., Oct. 2015. [Online]. Available: https://www.t-systems.com/news-media/gartner-top-10-strategic-technology-trends-for-2016-at-a-glance/1406184_1/blobBinary/gartner_top10.pdf
[17] R. van der Meulen, "Why Should CIOs Consider Advanced Analytics?" Dec. 2014. [Online]. Available: http://www.gartner.com/newsroom/id/2950317
[18] "Artificial Intelligence. Landscape Report and Data." [Online]. Available: https://www.venturescanner.com/artificial-intelligence
[19] "Dell - Our history." [Online]. Available: http://www.dell.com/learn/us/en/vn/our-history
[20] M. J.
De La Merced, "Dell to Buy EMC in Biggest Tech Takeover, a Year in the Making," The New York Times, Oct. 2015. [Online]. Available: http://www.nytimes.com/2015/10/13/business/dealbook/dell-announces-purchase-of-emc-for-67-billion.html
[21] G. Hamerly, C. Elkan, and others, "Bayesian approaches to failure prediction for disk drives," in ICML. Citeseer, 2001, pp. 202–209. [Online]. Available: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.89.3006&rep=rep1&type=pdf
[22] G. Hughes, J. Murray, K. Kreutz-Delgado, and C. Elkan, "Improved disk-drive failure warnings," IEEE Transactions on Reliability, vol. 51, no. 3, pp. 350–357, Sep. 2002.
[23] J. Murray, G. Hughes, K. Kreutz-Delgado, and D. Schuurmans, "Machine Learning Methods for Predicting Failures in Hard Drives: A Multiple-Instance Application," Journal of Machine Learning Research, vol. 6, no. 5, pp. 783–816, 2005.
[24] P. Cheeseman and J. Stutz, "Bayesian Classification (AutoClass)," in Advances in Knowledge Discovery and Data Mining. Menlo Park, CA: AAAI Press, 1995, pp. 158–180.
[25] F. Salfner, M. Lenk, and M. Malek, "A Survey of Online Failure Prediction Methods," ACM Computing Surveys, vol. 42, no. 3, pp. 10:1–10:42, Mar. 2010.
[26] D. Turnbull and N. Alldrin, "Failure prediction in hardware systems," University of California, San Diego, Tech. Rep., 2003. [Online]. Available: http://cseweb.ucsd.edu/~dturnbul/Papers/ServerPrediction.pdf
[27] B. Zhu, G. Wang, X. Liu, D. Hu, S. Lin, and J. Ma, "Proactive drive failure prediction for large scale storage systems," in Mass Storage Systems and Technologies (MSST), 2013 IEEE 29th Symposium on. IEEE, 2013, pp.
1–5. [Online]. Available: http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=6558427
[28] C.-C. Chang and C.-J. Lin, "LIBSVM: A library for support vector machines," ACM Transactions on Intelligent Systems and Technology, vol. 2, no. 3, pp. 1–27, Apr. 2011. [Online]. Available: http://dl.acm.org/citation.cfm?doid=1961189.1961199
[29] J. Li, X. Ji, Y. Jia, B. Zhu, G. Wang, Z. Li, and X. Liu, "Hard drive failure prediction using classification and regression trees," in Dependable Systems and Networks (DSN), 2014 44th Annual IEEE/IFIP International Conference on. IEEE, 2014, pp. 383–394. [Online]. Available: http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=6903596
[30] A. Sands and V. Tseng, "1 in 3 Laptops fail over 3 years," SquareTrade, Inc., Tech. Rep., Nov. 2009. [Online]. Available: https://www.squaretrade.com/htm/pdf/SquareTrade_laptop_reliability_1109.pdf
[31] "Consumer Reports: Extended Warranties," ConsumerReports.org, Tech. Rep., 2010. [Online]. Available: http://www.consumerreports.org/cro/magazine-archive/2010/december/electronics/best-electronics/extended-warranties/index.htm
[32] "Global Efficient Call Center Salary," 2016. [Online]. Available: http://www.indeed.com/salary/Global-Efficient-Call-Center.html
[33] J. Gaw, "Pivot Table: Worldwide Consumer Market Model, 2010–2019," Aug. 2015. [Online]. Available: http://www.idc.com/getdoc.jsp?containerId=258735
[34] K. Tofel, "Only 5 percent of IBM's Mac and iOS users call support, compared to 40 percent of Windows users," Oct. 2015. [Online]. Available: http://www.zdnet.com/article/only-5-of-ibms-mac-and-ios-users-call-support-compared-to-40-of-windows-users/
[35] "Desktop operating system market share 2012-2015." [Online]. Available: http://www.statista.com/statistics/218089/global-market-share-of-windows-7/
[36] Y. Ali, "A Fuzzy Model For Surface Grinding," Ph.D. dissertation, The University of Sydney, Sydney, Mar. 2007. [Online]. Available: http://yasserali.org/?cat=30