A Brief Review of Alternative Uses of Data Mining: Education, Engineering, & Others
Kendra J. Ahmed, Mahbub K. Ahmed, Scott McKay
Southern Arkansas University
Abstract
Data mining is the process of finding hidden information in large amounts of data. Its most common uses have been to help businesses gain and maintain a competitive advantage and to answer questions, solve problems, or make informed decisions. More recently, other fields, including engineering, medicine, education, and physics, have been turning to data mining for the same purposes. This paper reviews alternative uses of data mining in education, medicine, and engineering. Data mining will continue to grow and expand, both because technological advances will keep aiding the data mining process and because the need to find hidden information in large amounts of data keeps increasing.
Introduction and Overview
Data mining is big business, which is no surprise to anyone who has heard of it. According to Wikibon, the data mining industry is estimated to reach $50 billion by 2017 [1]. As more people use computers and as computers become more powerful in their computing and storage capabilities, businesses in particular have been able to accumulate copious amounts of data about their customers in their data warehouses. This data will only grow as businesses collect more of it, as computers get faster and more prominent in daily life, and as data warehouses increase in size and decrease in price.
Mining this stored data allows businesses to make predictions about future trends and customer behaviors; it goes beyond analyzing what has happened in the past. It helps businesses make smarter decisions. This is a glimpse of the most popular and most obvious use of data mining today. However, which industries outside of business could benefit, and are benefiting, from data mining? This paper looks briefly at the history of data mining, then at three different industries that are using data mining to solve problems and answer questions, and finally at the future of data mining.
Proceedings of the 2015 ASEE Gulf-Southwest Annual Conference Organized by The University of Texas at San Antonio Copyright © 2015, American Society for Engineering Education
Brief History of Data Mining
Data mining has gained a lot of popularity in the past couple of decades; however, it is not a new idea or practice. Data mining is defined as the process of discovering patterns in large amounts of data, or of uncovering hidden information in large amounts of data [2]. The term “data mining” itself began showing up in the late 1980s, and by the early 1990s it was recognized as a subprocess of KDD, Knowledge Discovery in Databases [3]. In 2007 the term predictive analytics came into use, and in 2011 the term data science followed. Other terms used for data mining include data archaeology, information harvesting, information discovery, and knowledge extraction.
The actual practice of looking for hidden information in data began in the 1700s with Bayes’ Theorem. Bayes’ Theorem is a theorem in probability and statistics that relates the probability of a hypothesis given observed evidence to the prior probability of that hypothesis and the likelihood of the evidence [4].
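To make the theorem concrete, the short Python sketch below applies Bayes' Theorem, P(H|E) = P(E|H) P(H) / P(E), to a made-up screening scenario. The function name `posterior` and all of the probabilities are illustrative assumptions, not values taken from any study cited in this paper.

```python
def posterior(prior, likelihood, false_positive_rate):
    """Return P(H | E) by Bayes' Theorem.

    prior               -- P(H), belief in the hypothesis before seeing evidence
    likelihood          -- P(E | H), chance of the evidence if H is true
    false_positive_rate -- P(E | not H), chance of the evidence if H is false
    """
    # Total probability of the evidence, P(E), summed over both cases of H.
    evidence = likelihood * prior + false_positive_rate * (1.0 - prior)
    if evidence == 0.0:
        raise ValueError("evidence has zero probability under these inputs")
    return likelihood * prior / evidence


# Hypothetical numbers: 1% of records have the condition, and the test flags
# 90% of true cases and 5% of healthy cases.
p = posterior(prior=0.01, likelihood=0.90, false_positive_rate=0.05)
print(f"P(condition | positive test) = {p:.3f}")
```

With these assumed inputs the posterior comes out at roughly 15%, far below the 90% detection rate, because the prior is small. This updating of a prior belief by new evidence is what the theorem formalizes.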
This theorem is very important in the mathematical manipulation of conditional probabilities. In the 1800s the manual extraction of patterns from data continued with regression analysis, a statistical process for estimating the relationships between variables [5]. Both of these manual methods have been in use for centuries.
The rise in popularity that data mining has seen since the 1990s is mostly due to the rapid increase in the number of computers and to advances in technology that have led to greater processing and data storage capabilities. These factors have made it possible for an interested party (a business with its customers, a medical facility with its patients, a university with its students) to collect more and more data and to analyze it more quickly to answer questions and solve problems. Data mining has increasingly become an automated process, spurred on mainly by advances in computer science in areas such as neural networks, cluster analysis, decision trees, decision rules, genetic algorithms, and support vector machines.
The popularity of data mining over the past decades is expected to continue. According to Wikibon, the data mining industry is expected to reach at least $50 billion by 2017 [1]. The popularity that started in the 1990s continued because of the ways businesses found to capitalize on the process: technological advances in the 1990s and 2000s made computers more powerful in computing and storage, computers became more prominent in daily life in the early 2000s, and data warehouses increased in size and decreased in price. Data storage in data warehouses is now measured in terabytes and petabytes.
Some of the world’s leading businesses, such as Apple, Walmart, and eBay, use the largest data warehouses, all measured in petabytes. Apple, for example, stores information on every customer and their interactions in iTunes so that it knows who is who and what each one is up to [6]. Walmart’s data warehouse not only informs the company about its customers but also gives information to Walmart’s suppliers, so they know how much shelf space they have for their products, which products are selling, and how fast they are selling [6].
Many other businesses are finding ways to collect and store data on their customers. An exciting recent example comes from Disney, which launched MagicBands equipped with radio-frequency (RFID/NFC) technology. Guests wear these MagicBands on the Disney grounds. The bands allow customers to select a time for a ride and then jump to the front of the line at that time. They also allow customers to unlock their hotel door, enter the parks, and access their Disney PhotoPass photos taken by Disney’s professional photographers. This lets Disney track what a particular visitor does while on the Disney grounds. Disney can then understand which rides and which characters are increasing or decreasing in popularity, and at what times customers come and go in the hotels and restaurants.
From these examples it can be seen that businesses use data mining to make predictions such as “When I send out this next promotion, which of my customers are most likely to take advantage of it?” or “When a park visitor comes onto the grounds, what is the first thing they are most likely to visit?” Data mining has allowed businesses to make smarter and more informed decisions.
Survey of Alternative Uses of Data Mining
Data mining has most famously been used in business, but the question is which other industries can benefit from it. In recent years many different industries have been researching and beginning to implement data mining to help them answer questions and solve problems. Examples of these alternative uses are found in education, engineering, and medicine.
Data Mining in Education
The first alternative use of data mining that this paper looks at is in education, where it is known as Educational Data Mining or EDM. EDM has been defined as an emerging discipline concerned with developing methods for exploring the unique types of data that come from educational settings, and with using those methods to better understand students and the settings in which they learn [7]. As online learning environments grow in popularity, data becomes easier to collect and data mining in education becomes more common.
In many classes professors require students to work together, and it is very important that students learn how to work well in teams. In one study, Kay et al. [8] applied data mining to student group interaction data to look for significant sequences of activity. These sequences could be used to build a tool that flags interaction sequences indicative of problems as well as those indicative of team success.
This tool would then be used to warn student teams of problems in their teamwork so that they could learn from them, and to indicate steps teams could take to improve and ensure success.
In another study [9], data mining was applied to two different data sets: interactions between students and their professors, and interactions between fellow students. The study found similarities and differences between the way students interacted with their professors in online questions and the way they interacted with their fellow students in online chat messages; it also identified disciplinary differences in the students’ online participation. In addition, the study found a correlation between the number of questions a student asked and the student’s final grade. It suggests that applying data mining and text mining to online learning data can produce considerable insight into students’ learning behaviors. However, the study also noted some shortcomings of data mining in education [9]. Data mining in education is a relatively new application and therefore does not yet work as seamlessly as it does in many business applications.
In a different study, Pechenizkiy et al. [10] discussed the shortcomings of data mining in education further. They looked at students’ online assessment data and made suggestions regarding data mining in education, one of which was to tailor data mining to be a better fit for education. This shows that data mining has a lot of potential in education in the future.
Data Mining in Medicine
The second alternative use of data mining that this paper looks at is in the medical industry.
In the medical industry there are abundant sources of data, mostly in the form of Electronic Health Records (EHRs), health insurance claims, medical imaging databases, disease registries, spontaneous reporting sites, and clinical trials. With all this data available, data mining has become a critical part of the medical industry. According to Data Mining for Medicine and Healthcare, “On the one hand, EHR offers the data that gets data miners excited, however on the other hand, is accompanied with challenges such as 1) the unavailability of large sources of data to academic researchers, and 2) limited access to data-mining experts.” Data mining is being used in many different areas of medicine [11]. It also has a strong future in health care systems, where it can improve patient care in general [12].
The first study reviewed, by Alizadehsani et al. [13], is in cardiovascular medicine. Cardiovascular disease is one of the leading causes of death, which makes a correct and early diagnosis extremely important. The study notes that angiography is currently the most accurate diagnostic method but is extremely costly and has many side effects for the patient. It reported that data mining algorithms achieved a high rate of diagnostic accuracy while being less costly and avoiding those side effects.
Similarly, Breault et al. [14] applied data mining to diabetes, which, like cardiovascular disease, is a major health problem in the USA. Registries of diabetes patients have a long history; they were first stored in databases and are now stored in data warehouses. Breault et al. mined one such diabetes data warehouse from New Orleans, with over 30,000 diabetes patients.
The authors looked for new associations that would be helpful for clinicians, especially in predicting who might become a diabetes patient in the future.
Another study, by Uçar et al. [15], applied data mining to the diagnosis of Mycobacterium tuberculosis. Currently the most accurate way to test for Mycobacterium tuberculosis is a phlegm test, but the results take forty-five days to come back. The study aimed to use a data mining approach to diagnose as accurately as possible, and to help answer the question of whether it is reasonable to start tuberculosis treatment on a suspected patient without waiting for the test results. The results showed that the authors’ ANFIS model was an accurate and reliable method for classifying tuberculosis patients.
Data Mining in Engineering
The third alternative use of data mining that this paper looks at is in engineering, where data mining is as important as in other industries. Many areas of engineering, such as manufacturing, material science, operational research, and engineering design, can benefit from data mining. In engineering, as in many other areas of science, advances in technology are making huge amounts of data common, and as a result many scientists and engineers are turning to data mining to find the hidden information in that data.
In the first study reviewed, the authors put data mining to use in a practical intelligent database. Engineering simulations often require the physical properties of materials. The authors built an intelligent database system of such material properties and discussed its architecture [16].
Another study, by Yang et al. [17], describes how technological advances in the processing of nanoceramics led to the use of data mining. A synthesis platform for nanoceramics was recently developed based on the former HiTCH synthesis technology. With large numbers of nanoceramics being made and formulated into appropriate libraries, large amounts of useful data can be collected. The authors described the information flow system of RAMSI and the data mining system that supports discovery, QSAR, and the modeling and design of experiments. These included the clustering of Raman spectra, the interpretation of XRD measurements, and the building of QSAR models linking XRD data to photocatalytic properties.
In engineering, manufacturing processes can be enhanced through the proper use of data mining. A study by Çiflikli et al. [18] shows such an improvement in carpet production, where data mining was used to gain a competitive advantage. As a result of their research the manufacturing process was redeveloped.
Data mining can also be useful in renewable power generation, such as wind turbines. A wide range of data on wind properties and on the control variables related to wind turbine performance can be analyzed with data mining techniques to enhance performance. One study [19] used data mining together with an evolutionary strategy algorithm to find a way to maximize the power output of a wind turbine. The study shows that pitch angle optimization can lead to maximized wind power output.
Data mining also has potential for the prediction and analysis of the vast amount of data related to the mechanical behavior and thermo-physical properties of materials. One study [20] successfully used data mining approaches to predict mechanical properties of granites, such as uniaxial compressive strength and deformation modulus.
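The kind of property prediction described above can be illustrated with the simplest predictive model, ordinary least-squares regression, which was introduced earlier in this paper as one of the oldest pattern-extraction methods. The sketch below is not the method used in [20]; the function name `fit_line`, the feature (porosity), the target (compressive strength), and every number are invented purely for illustration.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squared deviations and cross-deviations from the means.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Invented sample: porosity (%) versus compressive strength (MPa).
porosity = [1.0, 2.0, 3.0, 4.0, 5.0]
strength = [140.0, 128.0, 121.0, 109.0, 102.0]

slope, intercept = fit_line(porosity, strength)
predicted = slope * 3.5 + intercept  # prediction for an unseen porosity of 3.5%
print(f"strength = {slope:.2f} * porosity + {intercept:.2f}")
print(f"predicted strength at 3.5% porosity: {predicted:.2f} MPa")
```

A real study would use many more samples and features, and typically a more flexible model, but the workflow is the same: fit a model on measured samples, then use it to predict the property for samples that have not been tested.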
Different thermo-physical properties, such as viscosity, heat capacity, thermal conductivity, and density, have been predicted using a data mining approach to optimize the performance of a vapor compression refrigeration system [21].
One area of engineering where data mining has really been taking off is quality improvement programs. These programs require the collection and analysis of data to solve problems, which is a formula for success in data mining. Köksal et al. [22] carried out a comprehensive literature review. They reviewed literature from 1997 to 2007 on data mining applications in the manufacturing industry and provided analyses of selected quality tasks.
Data mining also has potential in building automation systems, where it can improve the performance of building operation. Xiao et al. [23] performed such an investigation, mining data from the automation system of the largest building in Hong Kong. Another study [24] shows that data mining based energy modeling can help improve the energy efficiency of building design.
Wastewater management is another important area of engineering where data mining can improve a process. A European Union funded research group worked on improving the reliability and efficiency of monitoring and control in an anaerobic digestion water treatment plant via data mining. The four themes that Dixon et al. [25] focused on were “1.) experience gained in the data mining exercise, 2.) the use of confidence and prediction intervals, 3.) prospects for generalization over different sizes and types of anaerobic digester, and 4.) relationship to the overall supervision system development in the project.”
Brief Look at the Future of Data Mining
The future of data mining will continue to be very promising. As technological advances are made, computers become more powerful, and data warehouse storage increases while prices decrease, the need for data mining to maintain or gain a competitive advantage will only increase. Businesses and other industries will always be looking for ways to understand data and to find answers to questions and problems in that data. If businesses can make smarter and more informed decisions, they can reduce costly mistakes and conserve their resources. As mentioned before, Wikibon estimates that the data mining industry will reach at least $50 billion by 2017, which suggests that the industry will continue to grow rapidly.
In the near future data mining will probably not change drastically. It will continue to be used mostly in business applications, to help businesses gain or maintain a competitive advantage and to answer questions or solve problems. In the moderate future we might see changes like those seen with other technological advances: data mining may become easier to use and more widely used. The distant future is where the most exciting changes to data mining will probably be seen. Based on the alternative uses discussed in this paper, one example might be computers being used to find new ways to diagnose and treat diseases.
One concern about the future of data mining, as more and more data is collected on people, is privacy. This leads to the question of how much recorded data on a person is too much. As data mining progresses and more data is recorded on people, this question will most likely come up in government.
Summary and Conclusion
As this paper has shown, data mining has been gaining a lot of popularity not only in business applications but also in other areas such as education, engineering, and medicine. The paper surveyed some of the alternative ways that data mining has been used in industries other than business. It started with how data mining is being used in education to understand students, then looked at how it is being used in medicine to help doctors and researchers find new ways to diagnose and treat patients, and finally looked at how it is assisting engineers in improving research methods, production methods, and practical applications like intelligent databases. Data mining is not a new idea; people have long looked for hidden information in large amounts of data, first manually and now with more automated methods. In conclusion, data mining has a bright future, especially as more technological advances are made and more alternative uses are found in areas other than business applications.
References
1. "Big Data Vendor Revenue and Market Forecast 2012-2017." Wikibon. 19 February 2013. Web. 30 November 2014. <http://wikibon.org/wiki/v/Big_Data_Vendor_Revenue_and_Market_Forecast_2012-2017>.
2. "Data Mining." Wikipedia. Wikimedia Foundation, 27 November 2014. Web. 30 November 2014. <http://en.wikipedia.org/wiki/Data_mining>.
3. Coenen F., "Data mining: past, present and future." The Knowledge Engineering Review (2011): Vol. 26:1, 25–29. Cambridge University Press. 30 November 2014.
4. "Bayes’ Theorem." Wikipedia. Wikimedia Foundation, 23 November 2014. Web. 30 November 2014. <http://en.wikipedia.org/wiki/Bayes%27_theorem>.
5. "Regression Analysis." Wikipedia. Wikimedia Foundation, 25 November 2014. Web. 30 November 2014. <http://en.wikipedia.org/wiki/Regression_analysis>.
6. "Why Apple, eBay, and Walmart Have Some of the Biggest Data Warehouses You’ve Ever Seen." Gigaom. Gigaom Inc, 27 March 2013. Web. 30 November 2014. <https://gigaom.com/2013/03/27/why-apple-ebay-and-walmart-have-some-of-the-biggest-data-warehouses-youve-ever-seen/>.
7. "Educational Data Mining." International Educational Data Mining Society, Web. 30 November 2014. <http://www.educationaldatamining.org/>.
8. Kay J., Maisonneuve N., Yacef K., Zaïane O. (2006). Proceedings of the Workshop on Educational Data Mining at the 8th International Conference on Intelligent Tutoring Systems (ITS 2006) (pp. 45-52).
9. He W. (2012). Examining Students’ Online Interaction in a Live Video Streaming Environment Using Data Mining and Text Mining. Computers in Human Behavior.
10. Pechenizkiy M., Calders T., Vasilyeva E., De Bra P. (2008). Mining the student assessment data: Lessons drawn from a small scale case study. Educational Data Mining 2008, 187.
11. "Overview." Data Mining for Medicine and Healthcare. Web. 30 November 2014. <http://www.dmmh.org/>.
12. Ramon J., Fierens D., Güiza F., Meyfroidt G., Blockeel H., Bruynooghe M., Berghe G.V.D., "Mining data from intensive care patients", Advanced Engineering Informatics, Volume 21, Issue 3, July 2007.
13. Alizadehsani R., Habibi J., Hosseini M.J., Mashayekhi H., Boghrati R., Ghandeharioun A., Bahadorian B., Sani Z.A., "A data mining approach for diagnosis of coronary artery disease", Computer Methods and Programs in Biomedicine, Volume 111, Issue 1, July 2013, Pages 52-61.
14. Breault J.L., Goodall C.R., Fos P.J., "Data mining a diabetic data warehouse", Artificial Intelligence in Medicine, Volume 26, Issues 1–2, September–October 2002, Pages 37-54.
15. Uçar T., Karahoca A., "Predicting existence of Mycobacterium tuberculosis on patients using data mining approaches", Procedia Computer Science, Volume 3, 2011, Pages 1404-1411.
16. Gu Q., Zhong R., Ju D., "Development of materials database system for cae system of heat treatment based on data mining technology", Transactions of Nonferrous Metals Society of China, Volume 16, Supplement 2, June 2006, Pages s572–s576.
17. Yang Y., Lin T., Weng X.L., Darr J.A., Wang X.Z., "Data flow modeling, data mining and QSAR in high-throughput discovery of functional nanomaterials", Computers & Chemical Engineering, Volume 35, Issue 4, 7 April 2011, Pages 671-678.
18. Çiflikli C., Kahya-Özyirmidokuz E., "Implementing a data mining solution for enhancing carpet manufacturing productivity", Knowledge-Based Systems, Volume 23, Issue 8, December 2010, Pages 783-788.
19. Kusiak A., Zheng H., Song Z., "Power optimization of wind turbines with data mining and evolutionary computation", Renewable Energy, Volume 35, Issue 3, March 2010, Pages 695-702.
20. Martins F.F., Begonha A., Braga M.A.S., "Prediction of the mechanical behavior of the Oporto granite using Data Mining techniques", Expert Systems with Applications, Volume 39, Issue 10, August 2012.
21. Küçüksille E.U., Selbaş R., Şencan A., "Data mining techniques for thermophysical properties of refrigerants", Energy Conversion and Management, Volume 50, Issue 2, February 2009.
22. Köksal G., Batmaz I., Testik M.C., "A review of data mining applications for quality improvement in manufacturing industry", Expert Systems with Applications, Volume 38, Issue 10, 15 September 2011, Pages 13448-13467.
23. Xiao F., Fan C., "Data mining in building automation system for improving building operational", Energy and Buildings, Volume 75, June 2014.
24. Kim H., Stumpf A., Kim W., "Analysis of an energy efficient building design through data mining approach", Automation in Construction, Volume 20, Issue 1, January 2011.
25. Dixon M., Gallop J., Lambert S., Healy J., "Experience with data mining for the anaerobic wastewater treatment process", Environmental Modelling & Software, Volume 22, Issue 3, March 2007, Pages 315-322.