A Brief Review of Alternative Uses of Data Mining: Education, Engineering,
& Others
Kendra J. Ahmed, Mahbub K. Ahmed, Scott McKay
Southern Arkansas University
Abstract
Data mining is the process of finding hidden information in large amounts of data. Its most common uses have been to help businesses gain and maintain a competitive advantage and to answer questions, solve problems, and make informed decisions. More recently, however, other industries, including engineering, medicine, education, and physics, have been turning to data mining to answer questions and solve problems. This paper reviews alternative uses of data mining in education, medicine, and engineering. Data mining is expected to keep growing from where it is today, both because continuing technological advances will aid the data mining process and because the need to find hidden information in large amounts of data keeps increasing.
Introduction and Overview
Data mining is big business. According to Wikibon, the data mining industry is estimated to reach $50 billion by 2017 [1]. As more people use computers, and as computers become more powerful in their computing and storage capabilities, businesses in particular have accumulated copious amounts of data about their customers in their data warehouses. This data will only grow as businesses collect more of it, as computers get faster and more prominent in daily life, and as data warehouses grow in size and fall in price. Mining this stored data allows businesses to predict future trends and customer behaviors; it goes beyond analyzing what has happened in the past to making predictions about the future, and so helps businesses make smarter decisions. This is the most popular and most obvious use of data mining today. However, which industries outside of business are benefiting, or could benefit, from data mining? This paper briefly reviews the history of data mining, then looks at three industries that are using data mining to solve problems and answer questions, and finally takes a brief look at the future of data mining.
Proceedings of the 2015 ASEE Gulf-Southwest Annual Conference
Organized by The University of Texas at San Antonio
Copyright © 2015, American Society for Engineering Education
Brief History of Data Mining
Data mining has gained a lot of popularity in the past couple of decades, but it is not a new idea or practice. Data mining is defined as the process of discovering patterns in large amounts of data or uncovering hidden information in large amounts of data [2]. The term “data mining” began appearing in the late 1980s, and by the early 1990s it was recognized as a subprocess of KDD, Knowledge Discovery in Databases [3]. In 2007 the term predictive analytics also came into use, and in 2011 so did the term data science. Other contemporary terms for data mining include data archaeology, information harvesting, information discovery, and knowledge extraction.
The practice of looking for hidden information in data began in the 1700s with Bayes’ theorem. Bayes’ theorem is a result in probability and statistics that relates the probability of a hypothesis after observing evidence to its prior probability and to the likelihood of that evidence [4]; it is central to the mathematical manipulation of conditional probabilities. In the 1800s, the manual extraction of patterns from data continued with regression analysis, a statistical process for estimating the relationships between variables [5]. Both of these methods began as manual techniques and are still in use centuries later.
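To make the two classical techniques concrete, the following is a minimal Python sketch, not taken from the cited sources: it applies Bayes’ theorem, P(H|E) = P(E|H)P(H)/P(E), to a hypothetical diagnostic-test scenario and computes the ordinary least-squares regression line for a small made-up data set. The function names and all numbers are illustrative assumptions.

```python
def posterior(prior, likelihood, false_positive_rate):
    """Bayes' theorem for a binary hypothesis H given evidence E.

    P(H|E) = P(E|H) * P(H) / P(E), where
    P(E) = P(E|H) * P(H) + P(E|not H) * (1 - P(H)).
    """
    evidence = likelihood * prior + false_positive_rate * (1 - prior)
    return likelihood * prior / evidence


def least_squares_line(xs, ys):
    """Ordinary least-squares estimates (intercept, slope) for y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


if __name__ == "__main__":
    # Hypothetical test: 1% prevalence, 90% sensitivity, 5% false-positive rate.
    print(round(posterior(0.01, 0.90, 0.05), 3))  # 0.154
    # Hypothetical points lying exactly on y = 1 + 2x.
    print(least_squares_line([0, 1, 2, 3], [1, 3, 5, 7]))  # (1.0, 2.0)
```

The posterior of about 15% illustrates why updating a prior with evidence matters: when a condition is rare, even a fairly accurate test produces mostly false positives.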
The rise in the popularity of data mining from the 1990s until today is mostly due to the rapid increase in the number of computers and to technological advances that have increased computer processing and data storage capabilities. These factors have made it possible for an interested party, such as a business studying its customers, a medical facility studying its patients, or a university studying its students, to collect more and more data and to analyze it more quickly to answer questions and solve problems. Data mining has also become increasingly automated, spurred mainly by advances in computer science such as neural networks, cluster analysis, decision trees, decision rules, genetic algorithms, and support vector machines.
The popularity data mining has gained over the past decades is expected to continue; according to Wikibon, the data mining industry is expected to reach at least $50 billion by 2017 [1]. The popularity that began in the 1990s continued because businesses found ways to capitalize on data mining as technological advances in the 1990s and 2000s made computers more powerful in their computing and storage capabilities, as computers became more prominent in daily life in the early 2000s, and as data warehouses increased in size and decreased in price. Data warehouse storage is now measured in terabytes and petabytes. Some of the world’s leading businesses, such as Apple, Walmart, and eBay, operate some of the largest data warehouses, each measured in petabytes [6]. Apple, for example, stores information on every customer and their interactions in iTunes so that Apple knows who is who and what each customer is up to [6]. Walmart’s data warehouse not only informs Walmart about its customers but also gives Walmart’s suppliers information about how much shelf space they have for their products, which products are selling, and how fast they are selling [6]. Many other businesses are finding ways to collect and store data on their customers. A new and exciting example comes from Disney, which recently launched MagicBands, wristbands equipped with RFID (radio-frequency identification) technology that guests wear on Disney grounds. MagicBands let customers reserve a time for a ride and then skip to the front of the line at that time. They also let customers unlock their hotel doors, enter the parks, and access the Disney PhotoPass photos taken by Disney’s professional photographers. As a result, Disney can track what a particular visitor does while on its grounds. This lets Disney see which rides and which characters are rising or falling in popularity, and when customers come and go in its hotels and restaurants. These examples show that businesses use data mining to answer predictive questions such as “When I send out my next promotion, which of my customers are most likely to take advantage of it?” or “When a park visitor arrives, what are they most likely to visit first?” Data mining has already allowed businesses to make smarter and better-informed decisions.
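A question such as “which of my customers are most likely to take advantage of my promotion?” can be approached, in its simplest form, by estimating each customer’s response probability from past promotions and ranking customers by it. The Python sketch below is a minimal illustration under stated assumptions: the `(customer_id, responded)` history format, the customer names, and the Beta-prior smoothing parameters `alpha` and `beta` are all hypothetical, and real systems typically use richer classifiers with many customer attributes.

```python
from collections import defaultdict


def rank_customers(history, alpha=1.0, beta=1.0):
    """Rank customers by a smoothed estimate of their promotion response rate.

    history: iterable of (customer_id, responded) pairs from past promotions.
    alpha, beta: pseudo-counts of responses and non-responses (a Beta prior) that
    pull customers with little history toward a neutral estimate.
    """
    offers = defaultdict(int)
    responses = defaultdict(int)
    for customer, responded in history:
        offers[customer] += 1
        responses[customer] += int(responded)
    scores = {
        c: (responses[c] + alpha) / (offers[c] + alpha + beta) for c in offers
    }
    # Highest estimated probability first; ties broken alphabetically by id.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    # Hypothetical history of past promotion offers and responses.
    past = [("ann", True), ("ann", True), ("ann", False),
            ("bob", False), ("bob", False), ("cy", True)]
    for customer, score in rank_customers(past):
        print(customer, round(score, 3))
```

The smoothing keeps a customer with a single response from automatically receiving a score of 1.0; larger pseudo-counts make the ranking more conservative toward customers with short histories.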
Survey of Alternative Uses of Data Mining
Data mining has most famously been used in business, but which other industries can benefit from it? In recent years many industries have been researching and beginning to implement data mining to help them answer questions and solve problems. Examples of these alternative uses include education, engineering, and medicine.
Data Mining in Education
The first alternative use of data mining this paper examines is in education, where it is known as Educational Data Mining (EDM). EDM has been defined as an emerging discipline concerned with developing methods for exploring the unique types of data that come from educational settings and using those methods to better understand students and the settings in which they learn [7]. As online learning environments grow in popularity, educational data becomes easier to collect, and data mining in education is becoming more popular.
Professors often require students to work together, and learning to work well in teams is very important. In one study, Kay et al [8] applied data mining to student group interaction data, looking for significant sequences of activity that could be used to build a tool that flags interaction sequences indicative of problems as well as those indicative of team success. Such a tool would then warn student teams about problems in their teamwork, so that they could learn from them, and indicate improvement steps teams could take to ensure success.
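As a simplified illustration of flagging interaction sequences, the Python sketch below counts consecutive event pairs (bigrams) in the activity logs of struggling and successful teams and flags pairs that are much more frequent among struggling teams. This is a stand-in for full sequential pattern mining, not the algorithm used by Kay et al [8]; the event names, the `ratio` threshold, and the `smoothing` constant are hypothetical.

```python
from collections import Counter


def bigram_rates(sequences):
    """Share of all consecutive event pairs (bigrams) taken by each pair."""
    counts = Counter()
    for seq in sequences:
        counts.update(zip(seq, seq[1:]))
    total = sum(counts.values())
    return {pair: n / total for pair, n in counts.items()} if total else {}


def flag_sequences(struggling, successful, ratio=2.0, smoothing=1e-3):
    """Bigrams at least `ratio` times more frequent in struggling teams' logs.

    `smoothing` avoids division by zero for pairs never seen in successful teams.
    """
    bad = bigram_rates(struggling)
    good = bigram_rates(successful)
    flagged = [
        pair for pair, rate in bad.items()
        if rate / (good.get(pair, 0.0) + smoothing) >= ratio
    ]
    return sorted(flagged)


if __name__ == "__main__":
    # Hypothetical team activity logs (event names are illustrative only).
    struggling = [["ticket", "ticket", "ticket", "commit"],
                  ["ticket", "ticket", "chat"]]
    successful = [["ticket", "commit", "wiki", "commit"],
                  ["chat", "ticket", "commit"]]
    print(flag_sequences(struggling, successful))
```

In this toy data, repeatedly opening tickets without committing work is flagged, while the ticket-then-commit pattern shared with successful teams is not.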
In another study, He [9] used data mining to examine two data sets: interactions between students and their professors, and interactions among fellow students. The study found similarities and differences between the way students interacted with their professors through online questions and the way they interacted with each other through online chat messages, and it also identified disciplinary differences in students’ online participation. In addition, the study found a correlation between the number of questions a student asked and the student’s final grade. The study suggests that applying data mining and text mining to online learning data can produce considerable insight into students’ learning behaviors, although it also noted some shortcomings of data mining in education [9].
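A correlation such as the one between questions asked and final grade is commonly measured with the Pearson correlation coefficient r, which ranges from -1 to 1. The Python sketch below is a minimal implementation; the student records are hypothetical and are not data from He [9], and the study’s own statistical procedure may differ.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Hypothetical records: questions asked online vs. final course grade (%).
    questions = [0, 2, 3, 5, 8]
    grades = [62, 70, 71, 80, 91]
    print(round(pearson_r(questions, grades), 3))
```

A strong positive r indicates association only; by itself it does not show that asking more questions causes higher grades.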
Because data mining in education is a relatively new application, it does not yet work as seamlessly as it does in many business applications. Pechenizkiy et al [10] discussed these shortcomings further. They analyzed students’ online assessment data and made several suggestions regarding data mining in education, one of which was to tailor data mining to better fit the education industry. This suggests that data mining in education still has a lot of untapped potential.
Data Mining in Medicine
The second alternative use of data mining this paper examines is in medicine. The medical industry has abundant sources of data, mostly in the form of electronic health records (EHRs), health insurance claims, medical imaging databases, disease registries, spontaneous reporting sites, and clinical trials. With all this data available, data mining has become a critical part of the medical industry. According to Data Mining for Medicine and Healthcare, “On the one hand, EHR offers the data that gets data miners excited, however on the other hand, is accompanied with challenges such as 1) the unavailability of large sources of data to academic researchers, and 2) limited access to data-mining experts.” Data mining is being used in many different areas of medicine [11], and it also has a strong future in health care systems as a way to improve patient care in general [12].
The first medical study examined here, by Alizadehsani et al [13], concerns cardiovascular medicine. Cardiovascular disease is one of the leading causes of death, which makes correct and early diagnosis extremely important. The study notes that angiography is currently the most accurate diagnostic method but is extremely costly and has many side effects for the patient. The study found that data mining algorithms achieved high accuracy in diagnosing coronary artery disease while being less costly and having fewer side effects.
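Diagnostic accuracy is usually assessed by comparing a model’s predictions with a reference standard such as angiography, and is often reported alongside sensitivity (the share of diseased patients detected) and specificity (the share of healthy patients correctly cleared). The Python sketch below computes these three measures from hypothetical predictions; it does not reproduce the evaluation or the results of Alizadehsani et al [13].

```python
def diagnostic_metrics(predicted, actual):
    """Accuracy, sensitivity and specificity of binary diagnostic predictions.

    predicted, actual: sequences of booleans (True = disease present);
    `actual` is the reference-standard diagnosis.
    """
    pairs = list(zip(predicted, actual))
    tp = sum(p and a for p, a in pairs)              # diseased, flagged
    tn = sum(not p and not a for p, a in pairs)      # healthy, cleared
    fp = sum(p and not a for p, a in pairs)          # healthy, flagged
    fn = sum(not p and a for p, a in pairs)          # diseased, missed
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn) if tp + fn else float("nan"),
        "specificity": tn / (tn + fp) if tn + fp else float("nan"),
    }


if __name__ == "__main__":
    # Hypothetical results for eight patients.
    actual = [True, True, True, False, False, False, False, False]
    predicted = [True, True, False, False, False, False, True, False]
    print(diagnostic_metrics(predicted, actual))
```

Reporting sensitivity and specificity alongside accuracy matters in diagnosis, because a model can score well on accuracy while still missing many of the patients who actually have the disease.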
Similarly, Breault et al [14] applied data mining to diabetes, which, like cardiovascular disease, is a major health problem in the USA. Registries of diabetes patients have long been kept, first in databases and now in data warehouses. Breault et al mined one such diabetes data warehouse from New Orleans containing over 30,000 diabetes patients, looking for new associations that would be helpful to clinicians, especially in predicting who might become a diabetes patient in the future.
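One common way to quantify associations in patient records is association-rule mining, which scores a rule A → C by its support (how often A and C occur together), confidence (how often C occurs when A does), and lift (confidence relative to how common C is overall). The Python sketch below is a generic illustration of these measures, not the method used by Breault et al [14]; the item names and records are hypothetical.

```python
from itertools import permutations


def rule_stats(transactions, antecedent, consequent):
    """Support, confidence and lift of the rule antecedent -> consequent.

    transactions: list of sets of items (e.g. findings in each patient record).
    """
    a, c = set(antecedent), set(consequent)
    n = len(transactions)
    count_a = sum(a <= t for t in transactions)          # records containing A
    count_c = sum(c <= t for t in transactions)          # records containing C
    count_ac = sum((a | c) <= t for t in transactions)   # records containing both
    support = count_ac / n
    confidence = count_ac / count_a if count_a else 0.0
    lift = confidence / (count_c / n) if count_c else 0.0
    return support, confidence, lift


def mine_pair_rules(transactions, min_support=0.2, min_confidence=0.6):
    """All single-item rules x -> y meeting the support and confidence thresholds."""
    items = sorted(set().union(*transactions))
    rules = []
    for x, y in permutations(items, 2):
        support, confidence, lift = rule_stats(transactions, {x}, {y})
        if support >= min_support and confidence >= min_confidence:
            rules.append((x, y, support, confidence, lift))
    return rules


if __name__ == "__main__":
    # Hypothetical, de-identified findings per patient record.
    records = [
        {"age>60", "hypertension", "high_hba1c"},
        {"age>60", "high_hba1c"},
        {"hypertension"},
        {"age>60", "hypertension", "high_hba1c"},
        {"obese"},
    ]
    for rule in mine_pair_rules(records, min_support=0.4, min_confidence=0.9):
        print(rule)
```

A lift above 1 means the two findings co-occur more often than they would by chance; such rules are candidates for clinical review, not proof of a causal link.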
Another study, by Uçar et al [15], used data mining to diagnose Mycobacterium tuberculosis. Currently the most accurate test for Mycobacterium tuberculosis is a phlegm test, but its results take forty-five days to come back. The study aimed to use data mining to make the diagnosis as accurate as possible and to help answer whether it is reasonable to start tuberculosis treatment on a suspected patient without waiting for the test results. The results showed that their ANFIS (adaptive neuro-fuzzy inference system) model was an accurate and reliable method for classifying tuberculosis patients.
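ANFIS combines fuzzy if-then rules with neural-network-style training of the rules’ parameters, and is built on a first-order Takagi-Sugeno fuzzy model. The Python sketch below shows only the forward pass of such a model (fuzzify the inputs, combine memberships into rule firing strengths, normalize, and average the rule outputs); the training step that ANFIS adds is omitted. The two features, the Gaussian membership parameters, and the rule outputs are hypothetical and are not taken from Uçar et al [15].

```python
import math


def gaussian(x, center, sigma):
    """Degree (0..1] to which x belongs to a Gaussian fuzzy set."""
    return math.exp(-0.5 * ((x - center) / sigma) ** 2)


def sugeno_forward(inputs, rules):
    """Forward pass of a first-order Takagi-Sugeno fuzzy system.

    inputs: list of feature values.
    rules: list of (memberships, coeffs, bias) where memberships holds one
           (center, sigma) pair per input and the rule output is
           sum(coeff * x) + bias.
    """
    strengths, outputs = [], []
    for memberships, coeffs, bias in rules:
        strength = 1.0
        for x, (center, sigma) in zip(inputs, memberships):
            strength *= gaussian(x, center, sigma)  # fuzzify, combine with product (AND)
        strengths.append(strength)
        outputs.append(sum(c * x for c, x in zip(coeffs, inputs)) + bias)
    total = sum(strengths)
    if total == 0.0:
        raise ValueError("no rule fires for these inputs")
    # Normalize firing strengths and return the weighted average of rule outputs.
    return sum(w * f for w, f in zip(strengths, outputs)) / total


# Hypothetical two-rule model on two features scaled to [0, 1]:
# rule 1 "low risk" -> score 0, rule 2 "high risk" -> score 1.
RULES = [
    ([(0.2, 0.2), (0.2, 0.2)], [0.0, 0.0], 0.0),
    ([(0.8, 0.2), (0.8, 0.2)], [0.0, 0.0], 1.0),
]

if __name__ == "__main__":
    print(sugeno_forward([0.5, 0.5], RULES))  # about 0.5: both rules fire equally
    print(sugeno_forward([0.8, 0.8], RULES))  # close to 1: the "high risk" rule dominates
```

In ANFIS, the membership centers and widths and the rule coefficients would be learned from labeled patient data rather than set by hand as they are here.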
Data Mining in Engineering
The third alternative use of data mining this paper examines is in engineering, where data mining is as important as in other industries. Many areas of engineering, such as manufacturing, materials science, operations research, and engineering design, can benefit from data mining. In engineering, as in many areas of science, technological advances are making huge amounts of data common, and as a result many scientists and engineers are turning to data mining to find the hidden information in that data.
In the first engineering study examined [16], the authors used data mining to build a practical intelligent database. Engineering simulations often require the physical properties of materials, so the authors built an intelligent database system of such materials properties for the computer-aided engineering (CAE) of heat treatment and discussed the architecture of the system [16].
Yang et al [17] describe how technological advances in the processing of nanoceramics led to the use of data mining. A nanoceramics synthesis platform was recently developed based on the earlier HiTCH synthesis technology. Because this platform produces large numbers of nanoceramics that are formulated into libraries, large amounts of useful data can be collected. The authors described the information flow system of RAMSI and a data mining system supporting discovery, quantitative structure-activity relationship (QSAR) modeling, and the modeling and design of experiments. Its functions included clustering Raman spectra, interpreting X-ray diffraction (XRD) measurements, and building QSAR models that link XRD data to photocatalytic properties.
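Clustering spectra groups samples whose spectra have similar shapes. A widely used method is k-means, which treats each spectrum as a vector of intensities at the same set of Raman shifts. The Python sketch below implements plain k-means (Lloyd’s algorithm); it is not necessarily the clustering method used by Yang et al [17], and the five-point “spectra” are hypothetical.

```python
import math


def kmeans(points, k, iterations=100):
    """Lloyd's k-means: returns (labels, centroids).

    Initial centroids are simply the first k points (a simplification;
    k-means++ seeding is the more common choice in practice).
    """
    centroids = [list(p) for p in points[:k]]
    labels = None
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid (Euclidean distance).
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points
        ]
        if new_labels == labels:
            break  # converged: assignments stopped changing
        labels = new_labels
        # Update step: move each centroid to the mean of its assigned points.
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


if __name__ == "__main__":
    # Hypothetical normalized intensities at five Raman shifts per sample.
    spectra = [
        [0.1, 0.9, 0.2, 0.1, 0.0],
        [0.0, 0.1, 0.2, 0.9, 0.1],
        [0.2, 0.8, 0.1, 0.1, 0.0],
        [0.1, 0.2, 0.1, 0.8, 0.2],
        [0.1, 1.0, 0.2, 0.0, 0.1],
    ]
    print(kmeans(spectra, 2)[0])
```

In practice, spectra are usually baseline-corrected and normalized before clustering, so that the clusters reflect peak positions and shapes rather than overall intensity.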
In engineering, manufacturing processes can be enhanced through the proper use of data mining. A study by Çiflikli et al [18] shows how data mining improved a carpet manufacturing process to gain a competitive advantage; as a result of their research, the manufacturing process was redeveloped.
Data mining can also be useful in renewable power generation, such as wind turbines. A wide range of data on wind properties and on the control variables affecting wind turbine performance can be analyzed with data mining to improve performance. Kusiak et al [19] combined data mining with an evolutionary strategy algorithm to find a way to maximize the power output of a wind turbine. Their study shows that optimizing the blade pitch angle can maximize wind power output.
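The general pattern of this approach is to learn a model of power output from turbine data and then search that model for the best control setting. The Python sketch below illustrates only the search half, using a (1+1) evolution strategy with the 1/5 success rule for step-size adaptation. The quadratic `power_model`, its peak at 2 degrees, the pitch bounds, and the seed are hypothetical stand-ins, not the data-mined models or settings of Kusiak et al [19].

```python
import random


def power_model(pitch_deg):
    """Hypothetical surrogate: predicted power (kW) vs. blade pitch angle.

    Stands in for a model learned from turbine data; peaks at 2 degrees here.
    """
    return 1500.0 - 40.0 * (pitch_deg - 2.0) ** 2


def one_plus_one_es(objective, x0, sigma=1.0, bounds=(-5.0, 20.0), steps=1000, seed=0):
    """(1+1) evolution strategy maximizing `objective` with the 1/5 success rule."""
    rng = random.Random(seed)
    lo, hi = bounds
    x, fx = x0, objective(x0)
    successes = 0
    for t in range(1, steps + 1):
        candidate = min(hi, max(lo, x + rng.gauss(0.0, sigma)))  # mutate, clip to bounds
        fc = objective(candidate)
        if fc >= fx:  # keep the offspring if it is at least as good as the parent
            x, fx = candidate, fc
            successes += 1
        if t % 20 == 0:  # every 20 trials, adapt the step size (1/5 success rule)
            sigma *= 1.22 if successes / 20 > 0.2 else 0.82
            successes = 0
    return x, fx


if __name__ == "__main__":
    best_pitch, best_power = one_plus_one_es(power_model, x0=10.0)
    print(round(best_pitch, 2), round(best_power, 1))
```

The 1/5 rule enlarges the mutation step while improvements are common and shrinks it when they become rare, which lets the search move quickly at first and then settle precisely near the optimum.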
Data mining can also be used to predict and analyze the vast amounts of data on the mechanical behavior and thermo-physical properties of materials. One study [20] successfully used data mining approaches to predict mechanical properties of granites, such as uniaxial compressive strength and deformation modulus. In another [21], thermo-physical properties of refrigerants, such as viscosity, heat capacity, thermal conductivity, and density, were predicted with a data mining approach to optimize the performance of a vapor compression refrigeration system.
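One of the simplest data mining predictors for a material property is k-nearest-neighbor regression: a new case is predicted as the average property of the k most similar measured cases. The Python sketch below is a generic illustration, not the method of [20] or [21]; the temperature-viscosity pairs are hypothetical values, not real refrigerant data.

```python
import math


def knn_predict(train_x, train_y, query, k=3):
    """k-nearest-neighbor regression: average the targets of the k closest samples."""
    if not 1 <= k <= len(train_x):
        raise ValueError("k must be between 1 and the number of training samples")
    # Indices of training samples ordered by Euclidean distance to the query.
    order = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], query))
    return sum(train_y[i] for i in order[:k]) / k


if __name__ == "__main__":
    # Hypothetical measurements: [temperature in deg C] -> viscosity (illustrative units).
    temps = [[-20.0], [0.0], [20.0], [40.0], [60.0]]
    viscosity = [0.30, 0.25, 0.20, 0.16, 0.13]
    print(knn_predict(temps, viscosity, [10.0], k=2))
```

With several input features (for example, temperature and pressure), the features should be scaled to comparable ranges first, because the Euclidean distance would otherwise be dominated by the feature with the largest units.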
Quality improvement programs are one area of engineering where data mining has really taken off. These programs require collecting and analyzing data to solve problems, which makes them a natural fit for data mining. Köksal et al [22] published a comprehensive review of data mining applications for quality improvement in the manufacturing industry, covering the literature from 1997 to 2007 and providing analyses of selected quality tasks.
Data mining can also be used in building automation systems to improve the performance of building operation. Xiao and Fan [23] carried out such an investigation, mining the data from the automation system of the largest building in Hong Kong. Another study [24] shows that data-mining-based energy modeling can help improve the energy efficiency of building design.
Waste-water management is another important area of engineering where data mining can improve a process. A European Union-funded research project, reported by Dixon et al [25], sought to improve the reliability and efficiency of monitoring and control in an anaerobic digestion wastewater treatment plant through data mining. It focused on four themes: “1.) experience gained in the data mining exercise, 2.) the use of confidence and prediction intervals, 3.) prospects for generalization over different sizes and types of anaerobic digester, and 4.) relationship to the overall supervision system development in the project.”
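The distinction between the two kinds of interval matters for plant monitoring: a confidence interval describes uncertainty about a process mean, while a prediction interval describes where a single future reading is likely to fall, and is therefore wider. The Python sketch below computes both under a normal (z) approximation; it is not the procedure used by Dixon et al [25], and the pH readings are hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev


def mean_ci_and_prediction_interval(sample, level=0.95):
    """Return (confidence interval for the mean, prediction interval for one new value).

    Uses a normal (z) approximation; for small samples a Student t quantile
    would give wider, more appropriate intervals.
    """
    n = len(sample)
    m, s = mean(sample), stdev(sample)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    ci_half = z * s / math.sqrt(n)          # uncertainty about the process mean
    pi_half = z * s * math.sqrt(1 + 1 / n)  # uncertainty about a single future reading
    return (m - ci_half, m + ci_half), (m - pi_half, m + pi_half)


if __name__ == "__main__":
    # Hypothetical daily pH readings from a digester (illustrative values only).
    readings = [7.1, 6.9, 7.3, 7.0, 7.2, 6.8, 7.1, 7.0]
    ci, pi = mean_ci_and_prediction_interval(readings)
    print("95% CI for mean:", ci)
    print("95% prediction interval:", pi)
```

For n readings, the prediction interval is sqrt(n + 1) times as wide as the confidence interval (three times as wide for the eight readings above). An alarm based on the confidence interval would therefore fire far too often on individual readings.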
Brief Look at the Future of Data Mining
The future of data mining looks very promising. As technology advances, computers become more powerful, and data warehouse storage grows while its price falls, the need to use data mining to maintain or gain a competitive advantage will only increase. Businesses and other industries will always look for ways to understand their data and to find answers to questions and problems in it. Smarter, better-informed decisions can help businesses reduce costly mistakes and conserve resources. As mentioned before, Wikibon estimates that the data mining industry will reach at least $50 billion by 2017 [1], which suggests that the industry will continue to grow rapidly.
In the near future, data mining will probably not change drastically. It will continue to be used mostly in business applications to help businesses gain or maintain a competitive advantage and to answer questions or solve problems. In the medium term, data mining may follow the path of other technological advances, becoming easier to use and more widely used. The most exciting changes to data mining will probably be seen in the distant future. Based on the alternative uses discussed in this paper, one example might be computers finding new ways to diagnose and treat diseases. One concern about the future of data mining, as more and more data is collected on people, is privacy, which raises the question of how much recorded data on a person is too much. As data mining progresses and more data is recorded on people, this question will most likely come before governments.
Summary and Conclusion
As this paper has shown, data mining has been gaining popularity not only in business applications but also in other areas such as education, engineering, and medicine. This paper surveyed some alternative ways in which data mining has been used in industries other than business. It first looked at how data mining is used in education to understand students, then at how it is used in medicine to help doctors and researchers find new ways to diagnose and treat patients, and finally at how it is helping engineers improve research methods, production methods, and practical applications such as intelligent databases. Data mining is not a new idea; people have long used it to look for hidden information in large amounts of data, first manually and now with increasingly automated methods. In conclusion, data mining has a bright future, especially as technology continues to advance and more alternative uses are found for it outside of business applications.
References
1. "Big Data Vendor Revenue and Market Forecast 2012-2017." Wikibon. 19 February 2013. Web. 30 November 2014. <http://wikibon.org/wiki/v/Big_Data_Vendor_Revenue_and_Market_Forecast_20122017>.
2. "Data Mining." Wikipedia. Wikimedia Foundation, 27 November 2014. Web. 30 November 2014. <http://en.wikipedia.org/wiki/Data_mining>.
3. Coenen F., “Data mining: past, present and future”, The Knowledge Engineering Review, Volume 26, Issue 1, 2011, Pages 25-29. Cambridge University Press. 30 November 2014.
4. "Bayes’ Theorem." Wikipedia. Wikimedia Foundation, 23 November 2014. Web. 30 November 2014. <http://en.wikipedia.org/wiki/Bayes%27_theorem>.
5. "Regression Analysis." Wikipedia. Wikimedia Foundation, 25 November 2014. Web. 30 November 2014. <http://en.wikipedia.org/wiki/Regression_analysis>.
6. "Why Apple, eBay, and Walmart Have Some of the Biggest Data Warehouses You’ve Ever Seen." Gigaom. Gigaom Inc, 27 March 2013. Web. 30 November 2014. <https://gigaom.com/2013/03/27/why-apple-ebay-and-walmart-have-some-of-the-biggest-data-warehousesyouve-ever-seen/>.
7. "Educational Data Mining." International Educational Data Mining Society. Web. 30 November 2014. <http://www.educationaldatamining.org/>.
8. Kay J., Maisonneuve N., Yacef K., Zaïane O. (2006). Proceedings of the Workshop on Educational Data Mining at the 8th International Conference on Intelligent Tutoring Systems (ITS 2006), pp. 45-52.
9. He W. (2012). Examining Students’ Online Interaction in a Live Video Streaming Environment Using Data Mining and Text Mining. Computers in Human Behavior.
10. Pechenizkiy M., Calders T., Vasilyeva E., De Bra P. (2008). Mining the student assessment data: Lessons drawn from a small scale case study. Educational Data Mining 2008, 187.
11. "Overview." Data Mining for Medicine and Healthcare. Web. 30 November 2014. <http://www.dmmh.org/>.
12. Ramon J., Fierens D., Güiza F., Meyfroidt G., Blockeel H., Bruynooghe M., Berghe G.V.D., “Mining data from intensive care patients”, Advanced Engineering Informatics, Volume 21, Issue 3, July 2007.
13. Alizadehsani R., Habibi J., Hosseini M.J., Mashayekhi H., Boghrati R., Ghandeharioun A., Bahadorian B., Sani Z.A., “A data mining approach for diagnosis of coronary artery disease”, Computer Methods and Programs in Biomedicine, Volume 111, Issue 1, July 2013, Pages 52-61.
14. Breault J.L., Goodall C.R., Fos P.J., “Data mining a diabetic data warehouse”, Artificial Intelligence in Medicine, Volume 26, Issues 1–2, September–October 2002, Pages 37-54.
15. Uçar T., Karahoca A., “Predicting existence of Mycobacterium tuberculosis on patients using data mining approaches”, Procedia Computer Science, Volume 3, 2011, Pages 1404-1411.
16. Gu Q., Zhong R., Ju D., “Development of materials database system for CAE system of heat treatment based on data mining technology”, Transactions of Nonferrous Metals Society of China, Volume 16, Supplement 2, June 2006, Pages s572–s576.
17. Yang Y., Lin T., Weng X. L., Darr J.A., Wang X.Z., “Data flow modeling, data mining and QSAR in high-throughput discovery of functional nanomaterials”, Computers & Chemical Engineering, Volume 35, Issue 4, 7 April 2011, Pages 671-678.
18. Çiflikli C., Kahya-Özyirmidokuz E., “Implementing a data mining solution for enhancing carpet manufacturing productivity”, Knowledge-Based Systems, Volume 23, Issue 8, December 2010, Pages 783-788.
19. Kusiak A., Zheng H., Song Z., “Power optimization of wind turbines with data mining and evolutionary computation”, Renewable Energy, Volume 35, Issue 3, March 2010, Pages 695-702.
20. Martins F. F., Begonha A., Braga M.A.S., “Prediction of the mechanical behavior of the Oporto granite using Data Mining techniques”, Expert Systems with Applications, Volume 39, Issue 10, August 2012.
21. Küçüksille E. U., Selbaş R., Şencan A., “Data mining techniques for thermophysical properties of refrigerants”, Energy Conversion and Management, Volume 50, Issue 2, February 2009.
22. Köksal G., Batmaz I., Testik M. C., “A review of data mining applications for quality improvement in manufacturing industry”, Expert Systems with Applications, Volume 38, Issue 10, 15 September 2011, Pages 13448-13467.
23. Xiao F., Fan C., “Data mining in building automation system for improving building operational performance”, Energy and Buildings, Volume 75, June 2014.
24. Kim H., Stumpf A., Kim W., “Analysis of an energy efficient building design through data mining approach”, Automation in Construction, Volume 20, Issue 1, January 2011.
25. Dixon M., Gallop J., Lambert S., Healy J., “Experience with data mining for the anaerobic wastewater treatment process”, Environmental Modelling & Software, Volume 22, Issue 3, March 2007, Pages 315-322.