College of Computer Engineering & Science
Fall Semester 2012/2013
Artificial Intelligence COSC 4362
Homework #4
Data Mining
Sarah Al-Bassam 200800790

Background

Data mining is a field of computer science that attempts to discover patterns in large data sets. It uses methods at the intersection of artificial intelligence, machine learning, statistics, and database systems. The ultimate goal of the data mining process is to extract information from a data set and convert it into an understandable structure for further use. Alternative names for data mining include knowledge discovery in databases (KDD), knowledge extraction, pattern/data analysis, data archeology, data dredging, information harvesting, and business intelligence [3].

The main task of data mining is to analyze large quantities of data, automatically or semi-automatically, in order to extract previously unknown, interesting patterns such as groups of data records (cluster analysis), unusual records (anomaly detection), and dependencies (association rule mining). The analysis typically relies on database techniques such as spatial indexes. The extracted patterns can then be seen as a summary of the input data and can be used in further analysis, for example in machine learning and predictive analytics [2].

Data mining includes six classes of tasks. The first class is anomaly detection, also called outlier, change, or deviation detection. It identifies unusual data records, which may be interesting in themselves or may be data errors that require further investigation.

The second class is association rule learning, or dependency modeling, which searches for relationships between variables. To illustrate, a supermarket can collect data on customer purchasing habits. Using association rule learning, the supermarket can determine which products are regularly bought together and use this information for marketing purposes. This is known as market basket analysis.
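The market basket idea can be made concrete with a small sketch. The code below is only an illustration under stated assumptions: the baskets, the thresholds, and the function name `pair_rules` are invented here and do not come from the cited sources. It counts how often pairs of products appear together and keeps rules whose support (the share of baskets containing both items) and confidence (the share of baskets with the first item that also contain the second) pass the thresholds.

```python
from collections import Counter
from itertools import combinations

# Hypothetical transactions: each set is one customer's basket.
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
    {"bread", "milk", "eggs"},
]


def pair_rules(baskets, min_support=0.4, min_confidence=0.7):
    """Return (antecedent, consequent, support, confidence) for item pairs."""
    n = len(baskets)
    # How many baskets contain each single item.
    item_counts = Counter(item for basket in baskets for item in basket)
    # How many baskets contain each unordered pair of items.
    pair_counts = Counter(
        pair for basket in baskets for pair in combinations(sorted(basket), 2)
    )
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue
        # Each pair yields two candidate rules: a -> b and b -> a.
        for antecedent, consequent in ((a, b), (b, a)):
            confidence = count / item_counts[antecedent]
            if confidence >= min_confidence:
                rules.append((antecedent, consequent, support, confidence))
    return sorted(rules)


rules = pair_rules(transactions)
for antecedent, consequent, support, confidence in rules:
    print(f"{antecedent} -> {consequent}: "
          f"support={support:.2f}, confidence={confidence:.2f}")
```

With these made-up baskets, a rule such as "butter -> bread" is kept because every basket that contains butter also contains bread. Real market basket analysis usually works on far larger data and on item sets of more than two products, which this sketch does not attempt.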
The third class is clustering. This task discovers groups and structures of similar records in the data without relying on known structures. The fourth class is classification, which generalizes a known structure so that it can be applied to new data. For example, an e-mail program might attempt to classify an e-mail as "junk" or "legitimate". The fifth class is regression, which tries to find a function that models the data with the least error. Finally, the sixth class is summarization, which provides a more compact representation of the data set, including visualization and report generation [3].

Applications

Data mining applications are computer programs that extract and identify patterns in stored data. Such an application is usually a software interface that interacts with a large database containing important data. Data mining is widely used by companies and public bodies for purposes such as marketing, detection of fraudulent activity, and scientific research [4].

There are many data mining applications for business use, for example Customer Relationship Management (CRM). Such an application lets marketing managers understand the behavior of their existing customers so that they can predict the likely behavior of future clients. For instance, a company considering a price increase can use data mining to predict how many customers it will lose for a given percentage increase in product price [4].

Another application of data mining is text mining, defined as the process of deriving high-quality information from text. Examples include security applications, such as the study of text encryption and decryption, and the analysis of online text such as news and blogs for national security purposes.
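One simple way to derive information from text is to classify messages by the words they contain, as in the "junk" versus "legitimate" e-mail example. The sketch below assumes a naive Bayes model with add-one smoothing, which is one common choice and is not a method named in the cited sources. The training messages and the function names `train` and `classify` are invented for illustration.

```python
import math
from collections import Counter

# Hypothetical labeled messages, invented for illustration only.
training = [
    ("win a free prize now", "junk"),
    ("free money claim your prize", "junk"),
    ("meeting agenda for monday", "legitimate"),
    ("please review the project report", "legitimate"),
]


def train(examples):
    """Count words per label and messages per label."""
    word_counts = {}
    label_counts = Counter()
    for text, label in examples:
        label_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(text.lower().split())
    return word_counts, label_counts


def classify(text, word_counts, label_counts):
    """Pick the label with the highest log-probability (add-one smoothing)."""
    vocabulary = {w for counts in word_counts.values() for w in counts}
    total_messages = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, counts in word_counts.items():
        # Start from the prior: the share of training messages with this label.
        score = math.log(label_counts[label] / total_messages)
        total_words = sum(counts.values())
        for word in text.lower().split():
            # Add-one smoothing keeps unseen words from zeroing the score.
            score += math.log(
                (counts[word] + 1) / (total_words + len(vocabulary))
            )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


word_counts, label_counts = train(training)
print(classify("claim your free prize", word_counts, label_counts))
print(classify("project meeting on monday", word_counts, label_counts))
```

A practical text mining system would need far more training data and better text preprocessing than four short messages split on whitespace; the sketch only shows the principle of learning a known structure from labeled text and applying it to new text.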
Another example of text mining is biomedical applications, such as PubGene, which combines biomedical text mining with network visualization as an Internet service [3].

Web mining is another application of data mining; it discovers patterns from the web. It has three types: web usage mining, web content mining, and web structure mining. Web usage mining extracts valuable information from server logs, such as users' browsing histories, which reveals what users are looking for on the Internet. Web structure mining uses graph theory to analyze the node and connection structure of a web site. Web content mining is the process of extracting and integrating useful data and knowledge from the contents of web pages [3].

Advantages of Data Mining

Marketing / Retail

Data mining helps marketing companies build models from historical data that predict who will respond to a new marketing campaign. As a result, marketers gain a clear view of the profitable products to offer to targeted customers, with high customer satisfaction [1].

Finance / Banking

Data mining gives financial institutions information about loans and credit reporting. By building a model from data on previous customers with common characteristics, banks and other financial institutions can estimate which loans are good or bad and assess their risk level. Data mining also helps banks identify fraudulent credit card transactions, which protects card owners from losses [1].

Governments

Data mining supports government agencies by examining records of financial transactions and building patterns that can identify money laundering or other criminal activity [1].

Disadvantages of Data Mining

Privacy Issues

Personal privacy is a major concern for many individuals, especially now that the Internet hosts so many social networks, e-commerce sites, forums, and blogs.
Many people are concerned that their personal information will be collected and used in an unethical way. Businesses also gather information about their customers in several ways in order to understand trends in their purchasing behavior. However, a business may someday be acquired by another company, and the personal information it holds might be sold to others or leaked [1].

Security Issues

Businesses normally hold information about their employees and customers, such as social security numbers, birthdays, and payroll details. However, this information is not always protected properly. There have been many cases where hackers accessed and stole important customer data from big corporations, for example Ford Motor Credit Company and Sony [1].

Misuse of Information / Inaccurate Information

Information collected through data mining can be misused. Unethical people or businesses may exploit it to take advantage of vulnerable people or to discriminate against a group of people. Moreover, data mining techniques are not always accurate, and when inaccurate information is used for decision making it can have serious consequences [1].

References

[1] Advantages and Disadvantages of Data Mining. [Online]. Available: http://www.zentut.com/data-mining/advantages-and-disadvantages-of-data-mining/ [Accessed: December 18, 2012].
[2] Data Mining. [Online]. Available: http://www.slideshare.net/Tommy96/data-miningppt-4035580#btnNext [Accessed: December 17, 2012].
[3] The Wikipedia website. [Online]. Available: http://en.wikipedia.org [Accessed: December 15, 2012].
[4] What are Data Mining Applications? [Online]. Available: http://www.wisegeek.com/what-are-data-mining-applications.htm#did-you-know [Accessed: December 14, 2012].