ISSN 2454-3349
Int. Journal of Philosophies in Computer Science, Vol. 1, No. 1, (2015), pp. 21-30

AN APPLICATION OF PREPROCESSING AND CLUSTERING IN WEB LOG MINING

S. Uma Maheswari1 and S. K. Srivatsa2
1 SCSVMV University, Kanchipuram, Tamilnadu, India. E-mail: [email protected]
2 St. Joseph College of Engineering, Chennai, Tamilnadu, India. E-mail: [email protected]

ABSTRACT

Identifying user behavior is one aspect of web usage mining, and web logs play a major role in it. Many pattern mining methods exist for identifying user behavior, and their accuracy and quality are improved by preprocessing techniques. In the existing algorithms, preprocessing is used to identify the number of unique users, to reduce the size of the log file and to identify sessions. The newly proposed algorithm, Enhanced User Behavior (EUB), identifies user behavior and groups similar users using clustering techniques. This paper discusses the basic concepts of web log preprocessing, various existing clustering and preprocessing techniques, and the proposed EUB algorithm.

KEYWORDS

Preprocessing techniques, web mining, web logs, clustering.

1. INTRODUCTION

The web seems to be too huge for effective data warehousing and data mining. The World Wide Web serves as a huge, widely distributed, global information service center for news, advertisements, consumer information, financial management, education, e-commerce and so on. Information is arranged hierarchically in the form of websites. The web also contains a rich and dynamic collection of hyperlink information: a website is a collection of web pages accessed via hyperlinks. Nowadays the internet plays a vital role in providing information to all kinds of users, and its usage and accessibility are increasing tremendously day by day.
Websites provide information of any kind to any user at any time. A web server usually registers a web log entry for every access to a web page: whenever a user interacts with a website, the interaction details are automatically recorded on the web server in the form of web logs [2]. The complexity of web pages is also far greater than that of traditional text documents, and only a small portion of the information on the web is truly relevant [13]. Nevertheless, it is possible to collect large amounts of data on user access patterns and to mine interesting nuggets of information from them. Applying data mining, artificial intelligence and related techniques to web data makes it possible to forecast users' visiting behavior and to infer their interests by investigating samples [4].

Web usage mining involves the analysis and discovery of user access patterns from web server logs in order to better serve users' needs. In web usage mining, or web log mining, users' behavior and interests are revealed by applying data mining techniques to the web log file. Knowing the patterns of users' habits and interests helps the operational strategies of enterprises. It is important for the website analyst to understand the users' level of interest and behavior for a variety of reasons, and web logs play an important role in revealing that behavior. The algorithms proposed in [3] perform preprocessing to reduce the size of the log file and to identify the number of unique users and sessions. According to [1], the intelligent system web usage preprocessor categorizes human and search engine accesses before applying the preprocessing techniques.

Web usage mining is the third category of web mining. This type of web mining allows for the collection of web access information for web pages.
The usage data gathered gives companies the ability to produce results that are more effective for their businesses and to increase sales. Usage data can also be useful for developing marketing skills that will outsell competitors and promote the company's services or products at a higher level. Usage mining is valuable not only to businesses using online marketing, but also to e-businesses that depend solely on the traffic provided through search engines. This type of web mining helps to gather important information from customers visiting the site and enables an in-depth log analysis of a company's productivity flow. E-businesses depend on this information to direct the company to the most effective web server for promotion of their product or service.

Web usage mining is a category of data mining techniques that identifies usage patterns in web data [5]. The projection of these paths helps to log user registration information, giving commonly used paths the forefront of access. Usage mining therefore has valuable uses in the marketing of businesses and a direct impact on the success of their promotional strategies and internet traffic. This information is gathered on a daily basis and analyzed continuously. Its analysis helps companies to develop more effective promotions, internet accessibility, inter-company communication and structure, and productive marketing skills.

The intelligent system web usage preprocessor separates human and search engine accesses before applying the preprocessing techniques, and it can be extended with other learning algorithms [1]. Web usage mining can also be applied to user profiling and similar-image retrieval by tracing visitors' online behavior [11].
Many preprocessing techniques can be effectively applied in web log mining [7]. The preprocessing of web log data for finding frequent patterns using the weighted association rule mining technique can also be applied to other industrial and social organizations [6]. Web usage mining has great potential and is frequently employed for tasks such as web personalization, web page prefetching and website reorganization [12]. It is therefore necessary to understand users' behavior when they interact with the web.

Yang and Padmanabhan focused on grouping customer transactions using clustering [14]. The transactions in a group share some similarities, so customer behaviour can be identified easily, and the website analyst can understand customer expectations and make the website customer friendly. Mahdavi and Abolhassani dealt with two types of groups: web clustering groups, which group related pages from the web server log files, and user clustering groups, which group users who refer to the same type of web pages [15]. Guha, Rastogi and Shim mainly focused on the data preprocessing step to remove unnecessary data such as images and extra click events [16]. Das and Vyas [17] presented a model for a web personalization approach using web mining, in which both server-side and browser-side details are taken into consideration. Suneetha and Krishnamoorti [18] developed an intelligent recommendation system to determine the pages most likely to be visited by the user in future, which assists site owners in optimization and in improving user satisfaction.

2. ON LINE BEHAVIOUR AND TRACING OF VISITORS IN WEB MINING

Visitors' online behavior must be traced for website usage analysis. This analysis yields knowledge about how visitors use a website, which provides guidelines for website reorganization and helps to prevent disorientation.
It also helps designers to place important information where visitors will look for it. It supports prefetching and caching of web pages, and it enables adaptive websites (personalization). This is represented in Figure 1.

[Figure 1: Website usage analysis, supporting website reorganization, prefetching of web pages, and personalization]

Many organizations analyze users' browsing patterns in order to give personalized recommendations of web pages. Usage-based personalized recommendation offers solutions to many problems that occur on the web [13, 14, 15], and it has attracted interest among researchers. Recommendation systems lessen information overload by suggesting pages that fulfil the user's requirement. Web usage mining has great potential and is frequently employed for tasks such as web personalization, web page prefetching and website reorganization [16].

Data used in web usage mining are obtained at three levels: the server level, the client level and the proxy level [12]. At the server level, the server keeps the details of client requests, whereas at the client level the client itself forwards data about the user's behavior to a database. At the proxy level, the proxy maintains behavior information for the users whose web clients pass through it, so the data covers many users on various websites. The inputs used in web usage mining are generally web pages, intra-page structures, inter-page structures and usage data. Other forms of web data are profiles, registration information and cookies. Web usage data is collective data about how a user uses a website through mouse and keyboard. This data is also available in the form of web server logs, referral logs, registration files, index server logs and cookies.
The aim of the web log file is to create a user profile by comparing a user's browsing with that of previous users. Before data mining, the raw web log data must be cleaned, condensed and transformed. Web log data can be integrated with web content and web linkage structure mining to help web page ranking and web document classification. The details of a user's interaction with a website are recorded automatically on web servers in the form of web logs [2]. Web logs are kept as lines of text on the web server, the proxy server and the browser [8]. Other forms of logs are server access logs, server referrer logs, agent logs, client-side cookies, user profiles, search engine logs and database logs. These serve as input for learning end user behaviour in web usage mining.

Log files are files that list the actions that have occurred [19]. They hold many parameters that are used in recognizing user browsing patterns, such as user name, visiting path, path traversed, time stamp, success rate, user agent, URL, request type, and pages last visited [18]. The information on users' requests from their web browsers is stored in the transfer or access log, as shown in Table 1. The two fields recorded in the referrer log are the URL and the referrer URL, as shown in Table 2.

Table 1 Transfer or access log
Time | Date | Host name | File requested | Amount of data transferred | Status of report

Table 2 Referrer log
URL | Referrer URL

The error log collects the list of errors and failed requests. A user request may fail not only because a page holds a link to a file that does not exist, but also because the user is not permitted to access a particular page. This is depicted in Figure 2.

[Figure 2: Error log. The client sends a request to the server and the server sends a reply.]

3. BASICS OF PREPROCESSING AND CLUSTERING TECHNIQUES

Data preprocessing comprises descriptive data summarization, data cleaning, data integration and transformation, data reduction, and discretization and concept hierarchy generation. Data in the real world is dirty: incomplete, noisy and inconsistent. Without quality data there are no quality mining results, so quality decisions must be based on quality data. For example, duplicate or missing data may cause incorrect or even misleading statistics. A data warehouse likewise needs consistent integration of quality data, and data extraction, cleaning and transformation make up the majority of the work of building one. That is why it is important to preprocess the data.

The inputs to web usage mining are the list of agents visiting the website, the duration of each session, and the order in which the agents view the web pages. It is difficult to use web logs directly in pattern mining algorithms to extract features, so preprocessing techniques are necessary to make them consistent and complete.

3.1 Basic Structure of Web Log Mining

Preprocessing in a web log mining model aims to reformat the original web logs in order to identify users' access sessions. The access log is the input to the preprocessing block. A web log is a file to which the web server writes information each time a user requests a resource from that particular site. The basic structure of web log mining is depicted in Figure 3.

[Figure 3: Basic structure of web log mining. Access Log, Data Cleaning, User Identification, Session Identification, Path Completion, Session File]

Web logs are kept as lines of text [8] on the web server, the proxy server and the browser. The web server holds all the log files and gives the most complete and accurate information on users' interaction with the website. The World Wide Web Consortium maintains a standard format for web server log files.
The web server log stores the client IP address, request date, request time, page requested, hypertext transfer protocol (HTTP) status code, bytes transferred, user agent and referrer. The proxy server acts as a mediator between the browser and the web server: it receives HTTP requests from users and passes them on to the web server. Web logs for a particular user are maintained on the browser machine, where browser programs and scripting languages are employed to collect client-side data.

In general, three web log formats [8] are available:

o World Wide Web Consortium (W3C) extended log file format. This is the default log file format on the Internet Information Server. Fields are separated by spaces, and time is recorded in Greenwich Mean Time. Administrators can alter this format to add or remove fields based on the information required.
o National Center for Supercomputing Applications (NCSA) common log file format. It stores user name, date, time, request type, HTTP status code and the number of bytes. NCSA is a fixed format, and administrators cannot add or remove fields.
o Microsoft Internet Information Server (IIS) log file format. It is also a fixed American Standard Code for Information Interchange (ASCII) format that cannot be customized. Fields are separated by commas.

3.2 Preprocessing Steps

Preprocessing [9] is an important activity in web usage mining and is treated as a key to success. It eliminates unwanted information from the web logs and facilitates effective pattern mining. Data preprocessing consists of data cleaning, user and session identification, and path completion, explained below.

3.2.1 Data Cleaning

Data collection [6] is the initial step in web log preprocessing. Irrelevant records are eliminated during data cleaning.
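As a rough illustration of the cleaning step, the following Python sketch parses log lines and discards irrelevant records. It assumes NCSA-style lines with optional referrer and user agent fields. The regular expression, the suffix list, the robot markers and the rule of keeping only 2xx status codes are illustrative assumptions, not the rules used by any of the algorithms discussed in this paper.

```python
import re

# Assumed line layout: host ident user [time] "method url protocol" status bytes
# optionally followed by "referrer" "user agent".
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+)[^"]*" (?P<status>\d{3}) (?P<bytes>\S+)'
    r'(?: "(?P<referrer>[^"]*)" "(?P<agent>[^"]*)")?'
)

# Hypothetical filter lists; a real deployment would tune these.
IRRELEVANT_SUFFIXES = (".gif", ".jpg", ".jpeg", ".png", ".css", ".js", ".ico")
ROBOT_MARKERS = ("bot", "crawler", "spider")


def parse_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None


def is_relevant(entry):
    """Keep successful page requests that do not come from known robots."""
    if not entry["status"].startswith("2"):
        return False  # failed or redirected request
    path = entry["url"].split("?", 1)[0].lower()
    if path.endswith(IRRELEVANT_SUFFIXES):
        return False  # embedded resource rather than a page view
    agent = (entry["agent"] or "").lower()
    return not any(marker in agent for marker in ROBOT_MARKERS)


def clean_log(lines):
    """Parse raw log lines and drop unparseable or irrelevant records."""
    parsed = (parse_line(line) for line in lines)
    return [entry for entry in parsed if entry is not None and is_relevant(entry)]
```

Calling `clean_log` on the lines of an access log returns one dictionary per retained request. An algorithm that treats explicit image or multimedia requests as meaningful would shorten or drop the suffix filter.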
Data cleaning [10] is the process of removing noisy and irrelevant data that are not helpful for mining knowledge from the web logs.

3.2.2 User and Session Identification

The task of user and session identification is to find the different user sessions in the original web access log. Session identification [7] is the process of dividing each user's access log into sessions. In general, a referrer-based method is used for identifying sessions.

3.2.3 Path Completion

Path completion is used to acquire the complete user access path. The incomplete access paths of each user session are recognized on the basis of user session identification. If, at the start of a user session, both the referrer and the URL have values, the referrer value is deleted and replaced with a dash '-'. Web log preprocessing removes unwanted click streams from the log file and reduces the size of the original file by 40-50%.

3.3 Clustering Techniques

Clustering of data in a high-dimensional space is of great interest in many data mining applications. It is a technique for grouping together a set of items having similar characteristics, and it can be performed on either the users or the page views. Clustering analysis in web usage mining aims to find clusters of users, pages or sessions in the web log file, where each cluster represents a group of objects with common interests or characteristics. User clustering is designed to find user groups that have common interests based on their behavior, and it is critical for user community construction. Such knowledge is especially useful for inferring user demographics in order to perform market segmentation in e-commerce applications or to provide personalized web content to users. Page clustering, on the other hand, is the process of clustering pages according to users' access to them; it discovers groups of pages having related content.
This information is useful for Internet search engines and web assistance providers. In both applications, permanent or dynamic HTML pages can be created that suggest related hyperlinks to the user according to the user's query or past history of information needs. The intuition is that if the probability of visiting one page, given that another page has also been visited, is high, then the two pages can perhaps be grouped into one cluster.

Several clustering algorithms can be applied to clustering in large multimedia databases. The effectiveness and efficiency of the existing algorithms are, however, somewhat limited, because clustering in multimedia databases requires clustering of high-dimensional feature vectors and the data often contain large amounts of noise. DENCLUE (density-based clustering) is a kernel density estimation based algorithm introduced for clustering in large multimedia databases. Clusters are identified by determining density attractors, and clusters of arbitrary shape can be described by a simple equation of the overall density function. The advantages reported for the kernel density estimation based DENCLUE approach are: (1) it has a firm mathematical basis; (2) it has good clustering properties in data sets with large amounts of noise; (3) it allows a compact mathematical description of arbitrarily shaped clusters in high-dimensional data sets; and (4) it is significantly faster than existing algorithms. A comparison with k-means, DBSCAN and BIRCH is reported to show the superiority of the approach.

4. EXISTING PREPROCESSING ALGORITHM

The usage of internet services has increased tremendously. Internet or website surfing has become highly necessary, and surfing practice develops relationships between users. This activity prompts analysts to gather information and capture the users' level of interest from the web log files.
Web logs are the measure used to retrieve user behavior, and mining user behavior from the web log is called web log mining. It mines users' level of interest from their frequent visits to websites. The web logs are updated every time a user visits a particular website. In the user interest level preprocessing (UILP) algorithm, website and web page navigation behavior is considered the basic source for identifying the interest of the user. In UILP, data cleaning is used to remove noisy and irrelevant information from the web log; this is one of the features used in identifying the user's level of interest. The second feature is based on site topology and cookies. Frequency value, session identification and path completion are also determined by the UILP algorithm [11]. The UILP algorithm considers the following four features to identify the user interest level:

o During the data cleaning process, explicit image and multimedia requests from users are considered; those requests are not removed from the web logs.
o Users are identified based on site topology and cookies.
o Session time is calculated based on the time spent on each website by a particular user.
o Frequency value is calculated based on the number of web pages visited by the user on a particular website.

5. PROPOSED METHODOLOGY

In the existing system, the user interest level is identified using the data cleaning technique. The detailed analysis of existing preprocessing techniques shows that identifying the user's level of interest is not the only criterion to focus on: similar users can also be grouped using a clustering technique. This paper concentrates on how similar user behavior can be identified and grouped using clustering.
Clustering is the process of grouping data into classes or clusters, so that objects in the same cluster are highly similar to one another but very dissimilar to objects in other clusters. Dissimilarities are assessed based on the attribute values describing the objects. Clustering methods include partitioning methods, hierarchical methods, density-based methods, grid-based methods and methods for high-dimensional data. The proposed system uses a partitioning method to group similar users; the best known and most commonly used partitioning method is k-means. Frequent users are grouped using clustering, and the web logs are processed by the enhanced user behavior algorithm based on how frequently the user visits the websites. Web logs contain the following fields, which serve as the input for reducing the density of log contents in a log file. The fields of a sample web log are presented in Table 3, where IP_addr is the IP address; End_usr is the name of the user; URL is the website address; Session is the duration of the user's session; and Frequency is the number of times the user visits the website.

Table 3 Fields in a web log
IP_addr | End_usr | URL | Session | Frequency

5.1 Procedure for K-means Algorithm

The following procedure is generally used in the k-means algorithm.
1. Arbitrarily choose k objects from the web log.
2. Find the number of users frequently visiting the websites by calculating the mean value.
3. Reassign each object to the cluster to which it is most similar, based on the mean value.
4. Update the cluster means after each reassignment.
5. Similarly, group the dissimilar objects in another cluster.

5.2 Enhanced User Behaviour Procedure

In the proposed enhanced user behaviour (EUB) procedure, clustering plays a key role in classifying web visitors on the basis of user click history and a similarity measure.
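Because this procedure builds on k-means over visit frequency, a minimal one-dimensional k-means sketch in Python may help to make the steps of Section 5.1 concrete. It clusters the Frequency field of Table 3 only. The function names, the seeded random choice of initial objects, the absolute-difference similarity measure and the stopping rule (stop when assignments no longer change) are assumptions made for illustration; the paper does not specify them, and this is not the authors' implementation.

```python
import random


def kmeans_1d(values, k, seed=0, max_iter=100):
    """Cluster numbers into k groups; returns (centroids, labels)."""
    rng = random.Random(seed)
    # Step 1: arbitrarily choose k objects as the initial cluster means.
    centroids = rng.sample(list(values), k)
    labels = []
    for _ in range(max_iter):
        # Assign each value to the cluster whose mean is nearest.
        new_labels = [
            min(range(k), key=lambda c: abs(v - centroids[c])) for v in values
        ]
        if new_labels == labels:
            break  # assignments are stable, so the means are final
        labels = new_labels
        # Update each cluster mean from its current members.
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            if members:  # an empty cluster keeps its previous mean
                centroids[c] = sum(members) / len(members)
    return centroids, labels


def group_users(frequency_by_user, k, seed=0):
    """Group user names by visit frequency using kmeans_1d."""
    users = sorted(frequency_by_user)
    values = [frequency_by_user[u] for u in users]
    _, labels = kmeans_1d(values, k, seed)
    clusters = {}
    for user, label in zip(users, labels):
        clusters.setdefault(label, []).append(user)
    return list(clusters.values())
```

With hypothetical frequencies such as `{"u1": 1, "u2": 2, "u3": 3, "u4": 50, "u5": 52, "u6": 55}` and `k=2`, `group_users` separates the three occasional visitors from the three frequent ones. Clustering on more than one field (for example Session and Frequency together) would need a multi-dimensional distance in place of the absolute difference.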
The EUB algorithm considers four entities: IP address, user name, website name and frequency of accessed sites. Cookie-based web logs are taken as the input; they mainly identify the unique users and help to create user clusters. Website and web page navigation behavior is considered the basic source for tracing visitors' online behaviour and for identifying the user's interest in accessing various websites. The EUB algorithm thus traces the behavior of online users, which supports website usage analysis. The EUB procedure is as follows.
1. Choose the input data from the web log.
2. Using the frequency field as the basic constraint, calculate the mean value.
3. Form clusters depending on the mean value.
4. Reassign each object to the cluster to which it is most similar, based on the mean value.
5. Repeat the steps until the objects are grouped appropriately.
6. Similarly, group dissimilar objects in another cluster.

6. EXPERIMENTAL SETUP AND COMPARATIVE RESULTS

The web log files were gathered from the college web server over the period from 18th October 2012 to 21st August 2013. A total of 27962 records were collected. The preprocessing algorithm is implemented using JDK 1.6. The EUB algorithm is evaluated against the existing methods in terms of data cleaning, user identification and session identification.

6.1 Statement of Evaluation

Figure 4 shows the evaluation data for data cleaning. The UILP algorithm takes into consideration explicit user requests for image and multimedia files; it eliminates 5084 image files and 3813 multimedia files. ISWUP, an intelligent learning algorithm, removes 5950 image files and 4784 video files. The Sudheer Reddy algorithm removes 6330 image files and 5140 video files. The EUB algorithm removes 5010 image files and 3421 video files.
Removal of automatic search engine requests and failure status codes is more or less the same in all cases. The EUB algorithm deals with both explicit and implicit user requests.

[Figure 4: Data cleaning overall comparison]

The EUB algorithm proves its efficiency and effectiveness in finding the number of users and unique users by using clustering. Clustering is the process of organizing objects into groups whose members are similar in some way; a cluster is therefore a collection of objects that are similar to one another and dissimilar to the objects belonging to other clusters. The proposed algorithm identifies 1310 users and 615 unique users by applying clustering, whereas the UILP algorithm identifies 1271 users and 550 unique users, ISWUP identifies 1190 users and 457 unique users, and the other existing algorithms identify 1190 users and 428 unique users. This is depicted in Figure 5.

[Figure 5: User identification performance comparisons]

Figure 6 shows the number of sessions identified by the existing and proposed algorithms. The two measures used are session identification and frequency calculation. A session is calculated based on the movement from one website to another, and the frequency value is calculated from the number of pages visited by the user on a particular website. Session and frequency are considered important factors for clustering users based on their interest measure. The EUB algorithm finds 65200 sessions and UILP identifies 61783, while the numbers of sessions identified by ISWUP and Sudheer are 4760 and 3570 respectively. Overall, the proposed EUB algorithm identifies more unique users and sessions than the compared algorithms.

[Figure 6: Session identification performance comparison for EUB, UILP, ISWUP and K. Sudheer; vertical axis from 0 to 70000]

7. CONCLUSIONS AND FUTURE ENHANCEMENT

The enhanced user behaviour algorithm is formulated to perform data cleaning, user identification, session identification and path completion, and it also identifies the unique users by using clustering. Many techniques and suggestions for preprocessing data taken from web log files have been discussed. The proposed EUB algorithm effectively preprocesses the web log data, identifies user behavior, and groups similar users using clustering techniques. Visitors' online behavior must be traced for website usage analysis. As future work, a number of further tasks could be added to demonstrate the utility of web mining, for example by making exploratory changes to websites.

REFERENCES

[1] V. V. R. Maheswara Rao, and V. Valli Kumari, "An Enhanced Pre-Processing Research Framework for Web Log Data using a Learning Algorithm", Proceedings of the Second International Conference on Networks & Communications (Netcom 2010), Bangalore, India, pp. 1-15, (2011).
[2] Sanjay Bapu Thakare, and Sangram Z. Gawali, "A Effective and Complete Preprocessing for Web Usage Mining", International Journal on Computer Science and Engineering, Vol. 2(3), pp. 848-851, (2010).
[3] T. Hussain, S. Asghar, and S. Fong, "A Hierarchical Cluster Based Preprocessing Methodology for Web Usage Mining", Proceedings of the Sixth International Conference on Advanced Information Management and Service, IEEE Xplore, pp. 472-477, (2010).
[4] P. Nithya, and P. Sumathi, "An Enhanced Pre-Processing Technique for Web Log Mining by Removing Web Robots", Proceedings of the IEEE International Conference on Computational Intelligence and Computing Research, IEEE Xplore, pp. 1-4, (2012).
[5] K. Sudheer Reddy, M. Kantha Reddy, and V. Sitaramalu, "An effective Data Preprocessing method for Web Usage Mining", Proceedings of the International Conference on Information Communications and Embedded Systems, IEEE Xplore, pp. 7-10, (2013).
[6] M. Malarvizhi, S. A. Sahaaya, and Arul Mary, "Preprocessing of Educational Institution Web Log Data for Finding Frequent Patterns using Weighted Association Rule Mining Technique", European Journal of Scientific Research, Vol. 74(4), pp. 617-633, (2012).
[7] Sheetal A. Raiyani, and Shailendra Jain, "Efficient Preprocessing Technique using Web Log Mining", International Journal of Advancements in Research & Technology, Vol. 1(6), pp. 1-5, (2012).
[8] J. Srivastava, R. Cooley, M. Deshpande, and P. N. Tan, "Web Usage Mining: Discovery and Applications of Usage Patterns from Web Data", SIGKDD Explorations, Vol. 1(2), pp. 12-23, (2000).
[9] V. Chitraa, and Antony Selvadoss Devamani, "A Novel Technique for Sessions Identification in Web Usage Mining Preprocessing", International Journal of Computer Applications, Vol. 34(9), pp. 23-27, (2011).
[10] Vijayashri Losarwar, and Madhuri Joshi, "Data Preprocessing in Web Usage Mining", Proceedings of the International Conference on Artificial Intelligence and Embedded Systems (ICAIES 2012), Singapore, (2012).
[11] R. Suguna, and D. Sharmila, "User Interest Level based Preprocessing Algorithms using Web usage Mining", International Journal on Computer Science and Engineering, Vol. 5(9), pp. 815-822, (2013).
[12] C. P. Sumathi, R. Padmaja Valli, and T. Santhanam, "Automatic Recommendation of Web Pages in Web Usage Mining", International Journal on Computer Science and Engineering, Vol. 2(9), pp. 3046-3052, (2010).
[13] Jiawei Han, and Micheline Kamber, "Data Mining - Concepts and Techniques", Second Edition, Elsevier, USA (2010).
[14] Yinghui Yang, and B. Padmanabhan, "GHIC: A Hierarchical Pattern-Based Clustering Algorithm for Grouping Web Transactions", IEEE Transactions on Knowledge and Data Engineering, Vol. 17(9), pp. 1300-1304, (2005).
[15] M. Mahdavi, and H. Abolhassani, "Harmony K-means Algorithm for Document Clustering", Data Mining and Knowledge Discovery, Vol. 18(3), pp. 370-391, (2009).
[16] Sudipto Guha, Rajeev Rastogi, and Kyuseok Shim, "CURE: An Efficient Clustering Algorithm for Large Database", ACM SIGMOD, Vol. 27(2), pp. 73-84, (1998).
[17] K. Sharma, G. Shrivastava, and V. Kumar, "Web Mining: Today and Tomorrow", Proceedings of the Third International Conference on Electronics Computer Technology, IEEE Xplore, Vol. 1, pp. 399-403, (2011).
[18] K. R. Suneetha, and R. Krishnamoorti, "IRS: Intelligent Recommendation System for Web Personalization", European Journal of Scientific Research, Vol. 65(2), pp. 175-186, (2011).

Authors Biography

S. Umamaheswari is working as Associate Professor and Head of the Department of IT at C. Abdul Hakeem College of Engineering & Technology. She received her B.E. (CSE) from V. R. S. College of Engineering & Technology and her M. Tech. (IT) from Sathyabama University, both with distinction. She is doing research in data mining at SCSVMV University, Kancheepuram. She has 11 years of teaching experience and has published 8 papers in international journals and conferences. Her areas of interest are data mining, artificial intelligence and neural networks, and she has guided a number of UG and PG projects.

S. K. Srivatsa is a retired Senior Professor of Anna University who is currently working as Senior Professor at Prathyusha Institute of Technology and Management, Chennai. He received his bachelor's degree in Electronics and Telecommunication (Honours) from Jadavpur University, and his master's degree in Electronics & Communication Engineering and his Ph.D. from the Indian Institute of Science. He has produced 25 Ph.D.s. He is the author of 440 publications in reputed journals and conference proceedings. He is a life member and fellow of twenty-four registered professional societies.