
International Journal of Emerging Trends & Technology in Computer Science (IJETTCS)
Web Site: www.ijettcs.org Email: [email protected], [email protected]
Volume 1, Issue 3, September – October 2012 ISSN 2278-6856

Clustering of Engineering Materials Data Sets Using Fuzzy System

Sarakutty.T.K1, Dr.M.Hanumanthappa2
1 Department of Computer Science & Applications, Dayananda Sagar College, Bangalore, India
2 Department of Computer Science & Applications, Bangalore University, Bangalore, India

Abstract: Data mining enables efficient knowledge extraction from large datasets in order to discover hidden or non-obvious patterns in data. Clustering of engineering materials data sets deals with the systematic categorization of materials based on distinguishing characteristics and criteria. Materials informatics deals with real-world materials data sets of high dimensionality and complex structure. Fuzzy approaches can play an important role in data mining because they can deal with complex, high-dimensional data and are capable of producing comprehensible results. A fuzzy clustering method is used to cluster the materials data set based on similarity and performance. The knowledge extracted from the engineering materials data sets is proposed for effective decision making in advanced engineering materials design applications.

Keywords: Data mining, Clustering, Fuzzy C-Means, Materials Informatics

1. INTRODUCTION

Materials play an important role in the construction and manufacturing of equipment and tools, transportation, housing, clothing, communication, recreation and food production. Historically, the development and advancement of societies have been intimately tied to their members' ability to produce and manipulate materials to fill their needs. During the last decades many new materials and material types have been developed. At present, on the order of 100,000 engineering materials exist. In addition, many materials have successively obtained improved properties.
This has been possible not only due to the development of the materials but also due to the appearance of new production methods. As a consequence of this rapid development, many material types can be used for a given component. Computational tools assist in making decisions by analyzing the data and discovering useful patterns for predicting future trends.

In the materials science domain it is imperative to connect materials suppliers, automobile companies, heat treatment industries, universities, researchers, aerospace agencies, manufacturing companies and other users [1]. Exchange of knowledge among these users enables them to make faster and more effective decisions. For example, prior knowledge that distortion is likely to occur in a part when it is heat treated under certain conditions is useful in selecting parameters so as to minimize distortion in an industrial heat treatment process. This in turn helps to optimize processes and make better products, hence improving business by satisfying customers. Thus, on the whole, e-business is promoted by facilitating worldwide exchange of knowledge useful in the domain for supporting various aspects of decision support [2].

Coupling of computational materials science and informatics is essential in order to:
- Accelerate the insertion of materials into engineering systems
- Establish new structure-property correlations among large, heterogeneous and distributed data sets
- Discover new chemistries and compounds
- Formulate and/or refine new theories for materials behavior
- Rapidly identify critical data and theoretical needs for future problems

The research areas of materials informatics are mainly focused on the following tasks: data standards, organization and management of materials data, and data mining on materials data [3],[4].
Materials informatics is very likely to become a major force because of enormous improvements in the efficiency and capabilities of computational methods for materials and the recent progress in data mining techniques. This research aims to establish whether data mining techniques can assist in the clustering of materials by finding the meaningful patterns that exist across various materials. The materials are clustered based on their properties. The resulting clusters, and the classifications that can be developed from them, depend on the selected attributes and to some extent on the method of clustering. Grouping materials allows a designer to assess the similarity of two materials, stimulating innovation and suggesting substitutions. The knowledge extracted in this way is proposed for effective decision making in advanced engineering materials design applications.

2. LITERATURE SURVEY

Materials informatics has been a subject of materials science since the international conference of Materials Informatics [5]. It is a new subject that leverages information technology and computer network technology to represent, parse, store, manage and analyze materials data, in order to realize the sharing and knowledge mining of materials data for uncovering the essence of materials, and to accelerate new material discovery and design [5].

2.1 Materials informatics

Data quality plays a central role in compiling valid and reliable plans to make the right decisions. At the same time, it is acknowledged that planning processes are both data and knowledge intensive and characterized by the human-computer interface. Informatics is a science where a new knowledge system is built up by collecting and classifying information using computers and networks.
It is the integration of computer science, information science, and some domain area to provide new understandings and to facilitate knowledge discovery [6]. Materials informatics can be thought of as a tool for materials scientists to gain new understandings of their data through the use of a myriad of machine learning approaches, integrated with new visualization schemes, more human-like interactions with the data, and guidance from domain experts. It can also accelerate the research process and minimize data handling. All of this is fuelled by the unprecedented growth in the field of information technology and is driving the interest in the application of knowledge representation, knowledge discovery, machine learning, information retrieval, semantic technology, etc. [7].

The main issues to be addressed regarding the development of materials informatics are:
- Redefinition of database formats, aiming at improved data sharing
- Database networking and the development of software for data sharing
- Development of data analysis software and visualization software
- Development of software for data mining from databases
- Prediction of new functions by the combination of data mining and computational science
- Standardization of platforms that integrate all these factors [6]

2.2 Previous Work

A comparative study of different classification algorithms is presented in [8], where the Fuzzy C-Means algorithm performs well on unsupervised data with uncertainty. Cluster analysis is used in [9] as an analytical tool for materials design, clustering materials and the processes that shape them using their attributes as indicators of relationship. In [5], a Naïve Bayesian classification algorithm is used to classify engineering materials data sets consisting of only categorical attribute values. Here we use a fuzzy system to classify engineering materials data sets consisting of both numerical and categorical attribute values.
When both numerical and categorical attribute values are considered, higher classification accuracy is possible, since many material properties are expressed numerically. This technique reduces complexity and helps expose hidden order and deeply buried patterns in data.

3. PROPOSED METHOD

Clustering is an unsupervised learning method used to find structure in a collection of unlabeled data. It is the process of organizing objects into groups whose members are similar in some way. A cluster is therefore a collection of objects that are similar to one another and dissimilar to the objects belonging to other clusters. Clustering groups a large data set into smaller sets of similar data. It is used to quickly and easily seed the process of taxonomy generation, and it provides a way of understanding how the attributes of high-dimensional data are organized and related.

Clustering and fuzzy logic together provide simple, powerful techniques for modeling complex systems. Fuzzy clustering provides a robust and resilient method of classifying collections of data elements by allowing the same data point to reside in multiple clusters with different degrees of membership. Interpretations of membership degrees include similarity, preference, and uncertainty. In contrast to classical set theory, in which an object or a case either is or is not a member of a given set defined by some property, fuzzy set theory allows an object or a case to belong to a set only to a certain degree. Fuzzy clustering can state how similar an object or case is to a prototypical one, indicate preferences between suboptimal solutions to a problem, or model uncertainty about the true situation when that situation is described in imprecise terms. In general, due to their closeness to human reasoning, solutions obtained using fuzzy approaches are easy to understand and to apply.
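The contrast between classical and fuzzy set membership described above can be illustrated with a minimal sketch. The set "light metal", the function names, and the specific-gravity thresholds (4.5, 2.0, 8.0) are hypothetical values chosen only for illustration; they are not taken from this paper or its data.

```python
def crisp_light(specific_gravity, threshold=4.5):
    """Classical set: a metal either is 'light' (1.0) or is not (0.0).

    The threshold is a hypothetical illustration value.
    """
    return 1.0 if specific_gravity <= threshold else 0.0


def fuzzy_light(specific_gravity, full=2.0, none=8.0):
    """Fuzzy set: membership falls linearly from 1.0 at `full` to 0.0 at `none`.

    Both breakpoints are hypothetical illustration values.
    """
    if specific_gravity <= full:
        return 1.0
    if specific_gravity >= none:
        return 0.0
    return (none - specific_gravity) / (none - full)


# Two nearly identical materials straddle the crisp threshold and are
# treated as opposites, while their fuzzy memberships stay close together.
for sg in (1.5, 4.4, 4.6, 9.0):
    print(sg, crisp_light(sg), round(fuzzy_light(sg), 2))
```

Graded values of this kind are what a fuzzy clustering method assigns to each data point for each cluster, instead of a single hard label.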
Due to these strengths, fuzzy systems are the method of choice when linguistic, vague, or imprecise information has to be modeled.

There are many different clustering algorithms that could be used; we have relied on the Fuzzy C-Means algorithm because it is fast and straightforward. Fuzzy C-Means is a data clustering technique in which a dataset is grouped into c clusters, with every data point in the dataset belonging to every cluster to a certain degree. For example, a data point that lies close to the center of a cluster will have a high degree of membership in that cluster, and a data point that lies far away from the center of a cluster will have a low degree of membership in that cluster. In our study, the Fuzzy C-Means algorithm is used to deal with unsupervised data with uncertainty. The goal of the Fuzzy C-Means algorithm is to group the objects into clusters based only on their observable features, such that each cluster contains objects that share some important properties [8]. The Fuzzy C-Means algorithm used in the proposed model for clustering engineering materials is given below.

Algorithm: Fuzzy C-Means
Input:
Data - data set to be clustered; each row is a sample data point
Cluster_n - number of clusters (greater than one)
Output:
Center - coordinates of final cluster centers
Obj_fcn - values of the objective function during iterations

Let X = {x1, x2, x3, ..., xn} be the set of data points and V = {v1, v2, v3, ..., vc} be the set of centers.

1) Randomly select 'c' cluster centers.

2) Calculate the fuzzy membership 'µij' using

$$\mu_{ij} = \frac{1}{\sum_{k=1}^{c} \left( d_{ij} / d_{ik} \right)^{2/(m-1)}} \tag{1}$$

3) Compute the fuzzy centers 'vj' using

$$v_j = \frac{\sum_{i=1}^{n} (\mu_{ij})^m x_i}{\sum_{i=1}^{n} (\mu_{ij})^m}, \quad j = 1, 2, \ldots, c \tag{2}$$

4) Repeat steps 2) and 3) until the minimum 'J' value is achieved or ||U(k+1) - U(k)|| < β.

Here dij = ||xi - vj|| is the distance between the ith data point and the jth cluster center, m > 1 is the fuzziness index, k is the iteration step, β is the termination criterion between [0, 1], U = (µij)n×c is the fuzzy membership matrix, and J is the objective function to be minimized:

$$J(U, V) = \sum_{i=1}^{n} \sum_{j=1}^{c} (\mu_{ij})^m \lVert x_i - v_j \rVert^2 \tag{3}$$

where ||xi - vj|| is the Euclidean distance between the ith data point and the jth cluster center [10]. A block diagram summarizing the FCM clustering algorithm is given in Figure 1.

4. EXPERIMENTAL SETUP & RESULTS

The materials database is compiled from a popular materials website [11] and from published peer-reviewed research papers. The atomic and electronic structure of a material determines its properties. A typical training sample data set is shown in Table 1, which contains the properties of metals with respect to steel, such as specific gravity, Young's modulus, thermal conductivity, linear expansion coefficient, melting point and electrical resistivity. The properties are taken at 20 °C.

Table 1: Material Properties

Material property charts are two-dimensional plots using pairs of material properties as the variables. The idea of seeking clusters in two dimensions is to plot the two variables as if they were x, y coordinates. Material 1 appears as the point x=X11, y=Y11 [9]. Figure 2 shows a cluster diagram using the values of two technical attributes, specific gravity and Young's modulus, for metals selected with respect to steel.
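The Fuzzy C-Means steps of Section 3 (random centers, membership update, center update, repeat until the objective stops improving) can be sketched for two-attribute data as follows. This is a minimal standard-library Python sketch, not the MATLAB routine used in this study. The parameter names `m`, `max_iter`, `min_impr` and `seed` are assumptions introduced for illustration, and the six data points are made-up toy values, not the metals of Table 1.

```python
import math
import random


def fcm(data, cluster_n, m=2.0, max_iter=100, min_impr=1e-5, seed=0):
    """Fuzzy C-Means on a list of equal-length numeric tuples.

    Returns (centers, U, obj_fcn): the final cluster centers, the n x c
    membership matrix, and the objective value J after each iteration.
    """
    rng = random.Random(seed)
    n, dim = len(data), len(data[0])
    power = 2.0 / (m - 1.0)
    # Step 1: randomly pick c distinct data points as initial centers.
    centers = [list(p) for p in rng.sample(data, cluster_n)]
    U, obj_fcn = [], []
    for _ in range(max_iter):
        # Step 2: fuzzy memberships from the distances to every center.
        U = []
        for x in data:
            d = [math.dist(x, v) for v in centers]
            if min(d) == 0.0:
                # The point sits exactly on a center: full membership there.
                row = [1.0 if dj == 0.0 else 0.0 for dj in d]
                total = sum(row)
                row = [r / total for r in row]
            else:
                row = [1.0 / sum((dj / dk) ** power for dk in d) for dj in d]
            U.append(row)
        # Step 3: each center is the membership-weighted mean of the data.
        for j in range(cluster_n):
            w = [U[i][j] ** m for i in range(n)]
            total = sum(w)
            centers[j] = [
                sum(w[i] * data[i][k] for i in range(n)) / total
                for k in range(dim)
            ]
        # Objective J: weighted sum of squared distances to the centers.
        J = sum(
            U[i][j] ** m * math.dist(data[i], centers[j]) ** 2
            for i in range(n)
            for j in range(cluster_n)
        )
        obj_fcn.append(J)
        # Step 4: stop once J improves by less than min_impr.
        if len(obj_fcn) > 1 and abs(obj_fcn[-2] - obj_fcn[-1]) < min_impr:
            break
    return centers, U, obj_fcn


# Toy (x, y) points standing in for two material attributes.
# These values are hypothetical and are NOT taken from Table 1.
data = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1),
        (5.0, 5.2), (5.1, 4.9), (4.8, 5.0)]
centers, U, obj_fcn = fcm(data, cluster_n=2)
for point, row in zip(data, U):
    print(point, [round(u, 3) for u in row])
```

Each printed row gives one point's degree of membership in the two clusters; taking the larger value per row yields a hard grouping comparable to a two-cluster diagram. The stopping rule here follows the objective-improvement threshold rather than the membership-matrix norm; either criterion fits step 4 of the algorithm.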
Figure 1: Block diagram summarizing FCM Clustering Algorithm

Figure 2: Cluster diagram

The fuzzy clustering algorithm outputs the final cluster centers and the value of the objective function at each iteration. The clustering process stops when the improvement in the objective function between two consecutive iterations is less than the specified minimum amount of improvement, here 1e-5, with an accuracy of 0.99. The result obtained by applying Fuzzy C-Means clustering with 2 cluster centers using MATLAB is shown in Figure 3.

Figure 3: Results

Here we have taken two properties, specific gravity and Young's modulus, and clustered the metals into two groups. Based on the clusters, it is possible to select the metals that have similar values for the selected properties. The same analysis can be continued with different properties to obtain clusters based on those properties.

5. CONCLUSION & FUTURE WORKS

Fuzzy C-Means was used to classify engineering materials in support of better business decisions. It helps to identify which engineering material belongs to which category by using numerical properties and clustering the materials data set based on similarity and performance. This can be achieved by repeating the same analysis first with different properties and then with different materials. This exploratory analysis suggests how a designer might use such an analysis to find materials that are similar to each other. The same module can be used to cluster and classify different engineering materials in order to take business decisions.

References

[1] Begley E.F., "National Institute of Standards and Technology Report", USA, Jan 2003.
[2] Aparna S. Varde, Makiko Takahashi, Elke A. Rundensteiner, Matthew O. Ward, Mohammed Maniruzzaman and Richard D. Sisson, "Apriori Algorithm and Game-of-Life for Predictive Analysis in Materials Science".
[3] Rajan, "Informatics and Integrated Computational Materials Engineering: Part II", JOM, Vol. 61, pp. 47-47, 2009.
[4] Wei, Q.Y., Peng, X.D., Liu, X.G., Xie, W.D., "Materials informatics and study on its further development", Chinese Science Bulletin, Vol. 51, pp. 498-504, 2006.
[5] Doreswamy, Hemanth.K.S, "Hybrid Data Mining Technique for Knowledge Discovery from Engineering Materials Data Sets", International Journal of Database Management Systems, Vol. 3, No. 1, February 2011.
[6] Toyohiro Chikyow, "Trends in Materials Informatics in Research on inorganic materials", Quarterly Review No. 20, July 2006.
[7] R. L. King, O. Abuomar, H. Rhee, A. Konstantinidis, N. Pavlidou and M. Petrou, "On materials informatics and pattern formation in materials", ENOC 2011, 24-29 July 2011.
[8] P. Bhargavi, Dr. S. Jyothi, "Soil Classification Using Data Mining Techniques: A Comparative Study", International Journal of Engineering Trends and Technology, July to Aug Issue 2011.
[9] K.W. Johnson, P.M. Langdon, M.F. Ashby, "Grouping materials and processes for the designer: an application of cluster analysis", Elsevier Science Ltd, 2002.
[10] Mohanad Alata, Mohammad Molhim, and Abdullah Ramini, "Optimizing of Fuzzy C-Means Clustering Algorithm Using GA", World Academy of Science, Engineering and Technology, 2008.
[11] http://www.engineersedge.com/properties_of_metals.htm

AUTHORS

Sarakutty T K received her MCA degree from Bharathiar University and her M.Phil in Computer Science from M S University. She works in the Department of Computer Science and Applications at Dayananda Sagar College, Bangalore, India. She has 15 years of teaching experience in the field of computer science and applications, and her research areas include Data Mining, Predictive Analytics and Algorithms.

Dr. M. Hanumanthappa is currently a faculty member and chairman of the Dept. of Computer Science and Applications, Bangalore University, Bangalore. He has over 16 years of teaching (postgraduate) and industry experience. His areas of interest are mainly Data Mining, Information Retrieval and Programming Languages. He has also conducted a number of training programmes and workshops for Computer Science students. He is the Principal Investigator of a UGC Major Research Project and has published nearly 50 research papers in national and international journals and conferences. He is currently guiding Ph.D students in Computer Science at Bangalore University. He is also a member of the Board of Studies and the Board of Examiners for various universities of Karnataka.
