International Journal of Emerging Trends & Technology in Computer Science (IJETTCS)
Web Site: www.ijettcs.org Email: [email protected], [email protected]
Volume 1, Issue 3, September – October 2012
ISSN 2278-6856
Clustering of Engineering Materials Data Sets Using Fuzzy System
Sarakutty T.K.¹, Dr. M. Hanumanthappa²
¹ Department of Computer Science & Applications, Dayananda Sagar College, Bangalore, India
² Department of Computer Science & Applications, Bangalore University, Bangalore, India
Abstract: Data mining enables efficient knowledge extraction from large datasets in order to discover hidden or non-obvious patterns in data. Clustering of engineering materials data sets deals with the systematic categorization of materials based on distinguishing characteristics and criteria. Materials informatics deals with real-world materials data sets that have high dimensionality and complex structure. Fuzzy approaches can play an important role in data mining because they can deal with complex, high-dimensional data and are capable of producing comprehensible results. A fuzzy clustering method is used to cluster the materials data set based on the materials' similarities and performance. The knowledge extracted from the engineering materials data sets is proposed for effective decision making in advanced engineering materials design applications.
Keywords: Data Mining, Clustering, Fuzzy C-Means, Materials Informatics
1. INTRODUCTION
Materials play an important role in the construction and manufacturing of equipment and tools, transportation, housing, clothing, communication, recreation and food production. Historically, the development and advancement of societies have been intimately tied to their members' ability to produce and manipulate materials to fill their needs.
During the last decades, many new materials and material types have been developed. At present, on the order of 100,000 engineering materials exist. In addition, many materials have successively obtained improved properties. This has been possible not only because of the development of the materials themselves but also because of the appearance of new production methods. As a consequence of this rapid development, many material types can be used for a given component. Computational tools assist in making decisions by analyzing the data and discovering useful patterns for predicting future trends. In the materials science domain, it is imperative to connect materials suppliers, automobile companies, heat treatment industries, universities, researchers, aerospace agencies, manufacturing companies and other users [1]. Exchange of knowledge among these users enables them to make faster and more effective decisions. For example, prior
knowledge that distortion is likely to occur in a part when it is heat treated under certain conditions is useful in selecting parameters that minimize distortion in an industrial heat treatment process. This in turn helps to optimize processes and make better products, improving business by satisfying customers. Thus, on the whole, e-business is promoted by facilitating the worldwide exchange of knowledge useful in the domain for supporting various aspects of decision support [2].
Coupling of computational materials science and informatics is essential in order to:
• Accelerate the insertion of materials into engineering systems
• Establish new structure-property correlations among large, heterogeneous and distributed data sets
• Discover new chemistries and compounds
• Formulate and/or refine new theories of materials behavior
• Rapidly identify critical data and theoretical needs for future problems
Research in materials informatics mainly focuses on the following tasks: data standards, organization and management of materials data, and data mining on materials data [3],[4]. Materials informatics is very likely to become a major force because of the enormous improvements in the efficiency and capabilities of computational methods for materials and the recent progress in data mining techniques.
This research aims to establish whether data mining techniques can assist in clustering materials by finding the meaningful patterns that exist across various materials. The materials are clustered based on their properties. The resulting clusters, and the classifications that can be developed from them, depend on the selected attributes and, to some extent, on the clustering method. Grouping materials allows a designer to assess the similarity of two materials, stimulating innovation and suggesting substitutions. The knowledge extracted in this way is proposed for effective decision making in advanced engineering materials design applications.
2. LITERATURE SURVEY
Materials informatics has been a subject of materials science since the International Conference on Materials Informatics [5]. It is a new subject that leverages information technology and computer network technology to represent, parse, store, manage and analyze materials data, in order to enable the sharing and knowledge mining of materials data, uncover the essence of materials, and accelerate new materials discovery and design [5].
2.1 Materials informatics
Data quality plays a central role in compiling valid and reliable plans for making the right decisions. At the same time, it is acknowledged that planning processes are both data- and knowledge-intensive and are characterized by the human-computer interface. Informatics is a science in which a new knowledge system is built up by collecting and classifying information using computers and networks. It integrates computer science, information science and a domain area to provide new understanding and to facilitate knowledge discovery [6].
Materials informatics can be thought of as a tool that lets materials scientists gain new understanding of their data through a myriad of machine learning approaches, integrated with new visualization schemes and more human-like interactions with the data, and guided by domain experts. It can also accelerate the research process and minimize data handling. All of this is fuelled by the unprecedented growth of information technology, which is driving interest in the application of knowledge representation, knowledge discovery, machine learning, information retrieval, semantic technology, etc. [7].
The main issues to be addressed in the development of materials informatics are:
• Redefinition of database formats, aiming at improved data sharing
• Database networking and the development of software for data sharing
• Development of data analysis and visualization software
• Development of software for data mining from databases
• Prediction of new functions by combining data mining and computational science
• Standardization of platforms that integrate all these factors [6]
2.2 Previous Work
A comparative study of different classification algorithms is presented in [8], in which the Fuzzy C-Means algorithm performs well on unsupervised data with uncertainty. Cluster analysis [9] has been used as an analytical tool in materials design to cluster materials and the processes that shape them, using their attributes as indicators of relationship.
A Naïve Bayesian classification algorithm [5] has been used to classify engineering materials data sets consisting only of categorical attribute values. Here we use a fuzzy system to classify engineering materials data sets consisting of both numerical and categorical attribute values. Considering both numerical and categorical attribute values makes higher classification accuracy possible, since many material properties are expressed numerically. This technique reduces complexity and helps expose hidden order and deeply buried patterns in data.
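The paper does not say how categorical values enter the distance computation that fuzzy clustering relies on. One common option, assumed here purely for illustration, is to min-max scale each numerical property to [0, 1] (so that a property with large raw values, such as Young's modulus, does not dominate one with small values, such as specific gravity, simply because of its units) and to one-hot encode the categorical attribute. The function name `encode_mixed` and the sample rows and category labels are hypothetical.

```python
import numpy as np


def encode_mixed(numeric, categorical):
    """Return (features, levels) for hypothetical mixed numeric/categorical data.

    numeric:     (n, p) numbers; each column is min-max scaled to [0, 1].
    categorical: n labels for one categorical attribute; one-hot encoded.
    """
    numeric = np.asarray(numeric, dtype=float)
    lo, hi = numeric.min(axis=0), numeric.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)  # constant columns would divide by zero
    scaled = (numeric - lo) / span
    levels = sorted(set(categorical))       # fixed column order for the one-hot part
    onehot = np.array([[1.0 if c == lev else 0.0 for lev in levels]
                       for c in categorical])
    return np.hstack([scaled, onehot]), levels


# Hypothetical rows: two numeric properties plus one categorical attribute.
X, levels = encode_mixed([[1.0, 100.0], [3.0, 200.0], [2.0, 150.0]],
                         ["ferrous", "non-ferrous", "ferrous"])
print(levels)  # ['ferrous', 'non-ferrous']
print(X)
```

With this encoding, two materials that differ only in the categorical attribute are sqrt(2) apart, comparable to the full range of one scaled numerical property; how strongly categories should be weighted is a design choice this sketch does not tune.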
3. PROPOSED METHOD
Clustering is an unsupervised learning method used to find structure in a collection of unlabeled data. It is the process of organizing objects into groups whose members are similar in some way. A cluster is therefore a collection of objects that are similar to one another and dissimilar to the objects belonging to other clusters. Clustering groups large sets of data into clusters of smaller sets of similar data. It can be used to quickly and easily seed the process of taxonomy generation, and it provides a way of understanding how the attributes of high-dimensional data are organized and related.
Clustering and fuzzy logic together provide simple yet powerful techniques for modeling complex systems. Fuzzy clustering provides a robust and resilient method of classifying collections of data elements by allowing the same data point to reside in multiple clusters with different degrees of membership. Interpretations of membership degrees include similarity, preference and uncertainty. In contrast to classical set theory, in which an object or case either is or is not a member of a given set defined by some property, fuzzy set theory allows an object or case to belong to a set only to a certain degree. Using fuzzy clustering, it is possible to state how similar an object or case is to a prototypical one, to indicate preferences between suboptimal solutions to a problem, or to model uncertainty about the true situation when that situation is described in imprecise terms. In general, because of their closeness to human reasoning, solutions obtained using fuzzy approaches are easy to understand and apply. Due to these strengths, fuzzy systems are the method of choice when linguistic, vague or imprecise information has to be modeled. Many different clustering algorithms could be used; we have relied on the Fuzzy C-Means algorithm because it is fast and straightforward.
Fuzzy C-Means is a data clustering technique in which a dataset is grouped into c clusters, with every data point in the dataset belonging to every cluster to a certain degree. For example, a data point that lies close to the center of a cluster will have a high degree of belonging, or membership, to that cluster, and a data point that lies far away from the center of a cluster will have a low degree of membership to that cluster. In our
study, the Fuzzy C-Means algorithm is used to deal with unsupervised data with uncertainty. The goal of the Fuzzy C-Means algorithm is to group objects into clusters based only on their observable features, such that each cluster contains objects that share some important properties [8].
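As a small worked check of the membership idea, the sketch below evaluates the membership formula, equation (1) of the algorithm given below, for a single one-dimensional point. It assumes the common fuzziness exponent m = 2 and two hypothetical centers at 0 and 10; the function name `memberships` is also hypothetical. The point at 1 lies close to the first center, so its distances are 1 and 9 and its memberships are 81/82 ≈ 0.988 and 1/82 ≈ 0.012, which sum to 1.

```python
import numpy as np


def memberships(x, centers, m=2.0):
    """Equation (1) for one point x: u_j = 1 / sum_k (d_j / d_k)^(2/(m-1)).

    Assumes x does not coincide with any center (no zero distances).
    """
    d = np.linalg.norm(centers - x, axis=1)                  # distance to each center
    ratio = (d[:, None] / d[None, :]) ** (2.0 / (m - 1.0))  # ratio[j, k] = (d_j/d_k)^(2/(m-1))
    return 1.0 / ratio.sum(axis=1)


centers = np.array([[0.0], [10.0]])  # hypothetical 1-D cluster centers
u = memberships(np.array([1.0]), centers)
print(u)  # approximately [0.988 0.012]
```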
The Fuzzy C-Means algorithm used in the proposed model for clustering engineering materials is given below.
Algorithm: Fuzzy C-Means
Input: Data - data set to be clustered; each row is a sample data point
Cluster_n - number of clusters (greater than one)
Output: Center - coordinates of the final cluster centers
Obj_fcn - values of the objective function during iterations
Let X = {x1, x2, x3, ..., xn} be the set of data points and V = {v1, v2, v3, ..., vc} be the set of centers.
1) Randomly select c cluster centers.
2) Calculate the fuzzy membership µij using
$$\mu_{ij} = \frac{1}{\sum_{k=1}^{c} \left( d_{ij} / d_{ik} \right)^{2/(m-1)}} \tag{1}$$
3) Compute the fuzzy centers vj using
$$v_j = \frac{\sum_{i=1}^{n} (\mu_{ij})^m \, x_i}{\sum_{i=1}^{n} (\mu_{ij})^m}, \quad j = 1, 2, \ldots, c \tag{2}$$
4) Repeat steps 2) and 3) until the minimum J value is achieved or ||U^(k+1) - U^(k)|| < β.
Where,
k is the iteration step.
β is the termination criterion, a value between 0 and 1.
U = (µij) is the n × c fuzzy membership matrix.
m is the fuzziness exponent (m > 1), and dij = ||xi - vj|| is the distance between the ith data point and the jth cluster center.
J is the objective function to be minimized:
$$J(U, V) = \sum_{i=1}^{n} \sum_{j=1}^{c} (\mu_{ij})^m \, \lVert x_i - v_j \rVert^2 \tag{3}$$
where ||xi - vj|| is the Euclidean distance between the ith data point and the jth cluster center [10].
A block diagram summarizing the FCM clustering algorithm is given in Figure 1.
Figure 1: Block diagram summarizing the FCM clustering algorithm
4. EXPERIMENTAL SETUP & RESULTS
The materials database is compiled from a popular materials website [11] and from published peer-reviewed research papers. The atomic and electronic structure of a material determines its properties. A typical training sample data set is shown in Table 1; it contains properties of metals with respect to steel: specific gravity, Young's modulus, thermal conductivity, linear expansion coefficient, melting point and electrical resistivity. The property values are taken at 20 °C.
Table 1: Material Properties
Material property charts are two-dimensional plots that use pairs of material properties as the variables. The idea of seeking clusters in two dimensions is to plot the two variables as if they were x, y coordinates: Material 1 appears as the point x = X11, y = Y11 [9]. Figure 2 shows a cluster diagram using the values of two technical attributes, specific gravity and Young's modulus, for metals selected with respect to steel.
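The steps of the Fuzzy C-Means algorithm can be sketched in Python with NumPy. This is a minimal sketch under stated assumptions, not the MATLAB code used in the study: it assumes m = 2, picks the initial centers as c distinct, randomly chosen data points, and stops when the objective J of equation (3) improves by less than 1e-5 between consecutive iterations (the criterion reported for the experiment) or after 100 iterations. The names `fcm` and `_memberships` and the toy two-attribute data are hypothetical; the two columns merely stand in for a pair of properties such as specific gravity and Young's modulus.

```python
import numpy as np


def _memberships(data, centers, m):
    """Equation (1): u_ij = 1 / sum_k (d_ij / d_ik)^(2/(m-1))."""
    d = np.linalg.norm(data[:, None, :] - centers[None, :, :], axis=2)  # (n, c)
    d = np.fmax(d, np.finfo(float).eps)  # guard against a point sitting on a center
    ratio = (d[:, :, None] / d[:, None, :]) ** (2.0 / (m - 1.0))  # (n, c, c)
    return 1.0 / ratio.sum(axis=2)


def fcm(data, n_clusters, m=2.0, max_iter=100, min_improvement=1e-5, seed=0):
    """Sketch of Fuzzy C-Means; returns (centers, U, obj_fcn)."""
    data = np.asarray(data, dtype=float)
    rng = np.random.default_rng(seed)
    # Step 1: use c distinct, randomly chosen data points as initial centers.
    centers = data[rng.choice(len(data), size=n_clusters, replace=False)]
    obj_fcn = []
    for _ in range(max_iter):
        U = _memberships(data, centers, m)                # step 2, equation (1)
        W = U ** m
        centers = (W.T @ data) / W.sum(axis=0)[:, None]   # step 3, equation (2)
        sq_dist = ((data[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        obj_fcn.append(float((W * sq_dist).sum()))        # equation (3)
        # Stop once J improves by less than min_improvement.
        if len(obj_fcn) > 1 and abs(obj_fcn[-2] - obj_fcn[-1]) < min_improvement:
            break
    # Return memberships that match the final centers.
    return centers, _memberships(data, centers, m), obj_fcn


# Hypothetical 2-D points forming two well-separated groups.
data = np.array([[1.0, 1.0], [1.2, 0.9], [0.8, 1.1],
                 [5.0, 5.0], [5.2, 4.9], [4.9, 5.1]])
centers, U, obj = fcm(data, n_clusters=2)
labels = U.argmax(axis=1)  # crisp label = cluster with the highest membership
print(labels)
print(np.round(centers, 2))
```

Taking each point's highest membership gives crisp groups for a chart like Figure 2, while the full matrix U keeps the graded memberships that distinguish FCM from hard clustering. Because each membership and center update is optimal given the other, the recorded values of J do not increase from one iteration to the next.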
Figure 2: Cluster diagram
The fuzzy clustering algorithm outputs the final cluster centers and the value of the objective function at each iteration. The clustering process stops when the improvement in the objective function between two consecutive iterations is less than the specified minimum improvement, which is 1e-5, with an accuracy of 0.99.
The result obtained by applying Fuzzy C-Means clustering with 2 cluster centers using MATLAB is shown in Figure 3.
Figure 3: Results
Here we have taken two properties, specific gravity and Young's modulus, and clustered the metals into two groups. Based on the clusters, it is possible to select metals that have similar values for the selected properties. The same analysis can be repeated with other properties to obtain clusters based on those properties.
5. CONCLUSION & FUTURE WORK
Fuzzy C-Means was used to classify engineering materials in support of better business decisions. It helps identify which category each engineering material belongs to by using numerical properties to cluster the materials data set based on similarity and performance. The analysis can be extended by repeating it, first with different properties and then with different materials. This exploratory analysis suggests how a designer might use such an analysis to find materials that are similar to each other. The same module can be used to cluster and classify other engineering materials to support business decisions.
References
[1] E. F. Begley, "National Institute of Standards and Technology Report", USA, Jan. 2003.
[2] Aparna S. Varde, Makiko Takahashi, Elke A. Rundensteiner, Matthew O. Ward, Mohammed Maniruzzaman and Richard D. Sisson, "Apriori Algorithm and Game-of-Life for Predictive Analysis in Materials Science".
[3] Rajan, "Informatics and Integrated Computational Materials Engineering: Part II", JOM, Vol. 61, p. 47, 2009.
[4] Q. Y. Wei, X. D. Peng, X. G. Liu and W. D. Xie, "Materials informatics and study on its further development", Chinese Science Bulletin, Vol. 51, pp. 498-504, 2006.
[5] Doreswamy and Hemanth K. S., "Hybrid Data Mining Technique for Knowledge Discovery from Engineering Materials Data Sets", International Journal of Database Management Systems, Vol. 3, No. 1, February 2011.
[6] Toyohiro Chikyow, "Trends in Materials Informatics in Research on Inorganic Materials", Quarterly Review No. 20, July 2006.
[7] R. L. King, O. Abuomar, H. Rhee, A. Konstantinidis, N. Pavlidou and M. Petrou, "On materials informatics and pattern formation in materials", ENOC 2011, 24-29 July 2011.
[8] P. Bhargavi and S. Jyothi, "Soil Classification Using Data Mining Techniques: A Comparative Study", International Journal of Engineering Trends and Technology, July-August 2011.
[9] K. W. Johnson, P. M. Langdon and M. F. Ashby, "Grouping materials and processes for the designer: an application of cluster analysis", Elsevier Science Ltd, 2002.
[10] Mohanad Alata, Mohammad Molhim and Abdullah Ramini, "Optimizing of Fuzzy C-Means Clustering Algorithm Using GA", World Academy of Science, Engineering and Technology, 2008.
[11] http://www.engineersedge.com/properties_of_metals.htm
AUTHORS
Sarakutty T.K. received her MCA degree from Bharathiar University and her M.Phil. in Computer Science from M S University. She works in the Department of Computer Science and Applications at Dayananda Sagar College, Bangalore, India. She has 15 years of teaching experience in computer science and applications, and her research areas include Data Mining, Predictive Analytics and Algorithms.
Dr. M. Hanumanthappa is currently a faculty member and Chairman of the Dept. of Computer Science and Applications, Bangalore University, Bangalore. He has over 16 years of postgraduate teaching and industry experience. His areas of interest mainly include Data Mining, Information Retrieval and Programming Languages. He has also conducted a number of training programmes and workshops for computer science students. He is the Principal Investigator of a UGC Major Research Project and has published nearly 50 research papers in national and international journals and conferences. He is currently guiding Ph.D. students in Computer Science at Bangalore University. He is also a member of the Board of Studies and the Board of Examiners for various universities in Karnataka.