International Journal of Advances in Engineering & Technology, May 2013.
©IJAET
ISSN: 2231-1963
DATA MINING BASED KNOWLEDGE DISCOVERY FOR
QUALITY PREDICTION AND CONTROL OF EXTRUSION BLOW
MOLDING PROCESS
E.V. Ramana¹ and P. Ravinder Reddy²
¹Department of Mechanical Engineering, Shadan College of Engineering & Technology, Hyderabad, India
²Department of Mechanical Engineering, Chaitanya Bharati Institute of Technology, Hyderabad, India
ABSTRACT
In practice, when the quality of products falls below the expected level, the causal relationships between process variables and product defects are investigated. Quality improvement and control activities in product manufacturing include identifying the factors that significantly affect quality, modeling the relationships between input attributes and a target attribute (yield, quality, performance index, etc.), and predicting quality levels for given input attributes. Data mining tools have created new ways of extracting useful knowledge from existing manufacturing process databases. This paper proposes a data-mining-based knowledge discovery approach for an extrusion blow molding process database, using the Microsoft Clustering, Naïve Bayes and Decision Trees algorithms available in SQL Server 2008. Data mining models have been created with these algorithms from the extrusion blow molding process data set of a typical product (Gum Bottle), and they were trained on a database of historical process data. The objective of this study was to apply the discovered process knowledge to set optimum process parameters, resulting in high-quality products, increased output and reduced scrap.
KEYWORDS: Data mining, Clustering, Pattern recognition, Decision Trees, Knowledge discovery
I. INTRODUCTION
New approaches such as data mining are required for quality prediction and control to keep pace with the increasing complexity of manufacturing. Data mining is the non-trivial process of identifying valid, novel, potentially useful and ultimately understandable patterns in data [4]. Domain knowledge is used to guide the search or to evaluate the interestingness of the resulting patterns [5]. Data mining can predict and classify data, and it can also detect relationships between a quality measure (the target attribute) and input attributes (manufacturing process data). The quality measure may have nominal values such as “Accepted” / “Rejected”. Data mining tools are useful in many areas of manufacturing, such as defect analysis, yield improvement, quality monitoring and process control [17][1][15]. Quality monitoring tries to reduce variability by detecting and removing assignable causes, whereas process control reduces variability through process compensation and regulation [9].
Classification is a form of supervised learning in which the class labels of the training samples are given and used as examples to supervise the learning of a classification model. Typical classification algorithms used for data mining tasks include Decision Trees (DT), Artificial Neural Networks (ANN) and Naïve Bayes. Clustering is the process of grouping data into classes or clusters so that objects within a cluster are highly similar to one another but very dissimilar to objects in other clusters [13]. Clustering is also known as unsupervised learning because, unlike in supervised learning (classification), the class label of each data object is not known [21]. Data mining tools can be used to extract knowledge from process data sets. This knowledge can then be used to minimize the number of defective products and to achieve the desired level of process performance and product quality [7]. It should be represented in a form that humans can understand [19][18][16][11]. Data mining techniques such as Artificial Neural Networks (ANN) and Genetic Algorithms (GA) have been used to determine process parameters in injection molding, welding, etc. A trained Radial Basis Network (RBN) combined with the Sequential Quadratic Programming (SQP) method was used to determine an optimal parameter setting for the injection molding process [14]. A genetic algorithm approach was applied to the optimization of process parameters in laser beam welding [8].
Vol. 6, Issue 2, pp. 703-713
Knowledge discovered with the Microsoft (MS) Clustering, Decision Trees and Naïve Bayes algorithms has been used to determine optimal process parameters for the extrusion blow molding process. This paper consists of eight sections. Section 2 describes the extrusion blow molding process and its associated data set. In Section 3, the MS Clustering algorithm is applied to the data set to form clusters. Section 4 presents the use of the MS Naïve Bayes algorithm to predict the class of the data. Section 5 describes the MS Decision Trees algorithm applied to the classification task. The evaluation measures used to assess the prediction accuracy of the models are presented in Section 6. Section 7 gives the conclusions of this paper, and Section 8 outlines the scope of future work.
II. EXTRUSION BLOW MOLDING
In extrusion blow molding, a hot tubular parison is extruded continuously and is cut by the knife once it reaches the desired length [6]. The mold then moves towards the extruder center and captures the parison, and closing the mold halves seals off its open end. Air is blown at the desired pressure into the hot parison to expand it against the walls of the mold. After adequate cooling, the product is ejected from the mold.
2.1. Data Set
For this application, a process data set of 60 records is used for a plastic product (150 ml Gum Bottle) made of HDPE (High Density Polyethylene), which is manufactured by the extrusion blow molding process. The process attribute data were acquired from the data stored in the machine and from manually recorded data. The process attributes considered for building the data mining models are presented in Table 1.
Table 1 Extrusion blow molding process attributes
Barrel Temperatures in Zone 1 to Zone 5 (°C)
Cycle Time (s)
Ejection Time (s)
Ext ID
Extrusion Die Temperature (°C)
Inflation Pressure (bar)
Mold Temperature (°C)
Parison Length (mm)
Parison Shape
Recycled Material
Surface Finish
Total of Inflation and Cooling Time (s)
Trace of Foreign Material
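The mining views reported later (Table-2 and Table-3) present continuous attributes such as extrusion die temperature and parison length as discretized states ("< 43" / ">= 43" and "< 113" / ">= 113"). The sketch below is a minimal illustration that assumes one record per row of the 60-record data set. It defines a record with hypothetical field names for a subset of the Table 1 attributes and maps raw readings onto those two-state splits. The thresholds come from the tables. The paper does not describe how SQL Server chose these bucket boundaries, and the sketch does not attempt to reproduce that choice.

```python
from dataclasses import dataclass


@dataclass
class ProcessRecord:
    """One row of the process data set; field names are hypothetical (subset of Table 1)."""
    extrusion_die_temp_c: float   # Extrusion Die Temperature (°C)
    parison_length_mm: float      # Parison Length (mm)
    mold_temperature: str         # discretized state, e.g. "Normal" or "High"
    recycled_material: str        # "Yes" or "No"
    surface_finish: str           # "Good", "Average" or "Poor"
    product_class: str            # e.g. "Accepted"


def die_temp_state(temp_c: float) -> str:
    """Map a raw die temperature onto the two states shown in Tables 2 and 3."""
    return ">= 43" if temp_c >= 43 else "< 43"


def parison_length_state(length_mm: float) -> str:
    """Map a raw parison length onto the two states shown in Tables 2 and 3."""
    return ">= 113" if length_mm >= 113 else "< 113"


# Hypothetical example record (values invented for illustration only).
record = ProcessRecord(42.0, 110.0, "Normal", "No", "Good", "Accepted")
print(die_temp_state(record.extrusion_die_temp_c), parison_length_state(record.parison_length_mm))
```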
III. CLUSTERING
Clustering comes naturally to humans when only a small set of attributes is involved, but it becomes difficult for the human mind as the number of attributes grows. The MS Clustering algorithm makes it possible to uncover hidden relationships among the large numbers of attributes found in modern data sets. Its most common use is to detect clusters in the data and label the data with the clusters discovered. The algorithm can be used to predict values as well as to provide natural groupings, although clustering is traditionally not used for prediction [12]. In a large analytical project, clustering can serve as a preprocessing step for other algorithms such as decision trees, and it is often the first data mining task used to explore the underlying patterns in the data. Clustering has been performed with the MS Clustering algorithm on the extrusion blow molding process data set to gain insight into the characteristics of each cluster and to focus on a particular set of clusters for further analysis.
3.1 Performing Clustering
In the first run, the MS Clustering algorithm was applied without providing the class label. The cluster diagram generated by the algorithm (Fig-1) shows that Clusters 4, 5 and 6 are isolated from the rest. Cluster 3, which has a population of only one, is not taken into consideration. Clusters 1 and 2 have significant populations, so this first run assumes that one of them represents accepted products. In the second run, the same algorithm was applied to the data set with the class attribute set as predictable. This run predicts the class label of each cluster and identifies the characteristics, and associated values, of the defective and acceptable product clusters. The MS Clustering algorithm parameters are set to the values given below.
Cluster count: 0 (the algorithm uses heuristics to determine the number of clusters to build); Clustering method: scalable EM (default); Maximum attributes: 255 (default); Maximum states: 100 (default); Minimum support: 1 (default); Modeling cardinality: 10 (default); Sample size: 50000 (default); Stopping tolerance: 1 (used to determine when convergence is reached).
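The paper keeps the default clustering method, scalable EM. The sketch below is not Microsoft's implementation. It is plain (non-scalable) expectation-maximization for a one-dimensional Gaussian mixture over a single numeric attribute, run on hypothetical die temperatures. The `tol` argument stops iteration once the log-likelihood gain becomes small. This is only loosely analogous to the Stopping tolerance parameter above, whose exact definition in SQL Server is not reproduced. Unlike the heuristic Cluster count of 0, the number of clusters here is fixed by the number of initial means supplied.

```python
import math


def em_gaussian_mixture(xs, means, variances, weights, tol=1e-6, max_iter=200):
    """Fit a one-dimensional Gaussian mixture to xs with the EM algorithm.

    means, variances and weights are the initial parameters (one entry per cluster).
    Returns the fitted parameters and the per-point cluster responsibilities.
    """
    means, variances, weights = list(means), list(variances), list(weights)
    k = len(means)
    prev_ll = -math.inf
    resp = []
    for _ in range(max_iter):
        # E-step: probability that each point belongs to each cluster.
        resp, ll = [], 0.0
        for x in xs:
            dens = [
                weights[j] * math.exp(-((x - means[j]) ** 2) / (2 * variances[j]))
                / math.sqrt(2 * math.pi * variances[j])
                for j in range(k)
            ]
            total = sum(dens)
            ll += math.log(total)
            resp.append([d / total for d in dens])
        # M-step: re-estimate weights, means and variances from the responsibilities.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / len(xs)
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj
            variances[j] = max(var, 1e-6)  # keep a cluster from collapsing to zero width
        # Stop once the log-likelihood no longer improves by more than tol.
        if ll - prev_ll < tol:
            break
        prev_ll = ll
    return means, variances, weights, resp


# Hypothetical die temperatures (°C) forming two groups; not data from the paper.
temps = [40.1, 40.5, 41.0, 40.8, 45.2, 45.9, 46.1, 45.5]
means, variances, weights, resp = em_gaussian_mixture(temps, [40.0, 46.0], [1.0, 1.0], [0.5, 0.5])
# Label each reading with its most probable cluster.
labels = [max(range(len(means)), key=lambda j: r[j]) for r in resp]
```

Labeling each case with its most probable cluster corresponds to the "label the data with the clusters that are discovered" use described at the start of this section.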
Fig-1 Cluster diagram
Cluster 1 represents the characteristics of accepted products, as assumed in the first run. Cluster 4 represents the characteristics of products rejected due to excess flash on both ends, and its discrimination scores against Cluster 1 are given in Table-2. Extrusion die temperature, parison length and surface finish are the attributes that influence this defect [6][20].
Table-2 Discrimination scores of Cluster 4 and Cluster 1
Variable | Value | Favors Cluster 4 | Favors Cluster 1
Extrusion Die Temperature (°C) | >= 43 | 100.000 |
Parison Length (mm) | >= 113 | 100.000 |
Extrusion Die Temperature (°C) | < 43 | | 100.000
Parison Length (mm) | < 113 | | 100.000
Class | Accepted | | 64.604
Class | Rejected due to excess flash on both ends of component | 64.604 |
Surface Finish | Average | 64.604 |
Surface Finish | Good | | 64.604
Trace of foreign material, high mold temperature and average surface finish are the attributes that favor Cluster 3, which represents components rejected due to discoloration and holes on the surface. Fig-2 shows its discrimination scores against Cluster 1, which represents the accepted components. Fig-3 (Cluster 5 against Cluster 1) shows that high mold temperature, a parison shape with hooking and poor surface finish are the key influencers causing poor neck formation and finish.
Fig-2 Discrimination scores for Cluster 3 and Cluster 1
Fig-3 Discrimination scores for Cluster 5 and Cluster 1
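Table-2 and Figs. 2 and 3 rank attribute states by how strongly they favor one cluster over another. The paper does not give the exact discrimination score computed by the Microsoft cluster viewer. The sketch below therefore substitutes a deliberately simple measure, which is our assumption: the difference between the relative frequencies of a state in the two clusters. The attribute name and rows are hypothetical.

```python
from collections import Counter


def state_frequencies(rows, attribute):
    """Relative frequency of each state of `attribute` within one cluster's rows."""
    counts = Counter(row[attribute] for row in rows)
    return {state: n / len(rows) for state, n in counts.items()}


def contrast(cluster_a, cluster_b, attribute):
    """List (state, favoured cluster, |difference|) for every state of `attribute`.

    A positive frequency difference favours cluster A and a negative one favours
    cluster B. This is a simple stand-in, not the score computed by the Microsoft viewer.
    """
    pa = state_frequencies(cluster_a, attribute)
    pb = state_frequencies(cluster_b, attribute)
    result = []
    for state in sorted(set(pa) | set(pb)):
        diff = pa.get(state, 0.0) - pb.get(state, 0.0)
        favoured = "A" if diff > 0 else "B" if diff < 0 else "neither"
        result.append((state, favoured, abs(diff)))
    # Strongest contrasts first; ties keep alphabetical state order.
    return sorted(result, key=lambda item: -item[2])


# Hypothetical members of two clusters (e.g. flash defects vs. accepted parts).
cluster_4 = [{"die_temp": ">= 43"}, {"die_temp": ">= 43"}]
cluster_1 = [{"die_temp": "< 43"}, {"die_temp": "< 43"}, {"die_temp": ">= 43"}, {"die_temp": "< 43"}]
print(contrast(cluster_4, cluster_1, "die_temp"))
```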
IV. NAÏVE BAYES CLASSIFICATION
Naïve Bayes is one of the simplest classifiers, and it provides a very effective way to explore data. The model is trained in a single pass over the training data, which makes the algorithm suitable for analyzing large data sets with many attributes [12]. It assumes that the effect of an attribute value on a given class is independent of the values of the other attributes. Its ability to explore relationships between attributes can be used to identify the factors that have a high impact on quality in a manufacturing process [15]. The MS Naïve Bayes algorithm identifies the key input attributes (influencers) for a specific target attribute. An MS Naïve Bayes mining model has been created using the same process attributes that were used by the clustering algorithm (Table 1). The Naïve Bayes dependency network view in Fig-4 shows the attributes that have a strong impact in predicting the class after the weaker links are eliminated. Each node represents an attribute. If a node has an outgoing edge, it is predictive of the node at the end of that edge. The MS Naïve Bayes algorithm parameters are set to the values given below.
Maximum input attributes: 255 (default); Maximum output attributes: 255 (default); Maximum states: 100 (default); Minimum dependency probability: 0.1 (the minimum dependency probability between input and output attributes).
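A small categorical Naïve Bayes classifier illustrates the single-pass training and the independence assumption described above. This is a generic textbook sketch, not Microsoft's implementation. It uses Laplace smoothing, which is an assumption since the paper does not say how MS Naïve Bayes handles unseen states, and it omits the Minimum dependency probability pruning. `attribute_characteristics` returns within-class relative frequencies. We assume this is the kind of quantity listed in an attribute characteristics view such as Table-3. All field names and rows are hypothetical.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(rows, target):
    """Single pass over the data, counting classes and (attribute, class) value pairs."""
    class_counts = Counter()
    value_counts = defaultdict(Counter)  # (attribute, class) -> Counter of values
    values_seen = defaultdict(set)       # attribute -> set of observed values
    for row in rows:
        cls = row[target]
        class_counts[cls] += 1
        for attr, val in row.items():
            if attr != target:
                value_counts[(attr, cls)][val] += 1
                values_seen[attr].add(val)
    return class_counts, value_counts, values_seen


def attribute_characteristics(model, attr, cls):
    """Unsmoothed P(attr = value | class = cls) for each value seen with that class."""
    class_counts, value_counts, _ = model
    return {val: n / class_counts[cls] for val, n in value_counts[(attr, cls)].items()}


def predict(model, inputs, alpha=1.0):
    """Most probable class for `inputs` (input attributes only), assuming independence."""
    class_counts, value_counts, values_seen = model
    total = sum(class_counts.values())
    best, best_score = None, -math.inf
    for cls, n_cls in class_counts.items():
        score = math.log(n_cls / total)  # log prior
        for attr, val in inputs.items():
            n_states = len(values_seen[attr])
            count = value_counts[(attr, cls)][val]
            # Laplace-smoothed log P(attr = val | class = cls)
            score += math.log((count + alpha) / (n_cls + alpha * n_states))
        if score > best_score:
            best, best_score = cls, score
    return best


# Hypothetical training rows using attribute states that appear in the paper.
rows = [
    {"mold_temp": "Normal", "recycled": "No", "class": "Accepted"},
    {"mold_temp": "Normal", "recycled": "Yes", "class": "Accepted"},
    {"mold_temp": "Normal", "recycled": "No", "class": "Accepted"},
    {"mold_temp": "High", "recycled": "Yes", "class": "Rejected"},
    {"mold_temp": "High", "recycled": "Yes", "class": "Rejected"},
]
model = train_naive_bayes(rows, "class")
print(predict(model, {"mold_temp": "High", "recycled": "Yes"}))
```

Working in log space avoids numerical underflow when many attribute probabilities are multiplied together.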
Fig-4 Dependency network view
The attribute characteristics view of the MS Naïve Bayes viewer, shown in Table-3, gives the attributes of accepted products together with their associated values or ranges of values. Fig-5 shows that high mold temperature, poor surface finish and a hooked parison shape favor the rejection of components due to poor neck formation & finish.
Table-3 Attribute characteristics view (accepted products)
Attribute | Value | Probability
Inflation Pressure (bar) | 5-6 | 100.000%
Extrusion Die Temperature (°C) | < 43 | 100.000%
Mold Temperature | Normal | 100.000%
Material Used | HDPE | 100.000%
Ejection Time (s) | 1 | 100.000%
Barrel Zone-4 Temperature (°C) | 175-176 | 100.000%
Barrel Zone-3 Temperature (°C) | 170-172 | 100.000%
Cycle Time (s) | 11 | 100.000%
Barrel Zone-5 Temperature (°C) | 180-181 | 100.000%
Surface Finish | Good | 100.000%
Trace of Foreign Material | No | 100.000%
Barrel Zone-2 Temperature (°C) | 165-166 | 100.000%
Barrel Zone-1 Temperature (°C) | 160-161 | 100.000%
Total of Inflation and Cooling Time (s) | 10 | 100.000%
Parison Shape | Straight | 100.000%
Parison Length (mm) | < 113 | 100.000%
Recycled Material | No | 57.500%
Recycled Material | Yes | 42.500%
Fig-5 Discrimination scores for accepted components and rejected components due to poor neck formation & finish
Fig-6 shows that a long parison and high mold temperature result in rejections due to excess flash on both ends of the component. The discrimination scores in Table-4 show that recycled material containing traces of foreign matter, together with high mold temperature, can cause components to be rejected due to discoloration and holes on the surface.
Fig-6 Discrimination scores for accepted components and rejected components due to excess flash on both ends of component
Table-4 Discrimination scores favoring accepted components and components rejected due to discoloration and holes on surface
Attribute | Value | Favors Accepted | Favors Rejected due to discoloration and holes on surface
Trace of Foreign Material | No | 100.000 |
Trace of Foreign Material | Yes | | 100.000
Surface Finish | Average | | 100.000
Surface Finish | Good | 100.000 |
Mold Temperature | High | | 100.000
Mold Temperature | Normal | 100.000 |
Recycled Material | Yes | | 13.919
Recycled Material | No | 13.919 |
V. DECISION TREES CLASSIFICATION
The decision tree is one of the most popular data mining techniques because of its high accuracy and easily understandable patterns. Decision tree algorithms extract a decision tree from the data, and the tree contains explicit knowledge that users can easily interpret [12][2]. ID3, C4.5, CART, T2 and CAL5 are some of the decision tree algorithms used in industry, along with the related rule-induction algorithm CN2 [3][10]. The MS Decision Trees algorithm can be used for classification, regression and association tasks. In this paper, it has been applied to the process data with the attributes given in Table 1 for the classification task. The dependency network generated by the algorithm (Fig-7) shows that surface finish and trace of foreign material are the only attributes that predict the class label. The surface finish of the product (good/average/poor) is an outcome of varying the other controllable process attributes, such as barrel temperatures, mold temperature and extrusion die temperature. For this reason it has been excluded from the decision trees model regardless of its prediction accuracy. The MS Decision Trees algorithm parameters are set to the values given below.
Complexity penalty: 0.1; Maximum input attributes: 255; Maximum output attributes: 255; Minimum support: 1 (the minimum number of cases a leaf node must contain); Score method: 3 (Bayesian with K2 prior, used to calculate the split score); Split method: 2 (split the tree completely on each attribute).
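Score method 3 (Bayesian with K2 prior) scores candidate splits in a Bayesian way. The sketch below works under stated assumptions. It computes the textbook Cooper-Herskovits K2 metric for each candidate split from its per-branch class counts and picks the highest-scoring split. The metric uses a uniform prior on the class distribution within each branch: score = Π_j [(r−1)! / (N_j + r − 1)!] Π_k N_jk!. Here r is the number of classes, N_jk is the number of cases of class k in branch j, and N_j is the total number of cases in branch j. The sketch does not reproduce how SQL Server combines this score with the complexity penalty and minimum support. The attribute names and counts are hypothetical.

```python
import math


def log_k2_score(branch_counts):
    """Log Cooper-Herskovits (K2) score of a candidate split.

    branch_counts holds one list per branch; each list gives the number of cases
    of every class in that branch (same class order, r classes, r >= 2).
    """
    r = len(branch_counts[0])
    score = 0.0
    for counts in branch_counts:
        n_branch = sum(counts)
        # log[(r - 1)! / (N_j + r - 1)!] + sum_k log(N_jk!), using lgamma(n + 1) = log(n!)
        score += math.lgamma(r) - math.lgamma(n_branch + r)
        score += sum(math.lgamma(n + 1) for n in counts)
    return score


def best_split(candidates):
    """Return the candidate attribute whose split has the highest log K2 score."""
    return max(candidates, key=lambda name: log_k2_score(candidates[name]))


# Hypothetical class counts [Accepted, Rejected] in each branch of two candidate splits.
candidates = {
    "mold_temperature": [[5, 0], [0, 5]],   # branches separate the classes cleanly
    "recycled_material": [[3, 2], [2, 3]],  # branches are mixed
}
print(best_split(candidates))
```

Working with log-factorials through `math.lgamma` keeps the products of large factorials from overflowing.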
Fig-7 Dependency network view of Decision Trees model
Fig-8 shows the decision tree, with its mining legend, as provided by the MS Tree Viewer. The following decision rules can be derived from the decision tree.
Fig-8 Decision Tree
Rule-1: If Mold Temperature = “Normal” and Extrusion Die Temperature < 43 Then Class = “Accepted” with Probability = 89.13% and Support = 40 cases
Rule-2: If Mold Temperature = “High” and Recycled Material = “Yes” Then Class = “Rejected due to discoloration and holes on surface” with Probability = 66.17% and Support = 5 cases
Rule-3: If Mold Temperature = “Normal” and Extrusion Die Temperature >= 43 Then Class = “Rejected due to excess flash on both ends of component” with Probability = 62.15% and Support = 4 cases
Rule-4: If Mold Temperature = “High” and Recycled Material = “No” Then Class = “Rejected due to poor neck formation & finish” with Probability = 57.14% and Support = 3 cases
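Each rule is a conjunction of attribute tests, so the four rules translate directly into nested conditionals. The sketch below encodes them as stated and returns the class together with the probability and support reported for that rule. The function name and argument types are hypothetical. For combinations that the four listed rules do not cover, such as a mold temperature state other than "Normal" or "High", it returns None rather than guessing. It is a readable restatement of the rules, not an export of the trained tree.

```python
def apply_rules(mold_temperature, extrusion_die_temp_c, recycled_material):
    """Apply Rule-1 to Rule-4; return (class, probability, support) or None if no rule fires."""
    if mold_temperature == "Normal":
        if extrusion_die_temp_c < 43:  # Rule-1
            return ("Accepted", 0.8913, 40)
        # Rule-3: Extrusion Die Temperature >= 43
        return ("Rejected due to excess flash on both ends of component", 0.6215, 4)
    if mold_temperature == "High":
        if recycled_material == "Yes":  # Rule-2
            return ("Rejected due to discoloration and holes on surface", 0.6617, 5)
        if recycled_material == "No":  # Rule-4
            return ("Rejected due to poor neck formation & finish", 0.5714, 3)
    return None  # none of the four rules covers this combination


print(apply_rules("Normal", 42.0, "No"))
```

Rule-1 and Rule-3 do not test recycled material, so the recycled-material argument is ignored on the "Normal" branch.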
VI. EVALUATION MEASURES
The quality of the knowledge extracted from the process data and the prediction accuracy of the models can be assessed with accuracy charts, a classification matrix and cross-validation [3]. The standard lift chart contains one line for each selected model plus two extra lines: a random line and an ideal line. Fig-9 shows the lift chart generated with the class attribute as the predictable column and “Accepted” as the prediction value. The mining models based on the Naïve Bayes, Decision Trees and Clustering algorithms reach 60% of the target using only 50% of the data, and their respective prediction probabilities are shown in the mining legend. The prediction accuracy of the models has been measured with the lift chart on a test data set of 10% hold-out cases.
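For each share of the test population, taken in order of predicted probability, a standard lift chart of this kind plots the share of target cases (here Class = “Accepted”) that has been captured. The random line is the diagonal, and the ideal line rises as steeply as possible. The reading of 60% of the target at 50% of the data is one point on such a curve. The sketch below is minimal and assumes that the models' predicted probabilities are available as plain numbers. The function names and data are hypothetical, and ties in score keep their original order because Python's sort is stable.

```python
def cumulative_gain(scores, is_target, fraction):
    """Share of all target cases found in the top `fraction` of cases ranked by score."""
    ranked = sorted(zip(scores, is_target), key=lambda pair: -pair[0])
    cutoff = round(fraction * len(ranked))
    captured = sum(1 for _, hit in ranked[:cutoff] if hit)
    total = sum(1 for hit in is_target if hit)
    return captured / total if total else 0.0


def ideal_gain(is_target, fraction):
    """Ideal line: every selected case is a target case until all targets are found."""
    total = sum(1 for hit in is_target if hit)
    return min(1.0, round(fraction * len(is_target)) / total) if total else 0.0


# Hypothetical predicted probabilities of "Accepted" and the true outcomes.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
actual = [True, True, False, True, False, False, True, False, False, False]
print(cumulative_gain(scores, actual, 0.5), ideal_gain(actual, 0.5))  # prints 0.75 1.0
```

The random line at a population share f is simply f, so the lift at that point is the cumulative gain divided by f.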
Fig-9 Standard lift chart
Ten-fold cross-validation was performed on the three models over the entire data set (to make use of all available cases for training), with target attribute “Class” and target state “Accepted”. The data are split into 10 folds (partitions). For each fold, a model is built on the data from all the other folds and validated against the data in the current fold. The cross-validation results are presented in Fig-10. The training data are adequate for all the models, since the results for each partition are similar, with an average value of 4.3 and a standard deviation of 0.45.
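The fold procedure described above can be sketched as a generic k-fold partitioner. The sketch makes one assumption: cases are assigned to folds by a seeded random shuffle, because the paper does not describe how SQL Server assigns cases to partitions. For each fold, a model would be trained on the train indices and evaluated on the test indices. With 60 records and 10 folds, each partition holds 6 cases.

```python
import random


def k_fold_splits(n_cases, k=10, seed=0):
    """Yield (train_indices, test_indices) for k-fold cross-validation over n_cases."""
    indices = list(range(n_cases))
    random.Random(seed).shuffle(indices)        # random but reproducible assignment
    folds = [indices[i::k] for i in range(k)]   # k roughly equal partitions
    for i in range(k):
        # Train on every fold except the current one, test on the current one.
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, folds[i]


# 60 cases, as in the Gum Bottle data set, split into 10 folds of 6 cases each.
splits = list(k_fold_splits(60, k=10))
```

Every case appears in exactly one test fold, which is what lets cross-validation use all available cases for both training and validation.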
Fig-10 Cross validation results
VII. CONCLUSION
In this article, the Microsoft Naïve Bayes, Decision Trees and Clustering algorithms available in SQL Server 2008 have been applied to extract useful and expressive knowledge from the extrusion blow molding process data set of a typical product (150 ml Gum Bottle). The prediction accuracy of the models has been evaluated on the test cases with the standard lift chart and ten-fold cross-validation. In the standard lift chart evaluation, the Naïve Bayes and Clustering models were more accurate than Decision Trees in predicting the process parameter values that result in acceptable products. The knowledge gained has supported proactive decisions for quickly setting process parameters and their ranges of values. This has increased the output of high-quality products and significantly reduced scrap.
VIII. FUTURE WORK
One important extension of this work will be to build an expert system for the extrusion blow molding process of a specific company from the knowledge discovered by the data mining models. Data mining algorithms such as Artificial Neural Networks (ANN), Genetic Algorithms (GA), logistic regression and hybrid algorithms should be evaluated in order to evolve more effective models. The data mining models can be made more reliable by building them on a process database accumulated over an adequate period, with a large number of cases covering all possible materials, products, equipment and defects.
REFERENCES
[1] A.K. Choudhary, J.A. Harding, M.K. Tiwari, “Data mining in manufacturing: a review based on kind of knowledge”, Journal of Intelligent Manufacturing, 2008.
[2] A. Kusiak, “Data mining: manufacturing and service applications”, International Journal of Production Research, Vol. 44, 2006, pp. 4175-4191.
[3] Andrew Kusiak, “Selection of Invariant Objects With a Data-Mining Approach”, IEEE Transactions on Electronics Packaging Manufacturing, Vol. 28, No. 2, April 2005, pp. 187-196.
[4] Arun K. Pujari, “Data Mining Techniques”, Universities Press, 2001, pp. 43-44.
[5] B.N. Lakshmi, G.H. Raghunandhan, “A Conceptual Overview of Data Mining”, Proceedings of the National Conference on Innovations in Emerging Technology-2011, Erode, Tamilnadu, India, 17-18 February 2011, pp. 27-32.
[6] Terry L. Richardson, “Industrial Plastics: Theory and Applications”, South-Western Publishing Co., pp. 310-316.
[7] E.V. Ramana, P. Ravinder Reddy, “Integration of Control Charts and Data Mining for Process Control and Quality Improvement”, International Journal of Advances in Engineering & Technology, Vol. 2, Issue 1, 2012, pp. 640-648.
[8] G. Harinath Goud, E. Venugopal Goud, “A genetic algorithm approach to the optimization of process parameters in laser beam welding”, International Journal of Mechanical Engineering and Technology, Vol. 3, Issue 3, 2012, pp. 459-470.
[9] Gulser Koksal, Inci Batmaz, Murat Caner Testik, “A review of data mining applications for quality improvement in manufacturing industry”, Expert Systems with Applications, Vol. 38, 2011, pp. 13448-13467.
[10] Hovhannes Sadoyan, Armen Zakarian, Pravansu Mohanty, “Data mining algorithm for manufacturing process control”, International Journal of Advanced Manufacturing Technology, Vol. 28, 2006, pp. 342-350.
[11] J.A. Harding, M. Shahbaz, Srinivas, A. Kusiak, “Data Mining in Manufacturing: A Review”, Journal of Manufacturing Science and Engineering, Vol. 128, 2006, pp. 969-976.
[12] Jamie Maclennan, ZhaoHui Tang, Bogdan Crivat, “Data Mining with Microsoft SQL Server 2008”, 2009, pp. 215-317.
[13] Jiawei Han, Micheline Kamber, “Data Mining: Concepts and Techniques”, Morgan Kaufmann Publishers, 2008, pp. 383-385.
[14] Jie-Ren Shie, “Optimization of injection molding process for contour distortions of polypropylene composite components by a radial basis neural network”, International Journal of Advanced Manufacturing Technology, Vol. 36, 2008, pp. 1091-1103.
[15] Kescheng Wang, “Applying data mining to manufacturing: the nature and applications”, Journal of Intelligent Manufacturing, Vol. 18, 2007, pp. 487-495.
[16] Lior Rokach, Oded Maimon, “Data mining for improving the quality of manufacturing: a feature set decomposition approach”, Journal of Intelligent Manufacturing, Vol. 17, 2006, pp. 285-299.
[17] Lior Rokach, Roni Romano, Oded Maimon, “Mining manufacturing databases to discover the effect of operation sequence on the product quality”, Journal of Intelligent Manufacturing, Vol. 19, 2008, pp. 313-325.
[18] M.A. Karim, G. Russ, A. Islam, “Detection of faulty products using data mining”, Proceedings of the International Workshop on Data Mining and Artificial Intelligence (DMAI'08), Khulna, Bangladesh, 24 December 2008, pp. 101-107.
[19] Mark Ploczynski, Andrzej Kochanski, “Knowledge Discovery and Analysis in Manufacturing”, Quality Engineering, Vol. 22, pp. 169-181.
[20] Muralisrinivasan Natamai Subramanian, “The Basics of Troubleshooting in Plastics Processing”, Wiley, pp. 135-148.
[21] Vikram Pudi, P. Radha Krishna, “Data Mining”, Oxford University Press, 2010, pp. 122-128.
AUTHORS
E. V. Ramana is Professor and Head of the Department of Mechanical Engineering, Shadan College of Engineering & Technology, Hyderabad. He received his first M.Tech degree, in Energy Systems, in 1992 and his second M.Tech degree, in CAD/CAM, in 1997, both from JNT University, Hyderabad. He has 16 years of teaching experience and 6 years of experience in industry and research. He is currently pursuing a PhD at JNTUH, Hyderabad, India. He has considerable experience in developing application software for mechanical engineering. His active area of research is data mining in manufacturing.
P. Ravinder Reddy was born in 1965 and is Professor and Head of Mechanical Engineering, Chaitanya Bharathi Institute of Technology, Hyderabad. He has 22 years of teaching, industrial and research experience and has taught postgraduate and undergraduate engineering subjects. He has published over 132 research papers in international and national journals and conferences. He has guided 10 Ph.D.s, and 4 further Ph.D. scholars have submitted their theses under his guidance. He has also guided over 250 M.E./M.Tech projects and carried out research and consultancy worth Rs. 1.9 crore sponsored by BHEL, AICTE, UGC, NSTL and other industries. He has organized 23 refresher courses, STTPs and workshops and one international conference, and has delivered 92 invited, keynote and special lectures. His awards include:
- the “UGC Fellowship” award from UGC (1999);
- the Raja Rambapu Patil National Award for Promising Engineering Teacher from ISTE for the year 2000, in recognition of his outstanding contribution in the area of Engineering and Technology;
- an Excellence “A” Grade from the AICTE monitoring committee for the MODROB project sponsored by AICTE in 2002;
- the “Engineer of the Year Award-2004” for his outstanding contribution in academics and research, from the Govt. of Andhra Pradesh and the Institution of Engineers (India), AP State Centre, on 15 September 2004, on the occasion of the 37th Engineer’s Day;
- the Best Technical Paper Award (December 2008) from the National Governing Council of the Indian Society for Non Destructive Testing.