Industrial Diagnosis by Hyper Space Data Mining
Presented at the AAAI 99 Spring Symposium on Equipment Diagnosis
Stanford University, March 23, 1999
Dr. Dongping (Daniel) Zhu
Zaptron Systems, Inc., Mountain View, CA 94043
Tel: 650-966-8700, Fax: 650-966-8780
E-mail: [email protected]
http://www.zaptron.com
Zaptron, 1999

OUTLINE
• Diagnosis overview: applications & technologies
• Hyperspace data mining
• Diagnostic examples
    • product quality control (steel making)
    • resolve bottleneck (gasoline production)
    • improve yield (chemical plant)
• Conclusions
• MasterMiner™ demo

Diagnosis & Trouble-Shooting
• Cost of support to products/services
• Customer satisfaction
Key Issues
• how to best approach the same problem next time
• how to use history information - data mining
• how to update KB
Solutions
• on-line help
• web-based, remote diagnostics
• knowledge management tools
• data mining (history data are available)

A Web-based Diagnostic System
[Diagram labels: Call Centers; Service Teams; Support Teams; Data Collecting Mechanisms; Standardization; Data Management; D Mining; KD(D+K); K Updating; Factor Analysis; KB manage; Product Delivery Mechanisms; Training tools; Web-based diagnosis; On-line Help SW; Remote Repairs]

Rule-based Diagnostic Process
[Diagram labels: History Database; Fix; Fault; Fault Physics; Primary Cases; Analysis; Diagnose; Rule Base; Diagnostic Matrix; Cause; Self Learning; Query; New data & Cases; Update Database]

Expert System Architecture
[Diagram labels: Interviewer (fi, hj); K Collector (aijl, bikl); Analyzer, Visualizer; Web Users; Data Base {a, b}; Web GUI; KB Builder (Mijk); KB; Problem Solver (Search Engine) {Mij}; Self Learner rijk]

Evolution of Diagnostic Techniques
• Equipment and Processes
• Sensors
• Data
• Databases
• Data Models
• Data Patterns (behavior in space)
• Data Fusion, sensor fusion
• Data Mining
• Data ……

Data Mining: Techniques
• Correlation/association analysis
• Factor analysis
• Trend prediction & forecasting
• Neural networks
• Genetic algorithms
• Fuzzy logic, expert systems
• Uncertainty reasoning (DS, rough sets)
• Bayesian networks
• Hyper space data mining
    • find data pattern first
    • no model assumption
    • provide solutions to failure isolation/recognition

Hyper Space Data Mining
• Introduction
• Diagnosis - An optimization problem
• A Hyper Space Technology
• Application Examples
• SW: MasterMiner™

A General Issue
• For any system - find a model to describe the relationships
Inputs: operating data record; in situ sensor report; raw materials composition; design/operating process parameters
Relationships: nonlinear; high noise; multi-variant; (no model)
Outputs: failure & fault; bottleneck; energy use; cost/risk; quality; yield/returns; reliability; productivity

A Catch-22 Problem
Data Pattern <--?--> Data Model
Questions:
• what type of data to collect
• which data to use in modeling
Solution: Hyperspace data mining

To Start - A Real Case
Aluminum Production Problem
Target: to optimize the leaching rate of Al2O3
Factors:
• a1 - Fe/Al in the ore
• a2 - sodium, Na/(Al2O3+Fe2O3)
• a3 - leaching temperature
• a4 - lime, (CaO)/(SiO2-TiO2)
2 Solutions:
• Principal Component Analysis (PCA) by SAS JMP or RS/1 - poor result
• Hyperspace data mining by Zaptron MasterMiner™ - good result

Can you see the pattern?
If not, do data mining to separate into subspaces

A Real Case - PCA Result: no separation
[Figure]

A Real Case - MasterMiner: good separation
[Figure]

MasterMiner 2nd step: complete separation
[Figure]

A Real Case - MasterMiner: build a model
[Figure]

Steps in Data Mining
• History Data
• Separability Test
• Pretreatment: local view, delete outliers
• Data Mining: linearity, topological type, correlation, association, best matching point, NN points
• Feature Selection: feature reduction (entropy, voting)
• Modeling (PH, MREC, ANN, GA): inequality, equations, PLS, sensitivity, advisory
Results:
• Map description of cross-sections of normal op zone & failure zones
• Extrapolation to optimal zone for max yield or new materials
• Equations as criteria for optimal control
• State diagnosis by using current operation data
• Propose an optimal operating condition

Clustering - Data Separation
• PCA: projection onto the directions of maximum variance
• Fisher: line projection with max distance between clusters
• MREC: projective geometry, better than either
[Diagram: Data Base -> Data Mining -> Data Patterns: One-sided (voting); Inclusive (entropy); Exclusive; Sandwich]

Software Architecture
[Diagram labels: GUI; DataBase; Pattern Recognition; KnowBase; Artificial Neural Nets; Genetic Algorithm]

MasterMiner™ Functions
[Figure]

MasterMiner™ Tools
• Data loading, editing, sorting, calculation
• Preprocessing: statistics, feature selection, folding
• Factor analysis: target-factor analysis, factor-factor analysis
• Projections: Fisher, LMAP, PCA, PLS, MREC
• Modeling: envelope, auto-box, Sphere, KL, ANN (train, estimation, sensitivity)
• Extrapolation: PLS vector (linear), Simplex, appending

Virtual Mining Tools for Convex and Concave Space
Virtual mining in hyper space
• Hidden projection - tunnel model
• Envelope - generate a convex polyhedron
• Use "auto-box" for concave polyhedrons of samples
• Interchange of data classes
• Folding transform (to change data pattern in space)
Virtual mining of data samples
• divide into multiple segments
• convert concave polyhedron into convex ones
• build the model for each subspace
• separability went from 31% to 96% in one case

Virtual Mining Methods
(a) Tunnel model to separate data samples in hyper space
(b) The Envelope-Boxing method
(c) Generate convex polyhedrons from a concave one

Iterative Feature Selection/Reduction
• Data pattern classified into 2 topological classes
    • "one-sided class"
    • "inclusive class"
• Hidden projections applied
• Projected factors are orthogonal in hyper space
• Feature selection method (highly effective):
    • Entropy method is used for inclusive pattern
    • Voting method is used for one-sided pattern
• Reduce features to reduce noise & complexity, e.g., good result based on 5 features out of 500
• Reduced feature set needs to pass the separation test

MREC - Map Recognition Method
• MREC: projection in the best direction, complete separation in 2 steps
• PCA: no separation

We have Improved the Quality of
• alloy steels
• carbon fiber reinforced, resin-based composite materials
• Bi2O3-containing high-Tc superconductors
• rare earth containing phosphor
• electrode materials of Ni/H batteries
• VPTC ceramic semi-conductor
• high temperature, SiC-based structural ceramics
• high-polymers: PVC, synthetic fiber & rubber, polyethylene, ...
• high energy materials
• semi-conductor devices
• MOCVD method of III-V compound film

We have applied MasterMiner™ to Industrial Optimization & Diagnosis
Petrochemical industry
• distillation
• hydro-cracking
• vapor recovery
• platinum reforming
• delayed coking
• de-waxing
• vinyl acetate
• polypropylene
• jet fuel (Union Oil recipe, yield 87% -> 94%, +6,000 ton/yr)
• increase life of catalyst in polyvinyl plant (catalyst cost $1.2MM)
• etc.
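
The contrast drawn in the clustering and MREC slides, where PCA gives no separation while a class-aware projection does, can be illustrated with a small sketch. It uses synthetic two-class data and compares the first principal component with Fisher's discriminant direction. This is not the MREC method (the slides do not give its details); the data, the function names, and the midpoint-threshold "separability" score are all assumptions made for illustration.

```python
import numpy as np

def pca_direction(X):
    """First principal component: the direction of maximum variance (ignores labels)."""
    Xc = X - X.mean(axis=0)
    # eigh returns eigenvalues in ascending order, so the last column is the top component
    _, vecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
    return vecs[:, -1]

def fisher_direction(X, y):
    """Two-class Fisher direction, proportional to Sw^-1 (m1 - m0)."""
    X0, X1 = X[y == 0], X[y == 1]
    # Within-class scatter: sum of the two per-class scatter matrices
    Sw = (np.cov(X0, rowvar=False) * (len(X0) - 1)
          + np.cov(X1, rowvar=False) * (len(X1) - 1))
    w = np.linalg.solve(Sw, X1.mean(axis=0) - X0.mean(axis=0))
    return w / np.linalg.norm(w)

def separability(X, y, w):
    """Hypothetical score: fraction of samples split correctly by a midpoint threshold."""
    z = X @ w
    m0, m1 = z[y == 0].mean(), z[y == 1].mean()
    thr = 0.5 * (m0 + m1)
    pred = (z > thr).astype(int) if m1 > m0 else (z < thr).astype(int)
    return float((pred == y).mean())

# Synthetic data (assumed): large shared variance along x, small class offset along y
rng = np.random.default_rng(0)
n = 200
X0 = rng.normal([0.0, 0.0], [5.0, 0.3], size=(n, 2))
X1 = rng.normal([0.0, 2.0], [5.0, 0.3], size=(n, 2))
X = np.vstack([X0, X1])
y = np.repeat([0, 1], n)

pca_sep = separability(X, y, pca_direction(X))
fisher_sep = separability(X, y, fisher_direction(X, y))
print("PCA separability:   ", pca_sep)
print("Fisher separability:", fisher_sep)
```

On data like this the maximum-variance direction carries almost no class information, which is one way a PCA map can show no separation even when the classes are separable along another direction.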

We have applied MasterMiner™ to Industrial Optimization & Diagnosis
Metallurgical Industry
• blast furnace
• casting
• alloy steels quality improving (60% -> 80%)
• energy saving in aluminum production
Automobile Industry
• electro-plating
• heat treatment
Chemical Industry
• PVC, polyformaldehyde
• butadiene rubber

Application Areas
[Diagram: Data Mining applied to Process Optimization, Equipment/Process Diagnosis, and Materials Design, in the Petrochemical, Metallurgical, and Semiconductor Industries]
GOAL: Optimal control of complex processes involving
• Heat transfer
• Mass transfer
• Fluid flow
• Chemical reactions

Pattern Recognition Methods
• Linear Regression (LS) - "forced fitting"
    LS fitting coefficients as model parameters, the "best wish"
• PCA - principal component analysis
    projection in "best" direction, select two directions, LS
• LMAP - linear mapping
• NN - neural nets
    blind learning, over-fitting, forced fitting
    origin at cluster center, covered with an ellipsoid, PCA
• MREC - map recognition (nonlinear)
    polyhedrons, hidden projections, separation, back-mapping
• NNREC - neural nets + MREC

Comparison of Various Methods
CONDITION                                                         METHOD TO USE
1. (in some cases) Mechanism known                                Rule-based expert systems
2. (in 20% of cases) Linear w/o noise                             Linear regression, statistical method
3. (in most cases) Highly noisy, multi-variant, non-Gaussian      Hyper-space data mining

Why not Principal Component Analysis (PCA)?
Principal Component Analysis (PCA)     Data Mining by MasterMiner
Linear                                 Nonlinear, hierarchical
Gaussian                               Non-Gaussian
Low noise                              High noise
Use all data in modeling               Use subset of data in modeling
20 projections                         2 projections
No separation                          Good separation

Why not Least Square Only?
PLS applies when PRESS < 0.3 (1/4 of cases in our practice)
PROJECT                               PRESS (Error)
synthetic rubber                      0.2052 (can use PLS)
steel plate for ship building         0.6419 (cannot use PLS)
rare earth phosphor                   0.3067
Baoshan Iron & Steel                  0.3441
Ni/H battery                          0.7389
Ni/H materials                        0.1932
propylene recovery (noisy data)       0.7755
propylene recovery                    0.3752
solvent oil                           0.3975
VPTC                                  0.1330
hydro-cracking plant                  0.2055
methanol production                   0.8255
casting for car                       0.9157

Why not Neural Networks (GA) Only?
• Over-fitting problem by NN (GA)
• Industrial records are not complete
• e.g. leaching rate problem at an aluminum Co.: Leaching rate = f(a, b, c, T)
• A cross-section of the optimal zone:
    • by ANN: too large
    • by our Yield Mater™: smaller
[Figure: cross-section in the b-c plane showing the wrong zone by ANN and the zone by MasterMiner]

Applications in Diagnosis
• Equipment setup
    • steel making (roller distance)
    • oil refinery (bottleneck in gasoline production)
    • chemical plants (cooling pipe length, inlet position)
• Process optimization
    • drug fermentation
    • environmental emission controls
    • materials manufacturing

E.g. 1 Steel Making
Process stages: Blast furnace; Steel making; Casting; Hot rolling; Cold rolling
• German equipment, yield 10,000 tons/yr
• ST14 steel plate for auto body
• Problem - "deep pressing" property
• 100 factors (5 stages x 20 factors)
• 2 major factors:
    • N2 - nitrogen content should be reduced
    • d1/d2 - distance ratio of cold rollers should be increased
• Benefit - wasted steel reduced by a factor of 5

2nd issue: QC in ST14 Steel Plate Making
[Diagram labels: Feed of Scrap, CaO, MgO, Iron Ore; O2 blower; Ladle]

Problem Background
• After each batch, samples were taken in a 3-min test for QC
• Need to control the amount of O2 blown and scrap added
• Japanese case-based reasoning SW --> 65% separability
• Problem: ST14 quality is off-spec
• We used MasterMiner to build a model for QC
• Target: FC (C content in steels, 17-30% by customer spec)
• 13 Factors
• Model built and used to control product quality
• Result: 100% separability, products are on-spec

Feature Selection
Feature selected    Property
LY                  age of O2 gun (years)
PLH                 height of O2 gun
DYSLT               O2 amount (m3) before sampling
DYCD                C content at sampling time (10^-2 %)
DYTEMP              liquid iron temperature when sampling (°C)
PCAO                amount of CaO used
PMGO                amount of MgO added
PORE                amount of iron ore added
WCH                 total charge of the converter in tons
TOIRON              total liquid iron
SCAPT               amount of scrap
LDLIFE              life of ladle used to transport liquid iron
QO2                 amount of O2 blown after sampling

114 Sample Data
[Figure]

Target-Feature Maps
[Figure]

Data Separation by MasterMiner: 100%
[Figure]

Data Separation by PCA: 30%
[Figure]

Feature Selection (1) - Principal component regression
[Figure]

Feature Selection (2) - PLS (partial least squares)
[Figure]

Feature Selection (3) - KW method (linear)
[Figure]

Tunnel Models: 32 Inequalities
[Figure]

Quality Control Issue
• Solve the set of 32 inequalities
• or use the "appending" operation
    • assign values to uncontrollable factors
    • add N random samples
    • project them onto the N-dimensional space
    • select those falling into the optimal space
• Result: the C content of ST14 products is on-spec

Add Random Samples (green)
[Figure]

E.g. 2 Bottleneck in Gasoline Production
[Diagram: Distillation Tower with crude oil inlet, heat, and cooling coil; products: Jet fuel, Gasoline, Diesel, Naphtha, Heavy oil, Asphalt]
• Problem: gasoline yield low
• diagnose thermal cracking setup
• data mining method
• identify major factors
• diagnostic result: the length of cooling coils is too short
• Benefit: gasoline increased by 10,000 tons/yr

E.g. 3 Ethylbenzene Synthesis
[Diagram: A Platinum Reforming Workshop: Naphtha Inlet; heat; Reactor; Platinum Catalyst; Fractionation Tower; Ethylbenzene]

Ethylbenzene Synthesis
• Problem: yield low
• Data Mining
• Diagnostic result: position of inlet is wrong
• Action: move from layer 99 to 111
• Benefit: yield raised by 35%

E.g. 4 Predictive Control of Chaotic Process
• Answer: No
• Reason: Chaotic noises (Dr. Leon Chao of UC-Berkeley)
• A historical story: a butterfly in Thailand caused a hurricane in Florida!
• Chaotic noises in chemical reactions: A -> B, C -> D
[Diagram: materials A, B, C; atomic collision; product D]

E.g. 4 Predictive Control of Chaotic Process
• A Real Case: quality control in PTC ceramic production
• Problem: inconsistent (average) particle size (good rate: 60%)
• Material used: ultra-fine Al2O3 powder
• Chemical reaction: NaAlO2 + 2 H2O --> Al(OH)3 + NaOH
• Process:
    • add acid or base to control the above induction process
    • or change the cooling rate
    • heated Al(OH)3 powder formed
    • distribution of the particle size - near Gaussian
    • Al2O3 powder formed

E.g. 4 Predictive Control of Chaotic Process
• Discovery: under a violet light, the transparency varies from batch to batch
[Figure: violet light (2800 Å) through Al2O3 with a transparency measure; plot of violet transparency vs. time, curves 1, 2, 3]

E.g. 4 Predictive Control of Chaotic Process
• Analysis: chaotic noises do have patterns by DataMaster™
• Practical Solution:
    • measure the resistance r curve of an Al2O3 block being formed
    • predict the product quality 30 min before finishing
    • change the cooling rate to control the final r at 60 min
• Result: quality increased from 60% to 100% in 500 experiments
[Figure: resistance r (curves r1, r2, r3) and temperature (°C, up to 1350) vs. time t, 0 to 60 min]

Conclusion
If linear (near linear)
• must have "one-sided" pattern
• use LS - "the best wish"
• extrapolate by accurate model-based prediction
If nonlinear
• if one-sided pattern
    • use Fisher method
    • extrapolate by principal components
• if inclusive pattern
    • use MREC
    • extrapolate by Simplex

Conclusion: Integrated Solution
1997, L. Zadeh: "What is important about soft computing is that FL, NN, GA & PCA are synergistic rather than competitive."
In agreement with our experience:
• Data do have patterns
• Different patterns need different methods
• Several methods need to be integrated
• New data mining technologies developed

Economic Benefit Generated
Factory                 Application                                                       Benefit (USD)
A Petroleum Co.         yield increased: jet fuel, gas solvent, oil, propylene, xylene    3.5 million/2 years
A Petrochem Refinery    yield increased: gasoline, wax products                           1.2 million/year
An Iron & Steels        yield increased: alloy steels for ships                           3 million/year
Total profit                                                                              7.5 million/year
Ratio of cost to profit in 5 years: 1:100

MasterMiner™ Software
• Desktop application software
• Runs on Windows 95/NT
• Software demo download: http://www.zaptron.com/masterminer
• Examples:

4-D Maps for Control
[Figure]

Test Samples Added
[Figure]

Announcement
2nd International Conference on Information Fusion -- FUSION'99
July 6-8, 1999
Sunnyvale Hilton, Silicon Valley, California, USA
abstract due: Feb 1, 1999
http://www.inforfusion.org/fusion99
Sponsored by
• International Society of Information Fusion
• NASA, ARO
• IEEE Signal Processing Society
• IEEE Robotics and Automation Society
• IEEE Control Systems Society
Special Session on Diagnostic Information Fusion

Thank You!
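
Backup sketch: the "appending" operation from the Quality Control Issue slide (assign values to uncontrollable factors, add random samples, keep those that fall inside the optimal space) might look roughly like the following. It assumes the optimal zone is given as linear inequalities A x <= b in the original factor space and skips the projection step; the function name, the bounds, and the toy inequalities are hypothetical and are not the 32 tunnel-model inequalities of the ST14 case.

```python
import numpy as np

def append_random_samples(A, b, lower, upper, fixed, n_samples, seed=0):
    """Draw random operating points and keep those satisfying A @ x <= b.

    fixed maps a factor index to the value assigned to that uncontrollable factor.
    """
    rng = np.random.default_rng(seed)
    X = rng.uniform(lower, upper, size=(n_samples, len(lower)))
    for j, value in fixed.items():
        X[:, j] = value  # uncontrollable factors are pinned, not sampled
    inside = np.all(X @ A.T <= b, axis=1)
    return X[inside]

# Toy optimal zone in 3 factors (hypothetical numbers):
#   x0 + x1 <= 1.5,   x0 >= 0.2,   x1 - x2 <= 0.5
A = np.array([[1.0, 1.0, 0.0],
              [-1.0, 0.0, 0.0],
              [0.0, 1.0, -1.0]])
b = np.array([1.5, -0.2, 0.5])
lower = np.zeros(3)
upper = np.ones(3)

# Factor 2 is treated as uncontrollable and fixed at its current value
candidates = append_random_samples(A, b, lower, upper, fixed={2: 0.4}, n_samples=1000)
print(len(candidates), "of 1000 random samples fall inside the toy optimal zone")
```

Each surviving row is a candidate setting of the controllable factors that is consistent with the fixed, uncontrollable ones.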