2000 Systems Engineering Capstone Conference • University of Virginia

DOMINION SEMICONDUCTOR INFORMATION TECHNOLOGY SYSTEM

Student team: Peter Bogdanowicz, Craig Johnson, Joe Vongkitbuncha, Kathleen Wong

Faculty Advisors: Christina Mastrangelo and K. Preston White, Department of Systems Engineering

Client Advisors: Frank Anastasio, Rick Newcomer, and Kevin Ryan, Dominion Semiconductor Corporation, 9600 Godwin Drive, Manassas, VA 20110

E-mail: [email protected]

KEYWORDS: Semiconductor manufacturing, software evaluation, data warehousing, data modeling, data mining.

ABSTRACT

Dominion Semiconductor (DSC) and the University of Virginia Institute for Microelectronics (UVIM) have united to build the Semiconductor Manufacturing Information Technology Center (SMITC). The Center will serve as a means for industry and academic institutions to work cooperatively to improve the process of semiconductor manufacturing. This Capstone project served as the Center's first undertaking: developing the means for Golden Signature Analysis (GSA). If the ideal conditions and machine settings for producing high-yield wafers can be determined, then DSC can establish a "Golden Signature" for memory chip production that can be used to increase the yield on all wafers. This level of analysis can be achieved through the implementation of a data warehouse. One of the most important pieces in data warehouse implementation is designing the underlying data model, which is the primary focus of this project. The data model designs were based on research conducted in modeling techniques, feedback from DSC engineers, and thorough studies of DSC's current database structure. In order to extract the data organized by the data models, it is necessary to have the appropriate data mining software tools. An extensive software trade study was conducted to determine the best data modeling, data warehousing, and data mining tools for DSC to employ. This paper describes the design of the data models that encapsulate the DSC database structure, as well as an overview of the software evaluations. The data models and software evaluations provide DSC with the first step in the process towards achieving GSA. The results of this project should provide valuable documentation on the data models and software tools for both DSC and future Capstone teams.

INTRODUCTION

The Golden Signature Analysis project for Dominion Semiconductor Corporation (DSC) is the inaugural project for the Semiconductor Manufacturing Information Technology Center (SMITC). SMITC was formed in 1999 to facilitate interaction between the industry and academic community in the state of Virginia. The University of Virginia and DSC are the founding members of the Center. The mission of this venture is to apply information technology to the semiconductor industry and improve manufacturers' efficiency and productivity. The Golden Signature Analysis project will span multiple years. This year's project focuses on examining the data collected on wafers with a high yield to determine the values of the parameters and data measurements that produced this desired result. The process engineers can then compare these with the measurements of less successful wafers to determine what adjustments must be made to the production line. In addition, by examining the preferred parameters and data measurements, the engineers can see how variations affect the yield. With a data warehouse, the engineers will have a method of bridging multiple databases, each of which contains critical data. They can apply data mining tools to locate trends and "red flags" in the data, which will result in increased efficiency. Because the semiconductor manufacturing industry moves very quickly, it is imperative that DSC identify and correct errors as quickly as possible in order to remain competitive.
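The golden-signature idea described above, averaging the measurements of high-yield wafers into a reference profile and flagging where a less successful wafer deviates, can be sketched in a few lines. The parameter names, values, and the 10% tolerance below are invented for illustration; they are not DSC data.

```python
# Sketch of "Golden Signature" comparison: the signature is the mean of each
# in-line parameter across known high-yield wafers; a wafer is then compared
# against it parameter by parameter. All numbers here are hypothetical.

def golden_signature(high_yield_wafers):
    """Mean value of each parameter across the high-yield wafers."""
    params = high_yield_wafers[0].keys()
    n = len(high_yield_wafers)
    return {p: sum(w[p] for w in high_yield_wafers) / n for p in params}

def flag_deviations(signature, wafer, tolerance=0.10):
    """Parameters deviating from the signature by more than `tolerance` (relative)."""
    return sorted(
        p for p, ref in signature.items()
        if abs(wafer[p] - ref) > tolerance * abs(ref)
    )

high_yield = [
    {"oxide_thickness": 100.0, "line_width": 0.50, "sheet_resistance": 12.0},
    {"oxide_thickness": 102.0, "line_width": 0.52, "sheet_resistance": 11.8},
]
low_yield = {"oxide_thickness": 118.0, "line_width": 0.51, "sheet_resistance": 12.1}

sig = golden_signature(high_yield)
print(flag_deviations(sig, low_yield))  # oxide_thickness is ~17% above the signature
```

The flagged parameters are exactly the "red flags" the engineers would investigate on the production line.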
BACKGROUND INFORMATION

Dominion Semiconductor Corporation

Dominion Semiconductor Corporation of Manassas, VA, produces DRAM memory chips. Due to the proprietary nature of the data the company collects, the data used in our analysis were taken from the MARTE recipe, a chip design that Dominion discontinued in mid-1999. There are five hundred chips on one wafer, and each chip is assigned a unique location in terms of its x and y coordinates on the wafer. On the production line, groups of twenty-five wafers, or lots, are processed together. The data the company collects are specified at the chip, wafer, or lot level. The chips undergo approximately 500 process steps in manufacturing. The physical manufacturing of the chips occurs in the wafer fabrication stage, which comprises four operations: layering, patterning, doping, and heat treatments.

Yield

Yield, the ratio of functional chips to the total number of chips on a wafer, is the primary measure of the efficiency of the production process. By maximizing yield, DSC will be able to sell more chips, which increases the company's profits. Dominion collects data on five different types of yield. The most important measurements include direct yield, which tests the electrical properties of the chip; functional yield, which examines the functionality of the chip under optimal conditions; and margin yield, which measures how effectively the chips perform under sub-optimal conditions.

Data Warehousing

Data warehousing, data mining, and data modeling were three important concepts in this project. A data warehouse is a repository of data used for decision support. The objective of data warehousing is to create a clearinghouse that gathers and organizes critical business data. Implementing a data warehouse at DSC would greatly improve the company's analysis and decision-making processes.
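The yield measure defined in the Yield subsection is a simple ratio, computable directly from chip-level pass/fail records. The ten-chip wafer below is invented for illustration; real DSC wafers carry roughly five hundred chips.

```python
def yield_ratio(pass_fail):
    """Yield: functional chips divided by total chips on the wafer.

    `pass_fail` maps each chip's (x, y) wafer position to True (functional)
    or False (failed).
    """
    return sum(pass_fail.values()) / len(pass_fail)

# Hypothetical ten-chip wafer: chips at x = 0..7 pass, x = 8..9 fail.
chips = {(x, 0): x < 8 for x in range(10)}
print(yield_ratio(chips))  # 0.8
```

The same ratio computed per wafer and averaged over a twenty-five-wafer lot would give the lot-level figure the paper mentions.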
Data mining is the process of using a data warehouse to discover previously unknown relationships in the data. Several data mining tools are currently available that use automated algorithms to uncover these relationships. A data model is a description of a database's (or data warehouse's) organization, often expressed as entity-relationship diagrams or dimensional models. The initial milestone for the Golden Signature Analysis project required the team to create a data model of DSC's pertinent data.

SOFTWARE EVALUATION

We considered several software tools for the data modeling, data mining, and data warehousing processes. We qualitatively evaluated these tools against the following criteria: ease of use, maintenance, robustness, Internet platform, professional appearance, and cost. For each process, we focused on the two tools that best met the criteria, and we held meetings at DSC to confirm and explain our choices.

Data Modeling

For the data modeling phase, we considered Rational Rose and Oracle Designer as possible software tools. The team had experience using Rational Rose, so we knew most of the tool's intricacies and felt we could produce a data model in a relatively short period of time. Although we developed the preliminary model using Rational Rose, we recognized that it was not the most powerful data modeling tool on the market. Thus, we decided to look at Oracle's modeling solution. Oracle is widely praised for its database-related products, and we felt that experimenting with the demo versions of its products would give us a good understanding of them. For this reason, we ordered the Oracle Tools Demo CD Pack. After successfully installing the Oracle suite, we translated the Rational Rose model into Oracle Designer.
The team decided that the Oracle Designer model was more accurate and professional than the preliminary model. Therefore, we recommended Oracle Designer as the superior modeling tool.

Data Mining

For the data mining phase, we considered SAS Enterprise Miner (SAS EM) and Oracle Discoverer. Because we had no prior experience with either tool, we decided to compare the demo versions of both. Oracle Discoverer is an award-winning, ad hoc query and analysis tool, and it integrates very well with other Oracle products. However, the team quickly realized that Oracle Discoverer requires both a client and a server side. Our computers, with the Oracle Discoverer software installed, comprised the client side; we still needed a populated Oracle database to serve as the server side. Unfortunately, we could not gain access to any Oracle database from DSC's SMITC lab. Thus, our evaluation of the tool could only draw on the Discoverer on-line documentation and tutorials. We examined this documentation in great detail and found that Oracle Discoverer was, at the very least, a well-documented and easily maintainable tool. The documentation also displayed some Discoverer graphics that appeared very professional in quality. In the end, however, we had no first-hand experience with the package and, as a result, could not defend a recommendation for this product. SAS EM presented installation problems, but we overcame them in a short period of time. The SAS EM workspace is a drag-and-drop, Windows-oriented graphical user interface (GUI): the user drags a particular data mining node from the Tools Palette down to the workspace and, to establish links between two or more nodes, simply clicks on a tool and drags the mouse cursor to the other tools.
Thus, the interface is very user-friendly. SAS EM follows the SEMMA methodology for data mining:

1. Sample the data by creating one or more data tables. The samples should be large enough to contain the significant information, yet small enough to process.
2. Explore the data by searching for anticipated relationships, unanticipated trends, and anomalies in order to gain understanding and ideas.
3. Modify the data by creating, selecting, and transforming the variables to focus the model selection process.
4. Model the data by using the analytical tools to search for a combination of the data that reliably predicts a desired outcome.
5. Assess the data by evaluating the usefulness and reliability of the findings from the data mining process.

SAS EM contains the following data mining nodes:

1. Sampling nodes include Input Data Source, Sampling, and Data Partition.
2. Explore nodes include Distribution Explorer, Multiplot, Insight, Association, and Variable Selection.
3. Modifying nodes include Data Set Attributes, Transform Variables, Filter Outliers, Replacement, SOM/Kohonen, and Clustering.
4. Modeling nodes include Regression, Tree, Neural Network, User Defined Model, and Ensemble.
5. Assessing nodes include Assessment, Score, and Reporter.

The SAS EM handbook, Getting Started with Enterprise Miner Software: Version 3, describes all the details of each node. SAS EM provides a user-friendly environment that is easy to learn; our team learned how to use the package in less than an hour. SAS EM also provides a wide variety of data mining tools with professional-looking graphics. The package is also flexible; multiple users can work on the same project simultaneously. In addition, the software can support up to 100,000 diagrams per project. Because we have not worked extensively with Oracle Discoverer to view its capabilities directly, our team cannot conclusively recommend a data mining and modeling tool.
However, from our use of SAS Enterprise Miner, the package appears capable of performing the functions necessary for the Golden Signature Project.

Data Warehousing

After conducting preliminary research, we learned that IBM offered Visual Warehouse, Oracle offered Express, SAP offered Business Information Warehouse (BIW), and SAS offered SAS Warehouse as data warehousing tools. We talked to several people experienced with these warehousing packages to gather opinions about them. We received many positive comments about Oracle Express, few comments about SAP BIW and SAS Warehouse, and mostly negative comments about IBM Visual Warehouse. Furthermore, the team attended an SAP presentation at DSC in October, and we concluded that SAP would not support the robust requirements of DSC's data warehouse. As a result, we quickly limited the software tool search to Oracle Express and SAS Warehouse. As with Oracle Discoverer, we received a demo version of Oracle Express but were unable to use the tool because of the absence of a server side. Thus, our evaluations were based solely on the on-line tutorials and on third-party opinion. From our online third-party research, we gathered that the Oracle warehousing solution was high-quality and widely respected. It provided OLAP capabilities and multidimensional analysis. Again, the documentation and graphics appeared very professional in quality, but we still had no first-hand experience with the tool. The members of our team have used SAS Warehouse in the past, so we have extensive first-hand experience with that tool. Although SAS is not known for being user-friendly, we know how to harness most of its capabilities, and it is a powerful data warehousing tool with an excellent front end.
Although SAS Warehouse seems to provide the needed functionality for GSA, because we have not had thorough first-hand experience with Oracle Express, our team believes it is improper to recommend one of the warehousing tools over the other.

Figure 1. Die Sort Data Model (entities: CHARLOT, CATGLOT, CHARWFR, CATGWFR, CHARCHP, CATGCHP, RESULTS, and CATGORY, keyed on DSPRODUCT, PROCESS, LOT ID, PROGRAM, VERSION, WAFER ID, and chip X/Y coordinates)

The current manufacturing system at DSC requires that the chips go through a series of tests before any chip is sold to a customer. DSC has decided that the manipulation of test data is most appropriate for supporting the analysis needed to help improve the yield of chip production. The testing always begins at the lot level and eventually breaks down into the wafer and chip levels. The first set of test data collected from the chips is stored in the TEG database. This collection of tables holds electrical in-line test data. The second set of test data belongs to the Die Sort database. This collection of tables contains final-test yield data and is gathered after the TEG tests have been conducted. The yield data is split into categorical test data, which reveals what fail/pass category a chip is in, and characteristic test data, which describes any failure trends or regions on a chip, a wafer, or a lot. Finally, the last set of test data resides in the QC database.
This collection of tables holds measurement data, or parameters on a test, and process data, or values from a test. Unlike the data from the TEG and Die Sort databases, the QC test data is gathered along the way during both the TEG and Die Sort testing. For client confidentiality, the details about the manufacturing processes will be omitted. The combined data models depicted in Figures 1, 2, and 3 form one universal model that acts as the preliminary model for the data warehouse, which will serve as a support tool for faster and more accurate error detection in the chip production process. Because there are more than one thousand fields in the databases, only the important fields are included for discussion.

Figure 2. TEG Data Model (entities: CSD_PMM, ITEMID_MST, ITEMSDB_MST, UNIT_MST, LOTSLDB_HED, LOTSLDB_DAT, LOTSLPMM_TBL, WFRSLDB_HED, WFRSLDB_DAT, WFRSLPMM_TBL, and CHPSLDB_HED)

Figure 3. QC Data Model (entities: DEVMST, MODMST, RALLOT, RALWFR, RALFLD, SQCLOT, SQCWFR, SQCFLD, and the RAL and SQC data tables)

These data models will serve as a guide to mining relevant data as the Golden Signature Analysis project begins. Different types of data can be gathered to demonstrate the information needed to better detect errors. Using these data models, the team collected data from the databases and created a wafer fail map, displayed in Figure 4, which illustrates the regions of the wafer where the chips failed. Each chip is represented on an x-y coordinate system, with the dotted regions indicating the areas where the failed chips occurred. From interviews with Dominion engineers, we learned that the functional chips are usually in the center of the wafer. Figure 4 corroborates this claim; most of the chips that failed reside on the edges of the wafer. This manual process confirms that Dominion's current database structure supports Golden Signature Analysis. It enables engineers to traverse the important fields in each of the three databases to find data that reveals anomalies in the production line that resulted in defects and low yield. The problem lies in deciding which data are useful, and how to organize this data in a clear and concise manner.
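The manual fail-map exercise described above can be sketched with an in-memory database. The table below is a drastically simplified stand-in for the Die Sort categorical chip table (CATGCHP in Figure 1): the column names, the toy 7x7 grid, and the rule that edge chips fail are all invented for illustration.

```python
import sqlite3

# Miniature stand-in for the Die Sort categorical chip table: one row per
# chip with its wafer position and pass/fail category. Real tables hold
# far more fields; these names are hypothetical simplifications.
con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE catgchp (
    lot_id TEXT, wafer_id TEXT, x INTEGER, y INTEGER, category TEXT)""")

size = 7  # toy 7x7 chip grid; real wafers carry ~500 chips
rows = []
for x in range(size):
    for y in range(size):
        # For illustration, chips touching the wafer edge fail, matching the
        # engineers' observation that functional chips cluster in the center.
        edge = min(x, y, size - 1 - x, size - 1 - y) == 0
        rows.append(("MARTE01", "W01", x, y, "FAIL" if edge else "PASS"))
con.executemany("INSERT INTO catgchp VALUES (?, ?, ?, ?, ?)", rows)

# Build the fail map: '.' marks a failed chip, ' ' a functional one.
failed = set(con.execute(
    "SELECT x, y FROM catgchp WHERE wafer_id = 'W01' AND category = 'FAIL'"))
fail_map = "\n".join(
    "".join("." if (x, y) in failed else " " for x in range(size))
    for y in range(size))
print(fail_map)
```

Rendered this way, the dotted ring around a blank center reproduces, in miniature, the pattern the team observed in Figure 4.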
Figure 4. Wafer Fail Map

CONCLUSIONS

In our high-tech society today, there is high demand for professionals who are capable of performing effective data management. The information age has brought, along with its high-power and high-efficiency capabilities, the clutter and confusion associated with raw data. Collecting data is easy, but analyzing the data and storing it efficiently is very burdensome.

In developing these data models, DSC is one step closer to accomplishing its goals in the Golden Signature Project. The data models form the underlying framework of the data warehouse that will make Golden Signature Analysis possible. In order to perform this type of large-scale analysis, an appropriate software tool is required. Our team confidently recommends SAS Enterprise Miner as a data mining tool, to be used with SAS Warehouse. As far as the Capstone project is concerned, this team's efforts will be passed on to next year's Capstone participants. This project is the beginning of a larger effort that will continue for years to come. Dominion Semiconductor is determined to improve its production processes and feels that information technology will enable it to achieve this goal.

REFERENCES

Burwen, M. P. 1999. Database Solution. Mountain View: Palo Alto Management Group, Inc.

Center For Innovative Technology. 1999. Semiconductor Manufacturing Technology Center. Charlottesville: Center For Innovative Technology.

Dataspace. 1999. Data Warehousing - An Executive's Perspective. Ann Arbor: Dataspace.

Gupta, V. R. 1997. An Introduction to Data Warehousing. Chicago: System Services Corporation.

Hall, C. 1999. Data Warehousing for Business Intelligence. Arlington: Cutter Information Corporation.

McGuff, F. 1998. Designing the Perfect Data Warehouse. Northbrook: Telos Solutions Inc.

Orr, K. 1997. Data Warehousing Technology. Topeka: The Ken Orr Institute.

ORACLE Corporation. 2000. ORACLE Warehouse. ORACLE Corporation. [cited 22 Jan. 2000].
<http://www.oracle.com/datawarehouse/index.html>

SAP AG. 2000. SAP Business Information Warehouse. SAP AG. [cited 22 Jan. 2000]. <http://www.sap.com/solutions/bi/bw/index.htm>

SAS Institute Inc. 1999. Getting Started with Enterprise Miner Software: Version 3. Cary, N.C.: SAS Institute Inc.

SAS Institute Inc. 1999. SAS Enterprise Miner. SAS Institute Inc. [cited 25 Oct. 1999]. <http://www.sas.com/software/components/miner.html>

Van Zant, P. 1997. Microchip Fabrication, 3rd ed. New York: McGraw-Hill.

Welbrock, P. R. 1998. Strategic Data Warehousing Principles Using SAS Software. Cary, N.C.: SAS Institute Inc.

BIOGRAPHIES

Peter Bogdanowicz is a fourth-year undergraduate student from New Jersey majoring in Systems Engineering and minoring in Computer Science. He completed three summer internships with Lucent Technologies, Philips Consumer Communications, and Motorola. For this project, he used his Systems Engineering and Computer Science background to conduct extensive research on the possible software tools and to make an intelligent recommendation at the end of the project. He will begin working full-time in July with Merrill Lynch in New Jersey.

Craig Johnson is a fourth-year Systems Engineering major from Woodbridge, Virginia, concentrating in computer information systems. His principal contributions to the project included researching potential design schemas and modeling strategies and assisting in the formulation of the actual data models. Mr. Johnson has accepted a position with PEC Solutions in Fairfax, VA.

Joe Vongkitbuncha is a fourth-year Systems Engineering student. He is originally from Thailand but has spent the last 12 years in Arlington, VA. His contribution to this Capstone centered on the data modeling concepts. He worked closely with Craig Johnson on understanding the current system DSC uses to retrieve data and has documented the process thoroughly.
He plans to return to Thailand for a few weeks after graduation. After that, he will start work on June 19th, 2000, at Ernst & Young in Northern Virginia.

Kathleen Wong is a fourth-year Systems Engineering major originally from Springfield, VA. She focused on defining the goal of the system by identifying the use cases for the data warehousing system. In addition, because the project will continue next year, she documented the activities and the knowledge gained by this year's team in order to facilitate a smooth and efficient turnover. This documentation may also be used internally at DSC to train employees new to the project. After graduation, Kathleen will work for Cisco Systems in Long Island, NY, as an Associate Systems Engineer.