2000 Systems Engineering Capstone Conference • University of Virginia
DOMINION SEMICONDUCTOR INFORMATION TECHNOLOGY SYSTEM
Student team: Peter Bogdanowicz, Craig Johnson, Joe Vongkitbuncha, Kathleen Wong
Faculty Advisors: Christina Mastrangelo and K. Preston White
Department of Systems Engineering
Client Advisors: Frank Anastasio, Rick Newcomer, and Kevin Ryan
Dominion Semiconductor Corporation
9600 Godwin Drive
Manassas, VA 20110
E-mail: [email protected]
KEYWORDS: Semiconductor manufacturing,
software evaluation, data warehousing, data modeling,
data mining.
ABSTRACT
Dominion Semiconductor (DSC) and the University
of Virginia Institute for Microelectronics (UVIM) have
united to build the Semiconductor Manufacturing
Information Technology Center (SMITC). The Center
will serve as a means for industry and academic
institutions to work cooperatively to improve on the
process of semiconductor manufacturing. This
Capstone project served as the Center’s first
undertaking: developing the means for Golden
Signature Analysis (GSA). If ideal conditions and
machine settings for producing high-yield wafers can be
determined, then DSC can establish a “Golden
Signature” for memory chip production that can be used
to increase the yield on all wafers.
This level of analysis can be achieved through the
implementation of a data warehouse. One of the most
important pieces in data warehouse implementation is
designing the underlying data model, which is the
primary focus of this project. The data model designs
were based on research conducted in modeling
techniques, feedback from DSC engineers, and
thorough studies of DSC’s current database structure.
In order to extract the data organized by the data
models, it is necessary to have the appropriate data
mining software tools. An extensive software trade
study was conducted to determine the best data
modeling, data warehousing, and data mining tools for
DSC to employ. This paper describes the design of the
data models that encapsulate the DSC database
structure, as well as an overview of the software
evaluations.
The data models and software evaluations provide DSC with the first step in the process toward achieving GSA. The results of this project should provide valuable documentation on the data models and software tools for both DSC and future Capstone teams.
INTRODUCTION
The Golden Signature Analysis project for
Dominion Semiconductor Corporation (DSC) is the
inaugural project for the Semiconductor Manufacturing
Information Technology Center (SMITC). SMITC was
formed in 1999 to facilitate interaction between industry and the academic community in the state of
Virginia. The University of Virginia and DSC are the
founding members of the Center. The mission of this
venture is to apply information technology to the
semiconductor industry and improve manufacturers’
efficiency and productivity.
The Golden Signature Analysis project will span
multiple years. This year’s project focuses on
examining the data collected on wafers with a high
yield to determine the values of the parameters and data
measurements that produced this desired result. The
process engineers can then compare these with the
measurements of less successful wafers to determine
what adjustments must be made to the production line.
In addition, by examining the preferred parameters and
data measurements, the engineers can see how
variations affect the yield.
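To make this comparison concrete, the minimal Python sketch below splits wafers into high- and low-yield groups and ranks each parameter by the gap between the group means; the flat table and its column names (a yield_pct column plus parameter columns) are hypothetical stand-ins, not DSC's actual schema.

import pandas as pd

def golden_signature(df: pd.DataFrame, yield_col: str = "yield_pct",
                     quantile: float = 0.95) -> pd.DataFrame:
    """Rank parameters by how far the best wafers sit from the rest."""
    cutoff = df[yield_col].quantile(quantile)
    golden = df[df[yield_col] >= cutoff]   # the "Golden Signature" wafers
    others = df[df[yield_col] < cutoff]    # less successful wafers
    params = [c for c in df.columns if c != yield_col]
    summary = pd.DataFrame({
        "golden_mean": golden[params].mean(),
        "other_mean": others[params].mean(),
    })
    # Large gaps flag the parameters and settings most worth adjusting.
    summary["gap"] = (summary["golden_mean"] - summary["other_mean"]).abs()
    return summary.sort_values("gap", ascending=False)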
With a data warehouse, the engineers will have a method of bridging multiple databases, each of which contains critical data. They can apply data mining tools
to locate trends and “red flags” in the data, which will
result in increased efficiency. Because the
semiconductor manufacturing industry moves very
quickly, it is imperative that DSC identifies and
corrects errors as quickly as possible in order to remain
competitive.
BACKGROUND INFORMATION
Dominion Semiconductor Corporation
Dominion Semiconductor Corporation of Manassas,
VA, produces DRAM memory chips. Due to the
proprietary nature of the data it collects, the data we
used in our analysis were taken from the MARTE
recipe, a chip design that Dominion discontinued in
mid-1999.
There are five hundred chips on one wafer, and each
chip is assigned a unique location in terms of the x and
y coordinates on the wafer. On the production line,
groups of twenty-five wafers, or lots, are processed
together. The data the company collects are specified at
the chip, wafer, or lot level. The chips undergo
approximately 500 process steps in manufacturing. The
physical manufacturing of the chips occurs in the wafer
fabrication stage, which contains four operations:
layering, patterning, doping, and heat treatments.
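The hierarchy just described can be pictured as a simple nested structure. The toy Python sketch below is illustrative only; DSC's databases store these levels in separate tables.

from dataclasses import dataclass, field
from typing import List

@dataclass
class Chip:
    x: int                  # unique x coordinate on the wafer
    y: int                  # unique y coordinate on the wafer
    functional: bool = True

@dataclass
class Wafer:
    wafer_id: str
    chips: List[Chip] = field(default_factory=list)    # roughly 500 chips

@dataclass
class Lot:
    lot_id: str
    wafers: List[Wafer] = field(default_factory=list)  # 25 wafers per lot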
Yield
Yield, the ratio of functional chips to the total
number of chips on a wafer, is the primary measure of
the efficiency of the production process. By
maximizing the yield, DSC will be able to sell more
chips, which increases the company’s profits.
Dominion collects data on five different types of
yields. The most important measurements include
direct yield, which tests the electrical properties of the
chip; functional yield, which examines the functionality
of the chip under optimal conditions; and margin yield,
which measures how effectively the chips perform
under sub-optimal conditions.
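As a worked illustration of this definition, the short Python function below computes yield as functional chips over total chips; the pass/fail list is hypothetical input.

def wafer_yield(chip_passed: list) -> float:
    """Yield = functional chips / total chips on the wafer."""
    return sum(chip_passed) / len(chip_passed)

# A 500-chip wafer with 450 functional chips yields 0.90:
print(wafer_yield([True] * 450 + [False] * 50))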
Data Warehousing
Data warehousing, data mining, and data modeling
were three important concepts in this project. A data
warehouse is a repository of data used for decision
support. The objective of data warehousing is to create a clearinghouse that gathers and organizes critical business data. Implementing a data
warehouse at DSC would greatly improve the
company’s analysis and decision-making process.
Data mining is the process of using a data
warehouse to discover previously unknown
relationships in the data. Currently, there are several
available data mining tools that use automated
algorithms to uncover these relationships.
A data model is a description of a database’s (or
data warehouse’s) organization, often created as entity
relationship diagrams or dimensional models. The
initial milestone for the Golden Signature Analysis
project requires the team to create a data model of
DSC’s pertinent data.
SOFTWARE EVALUATION
We considered several software tools for the data
modeling, data mining, and data warehousing
processes. We qualitatively evaluated these tools based
on the following criteria: ease of use, maintenance,
robustness, Internet platform, professional appearance,
and cost. For each process, we focused on the two tools that best met the criteria. To confirm these selections, we held meetings at DSC to explain our choices.
Data Modeling
For the data modeling phase, we considered
Rational Rose and Oracle Designer as possible software
tools. The team had experience using Rational Rose; as
a result, we felt that we knew most of the intricacies of
the tool. In addition, we felt that we could produce a
data model in a relatively short period of time using
Rational Rose. Although we developed the preliminary
model using this modeling tool, we recognized that
Rational Rose was not the most powerful data modeling
tool on the market.
Thus, we decided to look at Oracle’s modeling
solution. Oracle is widely praised for its database-related products, and we felt that experimenting with
the demo versions of its products would provide us with
a good understanding of the product. For this reason,
we ordered the Oracle Tools Demo CD Pack. After
successfully installing the Oracle suite, we translated
the Rational Rose model into Oracle Designer. The
team decided that the Oracle Designer model was more
accurate and professional than the preliminary model.
Therefore, we recommended Oracle Designer as the
superior modeling tool.
Data Mining
For the data mining phase, we considered SAS
Enterprise Miner (SAS EM) and Oracle Discoverer.
Because we had no prior experience with either of these
tools, we decided to compare the demo versions for
both of them.
Oracle Discoverer is an award-winning, ad hoc
query and analysis tool. In addition, it integrates very
well with other Oracle products. However, the team
quickly realized that Oracle Discoverer needed both a
client and a server side. Our computers, with the
installed Oracle Discoverer software, comprised the
client side; we still needed a populated Oracle database
that would represent the server side. Unfortunately, we
could not successfully gain access to any Oracle
database from DSC's SMITC lab. Thus, our evaluation of the tool could be based only on the Discoverer on-line documentation and tutorials. We
looked at this documentation in great detail, and we
found that Oracle Discoverer, at the very least, was a
well-documented and easily maintainable tool. The
documentation also displayed some Discoverer
graphics that appeared very professional in quality. In
the end, however, we had no first-hand experience with
the package, and, as a result, we could not defend a
recommendation for this product.
SAS EM presented some installation problems, but we overcame them quickly. We
found that the SAS EM workspace is a drag-and-drop
graphical user interface (GUI), and it is Windows-oriented, meaning that the user can use the mouse to
drag a particular data mining node (from the Tools
Palette) down to the workspace. To establish links
between two or more nodes, the user can simply click
on a tool and drag the mouse cursor to the other tools.
Thus, the interface is very user-friendly.
SAS EM follows the SEMMA methodology for data mining (illustrated in the sketch after this list):
(1) Sample the data by creating one or more data
tables. The samples should be large enough to
contain the significant information, yet small
enough to process.
(2) Explore the data by searching for anticipated
relationships, unanticipated trends, and
anomalies in order to gain understanding and
ideas.
(3) Modify the data by creating, selecting, and
transforming the variables to focus the model
selection process.
(4) Model the data by using the analytical tools to
search for a combination of the data that
reliably predicts a desired outcome.
(5) Assess the data by evaluating the usefulness
and reliability of the findings from the data
mining process.
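As a rough open-source illustration of these five steps (not SAS EM itself), the Python sketch below walks a hypothetical per-wafer table with a yield_pct target through a sample-explore-modify-model-assess pass using pandas and scikit-learn.

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import r2_score

def semma_sketch(df: pd.DataFrame, target: str = "yield_pct"):
    # (1) Sample: draw a subset large enough to carry the significant
    # information, yet small enough to process quickly.
    sample = df.sample(frac=0.2, random_state=0).dropna()
    # (2) Explore: check how each numeric parameter tracks the target.
    numeric = sample.select_dtypes("number")
    print(numeric.corr()[target].sort_values())
    # (3) Modify: select the numeric feature columns for modeling.
    features = numeric.columns.drop(target)
    # (4) Model: fit a decision tree, the analogue of SAS EM's Tree node.
    X_tr, X_te, y_tr, y_te = train_test_split(
        sample[features], sample[target], random_state=0)
    model = DecisionTreeRegressor(max_depth=4).fit(X_tr, y_tr)
    # (5) Assess: judge usefulness and reliability on held-out data.
    print("held-out R^2:", r2_score(y_te, model.predict(X_te)))
    return model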
SAS EM contains the following data mining nodes:
(1) Sampling nodes include Input Data Source,
Sampling, and Data Partition.
(2) Explore nodes include Distribution Explorer,
Multiplot, Insight, Association, and Variable
Selection.
(3) Modifying nodes include Data Set Attributes,
Transform Variables, Filter Outliers,
Replacement, SOM/Kohonen, and Clustering.
(4) Modeling nodes include Regression, Tree,
Neural Network, User Defined Model, and
Ensemble.
(5) Assessing nodes include Assessment, Score,
and Reporter.
The SAS EM handbook, Getting Started with
Enterprise Miner Software: Version 3, describes all the
details of each node.
SAS EM provides a user-friendly environment that
is easy to learn—our team learned how to use the
package in less than an hour. SAS EM also provides a
wide variety of data mining tools with professional-looking graphics. The package is also flexible; multiple
users can work on the same project simultaneously. In
addition, the software can support up to 100,000
diagrams per project.
Because we could not work extensively with Oracle Discoverer to observe its capabilities directly, our
team cannot conclusively recommend a data mining
and modeling tool. However, from our use of SAS
Enterprise Miner, this package appears capable of
performing the functions necessary for the Golden
Signature Project.
Data Warehousing
After conducting preliminary research, we learned
that IBM provided Visual Warehouse, Oracle provided
Express, SAP provided Business Information
Warehouse (BIW), and SAS provided SAS Warehouse
as data warehousing tools. We talked to several people
experienced with these warehousing packages to gather
some opinions about the packages. We received many
positive comments about Oracle Express, few
comments about SAP BIW and SAS Warehouse, and
mostly negative comments about IBM Visual
Warehouse. Furthermore, the team attended an SAP
presentation at DSC in October, and we concluded that
SAP would not support the robust requirements of
DSC’s data warehouse. As a result, we quickly limited
the software tool search to Oracle Express and SAS
Warehouse.
As with Oracle Discoverer, we received a demo version of Oracle Express. However, we were unable to use the tool because we again lacked the server side. Thus,
our evaluations were based solely on the on-line tutorials and third-party opinions. From our on-line
third-party research, we gathered that the Oracle
warehousing solution was high-quality and widely
respected. It provided OLAP capabilities and multidimensional analysis. Again, the documentation and
graphics appeared very professional in quality, but we
still had no first-hand experience with the tool.
The members of our team have used SAS
Warehouse in the past, so we have extensive first-hand
experience with the tool. Although SAS is not known
for being user-friendly, we do know how to harness
most of its capabilities. In addition, it is a powerful
data warehousing tool with an excellent front end.
Although SAS Warehouse seems to provide the needed functionality for GSA, our team believes it would be improper to recommend one warehousing tool over the other without comparable first-hand experience with Oracle Express.
Figure 1. Die Sort Data Model (entity-relationship diagram, omitted here; it relates the lot-level CHARLOT and CATGLOT tables, the wafer-level CHARWFR and CATGWFR tables, and the chip-level CHARCHP and CATGCHP tables, plus the RESULTS and CATGORY tables, keyed by DSPRODUCT, PROCESS, LOT ID, PROGRAM, and VERSION, with WAFER ID and chip X/Y coordinates added at the lower levels)
The current manufacturing system at DSC requires
that the chips go through a series of tests before any
chip is sold to a customer. DSC has decided that the
manipulation of test data is most appropriate for
supporting the necessary analysis to help improve the
yield of chip production.
The testing always begins at the lot level and
eventually breaks down into the wafer and chip levels.
The first set of test data that is collected from the chips
is stored in the TEG database. This collection of tables
holds electrical in-line test data. The second set of test
data belongs to the Die Sort database. This collection
of tables contains final-test yield data and is gathered
after the TEG tests have been conducted. The yield
data is split into categorical test data, which reveals
what type of fail/pass category the chip is in, and
characteristic test data, which describes any failure
trends or regions on a chip, a wafer, or a lot. Finally,
the last set of test data resides in the QC database. This
collection of tables holds measurement data, or
parameters on a test, and process data, or values from a
test. Unlike the data from the TEG and Die Sort
databases, the QC test data is gathered along the way
during both the TEG and Die Sort testing.
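The manual bridging of these three sources that a warehouse would automate can be sketched as follows, assuming each database has been exported to a flat, chip-level table; the file names and join keys below are illustrative only.

import pandas as pd

# Hypothetical flat exports of the three databases, one row per chip.
teg = pd.read_csv("teg_inline.csv")        # electrical in-line test data
die_sort = pd.read_csv("die_sort.csv")     # final-test yield categories
qc = pd.read_csv("qc_measurements.csv")    # test parameters and values

keys = ["LOT_ID", "WAFER_ID", "X", "Y"]
# What the data warehouse automates: one record per chip that spans
# all three sources, ready for data mining tools.
merged = (teg.merge(die_sort, on=keys, how="inner")
             .merge(qc, on=keys, how="left"))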
To protect client confidentiality, details of the manufacturing processes are omitted. The combined data models depicted in Figures 1, 2, and 3 form one universal model that serves as the preliminary model for the data warehouse, a support tool for faster and more accurate error detection in the chip production process. Because there are more
than one thousand fields in the databases, only the
important fields are included for discussion.
Figure 2. TEG Data Model (entity-relationship diagram, omitted here; it relates lot-, wafer-, and chip-level header and data tables such as LOTSLDB_HED, LOTSLDB_DAT, LOTSLPMM_TBL, WFRSLDB_HED, WFRSLDB_DAT, WFRSLPMM_TBL, CHPSLDB_HED, and CSD_PMM to the ITEMID_MST, ITEMSDB_MST, and UNIT_MST master tables, keyed by fields such as device number, begin date, parameter name, item ID, and chip X/Y coordinates)
Figure 3. QC Data Model (entity-relationship diagram, omitted here; it relates the DEVMST and MODMST master tables to the RALLOT, RALWFR, and RALFLD tables and the SQCLOT, SQCWFR, and SQCFLD tables, which key into the RAL and SQC data tables through HED_KEYCOD, HED_DEVNUM, and HED_MODCOD)
These data models will serve as a guide to mining
relevant data as the Golden Signature Analysis project
begins. Different types of data can be gathered to
demonstrate the information needed to better detect
errors. Using these data models, the team collected
data from the databases and created a wafer fail map,
displayed in Figure 4, which illustrates the regions of
the wafer where the chips failed. A chip is represented
on an X and Y coordinate system, with the dotted
regions indicating the area where the failed chips have
occurred. From interviews with Dominion engineers,
we learned that the functional chips are usually in the
center of the wafer. Figure 4 corroborates this claim;
most of the chips that failed reside on the edges of the
wafer.
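A map of this kind can be produced with a few lines of Python; the sketch below assumes a table of chip records with x, y, and a boolean passed column, all hypothetical names.

import matplotlib.pyplot as plt
import pandas as pd

def plot_fail_map(chips: pd.DataFrame) -> None:
    """Scatter functional vs. failed chips by their wafer coordinates."""
    passed = chips[chips["passed"]]
    failed = chips[~chips["passed"]]
    plt.scatter(passed["x"], passed["y"], marker=".", label="functional")
    plt.scatter(failed["x"], failed["y"], marker="x", label="failed")
    plt.xlabel("chip x coordinate")
    plt.ylabel("chip y coordinate")
    plt.title("Wafer fail map")
    plt.legend()
    plt.show()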
This manual process confirms that Dominion’s
current database structure supports Golden Signature
Analysis. It enables engineers to traverse the
important fields in each of the three databases to find
data that reveals anomalies in the production line that
resulted in defects and low yield.
Figure 4. Wafer Fail Map
CONCLUSIONS
In today's high-tech society, there is high demand for professionals capable of performing effective data management. The information age has brought, along with its high-power and high-efficiency capabilities, the clutter and confusion associated with raw data. Collecting data is easy, but analyzing and storing it efficiently is burdensome. The problem lies in deciding which data are useful, and how to organize this data in a clear and concise manner.
In developing these data models, DSC is one step closer to accomplishing its goals in the Golden
Signature Project. The data models form the
underlying framework of the data warehouse that will
make Golden Signature Analysis possible. In order to
perform this type of large-scale analysis, an appropriate
software tool is required. Our team confidently recommends SAS Enterprise Miner as the data mining tool, to be used with SAS Warehouse.
As far as the Capstone project is concerned, this team’s
efforts will be passed on to next year’s Capstone
participants. This project is the beginning of a bigger
project that will continue for years to come. Dominion
Semiconductor is determined to improve its production
processes and feels that information technology will
enable it to achieve its goal.
REFERENCES
Burwen, M. P. 1999. Database Solution. Mountain
View: Palo Alto Management Group, Inc.
Center For Innovative Technology. 1999.
Semiconductor Manufacturing Technology Center.
Charlottesville: Center For Innovative Technology.
Dataspace. 1999. Data Warehousing - An Executive's
Perspective. Ann Arbor: Dataspace.
Gupta, V. R. 1997. An Introduction to Data
Warehousing. Chicago: System Services
Corporation.
Hall, C. 1999. Data Warehousing for Business
Intelligence. Arlington: Cutter Information
Corporation.
McGuff, F. 1998. Designing the Perfect Data
Warehouse. Northbrook: Telos Solutions Inc.
Orr, K. 1997. Data Warehousing Technology.
Topeka: The Ken Orr Institute.
ORACLE Corporation. 2000. ORACLE Warehouse.
ORACLE Corporation. [cited 22 Jan. 2000].
<http://www.oracle.com/datawarehouse/index.html>
SAP AG. 2000. SAP Business Information
Warehouse. SAP AG. [cited 22 Jan. 2000].
<http://www.sap.com/solutions/bi/bw/index.htm>
SAS Institute. 1999. Getting Started with Enterprise Miner Software: Version 3. Cary, N.C.: SAS Institute Inc.
SAS Institute Inc. 1999. SAS Enterprise Miner. SAS
Institute Inc. [cited 25 Oct. 1999].
<http://www.sas.com/software/components/miner.html>.
Van Zant, P. 1997. Microchip Fabrication, 3rd ed.
New York: McGraw-Hill.
Welbrock, P. R. 1998. Strategic Data Warehousing
Principles Using SAS Software. Cary, N.C.: SAS
Institute Inc.
BIOGRAPHIES
Peter Bogdanowicz is a fourth-year undergraduate student from New Jersey, majoring in Systems Engineering and minoring in Computer Science. He
completed three summer internships with Lucent
Technologies, Philips Consumer Communications, and
Motorola. For this project, he used his Systems
Engineering and Computer Science background to
conduct extensive research on the possible software
tools and to make an intelligent recommendation at the
end of the project. He will begin working full-time in
July with Merrill Lynch in New Jersey.
Craig Johnson is a fourth-year Systems Engineering
major from Woodbridge, Virginia, concentrating in
computer information systems. His principal
contributions to the project included researching
potential design schemas and modeling strategies and
assisting in the formulation of the actual data models.
Mr. Johnson has accepted a position with PEC
Solutions in Fairfax, VA.
Joe Vongkitbuncha is a fourth-year Systems
Engineering student. He is originally from Thailand,
but has spent the last 12 years of his life in Arlington,
VA. His contribution to this Capstone project focused mainly on the data modeling concepts. He worked closely with Craig
Johnson on understanding the current system that DSC
is using to retrieve data and has documented the process
thoroughly. He plans to go back to Thailand for a few
weeks after graduation. After that, he plans to start
work on June 19th, 2000 at Ernst & Young in Northern
Virginia.
Kathleen Wong is a fourth-year Systems Engineering
major originally from Springfield, VA. She focused on
defining the goal of the system by identifying the use
cases for the data warehousing system. In addition,
because the project will continue next year, she
documented the activities and the knowledge gained by
this year’s team in order to facilitate a smooth and
efficient turnover. This documentation also may be
used internally at DSC for training of employees new to
the project. After graduation, Kathleen will work for
Cisco Systems in Long Island, NY, as an Associate
Systems Engineer.