Document Engineering
Robin Burke
ECT 360
Outline
Admin
 Quiz + Answers
 Document Engineering
 In-Class Exercise

Admin

Project Milestone #2
 identify domain for project
 supposed to be due last week
 • no submission link
 will be due today
Project Milestone #3
 document analysis
 due 10/10

Quiz

30 minutes
Document Engineering
Glushko and McGrath coined this term
 "a new discipline for specifying, designing and implementing the documents that serve as the interfaces to business processes."
Topic much larger than XML
 XML provides a mechanism for the results of such an engineering activity
Central insight
 The concept of "document" is very stable and central to many business processes
 XML technologies allow systems to consume and produce documents
Tasks in Document Engineering
1. Analyzing the Context
 what is the problem to be solved
2. Analyzing/Designing Business Processes
 express processes at the level where we can identify documents as input and output
3. Analyzing/Designing Business Transactions
 examining the boundaries of processes and seeing what documents go in and out
4. Analyzing Documents & Components
 collecting documents and analyzing their contents
 extracting components
5. Assembling Components & Models
 put components together into data models and document models
6. Implementation
 writing XML schemas
 writing code that accepts, manipulates and outputs XML
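As a rough illustration of the implementation task, the sketch below accepts an XML string, adds one element, and outputs the result using Python's standard `xml.etree.ElementTree` module. The `talk`, `title`, `speaker`, and `course` element names and the `add_course` helper are assumptions made up for this example, not part of any schema from the course.

```python
import xml.etree.ElementTree as ET

# Hypothetical input document; element names are illustrative only.
SOURCE = (
    "<talk><title>Document Engineering</title>"
    "<speaker>Robin Burke</speaker></talk>"
)

def add_course(xml_text: str, course: str) -> str:
    """Accept an XML string, add a <course> child, and return the new XML."""
    root = ET.fromstring(xml_text)                 # accept
    ET.SubElement(root, "course").text = course    # manipulate
    return ET.tostring(root, encoding="unicode")   # output

result = add_course(SOURCE, "ECT 360")
print(result)
```

A schema would normally sit in front of code like this so that only valid documents reach the manipulation step.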
In this class
Mostly interested in #6
 defining XML languages
 writing code
But
 languages must come from somewhere
 process and content analysis to derive requirements

For your project
You will select a domain
 in which you can find existing documents
 assume that the first three steps are complete
 • you know that these are the important documents to represent
You will try to figure out what about these documents needs to be represented
 document analysis
Document Analysis
1. Collect representative documents
2. Examine documents
3. Identify information-bearing components
4. Identify their role in the relevant business process
5. Name them
6. Type them
Components
Any piece of information that
 has a unique label or identifier is a candidate component
 is self-contained and comprehensible on its own is a candidate component
A component is a logical unit, with no presentation implied
 may be organized structurally
Components

Just because information is presented as a unit
 doesn't mean it is one component
 Example
• "Robert J. Glushko and Tim McGrath"

Just because information is not presented together
 doesn't mean the components should be separate
 Example
• DePaul University
• School of CTI
• 243 S. Wabash Ave.
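The two examples above can be sketched in code: one presented unit is split into two author components, and three separately presented lines are grouped under one logical component. This is only a sketch; the element names (`Document`, `Authors`, `Author`, `Affiliation`, `Institution`, `School`, `StreetAddress`) are hypothetical, and splitting on " and " is an assumption that works for this one byline, not a general rule.

```python
import xml.etree.ElementTree as ET

# One presented unit that actually holds two components.
byline = "Robert J. Glushko and Tim McGrath"
authors = [name.strip() for name in byline.split(" and ")]

# Three separately presented lines that belong to one logical component.
address_lines = ["DePaul University", "School of CTI", "243 S. Wabash Ave."]

# Element names below are hypothetical, chosen only for this sketch.
doc = ET.Element("Document")
authors_el = ET.SubElement(doc, "Authors")
for name in authors:
    ET.SubElement(authors_el, "Author").text = name

affiliation = ET.SubElement(doc, "Affiliation")
tags = ("Institution", "School", "StreetAddress")
for tag, value in zip(tags, address_lines):
    ET.SubElement(affiliation, tag).text = value

print(ET.tostring(doc, encoding="unicode"))
```

Note that the markup records logical units only; nothing in it says how the byline or the address should be laid out on a page.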
Hints for Components
Spatial features of documents
 whitespace
 rules
 boxes
 layout patterns
Typography
 font sizes and styles
 • not always
Proximity
 figures and captions
Structure
 be careful!
 document may not have the right structure
 • better to pull out internal information components
 • see if the structure emerges from the analysis
What to record
Tentative name
 must be tentative; names change frequently
Type of data
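One possible way to keep such a record is sketched below: each harvested component carries a tentative name and a data type, and renaming keeps the earlier names so the history is not lost. The `Component` class, its `previous_names` field, and the sample entries are assumptions for illustration only.

```python
from dataclasses import dataclass, field

@dataclass
class Component:
    """One row of a harvest table (hypothetical structure)."""
    tentative_name: str
    data_type: str
    previous_names: list = field(default_factory=list)

    def rename(self, new_name: str) -> None:
        # Names are tentative, so remember what the component used to be called.
        self.previous_names.append(self.tentative_name)
        self.tentative_name = new_name

# Made-up sample rows.
harvest = [Component("title", "string"), Component("date", "date")]
harvest[0].rename("Talk Title")

for component in harvest:
    print(component.tentative_name, component.data_type, component.previous_names)
```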
Example
http://www.nytimes.com/2005/10/03/theater/newsandfeatures/03wilson.html?pagewanted=1&th&emc=th
Example 2
http://www.internest.com/brittanyhomes/brittanyhomes4340.asp?Print=on
Example 3

http://www.irs.gov/pub/irspdf/f1040ez.pdf
Component Harvest
For each document
 extract components
Do so independently of other documents
 lets you identify differences in representation and contents
Harvesting
 http://www.internest.com/brittanyhomes/brittanyhomes4340.asp?Print=on
 http://www.internest.com/rivercity/rivercity12226.asp?Print=on
Component Consolidation
Examine different sets of harvested components
 Look for similarities and differences
 Try to resolve differences
 • Renaming
 • Structural reorganization
Develop detailed type information
 value standardization
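A minimal sketch of consolidation by renaming follows. Two harvest tables, built independently, are merged after mapping synonyms onto one agreed name, and any component whose type differs between tables is flagged for follow-up. The table contents, the `SYNONYMS` mapping, and the `consolidate` function are all invented for this example; they are not taken from the listings used in class.

```python
# Made-up harvest tables: component name -> tentative data type.
harvest_a = {"Price": "currency", "Bedrooms": "integer", "Sq Ft": "integer"}
harvest_b = {"Asking Price": "currency", "Beds": "integer", "Lot Size": "string"}

# Hypothetical renaming decisions reached during consolidation.
SYNONYMS = {"Asking Price": "Price", "Beds": "Bedrooms"}

def consolidate(*tables, synonyms):
    """Merge harvest tables, renaming synonyms and noting type conflicts."""
    merged = {}
    conflicts = []
    for table in tables:
        for name, data_type in table.items():
            canonical = synonyms.get(name, name)
            if canonical in merged and merged[canonical] != data_type:
                conflicts.append(canonical)  # same component, different types
            merged.setdefault(canonical, data_type)
    return merged, conflicts

consolidated, conflicts = consolidate(harvest_a, harvest_b, synonyms=SYNONYMS)
print(consolidated)
print(conflicts)
```

Structural reorganization (for example, grouping several components under a new parent) is not covered here; it would need a nested representation rather than a flat table.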
Standardizing Values
Assists in writing schema/DTD
Assists in document processing
BUT
 value space not likely to remain constant
 too many choices doesn't help
 • 180 countries but do you do business with all of them?
 too few choices is also a problem
 • if the distinctions important to your process can't be captured
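The trade-off above can be sketched as a small normalization step: raw values are mapped onto a deliberately short list of standard codes, and anything outside that list is rejected so the list can be reviewed. The allowed codes, the alias table, and the `standardize_country` function are assumptions for illustration; a real project would choose its own value space.

```python
# Hypothetical value space: only the countries this process actually handles.
ALLOWED_COUNTRIES = {"US", "CA", "MX"}

# Hypothetical spellings seen in source documents, mapped to standard codes.
ALIASES = {
    "usa": "US",
    "united states": "US",
    "canada": "CA",
    "mexico": "MX",
}

def standardize_country(raw: str) -> str:
    """Map a raw value onto the standard code, or raise if it is out of range."""
    cleaned = raw.strip()
    code = ALIASES.get(cleaned.lower(), cleaned.upper())
    if code not in ALLOWED_COUNTRIES:
        raise ValueError(f"value outside the standardized value space: {raw!r}")
    return code

print(standardize_country(" USA "))
print(standardize_country("ca"))
```

Because the value space is not likely to remain constant, keeping the allowed list as data rather than hard-coding it in logic makes later changes cheaper.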
Naming
Names are critical
 they communicate what each part of the document is for
Names are the most dynamic part of the analysis
 expect them to change several times
 useful to have a dictionary nearby
In consolidation
 we need to merge synonyms
 come up with new names for homonyms
 • usually best to rename all homonyms
Example
title (in a lecture series)
 Is it the title of the talk? The job title of the speaker? The name of the lecture series?
longer names needed to specify
 Talk Title
 Series Title
 Job Title
Exercise
 Each group to get liner notes documents
 Produce harvest tables
 Produce consolidated table
 Switch documents
 • see how they fit
Next week
 Schemas
 Next project milestone
