Services: White Paper: Implementing a Taxonomy
Implementing a Taxonomy
A Comparison of Database Approaches
Vignette Content Management Blueprint
White Paper
December 2002. v.1.0
Table of Contents
Introduction
Taxonomy Approaches
  String-Based Approach
  Dimension-Based Approach
Data Approaches
  Editorial
  Transactional
Schema Approaches
  Single-Table Approach
  Single-Table Example Schemas
  Multi-Table Approach
  Multi-Table Example Schemas
Conclusion
Appendix A. Detailed Database Schema
Copyright 2002 Vignette Corporation. All rights reserved. U.S. Patent Pending. This document is confidential, and is an unpublished work and trade secret of
Vignette. This document is for internal use only and may not be distributed to third parties. Vignette and the V Logo are trademarks or registered trademarks
of the Vignette Corporation in the United States and other countries. All other company, product, and service names and brands are the trademark or registered
trademarks of their respective owners.
Introduction
The concept of a content classification taxonomy is crucial to the proper implementation and maintenance of a content management solution. A content taxonomy brings order to a large volume of content and allows business users to navigate and manage content more efficiently. This paper addresses two basic approaches for providing taxonomy support in a content management application and explains how each can be implemented in the database. The Appendix provides a sample detailed database schema. This paper assumes that readers have read “Designing an Integrated Content Management Solution: A Taxonomy-Based Approach” and are familiar with various taxonomy descriptions and terminology.
Taxonomy Approaches
There are basically two types of taxonomies. The first is a string-based taxonomy, in which taxonomy nodes are represented by a string that designates the position of a category in the hierarchy. The second is a dimension-based taxonomy, which allows the categories in the taxonomy to be broken down into discrete areas. A dimension-based approach allows more flexibility in placing content into various nodes, which are represented by the dimension to which they belong. Dimensions are most commonly used in a navigation taxonomy to present multiple views of the same content. For example, dimensions in a retail taxonomy may be product type (sweaters, shoes, etc.), gender (male, female), or type (casual, evening, special, etc.). Content items can be classified using nodes in one or all of these dimensions.
String-Based Approach
In a string-based approach, each node is formed by a separate string that defines the node and its position in the hierarchy, for example “product > sweaters > women’s > casual”. One advantage of this approach is that it allows for a linear view of the content categories. It also provides flexibility in adding new nodes: new nodes can be added without affecting other taxonomy constraints. A string-based approach also provides a clearer view of the taxonomy when performing searches.
However, the string-based approach has some drawbacks. First, it forces the taxonomy to be linear by design; although this helps build in structure, it limits the robustness of the taxonomy. Second, using a string to represent the taxonomy makes it difficult to support multiple tags for a specific piece of content. Lastly, the string-based approach typically does not enforce referential integrity, which makes the taxonomy more difficult to maintain.
Dimension-Based Approach
A dimension-based taxonomy approach provides more flexibility in tagging content to categories in specific dimensions. It also enforces a more structured hierarchy. Dimensions also typically build in referential integrity, which aids in maintaining the taxonomy. The disadvantages of this approach involve performance and tagging constraints. Performance may become an issue if items are tagged across several dimensions, because retrieving them requires complicated queries with SQL joins across multiple tables. Tagging may also become an issue if several dimensions are built into the taxonomy: if a content producer must tag content for multiple dimensions, the content tagging work increases significantly.
Data Approaches
Another key consideration when deciding how to design the database to support a taxonomy is the approach used for storing the data that represents the taxonomy. There are two basic data approaches: an editorial approach, in which data is treated simply as content that is published and pushed out to the Web, and a transactional approach, in which data is created by transactions.
Editorial
Editorial data is the easiest to enter, store, and display because it does not require extensive overhead to ensure transactional compliance. An editorial approach typically aligns well with a generic view of data. Editorial data can use a single-table approach, since a single table can treat all content as having basically the same behavior, the only difference being the taxonomy category to which each content item is tagged.
Transactional
Transactional data involves a more complex data model because entities behave differently and have specific relationships to other data entities. The transactional approach favors a multi-table view of the data because it requires transactional overhead on specific tables and one-to-many views of the data. The data is differentiated by the entities in which it resides, which forms a foundation for securing and maintaining transactional compliance.
Schema Approaches
Once the taxonomy and data approaches have been selected, it is much easier to decide which schema approach to take. The following list shows each combination of taxonomy and data approach and the schema approach it favors:
■ String-based taxonomy with Editorial Data approach favors MULTI-TABLE
■ Dimension-based taxonomy with Editorial Data approach favors SINGLE-TABLE
■ String-based taxonomy with Transactional Data approach favors MULTI-TABLE
■ Dimension-based taxonomy with Transactional Data approach favors MULTI-TABLE
Single-Table Approach
The single-table schema approach uses a single table to hold content and typically provides a single table to manage the taxonomy. This approach has several advantages. First, it provides an easy and flexible mechanism for publishing content. Second, it allows greater flexibility in the use of taxonomy dimensions. Lastly, it provides faster time to deployment. Its disadvantages include scalability, performance, and documentation issues. Scalability issues may arise once the content table reaches a certain size threshold, depending on the specific DBMS used. Performance may also be a concern, depending on how the data is queried and retrieved from the database. Documentation can also be an issue when a developer needs to understand how to retrieve specific content items.
Single-Table Example Schemas. The schemas below show examples of the single-table approach.
String-Based Editorial
CONTENT_TYPE: TYPE_ID (PK), ENABLED, DESCRIPTION, TEMPLATE_PATH
CONTENT: CONTENT_ID (PK), TITLE, TEASER, BODY, IMAGE, CONT_ID (FK1), TYPE_ID (FK2)
NAVIGATION_TAXONOMY: NAV_ID (PK), NAV_STRING
NAV_MAP: NAV_ID (PK, FK1), CONTENT_ID (PK, FK2)
CONTENT_TAXONOMY: CONT_ID (PK), CONTENT_STRING
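As a rough illustration, the String-Based Editorial schema could be expressed in SQLite through Python's standard `sqlite3` module. Only the table and column names come from the schema. The SQL types, the `" > "` separator, the sample rows, and the helper names `build_demo` and `titles_under` are hypothetical. The query shows one common way a string-based hierarchy is searched: match a node's string exactly, or match any string that extends it by one or more levels.

```python
import sqlite3

# Minimal sketch of the String-Based Editorial schema. Table/column names follow
# the schema; SQL types, the " > " separator, and sample rows are assumptions.
SCHEMA = """
CREATE TABLE CONTENT_TYPE (
    TYPE_ID INTEGER PRIMARY KEY, ENABLED INTEGER, DESCRIPTION TEXT, TEMPLATE_PATH TEXT
);
CREATE TABLE CONTENT_TAXONOMY (CONT_ID INTEGER PRIMARY KEY, CONTENT_STRING TEXT NOT NULL);
CREATE TABLE NAVIGATION_TAXONOMY (NAV_ID INTEGER PRIMARY KEY, NAV_STRING TEXT NOT NULL);
-- Exactly one content-taxonomy node per item (the CONT_ID column) ...
CREATE TABLE CONTENT (
    CONTENT_ID INTEGER PRIMARY KEY,
    TITLE TEXT, TEASER TEXT, BODY TEXT, IMAGE TEXT,
    CONT_ID INTEGER REFERENCES CONTENT_TAXONOMY(CONT_ID),
    TYPE_ID INTEGER REFERENCES CONTENT_TYPE(TYPE_ID)
);
-- ... but any number of navigation nodes per item (rows in NAV_MAP).
CREATE TABLE NAV_MAP (
    NAV_ID INTEGER REFERENCES NAVIGATION_TAXONOMY(NAV_ID),
    CONTENT_ID INTEGER REFERENCES CONTENT(CONTENT_ID),
    PRIMARY KEY (NAV_ID, CONTENT_ID)
);
"""

SEP = " > "  # hypothetical level separator inside NAV_STRING


def build_demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    with conn:
        conn.execute("INSERT INTO CONTENT_TAXONOMY VALUES (1, 'catalog > apparel')")
        conn.executemany("INSERT INTO NAVIGATION_TAXONOMY VALUES (?, ?)", [
            (1, "product > sweaters > women's > casual"),
            (2, "product > sweaters > men's > casual"),
            (3, "product > shoes > women's > evening"),
        ])
        conn.executemany(
            "INSERT INTO CONTENT (CONTENT_ID, TITLE, CONT_ID) VALUES (?, ?, 1)",
            [(1, "Cable-knit cardigan"), (2, "Wool pullover"), (3, "Satin pump")],
        )
        # The pullover is tagged to two navigation strings.
        conn.executemany("INSERT INTO NAV_MAP VALUES (?, ?)",
                         [(1, 1), (1, 2), (2, 2), (3, 3)])
    return conn


def titles_under(conn, nav_string):
    """Titles tagged to nav_string itself or to any node below it."""
    child_prefix = nav_string + SEP  # so "product > sweat" cannot match "sweaters"
    rows = conn.execute(
        """SELECT DISTINCT c.TITLE
             FROM CONTENT c
             JOIN NAV_MAP m ON m.CONTENT_ID = c.CONTENT_ID
             JOIN NAVIGATION_TAXONOMY n ON n.NAV_ID = m.NAV_ID
            WHERE n.NAV_STRING = ?
               OR substr(n.NAV_STRING, 1, length(?)) = ?
            ORDER BY c.TITLE""",
        (nav_string, child_prefix, child_prefix),
    ).fetchall()
    return [title for (title,) in rows]


conn = build_demo()
print(titles_under(conn, "product > sweaters"))  # ['Cable-knit cardigan', 'Wool pullover']
print(titles_under(conn, "product > shoes"))     # ['Satin pump']
```

Because NAV_STRING is free text, nothing in this sketch stops a misspelled node such as "product > sweatres > casual" from being inserted. This is the referential-integrity weakness of the string-based approach noted earlier.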
The String-Based Editorial schema above is a very simplistic view of how a string-based taxonomy can be implemented in the database schema. The navigation and content taxonomies are kept separate. A content item can be assigned to multiple navigation taxonomy strings, whereas it can be associated with only one content taxonomy string. This ensures that each content item is associated with only one node within the content taxonomy, which creates a more structured view of the taxonomy for managing the content.
Dimension-Based Editorial
In this example, the dimensions within the NAVIGATION_TAXONOMY and CONTENT_TAXONOMY tables represent the taxonomy. The dimensions are represented by a recursive relationship (PARENT_ID) that allows a node to be the parent of nodes below it and to have a parent node above it. This approach allows for a more structured view of the taxonomy and of the various dimensions that make up the taxonomy.
CONTENT_TYPE: TYPE_ID (PK), ENABLED, DESCRIPTION, TEMPLATE_PATH
CONTENT: CONTENT_ID (PK), TITLE, TEASER, BODY, IMAGE, CONT_ID (FK1), TYPE_ID (FK2)
NAVIGATION_TAXONOMY: NAV_ID (PK), PARENT_ID, NODE
NAV_MAP: NAV_ID (PK, FK1), CONTENT_ID (PK, FK2)
CONTENT_TAXONOMY: CONT_ID (PK), PARENT_ID, NODE
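The recursive PARENT_ID relationship can be sketched the same way. In this hypothetical example, each top-level NAVIGATION_TAXONOMY node (PARENT_ID is NULL) is treated as the root of one dimension. A recursive common table expression then collects everything below a node. The sample dimensions and rows, the `SUBTREE_CONTENT` query, and the helpers `build_demo` and `titles_in_all` are illustrative assumptions, not part of the paper.

```python
import sqlite3

# Minimal sketch of a dimension-based navigation taxonomy (adjacency list).
# Table/column names follow the schema; dimensions and rows are assumptions.
SCHEMA = """
CREATE TABLE NAVIGATION_TAXONOMY (
    NAV_ID    INTEGER PRIMARY KEY,
    PARENT_ID INTEGER REFERENCES NAVIGATION_TAXONOMY(NAV_ID),  -- recursive relationship
    NODE      TEXT NOT NULL
);
CREATE TABLE CONTENT (CONTENT_ID INTEGER PRIMARY KEY, TITLE TEXT);
CREATE TABLE NAV_MAP (
    NAV_ID     INTEGER REFERENCES NAVIGATION_TAXONOMY(NAV_ID),
    CONTENT_ID INTEGER REFERENCES CONTENT(CONTENT_ID),
    PRIMARY KEY (NAV_ID, CONTENT_ID)
);
"""

# Content tagged to :node or to any descendant, found by walking PARENT_ID downward.
SUBTREE_CONTENT = """
WITH RECURSIVE subtree(NAV_ID) AS (
    SELECT NAV_ID FROM NAVIGATION_TAXONOMY WHERE NAV_ID = :node
    UNION
    SELECT t.NAV_ID FROM NAVIGATION_TAXONOMY t JOIN subtree s ON t.PARENT_ID = s.NAV_ID
)
SELECT DISTINCT m.CONTENT_ID FROM NAV_MAP m JOIN subtree s ON s.NAV_ID = m.NAV_ID
"""


def build_demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    with conn:
        conn.executemany("INSERT INTO NAVIGATION_TAXONOMY VALUES (?, ?, ?)", [
            (1, None, "product type"),              # root of one dimension
            (2, 1, "sweaters"), (3, 1, "shoes"), (4, 2, "cardigans"),
            (10, None, "gender"),                   # root of a second dimension
            (11, 10, "female"), (12, 10, "male"),
        ])
        conn.executemany("INSERT INTO CONTENT VALUES (?, ?)", [
            (1, "Cable-knit cardigan"), (2, "Wool pullover"), (3, "Satin pump"),
        ])
        # Each item is tagged once per dimension it belongs to.
        conn.executemany("INSERT INTO NAV_MAP VALUES (?, ?)", [
            (4, 1), (11, 1), (2, 2), (12, 2), (3, 3), (11, 3),
        ])
    return conn


def titles_in_all(conn, *nav_ids):
    """Titles under every given node: one subtree query per node, then intersect."""
    ids = None
    for nav_id in nav_ids:
        found = {row[0] for row in conn.execute(SUBTREE_CONTENT, {"node": nav_id})}
        ids = found if ids is None else ids & found
    if not ids:
        return []
    placeholders = ", ".join("?" for _ in ids)
    rows = conn.execute(
        f"SELECT TITLE FROM CONTENT WHERE CONTENT_ID IN ({placeholders}) ORDER BY TITLE",
        sorted(ids),
    ).fetchall()
    return [title for (title,) in rows]


conn = build_demo()
print(titles_in_all(conn, 2))      # sweaters: ['Cable-knit cardigan', 'Wool pullover']
print(titles_in_all(conn, 2, 11))  # sweaters AND female: ['Cable-knit cardigan']
```

A query across two dimensions at once (sweaters and female) takes one recursive query per dimension plus an intersection. That is the kind of extra query and join work the Dimension-Based Approach section identifies as its performance cost.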
Multi-Table Approach
Using multiple tables is the classical data model approach, and it maintains transactional compliance for data exchange and storage. This approach is generally taken when a client wishes to use the database for more than content management. Most DBAs also prefer this approach because it gives them better control over the performance and maintenance of the data model.
Multi-Table Example Schemas. Below are some examples of how a multi-table schema might be designed. Refer to the schema in Appendix A for further details.
String – Multi-Table
COMPANY_INFO: COMPANY_ID (PK), NAME, DESCRIPTION, CONTENT_TAXID (FK1)
CONTENT_TAXONOMY: CONTENT_TAXID (PK), TAX_STRING
PRODUCT: PRODUCT_ID (PK), TITLE, TEASER, BODY, IMAGE, PRICE, CONTENT_TAXID (FK1)
SERVICE: SERVICE_ID (PK), TITLE, TEASER, BODY, PRICE, CONTENT_TAXID (FK1)
NAVIGATION_TAXONOMY: NAV_ID (PK), NAV_STRING
NAV_TAX_MAP: PRODUCT_ID (PK, FK1), NAV_ID (PK, FK2), SERVICE_ID (FK3), COMPANY_ID (FK4)
ORDER: ORDER_ID (PK), DATE_ORDERED, STATUS, DATE_SHIPPED, COMPANY_ID (FK1)
ORDER_DETAIL: PRODUCT_ID (FK1), ORDER_ID (FK2)
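What the multi-table approach adds over the editorial schemas is transactional compliance. The sketch below again uses Python's `sqlite3` module and covers only the COMPANY_INFO, CONTENT_TAXONOMY, PRODUCT, ORDER, and ORDER_DETAIL tables from the String – Multi-Table schema. The SQL types, the quoting of ORDER (an SQL keyword), the `'NEW'` status value, the hypothetical `place_order` helper, and the sample rows are all assumptions. An order and its detail rows are written in one transaction, so a foreign-key failure on any detail row leaves no partial order behind.

```python
import sqlite3

# Minimal sketch of the transactional tables of the String - Multi-Table schema.
# "ORDER" is quoted because ORDER is an SQL keyword; types/rows are assumptions.
SCHEMA = """
CREATE TABLE CONTENT_TAXONOMY (CONTENT_TAXID INTEGER PRIMARY KEY, TAX_STRING TEXT NOT NULL);
CREATE TABLE COMPANY_INFO (
    COMPANY_ID INTEGER PRIMARY KEY, NAME TEXT, DESCRIPTION TEXT,
    CONTENT_TAXID INTEGER REFERENCES CONTENT_TAXONOMY(CONTENT_TAXID)
);
CREATE TABLE PRODUCT (
    PRODUCT_ID INTEGER PRIMARY KEY, TITLE TEXT, TEASER TEXT, BODY TEXT, IMAGE TEXT,
    PRICE NUMERIC, CONTENT_TAXID INTEGER REFERENCES CONTENT_TAXONOMY(CONTENT_TAXID)
);
CREATE TABLE "ORDER" (
    ORDER_ID INTEGER PRIMARY KEY, DATE_ORDERED TEXT, STATUS TEXT, DATE_SHIPPED TEXT,
    COMPANY_ID INTEGER NOT NULL REFERENCES COMPANY_INFO(COMPANY_ID)
);
CREATE TABLE ORDER_DETAIL (
    PRODUCT_ID INTEGER NOT NULL REFERENCES PRODUCT(PRODUCT_ID),
    ORDER_ID   INTEGER NOT NULL REFERENCES "ORDER"(ORDER_ID)
);
"""


def place_order(conn, order_id, company_id, product_ids, date_ordered):
    """Insert an order and its detail rows atomically: all rows or none."""
    with conn:  # commits on success, rolls back if any statement raises
        conn.execute(
            'INSERT INTO "ORDER" (ORDER_ID, DATE_ORDERED, STATUS, COMPANY_ID) '
            "VALUES (?, ?, 'NEW', ?)",
            (order_id, date_ordered, company_id),
        )
        conn.executemany(
            "INSERT INTO ORDER_DETAIL (PRODUCT_ID, ORDER_ID) VALUES (?, ?)",
            [(pid, order_id) for pid in product_ids],
        )


conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript(SCHEMA)
with conn:
    conn.execute("INSERT INTO CONTENT_TAXONOMY VALUES (1, 'catalog > sweaters')")
    conn.execute("INSERT INTO COMPANY_INFO VALUES (1, 'Example Co.', NULL, 1)")
    conn.execute("INSERT INTO PRODUCT (PRODUCT_ID, TITLE, PRICE, CONTENT_TAXID) "
                 "VALUES (1, 'Wool pullover', 49.00, 1)")

place_order(conn, 100, 1, [1], "2002-12-01")
try:
    place_order(conn, 101, 1, [1, 999], "2002-12-02")  # product 999 does not exist
except sqlite3.IntegrityError:
    pass  # the whole order 101, including its valid first line, was rolled back

print(conn.execute('SELECT ORDER_ID FROM "ORDER"').fetchall())  # [(100,)]
```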
In the String – Multi-Table data model, products, services, and companies are all separate entities. This separation is made for transactional purposes, to maintain the integrity of the information being stored, but these tables also contain content that should be viewed on the Web site. To display that content on the Web site, the tables should be associated with the navigation taxonomy nodes, and with the content taxonomy nodes for back-end content management tagging. This model uses a string-based approach in order to place new products into distinct nodes without relying on a structured taxonomy.
The example below shows how a transactional schema might look for a dimension-based approach. Product, Service, and Company Info all relate to the Content Taxonomy and provide organization for content around these core components of the schema. The navigation taxonomy is used to specify the contents of specific views of product and service data as they apply to company info.
Dimension – Multi-Table
COMPANY_INFO: COMPANY_ID (PK), NAME, DESCRIPTION
CONTENT_TAXONOMY: CONT_ID (PK), PARENT_ID, NODE
PRODUCT: PRODUCT_ID (PK), TITLE, TEASER, BODY, IMAGE, PRICE
SERVICE: SERVICE_ID (PK), TITLE, TEASER, BODY, PRICE
NAVIGATION_TAXONOMY: NAV_ID (PK), PARENT_ID, NODE
NAV_TAX_MAP: PRODUCT_ID (PK, FK1), SERVICE_ID (FK2), COMPANY_ID (FK3)
ORDER: ORDER_ID (PK), DATE_ORDERED, STATUS, DATE_SHIPPED, COMPANY_ID (FK1)
ORDER_DETAIL: PRODUCT_ID (FK1), ORDER_ID (FK2)
Conclusion
This paper has presented several approaches to implementing a taxonomy through database design, addressed the pros and cons of each approach, and provided example database schema models. The Appendix contains a more detailed implementation database schema that provides a greater level of detail and reflects real-world requirements.
Appendix A. Detailed Database Schema
Dimension-Based Editorial
(FK) marks a foreign key; (IE) marks an inversion entry, i.e. a non-unique index.
Channel: channel_id, sequence, parent_channel_id (FK) (IE), item_name, workflow_status, workflow_oid
Channel_Item_Map: channel_item_id, publish_date (IE), expire_date (IE), sequence, item_id (FK) (IE), channel_id (FK) (IE), relationship_modifier (FK) (IE), priority (FK) (IE), item_name, workflow_status, workflow_oid
Item_Attribute_Usage: item_type_id (FK), attribute_id (FK) (IE), attribute_usage_modifier (FK) (IE), allow_multiple, sequence
Item_Type: item_type_id, item_type_name, item_type_description
Channel_Display: channel_title (IE), channel_description, channel_id (FK) (IE), channel_template_id (FK) (IE), display_modifier (FK) (IE)
Attribute_Type: attribute_id, attribute_name, attribute_description, attribute_data_type, uses_value_short
Item: item_id, audit_user, audit_date, item_status (IE), item_type_id (FK) (IE), item_modifier (FK) (IE), content_provider_id (FK) (IE)
Item_Item_Map: parent_item_id (FK), child_item_id (FK) (IE), relationship_type (FK) (IE), sequence
Attribute_Value: item_id (FK), value_long, attribute_id (FK) (IE), sequence, value_short
Channel_Template: channel_template_id, template_path (IE), template_name
Provider_Channel_Map: channel_id (FK) (IE), provider_channel_modifier (FK) (IE), content_provider_id (FK) (IE)
Lookup: lookup_id, type_description
Content_Provider: content_provider_id, provider_name, editable, contact, address, phone_number, language_id (FK) (IE), encoding_id (FK), comments, deleted, session_id, expire_date
Sub_Lookup: sub_lookup_id, subtype_description, lookup_id (FK) (IE)
vgn_ur: id, country_id (FK), language_id, login, name, password, email, passwordcreated, canchangepassword, ch_lname, ch_fname, en_fname, en_lname, address1, address2, address3, city, province_id, postal_code, phone, mobile, pager, birthday, gender, email2, income, education, fax, maritial_status, occupation
Workflow_Audit: table_name, table_column, table_id, audit_user, audit_date, action (FK) (IE)
User_Profile: user_id, ticker_symbol, sequence
Country: country_id, country_name
Province: province_id, country_id, province_name
User_Weather: city_id, user_id
Stock_Data: ticker_symbol, exchange, company_name, current_price, currency_type, volume, bid_price, ask_price, net_change, percent_change, yield, open_price, close_price, close_date, high_price, low_price, dividend, year_high, year_low, earnings, pe_ratio, trade_time, trade_date, ric_code, delayed
Label_Display: label_text, label_id (FK) (IE), label_display_modifier (FK) (IE)
Label: label_id, label_description, label_key
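Attribute_Type, Attribute_Value, and Item_Attribute_Usage suggest a generic attribute (entity–attribute–value) design, in which an item's fields are stored as rows rather than as columns. The schema does not say how the value columns are used. The sketch below therefore assumes that `uses_value_short` selects between `value_short` and `value_long`, and that `sequence` orders multiple values of the same attribute. The composite primary key, the SQL types, the sample rows, and the `item_attributes` helper are likewise hypothetical.

```python
import sqlite3

# Sketch of reading item attributes through Attribute_Type / Attribute_Value.
# Assumption: uses_value_short = 1 means the value is stored in value_short,
# otherwise in value_long. PK, types, and sample rows are also assumptions.
SCHEMA = """
CREATE TABLE Item (item_id INTEGER PRIMARY KEY, item_type_id INTEGER);
CREATE TABLE Attribute_Type (
    attribute_id INTEGER PRIMARY KEY, attribute_name TEXT NOT NULL,
    attribute_description TEXT, attribute_data_type TEXT,
    uses_value_short INTEGER NOT NULL
);
CREATE TABLE Attribute_Value (
    item_id INTEGER REFERENCES Item(item_id),
    attribute_id INTEGER REFERENCES Attribute_Type(attribute_id),
    sequence INTEGER NOT NULL DEFAULT 1,
    value_short TEXT, value_long TEXT,
    PRIMARY KEY (item_id, attribute_id, sequence)
);
"""


def item_attributes(conn, item_id):
    """Return {attribute_name: [values in sequence order]} for one item."""
    rows = conn.execute(
        """SELECT a.attribute_name,
                  CASE WHEN a.uses_value_short THEN v.value_short ELSE v.value_long END
             FROM Attribute_Value v
             JOIN Attribute_Type a ON a.attribute_id = v.attribute_id
            WHERE v.item_id = ?
            ORDER BY a.attribute_name, v.sequence""",
        (item_id,),
    )
    attrs = {}
    for name, value in rows:
        attrs.setdefault(name, []).append(value)
    return attrs


conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(SCHEMA)
with conn:
    conn.execute("INSERT INTO Item VALUES (1, NULL)")
    conn.executemany("INSERT INTO Attribute_Type VALUES (?, ?, NULL, 'text', ?)", [
        (1, "title", 1), (2, "body", 0), (3, "keyword", 1),
    ])
    # Columns: item_id, attribute_id, sequence, value_short, value_long
    conn.executemany("INSERT INTO Attribute_Value VALUES (?, ?, ?, ?, ?)", [
        (1, 1, 1, "Winter sweaters", None),
        (1, 2, 1, None, "Long-form article text"),
        (1, 3, 1, "wool", None),
        (1, 3, 2, "knitwear", None),  # second value of a multi-valued attribute
    ])

print(item_attributes(conn, 1))
# {'body': ['Long-form article text'], 'keyword': ['wool', 'knitwear'], 'title': ['Winter sweaters']}
```

New attributes can be added as rows in Attribute_Type without altering any table. The cost is that every attribute read becomes a join, as in the query above.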
SVCWP_IMPL_TAX_1202
Vignette Corporate Headquarters
1601 South MoPac Expressway
Austin, TX 78746-5776
512.741.4300 Tel
512.741.4500 Fax
888.608.9900 Toll-Free
Email info@vignette.com

Vignette Latin America
305.789.6603 Tel
305.789.6612 Fax

Vignette Europe / Middle-East / Africa
44.1628.77.2000 Tel
44.1628.77.2266 Fax

Vignette Asia-Pacific
1.800.800.848 Tel
61.2.9455.5200 Fax
Publication date: June 2002. Vignette does not warrant, guarantee, or make representations concerning the contents of this document. All information is provided “AS-IS,” without express or implied warranties of any
kind. Vignette reserves the right to change the contents of this document and the features or functionalities of its products at any time without obligation to notify anyone of such changes.
Copyright 1997-2001 Vignette Corporation. All rights reserved.
Vignette, the V Logo, www.vignette.com, StoryServer, netCustomer, and Centerstage are trademarks or registered trademarks of Vignette Corporation in the United States and foreign countries. VGM, VPS, and Vignette
Village are servicemarks of Vignette Corporation in the United States and foreign countries. All other brands, products and company names mentioned are the trademarks of their respective owners.