Jørgen Løland
Materialized View Creation
and Transformation of
Schemas in Highly Available
Database Systems
Thesis for the degree philosophiae doctor
Trondheim, October 2007
Norwegian University of Science and Technology
Faculty of Information Technology, Mathematics and Electrical Engineering
Department of Computer and Information Science
© Jørgen Løland
ISBN 978-82-471-4381-0 (printed version)
ISBN 978-82-471-4395-7 (electronic version)
ISSN 1503-8181
Doctoral theses at NTNU, 2007:199
Printed by NTNU-trykk
To Ingvild and Ottar.
Preface
This thesis is submitted to the Norwegian University of Science and Technology in partial fulfillment of the requirements for the degree of PhD. The work has been carried out
at the Database System Group, Department of Computer and Information
Science (IDI). The study was funded by the Faculty of Information Technology, Mathematics and Electrical Engineering through the “forskerskolen”
program.
Acknowledgements
First, I would like to thank my advisor Professor Svein-Olaf Hvasshovd for
his guidance and ideas, and for providing valuable comments to drafts of
the thesis and papers. I would also like to thank my co-advisors Dr. Ing.
Øystein Torbjørnsen and Professor Svein Erik Bratsberg for constructive
feedback and interesting discussions regarding the research.
During the years I have been working on this thesis, I have received help
from many people. In particular, I would like to thank Heine Kolltveit and
Jeanine Lilleng for many interesting discussions. In addition, Professor Kjetil
Nørvåg has been a seemingly infinite source of information when it comes to
academic publishing. I would also like to thank the members of the Database
System Group in general for providing a good environment for PhD students.
I sincerely thank Rune Havnung Bakken, Jon Olav Hauglid and Associate
Professor Roger Midtstraum for proofreading and commenting drafts of the
thesis. Your feedback has been invaluable.
I would also like to thank my parents and sister for their inspiration and
encouragement. Finally, I express my deepest thanks to my wife Ingvild for
her constant love and support.
Abstract
Relational database systems are used in thousands of applications every day,
including online web shops, electronic medical records and mobile telephone tracking. Many of these applications have high availability requirements, allowing the database system to be offline for only a few minutes each
year.
In existing DBMSs, user transactions get blocked during creation of materialized views (MVs) and non-trivial schema transformations. Blocking user
transactions is not an option in database systems requiring high availability.
A non-blocking method to perform these operations is therefore needed.
Our research has focused on how the MV creation and schema transformation operations can be performed in database systems with high availability requirements. We have examined existing solutions to MV creation
and schema transformations, and identified requirements. Most important
among these requirements were that the method should not have blocking
effects, and should degrade performance of concurrent transactions to the
smallest possible extent.
The main contribution of this thesis is a method for creation of derived
tables (DTs) using relational operators. Furthermore, we show how these
DTs can be used to create MVs and to perform schema transformations. The
method is non-blocking, and may be executed as a low priority background
process to minimize performance degradation.
The MV creation and schema transformation methods have been implemented in a prototype DBMS. By performing thorough empirical validation
experiments on this prototype, we show that the method works correctly.
Furthermore, through extensive performance experiments, we show that the
method incurs little response time and throughput degradation under moderate workloads. Thus, the method provides a way to create MVs and to
transform the database schema that can be used in highly available database
systems.
Contents

Part I  Background and Context

1 Introduction
  1.1 Motivation
    1.1.1 The Derived Table Creation Problem
  1.2 Research Questions
  1.3 Research Methodology
  1.4 Organization of this thesis

2 Derived Table Creation Basics
  2.1 Database Systems - An Introduction
  2.2 Concurrency Control
  2.3 Recovery
  2.4 Record Identification Policy

3 A Survey of Technologies Related to Non-Blocking Derived Table Creation
  3.1 Ronström's Schema Transformations
    3.1.1 Simple Schema Changes
    3.1.2 Complex Schema Changes
    3.1.3 Cost Analysis of Ronström's Method
  3.2 Fuzzy Table Copying
  3.3 Materialized View Maintenance
    3.3.1 Snapshots
    3.3.2 Materialized Views
  3.4 Schema Transformations and DT creation in Existing DBMSs
  3.5 Summary

Part II  Derived Table Creation

4 The Derived Table Creation Framework
  4.1 Overview of the Framework
  4.2 Step 1: Preparation
  4.3 Step 2: Initial Population
  4.4 Step 3: Log Propagation
  4.5 Step 4: Synchronization
  4.6 Considerations for Schema Transformations
    4.6.1 A lock forwarding improvement for schema transformations
  4.7 Summary

5 Common DT Creation Problems
  5.1 Missing Record and State Identification
  5.2 Missing Record Pre-States
  5.3 Lock Forwarding During Transformations
  5.4 Inconsistent Source Records
    5.4.1 Repairing Inconsistencies
  5.5 Summary

6 DT Creation using Relational Operators
  6.1 Difference and Intersection
    6.1.1 Preparation
    6.1.2 Initial Population
    6.1.3 Log Propagation
    6.1.4 Synchronization
  6.2 Horizontal Merge with Duplicate Inclusion
    6.2.1 Preparation
    6.2.2 Initial Population
    6.2.3 Log Propagation
    6.2.4 Synchronization
  6.3 Horizontal Merge with Duplicate Removal
    6.3.1 Preparation Step
    6.3.2 Initial Population Step
    6.3.3 Log Propagation Step
    6.3.4 Synchronization Step
  6.4 Horizontal Split Transformation
    6.4.1 Preparation
    6.4.2 Initial Population
    6.4.3 Log propagation
    6.4.4 Synchronization
  6.5 Vertical Merge
    6.5.1 Preparation
    6.5.2 Initial Population
    6.5.3 Log Propagation
    6.5.4 Synchronization
  6.6 Vertical Split over a Candidate Key
    6.6.1 Preparation
    6.6.2 Initial Population
    6.6.3 Log Propagation
    6.6.4 Synchronization
  6.7 Vertical Split over a Functional Dependency
    6.7.1 Preparation
    6.7.2 Initial Population
    6.7.3 Log Propagation
    6.7.4 Synchronization
    6.7.5 How to Handle Inconsistent Data - An Extension to Vertical Split
  6.8 Summary

Part III  Implementation and Evaluation

7 Implementation Alternatives
  7.1 Alternative 1 - Simulation
  7.2 Alternative 2 - Open Source DBMS
  7.3 Alternative 3 - Prototype
  7.4 Implementation Alternative Discussion

8 Design of the Non-blocking DBMS
  8.1 The Non-blocking DBMS Server
    8.1.1 Database Communication Module
    8.1.2 SQL Parser Module
    8.1.3 Relational Manager Module
    8.1.4 Scheduler Module
    8.1.5 Recovery Manager Module
    8.1.6 Data Manager Module
    8.1.7 Effects of the Simplifications
  8.2 Client and Administrator Programs
  8.3 Summary

9 Prototype Testing
  9.1 Test Environment
  9.2 Empirical Validation of the Non-Blocking DT Creation Methods
  9.3 Performance Testing
    9.3.1 Log Propagation - Difference and Intersection
    9.3.2 Log Propagation - Vertical Merge
    9.3.3 Low Performance Degradation or Short Execution Time?
    9.3.4 Other Steps of DT Creation
    9.3.5 Performance Experiment Summary
  9.4 Discussion

10 Discussion
  10.1 Contributions
    10.1.1 A General DT Creation Framework
    10.1.2 DT Creation for Many Relational Operators
    10.1.3 Support for both Schema Transformations and Materialized Views
    10.1.4 Solutions to Common DT Creation Problems
    10.1.5 Implemented and Empirically Validated
    10.1.6 Low Degree of Performance Degradation
    10.1.7 Based on Existing DBMS Functionality
    10.1.8 Other Considerations - Total Amount of Data
  10.2 Answering the Research Question
    10.2.1 Summary

11 Conclusion and Future Work
  11.1 Research Contributions
  11.2 Future Work
  11.3 Publications

Part IV  Appendix

A Non-blocking Database: SQL Syntax
B Performance Graphs

Glossary
Bibliography
List of Figures

2.1 Database System
2.2 Compensation Log Records provide valid State Identifiers
3.1 Ronström's Horizontal Merge Method
3.2 Ronström's Horizontal Split Method
3.3 Examples of Vertical Merge Schema Change
3.4 Chain of Triggers in Ronström's Vertical Merge Method
3.5 Ronström's Vertical Split Method
3.6 Ronström's Vertical Split Transformation
3.7 Ronström's Vertical Split Method and Inconsistent Data
3.8 Example MV Consistency Problem
4.1 The four steps of DT creation
5.1 Solving the Record and State Identification Problems
5.2 Solving the Missing Record Pre-State Problem
5.3 Example Simple Lock Forwarding (SLF)
5.4 Lock Compatibility Matrix
5.5 Example Many-to-One Lock Forwarding (M1LF)
5.6 Example Many-to-Many Lock Forwarding (MMLF)
5.7 Inconsistent Source Records
6.1 Difference and Intersection DT Creation
6.2 Horizontal Merge DT Creation
6.3 Horizontal Merge - Duplicate Inclusion
6.4 Horizontal Merge - Duplicate Inclusion with type attribute
6.5 Horizontal Merge - Duplicate Removal
6.6 Horizontal Split DT Creation
6.7 Example vertical merge DT creation
6.8 Synchronization of a Vertical Merge Schema Transformation
6.9 Vertical split over a Candidate Key
6.10 Vertical split over a non candidate key
7.1 Possible Modular Design of Prototype
8.1 Modular Design Overview of the Non-blocking DBMS
8.2 UML Class Diagram of the Non-blocking Database System
8.3 Sequence Diagram - Relational Manager Processing a Query
8.4 Organization of the log
8.5 Organization of data records in a table
8.6 Screen shot of the Client program in action
9.1 Response time and throughput for difference and intersection
9.2 Response time distribution for difference and intersection
9.3 Response time - difference log propagation
9.4 Throughput - difference log propagation
9.5 Response time and throughput - vertical merge DT creation
9.6 Comparison of vertical merge and difference/intersection
9.7 Time vs Degradation
9.8 Response Time Summary
11.1 Example of Schema Transformation performed in two steps
11.2 Example interface for dynamic priorities for DT creation
B.1 Response time and throughput - horizontal merge DT creation
B.2 Response time - horizontal split DT creation
B.3 Throughput - horizontal split DT creation
B.4 Response time - vertical merge DT creation
B.5 Response time - vertical merge DT creation, varying table size
B.6 Throughput - vertical merge DT creation
B.7 Response time - vertical split DT creation
B.8 Throughput - vertical split DT creation
List of Tables

3.1 The three dimensions of Ronström's schema transformations
3.2 Legend for Tables 3.3 and 3.4
3.3 Cost Incurred by Ronström's Vertical Merge Schema Transformation Method
3.4 Added Cost by Ronström's Vertical Split Schema Transformation Method
5.1 DT Creation Problems and Solutions
6.1 DT Creation Operators
6.2 Problems and solutions for DT Creation methods
7.1 Evaluation - Open Source DBMSs
7.2 Evaluation of implementation alternatives
9.1 Hardware and Software Environment for experiments
9.2 Transaction Mix 1
9.3 Transaction Mix 2
9.4 Transaction Mix 3
9.5 Table Sizes used in the experiments
9.6 Response Time Distribution Summary
9.7 Response Time Initial Population and Log Propagation
9.8 Effects of varying priorities
Part I
Background and Context
Chapter 1
Introduction
The topic of this thesis is schema transformations and materialized view creation in relational database systems with high availability requirements. The
main focus is on how creation of derived tables can be used to perform both
operations while incurring minimal performance degradation for concurrent
transactions.
In this chapter, the motivation for the topic is presented, and the research
questions and methodology are discussed.
1.1 Motivation
Relational database systems have had tremendous success since Ted Codd
introduced the relation concept in the famous paper “A relational model
of data for large shared data banks” in 1970 (Codd, 1970). Today, this
type of database system is so dominant that “database system” is close to
synonymous with “relational database system”.
Relational database systems1, or simply database systems, are used in
virtually all kinds of applications, spanning from simple personal database
systems to huge and complex business database systems. Personal database
systems, including CD or book archives and contact information for friends
and family, typically contain few tuples (order of hundreds). The consequences of unavailability2 for such database systems are, in general, not critical; it would still be possible to play music even if the CD archive was
unavailable. Because people normally consider sporadic downtime of such
systems acceptable, these systems have low requirements for availability.
1 Throughout this thesis, the term database is used to denote a collection of related data. A Database Management System (DBMS) is a program used to manage these data, while database system denotes a collection of data managed by a DBMS (Elmasri and Navathe, 2004).
2 In this thesis, database systems are considered available when they can be fully accessed by their intended users.
At the other end of the scale, database systems used in business applications may be very large, often in the order of millions or even billions
of tuples. Business database systems are involved in everything from stock
exchanges, web shops, banking and airline ticket booking to patient histories at hospitals3, Enterprise Resource Planning (ERP) and Home Location
Registries (HLR) used to keep track of mobile telephones in a network.
While database system availability is not critical to all business applications, it certainly is to many. Database systems are, e.g., required for the
exchange of stocks at NASDAQ and for customers to shop at Amazon.com.
Even more critical: the HLR database system is required for any mobile
phone to work in a network. These systems should not be unavailable for
long periods of time.
Database Operations
Users interact with database systems by performing transactions, which may
consist of one or more database operations (Elmasri and Navathe, 2004).
Examples of basic database operations include inserting a new patient into a
hospital database and querying a patient’s medical record for possible dangerous allergies before a surgery.
In modern database systems, e.g. Microsoft SQL Server 2005 (Microsoft
TechNet, 2006) and IBM DB2 version 9 (IBM Information Center, 2006), the
basic operations are designed to achieve high degrees of concurrency (Garcia-Molina et al., 2002). While the operations performed by one user may block
other users from accessing the very same data items at the same time, other
data items are still accessible.
A database operation is said to be blocking if it keeps other transactions
from executing their update (and possibly read) operations, effectively making the involved data unavailable. Short term blocking of small parts of the
database may not be problematic. However, blocking a huge amount of data
for a long period of time seriously reduces availability. This is obviously
unwanted in highly available database systems.
In the following section, two database operations, database schema transformations and creation of materialized views, are described. Neither of these
can be performed without blocking the involved tables for a long period of
time in existing DBMSs (Løland, 2003).
3 Patient history databases may, e.g., describe previous treatment, allergies, x-ray images, etc.
1.1.1 The Derived Table Creation Problem
“Due to planned technical maintenance and upgrades, the online
bank will be unavailable from Saturday 8 p.m. to Sunday 10 a.m.
Our contact center will not be able to help with account information as internal systems are affected as well.
We are sorry for the inconvenience this may cause for our customers.”
Norwegian Online Bank, October 2006
Schema Transformations
Database schemas4 are typically designed to model the relevant parts and
aspects of the world at design time. The schema may be excellent for the
intended usage at the time it is designed, but many applications change over
time. New kinds of products appear, departments which had one head of
department suddenly have a board, or new laws that affect the company are
introduced by the government. These are examples of changes that may
require a transformation of the database schema.
In addition to changing needs as a source for transformations, designers
may also have been unsuccessful in designing a good schema in the first place.
After being used for some time, it may turn out that a schema does not work
as well as it was intended to. Often, the reason for this is that the design is a
compromise between many factors, some of which include readability of the
E/R diagram, removal of anomalies and optimization of runtime efficiency. It
may very well turn out that the schema is too inefficient or that the designers
just forgot or misinterpreted something.
In a study of seven applications, Marche (Marche, 1993) reports significant changes to relational database schemas over time. Six of the studied
schemas had more than 50% of their attributes changed. The evolution continued after the development period had ended. A similar study of a health
management system came to the same conclusion (Sjøberg, 1993).
As should be clear, a database schema may sometimes have to be changed
after the database has been populated with data. In this thesis, we refer to
such changes as “schema transformations”. An important shortcoming of all
but the least complex schema transformations is that they must be performed
in a blocking way in today's DBMSs (Lorentz and Gregoire, 2003b; Microsoft
TechNet, 2006; IBM Information Center, 2006). This will be elaborated on
in Section 3.4.
4 The description, or model, of a database (Elmasri and Navathe, 2000).
Materialized Views
A database view is a table derived from other tables, and is defined by a
database query called the view query (Elmasri and Navathe, 2004). Views
may be either virtual or materialized. Virtual views do not physically store
any records, but can still be queried like normal tables. This is done by using
the view queries to rewrite the user queries (Elmasri and Navathe, 2004).
Depending on the complexity of the view query, querying a virtual view
may be much more costly than querying a normal table. To remedy this,
most modern DBMSs support Materialized Views (MVs)5 (Løland, 2003).
Unlike a virtual view, the result of the view query is stored physically in an
MV (Elmasri and Navathe, 2004).
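As an aside for readers unfamiliar with the distinction, the following small Python sketch (not taken from the thesis; the table, attribute and function names are invented) contrasts a virtual view, which re-executes its view query on every read, with a materialized view, which stores the query result and serves reads from that stored copy.

```python
# Illustrative only: a tiny "view query" over an in-memory table.
employees = [
    {"id": 1, "dept": "sales", "salary": 40000},
    {"id": 2, "dept": "sales", "salary": 45000},
    {"id": 3, "dept": "it", "salary": 50000},
]

def view_query():
    # The view query: all employees in the sales department.
    return [e for e in employees if e["dept"] == "sales"]

def read_virtual_view():
    # Virtual view: nothing is stored; every read re-runs the view query.
    return view_query()

# Materialized view: the view query result is computed once and stored.
materialized_view = view_query()

def read_materialized_view():
    # Cheap read from the stored copy, which must be kept consistent
    # with the source table as it changes (view maintenance).
    return materialized_view

print(read_virtual_view() == read_materialized_view())  # True right after creation
```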
MVs have many uses in addition to speeding up queries (Alur et al., 2002).
They can be used to store historical information, e.g. sales reports for each
quarter of a year. They are also frequently used in Data Warehouses. Because
of the great performance advantages of MVs and their widespread use, much
research has been conducted on how to keep MVs consistent with the source
tables (Løland and Hvasshovd, 2006c). However, in current DBMSs, the MVs
still have to be created in a way that blocks all updates to the source tables
while the MV is created (Lorentz and Gregoire, 2003b; IBM Information
Center, 2006; Microsoft TechNet, 2006).
Using Derived Tables for Schema Transformations and Materialized View Creation
The blocking MV creation and schema transformation methods described in
the previous sections may take minutes or more for tables with large amounts
of data. If either of these operations is required, the database administrator
is forced to choose between unavailability while performing the operation,
or to not perform it at all. Both choices may, however, be unacceptable.
This is especially the case when the database system has high availability
requirements.
A derived table (DT) is, as the name suggests, a database table containing
records derived from other tables6 (Elmasri and Navathe, 2004). A table
“Sales Report” that stores a one-year history of all sales by all employees
is an example of a DT. Hence, a materialized view is obviously one type of
DT. A less intuitive application of DTs is to redirect operations from source
tables to derived tables and thereby perform a schema transformation. A
method to create DTs is therefore likely to be usable for both operations.
5 Materialized Views are called Indexed Views by Microsoft (Microsoft TechNet, 2006) and Materialized Query Tables by IBM (IBM Information Center, 2007).
6 Throughout this thesis, the tables that records are derived from will be called source tables.
Both schema transformations and Materialized Views are defined by a
query (Microsoft TechNet, 2006; Lorentz and Gregoire, 2003a; Løland, 2003),
and we therefore focus on DT creation using relational operators7, also called
relational algebra operations. Relational operators can be categorized in
two groups: non-aggregate and aggregate operators (Elmasri and Navathe,
2004). The non-aggregate operators are cartesian product, various joins,
projection, union, selection, difference, intersection and division. Aggregate
operators are mathematical functions that apply to collections of records.
Both non-aggregate and aggregate operators can be used to define schema
transformations and MVs. However, aggregate operators are typically not
used without non-aggregate operators (Alur et al., 2002), and we therefore
consider non-aggregate operators the best starting point for DT creation.
The main topic of this thesis is to develop a method that solves the
unavailability problem of creating derived tables, using common relational
operators. Due to time constraints, we will focus on six operators: full outer
equijoin (one-to-many and many-to-many relationships), projection, union,
selection, difference and intersection8. Full outer equijoin is chosen because
these can later be reduced to any inner/left/right join simply by removing
records from the result, and because equality is the most commonly used
comparison type in joins (Elmasri and Navathe, 2004). Furthermore, in terms
of derived table creation, cartesian product is actually a simpler full outer
join in which no attribute comparison is performed. The final non-aggregate
operator, division, can be expressed in terms of the other operators, and is
therefore considered less important.
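To make the join operator concrete, here is a small, illustrative Python sketch of a full outer equijoin on two lists of records; it only demonstrates the relational operator itself and is not part of the non-blocking DT creation method developed later in the thesis. All names in it are invented.

```python
def full_outer_equijoin(left, right, key):
    """Full outer equijoin of two lists of records (dicts) on equality of `key`.

    Right-hand attributes are prefixed with "r_" to avoid name clashes;
    records without a join partner are kept, as an outer join requires.
    """
    result, matched_right = [], set()
    for l in left:
        partners = [r for r in right if r[key] == l[key]]
        for r in partners:
            result.append({**l, **{"r_" + k: v for k, v in r.items()}})
            matched_right.add(id(r))
        if not partners:
            result.append(dict(l))  # left record with no join partner
    for r in right:
        if id(r) not in matched_right:
            result.append({"r_" + k: v for k, v in r.items()})
    return result

# Example: a one-to-many relationship between departments and employees.
depts = [{"dept_id": 1, "name": "sales"}, {"dept_id": 2, "name": "it"}]
emps = [{"emp_id": 7, "dept_id": 1}, {"emp_id": 8, "dept_id": 3}]
print(full_outer_equijoin(depts, emps, "dept_id"))
```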
The suggested method must solve any problem associated with utilizing
the DTs as materialized views and in schema transformations. To gain insight
in the field, the work must include a thorough study of existing solutions to
the described and closely related problems. Existing DBMS functionality
should be used to the greatest possible extent to ease the integration of
the method into existing DBMSs. Since the goal is to develop a method
that incurs little performance degradation to concurrent transactions, the
performance implications of the method need to be tested.
7 Relational operators are the building blocks used in queries (Elmasri and Navathe, 2004).
8 Due to naming conventions in the literature (Ronström, 1998), we use the names vertical merge and split, horizontal merge and split, difference and intersection, respectively, when these relational operators are used in DT creation.
1.2 Research Questions
Based on the discussion in the previous section, the main research question
of the thesis is:
How can we create derived tables and use these for schema transformation
and materialized view creation purposes while incurring minimal performance
degradation to transactions operating concurrently on the involved source tables?
We realize that this is a research question with many aspects. To be able to
answer it, the research question is therefore refined into four key challenges:
Q1: Current situation
What is the current status of related research designed to address the
main research question or part of it?
Q2: System Requirements
What DBMS functionality is required for non-blocking DT creation to
work?
Q3: Approach and solutions
How can derived tables be created with minimal performance degradation, and be used for schema transformation and MV creation purposes?
• How can we create derived tables using the chosen six relational
operators?
• What is required for the DTs to be used a) as materialized views?
b) for schema transformations?
• To what extent can the solution be based on standard DBMS functionality and thereby be easily integrated into existing DBMSs?
Q4: Performance
Is the performance of the solution satisfactory?
• How much does the proposed solution degrade performance for
user transactions operating concurrently?
• With the inevitable performance degradation in mind, under which
circumstances is the proposed solution better than a) other solutions? b) performing the schema transformation or MV creation
in the traditional, blocking way?
1.3 Research Methodology
Denning et al. divide computer science research into three paradigms: theory, abstraction and design (Denning et al., 1989).
Theory is rooted in mathematics and aims at developing validated theories.
The paradigm consists of four steps:
1. characterize objects of study
2. hypothesize possible relationships among them, i.e., form a theorem
3. determine whether the relationships are true, i.e., prove them
4. interpret results
Abstraction is rooted in experimental scientific method. In this method, a
phenomenon is investigated by collecting and analyzing experimental
results. It consists of four steps:
1. form a hypothesis
2. construct a model and make a prediction
3. design an experiment and collect data
4. analyze results
Design is rooted in engineering, and aims at constructing a system that
solves a problem.
1. state requirements
2. state specifications
3. design and implement the system
4. test the system
The research presented in this thesis fits naturally into the Design paradigm.
The research aims at solving the problem that creation of derived tables is a
blocking operation.
For our suggested solution to be useful, the method must fit into common
DBMS design. Hence, the first step in solving the research question is to
understand commonly used DBMS technology that is somehow related to
the research question. This will enable us to state requirements.
The next step is to state specifications for a method that can be used to
create derived tables in a non-blocking way. The method should be designed
to fit into existing DBMSs to the greatest possible extent, and to degrade
performance as little as possible.
To verify validity, and to test the actual performance degradation, a
DBMS and the suggested method are then designed and implemented. The
implementation is then subjected to thorough performance testing.
1.4 Organization of this thesis
The thesis is divided into three parts with different focus. The focus in Part
I is on the background for the research. This includes the research question,
required functionality and a survey of related work. Part II presents our
solution to the derived table creation problem, and shows how the DTs can
be used to transform the schema and to create materialized views. In Part
III, we discuss the results of experiments on a prototype DBMS. This part
also includes a discussion of the research contributions, and suggests further
work.
Part I - Background and Context contains an introduction to derived
table creation. The focus in this part is on research from the literature
and existing systems that is relevant to our solution of the research
question and suggestions for further work.
Chapter 1 contains this introduction. The chapter states the motivation for the research, and the research methodology that is used
in the work.
Chapter 2 introduces the DBMS fundamentals required to perform
non-blocking derived table creations.
Chapter 3 is a survey of existing solutions to the non-blocking DT
creation problem and related problems.
Part II - DT Creation Framework presents our solution for non-blocking
creation of derived tables, and how these derived tables can be used for
schema transformations and materialized view creation.
Chapter 4 introduces our framework for non-blocking derived table
creation.
Chapter 5 identifies problems that are encountered when derived tables are created as described in Chapter 4. The chapter also shows
how these problems can be solved.
Chapter 6 describes in detail how the DT creation framework presented in Chapter 4 can be used for non-blocking creation of derived tables using the six relational operators that have been chosen. The chapter also describes what needs to be done to use these
DTs for schema transformations or as materialized views.
Part III - Implementation and Testing presents the design of our prototype DBMS. The prototype is capable of performing non-blocking
DT creation as described in Part II. Results from performance testing
of this prototype are also presented.
Chapter 7 evaluates three alternatives for implementation of the DT
creation method.
Chapter 8 describes the design of a prototype DBMS capable of performing the DT creation method developed in Part II.
Chapter 9 discusses the experiment types and the results of performing the
required experiments on the prototype.
Chapter 10 contains a discussion of the results of the research.
Chapter 11 presents an overall conclusion and the contributions of
the thesis.
Chapter 2
Derived Table Creation Basics
This chapter describes basic Database Management System concepts that are
used by or are otherwise relevant to our non-blocking DT creation method. A
thorough description of DBMSs is out of the scope of this thesis. For further
details, the reader is referred to one of the many text books on the subject,
e.g. “Database Systems: The Complete Book” (Garcia-Molina et al., 2002)
or “Fundamentals of Database Systems” (Elmasri and Navathe, 2004).
2.1 Database Systems - An Introduction
A database (DB) is a collection of data items1, each having a value (Bernstein
et al., 1987). All access to the database goes through the Database Management System (DBMS). As illustrated in Figure 2.1, a database managed by
a DBMS is called a database system (Elmasri and Navathe, 2004).
Database access is performed by executing special transaction programs,
which have a defined start and end, set by start and either commit or abort
operation requests. A commit ensures that all operations of the transaction
are executed and safely stored, while an abort call removes all effects of the
transaction (Elmasri and Navathe, 2004).
In its most common use, transactions have four properties, known as the
“ACID” properties (Gray, 1981; Haerder and Reuter, 1983):
Atomicity - The transaction must execute successfully, or must appear not
to have executed at all. This is also referred to as the “all or nothing”
property of transactions. Thus, the DBMS must be able to undo all
operations performed by a transaction that is aborted.
1 The data items are called tuples or rows in the relational data model. Internally in database systems, they are called records (Garcia-Molina et al., 2002). To avoid confusion, the term “record” will be used throughout this thesis.
Figure 2.1: Conceptual Model of a Database System.
Consistency - A transaction must always transform the database from one
consistent state2 to another.
Isolation - It should appear to each transaction that other transactions
either appeared before or after it, but not both (Gray and Reuter,
1993).
Durability - The results of a transaction are permanent once the transaction has committed.
Broadly speaking, the ACID properties are enforced mainly by concurrency control, which is used to achieve isolation, and recovery which is used
for atomicity and durability. Consistency means that transactions must preserve constraints. The following two sections give a brief introduction to
common concurrency control and recovery mechanisms.
2 A consistent state is a state where database constraints are not broken (Garcia-Molina et al., 2002).
2.2 Concurrency Control
In a database system where only one transaction is active at any time, concurrency control is not needed. In this scenario, the operations from each
transaction are executed in serial (i.e. sequential) order, and two transactions
can never interfere with each other. The isolation property is therefore implicitly guaranteed. However, this scenario is seldom used since the database
system is normally only able to use small parts of the available resources at
any time (Bernstein et al., 1987).
When concurrent transactions are allowed, the operations of the various
transactions must be executed as if the execution was serial (Garcia-Molina
et al., 2002). A sequence, or history, of operations that gives the same
result as serial execution is called serializable. It is the responsibility of the
“Scheduler”, or “Concurrency Controller”, to enforce serializable histories,
which in turn is a guarantee for isolation (Bernstein et al., 1987).
Schedulers
Schedulers can be either optimistic or pessimistic. With the optimistic strategy, transactions perform operations right away without first checking for
conflicts. When the transaction requests a commit, however, its history is
checked. If the transaction has been involved in any non-serializable operations, the transaction is forced to abort. Timestamp ordering, serialization
graph testing and locking can all be used for optimistic scheduling (Bernstein
et al., 1987).
The most common form of scheduling is pessimistic, however. With this
strategy, transactions are not allowed to perform operations that will form
non-serializable histories in the first place. Thus, the scheduler has to check
every operation to see if it conflicts with any operation executed by another
currently active transaction. When a conflict is found, the scheduler may
decide to either delay or reject the operation (Bernstein et al., 1987).
The pessimistic Two Phase Locking (2PL) strategy has become the de
facto scheduling standard in commercial DBMSs. It is, e.g., used in Oracle
Database 10g (Cyran and Lane, 2003). In 2PL, a lock must be set on a
data item before a transaction is allowed to operate on it. Two lock types,
shared and exclusive, are typically used. The idea is that multiple transactions should be allowed to concurrently read the same record, while only one
transaction should at any time be allowed to write to a record. Thus, read
operations are allowed if the transaction has a shared or exclusive lock on
the record, while write operations are allowed only if the transaction has an
exclusive lock on it.
As the name indicates, 2PL works in two phases: locks are acquired
during the first phase, and released during the second phase. This implies
that a transaction is not allowed to set new locks once it has started releasing
locks. Unless the transaction pre-declares all operations it will execute, the
scheduler does not know if a transaction is done operating on a particular
object, or if it will need more locks in the future. Locks are therefore typically
not released until the transaction terminates. This is known as Strict 2PL
(Garcia-Molina et al., 2002). The derived table creation method described
in this document assumes that 2PL is used, although it may be tailored to
suit other scheduling strategies as well.
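The core of the scheme can be sketched in a few lines of Python. This is a simplified illustration of strict 2PL (shared and exclusive locks, all locks released only at transaction termination), not code from the thesis prototype; a real scheduler would queue conflicting requests instead of rejecting them.

```python
class LockError(Exception):
    pass

class StrictTwoPhaseLocking:
    """Minimal strict 2PL lock table: shared (S) and exclusive (X) locks,
    all locks held until commit or abort (the releasing phase)."""

    def __init__(self):
        self.locks = {}  # record id -> list of (transaction id, mode)

    def _conflicts(self, rid, txn, mode):
        # S is compatible with S; any combination involving X conflicts.
        for holder, held in self.locks.get(rid, []):
            if holder != txn and (mode == "X" or held == "X"):
                return True
        return False

    def lock(self, txn, rid, mode):
        # A conflicting request would normally be delayed; here it is rejected.
        if self._conflicts(rid, txn, mode):
            raise LockError(f"{txn} must wait for {rid}")
        self.locks.setdefault(rid, []).append((txn, mode))

    def release_all(self, txn):
        # Releasing phase: run only at commit/abort, never lock again afterwards.
        for rid in list(self.locks):
            self.locks[rid] = [e for e in self.locks[rid] if e[0] != txn]

lm = StrictTwoPhaseLocking()
lm.lock("T1", "r1", "S")
lm.lock("T2", "r1", "S")   # shared locks are compatible
lm.release_all("T1")
lm.release_all("T2")
```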
Two-phase commit is a commonly used protocol for commit handling in
distributed DBMSs used to ensure that the transaction either commits on
all nodes or aborts on all nodes. The protocol works in two phases: in the
prepare phase, the transaction coordinator asks all transaction participants
if they are ready to commit. If they all agree to commit, the coordinator
completes the transaction by sending a commit message to all participants
(Gray, 1978). This is called the commit phase.
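A rough sketch of the two phases, with invented coordinator and participant objects, might look as follows; a real implementation would also log and recover the decision.

```python
def two_phase_commit(coordinator_log, participants):
    """Illustrative two-phase commit: commit only if every participant
    votes yes in the prepare phase; otherwise abort everywhere."""
    # Phase 1: prepare - ask all participants whether they can commit.
    votes = [p.prepare() for p in participants]
    decision = "commit" if all(votes) else "abort"
    coordinator_log.append(decision)  # the decision must be made durable
    # Phase 2: commit/abort - broadcast the decision to all participants.
    for p in participants:
        p.commit() if decision == "commit" else p.abort()
    return decision

class Participant:
    def __init__(self, ok=True):
        self.ok, self.state = ok, "active"
    def prepare(self):
        return self.ok
    def commit(self):
        self.state = "committed"
    def abort(self):
        self.state = "aborted"

print(two_phase_commit([], [Participant(), Participant(ok=False)]))  # -> abort
```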
2.3 Recovery
In a database system, failure may occur on three levels. Transaction failure
happens when a transaction either chooses to, or is forced to, abort. System
failure happens when the contents of volatile storage are lost or corrupted. A
power failure is a typical reason for such failures. Media failure happens when
the contents of non-volatile storage are either lost or corrupted, e.g. because
of a disk crash. In what follows, “memory” and “disk” will be used instead
of volatile and non-volatile storage, respectively.
Physical Logging
Recovery managers are constructed to correct the three types of failure. The
idea behind almost all recovery managers is that information on how to recover the database to the correct state must be stored safely at all times. This
information is typically stored in a log, which can either be physical, logical
or physiological (Haerder and Reuter, 1983; Bernstein et al., 1987). Physical
logging, or value logging, writes the before and after value of a changed object
to the log (Gray, 1978). The physical unit that is logged is typically a disk
block or a record. Assuming that records are smaller than disk blocks, the
former produces drastically higher log volumes than the latter (Haerder and
Reuter, 1983). Since the log records contain before and after values of the
changed object, logged operations are idempotent, which means that redoing
the operation multiple times yields the same result as redoing it once.
Figure 2.2: Two records in the same disk block are updated by different transactions, T1 and T2. After T1 aborts, there is no valid state identifier for the block.
Logical Logging
Logical logging, or operation logging, logs the operation that is performed
instead of the before and after value (Haerder and Reuter, 1983). This strategy produces much smaller log volumes than the physical methods (Bernstein
et al., 1987). A Log Sequence Number (LSN) is assigned to each log record,
and data items are tagged with the LSN of the latest operation that has
changed it. This is done to ensure that changes are applied only once to each
record since logically logged operations are not idempotent in general. LSNs
may be assigned to block (block state identifier, BSI) or record (record state
identifier, RSI) level (Bernstein et al., 1987). The former requires slightly less
disk space whereas the latter is better suited in replicated database systems
based on log redo since this allows for different physical organization at the
different nodes (Bratsberg et al., 1997a).
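The difference during redo can be sketched as follows. The record and log structures below are invented for the example: the physically logged change is simply reinstalled (idempotent), while the logically logged operation is guarded by the state identifier so it is applied at most once.

```python
# Hypothetical structures: each record carries the LSN of the last applied change.
record = {"rsi": 10, "value": 100}          # record state identifier (RSI)
log = [
    {"lsn": 11, "kind": "physical", "after": 150},
    {"lsn": 12, "kind": "logical", "op": "increase", "amount": 25},
]

def redo(record, log):
    for entry in log:
        if entry["kind"] == "physical":
            # Value logging is idempotent: installing the after-image twice is harmless.
            record["value"] = entry["after"]
        elif entry["lsn"] > record["rsi"]:
            # Logical operations are not idempotent: apply only if not yet reflected.
            if entry["op"] == "increase":
                record["value"] += entry["amount"]
        record["rsi"] = max(record["rsi"], entry["lsn"])
    return record

print(redo(record, log))  # {'rsi': 12, 'value': 175}
```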
Two common methods to increase the degree of concurrency are fine-granularity locking and semantically rich locks. Fine-granularity locks are
normal locks that are set on small data items, i.e. records (Weikum, 1986;
Mohan et al., 1992). Semantically rich locks allow multiple transactions
to lock the same data item provided that the operations are commutative
(Korth, 1983). Operations that commute may be performed in any order,
e.g., “increase” and “decrease”. When these methods are combined with
logical logging, Compensating Log Records (Crus, 1984) are required (Mohan
et al., 1992). The reason for this is that there is no correct LSN that can be
used as BSI after certain undo operations, as illustrated in Figure 2.2; after
the abort of transaction 1, the LSN of the block cannot be set to what it was
before the update because that would not reflect the change performed by
transaction 2. Neither can the LSN of the abort log record be used since this
invalidates the one-to-one correspondence between updates and log records
(Gray and Reuter, 1993). Thus, Compensating Log Records (CLR) (Crus,
1984) are written to the log when an undo operation is performed due to any
of the failure scenarios presented in Section 2.3. The CLR describes the undo
action that takes place (Gray, 1978). It also keeps the LSN of the log record
that was undone. E.g., if the insert of record X is undone, a CLR describing
the deletion of X is written to the log. LSNs are assigned to CLRs, thus
the state identifier of a record or disk block will increase even when undo
operations are performed.
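A hedged sketch of the same idea, reusing the invented log format from the previous example: undoing an update both restores the before-value and appends a CLR with a fresh LSN, so the state identifier keeps increasing during rollback.

```python
def undo_with_clr(log, record, txn):
    """Undo all updates of one transaction, writing a CLR for each undone
    log record (illustrative only; real CLRs also carry an undo-next pointer)."""
    next_lsn = max(e["lsn"] for e in log) + 1
    for entry in reversed([e for e in log if e.get("txn") == txn]):
        record["value"] = entry["before"]        # perform the undo action
        log.append({"lsn": next_lsn, "kind": "clr",
                    "txn": txn, "undoes": entry["lsn"]})
        record["rsi"] = next_lsn                 # state id increases, never rolls back
        next_lsn += 1
    return log, record

log = [{"lsn": 11, "txn": "T1", "before": 3, "after": 10}]
record = {"rsi": 11, "value": 10}
print(undo_with_clr(log, record, "T1"))
```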
Logical logging is considered better than physical logging because of the
reduced log volume and because the state identifiers reduces recovery work.
However, it has one major flaw: the logged operations are not action consistent3 since they are not atomic. One insert may, e.g., require that large parts
of a B-tree is restructured. This can be solved by using a two-level recovery
scheme where the low-level system provides action consistent operations to
the high-level logical logging scheme (Gray and Reuter, 1993). Shadowing is
an example low-level scheme. With this method, blocks are copied before a
change is applied. The method is complex and requires locks to be set on
blocks since this is the granularity of the copies (Gray and Reuter, 1993).
Physiological Logging
Physiological logging (Mohan et al., 1992), also called physical-to-a-page
logical-within-a-page, is a compromise between physical and logical logging.
It uses logical logging to describe operations on the physical objects, i.e. blocks.
Instead of the shadowing strategy, non-atomic operations are executed by mini-transactions. The mini-transactions consist of atomic operations, each of
which are physiologically logged. Thus, the log records are small, while the
problems of logical logging are avoided (Gray and Reuter, 1993).
(No-)Steal and (No-)Force Cache Managers
The log may provide information to undo or redo an operation. Which kind
of information is required for recovery to work depends heavily on the strategy of another DBMS component, the cache manager, which is responsible
for copying data items between memory and disk. Two parameters are of
particular relevance to the recovery manager. The first determines whether
or not updates from uncommitted transactions may be written to disk. If
uncommitted updates are allowed on disk, the cache manager uses a steal
strategy; otherwise a no-steal strategy is used (Gray, 1978). Since memory is
almost always a limited resource, stealing gives the cache manager valuable
freedom in choosing which data items should be moved out. The problem is
that if a failure occurs, data items on disk may have been changed by transactions that have not committed. The atomicity property requires that the
effects of these transactions should be removed. Hence, if stealing is allowed,
undo information of uncommitted writes must be forced to the log before the
updated records are written to disk.
3 This means that one logical operation may involve multiple physical operations. Hence, a database system may crash when only parts of a logically logged operation have been performed (Gray and Reuter, 1993).
The second parameter determines if data items updated by a transaction must be forced to disk or not (i.e. no-force) before the transaction is
committed (Gray, 1978). If force is used, the disk must be accessed during
critical parts of transaction execution. This may lead to inefficient cache
management. If no-force is used, redo information of all committed changes
must be written to the log, and the log must then be forced to disk. This is
known as Force Log at Commit (Mohan et al., 1992).
It is common to use a steal/no-force cache manager since this provides
the maximum freedom and highest performance. This is also the case for a
well-known recovery strategy, ARIES (Mohan et al., 1992), which is briefly
described in the next section.
The ARIES Recovery Strategy
ARIES (Algorithm for Recovery and Isolation Exploiting Semantics) (Mohan et al., 1992) is a recovery method for the steal/no-force cache manager
strategy described above. ARIES uses 2PL for concurrency. It is in common
use in many commercial DBMSs, e.g. SQL Server 2005 (Microsoft TechNet,
2006) and IBM DB2 version 9 (IBM Information Center, 2006). The principles are also used in the derived table creation method, which is the topic of
this thesis.
ARIES uses the Write-Ahead Logging (WAL) protocol (Gray, 1978),
which requires that a log record describing a change to a data item is written
to disk before the change itself is written. One sequential log, containing
both undo and redo log records, is used. A unique, ascending Log Sequence
Number (LSN) is assigned to each record in this log. The LSN is also used
to tag blocks so that a disk block is known to reflect a logged change if and
only if the LSN of the disk block is equal to or greater than that of a log
record. Log records are initially added to the volatile part of the log file, and
are forced to disk either when a commit request is processed or when the
cache manager writes changed data items to disk.
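The rule can be illustrated with the following sketch (invented buffer and log objects, not the prototype's design): a dirty block may only be written once the log is forced at least up to that block's LSN, and the log is forced again when a transaction commits.

```python
class Log:
    def __init__(self):
        self.records, self.flushed_lsn = [], 0
    def append(self, rec):
        self.records.append(rec)
        return len(self.records)                # the record's LSN
    def force(self, up_to_lsn):
        self.flushed_lsn = max(self.flushed_lsn, up_to_lsn)

class BufferManager:
    """Steal/no-force cache manager obeying write-ahead logging."""
    def __init__(self, log):
        self.log, self.disk = log, {}
    def write_block(self, block_id, block):
        # WAL: the log must cover the block's latest change before the write.
        self.log.force(block["lsn"])
        self.disk[block_id] = dict(block)

log = Log()
lsn = log.append({"txn": "T1", "block": 27, "redo": "R1=10", "undo": "R1=3"})
buf = BufferManager(log)
buf.write_block(27, {"lsn": lsn, "R1": 10})     # steal: T1 has not committed yet
log.force(lsn)                                  # force log at commit (no-force for data)
```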
The ARIES protocol can be used with both logical and physiological
logging. It supports fine-granularity locks and semantically rich locks (Mohan
et al., 1992).
2.4 Record Identification Policy
The DBMS needs a way to uniquely identify records so that transactional
operations and recovery work are applied to the correct record. Each record
is therefore assigned a Record Identifier (RID). The mapping from RID to
the physical record is called the access path. There are four identification
techniques (Gray and Reuter, 1993): Relative Byte Address, Tuple Identifier,
Database Key, and Primary Key.
Physical Identification Policies
Relative Byte Addresses (RBA) consist of a block address and an offset, i.e.
the byte number within that block. RBA is fast since it points directly to the
correct physical address. Physical location is not very stable, however. E.g.,
an update may increase a record's size, which may change the offset or block.
An address that is as unstable as this is not well suited as a RID (Gray and
Reuter, 1993).
Tuple Identifiers (TID) consists of a block address and a logical ID within
the block. Each block has an index used to map the ID to the correct offset.
Hence, a record may be relocated within a block without changing the RID.
A pointer to the new address is used if a record is relocated to another block.
When a pointer is followed, the access path to the record becomes more
costly, however. Hence, relocated records should eventually receive a new
TID reflecting the actual location. This reorganization must be executed
online, i.e. in parallel to normal processing, and represents an overhead.
This seems to be the most common record identification technique; it is used
by, e.g., IBM DB2 v9 (IBM Information Center, 2006), SQL Server 2005
(Microsoft TechNet, 2006) and Oracle 10g (Cyran and Lane, 2003).
Logical Identification Policies
Database Keys are unique, ascending integers assigned to records by the
DBMS. A translation table maps database keys to the physical location of
the records. The database key works as an index in the array-like translation
table, and therefore requires only one block access. This mapping ensures
that a record can be relocated to any block without having to change its
RID. The extra lookup incurs an access path overhead, however.
Since all records in a relational database system are required to have a
unique primary key4, the primary keys may serve as RIDs as well. Addressing
is indirect, but in contrast to the previous method, primary keys can not
be used as an index in a translation table since primary keys do not have
monotonically increasing values. Thus, a B-tree is used to map the primary
key to the physical location of the record. The access path is approximately
as costly as database keys, but has a number of advantages. These include
that access to records is often done through primary keys, so the primary key
mapping to record must be done anyway. The uniqueness of primary keys
must also be guaranteed by the DBMS. This is efficient to do when the primary
key is used as RID (Gray and Reuter, 1993). This technique is used, e.g., in
Oracle 10g if the table is index-organized (Cyran and Lane, 2003).
When creating derived tables, the physical location of records residing
in DTs is not the same as in the source tables. The blocks are obviously
different. The location within the blocks may also be different since a relational operator is applied. Hence, the DT creation method described in this
thesis assumes that a logical record identification scheme is used, i.e. either
Database Keys or Primary Keys. Using physical identification policies, i.e.
RBA or TID, is also possible, but requires an additional address mapping
table.
4 Either supplied by the user or generated by the system (Gray and Reuter, 1993).
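To illustrate why a logical identification scheme is assumed, here is a toy Python sketch (hypothetical structures only): with the primary key as RID, copying a record into a differently organized derived table only requires updating the mapping, while a physical address such as an RBA would become stale.

```python
class PrimaryKeyRids:
    """Toy logical record identification: the primary key is the RID and a
    mapping (here a dict, in a real DBMS a B-tree) gives the physical address."""
    def __init__(self):
        self.location = {}                      # RID (primary key) -> (block, slot)
    def insert(self, key, block, slot):
        self.location[key] = (block, slot)
    def relocate(self, key, block, slot):
        # The RID itself never changes; only the mapping is updated.
        self.location[key] = (block, slot)
    def lookup(self, key):
        return self.location[key]

rids = PrimaryKeyRids()
rids.insert("emp-42", block=3, slot=7)     # physical position in the source table
rids.relocate("emp-42", block=95, slot=1)  # new position after copying into a DT
print(rids.lookup("emp-42"))               # (95, 1)
```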
Chapter 3
A Survey of Technologies Related to Non-Blocking Derived Table Creation
This chapter describes the state of the art in non-blocking creation of derived
tables (DT). The aim of the survey is to evaluate the functionality and cost
of existing methods used for this purpose. Some of the ideas presented here
will later be used in our non-blocking DT creation method. This will be
explicitly commented on.
Three related areas of research are discussed. First, a schema transformation method that can be used for some of the relational operators is described. To the author’s knowledge, this is the only research on non-blocking
transformations in relational database systems in the literature. Next, we
describe fuzzy copying, which is a method for non-blocking creation of DTs,
but without the ability to apply relational operators. Third, maintenance
techniques for Materialized Views (MV) are discussed. The motivation for
this is that an MV is a type of DT, and some of the research in MV maintenance is therefore applicable to our suggested DT creation method. Finally,
methods for schema transformations and DT creation available in existing
DBMSs are described.
3.1 Ronström's Schema Transformations
Ronström (Ronström, 2000) presents a non-blocking method that uses both a
reorganizer and triggers within users’ transactions to perform schema transformations, called schema changes by Ronström (Ronström, 1998). It is
argued that there are three dimensions to schema transformations. These
are soft vs. hard schema changes, simple vs. complex schema changes and
simple vs. complex conversion functions (Ronström, 1998). A summary can
be found in Table 3.1.
The soft vs. hard schema change dimension determines whether or not
new transactions are allowed to use the new schema before all transactions
on the old schema have terminated. Thus, with soft schema changes, transactions that were started before the new schema was created continue processing on the old schema while new transactions start using the transformed
one (Ronström, 1998). With hard schema changes, new transactions are not
allowed to start processing on the affected parts of the transformed schema
until all transactions on the old schema have terminated.
Soft schema changes are desirable since with this strategy, new transactions are not blocked while the old transactions finish processing. In some
cases, soft schema changes can not be used, however. This happens whenever the new schema does not contain enough information to trigger updates
back to the old schema, i.e. when a mapping function from the transformed
attributes to the old attributes does not exist (Ronström, 1998).
The second dimension to schema changes divides transformations into
simple and complex schema changes. Simple schema changes are short lived,
and typically involve changes to the schema description only (Ronström,
1998). Complex schema changes involve many records and take considerable
time. With this method, complex schema changes should not be executed
as one single blocking transaction due to their long execution time (Ronström, 2000). Instead, complex schema changes are organized using SAGAs1
(Garcia-Molina and Salem, 1987).
The third dimension to schema changes is that of simple vs. complex conversion functions. In simple conversions, all the information needed to apply
an operation in the transformed schema is found in the operated-on record in
the old schema. Complex conversions, on the other hand, need information
from other records before the operation can be applied. This information
may be found in other tables (Ronström, 2000). Complex conversions can
only be performed by complex schema changes (Ronström, 2000).
The following sections describe how transformations are performed in
Ronström’s method. The description divides transformations into simple
and complex changes, i.e. along the second dimension. Even though not
all of the complex schema changes actually create DTs, all changes that can
be performed by the method are presented for readability. A thorough cost
analysis of schema transformation operators is presented in Section 3.1.3.
1
SAGA is a method to organize long running transactions (Garcia-Molina and Salem,
1987)
Schema Change
  Soft      New transactions are allowed to start accessing the new tables
            while the old transactions are accessing the old tables.
  Hard      Transactions that try to access the new tables are blocked until
            all transactions accessing the old tables have completed.
  Simple    Short lived, typically only changes the schema description,
            executed as one transaction.
  Complex   Long lived, involves many records, executed using triggers and
            SAGA transactions.
Conversion Function
  Simple    A record can be added to the new schema by reading only the
            operated-on record in the old schema.
  Complex   Adding a record to the new schema may require information
            from multiple records in the old schema. Always executed as a
            complex schema change.

Table 3.1: The three dimensions of Ronström's schema transformations.
3.1.1 Simple Schema Changes
Simple schema changes only change the schema description of the database.
The changes are organized in a way similar to the two-phase commit protocol,
described in Section 2.2. First, the system coordinator sends the new schema
description to all nodes. If the transformation is hard, each node will wait for
ongoing transactions to finish processing. The nodes then lock the involved
parts of the schema, update the schema description and log the change before
acknowledging the change request. When all nodes have acknowledged the
change, the coordinator sends commit to all nodes, including a new schema
version number. New transactions will from now on use the transformed
schema.
Examples of simple schema changes include adding and dropping a table,
adding an attribute with a default value, and dropping an attribute or index.
None of these operations involve creation of derived tables.
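To make the category concrete, the following standard SQL statements are typical simple schema changes. The table, attribute and index names are illustrative only and are not taken from Ronström's work:

    CREATE TABLE Office (Roomnumber INT PRIMARY KEY, Telephone VARCHAR(20));
    ALTER TABLE Employee ADD COLUMN Bonus INT DEFAULT 0;  -- add an attribute with a default value
    ALTER TABLE Employee DROP COLUMN Bonus;               -- drop an attribute
    DROP INDEX emp_name_idx ON Employee;                  -- drop an index (MySQL syntax)
    DROP TABLE Office;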
3.1.2 Complex Schema Changes
Schema changes involving many records are considered complex in Ronström’s method. This includes adding attributes, with values derived from
other attributes, to a table. It also includes adding a secondary index. Additionally, both horizontal and vertical merge and split of tables can be performed (Ronström, 1998). In terms of relational operators, horizontal merge
corresponds to the union operator, while vertical merge corresponds to the
left outer join operator. The split methods are inverses of the merge methods.
All complex schema changes go through three phases. The schema is
first changed by adding the necessary tables, attributes, triggers, indices and
constraints (Ronström, 2000). Second, the involved tables are operated on
by reading and performing necessary operations one record at a time. The
required operations depend on the transformation being performed. Involved
tables are left unlocked for the entire transformation, whereas the records are
locked temporarily. To ensure that the transformation does not lock records
for long periods of time, only one record is operated on per transaction.
All these transactions, each operating on one record, are organized by using
SAGAs. While transactions read records in the source tables and perform
the operations necessary for the transformation, triggers ensure that insert,
delete and update operations in the old schema are performed in the new
schema as well (Ronström, 1998).
The third phase is started once the SAGA organized transactions have
completed. If the schema change is soft, new user transactions start using
the new schema immediately while active transactions are allowed to finish execution on the old schema. Since both schemas are in use, triggers
have to forward operations both from the old to the new schema and vice
versa. If the schema change is hard, transactions are not allowed to use the
new schema until all transactions on the old schema have completed. When
all transactions that were using the old schema have terminated, obsolete
triggers, tables, attributes, indices and constraints are removed (Ronström,
2000).
In what follows, all complex schema changes that can be performed by
Ronström’s method are described in detail. Ordered by increasing complexity, these are horizontal merge and split, and vertical merge and split
transformations.
Horizontal Merge Schema Change
In Ronström’s schema transformation framework, horizontal merge corresponds to the UNION relational operator without duplicate removal. The
transformation is performed by creating a new table in which records from
both source tables are inserted. Hence, this is a derived table.
As illustrated in Figure 3.1, records from both source tables may have
identical primary keys. This is a problem because two records with the same
primary key are not allowed to coexist in the new table. This may be solved
by using a non-derived primary key, e.g. an additional attribute with an autoincremented number, in the new table. Alternatively, the primary key of the
new table may be a combination of the primary key from the old schema
and an attribute identifying which table the record belonged to (Ronström,
2000).

Figure 3.1: Horizontal Merge. Two tables, "Vinyl" and "CD", are merged into
one new table, "Record". The primary key of both tables is <artist, album>.
The new table includes an attribute "FromTable" so that identical primary
key values from the two source tables may coexist.
The method starts by creating the new table. Foreign keys are then
added to both the old tables and to the new table. Since duplicates are not
removed, there is a one-to-one relationship between records in the old and
the new schema. Thus, triggers in both schemas will only have to operate
on one record in the other schema.
Update and delete operations in one of the source tables trigger the same
operation on the record referred to in the new table. For soft transformation,
updates and deletes in the new table have similar triggers. Insert operations
in a source table simply trigger an equal insert into the new table. Inserts
into the new table should trigger inserts into one of the old tables as well. If
the new table contains an attribute that identifies which old table it should
belong to, this is straightforward. If this is not the case, e.g. because a non-derived key is used in the new table, the transformation cannot be performed
softly.
In the second step, records from the old tables are read and inserted into
the new table one record at a time. When all records have been copied, new
transactions are given access to the new schema. The old tables are deleted
once all old transactions have finished processing.
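As an illustration of the trigger-based forwarding, the insert trigger on one of the source tables in Figure 3.1 could be sketched as follows in MySQL-style SQL. This is only a sketch of the idea, not Ronström's actual implementation; the value 'Vin' is invented, and the foreign-key bookkeeping is omitted:

    CREATE TRIGGER vinyl_fwd_insert AFTER INSERT ON Vinyl
    FOR EACH ROW
      -- forward the insert to the new table, tagging which source table it came from
      INSERT INTO Record (FromTable, Artist, Album)
      VALUES ('Vin', NEW.Artist, NEW.Album);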
Figure 3.2: Horizontal Split. One table, "Employee", is split into two tables,
"HighSalary" and "LowSalary", based on salary.
Horizontal Split Schema Change
Horizontal split is the inverse of horizontal merge; it splits one table into
two or more tables by copying records to the new tables depending on a
condition. An example transformation is that of splitting “Employee” into
“High Salary Employee” and “Low Salary Employee” based on conditions
like “salary >= $40.000” and “salary < $40.000”. This transformation is
illustrated in Figure 3.2. Only horizontal split transformations where all
source table records match the condition of one and only one new table are
described by Ronström (Ronström, 2000). Because of this, records in the
old table refer to exactly one record in the new schema, thus simplifying the
transformation.
The new tables are first added to the schema, thus this method creates
derived tables. Foreign keys, initially set to null, are then added both to the
old and new tables. Once a record has been copied to the new schema, the
foreign keys in both schemas are updated to point to the record in the other
schema.
The transformation can easily be made soft by adding triggers to both
the old and the new tables. Because of the one-to-one relationship between
records in the old and new schema, these triggers are straightforward: deletes
and updates perform the same operation on the record referred to by the
foreign key. Insert operations into the old table trigger an insert into the
new table that has a matching condition. Ronström does not discuss how to
handle an insert into a new table if the old table already contains a record
with the same primary key. In all other cases, inserts into the new table
simply result in an insert into the old table as well.
When the triggers have been added, the transformation is executed as
described in the general method by copying the data in the old table one
record at a time.
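For the split in Figure 3.2, the insert trigger on the old table could be sketched as below (MySQL-style syntax). The column names and the salary limit are taken from the example; again, this is only a sketch and the foreign-key maintenance is left out:

    DELIMITER //
    CREATE TRIGGER employee_fwd_insert AFTER INSERT ON Employee
    FOR EACH ROW
    BEGIN
      -- route the new record to the table whose condition it matches
      IF NEW.Salary >= 40000 THEN
        INSERT INTO HighSalary (FName, SName, Salary)
        VALUES (NEW.FName, NEW.SName, NEW.Salary);
      ELSE
        INSERT INTO LowSalary (FName, SName, Salary)
        VALUES (NEW.FName, NEW.SName, NEW.Salary);
      END IF;
    END//
    DELIMITER ;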
Vertical Merge Schema Changes
The vertical merge schema change uses the left outer join relational operator. Since records in the right table of the join that have no match in the left table are not
included, this transformation is not lossless. The method requires that the
tables have the same primary key, or that one table has a foreign key to the
other table (Ronström, 2000). This is illustrated in Figures 3.3(a) and 3.3(b),
respectively. These requirements imply that the method cannot perform a
join of many-to-many relations, nor a full outer join.
Since the join is performed by adding attributes to one of the existing
tables, a DT is not created. Hence, the method cannot be used for other
purposes than schema transformations.
The transformation starts by adding the attributes belonging to the joined
record to the left table of the join. This table is called the left table in the
old schema and the merged table in the new schema. The left table is represented by “Person” in both Figures 3.3(a) and 3.3(b). A foreign key to the
right table of the join, called the originating table, is also added if it does not
already exist. The originating table is represented by “Salary” and “PAddress” in Figures 3.3(a) and 3.3(b), respectively. In addition, an attribute
indicating whether the record has already been transformed is added. During
the transformation, transactions that operate on the old schema do not see
the attributes added to the left table.
Triggers are then added to the originating table. Update and delete operations trigger update operations of all records referring to it in the merged
table. The trigger on deletes also removes the foreign key of referring records.
Insert operations trigger updates of all records in the merged table matching the join attributes. All these triggers also set the has-been-transformed
attribute to true (Ronström, 2000).
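For the functional-dependency example in Figure 3.3(b), the update trigger on the originating table could be sketched as follows (MySQL-style syntax). The has-been-transformed attribute is here called transformed, a name chosen for the sketch, and the case where the zip code itself is changed is ignored:

    CREATE TRIGGER paddress_fwd_update AFTER UPDATE ON PAddress
    FOR EACH ROW
      -- forward the change to every merged record referring to this zip code
      UPDATE Person
      SET City = NEW.City, transformed = TRUE
      WHERE ZipCode = OLD.ZipCode;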
Since old transactions are free to operate on the left table, a number
of triggers must be added there as well. The trigger of insert operations
reads the matching record in the originating table so that the value of the
added attributes can be updated accordingly. Update operations that change
the foreign key, trigger a read of the new join match and update the added
attributes to keep it consistent. In addition, all modifying operations2 have
to update the foreign key reference in the original table.
2
Inserts, updates and deletes are modifying operations.

Figure 3.3: Examples of Vertical Merge Schema Change. (a) Vertical merge
with the same primary key: "Firstname, Surname" is the primary key in all
tables, and the merged table is created by adding the salary attribute to
"Person". (b) Vertical merge of a functional dependency: Person.ZipCode is
a foreign key to PAddress.ZipCode, and the merged table is created by adding
city to "Person".
Once all necessary attributes and triggers are in place, the records in the
left table are processed one at a time. A transaction reads a record and uses
the foreign key to find the record referred to in the originating table. The
attribute values of that record are then written to the added attributes in
the merged table, and the has-been-transformed attribute is set (Ronström,
2000).
When all records in the merged table have been processed, it contains a
left outer join of the two tables. A hard schema change is simply achieved
by letting old transactions complete on the old schema. Triggers, attributes
and foreign keys no longer in use are then dropped, before new transactions
are allowed to use the new schema (Ronström, 2000).
Performing a soft vertical merge transformation implies adding triggers
to the merged table as well. These triggers make sure that write operations
executed by transactions operating on the new schema are also visible for
transactions using the old schema. Thus, updates on the added attributes
of records in the merged table must trigger updates to the referred record in
the originating table.
Insert operations into the merged table trigger a scan of the originating
table to see if it already contains a matching record. If so, only the foreign
key of the inserted record is updated. Otherwise, a record is also inserted
into the originating table.
A problem arises if a record is inserted into the merged table and the
trigger scanning the originating table finds that an inconsistent record already
exists. The same case is encountered if the added attributes of a merged
table record are updated while multiple records refer to the same originating
record. Since two records with the same primary key cannot exist in the
originating table at the same time, a simple insert of a new record is not
possible. Furthermore, it would not be correct to just update the originating
record since the other records referring to it would disagree on its attribute
values. This problem is not addressed in the method, but there are at least
two possible solutions: the first possibility is to abort the transaction trying
to insert or update the record in the merged table. This would be a serious
restriction to operations in the merged table. The second is to update the
record in the originating table, which in turn triggers updates on all records
in the merged table that refer to it. This is illustrated in Example 3.1.1:
Example 3.1.1 (Triggered Updates During Soft Vertical Merge)
Consider the vertical merge illustrated in Figure 3.4. During a soft schema
change, an attribute added to the merged table, “City”, is updated. This
triggers an update to the originating table, “Postal Address”, again triggering
"update merged set city = London where..."
User update
Triggered update
F.Name
S.Name
Address
Zip
City
Zip
City
Hanna
Erik
Markus
Peter
Sofie
Valiante
Olsen
Oaks
Pine
Clark
Moholt 3
Torvbk 6
Mollenb.7
Oslovn 34
Berg 1
7020
5121
7020
0340
7020
Tr.heim
Bergen
Tr.heim
Oslo
Tr.heim
0340
5121
7020
9010
Oslo
Bergen
Tr.heim
Tromsø
Figure 3.4: Example 3.1.1 - An update to a record in the merged table triggers
updates in both the originating and the merged table.
updates on records in the merged table that refers to it.
This second scenario would probably be preferred in most cases. Note,
however, that transactions operating on the new schema would then not
behave as they will once the transformation has completed.
Vertical Split Schema Change
The vertical split transformation uses the projection relational operator. The
transformation is the inverse of vertical merge. Like vertical merge, vertical
split uses triggers and transactions that operate on one record at a time to
perform the transformation.
The transformation starts by creating a table containing a subset of attributes from the original table. If, e.g., a table “Person” is vertically split
into “Person” and “PAddress”, “PAddress” is the new table. This is illustrated in Figure 3.5. The table that is split is called the original table in the
old schema and the split table in the transformed schema (Ronström, 2000).
Note that the “new” table is the only derived table that is created in this
transformation.
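The intended end state of the split in Figure 3.5 can be expressed with ordinary SQL, as in the sketch below. The data types are chosen arbitrarily, and the insert into select statement only describes the target content; Ronström's method instead populates the new table one record per transaction while triggers keep the two schemas consistent:

    CREATE TABLE PAddress (
      ZipCode CHAR(4) PRIMARY KEY,
      City    VARCHAR(40)
    );
    INSERT INTO PAddress (ZipCode, City)
    SELECT DISTINCT ZipCode, City FROM Person;

Note that if the source data contains an anomaly like the one in Figure 3.7, even this single statement fails with a primary key violation, which is essentially the consistency problem discussed below.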
When the new table has been created, foreign keys are added to both the
original and the new tables. Records in the original table use these to refer
to the records in the new table and vice versa. As illustrated in Figure 3.6,
all foreign keys are initially NULL (Ronström, 2000).
A number of triggers are needed on the original table to ensure that
operations are executed on the new table as well (Ronström, 2000).
Inserts into the original table trigger a scan of the new table. If a record
with matching attribute values is found in the new table, the foreign key of
the inserted record is set to point to it. In addition, the foreign key of the
record in the new table is updated to also point to the newly inserted record.
If no matching record is found in the new table, a record is inserted before
updating the foreign keys.

Figure 3.5: Vertical Split over a functional dependency. A table "Person" is
split into "Person" and "PAddress". Only the "new" table, PAddress, is a
derived table.
A delete operation triggers a delete of the record referred to in the new
table if this is the only original record contributing to it. Hence, the existence
of other referring records has to be checked before the record in the new table
is deleted.
A trigger is also needed on updates that affect records in the new table. If
the updated record in the original table is the only record referring to it, the
new record is simply updated. Otherwise, a record with the updated values
is inserted into the new table before updating the foreign keys.
Assuming that the schema transformation is soft, triggers must also be
added to the new table (Ronström, 1998; Ronström, 2000). Delete operations
trigger the deletion of split attribute values in all referring records. Update
operations trigger updates to all referring records in the split table. Finally,
insert operations update all records in the original table with matching join
attribute values. Thus, the insert trigger has to scan all original records to
find these join matches.
Inconsistencies. Assuming that inconsistencies never occur, the described
vertical split method works well. Unfortunately, this assumption does not
hold. Consider Figure 3.7, which illustrates a typical inconsistency: two
records with zip code “7020” have the city value “Tr.heim”, while a third
record has a mistyped value "Tr.hemi". Inconsistencies like this are called
anomalies (Garcia-Molina et al., 2002).

Figure 3.6: Illustration of the vertical split transformation. (a) The first step
of the vertical split of a table "Person" into "Person" and "PAddress": the
new table has been created, and foreign keys, initially NULL, have been added
to both the original and the new table. The City attribute is only part of the
original schema, not the transformed one. (b) The transformation process
has started to copy records from the original table to the new table; only the
three topmost records have been read so far.
Anomalies are likely to appear in vertical split transformations since the
old schema does not guarantee consistency. E.g., nothing prevents the city
attribute values "Tr.hemi" and "Tr.heim" from appearing at the same time, as illustrated in Figure 3.7. In many cases, vertical split changes over a functional
dependency may be performed just to avoid such anomalies by decomposing
a table into Boyce-Codd Normal Form (BCNF)3 .
This problem is not addressed in the method description, but there are at
least a few possible solutions: one possibility is to update the record in the
new table with values from the latest read. This would in turn result in an
update of all records in the original table that are referred to by this record.
3
A schema in BCNF is guaranteed to not have anomalies (Garcia-Molina et al., 2002).
Figure 3.7: Vertical Split with an inconsistency. The figure illustrates the
same scenario as Figure 3.6, but with an inconsistency between records with
zip code 7020.
In Figure 3.7, both Hanna Valiante and Markus Oaks would in this case get
an incorrect but consistent city name.
Another solution is to count the number of records that agree on each
value, in this case two vs. one, and let the majority decide. With this strategy,
the record in the original table may have to be updated. In the figure, the
city value of Sofie Clark would be updated to Tr.heim. This strategy is likely
to produce correct results in most scenarios, but there are no guarantees. It
may, in some cases, result in incorrect attribute values for records that were
correct in the first place.
A third possibility is to add a non-derived primary key, e.g. an autoincremented number, to the new table. The new schema could, e.g., look like
this:
Person (f.name,s.name, address, postalID)
PostalAddress (postalID, zip, city)
This would, however, not remove anomalies since the implicit functional
dependency between zip code and city is not resolved. Thus, in cases where
the vertical split is used to decompose a table into 3NF or BCNF, this solution
would not meet the intention of the operation.
The described problem is similar to that described for soft vertical merge
transformations in the previous section, where triggers of inserts into the
merged table find a record in the originating table that has the same key but
differs on other attributes. Such inconsistencies are, however, much more
likely to occur in vertical split transformations since they may be introduced
not only during the time interval when transactions operate on both schemas,
but also at any point in time before the transformation started. Furthermore,
the problem cannot be solved by aborting the transaction that introduced
the anomaly since it may already have committed.

Legend
  r_x / w_x          Read/write a record in table x
  r_x,fk / w_x,fk    Read/write the foreign key of a record in table x
  |x|                Cardinality of table x
  RF                 Reference Factor - the average number of references a record
                     in one table has to records in another table
  vm                 Vertical merge
  vs                 Vertical split
  left               Left table (see vertical merge)
  ogt                Originating table (see vertical merge)
  m                  Merged table (see vertical merge)
  ogl                Original table (see vertical split)
  split              Split table (see vertical split)
  new                New table (see vertical split)

Table 3.2: Legend for Tables 3.3 and 3.4.
3.1.3 Cost Analysis of Ronström's Method
Since the triggers that keep the old and the new schemas consistent are executed within the transaction that triggered them, the cost added to normal
operations is highly relevant: it will increase response time of each operation
in addition to the overall workload of the database system. Since cost estimates or test results have not been published for the method, an analysis is
provided in this section.
The reference factor (RF) is defined as the average number of references
a record in one table has to records in another table. Thus,
RF_vm = |left| / |originating|          (3.1)
RF_vs = |split| / |new|                 (3.2)
for vertical merge and split, respectively. RF_vm and RF_vs will be referred to
as RF unless otherwise noted.
Tables 3.3 and 3.4 summarize the added trigger cost for normal operations on the involved tables during vertical merge and split schema transformations, respectively. The added trigger costs for the horizontal methods
are always the same as the cost of the original operation, and are therefore
not shown in a separate table.

Vertical Merge
Operation on          Added cost
Old Schema:
  Left table:
    Insert      r_ogt + w_m + w_ogt,fk (assuming an index on the join attribute)
    Update      r_ogt + w_m + 2 × w_ogt,fk if join att. is updated;
                0 otherwise
    Delete      0
  Originating table:
    Insert (1)  RF × w_m if index on join att. in left table;
                |left| × r_m + RF × w_m otherwise
    Update      RF × w_m
    Delete      RF × (r_m,fk + w_m)
New Schema:
  Merged table:
    Insert      r_ogt + w_ogt + (RF − 1) × w_m if a non-equal record exists;
                r_ogt + w_ogt,fk if an equal record exists;
                w_ogt if the record does not exist
    Update      r_ogt,fk + w_ogt,fk + w_ogt if non-derived prim-key, >1 reference;
                r_ogt,fk + w_ogt if non-derived prim-key, one reference;
                r_ogt,fk + w_ogt + (RF − 1) × w_m if join att. is prim-key;
                0 if no atts. in the originating table are updated
    Delete      0

Table 3.3: Added Cost Incurred by Vertical Merge Schema Transformation
Methods. (1) Note that RF is often 0 for inserts into the originating table.

A few examples are provided to clarify the incurred costs:
Example 3.1.2 (Cost of Consistent Insert During Vertical Split)
A database contains a number of tables. One of these is “PersonInfo”, containing information on all Norwegian citizens, including zip code and city.
The table is then vertically split into “Person” and “PostalAddress”. There
are 4.6 million people and 4600 zip codes registered in the database. Thus,
RF = 4.6 million / 4600 = 1000
During the transformation, a user inserts a person into the original table (i.e.
in the old schema). This new person has a zip code and city that is consistent
with all other persons in the "PersonInfo" table. The cost is:
C_ex1 = C_normal + C_added
      = w_ogl + r_new + w_new,fk
For readability, assume that a read and a write operation have the same
cost in terms of IO and CPU, and that a write of a foreign key has the
same cost as other write operations. With these assumptions, the simple
insert has to perform three times as many operations as it would without the
transformation. Furthermore, it has to read lock a record in the new table.

Vertical Split
Operation on          Added cost
Old Schema:
  Original table:
    Insert      r_new + w_new + (RF − 1) × w_ogl if inconsistent;
                r_new + w_new,fk if consistent;
                w_new if non-existing
    Update      r_new,fk + w_new,fk + w_new if non-derived prim-key, >1 reference;
                r_new,fk + w_new if non-derived prim-key, one reference;
                r_new,fk + w_new + (RF − 1) × w_ogl if join att. is prim-key;
                0 if no atts. in the new table are updated
    Delete      r_new,fk + w_new if only one referring record;
                r_new,fk + w_new,fk otherwise
New Schema:
  Split table:
    Insert      r_new + w_ogl,fk if no join match in new table;
                r_new + w_ogl + w_new,fk if join match in new table
    Update      r_new + w_ogl + 2 × w_new,fk if join attribute is updated;
                0 otherwise
    Delete      r_new,fk + w_new,fk
  New table:
    Insert (1)  RF × w_ogl if index on join att. in original table;
                |original| × r_ogl + RF × w_ogl otherwise
    Update      RF × w_ogl
    Delete      RF × (r_ogl,fk + w_ogl)

Table 3.4: Added Cost Incurred by Vertical Split Schema Transformation
Methods. (1) Note that RF is often 0 for inserts into the new table.
Example 3.1.3 (Cost of Update During Vertical Split)
During the transformation described in Example 3.1.2, another user updates
the city of a person in the original table. Since this transformation performs
a split over a functional dependency, the primary key of the PostalAddress
table is zip code. Thus, when the update triggers an update of the record in
the new table, that update triggers an update of all original records with the
same zip code:
C_ex2 = C_normal + C_added
      = w_ogl + r_new,fk + w_new + (RF − 1) × w_ogl
      = r_new,fk + w_new + 1000 × w_ogl
Hence, the update results in 1001 more operations and 1000 more locks than
would be the case without the transformation.
Example 3.1.4 (Cost of Update in New Schema)
The transformation described in Example 3.1.2 is made soft, and a third user
therefore gets access to the new schema. The user updates a postal address:
C_ex3 = C_normal + C_added
      = w_new + RF × w_ogl
      = w_new + 1000 × w_ogl
The update results in 1000 more locks and operations than it would without
the transformation.
As can be seen from Tables 3.3 and 3.4 and the examples, the cost of
operations during a schema transformation varies enormously. In almost all
cases, however, the cost is at least two to three times higher in terms of
operations and locks than it would be without the transformation.
3.2 Fuzzy Table Copying
Fuzzy copying is a technique used to make a copy of a table in parallel
with other operations, including updates. There are two variants: the first
method is block oriented, and works like a fuzzy checkpoint (Hagmann, 1986)
that is made sharp, i.e. consistent (Gray and Reuter, 1993). The second
method is record oriented, and is better suited when the copy is installed in
an environment that is not completely identical to the source environment.
An example is copying a table from one node to another in a distributed
system. The block size may differ between these nodes, and the physical
address of the records in the copy would therefore differ from those in the
source.
Both fuzzy copy methods work in two steps: in the first step, the source
table is read without using any table or record locks, which results in an
inconsistent copy. The method gets its "fuzzy" name because this initial
copy is generally not consistent with any state the source table has had at any point in
time. In the second step, the copy is made consistent by applying log records
that have been generated by concurrent operations during the first step.
In the block oriented fuzzy copy (BoFC) method (Gray and Reuter, 1993;
Bernstein et al., 1987), the source table is read one block at a time. Locks are
ignored, but block latches4 (Gray, 1978) are used to ensure that the blocks
are action consistent. The log records that have been generated while reading
the blocks are then redone to the fuzzy copy. Since the method copies blocks,
the addresses of records in the copy are the same as in the source. Hence,
logical and physiological logging (see Section 2.3) can both be used. Also,
all four record identification schemes, described in Section 2.4, will work.
The record oriented fuzzy copy (RoFC) method (Bratsberg et al., 1997a)
copies records instead of blocks in the first step. As for BoFC, only latches are
used during this process. The copied records may then be inserted anywhere;
there are no restrictions on the physical address of the records at the new
location. Once the inconsistent copy has been installed, the log records
are applied to it. Since the physical addresses of records may be different
at the new location, only logical logging can be used. Furthermore, state
identifiers must be assigned to records instead of blocks, and the records must
be identified by a non-physical record ID, as described in Section 2.4. Due
to its location independent nature, this strategy is suitable for declustering
in distributed database systems.
4
Latches, also called semaphores, are locks held for a very short time (Elmasri and
Navathe, 2004). They are typically used to ensure that only one operation is applied to a
disk block at a time (Gray and Reuter, 1993).
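The first, inconsistent read can be approximated in SQL by copying the table under the read uncommitted isolation level, as in the sketch below (MySQL-style syntax, illustrative table names). The sketch only conveys the idea of reading without record locks; in the methods above, both the initial read and the subsequent log redo are performed internally by the DBMS:

    SET SESSION TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
    CREATE TABLE EmployeeCopy LIKE Employee;
    INSERT INTO EmployeeCopy SELECT * FROM Employee;
    -- The copy is fuzzy: concurrent updates may or may not be included.
    -- It is made consistent afterwards by redoing the log records that were
    -- generated while the copy was being read.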
3.3 Materialized View Maintenance
Materialized views (MVs) are used in database systems, e.g., to speed up
queries by precomputing and storing query results, and in data warehouses.
The work on MVs started with Snapshots (Adiba and Lindsay, 1980). These
were able to answer queries on historical data and speed up the query processing. As the benefits of Snapshots were appreciated, the concept was
extended to be able to answer queries on current data to lower query cost.
This extension to Snapshots is called Materialized Views (MVs).
During the last two decades, MVs have evolved to become a very beneficial addition to DBMSs. Benefits include less work when processing queries
and less network communication in distributed queries. The purpose of this
section is to address the problems with MV maintenance and to show the
solutions proposed in the literature.
As is evident from the following sections, methods to keep MVs up to date
have been researched extensively. The initial creation of MVs has, however,
been neglected.
3.3.1 Snapshots
Database Snapshots mark the beginning of what is now known as Materialized Views (MVs). They are defined by a query and are populated by
storing the query result in the Snapshot table. Once created, transactions
may query them for historical data (Adiba and Lindsay, 1980).
Snapshots can later be refreshed to reflect a newer state. This can be
done by deleting the content of the snapshot and then reevaluating the query
(Adiba and Lindsay, 1980). An alternative is to take advantage of recovery
techniques such as differential files and the recovery log to compute delta values to the old Snapshot only (Kahler and Risnes, 1987). This generally
requires less work than the first method.
An algorithm using the second strategy is presented by Lindsay et al.
(Lindsay et al., 1986). The algorithm associates a timestamp value with
every record in the source relation of the Snapshot. The timestamp is a
monotonically increasing value with which it is easy to decide whether an
update occurred before or after another update. When a Snapshot is updated, the update transaction uses the timestamp to find updates that took
place after the previous Snapshot. Only records with a higher timestamp
value need to be updated in the Snapshot.
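The essence of the timestamp-based refresh can be sketched in SQL as follows. The timestamp column and the variable holding the previous refresh time are invented for the sketch, and deletions are ignored; the published algorithm handles these cases inside the DBMS:

    -- @prev_refresh_ts holds the timestamp recorded at the previous refresh.
    REPLACE INTO EmployeeSnapshot
    SELECT * FROM Employee
    WHERE last_modified > @prev_refresh_ts;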
Figure 3.8: Illustration of Example 3.3.1: A Materialized View stores all city
names in the PostalAddress table. PostalAddress contains the records
<7020, Trondheim>, <7030, Trondheim> and <0340, Oslo>; the MV contains
<Trondheim> and <Oslo>.
3.3.2 Materialized Views
In contrast to Snapshots, MVs are typically not allowed to be out of date
when they are queried. They are divided into two main groups: immediate
and deferred update. These two groups differ only in when they are refreshed;
the former method forwards updates within the user transaction that updated
the source records while the latter leaves the MV update work to a separate
view update transaction.
An important part of MV maintenance is to keep the MVs consistent
with the source tables. To do this, updates to the source tables have to be
forwarded correctly to the view. The following example illustrates one of the
problems of keeping consistency:
Example 3.3.1 (An MV Consistency Problem)
An MV is defined as the set (i.e. no duplicates) of all cities in the PostalAddress table, as illustrated in Figure 3.8. The current state of the source
table and the MV is shown in the figure. Suppose a transaction deletes the
record <0340, Oslo> from the source table. A correct execution in the
MV is to delete the record <Oslo>. On the other hand, if the transaction
deletes the record <7020, Trondheim>, a correct execution is to not delete
<Trondheim> from the MV.
Multiple solutions have been proposed to address the consistency problem. Blakeley et al. (Blakeley et al., 1986) showed that counters could be
used to keep track of the multiplicity of records in MVs. The counter is
increased by insertion of duplicates, and decreased by deletion of duplicates.
If the counter reaches zero, the record is removed from the MV. The method
could originally only handle select-project-join (SPJ) views, but has later
been improved to handle aggregates, negations and unions (Gupta et al.,
1992; Gupta et al., 1993).
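Applied to Example 3.3.1, the counting method can be sketched with a duplicate counter in the MV and triggers such as the following (MySQL-style syntax; the table CityMV and its counter column are names chosen for the sketch):

    CREATE TABLE CityMV (
      City      VARCHAR(40) PRIMARY KEY,
      dup_count INT NOT NULL   -- number of source records deriving this MV record
    );

    DELIMITER //
    CREATE TRIGGER paddr_count_insert AFTER INSERT ON PostalAddress
    FOR EACH ROW
    BEGIN
      INSERT INTO CityMV (City, dup_count) VALUES (NEW.City, 1)
      ON DUPLICATE KEY UPDATE dup_count = dup_count + 1;
    END//
    CREATE TRIGGER paddr_count_delete AFTER DELETE ON PostalAddress
    FOR EACH ROW
    BEGIN
      UPDATE CityMV SET dup_count = dup_count - 1 WHERE City = OLD.City;
      DELETE FROM CityMV WHERE City = OLD.City AND dup_count = 0;
    END//
    DELIMITER ;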
Gupta et al. (Gupta et al., 1993) presents another algorithm called
“Delete and Rederive” (DRed). An overestimate of records that may be
deleted from the MV is first computed. The records with alternative derivations are then removed from the delete set. Finally, new records that need
to be inserted are computed. The authors recommend the DRed method
when dealing with recursive MVs, and the counting method when dealing
with non-recursive MVs.
In contrast to the methods described so far, Qian and Wiederhold (Qian
and Wiederhold, 1991) and Griffin et al. (Griffin and Libkin, 1995; Griffin
et al., 1997) use algebra as the basis for computing updates. They argue
that it is easier to prove correctness of algebra than algorithms and to derive
rules for other languages (Griffin et al., 1997). Qian et al. present algebra
propagation rules that take update operations on SPJ views as input and
produce an insert set ∆R and a delete set ∇R (Qian and Wiederhold, 1991).
The method has also been extended to handle bag semantics (Griffin and
Libkin, 1995).
All methods described so far are immediate. Immediate maintenance
has a serious drawback, however: an extra workload is incurred on all user
transaction operations that have to be propagated. In deferred MVs, the
view is maintained by a view update transaction, not the user transaction.
Accordingly, this does not incur extra update work on each user transaction,
and deferred methods should therefore be used whenever possible (Colby
et al., 1996; Kawaguchi et al., 1997). The update transaction is typically
invoked periodically or by a query to the view.
The algorithms described for immediate update cannot be used in a deferred strategy without modification. The reason for this is that the immediate methods use data from the source tables to update the MVs. When
deferred methods are used, the state of the source tables may have changed
before the maintenance starts. This is called the “state bug” (Colby et al.,
1996). Colby et al. (Colby et al., 1996) extend the algebra by Qian et al.
(Qian and Wiederhold, 1991) to overcome this problem: both a log L and
two view differential tables (∆MV and ∇MV for inserts and deletes, respectively) are used. The MV is in a state consistent with a previous state sp of
the source tables. ∆MV and ∇MV contain the operations that need to be
applied to the MV to make it consistent with a state closer to the present.
The log L is used to maintain ∆MV and ∇MV. I.e., the MV has a state
sp which is before or equal in time to an intermediate state si that can be
reached by applying the updates in ∆MV and ∇MV. The current state of
the source tables sc can again be reached from si by applying the updates in
L. By using the log, the authors compute the state si that the differential
tables are in. This is the pre-state needed to use the immediate propagation
methods without encountering the state bug. The algorithm uses two propagation transactions: one for updating the differential tables using the log,
and one for updating the MVs using the differential tables. This imposes
very little overhead on ordinary transactions as the only extra work they
have to do is to write a log record without any further computation.
Self-Maintainability
Operations that can be forwarded to derived tables without requiring more
data than the table itself and the operation are called Autonomously Computable Updates (ACUs) (Blakeley et al., 1989). Self-maintainable (Self-M)
Materialized Views are MVs where all operations are ACUs (Gupta et al.,
1996).
When operations are applied to an MV that is not Self-M, the source
tables must be queried for the missing information. Self-M is therefore a
highly desirable property in systems with fast response time requirements
(Gupta et al., 1996) and when the MV is stored on a different node than the
source tables. For Self-M MVs, only the log has to be shipped to the MV
node.
Only a very limited set of views are Self-M, and Quass et al. (Quass et al.,
1996) therefore extend the concept to also include views where auxiliary
information makes the view Self-M. The auxiliary information, typically a
table, is stored together with the MV, and is updated accordingly.
Our derived table creation method benefits from self-maintainability in
the same way as MV maintenance. Hence, in DT creation of all relational
operators where the DTs themselves are not Self-M, an auxiliary table with
the missing information will be added.
3.4 Schema Transformations and DT Creation in Existing DBMSs
Existing database systems, including IBM DB2 v9 (IBM Information Center,
2006), Microsoft SQL Server 2005 (Microsoft TechNet, 2006), MySQL 5.1
(MySQL AB, 2006) and Oracle 10g (Lorentz and Gregoire, 2003b), offer only
simple schema transformation functionality (Løland, 2003). These include
adding and removing one or more attributes of a table, renaming attributes,
and the like. Removal of an attribute can be performed by changing the
table description only, thus leaving the physical records unchanged for an
unspecified period of time.
Complex schema transformations and MV creations are performed by
blocking operations. The source tables are locked with a shared lock while
the content is read and the result of the query inserted into the DTs (Løland,
2003). Throughout this thesis, we will call this the insert into select method
due to the common SQL syntax of these operations. For example, the SQL
syntax for DT creation in MySQL is (MySQL AB, 2006):
insert into <table-name> select <select-statement>
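For instance, a materialized view holding the distinct city names of Example 3.3.1 could be created with a blocking statement along the following lines (the name CityView is chosen for the example):

    CREATE TABLE CityView (City VARCHAR(40) PRIMARY KEY);
    INSERT INTO CityView
    SELECT DISTINCT City FROM PostalAddress;
    -- PostalAddress is read-locked until the insert completes.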
3.5 Summary
In this chapter, research related to non-blocking creation of derived tables
was presented.
A method that can be used for vertical and horizontal merge and split
schema transformations has been suggested by Ronström (Ronström, 2000).
The solution involves creation of derived tables in the horizontal merge and
split cases. In addition, one of the resulting tables in vertical split is a DT.
It is therefore likely that these relational operators can be used for other
DT creation purposes than schema transformations as well. One example is
creation of materialized views, although this possibility is not discussed by
Ronström.
Although test results have not been published on the method, the cost
analysis in Section 3.1 indicates that it is likely to degrade throughput and
response time significantly for the duration of the transformation. The reason
for this is that write operations in one schema (old or new) trigger a varying
number of write operations in the other schema.
The DT creation method we suggest in Part II of this thesis extends the
ideas from the record oriented fuzzy copy (RoFC) technique described in
Section 3.2. Similar to RoFC, we make an inconsistent copy of the involved
tables and use the log to make the copied data consistent. An important
difference is, however, that we apply relational operators to the inconsistent
copies. Because of this, the log records can not be applied to the copied data
in any straightforward way.
The DT creation method developed in this thesis is also related to materialized view maintenance. In particular, we will use auxiliary tables to
achieve Self-Maintainable derived tables. The data in these auxiliary tables
will be needed whenever the DTs themselves do not contain all data required
to apply the log.
The “insert into select” method for DT creation will not be used in our
solution. We will, however, compare our DT creation method to the “insert
into select” method, and discuss under which circumstances our method is
better than the existing solution and vice versa.
Part II
Derived Table Creation
Chapter 4
The Derived Table Creation Framework
In Chapter 1, we presented the overall research question of this thesis:
How can we create derived tables and use these for schema transformation and materialized view creation purposes while incurring minimal performance degradation to transactions operating
concurrently on the involved source tables.
With the research question in mind, this part of the thesis describes our
suggested method for creating derived tables (DTs) without blocking other
transactions. Once a DT has been created, it can be used as a materialized
view (MV) or to transform the schema. The method aims at degrading
the performance of concurrent transactions as little as possible. To which
extent the method meets the performance aspects of the research question is
discussed in Part III: Implementation and Testing.
In this chapter, we suggest a framework that can be used in the general
case to create DTs in a non-blocking way. As such, this chapter presents an
abstract solution to the first part of the research problem stated above.
In Chapter 5, we identify common problems encountered when the framework is used for DT creation using relational operators. General solutions to
these problems are also suggested. Chapter 6 contains detailed descriptions
of how the framework is used to create DTs using the six relational operators.
4.1 Overview of the Framework
The non-blocking DT creation framework presented in this chapter operates
in four steps. As illustrated in Figure 4.1, these are: preparation, initial
population, log propagation and synchronization.
Figure 4.1: The four steps of DT creation: (1) preparation, (2) initial population (a non-blocking read of the source tables in the old schema, application
of the relational algebra operator, and insertion of the result into the new
schema), (3) log propagation and (4) synchronization. In the illustration, the
source tables "Employee" and "PAddress" in the old schema are combined
into the derived table "ModifiedEmp" in the new schema.
During the preparation step, necessary tables, indices, etc. are added to
the database schema. These should not be visible to users until DT creation
reaches the final step, synchronization. Once the required structures are in
place, the initial population step writes a fuzzy mark to the log and then
starts to read the involved source tables. The fuzzy mark is later used to
find the point where reading started. The relational operator used to create
the DT is then applied, and the result is inserted into the newly created
DTs. Note that no table or record locks are used, and the derived records
are therefore not necessarily consistent with the source table records1.
Log propagation is then started. It works in iterations, and each iteration
starts by writing a place-keeper mark, called a fuzzy mark, to the log. The
write operations that have been executed on source table records between the
last mark and the new one are then propagated, or forwarded, to the DTs by
applying recovery techniques. These techniques must, however, be modified
since relational operators have been applied to create the derived records. If
a considerable number of updates have been performed on the source records
during an iteration, a new iteration is started. This is repeated until there
are few operations that distinguish the source tables from the DTs.
1
They are consistent if no modifying operations have been performed on the source
records during the copying.
The fourth step, synchronization, latches the source tables while the remaining logged operations are applied to the DTs. Since log propagation
was repeated until there were only a few operations left to apply to the DTs,
these latches are held for a very short period of time. When all log records
have been forwarded, the DTs are in the same state as the source tables, and
are ready to be used.
The four steps of non-blocking DT creation are described in more detail in
the rest of this chapter. Materialized Views can use this framework without
modification. Note, however, that even though DTs can also be used to
perform schema transformations, the framework must be slightly modified
to do so. The reason for this is that different transactions are allowed to
concurrently operate in the two schema versions. These modifications to the
framework are discussed in Section 4.6.
4.2 Step 1: Preparation
DT creation starts by adding the derived tables to the database schema. This
is done by create table SQL statements. In addition to the wanted subset of
attributes from the source tables, the DTs typically have to include a record
state identifier2 , and a Record ID (RID) from each source record contributing
to the derived records. In this thesis, we assume that the RID is based on
logical addressing, but physical identification techniques can also be used
if a RID mapping table is maintained. The record and state identification
concepts were described in Section 2.3.
Depending on the relational operator used for DT creation, attributes
other than RID and LSN may also be required. An example is vertical merge,
i.e. full outer join, in which the join attributes are required to identify which
source records should be merged in the DT. If any of these required attributes
are not wanted in the DT, they must be removed after the DT creation has
completed. This can be done by a simple schema transformation, which is
available in modern DBMSs.
Constraints, both new and from the source tables, may be added to the
new tables. This should, however, be done with great care since constraint
violations may force the DT creation to abort, as illustrated in Example 4.2.1:
2
I.e., a Log Sequence Number (LSN) on record (Hvasshovd, 1999).
Example 4.2.1 (Bad Constraint)
Consider a one-to-many full outer join of the tables Employee and PostalAddress, as illustrated in Figure 4.1. A unique constraint has been defined for
the ZipCode attribute in PostalAddress. If this unique-constraint is added
to the derived table “ModifiedEmp”, the transformation will have to abort
if more than one person has the same zip code.
Any indices that are needed on the new tables to speed up the DT creation
process should also be added during this step. In particular, all attributes
that are used by DT creation to identify records should be indexed. Examples
include RIDs copied from the source tables, and join attributes in the case
of vertical merge. These indices decrease the time used to create the DTs
significantly. The source record ID3 is, e.g., often used to identify derived
records affected by a logged operation. Without an index on this attribute,
log propagation of these operations has to scan all records in all DTs to find
the correct record(s). With the index, the record(s) can be identified in one
single read operation. Which indices are required differ for each relational
operator, and are therefore described in more detail in Chapter 6. Note that
the indices created during the preparation step will be up to date at any
time, including immediately after the DT has been created.
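As a sketch, the derived table "ModifiedEmp" from Figure 4.1 could be declared as follows during the preparation step. The column names and data types are illustrative; the RID columns hold the source record IDs and lsn is the record state identifier discussed above:

    CREATE TABLE ModifiedEmp (
      emp_rid   BIGINT,        -- RID of the contributing Employee record
      paddr_rid BIGINT,        -- RID of the contributing PAddress record
      lsn       BIGINT,        -- record state identifier (LSN)
      Firstname VARCHAR(40),
      Surname   VARCHAR(40),
      Position  VARCHAR(40),
      Address   VARCHAR(60),
      ZipCode   CHAR(4),
      City      VARCHAR(40)
    );
    CREATE INDEX modemp_emp_rid   ON ModifiedEmp (emp_rid);
    CREATE INDEX modemp_paddr_rid ON ModifiedEmp (paddr_rid);
    CREATE INDEX modemp_zipcode   ON ModifiedEmp (ZipCode);  -- join attribute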
The DT creation for some of the relational operators requires information
that is not stored in the DTs. Consider the following example:
Example 4.2.2 (Auxiliary Tables)
Two DTs, “GoldCardHolder” and “PlatinumCardHolder”, are created by
performing a horizontal split4 of the table “FrequentFlyer Customer”. Marcus, who has a Silver Frequent Flyer Card, does not qualify for any of these
DTs.
While the DTs are being created, however, Marcus buys a flight ticket to
Hawaii. With this purchase, he is qualified for a gold card. His old customer
information is now required by the DT creation process so that Marcus can
be added to the “GoldCardHolder” DT. This information can not be found
in either of the DTs.
3
The RID of the source record a DT record is derived from.
4
Horizontal Split is the inverse of union.
In cases like the one in Example 4.2.2, auxiliary tables must also be added to
the schema. The auxiliary tables store the information required by the DT
creation method, and are similar to those used to make MVs self-maintainable
(Quass et al., 1996). The detailed DT creation descriptions in Chapter 6
describe the required auxiliary tables when these are required.
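For the horizontal split in Example 4.2.2, the auxiliary table could simply hold the source records that do not (yet) qualify for any of the DTs, so that a record can be moved into a DT if a logged update later makes it qualify. The sketch below is hypothetical; the exact auxiliary tables are defined per operator in Chapter 6:

    CREATE TABLE FrequentFlyer_Aux (
      cust_rid   BIGINT,       -- RID of the source customer record
      lsn        BIGINT,       -- record state identifier
      card_level VARCHAR(10),  -- the attribute the split condition is evaluated on,
                               -- plus any other attributes needed to insert the record
                               -- into GoldCardHolder or PlatinumCardHolder later
      PRIMARY KEY (cust_rid)
    );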
4.3 Step 2: Initial Population
The newly created DTs have to be populated with records from the source
tables. This is done by a modified fuzzy copy technique (see Section 3.2), and
the first step of populating the DTs is therefore to write a fuzzy mark in the
log. This log record must include the transaction identifier of all transactions
that are currently active on the source tables. This is a subset of the active
transaction table (Løland and Hvasshovd, 2006c). The transaction table will
be used by the next step, log propagation, to identify the oldest log record
that needs to be applied to the DTs.
The source tables are then read without setting locks. This results in an
inconsistent read (Hvasshovd et al., 1991). The relational operator used for
DT creation is then applied, and the results, called the initial images, are
inserted into the DTs.
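To make the step concrete, the following Python sketch outlines one possible implementation of initial population. The interface names (write_fuzzy_mark, active_transactions, scan_no_locks, insert) are illustrative assumptions and do not refer to an existing DBMS API.

# A minimal sketch of the initial population step, under assumed interfaces.
def initial_population(log, source_tables, operator, derived_tables):
    # The fuzzy mark records all transactions currently active on the source
    # tables; log propagation later uses this to find the oldest relevant log record.
    active = set()
    for table in source_tables:
        active |= table.active_transactions()
    log.write_fuzzy_mark(active_transactions=active)
    # Read the source tables without setting locks (an inconsistent read),
    # apply the relational operator, and insert the initial images into the DTs.
    source_records = [list(table.scan_no_locks()) for table in source_tables]
    for dt, initial_image in zip(derived_tables, operator(source_records)):
        for record in initial_image:
            dt.insert(record)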
4.4 Step 3: Log Propagation
Log propagation is the process of redoing operations originally executed on
source table records to records in the DTs. All operations are reflected sequentially in the log, and by redoing these, the derived records will eventually
reflect the same state as the source records.
The log propagation step, which works in iterations, starts when the
initial images have been inserted into the DTs. Each iteration starts by
writing a new fuzzy mark to the log. This log record marks the end of
the current log propagation iteration and the beginning of the next one. Log
records of operations that may not be reflected in the DTs are then inspected
and applied if necessary. In the first iteration, the oldest log record that may
contain such an operation is the oldest log record of any transaction that
was active when the first fuzzy mark was written. The reason for this is that
the transactions that were active on the source tables may have been able to
log a planned operation but not perform it yet at the time the initial read
was started. This is a consequence of Write-Ahead Logging, as described in
Section 2.3, which requires that write operations are logged before the record
54
4.5. STEP 4: SYNCHRONIZATION
is updated. In later iterations, only log records after the previous fuzzy mark
need to be propagated.
When the log propagator reads a new log record, affected records in the
DTs are identified and changed if the LSNs indicate that the records represent an older state than that of the log record. The effects of applying the
log records depend on the relational operator used for the DT creation in
question, and are therefore described in more detail in Chapter 6.
The synchronization step should not be started if a significant portion of
the log remains to be propagated. The reason for this is that synchronization
involves latching the source tables while the last portion of the log is propagated. These latches effectively pause all transactions on the source tables.
Each log propagation iteration therefore ends with an analysis of the remaining work. The analysis can, e.g., be based on the time used to complete the
current iteration, a count of the remaining log records to be propagated,
or an estimated remaining propagation time. Based on the analysis, either
another log propagation iteration or the synchronization step is started.
A consequence of the described log propagation strategy is that this step
will never finish iterating if more log records are produced than the propagator can process during the same time interval. We suggest four possible
solutions for this case, none of which are optimal: One possibility is to abort
the DT creation transaction. If so, the DT creation work performed is lost,
but normal transaction processing will be able to continue unaffected. Alternatively, the DT creation process may be given a higher priority. The effect
of this is that more log is propagated at the cost of lower performance for
other transactions. A third possibility is to reduce the number of concurrent
transactions by creating a transaction queue. Like the previous alternative,
this increases response time and decreases throughput for other transactions.
As a final alternative, we may stop log propagation and go directly to the synchronization step. Synchronization will in this case have to latch the source
tables for a longer period of time. Depending on the remaining number of
log records to propagate, this strategy can still be much quicker than the
insert into select strategy used in modern DBMSs.
4.5 Step 4: Synchronization
When synchronization is initiated, the state of the DTs should be very close
to the state of the source tables. This is because the source tables have to
be latched during one final log propagation iteration that makes the DTs
consistent with the source tables.
We suggest two ways to synchronize the DTs to the source tables and
thereby complete the DT creation process. These are blocking synchronization and non-blocking synchronization. The blocking method makes the DTs
transaction consistent with the source tables, while the non-blocking method
only enforces action consistency. Note that the choice of strategy affects the
synchronization step only; the first three steps of DT creation are unaffected.
Blocking synchronization
Blocking synchronization blocks all new transactions that try to access any
of the involved tables. Transactions that already have locks on the source
tables are either allowed to complete or forced to abort. Table locks are
then acquired on the source tables before a final log propagation iteration
is performed. This log propagation makes the DTs transaction consistent
with the source tables. Blocking complete is the least complex synchronization strategy, but it does not satisfy the non-blocking requirement for DT
creation.
Non-blocking synchronization
The non-blocking strategy latches the source tables for the duration of one
final log propagation iteration. Latching effectively pauses ongoing transactions that perform update work on the source tables, but the pause should
be very brief since the state of the DTs is very close to that of the source
tables. Note that read operations are not paused. Once the log propagation
completes, the DTs are in the same state as the source tables.
The newly created DTs are now almost ready to be used as MVs. The
only remaining task is to add the preferred MV maintenance strategy to them,
e.g. one of those described in Section 3.3. From now on, the MV maintenance
strategy is responsible for keeping the DTs consistent with the source tables.
The latches are then released, allowing transactions to resume their update
operations on the source tables.
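A possible outline of non-blocking synchronization when the DTs are created as MVs is sketched below. The latch, log and MV maintenance interfaces are hypothetical assumptions, not an existing API.

def non_blocking_mv_sync(log, source_tables, derived_tables, last_fuzzy_mark,
                         mv_maintenance):
    latches = [t.latch_updates() for t in source_tables]   # reads are unaffected
    try:
        # A final, short propagation iteration brings the DTs to the same
        # state as the briefly paused source tables.
        for log_record in log.read_from(last_fuzzy_mark):
            for dt in derived_tables:
                dt.redo_if_newer(log_record)
        # From here on, the chosen MV maintenance strategy keeps the new MVs
        # consistent with the source tables.
        for dt in derived_tables:
            mv_maintenance.attach(dt, source_tables)
    finally:
        for latch in latches:
            latch.release()            # paused update transactions resume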
The above distinction between the blocking and non-blocking strategies
may seem artificial. However, the difference in time interval in which updates are blocked from the source tables may be considerable. In the former
method, new transactions are blocked from the source tables until all transactions that have already accessed them have completed. The updates performed by these transactions also have to be redone by the log propagation,
which adds further to the blocking time. The time required for a transaction
to complete can not be easily controlled.
In the latter strategy, transactions are only blocked during log propagation. As previously discussed, we can easily control the time needed by this
log propagation by not starting the synchronization step until the states of
the DTs and the source tables are very close.
4.6 Considerations for Schema Transformations
Materialized View creation is a straightforward application of DTs, and can
therefore use the non-blocking DT creation framework as described in the
previous sections. On the other hand, using the framework to perform schema
transformations is more complex. The reason is that in contrast to the MV
creation case, transactions will be active in both the source tables and the
DTs at the same time during non-blocking synchronization. Consider the
following example:
Example 4.6.1 (A Non-blocking Schema Transformation)
A non-blocking schema transformation is being performed, in which the
tables “Employee” and “PostalAddress” are vertically merged into “ModifiedEmp”. This is illustrated in Figure 4.1. The first three steps of the DT
creation process have already been executed; only non-blocking synchronization remains.
The synchronization step starts by latching “Employee” and “PostalAddress”, before propagating the remaining log records to “ModifiedEmp”. At
this point, the records in the source tables and the DT reflect the same
state, and schema transformation is nearly complete. New transactions are
now given access to the “ModifiedEmp” table instead of the source tables.
However, the transactions that were paused by the latches may have more
work to do. Thus, until these old transactions have completed, new transactions will be updating records in “ModifiedEmp” while the old transactions
are updating records in “Employee” and “PostalAddress”.
Example 4.6.1 illustrates that we have to make sure that transactions operating on the same data objects in two different schema versions do not conflict.
For example, a transaction in the new schema should not be allowed to change
the position of Eric to “Software Engineer” if a currently active transaction in
the old schema has already modified the same Eric record. Recall from Section 2.2 that the main responsibility of the scheduler is to provide isolation,
and that this property is guaranteed if the histories are serializable. Thus,
when synchronizing schema transformations, we must ensure serializability
between operations performed on records stored in different tables.
Note that if the blocking synchronization strategy is used, serialization is
not a problem. In this case, transactions that are active in the old schema
complete their work before transactions are given access to the new schema.
Thus, this synchronization strategy can be adopted without modification.
We therefore focus only on non-blocking synchronization in the rest of this
chapter.
Non-blocking synchronization is divided into two strategies for schema transformation purposes. These are non-blocking abort and non-blocking commit. Here, “abort” and “commit” refer to whether the transactions active in the old schema are forced to abort or are allowed to continue
work after the source table latches have been removed. The reason for making this distinction is that in the former case, transactions in the old schema
are not allowed to acquire new locks. This scenario is significantly easier
to handle than the latter case, in which new locks may be acquired in both
schema versions.
Both non-blocking strategies ensure serializable histories across the two
schema versions. This is done by using a modified Strict Two Phase Locking
(2PL, described in Section 2.2) (Bernstein et al., 1987) strategy, in which locks are set on all versions
of the data record.
Non-blocking Abort Synchronization
The non-blocking abort strategy latches the source tables for the duration
of a log propagation iteration. As previously discussed, this pauses write
operations on the source table records. Once the log propagation completes,
the DTs are in the same state as the source tables. The locks acquired
by transactions in the source tables are then forwarded to the respective
records in the DTs. At this point, new transactions are given access to
the unlocked parts of the DTs. The source table latches are now released,
and all transactions in the old schema are forced to abort. Aborting all old
transactions means that they may not acquire new locks. (An alternative is to only abort the old transactions that try to acquire new locks; the non-blocking abort strategy works for both cases since new locks are not acquired in the old schema in either case.)
Log propagation continues to iterate until all operations performed on the
source tables have been forwarded to the DTs. In addition to redoing write
operations, the propagator also ensures that source table locks forwarded to
the DTs are released. Forwarded locks are released as soon as log propagation
has processed the abort log record of the transaction owning the lock. The
source tables can be removed from the schema once all old transactions have
aborted.
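The following sketch illustrates the steps of non-blocking abort synchronization. The scheduler, schema and lock manager interfaces are hypothetical, and locks are assumed to have been forwarded to the derived records during earlier log propagation iterations.

def non_blocking_abort_sync(log, scheduler, schema, lock_mgr,
                            source_tables, derived_tables, last_fuzzy_mark):
    latches = [t.latch_updates() for t in source_tables]
    for log_record in log.read_from(last_fuzzy_mark):       # final iteration
        for dt in derived_tables:
            dt.redo_if_newer(log_record)
    schema.expose(derived_tables)       # new transactions may now use the DTs
    for latch in latches:
        latch.release()
    old_transactions = scheduler.active_on(source_tables)
    for txn in old_transactions:
        txn.abort()                     # old transactions acquire no new locks
    # Keep propagating until all undo operations of the aborting transactions
    # are reflected in the DTs; a forwarded lock is released when the abort
    # log record of its owner has been processed.
    for log_record in log.read_until_terminated(old_transactions):
        if log_record.is_abort():
            lock_mgr.release_forwarded_locks(log_record.transaction_id)
        else:
            for dt in derived_tables:
                dt.redo_if_newer(log_record)
    schema.drop(source_tables)          # safe once all old transactions ended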
Non-blocking Commit Synchronization
Non-blocking commit synchronization is similar to the previous strategy in
many aspects: the source tables are latched, a log propagation iteration is
used to synchronize the DT states to the source table states, and source
table locks are forwarded to the DTs. In contrast to the previous strategy,
however, transactions on the source tables are allowed to continue forward
processing after the latches have been removed. Ronström calls this a soft
transformation (Ronström, 2000).
A consequence of allowing source table transactions to acquire new locks
is that new conflicts may occur across different schema versions. This strategy therefore requires that when a lock is acquired, all other versions of that
record are immediately locked as well. A thorough discussion of implications
can be found in Section 5.3, but simply put, a transaction that wants to access a record rdt in a DT has to set a lock both on rdt and on all records that
rdt is derived from. Likewise, a transaction accessing a record in a source
table has to lock both that record and all DT records derived from it. To
avoid unnecessary deadlocks, locks are always acquired in the source tables
first, and then in the DTs.
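The required expansion of a single lock request into locks on all versions of a record could look as follows. The RID mapping structure and lock manager calls are assumptions; the essential point is the fixed acquisition order, source records first and DT records second.

def lock_all_versions(lock_mgr, rid_map, transaction_id, rid, mode, is_derived):
    # Find the source records behind the requested record ...
    source_rids = rid_map.sources_of(rid) if is_derived else [rid]
    # ... and every DT record derived from any of those source records.
    derived_rids = [d for s in source_rids for d in rid_map.derived_from(s)]
    # Always lock the source versions first, then the derived versions.
    for target in list(source_rids) + derived_rids:
        lock_mgr.lock(target, mode, owner=transaction_id)   # blocks on conflict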
Log propagation has to be modified to forward DT operations not only
from source to derived records, but from derived to source records as well.
This is done so that source table transactions can see the updates performed
by transactions in the new schema. Log propagation is also responsible for
removing source table locks from the DTs and vice versa. This is done in a
similar manner as described for the non-blocking abort strategy.
It is clear that the non-blocking abort strategy produces serializable histories: the scheduler uses Strict 2PL to produce serializable histories within
each table. Before any transaction is given access to the records in the derived tables, all records that are locked in the source tables are also locked
in the derived tables. These forwarded source locks are not released in the
new schema until the log propagator has applied all operations executed by
the transaction owning the lock. Hence, transactions in the new schema can
only access committed values.
Although less intuitive, non-blocking commit synchronization also produces serializable histories: locks are acquired immediately in both schema
versions. If a transaction is not able to lock all versions of a record, the
transaction has to wait for the lock. Furthermore, forwarded locks are not
released until after all operations by that transaction have been propagated.
Hence, transactions in either schema can only access committed values with
this strategy as well.
Because of the added complexity and increased chance for locking conflicts associated with non-blocking commit, the non-blocking abort strategy
may be a better choice for some schema transformation operators. This is
especially true for operators where one DT record may be composed of multiple source records, since one single DT lock requires multiple source table
locks. Thus, a decision must be made on whether or not aborting the transactions active in the source tables is worse than risking lock contention. This
problem will be discussed in greater detail for each relational operator in
Chapter 6.
4.6.1 A lock forwarding improvement for schema transformations
We have argued that for schema transformations, transactions cannot be
given access to the new schema before all locks acquired in the old schema
have been forwarded. Forwarding all source table locks to the DTs may
require a considerable amount of work. This may be unacceptable since
it has to be done during the critical time interval of synchronization when
transactions are not given access to any of the involved tables.
Two modifications are required to remove the lock forwarding phase from
this critical time interval: first, the initial population step must be modified
to store the source table locks in the first fuzzy mark. Second, DT lock
acquisition and release must be included as part of the log propagation. By
doing so, locks will be continuously forwarded, and therefore in place when
the synchronization step is started.
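One possible realization of these two modifications is sketched below, assuming that the first fuzzy mark lists the locks held at that time and that every propagated write operation of a still-active transaction implies an exclusive lock for its owner. All interfaces are hypothetical.

def forward_initial_locks(fuzzy_mark, rid_map, lock_mgr):
    # Locks recorded with the first fuzzy mark are forwarded once, up front.
    for rid, mode, owner in fuzzy_mark.recorded_locks:
        for dt_rid in rid_map.derived_from(rid):
            lock_mgr.lock(dt_rid, mode, owner=owner)

def forward_locks_during_propagation(log_record, rid_map, lock_mgr, derived_tables):
    # Commit or abort records release the forwarded locks of their owner.
    if log_record.is_commit() or log_record.is_abort():
        lock_mgr.release_forwarded_locks(log_record.transaction_id)
        return
    # A redone write implies an exclusive lock on the derived versions.
    for dt_rid in rid_map.derived_from(log_record.rid):
        lock_mgr.lock(dt_rid, "X", owner=log_record.transaction_id)
    for dt in derived_tables:
        dt.redo_if_newer(log_record)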
4.7 Summary
In this chapter we have described a framework that will be used to create
DTs for the six relational operators focused on. The framework can be used
both to perform schema transformations and to create materialized views.
An important purpose of the framework has been to degrade the performance of concurrent transactions as little as possible. The framework is
therefore based on copying the source tables in a non-blocking way. Since the
source tables are not locked, the copied records, which are inserted into the
DTs, may not be consistent with the records in the source tables. Logged
operations originally applied to source records are then propagated to the
DTs. When all logged source operations have been propagated, the DTs are
consistent with the source tables and are ready to be used.
Chapter 5
Common DT Creation Problems
In this chapter, we identify five problems that are encountered by DT creation for multiple relational operators. We call these the missing record identification, missing state identification, missing record pre-state, lock forwarding during transformation and inconsistent source record problems. In what follows, we discuss
these problems and suggest solutions.
5.1 Missing Record and State Identification
For logging to be useful as a recovery mechanism, there must be a way to
identify which record the logged operation applies to. Records therefore have
a unique identifier, assumed in this thesis to be a Record Identifier (RID),
that is stored in each log record. Record IDs are described in more detail in
Section 2.1.
Since RIDs are unique, however, a record in a DT can not have the same
RID as the source record(s) it is composed of. Furthermore, even if the
source RIDs could have been reused, the DT creations where one DT record
may be composed of two source records would still be problematic. We call
the problem of mapping record identification from source records to derived
records the record identification problem. It is solved by letting the records
in the DTs have their own RIDs, but at the same time store the RID of
each source record contributing to it. In vertical merge (full outer join) DT
creation, e.g., the RIDs from both source tables would be stored in the DT.
Log Sequence Numbers (LSNs) on records are used as state identifiers to
ensure idempotence during recovery and when making fuzzy copies. During
recovery or fuzzy copying, each log record is compared to the record with a
matching RID, and is applied only if the logged state represents a newer state than that of the record.
Figure 5.1: The Record and State Identification Problems are solved by including the record IDs and LSNs from both contributing source records in each derived record.
The LSNs from the source records may be used in the same way for
DT creation. Derived records may, however, be composed of more than one
source record. In these cases, one LSN is not enough to identify the state
of the derived record. We call this the state identification problem. The
problem is solved by including the LSN of all contributing source records.
Both the record and state identification problems are illustrated in Figure
5.1.
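The following sketch shows how a derived record in a vertical merge DT could use the stored source RIDs and LSNs to apply a logged operation exactly once. The record layout (one RID and LSN per contributing side) and the lookup interface are illustrative assumptions.

def apply_to_merged_record(dt, log_record, side):
    # side is "L" or "R", identifying which source table the log record
    # belongs to; each DT record stores RID_L/LSN_L and RID_R/LSN_R.
    for derived in dt.lookup(source_rid=log_record.rid, side=side):
        stored_lsn = derived.lsn[side]
        if stored_lsn is None or stored_lsn < log_record.lsn:
            derived.apply(log_record)            # redo the logged operation
            derived.lsn[side] = log_record.lsn   # remember the new state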
5.2 Missing Record Pre-States
The suggested DT creation method is based on applying operations in the log
to derived records during log propagation. For the horizontal split, difference
and intersection operators, however, some source records may not belong to
any of the DTs. The missing record pre-state problem is encountered if any
of the records not included in a DT are needed by the log propagator.
The problem can be solved by letting the log propagator acquire the
missing information from the source tables. Since the source tables are in
a different state than the DTs, however, this solution complicates the log
propagation rules. Furthermore, it means that the method is no longer self-maintainable (Blakeley et al., 1989; Quass et al., 1996), as described in Chapter 2. The other solution is to add auxiliary tables to store information on missing records that are necessary to make the DT self-maintainable. Auxiliary tables were originally suggested for this purpose in MV maintenance by Quass et al. (Quass et al., 1996).
Figure 5.2: The missing record pre-state problem of Example 5.2.2 is solved for inserts into the first source table of a difference DT. (a) A DT, “Do Not Sell These”, storing the difference between Vinyl and CD records, is created; the log propagator does not have the information needed to decide whether the new Eva Cassidy vinyl should belong to the DT or not. (b) By adding the derived state of the CD record table, the log propagator is able to determine that the new vinyl should be inserted into the DT. Note that this only solves one of the missing record pre-state problems for difference DT creation.
Consider the following examples:
Example 5.2.1 (Missing Record Pre-State in Horizontal Split)
Consider a DT creation using the horizontal split operator where one table,
“Employee”, is split into “LondonEmployee” and “ParisEmployee”. John,
who is the company’s only salesman in New York, would not be represented
in either of these derived tables. If John later moves to Paris (i.e. an update
operation is encountered in the log) the previous state of John’s derived
record is needed by the log propagator before it can be inserted into the
ParisEmployee table.
Example 5.2.2 (Missing Record Pre-State in Difference)
A DT “Do Not Sell These” is created as the difference between the tables
“Vinyl Records” and “CD Records”. During log propagation, a new Eva
Cassidy vinyl record is inserted. As can be seen in Figure 5.2(a), the log
propagator does not know whether to insert this record into the derived table
or not. The log propagator could scan the “CD Records” for an equal record,
but that table represents a different state. Furthermore, the operation would
not be self-maintainable.
By adding the compare-to table containing the derived state of the CD
records, the log propagator may scan that table instead. This reveals that
the Eva Cassidy record should belong to the DT, as illustrated in Figure
5.2(b).
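For the insert case of Example 5.2.2, the propagation logic could be sketched as follows. The table and method names are illustrative only.

def propagate_vinyl_insert(do_not_sell_dt, compare_to, log_record):
    if do_not_sell_dt.lookup_by_source_rid(log_record.rid) is not None:
        return                                    # already reflected
    record = log_record.after_image()
    # The new vinyl belongs to the difference DT only if no equal record
    # exists in the compare-to table, i.e. the derived state of the CD records.
    if not compare_to.contains_equal(record):
        do_not_sell_dt.insert(record, source_rid=log_record.rid,
                              lsn=log_record.lsn)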
5.3 Lock Forwarding During Transformations
When either the non-blocking abort or commit strategy is used for synchronization of a schema transformation, old transactions are allowed to operate
on the source table records while new transactions operate on the DT records
at the same time. As described in Section 4.5, this means that locks have to
be forwarded from the source records to the DT records. For non-blocking
commit, locks also have to be forwarded from the DTs to the source tables
since the source table transactions are allowed to acquire new locks. Derived
locks, i.e. locks that have been forwarded, are released once the log propagator has processed a log record describing the completion of the transaction
that owns the lock. Lock forwarding ensures that concurrency control is
enforced between the source and DT versions of the same record.
Figure 5.3: Simple Lock Forwarding (SLF) during Non-Blocking Commit. Source record locks require only one DT record lock to be acquired and vice versa.
In this section, four lock forwarding cases are discussed. In the first case, simple lock forwarding (SLF), each DT record is derived from one and only
one source record. Furthermore, a source record contributes to one and only
one derived record. The second case, many-to-one lock forwarding (M1LF),
applies when each source record contributes to one derived record only, but
a record in the DT may be derived from multiple source records. Third,
one-to-many lock forwarding (1MLF) is discussed. As the name suggests, it
applies when one source record may contribute to many derived records, but
each record in the DTs is derived from one source record only. The fourth
case, many-to-many lock forwarding (MMLF), inherits the problems of both
M1LF and 1MLF.
Simple Lock Forwarding (SLF)
Because of the one-to-one relationship between source and DT records in
the simple lock forwarding case, lock forwarding is straightforward. When a
source record is locked, the DT record with the same source RID is locked
and vice versa. The horizontal merge with duplicate inclusion, horizontal
split into disjoint DTs, difference and intersection operators work like this.
Many-to-One Lock Forwarding (M1LF)
Horizontal merge with duplicate removal and vertical merge of one-to-one
relationships are the only transformations presented that need many-to-one
lock forwarding. As always, locks on source records ensure that conflicting
source operations are not given access at the same time. Concurrency control
requires these locks to be forwarded to the DTs before new transactions can
be given access to the DTs.

         Src.S   Src.X   DT.S   DT.X
Src.S      Y       Y       Y      N
Src.X      Y       Y       N      N
DT.S       Y       N       Y      N
DT.X       N       N       N      N

Figure 5.4: Lock compatibility matrix for many-to-one lock forwarding. Locks transferred from the source tables do not conflict with each other, but conflict as normal with locks set on the target tables. The compatibility matrix can easily be extended to multigranularity locking.

If the normal shared (S) and exclusive (X)
locks are used in the DTs, however, these non-conflicting source operations
could nevertheless be conflicting. This happens if more than one source
record contributing to the same derived record is locked. Since the scheduler
guarantees that operations on the source records are serializable (Bernstein
et al., 1987), there is no need for these locks to conflict. New locks are
therefore suggested.
Figure 5.4 shows the lock compatibility matrix used to avoid conflict between non-conflicting operations forwarded from the source tables. Conflicting operations on the target table are still blocked. These new locks solve
the concurrency issues with M1LF. Locks can now be forwarded from the
source records to the DTs without causing conflicts for both non-blocking
strategies. If non-blocking commit is used, locks set on DT records must also
be set on all source records contributing to the record in question.
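The compatibility matrix of Figure 5.4 can be expressed as a simple test on the origin and mode of the two locks, as in the following sketch; the tuple encoding of locks is an assumption.

def compatible(requested, held):
    # requested/held are tuples (origin, mode), with origin in {"Src", "DT"}.
    (req_origin, req_mode), (held_origin, held_mode) = requested, held
    if req_origin == "Src" and held_origin == "Src":
        # Locks forwarded from the source tables never conflict with each
        # other: the scheduler already serialized them on the source records.
        return True
    # Otherwise the normal S/X rules apply.
    return req_mode == "S" and held_mode == "S"

# Example: a forwarded Src.X lock does not block another forwarded Src lock,
# but does block a DT.S request on the same derived record.
assert compatible(("Src", "X"), ("Src", "X")) is True
assert compatible(("DT", "S"), ("Src", "X")) is False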
One-to-Many Lock Forwarding (1MLF)
The third case is that of one-to-many lock forwarding. It applies to vertical
split of one-to-one relationships and to horizontal split since the resulting
DTs may be overlapping, i.e. non-disjoint. Since one DT record may be
derived from one source record only, lock compatibility remains unchanged
for the non-blocking abort strategy. Thus, the only difference between nonblocking abort in SLF and in 1MLF is that one source record lock may result
in many DT record locks.
Non-blocking commit, however, is more complicated. The reason for this
is that if a DT record is updated, the update must be applied not only to
the source record it is derived from, but also to all DT records derived from
it. Locks must also be set on all these records immediately. Otherwise, the
target and source versions of the record would not be equal. By doing this, the
behavior of transactions operating on the DTs during synchronization differs
from the behavior after synchronization is complete. If this is considered a problem, non-blocking abort must be used instead. This problem will be elaborated on in the detailed operator description sections.
Figure 5.5: Many-to-One Lock Forwarding (M1LF) during Non-Blocking Commit. During synchronization of horizontal merge with duplicate removal, a DT record lock results in a lock of all records it is derived from. Note that record and state identification information is not included.
Many-to-Many Lock Forwarding (MMLF)
Vertical merge and split of one-to-many relationships belong to the fourth
category, many-to-many lock forwarding. These operators inherit the problems from both the 1MLF and M1LF cases. This means that the modified
lock compatibility scheme must be used for both non-blocking strategies. In
addition, operations performed on derived records may have to be forwarded
both to multiple source records and to all DT records derived from these if
the non-blocking commit strategy is used.
5.4 Inconsistent Source Records
As pointed out when describing the schema transformation method by Ronström in Section 3.1, inconsistencies between records in the source tables may
be inevitable and may cause problems for the DT creations where multiple source records contribute to the same derived record. In the detailed DT creation descriptions in Chapter 6, this problem is relevant to vertical split, and to vertical merge for schema transformations.
Figure 5.6: Many-to-Many Lock Forwarding (MMLF) during Non-Blocking Commit. A DT record lock on Markus results in two additional source and one additional DT record lock.
Figure 5.7: Example of Inconsistent Source Records. Three employees have postal code 7020, but the city names are not equal.
Figure 5.7 illustrates a typical inconsistency where records with the same
postal code have different city names. The illustrated inconsistency is an
anomaly since it breaks a functional dependency (Garcia-Molina et al., 2002).
The functional dependency is only intended, however; the DBMS does not
enforce it.
How to handle inconsistencies has been studied for different applications
where data from multiple sources are integrated into one. These include
merging of knowledge bases in the field of Artificial Intelligence, and merging database records from distributed database systems. The problem for
both is that even though integrity constraints guarantee consistency for each
independent information source, the combination of records from multiple
sources may still be inconsistent.
The solutions in the literature either focus on repairing the integrated
records (i.e. make the records consistent) or answering queries consistently
based on the inconsistent records. Only the former solution is relevant in
this thesis.
5.4.1 Repairing Inconsistencies
Before a repair can be performed, the records from all sources are integrated
into one table. Conflicts may be removed already at this stage, depending
on the selected integration operator. It may, e.g., be performed by the merge
(Greco et al., 2001a), merge by majority (Lin and Mendelzon, 1999) or prioritized merge integration operators (Greco et al., 2001b). As the names
suggest, merge simply includes all record versions from all sources while
merging by majority tries to resolve conflicts by using the value agreed upon
by the majority of sources. Prioritized merge orders sources so that when
conflicts are encountered, the version from the source with higher priority
is chosen. Other integration operators also exist, see e.g. (Lin, 1996; Greco
et al., 2003; Caroprese and Zumpano, 2006). If an inconsistency can not be
resolved during integration, the different versions are all stored in the result
table. To enable multiple records with the same primary key to exist at the
same time, an attribute that refers to the originating source table is added
to the primary key (Agarwal et al., 1995).
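As an illustration, an attribute-wise merge by majority over conflicting record versions could be sketched as follows. The voting scheme and the threshold parameter are assumptions modelled on the constrained variant described later in this section; they are not taken from the cited integration operators.

from collections import Counter

def merge_by_majority(versions, threshold=0.75):
    # versions: one dict per source, all describing the same primary key.
    merged, unresolved = {}, False
    for attribute in versions[0]:
        counts = Counter(v[attribute] for v in versions)
        value, votes = counts.most_common(1)[0]
        if votes / len(versions) > threshold:
            merged[attribute] = value
        else:
            merged[attribute] = None     # no sufficiently large majority
            unresolved = True
    return merged, unresolved            # unresolved records are tagged for the DBA

# Example: two of three sources agree on the city name.
print(merge_by_majority([{"City": "Tr.heim"}, {"City": "Tr.heim"},
                         {"City": "Tr.hemi"}], threshold=0.5))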
If there are inconsistencies between integrated records, the repair operator
is applied to the result table. It identifies the alternative sets of insertions and
deletions that will make the table consistent (Greco et al., 2003), and then
applies one of these insert-delete sets. Preference rules, or Active Integrity
Constraints (AICs), may be added so that in the case of conflict, one solution
is preferred to the others (Flesca et al., 2004). An example AIC presented
by Flesca et al. (Flesca et al., 2004), states that if two conflicting records for
an employee are found, the one with the lowest salary should be kept. Even
if AICs are used, however, there may be many alternative insert/delete sets.
To the author's knowledge, the choice between these alternatives has not been discussed in the literature.
In vertical split DT creation, merging by majority is used as the integration operator, i.e. during initial population. This integration may be
constrained to only resolve conflicts if there is a major majority, e.g. >75%.
The derived records where the conflicts are not resolved are then tagged
with an inconsistent mark. The repair algorithm will identify these records and present the different alternatives so that the DBA may decide which alternative is correct.

Table 5.1: Summary of common DT creation problems and their solutions.
  Missing Record ID: unable to identify which derived record a logged operation applies to. Solution: add the RID of all contributing source records to each DT record.
  Missing State ID: unable to identify whether a logged operation has already been applied to a derived record. Solution: add the LSN of all contributing source records to each DT record.
  Missing Record Pre-State: information about a record that is not stored in any of the DTs is required. Solution: store the information in an auxiliary table.
  Lock Forwarding during Transformations: scheduling problem for records existing in two schema versions. Solution: forward locks between schema versions and modify the lock compatibilities.
  Inconsistent Source Records: anomalies are found in the source tables. Solution: resolve by major majority, or tag the record as inconsistent and ask the DBA.
5.5 Summary
In this chapter, we have identified five problems encountered when our framework is used for DT creation. Identifying these common problems and showing how they can be solved in general, makes it easier to explain DT creation
for each of the six relational operators. They also make it easier to develop
DT creation methods for other relational operators if they share the problems. A summary of the problems and their solutions is shown in Table
5.1.
Chapter 6
DT Creation using Relational Operators
This chapter describes how the DT creation process is performed for each
relational operator. All methods follow the general framework presented in
Chapter 4. The methods are described in order of increasing complexity. As
shown in Table 6.1, this is the same order as the lock forwarding categories
for schema transformations described in the previous chapter.
The detailed DT creation descriptions start with the difference and intersection operators and horizontal merge with duplicate inclusion. Schema
transformations using these operators belong to the Simple Lock Forwarding
(SLF) category, and are the least complex operators.
Horizontal merge with duplicate removal is more complex than the duplicate inclusion case since records in the DT may be derived from multiple source records. Hence, it belongs to the Many-to-One Lock Forwarding
(M1LF) category.
The next operator, horizontal split, is the inverse of union. Since the split
is allowed to form overlapping result sets, one source record may be derived
into multiple DTs. Horizontal split schema transformation belongs to the
One-to-Many Lock Forwarding (1MLF) category.
The two final operators are vertical merge and split. With these operators, one source record may contribute to multiple derived records. Furthermore, one derived record may be derived from multiple source records (an exception is vertical split over a candidate key: with this operator, records in the DTs are derived from exactly one source record, and it therefore belongs to the 1MLF category).
The schema transformations using these operators require Many-to-Many
Lock Forwarding (MMLF). Vertical split DT creation is also complicated by
possible inconsistencies between source records, and is therefore the most
complex operator.

Table 6.1: DT Creation Operators.
  DT Creation                         Operator                   Lock forwarding category   Section
  Difference, intersection            Difference, intersection   SLF                        6.1
  Horizontal Merge, Dup Inclusion     Union                      SLF                        6.2
  Horizontal Merge, Dup Removal       Union                      M1LF                       6.3
  Horizontal Split                    Selection                  1MLF                       6.4
  Vertical Merge                      Full Outer Join            MMLF                       6.5
  Vertical Split, candidate key       Projection                 1MLF                       6.6
  Vertical Split, non-candidate key   Projection                 MMLF                       6.7
6.1 Difference and Intersection
Difference and intersection (diff/int) DT creations are so closely related that
the same method is applied to both operations. The method takes two source
tables, Sin and Scomp (compared-to), as input. Sin contains the records that
belong to either the difference or intersection set, based on the existence of
equal records in Scomp. The output is a DT containing the difference (DTdiff)
or intersection (DTint ) set of the source tables. An example DT creation is
shown in Figure 6.1. In the figure, DTaux is created to solve the missing
record pre-state problem described in Chapter 5. Note that in many cases,
Scomp is not removed even if the DT is used for a schema transformation.
6.1.1 Preparation
During preparation, the derived table is added to the database schema. It
may contain any subset of attributes from Sin that are wanted in the new
DT. It is assumed that if a candidate key is not among these attributes, a
generated primary key, e.g. an auto-incremented integer, is added to the DT.
Figure 6.1: Difference and intersection DT creation. Grey attributes are used internally by the DT creation process, and are not visible to normal transactions.
The generated primary key will not be considered when checking which of
DTint or DTdiff a record from Sin belongs to.
Duplicates may form as a result of only including a subset of source
attributes in the DTs. Implications of this are not discussed since the method
used for horizontal merge with duplicate removal, described later in Section
6.3, can be used for diff/int as well.
In addition to the source attributes and the key, the DT must include the
source RID and LSN to solve the record and state identification problems.
These are shown as grey attributes in Figure 6.1.
The diff/int DT creations suffer from the missing record pre-state problem
if only one derived table, storing either the difference or intersection set, is
created. The problem is twofold. First, a record t derived from Sin may at
one point in time belong to the difference set and later to the intersection set
or vice versa. This may be caused by an update of the record itself, or by an
insert, delete or update of a record in Scomp . The old state of t is needed in
both cases. Thus, the first missing record pre-state problem is that the state
of records from Sin that do not belong to the difference or intersection DT
being created is also needed. The problem is solved by adding an auxiliary
table storing the Sin records that are not in the result set. Thus, both DTdiff
and DTint are needed during both DT creations.
Second, the state of records derived from Scomp are frequently needed to
determine if a record derived from Sin should belong to DTdiff or DTint. This
happens every time a log record describes an update or insert of a record in
Sin as well as when records in Scomp are updated or deleted. In the case
that an Scomp record r is updated, e.g., records in DTint that are equal to
the old state of r may have to be moved to DTdiff, and records in DTdiff
that are equal to the new state of r should be moved to DTint . Thus, the
second missing record pre-state problem is that the derived state of records
from Scomp is needed as well. The problem is solved by storing these records in an
auxiliary table called DTaux .
Because of the missing record pre-state problems described above, three
tables are created during the preparation step. These are DTdiff, DTint and
DTaux . Both auxiliary tables must have the same attributes as the DT.
Indices are created on the source RID attributes of all derived tables. If
candidate keys from the source tables are included in the DTs, indices should
also be created on one of these in all derived tables. With these indices, only
one record has to be read when log propagation searches for equal records
in any of the DTs2 . If a candidate key is not included, an index should be
added to one or more attributes that differ the most between records. In the
worst case scenario, i.e. without an index on any derived attribute, initial
population and log propagation must read all records in the derived tables
when testing for equality. Unless the source tables contain few records, such
DT creations are in danger of never completing.
6.1.2 Initial Population
Once the derived and auxiliary tables have been created, the fuzzy mark is
written to the log. Both source tables are then read fuzzily. Records from
Scomp are inserted directly into DTaux whereas records from Sin are first
compared to the records in DTaux . If an equal record is found in DTaux , the
Sin record is inserted into DTint. Otherwise, it is inserted into DTdiff. When
this step is complete, the DTs are said to contain the initial image.
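A sketch of this step is shown below. The table objects and their methods (active_transactions, scan_no_locks, contains_equal, insert) are illustrative assumptions.

def diff_int_initial_population(log, s_in, s_comp, dt_diff, dt_int, dt_aux):
    active = s_in.active_transactions() | s_comp.active_transactions()
    log.write_fuzzy_mark(active_transactions=active)
    for record in s_comp.scan_no_locks():          # fuzzy, lock-free read
        dt_aux.insert(record, source_rid=record.rid, lsn=record.lsn)
    for record in s_in.scan_no_locks():
        # An equal record in DTaux means the record belongs to the intersection.
        target = dt_int if dt_aux.contains_equal(record) else dt_diff
        target.insert(record, source_rid=record.rid, lsn=record.lsn)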
6.1.3 Log Propagation
Log propagation is organized in iterations. Each iteration starts by writing
a fuzzy mark in the log L, and then retrieves all log records relevant to
the source tables. The oldest log record that must be retrieved depends
on whether or not this is the first log propagation iteration, as discussed in
Section 4.4.
If the DT will be used to transform the schema, the synchronization
strategy (step 4) must be decided on now. If either of the non-blocking synchronization strategies will be used, locks should be maintained continuously
during log propagation so that the locks are in place when synchronization
is started.
The log records are applied to the DTs in sequential order. Thus, the log
L consists of a partially ordered (Bernstein et al., 1987), finite number of log
records, ℓ1, ..., ℓm, that are applied to the DTs in the same order as the logged
operations were applied to the source tables. Note that a partial order only
guarantees the ordering of conflicting operations.
In the diff/int DT creations, source records may contribute to only one
derived record. Furthermore, since it is assumed that duplicates are not removed, each derived record is derived from only one source record. Log propagation has much in common with ARIES redo recovery processing (Mohan
et al., 1992) due to this one-to-one relationship between source and derived
records. A difference is, however, that records may move between DTint and
DTdiff. Since the source candidate keys may not be included in the DTs,
multiple records derived from Sin may be equal to each DTaux record. This
is reflected in the log propagation rules described next.
Propagation of Sin log records
Consider a log record ℓ ∈ L, describing an insert, update or delete of a record in Sin. Independent of the operation described in ℓ, the first step of propagation is to perform a lookup of the record ID in ℓ in the source RID index of DTint and DTdiff. The record found is called t.
If the logged operation is an insert of a record r into Sin , and a record t
was found in the source RID lookup, the logged operation is already reflected
and is therefore ignored. If a record t was not found in any of the DTs,
DTaux is scanned for a record with equal attribute values. r is then inserted
into either DTint or DTdiff, depending on whether or not an equal record
was found in DTaux . As mentioned in the preparation section, the cost of
scanning for equal records in DTaux varies greatly with the availability of
good indices. If an index on a unique attribute exists, at most one record has
to be read to determine which table r belongs to. If no indices are present
at all, all records in DTaux may have to be read.
Let ℓupd describe the update of a record r ∈ Sin. If the record ID of r was not found in the source RID lookup in DTint and DTdiff, ℓupd is ignored. t is guaranteed to reflect all relevant operations that happened before ℓupd (and possibly some that happen later) (Løland and Hvasshovd, 2006c). Thus, not finding t in any of the DTs can only mean that a delete of r will be described by a later log record ℓ2 ∈ L, ℓupd ≺ ℓ2, where a ≺ b means that a happens before b.
If the record t was found, and ℓLSN > tLSN, the update described in ℓupd is applied. This update may require t to move between the DTs: if the updated version of t is equal to a record in DTaux, it should be in DTint. Otherwise, it should be in DTdiff. Moving t to the other DT is done by deleting the old
version of t from one table and inserting the updated version into the other.
Log propagation of delete operations is straightforward. If the record t
was found in the source RID lookup, t is deleted.
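The propagation rules for Sin log records described above could be summarized in code as follows. The attribute t.home, denoting the DT the record currently resides in, and the other interfaces are illustrative assumptions.

def propagate_s_in(log_record, dt_diff, dt_int, dt_aux):
    t = (dt_int.lookup_by_source_rid(log_record.rid)
         or dt_diff.lookup_by_source_rid(log_record.rid))

    if log_record.is_insert():
        if t is not None:
            return                                   # already reflected
        image = log_record.after_image()
        target = dt_int if dt_aux.contains_equal(image) else dt_diff
        target.insert(image, source_rid=log_record.rid, lsn=log_record.lsn)

    elif log_record.is_update():
        if t is None or log_record.lsn <= t.lsn:
            return                # record deleted later, or state already newer
        t.apply(log_record)
        # The updated record may have to move between DTint and DTdiff.
        correct_home = dt_int if dt_aux.contains_equal(t) else dt_diff
        if t.home is not correct_home:
            t.home.delete(t)
            correct_home.insert(t)

    elif log_record.is_delete():
        if t is not None:
            t.home.delete(t)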
Propagation of Scomp log records
In contrast to derived Sin records, derived Scomp records may only belong to one table: DTaux. The reason for maintaining DTaux is only to decide which of DTint or DTdiff an Sin record should belong to.
Consider a log record ℓins ∈ L, describing the insertion of a record r into Scomp. The log record is ignored if the RID of r is found in a lookup in the source RID index of DTaux. This means that ℓins is already reflected. Otherwise, r is inserted into DTaux, and DTdiff is scanned to check if equal records e1, ..., em are represented there. If found, e1, ..., em are moved to DTint.
Let ℓupd ∈ L describe an update of a record r in Scomp. If the source RID of r is not found in DTaux, ℓupd is ignored. Otherwise, if the record t with the described source RID is found, and if ℓLSN > tLSN, t is updated. This update may require records to be moved between DTint and DTdiff. DTint and DTaux are first scanned for records equal to the old version of t. If the records e1, ..., em in DTint are found, and no equal records were found in DTaux, e1, ..., em are moved to DTdiff. DTdiff is then scanned for records equal to the updated version of t. All matching records are moved to DTint.
Propagation of a delete log record ℓdel ∈ L starts by identifying the derived version r of the record to delete in DTaux. This is done by a lookup in the source RID index. r is then deleted. If DTaux does not contain other records that are equal to r, DTint is scanned. All records in DTint that are equal to r are then moved to DTdiff.
6.1.4 Synchronization
As argued in Section 4.5, synchronization should not be started until the
states of the DTs are very close to the states of the source tables. Hence, in
what follows, we assume that this is the case.
If the blocking complete strategy is used, new transactions are first blocked
from accessing Sin and Scomp . Transactions already active on the source tables
are then either allowed to commit or are forced to abort. When all these
transactions have terminated, a final log propagation iteration is executed.
This makes the derived tables transaction consistent with the source tables.
The DTs are now ready to be used in a schema transformation or as MVs.
The non-blocking strategies differ between schema transformation and
MV creation purposes. They are therefore described separately.
Synchronization for Schema Transformations
When performing non-blocking synchronization for schema transformations,
transactions are allowed to perform updates in the source and derived tables
at the same time. Concurrency control is needed to ensure that the different
versions of the same record are not updated inconsistently. As described in
Section 5.3, this is done by setting locks in both tables.
Each record in Sin or Scomp is derived into only one DT record, and each
DT record is composed of only one source record. The diff/int schema transformations therefore belong to the simple lock forwarding (SLF) category
described in Section 5.3.
Synchronization starts by latching Sin and Scomp for the duration of a
log propagation iteration. Read operations are, however, not affected by
this latch. Since the states of the source and derived tables do not differ
much, this pause should be very brief. This log propagation makes the DTs
action consistent with the source tables. Remember that since locks have
been continuously forwarded by log propagation, all locks on source records
are also set on the derived records.
If the non-blocking abort strategy is used, transactions active on Sin and
Scomp are then forced to abort while new transactions are allowed to access
DTdiff and/or DTint. The aborting transactions can not acquire new locks.
Log propagation continues to forward the undo operations performed by the
aborting source table transactions. Locks forwarded from the source tables
are released once the log propagator encounters the abort log record of the
transaction holding it. When all source table transactions have terminated,
Sin and Scomp may be removed from the schema.
With non-blocking commit, source table transactions are allowed to access
new records. In addition to the lock and operation forwarding from source
to derived table performed in non-blocking abort, locks and operations must
also be forwarded from the derived tables to the source tables. The reason
for this is, of course, that transactions operating on the source tables may
access the records that have been modified in the derived tables. With SLF,
an operation and lock on one DT record t results in an operation and lock
on only the one record that t is derived from. Locks are always acquired
immediately on both versions of the record whereas locks are not released
until the log propagator encounters the transaction’s commit (or abort) log
record.
Figure 6.2: Horizontal Merge DT creation.
Synchronization for MV Creations
Since transactions do not update records in the DTs when used to create
MVs, operations will not be forwarded from DTdiff and DTint to Sin. Sin
and Scomp are first latched during one final log propagation iteration. Read
operations are still allowed, however. This log propagation makes the DTs
action consistent with the source tables. An MV maintenance method is
then added to DTdiff and/or DTint, before the source table latches are removed. The MV maintenance strategy is now responsible for keeping the
MV consistent with the source tables, and DT creation is now complete.
6.2 Horizontal Merge with Duplicate Inclusion
The horizontal merge DT creation uses the union relational operator. It takes
records from m source tables, S1 , . . . , Sm , and inserts these into a derived
table DThm . DThm may contain any subset of attributes from the source
tables. An example horizontal merge between two source tables, “Vinyl records” and “CD records”, is illustrated in Figure 6.2.
Figure 6.3: Horizontal Merge with Duplicate Inclusion. Notice the duplicate Miles Davis albums (record IDs r103 and r107 in the derived table).
The DT may be defined to keep or remove duplicates. If duplicates are
kept, all records in the source tables are represented in the DT. In this case,
DThm is self-maintainable. When duplicates are removed, however, multiple
source records may contribute to the same derived record. In that case, DT creation requires additional information, stored in an auxiliary table, to be self-maintainable (Quass et al., 1996). Horizontal merge with duplicate
inclusion is described in this section, whereas duplicate removal is discussed
in Section 6.3.
Figure 6.3 shows a slightly modified version of Figure 6.2 where the two
source tables contain one version of the Miles Davis album “Kind of Blue”
each. As expected when duplicates are not removed, the resulting DT contains both. Notice that the different source record IDs (RIDSrc) enable us
to identify which record is derived from which source record.
6.2.1 Preparation
Since all records from S1 , . . . , Sm are represented exactly once in DThm , horizontal merge with duplicate inclusion only suffers from the missing record
and state identification problems. By including the source RID and LSN in
DThm , the derived table is made self-maintainable.
During preparation, the derived table DThm is first added to the schema.
The table may include any subset of attributes from the source tables. Since
it is common for DBMSs to require a primary key in each table, an auto-generated primary key may have to be added to DThm . This generated primary
key is not shown in the figures of this section.
DT creation only uses the source RID attribute for record identification.
An index is therefore only required on this attribute.
6.2.2 Initial Population
Initial population starts by writing a fuzzy mark, containing the identifiers of
all transactions active in S1 , . . . , Sm , in the log. The source tables are then
read without the use of locks, and each record is then inserted into DThm .
The resulting initial image in DThm is not necessarily consistent with the source tables at any single point in time.
6.2.3 Log Propagation
All source records are represented in DThm when log propagation starts.
Furthermore, each source record contributes to only one derived record, and
each record in DThm is derived from only one source record. Log propagation
can therefore be performed like normal ARIES crash recovery redo work
(Mohan et al., 1992), which was discussed in Section 2.3.
As for difference and intersection DT creation, a fuzzy mark is first written
to the log. The relevant log records, i.e. operations on records in any of
S1 , . . . , Sm since the last fuzzy mark, are then retrieved and applied to DThm
in sequential order.
Propagation of a log record `ins ∈ L, describing the insert of a source record r into one of S1 , . . . , Sm , starts by checking if the record is already represented in DThm .
This is done by a lookup on r’s RID in the source RID index of DThm . If a
record is found, the log record is ignored; the derived table already reflects
this state (Løland and Hvasshovd, 2006c). If the RID is not found, the
derived version of r is inserted into DThm .
Figure 6.4: The Horizontal Merge shown in Figure 6.3, but with an added “Type” attribute in the “Records” table. With this information, the log propagator is able to insert the new “Oscar Peterson” record into the correct source table. (In the figure, a new vinyl record “Peterson, O - The Trio” appears both in the derived “Records” table and in the “Vinyl Records” source table, and the Type attribute, with values “Vin” and “CD”, identifies the source table of each derived record.)

Let the log records `upd ∈ L and `del ∈ L describe the update and deletion, respectively, of a record r in one of the source tables. Propagation of both
operations starts with a lookup in the source RID index of DThm . If no record
is found, the log record is ignored. If a record t is found, log propagation of
`del simply deletes t. Log propagation of `upd checks the LSN and updates t
if `LSN > tLSN .
If the derived table will be used for a schema transformation, locks are
maintained in DThm as part of log propagation.
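As an illustration, the redo-style propagation rules above can be summarized in the following Python sketch. The in-memory representation (a dictionary keyed on the source RID) and the field names of the log records are chosen for illustration only, and the sketch assumes that update log records carry the new values of the changed attributes.

def propagate_hm_duplicate_inclusion(dt_hm, log_records):
    # dt_hm: {source_rid: {'lsn': ..., 'values': {...}}}, i.e. the derived
    # table indexed on the source RID attribute.
    for rec in log_records:                       # applied in sequential order
        rid, lsn = rec['rid'], rec['lsn']
        existing = dt_hm.get(rid)
        if rec['op'] == 'insert':
            if existing is None:                  # not yet reflected in DThm
                dt_hm[rid] = {'lsn': lsn, 'values': dict(rec['values'])}
        elif rec['op'] == 'delete':
            if existing is not None:
                del dt_hm[rid]
        elif rec['op'] == 'update':
            # Idempotence: the update is applied only if the log record
            # describes a newer state than the derived record.
            if existing is not None and lsn > existing['lsn']:
                existing['values'].update(rec['values'])
                existing['lsn'] = lsn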
6.2.4 Synchronization
The synchronization step is performed in the same way as synchronization
of diff/int DT creation and is therefore not repeated. There is, however, one
potential problem with non-blocking commit during schema transformations.
Consider Example 6.2.1:
Example 6.2.1 (Lack of Information During Non-blocking Commit)
A horizontal merge between two tables containing CD records and Vinyl
records was illustrated in Figure 6.3 on page 78. Notice that the derived
table does not include any information that can be used to determine which
of the source tables a derived record would belong to.
During non-blocking commit synchronization, a transaction inserts a new
vinyl record, “Oscar Peterson, The Trio”, into the derived table “Records”.
Since the fact that the new record is a vinyl record can not be expressed in
the attributes of the DT, the log propagator has no way of knowing which
source table it belongs to.
Example 6.2.1 illustrates an important problem: non-blocking commit
synchronization can only be used for horizontal merge if the log propagator
can determine which source table an inserted DThm record belongs to. When
this is not the case, non-blocking abort must be used instead. Figure 6.4
illustrates how adding a “Type” attribute can be used to solve the problem
of Example 6.2.1.
6.3 Horizontal Merge with Duplicate Removal
Horizontal merge DT creation is more complex when duplicates are removed, since multiple source records may then contribute to the same derived record. Although the method still only suffers from the missing record and state identification problems, these can not be solved simply by adding the source RID and LSN as attributes in DThm . The proposed solution is to create an auxiliary table A in addition to DThm . The source RIDs and LSNs are then stored in A.
Figure 6.5(a) illustrates the same horizontal merge shown in Figure 6.3,
but this time with duplicate removal. Thus, the duplicate Miles Davis album
“Kind of Blue” is stored only once in the DT. As shown, the auxiliary table
A contains three attributes: the RID of the record in the derived table, in
the source table, and the current LSN of the record.
Another example of horizontal merge with duplicate removal is illustrated
in Figure 6.5(b). The source tables are equal to those in Figure 6.5(a), but
this time DThm only contains the artist attribute. The result is that all three
“Miles Davis” albums in the source tables are merged into one derived record.
Regardless of the number of source records contributing to a record in DThm ,
A is able to store the record and state identification information required to
perform the creation. Together, DThm and A are self-maintainable.
Figure 6.5: Horizontal Merge DT Creation with Duplicate Removal. (a) Horizontal merge DT creation with duplicate removal: the duplicate “Kind of Blue” album appears only once in the derived table “Unique Records”, and the auxiliary table stores the derived RID (RIDder), source RID (RIDSrc) and LSN for each source record. (b) Horizontal merge where the derived table “Unique Artists” only contains the artist attribute: two records from “CD Records” and one from “Vinyl Records” contribute to the same derived Miles Davis record.

6.3.1 Preparation Step
As discussed, horizontal merge with duplicate removal suffers from the missing record and state identification problems. To solve these, two tables are
required: the derived table, DThm , and an auxiliary table A. A will include
an attribute for the record ID in the derived and source tables, in addition
to the LSN. DThm may consist of any subset of attributes from S1 , . . . , Sm .
Derived records are identified by performing a lookup in A, and DThm does
therefore not have to include the source RID.
An index is created on the RID in DThm and on both source RID and derived RID in A. Records are considered duplicates in DThm if they have equal
attribute values. This check for equality must be performed very frequently
by the DT creation process. Another index should therefore be added to the
attribute in DThm that differs the most between the source records. If it is not clear to the DBA which attribute is most divergent, statistics should be acquired from the DBMS.
6.3.2 Initial Population Step
As for DT creation using other relational operators, the initial population
starts by writing a fuzzy mark to the log. This log record contains the
identifiers of all transactions active on any of the source tables, S1 , . . . , Sm .
The source tables are then read fuzzily, and the resulting set of records,
denoted SR, are inserted into DThm .
Insertion of a source record r ∈ SR into DThm starts by performing a
lookup in DThm . This is done to identify if a record teq with equal attribute
values is already represented. An index on a divergent attribute, as described
in the previous section, would speed up this search considerably. If no equal record is found, a record tnew , containing the wanted subset of attribute values from r, is inserted into DThm . A record a, consisting of the RID of r, the RID of tnew and the LSN of r, is then inserted into A.
If an equal record teq was found in DThm , the insertion into DThm is not
performed. Instead, a is inserted into A, consisting of the RID of r, the RID
of teq and the LSN of r.
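A sketch of this initial population is given below, under the simplifying assumption that the fuzzily read records and the two tables are plain Python structures. The duplicate check corresponds to the lookup on the divergent attribute index, and the generated derived RIDs are illustrative only.

def initial_population_duplicate_removal(source_records, dt_hm, aux):
    # dt_hm: {derived_rid: attribute values}; aux: list of
    # {'rid_der', 'rid_src', 'lsn'} records (the auxiliary table A).
    for r in source_records:                      # r: {'rid', 'lsn', 'values'}
        # Check whether an equal record is already represented in DThm.
        t_eq = next((rid for rid, vals in dt_hm.items()
                     if vals == r['values']), None)
        if t_eq is None:
            derived_rid = 'd%d' % (len(dt_hm) + 1)   # generated derived RID
            dt_hm[derived_rid] = dict(r['values'])
        else:
            derived_rid = t_eq                       # duplicate: reuse record
        aux.append({'rid_der': derived_rid,
                    'rid_src': r['rid'],
                    'lsn': r['lsn']})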
6.3.3 Log Propagation Step
Since the source RIDs are only stored in the auxiliary table A, all log propagation rules must perform lookups in this table to identify derived records.
When the log propagator has written a fuzzy mark to the log, all log
records relevant to S1 , . . . , Sm are retrieved. The log records are then applied
in sequential order. If DThm will be used to perform a non-blocking schema
transformation, locks should be maintained as part of log propagation, as
discussed in Section 5.3.
Let the log record `ins ∈ L describe the insertion of a record r into one
of the source tables S1 , . . . , Sm . Propagation starts by performing a lookup
on the RID of r in A. If the RID is found, the logged operation is already reflected in DThm , and `ins is therefore ignored. Even if the record is not represented in
DThm , it may still be a duplicate of an existing record. Thus, DThm must be
scanned to check if an existing record teq with equal attribute values exists.
Assuming that an equal record is not found in the scan of the derived
table, a record tnew , derived from r, is inserted into DThm . A record anew ,
containing the RID of r and tnew , is then inserted into A. The LSN of this
record is set to that of `ins .
If a duplicate record teq is found in DThm , however, only the insert of aeq
into A is performed. This record stores the RID of teq instead of tnew , but is
otherwise equal to anew .
Consider a log record `del ∈ L, describing the deletion of a record r from
any of S1 , . . . , Sm . A lookup is first performed on the source RID index of A.
Assuming that a record adel with the RID of r is found, another lookup is
performed on A’s derived RID index. If adel is the only record in A with this
derived RID, r is the only source record contributing to the derived record
t ∈ DThm , where the RID of t is equal to the derived RID of adel . In this
case, both t and adel are deleted. Otherwise, if adel is not the only record in
A with this derived RID, only adel is removed.
New duplicates may form, and old duplicates may be split as a result of
update operations. As for insert and delete operations, log propagation of a
log record `upd ∈ L, describing an update of a source record r, starts with a
lookup in the source RID index of A. If a record a ∈ A with the source RID
of r is not found, or if the LSN of a indicates that a newer state is already
reflected, `upd is ignored. If `LSN > aLSN , however, the update should be
applied. In this case, a lookup in the derived RID index of A is performed
to identify any duplicates to the pre-update version of the record t ∈ DThm
derived from r. If a is the only record in A with this derived RID, t does not
represent duplicates.
Assume for now that t is only derived from r. DThm is now scanned to
find if there is a record with equal attribute values to t after `upd has been
applied to it. If there is not, t is updated as dictated by `upd . If the updated
record is a duplicate of a record tdup , however, t is deleted, and the derived
RID of a is updated to refer to the RID of tdup .
In the case that t is derived from more source records than r alone,
`upd can not be applied directly to t. DThm is first scanned to find if t has
duplicates after `upd has been applied. If the updated record is a duplicate of
tdup ∈ DThm , the derived RID of a is updated to refer to tdup . If the updated
version of t is not equal to any existing record in DThm , a new record tnew is
inserted into DThm . tnew represents t after `upd has been applied to it. The
derived RID of a is then set to the RID of tnew .
In all four update cases described above, the LSN of a is updated to the
LSN value of `upd .
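The four update cases can be summarized in the sketch below. It reuses the illustrative structures from the initial population sketch and assumes, as a simplification, that the update log record carries the complete post-update image of the derived attribute subset.

def propagate_update_duplicate_removal(dt_hm, aux, upd):
    a = next((x for x in aux if x['rid_src'] == upd['rid']), None)
    if a is None or a['lsn'] >= upd['lsn']:
        return                                  # not represented or already reflected
    contributors = [x for x in aux if x['rid_der'] == a['rid_der']]
    new_values = dict(upd['values'])
    dup_rid = next((rid for rid, vals in dt_hm.items()
                    if vals == new_values and rid != a['rid_der']), None)
    if len(contributors) == 1:                  # t is derived from r alone
        if dup_rid is None:
            dt_hm[a['rid_der']] = new_values    # update t in place
        else:
            del dt_hm[a['rid_der']]             # t has become a duplicate of tdup
            a['rid_der'] = dup_rid
    else:                                       # t has other contributors as well
        if dup_rid is None:
            t_new = 'd%d' % (len(dt_hm) + 1)    # insert updated version as tnew
            dt_hm[t_new] = new_values
            a['rid_der'] = t_new
        else:
            a['rid_der'] = dup_rid              # repoint a to the duplicate
    a['lsn'] = upd['lsn']                       # done in all four cases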
6.3.4 Synchronization Step
The blocking complete synchronization, non-blocking abort synchronization
for schema transformations and non-blocking synchronization for MV creation strategies work as described for difference and intersection. These
strategies are not described further. The non-blocking commit strategy for
schema transformations is different, however, and is therefore described next.
The reason for this difference is that multiple source records may contribute
to the same derived record. Hence, horizontal merge with duplicate removal
belongs to the Many-to-One Lock Forwarding (M1LF) category.
Non-blocking commit synchronization of schema transformations starts by latching S1 , . . . , Sm while a log propagation iteration is performed. The latches do not affect read operations in the source tables. When the iteration is complete, DThm is action consistent with S1 , . . . , Sm . Because locks
have been maintained as part of log propagation, locks that are set on source
records are also set on their counterparts in DThm . These locks must use the
modified lock compatibility matrix in Figure 5.4 on page 65.
With the modified lock compatibility, locks forwarded from source records
do not conflict with each other. New transactions are now allowed to operate
on records in DThm , and the transactions that are active in S1 , . . . , Sm may
continue processing on the source tables. Since the old transactions are
allowed to access new records, locks and operations must also be forwarded
from the DT to the source tables. Hence, for the rest of the synchronization
step, a transaction in DThm must acquire locks on both the DThm record t it tries to access and all source records in S1 , . . . , Sm that t is derived
from. The log propagator continues to process the log to ensure that the
operations executed in the source tables are also executed in the DT and vice
versa. Forwarded locks are not released until the log propagator processes
the commit or abort log record of the transaction owning a lock. When all
source transactions have terminated, S1 , . . . , Sm and A may be removed from
the schema.
Figure 6.6: Horizontal Split DT creation. Grey attributes are used internally by the DT creation process, and are not visible to normal transactions. (The source table “Music”, with attributes Artist, Record, Type, RID and LSN, is split on the Type attribute into the derived tables “Vinyl Records”, “CD Records” and “Music DVDs”, each carrying the source RID (RIDSrc) and LSN.)

6.4 Horizontal Split Transformation
Horizontal split DT creation uses the selection relational operator. The transformation takes records from one source table, S, and distributes them into
two or more derived tables DT1 , . . . , DTm using selection criteria. An example horizontal split of a table containing music on different media is illustrated in Figure 6.6. Other examples include splitting an employee table into “New York employee” and “Paris employee” based on location, or into “high salary employee” and “low salary employee” based on a salary condition like “salary > $40,000”. The selection criteria may result in non-disjoint, i.e. overlapping, sets, and may not include all records from the source table.
6.4.1 Preparation
Horizontal split suffers from the missing record pre-state problem since the
selection criterions may not include all records. As an example, consider the
employee table that was split into New York and Paris offices. An employee
in London would not match any of these, and is therefore not part of any
of the resulting DTs. If, during DT creation, the employee is moved to the
Paris office, the old state of the record is required before it can be updated
and inserted into the table containing Paris employees. The reason for this
is that update log records only contain the new values of the attributes that
are changed.
The missing record pre-state problem is solved by adding an auxiliary
table A, containing all records that do not qualify for any of the DTs. The
selection criterion for this table is the negation of the disjunction of the DTs' selection criteria. Thus, all records are guaranteed to belong to either one or more of the
derived tables DT1 , . . . , DTm , or to A.
Horizontal split DT creation also suffers from the missing record and state
identification problems. These problems are solved by including the source
RID and LSN in all derived tables.
The preparation step consists of creating one table for each selection
criterion result set and one for the auxiliary information. In this section,
DT1 , . . . , DTm , A will be called the derived tables.
All tables must include the source RID and LSN, in addition to any subset of attributes from S. As for difference and intersection, it is assumed
that a candidate key from S is among the included subset of attributes.
Alternatively, the derived tables may include a generated key, e.g. an auto-incremented number, that is assigned to all records. Thus, duplicate removal
is not considered here. If required, duplicate removal as described for horizontal merge in Section 6.3 may be used.
The log propagation rules always use the source record ID to identify
records. Indices on other attributes are therefore not required.
6.4.2 Initial Population
Initial population starts by writing a fuzzy mark, containing the transaction
identifiers of all transactions active on S, in the log. S is then read without
setting locks, and each source record is then inserted into one or more derived
tables, depending on the selection criteria it satisfies. If the record does
not match any selection criterion, it is inserted into A.
6.4.3 Log Propagation
After initial population, all source records are represented in at least one
derived table. Also, the derived records have a source RID and LSN defining
which record and state they represent. With this information, the derived
tables are self-maintainable.
Each log propagation iteration starts by writing a fuzzy mark to the log L.
All log records between the last fuzzy mark and the new one that are relevant
to S are then retrieved. The log records are then processed sequentially using
the propagation rules described below. If the DTs will be used to perform a
non-blocking schema transformation, locks are maintained on derived records
as part of the log propagation.
Consider a log record `ins ∈ L, describing the insert of a record r into
S. A lookup is first performed on the source RID indices of DT1 , . . . , DTm
and A. If r’s record ID, rRID , is found in any of these tables, the operation
is already reflected and is ignored. Otherwise, r is evaluated with respect to
the selection criteria and inserted into all DTs it matches.
Propagation of an update log record `upd ∈ L, updating a record r in S,
starts by identifying all records t1 , . . . , tn ∈ DT1 , . . . , DTm , A derived from
it. This is done by performing a lookup of rRID in the source RID index of
the derived tables.
Since operations performed on a source record are always applied to all
derived versions of it, all records in DT1 , . . . , DTm , A derived from r have the
same LSNs. More formally,
∀tx ∀ty ((tx , ty ∈ {DT1 , . . . , DTm , A} ∧ txRIDSrc = tyRIDSrc ) ⇒ txLSN = tyLSN )
where txRIDSrc and tyRIDSrc are the source RIDs for the derived records tx
and ty , respectively. This can be used to determine whether or not `upd
has already been applied by inspecting the LSN of only one of the derived
records.
If none of the attributes that are updated by `upd are used in the selection
criteria, t1 , . . . , tn are simply updated. If a selection criterion attribute is
updated, however, two sets of derived tables are identified. The first is the
set Ppre of derived tables where the pre-update version of the records derived
from r were stored. These are the same tables that t1 , . . . , tn were found in.
The second is the set Ppost of DTs where the updated versions of the records
derived from r should be stored. The update is processed in two steps:
First, for all derived records t ∈ {t1 , . . . , tn } identified by the initial source
RID lookup, t is deleted if t is stored in table T and T ∉ Ppost . Otherwise,
if t is stored in T and T ∈ Ppost , t is updated with the new attribute values
of `upd . When all records found in the initial lookup have been processed,
the updated version of the derived record is inserted into all tables I where
I ∉ Ppre and I ∈ Ppost .
When a delete log record `del ∈ L is encountered by the log propagator,
a lookup is performed on the source RID index of all derived tables, using
the RID of the record described by `del . The records that are found are
simply deleted.
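A sketch of the update propagation rule for horizontal split is shown below. The table names, the 'AUX' entry standing in for the auxiliary table, and the assumption that the log record carries the full post-update record image are illustrative simplifications and not part of the thesis.

def propagate_update_horizontal_split(derived_tables, criteria, upd):
    # derived_tables: {name: {source_rid: {'lsn': ..., 'values': {...}}}},
    # where the 'AUX' table holds records matching no selection criterion.
    # criteria: {name: predicate over the attribute values}.
    rid, lsn, new_vals = upd['rid'], upd['lsn'], dict(upd['values'])
    p_pre = [n for n, tbl in derived_tables.items() if rid in tbl]
    if not p_pre:
        return                                  # r is not represented: ignore
    # All records derived from r carry the same LSN, so it is checked once.
    if derived_tables[p_pre[0]][rid]['lsn'] >= lsn:
        return
    p_post = [n for n, pred in criteria.items() if pred(new_vals)] or ['AUX']
    for n in p_pre:                             # delete or update old versions
        if n in p_post:
            derived_tables[n][rid] = {'lsn': lsn, 'values': dict(new_vals)}
        else:
            del derived_tables[n][rid]
    for n in p_post:                            # insert newly qualifying versions
        if n not in p_pre:
            derived_tables[n][rid] = {'lsn': lsn, 'values': dict(new_vals)}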
6.4.4 Synchronization
The blocking complete and non-blocking MV synchronization strategies work
in the same way as described for difference and intersection. Hence, we only
focus on non-blocking synchronization for schema transformations here.
Synchronization for Schema Transformations
Non-blocking abort starts by latching S during a log propagation iteration
that makes DT1 , . . . , DTm , A action consistent with S. The latch does not
affect read operations. With horizontal split, a record in DT1 , . . . , DTm , A
may only be derived from one source record, while one source record may contribute to multiple DT records. Hence, this schema transformation belongs
to the One-to-Many Lock Forwarding (1MLF) category.
In 1MLF, one source lock may have to be forwarded to multiple DT records since one source record may contribute to records in multiple DTs. As always, the
next steps of non-blocking abort are to release the latch and force transactions in the source table to abort. Locks forwarded from S are released in
DT1 , . . . , DTm once the abort log record of the transaction holding the lock
is encountered by the log propagator. When all source transactions have
terminated, S and A may be removed from the schema.
Since the transactions on S may access new records, non-blocking commit
synchronization requires the log propagator to forward operations performed
on a record r in DT1 , . . . , DTm to the record s ∈ S it is derived from. However, the operation must also be propagated to all records s contributes to,
as described in Section 5.3. If not, the other records t1 , . . . , tu derived from s
would not be consistent with r. As discussed in Section 5.3, this transaction
behavior differs from the behavior after synchronization has completed. If
this is not acceptable, non-blocking abort should be used instead.
6.5 Vertical Merge
The vertical merge DT creation method creates a derived table DTvm by
applying the full outer join (FOJ) operator on two source tables, Sl and Sr .
Sl is the left, and Sr the right table of the join. In contrast to inner join and
left and right outer join operators, FOJ is lossless in the sense that records
with no join match are included in the result. In addition to being lossless,
there are multiple reasons for focusing on full outer join. First, the full outer
join result can later be reduced to any of the inner/left/right joins by simply
deleting all records that do not have the necessary join matches, whereas
going the opposite direction is not possible. Second, full outer join is the
only one of these operators that does not suffer from the missing record pre-state problem since all source records are represented at least once in the DT (Løland and Hvasshovd, 2006b). An example vertical merge DT creation is shown in Figure 6.7. The figure will be used as an example throughout this section.

Figure 6.7: Example vertical merge DT creation. (The source tables “Employee” and “PostalAddress” are full outer joined on the zip code into the derived table “EmployeePost”. Each derived record carries the RIDs and LSNs of both source records (RID_L, LSN_L, RID_R, LSN_R), and records without a join match are padded with NULL values.)
Vertical merge DT creation suffers from the missing record and state identification problems. As argued in Section 5.1, these problems can be solved
by including the record IDs and LSNs from both source tables in DTvm . This
method is used in this section. An alternative method has been presented by
the authors (Løland and Hvasshovd, 2006c), however. It uses candidate keys
to identify records and totally ignores LSNs in DTvm . Because the latter
method does not solve the record and state identification problems, it has
some flaws compared to the one used here. First, the log propagation rules
are much less intuitive and second, it cannot handle semantically rich locks
(Løland and Hvasshovd, 2006b). Thus, in contrast to the method presented
here, it cannot handle delta updates (Korth, 1983). On the other hand, it
requires slightly less storage space since the two source RID attributes and
an additional LSN are not added to DTvm .
In the following sections, the four steps of DT creation are explained in
detail for the vertical merge operator.
6.5.1 Preparation
During preparation, the derived table DTvm is added to the database schema.
This table may include a subset of attributes from the two source tables, but
some attributes are mandatory. To solve the record and state identification
problems, the source RIDs and LSNs from both Sl and Sr are needed.
Since records that should be affected by a logged operation are identified
by the RID provided by the log record, indices are added to each of the
source RID attributes. In addition, an index is added to the join attribute(s)
in DTvm since new join matches may have to be identified as a result of inserts
and updates of source table records. Together, these indices provide direct
access to all affected records for any operation that may be encountered.
6.5.2 Initial Population
As for the other relational operators, initial population starts by writing a
fuzzy mark to the log, containing the identifiers of transactions that have accessed either of the two source tables Sl and Sr . The source tables are then read without
using locks. Once read, the full outer join operator is applied, and the joined
records are inserted into DTvm . At this point, the state of DTvm is called the
initial image.
6.5.3 Log Propagation
No assumption is made on whether the join is over a one-to-one, one-to-many
or a many-to-many relationship. An implication of this is that records from
both source tables may be represented in multiple records in DTvm . In what
follows, it is assumed that the vertical merge is defined over an equijoin. The
method can, however, be modified to use other comparison operators or, in
the case of cartesian product, no comparison operator.
Insert rules
Consider a log record `ins ∈ L, describing the insertion of a record r into
source table S, S ∈ {Sl , Sr }. The first step of propagating `ins is to perform
a lookup of the RID of r, denoted rRID , in either the left or right source
RID index of DTvm . The index to use depends on which source table r was
originally inserted into. If one or more records in DTvm have rRID as the
source RID, the logged operation is already reflected, and `ins is ignored.
If no records with the source RID value of rRID are found in DTvm , every
DTvm record with a join attribute value matching that of r, denoted rJM , is
identified. The set of records with a matching join attribute value is called
JM (Join Match). If no join matches were found, i.e. JM = ∅, r is joined
with the null-record and inserted into DTvm . The source RID and LSN are set to those of `ins .
If one or more join matches were found, all records t ∈ JM are composed
of two source records. One of these is from the same table as r, and is denoted
t1 . The other part, t2 , is from the other source table. If two or more records
in JM consist of the same t2 part, only one of these records is used. Thus, for each record t ∈ JM with a t2 part that has not already been processed,
t is updated with the attribute values of r iff t1 is the null-record. If t1 is not
the null-record, the attribute values of t2 are read and joined with r. This
new record is then inserted into DTvm with source RIDs and LSNs from both
`ins and t2 .
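The insert rule can be illustrated by the following sketch, where each DTvm record is modelled as a pair of (RID, LSN, values) triples, one for each side of the join, with None playing the role of the null-record; this representation is an assumption made only for the sketch.

def propagate_insert_vertical_merge(dt_vm, side, ins, join_attr):
    # side is 'l' or 'r', depending on which source table r was inserted into.
    other = 'r' if side == 'l' else 'l'
    rid, lsn, vals = ins['rid'], ins['lsn'], ins['values']
    if any(t[side] and t[side][0] == rid for t in dt_vm):
        return                                  # the insert is already reflected
    jm = [t for t in dt_vm
          if t[other] and t[other][2][join_attr] == vals[join_attr]]
    if not jm:                                  # JM is empty: join with null-record
        dt_vm.append({side: (rid, lsn, vals), other: None})
        return
    processed = set()                           # each t2-part is used only once
    for t in jm:
        t2_rid = t[other][0]
        if t2_rid in processed:
            continue
        processed.add(t2_rid)
        if t[side] is None:                     # t1 is the null-record: update t
            t[side] = (rid, lsn, vals)
        else:                                   # join r with t2 as a new record
            dt_vm.append({side: (rid, lsn, vals), other: t[other]})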
Update rules
Propagation of an update log record `upd ∈ L, updating the record r from
source table S, S ∈ {Sl , Sr }, starts by identifying all records in DTvm partially
composed of r. This is done by a lookup of rRID in either the left or right
source RID index of the derived table, depending on which table r belongs
to. The set of records that are found is called P. If P = ∅, or if the LSN of all
records p ∈ P is greater than the LSN of `upd , the log record is ignored. As
argued in Section 6.4.3, the LSNs of all p ∈ P are equal. The LSN is therefore
checked only once.
The logged update is applied to DTvm if P ≠ ∅ and the LSN of p is lower
than that of `upd . If the join attribute values of r are not updated, all records
p ∈ P are simply updated with the new attribute values and LSN of `upd .
This is similar to crash recovery redo work as described in Section 2.3, but
applied to multiple instances of the same source record.
If the join attribute of r is updated, however, log propagation becomes
more complex. An additional set, N (new join matches), is first defined. N
contains all records in DTvm that matches the updated join attribute value
of r. It is found by a lookup on the join attribute index on DTvm .
Since we assume that the vertical merge DT creation is defined over an
equijoin, P and N are disjoint, i.e. not overlapping sets. Records in P
are first processed. Each DTvm record p ∈ P is composed of two joined source records: r and one record p2 from the other source table. If p2 is
represented in at least one other record in DTvm , p is deleted. This is checked
by a lookup of p2 source RID in the index of DTvm . If p2 is not represented
in any other record in DTvm , however, p2 is joined with the null-record. This
is done because all source records must be represented when using the full
outer join operator.
If N = ∅, r is padded with null-values and inserted. If N ≠ ∅, each record
n ∈ N is analyzed. Again, n is composed of two joined records: one record
n1 from the same table as r, and one record n2 from the other table. If n is
composed of n2 and the null-record, n is updated with the attribute values
of r. If n is the join of n2 and another record n1 , a new record is inserted
into DTvm containing the join of r and n2 . In both cases, the source RID and LSN are set to reflect the log record.
Delete rules
The propagation of a delete log record `del ∈ L, describing the deletion of a source record r, is fairly intuitive. First, the set D of all records in DTvm partially composed of r is identified. For each record
d ∈ D, d is deleted if it consists of r joined with the null record or a record
d2 that is represented in at least one other record in DTvm . If d is the only
record in DTvm that contains d2 , d2 is joined with the null record.
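A sketch of the delete rule, using the same illustrative record representation as the insert sketch above:

def propagate_delete_vertical_merge(dt_vm, side, rid):
    other = 'r' if side == 'l' else 'l'
    d_set = [t for t in dt_vm if t[side] and t[side][0] == rid]
    for d in d_set:
        d2 = d[other]
        if d2 is None:                          # r was joined with the null-record
            dt_vm.remove(d)
            continue
        elsewhere = any(t is not d and t[other] and t[other][0] == d2[0]
                        for t in dt_vm)
        if elsewhere:                           # d2 survives in another DTvm record
            dt_vm.remove(d)
        else:                                   # keep d2, joined with the null-record
            d[side] = None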
6.5.4 Synchronization
The synchronization step is started when the state of DTvm is very close to the
states of Sl and Sr . The blocking complete and non-blocking MV synchronization strategies work as described for difference and intersection. These are
not explained further.
Non-blocking Synchronization for Schema Transformations
As discussed in Section 5.3, vertical merge DT creation belongs to the many-to-many lock forwarding (MMLF) category. This means that the modified
lock compatibility matrix, shown in Figure 5.4 on page 65, must be used so
that locks forwarded from the source tables to DTvm do not conflict with
each other. The source locks do, however, conflict with locks acquired on
DTvm records. Note that if there are many more Sl records than Sr records,
a few locks in Sr may result in a considerable amount of locks in DTvm .
If the non-blocking abort strategy is used, transactions active on Sl and
Sr are forced to abort, while new transactions are allowed to start processing
on the unlocked records in DTvm . Log propagation continues to apply the
undo operations performed by the aborting source table transactions. Source
locks in DTvm are removed once the abort log record of the owner transaction
has been processed. Sl and Sr may be removed when all source transactions
have terminated.
In the case of non-blocking commit, transactions active on the source
tables are allowed to access new records. A consequence of this is that locks
and operations from transactions in DTvm must be forwarded to Sl and Sr . This makes synchronization more complicated because the update of a record t in DTvm may have to be propagated not only to the source records tl and tr that t is derived from, but to all DTvm records derived from tl and tr .

Figure 6.8: The updated salary of Hanna (see Example 6.5.1) requires a salary update of the “QA” position, resulting in an increased salary for all “QA” personnel. (The source tables “Employee” and “Position” are joined on PosID; both the Hanna and Erik records in the derived table are derived from the “QA” position record.)
Consider the following example:
Example 6.5.1 (Updates during non-blocking commit)
Figure 6.8 illustrates the vertical merge schema transformation of source
tables Employee and Position. There are two employees with position “QA”.
During non-blocking commit synchronization, a transaction in DTvm updates
the salary attribute of Hanna from “$33,000” to “$34,000”. This update
requires that the “QA” record in the Position source table is locked and
updated accordingly. Furthermore, the Erik record in DTvm , which is also
derived from this source record, has to be locked and updated with a new
salary to maintain consistency.
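The forwarding in Example 6.5.1 can be sketched as follows. The lock interface and the dictionary layout are assumptions made for the sketch; it only illustrates that the update of one DTvm record is forwarded both to the source record it stems from and to all sibling DTvm records derived from that source record.

def forward_dtvm_update(txn, dt_vm, source_rows, acquire_lock,
                        derived_rid, side, attr, value):
    # dt_vm: {derived_rid: {'l': {'rid': ..., 'values': {...}} or None, 'r': ...}}
    # source_rows: {'l': {rid: {...}}, 'r': {rid: {...}}}
    t = dt_vm[derived_rid]
    src_rid = t[side]['rid']                     # e.g. the "QA" Position record
    acquire_lock(txn, 'DTvm', derived_rid)
    t[side]['values'][attr] = value              # the update issued in DTvm
    acquire_lock(txn, 'source_' + side, src_rid) # forwarded to the source table
    source_rows[side][src_rid][attr] = value
    for rid2, t2 in dt_vm.items():               # forwarded to sibling DTvm records
        if rid2 != derived_rid and t2[side] and t2[side]['rid'] == src_rid:
            acquire_lock(txn, 'DTvm', rid2)
            t2[side]['values'][attr] = value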
It should be clear from Example 6.5.1 that the choice of non-blocking
abort vs. commit is not transparent for transactions operating on DTvm .
With non-blocking commit, the behavior of transactions operating during
synchronization differs from that of transactions after synchronization completes.
Figure 6.9: Vertical split over a Candidate Key. (The source table “Employee”, with attributes EmpID, Name, Address, Salary, RID and LSN, is split into “ModifiedEmp” (EmpID, Name, Address) and “Salary” (EmpID, Salary); both derived tables carry the source RID and LSN.)
Before choosing to use non-blocking commit synchronization, the consequences must be considered carefully. The increased number of locks required, the forwarding of numerous update operations between the tables, and the non-transparent behavior of transactions operating on DTvm during and after synchronization may outweigh the benefit that transactions on the source tables are not aborted.
6.6 Vertical Split over a Candidate Key
Vertical split is the inverse of the full outer join DT creation method described
in the previous section, and uses the projection relational operator. It takes
one source table S as input, and creates two derived tables, DTl (left result)
and DTr (right result), each containing a subset of the source table attributes.
Some attributes, called the split attributes, must be included in both DTs.
These attributes can later be used to join DTl and DTr . In what follows, we
assume that the split attributes are the only attributes shared by DTl and DTr .
If the split attributes form a candidate key in S, each source record will be
derived into exactly one record in each of the DTs, and each record in the DTs
will be derived from exactly one source record. The DT creation described in
this section therefore belongs to the One-to-Many Lock Forwarding category.
An example split is illustrated in Figure 6.9.
If S is split over a functional dependency that is not a candidate key in
S, multiple source records may have equal split attribute values and may
therefore contribute to the same derived record in DTr . This type of split
is typically executed to perform a normalization of the database schema.
Vertical split DT creation over a functional dependency is described in Section 6.7.
6.6.1 Preparation
In vertical split, two derived tables, DTl and DTr , are first added to the
schema. They typically include two different subsets of attributes from S,
but must both include the candidate key used as the split attribute.
Both DTl and DTr suffer from the missing record and state identification
problems. Since the records in the DTs are derived from exactly one source
record, these problems are solved by adding source RID and LSN directly to
the derived tables.
Since log propagation will identify all derived records based on the source
RID, indices are only required on this attribute in DTl and DTr .
6.6.2 Initial Population
Initial population starts by writing the fuzzy mark, containing the identifiers
of all transactions active in S, to the log. S is then read fuzzily, and for each
record found in S, one record is inserted into DTl and DTr .
6.6.3 Log Propagation
Log propagation is run in iterations. Each iteration writes a new fuzzy mark
to the log L, and then retrieves log records relevant to S since the last fuzzy
mark (or since the oldest operation of active transactions in the first iteration). These log records are then applied to DTl and DTr in sequential order.
If the DTs will be used to perform a non-blocking schema transformation,
locks are maintained as part of log propagation.
When a log record `ins ∈ L, describing the insertion of a record r into S,
is encountered by log propagation, a lookup of the RID of r is first performed
in the source RID index of one of the DTs. If the RID is found, r is already
reflected in the DTs. If not, the wanted subset of attributes from r is inserted
into DTl and DTr . Note that it is not necessary to perform a RID lookup in
both derived tables since r is either reflected in both or none of them.
A delete log record, describing the deletion of a record r from S, is propagated by performing a lookup in the source RID index of one of the DTs.
The log record is ignored if the source RID is not found. Otherwise, the
identified record is deleted, and the same process is then applied to the other
DT.
Consider a log record `upd ∈ L, describing an update of a record r in S.
Again, log propagation starts with a lookup in the source RID index of one of
the DTs. `upd may affect attributes in only DTl or DTr . If so, the lookup is
performed in this DT. Assuming that a derived record t with correct source
RID is found, and that t has a lower LSN than `upd , the described update is
applied. If `upd affects attributes in the other DT as well, the procedure is
repeated for that table.
Most DBMSs do not allow primary key updates. Thus, the described
rules work under the assumption that the DT primary key attributes are not
updated. This assumption holds if the same attributes are used as primary
keys in S and the DTs. The vertical split example in Figure 6.9 illustrates
this. If another candidate key from S is used as primary key in DTl and DTr ,
however, the log propagator may encounter updates of these attributes. If
this is the case, the described update rules must be modified to delete the pre-update derived records and then insert the updated ones unless the DBMS
allows primary key updates.
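Under the assumption that the primary key is not updated, the propagation rules for vertical split over a candidate key can be sketched as follows; the container layout and the attribute subsets attrs_l and attrs_r are illustrative only.

def propagate_vertical_split_ck(dt_l, dt_r, attrs_l, attrs_r, rec):
    # dt_l and dt_r map the source RID to {'lsn': ..., 'values': {...}}.
    rid, lsn = rec['rid'], rec['lsn']
    if rec['op'] == 'insert':
        if rid not in dt_l:                     # reflected in both DTs or in none
            dt_l[rid] = {'lsn': lsn,
                         'values': {a: rec['values'][a] for a in attrs_l}}
            dt_r[rid] = {'lsn': lsn,
                         'values': {a: rec['values'][a] for a in attrs_r}}
    elif rec['op'] == 'delete':
        dt_l.pop(rid, None)
        dt_r.pop(rid, None)
    elif rec['op'] == 'update':
        for dt, attrs in ((dt_l, attrs_l), (dt_r, attrs_r)):
            touched = {a: v for a, v in rec['values'].items() if a in attrs}
            if touched and rid in dt and dt[rid]['lsn'] < lsn:
                dt[rid]['values'].update(touched)
                dt[rid]['lsn'] = lsn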
6.6.4 Synchronization
Vertical split over a candidate key belongs to the 1MLF category. The synchronization strategy works exactly as described for horizontal split in Section 6.4, and is therefore not repeated here.
6.7 Vertical Split over a Functional Dependency
This section describes vertical split DT creation when the split attributes are
not a candidate key in S. This may, e.g., be done to perform a normalization
(Elmasri and Navathe, 2004) of the database schema. An example split over
a non candidate key is illustrated in Figure 6.10. As can be seen, the source
table “Employee” has two functional dependencies, and is split over “zip”,
which is not a candidate key:
firstname,surname → zip
zip → city
The legend is the same as was used in the previous chapter. Thus, the DT
creation method splits a table S into two derived tables DTl and DTr , each
containing a subset of attributes from the source table. Both tables must include the split attributes.

Figure 6.10: Vertical split over a non candidate key. (The source table “Employee”, with attributes F.Name, S.Name, Zip, City, RID and LSN, is split over the zip attribute into “ModifiedEmp” (F.Name, S.Name, Zip, RIDSrc, LSN) and “PostalAddress” (Zip, City and a counter attribute #).)
A consequence of splitting S over a non candidate key is that multiple
source records may have the same split attribute value, e.g. multiple employees with the same zip. These source records should be derived into only
one record in DTr . Furthermore, a record in DTr should only be deleted if
there are no more records in S with that split attribute value. To be able to
decide if this is the case, a counter, similar to that of Gupta et al. (1993), is associated with each DTr record. When a DTr record is first
inserted, it has a counter of 1. After that, the counter is increased every time
an equal record is inserted, and decreased every time one is deleted.
Before the method is described in detail, we show that S may contain inconsistencies that complicate DT creation. Consider the following example:
Example 6.7.1
Consider the table “Employee” below. This table is used as a source table
to perform the DT creation illustrated in Figure 6.10.
Firstname    Surname     Zip     City
Hanna        Valiante    9010    Tromsø
Erik         Olsen       5121    Bergen
Markus       Oaks        7020    Trondheim
...          ...         ...     ...
Sofie        Clark       7020    Trnodheim
There are intentionally two functional dependencies in this table:
firstname,surname → zip
zip → city
Notice, however, that there is an inconsistency between employees Markus
and Sofie since the zips are the same, whereas the city names differ. Nothing prevents such inconsistencies from occurring in this table, and the DT
creation framework can not decide whether “Trondheim” or “Trnodheim” is
correct. One of the main reasons for normalization is to prevent such inconsistencies from occurring in the first place.
If inconsistencies like the one in Example 6.7.1 exist in S, we are not able
to perform a split transformation without taking measures.
For readability, vertical split over a non candidate key is first explained
under the unrealistic assumption that inconsistencies never appear between
records in S (note that this simplified method can not handle semantically rich locks; semantically rich locks (Korth, 1983) were described in Chapter 2). This provides an easy-to-understand basis for the scenario
where inconsistencies may occur. An extension that can handle inconsistencies is then explained in Section 6.7.5.
6.7.1 Preparation
As for vertical split over a candidate key, the DTs suffer from the record
and state identification problems. For DTl , this problem can be solved by
adding the source RID and LSN as attributes to the table. This can not
easily be done with DTr , however. The reason for this is that each record
in DTr may be derived from multiple source records. A possible solution
to the missing record and state identification problems of DTr would be to
create an auxiliary table A, containing the record IDs from DTr , the source
record IDs and the LSNs. This solution was used in the horizontal merge
with duplicate removal DT creation described in Section 6.3. As will be
clear from the following sections, however, all the required record and state
identification information can be found in DTl . Hence, the auxiliary table is
not needed.
During preparation, DTl is first added to the database schema. In addition to the wanted subset of attributes from S and the split attributes, source
RID and LSN are required. The LSNs will be used to achieve idempotence
in both derived tables. The DT creation process will use the source RID
attribute of DTl for all lookups. Hence, an index should be added to this
attribute.
DTr is then added with a subset of attributes from S. Only the split attributes are allowed in both DTs. Instead of the normal source RID attribute,
a counter is added to DTr . Since the split is over a functional dependency,
the split attributes form a candidate key in DTr , and these should therefore
be defined as either primary key or unique. The DT creation process will
use the split attributes for lookup, and an index should therefore be added
to these.
If the DTs are used to perform a schema transformation, an alternative
strategy is to only create the DTr table. Since all attributes needed in DTl
are already present in S, S can be renamed to DTl during synchronization
after removing unwanted attributes from it. The transformation would require less space, and updates that would not affect attributes in DTr could
be ignored by the log propagator. Unfortunately, the log propagator needs
information on both the LSN and the split attribute value of each record in
DTl . An auxiliary table A would therefore be needed to keep track of this information during propagation. Although A may potentially be much smaller
than DTl , this section describes how the method works when DTl is created
as a separate table. Only minor adjustments are needed for the alternative
auxiliary method to work.
6.7.2 Initial Population
Initial population starts by writing a fuzzy mark in the log. The fuzzy mark
contains the identifier of all transactions active on S at this point in time.
After performing a fuzzy read of S, the records are inserted into DTl and
DTr . Insertion into DTl is straightforward; the wanted subset of attributes is
inserted together with the source RID and LSN of the record in S. A lookup
is then performed on the split attribute index of DTr .
If a record with the same split attribute value already exists, the counter
of that record is increased (recall that, for now, all records with equal split attribute values are assumed to be consistent). If the split attribute value is not found in DTr ,
a new record is inserted. It consists of the wanted subset of attributes, and
a counter value of one.
6.7.3 Log Propagation
Log propagation is started once the initial images have been inserted into
DTl and DTr . Each iteration starts by writing a fuzzy mark to the log L,
and then retrieves all log records relevant to S. The log records retrieved
are then applied in sequential order to the DTs. If the DTs will be used in
a non-blocking schema transformation, locks are maintained as part of log
propagation.
In general, log propagation for records in DTl is more intuitive than for
records in DTr . The reason for this is that each record in DTl is derived from
exactly one source record. This is not the case for the records in DTr , which
may be derived from multiple source records.
Log propagation of records in DTr must be treated with care. Since
an arbitrary number of source records may contribute to the same derived
record, the source RID can not be used for identification simply by adding
it as an attribute. Instead, the split attribute value of the corresponding
DTl record is used for identification. Since there is a one-to-one relationship
between source records and records in DTl , the value to search for is found
by reading the record tl ∈ DTl with correct source RID. Furthermore, DTr
does not provide correct state identifiers since multiple source records may
contribute to each record. Thus, the LSN of tl will be used to determine if
a log record is already reflected in DTr . By reading tl , both the record and
state identification problems are solved.
The records in DTr may have incorrect state identifiers during DT creation. The reason for this is that there is only one LSN for each derived
record tr ∈ DTr . If the source record that last updated tr is later deleted,
the LSN of tr will have a state ID that belongs to a source record no longer
contributing to it. Nevertheless, the LSN of tr reflects the last update or insert propagated to it. Since all source records contributing to tr are assumed
to be consistent, and since the LSN of tr is not used to achieve idempotence,
this is not a problem.
Consider a log record `ins ∈ L, describing the insert of a record r into S.
The RID of r is first used to perform a lookup in the source RID index of
DTl . If a record with this source RID is found, `ins is already reflected in the
DTs and is therefore ignored. Otherwise, a record tl with the wanted subset
of attribute values from r, including the source RID and LSN, is inserted into
DTl .
A lookup is then performed on the split attribute index of DTr . Assuming
that a record told ∈ DTr with the same split attribute values is found, the
counter of told in increased. If `ins has a higher LSN than the record, the
attribute values of told are updated as well. The LSN is then set to the
higher LSN value of r and told .
If the split attribute value of r is not found in DTr , a new record tr is
inserted. It contains the wanted subset of attributes and the LSN from r, in
addition to a counter value of one.
Log propagation of a delete log record `del ∈ L starts with the same source
RID lookup in DTl . If the RID of the deleted source record is not found, the
log record `del is already reflected in the DTs. If a record tl with the correct
source RID is found, however, tl is deleted. A lookup is then performed on
the split attribute index of DTr , using the split value of tl . If the record tr
found in this lookup has a counter of one, the record is deleted. Otherwise,
the counter is decreased by one.
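The insert and delete rules just described can be summarized in the following sketch. The dictionary layout, with DTr keyed on the split attribute value and carrying a counter, is an illustrative assumption.

def propagate_vsplit_fd(dt_l, dt_r, split_attr, attrs_r, rec):
    # dt_l: {source_rid: {'lsn': ..., 'values': {...}}} (values include split_attr)
    # dt_r: {split_value: {'counter': ..., 'lsn': ..., 'values': {...}}}
    rid, lsn = rec['rid'], rec['lsn']
    if rec['op'] == 'insert':
        if rid in dt_l:
            return                              # already reflected in the DTs
        dt_l[rid] = {'lsn': lsn, 'values': dict(rec['values'])}
        key = rec['values'][split_attr]
        t_old = dt_r.get(key)
        if t_old is None:
            dt_r[key] = {'counter': 1, 'lsn': lsn,
                         'values': {a: rec['values'][a] for a in attrs_r}}
        else:
            t_old['counter'] += 1
            if lsn > t_old['lsn']:              # the insert describes a newer state
                t_old['values'] = {a: rec['values'][a] for a in attrs_r}
                t_old['lsn'] = lsn
    elif rec['op'] == 'delete':
        t_l = dt_l.pop(rid, None)
        if t_l is None:
            return                              # already reflected in the DTs
        key = t_l['values'][split_attr]
        if dt_r[key]['counter'] == 1:
            del dt_r[key]                       # last contributor removed
        else:
            dt_r[key]['counter'] -= 1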
Let the `upd ∈ L be a log record describing the update of a record r in S.
Propagation of `upd starts by performing a lookup in the source RID index
of DTl . If no record with this source RID exists, `upd is ignored. Otherwise,
if a record tl ∈ DTl is found, and if `upd represents a newer state, i.e. has a
higher LSN than tl , the update is applied.
`upd is now applied to the attributes in tl if any of these are updated.
Even if no attributes in DTl are updated, the LSN of tl is set to that of `upd .
Log propagation then continues in DTr if `upd describes updates of attributes
there.
Assume for now that the split attribute values of tl are not updated. A
lookup is then performed in the split attribute index of DTr , using the split
values read from tl . The record found in DTr is called tr . If `upd represents a
newer state than tr , i.e. the LSN is higher, `upd is applied, and the LSN is
set to reflect the new state.
If the split attribute value is updated, log propagation in DTr works by
delete and insert. The record told ∈ DTr is first read by a lookup in the
split attribute index of DTr , using the pre-update split attribute value. The
counter of told is decreased, and a new record tnew with updated attribute
values is inserted, as described for insert log records.
6.7.4 Synchronization
The blocking complete, non-blocking synchronization for MV creation and
non-blocking abort for schema transformation strategies work as described
for vertical merge. Hence, only non-blocking commit for schema transformations is discussed here.
Since each source record is split into two derived records, and each record
in DTr may be derived from multiple source records, vertical split transformation over a non candidate key requires many-to-many lock forwarding
(MMLF).
As previously discussed, non-blocking commit allows transactions in S to
access new records. This means that operations and locks set on a record t
in DTl or DTr must be forwarded to all the records in S that t was derived
from. To allow fast lookup of these records, a split attribute index should
be added to S. Furthermore, if the operation on t changes the split attribute
values, the operation must also be forwarded to the records in the other DT that are derived from the same source records.
As argued for vertical merge schema transformation in Section 6.5.4, non-blocking abort may be a better choice than non-blocking commit since it is
much less prone to locking conflicts. The commit algorithm is also much
more complex.
6.7.5 How to Handle Inconsistent Data - An Extension to Vertical Split
In this section, we extend the vertical split DT creation method just described
to handle inconsistent source records. The extension is inspired by solutions
to similar problems in merging of knowledge bases (Lin, 1996; Greco et al.,
2003; Caroprese and Zumpano, 2006) and merging of records in distributed
database systems (Flesca et al., 2004) described in Section 5.4.
A flag attribute is added to records in DTr. The flag may either signal
Consistent (C) or Unknown (U ). A C flag is used when a derived record is
known to be derived from consistent source records, and the U flag is used
when it is known to be derived from inconsistent source records or has an
unknown consistency state.
During initial population, all records in DTr that were consistent in the
fuzzy read get a C-flag. All other records get a U-flag. The log propagation
rules must also be modified to maintain these flags.
If the log propagator inserts a record tnew into DTr, and a record told
already has that split attribute value, the flag of told is changed from C to U iff
told ≠ tnew. The flag change from C to U is also performed for updates if the
derived record in DTr has a counter greater than one. A U-flag can only be
changed to C if a logged update applies to all attributes that are not part
of the split attributes, and the record has a counter of one.
A “Consistency Checker” (CC) is run regularly as a background thread.
A record with a U flag, tu ∈ DTr , is chosen. The CC then writes a “Begin
Consistency Check on tu ” mark to the log. All records in S contributing to
tu are then read without using locks [5]. If these are consistent in S, another
mark stating that tu is consistent is written to the log together with the
correct image of tu .
The CC marks are later encountered by the log propagator. Assuming
that no logged operations apply to tu between the begin and end CC log
marks, tu is guaranteed to be consistent and is changed accordingly. Any
modification that applies to tu between these marks invalidates the result,
however. Note that all records in DTr should have a C-flag before synchronization
is started since a DBA will have to manually fix the problem if
inconsistent records still exist. This may take considerable time.
If the source records contributing to tu are not consistent, the “Consistency Remover” (CR) is started. It starts by collecting statistics on the
source records contributing to tu . This corresponds to identifying repairs
that may remove inconsistencies (Greco et al., 2003). Based on these statistics, the CR may either remove the inconsistencies based on predefined rules,
or may suggest solutions to the DBA.
The CR makes inconsistency removal decisions based on rules inspired by
integration operators (Lin, 1996; Greco et al., 2003; Caroprese and Zumpano,
2006) and Active Integrity Constraints (AIC) (Flesca et al., 2004). A rule
may, e.g., state that the attribute values agreed upon by a majority of source
records should be used if more than 75% agree. Many rules may be defined,
but if none of these evaluate to true, the DBA must decide what to do.
Example 6.7.2 illustrates how the CR works during removal of inconsistencies.
Example 6.7.2
Consider the inconsistency in Example 6.7.1 on page 98 one more time. Assume that the table is split over the zip functional dependency, and that CR
is trying to solve the inconsistency between records with postal zip “7020”.
Based on a read of the employee table, CR has found the following statistics:
Total number of records with zip "7020": 306
Records agreeing on "Trondheim": 77% (235)
Records agreeing on "trondheim": 22% (68)
Records agreeing on "Trnodheim":  1% (3)
[5] Note that since the source table is read, this DT creation is not self-maintainable.
Only one CR rule has been defined, stating that in the case of a 75 % majority
or more, the majority value is used. Thus, CR can now update the 71 records
with cities not equal to “Trondheim”.
When the attribute values have been decided upon, either automatically
or by the DBA, the CR is ready to remove the inconsistency. All records
in S that do not agree on the decided-upon values are now updated one
record at a time. The CR must acquire write locks on the involved records
to do this, but only one record is locked and updated at a time. When
all source records with incorrect attribute values have been updated, CC is
again executed for tu . If no transactions have introduced new inconsistencies
during this process, CC will now inform the log propagator to set a C-flag.
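As an illustration of such a rule, the following minimal sketch counts how many of the contributing source records agree on each attribute value and returns the majority value only if a threshold, here supplied by the caller as 75%, is reached; otherwise the decision is left to the DBA. The names and structure are assumptions made for this example.

import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch of a single Consistency Remover rule: use the value agreed upon by
// at least the given fraction of the contributing source records.
public class MajorityRuleSketch {

    // Returns the majority value, or null if no value reaches the threshold
    // and the DBA must decide.
    static String decide(List<String> contributingValues, double threshold) {
        Map<String, Integer> counts = new HashMap<String, Integer>();
        for (String value : contributingValues) {
            Integer count = counts.get(value);
            counts.put(value, count == null ? 1 : count + 1);
        }
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            if (entry.getValue() >= threshold * contributingValues.size()) {
                return entry.getKey();   // e.g. "Trondheim" with 77% agreement
            }
        }
        return null;
    }
}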
6.8 Summary
In this chapter, we have described in detail how the DT creation framework
can be used to create derived tables using the six relational operators. The
solutions to the DT creation problems described in Chapter 5 have been used
extensively in the design of the DT creation methods. Table 6.2 shows a summary of which problems were encountered by which DT creation operator,
and how the problems were solved.
The DT creation operators have been presented in order of increasing
complexity. This order is closely related to the lock forwarding categorization and therefore the "cardinality" of the operation [6]. It is clear that many
records may have to be locked during non-blocking commit synchronization
of schema transformations, especially in the MMLF methods. If too many
records require locking, it may be better to use the non-blocking abort strategy. However, the number of locks depends heavily on parameters like which
types of modifications are frequently performed [7], the number of records in
each source table etc. Thus, there is no simple answer for when one method
should be used instead of the other. Note that lock contention is not a
problem for MV creation.
[6] How many derived records a source record may contribute to, and how many
source records a derived record may be composed of.
[7] In vertical merge, e.g., modifications to records in Sl cause far fewer locks
in the DT than modifications to records in Sr.

Although DT creation for all the relational operators uses the same framework
described in Chapter 4, it is clear that the work required to apply log
records to the DTs varies from operator to operator. In DT creation using
the difference operator, source record modifications require the log propagator to look up and modify records in two or even three DTs. For other
operators, e.g. horizontal merge with duplicate inclusion, log propagation of
each logged operation only requires lookup and modification of one record.
These differences are expected to cause variations in the incurred performance degradation of transactions running concurrently with DT creation.
In the following chapters, we focus on implementation and testing of the
DT creation methods. Our goal is to validate the methods and to indicate
to which extent they degrade performance of other transactions.
DT Creation Method                | Missing Record and State ID                 | Missing Pre-State                                       | Lock Forwarding Category | Inconsistent Records
Difference, intersection          | Add source RID and LSN to all DTs           | Add auxiliary table for records not qualifying for DTs  | M1LF                     | -
Horizontal Merge, Dup Inclusion   | Add source RID and LSN to DT                | -                                                       | SLF                      | -
Horizontal Merge, Dup Removal     | Store source RID and LSN in auxiliary table | Add two auxiliary tables                                | SLF                      | -
Horizontal Split                  | Add source RID and LSN to DTs               | -                                                       | 1MLF                     | -
Vertical Merge                    | Add source RID and LSN to DT                | -                                                       | MMLF                     | -
Vertical Split, candidate key     | Add source RID and LSN to both DTs          | -                                                       | 1MLF                     | -
Vertical Split, non-candidate key | Add source RID and LSN to left DT           | -                                                       | MMLF                     | Run Consistency Check and Repair in parallel with log propagation

Table 6.2: Problems and solutions for the DT Creation methods.
Part III
Implementation and Evaluation
Chapter 7
Implementation Alternatives
In the previous part of this thesis, we described how the derived table creation
framework can be used to perform schema transformations and create MVs
using six relational operators. We now want to determine the quality of the
described methods. More precisely, we want to determine a) whether the
methods work, and b) what the implied costs are in terms of performance
degradation to concurrent transactions.
As discussed by Zelkowitz and Wallace (Zelkowitz and Wallace, 1998),
experimentation is of great value in validation and evaluation of new techniques in software. In this thesis, two types of experiments are of particular
importance: empirical validation and performance experiments.
In the empirical validation experiments, the methods are tested in a controlled environment. An implementation of the methods is viewed as a “black
box”, and the output of the black box is compared to what we consider to
be the correct output. This type of experiment cannot be used as a proof
of correct execution [1], but rather as a clear indication of correctness. To confirm the results, empirical validation experiments can be "triangulated", i.e.,
performed in two or more different implementations (Walker et al., 2003).
Due to time constraints, triangulation is not performed in this thesis.
In the performance experiments, we consider relative performance, as
opposed to absolute performance, to be the interesting metric [2]. This experiment
type is highly relevant since an important design goal has been to impact the
performance of concurrent transactions as little as possible.
[1] Although it can be used to prove incorrect execution (Tichy, 1998).
[2] In this thesis, relative performance denotes the difference in performance
while not running DT creation, compared to performance during DT creation.
Absolute performance denotes the actual response time or throughput numbers
that are acquired from processing capacity benchmarks.

Ideally, non-blocking DT creation should be implemented in a full-scale
DBMS. This would have provided excellent conditions for both types of experiments. However, implementing a DBMS comparable to IBM DB2 or
Oracle would be an impossible task due to the very high complexity of such
systems. This leaves us with three alternative approaches:
Simulator Model a DBMS and the non-blocking DT creation functionality,
and implement the model in a simulator.
Open Source DBMS Add the described functionality to an existing open
source DBMS, e.g. MySQL, PostgreSQL or Apache Derby.
Prototype Implement a prototype DBMS from scratch. This prototype
has to be significantly simplified compared to modern DBMSs in many
aspects, especially those considered not to affect DT creation.
The alternative we decide to use should be usable for both types of experiments. Due to time constraints, we also consider the implementation cost
an important factor. Hence, in the following sections, the alternatives are
evaluated on three criteria: usability for empirical validation, usability for
performance testing, and the cost (time and risk of failure) of development.
An evaluation summary is presented in Section 7.4.
7.1 Alternative 1 - Simulation
Assuming that a DBMS and the DT creation strategies can be modelled
precisely, simulations can be used to get good estimates of the incurred performance degradation in any simulated hardware environment (Highleyman,
1989). The model would require accurate waiting times for processing and
I/O, and correct distributions for task arrivals for all queues.
Implementing a model and performing simulations in a simulation program like Desmo-J (Desmo-J, 2006) requires little to moderate implementation work. While it can be used for performance experiments, it can not be
used for empirical validation of the non-blocking DT creation methods.
7.2 Alternative 2 - Open Source DBMS
If DT creation functionality was added to an open source DBMS, the modified system could be used both for empirical validation and performance
testing. In contrast to simulations, in which any hardware environment can
be simulated, experiments using this alternative will only be executed in one
hardware environment.
Many open source Database Management Systems exist. From these,
five well-known systems have been selected as potential DBMSs in which
the non-blocking DT creation methods may be implemented. These are:
Berkeley DB (and Berkeley DB Java Edition), Apache Derby, MySQL with
InnoDB, PostgreSQL and Solid Database Engine.
As discussed in Part II, the suggested non-blocking DT creation methods
have rigid requirements for the internal DBMS design. Most importantly,
the methods require Compensating Log Records (CLR), logical redo logging
and state identifiers on record granularity. Hence, in what follows, the six
DBMS candidates are evaluated with emphasis on these three requirements.
Berkeley DB and Berkeley DB Java Edition
Berkeley DB and Berkeley DB Java Edition are C and Java implementations
of the same design (Oracle Corporation, 2006b); unless otherwise noted, the
name Berkeley DB will be used for both in this thesis. It is not a relational
DBMS, but rather a storage engine with transaction and recovery support.
Our DT creation methods operate on relations, and a mapping from relations
to the physical structure would therefore be needed. This can be solved by
using the product as a storage engine in MySQL. However, Berkeley DB uses
redo logging that is physical to page, and page-level state identifiers (Oracle Corporation,
2006a). It is therefore not considered a suitable candidate for the DT creation
methods.
Apache Derby
Apache Derby (Apache Derby, 2007a) is a relational DBMS implemented in
Java. It uses ARIES (Mohan et al., 1992) like recovery with Write Ahead
Logging and Compensating Log Records. However, redo log records are
physical (Apache Derby, 2007b), and state identifiers are associated with
blocks, not records (Apache Derby, 2007c). This renders Apache Derby
unsuited as an implementation candidate.
MySQL with InnoDB
MySQL (MySQL AB, 2007) is designed with a highly modular architecture,
and is best described as a high level DBMS with a storage engine interface
(MySQL AB, 2006). The high level functions include SQL Parsing, query
optimization etc, while the storage engines are responsible for concurrency
control and recovery (MySQL AB, 2006). Many storage engines exist, e.g.,
Berkeley DB, InnoDB and SolidDB. MySQL with InnoDB is described below,
whereas the Berkeley DB and SolidDB alternatives are treated as individual
products.
MySQL with InnoDB is the recommended combination when transaction
support is required (MySQL AB, 2006; Kruckenberg and Pipes, 2006). The
InnoDB storage engine uses physiological logging (Zaitsev, 2006) and page
level LSNs (Kruckenberg and Pipes, 2006). It is therefore not considered a
good candidate for non-blocking DT creation.
Solid Database Engine
The Solid Database Engine can be used as a storage engine either in one of
Solid Information Technology’s embedded DBMSs (e.g. BoostEngine and
Embedded Engine), or in MySQL (Solid Info. Tech., 2007). The Solid
Database Engine uses normal Write Ahead Logging with physical redo logging (Solid Info. Tech., 2006b). Furthermore, source code inspection reveals
that state identifiers are associated with blocks. Hence, we do not consider
Solid Database Engine to be a good implementation candidate.
PostgreSQL
PostgreSQL, formerly known as Postgres and Postgres95, was originally created for research purposes during the mid-eighties (PostgreSQL Global Development Group, 2007). Until version 7.1, PostgreSQL used a force buffer
strategy, and hence did not write redo operations to log at all (PostgreSQL
Global Development Group, 2001). In version 7.3 [3], undo operations, and
therefore CLRs, were not logged (PostgreSQL Global Development Group,
2002). It is also clear from source code inspection that the redo log is physical to page, and that state identifiers are associated with pages rather than
records [4]. The lack of CLRs, the page-physical redo log records and the
page-level state identifiers render PostgreSQL unsuited for the DT creation
methods.

[3] This was the newest version when the implementation alternatives were evaluated.
[4] See access/xlog.h for details.
Open Source DBMS Discussion
It is clear that none of the five open source DBMSs evaluated in this section
are good implementation candidates for the non-blocking DT creation methods.
Since neither the log formats nor the state identifiers can be used, both
the recovery and cache managers would have to be modified significantly. We
consider the implementation cost of making significant changes to unfamiliar
code to be very high.

DBMS          | Redo Log Format | Granularity of State Identifiers
Berkeley DB   | Physical        | Block
Derby         | Physical        | Block
MySQL/Innodb  | Physical        | Block
Solid DB      | Physical        | Block
PostgreSQL    | Physical        | Block

Table 7.1: Evaluation of Open Source DBMS alternatives.
7.3 Alternative 3 - Prototype
A prototype DBMS that includes the non-blocking DT creation methods
can be used for empirical validation and performance testing in the same
way as an open source DBMS. The two strategies share the problem of fixed
hardware, meaning that experiments will be performed in one hardware environment only. As described in the introduction, this is not considered a
problem since we are interested in relative performance only.
It is not feasible to implement a new, fully functional DBMS from scratch
due to the complexity of such systems. A prototype should therefore only
include the parts that are most relevant to DT creation.
The prototype is required to function in a manner similar to a traditional
DBMS, and should therefore use a standard DBMS design to the largest
possible extent. Figure 7.1 shows a widely accepted DBMS design close to
what is used in, e.g., MySQL Enterprise Server 5 (MySQL AB, 2006) and
Microsoft SQL Server 2005 (Microsoft TechNet, 2006). The figure also bears
close resemblance to the model described by Bernstein et al. (Bernstein et al.,
1987). To get an idea of the implementation cost of using a prototype, we
consider possible simplifications on a module-by-module basis in what follows.
Modules Operating on the Logical Data Model
In full-scale DBMSs, the Connection Manager is responsible for tasks like
authentication, thread pooling and providing various connection interfaces.
In a prototype, only one connection interface and a thread pooling strategy
are required. Authentication, e.g., does not affect DT creation.
Figure 7.1: Possible Modular Design of Prototype. (The Connection Manager,
SQL Parser and Relational Manager operate on the logical data model, while the
Scheduler, Recovery Manager and Cache Manager operate on the physical data
model, on top of the stored data.)

The SQL Parser of the prototype has to recognize the SQL statements
used in the experiments. Hence, by first performing an analysis of the experiments, the prototype SQL Parser can be significantly simplified to understand
only a very limited subset of the SQL language.
A Relational Manager is typically responsible for mapping between the
logical data model seen by users, and the physical data model used internally
in the DBMS. The module also performs query optimization, which is used
to choose the most efficient access order when multiple tables are involved in
one SQL statement. Query optimization is a highly sophisticated operation
which involves statistical analysis of access paths. This can be totally ignored
in the prototype since the DT creation methods do not rely on it. However,
this simplification requires careful construction of all SQL statements that
will be used in the experiments. In practice, this can be done by e.g. always
stating tables in the most efficient order. With query optimization removed
from the module, the relational manager is reduced to perform mapping
between the logical and physical data models.
Modules Operating on the Physical Data Model
Schedulers are responsible for enforcing serializable execution of operations.
As discussed in Section 2.2, two-phase locking (2PL) is the most commonly
used scheduling strategy in modern DBMSs. 2PL is also fairly simple to
implement, and should therefore be used in a prototype.
The primary responsibility of Recovery Managers is to correct transaction, system and media failures. In most DBMSs, including all the open
source DBMSs evaluated in the previous section, this is done by maintaining a log. In the non-blocking DT creation methods, this log is also used
extensively to forward updates to the derived tables.
A prototype Recovery Manager implementation is required to maintain
a log that can be used to fully recover the database. The ARIES recovery
strategy (Mohan et al., 1992) is a good candidate since it is widely accepted
and used in many DBMSs, e.g. in Apache Derby (Apache Derby, 2007b).
To be usable by the DT creation methods, the module is also required to use
Compensating Log Records and logical redo logging.
The final module, the Cache Manager, is responsible for physical data
access. In most DBMSs, this includes reading disk blocks into main memory
and writing modifications back to disk. A good strategy for choosing which
blocks to replace when the memory is full, e.g. the Least Recently Used
(LRU) algorithm, is also necessary. As argued in Section 2.3, it is common for
Cache Managers to use a steal/no-force buffer strategy. With this strategy,
the Cache Manager must cooperate closely with the Recovery Manager so
that all operations are recoverable.
In a prototype, the Cache Manager is required to cooperate with the
Recovery Manager to achieve recoverable operations. Furthermore, the DT
creation methods require state identifiers to be associated with records, not
blocks. As was clear from evaluating the open source alternative in the
previous section, record state identifiers are not normally used in today's
DBMSs.
7.4 Implementation Alternative Discussion
In this chapter, we have evaluated simulations, implementation in open source
DBMSs and implementation in a prototype with respect to three important criteria. These criteria were: usability for empirical validation, usability for
performance testing and implementation cost.
As is clear from Table 7.2, simulations can not be used to empirically
validate the DT creation method. For this experiment type, the output of
a system is compared to what is considered correct output. The quality of
the experiment result is therefore determined by the quality of the test sets.
Hence, the open source DBMS and prototype alternatives are considered
equally well suited for this purpose.

                                 | Simulation | Open Source DBMS | Prototype
Usability; Empirical Validation  | -          | High             | High
Usability; Performance Testing   | Medium     | High             | Medium
Implementation Cost              | Low        | High             | Medium
Risk; Unsuitable Design          | Low        | Medium           | Low

Table 7.2: Evaluation of implementation alternatives.
All three alternatives can be used for performance experiments. Using
an existing open source DBMS would provide the most reliable performance
results. In contrast to the alternatives, these DBMSs are all fully functional
systems that include most aspects of common DBMS technology.
Both the simulation and prototype alternatives rely on simplified models. However, we consider the latter alternative to provide the most accurate
performance results. The reason for this is that it is easier to verify the correctness of the prototype design (Zelkowitz and Wallace, 1998), and because
we do not have to make assumptions about processing and waiting times with
this alternative.
When it comes to implementation cost, simulation is clearly the least
costly alternative. Furthermore, if an open source DBMS with a design
suitable for non-blocking DT creation had been found in Section 7.2, the
open source alternative would be considered less costly than implementing a
prototype. However, the evaluation in Section 7.2 showed that none of the
open source DBMSs had a design that was suitable for DT creation. If any
of these systems were to be used, both the Cache and Recovery Managers of
that DBMS would require significant modifications to support logical logging
and record state identifiers. In addition, the Scheduler module would have to
be changed to handle modified lock compatibilities and forwarding of locks
between source and derived tables. Hence, only the high level modules of the
chosen open source DBMS would be usable without drastic changes. Making
extensive changes to unfamiliar code is considered by the author to be both
more costly and have a higher risk of failure than implementing a prototype.
In contrast to simulations, the prototype alternative is good for both
types of experiments. Furthermore, it has a lower implementation cost and
risk than that of the open source alternative. Based on this evaluation, we
consider a prototype to be the better alternative due to time considerations.
Chapter 8
Design of the Non-blocking DBMS
This chapter describes the design of a prototype Database Management System, called the “Non-blocking Database Management System” (NbDBMS),
which will be used for empirical validation and performance experiments
described in Chapter 9. As illustrated in Figure 8.1, the prototype has
a modular design inspired by what is used in many modern DBMSs, e.g.
MySQL Enterprise Server 5 (MySQL AB, 2006) and Microsoft SQL Server
2005 (Microsoft TechNet, 2006). In addition to providing normal DBMS
functionality, NbDBMS is capable of performing the six non-blocking DT
creations described in Chapter 6. Figure 8.2 shows a UML Class diagram
of the most important parts of the prototype. Note that each module in
NbDBMS can be replaced by another implementation as long as the module
interface remains unchanged.
NbDBMS is simplified significantly compared to modern DBMSs. E.g.,
only a limited subset of the SQL language is supported, and only a few relational operators are available for user transactions. Furthermore, NbDBMS
stores records in main memory only, making buffer management unnecessary.
In the following sections, each module is described with emphasis on their
similarity to or difference from standard DBMS solutions. The effects of the
simplifications are then discussed in Section 8.1.7.
Figure 8.1: Modular Design Overview of the Non-blocking DBMS. (Client and
Admin programs connect over Java RMI to the Non-blocking Database, which
consists of the Communication Manager, SQL Parser, Relational Manager,
Scheduler, Recovery Manager and Data Manager modules, backed by the log and
the tables.)

Figure 8.2: UML Class Diagram of the Non-blocking Database System.

8.1 The Non-blocking DBMS Server

8.1.1 Database Communication Module
The Communication Module (CM) acts as the entry point for all access to
the database system. When a client or administrator wants to connect to
the database system, the CM sets up a network socket for that program.
Java RMI is used for communication, but the sockets have been modified to
not buffer replies, thus reducing response time. This is necessary because
many client programs will be run from the same physical node to simulate
high workloads during the experiments. However, replies to different clients
at the same physical node should not be buffered and sent as one network
package, which would be the default Java RMI behavior. Once a connection
has been established, the CM simply forwards requests from the client or
administrator to the appropriate DBMS module. Depending on the request,
this is either the SQL Parser Module or the Relational Manager Module.
When performance tests are executed, the module is also responsible for
writing a performance report file. During response time tests, e.g., clients
periodically report their observed response times, which are written to the
report file for later analysis.
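The sketch below illustrates one way such socket behaviour can be obtained, assuming a custom RMISocketFactory that enables the standard TCP_NODELAY option so that small replies are sent immediately instead of being coalesced into larger packets. The class name and the choice of TCP_NODELAY are assumptions for this example; the prototype's actual modification may differ.

import java.io.IOException;
import java.net.ServerSocket;
import java.net.Socket;
import java.rmi.server.RMISocketFactory;

// RMI socket factory that disables Nagle's algorithm on client sockets so
// replies are not buffered before being sent.
public class NoDelayRmiSocketFactory extends RMISocketFactory {

    public Socket createSocket(String host, int port) throws IOException {
        Socket socket = new Socket(host, port);
        socket.setTcpNoDelay(true);   // send each reply as soon as it is written
        return socket;
    }

    public ServerSocket createServerSocket(int port) throws IOException {
        // A fuller version could also wrap accepted sockets to set the same option.
        return new ServerSocket(port);
    }

    public static void install() throws IOException {
        RMISocketFactory.setSocketFactory(new NoDelayRmiSocketFactory());
    }
}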
8.1.2 SQL Parser Module
All user operations and some administrator operations are requested through
SQL statements. These statements must be interpreted by the SQL Parser
Module (SPM) before they can be processed further.
The experiments require a way to perform all basic operations, i.e. insert,
delete, update and query. Thus, SPM is designed to interpret a small subset
of the SQL language, including one single syntax for each of these operators.
Likewise, for queries, only one single way of writing projections [1], selection
criteria, joins, unions etc. is supported. Consult Appendix A for further
details about accepted SQL statements.
The SPM works by first checking that the SQL statements have correct
syntax. Statements with incorrect syntax are rejected, while accepted statements are tokenized. Tokenization is the process of splitting strings into
meaningful blocks, called tokens (Lesk and Schmidt, 1990), which are used
as parameters in method calls to the Relational Manager. Consider the following example tokenization:
[1] Selection of a subset of attributes (Elmasri and Navathe, 2004).
Example 8.1.1 (Tokenization)
Select statement:
select firstname, surname from person where zip=7020;
Tokens:
statement_type:       {select}
table:                {person}
attributes:           {firstname,surname}
select_criterion_eq:  {{zip,7020}}
order_by:             {}
These tokens can then be used in a call to the Relational Manager procedure:
executeQuery(Long transID, String table, Array attributes,
Array select_criterion, Array order_by)
Regular Expressions (regex) (Friedl, 2006) are used for both syntax checking and tokenization of SQL statements. Regex is powerful, but becomes
complex if many different statement syntaxes are allowed. However, since
only a limited set of the SQL language needs to be recognized in NbDBMS,
this is not a significant problem in the current implementation. If more complex SQL statements are to be allowed in a future implementation, a lexical
analyzer like Lex (Lesk and Schmidt, 1990) should be used instead.
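A minimal sketch of this approach for the single accepted select syntax is shown below; the pattern and names are illustrative and do not reproduce the actual NbDBMS parser.

import java.util.regex.Matcher;
import java.util.regex.Pattern;

// One regular expression both checks the syntax and extracts the tokens of a
// select statement of the single accepted form.
public class SelectTokenizerSketch {

    private static final Pattern SELECT = Pattern.compile(
            "select\\s+([\\w,\\s]+)\\s+from\\s+(\\w+)\\s+where\\s+(\\w+)=(\\w+);");

    public static void main(String[] args) {
        String sql = "select firstname, surname from person where zip=7020;";
        Matcher m = SELECT.matcher(sql);
        if (!m.matches()) {
            throw new IllegalArgumentException("Rejected: incorrect syntax");
        }
        String[] attributes = m.group(1).split("\\s*,\\s*"); // firstname, surname
        String table = m.group(2);                           // person
        String criterionAttribute = m.group(3);              // zip
        String criterionValue = m.group(4);                  // 7020
        System.out.println(table + ": " + attributes.length + " attributes, "
                + criterionAttribute + "=" + criterionValue);
    }
}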
8.1.3 Relational Manager Module
The Relational Manager Module (RMM) maps the logical data model seen
by users to the physical data model used internally by NbDBMS. Hence, this
is the lowest level module in which table or attribute names are meaningful.
The module consists of three classes: RelationalManager, TableMapper
and RelationalAlgorithmHelper. RelationalManager is the main class of the
module. It serves as the module’s interface to higher level modules, and organizes the logical to physical data mapping. The TableMapper class is used by
RelationalManager whenever information is needed about a database schema
object, e.g. a table. If the executeQuery method call in Example 8.1.1 is
processed, e.g., the RelationalManager has to ask the TableMapper for the
internal IDs of the attributes "firstname" and "surname". Other responsibilities of TableMapper include table creation and removal, and providing
descriptions of tables. Table descriptions include attribute names, data types
and constraints, and are used when derived tables are created.
All information on tables and their attributes is stored in two reserved
tables that are created at startup. Other than having reserved names, these
behave as other tables. The table manipulation and information gathering
performed by the TableMapper is therefore done by updating and querying
the records in these. For fast lookup, the TableMapper also maintains a
cache of vital schema information. To be able to guarantee that the cached
information is valid, only one TableMapper object may be created. This
TableMapper object is aware of all changes to the schema since schema manipulations are directed through it.
If the RelationalManager is processing a query that involves set operations, the RelationalAlgorithmHelper class is consulted. It contains static,
i.e. stateless, implementations of some set operations, including various joins,
union and sort.
The join algorithms are implemented using hash join (Shapiro, 1986).
Since the database only resides in main memory, this strategy is better than
both GRACE join and sort-merge join (Shapiro, 1986).
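A minimal sketch of such an in-memory hash join over rows represented as Object arrays follows; the record layout and method names are assumptions made for this example.

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hash join: hash the smaller input on its join key, then probe it with the
// other input and emit the concatenated matching rows.
public class HashJoinSketch {

    static List<Object[]> hashJoin(List<Object[]> left, int leftKey,
                                   List<Object[]> right, int rightKey) {
        // Build phase: hash the (assumed smaller) left input on its join key.
        Map<Object, List<Object[]>> buckets = new HashMap<Object, List<Object[]>>();
        for (Object[] row : left) {
            List<Object[]> bucket = buckets.get(row[leftKey]);
            if (bucket == null) {
                bucket = new ArrayList<Object[]>();
                buckets.put(row[leftKey], bucket);
            }
            bucket.add(row);
        }
        // Probe phase: look up each right row and emit the joined rows.
        List<Object[]> result = new ArrayList<Object[]>();
        for (Object[] probe : right) {
            List<Object[]> matches = buckets.get(probe[rightKey]);
            if (matches == null) {
                continue;
            }
            for (Object[] match : matches) {
                Object[] joined = new Object[match.length + probe.length];
                System.arraycopy(match, 0, joined, 0, match.length);
                System.arraycopy(probe, 0, joined, match.length, probe.length);
                result.add(joined);
            }
        }
        return result;
    }
}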
Union with duplicate removal is implemented with a Hashtable, and assumes that there are no duplicates in any of the two subquery results. All
records from the subquery with fewest records are first copied to the result
set, and a Hashtable is created on one of the attributes. The records from
the other subquery are then compared to the hashed records and added to
the result set if a record with identical attribute values is not found. Ideally,
an attribute with a unique constraint should be used in the hash. If this
is the case, each record from the second subquery must be compared to at
most one record. Note that records from the second table are not added to
the Hashtable.
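A sketch of this union strategy is shown below, assuming that the hash attribute is unique and that rows are represented as Object arrays; both are assumptions made for this example.

import java.util.ArrayList;
import java.util.Arrays;
import java.util.Hashtable;
import java.util.List;

// Union with duplicate removal: copy the smaller result, hash it on a unique
// attribute, and add rows from the larger result that have no identical match.
public class UnionSketch {

    static List<Object[]> union(List<Object[]> smaller, List<Object[]> larger,
                                int hashAttribute) {
        List<Object[]> result = new ArrayList<Object[]>(smaller);
        Hashtable<Object, Object[]> hashed = new Hashtable<Object, Object[]>();
        for (Object[] row : smaller) {
            hashed.put(row[hashAttribute], row);
        }
        for (Object[] candidate : larger) {
            Object[] existing = hashed.get(candidate[hashAttribute]);
            if (existing == null || !Arrays.equals(existing, candidate)) {
                result.add(candidate);   // no identical record was hashed
            }
        }
        return result;
    }
}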
The sort operation is implemented with Merge-Sort because this method
is both fast, O(n log n) (Knuth, 1998), and easy to implement.
Consider the sequence diagram in Figure 8.3. This diagram illustrates how
the module responds to the following query with a join:
select *
from (person join post on person.zip=post.zip)
where person.name=John;
Figure 8.3: Relational Manager Module processing the query select * from
(person join post on person.zip=post.zip) where person.name=John;

As illustrated, the RMM first requests the attribute ID and type of the
“name” attribute in “person” from TableMapper. If the TableMapper has
not already cached this information, a read is requested from the reserved
“columns” table. The TableMapper then caches this information for future
requests. The RMM now knows the attribute ID and type (String) of the
“name” attribute, and it uses this to read all person records with the name
“John”. The same process is repeated for the “post” table, but without a
selection criterion. Before the join is performed, the RMM needs to know
which attribute ID should be used to join the records. Again, the TableMapper is consulted. The information is now found in the cache of TableMapper.
Finally, the results of the subqueries and the join attribute IDs are sent to
the RelationalAlgorithmHelper class, which executes the join.
As already described, the RelationalManager is the lowest level module
in which the logical data model has any meaning. It is also the highest
level module with knowledge of the physical data model. For this reason,
the algorithms for most of the non-blocking DT creation methods are also
implemented here. Consult Chapter 6 for details on these algorithms.
8.1.4 Scheduler Module
The Scheduler is responsible for ordering transactional operations on records
so that the resulting order is serializable. Note that since schema information is stored as records in two reserved tables, this implies that schema
modifications are also performed in serializable order.
The Scheduler uses a strict Two Phase Locking (2PL) strategy. Thus,
locks are acquired when a transaction requests an operation, but are not
released until the transaction has terminated. As argued in Section 7.3,
this strategy was chosen for multiple reasons: strict 2PL is commonly used
in commercial systems, e.g. SQL Server 2005 (Microsoft TechNet, 2006)
and DB2 v9 (IBM Information Center, 2006), and is easy to understand and
implement. As opposed to basic 2PL, strict 2PL also avoids cascading aborts
(Garcia-Molina et al., 2002).
The module supports both shared and exclusive locks on either record or
table granularity. If a transaction issues an operation that results in a locking conflict, the transaction is aborted immediately. This ensures correct,
deadlock-free execution (Bernstein et al., 1987), but comes with a performance penalty: in many cases, the conflict could have been resolved simply
by waiting for the lock to be released. If so, the transaction is aborted unnecessarily. On the other hand, deadlock detection can be ignored, thus
simplifying the module.
Normal transactional requests are processed in three steps in the module.
First, the TransactionManager class checks that the transaction is in the
active state. If the TransactionManager confirms that the transaction is
active, the lock type [2], the transaction ID and the object ID [3] are sent to the
LockManager. If another transaction has a conflicting lock on the object,
the LockManager returns an error code. This results in the abortion of the
transaction. Otherwise, if conflicting locks are not found, the LockManager
confirms the lock. The Scheduler then sends the operation request to the
Recovery Manager Module.

[2] Shared and exclusive locks are supported.
[3] The object ID is either a table name and a recordID, or only a table name.

Figure 8.4: Organization of the log (log records labelled with transaction ID
and LSN, e.g. T:1, LSN:1 through T:1, LSN:6, chained both in sequential log
order and within each transaction).
While all normal transactional operations require locks, the Scheduler
also provides lockless operations to the DT creation methods. Furthermore,
methods for lock forwarding from one table to another are implemented in the
module.
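The following minimal sketch illustrates strict 2PL with immediate-restart conflict resolution: a conflicting request is simply rejected so that the Scheduler can abort the transaction, and all locks are released only when the transaction terminates. Class and method names are assumptions made for this example.

import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Lock table supporting shared and exclusive locks with immediate restart on
// conflict, which makes deadlock detection unnecessary.
public class LockManagerSketch {

    enum LockType { SHARED, EXCLUSIVE }

    private static class LockEntry {
        final Set<Long> sharedHolders = new HashSet<Long>();
        Long exclusiveHolder;   // null if no exclusive lock is held
    }

    private final Map<String, LockEntry> locks = new HashMap<String, LockEntry>();

    // Returns true if the lock is granted; false means the transaction must abort.
    public synchronized boolean lock(long txId, String objectId, LockType type) {
        LockEntry entry = locks.get(objectId);
        if (entry == null) {
            entry = new LockEntry();
            locks.put(objectId, entry);
        }
        boolean conflict =
                (entry.exclusiveHolder != null && entry.exclusiveHolder != txId)
                || (type == LockType.EXCLUSIVE
                    && !entry.sharedHolders.isEmpty()
                    && !(entry.sharedHolders.size() == 1
                         && entry.sharedHolders.contains(txId)));
        if (conflict) {
            return false;   // immediate restart: caller aborts the transaction
        }
        if (type == LockType.SHARED) {
            entry.sharedHolders.add(txId);
        } else {
            entry.exclusiveHolder = txId;
        }
        return true;
    }

    // Strict 2PL: locks are only released when the transaction has terminated.
    public synchronized void releaseAll(long txId) {
        for (LockEntry entry : locks.values()) {
            entry.sharedHolders.remove(txId);
            if (entry.exclusiveHolder != null && entry.exclusiveHolder == txId) {
                entry.exclusiveHolder = null;
            }
        }
    }
}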
8.1.5 Recovery Manager Module
The next layer of abstraction, the Recovery Module, is responsible for making
the database durable. It is designed for a data manager using the steal and
no-force buffer strategies. To ensure durability for this type of data manager,
the ARIES protocol is adopted. ARIES is used in many modern DBMSs,
including open source DBMS Apache Derby (Apache Derby, 2007b). The
module maintains a logical log of all operations that modify data, and the
Write Ahead Logging (WAL) and Force Log at Commit (FLaC) techniques
are used to ensure recoverability. Furthermore, a Compensating Log Record
(CLR) is written to the log if an operation is undone.
Logical logging is sometimes called operation logging because each log
record stores the operation that was performed rather than the data values
themselves. The “partial action” and “action consistency” problems (Gray
and Reuter, 1993) are not encountered in NbDBMS since records are stored
in main memory only. This simplification is discussed in the next section. If
records were stored on disk, as in most DBMSs, the logical logging strategy
adopted here would have to be replaced by, e.g., a two level logging strategy
with one logical log and one physiological log. This technique is used in the
ClustRa DBMS (Hvasshovd et al., 1995).
As illustrated in Figure 8.4, log records are organized in linked lists. Like
in ARIES (Mohan et al., 1992), two sets of links are maintained: the first is
a link between two succeeding log records, thus maintaining the sequential
order of the log. The second link is between two succeeding log records in
the same transaction. The latter is only used to fetch log records when a
transaction is aborted. These links are maintained as object references in
main memory, but are changed to LSN references when written to disk.
In addition to maintaining a sequential log of all executed operations, the
Recovery Module is responsible for performing recovery after a crash and for
undoing an aborted transaction.
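A minimal sketch of such doubly chained log records is shown below. The field names are assumptions, and the per-transaction link is drawn backwards here since that is the direction an abort would follow; the prototype may link the records differently.

import java.util.HashMap;
import java.util.Map;

// Logical log record chained both in sequential log order and per transaction.
public class LogRecordSketch {
    final long lsn;
    final long transactionId;
    final String operation;              // logical description of the modification

    LogRecordSketch nextInLog;           // next record in the sequential log
    LogRecordSketch prevInTransaction;   // previous record of the same transaction

    LogRecordSketch(long lsn, long transactionId, String operation) {
        this.lsn = lsn;
        this.transactionId = transactionId;
        this.operation = operation;
    }

    // Appender that maintains both chains while the records reside in main memory.
    static class Log {
        private LogRecordSketch tail;
        private final Map<Long, LogRecordSketch> lastOfTransaction =
                new HashMap<Long, LogRecordSketch>();
        private long nextLsn = 1;

        synchronized LogRecordSketch append(long txId, String operation) {
            LogRecordSketch rec = new LogRecordSketch(nextLsn++, txId, operation);
            if (tail != null) {
                tail.nextInLog = rec;                            // sequential chain
            }
            tail = rec;
            rec.prevInTransaction = lastOfTransaction.get(txId); // transaction chain
            lastOfTransaction.put(txId, rec);
            return rec;
        }
    }
}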
8.1.6 Data Manager Module
The Data Manager Module (DMM) [4] is responsible for storage and retrieval of
records, and for performing the actual updates of data records. The records
are stored in Java hashtables (Sun Microsystems, 2007), which reside in main
memory only. In NbDBMS, these hashtables are called indices.
When a table is created, an index is created on the primary key attribute.
Indices can later be added to any other attribute. A table with a primary key
index and an additional attribute index is illustrated in Figure 8.5.

Figure 8.5: Organization of data records in a table. The table has two indexes;
the primary key index (created automatically) and one user specified index
on attribute 1.

[4] The module is called Data Manager instead of Cache Manager because records
are never written to disk.

When a read operation is requested, the Data Manager chooses the index that is best
suited to fetch the record. The choice is based on the best match between
the selection criterion and available indices. When an update is requested,
the record is fetched using the primary key index before being changed. If
the operation modifies an indexed attribute, the record must be rehashed in
that index.
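A minimal sketch of a table with a primary key index and one attribute index kept as hashtables is shown below, including the rehashing needed when an indexed attribute is updated. The record layout and names are assumptions made for this example.

import java.util.ArrayList;
import java.util.Hashtable;
import java.util.List;

// Records are Object arrays; element 0 is the primary key and element 1 is the
// additionally indexed attribute.
public class TableSketch {
    private final Hashtable<Object, Object[]> primaryIndex =
            new Hashtable<Object, Object[]>();
    private final Hashtable<Object, List<Object[]>> attribute1Index =
            new Hashtable<Object, List<Object[]>>();

    public void insert(Object[] record) {
        primaryIndex.put(record[0], record);
        bucket(record[1]).add(record);
    }

    // Fetch the record via the primary key index, then rehash it in the
    // attribute index because the indexed value changes.
    public void updateAttribute1(Object primaryKey, Object newValue) {
        Object[] record = primaryIndex.get(primaryKey);
        if (record == null) {
            return;
        }
        bucket(record[1]).remove(record);   // remove under the old hash value
        record[1] = newValue;
        bucket(record[1]).add(record);      // re-insert under the new value
    }

    private List<Object[]> bucket(Object value) {
        List<Object[]> bucket = attribute1Index.get(value);
        if (bucket == null) {
            bucket = new ArrayList<Object[]>();
            attribute1Index.put(value, bucket);
        }
        return bucket;
    }
}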
Although numerous DBMSs designed to achieve low response times are
main-memory based (Garcia-Molina and Salem, 1992; Cha et al., 1995; Cha
et al., 1997; Bratsberg et al., 1997b; Cha and Song, 2004; Solid Info. Tech.,
2006a), DBMSs are more often disk based [5]. Compared to disk based DBMSs,
keeping records in main memory only is probably the greatest simplification
in the Non-blocking DBMS. By doing so, Cache management, i.e. choosing
which disk blocks should reside in main memory at any time, can be totally
ignored. Furthermore, write operations that would change multiple disk
blocks, e.g. by splitting a node in a B-tree, are now atomic. This enables us
to use plain logical logging, as argued in the previous section.
8.1.7 Effects of the Simplifications
The previous sections have described the simplifications made in the prototype modules. In what follows, the implications of these are discussed.
As discussed in Section 8.1.2, NbDBMS only recognizes a very limited
SQL language, which is defined in Appendix A. This would obviously
be a huge drawback if the Non-blocking DBMS were to be used by real
applications. However, in the current version, NbDBMS is only intended
for empirical validation and performance testing. A predefined transactional
workload will be used in these experiments, hence the system only needs
to recognize a subset of the SQL language. This simplification is therefore
considered to not affect the experiments.
The Scheduler described in Section 8.1.4 is designed to abort transactions
that try to acquire conflicting locks. This is called Immediate Restart conflict resolution (Agrawal et al., 1987). Agrawal et al. compared this to the
more common strategy of not aborting until a deadlock is detected. The
comparison showed that the latter strategy, called blocking conflict resolution, enables higher throughput under most workload scenarios. Hence, we
expect this to reduce the maximum throughput in NbDBMS.
[5] All the open source DBMSs evaluated in Chapter 7, IBM DB2 (IBM Information
Center, 2006), Microsoft SQL Server 2005 (Microsoft TechNet, 2006) etc. are all
disk based DBMSs.

In most circumstances, the non-blocking DT creation methods described
in this thesis do not acquire additional locks. This means that the exact same
number of locking conflicts should occur for transactions executed during
normal processing as for those executed during DT creation. Since immediate
restart affects transactions in both cases to the same extent, it is considered
not to affect the relative performance between the normal and DT creation
cases. Furthermore, the empirical validation experiments remain unaffected.
As thoroughly discussed in Chapters 4 - 6, there is one exception in which
DT creation does require additional locks. This is during non-blocking synchronization of schema transformations. Here, locks are forwarded between
the old and new table versions. In all DT creation operators where one source
record contributes to multiple derived records or vice versa, additional locking conflicts are expected. In these cases, immediate restart is expected
to cause a higher number of aborts than the blocking strategy would have.
NbDBMS is therefore expected to perform poorer in this particular case.
The Recovery Manager maintains a pure logical log. As argued by
Gray and Reuter, this alone does not provide enough information for crash
recovery since many disk operations are not atomic (Gray and Reuter, 1993).
The logical log includes all the information needed to create DTs and for
performing crash recovery in NbDBMS, however. Thus, the only consequence
of this design is a reduced log volume compared to disk based DBMSs. This
reduction in log volume is equal for transaction processing in the normal and
DT creation cases. It is therefore considered to affect the performance of
NbDBMS to a negligible extent.
Storing data in main memory only is likely to be the simplification
with greatest impact on the performance of NbDBMS. As discussed in Section
8.1.6, this greatly reduces the complexity of the Data Manager and enables
the use of a pure logical log. The chosen strategy is not common, but is used
in some DBMSs, including Solid BoostEngine (Solid Info. Tech., 2006a),
ClustRa (Hvasshovd et al., 1995) and P*Time (Cha and Song, 2004).
As discussed by Highleyman (Highleyman, 1989), the performance of a
DBMS is bound by a bottleneck resource. Example bottlenecks include CPU,
disk and network. The “main memory only” simplification implies that the
performance results of NbDBMS should be compared to DBMSs that are
bound by other resources than cache management. We expect that the empirical validation experiments are unaffected by this design. When it comes
to performance testing, the normal processing and DT creation cases are
both affected by the design. We therefore consider the relative performance
to be affected only to a small extent.
8.2 Client and Administrator Programs
Both the Database Client and Administrator programs have console user
interfaces, and connect to the Non-blocking DBMS through Java Remote
Method Invocation (RMI) sockets. As described in Section 8.1.1, the sockets
have been modified to not queue replies, thus reducing response time.
When a client program has connected to NbDBMS, it may perform operations on the database through a limited SQL language. The operations
are either issued through the executeRead method used by queries, or the
executeUpdate method used by inserts, deletes and updates. All operations
are requested on behalf of a transaction. Transactions are started by calling
the startTransaction method, and terminated by calling either the commit
or abort method.
There are two types of clients: one interactive client that accepts SQL operations from a user, and one automated client that generates semi-randomized
transactions to simulate workload. Figure 8.6 shows a screen shot of the interactive client. The automated client type is used in the experiments, and
is discussed further in Chapter 9.
The admin program has access to other operations in NbDBMS, but is
otherwise similar to the client programs. There are two types of admins. One
of these is interactive, and is used for manual verification of the DT creation
method. The other type is automated and is used in conjunction with the
automated clients in the experiments.
8.3 Summary
In this chapter, we have described the design of a prototype DBMS, and
discussed the effects of the simplifications we have made to it. The resulting
prototype, called the Non-blocking DBMS, is capable of performing basic
database operations, like queries and updates, in addition to our suggested
DT creation method. Altogether, the prototype consists of approximately
13,000 lines of code.
The prototype has been subject to both empirical validation and performance experiments. In the next chapter, we describe the experiments, and
discuss the results and implications of these.
Figure 8.6: Screen shot of the Client program in action.
Chapter 9
Prototype Testing
In this chapter, we focus on two types of experiments that can be used to
determine the quality of the DT creation methods. The first type, empirical
validation, is performed to determine whether the DT creation methods work.
While it is clear that empirical validation can not be used to prove absolute
correctness of a method (Tichy, 1998), it can provide a clear indication.
This experiment type is therefore commonly used in development of new
techniques in software (Tichy, 1998; Pfleeger, 1999; Zelkowitz and Wallace,
1998).
The second type of experiment is performance experiments. This experiment is highly relevant since an important design goal of the DT creation
framework has been to incur low degrees of performance degradation.
The Non-blocking DBMS has been subject to extensive empirical validation and performance experiments. In what follows, we first describe the
environment the experiments have been performed in. We then discuss the
results and implications of the experiments.
9.1 Test Environment
All tests described in this chapter are performed on seven PCs, called nodes,
each with two AMD 1400 MHz CPUs and 1 GB of main memory. All nodes
run the Debian GNU/Linux operating system with a 2.6.8 smp [1] kernel, and
are connected with a 100Mb Ethernet LAN network. The Non-blocking
DBMS described in Chapter 8 is installed on one of the nodes, whereas the
other nodes are used for administrator (1 node) and client (5 nodes) programs.
The reason for using five nodes with client programs is to resemble realistic
workloads as closely as possible with the available resources. In what follows,
these nodes are called "server node", "admin node" and "client nodes",
respectively. The prototype DBMS and all client and administrator programs
have been implemented in Java 2 Standard Edition 5.0 (Sun Microsystems,
2006a).

[1] A symmetric multiprocessing (smp) kernel is required to utilize both CPUs.

Type                     | Value
Nodes                    | 7 (1 server, 1 admin, 5 clients)
CPU                      | 2 x AMD 1400MHz per node
Memory                   | 1 GB per node
Operating System         | Debian GNU/Linux, 2.6.8-686-smp kernel
Network                  | 100Mb Ethernet
Java Virtual Machine     | Java HotSpot Server VM, build 1.5.0_08-b03
Java Compiler            | javac 1.5.0_08
Java VM Options, Server  | -server -Xms800m -Xmx800m -Xincgc
Java VM Options, Admin   | -server
Java VM Options, Client  | -server

Table 9.1: Hardware and Software Environment for experiments.
The Server Node
The NbDBMS server has been run with the following options in all experiments:
java -server -Xms800m -Xmx800m -Xincgc
The -server option selects the Java HotSpot Server Virtual Machine (VM),
which is optimized for overall performance. The alternative, the Client VM,
has faster startup times and a smaller footprint (i.e. requires less memory),
and is better suited for applications with graphical user interfaces (GUI) etc.
(Sun Microsystems, 2006b). Both VMs have been tried for the NbDBMS
server, and the Server VM has outperformed the Client VM with ∼15-20%
higher maximum throughput.
The -Xms800m and -Xmx800m options are used to set the starting and
maximum heap sizes to 800 MB. The heap is the memory available to the Java
VM, and 800 MB has proven to be slightly below the limit where the Java VM
sporadically fails to start due to memory conflicts with other processes. By
setting the starting heap size equal to the maximum heap size, the overhead
of growing the heap is avoided (Shirazi, 2003).
-Xincgc is used to select the incremental garbage collector. This algorithm
frequently collects small clusters of garbage as opposed to the default method
136
9.1. TEST ENVIRONMENT
of collecting much garbage less frequently (Shirazi, 2003). Note that the impact
of using this garbage collection algorithm increases with the heap size.
The reason for this is that the default garbage collection algorithm must
collect more garbage in each iteration (Shirazi, 2003). In NbDBMS, this
option results in significantly lower response time variance.
The Client and Administrator Nodes
Like the Non-blocking DBMS server, the administrator and client programs
have also been run with the Server VM. The heap and garbage collection
options used on the NbDBMS server made no observable difference for these
programs, and are therefore not used in the experiments.
Each client node runs one organizer thread that spawns transaction threads
from a thread pool. When spawned, a transaction thread executes one transaction, consisting of six basic operations, before returning to the thread pool.
The organizer uses the Poisson distribution for transaction thread spawning,
meaning that the number of requests per second varies, but has a defined
mean. As argued by Highleyman, the Poisson distribution should be used
when we want to simulate requests from an “infinite” number of clients (Highleyman, 1989). By infinite, we mean many more clients than are currently
being processed by the database system.
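A minimal sketch of such Poisson-distributed spawning is shown below, using exponentially distributed inter-arrival times; the rate and the thread body are illustrative and not taken from the actual client code.

import java.util.Random;

// Spawns transaction threads with exponentially distributed inter-arrival
// times, which gives a Poisson-distributed number of transactions per second.
public class PoissonSpawnerSketch {

    public static void main(String[] args) throws InterruptedException {
        double meanTransactionsPerSecond = 50.0;   // assumed target load
        Random random = new Random();

        while (true) {
            // Inverse transform sampling of the exponential distribution.
            double interArrivalSeconds =
                    -Math.log(1.0 - random.nextDouble()) / meanTransactionsPerSecond;
            Thread.sleep((long) (interArrivalSeconds * 1000.0));

            // In the real client, a transaction thread from a thread pool would
            // execute one six-operation transaction here.
            new Thread(new Runnable() {
                public void run() {
                    // execute one transaction against the server (omitted)
                }
            }).start();
        }
    }
}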
The transactions requested by client threads are randomized within boundaries defined for each DT creation operator. In all experiments, six transaction types are specified. These are called Transaction Mixes, and a spawned
transaction thread executes one of the transactions specified in the appropriate mix. The transaction mixes are designed to reflect a varied workload so
that all log propagation rules are involved in the DT creation process. Hence,
the transaction mixes include inserts, updates, deletes and queries on all involved tables. The transaction mixes are shown in Tables 9.2 to 9.4. The
reason for having three transaction mixes is that the DT creation methods
have different requirements; some have one source tables while others have
two and so on.
The transactions are similar to those used in TPC-B benchmarks (Serlin,
1993) although the DT creation table setups are not equal to what is specified
in TPC-B. More thorough benchmarks, like TPC-C, TPC-D (Ballinger, 1993)
and AS3AP (Turbyfill et al., 1993) exist, but these to a much greater extent
test DBMS functionality that is of no interest to DT creation. A
good example is the query in TPC-D that joins seven tables with greatly
varying sizes, meant to test the query optimizer, or AS3AP, which tests mixes
of batch and interactive queries (Gray, 1993).
Transaction Mix 1
Trans | Nonsource            | Source 1             | Source 2           | Scenario 1 (%) | Scenario 2 (%)
1     | -                    | 6 updates            | -                  | 20             | 5
2     | 6 updates            | -                    | -                  | 20             | 20
3     | 4 reads              | 1 read               | 1 read             | 40             | 60
4     | -                    | -                    | 6 updates          | 5              | 2.5
5     | 3 inserts, 3 deletes | -                    | -                  | 10             | 10
6     | -                    | 2 inserts, 2 deletes | 1 insert, 1 delete | 5              | 2.5

Table 9.2: Transaction Mix 1, used in Difference and Intersection, Vertical
Merge and Horizontal Merge DT creation. Transactions 1, 4 and 6 require
log propagation processing. This corresponds to 30% of the operations in
Scenario 1, and 10% in Scenario 2, which is more read intensive.

Transaction Mix 2
Trans | Nonsource            | Source, "Left" part  | Source, "Right" part | Scenario 1 (%) | Scenario 2 (%)
1     | -                    | 6 updates            | -                    | 20             | 5
2     | 6 updates            | -                    | -                    | 20             | 20
3     | 4 reads              | 2 reads              | -                    | 40             | 60
4     | -                    | -                    | 6 updates            | 5              | 2.5
5     | 3 inserts, 3 deletes | -                    | -                    | 10             | 10
6     | -                    | 3 inserts, 3 deletes | -                    | 5              | 2.5

Table 9.3: Transaction Mix 2, used in Vertical Split DT creation. There is
only one source table, but the attributes of this table are derived into either
the "left" or "right" derived table. Scenario 1 requires log propagation of 30%
of the operations, whereas Scenario 2 requires 10%.

Transaction Mix 3
Trans | Nonsource            | Source, Other Attribute | Source, Selection Attribute | Scenario 1 (%) | Scenario 2 (%)
1     | -                    | 6 updates               | -                           | 20             | 5
2     | 6 updates            | -                       | -                           | 20             | 20
3     | 4 reads              | 2 reads                 | -                           | 40             | 60
4     | -                    | -                       | 6 updates                   | 5              | 2.5
5     | 3 inserts, 3 deletes | -                       | -                           | 10             | 10
6     | -                    | 3 inserts, 3 deletes    | -                           | 5              | 2.5

Table 9.4: Transaction Mix 3, used in Horizontal Split DT creation. There
is only one source table, but a derived record may have to move between the
derived tables if the attribute used in the selection criterion is updated.

Operation                | # and size of records in each table
Difference, Intersection | Nonsource: 20,000 records, 100 bytes
                         | Source 1: 20,000 records, 80 bytes
                         | Source 2: 5,000 records, 80 bytes
Horizontal Merge         | Nonsource: 20,000 records, ∼100 bytes
                         | Source 1: 20,000 records, ∼80 bytes
                         | Source 2: 20,000 records, ∼80 bytes
Horizontal Split         | Nonsource: 20,000 records, ∼100 bytes
                         | Source 1: 40,000 records, ∼80 bytes
                         | Source 2: N/A
Vertical Merge           | Nonsource: 20,000 records, ∼100 bytes
                         | Source 1: 20,000 records, ∼100 bytes
                         | Source 2: 1,300 to 20,000 records, ∼50 bytes
Vertical Split           | Nonsource: 20,000 records, ∼100 bytes
                         | Source 1: 20,000 records, ∼150 bytes
                         | Source 2: N/A

Table 9.5: Table Sizes used in the performance test experiments. Note that
the empirical validation experiments are performed with 5 times more records
in all source tables.

In addition to the source tables, all experiments are performed with one
additional table in the schema. This table is called “nonsource”, and is not
involved in the DT creation. The idea of having this table is to be able to
generate varying workloads without necessarily changing the log propagation
work that needs to be done. We also consider it realistic to have a database
schema with more tables than those involved in the DT creation process.
Depending on the operator being used for DT creation, either one or
two source tables are defined in the original database schema. Before an
experiment is started, all tables are filled with records. The number of records
in each table is shown in Table 9.5.
Since multiple nodes with multiple threads request operations concurrently, it will often be the case that a request arrives at the server while
another request is being processed. The number of concurrent server threads
may influence the performance. However, we rely on Java RMI to decide on
the optimal number of concurrent threads on the server node. Also, as previously described, the server sockets used to connect the clients and the server
are modified to avoid buffering of replies to different transaction threads
executed on the same node.
9.2 Empirical Validation of the Non-Blocking DT Creation Methods
Empirical validation experiments have been performed for all DT creation
methods in Non-blocking DBMS. The experiments were executed using the
following steps:
1. Populate the source tables with initial data as defined in Table 9.5.
Note that five times more records are used in these tests than are
specified in the table.
2. Start a workload of semi-random insert, update and delete operations.
The workload is described by the transaction mixes defined in Tables
9.2 to 9.4, but the read transaction types are ignored. Execute 200,000
transactions before stopping the workload.
3. Once the random workload has started, start the DT creation process.
Let log propagation run until all transactions have completed executing.
4. When all transactions have completed, compare the content of the
source tables to that of the derived tables. No attribute values should
differ between these schema versions.
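The comparison in step 4 is essentially a record-by-record equality check between each source table and its corresponding derived table. The following is a minimal Java sketch of such a check, assuming both tables have already been fetched into lists sorted on the same key; the record representation (a plain String array of attribute values) is an illustrative assumption and not the prototype's actual interface.

    import java.util.List;

    // Minimal sketch of the step-4 consistency check (illustrative only).
    // Both lists are assumed to be sorted on the same key.
    final class ConsistencyCheck {
        static boolean identical(List<String[]> source, List<String[]> derived) {
            if (source.size() != derived.size()) {
                return false;                      // missing or extra records
            }
            for (int i = 0; i < source.size(); i++) {
                String[] s = source.get(i);
                String[] d = derived.get(i);
                if (s.length != d.length) {
                    return false;                  // differing attribute count
                }
                for (int j = 0; j < s.length; j++) {
                    if (!s[j].equals(d[j])) {
                        return false;              // attribute value differs
                    }
                }
            }
            return true;
        }
    }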
For continuity, the tests have been performed with source and derived tables similar to those used in the examples in Chapter 6. Thus, the figures used
there may be used for reference.
Difference and Intersection
In the difference and intersection (diff/int) experiment, records from the
“Vinyl records” source table were stored in the difference or intersection
DTs based on the existence of equal records in the “CD records” source table.
All tables had three attributes: artist firstname, artist surname and record
title. To achieve a 20% overlap of records between the source tables, some
operations wrote completely random values while others wrote predefined
default values. After the transactions had completed, the records in the DTs
were ordered by artist and record name. The difference and intersection
operators were then applied on the source tables, and the sorted results were
stored in arrays. The records in these arrays were in turn compared to the
derived tables. The contents of “CD records” was also compared to that of
the auxiliary table. All records, including LSNs, were equal in the source
and derived tables.
Horizontal Merge
To reduce the implementation work, the horizontal merge experiments have
only been performed for the duplicate removal case. Duplicate removal was
chosen over duplicate inclusion because the former is more complex, as
argued in Section 6.3.
The source tables in the horizontal merge experiment were equal to those
used in diff/int. Approximately 20% of the records in “CD Records” were
duplicates of “Vinyl Records”. There were no duplicates within one table.
When all transactions had completed, the records in both source tables were
sorted on name and record title and inserted into an array. The unique
records in this array were then compared to the DT, and the record IDs were
compared to the auxiliary table. No inconsistencies were found.
Horizontal Split
In the horizontal split experiment, a “Record” source table was split into
“Vinyl records” and “CD records”. The attributes in the source table were
firstname, surname, record title and type. Type was used to determine which
DT a record should be derived to. Possible values of this attribute were
“vinyl” (49%), “cd” (49%) or “none” (2%). The latter value was used to
indicate that the record did not belong to either DT, and therefore had to
be stored in the auxiliary table. The comparison of the source and derived
table contents was performed by copying the source records into one of three
arrays, depending on the value of the type attribute. The records in each of
these arrays were then compared to the DTs. The comparison showed that
all records were equal in the source and derived tables.
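As an illustration of how the comparison program groups the source records, the following minimal Java sketch copies a record into one of three groups based on its type attribute, mirroring the “vinyl”/“cd”/“none” routing described above. The class name and record representation are illustrative assumptions, not the prototype's actual comparison code.

    import java.util.ArrayList;
    import java.util.List;

    // Illustrative sketch of the horizontal split comparison setup: source
    // records are copied into one of three groups based on the type attribute.
    final class HorizontalSplitGrouping {
        final List<String[]> vinyl = new ArrayList<String[]>();
        final List<String[]> cd = new ArrayList<String[]>();
        final List<String[]> auxiliary = new ArrayList<String[]>();

        void add(String type, String[] record) {
            if ("vinyl".equals(type)) {
                vinyl.add(record);
            } else if ("cd".equals(type)) {
                cd.add(record);
            } else {
                auxiliary.add(record);   // type "none": stored in the auxiliary table
            }
        }
    }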
Vertical Merge
The vertical merge experiment was conducted by joining the “employee”
and “postaladdress” source tables. The resulting DT was called “modified
employee”. When all transactions had completed, the comparison of records
in the source and derived tables was performed by using the full outer join
operator on the source table records. The join result and records in the
DT were then sorted on the primary key attribute, social security number
(SSN), and stored in arrays. Since the SSN was unique, comparison was
straightforward. No inconsistencies were found.
Vertical Split
Vertical split was performed over a functional dependency, in which the
source table “employee” was split into “modified employee” and “postaladdress”. As argued in Section 6.7, this type of vertical split is more complex
than the vertical split over a candidate key counterpart since source records
may be inconsistent in the former case. The records in the source table were
designed to split into four times as many employees as postal addresses. 99%
of all write operations to the attributes derived to the “postal address” DT
were default values. The remaining 1% were set to non-default values. This
resulted in approximately 4% inconsistent records in the final state of the
“postal address” DT. The consistency check program was executed in parallel with log propagation. In addition, the consistency check was performed
on all records that were flagged as Unknown2 when the transactions had completed. One final log propagation was then executed to achieve correct flag
states. The comparison of records was first performed between the source
table and the “modified employee” tables. The records from both tables were
sorted on the primary key, SSN, before the relevant subset of attributes were
compared. No inconsistencies were found in this comparison.
The “postal address” table was then checked. This was done by inserting
the records in the source table into an array. Only the subset of attributes
stored in “postal address” DT were stored, and the array was sorted on zip
code. Equal zip codes were discarded after checking that the records were
equal; if different attribute values were found, the record in the array was
flagged as inconsistent. The content of the array was then compared to the
content of the “postal address” DT. Some inconsistencies were found, but
only on records marked with an Unknown flag in the DT. A cross check
revealed that all of these were marked as inconsistent in the array. Also,
no derived records with an Unknown flag were marked as consistent in the
array.
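A minimal sketch of this zip code-keyed check is shown below. It assumes each source record exposes its zip code and the derived postal address attributes as strings; the Map-based bookkeeping and the class name are illustrative assumptions, not the prototype's actual implementation.

    import java.util.HashMap;
    import java.util.Map;

    // Illustrative sketch of the "postal address" consistency check: source
    // records are reduced to one entry per zip code, and a zip code is flagged
    // as inconsistent if two source records disagree on the address attributes.
    final class PostalAddressCheck {
        private final Map<String, String> addressByZip = new HashMap<String, String>();
        private final Map<String, Boolean> consistentByZip = new HashMap<String, Boolean>();

        void addSourceRecord(String zipCode, String addressAttributes) {
            String known = addressByZip.get(zipCode);
            if (known == null) {
                addressByZip.put(zipCode, addressAttributes);
                consistentByZip.put(zipCode, Boolean.TRUE);
            } else if (!known.equals(addressAttributes)) {
                consistentByZip.put(zipCode, Boolean.FALSE);  // conflicting values
            }
        }

        boolean isConsistent(String zipCode) {
            Boolean flag = consistentByZip.get(zipCode);
            return flag != null && flag.booleanValue();
        }
    }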
Empirical Validation Summary
With the exception of records flagged as Unknown in the vertical split experiment, no inconsistencies have been found between source records and
derived records. This indicates that the DT creation methods work correctly. We base this on what we consider to be extensive testing, in which
all basic write operations (insert, delete and update), and both normal and
abort transaction execution have been involved (1%-3% of the transactions were aborted due to locking conflicts). 200,000 transactions with
6 operations each have been executed in each experiment. Thus, a total of
1,200,000 modifying operations have been made to 300,000 records.
No matter how extensive the experiments are, however, empirical validation can never be used as a proof of correctness (Tichy, 1998). Thus, the
experiment should ideally be repeated in another implementation to confirm
the results (Walker et al., 2003). Due to time considerations, the experiments
have only been performed on one implementation in this thesis.
9.3 Performance Testing
The following sections discuss the performance test results from the non-blocking DT creation experiments. The same operators as in the empirical
validation experiments are considered. Hence, for horizontal merge, only the
duplicate removal case is discussed. Similarly, for vertical split, only the
functional dependency case is discussed.
There are two common measurements for database system performance.
These are response time, i.e. the time from when a client requests an operation until it receives the response, and throughput, i.e. the number of
transactions processed per unit of time (Highleyman, 1989). Results for
both measurements are discussed.
The test results presented in this chapter will not be a benchmark comparison between the Non-blocking DBMS and a fully functional DBMS. The
reason is that the prototype lacks functionality vital to achieve good benchmark results, e.g. the aforementioned query optimizer. Furthermore, as is
clear from Section 9.1, the hardware used in the experiments is far from capable of running high performance DBMS benchmarks. What the tests will
be used for, however, is to show the relative performance of user transactions
when executed alone compared to when executed concurrently with the various DT creation methods. In the following sections, “user transactions” will
denote transactions sent from a client application. These are not involved in
the DT creation.
The performance of all steps of DT creation is tested, but most emphasis
will be put on performance during log propagation. The reason for this is
that the other three steps have much shorter execution times and therefore
impact performance to a lesser extent.
Thread Priorities
The experiments discussed in the performance test sections have been designed to degrade the performance of concurrent transactions as little as
possible. We achieve this by reducing the priority of the DT creation thread
to the point where the log propagator is only capable of applying as many
log records as are produced. Hence, with this priority, the number of log
records to redo remains unchanged. Small increases in the priority of this
thread should therefore result in long execution time with a minimal performance degradation. Similarly, a big priority increase should result in shorter
execution time at the cost of more performance degradation.
The priority of threads in Java 2 Standard Edition 5.0 can be set so that
a high priority thread is scheduled before a low priority thread. However,
despite setting the priority to an absolute minimum, DT creation tends to
complete very fast with the inevitable high performance penalties to concurrent transactions. The reason for this is that the Java VM uses Linux system
threads to implement Java threads (Austin, 2000), and that these have time
slices of 100ms in the Linux 2.6 kernel (Aas, 2005). Hence, every time the DT
creation thread is scheduled, it is allowed to run uninterrupted for 100ms.
By only modifying thread priorities, we are not able to achieve acceptable
transaction performance. The problem is that each requested operation is
processed in approximately 1 ms. This means that if there are 50 threads used
for transaction processing and one thread for DT creation, and all threads are
scheduled once, the DT creation thread gets twice the CPU time as all the
other threads together. Thus, to reduce the priority further, Thread.yield()
and Thread.sleep() calls are used on the DT creation thread. This forces
the thread to stop processing, thus reducing the time slice. By using this
technique we are able to fine-tune the priority, and thus find the lowest
possible performance degradation for each DT creation method.
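A minimal sketch of this kind of throttling is shown below, assuming the propagation work is exposed as a single applyNextLogRecord() step; the method names and the sleep interval are illustrative assumptions, not the prototype's actual interface.

    // Illustrative sketch of throttling a background DT creation thread with
    // Thread.setPriority(), Thread.yield() and Thread.sleep().
    final class DtCreationWorker implements Runnable {
        private final int sleepMillisPerRecord;  // larger value = lower effective priority

        DtCreationWorker(int sleepMillisPerRecord) {
            this.sleepMillisPerRecord = sleepMillisPerRecord;
        }

        public void run() {
            Thread.currentThread().setPriority(Thread.MIN_PRIORITY);
            while (moreLogRecordsToPropagate()) {
                applyNextLogRecord();
                Thread.yield();                  // give up the rest of the time slice
                try {
                    Thread.sleep(sleepMillisPerRecord);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
            }
        }

        private boolean moreLogRecordsToPropagate() { return false; }  // placeholder
        private void applyNextLogRecord() { }                          // placeholder
    }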
Determining the Maximum Capacity of NbDBMS
Most performance results in this chapter are presented on a 50% to 100%
workload scale. This implies that the point for 100% workload, i.e., the
maximum transaction capacity of NbDBMS, has to be determined. The
maximum capacity differs slightly between the DT creation operators since
the transaction mixes are not exactly equal, but the method described here
is used to find all of them.
As advised by Highleyman (Highleyman, 1989), the steps used to determine
the maximum capacity of a database system are to first define the maximum
response time that is considered acceptable. Second, the fraction of
transactions that are required to complete within the maximum response time
must be defined. The capacity can then be determined by executing test runs
and comparing the results with the requirements.
We define 10 ms as the maximum acceptable response time of an operation. Considering the fact that all records are in main memory, requests are
only sent over a LAN, and the requested operations do not include complex
queries, 10 ms should suffice. A transaction that observes higher response
time than 10ms for any of its six operations is considered to have failed. It is
also decided that 95% of all transactions must complete within an acceptable
response time. This is often used as a requirement in telecom systems, e.g.
as in ClustRa (Hvasshovd et al., 1995).
Considering only transaction failure due to unacceptable response time,
5% transaction failure corresponds to too high response times in 0.85% of all
operations since all transactions consist of 6 operations:
0.95 = (1 - x)^6

x = 1 - \sqrt[6]{0.95} \approx 0.0085 \qquad (9.1)
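As a quick numerical check of Equation 9.1 (illustrative only), the per-operation threshold can be computed directly:

    // Quick numerical check of Equation 9.1 (illustrative only).
    public class Eq91Check {
        public static void main(String[] args) {
            double x = 1.0 - Math.pow(0.95, 1.0 / 6.0);
            System.out.printf("Per-operation failure threshold: %.4f (%.2f%%)%n", x, 100 * x);
            // Prints approximately 0.0085 (0.85%).
        }
    }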
Figure 9.1(a) shows the mean operation response times with a workload
ranging from 100 to almost 500 transactions per second. It is clear from the
graph that the mean response time is much lower than 10 ms in all cases.
Figure 9.1(b) shows the upper quartiles for a 99% confidence interval using
the results from the same test runs as in the left graph. This graph shows
that an increasing number of transactions are not answered in time, especially
as the throughput increases above 400. The rapid response time increase in
both graphs is expected since the delay over a bottleneck resource is given
by (Highleyman, 1989):
[Figure 9.1: Response time and throughput for difference and intersection,
using Transaction Mix 1 scenario 1 (see Table 9.2). Plots omitted:
(a) Response time vs. transactions per second. Response time increases
exponentially as the number of transactions per second (tps) increases. The
base response time before the rapid increase is at 0.78 ms.
(b) Response time 99% upper quartile, i.e. 0.5% of all response times are
equal to or higher than the plot.
(c) Theoretical and actual throughput when transactions are considered failed
if the response time is higher than 10 ms.]
Delay = \frac{T}{1 - L} \qquad (9.2)
Here, T is the average service time over the bottleneck resource, and L is the
load. Hence, it is clear that the response time increases towards infinity as the
workload approaches 100%. Because of this rapid response time increase, a
higher maximum acceptable response time would not increase the maximum
throughput much.
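To illustrate the shape of this curve, the delay can be tabulated for increasing loads. The service time of 0.5 ms used below is an assumed value for the example, not a measured one.

    // Illustration of Equation 9.2 with an assumed service time of 0.5 ms.
    public class BottleneckDelay {
        public static void main(String[] args) {
            double serviceTimeMs = 0.5;                  // assumed, not measured
            for (double load = 0.5; load <= 0.951; load += 0.05) {
                double delayMs = serviceTimeMs / (1.0 - load);
                System.out.printf("load=%.2f  delay=%.2f ms%n", load, delayMs);
            }
        }
    }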
Figure 9.1(c) illustrates the throughput for transactions that are processed within the 10 ms per operation response time requirement. The number of transactions processed within the time limit increases almost linearly
before the response time becomes so high that more and more transactions
fail. The highest throughput was observed at 440 transactions per second. If
the workload exceeds 440, the actual throughput starts decreasing. This is,
however, above the maximum capacity of NbDBMS with the current transaction set.
A Note on Locking Conflicts in the Performance Experiments
In the empirical validation experiments described in Section 9.2, the source
tables were populated with 100,000 or 200,000 records, as shown in Table 9.5.
The experiments were performed with approximately 60-70% of the workload
used in the performance tests, which resulted in 1-3% of the transactions being aborted due to locking conflicts.
To get statistically significant results in the performance experiments, a
large number of test runs is required. A total of 2,000 test runs have been
performed to get the data for the graphs in this chapter. In addition, many
test runs have been performed to find the maximum capacity of NbDBMS,
ideal thread priorities for the DT creation thread and so forth. However,
running this many tests with the same number of records as in the empirical
validation tests would take too much time. Hence, the number of records in
the source tables has been reduced to 20,000 and 40,000, as shown in Table
9.5. This reduced the execution time of each iteration by more than one half.
While this reduction in records makes it possible to run the required number of experiments, it causes another problem. With much fewer records, a
very high number of transactions are forced to abort due to locking conflicts even at moderate workloads. With this setup, the maximum capacity
achievable without severely thrashing the throughput does not nearly utilize
the CPU capacity of the server node (the throughput thrashed completely at approximately 65% of the maximum capacity used in the experiments described in this section). Thus, this setup can be used to little
more than test the locking subsystem of NbDBMS. Since the DT creation
methods acquire additional locks in only a few cases, tests with
such low workloads are not considered very useful.
To be able to perform a high number of test runs and at the same time test
the impact of DT creation under high workloads, the transactions used in the
performance experiments have been designed to operate on different records.
We call this clustered operation. This means that no locking conflicts will
occur. When it comes to conflicts, the effect of this design is the same as
having many times more records. There should be no difference (except the
execution time) to the performance results as long as all records fit in main
memory in all cases. To verify this, five diff/int test runs with 200,000 smallsized records in each table and no clustering of the operated on records have
been compared to the 20,000 record case with clustered operations. Apart
from garbage collection being more noticeable in the former case, the setups
performed with similar response times; the 200,000 setup was less than 10%
slower than the 20,000 setup. It is worth noticing that the total size of all
records in both setups (∼50 MB in the 200,000 case) was much smaller than the maximum Java VM heap
size (800MB).
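A minimal sketch of how such clustered operation can be arranged is shown below, assuming each client thread is assigned its own disjoint partition of the key space; the partitioning scheme and class are illustrative assumptions, not the prototype's actual workload generator.

    import java.util.Random;

    // Illustrative sketch of "clustered operation": each client thread draws
    // record keys only from its own disjoint partition of the key space, so no
    // two concurrent transactions ever touch the same record.
    final class ClusteredKeyGenerator {
        private final int firstKey;        // inclusive lower bound of this thread's partition
        private final int partitionSize;
        private final Random random = new Random();

        ClusteredKeyGenerator(int threadId, int partitionSize) {
            this.firstKey = threadId * partitionSize;
            this.partitionSize = partitionSize;
        }

        int nextKey() {
            return firstKey + random.nextInt(partitionSize);
        }
    }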
9.3.1 Log Propagation - Difference and Intersection
We start the performance evaluation with a thorough discussion of the difference and intersection (diff/int) experiments. As will be clear, the tendencies
found here also apply to the experiments for other DT creation methods. We
will not repeat the arguments and discussion for these experiments. Refer to
Appendix B for plots of these experiments.
The diff/int tests use Transaction Mix 1, shown in Table 9.2, as workload.
Experiments are conducted for two scenarios: in the first scenario, the source
tables are frequently written to, i.e. 30% of all operations are either updates,
inserts or deletes in the source tables. The second scenario is much more
read intensive, and the number of write operations on the source tables is
reduced to 10%. Because write operations to the source tables are the only
operations that must be propagated to the DTs, scenario 1 should produce
three times more log records to propagate. Thus, scenario 1 is expected to
incur much higher performance degradation to normal transactions.
Response Time Distribution
Consider Figures 9.2(a) and 9.2(b), showing the distribution of operation
response times under 50% workload. The left figure shows the response
[Figure 9.2: Response time distribution for 50% and 80% workload for
difference and intersection transactions using scenario 1 from Table 9.2.
Histograms omitted:
(a) Distribution of response time for 50% workload before DT creation is started.
(b) Distribution of response time for 50% workload during log propagation for difference DT creation.
(c) Distribution of response time for 80% workload before DT creation is started.
(d) Distribution of response time for 80% workload during log propagation for difference DT creation.]
Workload | Median: Unloaded | Median: Loaded | Median: % | Mean: Unloaded | Mean: Loaded | Mean: %
50 %     | 0.709 ms         | 0.714 ms       | 0.7%      | 0.772 ms       | 0.778 ms     | 0.8%
80 %     | 0.750 ms         | 0.770 ms       | 2.7%      | 0.868 ms       | 1.102 ms     | 27%

Table 9.6: Summary of the response time distribution in the histograms of
Figure 9.2.
times without DT creation, and the right shows response times during log
propagation. The histograms show that the distributions are very similar, but
that the latter has slightly more outliers to the right. Thus, most operations
are processed equally quick in the unloaded case as in the log propagation
case, whereas a few operations observe much higher response times in the
latter case. Note that for readability, the horizontal axis of the histograms
stop at 4 ms, but the tendency of more outliers in the loaded cases is equally
clear beyond this limit.
The fact that most response times are equal in the unloaded and loaded
cases is further confirmed by comparing their median values: The unloaded
histogram in Figure 9.2(a) has a median of 0.709ms, while the median for
the loaded case, shown in Figure 9.2(b), is only 0.7% higher at 0.714ms.
This effect is also seen in the 80% workload histograms in Figures 9.2(c) and
9.2(d), in which the medians are 0.750 ms versus 0.770 ms (2.7% higher).
It is interesting to compare these median values to the respective average response times. In the 50% workload case, the mean for the unloaded
histogram is 0.772 ms whereas the mean of the log propagation histogram
is 0.778 ms. In the 80% workload case, the means have increased to 0.868
ms and 1.102 ms. This corresponds to 0.8% and 27% higher means in the
respective workloads. Hence, the means are highly affected by the response
time outliers whereas the medians are affected to a much lesser extent. This
is because there are relatively few outliers, but these have very high response
times, thus affecting the means.
The increased number of high response time observations in the log propagation cases compared to the unloaded ones is caused by the DT creation
thread. Since threads are given access to the CPU in time slices, and since
the DT creation thread has a low priority, most transactional requests arrive
at the server when log propagation is inactive. These requests observe the
same response times as in the unloaded case. However, when the DT creation
thread is active, all transaction requests must be scheduled on only one CPU.
These requests form much longer resource queues, and thus observe higher
response times.
Comparing the histograms with 50% workload to those with 80% workload
reveals that the response times increase with the workload. Furthermore, the effect of performing the DT creation increases significantly, as
indicated by the much flatter histogram in the lower right figure. For the
unloaded case, the higher response time is caused by a higher request rate,
which in turn increases average queue lengths at the server (Highleyman,
1989). A higher workload also produces more log records per second. Hence,
in the loaded case, the priority of the DT creation thread must be increased
to be able to propagate these additional log records within the same time
interval. Increasing the priority means providing the thread with more CPU
time, which in turn increases the probability that the DT creation thread is
active when a transactional request arrives at the server.
Most of the long response times observed in the unloaded cases are caused
by garbage collection. To determine the impact of garbage collection, three
algorithms have been tested. The default algorithm resulted in very high
standard deviation of the response times compared to the incremental algorithm used in all experiments in this chapter. Both these algorithms were
described in Section 9.1. Experiments with no garbage collection have also
been conducted. This option resulted in fewer high response time observations, but did not remove them completely. The latter garbage collection alternative is
not used in the experiments because the memory quickly becomes full. We
assume that the few remaining high response time observations are caused
by processes running on the server node that are not under our control.
Response Time
An important aspect of the performance evaluation is how the response time
and throughput are affected by varying workloads. Consider Figure 9.3, which
shows response time means and 90% quartiles for the unloaded and log propagation cases. The first Figure, 9.3(a), shows the response time in the unloaded case of Transaction Mix 1, Scenario 1. The plot shows that the lower
quartiles are stable at around 0.54 to 0.55 ms whereas the upper quartiles
increase rapidly with the workload. This is the same effect that was seen
in the histograms in Figure 9.2; as the workload increases, the mean queue
lengths at the NbDBMS server increase. This is not surprising since Equation 9.2 determines that the response time should increase towards infinity
as the load over the bottleneck resource approaches 100%.
Figure 9.3(b) shows the same experiment as in the left plot, but with the
response times from both the unloaded and log propagation cases. It is evident from the plot that the response time performance penalty of performing
log propagation increases rapidly as the workload exceeds 75-80%. E.g., the
mean response times during log propagation are 20%, 84% and 200% higher
CHAPTER 9. PROTOTYPE TESTING
151
Response Time Average and 90% Quartile
8
2.0
Response Time Average and 90% Quartile
●
Unloaded
Unloaded
During Log Propagation
●
4
Response Time (ms)
1.5
1.0
Response Time (ms)
6
●
●
●
2
●
●
●
●
●
●
●
50
60
70
●
0.5
●
50
60
70
80
90
100
% Workload
(a) Scenario 1 - Response time mean and
90% quartiles for the unloaded case, i.e.
before DT creation is started.
90
3.0
Response Time Average
5
●
Unloaded
During Log Propagation
●
●
●
2.0
1.5
●
1
●
●
●
50
60
70
●
●
●
50
60
●
0.5
●
Unloaded Scenario 1
Unloaded Scenario 2
Log Prop Scenario 1
Log Prop Scenario 2
1.0
2
3
Response Time (ms)
4
2.5
●
100
(b) Scenario 1 - Response time mean and
90% quartiles for the unloaded and log
propagation cases.
Response Time Average and 90% Quartile
Response Time (ms)
80
% Workload
80
90
100
% Workload
(c) Scenario 2 - Response time mean and
90% quartiles for the unloaded and log
propagation cases.
70
80
90
100
% Workload
(d) Scenario 1 and 2 - Response time
mean for unloaded and log propagation
cases.
Figure 9.3: Response times for varying workloads before and during log propagation of difference and intersection DT creation using Transaction Mix
1.
in the 80%, 90% and 100% workload cases, respectively.
The lower plots, shown in Figure 9.3(c), show the response time means
when scenario 2 of Transaction Mix 1 is used. It is clear that this scenario
impacts the response time to a much lesser extent than scenario 1. The
reason for this is simply that the priority of the DT creation thread can be
kept lower since fewer log records need to be propagated.
The impact on response time of performing log propagation applies to
the upper quartiles in particular. This was also clear from the histograms
in Figure 9.2, and indicates that an increasing number of operations have
to wait in a queue for long periods of time. Recall from Section 9.3 that
the capacity of the Non-blocking DBMS is defined as the workload, measured in transactions per second, at which less than 5% of all transactions
observe higher operation response times than 10 ms. Further, the response
times increase very rapidly as the workload increases up to and beyond this
capacity. It is obvious that log propagation adds to the workload of the Non-blocking DBMS. And, since the transaction arrival rate is not reduced when
log propagation starts, it is not surprising that the response time averages
during log propagation increase quickly. As opposed to the upper quartile,
the lower quartile is relatively stable at approximately 0.55 to 0.56 ms for all
workloads.
Throughput
In addition to response time, throughput represents an important performance metric for database systems. Recall from Section 9.3 that all transactions with higher operation response times than 10 ms are considered failed.
Consider Figure 9.4(a), showing the throughput of the unloaded and log
propagation cases for scenario 1 of Transaction Mix 1.
It is clear from the plot that very few transactions fail at low workloads.
As the workload in the log propagation case increases beyond 70%, however,
more and more operations are not processed in time. This is consistent with
what the 99% upper quartile plot in Figure 9.1(b) and the response time plots
in Figure 9.3 indicated: The upper quartile of the response time increases
rapidly with the workload. Hence, at low workloads, even most of the “long”
response times are lower than the acceptable 10 ms. At approximately 70%,
the longest response times start to go beyond this. At even higher workloads,
the ever-increasing number of too-long response times effectively thrashes
the throughput. Furthermore, the number of failed transactions increases
very rapidly when 70% workload is reached. Again, considering the rapidly
increasing response times in Figure 9.3, this rapid thrashing comes as no
surprise.
[Figure 9.4: Throughput for varying workloads during log propagation of
Difference and Intersection DT Creation. Plots omitted:
(a) Scenario 1 - Throughput for difference and intersection using Transaction Mix 1.
(b) Scenario 2 - Throughput for difference and intersection using Transaction Mix 1.]
Considering the unloaded case of the same plot, it is clear that the
throughput is only slightly reduced from approximately 80% and higher workloads. The reason why the reduction is kept relatively low is the way we have
defined 100% workload. Recall from Section 9.3 that 100% workload is defined as the point where 95% of all transactions observe acceptable response
times. Hence, by definition, all throughput plots should have 5% reduction
to the unloaded throughput at 100% workload.
Figure 9.4(b) shows the throughput of scenario 2 of the same transaction
mix. As can be seen in Table 9.2, this scenario has more read operations than
scenario 1. As previously discussed, this means that the DT creation thread
needs less CPU time to propagate the log records generated per second.
As shown in Figure 9.3(d), this results in less response time degradation
for concurrent transactions. It should be clear from the above discussion
why a lower response time degradation also results in a lower throughput
degradation.
[Figure 9.5: Response times and throughput for varying workloads during
vertical merge DT creation. Plots omitted:
(a) Response time mean and 90% quartiles for vertical merge DT creation. Transaction Mix 1, Scenario 1.
(b) Response times for three variations of table sizes in source table 2. Transaction Mix 1, Scenario 1.]
9.3.2 Log Propagation - Vertical Merge
The vertical merge experiments are performed using scenario 1 and 2 of
Transaction Mix 1, shown in Table 9.2. This is the same mix as was used in
the diff/int experiments.
Consider Figure 9.5(a), showing the mean response time for scenario 1.
The graph shows the same rapid response time tendency as the diff/int experiments did. The response time distributions also have similar shapes as
those shown in the histograms in Figure 9.2. Hence, most requests are answered quickly, and the amount of requests with very long response times
increases with the workload. Although histograms are not shown here, this
distribution is indicated by the mean response times in Figure 9.5(a) being
much closer to the lower 90% quartile than the upper.
The left plot in Figure 9.5 shows the response time results for scenario 1
with 20,000 records in both source tables. We consider it likely that vertical
merge in many cases will be performed on tables with an uneven number of records, however. For example, if the “employee” and “postal address” source tables are merged (this example has been used for vertical merge throughout the thesis; refer to Section 6.5 for illustrations), it is likely that at least some employees share zip codes.
[Figure 9.6: Comparison of response times and throughput for scenario 1 of
difference and intersection and vertical merge. Plots omitted:
(a) Comparison of response times.
(b) Comparison of throughput.]
Hence, Figure 9.5(b) illustrates how variations in the number of source table
records affect response time degradation. The red line, called 1x, is the plot
for 20,000 records in both source tables. This is the same plot as shown
in the left figure. The blue 5x line shows test results with 4,000 records in
source table 2, while the green line is for 1,300 records.
It is worth noticing how a reduction in the number of source records incurs
higher degradation. The reason for this is that the records in source table 1
of this experiment always have a join match in source table 2. Hence, as the
number of records in source table 2 decreases, the number of join matches
for each of these increases. This means that a modification to a record in
source table 2 must be propagated to an average of 1, 5 and 15 records in the
three cases, respectively. This increases the work that must be done by log
propagation, which in turn results in an increased priority for the DT creation
thread. As previously discussed, a higher priority on the DT creation thread
incurs higher performance degradation for concurrent transactions.
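For reference, these average fan-outs follow directly from the table sizes used in the three variations (20,000 records in source table 1, and 20,000, 4,000 or 1,300 records in source table 2):

    20,000 / 20,000 = 1        20,000 / 4,000 = 5        20,000 / 1,300 ≈ 15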
As is clear from the above discussion, the response time of vertical merge
has a similar behavior to that of diff/int. This does not mean that the performance results are equal, however. In Figure 9.6, the results from the diff/int
experiments are shown together with those from vertical merge with 20,000
records in both source tables. The plots clearly show that the former experiment degrades performance to a much greater extent than vertical merge.
To understand why, we have to investigate the amount of work performed
by the two log propagators. In the vertical merge case, each source record
modification is applied to one derived record on average. Even though more
than one record may be affected by modifications in source table 2, affected
records are always found by exactly one lookup in one DT. This is not the
case for the diff/int method, in which all source record modifications involve
lookup of records in two or even three derived tables. Furthermore, source
record modifications may require a derived record to move from the intersection DT to the difference DT or vice versa. Since the diff/int log propagator
has to perform more work for each log record, and since the same number of
log records are generated in the two experiments, the priority of the diff/int
DT creation thread must be higher than that of vertical merge. Hence the
higher diff/int degradation to both response time and throughput.
9.3.3 Low Performance Degradation or Short Execution Time?
All performance experiments described for log propagation have been performed with the lowest achievable performance degradation in mind. As
described in Section 9.3, we define this as the degradation incurred when
the log propagator is only capable of applying as many log records as are
produced. However, by using this DT creation thread priority, the states of
the DTs do not get closer to that of the source tables. Thus, log propagation
will never finish.
If the priority of the DT creation thread is increased from this point, it
gets more CPU time. Hence, it is capable of reducing the number of log
records that separate the states of the DTs and source tables. At the same
time, however, the performance degradation is increased.
Figure 9.7 illustrates the effect of changing the priority in diff/int DT
creation, running 30 iterations with Transaction Mix 1 scenario 1 at 50%
workload with 50,000 records in each source table. Starting at the leftmost
side of the plot where the DT creation is run at maximum priority, it is
clear that a slight decrease in priority results in much less degradation to
response time at the cost of little additional execution time. As the priority
gets lower, however, less and less reduction in degradation is observed. It is
up to the database administrator (DBA) to decide if short execution time or
low performance degradation is more important. Hence, we will not discuss
this further.
[Figure 9.7: Total time for log propagation with varying DT creation thread
priorities. Diff/int DT creation method, running Transaction Mix 1 scenario
1 with 50% workload. Plot omitted: performance degradation (%) of response
time and throughput versus completion time (s).]
9.3.4 Other Steps of DT Creation
So far, only performance degradation during the log propagation step has
been discussed. In this section, the impact of the other steps is discussed.
Unless otherwise noted, difference and intersection DT creation experiments,
running Transaction Mix 1 scenario 1, are discussed. However,
the discussion applies to all the DT creation methods.
Preparation and Initial Population
During the preparation step, derived tables and indices are added to the
schema. This only involves modifications to the database schema; no records
or locks are involved. The performance implication of this step is negligible
since it completes very fast. An inspection of 100 performance report files
from all DT creation methods (randomly picked from the 2,000 report files from the previous section) showed that the longest execution time of
this step was 36 ms whereas the shortest was 17 ms.
The performance impact of initial population is a completely different
matter, even though the priority of the DT creation thread can easily be lowered to the point where no performance degradation is observed at all. The
problem with such low priorities is that the step takes a long time to complete.
Since the log propagation step executed next has to apply the log records
Workload | Initial Population | Log Propagation | % Difference
50 %     | 0.758 ms           | 0.750 ms        | 1.1%
70 %     | 0.764 ms           | 0.778 ms        | -1.8%
90 %     | 0.865 ms           | 0.872 ms        | -0.8%

Table 9.7: Average response times during the initial population and log
propagation steps of vertical merge when both steps use the same priority.

Priority | Response time degradation | Aborted transactions | DT Creation Time
1/5 Max  | 0.5%                      | 0.8%                 | ∞
2/5 Max  | 0.8%                      | 1.0%                 | 22.4 s
3/5 Max  | 3.8%                      | 2.8%                 | 14.9 s
4/5 Max  | 15.0%                     | 7.1%                 | 10.4 s
Max      | 29.1%                     | 15.4%                | 7.3 s
Blocking | -                         | 100%                 | 3.6 s

Table 9.8: Effects on performance for different priorities of the DT creation
thread during the initial population and log propagation steps.
generated during initial population, the execution time of this step highly
affects the total DT creation execution time.
Initial population has no “minimal” priority similar to what we used
for log propagation, i.e., the priority at which the same number of log records are propagated as are produced within a time interval. Hence, we consider
two alternatives to the very low priority described above. The first is to
use the same priority as the log propagation step. To get an indication on
the performance implications of this alternative, 30 test iterations have been
performed with workloads of 50, 70 and 90% during diff/int. Not surprisingly,
the tests show that the two steps degrade performance to the same extent
when the priorities are equal. The results are shown in Table 9.7.
The second alternative is to use a very high priority on the initial population step. Intuitively, this results in higher performance degradation for
a shorter time interval. The extreme version of this is the insert into select method used in many existing DBMSs (Løland, 2003): it involves read-locking the source tables for the duration of the entire initial population step.
Log propagation and synchronization is not needed in this case.
Table 9.8 shows the results from the same 30 test iterations described
in Section 9.3.3 and illustrated in Figure 9.7. The priority of DT creation
has been varied, but initial population and log propagation have had the
same priority in all cases. The table clearly illustrates how the performance
degradation decreases as the DT creation time increases. The “blocking”
line represents the “insert into select” method. As argued in Section 9.3.3,
the choice of which priority is “best” is left to the DBA.
The performance degradation during the synchronization step is very
small when the DTs are not used for schema transformations. 30 test runs
have been performed on diff/int at 50 and 80% workload. Synchronization
was started automatically when 10 or fewer log records remained to be redone.
The experiments showed that the source table latches were held for a duration of 1-2.5 ms while these log records were applied to the DTs. The test
report files indicate a slightly higher average of failed transactions due to
unacceptably high response times in the second immediately following synchronization; 2.4% and 1.7% for the 50% and 80% workload cases, respectively.
However, these results vary much between the report files due to the short
time interval and hence few available response time samples. The expected
number of failed transactions for this step relies heavily on the remaining
number of log records when the source table latches are set and the response
time considered acceptable. Intuitively, if the acceptable response time is much greater than the latch time, few transactions will fail.
9.3.5 Performance Experiment Summary
In the previous sections, we have discussed the results from extensive testing
to find the performance degradation incurred by the DT creation methods.
Log propagation has been discussed in most detail since this step typically
runs for much longer time intervals than the other steps. In the described experiments, ∼75% of the DT creation time has been used by log propagation,
∼25% by initial population, and less than 1% by preparation and synchronization
combined.
The experiments have shown that the incurred performance degradation
relies heavily on the workload. At low to medium (∼70%) workloads, DT creations running scenario 1 of the transaction mixes can be performed almost
with no degradation for concurrent transaction. If the workload is increased
beyond this point, the performance quickly becomes unbearable because
response times increase very quickly, and eventually result in throughput
thrashing.
In addition to workload, DT creation thread priority also affects performance degradation to a huge extent. While a low priority DT creation
thread results in little performance degradation, it also incurs long execution
time. Any increase to the priority will decrease the time to complete but also
[Figure 9.8: Summary of average response time for all DT creation methods.
Plot omitted: average response time (ms) versus workload (%) for difference
and intersection, vertical split, vertical merge, horizontal merge and
horizontal split.]
increase the performance degradation.
The type of work performed by the transactions running on the server
also affects the degradation. If the transactions perform few write operations
on records in the source tables, the degradation is smaller than if many
write operations are performed. Alternatively, the degradation may be held
constant while the execution time is varied.
Finally, the different DT creation methods incur different degradations.
The reason for this is that they must perform different amounts of work
to propagate each log record. For example, log propagation of an update
operation incurs an update of only one record in a DT when horizontal split is
used. In difference and intersection, however, the same update would require
a lookup for equal records in one DT, and an update in one or even two
records in the other DTs. As shown in Figure 9.8, difference and intersection
incurs the most degradation while horizontal split incurs the least. Vertical merge,
vertical split and horizontal merge are between these.
9.4 Discussion
In this chapter, we have discussed empirical validation and performance experiments performed on the Non-blocking DBMS prototype. The empirical
validation experiments gave predictable and correct output, which strongly
indicates that the DT creation methods work correctly.
The performance experiments showed that close to 100% of the total DT
creation time was used by the log propagation and initial population steps.
Under moderate workloads, DT creation can be performed with almost no
performance degradation for concurrent transactions. However, this requires
a very low priority on the DT creation process, which in turn increases the
total execution time significantly. A consequence of this is that the DBA
has to make a decision on whether to perform the DT creation quickly with
much degradation, slowly with little degradation or something in between.
When to Use the DT Creation Method
It is clear that the execution time decreases and the performance degradation
increases as the priority of the DT creation process is increased. At extremely
high priorities, the DT creation method behaves almost like the insert into
select method used in current DBMSs (Løland, 2003). When fast completion
time is more important than low performance degradation, the existing insert
into select method or Ronström’s method should be used. Since performance
experiments have not been published on Ronström’s method, it is uncertain
which of these would be preferred under which circumstances. In cases where
it is advisable to trade longer execution time for lower performance degradation, our DT creation method should be used instead. Using our method
provides flexibility since the priority may be increased or decreased as the
DBA sees fit. Note, however, that the insert into select method allows combinations of relational operators, including aggregates. These combinations
are not yet supported in our DT creation method.
We expect our method to outperform Ronström’s method when it comes
to performance degradation. The reason for this is that Ronström’s method
forwards source record modifications by using triggers executed within the
original transaction (Ronström, 1998). A similar use of triggers is explicitly
discouraged for MV maintenance (Colby et al., 1996; Kawaguchi et al., 1997).
If disk space is a major issue, Ronström’s method may still be preferred
for vertical merge and split schema transformations, however. The reason
for this is that Ronström performs vertical merge transformations by adding
attributes to an existing table, and vertical split by adding only one of the
new tables. In contrast, our method makes full copies of the source tables in
both cases.
Chapter 10
Discussion
This chapter contains a discussion of the work presented in this thesis. We
start by discussing our research contributions with respect to how they meet
the requirements stated in Chapter 1, and how they compare to related work.
We then briefly summarize the research question, and discuss to what extent
it has been answered.
10.1 Contributions
The work presented in this thesis is based on the argument that database
operations which are blocking or incur high degrees of performance degradation are not suited for database systems with high availability requirements.
With this in mind, we decided to focus on a solution for two operations,
database schema transformation and materialized view creation. The current solutions for both these operations degrade performance of concurrent
transactions significantly.
In this section, the contributions of our DT creation methods are discussed. To summarize, the main contributions of the thesis are:
• A framework based on existing DBMS technology, that can be used to
create DTs without blocking effects and with little performance degradation to concurrent transactions.
• Methods to create DTs using six relational operators.
• Strategies for how to use the DTs for Materialized View creation and
schema transformation purposes.
• Solutions to common DT creation problems, which significantly simplify the design of other DT creation methods.
• Empirical validation of all presented DT creation methods.
• Thorough performance experiments on all DT creation methods.
In particular, we consider how the solution meets the requirements stated
in Chapter 1, and how it compares to related work.
10.1.1 A General DT Creation Framework
The General DT Creation Framework presented in Chapter 4 is an abstract
framework. It is based on the idea of running DT creation as a non-blocking,
low priority background process to incur minimal performance degradation
for concurrent transactions.
Although we have focused on centralized DBMSs in this thesis, we are confident that the framework can be used to create DTs in distributed database
systems as well. In particular, the framework should easily integrate into
distributed DBMSs where recoverability is achieved by sending logical log
records to other nodes. In the ClustRa DBMS, e.g., an ARIES-like recovery
strategy is enforced by shipping logical log records between nodes (Hvasshovd
et al., 1995). Furthermore, ClustRa uses logical record identifiers and logical record state identifiers (Hvasshovd et al., 1995; Bratsberg et al., 1997b).
Hence, this solution conforms to all the technological requirements that the open source DBMSs evaluated in Chapter 7 did not conform to. This application of the framework is purely theoretical, however.
As described in Chapter 6, the framework can be used when creating DTs
using six relational operators (the full outer join, projection, union with both duplicate inclusion and removal, selection, difference and intersection). However, the framework is expressive enough
to be useful for DT creation using other relational operators as well. Jonasson
uses the framework for DT creation involving aggregates (Jonasson, 2006).
An auxiliary table is used to compute the aggregate values. The solution
has not been implemented, however, and the performance implications are
therefore uncertain.
10.1.2 DT Creation for Many Relational Operators
As discussed in Chapter 1, Materialized Views and schema transformations
are defined by a query, and are therefore created using relational operators.
Relational operators can be categorized in two groups: non-aggregate and
aggregate operators. The non-aggregate operators are join, projection, union,
selection, difference and intersection. Aggregate operators are mathematical
functions that apply to collections of records (Elmasri and Navathe, 2004).
In Chapter 1, we decided to use non-aggregate operators as a basis for DT
creation, allowing us to use optimized algorithms already available in current
DBMSs (Houston and Newton, 2002; IBM Information Center, 2006; Lorentz
and Gregoire, 2002; Microsoft TechNet, 2006). We chose to focus on nonaggregate operators because these are useful for both schema transformations
and materialized views. An alternative would be to focus on aggregate operators. These are frequently used in materialized views, but are not often
used in schema transformations. Furthermore, materialized views defined
over aggregate functions often include non-aggregate operators as well (Alur
et al., 2002).
As described in Chapter 6, our DT creation method can be used to create
DTs using the full outer join, projection, union, selection, difference and
intersection relational operators. This means that the method can be used
to create a broad range of DTs. Only the first four operators can be used in
Ronström’s method (Ronström, 2000).
10.1.3 Support for both Schema Transformations and Materialized Views
In Chapter 1, we realized that both schema transformations and materialized
view (MV) creation were blocking operations that could be seen as applications of derived tables. Hence, we decided to focus both on how DTs should
be created to be usable in highly available database systems, and how these
DTs could be used for the two operations.
In this thesis, we have shown how the DT creation method can be used for
both operations. The solution for MV creation proved to be a straightforward
application of DTs; they can be used for this purpose without modification.
The solution for schema transformations is more complex, especially for
operators where a record in one schema version may contribute to or be
derived from multiple records in the other schema version (this applies to all DT creation methods except horizontal merge with duplicate inclusion, horizontal split, and difference and intersection). As argued in
Chapter 6, this may cause high degrees of locking conflicts between concurrent transactions in the different schema versions. If this becomes a big
problem, we know of no other alternative than to use the non-blocking abort
synchronization strategy, thus resolving the problem by aborting transactions
in the old schema version.
In contrast to our method and the “insert into select” method, the method
suggested by Ronström can only be used for schema transformations (Ronström, 2000).
10.1.4 Solutions to Common DT Creation Problems
In Chapter 5, we presented five problems that are frequently encountered
in the DT creation methods. To summarize, these problems were: Missing
Record and State Identification, Missing Record Pre-States, Lock Forwarding
During Transformations and Inconsistent Source Records. We also described
how these problems could be solved in general.
The solutions to these five problems are contributions by themselves, as
they significantly ease the design of new DT creation methods. For example,
a method for DT creation using aggregate operators described by Jonasson
uses the suggested solutions for the missing record and state identification
problems and the missing pre-state problem (Jonasson, 2006).
10.1.5 Implemented and Empirically Validated
The DT creation methods for the six relational operators have been implemented in a prototype DBMS. In Chapter 9, we discussed thorough empirical
validation experiments that were executed on all the methods in the prototype. All the experiment results showed correct execution, thus strongly
indicating that the methods are correct. No matter how strong the indications are, however, empirical validation experiments can never prove correctness with absolute certainty (Tichy, 1998). Even so, empirical validation is considered vital in the software engineering community (Tichy, 1998;
Pfleeger, 1999; Zelkowitz and Wallace, 1998). If confirmation of the results is
required, empirical validation should be executed on another implementation
of the same methods (Walker et al., 2003). Due to time considerations, this
has not been done.
10.1.6 Low Degree of Performance Degradation
The rationale for the research question was to develop DT creation methods
that can be used in highly available database systems. Hence, a crucial goal
was to incur as little performance degradation to concurrent transactions as
possible.
The DT creation methods have been implemented in a prototype DBMS,
and the performance implications were thoroughly discussed in Chapter 9.
The experiments showed that the degree of performance degradation depends
heavily on four parameters: the workload, the transaction mix, the priority of
the DT creation thread compared to other threads and the relational operator
used to create the DTs.
Workload
The incurred degradation is highly affected by the workload on the database
server. The performance experiments with 2,000 test runs on the six DT
creation methods showed that the response time increases rapidly as the
workload increases. This comes as no surprise since the delay over a bottleneck resource is given by (Highleyman, 1989):
Delay = T / (1 − L)    (10.1)
Here, T is the average service time over the bottleneck resource, and L is the
load. Hence, as the workload approaches 100%, the response time increases
towards infinity. Also, since 10 ms was defined as the highest acceptable
response time for transactional requests, we observe throughput thrashing
when the response time gets too high. Based on these observations, we
strongly suggest that DT creation be performed when the workload is moderate or low. In the experiments, the rapid increase in response time started
at approximately 75% workload. Different systems may observe this rapid
increase in response time at other workloads, depending on which resource
is the bottleneck.
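As a purely illustrative calculation (the service time T = 1 ms is assumed here and is not taken from the experiments), Equation 10.1 gives a delay of 1/(1 − 0.50) = 2 ms at 50% load, 1/(1 − 0.75) = 4 ms at 75% load, 1/(1 − 0.90) = 10 ms at 90% load and 1/(1 − 0.95) = 20 ms at 95% load. This illustrates both why the response time curve has a knee around moderate workloads and why a 10 ms response time limit is quickly exceeded as the load approaches saturation.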
Priority of the DT Creation Thread
The priority of the DT creation thread affects both the performance degradation and the execution time to a great extent. As was clear from Section
9.3.3, a high priority results in quick execution with high degradation. On
the other hand, a low priority results in low degradation over a longer time
interval. We consider it the responsibility of the DBA to determine which
priority setting to use.
Transaction Mix
The transaction mix (i.e., the type of work performed by the transactions running on the server) of the workload on the system plays a significant role for the amount of performance degradation. The reason for this is that only write operations to records in the source tables must be propagated to the DTs. Hence, a transaction mix that is read intensive (or write intensive on records in non-source tables) produces fewer relevant log records, i.e., log records that must be propagated to the DTs, than a transaction mix that is write intensive on source table records.
For equal workloads, the log propagator has less work to do per time
unit if few relevant log records are produced than if many are produced.
Hence, DT creation during the former transaction mix can either be processed
quicker or with lower performance degradation than DT creation during the
latter transaction mix.
Relational Operator
The final identified parameter that affects performance to a great extent is
the relational operator used to create the DTs. The reason for the variations is that different amounts of work must be performed by the different
operators when a logged operation is applied to the DTs. In Section 9.3,
we showed that DT creation using difference and intersection (diff/int) incurs most degradation, while horizontal split incurs the least. Hence, under
equal workloads, horizontal split can either be performed quicker or with less
performance degradation than diff/int.
A Comparison with Related Work
Basing the framework on a non-blocking, low priority background process
is significantly different from the two alternative strategies for materialized
view creation and schema transformations. In the schema transformation
method presented by Ronström, a background process is used for copying
records from the old to the new schema versions (Ronström, 2000). Triggers
executed within normal transactions are then used to keep the copied records
up to date with the original records. As discussed in Section 3.1.3, these triggers impose considerable degradation. A similar trigger strategy, called immediate update in the MV maintenance literature, has previously been suggested for maintenance of MVs (Gupta et al., 1993; Griffin et al., 1997), but is explicitly discouraged due to the high performance cost (Colby et al., 1996; Kawaguchi et al., 1997).
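To make the contrast concrete, the immediate, trigger-based propagation could look roughly like the following sketch. The table names, column names and trigger syntax are hypothetical (trigger dialects vary between DBMSs); the essential point is that the extra insert runs inside the user transaction:

-- Hypothetical sketch of trigger-based (immediate update) propagation.
-- Every insert into the source table is mirrored in the copy within the
-- same user transaction, which is what causes the extra overhead.
CREATE TRIGGER PropagateInsert
AFTER INSERT ON SourceTable
FOR EACH ROW
  INSERT INTO CopyTable(Id, Value) VALUES (NEW.Id, NEW.Value);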
Although performance experiments have not been published on Ronström’s schema transformation method, we expect our DT creation method
to incur less degradation. The reason for this is that our framework forwards
modifications to the DTs using a low priority background process, as opposed
to Ronström’s method of using triggers executed within each user transaction (Ronström, 2000). On the other hand, Ronström’s method is likely to
complete in shorter time than our method.
An even more drastic solution is currently used for DT creation in existing
DBMSs (Løland, 2003). In the insert into select method, the source tables
are locked and read before the content is inserted into the DTs. This method
is simple, but all write operations on records in the source tables are blocked
during the process.
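For reference, the insert into select approach amounts to something like the following generic statements (table and column names are hypothetical, and the exact syntax and locking behaviour depend on the DBMS):

-- Blocking creation of a derived table with a full outer join.
-- The source tables are read under locks, so concurrent writes to
-- Emp and Dept are blocked until the insert completes.
CREATE TABLE EmpDept (EmpID INTEGER, Name VARCHAR(50), DeptName VARCHAR(50));
INSERT INTO EmpDept (EmpID, Name, DeptName)
SELECT e.EmpID, e.Name, d.DeptName
FROM Emp e FULL OUTER JOIN Dept d ON e.DeptID = d.DeptID;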
The insert into select method and Ronström’s method are better than
our DT creation method only in cases where fast completion time is much
more important than low performance degradation. With our DT creation
method, however, longer execution time can be traded for lower performance
degradation. This can be done to a smaller or larger extent to fit different
scenarios. Hence, in all cases where performance degradation has a high
priority, our method outperforms both.
10.1.7 Based on Existing DBMS Functionality
Already in the initial requirement specification, it was clear that the solution
should be based on functionality found in existing DBMSs whenever possible.
By using existing functionality, the method should be easy to integrate into
existing systems. Hence, literature on DBMS internals and related work has been studied carefully. The most relevant parts of this study were
presented in Chapters 2 and 3.
Our DT creation method uses standard DBMS functionality to a great
extent. For example, the widely accepted ARIES (Mohan et al., 1992) protocol is used for recovery, Log Sequence Numbers (Elmasri and Navathe,
2004) are used to achieve idempotence and algorithms for relational operators (Garcia-Molina et al., 2002) available in all modern relational DBMSs
are used for initial population of the DTs.
On the other hand, our solution also requires functionality that is thoroughly discussed in the literature but is not common in existing DBMSs.
Most importantly, this includes logical redo logging, record state identifiers
(Hvasshovd, 1999) and logical record identification (Gray and Reuter, 1993).
These principles were described in Chapter 2, and are required because the
records are physically reorganized when relational operators are applied.
As was evident from Chapter 7, the use of nonstandard functionality makes the integration into existing DBMSs more complex than it would otherwise be. In Section 2.4, it was argued that the logical record identifiers
can be replaced by physical identifiers if a mapping between the source and
derived addresses is maintained. We have not found any solution to remove
the logical redo log and record state identification requirements.
Hence, with few exceptions, the method is based on functionality common
in current DBMSs. A description of existing functionality can be found in
Chapter 2.
10.1.8 Other Considerations - Total Amount of Data
The DT creation framework copies records from source tables to derived
tables. Thus, storage space is required for two copies of all records in the
source tables during DT creation.
When the DTs are used as materialized views, the additional data will
persist after DT creation has completed, and is therefore not considered a
waste of storage space. When used for schema transformations, on the other
hand, the source tables are removed once the transformation is complete. Hence, the additional storage space required may be considered wasted.
Since the source tables may contain huge amounts of data, the added
storage usage in schema transformations may be problematic. This is also
a problem in the two alternative solutions for schema transformations: the
insert into select (Løland, 2003; MySQL AB, 2006) method and Ronström’s
schema transformations (Ronström, 2000). In the former method, the source
tables are locked while the records are read, transformed by applying the
relational operator and inserted into the new tables. Thus, this method
requires the same amount of storage space as our method.
As thoroughly described in Section 3.1, Ronström’s method requires less
storage space during vertical merge and split transformations. The reason is
that these transformations work “in-place”, i.e., attributes are added to or
removed from already existing records. Horizontal merge and split, on the other hand, require the same amount of storage space as our method.
We have no solution to this problem other than increasing the storage
capacity of the database server if required. This was also suggested by Ronström to solve the same problem (Ronström, 1998).
10.2 Answering the Research Question
In this section, we discuss how and to what extent our research has answered
the research question:
How can we create derived tables and use these for schema transformation and materialized view creation purposes while incurring minimal performance degradation to transactions operating
concurrently on the involved source tables?
In Chapter 1, we decided to refine the research question into four key challenges. In the following sections, the results of the research are discussed
with respect to these challenges and the research question.
Q1: Current Situation
What is the current status of related research designed to address
the main research question or part of it?
The current status of related research was presented in a survey in Chapter
3. From this review, we have identified the main limitations of existing solutions. The limitations are mainly associated with unacceptable performance
degradation.
Q2: System Requirements
What DBMS functionality is required for non-blocking DT creation to work?
Our DT Creation method is inspired by Fuzzy Copying (Hagmann, 1986;
Gray and Reuter, 1993; Bratsberg et al., 1997a), and is based on making an
inconsistent copy of the involved tables. The copies are then made consistent by applying logged operations. The requirements of this strategy are
described in Chapters 2 and 4. Most of these are related to the reorganized
structure of records after applying relational operators.
Q3: Approach and Solutions
How can derived tables be created with minimal performance degradation, and be used for schema transformation and MV creation
purposes?
• How can we create derived tables using the chosen six relational operators?
• What is required for the DTs to be used a) as materialized
views? b) for schema transformations?
• To what extent can the solution be based on standard DBMS functionality and thereby be easy to integrate into existing DBMSs?
Our solution to creating derived tables was presented in Chapters 4 and
6. The method enables DT creation using the six relational operators, and
the DTs can be used for both MVs and schema transformations. Thus, we have answered the first two parts of the question.
The method is based on standard, existing functionality whenever we
have found it possible to do so. However, we also require some functionality
that is not commonly used in current DBMSs. The implications of this were
discussed in detail in Section 10.1.7.
Q4: Performance
Is the performance of the solution satisfactory?
• How much does the proposed solution degrade performance
for user transactions operating concurrently?
• With the inevitable performance degradation in mind, under
which circumstances is the proposed solution better than a)
other solutions? b) performing the schema transformation
or MV creation in the traditional, blocking way?
The performance implications of executing the DT creation method were thoroughly discussed in Chapter 9. We found that the method incurs little
performance degradation when the workload is not too high. However, there
are circumstances when DT creation incurs high performance degradation.
Hence, the database administrator has to consider three parameters before
starting the operation: the workload on the database server, the DT creation
priority and the operator used for DT creation. This was discussed in detail
in Sections 9.3 and 10.1.6.
10.2.1 Summary
We have answered the main research question by developing the Non-blocking
DT Creation method. To summarize, the work included: deciding on a research approach based on the design paradigm (Denning et al., 1989), a
thorough study of related work and usable functionality in existing DBMSs,
development of a general DT creation framework, identification of and solutions to common DT creation problems, specialized methods for the six
relational operators and a prototype design and implementation used in experiments.
The main research question and all the refined research questions have
been answered.
Chapter 11
Conclusion and Future Work
This chapter summarizes the main contributions and suggests several directions for future research. Finally, publications resulting from the research
are briefly described.
11.1 Research Contributions
The major research contributions of this thesis are:
An easily extendable framework for derived table creation. A framework that can be used in the general case to create derived tables (DTs) is
presented. It is designed to degrade performance of concurrent transactions
as little as possible. In this thesis, the framework is used by six relational
operators in a centralized database system setting. It is, however, extendable in multiple ways. Examples include adding aggregate operators or performing DT creation in a distributed database system setting.
Methods for creating derived tables using six relational operators.
By using the general DT creation framework, we present non-blocking DT
creation solutions for six relational operators: vertical merge and split (full
outer join and its inverse), horizontal merge and split (union and its inverse),
difference and intersection. Together, these methods represent a powerful
basis for DT creation.
Means to use the derived tables for schema transformation and materialized view creation purposes. Schema transformations and materialized view (MV) creation are two database operations that must be performed in a blocking way in current DBMSs. By using the DT creation framework to perform these operations, we take advantage of the non-blocking and
low performance degradation capabilities.
Design and implementation of a prototype capable of non-blocking
derived table creation. The DT creation methods for each of the six
relational operators have been implemented in a DBMS prototype. Extensive
experiments on this prototype have been used to empirically validate the
methods.
Thorough performance experiments for DT creation using all six
relational operators. Thorough performance experiments have been performed on the six DT creation methods in the prototype. The experiments
show that the performance degradation for concurrent transactions can be
made very low. They also indicate under which circumstances DT creation
should be avoided. This is primarily during high workload.
11.2 Future Work
The following topics are identified as possible directions for further research.
DT Creation in Distributed Database Systems
The primary focus in this thesis has been DT creation in centralized database
systems. However, we believe that the general framework can be used for
DT creation in distributed database systems as well. In particular, distributed
systems based on log shipping, i.e., systems where the transaction log is sent
to another node instead of written to disk, seem to be a good starting place
for this research.
DT Creation using Aggregate Operators
Some work has already been conducted on DT creation involving aggregate operators (Jonasson, 2006), but the research is far from complete. It would be very interesting to implement these methods in the Non-blocking
Database. Experiments should then be executed to empirically validate the
methods and to indicate how much performance degradation they incur.
Implementation in a Modern DBMS
Because the DT creation methods require functionality that was not found in
any of the DBMSs investigated in Chapter 7, we chose to perform experiments
on a prototype DBMS. Ideally, DT creation should not require any nonstandard functionality. For example, we believe that block state identifiers
can be allowed in the source tables as long as the derived tables have record
state identifiers.
Implementing the DT creation functionality in an existing DBMS could
provide additional insight. For example, the simplifications made to the prototype would be removed, and the methods could be subject to full-scale
benchmarks to get better indications on performance implications. Furthermore, a second implementation could be subject to more empirical validation
experiments, thereby helping to confirm the correctness of the methods (Walker
et al., 2003).
Hence, both further research into whether more standard DBMS functionality could be used and an implementation in an existing DBMS would be of interest to the author.
Synchronizing Schema Transformations with Application Modifications
When schema transformations have been discussed in this thesis, we have
been concerned with incurring low performance degradation and performing
the switch between schemas as fast as possible so that latches are only held
for a millisecond or two.
The transactions executed in the database system are, however, typically
requested by applications. When the schema is modified, the applications
should also be modified to reflect the new schema. It would be interesting to
investigate how changes to an application can be synchronized with schema
transformations.
As an initial approach, we would start by creating views reflecting the
old schema when a schema transformation is committed. By doing so, the
application can be changed after the schema transformation has completed.
Of course, this strategy requires that all old data is intact in the new schema.
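As a sketch of this idea, assume a Student table has been vertically split into Student, StudentCourse and Course (similar to the example in Figure 11.1). A view re-exposing the old, unsplit schema could then be defined roughly as follows; the view name and the exact attribute placement are assumptions made for illustration:

-- Hypothetical compatibility view presenting the old schema on top of
-- the new, split tables, so applications can be migrated afterwards.
CREATE VIEW OldStudent AS
SELECT s.StudID, s.Name, sc.CourseID, c.CourseName, sc.Grade
FROM Student s
JOIN StudentCourse sc ON s.StudID = sc.StudID
JOIN Course c ON sc.CourseID = c.CourseID;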
Combining Multiple Operators
In this thesis, we have designed methods to create derived tables using any of
six relational operators. What we have not considered, however, is that materialized views and schema transformations may in some cases require multiple operators to obtain the desired result. A materialized view in a data warehouse may, e.g., be constructed by a join between four tables and an aggregate operator. For schema transformation purposes, the effect of multiple operators
can often be achieved by performing the required transformations in serial; an example is illustrated in Figure 11.1. This serial execution cannot be used for DT creation in general. Hence, it would be interesting to research if and how the DT creation method can be used when multiple operators are involved.

[Figure 11.1 shows a Student table with the attributes StudID, Name, CourseID, CourseName and Grade being transformed in two steps into the tables Student(StudID, Name), StudentCourse(StudID, CourseID, Grade) and Course(CourseID, CourseName).]
Figure 11.1: Example of Schema Transformation performed in two steps.
Figure 11.2: Example interface for dynamic priorities for DT creation.
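As a rough illustration of such serial execution (not of the non-blocking DT creation method itself), the two steps of Figure 11.1 could be expressed with ordinary, blocking SQL statements along the following lines. The table and column names are taken from the figure, and the statements that remove the moved attributes from the original tables are only indicated as comments:

-- Step 1: vertical split of the original Student table.
CREATE TABLE StudentCourse AS
SELECT StudID, CourseID, CourseName, Grade FROM Student;
-- (CourseID, CourseName and Grade are then dropped from Student.)

-- Step 2: vertical split of StudentCourse into StudentCourse and Course.
CREATE TABLE Course AS
SELECT DISTINCT CourseID, CourseName FROM StudentCourse;
-- (CourseName is then dropped from StudentCourse.)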
Dynamic Priorities for the DT Creation Process
The performance experiments showed that the priority setting of the DT creation process can be used to make the operation complete fast but with high
performance degradation, or complete in longer time with less degradation.
In the current implementation, the priority is set once and for all when DT
creation is started. However, we see no reason why this priority should not
be dynamic. Figure 11.2 shows an example of what a graphical user interface for dynamic priorities could look like.
11.3 Publications
Some of the research presented in this thesis has already been presented at
several conferences. The papers, presented in chronological order, are:
1. Jørgen Løland and Svein-Olaf Hvasshovd (2006) Online, non-blocking
relational schema changes. In Advances in Database Technology –
EDBT 2006, volume 3896 of Lecture Notes in Computer Science, pages
405–422. Springer-Verlag.
This paper describes the first strategy used to perform schema transformations with the vertical merge and split operators. Records in the
transformed schema are identified using their primary keys, whereas
the method in this thesis identifies records using non-physical Record IDs.
Compared to using non-physical Record IDs for identification, the primary key solution of this paper requires more complex transformation mechanisms. On the other hand, the method is not restricted to
DBMSs that identify records in one particular way.
2. Jørgen Løland and Svein-Olaf Hvasshovd (2006) Non-blocking materialized view creation and transformation of schemas. In
Advances in Databases and Information Systems - Proceedings of ADBIS 2006, volume 4152 of Lecture Notes in Computer Science, pages
96–107. Springer-Verlag.
In this paper, the general framework for derived table creation used
in this thesis is presented. DT creation methods for all six relational
operators in this thesis are described, and the idea of using DTs for
either schema transformations or materialized views is introduced.
As opposed to the schema transformations described in “Online, non-blocking relational schema changes”, the methods in this paper use
Record IDs for identification.
3. Jørgen Løland and Svein-Olaf Hvasshovd (2006) Non-blocking Creation of Derived Tables. In Proceedings of Norsk Informatikkonferanse 2006. Tapir Forlag.
The generalized DT creation problems formalized in Chapter 5 were
first described in this paper. By using these generalized problems, the
DT creation methods can be described in a more structured way.
Part IV
Appendix
Appendix A
Non-blocking Database: SQL Syntax
The SQL recognized by the Non-blocking Database prototype is by no means
the complete SQL language. A small subset of the SQL standard has been
selected for implementation with the goal of providing enough flexibility in
testing while being feasible to implement.
In the language definitions below, <...> means that it is a variable name, [...] means optional, and {...} is used to group alternatives separated by |. The following statements are recognized:
create table <tablename>(<colname> <type> [<constraint>], ...);
drop table <tablename>;
delete from <tablename> where <pk_col>=<value>;
insert into <tablename>(<col1>, <col2>,...,<colX>)
values(<value1>, <value2>,...,<valueX>);
update <tablename> set <col1>=<value1>, <col2>=<value2>...
where <pk_col>=<value_pk>;
select {<col1>, <col2>...|*}
from <tablename>
[where <col>=<value>]
[order by <col>];
select {<col1>, <col2>...|*}
from (<table1> join <table2>
on <ja_col1>=<ja_col2>)
[where <colX>=<value>]
[order by <colY>];
select {<col1>, <col2>...|*}
from <table1>
union
select {<col1>, <col2>...|*}
from <table2>
select {<col1>, <col2>...|*}
from <table1>
{difference|intersection}
select {<col1>, <col2>...|*}
from <table2>
<colX> = the name of attribute X
<valueX> = the value of attribute X
<constraint> = primary key
<ja_colX>= the join attribute column name of table X
<type> = Integer|String|Boolean|autoincrement
<pk_col> = column name of primary key attribute
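For illustration, the following hypothetical statements conform to the subset described above. The quoting of String literals with single quotes is an assumption, since the grammar does not specify literal syntax:

-- Illustrative statements using the recognized SQL subset.
create table Student(StudID Integer primary key, Name String);
create table Enrollment(EnrID Integer primary key, StudID Integer, Grade String);
insert into Student(StudID, Name) values(1, 'Alice');
insert into Enrollment(EnrID, StudID, Grade) values(10, 1, 'A');
select StudID, Name from Student where StudID=1 order by Name;
select * from (Student join Enrollment on StudID=StudID);
update Student set Name='Alice B' where StudID=1;
delete from Enrollment where EnrID=10;
drop table Enrollment;
drop table Student;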
Appendix B
Performance Graphs
Figure B.1: Response times and throughput for varying workloads during horizontal merge DT creation, Transaction Mix 1, Scenario 1 and 2. (a) Response times for varying workloads, scenario 1 and 2. (b) Throughput for varying workloads, scenario 1 and 2.

Figure B.2: Response times for varying workloads during horizontal split DT creation. (a) Response times for varying workloads, scenario 1; the red lines indicate mean response time and confidence intervals without DT creation. (b) Response times for varying workloads, scenario 1 and 2.

Figure B.3: Throughput for varying workloads during horizontal split DT creation.

Figure B.4: Response times for varying workloads during vertical merge DT creation. (a) Response time mean and 90% quartiles for varying workloads, scenario 1. (b) Response times for varying workloads, Transaction Mix 1, scenario 1 and 2.

Figure B.5: Response times for varying workloads during vertical merge DT creation for three variations of record numbers in the two source tables. The 1x plots are the ones shown in Figure B.4(a). Transaction Mix 1, Scenario 1 and 2.

Figure B.6: Throughput for varying workloads during vertical merge DT creation. Transaction Mix 1, Scenario 1 and 2.

Figure B.7: Response times for varying workloads during vertical split DT creation. (a) Response time mean and 90% quartiles for varying workloads, Transaction Mix 1, Scenario 1. (b) Response times for varying workloads, Transaction Mix 1, Scenario 1 and 2.

Figure B.8: Throughput for varying workloads during vertical split DT creation. Transaction Mix 1, Scenario 1 and 2.
Glossary
1MLF Abbreviation for One-to-Many Lock Forwarding. Technique used
during non-blocking commit and abort synchronization of schema transformations.
2PC Common abbreviation for two-phase commit. Commonly used by
schedulers in distributed database systems to ensure that transactions
either commit or abort on all nodes.
2PL Common abbreviation for two-phase locking. Transactions are not
allowed to acquire new locks once they have released a lock.
Availability A database system is available when it can be fully accessed
by all users that are supposed to have access.
Consistency Checker A background thread used to find inconsistencies
between records during vertical split DT creation.
Database A collection of related data.
Database Management System The program used to manage a database.
Database Schema The description, or model, of a database.
Database Snapshot The first type of materialized view. In contrast to
MVs, snapshots can not be continuously refreshed.
Database System A database managed by a DBMS.
DBMS Common abbreviation for Database Management System.
Derived Table A table containing data gathered from one or more other
tables.
DT Abbreviation for Derived Table.
Fine-granularity locking Locks that are set on small data items, i.e. records.
Fuzzy Copy A technique used to make a copy of a table without blocking
concurrent operations, including updates, to the same table. Can be
based on copying records or blocks of records.
Fuzzy Mark A special log record used as a place-keeper by DT creation.
High availability Defines systems that are not allowed to be unavailable
for more than a few minutes each year on average.
Horizontal Merge Derived Table creation operator, corresponding to the
union relational operator.
Horizontal Split Derived Table creation operator, corresponding to the
selection relational operator.
Idempotence Idempotent operations can be redone any number of times
and still yield the same result.
Initial Population Step Second step of the DT creation framework. The
derived tables are populated with records read from the source tables
without using locks.
Latch A lock held for a very short time. For example used to ensure that
only one thread writes to a disk block at a time. Also called semaphore.
Log Propagation Step Third step of the DT creation framework. Log
records describing operations on source table records are applied to
the records in the derived tables.
Log Sequence Number See State Identifier.
Logical Log A transaction log containing the operations performed on the
data objects.
LSN Common abbreviation for Log Sequence Number; See State Identifier.
M1LF Abbreviation for Many-to-One Lock Forwarding. Technique used
during non-blocking commit and abort synchronization of schema transformations.
Materialized View A view where the result of the view query is physically
stored.
MMLF Abbreviation for Many-to-Many Lock Forwarding. Technique used
during non-blocking commit and abort synchronization of schema transformations.
MV Common abbreviation for Materialized View.
NbDBMS Abbreviation for Non-blocking DBMS. The name of the prototype DBMS used for testing in this thesis.
Performance Degradation The degree of reduced performance, measured
in throughput or response time.
Physical Log A transaction log containing before and after values of the
changed objects.
Physiological Log A compromise between physical and logical logging.
Uses logical logging to describe operations on the physical objects;
blocks.
Preparation Step First step of the DT creation framework. Necessary
tables, indices etc are added to the database schema.
Record Identification Policy The strategy a DBMS uses to uniquely identify records. There are four alternative strategies: Relative Byte Address, Tuple Identifier, Database Key and Primary Key. The DT
creation framework presented in this thesis requires that either of the
two latter strategies is used.
Record Identifier A unique identifier assigned to all records in a database.
RID Common abbreviation for Record Identifier.
Schema Transformation A change to the database schema that happens
after the schema has been put into use.
Self-maintainable Highly desirable property for materialized views; used
on MVs that can be maintained without querying the source tables.
Throughout this thesis: also used on DT creations that can be performed without querying the source tables.
Semantically rich locks Lock types that allow multiple transactions to
lock the same data item. Requires that the operations are commutative, i.e. can be performed in any order. An example is “add $1000
to account X”, which commutes with “subtract $10 from account X”.
SLF Abbreviation for Simple Log Forwarding. Technique used during nonblocking commit and abort synchronization of schema transformations.
Source Table The tables used to derive records from in the derived table
creation framework.
State Identifier A value assigned to records or blocks (containing records)
which identifies the latest operation that was applied to it. Used to
achieve idempotence when logical logging is used.
Synchronization Step Fourth step of the DT creation framework. The
derived tables are made consistent with the source table records, and
are either turned into materialized views or used to perform a schema
transformation.
Transaction Log A file (normally) in which database operations are written. Used by the DBMS to recover a database after a failure.
Vertical Merge Derived Table creation operator, corresponding to the left
outer join relational operator in Ronström’s schema transformation
method, and the full outer join operator in the DT creation method
presented in this thesis.
Vertical Split Derived Table creation operator, corresponding to the projection relational operator.
Bibliography
Aas, J. (2005). Understanding the Linux 2.6.8.1 CPU scheduler. http://josh.trancesoftware.com/linux/linux cpu scheduler.pdf.
Adiba, M. E. and Lindsay, B. G. (1980). Database snapshots. In Proceedings
of the Sixth International Conference on Very Large Data Bases, 1980,
Canada, pages 86–91. IEEE Computer Society.
Agarwal, S., Keller, A. M., Wiederhold, G., and Saraswat, K. (1995). Flexible relation: An approach for integrating data from multiple, possibly
inconsistent databases. In ICDE ’95: Proceedings of the Eleventh International Conference on Data Engineering, pages 495–504, Washington,
DC, USA. IEEE Computer Society.
Agrawal, R., Carey, M. J., and Livny, M. (1987). Concurrency control performance modeling: alternatives and implications. ACM Trans. Database
Syst., 12(4):609–654.
Alur, N., Haas, P., Momiroska, D., Read, P., Summers, N., Totanes, V., and
Zuzarte, C. (2002). DB2 UDB’s High Function Business Intelligence in
e-business. IBM Corp., 1st edition.
Apache Derby (2007a). Apache Derby homepage. http://db.apache.org/derby/.
Apache Derby (2007b). Derby engine papers: Derby logging and recovery.
http://db.apache.org/derby/papers/recovery.html.
Apache Derby (2007c). Derby engine papers: Derby write ahead log format.
http://db.apache.org/derby/papers/logformats.html.
Austin, C. (2000). Sun Developer Network: Java technology on the Linux platform. http://java.sun.com/developer/technicalArticles/Programming/linux/.
Ballinger, C. (1993). TPC-D: benchmarking for decision support. In Gray,
J., editor, The Benchmark Handbook for Database and Transaction Systems. Morgan Kaufmann, 2nd edition.
Bernstein, P. A., Hadzilacos, V., and Goodman, N. (1987). Concurrency
Control and Recovery in Database Systems. Addison-Weslay Publishing
Company, 1st edition.
Blakeley, J. A., Coburn, N., and Larson, P.-A. (1989). Updating derived
relations: detecting irrelevant and autonomously computable updates.
ACM Transactions on Database Systems, 14(3):369–400.
Blakeley, J. A., Larson, P.-A., and Tompa, F. W. (1986). Efficiently updating materialized views. In Proceedings of the 1986 ACM SIGMOD
international Conference on Management of Data, pages 61–71.
Bratsberg, S. E., Hvasshovd, S.-O., and Torbjørnsen, Ø. (1997a). Location
and replication independent recovery in a highly available database. In
15th British Conference on Databases. Springer-Verlag LNCS.
Bratsberg, S. E., Hvasshovd, S.-O., and Torbjørnsen, Ø. (1997b). Parallel
solutions in ClustRa. IEEE Data Eng. Bull., 20(2):13–20.
Caroprese, L. and Zumpano, E. (2006). A framework for merging, repairing
and querying inconsistent databases. In Advances in Databases and Information Systems - Proceedings of ADBIS 2006, volume 4152 of Lecture
Notes in Computer Science, pages 383–398. Springer-Verlag.
Cha, S. K., Park, B. D., Lee, S. J., Song, S. H., Park, J. H., Lee, J. S.,
Park, S. Y., Hur, D. Y., and Kim, G. B. (1995). Object-oriented design of main-memory dbms for real-time applications. In Proceedings of
the 2nd International Workshop on Real-Time Computing Systems and
Applications, page 109, Washington, DC, USA. IEEE Computer Society.
Cha, S. K., Park, J. H., and Park, B. D. (1997). Xmas: an extensible
main-memory storage system. In Proceedings of the sixth international
conference on Information and knowledge management, pages 356–362,
New York, NY, USA. ACM Press.
Cha, S. K. and Song, C. (2004). P*time: Highly scalable oltp dbms for
managing update-intensive stream workload. In (e)Proceedings of the
30th International Conference on VLDB, pages 1033–1044.
Codd, E. F. (1970). A relational model of data for large shared data banks.
Commununications of the ACM, 13(6):377–387.
Colby, L. S., Griffin, T., Libkin, L., Mumick, I. S., and Trickey, H. (1996). Algorithms for deferred view maintenance. In Proceedings of the 1996 ACM
SIGMOD International Conference on Management of Data, pages 469–
480. ACM Press.
Crus, R. A. (1984). Data Recovery in IBM Database 2. IBM Systems Journal,
23(2):178.
Cyran, M. and Lane, P. (2003). Oracle database online documentation 10g
release 1 (10.1) - ”concepts”, part no. b10743-01.
Denning, P. J., Comer, D. E., Gries, D., Mulder, M. C., Tucker, A., Turner,
A. J., and Young, P. R. (1989). Computing as a discipline. Commununications of the ACM, 32(1):9–23.
Desmo-J (2006). Desmo-j: A framework for discrete-event modelling and
simulation. http://desmoj.de/.
Elmasri, R. and Navathe, S. B. (2000). Fundamentals of Database Systems.
Addison-Weslay Publishing Company, 3rd edition.
Elmasri, R. and Navathe, S. B. (2004). Fundamentals of Database Systems.
Addison-Wesley, 4th edition.
Flesca, S., Greco, S., and Zumpano, E. (2004). Active integrity constraints.
In Proceedings of the 6th ACM SIGPLAN international conference on
Principles and practice of declarative programming, pages 98–107, New
York, NY, USA. ACM Press.
Friedl, J. E. (2006). Mastering Regular Expressions. O’Reilly & Associates,
3rd edition.
Garcia-Molina, H. and Salem, K. (1987). Sagas. In Proceedings of the
1987 ACM SIGMOD International Conference on Management of Data,
pages 249–259. ACM Press.
Garcia-Molina, H. and Salem, K. (1992). Main memory database systems:
An overview. IEEE Transactions on Knowledge and Data Engineering,
4(6):509–516.
Garcia-Molina, H., Ullman, J. D., and Widom, J. (2002). Database Systems:
The Complete Book. Prentice Hall PTR, Upper Saddle River, NJ, USA.
Gray, J. (1978). Notes on data base operating systems. In Operating Systems,
An Advanced Course, pages 393–481, London, UK. Springer-Verlag.
Gray, J. (1981). The transaction concept: Virtues and limitations. In Very
Large Data Bases, 7th International Conference, September 9-11, 1981,
Cannes, France, Proceedings, pages 144–154. IEEE Computer Society.
Gray, J., editor (1993). The Benchmark Handbook for Database and Transaction Systems. Morgan Kaufmann, 2nd edition.
Gray, J. and Reuter, A. (1993). Transaction Processing: Concepts and Techniques. Morgan Kaufmann Publishers, Inc.
Greco, G., Greco, S., and Zumpano, E. (2001a). A logic programming
approach to the integration, repairing and querying of inconsistent
databases. In Proceedings of the 17th International Conference on Logic
Programming, pages 348–364, London, UK. Springer-Verlag.
Greco, G., Greco, S., and Zumpano, E. (2003). A logical framework for
querying and repairing inconsistent databases. IEEE Transactions on
Knowledge and Data Engineering, 15(6):1389–1408.
Greco, S., Pontieri, L., and Zumpano, E. (2001b). Integrating and managing conflicting data. In PSI ’02: Revised Papers from the 4th International Andrei Ershov Memorial Conference on Perspectives of System
Informatics, volume 4152 of Lecture Notes in Computer Science, pages
349–362, London, UK. Springer-Verlag.
Griffin, T. and Libkin, L. (1995). Incremental maintenance of views with
duplicates. In Proceedings of the 1995 ACM SIGMOD international
conference on Management of data, pages 328–339. ACM Press.
Griffin, T., Libkin, L., and Trickey, H. (1997). An improved algorithm for
the incremental recomputation of active relational expressions. TKDE,
9(3):508–511.
Gupta, A., Jagadish, H. V., and Mumick, I. S. (1996). Data integration
using self-maintainable views. In Proceedings of the 5th International
Conference on Extending Database Technology, pages 140–144. SpringerVerlag.
Gupta, A., Katiyar, D., and Mumick, I. S. (1992). Counting solutions to
the view maintenance problem. In Workshop on Deductive Databases,
JICSLP, pages 185–194.
Gupta, A., Mumick, I. S., and Subrahmanian, V. S. (1993). Maintaining
views incrementally. In Proceedings of the 1993 ACM SIGMOD international conference on Management of data, pages 157–166. ACM Press.
Haerder, T. and Reuter, A. (1983). Principles of transaction-oriented
database recovery. ACM Comput. Surv., 15(4):287–317.
Hagmann, R. B. (1986). A crash recovery scheme for a memory-resident
database system. IEEE Trans. Comput., 35(9):839–843.
Highleyman, W. H. (1989). Performance analysis of transaction processing
systems. Prentice-Hall, Inc., Upper Saddle River, NJ, USA.
Houston, Leland, S. and Newton (2002). IBM Informix Guide to SQL: Syntax, version 9.3. IBM.
Hvasshovd, S.-O. (1999). Recovery in Parallel Database Systems. Verlag
Vieweg, 2nd edition.
Hvasshovd, S.-O., Sæter, T., Torbjørnsen, Ø., Moe, P., and Risnes, O. (1991).
A continuously available and highly scalable transaction server: Design
experience from the HypRa project. In Proceedings of the 4th International Workshop on High Performance Transaction Systems.
Hvasshovd, S.-O., Torbjørnsen, Ø., Bratsberg, S. E., and Holager, P. (1995).
The ClustRa telecom database: High availability, high throughput, and
real-time response. In Proceedings of the 21st VLDB Conference.
IBM Information Center (2006). DB2 Version 9 Information Center. http://publib.boulder.ibm.com/infocenter/db2luw/v9/index.jsp (checked December 5, 2006).
IBM Information Center (2007). IBM DB2 Universal Database glossary, version 8.2 (checked February 6, 2007).
Jonasson, Ø. A. (2006). Non-blocking creation and maintenance of materialized views. Master’s thesis, Norwegian University of Science and
Technology.
Kahler, B. and Risnes, O. (1987). Extended logging for database snapshot
refresh. In Proceedings of the 13th International Conference on Very
Large Data Bases.
Kawaguchi, A., Lieuwen, D. F., Mumick, I. S., Quass, D., and Ross, K. A.
(1997). Concurrency control theory for deferred materialized views. In
Proceedings of the 6th International Conference on Database Theory,
ICDT 1997, volume 1186 of Lecture Notes in Computer Science, pages
306–320. Springer-Verlag.
Knuth, D. (1998). The Art of Computer Programming, Volume 3: Sorting
and Searching. Addison-Wesley, 2nd edition.
Korth, H. F. (1983). Locking primitives in a database system. Journal of
the ACM, 30(1):55–79.
Kruckenberg, M. and Pipes, J. (2006). Pro MySQL. Apress.
Lesk, M. E. and Schmidt, E. (1990). Lex - a lexical analyzer generator. In
UNIX Vol. II: research system, pages 375–387, Philadelphia, PA, USA.
W. B. Saunders Company.
Lin, J. (1996). Integration of weighted knowledge bases. Artificial Intelligence, 83(2):363–378.
Lin, J. and Mendelzon, A. O. (1999). Knowledge base merging by majority.
In Pareschi, R. and Fronhoefer, B., editors, Dynamic Worlds: From the
Frame Problem to Knowledge Management. Kluwer Academic Publishers.
Lindsay, B., Haas, L., Mohan, C., Pirahesh, H., and Wilms, P. (1986). A
snapshot differential refresh algorithm. In Proceedings of the 1986 ACM
SIGMOD international conference on Management of data, pages 53–60,
New York, NY, USA. ACM Press.
Lorentz, D. and Gregoire, J. (2002). Oracle9i SQL Reference Release 2 (9.2),
Part no. A96540-01. Oracle.
Lorentz, D. and Gregoire, J. (2003a). Oracle Database SQL Reference 10 g
Release 1 (10.1), Part no. B10759-01. Oracle.
Lorentz, D. and Gregoire, J. (2003b). Oracle Database SQL Reference 10g
Release 1 (10.1). Oracle.
Løland, J. (2003). Schema transformations in commercial databases. Report,
Norwegian University of Science and Technology.
Løland, J. and Hvasshovd, S.-O. (2006a). Non-blocking creation of derived
tables. In Norsk Informatikkonferanse 2006. Tapir Akademisk Forlag.
Løland, J. and Hvasshovd, S.-O. (2006b). Non-blocking materialized view
creation and transformation of schemas. In Advances in Databases and
Information Systems - Proceedings of ADBIS 2006, volume 4152 of Lecture Notes in Computer Science, pages 96–107. Springer-Verlag.
Løland, J. and Hvasshovd, S.-O. (2006c). Online, non-blocking relational
schema changes. In Advances in Database Technology – EDBT 2006,
volume 3896 of Lecture Notes in Computer Science, pages 405–422.
Springer-Verlag.
Marche, S. (1993). Measuring the stability of data. European Journal of
Information Systems, 2(1):37–47.
Microsoft TechNet (2006). Microsoft TechNet: SQL Server 2005. http://msdn2.microsoft.com/en-us/library/ms130214.aspx (checked March 22, 2007).
Mohan, C., Haderle, D., Lindsay, B., Pirahesh, H., and Schwarz, P. (1992).
Aries: a transaction recovery method supporting fine- granularity locking and partial rollbacks using write-ahead logging. ACM Transactions
on Database Systems, 17(1):94–162.
MySQL AB (2006). Mysql 5.1 reference manual. http://dev.mysql.com/doc/.
MySQL AB (2007). Mysql homepage. http://dev.mysql.com/.
Oracle Corporation (2006a). Berkeley DB reference guide, version 4.5.20. http://www.oracle.com/technology/documentation/berkeleydb/db/index.html.
Oracle Corporation (2006b). White Paper: A comparison of Oracle Berkeley DB and relational database management systems. http://www.oracle.com/database/berkeley-db/.
Pfleeger, S. L. (1999). Albert Einstein and Empirical Software Engineering.
Computer, 32(10):32–38.
PostgreSQL Global Development Group (2001). PostgreSQL 7.1 online documentation. http://www.postgresql.org/docs/7.1/static/postgres.html.
PostgreSQL Global Development Group (2002). PostgreSQL 7.3 online documentation. http://www.postgresql.org/docs/7.3/interactive/index.html.
PostgreSQL Global Development Group (2007). PostgreSQL history.
Qian, X. and Wiederhold, G. (1991). Incremental recomputation of active
relational expressions. Knowledge and Data Engineering, 3(3):337–341.
Quass, D., Gupta, A., Mumick, I. S., and Widom, J. (1996). Making views
self-maintainable for data warehousing. In Proceedings of the Fourth International Conference on Parallel and Distributed Information Systems,
1996, USA, pages 158–169. IEEE Computer Society.
Ronström, M. (1998). Design and Modelling of a Parallel Data Server for
Telecom Applications. PhD thesis, Linkoping University.
Ronström, M. (2000). On-line schema update for a telecom database. In
Proceedings of the 16th International Conference on Data Engineering,
pages 329–338. IEEE Computer Society.
Serlin, O. (1993). The history of debitcredit and the TPC. In Gray, J., editor, The Benchmark Handbook for Database and Transaction Systems.
Morgan Kaufmann, 2nd edition.
Shapiro, L. D. (1986). Join processing in database systems with large main
memories. ACM Trans. Database Syst., 11(3):239–264.
Shirazi, J. (2003). Java Performance Tuning. O’Reilly & Associates.
Sjøberg, D. (1993). Quantifying schema evolution. Information and Software
Technology, 35(1):35–44.
Solid Info. Tech. (2006a). Solid BoostEngine data sheet. http://www.solidtech.com/pdfs/SolidBoostEngine DS.pdf.
Solid Info. Tech. (2006b). Solid database engine administration guide.
Solid Info. Tech. (2007). Solid database homepage. http://www.solidtech.com/.
Sun Microsystems (2006a). Sun developer network: Java 2 standard edition
5.0. http://java.sun.com/j2se/1.5.0/.
Sun Microsystems (2006b). Sun developer network: Java se hotspot at a
glance. http://java.sun.com/javase/technologies/hotspot/.
Sun Microsystems (2007). Java 2 platform standard edition 5.0 api specification. http://java.sun.com/j2se/1.5.0/docs/api/.
Tichy, W. F. (1998). Should Computer Scientists Experiment More? IEEE
Computer, 31(5):32–40.
Turbyfill, C., Orji, C. U., and Bitton, D. (1993). AS3 AP - an ANSI SQL
standard scaleable and portable benchmark for relational database systems. In Gray, J., editor, The Benchmark Handbook for Database and
Transaction Systems. Morgan Kaufmann, 2nd edition.
Walker, R. J., Briand, L. C., Notkin, D., Seaman, C. B., and Tichy, W. F.
(2003). Panel: empirical validation: what, why, when, and How. In Proceedings of the 25th International Conference on Software Engineering
(ICSE’03), pages 721–722. IEEE Computer Society Press.
Weikum, G. (1986). A theoretical foundation of multi-level concurrency control. In PODS ’86: Proceedings of the fifth ACM SIGACT-SIGMOD
symposium on Principles of database systems, pages 31–43, New York,
NY, USA. ACM Press.
Zaitsev, P. (2006). Presentation: InnoDB architecture and performance optimization. Open Source Database Conference 2006, http://www.opendbcon.net/.
Zelkowitz, M. V. and Wallace, D. R. (1998). Experimental Models for Validating Technology. Computer, 31(5):23–31.