An Efficient and Transparent Transaction Management Based on the Data Workflow of HVEM DataGrid
Im Young Jung
Seoul National University
Introduction
  Transaction management for safe data update and insertion on an e-Science DataGrid
  Heterogeneous storages are used according to the characteristics and the size of the data
  Based on the workflow, data has a storing precedence across the heterogeneous storages within a transaction
 In this paper
 An efficient and transparent transaction management on HVEM DataGrid
 Dividing the transaction into sub-transactions according to the transaction states and classifying them
 Transaction hierarchy and parallelism provide
 efficient and safe large data upload to HVEM DataGrid
 transparency in the transaction, including simultaneous access to heterogeneous storages
 Automatic garbage collection
HVEM Grid
  High Voltage Electron Microscope (HVEM)
  Lets scientists perform 3D structure analysis of new materials at the micrometer scale
 HVEM Grid
 Remote users can perform the same tasks as on-site scientists.
 Remote control of HVEM
 Storing, retrieving and searching data through HVEM DataGrid
 Processing data through HVEM Computational Grid
HVEM DataGrid
  Designed for biological experiments using HVEM
  Presents the DB and the file storage as one logical storage
  Small metadata is stored in the DB
 Information on materials, material handling methods, HVEM experiments, images and experimenters
 The large files are stored in file storages
 2D or 3D image files, the documents related to HVEM experiments
 Internal process to find files
 After finding their logical path in the file storage by searching the DB, users can retrieve the files they want in the file storage
HVEM DataGrid
 A unified data management
  The storing precedence among data
 When storing the biological information for images, the images themselves should be kept in HVEM Grid at the same time
  The relational semantics between various data stored in distributed heterogeneous storages
  To upload many large files to HVEM DataGrid efficiently and safely
  Upload dependency & serialization
  Ensure the transactions for safe parallel uploads
An efficient and transparent transaction management
  Requirements for the transactions on HVEM DataGrid
  Consider the semantics of HVEM DataGrid
 A project is composed of several experiments
 The data for an experiment should be inserted according to its data workflow
 The file and its metadata should be stored to HVEM DataGrid simultaneously. Otherwise, all of them should be deleted
  Support
 the long-lifetime transaction, bounded by the time limit of the experiment or project
 the short-lifetime transaction, which physically stores the data to HVEM DataGrid
  Optimizing the upload of large files to reduce the blocking time must still ensure safe transactions
 An asynchronous and parallel upload scheme should preserve upload dependency and ensure safe transactions
An efficient and transparent transaction management
  Transaction hierarchy
 The transaction units act as checkpoints on incomplete data insertion
 Confine the rollback extent
 When the data for an experiment or a project is not inserted into HVEM DataGrid by its time limit, the experiment or the project should be removed by the rollback of TnE or TnP
[Figure: transaction hierarchy with levels labeled "For Project", "For Experiment", "For a group of TnSs" and "For storing data to physical storage", plus a "Parallel Processing" label]
 TnS((((1)2)5)2)
 (1) represents the identity of the TnP it belongs to
 The next index '2' indicates the identity of the TnE, and so on
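
The nested index notation above can be illustrated with a minimal sketch. It assumes the index lists identities from the outermost level to the innermost, and that the levels after TnP and TnE are TnW and TnS (the slide only says "and so on"). The names `format_index`, `parse_index` and `LEVELS` are hypothetical and not taken from the HVEM DataGrid implementation.

```python
import re
from typing import List

# Assumed nesting order, outermost first. Only TnP and TnE are stated on the
# slide; TnW and TnS for the remaining positions are an assumption.
LEVELS = ["TnP", "TnE", "TnW", "TnS"]


def format_index(ids: List[int]) -> str:
    """Render identities (outermost first) in the slide's nested notation."""
    if not ids:
        raise ValueError("at least one identity is required")
    body = str(ids[0])
    for ident in ids[1:]:
        # Each deeper level wraps everything so far in parentheses.
        body = f"({body}){ident}"
    return f"TnS({body})"


def parse_index(text: str) -> List[int]:
    """Recover the identities (outermost first) from the nested notation."""
    if not (text.startswith("TnS(") and text.endswith(")")):
        raise ValueError(f"not a TnS index: {text!r}")
    ids = [int(n) for n in re.findall(r"\d+", text[4:-1])]
    # Round-trip check rejects strings with misplaced parentheses.
    if not ids or format_index(ids) != text:
        raise ValueError(f"malformed TnS index: {text!r}")
    return ids


ids = parse_index("TnS((((1)2)5)2)")
print(dict(zip(LEVELS, ids)))
```

Keeping the identities of the enclosing transactions inside each TnS index means the enclosing TnP and TnE can be found from the index alone, which is what lets a rollback be confined to one level of the hierarchy.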
  Support autonomous garbage collection
 Inserting or deleting data on HVEM DataGrid is left to users.
 When they stop inserting experimental data for any reason without deleting the related data, HVEM DataGrid accumulates a large amount of garbage.
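
As a rough illustration of time-limit-driven garbage collection, the sketch below rolls back every incomplete checkpoint whose time limit has passed. The `Checkpoint` class, its fields and `collect_garbage` are hypothetical names chosen for this example; the slides do not describe how HVEM DataGrid represents TnE or TnP internally.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Checkpoint:
    """Hypothetical stand-in for a TnE or TnP with a time limit."""
    name: str
    deadline: float                      # time limit, in seconds on some clock
    complete: bool = False               # True once all data has been inserted
    stored: List[str] = field(default_factory=list)  # items inserted so far


def collect_garbage(checkpoints: List[Checkpoint], now: float) -> List[str]:
    """Roll back incomplete checkpoints past their deadline.

    Returns the items that were removed.
    """
    removed: List[str] = []
    for cp in checkpoints:
        if not cp.complete and now > cp.deadline:
            removed.extend(cp.stored)
            cp.stored.clear()            # the rollback discards partial data
    return removed


exp_a = Checkpoint("TnE a", deadline=100.0, stored=["img1", "meta1"])
exp_b = Checkpoint("TnE b", deadline=100.0, complete=True, stored=["img2"])
exp_c = Checkpoint("TnE c", deadline=500.0, stored=["img3"])
removed = collect_garbage([exp_a, exp_b, exp_c], now=200.0)
print(removed)
```

Only the expired, incomplete experiment loses its data here; completed or still-open checkpoints are left untouched, so users do not have to clean up abandoned uploads themselves.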
Transaction management Scheme
  HVEM DataGrid forks two processes, one to connect to the DB and one to the file storage.
  When the connections succeed, it gets the next requests and so on.
  In a light failure (LF) due to temporary failures on the network or a server, retry the transaction a fixed number of times
  When the retries fail, a serious failure (SF) is assumed: rollback process
  The state change of TnS(((())j)i)
  States: jSiS, jSiD (the notification from DB), jSiF (the notification from the file storage), jSiE (both of them arrive): TnS completes
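
The scheme above can be sketched as follows. This is a minimal illustration under stated assumptions, not the HVEM DataGrid implementation: it uses two threads instead of forked processes, and `LightFailure`, `with_retries`, `run_tns` and the store/undo callbacks are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


class LightFailure(Exception):
    """Temporary network or server failure (LF)."""


def with_retries(action: Callable[[], None], max_retries: int) -> bool:
    """Run action, retrying on LF up to max_retries times.

    Returns True on success, or False when every attempt failed, which
    the caller treats as a serious failure (SF).
    """
    for _ in range(max_retries + 1):
        try:
            action()
            return True
        except LightFailure:
            continue
    return False


def run_tns(store_metadata: Callable[[], None],
            store_file: Callable[[], None],
            undo_metadata: Callable[[], None],
            undo_file: Callable[[], None],
            max_retries: int = 3) -> str:
    """Store metadata and file in parallel; complete only if both succeed."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        db_future = pool.submit(with_retries, store_metadata, max_retries)
        fs_future = pool.submit(with_retries, store_file, max_retries)
        db_ok, fs_ok = db_future.result(), fs_future.result()
    if db_ok and fs_ok:
        return "complete"        # both notifications arrived
    # Serious failure: roll back whichever side did succeed.
    if db_ok:
        undo_metadata()
    if fs_ok:
        undo_file()
    return "rolled back"


attempts = {"n": 0}


def flaky_db_store() -> None:
    """Fails twice with LF, then succeeds."""
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise LightFailure()


def broken_file_store() -> None:
    """Always fails, so the retries are exhausted."""
    raise LightFailure()


undone = []
first = run_tns(flaky_db_store, lambda: None,
                lambda: undone.append("db"), lambda: undone.append("file"))
second = run_tns(lambda: None, broken_file_store,
                 lambda: undone.append("db"), lambda: undone.append("file"))
print(first, second, undone)
```

The first call recovers from two light failures and completes. In the second, the file upload never succeeds, so the metadata that was already stored is undone and the file and its metadata are not left out of step.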
Evaluation
 Analysis
 Transparency
 Through the transaction hierarchy and fine-grained state management, the transaction manager in HVEM DataGrid enables a transparent transaction that uploads the image files to the file storage and stores their metadata to the DB simultaneously.
  Serializability
 Many TnSs are upload-serializable because their state changes are logged through the transaction index.
 To keep the upload dependency,
 the transaction manager protects the first user entering TnW.
o If that user withdraws from the TnW, another user can initiate the TnW
  Transaction performance
 The transaction scheme supports asynchronism and parallelism
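
The upload-dependency rule in the analysis above (the first user entering TnW is protected, and another user may initiate it only after a withdrawal) could look like the sketch below. `TnWSlot`, `enter` and `withdraw` are hypothetical names, and a single in-process lock is an assumption made for illustration; the slides do not say how the transaction manager enforces the rule.

```python
import threading
from typing import Optional


class TnWSlot:
    """Hypothetical guard that lets only one user own a TnW at a time."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._owner: Optional[str] = None

    def enter(self, user: str) -> bool:
        """Return True if user owns the TnW after this call."""
        with self._lock:
            if self._owner is None:
                self._owner = user       # first user to enter is protected
            return self._owner == user

    def withdraw(self, user: str) -> bool:
        """Release the TnW; only the current owner may withdraw."""
        with self._lock:
            if self._owner != user:
                return False
            self._owner = None
            return True


slot = TnWSlot()
results = [
    slot.enter("alice"),     # first user enters
    slot.enter("bob"),       # refused while alice holds the TnW
    slot.withdraw("bob"),    # a non-owner cannot withdraw
    slot.withdraw("alice"),  # owner withdraws
    slot.enter("bob"),       # another user can now initiate the TnW
]
print(results)
```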
 Experiment Setting
 Because the sub-transaction time on the DB is negligible compared with that on the file storage due to data size, we only considered the upload time for image files
 Considering the semantics of the data workflow in HVEM DataGrid
 For an asynchronous file transfer, the request intervals for file transfer are chosen randomly within 50 sec
 The physical locations of the file storages are assumed to be distributed
Evaluation
 Overhead
 Log management cost
 The added cost is the log for TnP, TnE and TnW; general transaction management already requires the log for TnS
 The log size for TnP, TnE and TnW is smaller than that for TnS because they function as checkpoints rather than real transaction units.
  Rollback cost
 The cascade rollback of TnS in TnW due to the upload dependency on parallel processing of TnS
 At LF, if the retry succeeds, the gain from transaction parallelism can be very large, especially for large file handling
 There are not many SFs or LFs because an e-Science DataGrid is not as popular as multimedia storage
Conclusion
 A transaction management on HVEM Grid
 Safety
 Ensure a safe transaction considering the data workflow in HVEM DataGrid
 Efficiency
 Improve the performance of large file uploads through asynchronism and parallelism
 Transparency
 Data management across the heterogeneous storages
 Automatic garbage collection
 Reduce garbage