An Efficient and Transparent Transaction Management based on the Data Workflow of HVEM DataGrid
Im Young Jung
Seoul National University

Introduction
* Transaction management for safe data update and insertion on an e-Science DataGrid
  o Heterogeneous storages, chosen according to the characteristics and the size of the data
  o Based on the workflow, a storing precedence of data across heterogeneous storages within a transaction
* In this paper: an efficient and transparent transaction management on HVEM DataGrid
  o Divides a transaction into sub-transactions according to the transaction states and classifies them
  o Transaction hierarchy and parallelism provide
    - efficient and safe upload of large data to HVEM DataGrid
    - transparency in the transaction, including simultaneous access to heterogeneous storages
  o Automatic garbage collection

HVEM Grid
* High Voltage Electron Microscope (HVEM)
  o Lets scientists carry out 3D structure analysis of new materials at micrometer scale
* HVEM Grid
  o Remote users can perform the same tasks as on-site scientists.
  o Remote control of the HVEM
  o Storing, retrieving and searching data through HVEM DataGrid
  o Processing data through HVEM Computational Grid

HVEM DataGrid
* Designed for biological experiments using HVEM
* A logical view of one storage over the DB and the file storage
  o Small metadata is stored in the DB: information on materials, material handling methods, HVEM experiments, images and experimenters
  o Large files are stored in the file storages: 2D or 3D image files and documents related to HVEM experiments
* Internal process to find files
  o After finding the logical path of a file by searching the DB, users can retrieve the file they want from the file storage

HVEM DataGrid
* A unified data management
  o Storing precedence among data: when all biological information for the images is stored, the images must be kept in HVEM Grid at the same time
  o Relational semantics between the various data stored in distributed heterogeneous storages
* To upload many large files to HVEM DataGrid efficiently and safely
  o Upload dependency and serialization
  o Ensure the transactions for safe parallel uploads

An efficient and transparent transaction management
* Requirements for the transactions on HVEM DataGrid
  o Consider the semantics of HVEM DataGrid
    - A project is composed of several experiments
    - The data for an experiment should be inserted according to its data workflow
    - The file and its metadata should be stored to HVEM DataGrid simultaneously.
      Otherwise, all of them should be deleted.
  o Support
    - long-lifetime transactions, bounded by the time limit of the experiment or project
    - short-lifetime transactions, which physically store the data to HVEM DataGrid
  o Optimizing the upload of large files to reduce the blocking time must still ensure safe transactions
    - An asynchronous and parallel upload scheme should protect the upload dependency and ensure safe transactions

An efficient and transparent transaction management
* Transaction hierarchy
  o TnP: for a project
  o TnE: for an experiment
    - TnP and TnE are transaction units that act as checkpoints on incomplete data insertion
    - They confine the rollback extent
  o TnW: for a group of TnSs (parallel processing)
  o TnS: for storing data to physical storage
* When the data for an experiment or a project has not been inserted into HVEM DataGrid by its time limit, the experiment or the project is removed by the rollback of TnE or TnP
* Transaction index, for example TnS((((1)2)5)2)
  o (1) is the identity of the TnP it belongs to
  o The next index, 2, is the identity of the TnE, and so on
* Support for autonomous garbage collection
  o Inserting or deleting data on HVEM DataGrid is left to the users
  o When users stop inserting experimental data for any reason without deleting the related data, HVEM DataGrid accumulates a large amount of garbage

Transaction management scheme
* HVEM DataGrid forks two processes, one to connect to the DB and one to connect to the file storage
  o When the connections succeed, it gets the next requests, and so on
* On a light failure (LF), caused by a temporary failure of the network or a server, the transaction is retried a fixed number of times
  o When the retries fail, a serious failure (SF) is assumed and the rollback process starts
* The state change of TnS(((())j)i)
  o jSiS, followed by jSiD (the notification from the DB) and jSiF (the notification from the file storage)
  o jSiE (both of them arrive): TnS completes

Evaluation: Analysis
* Transparency
  o Through the transaction hierarchy and fine-grained state management, the transaction manager in HVEM DataGrid enables a transparent transaction that uploads the image files to the file storage and stores their metadata in the DB simultaneously.
* Serializability
  o Many TnSs are upload-serializable because their state changes are logged through the transaction index.
  o To keep the upload dependency, the transaction manager protects the first user entering a TnW.
    - If that user withdraws the TnW, another user can initiate the TnW
* Transaction performance
  o The transaction scheme supports asynchronism and parallelism

Experiment Setting
* Because the sub-transaction time on the DB is negligible compared with that on the file storage, owing to the data size, only the upload time of the image files was considered
* The semantics of the data workflow in HVEM DataGrid are taken into account
* For asynchronous file transfer, the request intervals for file transfer are chosen randomly within 50 sec
* The physical locations of the file storages are assumed to be distributed

Evaluation: Overhead
* Log management cost
  o The cost for TnP, TnE and TnW; general transaction management already requires the log for TnS
  o The log size for TnP, TnE and TnW is smaller than that for TnS because they function as checkpoints rather than as real transaction units.
* Rollback cost
  o Cascade rollback of the TnSs in a TnW, caused by the upload dependency under parallel processing of TnSs
  o On an LF, if the retry succeeds, the gain from transaction parallelism can be very large, especially for large file handling
  o SFs and LFs are not frequent, because an e-Science DataGrid is not as heavily used as a multimedia storage

Conclusion
* A transaction management on HVEM Grid
  o Safety: ensures a safe transaction, considering the data workflow in HVEM DataGrid
  o Efficiency: improves the performance of large-file upload through asynchronism and parallelism
  o Transparency: data management across the heterogeneous storages
  o Automatic garbage collection: reduces garbage
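
The hierarchical transaction index from the transaction hierarchy slide (for example TnS((((1)2)5)2)) can be illustrated with a small Python sketch. This is only one possible reading of the notation, not the HVEM DataGrid code: the function names `format_index`, `parse_index` and `rollback_extent` are hypothetical, and the four-level order TnP, TnE, TnW, TnS is assumed from the slides. The point of the sketch is that an index prefix identifies the enclosing transaction, so rolling back a TnE or TnP confines the rollback to the TnSs that share that prefix.

```python
import re
from typing import Iterable, List, Tuple

# Assumed nesting order, outermost first (inferred from the slides).
LEVELS = ("TnP", "TnE", "TnW", "TnS")

Path = Tuple[int, ...]


def format_index(path: Path) -> str:
    """Render (1, 2, 5, 2) as '((((1)2)5)2)'."""
    return "(" * len(path) + ")".join(str(i) for i in path) + ")"


def parse_index(label: str) -> Tuple[str, Path]:
    """Split a label such as 'TnS((((1)2)5)2)' into its kind and id path."""
    kind, body = label[:3], label[3:]
    path = tuple(int(tok) for tok in re.findall(r"\d+", body))
    # Re-render the ids and compare, so malformed nesting is rejected.
    if kind not in LEVELS or not path or body != format_index(path):
        raise ValueError(f"not a valid transaction index: {label!r}")
    return kind, path


def rollback_extent(parent: Path, sub_transactions: Iterable[Path]) -> List[Path]:
    """Return the sub-transactions that fall under the transaction 'parent'."""
    return [p for p in sub_transactions if p[: len(parent)] == parent]


if __name__ == "__main__":
    kind, path = parse_index("TnS((((1)2)5)2)")
    print(kind, path)
    # Rolling back experiment 2 of project 1 touches only its own TnSs.
    print(rollback_extent((1, 2), [(1, 2, 5, 2), (1, 2, 6, 1), (1, 3, 1, 1)]))
```

Storing the index as a tuple keeps the prefix test trivial, which is why the sketch parses the nested-parenthesis form once and works on tuples afterwards.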
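
The transaction management scheme (two parallel connections, retry on a light failure, rollback on a serious failure) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: threads stand in for the two forked processes, and `run_tns`, `attempt`, `LightFailure`, `max_retries`, the undo callbacks and the `ROLLED_BACK` state are hypothetical names. Reading jSiS as "sub-transaction started" is also an assumption; the slides only define jSiD, jSiF and jSiE.

```python
from concurrent.futures import ThreadPoolExecutor
from enum import Enum
from typing import Callable, List


class State(Enum):
    STARTED = "S"      # jSiS (assumed meaning: sub-transaction started)
    DB_DONE = "D"      # jSiD: notification from the DB
    FILE_DONE = "F"    # jSiF: notification from the file storage
    ENDED = "E"        # jSiE: both notifications arrived, TnS completes
    ROLLED_BACK = "R"  # hypothetical label for the outcome of an SF


class LightFailure(Exception):
    """Temporary network or server failure (LF)."""


def attempt(op: Callable[[], None], max_retries: int) -> bool:
    """Run op once, then retry up to max_retries times on an LF.

    Returns False when every try failed, which is treated as an SF.
    """
    for _ in range(1 + max_retries):
        try:
            op()
            return True
        except LightFailure:
            continue
    return False


def run_tns(store_metadata: Callable[[], None],
            upload_file: Callable[[], None],
            undo_metadata: Callable[[], None],
            undo_file: Callable[[], None],
            max_retries: int = 2) -> List[State]:
    """Store metadata and file in parallel; roll back unless both succeed."""
    log = [State.STARTED]
    with ThreadPoolExecutor(max_workers=2) as pool:
        db_future = pool.submit(attempt, store_metadata, max_retries)
        file_future = pool.submit(attempt, upload_file, max_retries)
        db_ok, file_ok = db_future.result(), file_future.result()
    # Logged in a fixed order for readability, not in arrival order.
    if db_ok:
        log.append(State.DB_DONE)
    if file_ok:
        log.append(State.FILE_DONE)
    if db_ok and file_ok:
        log.append(State.ENDED)
    else:
        # SF: undo whichever half did succeed, so no orphan data remains.
        if db_ok:
            undo_metadata()
        if file_ok:
            undo_file()
        log.append(State.ROLLED_BACK)
    return log
```

A caller would pass in the real DB insert and file transfer as the two operations; the returned log is what a transaction manager could record against the TnS index.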