Lessons from SAM
What data should be in a metadata system
An incomplete guide
06-Jul-2006 Stefan Stonjek: Lessons from SAM

Outline
What is SAM - The Wallpaper
What is stored in SAM
Useful and useless data
Bits and pieces we learned during the design, implementation and operation of SAM
Don’t reinvent the wheel

What is SAM
Sequential file Access via Metadata
DØ and CDF data handling system
Central database
  - with middleware front-end (contains business logic)
“Eierlegende Wollmilchsau” (literally “egg-laying wool-milk-sow”: one system that does everything), compared to the LHC approach

The File
The central entity in SAM is a data file
A file is identified by a unique name
  - Not necessarily meaningful
File information is divided into
  - Physical file data
  - Physics data

File provenance
SAM keeps provenance for every file
  - Therefore every file has to be in SAM
File relationship is “m x n” (many-to-many)
Connection is done via a “process”
  - Contains: date, executable, version, machine
State machine for files

Do not delete
No information will be deleted
Files get retired
If a disk size changes, the old alias gets retired and a new alias is used (at least this is the theory)
File name problem
  - A re-reprocessing should create a corrected file with the same name
  - Solution: file names should not carry information

SAM station (like LFC+SRM)
Every SAM station has attached disks/tapes
A SAM station transfers all the data into its realm (if necessary)
The central system knows which files are where
SAM Grid can send jobs to where the data is

SAM projects and processes (like GANGA)
SAM keeps track of all processes which
  - Write files into SAM
  - Read files from SAM
Processes are grouped in projects
  - A project reads/writes a whole dataset
  - A process reads/writes some files from a dataset

Dimensions
SQL-like language to define a selection of input files which form a dataset
The dimension query is stored in the database
Translated to SQL
Prevents users from accidentally overloading the database server

Dimension vs. plain SQL
Dimensions are easy to use
Dimensions shield the database from typos etc.
A dimension query requires the admins to configure all query types
  Run_number > 100 and Run_number < 200
SQL is more flexible!!!
  … where Runs.Number > 100 and Runs.number < 200 …

Memo
Users should think “dataset”, not “file”
A “file” is the atomic unit for any data handling system
In some cases a single file might be useless

Oracle 9 vs. Oracle 10
Rule-based optimization is gone
Therefore the optimal solution is different for Oracle 9 and Oracle 10

Summary
A stupid man doesn’t learn from his mistakes
A clever man does learn from his mistakes
A wise man learns from others’ mistakes
LCG Grid can learn from SAMGrid
  - (I hope)
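
As a rough illustration of the Dimensions slides, the following Python sketch shows how a dimension query could be checked against a list of configured dimensions and translated to a parameterized SQL condition. It is not SAM's actual implementation: the `DIMENSIONS` mapping, the `translate` function and the table and column names (`runs.number`, `data_files.file_name`) are assumptions made for this example, and the tiny grammar only handles simple terms joined by "and".

```python
import re

# Hypothetical mapping from dimension names to SQL columns. In the talk's
# terms, this is what the admins would have to configure per query type.
DIMENSIONS = {
    "run_number": "runs.number",
    "file_name": "data_files.file_name",
}

# One term: <dimension> <operator> <integer or 'quoted string'>
TERM = re.compile(r"\s*(\w+)\s*(<=|>=|=|<|>)\s*('[^']*'|\d+)\s*$")


def translate(query):
    """Translate 'dim op value and dim op value ...' into (condition, params)."""
    clauses, params = [], []
    for term in re.split(r"\s+and\s+", query.strip(), flags=re.IGNORECASE):
        match = TERM.match(term)
        if match is None:
            raise ValueError(f"cannot parse term: {term!r}")
        name, op, value = match.groups()
        column = DIMENSIONS.get(name.lower())
        if column is None:
            # A typo in a dimension name is rejected before reaching the database.
            raise ValueError(f"unknown dimension: {name!r}")
        clauses.append(f"{column} {op} ?")
        params.append(int(value) if value.isdigit() else value.strip("'"))
    return " AND ".join(clauses), params


condition, params = translate("Run_number > 100 and Run_number < 200")
print(condition, params)
```

Because only configured dimensions are accepted and values are passed as parameters, a mistyped or unexpected query fails early in the middleware, which is the shielding effect the slides describe. The price is the reduced flexibility compared with plain SQL that the talk also points out.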