First feedback from DC04 (CMS)
‘Real World’ issues from DC04

DC04:
- Trying to operate the CMS computing system at 25 Hz for one month
- We are three days in!
- We are using components that are ready NOW
  • Even if it’s not politically correct
  • Often using several different approaches for comparison

This talk concentrates on data management issues:
- ‘Real World’ issues that have come up during DC04 preparation
- Stuff that is not (yet) well covered by the available tools
- I know that…
  • Some issues may be application problems, not middleware ones
  • Some issues may be covered by components under development
  • Some issues may be self-inflicted injuries
Dave Newbold, University of Bristol
GridPP Middleware Meeting
Directed data transfer

Data management ‘type I’: replica management
- The (automatic?) movement of data products to where they are needed; managing relevant system and application metadata
- Best-effort optimisation of data location in response to dynamic workload needs
- Well covered by current and future middleware
Data transfer ‘type II’: bulk data management
- The predictable, straight(ish) ‘production line’ of data flow:
  • Detector -> DAQ -> Buffer -> Reco farm -> T1 -> MSS -> calib -> …
- Requirements are different from replica management:
  • Robustness and reliability are paramount (raw data is the ‘crown jewels’)
  • Throughput is very important: ‘best effort’ is not good enough
- Not explicitly addressed by current middleware products
- Data distribution is explicitly ‘directed’ by policy
- ‘Seeds’ the replica management system from the Tier-1s
Directed data transfer

Our current solution:
- Cooperating system of simple ‘agents’ at Tier-0 and Tier-1
- They communicate only through a shared (Oracle) DB
- They have little or no state - it’s all held in the central DB
- Could this be useful as generic middleware?
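The agent pattern described above can be sketched as follows, with sqlite3 standing in for the shared Oracle DB; the `transfers` table and its column names are illustrative, not the DC04 schema. In Oracle, the claim step would typically be a `SELECT ... FOR UPDATE` so that two agents cannot take the same row.

```python
# Sketch of a stateless transfer agent: all state lives in the shared DB,
# so the agent itself can crash and restart without any recovery protocol.
import sqlite3

def claim_next_task(conn, agent_id):
    """Atomically claim one pending transfer; returns (id, source, dest)."""
    with conn:  # single transaction: the claim is all-or-nothing
        row = conn.execute(
            "SELECT id, source, dest FROM transfers "
            "WHERE status = 'pending' ORDER BY id LIMIT 1").fetchone()
        if row is None:
            return None
        task_id, source, dest = row
        conn.execute(
            "UPDATE transfers SET status = 'active', owner = ? WHERE id = ?",
            (agent_id, task_id))
        return task_id, source, dest

def finish_task(conn, task_id, ok):
    """Record the outcome; a failed transfer goes back to 'pending' for retry."""
    with conn:
        conn.execute("UPDATE transfers SET status = ? WHERE id = ?",
                     ('done' if ok else 'pending', task_id))
```

Because the agents hold no state of their own, any of them can be killed and restarted independently; recovery is just a matter of re-reading the central DB.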
Other related issues:
- Lack of a single consistent interface to MSS (in Europe and US) makes life difficult (being addressed?)
- There are very many failure modes in the data management system that we must think of…
- Would be good to factorise out the problems of failing storage components by having the MSS ‘remap’ our data when required
  • Predict at least one disk failure per day somewhere in DC04
Data transfer tools

Need low-level transfer tools that:
- Log what is going on! (We have ad-hoc solutions here for DC04)
- Adjust policy automatically for optimum throughput according to network conditions
- Fail gracefully when something is wrong at an end-point
- Play nicely with firewalls, etc.

NB: performance is not currently the problem, but the tools are…
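A minimal sketch of the logging and graceful-failure behaviour asked for above, wrapped around an arbitrary transfer command. The retry count and fixed back-off are placeholder policy, not anything a DC04 tool actually did; a real tool would also adapt to network conditions.

```python
# Sketch: run a transfer command with per-attempt logging, bounded retries,
# and a clean failure signal instead of a hang or an unlogged crash.
import logging
import subprocess
import time

log = logging.getLogger("transfer")

def transfer(cmd, attempts=3, backoff=5):
    """Run `cmd` (a transfer command as an argv list); True on success."""
    for n in range(1, attempts + 1):
        log.info("attempt %d/%d: %s", n, attempts, " ".join(cmd))
        result = subprocess.run(cmd, capture_output=True, text=True)
        if result.returncode == 0:
            log.info("transfer succeeded")
            return True
        log.warning("transfer failed (rc=%d): %s",
                    result.returncode, result.stderr.strip())
        if n < attempts:
            time.sleep(backoff)  # crude fixed back-off between retries
    log.error("giving up after %d attempts", attempts)
    return False  # fail gracefully: caller decides what to do next
```

The point is that every attempt leaves a log line and every failure mode ends in a definite `False`, which is the behaviour the slide finds missing from the available tools.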
Checksumming
- We would like a system that performs fast file-level checksums of data ON THE DISK
- No, the TCP checksum does not catch all errors
  • Silent disk problems, filesystem errors, NFS problems, etc.
- Checksumming data from MSS after the fact is very difficult
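A file-level checksum of the bytes as read back from disk, independent of any transport-layer check, can be as simple as the following. The choice of adler32 is an assumption (it is fast; a cryptographic hash would be stronger), and note that on an ordinary filesystem a read immediately after a write may be served from the page cache rather than the physical disk.

```python
# Sketch: stream a file from disk in chunks and compute its adler32,
# so corruption introduced below the TCP layer (disk, filesystem, NFS)
# can be detected by comparing against the checksum taken at the source.
import zlib

def disk_checksum(path, chunk_size=1 << 20):
    """Return the adler32 checksum of the file's on-disk contents."""
    checksum = 1  # adler32 initial value
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            checksum = zlib.adler32(chunk, checksum)
    return checksum & 0xFFFFFFFF
```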
Would also like:
- Some SIMPLE means of distributed, authenticated, atomic, reliable message-passing between agents over the Grid
  • With a command-line level API for scripting
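To illustrate the atomicity and command-line-API parts of that wish (only those parts: Grid-wide distribution and authentication are not addressed here), a shared-directory mailbox is sketched below. It relies on POSIX `os.rename` being atomic within one filesystem, so readers never see half-written messages and two receivers cannot claim the same one. All names and the on-disk layout are hypothetical.

```python
# Sketch: atomic message-passing through a shared directory.
# send() publishes via an atomic rename; receive() claims via a rename
# that only one competing receiver can win.
import os
import sys
import time

def send(mailbox, body):
    """Write to a hidden temp file, then atomically publish it."""
    tmp = os.path.join(mailbox, ".tmp-%d-%d" % (os.getpid(), time.time_ns()))
    with open(tmp, "w") as f:
        f.write(body)
    final = os.path.join(mailbox, "msg-%020d-%d" % (time.time_ns(), os.getpid()))
    os.rename(tmp, final)  # readers never see a partial message
    return final

def receive(mailbox):
    """Claim the oldest message atomically; None if the mailbox is empty."""
    for name in sorted(os.listdir(mailbox)):
        if not name.startswith("msg-"):
            continue
        path = os.path.join(mailbox, name)
        claimed = os.path.join(mailbox, ".claimed-" + name)
        try:
            os.rename(path, claimed)  # only one receiver wins this rename
        except OSError:
            continue  # someone else claimed it first; try the next message
        with open(claimed) as f:
            body = f.read()
        os.remove(claimed)
        return body
    return None

if __name__ == "__main__":
    # command-line level API: `msg.py send <dir> <text>` / `msg.py recv <dir>`
    if sys.argv[1] == "send":
        send(sys.argv[2], sys.argv[3])
    elif sys.argv[1] == "recv":
        body = receive(sys.argv[2])
        if body is not None:
            print(body)
```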
Other issues…

Small files!
- They seem to be inevitable, but play havoc with efficiency:
  • Huge lists of files in catalogues
  • Not dealt with efficiently by MSS, transfer tools, etc.
- Basic unit of information management: the data produced by one MC, reco, or filter job during its run (with a unique GUID)
- Do not want to make jobs too long… (too much state in the system)
- Can aggregation help? Perhaps, but we need the tools
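One shape such an aggregation tool could take is sketched below: pack many small per-job files into a single tar archive keyed by GUID, so the catalogue, MSS and transfer tools see one large file, with a small side-car index for lookup. This is a hypothetical illustration, not an existing CMS tool, and the index format is an assumption.

```python
# Sketch: aggregate small per-job output files into one tar archive,
# keyed by GUID, plus a JSON side-car index (GUID -> size in bytes).
import json
import os
import tarfile

def aggregate(archive_path, files_by_guid):
    """files_by_guid maps GUID -> path of one job's output file."""
    index = {}
    with tarfile.open(archive_path, "w") as tar:
        for guid, path in files_by_guid.items():
            tar.add(path, arcname=guid)       # store each file under its GUID
            index[guid] = os.path.getsize(path)
    with open(archive_path + ".index.json", "w") as f:
        json.dump(index, f)                   # one catalogue entry per archive
    return index

def extract_one(archive_path, guid, dest_dir):
    """Recover a single small file from the archive by its GUID."""
    with tarfile.open(archive_path) as tar:
        tar.extract(guid, path=dest_dir)
```

The trade-off is the one the slide hints at: the archive is only as granular as its index, so per-GUID access now costs an extract step instead of a catalogue lookup.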
Metadata
- Currently a ‘hot topic’?
- How to handle efficient distribution of system- and user-level metadata?
- Which metadata are immutable after creation? Which need to be distributed widely? How to handle schema extension on a per-user basis?