Distributed BELLE Analysis Framework

HIGUCHI Takeo
Department of Physics, Faculty of Science, University of Tokyo
Representing the dBASF Development Team
BELLE/CHEP2000

Introduction to the B Factory at KEK

KEK-B Accelerator
- KEK-B is an asymmetric-energy e+ e- collider:
  3.5 GeV/c for positrons, 8.0 GeV/c for electrons
- The design luminosity is 1.0 x 10^34 cm^-2 s^-1
- KEK-B is now operated at ~5.0 x 10^32 cm^-2 s^-1

BELLE Experiment
- The goal of the BELLE experiment is to study CP violation in B meson decays
- The experiment is in progress at KEK

BELLE Detector
- SVD: precise vertex detection
- CDC: track momentum reconstruction and particle ID with dE/dx
- ACC: aerogel Cherenkov counter for particle ID
- TOF: particle ID and trigger
- ECL: electromagnetic calorimeter for e- and gamma reconstruction
- KLM: muon and K_L detection
- EFC: electromagnetic calorimeter for luminosity measurement

Current Event Reconstruction

Computing Environments
- Event reconstruction is performed on 8 SMP machines:
  7 UltraEnterprise servers equipped with 28 CPUs
- The total CPU power is 1,200 SPECint95
- The CPUs are shared with user analysis jobs
- MC production is done on a PC farm (16 units of 4 x P3 500 MHz)

Reconstruction Speed
- 15 Hz/server, or 70 Hz/server with L4 (at 5.0 x 10^32 cm^-2 s^-1)

Necessity for System Upgrade
- In the future we will have more luminosity:
  200 Hz after L4 (at 1.0 x 10^34 cm^-2 s^-1)
- The data size may also increase, possibly due to background
- This causes a shortage of computing power: taking DST reproduction and
  user analysis activities into account, we need 10 times the current
  computing power

Next Computing System: a Low-Cost Solution
- We will build a new computing farm with sufficient computing power
- The computing servers will consist of ~50 units of 4-CPU PC servers
  running Linux and ~50 units of 4-CPU SPARC servers running Solaris
- The total CPU power will be 12,000 SPECint95

Configuration of Next System
- A tape library (tape I/O: 24 MB/s) is connected through a switch to the
  Sun I/O servers and a file server
- The I/O servers feed the PC servers through a Gigabit switch and
  100Base-T hubs
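The factor-of-ten requirement above is consistent with the planned farm size; a quick numeric check, using only the figures quoted in these slides (variable names are ours):

```python
# Figures quoted in the slides.
current_power = 1_200     # SPECint95, current UltraEnterprise farm
planned_power = 12_000    # SPECint95, planned PC/SPARC farm
required_factor = 10      # stated need, incl. DST reproduction and user analysis

# The planned farm exactly meets the stated tenfold requirement.
required_power = current_power * required_factor
print(required_power)                    # 12000
print(planned_power >= required_power)   # True

# Meanwhile the luminosity will grow from ~5.0e32 to the design 1.0e34:
lumi_growth = 1.0e34 / 5.0e32            # about 20x
```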
Current Analysis Framework

BELLE AnalysiS Framework (B.A.S.F.)
- B.A.S.F. supports event-by-event parallel processing on SMP machines,
  hiding the parallel-processing nature from users
- B.A.S.F. is used widely in BELLE, from DST production to user analysis
- We are developing an extension to B.A.S.F. that utilizes many PC servers
  connected via a network, to be used in the next computing system

New Analysis Framework

The new framework should provide:
- Event-by-event parallel processing capability over the network
- Resource usage optimization: maximize the total CPU usage and draw the
  maximum I/O rate from the tape servers
- The capability to handle purposes other than DST production:
  user analysis, Monte Carlo simulation, or anything else
- Application to parallel processing at university sites

dBASF (Distributed B.A.S.F.)
- A super-framework for B.A.S.F.

Link of dBASF Servers
- A Job Client initiates and terminates B.A.S.F. processes on the PC and
  SPARC servers, with I/O daemons feeding them
- The B.A.S.F. daemons report their resource usage to the Resource manager,
  which dynamically changes the node allocation

Communication among Servers

Functionality
- Call a function on a remote node by sending a message
- Shared memory expanded over the network

Implementation
- NSM (Network Shared Memory): a home-grown product, originally used for
  the BELLE DAQ, based on TCP and UDP

Components of dBASF

dBASF Client
- The user interface; accepts from the user a B.A.S.F. execution script and
  the number of CPUs to be allocated for the analysis
- Asks the Resource manager to allocate B.A.S.F. daemons; the Resource
  manager returns the allocated nodes
- Initiates B.A.S.F. execution on the allocated nodes
- Waits for completion; the B.A.S.F. daemons send a notification when the
  job ends

Resource Manager
- Collects resource usage (CPU load and network traffic rate) from the
  B.A.S.F. daemons through NSM shared memory
- Monitors idling B.A.S.F. daemons of each dBASF session
- Increases/decreases the number of allocated B.A.S.F.
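The "call a function on a remote node by sending a message" functionality that NSM provides can be illustrated with a minimal sketch. This is not the real NSM (a home-grown BELLE product based on TCP and UDP); the JSON message format, the `REGISTRY`, and the helper names are invented for this example:

```python
# Minimal remote-function-call sketch: a node runs functions on behalf of a
# peer that sends it a one-line JSON message naming the function and arguments.
import json
import socket
import threading

# Functions the "remote node" is willing to run (hypothetical registry).
REGISTRY = {"add": lambda a, b: a + b,
            "scale": lambda xs, k: [x * k for x in xs]}

def serve_forever(sock):
    """Accept connections; for each, run one requested function and reply."""
    while True:
        conn, _ = sock.accept()
        with conn:
            request = json.loads(conn.makefile("r").readline())
            result = REGISTRY[request["func"]](*request["args"])
            conn.sendall((json.dumps({"result": result}) + "\n").encode())

def remote_call(port, func, *args):
    """Invoke `func` on the remote node by sending a message; wait for the answer."""
    with socket.create_connection(("127.0.0.1", port)) as conn:
        conn.sendall((json.dumps({"func": func, "args": list(args)}) + "\n").encode())
        return json.loads(conn.makefile("r").readline())["result"]

server = socket.socket()
server.bind(("127.0.0.1", 0))   # let the OS pick a free port
server.listen(1)
port = server.getsockname()[1]
threading.Thread(target=serve_forever, args=(server,), daemon=True).start()

print(remote_call(port, "add", 2, 3))   # 5
```

The same pattern, over many nodes, is what lets the dBASF client initiate B.A.S.F. processes remotely without the user seeing the network.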
daemons dynamically when a better assignment is discovered

B.A.S.F. Daemon
- Runs on each computing server
- Accepts an 'initiation request' from the dBASF client and forks B.A.S.F.
  processes
- Reports resource usage to the Resource manager through NSM shared memory

I/O Daemon
- Reads tapes or disk files and distributes the events over the network to
  the B.A.S.F. processes running on each node
- Collects the processed data from B.A.S.F. through the network and writes
  them to tapes or disk files
- For Monte Carlo event generation, the event generator output is
  distributed to the B.A.S.F. processes running the detector simulation

Miscellaneous Servers
- Histogram server: merges the histogram data accumulated on each node
- Output server: collects the standard output of each node and saves it to
  a file

Resource Management

Best Performance
- Achieved when the total I/O rate becomes maximum with the minimum number
  of CPUs

Dynamic Load Balancing
- CPU bound: increase the number of computing servers so that the I/O speed
  becomes maximum
- I/O bound: decrease the number of computing servers so as not to change
  the I/O speed

Load Balancing
- When n_now CPUs are assigned to a job, the best number of CPUs to assign,
  n_new, is given by:

    n_new = n_now x (IO_limit rate / IO_current rate)
                  x (CPU analysis-spent time / CPU real-spent time)

Resource Management Flow
- The Job Client initiates the B.A.S.F. processes; each B.A.S.F. daemon
  reports its resource usage to the Resource manager
- While the allocation is not yet the best one, the Resource manager
  increases or decreases the number of nodes; the Job Client terminates
  B.A.S.F. when the job ends

Data Flow
- Raw data are distributed from the I/O daemon over TCP/IP to the B.A.S.F.
  processes on the PC and SPARC servers
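Reading the load-balancing rule with the quantities the slide names (n_now, the current and limit I/O rates, and the CPU time actually spent on analysis versus the real elapsed time), it can be sketched numerically as follows; the function name and the exact form of the formula are our reconstruction:

```python
def best_allocation(n_now, io_limit, io_current, t_analysis, t_real):
    """Reconstructed dBASF load-balancing rule: scale the CPU count so that
    the I/O rate reaches its limit, discounting CPUs that sit idle waiting
    for I/O (busy fraction = t_analysis / t_real)."""
    return n_now * (io_limit / io_current) * (t_analysis / t_real)

# CPU bound: CPUs fully busy (t_analysis == t_real), I/O below its limit
# -> allocate more nodes so the I/O speed becomes maximum.
print(best_allocation(n_now=10, io_limit=24.0, io_current=12.0,
                      t_analysis=1.0, t_real=1.0))   # 20.0

# I/O bound: tape already at its limit, CPUs busy only half the time
# -> half as many nodes sustain the same I/O speed.
print(best_allocation(n_now=10, io_limit=24.0, io_current=24.0,
                      t_analysis=0.5, t_real=1.0))   # 5.0
```

Both limits reproduce the stated policy: increase nodes when CPU bound, decrease them when I/O bound without changing the I/O speed.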
- The processed data are sent back over TCP/IP to the I/O daemons, and the
  STDOUT and histogram data are collected by their respective servers

Status
- A system test is in progress on the BELLE PC farm, which consists of 16
  units of 4 x P3 550 MHz servers
- The node-to-node communication framework has been developed and is being
  tested
- The resource management algorithm is under study
- Basic speed tests of network data transfer have been finished:
  Fast Ethernet (point-to-point, 1-to-n) and
  Gigabit Ethernet (point-to-point, 1-to-n)
- The new computing system will be available in March 2001

Summary
- We will build a computing farm of 12,000 SPECint95 with PC Linux and
  SPARC Solaris servers to solve the computing power shortage we are facing
- We have begun to develop a management scheme for the computing system by
  extending the current analysis framework
- We have developed the communication framework and are studying the
  resource management algorithm
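The Ethernet speed tests matter because the planned configuration quotes a 24 MB/s tape I/O rate. The slides do not report the measured numbers, so the sketch below uses only nominal link capacities:

```python
# Nominal wire speeds vs. the 24 MB/s tape I/O quoted for the next system.
# (Measured test results are not given in the slides; these are link capacities.)
fast_ether_MBps = 100 / 8     # 100 Mbit/s  -> 12.5 MB/s
gigabit_MBps = 1000 / 8       # 1000 Mbit/s -> 125.0 MB/s
tape_io_MBps = 24.0

# A single Fast Ethernet link cannot carry the full tape stream; Gigabit can,
# which is why the I/O servers sit behind a Gigabit switch.
print(fast_ether_MBps >= tape_io_MBps)   # False
print(gigabit_MBps >= tape_io_MBps)      # True
```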