Department of Particle & Particle Astrophysics

Sea-Of-Flash-Interface (SOFI): introduction and status
The PetaCache Review
Michael Huffer, [email protected]
Stanford Linear Accelerator Center
November 02, 2006

Outline
• Background
 – History of PPA involvement
 – Synergy with current activities
• Requirements
 – Usage model
 – System requirements
 – Individual client requirements
• Implementation
 – Abstract model and features
 – Building blocks
• Deliverables
 – Packaging
• Schedule
 – Status
 – Milestones
• Summary
 – Reuse
 – Conclusions

Background
• The Research Engineering Group (REG) supports a wide range of activities with limited resources
 – LSST, SNAP, ILC, SiD, EXO, LHC, LCLS, etc.
• Utilizing these resources effectively requires understanding:
 – our core competencies
 – the requirements of future electronics systems
• Two imperatives for REG:
 – support upcoming experiments
 – build for the future by advancing core competencies
• What are:
 – more detailed examples of a couple of upcoming experiments?
 – the necessary core competencies?

LSST
"The Large Synoptic Survey Telescope (LSST) is a proposed ground-based 8.4-meter, 10 square-degree-field telescope that will provide digital imaging of faint astronomical objects across the entire sky, night after night. In a relentless campaign of 15-second exposures, LSST will cover the available sky every three nights, opening a movie-like window on objects that change or move on rapid timescales: exploding supernovae, potentially hazardous near-Earth asteroids, and distant Kuiper Belt Objects.
The superb images from the LSST will also be used to trace billions of remote galaxies and measure the distortions in their shapes produced by lumps of Dark Matter, providing multiple tests of the mysterious Dark Energy."
• SLAC/KIPAC is the lead institution for the camera
 – The camera contains > 3 gigapixels
  • > 6 gigabytes of data per image
  • readout time is 1-2 seconds
 – KIPAC delivers the camera DAQ system

SNAP
• SLAC is the lead institution for all non-FPA related electronics
 – one contact every 24 hours
 – requires data to be stored on board the instrument
 – storage capacity is roughly 1 Terabyte (includes redundancy)
 – examining NAND flash as the solution to the storage problem
"The Supernova/Acceleration Probe (SNAP) satellite observatory is capable of measuring thousands of distant supernovae and mapping hundreds to thousands of square degrees of the sky for gravitational lensing each year. The results will include a detailed expansion history of the universe over the last 10 billion years, determination of its spatial curvature to provide a fundamental test of inflation (the theoretical mechanism that drove the initial formation of structure in the universe), precise measures of the amounts of the key constituents of the universe, ΩM and ΩΛ, and the behavior of the dark energy and its evolution over time."

Core competencies
• System on Chip (SOC)
 – integrated processors and functional blocks on an FPGA
• Small-footprint, high-performance, persistent memory systems
 – NAND flash
• Open Source real-time kernels
 – RTEMS (Real-Time Executive for Multiprocessor Systems)
• High-performance serial data transport and switching
 – MGTs (Multi-Gigabit Transceivers)
• Modern networking protocols
 – 10 Gigabit Ethernet
 – InfiniBand
 – PCI Express

Is PetaCache consistent with the mission? Which projects use the core technology?

Project     SOC   Memory   R/T kernels   H/S transport
LSST        yes   no       yes           yes
SNAP        no    yes      yes           no
PetaCache   yes   yes      yes           yes

Main Entry: syn·er·gy
Pronunciation: 'si-n&r-jE
Function: noun
Inflected Form(s): plural -gies
Etymology: New Latin synergia, from Greek synergos, working together
1 : SYNERGISM; broadly : combined action or operation
2 : a mutually advantageous conjunction or compatibility of distinct business participants or elements (as resources or efforts)

Usage model
"Lots of storage, shared concurrently by many clients, distributed over a large number of hosts"
• System requirements:
 – scalable, both in:
  • storage capacity
  • number of concurrent clients
 – large address space
 – random access
 – support for population evolution
• Features:
 – changes are quasi-adiabatic
  • "write once, read many"
  • can be treated as a read-only system
• Requirements not addressed in this phase:
 – access control
 – redundancy
 – cost
[Diagram: clients on hosts connected to the data storage, distribution, transport & management layer.]

Client requirements
• Uniform access time to fetch a "fixed" amount of data from storage
 – implies a deterministic and relatively "small" round-trip latency
  • where "fixed" is O(8 Kbytes) and "small" is O(200 microseconds)
 – needs approximately 40 Mbytes/sec between client and storage
• Access time scales independent of:
 – address
 – number of concurrent clients
• The PetaCache project focus is on this issue alone
• Two contributions to latency:
 – storage access time
 – distribution, transport, and management overhead
• The SOFI architecture attempts to address both

Abstract model
• Key features:
 – available concurrency and bandwidth scale with storage capacity
 – many individual "memory servers", each a Flash Memory Controller (FMC)
  • access granularity is 8 Kbytes
  • 16 Gbytes of memory per server
  • 40 Mbytes/sec per server
 – load leveling
  • data randomly distributed over memory
servers
  • multicast for concurrent addressing
  • both client- and server-side caching
 – two address spaces
  • physical page access
  • logical block access
  • hides the data distribution from the client
 – network-attached storage
[Diagram: clients reach the memory servers through content-addressable switching.]

Building blocks
• Four Slice Module (FSM): 256 Gbytes of flash
• Slice Access Module (SAM): connected to the FSM by PGP (Pretty Good Protocol) at 1 Gbyte/sec
• Cluster Inter-Connect Module (CIM): 10-G Ethernet to the SAMs; 8 x 10-G Ethernet (8 Gbytes/sec) to the host interconnect
• Hosts (1 of n): run the application-specific client interface (SOFI), attached at 1 Gigabit Ethernet (0.1 Gbyte/sec)
• Together these form network-attached storage

Four Slice Module (FSM)
[Block diagram: an FPGA containing the PGP & command encode/decode initiator with CRC-in/CRC-out to the PHY, in-bound and out-bound transfer arbiters with encode/decode, four Flash Memory Controllers (FMC1-FMC4), clock, and configuration; each of the 1 x 4 slices connects to one DIMM (8 devices, 32 Gbytes).]

Flash Memory Controller (FMC)
• Implemented as an IP core
• Controls 16 Gbytes of memory (4 devices) in units of:
 – pages (8 Kbytes)
 – blocks (512 Kbytes)
• Queues operations:
 – Read Page (in units of 128-byte chunks)
 – Write Page
 – Erase Block
 – Read statistics counters
 – Read device attributes
• Transfers data at 40 Mbytes/sec

Universal Protocol Adapter (UPA)
• The SAM is one half of a UPA pair
[Block diagram: a Xilinx XC4VFX60 FPGA (SOC) with a PPC-405 (450 MHz), 200 DSPs, and lots of gates; fabric and MGT clocks; 512 Mbytes of memory (Micron RLDRAM II); 128 Mbytes of configuration memory (Samsung K9F5608); 8 lanes of Multi-Gigabit Transceivers (MGTs); MFD, reset options, JTAG, and 100-baseT.]

UPA features
• "Fat" memory subsystem
 – sustains 8 Gbytes/sec
 – "plug-in" DMA interface (PIC)
  • designed as a set of IP cores
  • designed to work in conjunction with the MGT and protocol cores
• Bootstrap loader (with up to 16 boot options and images)
• Interface to configuration memory
• Open Source R/T kernel (RTEMS)
• 100 base-T Ethernet interface
• Full network stack
"Think of the UPA as a Single Board Computer (SBC) which interfaces to one or more busses through its MGTs"

UPA customization for the SAM
• Implements two cores:
 – PGP
 – 10-GE
• All 8 MGT lanes are used:
 – 4 lanes for the PGP core
 – 4 lanes for the 10-GE core
• A network driver interfaces the 10-GE core to the network stack
• Executes application code to provide:
 – the server side of the SOFI client interface
  • physical-to-logical translation
  • server-side caching
 – the FSM management software
  • proxies the FMC command set
  • maintains bad blocks
  • maintains available blocks

Cluster Inter-Connect Module (CIM)
[Block diagram: a high-speed data switch (24 x 10-GE, Fulcrum FM2224) connecting the SAMs over 10-GE (XAUI) to the host data network, and a low-speed management switch (24 x FE + 4 x GE, Zarlink ZL33020) connecting the SAMs over 100 baseT to the host management network over 1000 baseT; switch management is handled by a UPA.]

Client/Server interface
• The client interface resides on the host; the servers reside on the SAMs
• Any one client on any one host has uniform access to all flash storage
• Clients access flash through the network interconnect
• Abstract interconnect model
 – the delivered implementation is IP (UDP and multicast services)
• The interface delivers three types of service:
 – random Read access to objects within the store
 – population of objects within the store (Write and Erase access)
 – access to performance metrics
• The client interface is object-oriented (C++)
 – a class library, distributed as a set of binaries and header files
• Two address spaces (physical & logical)
 – clients access information only in logical
space
 – clients are not sensitive to the actual physical location of information
 – the population distribution is pseudo-random (static load leveling)

Addressing
• Physical addressing (1 page = 8 Kbytes)
 – Interconnect (2^0) x Manager (2^32) x Slice (2^2) x Controller (2^2) x Page (2^21) = 2^57 pages (1M peta-bytes)
• Logical addressing (1 block = 8 Kbytes)
 – Interconnect (2^0) x Partition (2^64) x Bundle (2^64) x Block (2^64)

Using the interface
• A partition is a management tool
 – segments the storage logically into disjoint sets
 – one-to-one correspondence between a partition and a server
 – one SAM may host more than one server
• A bundle is an organizational tool
 – a bundle belongs to one (and only one) partition
 – a bundle is an access-pattern hint, allowing:
  • fetch look-ahead
  • optimization of overlapping fetches from different clients
• Both partitions and bundles are assigned unique identifiers (over all time)
• Identifiers may have character names (aliases)
 – assigned at population time
• A client query is composed of partition/cluster/offset/length
 – offset is expressed in units of blocks
 – length is expressed in units of bytes
• A client may query by either identifier or alias

Deliverables
• Two FSMs (8 slices)
 – 1/2 Tbyte
• Two SAMs
 – enough to support FSM operations
• The client/server interface (SOFI)
 – targeted to Linux
• How will the hardware be packaged?
 – where packaging is defined as:
  • how the building blocks are partitioned
  • the specification of the electro-mechanical interfaces

The "Chassis"
• 2 FSMs per card: 1/2 Tbyte
• 16 cards per bank: 8 Tbytes
• 2 banks per chassis: 64 SAMs, 1 CIM, 16 Tbytes
• 3 chassis per rack: 48 Tbytes
[Mechanical drawing: an 8U chassis with a 1U air outlet, 1U fan tray, passive backplane, supervisor card (8U), line cards (4U), X2 (XENPAK MSA) optics, and DC power input, over a 1U air inlet.]

48 Tbyte facility
[Diagram: one chassis connected through a Catalyst 6500 (3 x 4 10-GE, 2 x 48 1-GE) to SOFI hosts (1 x 96) running xRootD servers.]

Schedule/Status
• Methodology:
 – Hardware
  • implement 3 evaluation "platforms", one for each type of module
  • decouple packaging from architectural & implementation issues
   – evaluate layout issues concerning high-speed signals
   – evaluate potential packaging solutions
   – allow concurrent development of VHDL & CPU code
 – Software
  • emulate the FSM component of the server software
   – complete/debug in the absence of hardware
   – allows clients an "early look" at the interface
[Diagram: the host client API (IP protocol implementation, logical/physical translation, cache management) talking over "the wire" to the SAM (IP protocol implementation, logical/physical translation, cache management, FSM interface).]

Evaluation platforms
• UPA
 – memory subsystem
 – bootstrap loader
 – configuration memory
 – RTEMS
 – network stack/network driver interface issues
• CIM
 – low- and high-speed management
 – evaluate different physical interfaces (including X2)
• FSM line card (depending on packaging, this could be the production prototype)
 – FMC debug
 – PGP debug

Schedule
[Gantt chart, October through March, covering products and activities: SOFI, schematic, chassis/mechanical layout, backplane debug, Line Card PCB spin/load, Supervisor PCB, UPA/10GE driver, UPA/PGP, UPA/10GE MAC, RTEMS/UPA, PIC, and the UPA, CIM, and line-card evaluation platforms, through specification, design, and implementation.]

Milestones

Milestone                                   Date
RTEMS running on UPA evaluation platform    2nd week of December 2006
SOFI (emulation) ready                      3rd week of January 2007
Supervisor PCB ready for debug              3rd week of January 2007
Chassis & PCBs complete                     3rd week of February 2007
Start test & integration                    2nd week of March 2007

Status
[Status table tracking specification, design, and implementation for each product: SOFI, DIMM, FCS, FSM, SAM, CIM, UPA, the PGP core, the 10-GE core, and the "chassis"; SOFI, FSM, SAM, CIM, and UPA are marked in-progress.]

Products & reuse

Product        PetaCache   LSST Camera DAQ   SNAP   LCLS DAQ   ATLAS Trigger Upgrade
UPA            yes         yes               no     yes        yes
10-GE core     yes         yes               no     yes        yes
PGP core       yes         yes               no     yes        yes
FCS            yes         no                yes    no         no
CIM            yes         yes               no     yes        yes
FSM            yes         no                no     no         no
SAM            yes         no                no     no         no
DIMM           yes         no                no     no         no
SOFI           yes         no                no     no         no
The "chassis"  yes         maybe             no     maybe      maybe

Conclusions
• A robust and well-developed architecture
 – concurrency and bandwidth scale as storage is added
 – the logical address space hides the actual data distribution from the client
 – network-attached storage
 – scalable (in both size and number of users)
• The packaging solution may need an iteration
• Schedule
 – somewhat unstable, however:
  • the sequence and activities are to a large degree correct
  • the risk is in the development of 10-GE
 – well along the implementation road
• Well-developed synergy between PetaCache and the current activities of ESE
 – a great mechanism for developing core competencies
 – many of the project deliverables are directly usable in other experiments