An Architecture for Reconfigurable Computing in Space
Robert F. Hodson (1), Kevin Somervill (1), John Williams (2), Neil Bergman (2), Robert Jones III (3)
(1) NASA Langley Research Center
(2) The University of Queensland
(3) ASRC Aerospace Corp.
Introduction
The availability of reconfigurable radiation tolerant Field Programmable Gate Arrays
(FPGAs) for space applications creates an opportunity to explore and potentially exploit
new processing paradigms for increased computing performance. Studies have shown
that custom FPGA-based implementations or soft core processors coupled with custom
co-processors can greatly improve the performance of some applications. Imaging
applications have been shown to achieve speedups of eight to 800 over an 800 MHz
Pentium III processor [1]. Other embedded benchmarking programs have shown an
average performance speedup of 5.8 when soft core processors are combined with
custom co-processors. Reduced power consumption (on average 57%) has also been
demonstrated [2].
Space-qualified device availability and promising initial performance studies suggest that
development of a space-qualified reconfigurable computing platform is a viable
computing solution for certain space applications. The Reconfigurable Scalable
Computing (RSC) project, funded through NASA’s Exploration Systems Mission
Directorate (ESMD), is an on-going effort to develop a reconfigurable computing platform
to support the processing requirements of future NASA missions. This paper presents the
high-level system architecture along with an overview of the challenges of space-based
reconfigurable computing.
Architecture Goals and Objectives
Strategic technical challenges were developed by ESMD to guide the development of
sustainable and affordable solutions for future exploration missions to the Moon and
Mars. The RSC architecture is directly traceable to the strategic technical challenges of
reconfigurability and modularity. Reconfigurability led to the selection of SRAM-based
FPGAs as a fundamental computing resource to enable adaptation to new or
unanticipated circumstances. Modularity led to design decisions related to physical
form factor and levels of granularity in the RSC architecture.
Applications are often implemented on traditional general-purpose processors. The
desire to leverage the traditional (sequential) general-purpose computing approach led
to a decision to support soft core processor(s) in FPGAs. Additionally, custom
co-processor support for specialized processing was desirable due to the performance
improvements this approach has been shown to yield. One of the architecture’s objectives
is a computing platform that can be configured (or reconfigured) to support the
appropriate blend of general- and special-purpose processing needed to meet application
performance requirements.
In developing a computing platform for a varied application space it is desirable to
provide system flexibility. A modular approach with “reasonable” grain size per module
1
Hodson/130
MAPLD 2005
allows for a scalable solution that can be tailored to meet the processing requirements
for a given application. The ability to scale a processing solution requires an
inter-module communications capability to facilitate data and control flow between processing
elements. Combining the elements discussed, an architecture as depicted in Figure 1
begins to emerge.
Figure 1. High-level architecture of the RSC platform.
General-purpose Computing Model
One approach to reconfigurable computing is to use traditional general-purpose
processors (ASICs or hard-cores) in conjunction with specialized computing
co-processors implemented in FPGAs. Although the RSC’s architecture could support this
approach through its standard bus interface, this is not the approach used by the RSC.
The RSC implements soft processor(s) for general-purpose computing. The soft
processor core supported is the 32-bit MicroBlaze RISC processor. A triple modular
redundant (TMR) version of this processor is currently under study by the Xilinx Single
Event Effect (SEE) consortium and is being tested for radiation effects [3]. Because this
processor is optimized for Xilinx FPGAs, it is possible to fit two TMR-MicroBlaze
processors in a single Xilinx Virtex-4 FX60 device and still have resources available for
additional special-purpose processing.
The MicroBlaze architecture can be customized to meet an application’s processing
requirements. For example, a cache, barrel shifter, multiplier, divider, and floating point
unit are all customizable options. The RSC architecture uses the MicroBlaze with custom
instruction and data caches that will support its Harvard architecture and provide the
necessary support for single event upset (SEU) mitigation. Another advantage of the soft
core approach is the ability to change SEU mitigation techniques within the FPGA to meet
mission requirements. For example, an upset in an image may not require mitigation, but
an upset in a control processor could be catastrophic. Different mitigation strategies
can be applied in each case by programming the FPGA appropriately.
Special-purpose Computing Model
Much of the performance gain of FPGA-based computing comes from the ability to
design special-purpose cores that are optimized to perform computationally expensive
tasks. The RSC architecture inherently supports this approach with FPGA resources
(logic, memory, DSP slices, etc.) that can be programmed by developers. These cores
must communicate with memory, other cores, I/O devices, and soft processors. The
RSC architecture supports several forms of communications for special-purpose
computing. High-speed (2.5 Gbps) serial I/O is provided for external communications
to/from the RSC’s Reconfigurable Processing Module (RPM). Communications with the
MicroBlaze processor or I/O devices can be accommodated via the On-Chip Peripheral
Bus (OPB). Additionally, the MicroBlaze processor supports Fast Simplex Links (FSLs),
which provide an instruction-pipeline interface via point-to-point communication links
with FIFO queuing. Also, cores can directly access internal block RAM within the FPGA
or access external memory via direct memory access (DMA). Figure 2 shows how
custom logic may be combined with a MicroBlaze (uB) soft core in the RSC’s RPM.
Figure 2. Block Diagram of the RSC’s Reconfigurable Processing Module (RPM).
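The point-to-point FIFO behavior described for FSL can be illustrated with a small software model. This is only a sketch under stated assumptions: the class name, the FIFO depth of 16, the non-blocking `try_put`/`try_get` methods, and the sum-of-squares "custom core" are all hypothetical and are not taken from the RSC design or from any Xilinx API.

```python
from collections import deque


class SimplexLink:
    """Unidirectional point-to-point FIFO, loosely modeled on an FSL channel."""

    def __init__(self, depth: int = 16):  # depth is an assumed value
        self.depth = depth
        self._fifo = deque()

    def try_put(self, word: int) -> bool:
        """Non-blocking write; returns False when the FIFO is full."""
        if len(self._fifo) >= self.depth:
            return False
        self._fifo.append(word & 0xFFFFFFFF)  # keep 32-bit data words
        return True

    def try_get(self):
        """Non-blocking read; returns None when the FIFO is empty."""
        return self._fifo.popleft() if self._fifo else None


def sum_of_squares_core(to_core: SimplexLink, from_core: SimplexLink, count: int) -> None:
    """Hypothetical custom core: consume `count` words, return the sum of squares."""
    total = 0
    for _ in range(count):
        word = to_core.try_get()
        if word is None:
            raise RuntimeError("core stalled: input FIFO empty")
        total += word * word
    from_core.try_put(total)


# The processor side pushes operands on one link and reads the result on another.
to_core, from_core = SimplexLink(), SimplexLink()
for operand in (3, 4, 12):
    to_core.try_put(operand)
sum_of_squares_core(to_core, from_core, count=3)
result = from_core.try_get()
```

Two separate links are used because each link carries data in one direction only, which matches the "simplex" description in the text.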
Network Model
The MicroBlaze processor does not have the shared interrupt and cache coherence
hardware support needed for Symmetric Multiprocessing (SMP). These limitations make
a loosely-coupled Massively Parallel Processing (MPP) approach attractive. In this
computational model each processor supports its own unique memory space and
communication between processors is performed through Network Interface Controllers
(NICs) that are used to send and receive messages between processing nodes.
The RSC has adopted the MPP approach, but has a multi-level network. There are
potentially three classes of communications in an RSC system. These relate to the
physical implementation of the RSC system. The RSC is designed with modules that can
be stacked together to form a PCI bus (compatible with the PCI-104 standard). Stacks
can also be interconnected via a Network Module (NM) to scale to larger systems.
Therefore, communication is possible (1) between modules, via the PCI interface,
(2) between stacks, via the NM, and (3) between processors within the same FPGA. The
RSC networking architecture provides for all three classes of communication seamlessly
and abstracts communication details from the user through traditional software
abstraction provided by the operating system. Figure 3 shows the network model
protocol layers for inter-module communications.
[Figure 3 diagram: the RSC protocol stack between a sending CPU and a receiving CPU or NM.
Application/MPI layer: sockets, exchanging messages.
Transport layer: UDP, exchanging segments.
Network layer: Internet Protocol, exchanging packets (IP address and size of datagram).
Data link and physical layers: NIC-to-NIC transfers over the RSC buses using a DMA engine
(PCI address and size of datagram), with request, pull, ACK, and interrupt (INTR) signaling.]
Figure 3. Network Protocol Layers.
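The three classes of communication described above can be expressed as a simple route selection. The sketch below is illustrative only: the `(stack, module, processor)` addressing scheme and all names are hypothetical, and it assumes one FPGA per module so that two processors in the same module share an FPGA.

```python
from dataclasses import dataclass
from enum import Enum


class Route(Enum):
    INTRA_FPGA = "between processors within the same FPGA"
    PCI = "between modules, via the PCI interface"
    NETWORK_MODULE = "between stacks, via the NM"


@dataclass(frozen=True)
class NodeAddress:
    # Hypothetical addressing scheme, not taken from the RSC design.
    stack: int
    module: int
    processor: int


def select_route(src: NodeAddress, dst: NodeAddress) -> Route:
    """Pick the communication class for a message from src to dst."""
    if src.stack != dst.stack:
        return Route.NETWORK_MODULE   # different stacks: go through the NM
    if src.module != dst.module:
        return Route.PCI              # same stack, different module: PCI bus
    return Route.INTRA_FPGA           # same module (assumed same FPGA)


route = select_route(NodeAddress(0, 1, 0), NodeAddress(0, 2, 1))
```

In the architecture described here this choice is hidden from the application by the operating system's network abstraction; the function only makes the decision explicit.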
The details of a message transfer between two processors on different modules are
shown in Figure 4. The message is created by the application layer, converted to
packets by the operating system and sent via NIC-to-NIC communications across the
physical media (in this example, the PCI bus).
[Figure 4 diagram: a source RPM and a destination RPM connected by the PCI bus. Each RPM
contains a CPU, an SDRAM message buffer, and a NIC with a controller, a request queue,
an IP2PCI mapping, and a PCI interface. The numbered steps in the diagram are:]
1. Packet is built in memory.
2. Message send request.
3. IP address is translated to PCI address of destination.
4. PCI address of message on source RPM sent to destination NIC.
5. Destination NIC pulls (DMAs) message into destination RPM’s memory.
6. Message received interrupt sent to CPU.
7. Message processed.
8. Destination NIC tells source NIC “message received.”
9. Interrupt to source CPU. Buffer can now be released.
Figure 4. Inter-module message communication flow.
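The pull-based transfer of Figure 4 can be walked through with a small simulation. This is a sketch, not the RSC NIC implementation: the `RPM` class, the address values, the dictionary used as the IP-to-PCI mapping, and the log strings are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class RPM:
    """Hypothetical model of one Reconfigurable Processing Module."""
    name: str
    pci_base: int                                   # assumed PCI base address
    sdram: dict = field(default_factory=dict)       # buffer offset -> message bytes
    req_queue: list = field(default_factory=list)   # NIC request queue
    received: list = field(default_factory=list)    # messages delivered to the CPU
    next_offset: int = 0


def send(src, dst_ip, payload, ip2pci, bus, log):
    """Steps 1-4, performed on the source RPM."""
    offset = src.next_offset
    src.next_offset += len(payload)
    src.sdram[offset] = payload
    log.append("1. packet built in source memory")
    log.append("2. send request issued to source NIC")
    dst = bus[ip2pci[dst_ip]]
    log.append("3. IP address translated to PCI address")
    dst.req_queue.append((src, src.pci_base + offset, len(payload)))
    log.append("4. PCI address and size sent to destination NIC")


def service(dst, log):
    """Steps 5-9, performed when the destination NIC drains its request queue."""
    while dst.req_queue:
        src, pci_addr, size = dst.req_queue.pop(0)
        offset = pci_addr - src.pci_base
        data = src.sdram[offset][:size]
        log.append("5. destination NIC pulled message (DMA)")
        dst.received.append(data)
        log.append("6. destination CPU interrupted")
        log.append("7. message processed")
        log.append("8. ACK sent to source NIC")
        del src.sdram[offset]
        log.append("9. source CPU interrupted, buffer released")


rpm_a = RPM("rpm-a", 0x1000_0000)
rpm_b = RPM("rpm-b", 0x2000_0000)
bus = {rpm_a.pci_base: rpm_a, rpm_b.pci_base: rpm_b}
ip2pci = {"10.0.0.1": rpm_a.pci_base, "10.0.0.2": rpm_b.pci_base}
log = []
send(rpm_a, "10.0.0.2", b"hello", ip2pci, bus, log)
service(rpm_b, log)
```

The point of the sketch is that only a small descriptor (address and size) is pushed to the destination, which then pulls the payload itself; the source buffer must stay valid until the acknowledgment arrives.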
Software Model
To facilitate application development, the RSC architecture supports software layers to
abstract hardware details and provide a rich set of development tools. The uClinux
operating system is run on the MicroBlaze processor providing typical OS functionality
including process management, file management, and device/network abstraction.
uClinux is a derivative of the popular Linux operating system for processors that lack
memory management units. The GCC toolchain can be used with uClinux to provide a
development environment that includes compilers, assemblers, linkers, debuggers, and
other tools. These software elements provide a traditional software programming
paradigm for a soft processor in a reconfigurable system.
Additionally, to support the MPP structure of the RSC architecture, the Message Passing
Interface (MPI) will be implemented to provide high-level primitives for inter-process
communications. MPI functionality includes capabilities to send/receive/broadcast
messages as well as synchronize processes. A subset of the common MPICH
implementation of MPI, which has approximately 125 library calls, will be ported to the
RSC platform. MPICH can be implemented on top of the TCP/IP protocol stack but can also be
optimized to take advantage of underlying hardware support and bypass the operating
system for improved performance. This provides a mechanism to optimize the RSC
message passing architecture by eliminating layers of the protocol stack that are
unnecessary, or that are inefficient in the uClinux implementation.
A development environment is also needed for custom core development. Traditional
hardware description languages like VHDL and Verilog can be used, as can newer
higher-level tools such as StarBridge Systems’ Viva graphical environment or Celoxica’s
DK Design Suite. Viva support in particular is being developed for the RSC
platform. Plans for a Viva system description of the RPM and interfacing primitives are
underway. The combination of a common operating system, toolchain, message passing
library, and HDL tools provides a rich environment for productive application
development.
Fault Tolerance
Radiation-induced errors, typically from particles trapped in radiation belts or from
cosmic rays, are a common source of faults in space avionics. A variety of techniques
are used to mitigate single event upsets and improve fault tolerance in space
electronics. The RSC platform uses several approaches. The reconfigurable logic in the
Xilinx FPGAs can have three types of errors that require mitigation: logic errors, memory
errors, and configuration errors. Errors in logic (transients or bit flips) are eliminated
through TMR. Three identical circuits feed a majority voter, which outvotes the affected
circuit and corrects the error. This is done using the Xilinx XTMR tool for logic
triplication. One drawback of
this approach is the need to triplicate inputs and outputs to the device. The I/Os are tied
together on the circuit board effectively reducing the device’s I/O capacity by two-thirds.
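The voting step in TMR amounts to a bitwise two-of-three majority function. The sketch below shows that function in software; it is an illustration of the principle rather than the output of the XTMR tool, and the `resynchronize` helper (writing the voted value back to all three copies) is an assumption about how state would be repaired.

```python
def majority_vote(a: int, b: int, c: int) -> int:
    """Bitwise 2-of-3 majority: each output bit follows at least two inputs."""
    return (a & b) | (a & c) | (b & c)


def resynchronize(copies: list) -> list:
    """Hypothetical repair step: overwrite all three copies with the voted value."""
    voted = majority_vote(*copies)
    return [voted, voted, voted]


# Three redundant copies of a register; a single event upset flips bit 2 of one copy.
registers = [0b1010, 0b1010, 0b1010]
registers[1] ^= 0b0100
voted = majority_vote(*registers)
registers = resynchronize(registers)
```

A single upset in any one copy is masked. Two copies upset in the same bit position would outvote the correct copy, which is one reason accumulated errors must be cleared promptly.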
A TMR design alone is not enough to ensure a fault free circuit. The FPGA’s
configuration memory must also be scrubbed (continually rewritten) to correct any errors
due to SEUs. The scrubbing logic is external to the Xilinx FPGA in a rad-tolerant Actel
FPGA. The Actel device uses antifuse technology along with TMR, so no scrubbing of
configuration memory is needed. Memory errors can also be eliminated with TMR but
error correction codes, like Hamming codes, are also used to detect and correct errors.
Unlike configuration memory, the contents of generic RAM cells are not known a priori,
and therefore the contents of each cell must be read out, checked for errors, corrected if
necessary, and written back. A special case of memory error is a cache error. Because a
cache line is always duplicated in main memory (for a write-through cache), a
detect-and-invalidate method can be utilized, since the memory element can be re-fetched from
main memory. The cache should still be periodically scrubbed to prevent the accumulation
of multiple bit errors, which may be undetectable. Cache tags must also be checked for
errors.
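The detect-and-invalidate method can be sketched with a small direct-mapped, write-through cache that keeps one parity bit per line. This is a minimal model under stated assumptions: the class, the line layout, and the choice of a single parity bit covering both tag and data are hypothetical, not the RSC cache design.

```python
def parity(value: int) -> int:
    """Even-parity bit of a non-negative integer."""
    return bin(value).count("1") & 1


class WriteThroughCache:
    """Direct-mapped, write-through cache with parity-protected tag and data."""

    def __init__(self, memory: list, num_lines: int = 4):
        self.memory = memory
        self.num_lines = num_lines
        # Each line holds [valid, tag, data, parity over tag and data].
        self.lines = [[False, 0, 0, 0] for _ in range(num_lines)]
        self.refetches = 0

    def _fill(self, addr: int) -> int:
        index, tag = addr % self.num_lines, addr // self.num_lines
        data = self.memory[addr]
        self.lines[index] = [True, tag, data, parity(tag) ^ parity(data)]
        return data

    def read(self, addr: int) -> int:
        index, tag = addr % self.num_lines, addr // self.num_lines
        valid, line_tag, data, stored_parity = self.lines[index]
        if valid and (parity(line_tag) ^ parity(data)) != stored_parity:
            # Error detected: invalidate the line and re-fetch from main memory.
            self.lines[index][0] = False
            self.refetches += 1
            return self._fill(addr)
        if valid and line_tag == tag:
            return data            # clean hit
        return self._fill(addr)    # ordinary miss

    def write(self, addr: int, value: int) -> None:
        self.memory[addr] = value  # write-through keeps main memory current
        self._fill(addr)


main_memory = list(range(100, 116))
cache = WriteThroughCache(main_memory)
first = cache.read(5)
cache.lines[5 % 4][2] ^= 0x4      # simulate a single-bit upset in the cached data
second = cache.read(5)
```

Because a single parity bit cannot see an even number of flipped bits, two upsets in the same line would go unnoticed, which is why the text calls for periodic scrubbing as well.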
Memories external to the reconfigurable logic must also have Error Detection And
Correction (EDAC). For the RSC platform, these memories include non-volatile memory
and SDRAM. Additional logic to support this is implemented in the Actel device. Memory
buffers in the Actel device also require EDAC and scrubbing wherever data can become
stale and accumulate multi-bit errors.
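The read, check, correct, and write-back cycle used for memory scrubbing can be illustrated with a Hamming(7,4) code, the smallest single-error-correcting Hamming code. This is a sketch for illustration: the paper does not state which code or word width the RSC EDAC uses, and a real memory controller would typically protect much wider words.

```python
DATA_POSITIONS = (3, 5, 6, 7)   # 1-based codeword positions that carry data bits
PARITY_POSITIONS = (1, 2, 4)    # 1-based positions that carry parity bits


def hamming74_encode(nibble: int) -> int:
    """Encode 4 data bits into a 7-bit codeword (position p is stored in bit p-1)."""
    bits = [0] * 8                               # index 0 unused
    for k, pos in enumerate(DATA_POSITIONS):
        bits[pos] = (nibble >> k) & 1
    for p in PARITY_POSITIONS:
        for pos in range(1, 8):
            if pos & p and pos != p:
                bits[p] ^= bits[pos]             # parity over the positions p covers
    return sum(bits[pos] << (pos - 1) for pos in range(1, 8))


def syndrome(word: int) -> int:
    """XOR of the positions of all set bits; zero for a valid codeword."""
    s = 0
    for pos in range(1, 8):
        if (word >> (pos - 1)) & 1:
            s ^= pos
    return s


def hamming74_correct(word: int):
    """Return (corrected word, whether a single-bit error was fixed)."""
    s = syndrome(word)
    if s:
        word ^= 1 << (s - 1)                     # the syndrome names the flipped position
    return word, s != 0


def hamming74_decode(word: int) -> int:
    return sum(((word >> (pos - 1)) & 1) << k for k, pos in enumerate(DATA_POSITIONS))


def scrub(memory: list) -> int:
    """Read every word, correct it if needed, write it back; return the repair count."""
    repaired = 0
    for addr, word in enumerate(memory):
        fixed, was_bad = hamming74_correct(word)
        if was_bad:
            memory[addr] = fixed
            repaired += 1
    return repaired


memory = [hamming74_encode(n) for n in (1, 2, 3)]
memory[1] ^= 0b0000100                           # simulate a single event upset
repairs = scrub(memory)
```

Scrubbing often enough that a word rarely collects a second upset before it is repaired is what keeps a single-error-correcting code sufficient.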
Physical Model
As mentioned previously, the RSC uses a stackable form factor (see Figure 5). The RSC
project is extending the PCI-104 standard for space applications. The project has been
calling this standard SPACE-104. SPACE-104 is ruggedized, shielded, and conduction
cooled to support launch and space environments. A stackable 33 MHz, 32-bit PCI bus is
implemented and is backwards compatible with the PCI-104 standard. The form factor is
larger than PCI-104 to support the larger footprints of space-grade FPGAs but still
provides for connection of PCI-104 cards for ground support and testing. The larger side
of its rectangular form factor also provides a large contact area through which to remove heat from
the stack. A second high-density connector is defined in the SPACE-104 standard to
provide additional inter-module I/O and future expansion to a 64-bit PCI implementation.
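For a rough sense of scale, the peak rates implied by the bus parameters above can be computed directly. The sketch below is back-of-the-envelope arithmetic only: it ignores PCI arbitration and protocol overhead, and the 8b/10b line coding applied to the 2.5 Gbps serial links is an assumption, not something stated in this paper.

```python
def pci_peak_bytes_per_s(clock_hz: int, width_bits: int) -> int:
    """Theoretical peak: one bus-width transfer per clock, no overhead."""
    return clock_hz * width_bits // 8


def serial_payload_bytes_per_s(line_rate_bps: int, payload_bits: int = 8,
                               coded_bits: int = 10) -> int:
    """Payload rate of a serial link, assuming 8b/10b coding by default."""
    return line_rate_bps * payload_bits // coded_bits // 8


pci_32 = pci_peak_bytes_per_s(33_000_000, 32)        # the 32-bit bus described above
pci_64 = pci_peak_bytes_per_s(33_000_000, 64)        # the possible 64-bit expansion
serial = serial_payload_bytes_per_s(2_500_000_000)   # one high-speed serial link
```

Under these assumptions the shared 32-bit bus peaks at 132 MB/s and a single serial link carries about 250 MB/s of payload, which suggests why both paths are provided.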
Figure 5. Two interconnected RSC stacks.
The fundamental modules that make up an RSC stack are the Reconfigurable
Processing Module, which was previously discussed, the Network Module that is used to
bridge between multiple stacks, the Command and Control Module which is the
command interface for the system, and the Power Module that performs power
conversion and distribution.
This stackable approach has proven effective for embedded terrestrial systems and its
modularity allows for the system to be customized to meet processing requirements. The
stackable approach also has the added advantages of not requiring separate backplane
and enclosure designs.
Challenges
In the development of any new computing system there are many challenges to
overcome. Developing a computing system for space complicates the design even more.
Managing this complexity is a primary challenge in developing the RSC platform. The
RSC project is taking an incremental approach to its design, first implementing basic
functionality and then adding performance enhancements. The use of reconfigurable
logic aligns with this approach. As an example, a simple direct-mapped cache can be
developed first, followed by a cache with error detection and scrubbing. Later, more
advanced techniques, like lazy write or selective prefetch, can be implemented. There
are similar incremental steps for the development of other subsystems.
In addition to design complexity there are technical challenges. The availability of dense
rad-hard non-volatile memory is a problem. Space-grade devices of sufficient density are
essentially non-existent. Some companies have screened their own FLASH memory
devices and other efforts like the development of Chalcogenide RAM are underway. Fast
rad-hard SDRAM is also problematic. The performance of many applications is bounded
by memory bandwidth and the availability of high-speed SDRAM is limited.
FPGA power distribution is also an important concern. High power with low core
voltages makes a Switching Point of Load (SPOL) power system desirable to eliminate
losses. Linear regulators can be used but waste power and generate excessive heat that
must be managed by the thermal system. A SPOL converter is a better solution, but,
again, such converters do not appear to be available as space-grade parts.
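The cost of linear regulation at low core voltages follows from the fact that a linear regulator drops the full difference between input and output voltage at the load current. The sketch below works this out; the 3.3 V input, 1.2 V core voltage, and 5 A load are hypothetical example values, and regulator quiescent current is ignored.

```python
def linear_regulator_loss_w(v_in: float, v_out: float, i_load: float) -> float:
    """Power dissipated as heat in the regulator: (Vin - Vout) * Iload."""
    return (v_in - v_out) * i_load


def linear_regulator_efficiency(v_in: float, v_out: float) -> float:
    """Upper bound on efficiency, ignoring quiescent current: Vout / Vin."""
    return v_out / v_in


# Assumed example values, not taken from the RSC design.
loss = linear_regulator_loss_w(v_in=3.3, v_out=1.2, i_load=5.0)
efficiency = linear_regulator_efficiency(v_in=3.3, v_out=1.2)
```

With these example values, roughly 10.5 W is lost as heat and efficiency is about 36 percent, so nearly two-thirds of the input power never reaches the FPGA core.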
There are also risks in other areas. The RSC is targeting the rad-tolerant Xilinx V4 and
the Actel RTAX devices, which are as yet unproven in space systems. Also, the TLK2711
SERDES, which is used to implement high-speed serial communications, poses a risk, as
initial parts are not available until the second quarter of 2006.
Project Status
The high-level processing architecture and physical design approach of the RSC has
been established although it is under continuous refinement as more details are added.
A proof of concept system with commercial development boards, MicroBlaze
processors, uClinux, and MPI has been demonstrated. The SPACE-104 mechanical
specification is in its initial draft form. The board design for the prototype RPM is
complete. The first build of the RPM is expected in October of this year.
Conclusion
A high-level architecture for a reconfigurable computing system for space has been
outlined. A multi-processor architecture using soft core processors with operating system
and message passing support is coupled with the capability to utilize custom cores for
specialized processing. The challenges of the space environment are addressed through
fault mitigation techniques, parts selection, and physical design. The Reconfigurable
Scalable Computing project is refining and implementing this new and challenging
computing architecture for future space exploration missions.
References
[1] Accelerated Image Processing on FPGAs, B. Draper, R. Beveridge, W. Böhm, C.
Ross, M. Chawathe. Submitted to IEEE Transactions on Image Processing.
[2] A Study of the Speedups and Competitiveness of FPGA Soft Processor Cores using
Dynamic Hardware/Software Partitioning, R. Lysecky and F. Vahid, Design Automation
and Test in Europe (DATE), March 2005.
[3] Complex Upset Mitigation Applied to a Reconfigurable Embedded Processor, S.
Rezgui, G. Swift, K. Somervill, J. George, C. Carmichael, and G. Allen, IEEE Nuclear
and Space Radiation Effects Conference (NSREC'05), July 2005.