National Sun Yat-Sen University, Embedded System Laboratory
XMCAPI: Inter-Core Communication Interface on Multi-chip Embedded Systems
Presenter: Hung-Lun Chen, 2013/12/2
Miura, S. (Center for Computational Sciences, University of Tsukuba, Tsukuba, Japan); Hanawa, T.; Boku, T.; Sato, M.
Embedded and Ubiquitous Computing (EUC), 2011 IFIP 9th International Conference on
2017/5/25

Abstract
Multi-core processor technology has been applied to processors in embedded systems as well as in ordinary PC systems. In multi-core embedded processors, however, a processor may consist of heterogeneous CPU cores that are not configured with shared memory and have no mechanism for inter-core communication. MCAPI is a highly portable API standard that provides inter-core communication independent of architectural heterogeneity. This paper extends the current MCAPI to multi-chip, distributed-memory configurations and proposes a portable implementation, named XMCAPI, built on a commodity network stack. With XMCAPI, the inter-core communication method for intra-chip cores is extended to inter-chip cores. The XMCAPI implementation over standard sockets, xmcapi/ip, is evaluated in a portable software-development environment.

Motivation

Fig 1. Comparison of IA32 and embedded systems

            IA32              Embedded system
Memory      Shared            Often not shared
Cache       Coherent cache    Often not shared

Therefore, communication mechanisms designed for IA32 may not be appropriate for embedded systems, so an efficient communication mechanism is needed to coordinate the cores in distributed or parallel applications.

The idea: use MCAPI to create a new communication mechanism, named XMCAPI, that supports not only intra-chip communication but also inter-chip communication over distributed memory.
MCAPI: the Multicore Communications API, defined by the Multicore Association's MCAPI working group (MCAPI®).
Why choose MCAPI for the implementation rather than MPI or OpenMP?
MPI / OpenMP
。 Consume too many system resources, and many of the functions they provide are overkill.
。 The large footprint required by their library implementations cannot fit into the local memory of most embedded platforms.
MCAPI
。 High portability, independence, flexibility, low overhead, small memory footprint.

Fig 2. Overview of MCAPI from its official website

Fig. Related-work map (flattened in the source): work on avoiding race conditions, improving performance, the Message Passing Interface, physical-interface support, and lightweight designs ([1]-[10]) leads to this paper, XMCAPI: Inter-Core Communication Interface on Multi-chip Embedded Systems, which aims at high portability, independence, flexibility, low overhead, and a small memory footprint.

XMCAPI design
。 Supports a variety of physical interfaces and protocols.
。 If shared memory is available, it should be used for inter-core communication.
。 Libraries considered:
  - Open MPI: the most typical MPI implementation.
  - PM libraries: support multiple physical interfaces.
。 Why not just use these traditional communication libraries? To achieve better performance, XMCAPI should access the physical interface directly, avoiding their overheads.

Terms appearing in the software-stack figure:
。 OFED (OpenFabrics Enterprise Distribution): a stack that includes drivers, core kernel code, middleware, and user-level interfaces.
。 InfiniBand: a network communications link common in high-performance computing and enterprise data centers; it defines the connection at the physical layer.
。 PEARL (Process and Experiment Automation Realtime Language): a computer programming language designed for multitasking and real-time programming.

Fig 3. Overview of XMCAPI software stacks

XMCAPI/IP
。 Ethernet: commonly used in many systems, including embedded systems.
。 TCP/IP (socket APIs)
  - Advantage: does not depend on the operating system or the network device.
  - Disadvantage: communication performance decreases because services, interfaces, and protocols are not cleanly separated in its library stack.
。 This goes against XMCAPI's stated property of accessing the physical interface directly.
To improve the portability of the program, the paper implements XMCAPI on the socket API, as the XMCAPI/IP module. The main purpose of XMCAPI/IP is to provide a test bed for MCAPI applications.
。 Advantage: portability.
。 Disadvantage: reduced communication performance because of TCP/IP.
。 This is a trade-off between ease of implementation and the cost of using TCP/IP.
。 The purpose is to extend the utilization and coverage of MCAPI from its limited platforms to wider system configurations with multi-chip solutions.

Implementation
The communicator thread and the user thread are implemented with POSIX threads (i.e., pthreads).
。 User thread: runs the user application.
。 Communicator thread: handles the communication between cores.
  - epoll() (event-triggered): manages the connections between two or more cores and notifies the system via pipe().
  - pipe(): used only for event and control-signal notification, not for the exchange of data.

Fig 4. Overview of the XMCAPI/IP implementation

Communication flow:
1. The user thread tells the communicator thread to send data.
2. The communicator thread sends the data for that request.
3. The communicator thread on the other side receives the data and stores it in a buffer.

Packet format (header size: 24 bytes):
。 sender_id, receiver_id, port_id: together indicate the queue pair.
。 Message type: determines whether the packet is an ACK or not.
。 Size: size of the payload.
。 Data: the payload itself.

Fig 5. Packet format of the XMCAPI/IP module

Message
。 Data arrival is guaranteed by TCP (handshaking protocol).
。 Problem: the receiver buffer may not have enough space to store the data.
。 Solution:
  - If the receiver buffer is full, the receiver responds with an ACK carrying a stop flag.
  - When the receiver buffer has enough space again, it tells the sender to resume sending.

Packet channel (unidirectional FIFO)
。 XMCAPI/IP accesses the buffer of the user application directly.
。 Advantage: user applications can access the receiver buffer directly, so unnecessary memory copies are eliminated in the packet channel.

Scalar channel (unidirectional FIFO)
。 Advantage: applying this mechanism in the XMCAPI/IP module also reduces the frequency of read()/write() system calls.

Evaluation (SC: single-core, DC: dual-core)

Latency
Fig 6. Latency for various data sizes of the MCAPI message
。 The latency of XMCAPI/IP is larger than that of plain TCP/IP: for short messages, roughly 54 usec for plain TCP/IP, 64 usec for xmcapi/ip (SC), and 74 usec for xmcapi/ip (DC).
。 An overhead of about 10 usec is added by the multi-threading operation in the single-core environment (64 - 10 = 54).
。 A further ~10 usec of pipe() communication time is added in the dual-core environment (74 - 10 - 10 = 54).

Bandwidth
。 1.0 Gbytes of data is transmitted between two nodes in single-direction communication.
。 The performance in both the single-core and dual-core environments is approximately 112 Mbytes/sec at a 32-Kbyte message size.
。 This realizes about 90% of the theoretical peak bandwidth.
Fig 7. Bandwidth for various data sizes on the MCAPI packet channel

Conclusion
。 XMCAPI extends the MCAPI concept to allow inter-chip and inter-node communication while keeping compatibility with the original MCAPI API.
。 To provide high portability on various hardware platforms, XMCAPI is implemented on TCP/IP over Ethernet with a socket library.
。 To support MCAPI's asynchronous communication, a communicator thread (POSIX threads) is introduced for each user thread.
。 The added thread causes a certain amount of overhead that increases the latency of short messages; this overhead is accepted to keep compatibility and portability.
。 Approximately 90% of the theoretical peak bandwidth of Gigabit Ethernet is achieved.

My comments
。 The experiments do not compare against any other implementations.
。 The library size of XMCAPI/IP is not shown in the paper.
。 The latency is still a little too high and needs to be reduced.