National Sun Yat-Sen University, Embedded System Laboratory
XMCAPI: Inter-Core Communication Interface on Multi-chip Embedded Systems
Presenter: Hung-Lun Chen, 2013/12/2
Miura, S. (Center for Computational Sciences, University of Tsukuba, Tsukuba, Japan); Hanawa, T.; Boku, T.; Sato, M.
Embedded and Ubiquitous Computing (EUC), 2011 IFIP 9th International Conference on
2017/5/25

Abstract
Multi-core processor technology has been applied to processors in embedded systems as well as in ordinary PC systems. In multi-core embedded processors, however, a processor may consist of heterogeneous CPU cores that are not configured with shared memory and have no mechanism for inter-core communication. MCAPI is a highly portable API standard that provides inter-core communication independent of architectural heterogeneity. This paper extends the current MCAPI to multi-chip, distributed-memory configurations and proposes a portable implementation, named XMCAPI, built on a commodity network stack. With XMCAPI, the inter-core communication method for intra-chip cores is extended to inter-chip cores. The XMCAPI implementation over standard sockets, xmcapi/ip, is evaluated in a portable software-development environment.

Motivation

Fig 1. Comparison of IA32 and embedded systems

            IA32              Embedded system
Memory      Shared            Often not shared
Cache       Coherent cache    Often not shared

Therefore, communication mechanisms designed for IA32 may not be appropriate for embedded systems, so an efficient communication mechanism is needed to coordinate the cores in distributed or parallel applications.

The idea: use MCAPI to create a new communication mechanism, named XMCAPI, that supports not only intra-chip communication but also inter-chip communication over distributed memory.
MCAPI: the Multicore Communications API, defined by the Multicore Association's MCAPI working group (MCAPI®).
Why choose MCAPI for the implementation rather than MPI or OpenMP?
MPI / OpenMP
。 Consume too many system resources, and many of the functions they provide are overkill.
。 The large footprint required by their library implementations cannot fit into the local memory of most embedded platforms.
MCAPI
。 High portability, independence, flexibility, low overhead, small memory footprint.

Fig 2. Overview of MCAPI from its official website

Fig. Related-work map (flattened in the source): work on avoiding race conditions, improving performance, the Message Passing Interface, physical-interface support, and lightweight designs ([1]-[10]) leads to this paper, XMCAPI: Inter-Core Communication Interface on Multi-chip Embedded Systems, which aims at high portability, independence, flexibility, low overhead, and a small memory footprint.

XMCAPI design
。 Supports a variety of physical interfaces and protocols.
。 If shared memory is available, it should be used for inter-core communication.
。 Libraries considered:
  - Open MPI: the most typical MPI implementation.
  - PM libraries: support multiple physical interfaces.
。 Why not just use these traditional communication libraries? To achieve better performance, XMCAPI should access the physical interface directly, avoiding their overheads.

Terms appearing in the software-stack figure:
。 OFED (OpenFabrics Enterprise Distribution): a stack that includes drivers, core kernel code, middleware, and user-level interfaces.
。 InfiniBand: a network communications link common in high-performance computing and enterprise data centers; it defines the connection at the physical layer.
。 PEARL (Process and Experiment Automation Realtime Language): a computer programming language designed for multitasking and real-time programming.

Fig 3. Overview of XMCAPI software stacks

XMCAPI/IP
。 Ethernet: commonly used in many systems, including embedded systems.
。 TCP/IP (socket APIs)
  - Advantage: does not depend on the operating system or the network device.
  - Disadvantage: communication performance decreases because services, interfaces, and protocols are not cleanly separated in its library stack.
。 This goes against XMCAPI's stated property of accessing the physical interface directly.
To improve the portability of the program, the paper implements XMCAPI on the socket API, as the XMCAPI/IP module. The main purpose of XMCAPI/IP is to provide a test bed for MCAPI applications.
。 Advantage: portability.
。 Disadvantage: reduced communication performance because of TCP/IP.
。 This is a trade-off between ease of implementation and the cost of using TCP/IP.
。 The purpose is to extend the utilization and coverage of MCAPI from its limited platforms to wider system configurations with multi-chip solutions.

Implementation
The communicator thread and the user thread are implemented with POSIX threads (i.e., pthreads).
。 User thread: runs the user application.
。 Communicator thread: handles the communication between cores.
  - epoll() (event-triggered): manages the connections between two or more cores and notifies the system via pipe().
  - pipe(): used only for event and control-signal notification, not for the exchange of data.

Fig 4. Overview of the XMCAPI/IP implementation

Communication flow:
1. The user thread tells the communicator thread to send data.
2. The communicator thread sends the data for that request.
3. The communicator thread on the other side receives the data and stores it in a buffer.

Packet format (header size: 24 bytes):
。 sender_id, receiver_id, port_id: together indicate the queue pair.
。 Message type: determines whether the packet is an ACK or not.
。 Size: size of the payload.
。 Data: the payload itself.

Fig 5. Packet format of the XMCAPI/IP module

Message
。 Data arrival is guaranteed by TCP (handshaking protocol).
。 Problem: the receiver buffer may not have enough space to store the data.
。 Solution:
  - If the receiver buffer is full, the receiver responds with an ACK carrying a stop flag.
  - When the receiver buffer has enough space again, it tells the sender to resume sending.

Packet channel (unidirectional FIFO)
。 XMCAPI/IP accesses the buffer of the user application directly.
。 Advantage: user applications can access the receiver buffer directly, so unnecessary memory copies are eliminated in the packet channel.

Scalar channel (unidirectional FIFO)
。 Advantage: applying this mechanism in the XMCAPI/IP module also reduces the frequency of read()/write() system calls.

Evaluation (SC: single-core, DC: dual-core)

Latency
Fig 6. Latency for various data sizes of the MCAPI message
。 The latency of XMCAPI/IP is larger than that of plain TCP/IP: for short messages, roughly 54 usec for plain TCP/IP, 64 usec for xmcapi/ip (SC), and 74 usec for xmcapi/ip (DC).
。 An overhead of about 10 usec is added by the multi-threading operation in the single-core environment (64 - 10 = 54).
。 A further ~10 usec of pipe() communication time is added in the dual-core environment (74 - 10 - 10 = 54).

Bandwidth
。 1.0 Gbytes of data is transmitted between two nodes in single-direction communication.
。 The performance in both the single-core and dual-core environments is approximately 112 Mbytes/sec at a 32-Kbyte message size.
。 This realizes about 90% of the theoretical peak bandwidth.
Fig 7. Bandwidth for various data sizes on the MCAPI packet channel

Conclusion
。 XMCAPI extends the MCAPI concept to allow inter-chip and inter-node communication while keeping compatibility with the original MCAPI API.
。 To provide high portability on various hardware platforms, XMCAPI is implemented on TCP/IP over Ethernet with a socket library.
。 To support MCAPI's asynchronous communication, a communicator thread (POSIX threads) is introduced for each user thread.
。 The added thread causes a certain amount of overhead that increases the latency of short messages; this overhead is accepted to keep compatibility and portability.
。 Approximately 90% of the theoretical peak bandwidth of Gigabit Ethernet is achieved.

My comments
。 The experiments do not compare against any other implementations.
。 The library size of XMCAPI/IP is not shown in the paper.
。 The latency is still a little too high and needs to be reduced.