Toolchains - Compiler and OS support for embedded multiprocessors
Timo Töyry
Department of Computer Science and Engineering
Aalto University, School of Science
timo.toyry(at)aalto.fi
T-106.6200 High-performance embedded computing, 16.9.2011

Program performance analysis
- Program performance is usually measured by measuring its execution time.
- Different execution times:
  - Worst-case execution time (WCET)
  - Best-case execution time (BCET)
  - Average-case execution time
- Topics: performance models, path analysis, path timing.

Performance models
- Performance estimation is usually done at the instruction level.
- In theory, calculating for example the WCET is quite simple: just sum up the execution times of the instructions in the sequence.
- In reality it is hard to say how long an instruction will take to execute without knowing the internal state of the processor.
- Several factors contribute to the execution time of an instruction:
  - Pipelining: dependencies between instructions
  - Caches: access to data found in the cache is much faster
  - DRAM refresh latencies

Path analysis
- Use an abstract program flow analysis method to bound the set of feasible paths.
- Path analysis is equivalent to the halting problem, so it is not possible to find the exact set of feasible paths, and some infeasible paths may be included.
- Currently, integer linear programming (ILP) is used by many WCET methodologies to solve for paths.
- In ILP, a set of constraints describes the program and, in part, its behaviour. A solver can then find the variable values that identify the longest path through the program.
- Many systems support user constraints.
- They are useful because the developer may have knowledge of the program's behaviour that cannot be produced by the analysis software.

Path timing
- Several techniques analyze path timings at different levels of abstraction:
  - Abstract interpretation
  - Data flow analysis
  - Simulation

Memory-oriented optimizations
- Motivation for memory optimizations: memory access is expensive.
- Loop transformations: permutation, unrolling, splitting, fusion, tiling, padding, index rewriting.
- Buffer size optimization.
- Problems with dynamic memory allocation.
- Improve the cache hit rate by reducing conflicts.
- Arrange data so that it benefits from prefetching.
- Main memory optimizations:
  - Burst access modes
  - Paged memories
  - Banked memories

Code generation and back-end compilation
- ASIPs require modifications to the compiler so that it can use the application-specific instructions.
- Main steps in code generation:
  - Instruction selection
  - Register allocation
  - Address generation
  - Instruction scheduling
  - Code placement
- Code placement is important because it affects memory performance (speed and energy consumption).
- Cache line collisions can be avoided by assigning conflicting code to non-conflicting blocks.

Real-time process scheduling
- Terms
- Real-time scheduling algorithms
- Scheduling for dynamic voltage scaling
- Performance estimation

Terms
- Thread, process, subtask, task
- Time quantum, context switch
- Schedule
- Scheduling algorithms:
  - Static
    - Constructive: uses rules to select the next task
    - Iterative improvement: revisits its decisions to change the order of tasks
  - Dynamic
  - Priority
- Real-time scheduling:
  - Hard real-time
  - Soft real-time
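Returning to the path analysis discussed earlier: the WCET idea of summing per-block execution times along the longest path through the program can be sketched without a full ILP solver when the control-flow graph is acyclic. The graph, block names, and cycle counts below are invented for illustration; real WCET tools derive block times from a processor model and add flow constraints (including the user constraints mentioned above) in an ILP formulation.

```python
# Toy WCET estimate: longest path through an acyclic control-flow graph.
# Each node is a basic block with a known execution time in cycles.
# All numbers here are made up for the example.

def wcet(blocks, edges, entry, exit):
    """Longest entry->exit path length, summing block execution times."""
    succ = {b: [] for b in blocks}
    for a, b in edges:
        succ[a].append(b)
    # Depth-first postorder: every node appears after all its successors,
    # so iterating it relaxes successors before their predecessors.
    order, seen = [], set()
    def visit(n):
        if n in seen:
            return
        seen.add(n)
        for m in succ[n]:
            visit(m)
        order.append(n)
    visit(entry)
    longest = {exit: blocks[exit]}
    for n in order:
        if n == exit:
            continue
        cand = [longest[m] for m in succ[n] if m in longest]
        if cand:
            longest[n] = blocks[n] + max(cand)
    return longest[entry]

# An if/else diamond: entry -> (then | else) -> exit.
blocks = {"entry": 2, "then": 10, "else": 4, "exit": 1}
edges = [("entry", "then"), ("entry", "else"),
         ("then", "exit"), ("else", "exit")]
print(wcet(blocks, edges, "entry", "exit"))  # 2 + 10 + 1 = 13
```

The solver picks the "then" branch because it dominates the cost; an infeasible-path constraint or a user constraint could exclude it and tighten the bound.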
Terms cont.
- Deadlines
- Figure: Deadline terminology, (C) Wayne Wolf

Real-time scheduling algorithms
- Static scheduling algorithms are quite close to code synthesis.
- Static schedulers usually look at data dependencies between processes.
- As-soon-as-possible (ASAP)
- As-late-as-possible (ALAP)
- If a process has the same place in the ASAP and ALAP schedules, it is called a critical process.
- Resource dependencies add their own limitations to the schedule.

Real-time scheduling algorithms cont.
- Dynamic scheduling is often priority driven.
- Priorities can be static or dynamic.
- Common priority scheduling strategies:
  - Rate-monotonic scheduling (RMS)
  - Earliest-deadline-first (EDF)

Scheduling for dynamic voltage scaling
- There are several studies on how to schedule tasks on processors implementing dynamic voltage and frequency scaling (DVFS).
- How much can the processor be slowed down so that a feasible schedule still exists?
- One commonly used technique is to maximize the length of the processor's idle periods.

Performance estimation
- The execution time of a process is not fixed.
- Data dependencies may cause delays.
- Caches cause more variation in execution time than data dependencies.
- In multitasking, the caches are often thrashed by the previous process.
- The behaviour of the caches needs to be modelled to get more accurate estimates of execution times.

Operating system design
- Real-time vs.
general-purpose OS
- Memory management in embedded OS
- Structure of real-time OS
- OS overhead
- Support for scheduling
- Interprocess communication (IPC) mechanisms
- Power management
- File systems in embedded devices

Memory management in embedded OS
- More a general-purpose OS feature, but many embedded OSes need to take care of some general tasks.
- Memory-mapping hardware can be used for memory protection.
- A virtual memory system allows programs to use large amounts of memory.

Structure of real-time OS
- A typical real-time OS has two key parts:
  - Scheduler
  - Interrupt handling
- Hardware-generated interrupts have a higher priority than any process running in the OS.
- Interrupts may therefore compromise the real-time performance of the system.
- A common technique to avoid this is to split interrupt handling into two parts:
  - Interrupt service routine (ISR)
  - Interrupt service thread (IST)

OS overhead
- The OS causes some overhead to the system in the form of context switching.
- The time spent on context switches increases when the scheduled processes are short and when system utilization is high.
- The effect of context switching can be studied with a simulator.

Hardware support for scheduling
- The scheduler may use a significant fraction of processor time.
- To remove this load from the processor, one solution is hardware scheduling.
- In hardware scheduling, the scheduling algorithm is implemented as a co-processor or accelerator, which monitors the state of the processor(s) to determine which process will run next on which processor.

Interprocess communication mechanisms
- General-purpose systems move large amounts of data with IPC, so they use heavily buffered IPC mechanisms to speed up overall performance at the expense of latency.
- Embedded
systems may have to do the same, but without compromising the real-time requirements.
- Mailboxes are a common IPC mechanism in embedded systems.
- A mailbox can be implemented in software or hardware.
- A mailbox can have one writer and multiple readers; it can store only a quite limited amount of data.
- Hardware mailboxes are used in some OMAPs for communication between the ARM and DSP cores.
- For larger amounts of data, other solutions such as specialized memories must be used.

Power management
- Dynamic power management changes the system state to optimize energy consumption.
- Power management is usually centralized in the OS, which then monitors and manages energy consumption as a normal resource.
- Centralized power state management makes sure that all interested components get notified of a state change.
- In PCs, the advanced configuration and power interface (ACPI) is widely used.
- ACPI specifies global power states for power management.

File systems in embedded devices
- Have to be designed differently than desktop file systems, because of:
  - Energy limitations
  - Small size
  - Physical characteristics of flash memory
- Flash memories are typically written in large blocks.
- Flash memories support only a quite limited number of erase-write cycles.

Verification
- Systems are verified by applying formal methods to abstract models of the system.
- Liveness is an important property of a concurrent system.
- Freedom from deadlock is an important property of communicating processes.
- Temporal logic can be used to describe specific system properties.

Embedded multiprocessor software
- Similar problems as with traditional multiprocessor systems:
  - Variable delays
introduce timing bugs
  - Nonpredictable delays
  - Longer delays for memory access
- Embedded multicore processors are usually heterogeneous, which may cause some problems:
  - It might require some work to get software running on different processor types to work together; a common problem is endianness.
  - Development tools are often just a package of tools for all the component processors.
  - Different processors have different resources and may have different interfaces to shared resources.
  - Communication between the processors is not free.
  - Scheduling is harder.
  - Dynamic resource allocation.

Real-time multiprocessor operating systems
- Role of the OS
- Multiprocessor scheduling
- Scheduling with dynamic tasks

Role of the OS
- Many embedded multiprocessors run a separate OS on each processor.
- If the embedded multiprocessor runs a real multiprocessor OS, it can control the activity on each processor more tightly.
- Master/slave processors: the master manages everything.
- Master/slave kernels: each kernel makes local decisions; the master kernel tells the slave kernels their schedules.
- Because communication is expensive, the master usually does not have complete knowledge of the slaves' state.

Multiprocessor scheduling
- Multiprocessor scheduling is an NP-complete problem.
- Heuristics are used to find a "best" schedule.
- Interprocess data dependencies make things more complicated.
- Multiprocessor scheduling is much easier on SMP systems than on heterogeneous systems.

Scheduling with dynamic tasks
- If new tasks are created dynamically, it is impossible to guarantee in advance that all requirements are met.
- It must be figured out on the fly whether a new task can be accepted and on which processor it will be executed.
- Decisions must be made quickly, since any delay shortens the time the task has left to execute before its deadline.
- Load balancing is a form of dynamic task allocation.
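The on-the-fly acceptance decision above is often made with a schedulability test. For independent periodic tasks under EDF on a single processor, the classic Liu-Layland result admits a task iff total utilization stays at most 1; a minimal admission-control sketch, with invented task parameters:

```python
# Minimal EDF admission test for one processor: a new periodic task,
# given as (wcet, period), is accepted iff total utilization stays <= 1.
# This is the classic bound for independent periodic tasks with deadlines
# equal to periods; the task values below are invented for illustration.

def admit(tasks, new_task):
    """Return True if new_task can be added without overloading the CPU."""
    wcet, period = new_task
    utilization = sum(c / t for c, t in tasks) + wcet / period
    return utilization <= 1.0

tasks = [(1, 4), (2, 8)]     # current utilization 0.25 + 0.25 = 0.5
print(admit(tasks, (2, 5)))  # 0.5 + 0.4 = 0.9 -> True
print(admit(tasks, (3, 5)))  # 0.5 + 0.6 = 1.1 -> False
```

A real multiprocessor admission controller would additionally choose a target processor and account for communication costs, which is exactly why the decision is harder in the heterogeneous case described above.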
Services and middleware for embedded multiprocessors
- Applications are built using services offered by the OS or by other software (such as middleware).
- Uses for middleware in embedded systems:
  - Speed up application development by providing basic services.
  - Simplify porting of applications from one embedded system to another that supports the same middleware.
  - Efficient and correct implementation of key features.
- Standards-based services:
  - Common object request broker architecture (CORBA)
  - RT-CORBA
  - Message passing interface (MPI)

Services and middleware for embedded multiprocessors cont.
- System-on-chip services:
  - Utilize custom middleware.
- Quality-of-service (QoS):
  - A process is reliably scheduled periodically with a given amount of execution time.
  - Some types of schedulers, such as RMS, have "built-in" support for QoS.
  - Simple QoS model: contract, protocol, scheduler.
- Figure: Typical software stack (applications, application-specific libraries, interprocess communication, real-time operating system, hardware abstraction layer), (C) Wayne Wolf

Design verification of multiprocessor software
- Verifying multiprocessor software is harder than verifying uniprocessor software. Some common reasons:
  - Data is harder to observe and/or control.
  - It is harder to get some parts of the system into a desired state.
  - The effects of timing are hard to generate and test.
- Simulators, especially cycle-accurate simulators, can be used to get information about performance and energy consumption.
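The "built-in" QoS support of RMS mentioned above comes from its schedulability bound: n independent periodic tasks with deadlines equal to their periods are guaranteed to meet all deadlines under rate-monotonic priorities if total utilization is at most n(2^(1/n) - 1) (the Liu-Layland bound). A small sketch, with invented task parameters:

```python
import math

# Rate-monotonic schedulability check via the Liu-Layland bound:
# n periodic tasks (wcet C, period T) are guaranteed schedulable if
#   sum(C_i / T_i) <= n * (2**(1/n) - 1).
# The bound is sufficient but not necessary; some task sets above it
# still schedule. Task values below are invented for illustration.

def rms_guaranteed(tasks):
    n = len(tasks)
    utilization = sum(c / t for c, t in tasks)
    bound = n * (2 ** (1 / n) - 1)   # ~0.78 for n = 3, -> ln 2 as n grows
    return utilization <= bound

print(rms_guaranteed([(1, 4), (1, 5), (2, 10)]))  # U = 0.65 -> True
print(rms_guaranteed([(2, 4), (2, 5), (2, 10)]))  # U = 1.10 -> False
```

This test is what lets a QoS contract be checked at admission time: a new periodic reservation is accepted only if the resulting task set still passes the bound.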