Toolchains - Compiler and OS support for embedded multiprocessors
T-106.6200 High-performance embedded computing
Timo Töyry
Department of Computer Science and Engineering
Aalto University, School of Science
timo.toyry(at)aalto.fi
16.9.2011
Program performance analysis
- Program performance is usually measured by its execution time
- Different execution times:
  - Worst-case execution time (WCET)
  - Best-case execution time (BCET)
  - Average-case execution time
- Performance models
- Path analysis
- Path timing
Performance models
- Performance estimation is usually done at the instruction level
- In theory, calculating for example the WCET is quite simple: just sum up the execution times of the instructions in the sequence (a minimal sketch follows this list)
- In reality it is hard to say how long an instruction will take to execute without knowing the internal state of the processor
- Several factors contribute to the execution time of an instruction:
  - Pipelining; dependencies between instructions
  - Caches; access to data found in the cache is a lot faster
  - DRAM refresh latencies
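A minimal sketch of the "in theory" case above: the WCET bound of a straight-line sequence is just the sum of per-instruction worst-case latencies. The mnemonics and cycle counts below are made-up assumptions, not numbers from the lecture.

```c
/* Minimal, illustrative sketch: naive WCET of a straight-line sequence
 * as the sum of assumed per-instruction worst-case cycle counts. */
#include <stdio.h>

struct insn { const char *mnemonic; unsigned wc_cycles; };

int main(void) {
    struct insn path[] = {        /* hypothetical instruction sequence */
        { "ldr", 3 },             /* assumed worst-case cycle counts   */
        { "add", 1 },
        { "mul", 4 },
        { "str", 3 },
    };
    unsigned wcet = 0;
    for (size_t i = 0; i < sizeof path / sizeof path[0]; i++)
        wcet += path[i].wc_cycles;
    printf("naive WCET bound: %u cycles\n", wcet);
    return 0;
}
```

In practice the per-instruction numbers themselves depend on pipeline and cache state, which is exactly the complication the following bullets describe.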
Path analysis
- Use some abstract program flow analysis method to bound the set of feasible paths
- Exact path analysis is equivalent to the halting problem, so it is not possible to find the exact set of feasible paths, and some infeasible paths may also be included
- Currently integer linear programming (ILP) is utilized by many WCET methodologies to solve for paths
- In ILP a set of constraints describes the program and part of its behaviour; a solver is then used to find the values of the variables that identify the longest path through the program (an illustrative formulation is sketched after this list)
- Many systems support user constraints. They are useful because the developer may have knowledge of the program's behaviour that cannot be produced by the analysis software
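As a hedged illustration of the ILP-based approach (the formulation below follows the common implicit-path-enumeration style from the WCET literature, it is not spelled out on the slide): \(x_i\) is the execution count of basic block \(i\), \(c_i\) its local worst-case cost, and \(f_e\) the count of control-flow edge \(e\).

```latex
\begin{align*}
\text{maximize }   & \sum_{i} c_i\, x_i
  && \text{(total execution time over all basic blocks)} \\
\text{subject to } & x_i = \sum_{e \in \mathrm{in}(i)} f_e
                        = \sum_{e \in \mathrm{out}(i)} f_e
  && \text{(flow conservation at each block)} \\
                   & f_{\text{entry}} = 1, \qquad
                     f_{\text{back}(L)} \le n_L \cdot f_{\text{enter}(L)}
  && \text{(program entered once; loop bound } n_L \text{ per loop } L\text{)}
\end{align*}
```

User constraints of the kind mentioned above would simply be added as extra linear (in)equalities over the same \(x_i\) and \(f_e\) variables.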
Path timing
- Several techniques to analyze path timings at different levels of abstraction:
  - Abstract interpretation
  - Data flow analysis
  - Simulation
Memory-oriented optimizations
- Motivation for memory optimizations: memory access is expensive
- Loop transformations: permutation, unrolling, splitting, fusion, tiling, padding, index rewriting (a tiling sketch follows this list)
- Buffer size optimization
- Problems with dynamic memory allocation
- Improve cache hit rate by reducing conflicts
- Data arrangement so that it benefits from prefetching
- Main memory optimizations:
  - Burst access modes
  - Paged memories
  - Banked memories
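A small illustrative sketch of one of the listed loop transformations, loop tiling; the matrix size and tile size below are arbitrary assumptions chosen only to show the structure.

```c
/* Illustrative sketch of loop tiling: the inner two loops touch only a
 * T x T tile of each array, so the working set can stay in the cache. */
#define N 1024
#define T   32                        /* tile size, tuned to the cache  */

void transpose_tiled(double dst[N][N], const double src[N][N]) {
    for (int ii = 0; ii < N; ii += T)
        for (int jj = 0; jj < N; jj += T)
            for (int i = ii; i < ii + T; i++)   /* work inside one tile */
                for (int j = jj; j < jj + T; j++)
                    dst[j][i] = src[i][j];
}
```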
Code generation and back-end compilation
- ASIPs require modifications to the compiler in order to use the application-specific instructions
- Main steps in code generation:
  - Instruction selection
  - Register allocation
  - Address generation
  - Instruction scheduling
  - Code placement
    - Important because it affects memory performance (speed and energy consumption)
    - Cache line collisions can be avoided by assigning conflicting code to non-conflicting blocks (see the sketch after this list)
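A toy sketch of why code placement matters; the cache geometry and code addresses below are hypothetical assumptions. Two functions whose start addresses map to the same cache set evict each other, and a placement tool would move one of them to a non-conflicting address.

```c
/* Sketch: set index of a code address in an assumed direct-mapped cache
 * with 32-byte lines and 256 sets. Addresses are made-up examples. */
#include <stdint.h>
#include <stdio.h>

#define LINE_BYTES 32u
#define NUM_SETS   256u

static unsigned cache_set(uintptr_t addr) {
    return (addr / LINE_BYTES) % NUM_SETS;     /* which set it maps to */
}

int main(void) {
    uintptr_t func_a = 0x08000100;             /* hypothetical addresses */
    uintptr_t func_b = 0x08002100;
    printf("A -> set %u, B -> set %u%s\n",
           cache_set(func_a), cache_set(func_b),
           cache_set(func_a) == cache_set(func_b) ? "  (conflict!)" : "");
    return 0;
}
```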
Real-time process scheduling
- Terms
- Real-time scheduling algorithms
- Scheduling for dynamic voltage scaling
- Performance estimation
Terms
- Thread, process, subtask, task
- Time quantum, context switch
- Schedule
- Scheduling algorithms
  - Static
    - Constructive - uses rules to select the next task
    - Iterative improvement - revisits its decisions to change the order of tasks
  - Dynamic
  - Priority
- Real-time scheduling
  - Hard real-time
  - Soft real-time
Terms cont.
- Deadlines

Figure: Deadline terminology, (C) Wayne Wolf
Real-time scheduling algorithms
- Static scheduling algorithms are quite close to code synthesis
- Static schedulers usually look at data dependencies between processes
- As-soon-as-possible (ASAP)
- As-late-as-possible (ALAP)
- If a process has the same place in the ASAP and ALAP schedules it is called a critical process (a tiny sketch follows this list)
- Resource dependencies add their own limitations to the schedule
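As a small sketch of the ASAP/ALAP idea (the dependency graph below is made up, this is not a scheduler from the lecture): compute each task's earliest and latest step and flag the tasks where the two coincide as critical.

```c
/* Illustrative ASAP/ALAP computation over a tiny task dependency DAG. */
#include <stdio.h>

#define N 5
/* dep[i][j] = 1 means task j must finish before task i starts
 * (tasks are listed in topological order). */
static const int dep[N][N] = {
    /* t0 */ {0,0,0,0,0},
    /* t1 */ {1,0,0,0,0},
    /* t2 */ {1,0,0,0,0},
    /* t3 */ {0,1,0,0,0},
    /* t4 */ {0,1,0,0,0},
};

int main(void) {
    int asap[N], alap[N], last = 0;

    /* ASAP: one step after the latest predecessor (forward pass). */
    for (int i = 0; i < N; i++) {
        asap[i] = 0;
        for (int j = 0; j < i; j++)
            if (dep[i][j] && asap[j] + 1 > asap[i]) asap[i] = asap[j] + 1;
        if (asap[i] > last) last = asap[i];
    }
    /* ALAP: one step before the earliest successor (backward pass). */
    for (int i = N - 1; i >= 0; i--) {
        alap[i] = last;
        for (int j = i + 1; j < N; j++)
            if (dep[j][i] && alap[j] - 1 < alap[i]) alap[i] = alap[j] - 1;
    }
    for (int i = 0; i < N; i++)
        printf("t%d: ASAP %d, ALAP %d%s\n", i, asap[i], alap[i],
               asap[i] == alap[i] ? "  <- critical" : "");
    return 0;
}
```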
Real-time scheduling algorithms cont.
- Dynamic scheduling is often priority driven
- Priorities can be static or dynamic
- Common priority scheduling strategies:
  - Rate-monotonic scheduling (RMS)
  - Earliest-deadline-first (EDF)
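A sketch of the classic Liu and Layland utilization test that is usually quoted together with RMS; the task set below is a made-up example (compile with -lm).

```c
/* Sketch of the Liu & Layland utilization test for RMS. */
#include <math.h>
#include <stdio.h>

struct task { double wcet; double period; };

int main(void) {
    struct task set[] = { {1.0, 4.0}, {2.0, 8.0}, {1.0, 10.0} };
    int n = sizeof set / sizeof set[0];

    double u = 0.0;
    for (int i = 0; i < n; i++)
        u += set[i].wcet / set[i].period;          /* total utilization */

    double bound = n * (pow(2.0, 1.0 / n) - 1.0);  /* RMS bound n(2^(1/n)-1) */
    printf("U = %.3f, RMS bound = %.3f -> %s\n", u, bound,
           u <= bound ? "schedulable under RMS"
                      : "test inconclusive (needs exact response-time analysis)");
    return 0;
}
```

The test is sufficient but not necessary: a task set that fails it may still be schedulable under RMS, and EDF can schedule any set with utilization up to 1.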
Scheduling for dynamic voltage scaling
- There are several studies on how to schedule tasks on processors implementing dynamic voltage and frequency scaling (DVFS)
- Key question: how much can the processor be slowed down so that a feasible schedule still exists?
- One commonly used technique is to maximize the length of the processor's idle periods
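A deliberately simplified sketch of the "how much can we slow down" question, under the assumption that the tasks are periodic and scheduled with EDF (so total utilization at most 1 is the feasibility condition); the task parameters are invented.

```c
/* Simplified sketch: under EDF, running at relative speed s scales each
 * WCET by 1/s, so feasibility u/s <= 1 gives minimum speed s = u. */
#include <stdio.h>

struct task { double wcet_full_speed; double period; };

int main(void) {
    struct task set[] = { {2.0, 10.0}, {1.0, 5.0}, {3.0, 20.0} };
    int n = sizeof set / sizeof set[0];

    double u = 0.0;
    for (int i = 0; i < n; i++)
        u += set[i].wcet_full_speed / set[i].period;

    printf("utilization at full speed: %.2f\n", u);
    printf("minimum relative speed for a feasible EDF schedule: %.2f\n", u);
    return 0;
}
```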
Performance estimation
- The execution time of a process is not fixed
- Data dependencies may cause delays
- Caches cause more variation in execution time than data dependencies
- In multitasking, caches are often trashed by the previously running process
- The behaviour of caches needs to be modelled to get more accurate estimates of execution times
Operating system design
- Real-time vs. general-purpose OS
- Memory management in embedded OS
- Structure of a real-time OS
- OS overhead
- Support for scheduling
- Interprocess communication (IPC) mechanisms
- Power management
- File systems in embedded devices
Memory management in embedded OS
- Memory management is more of a general-purpose OS feature, but many embedded OSes need to take care of some of these tasks
- Memory mapping hardware can be used for memory protection
- A virtual memory system allows programs to use large amounts of memory
Structure of real-time OS
- A typical real-time OS has two key parts:
  - Scheduler
  - Interrupt handling
- Hardware-generated interrupts have higher priority than any process running under the OS
- Interrupts may therefore affect or compromise the real-time performance of the system
- A common technique to avoid this is to split interrupt handling into two parts (a sketch follows this list):
  - Interrupt service routine (ISR)
  - Interrupt service thread (IST)
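A hedged sketch of the ISR/IST split; no particular RTOS API is assumed, and the flag-based signalling below stands in for whatever semaphore or event mechanism a real kernel would provide.

```c
/* Sketch: the ISR only latches the data and signals the IST; the longer
 * processing runs later in thread context under the scheduler. */
#include <stdbool.h>
#include <stdint.h>

static volatile bool     data_ready;     /* set by ISR, cleared by IST */
static volatile uint32_t latched_value;  /* data captured in the ISR   */

/* Interrupt service routine: keep it as short as possible. */
void uart_rx_isr(uint32_t rx_byte) {
    latched_value = rx_byte;   /* grab the data                        */
    data_ready = true;         /* a real RTOS would post a semaphore   */
}

/* Interrupt service thread: does the heavy work in thread context. */
void uart_ist(void) {
    for (;;) {
        if (data_ready) {
            data_ready = false;
            /* ... parse the byte, update protocol state, etc. ... */
            (void)latched_value;
        }
        /* a real IST would block on the semaphore instead of polling */
    }
}
```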
OS overhead
- The OS causes some overhead to the system in the form of context switching
- The time spent on context switches increases when the scheduled processes are short and when system utilization is high (a back-of-the-envelope sketch follows)
- The effect of context switching can be studied with a simulator
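A back-of-the-envelope sketch, with made-up switch cost and quantum lengths, of how the context-switch overhead fraction grows as the scheduled quanta shrink.

```c
/* Sketch: overhead fraction = switch time / (quantum + switch time). */
#include <stdio.h>

int main(void) {
    double switch_us = 10.0;                  /* assumed switch cost */
    double quanta_us[] = { 10000.0, 1000.0, 100.0 };
    for (int i = 0; i < 3; i++) {
        double overhead = switch_us / (quanta_us[i] + switch_us);
        printf("quantum %7.0f us -> overhead %.1f%%\n",
               quanta_us[i], 100.0 * overhead);
    }
    return 0;
}
```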
Hardware support for scheduling
- The scheduler may use a significant fraction of processor time
- One solution for removing this load from the processor is hardware scheduling
- In hardware scheduling the scheduling algorithm is implemented as a co-processor or accelerator, which monitors the processors' state to determine which process will run next on which processor
Interprocess communication mechanisms
- General-purpose systems move large amounts of data with IPC, so they use heavily buffered IPC mechanisms that speed up overall throughput at the expense of latency
- Embedded systems may have to do the same, but without compromising the real-time requirements
- Mailboxes are a common IPC mechanism used in embedded systems (a minimal sketch follows this list)
- A mailbox can be implemented in software or hardware
- A mailbox can have one writer and multiple readers, and it can store only a quite limited amount of data
- Hardware mailboxes are used in some OMAP chips for communication between the ARM and DSP cores
- For larger amounts of data some other solution, such as specialized memories, must be used
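A minimal single-writer software mailbox sketch (illustrative only; the depth is an arbitrary assumption, and a real implementation would add locking for multiple readers or use hardware mailbox registers instead).

```c
/* Sketch: bounded single-writer mailbox as a small circular buffer. */
#include <stdbool.h>
#include <stdint.h>

#define MBOX_DEPTH 4                       /* mailboxes hold little data */

struct mailbox {
    volatile uint32_t msg[MBOX_DEPTH];
    volatile unsigned head, tail;          /* head: next write, tail: next read */
};

bool mbox_post(struct mailbox *m, uint32_t msg) {    /* single writer   */
    unsigned next = (m->head + 1) % MBOX_DEPTH;
    if (next == m->tail) return false;     /* full: caller retries/drops */
    m->msg[m->head] = msg;
    m->head = next;
    return true;
}

bool mbox_fetch(struct mailbox *m, uint32_t *out) {   /* reader side    */
    if (m->tail == m->head) return false;  /* empty                     */
    *out = m->msg[m->tail];
    m->tail = (m->tail + 1) % MBOX_DEPTH;
    return true;
}
```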
Power management
- Dynamic power management changes the system state to optimize energy consumption
- Power management is usually centralized in the OS, which then monitors and manages energy consumption like any other resource
- Centralized power state management makes sure that all interested components get notified of a state change
- In PCs the Advanced Configuration and Power Interface (ACPI) is widely used
- ACPI specifies global power states for power management
File systems in embedded devices
- Have to be designed differently from desktop file systems because of:
  - Energy limitations
  - Small size
  - Physical characteristics of flash memory
- Flash memories are typically written in large blocks
- Flash memories have a quite limited number of erase-write cycles
Verification
- Systems are verified by applying formal methods to abstract models of the system
- Liveness is an important property of a concurrent system
- Absence of deadlock is an important property of communicating processes
- Temporal logic can be used to describe specific system properties (a small example follows this list)
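As an illustration (these formulas are generic textbook examples, not taken from the slides), linear temporal logic can state liveness and safety properties directly:

```latex
\begin{align*}
&\mathbf{G}\,(\mathit{request} \rightarrow \mathbf{F}\,\mathit{grant})
  && \text{liveness: every request is eventually granted} \\
&\mathbf{G}\,\neg\,(\mathit{inCS}_1 \wedge \mathit{inCS}_2)
  && \text{safety: two processes are never in the critical section at once}
\end{align*}
```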
Embedded multiprocessor software
- Similar problems as with traditional multiprocessor systems:
  - Variable delays introduce timing bugs
  - Nonpredictable delays
  - Longer delays for memory accesses
- Embedded multicore processors are usually heterogeneous, which may cause some additional problems:
  - It may require some work to get software running on different processor types to work together; a common problem is endianness (see the sketch after this list)
  - Development tools are often just a package of separate tools for the component processors
  - Different processors have different resources and may have different interfaces to shared resources
  - Communication between processors is not free
  - Scheduling is harder
  - Dynamic resource allocation
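A small sketch of the endianness issue mentioned above: a value written to shared memory by a little-endian core reads back byte-swapped on a big-endian core unless one side converts. The value and the swap helper are illustrative only.

```c
/* Sketch: detect this core's byte order and show the byte-swapped view. */
#include <stdint.h>
#include <stdio.h>

static uint32_t swap32(uint32_t v) {        /* manual byte swap */
    return (v >> 24) | ((v >> 8) & 0x0000FF00u)
         | ((v << 8) & 0x00FF0000u) | (v << 24);
}

int main(void) {
    uint32_t word = 0x11223344u;
    uint8_t first_byte = *(const uint8_t *)&word;
    printf("this core is %s-endian\n", first_byte == 0x44 ? "little" : "big");
    printf("value as seen by an other-endian core: 0x%08X\n", swap32(word));
    return 0;
}
```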
Real-time multiprocessor operating systems
- Role of the OS
- Multiprocessor scheduling
- Scheduling with dynamic tasks
Role of the OS
- Many embedded multiprocessors run a separate OS on each processor
- If the embedded multiprocessor runs a true multiprocessor OS, it can control the activity on each processor more tightly
- Master/slave processors: the master manages everything
- Master/slave kernels: each kernel makes local decisions, and the master kernel hands out the schedules to the slave kernels
- The expense of communication means that the master usually does not have complete knowledge of the slaves' state
Multiprocessor scheduling
- Multiprocessor scheduling is an NP-complete problem
- Heuristics are used to find a "best" schedule (a greedy sketch follows this list)
- Interprocess data dependencies make things more complicated
- Multiprocessor scheduling is much easier on SMP systems than on heterogeneous systems
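A sketch of one very simple greedy heuristic, assigning each task to the currently least-loaded processor; the task lengths are made up, and on a heterogeneous system the per-processor execution times would differ per task.

```c
/* Sketch: greedy "least-loaded processor first" assignment heuristic. */
#include <stdio.h>

#define NPROC 2
#define NTASK 6

int main(void) {
    double task_len[NTASK] = { 5, 3, 8, 2, 4, 6 };
    double load[NPROC] = { 0 };

    for (int t = 0; t < NTASK; t++) {
        int best = 0;                          /* least-loaded processor */
        for (int p = 1; p < NPROC; p++)
            if (load[p] < load[best]) best = p;
        load[best] += task_len[t];
        printf("task %d -> processor %d\n", t, best);
    }
    for (int p = 0; p < NPROC; p++)
        printf("processor %d load: %.0f\n", p, load[p]);
    return 0;
}
```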
Scheduling with dynamic tasks
- If new tasks are created dynamically, it is impossible to guarantee in advance that all requirements will be met
- It must be figured out on-the-fly whether a new task can be accepted and on which processor it will be executed (a simple admission-test sketch follows this list)
- Decisions must be made quickly, since any delay shortens the time the task has left to execute before its deadline
- Load balancing is a form of dynamic task allocation
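A sketch of a utilization-based admission test, under the illustrative assumption that each processor runs EDF locally and can therefore accept tasks as long as its total utilization stays at or below 1; the task parameters and current loads are invented.

```c
/* Sketch: accept a new periodic task on the first processor whose
 * utilization stays <= 1, otherwise reject it. */
#include <stdio.h>

#define NPROC 2

static double util[NPROC] = { 0.55, 0.80 };   /* assumed current loads */

/* Returns the processor the task is admitted to, or -1 if rejected. */
static int admit(double wcet, double period) {
    double u = wcet / period;
    for (int p = 0; p < NPROC; p++)
        if (util[p] + u <= 1.0) { util[p] += u; return p; }
    return -1;
}

int main(void) {
    printf("task (2,10) -> processor %d\n", admit(2.0, 10.0));
    printf("task (6,10) -> processor %d\n", admit(6.0, 10.0));
    return 0;
}
```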
Services and middleware for embedded multiprocessors
- Applications are built by using services offered by the OS or by other software (such as middleware)
- Uses of middleware in embedded systems:
  - Speed up application development by providing basic services
  - Simplify porting of applications from one embedded system to another that supports the same middleware
  - Efficient and correct implementation of key features
- Standards-based services:
  - Common Object Request Broker Architecture (CORBA)
  - RT-CORBA
  - Message Passing Interface (MPI)
Services and middleware for embedded multiprocessors cont.
- System-on-chip services
  - Utilize custom middleware
- Quality-of-service (QoS)
  - A process is reliably scheduled periodically with a given amount of execution time
  - Some types of schedulers, such as RMS, have "built-in" support for QoS
  - Simple QoS model: contract, protocol, scheduler

Figure: Typical software stack (Applications / Application-specific libraries / Interprocess communication / Real-time operating system / Hardware abstraction layer), (C) Wayne Wolf
Design verification of multiprocessor software
- Verifying multiprocessor software is harder than verifying uniprocessor software
- Some common reasons for this:
  - Data is harder to observe and/or control
  - It is harder to get some parts of the system into a desired state
  - Effects of timing are hard to generate and test
- Simulators, especially cycle-accurate simulators, can be used to get information about performance and energy consumption