User-Level Interprocess Communication
for Shared Memory Multiprocessors
Bershad, B. N., Anderson, T. E., Lazowska, E.D., and Levy, H. M.
Presented by: SHILPI AGARWAL
OUTLINE
• InterProcess Communication
  – Its problem
• URPC
• Its Components
  – Processor Reallocation
  – Data Transfer
  – Thread Management
• Performance
  – Latency
  – Throughput
• Conclusion
IPC: INTERPROCESS COMMUNICATION
• Central to the design of the OS.
• Communication between different address spaces on the same machine.
• Allows system decomposition across address-space boundaries:
  – Failure isolation
  – Extensibility
  – Modularity
• Usability of the address spaces depends on the performance of the communication primitives.
Problems:
• IPC is traditionally the responsibility of the kernel.
• Switching from one address space to another on the calling processor in order to run the receiving thread there, and then returning to the caller thread, requires kernel intervention.
• High cost of invoking the kernel and reallocating the processor to a different address space.
• LRPC indicates that 70% of the overhead can be attributed to kernel mediation.
• Degraded performance and added complexity when user-level threads communicate across address-space boundaries.
Solution: URPC for shared memory multiprocessors
• The user-level thread package in each address space can be used to efficiently switch to a different thread whenever a caller or callee thread blocks.
• Thus the kernel can be eliminated from the path of cross-address-space communication.
• Shared memory is used to send messages directly between address spaces.
• Processor reallocation is avoided (a processor already active in the target address space is used instead).
URPC:
• A client thread invokes a procedure at the server.
• It blocks, waiting for the reply.
• While it is blocked, another ready thread in the same address space can run (see the sketch below).
• When the reply arrives, the blocked thread can be rescheduled to any processor allocated to its address space.
• On the server side, the call can be executed by a processor already executing in the same address space.
• In LRPC: the blocked thread and the ready thread are the same, except that they run in different address spaces.
• In URPC: another thread from the same address space is scheduled on the client's processor.
• Advantage: a context switch has less overhead than a processor reallocation.
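The key point in this flow is that "blocking" never involves the kernel: the stub simply switches to another ready thread in the same address space. Below is a small, self-contained C sketch of that switch using POSIX ucontext; the function names and the single reply_ready flag (standing in for the shared reply queue) are illustrative assumptions, not the paper's implementation.

```c
#include <stdio.h>
#include <ucontext.h>

static ucontext_t caller_ctx, other_ctx, main_ctx;
static int reply_ready = 0;               /* stands in for the reply queue */

/* Another ready thread in the same address space: it runs while the
 * caller is "blocked", and here it also fakes the server's reply. */
static void other_work(void)
{
    puts("other thread: doing useful client work");
    reply_ready = 1;                      /* pretend the reply arrived */
    swapcontext(&other_ctx, &caller_ctx); /* reschedule the blocked caller */
}

/* The client stub: send the request, then switch threads instead of trapping. */
static void caller(void)
{
    puts("caller: request sent, waiting for reply");
    while (!reply_ready)                  /* would block -> switch instead */
        swapcontext(&caller_ctx, &other_ctx);
    puts("caller: reply received, stub returns");
    swapcontext(&caller_ctx, &main_ctx);
}

int main(void)
{
    static char stack1[64 * 1024], stack2[64 * 1024];

    getcontext(&caller_ctx);
    caller_ctx.uc_stack.ss_sp = stack1;
    caller_ctx.uc_stack.ss_size = sizeof stack1;
    caller_ctx.uc_link = &main_ctx;
    makecontext(&caller_ctx, caller, 0);

    getcontext(&other_ctx);
    other_ctx.uc_stack.ss_sp = stack2;
    other_ctx.uc_stack.ss_size = sizeof stack2;
    other_ctx.uc_link = &main_ctx;
    makecontext(&other_ctx, other_work, 0);

    swapcontext(&main_ctx, &caller_ctx);  /* start the "caller" thread */
    return 0;
}
```

In the real system the blocked caller would sit on a ready list and be picked up by whichever processor in its address space becomes free, rather than being resumed by a hard-coded partner as in this toy.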
URPC
• Division of responsibilities:
  – Processor reallocation
  – Thread management
  – Data transfer
• Only processor reallocation requires the kernel.
• Thread management and data transfer move to user level.
Components of URPC
Processor Reallocation
• Why should it be avoided?
  – Deciding on and transferring the processor between threads of different address spaces requires privileged kernel mode to access protected mapping registers.
  – It diminishes the benefit of the cache and TLB, whose contents are tied to the old address space.
  – A minimal-latency same-address-space context switch takes about 15 microseconds on the C-VAX, while a cross-address-space processor reallocation takes 55 microseconds (and that doesn't count the long-term costs!).
URPC: Optimistic reallocation policy
• Assumptions (see the decision sketch below):
  – The client has other work to do.
  – The server will soon have a processor with which to service the message.
• It doesn't perform well in all situations:
  – Uniprocessors
  – Real-time applications
  – High-latency I/O operations (which should be initiated early)
  – Priority invocations
• URPC allows forced processor reallocation to address some of these problems.
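To make the policy concrete, here is a hypothetical client-side decision function. The space_state structure and its fields are assumptions for illustration; the paper does not define this interface.

```c
#include <stdbool.h>

/* Illustrative per-address-space state; not an interface from the paper. */
typedef struct {
    int ready_threads;       /* runnable user-level threads in the space  */
    int processors;          /* processors currently held by the space    */
    int pending_requests;    /* unserviced messages in its inbound queues */
} space_state;

/* The optimistic assumption: the client has other work to do, and the
 * server will soon have a processor of its own to service the message.
 * Only when both assumptions fail does the client fall back to a
 * kernel-mediated processor reallocation. */
static bool should_donate_processor(const space_state *client,
                                    const space_state *server)
{
    if (client->ready_threads > 0)   /* client still has work: stay put   */
        return false;
    if (server->processors > 0)      /* server can service it by itself   */
        return false;
    return server->pending_requests > 0;  /* idle client, underpowered server */
}
```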
Advantages over:
• Handoff scheduling: a single kernel operation blocks the client and reallocates its processor directly to the server.
• A centralized kernel data structure: this creates a performance bottleneck (lock contention on thread run queues and message channels).
If needed, processor reallocation is done via the kernel:
• It is needed for the load-balancing problem: an idle processor on the client side can donate itself to an underpowered address space.
• The kernel is required to change the processor's virtual memory context to the underpowered address space.
• The identity of the donating processor is made known to the receiver.
Voluntary Return of Processors
• A processor is returned to the client (sketched below):
  – when all outstanding messages from the client have generated replies, or
  – when the client has become "underpowered."
• Voluntary return of processors cannot be enforced.
• URPC deals with load balancing only for communicating applications.
• Preemptive policies, which forcibly reallocate processors from one address space to another, are required to avoid starvation.
• There is no need for a global processor allocator (the decision can be made by the client itself).
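A minimal sketch of that return test, under the same caveat that the pairing_state bookkeeping below is hypothetical rather than the paper's interface:

```c
#include <stdbool.h>

/* Hypothetical bookkeeping for one client/server pairing. */
typedef struct {
    int  outstanding_calls;      /* client requests not yet replied to       */
    bool client_underpowered;    /* client has ready threads but no processor */
} pairing_state;

/* Should a processor working in the server voluntarily go back to the
 * client's address space? */
static bool should_return_processor(const pairing_state *p)
{
    return p->outstanding_calls == 0 || p->client_underpowered;
}
```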
Sample execution
• Client: an editor
• Two servers: a window manager and a file cache manager
• Two threads: T1 and T2
Data transfer using shared memory
• In traditional RPC:
  – The kernel copies the data between address spaces.
• In URPC:
  – Logical channels of pair-wise shared memory are used.
  – Arguments are passed in buffers that are allocated and pair-wise mapped during binding.
  – Channels are created and mapped once for every client/server pairing.
  – A bidirectional shared-memory queue with test-and-set locks is used for data flow (see the sketch below).
  – Data queues are monitored by application-level thread management.
  – Applications access URPC procedures through a stub layer; the stubs copy data in and out, so applications make no direct use of the shared memory.
• Clients and servers can abuse each other (deny service, fail to release channel locks, provide bogus results); it is up to higher-level protocols to filter such abuses, up to the application layer.
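The queue itself is simple to sketch. Below is a hypothetical C version of one direction of such a channel, using a C11 test-and-set flag as the lock; the names and sizes are illustrative, and the pair-wise mapping of the channel memory into both address spaces (done once at bind time) is left out.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define SLOTS    8                   /* messages per direction (arbitrary) */
#define MSG_SIZE 256                 /* marshaled argument/result bytes    */

typedef struct {
    char data[MSG_SIZE];
} urpc_msg;

/* One direction of a channel.  In URPC this structure would live in memory
 * shared by client and server; initialize `lock` with ATOMIC_FLAG_INIT. */
typedef struct {
    atomic_flag lock;                /* test-and-set lock on the queue */
    unsigned    head, tail;          /* producer / consumer cursors    */
    urpc_msg    slots[SLOTS];
} urpc_queue;

/* Append a message; returns false if the queue is full.  The critical
 * section is tiny, so a test-and-set spin lock is adequate here. */
static bool urpc_enqueue(urpc_queue *q, const urpc_msg *m)
{
    bool ok = false;
    while (atomic_flag_test_and_set(&q->lock))
        ;                            /* spin briefly */
    if (q->head - q->tail < SLOTS) {
        q->slots[q->head % SLOTS] = *m;
        q->head++;
        ok = true;
    }
    atomic_flag_clear(&q->lock);
    return ok;
}

/* Remove a message; returns false if the queue is empty.  Callers do not
 * block in the kernel on emptiness: the user-level thread package simply
 * runs another thread and polls the queue again later. */
static bool urpc_dequeue(urpc_queue *q, urpc_msg *out)
{
    bool ok = false;
    while (atomic_flag_test_and_set(&q->lock))
        ;
    if (q->head != q->tail) {
        *out = q->slots[q->tail % SLOTS];
        q->tail++;
        ok = true;
    }
    atomic_flag_clear(&q->lock);
    return ok;
}
```

A full channel pairs two such queues, one per direction, and the thread-management layer polls the reply queue rather than blocking in the kernel.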
Thread Management
• There is strong interaction between thread management (start, stop) and cross-address-space communication (send, receive).
• This close interaction can be exploited to achieve extremely good performance for both when they are implemented together at user level.
• Thread management facilities can be provided at either kernel or user level, but high performance comes from the user level.
• Thread overhead can be judged against three points of reference:
  – Heavyweight: no distinction between a thread and its address space.
  – Middleweight: threads and address spaces are decoupled.
  – Lightweight: threads are managed by user-level libraries; this implies two-level scheduling (lightweight threads on top of weightier kernel threads; see the sketch below).
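As a rough illustration of the lightweight case, the sketch below runs user-level work items on top of a couple of kernel threads (pthreads). It is only a toy model of two-level scheduling: the task functions stand in for lightweight threads (real ones carry their own stacks and can block and resume, as in the earlier ucontext sketch), and none of the names come from the paper.

```c
#include <pthread.h>
#include <stdio.h>

#define TASKS    8
#define KTHREADS 2

/* A user-level "run queue" of lightweight work items. */
static void (*run_queue[TASKS])(int);
static int next_task = 0;
static pthread_mutex_t q_lock = PTHREAD_MUTEX_INITIALIZER;

static void task_body(int id)
{
    printf("lightweight thread %d ran at user level\n", id);
}

/* Each kernel thread (the weightier level) repeatedly asks the user-level
 * scheduler for the next ready lightweight thread and runs it. */
static void *kernel_thread(void *arg)
{
    (void)arg;
    for (;;) {
        pthread_mutex_lock(&q_lock);
        int i = (next_task < TASKS) ? next_task++ : -1;
        pthread_mutex_unlock(&q_lock);
        if (i < 0)
            return NULL;             /* no more user-level work */
        run_queue[i](i);
    }
}

int main(void)
{
    pthread_t kt[KTHREADS];
    for (int i = 0; i < TASKS; i++)
        run_queue[i] = task_body;
    for (int i = 0; i < KTHREADS; i++)
        pthread_create(&kt[i], NULL, kernel_thread, NULL);
    for (int i = 0; i < KTHREADS; i++)
        pthread_join(kt[i], NULL);
    return 0;
}
```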
Performance of URPC
Call Latency and Throughput
• In the measurements, T is the number of calling threads, C the number of processors in the client, and S the number of processors in the server.
• Latency increases when T > C + S; it is proportional to the number of threads per processor.
• With T = C = S = 1, call latency is 93 microseconds.
• C = 1, S = 0 gives the worst performance (processors must be reallocated frequently).
• In both cases, C = 2, S = 2 yields the best performance.
Problems with URPC:
• When T = 1, latency is 373 microseconds: every call requires two traps and two processor reallocations. At this point URPC performs worse than LRPC (157 microseconds).
• Why?
  1. Processor reallocation in URPC is based on that of LRPC.
  2. URPC is integrated with two-level scheduling, which must decide:
     – Is there an idle processor?
     – Is there an underpowered address space to which it can be reallocated?
• Two processors are used for a single computation, but only one is active at a time (due to the synchronous nature of RPC).
• Not ideal for all application types:
  – Single-threaded applications
  – High-latency I/O
Conclusion
• Better performance and flexibility result when traditional OS functions are moved out of the kernel.
• URPC defines an appropriate division of responsibility between user level and kernel.
• URPC demonstrates a design specific to a multiprocessor, not just a uniprocessor design that happens to run on multiprocessor hardware.