CIS 620
Advanced Operating Systems
Lecture 6 – Communication
Prof. Timothy Arndt
BU 331
Layered Protocols
▪ As we saw previously, network software is often structured as a layered protocol suite. We will now examine these protocols in somewhat more detail.
▪ Protocol: An agreement between communicating parties on how communication is to proceed.
• Error correction codes.
• Block size.
• Ack/Nak.
Layered Protocols
▪ Layered protocol: The protocol decisions concern very different things.
▪ Low-level details: How many volts represent a 1 or a 0? How wide is the pulse?
▪ Higher-level details:
• Error correction
• Routing
• Sequencing
▪ As a result, there are many routines that each work on one of these aspects. They are organized into layers.
Layered Protocols
▪ Layer X of the sender acts as if it is communicating directly with layer X of the receiver, but in fact it is communicating with layer X-1 of the sender.
▪ Similarly, layer X of the sender acts as a virtual layer X+1 of the receiver to layer X+1 of the sender.
▪ A famous example is the ISO OSI model (the Open Systems Interconnection Reference Model of the International Organization for Standardization).
Layered Protocols
▪ So, for example, the network layer sends messages intended for the other network layer, but in fact sends them to the data link layer.
▪ Also, the network layer must accept messages from the transport layer, which it then sends to the other network layer (really its own data link layer).
• What a layer really does to a message it receives is add a header (and maybe a trailer) that is to be interpreted by its corresponding layer in the receiver.
Layered Protocols
▪ So the network layer adds a header (in front of the transport layer's header) and sends the message to the other network layer (really its own data link layer, which adds a header in front of the network layer's header, and a trailer).
▪ So headers get added as you go down the sender's layers (often called the protocol stack or protocol suite).
▪ They get used (and stripped off) as the message goes up the receiver's stack.
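The header handling described above can be sketched in a few lines of Python. This is only an illustration: the three layer names, the bracketed header strings, and the function names `send_down` and `receive_up` are assumptions made for the example, not part of any real protocol.

```python
# Hypothetical three-layer stack, listed from the top layer down.
LAYERS = ["transport", "network", "datalink"]
TRAILER = b"[datalink-trl]"  # in this sketch only the data link layer adds a trailer


def send_down(payload: bytes) -> bytes:
    """Add one header per layer as the message goes down the sender's stack."""
    message = payload
    for layer in LAYERS:
        message = f"[{layer}-hdr]".encode() + message
    return message + TRAILER


def receive_up(frame: bytes) -> bytes:
    """Check and strip each header as the message goes up the receiver's stack."""
    if not frame.endswith(TRAILER):
        raise ValueError("missing data link trailer")
    message = frame[:-len(TRAILER)]
    for layer in reversed(LAYERS):
        header = f"[{layer}-hdr]".encode()
        if not message.startswith(header):
            raise ValueError(f"missing {layer} header")
        message = message[len(header):]
    return message


wire = send_down(b"hello")
print(wire)
print(receive_up(wire))
```

Note that the lowest layer's header ends up outermost, which is why the receiver strips headers in the reverse order.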
Layered Protocols
▪ It all starts with process A sending a message. By the time it reaches the wire it has 6 headers (the physical layer doesn't add one. Why?) and one trailer.
• The nice thing is that the layers are independent. You can change one layer and not change the others.
▪ Physical layer: hardware, i.e. voltages, speeds, connectors.
▪ Data link layer: Error correction and detection. "Group the bits into units called frames".
Layered Protocols
• Frames contain error detection (and correction) bits.
• This is what the pair of data link layers do when viewed as an extension of the physical layer.
• But when being used, the sending data link layer gets a packet from the network layer, breaks it into frames, and adds the error detection bits.
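A minimal sketch of this framing step, assuming a made-up frame payload size of 4 bytes and using CRC-32 from Python's standard `zlib` module as the error detection code (real data link layers choose their own frame sizes and codes):

```python
import zlib
from typing import List, Optional

FRAME_PAYLOAD = 4  # hypothetical payload size per frame, in bytes


def make_frames(packet: bytes) -> List[bytes]:
    """Break a network-layer packet into frames, each followed by a CRC-32."""
    frames = []
    for start in range(0, len(packet), FRAME_PAYLOAD):
        chunk = packet[start:start + FRAME_PAYLOAD]
        frames.append(chunk + zlib.crc32(chunk).to_bytes(4, "big"))
    return frames


def check_frame(frame: bytes) -> Optional[bytes]:
    """Return the payload if the CRC matches, or None if the frame is damaged."""
    chunk, crc = frame[:-4], frame[-4:]
    if zlib.crc32(chunk).to_bytes(4, "big") != crc:
        return None
    return chunk


packet = b"network packet"
frames = make_frames(packet)
damaged = bytes([frames[0][0] ^ 0x01]) + frames[0][1:]  # flip one bit
print(len(frames), check_frame(frames[0]), check_frame(damaged))
```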
Data Link Layer
[Figure 2-3: Discussion between a receiver and a sender in the data link layer.]
Layered Protocols
▪ Network layer: Routing.
• Connection-oriented network-layer protocols: X.25 or ATM.
▪ Send a message to the destination and establish a route that will be used for further messages during this connection (a connection number is given).
▪ Like a telephone call.
• Connectionless: IP (Internet Protocol).
▪ Each packet (message between the network layers) is routed separately.
▪ Like the post office.
Layered Protocols
▪ Transport layer: Makes communication reliable and ordered (but not always).
• Breaks the incoming message into packets and sends them to the corresponding transport layer (really send to ...). The packets are sequence numbered.
• The header contains information about which packets have been sent and received.
• These sequence numbers are for the end-to-end message.
Layered Protocols
• For example, if grail.cba.csuohio.edu sends a message to www.microsoft.com, the transport layer breaks the message into packets and numbers the packets.
▪ These packets may take different routes.
▪ On any one hop the data link layer keeps the frames ordered.
• If you use a connection-oriented network layer, there is little for the transport layer to do.
• If you use IP for the network layer, there is a lot to do.
• If you use connection-oriented TCP as the transport layer of a client-server system, it is slower than it needs to be.
▪ Transactional TCP can be used instead.
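The end-to-end sequence numbering done by the transport layer can be sketched as follows. The packet size of 8 bytes and the helper names `segment` and `reassemble` are assumptions for the illustration; reordering and duplication by the network are simulated with a shuffle.

```python
import random


def segment(message: bytes, size: int):
    """Break a message into (sequence number, data) packets."""
    return [(seq, message[start:start + size])
            for seq, start in enumerate(range(0, len(message), size))]


def reassemble(packets):
    """Rebuild the message; duplicates are dropped and order is restored."""
    by_seq = {}
    for seq, data in packets:
        by_seq.setdefault(seq, data)
    return b"".join(by_seq[seq] for seq in sorted(by_seq))


message = b"packets may take different routes"
packets = segment(message, 8)
arrived = packets + [packets[0]]      # one packet is delivered twice
random.Random(1).shuffle(arrived)     # packets arrive out of order
print(reassemble(arrived) == message)
```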
Client-Server TCP
[Figure 2-4: (a) Normal operation of TCP. (b) Transactional TCP.]
Layered Protocols
▪ Session layer: Dialog and synchronization.
• Dialog control
• Synchronization facilities
▪ Presentation layer: Describes the "meaning" of fields.
• Record definition
▪ Application layer: For specific applications (e.g. mail, news, ftp).
• Middleware logically resides in the application layer, but contains functionality that is quite general:
▪ Authentication
▪ Authorization
▪ Multicast, etc.
• This leads to a slightly modified reference model.
Middleware Protocols
[Figure 2-5: An adapted reference model for networked communication.]
Remote Procedure Call (RPC)
• Developed by Birrell and Nelson (1984).
▪ Recall how different the client code for copying a file was from the normal centralized (uniprocessor) code.
▪ Let’s make the client-server request-reply look like a normal procedure call and return.
▪ Notice that getchar in the centralized version turns into a read system call. The following is for Unix:
• read looks like a normal procedure to its caller.
Remote Procedure Call (RPC)
• read is a user-mode program.
• read manipulates registers and then does a trap to the kernel.
• After the trap, the kernel manipulates registers and then calls a C-language routine, and lots of work gets done (drivers, disks, etc.).
• After the I/O, the process gets unblocked, the kernel read manipulates registers, and returns. The user-mode read manipulates registers and returns to the original caller.
▪ Let’s do something similar with request-reply:
Remote Procedure Call (RPC)
▪ The user (client) does a subroutine call to getchar (or read).
• The client knows nothing about messages.
▪ We link in a user-mode program called the client stub (analogous to the user-mode read above).
• It takes the parameters to read and converts them to a message (marshals the arguments).
• It sends the message to the machine containing the server, directed to a server stub.
• It does a blocking receive (of the reply message).
Remote Procedure Call (RPC)
▪ The server stub is linked with the server.
• It receives the message from the client stub.
• It unmarshals the arguments and calls the server (as a subroutine).
▪ The server procedure does what it does and returns (to the server stub).
• The server knows nothing about messages.
▪ The server stub now converts the result to a reply message sent to the client stub.
• It marshals the return values.
Remote Procedure Call (RPC)
▪ The client stub unblocks and receives the reply.
• It unmarshals the return values.
• It returns to the client.
▪ The client believes (correctly) that the routine it called has returned just like a normal procedure does.
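The whole request-reply path can be sketched in a single process. Everything here is illustrative: JSON stands in for the marshalling format, the `transport` function stands in for sending the request and doing the blocking receive of the reply, and the procedure `add` and the stub names are invented for the example.

```python
import json


# --- server side -----------------------------------------------------
def add(a, b):
    """The server procedure; it knows nothing about messages."""
    return a + b


PROCEDURES = {"add": add}


def server_stub(request: bytes) -> bytes:
    call = json.loads(request.decode())                # unmarshal the arguments
    result = PROCEDURES[call["proc"]](*call["args"])   # call the server as a subroutine
    return json.dumps({"result": result}).encode()     # marshal the reply


# --- stand-in for the network ------------------------------------------
def transport(request: bytes) -> bytes:
    """Deliver the request to the server stub and return its reply."""
    return server_stub(request)


# --- client side -----------------------------------------------------
def remote_add(a, b):
    """Client stub: looks like an ordinary procedure to its caller."""
    request = json.dumps({"proc": "add", "args": [a, b]}).encode()
    reply = transport(request)                         # send, then wait for the reply
    return json.loads(reply.decode())["result"]


print(remote_add(2, 3))
```

The caller of `remote_add` never sees a message, which is the transparency the stubs are meant to provide.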
Passing Value Parameters (1)
[Figure 2-8: Steps involved in doing remote computation through RPC.]
Remote Procedure Call (RPC)
▪ Heterogeneity: Machines have different data formats.
• How can we handle these differences in RPC?
▪ Have conversions between all possibilities.
▪ Done during marshalling and unmarshalling.
• Adopt a standard and convert to/from it.
Passing Value Parameters (2)
[Figure: (a) Original message on the Pentium. (b) The message after receipt on the SPARC. (c) The message after being inverted. The little numbers in boxes indicate the address of each byte.]
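The byte-order problem can be reproduced with Python's standard `struct` module. The message contents (the integer 5 and the string "JILL") are chosen only for illustration; `<` means little-endian (as on the Pentium), `>` means big-endian (as on the SPARC), and `!` is network byte order, used here as an example of an agreed standard format.

```python
import struct

# (a) The message as laid out by a little-endian sender.
original = struct.pack("<i4s", 5, b"JILL")

# (b) A big-endian receiver reads the same bytes without any conversion.
n_wrong, name_ok = struct.unpack(">i4s", original)
print(n_wrong, name_ok)     # the integer is wrong, the string is fine

# (c) Blindly inverting the bytes of every 4-byte word fixes the integer
#     but scrambles the string.
inverted = b"".join(original[i:i + 4][::-1] for i in range(0, len(original), 4))
n_ok, name_wrong = struct.unpack(">i4s", inverted)
print(n_ok, name_wrong)

# Converting to and from one standard format during (un)marshalling avoids both problems.
wire = struct.pack("!i4s", 5, b"JILL")
print(struct.unpack("!i4s", wire))
```

The stub therefore needs to know the type of each field, not just the raw bytes.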
Remote Procedure Call (RPC)
▪ Pointers: Avoid them for RPC!
• Can put the object pointed to into the message itself (assuming you know its length).
• Convert call-by-reference to copy-in/copy-out.
▪ If we have in or out parameters (instead of in out), we can eliminate one of the copies.
• Change the server to handle pointers in a special way.
▪ Callback to the client stub
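Copy-in/copy-out can be sketched as below. The procedure `increment_all` and the stub name are hypothetical, and a JSON round trip stands in for sending the array contents in the request and reply messages.

```python
import json


def increment_all(values):
    """Server procedure: modifies the array it was given (its own copy)."""
    for i in range(len(values)):
        values[i] += 1


def increment_all_stub(buffer):
    """Client stub for a call-by-reference parameter."""
    request = json.dumps(buffer)          # copy in: the contents travel, not the pointer
    server_copy = json.loads(request)     # the server works on a separate copy
    increment_all(server_copy)
    reply = json.dumps(server_copy)
    buffer[:] = json.loads(reply)         # copy out: overwrite the caller's array


data = [1, 2, 3]
increment_all_stub(data)
print(data)
```

For a pure in parameter the copy out could be skipped, and for a pure out parameter the copy in could be skipped.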
Registering and name servers
▪ As we said before, we can use a name server.
▪ This permits the server to move using the following process:
• Deregister from the name server
• Move
• Reregister
▪ This is sometimes called dynamic binding.
Registering and name servers
▪ The client stub calls the name server (binder) the first time to get a handle to use in the future.
• Either there is a callback from the binder to the client stub if the server deregisters, or the attempt to use the old handle fails so that the client stub goes to the binder again.
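A minimal sketch of dynamic binding, using the second option (a failed handle sends the client stub back to the binder). The `Binder` and `ClientStub` classes, the address strings, and the `servers` dictionary that stands in for the network are all assumptions made for the example.

```python
class Binder:
    """Name server: maps a service name to the server's current address."""

    def __init__(self):
        self._table = {}

    def register(self, service, address):
        self._table[service] = address

    def deregister(self, service):
        self._table.pop(service, None)

    def lookup(self, service):
        return self._table[service]


class ClientStub:
    def __init__(self, binder, service):
        self.binder, self.service = binder, service
        self.handle = None                    # cached after the first lookup

    def call(self, servers, arg):
        if self.handle is None:
            self.handle = self.binder.lookup(self.service)
        server = servers.get(self.handle)
        if server is None:                    # stale handle: ask the binder again
            self.handle = self.binder.lookup(self.service)
            server = servers[self.handle]
        return server(arg)


servers = {"hostA:1": lambda x: 2 * x}        # address -> running server
binder = Binder()
binder.register("double", "hostA:1")
stub = ClientStub(binder, "double")
first = stub.call(servers, 4)

binder.deregister("double")                   # the server moves
servers = {"hostB:2": lambda x: 2 * x}
binder.register("double", "hostB:2")
second = stub.call(servers, 5)
print(first, second, stub.handle)
```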
RPC Failures
• This gets hard and ugly.
• Can't find the server.
▪ Need some sort of out-of-band response from the client stub to the client.
▪ Ada exceptions
▪ C signals
▪ Multithread the client and start the "exception" thread.
▪ This loses transparency (centralized systems don't have this).
RPC Failures
• Lost request message.
▪ This is easy if known, that is, if we are sure the request was lost.
▪ Also easy if the request is idempotent and we think it might be lost.
• Simply retransmit the request.
• Assumes the client still knows the request.
• Lost reply message.
▪ If it is known the reply was lost, have the server retransmit.
RPC Failures
• Assumes the server still has the reply.
• How long should the server hold the reply?
▪ Wait forever for the reply to be ack'ed? No!
▪ Discard after "enough" time.
▪ Discard after we receive another request from this client.
▪ Ask the client if the reply was received.
▪ Keep resending the reply.
▪ What if we are not sure whether we lost the request or the reply?
• If the server is stateless, it doesn't know and the client can't tell!
• If idempotent, simply retransmit the request.
RPC Failures
▪ What if the request is not idempotent and we can't tell whether we lost the request or the reply?
• Use sequence numbers so the server can tell whether this is a new request or a retransmission of a request it has already done.
• Doesn't work for stateless servers.
• Server crashes
▪ Did it crash before or after doing some nonidempotent action?
▪ Can't tell from messages.
RPC Failures
▪ From databases, we get the idea of transactions and commits.
• This really does solve the problem but is not cheap.
▪ It is fairly easy to get “at least once” (try the request again if the timer expires) or “at most once” (give up if the timer expires) semantics. It is hard to get “exactly once” without transactions.
▪ To be more precise: a transaction either happens exactly once or not at all (sounds like at most once) and the client knows which.
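The combination of client retransmission and server-side sequence numbers can be sketched as follows. The `Server` class, the deposit operation, and the `lost_replies` parameter (which simulates how many replies the network drops) are invented for the illustration. The reply cache is exactly the kind of state a stateless server does not keep, and it does not survive a server crash.

```python
class Server:
    def __init__(self):
        self.balance = 0
        self.replies = {}                    # (client, seq) -> reply already sent

    def deposit(self, client, seq, amount):
        key = (client, seq)
        if key in self.replies:              # retransmission: do not redo the work
            return self.replies[key]
        self.balance += amount               # the non-idempotent action
        self.replies[key] = self.balance
        return self.balance


def call_with_retry(server, client, seq, amount, lost_replies):
    """Retransmit the same request until a reply gets through."""
    attempts = 0
    while True:
        attempts += 1
        reply = server.deposit(client, seq, amount)
        if attempts <= lost_replies:
            continue                         # reply lost, timer expires, send again
        return reply, attempts


server = Server()
r1 = call_with_retry(server, "c1", seq=1, amount=100, lost_replies=2)
r2 = call_with_retry(server, "c1", seq=2, amount=50, lost_replies=0)
print(r1, r2, server.balance)
```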
RPC Failures
• Client crashes
▪ Orphan computations exist.
▪ Again, transactions work but are expensive.
▪ We can have the rebooted client start another epoch; all computations of the previous epoch are killed and clients resubmit.
• It is better to let old computations whose owners can be found continue.
▪ This isn’t a great solution.
RPC Failures
• An orphan may hold locks or might have done something not easily undone.
▪ Serious programming is needed.
Implementation Issues
• Protocol choice
▪ Existing ones like UDP are designed for harder (more general) cases and so are not efficient.
▪ Often developers of distributed systems invent their own protocol that is more efficient.
• But of course they are all different.
▪ On a LAN we would like large messages since they are more efficient and don't take so long considering the high data rate.
Implementation Issues
• Acks
▪ One per packet vs. one per message.
▪ Called stop-and-wait and blast, respectively.
• In stop-and-wait, wait for each ack.
• In blast, keep sending packets until the message is finished.
▪ Could also do a hybrid.
• Blast but ack each packet.
• Blast but request only those packets that are missing instead of sending a general nak.
▪ Called selective repeat.
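Blast with selective repeat can be sketched as below. The functions are hypothetical, and loss is simulated by passing in the set of packet indices dropped in the first round; retransmitted packets are assumed to arrive.

```python
def blast(packets, lost):
    """Send every packet without waiting for acks; `lost` holds dropped indices."""
    return {i: p for i, p in enumerate(packets) if i not in lost}


def missing(received, total):
    """What the receiver asks for instead of sending a general nak."""
    return [i for i in range(total) if i not in received]


def transfer(packets, lost_first_round):
    received = blast(packets, lost_first_round)
    rounds = 1
    while True:
        need = missing(received, len(packets))
        if not need:
            break
        for i in need:                 # resend only the missing packets
            received[i] = packets[i]   # assumption: retransmissions are not lost
        rounds += 1
    return [received[i] for i in range(len(packets))], rounds


message, rounds = transfer(list("abcdef"), lost_first_round={1, 4})
print(message, rounds)
```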
Implementation Issues
• Flow control
▪ Buffer overrun problem.
• The Internet worm was caused by a buffer overrun that rewrote non-buffer space. This is not the problem here.
• Can occur right at the interface chip, in which case the (later) packet is lost.
• More likely with blast, but can occur with stop-and-wait if we have multiple senders.
Implementation Issues
▪ What to do
• If the chip needs a delay to do back-to-back receives, have the sender delay that amount.
• If we can only buffer n packets, have the sender send only n and then wait for an ack.
• The above fails when we have simultaneous sends. But hopefully that is not too common.
• This tuning to the specific hardware present is one reason why general protocols don't work as well as specialized ones.
Implementation Issues
▪ Why is RPC slow? On the sending side we have to...
• Call the stub
• Get a message buffer
• Marshal the parameters
• If using UDP, compute the checksum
• Fill in the headers
• Copy the message to kernel space (unless we have a special kernel)
• Put in the real destination address
• Start DMA to the communication device
• ---------------- wire time
Implementation Issues
▪ Why is RPC slow? On the receiving side we have to...
• Process the interrupt (or polling delay)
• Check the packet
• Determine the relevant stub
• Copy to the stub address space (unless we have a special kernel)
• Unmarshal
• Call the server
▪ On the Paragon (large Intel MPP of a few years ago), a variety of the above took 30ms of which 1ms was wire time.
Implementation Issues
• Eliminating copying
▪ Message transmission is essentially a copy, so the minimum number of copies is 1.
• This requires the network device to do its DMA from the user buffer (client stub) directly into the server stub.
• But it is hard for the receiver to know where to put the message until it arrives and is inspected.
• Sounds like a copy is needed from the receiving buffer to the server stub.
• We can avoid this by adjusting memory maps.
Implementation Issues
▪ Messages must then be full pages (as that is what is mapped).
▪ Normally there are two copies on the receiving side.
• From a hardware buffer to a kernel buffer.
• From the kernel buffer to user space (server stub).
▪ Often there are two on the sending side.
• User space (client stub) to kernel buffer.
• Kernel buffer to buffer on device.
• Then start the device.
▪ The copies on the sending side can be reduced.
Implementation Issues
• The device can do DMA from the kernel buffer, thus eliminating the second.
• Doing DMA from user space would eliminate the first, but we would need scatter-gather (just gather here), because the header must be in kernel space: the user is not allowed to set it (for security).
▪ To eliminate the two on the receiver side is harder.
• We can eliminate the first if the device writes directly into a kernel buffer.
• To eliminate the second requires the remapping trick.
Implementation Issues
• Timers and timeout values
▪ Getting a good value for the timeouts is a black art.
• Too small a value leads to many unneeded retransmissions.
• Too large a value causes us to wait too long when a message is lost.
• Should it be adaptive??
▪ If we find that we sent an extra message then raise the timeout value for this class of transmissions.
▪ If timeout expires most of the time, lower the value for this class.
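One widely used adaptive scheme, which is not the rule given on the slide but a swapped-in technique, is the smoothed round-trip-time estimator that TCP uses (specified in RFC 6298). The sketch below follows that estimator's update equations and constants; the class name is hypothetical, and the clock-granularity term and the minimum timeout from the RFC are left out for brevity.

```python
class RtoEstimator:
    """Retransmission timeout from measured round-trip times (after RFC 6298)."""

    ALPHA = 1 / 8     # weight of a new sample in the smoothed RTT
    BETA = 1 / 4      # weight of a new sample in the RTT variation
    K = 4             # how many variations of safety margin to add

    def __init__(self):
        self.srtt = None      # smoothed round-trip time
        self.rttvar = None    # smoothed round-trip time variation

    def update(self, rtt):
        """Feed one RTT measurement; return the new timeout value."""
        if self.srtt is None:                       # first measurement
            self.srtt = rtt
            self.rttvar = rtt / 2
        else:
            self.rttvar = (1 - self.BETA) * self.rttvar + self.BETA * abs(self.srtt - rtt)
            self.srtt = (1 - self.ALPHA) * self.srtt + self.ALPHA * rtt
        return self.srtt + self.K * self.rttvar


est = RtoEstimator()
timeouts = [est.update(rtt) for rtt in (100.0, 100.0, 100.0, 400.0)]
print(timeouts)
```

With steady round-trip times the timeout shrinks toward the measured value, and a sudden slow reply pushes it back up.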
Implementation Issues
▪ How to keep timeout values?
• If you know that almost all timers of this class are going to go off (alarms) and accuracy is important, then keep a list sorted by time to alarm.
▪ Only the head has to be checked for an expired timer (so we can do it frequently).
▪ Additions must search for a place to add.
▪ Deletions (cancelled alarms) are presumed rare.
• If deletions are common and we can afford a less accurate alarm, then sweep a list of all processes (not so frequently, since accuracy is not required).
• Deletions and additions are easy since the list is indexed by process number.
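The two ways of keeping timers can be sketched side by side. The class names are hypothetical, and time is a plain number supplied by the caller rather than a real clock.

```python
import bisect


class SortedTimerList:
    """Timers sorted by deadline: cheap expiry check, costlier add and cancel."""

    def __init__(self):
        self._timers = []                       # (deadline, pid), kept sorted

    def add(self, deadline, pid):
        bisect.insort(self._timers, (deadline, pid))   # search for the place to add

    def cancel(self, pid):                      # presumed rare, so a full scan is fine
        self._timers = [t for t in self._timers if t[1] != pid]

    def expire(self, now):
        fired = []
        while self._timers and self._timers[0][0] <= now:   # only look at the head
            fired.append(self._timers.pop(0)[1])
        return fired


class TimerTable:
    """One slot per process: cheap add and cancel, periodic sweep of every slot."""

    def __init__(self, nprocs):
        self._deadline = [None] * nprocs

    def add(self, deadline, pid):
        self._deadline[pid] = deadline

    def cancel(self, pid):
        self._deadline[pid] = None

    def sweep(self, now):
        fired = []
        for pid, deadline in enumerate(self._deadline):
            if deadline is not None and deadline <= now:
                fired.append(pid)
                self._deadline[pid] = None
        return fired


sorted_list = SortedTimerList()
table = TimerTable(nprocs=4)
for deadline, pid in [(30, 0), (10, 1), (20, 2)]:
    sorted_list.add(deadline, pid)
    table.add(deadline, pid)
table.cancel(1)
print(sorted_list.expire(now=20), table.sweep(now=25))
```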
Implementation Issues
• Difficulties with RPC
▪ Global variables like errno inherently have shared-variable semantics and so they don't fit in a distributed system.
• One (remote) procedure sets the variable and the local procedure is supposed to see it.
• But the setting is a normal store, so it is not seen by the communication system.
• So transparency is violated.
Implementation Issues
▪ Weak typing (as in C) makes marshalling hard or impossible.
• How big is the object we should copy?
• What conversion is needed if the system is heterogeneous?
• So transparency is violated.
How does a programmer create a program with RPC?
• uuidgen generates a unique identifier for the RPC
• Include it in an IDL (interface description language) file and describe the interface for the RPC in the file as well
• Write the client and server code
• Client and server stubs are generated from the IDL file automatically
• Link things together and run on the desired machines
Writing a Client and a Server
[Figure 2-14: The steps in writing a client and a server in DCE RPC.]
Binding a Client to an Object
• Unlike RPC, distributed objects have systemwide object references
• The system may support either implicit binding or explicit binding
• The object reference may contain the IP address, port, and object name
• Or it may use a location server, so that we need only the address of this server plus the object name
Binding a Client to an Object
(a) Example with implicit binding using only global references
Distr_object* obj_ref;        // Declare a systemwide object reference
obj_ref = …;                  // Initialize the reference to a distributed object
obj_ref->do_something();      // Implicitly bind and invoke a method
(b) Example with explicit binding using global and local references
Distr_object obj_ref;         // Declare a systemwide object reference
Local_object* obj_ptr;        // Declare a pointer to local objects
obj_ref = …;                  // Initialize the reference to a distributed object
obj_ptr = bind(obj_ref);      // Explicitly bind and obtain a pointer to the local proxy
obj_ptr->do_something();      // Invoke a method on the local proxy
Parameter Passing
• Since we have systemwide object references, we don’t have the same types of problems we had with RPCs and pointers
• However, for performance reasons we may want to treat object reference parameters differently depending on where the object resides
Parameter Passing
[Figure 2-18: The situation when passing an object by reference or by value.]
Java RMI
• Java offers remote objects as the only type of distributed object
• One difference between local and remote objects is that synchronized methods work differently on the two types
▪ Blocking applies only to the proxies of the remote objects
• A parameter passed to an RMI must be serializable
Message-Oriented Communication
• Neither RPC nor RMI works when we can’t assume that the receiving side is executing
▪ We can use messaging in this case
Message-Oriented Communication
[Figure 2-20: General organization of a communication system in which hosts are connected through a network.]
Messaging Modes
• Messaging systems can be either persistent or transient
▪ Are messages retained when the senders and/or receivers stop executing?
• Can also be either synchronous or asynchronous
▪ Blocking vs. non-blocking
Persistent Communication
[Figure: Persistent communication of letters back in the days of the Pony Express.]
Persistence and Synchronicity in Communication
[Figure 2-22: (a) Persistent asynchronous communication. (b) Persistent synchronous communication. (c) Transient asynchronous communication. (d) Receipt-based transient synchronous communication. (e) Delivery-based transient synchronous communication at message delivery. (f) Response-based transient synchronous communication.]
Message-Oriented Transient Communication
• Sockets are an example of message-oriented transient communication
• The Message-Passing Interface (MPI) is a newer set of message-oriented primitives for multicomputers
▪ MPI communication takes place within a known group of processes
• A (groupID, processID) pair uniquely identifies a source or destination of a message
The Message-Passing Interface (MPI)
• Some of the most intuitive message-passing primitives of MPI.
Primitive       Meaning
MPI_bsend       Append outgoing message to a local send buffer
MPI_send        Send a message and wait until copied to local or remote buffer
MPI_ssend       Send a message and wait until receipt starts
MPI_sendrecv    Send a message and wait for reply
MPI_isend       Pass reference to outgoing message, and continue
MPI_issend      Pass reference to outgoing message, and wait until receipt starts
MPI_recv        Receive a message; block if there are none
MPI_irecv       Check if there is an incoming message, but do not block
Message-Oriented Persistent Communication
• Known as message-queuing systems or Message-Oriented Middleware (MOM)
▪ Support persistent asynchronous communication
▪ Generally have slow communications
▪ Similar to e-mail systems
▪ Basic model: applications communicate by inserting messages in specific queues
Message-Queuing Model
[Figure 2-26: Four combinations for loosely-coupled communications using queues.]
Message-Queuing Model
• Basic interface to a queue in a message-queuing system.
Primitive   Meaning
Put         Append a message to a specified queue
Get         Block until the specified queue is nonempty, and remove the first message
Poll        Check a specified queue for messages, and remove the first. Never block.
Notify      Install a handler to be called when a message is put into the specified queue.
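The four primitives can be sketched on top of Python's standard `queue.Queue`. This is a single-process illustration only: the `MessageQueue` class and the handler signature are assumptions, and nothing here is persistent or distributed.

```python
import queue


class MessageQueue:
    """Put / Get / Poll / Notify on one in-memory queue."""

    def __init__(self):
        self._q = queue.Queue()
        self._handler = None

    def put(self, message):
        """Append a message; call the installed handler, if any."""
        self._q.put(message)
        if self._handler is not None:
            self._handler(message)

    def get(self):
        """Block until the queue is nonempty, then remove the first message."""
        return self._q.get()

    def poll(self):
        """Remove the first message if there is one; never block."""
        try:
            return self._q.get_nowait()
        except queue.Empty:
            return None

    def notify(self, handler):
        """Install a handler to be called when a message is put into the queue."""
        self._handler = handler


seen = []
q = MessageQueue()
q.notify(seen.append)
empty = q.poll()
q.put("order-1")
q.put("order-2")
print(empty, q.get(), q.poll(), q.poll(), seen)
```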
General Architecture of a Message-Queuing System
• Messages are inserted into a local source queue
▪ The message contains the name of a destination queue
• The message-queuing system transfers messages to the destination queue
▪ It uses a database which maps queue names to network locations
General Architecture of a Message-Queuing System
• Queues are managed by queue managers
▪ Special queue managers act as relays which forward messages to other managers
General Architecture of a Message-Queuing System
[Figure: The relationship between queue-level addressing and network-level addressing.]
General Architecture of a Message-Queuing System
[Figure 2-29: The general organization of a message-queuing system with routers.]
Message Brokers
• Message-queuing systems can be used to integrate existing and new applications
▪ These diverse applications have different message formats
▪ Since we have old apps, we can’t use a standard message format
▪ So use message brokers which convert messages from one format to another
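A message broker's conversion step can be sketched as a table of converters keyed by (source format, destination format). The two formats below, a `key=value;key=value` legacy format and JSON, are invented for the example.

```python
import json


def legacy_to_json(message: str) -> str:
    """Convert 'id=7;name=Jill' style text into a JSON object."""
    fields = dict(field.split("=", 1) for field in message.split(";"))
    return json.dumps(fields, sort_keys=True)


def json_to_legacy(message: str) -> str:
    """Convert a flat JSON object back into the legacy text format."""
    fields = json.loads(message)
    return ";".join(f"{key}={fields[key]}" for key in sorted(fields))


# The broker only needs to know which converter handles which pair of formats.
CONVERTERS = {
    ("legacy", "json"): legacy_to_json,
    ("json", "legacy"): json_to_legacy,
}


def broker(message: str, src: str, dst: str) -> str:
    if src == dst:
        return message
    return CONVERTERS[(src, dst)](message)


as_json = broker("id=7;name=Jill", "legacy", "json")
print(as_json)
print(broker(as_json, "json", "legacy"))
```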
Message Brokers
[Figure 2-30: The general organization of a message broker in a message-queuing system.]
Example: IBM MQSeries
• IBM WebSphere MQ (formerly MQSeries) is used to integrate old apps (generally running on IBM mainframes)
▪ Queues are managed by queue managers
▪ Queue managers are connected through message channels
▪ Each of the two ends of a message channel is managed by a message channel agent (MCA)
▪ Queue managers can be linked into the same process as the application using the queue
▪ Queue managers are implemented using RPC
Example: IBM WebSphere MQ
[Figure 2-31: General organization of IBM's WebSphere MQ message-queuing system.]
Channels
• Some attributes associated with message channel agents.
Attribute            Description
Transport type       Determines the transport protocol to be used
FIFO delivery        Indicates that messages are to be delivered in the order they are sent
Message length       Maximum length of a single message
Setup retry count    Specifies maximum number of retries to start up the remote MCA
Delivery retries     Maximum times MCA will try to put received message into queue
Aliases
• In order to be able to change the name of a
queue manager or to replace it with another
without having to recompile all of the
applications which send messages to it,
local aliases are used for queue manager
names.
Message Transfer
[Figure: The general organization of an MQ queuing network using routing tables and aliases.]
Message Transfer
• Primitives available in the IBM MQ message queue interface (MQI).
Primitive   Description
MQopen      Open a (possibly remote) queue
MQclose     Close a queue
MQput       Put a message into an opened queue
MQget       Get a message from a (local) queue
Stream-Oriented Communication
• Multimedia systems use stream-oriented communications
▪ The timing of the data delivery is critical in such systems
▪ Such communication is used for continuous media such as audio, where the temporal relationships between different data items are meaningful, as opposed to discrete media such as text
Stream-Oriented Communication
• Data streams have several modes
▪ Asynchronous transmission mode places no timing constraints on the data items in a stream
▪ Synchronous transmission mode gives a maximum end-to-end delay for each item in a data stream
▪ Isochronous transmission mode gives both maximum and minimum delays
• Bounded jitter
Stream-Oriented Communication
• Streams can be either simple or complex (with several related simple substreams)
▪ Related substreams will need to be synchronized
• Streams can be seen as a channel between a source and a sink
▪ Source could be a file or multimedia capture device
▪ Sink could be a file or multimedia rendering device
Data Stream
[Figure: Setting up a stream between two processes across a network.]
Data Stream
[Figure 2-35.2: Setting up a stream directly between two devices.]
Data Stream
[Figure: An example of multicasting a stream to several receivers.]
Streams and QoS
• Time-dependent requirements are generally expressed as Quality of Service (QoS) requirements
▪ The underlying distributed system and network must ensure that these are met
• Required bit rate
• Maximum session delay
• Maximum end-to-end delay
• Maximum delay variance (jitter)
Using a Buffer to Reduce Jitter
Interleaved Transmission
Setting up a Stream
• Before a stream is opened between source and sink, resources in the network must be reserved in order to meet the QoS requirements
▪ Bandwidth
▪ Buffers
▪ Processing capability
▪ Figuring out how much of each is required is difficult since they aren’t specified directly in the QoS
▪ RSVP is a protocol for enabling resource reservations in network routers
Setting Up a Stream
[Figure: The basic organization of RSVP for resource reservation in a distributed system.]
Stream Synchronization
• An important issue is that different streams (possibly substreams of a complex stream) must be synchronized
▪ Continuous with discrete
▪ Continuous with continuous (more difficult)
▪ Different levels of granularity for syncing are required depending on the situation
Synchronization Mechanisms
• Synchronization can be carried out by the
application
• Can also be supplied by a middleware layer
• Complex streams are multiplexed according
to a given synchronization specification
(e.g. MPEG)
• Syncing can occur either at the sending or
receiving end.
Synchronization Mechanisms
[Figure: The principle of explicit synchronization on the level of data units.]
Synchronization Mechanisms
[Figure 2-41: The principle of synchronization as supported by high-level interfaces.]
Multicast Communication
• Much research has been done on enhancing network protocols by adding support for sending a message to multiple receivers (multicasting)
▪ No standard has yet emerged
• Peer-to-peer technology has been implemented with some success at the application level
▪ Multicasting has been implemented here, so routers don’t need to be changed
Multicast Communication
• The nodes organize themselves into an overlay network
▪ This is a logical organization, not a physical one
▪ May take the form of a tree or mesh
▪ A node which wants to start a multicast session becomes the root of a multicast tree.
▪ Other nodes may join the multicast group by becoming nodes of the logical tree
Multicast Communication
• Some nodes may forward messages without being a member of the multicast group
• Problems may arise if the overlay network does not match the underlying physical network well
▪ Link stress
▪ Stretch or Relative Delay Penalty
▪ Tree cost
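These three metrics can be computed for a toy example. The topology below is invented: hosts A and B hang off router R1, host C hangs off router R2, and the overlay tree rooted at A is A to C, then C to B. Link stress counts how often the same physical link is crossed, stretch (Relative Delay Penalty) is the ratio of the overlay delay to the direct physical delay, and tree cost is taken here as the sum of the overlay link delays.

```python
from collections import Counter

# Physical links and their delays (hypothetical numbers).
DELAY = {
    frozenset(("A", "R1")): 1,
    frozenset(("B", "R1")): 1,
    frozenset(("C", "R2")): 1,
    frozenset(("R1", "R2")): 5,
}

# Physical route taken between each pair of end hosts.
ROUTE = {
    ("A", "C"): ["A", "R1", "R2", "C"],
    ("C", "B"): ["C", "R2", "R1", "B"],
    ("A", "B"): ["A", "R1", "B"],
}

OVERLAY_TREE = [("A", "C"), ("C", "B")]       # A is the root


def path_delay(path):
    return sum(DELAY[frozenset(hop)] for hop in zip(path, path[1:]))


def link_stress(tree):
    """How many times a multicast packet crosses each physical link."""
    stress = Counter()
    for edge in tree:
        path = ROUTE[edge]
        for hop in zip(path, path[1:]):
            stress[frozenset(hop)] += 1
    return stress


def stretch(overlay_path, direct):
    """Delay along the overlay divided by delay of the direct physical route."""
    overlay_delay = sum(path_delay(ROUTE[edge]) for edge in overlay_path)
    return overlay_delay / path_delay(ROUTE[direct])


def tree_cost(tree):
    return sum(path_delay(ROUTE[edge]) for edge in tree)


stress = link_stress(OVERLAY_TREE)
print(stress[frozenset(("R1", "R2"))],
      stretch([("A", "C"), ("C", "B")], ("A", "B")),
      tree_cost(OVERLAY_TREE))
```

In this example a message from A reaches B only after crossing the slow R1 to R2 link twice, which is what a poorly matched overlay looks like.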
Gossip-Based Data Dissemination
• We can use epidemic protocols to rapidly spread information in a very large-scale distributed system
▪ Use only local information, so this is a distributed algorithm
▪ A node that holds data to be spread to other nodes is said to be infected
▪ A node that has not yet received the data is said to be susceptible
▪ An updated node that is not willing or able to spread the data is said to be removed
Gossip-Based Data Dissemination
• One popular algorithm is anti-entropy
▪ Pick a node at random and exchange data using push, pull, or both
▪ Push doesn’t work well when many nodes are infected. Pull will be used by susceptible nodes
▪ Push-pull has been shown to be the best
• Rumor spreading, or gossiping, is another popular approach
▪ If a node has new data to spread, it randomly contacts another node and pushes the data
▪ If the contacted node has already been updated, the original node is removed with probability 1/k
▪ Removing data is hard: we need to use death certificates, not just delete the data
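Anti-entropy can be simulated in a few lines to see how many rounds it takes to infect every node. The simulation is a sketch under simple assumptions: rounds are synchronous, every node picks one random partner per round, exchanges are based on the state at the start of the round, and the function name and parameters are invented for the example.

```python
import random


def anti_entropy_rounds(n, mode, seed=0):
    """Rounds until all n nodes are infected; mode is 'push', 'pull' or 'push-pull'."""
    rng = random.Random(seed)
    infected = [False] * n
    infected[0] = True                       # one node starts with the update
    rounds = 0
    while not all(infected):
        rounds += 1
        before = infected[:]                 # state at the start of the round
        for p in range(n):
            q = rng.randrange(n - 1)         # random partner other than p
            if q >= p:
                q += 1
            if mode in ("push", "push-pull") and before[p]:
                infected[q] = True           # p pushes its update to q
            if mode in ("pull", "push-pull") and before[q]:
                infected[p] = True           # p pulls the update from q
    return rounds


for mode in ("push", "pull", "push-pull"):
    print(mode, anti_entropy_rounds(200, mode))
```

Running it with different seeds and sizes gives a feel for how the three variants compare. The sketch does not model rumor spreading or the removal of data.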