Download File Systems

Document related concepts

Computer network wikipedia , lookup

Net bias wikipedia , lookup

Airborne Networking wikipedia , lookup

TCP congestion control wikipedia , lookup

AppleTalk wikipedia , lookup

Piggybacking (Internet access) wikipedia , lookup

IEEE 1355 wikipedia , lookup

Deep packet inspection wikipedia , lookup

Wake-on-LAN wikipedia , lookup

Internet protocol suite wikipedia , lookup

Zero-configuration networking wikipedia , lookup

Recursive InterNetwork Architecture (RINA) wikipedia , lookup

Cracking of wireless networks wikipedia , lookup

Files-11 wikipedia , lookup

UniPro protocol stack wikipedia , lookup

Transcript
Lecture 7:
File Systems
Networking
Contents
 Files & File System Interface
 Directories & their Organization
 File System Implementation
 Disk Space Allocation
 File System Efficiency & Reliability
 Networking
 OSI - Open System Interconnection Model
 Ethernet
 IP
 UDP
 TCP
AE3B33OSD
Lecture 7 / Page 2
2011
File Systems Interface
 Concept of the file


Contiguous logical address space
Types:
Data – numeric, character, binary
 Program

 File Structure

None - sequence of words, bytes
 Simple record structure – lines, fixed length records, variable
length records
 Complex Structures


Formatted documents, relocatable load files
Complex Structures can be simulated
by simple record structures through inserting appropriate control
characters
 by having special control blocks in the file (e.g., section table at the file

beginning)
AE3B33OSD
Lecture 7 / Page 3
2011
File Systems Interface (2)
 File Attributes
Name – the only information kept in human-readable form
 Identifier – unique tag (number) identifies file within file system
 Type – needed for systems that support different types
 Location – information on file location on a device
 Size – current file size
 Protection – for control who can do reading, writing, executing
 Time, date, and user identification – data for protection, security,
and usage monitoring
 Information about files is kept in the file-system structures, which
are stored and maintained on the disk
 File Operations – exported by the OS API (cf. e.g., POSIX)
 Open(Fi) – search the directory structure on disk for entry Fi, and
move the content of entry to memory
 Write, Read, Reposition within file
 Close(Fi) – move the content of entry Fi in memory to directory
structure on disk
 Delete, Truncate
 etc.

AE3B33OSD
Lecture 7 / Page 4
2011
Directory Structure
 Directory is a collection of nodes containing information
about files

Both the directory
structure and the files
reside on disk
Directory
Files
F1
 A Typical File-system Organization
AE3B33OSD
Lecture 7 / Page 5
F2
F3
F4
Fn
2011
Logical Organization the Directories
 Operations Performed on Directory






Search for a file
Create a file
Delete a file
List a directory
Rename a file
Traverse the file system
 Organize directories to get

Efficiency – locating a file quickly


Naming – convenient to users


AE3B33OSD
The same file can have several different names
Two users can have same name for different files
Grouping – logical grouping of files by properties, (e.g., all Java
programs, all games, …)
Lecture 7 / Page 6
2011
Single-Level Directory
 A single directory for all users
 Easy but



AE3B33OSD
Naming problem
Grouping problem
Sharing problem
Lecture 7 / Page 7
2011
Two-Level Directory
 Separate directory for each user
 Path name
 Can have the same file name for different user
 Efficient searching
 No grouping capability
AE3B33OSD
Lecture 7 / Page 8
2011
Tree-Structured Directories
 Efficient searching
 Grouping Capability
 Current directory (working directory)


AE3B33OSD
cd /spell/mail/prog
type list
Lecture 7 / Page 9
2011
Acyclic-Graph Directories
 Have shared subdirectories and files

aliasing – an object can have
different names
/home:
joe
jeff
 Problem:
When ‘joe’ deletes file
‘test’, the directory item
‘joetst’ points wrong
 Solution:


prg.c
Each object has a counter
containing a count of
references.
mail
inbox
test
mail
joetst
text
sent
The counter increments when a new reference is
created and decrements when a reference is deleted.
The object is erased when the counter drops to zero
AE3B33OSD
Lecture 7 / Page 10
2011
File System Mounting
 A file system must be mounted before it can be accessed
E.g., file system on a removable media must be ‘announced’ to the
OS, i.e. must be mounted
 Have prepared a mount point – a directory

Anything
referenced from the mount-point before mounting will be
hidden after mounting
/
/
home
adam
home
/
mnt
docs
submnt
AE3B33OSD
adam
mnt
docs
joe
joe
work
progs
Lecture 7 / Page 11
work
progs
2011
File Sharing
 Sharing of files on multi-user systems is desirable
 Sharing may be done through a protection scheme
 On distributed systems, files may be shared across a
network

Network File System (NFS) is a common distributed file-sharing
method
 User IDs identify users, allowing permissions and
protections to be per-user
 Group IDs allow users to be in groups, permitting group
access rights
POSIX rwx|rwx|rwx scheme
U
G
O
 ACL – Access Control Lists (Windows, some UNIXes)

AE3B33OSD
Lecture 7 / Page 12
2011
File System Implementation Objectives
 Implementation possibilities of local
file systems and directory structures
 File block allocation and free-block
strategies, algorithms and trade-offs
 File structure

Logical storage unit
 Collection of related information
 File system resides on secondary
storage (disks)
 File system is organized into layers
 File control block – storage structure
consisting of information about a file

AE3B33OSD
Size, ownership, allocation info, time
stamps, ...
Lecture 7 / Page 13
2011
In-Memory File System Structures
 The following figure illustrates the necessary file system
structures provided by the operating systems.
opening a file
reading a file
AE3B33OSD
Lecture 7 / Page 14
2011
Virtual File Systems
 Virtual File Systems (VFS) provide an object-oriented
way of implementing file systems.
 VFS allows the same system call interface (the API) to be
used for different types of file systems.
 The API is to the VFS interface, rather than any specific
type of file system.
AE3B33OSD
Lecture 7 / Page 15
2011
Directory Implementation
 Linear list of file names with pointer to the data blocks.

simple to program
 time-consuming to execute
 Hash Table – linear list with hash data structure.

decreases directory search time
 collisions – situations where two file names hash to the same
location
 fixed size
 Complex data structure – e.g., B+ tree

AE3B33OSD
NTFS in MS Windows
Lecture 7 / Page 16
2011
Allocation Methods for Files
 An allocation method refers to how disk blocks are
allocated for files:



Contiguous allocation
Linked allocation
Indexed allocation
 Contiguous allocation
– simple to implement





AE3B33OSD
Each file occupies a set of
contiguous blocks on the disk
Simple – only starting location
(block #) and length (number
of blocks) are required
Random access
Wasteful of space (dynamic
storage-allocation problem)
Files cannot grow
Lecture 7 / Page 17
2011
Extent-Based Systems
 Many newer file systems (e.g., Veritas File System) use a
modified contiguous allocation scheme
 Extent-based file systems allocate disk blocks in extents
 An extent is a contiguous block of disks

Extents are allocated for file growth
 A file consists of one or more extents
AE3B33OSD
Lecture 7 / Page 18
2011
Linked Allocation
 Each file is a linked list of disk blocks: blocks may be
scattered anywhere on the disk.
 Simple – need only starting address
 Free-space management system
– no waste of space
 Difficult random access

AE3B33OSD
block =
pointer to next block
must go through the
whole chain
Lecture 7 / Page 19
2011
Linked Allocation with FAT
 Allocation chains stored separately
 File-allocation table (FAT)

Disk-space allocation used by MS-DOS and OS/2.
 Problems:



Size of the table
Access speed
Reliability
All file info is concentrated
in one place
 FAT duplicates

AE3B33OSD
Lecture 7 / Page 20
2011
Allocation block size with FAT
 Allocation block, cluster

group of adjacent disk sectors
 Fixed size of FAT on disk
 Different FAT types


FAT item has 12, 16 or 32 bits
Directory entry (MSDOS):
FAT-16
8 bytes
3
1
10
4
2
4
Name
Extension
Attrs
Reserved
Date and time
1st block
File size
 Addressing capability of different FAT types
AE3B33OSD
Block size
FAT-12
0.5 KB = 1 sector
2 MB
1 KB = 2 sectors
2 KB = 4 sectors
4 MB
8 MB
4 KB = 8 sectors
8 KB = 16 sectors
16 KB = 32 sectors
32 KB = 64 sectors
16 MB
b)
FAT-16
FAT-32
a)
128 MB
256 MB
512 MB
1 GB
2 GB
1 TB
2 TB
2 TB
2 TB
Lecture 7 / Page 21
Empty entries in the table are unused
because:
a) FAT is too large compared to the
disk capacity
b) losses due to internal fragmentation
are to high
2011
Indexed Allocation
 Brings all pointers for one file together into an index block.
 Logical view
index table
 Need index table
 Random access
 Dynamic access without
external fragmentation, but
have overhead of index
block.

Mapping from logical to physical in a file of maximum size of 256K
words and block size of 512 words. We need only 1 block for index
table
 Only “small” files
AE3B33OSD
Lecture 7 / Page 22
2011
Multi-level Indexed Allocation

outer-index
index table
AE3B33OSD
Lecture 7 / Page 23
file
2011
Combined Scheme: UNIX FS
 Disk i-node

AE3B33OSD
4K bytes per block
Lecture 7 / Page 24
2011
Free-Space Management
 Bit vector (n blocks) – one bit per block
Bit map requires extra space
Easy to get contiguous files
n-1
…
bit[i] =
 Linked list (free list)


0 1 2



0  block[i] free
1  block[i] occupied
Cannot get contiguous space easily
No waste of space
 Need to protect:


Pointer to free list
Bit map




Solution:



AE3B33OSD
Must be kept on disk
Copy in memory and disk may differ
Cannot allow for block[i] to have a
situation where bit[i] = 1 in memory
and bit[i] = 0 on disk
Set bit[i] = 1 in disk
Allocate block[i]
Set bit[i] = 1 in memory
Lecture 7 / Page 25
2011
Directory Implementation
 Linear list of file names with pointer to the data blocks

simple to implement
 time-consuming to execute
 directory can grow and shrink
 Hash Table – linear list with hash data structure

decreases directory search time
 collisions – situations where two file names hash to the same
location
 fixed size
AE3B33OSD
Lecture 7 / Page 26
2011
File System Efficiency and Performance
 Efficiency dependent on:


disk allocation and directory algorithms
types of data kept in file’s directory entry
 Performance
disk cache – separate section of main memory for frequently
used blocks
 free-behind and read-ahead – techniques to optimize sequential
access
 improve PC performance by dedicating section of memory as
virtual disk, or RAM disk

AE3B33OSD
Lecture 7 / Page 27
2011
Recovery from a Crash
 Consistency checking – compares data in directory
structure with data blocks on disk, and tries to fix
inconsistencies
 Use system programs to back up data from disk to
another storage device (floppy disk, magnetic tape, other
magnetic disk, optical)
 Recover lost file or disk by restoring data from backup
AE3B33OSD
Lecture 7 / Page 28
2011
Log Structured File Systems
 Log structured (or journaling) file systems record each
update to the file system as a transaction

similar to database systems
 All transactions are written to a log

A transaction is considered committed once it is written to the log
 However, the file system may not yet be updated
 The transactions in the log are asynchronously written to
the file system

When the file system is modified, the transaction is removed from
the log
 If the file system crashes, all remaining transactions in the
log must still be performed
 Used by NTFS file system
AE3B33OSD
Lecture 7 / Page 29
2011
Short introduction to Networking
AE3B33OSD
Lecture 7 / Page 30
2011
Network Structure - LAN
 Local-Area Network (LAN) – designed to cover small
geographical area.

Multiaccess bus, ring, or star network
 Speed  10 megabits/second, or higher
 Broadcast is fast and cheap
 Nodes:
usually workstations and/or
personal computers
 a few (usually one or two)
mainframes

AE3B33OSD
Lecture 7 / Page 31
2011
Network Structure - WAN
 Wide-Area Network (WAN) – links geographically
separated sites

Point-to-point connections over long-haul lines (often leased from a
phone company)
 Broadcast usually requires
multiple messages
 Nodes:

AE3B33OSD
often a high percentage
of mainframes
Lecture 7 / Page 32
2011
Network Topology
 Sites in the system can be physically connected in a
variety of ways; they are compared with respect to the
following criteria:
Basic cost – How expensive is it to link various sites in the
system?
 Communication cost – How long does it take to send a
message from site A to site B?
 Reliability – If a link or a site in the
system fails, can the remaining
sites still communicate with
each other?

 The various topologies are
depicted as graphs whose
nodes correspond to sites

AE3B33OSD
An edge from node A to
node B corresponds to a direct
connection between the two sites
Lecture 7 / Page 33
2011
CESNET structure
AE3B33OSD
Lecture 7 / Page 34
2011
Communication Structure
The design of a communication network must address
four basic issues:
 Naming and name resolution

How do two processes locate each other to communicate?
 Routing strategies

How are messages sent through the network?
 Connection strategies

How do two processes send a sequence of messages?
 Contention

AE3B33OSD
The network is a shared resource, so how do we resolve
conflicting demands for its use?
Lecture 7 / Page 35
2011
Communication Protocol
The communication network is partitioned into the
following multiple layers:
 Physical layer – handles the mechanical and electrical
details of the physical transmission of a bit stream
 Data-link layer – handles the frames, or fixed-length
parts of packets, including any error detection and
recovery that occurred in the physical layer
 Network layer – provides connections and routes
packets in the communication network, including
handling the address of outgoing packets, decoding the
address of incoming packets, and maintaining routing
information for proper response to changing load levels
AE3B33OSD
Lecture 7 / Page 36
2011
Communication Protocol (Cont.)
 Transport layer – responsible for low-level network
access and for message transfer between clients,
including partitioning messages into packets, maintaining
packet order, controlling flow, and generating physical
addresses
 Session layer – implements sessions, or process-toprocess communications protocols
 Presentation layer – resolves the differences in formats
among the various sites in the network, including
character conversions, and half duplex/full duplex
(echoing)
 Application layer – interacts directly with the users’
deals with file transfer, remote-login protocols and
electronic mail, as well as schemas for distributed
databases, etc.
AE3B33OSD
Lecture 7 / Page 37
2011
The ISO Protocol Layers
AE3B33OSD
Lecture 7 / Page 38
2011
Communication Via ISO Network Model
AE3B33OSD
Lecture 7 / Page 39
2011
The TCP/IP Protocol Layers
AE3B33OSD
Lecture 7 / Page 40
2011
Design Issues
 Transparency – the distributed system should appear as
a conventional, centralized system to the user
 Fault tolerance – the distributed system should continue
to function in the face of failure
 Scalability – as demands increase, the system should
easily accept the addition of new resources to
accommodate the increased demand
 Clusters – a collection of semi-autonomous machines
that acts as a single system
AE3B33OSD
Lecture 7 / Page 41
2011
Example: Networking
 The transmission of a network packet between hosts on





an Ethernet network
Every host has a unique IP address and a corresponding
Ethernet (MAC) address
Communication requires both addresses
Domain Name Service (DNS) can be used to acquire IP
addresses
Address Resolution Protocol (ARP) is used to map MAC
addresses to IP addresses
If the hosts are on the same network, ARP can be used

AE3B33OSD
If the hosts are on different networks, the sending host will send
the packet to a router which routes the packet to the destination
network
Lecture 7 / Page 42
2011
An Ethernet Packet
Destination Source Frame
Message data Checksum
address address
type
64 bits 48 bits 48 bits 16 bits 368-12000 bits 32 bits
Preamble
 Preamble:

7 bytes [10101010] + 1 byte [10101011] used to synchronize the
transfer rate
 6 byte addresses (MAC addresses)


World unique addresses
If destination address = 0xFF-FF-FF-FF-FF-FF then broadcast
 Frame type – indicates different standards:



0x0800 = the packet contains an IPv4 datagram
0x86DD indicates an IPv6 frame
0x0806 indicates an ARP frame
 Message data usually 1500 bytes

AE3B33OSD
Encapsulates packet with the transferred data
Lecture 7 / Page 43
2011
Internet Architecture
 Basic Internet properties

Each host has its unique identification: the IP address

IP address is composed by the network address and the host address
within the network

The applications and the network API behavior is independent of the
LAN technology
 Basic architecture of the Internet (and general internetworking)
LAN 1

LAN 2
gateway or
router
LAN 3
Gateways and routers connect LAN’s



gateway or
router
LAN’s may be based on different technologies
Gateways keep info on hosts belonging to LAN’s they connect
Routers maintain knowledge about networks

Routers forward packets based on the network part of the IP address
(the host part is ignored when forwarding)
Units connecting LAN’s usually merge gateway and router
functionality
 IP protocols consider all LAN’s as equivalent regardless of the
technology

AE3B33OSD
Lecture 7 / Page 44
2011
Internet addresses
 Current Internet – v. 4 uses 32 bits addresses

Convention: decimal numbers per 8 bits each – 147.32.85.1
 Internet v. 6 uses 128 bits addresses

Not treated here as still under experimental development
 IP address

Identifies each single network adaptor
Host can have several adaptors („multi-homed“ host)
 One adaptor can even have more addresses


Is composed of two parts
Identification (address) of the network – netid (leftmost bits)
 Identification (address) of the host within the network – hostid (rightmost
bits)

AE3B33OSD
Lecture 7 / Page 45
2011
Internet addresses (2)
 Primary IP address classes
01234
Class A 0
Class B 1 0
8
16
netid
24
31
hostid
netid
Class C 1 1 0
255.0.0.0
hostid
netid
Net Mask
hostid
Address range
0.0.0.0 –
127.255.255.255
255.255.0.0
128.0.0.0 –
191.255.255.255
255.255.255.0
192.0.0.0 –
255.255.255.255
 Reserved ranges:



AE3B33OSD
In A
In B
In C
0.0.0.0 – 0.255.255.255,
128.0.0.0 – 128.0.255.255,
192.0.0.0 – 192.0.0.255,
127.0.0.0 – 127.255.255.255,
191.255.0.0 – 191.255.255.255,
255.255.255.0 – 255.255.255.255
Lecture 7 / Page 46
2011
Internet addresses (3)
 Convention:


Network address netid is the full IP address with hostid = 0
Address composed by netid and the hostid part is full of "1" serves
for addressing all hosts in the network (network broadcast address)
 Net maska:

Address „arithmetic“
IP _ Address  NetMask  netid
IP _ Address   ( NetMask )  hostid
 Special addresses


127.0.0.1 – loopback address – a host speaks to itself
Private addresses – may not spread over Internet – routers must not
forward datagrams containing these addresses
1 class A network:
10.0.0.0 – 10.255.255.255
 16 class B networks:
172.16.0.0 – 172.31.255.255
 256 class C networks:
192.168.0.0 – 192.168.255.255
 Multicast addresses – one host sends info to many “subscribed” hosts
(e.g. Internet TV)
 range 224.0.0.0 – 238.255.255.255

AE3B33OSD
Lecture 7 / Page 47
2011
Internet addresses (4)
 CIDR addressing (= Classless Inter-Domain Routing)
Address arithmetic enables for more efficient splitting netid|hostid –
the border between netid and hostid may be anywhere
 Net mask may be any number composed of n (n=0 ... 32) leftmost
“1” bits
 CIDR notation:



IP_Address/n; e.g.: 147.32.85.128 – 147.32.85.191 = 147.32.85.128/26
Reserved ranges in CIDR notation:
0.0.0.0/8,
192.0.0.0/24,

127.0.0.0 /8,
255.255.255.0/24
128.0.0.0/16,
191.255.0.0/16,
LAN 192.168.200.64/30 contains 4 addresses: 192.168.200.64 = netid,
192.168.200.65=host1, 192.168.200.66=host2, 192.168.200.67 = LAN broadcast
 Saving IP addresses

Using private addresses and their translation to “public” addresses
(NAT = Network Address Translation)
Many private addresses is translated into 1 public
Problem with publicly available servers on the private address LAN
(behind the NAT router)
 The principle of NAT is connected to IP protocols


Internet
AE3B33OSD
147.32.85.27
Router
with NAT
Lecture 7 / Page 48
192.168.100.1
LAN with private
IP addresses
2011
Internet datagrams
 Internet creates a virtual network and carries IP datagrams
 The network is a best effort delivery system
 Datagrams travel through Internet over different physicals LAN’s
 Datagrams may not depend on the LAN technology
 Format of an IP datagram
Datagram
header

Frame
header
Datagram data part

Frame data part
Encapsulating IP datagram into physical network frame
AE3B33OSD
Lecture 7 / Page 49
2011
IP datagram header
 Every IP datagram has a header carrying information
necessary for datagram delivery
0
4
8
16
19
24
31
VERS HLEN
SERVICE TYPE
TOTAL LENGTH
IDENTIFICATION
FLAGS
FRAGMENT OFFSET
TIME TO LIVE
PROTOCOL
HEADER CHECKSUM
SOURCE IP ADDRESS
DESTINATION IP ADDRESS
IP OPTIONS (IF ANY)
PADDING
DATA
…
IP datagram format
 Fields






VERS: IP protocol version – for IP v. 4 VERS = 4
HLEN: Header length in 32-bit words (standard 5)
TOTAL LENGTH: of the datagram in bytes (header+data) – max. 64 kB
SOURCE IP ADDRESS: sender’s IP address
DESTINATION IP ADDRESS: IP address where to deliver
IDENTIFICATION: usually sequential or a random number generated
by the datagram sender
AE3B33OSD
Lecture 7 / Page 50
2011
IP datagram header (cont.)

PROTOCOL: Identification of the protocol carried in the IP datagram
(UDP=17, ICMP=1, TCP=6, ...). Defined in RFC 1060
 FLAGS, FRAGMENT OFFSET: Information on datagram
fragmentation
 TIME TO LIVE (TTL): Determines how long the datagram may travel
through Internet. Every router decrements this value; if TTL==0 the
datagram is discarded and the router sends an ICMP message to the
sender
 SERVICE TYPE: 8-bit field commanding the datagram routing
0
1
2
PRECEDENCE
Datagram precedence:
0=normal,
7=network control
AE3B33OSD
3
4
D
Low
delay
5
T
High
throughput
Lecture 7 / Page 51
6
R
7
UNUSED
High reliability
2011
Datagram fragmentation (1)
 MTU: (Maximum Transmission Unit) defines the maximum
datagram size that can be transferred in a LAN
Net type
Implicit MTU
PPP
296
Ethernet
1 500
TokenRing 4Mb
4 464
Net type
X.25
FDDI
TokenRing 16Mb
Implicit MTU
576
4 352
17 914
 Internet – a set of LAN’s with different MTU’s

Whenever the datagram is larger than MTU it must be fragmented
Host
A
Net 1,
MTU=1500
Host
B
G1
G2
Net 3,
MTU=1500
Net 2,
MTU=620
AE3B33OSD
Lecture 7 / Page 52
2011
Datagram fragmentation (2)
 Fragmentation occurs anywhere during the datagram travel

If the datagram is fragmented it is not defragmented on the way and
the reconstruction is the task of the destination host
 Every fragment travels as a separate datagram:
 The
following fields are copied from the original datagram header: VERS,
HLEN, SERVICE TYPE, IDENTIFICATION, PROTOCOL, SOURCE IP
ADDRESS, DESTINATION IP ADDRESS
 TOTAL LENGTH is changed to the fragment size a and the field
FRAGMENT OFFSET determines the offset of the fragment in the original
datagram
 Field FLAGS contains a bit more fragments. If this bit is 0 then the target
host knows that the last fragment has been obtained. The target host
reconstructs the original datagram using the FRAGMENT OFFSET and
TOTAL LENGTH fields
DATAGRAM
HEADER
Data_1
600 bytes
FRAGMENT
1 HEADER
Data_1
Fragment 1 (offset=0)
FRAGMENT
2 HEADER
Data_2
Fragment 2 (offset=600)
FRAGMENT
3 HEADER
Data_3
Data_2
600 bytes
Data_3
200 bytes
Fragment 3 (offset=1200)
1400 bytes datagram fragmentation
AE3B33OSD
Lecture 7 / Page 53
2011
UDP Protocol
 UDP (= User Datagram Protocol), IP protocol number = 17



Very simple transport protocol defined in RFC 768
Provides connectionless and unreliable transport of user datagrams.
If any kind of transmission reliability is needed, it must be
implemented in the user's application.
 Compared to pure IP datagrams, UDP has the ability to address
target processes on the destination host using the port field
UDP
HEADER

UDP DATA AREA

IP HEADER

FRAME HEADER
IP DATA AREA

FRAME DATA AREA
UDP encapsulated in an IP datagram
0
16
UDP SOURCE PORT
UDP MESSAGE LENGTH
31
UDP DESTINATION PORT
UDP CHECKSUM
UDP Header

PORT fields are used to distinguish computing processes awaiting a
UDP datagram on the destination host.
 SOURCE
PORT field is optional; must be 0 if unused.
 Otherwise
it contains the port number to which a possible reply should
be sent.
AE3B33OSD
Lecture 7 / Page 54
2011
UDP Protocol (cont.)

To ensure that different hosts on Internet will understand each other,
IANA (= Internet Assigned Numbers Authority) issues the obligatory list
of “Well Known Port Numbers”
 Selected UDP port numbers are:
Port
0
7
9
13
53
67
68
69
137

Keyword
echo
discard
daytime
dns
bootps
bootpc
tftp
netbios-ns
Meaning
Reserved
Echo the datagram
Discard the datagram
Time of the day
Domain Name Service
Bootstrap Protocol Server, DHCP Server
Bootstrap Protocol Client, DHCP Client
Trivial File Transfer
NETBIOS Name Service
…
For the complete list see http://www.iana.org/assignments/port-numbers
AE3B33OSD
Lecture 7 / Page 55
2011
TCP Protocol for reliable data streaming
 TCP is the most important general reliable transport service
providing a virtual bidirectional communication channel
between two hosts

TCP/IP is the IP implementation of this service
 TCP properties

Data stream
Applications communicating through
a TCP connection consider the
communication channel as a byte stream similarly to a file

Virtual connection
Before
the data transmission can start, the communicating applications
have to negotiate the connection by means of the network components of
their local operating systems – create and connect the sockets.
The protocol software (transport layer) in the operating systems of both
hosts make an agreement on the connection using messages passed
over the network. The hosts also verify that the connection can be
reliably established and that both end-point systems are ready to
communicate – open the sockets.
Afterwards the end-point applications are informed about the established
connection and the data communication can start.
If the connection breaks, both communicating applications are informed
The term virtual connection is used to create an illusion that applications
are interconnected through a dedicated line.
The reliability is ensured by full hand-shake communication (everything
must be acknowledged by the other party)
AE3B33OSD
Lecture 7 / Page 56
2011
TCP/IP – TCP Protocol IP Implementation
 Besides of the general TCP properties, the TCP/IP
implementation provides:

Buffered transport (streaming)
 To
improve the efficiency, the TCP/IP module in the OS assembles
bytes from the stream into packets (datagrams) of reasonable size. If
this is not desirable (e.g., TELNET ctrl-C), TCP/IP is equipped by a
mechanism enforcing the priority transfer of a short datagram “out-oforder”.

Full duplex connection
 Application
processes can see the TCP/IP link as two independent data
streams running in opposite directions without an apparent interaction.
The protocol software actually acknowledges data running in one
direction in the packets sent together with the data in the opposite
direction.
TCP HEADER

IP HEADER

TCP DATA AREA

IP DATA AREA

FRAME HEADER
FRAME DATA AREA
TCP/IP encapsulated in an IP datagram on a physical network
AE3B33OSD
Lecture 7 / Page 57
2011
TCP reliability solution
 The reliable transfer is ensured by

positive acknowledging of the received data together with repeated
packet transfer
Events on the
sender side
Send packet 1
Datagrams on network
Events on the
receiver side
Get packet 1
Send ACK1
Get ACK1
Send packet 2
Get packet 2
Send ACK2
Get ACK2

Lost packet are repeated based on suitable timeouts
Events on the
sender side
Datagrams on network
Events on the
receiver side
Packet
lost
Send packet 1
Start timer
Packet should have arrived
ACK should have been sent
ACK should have arrived now
but it timed-out.
Resend packet 1
Start timer
Get packet 1
Send ACK1
Get ACK1
Stop timer
AE3B33OSD
Lecture 7 / Page 58
2011
TCP reliability solution (cont.)

Datagrams may get duplicated during the travel (both data and ACK).
 This

problem is solved by sequential numbering of datagrams
The problem of positive acknowledgement efficiency
 The
moving window method
Starting window position
1
2
3
4
5
6
7
8
The window moves 
1
Events on the Massages on the net
sender side
2
3
4
5
6
7
8
Events on the
receiver side
Send packet 1
Send packet 2
Get packet 1
Send ACK1
Send packet 3
Get packet 2
Send ACK2
Get ACK1
Get packet 3
Get ACK3
Get ACK2
Get ACK3
AE3B33OSD
Lecture 7 / Page 59
2011
Retransmission timeouts
 Constant value of timeout when to resend the TCP/IP
segment is inappropriate

Internet is too heterogeneous and is composed of a huge number of
LAN’s based on different HW technologies
 Giga-bit
ethernet, 33 kbit serial line, intercontinental satellite link, etc.
 TCP/IP adapts to changing timing parameters of the virtual
connection


A simple adaptive algorithm to adjust the timeout is used
The algorithm is based on continuous monitoring of „round trip time“
(RTT)
 Time
between dispatching the packet and its acknowledgement.

The real timeout is then computed as a weighted average of RTT
measured in the recent history.
 This strategy quickly accommodates to the speed and load changes
on the intermediate networks and routers
AE3B33OSD
Lecture 7 / Page 60
2011
Establishing the TCP connection

TCP uses a three-stage procedure to establish the virtual
connection:
1.
2.
3.


In the first step, the connection initiator (client) sends the other party
(server) a segment containing SYN bit = 1, randomly generated
SEQUENCE NUMBER = x and an empty data section.
The server responds by a segment with SYN and ACK set to 1,
random SEQUENCE NUMBER = y and ACKNOWLEDGMENT NUMBER = x+1.
After receiving this segment, the client acknowledges it by sending a
segment with ACK set to 1, SYN bit=0 and the ACKNOWLEDGMENT
NUMBER = y+1.
This way the initial values of SEQUENCE NUMBER and ACKNOWLEDGMENT
NUMBER fields are synchronized for the future life time of the virtual
connection and are used to number the following TCP/IP segments
The sequential numbers are random

AE3B33OSD
It enables to detect a failure or restart of hosts on the connection ends
that can happen during a longer timeout
Lecture 7 / Page 61
2011
TCP connection termination
 Connection is normally terminated on request of one of the
connected applications

Application tells TCP that there are no more data to be exchanged


If server – passive close
If client – active close

TCP software closes the connection by sending a segment with a set
FIN bit
 For the final termination of the virtual connection, it is necessary also
to close the opposite direction. The party that received the segment
with FIN bit reacts by sending a segment with FIN bit, too.
 The TCP connection can be terminated forcibly using the RST bit
AE3B33OSD
Lecture 7 / Page 62
2011
Well known TCP ports
 Examples of some TCP/IP ports
Port
0
7
13
20
21
22
23
25
53
79
80
110
443
Keyword
echo
daytime
ftp-data
ftp
ssh
telnet
smtp
domain
finger
http
pop3
https
Meaning
Reserved
Echo the datagram
Time of the day
File Transfer Protocol (data stream)
File Transfer Protocol (controls)
Secure Terminal Connection
Terminal Connection
Simple Mail Transport Protocol
Domain Name Server Query
Finger service (who is logged on?)
World-wide web
Post Office Protocol - V3 – get incoming mail from server
Secured world-wide web
and many others (see http://www.iana.org/assignments/port-numbers)
AE3B33OSD
Lecture 7 / Page 63
2011
TCP finite state machine
Dashed arrows – server side
state transitions
Solid arrows – client side state
transitions
Lower left frame – active
connection close
Lower right frame – passive
connection close
appl: transition caused by
system call from
application
recv: transition caused by
received segment
send: shows what is sent in
reaction to state change
AE3B33OSD
Lecture 7 / Page 64
2011
End of Lecture 7
Questions?