* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Download File Systems
Survey
Document related concepts
Computer network wikipedia , lookup
Airborne Networking wikipedia , lookup
TCP congestion control wikipedia , lookup
Piggybacking (Internet access) wikipedia , lookup
Deep packet inspection wikipedia , lookup
Wake-on-LAN wikipedia , lookup
Internet protocol suite wikipedia , lookup
Zero-configuration networking wikipedia , lookup
Recursive InterNetwork Architecture (RINA) wikipedia , lookup
Transcript
Lecture 7: File Systems Networking Contents Files & File System Interface Directories & their Organization File System Implementation Disk Space Allocation File System Efficiency & Reliability Networking OSI - Open System Interconnection Model Ethernet IP UDP TCP AE3B33OSD Lecture 7 / Page 2 2011 File Systems Interface Concept of the file Contiguous logical address space Types: Data – numeric, character, binary Program File Structure None - sequence of words, bytes Simple record structure – lines, fixed length records, variable length records Complex Structures Formatted documents, relocatable load files Complex Structures can be simulated by simple record structures through inserting appropriate control characters by having special control blocks in the file (e.g., section table at the file beginning) AE3B33OSD Lecture 7 / Page 3 2011 File Systems Interface (2) File Attributes Name – the only information kept in human-readable form Identifier – unique tag (number) identifies file within file system Type – needed for systems that support different types Location – information on file location on a device Size – current file size Protection – for control who can do reading, writing, executing Time, date, and user identification – data for protection, security, and usage monitoring Information about files is kept in the file-system structures, which are stored and maintained on the disk File Operations – exported by the OS API (cf. e.g., POSIX) Open(Fi) – search the directory structure on disk for entry Fi, and move the content of entry to memory Write, Read, Reposition within file Close(Fi) – move the content of entry Fi in memory to directory structure on disk Delete, Truncate etc. AE3B33OSD Lecture 7 / Page 4 2011 Directory Structure Directory is a collection of nodes containing information about files Both the directory structure and the files reside on disk Directory Files F1 A Typical File-system Organization AE3B33OSD Lecture 7 / Page 5 F2 F3 F4 Fn 2011 Logical Organization the Directories Operations Performed on Directory Search for a file Create a file Delete a file List a directory Rename a file Traverse the file system Organize directories to get Efficiency – locating a file quickly Naming – convenient to users AE3B33OSD The same file can have several different names Two users can have same name for different files Grouping – logical grouping of files by properties, (e.g., all Java programs, all games, …) Lecture 7 / Page 6 2011 Single-Level Directory A single directory for all users Easy but AE3B33OSD Naming problem Grouping problem Sharing problem Lecture 7 / Page 7 2011 Two-Level Directory Separate directory for each user Path name Can have the same file name for different user Efficient searching No grouping capability AE3B33OSD Lecture 7 / Page 8 2011 Tree-Structured Directories Efficient searching Grouping Capability Current directory (working directory) AE3B33OSD cd /spell/mail/prog type list Lecture 7 / Page 9 2011 Acyclic-Graph Directories Have shared subdirectories and files aliasing – an object can have different names /home: joe jeff Problem: When ‘joe’ deletes file ‘test’, the directory item ‘joetst’ points wrong Solution: prg.c Each object has a counter containing a count of references. mail inbox test mail joetst text sent The counter increments when a new reference is created and decrements when a reference is deleted. The object is erased when the counter drops to zero AE3B33OSD Lecture 7 / Page 10 2011 File System Mounting A file system must be mounted before it can be accessed E.g., file system on a removable media must be ‘announced’ to the OS, i.e. must be mounted Have prepared a mount point – a directory Anything referenced from the mount-point before mounting will be hidden after mounting / / home adam home / mnt docs submnt AE3B33OSD adam mnt docs joe joe work progs Lecture 7 / Page 11 work progs 2011 File Sharing Sharing of files on multi-user systems is desirable Sharing may be done through a protection scheme On distributed systems, files may be shared across a network Network File System (NFS) is a common distributed file-sharing method User IDs identify users, allowing permissions and protections to be per-user Group IDs allow users to be in groups, permitting group access rights POSIX rwx|rwx|rwx scheme U G O ACL – Access Control Lists (Windows, some UNIXes) AE3B33OSD Lecture 7 / Page 12 2011 File System Implementation Objectives Implementation possibilities of local file systems and directory structures File block allocation and free-block strategies, algorithms and trade-offs File structure Logical storage unit Collection of related information File system resides on secondary storage (disks) File system is organized into layers File control block – storage structure consisting of information about a file AE3B33OSD Size, ownership, allocation info, time stamps, ... Lecture 7 / Page 13 2011 In-Memory File System Structures The following figure illustrates the necessary file system structures provided by the operating systems. opening a file reading a file AE3B33OSD Lecture 7 / Page 14 2011 Virtual File Systems Virtual File Systems (VFS) provide an object-oriented way of implementing file systems. VFS allows the same system call interface (the API) to be used for different types of file systems. The API is to the VFS interface, rather than any specific type of file system. AE3B33OSD Lecture 7 / Page 15 2011 Directory Implementation Linear list of file names with pointer to the data blocks. simple to program time-consuming to execute Hash Table – linear list with hash data structure. decreases directory search time collisions – situations where two file names hash to the same location fixed size Complex data structure – e.g., B+ tree AE3B33OSD NTFS in MS Windows Lecture 7 / Page 16 2011 Allocation Methods for Files An allocation method refers to how disk blocks are allocated for files: Contiguous allocation Linked allocation Indexed allocation Contiguous allocation – simple to implement AE3B33OSD Each file occupies a set of contiguous blocks on the disk Simple – only starting location (block #) and length (number of blocks) are required Random access Wasteful of space (dynamic storage-allocation problem) Files cannot grow Lecture 7 / Page 17 2011 Extent-Based Systems Many newer file systems (e.g., Veritas File System) use a modified contiguous allocation scheme Extent-based file systems allocate disk blocks in extents An extent is a contiguous block of disks Extents are allocated for file growth A file consists of one or more extents AE3B33OSD Lecture 7 / Page 18 2011 Linked Allocation Each file is a linked list of disk blocks: blocks may be scattered anywhere on the disk. Simple – need only starting address Free-space management system – no waste of space Difficult random access AE3B33OSD block = pointer to next block must go through the whole chain Lecture 7 / Page 19 2011 Linked Allocation with FAT Allocation chains stored separately File-allocation table (FAT) Disk-space allocation used by MS-DOS and OS/2. Problems: Size of the table Access speed Reliability All file info is concentrated in one place FAT duplicates AE3B33OSD Lecture 7 / Page 20 2011 Allocation block size with FAT Allocation block, cluster group of adjacent disk sectors Fixed size of FAT on disk Different FAT types FAT item has 12, 16 or 32 bits Directory entry (MSDOS): FAT-16 8 bytes 3 1 10 4 2 4 Name Extension Attrs Reserved Date and time 1st block File size Addressing capability of different FAT types AE3B33OSD Block size FAT-12 0.5 KB = 1 sector 2 MB 1 KB = 2 sectors 2 KB = 4 sectors 4 MB 8 MB 4 KB = 8 sectors 8 KB = 16 sectors 16 KB = 32 sectors 32 KB = 64 sectors 16 MB b) FAT-16 FAT-32 a) 128 MB 256 MB 512 MB 1 GB 2 GB 1 TB 2 TB 2 TB 2 TB Lecture 7 / Page 21 Empty entries in the table are unused because: a) FAT is too large compared to the disk capacity b) losses due to internal fragmentation are to high 2011 Indexed Allocation Brings all pointers for one file together into an index block. Logical view index table Need index table Random access Dynamic access without external fragmentation, but have overhead of index block. Mapping from logical to physical in a file of maximum size of 256K words and block size of 512 words. We need only 1 block for index table Only “small” files AE3B33OSD Lecture 7 / Page 22 2011 Multi-level Indexed Allocation outer-index index table AE3B33OSD Lecture 7 / Page 23 file 2011 Combined Scheme: UNIX FS Disk i-node AE3B33OSD 4K bytes per block Lecture 7 / Page 24 2011 Free-Space Management Bit vector (n blocks) – one bit per block Bit map requires extra space Easy to get contiguous files n-1 … bit[i] = Linked list (free list) 0 1 2 0 block[i] free 1 block[i] occupied Cannot get contiguous space easily No waste of space Need to protect: Pointer to free list Bit map Solution: AE3B33OSD Must be kept on disk Copy in memory and disk may differ Cannot allow for block[i] to have a situation where bit[i] = 1 in memory and bit[i] = 0 on disk Set bit[i] = 1 in disk Allocate block[i] Set bit[i] = 1 in memory Lecture 7 / Page 25 2011 Directory Implementation Linear list of file names with pointer to the data blocks simple to implement time-consuming to execute directory can grow and shrink Hash Table – linear list with hash data structure decreases directory search time collisions – situations where two file names hash to the same location fixed size AE3B33OSD Lecture 7 / Page 26 2011 File System Efficiency and Performance Efficiency dependent on: disk allocation and directory algorithms types of data kept in file’s directory entry Performance disk cache – separate section of main memory for frequently used blocks free-behind and read-ahead – techniques to optimize sequential access improve PC performance by dedicating section of memory as virtual disk, or RAM disk AE3B33OSD Lecture 7 / Page 27 2011 Recovery from a Crash Consistency checking – compares data in directory structure with data blocks on disk, and tries to fix inconsistencies Use system programs to back up data from disk to another storage device (floppy disk, magnetic tape, other magnetic disk, optical) Recover lost file or disk by restoring data from backup AE3B33OSD Lecture 7 / Page 28 2011 Log Structured File Systems Log structured (or journaling) file systems record each update to the file system as a transaction similar to database systems All transactions are written to a log A transaction is considered committed once it is written to the log However, the file system may not yet be updated The transactions in the log are asynchronously written to the file system When the file system is modified, the transaction is removed from the log If the file system crashes, all remaining transactions in the log must still be performed Used by NTFS file system AE3B33OSD Lecture 7 / Page 29 2011 Short introduction to Networking AE3B33OSD Lecture 7 / Page 30 2011 Network Structure - LAN Local-Area Network (LAN) – designed to cover small geographical area. Multiaccess bus, ring, or star network Speed 10 megabits/second, or higher Broadcast is fast and cheap Nodes: usually workstations and/or personal computers a few (usually one or two) mainframes AE3B33OSD Lecture 7 / Page 31 2011 Network Structure - WAN Wide-Area Network (WAN) – links geographically separated sites Point-to-point connections over long-haul lines (often leased from a phone company) Broadcast usually requires multiple messages Nodes: AE3B33OSD often a high percentage of mainframes Lecture 7 / Page 32 2011 Network Topology Sites in the system can be physically connected in a variety of ways; they are compared with respect to the following criteria: Basic cost – How expensive is it to link various sites in the system? Communication cost – How long does it take to send a message from site A to site B? Reliability – If a link or a site in the system fails, can the remaining sites still communicate with each other? The various topologies are depicted as graphs whose nodes correspond to sites AE3B33OSD An edge from node A to node B corresponds to a direct connection between the two sites Lecture 7 / Page 33 2011 CESNET structure AE3B33OSD Lecture 7 / Page 34 2011 Communication Structure The design of a communication network must address four basic issues: Naming and name resolution How do two processes locate each other to communicate? Routing strategies How are messages sent through the network? Connection strategies How do two processes send a sequence of messages? Contention AE3B33OSD The network is a shared resource, so how do we resolve conflicting demands for its use? Lecture 7 / Page 35 2011 Communication Protocol The communication network is partitioned into the following multiple layers: Physical layer – handles the mechanical and electrical details of the physical transmission of a bit stream Data-link layer – handles the frames, or fixed-length parts of packets, including any error detection and recovery that occurred in the physical layer Network layer – provides connections and routes packets in the communication network, including handling the address of outgoing packets, decoding the address of incoming packets, and maintaining routing information for proper response to changing load levels AE3B33OSD Lecture 7 / Page 36 2011 Communication Protocol (Cont.) Transport layer – responsible for low-level network access and for message transfer between clients, including partitioning messages into packets, maintaining packet order, controlling flow, and generating physical addresses Session layer – implements sessions, or process-toprocess communications protocols Presentation layer – resolves the differences in formats among the various sites in the network, including character conversions, and half duplex/full duplex (echoing) Application layer – interacts directly with the users’ deals with file transfer, remote-login protocols and electronic mail, as well as schemas for distributed databases, etc. AE3B33OSD Lecture 7 / Page 37 2011 The ISO Protocol Layers AE3B33OSD Lecture 7 / Page 38 2011 Communication Via ISO Network Model AE3B33OSD Lecture 7 / Page 39 2011 The TCP/IP Protocol Layers AE3B33OSD Lecture 7 / Page 40 2011 Design Issues Transparency – the distributed system should appear as a conventional, centralized system to the user Fault tolerance – the distributed system should continue to function in the face of failure Scalability – as demands increase, the system should easily accept the addition of new resources to accommodate the increased demand Clusters – a collection of semi-autonomous machines that acts as a single system AE3B33OSD Lecture 7 / Page 41 2011 Example: Networking The transmission of a network packet between hosts on an Ethernet network Every host has a unique IP address and a corresponding Ethernet (MAC) address Communication requires both addresses Domain Name Service (DNS) can be used to acquire IP addresses Address Resolution Protocol (ARP) is used to map MAC addresses to IP addresses If the hosts are on the same network, ARP can be used AE3B33OSD If the hosts are on different networks, the sending host will send the packet to a router which routes the packet to the destination network Lecture 7 / Page 42 2011 An Ethernet Packet Destination Source Frame Message data Checksum address address type 64 bits 48 bits 48 bits 16 bits 368-12000 bits 32 bits Preamble Preamble: 7 bytes [10101010] + 1 byte [10101011] used to synchronize the transfer rate 6 byte addresses (MAC addresses) World unique addresses If destination address = 0xFF-FF-FF-FF-FF-FF then broadcast Frame type – indicates different standards: 0x0800 = the packet contains an IPv4 datagram 0x86DD indicates an IPv6 frame 0x0806 indicates an ARP frame Message data usually 1500 bytes AE3B33OSD Encapsulates packet with the transferred data Lecture 7 / Page 43 2011 Internet Architecture Basic Internet properties Each host has its unique identification: the IP address IP address is composed by the network address and the host address within the network The applications and the network API behavior is independent of the LAN technology Basic architecture of the Internet (and general internetworking) LAN 1 LAN 2 gateway or router LAN 3 Gateways and routers connect LAN’s gateway or router LAN’s may be based on different technologies Gateways keep info on hosts belonging to LAN’s they connect Routers maintain knowledge about networks Routers forward packets based on the network part of the IP address (the host part is ignored when forwarding) Units connecting LAN’s usually merge gateway and router functionality IP protocols consider all LAN’s as equivalent regardless of the technology AE3B33OSD Lecture 7 / Page 44 2011 Internet addresses Current Internet – v. 4 uses 32 bits addresses Convention: decimal numbers per 8 bits each – 147.32.85.1 Internet v. 6 uses 128 bits addresses Not treated here as still under experimental development IP address Identifies each single network adaptor Host can have several adaptors („multi-homed“ host) One adaptor can even have more addresses Is composed of two parts Identification (address) of the network – netid (leftmost bits) Identification (address) of the host within the network – hostid (rightmost bits) AE3B33OSD Lecture 7 / Page 45 2011 Internet addresses (2) Primary IP address classes 01234 Class A 0 Class B 1 0 8 16 netid 24 31 hostid netid Class C 1 1 0 255.0.0.0 hostid netid Net Mask hostid Address range 0.0.0.0 – 127.255.255.255 255.255.0.0 128.0.0.0 – 191.255.255.255 255.255.255.0 192.0.0.0 – 255.255.255.255 Reserved ranges: AE3B33OSD In A In B In C 0.0.0.0 – 0.255.255.255, 128.0.0.0 – 128.0.255.255, 192.0.0.0 – 192.0.0.255, 127.0.0.0 – 127.255.255.255, 191.255.0.0 – 191.255.255.255, 255.255.255.0 – 255.255.255.255 Lecture 7 / Page 46 2011 Internet addresses (3) Convention: Network address netid is the full IP address with hostid = 0 Address composed by netid and the hostid part is full of "1" serves for addressing all hosts in the network (network broadcast address) Net maska: Address „arithmetic“ IP _ Address NetMask netid IP _ Address ( NetMask ) hostid Special addresses 127.0.0.1 – loopback address – a host speaks to itself Private addresses – may not spread over Internet – routers must not forward datagrams containing these addresses 1 class A network: 10.0.0.0 – 10.255.255.255 16 class B networks: 172.16.0.0 – 172.31.255.255 256 class C networks: 192.168.0.0 – 192.168.255.255 Multicast addresses – one host sends info to many “subscribed” hosts (e.g. Internet TV) range 224.0.0.0 – 238.255.255.255 AE3B33OSD Lecture 7 / Page 47 2011 Internet addresses (4) CIDR addressing (= Classless Inter-Domain Routing) Address arithmetic enables for more efficient splitting netid|hostid – the border between netid and hostid may be anywhere Net mask may be any number composed of n (n=0 ... 32) leftmost “1” bits CIDR notation: IP_Address/n; e.g.: 147.32.85.128 – 147.32.85.191 = 147.32.85.128/26 Reserved ranges in CIDR notation: 0.0.0.0/8, 192.0.0.0/24, 127.0.0.0 /8, 255.255.255.0/24 128.0.0.0/16, 191.255.0.0/16, LAN 192.168.200.64/30 contains 4 addresses: 192.168.200.64 = netid, 192.168.200.65=host1, 192.168.200.66=host2, 192.168.200.67 = LAN broadcast Saving IP addresses Using private addresses and their translation to “public” addresses (NAT = Network Address Translation) Many private addresses is translated into 1 public Problem with publicly available servers on the private address LAN (behind the NAT router) The principle of NAT is connected to IP protocols Internet AE3B33OSD 147.32.85.27 Router with NAT Lecture 7 / Page 48 192.168.100.1 LAN with private IP addresses 2011 Internet datagrams Internet creates a virtual network and carries IP datagrams The network is a best effort delivery system Datagrams travel through Internet over different physicals LAN’s Datagrams may not depend on the LAN technology Format of an IP datagram Datagram header Frame header Datagram data part Frame data part Encapsulating IP datagram into physical network frame AE3B33OSD Lecture 7 / Page 49 2011 IP datagram header Every IP datagram has a header carrying information necessary for datagram delivery 0 4 8 16 19 24 31 VERS HLEN SERVICE TYPE TOTAL LENGTH IDENTIFICATION FLAGS FRAGMENT OFFSET TIME TO LIVE PROTOCOL HEADER CHECKSUM SOURCE IP ADDRESS DESTINATION IP ADDRESS IP OPTIONS (IF ANY) PADDING DATA … IP datagram format Fields VERS: IP protocol version – for IP v. 4 VERS = 4 HLEN: Header length in 32-bit words (standard 5) TOTAL LENGTH: of the datagram in bytes (header+data) – max. 64 kB SOURCE IP ADDRESS: sender’s IP address DESTINATION IP ADDRESS: IP address where to deliver IDENTIFICATION: usually sequential or a random number generated by the datagram sender AE3B33OSD Lecture 7 / Page 50 2011 IP datagram header (cont.) PROTOCOL: Identification of the protocol carried in the IP datagram (UDP=17, ICMP=1, TCP=6, ...). Defined in RFC 1060 FLAGS, FRAGMENT OFFSET: Information on datagram fragmentation TIME TO LIVE (TTL): Determines how long the datagram may travel through Internet. Every router decrements this value; if TTL==0 the datagram is discarded and the router sends an ICMP message to the sender SERVICE TYPE: 8-bit field commanding the datagram routing 0 1 2 PRECEDENCE Datagram precedence: 0=normal, 7=network control AE3B33OSD 3 4 D Low delay 5 T High throughput Lecture 7 / Page 51 6 R 7 UNUSED High reliability 2011 Datagram fragmentation (1) MTU: (Maximum Transmission Unit) defines the maximum datagram size that can be transferred in a LAN Net type Implicit MTU PPP 296 Ethernet 1 500 TokenRing 4Mb 4 464 Net type X.25 FDDI TokenRing 16Mb Implicit MTU 576 4 352 17 914 Internet – a set of LAN’s with different MTU’s Whenever the datagram is larger than MTU it must be fragmented Host A Net 1, MTU=1500 Host B G1 G2 Net 3, MTU=1500 Net 2, MTU=620 AE3B33OSD Lecture 7 / Page 52 2011 Datagram fragmentation (2) Fragmentation occurs anywhere during the datagram travel If the datagram is fragmented it is not defragmented on the way and the reconstruction is the task of the destination host Every fragment travels as a separate datagram: The following fields are copied from the original datagram header: VERS, HLEN, SERVICE TYPE, IDENTIFICATION, PROTOCOL, SOURCE IP ADDRESS, DESTINATION IP ADDRESS TOTAL LENGTH is changed to the fragment size a and the field FRAGMENT OFFSET determines the offset of the fragment in the original datagram Field FLAGS contains a bit more fragments. If this bit is 0 then the target host knows that the last fragment has been obtained. The target host reconstructs the original datagram using the FRAGMENT OFFSET and TOTAL LENGTH fields DATAGRAM HEADER Data_1 600 bytes FRAGMENT 1 HEADER Data_1 Fragment 1 (offset=0) FRAGMENT 2 HEADER Data_2 Fragment 2 (offset=600) FRAGMENT 3 HEADER Data_3 Data_2 600 bytes Data_3 200 bytes Fragment 3 (offset=1200) 1400 bytes datagram fragmentation AE3B33OSD Lecture 7 / Page 53 2011 UDP Protocol UDP (= User Datagram Protocol), IP protocol number = 17 Very simple transport protocol defined in RFC 768 Provides connectionless and unreliable transport of user datagrams. If any kind of transmission reliability is needed, it must be implemented in the user's application. Compared to pure IP datagrams, UDP has the ability to address target processes on the destination host using the port field UDP HEADER UDP DATA AREA IP HEADER FRAME HEADER IP DATA AREA FRAME DATA AREA UDP encapsulated in an IP datagram 0 16 UDP SOURCE PORT UDP MESSAGE LENGTH 31 UDP DESTINATION PORT UDP CHECKSUM UDP Header PORT fields are used to distinguish computing processes awaiting a UDP datagram on the destination host. SOURCE PORT field is optional; must be 0 if unused. Otherwise it contains the port number to which a possible reply should be sent. AE3B33OSD Lecture 7 / Page 54 2011 UDP Protocol (cont.) To ensure that different hosts on Internet will understand each other, IANA (= Internet Assigned Numbers Authority) issues the obligatory list of “Well Known Port Numbers” Selected UDP port numbers are: Port 0 7 9 13 53 67 68 69 137 Keyword echo discard daytime dns bootps bootpc tftp netbios-ns Meaning Reserved Echo the datagram Discard the datagram Time of the day Domain Name Service Bootstrap Protocol Server, DHCP Server Bootstrap Protocol Client, DHCP Client Trivial File Transfer NETBIOS Name Service … For the complete list see http://www.iana.org/assignments/port-numbers AE3B33OSD Lecture 7 / Page 55 2011 TCP Protocol for reliable data streaming TCP is the most important general reliable transport service providing a virtual bidirectional communication channel between two hosts TCP/IP is the IP implementation of this service TCP properties Data stream Applications communicating through a TCP connection consider the communication channel as a byte stream similarly to a file Virtual connection Before the data transmission can start, the communicating applications have to negotiate the connection by means of the network components of their local operating systems – create and connect the sockets. The protocol software (transport layer) in the operating systems of both hosts make an agreement on the connection using messages passed over the network. The hosts also verify that the connection can be reliably established and that both end-point systems are ready to communicate – open the sockets. Afterwards the end-point applications are informed about the established connection and the data communication can start. If the connection breaks, both communicating applications are informed The term virtual connection is used to create an illusion that applications are interconnected through a dedicated line. The reliability is ensured by full hand-shake communication (everything must be acknowledged by the other party) AE3B33OSD Lecture 7 / Page 56 2011 TCP/IP – TCP Protocol IP Implementation Besides of the general TCP properties, the TCP/IP implementation provides: Buffered transport (streaming) To improve the efficiency, the TCP/IP module in the OS assembles bytes from the stream into packets (datagrams) of reasonable size. If this is not desirable (e.g., TELNET ctrl-C), TCP/IP is equipped by a mechanism enforcing the priority transfer of a short datagram “out-oforder”. Full duplex connection Application processes can see the TCP/IP link as two independent data streams running in opposite directions without an apparent interaction. The protocol software actually acknowledges data running in one direction in the packets sent together with the data in the opposite direction. TCP HEADER IP HEADER TCP DATA AREA IP DATA AREA FRAME HEADER FRAME DATA AREA TCP/IP encapsulated in an IP datagram on a physical network AE3B33OSD Lecture 7 / Page 57 2011 TCP reliability solution The reliable transfer is ensured by positive acknowledging of the received data together with repeated packet transfer Events on the sender side Send packet 1 Datagrams on network Events on the receiver side Get packet 1 Send ACK1 Get ACK1 Send packet 2 Get packet 2 Send ACK2 Get ACK2 Lost packet are repeated based on suitable timeouts Events on the sender side Datagrams on network Events on the receiver side Packet lost Send packet 1 Start timer Packet should have arrived ACK should have been sent ACK should have arrived now but it timed-out. Resend packet 1 Start timer Get packet 1 Send ACK1 Get ACK1 Stop timer AE3B33OSD Lecture 7 / Page 58 2011 TCP reliability solution (cont.) Datagrams may get duplicated during the travel (both data and ACK). This problem is solved by sequential numbering of datagrams The problem of positive acknowledgement efficiency The moving window method Starting window position 1 2 3 4 5 6 7 8 The window moves 1 Events on the Massages on the net sender side 2 3 4 5 6 7 8 Events on the receiver side Send packet 1 Send packet 2 Get packet 1 Send ACK1 Send packet 3 Get packet 2 Send ACK2 Get ACK1 Get packet 3 Get ACK3 Get ACK2 Get ACK3 AE3B33OSD Lecture 7 / Page 59 2011 Retransmission timeouts Constant value of timeout when to resend the TCP/IP segment is inappropriate Internet is too heterogeneous and is composed of a huge number of LAN’s based on different HW technologies Giga-bit ethernet, 33 kbit serial line, intercontinental satellite link, etc. TCP/IP adapts to changing timing parameters of the virtual connection A simple adaptive algorithm to adjust the timeout is used The algorithm is based on continuous monitoring of „round trip time“ (RTT) Time between dispatching the packet and its acknowledgement. The real timeout is then computed as a weighted average of RTT measured in the recent history. This strategy quickly accommodates to the speed and load changes on the intermediate networks and routers AE3B33OSD Lecture 7 / Page 60 2011 Establishing the TCP connection TCP uses a three-stage procedure to establish the virtual connection: 1. 2. 3. In the first step, the connection initiator (client) sends the other party (server) a segment containing SYN bit = 1, randomly generated SEQUENCE NUMBER = x and an empty data section. The server responds by a segment with SYN and ACK set to 1, random SEQUENCE NUMBER = y and ACKNOWLEDGMENT NUMBER = x+1. After receiving this segment, the client acknowledges it by sending a segment with ACK set to 1, SYN bit=0 and the ACKNOWLEDGMENT NUMBER = y+1. This way the initial values of SEQUENCE NUMBER and ACKNOWLEDGMENT NUMBER fields are synchronized for the future life time of the virtual connection and are used to number the following TCP/IP segments The sequential numbers are random AE3B33OSD It enables to detect a failure or restart of hosts on the connection ends that can happen during a longer timeout Lecture 7 / Page 61 2011 TCP connection termination Connection is normally terminated on request of one of the connected applications Application tells TCP that there are no more data to be exchanged If server – passive close If client – active close TCP software closes the connection by sending a segment with a set FIN bit For the final termination of the virtual connection, it is necessary also to close the opposite direction. The party that received the segment with FIN bit reacts by sending a segment with FIN bit, too. The TCP connection can be terminated forcibly using the RST bit AE3B33OSD Lecture 7 / Page 62 2011 Well known TCP ports Examples of some TCP/IP ports Port 0 7 13 20 21 22 23 25 53 79 80 110 443 Keyword echo daytime ftp-data ftp ssh telnet smtp domain finger http pop3 https Meaning Reserved Echo the datagram Time of the day File Transfer Protocol (data stream) File Transfer Protocol (controls) Secure Terminal Connection Terminal Connection Simple Mail Transport Protocol Domain Name Server Query Finger service (who is logged on?) World-wide web Post Office Protocol - V3 – get incoming mail from server Secured world-wide web and many others (see http://www.iana.org/assignments/port-numbers) AE3B33OSD Lecture 7 / Page 63 2011 TCP finite state machine Dashed arrows – server side state transitions Solid arrows – client side state transitions Lower left frame – active connection close Lower right frame – passive connection close appl: transition caused by system call from application recv: transition caused by received segment send: shows what is sent in reaction to state change AE3B33OSD Lecture 7 / Page 64 2011 End of Lecture 7 Questions?