Computer Networking:
a Top-Down Approach (8th ed.) :
Notes of "Select" Lectures
Brief Content
Chapter 1 Computer Networks and the Internet
Chapter 4 Network Layer - data plane
Chapter 5 Network Layer - control plane
Chapter 7 Wireless and Mobile Networks
Course Information
Computer Networks
Professor Jim Kurose
COMPSCI 453
College of Information and Computer Sciences
University of Massachusetts
Class textbook:
Computer Networking: a Top-Down Approach (8th ed.) J.F. Kurose, K.W. Ross, Pearson, 2020
http://gaia.cs.umass.edu/kurose_ross
Related Resources
Book editions: 8th ed. (en-us), 7th ed. (zh-cn)
• Book home
• Book PDF
• PowerPoint
• Solutions etc.
• Online lectures
• High quality reading notes
Foreword
Learning Motives
With my second attempt at the postgraduate entrance exam looking doomed, I began teaching myself computer science subjects that would be useful whatever happens next: receiving and accepting an interview invitation from a school, preparing for an entry-level developer interview, or failing to be admitted this year and making a third attempt. I decided to start with computer networking, watching online lectures and reading the slides for the book Computer Networking. However, I found it too challenging to absorb the material without a translator. Studying without the translated terminology also does not match how I will study and work in real life. Most annoying of all, the knowledge in my mind is not well organized. To cope with these issues, I believe it is wise to write down, rearrange, or even simply copy and paste the key points. Because time is limited, this series of notes will only cover core topics (e.g., concepts and principles) such as the overview and the important layers and protocols.
How I Write
• The content of this series of notes is based on (from most to least used):
• The parts of Professor Jim Kurose's lecture videos and PowerPoints that I consider important.
• The original book mainly for chapter/section numbering. Meanwhile, I use it for figure screenshots, definition/mechanism checks and comprehensive knowledge acquisition.
• The Chinese version of the book for terminology referencing.
• Others' notes.
• Microsoft Word operations.
• Share the document via "Post to blog".
• Convert numbers to text (thanks to cmt); resize all pictures to 100%.
• Publish it to cnblogs as draft.
• cnblogs operations.
• Set titles, tags, alias URL, etc. for posts.
• Copy the fragment needed of the draft to the corresponding post.
Chapter 1 Computer Networks and the Internet
1.1 What Is the Internet?
A nuts-and-bolts and a services description. What is a protocol?
1.1.1 A Nuts-and-Bolts Description
Billions of connected computing devices:
• hosts (主机)= end systems (端系统)
• running network apps (应用) at Internet's "edge" (边缘)
Packet switches (分组交换机) : forward packets (分组) (chunks of data)
• routers (路由器) , switches (交换机)
Communication links (通信链路)
• fiber (光纤) , copper (铜) , radio (无线电) , satellite (卫星)
• transmission rate (传输速率): bandwidth (带宽)
Networks
• collection of devices, routers, links: managed by an organization
Internet: "network of networks"
• Interconnected ISPs (Internet Service Providers, 因特网服务提供商)
Protocols are everywhere
• control sending, receiving of messages (报文)
• e.g., HTTP (Web), streaming video, Skype, TCP, IP, WiFi, 4G, Ethernet
Internet standards
• RFC: Request for Comments (请求评论)
• IETF: Internet Engineering Task Force (因特网工程任务组)
1.1.2 A Services Description
Infrastructure that provides services to applications:
• Web, streaming video, multimedia teleconferencing, email, games, e-commerce, social media, inter-connected appliances, …
provides programming interface to distributed applications (分布式应用程序):
• "hooks" allowing sending/receiving apps to "connect" to, use Internet transport service
• provides service options, analogous to postal service
1.1.3 What Is a Protocol?
Network protocols:
• computers (devices) rather than humans
• all communication activity in Internet governed by protocols
Protocols (协议) define the format, order of messages sent and received among network entities, and actions taken on message transmission, receipt
1.2 The Network Edge
Access networks, physical media
A closer look at Internet structure
Network edge (网络边缘) :
• hosts (主机): clients and servers
• servers often in data centers (数据中心)
Access networks (接入网), physical media (物理媒体):
• wired, wireless communication links
Network core (网络核心) :
• interconnected routers
• network of networks
1.2.1 Access Networks
Q: How to connect end systems to edge router?
• residential access nets
• institutional access networks (school, company)
• mobile access networks (WiFi, 4G/5G)
Cable-based Access
Frequency division multiplexing (FDM, 频分复用): different channels transmitted in different frequency bands
HFC: hybrid fiber coax (混合光纤同轴)
• asymmetric: up to 40 Mbps – 1.2 Gbps downstream transmission rate, 30-100 Mbps upstream transmission rate
network of cable, fiber attaches homes to ISP router
• homes share access network to cable headend
Digital Subscriber Line (DSL, 数字用户线)
Use existing telephone line to central office DSLAM (digital subscriber line access multiplexer, 数字用户线接入复用器)
• data over DSL phone line goes to Internet
• voice over DSL phone line goes to telephone net
• 24-52 Mbps dedicated downstream transmission rate
• 3.5-16 Mbps dedicated upstream transmission rate
Home Networks
Wireless Access Networks
• Shared wireless access network connects end system to router
• via base station (基站) aka "access point" (接入点)
Wireless local area networks (WLANs)
• typically within or around building (~100 ft)
• 802.11b/g/n (WiFi): 11, 54, 450 Mbps transmission rate
Wide-area cellular access networks
• provided by mobile, cellular (蜂窝) network operator (10's km)
• 10's Mbps
• 4G cellular networks (5G coming)
Enterprise Networks
companies, universities, etc.
mix of wired, wireless link technologies, connecting a mix of switches and routers
• Ethernet: wired access at 100Mbps, 1Gbps, 10Gbps
• WiFi: wireless access points at 11, 54, 450 Mbps
Data Center Networks
high-bandwidth links (10s to 100s Gbps) connect hundreds to thousands of servers together, and to Internet
Host: sends packets of data
host sending function:
• takes application message
• breaks into smaller chunks, known as packets, of length L bits
• transmits packet into access network at transmission rate R
• link transmission rate, aka link capacity, aka link bandwidth
packet transmission delay = time needed to transmit L-bit packet into link = L (bits) /R (bits/sec)
1.2.2 Physical Media
bit: propagates (传播) between transmitter/receiver (发射器—接收器) pairs
physical link: what lies between transmitter & receiver
guided media (导引型媒体):
• signals propagate in solid media: copper, fiber, coax
unguided media (非导引型媒体):
• signals propagate freely, e.g., radio
Twisted pair (TP, 双绞铜线)
two insulated (绝缘的) copper wires
• Category 5: 100 Mbps, 1 Gbps Ethernet
• Category 6: 10Gbps Ethernet
Coaxial cable (同轴电缆)
two concentric (同心的) copper conductors
bidirectional
broadband:
• multiple frequency channels on cable
• 100's Mbps per channel
Fiber optic cable (光纤电缆)
glass fiber carrying light pulses, each pulse a bit
high-speed operation:
• high-speed point-to-point transmission (10's-100's Gbps)
low error rate:
• repeaters spaced far apart
• immune to electromagnetic noise
Wireless radio
signal carried in various "bands" in electromagnetic spectrum
no physical "wire"
broadcast, "half-duplex" (半双工,sender to receiver)
propagation environment effects:
• reflection
• obstruction by objects
• interference/noise
Radio link types:
• Wireless LAN (WiFi)
• 10-100's Mbps; 10's of meters
• wide-area (e.g., 4G cellular)
• 10's Mbps over ~10 Km
• Bluetooth: cable replacement
• short distances, limited rates
• terrestrial (陆地的) microwave
• point-to-point (点对点); 45 Mbps channels
• satellite
• up to 45 Mbps per channel
• 270 msec end-end delay
1.3 Network Core
Forwarding, routing; packet switching; circuit switching; a network of networks
The network core
Mesh (网状物) of interconnected routers
Packet-switching (分组交换): hosts break application-layer messages into packets
• network forwards packets from one router to the next, across links on path from source to destination
Two key network-core functions
Forwarding (转发) :
• aka "switching" (交换)
• local action: move arriving packets from router's input link to appropriate router output link
Routing (路由):
• global action: determine source-destination paths taken by packets
• routing algorithms
1.3.1 Packet Switching
Packet transmission delay (时延): takes L/R seconds to transmit (push out) L-bit packet into link at R bps
Store-and-Forward (存储转发) Transmission
Store and forward: entire packet must arrive at router before it can be transmitted on next link
One-hop (跳) numerical example:
• L = 10 Kbits
• R = 100 Mbps
• one-hop transmission delay = 0.1 msec
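The one-hop figure above can be checked with a few lines of Python. This is a minimal sketch: the function names are illustrative (not from the lecture), and the multi-hop helper assumes identical links and ignores propagation, processing, and queueing delay.

```python
def transmission_delay(length_bits: float, rate_bps: float) -> float:
    """Seconds needed to push an L-bit packet onto a link of rate R bps (L/R)."""
    return length_bits / rate_bps


def store_and_forward_delay(length_bits: float, rate_bps: float, hops: int) -> float:
    """Delay for one packet crossing `hops` identical links.

    Assumes each router must receive the whole packet before forwarding it,
    and ignores propagation, processing, and queueing delay.
    """
    return hops * transmission_delay(length_bits, rate_bps)


L = 10_000         # 10 Kbits
R = 100_000_000    # 100 Mbps
print(f"one hop:  {transmission_delay(L, R) * 1e3:.3f} msec")
print(f"two hops: {store_and_forward_delay(L, R, 2) * 1e3:.3f} msec")
```

With the values from the example, the first line reports 0.100 msec, matching the one-hop delay in the notes.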
Queuing Delays and Packet Loss
Queueing occurs when work arrives faster than it can be serviced
Packet queuing and loss: if arrival rate (in bps) to link exceeds transmission rate (bps) of link for some period of time:
• packets will queue, waiting to be transmitted on output link
• packets can be dropped (lost) if memory (buffer) in router fills up
1.3.2 Circuit Switching (电路交换)
End-end resources allocated to, reserved for "call" between source and destination
• in diagram, each link has four circuits.
• call gets 2nd circuit in top link and 1st circuit in right link.
• dedicated (专用的) resources: no sharing
• circuit-like (guaranteed) performance
• circuit segment idle if not used by call (no sharing)
• commonly used in traditional telephone networks
Multiplexing in Circuit-Switched Networks
Frequency Division Multiplexing (FDM, 频分复用)
• optical, electromagnetic frequencies divided into (narrow) frequency bands
• each call allocated its own band, can transmit at max rate of that narrow band
Time Division Multiplexing (TDM, 时分复用)
• time divided into slots
• each call allocated periodic slot(s), can transmit at maximum rate of (wider) frequency band (only) during its time slot(s)
Packet Switching Versus Circuit Switching
Is packet switching a "slam dunk winner" (必定成功的事;稳操胜券的事) ?
• great for "bursty" data – sometimes has data to send, but at other times not
• resource sharing
• simpler, no call setup
• excessive congestion possible: packet delay and loss due to buffer overflow
• protocols needed for reliable data transfer, congestion control (拥塞控制)
1.3.3 A Network of Networks
hosts connect to Internet via access Internet Service Providers (ISPs)
access ISPs in turn must be interconnected
• so that any two hosts (anywhere!) can send packets to each other
resulting network of networks is very complex
• evolution driven by economics, national policies
At "center": small # of well-connected large networks
• "tier-1" commercial ISPs (e.g., Level 3, Sprint, AT&T, NTT), national & international coverage
• content provider networks (e.g., Google, Facebook, 内容提供商网络): private network that connects its data centers to Internet, often bypassing tier-1, regional ISPs
1.4 Delay (时延), Loss (丢包), and Throughput (吞吐量) in Packet-Switched Networks
Packet delay and loss, end-end throughput
1.4.1 Overview of Delay in Packet-Switched Networks
Packets queue in router buffers, waiting for turn for transmission
• queue length grows when arrival rate to link (temporarily) exceeds output link capacity
packet loss occurs when memory to hold queued packets fills up
Types of Delay
d_nodal = d_proc + d_queue + d_trans + d_prop
d_proc: Processing Delay (处理时延)
• check bit errors
• determine output link
• typically < microsecs
d_queue: Queueing Delay (排队时延)
• time waiting at output link for transmission
• depends on congestion level of router
d_trans: Transmission Delay (传输时延)
• L: packet length (bits)
• R: link transmission rate (bps)
• d_trans = L/R
d_prop: Propagation Delay (传播时延)
• d: length of physical link
• s: propagation speed (~2x10^8 m/sec)
• d_prop = d/s
d_trans and d_prop
very different
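To see how different transmission and propagation delay can be, the sketch below adds up the four components for one node. The packet size, link rate, and 1000 km link length are hypothetical values chosen for illustration; processing and queueing delay are simply passed in as numbers.

```python
def nodal_delay(d_proc: float, d_queue: float,
                length_bits: float, rate_bps: float,
                distance_m: float, speed_mps: float = 2e8) -> dict:
    """Break the nodal delay into its four components (all in seconds)."""
    d_trans = length_bits / rate_bps   # time to push the bits onto the link
    d_prop = distance_m / speed_mps    # time for one bit to travel the link
    return {
        "proc": d_proc,
        "queue": d_queue,
        "trans": d_trans,
        "prop": d_prop,
        "nodal": d_proc + d_queue + d_trans + d_prop,
    }


# Hypothetical link: 10 Kbit packet, 100 Mbps, 1000 km, idle router.
delays = nodal_delay(0.0, 0.0, 10_000, 100e6, 1_000_000)
for name, seconds in delays.items():
    print(f"{name:>5}: {seconds * 1e3:.3f} msec")
```

On this link the propagation delay (5 msec) dominates the transmission delay (0.1 msec); on a short, slow link the relationship would be reversed.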
1.4.2 Queuing Delay and Packet Loss
a: average packet arrival rate
L: packet length (bits)
R: link bandwidth (bit transmission rate)
La/R: "traffic intensity" = arrival rate of bits / service rate of bits
La/R ~ 0: avg. queueing delay small
La/R -> 1: avg. queueing delay large
La/R > 1: more "work" arriving than can be serviced - average delay infinite!
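Traffic intensity is a single ratio, so it is easy to compute for a given link. The sketch below is illustrative only: the function names and the sample arrival rates are assumptions, and it reports only whether the link is overloaded rather than estimating the actual queueing delay.

```python
def traffic_intensity(arrival_pkts_per_sec: float,
                      length_bits: float,
                      rate_bps: float) -> float:
    """La/R: arrival rate of bits divided by the service rate of bits."""
    return length_bits * arrival_pkts_per_sec / rate_bps


def is_overloaded(intensity: float) -> bool:
    """True when bits arrive faster than the link can send them (La/R > 1)."""
    return intensity > 1


# Hypothetical link: 10 Kbit packets on a 100 Mbps link.
for a in (100, 5_000, 9_900, 12_000):
    rho = traffic_intensity(a, 10_000, 100e6)
    print(f"a = {a:>6} pkts/sec -> La/R = {rho:.2f}, overloaded: {is_overloaded(rho)}")
```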
Packet Loss
queue (aka buffer) preceding link in buffer has finite capacity
packet arriving to full queue dropped (aka lost)
lost packet may be retransmitted by previous node, by source end system, or not at all
1.4.3 End-to-End Delay
What do "real" Internet delay & loss look like?
traceroute program: provides delay measurement from source to router along end-end Internet path towards destination. For all i:
• sends three packets that will reach router i on path towards destination (with time-to-live (生存时间) field value of i)
• router i will return packets to sender
• sender measures time interval between transmission and reply
1.4.4 Throughput in Computer Networks
Throughput: rate (bits/time unit) at which bits are being sent from sender to receiver
• instantaneous (瞬时) : rate at given point in time
• average (平均): rate over longer period of time
bottleneck link (瓶颈链路)
link on end-end path that constrains end-end throughput
per-connection end-end throughput when 10 connections (fairly) share a backbone bottleneck link of rate R: min(Rc, Rs, R/10)
in practice: Rc or Rs is often bottleneck
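The bottleneck rule can be written as a one-line `min`. In this sketch the function names and the sample rates are hypothetical; the second function assumes the shared backbone link is divided equally among the connections.

```python
def path_throughput(*link_rates_bps: float) -> float:
    """End-end throughput of a path is limited by its slowest link."""
    return min(link_rates_bps)


def shared_backbone_throughput(r_server: float, r_client: float,
                               r_backbone: float, n_connections: int) -> float:
    """Per-connection throughput when n connections fairly share one backbone link."""
    return min(r_client, r_server, r_backbone / n_connections)


# Hypothetical rates: Rs = 2 Mbps, Rc = 1 Mbps, R = 5 Mbps shared by 10 connections.
print(path_throughput(2e6, 1e6))                       # limited by the access link
print(shared_backbone_throughput(2e6, 1e6, 5e6, 10))   # limited by R/10
```

With these numbers the shared backbone (R/10 = 500 kbps) is the bottleneck; with a faster backbone the access links Rc or Rs would limit throughput instead.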
1.5 Protocol Layers and Their Service Models
Layered architecture, encapsulation.
1.5.1 Layered Architecture
Networks are complex, with many "pieces":
• hosts
• routers
• links of various media
• applications
• protocols
• hardware, software
layers: each layer implements a service
• via its own internal-layer actions
• relying on services provided by layer below
Why layering?
Approach to designing/discussing complex systems:
• explicit structure allows identification, relationship of system's pieces
• layered reference model for discussion
• modularization eases maintenance, updating of system
• change in layer's service implementation: transparent to rest of system
• e.g., in the lecture's airline analogy, a change in gate procedure doesn't affect rest of system
Protocol Layering
Layered Internet protocol stack (协议栈)
Application Layer
application: supporting network applications
• HTTP, IMAP, SMTP, DNS
Transport Layer
transport: process-process data transfer
• TCP, UDP
Network Layer
network: routing of datagrams from source to destination
• IP, routing protocols
Link Layer
link: data transfer between neighboring network elements
• Ethernet, 802.11 (WiFi), PPP
Physical Layer
physical: bits "on the wire"
1.5.2 Encapsulation
Application exchanges messages (报文) to implement some application service using services of transport layer
Transport-layer protocol transfers M (e.g., reliably) from one process to another, using services of network layer
• transport-layer protocol encapsulates application-layer message, M, with transport-layer header Ht to create a transport-layer segment (报文段)
• Ht used by transport layer protocol to implement its service
Network-layer protocol transfers transport-layer segment [Ht | M] from one host to another, using link layer services
• network-layer protocol encapsulates transport-layer segment [Ht | M] with network-layer header Hn to create a network-layer datagram (数据报)
• Hn used by network layer protocol to implement its service
Link-layer protocol transfers datagram [Hn | [Ht | M]] from host to neighboring host, using physical-layer services
• link-layer protocol encapsulates network datagram [Hn | [Ht | M]] with link-layer header Hl to create a link-layer frame (帧)
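The nesting of headers described above can be sketched with a small recursive data structure. This is a toy model, not a real protocol stack: the `PDU` class and the function names are made up for illustration, and the headers are just the labels Ht, Hn, and Hl.

```python
from dataclasses import dataclass


@dataclass
class PDU:
    """A protocol data unit: one header wrapped around a payload."""
    header: str
    payload: object  # either the application message or an inner PDU


def encapsulate(message: str) -> PDU:
    """Wrap message M on the sending host, layer by layer."""
    segment = PDU("Ht", message)    # transport-layer segment
    datagram = PDU("Hn", segment)   # network-layer datagram
    frame = PDU("Hl", datagram)     # link-layer frame
    return frame


def decapsulate(frame: PDU) -> object:
    """Strip Hl, Hn, Ht on the receiving host to recover M."""
    return frame.payload.payload.payload


def show(pdu: object) -> str:
    """Render a PDU in the bracket notation used in the notes."""
    if isinstance(pdu, PDU):
        return f"[{pdu.header} | {show(pdu.payload)}]"
    return str(pdu)


frame = encapsulate("M")
print(show(frame))         # [Hl | [Hn | [Ht | M]]]
print(decapsulate(frame))  # M
```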
Encapsulation: an end-end view
1.6 Networks Under Attack
What can bad actors do? What defenses?
Internet not originally designed with (much) security in mind
• original vision: "a group of mutually trusting users attached to a transparent network"
• Internet protocol designers playing "catch-up"
• security considerations in all layers!
We now need to think about:
• how bad guys can attack computer networks
• how we can defend networks against attacks
• how to design architectures that are immune to attacks
The Bad Guys Can Attack Servers and Network Infrastructure
Denial of Service (DoS, 拒绝服务): attackers make resources (server, bandwidth) unavailable to legitimate traffic by overwhelming resource with bogus (伪造的) traffic
1. select target
2. break into hosts around the network (see botnet)
3. send packets to target from compromised hosts (受害主机)
The Bad Guys Can Sniff Packets
Packet "sniffing" (嗅探分组):
broadcast media (shared Ethernet, wireless)
promiscuous network interface reads/records all packets (e.g., including passwords!) passing by
The Bad Guys Can Masquerade as Someone You Trust
IP spoofing (IP哄骗): injection of packet with false source address
Lines of defense
Authentication (鉴别): proving you are who you say you are
• cellular networks provide hardware identity via SIM card; no such hardware assist in traditional Internet
confidentiality (机密性): via encryption
integrity checks (完整性检查): digital signatures prevent/detect tampering (篡改)
access restrictions: password-protected VPNs
firewalls: specialized "middleboxes" (中间盒) in access and core networks:
• off-by-default: filter incoming packets to restrict senders, receivers, applications
• detecting/reacting to DoS attacks
1.7 Internet history
From 1961 until today!
1.7.1 The Development of Packet Switching
Early packet-switching principles
1.7.2 Proprietary Networks (专用网络) and Internetworking
Internetworking, new and proprietary networks
1.7.3 A Proliferation (激增) of Networks
New protocols, a proliferation of networks
1.7.4 The Internet Explosion
Commercialization, the Web, new applications
1.7.5 The New Millennium
Scale, SDN, mobility, cloud
Chapter 2 Application Layer
2.1 Principles of Network Applications
Applications; client-server, P2P, sockets, APIs; transport services
2.1.1 Network Application Architectures (应用程序体系结构)
Client-server Architecture(客户-服务器架构)
Server:
• always-on host
• permanent IP address
• often in data centers, for scaling (扩展)
Clients:
• contact, communicate with server
• may be intermittently (间歇) connected
• may have dynamic IP addresses
• do not communicate directly with each other
• examples: HTTP, IMAP, FTP
P2P Architecture
• no always-on server
• arbitrary end systems directly communicate
• peers request service from other peers, provide service in return to other peers
• self scalability – new peers bring new service capacity, as well as new service demands
• peers are intermittently connected and change IP addresses
• complex management
• example: P2P file sharing
2.1.2 Processes communicating
Process (进程): program running within a host
• within same host, two processes communicate using inter-process communication (defined by OS)
• processes in different hosts communicate by exchanging messages (报文)
Client and Server Processes
client (客户) process: process that initiates communication
server (服务器) process: process that waits to be contacted
• note: applications with P2P architectures have client processes & server processes
The Interface Between the Process and the Computer Network
• process sends/receives messages to/from its socket (套接字)
• socket analogous to door
• sending process shoves message out door
• sending process relies on transport infrastructure on other side of door to deliver message to socket at receiving process
• two sockets involved: one on each side
Addressing Processes
To receive messages, process must have identifier (标识符)
host device has unique 32-bit IP address
Q: does IP address of host on which process runs suffice for (足够,足以) identifying the process?
• A: no, many processes can be running on same host
identifier includes both IP address (IP地址) and port numbers (端口号) associated with process on host.
example port numbers:
• HTTP server: 80
• mail server: 25
to send HTTP message to gaia.cs.umass.edu web server:
• IP address: 128.119.245.12
• port number: 80
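To make the (IP address, port number) identifier concrete, here is a small sketch using Python's standard `socket` module. It assumes a loopback interface is available. Rather than contacting a real web server, it starts a one-shot echo server on 127.0.0.1 and lets the OS pick a free port (port 0), so the function names and the echo behaviour are illustrative choices.

```python
import socket
import threading


def run_echo_once(server_sock: socket.socket) -> None:
    """Server process: wait to be contacted, echo one message, close."""
    conn, _client_addr = server_sock.accept()
    with conn:
        conn.sendall(conn.recv(1024))


def echo_roundtrip(message: bytes) -> bytes:
    """Client process: connect to the server's (IP, port) and exchange a message."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server_sock:
        # Port 0 asks the OS for any free port; a web server would bind to 80.
        server_sock.bind(("127.0.0.1", 0))
        server_sock.listen(1)
        ip, port = server_sock.getsockname()   # the server process's identifier
        worker = threading.Thread(target=run_echo_once, args=(server_sock,))
        worker.start()
        with socket.create_connection((ip, port)) as client_sock:
            client_sock.sendall(message)
            reply = client_sock.recv(1024)
        worker.join()
    return reply


if __name__ == "__main__":
    print(echo_roundtrip(b"hello"))
```

The client needs both parts of the identifier: the IP address selects the host, and the port number selects the process on that host.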
2.1.3 Transport Services Available to Applications
Reliable Data Transfer (可靠数据传输)
Some apps (e.g., file transfer, web transactions) require 100% reliable data transfer
Other apps (e.g., audio) can tolerate some loss
Throughput (吞吐量)
Some apps (e.g., multimedia) require minimum amount of throughput to be "effective"
Other apps ("elastic apps" (弹性应用)) make use of whatever throughput they get
Timing (定时)
Some apps (e.g., Internet telephony, interactive games) require low delay to be "effective"
Security (安全性)
Encryption, data integrity, …
2.1.4 Transport Services Provided by the Internet
Internet transport protocols services
TCP service
• reliable transport (可靠的传输) between sending and receiving process
• flow control (流量控制): sender won't overwhelm receiver
• congestion control (拥塞控制): throttle (抑制) sender when network overloaded
• connection-oriented (面向连接的): setup required between client and server processes
• does not provide: timing, minimum throughput guarantee, security
UDP service
• unreliable data transfer between sending and receiving process
• does not provide: reliability, flow control, congestion control, timing, throughput guarantee, security, or connection setup.
Internet applications, and transport protocols
Securing TCP
Vanilla TCP & UDP sockets:
• no encryption
• cleartext (明文) passwords sent into socket traverse Internet in cleartext (!)
Transport Layer Security (TLS)
• provides encrypted TCP connections
• data integrity
• end-point authentication
TLS implemented in application layer
• apps use TLS libraries, that use TCP in turn
• cleartext sent into "socket" traverse Internet encrypted
2.1.5 Application-Layer Protocols
An application-layer protocol defines
• types of messages exchanged (交换的报文类型),
• e.g., request, response
• message syntax (报文的语法):
• what fields in messages & how fields are delineated (描述)
• message semantics (报文的语义)
• meaning of information in fields
• rules for when and how processes send & respond to messages
Open protocols (开放的协议):
• defined in RFCs, everyone has access to protocol definition
• allows for interoperability (相互操作)
• e.g., HTTP, SMTP
Proprietary protocols (专用协议):
• e.g., Skype, Zoom
2.2 The Web and HTTP
Overview, statelessness, HTTP messages, cookies, caching, HTTP/2
• web page consists of objects (对象), each of which can be stored on different Web servers
• object can be HTML file, JPEG image, Java applet, audio file, …
• web page consists of base HTML-file (HTML基本文件) which includes several referenced objects (引用对象), each addressable by a URL, e.g.,
2.2.1 Overview of HTTP
HTTP: HyperText Transfer Protocol (超文本传输协议)
• Web's application-layer protocol
• client/server model:
• client: browser (浏览器) that requests, receives, (using HTTP protocol) and "displays" Web objects
• server: Web server sends (using HTTP protocol) objects in response to requests
HTTP uses TCP:
• client initiates TCP connection (creates socket) to server, port 80
• server accepts TCP connection from client
• HTTP messages (application-layer protocol messages) exchanged between browser (HTTP client) and Web server (HTTP server)
• TCP connection closed
HTTP is "stateless" (无状态的)
• server maintains no information about past client requests
aside
• protocols that maintain "state" are complex!
• past history (state) must be maintained
• if server/client crashes, their views of "state" may be inconsistent, must be reconciled (折中)
2.2.2 Non-Persistent and Persistent Connections
Non-persistent (非持续) HTTP
1. TCP connection opened
2. at most one object sent over TCP connection
3. TCP connection closed
downloading multiple objects required multiple connections
Persistent (持续) HTTP
• TCP connection opened to a server
• multiple objects can be sent over single TCP connection between client, and that server
• TCP connection closed
2.2.2.1 HTTP with Non-Persistent Connections
Example
User enters URL: www.someSchool.edu/someDepartment/home.index
(containing text, references to 10 jpeg images)
1a. HTTP client initiates TCP connection to HTTP server (process) at www.someSchool.edu on port 80
1b. HTTP server at host www.someSchool.edu waiting for TCP connection at port 80 "accepts" connection, notifying client
2. HTTP client sends HTTP request message (containing URL) into TCP connection socket. Message indicates that client wants object someDepartment/home.index
3. HTTP server receives request message, forms response message containing requested object, and sends message into its socket
4. HTTP server closes TCP connection.
5. HTTP client receives response message containing html file, displays html. Parsing html file, finds 10 referenced jpeg objects
6. Steps 1-5 repeated for each of 10 jpeg objects
Response time
RTT (Round-Trip Time, 往返时间, definition): time for a small packet to travel from client to server and back
HTTP response time (per object):
• one RTT to initiate TCP connection
• one RTT for HTTP request and first few bytes of HTTP response to return
• object/file transmission time
Non-persistent HTTP response time = 2RTT+ file transmission time
2.2.2.2 HTTP with Persistent Connections
Non-persistent HTTP issues:
• requires 2 RTTs per object
• OS overhead for each TCP connection
• browsers often open multiple parallel TCP connections to fetch referenced objects in parallel
Persistent HTTP (HTTP1.1):
• server leaves connection open after sending response
• subsequent HTTP messages between same client/server sent over open connection
• client sends requests as soon as it encounters a referenced object
• as little as one RTT for all the referenced objects (cutting response time in half)
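The response-time difference between the two connection styles can be estimated with simple arithmetic. The sketch below rests on several simplifying assumptions that are not from the lecture: every object has the same transmission time, non-persistent connections are opened one after another (no parallel connections), and the persistent connection requests all referenced objects back to back so that they share a single RTT. The RTT and transmission time are hypothetical values.

```python
def non_persistent_time(rtt: float, tx_time: float, n_objects: int) -> float:
    """Serial non-persistent HTTP: every object costs 2 RTT + transmission time."""
    return n_objects * (2 * rtt + tx_time)


def persistent_time(rtt: float, tx_time: float, n_referenced: int) -> float:
    """Persistent HTTP with back-to-back requests for the referenced objects.

    The base HTML file still costs 2 RTT + transmission time; the referenced
    objects then cost one further RTT plus their transmission times.
    """
    base_page = 2 * rtt + tx_time
    return base_page + rtt + n_referenced * tx_time


RTT = 0.100   # hypothetical 100 msec round-trip time
TX = 0.010    # hypothetical 10 msec transmission time per object
print(f"non-persistent: {non_persistent_time(RTT, TX, 11):.2f} s")  # base page + 10 images
print(f"persistent:     {persistent_time(RTT, TX, 10):.2f} s")
```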
2.2.3 HTTP Message Format
two types of HTTP messages: request, response
2.2.3.1 HTTP request message
ASCII (human-readable format)
request line (请求行; GET, POST, HEAD commands):
GET /index.html HTTP/1.1\r\n
(\r: carriage return character; \n: line-feed character)
header lines (首部行):
Host: www-net.cs.umass.edu\r\n
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:80.0) Gecko/20100101 Firefox/80.0\r\n
Accept: text/html,application/xhtml+xml\r\n
Accept-Language: en-us,en;q=0.5\r\n
Accept-Encoding: gzip,deflate\r\n
Connection: keep-alive\r\n
\r\n
carriage return, line feed at start of line indicates end of header lines
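Because the request is plain ASCII, it can be assembled and taken apart with ordinary string operations. The helper names below are illustrative, and the parser is a minimal sketch that assumes a well-formed request with "Name: value" header lines; it is not a full HTTP implementation.

```python
CRLF = "\r\n"


def build_request(method: str, path: str, headers: dict) -> str:
    """Assemble request line + header lines, ending with an empty line."""
    request_line = f"{method} {path} HTTP/1.1"
    header_lines = [f"{name}: {value}" for name, value in headers.items()]
    return CRLF.join([request_line, *header_lines]) + CRLF + CRLF


def parse_request(raw: str):
    """Split a request into (method, path, version, headers, body)."""
    head, _, body = raw.partition(CRLF + CRLF)   # blank line ends the headers
    request_line, *header_lines = head.split(CRLF)
    method, path, version = request_line.split(" ")
    headers = dict(line.split(": ", 1) for line in header_lines)
    return method, path, version, headers, body


request = build_request("GET", "/index.html", {
    "Host": "www-net.cs.umass.edu",
    "Connection": "keep-alive",
})
print(repr(request))
print(parse_request(request))
```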
General Format
Other HTTP request messages
POST method:
• web page often includes form input
• user input sent from client to server in entity body of HTTP POST request message
GET method (for sending data to server):
• include user data in URL field of HTTP GET request message (following a '?'):
www.somesite.com/animalsearch?monkeys&banana
HEAD method:
• requests headers (only) that would be returned if specified URL were requested with an HTTP GET method.
PUT method:
• uploads new file (object) to server
• completely replaces file that exists at specified URL with content in entity body of PUT HTTP request message
2.2.3.2 HTTP Response Message
status line (状态行; protocol, status code, status phrase):
HTTP/1.1 200 OK
header lines (首部行):
Date: Tue, 08 Sep 2020 00:53:20 GMT
Server: Apache/2.4.6 (CentOS) OpenSSL/1.0.2k-fips PHP/7.4.9 mod_perl/2.0.11 Perl/v5.16.3
Last-Modified: Tue, 01 Mar 2016 18:57:50 GMT
ETag: "a5b-52d015789ee9e"
Accept-Ranges: bytes
Content-Length: 2651
Content-Type: text/html; charset=UTF-8
\r\n
data, e.g., requested HTML file:
data data data data data ...
HTTP response status codes
status code appears in 1st line in server-to-client response message.
some sample codes:
200 OK
• request succeeded, requested object later in this message
301 Moved Permanently
• requested object moved, new location specified later in this message (in Location: field)
400 Bad Request
• request msg not understood by server
404 Not Found
• requested document not found on this server
505 HTTP Version Not Supported
2.2.4 User-Server Interaction: Cookies
Web sites and client browser use cookies to maintain some state between transactions
four components:
1) cookie header line of HTTP response message
2) cookie header line in next HTTP request message
3) cookie file kept on user's host, managed by user's browser
4) back-end database at Web site
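The four components can be seen working together in a toy simulation. Everything here is hypothetical: the class names, the starting ID 1678, and the simplification that the cookie header carries only a bare ID (real cookie headers carry name=value pairs plus attributes). No network traffic is involved.

```python
import itertools


class ToySite:
    """Web site with a back-end database keyed by cookie ID (component 4)."""

    def __init__(self):
        self._next_id = itertools.count(1678)   # arbitrary starting ID
        self.backend_db = {}                    # cookie ID -> pages visited

    def handle(self, path: str, request_headers: dict) -> dict:
        response_headers = {}
        cookie_id = request_headers.get("Cookie")
        if cookie_id not in self.backend_db:
            # First visit: create an ID and send it in the response (component 1).
            cookie_id = str(next(self._next_id))
            self.backend_db[cookie_id] = []
            response_headers["Set-Cookie"] = cookie_id
        self.backend_db[cookie_id].append(path)
        return response_headers


class ToyBrowser:
    """Browser that manages the cookie file on the user's host (component 3)."""

    def __init__(self):
        self.cookie_file = {}   # site name -> cookie ID

    def visit(self, site_name: str, site: ToySite, path: str) -> None:
        headers = {}
        if site_name in self.cookie_file:
            # Later visits: send the stored ID in the request (component 2).
            headers["Cookie"] = self.cookie_file[site_name]
        response_headers = site.handle(path, headers)
        if "Set-Cookie" in response_headers:
            self.cookie_file[site_name] = response_headers["Set-Cookie"]


site, browser = ToySite(), ToyBrowser()
browser.visit("shop", site, "/cart")
browser.visit("shop", site, "/checkout")
print(browser.cookie_file)   # the ID stored on the user's host
print(site.backend_db)       # the state the site keeps for that ID
```

Although each HTTP exchange is still stateless, the ID carried in the messages lets the site link both visits to the same database entry.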
HTTP cookies: comments
What cookies can be used for:
• authorization
• shopping carts
• recommendations
• user session state (Web e-mail)
Challenge: How to keep state?
• at protocol endpoints: maintain state at sender/receiver over multiple transactions
• in messages: cookies in HTTP messages carry state
aside
cookies and privacy:
• cookies permit sites to learn a lot about you on their site.
• third party persistent cookies (tracking cookies) allow common identity (cookie value) to be tracked across multiple web sites
2.2.5 Web Caching (Web缓存)
Goal: satisfy client requests without involving origin server
• user configures browser to point to a (local) Web cache
• browser sends all HTTP requests to cache
• if object in cache: cache returns object to client
• else cache requests object from origin server, caches received object, then returns object to client
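The hit/miss logic above is small enough to sketch directly. This is a minimal illustration: the `WebCache` class and the `fetch_from_origin` callback are hypothetical names, and the cache never evicts or revalidates objects, which a real proxy would have to do.

```python
from typing import Callable, Dict


class WebCache:
    """Acts as a server to the browser and as a client to the origin server."""

    def __init__(self, fetch_from_origin: Callable[[str], bytes]):
        self._fetch_from_origin = fetch_from_origin
        self._store: Dict[str, bytes] = {}
        self.origin_requests = 0   # how often the origin server was involved

    def get(self, url: str) -> bytes:
        if url in self._store:
            return self._store[url]            # hit: origin server not involved
        self.origin_requests += 1              # miss: ask the origin server
        obj = self._fetch_from_origin(url)
        self._store[url] = obj                 # cache the received object
        return obj


# Stand-in for the origin server: returns fake content for any URL.
cache = WebCache(lambda url: f"<html>{url}</html>".encode())
cache.get("/index.html")
cache.get("/index.html")
print(cache.origin_requests)   # 1: the second request was served from the cache
```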
Web caches (aka proxy servers, 代理服务器)
• Web cache acts as both client and server
• server for original requesting client
• client to origin server
• server tells cache about object's allowable caching in response header:
Why Web caching?
• reduce response time for client request
• cache is closer to client
• reduce traffic on an institution's access link
• Internet is dense with caches
• enables "poor" content providers to more effectively deliver content
The Conditional GET (条件 GET)
Goal: don't send object if cache has up-to-date cached version
• no object transmission delay (or use of network resources)
• client: specify date of cached copy in HTTP request
If-modified-since: <date>
• server: response contains no object if cached copy is up-to-date:
HTTP/1.0 304 Not Modified
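The server-side decision for a conditional GET comes down to one date comparison. The sketch below uses the standard library's `email.utils.parsedate_to_datetime` to parse the HTTP date; the function name `conditional_get` and the sample dates are illustrative, and the sketch ignores other validators such as ETag.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional, Tuple


def conditional_get(last_modified: datetime,
                    if_modified_since: Optional[str],
                    body: bytes) -> Tuple[int, bytes]:
    """Return (status code, body) as an origin server might.

    `last_modified` must be timezone-aware; `if_modified_since` is the raw
    header value sent by the cache, or None for an ordinary GET.
    """
    if if_modified_since is not None:
        cached_at = parsedate_to_datetime(if_modified_since)
        if last_modified <= cached_at:
            return 304, b""        # cached copy is up to date: send no object
    return 200, body               # send the (new) object


last_modified = datetime(2016, 3, 1, 18, 57, 50, tzinfo=timezone.utc)
print(conditional_get(last_modified, "Tue, 08 Sep 2020 00:53:20 GMT", b"<html>"))
print(conditional_get(last_modified, "Mon, 01 Feb 2016 00:00:00 GMT", b"<html>"))
```

The first call returns 304 with an empty body because the object has not changed since the cached copy was stored; the second returns 200 with the object.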
2.2.6 HTTP/2
Key goal: decreased delay in multi-object HTTP requests
HTTP/2: [RFC 7540, 2015] increased flexibility at server in sending objects to client:
• methods, status codes, most header fields unchanged from HTTP 1.1
• transmission order of requested objects based on client-specified object priority (not necessarily FCFS)
• push unrequested objects to client
• divide objects into frames, schedule frames to mitigate HOL blocking
HTTP/2: mitigating HOL (Head of Line) blocking (线路前部阻塞)
HTTP 1.1: client requests 1 large object (e.g., video file) and 3 smaller objects
HTTP/2: objects divided into frames, frame transmission interleaved (尤指将片状物插入,夹进)
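The effect of interleaving can be illustrated by counting frame slots. This is a toy model under stated assumptions: object sizes are given as hypothetical frame counts (one large object O1 and three small ones), each object has at least one frame, and interleaving is modelled as plain round-robin, whereas real HTTP/2 scheduling also takes priorities into account.

```python
from collections import deque


def fcfs_order(objects: dict) -> list:
    """HTTP 1.1 style: send each object completely, in request order."""
    return [name for name, frames in objects.items() for _ in range(frames)]


def interleaved_order(objects: dict) -> list:
    """HTTP/2 style (simplified): one frame per object per round."""
    queue = deque(objects.items())
    order = []
    while queue:
        name, remaining = queue.popleft()
        order.append(name)
        if remaining > 1:
            queue.append((name, remaining - 1))
    return order


def completion_slots(order: list) -> dict:
    """1-based slot in which each object's last frame is sent."""
    return {name: slot for slot, name in enumerate(order, start=1)}


objects = {"O1": 6, "O2": 1, "O3": 1, "O4": 1}   # frames per object (hypothetical)
print(completion_slots(fcfs_order(objects)))         # small objects wait behind O1
print(completion_slots(interleaved_order(objects)))  # small objects finish early
```

Under FCFS the three small objects complete in slots 7 to 9, behind the large one; with interleaving they complete in slots 2 to 4, while O1 is delayed only from slot 6 to slot 9.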
HTTP/2 to HTTP/3
HTTP/2 over single TCP connection means:
• recovery from packet loss still stalls (暂缓;搁置;停顿) all object transmissions
• as in HTTP 1.1, browsers have incentive (激励) to open multiple parallel TCP connections to reduce stalling, increase overall throughput
• no security over vanilla TCP connection
• HTTP/3: adds security, per object error- and congestion-control (more pipelining,流水线) over UDP
• more on HTTP/3 in transport layer
2.3
2.4 The Domain Name Service: DNS
Internet hosts, routers:
• IP address (IP地址, 32 bit) - used for addressing datagrams
• "name" - used by humans
2.4.1 Services Provided by DNS
• Domain Name System (域名系统, DNS):
• a distributed database (分布式数据库) implemented in a hierarchy of DNS servers
• an application-layer protocol that allows hosts to query the distributed database (address/name translation)
DNS: services
• hostname-to-IP-address translation
• host aliasing (主机别名)
• canonical (规范), alias names
• mail server aliasing (邮件服务器别名)
• load distribution (负载分配)
• replicated (冗余的) Web servers: many IP addresses correspond to one name
2.4.2 Overview of How DNS Works
Q: Why not centralize DNS?
• single point of failure (单点故障)
• traffic volume (通信容量)
• distant centralized database (远距离的集中式数据库)
• maintenance (维护)
A: doesn't scale (有可扩展能力)!
A distributed, Hierarchical Database
Client wants IP address for www.amazon.com; 1st approximation:
• client queries root server to find .com DNS server
• client queries .com DNS server to get amazon.com DNS server
• client queries amazon.com DNS server to get IP address for www.amazon.com
Root DNS servers (根DNS服务器)
• official, contact-of-last-resort by name servers that cannot resolve name
• incredibly important Internet function
• Internet couldn't function without it!
• DNSSEC – provides security (authentication, message integrity)
• ICANN (Internet Corporation for Assigned Names and Numbers) manages root DNS domain
Top-level domain (TLD) servers (顶级域名服务器)
• responsible for .com, .org, .net, .edu, .aero, .jobs, .museum, and all top-level country domains, e.g.: .cn, .uk, .fr, .ca, .jp
• Network Solutions: authoritative registry for .com, .net TLD
• Educause: .edu TLD
Authoritative DNS servers (权威DNS服务器)
• organization's own DNS server(s), providing authoritative hostname to IP mappings for organization's named hosts
• can be maintained by organization or service provider
Local DNS server (本地DNS服务器)
• when host makes DNS query, it is sent to its local DNS server
• Local DNS server returns reply, answering:
• from its local cache of recent name-to-address translation pairs
• forwarding the query into DNS hierarchy
• each ISP has local DNS server; to find yours:
• MacOS: % scutil --dns
• Windows: >ipconfig /all
• local DNS server doesn't strictly belong to hierarchy
DNS name resolution
Example: host at engineering.nyu.edu wants IP address for gaia.cs.umass.edu
Iterative query (迭代查询):
• contacted server replies with name of server to contact
• "I don't know this name, but ask this server"
Recursive query (递归查询):
• puts burden of name resolution on contacted name server
• heavy load at upper levels of hierarchy?
DNS Caching (DNS缓存)
• once (any) name server learns mapping, it caches mapping, and immediately returns a cached mapping in response to a query
• caching improves response time
• cache entries time out (disappear) after some time (TTL)
• TLD servers typically cached in local name servers
• cached entries may be out-of-date
• if named host changes IP address, may not be known Internet-wide until all TTLs expire!
• best-effort (尽力而为) name-to-address translation!
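A minimal sketch of TTL-based caching, assuming a dict-backed cache and a clock value passed in explicitly (so the example is deterministic). The class name DNSCache and the address are hypothetical.

```python
class DNSCache:
    """Toy name-to-address cache whose entries expire after a TTL (seconds)."""

    def __init__(self):
        self._entries = {}  # name -> (address, expiry_time)

    def put(self, name, address, ttl, now):
        self._entries[name] = (address, now + ttl)

    def get(self, name, now):
        entry = self._entries.get(name)
        if entry is None:
            return None
        address, expires_at = entry
        if now >= expires_at:          # TTL elapsed: entry disappears
            del self._entries[name]
            return None
        return address


cache = DNSCache()
cache.put("gaia.cs.umass.edu", "192.0.2.20", ttl=60, now=0)
fresh = cache.get("gaia.cs.umass.edu", now=30)    # still cached
expired = cache.get("gaia.cs.umass.edu", now=61)  # TTL has passed
print(fresh, expired)
```

Until the TTL expires the cache keeps answering with the stored address, even if the host has moved. That is the out-of-date risk noted above.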
2.4.3 DNS Records and Messages
RR (resource records, 资源记录) format: (name, value, type, ttl)
type=A
• name is hostname
• value is IP address
type=NS
• name is domain
• value is hostname of authoritative name server for this domain
type=CNAME
• name is alias name for some "canonical" (the real) name
• value is canonical name
type=MX
• value is name of SMTP mail server associated with name
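The four record types can be tried out on a small in-memory table. This is a sketch under stated assumptions: the records, names and addresses are invented for illustration, and the helpers lookup and resolve_address are hypothetical, not part of any DNS library.

```python
# Hypothetical resource records in (name, value, type, ttl) form.
RECORDS = [
    ("example.com", "dns.example.com", "NS", 86400),
    ("www.example.com", "web1.example.com", "CNAME", 3600),
    ("web1.example.com", "192.0.2.30", "A", 300),
    ("example.com", "mail.example.com", "MX", 3600),
    ("mail.example.com", "192.0.2.31", "A", 300),
]


def lookup(name, rtype):
    """Return the values of all records matching name and type."""
    return [value for (n, value, t, _ttl) in RECORDS if n == name and t == rtype]


def resolve_address(name, max_hops=5):
    """Follow CNAME aliases until an A record is found."""
    for _ in range(max_hops):
        addresses = lookup(name, "A")
        if addresses:
            return addresses[0]
        aliases = lookup(name, "CNAME")
        if not aliases:
            return None
        name = aliases[0]              # continue with the canonical name
    return None


web_ip = resolve_address("www.example.com")   # alias -> canonical name -> address
mail_host = lookup("example.com", "MX")[0]    # mail server name for the domain
mail_ip = resolve_address(mail_host)
print(web_ip, mail_host, mail_ip)
```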
DNS Messages
DNS query and reply messages, both have same format:
Inserting Records into the DNS Database
• register name at DNS registrar (注册登记机构)
• create authoritative server locally with the IP address
DNS security
DDoS (分布式拒绝服务) attacks
• bombard root servers with traffic
• not successful to date
• traffic filtering
• local DNS servers cache IPs of TLD servers, allowing root server bypass
• bombard TLD servers
• potentially more dangerous
Spoofing (哄骗) attacks
• intercept (截获) DNS queries, returning bogus (伪造的) replies
• DNS cache poisoning (毒害)
• RFC 4033: DNSSEC authentication services
2.7 Socket Programming: Creating Network Applications
Socket abstraction, UDP and TCP socket programming
Goal: learn how to build client/server applications that communicate using sockets
Socket: door between application process and end-end-transport protocol
Two socket types for two transport services:
• UDP: unreliable datagram
• TCP: reliable, byte stream-oriented
Application Example:
1. client reads a line of characters (data) from its keyboard and sends data to server
2. server receives the data and converts characters to uppercase
3. server sends modified data to client
4. client receives modified data and displays line on its screen
2.7.1 Socket programming with UDP
UDP: no "connection" between client and server:
• no handshaking before sending data
• sender explicitly attaches IP destination address and port # to each packet
• receiver extracts sender IP address and port# from received packet
UDP: transmitted data may be lost or received out-of-order
Application viewpoint:
• UDP provides unreliable transfer of groups of bytes ("datagrams") between client and server processes
Client/server socket interaction:
UDPClient.py
# Python UDPClient
# include Python's socket library
from socket import *
serverName = 'hostname'
serverPort = 12000
# create the client's UDP socket (IPv4, datagram)
clientSocket = socket(AF_INET, SOCK_DGRAM)
# get user keyboard input (input() in Python 3; raw_input() was Python 2)
message = input('Input lowercase sentence:')
# attach server name, port to message; send into socket
clientSocket.sendto(message.encode(), (serverName, serverPort))
# read reply characters from socket into string
modifiedMessage, serverAddress = clientSocket.recvfrom(2048)
# print out received string and close socket
print(modifiedMessage.decode())
clientSocket.close()
UDPServer.py
# Python UDPServer
from socket import *
serverPort = 12000
# create UDP socket
serverSocket = socket(AF_INET, SOCK_DGRAM)
# bind socket to local port number 12000
serverSocket.bind(('', serverPort))
print('The server is ready to receive')
# loop forever
while True:
    # read from UDP socket into message, getting client's address (client IP and port)
    message, clientAddress = serverSocket.recvfrom(2048)
    modifiedMessage = message.decode().upper()
    # send upper case string back to this client
    serverSocket.sendto(modifiedMessage.encode(), clientAddress)
2.7.2 Socket Programming with TCP
Client must contact server
• server process must first be running
• server must have created socket (door) that welcomes client's contact
Client contacts server by:
• Creating TCP socket, specifying IP address, port number of server process
• when client creates socket: client TCP establishes connection to server TCP
when contacted by client, server TCP creates new socket for server process to communicate with that particular client
• allows server to talk with multiple clients
• source port numbers used to distinguish clients
Application viewpoint
TCP provides reliable, in-order byte-stream transfer ("pipe") between client and server processes
Client/server socket interaction:
TCPClient.py
# Python TCPClient
from socket import *
serverName = 'servername'
serverPort = 12000
# create the client's TCP socket and connect to the server, remote port 12000
clientSocket = socket(AF_INET, SOCK_STREAM)
clientSocket.connect((serverName, serverPort))
sentence = input('Input lowercase sentence:')
# no need to attach server name, port: the socket is already connected
clientSocket.send(sentence.encode())
# read reply bytes from the TCP socket
modifiedSentence = clientSocket.recv(1024)
print('From Server:', modifiedSentence.decode())
clientSocket.close()
TCPServer.py
# Python TCPServer
from socket import *
serverPort = 12000
# create TCP welcoming socket
serverSocket = socket(AF_INET, SOCK_STREAM)
serverSocket.bind(('', serverPort))
# server begins listening for incoming TCP requests
serverSocket.listen(1)
print('The server is ready to receive')
# loop forever
while True:
    # server waits on accept() for incoming requests, new socket created on return
    connectionSocket, addr = serverSocket.accept()
    # read bytes from socket (but not address as in UDP)
    sentence = connectionSocket.recv(1024).decode()
    capitalizedSentence = sentence.upper()
    connectionSocket.send(capitalizedSentence.encode())
    # close connection to this client (but not welcoming socket)
    connectionSocket.close()
Chapter 3 Transport Layer
3.1 Introduction and Transport-layer Services
Transport-layer services and protocols. Transport layer actions.
• provide logical communication (逻辑通信) between application processes running on different hosts
• transport protocol actions in end systems:
• sender: breaks application messages into segments (报文段), passes to network layer
• receiver: reassembles segments into messages, passes to application layer
• two transport protocols available to Internet applications
• TCP, UDP
3.1.1 Relationship Between Transport and Network Layers
• network layer: logical communication between hosts
• transport layer: logical communication between processes
• relies on, enhances, network layer services
3.1.2 Overview of the Transport Layer in the Internet
Transport Layer Actions
Sender:
• is passed an application-layer message
• determines segment header field values
• creates segment
• passes segment to IP
Receiver:
• receives segment from IP
• checks header values
• extracts application-layer message
• demultiplexes (多路分解) message up to application via socket
Two principal Internet transport protocols
TCP: Transmission Control Protocol
• reliable (可靠), in-order delivery
• congestion control (拥塞控制)
• flow control
• connection setup
UDP: User Datagram Protocol
• unreliable (不可靠), unordered delivery
• no-frills (不提供不必要服务的) extension of "best-effort" (尽力而为) IP
Services not available:
• delay guarantees
• bandwidth guarantees
3.2 Multiplexing and Demultiplexing
What is multiplexing, demultiplexing? How is it done? How does it work in TCP and UDP?
Multiplexing (多路复用) at sender:
handle data from multiple sockets (套接字), add transport header (later used for demultiplexing)
Demultiplexing (多路分解) at receiver:
use header info to deliver received segments to correct socket
How demultiplexing works
• host receives IP datagrams
• each datagram has source IP address, destination IP address
• each datagram carries one transport-layer segment
• each segment has source, destination port number
• host uses IP addresses & port numbers to direct segment to appropriate socket
3.2.1.1 Connectionless Multiplexing and Demultiplexing
Recall:
• when creating socket, must specify host-local port #:
• DatagramSocket mySocket1 = new DatagramSocket(12534);
• when creating datagram to send into UDP socket, must specify
• destination IP address
• destination port #
• when receiving host receives UDP segment:
• checks destination port # in segment
• directs UDP segment to socket with that port #
↓
IP/UDP datagrams with same dest. port #, but different source IP addresses and/or source port numbers will be directed to same socket at receiving host
3.2.1.2 Connection-Oriented Multiplexing and Demultiplexing
• TCP socket identified by 4-tuple:
• source IP address
• source port number
• dest IP address
• dest port number
• demux: receiver uses all four values (4-tuple) to direct segment to appropriate socket
• server may support many simultaneous TCP sockets:
• each socket identified by its own 4-tuple
• each socket associated with a different connecting client
3.2.1.3 Summary
• Multiplexing, demultiplexing: based on segment, datagram header field values
• UDP: demultiplexing using destination port number (only)
• TCP: demultiplexing using 4-tuple: source and destination IP addresses, and port numbers
• Multiplexing/demultiplexing happen at all layers
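The difference between the two demultiplexing rules can be sketched with two lookup tables. This is a toy model, not how an operating system stores sockets: the addresses, ports and socket labels are invented for illustration.

```python
# Toy demultiplexing tables; addresses, ports and labels are made-up examples.
udp_sockets = {9157: "udp_socket_A"}     # keyed by destination port only

tcp_sockets = {
    # (source IP, source port, dest IP, dest port) -> socket
    ("198.51.100.1", 9157, "192.0.2.80", 80): "tcp_socket_1",
    ("198.51.100.2", 9157, "192.0.2.80", 80): "tcp_socket_2",
}


def demux_udp(src_ip, src_port, dst_ip, dst_port):
    """UDP: only the destination port selects the socket."""
    return udp_sockets.get(dst_port)


def demux_tcp(src_ip, src_port, dst_ip, dst_port):
    """TCP: the full 4-tuple selects the socket."""
    return tcp_sockets.get((src_ip, src_port, dst_ip, dst_port))


# Two UDP senders, same destination port: same socket.
u1 = demux_udp("198.51.100.1", 5000, "192.0.2.80", 9157)
u2 = demux_udp("198.51.100.2", 6000, "192.0.2.80", 9157)
# Two TCP clients, same destination port: different sockets.
t1 = demux_tcp("198.51.100.1", 9157, "192.0.2.80", 80)
t2 = demux_tcp("198.51.100.2", 9157, "192.0.2.80", 80)
print(u1, u2, t1, t2)
```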
3.3 Connectionless Transport: UDP
UDP segment structure. The Internet checksum.
• "no frills," "bare bones" Internet transport protocol
• "best effort" service, UDP segments may be:
• lost
• delivered out-of-order to app
• connectionless:
• no handshaking between UDP sender, receiver
• each UDP segment handled independently of others
Why is there a UDP?
• no connection establishment (which can add RTT delay)
• simple: no connection state at sender, receiver
• Small packet header overhead
• Finer application-level control over what data is sent, and when. No congestion control.
• UDP can blast away as fast as desired!
• can function in the face of congestion
• UDP use:
• streaming multimedia apps (loss tolerant, rate sensitive)
• DNS
• SNMP (Simple Network Management Protocol, 简单网络管理协议)
• HTTP/3
• if reliable transfer needed over UDP (e.g., HTTP/3):
• add needed reliability at application layer
• add congestion control at application layer
RFC 768
Transport Layer Actions
UDP sender actions:
• is passed an application-layer message
• determines UDP segment header field values
• creates UDP segment
• passes segment to IP
UDP receiver actions:
• checks UDP checksum header value
• extracts application-layer message
• demultiplexes message up to application via socket
3.3.1 UDP Segment Structure
3.3.2 UDP Checksum
Goal: detect errors (i.e., flipped bits) in transmitted segment
Sender:
• Treat contents of UDP segment (including UDP header fields and IP addresses) as sequence of 16-bit integers
• Checksum (校验和): one's complement of the one's complement sum of the segment content (add the 16-bit words with wraparound, then invert the bits)
• Checksum value put into UDP checksum field
Receiver:
• Compute checksum of received segment
• Check if computed checksum equals checksum field value:
• Not equal - error detected
• Equal - no error detected. But maybe errors nonetheless? More later ….
Internet checksum
Example: add two 16-bit integers
1 1 1 0 0 1 1 0 0 1 1 0 0 1 1 0
1 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
1 1 0 1 1 1 0 1 1 1 0 1 1 1 0 1 1 wraparound
1 0 1 1 1 0 1 1 1 0 1 1 1 1 0 0 sum
0 1 0 0 0 1 0 0 0 1 0 0 0 0 1 1 checksum
Note: when adding numbers, a carryout from the most significant bit needs to be added to the result
Weak protection!
1 1 1 0 0 1 1 0 0 1 1 0 0 1 0 1
1 1 0 1 0 1 0 1 0 1 0 1 0 1 1 0
1 1 0 1 1 1 0 1 1 1 0 1 1 1 0 1 1 wraparound
1 0 1 1 1 0 1 1 1 0 1 1 1 1 0 0 sum
0 1 0 0 0 1 0 0 0 1 0 0 0 0 1 1 checksum
Even though numbers have changed (bit flips), no change in checksum!
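Both examples above can be reproduced with a few lines of Python. This is a minimal sketch that works on a list of 16-bit integers only; the function names are my own, and a real UDP checksum would also cover the header fields and pseudo-header.

```python
def ones_complement_sum(words):
    """Add 16-bit words, wrapping any carry out of bit 15 back into bit 0."""
    total = 0
    for word in words:
        total += word
        total = (total & 0xFFFF) + (total >> 16)   # end-around carry
    return total


def internet_checksum(words):
    """One's complement of the one's-complement sum, as a 16-bit value."""
    return ~ones_complement_sum(words) & 0xFFFF


original = [0b1110011001100110, 0b1101010101010101]
flipped = [0b1110011001100101, 0b1101010101010110]   # last two bits of each word flipped

print(format(ones_complement_sum(original), "016b"))   # the "sum" row
print(format(internet_checksum(original), "016b"))     # the "checksum" row
print(internet_checksum(original) == internet_checksum(flipped))
```

A receiver can verify a segment by summing all words including the checksum: with no detected error the result is sixteen 1 bits.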
3.4 Principles of Reliable Data Transfer
Protocol mechanisms for reliable data transfer (rdt). Building an rdt protocol. Pipelining. Go-back-N. Selective Repeat.
Interfaces
3.4.1 Building a Reliable Data Transfer Protocol
We will:
• incrementally develop sender, receiver sides of reliable data transfer protocol (可靠数据传输协议, rdt)
• consider only unidirectional data transfer (单向数据传输)
• but control info will flow in both directions!
• use finite-state machines (FSM, 有限状态机) to specify sender, receiver
3.4.1.1 Reliable Data Transfer over a Perfectly Reliable Channel: rdt1.0
• underlying channel perfectly reliable
• no bit errors
• no loss of packets
• separate FSMs for sender, receiver:
• sender sends data into underlying channel
• receiver reads data from underlying channel
3.4.1.2 Reliable Data Transfer over a Channel with Bit Errors: rdt2.0
• underlying channel may flip bits in packet
• checksum to detect bit errors
• the question: how to recover from errors?
• Positive acknowledgements (ACKs, 肯定确认): receiver explicitly tells sender that packet received OK
• Negative acknowledgements (NAKs): receiver explicitly tells sender that packet had errors
• sender retransmits packet on receipt of NAK
stop-and-wait (停等)
sender sends one packet, then waits for receiver response
rdt2.0: the FSM Representation
Note: "state" of receiver (did the receiver get my message correctly?) isn't known to sender unless somehow communicated from receiver to sender
that's why we need a protocol!
rdt2.0 has a fatal flaw!
What happens if ACK/NAK corrupted?
• sender doesn't know what happened at receiver!
• can't just retransmit: possible duplicate
Handling duplicates:
• sender retransmits current packet if ACK/NAK corrupted
• sender adds sequence number (序号) to each packet
• receiver discards (doesn't deliver up) duplicate packet (冗余分组)
Protocol that uses both ACKs and NAKs from the receiver to the sender: rdt2.1
The FSM Description
Discussion
Sender:
• sequence number added to packet
• two sequence numbers (0,1) will suffice. Why?
• must check if received ACK/NAK corrupted
• twice as many states
• state must "remember" whether "expected" packet should have sequence number of 0 or 1
Receiver:
• must check if received packet is duplicate
• state indicates whether 0 or 1 is expected packet sequence number
• note: receiver cannot know if its last ACK/NAK received OK at sender
NAK-free Reliable Data Transfer Protocol for a Channel with Bit Errors: rdt2.2
• same functionality as rdt2.1, using ACKs only
• instead of NAK, receiver sends ACK for last packet received OK
• receiver must explicitly include sequence number of packet being ACKed
• duplicate ACK (冗余ACK) at sender results in same action as NAK: retransmit current packet
As we will see, TCP uses this approach to be NAK-free
3.4.1.3 Reliable Data Transfer over a Lossy Channel with Bit Errors: rdt3.0
New channel assumption: underlying channel can also lose packets (data, ACKs)
• checksum, sequence numbers, ACKs, retransmissions will be of help … but not quite enough
Approach: sender waits "reasonable" amount of time for ACK
• retransmits if no ACK received in this time
• if packet (or ACK) just delayed (not lost):
• retransmission will be duplicate, but sequence numbers already handle this!
• receiver must specify sequence number of packet being ACKed
• use countdown timer (倒计数定时器) to interrupt after "reasonable" amount of time
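The interplay of timeout, retransmission and 1-bit sequence numbers can be sketched as a small deterministic simulation. Assumptions: losses are scripted by transmission index instead of being random, a timeout is modelled simply as "try again", and bit errors are not modelled. The function name stop_and_wait is hypothetical.

```python
def stop_and_wait(messages, data_loss=(), ack_loss=()):
    """Simulate alternating-bit stop-and-wait over a lossy channel.

    data_loss / ack_loss: sets of transmission indices (0-based, counted over
    all data transmissions) whose data packet / returning ACK is dropped.
    Returns (delivered_messages, total_data_transmissions).
    """
    delivered = []
    expected = 0          # receiver: next expected sequence number (0 or 1)
    seq = 0               # sender: sequence number of the current packet
    transmissions = 0

    for msg in messages:
        while True:
            attempt = transmissions
            transmissions += 1
            if attempt in data_loss:
                continue                 # packet lost: timer expires, retransmit
            if seq == expected:
                delivered.append(msg)    # new packet: deliver to upper layer
                expected ^= 1
            # otherwise it is a duplicate: discard it, but still ACK it
            if attempt in ack_loss:
                continue                 # ACK lost: timer expires, retransmit
            break                        # ACK arrived, move to next message
        seq ^= 1
    return delivered, transmissions


delivered, transmissions = stop_and_wait(["a", "b", "c"], data_loss={1}, ack_loss={3})
print(delivered, transmissions)
```

In this run the second transmission is lost and the ACK of the fourth is lost, so five data packets are sent, yet each message is delivered exactly once.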
3.4.2 Pipelined Reliable Data Transfer Protocols
3.4.2.1 Performance of rdt3.0 (stop-and-wait)
• Usender: utilization (利用率) – fraction of time sender busy sending
• example: 1 Gbps link, 15 ms prop. delay, 8000 bit packet
• time to transmit packet into channel: D_trans = L/R = 8000 bits / 10^9 bits/sec = 8 microseconds
rdt3.0: Stop-and-wait Operation
• rdt 3.0 protocol performance stinks!
• Protocol limits performance of underlying infrastructure (channel)
3.4.2.2 Solution: Pipelining
rdt3.0: Pipelined Operation
Pipelining (流水线): sender allows multiple, "in-flight", yet-to-be-acknowledged packets
• range of sequence numbers must be increased
• buffering at sender and/or receiver
Pipelining: increased utilization
3-packet pipelining tripled the utilization.
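The utilization figures for this example (1 Gbps link, 15 ms propagation delay, 8000-bit packets) can be checked numerically. A small sketch, assuming RTT is twice the propagation delay and that the pipelined packets all fit within one RTT; the function name is my own.

```python
def sender_utilization(packet_bits, link_bps, prop_delay_s, in_flight=1):
    """Fraction of time the sender is busy, with `in_flight` packets sent per RTT."""
    d_trans = packet_bits / link_bps          # time to push one packet onto the link
    rtt = 2 * prop_delay_s
    return in_flight * d_trans / (rtt + d_trans)


u_stop_and_wait = sender_utilization(8000, 1e9, 0.015)
u_pipelined = sender_utilization(8000, 1e9, 0.015, in_flight=3)
print(f"{u_stop_and_wait:.5f} {u_pipelined:.5f}")
```

Even with three packets in flight the sender is idle for well over 99% of the time on this link, which motivates much larger windows.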
3.4.3 Go-Back-N (GBN, 回退N步)
GBN Sender
• Sender: "window" of up to N, consecutive transmitted but unACKed packets
• k-bit sequence number in packet header
• Cumulative acknowledgement (累计确认): ACK(n): ACKs all packets up to, including sequence number n
• on receiving ACK(n): move window forward to begin at n+1
• Timer for oldest in-flight packet
• timeout(n): retransmit packet n and all higher sequence number packets in window
GBN Receiver
• ACK-only: always send ACK for correctly-received packet so far, with highest in-order sequence number
• may generate duplicate ACKs
• need only remember rcv_base
• on receipt of out-of-order packet:
• can discard (don't buffer) or buffer: an implementation decision
• re-ACK packet with highest in-order sequence number
Go-Back-N in Action
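The sender and receiver rules can be sketched in a few lines. Assumptions: sequence numbers are unbounded integers (no k-bit wraparound), the receiver discards out-of-order packets instead of buffering them, and timers and the channel are left out. The names GBNSender and gbn_receive are hypothetical.

```python
class GBNSender:
    """Window bookkeeping for a Go-Back-N sender."""

    def __init__(self, window_size):
        self.N = window_size
        self.base = 0          # oldest unACKed sequence number
        self.next_seq = 0      # next sequence number to use

    def send(self):
        """Return the sequence number sent, or None if the window is full."""
        if self.next_seq >= self.base + self.N:
            return None
        seq = self.next_seq
        self.next_seq += 1
        return seq

    def on_ack(self, n):
        """Cumulative ACK: everything up to and including n is acknowledged."""
        if n >= self.base:
            self.base = n + 1

    def on_timeout(self):
        """Retransmit every sent-but-unACKed packet in the window."""
        return list(range(self.base, self.next_seq))


def gbn_receive(arrivals):
    """Return (delivered, acks) for a sequence of arriving sequence numbers."""
    expected = 0
    delivered, acks = [], []
    for seq in arrivals:
        if seq == expected:
            delivered.append(seq)
            expected += 1
        # always ACK the highest in-order sequence number (-1: none yet)
        acks.append(expected - 1)
    return delivered, acks


sender = GBNSender(window_size=4)
first_burst = [sender.send() for _ in range(5)]   # fifth attempt refused: window full
sender.on_ack(1)                                  # packets 0 and 1 acknowledged
resend = sender.on_timeout()                      # packets 2 and 3 go again
delivered, acks = gbn_receive([0, 1, 3, 2, 3])    # 2 lost first time, then resent
print(first_burst, resend, delivered, acks)
```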
3.4.4 Selective Repeat (SR, 选择重传)
• Receiver individually acknowledges all correctly received packets
• buffers packets, as needed, for eventual in-order delivery to upper layer
• Sender times-out/retransmits individually for unACKed packets
• sender maintains timer for each unACKed packet
• Sender window
• N consecutive sequence numbers
• limits sequence numbers of sent, unACKed packets
Sender, Receiver Windows
Sender, Receiver Events and Actions
Sender:
• Data received from above:
• if next available sequence number in window, send packet
• Timeout(n):
• resend packet n, restart timer
• ACK(n) received in [send_base, send_base+N-1]:
• mark packet n as received
• if n smallest unACKed packet, advance window base to next unACKed sequence number
Receiver:
• Packet n with sequence number in [rcv_base, rcv_base+N-1] is correctly received:
• send ACK(n)
• out-of-order: buffer
• in-order: deliver (also deliver buffered, in-order packets), advance window to next not-yet-received packet
• Packet n with sequence number in [rcv_base-N, rcv_base-1] is correctly received:
• send ACK(n)
• Otherwise:
• ignore
Selective-repeat in Action
A dilemma!
Example:
• sequence numbers 0, 1, 2, 3
• a window size of three.
• receiver can't see sender side
• receiver behavior identical in both cases!
• something's (very) wrong!
Q: What relationship is needed between sequence # size and window size to avoid problem in scenario (b)?
The window size must be less than or equal to half the size of the sequence number space for SR protocols.
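The dilemma can be reproduced with modular arithmetic. In this sketch the hypothetical helper classify decides how an SR receiver treats an arriving sequence number; the scenario assumes the receiver has already received a full window and all of its ACKs were lost, so the sender retransmits packet 0.

```python
def classify(seq, rcv_base, window, seq_space):
    """How an SR receiver treats packet `seq` (all arithmetic modulo seq_space)."""
    if (seq - rcv_base) % seq_space < window:
        return "new"            # inside [rcv_base, rcv_base+N-1]: accept as new data
    if 1 <= (rcv_base - seq) % seq_space <= window:
        return "duplicate"      # inside [rcv_base-N, rcv_base-1]: just re-ACK
    return "ignore"


# Sequence numbers 0..3, window 3: receiver got 0, 1, 2, so rcv_base = 3.
too_large = classify(0, rcv_base=3, window=3, seq_space=4)
# Sequence numbers 0..3, window 2: receiver got 0, 1, so rcv_base = 2.
safe = classify(0, rcv_base=2, window=2, seq_space=4)
print(too_large, safe)
```

With a window of three the retransmitted packet 0 is wrongly accepted as new data. With a window of two (half the sequence number space) it is recognized as a duplicate.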
3.5 Connection-oriented Transport: TCP
The TCP connection and segment, RTT estimation and timeout, flow control
RFCs: 793, 1122, 2018, 5681, 7323
• Reliable, in-order byte stream:
• no "message boundaries"
• Cumulative acknowledgements
• Pipelining:
• TCP congestion and flow control set window size
• Flow controlled:
• sender will not overwhelm receiver
3.5.1 The TCP Connection
• Connection-oriented (面向连接的):
• handshaking (exchange of control messages) initializes sender, receiver state before data exchange
• Full-duplex service (全双工服务):
• bi-directional data flow in same connection
• Point-to-point (点对点):
• one sender, one receiver
• Three-way handshake (三次握手).
• Send buffer (发送缓存):
• Maximum segment size (MSS, 最大报文段长度)
• Maximum transmission unit (MTU, 最大传输单元)
• TCP segments (TCP报文段).
3.5.2 TCP Segment Structure
Sequence Numbers and Acknowledgment Numbers
• Sequence numbers:
• byte stream "number" of first byte in segment's data
• Acknowledgements:
• seq # of next byte expected from other side
• cumulative acknowledgement (累积确认)
• Q: how receiver handles out-of-order segments
• A: TCP spec doesn't say; it is up to the implementor
Telnet: A Case Study for Sequence and Acknowledgment Numbers
3.5.3 Round-Trip Time Estimation and Timeout
Q: how to set TCP timeout value?
• Longer than RTT, but RTT varies!
• Too short: premature timeout, unnecessary retransmissions
• Too long: slow reaction to segment loss
Estimating the Round-Trip Time
Q: how to estimate RTT?
• SampleRTT: measured time from segment transmission until ACK receipt
• Ignore retransmissions
• SampleRTT will vary, want estimated RTT "smoother"
• Average several recent measurements, not just current SampleRTT
EstimatedRTT = (1 – α) · EstimatedRTT + α · SampleRTT
• Exponential weighted moving average (EWMA, 指数加权移动平均)
• Influence of past sample decreases exponentially fast
• Typical value: α = 0.125
Setting and Managing the Retransmission Timeout Interval
• Timeout interval: EstimatedRTT plus "safety margin"
• Large variation in EstimatedRTT: want a larger safety margin
TimeoutInterval = EstimatedRTT + 4 · DevRTT
• DevRTT: EWMA of SampleRTT deviation from EstimatedRTT:
DevRTT = (1 – β) · DevRTT + β · | SampleRTT – EstimatedRTT |
(typically, β = 0.25)
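The three formulas above can be combined into one update step. A sketch under one assumption not stated in these notes: DevRTT is updated before EstimatedRTT, using the old EstimatedRTT, which is the order given in RFC 6298. The function name update_rtt and the sample values (in milliseconds) are illustrative.

```python
ALPHA = 0.125
BETA = 0.25


def update_rtt(estimated_rtt, dev_rtt, sample_rtt):
    """Apply one SampleRTT; return (EstimatedRTT, DevRTT, TimeoutInterval)."""
    # DevRTT first, using the EstimatedRTT from before this sample (assumed order)
    dev_rtt = (1 - BETA) * dev_rtt + BETA * abs(sample_rtt - estimated_rtt)
    estimated_rtt = (1 - ALPHA) * estimated_rtt + ALPHA * sample_rtt
    timeout_interval = estimated_rtt + 4 * dev_rtt
    return estimated_rtt, dev_rtt, timeout_interval


est, dev, timeout = update_rtt(estimated_rtt=100.0, dev_rtt=5.0, sample_rtt=120.0)
print(est, dev, timeout)
```

One sample that is 20 ms above the estimate moves EstimatedRTT only slightly, but it widens the safety margin noticeably.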
3.5.4 Reliable Data Transfer
Event: data received from application above
• Create segment with sequence number
• Sequence number is byte-stream number of first data byte in segment
• Start timer if not already running
• Think of timer as for oldest unACKed segment
• Expiration interval: TimeoutInterval
Event: Timer timeout
• Retransmit segment that caused timeout
• Restart timer
Event: ACK receipt
• If ACK acknowledges previously unACKed segments
• Update what is known to be ACKed
• Start timer if there are still unACKed segments
A Few Interesting Scenarios
Fast Retransmit
Fast retransmit (快速重传):
If sender receives 3 additional ACKs for same data ("triple duplicate ACKs"), resend unACKed segment with smallest sequence number
• likely that unACKed segment lost, so don't wait for timeout
Receipt of three duplicate ACKs indicates 3 segments received after a missing segment – lost segment is likely. So retransmit!
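The duplicate-ACK rule can be sketched as a counter over the stream of ACK numbers the sender receives. The function name and the ACK values are made up for illustration; timers and the actual retransmission are omitted.

```python
def fast_retransmit_events(acks):
    """Return the ACK numbers that trigger a fast retransmit.

    An ACK number names the next byte expected, so the segment to resend
    is the one starting at that byte.
    """
    retransmits = []
    last_ack = None
    dup_count = 0
    for ack in acks:
        if ack == last_ack:
            dup_count += 1
            if dup_count == 3:          # third duplicate of the same ACK
                retransmits.append(ack)
        else:
            last_ack = ack              # new data acknowledged: reset counter
            dup_count = 0
    return retransmits


events = fast_retransmit_events([100, 200, 200, 200, 200, 600])
no_events = fast_retransmit_events([100, 200, 200, 200])   # only two duplicates
print(events, no_events)
```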
3.5.5 Flow Control
Q: What happens if network layer delivers data faster than application layer removes data from socket buffers?
Flow control (流量控制): receiver controls sender, so sender won't overflow receiver's buffer by transmitting too much, too fast.
• TCP receiver "advertises" free buffer space in rwnd field in TCP header
• RcvBuffer size set via socket options (typical default is 4096 bytes)
• Many operating systems autoadjust RcvBuffer
• Sender limits amount of unACKed ("in-flight") data to received rwnd
• Guarantees receive buffer will not overflow
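The advertised window can be worked through with simple arithmetic. The variable names (LastByteRcvd, LastByteRead, LastByteSent, LastByteAcked) follow the textbook's description of flow control and do not appear in these notes; the byte counts are invented.

```python
def receive_window(rcv_buffer, last_byte_rcvd, last_byte_read):
    """Free buffer space the receiver advertises in the rwnd field."""
    return rcv_buffer - (last_byte_rcvd - last_byte_read)


def sendable_bytes(rwnd, last_byte_sent, last_byte_acked):
    """How many more bytes the sender may put in flight without exceeding rwnd."""
    in_flight = last_byte_sent - last_byte_acked
    return max(0, rwnd - in_flight)


rwnd = receive_window(rcv_buffer=4096, last_byte_rcvd=3000, last_byte_read=1000)
can_send = sendable_bytes(rwnd, last_byte_sent=3500, last_byte_acked=3000)
print(rwnd, can_send)
```

Here 2000 bytes wait unread in a 4096-byte buffer, so the receiver advertises 2096 bytes; with 500 bytes already in flight the sender may send 1596 more.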
3.5.6 TCP Connection Management
3.5.6.1 TCP Connection Establishment
Before exchanging data, sender/receiver "handshake":
• agree to establish connection (each knowing the other willing to establish connection)
• agree on connection parameters (e.g., starting seq #s)
Agreeing to Establish a Connection
Two-way handshake
Q: Will 2-way handshake always work in network?
• Variable delays
• Retransmitted messages (e.g., req_conn(x)) due to message loss
• Message reordering
• Can't "see" other side
Two-way handshake scenarios:
TCP three-way handshake(三次握手)
3.5.6.2 TCP Connection Teardown
• Client, server each close their side of connection
• send TCP segment with FIN bit = 1
• Respond to received FIN with ACK
• on receiving FIN, ACK can be combined with own FIN
• Simultaneous FIN exchanges can be handled
3.6 Principles of Congestion Control
Causes and costs of congestion, approaches to congestion control
Congestion:
• informally: "too many sources attempting to send data at too high a rate"
• manifestations:
• long delays (queueing in router buffers)
• packet loss (buffer overflow at routers)
• different from flow control!
• a top-10 problem!
Congestion control: too many senders, sending too fast
Flow control: one sender too fast for one receiver
3.6.1 The Causes and the Costs of Congestion
Scenario 1: Two Senders, a Router with Infinite Buffers
Simplest scenario:
• one router, infinite buffers
• input, output link capacity: R
• two flows
• no retransmissions needed
Q: What happens as arrival rate λ_in approaches R/2?
One cost of a congested network—large queuing delays are experienced as the packet-arrival rate nears the link capacity.
Scenario 2: Two Senders and a Router with Finite Buffers
• one router, finite buffers
• sender retransmits lost, timed-out packet
• application-layer input = application-layer output: λ_in = λ_out
• transport-layer input includes retransmissions: λ'_in ≥ λ_in
First, the unrealistic case
• Host A sends a packet only when a buffer is free.
The slightly more realistic case
• the sender retransmits only when a packet is known for certain to be lost.
Another cost of a congested network—the sender must perform retransmissions in order to compensate for dropped (lost) packets due to buffer overflow.
Finally, the realistic case
• the sender may time out prematurely and retransmit a packet that has been delayed in the queue but not yet lost.
Yet another cost of a congested network—unneeded retransmissions by the sender in the face of large delays may cause a router to use its link bandwidth to forward unneeded copies of a packet.
Scenario 3: Four Senders, Routers with Finite Buffers, and Multihop Paths
• four senders
• multi-hop paths
• timeout/retransmit
If λ'_in is extremely large for all connections, the A–C end-to-end throughput goes to zero in the limit of heavy traffic.
Yet another cost of dropping a packet due to congestion—when a packet is dropped along a path, the transmission capacity that was used at each of the upstream links to forward that packet to the point at which it is dropped ends up having been wasted.
3.6.2 Approaches to Congestion Control
• End-end congestion control (端到端拥塞控制):
• no explicit feedback from network
• congestion inferred from observed loss, delay
• approach taken by TCP
• Network-assisted congestion control (网络辅助的拥塞控制):
• routers provide direct feedback to sending/receiving hosts with flows passing through congested router
• may indicate congestion level or explicitly set sending rate
• TCP ECN, ATM, DEC DECnet protocols
3.7 TCP Congestion Control
AIMD (Additive Increase, Multiplicative Decrease)
approach: senders can increase sending rate until packet loss (congestion) occurs, then decrease sending rate on loss event
Multiplicative decrease detail: sending rate is
Cut in half on loss detected by triple duplicate ACK (TCP Reno)
Cut to 1 MSS (maximum segment size) when loss detected by timeout (TCP Tahoe)
Why AIMD?
AIMD – a distributed, asynchronous algorithm – has been shown to:
optimize congested flow rates network wide!
have desirable stability properties
details
TCP sending behavior:
roughly: send cwnd bytes, wait RTT for ACKS, then send more bytes
TCP sender limits transmission: LastByteSent − LastByteAcked ≤ cwnd
cwnd is dynamically adjusted in response to observed network congestion (implementing TCP congestion control)
3.7.1 Classic TCP Congestion Control
Classic TCP; Explicit Congestion Notification, delay-based TCP, fairness
Slow Start
TCP slow start
when connection begins, increase rate exponentially until first loss event:
initially cwnd = 1 MSS
double cwnd every RTT
done by incrementing cwnd for every ACK received
summary: initial rate is slow, but ramps up exponentially fast
Congestion Avoidance
TCP: from slow start to congestion avoidance
Q: when should the exponential increase switch to linear?
A: when cwnd gets to 1/2 of its value before timeout.
Implementation:
variable ssthresh
on loss event, ssthresh is set to 1/2 of cwnd just before loss event
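Slow start, the switch at ssthresh and the reaction to a timeout can be traced round by round. A simplified sketch: cwnd is counted in whole MSS, one step is one RTT, every loss is treated as a timeout (cwnd back to 1 MSS), and cwnd is capped at ssthresh when doubling would overshoot it. The function name and the loss schedule are assumptions for illustration.

```python
def cwnd_per_rtt(rounds, ssthresh, loss_rounds=()):
    """cwnd (in MSS) at the start of each RTT, with timeout-style loss reaction."""
    cwnd = 1
    history = []
    for r in range(rounds):
        history.append(cwnd)
        if r in loss_rounds:
            ssthresh = max(cwnd // 2, 1)     # half of cwnd just before the loss
            cwnd = 1                         # timeout: restart from 1 MSS
        elif cwnd < ssthresh:
            cwnd = min(cwnd * 2, ssthresh)   # slow start: double every RTT
        else:
            cwnd += 1                        # congestion avoidance: +1 MSS per RTT
    return history


trace = cwnd_per_rtt(rounds=12, ssthresh=8, loss_rounds={6})
print(trace)
```

The trace shows exponential growth up to ssthresh, linear growth afterwards, and a restart with a halved ssthresh after the loss.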
Fast Recovery
TCP Congestion Control: Retrospective
Summary: TCP congestion control
TCP Cubic
Is there a better way than AIMD to "probe" for usable bandwidth?
Insight/intuition:
Wmax: sending rate at which congestion loss was detected
congestion state of bottleneck link probably (?) hasn't changed much
after cutting rate/window in half on loss, initially ramp to Wmax faster, but then approach Wmax more slowly
K: point in time when TCP window size will reach Wmax
K itself is tuneable
increase W as a function of the cube of the distance between current time and K
larger increases when further away from K
smaller increases (cautious) when nearer K
TCP CUBIC default in Linux, most popular TCP for popular Web servers
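The shape of the CUBIC window curve can be sketched numerically. The formula and constants here are assumptions taken from RFC 8312 and not from these notes: W(t) = C·(t − K)³ + Wmax with C = 0.4, and a window reduced to β·Wmax with β = 0.7 on loss (the notes describe the reduction as cutting in half). Window sizes are in MSS and t in seconds.

```python
def cubic_k(w_max, c=0.4, beta=0.7):
    """Time (seconds after the loss) at which the window returns to w_max."""
    return (w_max * (1 - beta) / c) ** (1 / 3)


def cubic_window(t, w_max, c=0.4, beta=0.7):
    """Window size t seconds after a loss, on a cubic curve centred on K."""
    k = cubic_k(w_max, c, beta)
    return c * (t - k) ** 3 + w_max


w_max = 100.0
k = cubic_k(w_max)
early_gain = cubic_window(1, w_max) - cubic_window(0, w_max)      # far from K
late_gain = cubic_window(k, w_max) - cubic_window(k - 1, w_max)   # close to K
print(round(k, 2), round(early_gain, 2), round(late_gain, 2))
```

The first second after the loss adds far more window than the last second before K, matching the "larger increases when further away from K" behaviour.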
TCP and the congested "bottleneck link"
TCP (classic, CUBIC) increase TCP's sending rate until packet loss occurs at some router's output: the bottleneck link
understanding congestion: useful to focus on congested bottleneck link
Keeping sender-to-receiver pipe "just full enough, but no fuller": keep bottleneck link busy transmitting, but avoid high delays/buffering
Macroscopic Description of TCP Reno Throughput
3.7.2 Network-Assisted Explicit Congestion Notification and Delay-based Congestion Control
Explicit Congestion Notification
Explicit congestion notification (ECN)
TCP deployments often implement network-assisted congestion control:
two bits in IP header (ToS field) marked by network router to indicate congestion
policy to determine marking chosen by network operator
congestion indication carried to destination
destination sets ECE bit on ACK segment to notify sender of congestion
involves both IP (IP header ECN bit marking) and TCP (TCP header C,E bit marking)
Delay-based Congestion Control
Delay-based approach:
RTTmin - minimum observed RTT (uncongested path)
uncongested throughput with congestion window cwnd is cwnd/RTTmin
if measured throughput "very close" to uncongested throughput
increase cwnd linearly /* since path not congested */
else if measured throughput "far below" uncongested throughput
decrease cwnd linearly /* since path is congested */
congestion control without inducing/forcing loss
maximizing throughput ("keeping the pipe just full…") while keeping delay low ("…but not fuller")
a number of deployed TCPs take a delay-based approach
BBR deployed on Google's (internal) backbone network
3.7.3 Fairness
Fairness goal: if K TCP sessions share same bottleneck link of bandwidth R, each should have average rate of R/K
Q: is TCP Fair?
Example: two competing TCP sessions:
additive increase gives slope of 1, as throughput increases
multiplicative decrease decreases throughput proportionally
Is TCP fair?
A: Yes, under idealized assumptions:
same RTT
fixed number of sessions only in congestion avoidance
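Why AIMD pushes two sessions toward an equal share can be seen in a tiny simulation. Idealized assumptions: both flows have the same RTT, both see every loss at the same time, each adds 1 unit per round and halves on loss. The function name, starting rates and capacity are arbitrary.

```python
def aimd_two_flows(x1, x2, capacity, rounds):
    """Evolve two AIMD flows sharing one bottleneck; return their final rates."""
    for _ in range(rounds):
        if x1 + x2 > capacity:          # bottleneck overloaded: both detect loss
            x1, x2 = x1 / 2, x2 / 2     # multiplicative decrease
        else:
            x1, x2 = x1 + 1, x2 + 1     # additive increase
    return x1, x2


x1, x2 = aimd_two_flows(x1=1.0, x2=50.0, capacity=100.0, rounds=1000)
print(round(x1, 3), round(x2, 3))
```

Additive increase keeps the gap between the flows constant while every multiplicative decrease halves it, so the rates converge.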
Fairness: must all network apps be "fair"?
Fairness and UDP
multimedia apps often do not use TCP
do not want rate throttled by congestion control
instead use UDP:
send audio/video at constant rate, tolerate packet loss
there is no "Internet police" policing use of congestion control
Fairness and Parallel TCP Connections
application can open multiple parallel connections between two hosts
web browsers do this, e.g., link of rate R with 9 existing connections:
new app asks for 1 TCP, gets rate R/10
new app asks for 11 TCPs, gets roughly R/2 (11R/20)
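The shares in this example follow from splitting R equally per connection. A short check, assuming perfectly fair per-connection sharing; the function name is my own.

```python
from fractions import Fraction


def app_share(existing_connections, new_connections):
    """Fraction of link rate R obtained by an app opening `new_connections`."""
    return Fraction(new_connections, existing_connections + new_connections)


one_conn = app_share(9, 1)       # 1 of 10 connections
eleven_conn = app_share(9, 11)   # 11 of 20 connections
print(one_conn, eleven_conn)
```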
3.8 Evolution of Transport Layer Functionality
TCP Evolution. HTTP/3, QUIC: functionality in the application layer.
TCP, UDP: principal transport protocols for 40 years
different "flavors" of TCP developed, for specific scenarios:
moving transport–layer functions to application layer, on top of UDP
HTTP/3: QUIC
QUIC: Quick UDP Internet Connections
application-layer protocol, on top of UDP
increase performance of HTTP
deployed on many Google servers, apps (Chrome, mobile YouTube app)
adopts approaches we've studied in this chapter for connection establishment, error control, congestion control
error and congestion control: "Readers familiar with TCP's loss detection and congestion control will find algorithms here that parallel well-known TCP ones." [from QUIC specification]
connection establishment: reliability, congestion control, authentication, encryption, state established in one RTT
multiple application-level "streams" multiplexed over single QUIC connection
separate reliable data transfer, security
common congestion control
TCP (reliability, congestion control state) + TLS (authentication, crypto state)
2 serial handshakes
QUIC: reliability, congestion control, authentication, crypto state
1 handshake
QUIC: streams: parallelism, no HOL blocking
Chapter 4 Network Layer - data plane
4.1 Network Layer Overview
Forwarding versus routing; data plane, control plane; network service model.
Chapter 5 Network Layer - control plane
5.1 Introduction to the Network-layer control plane.
Per-router versus SDN control plane.
Chapter 6 The Link Layer
6.1 Introduction to the Link Layer.
Link-layer: services, implementation context.