Book Edition	8th ed. en-us	7th ed. zh-cn
Book home	gaia.cs.umass.edu	HZCOURSE.COM
Book PDF	LibGen	用户1496968452581's Juejin blog
PowerPoint	gaia.cs.umass.edu	URL path exists but is not given here
Solutions etc.		URL path exists but is not given here
Online lectures	gaia.cs.umass.edu	bilibili
High-quality reading notes		王文萱's CSDN post of reading notes; GuoYi's GitHub post of English PowerPoint notes

Foreword

Learning Motives

With my second attempt at the postgraduate entrance exam doomed, I started teaching myself computer science subjects that might be useful in any of my possible futures: eventually receiving an interview invitation from a school and accepting it, preparing for an entry-level developer interview, or failing to be admitted this year and making a third attempt. I decided to teach myself computer networking first, by watching online lectures and reading the slides of the book Computer Networking. However, I found it too challenging to acquire knowledge without a translator. Meanwhile, studying without translations of the terminology does not match real-life study and work. Most annoying of all, the knowledge in my mind was not well organized. To cope with these issues, I believe it is wise to write down, rearrange, or even simply copy and paste the key points. Because time is limited, this series of notes covers only core topics (e.g., concepts and principles) such as the overview and the important layers and protocols.

How I Write

• The content of this series of notes is based on the following (from most to least):

• The parts of Professor Jim Kurose's lecture videos and PowerPoint slides that I consider important.

• The original book, mainly for chapter/section numbering; I also use it for figure screenshots, checks of definitions and mechanisms, and more comprehensive knowledge.

• The Chinese version of the book for terminology referencing.

• Others' notes.

• Microsoft Word operations.

• Share the document via "Post to blog".

• Convert numbers to text (thanks to cmt); resize all pictures to 100%.

• Publish it to cnblogs as a draft.

• cnblogs operations.

• Set titles, tags, alias URLs, etc. for posts.

• Copy the needed fragment of the draft into the corresponding post.

Chapter 1 Computer Networks and the Internet

1.1 What Is the Internet?

A nuts-and-bolts description, a services description. What is a protocol?

1.1.1 A Nuts-and-Bolts Description

Billions of connected computing devices:

• hosts (主机)= end systems (端系统)

• running network apps (应用) at Internet's "edge" (边缘)

Packet switches (分组交换机) : forward packets (分组) (chunks of data)

• routers (路由器) , switches (交换机)

Communication links (通信链路)

• fiber (光纤) , copper (铜) , radio (无线电) , satellite (卫星)

• transmission rate (传输速率): bandwidth (带宽)

Networks

• collection of devices, routers, links: managed by an organization

Internet: "network of networks"

• Interconnected ISPs (Internet Service Providers, 因特网服务提供商)

Protocols are everywhere

• control sending, receiving of messages (报文)

• e.g., HTTP (Web), streaming video, Skype, TCP, IP, WiFi, 4G, Ethernet

Internet standards

• RFC: Request for Comments (请求评论)

• IETF: Internet Engineering Task Force (因特网工程任务组)

1.1.2 A Services Description

Infrastructure that provides services to applications:

• Web, streaming video, multimedia teleconferencing, email, games, e-commerce, social media, inter-connected appliances, …

provides programming interface to distributed applications (分布式应用程序):

• "hooks" allowing sending/receiving apps to "connect" to, use Internet transport service

• provides service options, analogous to postal service

1.1.3 What Is a Protocol?

Network protocols:

• computers (devices) rather than humans

• all communication activity in Internet governed by protocols

Protocols (协议) define the format, order of messages sent and received among network entities, and actions taken on message transmission, receipt

1.2 The Network Edge

Access networks, physical media

A closer look at Internet structure

Network edge (网络边缘) :

• hosts (主机): clients and servers

• servers often in data centers (数据中心)

Access networks (接入网), physical media (物理媒体):

• wired, wireless communication links

Network core (网络核心) :

• interconnected routers

• network of networks

1.2.1 Access Networks

Q: How to connect end systems to edge router?

• residential access nets

• institutional access networks (school, company)

• mobile access networks (WiFi, 4G/5G)

Cable-based Access

Frequency division multiplexing (FDM, 频分复用): different channels transmitted in different frequency bands

HFC: hybrid fiber coax (混合光纤同轴)

• asymmetric: up to 40 Mbps – 1.2 Gbps downstream transmission rate, 30-100 Mbps upstream transmission rate

network of cable, fiber attaches homes to ISP router

• homes share access network to cable headend

Digital Subscriber Line (DSL, 数字用户线)

Use existing telephone line to central office DSLAM (digital subscriber line access multiplexer, 数字用户线接入复用器)

• data over DSL phone line goes to Internet

• voice over DSL phone line goes to telephone net

• 24-52 Mbps dedicated downstream transmission rate

• 3.5-16 Mbps dedicated upstream transmission rate

Home Networks

Wireless Access Networks

• Shared wireless access network connects end system to router

• via base station (基站) aka "access point" (接入点)

Wireless local area networks (WLANs)

• typically within or around building (~100 ft)

• 802.11b/g/n (WiFi): 11, 54, 450 Mbps transmission rate

Wide-area cellular access networks

• provided by mobile, cellular (蜂窝) network operator (10's km)

• 10's Mbps

• 4G cellular networks (5G coming)

Enterprise Networks

companies, universities, etc.

mix of wired, wireless link technologies, connecting a mix of switches and routers

• Ethernet: wired access at 100Mbps, 1Gbps, 10Gbps

• WiFi: wireless access points at 11, 54, 450 Mbps

Data Center Networks

high-bandwidth links (10s to 100s Gbps) connect hundreds to thousands of servers together, and to Internet

Host: sends packets of data

host sending function:

• takes application message

• breaks into smaller chunks, known as packets, of length L bits

• transmits packet into access network at transmission rate R

• link transmission rate, aka link capacity, aka link bandwidth

packet transmission delay = time needed to transmit L-bit packet into link = L (bits) /R (bits/sec)
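A minimal Python sketch of the host sending function, assuming a hypothetical 5,000-byte message, 8,000-bit packets and a 1 Mbps link: the message is split into L-bit packets, and each packet takes L/R seconds to push into the link.

```python
def packetize(message: bytes, packet_bits: int) -> list[bytes]:
    """Split an application message into packets of at most packet_bits bits each."""
    if packet_bits % 8:
        raise ValueError("this sketch assumes packet_bits is a multiple of 8")
    size = packet_bits // 8  # packet length in bytes
    return [message[i:i + size] for i in range(0, len(message), size)]


def transmission_delay(packet_bits: int, rate_bps: float) -> float:
    """Seconds needed to push an L-bit packet into a link of rate R bps: L / R."""
    return packet_bits / rate_bps


message = bytes(5000)               # hypothetical 5,000-byte (40,000-bit) message
packets = packetize(message, 8000)  # L = 8,000 bits (1,000 bytes)
per_packet = transmission_delay(8000, 1_000_000)  # R = 1 Mbps
print(len(packets), per_packet)     # 5 0.008
```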

1.2.2 Physical Media

bit: propagates (传播) between transmitter/receiver (发射器—接收器) pairs

physical link: what lies between transmitter & receiver

guided media (导引型媒体):

• signals propagate in solid media: copper, fiber, coax

unguided media (非导引型媒体):

• signals propagate freely, e.g., radio

Twisted pair (TP, 双绞铜线)

two insulated (绝缘的) copper wires

• Category 5: 100 Mbps, 1 Gbps Ethernet

• Category 6: 10Gbps Ethernet

Coaxial cable (同轴电缆)

two concentric (同心的) copper conductors

bidirectional

broadband:

• multiple frequency channels on cable

• 100's Mbps per channel

Fiber optic cable (光纤电缆)

glass fiber carrying light pulses, each pulse a bit

high-speed operation:

• high-speed point-to-point transmission (10's-100's Gbps)

low error rate:

• repeaters spaced far apart

• immune to electromagnetic noise

Wireless radio

signal carried in various "bands" in electromagnetic spectrum

no physical "wire"

broadcast, "half-duplex" (半双工，sender to receiver)

propagation environment effects:

• reflection

• obstruction by objects

• interference/noise

Radio link types:

• Wireless LAN (WiFi)

• 10-100's Mbps; 10's of meters

• wide-area (e.g., 4G cellular)

• 10's Mbps over ~10 km

• Bluetooth: cable replacement

• short distances, limited rates

• terrestrial (陆地的) microwave

• point-to-point (点对点); 45 Mbps channels

• satellite

• up to 45 Mbps per channel

• 270 msec end-end delay

1.3 Network Core

Forwarding, routing; packet switching; circuit switching; a network of networks

The network core

Mesh (网状物) of interconnected routers

Packet-switching (分组交换): hosts break application-layer messages into packets

• network forwards packets from one router to the next, across links on path from source to destination

Two key network-core functions

Forwarding (转发) :

• aka "switching" (交换)

• local action: move arriving packets from router's input link to appropriate router output link

Routing (路由):

• global action: determine source-destination paths taken by packets

• routing algorithms

1.3.1 Packet Switching

Packet transmission delay (时延): takes L/R seconds to transmit (push out) L-bit packet into link at R bps

Store-and-Forward (存储转发) Transmission

Store and forward: entire packet must arrive at router before it can be transmitted on next link

One-hop (跳) numerical example:

• L = 10 Kbits

• R = 100 Mbps

• one-hop transmission delay = 0.1 msec
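A minimal sketch of store-and-forward transmission delay, assuming equal link rates and ignoring processing, queueing and propagation delays. The multi-packet helper applies the same reasoning to P back-to-back packets; it is an illustration, not a formula stated in these notes.

```python
def store_and_forward_delay(packet_bits, rate_bps, links):
    """One packet over `links` equal-rate links, counting only transmission delay.
    Each router must receive the whole packet (L/R) before forwarding it,
    so the packet is transmitted once per link: links * L / R."""
    return links * packet_bits / rate_bps


def last_packet_arrival(num_packets, packet_bits, rate_bps, links):
    """P back-to-back packets: the first arrives after links * L/R, and each later
    packet arrives one L/R after the previous one: (links + P - 1) * L / R."""
    return (links + num_packets - 1) * packet_bits / rate_bps


# One-hop example from the notes: L = 10 Kbits, R = 100 Mbps
print(store_and_forward_delay(10_000, 100e6, 1))  # 0.0001 s = 0.1 msec
```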

Queuing Delays and Packet Loss

Queueing occurs when work arrives faster than it can be serviced

Packet queuing and loss: if arrival rate (in bps) to link exceeds transmission rate (bps) of link for some period of time:

• packets will queue, waiting to be transmitted on output link

• packets can be dropped (lost) if memory (buffer) in router fills up

1.3.2 Circuit Switching (电路交换)

End-end resources allocated to, reserved for "call" between source and destination

• in diagram, each link has four circuits.

• call gets 2nd circuit in top link and 1st circuit in right link.

• dedicated (专用的) resources: no sharing

• circuit-like (guaranteed) performance

• circuit segment idle if not used by call (no sharing)

• commonly used in traditional telephone networks

Multiplexing in Circuit-Switched Networks

Frequency Division Multiplexing (FDM, 频分复用)

• optical, electromagnetic frequencies divided into (narrow) frequency bands

• each call allocated its own band, can transmit at max rate of that narrow band

Time Division Multiplexing (TDM, 时分复用)

• time divided into slots

• each call allocated periodic slot(s), can transmit at maximum rate of (wider) frequency band (only) during its time slot(s)
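A worked TDM example in Python, with assumed numbers in the style of the lecture slides: a 640,000-bit file, 1.536 Mbps links divided into 24 TDM slots, and 500 ms to set up the end-end circuit. Each call gets 1/24 of the link rate; propagation delay is ignored.

```python
def tdm_circuit_rate(link_rate_bps, num_slots):
    """With TDM, a call owns 1 of num_slots slots, so it gets 1/num_slots of the link rate."""
    return link_rate_bps / num_slots


def circuit_transfer_time(file_bits, link_rate_bps, num_slots, setup_s):
    """Circuit setup time plus file_bits / per-circuit rate (propagation ignored)."""
    return setup_s + file_bits / tdm_circuit_rate(link_rate_bps, num_slots)


print(tdm_circuit_rate(1.536e6, 24))                     # 64000.0 bps
print(circuit_transfer_time(640_000, 1.536e6, 24, 0.5))  # 10.5 s
```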

Packet Switching Versus Circuit Switching

Is packet switching a "slam dunk winner" (必定成功的事；稳操胜券的事) ?

• great for "bursty" data – sometimes has data to send, but at other times not

• resource sharing

• simpler, no call setup

• excessive congestion possible: packet delay and loss due to buffer overflow

• protocols needed for reliable data transfer, congestion control (拥塞控制)
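A rough sketch of why packet switching lets more users share a link, under assumed numbers: a 1 Mbps link, users who need 100 kbps while active, and each user independently active 10% of the time. Circuit switching can then reserve bandwidth for only 10 users, while with 35 packet-switched users the chance that more than 10 are active at once (so the link is overloaded) follows from a binomial model.

```python
from math import comb


def prob_more_than(n_users, p_active, capacity):
    """P(more than `capacity` of n independent users are active at once),
    each user active with probability p_active (binomial model)."""
    return sum(comb(n_users, k) * p_active**k * (1 - p_active)**(n_users - k)
               for k in range(capacity + 1, n_users + 1))


link_bps, user_bps = 1_000_000, 100_000
circuit_users = link_bps // user_bps          # circuit switching: 10 reserved 100 kbps circuits
overload = prob_more_than(35, 0.1, circuit_users)
print(circuit_users, round(overload, 4))      # 10 0.0004
```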

1.3.3 A Network of Networks

hosts connect to Internet via access Internet Service Providers (ISPs)

access ISPs in turn must be interconnected

• so that any two hosts (anywhere!) can send packets to each other

resulting network of networks is very complex

• evolution driven by economics, national policies

At "center": small # of well-connected large networks

• "tier-1" commercial ISPs (e.g., Level 3, Sprint, AT&T, NTT), national & international coverage

• content provider networks (e.g., Google, Facebook, 内容提供商网络): private network that connects its data centers to Internet, often bypassing tier-1, regional ISPs

1.4 Delay (时延), Loss (丢包), and Throughput (吞吐量) in Packet-Switched Networks

Packet delay and loss, end-end throughput

1.4.1 Overview of Delay in Packet-Switched Networks

Packets queue in router buffers, waiting for turn for transmission

• queue length grows when arrival rate to link (temporarily) exceeds output link capacity

packet loss occurs when memory to hold queued packets fills up

Types of Delay

d_nodal = d_proc + d_queue + d_trans + d_prop

d_proc: Processing Delay (处理时延)

• check bit errors

• determine output link

• typically microseconds or less

d_queue: Queueing Delay (排队时延)

• time waiting at output link for transmission

• depends on congestion level of router

d_trans: Transmission Delay (传输时延)

• L: packet length (bits)

• R: link transmission rate (bps)

• d_trans = L/R

d_prop: Propagation Delay (传播时延)

• d: length of physical link

• s: propagation speed (~2×10^8 m/sec)

• d_prop = d/s

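d_trans and d_prop are very different: d_trans = L/R depends on packet length and link rate, not on link length; d_prop = d/s depends on link length and propagation speed, not on packet length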
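A small sketch of the nodal delay formula, assuming a hypothetical 1,500-byte packet on a 10 Mbps link 1,000 km long, s = 2×10^8 m/sec, and zero processing and queueing delay. It shows d_trans depending only on L and R, and d_prop only on d and s.

```python
def nodal_delay(packet_bits, rate_bps, link_m, prop_speed_mps=2e8, d_proc=0.0, d_queue=0.0):
    """d_nodal = d_proc + d_queue + d_trans + d_prop, with d_trans = L/R and d_prop = d/s."""
    d_trans = packet_bits / rate_bps
    d_prop = link_m / prop_speed_mps
    return d_proc + d_queue + d_trans + d_prop, d_trans, d_prop


total, d_trans, d_prop = nodal_delay(1500 * 8, 10e6, 1_000_000)
print(d_trans, d_prop)   # 0.0012 0.005
```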

1.4.2 Queuing Delay and Packet Loss

a: average packet arrival rate

L: packet length (bits)

R: link bandwidth (bit transmission rate)

La/R: arrival rate of bits / service rate of bits, called "traffic intensity"

La/R ~ 0: avg. queueing delay small

La/R -> 1: avg. queueing delay large

La/R > 1: more "work" arriving than can be serviced; average delay infinite!
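A minimal sketch of the traffic intensity calculation, assuming a hypothetical 10 Mbps link carrying 1,500-byte packets; the 0.9 cutoff for "close to 1" is an arbitrary choice for illustration, not a standard value.

```python
def traffic_intensity(arrival_rate_pps, packet_bits, rate_bps):
    """La/R: average bit arrival rate divided by the link's bit service rate."""
    return arrival_rate_pps * packet_bits / rate_bps


def queueing_regime(intensity, near_one=0.9):
    """Map traffic intensity to the qualitative behaviour in the notes.
    near_one is an arbitrary illustrative threshold."""
    if intensity > 1:
        return "overloaded: queue grows, average delay unbounded"
    if intensity >= near_one:
        return "near 1: large average queueing delay"
    return "small average queueing delay"


# Hypothetical link: 1,500-byte (12,000-bit) packets on a 10 Mbps link
for a in (100, 800, 900):           # average arrivals in packets per second
    rho = traffic_intensity(a, 12_000, 10e6)
    print(a, round(rho, 2), queueing_regime(rho))
```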

Packet Loss

queue (aka buffer) preceding a link has finite capacity

packet arriving to full queue dropped (aka lost)

lost packet may be retransmitted by previous node, by source end system, or not at all
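A discrete-time sketch of a finite output buffer, assuming the link sends one packet per time slot and the buffer holds at most buffer_size packets; arrivals that find the buffer full are dropped. The arrival pattern in the example is made up.

```python
def simulate_buffer(arrivals, buffer_size, sends_per_slot=1):
    """Discrete-time sketch of an output link with a finite buffer.
    arrivals[t]: packets arriving in slot t; the link sends sends_per_slot packets per slot.
    Returns (packets sent, packets dropped, queue length at the end of each slot)."""
    queue = sent = dropped = 0
    history = []
    for arriving in arrivals:
        accepted = min(arriving, buffer_size - queue)
        dropped += arriving - accepted   # packets arriving to a full queue are lost
        queue += accepted
        out = min(queue, sends_per_slot)
        queue -= out
        sent += out
        history.append(queue)
    return sent, dropped, history


# A burst arriving faster than the link can send, then silence (made-up pattern)
print(simulate_buffer([3, 3, 0, 0, 0], buffer_size=4))  # (5, 1, [2, 3, 2, 1, 0])
```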

1.4.3 End-to-End Delay

What do "real" Internet delay & loss look like?

traceroute program: provides delay measurement from source to router along end-end Internet path towards destination. For all i:

• sends three packets that will reach router i on path towards destination (with time-to-live (生存时间) field value of i)

• router i will return packets to sender

• sender measures time interval between transmission and reply
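A simulation sketch of the TTL mechanism that traceroute relies on, not a real measurement tool: the path, its per-link delays, and the random jitter standing in for queueing are all hypothetical. A real implementation would set the TTL on outgoing probes (e.g., with the IP_TTL socket option) and listen for the ICMP replies routers send back, which usually needs elevated privileges.

```python
import random


def expiring_router(path, ttl):
    """Each router decrements the TTL; the router that brings it to 0 discards
    the probe and replies to the sender. Returns (hop number, router name)."""
    for hop, (name, _delay) in enumerate(path, start=1):
        ttl -= 1
        if ttl == 0:
            return hop, name
    return None  # TTL larger than the path: the probe would reach the destination


def traceroute_sim(path, probes=3, max_jitter_s=0.002, seed=None):
    """path: list of (router name, one-way link delay in seconds) toward the destination.
    For each i, send `probes` probes with TTL=i and record who replies and the RTTs."""
    rng = random.Random(seed)
    report = []
    for ttl in range(1, len(path) + 1):
        hop, name = expiring_router(path, ttl)
        base_rtt = 2 * sum(delay for _, delay in path[:hop])  # out to router `hop` and back
        rtts = [base_rtt + rng.uniform(0, max_jitter_s) for _ in range(probes)]  # jitter ~ queueing
        report.append((hop, name, rtts))
    return report


path = [("router-a", 0.001), ("router-b", 0.004), ("dest", 0.010)]  # hypothetical path
for hop, name, rtts in traceroute_sim(path, seed=1):
    print(hop, name, [f"{r * 1000:.1f} ms" for r in rtts])
```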

1.4.4 Throughput in Computer Networks

Throughput: rate (bits/time unit) at which bits are being sent from sender to receiver

• instantaneous (瞬时) : rate at given point in time

• average (平均): rate over longer period of time

bottleneck link (瓶颈链路)

link on end-end path that constrains end-end throughput

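per-connection end-end throughput, when 10 connections fairly share a backbone link of rate R (R_c, R_s: client and server access link rates): min(R_c, R_s, R/10)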

in practice: R_c or R_s is often bottleneck
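A minimal sketch of the bottleneck rule, with hypothetical rates R_s = 2 Mbps and R_c = 1 Mbps; the shared link R is divided fairly among 10 connections as in the formula above.

```python
def bottleneck(*link_rates_bps):
    """End-end throughput of a path is set by its slowest (bottleneck) link."""
    return min(link_rates_bps)


def per_connection_throughput(r_c, r_s, r_shared, n_connections=10):
    """min(R_c, R_s, R/n): n connections fairly share a common link of rate R."""
    return bottleneck(r_c, r_s, r_shared / n_connections)


# Hypothetical rates: R_c = 1 Mbps, R_s = 2 Mbps
print(per_connection_throughput(1e6, 2e6, 100e6))  # 1000000.0: access link R_c is the bottleneck
print(per_connection_throughput(1e6, 2e6, 5e6))    # 500000.0: shared link R/10 is the bottleneck
```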

1.5 Protocol Layers and Their Service Models

Layered architecture, encapsulation.

1.5.1 Layered Architecture

Networks are complex, with many "pieces":

• hosts

• routers

• links of various media

• applications

• protocols

• hardware, software

layers: each layer implements a service

• via its own internal-layer actions

• relying on services provided by layer below

Why layering?

Approach to designing/discussing complex systems:

• explicit structure allows identification, relationship of system's pieces

• layered reference model for discussion

• modularization eases maintenance, updating of system

• change in layer's service implementation: transparent to rest of system

• e.g., in the book's airline-travel analogy, a change in gate procedure doesn't affect the rest of the system

Protocol Layering

Layered Internet protocol stack (协议栈)

Application Layer

application: supporting network applications

• HTTP, IMAP, SMTP, DNS

Transport Layer

transport: process-process data transfer

• TCP, UDP

Network Layer

network: routing of datagrams from source to destination

• IP, routing protocols

Link Layer

link: data transfer between neighboring network elements

• Ethernet, 802.11 (WiFi), PPP

Physical Layer

physical: bits "on the wire"

1.5.2 Encapsulation

Application exchanges messages (报文) to implement some application service using services of transport layer

Transport-layer protocol transfers M (e.g., reliably) from one process to another, using services of network layer

• transport-layer protocol encapsulates application-layer message, M, with transport-layer header H_t to create a transport-layer segment (报文段)

• H_t used by transport layer protocol to implement its service

Network-layer protocol transfers transport-layer segment [H_t | M] from one host to another, using link layer services

• network-layer protocol encapsulates transport-layer segment [H_t | M] with network-layer header H_n to create a network-layer datagram (数据报)

• H_n used by network layer protocol to implement its service

Link-layer protocol transfers datagram [H_n | [H_t | M]] from host to neighboring host, using physical-layer services

• link-layer protocol encapsulates network datagram [H_n | [H_t | M]] with link-layer header H_l to create a link-layer frame (帧)

Encapsulation: an end-end view
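A toy Python sketch of encapsulation, assuming hypothetical fixed-length placeholder headers (real headers carry ports, addresses, checksums and so on, and link-layer frames usually carry a trailer too). Each layer prepends its header on the way down; the receiver strips them in reverse order.

```python
# Hypothetical placeholder headers, one per layer
H_T, H_N, H_L = b"[Ht]", b"[Hn]", b"[Hl]"


def encapsulate(message: bytes) -> bytes:
    segment = H_T + message    # transport layer: [H_t | M]
    datagram = H_N + segment   # network layer:   [H_n | [H_t | M]]
    frame = H_L + datagram     # link layer:      [H_l | [H_n | [H_t | M]]]
    return frame


def decapsulate(frame: bytes) -> bytes:
    """The receiver strips headers in reverse order, one layer at a time."""
    for header in (H_L, H_N, H_T):
        if not frame.startswith(header):
            raise ValueError(f"expected header {header!r}")
        frame = frame[len(header):]
    return frame


frame = encapsulate(b"M")
print(frame)               # b'[Hl][Hn][Ht]M'
print(decapsulate(frame))  # b'M'
```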

1.6 Networks Under Attack

What can bad actors do? What defenses?

Internet not originally designed with (much) security in mind

• original vision: "a group of mutually trusting users attached to a transparent network"

• Internet protocol designers playing "catch-up"

• security considerations in all layers!

We now need to think about:

• how bad guys can attack computer networks

• how we can defend networks against attacks

• how to design architectures that are immune to attacks

The Bad Guys Can Attack Servers and Network Infrastructure

Denial of Service (DoS, 拒绝服务): attackers make resources (server, bandwidth) unavailable to legitimate traffic by overwhelming resource with bogus (伪造的) traffic

1. select target

2. break into hosts around the network (see botnet)

3. send packets to target from compromised hosts (受害主机)

The Bad Guys Can Sniff Packets

Packet "sniffing" (嗅探分组):

broadcast media (shared Ethernet, wireless)

promiscuous network interface reads/records all packets (e.g., including passwords!) passing by

The Bad Guys Can Masquerade as Someone You Trust

IP spoofing (IP哄骗): injection of packet with false source address

Lines of defense

Authentication (鉴别): proving you are who you say you are

• cellular networks provide hardware identity via SIM card; no such hardware assist in traditional Internet

confidentiality (机密性): via encryption

integrity checks (完整性检查): digital signatures prevent/detect tampering (篡改)

access restrictions: password-protected VPNs

firewalls: specialized "middleboxes" (中间盒) in access and core networks:

• off-by-default: filter incoming packets to restrict senders, receivers, applications

• detecting/reacting to DoS attacks
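A sketch of an integrity check using Python's standard hmac module. Note the swap: HMAC is a shared-key message authentication code, not the public-key digital signature named above, but it illustrates the same idea of detecting tampering. The key and messages are made up.

```python
import hashlib
import hmac


def make_tag(key: bytes, message: bytes) -> bytes:
    """Sender computes a tag over the message with a shared secret key."""
    return hmac.new(key, message, hashlib.sha256).digest()


def is_intact(key: bytes, message: bytes, tag: bytes) -> bool:
    """Receiver recomputes the tag; constant-time comparison avoids timing leaks."""
    return hmac.compare_digest(make_tag(key, message), tag)


key = b"shared-secret"                      # hypothetical pre-shared key
tag = make_tag(key, b"pay Bob $10")
print(is_intact(key, b"pay Bob $10", tag))  # True
print(is_intact(key, b"pay Bob $99", tag))  # False: tampering detected
```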

1.7 Internet history

From 1961 until today!

1.7.1 The Development of Packet Switching

Early packet-switching principles

1.7.2 Proprietary Networks (专用网络) and Internetworking

Internetworking, new and proprietary networks

1.7.3 A Proliferation (激增) of Networks

New protocols, a proliferation of networks

1.7.4 The Internet Explosion

Commercialization, the Web, new applications

1.7.5 The New Millennium

Scale, SDN, mobility, cloud

Chapter 2 Application Layer

2.1 Principles of Network Applications

Applications; client-server, P2P, sockets, APIs; transport services

2.1.1 Network Application Architectures (应用程序体系结构)

Client-Server Architecture (客户-服务器架构)

Server:

• always-on host

• permanent IP address

• often in data centers, for scaling (扩展)

Clients:

• contact, communicate with server

• may be intermittently (间歇) connected

• may have dynamic IP addresses

• do not communicate directly with each other

• examples: HTTP, IMAP, FTP

P2P Architecture

• no always-on server

• arbitrary end systems directly communicate

• peers request service from other peers, provide service in return to other peers

• self scalability – new peers bring new service capacity, as well as new service demands

• peers are intermittently connected and change IP addresses

• complex management

• example: P2P file sharing

2.1.2 Processes communicating

Process (进程): program running within a host

• within same host, two processes communicate using inter-process communication (defined by OS)

• processes in different hosts communicate by exchanging messages (报文)

Client and Server Processes

client (客户) process: process that initiates communication

server (服务器) process: process that waits to be contacted

• note: applications with P2P architectures have client processes & server processes

The Interface Between the Process and the Computer Network

• process sends/receives messages to/from its socket (套接字)

• socket analogous to door

• sending process shoves message out door

• sending process relies on transport infrastructure on other side of door to deliver message to socket at receiving process

• two sockets involved: one on each side

Addressing Processes

To receive messages, process must have identifier (标识符)

host device has unique 32-bit IP address

Q: does IP address of host on which process runs suffice for (足够，足以) identifying the process?

• A: no, many processes can be running on same host

identifier includes both IP address (IP地址) and port numbers (端口号) associated with process on host.

example port numbers:

• HTTP server: 80

• mail server: 25

to send HTTP message to gaia.cs.umass.edu web server:

• IP address: 128.119.245.12

• port number: 80
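A minimal loopback sketch showing that a process is addressed by (IP address, port number): the server binds a TCP socket (port 0 asks the OS for any free port), and the client connects to that exact pair. Running both ends in one script with a thread is only for convenience.

```python
import socket
import threading


def serve_once(server_sock: socket.socket) -> None:
    """Server process: accept one connection and send back the upper-cased request."""
    conn, _client_addr = server_sock.accept()
    with conn:
        data = conn.recv(1024)      # one recv is enough for a tiny message on loopback
        conn.sendall(data.upper())


server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))       # port 0: let the OS pick a free port
server.listen(1)
ip, port = server.getsockname()     # the server process's identifier: (IP address, port)

worker = threading.Thread(target=serve_once, args=(server,))
worker.start()

# Client process: the IP address alone is not enough, it must also name the port.
with socket.create_connection((ip, port)) as client:
    client.sendall(b"hello")
    reply = client.recv(1024)

worker.join()
server.close()
print(ip, port, reply)              # port varies from run to run; reply is b'HELLO'
```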

2.1.3 Transport Services Available to Applications

Reliable Data Transfer (可靠数据传输)

Some apps (e.g., file transfer, web transactions) require 100% reliable data transfer

Other apps (e.g., audio) can tolerate some loss

Throughput (吞吐量)

Some apps (e.g., multimedia) require minimum amount of throughput to be "effective"

Other apps ("elastic apps" (弹性应用)) make use of whatever throughput they get

Timing (定时)

Some apps (e.g., Internet telephony, interactive games) require low delay to be "effective"

Security (安全性)

Encryption, data integrity, …

2.1.4 Transport Services Provided by the Internet

Internet transport protocols services

TCP service

• reliable transport (可靠的传输) between sending and receiving process

• flow control (流量控制): sender won't overwhelm receiver

• congestion control (拥塞控制): throttle (抑制) sender when network overloaded

• connection-oriented (面向连接的): setup required between client and server processes

• does not provide: timing, minimum throughput guarantee, security

UDP service

• unreliable data transfer between sending and receiving process

• does not provide: reliability, flow control, congestion control, timing, throughput guarantee, security, or connection setup.
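By contrast with TCP, a minimal UDP sketch on the loopback interface: there is no listen/accept/connect step, just a datagram sent to an (IP, port) pair. The timeout is there because UDP itself gives no delivery guarantee.

```python
import socket

receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
receiver.bind(("127.0.0.1", 0))
receiver.settimeout(1.0)            # UDP gives no delivery guarantee: don't wait forever
receiver_addr = receiver.getsockname()

sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sender.sendto(b"no handshake needed", receiver_addr)  # no connection setup, just send

data, sender_addr = receiver.recvfrom(2048)
sender.close()
receiver.close()
print(data, sender_addr)
```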

Internet applications, and transport protocols

Securing TCP

Vanilla TCP & UDP sockets:

• no encryption

• cleartext (明文) passwords sent into socket traverse Internet in cleartext (!)

Transport Layer Security (TLS)

• provides encrypted TCP connections

• data integrity

• end-point authentication

TLS implemented in application layer

• apps use TLS libraries, which in turn use TCP

• cleartext sent into "socket" traverse Internet encrypted
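A hedged sketch of an app using a TLS library (Python's ssl module) on top of an ordinary TCP socket. make_tls_context only builds a context that verifies certificates and host names; https_head is a hypothetical helper that needs network access to run.

```python
import socket
import ssl


def make_tls_context() -> ssl.SSLContext:
    ctx = ssl.create_default_context()              # verifies server certificate and host name
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    return ctx


def https_head(host: str, port: int = 443) -> str:
    """Open TCP, wrap it in TLS, send an HTTP HEAD request, return the status line."""
    ctx = make_tls_context()
    with socket.create_connection((host, port), timeout=5) as tcp:
        with ctx.wrap_socket(tcp, server_hostname=host) as tls:
            request = f"HEAD / HTTP/1.1\r\nHost: {host}\r\nConnection: close\r\n\r\n"
            tls.sendall(request.encode("ascii"))    # cleartext in, encrypted on the wire
            # sketch: assumes the status line arrives in the first read
            return tls.recv(4096).split(b"\r\n", 1)[0].decode("iso-8859-1")
```

Usage, when network access is available: https_head("example.com") returns the server's status line.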

2.1.5 Application-Layer Protocols

An application-layer protocol defines

• types of messages exchanged (交换的报文类型),

• e.g., request, response

• message syntax (报文的语法):

• what fields in messages & how fields are delineated (描述)

• message semantics (报文的语义)

• meaning of information in fields

• rules for when and how processes send & respond to messages

Open protocols (开放的协议):

• defined in RFCs, everyone has access to protocol definition

• allows for interoperability (相互操作)

• e.g., HTTP, SMTP

Proprietary protocols (专用协议):

• e.g., Skype, Zoom

2.2 The Web and HTTP

Overview, statelessness, HTTP messages, cookies, caching, HTTP/2

• web page consists of objects (对象), each of which can be stored on different Web servers

• object can be HTML file, JPEG image, Java applet, audio file, …

• web page consists of base HTML-file (HTML基本文件) which includes several referenced objects (引用对象), each addressable by a URL made of a host name and a path name
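A small sketch of splitting a URL into its host name and path name with Python's urllib.parse, using the example URL www.someSchool.edu/someDepartment/home.index that appears in the non-persistent HTTP example of these notes.

```python
from urllib.parse import urlsplit


def host_and_path(url: str) -> tuple[str, str]:
    """Split a URL into the host name (which server) and path name (which object)."""
    if "://" not in url:
        url = "http://" + url  # urlsplit only recognizes the host part after a scheme
    parts = urlsplit(url)
    # .hostname is lower-cased; host names are case-insensitive, paths are not
    return parts.hostname, parts.path


print(host_and_path("www.someSchool.edu/someDepartment/home.index"))
# ('www.someschool.edu', '/someDepartment/home.index')
```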

2.2.1 Overview of HTTP

HTTP: HyperText Transfer Protocol (超文本传输协议)

• Web's application-layer protocol

• client/server model:

• client: browser (浏览器) that requests, receives, (using HTTP protocol) and "displays" Web objects

• server: Web server sends (using HTTP protocol) objects in response to requests

HTTP uses TCP:

• client initiates TCP connection (creates socket) to server, port 80

• server accepts TCP connection from client

• HTTP messages (application-layer protocol messages) exchanged between browser (HTTP client) and Web server (HTTP server)

• TCP connection closed

HTTP is "stateless" (无状态的)

• server maintains no information about past client requests

aside

• protocols that maintain "state" are complex!

• past history (state) must be maintained

• if server/client crashes, their views of "state" may be inconsistent, must be reconciled (折中)

2.2.2 Non-Persistent and Persistent Connections

Non-persistent (非持续) HTTP

1. TCP connection opened

2. at most one object sent over TCP connection

3. TCP connection closed

downloading multiple objects requires multiple connections

Persistent (持续) HTTP

• TCP connection opened to a server

• multiple objects can be sent over a single TCP connection between client and that server

• TCP connection closed

2.2.2.1 HTTP with Non-Persistent Connections

Example

User enters URL: www.someSchool.edu/someDepartment/home.index

(containing text, references to 10 jpeg images)

1a. HTTP client initiates TCP connection to HTTP server (process) at www.someSchool.edu on port 80

1b. HTTP server at host www.someSchool.edu waiting for TCP connection at port 80 "accepts" connection, notifying client

2. HTTP client sends HTTP request message (containing URL) into TCP connection socket. Message indicates that client wants object someDepartment/home.index

3. HTTP server receives request message, forms response message containing requested object, and sends message into its socket

4. HTTP server closes TCP connection.

5. HTTP client receives response message containing html file, displays html. Parsing html file, finds 10 referenced jpeg objects

6. Steps 1-5 repeated for each of 10 jpeg objects

Response time

RTT (Round-Trip Time, 往返时间, definition): time for a small packet to travel from client to server and back

HTTP response time (per object):

• one RTT to initiate TCP connection

• one RTT for HTTP request and first few bytes of HTTP response to return

• object/file transmission time

Non-persistent HTTP response time = 2RTT + file transmission time

2.2.2.2 HTTP with Persistent Connections

Non-persistent HTTP issues:

• requires 2 RTTs per object

• OS overhead for each TCP connection

• browsers often open multiple parallel TCP connections to fetch referenced objects in parallel

Persistent HTTP (HTTP/1.1):

• server leaves connection open after sending response

• subsequent HTTP messages between same client/server sent over open connection

• client sends requests as soon as it encounters a referenced object

• as little as one RTT for all the referenced objects (cutting response time in half)
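A back-of-the-envelope sketch comparing the two connection styles for the example page (a base HTML file plus 10 JPEG images), assuming RTT = 100 ms, negligible transmission time, serial non-persistent connections (no parallel connections), and requests sent back to back on the persistent connection.

```python
def non_persistent_time(n_objects, rtt, t_obj):
    """Serial non-persistent HTTP: each object costs 2 RTTs (TCP setup + request) plus transmission."""
    return n_objects * (2 * rtt + t_obj)


def persistent_pipelined_time(n_refs, rtt, t_obj):
    """Persistent HTTP: 2 RTTs + transmission for the base HTML file, then one RTT
    for all n_refs referenced objects requested back to back, plus their transmission."""
    base = 2 * rtt + t_obj
    if n_refs == 0:
        return base
    return base + rtt + n_refs * t_obj


# Base HTML file + 10 JPEGs, RTT = 100 ms, transmission time ignored (t_obj = 0)
print(non_persistent_time(11, 0.1, 0.0))        # about 2.2 s
print(persistent_pipelined_time(10, 0.1, 0.0))  # about 0.3 s
```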

2.2.3 HTTP Message Format

two types of HTTP messages: request, response

2.2.3.1 HTTP request message

ASCII (human-readable format)

request line (请求行; GET, POST, HEAD commands):

GET /index.html HTTP/1.1\r\n

(\r is the carriage return character; \n is the line-feed character)

header lines (首部行):

Host: www-net.cs.umass.edu\r\n

User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:80.0) Gecko/20100101 Firefox/80.0 \r\n

Accept: text/html,application/xhtml+xml\r\n

Accept-Language: en-us,en;q=0.5\r\n

Accept-Encoding: gzip,deflate\r\n

Connection: keep-alive\r\n

\r\n

carriage return, line feed at start of line indicates end of header lines
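A minimal sketch that builds and parses a request message in the format above; the helper names are hypothetical, and the parser assumes well-formed input with no entity body.

```python
CRLF = "\r\n"


def build_request(method, path, host, extra_headers=None):
    """Request line, header lines, then an empty line (CRLF alone) to end the headers."""
    lines = [f"{method} {path} HTTP/1.1", f"Host: {host}"]
    for name, value in (extra_headers or {}).items():
        lines.append(f"{name}: {value}")
    return (CRLF.join(lines) + CRLF + CRLF).encode("ascii")


def parse_request(raw):
    """Inverse of build_request for a request with no entity body."""
    head = raw.decode("ascii").split(CRLF + CRLF, 1)[0]
    request_line, *header_lines = head.split(CRLF)
    method, path, version = request_line.split(" ")
    headers = dict(line.split(": ", 1) for line in header_lines)
    return method, path, version, headers


req = build_request("GET", "/index.html", "www-net.cs.umass.edu",
                    {"Connection": "keep-alive"})
print(req)
# b'GET /index.html HTTP/1.1\r\nHost: www-net.cs.umass.edu\r\nConnection: keep-alive\r\n\r\n'
```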

General Format

Other HTTP request messages

POST method:

• web page often includes form input

• user input sent from client to server in entity body of HTTP POST request message

GET method (for sending data to server):

• include user data in URL field of HTTP GET request message (following a '?'):

www.somesite.com/animalsearch?monkeys&banana

HEAD method:

• requests headers (only) that would be returned if specified URL were requested with an HTTP GET method.

PUT method:

• uploads new file (object) to server

• completely replaces file that exists at specified URL with content in entity body of PUT HTTP request message

2.2.3.2 HTTP Response Message

status line (状态行: protocol, status code, status phrase):

HTTP/1.1 200 OK

header lines (首部行):

Date: Tue, 08 Sep 2020 00:53:20 GMT

Server: Apache/2.4.6 (CentOS) OpenSSL/1.0.2k-fips PHP/7.4.9 mod_perl/2.0.11 Perl/v5.16.3

Last-Modified: Tue, 01 Mar 2016 18:57:50 GMT

ETag: "a5b-52d015789ee9e"

Accept-Ranges: bytes

Content-Length: 2651

Content-Type: text/html; charset=UTF-8

\r\n

data, e.g., requested HTML file:

data data data data data ...
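A minimal parser for a response message of this shape, assuming well-formed input; header names are lower-cased because HTTP treats them case-insensitively. The sample bytes reuse a few header lines from the example above with a made-up 5-byte body.

```python
def parse_response(raw: bytes):
    """Split a response into status line fields, header lines and entity body."""
    head, _, body = raw.partition(b"\r\n\r\n")         # empty line ends the header lines
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    version, code, phrase = status_line.split(" ", 2)   # e.g. HTTP/1.1 200 OK
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()   # header names are case-insensitive
    return version, int(code), phrase, headers, body


raw = (b"HTTP/1.1 200 OK\r\n"
       b"Last-Modified: Tue, 01 Mar 2016 18:57:50 GMT\r\n"
       b"Content-Length: 5\r\n"
       b"Content-Type: text/html; charset=UTF-8\r\n"
       b"\r\n"
       b"hello")
version, code, phrase, headers, body = parse_response(raw)
print(code, phrase, headers["content-length"], body)  # 200 OK 5 b'hello'
```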

HTTP response status codes

status code appears in 1st line in server-to-client response message.

some sample codes:

200 OK

• request succeeded, requested object later in this message

301 Moved Permanently

• requested object moved, new location specified later in this message (in Location: field)

400 Bad Request

• request msg not understood by server

404 Not Found

• requested document not found on this server

505 HTTP Version Not Supported

2.2.4 User-Server Interaction: Cookies

Web sites and client browser use cookies to maintain some state between transactions

four components:

1) cookie header line of HTTP response message

2) cookie header line in next HTTP request message

3) cookie file kept on user's host, managed by user's browser

4) back-end database at Web site
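A simulation sketch of the four cookie components, using Python's http.cookies.SimpleCookie. The back-end database, the ID counter starting at 1678, the single-site cookie jar and the site name are all hypothetical, and the stored cookie is just the Set-Cookie value with no attributes.

```python
import itertools
from http.cookies import SimpleCookie

backend_db = {}                   # 4) back-end database at the Web site (hypothetical)
next_id = itertools.count(1678)   # hypothetical ID generator
cookie_jar = {}                   # 3) cookie file on the user's host, one entry per site


def server_handle(request_headers):
    """Return response headers: issue an ID on the first visit, recognize the user afterwards."""
    cookie = SimpleCookie(request_headers.get("Cookie", ""))
    if "id" in cookie:
        backend_db[cookie["id"].value]["visits"] += 1   # state lives in the back-end database
        return {}
    user_id = str(next(next_id))
    backend_db[user_id] = {"visits": 1}
    issued = SimpleCookie()
    issued["id"] = user_id
    return {"Set-Cookie": issued["id"].OutputString()}  # 1) cookie header line in response


def browser_visit(site):
    """Browser attaches its stored cookie (2) and stores any cookie the server sets (3)."""
    request_headers = {"Cookie": cookie_jar[site]} if site in cookie_jar else {}
    response_headers = server_handle(request_headers)
    if "Set-Cookie" in response_headers:
        cookie_jar[site] = response_headers["Set-Cookie"]


for _ in range(3):
    browser_visit("amazon.com")
print(cookie_jar, backend_db)  # {'amazon.com': 'id=1678'} {'1678': {'visits': 3}}
```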

HTTP cookies: comments

What cookies can be used for:

• authorization

• shopping carts

• recommendations

• user session state (Web e-mail)

Challenge: How to keep state?

• at protocol endpoints: maintain state at sender/receiver over multiple transactions

• in messages: cookies in HTTP messages carry state

aside

cookies and privacy:

• cookies permit sites to learn a lot about you on their site.

• third party persistent cookies (tracking cookies) allow common identity (cookie value) to be tracked across multiple web sites

2.2.5 Web Caching (Web缓存)

Goal: satisfy client requests without involving origin server

• user configures browser to point to a (local) Web cache

• browser sends all HTTP requests to cache

• if object in cache: cache returns object to client

• else cache requests object from origin server, caches received object, then returns object to client
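A minimal sketch of the cache logic just described, assuming a hypothetical fetch_from_origin callable and no expiry (keeping cached copies fresh is what the conditional GET addresses). The URL is made up.

```python
class WebCache:
    """Proxy cache: a server to the requesting browser, a client to the origin server."""

    def __init__(self, fetch_from_origin):
        self.fetch_from_origin = fetch_from_origin  # hypothetical: url -> object
        self.objects = {}
        self.hits = 0
        self.misses = 0

    def get(self, url):
        if url in self.objects:              # object in cache: return it directly
            self.hits += 1
            return self.objects[url]
        self.misses += 1                     # else fetch from origin, keep a copy, return it
        obj = self.fetch_from_origin(url)
        self.objects[url] = obj
        return obj


origin_requests = []


def origin(url):
    origin_requests.append(url)
    return f"<contents of {url}>"


cache = WebCache(origin)
cache.get("http://www.someschool.edu/pic.gif")
cache.get("http://www.someschool.edu/pic.gif")
print(origin_requests, cache.hits, cache.misses)  # one origin request, 1 hit, 1 miss
```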

Web caches (aka proxy servers, 代理服务器)

• Web cache acts as both client and server

• server for original requesting client

• client to origin server

• server tells cache about object's allowable caching in response header (e.g., Cache-Control: max-age=<seconds>, or Cache-Control: no-cache)

Why Web caching?

• reduce response time for client request

• cache is closer to client

• reduce traffic on an institution's access link

• Internet is dense with caches

• enables "poor" content providers to more effectively deliver content

The Conditional GET (条件 GET)

Goal: don't send object if cache has up-to-date cached version

• no object transmission delay (or use of network resources)

• client: specify date of cached copy in HTTP request
If-modified-since: <date>

• server: response contains no object if cached copy is up-to-date:
HTTP/1.0 304 Not Modified
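A sketch of the origin server's side of a conditional GET, reusing the Last-Modified date from the sample response above; it assumes timezone-aware dates with second resolution, as HTTP dates have. The respond helper is hypothetical.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def respond(last_modified, body, if_modified_since=None):
    """Origin server side of a conditional GET.
    last_modified: aware datetime of the object's last change (second resolution).
    if_modified_since: the If-modified-since header value sent by the cache, if any."""
    if if_modified_since is not None:
        cached_copy_date = parsedate_to_datetime(if_modified_since)
        if last_modified <= cached_copy_date:
            return "HTTP/1.0 304 Not Modified", b""   # cached copy is up to date: no object sent
    return "HTTP/1.0 200 OK", body


last_modified = datetime(2016, 3, 1, 18, 57, 50, tzinfo=timezone.utc)
cached_date = format_datetime(last_modified, usegmt=True)   # "Tue, 01 Mar 2016 18:57:50 GMT"
print(respond(last_modified, b"<html>...</html>", cached_date))  # ('HTTP/1.0 304 Not Modified', b'')
```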

2.2.6 HTTP/2

Key goal: decreased delay in multi-object HTTP requests

HTTP/2: [RFC 7540, 2015] increased flexibility at server in sending objects to client:

• methods, status codes, most header fields unchanged from HTTP 1.1

• transmission order of requested objects based on client-specified object priority (not necessarily FCFS)

• push unrequested objects to client

• divide objects into frames, schedule frames to mitigate HOL blocking

HTTP/2: mitigating HOL (Head of Line) blocking (线路前部阻塞)

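HTTP 1.1: client requests 1 large object (e.g., video file) and 3 smaller objects; objects are delivered in the order requested, so the 3 small objects wait behind the large one

HTTP/2: objects divided into frames, frame transmission interleaved (尤指将片状物插入，夹进), so the small objects are delivered quickly while the large one is only slightly delayed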
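A toy model of HOL blocking, assuming equal-size frames, one frame sent per time unit, a large object of 8 frames and three 1-frame objects (numbers chosen for illustration). It compares FCFS delivery of whole objects with round-robin frame interleaving.

```python
def completion_times(frames_per_object, interleave):
    """Time unit at which each object finishes, sending one frame per time unit.
    interleave=False: FCFS, whole objects in request order (HTTP/1.1-style).
    interleave=True: round-robin, one frame from each unfinished object (HTTP/2-style)."""
    remaining = list(frames_per_object)
    finished = [0] * len(remaining)
    t = 0
    if not interleave:
        for i, n in enumerate(remaining):
            t += n
            finished[i] = t
        return finished
    while any(remaining):
        for i in range(len(remaining)):
            if remaining[i]:
                remaining[i] -= 1
                t += 1
                if remaining[i] == 0:
                    finished[i] = t
    return finished


objects = [8, 1, 1, 1]   # O1: large object; O2-O4: small objects
print(completion_times(objects, interleave=False))  # [8, 9, 10, 11]
print(completion_times(objects, interleave=True))   # [11, 2, 3, 4]
```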

HTTP/2 to HTTP/3

HTTP/2 over single TCP connection means:

• recovery from packet loss still stalls (暂缓；搁置；停顿) all object transmissions

• as in HTTP 1.1, browsers have incentive (激励) to open multiple parallel TCP connections to reduce stalling, increase overall throughput

• no security over vanilla TCP connection

• HTTP/3: adds security, per object error- and congestion-control (more pipelining，流水线) over UDP

• more on HTTP/3 in transport layer

2.3 Electronic Mail in the Internet

2.4 The Domain Name System: DNS

Internet hosts, routers:

• IP address (IP地址, 32 bit) - used for addressing datagrams

• "name" - used by humans

2.4.1 Services Provided by DNS

• Domain Name System (域名系统, DNS):

• a distributed database (分布式数据库) implemented in a hierarchy of DNS servers

• an application-layer protocol that allows hosts to query the distributed database (address/name translation)

DNS: services

• hostname-to-IP-address translation

• host aliasing (主机别名)

• canonical (规范), alias names

• mail server aliasing (邮件服务器别名)

• load distribution (负载分配)

• replicated (冗余的) Web servers: many IP addresses correspond to one name

2.4.2 Overview of How DNS Works

Q: Why not centralize DNS?

• single point of failure (单点故障)

• traffic volume (通信容量)

• distant centralized database (远距离的集中式数据库)

• maintenance (维护)

A: doesn't scale (有可扩展能力)!

A Distributed, Hierarchical Database

Client wants IP address for www.amazon.com; 1st approximation:

• client queries root server to find .com DNS server

• client queries .com DNS server to get amazon.com DNS server

• client queries amazon.com DNS server to get IP address for www.amazon.com

Root DNS servers (根DNS服务器)

• official, contact-of-last-resort by name servers that cannot resolve name

• incredibly important Internet function

• Internet couldn't function without it!

• DNSSEC – provides security (authentication, message integrity)

• ICANN (Internet Corporation for Assigned Names and Numbers) manages root DNS domain

Top-level domain (TLD) servers (顶级域服务器)

• responsible for .com, .org, .net, .edu, .aero, .jobs, .museum, and all top-level country domains, e.g.: .cn, .uk, .fr, .ca, .jp

• Verisign Global Registry Services: authoritative registry for .com, .net TLDs

• Educause: .edu TLD

Authoritative DNS servers (权威DNS服务器)

• organization's own DNS server(s), providing authoritative hostname to IP mappings for organization's named hosts

• can be maintained by organization or service provider

Local DNS server (本地DNS服务器)

• when host makes DNS query, it is sent to its local DNS server

• Local DNS server returns reply, either:

• from its local cache of recent name-to-address translation pairs, or

• by forwarding the query into DNS hierarchy for resolution

• each ISP has local DNS server; to find yours:

• MacOS: % scutil --dns

• Windows: >ipconfig /all

• local DNS server doesn't strictly belong to hierarchy

DNS name resolution

Example: host at engineering.nyu.edu wants IP address for gaia.cs.umass.edu

Iterative query (迭代查询):

• contacted server replies with name of server to contact

• "I don't know this name, but ask this server"

Recursive query (递归查询):

• puts burden of name resolution on contacted name server

• heavy load at upper levels of hierarchy?
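A minimal in-memory sketch contrasting the two query styles for the example above. The server labels (`root`, `tld-edu`, `auth-umass`) and the address 203.0.113.10 (a documentation range) are hypothetical; real resolvers exchange DNS messages over the network and cache what they learn along the way.

```python
# Hypothetical zone data: each server maps a name suffix to either a referral
# ("ask this server next") or a final answer (an IP address).
SERVERS = {
    "root": {"edu": ("referral", "tld-edu")},
    "tld-edu": {"umass.edu": ("referral", "auth-umass")},
    "auth-umass": {"gaia.cs.umass.edu": ("answer", "203.0.113.10")},
}


def query(server, hostname):
    """Ask one server; it answers from the longest suffix of hostname it knows."""
    zone = SERVERS[server]
    for suffix in sorted(zone, key=len, reverse=True):
        if hostname == suffix or hostname.endswith("." + suffix):
            return zone[suffix]
    raise LookupError(f"{server} cannot resolve {hostname}")


def resolve_iteratively(hostname, start="root"):
    """Iterative: the local DNS server follows every referral itself."""
    server, contacted = start, [start]
    while True:
        kind, value = query(server, hostname)
        if kind == "answer":
            return value, contacted
        server = value  # "I don't know this name, but ask this server"
        contacted.append(server)


def resolve_recursively(server, hostname):
    """Recursive: each contacted server queries the next on the requester's behalf."""
    kind, value = query(server, hostname)
    if kind == "answer":
        return value
    return resolve_recursively(value, hostname)


print(resolve_iteratively("gaia.cs.umass.edu"))
# ('203.0.113.10', ['root', 'tld-edu', 'auth-umass'])
print(resolve_recursively("root", "gaia.cs.umass.edu"))
# 203.0.113.10
```

Both return the same answer; the difference is who carries the burden. In the recursive version every upper-level server stays involved until the answer comes back, which is the "heavy load at upper levels" concern.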

DNS Caching (DNS缓存)

• once (any) name server learns mapping, it caches mapping, and immediately returns a cached mapping in response to a query

• caching improves response time

• cache entries timeout (disappear) after some time (TTL)

• TLD servers typically cached in local name servers

• cached entries may be out-of-date

• if named host changes IP address, may not be known Internet-wide until all TTLs expire!

• best-effort (尽力而为) name-to-address translation!
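A sketch of a name-to-address cache whose entries expire after their TTL. The injectable clock is an assumption added so that expiry can be demonstrated without waiting; the hostname and address are hypothetical.

```python
import time


class TTLCache:
    """Name-to-address cache; entries disappear once their TTL (seconds) runs out."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock      # injectable clock (assumption, for testing)
        self._entries = {}       # name -> (value, expiry time)

    def put(self, name, value, ttl):
        self._entries[name] = (value, self._clock() + ttl)

    def get(self, name):
        entry = self._entries.get(name)
        if entry is None:
            return None          # miss: must query the DNS hierarchy
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._entries[name]   # timed out: drop the entry
            return None
        return value             # hit: possibly stale if the host moved


now = [0.0]
cache = TTLCache(clock=lambda: now[0])
cache.put("gaia.cs.umass.edu", "203.0.113.10", ttl=300)
print(cache.get("gaia.cs.umass.edu"))  # 203.0.113.10
now[0] = 301.0
print(cache.get("gaia.cs.umass.edu"))  # None
```

Until the TTL runs out, the cache keeps returning the old address even if the host has changed it, which is why the translation is only best-effort.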

2.4.3 DNS Records and Messages

RR (resource records, 资源记录) format: (name, value, type, ttl)

type=A

• name is hostname

• value is IP address

type=NS

• name is domain

• value is hostname of authoritative name server for this domain

type=CNAME

• name is alias name for some "canonical" (the real) name

• value is canonical name

type=MX

• value is name of SMTP mail server associated with name
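A sketch of RRs in the (name, value, type, ttl) format, with a lookup that follows a CNAME alias to its canonical name. All records are hypothetical, under the reserved domain example.com with addresses from the 192.0.2.0/24 documentation range; real MX records also carry a preference number, omitted here.

```python
from typing import NamedTuple


class RR(NamedTuple):
    """A resource record in the (name, value, type, ttl) format above."""
    name: str
    value: str
    type: str
    ttl: int


RECORDS = [  # hypothetical records
    RR("example.com", "dns1.example.com", "NS", 172800),
    RR("dns1.example.com", "192.0.2.53", "A", 172800),
    RR("www.example.com", "web1.example.com", "CNAME", 3600),
    RR("web1.example.com", "192.0.2.10", "A", 3600),
    RR("example.com", "mail.example.com", "MX", 3600),
    RR("mail.example.com", "192.0.2.25", "A", 3600),
]


def lookup(name, rtype):
    return [rr.value for rr in RECORDS if rr.name == name and rr.type == rtype]


def resolve_address(name, max_aliases=8):
    """Return type A values for name, following CNAMEs to the canonical name."""
    for _ in range(max_aliases):
        addresses = lookup(name, "A")
        if addresses:
            return addresses
        canonical = lookup(name, "CNAME")
        if not canonical:
            return []
        name = canonical[0]   # alias -> canonical name
    raise RuntimeError("alias chain too long")


print(resolve_address("www.example.com"))              # ['192.0.2.10']
print(resolve_address(lookup("example.com", "MX")[0]))  # ['192.0.2.25']
```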

DNS Messages

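DNS query and reply messages both have the same format: a 12-byte header (identification, flags, and the numbers of questions, answer RRs, authority RRs and additional RRs), followed by the questions, answers, authority and additional-information sections.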
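A sketch that builds the bytes of a DNS query message with the header-plus-question layout (field layout per RFC 1035). The query ID 0x1234 is arbitrary; the hostname is the one used in the resolution example.

```python
import struct


def encode_qname(hostname):
    """Encode a hostname as DNS labels: a length byte before each label, then a zero byte."""
    out = b""
    for label in hostname.rstrip(".").split("."):
        if not 0 < len(label) < 64:
            raise ValueError(f"bad label {label!r}")
        out += bytes([len(label)]) + label.encode("ascii")
    return out + b"\x00"


def build_query(query_id, hostname, qtype=1):
    """DNS query message: 12-byte header followed by one question."""
    flags = 0x0100   # QR=0 (this is a query); RD=1 (recursion desired)
    # identification, flags, #questions, #answer RRs, #authority RRs, #additional RRs
    header = struct.pack("!HHHHHH", query_id, flags, 1, 0, 0, 0)
    # question: name, then QTYPE (1 = type A) and QCLASS (1 = Internet)
    question = encode_qname(hostname) + struct.pack("!HH", qtype, 1)
    return header + question


msg = build_query(0x1234, "gaia.cs.umass.edu")
print(len(msg))  # 35: 12-byte header + 19-byte name + 4 bytes of type/class
```

To actually query, one could `sendto` these bytes over a UDP socket to a resolver's port 53 and parse the reply, which has the same header layout; that network step is left out here.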

Inserting Records into the DNS Database

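• register name at DNS registrar (注册登记机构), providing the names and IP addresses of the authoritative name servers (primary and secondary); the registrar inserts NS and A records for them into the TLD server

• create authoritative server locally with that IP address, holding records such as the type A record for the Web server and the type MX record for the mail server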

DNS security

DDoS (分布式拒绝服务) attacks

• bombard root servers with traffic

• not successful to date

• traffic filtering

• local DNS servers cache IPs of TLD servers, allowing root server bypass

• bombard TLD servers

• potentially more dangerous

Spoofing (欺骗) attacks

• intercept (截获) DNS queries, returning bogus (伪造的) replies

• DNS cache poisoning (毒害)

• RFC 4033: DNSSEC authentication services

2.7 Socket Programming: Creating Network Applications

Socket abstraction, UDP and TCP socket programming

Goal: learn how to build client/server applications that communicate using sockets

Socket: door between application process and end-end-transport protocol

Two socket types for two transport services:

• UDP: unreliable datagram

• TCP: reliable, byte stream-oriented

Application Example:

1. client reads a line of characters (data) from its keyboard and sends data to server

2. server receives the data and converts characters to uppercase

3. server sends modified data to client

4. client receives modified data and displays line on its screen

2.7.1 Socket programming with UDP

UDP: no "connection" between client and server:

• no handshaking before sending data

• sender explicitly attaches IP destination address and port # to each packet

• receiver extracts sender IP address and port# from received packet

UDP: transmitted data may be lost or received out-of-order

Application viewpoint:

• UDP provides unreliable transfer of groups of bytes ("datagrams") between client and server processes

Client/server socket interaction:

UDPClient.py

```python
from socket import *                        # include Python's socket library

serverName = 'hostname'                     # placeholder: the server's hostname or IP address
serverPort = 12000
clientSocket = socket(AF_INET, SOCK_DGRAM)          # create UDP socket for server
message = input('Input lowercase sentence:')        # get user keyboard input
# attach server name, port to message; send into socket
clientSocket.sendto(message.encode(), (serverName, serverPort))
# read reply characters from socket into string
modifiedMessage, serverAddress = clientSocket.recvfrom(2048)
print(modifiedMessage.decode())             # print out received string...
clientSocket.close()                        # ...and close socket
```

UDPServer.py

```python
from socket import *

serverPort = 12000
serverSocket = socket(AF_INET, SOCK_DGRAM)      # create UDP socket
serverSocket.bind(('', serverPort))             # bind socket to local port number 12000
print("The server is ready to receive")
while True:                                     # loop forever
    # read from UDP socket into message, getting client's address (client IP and port)
    message, clientAddress = serverSocket.recvfrom(2048)
    modifiedMessage = message.decode().upper()
    # send upper case string back to this client
    serverSocket.sendto(modifiedMessage.encode(), clientAddress)
```
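To try the UDP client/server pair without two terminals, here is a single-process sketch that runs the uppercase server for one datagram in a thread on the loopback address. Binding to port 0 (the OS picks a free port) instead of 12000, and the 5-second socket timeouts, are assumptions added so the demo cannot clash or hang.

```python
import threading
from socket import socket, AF_INET, SOCK_DGRAM


def udp_round_trip(text):
    """Run the uppercase server for one datagram and send it one line."""
    server = socket(AF_INET, SOCK_DGRAM)
    server.bind(("127.0.0.1", 0))         # port 0: let the OS pick a free port
    server.settimeout(5)
    server_address = server.getsockname()

    def serve_once():
        message, client_address = server.recvfrom(2048)
        server.sendto(message.decode().upper().encode(), client_address)

    worker = threading.Thread(target=serve_once)
    worker.start()

    client = socket(AF_INET, SOCK_DGRAM)  # no connect(): address goes with each sendto()
    client.settimeout(5)
    client.sendto(text.encode(), server_address)
    reply, _ = client.recvfrom(2048)

    worker.join()
    client.close()
    server.close()
    return reply.decode()


print(udp_round_trip("hello"))  # HELLO
```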

2.7.2 Socket Programming with TCP

Client must contact server

• server process must first be running

• server must have created socket (door) that welcomes client's contact

Client contacts server by:

• Creating TCP socket, specifying IP address, port number of server process

• when client creates socket: client TCP establishes connection to server TCP

when contacted by client, server TCP creates new socket for server process to communicate with that particular client

• allows server to talk with multiple clients

• source port numbers used to distinguish clients

Application viewpoint

TCP provides reliable, in-order byte-stream transfer ("pipe") between client and server processes

Client/server socket interaction:

TCPClient.py

```python
from socket import *

serverName = 'servername'                   # placeholder: the server's hostname or IP address
serverPort = 12000
# create TCP socket for server, remote port 12000
clientSocket = socket(AF_INET, SOCK_STREAM)
clientSocket.connect((serverName, serverPort))
sentence = input('Input lowercase sentence:')
# no need to attach server name, port: the socket is already connected
clientSocket.send(sentence.encode())
# read reply bytes from the TCP socket
modifiedSentence = clientSocket.recv(1024)
print('From Server:', modifiedSentence.decode())
clientSocket.close()
```

TCPServer.py

```python
from socket import *

serverPort = 12000
serverSocket = socket(AF_INET, SOCK_STREAM)     # create TCP welcoming socket
serverSocket.bind(('', serverPort))
serverSocket.listen(1)                          # server begins listening for incoming TCP requests
print('The server is ready to receive')
while True:                                     # loop forever
    # server waits on accept() for incoming requests; new socket created on return
    connectionSocket, addr = serverSocket.accept()
    # read bytes from socket (but not address as in UDP)
    sentence = connectionSocket.recv(1024).decode()
    capitalizedSentence = sentence.upper()
    connectionSocket.send(capitalizedSentence.encode())
    # close connection to this client (but not welcoming socket)
    connectionSocket.close()
```
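The same single-process trick works for the TCP pair: a sketch that runs the welcoming socket and one connection in a thread on the loopback address. Port 0 and the 5-second timeouts are again assumptions for the demo.

```python
import threading
from socket import socket, AF_INET, SOCK_STREAM


def tcp_round_trip(text):
    """Run the uppercase TCP server for one connection and send it one line."""
    welcome = socket(AF_INET, SOCK_STREAM)   # welcoming socket
    welcome.bind(("127.0.0.1", 0))           # port 0: let the OS pick a free port
    welcome.listen(1)
    welcome.settimeout(5)
    server_address = welcome.getsockname()

    def serve_once():
        conn, _addr = welcome.accept()       # new socket dedicated to this client
        with conn:
            sentence = conn.recv(1024).decode()
            conn.sendall(sentence.upper().encode())

    worker = threading.Thread(target=serve_once)
    worker.start()

    with socket(AF_INET, SOCK_STREAM) as client:
        client.settimeout(5)
        client.connect(server_address)       # TCP connection set up here
        client.sendall(text.encode())        # no address needed per send
        reply = client.recv(1024).decode()

    worker.join()
    welcome.close()
    return reply


print(tcp_round_trip("hello"))  # HELLO
```

Because TCP is a byte stream with no message boundaries, a single `recv` is only reliable here because the line is short and travels over loopback; a robust application would keep reading until it has a complete message.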

Chapter 3 Transport Layer

3.1 Introduction and Transport-layer Services

Transport-layer services and protocols. Transport layer actions.

• provide logical communication (逻辑通信) between application processes running on different hosts

• transport protocol actions in end systems:

• sender: breaks application messages into segments (报文段), passes to network layer

• receiver: reassembles segments into messages, passes to application layer

• two transport protocols available to Internet applications

• TCP, UDP

3.1.1 Relationship Between Transport and Network Layers

• network layer: logical communication between hosts

• transport layer: logical communication between processes

• relies on, enhances, network layer services

3.1.2 Overview of the Transport Layer in the Internet

Transport Layer Actions

Sender:

• is passed an application-layer message

• determines segment header fields values

• creates segment

• passes segment to IP

Receiver:

• receives segment from IP

• checks header values

• extracts application-layer message

• demultiplexes (多路分解) message up to application via socket

Two principal Internet transport protocols

TCP: Transmission Control Protocol

• reliable (可靠), in-order delivery

• congestion control (拥塞控制)

• flow control

• connection setup

UDP: User Datagram Protocol

• unreliable (不可靠), unordered delivery

• no-frills (不提供不必要服务的) extension of "best-effort" (尽力而为) IP

Services not available:

• delay guarantees

• bandwidth guarantees

3.2 Multiplexing and Demultiplexing

What is multiplexing, demultiplexing? How is it done? How does it work in TCP and UDP?

Multiplexing (多路复用) at sender:

handle data from multiple sockets (套接字), add transport header (later used for demultiplexing)

Demultiplexing (多路分解) at receiver:

use header info to deliver received segments to correct socket

How demultiplexing works

• host receives IP datagrams

• each datagram has source IP address, destination IP address

• each datagram carries one transport-layer segment

• each segment has source, destination port number

• host uses IP addresses & port numbers to direct segment to appropriate socket

3.2.1.1 Connectionless Multiplexing and Demultiplexing

Recall:

• when creating socket, must specify host-local port #:

• DatagramSocket mySocket1 = new DatagramSocket(12534);

• when creating datagram to send into UDP socket, must specify

• destination IP address

• destination port #

• when receiving host receives UDP segment:

• checks destination port # in segment

• directs UDP segment to socket with that port #


IP/UDP datagrams with same dest. port #, but different source IP addresses and/or source port numbers will be directed to same socket at receiving host

3.2.1.2 Connection-Oriented Multiplexing and Demultiplexing

• TCP socket identified by 4-tuple:

• source IP address

• source port number

• dest IP address

• dest port number

• demux: receiver uses all four values (4-tuple) to direct segment to appropriate socket

• server may support many simultaneous TCP sockets:

• each socket identified by its own 4-tuple

• each socket associated with a different connecting client

3.2.1.3 Summary

• Multiplexing, demultiplexing: based on segment, datagram header field values

• UDP: demultiplexing using destination port number (only)

• TCP: demultiplexing using 4-tuple: source and destination IP addresses, and port numbers

• Multiplexing/demultiplexing happen at all layers
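A toy model of the two lookup rules in this summary. The socket labels, addresses (documentation ranges) and port numbers are hypothetical; a real kernel also hands TCP connection requests with no matching 4-tuple to the listening (welcoming) socket, which this sketch leaves out.

```python
# UDP: the demultiplexing key is the destination port only.
udp_sockets = {12000: "udp-socket-A"}

# TCP: the key is the full 4-tuple (src IP, src port, dest IP, dest port).
tcp_sockets = {
    ("198.51.100.1", 5775, "192.0.2.80", 80): "tcp-socket-P4",
    ("198.51.100.2", 9157, "192.0.2.80", 80): "tcp-socket-P5",
    ("198.51.100.2", 5775, "192.0.2.80", 80): "tcp-socket-P6",
}


def demux_udp(src_ip, src_port, dst_ip, dst_port):
    return udp_sockets.get(dst_port)            # source fields are ignored


def demux_tcp(src_ip, src_port, dst_ip, dst_port):
    return tcp_sockets.get((src_ip, src_port, dst_ip, dst_port))


# Two different senders, same UDP destination port: same socket.
print(demux_udp("198.51.100.1", 5775, "192.0.2.9", 12000))
print(demux_udp("198.51.100.2", 9157, "192.0.2.9", 12000))
# Same TCP destination (port 80), different sources: different sockets.
print(demux_tcp("198.51.100.2", 9157, "192.0.2.80", 80))
print(demux_tcp("198.51.100.2", 5775, "192.0.2.80", 80))
```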

3.3 Connectionless Transport: UDP

UDP segment structure. The Internet checksum.

• "no frills," "bare bones" Internet transport protocol

• "best effort" service, UDP segments may be:

• lost

• delivered out-of-order to app

• connectionless:

• no handshaking between UDP sender, receiver

• each UDP segment handled independently of others

Why is there a UDP?

• no connection establishment (which can add RTT delay)

• simple: no connection state at sender, receiver

• small header overhead (8-byte UDP header)

• finer application-level control over what data is sent, and when; no congestion control

• UDP can blast away as fast as desired!

• can function in the face of congestion

• UDP use:

• streaming multimedia apps (loss tolerant, rate sensitive)

• DNS

• SNMP (Simple Network Management Protocol, 简单网络管理协议)

• HTTP/3

• if reliable transfer needed over UDP (e.g., HTTP/3):

• add needed reliability at application layer

• add congestion control at application layer

RFC 768

Transport Layer Actions

UDP sender actions:

• is passed an application-layer message

• determines UDP segment header fields values

• creates UDP segment

• passes segment to IP

UDP receiver actions:

• checks UDP checksum header value

• extracts application-layer message

• demultiplexes message up to application via socket

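3.3.1 UDP Segment Structure

• UDP header: four 16-bit fields (8 bytes in total): source port #, dest port #, length (in bytes of the UDP segment, including header), checksum

• followed by the application data (payload)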

3.3.2 UDP Checksum

Goal: detect errors (i.e., flipped bits) in transmitted segment

Sender:

• Treat contents of UDP segment (including UDP header fields and IP addresses) as sequence of 16-bit integers

• Checksum (校验和): one's complement of the one's complement sum of segment content

• Checksum value put into UDP checksum field

Receiver:

• Compute checksum of received segment

• Check if computed checksum equals checksum field value:

• Not equal - error detected

• Equal - no error detected. But maybe errors nonetheless? More later ….

Internet checksum

Example: add two 16-bit integers

  1110 0110 0110 0110

+ 1101 0101 0101 0101

= 1 1011 1011 1011 1011 wraparound (the carry out of the most significant bit is added back in)

  1011 1011 1011 1100 sum

  0100 0100 0100 0011 checksum (one's complement of the sum)

Note: when adding numbers, a carryout from the most significant bit needs to be added to the result

Weak protection!

  1110 0110 0110 0101

+ 1101 0101 0101 0110

= 1 1011 1011 1011 1011 wraparound

  1011 1011 1011 1100 sum

  0100 0100 0100 0011 checksum

Even though numbers have changed (bit flips), no change in checksum!
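A sketch of the Internet checksum over 16-bit words that reproduces both worked examples above. It takes already-split 16-bit integers; real UDP also sums a pseudo-header with the IP addresses, which is left out here.

```python
def ones_complement_sum16(words):
    """Add 16-bit words, wrapping any carry out of bit 15 back into bit 0."""
    total = 0
    for w in words:
        total += w
        while total > 0xFFFF:
            total = (total & 0xFFFF) + (total >> 16)   # wraparound
    return total


def internet_checksum(words):
    """One's complement of the one's complement sum."""
    return ~ones_complement_sum16(words) & 0xFFFF


a = [0b1110011001100110, 0b1101010101010101]   # first example
b = [0b1110011001100101, 0b1101010101010110]   # bits flipped
print(f"{internet_checksum(a):016b}")           # 0100010001000011
print(internet_checksum(a) == internet_checksum(b))   # True: flips go undetected
# Receiver check: data words plus the checksum sum to all ones (0xFFFF).
print(ones_complement_sum16(a + [internet_checksum(a)]) == 0xFFFF)   # True
```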

3.4 Principles of Reliable Data Transfer

Protocol mechanisms for reliable data transfer (rdt). Building an rdt protocol. Pipelining. Go-back-N. Selective Repeat.

Interfaces

3.4.1 Building a Reliable Data Transfer Protocol

We will:

• incrementally develop sender, receiver sides of reliable data transfer protocol (可靠数据传输协议, rdt)

• consider only unidirectional data transfer (单向数据传输)

• but control info will flow in both directions!

• use finite-state machines (FSM, 有限状态机) to specify sender, receiver

3.4.1.1 Reliable Data Transfer over a Perfectly Reliable Channel: rdt1.0

• underlying channel perfectly reliable

• no bit errors

• no loss of packets

• separate FSMs for sender, receiver:

• sender sends data into underlying channel

• receiver reads data from underlying channel

3.4.1.2 Reliable Data Transfer over a Channel with Bit Errors: rdt2.0

• underlying channel may flip bits in packet

• checksum to detect bit errors

• the question: how to recover from errors?

• Positive acknowledgements (ACKs, 肯定确认): receiver explicitly tells sender that packet received OK

• Negative acknowledgements (NAKs): receiver explicitly tells sender that packet had errors

• sender retransmits packet on receipt of NAK

stop-and-wait (停等)

sender sends one packet, then waits for receiver response

rdt2.0: the FSM Representation

Note: "state" of receiver (did the receiver get my message correctly?) isn't known to sender unless somehow communicated from receiver to sender

that's why we need a protocol!

rdt2.0 has a fatal flaw!

What happens if ACK/NAK corrupted?

• sender doesn't know what happened at receiver!

• can't just retransmit: possible duplicate

Handling duplicates:

• sender retransmits current packet if ACK/NAK corrupted

• sender adds sequence number (序号) to each packet

• receiver discards (doesn't deliver up) duplicate packet (冗余分组)

stop-and-wait

sender sends one packet, then waits for receiver response

Protocol that uses both ACKs and NAKs from the receiver to the sender: rdt2.1

The FSM Description

Discussion

Sender:

• sequence number added to packet

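• two sequence numbers (0,1) will suffice. Why? With stop-and-wait, the receiver only needs to tell a retransmission of the most recent packet apart from the next new packet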

• must check if received ACK/NAK corrupted

• twice as many states

• state must "remember" whether "expected" packet should have sequence number of 0 or 1

Receiver:

• must check if received packet is duplicate

• state indicates whether 0 or 1 is expected packet sequence number

• note: receiver cannot know if its last ACK/NAK was received OK at sender

NAK-free Reliable Data Transfer Protocol for a Channel with Bit Errors: rdt2.2

• same functionality as rdt2.1, using ACKs only

• instead of NAK, receiver sends ACK for last packet received OK

• receiver must explicitly include sequence number of packet being ACKed

• duplicate ACK (冗余ACK) at sender results in same action as NAK: retransmit current packet

As we will see, TCP uses this approach to be NAK-free

3.4.1.3 Reliable Data Transfer over a Lossy Channel with Bit Errors: rdt3.0

New channel assumption: underlying channel can also lose packets (data, ACKs)

• checksum, sequence numbers, ACKs, retransmissions will be of help … but not quite enough

Approach: sender waits "reasonable" amount of time for ACK

• retransmits if no ACK received in this time

• if packet (or ACK) just delayed (not lost):

• retransmission will be duplicate, but sequence numbers already handle this!

• receiver must specify sequence number of packet being ACKed

• use countdown timer (倒计数定时器) to interrupt after "reasonable" amount of time
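A toy simulation of rdt3.0 as an alternating-bit protocol over a channel that can drop data packets or ACKs. Assumptions: losses are random with a fixed probability (seeded for repeatability), every loss is eventually noticed by the timer, and bit errors and premature timeouts are not modeled.

```python
import random


def rdt30_transfer(messages, loss_prob=0.3, seed=1):
    """Return (messages delivered to the upper layer, data transmissions used)."""
    rng = random.Random(seed)

    def lost():
        return rng.random() < loss_prob       # the lossy channel

    delivered, expected_seq, transmissions = [], 0, 0
    for i, data in enumerate(messages):
        seq = i % 2                           # sequence numbers alternate 0, 1, 0, ...
        while True:
            transmissions += 1                # send (or, after a timeout, resend) pkt(seq, data)
            if lost():
                continue                      # packet lost: timer expires, retransmit
            if seq == expected_seq:           # receiver: new packet, deliver it
                delivered.append(data)
                expected_seq ^= 1
            # a duplicate (seq != expected_seq) is not delivered, but is still ACKed
            if lost():
                continue                      # ACK(seq) lost: timer expires, retransmit
            break                             # ACK(seq) arrived: move to next message
    return delivered, transmissions


print(rdt30_transfer(["a", "b", "c", "d"]))
```

When an ACK is lost, the retransmitted packet carries the old sequence number, so the receiver recognizes it as a duplicate: the data arrives exactly once and in order, however many retransmissions it took.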

3.4.2 Pipelined Reliable Data Transfer Protocols

3.4.2.1 Performance of rdt3.0 (stop-and-wait)

• U_sender: utilization (利用率) – fraction of time sender busy sending

• example: 1 Gbps link, 15 ms prop. delay, 8000 bit packet

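• time to transmit packet into channel: D_trans = L/R = 8000 bits / 10^9 bits/sec = 8 microseconds

rdt3.0: Stop-and-wait Operation

U_sender = (L/R) / (RTT + L/R) = 0.008 / 30.008 ≈ 0.00027 (times in ms; RTT = 2 × 15 ms = 30 ms)

• rdt3.0 protocol performance stinks!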

• Protocol limits performance of underlying infrastructure (channel)

3.4.2.2 Solution: Pipelining

rdt3.0: Pipelined Operation

Pipelining (流水线): sender allows multiple, "in-flight", yet-to-be-acknowledged packets

• range of sequence numbers must be increased

• buffering at sender and/or receiver

Pipelining: increased utilization

3-packet pipelining tripled the utilization: U_sender = (3L/R) / (RTT + L/R) = 0.024 / 30.008 ≈ 0.0008.
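The utilization arithmetic above as a small function, using the example's numbers (1 Gbps link, 8000-bit packets, RTT = 30 ms). Capping the result at 1.0 reflects that the sender cannot be busy more than all of the time once enough packets are in flight.

```python
def utilization(packets_in_flight, packet_bits, link_bps, rtt_s):
    """Fraction of time the sender is busy: N * (L/R) / (RTT + L/R), capped at 1."""
    d_trans = packet_bits / link_bps
    return min(1.0, packets_in_flight * d_trans / (rtt_s + d_trans))


u1 = utilization(1, 8000, 1e9, 0.030)   # stop-and-wait
u3 = utilization(3, 8000, 1e9, 0.030)   # 3-packet pipelining
print(f"{u1:.5f}")   # 0.00027
print(f"{u3:.5f}")   # 0.00080
```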

3.4.3 Go-Back-N (GBN, 回退N步)

GBN Sender

• Sender: "window" of up to N, consecutive transmitted but unACKed packets

• k-bit sequence number in packet header

• Cumulative acknowledgement (累积确认): ACK(n): ACKs all packets up to, including sequence number n

• on receiving ACK(n): move window forward to begin at n+1

• Timer for oldest in-flight packet

• timeout(n): retransmit packet n and all higher sequence number packets in window

GBN Receiver

• ACK-only: always send ACK for correctly-received packet so far, with highest in-order sequence number

• may generate duplicate ACKs

• need only remember rcv_base

• on receipt of out-of-order packet:

• can discard (don't buffer) or buffer: an implementation decision

• re-ACK packet with highest in-order sequence number

Go-Back-N in Action
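A toy Go-Back-N run in "rounds". Assumptions: in each round the sender transmits everything its window allows; packets listed in `lost_once` are dropped on their first transmission only; ACKs are never lost; and a round with no cumulative-ACK progress stands in for the timeout on the oldest in-flight packet.

```python
def go_back_n(num_packets, window, lost_once):
    """Return (order of transmissions, order of delivery to the upper layer)."""
    base, next_seq, expected = 0, 0, 0
    dropped, sent_log, delivered = set(), [], []
    while base < num_packets:
        acks = []
        while next_seq < min(base + window, num_packets):   # fill the window
            pkt = next_seq
            sent_log.append(pkt)
            next_seq += 1
            if pkt in lost_once and pkt not in dropped:
                dropped.add(pkt)              # lost in the channel (first try only)
                continue
            if pkt == expected:               # receiver: in-order, deliver
                delivered.append(pkt)
                expected += 1
            # out-of-order packets are discarded; re-ACK highest in-order seq
            acks.append(expected - 1)         # cumulative ACK (-1: nothing yet)
        new_base = max([base] + [a + 1 for a in acks])
        if new_base == base:
            next_seq = base                   # timeout: resend base and all after it
        base = new_base
    return sent_log, delivered


print(go_back_n(8, window=4, lost_once={2}))
# ([0, 1, 2, 3, 4, 5, 2, 3, 4, 5, 6, 7], [0, 1, 2, 3, 4, 5, 6, 7])
```

Losing packet 2 forces packets 3, 4 and 5 to be sent twice as well, because the receiver discarded them: that is the cost GBN pays for its simple receiver.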

3.4.4 Selective Repeat (SR, 选择重传)

• Receiver individually acknowledges all correctly received packets

• buffers packets, as needed, for eventual in-order delivery to upper layer

• Sender times-out/retransmits individually for unACKed packets

• sender maintains timer for each unACKed packet

• Sender window

• N consecutive sequence numbers

• limits sequence numbers of sent, unACKed packets

Sender, Receiver Windows

Sender, Receiver Events and Actions

Sender

• Data received from above:

• if next available sequence number in window, send packet

• Timeout(n):

• resend packet n, restart timer

• ACK(n) received in [send_base, send_base+N-1]:

• mark packet n as received

• if n smallest unACKed packet, advance window base to next unACKed sequence number

Receiver

• Packet n in [rcv_base, rcv_base+N-1] correctly received:

• send ACK(n)

• out-of-order: buffer

• in-order: deliver (also deliver buffered, in-order packets), advance window to next not-yet-received packet

• Packet n in [rcv_base-N, rcv_base-1] correctly received:

• ACK(n)

• Otherwise:

• ignore

Selective-repeat in Action

A dilemma!

Example:

• sequence numbers 0, 1, 2, 3

• a window size of three.

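• receiver can't see sender side: in scenario (a) the ACKs arrive and the sender later sends a new packet 0; in scenario (b) all ACKs are lost and the sender retransmits the old packet 0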

• receiver behavior identical in both cases!

• something's (very) wrong!

Q: What relationship is needed between sequence # size and window size to avoid problem in scenario (b)?

The window size must be less than or equal to half the size of the sequence number space for SR protocols.
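A brute-force check of this rule. After the receiver accepts packets 0..W-1, its window moves to the next W sequence numbers (mod the sequence space); if any of those numbers equals an old packet's number, a retransmission of that old packet would be accepted as new data. The function name is illustrative.

```python
def sr_window_is_ambiguous(seq_space, window):
    """True if an old retransmission could be mistaken for a new packet."""
    old_packets = {i % seq_space for i in range(window)}
    new_receiver_window = {(window + i) % seq_space for i in range(window)}
    return bool(old_packets & new_receiver_window)


print(sr_window_is_ambiguous(4, 3))  # True: the dilemma above
print(sr_window_is_ambiguous(4, 2))  # False: window = half the sequence space
```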

3.5 Connection-oriented Transport: TCP

The TCP connection and segment, RTT estimation and timeout, flow control

RFCs: 793, 1122, 2018, 5681, 7323

• Reliable, in-order byte stream:

• no "message boundaries"

• Cumulative acknowledgements

• Pipelining:

• TCP congestion and flow control set window size

• Flow controlled:

• sender will not overwhelm receiver

3.5.1 The TCP Connection

• Connection-oriented (面向连接的):

• handshaking (exchange of control messages) initializes sender, receiver state before data exchange

• Full-duplex service (全双工服务):

• bi-directional data flow in same connection

• Point-to-point (点对点):

• one sender, one receiver

• Three-way handshake (三次握手).

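• Send buffer (发送缓存): buffer set aside during the handshake; TCP directs data passed down through the socket into it

• Maximum segment size (MSS, 最大报文段长度): maximum amount of application-layer data in a segment (headers not included); typically 1460 bytes

• Maximum transmission unit (MTU, 最大传输单元): length of the largest link-layer frame the local sending host can send (1500 bytes for Ethernet); MSS is set so that a TCP segment plus the TCP/IP headers (typically 40 bytes) fits in a single frame

• TCP segments (TCP报文段): TCP pairs each chunk of client data with a TCP header, forming TCP segments that are passed down to the network layer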

3.5.2 TCP Segment Structure

Sequence Numbers and Acknowledgment Numbers

• Sequence numbers:

• byte stream "number" of first byte in segment's data

• Acknowledgements:

• seq # of next byte expected from other side

• cumulative acknowledgement (累积确认)

• Q: how receiver handles out-of-order segments

• A: TCP spec doesn't say; it is up to the implementor

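Telnet: A Case Study for Sequence and Acknowledgment Numbers

• user types 'C': Host A sends Seq=42, ACK=79, data = 'C'

• Host B ACKs receipt of 'C' and echoes back 'C': Seq=79, ACK=43, data = 'C'

• Host A ACKs receipt of echoed 'C': Seq=43, ACK=80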

3.5.3 Round-Trip Time Estimation and Timeout

Q: how to set TCP timeout value?

• Longer than RTT, but RTT varies!

• Too short: premature timeout, unnecessary retransmissions

• Too long: slow reaction to segment loss

Estimating the Round-Trip Time

Q: how to estimate RTT?

• SampleRTT: measured time from segment transmission until ACK receipt

• Ignore retransmissions

• SampleRTT will vary, want estimated RTT "smoother"

• Average several recent measurements, not just current SampleRTT

EstimatedRTT = (1 – α) · EstimatedRTT + α · SampleRTT

• Exponential weighted moving average (EWMA, 指数加权移动平均)

• Influence of past sample decreases exponentially fast

• Typical value: α = 0.125

Setting and Managing the Retransmission Timeout Interval

• Timeout interval: EstimatedRTT plus "safety margin"

• Large variation in EstimatedRTT: want a larger safety margin

TimeoutInterval = EstimatedRTT + 4 · DevRTT

• DevRTT: EWMA of SampleRTT deviation from EstimatedRTT:

DevRTT = (1 – β) · DevRTT + β · | SampleRTT – EstimatedRTT |

(typically, β = 0.25)
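A sketch of the EstimatedRTT / DevRTT / TimeoutInterval updates with α = 0.125 and β = 0.25. The first-sample initialization (EstimatedRTT = SampleRTT, DevRTT = SampleRTT/2) and updating DevRTT with the previous EstimatedRTT follow RFC 6298; its clock-granularity term and 1-second minimum timeout are left out. Per "ignore retransmissions", callers should not feed samples from retransmitted segments.

```python
class RttEstimator:
    """EWMA-based RTT estimate and retransmission timeout (units: whatever samples use)."""

    def __init__(self, alpha=0.125, beta=0.25):
        self.alpha, self.beta = alpha, beta
        self.estimated_rtt = None
        self.dev_rtt = None

    def update(self, sample_rtt):
        if self.estimated_rtt is None:            # first measurement
            self.estimated_rtt = sample_rtt
            self.dev_rtt = sample_rtt / 2
        else:
            # DevRTT is updated first, using the previous EstimatedRTT
            self.dev_rtt = ((1 - self.beta) * self.dev_rtt
                            + self.beta * abs(sample_rtt - self.estimated_rtt))
            self.estimated_rtt = ((1 - self.alpha) * self.estimated_rtt
                                  + self.alpha * sample_rtt)
        return self.timeout_interval()

    def timeout_interval(self):
        return self.estimated_rtt + 4 * self.dev_rtt


est = RttEstimator()
print(est.update(100.0))   # 300.0  (ms): 100 + 4 * 50
print(est.update(120.0))   # 272.5  (ms): 102.5 + 4 * 42.5
```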

3.5.4 Reliable Data Transfer

Event: data received from application above

• Create segment with sequence number

• Sequence number is byte-stream number of first data byte in segment

• Start timer if not already running

• Think of timer as for oldest unACKed segment

• Expiration interval: TimeoutInterval

Event: Timer timeout

• Retransmit segment that caused timeout

• Restart timer

Event: ACK receipt

• If ACK acknowledges previously unACKed segments

• Update what is known to be ACKed

• Start timer if there are still unACKed segments

A Few Interesting Scenarios

Fast Retransmit

Fast retransmit (快速重传):

If sender receives 3 additional ACKs for same data ("triple duplicate ACKs"), resend unACKed segment with smallest sequence number

• likely that unACKed segment lost, so don't wait for timeout

Receipt of three duplicate ACKs indicates 3 segments received after a missing segment – lost segment is likely. So retransmit!
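A sketch of the sender-side rule: count duplicate ACKs and retransmit when the third duplicate arrives. The ACK values are byte-stream numbers made up for the example; further duplicates for the same data do not trigger another fast retransmit here.

```python
def fast_retransmits(ack_stream, dup_threshold=3):
    """Return the ACK numbers (next expected byte) that trigger a fast retransmit."""
    retransmitted = []
    last_ack, dup_count = None, 0
    for ack in ack_stream:
        if ack == last_ack:
            dup_count += 1
            if dup_count == dup_threshold:    # triple duplicate ACK
                retransmitted.append(ack)     # resend segment starting at byte `ack`
        else:
            last_ack, dup_count = ack, 0      # new ACK: reset the counter
    return retransmitted


print(fast_retransmits([100, 120, 120, 120, 120]))   # [120]
print(fast_retransmits([100, 120, 120, 120]))        # []
```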

3.5.5 Flow Control

Q: What happens if network layer delivers data faster than application layer removes data from socket buffers?

Flow control (流量控制): receiver controls sender, so sender won't overflow receiver's buffer by transmitting too much, too fast.

• TCP receiver "advertises" free buffer space in rwnd field in TCP header

• RcvBuffer size set via socket options (typical default is 4096 bytes)

• Many operating systems autoadjust RcvBuffer

• Sender limits amount of unACKed ("in-flight") data to received rwnd

• Guarantees receive buffer will not overflow
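A sketch of the bookkeeping behind rwnd, using the textbook's variables: rwnd = RcvBuffer - (LastByteRcvd - LastByteRead), and the sender keeps LastByteSent - LastByteAcked ≤ rwnd. The byte counts are made-up example values.

```python
def rwnd(rcv_buffer, last_byte_rcvd, last_byte_read):
    """Free receive-buffer space advertised by the receiver."""
    return rcv_buffer - (last_byte_rcvd - last_byte_read)


def sender_may_send(last_byte_sent, last_byte_acked, advertised_rwnd, nbytes):
    """True if sending nbytes more keeps unACKed ("in-flight") data within rwnd."""
    return (last_byte_sent + nbytes) - last_byte_acked <= advertised_rwnd


window = rwnd(rcv_buffer=4096, last_byte_rcvd=3000, last_byte_read=1000)
print(window)                                    # 2096
print(sender_may_send(3000, 3000, window, 2096))  # True
print(sender_may_send(3000, 3000, window, 2097))  # False
```

When rwnd reaches 0, the book notes that the sender keeps sending segments with one data byte, so that it learns when buffer space frees up. In Python, a larger receive buffer can be requested with `setsockopt(SOL_SOCKET, SO_RCVBUF, size)` on a socket.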

3.5.6 TCP Connection Management

3.5.6.1 TCP Connection Establishment

Before exchanging data, sender/receiver "handshake":

• agree to establish connection (each knowing the other willing to establish connection)

• agree on connection parameters (e.g., starting seq #s)

Agreeing to Establish a Connection

Two-way handshake

Q: Will 2-way handshake always work in network?

• Variable delays

• Retransmitted messages (e.g., req_conn(x)) due to message loss

• Message reordering

• Can't "see" other side

Two-way handshake scenarios:

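TCP three-way handshake (三次握手)

• client sends SYN segment: SYNbit=1, Seq=x

• server replies with SYNACK segment: SYNbit=1, Seq=y, ACKbit=1, ACKnum=x+1

• client sends ACK segment: ACKbit=1, ACKnum=y+1; this segment may already carry client-to-server data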

3.5.6.2 TCP Connection Teardown

• Client, server each close their side of connection

• send TCP segment with FIN bit = 1

• Respond to received FIN with ACK

• on receiving FIN, ACK can be combined with own FIN

• Simultaneous FIN exchanges can be handled

3.6 Principles of Congestion Control

Causes and costs of congestion, approaches to congestion control

Congestion:

• informally: "too many sources attempting to send data at too high a rate"

• manifestations:

• long delays (queueing in router buffers)

• packet loss (buffer overflow at routers)

• different from flow control!

• a top-10 problem!

Congestion control: too many senders, sending too fast

Flow control: one sender too fast for one receiver

3.6.1 The Causes and the Costs of Congestion

Scenario 1: Two Senders, a Router with Infinite Buffers

Simplest scenario:

• one router, infinite buffers

• input, output link capacity: R

• two flows

• no retransmissions needed

Q: What happens as arrival rate λ_in approaches R/2?

One cost of a congested network—large queuing delays are experienced as the packet-arrival rate nears the link capacity.

Scenario 2: Two Senders and a Router with Finite Buffers

• one router, finite buffers

• sender retransmits lost, timed-out packet

• application-layer input = application-layer output: λ_in = λ_out

• transport-layer input includes retransmissions: λ'_in ≥ λ_in

First, the unrealistic case

• Host A sends a packet only when a buffer is free.

The slightly more realistic case

• the sender retransmits only when a packet is known for certain to be lost.

Another cost of a congested network—the sender must perform retransmissions in order to compensate for dropped (lost) packets due to buffer overflow.

Finally, the case

• the sender may time out prematurely and retransmit a packet that has been delayed in the queue but not yet lost.

Yet another cost of a congested network—unneeded retransmissions by the sender in the face of large delays may cause a router to use its link bandwidth to forward unneeded copies of a packet.

Scenario 3: Four Senders, Routers with Finite Buffers, and Multihop Paths

• four senders

• multi-hop paths

• timeout/retransmit

If '_in is extremely large for all connections, the A–C end-to-end throughput goes to zero in the limit of heavy traffic.

Yet another cost of dropping a packet due to congestion—when a packet is dropped along a path, the transmission capacity that was used at each of the upstream links to forward that packet to the point at which it is dropped ends up having been wasted.

3.6.2 Approaches to Congestion Control

• End-end congestion control (端到端拥塞控制):

• no explicit feedback from network

• congestion inferred from observed loss, delay

• approach taken by TCP

• Network-assisted congestion control (网络辅助的拥塞控制):

• routers provide direct feedback to sending/receiving hosts with flows passing through congested router

• may indicate congestion level or explicitly set sending rate

• TCP ECN, ATM, DEC DECnet protocols

3.7 TCP Congestion Control

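AIMD (Additive Increase, Multiplicative Decrease)

approach: senders can increase sending rate until packet loss (congestion) occurs, then decrease sending rate on loss event

Additive increase detail: increase sending rate by 1 maximum segment size every RTT until loss detected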

Multiplicative decrease detail: sending rate is

Cut in half on loss detected by triple duplicate ACK (TCP Reno)

Cut to 1 MSS (maximum segment size) when loss detected by timeout (TCP Tahoe)

Why AIMD?

AIMD – a distributed, asynchronous algorithm – has been shown to:

optimize congested flow rates network wide!

have desirable stability properties

TCP Congestion Control: Details

TCP sending behavior:

roughly: send cwnd bytes, wait RTT for ACKs, then send more bytes; TCP rate ≈ cwnd/RTT bytes/sec

TCP sender limits transmission: LastByteSent - LastByteAcked ≤ cwnd

cwnd is dynamically adjusted in response to observed network congestion (implementing TCP congestion control)

3.7.1 Classic TCP Congestion Control

Classic TCP; Explicit Congestion Notification, delay-based TCP, fairness

Slow Start

TCP slow start

when connection begins, increase rate exponentially until first loss event:

initially cwnd = 1 MSS

double cwnd every RTT

done by incrementing cwnd for every ACK received

summary: initial rate is slow, but ramps up exponentially fast

Congestion Avoidance

TCP: from slow start to congestion avoidance

Q: when should the exponential increase switch to linear?

A: when cwnd gets to 1/2 of its value before timeout.

Implementation:

variable ssthresh

on loss event, ssthresh is set to 1/2 of cwnd just before loss event
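A sketch of cwnd (in MSS) per transmission round under slow start, congestion avoidance and the two loss reactions. Assumptions: doubling stops at ssthresh; on a triple duplicate ACK the Reno sender sets ssthresh to half of cwnd and cwnd to ssthresh + 3 MSS, collapsing fast recovery into that single step; on a timeout (or in TCP Tahoe) cwnd drops to 1 MSS; the 2-MSS floor on ssthresh comes from RFC 5681.

```python
def cwnd_trace(rounds, ssthresh, losses):
    """cwnd at the start of each round; losses maps round index -> "3dup" or "timeout"."""
    cwnd, trace = 1, []
    for r in range(rounds):
        trace.append(cwnd)
        event = losses.get(r)
        if event in ("3dup", "timeout"):
            ssthresh = max(cwnd // 2, 2)      # 1/2 of cwnd just before the loss
            cwnd = 1 if event == "timeout" else ssthresh + 3
        elif cwnd < ssthresh:
            cwnd = min(2 * cwnd, ssthresh)    # slow start: double every RTT
        else:
            cwnd += 1                         # congestion avoidance: +1 MSS per RTT
    return trace


print(cwnd_trace(12, ssthresh=8, losses={7: "3dup"}))
# [1, 2, 4, 8, 9, 10, 11, 12, 9, 10, 11, 12]
print(cwnd_trace(12, ssthresh=8, losses={7: "timeout"}))
# [1, 2, 4, 8, 9, 10, 11, 12, 1, 2, 4, 6]
```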

Fast Recovery

TCP Congestion Control: Retrospective

Summary: TCP congestion control

TCP Cubic

Is there a better way than AIMD to "probe" for usable bandwidth?

Insight/intuition:

Wmax: sending rate at which congestion loss was detected

congestion state of bottleneck link probably (?) hasn't changed much

after cutting rate/window in half on loss, initially ramp to Wmax faster, but then approach Wmax more slowly

K: point in time when TCP window size will reach Wmax

K itself is tunable

increase W as a function of the cube of the distance between current time and K

larger increases when further away from K

smaller increases (cautious) when nearer K

TCP CUBIC default in Linux, most popular TCP for popular Web servers
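A sketch of the cubic growth curve using the window function from RFC 8312, W(t) = C·(t - K)^3 + Wmax with K = cbrt(Wmax·(1 - β)/C), time t in seconds since the last loss. Note the RFC's default β = 0.7 shrinks the window to 70% of Wmax on loss rather than to half; C = 0.4 is also the RFC default.

```python
def cubic_window(t, w_max, c=0.4, beta=0.7):
    """CUBIC congestion window t seconds after a loss (units of W_max, e.g. MSS)."""
    k = (w_max * (1 - beta) / c) ** (1 / 3)   # time at which W climbs back to W_max
    return c * (t - k) ** 3 + w_max


w_max = 100.0
k = (w_max * (1 - 0.7) / 0.4) ** (1 / 3)
print(round(cubic_window(0.0, w_max), 6))   # window right after the loss
print(round(cubic_window(k, w_max), 6))     # back at W_max at time K
# growth is fast far from K and flattens out near K:
print(cubic_window(1.0, w_max) - cubic_window(0.0, w_max))
print(cubic_window(k, w_max) - cubic_window(k - 1, w_max))
```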

TCP and the congested "bottleneck link"

TCP (classic, CUBIC) increases its sending rate until packet loss occurs at some router's output: the bottleneck link

understanding congestion: useful to focus on congested bottleneck link

Keeping sender-to-receiver pipe "just full enough, but no fuller": keep bottleneck link busy transmitting, but avoid high delays/buffering

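Macroscopic Description of TCP Reno Throughput

average throughput of a connection ≈ (0.75 · W) / RTT, where W is the window size (in bytes) when a loss event occurs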

3.7.2 Network-Assisted Explicit Congestion Notification and Delay-based Congestion Control

Explicit Congestion Notification

Explicit congestion notification (ECN)

TCP deployments often implement network-assisted congestion control:

two bits in IP header (ToS field) marked by network router to indicate congestion

policy to determine marking chosen by network operator

congestion indication carried to destination

destination sets ECE bit on ACK segment to notify sender of congestion

involves both IP (IP header ECN bit marking) and TCP (TCP header C,E bit marking)

Delay-based Congestion Control

Delay-based approach:

RTTmin - minimum observed RTT (uncongested path)

uncongested throughput with congestion window cwnd is cwnd/RTTmin

if measured throughput "very close" to uncongested throughput

increase cwnd linearly /* since path not congested */

else if measured throughput "far below" uncongested throughput

decrease cwnd linearly /* since path is congested */

congestion control without inducing/forcing loss

maximizing throughput ("keeping the pipe just full… ") while keeping delay low ("…but not fuller")

a number of deployed TCPs take a delay-based approach

BBR deployed on Google's (internal) backbone network

3.7.3 Fairness

Fairness goal: if K TCP sessions share same bottleneck link of bandwidth R, each should have average rate of R/K

Q: is TCP Fair?

Example: two competing TCP sessions:

additive increase gives slope of 1, as throughput increases

multiplicative decrease decreases throughput proportionally

Is TCP fair?

A: Yes, under idealized assumptions:

same RTT

fixed number of sessions only in congestion avoidance
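A sketch of why AIMD converges to a fair share under these idealized assumptions: two flows with the same RTT, both in congestion avoidance, and both seeing every loss at the same time (synchronized losses are an assumption). Additive increase keeps the gap between the two rates constant, while each multiplicative decrease halves it.

```python
def aimd_two_flows(x1, x2, capacity, rtts):
    """Throughputs of two AIMD flows sharing one bottleneck link of rate `capacity`."""
    for _ in range(rtts):
        if x1 + x2 > capacity:            # loss: both flows see it (synchronized)
            x1, x2 = x1 / 2, x2 / 2       # multiplicative decrease halves the gap
        else:
            x1, x2 = x1 + 1, x2 + 1       # additive increase: gap unchanged
    return x1, x2


x1, x2 = aimd_two_flows(10.0, 70.0, capacity=100.0, rtts=1000)
print(abs(x1 - x2))   # tiny: both flows end up with (nearly) equal rates
```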

Fairness: must all network apps be "fair"?

Fairness and UDP

multimedia apps often do not use TCP

do not want rate throttled by congestion control

instead use UDP:

send audio/video at constant rate, tolerate packet loss

there is no "Internet police" policing use of congestion control

Fairness and Parallel TCP Connections

application can open multiple parallel connections between two hosts

web browsers do this, e.g., link of rate R with 9 existing connections:

new app asks for 1 TCP, gets rate R/10

new app asks for 11 TCPs, gets about R/2 (11/20 of R)

3.8 Evolution of Transport Layer Functionality

TCP Evolution. HTTP/3, QUIC: functionality in the application layer.

TCP, UDP: principal transport protocols for 40 years

different "flavors" of TCP developed, for specific scenarios:

moving transport–layer functions to application layer, on top of UDP

HTTP/3: QUIC

QUIC: Quick UDP Internet Connections

application-layer protocol, on top of UDP

increase performance of HTTP

deployed on many Google servers, apps (Chrome, mobile YouTube app)

adopts approaches we've studied in this chapter for connection establishment, error control, congestion control

error and congestion control: "Readers familiar with TCP's loss detection and congestion control will find algorithms here that parallel well-known TCP ones." [from QUIC specification]

connection establishment: reliability, congestion control, authentication, encryption, state established in one RTT

multiple application-level "streams" multiplexed over single QUIC connection

separate reliable data transfer, security

common congestion control

TCP (reliability, congestion control state) + TLS (authentication, crypto state)
