a Top-Down Approach (8th ed.):
Chapter 3 Transport Layer
3.1 Introduction and Transport-layer Services
Transport-layer services and protocols. Transport layer actions.
• provide logical communication (逻辑通信) between application processes running on different hosts
• transport protocol actions in end systems:
• sender: breaks application messages into segments (报文段), passes to network layer
• receiver: reassembles segments into messages, passes to application layer
• two transport protocols available to Internet applications
• TCP, UDP
3.1.1 Relationship Between Transport and Network Layers
• network layer: logical communication between hosts
• transport layer: logical communication between processes
• relies on, enhances, network layer services
3.1.2 Overview of the Transport Layer in the Internet
Transport Layer Actions
Sender:
• is passed an application-layer message
• determines segment header field values
• creates segment
• passes segment to IP
Receiver:
• receives segment from IP
• checks header values
• extracts application-layer message
• demultiplexes (多路分解) message up to application via socket
Two principal Internet transport protocols
TCP: Transmission Control Protocol
• reliable (可靠), in-order delivery
• congestion control (拥塞控制)
• flow control
• connection setup
UDP: User Datagram Protocol
• unreliable (不可靠), unordered delivery
• no-frills (不提供不必要服务的) extension of "best-effort" (尽力而为) IP
Services not available:
• delay guarantees
• bandwidth guarantees
3.2 Multiplexing and Demultiplexing
What is multiplexing, demultiplexing? How is it done? How does it work in TCP and UDP?
Multiplexing (多路复用) at sender:
handle data from multiple sockets (套接字), add transport header (later used for demultiplexing)
Demultiplexing (多路分解) at receiver:
use header info to deliver received segments to correct socket
How demultiplexing works
• host receives IP datagrams
• each datagram has source IP address, destination IP address
• each datagram carries one transport-layer segment
• each segment has source, destination port number
• host uses IP addresses & port numbers to direct segment to appropriate socket
3.2.1.1 Connectionless Multiplexing and Demultiplexing
Recall:
• when creating socket, must specify host-local port #:
• DatagramSocket mySocket1 = new DatagramSocket(12534);
• when creating datagram to send into UDP socket, must specify
• destination IP address
• destination port #
• when receiving host receives UDP segment:
• checks destination port # in segment
• directs UDP segment to socket with that port #
↓
IP/UDP datagrams with same dest. port #, but different source IP addresses and/or source port numbers will be directed to same socket at receiving host
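The rule above can be sketched as a lookup keyed on destination port alone. This is a toy model, not real socket code: `UdpSocket`, `udp_demux`, and the addresses are illustrative assumptions; only port 12534 comes from the example above.

```python
from dataclasses import dataclass, field


@dataclass
class UdpSocket:
    """Stand-in for a UDP socket bound to one host-local port."""
    port: int
    inbox: list = field(default_factory=list)


def udp_demux(sockets, src_ip, src_port, dst_port, payload):
    """Deliver a segment using the destination port only.

    Returns the socket that received the payload, or None if no socket
    is bound to dst_port.
    """
    sock = sockets.get(dst_port)
    if sock is not None:
        # The source address is handed up with the data so the app can reply.
        sock.inbox.append(((src_ip, src_port), payload))
    return sock


sockets = {12534: UdpSocket(12534)}
sock_a = udp_demux(sockets, "10.0.0.1", 9157, 12534, b"from A")
sock_b = udp_demux(sockets, "10.0.0.2", 5775, 12534, b"from B")
print(sock_a is sock_b)  # both senders reach the same socket
```

Different source addresses and ports do not affect which socket is chosen; they only tell the application where a reply should go.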
3.2.1.2 Connection-Oriented Multiplexing and Demultiplexing
• TCP socket identified by 4-tuple:
• source IP address
• source port number
• dest IP address
• dest port number
• demux: receiver uses all four values (4-tuple) to direct segment to appropriate socket
• server may support many simultaneous TCP sockets:
• each socket identified by its own 4-tuple
• each socket associated with a different connecting client
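By contrast with UDP, a TCP lookup can be sketched with the full 4-tuple as the key. The table contents and the name `tcp_demux` are hypothetical, chosen only to show that segments with the same destination port can land on different sockets.

```python
def tcp_demux(connections, src_ip, src_port, dst_ip, dst_port):
    """Return the socket for this exact 4-tuple, or None if there is none."""
    return connections.get((src_ip, src_port, dst_ip, dst_port))


# Three clients connected to the same server address and port 80.
connections = {
    ("10.0.0.1", 9157, "10.0.0.9", 80): "socket-A",
    ("10.0.0.2", 9157, "10.0.0.9", 80): "socket-B",
    ("10.0.0.2", 5775, "10.0.0.9", 80): "socket-C",
}

for key in connections:
    print(key, "->", tcp_demux(connections, *key))
```

Two segments that differ in only one of the four values (here the source IP or the source port) are still directed to different sockets.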
3.2.1.3 Summary
• Multiplexing, demultiplexing: based on segment, datagram header field values
• UDP: demultiplexing using destination port number (only)
• TCP: demultiplexing using 4-tuple: source and destination IP addresses, and port numbers
• Multiplexing/demultiplexing happen at all layers
3.3 Connectionless Transport: UDP
UDP segment structure. The Internet checksum.
• "no frills," "bare bones" Internet transport protocol
• "best effort" service, UDP segments may be:
• lost
• delivered out-of-order to app
• connectionless:
• no handshaking between UDP sender, receiver
• each UDP segment handled independently of others
Why is there a UDP?
• no connection establishment (which can add RTT delay)
• simple: no connection state at sender, receiver
• Small packet header overhead
• Finer application-level control over what data is sent, and when. No congestion control.
• UDP can blast away as fast as desired!
• can function in the face of congestion
• UDP use:
• streaming multimedia apps (loss tolerant, rate sensitive)
• DNS
• SNMP (Simple Network Management Protocol, 简单网络管理协议)
• HTTP/3
• if reliable transfer needed over UDP (e.g., HTTP/3):
• add needed reliability at application layer
• add congestion control at application layer
RFC 768
Transport Layer Actions
UDP sender actions:
• is passed an application-layer message
• determines UDP segment header field values
• creates UDP segment
• passes segment to IP
UDP receiver actions:
• checks UDP checksum header value
• extracts application-layer message
• demultiplexes message up to application via socket
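The sender and receiver actions above can be sketched with the four 16-bit UDP header fields defined in RFC 768 (source port, destination port, length, checksum). The function names are illustrative, and the checksum is left at 0 here for brevity, which is an assumption of this sketch rather than what a real stack would send.

```python
import struct

# Network byte order, four unsigned 16-bit fields:
# source port, destination port, length (header + data), checksum.
UDP_HEADER = struct.Struct("!HHHH")


def build_udp_segment(src_port, dst_port, payload, checksum=0):
    """Sender side: fill in header fields and prepend them to the message."""
    length = UDP_HEADER.size + len(payload)
    return UDP_HEADER.pack(src_port, dst_port, length, checksum) + payload


def parse_udp_segment(segment):
    """Receiver side: read header fields and extract the application message."""
    src_port, dst_port, length, checksum = UDP_HEADER.unpack_from(segment)
    payload = segment[UDP_HEADER.size:length]
    return src_port, dst_port, length, checksum, payload


segment = build_udp_segment(9157, 53, b"hello")
print(parse_udp_segment(segment))
```

The header is always 8 bytes, which is the small per-segment overhead mentioned earlier.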
3.3.1 UDP Segment Structure
3.3.2 UDP Checksum
Goal: detect errors (i.e., flipped bits) in transmitted segment
Sender:
• Treat contents of UDP segment (including UDP header fields and IP addresses) as sequence of 16-bit integers
• Checksum (校验和): one's complement of the one's complement sum of the segment's 16-bit words
• Checksum value put into UDP checksum field
Receiver:
• Compute checksum of received segment
• Check if computed checksum equals checksum field value:
• Not equal - error detected
• Equal - no error detected. But maybe errors nonetheless? More later ….
Internet checksum
Example: add two 16-bit integers
1 1 1 0 0 1 1 0 0 1 1 0 0 1 1 0
1 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
1 1 0 1 1 1 0 1 1 1 0 1 1 1 0 1 1 wraparound
1 0 1 1 1 0 1 1 1 0 1 1 1 1 0 0 sum
0 1 0 0 0 1 0 0 0 1 0 0 0 0 1 1 checksum
Note: when adding numbers, a carryout from the most significant bit needs to be added to the result
Weak protection!
1 1 1 0 0 1 1 0 0 1 1 0 0 1 0 1
1 1 0 1 0 1 0 1 0 1 0 1 0 1 1 0
1 1 0 1 1 1 0 1 1 1 0 1 1 1 0 1 1 wraparound
1 0 1 1 1 0 1 1 1 0 1 1 1 1 0 0 sum
0 1 0 0 0 1 0 0 0 1 0 0 0 0 1 1 checksum
Even though numbers have changed (bit flips), no change in checksum!
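The two worked examples above can be reproduced with a short sketch. The function names are illustrative; the logic is the 16-bit one's complement sum with end-around carry, followed by a bitwise complement.

```python
def ones_complement_sum16(words):
    """Add 16-bit words, wrapping any carry out of bit 15 back into bit 0."""
    total = 0
    for word in words:
        total += word
        total = (total & 0xFFFF) + (total >> 16)  # end-around carry
    return total


def internet_checksum(words):
    """Checksum = bitwise complement of the one's complement sum."""
    return ~ones_complement_sum16(words) & 0xFFFF


def receiver_accepts(words, checksum_field):
    """Receiver recomputes the checksum and compares it with the field."""
    return internet_checksum(words) == checksum_field


original = [0b1110011001100110, 0b1101010101010101]
flipped = [0b1110011001100101, 0b1101010101010110]  # last two bits changed

print(f"{ones_complement_sum16(original):016b}")  # sum
print(f"{internet_checksum(original):016b}")      # checksum
print(receiver_accepts(flipped, internet_checksum(original)))
```

The last line shows the weakness: the corrupted pair still passes, because the two bit flips cancel in the sum.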
3.4 Principles of Reliable Data Transfer
Protocol mechanisms for reliable data transfer (rdt). Building an rdt protocol. Pipelining. Go-back-N. Selective Repeat.
Interfaces
3.4.1 Building a Reliable Data Transfer Protocol
We will:
• incrementally develop sender, receiver sides of reliable data transfer protocol (可靠数据传输协议, rdt)
• consider only unidirectional data transfer (单向数据传输)
• but control info will flow in both directions!
• use finite-state machines (FSM, 有限状态机) to specify sender, receiver
3.4.1.1 Reliable Data Transfer over a Perfectly Reliable Channel: rdt1.0
• underlying channel perfectly reliable
• no bit errors
• no loss of packets
• separate FSMs for sender, receiver:
• sender sends data into underlying channel
• receiver reads data from underlying channel
3.4.1.2 Reliable Data Transfer over a Channel with Bit Errors: rdt2.0
• underlying channel may flip bits in packet
• checksum to detect bit errors
• the question: how to recover from errors?
• Positive acknowledgements (ACKs, 肯定确认): receiver explicitly tells sender that packet received OK
• Negative acknowledgements (NAKs): receiver explicitly tells sender that packet had errors
• sender retransmits packet on receipt of NAK
stop-and-wait (停等)
sender sends one packet, then waits for receiver response
rdt2.0: the FSM Representation
Note: "state" of receiver (did the receiver get my message correctly?) isn't known to sender unless somehow communicated from receiver to sender
that's why we need a protocol!
rdt2.0 has a fatal flaw!
What happens if ACK/NAK corrupted?
• sender doesn't know what happened at receiver!
• can't just retransmit: possible duplicate
Handling duplicates:
• sender retransmits current packet if ACK/NAK corrupted
• sender adds sequence number (序号) to each packet
• receiver discards (doesn't deliver up) duplicate packet (冗余分组)
stop-and-wait
sender sends one packet, then waits for receiver response
Protocol that uses both ACKs and NAKs from the receiver to the sender: rdt2.1
The FSM Description
Discussion
Sender:
• sequence number added to packet
• two sequence numbers (0,1) will suffice. Why?
• must check if received ACK/NAK corrupted
• twice as many states
• state must "remember" whether "expected" packet should have sequence number of 0 or 1
Receiver:
• must check if received packet is duplicate
• state indicates whether 0 or 1 is expected packet sequence number
• note: receiver cannot know if its last ACK/NAK was received OK at sender
NAK-free Reliable Data Transfer Protocol for a Channel with Bit Errors: rdt2.2
• same functionality as rdt2.1, using ACKs only
• instead of NAK, receiver sends ACK for last packet received OK
• receiver must explicitly include sequence number of packet being ACKed
• duplicate ACK (冗余ACK) at sender results in same action as NAK: retransmit current packet
As we will see, TCP uses this approach to be NAK-free
3.4.1.3 Reliable Data Transfer over a Lossy Channel with Bit Errors: rdt3.0
New channel assumption: underlying channel can also lose packets (data, ACKs)
• checksum, sequence numbers, ACKs, retransmissions will be of help … but not quite enough
Approach: sender waits "reasonable" amount of time for ACK
• retransmits if no ACK received in this time
• if packet (or ACK) just delayed (not lost):
• retransmission will be duplicate, but sequence numbers already handle this!
• receiver must specify sequence number of packet being ACKed
• use countdown timer (倒计数定时器) to interrupt after "reasonable" amount of time
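A minimal, deterministic sketch of this stop-and-wait behavior follows. It is not the FSM itself: the channel is modeled by two sets (`drop_data`, `drop_ack`) naming which transmissions are lost, a lost packet or ACK is treated as a timer expiry, and all names are assumptions made for illustration.

```python
def run_rdt3(messages, drop_data=frozenset(), drop_ack=frozenset()):
    """Simulate stop-and-wait with a 1-bit sequence number over a lossy channel.

    drop_data / drop_ack hold the 0-based indices of data packets / ACKs
    that the channel loses. Returns (delivered messages, data transmissions).
    """
    delivered = []
    expected_seq = 0   # receiver state: sequence number it is waiting for
    data_tx = 0        # count of data packets put on the channel
    ack_tx = 0         # count of ACKs put on the channel
    for i, msg in enumerate(messages):
        seq = i % 2    # sequence numbers alternate 0, 1, 0, 1, ...
        acked = False
        while not acked:
            data_lost = data_tx in drop_data
            data_tx += 1
            if data_lost:
                continue  # no ACK will come: timer expires, retransmit
            if seq == expected_seq:
                delivered.append(msg)   # new packet: deliver up
                expected_seq ^= 1
            # A duplicate is not delivered again, but it is still ACKed.
            ack_lost = ack_tx in drop_ack
            ack_tx += 1
            acked = not ack_lost        # lost ACK: timer expires, retransmit
    return delivered, data_tx


print(run_rdt3(["a", "b", "c"], drop_data={1}, drop_ack={0}))
```

Even with one lost ACK and one lost retransmission, each message is delivered exactly once and in order, at the cost of extra transmissions.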
3.4.2 Pipelined Reliable Data Transfer Protocols
3.4.2.1 Performance of rdt3.0 (stop-and-wait)
• Usender: utilization (利用率) – fraction of time sender busy sending
• example: 1 Gbps link, 15 ms prop. delay, 8000 bit packet
• time to transmit packet into channel: D_trans = L/R = 8000 bits / 10^9 bits/sec = 8 microseconds
rdt3.0: Stop-and-wait Operation
• rdt 3.0 protocol performance stinks!
• Protocol limits performance of underlying infrastructure (channel)
3.4.2.2 Solution: Pipelining
rdt3.0: Pipelined Operation
Pipelining (流水线): sender allows multiple, "in-flight", yet-to-be-acknowledged packets
• range of sequence numbers must be increased
• buffering at sender and/or receiver
Pipelining: increased utilization
3-packet pipelining tripled the utilization.
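The utilization figures for this example can be checked numerically. The sketch assumes RTT is twice the 15 ms propagation delay (30 ms) and uses the usual definition U = (n · L/R) / (RTT + L/R) for n packets in flight; the function name is illustrative.

```python
def sender_utilization(packet_bits, link_bps, rtt_s, packets_in_flight=1):
    """Fraction of time the sender is busy transmitting."""
    d_trans = packet_bits / link_bps  # time to push one packet onto the link
    return min(1.0, packets_in_flight * d_trans / (rtt_s + d_trans))


L_BITS = 8000        # packet size
R_BPS = 1e9          # 1 Gbps link
RTT_S = 2 * 0.015    # assumed: RTT = 2 x 15 ms propagation delay

u_stop_and_wait = sender_utilization(L_BITS, R_BPS, RTT_S)
u_pipelined = sender_utilization(L_BITS, R_BPS, RTT_S, packets_in_flight=3)
print(f"stop-and-wait: {u_stop_and_wait:.5f}")
print(f"3 in flight:   {u_pipelined:.5f}")
```

With these numbers the sender is busy for roughly 0.027% of the time under stop-and-wait, and three times that with three packets in flight.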
3.4.3 Go-Back-N (GBN, 回退N步)
GBN Sender
• Sender: "window" of up to N, consecutive transmitted but unACKed packets
• k-bit sequence number in packet header
• Cumulative acknowledgement (累积确认): ACK(n): ACKs all packets up to, including sequence number n
• on receiving ACK(n): move window forward to begin at n+1
• Timer for oldest in-flight packet
• timeout(n): retransmit packet n and all higher sequence number packets in window
GBN Receiver
• ACK-only: always send ACK for the correctly received packet with the highest in-order sequence number so far
• may generate duplicate ACKs
• need only remember rcv_base
• on receipt of out-of-order packet:
• can discard (don't buffer) or buffer: an implementation decision
• re-ACK packet with highest in-order sequence number
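The sender and receiver rules above can be sketched as two small classes. For simplicity the sketch uses unbounded integer sequence numbers instead of a k-bit field, has the receiver discard out-of-order packets, and leaves the timer to the caller; the class and method names are assumptions.

```python
class GbnSender:
    def __init__(self, window_size):
        self.window_size = window_size
        self.base = 0        # oldest unACKed sequence number
        self.next_seq = 0    # next sequence number to use
        self.sent = []       # log of every transmission, in order

    def send(self):
        """Send a new packet if the window has room; return True if sent."""
        if self.next_seq < self.base + self.window_size:
            self.sent.append(self.next_seq)
            self.next_seq += 1
            return True
        return False

    def on_ack(self, n):
        """Cumulative ACK(n): slide the window to start at n + 1."""
        if n >= self.base:
            self.base = n + 1

    def on_timeout(self):
        """Retransmit every sent-but-unACKed packet in the window."""
        for seq in range(self.base, self.next_seq):
            self.sent.append(seq)


class GbnReceiver:
    def __init__(self):
        self.rcv_base = 0    # next in-order sequence number expected
        self.delivered = []

    def on_packet(self, seq):
        """Deliver in-order packets; return the cumulative ACK number."""
        if seq == self.rcv_base:
            self.delivered.append(seq)
            self.rcv_base += 1
        return self.rcv_base - 1  # highest in-order packet (-1 if none yet)


sender, receiver = GbnSender(window_size=4), GbnReceiver()
for _ in range(4):
    sender.send()                      # packets 0..3; assume packet 2 is lost
for seq in (0, 1, 3):
    sender.on_ack(receiver.on_packet(seq))
sender.on_timeout()                    # timer for packet 2 expires
print(sender.sent)
```

After the timeout the sender resends packets 2 and 3, even though 3 had arrived, because the receiver discarded it as out of order.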
Go-Back-N in Action
3.4.4 Selective Repeat (SR, 选择重传)
• Receiver individually acknowledges all correctly received packets
• buffers packets, as needed, for eventual in-order delivery to upper layer
• Sender times-out/retransmits individually for unACKed packets
• sender maintains timer for each unACKed packet
• Sender window
• N consecutive sequence numbers
• limits sequence numbers of sent, unACKed packets
Sender, Receiver Windows
Sender, Receiver Events and Actions
Sender:
• Data received from above:
• if next available sequence number in window, send packet
• Timeout(n):
• resend packet n, restart timer
• ACK(n) received in [send_base, send_base+N-1]:
• mark packet n as received
• if n is smallest unACKed packet, advance window base to next unACKed sequence number
Receiver:
• Packet n with sequence number in [rcv_base, rcv_base+N-1] is correctly received:
• send ACK(n)
• out-of-order: buffer
• in-order: deliver (also deliver buffered, in-order packets), advance window to next not-yet-received packet
• Packet n with sequence number in [rcv_base-N, rcv_base-1] is correctly received:
• send ACK(n)
• Otherwise:
• ignore
Selective-repeat in Action
A dilemma!
Example:
• sequence numbers 0, 1, 2, 3
• a window size of three.
• receiver can't see sender side
• receiver behavior identical in both cases!
• something's (very) wrong!
Q: What relationship is needed between sequence # size and window size to avoid problem in scenario (b)?
The window size must be less than or equal to half the size of the sequence number space for SR protocols.
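The dilemma can be made concrete by listing which sequence numbers are ambiguous. The sketch assumes the worst case from the example: the receiver has accepted a full window and advanced, while every ACK was lost, so the sender may still retransmit the old window. Function names are illustrative.

```python
def sr_window_ok(window_size, seq_space):
    """Rule stated above: window size <= half the sequence number space."""
    return 2 * window_size <= seq_space


def ambiguous_seq_numbers(window_size, seq_space):
    """Sequence numbers that could mean 'old retransmission' or 'new packet'.

    old: the sender's first window (it may resend these if all ACKs were lost)
    new: the receiver's window after it accepted that whole first window
    """
    old = {i % seq_space for i in range(window_size)}
    new = {i % seq_space for i in range(window_size, 2 * window_size)}
    return sorted(old & new)


print(sr_window_ok(3, 4), ambiguous_seq_numbers(3, 4))
print(sr_window_ok(2, 4), ambiguous_seq_numbers(2, 4))
```

With sequence numbers 0 to 3 and a window of three, numbers 0 and 1 appear in both windows, so the receiver cannot tell a retransmission from new data; with a window of two the windows are disjoint.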
3.5 Connection-oriented Transport: TCP
The TCP connection and segment, RTT estimation and timeout, flow control
RFCs: 793, 1122, 2018, 5681, 7323
• Reliable, in-order byte stream:
• no "message boundaries"
• Cumulative acknowledgements
• Pipelining:
• TCP congestion and flow control set window size
• Flow controlled:
• sender will not overwhelm receiver
3.5.1 The TCP Connection
• Connection-oriented (面向连接的):
• handshaking (exchange of control messages) initializes sender, receiver state before data exchange
• Full-duplex service (全双工服务):
• bi-directional data flow in same connection
• Point-to-point (点对点):
• one sender, one receiver
• Three-way handshake (三次握手).
• Send buffer (发送缓存):
• Maximum segment size (MSS, 最大报文段长度)
• Maximum transmission unit (MTU, 最大传输单元)
• TCP segments (TCP报文段).
3.5.2 TCP Segment Structure
Sequence Numbers and Acknowledgment Numbers
• Sequence numbers:
• byte stream "number" of first byte in segment's data
• Acknowledgements:
• seq # of next byte expected from other side
• cumulative acknowledgement (累积确认)
• Q: how receiver handles out-of-order segments
• A: the TCP spec doesn't say; it is up to the implementor
Telnet: A Case Study for Sequence and Acknowledgment Numbers
3.5.3 Round-Trip Time Estimation and Timeout
Q: how to set TCP timeout value?
• Longer than RTT, but RTT varies!
• Too short: premature timeout, unnecessary retransmissions
• Too long: slow reaction to segment loss
Estimating the Round-Trip Time
Q: how to estimate RTT?
• SampleRTT: measured time from segment transmission until ACK receipt
• Ignore retransmissions
• SampleRTT will vary, want estimated RTT "smoother"
• Average several recent measurements, not just current SampleRTT
EstimatedRTT = (1 – α) · EstimatedRTT + α · SampleRTT
• Exponential weighted moving average (EWMA, 指数加权移动平均)
• Influence of past sample decreases exponentially fast
• Typical value: α = 0.125
Setting and Managing the Retransmission Timeout Interval
• Timeout interval: EstimatedRTT plus "safety margin"
• Large variation in EstimatedRTT: want a larger safety margin
TimeoutInterval = EstimatedRTT + 4 · DevRTT
• DevRTT: EWMA of SampleRTT deviation from EstimatedRTT:
DevRTT = (1 – β) · DevRTT + β · | SampleRTT – EstimatedRTT |
(typically, β = 0.25)
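The three formulas above fit in a small estimator. Two details are assumptions of this sketch and not stated above: the first sample initializes EstimatedRTT directly with DevRTT set to half of it, and DevRTT is updated before EstimatedRTT so that the deviation is measured against the previous estimate (both follow the convention in RFC 6298).

```python
class RttEstimator:
    def __init__(self, first_sample, alpha=0.125, beta=0.25):
        self.alpha = alpha
        self.beta = beta
        self.estimated_rtt = first_sample
        self.dev_rtt = first_sample / 2   # assumed initialization

    def update(self, sample_rtt):
        """Fold one SampleRTT (not from a retransmitted segment) into the EWMAs."""
        deviation = abs(sample_rtt - self.estimated_rtt)
        self.dev_rtt = (1 - self.beta) * self.dev_rtt + self.beta * deviation
        self.estimated_rtt = (
            (1 - self.alpha) * self.estimated_rtt + self.alpha * sample_rtt
        )

    @property
    def timeout_interval(self):
        """EstimatedRTT plus a safety margin of four deviations."""
        return self.estimated_rtt + 4 * self.dev_rtt


est = RttEstimator(first_sample=100.0)   # milliseconds
est.update(120.0)
print(est.estimated_rtt, est.dev_rtt, est.timeout_interval)
```

A single 120 ms sample moves the estimate only from 100 ms to 102.5 ms, which is the smoothing effect of α = 0.125.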
3.5.4 Reliable Data Transfer
Event: data received from application above
• Create segment with sequence number
• Sequence number is byte-stream number of first data byte in segment
• Start timer if not already running
• Think of timer as for oldest unACKed segment
• Expiration interval: TimeoutInterval
Event: Timer timeout
• Retransmit segment that caused timeout
• Restart timer
Event: ACK receipt
• If ACK acknowledges previously unACKed segments
• Update what is known to be ACKed
• Start timer if there are still unACKed segments
A Few Interesting Scenarios
Fast Retransmit
Fast retransmit (快速重传):
If sender receives 3 additional ACKs for same data ("triple duplicate ACKs"), resend unACKed segment with smallest sequence number
• likely that unACKed segment lost, so don't wait for timeout
Receipt of three duplicate ACKs indicates 3 segments received after a missing segment – lost segment is likely. So retransmit!
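The duplicate-ACK rule can be sketched as a counter keyed on the last ACK number seen. The class name and return convention are assumptions; the sketch only detects when to retransmit and does not model the timer or the window.

```python
class DupAckDetector:
    def __init__(self):
        self.last_ack = None
        self.dup_count = 0

    def on_ack(self, ack_number):
        """Return the sequence number to fast-retransmit, or None.

        The ACK number is the next byte expected, so it is also the first
        byte of the unACKed segment with the smallest sequence number.
        """
        if ack_number == self.last_ack:
            self.dup_count += 1
            if self.dup_count == 3:      # third duplicate: 4 ACKs in total
                return ack_number
        else:
            self.last_ack = ack_number   # new data ACKed: reset the count
            self.dup_count = 0
        return None


detector = DupAckDetector()
results = [detector.on_ack(n) for n in (100, 100, 100, 100, 100, 200)]
print(results)
```

The retransmission fires exactly once, on the third duplicate; further duplicates for the same data do not trigger it again.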
3.5.5 Flow Control
Q: What happens if network layer delivers data faster than application layer removes data from socket buffers?
Flow control (流量控制): receiver controls sender, so sender won't overflow receiver's buffer by transmitting too much, too fast.
• TCP receiver "advertises" free buffer space in rwnd field in TCP header
• RcvBuffer size set via socket options (typical default is 4096 bytes)
• Many operating systems autoadjust RcvBuffer
• Sender limits amount of unACKed ("in-flight") data to received rwnd
• Guarantees receive buffer will not overflow
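A short sketch of the bookkeeping behind rwnd. The variable names LastByteRcvd and LastByteRead do not appear above; they follow the usual textbook formulation and are assumptions here, as are the sample numbers other than the 4096-byte buffer.

```python
def advertised_rwnd(rcv_buffer, last_byte_rcvd, last_byte_read):
    """Free space in the receive buffer: total minus bytes not yet read."""
    return rcv_buffer - (last_byte_rcvd - last_byte_read)


def sender_may_send(rwnd, last_byte_sent, last_byte_acked):
    """Bytes the sender may still put in flight without overflowing rwnd."""
    in_flight = last_byte_sent - last_byte_acked
    return max(0, rwnd - in_flight)


rwnd = advertised_rwnd(rcv_buffer=4096, last_byte_rcvd=3000, last_byte_read=1000)
print(rwnd)
print(sender_may_send(rwnd, last_byte_sent=5000, last_byte_acked=4000))
```

When the application reads slowly, the unread backlog grows, rwnd shrinks, and the sender is throttled accordingly.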
3.5.6 TCP Connection Management
3.5.6.1 TCP Connection Establishment
Before exchanging data, sender/receiver "handshake":
• agree to establish connection (each knowing the other willing to establish connection)
• agree on connection parameters (e.g., starting seq #s)
Agreeing to Establish a Connection
Two-way handshake
Q: Will 2-way handshake always work in network?
• Variable delays
• Retransmitted messages (e.g., req_conn(x)) due to message loss
• Message reordering
• Can't "see" other side
Two-way handshake scenarios:
TCP three-way handshake (三次握手)
3.5.6.2 TCP Connection Teardown
• Client, server each close their side of connection
• send TCP segment with FIN bit = 1
• Respond to received FIN with ACK
• on receiving FIN, ACK can be combined with own FIN
• Simultaneous FIN exchanges can be handled
3.6 Principles of Congestion Control
Causes and costs of congestion, approaches to congestion control
Congestion:
• informally: "too many sources attempting to send data at too high a rate"
• manifestations:
• long delays (queueing in router buffers)
• packet loss (buffer overflow at routers)
• different from flow control!
• a top-10 problem!
Congestion control: too many senders, sending too fast
Flow control: one sender too fast for one receiver
3.6.1 The Causes and the Costs of Congestion
Scenario 1: Two Senders, a Router with Infinite Buffers
Simplest scenario:
• one router, infinite buffers
• input, output link capacity: R
• two flows
• no retransmissions needed
Q: What happens as arrival rate λin approaches R/2?
One cost of a congested network—large queuing delays are experienced as the packet-arrival rate nears the link capacity.
Scenario 2: Two Senders and a Router with Finite Buffers
• one router, finite buffers
• sender retransmits lost, timed-out packet
• application-layer input = application-layer output: λin = λout
• transport-layer input includes retransmissions: λ'in ≥ λin
First, the unrealistic case
• Host A sends a packet only when a buffer is free.
The slightly more realistic case
• the sender retransmits only when a packet is known for certain to be lost.
Another cost of a congested network—the sender must perform retransmissions in order to compensate for dropped (lost) packets due to buffer overflow.
Finally, the case of premature timeouts
• the sender may time out prematurely and retransmit a packet that has been delayed in the queue but not yet lost.
Yet another cost of a congested network—unneeded retransmissions by the sender in the face of large delays may cause a router to use its link bandwidth to forward unneeded copies of a packet.
Scenario 3: Four Senders, Routers with Finite Buffers, and Multihop Paths
• four senders
• multi-hop paths
• timeout/retransmit
If λ'in is extremely large for all connections, the A–C end-to-end throughput goes to zero in the limit of heavy traffic.
Yet another cost of dropping a packet due to congestion—when a packet is dropped along a path, the transmission capacity that was used at each of the upstream links to forward that packet to the point at which it is dropped ends up having been wasted.
3.6.2 Approaches to Congestion Control
• End-end congestion control (端到端拥塞控制):
• no explicit feedback from network
• congestion inferred from observed loss, delay
• approach taken by TCP
• Network-assisted congestion control (网络辅助的拥塞控制):
• routers provide direct feedback to sending/receiving hosts with flows passing through congested router
• may indicate congestion level or explicitly set sending rate
• TCP ECN, ATM, DEC DECnet protocols
3.7 TCP Congestion Control
Classic TCP; Explicit Congestion Notification, delay-based TCP, fairness
3.7.1 Classic TCP Congestion Control
Approach: have each sender limit the rate as a function of perceived congestion
• little congestion: increases;
• congestion: reduces.
Three questions.
• First, how does a TCP sender limit the rate at which it sends traffic into its connection?
The congestion window (拥塞窗口), denoted cwnd, imposes a constraint on the rate at which a TCP sender can send traffic into the network.
LastByteSent – LastByteAcked ≤ min{cwnd, rwnd}
TCP sending behavior:
• Roughly, every RTT, the sender
• at the beginning, sends cwnd bytes
• at the end, receives acknowledgments.
rate ≈ cwnd/RTT bytes/sec
By adjusting the value of cwnd, the sender can therefore adjust the rate at which it sends data into its connection.
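The window constraint and the rate approximation above can be written out directly. The function names and sample values are illustrative assumptions.

```python
def usable_window(cwnd, rwnd, last_byte_sent, last_byte_acked):
    """Bytes that may still be sent: min{cwnd, rwnd} minus unACKed bytes."""
    in_flight = last_byte_sent - last_byte_acked
    return max(0, min(cwnd, rwnd) - in_flight)


def approx_rate(cwnd_bytes, rtt_s):
    """rate ~= cwnd / RTT, in bytes per second."""
    return cwnd_bytes / rtt_s


print(usable_window(cwnd=8000, rwnd=5000, last_byte_sent=12000, last_byte_acked=9000))
print(approx_rate(cwnd_bytes=14600, rtt_s=0.1))
```

Doubling cwnd at a fixed RTT roughly doubles the sending rate, as long as rwnd is not the smaller of the two limits.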
• Second, how does a TCP sender perceive that there is congestion on the path between itself and the destination?
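A TCP sender perceives congestion through a loss event: either a timeout or the receipt of three duplicate ACKs. Because TCP uses acknowledgments to trigger (or clock) its increase in congestion window size, TCP is said to be self-clocking. How do the TCP senders determine their sending rates?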
• A lost segment implies congestion, and hence, the TCP sender's rate should be decreased when a segment is lost.
• An acknowledged segment indicates that the network is delivering the sender's segments to the receiver, and hence, the sender's rate can be increased when an ACK arrives for a previously unacknowledged segment.
• Bandwidth probing.
• And third, what algorithm should the sender use to change its send rate as a function of perceived end-to-end congestion?
TCP congestion-control algorithm (TCP拥塞控制算法):
three major components:
(1) slow start (mandatory),
(2) congestion avoidance (mandatory), and
(3) fast recovery (recommended, but not required).
Slow Start
Event | Actions | New state |
Λ | cwnd = 1 MSS | Slow start |
new ACK | cwnd = cwnd + 1 MSS | Slow start |
Thus, the TCP send rate starts slow but grows exponentially during the slow start phase.
But when should this exponential growth end?
Event | Actions | "New" state |
timeout | ssthresh = cwnd / 2; cwnd = 1 MSS | Slow start |
new ACK | cwnd = cwnd + 1 MSS | Slow start |
cwnd ≥ ssthresh | Λ | Congestion avoidance |
dupACKcount == 3 | Fast retransmit | Fast recovery |
Congestion Avoidance
On entry, the cwnd is approximately half its value when congestion was last encountered! Thus, rather than doubling cwnd every RTT, TCP now increases cwnd by just 1 MSS per RTT:
Event | Actions | "New" state |
new ACK | cwnd = cwnd + MSS · (MSS/cwnd), i.e., about 1 MSS per RTT | Congestion avoidance |
But when should congestion avoidance's linear increase (of 1 MSS per RTT) end?
Event | Actions | "New" state |
timeout | ssthresh = cwnd / 2; cwnd = 1 MSS | Slow start |
dupACKcount == 3 | ssthresh = cwnd / 2; cwnd = ssthresh + 3 · MSS | Fast recovery |
Fast Recovery
Event | Actions | New state |
duplicate ACK | cwnd = cwnd + MSS | Fast recovery |
Eventually,
Event | Actions | New state |
new ACK | cwnd = ssthresh | Congestion avoidance |
timeout | ssthresh = cwnd / 2; cwnd = 1 MSS | Slow start |
Congestion window is
• cut in half on loss detected by triple duplicate ACK (TCP Reno)
• cut to 1 MSS (maximum segment size) when loss detected by timeout (TCP Tahoe and TCP Reno; Tahoe also cuts to 1 MSS on a triple duplicate ACK)
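The event/action tables for slow start, congestion avoidance, and fast recovery can be collected into one small state machine. This is a sketch under stated assumptions: cwnd and ssthresh are measured in units of MSS, the initial ssthresh is a parameter chosen by the caller, and retransmission itself is not modeled. The class and method names are illustrative.

```python
class RenoSender:
    def __init__(self, ssthresh):
        self.state = "slow_start"
        self.cwnd = 1.0            # in MSS
        self.ssthresh = ssthresh   # in MSS (assumed initial value)
        self.dup_acks = 0

    def on_new_ack(self):
        self.dup_acks = 0
        if self.state == "slow_start":
            self.cwnd += 1                      # +1 MSS per ACK
            if self.cwnd >= self.ssthresh:
                self.state = "congestion_avoidance"
        elif self.state == "congestion_avoidance":
            self.cwnd += 1 / self.cwnd          # MSS * (MSS / cwnd)
        else:                                   # leaving fast recovery
            self.cwnd = self.ssthresh
            self.state = "congestion_avoidance"

    def on_dup_ack(self):
        if self.state == "fast_recovery":
            self.cwnd += 1                      # inflate per duplicate ACK
            return
        self.dup_acks += 1
        if self.dup_acks == 3:                  # triple duplicate ACK
            self.ssthresh = self.cwnd / 2
            self.cwnd = self.ssthresh + 3
            self.state = "fast_recovery"

    def on_timeout(self):
        self.ssthresh = self.cwnd / 2
        self.cwnd = 1.0
        self.dup_acks = 0
        self.state = "slow_start"


tcp = RenoSender(ssthresh=8)
for _ in range(7):
    tcp.on_new_ack()               # slow start: cwnd grows 1 -> 8
print(tcp.state, tcp.cwnd)
for _ in range(3):
    tcp.on_dup_ack()               # loss signalled by triple duplicate ACK
print(tcp.state, tcp.cwnd, tcp.ssthresh)
```

Counting cwnd in MSS keeps the arithmetic readable; a byte-based version would multiply each increment by the MSS.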
TCP Congestion Control: Retrospective
An additive-increase, multiplicative-decrease (加性增、乘性减, AIMD) form of congestion control.
Ignoring the initial slow-start period and assuming that losses are indicated by triple duplicate ACKs rather than timeouts, TCP's congestion control consists of linear (additive) increase in cwnd of 1 MSS per RTT and then a halving (multiplicative decrease) of cwnd on a triple duplicate-ACK event.
AIMD "saw tooth" behavior: "probing" for bandwidth.
Why AIMD?
• TCP's congestion-control algorithm serves as a distributed asynchronous-optimization algorithm that results in several important aspects of user and network performance being simultaneously optimized.
TCP Cubic
Insight:
If the state of the congested link where packet loss occurred hasn't changed much, then perhaps it's better to more quickly ramp up the sending rate to get close to the pre-loss sending rate and only then probe cautiously for bandwidth.
TCP CUBIC only changes the congestion avoidance phase, as follows:
• Wmax: size of congestion control window when loss was last detected. K: the future point in time when window size will again reach Wmax, determined by tunable parameters.
• CUBIC increases the congestion window as a function of cube of the distance between the current time, t, and K. Thus, when t is further away from K, the congestion window size increases are much larger than when t is close to K.
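A numerical sketch of the cubic window curve. The text above only says that K is determined by tunable parameters; the specific form W(t) = C·(t − K)³ + Wmax with K = ∛(Wmax·(1 − β)/C), and the constants C = 0.4 and β = 0.7, are taken from the CUBIC RFC and are assumptions relative to these notes.

```python
def cubic_k(w_max, c=0.4, beta=0.7):
    """Time (seconds since the loss) at which the window returns to w_max."""
    return (w_max * (1 - beta) / c) ** (1 / 3)


def cubic_window(t, w_max, c=0.4, beta=0.7):
    """Congestion window (in MSS) t seconds after the last loss."""
    k = cubic_k(w_max, c, beta)
    return c * (t - k) ** 3 + w_max


W_MAX = 100.0
k = cubic_k(W_MAX)
early_step = cubic_window(1.0, W_MAX) - cubic_window(0.0, W_MAX)
late_step = cubic_window(k, W_MAX) - cubic_window(k - 1.0, W_MAX)
print(f"K = {k:.2f} s")
print(f"growth in first second:        {early_step:.2f} MSS")
print(f"growth in last second before K: {late_step:.2f} MSS")
```

The window starts at β·Wmax right after the loss, climbs quickly while t is far from K, and flattens as it approaches Wmax.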
Default in Linux; most popular TCP for popular Web servers.
3.7.2 Network-Assisted Explicit Congestion Notification and Delay-based Congestion Control
Explicit Congestion Notification
Explicit Congestion Notification (明确拥塞通告) [RFC 3168] is the form of network-assisted congestion control performed within the Internet. Both TCP and IP are involved. At the network layer, two bits in the Type of Service field of the IP datagram header are used for ECN.
Congestion indication is carried in the marked IP datagram to the destination host, which then informs the sending host.
Delay-based Congestion Control
RTTmin: the minimum of measurements at a sender (uncongested).
The uncongested throughput rate = cwnd/RTTmin.
• if the actual sender-measured throughput is close to this value,
• increase sending rate.
• if the actual sender-measured throughput is significantly less than the uncongested throughput rate,
• decrease sending rate.
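The comparison above can be sketched as a single decision function. The notes do not say how close counts as "close", so the 0.9 threshold is purely a hypothetical parameter of this sketch, as are the function name and the sample numbers.

```python
def delay_based_decision(cwnd_bytes, rtt_min_s, measured_throughput, close=0.9):
    """Compare measured throughput with the uncongested rate cwnd / RTTmin.

    `close` is a hypothetical threshold: at or above this fraction of the
    uncongested rate the path is treated as uncongested.
    """
    uncongested_rate = cwnd_bytes / rtt_min_s
    if measured_throughput >= close * uncongested_rate:
        return "increase"
    return "decrease"


# cwnd = 100 kB and RTTmin = 50 ms give an uncongested rate of 2 MB/s.
print(delay_based_decision(100_000, 0.05, measured_throughput=1_950_000))
print(delay_based_decision(100_000, 0.05, measured_throughput=1_000_000))
```

A throughput well below cwnd/RTTmin means the measured RTT has grown beyond RTTmin, which is the sign that queues are building along the path.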
"Keep the pipe just full, but no fuller".
• "Keeping the pipe full" means that links (in particular the bottleneck link that is limiting a connection's throughput) are kept busy transmitting, doing useful work;
• "but no fuller" means that there is nothing to gain (except increased delay!) if large queues are allowed to build up while the pipe is kept full.
Google used BBR on its private network.
3.7.3 Fairness
Consider K TCP connections, all passing through a bottleneck link with transmission rate R bps. (By bottleneck link, 瓶颈链路, we mean that for each connection, all the other links along the connection's path are not congested and have abundant transmission capacity as compared with the transmission capacity of the bottleneck link.)
A congestion control mechanism is said to be fair if the average transmission rate of each connection is approximately R/K.
Is TCP's AIMD algorithm fair?
The simple case: two TCP connections:
• increase throughputs along a 45-degree line
• decrease windows by a factor of two
In our idealized scenario,
• same RTT
• only a single TCP connection per host-destination pair
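The convergence argument can be checked with a small simulation of this idealized scenario. The assumptions are loud ones: both connections see a loss in the same round whenever their combined throughput exceeds the link capacity, both then halve, and otherwise both add the same increment. Names and starting values are illustrative.

```python
def aimd_two_flows(x1, x2, capacity, rounds, increase=1.0):
    """Run synchronized AIMD for two flows sharing one bottleneck link."""
    for _ in range(rounds):
        if x1 + x2 > capacity:
            x1, x2 = x1 / 2, x2 / 2                # multiplicative decrease
        else:
            x1, x2 = x1 + increase, x2 + increase  # additive increase
    return x1, x2


start = (10.0, 70.0)   # very unequal starting throughputs
x1, x2 = aimd_two_flows(*start, capacity=100.0, rounds=200)
print(f"start gap: {abs(start[0] - start[1]):.2f}")
print(f"final gap: {abs(x1 - x2):.2f}")
```

Additive increase leaves the gap between the two flows unchanged, while every halving cuts it in half, so the flows drift toward an equal share.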
Fairness and UDP
• multimedia applications often do not run over TCP
• do not want transmission rate throttled
• instead, prefer UDP:
• pump audio/video at constant rate, occasionally lose packet
• not being fair—do not cooperate nor adjust
Fairness and Parallel TCP Connections
• A TCP-based application can use multiple parallel connections.
• Web browsers often use multiple parallel TCP connections. An example: consider a link of rate R supporting nine ongoing connections. A new application comes along and uses either:
• one TCP connection, gets transmission rate of R/10.
• 11 parallel TCP connections, gets an unfair allocation of more than R/2.
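The arithmetic behind these two cases, assuming the bottleneck is split equally per connection (the function name is illustrative):

```python
def app_share(existing_connections, app_connections):
    """Fraction of link rate R obtained by an app opening app_connections."""
    return app_connections / (existing_connections + app_connections)


print(app_share(9, 1))    # one connection among ten
print(app_share(9, 11))   # eleven connections among twenty
```

Fairness per connection is therefore not the same as fairness per application.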
3.8 Evolution of Transport Layer Functionality
TCP Evolution. HTTP/3, QUIC: functionality in the application layer.
UDP and TCP—the two "work horses" of the Internet transport layer.
QUIC: Quick UDP Internet Connections
Moving transport–layer functions to application layer, on top of UDP.
• QUIC is a new application-layer protocol designed from the ground up to improve the performance of transport-layer services for secure HTTP.
• Using UDP as its underlying transport-layer protocol.
• Deployed on many Google servers, in its apps
QUIC uses many of the approaches for reliable data transfer, congestion control, and connection management.
Some of QUIC's major features include:
• Connection-Oriented and Secure. QUIC combines the handshakes needed to establish connection state with those needed for authentication and encryption, thus providing faster establishment than a protocol stack that first sets up a TCP connection and then a TLS connection over it.
• Streams. Several different application-level "streams" are multiplexed through a single QUIC connection.
• Reliable, TCP-friendly congestion-controlled data transfer. No HOL blocking. "Readers familiar with TCP's loss detection and congestion control will find algorithms here that parallel well-known TCP ones." (from QUIC specification).