Computer Networking: Notes of "Select" Lectures (Chapter 3: Transport Layer)

Computer Networking: a Top-Down Approach (8th ed.): Notes of "Select" Lectures

Chapter 3 Transport Layer

3.1 Introduction and Transport-layer Services

Transport-layer services and protocols. Transport layer actions.

•    provide logical communication (逻辑通信) between application processes running on different hosts

•    transport protocol actions in end systems:

•    sender: breaks application messages into segments (报文段), passes to network layer

•    receiver: reassembles segments into messages, passes to application layer

•    two transport protocols available to Internet applications

•    TCP, UDP

3.1.1 Relationship Between Transport and Network Layers

•    network layer: logical communication between hosts

•    transport layer: logical communication between processes

•    relies on, enhances, network layer services

3.1.2 Overview of the Transport Layer in the Internet

Transport Layer Actions

Sender:

•    is passed an application-layer message

•    determines segment header field values

•    creates segment

•    passes segment to IP

Receiver:

•    receives segment from IP

•    checks header values

•    extracts application-layer message

•    demultiplexes (多路分解) message up to application via socket

Two principal Internet transport protocols

TCP: Transmission Control Protocol

•    reliable (可靠), in-order delivery

•    congestion control (拥塞控制)

•    flow control

•    connection setup

UDP: User Datagram Protocol

•    unreliable (不可靠), unordered delivery

•    no-frills (不提供不必要服务的) extension of "best-effort" (尽力而为) IP

Services not available:

•    delay guarantees

•    bandwidth guarantees

3.2 Multiplexing and Demultiplexing

What is multiplexing, demultiplexing? How is it done? How does it work in TCP and UDP?

Multiplexing (多路复用) at sender:

handle data from multiple sockets (套接字), add transport header (later used for demultiplexing)

Demultiplexing (多路分解) at receiver:

use header info to deliver received segments to correct socket

How demultiplexing works

•    host receives IP datagrams

•    each datagram has source IP address, destination IP address

•    each datagram carries one transport-layer segment

•    each segment has source, destination port number

•    host uses IP addresses & port numbers to direct segment to appropriate socket

3.2.1.1 Connectionless Multiplexing and Demultiplexing

Recall:

•    when creating socket, must specify host-local port #:

•    DatagramSocket mySocket1 = new DatagramSocket(12534);

•    when creating datagram to send into UDP socket, must specify

•    destination IP address

•    destination port #

•    when receiving host receives UDP segment:

•    checks destination port # in segment

•    directs UDP segment to socket with that port #

IP/UDP datagrams with same dest. port #, but different source IP addresses and/or source port numbers will be directed to same socket at receiving host

3.2.1.2 Connection-Oriented Multiplexing and Demultiplexing

•    TCP socket identified by 4-tuple:

•    source IP address

•    source port number

•    dest IP address

•    dest port number

•    demux: receiver uses all four values (4-tuple) to direct segment to appropriate socket

•    server may support many simultaneous TCP sockets:

•    each socket identified by its own 4-tuple

•    each socket associated with a different connecting client

3.2.1.3 Summary

•    Multiplexing, demultiplexing: based on segment, datagram header field values

•    UDP: demultiplexing using destination port number (only)

•    TCP: demultiplexing using 4-tuple: source and destination IP addresses, and port numbers

•    Multiplexing/demultiplexing happen at all layers
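The two demultiplexing rules summarized above can be sketched as a pair of lookup tables. This is only an illustrative model, not kernel code: the `Demux` class, its method names, the addresses, and the string socket labels are all hypothetical, and TCP listening (welcoming) sockets are left out.

```python
# Illustrative model of transport-layer demultiplexing (not a real network stack).
# Demux, bind_udp, add_tcp_connection and the socket labels are hypothetical names.

class Demux:
    def __init__(self):
        self.udp_sockets = {}  # destination port -> socket
        self.tcp_sockets = {}  # (src IP, src port, dst IP, dst port) -> socket

    def bind_udp(self, port, sock):
        self.udp_sockets[port] = sock

    def add_tcp_connection(self, src_ip, src_port, dst_ip, dst_port, sock):
        self.tcp_sockets[(src_ip, src_port, dst_ip, dst_port)] = sock

    def deliver_udp(self, src_ip, src_port, dst_ip, dst_port):
        # UDP: only the destination port selects the socket.
        return self.udp_sockets.get(dst_port)

    def deliver_tcp(self, src_ip, src_port, dst_ip, dst_port):
        # TCP: all four values of the 4-tuple select the socket.
        return self.tcp_sockets.get((src_ip, src_port, dst_ip, dst_port))


demux = Demux()
demux.bind_udp(12534, "udp-socket")
demux.add_tcp_connection("10.0.0.1", 5000, "10.0.0.9", 80, "tcp-socket-A")
demux.add_tcp_connection("10.0.0.2", 5000, "10.0.0.9", 80, "tcp-socket-B")
```

Two UDP segments from different sources that share destination port 12534 reach the same socket, while two TCP segments that differ only in source IP address reach different sockets.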

3.3 Connectionless Transport: UDP

UDP segment structure. The Internet checksum.

•    "no frills," "bare bones" Internet transport protocol

•    "best effort" service, UDP segments may be:

•    lost

•    delivered out-of-order to app

•    connectionless:

•    no handshaking between UDP sender, receiver

•    each UDP segment handled independently of others

Why is there a UDP?

•    no connection establishment (which can add RTT delay)

•    simple: no connection state at sender, receiver

•    Small packet header overhead

•    Finer application-level control over what data is sent, and when. No congestion control.

•    UDP can blast away as fast as desired!

•    can function in the face of congestion

•    UDP use:

•    streaming multimedia apps (loss tolerant, rate sensitive)

•    DNS

•    SNMP (Simple Network Management Protocol, 简单网络管理协议)

•    HTTP/3

•    if reliable transfer needed over UDP (e.g., HTTP/3):

•    add needed reliability at application layer

•    add congestion control at application layer

RFC 768

Transport Layer Actions

UDP sender actions:

•    is passed an application-layer message

•    determines UDP segment header field values

•    creates UDP segment

•    passes segment to IP

UDP receiver actions:

•    checks UDP checksum header value

•    extracts application-layer message

•    demultiplexes message up to application via socket

 

3.3.1 UDP Segment Structure
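Per RFC 768, the UDP header consists of four 16-bit fields: source port, destination port, length (header plus payload, in bytes), and checksum. A minimal sketch of building and parsing such a segment follows; the function names are hypothetical, and the checksum is simply left at 0 here rather than computed.

```python
import struct

# Four 16-bit fields in network byte order: src port, dst port, length, checksum.
UDP_HEADER = struct.Struct("!HHHH")


def build_udp_segment(src_port, dst_port, payload, checksum=0):
    """Prepend an 8-byte UDP header to the payload (checksum not computed here)."""
    length = UDP_HEADER.size + len(payload)  # length covers header + data
    return UDP_HEADER.pack(src_port, dst_port, length, checksum) + payload


def parse_udp_segment(segment):
    """Split a segment back into its header fields and payload."""
    src_port, dst_port, length, checksum = UDP_HEADER.unpack(segment[:UDP_HEADER.size])
    payload = segment[UDP_HEADER.size:length]
    return src_port, dst_port, length, checksum, payload
```

The fixed 8-byte header is the "small packet header overhead" mentioned earlier.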

3.3.2 UDP Checksum

Goal: detect errors (i.e., flipped bits) in transmitted segment

Sender:

•    Treat contents of UDP segment (including UDP header fields and IP addresses) as sequence of 16-bit integers

•    Checksum (校验和): one's complement of the one's complement sum of the segment content

•    Checksum value put into UDP checksum field

Receiver:

•    Compute checksum of received segment

•    Check if computed checksum equals checksum field value:

•    Not equal - error detected

•    Equal - no error detected. But maybe errors nonetheless? More later ….

Internet checksum

Example: add two 16-bit integers

1 1 1 0 0 1 1 0 0 1 1 0 0 1 1 0

1 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1

1 1 0 1 1 1 0 1 1 1 0 1 1 1 0 1 1 wraparound

1 0 1 1 1 0 1 1 1 0 1 1 1 1 0 0 sum

0 1 0 0 0 1 0 0 0 1 0 0 0 0 1 1 checksum

Note: when adding numbers, a carryout from the most significant bit needs to be added to the result

Weak protection!

1 1 1 0 0 1 1 0 0 1 1 0 0 1 0 1

1 1 0 1 0 1 0 1 0 1 0 1 0 1 1 0

1 1 0 1 1 1 0 1 1 1 0 1 1 1 0 1 1 wraparound

1 0 1 1 1 0 1 1 1 0 1 1 1 1 0 0 sum

0 1 0 0 0 1 0 0 0 1 0 0 0 0 1 1 checksum

Even though numbers have changed (bit flips), no change in checksum!
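The arithmetic in both examples can be reproduced with a short sketch. It assumes the data is already split into 16-bit integers; the function names are hypothetical, and the UDP pseudo-header is not handled.

```python
def ones_complement_sum(words):
    """Add 16-bit words, wrapping any carry out of bit 15 back into bit 0."""
    total = 0
    for w in words:
        total += w
        total = (total & 0xFFFF) + (total >> 16)  # wraparound of the carry
    return total


def internet_checksum(words):
    """Checksum = bitwise complement of the one's complement sum."""
    return ~ones_complement_sum(words) & 0xFFFF


def looks_valid(words, checksum):
    """Receiver check: data plus checksum must sum to all ones."""
    return ones_complement_sum(list(words) + [checksum]) == 0xFFFF


original = [0b1110011001100110, 0b1101010101010101]
flipped = [0b1110011001100101, 0b1101010101010110]  # last two bits of each word flipped

print(format(internet_checksum(original), "016b"))
print(format(internet_checksum(flipped), "016b"))
```

Both calls produce the same 16-bit value, which is why the receiver cannot detect this particular pair of bit flips.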

3.4 Principles of Reliable Data Transfer

Protocol mechanisms for reliable data transfer (rdt). Building an rdt protocol. Pipelining. Go-back-N. Selective Repeat.

Interfaces

3.4.1 Building a Reliable Data Transfer Protocol

We will:

•    incrementally develop sender, receiver sides of reliable data transfer protocol (可靠数据传输协议, rdt)

•    consider only unidirectional data transfer (单向数据传输)

•    but control info will flow in both directions!

•    use finite-state machines (FSM, 有限状态机) to specify sender, receiver

3.4.1.1 Reliable Data Transfer over a Perfectly Reliable Channel: rdt1.0

•    underlying channel perfectly reliable

•    no bit errors

•    no loss of packets

•    separate FSMs for sender, receiver:

•    sender sends data into underlying channel

•    receiver reads data from underlying channel

 

3.4.1.2 Reliable Data Transfer over a Channel with Bit Errors: rdt2.0

•    underlying channel may flip bits in packet

•    checksum to detect bit errors

•    the question: how to recover from errors?

•    Positive acknowledgements (ACKs, 肯定确认): receiver explicitly tells sender that packet received OK

•    Negative acknowledgements (NAKs): receiver explicitly tells sender that packet had errors

•    sender retransmits packet on receipt of NAK

stop-and-wait (停等)

sender sends one packet, then waits for receiver response

rdt2.0: the FSM Representation

 

Note: "state" of receiver (did the receiver get my message correctly?) isn't known to sender unless somehow communicated from receiver to sender

that's why we need a protocol!

rdt2.0 has a fatal flaw!

What happens if ACK/NAK corrupted?

•    sender doesn't know what happened at receiver!

•    can't just retransmit: possible duplicate

Handling duplicates:

•    sender retransmits current packet if ACK/NAK corrupted

•    sender adds sequence number (序号) to each packet

•    receiver discards (doesn't deliver up) duplicate packet (冗余分组)

stop-and-wait

sender sends one packet, then waits for receiver response

Protocol that uses both ACKs and NAKs from the receiver to the sender: rdt2.1

The FSM Description

Discussion

Sender:

•    sequence number added to packet

•    two sequence numbers (0,1) will suffice. Why?

•    must check if received ACK/NAK corrupted

•    twice as many states

•    state must "remember" whether "expected" packet should have sequence number of 0 or 1

Receiver:

•    must check if received packet is duplicate

•    state indicates whether 0 or 1 is expected packet sequence number

•    note: receiver cannot know if its last ACK/NAK was received OK at sender

NAK-free Reliable Data Transfer Protocol for a Channel with Bit Errors: rdt2.2

•    same functionality as rdt2.1, using ACKs only

•    instead of NAK, receiver sends ACK for last packet received OK

•    receiver must explicitly include sequence number of packet being ACKed

•    duplicate ACK (冗余ACK) at sender results in same action as NAK: retransmit current packet

As we will see, TCP uses this approach to be NAK-free

3.4.1.3 Reliable Data Transfer over a Lossy Channel with Bit Errors: rdt3.0

New channel assumption: underlying channel can also lose packets (data, ACKs)

•    checksum, sequence numbers, ACKs, retransmissions will be of help … but not quite enough

Approach: sender waits "reasonable" amount of time for ACK

•    retransmits if no ACK received in this time

•    if packet (or ACK) just delayed (not lost):

•    retransmission will be duplicate, but sequence numbers already handle this!

•    receiver must specify sequence number of packet being ACKed

•    use countdown timer (倒计数定时器) to interrupt after "reasonable" amount of time
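The timeout-and-retransmit idea can be illustrated with a small deterministic simulation. This is a simplified sketch under stated assumptions: only loss is modeled (no bit errors or delayed packets), a timeout is represented by simply looping again, and the function name and the `drop_data` / `drop_acks` parameters are hypothetical.

```python
def rdt3_transfer(messages, drop_data=(), drop_acks=()):
    """Simulate stop-and-wait with 1-bit sequence numbers over a lossy channel.

    drop_data / drop_acks hold the 0-based indices of the data / ACK
    transmissions that the channel loses. Returns (delivered, data_tx_count).
    """
    delivered = []        # messages the receiver passed up to the application
    expected = 0          # receiver: sequence number it is waiting for
    data_tx = ack_tx = 0  # counters of transmissions so far

    for i, msg in enumerate(messages):
        seq = i % 2       # sequence numbers 0 and 1 suffice for stop-and-wait
        while True:
            lost = data_tx in drop_data
            data_tx += 1
            if lost:
                continue  # no ACK arrives: timer expires, sender retransmits

            if seq == expected:
                delivered.append(msg)    # new packet: deliver it
                expected = 1 - expected
            # else: duplicate packet, discarded but still ACKed

            ack_lost = ack_tx in drop_acks
            ack_tx += 1
            if ack_lost:
                continue  # ACK lost: timer expires, sender retransmits
            break         # ACK for seq received: move on to next message

    return delivered, data_tx
```

With one lost data packet and one lost ACK, each message is still delivered exactly once; the retransmission caused by the lost ACK is recognized as a duplicate by its sequence number.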

3.4.2 Pipelined Reliable Data Transfer Protocols

3.4.2.1 Performance of rdt3.0 (stop-and-wait)

•    U_sender: utilization (利用率) – fraction of time sender busy sending

•    example: 1 Gbps link, 15 ms prop. delay, 8000 bit packet

•    time to transmit packet into channel: D_trans = L/R = 8000 bits / 10^9 bits/sec = 8 microseconds

rdt3.0: Stop-and-wait Operation

•    rdt 3.0 protocol performance stinks!

•    Protocol limits performance of underlying infrastructure (channel)

3.4.2.2 Solution: Pipelining

rdt3.0: Pipelined Operation

Pipelining (流水线): sender allows multiple, "in-flight", yet-to-be-acknowledged packets

•    range of sequence numbers must be increased

•    buffering at sender and/or receiver

Pipelining: increased utilization

3-packet pipelining tripled the utilization.
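The utilization figures for the example link can be checked numerically. The sketch assumes the usual formula U_sender = (N · L/R) / (RTT + L/R) with RTT taken as twice the 15 ms propagation delay and negligible ACK transmission time; the function name and the `window` parameter are hypothetical.

```python
def sender_utilization(packet_bits, rate_bps, rtt_s, window=1):
    """Fraction of time the sender is busy when `window` packets are sent per RTT."""
    d_trans = packet_bits / rate_bps  # time to push one packet into the link
    return min(1.0, window * d_trans / (rtt_s + d_trans))


RTT = 2 * 0.015  # 15 ms propagation delay in each direction

u_stop_and_wait = sender_utilization(8000, 1e9, RTT)
u_pipelined = sender_utilization(8000, 1e9, RTT, window=3)

print(u_stop_and_wait, u_pipelined)
```

Stop-and-wait keeps the 1 Gbps link busy for well under a tenth of a percent of the time, and sending three packets per RTT gives exactly three times that.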

3.4.3 Go-Back-N (GBN, 回退N步)

GBN Sender

•    Sender: "window" of up to N, consecutive transmitted but unACKed packets

•    k-bit sequence number in packet header

•    Cumulative acknowledgement (累积确认): ACK(n): ACKs all packets up to, including sequence number n

•    on receiving ACK(n): move window forward to begin at n+1

•    Timer for oldest in-flight packet

•    timeout(n): retransmit packet n and all higher sequence number packets in window

GBN Receiver

•    ACK-only: always send ACK for correctly-received packet so far, with highest in-order sequence number

•    may generate duplicate ACKs

•    need only remember rcv_base

•    on receipt of out-of-order packet:

•    can discard (don't buffer) or buffer: an implementation decision

•    re-ACK packet with highest in-order sequence number
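The sender and receiver rules above can be sketched as two small classes. This is a simplified model under stated assumptions: sequence numbers are unbounded integers (no k-bit wraparound), the timer itself is not modeled (the caller invokes `on_timeout`), the receiver discards out-of-order packets, and all class and method names are hypothetical.

```python
class GBNSender:
    def __init__(self, window_size):
        self.N = window_size
        self.base = 0      # oldest transmitted but unACKed sequence number
        self.next_seq = 0  # next sequence number to assign
        self.sent = []     # log of every transmission, in order

    def send(self):
        """Send one new packet if the window is not full."""
        if self.next_seq >= self.base + self.N:
            return False   # window full: refuse data from above
        self.sent.append(self.next_seq)
        self.next_seq += 1
        return True

    def on_ack(self, n):
        """Cumulative ACK(n): slide the window so that it begins at n + 1."""
        if n >= self.base:
            self.base = n + 1

    def on_timeout(self):
        """Go back N: retransmit every packet in [base, next_seq)."""
        resend = list(range(self.base, self.next_seq))
        self.sent.extend(resend)
        return resend


class GBNReceiver:
    def __init__(self):
        self.rcv_base = 0  # the only state the receiver needs

    def on_packet(self, seq):
        """Return the ACK number to send (-1 if nothing is in order yet)."""
        if seq == self.rcv_base:
            self.rcv_base += 1    # in order: deliver and advance
        return self.rcv_base - 1  # re-ACK highest in-order packet otherwise
```

If packets 0, 1 and 3 arrive while 2 is lost, the receiver answers ACK 0, ACK 1, ACK 1, and a later timeout at the sender retransmits both 2 and 3.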

Go-Back-N in Action

3.4.4 Selective Repeat (SR, 选择重传)

•    Receiver individually acknowledges all correctly received packets

•    buffers packets, as needed, for eventual in-order delivery to upper layer

•    Sender times-out/retransmits individually for unACKed packets

•    sender maintains timer for each unACKed packet

•    Sender window

•    N consecutive sequence numbers

•    limits sequence numbers of sent, unACKed packets

Sender, Receiver Windows

Sender, Receiver Events and Actions

Sender

•    Data received from above: if the next available sequence number is in the window, send the packet.

•    Timeout(n): resend packet n and restart its timer.

•    ACK(n) received with n in [send_base, send_base+N-1]:

•    mark packet n as received

•    if n is the smallest unACKed packet, advance the window base to the next unACKed sequence number

Receiver

•    Packet n with sequence number in [rcv_base, rcv_base+N-1] is correctly received:

•    send ACK(n)

•    out-of-order: buffer

•    in-order: deliver (also deliver buffered, in-order packets), advance window to next not-yet-received packet

•    Packet n with sequence number in [rcv_base-N, rcv_base-1] is correctly received:

•    send ACK(n)

•    Otherwise:

•    ignore

Selective-repeat in Action

A dilemma!

Example:

•    sequence numbers 0, 1, 2, 3

•    a window size of three.

•    receiver can't see sender side

•    receiver behavior identical in both cases!

•    something's (very) wrong!

Q: What relationship is needed between sequence # size and window size to avoid problem in scenario (b)?

The window size must be less than or equal to half the size of the sequence number space for SR protocols.
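The rule can be checked by comparing the sender's old window with the receiver's new one. The sketch below assumes the worst case from the example: the receiver has accepted a full window starting at 0 while every ACK was lost, so the sender may still retransmit its whole old window. The function name is hypothetical.

```python
def windows_overlap(seq_space, window):
    """True if a retransmitted old packet could be mistaken for a new one."""
    # Sequence numbers the sender may still retransmit (ACKs all lost).
    old_sender_window = {n % seq_space for n in range(0, window)}
    # Sequence numbers the receiver now accepts as new data.
    new_receiver_window = {n % seq_space for n in range(window, 2 * window)}
    return bool(old_sender_window & new_receiver_window)


print(windows_overlap(4, 3))  # the example: sequence numbers 0..3, window size 3
print(windows_overlap(4, 2))  # window size halved
```

With four sequence numbers, a window of 3 is ambiguous (sequence number 0 appears in both windows) while a window of 2 is not, in line with the half-the-sequence-space rule.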

3.5 Connection-oriented Transport: TCP

The TCP connection and segment, RTT estimation and timeout, flow control

RFCs: 793, 1122, 2018, 5681, 7323

•    Reliable, in-order byte stream:

•    no "message boundaries"

•    Cumulative acknowledgements

•    Pipelining:

•    TCP congestion and flow control set window size

•    Flow controlled:

•    sender will not overwhelm receiver

3.5.1 The TCP Connection

•    Connection-oriented (面向连接的):

•    handshaking (exchange of control messages) initializes sender, receiver state before data exchange

•    Full-duplex service (全双工服务):

•    bi-directional data flow in same connection

•    Point-to-point (点对点):

•    one sender, one receiver

•    Three-way handshake (三次握手).    

•    Send buffer (发送缓存):

•    Maximum segment size (MSS, 最大报文段长度)

•    Maximum transmission unit (MTU, 最大传输单元)

•    TCP segments (TCP报文段).

 

3.5.2 TCP Segment Structure

Sequence Numbers and Acknowledgment Numbers

•    Sequence numbers:

•    byte stream "number" of first byte in segment's data

•    Acknowledgements:

•    seq # of next byte expected from other side

•    cumulative acknowledgement (累积确认)

•    Q: how receiver handles out-of-order segments

•    A: the TCP spec doesn't say; it is up to the implementor

Telnet: A Case Study for Sequence and Acknowledgment Numbers

3.5.3 Round-Trip Time Estimation and Timeout

Q: how to set TCP timeout value?

•    Longer than RTT, but RTT varies!

•    Too short: premature timeout, unnecessary retransmissions

•    Too long: slow reaction to segment loss

Estimating the Round-Trip Time

Q: how to estimate RTT?

•    SampleRTT: measured time from segment transmission until ACK receipt

•    Ignore retransmissions

•    SampleRTT will vary, want estimated RTT "smoother"

•    Average several recent measurements, not just current SampleRTT

EstimatedRTT = (1 – α) · EstimatedRTT + α · SampleRTT

•    Exponential weighted moving average (EWMA, 指数加权移动平均)

•    Influence of past sample decreases exponentially fast

•    Typical value: α = 0.125

Setting and Managing the Retransmission Timeout Interval

•    Timeout interval: EstimatedRTT plus "safety margin"

•    Large variation in EstimatedRTT: want a larger safety margin

TimeoutInterval = EstimatedRTT + 4 · DevRTT

•    DevRTT: EWMA of SampleRTT deviation from EstimatedRTT:

DevRTT = (1 – β) · DevRTT + β · | SampleRTT – EstimatedRTT |

(typically, β = 0.25)
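The three formulas above fit in a few lines of code. The sketch makes two assumptions that the notes do not state and that follow RFC 6298 instead: the first sample initializes EstimatedRTT to the sample and DevRTT to half of it, and DevRTT is updated before EstimatedRTT on each new sample. The class name is hypothetical.

```python
class RttEstimator:
    def __init__(self, first_sample, alpha=0.125, beta=0.25):
        self.alpha = alpha
        self.beta = beta
        self.estimated_rtt = first_sample
        self.dev_rtt = first_sample / 2  # assumption: RFC 6298-style initial value

    def update(self, sample_rtt):
        """Fold one SampleRTT (from a non-retransmitted segment) into the estimates."""
        # Assumption: DevRTT uses the EstimatedRTT from before this sample.
        self.dev_rtt = ((1 - self.beta) * self.dev_rtt
                        + self.beta * abs(sample_rtt - self.estimated_rtt))
        self.estimated_rtt = ((1 - self.alpha) * self.estimated_rtt
                              + self.alpha * sample_rtt)

    def timeout_interval(self):
        """EstimatedRTT plus a safety margin of four deviations."""
        return self.estimated_rtt + 4 * self.dev_rtt


est = RttEstimator(first_sample=0.100)
est.update(0.200)  # one unusually slow sample
print(est.estimated_rtt, est.dev_rtt, est.timeout_interval())
```

A single slow sample moves EstimatedRTT only slightly, but it widens the safety margin noticeably through DevRTT.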

3.5.4 Reliable Data Transfer

Event: data received from application above

•    Create segment with sequence number

•    Sequence number is byte-stream number of first data byte in segment

•    Start timer if not already running

•    Think of timer as for oldest unACKed segment

•    Expiration interval: TimeoutInterval

Event: Timer timeout

•    Retransmit segment that caused timeout

•    Restart timer

Event: ACK receipt

•    If ACK acknowledges previously unACKed segments

•    Update what is known to be ACKed

•    Start timer if there are still unACKed segments

A Few Interesting Scenarios

Fast Retransmit

Fast retransmit (快速重传):

If sender receives 3 additional ACKs for same data ("triple duplicate ACKs"), resend unACKed segment with smallest sequence number

•    likely that unACKed segment lost, so don't wait for timeout

Receipt of three duplicate ACKs indicates 3 segments received after a missing segment – lost segment is likely. So retransmit!
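The trigger condition can be sketched as a counter on the sender side. This is a minimal model that only decides when to retransmit; it does not track segments or timers, and the class name is hypothetical.

```python
class DupAckCounter:
    def __init__(self):
        self.last_ack = None  # most recent ACK number seen
        self.dup_count = 0    # how many times it has been repeated

    def on_ack(self, ack_num):
        """Return True exactly when the third duplicate ACK arrives."""
        if ack_num == self.last_ack:
            self.dup_count += 1
            return self.dup_count == 3  # fire once, on the third additional ACK
        self.last_ack = ack_num         # a new ACK resets the count
        self.dup_count = 0
        return False


counter = DupAckCounter()
decisions = [counter.on_ack(a) for a in (92, 100, 100, 100, 100, 120)]
print(decisions)
```

The first ACK for byte 100 is not a duplicate; the retransmission fires on the fourth ACK carrying 100, that is, on the third additional one.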

3.5.5 Flow Control

Q: What happens if network layer delivers data faster than application layer removes data from socket buffers?

Flow control (流量控制): receiver controls sender, so sender won't overflow receiver's buffer by transmitting too much, too fast.

•    TCP receiver "advertises" free buffer space in rwnd field in TCP header

•    RcvBuffer size set via socket options (typical default is 4096 bytes)

•    Many operating systems autoadjust RcvBuffer

•    Sender limits amount of unACKed ("in-flight") data to received rwnd

•    Guarantees receive buffer will not overflow
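The bookkeeping behind the advertised window can be sketched as follows. The notes do not spell out how rwnd is computed; the sketch assumes the textbook's definition rwnd = RcvBuffer - (LastByteRcvd - LastByteRead), and the function names are hypothetical.

```python
def advertised_window(rcv_buffer, last_byte_rcvd, last_byte_read):
    """Receiver side: free space left in the receive buffer (rwnd)."""
    buffered = last_byte_rcvd - last_byte_read  # received but not yet read by the app
    return rcv_buffer - buffered


def sendable_bytes(rwnd, last_byte_sent, last_byte_acked):
    """Sender side: how many more bytes may be put in flight."""
    in_flight = last_byte_sent - last_byte_acked
    return max(0, rwnd - in_flight)


rwnd = advertised_window(rcv_buffer=4096, last_byte_rcvd=3000, last_byte_read=1000)
print(rwnd, sendable_bytes(rwnd, last_byte_sent=5000, last_byte_acked=4000))
```

When the application stops reading, rwnd shrinks toward zero and the sender has to pause even if the network itself is idle.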

3.5.6 TCP Connection Management

3.5.6.1 TCP Connection Establishment

Before exchanging data, sender/receiver "handshake":

•    agree to establish connection (each knowing the other willing to establish connection)

•    agree on connection parameters (e.g., starting seq #s)

Agreeing to Establish a Connection

Two-way handshake

Q: Will 2-way handshake always work in network?

•    Variable delays

•    Retransmitted messages (e.g., req_conn(x)) due to message loss

•    Message reordering

•    Can't "see" other side

Two-way handshake scenarios:

TCP three-way handshake (三次握手)

3.5.6.2 TCP Connection Teardown

•    Client, server each close their side of connection

•    send TCP segment with FIN bit = 1

•    Respond to received FIN with ACK

•    on receiving FIN, ACK can be combined with own FIN

•    Simultaneous FIN exchanges can be handled

3.6 Principles of Congestion Control

Causes and costs of congestion, approaches to congestion control

    Congestion:

•    informally: "too many sources attempting to send data at too high a rate"

•    manifestations:

•    long delays (queueing in router buffers)

•    packet loss (buffer overflow at routers)

•    different from flow control!

•    a top-10 problem!

    Congestion control: too many senders, sending too fast

    Flow control: one sender too fast for one receiver

3.6.1 The Causes and the Costs of Congestion

Scenario 1: Two Senders, a Router with Infinite Buffers

    Simplest scenario:

•    one router, infinite buffers

•    input, output link capacity: R

•    two flows

•    no retransmissions needed

Q: What happens as arrival rate λ_in approaches R/2?

One cost of a congested network—large queuing delays are experienced as the packet-arrival rate nears the link capacity.

Scenario 2: Two Senders and a Router with Finite Buffers

•    one router, finite buffers

•    sender retransmits lost, timed-out packet

•    application-layer input = application-layer output: λ_in = λ_out

•    transport-layer input includes retransmissions: λ'_in ≥ λ_in

    First, the unrealistic case

• Host A sends a packet only when a buffer is free.

    The slightly more realistic case

• the sender retransmits only when a packet is known for certain to be lost.

    Another cost of a congested network—the sender must perform retransmissions in order to compensate for dropped (lost) packets due to buffer overflow.

    Finally, the realistic case

• the sender may time out prematurely and retransmit a packet that has been delayed in the queue but not yet lost.

Yet another cost of a congested network—unneeded retransmissions by the sender in the face of large delays may cause a router to use its link bandwidth to forward unneeded copies of a packet.

Scenario 3: Four Senders, Routers with Finite Buffers, and Multihop Paths

• four senders

• multi-hop paths

• timeout/retransmit

    If λ'_in is extremely large for all connections, the A–C end-to-end throughput goes to zero in the limit of heavy traffic.

Yet another cost of dropping a packet due to congestion—when a packet is dropped along a path, the transmission capacity that was used at each of the upstream links to forward that packet to the point at which it is dropped ends up having been wasted.

3.6.2 Approaches to Congestion Control

•    End-end congestion control (端到端拥塞控制):

•    no explicit feedback from network

•    congestion inferred from observed loss, delay

•    approach taken by TCP

•    Network-assisted congestion control (网络辅助的拥塞控制):

•    routers provide direct feedback to sending/receiving hosts with flows passing through congested router

•    may indicate congestion level or explicitly set sending rate

•    TCP ECN, ATM, DEC DECnet protocols

 

3.7 TCP Congestion Control

Classic TCP; Explicit Congestion Notification, delay-based TCP, fairness

3.7.1 Classic TCP Congestion Control

Approach: have each sender limit the rate as a function of perceived congestion

•    little congestion: increases;

•    congestion: reduces.

Three questions.

•    First, how does a TCP sender limit the rate at which it sends traffic into its connection?

The congestion window (拥塞窗口), denoted cwnd, imposes a constraint on the rate at which a TCP sender can send traffic into the network.

LastByteSent – LastByteAcked ≤ min{cwnd, rwnd}

TCP sending behavior:

•    Roughly, every RTT, the sender

•    at the beginning, sends cwnd bytes

•    at the end, receives acknowledgments.

rate ≈ cwnd/RTT bytes/sec

By adjusting the value of cwnd, the sender can therefore adjust the rate at which it sends data into its connection.

•    Second, how does a TCP sender perceive that there is congestion on the path between itself and the destination?

Because TCP uses acknowledgments to trigger (or clock) its increase in congestion window size, TCP is said to be self-clocking. How do the TCP senders determine their sending rates?

•    A lost segment implies congestion, and hence, the TCP sender's rate should be decreased when a segment is lost.

•    An acknowledged segment indicates that the network is delivering the sender's segments to the receiver, and hence, the sender's rate can be increased when an ACK arrives for a previously unacknowledged segment.

•    Bandwidth probing.

•    And third, what algorithm should the sender use to change its send rate as a function of perceived end-to-end congestion?

TCP congestion-control algorithm (TCP拥塞控制算法):

three major components:
(1) slow start (mandatory),
(2) congestion avoidance (mandatory), and
(3) fast recovery (recommended, but not required).

Slow Start

•    Event: Λ (no event; initial entry). Actions: cwnd = 1 MSS. New state: slow start.

•    Event: new ACK. Actions: cwnd = cwnd + 1 MSS. New state: slow start.

 

    Thus, the TCP send rate starts slow but grows exponentially during the slow start phase.

    But when should this exponential growth end?

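•    Event: timeout. Actions: ssthresh = cwnd / 2; cwnd = 1 MSS. "New" state: slow start.

•    Event: new ACK. Actions: cwnd = cwnd + 1 MSS. "New" state: slow start.

•    Event: cwnd ≥ ssthresh. Actions: Λ (no action). "New" state: congestion avoidance.

•    Event: dupACKcount == 3. Actions: fast retransmit. "New" state: fast recovery.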

 

Congestion Avoidance

On entry, the cwnd is approximately half its value when congestion was last encountered! Thus,

•    Event: new ACK. Actions: cwnd = cwnd + MSS · (MSS/cwnd), which adds up to about 1 MSS per RTT (instead of 1 MSS per ACK as in slow start). "New" state: congestion avoidance.

    But when should congestion avoidance's linear increase (of 1 MSS per RTT) end?

•    Event: timeout. Actions: ssthresh = cwnd / 2; cwnd = 1 MSS. "New" state: slow start.

•    Event: dupACKcount == 3. Actions: ssthresh = cwnd / 2; cwnd = ssthresh + 3 · MSS. "New" state: fast recovery.

Fast Recovery

•    Event: duplicate ACK. Actions: cwnd = cwnd + MSS. New state: fast recovery.

        Eventually,

•    Event: new ACK. Actions: cwnd = ssthresh. New state: congestion avoidance.

•    Event: timeout. Actions: ssthresh = cwnd / 2; cwnd = 1 MSS. New state: slow start.

    Congestion window is

•    cut in half on loss detected by triple duplicate ACK (TCP Reno)

•    cut to 1 MSS (maximum segment size) when loss is detected by timeout (TCP Tahoe also does this on a triple duplicate ACK)
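The three states and their transitions can be collected into one sketch of a Reno-style sender. It is a simplified model under stated assumptions: cwnd and ssthresh are measured in units of MSS, the initial ssthresh of 64 is a hypothetical default, and retransmission itself is not modeled. The class and method names are hypothetical.

```python
class RenoSender:
    def __init__(self, ssthresh=64):
        self.state = "slow start"
        self.cwnd = 1.0                   # in units of MSS
        self.ssthresh = float(ssthresh)   # hypothetical initial threshold
        self.dup_acks = 0

    def on_new_ack(self):
        self.dup_acks = 0
        if self.state == "slow start":
            self.cwnd += 1                # +1 MSS per ACK: doubles every RTT
            if self.cwnd >= self.ssthresh:
                self.state = "congestion avoidance"
        elif self.state == "congestion avoidance":
            self.cwnd += 1 / self.cwnd    # about +1 MSS per RTT
        else:                             # leaving fast recovery
            self.cwnd = self.ssthresh
            self.state = "congestion avoidance"

    def on_dup_ack(self):
        if self.state == "fast recovery":
            self.cwnd += 1                # inflate window per duplicate ACK
            return
        self.dup_acks += 1
        if self.dup_acks == 3:            # triple duplicate ACK
            self.ssthresh = self.cwnd / 2
            self.cwnd = self.ssthresh + 3
            self.state = "fast recovery"

    def on_timeout(self):
        self.ssthresh = self.cwnd / 2
        self.cwnd = 1.0
        self.dup_acks = 0
        self.state = "slow start"
```

Feeding the object a stream of `on_new_ack`, `on_dup_ack` and `on_timeout` calls reproduces the sawtooth: exponential growth up to ssthresh, linear growth afterwards, halving on a triple duplicate ACK, and a restart from 1 MSS on timeout.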

TCP Congestion Control: Retrospective

An additive-increase, multiplicative-decrease (加性增、乘性减, AIMD) form of congestion control.

Ignoring the initial slow-start period and assuming that losses are indicated by triple duplicate ACKs rather than timeouts, TCP's congestion control consists of linear (additive) increase in cwnd of 1 MSS per RTT and then a halving (multiplicative decrease) of cwnd on a triple duplicate-ACK event.

AIMD "saw tooth" behavior: "probing" for bandwidth.

    Why AIMD?

•    TCP's congestion-control algorithm serves as a distributed asynchronous-optimization algorithm that results in several important aspects of user and network performance being simultaneously optimized.

 

TCP Cubic

Insight:

If the state of the congested link where packet loss occurred hasn't changed much, then perhaps it's better to more quickly ramp up the sending rate to get close to the pre-loss sending rate and only then probe cautiously for bandwidth.

TCP CUBIC only changes the congestion avoidance phase, as follows:

•    Wmax: size of the congestion window when loss was last detected. K: the future point in time when the window size will again reach Wmax, determined by tunable parameters.

•    CUBIC increases the congestion window as a function of the cube of the distance between the current time, t, and K. Thus, when t is further away from K, the congestion window size increases are much larger than when t is close to K.
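The shape of the CUBIC curve can be sketched numerically. The notes give no formula; the one used here, W(t) = C · (t - K)^3 + Wmax with K = cbrt(Wmax · (1 - β) / C), and the constants C = 0.4 and β = 0.7 are taken from RFC 8312 and are an assumption relative to the notes. Windows are in MSS, time is in seconds since the last loss, and the function names are hypothetical.

```python
C = 0.4     # scaling constant (RFC 8312 default)
BETA = 0.7  # window is reduced to BETA * Wmax after a loss (RFC 8312 default)


def cubic_k(w_max):
    """Time after the loss at which the window climbs back to w_max."""
    return (w_max * (1 - BETA) / C) ** (1 / 3)


def cubic_window(t, w_max):
    """Congestion window t seconds after the last loss event."""
    return C * (t - cubic_k(w_max)) ** 3 + w_max


w_max = 100
k = cubic_k(w_max)
early_growth = cubic_window(1, w_max) - cubic_window(0, w_max)
late_growth = cubic_window(k, w_max) - cubic_window(k - 1, w_max)
print(k, early_growth, late_growth)
```

The window grows much faster in the first second after the loss than in the last second before reaching Wmax, which is the fast ramp-up followed by cautious probing described above.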

Default in Linux; most popular TCP for popular Web servers.

3.7.2 Network-Assisted Explicit Congestion Notification and Delay-based Congestion Control

Explicit Congestion Notification

Explicit Congestion Notification (明确拥塞通告) [RFC 3168] is the form of network-assisted congestion control performed within the Internet. Both TCP and IP are involved. At the network layer, two bits in the Type of Service field of the IP datagram header are used for ECN.

Congestion indication is carried in the marked IP datagram to the destination host, which then informs the sending host.

Delay-based Congestion Control

RTTmin: the minimum of measurements at a sender (uncongested).

    The uncongested throughput rate = cwnd/RTTmin.

•    if the actual sender-measured throughput is close to this value,

•    increase sending rate.

•    if the actual sender-measured throughput is significantly less than the uncongested throughput rate,

•    decrease sending rate.

    "Keep the pipe just full, but no fuller".

•    "Keeping the pipe full" means that links (in particular the bottleneck link that is limiting a connection's throughput) are kept busy transmitting, doing useful work;

•    "but no fuller" means that there is nothing to gain (except increased delay!) if large queues are allowed to build up while the pipe is kept full.
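The decision rule can be sketched as a single function, loosely in the spirit of TCP Vegas. The `tolerance` threshold that defines "close to" and the one-MSS `step` are hypothetical parameters chosen for illustration, not values from the notes.

```python
def delay_based_adjust(cwnd, rtt_min, measured_throughput, tolerance=0.9, step=1.0):
    """Return the new cwnd given the measured throughput (cwnd units per second)."""
    uncongested = cwnd / rtt_min  # rate achievable if no queues build up
    if measured_throughput >= tolerance * uncongested:
        return cwnd + step        # path looks uncongested: speed up
    return max(step, cwnd - step)  # queues are building: slow down


print(delay_based_adjust(cwnd=10, rtt_min=0.1, measured_throughput=98))
print(delay_based_adjust(cwnd=10, rtt_min=0.1, measured_throughput=50))
```

Unlike loss-based control, this reacts before any packet is dropped, because growing queues show up as lower measured throughput first.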

    Google used BBR on its private network.

3.7.3 Fairness

Consider K TCP connections, all passing through a bottleneck link with transmission rate R bps. (By bottleneck link, 瓶颈链路, we mean that for each connection, all the other links along the connection's path are not congested and have abundant transmission capacity as compared with the transmission capacity of the bottleneck link.)

A congestion control mechanism is said to be fair if the average transmission rate of each connection is approximately R/K.

Is TCP's AIMD algorithm fair?

The simple case: two TCP connections:

•    increase throughputs along a 45-degree line

•    decrease windows by a factor of two

In our idealized scenario,

•    same RTT

•    only a single TCP connection per host-destination pair
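The convergence argument for two connections can be checked with a toy simulation. It assumes the idealized scenario above plus synchronized behavior: both flows add one unit per round, and both halve in the same round whenever their combined rate exceeds the link capacity. The function name and the numbers are hypothetical.

```python
def aimd_two_flows(x1, x2, capacity, rounds):
    """Evolve two throughputs under additive increase, multiplicative decrease."""
    for _ in range(rounds):
        if x1 + x2 > capacity:
            x1, x2 = x1 / 2, x2 / 2  # loss: both flows halve their rate
        else:
            x1, x2 = x1 + 1, x2 + 1  # no loss: both move along the 45-degree line
    return x1, x2


x1, x2 = aimd_two_flows(80, 10, capacity=100, rounds=500)
print(x1, x2)
```

Additive increase leaves the gap between the two flows unchanged, while every halving cuts the gap in half, so the flows drift toward an equal share regardless of where they start.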

Fairness and UDP

•    multimedia applications often do not run over TCP

•    do not want transmission rate throttled

•    instead, prefer UDP:

•    pump audio/video at constant rate, occasionally lose packet

•    not being fair—do not cooperate nor adjust

Fairness and Parallel TCP Connections

•    A TCP-based application can use multiple parallel connections.

•    Web browsers often use multiple parallel TCP connections. An example: consider a link of rate R supporting nine ongoing connections. New application comes along and also uses

•    one TCP connection, gets transmission rate of R/10.

•    11 parallel TCP connections, gets an unfair allocation of more than R/2.
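The shares in this example follow from simple counting, assuming the link capacity R is divided equally among all TCP connections. The function name is hypothetical.

```python
from fractions import Fraction


def bandwidth_share(new_connections, existing_connections=9):
    """Fraction of R obtained by an application opening `new_connections`."""
    return Fraction(new_connections, existing_connections + new_connections)


print(bandwidth_share(1))   # one connection among ten
print(bandwidth_share(11))  # eleven connections among twenty
```

With eleven of the twenty connections, the new application holds 11/20 of the link, which is more than R/2.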

3.8 Evolution of Transport Layer Functionality

TCP Evolution. HTTP/3, QUIC: functionality in the application layer.

UDP and TCP—the two "work horses" of the Internet transport layer.

QUIC: Quick UDP Internet Connections

Moving transport–layer functions to application layer, on top of UDP.

•    QUIC is a new application-layer protocol designed from the ground up to improve the performance of transport-layer services for secure HTTP.

•    Using UDP as its underlying transport-layer protocol.

•    Deployed on many Google servers, in its apps

QUIC uses many of the approaches for reliable data transfer, congestion control, and connection management studied in this chapter.

Some of QUIC's major features include:

•    Connection-Oriented and Secure. QUIC combines the handshakes needed to establish connection state with those needed for authentication and encryption, thus providing faster establishment than the TCP-plus-TLS protocol stack, where the TCP connection must be set up first and the TLS handshake then runs over it.

•    Streams. Several different application-level "streams" are multiplexed over a single QUIC connection.

•    Reliable, TCP-friendly congestion-controlled data transfer. No HOL blocking. "Readers familiar with TCP's loss detection and congestion control will find algorithms here that parallel well-known TCP ones." (from QUIC specification).
