TCP Congestion Control Mechanisms - White Paper

White Paper

Supporting Differentiated Service Classes: TCP Congestion Control Mechanisms

Chuck Semeria Marketing Engineer

Juniper Networks, Inc. 1194 North Mathilda Avenue Sunnyvale, CA 94089 USA 408 745 2000 or 888 JUNIPER www.juniper.net Part Number: 200022-001 02/02

Contents
Executive Summary
Perspective
TCP Segments and Acknowledgements
    Segments
    TCP Acknowledgement Process
TCP Congestion Control Mechanisms
    Slow-Start
    Congestion Avoidance
    Fast Retransmission
    Fast Recovery
Example: Throughput for a Typical TCP Session
Recent Enhancements to TCP Congestion Control
Summary
Appendix: References
    RFCs
    Textbooks
    Technical Papers

List of Figures
Figure 1: TCP Segments and Sequence Numbers
Figure 2: ACKing a Single Segment or Multiple Segments
Figure 3: Segment Misordering and Duplicate ACKs
Figure 4: TCP Slow-Start
Figure 5: Slow-Start with Congestion Avoidance
Figure 6: Fast Recovery after Fast Retransmission
Figure 7: Throughput for a Sample TCP Flow
Figure 8: Real-World TCP Throughput

Copyright © 2002, Juniper Networks, Inc.

Executive Summary

This paper is part of a series of papers published by Juniper Networks that describe the mechanisms that allow you to support differentiated service classes in large Internet Protocol (IP) networks. This paper provides an overview of the classic Transmission Control Protocol (TCP) congestion control mechanisms, including slow-start, congestion avoidance, fast retransmit, and fast recovery. It also briefly discusses several other minor enhancements that are designed to fine-tune TCP performance when responding to congestion indications. Host TCP congestion control mechanisms work with router active queue memory management techniques including random early detection (RED), weighted RED (WRED), and explicit congestion notification (ECN) to allow you to control the average queuing delay while supporting transient fluctuations in queue size. The other papers in this series provide technical discussions of queue scheduling disciplines, active queue memory management, and other issues related to the deployment of differentiated service classes in your network.

Perspective

TCP is the predominant transport protocol used in public and private IP networks. Depending on the statistics that you reference, TCP-based traffic accounts for 80 to 95 percent of all traffic on large IP networks. Applications frequently send large amounts of data across a network from one host to another. IP is a connectionless protocol that does not guarantee that the data it carries will not be damaged, lost, duplicated, or misordered. Consequently, applications that require a reliable data transfer service use TCP to establish virtual connections across an unpredictable and unreliable network. Without TCP, application developers would have to build reliability (including packet loss detection and recovery) into each application.

The fundamental characteristics of TCP include the following:

■ TCP is a connection-oriented service. Before data can be transmitted between two hosts, one system initiates a connection by “calling” the other system. If the system receiving the connection request accepts the call, messages are exchanged between the two hosts to verify that the session is authorized and to provide parameters that control the data exchange.
■ TCP provides a reliable delivery service. While the data stream is transmitted, the hosts at each end of the connection exchange acknowledgements (ACKs) verifying that data has been received without error. The source TCP maintains a record of the packets that it sends and waits for an ACK before sending the next set of packets. The source TCP also maintains a timer indicating when it sends a packet and retransmits the packet if the timer expires before the ACK is received from the remote host.
■ The source TCP always attempts to fill the “pipe” between the sending and receiving hosts while adapting its transmission rate to avoid potential congestion in the network. TCP continually monitors and modifies its transmission rate so that the rate at which it injects packets into the network is just below the point at which packet loss occurs.


■ All TCP connections are full duplex. This means that a TCP connection supports the simultaneous transfer of data in both directions.

While it is beyond the scope of this paper to provide a complete description of TCP, this paper does provide a brief summary of the relevant TCP congestion control mechanisms that execute on host systems. It is important to understand how TCP responds to congestion if you are to fully understand the mechanisms that routers use to support the delivery of differentiated service classes. Keep in mind that the descriptions in this paper communicate only the essence of these fundamental concepts and take some liberties with what you would actually see if you examined a packet trace in a production network. This paper primarily explains the basics without attempting to explain the behavior of every deviant in every corner case.

TCP Segments and Acknowledgements

This section reviews the following aspects of TCP that are fundamental to understanding its adaptive timeout and retransmission strategy:

■ Segments
■ Acknowledgements

Segments

The basic unit of transfer between two hosts in a TCP connection is called a segment. A segment consists of a TCP header and its associated data. Since each TCP segment is transmitted in an IP datagram and because IP datagrams can be reordered as they cross the network, TCP segments can arrive at the destination TCP in a different order than originally transmitted by the source TCP. Of course they can also be corrupted, dropped, or duplicated along the way.

For the stream of bytes that the source TCP transmits to the destination TCP, the source TCP assigns a sequence number to each byte in the stream. To allow the destination TCP to keep track of what it has received and reorder misordered segments, each TCP header carries a 32-bit sequence number that is used to identify the data carried in each segment. The sequence number in each TCP header is set to the specific number that the source TCP assigns to the first byte of data in the given segment (see Figure 1).

[Figure 1: TCP Segments and Sequence Numbers. A data stream of 8000 bytes is divided into 1000-byte blocks of data, each preceded by a TCP header that carries the sequence number of its first byte: 1, 1001, 2001, 3001, ..., 7001.]

In Figure 1, the source TCP needs to transmit a stream of 8000 bytes to the destination TCP. The source TCP divides the stream into eight 1000-byte groups and prepends a TCP header to each group to create eight TCP segments. Note that the sequence number carried in each TCP header represents the number that the source TCP assigns to the first byte of data carried in each segment.
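The segmentation in Figure 1 can be sketched in a few lines of Python. This is only an illustration: the function name `segment_stream`, the fixed 1000-byte segment size, and the starting sequence number of 1 are assumptions taken from the figure, not from a real TCP stack (which chooses an initial sequence number at connection setup).

```python
from typing import List, Tuple


def segment_stream(data: bytes, segment_size: int = 1000,
                   first_seq: int = 1) -> List[Tuple[int, bytes]]:
    """Split a byte stream into (sequence_number, payload) pairs.

    The sequence number of each segment is the number assigned to the
    first byte of data that the segment carries.
    """
    segments = []
    for offset in range(0, len(data), segment_size):
        payload = data[offset:offset + segment_size]
        segments.append((first_seq + offset, payload))
    return segments


# An 8000-byte stream, as in Figure 1.
segments = segment_stream(bytes(8000))
seq_numbers = [seq for seq, _ in segments]
print(seq_numbers)  # [1, 1001, 2001, 3001, 4001, 5001, 6001, 7001]
```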

TCP Acknowledgement Process

TCP uses ACKs to support the reliable transmission of data. When the source TCP transmits segments, it expects the destination TCP to ACK the segments when they are received. Figure 2 illustrates how the destination TCP can respond to the receipt of segments by either sending an ACK for each individual segment or by acknowledging multiple segments using a single ACK. Note that the ACK number used by the destination TCP is the number of the next byte in the stream that the destination TCP expects to receive from the source TCP. If the destination TCP ACKs 2001, it informs the source TCP that it has successfully received all bytes up to and including byte 2000.

[Figure 2: ACKing a Single Segment or Multiple Segments. In the first exchange, the source TCP sends bytes 1 through 1000 and the destination TCP returns ACK = 1001; the source then sends bytes 1001 through 2000 and the destination returns ACK = 2001. In the second exchange, the source sends both segments and the destination acknowledges them with a single ACK = 2001.]

Because IP packets can be reordered as they cross your network, TCP segments can also be reordered as they cross your network. When the destination TCP receives a misordered segment, it responds by immediately transmitting a duplicate ACK to the source TCP (see Figure 3).


[Figure 3: Segment Misordering and Duplicate ACKs. The source TCP sends six 1000-byte segments. The destination TCP receives segments 1 and 2 and sends ACK = 2001 (a single ACK for all data through byte 2000). Segment 4 then arrives before segment 3; because the destination is expecting byte 2001, it responds with a duplicate ACK = 2001. When segment 3 arrives, the destination sends ACK = 4001 (a single ACK for all data through byte 4000). After segments 5 and 6 arrive, it sends ACK = 6001 (a single ACK for all data through byte 6000).]
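The cumulative ACK behavior shown in Figure 3 can be reproduced with a small receiver model. The sketch below is a simplification under stated assumptions: the function name `cumulative_acks` is hypothetical, the receiver ACKs every arriving segment (Figure 3 shows some ACKs being combined), and segments never partially overlap.

```python
def cumulative_acks(arrivals, first_seq=1):
    """Return the ACK number sent after each arriving (seq, length) segment.

    The ACK number is always the next byte the receiver expects, so a
    segment that arrives ahead of a gap produces a duplicate ACK.
    """
    next_expected = first_seq
    buffered = {}  # out-of-order segments held for later: seq -> length
    acks = []
    for seq, length in arrivals:
        if seq >= next_expected:
            buffered[seq] = length
        # Advance over every buffered segment that is now in sequence.
        while next_expected in buffered:
            next_expected += buffered.pop(next_expected)
        acks.append(next_expected)
    return acks


# Arrival order from Figure 3: segments 1, 2, 4, 3, 5, 6.
arrivals = [(1, 1000), (1001, 1000), (3001, 1000),
            (2001, 1000), (4001, 1000), (5001, 1000)]
acks = cumulative_acks(arrivals)
print(acks)  # [1001, 2001, 2001, 4001, 5001, 6001]
```

The third value is the duplicate ACK: segment 4 arrived while the receiver was still expecting byte 2001.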

TCP Congestion Control Mechanisms

TCP congestion control prevents a source from exceeding network capacity by allowing it to adapt its transmission rate to avoid congestion in routers, on links, or at the destination host. The basic congestion control mechanisms supported by TCP include:

■ Slow-start
■ Congestion avoidance
■ Fast retransmission
■ Fast recovery

Slow-Start

When a TCP connection is first established, the source TCP does not transmit a full receiver’s advertised window of segments. Instead, the source TCP avoids exceeding the capacity of the network by transmitting only a few packets at the beginning, waiting for the ACKs to those packets, and then gradually increasing its transmission rate. This allows the source TCP to probe the network to determine the amount of bandwidth that is currently available for the connection. This slow-start mechanism is used:

■ At the beginning of each new TCP connection
■ When an existing TCP connection is restarted after a long idle period
■ When an existing TCP connection is restarted after the retransmission timer expires

As a result, slow-start keeps TCP from flooding the network with packets when a new TCP session is established or immediately after a period of congestion ends.

Figure 4 illustrates the operation of the TCP slow-start mechanism. With slow-start the sender must maintain a congestion window (cwnd) which represents its estimate of the amount of traffic that the network can absorb without becoming congested (its transmission window size). When a TCP session is first established, cwnd is initialized to the size of a single segment advertised by the destination host at the other end of the connection. The source TCP can transmit the minimum of its cwnd (representing flow control imposed by the sender) and the destination’s advertised window (representing flow control imposed by the receiver).

[Figure 4: TCP Slow-Start. Time-sequence diagram between source and destination: the source sends 1 segment (cwnd = 1); after the ACK it sends 2 segments (cwnd = 2); after those are acknowledged it sends 4 segments (cwnd = 4).]

The source TCP initiates slow-start by transmitting one segment and waiting for its ACK. When the ACK is received, the source increases cwnd from one to two, and two segments are sent. When these two segments are acknowledged, the source increases cwnd from two to four, and four segments are sent. The exponential growth of cwnd continues until either its value exceeds the destination’s advertised window or packets are dropped due to congestion. The following section, "Congestion Avoidance," describes how the source TCP responds to packet loss.
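The doubling described above can be sketched as follows. The function name `slow_start_windows` is hypothetical, windows are counted in whole segments, and the sketch assumes that no packets are lost and that every segment is acknowledged within one round trip.

```python
def slow_start_windows(advertised_window: int, initial_cwnd: int = 1):
    """Return the number of segments sent in each round trip.

    cwnd doubles every round trip until it reaches the destination's
    advertised window, which caps how much the source may send.
    """
    cwnd = initial_cwnd
    windows = []
    while True:
        windows.append(min(cwnd, advertised_window))
        if cwnd >= advertised_window:
            break
        cwnd *= 2
    return windows


print(slow_start_windows(16))  # [1, 2, 4, 8, 16]
print(slow_start_windows(20))  # [1, 2, 4, 8, 16, 20]
```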


The source TCP can determine that a packet has been dropped by the network in one of two ways:

■ Duplicate ACKs
■ Expiration of the retransmission timer

The absence of a single segment in the middle of a transmission window of segments causes the destination TCP to immediately generate a duplicate ACK. Recall that TCP does not send negative acknowledgements (NACKs) or ACKs using packet numbers. Rather, the destination TCP cumulatively ACKs data that has been received in sequence by responding with a sequence number in the data stream. For example, when a destination TCP receives all of the data in the stream up to byte 2000, it responds with an ACK of 2001 indicating that the next segment the destination expects to receive begins with byte 2001. If a segment is dropped by an intermediate router, the destination TCP continues to buffer subsequent packets as they arrive but, because it has not received the next segment that it expected to receive, it continues to ACK 2001. Because the receipt of duplicate ACKs can also mean that the segment is misordered, the source TCP uses the receipt of three duplicate ACKs as an indication that a packet is lost and not misordered.

The loss of the last packet in a transmission window of segments causes the retransmission timer of the source TCP to expire because there are no subsequent segments to generate a duplicate ACK. The source TCP expects the receiver to transmit ACKs as it successfully receives new bytes in the data stream. Each time the sender transmits a segment, it starts a timer and waits for an ACK.

This timer supports adaptive retransmission because the timeout value changes as the sample round trip times (RTTs) of the connection constantly change with the load placed on the network. If the retransmission timer expires before the data in the segment is acknowledged, the source TCP assumes that the segment was either lost or corrupted and retransmits the segment.
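This paper does not give the formula for the adaptive timeout, so the sketch below borrows the smoothing rules standardized in RFC 6298 (and its predecessor, RFC 2988) as an assumption. The class name `RtoEstimator` is hypothetical, clock granularity is ignored, and RTT samples are given in seconds.

```python
class RtoEstimator:
    """Adaptive retransmission timeout from RTT samples (RFC 6298 style)."""

    ALPHA = 1 / 8  # weight of a new sample in the smoothed RTT
    BETA = 1 / 4   # weight of a new sample in the RTT variation
    K = 4          # variation multiplier in the timeout

    def __init__(self, min_rto: float = 1.0):
        self.srtt = None     # smoothed round trip time
        self.rttvar = None   # round trip time variation
        self.min_rto = min_rto

    def update(self, sample: float) -> float:
        """Fold in one RTT sample and return the new timeout in seconds."""
        if self.srtt is None:
            # First measurement initializes both estimators.
            self.srtt = sample
            self.rttvar = sample / 2
        else:
            # Update the variation first, using the previous smoothed RTT.
            self.rttvar = ((1 - self.BETA) * self.rttvar
                           + self.BETA * abs(self.srtt - sample))
            self.srtt = (1 - self.ALPHA) * self.srtt + self.ALPHA * sample
        return max(self.min_rto, self.srtt + self.K * self.rttvar)


estimator = RtoEstimator(min_rto=0.0)  # floor disabled to expose the arithmetic
print(f"{estimator.update(0.100):.4f}")  # 0.3000
print(f"{estimator.update(0.200):.4f}")  # 0.3625
```

As the second sample shows, a jump in the measured RTT raises both the smoothed RTT and the variation term, so the timeout grows as load on the network increases.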

Congestion Avoidance

When the source TCP discovers that a packet has been dropped by the network, it sets the variable ssthresh (slow-start threshold) equal to one-half of the current value of cwnd. The source reduces its transmission rate by returning to slow-start mode, but this time it exponentially increases its transmission rate until cwnd is equal to the value of ssthresh. At this point, the sender increases cwnd linearly (by at most one segment per RTT), allowing it to slowly increase its transmission rate as it begins to approach the previous cwnd value that caused packets to be dropped. When the value of cwnd is less than or equal to ssthresh, the source TCP is in slow-start mode; when cwnd is greater than ssthresh, the source TCP is in congestion avoidance mode (see Figure 5).


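[Figure 5: Slow-Start with Congestion Avoidance. Plot of cwnd against time: slow-start raises cwnd to y, the point of network congestion; ssthresh is set to y/2; cwnd restarts from 1, and congestion avoidance takes over once cwnd reaches ssthresh.]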

Slow-start with congestion avoidance causes TCP to reduce the value of cwnd by half each time it experiences a packet loss. Consequently, if congestion leading to packet loss continues for a period of time, the volume of traffic injected into the network and the rate of retransmission by the source TCP decrease exponentially. This causes the source TCP to back off and allows routers to empty their congested queues.
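The curve in Figure 5 after a loss can be traced with a short sketch. The function name `cwnd_trace` is hypothetical, cwnd is counted in segments and sampled once per round trip, and no further loss is assumed during the trace.

```python
def cwnd_trace(ssthresh: int, rounds: int, initial_cwnd: int = 1):
    """Return cwnd at the start of each round trip after a loss.

    Below ssthresh, cwnd doubles every round trip (slow-start); from
    ssthresh onward it grows by one segment per round trip
    (congestion avoidance).
    """
    cwnd = initial_cwnd
    trace = []
    for _ in range(rounds):
        trace.append(cwnd)
        if cwnd < ssthresh:
            cwnd = min(cwnd * 2, ssthresh)  # exponential growth, capped
        else:
            cwnd += 1                       # linear growth
    return trace


# A loss detected at cwnd = 32 sets ssthresh to one-half of that value.
ssthresh = 32 // 2
print(cwnd_trace(ssthresh, rounds=8))  # [1, 2, 4, 8, 16, 17, 18, 19]
```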

Fast Retransmission

As discussed earlier, TCP assumes that a packet has been dropped when it receives duplicate ACKs. The challenge is that the receipt of duplicate ACKs can also mean that the packet is simply out of order. Rather than immediately responding to a duplicate ACK by retransmitting the lost segment, the source TCP waits until it receives three duplicate ACKs. Fast retransmission enhances TCP performance in the following ways:

■ Eliminates unnecessary packet retransmission and wasted network capacity if the packet is simply out of order and not dropped
■ Allows higher channel utilization and connection throughput
■ Allows TCP to resend a potentially lost segment without waiting for the retransmission timer to expire
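The three-duplicate-ACK rule can be sketched as a simple counter on the sender. The function name `fast_retransmit_points` is hypothetical, and the ACK streams are made-up examples in the style of Figure 3.

```python
def fast_retransmit_points(acks, threshold: int = 3):
    """Return the ACK numbers that would trigger a fast retransmission.

    A retransmission is triggered when the same ACK number has been
    repeated `threshold` times after its first arrival.
    """
    triggers = []
    last_ack = None
    dup_count = 0
    for ack in acks:
        if ack == last_ack:
            dup_count += 1
            if dup_count == threshold:
                triggers.append(ack)  # resend the segment starting here
        else:
            last_ack = ack
            dup_count = 0
    return triggers


# Misordering: a single duplicate ACK, so nothing is retransmitted.
print(fast_retransmit_points([1001, 2001, 2001, 4001]))  # []
# Loss: three duplicate ACKs for byte 2001 trigger a retransmission.
print(fast_retransmit_points([1001, 2001, 2001, 2001, 2001, 5001]))  # [2001]
```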

Fast Recovery

When the source TCP receives duplicate ACKs, data is still flowing to the destination because the destination TCP can generate duplicate ACKs only if subsequent segments are received. In this case, the source TCP does not suddenly reduce the flow of data by returning to slow-start. Instead, after responding to the receipt of three duplicate ACKs by retransmitting the lost segment, the source TCP sets cwnd to half its current value and performs congestion avoidance. This provides better overall throughput for the TCP session (see Figure 6).


[Figure 6: Fast Recovery after Fast Retransmission. Plot of cwnd (levels 1, y/2, and y marked) against time, annotated "Sender receives a duplicate ACK".]

Fast recovery prevents the TCP session pipe from being completely empty after the fast retransmission of a single lost segment. This enhances TCP session performance by eliminating the need to return to slow-start and then slowly fill the TCP session pipe after a single packet loss from a window of data. However, while fast recovery improves TCP performance when a single packet is dropped from a window of data, it does not improve performance when multiple packets are dropped from a window of data.
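The difference between recovering with and without fast recovery can be summarized in one function. This is a sketch of the behavior as this paper describes it: the function name `react_to_loss` is hypothetical, windows are counted in segments, the floor of two segments on ssthresh follows RFC 2581, and the temporary window inflation that real implementations perform during recovery is omitted.

```python
def react_to_loss(cwnd: int, signal: str, fast_recovery: bool = True):
    """Return (new_cwnd, new_ssthresh) after a loss indication.

    signal is "dupacks" (three duplicate ACKs) or "timeout"
    (retransmission timer expired).
    """
    if signal not in ("dupacks", "timeout"):
        raise ValueError(f"unknown loss signal: {signal}")
    ssthresh = max(cwnd // 2, 2)
    if signal == "dupacks" and fast_recovery:
        # Data is still flowing: halve cwnd and continue in
        # congestion avoidance instead of restarting.
        return ssthresh, ssthresh
    # Timeout, or a sender without fast recovery: back to slow-start.
    return 1, ssthresh


print(react_to_loss(32, "dupacks"))                       # (16, 16)
print(react_to_loss(32, "timeout"))                       # (1, 16)
print(react_to_loss(32, "dupacks", fast_recovery=False))  # (1, 16)
```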

Example: Throughput for a Typical TCP Session

Figure 7 illustrates the throughput for a sample TCP session over time.

[Figure 7: Throughput for a Sample TCP Flow. Plot of cwnd (levels 1, y/2, and y marked) against time, with the point of network congestion at y. Curve A is labeled slow-start. Curves B and C are labeled fast recovery and congestion avoidance. A retransmission timer time-out interval precedes curve D, which is labeled slow-start and congestion avoidance. Curves E, F, and G are labeled fast recovery and congestion avoidance.]

When a TCP connection is first established, the source TCP attempts to avoid immediately overloading the network by assuming that the network has very little capacity. As shown in throughput curve A, TCP begins with slow-start but rapidly increases its transmission rate to quickly determine the current capacity of the network.


Slow-start eventually reaches a transmission rate where the source TCP receives duplicate ACKs indicating either the loss or misordering of a segment in the middle of a transmission window of segments. As shown in throughput curve B, the source TCP sets ssthresh equal to one-half of the current value of cwnd and then performs fast recovery with congestion avoidance as it approaches the previous value of cwnd that resulted in packet loss.

Fast recovery with congestion avoidance will eventually reach a transmission rate where the source TCP receives duplicate ACKs indicating either the loss or misordering of a segment in the middle of a transmission window of segments. As shown in throughput curve C, the source TCP sets ssthresh equal to one-half of the current value of cwnd and then performs fast recovery with congestion avoidance.

Now, assume that the last segment in a transmission window of segments is dropped by an intermediate router due to a buffer overload. This causes the source TCP to wait for the retransmission timer to expire before it can retransmit the lost segment. As shown in throughput curve D, the source TCP sets ssthresh equal to one-half of the current value of cwnd, returns to slow-start, and performs congestion avoidance. Eventually the source TCP reaches a transmission rate where it receives a duplicate ACK indicating either the loss or misordering of a segment in the middle of a stream of segments.

Throughput curves E, F, and G show TCP in its familiar “sawtooth” operational mode. These throughput curves show how the source TCP periodically receives duplicate ACKs indicating either the loss or misordering of a segment in the middle of a transmission window of segments. TCP responds by executing fast recovery with congestion avoidance as it continues to probe the network.

Figure 7 presents an idealized version of the throughput for a TCP session because the value of cwnd that results in packet loss remains constant. In a production network, the value of cwnd that results in packet loss is constantly changing, resulting in a real-world throughput curve that looks more like Figure 8.

[Figure 8: Real-World TCP Throughput. Plot of cwnd against time, with the point of network congestion indicated.]
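The idealized sawtooth of curves E, F, and G can be generated with a few lines. The function name `sawtooth` is hypothetical; as in Figure 7, the sketch assumes that loss always occurs at the same cwnd value and that every loss is detected through duplicate ACKs, so the sender never returns to slow-start.

```python
def sawtooth(loss_point: int, rounds: int):
    """Return cwnd per round trip for an idealized steady-state flow.

    cwnd grows by one segment per round trip (congestion avoidance)
    and is halved each time it reaches loss_point (fast recovery).
    """
    cwnd = loss_point // 2
    trace = []
    for _ in range(rounds):
        trace.append(cwnd)
        if cwnd >= loss_point:
            cwnd = cwnd // 2  # duplicate ACKs: halve and keep sending
        else:
            cwnd += 1         # probe for more bandwidth
    return trace


print(sawtooth(loss_point=8, rounds=12))
# [4, 5, 6, 7, 8, 4, 5, 6, 7, 8, 4, 5]
```

Replacing the constant `loss_point` with a value that varies from one round trip to the next would give a curve closer to the irregular shape of Figure 8.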

Recent Enhancements to TCP Congestion Control

Knowledge about how TCP maintains fairness among TCP flows that share a common bottleneck link, maximizes the packet throughput for each session, and avoids congesting the network has increased dramatically since the mid to late 1980s. Modern TCP implementations support a number of different algorithms designed to control network congestion and maintain suitable packet throughput, as follows:


■ TCP Tahoe, first implemented in 4.3 BSD Tahoe TCP in 1988, initiated support for three of the key algorithms discussed above: slow-start, congestion avoidance, and fast retransmit. These algorithms were originally proposed by Van Jacobson in 1988.
■ TCP Reno, first implemented in 4.3 BSD Reno TCP in 1990, supports all of the Van Jacobson enhancements introduced in TCP Tahoe and extends the fast retransmit algorithm to support fast recovery. By supporting fast recovery, TCP Reno overcomes the throughput performance limitations of TCP Tahoe that occur when a single packet is lost from a window of data.
■ TCP Vegas, discussed in a number of research papers in 1994, enhances the congestion avoidance algorithm of TCP Tahoe and TCP Reno by dynamically increasing and decreasing the transmission window size according to the observed RTT of the packets that it has previously sent. If the observed RTT becomes large, the network is experiencing congestion, causing TCP Vegas to reduce its window size. Likewise, if the observed RTT becomes small, the network is not experiencing congestion, causing TCP Vegas to increase its window size. Another modification introduced by TCP Vegas is that during slow-start the rate of cwnd increase is half that of TCP Tahoe and TCP Reno: cwnd is doubled every other RTT instead of every RTT.
■ TCP selective acknowledgement (SACK), specified in Request for Comments (RFC) 2018 (October 1996), enhances the throughput performance of TCP Reno when multiple packets are dropped from a single window of data. When a TCP receiver observes that arriving packets are not continuous (the packets are out of order), it responds to the TCP sender with ACKs that contain the SACK option. This option contains information that allows the TCP sender to specifically identify which packets have been received by the destination TCP. This information allows the TCP sender to accurately determine which segments are missing and retransmit only the missing packets. The TCP SACK option is currently being implemented in many popular operating systems and will soon be widely deployed.
■ TCP NewReno, specified in RFC 2582 (April 1999), enhances TCP throughput performance when multiple packets are dropped from a single window of data for TCP Reno connections that do not support the TCP SACK option. When multiple packets are dropped from a single window of data, the ACK for the retransmitted packet acknowledges some but not all of the packets transmitted before the fast retransmit. This is referred to as a partial ACK. During fast recovery when a TCP sender receives a partial ACK, the TCP sender concludes that the indicated packet was lost and retransmits that packet. TCP NewReno overcomes the throughput performance penalty when multiple segments are dropped from a single window of data.
■ The duplicate-SACK (D-SACK) extension, specified in RFC 2883 (July 2000), allows a TCP receiver to use a SACK to report the receipt of duplicate segments. This extension allows the TCP sender to identify the segments received by the TCP receiver, including duplicate segments. If the TCP sender determines that the destination TCP received two copies of a segment and that the retransmission of the duplicate segment was unnecessary, the TCP sender can undo the halving of cwnd. The D-SACK extension overcomes the throughput performance penalty that results from halving the congestion window.
■ The Limited Transmit extension, specified in RFC 3042 (January 2001), enhances TCP throughput performance by avoiding unnecessary retransmit timeouts. The source TCP, instead of transmitting the packet suspected of being dropped, transmits a new segment after receiving one or two duplicate ACKs. The Limited Transmit mechanism enhances TCP throughput performance by allowing a TCP session with a small window to recover from less than a full window of packet loss without a retransmit timeout. Recall that explicit congestion notification (ECN) is an active queue management mechanism that also helps to avoid unnecessary retransmit timeouts.

The list of proposed enhancements continues to grow as research and standards communities expand their knowledge about TCP performance and develop new extensions as new problems arise. However, the following challenges are associated with these enhancements:

■ The enhancements require a considerable amount of time to achieve widespread deployment. This means that you can be assured that your network will be required to carry flows where TCP senders and receivers execute divergent TCPs. These TCPs are implemented by different vendors, execute on different operating systems, support different sets of congestion control algorithms, vary in their conformance to Internet Engineering Task Force (IETF) standards, provide different levels of performance when running on hosts than when running on servers, and can interact in unpredictable ways when executing congestion control algorithms.
■ Most of these enhancements are designed to streamline long-term TCP sessions by avoiding unnecessary retransmit timeouts and improving performance when experiencing reordered, delayed, or corrupted packets. They are not specifically designed to enhance the performance of most of the traffic found on large IP networks: short-term Web-based traffic flows.

Since host response to active queue memory management techniques (RED, WRED, and ECN) determines how well routers can manage congestion in the core of your network, you need to be aware of the many issues that determine if the flows traversing your network are TCP compatible, non-TCP compatible, or simply nonresponsive.
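To illustrate how the SACK option described in the list of enhancements lets a sender retransmit only the missing data, the sketch below computes the gaps implied by a cumulative ACK and a set of SACK blocks. The function name `missing_ranges` is hypothetical; the block format (left edge inclusive, right edge one past the last received byte) follows RFC 2018, and the byte values are made up in the style of Figure 3.

```python
def missing_ranges(cumulative_ack: int, sack_blocks):
    """Return (first_byte, last_byte) ranges the sender should resend.

    cumulative_ack is the next byte the receiver expects. Each SACK
    block is (left_edge, right_edge): the first byte received and the
    byte immediately following the last byte received.
    """
    holes = []
    next_needed = cumulative_ack
    for left, right in sorted(sack_blocks):
        if left > next_needed:
            holes.append((next_needed, left - 1))
        next_needed = max(next_needed, right)
    return holes


# The receiver holds everything through byte 2000, plus bytes
# 3001 through 4000 and bytes 5001 through 6000.
print(missing_ranges(2001, [(3001, 4001), (5001, 6001)]))
# [(2001, 3000), (4001, 5000)]
```

Without SACK, the sender in this example would learn only that byte 2001 is missing; the second gap would surface one round trip later, which is the multiple-loss penalty that both SACK and NewReno address.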

Summary

This paper covered several TCP congestion control mechanisms that allow host systems to respond to implicit or explicit congestion indications. The ability of host systems to control their transmission rate allows you to manage the average queuing delay in core routers and support transient fluctuations in queue size. These standard TCP congestion control mechanisms include slow-start, congestion avoidance, fast retransmit, and fast recovery. This paper also described several recent enhancements to TCP, including TCP Reno, TCP Vegas, TCP SACK, TCP NewReno, D-SACK, and Limited Transmit.


Appendix: References

RFCs

RFC 793, Postel, J. “Transmission Control Protocol - DARPA Internet Program Protocol Specification.” DARPA, September 1981.
RFC 813, Clark, D. “Window and Acknowledgment Strategy in TCP.” July 1982.
RFC 1072, Jacobson, V. and R. Braden. “TCP Extensions for Long-Delay Paths.” October 1988.
RFC 1191, Mogul, J. and S. Deering. “Path MTU Discovery.” November 1990.
RFC 1323, Jacobson, V., Braden, R., and D. Borman. “TCP Extensions for High Performance.” May 1992.
RFC 2018, Mathis, M., Mahdavi, J., Floyd, S. and A. Romanow. “TCP Selective Acknowledgement Options.” October 1996.
RFC 2414, Allman, M., Floyd, S. and C. Partridge. “Increasing TCP's Initial Window Size.” September 1998.
RFC 2581, Allman, M., Paxson, V. and W. Stevens. “TCP Congestion Control.” April 1999.
RFC 2582, Floyd, S. and T. Henderson. “The NewReno Modification to TCP’s Fast Recovery Algorithm.” April 1999.
RFC 2883, Floyd, S., Mahdavi, J., Mathis, M., and M. Podolsky. “An Extension to the Selective Acknowledgement (SACK) Option for TCP.” July 2000.
RFC 3042, Allman, M., Balakrishnan, H., and S. Floyd. “Enhancing TCP's Loss Recovery Using Limited Transmit.” January 2001.

Textbooks

Comer, Douglas. Internetworking with TCP/IP Vol. II: ANSI C Version: Design, Implementation, and Internals. Prentice Hall, June 1998. (ISBN 0139738436)
Comer, Douglas and Stevens, David L. Internetworking with TCP/IP Vol. I: Principles, Protocols, and Architecture. Prentice Hall, February 2000. (ISBN 0130183806)
Huston, Geoff. Internet Performance Survival Guide: QoS Strategies for Multiservice Network. John Wiley & Sons, February 2000. (ISBN 0471378089)
Partridge, Craig. Gigabit Networking. Addison-Wesley Pub. Co., January 1994. (ISBN 0201563339)
Stevens, W. Richard. TCP/IP Illustrated, Volume 1: The Protocols. The Addison-Wesley Professional Computing Series, Addison-Wesley Pub. Co., January 1994. (ISBN 0201633469)
Stevens, W. Richard and Wright, Gary R. (Contributor). TCP/IP Illustrated, Volume 2: The Implementation. The Addison-Wesley Professional Computing Series, Addison-Wesley Pub. Co., January 1995. (ISBN 020163354X)

Technical Papers

Allman, M. and A. Faulk. “On the Effective Evaluation of TCP.” ACM Computer Communication Review. October 1999.
Fall, K. and S. Floyd. “Simulation-Based Comparisons of Tahoe, Reno and SACK TCP.” Computer Communication Review. July 1996.
Floyd, S. “A Report on Some Recent Developments in TCP Congestion Control.” June 2000.
Hoe, J. “Improving the Start-Up Behavior of a Congestion Control Scheme for TCP.” ACM SIGCOMM, August 1996.
Jacobson, V. “Congestion Avoidance and Control.” Computer Communication Review, vol. 18, no. 4, pp. 314-329, August 1988.

Copyright © 2002, Juniper Networks, Inc. All rights reserved. Juniper Networks is registered in the U.S. Patent and Trademark Office and in other countries as a trademark of Juniper Networks, Inc. G10, Internet Processor, Internet Processor II, JUNOS, JUNOScript, M5, M10, M20, M40, M40e and M160 are trademarks of Juniper Networks, Inc. All other trademarks, service marks, registered trademarks, or registered service marks are the property of their respective owners. All specifications are subject to change without notice. Juniper Networks assumes no responsibility for any inaccuracies in this document. Juniper Networks reserves the right to change, modify, transfer, or otherwise revise this publication without notice.
