
It's the bandwidth, stupid.

June 4th, 2007 (10:05 pm)

I saw this blog entry linked on Digg (it currently has over 2000 diggs), and felt that I should respond to it.

The author claims that poor latency is causing problems with TCP congestion control algorithms. Basically, this entire article is based on a flawed understanding of how TCP works.

TCP has built-in congestion control algorithms that attempt to determine the available bandwidth between two hosts on a network, and from that the rate at which to transmit data. If you transmit data faster than the link can handle, you end up with lost packets, whereas if you transmit too slowly, you aren't using the full capacity of your network, so it's important to find the optimum point. These algorithms aren't based on latency: latency can affect them in some ways, but their ability to determine the available bandwidth is, in general, not affected by it.
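As a rough sketch of the core idea, the algorithms do additive increase/multiplicative decrease (AIMD). The code below is purely illustrative (the names are mine, not any real stack's); real implementations such as TCP Reno layer slow start, fast retransmit and retransmission timeouts on top of this.

    # Minimal sketch of additive-increase/multiplicative-decrease (AIMD),
    # the core idea behind TCP congestion control. Illustrative only.

    MSS = 1460  # maximum segment size in bytes (typical for Ethernet)

    def update_cwnd(cwnd, packet_lost):
        """Return the new congestion window after one round trip."""
        if packet_lost:
            # Multiplicative decrease: loss signals congestion, so back off.
            return max(MSS, cwnd // 2)
        # Additive increase: no loss, so cautiously probe for more bandwidth.
        return cwnd + MSS

    # Example: the window grows steadily until a loss halves it.
    cwnd = 10 * MSS
    for lost in [False, False, False, True, False]:
        cwnd = update_cwnd(cwnd, lost)
        print(cwnd)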

The author uses the analogy of passing sand scoops over a wall to explain his point. Unfortunately, it's a false analogy. A better analogy would be trucks driving between cities. Imagine that you have two warehouses, one in Southampton and one in Manchester. You want to transport things from Southampton to Manchester, so you put them on a truck; the truck drives to Manchester and then drives back again.

Suppose you move the Manchester depot to Edinburgh instead. Now the trucks have to drive a lot further. If you only have one truck, doubling the latency halves the transfer rate. However, the point to realise is that with TCP, there is more than one truck. The author says, "As distance increases, the TCP window shrinks". This is the exact opposite of what happens in TCP. To use the trucks analogy again, if you increase the distance between depots, the logical thing to do is to increase the number of trucks to sustain the same throughput. This is exactly what TCP does. TCP window size = number of trucks. Latency increase leads to window size increase.
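To put numbers on the trucks: the window needed to sustain a given throughput is just the bandwidth-delay product. A back-of-the-envelope sketch, with made-up figures:

    # Bandwidth-delay product: how much data must be "in flight" (how many
    # trucks on the road) to keep a link busy. Numbers are illustrative.

    def window_needed(bandwidth_bps, rtt_seconds):
        """Window size in bytes needed to fill the link."""
        return bandwidth_bps * rtt_seconds / 8

    # An 8 Mbit/s link at 20 ms RTT needs only a ~20 KB window...
    print(window_needed(8_000_000, 0.020))   # 20000.0 bytes

    # ...but the same link at 200 ms RTT needs ~200 KB: ten times the
    # latency, ten times the trucks, same throughput.
    print(window_needed(8_000_000, 0.200))   # 200000.0 bytes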

There are flaws in the existing congestion control algorithms. For example, there is a problem that people are experiencing on very high bandwidth connections where TCP window size does not scale up fast enough. However, this only affects very high bandwidth networks: 10 gigabits or more. This isn't something that will affect users on a home DSL line.

Finally, yes, latency is important for certain applications. Gaming and video conferencing are two examples where latency matters enormously, because both depend on fast round trips rather than raw throughput. Arguably, the popularity of Web 2.0 applications, where users need fast updates from web servers, also gives latency increased importance. However, when speaking about download speeds, latency is irrelevant. Here, bandwidth is all that matters.

Comments

Posted by: fragglet (fragglet)
Posted at: June 4th, 2007 09:25 pm (UTC)
preemptive

I'm just posting this comment to preempt any "the Internet isn't a truck, it's a series of tubes" jokes.

Posted by: LionsPhil (lionsphil)
Posted at: June 4th, 2007 09:40 pm (UTC)
Re: preemptive

Spoilsport.

Posted by: LionsPhil (lionsphil)
Posted at: June 7th, 2007 07:52 pm (UTC)
BOOYAH!

CAN I HAZ REMOTE LINKING?

Posted by: ((Anonymous))
Posted at: June 4th, 2007 10:08 pm (UTC)
Thanks for linking to my blog article

Fragglet,

Thanks for linking to my article. You are correct that TCP will, to use your analogy, add more trucks, up to a point. How many trucks TCP will support depends on the host OSs on each end, many of which still default to a 64 KB TCP window. You are incorrect in assuming this only applies to 10 Gb/s links. On a long-distance WAN, latency can have an impact on T1s. You are also incorrect in assuming that the TCP window will not shrink due to congestion control algorithms. Most high-latency connections will also experience an increase in packet loss. When packets are lost, the congestion algorithm will decrease the congestion window.
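To put rough numbers on that 64 KB default (a sketch with illustrative figures, not measurements from any particular link): with a fixed window, throughput is capped at window divided by RTT, no matter how fat the pipe is.

    # With a fixed window, TCP throughput is capped at window / RTT,
    # regardless of the link's raw bandwidth. Figures are illustrative.

    WINDOW = 64 * 1024  # bytes: a common default TCP window

    def max_throughput_mbps(rtt_seconds):
        """Best-case throughput in Mbit/s for a fixed 64 KB window."""
        return WINDOW * 8 / rtt_seconds / 1_000_000

    print(max_throughput_mbps(0.010))  # ~52 Mbit/s on a 10 ms LAN path
    print(max_throughput_mbps(0.150))  # ~3.5 Mbit/s across an ocean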

While my sandbag analogy is not perfect, it does describe a fairly complex concept in language that a non-technical person can understand. As distance increases, it takes longer for the packet to travel round trip (the wall) and in some cases the TCP window (the container) shrinks. I hope you will check back in for the 2nd part in the series. In that article I will discuss what to do about latency. This includes tweaking the host TCP stack to increase the "number of trucks" as well as using network accelerators.

Thanks,

-Bill
http://www.edgeblog.net

Posted by: fragglet (fragglet)
Posted at: June 5th, 2007 12:05 am (UTC)
Re: Thanks for linking to my blog article

> Thanks for linking to my article. You are correct that TCP will, to use
> your analogy, add more trucks, up to a point. How many trucks TCP will
> support depends on the host OSs on each end

This is incorrect. The congestion control algorithms run on the sending side, not the destination. It is the behaviour of the congestion control algorithms of the sending OS that determines the TCP window size.

> Most high-latency connections will also experience an increase in packet
> loss. When packets are lost, the congestion algorithm will decrease the
> congestion window.

This is the normal behaviour of the congestion control algorithms. Furthermore, you're making the flawed assumption that latency causes packet loss, which is not true. Latency and packet loss are both symptoms of network congestion, caused by bandwidth being maxed out at a router. To understand why this is the case, you have to think about how routers work. Packets arrive at a router and are put into a queue. They get transferred over some form of link and retransmitted onto another network.

In an ideal situation, the queue has at most one packet stored in it. If packets arrive faster than the bandwidth of the link between the networks (or the bandwidth of the networks themselves), the queue backs up, as packets are held, waiting for the next one to be retransmitted. It's kind of like 30 people all trying to get onto a bus at once.

In the extreme situation, packets get lost because the queue is of limited size (you can't keep queueing packets forever). So after a while, any further incoming packets simply get dropped. There are other reasons for dropped packets, but they basically all involve your network hardware being broken. Network congestion due to lack of bandwidth is the main cause of packet loss. This is why the congestion control algorithms use it as a signal to reduce the transmit rate (ie. shrink the congestion window).
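If you want to see this in miniature, here's a toy model of a tail-drop router queue (purely illustrative; real routers use more sophisticated queueing disciplines). It shows both symptoms at once: packets waiting in the queue are the latency, and packets that don't fit are the loss.

    from collections import deque

    # Toy model of a tail-drop router queue. Entirely illustrative.

    class RouterQueue:
        def __init__(self, capacity):
            self.queue = deque()
            self.capacity = capacity
            self.dropped = 0

        def enqueue(self, packet):
            if len(self.queue) >= self.capacity:
                self.dropped += 1          # queue full: tail drop
            else:
                self.queue.append(packet)  # waiting here IS the latency

        def dequeue(self):
            return self.queue.popleft() if self.queue else None

    # Packets arriving faster than the outgoing link can drain them
    # first build up queueing delay, then start being dropped.
    q = RouterQueue(capacity=4)
    for n in range(10):  # burst of 10 packets arrives at once
        q.enqueue(n)
    print(len(q.queue), q.dropped)  # 4 queued (delayed), 6 dropped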

I seriously suggest you go and read Jacobson's original paper on congestion avoidance, as it explains the problems of congestion avoidance from first principles and how the TCP Reno algorithms help solve these.

Posted by: ((Anonymous))
Posted at: June 5th, 2007 12:53 am (UTC)
Re: Thanks for linking to my blog article

Fragglet,

I appreciate your comments and your interest. 3 things:

1) Most applications these days, but certainly not all, are bi-directional. Sometimes I'm the sender and sometimes I'm the receiver. That's why I said the host on each side. Since the premise of this article is a WAN design, where sometimes clients are sending data and sometimes they are receiving it, I need to be aware of the limitations at both ends.

2) I did not say that latency causes packet loss; I said there was a correlation between the two. I will drop more packets on my trans-atlantic MPLS circuit than on my point-to-point link between two locations in California, all other things being equal.

3) Even if the window stays the same size, it still takes longer for a complete round-trip transaction to occur over a high-latency connection. This is the whole reason RFC 1323 exists!

Obviously your contention is that high-latency networks should not have a problem because TCP will magically deal with the issue. The simple fact is that this is not the case. That is why companies design around this problem with CDNs and network accelerators. I hope you'll check out and link to part 2 tomorrow.

Thanks,

-Bill

Posted by: fragglet (fragglet)
Posted at: June 5th, 2007 01:10 am (UTC)
Re: Thanks for linking to my blog article

> 1) Most applications these days, but certainly not all, are
> bi-directional. Sometimes I'm the sender and sometimes I'm the receiver.
> That's why I said the host on each side. Since the premise of this article
> is a WAN design, where sometimes clients are sending data and sometimes
> they are receiving it, I need to be aware of the limitations at both ends.

Although you're right in that most network protocols are bi-directional (eg. HTTP), congestion control only takes effect when the congestion window is reached. In a typical download over HTTP, the client making the request will not reach the congestion window size. The congestion control algorithms on the client are therefore irrelevant. It's the server's algorithms that matter, because the server is the one sending lots of data and hitting the congestion window ceiling.

> 2) I did not say that latency causes packet loss; I said there was a
> correlation between the two. I will drop more packets on my trans-atlantic
> MPLS circuit than on my point-to-point link between two locations in
> California, all other things being equal.

Correlation does not equal causation! As I explained, high latencies and lost packets are both symptoms of network congestion. What is the solution to network congestion? .... add more bandwidth!

> 3) Even if the window stays the same size, it still takes longer for a
> complete round trip transaction to occur over a highly latent connection.
> This is the whole reason RFC 1323 exists!

Actually, no. Read the introduction to RFC 1323, which explains the reasons for its existence:

> The introduction of fiber optics is resulting in ever-higher
> transmission speeds, and the fastest paths are moving out of the
> domain for which TCP was originally engineered.

> Obviously your contention is that high-latency networks should not have a
> problem because TCP will magically deal with the issue. The simple fact is
> that this is not the case. That is why companies design around this
> problem with CDNs and network accelerators.

No, this is not what I am saying. You are saying that latency causes network problems and that by improving latency you can improve your network. I assert that this is false. If you have latency problems, they are a symptom of network congestion. If your network is suffering from serious congestion, it probably needs more bandwidth.

Posted by: ((Anonymous))
Posted at: June 5th, 2007 02:41 am (UTC)
Re: Thanks for linking to my blog article

Fragglet,

>”You are saying that latency causes network problems and that by improving latency you can improve your network. I assert that this is false. If you have latency problems, they are a symptom of network congestion. If your network is suffering from serious congestion, it probably needs more bandwidth.”

Wow. It is impressive how someone can miss the point so completely so many times. While network congestion will add to latency, latency is in and of itself a problem. In a network with zero congestion, latency will still be a problem. The problem is distance. More bandwidth cannot improve upon the speed of light. Sorry. This is the whole point of my article. Latency does cause issues unrelated to bandwidth or congestion. Those issues can be reduced with planning.

Thanks for commenting.

-Bill

Posted by: fragglet (fragglet)
Posted at: June 5th, 2007 08:30 am (UTC)
Re: Thanks for linking to my blog article

Yes, and I've explained to you how this is not a problem with TCP. As the latency due to distance increases, the number of packets in transit (TCP window size) is increased. So latency will not affect download speeds.

Posted by: juliettepizag (juliettepizag)
Posted at: July 16th, 2008 07:11 pm (UTC)

You have to make sure that the TCP window size is set large enough to reduce the idle time of a sender waiting for an ACK to get back from the receiver before it can resume transmitting packets.

Posted by: fragglet (fragglet)
Posted at: July 16th, 2008 10:16 pm (UTC)
Window size

You don't really "set" window size with TCP; it's determined automatically by the congestion control algorithms.

Posted by: lizzystine (lizzystine)
Posted at: October 9th, 2008 04:33 pm (UTC)

For example, information about the other players is sent to the client even when the client has no need for it (the high-cycle-demand filtering that decides which "enemies the player sees" is in the client; the server may send data about all of the players in each "map", "field", "realm", etc.).

Posted by: ((Anonymous))
Posted at: June 4th, 2007 10:34 pm (UTC)
Seriously?

Do you know how networks work? Distance and latency will affect every TCP application.

Posted by: fragglet (fragglet)
Posted at: June 5th, 2007 12:15 am (UTC)
Re: Seriously?

Yes Bill, I do know how networks work, and I also know how to identify someone posting on my blog by their IP address.

Posted by: ((Anonymous))
Posted at: June 5th, 2007 12:38 am (UTC)
Re: Seriously?

That one wasn't me. I sign my posts. Most likely, someone I work with (it's a big company). Maybe you also know how NAT works?!? ;-)

-Bill

Posted by: fragglet (fragglet)
Posted at: June 5th, 2007 01:02 am (UTC)
Re: Seriously?

AT&T WorldNet Services ATT (NET-12-0-0-0-1)
                                  12.0.0.0 - 12.255.255.255
COPART COPART435-210 (NET-12-25-210-0-1)
                                  12.25.210.0 - 12.25.210.255
AT&T Worldnet seems to be their residential option. AT&T Business Internet would be the business/commercial one. Perhaps it was your wife or one of your kids that made the comment?

Posted by: ((Anonymous))
Posted at: November 7th, 2007 11:18 am (UTC)
Latency _is_ a problem

As a network engineer I think I have a fairly good grasp on latency, bandwidth and TCP congestion control algorithms.

I have to say: Latency _IS_ a problem.

I totally understand that latency is _less_ important for downloading things, because you have large TCP packets and congestion control algorithms are effective.
I'm not going to explain all the limitations of these congestion algorithms, but there are a lot of them.

Unfortunately, not all TCP communication lasts long enough for these congestion control algorithms to do their work (an HTTP request and reply, for example).

And not all traffic is TCP, so these TCP congestion control algorithms do not apply anyway (DNS requests, gaming, ESP, VoIP, ...).

Latency _is_ a problem, which can be _solved_ by good network design, or sometimes _worked around_ by controlling window sizes as TCP does.
I want to stress "sometimes".

So, for large downloads, latency is not very important.
For everything else, it is.

Posted by: fragglet (fragglet)
Posted at: November 7th, 2007 11:57 am (UTC)
Re: Latency _is_ a problem

You're correct, except that a large proportion of the "normal" uses of the Internet involve downloading things: web browsing, email, etc. HTTP requests and replies are included in this, because although a single HTTP request doesn't last for long, multiple HTTP requests are typically combined in a single TCP connection used for downloading a complete website (HTML pages, images, etc.). This gives the congestion control algorithms the opportunity to adjust. I explain some of this in my second followup.
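For instance, Python's standard http.client module shows the idea: several requests reuse one persistent TCP connection, giving the sender's congestion window time to grow. The hostname and paths here are just placeholders.

    import http.client

    # Several HTTP requests over one persistent TCP connection, giving
    # the sender's congestion window time to grow. "www.example.com"
    # and the paths are placeholders, not real resources.
    conn = http.client.HTTPConnection("www.example.com")
    for path in ["/index.html", "/style.css", "/logo.png"]:
        conn.request("GET", path)
        response = conn.getresponse()
        body = response.read()  # drain the body before reusing the socket
        print(path, response.status, len(body))
    conn.close()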

You're correct about DNS requests, but the latency on a DNS request should never be too big, because you should be using the DNS server provided by your ISP. If the DNS replies from your ISP's server are in the range of hundreds of milliseconds, then yes, there is something seriously wrong with the network :-)

There are latency sensitive applications (games and VoIP are good examples), but in the majority of uses of networks, bandwidth is more important.
