HTTP Long Connections: Persistent Connections and Connection: Keep-Alive

March 24, 2018

[title]Persistent Connections[/title]

Basic Concepts

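HTTP/1.1 (and the various enhanced versions of HTTP/1.0) allows HTTP devices to keep a TCP connection open after a transaction completes, so the existing connection can be reused for future HTTP requests. A TCP connection that stays open after a transaction completes is called a persistent connection.
A non-persistent connection is closed after each transaction.
Reusing a TCP connection speeds up data transfer because it:

avoids going through the slow connection-setup phase (and the teardown) for every request, which saves time and bandwidth;
avoids the congestion-adaptation phase of TCP slow start on every new connection.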
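The sketch below gives a rough sense of the saving. It uses a deliberately simplified model, which is an assumption and not something the text specifies. Each new TCP connection costs one round trip (RTT) for the handshake, and each request/response costs one more RTT. Teardown, slow start, TLS and server processing time are ignored. The class name `ReuseLatency` is hypothetical.

```java
/** Back-of-the-envelope latency model (hypothetical; simplified assumptions). */
final class ReuseLatency {
    /**
     * Total time for n sequential requests, assuming one RTT for each TCP
     * handshake and one RTT for each request/response exchange.
     */
    static long totalMs(int n, long rttMs, boolean reuseConnection) {
        // One handshake in total when the connection is reused, one per request otherwise
        long handshakes = reuseConnection ? 1 : n;
        return handshakes * rttMs + (long) n * rttMs;
    }
}
```

With 10 requests over a 50 ms link, `totalMs(10, 50, false)` gives 1000 ms and `totalMs(10, 50, true)` gives 550 ms. Real measurements will differ, because the model leaves out slow start, which is the second benefit listed above.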

There are two types of persistent connections:

the older HTTP/1.0+ "keep-alive" connections
the modern HTTP/1.1 "persistent" connections

Use Cases

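Images embedded in a web page usually come from the same web site, and a good share of its hyperlinks point to that site as well. Once a persistent connection to the server is open, further requests to the same server can be sent over it.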

HTTP/1.0+ Keep-Alive Connections

HTTP/1.0+ supports keep-alive connections.

The Keep-Alive Handshake

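HTTP/1.0+ supports keep-alive connections, but they are not enabled by default. A client asks the server to enable keep-alive (that is, to keep the connection open) by sending a request that contains a Connection: Keep-Alive header.
If the server is willing to reuse the connection for the next request, it includes the same header in its response. If it is not, it closes the connection after sending the response. The client therefore checks whether the response carries a Connection: Keep-Alive header to tell whether the server will close the connection after the response.
Once the server has agreed to keep-alive, the client must include Connection: Keep-Alive in every request for which it wants the connection to stay open. If a request lacks this header, the server closes the connection after that request.
So when is a persistent connection closed? Note that Connection: Keep-Alive is only a request to keep the connection alive. Even after both sides have agreed to a persistent connection, either side may close an idle keep-alive connection at any time, and either may limit the number of transactions handled over it. These behaviors can be tuned with the Keep-Alive options described in the next section.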
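The handshake above can be sketched as a small decision function. `Http10KeepAlive` is a hypothetical helper, not a JDK class, and it assumes headers arrive as a simple name-to-value map. The connection stays open after an exchange only if the request asked for Keep-Alive and the response echoed it.

```java
import java.util.Map;

/** Hypothetical sketch of the HTTP/1.0+ keep-alive handshake, seen from the client. */
final class Http10KeepAlive {
    /** True if a Connection header lists the keep-alive token (case-insensitive). */
    static boolean asksForKeepAlive(Map<String, String> headers) {
        for (Map.Entry<String, String> h : headers.entrySet()) {
            if (!h.getKey().equalsIgnoreCase("Connection")) {
                continue;
            }
            for (String token : h.getValue().split(",")) {
                if (token.trim().equalsIgnoreCase("Keep-Alive")) {
                    return true;
                }
            }
        }
        return false;
    }

    /** Open after this exchange only if the request asked AND the response agreed. */
    static boolean staysOpen(Map<String, String> request, Map<String, String> response) {
        return asksForKeepAlive(request) && asksForKeepAlive(response);
    }
}
```

The client has to repeat the header on every request it wants kept open, so this check applies to each exchange, not only to the first one.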

Keep-Alive Options

Syntax: Keep-Alive: name[=value][, name[=value]]...
The Keep-Alive header is entirely optional, but it may only be used together with a Connection: Keep-Alive header.

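timeout parameter: sent in a Keep-Alive response header. It tells the client how long (in seconds) the server is likely to keep the connection open while it is idle before closing it.
max parameter: sent in a Keep-Alive response header. It tells the client for how many more HTTP transactions the server is likely to keep the connection open.
Note that both values are only estimates, not promises.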

For example:

Connection: Keep-Alive
Keep-Alive: max=5, timeout=120

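This means the server will keep the connection open for at most 5 more transactions, or until the connection has been idle for two minutes, whichever comes first.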
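A client might read these hints along the lines of the sketch below. `KeepAliveParams` is a hypothetical helper that assumes the simple `name=value` form shown in the example: it splits on commas and `=`, lower-cases names, and does not handle quoted values. The values are only estimates, so a client should treat the result as hints, not guarantees.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical minimal parser for a Keep-Alive header value such as "max=5, timeout=120". */
final class KeepAliveParams {
    static Map<String, String> parse(String headerValue) {
        Map<String, String> params = new LinkedHashMap<>();
        for (String part : headerValue.split(",")) {
            String item = part.trim();
            if (item.isEmpty()) {
                continue;
            }
            int eq = item.indexOf('=');
            if (eq < 0) {
                params.put(item.toLowerCase(), null);           // bare name with no value
            } else {
                params.put(item.substring(0, eq).trim().toLowerCase(),
                           item.substring(eq + 1).trim());      // value kept as a string
            }
        }
        return params;
    }
}
```

For the header above, `KeepAliveParams.parse("max=5, timeout=120")` returns a map with `max` mapped to `"5"` and `timeout` mapped to `"120"`.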

Persistent Connections

HTTP/1.1 phased out support for keep-alive connections and replaced them with persistent connections.

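Unlike keep-alive connections, persistent connections are enabled by default in HTTP/1.1. Unless told otherwise, HTTP/1.1 assumes every connection is persistent.
After receiving a response, an HTTP/1.1 client assumes the connection remains open unless the response contains a Connection: close header.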

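A client that wants a non-persistent connection includes a Connection: close header in its request. After handling that transaction, the server includes Connection: close in its response to tell the client that the connection is being closed. A client that does not intend to send more requests on a persistent connection should include Connection: close in its last request.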

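Whenever a server decides to close the connection after a transaction, it must include a Connection: close header in the response. However, leaving out Connection: close does not mean the server promises to keep the connection open forever.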

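Likewise, whether or not the connection is being kept open, and whatever value the Connection header carries, clients and servers may close an idle connection at any time.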
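The defaults described above can be summarized in one hypothetical helper, `Persistence.isPersistent`. It follows the order of checks given in RFC 7230 (section 6.3). A close option always ends persistence. Otherwise, HTTP/1.1 is persistent by default, while HTTP/1.0 needs an explicit keep-alive. The helper only reflects what the headers signal; as noted above, either side may still close an idle connection at any time.

```java
/** Hypothetical helper: do the headers signal that the connection persists after this message? */
final class Persistence {
    /** True if the comma-separated Connection header value lists the token. */
    static boolean hasToken(String connectionHeader, String token) {
        if (connectionHeader == null) {
            return false;
        }
        for (String t : connectionHeader.split(",")) {
            if (t.trim().equalsIgnoreCase(token)) {
                return true;
            }
        }
        return false;
    }

    /** httpVersion is e.g. "HTTP/1.1"; connectionHeader is null when the header is absent. */
    static boolean isPersistent(String httpVersion, String connectionHeader) {
        if (hasToken(connectionHeader, "close")) {
            return false;                       // "close" always ends persistence
        }
        if ("HTTP/1.1".equals(httpVersion)) {
            return true;                        // HTTP/1.1: persistent by default
        }
        return "HTTP/1.0".equals(httpVersion)   // HTTP/1.0: only with an explicit keep-alive
                && hasToken(connectionHeader, "keep-alive");
    }
}
```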

Rules and Restrictions

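A single client should not maintain more than two persistent connections to any server or proxy, to avoid overloading the server. (This is the rule in RFC 2616; RFC 7230 later dropped the fixed number and only asks clients to be conservative about opening multiple connections.)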
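A client can enforce such a cap with one counting semaphore per host. The sketch below is hypothetical: `PerHostLimiter` is not part of any library. The limit is passed in, so the same code works with two or with any other conservative number.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Semaphore;

/** Hypothetical limiter capping the number of open connections per host. */
final class PerHostLimiter {
    private final int maxPerHost;
    private final ConcurrentHashMap<String, Semaphore> permits = new ConcurrentHashMap<>();

    PerHostLimiter(int maxPerHost) {
        this.maxPerHost = maxPerHost;
    }

    private Semaphore forHost(String host) {
        // One semaphore per host, created lazily on first use
        return permits.computeIfAbsent(host, h -> new Semaphore(maxPerHost));
    }

    /** Returns true if a new connection to host may be opened right now. */
    boolean tryOpen(String host) {
        return forHost(host).tryAcquire();
    }

    /** Must be called when a connection to host is closed, to free its slot. */
    void closed(String host) {
        forHost(host).release();
    }
}
```

With `new PerHostLimiter(2)`, the third `tryOpen("example.com")` returns false until one of the first two connections is reported via `closed("example.com")`. Other hosts are not affected.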

[title]HTTP Long Connections[/title]

What are HTTP persistent connections?
HTTP persistent connections, also called HTTP keep-alive or HTTP connection reuse, are the idea of using the same TCP connection to send and receive multiple HTTP requests/responses, as opposed to opening a new one for every single request/response pair. Using persistent connections is very important for improving HTTP performance.

There are several advantages to using persistent connections, including:

Network friendliness: less network traffic, because fewer TCP connections are set up and torn down.
Reduced latency on subsequent requests, because the initial TCP handshake is avoided.
Long-lasting connections, which give TCP enough time to determine the congestion state of the network and react appropriately.

The advantages are even more obvious with HTTPS, that is, HTTP over SSL/TLS. There, persistent connections may reduce the number of costly SSL/TLS handshakes needed to establish security associations, in addition to the initial TCP connection setup.
In HTTP/1.1, persistent connections are the default behavior of any connection. That is, unless otherwise indicated, the client SHOULD assume that the server will maintain a persistent connection, even after error responses from the server. However, the protocol provides means for a client and a server to signal the closing of a TCP connection.

What makes a connection reusable?
Since TCP is by nature a stream-based protocol, HTTP needs a way to indicate where the previous response ends and the next one begins before an existing connection can be reused. It therefore requires that every message on the connection MUST have a self-defined message length (i.e., one not defined by closure of the connection). Self-demarcation is achieved either by setting the Content-Length header or, for a chunked transfer-encoded body, by starting each chunk with its size and ending the body with a special last chunk of size zero.
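The minimal decoder sketch below shows how chunked framing tells the receiver where a body ends without closing the connection. It assumes the whole body is already in memory as a String in which each char stands for one byte (for example, decoded as ISO-8859-1). It drops chunk extensions, ignores trailers, and does no error handling. `ChunkedDecoder` is a hypothetical name.

```java
/** Hypothetical minimal decoder for a complete chunked body (no trailers, no error handling). */
final class ChunkedDecoder {
    /** raw holds one byte per char, e.g. bytes decoded as ISO-8859-1. */
    static String decode(String raw) {
        StringBuilder body = new StringBuilder();
        int pos = 0;
        while (true) {
            int lineEnd = raw.indexOf("\r\n", pos);
            String sizeLine = raw.substring(pos, lineEnd);
            int semi = sizeLine.indexOf(';');                 // drop chunk extensions
            if (semi >= 0) {
                sizeLine = sizeLine.substring(0, semi);
            }
            int size = Integer.parseInt(sizeLine.trim(), 16); // chunk size is hexadecimal
            pos = lineEnd + 2;
            if (size == 0) {
                break;                                        // last chunk: the body ends here
            }
            body.append(raw, pos, pos + size);
            pos += size + 2;                                  // skip chunk data and its CRLF
        }
        return body.toString();
    }
}
```

For example, `ChunkedDecoder.decode("4\r\nWiki\r\n5\r\npedia\r\n0\r\n\r\n")` returns `"Wikipedia"`. Any bytes after the final empty line would belong to the next message on the same connection.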

What happens if there are proxy servers in between?
Since persistence applies to only one transport link, it is important that proxy servers correctly signal persistent or non-persistent connections separately with their clients and with the origin servers (or other proxy servers). From an HTTP client's or server's perspective, as far as connection persistence is concerned, the presence or absence of proxy servers is transparent.
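One concrete consequence is that a proxy must not forward the Connection header, or the headers it names, to the next hop, because each link negotiates persistence on its own. The sketch below strips such hop-by-hop headers before forwarding. The fixed list is based on RFC 2616 section 13.5.1, plus any tokens named in Connection. `HopByHop` is a hypothetical helper. A real proxy that drops Transfer-Encoding would also have to re-frame the body.

```java
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch: headers a proxy removes so that each hop negotiates persistence itself. */
final class HopByHop {
    private static final Set<String> STANDARD = Set.of(
            "connection", "keep-alive", "proxy-authenticate", "proxy-authorization",
            "te", "trailer", "transfer-encoding", "upgrade");

    static Map<String, String> forwardable(Map<String, String> headers) {
        Set<String> drop = new HashSet<>(STANDARD);
        // Tokens listed in Connection name extra headers that apply to this link only
        for (Map.Entry<String, String> e : headers.entrySet()) {
            if (e.getKey().equalsIgnoreCase("Connection")) {
                for (String token : e.getValue().split(",")) {
                    drop.add(token.trim().toLowerCase(Locale.ROOT));
                }
            }
        }
        Map<String, String> out = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : headers.entrySet()) {
            if (!drop.contains(e.getKey().toLowerCase(Locale.ROOT))) {
                out.put(e.getKey(), e.getValue());
            }
        }
        return out;
    }
}
```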

What does the current JDK do for Keep-Alive?
The JDK supports both HTTP/1.1 and HTTP/1.0 persistent connections.

When the application finishes reading the response body or when the application calls close() on the InputStream returned by URLConnection.getInputStream(), the JDK's HTTP protocol handler will try to clean up the connection and if successful, put the connection into a connection cache for reuse by future HTTP requests.

Support for HTTP Keep-Alive is transparent. However, it can be controlled by the system properties http.keepAlive and http.maxConnections, as well as by the request and response headers specified in HTTP/1.1.
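You can watch this reuse happen locally with the experiment below. It is a sketch, not part of the JDK documentation. It starts the JDK's built-in `com.sun.net.httpserver.HttpServer` on a loopback port and sends several `HttpURLConnection` requests, reading each body to EOF. The server records the client-side port of every request it receives; each TCP connection has its own client port. If the connection cache works as described, all requests should arrive from the same port. `ReuseDemo` is a hypothetical name.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Experiment: how many TCP connections do n sequential HttpURLConnection requests use? */
final class ReuseDemo {
    static int distinctClientPorts(int requests) throws IOException {
        // Distinct client ports seen by the server = distinct TCP connections
        Set<Integer> ports = ConcurrentHashMap.newKeySet();
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        server.createContext("/", exchange -> {
            ports.add(exchange.getRemoteAddress().getPort());
            byte[] body = "hello".getBytes(StandardCharsets.US_ASCII);
            exchange.sendResponseHeaders(200, body.length); // Content-Length makes the body self-delimiting
            try (OutputStream os = exchange.getResponseBody()) {
                os.write(body);
            }
        });
        server.start();
        try {
            URL url = URI.create("http://127.0.0.1:" + server.getAddress().getPort() + "/").toURL();
            byte[] buf = new byte[256];
            for (int i = 0; i < requests; i++) {
                HttpURLConnection conn = (HttpURLConnection) url.openConnection();
                try (InputStream in = conn.getInputStream()) {
                    while (in.read(buf) != -1) {
                        // discard; reading to EOF lets the JDK put the connection back in its cache
                    }
                }
            }
        } finally {
            server.stop(0);
        }
        return ports.size();
    }
}
```

`ReuseDemo.distinctClientPorts(3)` is expected to return 1. If you run it with `-Dhttp.keepAlive=false`, each request should use its own connection, so it should return 3.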

The system properties that control the behavior of Keep-Alive are:
http.keepAlive=<boolean>
default: true

Indicates if keep alive (persistent) connections should be supported.
http.maxConnections=<int>
default: 5

Indicates the maximum number of connections per destination to be kept alive at any given time

The HTTP header that influences connection persistence is:
Connection: close

If the "Connection" header is specified with the value "close" in either the request or the response header fields, it indicates that the connection should not be considered 'persistent' after the current request/response is complete.
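These settings can be given on the command line (for example `-Dhttp.keepAlive=true -Dhttp.maxConnections=10`) or set programmatically, as in the sketch below. The helper class is hypothetical and the values are illustrative. The JDK appears to read these properties only once, when its HTTP handler is first initialized, so they should be set before the first request is made. The per-request `Connection: close` header is set with `setRequestProperty`.

```java
import java.net.HttpURLConnection;

/** Hypothetical helper collecting the keep-alive settings described above. */
final class KeepAliveConfig {
    /** Call before the first HTTP request; the values are assumed to be read only once. */
    static void configure(boolean keepAlive, int maxConnections) {
        System.setProperty("http.keepAlive", Boolean.toString(keepAlive));
        System.setProperty("http.maxConnections", Integer.toString(maxConnections));
    }

    /** Per-request opt-out: ask that this connection not persist after the response. */
    static void requestClose(HttpURLConnection conn) {
        conn.setRequestProperty("Connection", "close"); // must be called before connecting
    }
}
```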

The current implementation doesn't buffer the response body. This means that, for the connection to be reused, the application has to either finish reading the response body or call close() to abandon the rest of it. Furthermore, the current implementation does not attempt a blocking read when cleaning up the connection. If the whole response body is not yet available at that point, the connection will not be reused.
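Because the body is not buffered, one way to meet this requirement is a small helper that reads whatever remains and then closes the stream. The sketch below is hypothetical: `BodyDrainer` is not a JDK class. It reads to EOF before closing, which blocks until the rest of the body has arrived, so the JDK's cleanup then finds a fully consumed body.

```java
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical helper that consumes the rest of a response body, then closes it. */
final class BodyDrainer {
    /** Reads and discards everything left in the stream; returns the number of bytes skipped. */
    static long drainAndClose(InputStream in) throws IOException {
        long total = 0;
        byte[] buf = new byte[8192];
        try (InputStream s = in) {
            int n;
            while ((n = s.read(buf)) != -1) { // blocks until more data or EOF
                total += n;
            }
        }
        return total;
    }
}
```

For a very large body you no longer need, it may be cheaper to close the stream and accept a new connection than to download the rest.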

What's new in Tiger (JDK 5.0)?
When the application encounters an HTTP 400 or 500 response, it may ignore the IOException and then issue another HTTP request. In this case, the underlying TCP connection won't be kept alive: the response body is still waiting to be consumed, so the socket connection is not cleaned up and is not available for reuse. What the application needs to do is call HttpURLConnection.getErrorStream() after catching the IOException, read the response body, and then close the stream. However, some existing applications don't do this, and as a result they don't benefit from persistent connections. To address this problem, a workaround was introduced.

The workaround buffers the response body when the response code is >= 400, up to a certain amount and within a time limit, which frees the underlying socket connection for reuse. The rationale is that when the server responds with a >= 400 error (a client or server error, such as "404: File Not Found"), it usually sends only a small response body explaining whom to contact and how to recover.

Several new Sun implementation-specific properties were introduced to help clean up connections after error responses from the server.

The major one is:

sun.net.http.errorstream.enableBuffering=<boolean>
default: false

With the above system property set to true (the default is false), the HTTP handler will try to buffer the response body whenever the response code is >= 400, which frees the underlying socket connection for reuse. So even if the application doesn't call getErrorStream(), read the response body, and then call close(), the underlying socket connection may still be kept alive and reused.

The following two system properties provide further control to the error stream buffering behavior:

sun.net.http.errorstream.timeout=<int> in milliseconds
default: 300 milliseconds

sun.net.http.errorstream.bufferSize=<int> in bytes
default: 4096 bytes
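Here is a minimal sketch of turning the workaround on from code. The helper name is hypothetical, and the numbers in the usage below are simply the documented defaults. As with the http.* properties, setting them at startup (or with -D on the command line) is the safe choice, on the assumption that the handler reads them when it initializes. These are Sun/Oracle implementation-specific properties, so other JVMs may ignore them.

```java
/** Hypothetical helper enabling the error-stream buffering workaround. */
final class ErrorStreamBuffering {
    static void enable(int timeoutMillis, int bufferSizeBytes) {
        System.setProperty("sun.net.http.errorstream.enableBuffering", "true");
        // How long to wait for the error body, and how much of it to buffer
        System.setProperty("sun.net.http.errorstream.timeout", Integer.toString(timeoutMillis));
        System.setProperty("sun.net.http.errorstream.bufferSize", Integer.toString(bufferSizeBytes));
    }
}
```

For example, `ErrorStreamBuffering.enable(300, 4096)` turns buffering on and leaves the timeout and buffer size at their default values.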

What can you do to help with Keep-Alive?
Do not abandon a connection by ignoring the response body. Doing so may leave idle TCP connections behind, which then have to be garbage collected once they are no longer referenced.

If getInputStream() successfully returns, read the entire response body.

When calling getInputStream() from HttpURLConnection, if an IOException occurs, catch the exception and call getErrorStream() to get the response body (if there is any).

Reading the response body cleans up the connection even if you are not interested in the response content itself. If the response body is long and you lose interest in the rest after seeing the beginning, you can close the InputStream. Be aware, though, that more data may still be on its way, so the connection may not be cleaned up for reuse.

Here's a code example that complies with the above recommendations:

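import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;

public class KeepAliveClient {
    // Hypothetical application hook: process the first len bytes of buf
    static void processBuf(byte[] buf, int len) {
    }

    public static void main(String[] args) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(args[0]).openConnection();
        byte[] buf = new byte[4096];
        // try-with-resources closes the input stream after the body is read
        try (InputStream is = conn.getInputStream()) {
            int ret;
            // read the entire response body so the connection can be reused
            while ((ret = is.read(buf)) != -1) {
                processBuf(buf, ret);
            }
        } catch (IOException e) {
            try {
                // respCode (e.g. 404 or 500) tells you which error occurred
                int respCode = conn.getResponseCode();
                // getErrorStream() returns null if there is no error body
                try (InputStream es = conn.getErrorStream()) {
                    if (es != null) {
                        int ret;
                        // read the error body as well; the stream is closed afterwards
                        while ((ret = es.read(buf)) != -1) {
                            processBuf(buf, ret);
                        }
                    }
                }
            } catch (IOException ex) {
                // deal with the exception
            }
        }
    }
}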

If you know ahead of time that you won't be interested in the response body, you should issue a HEAD request instead of a GET request, for example when you are only interested in the meta information of a web resource, or when testing its validity, accessibility, or recent modification. Here's a code snippet:

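import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;
public class HeadRequest {
    public static void main(String[] args) throws IOException {
        HttpURLConnection httpc = (HttpURLConnection) new URL(args[0]).openConnection();
        // only interested in the length of the resource, so no body is requested
        httpc.setRequestMethod("HEAD");
        int len = httpc.getContentLength(); // -1 if the server sent no Content-Length
        System.out.println("Content-Length: " + len);
    }
}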

Source: https://www.jianshu.com/p/0a47fc776314

http://lzhw1985.iteye.com/blog/1991857