2016 November 29
I stumbled on a peculiar bug in Slim Framework that has remained unsolved for 5 years and has eluded many including Slim Framework's author himself. I decided to investigate and found the solution.
See also: https://github.com/slimphp/Slim/issues/2083
Have you ever used ApacheBench and found that your PHP application is fast if you use the HTTP Keep-Alive feature, but really slow if you don't? So slow, you always get less than 1 request per second?
This article describes the discovery of this strange problem, the investigation that revealed that ApacheBench is not at fault, and an effective solution to prevent the same problem in your own framework or application.
This started as a curiosity any framework author would have: "How does it compare to other frameworks?" The first thing that came to mind is of course performance, so I selected several frameworks to compare to. When I was benchmarking, I found something curious about Slim Framework.
I benchmarked Slim Framework's "Welcome to Slim!" example and two of my own
using my favorite ApacheBench command: ab -t 2 -c 10 -k localhost/application-name/
.
The results are good but I noticed that one of my application ignores Keep-Alive unlike the other two, probably because the response is too big.
I redid the benchmark without the -k
switch, expecting that the other two would just be a little slower.
I was surprised.
Framework | Requests/second | Time/request | Content length |
---|---|---|---|
Slim Framework 3.6.0 ("Welcome to Slim!" example) | 0.18 #/sec | 5524.410 ms | 16 bytes |
Cog Framework 1.0 (Hello World) | 1682.88 #/sec | 0.594 ms | 722 bytes |
Cog Framework 1.0 + Template (Router Benchmark) | 1054.36 #/sec | 0.948 ms | 12269 bytes |
That can't be right. That is a lot slower, abnormally slower, even. I repeated the benchmark several times and got the same result: Slim can only serve 1 request in more than 5 seconds. Something odd is going on here.
A search revealed a thread discussing the same problem started in January 2012 with no conclusion, so it looks like the problem has been around for almost 5 years, maybe even longer. In the discussion thread, joined by Slim's author Josh Lockhart, several suspicions arose about the likely cause of the problem:
Somehow thinking that I can help, I decided to investigate. I discovered that the closest suspicion to the actual cause is the first one. That's right, Slim Framework has a bug in the way it handles requests.
After finding the exact cause, the solution is easy. Although Slim seems to be the only framework with the bug, it is clear that the bug is actually not exclusive to Slim, thus your framework or application may also have the same bug and can also benefit from the same solution.
If it is just the solution you're interested in, jump down to it or scroll down. This is the investigation.
While studying Slim's source code, I found that Slim always returns HTTP/1.1 response by default without examining the request. So at first, the third suspicion might be the right one. To verify, I downloaded the specification: RFC7230.
Soon enough, I found a relevant text. RFC7230 Section 2.6 says this about protocol versioning: The minor version advertises the sender's communication capabilities even when the sender is only using a backwards-compatible subset of the protocol, thereby letting the recipient know that more advanced features can be used in response (by servers) or in future requests (by clients).
In other words, sending an HTTP/1.1 response to an HTTP/1.0 request is allowed as a way to advertise that the server is capable of handling higher protocol version than the one received in the request. By doing so, the server is basically saying "Hey, I can handle version 1.1 of the protocol" and this is not a problem.
In fact, further in, the same section states: A server SHOULD send a response version equal to the highest version to which the server is conformant that has a major version less than or equal to the one received in the request. Meaning, if a server supports HTTP/1.9 (assuming it exists) it should send an HTTP/1.9 response to any HTTP/1.x request.
Also in practice, my Hello World and Router Benchmark applications both return HTTP/1.1 responses without getting Slim's problem. So the version mismatch suspicion is incorrect.
On to ApacheBench.
In the discussion thread, Josh stated his strong suspicion that the problem is in ApacheBench. Considering no other framework has the same problem, Slim was the more likely culprit but it wasn't conclusive. As luck would have it, someone posted his ApacheBench log which I found helpful because I noticed a problem and for some reason, I forgot about the -v switch.
Here's my log:
Benchmarking localhost (be patient)...INFO: GET header == --- GET /slim/ HTTP/1.0 Host: localhost User-Agent: ApacheBench/2.3 Accept: */* --- LOG: header received: HTTP/1.1 200 OK Date: Tue, 29 Nov 2016 10:10:35 GMT Server: Apache/2.4.9 (Win32) Content-Length: 16 Content-Type: text/html; charset=UTF-8 Welcome to Slim! ..done Time taken for tests: 5.516 seconds Complete requests: 1
That's the problem, right there.
That is exactly the same problem I see in the log posted in the discussion thread. If you are familiar with the differences between HTTP/1.0 and HTTP/1.1, you can immediately see that Slim sends the wrong response.
If you don't see the problem, don't worry. Before the end of this article, I promise you will be able to.
This is what happens when ApacheBench interacts with Slim without the -k
switch:
When ApacheBench sends an HTTP/1.0 request, it expects that the server will close the connection. This complies with Section A.1.2 that states: In HTTP/1.0, each connection is established by the client prior to the request and closed by the server after sending the response.
In other words, ApacheBench doesn't have an issue closing the connection as Josh suspected. Quite the opposite, it is actually waiting for the server to close the connection, in compliance with the specification.
Then why doesn't the server close the connection? Because the server is waiting for additional requests, also in compliance with the specification. Section 6.3 states: A client MAY send additional requests on a persistent connection until it sends or receives a "close" connection option or receives an HTTP/1.0 response without a "keep-alive" connection option.
Because Slim sends the wrong response, the server believes that the connection is persistent, therefore it keeps it open to wait for additional requests until a timeout occurs, even if the client is actually waiting for connection closure.
So there you have it. Both Apache and ApacheBench are doing their jobs correctly according to specification. Slim inadvertently got them waiting for each other by sending the wrong response. This blog post that Josh referred to unfortunately has the wrong conclusion because it shows the wrong response sent to ApacheBench as well, although there is no indication about what framework was being used, if any.
Why is it wrong? Because it causes a deadlock and therefore it is not sufficiently backwards-compatible. Section 2.6 states: A recipient can assume that a message with a higher minor version, when sent to a recipient that has not yet indicated support for that higher version, is sufficiently backwards-compatible to be safely processed by any implementation of the same major version.
The conclusion: Slim Framework is the guilty party. It has been for years.
If Slim has been sending the wrong response for years, what is the correct response? The correct response is clearly defined in Section 6.3 that I quoted above, which means there are two choices:
Click here or scroll back up to see the log again. That's what I saw. For that specific request, there is "HTTP/1.1" in the response but no "Connection: close". Now you can see the problem too. I promised, didn't I?
Without one of these responses, you get a deadlock: the HTTP/1.0 client and the HTTP/1.1 server will wait for each other until a timeout occurs due to conflicting expectations by both parties because HTTP/1.0 defaults to non-persistent connections while HTTP/1.1 defaults to persistent connections, the exact opposite.
The timeout is whichever timeout that occurs first from the parties involved. Apache defaults to 5 seconds, other servers are different. I've seen one that defaults to 75,000,000 seconds which is over 2 years. Fortunately ApacheBench defaults to a more sensible 30 seconds.
After the investigation, the solution is easy: just send "Connection: close" to all HTTP/1.0 requests with no Connection header.
// Send "Connection: close" to all HTTP/1.0 requests with no Connection header if($_SERVER['SERVER_PROTOCOL']=='HTTP/1.0' && empty($_SERVER['HTTP_CONNECTION'])){ header('Connection: close'); }
I recommend putting the solution at the start of your framework or application for four reasons:
I also recommend adding my full name to it with a text that reminds you of who is responsible if the code doesn't work. This will do:
// By Reinir Puradinata <-- Blame this guy if this doesn't work
Be creative with it. Go nuts. Now let's put the solution in Slim.
There are two approaches on how to put the solution in Slim.
The first and the easiest is my recommendation above.
This is the new Slim\App
constructor:
public function __construct($container = []) { // Send "Connection: close" to all HTTP/1.0 requests with no Connection header if ($_SERVER['SERVER_PROTOCOL'] == 'HTTP/1.0' && empty($_SERVER['HTTP_CONNECTION'])) { header('Connection: close'); } if (is_array($container)) { $container = new Container($container); } if (!$container instanceof ContainerInterface) { throw new InvalidArgumentException('Expected a ContainerInterface'); } $this->container = $container; }
The second approach is by following the way Slim sends headers, which appears to be in Slim\DefaultServicesProvider
class.
Since I'm not a Slim expert or even a user I can't be sure if the solution will always be executed due to the "Default" in the class name which suggests that it or parts of it can be replaced.
Thus my recommendation holds: use the first approach.
With the solution in place, Slim Framework 3.6.0 now sends the correct response. Well, at least my copy does:
Benchmarking localhost (be patient)...INFO: GET header == --- GET /slim/ HTTP/1.0 Host: localhost User-Agent: ApacheBench/2.3 Accept: */* --- LOG: header received: HTTP/1.1 200 OK Date: Tue, 29 Nov 2016 10:11:56 GMT Server: Apache/2.4.9 (Win32) Connection: close Content-Length: 16 Content-Type: text/html; charset=UTF-8 Welcome to Slim! ..done Time taken for tests: 0.005 seconds Complete requests: 1
Because the solution is adaptive, advanced features such as persistent connections are not affected
as shown in this benchmark where Slim is faster with the -k
switch than without:
Framework | With -k | Without -k | Content length |
---|---|---|---|
Slim Framework 3.6.0 Fixed ("Welcome to Slim!" example) | 791.95 #/sec | 682.26 #/sec | 16 bytes |
Cog Framework 1.0 (Hello World) | 2543.58 #/sec | 1685.06 #/sec | 722 bytes |
Cog Framework 1.0 + Template (Router Benchmark) | 1054.94 #/sec | 1056.91 #/sec | 12269 bytes |
Now that's more like it.
There are several things to learn from this. The first is when in doubt, refer to the specification. It exists for a reason.
The second is if an issue arises, always assume responsibility, particularly if you are unable to figure out what's going on. This rule will serve you well because it will motivate you to scrutinize your own code, leaving less chance that a bug goes unsolved for a long time. If you still can't solve the problem you're having, ask me. Maybe I can help.
To implement the solution in your framework or application, simply follow my recommendation and example with Slim Framework above. Even if your code doesn't have the problem at the moment, the solution will prevent it from occurring in the future.