Internet Explorer and caching: beware of the Vary
Today, a real-life situation we came across at Tweakers.net (the company I work for): one of our server admins, let's call him moto-moi, noticed that IE clients were requesting some of our static files (icons and images) far more frequently than other clients. That is particularly noticeable since only around 34% of the hits on our site (at least from registered users) come from IE-based clients. This led to one conclusion: somehow IE refuses to cache these files. So we started investigating...
First I fired up the good ol' IE7 (yes, it comes in brand new packaging but still has cobwebs and all) and downloaded Fiddler. Then I visited our frontpage and saw around 92 requests, most of them for small static images. I revisited our frontpage and still around 80 requests were necessary, again mostly for small static images that should have been cached on the first hit. This smelled really bad, especially since I knew for a fact that other browsers have no problems caching these images.
I jotted down the response headers for one particular request, which looked something like this (I can't replay it exactly at the moment):
HTTP/1.x 200 OK
Date: Mon, 11 Dec 2006 23:07:49 GMT
Server: Apache/1.3.37 (Unix) PHP/4.4.4 mod_gzip/126.96.36.199a
Vary: Accept-Encoding
Cache-Control: max-age=25920000
Expires: Sun, 07 Oct 2007 23:07:49 GMT
Last-Modified: Sat, 12 Apr 2003 21:59:31 GMT
Accept-Ranges: bytes
Content-Length: 315
Connection: close
Content-Type: image/gif
Then I made the same request for the same file saved on my hard drive, which gave me these headers:
HTTP/1.x 200 OK
Date: Mon, 11 Dec 2006 23:14:02 GMT
Server: Apache/1.3.29 (Win32) PHP/4.4.4
Last-Modified: Fri, 01 Dec 2006 00:09:18 GMT
Etag: "0-401-456f72ae"
Accept-Ranges: bytes
Content-Length: 315
Keep-Alive: timeout=15, max=100
Connection: Keep-Alive
Content-Type: image/gif
It was clear that my local environment differed somewhat from our production environment, but on subsequent requests it at least appeared that the image from my local environment was being cached! One of the differences in the server response must have been responsible. The only question that remained was: which one exactly?
To cut a long story short: while reconfiguring my local server to match our production settings I found out that the Vary header was the actual culprit. It was added by mod_gzip (a module that compresses content before sending it to the client and thus saves bandwidth), even though we had explicitly excluded images from compression by means of a MIME type filter (images have a MIME type that matches image/*, e.g. image/gif, image/jpeg and so on).
A quick search on the internet indeed verified that Internet Explorer doesn't handle the Vary header very well (a direct violation of RFC 2616) and that mod_gzip only suppresses the Vary header when you exclude files by filename, URI or handler, but not (as in our case) when you exclude files by MIME type!
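A check like the one I did by hand in Fiddler could be scripted. The following is only a minimal Python sketch, not something we actually ran: the function names are made up for illustration, and the sample headers are trimmed versions of the two responses shown earlier. It simply flags image responses that carry a Vary header, which is the combination that tripped up IE in our case.

```python
def parse_headers(raw: str) -> dict[str, str]:
    """Parse a raw HTTP response head into a dict with lower-cased header names."""
    headers = {}
    for line in raw.strip().splitlines()[1:]:  # skip the status line
        name, sep, value = line.partition(":")
        if sep:  # ignore anything that is not a "Name: value" line
            headers[name.strip().lower()] = value.strip()
    return headers


def has_suspect_vary(headers: dict[str, str]) -> bool:
    """Return True for an image response that also carries a Vary header."""
    is_image = headers.get("content-type", "").lower().startswith("image/")
    return is_image and "vary" in headers


# Trimmed copies of the production and local responses (illustrative only).
PRODUCTION = """HTTP/1.x 200 OK
Vary: Accept-Encoding
Cache-Control: max-age=25920000
Content-Length: 315
Content-Type: image/gif"""

LOCAL = """HTTP/1.x 200 OK
Etag: "0-401-456f72ae"
Content-Length: 315
Content-Type: image/gif"""

if __name__ == "__main__":
    for label, raw in (("production", PRODUCTION), ("local", LOCAL)):
        print(label, has_suspect_vary(parse_headers(raw)))
```

Note that this only detects the header combination; it says nothing about whether a given browser will actually cache the file.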
With this knowledge it was really quite simple to craft a solution: just make sure that mod_gzip isn't activated for any type of image file, nor for some known scripts that output images. The standard FilesMatch directive was just perfect for that. The result: an immediate decrease of around 30% in Apache requests and much faster response times when using IE to visit our site!
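For illustration, such a rule could look roughly like the fragment below. This is a sketch and not our literal production config: the extension list is an assumption, and the scripts that output images would need a similar block with their own (here omitted) filename pattern.

```apache
# Illustrative sketch: switch mod_gzip off for common image extensions,
# so it no longer touches these responses (and no longer adds Vary).
# The extension list is an assumption; adjust it to your own files.
<FilesMatch "\.(gif|jpe?g|png|ico)$">
    mod_gzip_on No
</FilesMatch>
```

Because the files are excluded by filename here rather than by MIME type, mod_gzip should leave the Vary header off these responses.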