Monday, July 26, 2010

ETag implementation for ASP.NET

This post is about implementing the logic for handling requests whose headers specify If-None-Match (ETag) and/or If-Modified-Since values. This is not easy, since ASP.NET does not support it directly, nor does it offer primitives, methods, or functions for it; by default it always returns 200 OK, whatever the request headers say (apart from errors such as 404).
The idea behind this is quite simple; let’s suppose a dialog between a browser (B) and a web server (WS):
B: Hi, can you give me a copy of ~/document.aspx?
WS: Of course. Here you are: 200Kb of code. Thanks for coming, 200 OK.
B: Hi again, can you give me a copy of ~/another-document.aspx?
WS: Yes, we’re here to serve. Here you are: 160Kb. Thanks for coming, 200 OK.
(Now the user clicks on a link that points to ~/document.aspx or goes back in his browsing history)
B: Sorry for disturbing you again, can I have another copy of ~/document.aspx?
WS: No problem at all. Here you are: 200Kb of code (the same as before). Thanks for coming, 200 OK.
Stupid, isn’t it? The way to improve the dialogue and avoid unnecessary traffic is to use a richer vocabulary (If-None-Match and If-Modified-Since). Here is the same dialogue with these improvements:
B: Hi can you give me a copy of ~/document.aspx?
WS: Of course. Here you are: 200Kb of code. ISBN is 555111222 (ETag) and this is the 2009 edition (Last-Modified). Thanks for coming, 200 OK.
B: Hi again, can you give me a copy of ~/another-document.aspx?
WS: Yes, we are here to serve. Here you are: 160Kb. ISBN is 555111333 (ETag) and it is the 2007 edition (Last-Modified). Thanks for coming, 200 OK.
(Now time passes and the user goes back to ~/document.aspx; maybe it was in his favorites, or he arrived at the same file after browsing for a while)
B: Hi again, I already have a copy of ~/document.aspx, ISBN is 555111222 (If-None-Match), dated 2009 (If-Modified-Since). Is there any update for it?
WS: Let me check… No, you are up to date, 0Kb transferred, 304 Not Modified.
That sounds more logical. It takes a little more dialogue (negotiation) before the transaction but, if the conditions are met, these extra words save time and money (bandwidth) for both parties.
Most browsers nowadays support this negotiation, but the web server must support it too for there to be any benefit. Unfortunately, IIS only supports conditional GET natively for static files. If you also want to use it for dynamic content (ASP.NET files), you need to add support for it programmatically. That’s what we are going to show here.

Calculating Last-Modified response header.

To begin with, the server needs to know when a page was last modified. This is very easy for static content: a simple mapping between the web page being requested and the file in the underlying file system, and you are done. Calculating this date for .ASPX files is a little more complicated. You need to consider all the dependencies of the content being served and take the most recent date among them. For instance, let’s suppose the browser requests a page at ~/default.aspx, and this file is based on a master page called ~/MasterPage.master, which contains a menu that grabs its contents from the file ~/web.sitemap. In the simplest scenario (no content retrieved from a database, no user controls), ~/default.aspx contains only static content. In this case, the Last-Modified value will be the most recent last modification time of these files:
  • ~/default.aspx
  • ~/default.aspx.vb (Optionally, depending on your pages having code behind which modifies the output or not)
  • ~/MasterPage.master
  • ~/MasterPage.master.vb (Optionally)
  • ~/web.sitemap
The last modification time is retrieved using System.IO.File.GetLastWriteTime. If the content is retrieved from a database, you need a column that stores the last modification time (when the content was last written) in order to use this functionality.
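The calculation described above can be sketched as follows. This is only a minimal illustration, assuming the dependencies are known in advance and passed in as physical paths; the module and function names (LastModifiedSketch, GetLatestWriteTime) are made up for this example.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.IO

Public Module LastModifiedSketch

    ' Returns the most recent last-write time (local time) among the given
    ' physical files. The name and signature are hypothetical.
    Public Function GetLatestWriteTime(ByVal physicalPaths As IEnumerable(Of String)) As DateTime
        Dim latest As DateTime = DateTime.MinValue
        For Each filePath As String In physicalPaths
            Dim written As DateTime = File.GetLastWriteTime(filePath)
            If written > latest Then latest = written
        Next
        Return latest
    End Function

End Module
```

Inside a page, the physical paths could be obtained with Server.MapPath, for example Server.MapPath("~/default.aspx"), Server.MapPath("~/MasterPage.master") and Server.MapPath("~/web.sitemap").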

Calculating ETag response header.

The second key to the dialogue is the ETag value. It is simply a hash of the final contents being served. Any way of calculating a hash from textual input will do, as long as it has a low CPU footprint. In our implementation we used CRC32, but any other function will work the same way. We calculate the ETag value as a CRC32 checksum of all the dependent content plus the last modification dates of these dependencies. In our simplest case, that means the concatenation of all these strings:
  • ~/default.aspx last write time
  • ~/default.aspx.vb last write time (Optionally; rarely needed)
  • ~/MasterPage.master last write time
  • ~/MasterPage.master.vb last write time (Optionally)
  • ~/web.sitemap last write time
  • ~/default.aspx contents
  • ~/default.aspx.vb contents (Optionally; usually left out to speed up the calculation)
  • ~/MasterPage.master contents
  • ~/MasterPage.master.vb contents (Optionally)
  • ~/web.sitemap contents
Then compute a CRC32 of the whole. If your content is truly dynamic (generated from a database or by code), treat it like any other dependency and include it in the list above.
This might seem like too much of a burden, too much CPU usage but, as with everything, it really depends on the website:
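As an illustration of the list above, here is a minimal sketch of the ETag calculation. It is not our exact implementation: the names (ETagSketch, Crc32, ComputeETag) are made up for this example, the CRC32 is a plain bitwise version (the common reflected polynomial EDB88320), and the dependencies are assumed to be files passed in as physical paths. The result is wrapped in double quotes because HTTP entity tags are quoted strings; drop the quotes if your setup already adds them.

```vbnet
Imports System
Imports System.IO
Imports System.Text

Public Module ETagSketch

    ' Bitwise CRC32 using the reflected polynomial &HEDB88320.
    Public Function Crc32(ByVal data As Byte()) As UInteger
        Dim crc As UInteger = &HFFFFFFFFUI
        For Each b As Byte In data
            crc = crc Xor CUInt(b)
            For i As Integer = 1 To 8
                If (crc And 1UI) <> 0UI Then
                    crc = (crc >> 1) Xor &HEDB88320UI
                Else
                    crc = crc >> 1
                End If
            Next
        Next
        Return crc Xor &HFFFFFFFFUI
    End Function

    ' Concatenates the last-write times and then the contents of every
    ' dependency, and returns the CRC32 of the whole as a quoted hex string.
    ' The name and signature are hypothetical.
    Public Function ComputeETag(ByVal physicalPaths As String()) As String
        Dim buffer As New StringBuilder()
        For Each filePath As String In physicalPaths
            buffer.Append(File.GetLastWriteTime(filePath).Ticks)
        Next
        For Each filePath As String In physicalPaths
            buffer.Append(File.ReadAllText(filePath))
        Next
        Dim checksum As UInteger = Crc32(Encoding.UTF8.GetBytes(buffer.ToString()))
        Return """" & checksum.ToString("x8") & """"
    End Function

End Module
```

Content that comes from a database or is generated by code would have to be appended to the same buffer before the checksum is taken.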
  • High volume, high CPU usage: This scenario might not cope with the extra CPU needed. See Note*.
  • High volume, low CPU usage: You can safely spend CPU cycles in order to save some bandwidth. Implementing conditional GETs is a must.
  • Low volume, high CPU usage: What kind of web server is it? Definitely not a public web server as we know them.
  • Low volume, low CPU usage: Implementing conditional GETs will give your website the impression of being served faster.
Note*: Consider this question: is your CPU usage so high partly because the same contents are requested again and again by the same users? If your answer is yes (or maybe), spending some extra CPU to allow client-side caching and conditional GETs will, viewed globally, lower both your overall CPU usage and the bandwidth being used. Give this idea a try and decide for yourself afterwards.

Returning them in the response.

Once we have calculated both the Last-Modified and ETag values, we need to return them with the response of the page. This is done using the following lines of code:
' Send the most recent modification date (converted to UTC) as Last-Modified
Response.Cache.SetLastModified(LastModifiedValue.ToUniversalTime())
' Send the calculated hash as ETag
Response.Cache.SetETag(ETagValue)

Looking for the values in request’s headers.

Now that our pages’ responses are generated with Last-Modified and ETag headers, we need to check for those values in the requests too. When sent back in request headers, these values go by different names:
Response header name    Request header name
Last-Modified           If-Modified-Since
ETag                    If-None-Match
The logic for deciding whether we should return 200 OK or 304 Not Modified is as follows:
  • If both values (If-Modified-Since and If-None-Match) were provided in the request and both match, return 304 and no content (0 bytes).
  • If both were provided and either of them does NOT match, return 200 and the complete page.
  • If only one of them was specified (If-Modified-Since or If-None-Match), that one alone decides.
  • If neither was provided, always return 200 and the complete page.
In order to return 304 and no content for the page the code to be used is:
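' Discard anything already buffered for this response
Response.Clear()
' Answer 304 Not Modified instead of 200 OK
Response.StatusCode = System.Net.HttpStatusCode.NotModified
' Send the headers only, with no body
Response.SuppressContent = True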
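The decision rules listed above can be sketched as a single function. This is a simplified illustration with made-up names (ConditionalGetSketch, IsNotModified). It assumes that If-None-Match carries a single entity tag, compared literally, and it treats If-Modified-Since as matching when the page has not changed after that date.

```vbnet
Imports System
Imports System.Globalization

Public Module ConditionalGetSketch

    ' Returns True when the request headers allow answering 304 Not Modified.
    ' Pass Nothing (or an empty string) for a header that was not sent.
    ' The name and signature are hypothetical.
    Public Function IsNotModified(ByVal ifNoneMatch As String, _
                                  ByVal ifModifiedSince As String, _
                                  ByVal currentETag As String, _
                                  ByVal lastModifiedUtc As DateTime) As Boolean
        Dim hasETag As Boolean = Not String.IsNullOrEmpty(ifNoneMatch)
        Dim hasDate As Boolean = Not String.IsNullOrEmpty(ifModifiedSince)

        ' No validators in the request: always send the complete page.
        If Not hasETag AndAlso Not hasDate Then Return False

        ' Every validator that was sent must match.
        If hasETag AndAlso Not String.Equals(ifNoneMatch.Trim(), currentETag, StringComparison.Ordinal) Then
            Return False
        End If

        If hasDate Then
            Dim since As DateTime
            Dim styles As DateTimeStyles = DateTimeStyles.AdjustToUniversal Or DateTimeStyles.AssumeUniversal
            ' An unparsable date is treated as "no match".
            If Not DateTime.TryParse(ifModifiedSince, CultureInfo.InvariantCulture, styles, since) Then
                Return False
            End If
            ' HTTP dates have one-second resolution, so drop the
            ' sub-second part before comparing.
            Dim wholeSeconds As Long = lastModifiedUtc.Ticks - (lastModifiedUtc.Ticks Mod TimeSpan.TicksPerSecond)
            If New DateTime(wholeSeconds, DateTimeKind.Utc) > since Then Return False
        End If

        Return True
    End Function

End Module
```

In a page, the two header values could be read with Request.Headers("If-None-Match") and Request.Headers("If-Modified-Since"); when the function returns True, the three lines above send the 304 response.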
