Tuesday, October 29, 2013

Defeat Browser Caching

Caching is used at various levels. When you visit a website, your browser fetches the data from the web server and stores part of it temporarily in local storage. The stored data can include source code, images, and so on. The next time you open the same website, your request doesn't have to travel the whole path from the DNS lookup all the way to a country-specific web server. This saves time and bandwidth.

Browser caching is good for most websites, or for most parts of them; however, it’s a disadvantage in a number of situations.

  • A secure banking site may not want any part of itself cached, because the customer’s system can be compromised and the cached information can be used to gain insight into the customer’s account. (Yes, it uses SSL, but an intruder can still learn the time of access, etc.)
  • Parts which are updated very frequently. For example, the ad block may still show the old “advertisement.gif”, and the user may click on it thinking that he’s lucky to always be the millionth visitor!
I have seen the following two scenarios in the past year.

Old Ads

Scenario 1 - In a large organization, it often happens that the Development team fixes an issue raised by the Testing team and closes the bug, but the testers can still reproduce the flaw!
Or
Scenario 2 - There may be several development teams working on a web product which is already deployed at the client site. The UI team makes changes, but these changes are not visible to the end user. The angry client admonishes the UI team manager, and the blame is passed to the business logic team, who had no role in it!
The cause in all these cases is neither the Testing team nor the business logic team, but browser caching!

Fighting it out

You must remember that you cannot delete the user’s cache programmatically. Hence, all the methods revolve around prevention.

Method 1 – Tell the browser not to cache.

HTML
<meta http-equiv="cache-control" content="no-cache, no-store, must-revalidate" /> 
<meta http-equiv="expires" content="-1" /> 
<meta http-equiv="pragma" content="no-cache" />

Servlet


response.setHeader("Cache-Control", "no-cache, no-store, must-revalidate"); 
response.setHeader("Pragma", "no-cache");
response.setDateHeader("Expires", 0);

The same headers can be set in PHP with header(), in ASP with Response.AppendHeader(), etc.
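
The header set used in the servlet snippet can also be kept in one place so that every response gets the same values. The following is a minimal sketch in plain Java, assuming a hypothetical helper class NoCacheHeaders that is not part of the original post. It only builds the header map; applying it (for example with response.setHeader in a servlet) is left to the caller. Expires is written as the string "0", which HTTP caches treat as a date in the past.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper (not from the original post): collects the no-cache headers of Method 1.
class NoCacheHeaders {

    // Returns the headers in a stable order so they can be applied to any response object.
    static Map<String, String> build() {
        Map<String, String> headers = new LinkedHashMap<>();
        headers.put("Cache-Control", "no-cache, no-store, must-revalidate"); // HTTP/1.1 caches
        headers.put("Pragma", "no-cache");                                   // HTTP/1.0 clients
        headers.put("Expires", "0");                                         // already expired
        return headers;
    }

    public static void main(String[] args) {
        // In a servlet, each entry would be passed to response.setHeader(name, value).
        build().forEach((name, value) -> System.out.println(name + ": " + value));
    }
}
```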

This approach is useful if your application has not already been cached. If it has, the browser may still serve the cached version. It also doesn’t work on some versions of IE (http://support.microsoft.com/kb/321722). The above method is necessary but not sufficient. (Please read the edit to this method below.)

Method 2(a) – Random number next to link (un-escaped characters)
Change
<link REL="STYLESHEET" TYPE="text/css" HREF="/css/default.css"/>
To
 <script >document.write('<link REL="STYLESHEET" TYPE="text/css" HREF="/css/default.css?' + new Date().getTime() + '"></link>');</script>
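Strictly speaking, the number is the current time in milliseconds rather than a random number. It is different on every page load, so the URL changes each time and the browser believes that it doesn’t have a cached copy.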
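
The same URL change can also be made on the server while the page is generated, instead of with document.write in the browser. Below is a minimal sketch in Java, assuming a hypothetical helper class CacheBuster and a query parameter name "t" chosen here purely for illustration (neither comes from the original post). It does not handle URLs that contain a "#" fragment.

```java
// Hypothetical helper: appends a changing value to a URL so the browser sees a new address.
class CacheBuster {

    // Adds "t=<stamp>", using '?' or '&' depending on whether the URL already has a query string.
    static String bust(String url, long stamp) {
        char separator = url.indexOf('?') >= 0 ? '&' : '?';
        return url + separator + "t=" + stamp;
    }

    // Builds the stylesheet tag from Method 2(a) with the cache-busting value applied.
    static String stylesheetTag(String href, long stamp) {
        return "<link rel=\"stylesheet\" type=\"text/css\" href=\"" + bust(href, stamp) + "\"/>";
    }

    public static void main(String[] args) {
        // The current time plays the role of new Date().getTime() in the JavaScript version.
        System.out.println(stylesheetTag("/css/default.css", System.currentTimeMillis()));
    }
}
```

Passing a build or release number instead of the current time would let the browser keep its cached copy until the file actually changes. That is a design choice, not something the method above does.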


Method 2(b) – Random number next to link (escaped characters)
Change
<script language="JavaScript" src="/jsdir/My.js"></script>
To
<script >    document.write(unescape("%3Cscript src='/jsdir/My.js?" +  new Date().getTime() + "' type='text/javascript'%3E%3C/script%3E"));</script>
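
To see what the escaped string in 2(b) turns into, the sketch below decodes it in Java. It assumes java.net.URLDecoder as a stand-in for JavaScript's unescape: the two agree on the %3C and %3E sequences used here, although URLDecoder also turns "+" into a space, which does not matter for this string. The class name EscapedTagDemo is hypothetical, and the Charset overload of decode needs Java 10 or later.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

// Hypothetical demo: shows the markup hidden inside the escaped string of Method 2(b).
class EscapedTagDemo {

    // Builds the escaped string the same way the inline script does, then decodes it.
    static String decodedScriptTag(long stamp) {
        String escaped = "%3Cscript src='/jsdir/My.js?" + stamp
                + "' type='text/javascript'%3E%3C/script%3E";
        // %3C decodes to '<' and %3E to '>'; the other characters here pass through unchanged.
        return URLDecoder.decode(escaped, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(decodedScriptTag(System.currentTimeMillis()));
    }
}
```

Keeping "<" and ">" escaped means the literal text "</script>" never appears inside the inline script, where the HTML parser would otherwise treat it as the end of that script.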

Method 2(a) worked for both Google Chrome and Mozilla.

But for IE you need to use both approaches:
1. For CSS, use 2(a).
2. For JavaScript, use 2(b).


EDIT: Foolproof method - Change in Method 1

    This method works on all browsers and you won't need to restart the web server! Simply include the <head> tag with the meta tags twice: once in the usual place before <body>, and once more after the closing </body> tag (the second copy is needed only for IE).

<html>
<head>
<title>title</title> 
 <meta http-equiv="expires" content="-1" /> 
 <meta http-equiv="pragma" content="no-cache" />
</head>
<body>
  Body goes here
</body>
 <head> 
 <meta http-equiv="expires" content="-1" /> 
 <meta http-equiv="pragma" content="no-cache" />
 </head>
</html>