Making Plone cacheable
Getting the Datenschleuder production-ready was really a three-step process: setting up the ZEO cluster, configuring Squid, and lastly making Plone cacheable.
Of these three steps, the last one is the most crucial. It's nice to have a lot of ZEO nodes slaving away, but even doubling the number of nodes will at best only double your performance. Using Squid (or Apache's mod_cache, for that matter), on the other hand, will yield impressive results on identical hardware - but only if you've configured your site to be cacheable. In other words, sticking a Squid in front of a vanilla Plone instance will give you squat.
The good news, though, is that this turns out to be fairly easy to achieve - if you're willing to make some rather harsh compromises.
To get caching that is both effective and easy to implement, we've imposed the following restrictions:
- Only public (i.e. anonymously viewable) content will be cached
- All authenticated traffic will be encrypted (good idea, anyway!)
- All anonymous traffic will be unencrypted (not ideal, but that's why it's called a compromise)
- No dynamic content on the public pages.
Of all these restrictions, the last one weighs most heavily, in my opinion. It means that we've stripped all publicly viewable pages of the Datenschleuder of any kind of dynamically generated elements, such as the calendar or user tracking.
Restrictions two and three are enforced inside Apache (or your URL rewriter in case of load balancing).
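The actual Apache rules aren't reproduced here, but the decision they have to make is simple enough to sketch in a few lines of Python. This is only an illustration of the logic, not the configuration we run: the function name and the has_auth_cookie flag are made up, and how you recognize an authenticated request (a session cookie, for example) is an assumption that depends on your setup.

```python
def redirect_for(scheme, host, path, has_auth_cookie):
    """Return the URL a request should be redirected to, or None if it is fine as is.

    Authenticated traffic belongs on https, anonymous traffic on http.
    has_auth_cookie is a hypothetical flag; detecting it is up to your rewriter.
    """
    wanted = "https" if has_auth_cookie else "http"
    if scheme == wanted:
        return None  # already on the right scheme, nothing to do
    return f"{wanted}://{host}{path}"


# A logged-in user arriving over plain http gets bounced to https:
print(redirect_for("http", "example.org", "/edit", True))
```

Everything that stays on plain http is, by construction, anonymous and therefore safe to hand to the cache.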
Once you've ensured these four restrictions, the rest is a piece of cake. Click on the link below to get your hands on the gory details...
The easiest way to make an HTTP object cacheable is to give it a max-age. This signals to any cache-aware handler (i.e. your browser, proxies or dedicated caches such as Squid) that the object can be considered 'fresh' for that long, so the handler will try to serve its stored copy instead of requesting the object again. Fine-tuning this is black magic and deals with concepts of validation ("Finding out whether an object really is fresh or not, without incurring the ugly overhead of actually requesting the whole damn thing and then finding out after the proverbial milk has been spilled."), modification time stamps and other voodoo.
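To make the 'fresh' idea a bit more concrete, here is a deliberately simplified sketch of the check a cache performs. It only looks at max-age and ignores everything else a real cache such as Squid considers (no-cache, no-store, s-maxage, Expires, validation), and the function names are my own invention for illustration.

```python
import re


def max_age_seconds(cache_control):
    """Return the max-age value of a Cache-Control header, or None if absent."""
    match = re.search(r"(?:^|,)\s*max-age\s*=\s*(\d+)", cache_control, re.IGNORECASE)
    return int(match.group(1)) if match else None


def is_fresh(cache_control, age_seconds):
    """True if a stored copy of this age may be served without asking the origin."""
    max_age = max_age_seconds(cache_control)
    return max_age is not None and age_seconds < max_age


print(is_fresh("public, max-age=3600", 120))   # stored two minutes ago
print(is_fresh("public, max-age=3600", 4000))  # stored more than an hour ago
```

As long as is_fresh would say yes, the request never needs to reach Zope at all.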
In this case we've simply configured Plone to claim that all objects are fresh for an hour, period. (This means that new content might take an hour to show up at clients - not exactly a big deal with a quarterly magazine ;-)
On the other hand, this means that, assuming just ten requests per second for a given page, only one in 36,000 requests will actually be dealt with by Zope!
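The figure is just the request rate multiplied by the freshness lifetime. A quick back-of-the-envelope version, assuming all requests are for the same page and pass through a single cache that asks Zope once per freshness window (the function name is made up for this example):

```python
def requests_per_window(requests_per_second, max_age_seconds):
    """Requests arriving during one freshness window; only the first reaches Zope."""
    return requests_per_second * max_age_seconds


total = requests_per_window(10, 3600)
hit_ratio = 1 - 1 / total  # share of requests answered by the cache

print(total)                # 36000
print(f"{hit_ratio:.4%}")   # 99.9972%
```

With many different pages and less traffic per page the ratio will be lower, since every page needs its own first request per hour.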
To achieve this, customize the global_cache_settings template (found in plone_skins/plone_templates) to the following:
<metal:cacheheaders define-macro="cacheheaders">
  <metal:block tal:define="dummy python:request.RESPONSE.setHeader('Content-Type', 'text/html;;charset=%s' % charset)" />
  <metal:block tal:define="dummy python:request.RESPONSE.setHeader('Content-Language', lang)" />
  <metal:block tal:define="dummy python:request.RESPONSE.setHeader('Cache-Control', 'public, max-age=3600')" />
  <metal:block tal:content="structure python:here.enableHTTPCompression(request=request, debug=0)" />
</metal:cacheheaders>
If your site updates more often, just lower the threshold to a couple of minutes.
P.S. What MarsEdit really needs is a function "HTML-encode selection" in order to efficiently include stuff like the template above....

Re: Making Plone cacheable
Thanks a ton for this information! This is a really great tutorial.