Proposal for a Zope3-based caching strategy
Regular readers of this blog know that I've been dealing with a lot of performance issues over the past months, with this site in particular and Plone/Zope in general. And truth be told, I actually kind of enjoyed that, because I learned a lot... and was sufficiently motivated, because it was my own ass that I had to save from very real trouble (mainly because of my paying customers, who have their Plone sites on the same machine as tomster.org).
To sum up the current state (before moving on):
- Cache-Fu is entirely Five-agnostic. This means that it won't set any caching headers for Five-based templates, and therefore none for basesyndication's feed templates either.
- Feeds get a whole lot of repeated access, but change much more rarely than they get accessed (even on more prolific blogs than mine ;-)
- fatsyndication's default adapter has a very generic but also very expensive implementation for a feedsource's modification date (it simply queries all of its contained objects and then sorts them by modification date) because it doesn't (and shouldn't!) "know" anything about specific implementation details
- tomster.org currently contains roughly 500 blog entries
- tomster.org/blog's feeds currently receive ca. 30.000 requests per month combined
In summary (without going into further detail), this meant that every month the Zope instance hosting tomster.org was traversing and sorting 500 objects (not brains) three times for each of those 30.000 requests and then rendering the page template for said objects from scratch... for content that changed maybe ten to twenty times during that period... insane... and quite costly, too. Another (and similar) factor was search bots crawling the archives. They, too, created massive spikes in CPU usage (despite that content changing even less, i.e. not at all, during that period...)
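To put a rough number on that, here is a back-of-the-envelope calculation using only the figures quoted above. It assumes (and this is an assumption, not a measurement) that recomputing on change would cost the same three passes over all entries as a request does today, and it takes the upper end of "ten to twenty" changes.

```python
ENTRIES = 500            # blog entries on tomster.org
PASSES_PER_REQUEST = 3   # times the entries are traversed and sorted per request
REQUESTS = 30_000        # feed requests per month
CHANGES = 20             # upper end of "ten to twenty" content changes per month


def object_loads(recomputations):
    """Objects touched per month if the work is done `recomputations` times."""
    return ENTRIES * PASSES_PER_REQUEST * recomputations


request_driven = object_loads(REQUESTS)  # work done on every request
change_driven = object_loads(CHANGES)    # work done only when content changes

print(request_driven, change_driven, request_driven // change_driven)
```

Under those assumptions that is 45 million object loads per month, where about 30 thousand would do.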
What I'm trying to say, I guess, is that it wasn't pretty. As an ad-hoc measure I've created some enhancements for fatsyndication and have written a catalog-based implementation of the modification date for Quills. With that implementation it was trivial to provide accurate Modified-Since headers for blog entries, the blog view, the archives and, last but not least, the feed templates. As a result, the majority of accesses to tomster.org/blog are now served from Squid and the overall CPU usage has gone down considerably (also, I like to think that the sites hosted here have become quite snappy™ ;-)
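For readers unfamiliar with the mechanism, here is a minimal sketch of how an accurate modification date turns into cheap responses. It assumes the headers in question are the standard HTTP pair Last-Modified (response) and If-Modified-Since (request). The function name conditional_response is made up for illustration, and this is not how Quills or Squid implement it.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def conditional_response(modified, if_modified_since=None):
    """Return (status, headers) for a resource last changed at `modified`.

    `modified` must be a timezone-aware datetime (an assumption of this sketch).
    """
    # HTTP dates have one-second resolution, so drop microseconds first.
    modified = modified.astimezone(timezone.utc).replace(microsecond=0)
    headers = {"Last-Modified": format_datetime(modified, usegmt=True)}

    if if_modified_since:
        try:
            since = parsedate_to_datetime(if_modified_since)
        except (TypeError, ValueError):
            since = None  # unparseable header: fall through to a full response
        if since is not None:
            if since.tzinfo is None:
                # A "-0000" offset parses as naive; treat it as UTC.
                since = since.replace(tzinfo=timezone.utc)
            if modified <= since:
                return 304, headers  # client or proxy copy is still fresh

    return 200, headers  # render and send the full page
```

A 304 carries no body, so the expensive template rendering can be skipped whenever the date comparison says nothing has changed.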
But let's face it: all of that is just cosmetics. What's really called for is an approach that is both more generic and more efficient. Because even if that catalog-based implementation of the modification date is a whole lot more efficient than the previous method, it is still called 30.000 times a month, even though only ten to twenty times would be sufficient. In other words, it would be nice to switch from a request-driven model to a change-driven one. And from where I'm standing, Zope3 brings everything necessary to the table, namely events and annotations.
In a nutshell, my current vision of the affair would be this: a caching product (let's call it cacheable for the time being) would provide default adapters for the standard content types, each providing the effective modification date for the object it adapts. For non-folderish content that would (in most cases) simply be its Dublin Core modification date. For folderish content the default adapter would return the youngest modification date of its children. This modification date could be inserted into any template rendering said object, thus enabling caching. So far, so good.
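A minimal sketch of those two default behaviours, in plain Python rather than real Zope 3 adapters. Document, Folder and effective_modified are stand-in names invented for illustration. Recursing into nested folders and falling back to a folder's own date when it is empty are assumptions that the description above leaves open.

```python
from datetime import datetime


class Document:
    """Stand-in for non-folderish content with a Dublin Core modification date."""

    def __init__(self, modified):
        self.modified = modified


class Folder(Document):
    """Stand-in for folderish content holding child objects."""

    def __init__(self, modified, children=()):
        super().__init__(modified)
        self.children = list(children)


def effective_modified(obj):
    """Return the effective modification date of `obj`."""
    children = getattr(obj, "children", None)
    if not children:
        # Non-folderish content (or an empty folder): its own date.
        return obj.modified
    # Folderish content: the youngest date among its children.
    return max(effective_modified(child) for child in children)


blog = Folder(
    datetime(2007, 1, 1),
    [Document(datetime(2007, 1, 5)), Document(datetime(2007, 2, 1))],
)
print(effective_modified(blog))
```

In a real product the function would be split into one adapter per interface, so that a product like Quills could register a cheaper or smarter one for its own types.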
This value could be stored as an annotation on that object and could thus be retrieved very cheaply. But... how to guarantee that that value is accurate and up to date? Events to the rescue. Our hypothetical cacheable product would have to register itself for creation and modification events. Each of these adapters would contain some kind of logic that determines whether the object that triggered the event should be considered relevant to it (i.e. the interface could define something like isRelevant(request, object)). If that is the case, it would update its effective modification date annotation to now (and perhaps trigger some invalidation mechanism inside the caching tool, such as Squid or Varnish).
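The event-driven part could look roughly like the following sketch. It deliberately avoids the real Zope 3 event and annotation APIs: a plain dict plays the role of the annotation storage, and handleChange stands for whatever would be subscribed to the creation and modification events. Apart from isRelevant, every name here (CacheableAdapter, ANNOTATION_KEY, handleChange, effectiveModified) is hypothetical.

```python
from datetime import datetime, timezone

ANNOTATION_KEY = "cacheable.effective_modified"  # hypothetical annotation key


class Content:
    """Stand-in for a content object; `annotations` mimics annotation storage."""

    def __init__(self, name):
        self.name = name
        self.annotations = {}


class CacheableAdapter:
    """Keeps the effective modification date of `context` up to date."""

    def __init__(self, context, clock=None):
        self.context = context
        # The clock is injectable so the behaviour can be tested.
        self._clock = clock or (lambda: datetime.now(timezone.utc))

    def isRelevant(self, request, obj):
        # Default: every change counts. Subclasses can narrow this down.
        return True

    def effectiveModified(self):
        # Cheap lookup: no traversal, no sorting, just the stored value.
        return self.context.annotations.get(ANNOTATION_KEY)

    def handleChange(self, obj, request=None):
        """To be called from creation and modification event handlers."""
        if not self.isRelevant(request, obj):
            return False
        self.context.annotations[ANNOTATION_KEY] = self._clock()
        # A purge request to Squid or Varnish could be triggered here.
        return True
```

The point of the design is that the expensive question ("did anything relevant change?") is asked once per change, while each request only reads a stored value.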
For example, an adapter for Quills would register itself for modification and creation events for blog entries and comments. Then perhaps its isRelevant() method could check whether the object in question is one of the 15 most recent blog entries (or whatever the syndication tool tells it should be considered). If so, it would set its own effective modification date to now.
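As an illustration only, such a check might be sketched like this. Entry is a made-up stand-in for a Quills blog entry, the extra entries and limit parameters are assumptions added to keep the example self-contained, and a real implementation would presumably ask the catalog for the most recent entries instead of scanning a list.

```python
import heapq
from datetime import datetime, timedelta


class Entry:
    """Stand-in for a blog entry with a publication date."""

    def __init__(self, title, published):
        self.title = title
        self.published = published


def isRelevant(request, obj, entries, limit=15):
    """True if `obj` is among the `limit` most recently published entries.

    `request` is unused here; it is kept to mirror the proposed signature.
    """
    recent = heapq.nlargest(limit, entries, key=lambda e: e.published)
    return any(entry is obj for entry in recent)


start = datetime(2007, 1, 1)
entries = [Entry("entry-%d" % i, start + timedelta(days=i)) for i in range(20)]

print(isRelevant(None, entries[19], entries))  # the newest entry
print(isRelevant(None, entries[0], entries))   # the oldest entry
```

With 20 entries and a limit of 15, editing the oldest entry would leave the feed's effective modification date alone, while editing the newest one would bump it.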
Obviously, the default adapter could simply pass on the modification date of the portal_catalog, because that would be a) very cheap and b) very conservative.
Anyway, this is just me kicking some unfinished thoughts around in public. The whole concept is far from finished, but I'd rather embarrass myself with some half-baked ideas than just chew on this on my own. Anybody nerdy enough to find interest in this subject is more than welcome to join the discussion!
N.B. Since this hypothetical product hasn't been written yet, you might need to force-reload this blog entry after submitting a comment in order to see your comment straight away... Safari especially seems to be rather aggressive in its client-side caching... a simple shift- or alt-reload should always deliver a fresh copy of your comment, though...
