What is this?
Hi, my name is Tom Lazar and I'm a Plone and Zope developer based in Berlin, Germany and this is my personal and professional (no big difference, really...) website.
 

quills

Oct 29, 2006

Proposal for a Zope3-based caching strategy

Filed Under:
Regular readers of this blog know that I've been obsessed with, er, dealing with a lot of performance issues with this site in particular and Plone/Zope in general over the past months. And truth be told, I actually kind of enjoyed that, because I learned a lot... and was sufficiently motivated, because it was my own ass that I had to save from very real trouble (mainly because of my paying customers, who have their Plone sites on the same machine as tomster.org).
To sum up the current state (before moving on):
  • Cache-Fu is entirely Five-agnostic. This means that it won't set any caching headers for Five-based templates, and that includes basesyndication's feed templates.
  • Feeds get a whole lot of repeated access, but change much more rarely than they get accessed (even on more prolific blogs than mine ;-)
  • fatsyndication's default adapter has a very generic but also very expensive implementation for a feedsource's modification date (it simply queries all of its contained objects and then sorts them by modification date) because it doesn't (and shouldn't!) "know" anything about specific implementation details
  • tomster.org currently contains roughly 500 blog entries.
  • tomster.org/blog's feeds currently receive ca. 30,000 requests per month combined
In summary (without going into further detail), this meant that every month the Zope instance hosting tomster.org was traversing and sorting 500 objects (not brains) three times for each of those 30,000 requests and then rendering the page template for said objects from scratch... for content that changed maybe ten to twenty times during that period... insane... and quite costly, too. Another (and similar) factor was search bots crawling the archives; they too created massive spikes in CPU usage (despite that content changing even less, i.e. not at all during that period...)
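To put rough numbers on that, here is a back-of-the-envelope calculation using only the figures quoted above. The variable names are mine, and the last line assumes the upper estimate of twenty real changes per month.

```python
# Figures taken from the list above; the names are illustrative only.
entries = 500                # blog entries traversed per date lookup
lookups_per_request = 3      # modification-date lookups per feed request
requests_per_month = 30_000  # combined feed requests per month
changes_per_month = 20       # upper estimate of real content changes

# Content objects woken up per month just to find a modification date.
object_loads = entries * lookups_per_request * requests_per_month

# Requests that could have been answered from a cache per real change.
requests_per_change = requests_per_month // changes_per_month

print(f"{object_loads:,} object loads per month")
print(f"{requests_per_change:,} requests served per actual change")
```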
What I'm trying to say, I guess, is that it wasn't pretty. As an ad-hoc measure I've created some enhancements for fatsyndication and have written a catalog-based implementation of the modification date for Quills. With that implementation it was trivial to provide accurate Last-Modified headers (and thus to answer If-Modified-Since requests) for blog entries, the blog view, the archives and, last but not least, the feed templates. As a result, the majority of accesses to tomster.org/blog are now served from Squid and the overall CPU usage has gone down considerably (also, I like to think that the sites hosted here have become quite snappy™ ;-)
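The idea behind a catalog-based lookup can be sketched roughly as follows. This is not the actual Quills code: it assumes the catalog has a `path` index plus a `modified` index and metadata column, as a stock Plone portal_catalog does, and both `latest_modified` and the `FakeCatalog` stand-in are hypothetical names that only exist so the snippet runs outside Zope.

```python
from collections import namedtuple
from datetime import datetime

# Stand-in for a catalog brain: a lightweight record of index metadata.
Brain = namedtuple("Brain", "path modified")


class FakeCatalog:
    """Hypothetical, heavily reduced imitation of portal_catalog."""

    def __init__(self, brains):
        self._brains = list(brains)

    def __call__(self, path, sort_on, sort_order=None, sort_limit=None):
        hits = [b for b in self._brains
                if b.path == path or b.path.startswith(path + "/")]
        hits.sort(key=lambda b: getattr(b, sort_on),
                  reverse=(sort_order == "reverse"))
        return hits


def latest_modified(catalog, path):
    """Return the newest modification date at or below ``path``, or None.

    Only index data is consulted, so no content object has to be loaded.
    """
    brains = catalog(path=path, sort_on="modified",
                     sort_order="reverse", sort_limit=1)
    # sort_limit is only a hint to the catalog, so slice explicitly.
    for brain in brains[:1]:
        return brain.modified
    return None


catalog = FakeCatalog([
    Brain("/site/blog/entry-1", datetime(2006, 8, 18)),
    Brain("/site/blog/entry-2", datetime(2006, 10, 29)),
    Brain("/site/news/item-1", datetime(2006, 11, 5)),
])
newest = latest_modified(catalog, "/site/blog")
```

The point of the design is that the answer comes from the index alone, which is why it is so much cheaper than waking up every contained object.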
But let's face it: all of that's just cosmetics. What's really called for is an approach that is both more generic and more efficient. Because even if that catalog-based implementation of the modification date is a whole lot more efficient than the previous method, it still is called 30,000 times a month, even though only ten to twenty times would be sufficient. In other words, it would be nice to switch from a request-driven model to a change-driven one. And from where I'm standing, Zope3 brings everything necessary to the table, namely events and annotations.
In a nutshell, my current vision of the affair would be this: a caching product (let's call it cacheable for the time being) would provide default adapters for the standard content types, each providing the effective modification date for an object. For non-folderish content that would (in most cases) simply be its Dublin Core modification date. For folderish content the default adapter would return the most recent modification date of its children. This modification date could be inserted into any template rendering said object, thus enabling caching. So far, so good.
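As a minimal sketch of that rule, in plain Python rather than real Zope 3 adapters: `Document`, `Folder` and `effective_modified` are hypothetical names, and counting the folder's own date alongside its children's is my assumption, so that an empty folder still has a date.

```python
from datetime import datetime


class Document:
    """Non-folderish content: only carries its own modification date."""

    def __init__(self, modified):
        self.modified = modified


class Folder(Document):
    """Folderish content: a modification date plus contained children."""

    def __init__(self, modified, children=()):
        super().__init__(modified)
        self.children = list(children)


def effective_modified(obj):
    """Most recent modification date of ``obj`` and everything below it."""
    children = getattr(obj, "children", ())
    dates = [effective_modified(child) for child in children]
    return max([obj.modified] + dates)


blog = Folder(datetime(2006, 1, 1), [
    Document(datetime(2006, 8, 18)),
    Folder(datetime(2006, 6, 28), [Document(datetime(2006, 10, 29))]),
])
blog_date = effective_modified(blog)
```

Note that this still walks the whole tree on every call, which is exactly the cost that storing the value and updating it on change is meant to avoid.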
This value could be stored as an annotation on that object and could thus be retrieved very cheaply. But... how to guarantee that the value is accurate and up to date? Events to the rescue. Our hypothetical cacheable product would have to register itself for creation and modification events. Each of these adapters would contain some kind of logic that determines whether the object that triggered the event should be considered relevant to it (i.e. the interface could define something like isRelevant(request, object)). If that is the case, it would update its effective modification date annotation to now (and perhaps trigger some invalidation mechanism inside the caching tool, such as Squid or Varnish).
For example, an adapter for Quills would register itself for modification and creation events for blog entries and comments. Then perhaps its isRelevant() method could check whether the object in question is one of the most recent 15 blog entries (or whatever number the syndication tool tells it to consider). If so, it would set its own effective modification date to now.
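The change-driven flow described in the last two paragraphs might be sketched like this. It is plain Python that only imitates events and annotations: the `annotations` dict, `ANNOTATION_KEY`, `WeblogCacheHandler` and `handle` are all hypothetical, and isRelevant() drops the request argument for brevity.

```python
from datetime import datetime

ANNOTATION_KEY = "cacheable.effective_modified"  # hypothetical key


class Content:
    def __init__(self, name, created):
        self.name = name
        self.created = created
        self.annotations = {}  # stands in for Zope 3 annotations


class Weblog(Content):
    def __init__(self, name, created, max_items=15):
        super().__init__(name, created)
        self.max_items = max_items  # what the syndication tool would report
        self.entries = []


class WeblogCacheHandler:
    """Hypothetical subscriber for creation and modification events."""

    def __init__(self, weblog):
        self.weblog = weblog

    def isRelevant(self, obj):
        # Only the most recent entries ever show up in the feed.
        recent = sorted(self.weblog.entries,
                        key=lambda entry: entry.created, reverse=True)
        return obj in recent[:self.weblog.max_items]

    def handle(self, obj, now):
        """Runs once per event, not once per request."""
        if not self.isRelevant(obj):
            return False
        self.weblog.annotations[ANNOTATION_KEY] = now
        return True


def effective_modified(weblog):
    """Cheap lookup at request time: just read the stored annotation."""
    return weblog.annotations.get(ANNOTATION_KEY, weblog.created)


blog = Weblog("blog", datetime(2006, 1, 1), max_items=2)
blog.entries = [Content(f"entry-{day}", datetime(2006, 10, day))
                for day in (1, 2, 3)]
handler = WeblogCacheHandler(blog)
handler.handle(blog.entries[0], now=datetime(2006, 10, 29))  # too old: ignored
handler.handle(blog.entries[2], now=datetime(2006, 10, 30))  # newest: recorded
```

The expensive sorting now happens only when something changes, while every request pays for a single dictionary lookup.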
Obviously, the default adapter could simply pass on the modification date of the portal_catalog, because that would be a) very cheap and b) very conservative.
Anyway, this is just me kicking some unfinished thoughts around in public. The whole concept is far from finished, but I'd rather embarrass myself with some half-baked ideas than just chew on this on my own. Anybody nerdy enough to find interest in this subject is more than welcome to join the discussion!
n.b. Since this hypothetical product hasn't been written yet, you might need to force-reload this blog entry after submitting a comment in order to see your comment straight away... Safari in particular seems to be rather aggressive in its client-side caching... a simple shift- or alt-reload should always deliver a fresh copy of your comment, though...

Aug 18, 2006

Zope, XML-RPC and DateTime

Filed Under:

One word: duh!

And here I was, wondering all along why TextMate's neat Blogging bundle didn't work with Quills' MetaWeblogAPI implementation. When retrieving or posting an entry it would always barf at the entry's dateCreated attribute. Being the author^Wculprit of Quills' MetaWeblogAPI, I figured I might be doing something wrong here. And sure enough: turns out I'm returning the date in RFC822 format, whereas the XML-RPC spec requires ISO8601.
So, I changed it, but still no dice. WTF? Using the indispensable XML-RPC Client debugger I noticed that a WordPress instance that I used as a reference returned the following XML-RPC for dateCreated:
<member>
    <name>dateCreated</name>
    <value>
        <dateTime.iso8601>20060713T19:04:45</dateTime.iso8601>
    </value>
</member>

whereas Quills...

<member>
    <name>dateCreated</name>
    <value>
        <string>2006-08-19T00:33:47+02:00</string>
    </value>
</member>
...returned a String object, not a DateTime object... But returning a DateTime object
struct = {
    'postid': obj.UID(),
    'dateCreated': obj.effective(),
    [...]

resulted in an exceptions.OverflowError - long int exceeds XML-RPC limits... *sigh* Turns out, Zope's XML-RPC library only learned how to deal with DateTime objects as recently as version 2.9.3(!) -- whereas tomster.org is being hosted on 2.8.8.

At this point I was in no mood to give up -- I had gone too far already. So I migrated tomster.org to Zope 2.9.4 and sure enough! Quills now returned an XML-RPC DateTime object, w00t! Except... it looks like this
<dateTime.iso8601>2006-08-19T00:33:47+02:00</dateTime.iso8601>

instead of

<dateTime.iso8601>20060713T19:04:45</dateTime.iso8601>
and so the blogging client still barfs on it!!
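For comparison, here is a small sketch of producing the compact form that the WordPress response uses. It relies on the standard library of present-day Python 3 (`xmlrpc.client`), not on the XML-RPC code shipped with Zope 2.8 or 2.9, and `xmlrpc_timestamp` is a hypothetical helper. Dropping the UTC offset is an assumption about what the blogging client expects.

```python
import xmlrpc.client
from datetime import datetime, timedelta, timezone


def xmlrpc_timestamp(dt):
    """Format ``dt`` as YYYYMMDDTHH:MM:SS, dropping any UTC offset."""
    return dt.strftime("%Y%m%dT%H:%M:%S")


# The entry date from the Quills response above, with its +02:00 offset.
cest = timezone(timedelta(hours=2))
effective = datetime(2006, 8, 19, 0, 33, 47, tzinfo=cest)

compact = xmlrpc_timestamp(effective)

# The standard library marshals datetime values in the same compact form.
payload = xmlrpc.client.dumps((effective.replace(tzinfo=None),))
print(compact)
print(payload)
```

Whether a given client also accepts the dashed form with an offset is up to that client, so emitting the compact one is simply the conservative choice here.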
I was just about to dig further, when I noticed that Zope2.9.4 was segfaulting with the Data.fs of tomster.org ca. once every minute... I guess it just wasn't meant to be...
Now I'm back at Zope 2.8.8, a little bit wiser regarding XML-RPC... and still writing blog entries in Kupu... And if you've read this entry all the way to here, you probably should consider getting a life. I know, I am ;-)

Back (once again)

Serving syndicated content non-statically can be dangerous

So, tomster.org is back again... kind of... After lots of experimenting with CacheFu and the Firefox plugin HTTP LiveHeaders and looking at my Apache logs I finally found out what could have been the cause of the dismal performance of this site:

  • tomster.org currently has an average of ca. 3900 pageviews per day
  • 25% of which are for my atom or rss feeds
  • none of which ever returned a 304 Not Modified status code!
So, essentially this instance was busy generating the same old XML files ca. 1000 times per day, files that in fact only changed every couple of days *cough* weeks... Ouch!

But having narrowed down the problem, I was now able to take measures. After a bit of RTFM (CacheFu's that is) and looking at its control panel I found the solution: I had to add the feed-ids to the list of cacheable templates, like so:



This was possible because the site product for tomster.org provides its own ZPT templates for the atom feeds, which take precedence over the atom.xml Five view that Quills' basesyndication product provides. Sadly, I haven't (yet) found a way to make Five views cacheable -- which is why the RSS feed of this site is still being (re-)generated for every request. Luckily, 80% of feed views access the atom feeds and not the RSS.

So, if anybody has any idea on how to make CacheFu cache Five views, please speak up!

While I was playing around with CacheFu I added the WeblogEntry content type to its 'content rule' and Weblog and WeblogArchive to its 'container rule' -- with the result that also the front page is now able to return 304s.

Now, the way I understand it, this won't speed up access for first-time visitors at all, but clicking back and forth on tomster.org has become noticeably snappier -- and, of course, all those feedreaders out there will only retrieve a feed if it's actually changed.
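The mechanism behind those 304s can be sketched in a few lines of plain Python. This is only an illustration of a conditional GET, not what CacheFu does internally, and `conditional_response` is a hypothetical name.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def conditional_response(last_modified, if_modified_since=None):
    """Return (status, headers) for a resource last changed at ``last_modified``.

    ``last_modified`` must be a timezone-aware UTC datetime.
    """
    # HTTP dates only have one-second resolution.
    last_modified = last_modified.replace(microsecond=0)
    headers = {"Last-Modified": format_datetime(last_modified, usegmt=True)}
    if if_modified_since:
        try:
            since = parsedate_to_datetime(if_modified_since)
        except (TypeError, ValueError):
            since = None  # unparsable header: send the full response
        if since is not None:
            if since.tzinfo is None:
                since = since.replace(tzinfo=timezone.utc)
            if last_modified <= since:
                return 304, headers  # nothing new, no body needed
    return 200, headers


feed_changed = datetime(2006, 10, 29, 12, 0, 0, tzinfo=timezone.utc)

# A first-time visitor gets the full feed plus a Last-Modified header.
first_status, first_headers = conditional_response(feed_changed)

# A feedreader replays that date and is told that nothing has changed.
repeat_status, _ = conditional_response(
    feed_changed, first_headers["Last-Modified"])
```

This matches the observation above: the first request is no faster, but every repeat visit can be answered without rendering anything.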

I hope we can sort this out for the Five views, though -- or else any Quills instance could quickly turn a Plone site into a snail -- and we don't want that...

Jul 15, 2006

Test 1,2,3...

Filed Under:

"Is this mic on?"

This is just a public testdrive of a few bugfixes that I've applied (yet again) to the MetaWeblogAPI of Quills. Somehow stuff that I've already gotten to work in the 0.9 branch didn't quite survive the transition to the trunk... talk about scratching one's own itch, heh!

So, unless anything unexpected happens during the publication of this entry, I can finally declare Quills' MetaWeblogAPI to be in a usable and working state ;-)

Jun 28, 2006

Site Update

Filed Under:

Quills development is regaining momentum

Thanks to Reinout’s recent work on the Quills/trunk I was able to migrate tomster.org’s blog instance from the 0.9 branch to trunk - and while I was at it, I updated the entire site from Plone 2.1.2 to 2.5.

As Reinout says himself, there’s really not much to report, as everything went rather smoothly. For me personally, however, this all-round positive experience came just at the right time, as I have been wrestling with some nasty low-level Archetypes issues for the past several days, with increasing amounts of frustration. This made me realize that usually when I have my gripes with Plone it’s connected with Zope2 or Archetypes but hardly ever with Plone itself. Go figure!

Having migrated my own blog to Quills/trunk also finally gave me ample motivation to migrate some fixes that I’ve applied to the 0.9 branch to it. Most notably,

  • MetaWeblogAPI works (this post was created with MarsEdit)
  • The Comments management screen has been converted to use a catalog query and thus received a significant speedup
  • You can now filter comments by author, subject and body text — very useful for weeding out spam bots!
  • Using the portal_catalog also finally made it possible to sort comments by creation date (previously they had been sorted by the date of the post they belonged to, which meant that new spam to old posts was hard to spot)
  • fatsyndication now honours portal_syndication’s getMaxItem property (default is 15)
  • fixed the rss.xml template of basesyndication to validate and avoid continued embarrassment on Planet Plone ;-) (thanks Ree!)

The most fun feature that the trunk offers (thanks to Tim) is that it’s ditched its own WeblogTopic content type in favour of using Plone’s own keywords — this makes it significantly easier to tag entries and allows for a nifty tag-cloud portlet.

It’s really nice to see that development of Quills has gained momentum again and that the trunk is now in a usable shape.