What is this?
Hi, my name is Tom Lazar and I'm a Plone and Zope developer based in Berlin, Germany and this is my personal and professional (no big difference, really...) website.
 

exim

Jul 06, 2005

Back on Track

"Spam? What spam? There is no spam here..."

There now, that's better!

  PID USERNAME PRI   SIZE    RES    CPU COMMAND
93942 spamd     96 43560K 42836K  0.00% perl5.8.6
95119 cyrus      4 39308K  4304K  0.00% imapd
93943 spamd      4 32956K 32276K  0.00% perl5.8.6
93940 spamd      4 32364K 31628K  0.00% perl5.8.6
93941 spamd      4 30064K 29356K  0.00% perl5.8.6

See, I told ya, it's possible ;-) Actually, the strategy was pretty simple and straightforward, since we've already been doing the more obvious steps such as using spamd instead of spamassassin. Our goal was to

  • reduce memory footprint of spamd processes
  • use a local DNS to speed up name lookups

While the latter was just a matter of getting it over and done with, the former proved to be a bit trickier. We opted to see what happens if we limit the size of the mails processed by spamd: tests had shown that even when just one of our clients sent out a single email with a large(ish) attachment (say, 4 to 6 MB), the corresponding spamd child would eat up to 500 MB of RAM and max out the CPU.

It just turned out that our particular setup with Exim doesn't seem to be as well documented as the other two (it was only introduced with Exim 4.5.x). Basically, you compile Exim with WITH_CONTENT_SCAN=yes and can then use the spam directive in your Exim configuration rather than setting up various transports and routers where you pipe the mail through spamc (the lightweight client to spamd).

After hunting in vain for the place to modify Exim's spamc parameters, I finally stumbled over the crucial bit here. I could tell that it was going to work, because it immediately seemed so obvious in hindsight - God bless Ronan McGlue ;-)

The logic behind it is this: since the occurrence of the spam directive is what actually triggers the call to spamc, simply add a condition to that ACL, such as the following one:

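  # Conditions are evaluated in order, so the size check has to come first:
  # mails of 100k or more then never reach the spam condition (and spamd).
  warn  message = X-Spam-Score: $spam_score ($spam_bar)
        condition = ${if <{$message_size}{100k}{1}{0}}
        spam = nobody:true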
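For anyone who would rather read that size gate as ordinary code, here is a minimal Python sketch of the same decision. The function names are made up for illustration, and it assumes Exim's convention that a trailing k or M on an integer means a multiple of 1024 or 1024*1024; only the 100k limit and the "smaller than" test come from the ACL itself.

```python
# Hypothetical helpers illustrating the ACL's size gate; not part of Exim.

# Assumption: Exim-style integer suffixes, k = 1024 and m = 1024 * 1024.
_SUFFIXES = {"k": 1024, "m": 1024 * 1024}


def parse_exim_size(text: str) -> int:
    """Convert a size such as '100k' or '2M' to a number of bytes."""
    text = text.strip().lower()
    if text and text[-1] in _SUFFIXES:
        return int(text[:-1]) * _SUFFIXES[text[-1]]
    return int(text)


def should_scan(message_size: int, limit: str = "100k") -> bool:
    """Mirror ${if <{$message_size}{100k}{1}{0}}: scan only below the limit."""
    return message_size < parse_exim_size(limit)


if __name__ == "__main__":
    for size in (20_000, 102_400, 5 * 1024 * 1024):
        action = "scan" if should_scan(size) else "skip spamd"
        print(f"{size:>8} bytes: {action}")
```

The point of the gate is simply that a 5 MB attachment is never handed to spamd at all, while ordinary small mails are still scored.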

After that everything looked dandy, i.e. the load was fine, even when lots of mails were pouring in. However, the spamd children still consumed 200 to 400 MB of RAM each, which produced lots of swap activity. I opted to cut down the number of client connections per spamd child from the default 200 to 100. This means that each child will respawn sooner: a freshly spawned process weighs in at just some 40 MB and then starts to grow. If this setup holds up over the next few days, I'll try increasing the number of child processes from four to six or eight.

spamd_flags="-m4 --max-conn-per-child=100 --ident-timeout=30 \
-x -d -r /var/run/spamd/spamd.pid -u spamd \
-H /var/spool/spamd"
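Why the swapping happened is easy to see with some back-of-the-envelope arithmetic. The Python sketch below is only an illustration: the function names are invented, the per-child figures are the rough ones observed above, and the 1 GB of RAM is the machine's total, so everything else running on the box (Exim, Cyrus, the webmail) is ignored.

```python
# Rough memory budget for spamd children; all names here are hypothetical.

RAM_MB = 1024  # total RAM of the mail server (1 GB), nothing else accounted for


def spamd_footprint_mb(children: int, per_child_mb: int) -> int:
    """Worst case: every child sits at the given size at the same time."""
    return children * per_child_mb


def fits_in_ram(children: int, per_child_mb: int, ram_mb: int = RAM_MB) -> bool:
    """True if all children together stay within physical memory."""
    return spamd_footprint_mb(children, per_child_mb) <= ram_mb


if __name__ == "__main__":
    # 40 MB is a freshly spawned child, 200 and 400 MB are grown ones.
    for children in (4, 6, 8):
        for per_child_mb in (40, 200, 400):
            total = spamd_footprint_mb(children, per_child_mb)
            verdict = "fits" if fits_in_ram(children, per_child_mb) else "swaps"
            print(f"{children} children x {per_child_mb:3d} MB = {total:4d} MB ({verdict})")
```

Four children at 400 MB each come to 1600 MB, well beyond 1024 MB, whereas even eight freshly spawned children need only 320 MB. That is the whole argument for making the children respawn sooner before raising their number.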

Again, even though it's a pain in the ass having to deal with such nuisances, I can't deny that I actually enjoy tinkering with this kind of stuff ;-) And as usual, a big shout out to Cryx, I really wouldn't wanna do this with anyone else ;-)

Bogged down

"The Internet is no flowery meadow."

Man, talk about 'major suckage'... Recently Cryx and I have been noticing increasingly frequent periods where our mailserver would more or less crawl to a halt. For several minutes, CPU load would be maxed out at 99.x% and swap usage would climb to 50%. SMTP would time out, and the webmail interface would become unusably slow. The culprit? SpamAssassin, with up to four child processes weighing in at 200 to 500 MB of RAM each!

Since it happened so irregularly, it was difficult to pinpoint the exact bottleneck within the spam filtering process, but today it got so bad that clients started calling me on the phone and asking what was going on. As an ad hoc measure we simply had to shut down spamd - now everything is purring along again - load average 0.00 ;-)

But, of course, none of the mails are now being tagged with X-Spam-Scores anymore (rendering my new avelsieve setup quite pointless).

Well, tomorrow Cryx and I will get together and try to figure something out in peace and quiet - after all, we're only talking about <100 IMAP accounts on a 3 GHz machine with 1 GB of RAM - we definitely should be able to remedy the situation! It does pain me, though, to think of all the things I could actually be doing instead of dealing with this modern-day plague.


Apr 27, 2005

Geek Adrenaline (once again)

From the Shit-Happens-Department

When it rains, it pours, I guess. So just after spending a weekend procrastinating and finally getting back to work, the drive of the machine hosting this site broke! After getting home at 2 a.m. I noticed that it wasn’t reachable anymore and immediately issued a reboot request through HostEurope’s management interface. And what do you know? At 02:20 I got a call from the technician on duty telling me that the machine had been frozen and was now in the process of checking the drive, and that he would get back to me when he had a prompt. At 02:30 I noticed that the machine could be pinged again. At 02:32 I got another call and we went through the list of files that fsck had reported broken - phew! nothing important, just a couple of log files…

To make a long story short: today co-admin Cryx and I spent a full ten hours migrating some 106 Cyrus-based IMAP accounts, weighing in at 8 GB, that had still been residing there to the new mail server. (We had been meaning to do this for ages but there was no real pressing need - until yesterday!)

Despite being somewhat of a seasoned admin by now, I still have a lot of respect for any operation involving changes to IMAP and DNS (we had to change MX records for a few dozen domains) - it’s complicated stuff and people can get really upset if it doesn’t work… We modified my existing script to handle the new task and actually managed to pull off the whole stunt between 22:00 and midnight after spending the day preparing everything - and as far as we can tell, no emails have been lost at all - thanks to our ever-so-nifty migration strategy ;-).

So, if you happen to be one of my customers: sorry for not having given you prior notice, but with the damaged drive we just didn’t want to take any risk! I’ll be sending out a detailed explanation of what happened during the day. Meanwhile most providers have already picked up the new IP address for mail.tomster.org, and after sending a gazillion test mails from each and every server we’ve got an account on, I’m pretty confident that come this morning nobody will know that anything has happened at all - knock on wood! All in all, though, I’m quite pleased with how things went down and how we handled the situation. Also, I couldn’t help but notice how much fun it was using TextMate and Python to handle this real-world problem…

And to Cryx: you’re one hell of a guy to work with, thanks for everything!!

Feb 19, 2004

550 Unknown user - finally!

Yesterday I finally understood why my Exim setup wouldn't give permanent errors when confronted with email to users that didn't exist on the server (anymore).

Instead, what my configuration yielded was something like temporarily rejected RCPT : error in redirect data: no local part in "@cyrus-local-delivery.tomster.org".

Besides not being exactly elegant, this also had some other unpleasant side effects. For one, I am currently hosting the domain of a once rather busy publishing company, and you wouldn't believe the number of newsletters the former employees are still receiving. In some cases these people haven't been working there for over two years. But of course, the newsletter bots keep on trying; after all, all they ever get is a temporary error.
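The difference between the two kinds of error comes down to the first digit of the SMTP reply code: 4xx means "try again later", 5xx means "give up". A small Python sketch of how a sending bot might reason about it, with invented function names and no claim to match any particular mailer:

```python
# Hypothetical sender-side view of SMTP reply codes (first digit decides).


def classify_smtp_reply(code: int) -> str:
    """Map an SMTP reply code to the broad class a sender acts on."""
    if 200 <= code < 400:
        return "ok"          # 2xx success, 3xx intermediate (e.g. after DATA)
    if 400 <= code < 500:
        return "temporary"   # transient failure: queue the mail and retry
    if 500 <= code < 600:
        return "permanent"   # permanent failure: bounce, stop retrying
    raise ValueError(f"not an SMTP reply code: {code}")


def sender_should_retry(code: int) -> bool:
    """A well-behaved sender only retries after a temporary failure."""
    return classify_smtp_reply(code) == "temporary"


if __name__ == "__main__":
    for code in (250, 451, 550):
        print(code, classify_smtp_reply(code), "retry" if sender_should_retry(code) else "no retry")
```

With the old configuration every unknown recipient produced a temporary failure, so the retries never stopped; a 550 tells the sender to stop for good.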

Well, that's fixed now ;-) So in case you're using my setup or something similar, you will want to take another look at it.

Now that my scripts for maintaining the user accounts and passwords and my config files have been polished, I'm ready to take the next step: I want to keep the entire setup in XML files and use XSLT stylesheets to create Exim's config files. Perhaps in spring, when my current projects have been completed...