How I fsck'd my mail server

Hi everyone,

First post, and I thought I'd share my recent troubles with everyone, so it hopefully won't happen to you. I'm still trying to figure out when I screwed up, but I definitely know HOW I did. I also know I won't do it again!

I'm running a Debian mail server with messagewall, amavis, spamassassin, qmail & vpopmail on an ext3 filesystem.

We had just installed messagewall and changed some settings in amavis, and we were running out of memory. It was all caused by spam. I was trying to see where it was coming from, and instead of being smart and doing things the right way, I decided to be lazy and set up one of my busier mail clients with a temporary "catch-all" account. I set it up and checked whether the messages were getting tagged, where they were coming from, etc. Then, I guess in my haste, I deleted the directory where that mail account stores new mail (no, I don't know WHY)... and then I forgot to turn off the catch-all.

A month or two later (two weeks ago), I rebooted the server for the first time in about a year. It didn't come back up. I sent a panicked email to the network admin. He gave me the console login and we walked through it. There was a disk problem, and I needed to run fsck. I did ... and it ran all morning ... all day ... and all night. And it still wasn't done. Unluckily I didn't have a good backup solution, but luckily there weren't many clients on the box and it was a weekend. So we did a fastboot and ran with the errors on the disk all week. There was nothing physically wrong with the disk, just a ton of inodes with a reference count of 2 that fsck was resetting to 1.

I looked in lost+found, and there were a ton of individual files - 11GB in total. And it wasn't done.

Sometime late in the week, I watched syslog for a few minutes and saw several error messages like this:

Code:
Feb 11 11:52:02 host qmail: 1108140722.047433 delivery 213544: \
failure: user_does_not_exist,_but_will_deliver_to_/var/lib/vpopmail/domains/domain.com/catchall \
//link_REALLY_failed_/var/lib/vpopmail/domains/domain.com/catchall/Maildir/tmp/1108140722.20655.tminus0,S=14773_ \
/var/lib/vpopmail/domains/domain.com/postmaster/Maildir/new/1108140722.20655.tminus0,S=14773_errno_=_2/system_error/

So I ran fsck overnight again at the end of the week, and again last night. It finally finished this morning at 3am. Total: 14GB of spam. I don't know why the messages didn't get deleted properly, and if anyone has a clue as to why, I'd like to know. Just curiosity - I don't think I'll do that again.
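For anyone wondering what the "errno = 2" in that log means: Maildir delivery conventionally writes a message into tmp/ first and then hard-links it into new/, and errno 2 is ENOENT ("No such file or directory") on Linux, which is what link() returns when the target directory no longer exists. Below is a minimal Python sketch of that two-step delivery, assuming the standard tmp/new/cur Maildir layout. The `deliver` function and its file-naming scheme are hypothetical, and this is not vpopmail's or qmail's actual code. In particular, the sketch leaves the tmp/ file in place when link() fails; I haven't confirmed what the real delivery agent does in that case.

```python
import errno
import itertools
import os
import shutil
import socket
import tempfile
import time
from typing import Tuple

_counter = itertools.count()


def deliver(maildir: str, message: bytes) -> Tuple[bool, int]:
    """Hypothetical Maildir delivery: write into tmp/, then hard-link into new/.

    Returns (delivered, errno). When link() fails, this sketch deliberately
    leaves the file in tmp/ (an assumption, not confirmed vpopmail behavior).
    """
    # Hypothetical unique name: time.P<pid>Q<counter>.host (sanitized).
    host = socket.gethostname().replace("/", "_").replace(":", "_")
    name = f"{int(time.time())}.P{os.getpid()}Q{next(_counter)}.{host}"
    tmp_path = os.path.join(maildir, "tmp", name)
    new_path = os.path.join(maildir, "new", name)

    # Step 1: write the whole message under tmp/ and flush it to disk.
    with open(tmp_path, "wb") as f:
        f.write(message)
        f.flush()
        os.fsync(f.fileno())

    # Step 2: link() it into new/. If new/ is missing, this raises ENOENT.
    try:
        os.link(tmp_path, new_path)
    except OSError as exc:
        return False, exc.errno

    # Between link() and unlink() the file has two hard links.
    # Step 3: drop the tmp/ name; the message now lives only in new/.
    os.unlink(tmp_path)
    return True, 0


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        box = os.path.join(root, "Maildir")
        for sub in ("tmp", "new", "cur"):
            os.makedirs(os.path.join(box, sub))
        print(deliver(box, b"Subject: test\n\nhello\n"))  # (True, 0)
        shutil.rmtree(os.path.join(box, "new"))  # simulate deleting new/
        ok, err = deliver(box, b"Subject: spam\n\nbuy now\n")
        print(ok, errno.errorcode[err])  # False ENOENT
        print(len(os.listdir(os.path.join(box, "tmp"))))  # 1 file left in tmp/
```

With new/ removed, every delivery in this sketch fails at the link() step and leaves one more file behind in tmp/.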

However, I did learn my lesson: Do it right the first time, or you'll ruin your weekends.