Jeff Turner's Weblog: April 2003 Archives

April 23, 2003

RDF: CSS syntax?

Micah Dubinko suggests using a CSS syntax for RDF:

CSS-syntax:

@namespace dc url(http://purl.org/dc/elements/1.1/);

:root {
dc|description: "A discussion of the broader context and relevance of XML/RDF techniques.";
dc|creator: "Uche Ogbuji";
}

Note that the CSS3 :root selector is used to make a 'this here document' self-reference. To make assertions about other URIs, you could either use the url() function as a selector, or select an element that points off to some URI.

Some will argue that the last thing RDF needs is another syntax. Yet, none of the existing ones are workable within DTD-valid XHTML, it seems. Re-using CSS parsing technology seems like a good compromise. Maybe.

Sounds like a good idea to me. Metadata inherits in the same way styles do, so the cascading comes in handy. Maybe we should use this in Forrest.

Posted by jefft at 01:20 PM

April 17, 2003

Bayesian spam zapping with bogofilter

It's now been a week since I started using Bogofilter, a Bayesian spam-catching affair by ESR, to filter out the 11-odd spam messages I get per day. I had previously been using an elaborate procmail system called SpamBouncer, which works reasonably well, but blocks some BigPond users (Telstra being a major source of spam), and is generally hard to update.

So far bogofilter has worked very well, with no false positives and only a few misses. The best part was that I got to use my lovingly hoarded spam collection to 'train' the filter:

cat Mail/spam.incoming | formail -s bogofilter -s

formail, and in particular formail -s <cmd>, is extremely useful for mbox tinkering. The -s option splits an incoming mbox stream and runs <cmd> for every message; in this case, telling bogofilter to register the contents as spam.

In the same way, one tells bogofilter what isn't spam:

cat $MAIL | formail -s bogofilter -n

bogofilter is now trained and can be used to filter incoming mail. The man page has a sample procmail recipe that positively reinforces whatever decision is made, so the filter is constantly adapting. When bogofilter lets a spam mail through, this can be rectified (in mutt) with '|bogofilter -Ns', which unregisters the message as non-spam and registers it as spam; bouncing the message to oneself then tests the change.

Oh, and if bogofilter seems... really uncannily good at classifying existing mail, check that your previous spam software hasn't added custom headers to each mail. Real spammers are very inconsiderate, and don't add headers like 'X-SBClass: spam', so it's no good training on such emails ;)
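
A quick way to run that check before training is to count the leftover headers in the mbox. This is only a rough sketch: count_tagged is a made-up helper name, X-SBClass is just the header mentioned above, and the grep matches any line in the file (body text included), so treat the number as an upper bound rather than an exact count.

```shell
#!/bin/sh
# Hypothetical helper (not part of bogofilter or formail): count lines in an
# mbox that look like a classification header left behind by an earlier filter.
#   $1  path to the mbox file
#   $2  header name to look for (assumed default: X-SBClass)
count_tagged() {
    mbox=$1
    header=${2:-X-SBClass}
    # grep -c prints the number of matching lines. It exits non-zero when
    # that number is 0, so "|| true" keeps the function safe under "set -e".
    grep -c -i "^${header}:" "$mbox" || true
}
```

Running count_tagged Mail/spam.incoming and getting anything other than 0 suggests the collection is tainted. formail's -I option, given a bare field name, is documented as deleting that field, so inserting something like -I 'X-SBClass:' before the -s in the training commands should strip it on the way through, though I haven't verified that against every formail version.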

Posted by jefft at 10:19 PM