Viruses

Last week, when “W32.Beagle.A@mm” started to spread, I decided to switch my mailing lists to “moderated”.

Not that my subscribers can be affected by viruses (I configure my mailing lists to strip binary attachments to reduce the risk), but a flurry of mails saying “Hi” was something I wanted to avoid, even if they are safe!

This week, when “W32.Novarg.A@mm” came out, I decided I needed something stronger and more automatic…

After some googling, I installed clamav and amavis. That took some time and was painful (the mail system is probably the most complex thing on my servers, with many different programs involved: postfix, procmail, cyrus and now clamav and amavis), but I am pretty happy with this small achievement.

Clamav is what does the real work. It’s an open source anti-virus scanner. It comes with an update daemon, and as far as I can tell from my limited experience, several virus signatures seem to be added daily.

Also open source, Amavis is the interface between the MTA (Postfix in my case) and the virus scanner (Clamav in my case). I have installed a flavor of Amavis named “Amavisd-new”. Amavis is highly configurable. You can tell it when a virus uses fake sender addresses, and in that case it won’t send a report to the sender. I wish more systems and admins used that feature to avoid flooding the net with rubbish virus notifications!
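To make the idea concrete, here is a minimal Python sketch of the decision itself. It is not Amavis configuration or code: the function name and the pattern list are hypothetical, and the only facts assumed are that Beagle (also known as Bagle) and Novarg (also known as Mydoom) forge their sender addresses.

```python
import re

# Hypothetical list of patterns matching viruses known to forge the
# sender address. Illustrative only, not taken from a real configuration.
FAKE_SENDER_PATTERNS = [
    re.compile(r"beagle|bagle", re.IGNORECASE),
    re.compile(r"novarg|mydoom", re.IGNORECASE),
]


def should_notify_sender(virus_name: str) -> bool:
    """Return False when the virus is known to fake its sender address."""
    return not any(p.search(virus_name) for p in FAKE_SENDER_PATTERNS)


print(should_notify_sender("W32.Novarg.A@mm"))
print(should_notify_sender("Some.Macro.Virus"))
```

The point is simply that a notification sent to a forged address reaches an innocent third party, so the safest default for such viruses is to stay silent.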

With this setup, I have switched my mailing lists to their normal mode again and I am now watching viruses being caught: the rate has reached 30 viruses per hour. 30 mails that won’t leave my SMTP server and never spread their virus…

That may be a small achievement, but I feel like a good “net citizen” :) … If more SMTP servers (including those of ISPs) were equipped with such tools, viruses would spread much, much, much more slowly.


Generating samples out of a schema

For one of my projects, I need to generate sample documents out of a W3C XML Schema.

I had a quick look at existing products but ruled them out since I don’t believe they meet the requirements imposed by my customer:

  • The values need to be “significant” (no “abcd” for a name).
  • The schema is using some advanced WXS features such as derivation and xsi:type.

I plan to implement that as a two step XSLT transformation:

  1. The WXS schema is transformed into a simplified RELAX NG schema.
  2. The simple RNG schema is used to generate the samples.

The benefit of having a valid schema as the intermediate document between the two steps is that it should make debugging easier: if a sample document isn’t valid per the original schema, validating it against the intermediate schema should show whether the bug is in step one or step two.

The reason why I have chosen RELAX NG for this intermediate schema is that it’s easier to write and easier to process but also that I can have a single schema for all the namespaces used in the original schema (that wouldn’t be possible with a W3C XML Schema).

I plan to code lists of meaningful values as WXS schema annotations and convert these annotations into RELAX NG choices so that all the alternatives (both in the schema itself and in the meaningful values) are translated into RNG choices in the intermediate schema.
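I plan to do this in XSLT, but the second step can be sketched in a few lines of Python to show the principle. Everything here is an assumption made for illustration: the tiny intermediate schema, the `generate` function and its `pick` parameter are hypothetical, and only a handful of RELAX NG patterns (element, choice, value, optional, group) are handled, with no namespaces in the generated sample.

```python
import xml.etree.ElementTree as ET

RNG = "{http://relaxng.org/ns/structure/1.0}"

# Hypothetical intermediate schema: the meaningful values (which would come
# from the WXS annotations) appear as RELAX NG choices between values.
SCHEMA = """
<element name="person" xmlns="http://relaxng.org/ns/structure/1.0">
  <element name="name">
    <choice>
      <value>John Smith</value>
      <value>Rebecca Smith</value>
    </choice>
  </element>
  <optional>
    <element name="city"><value>Paris</value></element>
  </optional>
</element>
"""


def generate(pattern, parent=None, pick=0):
    """Build a sample for a pattern; 'pick' selects the branch of each choice."""
    children = list(pattern)
    if pattern.tag == RNG + "element":
        name = pattern.get("name")
        node = ET.Element(name) if parent is None else ET.SubElement(parent, name)
        for child in children:
            generate(child, node, pick)
        return node
    if pattern.tag == RNG + "choice":
        # Every alternative is reachable by varying 'pick'.
        generate(children[pick % len(children)], parent, pick)
    elif pattern.tag == RNG + "value":
        parent.text = (parent.text or "") + (pattern.text or "")
    elif pattern.tag in (RNG + "optional", RNG + "group"):
        # Optional content is always generated in this sketch.
        for child in children:
            generate(child, parent, pick)
    return parent


schema = ET.fromstring(SCHEMA)
for pick in range(2):
    print(ET.tostring(generate(schema, pick=pick), encoding="unicode"))
```

Since structural alternatives and meaningful values are both plain choices at this stage, the generator needs a single mechanism to enumerate them.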

Not trivial, quite a lot of work but that seems feasible to me!


RDF Query Languages

I am thinking about writing an RSS aggregator. Not that it is original or has never been done, but just as an exercise to play with an RDF database, and so far I am still not convinced of the way to go!

As a query language, I really like Versa, but RDQL seems to have much more traction, so I have given it a closer look.

What I don’t like about it is that it seems to bring the RDF data model to its end: after you’ve used it, you don’t see triples any longer but tables of resources.

To take an example from the RDQL tutorial:

SELECT ?resource, ?givenName
WHERE (?resource, <http://www.w3.org/2001/vcard-rdf/3.0#N>, ?z) ,
      (?z, <http://www.w3.org/2001/vcard-rdf/3.0#Given>, ?givenName)

Returns:

resource                         | givenName
============================================
<http://somewhere/JohnSmith/>    | "John"
<http://somewhere/RebeccaSmith/> | "Rebecca"
<http://somewhere/SarahJones/>   | "Sarah"
<http://somewhere/MattJones/>    | "Matthew"

Where have the triples gone?

What makes the strength of query languages in other domains, such as SQL or XPath, is that their data models are the same in input as in output: the result of a SQL select statement is basically a table, and I can do subqueries on this result or insert it into a table; the result of an XPath query is (or can be) a nodeset which I can output as XML and on which I can perform a new XPath query.
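A small Python illustration of that property on the XPath side, using the limited XPath subset supported by the standard `xml.etree.ElementTree` module. The feed content is made up for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical feed, for illustration only.
doc = ET.fromstring(
    "<channel>"
    "<item><title>RDF</title><category>xml</category></item>"
    "<item><title>Cooking</title><category>food</category></item>"
    "</channel>"
)

# First query: the result is a list of element nodes...
items = doc.findall(".//item[category='xml']")

# ...which can be serialized back to XML...
serialized = [ET.tostring(item, encoding="unicode") for item in items]
print(serialized)

# ...or queried again with another path expression.
titles = [item.findtext("title") for item in items]
print(titles)
```

The result of the first query is made of the same kind of nodes as the input, which is exactly what allows the second query and the serialization.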

By contrast, the result of an RDQL query seems to be a bunch of resources that I can’t really use as triples.

Why is that a problem?

Let’s say I want to create an RSS channel with the RSS items meeting a condition. With XPath I can just write “//rss:item[my condition]” and I have a nodeset with the complete definition of these items. With RDQL, I can write a query that will give me these items as resources, but I haven’t seen how I could get these items with their descriptions as triples ready to be serialized back as RSS.

What I’d really like is a query language that would let me do with RDF graphs what I do with XPath!
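To show what I mean by "triples in, triples out", here is a toy Python sketch. It is not RDQL, Versa or any existing library: the graph is a plain set of tuples, the prefixed names are shortened stand-ins for URIs, and `select_subjects` and `describe` are hypothetical helpers that only look one level deep (no blank nodes are followed).

```python
# Triples are plain (subject, predicate, object) tuples; names are
# hypothetical shortened forms used for readability.
GRAPH = {
    ("item1", "rdf:type", "rss:item"),
    ("item1", "rss:title", "RDF Query Languages"),
    ("item1", "dc:subject", "rdf"),
    ("item2", "rdf:type", "rss:item"),
    ("item2", "rss:title", "Viruses"),
    ("item2", "dc:subject", "mail"),
}


def select_subjects(graph, predicate, obj):
    """Subjects having the given predicate/object pair (the table-like step)."""
    return {s for (s, p, o) in graph if p == predicate and o == obj}


def describe(graph, subjects):
    """All triples about the selected subjects: the result is again a graph."""
    return {t for t in graph if t[0] in subjects}


rdf_items = describe(GRAPH, select_subjects(GRAPH, "dc:subject", "rdf"))

# Because the result is a set of triples, it can be queried again
# or handed as-is to a serializer.
titles = {o for (s, p, o) in rdf_items if p == "rss:title"}
print(sorted(rdf_items))
print(titles)
```

The second query runs on the output of the first one without any conversion, which is the closure property I am missing.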


RDDL 2.0

Jonathan Borden has announced a version 2.0 of RDDL.

I am (or have been?) a great believer in and promoter of RDDL, but I am disappointed by this version. I think that RDDL 1.0 was better than RDDL 2.0 and have sent a mail to explain why.

If you look at this example (from examplotron):

<rddl:resource
  id="xsd-schema"
  xlink:arcrole="http://www.rddl.org/purposes#schema-validation"
  xlink:role="http://www.w3.org/2001/XMLSchema"
  xlink:title="W3C XMLSchema for examplotron"
  xlink:href="examplotron.xsd"
  xlink:type="simple"
  xlink:show="none"
  xlink:embed="none">
<div class="resource">
<h4>W3C XML Schema for examplotron</h4>
<p>This W3C XML Schema (Proposed Recommendation, 16 March 2001)
<a href="examplotron.xsd">schema</a> describes the examplotron
vocabulary and can be imported in W3C XML Schema to validate examplotron schemas.</p>
</div>
</rddl:resource>

A link is expressed between the full description of the schema (including the whole <div/> element) and the schema.
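A short Python sketch of what a consumer can extract from such a resource. The snippet is an abridged copy of the example above with the namespace declarations added so that it parses on its own; the `link` dictionary and its keys are just names chosen for this illustration.

```python
import xml.etree.ElementTree as ET

XLINK = "{http://www.w3.org/1999/xlink}"

# Abridged copy of the example, with namespace declarations added.
RESOURCE = """
<rddl:resource xmlns:rddl="http://www.rddl.org/"
    xmlns:xlink="http://www.w3.org/1999/xlink"
    xmlns="http://www.w3.org/1999/xhtml"
    id="xsd-schema"
    xlink:arcrole="http://www.rddl.org/purposes#schema-validation"
    xlink:role="http://www.w3.org/2001/XMLSchema"
    xlink:href="examplotron.xsd">
  <div class="resource">
    <h4>W3C XML Schema for examplotron</h4>
    <p>This <a href="examplotron.xsd">schema</a> describes the examplotron vocabulary.</p>
  </div>
</rddl:resource>
"""

resource = ET.fromstring(RESOURCE)
link = {
    "nature": resource.get(XLINK + "role"),
    "purpose": resource.get(XLINK + "arcrole"),
    "target": resource.get(XLINK + "href"),
    # The link source is the whole element, so its full text is available.
    "description": " ".join("".join(resource.itertext()).split()),
}
print(link)
```

Because the attributes sit on the element that wraps the whole description, the title and the explanatory paragraph come along with the link.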

With the new proposal you would end up with something such as:

<div class="resource">
<h4>W3C XML Schema for examplotron</h4>
<p>This W3C XML Schema (Proposed Recommendation, 16 March 2001)
<a href="examplotron.xsd"
   rddl:nature="http://www.w3.org/2001/XMLSchema"
   rddl:purpose="http://www.rddl.org/purposes#schema-validation">schema</a>
describes the examplotron
vocabulary and can be imported in W3C XML Schema to validate examplotron schemas.</p>
</div>

And the link is now between the schema and the much less significant piece of text “schema”.

And there is no way to keep the expressive power of RDDL 1.0 because the content model of the XHTML <a/> element doesn’t allow it!

The argument behind this modification is that the syntax is simpler. That might be, but I think that it just doesn’t work. Yet another example of a vocabulary that should be as simple as possible but not simpler!

The other thing I don’t like is that they have used the same namespace, which means that all the existing RDDL documents implicitly point to the description of a vocabulary which is very different from their content.


Syncato

Why Syncato?

Because Syncato isn’t only a weblog system, but also a real XML publication system, or if you prefer, an XML database used as a weblog system. And because this XML database can be queried with XPath, which lets us take full advantage of its XML nature. And finally because all that has been done the right way, following the principles of the web (REST architecture, use of HTTP GET, PUT and DELETE, RSS syndication, …) and using Python, which is my language of choice.

Why this weblog?

To have a good reason to use Syncato :-) … And also because I wanted to have a channel where I could publish stories that are not appropriate for my usual channels (XMLfr, XML.com, xmlhack).
