RDF Query Languages

I am thinking about writing an RSS aggregator. Not that the idea is original or has never been done before; it's just an exercise to play with an RDF database, and so far I am still not convinced of the way to go!

As a query language, I really like Versa, but RDQL seems to have much more traction, so I have given it a closer look.

What I don’t like about it is that it seems to lead the RDF data model to a dead end: once you’ve run a query, you no longer see triples, only tables of resources.

To take an example from the RDQL tutorial:

SELECT ?resource, ?givenName
WHERE (?resource, <http://www.w3.org/2001/vcard-rdf/3.0#N>, ?z) ,
      (?z, <http://www.w3.org/2001/vcard-rdf/3.0#Given>, ?givenName)

Returns:

resource                         | givenName
============================================
<http://somewhere/JohnSmith/>    | "John"
<http://somewhere/RebeccaSmith/> | "Rebecca"
<http://somewhere/SarahJones/>   | "Sarah"
<http://somewhere/MattJones/>    | "Matthew"
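
To make the contrast concrete, here is a minimal Python sketch of how an RDQL-style engine can evaluate the query above. It is not RDQL itself and not how any particular engine is implemented. Each triple pattern is matched against the graph, and variable bindings are accumulated row by row. The sample graph, the blank-node labels `_:n1`/`_:n2` and the function names are all hypothetical.

```python
# Minimal sketch of RDQL-style pattern matching (illustrative, not a real
# engine). Terms starting with "?" are variables; everything else must match
# exactly. GRAPH is hypothetical sample data; "_:n1"/"_:n2" stand for blank nodes.
VCARD = "http://www.w3.org/2001/vcard-rdf/3.0#"

GRAPH = [
    ("http://somewhere/JohnSmith/", VCARD + "N", "_:n1"),
    ("_:n1", VCARD + "Given", "John"),
    ("_:n1", VCARD + "Family", "Smith"),
    ("http://somewhere/RebeccaSmith/", VCARD + "N", "_:n2"),
    ("_:n2", VCARD + "Given", "Rebecca"),
    ("_:n2", VCARD + "Family", "Smith"),
]


def is_var(term):
    return term.startswith("?")


def match(pattern, triple, bindings):
    """Return bindings extended so that pattern matches triple, or None."""
    extended = dict(bindings)
    for p, t in zip(pattern, triple):
        if is_var(p):
            # A variable bound earlier must take the same value here.
            if extended.setdefault(p, t) != t:
                return None
        elif p != t:
            return None
    return extended


def query(graph, patterns, select):
    """Join the patterns left to right, then project onto the selected variables."""
    rows = [{}]
    for pattern in patterns:
        next_rows = []
        for bindings in rows:
            for triple in graph:
                extended = match(pattern, triple, bindings)
                if extended is not None:
                    next_rows.append(extended)
        rows = next_rows
    return [{var: row[var] for var in select} for row in rows]


PATTERNS = [("?resource", VCARD + "N", "?z"),
            ("?z", VCARD + "Given", "?givenName")]

for row in query(GRAPH, PATTERNS, ["?resource", "?givenName"]):
    print(row["?resource"], "|", row["?givenName"])
```

The final projection is the step where the triples disappear. The `?z` binding, which is the link between the two patterns, is dropped, and only a table remains.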

Where are the triples gone?

What gives query languages in other domains, such as SQL or XPath, their strength is that their data model is the same on input as on output. The result of a SQL SELECT statement is basically a table: I can run subqueries on it or insert it into another table. The result of an XPath query is (or can be) a node-set: I can output it as XML or run a new XPath query on it.

By contrast, the result of an RDQL query seems to be a bunch of resources that I can’t really use as triples.

Why is that a problem?

Let’s say I want to create an RSS channel with the RSS items that meet a condition. With XPath I can just write “//rss:item[my condition]” and get a node-set with the complete definition of these items. With RDQL, I can write a query that gives me these items as resources, but I haven’t seen how I could get the items with their descriptions as triples, ready to be serialized back as RSS.
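
A minimal Python sketch of the behaviour described here follows, under simplifying assumptions. The "query" takes a graph (a set of triples) and returns a graph, so its result can be queried again or serialized, much as an XPath node-set can. The function names `select_items` and `title_contains` and the sample data are hypothetical. Only triples whose subject is the item itself are collected, so nested descriptions are not followed.

```python
# Hypothetical sketch: a query that maps a graph (a set of triples) to a
# graph, so its result can be queried again or serialized, the way an XPath
# node-set can. Function names and sample data are illustrative only.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RSS = "http://purl.org/rss/1.0/"


def select_items(graph, condition):
    """Return all triples whose subject is an rss:item passing condition.

    condition receives the set of triples describing one item. Only triples
    whose subject is the item itself are copied; nested resources are not
    followed.
    """
    items = {s for (s, p, o) in graph if p == RDF_TYPE and o == RSS + "item"}
    subgraph = set()
    for item in items:
        description = {t for t in graph if t[0] == item}
        if condition(description):
            subgraph |= description
    return subgraph


def title_contains(word):
    """Build a condition that tests the item's rss:title for a substring."""
    return lambda triples: any(
        p == RSS + "title" and word in o for (_, p, o) in triples)


graph = {
    ("http://example.org/a", RDF_TYPE, RSS + "item"),
    ("http://example.org/a", RSS + "title", "RDF query languages"),
    ("http://example.org/a", RSS + "link", "http://example.org/a"),
    ("http://example.org/b", RDF_TYPE, RSS + "item"),
    ("http://example.org/b", RSS + "title", "XSLT tips"),
}

rdf_items = select_items(graph, title_contains("RDF"))
# The result is itself a graph, so the same kind of query applies to it.
query_items = select_items(rdf_items, title_contains("query"))
```

Because input and output have the same type, queries compose the way XPath expressions do. A real implementation would also need to decide how far to follow blank nodes and nested resources when deciding what counts as an item's "description".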

What I’d really like is a query language that would let me do with RDF graphs what I do with XPath!


8 thoughts on “RDF Query Languages”

  1. Hi Eric,

    you can get triples as a result from a query if you do something like this:

    ———————————————————————
    SELECT ?s, ?p, ?o
    FROM <source.rdf>
    WHERE (?person, vcard:N, ?b),
    (?b, vcard:Given, ?name),
    (?s, ?p, ?o)
    AND ((?s eq ?person) && (?p eq vcard:N) && (?o eq ?b)) ||
    ((?s eq ?b) && (?p eq vcard:Given) && (?o eq ?name))
    USING vcard FOR <http://www.w3.org/2001/vcard-rdf/3.0#>
    ———————————————————————

    The triple pattern (?s, ?p, ?o) in the WHERE clause ensures that you
    retrieve triples from the original RDF model.

    The variables must be bound correctly to the corresponding triples
    (?person, N, ?b) and (?b, Given, ?name). This is done in the (quite
    complex) AND clause.

    As a result you get the triples {(person, N, b), (b, Given, name)} that describe the given names of the persons in the RDF model.

    ……..

    Alternatively, suppose you want to flatten a person's name structure from
    (person, N, b), (b, Given, name) to (person, givenName, name), so that the result is triples of the form (person, givenName, name). Then you have to do something tricky like this:

    Insert a triple (givenName, someproperty, uniqueid) into your RDF model and execute the following query:

    SELECT ?person, ?gn, ?givenName
    FROM <source.rdf>
    WHERE (?person, vcard:N, ?b),
    (?b, vcard:Given, ?givenName),
    (?gn, someproperty, uniqueid)
    USING vcard FOR …

    The variable ?gn is bound to the new property givenName, so each result row
    (?person, ?gn, ?givenName) is a new triple containing the flattened name information.

  2. Hi Eric,
    I’ve (re)done some RSS (/OCS) aggregator code using Jena (+XSLT front end for non-RDF RSS), and had planned to use RDQL as the primary query mechanism (eventually hidden behind a UI). I can’t remember the details, but it was possible to get the results into a usable triple form. I’ve not had much chance to play recently, but when I last looked I got a little stuck on date-based queries.

    RSS 1.0 has dates in elements as W3CDTF strings, but I couldn’t find a neat way of doing “select items where date > last week” as comparators in (Jena) RDQL only appear to work on numbers (I’ve not looked into using xsd datatypes in comparisons yet, but I suspect that won’t be possible either).

    For now I’ve got a workaround: when an item is loaded into the RDF model, an additional triple is added with the date as the number of milliseconds since 1970-01-01. Not neat.

  3. Hi Dan,

    You’re right that it’s an API issue, but the query implementations I have used (starting with Guha’s rdfDB) focus on getting the table of variable bindings rather than the subgraph.

    Thinking about it, that’s probably what Dave has in mind when he says that his new Redland query API will return triples as well as tables of resources.

    Anyway, I am not sure that it’s exactly what I need here: rather than just getting the result back as triples, I’d like to be able to keep them in the database as a subgraph on which I could run other queries and which could finally be serialized as RDF/XML.

    Ideally, the subgraph should be manipulated at the query level rather than at the API level, since one may need to join the subgraph to the outer graph. That could be done the way you can “SELECT INTO” in SQL…

    Thanks for your answer, which has clarified my understanding of both RDQL and my own needs.

  4. Versa is well worth studying. We are in early days for RDF query, and can do with any/all experience w.r.t. the design tradeoffs involved (www-rdf-rules@w3.org would be grateful for any such findings, I’m sure).

    The RDQL-like approach is popular in part because it is conceptually very simple: a query is a graph with bits labelled as missing, and the resultset is bindings of that graph to your data.

    I think there is an answer to your ‘where are the triples’ query: the response to a query can either be conceptualised as a matched subgraph, or as a more SQL-like tabular resultset. You can transform the latter to the former by plugging values back into the original query clauses.

    Guha was quite clear about this back in http://www.w3.org/TandS/QL/QL98/pp/enabling.html and I believe RDQL systems are in the same tradition. The proposal is that we think of RDF queries as “simply an RDF model (i.e., a directed labelled graph), some of whose resources and properties may represent variables.”

    Following from that, the idea is that there are two outputs to every query, i.e.:

    1. A subgraph (of the KB against which the query is issued) which matches the query.
    2. A table of sets of legal bindings for the variables, i.e., when these bindings are applied to the variables in the query, we get (1).

    Since you can get to the former from the latter, you can always find your way back to the ‘triples’ view. I have no idea offhand whether RDQL implementations offer convenience API calls to support this transformation, but it should be pretty easy. For each row, make some RDF statements by plugging in your result values to the template-ized RDF statements in the ‘WHERE’ clause of the query. This story is probably complicated a bit by datatype-based constraints etc., but not much.
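
The bindings-to-triples transformation described in the last comment can be sketched in a few lines of Python. For each result row, the bound values are substituted into the triple patterns of the WHERE clause. This is an illustration, not an API of any RDQL implementation. The names `WHERE`, `ROWS`, `substitute` and `bindings_to_triples`, the sample rows and the blank-node labels are hypothetical.

```python
# Minimal sketch of the bindings -> subgraph transformation: for each result
# row, substitute the bound values into the query's triple patterns.
# Sample rows and blank-node labels ("_:n1", "_:n2") are hypothetical.
VCARD = "http://www.w3.org/2001/vcard-rdf/3.0#"

WHERE = [("?resource", VCARD + "N", "?z"),
         ("?z", VCARD + "Given", "?givenName")]

ROWS = [
    {"?resource": "http://somewhere/JohnSmith/", "?z": "_:n1",
     "?givenName": "John"},
    {"?resource": "http://somewhere/RebeccaSmith/", "?z": "_:n2",
     "?givenName": "Rebecca"},
]


def substitute(term, row):
    # Variables (terms starting with "?") are replaced by their binding;
    # constant URIs and literals are kept unchanged.
    return row[term] if term.startswith("?") else term


def bindings_to_triples(patterns, rows):
    """Instantiate every pattern with every row; the union is the matched subgraph."""
    graph = set()
    for row in rows:
        for pattern in patterns:
            graph.add(tuple(substitute(term, row) for term in pattern))
    return graph


for triple in sorted(bindings_to_triples(WHERE, ROWS)):
    print(triple)
```

For this to work, each row must carry a binding for every variable used in the WHERE clause, including `?z`. With only the projected `?resource` and `?givenName` columns, `substitute` would raise a `KeyError`.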
