Introducing SPO(G)

SPO(G) is not a new syntax, nor a format or a protocol. It is, however, a syntactic profile and a convention.

<sparql xmlns="http://www.w3.org/2005/sparql-results#">
  <head>
    <variable name="g"/>
    <variable name="s"/>
    <variable name="p"/>
    <variable name="o"/>
  </head>
  <results>
    ...
  </results>
</sparql>

I have previously written about exchange of named RDF graphs through the use of quads-in-zips, and while that approach works just fine, it needs to be implemented on both sides of the exchange.

This is also true for SPO(G), but with SPARQL implementations being widespread, the export side is already in place all around the web, and the import side is quite easily implemented — I have sent an implementation of SPARQLXMLResultLoader for ARC to bengee, and while he is also busy working on a streaming serialiser, it seems likely it’ll be a part of a coming release of ARC.

As can be seen from the example, SPO(G) is simply a constrained form of the SPARQL Query Results XML Format: it must have three or four variables, of which s, p, and o are required and g is optional (making it YARS). For all results, all variables must be bound.
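The constraints above are small enough to check in a few lines. What follows is a minimal sketch of an import-side check in Python, not the ARC loader mentioned earlier. The function names `load_spog` and `parse_term`, the `SAMPLE` document and the example.org URIs are all made up for illustration, and literal datatypes and language tags are ignored to keep it short.

```python
import xml.etree.ElementTree as ET

# Namespace of the SPARQL Query Results XML Format, in ElementTree notation.
NS = "{http://www.w3.org/2005/sparql-results#}"

def parse_term(binding):
    """Return (kind, value) for the single child of a <binding> element.

    kind is "uri", "literal" or "bnode"; datatype and xml:lang are dropped
    in this sketch.
    """
    child = binding[0]
    return (child.tag.replace(NS, ""), child.text or "")

def load_spog(xml_text):
    """Parse an SPO(G) document into a list of (s, p, o[, g]) tuples.

    Raises ValueError if the variables are not exactly s, p, o (plus an
    optional g), or if any result leaves a variable unbound.
    """
    root = ET.fromstring(xml_text)
    names = [v.get("name") for v in root.findall(f"{NS}head/{NS}variable")]
    allowed = ({"s", "p", "o"}, {"s", "p", "o", "g"})
    if len(names) != len(set(names)) or set(names) not in allowed:
        raise ValueError("variables must be s, p, o and optionally g")
    order = [n for n in ("s", "p", "o", "g") if n in names]
    rows = []
    for result in root.findall(f"{NS}results/{NS}result"):
        bound = {b.get("name"): parse_term(b)
                 for b in result.findall(f"{NS}binding")}
        if set(bound) != set(names):
            raise ValueError("every variable must be bound in every result")
        rows.append(tuple(bound[n] for n in order))
    return rows

# Hypothetical one-row document, used only to exercise the sketch.
SAMPLE = """<sparql xmlns="http://www.w3.org/2005/sparql-results#">
  <head>
    <variable name="g"/><variable name="s"/>
    <variable name="p"/><variable name="o"/>
  </head>
  <results>
    <result>
      <binding name="g"><uri>http://example.org/g</uri></binding>
      <binding name="s"><uri>http://example.org/s</uri></binding>
      <binding name="p"><uri>http://example.org/p</uri></binding>
      <binding name="o"><literal>hello</literal></binding>
    </result>
  </results>
</sparql>"""

quads = load_spog(SAMPLE)
```

This sketch builds the whole tree in memory; a real importer for large dumps would more likely stream the results instead.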

SPO(G) isn’t as compact as quads-in-zips, but there’s no reason for it not to be compressed during exchange, either through a manual process or via the usual gzip-encoding on-the-fly.
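To illustrate the "manual process" option, here is a small sketch using Python's standard `gzip` module. The helper names `pack_spog` and `unpack_spog` and the generated 1000-row document are assumptions made for the example; on-the-fly gzip encoding would normally be left to the HTTP server and client.

```python
import gzip

def pack_spog(xml_text: str) -> bytes:
    """Compress an SPO(G) document for exchange."""
    return gzip.compress(xml_text.encode("utf-8"))

def unpack_spog(blob: bytes) -> str:
    """Restore the original document text from its gzipped form."""
    return gzip.decompress(blob).decode("utf-8")

# Hypothetical, highly repetitive document with 1000 result rows.
ROW = ('<result><binding name="s"><uri>http://example.org/s{0}</uri></binding>'
       '<binding name="p"><uri>http://example.org/p</uri></binding>'
       '<binding name="o"><literal>value {0}</literal></binding></result>')
DOC = ('<sparql xmlns="http://www.w3.org/2005/sparql-results#"><head>'
       '<variable name="s"/><variable name="p"/><variable name="o"/>'
       '</head><results>'
       + "".join(ROW.format(i) for i in range(1000))
       + '</results></sparql>')

PACKED = pack_spog(DOC)
```

Because every row repeats the same element names and URI prefixes, the verbosity of the XML largely disappears once compressed.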

I should perhaps write it up properly, but I think I’d rather go off and implement it for Redland.

11 thoughts on “Introducing SPO(G)”

  1. I wonder if ordering/grouping of the quads (by graph) should be mandatory, for performance/implementation reasons?

    And maybe also make it possible to use subject/property/object/graph variable names as well?

  2. A fine idea.

    One minor issue / clarification:

    “For all results, all variables must be bound”

    If you drop that for g then you can serialise all datasets. Any reason not to do that?

  3. Damian,

    I may not understand your question correctly, but if g is not bound for all results, then it’s just the SPO-format?

    If you are concerned about the case of some triples not being in a named graph, then I guess it could be supported, but I assume different sources/stores would have different interpretations of that, perhaps considering them to belong to the default graph. I guess that could be useful…

  4. Quantification issue :-) “g is not bound for all results” No, I mean g may not be bound.

    “considering them to belong to the default graph”

    Yep, so any sparql dataset could be serialised, which would be very useful for dumping / backup purposes (although it doesn’t support named but empty graphs).

    SELECT ?s ?p ?o ?g # I've probably messed this up somewhere
    {
    { ?s ?p ?o }
    UNION
    { GRAPH ?g { ?s ?p ?o } }
    }

    This would also allow the equivalent of CONSTRUCTing datasets, which isn’t possible currently.

  5. All right, I think I got it now, all of it. ;-)

    Does a named-but-empty graph really exist if it doesn’t contain any triples? :-)

    In any case, I suppose they could be supported with some sort of extension mechanism, but I’m not sure how much is allowed in the SPARQL Query Results XML Format. New custom elements in the results section seem to be forbidden.

  6. I guess I’d drop the “all variables must be bound” requirement, too. SPARQL XML allows empty bindings, so I’d say consumers should decide on their own how to handle non-complete rows, e.g. ARQ/Jena could accept “g***” as an empty, named graph; other stores could ignore it. ARC would perhaps only process full “gspo” rows as it doesn’t like “unnamed” graphs. Resource-centric implementations could perhaps process “gs**”. It would be possible to support use cases such as “Start with a clean system, but seed it with a list of resource IDs from this ‘*s**’ SPO(G) file”. Or “Import a ‘**p*’ SPO(G) doc as ‘p a rdf:Property’” etc.

    Long story short, SPO(G) is a handy data exchange format, and even non-RDF exporters and/or consumers could use it. I’d keep it as unrestricted as possible.

  7. “Does a named-but-empty graph really exist if it doesn’t contain any triples?”

    It does in sparql :-)
    SELECT ?g { GRAPH ?g {} }
    You might wonder how many stores support it. My guess is: not many ;-)

  8. There is already a sparql results xml format building in rasqal SVN that reads this format. That could be repurposed or altered to provide this as a parser.

    It’s a streaming parser based on the raptor sax2 api so it wouldn’t even have to build a DOM, so it would work for large docs.

  9. @Damian: Exactly :)

    @Dave: That’s great news. I wonder how much effort it would take to glue the parts together for building an importer (I guess it could be a part of Redland and/or available from rdfproc) — the export part I’m assuming we needn’t worry about.
