This is a proposal for a new direction for Versa 2.0. I call it Boulder Vice because I need a name, and that will do.

This overview assumes familiarity with Versa 1 (though not necessarily with any Versa 2 work to date).

Introduction to Boulder Vice Versa

So far, Versa 2.0 efforts have focused on largely incremental improvements to Versa 1.0, filling capability gaps and the like. I (Uche Ogbuji) must admit that one thing that has slowed me down on Versa 2.0, though I haven't always clearly understood it, is that I think Versa needs more than that. I think it needs a fresh rethink, from its reason for being all the way through language design. This proposal follows a lot of thought on the matter. Feedback is welcome.

Boulder Vice would reposition Versa as more of a general-purpose Web data query language. It would certainly be usable with the RDF model, but it would also have a richer, more general domain, and to that end would include its own simple, well-bounded data model. This is not as radical as you might think. Versa 1.0 had its own data model, based on a mix of XPath 1.0 and RDF-1999 conventions. Versa 2.0 was tending in the same direction with an update of the RDF data model to the later RECs. (Some of the complexity and artifice that comes with these later RECs bogged down Versa 2.0 discussion for a while.) XPath originated its own, very successful data model apart from the more XML-core Infoset. Even SPARQL elaborates on the RDF model (with named subgraphs, for example).

BV Versa is a graph traversal language that serves for general-purpose Web query. It's designed to query metadata and RESTful relationships between resources. It comprises a simple data model and a query language.

Motivating domain and data model

Here are some of the motivating factors for Boulder Vice Versa.

So much of what's happening with regard to Web data, whether we like it or not, is not happening directly in RDF. A good example is social networking relationships. Sure, the LOD community is trying hard to keep up with all that work, but I'm not convinced they'll always be able to do so. It would be nice to generalize the idea of Web data query to address this more directly.

Some of the stuff happening in more general Web data does not fit the RDF model well as is, anyway. N-ary relationships and ordered information are great examples. RDF is frankly pretty cumbersome in the way it captures such constructs, and it would be nice not to have to wrangle with RDF-isms while trying to get to the essence of the data. It would be nice for Versa to be able to handle such constructs more directly.

SPARQL, like it or not, has official standing as the RDF query language, and it would be productive to reduce the head-to-head positioning.

The Boulder Vice Versa 2.0 data model is based on RDF, but designed for more expressiveness.

Domain of operation

Boulder Vice Versa operates on the information space represented by the World-Wide Web. It operates on any implementation or architectural system compatible with such an information space, including:

Boulder Vice Versa operates on its own model approximating the practical nature of such information spaces.

Synopsis of data model

Boulder Vice Versa supports a few primitive data types:

Lists are virtual data structures, intended to be lazy, i.e. accessed through iterators (similar to the Python iterator protocol or RDBMS rows with cursor support). Tuples, however, are data structures consumed as a whole.
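The distinction between lazy lists and whole tuples can be sketched in Python, which the text already uses as its point of comparison. This is only an illustration under assumptions: the function name lazy_matches and the sample statements are hypothetical, not part of the proposal.

```python
from itertools import islice

# Hypothetical sample data: plain (subject, rel, value) statements.
STATEMENTS = [
    ("doc1", "atom:category", "tech"),
    ("doc1", "atom:title", "Hello"),
    ("doc2", "atom:category", "music"),
]

def lazy_matches(statements, rel):
    """Yield matching values one at a time; no work happens until a consumer asks."""
    for _subject, stmt_rel, value in statements:
        if stmt_rel == rel:
            yield value

# A list in the Versa sense: an iterator, not a materialized sequence.
results = lazy_matches(STATEMENTS, "atom:category")
first = list(islice(results, 1))  # pulls only the first match
rest = list(results)              # the iterator resumes where it left off

# A tuple, by contrast, is consumed as a whole.
whole = ("doc1", "atom:category", "tech")
subject, rel, value = whole
```
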

The working information space comprises a set of graphs, each of which is represented as a set of n-tuples (not merely triples, as in RDF). N-tuples accommodate RDF as well as the various RDF subgraph/quad proposals. For example, a set of triples can be ordered by adding a tuple slot for the ordinal. Note: a single tuple in Versa is just a special-case subgraph of length 1.

A tuple is of the form: (subject, rel, value, [attrib1, ..., attribN])

There are three special, implicit attributes in Versa: subject, rel, and value.
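As a rough illustration of the tuple form and of ordering by an ordinal slot, here is a minimal Python sketch. It assumes a tuple can be modeled as a mapping from attribute names to values. The helper names, the attribute name ex:ordinal, and the sample data are all hypothetical.

```python
def make_tuple(subject, rel, value, attribs=None):
    """Model a tuple as a dict: three implicit attributes plus optional extras."""
    t = {"subject": subject, "rel": rel, "value": value}
    t.update(attribs or {})
    return t

def is_plain_triple(t):
    """A tuple carrying only the implicit attributes is an ordinary triple."""
    return set(t) == {"subject", "rel", "value"}

# Hypothetical data: three statements ordered through an ordinal slot.
authors = [
    make_tuple("doc1", "dc:creator", "Carol", {"ex:ordinal": 2}),
    make_tuple("doc1", "dc:creator", "Alice", {"ex:ordinal": 1}),
    make_tuple("doc1", "dc:creator", "Bob", {"ex:ordinal": 3}),
]
ordered = [t["value"] for t in sorted(authors, key=lambda t: t["ex:ordinal"])]

triple = make_tuple("doc1", "dc:title", "Hello")
```
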

Traversal

The heart of BVV is much as in Versa 1: the traversal operator. For example, to traverse from all resources along atom:category to the value "mytopic":

all() - atom:category -> "mytopic"

The biggest change is a new syntax to control what part of a tuple is followed in a traversal. This is called a pivot operator.

Let's take the following subgraph:

{
  (subject=<http://purl.org/person/uche>, rel=foaf:name, value="Uche"),
  (subject=<http://purl.org/person/uche>, rel=m:mass, value=95, u:unit=<http://purl.org/unit/kg>),
  (subject=<http://purl.org/unit/kg>, rel=m:scheme, value=<http://purl.org/standards/si>),
}

For the most part this is the usual set of subject, predicate, object triples (called subject, rel and value, respectively, in BVV parlance), but the second tuple is an n-ary relation. You can access Uche's scalar weight value in the usual way:

<http://purl.org/person/uche> - m:mass -> *

This is because the default is to follow the value item. You can access the units in which Uche's weight is expressed by pivoting to the unit tuple item:

<http://purl.org/person/uche> - m:mass[u:unit] -> *
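One way to picture the pivot is as a parameter naming which tuple item a traversal returns, with value as the default. The following Python sketch runs over the subgraph above. The function traverse is hypothetical, and skipping tuples that lack the pivot item is an assumption, since the proposal does not say what happens in that case.

```python
# The example subgraph, with each tuple modeled as a dict (an assumption).
GRAPH = [
    {"subject": "http://purl.org/person/uche", "rel": "foaf:name", "value": "Uche"},
    {"subject": "http://purl.org/person/uche", "rel": "m:mass", "value": 95,
     "u:unit": "http://purl.org/unit/kg"},
    {"subject": "http://purl.org/unit/kg", "rel": "m:scheme",
     "value": "http://purl.org/standards/si"},
]

def traverse(graph, subject, rel, pivot="value"):
    """Yield the pivot item of every tuple matching subject and rel.

    Tuples without the pivot item are skipped (assumed behavior).
    """
    for t in graph:
        if t["subject"] == subject and t["rel"] == rel and pivot in t:
            yield t[pivot]

UCHE = "http://purl.org/person/uche"
mass = list(traverse(GRAPH, UCHE, "m:mass"))                  # default pivot: value
unit = list(traverse(GRAPH, UCHE, "m:mass", pivot="u:unit"))  # pivot to the unit
```
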

Another example:

<http://example.com/doc> - atom:category -> contains("tech")

queries the main value

<http://example.com/doc> - atom:category[atom:scheme] -> $dublincore

queries the scheme parameter

For RDF named subgraph approaches such as 4Suite's scopes or SPARQL's named graphs, you could query to select by scope as follows:

<http://example.com/doc> - atom:category[versa:scope = $mysubgraph] -> *

Here is an example of a tuple matched by the above query:

(<http://example.com/doc>, atom:category, "high tech", [versa:scope = $mysubgraph])

Implementation notes

Versa 2 will not specify implementation, but it's important to consider how it might be implemented as validation of the approach.

A BVV tuple is of the same essential character as a Python dict. The following tuple:

(<http://purl.org/person/uche>, m:mass, 95, u:unit=<http://purl.org/unit/kg>, db:context=<http://example.org/subgraph1>)

Is equivalent to the following dict:

{
  subject: uri("http://purl.org/person/uche"),
  rel: uri("m:mass"),
  value: 95,
  uri("u:unit"): uri("http://purl.org/unit/kg"),
  uri("db:context"): uri("http://example.org/subgraph1")
}

Where subject, rel and value are hashable, singleton instances representing the built-in components of a tuple, and uri is a class representing a URI.
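A runnable version of that dict might look like the following. It is a sketch, not a prescribed implementation: the Component class and the frozen dataclass used for uri are assumptions about how the singletons and the URI class could be written.

```python
from dataclasses import dataclass

class Component:
    """Marker for a built-in tuple slot; hashable by object identity."""
    def __init__(self, name):
        self.name = name

    def __repr__(self):
        return self.name

# One instance per built-in slot, created once and shared.
subject = Component("subject")
rel = Component("rel")
value = Component("value")

@dataclass(frozen=True)
class uri:
    """Minimal URI wrapper; frozen dataclasses are hashable and compare by value."""
    ref: str

stmt = {
    subject: uri("http://purl.org/person/uche"),
    rel: uri("m:mass"),
    value: 95,
    uri("u:unit"): uri("http://purl.org/unit/kg"),
    uri("db:context"): uri("http://example.org/subgraph1"),
}

# The extra attributes are exactly the entries keyed by a uri.
extras = {k: v for k, v in stmt.items() if isinstance(k, uri)}
```

Keeping the built-in slots as distinct objects rather than strings means an attribute that happens to be named "subject" cannot collide with the implicit one.
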

From an RDBMS point of view the following, not-very-normalized approach works:

CREATE TABLE statement (id INTEGER PRIMARY KEY, subject TEXT, rel TEXT, value TEXT);
CREATE TABLE attribute (stmtref INTEGER REFERENCES statement(id), attkey TEXT, attvalue TEXT);
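To check that this two-table layout can answer a pivot query, here is a small sketch using Python's built-in sqlite3 module. The column types are SQLite-flavored assumptions, and the join is only one plausible translation of the m:mass[u:unit] traversal shown earlier.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE statement (id INTEGER PRIMARY KEY, subject TEXT, rel TEXT, value TEXT);
CREATE TABLE attribute (stmtref INTEGER REFERENCES statement(id), attkey TEXT, attvalue TEXT);
""")

# Store the n-ary mass tuple: one statement row plus one attribute row.
cur = conn.execute(
    "INSERT INTO statement (subject, rel, value) VALUES (?, ?, ?)",
    ("http://purl.org/person/uche", "m:mass", "95"),
)
stmt_id = cur.lastrowid
conn.execute(
    "INSERT INTO attribute (stmtref, attkey, attvalue) VALUES (?, ?, ?)",
    (stmt_id, "u:unit", "http://purl.org/unit/kg"),
)

# Rough analogue of: <http://purl.org/person/uche> - m:mass[u:unit] -> *
rows = conn.execute(
    """SELECT a.attvalue
       FROM statement s JOIN attribute a ON a.stmtref = s.id
       WHERE s.subject = ? AND s.rel = ? AND a.attkey = ?""",
    ("http://purl.org/person/uche", "m:mass", "u:unit"),
).fetchall()
```
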

Use cases

A Versa 1-style basic traversal expression:

all() - atom:category -> "mytopic"

However, if you think of an atom category, it's an n-ary relationship:

<entry> - subject ---->  "mytopic"
                   |
                   |
                   +- scheme ->  http://www.dmoz.org/

Meaning that the category comes from the Open Directory. You can approximate this in RDF with the usual blank node technique:

<entry> - a:category -> [bnode] --- rdf:value ->  "mytopic"
                                 |
                                 |
                                 +- a:scheme -->  http://www.dmoz.org/

In Versa 1 you could say "give me all schemes in use" as follows:

(all() - a:category -> *) - a:scheme -> *
#result: list(@"http://www.dmoz.org/")

Which is not syntactically horrible, but it does introduce that magic intermediate object which does not really exist in the underlying model. In BVV you can do:

all() - a:category[a:scheme] -> *

You can also use a pivot as a predicate. "Give me all categories in the DMOZ scheme".

all() - a:category[eq($a:scheme, <http://www.dmoz.org/>)] -> *

You can see how within a traversal each of the tuple item domains is available as a special, temporary variable.
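The predicate form can be pictured as a filter that sees the whole tuple, so that every tuple item is in scope much like the temporary variables just described. A minimal Python sketch follows. The function select_where and the second and third sample tuples are hypothetical.

```python
DMOZ = "http://www.dmoz.org/"

# Hypothetical sample graph; each tuple is modeled as a dict (an assumption).
GRAPH = [
    {"subject": "entry1", "rel": "a:category", "value": "mytopic", "a:scheme": DMOZ},
    {"subject": "entry2", "rel": "a:category", "value": "othertopic",
     "a:scheme": "http://example.org/scheme"},
    {"subject": "entry3", "rel": "a:category", "value": "untagged"},
]

def select_where(graph, rel, predicate):
    """Yield the value of each tuple with the given rel that satisfies predicate.

    The predicate receives the whole tuple, so any item can be tested.
    """
    for t in graph:
        if t["rel"] == rel and predicate(t):
            yield t["value"]

# Rough analogue of: all() - a:category[eq($a:scheme, <http://www.dmoz.org/>)] -> *
in_dmoz = list(select_where(GRAPH, "a:category", lambda t: t.get("a:scheme") == DMOZ))
```
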

Web triggers

A Web trigger is a generalization of the idea that one RESTful request can trigger another request. This covers redirection, regular server code operation that invokes other URIs (e.g. POSTing a comment to a blog might trigger a POST to a spam-checker service to approve the comment), etc.

The following tuples assert the operation of two Web triggers:

(<http://example.com/blog1>, r:triggers, <http://example.com/blogindexes>, [r:if-method="POST", r:target-method="POST"])
(<http://example.com/blog/entry>, r:triggers, <http://example.com/spam-checker>, [r:if-method="POST", r:target-method="POST"])

"Does a POST to blog 1 update the blog index?"

<http://example.com/blog1> - r:triggers[$r:if-method="POST"] -> eq(<http://example.com/blogindexes>)

"Does a POST to blog 1 trigger any POST request?"

<http://example.com/blog1> - r:triggers[$r:if-method="POST"][$r:target-method] -> eq("POST")

etc.

The nice thing is that this syntax can be used easily to address other metadata of classic triples, such as confidence and trust assertions, time/place, general context, etc.

Atom collection query and navigation, e.g. ad-hoc query on planetatom: "Give me all posts by Sun employees on Java".

Social network exploration, e.g. "Who in my high school network works for IBM?"

Notes on comparison to other query languages

re:

<http://example.com/doc> - atom:category[versa:scope = $mysubgraph] -> *

Here is an example of a tuple matched by the above query:

(<http://example.com/doc>, atom:category, "high tech", [versa:scope = $mysubgraph])

The equivalent query in SPARQL would be:

select ?category where {GRAPH $mysubgraph {<http://example.com/doc> atom:category ?category}}

An earlier Versa proposal was:

scoped-subquery(define: <http://example.com/doc> - atom:category -> *, $mysubgraph)

So the Versa 1.0 query:

<http://example.com/doc> - properties() -> *

Would be in Versa 2.0:

<http://example.com/doc> - *[versa:link] -> *

"2008-05-22" <- atom:updated - *

Notes

Versa should make it easy to navigate relationships in patterns and order not originally planned. For example it should make it unnecessary to maintain redundant].

Versa/Boulder_vice (last edited 2008-11-24 18:46:30 by localhost)