The Jessica Project

Wednesday, October 18, 2006

RDF and JSR-170 - two possible approaches

As I mentioned, I've been thinking about RDF and JSR-170 for a while so here are some possible directions:

  1. One approach would be to model the JSR-170 data graph in RDF. The repository could not store arbitrary RDF, only RDF that could be constructed using the JSR-170 API, but you could create a view on the repository that was valid RDF. This view could then be searched with SPARQL. There are several use cases for this. One is integrating the repository with other data sources that talk SPARQL. Another is that JSR-170 is a language-specific standard, whereas today the trend is towards web service based standards. Using RDF/SPARQL/REST, it would be possible to make a nice standards-compliant web service interface to JSR-170 that was language agnostic. Producing a prototype of this wouldn't be too difficult: it would just be a matter of doing some integration work between a JSR-170 implementation like Jackrabbit and a SPARQL web server like Joseki.
  2. JSR-170 is attempting to address a particular use case. However, to do this, it has to create many pieces of technology that are JSR-170 specific, for example the query language or the schema language. An alternative approach would be to consider how you would do the same thing using RDF technology. This approach wouldn't be compatible with JSR-170; it would be a replacement. It would be much more flexible than (1), as it would support any RDF metadata. Whereas JSR-170 is a Java API, this approach would specify a web service API.
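
To make approach (1) a bit more concrete, here is a minimal sketch of a read-only RDF view over a repository tree. It deliberately avoids the real javax.jcr types so that it runs on its own: RepoNode, RdfView, the base URI and the jcr# vocabulary are all made up for illustration and are not part of JSR-170, Jackrabbit or Joseki.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for a JSR-170 node: a name, string properties and child nodes. */
final class RepoNode {
    final String name;
    final Map<String, String> properties = new LinkedHashMap<>();
    final List<RepoNode> children = new ArrayList<>();

    RepoNode(String name) {
        this.name = name;
    }

    /** Creates a child node, attaches it and returns it. */
    RepoNode child(String childName) {
        RepoNode c = new RepoNode(childName);
        children.add(c);
        return c;
    }
}

/** Produces an RDF view of the tree; nothing is stored as RDF. */
final class RdfView {
    // Assumed URIs for the view; neither comes from the JSR-170 specification.
    static final String BASE = "http://example.org/repo";
    static final String VOCAB = "http://example.org/jcr#";

    /** Walks the tree depth-first, emitting one N-Triples-style line per property and per child link. */
    static List<String> toTriples(RepoNode root) {
        List<String> out = new ArrayList<>();
        walk(root, BASE + "/", out);
        return out;
    }

    private static void walk(RepoNode node, String uri, List<String> out) {
        // Properties become literal-valued triples (no literal escaping in this sketch).
        for (Map.Entry<String, String> p : node.properties.entrySet()) {
            out.add("<" + uri + "> <" + VOCAB + p.getKey() + "> \"" + p.getValue() + "\" .");
        }
        // Parent/child links become resource-valued triples.
        for (RepoNode c : node.children) {
            String childUri = uri + c.name + "/";
            out.add("<" + uri + "> <" + VOCAB + "hasChild> <" + childUri + "> .");
            walk(c, childUri, out);
        }
    }
}
```

The view only ever contains triples that can be derived from the tree, which is the restriction described in (1). A SPARQL endpoint would then query the generated triples rather than the repository itself.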

Obviously the second option is more work, so (1) sounds like a good place to start. If people get used to SPARQL based end points for their repository, then it won't be too hard to wean them from (1) to (2).

Introduction - Mark Butler

Hi! Stu and Andy have been kind enough to let me hang out with them on the Jessica project.

My name is Mark Butler, and I am currently a senior lecturer in information architecture at the University of the West of England. I used to work at HP Labs Bristol, where I worked on the digital media project, specifically the content repository, which was a web service based replacement for a JSR-170 repository. I also have quite a bit of background in the Semantic Web, as I worked on the SIMILE project and was the lead developer on the first Longwell prototype. I am also lead developer on DELI, an open source API for using UAProf, an RDF based standard for mobile phones.

I've been thinking about the relationship between RDF and JSR-170 for some time so when I found out about Jessica I was keen to join in and contribute.

Friday, September 22, 2006

Intermission

Sorry about that. What with changing jobs and going on holiday and generally having other stuff to do, Jessica's been pushed aside for a while. But she's back now, and we'll push on with the design. So, where were we? Ah, reviews of the current API. I think I should review JSR 170 and its successors first though, to ensure that Jessica can still match up to it.

Tuesday, May 23, 2006

Parental Guidance

Now that Jessica has established how to model an RDF graph as a JCR Workspace, we need to provide a means of navigating back up the graph in terms of the JCR tree. The problem here is that a graph provides multiple inheritance for a node (that is, a node can have more than one parent), whereas the JCR API does not (being a tree, of course). However, recent versions of the JSR 170 specification do leave room for graph support, in that they do not specifically prohibit multiple inheritance. This makes sense, since the JSR must satisfy many different vendor-specific repository APIs, and it can't be ruled out that some of those might require support for a node (or equivalent) that has more than one parent.

So, if the API doesn't allow for immediate access to multiple parents of a JCR Node, how should Jessica go about exposing them? We can already expose the object nodes of a subject node using the Node.getNodes() method, where each JCR child Node in the returned NodeIterator represents an object node. The predicates for these object nodes are the names of the object nodes themselves.

But the JCR API only provides Item.getParent() (inherited by the Node interface) to return a single parent Node. At first glance, Jessica could return a parent Node instance from the getParent() method that refers to the node's other parents (the parent's "siblings") as Property references. So:


Node parent = node.getParent(); // Returns "primary" parent.
for (PropertyIterator pi = parent.getReferences(); pi.hasNext();) {
    Property p = pi.nextProperty();
    Node sibling = p.getNode(); // Returns siblings.
}


This way, the siblings of any identified parent are exposed through the JCR's REFERENCE PropertyType.
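
For comparison, here is a rough sketch of what a more direct answer might look like if a node could simply report all of its parents. GraphNode and getAllParents() are hypothetical and not part of the JCR API, and the sketch assumes the extra parents are recorded on the child itself rather than as references on the primary parent.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical graph node with one "primary" parent and any number of further parents. */
final class GraphNode {
    final String name;
    private GraphNode primaryParent; // what a JCR-style getParent() would return
    private final Set<GraphNode> otherParents = new LinkedHashSet<>();

    GraphNode(String name) {
        this.name = name;
    }

    /** Links this node under a parent; the first parent linked becomes the primary one. */
    void addParent(GraphNode parent) {
        if (primaryParent == null) {
            primaryParent = parent;
        } else if (parent != primaryParent) {
            otherParents.add(parent); // the set ignores repeated links
        }
    }

    /** Tree-style access: only the primary parent is visible. */
    GraphNode getParent() {
        return primaryParent;
    }

    /** Graph-style access: the primary parent first, then the others in the order they were linked. */
    List<GraphNode> getAllParents() {
        List<GraphNode> all = new ArrayList<>();
        if (primaryParent != null) {
            all.add(primaryParent);
        }
        all.addAll(otherParents);
        return all;
    }
}
```

Code written against the tree view keeps working through getParent(), while graph-aware code can ask for the full list. Which parent counts as "primary" is an arbitrary choice here (the first one linked).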

Although this is a solution, is it good enough for Jessica? It should probably be implemented anyway, but I feel that there is a more elegant answer to this problem, until perhaps JSR 170 provides a more public means of doing the same.

Consider this an RFC...

Monday, May 22, 2006

A few points...

Right, now that Jessica is pretty much set up on SourceForge, it's time to put together some design thoughts.

Jessica, like the JCR specification examples, will be exposed to a system using JNDI. This will provide an abstraction away from concrete implementation references, so that when we get to repository clustering, etc. it won't make a difference.

A JCR Workspace is equivalent to an RDF graph. The namespaces used in the graph would be registered in the normal way using the JCR NamespaceRegistry.

A JCR Node will be used to represent a subject or object RDF node. The predicate between those nodes will be specified using the object node's name. The value of a subject or object node would be set using a JCR Property that belongs to that Node, whose name is jessica:node.

Consider the following example (taken from W3C):




Here, the Dublin Core predicate dc:title maps to an RDF object node. Jessica would define the URI of the predicate using the JCR Node's name, and the RDF literal of the object node would be stored as a JCR Property of that JCR Node, named jessica:node.





The code fragment below shows how Jessica's implementation of the JCR API would handle the above RDF graph.


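import javax.jcr.NamespaceRegistry;
import javax.jcr.Node;
import javax.jcr.Repository;
import javax.jcr.Session;
import javax.jcr.Workspace;

// Each JCR Workspace corresponds to one RDF graph.
Repository r = new JessicaRepository();
Session s = r.login();
Workspace w = s.getWorkspace();

// Register the namespaces used in the graph.
NamespaceRegistry nr = w.getNamespaceRegistry();
nr.registerNamespace("ex", "http://www.example.org/stuff/1.0/");
nr.registerNamespace("dc", "http://purl.org/dc/elements/1.1/");

// The subject node; its value is the URI of the resource being described.
Node root = s.getRootNode();
Node article = root.addNode("ex:demoArticle");
article.setProperty(JessicaNode.VALUE,
        "http://www.w3.org/TR/rdf-syntax-grammar");

// dc:title: the node name is the predicate, the value is the RDF literal.
Node title = article.addNode("dc:title");
title.setProperty(JessicaNode.VALUE,
        "RDF/XML Syntax Specification (Revised)");

// ex:editor has no value of its own; it carries two further predicates.
Node editor = article.addNode("ex:editor");
Node fullName = editor.addNode("ex:fullName");
fullName.setProperty(JessicaNode.VALUE, "Dave Beckett");
Node homePage = editor.addNode("ex:homePage");
homePage.setProperty(JessicaNode.VALUE, "http://purl.org/net/dajobe");

// Traverse the tree by predicate name to reach the editor node.
Node subject = (Node) s.getItem("/ex:demoArticle/ex:editor");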


This way, the JCR tree traversal is achieved using the predicates' names, which makes logical sense. The Node assigned to the variable subject shows how this is done: its path is simply the sequence of names leading to the editor node.
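
Since each step in such a path is a prefixed name, a path can be turned back into full URIs using the registered namespaces. The following is only a sketch of that idea: PredicatePath and its methods are hypothetical helpers, not part of Jessica or the JCR API, and the small map inside it stands in for the JCR NamespaceRegistry.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper that expands the prefixed names in a path into full URIs. */
final class PredicatePath {
    // Stand-in for the JCR NamespaceRegistry: prefix -> namespace URI.
    private final Map<String, String> namespaces = new HashMap<>();

    void registerNamespace(String prefix, String uri) {
        namespaces.put(prefix, uri);
    }

    /** Expands each "prefix:local" step of an absolute path, in order. */
    List<String> expand(String path) {
        List<String> uris = new ArrayList<>();
        for (String step : path.split("/")) {
            if (step.isEmpty()) {
                continue; // the leading slash produces an empty first step
            }
            int colon = step.indexOf(':');
            if (colon < 0) {
                throw new IllegalArgumentException("No prefix in step: " + step);
            }
            String uri = namespaces.get(step.substring(0, colon));
            if (uri == null) {
                throw new IllegalArgumentException("Unregistered prefix in step: " + step);
            }
            uris.add(uri + step.substring(colon + 1));
        }
        return uris;
    }
}
```

With the ex prefix registered as in the fragment above, expanding "/ex:demoArticle/ex:editor" gives the URI for the top-level node name followed by the URI of the ex:editor predicate.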

A starter-for-ten

I've finally got CVS set up for Jessica now, so I've integrated Eclipse with the SourceForge server, ready to put some code there.

When I say "I", I mean "we", as my good friend Mr Andrew Latham has kindly joined to get some work done.

Wednesday, May 17, 2006

Late-night wine and cigarettes

I need to sort out a roadmap. As soon as it's done, I'll post it here. For starters, though:

1. Design the store-agnostic implementation of the JCR API, to fulfil its SPI layer.
2. Design and implement an SPI layer for Jena.

I'll arrange something that will support a Maven repository for Jessica too. All in good time though...

The story so far...

Now that I have a moment, I thought it proper to provide a bit of background about the Jessica Project.

I've always liked the intent of the Java Content Repository, as I work in an industry that depends on systems whose core lies with CMSes and repositories. The number of vendors that provide variations on this theme has become silly, and each claims to do even more marvellous but irrelevant stuff with media.

One of the other things we depend on is metadata. Strictly speaking, we use any old metadata, as different people have different (and odd) ways of specifying it. But to standardise things, we translate it into RDF (modelled as RDF/XML).

What I want to do is combine these two, currently disparate, things. I want a content repository that adheres to a recognised Java API, and I want properly defined RDF metadata to describe it all. The provision of RDF lends itself to things like OWL and SPARQL to provide proper (ontological) meaning, and robust access to my content.

Now, many people at this point argue things like:

* people should just use an RDF store if they want RDF;
* RDF just isn't right for JSR 170 (step forward, Mr Mazzocchi).

Well, these do have some validity to them. But remember that there's still a need for a standardised repository API that has RDF support. There are plenty of RDF stores out there, and each has its own API, so the standardising thing has gone for a burton then. Stefano Mazzocchi makes it clear that RDF, when modelled as RDF/XML, provides fundamental differences in the meaning of the metadata when it's compared to its non-RDF XML equivalent. He's absolutely right.

Also, RDF graphs don't quite fit into the vision of the JCR API, which uses a tree structure (i.e. multiple parents are a bit of a bugger).

But, after some thinking, I've found that all of these problems can be solved, without any real interference with the existing API specification for JSR 170. So that's what Jessica is intended to provide: an RDF-based JCR. Its Query API will be extended to support SPARQL, and it'll integrate with RDF stores like Jena and Sesame, in the same way as the JCR API is meant to integrate with vendor-specific solutions.

So that's it. It's an experiment at the moment, but I'll get there eventually. I do get the impression that it'll be an uphill struggle* against the ivory tower RDF fanatics, and from the "classic" JCR camp, but we'll see how it goes.

I really must get round to getting some code out.
______________________________________________________
* By "uphill struggle", I mean "quiet, well-mannered debate", naturally :)

Jessica has been approved

Jessica was approved by SourceForge today, so I've got the go-ahead to start some design! I have the feeling that Jessica is going to be both my pride-and-joy and a royal pain in my arse, but as my geography teacher used to say, "that's the way the cookie crumbles" in his bizarre, Hungarian accent.

So now I have to create the homepage and actually do some work...