Whitepaper: The ClioPatria Semantic Web server

What is ClioPatria?

ClioPatria is a (SWI-)Prolog hosted HTTP application-server with libraries for Semantic Web reasoning and a set of JavaScript libraries for presenting results in a browser. Another way to describe ClioPatria is as ``Tomcat+Sesame (or Jena) with additional reasoning libraries in Prolog, completed by JavaScript presentation components''.

Why is ClioPatria based on Prolog?

Prolog is a logic-based language using a simple depth-first resolution strategy (SLD resolution). This gives two readings to the same piece of code: the declarative reading and the procedural reading. The declarative reading facilitates understanding of the code and allows for reasoning about it. The procedural reading allows for specifying algorithms and sequential aspects of the code, something which we often need to describe interaction. In addition, Prolog is reflective: it can reason about Prolog programs and construct them at runtime. Finally, Prolog is, like the RDF triple-model, relational. This match of paradigms avoids the complications involved with using Object Oriented languages for handling RDF (see below). We illustrate the fit between RDF and Prolog by translating an example query from the official SPARQL document:

SPARQL:

PREFIX foaf:   <http://xmlns.com/foaf/0.1/>
SELECT ?name ?mbox
WHERE
  { ?x foaf:name ?name .
    ?x foaf:mbox ?mbox }

Below we define this query as a Prolog predicate. The translation is natural and compact. The query is expressed as a Prolog program rather than a string. This ensures that Prolog can reason about it: it validates the syntax and verifies the dependencies between this code fragment and the remainder of the application. See Wikipedia if you need some introduction to the language.

:- rdf_register_ns(foaf, 'http://xmlns.com/foaf/0.1/').

name_mbox(Name, MBox) :-
        rdf(X, foaf:name, literal(Name)),
        rdf(X, foaf:mbox, MBox).

We can run this query interactively from the terminal as illustrated below. Here, the `;' is typed by the user asking for another solution and the final `.' indicates there are no more solutions.

?- name_mbox(Name, MBox).
Name = 'Johnny Lee Outlaw',
MBox = 'mailto:jlow@example.com' ;

Name = 'Peter Goodguy',
MBox = 'mailto:peter@example.org'.

Returning all solutions is all that is provided by the SPARQL query. However, our program is capable of doing more because it describes the logical relation ``_Name_ is the name of an entity that has mailbox _MBox_''. Therefore, we can ask:

?- name_mbox('Johnny Lee Outlaw', MBox).
MBox = 'mailto:jlow@example.com'.

This is different from a loop over the results from the SPARQL query because the query does not iterate over all name-mailbox tuples, but only over those that have a resource with a name-property with the value 'Johnny Lee Outlaw'. Finally, we can use the relation as a boolean test:

?- name_mbox('Johnny Lee Outlaw', 'mailto:peter@example.org').
false.
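
To try the three calling modes above without an existing dataset, the two example persons can be loaded into the in-memory store first. The listing below is a minimal sketch: the helper load_example/0 and the subject URIs under http://example.org/ are assumptions made for this illustration (the SPARQL document uses blank nodes for the persons).

```prolog
:- use_module(library(semweb/rdf_db)).

:- rdf_register_ns(foaf, 'http://xmlns.com/foaf/0.1/').

% load_example/0 is a hypothetical helper; the subject URIs are made up.
load_example :-
        rdf_assert('http://example.org/johnny', foaf:name,
                   literal('Johnny Lee Outlaw')),
        rdf_assert('http://example.org/johnny', foaf:mbox,
                   'mailto:jlow@example.com'),
        rdf_assert('http://example.org/peter', foaf:name,
                   literal('Peter Goodguy')),
        rdf_assert('http://example.org/peter', foaf:mbox,
                   'mailto:peter@example.org').

% The same relation as defined above.
name_mbox(Name, MBox) :-
        rdf(X, foaf:name, literal(Name)),
        rdf(X, foaf:mbox, MBox).
```

After calling load_example/0 once, name_mbox/2 can be called with both arguments unbound, with only the name given, or with both given.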

Prolog's resolution technique has turned this simple translation of the SPARQL query into a powerful building block for more complex queries. E.g., if we want to send a personalised message to all members of a mailing list, we need the members and their names. The code below combines a simple statement query with the already-defined relation name_mbox/2.

employee_name_email(List, MBox, Name) :-
        rdf(List, list:member, MBox),
        name_mbox(Name, MBox).
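
As a sketch of how this relation could drive the personalised messages, the listing below collects one salutation per list member. The namespace URI bound to the list: prefix, the helper greetings/2 and the wording of the salutation are assumptions for illustration; the text above does not define them.

```prolog
:- use_module(library(semweb/rdf_db)).

:- rdf_register_ns(foaf, 'http://xmlns.com/foaf/0.1/').
% Assumption: the URI behind the list: prefix is hypothetical.
:- rdf_register_ns(list, 'http://example.org/list#').

name_mbox(Name, MBox) :-
        rdf(X, foaf:name, literal(Name)),
        rdf(X, foaf:mbox, MBox).

employee_name_email(List, MBox, Name) :-
        rdf(List, list:member, MBox),
        name_mbox(Name, MBox).

% greetings(+List, -Pairs): hypothetical helper that collects one
% MBox-Text pair for every member of List that has a known name.
greetings(List, Pairs) :-
        findall(MBox-Text,
                ( employee_name_email(List, MBox, Name),
                  format(atom(Text), 'Dear ~w,', [Name])
                ),
                Pairs).
```

Members without a foaf:name are silently skipped, because name_mbox/2 fails for them.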

Optimising queries?

Above, we used simple Prolog SLD resolution to join RDF statement queries. Proper RDF query language implementations perform optimisation. Here we can exploit Prolog's reflective capabilities. The code below dynamically reorders the conjunction of the two rdf/3 goals to achieve optimal performance. This optimisation is based on the database dynamics and on which arguments are given (instantiated). E.g., if we call this relation with a given MBox it will swap the two RDF statements.

:- use_module(library(semweb/rdf_optimise)).

name_mbox(Name, MBox) :-
        rdf_optimise((rdf(X, foaf:name, literal(Name)),
                      rdf(X, foaf:mbox, MBox)),
                     Goal),
        call(Goal).
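
To make the idea of reordering concrete, the sketch below does by hand what an optimiser automates for this one query. It is not the implementation of rdf_optimise/2. It assumes only rdf_estimate_complexity/4 from library(semweb/rdf_db), which reports how many candidate triples the store expects for a pattern, and it introduces the hypothetical name name_mbox_ordered/2.

```prolog
:- use_module(library(semweb/rdf_db)).

:- rdf_register_ns(foaf, 'http://xmlns.com/foaf/0.1/').

% name_mbox_ordered/2 is a hypothetical hand-written variant of
% name_mbox/2.  It asks the store for the estimated number of
% candidates of each pattern, given the current bindings, and runs
% the cheaper pattern first.
name_mbox_ordered(Name, MBox) :-
        rdf_estimate_complexity(_, foaf:name, literal(Name), NameCost),
        rdf_estimate_complexity(_, foaf:mbox, MBox, MBoxCost),
        (   MBoxCost < NameCost
        ->  rdf(X, foaf:mbox, MBox),            % mailbox pattern is cheaper
            rdf(X, foaf:name, literal(Name))
        ;   rdf(X, foaf:name, literal(Name)),   % name pattern is cheaper or equal
            rdf(X, foaf:mbox, MBox)
        ).
```

Both orderings produce the same solutions; only the amount of search differs. A general optimiser does this for arbitrary conjunctions instead of one hard-coded pair.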

Our benefits?

The single primitive rdf(Subject, Predicate, Object) (rdf/3) suffices to realise all the basic graph-pattern matching that can be done in SPARQL.
We can give a name to a query (as name_mbox/2 above) and build complex queries from simple ones. This greatly simplifies maintenance of complex queries.
Optimisation and unfolding can be used to achieve optimal performance at small cost.
Instead of the predefined SPARQL and SeRQL functions and conditions, we can apply any Prolog predicate as condition or function anywhere in the query. We can also introduce other relations (e.g., from an RDBMS) into our predicates.
The RDF store is tightly connected to Prolog. This allows for arbitrary reasoning with and exploration of the RDF graph at low cost.
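
As an illustration of using an ordinary Prolog predicate as a query condition, the sketch below restricts name_mbox/2 to mailboxes in a given mail domain using standard atom-handling predicates. The predicate name name_in_domain/2 is an assumption for this example, as is the convention that mailboxes are written as mailto:user@domain.

```prolog
:- use_module(library(semweb/rdf_db)).

:- rdf_register_ns(foaf, 'http://xmlns.com/foaf/0.1/').

name_mbox(Name, MBox) :-
        rdf(X, foaf:name, literal(Name)),
        rdf(X, foaf:mbox, MBox).

% name_in_domain(?Name, ?Domain): hypothetical relation between a name
% and the mail domain of the mailbox of the same resource.  Fails for
% mailboxes that do not have the form mailto:user@domain.
name_in_domain(Name, Domain) :-
        name_mbox(Name, MBox),
        atom_concat('mailto:', Address, MBox),
        atomic_list_concat([_User, Domain], '@', Address).
```

Like name_mbox/2, this relation can enumerate all name-domain pairs, look up the names for a given domain, or test a given pair.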

How does this compare to using Java?

Above, we compared querying the triple-store in Prolog with SPARQL. What if we compare Prolog to using Sesame/Jena as a library? The marriage between Object Oriented systems and relational data in general and RDF in particular is discussed in ActiveRDF: embedding Semantic Web data into object-oriented languages. Roughly, static languages such as Java allow for three approaches. Each of these requires either setting up an enumerator or dealing with sets.

GetStatement(): Query statements based on a pattern. This is comparable to what our Prolog based approach does, but in our approach a single call deals with all possible patterns dynamically. In Java we have to find what is given and loop through the bindings for the remaining values explicitly.
GetObject(): Query resources. In this schema an initial URI is used to create an object that reflects the URI. Methods on this object return the values for a given predicate. Note that this predicate must be specified as a string and thus escapes the analysis of the compiler.
Create an enumerator from a SPARQL query provided as a string. This approach again uses a string. Building this string is cumbersome and vulnerable to script injection (a security risk).

Joining results from any of these three possibilities requires hand-crafted (nested) loops. Statically typed Object Oriented languages cannot easily overcome these problems. For dynamically typed languages such as Ruby, the situation can be improved significantly as demonstrated by the ActiveRDF project. ActiveRDF abstracts from the RDF store and builds upon a well-established web-application development platform, but its handling of RDF is still cumbersome compared to what a relational language such as Prolog can achieve.

What does ClioPatria provide?

This section gives the highlights of the functionality you can find in ClioPatria.

Core SWI-Prolog libraries

Web-application development

Developing a dynamic web-page is easy: register a predicate as a handler for an HTTP `file'. The predicate writes a document that conforms to the CGI specification and the server infrastructure takes care of the rest. In the file mbox.pl, we defined three ways to generate a table of names and mailboxes.
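
The listing below is a minimal sketch of such a handler, not the contents of mbox.pl. It assumes the SWI-Prolog library(http/http_dispatch); the location /mbox and the handler name mbox_text/1 are chosen for this illustration. The handler writes a CGI-style document: a Content-type header, an empty line and then the body.

```prolog
:- use_module(library(http/http_dispatch)).
:- use_module(library(semweb/rdf_db)).

:- rdf_register_ns(foaf, 'http://xmlns.com/foaf/0.1/').

% Register mbox_text/1 as the handler for the location /mbox.
:- http_handler(root(mbox), mbox_text, []).

name_mbox(Name, MBox) :-
        rdf(X, foaf:name, literal(Name)),
        rdf(X, foaf:mbox, MBox).

% mbox_text(+Request): hypothetical handler that writes one
% tab-separated line per name-mailbox pair as plain text.
mbox_text(_Request) :-
        format('Content-type: text/plain~n~n'),
        forall(name_mbox(Name, MBox),
               format('~w\t~w~n', [Name, MBox])).
```

Outside ClioPatria, a server for this handler could be started with http_server(http_dispatch, [port(8080)]) after loading library(http/thread_httpd); the port number is an arbitrary choice.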

RDF storage

The main-memory store is a natural extension to Prolog. It is memory-efficient. The RDF store provides:

Reliable file-based persistency
Load and unload of data-sources
Full persistent history of modifications

Full text search

Using rdf_find_literals/2, the user can query literals that contain words. The literal-search facility allows for searching tokens and prefixes as well as fuzzy search (case-insensitive, stemming, sounds-like (metaphone)) and numerical search (exact, larger, smaller, range).
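
A sketch of how the literal search might be combined with rdf/3 is given below. It assumes library(semweb/rdf_litindex) and its prefix(Prefix) search specification; the helper subject_with_prefix/3 is a hypothetical name, and the sketch only looks at plain literals.

```prolog
:- use_module(library(semweb/rdf_db)).
:- use_module(library(semweb/rdf_litindex)).

% subject_with_prefix(+Prefix, -Subject, -Text): hypothetical helper.
% Finds the literal texts that contain a word starting with Prefix and
% enumerates the subjects that carry such a text as a plain literal.
subject_with_prefix(Prefix, Subject, Text) :-
        rdf_find_literals(prefix(Prefix), Texts),
        member(Text, Texts),
        rdf(Subject, _Predicate, literal(Text)).
```

The first argument of rdf_find_literals/2 can also combine specifications, e.g. with and/2 and or/2, so that one call expresses a search for several words.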

ClioPatria extensions

The development environment

ClioPatria is an interactive and self-documenting system that provides basic user-management and utilities to examine RDF data.

SeRQL/SPARQL endpoint

Although generally not used for application development, the compliant RDF query endpoints make ClioPatria a standard component in Semantic Web applications that use such endpoints.

Linked Open Data server

Serve RDF repositories or fragments thereof as LOD with a single-line declaration.

The CPACK package manager

The package manager lets developers submit packages and update their submissions, and lets users install packages together with all their dependencies. This makes it a breeze to add functionality to your ClioPatria installation. For example, the following command installs an interactive search module isearch and its dependencies (owl and statistics).

?- cpack_install(isearch).

Running ClioPatria

You need two things to run ClioPatria: the ClioPatria sources and a recent version of SWI-Prolog that fits your computer. The ClioPatria resource viewer can display a graph that places a resource in context. This facility requires Graphviz.

ClioPatria runs on all major platforms supported by SWI-Prolog: Windows, MacOSX, Linux and, from source, on almost any Unix system. It supports both the 32-bit and 64-bit versions of these operating systems. Demanding servers (more than 10 million triples, complex queries) quickly need the 64-bit versions. Although 64-bit Linux based servers provide the most scalable, fast and robust platform for ClioPatria servers, ClioPatria can be used comfortably on a 32-bit Windows XP machine with 1GB memory as long as the dataset is limited to a few million triples.