View source with raw comments or as raw
    1/*  Part of ClioPatria SeRQL and SPARQL server
    2
    3    Author:        Michiel Hildebrand and Jan Wielemaker
    4    E-mail:        michielh@few.vu.nl
    5    WWW:           http://www.swi-prolog.org
    6    Copyright (c)  2010-2018, VU University Amsterdam
    7    All rights reserved.
    8
    9    Redistribution and use in source and binary forms, with or without
   10    modification, are permitted provided that the following conditions
   11    are met:
   12
   13    1. Redistributions of source code must retain the above copyright
   14       notice, this list of conditions and the following disclaimer.
   15
   16    2. Redistributions in binary form must reproduce the above copyright
   17       notice, this list of conditions and the following disclaimer in
   18       the documentation and/or other materials provided with the
   19       distribution.
   20
   21    THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
   22    "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
   23    LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
   24    FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE
   25    COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
   26    INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
   27    BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
   28    LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
   29    CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
   30    LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
   31    ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
   32    POSSIBILITY OF SUCH DAMAGE.
   33*/
   34
   35:- module(api_lod,
   36          [ lod_api/2                   % +Request
   37          ]).   38
   39:- use_module(library(http/http_dispatch)).   40:- use_module(library(http/http_json)).   41:- use_module(library(http/http_host)).   42:- use_module(library(http/http_request_value)).   43:- use_module(library(http/http_cors)).   44:- use_module(library(semweb/rdf_db)).   45:- use_module(library(semweb/rdf_json)).   46:- use_module(library(semweb/rdf_describe)).   47:- use_module(library(settings)).   48:- use_module(library(option)).   49:- use_module(library(rdf_write)).   50:- use_module(library(semweb/rdf_turtle_write)).   51:- use_module(library(uri)).   52:- use_module(library(debug)).   53:- use_module(library(apply)).   54:- use_module(library(dcg/basics)).   55:- use_module(library(base64)).   56:- use_module(library(utf8)).

LOD - Linked Open Data server

Linked (Open) Data turns RDF URIs (indentifiers) into URLs (locators). Requesting the data behind the URL returns a description of the resource. So, if we see a resource http://example.com/employe/bill, we get do an HTTP GET request and expect to receive a description of bill. This module adds LOD facilities to ClioPatria.

Running the LOD server

There are several ways to run the LOD server.

  1. The simplest way to realise LOD is to run ClioPatria there where the authority component of the URL points to (see uri_components/2 for decomposing URIs). This implies you must be able to create a DNS binding for the host and be able to run ClioPatria there.
  2. Sometimes the above does not work, because the port is already assigned to another machine, you are not allowed to run ClioPatria on the target host, the target is behind a firewall, etc. In that case, notable if the host runs Apache, you can exploit the Apache module mod_proxy and proxy the connections to a location where ClioPatria runs. If you ensure that the path on Apache is the same as the path on ClioPatria, the following Apache configuration rule solves the problem:
    ProxyPass /rdf/ http://cliopatria-host:3020/rdf/
  3. Both above methods require no further configuration. Unfortunately, they require a registered domain control over DNS and administrative rights over certain machines. A solution that doesn't require this is to use www.purl.org. This allows you to redirect URLs within the purl domain to any location you control. The redirection method can be defined with purl. In the semantic web community, we typically use See other (303). The catch is that if the address arrives at ClioPatria, we no longer know where it came from. This is not a problem in (1), as there was no redirect. It is also not a problem in (2), because Apache adds a header x-forwarded-host. Unfortunately, there is no way to tell you are activated through a redirect, let alone where the redirect came from.

    To deal with this situation, we use the redirected_from option of lod_api/2. For example, if http://www.purl.org/vocabularies/myvoc/ is redirected to /myvoc/ on ClioPatria, we use:

    :- http_handler('/myvoc/',
                    lod_api([ redirected_from('http://www.purl.org/vocabularies/myvoc/')
                            ]),
                    [ prefix ]).

By default, there is no HTTP handler pointing to lod_api/2. The example above describes how to deal with redirected URIs. The cases (1) and (2) must also be implemented by registering a handler. This can be as blunt as registering a handler for the root of the server, but typically one would use one or more handlers that deal with sub-trees that act as Linked Data repositories. Handler declarations should use absolute addresses to guarantee a match with the RDF URIs, even if the server is relocated by means of the http:prefix setting. For example:

:- http_handler('/rdf/', lod_api([]), [prefix]).
See also
- http://linkeddata.org/ */
  130:- setting(lod:redirect, boolean, false,
  131           'If true, redirect from accept-header to extension').
 lod_api(+Options, +Request)
Reply to a Linked Data request. The handler is capable of three output formats. It decides on the desired format based on the HTTP Accept header-field. If no acceptable format is found, it replies with a human-readable description of the resource using ClioPatria RDF browser-page as defined by list_resource//2.

Options:

redirected_from(+URL)
This option must be provided when using a purl.org or similar redirect. See overall documentation of this library.
bounded_description(+Type)
Description style to use. See rdf_bounded_description/4. The default is cbd (Concise Bounded Description)
  152lod_api(_Options, Request) :-
  153    \+ memberchk(path_info(_), Request),
  154    !,
  155    accepts(Request, AcceptList),
  156    preferred_format(AcceptList, Format),
  157    (   Format == html
  158    ->  http_link_to_id(home, [], Redirect)
  159    ;   http_link_to_id(well_known_void, [], Redirect)
  160    ),
  161    http_redirect(see_other, Redirect, Request).
  162lod_api(_Options, Request) :-
  163    memberchk(path_info('/.well-known/void'), Request),
  164    !,
  165    http_link_to_id(well_known_void, [], Redirect),
  166    http_redirect(see_other, Redirect, Request).
  167lod_api(Options, Request) :-
  168    lod_uri(Request, URI, Options),
  169    debug(lod, 'LOD URI: ~q', [URI]),
  170    accepts(Request, AcceptList),
  171    triple_filter(Request, Filter),
  172    cors_enable,
  173    lod_request(URI, AcceptList, Request, Filter, Options).
  174
  175accepts(Request, AcceptList) :-
  176    (   memberchk(accept(AcceptHeader), Request)
  177    ->  (   atom(AcceptHeader)      % compatibility
  178        ->  http_parse_header_value(accept, AcceptHeader, AcceptList)
  179        ;   AcceptList = AcceptHeader
  180        )
  181    ;   AcceptList = []
  182    ).
 triple_filter(+Request, -Filter) is det
Extract Triple-Filter from Request. Ignores the filter if it is invalid.
  189triple_filter(Request, Filter) :-
  190    catch(phrase(triple_filter(Request), Filter), E,
  191          (print_message(warning, E),fail)),
  192    !.
  193triple_filter(_, []).
 triple_filter(+Text)//
Translate an RDF triple pattern into a list of rdf(S,P,O) terms.
  200triple_filter([]) -->
  201    [].
  202triple_filter([triple_filter(Filter)|T]) -->
  203    !,
  204    one_triple_filter(Filter),
  205    triple_filter(T).
  206triple_filter([_|T]) -->
  207    triple_filter(T).
  208
  209one_triple_filter(Encoded) -->
  210    { string_codes(Encoded, EncCodes),
  211      phrase(base64(UTF8Bytes), EncCodes),
  212      phrase(utf8_codes(PlainCodes), UTF8Bytes),
  213      string_codes(Filter, PlainCodes),
  214      split_string(Filter, "\r\n", "\r\n", Filters),
  215      maplist(map_triple_filter, Filters, Triples)
  216    },
  217    string(Triples).
  218
  219map_triple_filter(String, rdf(S,P,O)) :-
  220    split_string(String, "\s\t", "\s\t", [SS,SP,SO]),
  221    triple_term(SS, S),
  222    triple_term(SP, P),
  223    triple_term(SO, O).
  224
  225triple_term("?", _) :- !.
  226triple_term(S, N) :-
  227    string_codes(S, Codes),
  228    phrase(sparql_grammar:graph_term(N), Codes).
 lod_request(+URI, +AcceptList, +Request, +Filter, +Options)
Handle an LOD request.
  234lod_request(URI, AcceptList, Request, Filter, Options) :-
  235    lod_resource(URI),
  236    !,
  237    preferred_format(AcceptList, Format),
  238    debug(lod, 'LOD Format: ~q', [Format]),
  239    (   cliopatria:redirect_uri(Format, URI, SeeOther)
  240    ->  http_redirect(see_other, SeeOther, Request)
  241    ;   setting(lod:redirect, true),
  242        redirect(URI, AcceptList, SeeOther)
  243    ->  http_redirect(see_other, SeeOther, Request)
  244    ;   lod_describe(Format, URI, Request, Filter, Options)
  245    ).
  246lod_request(URL, _AcceptList, Request, Filter, Options) :-
  247    format_request(URL, URI, Format),
  248    !,
  249    lod_describe(Format, URI, Request, Filter, Options).
  250lod_request(URI, _AcceptList, _Request, _Filter, _) :-
  251    throw(http_reply(not_found(URI))).
 lod_uri(+Request, -URI, +Options)
URI is the originally requested URI. This predicate deals with redirections if the HTTP handler was registered using the option redirected_from(URL). Otherwise it resolves the correct global URI using http_current_host/4.
  261lod_uri(Request, URI, Options) :-
  262    memberchk(redirected_from(Org), Options),
  263    memberchk(request_uri(ReqURI), Request),
  264    handler_location(Request, Location),
  265    atom_concat(Location, Rest, ReqURI),
  266    atom_concat(Org, Rest, URI).
  267lod_uri(Request, URI, _) :-
  268    memberchk(request_uri(ReqURI), Request),
  269    http_current_host(Request, Host, Port,
  270                      [ global(true)
  271                      ]),
  272    (   Port == 80
  273    ->  atomic_list_concat(['http://', Host, ReqURI], URI)
  274    ;   atomic_list_concat(['http://', Host, :, Port, ReqURI], URI)
  275    ).
 handler_location(+Request, -Location) is det
Location is the requested location on the server. This includes the handler location, normally concatenated with the path_info.
  283handler_location(Request, Location) :-
  284    memberchk(path(Path), Request),
  285    (   memberchk(path_info(Rest), Request),
  286        atom_concat(Location, Rest, Path)
  287    ->  true
  288    ;   Location = Path
  289    ).
 redirect(+URI, +AcceptList, -RedirectURL)
Succeeds if URI is in the store and a RedirectURL is found for it.
  297redirect(URI, AcceptList, To) :-
  298    lod_resource(URI),
  299    preferred_format(AcceptList, Format),
  300    (   cliopatria:redirect_uri(Format, URI, To)
  301    ->  true
  302    ;   uri_components(URI, URIComponents),
  303        uri_data(path, URIComponents, Path0),
  304        format_suffix(Format, Suffix),
  305        file_name_extension(Path0, Suffix, Path),
  306        uri_data(path, URIComponents, Path, ToComponents),
  307        uri_components(To, ToComponents)
  308    ).
 preferred_format(+AcceptList, -Format) is det
Format is the highest ranked mimetype found in the Acceptlist of the request and that we can support. Expects an AcceptList sorted by rank.
  317preferred_format(AcceptList, Format) :-
  318    member(media(MimeType,_,_,_), AcceptList),
  319    ground(MimeType),
  320    mimetype_format(MimeType, Format),
  321    !.
  322preferred_format(_, html).
 format_request(+URL, -URI, -Format) is semidet
True if URL contains a suffix that corresponds to a supported output format, and the global URI occurs in the database.
  330format_request(URL, URI, Format) :-
  331    uri_components(URL, URLComponents),
  332    uri_data(path, URLComponents, Path),
  333    file_name_extension(Base, Ext, Path),
  334    (   format_suffix(Format, Ext),
  335        mimetype_format(_, Format)
  336    ->  true
  337    ),
  338    uri_data(path, URLComponents, Base, PlainComponents),
  339    uri_components(URI, PlainComponents),
  340    lod_resource(URI).
 lod_describe(+Format, +URI, +Request, +Filter, +Options) is det
Write an HTTP document describing URI to in Format to the current output. Format is defined by mimetype_format/2.
  348lod_describe(html, URI, Request, _, _) :-
  349    !,
  350    (   rdf_graph(URI)
  351    ->  http_link_to_id(list_graph, [graph=URI], Redirect)
  352    ;   http_link_to_id(list_resource, [r=URI], Redirect)
  353    ),
  354    http_redirect(see_other, Redirect, Request).
  355lod_describe(Format, URI, _Request, Filter, Options) :-
  356    lod_description(URI, RDF, Filter, Options),
  357    send_graph(Format, RDF).
  358
  359send_graph(xmlrdf, RDF) :-
  360    format('Content-type: application/rdf+xml; charset=UTF-8~n~n'),
  361    rdf_write_xml(current_output, RDF).
  362send_graph(json, RDF) :-
  363    graph_json(RDF, JSON),
  364    reply_json(JSON).
  365send_graph(turtle, RDF) :-
  366    format('Content-type: text/turtle; charset=UTF-8~n~n'),
  367    rdf_save_turtle(stream(current_output),
  368                    [ expand(triple_in(RDF)),
  369                      only_known_prefixes(true),
  370                      silent(true)
  371                    ]).
 triple_in(+RDF, ?S, ?P, ?O, ?G) is nondet
Lookup a triple in the graph RDF, represented as a list of rdf(S,P,O).
To be done
- Describe required indexing from rdf_save_turtle/2 and implement that if the graph is big.
  381:- public triple_in/5.                  % called from send_graph/2.
  382
  383triple_in(RDF, S,P,O,_G) :-
  384    member(rdf(S,P,O), RDF).
 lod_description(+URI, -RDF, +Filter, +Options) is det
RDF is a graph represented as a list of rdf(S,P,O) that describes URI.

This predicate is hooked by cliopatria:lod_description/2. The default is implemented by resource_CBD/3.

See also
- SPARQL DESCRIBE
  397lod_description(URI, RDF, _, _) :-
  398    cliopatria:lod_description(URI, RDF),
  399    !.
  400lod_description(URI, RDF, Filter, Options) :-
  401    option(bounded_description(Type), Options, cbd),
  402    echo_filter(Filter),
  403    rdf_bounded_description(rdf, Type, Filter, URI, RDF).
  404
  405echo_filter([]) :- !.
  406echo_filter(Filters) :-
  407    copy_term(Filters, Filters1),
  408    term_variables(Filters1, Vars),
  409    maplist(=(?), Vars),
  410    filters_to_ntriples(Filters1, NTriples),
  411    split_string(NTriples, "\n", "\n.\s", Strings0),
  412    maplist(insert_q, Strings0, Strings),
  413    atomics_to_string(Strings, "\n", String),
  414    base64(String, Encoded),
  415    format('Triple-Filter: ~w\r\n', [Encoded]).
  416
  417insert_q(String, QString) :-
  418    split_string(String, " ", "", [S,P,O|M]),
  419    map_q(S, QS),
  420    map_q(P, QP),
  421    map_q(O, QO),
  422    atomics_to_string([QS,QP,QO|M], " ", QString).
  423
  424map_q("<?>", "?") :- !.
  425map_q(S, S).
  426
  427filters_to_ntriples(Filters, String) :-
  428    with_output_to(
  429        string(String),
  430        rdf_save_ntriples(stream(current_output),
  431                          [ expand(api_lod:triple_in(Filters))])).
 mimetype_format(?MimeType, ?Format) is nondet
Conversion between mimetypes and formats.
  438mimetype_format(application/'rdf+xml',  xmlrdf).
  439mimetype_format(application/json,       json).
  440mimetype_format(application/'x-turtle', turtle).
  441mimetype_format(text/turtle,            turtle).
  442mimetype_format(text/html,              html).
 format_suffix(?Format, ?Suffix) is nondet
Suffix is the file name extension used for Format.
  448format_suffix(xmlrdf, rdf).
  449format_suffix(json,   json).
  450format_suffix(html,   html).
  451format_suffix(turtle, ttl).
 lod_resource(+Resource) is semidet
True if Resource is an existing resource for the LOD server. Typically, this means it appears as a subject, but when considering symmetric bounded descriptions, it should certainly also hold for resources that only appear as object.
  461lod_resource(Resource) :-
  462    (   rdf(Resource, _, _)
  463    ;   rdf(_, Resource, _)
  464    ;   rdf(_, _, Resource)
  465    ;   rdf_graph(Resource)
  466    ),
  467    !.
  468
  469
  470                 /*******************************
  471                 *             HOOKS            *
  472                 *******************************/
  473
  474:- multifile
  475    cliopatria:redirect_uri/3,
  476    cliopatria:lod_description/2.
 cliopatria:redirect_uri(+Format, +URI, -RedirectURL)
Compose a RedirectionURL based on the output Format and the URI that is in our RDF database. For example, this could map the URI http://example.com/employe/bill into Bill's homepage at http://example.com/~bill if Format is html. The default is to a format-specific extension to the path component of URI, returning e.g., http://example.com/employe/bill.rdf if the requested format is RDF.
Arguments:
Format- is one of xmlrdf, =turtle, json or html.
See also
- This hook is used by redirect/3.
 cliopatria:lod_description(+URI, -RDF:list(rdf(s,p,o)))
RDF is list of triples describing URI. The default is to use the Concise Bounded Description as implemented by resource_CBD/3.
See also
- This hook is used by lod_description/2
- library(semweb/rdf_describe) provides several definitions of bounded descriptions.