/*  Part of ClioPatria SeRQL and SPARQL server

    Author:        Michiel Hildebrand and Jan Wielemaker
    E-mail:        michielh@few.vu.nl
    WWW:           http://www.swi-prolog.org
    Copyright (c)  2010-2018, VU University Amsterdam
    All rights reserved.

    Redistribution and use in source and binary forms, with or without
    modification, are permitted provided that the following conditions
    are met:

    1. Redistributions of source code must retain the above copyright
       notice, this list of conditions and the following disclaimer.

    2. Redistributions in binary form must reproduce the above copyright
       notice, this list of conditions and the following disclaimer in
       the documentation and/or other materials provided with the
       distribution.

    THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
    "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
    LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
    FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE
    COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
    INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
    BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
    LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
    CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
    LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
    ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
    POSSIBILITY OF SUCH DAMAGE.
*/

:- module(api_lod,
          [ lod_api/2                   % +Options, +Request
          ]).

:- use_module(library(http/http_dispatch)).
:- use_module(library(http/http_json)).
:- use_module(library(http/http_host)).
:- use_module(library(http/http_request_value)).
:- use_module(library(http/http_cors)).
:- use_module(library(semweb/rdf_db)).
:- use_module(library(semweb/rdf_json)).
:- use_module(library(semweb/rdf_describe)).
:- use_module(library(settings)).
:- use_module(library(option)).
:- use_module(library(rdf_write)).
:- use_module(library(semweb/rdf_turtle_write)).
:- use_module(library(uri)).
:- use_module(library(debug)).
:- use_module(library(apply)).
:- use_module(library(dcg/basics)).
:- use_module(library(base64)).
:- use_module(library(utf8)).


/** <module> LOD - Linked Open Data server

Linked (Open) Data turns RDF URIs   (identifiers)  into URLs (locators).
Requesting the data  behind  the  URL   returns  a  description  of  the
resource. So, if we see   a resource http://example.com/employe/bill, we
do an HTTP GET request and  expect   to  receive a description of _bill_.
This module adds LOD facilities to ClioPatria.

---++ Running the LOD server

There are several ways to run the LOD server.

    1. The simplest way to realise LOD is to run ClioPatria there where
    the authority component of the URL points to (see uri_components/2
    for decomposing URIs).  This implies you must be able to create a
    DNS binding for the host and be able to run ClioPatria there.

    2. Sometimes the above does not work, because the port is already
    assigned to another machine, you are not allowed to run ClioPatria
    on the target host, the target is behind a firewall, etc. In that
    case, notably if the host runs Apache, you can exploit the Apache
    module =mod_proxy= and proxy the connections to a location where
    ClioPatria runs. If you ensure that the path on Apache is the same
    as the path on ClioPatria, the following Apache configuration rule
    solves the problem:

    ==
    ProxyPass /rdf/ http://cliopatria-host:3020/rdf/
    ==

    3. Both above methods require no further configuration.
    Unfortunately, they require a registered domain, control over DNS
    and administrative rights over certain machines.  A solution that
    doesn't require this is to use www.purl.org.  This allows you to
    redirect URLs within the purl domain to any location you control.
    The redirection method can be defined with purl.  In the semantic
    web community, we typically use *|See other|* (303).  The catch
    is that if the address arrives at ClioPatria, we no longer know
    where it came from.  This is not a problem in (1), as there was
    no redirect.  It is also not a problem in (2), because Apache
    adds a header =|x-forwarded-host|=.  Unfortunately, there is
    no way to tell you are activated through a redirect, let alone
    where the redirect came from.

    To deal with this situation, we use the redirected_from option of
    lod_api/2. For example, if http://www.purl.org/vocabularies/myvoc/
    is redirected to /myvoc/ on ClioPatria, we use:

    ==
    :- http_handler('/myvoc/',
                    lod_api([ redirected_from('http://www.purl.org/vocabularies/myvoc/')
                            ]),
                    [ prefix ]).
    ==

By default, there is no HTTP handler pointing to lod_api/2. The example
above describes how to deal with redirected URIs. The cases (1) and (2)
must also be implemented by registering a handler. This can be as blunt
as registering a handler for the root of the server, but typically one
would use one or more handlers that deal with sub-trees that act as
Linked Data repositories. Handler declarations should use absolute
addresses to guarantee a match with the RDF URIs, even if the server is
relocated by means of the http:prefix setting. For example:

    ==
    :- http_handler('/rdf/', lod_api([]), [prefix]).
    ==

@see http://linkeddata.org/
*/

:- setting(lod:redirect, boolean, false,
           'If true, redirect from accept-header to extension').

%!  lod_api(+Options, +Request)
%
%   Reply to a Linked Data request. The  handler is capable of three
%   output formats. It decides on the   desired  format based on the
%   HTTP =Accept= header-field. If no acceptable format is found, it
%   replies with a human-readable description  of the resource using
%   ClioPatria RDF browser-page as defined by list_resource//2.
%
%   Options:
%
%       * redirected_from(+URL)
%       This option must be provided when using a purl.org or
%       similar redirect.  See overall documentation of this
%       library.
%
%       * bounded_description(+Type)
%       Description style to use.  See rdf_bounded_description/4.
%       The default is =cbd= (Concise Bounded Description)
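%
%   A minimal sketch of a handler declaration that selects another
%   description style.  It assumes that =scbd= (Symmetric Concise
%   Bounded Description) is a type accepted by the rdf_describe
%   library; the location =|/rdf/|= is only illustrative:
%
%   ==
%   :- http_handler('/rdf/',
%                   lod_api([ bounded_description(scbd)
%                           ]),
%                   [ prefix ]).
%   ==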

lod_api(_Options, Request) :-
    \+ memberchk(path_info(_), Request),
    !,
    accepts(Request, AcceptList),
    preferred_format(AcceptList, Format),
    (   Format == html
    ->  http_link_to_id(home, [], Redirect)
    ;   http_link_to_id(well_known_void, [], Redirect)
    ),
    http_redirect(see_other, Redirect, Request).
lod_api(_Options, Request) :-
    memberchk(path_info('/.well-known/void'), Request),
    !,
    http_link_to_id(well_known_void, [], Redirect),
    http_redirect(see_other, Redirect, Request).
lod_api(Options, Request) :-
    lod_uri(Request, URI, Options),
    debug(lod, 'LOD URI: ~q', [URI]),
    accepts(Request, AcceptList),
    triple_filter(Request, Filter),
    cors_enable,
    lod_request(URI, AcceptList, Request, Filter, Options).

accepts(Request, AcceptList) :-
    (   memberchk(accept(AcceptHeader), Request)
    ->  (   atom(AcceptHeader)      % compatibility
        ->  http_parse_header_value(accept, AcceptHeader, AcceptList)
        ;   AcceptList = AcceptHeader
        )
    ;   AcceptList = []
    ).

%!  triple_filter(+Request, -Filter) is det.
%
%   Extract Triple-Filter from Request.  Ignores the filter if it
%   is invalid.
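%
%   Judging from the decoding code below, the header value is the
%   base64 encoding of UTF-8 text holding one triple pattern per
%   line.  Each line has three fields separated by spaces or tabs,
%   where =|?|= acts as a wildcard and the other fields are parsed
%   as SPARQL graph terms.  A client-side sketch, where
%   encode_triple_filter/2 is a hypothetical helper that is not part
%   of this module and the patterns are assumed to be ASCII-only:
%
%   ==
%   % encode_triple_filter(+Patterns, -Encoded): hypothetical helper
%   encode_triple_filter(Patterns, Encoded) :-
%       atomic_list_concat(Patterns, '\n', Text),
%       base64(Text, Encoded).
%
%   ?- encode_triple_filter(
%          [ '? <http://www.w3.org/2000/01/rdf-schema#label> ?' ],
%          Encoded).
%   ==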

triple_filter(Request, Filter) :-
    catch(phrase(triple_filter(Request), Filter), E,
          (print_message(warning, E),fail)),
    !.
triple_filter(_, []).


%!  triple_filter(+Text)//
%
%   Translate an RDF triple pattern into a list of rdf(S,P,O) terms.

triple_filter([]) -->
    [].
triple_filter([triple_filter(Filter)|T]) -->
    !,
    one_triple_filter(Filter),
    triple_filter(T).
triple_filter([_|T]) -->
    triple_filter(T).

one_triple_filter(Encoded) -->
    { string_codes(Encoded, EncCodes),
      phrase(base64(UTF8Bytes), EncCodes),
      phrase(utf8_codes(PlainCodes), UTF8Bytes),
      string_codes(Filter, PlainCodes),
      split_string(Filter, "\r\n", "\r\n", Filters),
      maplist(map_triple_filter, Filters, Triples)
    },
    string(Triples).

map_triple_filter(String, rdf(S,P,O)) :-
    split_string(String, "\s\t", "\s\t", [SS,SP,SO]),
    triple_term(SS, S),
    triple_term(SP, P),
    triple_term(SO, O).

triple_term("?", _) :- !.
triple_term(S, N) :-
    string_codes(S, Codes),
    phrase(sparql_grammar:graph_term(N), Codes).

%!  lod_request(+URI, +AcceptList, +Request, +Filter, +Options)
%
%   Handle an LOD request.

lod_request(URI, AcceptList, Request, Filter, Options) :-
    lod_resource(URI),
    !,
    preferred_format(AcceptList, Format),
    debug(lod, 'LOD Format: ~q', [Format]),
    (   cliopatria:redirect_uri(Format, URI, SeeOther)
    ->  http_redirect(see_other, SeeOther, Request)
    ;   setting(lod:redirect, true),
        redirect(URI, AcceptList, SeeOther)
    ->  http_redirect(see_other, SeeOther, Request)
    ;   lod_describe(Format, URI, Request, Filter, Options)
    ).
lod_request(URL, _AcceptList, Request, Filter, Options) :-
    format_request(URL, URI, Format),
    !,
    lod_describe(Format, URI, Request, Filter, Options).
lod_request(URI, _AcceptList, _Request, _Filter, _) :-
    throw(http_reply(not_found(URI))).


%!  lod_uri(+Request, -URI, +Options)
%
%   URI is the originally requested URI.   This predicate deals with
%   redirections if the HTTP handler was registered using the option
%   redirected_from(URL). Otherwise it resolves   the correct global
%   URI using http_current_host/4.

lod_uri(Request, URI, Options) :-
    memberchk(redirected_from(Org), Options),
    memberchk(request_uri(ReqURI), Request),
    handler_location(Request, Location),
    atom_concat(Location, Rest, ReqURI),
    atom_concat(Org, Rest, URI).
lod_uri(Request, URI, _) :-
    memberchk(request_uri(ReqURI), Request),
    http_current_host(Request, Host, Port,
                      [ global(true)
                      ]),
    (   Port == 80
    ->  atomic_list_concat(['http://', Host, ReqURI], URI)
    ;   atomic_list_concat(['http://', Host, :, Port, ReqURI], URI)
    ).


%!  handler_location(+Request, -Location) is det.
%
%   Location is the requested location on  the server. This includes
%   the handler location, normally concatenated with the path_info.

handler_location(Request, Location) :-
    memberchk(path(Path), Request),
    (   memberchk(path_info(Rest), Request),
        atom_concat(Location, Rest, Path)
    ->  true
    ;   Location = Path
    ).


%!  redirect(+URI, +AcceptList, -RedirectURL)
%
%   Succeeds if URI is in the store and a RedirectURL is found for
%   it.

redirect(URI, AcceptList, To) :-
    lod_resource(URI),
    preferred_format(AcceptList, Format),
    (   cliopatria:redirect_uri(Format, URI, To)
    ->  true
    ;   uri_components(URI, URIComponents),
        uri_data(path, URIComponents, Path0),
        format_suffix(Format, Suffix),
        file_name_extension(Path0, Suffix, Path),
        uri_data(path, URIComponents, Path, ToComponents),
        uri_components(To, ToComponents)
    ).


%!  preferred_format(+AcceptList, -Format) is det.
%
%   Format is the highest ranked mimetype found in the AcceptList of
%   the request and that  we  can   support.  Expects  an AcceptList
%   sorted by rank.
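%
%   For illustration, assuming the AcceptList holds terms of the
%   form media(Type, Params, Quality, AcceptExts) as produced by
%   the HTTP library for the =Accept= header, the first query below
%   is expected to select =turtle= and the second to fall back to
%   =html=:
%
%   ==
%   ?- api_lod:preferred_format([media(text/turtle, [], 1.0, [])], F).
%   ?- api_lod:preferred_format([], F).
%   ==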

preferred_format(AcceptList, Format) :-
    member(media(MimeType,_,_,_), AcceptList),
    ground(MimeType),
    mimetype_format(MimeType, Format),
    !.
preferred_format(_, html).


%!  format_request(+URL, -URI, -Format) is semidet.
%
%   True if URL contains a suffix   that  corresponds to a supported
%   output format, and the global URI occurs in the database.

format_request(URL, URI, Format) :-
    uri_components(URL, URLComponents),
    uri_data(path, URLComponents, Path),
    file_name_extension(Base, Ext, Path),
    (   format_suffix(Format, Ext),
        mimetype_format(_, Format)
    ->  true
    ),
    uri_data(path, URLComponents, Base, PlainComponents),
    uri_components(URI, PlainComponents),
    lod_resource(URI).


%!  lod_describe(+Format, +URI, +Request, +Filter, +Options) is det.
%
%   Write an HTTP document describing URI in Format to the current
%   output. Format is defined by mimetype_format/2.

lod_describe(html, URI, Request, _, _) :-
    !,
    (   rdf_graph(URI)
    ->  http_link_to_id(list_graph, [graph=URI], Redirect)
    ;   http_link_to_id(list_resource, [r=URI], Redirect)
    ),
    http_redirect(see_other, Redirect, Request).
lod_describe(Format, URI, _Request, Filter, Options) :-
    lod_description(URI, RDF, Filter, Options),
    send_graph(Format, RDF).

send_graph(xmlrdf, RDF) :-
    format('Content-type: application/rdf+xml; charset=UTF-8~n~n'),
    rdf_write_xml(current_output, RDF).
send_graph(json, RDF) :-
    graph_json(RDF, JSON),
    reply_json(JSON).
send_graph(turtle, RDF) :-
    format('Content-type: text/turtle; charset=UTF-8~n~n'),
    rdf_save_turtle(stream(current_output),
                    [ expand(triple_in(RDF)),
                      only_known_prefixes(true),
                      silent(true)
                    ]).

%!  triple_in(+RDF, ?S,?P,?O, ?G) is nondet.
%
%   Lookup a triple in the graph RDF, represented as a list of
%   rdf(S,P,O).
%
%   @tbd    Describe required indexing from rdf_save_turtle/2 and
%           implement that if the graph is big.

:- public triple_in/5.                  % called from send_graph/2.

triple_in(RDF, S,P,O,_G) :-
    member(rdf(S,P,O), RDF).


%!  lod_description(+URI, -RDF, +Filter, +Options) is det.
%
%   RDF is a  graph  represented  as   a  list  of  rdf(S,P,O)  that
%   describes URI.
%
%   This predicate is hooked   by  cliopatria:lod_description/2. The
%   default is implemented by resource_CBD/3.
%
%   @see SPARQL DESCRIBE

lod_description(URI, RDF, _, _) :-
    cliopatria:lod_description(URI, RDF),
    !.
lod_description(URI, RDF, Filter, Options) :-
    option(bounded_description(Type), Options, cbd),
    echo_filter(Filter),
    rdf_bounded_description(rdf, Type, Filter, URI, RDF).

echo_filter([]) :- !.
echo_filter(Filters) :-
    copy_term(Filters, Filters1),
    term_variables(Filters1, Vars),
    maplist(=(?), Vars),
    filters_to_ntriples(Filters1, NTriples),
    split_string(NTriples, "\n", "\n.\s", Strings0),
    maplist(insert_q, Strings0, Strings),
    atomics_to_string(Strings, "\n", String),
    base64(String, Encoded),
    format('Triple-Filter: ~w\r\n', [Encoded]).

insert_q(String, QString) :-
    split_string(String, " ", "", [S,P,O|M]),
    map_q(S, QS),
    map_q(P, QP),
    map_q(O, QO),
    atomics_to_string([QS,QP,QO|M], " ", QString).

map_q("<?>", "?") :- !.
map_q(S, S).

filters_to_ntriples(Filters, String) :-
    with_output_to(
        string(String),
        rdf_save_ntriples(stream(current_output),
                          [ expand(api_lod:triple_in(Filters))])).


%!  mimetype_format(?MimeType, ?Format) is nondet.
%
%   Conversion between mimetypes and formats.

mimetype_format(application/'rdf+xml',  xmlrdf).
mimetype_format(application/json,       json).
mimetype_format(application/'x-turtle', turtle).
mimetype_format(text/turtle,            turtle).
mimetype_format(text/html,              html).

%!  format_suffix(?Format, ?Suffix) is nondet.
%
%   Suffix is the file name extension used for Format.

format_suffix(xmlrdf, rdf).
format_suffix(json,   json).
format_suffix(html,   html).
format_suffix(turtle, ttl).


%!  lod_resource(+Resource) is semidet.
%
%   True if Resource is an  existing   resource  for the LOD server.
%   Typically,  this  means  it  appears  as  a  subject,  but  when
%   considering symmetric bounded descriptions,  it should certainly
%   also hold for resources that only appear as object.

lod_resource(Resource) :-
    (   rdf(Resource, _, _)
    ;   rdf(_, Resource, _)
    ;   rdf(_, _, Resource)
    ;   rdf_graph(Resource)
    ),
    !.


                 /*******************************
                 *             HOOKS            *
                 *******************************/

:- multifile
    cliopatria:redirect_uri/3,
    cliopatria:lod_description/2.

%!  cliopatria:redirect_uri(+Format, +URI, -RedirectURL)
%
%   Compose a RedirectionURL based on the  output Format and the URI
%   that is in our RDF database. For example, this could map the URI
%   http://example.com/employe/bill   into   Bill's    homepage   at
%   http://example.com/~bill if Format is =html=.  The default is to
%   add a format-specific extension to the path component of URI,
%   returning  e.g.,  http://example.com/employe/bill.rdf    if  the
%   requested format is RDF.
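%
%   A sketch of a hook clause for the homepage example above.  The
%   URIs are illustrative only:
%
%   ==
%   cliopatria:redirect_uri(html,
%                           'http://example.com/employe/bill',
%                           'http://example.com/~bill').
%   ==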
%
%   @see This hook is used by redirect/3.
%   @param Format is one of =xmlrdf=, =turtle=, =json= or =html=.


%!  cliopatria:lod_description(+URI, -RDF:list(rdf(s,p,o)))
%
%   RDF is a list of triples describing URI. The default is to use the
%   Concise Bounded Description as implemented by resource_CBD/3.
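%
%   A minimal sketch of a hook that describes a resource using only
%   the triples that have it as subject.  It assumes rdf/3 from
%   library(semweb/rdf_db) and is not the default behaviour.  Note
%   that a clause like this applies to every URI and thus replaces
%   the default description:
%
%   ==
%   cliopatria:lod_description(URI, RDF) :-
%       findall(rdf(URI,P,O), rdf(URI,P,O), RDF).
%   ==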
%
%   @see This hook is used by lod_description/4
%   @see library(semweb/rdf_describe) provides several definitions
%   of bounded descriptions.