xmlrdf/commit

Copied files from original xmlrdf

author	Jan Wielemaker
	Mon Nov 29 12:44:54 2010 +0100
committer	Jan Wielemaker
	Mon Nov 29 12:44:54 2010 +0100
commit	41d0b95bc1499d15cd253cd1bc6a1ecd8b57335a
tree	dfc7e3b7b42c39cad27dce9af05275ab9b46455d
parent	1cb1db337eb3f0f288c379592b98ece661311482
Diff style: patch stat
diff --git a/lib/xmlrdf/README.txt b/lib/xmlrdf/README.txt
new file mode 100644
index 0000000..b9fd0ce
--- /dev/null
+++ b/lib/xmlrdf/README.txt
@@ -0,0 +1,566 @@
+---+ Introduction
+
+Many datasets are transferred as XML, providing a tree-based datamodel
+that is purely syntactic in nature. Semantic processing is standardised
+around RDF, which provides a graph-based model. In the transformation
+process we must identify syntactic artifacts such as meaningless
+ordering in the XML data, lacking structure (e.g., the _creator_ of an
+artwork is not a literal string by a person identified by a resource
+with properties) and overly structured data (e.g. the dimension of an
+object is a property of the object, not of some placeholder that
+combines physical properties of the object). These syntactic artifacts
+must be translated into a proper semantic model where objects and
+properties are typed and semantically related to common vocabularies
+such as SKOS and Dublin Core.
+
+This document describes our toolkit for supporting this transformation
+process, together with examples taken from actual translations. The
+toolkit is implemented in SWI-Prolog and can be downloaded using GIT
+from one of the addresses below. Running the toolkit requires
+SWI-Prolog, which can be downloaded from http://www.swi-prolog.org
+for Windows, MacOS and Linux or in source for many other platforms.
+
+    * git://eculture.cs.vu.nl/home/git/econnect/xmlrdf.git
+    * http://eculture.cs.vu.nl/git/econnect/xmlrdf.git
+
+The graph-rewrite engine is written in Prolog.  This document does not
+assume any knowledge about Prolog. The rule-language is, as far as
+possible, a clean declarative graph-rewrite language. The transformation
+process for actual data however can be complicated.  For these cases the
+rule-system allow mixing rules with arbitrary Prolog code, providing an
+unconstrained transformation system. We provide a (dynamically extended)
+library of Prolog routines for typical conversion tasks.
+
+
+---+ Converting XML into RDF
+
+The core idea behind  converting  `data-xml'   into  RDF  is  that every
+complex XML element maps to a resource  (often a bnode) and every atomic
+attribute maps to an attribute of this bnode. Such a translation gives a
+valid RDF document, which is much easier to access for further
+processing.
+
+There are a few places where we must be more subtle in the initial
+conversion.  First, the XML reserved attributes:
+
+    * The xml:lang attribute is kept around and if we create an RDF
+    literal, it is used to create a literal in the current language.
+
+    * xmlns declarations are ignored (they make the declarations
+    available to the application, but the namespaces are already
+    processed by the XML parser).
+
+Second, we may wish to map some   of our properties into rdfs:XMLLiteral
+or RDF dataTypes. In particular the first   _must_  be done in the first
+pass to avoid all the complexities  of   turning  the  RDF back into XML
+(think of the above mentioned   declarations,  but ordering requirements
+can make this fundamentally impossible).
+
+Because this step needs type information   about properties, we might as
+well allow for some very  simple   transformations  in  the first phase.
+These  transformations  are  guided  by  the   target  RDF  schema.  The
+transformation process can add additional properties   to the target RDF
+properties and RDF classes. The property   is  called map:xmlname, where
+the =map= prefix is currently  defined as http://cs.vu.nl/eculture/map/.
+If this property is associated  to  a   class,  an  XML element with the
+defined name is translated into an  instance   of  this  class. If it is
+associated to a property, it  affects   XML  attribute or atomic element
+translation in two ways:
+
+    * It uses the RDF property name rather than the XML name for
+    the property
+
+    * The rdfs:range of the property affects the value translation:
+
+	* If it is rdfs:XMLLiteral, the sub-element is translated
+	to an RDF XMLLiteral.
+
+	* If it is an XSD datatype, the sub-element is translated
+	into a typed RDF literal
+
+	* It it is a proper class and the value is a valid URI, the
+	URI is used as value without translation into a literal.
+
+Below is an example  that  maps   XML  elements  =record=  into vra:Work
+instances  and  maps  the  XML  attribute  =title=  into  the  vra:title
+property. Note that it is not required   (and  not desirable) to add the
+=|map:xmlname|= properties to the actual schema files. Instead, put them
+in a separate file and load both into the conversion engine.
+
+    ==
+    @prefix   vra: <http://www.vraweb.org/vracore/vracore3#> .
+    @prefix   map: <http://cs.vu.nl/eculture/map/> .
+
+    # Map element-names to rdf:type
+
+    vra:Work map:xmlname "record" .
+
+    # Map xml attribute and sub-element names to properties
+
+    vra:title map:xmlname "title" .
+    ==
+
+---+ Default XML name mapping
+
+The initial XML to RDF mapper uses the XML attribute and tag-names for
+creating RDF properties.  It provides two optional processing steps that
+make identifiers fit better with the RDF practice.
+
+    1. It can add a _prefix_ to each XML name to create a fully
+    qualified URI.  E.g.,
+
+	==
+	?- rdf_current_ns(ahm, Prefix),
+	   load_xml_as_rdf('data.xml',
+			   [ prefix(Prefix)
+			   ]).
+	==
+
+    2. It `restyles' XML identifiers. Notably identifiers that contain a
+    dot (.) are hard to process using Turtle.  The library identifies
+    alphanumerical substrings of the XML name and constructs new
+    identifiers from these parts.  By default, predicates start with
+    a lowercase letter and each new part starts with an uppercase
+    letter, as in =oneTwo=.  Types (classes) start with an uppercase
+    letter, as in =OneTwo=.  This behaviour can be controlled with the
+    options =predicate_style= and =class_style= of load_xml_as_rdf/2.
+
+
+---+ Subsequent meta-data mapping
+
+Further mapping of meta-data consists of the following steps:
+
+    1. Fix the node-structure.
+    2. Re-establish internal links.
+    3. Re-establish external links.
+    4. Create a mapping schema that link the classes and predicates
+       of the source to the target schema (e.g., Dublin Core).
+    5. Assign URIs to blank nodes where applicable.
+
+---++ Fix the node-structure
+
+Source-data generally uses a record structure. Sometimes, each record is
+a simple flat list of properties, while in other cases it has a deeply
+nested structure.  We distinguish three types of properties:
+
+    1. Properties with a clear single literal value, such as a
+    collection-identifier.  Such properties are directly mapped to
+    RDF literals.
+
+    2. Properties with instance-specific scope that may have
+    annotations. Typical examples are the title (multiple, translations,
+    who has given the work a title, etc.) or a dimension (unit, which
+    dimension, etc.).  In this case, we create a new RDF node for each
+    instance.
+
+    3. Properties that link to external resources: persons (creator),
+    material (linking to a controlled vocabulary), etc.  In this case
+    the mapper unites multiple values that have the same properties.
+    E.g., we create a single creator node for all creators found in
+    a collection that have the same name a date of birth.
+
+    Addition (bibliographical) information is accumulated in the RDF
+    node.
+
+    PROBLEM: sometimes the additional information clarifies the relation
+    of the shared resource to a specific work and sometimes it privides
+    more information about the resource (e.g. place of birth).
+
+For cases (2) and (3) above, each metadata field has zero or more RDF
+nodes that act as value. The principal value is represented by
+rdf:value, while the others use the original property name. E.g., the
+AHM data contains
+
+    ==
+    Record title Title .
+    Record title.type Type .
+    ==
+
+This is translated into
+
+    ==
+    Record title [ a ahm:Title ;
+		   rdf:value "Some title" ;
+		   ahm:titleType Type ;
+		 ] .
+    ==
+
+If the work has multiple titles, each title is represented by a
+separate node.
+
+Because this step may involve using ordering information of the initial
+XML data that is still present in the raw converted RDF graph, this step
+must be performed before the data is saved.
+
+
+---++ Re-establish internal links
+
+This step is generally trivial. Some properties represent links to other
+works in the collection. The property value is typically a literal
+representing a unique identifier to the target object such as the
+collection identifier or a database key. This step replaces the
+predicate-value with an actual link to the target resource.
+
+
+---++ Re-establish external links
+
+This step re-establishes links from external resources such as
+vocabularies which we know to be used during the annotation. In this
+step we only make mapping for which we are _absolutely_ sure. I.e., if
+there is any ambiguity, which is not uncommon, we maintain the value as
+a blank node created in step (1).
+
+
+---++ Create a mapping schema
+
+It is adviced to maintain the original property- and type-names
+(classes) in the RDF because this
+
+    1. Allows to reason about possible subtle differences between the
+    source-specific properties and properties that come from generic
+    schemas such as Dublin Core. E.g., a creator as listed for a work in
+    a museum for architecture is typically an architect and the work in
+    the museum is some form of reproduction on the real physical object.
+    If we had replacd the original creator property by
+    =|dcterms:creator|=, this information is lost.
+
+    2. It makes it much easier to relate the RDF to the original
+    collection data.  One of the advantages of this is that it becomes
+    easier to reuse the result of semantic enrichment in the original
+    data-source.
+
+The toolkit provides a predicate to derive the initial schema from the
+converted data using the predicate make_schema/2:
+
+    * [[make_schema/2]]
+
+After running this predicate, the schema can be downloaded from the
+target graph through the web-interface, or it can be saved using
+rdf_save_turtle/2, as in
+
+    ==
+    ?- make_schema(data, schema).
+    ?- rdf_save_turtle('schema.ttl', [graph(schema)]).
+    ==
+
+
+---++ Assign URIs to blank nodes where applicable.
+
+Any blank node we may wish to link to from the outside world needs to be
+given a real URI.  The record-URIs are typically created from the
+collection-identifier. For other blank nodes, we look for distinguishing
+(short) literals.
+
+
+---+ Enriching the crude RDF
+
+The obtained RDF is generally rather crude. Typical `flaws' are:
+
+    * It contains literals where it should have references to other RDF
+    instances
+
+    * One probably wants proper resources for many of the blank nodes.
+
+    * Some blank nodes provide no semantic organization and must be
+    removed.
+
+    * At other place, intermediate instances must be created (as
+    blank nodes or named instances).
+
+    * In addition to the above, some literal fields need to be
+    rewritten, sometimes to (multiple) new literals and sometimes
+    to a named or bnode instance.
+
+Our rewrite language is a production-rule   system,  where the syntax is
+modelled  after  CHR  (a  committed-choice    language   for  constraint
+programming) and the triple notation is   based  on Turtle/SPARQL. There
+are 3 types of production rules:
+
+    * *|Propagation rules|* add triples
+
+    * *|Simplication rules|* delete triples and add new triples.
+
+    * *|Simpagation rules|* are in between. They match triples, delete
+    triples and add triples,
+
+The overall syntax for the three rule-types is (in the order above):
+
+    ==
+    <name>? @@ <triple>* ==> <guard>? , <triple>*.
+    <name>? @@ <triple>* <=> <guard>? , <triple>*.
+    <name>? @@ <triple>* \ <triple>* <=> <guard>? , <triple>*.
+    ==
+
+Here, <guard> is an arbitrary Prolog term. <triple> is a triple
+in a Turtle-like, but Prolog native, syntax:
+
+    ==
+    { <subject> , <predicate> , <object> }
+    ==
+
+Any of these fields  may  contain  a   variable,  written  as  a  Prolog
+variable: an uppercase letter followed by   zero or more letters, digits
+or the underscore. E.g.,  =Hello=,   =Hello_world=,  =A9=. Resources are
+either fully (single-)quoted Prolog atoms (E.g. 'http://example.com/me',
+or terms of the form <prefix>  :   <local>,  where <prefix> is a defined
+prefix (see rdf_register_ns/2) and <local> is   a possible quoted Prolog
+atom. E.g., =|vra:title|= or =|ulan:'Person'|= (note the quotes to avoid
+interpretation as a variable). Literals can use a more elaborate syntax:
+
+    ==
+    <string> ^^ <type>
+    <string> @ <lang>
+    <string>
+    literal(Atom)
+    ==
+
+Here, <string> is  a  double-quoted  Prolog   string  and  <type>  is  a
+resource. The form literal(Atom) can be  used   to  match the text of an
+otherwise unqualified literal with a variable.  I.e.,
+
+    ==
+    { S, vra:title, literal(A) }
+    ==
+
+has the same meaning as the SPARQL expression =|?S vra:title ?A FILTER
+isLiteral(?A)|=,
+
+Triples in the _condition_ side can  be   postfixed  using '?', in which
+case they are optional matches. If the triple cannot be matched, triples
+on the production-side that use the variable are ignored.
+
+Triples in the _condition_  can  also  be   enclosed  in  a  Prolog list
+([...]), In this case, the triples are   requested  to be in the *order*
+specified. Ordering is not an official part   of  the RDF specs, but the
+SWI-Prolog RDF store maintains the order of  triples in generated in the
+XML conversion process. An ordered set  can   match  multiple times on a
+given subject, where it AB can match both AAABBB and ABABAB.  Both forms
+appear in real-world XML data.
+
+Finally, on the _production_ side, the _object_ can take this form:
+
+    ==
+    bnode([ {<predicate> = <object>}
+	  ],
+	  [ {<option>}
+	  ])
+    ==
+
+This means, `for the object, create a bnode from the given <predicate> =
+<object> pairs'. The <option>s guide the  process. At this moment, there
+is only one option with two values:
+
+    ==
+    share_if(equal)
+    share_if(equal([<predicate>*]))
+    ==
+
+Without any option, each execution of the rule creates a new bnode. With
+the =share_if= option  =equal=,  it  uses   the  same  bnode-id  for all
+productions that produce the same   predicate-object  list (in canonical
+order, after removing duplicates). Using the last form, it considers two
+blank nodes equal if they have the same triples on the given predicates.
+All other predicates are simply added to the blank-node.
+
+
+---++ Renaming resoures (or naming blank-nodes)
+
+The construct =|{X}|= can be used on the  condition and action side of a
+rule. If used, there must be  exactly   one  such construct, one for the
+resource to be deleted  and  one  for   the  resource  to  be added. All
+resources for which the  condition  matches   are  renamed.  Below is an
+example rule. The first triple extracts the identifier. This triple must
+remain in the database. The =|\ {A}=|  binds the (blank node) identifier
+to be renamed. The two Prolog guards verify that the resource is a blank
+node and generate an identifier (URI).  The _action_ (_|{S}|_) gives the
+rule engine the URI that must be given to the matched =|{A}=|.
+
+    ==
+    work_uris @@
+    { A, vra:'idNumber.currentRepository', ID } \ {A} <=>
+	rdf_is_bnode(A),
+	literal_to_id(ID, ahm, S),
+	{S}.
+    ==
+
+
+---++ Putting triples in another graph
+
+Triples created by the _action_ side of a rule are added to the graph
+that is being rewritten. It is also possible to add them to another
+graph using the syntax below:
+
+    ==
+	{ S,P,O } >> Graph
+    ==
+
+E.g., if we want to store the information about person resources that
+we create in a graph named =persons=, we can so so using a rule like
+this:
+
+    ==
+    person @@
+    {S, creator, Name},
+    {S, 'creator.date_of_birth', Born} ?,
+    {S, 'creator.date_of_death', Died} ?,
+    {S, 'creator.role', Role} ?
+	    <=>
+	    Name \== "onbekend",
+	    name_to_id(Name, ahm, Creator),
+	    { S, vra:creator, Creator },
+	    { Creator, rdf:type, ulan:'Person' } >> persons,
+	    { Creator, vp:labelPreferred, Name } >> persons,
+	    { Creator, ulan:birthDate, Born }    >> persons,
+	    { Creator, ulan:deathDate, Died }    >> persons,
+	    { Creator, ulan:role, Role }	 >> persons.
+    ==
+
+
+---++ Utility predicates
+
+The rewriting process is often guided by a _guard_ which is, as already
+mentioned, an arbitrary Prolog goal. Because translation of repositories
+shares a lot of common tasks, we plan to develop a library for these.
+This section documents the available predicates.
+
+    * [[find_in_vocabulary/3]]
+    * [[literal_to_id/3]]
+    * [[name_to_id/3]]
+
+
+---+ Putting it all together (examples)
+
+Below we give some rules that we wrote to convert real data.
+
+---++ Deleting a triple
+
+Sometimes XML contains data that  simply   means  `nothing'.  We want to
+delete this data:
+
+    ==
+    {_, creator, "onbekend" } <=>
+	true.
+    ==
+
+Now, in the data from which this was  extracted, this is a bit too crude
+because some records keep data  about   the  creator even though his/her
+name is not known. Therefore, we preceed the   rule with the rule of the
+next section. Note that the order of   rules  matter: a rule is executed
+before the next one. In this particular   case we could have removed the
+=|{S, creator, "onbekend"}|= triple from the   example  below to make it
+match after the rule above is executed.
+
+
+---++ Preserving info about unknown creators
+
+The example below deals with entries in the database where the `creator'
+is unknown (Dutch: _onbekend_), but some  properties are known about him
+or her. The remainder of  the   condition  matches  possible information
+about this creator using an `optional' match. The _guard_ verifies there
+is at least some information  about   our  unknown creator. The _action_
+part of the rule associates a new blank node as a creator.
+
+    ==
+    creator_onbekend @@
+    {S, creator, "onbekend"},
+    {S, 'creator.date_of_birth', Born} ?,
+    {S, 'creator.date_of_death', Died} ?,
+    {S, 'creator.role', Role} ?
+	    <=>
+	    at_least_one_given([Born, Died, Role]),
+	    { S, vra:creator,
+	      bnode([ ulan:birthDate = Born,
+		      ulan:deathDate = Died,
+		      ulan:role = Role
+		    ])
+	    }.
+
+    at_least_one_given(Values) :-
+	    member(V, Values),
+	    ground(V), !.
+    ==
+
+---++ Negation
+
+Negation is only provided as Prolog negation--by-failure in the guard.
+This implies that we cannot use the =|{...}|= triple notation to test on
+the absence of a triple, but instead we need to use the SWI-Prolog
+RDF-DB primitive rdf/3. For example, to delete all person records that
+have no name, we can use the rule below. The first triple verifies the
+record-type. The second matches all triples on that record and the guard
+verifies that the subject has no triples for the property ahm:name.
+
+    ==
+    delete_no_name @@
+    { S, rdf:type, ahm:'Person' },
+    { S, _, _ }
+	    <=>
+	    \+ rdf(S, ahm:name, _).
+    ==
+
+---+ Running the toolkit
+
+Currently, there is no well-defined workflow for running the tools. The
+files run.pl and rewrite.pl contain a skeleton that I use to convert the
+data from AHM (Amsterdams Historisch Museum). The file run.pl loads
+relevant background data and defines run/0 to call the initial
+converter. The relevant steps of the initial converter are to load VRA
+and mapping.ttl that contains the map:xmlname declarations discussed
+above. Next, we load the XML into crude RDF using the call below. The
+options specify that the input in XML without namespaces (dialect =xml=
+rather than =xmlns=) and that the file contains XML elements named
+=record= as the desired unit of data for conversion.
+
+    ==
+    run(File) :-
+	    load_xml_as_rdf(File,
+			    [ dialect(xml),
+			      unit(record)
+			    ]).
+    ==
+
+The result can be browsed by typing =|?- triple20.|=
+
+The file rewrite.pl scripts the rewrite phase.  It sets up namespaces,
+calls to the rewrite predicates with the proper arguments and finally
+provides the rules.  Here are the toplevel predicates:
+
+    * [[rewrite/0]]
+    * [[rewrite/1]]
+    * [[rewrite/2]]
+    * [[list_rules/0]]
+
+Below is an example run, showing all available rules and running a
+single rule. The example demonstrates that rules are applied until a
+fixed-point is reached (i.e., the RDF database does not change by
+applying the rules).
+
+    ==
+    ?- [rewrite].
+    true.
+
+    ?- list_rules.
+    Defined RDF mapping rules:
+
+	    title_translations
+	    dimension
+	    work_uris
+	    creator_sequence
+	    creator_onbekend
+	    delete_unknown_creator
+	    delete_empty_literal
+	    creator
+	    material_aat
+	    related_object
+
+    true.
+
+    ?- rewrite(delete_empty_literal).
+    % Applying ... delete_empty_literal (1)
+    % 0.100 seconds; 23,456 changes; 2,008,860 --> 1,985,404 triples
+    % Step 1: generation 2,020,746 --> 2,044,202
+    % Applying ... delete_empty_literal (1)
+    % 0.000 seconds; no change
+    % Step 2: generation 2,044,202 --> 2,044,202
+    true.
+    ==
diff --git a/lib/xmlrdf/rdf_convert_util.pl b/lib/xmlrdf/rdf_convert_util.pl
new file mode 100644
index 0000000..821b5cc
--- /dev/null
+++ b/lib/xmlrdf/rdf_convert_util.pl
@@ -0,0 +1,130 @@
+:- module(rdf_convert_util,
+	  [ rdf_literal/1,		% @Term
+	    literal_to_id/3,		% +Literal, +NameSpace, -Id
+	    name_to_id/3,		% +Literal, +NameSpace, -Id
+	    edm_identifier/4		% +URI, +Orig, -New, NewURI
+	  ]).
+:- use_module(library(semweb/rdf_db)).
+:- use_module(library(apply)).
+
+
+%%	rdf_literal(Term) is semidet.
+%
+%	True if Term is an RDF literal
+
+rdf_literal(Term) :-
+	compound(Term),
+	Term = literal(_).
+
+%%	name_to_id(+Literal, +NS, -ID)
+%
+%	Similar to literal_to_id/3, but  intended   to  deal with person
+%	names.
+%
+%	@tbd	Now simply the same as literal_to_id/3
+
+name_to_id(Literal, NS, ID) :-
+	literal_to_id(Literal, NS, ID).
+
+%%	literal_to_id(+LiteralOrList, +NS, -ID) is det.
+%
+%	Generate an identifier from a literal  by mapping all characters
+%	that  are  not  allowed  in   a    (Turtle)   identifier  to  _.
+%	LiteralOrList can be a list. In this  case we generate an id for
+%	each element in LiteralOrList and append  these. A typical usage
+%	scenario is to add a type:
+%
+%	    ==
+%	    literal_to_id(['book-', Literal], NS, ID)
+%	    ==
+%
+%	Another is to add the label of the parent:
+%
+%	    ==
+%	    literal_to_id([ParentLit, '-', Literal], NS, ID)
+%	    ==
+%
+%	@tbd	Verify that the generated URI is unique!
+%	@tbd	Remove diacritics for non-iso-latin-1 text
+
+literal_to_id(Literals, NS, URI) :-
+	is_list(Literals), !,
+	maplist(literal_to_id, Literals, IDs),
+	atomic_list_concat(IDs, ID),
+	rdf_current_ns(NS, Prefix),
+	atom_concat(Prefix, ID, URI).
+literal_to_id(Literal, NS, URI) :-
+	literal_to_id(Literal, ID),
+	rdf_current_ns(NS, Prefix),
+	atom_concat(Prefix, ID, URI).
+
+literal_to_id(Literal, ID) :-
+	text_of_literal(Literal, Text),
+	text_to_id(Text, ID).
+
+text_of_literal(Var, _) :-
+	var(Var), !,
+	instantiation_error(Var).
+text_of_literal(literal(Lit), Text) :- !,
+	text_of_literal(Lit, Text).
+text_of_literal(type(_, Text), Text).
+text_of_literal(lang(_, Text), Text).
+text_of_literal(Text, Text) :-
+	atomic(Text).
+
+text_to_id(Text, Id) :-
+	unaccent_atom(Text, T1),
+	atom_codes(T1, Codes),
+	maplist(map_non_id_char, Codes, Codes1),
+	normalize_underscores(Codes1, Codes2),
+	atom_codes(Id, Codes2).
+
+map_non_id_char(0'_, 0'_) :- !.
+map_non_id_char(0'-, 0'-) :- !.
+map_non_id_char(C, C) :-
+	code_type(C, csym), !.
+map_non_id_char(_, 0'_).
+
+normalize_underscores([0'_|T0], T) :- !,
+	normalize_underscores(T0, T).
+normalize_underscores([], [0'_]) :- !.
+normalize_underscores(In, Out) :-
+	normalize_underscores_2(In, Out).
+
+normalize_underscores_2([], []).
+normalize_underscores_2([0'_|T0], Can) :- !,
+	normalize_underscores(T0, T),
+	(   T == [0'_]
+	->  Can = []
+	;   Can = [0'_|T]
+	).
+normalize_underscores_2([H|T0], [H|T]) :-
+	normalize_underscores_2(T0, T).
+
+
+%%	edm_identifier(URI, +Orig, +New, -NewURI)
+%
+%	Translate betweem the various EDM identifiers.  E.g.:
+%
+%	==
+%		edm_identifier(Proxy, proxy, aggregate, Aggregate)
+%	==
+%
+%	@error	domain_error(edm_uri, URI) if the URI doesn't contain
+%		=|/<orig>-|= or contains it multiple times.
+
+edm_identifier(URI, Orig, New, NewURI) :-
+	subst_pattern(Orig, OP),
+	(   sub_atom(URI, B, _, A, OP),
+	    sub_atom(URI, _, A, 0, End),
+	    sub_atom(URI, 0, B, _, Start),
+	    \+ sub_atom(End, _, _, _, OP),
+	    \+ sub_atom(Start, _, _, _, OP)
+	->  subst_pattern(New, NP),
+	    atomic_list_concat([Start, NP, End], NewURI)
+	;   domain_error(edm_uri, URI)
+	).
+
+subst_pattern(Text, Pattern) :-
+	atomic_list_concat([/, Text, -], Pattern).
+
diff --git a/lib/xmlrdf/rdf_name_bnodes.pl b/lib/xmlrdf/rdf_name_bnodes.pl
new file mode 100644
index 0000000..527f3ce
--- /dev/null
+++ b/lib/xmlrdf/rdf_name_bnodes.pl
@@ -0,0 +1,199 @@
+/*  File:    rdf_name_bnodes.pl
+    Author:  Jan Wielemaker
+    Created: Jan 14 2010
+    Purpose: Create URIs to blank nodes
+*/
+
+:- module(rdf_name_bnodes,
+	  [ name_bnodes/3,		% +Set, -Names, +Graph
+	    name_instances/4		% +Class, -P, -Pairs, +Graph
+	  ]).
+:- use_module(library(semweb/rdf_db)).
+:- use_module(library(pairs)).
+:- use_module(library(apply)).
+:- use_module(library(debug)).
+
+/** <module> Establish a URI for a set of RDF blank-nodes
+
+This library can propose and implement a naming schema for a set of RDF
+blank-nodes.
+
+Naming schemas:
+
+    * Create from a unique property
+    In this case, give preference to word-like properties over
+    database keys.  Make the property-literal turtle friendly.
+    We call this key <ID>.  Use <base><ID> as identifier.
+
+    * Create from a semi-unique property + property of parent.
+    If there is a property that is nearly unique and the nodes
+    are organised in a hierarchy, use the label of the parent
+    (recursively) to make the names unique.  Use
+    <base><parent-ID>-<ID>
+
+    * If the set can be split into multiple classes, each of
+    which can have unique names attached based on one of the
+    other schemas, use <base><class-ID>-<ID>
+
+    * If the blank-nodes are used exactly once as a property
+    of another resource, use <other>-<property-ID>.  If the property
+    appears multiple times, try <other>-<property-ID>-<ID> or,
+    if all fails, <other>-<property-ID>-<N>
+
+Steps:
+
+    * Find shared (literal) properties
+    * Split-by-class
+
+@see literal_to_id/3 for generating Turtle-friendly idenfiers.
+*/
+
+:- rdf_meta
+	name_instances(r, r, -, r).
+
+%%	name_instances(+Class, ?P, -Pairs, +Graph)
+
+name_instances(Class, P, Pairs, Graph) :-
+	findall(R, rdf(R, rdf:type, Class), Rs),
+	sort(Rs, Set),
+	name_bnodes(Set, P, Names, Graph),
+	pairs_keys_values(Pairs, Set, Names).
+
+
+%%	name_resources(+Resources, -Names, ?Graph)
+
+name_bnodes(Set, Names, Graph) :-
+	name_bnodes(Set, _, Names, Graph).
+
+name_bnodes(Set, P, Names, Graph) :-
+	length(Set, RCount),
+	shared_property(Set, P, Graph),
+	debug(name_bnodes, 'Trying property ~q', [P]),
+	maplist(local_name(Graph, P), Set, Names),
+	sort(Names, Sorted),
+	length(Sorted, NCount),
+	(   RCount == NCount
+	->  true
+	;   NU is RCount - NCount,
+	    debug(name_bnodes, '~D of ~D non-unique', [NU, RCount]),
+	    fail
+	).
+
+%%	local_name(+Graph, +P, +R, -Name) is nondet.
+%
+%	Propose a local name for R based on P.
+%
+%	@tbd	Add 'n' if the results starts with a digit
+
+local_name(Graph, P, R, Name) :-
+	findall(T, property_text(R, P, T, Graph), Ts),
+	Ts \== [], !,
+	maplist(text_to_id, Ts, IDL),
+	sort(IDL, SIDL),
+	atomic_list_concat(SIDL, -, Name).
+
+
+%%	property_text(+R, +P, -Text, ?Graph) is nondet.
+%
+%	Fetch a textual value for the property P.
+
+property_text(R, P, Text, Graph) :-
+	rdf(R, P, Value, Graph),
+	text_of(Value, Text).
+
+text_of(literal(X), Text) :- !,
+	text_of_literal(X, Text).
+text_of(R, Text) :-
+	rdf_is_bnode(R),
+	rdf_has(R, rdf:value, V),
+	text_of(V, Text).
+
+text_of_literal(Text, Text) :-
+	atom(Text), !.
+text_of_literal(lang(_, Text), Text).
+text_of_literal(type(_, Text), Text).
+
+%%	shared_property(+Set, -P, +Graph) is nondet.
+%
+%	True if P is a property that appears on all instances of Set.
+%
+%	@tbd	Should we also allow for super-properties?
+
+shared_property(Set, P, Graph) :-
+	map_list_to_pairs(property_count(Graph), Set, Keyed),
+	keysort(Keyed, KeySorted),
+	pairs_values(KeySorted, [H|T]),
+	property_of(P, Graph, H),
+	(   maplist(property_of(P, Graph), T)
+	->  true
+	).
+
+%%	property_count(+Graph, +R, -Count) is det.
+%
+%	Count is the number of distinct properties on the resource R.
+
+property_count(Graph, R, Count) :-
+	findall(P, rdf(R, P, _, Graph), Ps),
+	sort(Ps, Set),
+	length(Set, Count).
+
+%%	property_of(?P, +Graph, +Resource) is nondet.
+%
+%	True if P is a property on Resource in Graph.
+
+property_of(P, Graph, R) :-
+	atom(P), !,
+	(   rdf(R, P, _, Graph)
+	->  true
+	).
+property_of(P, Graph, R) :-
+	findall(P, rdf(R, P, _, Graph), Ps),
+	sort(Ps, Set),
+	member(P, Set).
+
+
+		 /*******************************
+		 *		UTIL		*
+		 *******************************/
+
+:- dynamic
+	text_id_cache/2.
+
+text_to_id(Text, Id) :-
+	(   text_id_cache(Text, Id0)
+	->  Id = Id0
+	;   text_to_id_raw(Text, Id0)
+	->  assertz(text_id_cache(Text, Id0)),
+	    Id = Id0
+	;   debug(name_bnodes, 'No id from ~q', [Text]),
+	    fail
+	).
+
+text_to_id_raw(Text, Id) :-
+	unaccent_atom(Text, T1),
+	atom_codes(T1, Codes),
+	maplist(map_non_id_char, Codes, Codes1),
+	normalize_underscores(Codes1, Codes2),
+	atom_codes(Id, Codes2).
+
+map_non_id_char(0'_, 0'_) :- !.
+map_non_id_char(0'-, 0'-) :- !.
+map_non_id_char(C, C) :-
+	code_type(C, csym), !.
+map_non_id_char(_, 0'_).
+
+normalize_underscores([0'_|T0], T) :- !,
+	normalize_underscores(T0, T).
+normalize_underscores([], [0'_]) :- !.
+normalize_underscores(In, Out) :-
+	normalize_underscores_2(In, Out).
+
+normalize_underscores_2([], []).
+normalize_underscores_2([0'_|T0], Can) :- !,
+	normalize_underscores(T0, T),
+	(   T == [0'_]
+	->  Can = []
+	;   Can = [0'_|T]
+	).
+normalize_underscores_2([H|T0], [H|T]) :-
+	normalize_underscores_2(T0, T).
diff --git a/lib/xmlrdf/rdf_rename.pl b/lib/xmlrdf/rdf_rename.pl
new file mode 100644
index 0000000..537a1a3
--- /dev/null
+++ b/lib/xmlrdf/rdf_rename.pl
@@ -0,0 +1,22 @@
+:- module(rdf_rename,
+	  [ rdf_rename/3		% +Old, -New, ?Graph
+	  ]).
+:- use_module(library(semweb/rdf_db)).
+
+%%	rdf_rename(+OldResource, +NewResource, ?Graph) is det.
+%
+%	Rename  a  resource,  changing  all   references  in  all  three
+%	positions of the triple. If Graph  is given, renaming is limited
+%	to triples that are associated to the matching graph.
+
+rdf_rename(Old, Old, _) :- !.
+rdf_rename(Old, New, Graph) :-
+	rdf_transaction(rename(Old, New, Graph), rename(Old, New)).
+
+rename(Old, New, G) :-
+	forall(rdf(Old, P, O, G),
+	       rdf_update(Old, P, O, G, subject(New))),
+	forall(rdf(S, Old, O, G),
+	       rdf_update(S, Old, O, G, predicate(New))),
+	forall(rdf(S, P, Old, G),
+	       rdf_update(S, P, Old, G, object(New))).
diff --git a/lib/xmlrdf/rdf_rewrite.pl b/lib/xmlrdf/rdf_rewrite.pl
new file mode 100644
index 0000000..ca236d1
--- /dev/null
+++ b/lib/xmlrdf/rdf_rewrite.pl
@@ -0,0 +1,926 @@
+:- module(rdf_rewrite,
+	  [ op(1200, xfx, (@@)),	% Name @@ Rule
+	    op(1180, xfx, ==>),		% Head ==> Body
+	    op(1180, xfx, <=>),		% Head <=> Body
+	    op(1100, xfx, \),		% Head \ Del <=> Body
+	    op(200,  fx, ^),		% ^Predicate
+	    op(700, xfx, ^^),		% Text^^Type
+	    op(700, xfx, @),		% Text@Lang
+	    op(200, xf, ?),		% Triple ?
+	    op(200, xfx, >>),		% Triple >> Graph
+					% Toplevel converter
+	    rdf_rewrite_rules/0,
+	    rdf_list_rule/1,		% +Name
+	    rdf_rewrite/2,		% +Graph, Rule
+	    rdf_rewrite/1,		% +Graph
+					% Runtime support
+	    subject_triple_sequence/3,	% +Pattern, -Data, +Graph
+	    rdf_assert_new/4,		% +S,+P,+O,+Graph
+	    rdf_retract_if_ground/4,	% +S,+P,+O,+Graph
+	    rdf_assert_if_ground/4,	% +S,+P,+O,+Graph
+	    rdf_set_lang/3,		% +LitIn, +Lang, -LitOut
+	    rdf_set_type/3,		% +LitIn, +Type, -LitOut
+					% Re-exporting
+	    rdf_rename/3		% +Old,+New,?Graph
+	  ]).
+:- use_module(library(semweb/rdf_db)).
+:- use_module(library(semweb/rdfs)).
+:- use_module(library(error)).
+:- use_module(library(apply)).
+:- use_module(library(lists)).
+:- use_module(library(debug)).
+:- use_module(library(uri)).
+:- use_module(library(option)).
+:- use_module(library(pairs)).
+:- use_module(rdf_rename).
+
+
+/** <module> A generic RDF rewrite engine
+
+Triple notation:
+
+	{Subject, Predicate, Object}
+
+Object is one of:
+
+	* URI, written as ns:local or 'full URI'
+	* "literal"
+	* "literal"^^URI
+	* "literal"@lang
+	* Variable
+
+Rename URIs:  {X} is a shorthand for {X,P,O},{S,X,O},{S,P,X}
+
+{S, ^vra:idNumber, ID} \ {S} <=>
+	a vra:'Work',
+	make_identifier(ID, URI),
+	{URI}.
+
+Map into
+
+rdf_mapping_rule(Id, Name, Graph, Actions, Options) :-
+	Code
+
+Where Actions is a conjunction of the following statements:
+
+    * rdf_retractall(S,P,O,Graph)
+    * rdf_retract_if_ground(S,P,O,Graph)
+    * rdf_assert_new(S,P,O,Graph)
+    * rdf_assert_if_ground(S,P,O,Graph)
+*/
+
+		 /*******************************
+		 *	   THE REWRITER		*
+		 *******************************/
+
+:- meta_predicate
+	rdf_rewrite(:),
+	rdf_rewrite(:, +).
+
+rdf_rewrite(Graph) :-
+	rdf_rewrite(Graph, _).
+
+rdf_rewrite(Module:Graph, Rule) :-
+	rdf_generation(G0),
+	rewrite_step(Module, Graph, Rule),
+	rdf_generation(G1),
+	debug(rdf_rewrite, 'Rewrite: generation ~D --> ~D',
+	      [G0,G1]),
+	G0 \== G1, !.
+
+rewrite_step(Module, Graph, Rule) :-
+	(   mapping_rules(Module, Rules),
+	    \+ \+ member(Rule-_, Rules)
+	->  true
+	;   existence_error(rule, Rule)
+	),
+	(   member(Rule-Pairs, Rules),
+	    nth1(I, Pairs, Id-Options),
+	    debug(rdf_rewrite, 'Applying ... ~q (~d)', [Rule, I]),
+	    rdf_generation(G0),
+	    rdf_statistics(triples(TC0)),
+	    statistics(cputime, T0),
+	    (	option(transaction(true), Options, true)
+	    ->	rdf_transaction(call_rewrite_rule(Id, Module, Graph, Options),
+				Rule)
+	    ;	call_rewrite_rule(Id, Module, Graph, Options)
+	    ),
+	    statistics(cputime, T1),
+	    rdf_statistics(triples(TC1)),
+	    rdf_generation(G1),
+	    T is T1 - T0,
+	    GDiff is G1-G0,
+	    (	GDiff == 0
+	    ->	debug(rdf_rewrite, '~3f seconds; no change', [T])
+	    ;   debug(rdf_rewrite, '~3f seconds; ~D changes; ~D --> ~D triples',
+		      [T, GDiff, TC0, TC1])
+	    ),
+	    fail
+	;   true
+	).
+
+call_rewrite_rule(Rule, Module, Graph, Options) :-
+	bnode_terms(Options, BNodes, BNodeOptions, _RestOptions),
+	BNodes \== [], !,
+	Template =.. [v,Actions|BNodes],
+	findall(Template,
+		Module:rdf_mapping_rule(Rule, _Name, Graph, Actions, Options),
+		Bag),
+	create_bnodes(BNodeOptions, 2, Bag, Graph),
+	call_actions(Bag).
+call_rewrite_rule(Rule, Module, Graph, Options) :-
+	findall(Actions,
+		Module:rdf_mapping_rule(Rule, _Name, Graph, Actions, Options),
+		Goals),
+	maplist(call, Goals).
+
+%%	rdf_rewrite_rules
+%
+%	List available rules
+
+rdf_rewrite_rules :-
+	format('Defined RDF mapping rules:~n~n', []),
+	(   mapping_rules(_, Rules),
+	    forall(append(Seen, [Rule-Ids|_], Rules),
+		   list_rule(Rule, Ids, Seen)),
+	    fail
+	;   true
+	),
+	format('~n', []).
+
+list_rule(Rule, [_Id], Seen) :-
+	memberchk(Rule-_, Seen), !,
+	format('\t~q ~t~40|(DISCONTIGUOUS)~n', [Rule]).
+list_rule(Rule, [_Id], _) :- !,
+	format('\t~q~n', [Rule]).
+list_rule(Rule, Ids, Seen) :-
+	memberchk(Rule-_, Seen), !,
+	length(Ids, Len),
+	format('\t~q ~t~40|(~d rules, DISCONTIGUOUS)~n', [Rule, Len]).
+list_rule(Rule, Ids, _) :-
+	length(Ids, Len),
+	format('\t~q ~t~40|(~d rules)~n', [Rule, Len]).
+
+
+%%	mapping_rules(?Module, -Rules) is nondet.
+%
+%	@param Rules is a list Name-IdOptionPairs
+
+mapping_rules(Module, Rules) :-
+	current_module(Module),
+	current_predicate(Module:rdf_mapping_rule/5),
+	findall(Name-(Id-Options),
+		clause(Module:rdf_mapping_rule(Id, Name, _, _, Options), _),
+		Pairs),
+	group_pairs_by_key(Pairs, Rules).
+
+
+%%	rdf_list_rule(+Name) is det.
+%
+%	Produce a listing of the generated Prolog for the named rule.
+
+:- meta_predicate
+	rdf_list_rule(:).
+
+rdf_list_rule(M:Name) :-
+	(   (   M == user
+	    ;	M == rdf_rewrite
+	    )
+	->  true
+	;   Module = M
+	),
+	(   current_module(Module),
+	    current_predicate(Module:rdf_mapping_rule/5),
+	    Head = Module:rdf_mapping_rule(_, Name, _, _, _),
+	    forall(clause(Head, Body),
+		   portray_clause((Head :- Body))),
+	    fail
+	;   true
+	).
+
+
+		 /*******************************
+		 *	    BNODE MAGIC		*
+		 *******************************/
+
+%%	bnode_terms(+RuleOptions, -BNodeTemplates, -BNodeOptions, -RestOpts)
+%
+%	Split the option-list
+
+bnode_terms([], [], [], []).
+bnode_terms([bnode(BN, Props, Options)|T0],
+	    [bnode(BN, Props)|BNT],
+	    [Options|OT],
+	    Rest) :-
+	bnode_terms(T0, BNT, OT, Rest).
+bnode_terms([H|T0], BN, O, [H|T]) :-
+	bnode_terms(T0, BN, O, T).
+
+
+%%	create_bnodes(+BNOptions, +Index, +Bag, +Graph)
+%
+
+create_bnodes([], _, _, _).
+create_bnodes([BNOptions|OT], I, Bag, Graph) :-
+	create_bnodes_arg(Bag, I, BNOptions, Graph),
+	I2 is I + 1,
+	create_bnodes(OT, I2, Bag, Graph).
+
+%%	create_bnodes(+BNTerms, +Options, +Graph)
+%
+%	Share blank nodes.
+%
+%	@param BNTerms is a list bnode(Id, Properties)
+%	@param Options describes the sharing.  Currently supports
+%
+%		* equal
+%		All properties must be equal
+%		* equal(ListOfProperties)
+%		Only the indicated properties must be equal
+
+create_bnodes_arg(BNTerms, I, Options, Graph) :-
+	option(share_if(Share), Options, equal),
+	key_bnodes(BNTerms, Share, I, KTerms),
+	group_pairs_by_key(KTerms, Grouped),
+	maplist(make_bnode(Graph), Grouped).
+
+make_bnode(Graph, _Key-[bnode(BN, P0)|BNodes]) :-
+	merge_bnodes(BNodes, BN, P0, Properties),
+	rdf_bnode(BN),
+	forall(member(P=O, Properties),
+	       rdf_assert_if_ground(BN, P, O, Graph)).
+
+merge_bnodes([], _, PL, PL).
+merge_bnodes([bnode(BN, P2)|T], BN, PL0, PL) :-
+	union(PL0, P2, PL1),
+	merge_bnodes(T, BN, PL1, PL).
+
+
+key_bnodes([], _, _, []).
+key_bnodes([Templ|T0], Share, I, [Keyed|T]) :-
+	key_bnode(Share, I, Templ, Keyed),
+	key_bnodes(T0, Share, I, T).
+
+
+key_bnode(Equal, I, Template, Key-BNode) :- !,
+	arg(I, Template, BNode),
+	arg(2, BNode, Properties),
+	(   Equal == equal
+	->  sort(Properties, Key)
+	;   Equal = equal(L)
+	->  maplist(pvalues(Properties), L, Key)
+	;   domain_error(share_if, Equal)
+	).
+
+pvalues([], _, []).
+pvalues([P=V|T0], P, [V|T]) :-
+	ground(V), !,
+	pvalues(T0, P, T).
+pvalues([_|T0], P, T) :-
+	pvalues(T0, P, T).
+
+
+%%	call_actions(+Templates)
+%
+%	Call the actions associated with  each template-instantiation of
+%	the findall.
+
+call_actions([]).
+call_actions([Template|T]) :-
+	arg(1, Template, Actions),
+	Actions,
+	call_actions(T).
+
+
+		 /*******************************
+		 *	  TERM-EXPANSION		*
+		 *******************************/
+
+%%	expand_rule(+Rule, -Clause) is det.
+%
+%	Expand the rule-language into  proper   Prolog  rules. Rules are
+%	clauses for rdf_mapping_rule/4.
+
+expand_rule(Name@@Rule, Clause) :-
+	rule_id(Id),
+	expand_rule(Rule, Name, Id, Clause).
+expand_rule(Rule, Clause) :-
+	rule_term(Rule), !,
+	(   rule_id(Id),
+	    expand_rule(Rule, Id, Id, Clause)
+	->  true
+	;   print_message(warning, illegal_rdf_rule)
+	).
+expand_rule(Term0, Term) :-
+	expand_rdf(Term0, Term),
+	Term0 \== Term.
+
+
+expand_rule((Keep \ Delete <=> Body), Name, Id,
+	    (rdf_mapping_rule(Id, Name, Graph, Actions, Options) :-
+	    	Rule)) :- !,
+	expand_body(Body, Guard, Add, Options0),
+	actions(Graph, Delete, Add, Actions),
+	(   Actions = rdf_rename(_,_,_),
+	    Options0 == []
+	->  Options = [transaction(false)]
+	;   Options = Options0
+	),
+	rule_body(Graph, Keep, Delete, Guard, Rule).
+expand_rule((Delete <=> Body), Name, Id,
+	    (rdf_mapping_rule(Id, Name, Graph, Actions, Options) :-
+	    	Rule)) :- !,
+	expand_body(Body, Guard, Add, Options),
+	actions(Graph, Delete, Add, Actions),
+	rule_body(Graph, true, Delete, Guard, Rule).
+expand_rule((Keep ==> Body), Name, Id,
+	    (rdf_mapping_rule(Id, Name, Graph, Actions, Options) :-
+	    	Rule)) :- !,
+	expand_body(Body, Guard, Add, Options),
+	actions(Graph, true, Add, Actions),
+	rule_body(Graph, Keep, true, Guard, Rule).
+
+rule_term(_<=>_).
+rule_term(_==>_).
+
+%%	rule_id(-Id)
+%
+%	Give  an  identifier  to  the  rule.    Currently   we  use  the
+%	source-location. We probably need some way to name the rule, but
+%	a good syntax is hard. (Name  @  Rule)   as  used  by CHR is not
+%	possible because @ is already used for language-tagged literals.
+
+rule_id(Id) :-
+	source_location(File, Line),
+	uri_file_name(URI, File),
+	atomic_list_concat([URI, #, Line], Id).
+
+%%	expand_body(+Body, -Guard, -Add, -Options) is det.
+%
+%	Split the body into two goal-lists; one describing the
+%	guard and one adding data.
+
+expand_body(Body, Guard, Add, Options) :-
+	comma_list(Body, Members),
+	partition(is_triple, Members, Add0, Guard0),
+	phrase(expand_add(Add0, MoreGuard, Options), Add),
+	append(Guard0, MoreGuard, Guard1),
+	expand_rdf(Guard1, Guard).
+
+is_triple(V) :-
+	var(V), !, fail.
+is_triple({}(_)).
+is_triple({}(_)>>_).
+
+%%	expand_add(+In, -Goal, -Options)//
+%
+%	Expand object-lists in triples to generate a blank-node
+
+expand_add([], [], []) --> [].
+expand_add([Triple|T0], [rdf_bnode(BN)|Goals], Options) -->
+	{ triple(Triple, S,P,O,G),
+	  nonvar(O),
+	  O = bnode(Properties), !
+	},
+	g_triple(G,S,P,BN),
+	bnode_triples(Properties, BN),
+	expand_add(T0, Goals, Options).
+expand_add([Triple|T0], Goals,
+	   [bnode(BN, Properties, BNOptions)|Options]) -->
+	{ triple(Triple, S,P,O,G),
+	  nonvar(O),
+	  O = bnode(Properties0, BNOptions0), !,
+	  expand_rdf(Properties0, Properties),
+	  expand_rdf(BNOptions0, BNOptions)
+	},
+	g_triple(G,S,P,BN),
+	expand_add(T0, Goals, Options).
+expand_add([Triple|T0], [Fix|Goals], Options) -->
+	{ triple(Triple, S,P,O,G),
+	  nonvar(O),
+	  (   O = (Var@Lan)
+	  ->  Fix = rdf_set_lang(Var,Lan,O2)
+	  ;   O = (Var@Type)
+	  ->  Fix = rdf_set_type(Var,Type,O2)
+	  )
+	}, !,
+	g_triple(G,S,P,O2),
+	expand_add(T0, Goals, Options).
+expand_add([X|T0], Goals, Options) -->
+	[X],
+	expand_add(T0, Goals, Options).
+
+g_triple(-, S,P,O) --> !,
+	[ {S,P,O} ].
+g_triple(G, S,P,O) -->
+	[ {S,P,O} >> G ].
+
+bnode_triples([], _) --> [].
+bnode_triples([P=O|T], S) -->
+	[ {S,P,O} ],
+	bnode_triples(T, S).
+
+
+%%	actions(+Graph, +Delete, +AddList, -Actions) is det.
+%
+%	Create an action-goal from the list of   RDF  objects to add and
+%	delete.
+
+actions(Graph, Delete, AddList, Actions) :-
+	comma_list(Delete, DelList0),
+	flatten(DelList0, DelList),	% Deal with sequences
+	join_actions(Graph, DelList, AddList, ActionList),
+	comma_list(Actions, ActionList).
+
+join_actions(Graph, DelList, AddList, Actions) :-
+	select(Del, DelList, RDel),
+	single_resource(Del, R0),
+	select(Add, AddList, RAdd),
+	single_resource(Add, R1), !,
+	no_more_single_updates(RDel),
+	no_more_single_updates(RAdd),
+	Actions = [rdf_rename(R0, R1, Graph)|RActions],
+	join_actions(Graph, RDel, RAdd, RActions).
+join_actions(Graph, DelList, AddList, Actions) :-
+	delete_actions(DelList, Graph, CondVars, Actions, AddActions),
+	add_actions(AddList, Graph, CondVars, AddActions, []).
+
+single_resource({R}, R) :-
+	var(R), !.
+single_resource({R}, R) :-
+	nonvar(R),
+	R \= (_,_).
+
+no_more_single_updates(List) :-
+	member(X, List),
+	single_resource(X, _), !,
+	representation_error(multiple_single_resources).
+no_more_single_updates(_).
+
+
+delete_actions([], _, [], L, L).
+delete_actions([X|_], _, _, _, _) :-
+	var(X), !,
+	instantiation_error(X).
+delete_actions([Triple?|T0], Graph,
+	       CondVars,
+	       [rdf_retract_if_ground(S,P,O,Graph)|T], L) :- !,
+	expanded_triple(Triple, S,P,O),
+	term_variables(Triple, CondVars, CVTail),
+	delete_actions(T0, Graph, CVTail, T, L).
+delete_actions([Triple|T0], Graph, CondVars,
+	       [rdf_retractall(S,P,O,Graph)|T], L) :-
+	expanded_triple(Triple, S,P,O),
+	delete_actions(T0, Graph, CondVars, T, L).
+
+
+%%	add_actions(+Triples, +Graph, +CondVars, +Actions, ?ActionTail)
+%
+%	@tbd	conditional-variable computation in ==> rules is missing,
+%		which is why disabled this and always use
+%		rdf_assert_if_ground/4 for now.  This works fine, but
+%		makes it harder to track bugs in rules.
+
+add_actions([], _, _, L, L).
+add_actions([Triple>>Graph|T0], Graph0, CV, Actions, L) :- !,
+	add_actions([Triple], Graph, CV, Actions, Tail),
+	add_actions(T0, Graph0, CV, Tail, L).
+add_actions([Triple|T0], Graph, CV, [Action|T], L) :-
+	Action = rdf_assert_if_ground(S,P,O, Graph),
+	expanded_triple(Triple, S,P,O),
+	add_actions(T0, Graph, CV, T, L).
+
+
+%%	rule_body(+Graph, +Keep, +Delete, +Guard:list, -Rule)
+%
+%	Construct the actual body for our mapping rule.
+%
+%	@tbd:	Use the RDF query optimizer to finish the job.
+
+rule_body(Graph, Keep, Delete, GuardList, Rule) :-
+	comma_list(Guard, GuardList),
+	make_goal((Keep, Delete, Guard), Graph, Rule0),
+	expand_goal(Rule0, Rule).
+
+make_goal(G, _, G) :-
+	var(G), !.
+make_goal((A0,B0), Graph, (A,B)) :- !,
+	make_goal(A0, Graph, A),
+	make_goal(B0, Graph, B).
+make_goal((A0;B0), Graph, (A;B)) :- !,
+	make_goal(A0, Graph, A),
+	make_goal(B0, Graph, B).
+make_goal((A0->B0), Graph, (A->B)) :- !,
+	make_goal(A0, Graph, A),
+	make_goal(B0, Graph, B).
+make_goal(List, Graph, Goal) :-
+	is_list(List), !,
+	(   same_subject_triples(List, Subject, Pairs)
+	->  Goal = subject_triple_sequence(Subject, Pairs, Graph)
+	;   type_error(same_subject_triples, List)
+	).
+make_goal(X, Graph, Goal) :-
+	expanded_triple(X, S,P,O), !,
+	make_rdf_goal(S,P,O, Graph, Goal).
+make_goal(T, _, true) :-
+	single_resource(T, _), !.
+make_goal(X?, Graph, (G*->true;true)) :- !,
+	make_goal(X, Graph, G).
+make_goal(G, _, G).
+
+make_rdf_goal(S,SP,O, _, Goal) :-
+	nonvar(SP),
+	SP = ^P, !,
+	Goal = rdf_has(S, P, O).
+make_rdf_goal(S,P,O, _, Goal) :-
+	Goal = rdf(S, P, O).
+
+
+%%	same_subject_triples(+List, -Subject, -PredObjPairs) is semidet.
+%
+%	Matches [{S,P,O}, {S,P2,O2}, ...]
+
+same_subject_triples([H|T0], S, [P-O|T]) :-
+	expanded_triple(H, S,P,O),
+	same_subject_triples_2(T0, S, T).
+
+same_subject_triples_2([], _, []).
+same_subject_triples_2([H|T0], S, [P-O|T]) :-
+	expanded_triple(H, S1,P,O),
+	S1 == S,
+	same_subject_triples_2(T0, S, T).
+
+
+%%	expanded_triple(+Term, -S,-P,-O) is semidet.
+%
+%	As triple/4, expanding the 3 arguments.
+
+expanded_triple(Triple, S,P,O) :-
+	triple(Triple, S0, P0, O0),
+	expand_resource(S0, S),
+	expand_predicate(P0, P),
+	expand_object(O0, O).
+
+expand_predicate(P, P) :-
+	var(P), !.
+expand_predicate(^P0, ^P) :-
+	expand_resource(P0, P).
+expand_predicate(P0, P) :-
+	expand_resource(P0, P).
+
+
+%%	triple(+Term, -S,-P,-O) is semidet.
+%
+%	True if Term is of the form   {S,P,O}. Note that all {...} terms
+%	are mapped to the canonical Prolog term   {}(Arg), so we must be
+%	careful  when  matching.  In   particular,    {_}   =  {_,_,_}!.
+%	Alternative,  we  could  use  subsumbes/2  to  do  the  checking
+%	properly.
+
+triple({}(X), S,P,O) :-
+	nonvar(X),
+	X = (S,X2),
+	nonvar(X2),
+	X2 = (P,O).
+
+triple(T>>G, S,P,O,G) :- !,
+	triple(T,S,P,O).
+triple(T, S,P,O,-) :- !,
+	triple(T,S,P,O).
+
+
+%%	expand_resource(+In, -Out) is det.
+
+expand_resource(X, X) :-
+	var(X), !.
+expand_resource(X, X) :-
+	atom(X), !.
+expand_resource(NS:Local, Global) :-
+	must_be(atom, NS),
+	must_be(atom, Local),
+	(   rdf_current_ns(NS, Full)
+	->  atom_concat(Full, Local, Global)
+	;   existence_error(namespace, NS)
+	).
+
+expand_object(O, O) :-
+	var(O), !.
+expand_object(literal(X), literal(X)) :- !.
+expand_object("", literal('')) :- !.
+expand_object(O, O) :-
+	atom(O), !.
+expand_object(NS:Local, O) :- !,
+	expand_resource(NS:Local, O).
+expand_object(String^^R, literal(type(Type, Text))) :- !,
+	to_literal_text(String, Text),
+	expand_resource(R, Type).
+expand_object(String@Lang, literal(lang(Lang, Text))) :- !,
+	to_literal_text(String, Text).
+expand_object(String, literal(Text)) :-
+	to_literal_text(String, Text).
+
+to_literal_text(Var, Var) :-
+	var(Var), !.
+to_literal_text(String, Text) :-
+	string(String), !,
+	atom_concat(String, '', Text).
+to_literal_text(String, Text) :-
+	atom_codes(Text, String).
+
+%%	comma_list(+Conjunction, -List) is det.
+%%	comma_list(-Conjunction, +List) is det.
+%
+%	Translate between a Prolog  conjunction   and  a  list. Elements
+%	=true= are removes from both  translations.   The  empty list is
+%	mapped to a single =true=.
+
+comma_list(Conj, List) :-
+	is_list(List),
+	list_comma(List, Conj).
+comma_list(Conj, List) :-
+	phrase(comma_list(Conj), List).
+
+comma_list(A)     --> {var(A)}, !, [A].
+comma_list((A,B)) --> !, comma_list(A), comma_list(B).
+comma_list(true)  --> !, [].
+comma_list(A)     --> [A].
+
+list_comma([], true).
+list_comma([H|T], C) :-
+	(   T == []
+	->  C = H
+	;   H == true
+	->  list_comma(T, C)
+	;   C = (H,B),
+	    list_comma(T, B)
+	).
+
+
+		 /*******************************
+		 *	 GENERAL CLAUSES	*
+		 *******************************/
+
+%%	expand_rdf(+Term0, -Term) is det.
+%
+%	Expand our symbolic representation to RDF structures anywhere in
+%	the code. This is somewhat dubious  because mapping "..." into a
+%	literal can be ambiguous and such   is  mapping ns:local for RDF
+%	namespaces.
+%
+%	@tbd Should we introduce `Term to quote terms?
+
+expand_rdf(Term0, Term) :-
+	compound(Term0), !,
+	(   expand_to_literal(Term0, Term)
+	->  true
+	;   Term0 = NS:Local,
+	    atom(NS), atom(Local)
+	->  expand_resource(NS:Local, Term)
+	;   Term0 =.. [F|Args0],
+	    maplist(expand_rdf, Args0, Args),
+	    Term =.. [F|Args]
+	).
+expand_rdf(Term, Term).
+
+expand_to_literal(Text^^Type, literal(type(URI, Value))) :- !,
+	to_literal_text(Text, Value),
+	expand_resource(Type, URI).
+expand_to_literal(Text@Lang, literal(lang(Lang, Value))) :- !,
+	to_literal_text(Text, Value).
+expand_to_literal(Text, literal(Value)) :-
+	is_list(Text),
+	maplist(sensible_char, Text),
+	atom_codes(Value, Text).
+
+sensible_char(C) :-
+	integer(C),
+	sensible_char_2(C).
+
+sensible_char_2(C) :-
+	between(32, 127, C), !.
+sensible_char_2(0'\t).
+sensible_char_2(0'\n).
+
+
+		 /*******************************
+		 *	      RUNTIME		*
+		 *******************************/
+
+%%	rdf_assert_new(+S,+P,+O,+Graph) is det.
+
+rdf_assert_new(S,P,O,Graph) :-
+	rdf(S,P,O,Graph), !.
+rdf_assert_new(S,P,O,Graph) :-
+	rdf_assert(S,P,O,Graph).
+
+%%	rdf_assert_if_ground(+S,+P,+O,+Graph) is det.
+%
+%	Assert if the goal is instantiated.  Used to deal with
+%	optional assertions
+
+rdf_assert_if_ground(S,P,O,Graph) :-
+	nonvar(S), nonvar(P), nonvar(O), !,
+	(   rdf(S,P,O,Graph)
+	->  true
+	;   rdf_assert(S,P,O,Graph)
+	).
+rdf_assert_if_ground(_,_,_,_).
+
+%%	rdf_retract_if_ground(+S,+P,+O,+Graph) is det.
+%
+%	Retract if the goal is instantiated.  Used to deal with
+%	optional retraction
+
+rdf_retract_if_ground(S,P,O,Graph) :-
+	nonvar(S), nonvar(P), nonvar(O), !,
+	rdf_retractall(S,P,O,Graph).
+rdf_retract_if_ground(_,_,_,_).
+
+
+%%	subject_triple_sequence(?S, +Pattern, ?Graph) is nondet.
+%
+%	True if S has the P-O pairs from Pattern in the same order as in
+%	Pattern.
+
+subject_triple_sequence(S, Pattern, Graph) :-
+	best_guard(Pattern, S, Guard),
+	findall(S, Guard, SList),
+	sort(SList, SSet),
+	assertion(maplist(atom, SSet)),
+	member(S, SSet),
+	findall(P-O, rdf(S, P, O, Graph), Data),
+	sequence_in(Pattern, Data).
+
+best_guard([], S, rdf_subject(S)).
+best_guard([P-O|T], S, Guard) :-
+	copy_term(P-O, P1-O1),
+	estimate(rdf(S,P1,O1), C0),
+	best_guard(T, S, C0, rdf(S,P1,O1), Guard).
+
+best_guard([], _, _, G, G).
+best_guard([P-O|T], S, C0, G0, G) :-
+	copy_term(P-O, P1-O1),
+	estimate(rdf(S,P1,O1), C1),
+	(   C1 < C0
+	->  best_guard(T, S, C1, rdf(S,P1,O1), G)
+	;   best_guard(T, S, C0, G0, G)
+	).
+
+estimate(rdf(_,P,O), C) :-
+	atom(P), var(O), !,
+	rdf_predicate_property(P, triples(C)).
+estimate(rdf(S,P,O), C) :-
+	rdf_estimate_complexity(S,P,O,C).
+
+
+
+
+%%	sequence_in(+ListPattern, -Data) is nondet.
+%
+%	True if ListPattern appears in Data.   There are two patterns of
+%	interest to us, for which we  give   an  example  if the pattern
+%	appears three times:
+%
+%		a,b,a,b,a,b
+%		a,a,a,b,b,b
+%
+%	In both cases, the pattern [a,b]  must   match  3 times. To deal
+%	with this, we first try to match the 2nd type.
+%
+%	@param ListPattern is a list of Pred-Object pairs.
+
+sequence_in(Pattern, Data) :-
+	maplist(indices(Data), Pattern, Places),
+	maplist(arity, Places, [Len|Lens]),
+	all_same(Lens, Len),
+	Array =.. [d|Data],
+	between(1, Len, I),
+	data(Pattern, I, Places, Array).
+
+data([], _, _, _).
+data([H|T], I, [P|PT], Data) :-
+	arg(I, P, Place),
+	arg(Place, Data, H),
+	data(T, I, PT, Data).
+
+indices(Data, H, Indices) :-
+	findall(I, nth1(I, Data, H), IndexList),
+	Indices =.. [i|IndexList].
+
+arity(Term, Arity) :-
+	functor(Term, _, Arity).
+
+all_same([], _).
+all_same([H|T], H) :-
+	all_same(T, H).
+
+
+%%	rdf_set_lang(+O0, +Lang, -O)
+%
+%	Set/change the language of a literal
+
+rdf_set_lang(Lit, Lang, literal(lang(Lang, Text))) :-
+	text_of_literal(Lit, Text).
+
+%%	rdf_set_type(+O0, +Lang, -O)
+%
+%	Set/change the type of a literal
+
+rdf_set_type(Lit, Type, literal(type(Type, Text))) :-
+	text_of_literal(Lit, Text).
+
+text_of_literal(Var, _) :-
+	var(Var), !,
+	fail.
+text_of_literal(literal(Lit), Text) :- !,
+	text_of_literal(Lit, Text).
+text_of_literal(lang(_, Text), Text) :- !.
+text_of_literal(type(_, Text), Text) :- !.
+text_of_literal(Text, Text).
+
+
+		 /*******************************
+		 *	      EXPANSION		*
+		 *******************************/
+
+user:term_expansion(In, Out) :-
+	prolog_load_context(module, Module),
+	predicate_property(Module:rdf_rewrite(_),
+			   imported_from(rdf_rewrite)),
+	expand_rule(In, Out).
+
+
+		 /*******************************
+		 *	 PCE EMACS SUPPORT	*
+		 *******************************/
+
+:- multifile
+	emacs_prolog_colours:term_colours/2,
+	emacs_prolog_colours:goal_colours/2,
+	emacs_prolog_colours:style/2,
+	emacs_prolog_colours:identify/2,
+	prolog:called_by/2.
+
+term_colours((_Name@@Rule),
+	     expanded - [ identifier, RuleColours ]) :-
+	term_colours(Rule, RuleColours).
+term_colours((Head <=> Body),
+	     expanded - [ HeadColours, BodyColours ]) :-
+	head_colours(Head, HeadColours),
+	body_colours(Body, BodyColours).
+term_colours((Head ==> Body),
+	     expanded - [ HeadColours, BodyColours ]) :-
+	head_colours(Head, keep, HeadColours),
+	body_colours(Body, BodyColours).
+
+head_colours((Keep \ Del),
+	     expanded - [ KeepColours, DelColours ]) :- !,
+	head_colours(Keep, keep, KeepColours),
+	head_colours(Del, del, DelColours).
+head_colours(Del, Colours) :-
+	head_colours(Del, del, Colours).
+
+%%	head_colours(+BodyTerm, +KeepDel, -Colours) is det.
+
+head_colours(Var, _, classify) :-
+	var(Var).
+head_colours((A,B), Keep, control - [AC, BC]) :- !,
+	head_colours(A, Keep, AC),
+	head_colours(B, Keep, BC).
+head_colours(X?, Keep, optional - [Colours ]) :- !,
+	head_colours(X, Keep, Colours).
+head_colours(List, Keep, sequence - Colours) :-
+	is_list(List), !,
+	head_sequence(List, Keep, Colours).
+head_colours({_}, keep, keep_triple - [classify ]) :- !.
+head_colours({_}, del, del_triple - [classify ]) :- !.
+head_colours(_, _, classify).
+
+head_sequence([], _, []).
+head_sequence([H|T0], Keep, [C|T]) :-
+	head_colours(H, Keep, C),
+	head_sequence(T0, Keep, T).
+
+%%	body_colours(+BodyTerm, -Colours) is det.
+
+body_colours(Var, classify) :-
+	var(Var), !.
+body_colours((A,B), control - [AC, BC]) :- !,
+	body_colours(A, AC),
+	body_colours(B, BC).
+body_colours(X>>_, redirect - [Colours, graph]) :- !,
+	body_colours(X, Colours).
+body_colours({_}, add_triple - [classify ]) :- !.
+body_colours(_, body).
+
+emacs_prolog_colours:term_colours(Term, Colours) :-
+	term_colours(Term, Colours).
+
+:- op(990, xfx, :=).			% allow compiling without XPCE
+:- op(200, fy, @).
+
+emacs_prolog_colours:style(add_triple, style(background := '#a2ffa1')).
+emacs_prolog_colours:style(del_triple, style(background := '#ffb3b3')).
+emacs_prolog_colours:style(optional,   style(bold := @on)).
+emacs_prolog_colours:style(sequence,   style(bold := @on)).
+emacs_prolog_colours:style(graph,      style(bold := @on)).
diff --git a/lib/xmlrdf/rdf_schema.pl b/lib/xmlrdf/rdf_schema.pl
new file mode 100644
index 0000000..adb390b
--- /dev/null
+++ b/lib/xmlrdf/rdf_schema.pl
@@ -0,0 +1,132 @@
+:- module(rdf_schema,
+	  [ make_schema/2		% +DataGraph, +SchemaGraph
+	  ]).
+:- use_module(library(semweb/rdf_db)).
+:- use_module(library(semweb/rdfs)).
+
+%%	make_schema(+Graph, +SchemaGraph) is det.
+%
+%	Create an initial  schema  by   providing  definitions  for  all
+%	predicates and types (classes) used  in   Graph.  The  schema is
+%	dumped into the graph SchemaGraph.
+%
+%	This  predicate  is  typically   used    _after_   running   the
+%	rewrite-rules to reflect renamed typed and properties.
+
+make_schema(Data, Schema) :-
+	rdf_retractall(_,_,_,Schema),
+	rdf_transaction(make_schema_(Data, Schema), make_schema).
+
+make_schema_(Data, Schema) :-
+	forall(predicate_in_graph(Data, P),
+	       define_predicate(P, Data, Schema)),
+	forall(type_in_graph(Data, Class),
+	       define_type(Class, Schema)).
+
+define_predicate(P, _, _) :-
+	rdf_global_id(rdf:_, P), !.
+define_predicate(P, _, _) :-
+	rdf_global_id(rdfs:_, P), !.
+define_predicate(P, DataGraph, Graph) :-
+	copy_data(P, Graph),
+	rdf_assert(P, rdf:type, rdf:'Property', Graph),
+	assign_label(P, Graph),
+	predicate_statistics(DataGraph, P, _C,
+			     _Subjects, _Objects,
+			     Domains, Ranges),
+	(   Domains = [Dom]
+	->  rdf_assert(P, rdfs:domain, Dom, Graph)
+	;   true
+	),
+	(   Ranges = [Range]
+	->  rdf_assert(P, rdfs:range, Range, Graph)
+	;   true
+	).
+
+
+define_type(C, Graph) :-
+	copy_data(C, Graph),
+	rdf_assert(C, rdf:type, rdfs:'Class', Graph),
+	assign_label(C, Graph).
+
+
+assign_label(S, Graph) :-
+	(   rdf(S, rdfs:label, _)
+	->  true
+	;   rdfs_label(S, Label),
+	    Label \== S
+	->  rdf_assert(S, rdfs:label, literal(Label), Graph)
+	;   true
+	).
+
+
+copy_data(S, Graph) :-
+	rdf_retractall(S,_,_,Graph),
+	forall((rdf(S,P,O,G), G \== Graph),
+	       rdf_assert(S,P,O,Graph)).
+
+
+		 /*******************************
+		 *	        QUERY		*
+		 *******************************/
+
+predicate_in_graph(Graph, P) :-
+	rdf_current_predicate(P),
+	once(rdf(_,P,_,Graph)).
+
+%%	type_in_graph(+Graph, -Class)
+%
+%	Generate the unique types in Graph
+
+:- thread_local
+	type_seen/1.
+
+type_in_graph(Graph, Class) :-
+	call_cleanup(type_in_graph2(Graph, Class),
+		     retractall(type_seen(_))).
+
+type_in_graph2(Graph, Class) :-
+	subject_in_graph(Graph, S),
+	(   rdf(S, rdf:type, Class)
+	*-> true
+	;   rdf_equal(Class, rdfs:'Resource')
+	),
+	(   type_seen(Class)
+	->  fail
+	;   assert(type_seen(Class))
+	).
+
+
+subject_in_graph(Graph, S) :-
+	rdf_subject(S),
+	once(rdf(S, _, _, Graph)).
+
+predicate_statistics(Graph, P, C, Subjects, Objects, Domains, Ranges) :-
+	findall(S-O, rdf(S,P,O,Graph), Pairs),
+	length(Pairs, C),
+	pairs_keys_values(Pairs, Ss, Os),
+	sort(Ss, Subjects),
+	sort(Os, Objects),
+	resources_types(Subjects, Graph, Domains),
+	resources_types(Objects, Graph, Ranges).
+
+resources_types(URIs, Graph, Types) :-
+	findall(T, resource_type_in(URIs, Graph, T), TList),
+	sort(TList, Types).
+
+resource_type_in(List, Graph, T) :-
+	member(URI, List),
+	resource_type(URI, Graph, T).
+
+%%	resource_type(+URI, +Graph, -Type) is det.
+
+resource_type(URI, Graph, T) :-
+	(   URI = literal(Lit)
+	->  (   Lit = type(T, _)
+	    ->	true
+	    ;	rdf_equal(T, rdfs:'Literal')
+	    )
+	;   rdf(URI, rdf:type, T, Graph)
+	*-> true
+	;   rdf_equal(T, rdfs:'Resource')
+	).
diff --git a/lib/xmlrdf/xmlrdf.pl b/lib/xmlrdf/xmlrdf.pl
new file mode 100644
index 0000000..689fac1
--- /dev/null
+++ b/lib/xmlrdf/xmlrdf.pl
@@ -0,0 +1,615 @@
+/*  File:    xmlrdf.pl
+    Author:  Jan Wielemaker
+    Created: Oct 26 2009
+    Purpose: Generic translation from XML to RDF
+*/
+
+:- module(xmlrdf,
+	  [ load_xml_as_rdf/2		% +Input, +Options
+	  ]).
+:- use_module(library(semweb/rdf_db)).
+:- use_module(library(semweb/rdfs)).
+:- use_module(library(semweb/rdf_turtle)).
+:- use_module(library(semweb/rdf_turtle_write)).
+:- use_module(library(http/http_open)).
+:- use_module(library(sgml)).
+:- use_module(library(uri)).
+:- use_module(library(option)).
+:- use_module(library(debug)).
+:- use_module(library(xsdp_types)).
+:- use_module(library(record)).
+:- use_module(library(apply)).
+
+:- rdf_register_ns(map, 'http://cs.vu.nl/eculture/map/').
+
+:- record
+	option(units:list=[],
+	       dialect:oneof([xml,xmlns])=xmlns,
+	       graph:atom=data,
+	       prefix:atom=(-),
+	       class_style:oneof(['OneTwo','oneTwo',
+				  'one_two','One_Two',keep])='OneTwo',
+	       predicate_style:oneof(['OneTwo','oneTwo',
+				      'one_two','One_Two',keep])='oneTwo').
+
+%%	load_xml_as_rdf(From, Options)
+%
+%	Convert an XML file into `crude' RDF. From is either a filename,
+%	a  URL  (using  either  =file=  or  =http=  scheme)  or  a  term
+%	stream(Stream). Options is a list of the following options:
+%
+%	    * unit(+Elements)
+%	    If provided, consider elements whose name match one of
+%	    the members of the list Elements a toplevel structure
+%	    and process the file one element at a time.  If there
+%	    is just one toplevel structure, this may be passed without
+%	    using a list.
+%
+%	    * dialect(Dialect)
+%	    One of =xml= or =xmlns=.  Use =xmlns= if the file contains
+%	    xmlns= attributes and XML names of the form ns:local.  If
+%	    neither is present, the file must be processed using the
+%	    =xml= dialect.
+%
+%	    * graph(+Graph)
+%	    RDF Graph for storing the output.  Default is =data=
+%
+%	    * prefix(+Prefix)
+%	    Create a URI from an XML name by putting Prefix in front
+%	    of it.  If we are processing _xmlns_ (see dialect), the
+%	    XML namespace declaration is ignored.  I.e., the URI is
+%	    formed from Prefix followed by the XML local name.
+%
+%	    If this option is not present the URI is simply the XML
+%	    name if the dialect is =xml= or the XML namespace followed
+%	    by the local name if the dialect is =xmlns=.
+%
+%	    * predicate_style(+Style)
+%	    Change the `identifier style' for RDF predicates creates
+%	    from XML names.  The default is 'oneTwo'.  Other values are
+%	    'OneTwo', 'one_two' or 'One_Two'.  The value =keep= uses
+%	    the XML name directly as RDF name.  This is valid, but
+%	    often leads to names that cannot be written using the Turtle
+%	    and SPARQL shorthand notation.
+%
+%	    * class_style(+Style)
+%	    Same as predicate_style, but used when generating a
+%	    class-name.  The default is 'OneTwo'.
+
+load_xml_as_rdf(From, Options) :-
+	canonical_unit_option(Options, COptions),
+	make_option(COptions, Record, _Rest),
+	flush_name_uri_cache,
+	flush_property_map,
+	rdf_statistics(triples(C0)),
+	statistics(cputime, T0),
+	setup_call_cleanup(open_input(From, In, Cleanup),
+			   process(In, Record),
+			   Cleanup),
+	statistics(cputime, T1),
+	rdf_statistics(triples(C1)),
+	T is T1-T0,
+	C is C1-C0,
+	print_message(informational, xmlrdf(loaded(From, T, C))).
+
+canonical_unit_option(Options, COptions) :-
+	select_option(unit(Unit), Options, Rest), !,
+	to_list(Unit, Units),
+	COptions = [units(Units)|Rest].
+canonical_unit_option(Options, Options).
+
+to_list(List, List) :-
+	is_list(List), !.
+to_list(Elem, [Elem]).
+
+%%	open_input(+Spec, -Stream, -Close)
+%
+%	Open the input Spec, returning  Stream   and  a closure Close to
+%	revert the side-effects.
+
+open_input(stream(In), In, true) :- !.
+open_input(URL, In, Cleanup) :-
+	atom(URL),
+	uri_file_name(URL, File), !,
+	open(File, read, In, [type(binary)]),
+	Cleanup = close(In).
+open_input(URL, In, Cleanup) :-
+	atom(URL),
+	uri_components(URL, Data),
+	uri_data(scheme, Data, http), !,
+	http_open(URL, In, []),
+	set_stream(In, file_name(URL)),
+	Cleanup = close(In).
+open_input(Spec, In, Cleanup) :-
+	absolute_file_name(Spec, Path, [access(read)]),
+	open(Path, read, In, [type(binary)]),
+	Cleanup = close(In).
+
+%%	process(+Stream, +Options)
+
+process(Stream, Options) :-
+	option_units(Options, Units),
+	Units \== [], !,
+	b_setval(xmlrdf_unit, Units),
+	b_setval(xmlrdf_options, Options),
+	setup_call_cleanup(new_sgml_parser(Parser, []),
+			   (   configure_parser(Parser, Options),
+			       sgml_parse(Parser,
+					  [ source(Stream),
+					    call(begin, on_begin)
+					  ])
+			   ),
+			   free_sgml_parser(Parser)).
+process(Stream, Options) :-
+	setup_call_cleanup(new_sgml_parser(Parser, []),
+			   (   configure_parser(Parser, Options),
+			       sgml_parse(Parser,
+					  [ source(Stream),
+					    document(Document)
+					  ])
+			   ),
+			   free_sgml_parser(Parser)),
+	Document = [Element],
+	convert(Element, Options).
+
+
+configure_parser(Parser, Options) :-
+	option_dialect(Options, Dialect),
+	set_sgml_parser(Parser, dialect(Dialect)),
+	set_sgml_parser(Parser, space(sgml)).
+
+
+on_begin(Element, Attr, Parser) :-
+	b_getval(xmlrdf_unit, Elements),
+	memberchk(Element, Elements), !,
+	b_getval(xmlrdf_options, Options),
+	sgml_parse(Parser,
+		   [ document(Content),
+		     parse(content)
+		   ]),
+	convert(element(Element, Attr, Content), Options).
+
+
+		 /*******************************
+		 *	  RDF CONVERSION	*
+		 *******************************/
+
+%%	convert(+Element, +Options) is det.
+
+convert(Element, Options) :-
+	option_graph(Options, Graph),
+	element_uri(Element, URI),
+	element_type(Element, Type, Options),
+	rdf_assert(URI, rdf:type, Type, Graph),
+	set_properties(URI, Element, Options).
+
+element_uri(_Element, URI) :-
+	rdf_bnode(URI).
+
+element_type(element(Name, _, _), Class, _) :-
+	rdf(Class, map:xmlname, literal(Name)),
+	rdfs_individual_of(Class, rdfs:'Class'), !.
+element_type(element(EName, _, _), Name, Options) :-
+	name_to_uri(EName, class, Name, Options),
+	debug(xmlrdf(type), 'No element type for element-name ~p', [Name]).
+
+
+		 /*******************************
+		 *	     PROPERTIES		*
+		 *******************************/
+
+%%	set_properties(+URI, +Element, +Options) is det.
+
+set_properties(URL, element(_, Attrs, Content), Options) :-
+	set_properties(URL, element(_, Attrs, Content), -, Options).
+
+set_properties(URL, element(_, Attrs, Content), Lang, Options) :-
+	setp_from_attributes(Attrs, URL, Lang, Lang1, Options),
+	setp_from_content(Content, URL, Lang1, Options).
+
+setp_from_attributes([], _, Lang, Lang, _).
+setp_from_attributes([xmlns:_=_|T], URL, Lang0, Lang, Options) :- !,
+	setp_from_attributes(T, URL, Lang0, Lang, Options).
+setp_from_attributes([xmlns=_|T], URL, Lang0, Lang, Options) :- !,
+	setp_from_attributes(T, URL, Lang0, Lang, Options).
+setp_from_attributes([xml:_=_|T], URL, Lang0, Lang, Options) :- !,
+	setp_from_attributes(T, URL, Lang0, Lang, Options).
+setp_from_attributes([AttName=Value|T], URL, Lang0, Lang, Options) :-
+	map_literal_property(URL, AttName, Prop, Options),
+	option_graph(Options, Graph),
+	(   Lang0 == (-)
+	->  rdf_assert(URL, Prop, literal(Value), Graph)
+	;   rdf_assert(URL, Prop, literal(lang(Lang0, Value)), Graph)
+	),
+	setp_from_attributes(T, URL, Lang0, Lang, Options).
+
+
+%%	setp_from_content(+Content, +URL, +Lang, +Options) is det.
+%
+%	Create  attributes  for  URL  from  the  given  content.  If  we
+%	encounter CDATA, this is typically  from   an  element  that has
+%	attributes. We use rdf:value for the property in this case.
+
+setp_from_content([], _, _, _).
+setp_from_content([element(Name, Attrs0, Content)|T], URL, Lang, Options) :- !,
+	exclude(xmlns_property, Attrs0, Attrs),
+	setp_from_content_element(element(Name, Attrs, Content), URL,
+				  Lang, Options),
+	setp_from_content(T, URL, Lang, Options).
+setp_from_content([Text|T], URL, Lang, Options) :-
+	make_literal_value(Lang, Text, Value),
+	option_graph(Options, Graph),
+	rdf_assert(URL, rdf:value, Value, Graph),
+	setp_from_content(T, URL, Lang, Options).
+
+
+xmlns_property(xmlns=_) :- !.
+xmlns_property(xmlns:_=_) :- !.
+xmlns_property(P=_) :-
+	atom(P),
+	sub_atom(P, 0, _, _, 'xmlns:'), !.
+
+
+%%	setp_from_content_element(+Element, +URL, +Lang, +Options) is det.
+%
+%	Create a property for URL from the   XML element Element. If the
+%	property is mapped, we know the property and target datatype. If
+%	not, we must decide whether to go for  a literal or an bnode. If
+%	all data can be expressed as  a   literal,  we use a literal and
+%	else we create a bnode.  We  can   only  express  the  data as a
+%	literal if it  has  no  attributes   or  the  only  attribute is
+%	xml:lang.
+
+setp_from_content_element(element(EName, AL, CL), URL, Lang, Options) :-
+	mapped_property(URL, EName, Prop, Type, Options), !,
+	debug(xmlrdf(pmap), '~p ~p', [URL, Prop]),
+	make_value(EName, AL, CL, Type, Value, Lang, Options),
+	option_graph(Options, Graph),
+	rdf_assert(URL, Prop, Value, Graph).
+setp_from_content_element(element(EName, [], [Text]), URL, Lang, Options) :-
+	atom(Text), !,
+	name_to_uri(EName, predicate, Prop, Options),
+	make_literal_value(Lang, Text, Value),
+	option_graph(Options, Graph),
+	rdf_assert(URL, Prop, Value, Graph).
+setp_from_content_element(element(EName, [], []), URL, _, Options) :- !,
+	name_to_uri(EName, predicate, Prop, Options),
+	option_graph(Options, Graph),
+	rdf_assert(URL, Prop, literal(''), Graph).
+setp_from_content_element(element(EName, [xml:lang=Lang], [Text]),
+			  URL, _, Options) :-
+	atom(Text), !,
+	name_to_uri(EName, predicate, Prop, Options),
+	make_literal_value(Lang, Text, Value),
+	option_graph(Options, Graph),
+	rdf_assert(URL, Prop, Value, Graph).
+setp_from_content_element(element(EName, Attrs0, Content),
+			  URL, Lang, Options) :-
+	name_to_uri(EName, predicate, Prop, Options),
+	name_to_uri(EName, class, Type, Options),
+	(   select(xml:lang=Lang1, Attrs0, Attrs)
+	->  true
+	;   Lang1 = Lang,
+	    Attrs = Attrs0
+	),
+	make_value(EName, Attrs, Content, Type, Value, Lang1, Options),
+	option_graph(Options, Graph),
+	rdf_assert(URL, Prop, Value, Graph).
+
+make_literal_value(-,    Text, literal(Text)) :- !.
+make_literal_value(Lang, Text, literal(lang(Lang, Text))).
+
+
+%%	make_value(+Element, +Attributes, +Content, +Type, -Value,
+%%		   +Lang, +Options)
+
+make_value(_, Atts, Content, Literal, literal(Value), Lang, _) :-
+	rdf_equal(rdfs:'Literal', Literal),
+	(   Content = [Text],
+	    atom(Text)
+	->  true
+	;   Content == []
+	->  Text = ''
+	), !,
+	(   memberchk(xml:lang=TheLang, Atts)
+	->  Value = lang(TheLang, Text)
+	;   Lang = (-)
+	->  Value = Text
+	;   Value = lang(Lang, Text)
+	).
+make_value(_, _, [Text], Type, literal(type(Type, Text)), _, _) :-
+	atom(Text),
+	datatype_type(Type), !.
+make_value(_, _, [], Type, literal(type(Type, '')), _, _) :-
+	datatype_type(Type), !.
+make_value(_, Attrs, Content, Type, literal(type(XMLLit, Content)), _, _) :-
+	rdf_equal(rdf:'XMLLiteral', XMLLit),
+	maplist(xml_attribute, Attrs),
+	(   Type = XMLLit
+	;   rdf_equal(rdfs:'Literal', Type)
+	), !.
+make_value(Element, Attrs, Content, Type, ValueURI, Lang, Options) :-
+	rdf_equal(rdf:'XMLLiteral', XMLLit),
+	(   Type = XMLLit
+	;   rdf_equal(rdfs:'Literal', Type)
+	), !,
+	element_uri(element(Element, Attrs, Content), ValueURI),
+	option_graph(Options, Graph),
+	rdf_assert(ValueURI, rdf:type, Type, Graph),
+	setp_from_attributes(Attrs, ValueURI, Lang, _Lang1, Options),
+	rdf_assert(ValueURI, rdf:value, literal(type(XMLLit, Content)),Graph).
+make_value(_, [], [URL], Type, URL, _, _) :-
+	atom(URL),
+	rdfs_subclass_of(Type, rdfs:'Resource'), !.
+make_value(Element, Attrs, Content, Type, ValueURI, Lang, Options) :-
+	element_uri(element(Element, Attrs, Content), ValueURI),
+	option_graph(Options, Graph),
+	rdf_assert(ValueURI, rdf:type, Type, Graph),
+	setp_from_attributes(Attrs, ValueURI, Lang, Lang1, Options),
+	setp_from_content(Content, ValueURI, Lang1, Options).
+
+datatype_type(URI) :-
+	xsdp_uri_type(URI, _Type).
+
+xml_attribute(xml:_=_).
+xml_attribute(xmlns:_=_).
+
+
+		 /*******************************
+		 *	       MAP		*
+		 *******************************/
+
+%%	map_literal_property(+Subject, +XMLName, -RDFProperty, +Options)
+%%		is det.
+%
+%	RDFProperty is a URI for the   property  indicated in the source
+%	with XMLName. The property will be  added to Subject. Subject is
+%	guaranteed to have an rdf:type when this predicate is called.
+
+:- thread_local
+	literal_property_map/3,
+	property_map/5.
+
+flush_property_map :-
+	retractall(literal_property_map(_,_,_)),
+	retractall(property_map(_,_,_,_,_)).
+
+map_literal_property(Subject, XMLName, RDFProperty, Options) :-
+	rdf(Subject, rdf:type, Class),
+	(   literal_property_map(XMLName, Class, RDFProperty)
+	->  true
+	;   map_lprop_class(Subject, XMLName, RDFProperty, Options),
+	    assert(literal_property_map(XMLName, Class, RDFProperty))
+	).
+
+map_lprop_class(Subject, XMLName, RDFProperty, Options) :-
+	mapped_property(Subject, XMLName, RDFProperty, _Type, Options), !.
+map_lprop_class(_Subject, XMLName, Prop, Options) :-
+	name_to_uri(XMLName, predicate, Prop, Options),
+	rdf_equal(rdfs:'Literal', Literal),
+	update_schema(XMLName, Prop, Literal).
+
+update_schema(XMLName, Prop, Type) :-
+	name_to_atom(XMLName, Atom),
+	(   rdfs_individual_of(Prop, rdf:'Property')
+	->  true
+	;   rdf_assert(Prop, rdf:type, rdf:'Property', schema)
+	),
+	(   rdf_has(Prop, rdfs:range, _Range)
+	->  true
+	;   rdf_assert(Prop, rdfs:range, Type, schema)
+	),
+	(   rdf_has(Prop, map:xmlname, _)
+	->  true
+	;   rdf_assert(Prop, map:xmlname, literal(Atom), schema)
+	).
+
+%%	mapped_property(+Subject, +XMLName,
+%%		  	-RDFProperty, -RDFType, +Options) is semidet.
+%
+%	True if XMLName is mapped  to   RDFProperty  with  the given RDF
+%	type. There are three ways  to  decide   that  we  deal  with an
+%	established mapping:
+%
+%	    1. There is a property with map:xmlname with a literal value
+%	    that matches the XML Name.  Namespace qualified names are
+%	    written as <prefix><local>
+%
+%	    2. There is a property with map:xmlname that matches the
+%	    default translation.
+%
+%	    3. There is a property with a URI that matches the default
+%	    translation that has a type.
+
+mapped_property(Subject, XMLName, Prop, Type, Options) :-
+	rdf(Subject, rdf:type, Class),
+	(   property_map(XMLName, Class, Prop, Type, Mapped)
+	->  Mapped == true
+	;   mapped_property_nc(Subject, XMLName, Prop, Type, Options)
+	->  assert(property_map(XMLName, Class, Prop, Type, true))
+	;   assert(property_map(XMLName, Class, _, _, false)),
+	    fail
+	).
+
+mapped_property_nc(_Subject, XMLName, Prop, Type, Options) :-
+	name_to_uri(XMLName, predicate, Prop0, Options),
+	(   (   name_to_atom(XMLName, Atom),
+		rdf(Prop, map:xmlname, literal(Atom))
+	    ;	rdf(Prop, map:xmlname, Prop0)
+	    ),
+	    rdfs_individual_of(Prop, rdf:'Property')
+	->  (   rdf(Prop, rdfs:range, Type)
+	    ->  true
+	    ;   rdf_equal(rdfs:'Literal', Type)
+	    )
+	;   Prop = Prop0,
+	    rdf(Prop0, rdfs:range, Type)
+	->  true
+	).
+
+name_to_atom(Prefix:Local, Name) :-
+	atom_concat(Prefix, Local, Name).
+name_to_atom(Name, Name).
+
+
+		 /*******************************
+		 *	       UTIL		*
+		 *******************************/
+
+%%	name_to_uri(+XMLName, +Type, -URI, +Options) is det.
+%
+%	@param	XMLName is an atom for dialect =xml= or a term
+%		Prefix:Local when using the =xmlns= dialect.
+%	@param	Type is one of =class= or =predicate=
+%	@URI	Is an RDF resource (atom)
+
+:- thread_local
+	name_uri_cache/4.
+
+flush_name_uri_cache :-
+	retractall(name_uri_cache(_,_,_,_)).
+
+name_to_uri(NS:Local, Type, URI, Options) :- !,
+	(   name_uri_cache(Local, NS:Local, Type, URI)
+	->  true
+	;   name_to_uri_nc(NS:Local, Type, URI, Options),
+	    assert(name_uri_cache(Local, NS:Local, Type, URI))
+	).
+name_to_uri(Name, Type, URI, Options) :-
+	(   name_uri_cache(Name, Name, Type, URI)
+	->  true
+	;   name_to_uri_nc(Name, Type, URI, Options),
+	    assert(name_uri_cache(Name, Name, Type, URI))
+	).
+
+name_to_uri_nc(NS:Local, Type, URI, Options) :- !,
+	restyle(Type, Local, Local1, Options),
+	(   option_prefix(Options, Prefix),
+	    Prefix \== (-)
+	->  atom_concat(Prefix, Local1, URI)
+	;   atom_concat(NS, Local1, URI)
+	).
+name_to_uri_nc(Name, Type, URI, Options) :-
+	restyle(Type, Name, Name1, Options),
+	(   option_prefix(Options, Prefix),
+	    Prefix \== (-)
+	->  atom_concat(Prefix, Name1, URI)
+	;   URI = Name1
+	).
+
+restyle(predicate, Name0, Name, Options) :-
+	option_predicate_style(Options, Style),
+	(   Style == keep
+	->  Name = Name0
+	;   restyle_identifier(Style, Name0, Name)
+	).
+restyle(class, Name0, Name, Options) :-
+	option_class_style(Options, Style),
+	(   Style == keep
+	->  Name = Name0
+	;   restyle_identifier(Style, Name0, Name)
+	).
+
+
+		 /*******************************
+		 *	    IDENTIFIERS		*
+		 *******************************/
+
+%%	restyle_identifier(+Style, +In, -Out) is det.
+%
+%	Restyle an identifier by extracting the alnum substrings and
+%	joining them together according to Style.
+%
+%	@param Style is described with join_name_parts/3.
+
+restyle_identifier(Style, In, Out) :-
+	name_parts(In, Parts),
+	join_name_parts(Style, Parts, Out).
+
+
+%%	name_parts(+Identifier, -Parts) is det.
+%
+%	Parts is a list of atoms  that   make  up  Identifier. The parts
+%	found are turned into lowercase, unless   all its characters are
+%	uppercase.  E.g.,
+%
+%	==
+%	?- name_parts('sourceCodeURI', X).
+%	X = [source, code, 'URI'].
+%	==
+
+name_parts(Name, Parts) :-
+	atom_codes(Name, Codes),
+	phrase(name_parts(Parts), Codes).
+
+name_parts([H|T]) -->
+	name_part(H), !,
+	name_parts(T).
+name_parts([]) --> [].
+
+name_part(H) -->
+	string(Codes, Tail),
+	sep(Tail), !,
+	{ Codes = [_|_],
+	  atom_codes(H0, Codes),
+	  (   maplist(is_upper, Codes)
+	  ->  H = H0
+	  ;   downcase_atom(H0, H)
+	  )
+	}.
+
+string(T,T) --> [].
+string([H|T], L) --> [H], string(T, L).
+
+sep([]) --> sep_char, !, sep_chars.
+sep([T]), [N] -->
+	[T,N],
+	{ code_type(T, lower),
+	  code_type(N, upper)
+	}.
+sep([],[],[]).
+
+sep_char -->
+	[H],
+	{ \+ code_type(H, alnum) }.
+
+sep_chars --> sep_char, !, sep_chars.
+sep_chars --> [].
+
+%%	join_name_parts(+Style, +Parts, -Identifier)
+%
+%	Join parts of an identifier according to Style.  Style is
+%	one of:
+%
+%	    * 'OneTwo'
+%	    * oneTwo
+%	    * one_two
+%	    * 'One_Two'
+
+join_name_parts(Style, [First|Parts], Identifier) :-
+	style(Style, CapFirst, CapRest, Sep),
+	capitalise(CapFirst, First, H),
+	maplist(capitalise(CapRest), Parts, T),
+	atomic_list_concat([H|T], Sep, Identifier).
+
+style('OneTwo',	 true,	true,  '').
+style(oneTwo,	 false,	true,  '').
+style(one_two,	 false,	false, '_').
+style('One_Two', true,	true,  '_').
+
+capitalise(false, X, X) :- !.
+capitalise(true, X, Y) :-
+	atom_codes(X, [H0|T]),
+	code_type(H0, to_lower(H)),
+	atom_codes(Y, [H|T]).
+
+
+		 /*******************************
+		 *	      MESSAGES		*
+		 *******************************/
+
+:- multifile
+	prolog:message//1.
+
+prolog:message(xmlrdf(loaded(From, Time, Count))) -->
+	[ 'Loaded ~D triples in ~3f seconds from ~p'-[Count, Time, From] ].
diff --git a/rdf/cpack/xmlrdf.ttl b/rdf/cpack/xmlrdf.ttl
index a640ba5..ffcbb44 100644
--- a/rdf/cpack/xmlrdf.ttl
+++ b/rdf/cpack/xmlrdf.ttl
@@ -11,7 +11,7 @@
 # this.  Otherwise you can specify the information inline as done below.
 # See http://xmlns.com/foaf/spec/ for defines fields.
 
-<> a cpack:Package ;
+<> a cpack:Library ;
 	cpack:packageName "xmlrdf" ;
 	dcterms:title "XML to RDF conversion" ;
 	cpack:author [ a foaf:Person ;
@@ -24,7 +24,17 @@
 	    ] ;
 	cpack:description
 
-"""The package description goes here.  You can use PlDoc Wiki markup.
+"""The XML to RDF convertor allows for importing XML as RDF using two steps:
+
+    1. A generic translation from XML to RDF.  This generally needs no
+       configuration but it allows for a few parameters:
+
+	- Define the primary record element
+	- Define the target namespace
+	- Define elements that must be kept as XMLLiteral
+
+    2. Preform a rewrite on the generated graph using a Prolog based
+       rewrite language.
 """ .