3.5 Extracting a DTD
Some documents have no DTD. A convenient feature of this library is
that it builds a DTD while parsing such a document, treating it as
having an implicit DTD. The resulting DTD contains all elements
encountered in the document. For each element, the content model is a
repeatable disjunction of the elements found inside it and, if
character data was found as well, #PCDATA. Thus, if we found element y
and character data in element x, the model is:
<!ELEMENT x - - (y|#PCDATA)*>
Any encountered attribute is added to the attribute list with the
type CDATA and default #IMPLIED.
The example below extracts the elements used in an unknown XML document.
elements_in_xml_document(File, Elements) :-
        load_structure(File, _,
                       [ dialect(xml),
                         dtd(DTD)
                       ]),
        dtd_property(DTD, elements(Elements)),
        free_dtd(DTD).
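
The same approach can be extended to look at the inferred content model of each element. The following is a minimal sketch, not part of the original example: element_models/2 is a hypothetical helper name, and it assumes that the element(Name, Omit, Content) property of dtd_property/2 can be queried on an inferred DTD. Elements for which no model is reported are simply skipped.

```prolog
:- use_module(library(sgml)).

% element_models(+Source, -Models)
% Hypothetical helper: Models is a list of Name-Content pairs, one for
% each element of the inferred DTD that reports a content model.
element_models(Source, Models) :-
        load_structure(Source, _,
                       [ dialect(xml),
                         dtd(DTD)        % unbound: returns the DTD that was built
                       ]),
        dtd_property(DTD, elements(Names)),
        findall(Name-Content,
                ( member(Name, Names),
                  % fails, and so skips Name, if no model is available
                  dtd_property(DTD, element(Name, _Omit, Content))
                ),
                Models),
        free_dtd(DTD).
```

A query such as `?- element_models('doc.xml', Models).` (where doc.xml is a placeholder file name) would then bind Models to the collected pairs.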