SWI-Prolog -- Extracting a DTD

Documentation
- Reference manual
- Packages
  - SWI-Prolog SGML/XML parser
    - Predicate Reference

3.5 Extracting a DTD

Some documents have no DTD. One of the neat facilities of this library is that it builds a DTD while parsing a document with an implicit DTD. The resulting DTD contains all elements encountered in the document. For each element the content model is a disjunction of elements and possibly #PCDATA that can be repeated. Thus, if we found element y and CDATA in element x, the model is:

<!ELEMENT x - - (y|#PCDATA)*>

Any encountered attribute is added to the attribute list with the type CDATA and default #IMPLIED.

The example below extracts the elements used in an unknown XML document.

elements_in_xml_document(File, Elements) :-
        load_structure(File, _,
                       [ dialect(xml),
                         dtd(DTD)
                       ]),
        dtd_property(DTD, elements(Elements)),
        free_dtd(DTD).