The DTD (Document Type Definition) is a separate entity in sgml2pl that can be created, freed, defined and inspected. Like the parser itself, it is filled by opening it as a Prolog output stream and sending data to it. This section summarises the predicates for handling the DTD.
- new_dtd(+DocType, -DTD)
- Creates an empty DTD for the named DocType. The returned DTD-reference is an opaque term that can be used in the other predicates of this package.
- free_dtd(+DTD)
- Deallocate all resources associated with the DTD. Further use of DTD is invalid.
- load_dtd(+DTD, +File)
- Define the DTD by loading the SGML-DTD file File. Same as load_dtd/3 with empty option list.
- load_dtd(+DTD, +File, +Options)
- Define the DTD by loading File. Defined options are the dialect option from open_dtd/3 and the encoding option from open/4. Notably, the dialect option must match the dialect used for subsequent parsing using this DTD.
- open_dtd(+DTD, +Options, -OutStream)
- Open a DTD as an output stream. See load_dtd/2 for an example. Defined options are:
- dialect(Dialect)
- Define the DTD dialect. Default is sgml. Using xml or xmlns processes the DTD case-sensitively.
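The create, fill and free cycle described by these predicates can be sketched as follows. This is a minimal example under stated assumptions: the doctype note, its three element declarations, and the helper names dtd_from_lines/3 and note_elements/1 are invented for illustration and are not part of the package.

```prolog
:- use_module(library(sgml)).

% dtd_from_lines(+DocType, +Lines, -DTD)
% Hypothetical helper: create an empty DTD and fill it by writing each
% declaration in Lines to the output stream obtained from open_dtd/3.
dtd_from_lines(DocType, Lines, DTD) :-
    new_dtd(DocType, DTD),
    setup_call_cleanup(
        open_dtd(DTD, [], Out),
        forall(member(Line, Lines),
               format(Out, "~w~n", [Line])),
        close(Out)).

% note_elements(-Elements)
% Build a small illustrative DTD, read back its element names and
% release the DTD again.
note_elements(Elements) :-
    dtd_from_lines(note,
                   [ "<!ELEMENT note - - (to, body)>",
                     "<!ELEMENT to   - - (#PCDATA)>",
                     "<!ELEMENT body - - (#PCDATA)>"
                   ],
                   DTD),
    dtd_property(DTD, elements(Elements)),
    free_dtd(DTD).
```

Calling note_elements(Es) should bind Es to the element names declared above. The order of the list is not specified here, so callers that need a fixed order may want to sort it.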
- dtd(+DocType, -DTD)
- Find the DTD representing the indicated doctype. This predicate uses a cache of DTD objects. If a doctype has no associated DTD, it searches for a file on the file search path dtd with the call:
..., absolute_file_name(dtd(Type), [ extensions([dtd]), access(read) ], DtdFile), ...
Note that DTD objects may be modified while processing erroneous documents. For example, loading an SGML document starting with <?xml ...?> switches the DTD to XML mode, and encountering unknown elements adds these elements to the DTD object. Re-using a DTD object to parse multiple documents should be restricted to situations where the documents processed are known to be error-free.
The DTD html is handled separately. The Prolog flag html_dialect specifies the default html dialect, which is either html4 or html5 (default). Note that HTML5 has no DTD; the loaded DTD is an informal DTD that includes most of the HTML5 extensions (http://www.cs.tut.fi/~jkorpela/html5-dtd.html). In addition, the parser sets the dialect flag of the DTD object. This is used by the parser to accept HTML extensions. Next, the corresponding DTD is loaded.
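A short sketch of looking up the html DTD through dtd/2 is shown here. The helper names html_dtd_has_element/1 and default_html_dialect/1 are hypothetical. The DTD is deliberately not freed, on the assumption that an object held in the dtd/2 cache should stay valid for later callers.

```prolog
:- use_module(library(sgml)).

% html_dtd_has_element(+Name)
% Hypothetical helper: fetch the (cached) html DTD via dtd/2 and check
% whether it declares an element called Name.
html_dtd_has_element(Name) :-
    dtd(html, DTD),
    dtd_property(DTD, elements(Elements)),
    memberchk(Name, Elements).

% default_html_dialect(-Dialect)
% Read the Prolog flag that selects which html DTD is loaded.
default_html_dialect(Dialect) :-
    current_prolog_flag(html_dialect, Dialect).
```

For example, html_dtd_has_element(body) is expected to succeed with either html dialect.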
- dtd_property(+DTD, ?Property)
- This predicate is used to examine the content of a DTD. Property is one of:
- doctype(DocType)
- An atom representing the document-type defined by this DTD.
- elements(ListOfElements)
- A list of atoms representing the names of the elements in this DTD.
- element(Name, Omit, Content)
- The DTD contains an element with the given name. Omit is a term of the format omit(OmitOpen, OmitClose), where both arguments are booleans (true or false) representing whether the open- or close-tag may be omitted. Content is the content-model of the element represented as a Prolog term. This term takes the following form:
- empty
- The element has no content.
- cdata
- The element contains non-parsed character data. All data up to the matching end-tag is included in the data (declared content).
- rcdata
- As cdata, but entity-references are expanded.
- any
- The element may contain any number of any element from the DTD in any order.
- #pcdata
- The element contains parsed character data.
- element
- An element with this name.
- *(SubModel)
- 0 or more appearances.
- ?(SubModel)
- 0 or one appearance.
- +(SubModel)
- 1 or more appearances.
- ,(SubModel1, SubModel2)
- SubModel1 followed by SubModel2.
- &(SubModel1, SubModel2)
- SubModel1 and SubModel2 in any order.
- |(SubModel1, SubModel2)
- SubModel1 or SubModel2.
- attributes(Element, ListOfAttributes)
- ListOfAttributes is a list of atoms representing the attributes of the element Element.
- attribute(Element, Attribute, Type, Default)
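- Query an attribute of an element. Type is one of cdata, entity, id, idref, name, nmtoken, notation, number or nutoken. For DTD types that allow for a list, the notation list(Type) is used. Finally, the DTD construct (a|b|...) is mapped to the term nameof(ListOfValues).
Default describes the sgml default. It is one of required, current, conref or implied. If a real default is present, it is one of default(Value) or fixed(Value).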
- entities(ListOfEntities)
- ListOfEntities is a list of atoms representing the names of the defined entities.
- entity(Name, Value)
- Name is the name of an entity with the given value. Value is one of:
- Atom
- If the value is atomic, it represents the literal value of the entity.
- system(Url)
- Url is the URL of the system external entity.
- public(Id, Url)
- For external public entities, Id is the identifier. If a URL is provided, it is returned in Url. Otherwise this argument is unbound.
- notations(ListOfNotations)
- Returns a list holding the names of all NOTATION declarations.
- notation(Name, Decl)
- Unify Decl with a list of system(+File) and/or public(+PublicId).
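The properties listed for dtd_property/2 can be explored with a sketch like the one below. The DTD content and the helper names sample_dtd/1 and describe_dtd/1 are assumptions made for illustration. The sketch also assumes that element/3, attribute/4 and entity/2 are queried with the names already bound, so it first asks for the corresponding name lists.

```prolog
:- use_module(library(sgml)).

% sample_dtd(-DTD)
% Build an illustrative DTD by writing declarations to the stream
% returned by open_dtd/3. All declarations are invented for this example.
sample_dtd(DTD) :-
    new_dtd(note, DTD),
    setup_call_cleanup(
        open_dtd(DTD, [], Out),
        forall(member(Line,
                      [ "<!ELEMENT note - - (to, body?)>",
                        "<!ELEMENT to   - O (#PCDATA)>",
                        "<!ELEMENT body - O (#PCDATA)>",
                        "<!ATTLIST note priority (low|high) low>",
                        "<!ENTITY copy \"(c)\">"
                      ]),
               format(Out, "~w~n", [Line])),
        close(Out)).

% describe_dtd(+DTD)
% Print element models, attribute declarations and entities.
describe_dtd(DTD) :-
    dtd_property(DTD, elements(Elements)),
    forall(( member(E, Elements),
             dtd_property(DTD, element(E, Omit, Content))
           ),
           format("element ~w: ~p, content ~p~n", [E, Omit, Content])),
    forall(( member(E2, Elements),
             dtd_property(DTD, attributes(E2, Attrs)),
             member(A, Attrs),
             dtd_property(DTD, attribute(E2, A, Type, Default))
           ),
           format("~w/@~w: type ~p, default ~p~n", [E2, A, Type, Default])),
    dtd_property(DTD, entities(Entities)),
    forall(( member(N, Entities),
             dtd_property(DTD, entity(N, Value))
           ),
           format("entity ~w = ~p~n", [N, Value])).
```

A typical session would be sample_dtd(DTD), describe_dtd(DTD), free_dtd(DTD). The printed terms follow the content-model and default notations described above.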
As this parser allows for processing partial documents and for processing the DTD separately, the DOCTYPE declaration plays a special role.
If a document has no DOCTYPE declaration, the parser returns a list holding all elements and CDATA found. If the document has a DOCTYPE declaration, the parser will open the element defined in the DOCTYPE as soon as the first real data is encountered.
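The difference between the two cases can be tried with a sketch such as the following. The helper parse_sgml_text/2 and both input fragments are assumptions made for illustration. The option syntax_errors(quiet) is used only to silence warnings about elements that have no declaration.

```prolog
:- use_module(library(sgml)).

% parse_sgml_text(+Text, -Content)
% Hypothetical helper: parse Text as SGML from an in-memory stream.
parse_sgml_text(Text, Content) :-
    setup_call_cleanup(
        open_string(Text, In),
        load_structure(stream(In), Content,
                       [ dialect(sgml),
                         syntax_errors(quiet)
                       ]),
        close(In)).

% No DOCTYPE: the result is a list of the elements and CDATA found.
without_doctype(Content) :-
    parse_sgml_text("<b>bold</b><i>italic</i>", Content).

% The DOCTYPE names note and the input contains only character data,
% so the parser is expected to open note when it meets that data.
with_doctype(Content) :-
    parse_sgml_text(
        "<!DOCTYPE note [<!ELEMENT note O O (#PCDATA)>]>hello",
        Content).
```

Comparing the results of without_doctype/1 and with_doctype/1 at the toplevel shows a flat list of sibling elements in the first case and a single enclosing note element in the second.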