2 Bluffer's Guide
This package allows you to parse SGML, XML and HTML data into a
Prolog data structure. The high-level interface defined in library(sgml)
provides access at the file level, while the low-level interface defined
in the foreign module works with Prolog streams. Please use the source
of sgml.pl as a starting point for dealing with data from
sources other than files, such as SWI-Prolog resources, network sockets,
character strings, etc. The first example below loads an HTML
file.
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN">

<html>
<head>
<title>Demo</title>
</head>
<body>

<h1 align=center>This is a demo</h1>

Paragraphs in HTML need not be closed.

This is called `omitted-tag' handling.
</body>
</html>
?- load_html('test.html', Term, []),
   print_term(Term, []).
[ element(html,
          [],
          [ element(head,
                    [],
                    [ element(title,
                              [],
                              [ 'Demo'
                              ])
                    ]),
            element(body,
                    [],
                    [ '\n',
                      element(h1,
                              [ align = center
                              ],
                              [ 'This is a demo'
                              ]),
                      '\n\n',
                      element(p,
                              [],
                              [ 'Paragraphs in HTML need not be closed.\n'
                              ]),
                      element(p,
                              [],
                              [ 'This is called `omitted-tag\' handling.'
                              ])
                    ])
          ])
].
The document is represented as a list, each element being either an atom
representing CDATA or a term element(Name, Attributes,
Content). Entities (e.g. &lt;) are expanded and
included in the atom representing the element content or attribute
value. (Up to SWI-Prolog 5.4.x,
Prolog could not represent wide characters, and entities that
did not fit in the Prolog character set were emitted as a term number(+Code).
With the introduction of wide characters in the 5.5 branch this is no
longer needed.)
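Because the parsed document is an ordinary Prolog list of atoms and element/3 terms, it can be processed with plain recursion. The following is a minimal sketch, not part of the library: the predicate names dom_text/2 and cdata//1 are hypothetical helpers introduced here for illustration. It collects all character data of a content list in document order.

```prolog
% dom_text(+Content, -Text)
% Hypothetical helper, not part of library(sgml).
% Text is the atom formed by concatenating, in document order, all
% character data found in Content, a list of atoms and element/3 terms.
dom_text(Content, Text) :-
    phrase(cdata(Content), Atoms),
    atomic_list_concat(Atoms, Text).

% cdata(+Content)// emits the CDATA items of Content as a list.
cdata([]) --> [].
cdata([element(_Name, _Attrs, Children)|T]) --> !,
    cdata(Children),          % descend into the element content
    cdata(T).
cdata([Atom|T]) -->
    [Atom],                   % a CDATA item: keep it
    cdata(T).
```

For example, the query ?- dom_text([element(h1, [align=center], ['This is a demo'])], T). binds T to 'This is a demo'. Attributes are ignored in this sketch; they are available as Name = Value pairs in the second argument of element/3.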
2.1 ‘Goodies’ Predicates
These predicates are for basic use of the library, converting entire and self-contained files in SGML, HTML, or XML into a structured term. They are based on load_structure/3.
- load_sgml(+Source, -ListOfContent, :Options)
  Calls load_structure/3 with the given Options, using the default option dialect(sgml).
- load_xml(+Source, -ListOfContent, :Options)
  Calls load_structure/3 with the given Options, using the default option dialect(xml).
- load_html(+Source, -ListOfContent, :Options)
  Calls load_structure/3 with the given Options, using the default option dialect(HTMLDialect), where HTMLDialect is html4 or html5 (default), depending on the Prolog flag html_dialect. Both imply the option shorttag(false). The option dtd(DTD) is passed, where DTD is the HTML DTD as obtained using dtd(html, DTD). See dtd/2.
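These predicates can also be pointed at data that does not live in a file. The sketch below parses XML held in a Prolog string by wrapping it in a stream. It assumes that Source may be given as stream(Stream), as accepted by load_structure/3, and that open_string/2 is available (SWI-Prolog 7 or later); the predicate name parse_xml_text/2 is a hypothetical helper, not part of the library.

```prolog
:- use_module(library(sgml)).

% parse_xml_text(+Text, -DOM)
% Hypothetical helper: parse XML held in the string Text into DOM.
% Assumes load_xml/3 accepts stream(Stream) as Source and that
% open_string/2 exists (SWI-Prolog 7 or later).
parse_xml_text(Text, DOM) :-
    setup_call_cleanup(
        open_string(Text, In),           % read the string as a stream
        load_xml(stream(In), DOM, []),   % dialect(xml) is the default here
        close(In)).                      % always release the stream
```

A query such as ?- parse_xml_text("<a x='1'>hi</a>", DOM). should bind DOM to a one-element list holding an element(a, Attributes, Content) term. The same pattern applies to load_sgml/3 and load_html/3.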