5.2.1 Representing text: strings, atoms and code lists
With the introduction of strings as a Prolog data type, there are three main ways to represent text: using strings, using atoms and using lists of character codes. As a fourth way, one may also use lists of chars. This section explains what to choose for what purpose. Both strings and atoms are atomic objects: you can only look inside them using dedicated predicates. Lists of character codes or chars, by contrast, are compound data structures: ordinary lists whose elements must follow a convention to be read as text.
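The difference between the atomic and the list representations can be checked with type tests. The following is a minimal sketch, assuming SWI-Prolog 7 or later; `text_forms/4` and `demo_forms/0` are hypothetical names introduced only for illustration. The values are derived from an atom with conversion predicates, so the example does not depend on how the quoting flags are set.

```prolog
% text_forms(+Atom, -String, -Codes, -Chars)
% Hypothetical helper: derive the other three representations from an atom.
text_forms(Atom, String, Codes, Chars) :-
    atom_string(Atom, String),
    atom_codes(Atom, Codes),
    atom_chars(Atom, Chars).

% demo_forms succeeds if each representation has the expected type.
demo_forms :-
    text_forms(hello, String, Codes, Chars),
    atom(hello),                          % atoms are atomic
    string(String),                       % strings are a distinct type ...
    atomic(String),                       % ... and are atomic as well
    Codes == [104,101,108,108,111],       % a list of character codes
    Chars == [h,e,l,l,o],                 % a list of one-character atoms
    is_list(Codes),
    \+ atomic(Codes),                     % a non-empty list is compound
    sub_atom(hello, 0, 1, _, First),      % looking inside needs a predicate
    First == h.
```

Calling `demo_forms` should succeed; each goal in its body states one property of a representation.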
- Lists of character codes
- are what you need if you want to parse text using Prolog grammar rules (DCGs, see phrase/3). Most of the text reading predicates (e.g., read_line_to_codes/2) return a list of character codes because most applications need to parse these lines before the data can be processed. As said above, the back-quoted text notation (`hello`) can be used to easily specify a list of character codes. The 0'c notation can be used to specify a single character code.
- Atoms
- are identifiers. They are typically used where identity comparison is the main operation and the text is typically neither composed nor taken apart. Examples are RDF resources (URIs that identify something), system identifiers (e.g., 'Boeing 747'), but also individual words in a natural language processing system. They are also used where other languages would use enumerated types, such as the names of days in the week. Unlike enumerated types, Prolog atoms do not form a fixed set and the same atom can represent different things in different contexts.
- Strings
- typically represent text that is processed as a unit most of the time, but which is not an identifier for something. Format specifications for format/3 are a good example. Another example is a descriptive text provided in an application. Strings may be composed and decomposed using, e.g., string_concat/3 and sub_string/5, converted for parsing using string_codes/2, or created from codes generated by a generative grammar rule, also using string_codes/2.
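The division of labour described above can be sketched as follows, assuming SWI-Prolog 7 or later. All predicate and nonterminal names (`parse_setting/3`, `setting//2`, `setting_text/3` and their helpers) are hypothetical and exist only for this illustration. Text held as a unit is converted to codes with string_codes/2, parsed with a DCG, and the resulting key is stored as an atom because it acts as an identifier. The reverse direction composes a string with string_concat/3.

```prolog
% parse_setting(+Text, -Key, -Value)
% Convert text such as 'width=42' to codes, then parse it with a DCG.
parse_setting(Text, Key, Value) :-
    string_codes(Text, Codes),
    phrase(setting(Key, Value), Codes).

% setting(-Key, -Value)// matches lowercase letters, '=', then digits.
setting(Key, Value) -->
    lowers(Ls), [0'=], digits(Ds),
    { atom_codes(Key, Ls),          % the key is an identifier: use an atom
      number_codes(Value, Ds) }.    % the value becomes an integer

lowers([C|Cs]) --> lower(C), lowers_rest(Cs).
lowers_rest([C|Cs]) --> lower(C), !, lowers_rest(Cs).
lowers_rest([]) --> [].
lower(C) --> [C], { C >= 0'a, C =< 0'z }.

digits([D|Ds]) --> digit(D), digits_rest(Ds).
digits_rest([D|Ds]) --> digit(D), !, digits_rest(Ds).
digits_rest([]) --> [].
digit(D) --> [D], { D >= 0'0, D =< 0'9 }.

% setting_text(+Key, +Value, -String)
% Compose the textual form again; the result is a string.
setting_text(Key, Value, String) :-
    string_concat(Key, '=', Prefix),
    string_concat(Prefix, Value, String).
```

For example, `parse_setting('width=42', Key, Value)` should bind `Key` to the atom `width` and `Value` to the integer `42`, and `setting_text(width, 42, S)` should bind `S` to a string of eight characters whose first five can be taken out again with sub_string/5.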