- Skip all characters that match PadChars
- Read up to a character that matches SepChars or end of file
- Discard trailing characters that match PadChars from the collected input
- Unify String with a string created from the input and Sep with the code of the separator character read. If reading was terminated by the end of the input, Sep is unified with -1.
The predicate read_string/5 called repeatedly on an input until Sep is -1 (end of file) is equivalent to reading the entire file into a string and calling split_string/4, provided that SepChars and PadChars are not partially overlapping. (Behaviour that is fully compatible would require unlimited look-ahead.) Below are some examples:
Read a line:
read_string(Input, "\n", "\r", Sep, String)
Read a line, stripping leading and trailing white space:
read_string(Input, "\n", "\r\t ", Sep, String)
Read up to ‘,’ or ‘)’, unifying Sep with 0', (i.e. Unicode 44) or 0') (i.e. Unicode 41):
read_string(Input, ",)", "\t ", Sep, String)
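The equivalence with split_string/4 described above can be checked on a small input. The following is a minimal sketch, assuming SWI-Prolog 7 or later with the default double_quotes setting; the helper names read_fields/4 and demo/3 are hypothetical and introduced only for illustration. An in-memory stream created with open_string/2 stands in for a file.

```prolog
% read_fields(+Stream, +SepChars, +PadChars, -Fields)
% Hypothetical helper: call read_string/5 repeatedly until Sep is -1,
% collecting every string that was read.
read_fields(Stream, SepChars, PadChars, Fields) :-
    read_string(Stream, SepChars, PadChars, Sep, Field),
    (   Sep == -1                      % end of input reached
    ->  Fields = [Field]
    ;   Fields = [Field|Rest],
        read_fields(Stream, SepChars, PadChars, Rest)
    ).

% demo(+Text, -FromStream, -FromSplit)
% Hypothetical helper: split Text on "," with " " as padding, once by
% reading from a stream and once with split_string/4.
demo(Text, FromStream, FromSplit) :-
    setup_call_cleanup(
        open_string(Text, Stream),
        read_fields(Stream, ",", " ", FromStream),
        close(Stream)),
    split_string(Text, ",", " ", FromSplit).
```

Calling demo("a, b ,c", A, B) should bind both A and B to ["a","b","c"]. Here SepChars and PadChars share no characters, so the condition about partial overlap is satisfied.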
5.2.3 Why has the representation of double quoted text changed?
Prolog defines two forms of quoted text. Traditionally, single quoted text is mapped to atoms while double quoted text is mapped to a list of character codes (integers) or characters (atoms of length 1). Representing text using atoms is often considered inadequate for several reasons:
- It hides the conceptual difference between text and program symbols.
Whereas the content of text often matters because it is used in I/O, program
symbols are merely identifiers that match with the same symbol
elsewhere. Program symbols can often be consistently replaced, for
example to obfuscate or compact a program.
- Atoms are globally unique identifiers. They are stored in a shared
table. Volatile strings represented as atoms come at a significant price
due to the required cooperation between threads for creating atoms.
Reclaiming temporary atoms using atom garbage collection is a
costly process that requires significant synchronisation.
- Many Prolog systems (not SWI-Prolog) put severe restrictions on the length of atoms or the maximum number of atoms.
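The text representations discussed in this section can be compared side by side. The sketch below, which assumes SWI-Prolog 7 or later, converts one atom into a code list, a character list and a string using the standard conversion predicates; the helper name text_forms/4 is hypothetical.

```prolog
% text_forms(+Atom, -Codes, -Chars, -String)
% Hypothetical helper: express the same text in each representation.
text_forms(Atom, Codes, Chars, String) :-
    atom_codes(Atom, Codes),     % list of character codes (integers)
    atom_chars(Atom, Chars),     % list of one-character atoms
    atom_string(Atom, String).   % SWI-Prolog string object

% For text_forms(hello, Codes, Chars, String):
%   Codes is [104,101,108,108,111]
%   Chars is [h,e,l,l,o]
%   String satisfies string/1 and has length 5
```

Note that Codes is an ordinary list of integers and Chars an ordinary list of atoms; only String carries a distinct type that string/1 can test for.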
Representing text as lists, be it of character codes or characters, also comes at a price:
- It is not possible to distinguish (at runtime) a list of integers or
atoms from a string. Sometimes this information can be derived from
(implicit) typing. In other cases the list must be embedded in a
compound term to distinguish the two types. For example, s("hello world")
could be used to indicate that we are dealing with a string. Lacking runtime information, debuggers and the toplevel can only use heuristics to decide whether to print a list of integers as such or as a string (see