- Skip all characters that match PadChars
- Read up to a character that matches SepChars or end of file
- Discard trailing characters that match PadChars from the collected input
- Unify String with a string created from the input and Sep with the code of the separator character read. If input was terminated by the end of the input, Sep is unified with -1.
Calling read_string/5 repeatedly on an input until Sep is -1 (end of file) is equivalent to reading the entire file into a string and calling split_string/4, provided that SepChars and PadChars are not partially overlapping. (Fully compatible behaviour would require unlimited look-ahead.) Below are some examples:
Read a line:
read_string(Input, "\n", "\r", Sep, String)
Read a line, stripping leading and trailing white space:
read_string(Input, "\n", "\r\t ", Sep, String)
Read up to "," or ")", unifying Sep with 0', (i.e. Unicode 44) or 0') (i.e. Unicode 41):
read_string(Input, ",)", "\t ", Sep, String)
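The equivalence with split_string/4 described above can be sketched as follows. This is a minimal illustration, not part of the manual: read_fields/4 and read_demo/2 are hypothetical helper names, and open_string/2 is used only to obtain an input stream from an in-memory string.

```prolog
% read_fields(+Stream, +SepChars, +PadChars, -Fields)
% Hypothetical helper: call read_string/5 repeatedly until Sep is -1
% (end of input), collecting each String into the list Fields.
read_fields(Stream, SepChars, PadChars, Fields) :-
    read_string(Stream, SepChars, PadChars, Sep, String),
    (   Sep == -1
    ->  Fields = [String]
    ;   Fields = [String|Rest],
        read_fields(Stream, SepChars, PadChars, Rest)
    ).

% read_demo(-Fields, -Split)
% Read the same text once through a stream and once with split_string/4.
% SepChars (",") and PadChars (" ") do not overlap here.
read_demo(Fields, Split) :-
    Text = "a, b ,c",
    setup_call_cleanup(
        open_string(Text, In),
        read_fields(In, ",", " ", Fields),
        close(In)),
    split_string(Text, ",", " ", Split).
```

Under these assumptions, calling read_demo(Fields, Split) should bind both variables to the same list of three strings.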
Prolog defines two forms of quoted text. Traditionally, single quoted text is mapped to atoms while double quoted text is mapped to a list of character codes (integers) or characters (atoms of length 1). Representing text using atoms is often considered inadequate for several reasons:
- It hides the conceptual difference between text and program symbols.
Whereas the content of text often matters because it is used in I/O, program
symbols are merely identifiers that match with the same symbol
elsewhere. Program symbols can often be consistently replaced, for
example to obfuscate or compact a program.
- Atoms are globally unique identifiers. They are stored in a shared
table. Volatile strings represented as atoms come at a significant price
due to the required cooperation between threads for creating atoms.
Reclaiming temporary atoms using atom garbage collection is a
costly process that requires significant synchronisation.
- Many Prolog systems (not SWI-Prolog) put severe restrictions on the length of atoms or the maximum number of atoms.
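As a small illustration of the representations discussed here, the sketch below converts one atom into a code list, a character list and a string object. The predicate name representations/4 is hypothetical; atom_codes/2, atom_chars/2 and atom_string/2 are standard conversion predicates.

```prolog
% representations(-Atom, -Codes, -Chars, -String)
% Hypothetical helper showing the same text in four forms.
representations(Atom, Codes, Chars, String) :-
    Atom = 'hello world',          % single quoted text: an atom
    atom_codes(Atom, Codes),       % list of character codes (integers)
    atom_chars(Atom, Chars),       % list of characters (atoms of length 1)
    atom_string(Atom, String).     % string object
```

Only the atom form is entered into the shared atom table; the two lists and the string are ordinary terms.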
Representing text as lists, be it of character codes or characters, also comes at a price:
- It is not possible to distinguish (at runtime) a list of integers or
atoms from a string. Sometimes this information can be derived from
(implicit) typing. In other cases the list must be embedded in a
compound term to distinguish the two types. For example,
s("hello world") could be used to indicate that we are dealing with a string.
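The following sketch illustrates the point about runtime information. It assumes the s/1 wrapper mentioned above as a convention; text_kind/2 and kind_demo/3 are hypothetical names introduced only for this example.

```prolog
% text_kind(+Term, -Kind)
% Hypothetical classifier that relies only on runtime type checks.
text_kind(Term, string)  :- string(Term), !.
text_kind(s(_), wrapped) :- !.     % the s/1 wrapper convention
text_kind(Term, list)    :- is_list(Term), !.
text_kind(_, other).

% kind_demo(-K1, -K2, -K3)
kind_demo(K1, K2, K3) :-
    atom_codes(hello, Codes),      % Codes is a plain list of integers
    text_kind(Codes, K1),          % list: text or numbers? Cannot tell.
    text_kind(s(Codes), K2),       % wrapped: marked as text by convention
    string_codes(String, Codes),
    text_kind(String, K3).         % string: the type is known at runtime
```

A bare code list is classified only as a list, so code that receives it cannot tell text from a list of small integers without the wrapper or a string object.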
Lacking runtime information, debuggers and the toplevel can only use heuristics to decide whether to print a list of integers as such or as a string (see