- stable
- Unicode Versioning Stability has to be respected.
- compat
- Compatibility decomposition (i.e. formatting information is lost)
- compose
- Return a result with composed characters.
- decompose
- Return a result with decomposed characters.
- ignore
- Strip "default ignorable characters" such as SOFT HYPHEN (U+00AD) or ZERO WIDTH SPACE (U+200B).
- rejectna
- Return an error if the input contains unassigned code points.
- nlf2ls
- Indicates that NLF-sequences (LF, CRLF, CR, NEL) represent a line break and should be converted to the Unicode character for line separation (LS).
- nlf2ps
- Indicates that NLF-sequences represent a paragraph break and should be converted to the Unicode character for paragraph separation (PS).
- nlf2lf
- Indicates that the meaning of NLF-sequences is unknown.
- stripcc
- Strips and/or converts control characters. NLF-sequences are transformed into space, except if one of the NLF2LS/PS/LF options is given. HorizontalTab (HT) and FormFeed (FF) are treated as an NLF-sequence in this case. All other control characters are simply removed.
- casefold
- Performs Unicode case folding, to allow case-insensitive string comparison.
- charbound
- Inserts 0xFF bytes at the beginning of each sequence that represents a single grapheme cluster (see UAX#29).
- lump
- Lumps certain characters together (e.g. HYPHEN U+2010 and MINUS U+2212 to ASCII "-"). (See module header for details.) If NLF2LF is set, this includes a transformation of paragraph and line separators to ASCII line-feed (LF).
- stripmark
- Strips all character markings (non-spacing, spacing and enclosing), i.e. accents. NOTE: this option works only with compose or decompose.
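
As a rough illustration of how a few of these options interact, the sketch below uses Python's standard `unicodedata` module rather than this library's own API. The helper `normalize_sketch` and its keyword arguments are hypothetical names chosen to mirror the options above. It covers only compose, decompose, compat, casefold and stripmark, and it assumes that compat and stripmark take effect only together with compose or decompose; the real implementation may differ in detail.

```python
import unicodedata


def normalize_sketch(text, *, compose=False, decompose=False,
                     compat=False, casefold=False, stripmark=False):
    """Hypothetical helper approximating some of the options listed above."""
    if compose and decompose:
        raise ValueError("compose and decompose are mutually exclusive")
    if stripmark and not (compose or decompose):
        # Mirrors the NOTE on stripmark: it needs a decomposition pass.
        raise ValueError("stripmark requires compose or decompose")

    if casefold:
        # Unicode case folding for case-insensitive comparison.
        text = text.casefold()

    if compose or decompose:
        # Decompose first; compat selects the compatibility mappings,
        # which discard formatting information such as ligatures.
        text = unicodedata.normalize("NFKD" if compat else "NFD", text)
        if stripmark:
            # Drop every code point whose general category is a mark
            # (Mn non-spacing, Mc spacing, Me enclosing).
            text = "".join(
                ch for ch in text
                if not unicodedata.category(ch).startswith("M")
            )
        if compose:
            # Recombine base characters with any remaining marks.
            text = unicodedata.normalize("NFC", text)

    return text


if __name__ == "__main__":
    print(normalize_sketch("e\u0301", compose=True) == "\u00e9")            # True
    print(normalize_sketch("\ufb01", compose=True, compat=True))            # fi
    print(normalize_sketch("caf\u00e9", decompose=True, stripmark=True))    # cafe
```

Decomposing before stripping marks is what makes stripmark work on precomposed input: a character such as U+00E9 carries no separate mark until it has been split into a base letter and a combining accent.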