2 Porter Stem -- Determine stem and related routines
Availability: :- use_module(library(porter_stem)). (can be autoloaded)
tokenize_atom(+In, -TokenList)
Break the text In into words, numbers and punctuation characters. Tokens are created according to the following rules:

  Pattern                                   Token
  [-+][0-9]+(\.[0-9]+)?([eE][-+][0-9]+)?    number
  [:alpha:][:alnum:]+                       word
  [:space:]+                                skipped
  anything else                             single-character token
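To illustrate the rules above, the following sketch splits the token list into numeric tokens and everything else (words and single-character tokens). split_tokens/3 is a hypothetical helper, not part of library(porter_stem); it relies only on tokenize_atom/2 and on partition/4 from library(lists).

```prolog
:- use_module(library(porter_stem)).   % tokenize_atom/2
:- use_module(library(lists)).         % partition/4

%!  split_tokens(+Text, -Numbers, -Others) is det.
%
%   Hypothetical helper: tokenize Text with tokenize_atom/2 and
%   separate the tokens recognised as numbers from the remaining
%   tokens (words and single-character tokens). Token order is kept
%   within each list.
split_tokens(Text, Numbers, Others) :-
    tokenize_atom(Text, Tokens),
    partition(number, Tokens, Numbers, Others).
```

Usage: ?- split_tokens('width 12, height 3.5', Numbers, Others). Because recognised numbers are returned as Prolog numbers rather than atoms, number/1 is sufficient to tell them apart from word and punctuation tokens.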

Character classification is based on the C library function iswalnum() and related functions. Recognised numbers are converted using the Prolog reader (read/1), so integers are unbounded.
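Because numbers are converted by the Prolog reader, integers that exceed the machine word size survive tokenization intact. The sketch below sums all numeric tokens in a text; sum_numbers/2 is a hypothetical helper, and the example assumes a SWI-Prolog build with unbounded integer support.

```prolog
:- use_module(library(porter_stem)).   % tokenize_atom/2
:- use_module(library(apply)).         % include/3
:- use_module(library(lists)).         % sum_list/2

%!  sum_numbers(+Text, -Sum) is det.
%
%   Hypothetical helper: add up every token of Text that
%   tokenize_atom/2 recognised as a number. Large integers are
%   kept exact because no fixed-width conversion takes place.
sum_numbers(Text, Sum) :-
    tokenize_atom(Text, Tokens),
    include(number, Tokens, Numbers),
    sum_list(Numbers, Sum).
```

Usage: ?- sum_numbers('big 123456789012345678901234567890 plus 1', Sum).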

It is likely that future versions of this library will provide tokenize_atom/3 with additional options to modify space handling as well as the definition of words.
