/home/swipl/lib/swipl/library/isub.pl
isub.pl -- isub: a string similarity measure
 isub(+Text1:text, +Text2:text, -Similarity:float, +Options:list) is det
Similarity is a measure of the similarity/dissimilarity between Text1 and Text2. For example:
?- isub('E56.Language', 'languange', D, [normalize(true)]).
D = 0.4226950354609929.                       % [-1,1] range

?- isub('E56.Language', 'languange', D, [normalize(true),zero_to_one(true)]).
D = 0.7113475177304964.                       % [0,1] range

?- isub('E56.Language', 'languange', D, []).  % without normalization
D = 0.19047619047619047.                      % [-1,1] range

?- isub(aa, aa, D, []).  % does not work for short substrings
D = -0.8.

?- isub(aa, aa, D, [substring_threshold(0)]). % works with short substrings
D = 1.0.                                      % but may give unwanted values
                                              % between e.g. 'store' and 'spore'.

?- isub(joe, hoe, D, [substring_threshold(0)]).
D = 0.5315315315315314.

?- isub(joe, hoe, D, []).
D = -1.0.
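
The queries above compare one pair of texts at a time. As a minimal sketch of a common use, the helper below picks the closest candidate from a list. The predicate best_match/4 is hypothetical and not part of library(isub); the choice of normalize(true) and zero_to_one(true) is an assumption made for illustration only.

```prolog
:- use_module(library(isub)).
:- use_module(library(aggregate)).
:- use_module(library(lists)).

%!  best_match(+Text, +Candidates, -Best, -Score) is semidet.
%
%   Hypothetical helper: Best is the member of Candidates with the
%   highest isub/4 similarity to Text and Score is that similarity
%   in the [0,1] range. Fails if Candidates is empty.
best_match(Text, Candidates, Best, Score) :-
    aggregate_all(max(S, C),
                  ( member(C, Candidates),
                    % Literal option list, so the options are known
                    % when the clause is compiled.
                    isub(Text, C, S,
                         [normalize(true), zero_to_one(true)])
                  ),
                  max(Score, Best)).
```

A query such as `?- best_match('E56.Language', [languange, person, place], Best, Score).` would then bind Best to whichever candidate scores highest for the given options.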

This is a new version of isub/4 that replaces the old version while remaining backwards compatible. It adds several options for tuning the algorithm.

Arguments:
Text1, Text2 - each is an atom, a string, or a list of characters or character codes.
Similarity - a float in the range [-1.0,1.0], where 1.0 means most similar. The range can be changed to [0,1] with the zero_to_one option described below.
Options - a list of the elements described below. Note that the options are processed at compile time using goal_expansion, which makes calls much faster. Supported options are:
normalize(+Boolean)
Applies string normalization as implemented by the original authors: Text1 and Text2 are mapped to lowercase and the characters "._ " are removed. Lowercase mapping is done with the C-library function towlower(). In general, the required normalization is domain dependent and is better left to the caller; see, for example, unaccent_atom/2. The default is to skip normalization (false).
zero_to_one(+Boolean)
The old isub implementation deviated from the original algorithm by returning a value in the [0,1] range. This new isub/4 implementation defaults to the original range of [-1,1], but this option can be set to true to set the output range to [0,1].
substring_threshold(+Nonneg)
The original algorithm was meant to compare terms in semantic web ontologies, and it had a hard-coded parameter that only considered substring matches longer than 2 characters. As a result, comparing, for example, 'aa' with 'aa' returns -0.8, which is not what one would expect. This option lets the user set any threshold, such as 0, so that the similarity between short substrings is properly recognized. The default value is 2, which is what the original algorithm used.
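
The options above can be combined. The sketch below illustrates caller-side normalization, as the normalize option recommends, together with zero_to_one and substring_threshold. The predicate similar_text/3 is hypothetical, and using downcase_atom/2 as the only normalization step is an assumption; a real application would choose a normalization that fits its domain. The sketch also assumes its inputs are atoms or strings.

```prolog
:- use_module(library(isub)).

%!  similar_text(+A, +B, +MinScore) is semidet.
%
%   Hypothetical helper: true if the isub/4 similarity of A and B,
%   reported in the [0,1] range, is at least MinScore.
similar_text(A, B, MinScore) :-
    % Caller-side normalization: lowercase only. The normalize
    % option of isub/4 is left at its default (false).
    downcase_atom(A, LowerA),
    downcase_atom(B, LowerB),
    % substring_threshold(0) lets short texts such as 'aa' match.
    isub(LowerA, LowerB, Score,
         [zero_to_one(true), substring_threshold(0)]),
    Score >= MinScore.
```

The option list is written as a literal at the call site so that the options are available when the clause is compiled. MinScore is a tuning value for the caller to pick; nothing in the library prescribes one.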