Universal Levenshtein Automata for a Generalization of the Levenshtein Distance
Abstract
The need to efficiently find approximate matches for a given input string in a large background dictionary arises in many areas of computer science. In earlier work we introduced the concept of a universal Levenshtein automaton for a distance bound $n$. Given two arbitrary strings $v$ and $w$, we may use a sequence of bitstrings $\chi(v,w)$ obtained from $v$ and $w$ in a trivial way as input for the automaton. The automaton is deterministic. The sequence $\chi(v,w)$ is accepted iff the Levenshtein distance between $v$ and $w$ does not exceed $n$. We showed how universal Levenshtein automata can be used to efficiently select approximate matches in large dictionaries. In this paper we consider variants of the Levenshtein distance were substitutions may be blocked for specific symbol pairs. The concept of an universal Levenshtein automaton is extended to cope with this larger class of similarity measures.