StringAlgorithms

Documentation for StringAlgorithms.

StringAlgorithms.PalindromicAutomatonMethod

Create an empty palindromic automaton of key type T.

The Palindromic AutoMaton (PAM, also known as the palindrome tree) is a very useful data structure for solving palindrome-related problems. A detailed introduction to PAM can be found via:

To add new values into the automaton, simply use Base.push!.

Getters for the internal fields of the automaton are provided, including:

  • children(pam): Get the children dictionary of the PAM.
  • nodecount(pam): Get the number of nodes in the PAM.
  • lastnodeindex(pam): Get the index of the last node of the PAM.
  • len(pam, i): Get the palindrome length of the i-th node.
  • fail(pam, i): Get the fail pointer of the i-th node.
  • cnt(pam, i): Get the frequency count of the i-th node.

Sample usages

  • Finding the length of the longest palindromic substring
function longest_palindromic_substring(s)
    pam = PalindromicAutomaton{Char}()
    hi = 0
    for ch in s
        push!(pam, ch)
        hi = max(hi, len(pam, lastnodeindex(pam)))
    end
    return hi
end

longest_palindromic_substring("ddabababacc")

# output

7
source
StringAlgorithms.PrefixAutomatonMethod

Build an automaton based on the prefix function of a given vector and a fixed alphabet. This is useful when the vector to be searched within is very large.

source
Base.countFunction
count(pattern, sam)
count(pattern, sam, freq)

Count the number of occurrences of a pattern in the suffix automaton, including overlapping ones.

With a precalculated frequency array, this function can be extremely fast compared to Julia standar library's implementation.

julia> sam = SuffixAutomaton("abcdabdabcddabcabdab");

julia> freq = freqcount(sam);

julia> count("ab", sam, freq)
6

julia> count("da", sam, freq)
4
source
Base.findallFunction
findall(pattern, sam)
findall(pattern, sam, invlink; ascending)

Find all matches of a given pattern (including overlapping ones) in a suffix automaton, with the help of precalculated inverse links.

If the keyword argument ascending is set to true, the results will be sorted in place in the ascending order. Otherwise, the results will be unordered.

If you need to perform multiple searches on the same text, this function can be much faster than Julia standard library's implementation, especially when the pattern is long but does not occur in the string to be searched.

julia> sam = SuffixAutomaton("abcdabdabcddabcabdab");

julia> invlink = inverselink(sam);

julia> findall("ab", sam, invlink; ascending=true)
6-element Vector{UnitRange{Int64}}:
 1:2
 5:6
 8:9
 13:14
 16:17
 19:20

julia> findall("da", sam, invlink; ascending=true)
4-element Vector{UnitRange{Int64}}:
 4:5
 7:8
 12:13
 18:19
source
Base.findfirstMethod
findfirst(pattern, sam)

Find the first match of a given pattern in a suffix automaton.

If you need to perform multiple searches on the same text, this function can be much faster than Julia standard library's implementation, especially when the pattern is long but does not occur in the string to be searched.

julia> sam = SuffixAutomaton("abcdabdabcddabcabdab");

julia> findfirst("cda", sam)
3:5

julia> findfirst("cabd", sam)
15:18
source
StringAlgorithms.longestcommonsubstringMethod
longestcommonsubstring(args; lengthonly)

Find the longest common non-empty substring/sub-vector of multiple strings/vectors.

When there are more than one with the same length, all distinct answers will be returned in a nondeterministic order.

julia> longestcommonsubstring("ababab", "bababa", "abaabb")
1-element Vector{String}:
 "aba"

If keyword argument lengthonly is set to true, only the length of the longest common substring will be returned.

julia> longestcommonsubstring("ababab", "bababa", "abaabb"; lengthonly=true)
3
source
StringAlgorithms.manacherMethod
manacher(v)

The Manacher algorithm proposed by Glenn K. Manacher in 1975.

Given a vector of length n, so that there will be 2n+1 positions if the intervals between each element are included, returns a vector of length 2n+1 denoting the longest palindrome centered at each position.

Sample usages

  • Finding the length of the longest palindromic substring
function longest_palindromic_substring(s)
    return maximum(manacher(s))
end

longest_palindromic_substring("ddabababacc")

# output

7
source
StringAlgorithms.maximumrotationMethod
maximumrotation(s)

Find the maximum of all rotations of the given string/vector.

For example, string aab has three rotations:, aab, aba, baa, and baa is the maximum rotation.

julia> maximumrotation("aab")
"baa"
source
StringAlgorithms.minimumrotationMethod
minimumrotation(s)

Find the minimum of all rotations of the given string/vector.

For example, string aab has three rotations:, aab, aba, baa, and aab is the minimum rotation.

julia> minimumrotation("aab")
"aab"
source
StringAlgorithms.nodeorientedMethod

Transform the internal representation of the suffix automaton into a more user-friendly node-oriented representation, which might be useful when users want to access some internal information.

source
StringAlgorithms.prefix_functionMethod
prefix_function(v)

For each prefix of the given vector, calculate the maximum length such that its nontrivial prefix (not itself) equals to its suffix.

For example, for string ababc,

  • p[1] = 0
  • p[2] = 0
  • p[3] = 1, since the prefix aba has both prefix a and suffix a
  • p[4] = 2, since the prefix abab has both prefix ab and suffix ab
  • p[5] = 0

Sample usages

  • Finding all matches (KMP algorithm)
# The following function finds all occurrences of the pattern in the given vector, including overlapping ones.
function findall(pattern, s)
    n, m = length(s), length(pattern)
    tmp = vcat(collect(pattern), [nothing], collect(s))
    p = prefix_function(tmp)
    return [i-m+1:i for i in 1:n if p[i+m+1] == m]
end

findall("aba", "ddabababacc")

# output

3-element Vector{UnitRange{Int64}}:
 3:5
 5:7
 7:9
source
StringAlgorithms.toposortMethod
toposort(sam)

Find the topological order of all nodes in a suffix automaton. (Equivalent to sorting all nodes in the ascending order according to their length.)

source
StringAlgorithms.z_algorithmMethod
z_algorithm(v)

Z Algorithm calculates the longest common prefix of each suffix of the given vector and the whole vector itself. Specially, the Z value of the whole vector is forced to be 0.

Sample usages

Usages of Z Algorithm are almost the same as the prefix function.

  • Finding all matches
# The following function finds all occurrences of the pattern in the given vector, including overlapping ones.
function findall(pattern, s)
    n, m = length(s), length(pattern)
    tmp = vcat(collect(pattern), [nothing], collect(s))
    z = z_algorithm(tmp)
    return [i:i+m-1 for i in 1:n if z[i+m+1] == m]
end

findall("aba", "ddabababacc")

# output

3-element Vector{UnitRange{Int64}}:
 3:5
 5:7
 7:9
source