StringAlgorithms
Documentation for StringAlgorithms.
StringAlgorithms.PalindromicAutomaton
StringAlgorithms.PrefixAutomaton
StringAlgorithms.SuffixAutomaton
Base.count
Base.findall
Base.findfirst
StringAlgorithms.freqcount
StringAlgorithms.inverselink
StringAlgorithms.longestcommonsubstring
StringAlgorithms.manacher
StringAlgorithms.maximumrotation
StringAlgorithms.minimumrotation
StringAlgorithms.nodeoriented
StringAlgorithms.prefix_function
StringAlgorithms.toposort
StringAlgorithms.z_algorithm
StringAlgorithms.PalindromicAutomaton
— MethodCreate an empty palindromic automaton of key type T
.
The Palindromic AutoMaton (PAM, also known as the palindrome tree) is a very useful data structure for solving palindrome-related problems. A detailed introduction to PAM can be found via:
To add new values into the automaton, simply use Base.push!
.
Getters for the internal fields of the automaton are provided, including:
children(pam)
: Get the children dictionary of the PAM.nodecount(pam)
: Get the number of nodes in the PAM.lastnodeindex(pam)
: Get the index of the last node of the PAM.len(pam, i)
: Get the palindrome length of the i-th node.fail(pam, i)
: Get the fail pointer of the i-th node.cnt(pam, i)
: Get the frequency count of the i-th node.
Sample usages
- Finding the length of the longest palindromic substring
function longest_palindromic_substring(s)
pam = PalindromicAutomaton{Char}()
hi = 0
for ch in s
push!(pam, ch)
hi = max(hi, len(pam, lastnodeindex(pam)))
end
return hi
end
longest_palindromic_substring("ddabababacc")
# output
7
StringAlgorithms.PrefixAutomaton
— MethodBuild an automaton based on the prefix function of a given vector and a fixed alphabet. This is useful when the vector to be searched within is very large.
StringAlgorithms.SuffixAutomaton
— MethodCreate an empty suffix automaton of key type T
.
The Suffix AutoMaton (SAM) is a very useful data structure for solving string-related problems. A detailed introduction to SAM can be found via:
Base.count
— Functioncount(pattern, sam)
count(pattern, sam, freq)
Count the number of occurrences of a pattern in the suffix automaton, including overlapping ones.
With a precalculated frequency array, this function can be extremely fast compared to Julia standar library's implementation.
julia> sam = SuffixAutomaton("abcdabdabcddabcabdab");
julia> freq = freqcount(sam);
julia> count("ab", sam, freq)
6
julia> count("da", sam, freq)
4
Base.findall
— Functionfindall(pattern, sam)
findall(pattern, sam, invlink; ascending)
Find all matches of a given pattern (including overlapping ones) in a suffix automaton, with the help of precalculated inverse links.
If the keyword argument ascending
is set to true
, the results will be sorted in place in the ascending order. Otherwise, the results will be unordered.
If you need to perform multiple searches on the same text, this function can be much faster than Julia standard library's implementation, especially when the pattern is long but does not occur in the string to be searched.
julia> sam = SuffixAutomaton("abcdabdabcddabcabdab");
julia> invlink = inverselink(sam);
julia> findall("ab", sam, invlink; ascending=true)
6-element Vector{UnitRange{Int64}}:
1:2
5:6
8:9
13:14
16:17
19:20
julia> findall("da", sam, invlink; ascending=true)
4-element Vector{UnitRange{Int64}}:
4:5
7:8
12:13
18:19
Base.findfirst
— Methodfindfirst(pattern, sam)
Find the first match of a given pattern in a suffix automaton.
If you need to perform multiple searches on the same text, this function can be much faster than Julia standard library's implementation, especially when the pattern is long but does not occur in the string to be searched.
julia> sam = SuffixAutomaton("abcdabdabcddabcabdab");
julia> findfirst("cda", sam)
3:5
julia> findfirst("cabd", sam)
15:18
StringAlgorithms.freqcount
— Functionfreqcount(sam)
freqcount(sam, topo)
Calculate the size of the end-position set of each node.
StringAlgorithms.inverselink
— Methodinverselink(sam)
Build an inverse graph from the links of a suffix automaton.
StringAlgorithms.longestcommonsubstring
— Methodlongestcommonsubstring(args; lengthonly)
Find the longest common non-empty substring/sub-vector of multiple strings/vectors.
When there are more than one with the same length, all distinct answers will be returned in a nondeterministic order.
julia> longestcommonsubstring("ababab", "bababa", "abaabb")
1-element Vector{String}:
"aba"
If keyword argument lengthonly
is set to true
, only the length of the longest common substring will be returned.
julia> longestcommonsubstring("ababab", "bababa", "abaabb"; lengthonly=true)
3
StringAlgorithms.manacher
— Methodmanacher(v)
The Manacher algorithm proposed by Glenn K. Manacher in 1975.
Given a vector of length n
, so that there will be 2n+1
positions if the intervals between each element are included, returns a vector of length 2n+1
denoting the longest palindrome centered at each position.
Sample usages
- Finding the length of the longest palindromic substring
function longest_palindromic_substring(s)
return maximum(manacher(s))
end
longest_palindromic_substring("ddabababacc")
# output
7
StringAlgorithms.maximumrotation
— Methodmaximumrotation(s)
Find the maximum of all rotations of the given string/vector.
For example, string aab
has three rotations:, aab
, aba
, baa
, and baa
is the maximum rotation.
julia> maximumrotation("aab")
"baa"
StringAlgorithms.minimumrotation
— Methodminimumrotation(s)
Find the minimum of all rotations of the given string/vector.
For example, string aab
has three rotations:, aab
, aba
, baa
, and aab
is the minimum rotation.
julia> minimumrotation("aab")
"aab"
StringAlgorithms.nodeoriented
— MethodTransform the internal representation of the suffix automaton into a more user-friendly node-oriented representation, which might be useful when users want to access some internal information.
StringAlgorithms.prefix_function
— Methodprefix_function(v)
For each prefix of the given vector, calculate the maximum length such that its nontrivial prefix (not itself) equals to its suffix.
For example, for string ababc
,
p[1] = 0
p[2] = 0
p[3] = 1
, since the prefixaba
has both prefixa
and suffixa
p[4] = 2
, since the prefixabab
has both prefixab
and suffixab
p[5] = 0
Sample usages
- Finding all matches (KMP algorithm)
# The following function finds all occurrences of the pattern in the given vector, including overlapping ones.
function findall(pattern, s)
n, m = length(s), length(pattern)
tmp = vcat(collect(pattern), [nothing], collect(s))
p = prefix_function(tmp)
return [i-m+1:i for i in 1:n if p[i+m+1] == m]
end
findall("aba", "ddabababacc")
# output
3-element Vector{UnitRange{Int64}}:
3:5
5:7
7:9
StringAlgorithms.toposort
— Methodtoposort(sam)
Find the topological order of all nodes in a suffix automaton. (Equivalent to sorting all nodes in the ascending order according to their length.)
StringAlgorithms.z_algorithm
— Methodz_algorithm(v)
Z Algorithm calculates the longest common prefix of each suffix of the given vector and the whole vector itself. Specially, the Z value of the whole vector is forced to be 0.
Sample usages
Usages of Z Algorithm are almost the same as the prefix function.
- Finding all matches
# The following function finds all occurrences of the pattern in the given vector, including overlapping ones.
function findall(pattern, s)
n, m = length(s), length(pattern)
tmp = vcat(collect(pattern), [nothing], collect(s))
z = z_algorithm(tmp)
return [i:i+m-1 for i in 1:n if z[i+m+1] == m]
end
findall("aba", "ddabababacc")
# output
3-element Vector{UnitRange{Int64}}:
3:5
5:7
7:9