Rabin-Karp Algorithm
The Rabin-Karp algorithm is a method of finding all occurrences of a sub-string in a given string. The naive approach to this solution would be to use the Sliding-window technique. Rabin-Karp has a similar approach, but instead of comparing the window to the sub-string, it instead compares hash values using a rolling hash function.
Complexity
It has an average/best case time complexity of $O(n+m)$ and a worse case time complexity of $O(n * m)$. The worst case would occur if all the calculated hash values of the text were the same as the calculated hash of the pattern. It has auxiliary complexity of $O(1)$.
Hashing function
It is commonly recommended that the hashing value should be an integer. If a weak hashing function is used, this could potentially lead to the worst case scenario and cause multiple spurious hits.
A spurious hit occurs when the hash value of the pattern matches the hash value of the currently observed sub-string, but when compared they do not match one another.
The hashing function suggested by Rabin-Karp is \(p[0]*10^{m-1} + p[1]*10^{m-2} + ... p[m-1]*10^0\)
It should be acknowledged that for larger character sets and larger patterns, this hashing function is vulnerable to integer overflows. To account for this, the modulus of the hashing value can be taken using a prime number.
Examples
python
alphabetLength = 256
primeNumber = 16777619
def search(pattern, text):
patternLength = len(pattern)
textLength = len(text)
patternHash = 0
textHash = 0
fingerprint = 1
i = 0
j = 0
# setting the fingerprint
for i in range(patternLength-1):
fingerprint = (
fingerprint * alphabetLength
) % primeNumber
# pre-processing pattern and text hash values
for i in range(patternLength):
patternHash = (
alphabetLength * patternHash + ord(pattern[i])
) % primeNumber
textHash = (
alphabetLength * textHash + ord(text[i])
) % primeNumber
# search the text
for i in range(textLength - patternLength+1):
# check the characters manually when hit occurs
if patternHash == textHash:
for j in range(patternLength):
if text[i + j] != pattern[j]:
break
j += 1
if j == patternLength:
print("Pattern found at: " + str(i+1))
# roll the hash forward
if i < textLength-patternLength:
textHash = (
alphabetLength * (
textHash - ord(text[i]) * fingerprint
) + ord(text[i + patternLength])
) % primeNumber
if textHash < 0:
textHash = textHash + primeNumber
go
func search(pattern, text string) []int {
results := []int{}
patternLength := len(pattern)
textLength := len(text)
patternHash := 0
textHash := 0
fingerprint := 1
i := 0
j := 0
// setting the fingerprint
for i = 0; i < patternLength - 1; i++ {
fingerprint = (fingerprint * alphabetLength) % primeNumber
}
// pre-processing pattern and text hash values
for i = 0; i < patternLength; i++ {
patternHash = (alphabetLength*patternHash +
int(pattern[i])) % primeNumber
textHash = (alphabetLength*textHash +
int(text[i])) % primeNumber
}
// search the text
for i = 0; i <= textLength-patternLength; i++ {
// check the characters manually when hit occurs
if patternHash == textHash {
for j = 0; j < patternLength; j++ {
if text[i+j] != pattern[j] {
break
}
}
if j == patternLength {
results = append(results, i)
}
}
// roll the hash forward
if i < textLength-patternLength {
textHash = (alphabetLength * (textHash -
int(text[i])*fingerprint) +
int(text[i+patternLength])) % primeNumber
if textHash < 0 {
textHash = textHash + primeNumber
}
}
}
return results
}