I am trying to implement Rabin-Karp substring search, and I got stuck at the rolling hash (I am trying to use the formula suggested on Wikipedia).
#include <stdio.h>
#include <string.h>
#include <math.h>   // pow() as posted is the double-precision one from <math.h>

#define MOD 1000000007

unsigned long long rolling_hash(const char *str)
{
    unsigned long long hash = 0;
    size_t str_len = strlen(str);
    for (int i = 0, k = str_len - 1; i < str_len; i++, k--) {
        hash = hash + str[i] * pow(257, k);
        // hash = hash % MOD;
    }
    return hash;
}
int main(void)
{
    printf("%llu\n", rolling_hash("TestString"));
    printf("%llu\n", rolling_hash("estStringh"));
    unsigned long long old = rolling_hash("TestString");
    // Add a character to the end:
    // since the last char in old was multiplied by 1, now multiply it by
    // the base and then add the _new_ character to the end
    old = old * 257 + 'h';
    //old = old % MOD;
    // Remove a char from the start:
    // simply remove the hash value of the first character
    old = old - 'T' * pow(257, 10);
    printf("\n%llu\n", old);
    return 0;
}
The code above works perfectly fine as long as I do not introduce any modulo operations. Once I uncomment my % operations, things break down: the hash I get by rolling the old hash no longer equals the one printed by the second printf.
After janisz's answer:
Changing the hash generator as janisz suggested got the modulo working when adding new characters, but NOT when removing old ones.
Note: I am using my own pow function that works with unsigned long long.
Recommended answer
The hash generator code is wrong. It should be
    hash = (hash*257 + str[i]) % MOD;
and uncomment the old_hash = old_hash % MOD; line. Also change the way you generate the new hash from the previous one:
    (old_hash + MOD - (to_delete_char * pow(257, str_len-1)) % MOD) % MOD;
(adding MOD before subtracting keeps the unsigned result from wrapping around below zero).
Take a look at your code. The first two lines are perfectly fine. What happens in the loop? First of all, you are doing as many multiplications as you possibly can. In my approach I use Horner's scheme to compute the hash, because the hash is a polynomial.
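A minimal sketch of the Horner-scheme generator described above. It assumes base 257 and the question's MOD of 1000000007. The name horner_hash and its explicit length parameter are illustrative; they are not from the answer.

```c
#include <stddef.h>

#define BASE 257ULL
#define MOD 1000000007ULL

/* Hash of the first len chars of s using Horner's scheme:
   h = (...((s[0]*BASE + s[1])*BASE + s[2])...)*BASE + s[len-1]  (mod MOD) */
unsigned long long horner_hash(const char *s, size_t len)
{
    unsigned long long h = 0;
    for (size_t i = 0; i < len; i++)
        h = (h * BASE + (unsigned char)s[i]) % MOD;  /* reduce after every step */
    return h;
}
```

Reducing after every step keeps h below MOD. As a result, h * 257 + 255 stays far below 2^64 no matter how long the string is, and only one multiplication is done per character.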
Why does it work without the modulus but not with it? I think it's a coincidence, because you overflow the integer after 8 characters (log(2^64)/log(257) ≈ 8).
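The 8-character estimate can be checked with a small sketch. The hypothetical max_exact_chars below counts how many consecutive powers base^0, base^1, ... fit in a 64-bit unsigned integer. For base 257 it returns 8: 257^7 ≈ 7.4 × 10^16 fits, but 257^8 ≈ 1.9 × 10^19 exceeds 2^64 - 1 ≈ 1.8 × 10^19.

```c
#include <stdint.h>

/* Number of exponents k = 0, 1, 2, ... for which base^k fits in a uint64_t.
   A string of n characters needs exponents 0..n-1, so this is the longest
   string whose powers of the base can be computed without overflow. */
unsigned max_exact_chars(uint64_t base)
{
    uint64_t p = 1;
    unsigned n = 1;                    /* base^0 = 1 always fits */
    while (p <= UINT64_MAX / base) {   /* p * base will not overflow */
        p *= base;
        n++;
    }
    return n;
}
```

This only counts the powers themselves. Multiplying the leading power by a character code of up to 255 shrinks the margin further.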
Now, what's wrong with removing characters: to_delete_char * pow(257, str_len); should be to_delete_char * pow(257, str_len-1); because the exponents should start from 0, not 1, to match your generator.
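A sketch of the remove-then-append order, assuming base 257 and MOD 1000000007. The names hash_str, high_weight and drop_then_append are hypothetical. The outgoing character is removed before the shift, so its weight is 257^(len-1), matching the exponents 0..len-1 the generator uses. Removal and append are two separate statements here. Adding MOD before the subtraction keeps the unsigned arithmetic from wrapping below zero.

```c
#include <stddef.h>

#define BASE 257ULL
#define MOD 1000000007ULL

/* Horner-form hash: s[0]*BASE^(n-1) + ... + s[n-1]  (mod MOD) */
unsigned long long hash_str(const char *s)
{
    unsigned long long h = 0;
    for (size_t i = 0; s[i] != '\0'; i++)
        h = (h * BASE + (unsigned char)s[i]) % MOD;
    return h;
}

/* Weight of the leftmost char in a window of len chars: BASE^(len-1) mod MOD */
unsigned long long high_weight(size_t len)
{
    unsigned long long w = 1;
    for (size_t i = 1; i < len; i++)
        w = w * BASE % MOD;
    return w;
}

/* Slide the window by one: drop `out` (leftmost), then append `in`. */
unsigned long long drop_then_append(unsigned long long h, unsigned long long high,
                                    unsigned char out, unsigned char in)
{
    h = (h + MOD - out * high % MOD) % MOD;  /* remove out * BASE^(len-1) */
    h = (h * BASE + in) % MOD;               /* shift left and append in  */
    return h;
}
```

For example, drop_then_append(hash_str("TestString"), high_weight(10), 'T', 'h') gives the same value as hash_str("estStringh").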
EDIT: I think the problem was in the pow function. As I wrote above, it overflows with just 8 characters; your example has 10, so it can't work.
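If pow is the culprit, one option is to reduce modulo MOD inside the power computation itself. This is a sketch, not the answer's code. The hypothetical pow_mod below uses square-and-multiply, assuming MOD = 1000000007. Every intermediate value stays below MOD, so each product is below MOD^2 < 2^60 and never overflows.

```c
#define MOD 1000000007ULL

/* base^exp mod MOD by square-and-multiply (O(log exp) multiplications) */
unsigned long long pow_mod(unsigned long long base, unsigned long long exp)
{
    unsigned long long result = 1;
    base %= MOD;
    while (exp > 0) {
        if (exp & 1)                      /* this bit of exp is set */
            result = result * base % MOD;
        base = base * base % MOD;         /* base^(2^k) for the next bit */
        exp >>= 1;
    }
    return result;
}
```

With it, the deletion term becomes to_delete_char * pow_mod(257, str_len - 1) % MOD.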
EDIT: It turns out that adding and removing a character must be done as one operation. Probably this is due to modular equivalence, but I'm not sure.
#include <stdio.h>
#include <string.h>

#define MOD 787

// x^y mod MOD by repeated multiplication; named mod_pow so it does not
// clash with pow() from the C standard library
unsigned long long mod_pow(int x, int y)
{
    unsigned long long ret = 1;
    for (int i = 0; i < y; i++)
        ret = (ret * x) % MOD;
    return ret;
}

unsigned long long rolling_hash(const char *str)
{
    unsigned long long hash = 0;
    size_t str_len = strlen(str);
    for (int i = 0, k = str_len - 1; i < str_len; i++, k--) {
        hash = hash + ((unsigned char)str[i] * mod_pow(257, k)) % MOD;
        hash = hash % MOD;
    }
    return hash;
}

int main(void)
{
    char input[] = "TestString";
    printf("Input: %llu\n", rolling_hash(input));
    printf("Expected: %llu\n", rolling_hash("estStringh"));
    unsigned long long old = rolling_hash(input);
    // Add a character to the end
    // and remove a char from the start
    unsigned long long h = ((unsigned char)input[0] * mod_pow(257, strlen(input))) % MOD;
    // adding MOD before subtracting h keeps the unsigned value from wrapping below zero
    old = ((old * 257) + 'h' + MOD - h) % MOD;
    printf("Actual: %llu\n", old);
    return 0;
}
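Putting the pieces together, here is a sketch of the full Rabin-Karp substring search the question is after. It assumes base 257 and MOD 1000000007, and rabin_karp is a hypothetical name. Equal hashes only mean the window might equal the pattern, since different strings can collide. For that reason memcmp confirms each candidate before returning.

```c
#include <string.h>

#define BASE 257ULL
#define MOD 1000000007ULL

/* Index of the first occurrence of pat in txt, or -1 if there is none. */
long rabin_karp(const char *txt, const char *pat)
{
    size_t n = strlen(txt), m = strlen(pat);
    if (m > n)
        return -1;
    if (m == 0)
        return 0;

    unsigned long long high = 1, hp = 0, ht = 0;
    for (size_t i = 0; i + 1 < m; i++)
        high = high * BASE % MOD;                  /* BASE^(m-1) */
    for (size_t i = 0; i < m; i++) {               /* hash pattern and first window */
        hp = (hp * BASE + (unsigned char)pat[i]) % MOD;
        ht = (ht * BASE + (unsigned char)txt[i]) % MOD;
    }

    for (size_t i = 0; ; i++) {
        /* window is txt[i .. i+m-1]; verify to rule out a hash collision */
        if (hp == ht && memcmp(txt + i, pat, m) == 0)
            return (long)i;
        if (i + m >= n)                            /* no next window */
            return -1;
        ht = (ht + MOD - (unsigned char)txt[i] * high % MOD) % MOD;  /* drop txt[i] */
        ht = (ht * BASE + (unsigned char)txt[i + m]) % MOD;          /* append next */
    }
}
```

For example, rabin_karp("TestStringh", "String") returns 4, and a pattern that does not occur returns -1.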