


I am trying to implement Rabin-Karp substring search, and I got stuck on the rolling hash (I am trying to use the formula suggested on Wikipedia).

#include <stdio.h>
#include <string.h>

#define MOD 1000000007

/* pow() here is my own version that works with unsigned long long,
   not the floating-point one from math.h */
unsigned long long rolling_hash(const char *str)
{
        unsigned long long hash = 0;
        size_t str_len = strlen(str);
        for(int i = 0, k = str_len -1; i < str_len; i++, k--) {
                hash = hash + str[i] * pow(257, k);
        //      hash = hash % MOD;
        }
        return hash;
}

int main(void)
{
        printf("%llu\n", rolling_hash("TestString"));
        printf("%llu\n", rolling_hash("estStringh"));
        unsigned long long old = rolling_hash("TestString");
        // Add a character to the end
        // since the last char in old was multiplied by 1, now multiply it by
        // the base and then add the _new_ character to the end
        old = old * 257 + 'h';
        //old = old % MOD;
        // Remove a char from the start
        // Simply, remove the hash value of the first character
        old = old - 'T' * pow(257, 10);

        printf("\n%llu\n", old);
        return 0;
}


The code above works perfectly fine as long as I do not introduce any remainder operations. Once I uncomment the % operations, things break down: the value I get by updating the rolling hash no longer equals the one printed by the second printf.


Regarding janisz's answer: changing the hash generator as suggested there got the remainder to work when adding new characters, but NOT when removing the old ones. Note: I am using my own pow function that works with unsigned long long.



The hash generator code is wrong. It should be

hash = (hash*257 + str[i]) % MOD;


and uncomment old_hash = old_hash % MOD;. Also change the way you generate the new hash from the previous one:

(old_hash + MOD - (to_delete_char * pow(257, str_len-1)) % MOD) % MOD;


Take a look at your code. The first two lines are perfectly fine. What happens in the loop? First of all, you are doing far more multiplications than necessary. In my approach I use Horner's scheme to compute the hash, because the hash is a polynomial.
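A minimal sketch of the Horner-style hash described above, assuming base 257 and the modulus from the question. The names horner_hash, HASH_BASE and HASH_MOD are made up for this illustration and do not appear in the original code. Characters are cast to unsigned char so that bytes above 127 cannot contribute a negative value.

```c
#include <stddef.h>
#include <string.h>

#define HASH_BASE 257ULL        /* assumed base, as in the question */
#define HASH_MOD  1000000007ULL /* assumed modulus, as in the question */

/* Polynomial hash computed with Horner's scheme. The value is reduced
   after every step, so the intermediate result stays far below the
   64-bit limit and no separate power function is needed. */
unsigned long long horner_hash(const char *str)
{
    unsigned long long hash = 0;
    size_t len = strlen(str);

    for (size_t i = 0; i < len; i++) {
        hash = (hash * HASH_BASE + (unsigned char)str[i]) % HASH_MOD;
    }
    return hash;
}
```

Calling horner_hash("ab") evaluates (97 * 257 + 98) % HASH_MOD, one multiplication per character instead of one power per character.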


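Why does it work without the modulus but not with it? I think it is a side effect of overflow: unsigned long long wraps around modulo 2^64, so both computations wrap the same way and still agree. 257^8 already exceeds 2^64 (log(2^64)/log(257) ≈ 8), so once you apply % MOD to a value that has already wrapped, the results diverge.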
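The overflow estimate can be checked directly. The helper below is a hypothetical name for this sketch, not something from the question. It returns the largest exponent k for which base^k still fits in an unsigned long long, testing against ULLONG_MAX before each multiplication so the check itself never overflows.

```c
#include <limits.h>

/* Largest k such that base^k <= ULLONG_MAX.
   Returns 0 for base < 2, where the question is not meaningful. */
unsigned max_safe_exponent(unsigned long long base)
{
    if (base < 2)
        return 0;

    unsigned long long p = 1;
    unsigned k = 0;

    /* p * base fits exactly when p <= ULLONG_MAX / base */
    while (p <= ULLONG_MAX / base) {
        p *= base;
        k++;
    }
    return k;
}
```

On a platform where unsigned long long is 64 bits wide, max_safe_exponent(257) is 7, so the term for the first character of a 10-character string (exponent 9) cannot be computed without reducing along the way.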


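Now, what's wrong with removing characters: the exponent has to match your generator, where indices start from 0, so the first character carries weight pow(257, str_len-1), not pow(257, str_len). Use to_delete_char * pow(257, str_len-1) if you remove the character before appending the new one; pow(257, str_len) is only right once the hash has already been multiplied by 257.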
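One way to sketch the remove-then-append update, assuming the hash is always kept reduced modulo the modulus. The names mod_pow, roll, HASH_BASE and HASH_MOD are illustrative and not from the question. Adding HASH_MOD before subtracting keeps the unsigned subtraction from wrapping below zero, which is a likely reason the removal step misbehaves once % is introduced.

```c
#include <stddef.h>

#define HASH_BASE 257ULL        /* assumed base */
#define HASH_MOD  1000000007ULL /* assumed modulus */

/* base^exp mod HASH_MOD, reducing after every multiplication */
unsigned long long mod_pow(unsigned long long base, size_t exp)
{
    unsigned long long ret = 1;
    for (size_t i = 0; i < exp; i++)
        ret = (ret * base) % HASH_MOD;
    return ret;
}

/* Slide a window of length len (len >= 1) one position to the right:
   drop the character `out` from the front, append `in` at the back.
   `hash` must already be reduced modulo HASH_MOD. */
unsigned long long roll(unsigned long long hash, unsigned char out,
                        unsigned char in, size_t len)
{
    /* the first character carries weight HASH_BASE^(len-1) */
    unsigned long long gone = (out * mod_pow(HASH_BASE, len - 1)) % HASH_MOD;

    hash = (hash + HASH_MOD - gone) % HASH_MOD; /* never negative */
    return (hash * HASH_BASE + in) % HASH_MOD;
}
```

For example, starting from the hash of "ab" (25027), roll(25027, 'a', 'c', 2) gives the hash of "bc". In a real search loop the power would be computed once up front rather than on every call.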


Edit: I think the problem was in the pow function. As I wrote above, it overflows at around 8 characters. Your example has 10, so it can't work.


It turns out that adding and removing a character has to be done as one operation. This probably has to do with how congruences behave under modular arithmetic, but I'm not sure.

#include <stdio.h>
#include <string.h>

#define MOD 787

/* Modular power: reduces after every multiplication so it never overflows.
   Named mod_pow to avoid clashing with pow() from math.h. */
unsigned long long mod_pow(int x, int y)
{
    unsigned long long ret = 1;
    for (int i = 0; i < y; i++)
        ret = (ret * x) % MOD;
    return ret;
}

unsigned long long rolling_hash(const char *str)
{
        unsigned long long hash = 0;
        int str_len = (int)strlen(str);
        for (int i = 0, k = str_len - 1; i < str_len; i++, k--) {
                hash = hash + ((unsigned char)str[i] * mod_pow(257, k)) % MOD;
                hash = hash % MOD;
        }
        return hash;
}

int main(void)
{
        char input[] = "TestString";
        printf("Input: %llu\n", rolling_hash(input));
        printf("Expected: %llu\n", rolling_hash("estStringh"));
        unsigned long long old = rolling_hash(input);
        // Add a character to the end
        // and remove a char from the start.
        // After multiplying by 257 the first character carries weight
        // 257^strlen(input), so that is the term to subtract.
        unsigned long long h = (input[0] * mod_pow(257, (int)strlen(input))) % MOD;
        // Adding MOD keeps the unsigned subtraction from wrapping below zero
        old = ((old * 257) + 'h' + MOD - h) % MOD;

        printf("Actual: %llu\n", old);
        return 0;
}
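Since the original goal was Rabin-Karp, here is a rough sketch of how the rolling update could drive the search itself. This is only one possible arrangement, not the asker's or answerer's code. The names rk_find, RK_BASE and RK_MOD are invented for the example, and a larger modulus than 787 is assumed to keep false hash matches rare. Every hash match is confirmed with memcmp, because equal hashes do not guarantee equal strings.

```c
#include <stddef.h>
#include <string.h>

#define RK_BASE 257ULL        /* assumed base */
#define RK_MOD  1000000007ULL /* assumed modulus */

/* Returns the index of the first occurrence of pat in text, or -1. */
long rk_find(const char *text, const char *pat)
{
    size_t n = strlen(text), m = strlen(pat);
    if (m == 0) return 0;
    if (m > n) return -1;

    /* weight of the first character in a window: RK_BASE^(m-1) mod RK_MOD */
    unsigned long long top = 1;
    for (size_t i = 1; i < m; i++)
        top = (top * RK_BASE) % RK_MOD;

    /* Horner hashes of the pattern and of the first window */
    unsigned long long hp = 0, ht = 0;
    for (size_t i = 0; i < m; i++) {
        hp = (hp * RK_BASE + (unsigned char)pat[i]) % RK_MOD;
        ht = (ht * RK_BASE + (unsigned char)text[i]) % RK_MOD;
    }

    for (size_t i = 0; ; i++) {
        /* confirm with memcmp to rule out hash collisions */
        if (hp == ht && memcmp(text + i, pat, m) == 0)
            return (long)i;
        if (i + m >= n)
            break;

        /* drop text[i], then append text[i + m] */
        unsigned long long gone = ((unsigned char)text[i] * top) % RK_MOD;
        ht = (ht + RK_MOD - gone) % RK_MOD;
        ht = (ht * RK_BASE + (unsigned char)text[i + m]) % RK_MOD;
    }
    return -1;
}
```

With the strings from the question, rk_find("TestString", "String") returns 4.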

