Minimal cyclic shift algorithm explanation

I recently came across this code, which lacks any comments. It finds the minimal cyclic shift of a word (specifically, it returns the index in the string where that shift starts), and it is called Duval's algorithm. The only information I found describes the algorithm in a few words and has cleaner code. I would appreciate any help in understanding this algorithm. I have always found text algorithms pretty tricky and rather hard to understand.

    int minLexCyc(const char *x) {
        int i = 0, j = 1, k = 1, p = 1, a, b, l = strlen(x);
        while(j+k <= (l<<1)) {
            if ((a=x[(i+k-1)%l])>(b=x[(j+k-1)%l])) {
                i=j++;
                k=p=1;
            } else if (a<b) {
                j+=k;
                k=1;
                p=j-i;
            } else if (a==b && k!=p) {
                k++;
            } else {
                j+=p;
                k=1;
            }
        }
        return i;
    }

Solution

First, I believe that your code has a bug in it: the last line should be return p;. I believe that i holds the index of the lexicographically smallest cyclic shift, and p holds the smallest shift that matches. I also think that your stopping condition is too weak, i.e. you are doing too much checking after you have found a match, but I am not sure exactly what it should be.

Note that i and j only advance, and that i is always less than j. We are looking for a string that matches the string starting at i, and we are trying to match it with a string that starts at j. We do this by comparing the k'th character of each string while increasing k (as long as they match). Note that we only change i if we determine that the string starting at j is lexicographically less than the string starting at i; in that case we set i to j and reset k and p to their initial values.
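The character-by-character comparison described here can be isolated in a small helper. This is only a sketch: first_mismatch and compare_shifts are made-up names that do not appear in the original code, and they assume a non-empty, NUL-terminated string.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper (not part of the original code): returns the 1-based
 * position k of the first character at which the cyclic shifts starting at
 * i and j differ, or 0 if all l characters agree. */
int first_mismatch(const char *x, int i, int j) {
    int l = (int)strlen(x);
    assert(l > 0 && i >= 0 && j >= 0);
    for (int k = 1; k <= l; k++) {
        /* Same indexing as the original: k'th character of each shift. */
        if (x[(i + k - 1) % l] != x[(j + k - 1) % l])
            return k;
    }
    return 0;
}

/* Returns -1, 0 or 1 as the shift starting at i is smaller than, equal to
 * or greater than the shift starting at j. */
int compare_shifts(const char *x, int i, int j) {
    int l = (int)strlen(x);
    int k = first_mismatch(x, i, j);
    if (k == 0)
        return 0;
    return x[(i + k - 1) % l] < x[(j + k - 1) % l] ? -1 : 1;
}
```

The algorithm in the question never restarts this comparison from scratch for every j; it reuses what the previous matches already proved, which is where k and p come in.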

I do not have time for a detailed analysis, but it looks like

i = the start of the lexicographically smallest cyclic shift
j = the start of the cyclic shift we are matching against the shift starting at i
k = the character in strings i and j currently under consideration (the strings match in positions 1 to k-1)
p = the cyclic shift under consideration (I believe p stands for prefix)
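To make these roles concrete, here is the question's routine again with the roles attached as comments. The control flow is unchanged; the comments are one reading of the code rather than the original author's documentation, and the name min_lex_cyc_annotated is made up for this sketch.

```c
#include <assert.h>
#include <string.h>

/* Same control flow as minLexCyc from the question. The comments are an
 * interpretation, and the function name is hypothetical. */
int min_lex_cyc_annotated(const char *x) {
    assert(x != NULL);
    int l = (int)strlen(x);
    int i = 0;  /* start of the smallest cyclic shift seen so far */
    int j = 1;  /* start of the shift currently compared against i */
    int k = 1;  /* 1-based position being compared; positions 1..k-1 match */
    int p = 1;  /* length of the block assumed to repeat from i (the stride) */
    while (j + k <= 2 * l) {
        int a = x[(i + k - 1) % l];  /* k'th character of the shift at i */
        int b = x[(j + k - 1) % l];  /* k'th character of the shift at j */
        if (a > b) {            /* shift at j is smaller: it becomes i */
            i = j++;
            k = p = 1;
        } else if (a < b) {     /* shift at j is larger: skip what matched */
            j += k;
            k = 1;
            p = j - i;
        } else if (k != p) {    /* still inside the current block */
            k++;
        } else {                /* a whole block matched: jump one block */
            j += p;
            k = 1;
        }
    }
    return i;
}
```

For an empty string the loop body never runs, so the modulo by l is never evaluated and the function returns 0.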

Edit: Going further

This section of code

    if ((a=x[(i+k-1)%l])>(b=x[(j+k-1)%l])) {
        i=j++;
        k=p=1;

moves the start of the comparison to a lexicographically earlier string when we find one, and reinitializes everything else.

This section

   } else if (a<b) {
        j+=k; 
        k=1; 
        p=j-i;

is the tricky part. We have found a mismatch where the string at j is lexicographically later than our reference string at i, so we skip to the end of the text matched so far and start matching from there. We also increase p (our stride). Why can we skip over all the starting points between j and j + k? Because the string starting at i is the lexicographically smallest seen so far, and if the tail of the current j string is greater than the string at i, then any suffix of the string at j will be greater than the string at i.

Finally

    } else if (a==b && k!=p) {
        k++;
    } else {
        j+=p; 
        k=1;

This just checks that the string of length p starting at i repeats.

Further edit: We do this by incrementing k until k == p, checking that the k'th character of the string starting at i equals the k'th character of the string starting at j. Once k reaches p, we start scanning again at the next supposed occurrence of the string.
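One way to watch the four branches at work is to record which one fires on each pass through the loop. The variant below is a sketch for experimenting: the function name, the trace buffer and the marker characters are all made up here, and the loop itself is copied from the question unchanged.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical tracing variant: writes one character per loop iteration into
 * trace and returns the same index as the original routine.
 *   '>' a > b: the shift at j becomes the new candidate i
 *   '<' a < b: skip past the text matched so far and update p
 *   '=' characters match inside the current block, so k grows
 *   'P' a full block of length p matched, so j jumps by p
 * At most cap - 1 markers are stored, followed by a terminating NUL. */
int min_lex_cyc_traced(const char *x, char *trace, size_t cap) {
    assert(x != NULL && trace != NULL && cap > 0);
    int i = 0, j = 1, k = 1, p = 1, l = (int)strlen(x);
    size_t n = 0;
    while (j + k <= 2 * l) {
        int a = x[(i + k - 1) % l];
        int b = x[(j + k - 1) % l];
        char step;
        if (a > b) {
            i = j++;
            k = p = 1;
            step = '>';
        } else if (a < b) {
            j += k;
            k = 1;
            p = j - i;
            step = '<';
        } else if (k != p) {
            k++;
            step = '=';
        } else {
            j += p;
            k = 1;
            step = 'P';
        }
        if (n + 1 < cap)
            trace[n++] = step;
    }
    trace[n] = '\0';
    return i;
}
```

For the input "bca" the recorded trace is "<><<=" and the returned index is 2. For "abab" it is "<=P=P=P" with index 0, which shows the repetition check stepping through the string one period at a time.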

Even further edit to attempt to answer jethro's questions.

First: the k != p in else if (a==b && k!=p). Here we have a match, in that the k'th and all previous characters in the strings starting at i and j are equal. The variable p represents the length that we think the repeating string has. When k != p, actually k < p, so we are ensuring that the p characters of the string beginning at i are the same as the p characters of the string beginning at j. When k == p (the final else), we should be at a point where the string starting at j + k looks the same as the string starting at j, so we increase j by p, set k back to 1, and go back to comparing the two strings.

Second: Yes, I believe you are correct, it should return i. I was misunderstanding the meaning of "Minimum Cyclic Shift".
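A simple way to convince yourself about the return value is to compare the routine against a direct definition of "index of the minimum cyclic shift". The function below is such a reference, written as a sketch under the assumption that ties go to the smallest index; min_shift_brute is a made-up name, and it runs in quadratic time, so it is only meant for checking small inputs.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical reference implementation: tries every start position and
 * keeps the first one whose cyclic shift is lexicographically smallest. */
int min_shift_brute(const char *x) {
    assert(x != NULL);
    int l = (int)strlen(x);
    int best = 0;
    for (int s = 1; s < l; s++) {
        /* Compare the shift at s with the best shift found so far. */
        for (int t = 0; t < l; t++) {
            char cs = x[(s + t) % l];
            char cb = x[(best + t) % l];
            if (cs != cb) {
                if (cs < cb)
                    best = s;
                break;
            }
        }
    }
    return best;
}
```

For "bca" this returns 2, the start of "abc", which is also what minLexCyc returns through i.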