Haskell: linear-time online algorithm

Shared by user 一只哀伤的猫.

Please forgive me if I misused the big words in the title; I'm not too knowledgeable about them but hope they describe my problem. I wrote an elaborate scheme to try and encode strings according to these requirements. For strings of length 10^4 and higher, the code I wrote is quite slow. Since it processes chunks of 200 characters at a time (although it sometimes moves only one character forward to take the next chunk), could it be modified to output the result faster or in a more linear fashion, e.g. by immediately outputting the result for each 200 characters processed? Any help with that, or with other noticeable optimizations, would be appreciated.

Per tel's suggestion, I simplified my example:

encode xs = encode' xs [] where
  encode' []     result = result
  encode' (z:zs) result
    | null test = encode' zs (result ++ [z])
    | otherwise = encode' (drop numZsProcessed zs) (result ++ processed)
   where test = ..some test
         toProcess = take 200 (z:zs)
         processed = ..do something complicated with toProcess
         numZsProcessed = ..number of z's processed

Solution

Tail recursion doesn't get along as well with Haskell as it does with other functional languages. Let's do some manual reduction on some very simple code to see what's going on. Here's a tail-recursive implementation of map (1+):

go [] r = r
go (x:xs) r = go xs (r ++ [1+x])
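To try this in GHC, here is the same function as a complete program. The name goAppend and the Int signature are my own additions, chosen so it can sit next to the later variants. The body is unchanged.

```haskell
-- Tail-recursive map (1+) that appends each new element to the END of the
-- accumulator. Each (++) has to walk the whole accumulator built so far.
goAppend :: [Int] -> [Int] -> [Int]
goAppend []     r = r
goAppend (x:xs) r = goAppend xs (r ++ [1 + x])

main :: IO ()
main = print (goAppend [1,2,3,4,5] [])   -- prints [2,3,4,5,6]
```

The accumulator has to start as [], exactly as in the reduction below.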

Also we must keep in mind the definition of (++):

[] ++ ys = ys
(x:xs) ++ ys = x : (xs ++ ys)
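The definition above can be checked directly. In this sketch it is renamed append (my naming, so it does not clash with the Prelude's (++)) and given the same infixr 5 fixity. The second print shows that the first cons cell is available before ys is ever examined. That property is what lets each step of the reduction below expose one more element.

```haskell
-- Same two equations as (++), under a different name.
infixr 5 `append`   -- same fixity the Prelude gives (++)

append :: [a] -> [a] -> [a]
append []     ys = ys
append (x:xs) ys = x : append xs ys

main :: IO ()
main = do
  print (append [1, 2] [3 :: Int])               -- [1,2,3]
  -- The head is produced without forcing ys (here ys is undefined):
  print (take 1 (append [1 :: Int] undefined))   -- [1]
```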

Now let's reduce go [1,2,3,4,5] []. Keep in mind that [x,y,z] is notation for x:(y:(z:[])), or for short x:y:z:[].

go [1,2,3,4,5] []
go [2,3,4,5] ([] ++ [2])   -- 2 here is actually the thunk 1+1, but
                           -- for compactness I am reducing earlier
go [3,4,5] (([] ++ [2]) ++ [3])
go [4,5] ((([] ++ [2]) ++ [3]) ++ [4])
go [5] (((([] ++ [2]) ++ [3]) ++ [4]) ++ [5])
go [] ((((([] ++ [2]) ++ [3]) ++ [4]) ++ [5]) ++ [6])
(((([] ++ [2]) ++ [3]) ++ [4]) ++ [5]) ++ [6]
((([2] ++ [3]) ++ [4]) ++ [5]) ++ [6]
(((2:([] ++ [3])) ++ [4]) ++ [5]) ++ [6]
((2:(([] ++ [3]) ++ [4])) ++ [5]) ++ [6]
(2:((([] ++ [3]) ++ [4]) ++ [5])) ++ [6]
2:(((([] ++ [3]) ++ [4]) ++ [5]) ++ [6])    -- first observable output
2:((([3] ++ [4]) ++ [5]) ++ [6])
2:(((3:([] ++ [4])) ++ [5]) ++ [6])
2:((3:(([] ++ [4]) ++ [5])) ++ [6])
2:3:((([] ++ [4]) ++ [5]) ++ [6])           -- second observable output
2:3:(([4] ++ [5]) ++ [6])
2:3:((4:([] ++ [5])) ++ [6])
2:3:4:(([] ++ [5]) ++ [6])                  -- third observable output
2:3:4:([5] ++ [6])
2:3:4:5:([] ++ [6])                         -- fourth observable output
2:3:4:5:6:[]                                -- final output

See how each item in the output needs to work its way outward from a deeply nested series of parentheses? This causes it to take quadratic time in the size of the input to get all the output. You'll also notice that the first few items are yielded slowly and that output gets faster and faster as you approach the end of the list; this reduction explains that behavior.
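The quadratic cost can be counted straight from the reduction above. I use my own accounting convention: one step per (++) rewrite after go has returned. Under that convention, each innermost [] ++ [x_k] costs one step. The k-th output element then has to cross the n - k appends wrapped around it:

```latex
T(n) \;=\; \underbrace{n}_{\text{innermost } [\,] \mathbin{+\!\!+} [x_k]}
      \;+\; \underbrace{\sum_{k=1}^{n} (n-k)}_{\text{element } k \text{ crosses } n-k \text{ appends}}
      \;=\; n + \frac{n(n-1)}{2}
      \;=\; \frac{n(n+1)}{2} \;\in\; \Theta(n^2)
```

For n = 5 this gives 15. That is exactly the number of rewrites from (((([] ++ [2]) ++ [3]) ++ [4]) ++ [5]) ++ [6] down to 2:3:4:5:6:[]. The first element crosses the most appends (4 of them), which is why it appears last to be "slow".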

The main performance problem here is appending the new element to the end of the list, which takes time proportional to the size of the list you are appending to. A better way is to cons on the front, which is a constant-time operation. This will cause the output to come out reversed, so you need to reverse the result.

go [] r = reverse r
go (x:xs) r = go xs ((1+x):r)

reverse xs = rev xs []      -- roughly from the report prelude
rev [] r = r
rev (x:xs) r = rev xs (x:r)
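Here is a self-contained program for the accumulator version. The name goRev is mine. The Prelude's reverse is hidden so that the report-style definition above can be used as written.

```haskell
import Prelude hiding (reverse)

-- Cons onto the front (O(1) per element), then reverse once at the end.
goRev :: [Int] -> [Int] -> [Int]
goRev []     r = reverse r
goRev (x:xs) r = goRev xs ((1 + x) : r)

-- Report-prelude-style reverse, redefined here so the block stands alone.
reverse :: [a] -> [a]
reverse xs = rev xs []

-- Moves elements one at a time from the first list onto the accumulator.
rev :: [a] -> [a] -> [a]
rev []     r = r
rev (x:xs) r = rev xs (x : r)

main :: IO ()
main = print (goRev [1,2,3,4,5] [])   -- prints [2,3,4,5,6]
```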

Now let's reduce it:

go [1,2,3,4,5] []
go [2,3,4,5] [2]
go [3,4,5] [3,2]
go [4,5] [4,3,2]
go [5] [5,4,3,2]
go [] [6,5,4,3,2]
reverse [6,5,4,3,2]
rev [6,5,4,3,2] []
rev [5,4,3,2] [6]
rev [4,3,2] [5,6]
rev [3,2] [4,5,6]
rev [2] [3,4,5,6]
rev [] [2,3,4,5,6]
[2,3,4,5,6]          -- first and all observable output!
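The steps can be counted the same way for this version, one step per line of the trace. go takes n + 1 steps, unfolding reverse takes 1, and rev takes n + 1:

```latex
T_{\text{acc}}(n) \;=\; \underbrace{(n+1)}_{\texttt{go}} \;+\; \underbrace{1}_{\texttt{reverse}} \;+\; \underbrace{(n+1)}_{\texttt{rev}} \;=\; 2n + 3 \;\in\; \Theta(n)
```

For n = 5 that is 13 steps, which matches the 13 transitions in the trace above.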

So this is clearly less work than the first version. This style is used in strict languages like Scheme and ML because it has good memory performance. However, it has some disadvantages:

- All the input must be consumed before any output can be produced. Indeed, the entire computation is performed before any results are produced.
- Because of this, it never yields any output when given an infinite list.
- It involves reverse, which takes an extra O(n) time and has nothing to do with what we are doing (what does reversing have to do with adding one to every element and preserving the order?).

In a lazy language like Haskell, we can do better. Strangely, and beautifully, the way we do it is by writing it even more naively:

go [] = []
go (x:xs) = (1+x):go xs

Let's reduce it:

go [1,2,3,4,5]
2:(go [2,3,4,5])     -- first observable output
2:3:(go [3,4,5])     -- second observable output
2:3:4:(go [4,5])     -- third observable output
2:3:4:5:(go [5])     -- fourth observable output
2:3:4:5:6:(go [])    -- fifth observable output
2:3:4:5:6:[]         -- final output

It takes even less work, and it starts yielding output before even looking at the remainder of the list, so it has good performance in a stream computation and works on infinite inputs. And the implementation is about as simple and obvious as you could hope for.
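Both claims are easy to check. This is a minimal sketch in which the name goLazy is mine and the body is the go just above. take forces only a prefix, so the infinite list [1 ..] is fine. The second line passes a list whose tail is undefined, showing that the first output is produced without looking at the rest of the input.

```haskell
-- Guarded recursion: the cons cell is returned immediately and the
-- recursive call sits, unevaluated, in its tail.
goLazy :: [Int] -> [Int]
goLazy []     = []
goLazy (x:xs) = (1 + x) : goLazy xs

main :: IO ()
main = do
  print (take 5 (goLazy [1 ..]))           -- [2,3,4,5,6]
  print (take 1 (goLazy (1 : undefined)))  -- [2]
```

On finite lists this is the same function as map (1+), and it is written the same way as the Prelude's map.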

I hope this gives you some intuition for how tail recursion works in Haskell. For your example, I suggest removing the tail recursion and rewriting it in a naive style similar to our final go. Use the intuition I hope this post conveys: yield "as large a prefix of the input as possible, as soon as possible". (Notice that returning x:xs immediately yields x, even if there is more work to be done to compute xs; that's laziness in (non-)action.)
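Applied to the question, the same idea might look roughly like this. The question's test and processing are elided, so processWindow is a hypothetical stand-in. It is a toy rule that rewrites a run of 3 or more equal characters as the run length followed by the character. I also assume that the count it returns includes z itself. Only the shape of encode is the point: each window's output is placed in front of a lazy recursive call instead of being appended to an accumulator.

```haskell
-- HYPOTHETICAL stand-in for the question's "..some test" / "..do something
-- complicated": given the next window, return Nothing (no match here) or
-- Just (output for this window, characters consumed, including the first).
processWindow :: String -> Maybe (String, Int)
processWindow []     = Nothing
processWindow (c:cs)
  | n >= 3    = Just (show n ++ [c], n)   -- toy rule: "bbbb" -> "4b"
  | otherwise = Nothing
  where n = 1 + length (takeWhile (== c) cs)

-- Guarded-recursive encode: output for each window comes out as soon as
-- that window is processed; the rest of the result is a lazy tail.
encode :: String -> String
encode [] = []
encode zs@(z:rest) =
  case processWindow (take 200 zs) of
    Nothing       -> z : encode rest                 -- emit one char, step 1
    Just (out, n) -> out ++ encode (drop n zs)       -- emit chunk, skip n

main :: IO ()
main = do
  putStrLn (encode "abbbbc")                  -- a4bc
  putStrLn (take 6 (encode (cycle "abbb")))   -- a3ba3b (infinite input)
```

Because out comes from a single bounded window, out ++ encode ... costs time proportional to that window's output, not to everything produced so far. The last line shows the result streaming from an infinite input.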
