How much faster are the Windows native cryptographic hashes than the .NET managed versions?

Shared by user 山秀溪清.

I'm providing hashes for sets of data in order to fingerprint the data and identify it by hash - this is the core use case for fast hashes like SHA1 and MD5.

In .NET, some of these hashes (the SHA variants, anyway) come in both native and managed implementations. I'm looking for a managed MD5 implementation, and there doesn't appear to be one in the .NET Framework. I wondered whether the wrapped native CSP is faster anyway, in which case I should just use it and be content that it won't cause perf problems. The top answer to "Why is there no managed MD5 implementation in the .NET framework?" indicates that faster performance could be the reason a managed variant doesn't exist.

Is this true, and if so, how much faster is the native CSP?

Recommended answer

Unfortunately, the wrapped native CSP for MD5 - MD5CryptoServiceProvider - is significantly slower than a pure managed implementation. It is an obstinate viewpoint that holds that native code is unequivocally faster than managed code: in many cases the opposite is true. This is such a case, at least in head-to-head measurements.

Using the translated reference MD5 implementation by David Anson, I constructed a quick performance test (source) which aims to measure any large differences in performance between the two implementations. For small data arrays the differences are negligible, as expected, but at around 16kB the native implementation starts to show a potentially significant delay, on the order of milliseconds. This might not seem like much, but it is orders of magnitude slower than the pure managed implementation. The difference persists as the size of the data being hashed increases, and at the largest tested data array (~250MB) the difference in CPU time was about 8.5 seconds. Considering that a hash like this is often used to fingerprint very large files, this extra delay would become noticeable, even against the often much larger delays from I/O.
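A test of this kind can be sketched roughly as follows. This is not the original test source, only a minimal illustration under some assumptions: it targets the .NET Framework, it uses the SHA1CryptoServiceProvider / SHA1Managed pair that ships with the framework as a stand-in for the MD5 comparison (the managed MD5 used above is a third-party class and is not reproduced here), and it measures wall-clock time with Stopwatch rather than CPU time. The HashTiming and TimeHash names are made up for the example.

```csharp
using System;
using System.Diagnostics;
using System.Security.Cryptography;

public static class HashTiming
{
    // Hypothetical helper: times a single ComputeHash call over the buffer
    // and returns the elapsed wall-clock time in milliseconds.
    public static double TimeHash(HashAlgorithm algorithm, byte[] data)
    {
        algorithm.ComputeHash(data);          // warm-up run (JIT, one-time setup)
        Stopwatch watch = Stopwatch.StartNew();
        algorithm.ComputeHash(data);          // measured run
        watch.Stop();
        return watch.Elapsed.TotalMilliseconds;
    }

    public static void Main()
    {
        Random random = new Random(12345);    // fixed seed for repeatable input

        // Buffer sizes from 1 kB up to 256 MB, multiplying by 4 each step.
        for (int size = 1 << 10; size <= 1 << 28; size <<= 2)
        {
            byte[] data = new byte[size];
            random.NextBytes(data);

            // Assumption: SHA1 stands in for MD5 because the framework ships
            // both a wrapped-native and a managed class for it.
            using (HashAlgorithm wrappedNative = new SHA1CryptoServiceProvider())
            using (HashAlgorithm managed = new SHA1Managed())
            {
                double nativeMs = TimeHash(wrappedNative, data);
                double managedMs = TimeHash(managed, data);
                Console.WriteLine("{0,12} bytes  wrapped native: {1,10:F3} ms  managed: {2,10:F3} ms",
                    size, nativeMs, managedMs);
            }
        }
    }
}
```

To compare MD5 specifically, one would swap in MD5CryptoServiceProvider and whichever managed MD5 class is available, since both derive from HashAlgorithm. A single timed run per size is noisy, so repeating each measurement and averaging would give more trustworthy numbers.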

It's not abundantly clear where the delay comes from, since a pure native test was not performed (one which would dispense with the wrapping of a CSP and its consumption in managed code). However, the graphs have nearly identical shapes on the log scale, which suggests that the managed and native implementations have the same intrinsic performance and that the native code's results are "shifted" down, likely by the cost of the interop between native and managed code at runtime. This performance difference between wrapped native CSPs and pure managed implementations has also been reproduced and documented by other investigators.

In addition to answering the question "how much faster is the native implementation" in this particular case, I hope this evidence serves to prompt more reflection and investigation when the question of native vs. managed arises, breaking the long-standing and pernicious reaction to similar questions that native code is always faster, and thus, somehow, better. Managed code is clearly very fast, even in this performance-sensitive domain of bulk data hashing.
