Where is the maximum capacity of a C# Collection&lt;T&gt; defined?

I tried to add a large number of elements to a Collection&lt;T&gt;. The elements are each simple data transfer objects with five properties of basic data types, nothing special.

When adding new entries in a loop I always get an OutOfMemoryException. The interesting thing is that I always get the exception when trying to add the 8388608th element (which is 8*1024*1024). Therefore I assume that there is a built-in limit in terms of capacity (number of elements) allowed in such collections, but I couldn't find any information about it.

Does this limit indeed exist? Where would I find this documented?

Answer

This is an OutOfMemoryException, so it's not the size or capacity of the collection at issue here: it is memory use in your application. The trick is that you don't have to use up the memory in your machine or even in your process to get this exception.

What I think is happening is that you're filling up the large object heap. As collections grow they need to add storage in the background to accommodate the new items. Once the new storage is allocated and the items are copied in, the old storage is released and should be eligible for garbage collection.

The issue is that once you get beyond a certain size (used to be 85000 bytes, but might be different now), the Garbage Collector (GC) tracks your memory using something called the Large Object Heap (LOH). When the GC frees memory from the LOH (which happens only rarely to begin with), the memory will return to your operating system and be available for other processes, but the virtual address space from that memory will still be in use within your own process. You'll have a great gaping hole in your program's address table, and because this hole is on the Large Object Heap it will never be compacted or reclaimed.
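
As a rough illustration of that cutoff (assuming the classic ~85,000-byte threshold, which is an implementation detail), LOH allocations report generation 2 from the moment they are created, without ever surviving a collection:

var small = new byte[1000];     // small enough for the ordinary (small object) heap
var large = new byte[100000];   // above the ~85,000-byte threshold, so it goes to the LOH

// LOH objects are treated as generation 2 immediately.
Console.WriteLine(GC.GetGeneration(small));  // typically prints 0
Console.WriteLine(GC.GetGeneration(large));  // typically prints 2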

The reason you see this exception on an exact power of two is that most .Net collections use a doubling algorithm for adding storage to the collection. It will always throw at the point where it needs to double again, because up until that point the RAM was already allocated.
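
You can watch that doubling directly via List&lt;T&gt;.Capacity (the size of the backing array, as opposed to Count). This is a minimal sketch, and the exact growth factor is an implementation detail:

var list = new List<int>();
int lastCapacity = list.Capacity;          // 0 for the default constructor
for (int i = 0; i < 100; i++)
{
    list.Add(i);
    if (list.Capacity != lastCapacity)     // the backing array was just reallocated
    {
        Console.WriteLine($"Count = {list.Count}, Capacity = {list.Capacity}");
        lastCapacity = list.Capacity;
    }
}
// On current .Net this prints capacities 4, 8, 16, 32, 64, 128: doubling each time.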

A quick solution, then, is to take advantage of a little-used feature of most .Net Collections. If you look at the constructor overloads, most of the collection types will have one that allows you to set the capacity during initial construction. This capacity is not a hard limit — it's just a starting point — but it is useful in a few cases, including when you have collections that will grow very large. You can set the initial capacity to something obscene... hopefully something just large enough to hold all your items, or at least only need to "double" once or twice.
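
As a small sketch of that behaviour (the exact post-growth capacity is an implementation detail), the list simply resumes doubling once it outgrows the hint:

var presized = new List<int>(1000);   // one up-front allocation sized for 1000 items
for (int i = 0; i < 1500; i++)
    presized.Add(i);
Console.WriteLine(presized.Capacity); // 2000 on current .Net: it doubled once past the initial hint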

You can kind of see this effect by running the following code in a Console Application:

var x = new List<int>();               // default capacity; the backing array doubles as needed
for (long y = 0; y < long.MaxValue; y++)
    x.Add(0);                          // keep adding until OutOfMemoryException is thrown

On my system, that throws an OutOfMemoryException after 134217728 items. 134217728 * 4 bytes per int is only (and exactly) 512MB of RAM. It shouldn't throw yet, because that's the only thing of any real size in the process, but it does anyway because of address space lost to old versions of the collection.
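
The back-of-the-envelope math (a sketch assuming 4-byte ints and pure doubling growth) shows why: the abandoned buffers from earlier resizes add up to roughly as much address space as the final array itself.

long items = 134217728;                      // 2^27, where the exception was observed
long liveBytes = items * sizeof(int);        // 512 MB in the final backing array
// Earlier backing arrays held 4, 8, 16, ..., items/2 elements; their sizes sum
// to roughly another `items` elements, so the process churned through ~1 GB total.
long totalTouched = 2 * liveBytes;
Console.WriteLine(liveBytes / (1024 * 1024) + " MB live, ~" + totalTouched / (1024 * 1024) + " MB of address space touched");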

Now let's change the code to set the capacity like this:

var x = new List<int>(134217728 * 2);  // pre-allocate room for 268435456 ints in one shot
for (long y = 0; y < long.MaxValue; y++)
    x.Add(0);

Now my system makes it all the way to 268435456 items (1GB of RAM) when it throws, which it does because it can't double that 1GB: other RAM used by the process (i.e. the loop counter and any overhead from the collection object and the process itself) eats into the 2GB virtual address space limit.

What I can't explain is that it does not allow me to use 3 as a multiplier, even though that would be only(!) 1.5GB. A little experiment using different multipliers trying to find out just how large I could get showed that the number is not consistent. At one point I was able to get above 2.6, but then had to back off to under 2.4. Something new to discover, I guess.

If this solution does not get you enough space, there is also a trick you can use to get 3GB of virtual address space, or you can force your app to compile for x64 rather than x86 or AnyCPU. If you're using a version of the framework based on the 2.0 runtime (anything up through .Net 3.5) you might try updating to .Net 4.0 or later, which is reportedly a little better about this. Failing those, you will have to look at a complete re-write of how you handle your data, one that likely involves keeping it on disk and only holding a single item or a small sample of the items (a cache) in memory at a time. I really recommend this last option, because anything else is likely to eventually break again unexpectedly (and if your dataset is this large to begin with, it's likely growing as well).
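
If you do retarget to x64, a quick sanity check that the process really runs as 64-bit (Environment.Is64BitProcess exists from .Net 4.0 on; IntPtr.Size works everywhere):

Console.WriteLine(Environment.Is64BitProcess ? "64-bit process" : "32-bit process");
Console.WriteLine("IntPtr.Size = " + IntPtr.Size + " bytes");  // 4 on x86, 8 on x64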
