


I need to do some processing on fairly large XML files (potentially upwards of a gigabyte) in C#, including some complex XPath queries. The problem is that the standard way I would normally do this, through the System.Xml libraries, loads the whole file into memory before doing anything with it, which can cause memory problems with files of this size.


I don't need to update the files at all, just read them and query the data they contain. Some of the XPath queries are quite involved and span several levels of parent-child relationships. I'm not sure whether this affects the ability to use a stream reader rather than loading the data into memory as a block.


One way I can see of making it work is to perform the simple analysis using a stream-based approach, and perhaps wrap the XPath statements in XSLT transformations that I could run across the files afterward, although that seems a little convoluted.
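For the "simple analysis" part, a forward-only pass with `XmlReader` might look like the sketch below. It only counts elements by name, which is an assumption about what a simple analysis could be; the class and method names are hypothetical, and the inline XML stands in for a real file.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

static class StreamingScan
{
    // Counts elements by local name in a single forward-only pass.
    // Only the current node is held in memory, never the whole document.
    public static Dictionary<string, int> CountElements(XmlReader reader)
    {
        var counts = new Dictionary<string, int>();
        while (reader.Read())
        {
            if (reader.NodeType == XmlNodeType.Element)
            {
                int seen;
                counts.TryGetValue(reader.LocalName, out seen); // seen stays 0 if absent
                counts[reader.LocalName] = seen + 1;
            }
        }
        return counts;
    }

    public static void Main()
    {
        // Hypothetical sample data; for a real file use XmlReader.Create(path).
        const string xml = "<orders><order><item/><item/></order><order><item/></order></orders>";
        using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
        {
            foreach (KeyValuePair<string, int> entry in CountElements(reader))
                Console.WriteLine(entry.Key + ": " + entry.Value);
        }
    }
}
```

This covers counting and simple filtering, but a forward-only reader cannot look back at ancestors or siblings it has already passed, which is why the parent-child queries need something more.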


Alternatively, I know that there are some elements that the XPath queries will not run across, so I guess I could break the document up into a series of smaller fragments based on its original tree structure, each of which could perhaps be small enough to process in memory without causing too much havoc.
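The fragment idea can be sketched by streaming with `XmlReader` and materializing one subtree at a time with `XNode.ReadFrom`, then running ordinary XPath against each small tree. The element name `order`, the query, and the sample data are all hypothetical, and the sketch assumes no query needs to cross a fragment boundary.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Linq;
using System.Xml.XPath;

static class FragmentQuery
{
    // Yields each element named fragmentName as its own small in-memory tree.
    public static IEnumerable<XElement> StreamFragments(XmlReader reader, string fragmentName)
    {
        reader.MoveToContent();
        while (!reader.EOF)
        {
            if (reader.NodeType == XmlNodeType.Element && reader.LocalName == fragmentName)
            {
                // ReadFrom consumes the whole element and leaves the reader on
                // the node that follows it, so no extra Read() call is needed.
                yield return (XElement)XNode.ReadFrom(reader);
            }
            else
            {
                reader.Read();
            }
        }
    }

    // Runs a relative XPath query against every fragment and collects the values.
    public static List<string> Query(XmlReader reader, string fragmentName, string xpath)
    {
        var results = new List<string>();
        foreach (XElement fragment in StreamFragments(reader, fragmentName))
            foreach (XElement match in fragment.XPathSelectElements(xpath))
                results.Add(match.Value);
        return results;
    }

    public static void Main()
    {
        // Hypothetical sample data; for a real file use XmlReader.Create(path).
        const string xml =
            "<orders>" +
            "<order id='1'><item><price>5</price></item></order>" +
            "<order id='2'><item><price>50</price></item></order>" +
            "</orders>";
        using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
        {
            foreach (string price in Query(reader, "order", "item/price[. > 10]"))
                Console.WriteLine(price);
        }
    }
}
```

Peak memory is then bounded by the largest single fragment rather than by the whole file, at the cost of not being able to query across fragments.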


I've tried to explain my objective here so if I'm barking up totally the wrong tree in terms of general approach I'm sure you folks can set me right...



XPathReader is the answer. It isn't part of the .NET Framework, but it is available for download from Microsoft. Here is an MSDN article.


If you construct an XPathReader with an XmlTextReader you get the efficiency of a streaming read with the convenience of XPath expressions.


I haven't used it on gigabyte-sized files, but I have used it on files of tens of megabytes, which is usually enough to slow down DOM-based solutions.


Quoting from the MSDN article: "The XPathReader provides the ability to perform XPath over XML documents in a streaming manner".



