How best to use XPath with very large XML files in .NET?

Shared by user 老酒挽旧友.

I need to process fairly large XML files (potentially upwards of a gigabyte) in C#, including running some complex XPath queries. The problem is that the standard approach I would normally take through the System.Xml libraries loads the whole file into memory before doing anything with it, which can cause memory problems with files of this size.

I don't need to update the files at all, just read them and query the data they contain. Some of the XPath queries are quite involved and span several levels of parent-child relationships. I'm not sure whether this affects the ability to use a stream reader rather than loading the data into memory as a block.

One way I can see of making it work is to perform the simple analysis using a stream-based approach, and perhaps wrap the XPath statements in XSLT transformations that I could run across the files afterward, although that seems a little convoluted.
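For the "simple analysis" half of that idea, a forward-only pass with XmlReader keeps memory use flat regardless of file size. The following is only a minimal sketch of that kind of pass: the class and method names (StreamingScan, CountElements) are hypothetical, and counting elements by name merely stands in for whatever simple analysis is actually needed.

```csharp
using System.Collections.Generic;
using System.Xml;

public static class StreamingScan
{
    // Hypothetical helper: counts elements by local name in one
    // forward-only pass, holding only the current node in memory.
    public static Dictionary<string, long> CountElements(XmlReader reader)
    {
        var counts = new Dictionary<string, long>();
        while (reader.Read())
        {
            if (reader.NodeType == XmlNodeType.Element)
            {
                // TryGetValue leaves n at 0 when the name is new.
                counts.TryGetValue(reader.LocalName, out long n);
                counts[reader.LocalName] = n + 1;
            }
        }
        return counts;
    }
}
```

A caller would typically open the file with XmlReader.Create(path) inside a using block and pass the reader in. Anything that needs to look back up the tree, as the more involved XPath queries do, does not fit this style directly.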

Alternatively, I know there are some elements that the XPath queries will never touch, so I guess I could break the document up into a series of smaller fragments based on its original tree structure, which could perhaps be small enough to process in memory without causing too much havoc.
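The fragment idea can be sketched with the standard library alone: stream through the file with XmlReader, materialize one fragment at a time as an XElement, and run XPath against just that fragment. This is an illustration under assumptions rather than a tested solution for gigabyte files. FragmentQuery, Select and the fragmentName parameter are hypothetical names, and it assumes every query can be answered from inside a single fragment using a path relative to the fragment element.

```csharp
using System.Collections.Generic;
using System.Xml;
using System.Xml.Linq;
using System.Xml.XPath;

public static class FragmentQuery
{
    // Hypothetical helper: streams through the document and evaluates
    // a relative XPath against each <fragmentName> element in turn, so
    // only one fragment is held in memory at a time.
    public static IEnumerable<XElement> Select(
        XmlReader reader, string fragmentName, string xpath)
    {
        reader.MoveToContent();
        while (!reader.EOF)
        {
            if (reader.NodeType == XmlNodeType.Element &&
                reader.LocalName == fragmentName)
            {
                // ReadFrom consumes the whole element and leaves the
                // reader on the node that follows it, so Read() must
                // not be called again here or adjacent fragments
                // would be skipped.
                var fragment = (XElement)XNode.ReadFrom(reader);
                foreach (var match in fragment.XPathSelectElements(xpath))
                    yield return match;
            }
            else
            {
                reader.Read();
            }
        }
    }
}
```

For example, with a hypothetical file of order elements, FragmentQuery.Select(reader, "order", "item/name") would yield each name element one order at a time. Queries that relate data across fragments would still need some other approach, and a fragment nested inside another fragment of the same name is only seen as part of its parent.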

I've tried to explain my objective here, so if I'm barking up totally the wrong tree in terms of general approach, I'm sure you folks can set me right...

Recommended answer

XPathReader is the answer. It isn't part of the .NET Framework class library, but it is available for download from Microsoft. Here is an MSDN article.

If you construct an XPathReader with an XmlTextReader, you get the efficiency of a streaming read with the convenience of XPath expressions.

I haven't used it on gigabyte-sized files, but I have used it on files that are tens of megabytes, which is usually enough to slow down DOM-based solutions.

Quoting from the link below: "The XPathReader provides the ability to perform XPath over XML documents in a streaming manner".

Download from Microsoft
