How do I read XPath values from many HTML files in .NET?

I have about 5,000 HTML files in a folder. I need to loop through them: open each file, grab about 10 values using XPath, close it, and store the values in a SQL Server database.

What is the easiest way to read the XPath values using .NET?

The XPaths should be fairly stable.

Please provide example code to read one value, say /html/head/title/text().

Thanks

Recommended answer

I think you should look into the HTML Agility Pack. It is an HTML parser rather than an XML parser, which makes it better suited to this task. If a file is not well-formed XML (and real-world HTML often isn't), an XML parser will throw an exception. An HTML parser gives you much more leeway with the input files.
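To illustrate the point about XML parsers, here is a small sketch that uses only the built-in System.Xml API: XmlDocument.LoadXml rejects markup that browsers happily render (an unclosed <br>, an unquoted attribute) but accepts the same content written as well-formed XHTML. The class and method names (WellFormedCheck, ParsesAsXml) are made up for this example.

```csharp
using System;
using System.Xml;

public static class WellFormedCheck
{
    // Returns true if the markup parses as XML; returns false (after printing
    // the parser's complaint) if XmlDocument throws an XmlException.
    public static bool ParsesAsXml(string markup)
    {
        var doc = new XmlDocument();
        try
        {
            doc.LoadXml(markup);
            return true;
        }
        catch (XmlException ex)
        {
            Console.Error.WriteLine("XmlException: " + ex.Message);
            return false;
        }
    }
}
```

For example, WellFormedCheck.ParsesAsXml("<p>a<br>b</p>") returns false because the unclosed <br> makes the end tag </p> mismatched, while WellFormedCheck.ParsesAsXml("<p>a<br/>b</p>") returns true. With 5,000 scraped pages, the first case is the one you should expect.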

Example showing how to process every href (link) attribute:

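    // requires: using HtmlAgilityPack;
    HtmlDocument doc = new HtmlDocument();
    doc.Load("file.htm");
    // SelectNodes returns null (not an empty collection) when nothing matches
    HtmlNodeCollection links = doc.DocumentNode.SelectNodes("//a[@href]");
    if (links != null)
    {
        foreach (HtmlNode link in links)
        {
            HtmlAttribute att = link.Attributes["href"];
            att.Value = FixLink(att); // FixLink: your own rewriting method
        }
    }
    // doc.Save("file.htm") writes the changed links back to disk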

I'm not near a compiler but the example you want is something like:

    // SelectSingleNode returns null if the page has no <title>, hence the ?.
    string title = doc.DocumentNode.SelectSingleNode("//title")?.InnerText;
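Putting the pieces together for the original task (loop over the folder, read several XPaths per file, store in SQL Server), a rough sketch might look like the following. It assumes the HtmlAgilityPack and Microsoft.Data.SqlClient NuGet packages; the class name HtmlHarvester, the XPath map (only the title entry comes from the question), and the destination table dbo.PageValues are all hypothetical placeholders to replace with your own.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.IO;
using HtmlAgilityPack;            // NuGet: HtmlAgilityPack
using Microsoft.Data.SqlClient;   // NuGet: Microsoft.Data.SqlClient

public static class HtmlHarvester
{
    // Hypothetical column name -> XPath map; replace with your ~10 real XPaths.
    public static readonly Dictionary<string, string> XPaths = new Dictionary<string, string>
    {
        ["Title"] = "/html/head/title",
        ["FirstHeading"] = "//h1",
    };

    // Reads every *.htm / *.html file in the folder and returns one row per file.
    public static DataTable Extract(string folder)
    {
        var table = new DataTable();
        table.Columns.Add("FileName", typeof(string));
        foreach (var column in XPaths.Keys)
            table.Columns.Add(column, typeof(string));

        foreach (var path in Directory.EnumerateFiles(folder, "*.htm*"))
        {
            var doc = new HtmlDocument();
            doc.Load(path); // parses the file in one call; there is no separate close step

            DataRow row = table.NewRow();
            row["FileName"] = Path.GetFileName(path);
            foreach (var pair in XPaths)
            {
                // SelectSingleNode returns null when the XPath matches nothing.
                HtmlNode node = doc.DocumentNode.SelectSingleNode(pair.Value);
                row[pair.Key] = node == null
                    ? (object)DBNull.Value
                    : HtmlEntity.DeEntitize(node.InnerText).Trim(); // decode &amp; etc.
            }
            table.Rows.Add(row);
        }
        return table;
    }

    // Bulk-inserts all rows. Assumes a (hypothetical) table dbo.PageValues whose
    // column names match the DataTable's column names.
    public static void Save(DataTable table, string connectionString)
    {
        using (var bulk = new SqlBulkCopy(connectionString))
        {
            bulk.DestinationTableName = "dbo.PageValues";
            foreach (DataColumn column in table.Columns)
                bulk.ColumnMappings.Add(column.ColumnName, column.ColumnName);
            bulk.WriteToServer(table);
        }
    }
}
```

Usage would be along the lines of `var rows = HtmlHarvester.Extract(@"C:\pages"); HtmlHarvester.Save(rows, connectionString);`. Collecting everything into a DataTable and sending it with SqlBulkCopy avoids issuing 5,000 separate INSERT statements; with only a handful of short strings per file, holding all rows in memory should be unproblematic, but a per-file parameterized INSERT would work just as well if you prefer.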