How to convert an HTML <table> to a 2D array

Let's say I copy a complete HTML table (where each and every tr and td has extra attributes) into a String. How can I take all the contents (what is between the tags) and create a 2D array that is organized like the original table?

For example, this table:

<table border="1">
    <tr align= "center">
        <td align="char">TD1</td>
        <td>td1</td>
        <td align="char">TD1</td>
        <td>td1</td>
    </tr>
    <tr>
        <td>TD2</td>
        <td>tD2</td>
        <td class="bold>Td2</td>
        <td>td2</td>
    </tr>
</table>

I want this array:
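The array itself did not survive in this copy of the question. Judging from the table above and from the answer below, which stores each cell's text, it was presumably the cell text laid out row by row. A minimal sketch of that assumed target in Java (the class name `ExpectedArray` and method `expected` are hypothetical, used only for illustration):

```java
import java.util.Arrays;

public class ExpectedArray {
    // Assumed target shape: one inner array per <tr>, one String per <td>,
    // holding only the text between the tags (attributes are dropped).
    static String[][] expected() {
        return new String[][] {
            {"TD1", "td1", "TD1", "td1"},
            {"TD2", "tD2", "Td2", "td2"}
        };
    }

    public static void main(String[] args) {
        // Arrays.deepToString renders nested arrays, which is handy
        // for eyeballing a parsed result against this target.
        System.out.println(Arrays.deepToString(expected()));
        // prints [[TD1, td1, TD1, td1], [TD2, tD2, Td2, td2]]
    }
}
```

Comparing a parsed result against such a literal with `Arrays.deepEquals` is one simple way to check that a parser kept the row and column order.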

PS: I know I can use regex, but it would be extremely complicated. I want a tool like JSoup that can do all the work automatically without writing much code.

Recommended answer

This is how it could be done using JSoup (srsly, don't use regexp for HTML).

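import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

// html is the String that holds the copied table markup
Document doc = Jsoup.parse(html);
Elements tables = doc.select("table");
for (Element table : tables) {
    Elements trs = table.select("tr");
    // one row per <tr>; each row is sized once its cells are known
    String[][] trtd = new String[trs.size()][];
    for (int i = 0; i < trs.size(); i++) {
        Elements tds = trs.get(i).select("td");
        trtd[i] = new String[tds.size()];
        for (int j = 0; j < tds.size(); j++) {
            // text() returns the cell content without tags or attributes
            trtd[i][j] = tds.get(j).text();
        }
    }
    // trtd now contains the desired array for this table
}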

Also, the class attribute value is not closed properly here in your example:

<td class="bold>Td2</td>

It should be:

<td class="bold">Td2</td>