Detecting and parsing the escape character "\" from a JSON file?

Shared by user 墨染倾城色.

I am having a problem with data from a JSON file. I am using the following link from Google:

http://www.google.com/finance/company_news?q=AAPL&output=json

My problem occurs when I parse the data and put it on screen. For some reason, the data is not being decoded properly.

The raw data:

 1.) one which must have set many of the company\x26#39;s board on the edge of their
 2.) Making Less Money From Next \x3cb\x3e...\x3c/b\x3e

When I bring in the data, I do the following:

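import java.io.BufferedReader;
import java.io.InputStreamReader;
import org.apache.http.HttpEntity;
import org.apache.http.HttpResponse;
import org.apache.http.client.methods.HttpPost;
import org.apache.http.impl.client.DefaultHttpClient;

// Fetch the feed; url, is and json are declared elsewhere.
DefaultHttpClient httpClient = new DefaultHttpClient();
HttpPost httpPost = new HttpPost(url);
HttpResponse httpResponse = httpClient.execute(httpPost);
HttpEntity httpEntity = httpResponse.getEntity();
is = httpEntity.getContent();
// Read the body line by line, restoring the newline that readLine() strips.
BufferedReader reader = new BufferedReader(new InputStreamReader(is, "iso-8859-1"), 8);
StringBuilder sb = new StringBuilder();
String line = null;
while ((line = reader.readLine()) != null) {
    sb.append(line).append("\n");
}
is.close();
json = sb.toString();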

The output I receive, using org.json to extract the data from the JSON file, is the following (notice the lack of backslashes):

1.)one which must have set many of the companyx26#39;s board on the edge of their
2.)Making Less Money From Next x3cbx3e...x3c/bx3e

My current method for handling the first problem is this:

JSONRowData.setJTitle((Html.fromHtml((article.getString(TAG_TITLE).replaceAll("x26", "&")))).toString());

The second one escapes me, though (no pun intended).

I assume this doesn't work because the backslash is used for escape characters. I've tried many different methods of reading the data in, but I've had no luck. Is there a way I can import the data to handle this problem without using regular expressions?

Solution

Our nemesis today: "\x26", an ASCII character code written in hexadecimal notation.

Read the raw data into a char array. The Apache Commons IO library is a convenient way to do this. Then walk the char array in a for loop looking for "\". On a hit, check whether the next array position holds "x". If it does, take the following two characters: these are the hexadecimal digits of the character's ASCII code. Convert the hex value to an integer, cast it to a char, and append that char to a StringBuilder.

If there is no match (with "\"), append the current char to the StringBuilder unchanged. Once the loop finishes, call .toString() to turn the result into a string.
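The scan described above could be sketched roughly as follows. This is a minimal illustration, not the poster's actual code: the class name HexEscapeDecoder and the method decode are made up for this example, and it works on an in-memory String rather than reading through Commons IO. It also assumes every backslash-x pair is a hex escape, so an escaped backslash followed by "x26" would be decoded by mistake, and decoding before JSON parsing could break the document if an escape stands for a quote or a backslash.

```java
class HexEscapeDecoder {

    /** Replaces each backslash-x-HH sequence with the character whose code is HH (hex). */
    static String decode(String raw) {
        char[] chars = raw.toCharArray();
        StringBuilder sb = new StringBuilder(chars.length);
        int i = 0;
        while (i < chars.length) {
            // A hit needs a backslash, an 'x', and two hex digits still inside the array.
            boolean isEscape = chars[i] == '\\'
                    && i + 3 < chars.length
                    && chars[i + 1] == 'x'
                    && Character.digit(chars[i + 2], 16) >= 0
                    && Character.digit(chars[i + 3], 16) >= 0;
            if (isEscape) {
                // Combine the two hex digits into one code and cast it to a char.
                int code = Character.digit(chars[i + 2], 16) * 16
                        + Character.digit(chars[i + 3], 16);
                sb.append((char) code);
                i += 4; // skip the backslash, the 'x' and both digits
            } else {
                // No match: copy the character through unchanged.
                sb.append(chars[i]);
                i++;
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(decode("company\\x26#39;s board"));          // company&#39;s board
        System.out.println(decode("Next \\x3cb\\x3e...\\x3c/b\\x3e"));  // Next <b>...</b>
    }
}
```

The decoded text still contains HTML entities and tags, which is why a further HTML-decoding step is needed afterwards.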

From there, the data may still contain some HTML remnants (&#39; and/or <b> in this case). Using Html.fromHtml() took care of this.

Recommended answer

The problem here is that Google, or at least that URL, is supplying invalid JSON (see notes 1 and 2 below). The JSON library, rather than rejecting the invalid JSON outright, parses it in a "well, let's ignore this nonsense and continue" manner. That is, it's not the rendering that is wrong; it is the input that is wrong.

1. A \x is not allowed to appear in a JSON string (unless the \ is itself escaped), because an unescaped \ can only be followed by a small set of characters, which does not include x. Escapes for character codes must be written as \u1234, not \x12.

The only "fixes" I can think of are really gross hacks: i.e. read in the raw text and convert \x12 to \u0012. (Actually, it's not that bad a hack, because no context-sensitive stuff needs to be taken into account; however, it should not be required! Shame on Google.)

2. Extracted invalid JSON string literal:

"Apple Inc. (NASDAQ:AAPL) shares continued to lead large cap tech stocks in top performance this year. The stock\x26#39;s price showed no major move following a key event started Monday."

(To make this valid, replace \x26 with \u0026 or &.)

Happy coding, and good luck :)

In Java, one [untested] approach might be to use a regular expression (via String.replaceAll):

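// Every backslash is doubled for the Java string and again for the regex; the class also matches hex digits such as 3c.
inputString.replaceAll("\\\\x([0-9A-Fa-f]{2})", "\\\\u00$1")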
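As a hedged illustration, the same replacement can be wrapped in a small self-contained class and applied to the raw text before it is handed to the JSON parser. The class name HexToUnicodeEscape and the method fix are hypothetical names chosen for this sketch, and the pattern accepts any two hex digits so that escapes such as \x3c are covered as well as \x26.

```java
import java.util.regex.Pattern;

class HexToUnicodeEscape {

    // A literal backslash, an 'x', then exactly two hex digits (captured as group 1).
    private static final Pattern HEX_ESCAPE = Pattern.compile("\\\\x([0-9A-Fa-f]{2})");

    /** Rewrites each two-digit hex escape as the four-digit form that JSON accepts. */
    static String fix(String rawJson) {
        // In a replacement string the backslash is special, so it is escaped once more here.
        return HEX_ESCAPE.matcher(rawJson).replaceAll("\\\\u00$1");
    }

    public static void main(String[] args) {
        // The raw text as it arrives, containing the invalid hex escape.
        String raw = "{\"t\":\"The stock\\x26#39;s price\"}";
        System.out.println(fix(raw));
    }
}
```

Running main should print the same object with \x26 rewritten as \u0026, which a JSON parser then decodes to an ampersand. One caveat with this sketch: an already escaped backslash followed by x26 in the source text would also be rewritten, so it assumes the feed never contains that sequence.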


Published: 2023-09-06 04:25:11