Finding words in a long stream of characters (automatic tokenization, tokens, characters)

How would you find the correct words in a long stream of characters?

Input:

"The revised report onthesyntactictheoriesofsequentialcontrolandstate"

Google's Output:

"The revised report on syntactic theories sequential controlandstate"

(which is close enough considering the time that they produced the output)

How do you think Google does it? How would you increase the accuracy?

Recommended answer

I would try a recursive algorithm like this:

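Try inserting a space at each position. If the left part is a word, recurse on the right part. For each final output, compute the ratio of valid words to total tokens. The output with the best ratio is likely the answer.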

For example, giving it "thesentenceisgood" would run:

thesentenceisgood
the sentenceisgood
    sent enceisgood
         enceisgood: OUT1: the sent enceisgood, 2/3
    sentence isgood
             is good
                go od: OUT2: the sentence is go od, 4/5
             is good: OUT3: the sentence is good, 4/4
    sentenceisgood: OUT4: the sentenceisgood, 1/2
these ntenceisgood
      ntenceisgood: OUT5: these ntenceisgood, 1/2

So you would pick OUT3 as the answer.
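The recursion above could be sketched in Python roughly as follows. This is only an illustration under stated assumptions: the tiny WORDS set stands in for a real dictionary, and the function names (segmentations, score, best_segmentation) are made up for this example. It also keeps the unsplit input itself as a candidate, which the trace above does not list.

```python
def segmentations(text, words):
    """Yield candidate splits: known-word prefixes followed by a split of the rest."""
    # Leaf case: keep the remaining text as a single token, valid word or not.
    yield [text]
    # Try a space at every position; recurse only when the left part is a word.
    for i in range(1, len(text)):
        left, right = text[:i], text[i:]
        if left in words:
            for rest in segmentations(right, words):
                yield [left] + rest


def score(tokens, words):
    """Ratio of valid words to total tokens, e.g. 2/3 for 'the sent enceisgood'."""
    return sum(t in words for t in tokens) / len(tokens)


def best_segmentation(text, words):
    """Pick the candidate with the highest valid-word ratio."""
    return max(segmentations(text, words), key=lambda toks: score(toks, words))


# Hypothetical toy dictionary, just large enough to reproduce the trace.
WORDS = {"the", "these", "sent", "sentence", "is", "go", "good"}

result = best_segmentation("thesentenceisgood", WORDS)
print(" ".join(result))
```

Without any caching the search can take exponential time on long inputs, so a real implementation would likely memoize results per suffix.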
