Algorithm for matching many words against a block of text

Shared by user 旺仔小裤头

I have a large set of words (about 10,000) and I need to find if any of those words appear in a given block of text.

Is there a faster algorithm than doing a simple text search for each of the words in the block of text?

Recommended answer

Load the 10,000 words into a hash table, then check each word in the block of text to see whether the hash has an entry for it.

Whether it is faster I don't know; it is just another method (it would depend on how many words you are searching for).

Simple Perl example:

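use strict;
use warnings;

my $word_block = "the guy went afk after being popped by a brownrabbit";
my %hash = ();
my @words = split /\s+/, $word_block;   # split the text on whitespace
# load the word list (one word per line, after __DATA__) into the hash
while (<DATA>) { chomp; $hash{$_} = 1; }
foreach my $word (@words)
{
    print "found word: $word\n" if exists $hash{$word};
}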

__DATA__
afk
lol
brownrabbit
popped
garbage
trash
sitdown
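
The example above matches tokens exactly as they appear, so "AFK," or "BrownRabbit!" would be missed. A minimal sketch of the same hash-lookup idea with some normalization follows. The helper name `find_listed_words` is hypothetical, and the sketch assumes ASCII words, case-insensitive matching, and that punctuation around a token is not part of the word:

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original answer): returns the distinct
# words of $text that appear in the word list, in order of first appearance.
# Assumptions: ASCII words, case-insensitive comparison, and punctuation at
# either end of a token is ignored.
sub find_listed_words {
    my ($text, $wordlist) = @_;

    # Build the lookup hash once; each later check is a single hash lookup.
    my %listed = map { lc($_) => 1 } @$wordlist;

    my (%seen, @found);
    for my $token (split ' ', $text) {
        my $word = lc $token;
        $word =~ s/^[^a-z0-9]+|[^a-z0-9]+$//g;   # trim leading/trailing punctuation
        next unless length $word;
        # Record a listed word only the first time it is seen.
        push @found, $word if $listed{$word} && !$seen{$word}++;
    }
    return @found;
}

my @list  = qw(afk lol brownrabbit popped garbage trash sitdown);
my @found = find_listed_words(
    "The guy went AFK, after being popped by a BrownRabbit!", \@list);
print "found word: $_\n" for @found;
```

Building the hash is a one-time cost proportional to the size of the word list; after that, the work grows with the length of the text rather than with the number of words being searched for.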