Counting the frequency of specific words in a text file

Shared by user (╰掉毛の天使)

 

I have the contents of a text file stored in a string variable. The text has been processed so that it contains only lowercase words and spaces. Now, say I have a static dictionary, which is just a list of specific words, and I want to count how often each dictionary word occurs in the text. For example:

Text file:

i love love vb development although i m a total newbie

Dictionary:

love, development, fire, stone

The output I'd like to see is something like the following, listing each dictionary word and its count. If it makes the code simpler, it can also list only the dictionary words that appear in the text.

===========
WORD, COUNT
love, 2
development, 1
fire, 0
stone, 0
===========

Using a regex (e.g. "\w+") I can get all the word matches, but I have no idea how to count only the words that are also in the dictionary, so I'm stuck. Efficiency is crucial here, since the dictionary is quite large (~100,000 words) and the text files are not small either (~200 KB each).

I'd appreciate any help.
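The regex idea in the question can be combined with a lookup table keyed by the dictionary words. What follows is a minimal C# sketch, not the answer's code: `CountDictionaryWords` is a hypothetical helper name, and it assumes the dictionary words are lowercase like the processed text. Every dictionary word is seeded with a count of 0, then the text is scanned once with "\w+"; each match costs one hash lookup (O(1) on average), so the work grows with the text length plus the dictionary size, not with their product.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper: counts how often each dictionary word occurs in text.
// Assumes dictionary words and text are both lowercase.
Dictionary<string, int> CountDictionaryWords(string text, IEnumerable<string> dictionaryWords)
{
    var counts = new Dictionary<string, int>();
    foreach (var entry in dictionaryWords)
        counts[entry] = 0; // seed every dictionary word so absent words report 0

    foreach (Match match in Regex.Matches(text, @"\w+"))
    {
        // Only words that are already keys belong to the dictionary
        if (counts.TryGetValue(match.Value, out var current))
            counts[match.Value] = current + 1;
    }
    return counts;
}

var dictionary = new[] { "love", "development", "fire", "stone" };
var result = CountDictionaryWords(
    "i love love vb development although i m a total newbie", dictionary);

Console.WriteLine("WORD, COUNT");
foreach (var word in dictionary)
    Console.WriteLine($"{word}, {result[word]}");
```

For the sample input this prints love, 2 / development, 1 / fire, 0 / stone, 0. Looping over the original word array rather than the `Dictionary` keeps the output in dictionary order.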

Solution

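using System;
using System.Collections.Generic;

// Count every word in text (assumed input: the processed file contents)
var dict = new Dictionary<string, int>();
foreach (var word in text.Split(' ', StringSplitOptions.RemoveEmptyEntries))
  if (dict.ContainsKey(word))
    dict[word]++;
  else
    dict[word] = 1;

// Then look up each word in dictionaryWords (assumed input); absent words get 0
foreach (var word in dictionaryWords)
  Console.WriteLine($"{word}, {(dict.TryGetValue(word, out var n) ? n : 0)}");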
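The question also allows listing only the dictionary words that actually appear in the text. A minimal LINQ sketch of that variant follows, assuming the same lowercase, space-separated text; `CountAppearingWords` is a hypothetical name. Filtering through a `HashSet<string>` keeps each membership test O(1) on average, and `GroupBy` yields the groups in order of each word's first appearance.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: returns only the dictionary words that occur in text, with counts.
// Assumes text holds lowercase words separated by spaces.
List<(string Word, int Count)> CountAppearingWords(string text, IEnumerable<string> dictionaryWords)
{
    var lookup = new HashSet<string>(dictionaryWords); // O(1) average membership test
    return text.Split(' ', StringSplitOptions.RemoveEmptyEntries)
               .Where(w => lookup.Contains(w))           // drop words not in the dictionary
               .GroupBy(w => w)                          // one group per distinct word
               .Select(g => (Word: g.Key, Count: g.Count()))
               .ToList();
}

var appearing = CountAppearingWords(
    "i love love vb development although i m a total newbie",
    new[] { "love", "development", "fire", "stone" });

foreach (var (word, count) in appearing)
    Console.WriteLine($"{word}, {count}");
```

For the sample input this prints only love, 2 and development, 1; fire and stone are omitted because they never occur.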
