Filter items that occur only once in a very large list

Shared by user (年少轻狂 你狂我更狂).

I have a large list (over 1,000,000 items) that contains English words:

tokens = ["today", "good", "computer", "people", "good", ... ]

I'd like to get all the items that occur only once in the list.

Currently, I use:

tokens_once = set(word for word in set(tokens) if tokens.count(word) == 1)

But it's really slow. How can I make this faster?
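The slowdown can be reproduced with a small timing sketch of the approach from the question. The data generator `make_tokens` and its parameters are hypothetical, and absolute timings depend on the machine; the point is only that if the cost is quadratic, doubling `n` should roughly quadruple the time.

```python
import random
import string
import time


def make_tokens(n, vocab_size, seed=0):
    """Hypothetical helper: n random 8-letter 'words' drawn from vocab_size strings."""
    rng = random.Random(seed)
    vocab = ["".join(rng.choices(string.ascii_lowercase, k=8)) for _ in range(vocab_size)]
    return [rng.choice(vocab) for _ in range(n)]


def once_with_count(tokens):
    # The approach from the question: one full list.count() scan per unique word,
    # so total work grows like (number of unique words) * (list length).
    return set(word for word in set(tokens) if tokens.count(word) == 1)


if __name__ == "__main__":
    for n in (2_000, 4_000, 8_000):
        tokens = make_tokens(n, vocab_size=n // 2)  # unique words grow with n
        start = time.perf_counter()
        once_with_count(tokens)
        print(n, f"{time.perf_counter() - start:.3f}s")
```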

Recommended answer

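You iterate over the unique words and, for each one, call tokens.count(word), which scans the entire list again. That costs O(U·N) for U unique words, i.e. O(N²) in the worst case. If you replace the repeated count calls with a Counter, you iterate once over the list to build the counts and then once over the unique elements, which is at most 2N steps, i.e. O(N) (assuming average-case O(1) hash lookups).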

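from collections import Counter

tokens = ["today", "good", "computer", "people", "good"]
# Count every token in one pass, then keep those seen exactly once.
# Python 3 uses .items(); .iteritems() only exists in Python 2.
single_tokens = [k for k, v in Counter(tokens).items() if v == 1]
# single_tokens == ['today', 'computer', 'people']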
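The two passes described in the answer can also be spelled out with a plain dict, which is roughly what Counter does for you. This is a minimal sketch; the function name `tokens_once` is hypothetical, and it relies on dicts preserving insertion order (Python 3.7+) to return words in first-seen order.

```python
def tokens_once(tokens):
    """Return the tokens that occur exactly once, in first-seen order.

    Pass 1 walks the full list once to count occurrences (N steps);
    pass 2 walks only the distinct words (at most N steps), so the
    total is O(N) on average, assuming O(1) dict lookups.
    """
    counts = {}
    for word in tokens:  # pass 1: count every occurrence
        counts[word] = counts.get(word, 0) + 1
    # pass 2: keep only the words whose count is exactly 1
    return [word for word, c in counts.items() if c == 1]


print(tokens_once(["today", "good", "computer", "people", "good"]))
# → ['today', 'computer', 'people']
```

In practice Counter is the more idiomatic choice; the explicit loop just makes the "one pass to count, one pass over unique words" structure visible.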