An efficient way to calculate a similarity percentage between data sets

Shared by user 再见,我的少年

I am currently working with User objects, each of which has many Goal objects. The Goal objects are not User-specific; that is, Users can share the same Goal. I am trying to devise a way to calculate a "similarity percentage" between two Users (i.e., taking into account how many Goals they share as well as how many Goals they do not share). Does anyone have experience with this type of situation? I am using Grails with MySQL, if that is helpful.

Thanks

Recommended answer

The standard way to do this is the Jaccard similarity. If A is the set of goals of the first user and B is the set of goals of the second user, the Jaccard similarity is:

#(A intersect B)/#(A union B)

This is the number of goals they share divided by the total number of distinct goals the two have between them (a goal they share is counted only once). So if the first user has goals A={1,2,3} and the second user has goals B={2,4}, then:

A intersect B = {2}
A union B = {1,2,3,4}

#(A intersect B)/#(A union B) = 1/4
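
The worked example above can be reproduced with a few lines of Python. This is only a minimal sketch: the function name `jaccard` and the variable names are made up for illustration, and returning 0.0 when both users have no goals at all is an assumption, since the formula is undefined (0/0) in that case.

```python
def jaccard(a, b):
    """Return #(a intersect b) / #(a union b) for two collections of goal ids."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        # Assumption: two users with no goals at all count as 0.0 similar;
        # the formula itself gives 0/0 here.
        return 0.0
    return len(a & b) / len(union)


# Goals from the example above (hypothetical goal ids).
user_a_goals = {1, 2, 3}
user_b_goals = {2, 4}

print(jaccard(user_a_goals, user_b_goals))  # 0.25, i.e. 1/4
```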

The Jaccard similarity is always between 0 (they share no goals) and 1 (they have the same goals), so you can get a percentage by multiplying it by 100.
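
If the goals live in a database, it may be easier to work from counts than from the sets themselves, because #(A union B) = #A + #B - #(A intersect B). The sketch below assumes the three counts have already been obtained (for example from count queries); the function name `similarity_percent` and its parameters are hypothetical, and treating two goal-less users as 0% similar is again an assumption.

```python
def similarity_percent(count_a, count_b, count_shared):
    """Jaccard similarity as a percentage, computed from counts only.

    count_a      -- number of distinct goals of the first user
    count_b      -- number of distinct goals of the second user
    count_shared -- number of goals both users have
    """
    # Inclusion-exclusion: shared goals would otherwise be counted twice.
    count_union = count_a + count_b - count_shared
    if count_union == 0:
        # Assumption: neither user has any goals, so report 0%.
        return 0.0
    return 100.0 * count_shared / count_union


# A={1,2,3} and B={2,4}: 3 goals, 2 goals, 1 shared goal.
print(similarity_percent(3, 2, 1))  # 25.0
```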

http://en.wikipedia.org/wiki/Jaccard_index
