Choosing the number of clusters in k-means

Shared by user 奋斗是因为年轻

I want to cluster a large sample of data, and for that I am using the kmeans function in MATLAB. The problem is that it returns a matrix with all the data sorted into however many clusters I specify.

How can I know which number of clusters is optimal?

I thought the optimal choice would be the one that puts an equal number of elements in each cluster, but this never happens. Instead, kmeans will go on clustering the data for any number I give it.

Please help...

Recommended answer

From what I have read, I think an answer could be the following. In k-means we partition the data according to the cluster means, so in theory the best result would be one in which each partition holds an equal number of data points.
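To make "partitioning the data according to the means" concrete, here is a minimal sketch of the plain k-means loop in Python. It is not MATLAB's kmeans implementation; the function name, the toy points and the fixed starting centers are all assumptions chosen so the run is deterministic.

```python
import math

def kmeans(points, centers, iterations=100):
    """Plain k-means (Lloyd's iterations) starting from the given centers.

    Returns (labels, centers). The caller supplies the initial centers so
    that this sketch is deterministic; that is an illustrative assumption.
    """
    centers = [tuple(c) for c in centers]
    labels = []
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest center.
        new_labels = [
            min(range(len(centers)), key=lambda j: math.dist(p, centers[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the means will too
        labels = new_labels
        # Update step: each center becomes the mean of its assigned points.
        for j in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster ends up empty
                centers[j] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels, centers

# Hypothetical toy data: three points near (0, 0) and two near (10, 10).
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11)]
labels, centers = kmeans(points, centers=[(0, 0), (1, 0)])
sizes = [labels.count(j) for j in range(len(centers))]
print(labels, sizes)
```

On this toy data the two clusters end up with 3 and 2 points. The loop only moves each center to the mean of its members, so nothing in it pushes the cluster sizes towards being equal.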

I used k-means++, which improves on plain k-means because it does not choose all of its initial centers uniformly at random, and then iterated over the number of partitions until the partition sizes were almost equal. The figures are approximate: for 3 clusters I got 2180, 729 and 1219, and for 4 I got 30, 2422, 1556 and 120, so I chose 3 as my final answer.
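The comparison in the previous paragraph can be written out as a small calculation. This sketch uses only the cluster sizes reported above. The balance measure (largest cluster divided by smallest) is an assumption for illustration, since the answer only says the sizes should be "almost equal".

```python
# Cluster sizes reported in the answer for k = 3 and k = 4.
reported_sizes = {
    3: [2180, 729, 1219],
    4: [30, 2422, 1556, 120],
}

def imbalance(sizes):
    """Ratio of largest to smallest cluster; 1.0 means perfectly equal sizes.

    This measure is an assumption made for illustration, not something
    the answer specifies.
    """
    return max(sizes) / min(sizes)

scores = {k: imbalance(sizes) for k, sizes in reported_sizes.items()}
best_k = min(scores, key=scores.get)  # k whose clusters are most balanced

for k, score in scores.items():
    print(f"k={k}: sizes={reported_sizes[k]}, largest/smallest={score:.1f}")
print("most balanced k:", best_k)
```

With these figures the ratio is about 3.0 for k = 3 and about 80.7 for k = 4, so the balance heuristic picks 3, matching the choice made in the answer.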
