Details

自适应特征权重的K-means聚类算法     Cited by: 10

K-means Clustering Algorithm Based on Adaptive Feature Weighted

Document type: Journal article

Chinese title: 自适应特征权重的K-means聚类算法

English title: K-means Clustering Algorithm Based on Adaptive Feature Weighted

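Authors: Li Sihai (李四海) [1]; Man Zibin (满自斌) [2]

First author: Li Sihai (李四海)

Affiliations: [1] Gansu College of Traditional Chinese Medicine, Lanzhou, Gansu 730000; [2] Lanzhou University of Technology, Lanzhou, Gansu 730050

First institution: Gansu University of Chinese Medicine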

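Year: 2013

Volume: 23

Issue: 6

Pages: 98

Chinese journal name: 计算机技术与发展

Foreign-language journal name: Computer Technology and Development

Indexed in: CSTPCD

Funding: National Natural Science Foundation of China (51069004)

Language: Chinese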

Chinese keywords: K-means; medical data clustering; adaptive feature weighting; cluster evaluation; confusion matrix

Foreign-language keywords: K-means; medical data clustering; AFW; cluster evaluation; confusion matrix

Abstract: To improve the accuracy and stability of the traditional K-means algorithm on medical data clustering, this paper proposes an adaptive feature-weighted K-means clustering algorithm, AFW-K-means. The algorithm first selects the initial cluster centers by computing the mean square deviation of the attributes. Then, based on the result of the current iteration, it adjusts each attribute's feature weight in the distance formula according to the principle that clusters should be compact internally and far apart from each other, so that the weighted distance better reflects the true distance between data points in Euclidean space. Finally, the algorithm is validated on UCI data sets, including the Breast Cancer Wisconsin (BCW) data set. The results show that both the accuracy and the stability of the algorithm are clearly better than those of the traditional K-means algorithm.
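The three steps named in the abstract (initialization from the mean square deviation, iterative reweighting of features, weighted distance) can be illustrated with a minimal Python sketch. The abstract does not give the paper's formulas, so the details here are assumptions and not the authors' method: `init_centers` splits the data along the feature with the largest standard deviation, and `update_weights` sets each weight proportional to the ratio of between-cluster to within-cluster spread on that feature. All function names are hypothetical.

```python
import math


def init_centers(points, k):
    """Assumed initialization: split the data along its highest-deviation feature."""
    n, d = len(points), len(points[0])
    means = [sum(p[j] for p in points) / n for j in range(d)]
    # Mean square deviation (standard deviation) of every feature.
    stds = [math.sqrt(sum((p[j] - means[j]) ** 2 for p in points) / n)
            for j in range(d)]
    top = max(range(d), key=lambda j: stds[j])
    ordered = sorted(points, key=lambda p: p[top])
    centers = []
    for c in range(k):
        # k contiguous chunks along the chosen feature (requires n >= k).
        chunk = ordered[c * n // k:(c + 1) * n // k]
        centers.append([sum(p[j] for p in chunk) / len(chunk) for j in range(d)])
    return centers


def weighted_dist(p, q, weights):
    """Euclidean distance with one non-negative weight per feature."""
    return math.sqrt(sum(w * (a - b) ** 2 for a, b, w in zip(p, q, weights)))


def update_weights(points, labels, centers):
    """Assumed rule: weight_j proportional to between / within spread on feature j."""
    n, d = len(points), len(points[0])
    overall = [sum(p[j] for p in points) / n for j in range(d)]
    raw = []
    for j in range(d):
        within = sum((p[j] - centers[l][j]) ** 2 for p, l in zip(points, labels))
        between = sum(labels.count(c) * (centers[c][j] - overall[j]) ** 2
                      for c in range(len(centers)))
        raw.append(between / (within + 1e-12))  # small constant avoids 0-division
    total = sum(raw)
    if total == 0:
        return [1.0 / d] * d  # fall back to uniform weights
    return [r / total for r in raw]  # normalize so the weights sum to 1


def afw_kmeans(points, k, max_iter=100):
    """K-means loop that re-estimates feature weights after every iteration."""
    d = len(points[0])
    centers = init_centers(points, k)
    weights = [1.0 / d] * d
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda c: weighted_dist(p, centers[c], weights))
            for p in points
        ]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:  # keep the old center if a cluster becomes empty
                centers[c] = [sum(p[j] for p in members) / len(members)
                              for j in range(d)]
        weights = update_weights(points, labels, centers)
    return labels, centers, weights
```

Calling `afw_kmeans(points, k)` on a list of equal-length numeric rows returns the cluster labels, the centers, and the learned weights. Under the assumed rule, a feature that separates the clusters well ends up with a larger weight than a noisy one, which is the behaviour the abstract attributes to the "compact within clusters, far apart between clusters" principle.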

