Fuzzy C-Means in Content-Based Document Clustering for Grouping General Websites Based on Their Main Page Contents


Fuzzy C-Means, content-based document clustering, general websites


The research aimed to apply Fuzzy C-Means clustering to content-based document clustering in order to group general websites by the content of their main pages. The data were a ranking table of the most visited websites in Indonesia, taken from https://dataforseo.com/top-1000-websites/ on September 24, 2022. Two cases were run with Fuzzy C-Means clustering, using maximum iteration values of 100 and 200. The two settings produced different numbers of website names in the first and second clusters only; in all other clusters, the numbers were the same. The clustering results were validated with the silhouette coefficient, which was 0.977783879 for case no. 1 and 0.977788457 for case no. 2. Both values are close to the maximum of 1, indicating that Fuzzy C-Means clustering performs very well in content-based document clustering when it is used to group general websites by their content. The approach can therefore be considered for other content-based clustering cases in the future.
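The workflow summarized in the abstract (Fuzzy C-Means run with two maximum iteration values, then validated with the silhouette coefficient) can be illustrated with a minimal sketch. This is not the authors' implementation: the toy documents, the term-count features with length normalization, the fuzzifier m = 2, the tolerance, and the function names `fuzzy_c_means` and `silhouette_coefficient` are all assumptions made for illustration.

```python
import numpy as np

def fuzzy_c_means(X, c, m=2.0, max_iter=100, tol=1e-6, seed=0):
    """Minimal Fuzzy C-Means. Returns (centers, U), where U[i, k] is the
    membership of point i in cluster k. m and tol are assumed defaults."""
    rng = np.random.default_rng(seed)
    U = rng.random((X.shape[0], c))
    U /= U.sum(axis=1, keepdims=True)            # memberships of each point sum to 1
    for _ in range(max_iter):
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]   # membership-weighted means
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        dist = np.fmax(dist, 1e-12)              # guard against division by zero
        inv = dist ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)   # standard membership update
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:                            # stop early if memberships settle
            break
    return centers, U

def silhouette_coefficient(X, labels):
    """Mean silhouette over all points, using Euclidean distance."""
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    scores = []
    for i in range(len(X)):
        same = labels == labels[i]
        if same.sum() == 1:                      # singleton cluster: score 0 by convention
            scores.append(0.0)
            continue
        a = D[i, same].sum() / (same.sum() - 1)  # mean distance within own cluster
        b = min(D[i, labels == k].mean()         # mean distance to nearest other cluster
                for k in np.unique(labels) if k != labels[i])
        scores.append((b - a) / max(a, b))
    return float(np.mean(scores))

# Hypothetical toy "main page" texts; the real study used website content.
docs = [
    "news politics economy news",
    "economy news market politics",
    "politics news economy",
    "shop price discount shop",
    "discount shop cart price",
    "price cart shop",
]
vocab = sorted({w for d in docs for w in d.split()})
X = np.array([[d.split().count(w) for w in vocab] for d in docs], dtype=float)
X /= np.linalg.norm(X, axis=1, keepdims=True)    # length-normalize each document vector

results = {}
for max_iter in (100, 200):                      # the two cases named in the abstract
    centers, U = fuzzy_c_means(X, c=2, max_iter=max_iter)
    labels = U.argmax(axis=1)                    # harden fuzzy memberships for validation
    results[max_iter] = (labels, silhouette_coefficient(X, labels))
    print(max_iter, labels, results[max_iter][1])
```

On data this small the run converges well before either iteration limit, so both cases give the same partition; a difference like the one reported in the abstract would only appear when the algorithm has not yet converged at the lower limit.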


Author Biographies

Sri Probo Aditiyo, Brawijaya University

Department of Statistics, Mathematics and Natural Science Faculty

Eni Sumarminingsih, Brawijaya University

Department of Statistics, Mathematics and Natural Science Faculty

Rahma Fitriani, Brawijaya University

Department of Statistics, Mathematics and Natural Science Faculty


