Widaningrum, Ida, Mustikasari, Dyah, Arifin, Rizal, Tsaqila, Siti Lathifah and Fatmawati, Dwiyunia (2022) Algoritma Term Frequency – Inverse Document Frequency (TF-IDF) dan K-Means Clustering Untuk Menentukan Kategori Dokumen. In: Prosiding Seminar Nasional Sistem Informasi dan Teknologi (SISFOTEK) ke 6 Tahun 2022, Saturday, 24 September 2022, Malang (hybrid).
Abstract
Technology is developing rapidly, and one result is a growing number of documents in the form of research articles. Searching for documents in a repository takes a long time if they are not stored in groups by document category. One way to define document categories is clustering, which makes it easier to find documents by category. The clustering process uses the Term Frequency - Inverse Document Frequency (TF-IDF) algorithm and K-Means: TF-IDF is used to compute document weights, while K-Means performs the clustering. The test dataset consisted of 93 documents with various themes and contents. Assessing the quality of the K-Means clusters with the Silhouette score indicates that the optimal number of clusters is 4. This result is obtained by examining the fluctuation in cluster size and the thickness of the silhouette plot.
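The pipeline described in the abstract (TF-IDF weighting, K-Means clustering, Silhouette-based choice of the number of clusters) can be sketched as follows. This is a minimal, standard-library illustration and not the authors' implementation, which this record does not show. The four toy documents, the function names, the `log(n/df) + 1` IDF variant, and the deterministic farthest-point initialisation are all assumptions made for the example.

```python
import math
from collections import Counter

def tfidf(docs):
    """Return (vocabulary, L2-normalised TF-IDF vectors) for whitespace-tokenised docs."""
    tokenised = [doc.lower().split() for doc in docs]
    vocab = sorted({t for toks in tokenised for t in toks})
    n = len(docs)
    df = Counter(t for toks in tokenised for t in set(toks))  # document frequency
    idf = {t: math.log(n / df[t]) + 1.0 for t in vocab}       # IDF variant is an assumption
    vectors = []
    for toks in tokenised:
        counts = Counter(toks)
        vec = [counts[t] / len(toks) * idf[t] for t in vocab]
        norm = math.sqrt(sum(v * v for v in vec)) or 1.0
        vectors.append([v / norm for v in vec])
    return vocab, vectors

def dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def kmeans(points, k, iters=100):
    """Lloyd's algorithm; returns one cluster label per point."""
    # Farthest-point initialisation (assumption): deterministic, no random seed needed.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(dist(p, c) for c in centroids)))
    labels = None
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda j: dist(p, centroids[j])) for p in points]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def silhouette(points, labels):
    """Mean silhouette coefficient; points in singleton clusters score 0."""
    scores = []
    for i, p in enumerate(points):
        own = [dist(p, q) for j, q in enumerate(points) if labels[j] == labels[i] and j != i]
        if not own:
            scores.append(0.0)
            continue
        a = sum(own) / len(own)  # mean distance to the point's own cluster
        b = min(                 # mean distance to the nearest other cluster
            sum(dist(p, q) for q, l in zip(points, labels) if l == other) / labels.count(other)
            for other in set(labels) if other != labels[i]
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)

# Hypothetical toy corpus: two documents on machine learning, two on agriculture.
docs = [
    "neural network training deep learning",
    "deep learning neural network model",
    "soil crop irrigation farming",
    "crop farming soil fertilizer",
]
_, vectors = tfidf(docs)
scores = {k: silhouette(vectors, kmeans(vectors, k)) for k in (2, 3)}
best_k = max(scores, key=scores.get)
print(best_k, scores)
```

On this toy corpus the candidate cluster counts are only 2 and 3; the study itself evaluated 93 documents and reports 4 clusters as optimal. The sketch also compares mean Silhouette scores only, whereas the abstract additionally mentions inspecting cluster size and the thickness of the silhouette plot.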
Item Type: | Conference or Workshop Item (Paper) |
---|---|
Uncontrolled Keywords: | document clustering, characteristics or categories, python, term frequency-inverse document frequency (tf-idf). |
Subjects: | T Technology > T Technology (General) |
Divisions: | Faculty of Engineering |
Depositing User: | Library Umpo |
Date Deposited: | 20 Sep 2023 02:29 |
Last Modified: | 20 Sep 2023 02:29 |
URI: | http://eprints.umpo.ac.id/id/eprint/12869 |