Multiple Classifier System for Handling Imbalanced and Overlapping Datasets on Multiclass Classification
DOI: https://doi.org/10.21512/comtech.v15i1.11295
Keywords: Multiple Classifier System (MCS), imbalanced datasets, overlapping datasets, multiclass classification
Abstract
The performance of classification models suffers when the dataset contains imbalanced and overlapping data. Each condition is challenging on its own, and the problem becomes even more complex when both occur together. In this research, an ensemble method called a Multiple Classifier System (MCS) was proposed to address these issues by combining K-Nearest Neighbour and Logistic Regression. The Synthetic Minority Oversampling Technique (SMOTE) was also applied to balance the dataset, and the One Versus One (OVO) decomposition technique was used for the multiclass classification process. A simulation with 18 scenarios shows that the MCS-SMOTE model handles these problems well. The model's performance was also tested on empirical data on poverty in West Java in 2021. The empirical results also show that the proposed method performs well, with an accuracy of 80.09%, an F1 score of 0.782, and a G-Mean of 0.242. The areas with the highest poverty rates are Bogor, Bekasi City, Bandung City, Bekasi Regency, and Depok City, all located near DKI Jakarta, the capital city. Based on the available predictor variables, households in West Java are more likely to be poor when they do not have access to credit, have more than three members, share one building with other families, and are headed by someone who has not graduated from elementary school.
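The oversampling step can be illustrated with a minimal pure-Python sketch of SMOTE's interpolation idea. It is not the authors' implementation. The function name `smote`, the random drawing of base minority samples, and the default `k=5` are all assumptions made for illustration. Each synthetic point is placed at a random position on the line segment between a minority sample and one of its `k` nearest minority-class neighbours, so new points always lie between existing minority samples.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=None):
    """Create n_synthetic points by interpolating between minority samples.

    minority: list of feature vectors (lists of floats) from one minority class.
    k: number of nearest minority neighbours considered for each sample.
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)

    # Precompute the k nearest minority neighbours of every minority sample.
    neighbours = []
    for i, x in enumerate(minority):
        others = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(x, minority[j]),
        )
        neighbours.append(others[:k])

    synthetic = []
    for _ in range(n_synthetic):
        # Assumption: the base sample is drawn at random (with replacement).
        i = rng.randrange(len(minority))
        x = minority[i]
        nn = minority[rng.choice(neighbours[i])]  # one of its k neighbours
        gap = rng.random()  # position along the segment x -> nn, in [0, 1)
        synthetic.append([xi + gap * (ni - xi) for xi, ni in zip(x, nn)])
    return synthetic


if __name__ == "__main__":
    minority = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.2], [1.2, 2.5]]
    for point in smote(minority, n_synthetic=3, k=2, seed=0):
        print([round(v, 3) for v in point])
```

In practice, a maintained implementation such as the `SMOTE` class in imbalanced-learn would normally be used. The sketch only shows the mechanics. Oversampling is usually applied to the training split only, so that synthetic points do not leak into the evaluation data.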
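The abstract does not specify how K-Nearest Neighbour and Logistic Regression are combined inside each OVO sub-problem. The sketch below therefore assumes one simple rule. For every pair of classes, both learners estimate the probability of the second class, and the two estimates are averaged. The pairwise winners then vote for the final label. Everything here is a hypothetical pure-Python illustration, not the method as implemented in the study. This includes the function names, `k=3`, the gradient-descent settings, averaging as the combiner, and resolving ties in favour of whichever tied class was voted for first.

```python
import math
from collections import Counter
from itertools import combinations


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def knn_proba(train_x, train_y, x, k=3):
    """Share of the k nearest training points whose binary label is 1."""
    order = sorted(range(len(train_x)), key=lambda i: math.dist(x, train_x[i]))
    nearest = order[:k]
    return sum(train_y[i] for i in nearest) / len(nearest)


def fit_logistic(train_x, train_y, lr=0.1, epochs=500):
    """Batch gradient descent on the mean logistic loss; returns (weights, bias)."""
    w = [0.0] * len(train_x[0])
    bias = 0.0
    n = len(train_x)
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for xi, yi in zip(train_x, train_y):
            err = sigmoid(dot(w, xi) + bias) - yi  # d(loss)/d(logit)
            for d, v in enumerate(xi):
                grad_w[d] += err * v
            grad_b += err
        w = [wd - lr * g / n for wd, g in zip(w, grad_w)]
        bias -= lr * grad_b / n
    return w, bias


def fit_ovo_mcs(x, y):
    """One binary sub-problem per pair of classes (a, b); label 1 means class b."""
    models = {}
    for a, b in combinations(sorted(set(y)), 2):
        sub_x = [xi for xi, yi in zip(x, y) if yi in (a, b)]
        sub_y = [1 if yi == b else 0 for yi in y if yi in (a, b)]
        # KNN simply keeps the pairwise data; logistic regression is fitted here.
        models[(a, b)] = (sub_x, sub_y, fit_logistic(sub_x, sub_y))
    return models


def predict_ovo_mcs(models, x_new, k=3):
    votes = Counter()
    for (a, b), (sub_x, sub_y, (w, bias)) in models.items():
        p_knn = knn_proba(sub_x, sub_y, x_new, k)
        p_lr = sigmoid(dot(w, x_new) + bias)
        p_b = (p_knn + p_lr) / 2  # ASSUMED combiner: average the two probabilities
        votes[b if p_b >= 0.5 else a] += 1
    # Majority vote over the pairwise winners; ties go to the first class voted for.
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    X = [[0, 0], [0.5, 0.2], [0.2, 0.6],
         [5, 5], [5.3, 4.8], [4.7, 5.2],
         [0, 5], [0.3, 5.4], [-0.2, 4.7]]
    Y = [0, 0, 0, 1, 1, 1, 2, 2, 2]
    models = fit_ovo_mcs(X, Y)
    print([predict_ovo_mcs(models, q) for q in ([0.1, 0.1], [5, 5.1], [0.1, 5.1])])  # [0, 1, 2]
```

With OVO, a problem with C classes yields C(C - 1)/2 binary sub-problems (three in the example above). Each sub-problem is trained only on the samples of its two classes.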
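The three reported metrics can be computed as shown below. This sketch makes two assumptions, since the abstract does not state which variants were used. The F1 score is taken to be macro-averaged. The multiclass G-Mean is taken to be the geometric mean of per-class recalls, which is a common definition. The second example shows why G-Mean can fall far below accuracy on imbalanced data. A classifier that never predicts a minority class has a recall of 0 for that class, which drives the product, and therefore the G-Mean, to 0.

```python
import math
from collections import Counter


def per_class_counts(y_true, y_pred):
    """Labels plus true-positive, predicted and actual counts per class."""
    labels = sorted(set(y_true) | set(y_pred))
    tp = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return labels, tp, Counter(y_pred), Counter(y_true)


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (ASSUMED averaging scheme)."""
    labels, tp, pred, true = per_class_counts(y_true, y_pred)
    scores = []
    for c in labels:
        precision = tp[c] / pred[c] if pred[c] else 0.0
        recall = tp[c] / true[c] if true[c] else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores.append(f1)
    return sum(scores) / len(scores)


def g_mean(y_true, y_pred):
    """Geometric mean of per-class recall over the classes present in y_true."""
    labels, tp, _, true = per_class_counts(y_true, y_pred)
    recalls = [tp[c] / true[c] for c in labels if true[c]]
    return math.prod(recalls) ** (1 / len(recalls))


if __name__ == "__main__":
    y_true = [0, 0, 0, 0, 1, 1, 2, 2]
    y_pred = [0, 0, 0, 1, 1, 0, 2, 2]
    print(accuracy(y_true, y_pred))           # 0.75
    print(macro_f1(y_true, y_pred))           # 0.75
    print(round(g_mean(y_true, y_pred), 3))   # 0.721

    # Always predicting the majority class gives accuracy 0.5 but G-Mean 0.0.
    majority_only = [0] * len(y_true)
    print(accuracy(y_true, majority_only), g_mean(y_true, majority_only))
```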
License
Copyright (c) 2024 Dessy Siahaan, Anwar Fitrianto, Khairil Anwar Notodiputro
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Authors who publish with this journal agree to the following terms:
a. Authors retain copyright and grant the journal the right of first publication, with the work simultaneously licensed under a Creative Commons Attribution-ShareAlike License that allows others to share the work with an acknowledgment of the work's authorship and initial publication in this journal.
b. Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgment of its initial publication in this journal.
c. Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work.
USER RIGHTS
All articles published Open Access will be immediately and permanently free for everyone to read and download. We are continuously working with our author communities to select the best choice of license options, currently being defined for this journal as follows: