Multiple Classifier System for Handling Imbalanced and Overlapping Datasets on Multiclass Classification

Authors

  • Dessy Siahaan, IPB University
  • Anwar Fitrianto, IPB University
  • Khairil Anwar Notodiputro, IPB University

DOI:

https://doi.org/10.21512/comtech.v15i1.11295

Keywords:

Multiple Classifier System (MCS), imbalanced datasets, overlapping datasets, multiclass classification

Abstract

The performance of classification models suffers when a dataset is imbalanced and its classes overlap. Each condition is challenging on its own, and the problem becomes more complex when both occur together. In this research, an ensemble method called a Multiple Classifier System (MCS) was proposed to address these issues by combining K-Nearest Neighbour (KNN) and Logistic Regression (LR). The Synthetic Minority Oversampling Technique (SMOTE) was applied to balance the dataset, and the One Versus One (OVO) decomposition technique was used to split the multiclass problem into pairwise binary subproblems. A simulation covering 18 scenarios shows that the MCS-SMOTE model handles these problems and performs well. The model was also tested on empirical data on poverty in West Java in 2021, where the proposed method again performs well, with an accuracy of 80.09%, an F1 score of 0.782, and a G-Mean of 0.242. The areas with the highest poverty rates are Bogor, Bekasi City, Bandung City, Bekasi Regency, and Depok City, which are located near DKI Jakarta, the capital city. Based on the available predictor variables, households in West Java are more likely to be poor when they lack access to credit, have more than three members, share one building with other families, and have a head who did not graduate from elementary school.
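The abstract names the building blocks (SMOTE, OVO decomposition, an MCS of KNN and LR, and the accuracy, F1 and G-Mean metrics) but not how they are configured. The following standard-library Python sketch chains them on synthetic data purely as an illustration. Several details are assumptions, not taken from the article: SMOTE (in a simplified form with a randomly chosen base point) oversamples every class to the majority size before decomposition; each pairwise MCS combines KNN and LR by averaging their positive-class probabilities (soft voting); k = 5; LR is fitted by plain gradient descent; F1 is macro-averaged; and G-Mean is the geometric mean of per-class recalls. All function names are hypothetical.

```python
import math
import random
from collections import Counter
from itertools import combinations

def dist2(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b))

def sigmoid(z):
    """Logistic function; z is clamped so math.exp cannot overflow."""
    return 1.0 / (1.0 + math.exp(-max(-500.0, min(500.0, z))))

def smote(X, y, k=5, seed=0):
    """Simplified SMOTE: oversample every class up to the majority-class size."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        pts = [x for x, lab in zip(X, y) if lab == label]
        for _ in range(target - n):
            base = rng.choice(pts)
            # k nearest same-class neighbours of the base point, excluding the point itself
            nbrs = sorted((p for p in pts if p is not base), key=lambda p: dist2(base, p))[:k]
            nb = rng.choice(nbrs) if nbrs else base
            u = rng.random()  # random position on the segment from base to nb
            X_out.append([b + u * (q - b) for b, q in zip(base, nb)])
            y_out.append(label)
    return X_out, y_out

def fit_logreg(X, t, lr=0.1, epochs=500):
    """Binary logistic regression (t in {0, 1}) fitted by batch gradient descent."""
    d, n = len(X[0]), len(X)
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * d, 0.0
        for x, ti in zip(X, t):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) - ti
            gw = [g + err * xj for g, xj in zip(gw, x)]
            gb += err
        w = [wj - lr * g / n for wj, g in zip(w, gw)]
        b -= lr * gb / n
    return lambda x: sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

def knn_proba(X, t, x, k=5):
    """Share of the k nearest training points that belong to the positive class."""
    nearest = sorted(range(len(X)), key=lambda i: dist2(X[i], x))[:k]
    return sum(t[i] for i in nearest) / len(nearest)

def fit_pair_mcs(X, t, k=5):
    """Binary MCS (assumed rule): average the KNN and LR positive-class probabilities."""
    lr_model = fit_logreg(X, t)
    return lambda x: 0.5 * knn_proba(X, t, x, k) + 0.5 * lr_model(x)

def fit_ovo(X, y, k=5):
    """One-versus-one: train one binary MCS for every pair of classes."""
    models = {}
    for a, c in combinations(sorted(set(y)), 2):
        idx = [i for i, lab in enumerate(y) if lab in (a, c)]
        models[(a, c)] = fit_pair_mcs([X[i] for i in idx], [int(y[i] == a) for i in idx], k)
    return models

def predict_ovo(models, x):
    """Each pairwise model votes for one class; the class with the most votes wins."""
    votes = Counter(a if m(x) >= 0.5 else c for (a, c), m in models.items())
    return votes.most_common(1)[0][0]

def evaluate(y_true, y_pred):
    """Accuracy, macro-averaged F1 and G-Mean (geometric mean of per-class recall)."""
    acc = sum(a == b for a, b in zip(y_true, y_pred)) / len(y_true)
    f1s, recalls = [], []
    for c in sorted(set(y_true) | set(y_pred)):
        tp = sum(a == c and b == c for a, b in zip(y_true, y_pred))
        fp = sum(a != c and b == c for a, b in zip(y_true, y_pred))
        fn = sum(a == c and b != c for a, b in zip(y_true, y_pred))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
        if tp + fn:  # recall is only defined for classes present in y_true
            recalls.append(rec)
    return acc, sum(f1s) / len(f1s), math.prod(recalls) ** (1 / len(recalls))

if __name__ == "__main__":
    rng = random.Random(42)
    # Toy data: three imbalanced classes with nearby centres, so they overlap.
    spec = {0: (60, 0.0, 0.0), 1: (20, 1.5, 0.0), 2: (8, 0.75, 1.5)}
    X = [[rng.gauss(mx, 1.0), rng.gauss(my, 1.0)] for n, mx, my in spec.values() for _ in range(n)]
    y = [label for label, (n, mx, my) in spec.items() for _ in range(n)]
    test = sorted(range(0, len(X), 4))  # every 4th point is held out for evaluation
    train = [i for i in range(len(X)) if i % 4]
    Xb, yb = smote([X[i] for i in train], [y[i] for i in train])
    models = fit_ovo(Xb, yb)
    preds = [predict_ovo(models, X[i]) for i in test]
    acc, f1, gmean = evaluate([y[i] for i in test], preds)
    print(f"accuracy={acc:.3f}  macro F1={f1:.3f}  G-Mean={gmean:.3f}")
```

Running the script prints the three metrics for a held-out quarter of a synthetic three-class problem; the scores are reproducible because the seeds are fixed, but they say nothing about the article's West Java results. Averaging probabilities is only one way to combine the two base learners; stacking or majority voting would be equally valid choices for an MCS, and the article should be consulted for the rule actually used.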

Author Biographies

Dessy Siahaan, IPB University

Statistics and Data Science, Department of Statistics, Faculty of Mathematics and Natural Sciences

Anwar Fitrianto, IPB University

Statistics and Data Science, Department of Statistics, Faculty of Mathematics and Natural Sciences

Khairil Anwar Notodiputro, IPB University

Statistics and Data Science, Department of Statistics, Faculty of Mathematics and Natural Sciences

Published

2024-05-27

Section

Articles