Association Analysis Using Apriori Algorithm of GANs-Expanded Student Performance Dataset

Rannie M. Sumacot

doi:10.21512/comtech.v15i2.11948

Authors

Rannie M. Sumacot, Southern Leyte State University

Keywords:

association analysis, Apriori algorithm, Generative Adversarial Networks (GANs), student performance dataset

Abstract

Traditional datasets are often limited in size, which can reduce the accuracy of analyses, and the use of students' real data raises privacy concerns. Generative Adversarial Networks (GANs) offer a solution by generating synthetic data that closely mirrors real-world data without compromising sensitive information. The research explores the application of GANs to expand student performance datasets, addressing the challenges of data scarcity and privacy in educational research. GANs were used to generate synthetic student performance data, and the accuracy of the data was assessed using Mean Absolute Percentage Error (MAPE), with values ranging from 0.004% to 19.92% across various statistical measures and means. These results demonstrate the reliability of the synthetic data, making it suitable for further analysis. The synthetic datasets were then analyzed using the Apriori algorithm, a well-known data mining method for discovering significant patterns and relationships. A lower-bound minimum support of 0.1 (10%) and a minimum confidence threshold of 0.6 (60%) were applied so that only meaningful associations were identified. The analysis reveals important patterns and relationships among student attributes and behaviors. The research highlights the potential of GANs to advance data-driven educational research: by generating high-quality synthetic data, GANs allow researchers to conduct comprehensive analyses while addressing privacy concerns. The research contributes a methodological approach to data augmentation in education, offering new opportunities for ethical and robust research.
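The two computations named in the abstract, MAPE and Apriori rule mining with a minimum support of 0.1 and a minimum confidence of 0.6, can be illustrated with a minimal Python sketch. This is not the author's implementation. The function names, the toy transactions, and the attribute labels such as `prep=completed` are hypothetical and chosen only for illustration. The MAPE helper assumes paired, nonzero reference values (for example, a summary statistic computed on the real data and on the synthetic data).

```python
from itertools import combinations

def mape(real, synthetic):
    """Mean Absolute Percentage Error, in percent, over paired values.
    Assumes every value in `real` is nonzero."""
    pairs = list(zip(real, synthetic))
    return 100.0 * sum(abs(r - s) / abs(r) for r, s in pairs) / len(pairs)

def frequent_itemsets(transactions, min_support):
    """Level-wise Apriori search. Returns {itemset: support}."""
    n = len(transactions)
    items = sorted({item for t in transactions for item in t})
    current = [frozenset([item]) for item in items]
    frequent = {}
    while current:
        # Keep the candidates whose support meets the threshold.
        level = {}
        for cand in current:
            support = sum(1 for t in transactions if cand <= t) / n
            if support >= min_support:
                level[cand] = support
        frequent.update(level)
        # Join frequent k-itemsets into (k+1)-candidates and prune any
        # candidate with an infrequent k-subset (the Apriori property).
        next_level = set()
        for a, b in combinations(list(level), 2):
            union = a | b
            if len(union) == len(a) + 1 and all(
                frozenset(sub) in level for sub in combinations(union, len(a))
            ):
                next_level.add(union)
        current = list(next_level)
    return frequent

def association_rules(frequent, min_confidence):
    """Return (antecedent, consequent, support, confidence) tuples."""
    rules = []
    for itemset, support in frequent.items():
        for k in range(1, len(itemset)):
            for antecedent in map(frozenset, combinations(itemset, k)):
                confidence = support / frequent[antecedent]
                if confidence >= min_confidence:
                    rules.append(
                        (antecedent, itemset - antecedent, support, confidence)
                    )
    return rules

# Hypothetical toy transactions; not taken from the paper's dataset.
transactions = [
    frozenset({"prep=completed", "lunch=standard", "math=high"}),
    frozenset({"prep=completed", "lunch=standard", "math=high"}),
    frozenset({"prep=none", "lunch=standard", "math=high"}),
    frozenset({"prep=none", "lunch=reduced", "math=low"}),
    frozenset({"prep=none", "lunch=reduced", "math=low"}),
]

# Thresholds as stated in the abstract: support 0.1, confidence 0.6.
frequent = frequent_itemsets(transactions, min_support=0.1)
rules = association_rules(frequent, min_confidence=0.6)
for antecedent, consequent, support, confidence in rules:
    print(sorted(antecedent), "->", sorted(consequent),
          f"support={support:.2f}", f"confidence={confidence:.2f}")
```

Here support is the fraction of records containing every item in a rule, and confidence is that support divided by the support of the antecedent alone. On real data, a tested library implementation is usually preferable to a hand-written search like this one.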

Author Biography

Rannie M. Sumacot, Southern Leyte State University

Department of Public Administration, Faculty of Governance and Development Studies
