Association Analysis Using Apriori Algorithm of GANs-Expanded Student Performance Dataset
DOI:
https://doi.org/10.21512/comtech.v15i2.11948
Keywords:
association analysis, Apriori algorithm, Generative Adversarial Networks (GANs), student performance dataset
Abstract
Traditional datasets are often limited in size, which can affect the accuracy of analyses, and the use of students’ real data raises privacy concerns. Generative Adversarial Networks (GANs) offer a solution by generating synthetic data that closely mirrors real-world data without compromising sensitive information. The research explored the application of GANs to expand student performance datasets, addressing data scarcity and privacy challenges in educational research. GANs were used to generate synthetic student performance data, and the accuracy of the data was assessed using Mean Absolute Percentage Error (MAPE), with values ranging from 0.004% to 19.92% across various statistical measures and means. These results indicated that the synthetic data were reliable enough for further analysis. The synthetic datasets were then analyzed using the Apriori algorithm, a well-known data mining method for discovering frequent patterns and association rules. A lower-bound minimum support of 0.1 (10%) and a minimum confidence threshold of 0.6 (60%) were applied so that only meaningful associations were retained. The analysis revealed important patterns and relationships among student attributes and behaviors. The research highlights the potential of GANs to advance data-driven educational research: by generating high-quality synthetic data, GANs allow researchers to conduct comprehensive analyses while addressing privacy concerns. The research contributes a methodological approach to data augmentation in education, offering new opportunities for ethical and robust research.
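The two quantitative steps named in the abstract, the MAPE comparison and Apriori rule mining with a minimum support of 0.1 and a minimum confidence of 0.6, can be illustrated with a minimal Python sketch. This is not the authors' implementation. The function names (`mape`, `apriori`, `rules`) and the toy transactions are hypothetical and chosen only for illustration, and no figures from the study are reproduced.

```python
from itertools import combinations


def mape(real, synthetic):
    """Mean absolute percentage error (in percent) between paired values.

    Assumes no value in `real` is zero, since each error is divided by it.
    """
    errors = [abs((r - s) / r) for r, s in zip(real, synthetic)]
    return 100.0 * sum(errors) / len(errors)


def apriori(transactions, min_support=0.1):
    """Return {frozenset(itemset): support} for every frequent itemset."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)
    # Level 1 candidates: every single item seen in the data.
    candidates = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while candidates:
        # Keep only the candidates whose support reaches the threshold.
        level = {}
        for c in candidates:
            support = sum(1 for t in transactions if c <= t) / n
            if support >= min_support:
                level[c] = support
        frequent.update(level)
        k += 1
        # Join step: unions of frequent (k-1)-itemsets that have size k.
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        candidates = {
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent


def rules(frequent, min_confidence=0.6):
    """Return (antecedent, consequent, support, confidence) tuples."""
    found = []
    for itemset, support in frequent.items():
        for size in range(1, len(itemset)):
            for antecedent in map(frozenset, combinations(itemset, size)):
                # confidence(A -> B) = support(A and B) / support(A)
                confidence = support / frequent[antecedent]
                if confidence >= min_confidence:
                    found.append(
                        (antecedent, itemset - antecedent, support, confidence)
                    )
    return found


# Hypothetical toy transactions; the attribute names are illustrative only.
toy = [
    {"study=high", "grade=pass"},
    {"study=high", "grade=pass"},
    {"study=high", "grade=pass", "attendance=good"},
    {"study=low", "grade=fail"},
    {"study=low", "grade=pass", "attendance=good"},
]

frequent_itemsets = apriori(toy, min_support=0.1)
strong_rules = rules(frequent_itemsets, min_confidence=0.6)
```

In this sketch, `mape` would be applied to a summary statistic (for example, a column mean) computed once on the real data and once on the synthetic data. Because support is a fraction of all transactions, the 0.1 threshold keeps only itemsets that appear in at least 10% of records before any rule is formed.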
License
Copyright (c) 2024 Rannie M. Sumacot
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.