Analyzing the Effects of Combining Gradient Conflict Mitigation Methods in Multi-Task Learning
DOI: https://doi.org/10.21512/commit.v18i1.8905

Keywords: Gradient Conflict Mitigation Methods, Multi-Task Learning, Project Conflicting Gradients (PCGrad), Modulation Module, Language-Specific Subnetworks (LaSS)

Abstract
Multi-task machine learning approaches train a single model on multiple tasks at once to increase performance and efficiency over multiple single-task models trained individually on each task. When such a multi-task model is trained to perform multiple unrelated tasks, performance can degrade significantly, since unrelated tasks often have gradients that vary widely in direction. These conflicting gradients may destructively interfere with each other, causing weights learned during the training of some tasks to become unlearned during the training of others. The research selects three existing methods to mitigate this problem: Project Conflicting Gradients (PCGrad), Modulation Module, and Language-Specific Subnetworks (LaSS). It explores how applying different combinations of these methods affects the performance of a convolutional neural network on a multi-task image classification problem. The benchmark problem uses a dataset of 4,503 leaf images to create two separate tasks: the classification of plants and the detection of disease from leaf images. Experiment results on this problem show performance benefits over singular mitigation methods, with a combination of PCGrad and LaSS obtaining a task-averaged F1 score of 0.84686. This combination outperforms the individual mitigation approaches by 0.01870, 0.02682, and 0.02434 for PCGrad, Modulation Module, and LaSS, respectively, in terms of F1 score.
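The gradient-projection idea behind PCGrad can be illustrated concretely. The sketch below is a minimal NumPy rendering of the projection step from Yu et al. (reference 23), not the authors' implementation: when two task gradients conflict (negative dot product), each gradient is projected onto the normal plane of the other before the update directions are summed. The full method also shuffles the order in which the other tasks are visited and operates per shared parameter tensor; both details are omitted here for brevity.

```python
import numpy as np

def pcgrad(grads):
    """Combine per-task gradients using PCGrad-style projection.

    For each task gradient g_i, subtract the component of g_i that
    points against any other task gradient g_j (i.e., when
    g_i . g_j < 0), then sum the projected gradients into a single
    update direction.
    """
    projected = []
    for i, g in enumerate(grads):
        g = np.asarray(g, dtype=float).copy()
        for j, other in enumerate(grads):
            if i == j:
                continue
            other = np.asarray(other, dtype=float)
            dot = g @ other
            if dot < 0:
                # Conflict: remove the component of g along `other`.
                g -= (dot / (other @ other)) * other
        projected.append(g)
    return np.sum(projected, axis=0)

# Two conflicting task gradients: their dot product is negative.
g_plant = np.array([1.0, 0.0])    # e.g., plant-classification task
g_disease = np.array([-1.0, 1.0]) # e.g., disease-detection task
update = pcgrad([g_plant, g_disease])
# The combined update no longer opposes either task's gradient.
```

After projection, the combined update has a non-negative dot product with each original task gradient, which is the property that prevents one task's step from directly undoing another's learning.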
References
Y. Zhang and Q. Yang, “A survey on multi-task learning,” IEEE Transactions on Knowledge and Data Engineering, vol. 34, no. 12, pp. 5586–5609, 2021.
R. Caruana, “Multitask learning,” Machine Learning, vol. 28, pp. 41–75, 1997.
D. S. Chauhan, S. R. Dhanush, A. Ekbal, and P. Bhattacharyya, “All-in-one: A deep attentive multi-task learning framework for humour, sarcasm, offensive, motivation, and sentiment on memes,” in Proceedings of the 1st Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 10th International Joint Conference on Natural Language Processing, Suzhou, China, 2020, pp. 281–290.
V. Davoodnia and A. Etemad, “Identity and posture recognition in smart beds with deep multitask learning,” in 2019 IEEE International Conference on Systems, Man and Cybernetics (SMC). Bari, Italy: IEEE, Oct. 6–9, 2019, pp. 3054–3059.
M. Pisov, G. Makarchuk, V. Kostjuchenko, A. Dalechina, A. Golanov, and M. Belyaev, “Brain tumor image retrieval via multitask learning,” 2018. [Online]. Available: https://arxiv.org/abs/1810.09369
A. Kendall, Y. Gal, and R. Cipolla, “Multi-task learning using uncertainty to weigh losses for scene geometry and semantics,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, Utah, June 19–21, 2018, pp. 7482–7491.
X. Zhao, H. Li, X. Shen, X. Liang, and Y. Wu, “A modulation module for multi-task learning with applications in image retrieval,” in Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, Sept. 8–14, 2018, pp. 401–416.
S. Sridhar and S. Sanagavarapu, “Fake news detection and analysis using multitask learning with BiLSTM CapsNet model,” in 2021 11th International Conference on Cloud Computing, Data Science & Engineering (Confluence). Noida, India: IEEE, Jan. 28–29, 2021, pp. 905–911.
Y. Liu, W. M. Sid-Lakhdar, O. Marques, X. Zhu, C. Meng, J. W. Demmel, and X. S. Li, “GPTune: Multitask learning for autotuning exascale applications,” in Proceedings of the 26th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, Virtual Event Republic of Korea, Feb. 2021, pp. 234–246.
K. Zhang, L. Wu, Z. Zhu, and J. Deng, “A multitask learning model for traffic flow and speed forecasting,” IEEE Access, vol. 8, pp. 80,707–80,715, 2020.
J. Li, X. Shao, and R. Sun, “A DBN-based deep neural network model with multitask learning for online air quality prediction,” Journal of Control Science and Engineering, vol. 2019, pp. 1–9, 2019.
Y. Li, J. Song, W. Lu, P. Monkam, and Y. Ao, “Multitask learning for super-resolution of seismic velocity model,” IEEE Transactions on Geoscience and Remote Sensing, vol. 59, no. 9, pp. 8022–8033, 2020.
J. Zhang, K. Yan, and Y. Mo, “Multi-task learning for sentiment analysis with hard-sharing and task recognition mechanisms,” Information, vol. 12, no. 5, pp. 1–13, 2021.
F. Tao and C. Busso, “End-to-end audiovisual speech recognition system with multitask learning,” IEEE Transactions on Multimedia, vol. 23, pp. 1–11, 2020.
K. H. Thung and C. Y. Wee, “A brief review on multi-task learning,” Multimedia Tools and Applications, vol. 77, no. 22, pp. 29,705–29,725, 2018.
T. Yu, S. Kumar, A. Gupta, S. Levine, K. Hausman, and C. Finn, “Gradient surgery for multitask learning,” Advances in Neural Information Processing Systems, vol. 33, pp. 5824–5836, 2020.
Z. Lin, L. Wu, M. Wang, and L. Li, “Learning language specific sub-network for multilingual machine translation,” 2021. [Online]. Available: https://arxiv.org/abs/2105.09259
S. S. Chouhan, U. P. Singh, A. Kaul, and S. Jain, “A data repository of leaf images: Practice towards plant conservation with plant pathology,” in 2019 4th International Conference on Information Systems and Computer Networks (ISCON). Mathura, India: IEEE, Nov. 21–22, 2019, pp. 700–707.
W. C. Tseng, “WeiChengTseng/Pytorch-PCGrad,” 2020. [Online]. Available: https://github.com/WeiChengTseng/Pytorch-PCGrad.git
License
Copyright (c) 2024 Richard Alison, Welly Jonathan, Derwin Suhartono
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Authors who publish with this journal agree to the following terms:
a. Authors retain copyright and grant the journal the right of first publication, with the work simultaneously licensed under a Creative Commons Attribution-ShareAlike License that allows others to share the work with an acknowledgment of the work's authorship and its initial publication in this journal.
b. Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgment of its initial publication in this journal.
c. Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work.
USER RIGHTS
All articles published Open Access will be immediately and permanently free for everyone to read and download. We are continuously working with our author communities to select the best choice of license options, currently being defined for this journal as follows: Creative Commons Attribution-Share Alike (CC BY-SA)