End-to-End Steering Angle Prediction for Autonomous Car Using Vision Transformer


  • Ilvico Sonata Bina Nusantara University
  • Yaya Heryadi Bina Nusantara University
  • Antoni Wibowo Bina Nusantara University
  • Widodo Budiharto Bina Nusantara University




Steering Angle Prediction, Autonomous Car, Vision Transformer (ViT)


The development of autonomous cars is accelerating alongside the demand for safe and comfortable autonomous vehicles, and it relies heavily on deep learning to determine the steering angle appropriate to the road conditions the car faces. In this research, a Vision Transformer (ViT) model is proposed to predict the steering angle from images captured by a front-facing camera on an autonomous car. The model is trained on a public dataset collected on streets around Rancho Palos Verdes and San Pedro, California, comprising 45,560 images, each labeled with its steering angle value. The proposed model predicts the steering angle well, and its predictions are compared with those of existing models on the same dataset. The experimental results show that the proposed model achieves better accuracy, with an MSE of 2,991, compared to 5,358 for the CNN-based model and 4,065 for the CNN-LSTM combination model. These results indicate that the ViT model can replace the existing models, namely the CNN model and the combined CNN-LSTM model, for predicting the steering angle of an autonomous car.
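The models above are compared by the mean squared error (MSE) between predicted and ground-truth steering angles over the labeled test images. As a minimal sketch of that evaluation metric (the angle values below are hypothetical, not from the paper's dataset):

```python
def mse(predicted, actual):
    """Mean squared error between predicted and ground-truth steering angles."""
    assert len(predicted) == len(actual) and len(actual) > 0
    return sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual)

# Hypothetical per-image predictions vs. labels (steering angle values).
pred = [1.2, -0.5, 3.1]
true = [1.0, -0.8, 2.9]
print(mse(pred, true))  # lower MSE means more accurate steering predictions
```

In this comparison scheme, the model with the lowest MSE over the shared test set (here, the proposed ViT) is judged the most accurate.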


Author Biographies

Ilvico Sonata, Bina Nusantara University

Computer Science Department, BINUS Graduate Program - Doctor of Computer Science

Yaya Heryadi, Bina Nusantara University

Computer Science Department, BINUS Graduate Program - Doctor of Computer Science

Antoni Wibowo, Bina Nusantara University

Computer Science Department, BINUS Graduate Program - Doctor of Computer Science

Widodo Budiharto, Bina Nusantara University

Computer Science Department, School of Computer Science

