Convolutional Neural Network Using Kalman Filter for Human Detection and Tracking on RGB-D Video

Jovin Angelico, Ken Ratri Retno Wardani


Human detection by computer vision is still being improved in both accuracy and computation time. In low-light conditions, detection accuracy is usually low. This research therefore uses additional information besides the RGB channels: a depth map that gives each object's distance from the camera. It integrates a Cascade Classifier to localize candidate objects, a Convolutional Neural Network (CNN) to classify each candidate as human or nonhuman, and a Kalman filter to track human movement. Two RGB-D datasets with different points of view and lighting conditions are used for training and testing. Both datasets were filtered to remove images with heavy noise or occlusion so that training is more focused. With these techniques integrated, detection and tracking accuracy reaches 77.7%, and using the Kalman filter increases computation efficiency by 41%.
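
As a rough illustration of the tracking stage, the following minimal sketch runs a constant-velocity Kalman filter on the center of a detected bounding box, one filter per image axis. The state model, the noise values q and r, and the names AxisKalman and track are assumptions made for this example; the abstract does not specify how the authors configure their filter.

```python
class AxisKalman:
    """1-D constant-velocity Kalman filter; state is (position, velocity).

    Hypothetical helper for illustration only. q is an assumed process-noise
    term added to both diagonal entries of the covariance, and r is an
    assumed measurement-noise variance.
    """

    def __init__(self, pos, q=1e-2, r=1.0):
        self.p, self.v = float(pos), 0.0
        self.P = [[1.0, 0.0], [0.0, 1.0]]  # state covariance
        self.q, self.r = q, r

    def predict(self, dt=1.0):
        # x <- F x and P <- F P F^T + Q, with F = [[1, dt], [0, 1]]
        (a, b), (c, d) = self.P
        self.p += self.v * dt
        self.P = [[a + dt * (b + c) + dt * dt * d + self.q, b + dt * d],
                  [c + dt * d, d + self.q]]
        return self.p

    def update(self, z):
        # Only the position is measured, so H = [1, 0].
        (a, b), (c, d) = self.P
        s = a + self.r              # innovation variance
        k0, k1 = a / s, c / s       # Kalman gain
        y = z - self.p              # innovation
        self.p += k0 * y
        self.v += k1 * y
        self.P = [[(1 - k0) * a, (1 - k0) * b],
                  [c - k1 * a, d - k1 * b]]
        return self.p


def track(measurements):
    """Filter a sequence of (x, y) box centers; None marks a missed detection."""
    first = next(m for m in measurements if m is not None)
    kx, ky = AxisKalman(first[0]), AxisKalman(first[1])
    estimates = []
    for m in measurements:
        x, y = kx.predict(), ky.predict()
        if m is not None:
            x, y = kx.update(m[0]), ky.update(m[1])
        estimates.append((x, y))
    return estimates


if __name__ == "__main__":
    # Synthetic target moving 2 px per frame in x, with frame 30 undetected.
    centers = [(10.0 + 2.0 * t, 50.0) for t in range(40)]
    centers[30] = None
    for t, (x, y) in enumerate(track(centers)):
        print(t, round(x, 2), round(y, 2))
```

On frames where the detector returns nothing (None here), the sketch falls back on the predicted position, which is one common way a tracker can bridge missed detections. Whether the paper uses the filter this way is not stated in the abstract.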


Convolutional Neural Network, Human Detection, Tracking, RGB-D, Kalman filter









This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.