Deep Learning for Crowd Counting: A Survey

Authors

  • Tjeng Wawan Cenggoro, Bina Nusantara University

DOI:

https://doi.org/10.21512/emacsjournal.v1i1.5794

Keywords:

Deep learning, computer vision, crowd counting

Abstract

Deep learning for crowd counting has grown immensely in recent years, producing numerous and highly diverse models. This paper aims to capture the big picture of existing deep learning models for crowd counting, so that the development of novel models in future work can be accelerated.

Published

2019-09-04

Section

Articles