Deep Learning for Crowd Counting: A Survey
DOI: https://doi.org/10.21512/emacsjournal.v1i1.5794
Keywords: Deep learning, computer vision, crowd counting
Abstract
Deep learning for crowd counting has grown rapidly in recent years, producing a large number of highly varied models. This paper aims to give a big-picture overview of existing deep learning models for crowd counting, so that the development of novel models in future work can be accelerated.
License
Authors who publish with this journal agree to the following terms:
a. Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under a Creative Commons Attribution License - Share Alike that allows others to share the work with an acknowledgment of the work's authorship and initial publication in this journal.
b. Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgment of its initial publication in this journal.
c. Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work.
USER RIGHTS
All articles published Open Access will be immediately and permanently free for everyone to read and download. We are continuously working with our author communities to select the best choice of license options, currently being defined for this journal as follows: Creative Commons Attribution-Share Alike (CC BY-SA)