Adversarial learning for multiscale crowd counting under complex scenes

Yuan Zhou, Jianxing Yang, Hongru Li, Tao Cao, Sun Yuan Kung

Research output: Contribution to journal › Article › peer-review

25 Scopus citations


In this article, a multiscale generative adversarial network (MS-GAN) is proposed for generating high-quality crowd density maps for scenes of arbitrary crowd density. Crowd counting poses many challenges, such as severe occlusion in extremely dense crowd scenes, perspective distortion, and high visual similarity between pedestrians and background elements. To address these problems, the proposed MS-GAN combines a multiscale convolutional neural network (the generator) with an adversarial network (the discriminator) to generate a high-quality density map and accurately estimate the crowd count in complex crowd scenes. The multiscale generator fuses features from multiple hierarchical layers to detect people across large variations in scale. The density map produced by the multiscale generator is then passed to a discriminator network trained on a binary classification task: distinguishing poor-quality density maps from real ground-truth ones. The additional adversarial loss improves the quality of the density map, which is critical for accurately estimating the crowd count. Experiments were conducted on multiple datasets covering different crowd scenes and densities. The results showed that the proposed method outperformed current state-of-the-art methods.
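The way the adversarial term enters training can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the abstract only states that an adversarial loss is added, so the mean-squared-error pixel term, the weight `adv_weight`, and all function names below are assumptions made for the example. The one standard fact used is that the estimated count is the sum of the density map.

```python
import numpy as np

def crowd_count(density_map):
    """Estimated count is the sum (discrete integral) of the density map."""
    return float(np.sum(density_map))

def pixel_loss(pred, target):
    """Assumed pixel-wise term: mean squared error between density maps."""
    return float(np.mean((pred - target) ** 2))

def bce(prob, label, eps=1e-7):
    """Binary cross-entropy for discriminator outputs in (0, 1)."""
    prob = np.clip(prob, eps, 1.0 - eps)
    return float(-np.mean(label * np.log(prob) + (1 - label) * np.log(1 - prob)))

def generator_loss(pred, target, d_prob_on_pred, adv_weight=0.01):
    """Pixel term plus a weighted adversarial term (adv_weight is hypothetical).

    The generator is rewarded when the discriminator labels its map as real (1).
    """
    return pixel_loss(pred, target) + adv_weight * bce(d_prob_on_pred, 1.0)

def discriminator_loss(d_prob_on_real, d_prob_on_pred):
    """Binary classification: ground-truth maps are 1, generated maps are 0."""
    return bce(d_prob_on_real, 1.0) + bce(d_prob_on_pred, 0.0)

# Toy ground truth: two people, each contributing unit mass to the map.
target = np.zeros((4, 4))
target[1, 1] = 1.0
target[2, 3] = 1.0
```

In this sketch, `crowd_count(target)` recovers the two annotated people, and a discriminator that cannot tell real from generated maps (probability 0.5 on both) incurs a loss of 2 ln 2.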

Original language: English (US)
Pages (from-to): 5423-5432
Number of pages: 10
Journal: IEEE Transactions on Cybernetics
Issue number: 11
State: Published - Nov 1 2021

All Science Journal Classification (ASJC) codes

  • Software
  • Information Systems
  • Human-Computer Interaction
  • Electrical and Electronic Engineering
  • Control and Systems Engineering
  • Computer Science Applications


Keywords

  • Adversarial learning
  • crowd counting
  • density map
  • multiscale generator

