Generating realistic human images has become highly valuable in recent times due to its varied applications in Robotics, Computer Graphics, Movie Making, and Games. Advancements in Artificial Intelligence (AI) and Machine Learning (ML) have driven the rapid integration of these techniques across such domains. Many deep learning models, such as Variational Auto-Encoders (VAEs), Stacked Hourglass networks, and Generative Adversarial Networks (GANs), have been applied to human-image generation. However, these models still struggle to generalize qualitatively to the real-world person-image generation task. In
this paper, we develop a multi-stage model based on Conditional GANs that synthesizes a new image of a person in a desired target (novel) pose, given an image of that person, in real-world scenarios. The model uses 2D keypoints to represent human poses. We
propose a Multi-stage Person Generation (MPG) model, in which we modify the generator architecture of Pose-Guided Person Image Generation (PG²), resulting in two approaches. The first, a three-stage person generation approach, integrates an additional generator into the base architecture and trains the model end-to-end. The second, a two-stage person generation approach, introduces a novel texture feature block in Stage-II and is trained incrementally to improve the quality of the generated human images. The proposed two-stage MPG approach produces promising results on the Market-1501 dataset. These claims are supported
by benchmarking the proposed models against recent state-of-the-art models. We also show how multi-stage conditional GAN architectures influence the process of human-image synthesis.