A central challenge in human-image generation is synthesizing a person with accurate pose and clothing
details. The task remains difficult because of cluttered backgrounds and large appearance variance.
Recently, various deep learning models such as Stacked Hourglass networks, Variational Autoencoders (VAEs),
and Generative Adversarial Networks (GANs) have been applied to this problem. However, they still do not
generalize well, qualitatively, to real-world human-image generation. Our main goal is to use the
Spectral Normalization (SN) technique when training a GAN to synthesize human images with accurate
pose and appearance details of the person. In this paper, we investigate how conditional GANs, combined
with SN, can synthesize a new image of a person given a source image of that person and a desired
target (novel) pose. The model uses 2D keypoints to represent human poses. We also
use an adversarial hinge loss and present an ablation study. The proposed model variants generate
promising results on both the Market-1501 and DeepFashion datasets. We support our claims by benchmarking
the proposed model against recent state-of-the-art models. Finally, we show how SN influences the process
of human-image synthesis.
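
For readers unfamiliar with the two training ingredients named above, the following is a minimal sketch, not the implementation used in this work. It assumes the standard formulations: spectral normalization divides a weight matrix by an estimate of its largest singular value obtained by power iteration, and the hinge loss penalizes discriminator scores below +1 on real images and above -1 on generated ones. All function names (spectral_normalize, d_hinge_loss, g_hinge_loss) and the iteration count are hypothetical, chosen for illustration only.

```python
import math
import random


def matvec(m, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, x)) for row in m]


def normalize(x, eps=1e-12):
    """Scale a vector to (approximately) unit Euclidean length."""
    n = math.sqrt(sum(a * a for a in x)) + eps
    return [a / n for a in x]


def spectral_normalize(weight, n_iters=50, seed=0):
    """Return (weight / sigma, sigma), with sigma estimated by power iteration.

    Illustrative assumption: a real layer would keep u between training
    steps and run a single iteration per step instead of n_iters.
    """
    rng = random.Random(seed)
    weight_t = [list(col) for col in zip(*weight)]
    u = normalize([rng.gauss(0.0, 1.0) for _ in weight])
    for _ in range(n_iters):
        v = normalize(matvec(weight_t, u))  # right singular vector estimate
        u = normalize(matvec(weight, v))    # left singular vector estimate
    sigma = sum(a * b for a, b in zip(u, matvec(weight, v)))
    return [[a / sigma for a in row] for row in weight], sigma


def d_hinge_loss(real_scores, fake_scores):
    """Discriminator hinge loss: mean(max(0, 1 - D(x))) + mean(max(0, 1 + D(G(z))))."""
    real_term = sum(max(0.0, 1.0 - s) for s in real_scores) / len(real_scores)
    fake_term = sum(max(0.0, 1.0 + s) for s in fake_scores) / len(fake_scores)
    return real_term + fake_term


def g_hinge_loss(fake_scores):
    """Generator hinge loss: the negated mean discriminator score on generated images."""
    return -sum(fake_scores) / len(fake_scores)
```

Under these assumptions, normalizing every discriminator layer in this way bounds the layer's Lipschitz constant by roughly one, which is the usual motivation for pairing SN with the hinge loss.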