Text to Image Synthesis

Authors

  • Chaitanya Ghadling  Student, Department of Computer Engineering, Dr. D. Y. Patil School of Engineering, Lohegaon, Pune, Maharashtra, India
  • Firosh Vasudevan  Student, Department of Computer Engineering, Dr. D. Y. Patil School of Engineering, Lohegaon, Pune, Maharashtra, India
  • Ruchin Dhama  Student, Department of Computer Engineering, Dr. D. Y. Patil School of Engineering, Lohegaon, Pune, Maharashtra, India
  • Shreya Lad  Student, Department of Computer Engineering, Dr. D. Y. Patil School of Engineering, Lohegaon, Pune, Maharashtra, India
  • Sunil Rathod  Professor, Department of Computer Engineering, Dr. D. Y. Patil School of Engineering, Lohegaon, Pune, Maharashtra, India

Keywords:

GAN, AI, ML, Deep Learning, AttnGAN

Abstract

One of the most difficult things for current Artificial Intelligence and Machine Learning systems to replicate is human creativity and imagination. Humans can form mental images of objects from nothing more than a general description of those objects. In recent years, with the evolution of GANs (Generative Adversarial Networks) and their growing reputation for being able to partially replicate this kind of creativity, research on generating high-quality images from text descriptions has advanced tremendously. In this paper, we explore a recently developed GAN architecture, the Attentional Generative Adversarial Network (AttnGAN), which generates plausible images of birds from detailed text descriptions with visual realism and semantic accuracy.
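The key idea behind AttnGAN's attention mechanism can be illustrated with a toy sketch: each image region attends over the word embeddings of the description, producing a word-context vector that tells the generator which words matter for that region. The sketch below is a simplified, pure-Python illustration under assumed toy dimensions, not the paper's actual implementation (which operates on learned deep features).

```python
import math

def dot(u, v):
    # Inner product of two feature vectors.
    return sum(a * b for a, b in zip(u, v))

def softmax(xs):
    # Numerically stable softmax over a list of scores.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attend(regions, words):
    """For each image-region feature, compute a word-context vector:
    an attention-weighted sum of word embeddings (AttnGAN-style, toy scale)."""
    contexts = []
    for h in regions:
        scores = [dot(h, w) for w in words]        # region-word similarity
        alphas = softmax(scores)                   # attention weights
        ctx = [sum(a * w[i] for a, w in zip(alphas, words))
               for i in range(len(words[0]))]      # weighted sum of words
        contexts.append(ctx)
    return contexts

# Toy example: 3 word embeddings and 2 region features, 2-dimensional.
words = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
regions = [[2.0, 0.0], [0.0, 2.0]]
ctx = attend(regions, words)
```

Each region ends up with a context vector dominated by the words most similar to it; in the full model these context vectors condition the next generator stage, so different sub-regions of the image are drawn according to different words of the sentence.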

References

  1. AttnGAN: Fine-grained Text to Image Generation with Attentional Generative Adversarial Networks.
  2. StackGAN++: Realistic Image Synthesis with Stacked Generative Adversarial Networks.
  3. Generative Adversarial Text to Image Synthesis.
  4. MirrorGAN: Learning Text to Image Generation by Redescription.
  5. Learn, Imagine, and Create: Text to Image Generation from Prior Knowledge.
  6. H. Zhang, T. Xu, H. Li, S. Zhang, X. Wang, X. Huang, and D. Metaxas. StackGAN: Text to photo-realistic image synthesis with stacked generative adversarial networks. In ICCV, 2017.
  7. A. Agrawal, J. Lu, S. Antol, M. Mitchell, C. L. Zitnick, D. Parikh, and D. Batra. VQA: visual question answering. IJCV, 123(1):4–31, 2017.
  8. D. Bahdanau, K. Cho, and Y. Bengio. Neural machine translation by jointly learning to align and translate. arXiv:1409.0473, 2014.
  9. E. L. Denton, S. Chintala, A. Szlam, and R. Fergus. Deep generative image models using a laplacian pyramid of adversarial networks. In NIPS, 2015.
  10. T. Salimans, I. J. Goodfellow, W. Zaremba, V. Cheung, A. Radford, and X. Chen. Improved techniques for training GANs. In NIPS, 2016.

Published

2021-06-30

Section

Research Articles

How to Cite

[1]
Chaitanya Ghadling, Firosh Vasudevan, Ruchin Dhama, Shreya Lad, Sunil Rathod, "Text to Image Synthesis", International Journal of Scientific Research in Computer Science, Engineering and Information Technology (IJSRCSEIT), ISSN: 2456-3307, Volume 8, Issue 3, pp. 307-313, May-June 2021.