Text to Image Synthesis

Authors

  • Chaitanya Ghadling  Student, Department of Computer Engineering, Dr. D. Y. Patil School of Engineering, Lohegaon, Pune, Maharashtra, India
  • Firosh Vasudevan  Student, Department of Computer Engineering, Dr. D. Y. Patil School of Engineering, Lohegaon, Pune, Maharashtra, India
  • Ruchin Dhama  Student, Department of Computer Engineering, Dr. D. Y. Patil School of Engineering, Lohegaon, Pune, Maharashtra, India
  • Shreya Lad  Student, Department of Computer Engineering, Dr. D. Y. Patil School of Engineering, Lohegaon, Pune, Maharashtra, India
  • Sunil Rathod  Professor, Department of Computer Engineering, Dr. D. Y. Patil School of Engineering, Lohegaon, Pune, Maharashtra, India

Keywords:

GAN, AI, ML, Deep Learning, AttnGAN

Abstract

One of the most difficult things for current Artificial Intelligence and Machine Learning systems to replicate is human creativity and imagination. Humans can form mental images of objects from nothing more than a general description of those objects. In recent years, with the evolution of GANs (Generative Adversarial Networks) and their growing reputation for being able to partially replicate this kind of creativity, research on generating high-quality images from text descriptions has advanced tremendously. In this paper, we explore a recently developed GAN architecture, the Attentional Generative Adversarial Network (AttnGAN), which generates plausible images of birds from detailed text descriptions with visual realism and semantic accuracy.
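The key idea behind AttnGAN's attention mechanism can be illustrated with a toy sketch: each image region attends over the word embeddings of the description, producing a word-context vector that tells the generator which words matter for that region. The sketch below is a simplified, pure-Python illustration under assumed toy dimensions, not the paper's actual implementation (which operates on learned deep features).

```python
import math

def dot(u, v):
    # Inner product of two feature vectors.
    return sum(a * b for a, b in zip(u, v))

def softmax(xs):
    # Numerically stable softmax over a list of scores.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attend(regions, words):
    """For each image-region feature, compute a word-context vector:
    an attention-weighted sum of word embeddings (AttnGAN-style, toy scale)."""
    contexts = []
    for h in regions:
        scores = [dot(h, w) for w in words]        # region-word similarity
        alphas = softmax(scores)                   # attention weights
        ctx = [sum(a * w[i] for a, w in zip(alphas, words))
               for i in range(len(words[0]))]      # weighted sum of words
        contexts.append(ctx)
    return contexts

# Toy example: 3 word embeddings and 2 region features, 2-dimensional.
words = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
regions = [[2.0, 0.0], [0.0, 2.0]]
ctx = attend(regions, words)
```

Each region ends up with a context vector dominated by the words most similar to it; in the full model these context vectors condition the next generator stage, so different sub-regions of the image are drawn according to different words of the sentence.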

References

  1. AttnGAN: Fine-grained Text to Image Generation with Attentional Generative Adversarial Networks.
  2. StackGAN++: Realistic Image Synthesis with Stacked Generative Adversarial Networks.
  3. Generative Adversarial Text to Image Synthesis.
  4. MirrorGAN: Learning Text to Image Generation by Redescription.
  5. Learn, Imagine, and Create: Text to Image Generation from Prior Knowledge.
  6. H. Zhang, T. Xu, H. Li, S. Zhang, X. Wang, X. Huang, and D. Metaxas. StackGAN: Text to photo-realistic image synthesis with stacked generative adversarial networks. In ICCV, 2017.
  7. A. Agrawal, J. Lu, S. Antol, M. Mitchell, C. L. Zitnick, D. Parikh, and D. Batra. VQA: visual question answering. IJCV, 123(1):4–31, 2017.
  8. D. Bahdanau, K. Cho, and Y. Bengio. Neural machine translation by jointly learning to align and translate. arXiv:1409.0473, 2014.
  9. E. L. Denton, S. Chintala, A. Szlam, and R. Fergus. Deep generative image models using a laplacian pyramid of adversarial networks. In NIPS, 2015.
  10. T. Salimans, I. J. Goodfellow, W. Zaremba, V. Cheung, A. Radford, and X. Chen. Improved techniques for training GANs. In NIPS, 2016.

Published

2021-06-30

Section

Research Articles

How to Cite

[1]
Chaitanya Ghadling, Firosh Vasudevan, Ruchin Dhama, Shreya Lad, Sunil Rathod, "Text to Image Synthesis", International Journal of Scientific Research in Computer Science, Engineering and Information Technology (IJSRCSEIT), ISSN: 2456-3307, Volume 8, Issue 3, pp. 307-313, May-June 2021.