A Survey Report on Text-to-Image Generation Using Stable Diffusion
Keywords:
Text-to-Image Generation, Stable Diffusion, CLIP ViT-L/14, Iterative Refinement, Photorealistic Images, Image Synthesis, Textual Conditioning, Diverse Dataset, Convergence, Creative Expression, Visual Realism.

Abstract
In recent years, advances in artificial intelligence have led to remarkable progress in generating realistic images from textual descriptions. This project introduces "Stable Diffusion", a text-to-image synthesis model that achieves photorealistic image generation through an iterative refinement process. Trained on a large and diverse dataset of images, the model employs a frozen CLIP ViT-L/14 text encoder to condition image synthesis on textual cues. Stable Diffusion takes a stepwise approach: starting from a random noise image, it gradually denoises the image while aligning it with the given text prompt. This iterative process continues until convergence, yielding high-quality images that faithfully represent the text description. The model demonstrates its capabilities across a broad spectrum of subjects, from evocative portraits of people and detailed depictions of animals to sprawling landscapes and abstract artistic expressions, capturing the intricate essence of textual descriptions and yielding images that extend beyond mere representation.
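As a concrete illustration of this text-conditioned denoising loop, the minimal sketch below generates an image with the Hugging Face diffusers library. The library, the CompVis/stable-diffusion-v1-4 checkpoint, and the prompt are assumptions made here for illustration; the survey does not prescribe an implementation. Stable Diffusion v1 checkpoints bundle the frozen CLIP ViT-L/14 text encoder described above, and num_inference_steps controls how many iterative refinement steps the sampler takes.

```python
import torch
from diffusers import StableDiffusionPipeline

# Assumed checkpoint for illustration; Stable Diffusion v1 pipelines
# include the frozen CLIP ViT-L/14 text encoder mentioned in the abstract.
pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4",
    torch_dtype=torch.float16,
)
pipe = pipe.to("cuda")  # fall back to "cpu" (with float32) if no GPU

prompt = "a photorealistic portrait of an elderly fisherman at dawn"

# The pipeline encodes the prompt with CLIP, samples a random Gaussian
# latent, and denoises it step by step, steering each step toward the
# text condition via classifier-free guidance.
image = pipe(
    prompt,
    num_inference_steps=50,  # number of iterative refinement steps
    guidance_scale=7.5,      # strength of alignment with the prompt
).images[0]

image.save("fisherman.png")
```

Raising num_inference_steps trades compute for fidelity, mirroring the "iterative process until convergence" described above, while guidance_scale trades sample diversity for faithfulness to the prompt.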
References
- Scott Reed, Zeynep Akata, Xinchen Yan, Lajanugun Logeswaran, Bernt Schiele, Honglak Lee (2016). Generative Adversarial Text-to-Image Synthesis.
- Han Zhang, Tao Xu, Hongsheng Li, Shaoting Zhang, Xiaogang Wang, Xiaolei Huang, Dimitris Metaxas (2017). StackGAN: Text to Photo-realistic Image Synthesis with Stacked Generative Adversarial Networks.
- Aditya Ramesh, Mikhail Pavlov, Gabriel Goh, Scott Gray, Chelsea Voss, Alec Radford, Mark Chen, Ilya Sutskever (2021). Zero-Shot Text-to-Image Generation (DALL-E).
- Tao Xu, Pengchuan Zhang, Qiuyuan Huang, Han Zhang, Zhe Gan, Xiaolei Huang, Xiaodong He (2018). AttnGAN: Fine-Grained Text to Image Generation with Attentional Generative Adversarial Networks.
- Sumanth Dathathri, Andrea Madotto, Janice Lan, Jane Hung, Eric Frank, Piero Molino, Jason Yosinski, Rosanne Liu (2020). Plug and Play Language Models: A Simple Approach to Controlled Text Generation.
- Patrick Esser, Robin Rombach, Björn Ommer (2021). Taming Transformers for High-Resolution Image Synthesis.
- Weihao Xia, Yujiu Yang, Jing-Hao Xue, Baoyuan Wu (2021). TediGAN: Text-Guided Diverse Face Image Generation and Manipulation.
- Taesung Park, Ming-Yu Liu, Ting-Chun Wang, Jun-Yan Zhu (2019). Semantic Image Synthesis with Spatially-Adaptive Normalization.
- Samuel R. Bowman, Luke Vilnis, Oriol Vinyals (2018). Adversarial Generation of Natural Language.
- Jaeyoon Yoo, Jangho Kim, Hyunwoo Kim, Sungwoong Kim (2016). An InceptionV3-based Conditional GAN for Semantic Image Synthesis.
- P. Parlewar, V. Jagtap, U. Pujeri, M. M. S. Kulkarni, S. T. Shirkande, A. Tripathi (2023). An Efficient Low-Loss Data Transmission Model for Noisy Networks. International Journal of Intelligent Systems and Applications in Engineering, 11(9s), 267–276.
License
Copyright (c) IJSRCSEIT. This work is licensed under a Creative Commons Attribution 4.0 International License.