Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding

@article{Saharia2022PhotorealisticTD,
  title={Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding},
  author={Chitwan Saharia and William Chan and Saurabh Saxena and Lala Li and Jay Whang and Emily L. Denton and Seyed Kamyar Seyed Ghasemipour and Burcu Karagol Ayan and Seyedeh Sara Mahdavi and Raphael Gontijo Lopes and Tim Salimans and Jonathan Ho and David J. Fleet and Mohammad Norouzi},
  journal={ArXiv},
  year={2022},
  volume={abs/2205.11487},
  url={https://api.semanticscholar.org/CorpusID:248986576}
}
This work presents Imagen, a text-to-image diffusion model with an unprecedented degree of photorealism and a deep level of language understanding, and finds that human raters prefer Imagen over other models in side-by-side comparisons, both in terms of sample quality and image-text alignment.

Unleashing Text-to-Image Diffusion Models for Visual Perception

It is shown that vision-language pre-trained diffusion models can be adapted more quickly to downstream visual perception tasks using the proposed VPD, a new framework that exploits the semantic information of a pre-trained text-to-image diffusion model for visual perception.

RenAIssance: A Survey Into AI Text-to-Image Generation in the Era of Large Model

It is argued that TTI development could yield impressive productivity improvements for content creation, particularly in the context of the AIGC era, and could be extended to more complex tasks such as video generation and 3D generation.

Paragraph-to-Image Generation with Information-Enriched Diffusion Model

An information-enriched diffusion model for the paragraph-to-image generation task, termed ParaDiffusion, is introduced; it explores transferring the extensive semantic comprehension capabilities of large language models to image generation.

Plug-and-Play Diffusion Features for Text-Driven Image-to-Image Translation

A new framework is presented that takes text-to-image synthesis to the realm of image-to-image translation, where features extracted from the guidance image are directly injected into the generation process of the translated image, requiring no training or fine-tuning.

Improving Compositional Text-to-image Generation with Large Vision-Language Models

The proposed methodology significantly improves text-image alignment in compositional image generation, particularly with respect to object number, attribute binding, spatial relationships, and aesthetic quality.

Image-dev: An Advance Text to Image AI model

Image-dev is a text-to-image model that blends a TF-IDF (Term Frequency-Inverse Document Frequency) model with a preposition model to evaluate relations between data objects and produce conflict-category images.
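As a point of reference for the TF-IDF weighting mentioned above, the sketch below computes standard smoothed tf-idf scores for a tiny caption corpus in plain Python. It is illustrative only; the corpus, smoothing, and function names are assumptions, not taken from the Image-dev paper.

```python
import math
from collections import Counter

def tfidf(docs):
    """Compute TF-IDF weights for a small corpus of tokenized captions.

    Standard tf * smoothed-idf scoring; illustrative, not the exact
    formulation used by Image-dev.
    """
    n_docs = len(docs)
    # Document frequency: in how many captions does each term appear?
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log((1 + n_docs) / (1 + df[term]))
            for term, count in tf.items()
        })
    return weights

captions = [
    "a dog under the table".split(),
    "a cat on the table".split(),
]
for w in tfidf(captions):
    print({t: round(s, 3) for t, s in w.items()})
```

Terms shared by every caption (e.g. "table") get zero weight, so the surviving weights emphasize the object and preposition terms that distinguish one caption from another.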

Scaling Autoregressive Models for Content-Rich Text-to-Image Generation

The Pathways Autoregressive Text-to-Image (Parti) model is presented, which generates high-fidelity photorealistic images and supports content-rich synthesis involving complex compositions and world knowledge; the work also explores and highlights limitations of the models.

Kandinsky: an Improved Text-to-Image Synthesis with Image Prior and Latent Diffusion

Kandinsky, a novel exploration of latent diffusion architecture that combines the principles of image prior models with latent diffusion techniques, is presented; it ranks as the top open-source performer in terms of measurable image generation quality.

Swinv2-Imagen: hierarchical vision transformer diffusion models for text-to-image generation

The Swinv2-Imagen model is proposed: a novel text-to-image diffusion model based on a Hierarchical Visual Transformer and a Scene Graph incorporating a semantic layout, which outperforms several popular state-of-the-art methods.

UPainting: Unified Text-to-Image Diffusion Generation with Cross-modal Guidance

UPainting is proposed, which combines the power of a large-scale Transformer language model in understanding language with an image-text matching model in capturing cross-modal semantics and style, and greatly outperforms other models in terms of caption similarity and image fidelity in both simple and complex scenes.
...

Palette: Image-to-Image Diffusion Models

A unified framework for image-to-image translation based on conditional diffusion models is developed and it is shown that a generalist, multi-task diffusion model performs as well or better than task-specific specialist counterparts.

Hierarchical Text-Conditional Image Generation with CLIP Latents

It is shown that explicitly generating image representations improves image diversity with minimal loss in photorealism and caption similarity, and the joint embedding space of CLIP enables language-guided image manipulations in a zero-shot fashion.
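The zero-shot, language-guided manipulation mentioned above can be pictured as moving an image's CLIP embedding along the direction between two caption embeddings and then decoding the shifted embedding. The sketch below illustrates that idea under stated assumptions; the placeholder text encoder, the `strength` parameter, and the function names are not the paper's actual prior or decoder.

```python
import numpy as np

def text_direction(clip_text, source_prompt, target_prompt):
    """Unit direction in CLIP space from a source caption to a target caption."""
    d = clip_text(target_prompt) - clip_text(source_prompt)
    return d / np.linalg.norm(d)

def edit_image_embedding(image_emb, direction, strength=0.5):
    """Shift a CLIP image embedding along a text-defined direction and re-normalize."""
    e = image_emb + strength * direction
    return e / np.linalg.norm(e)

# Placeholder "CLIP text encoder" so the sketch runs; a real system would use
# the actual CLIP model and then decode the edited embedding back to pixels.
rng = np.random.default_rng(0)
fake_clip_text = lambda prompt: rng.standard_normal(512)

direction = text_direction(fake_clip_text,
                           "a photo of a cat",
                           "a photo of a cat wearing sunglasses")
edited = edit_image_embedding(rng.standard_normal(512), direction)
print(edited.shape)
```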

Towards Language-Free Training for Text-to-Image Generation

The first work to train text-to-image generation models without any text data is proposed; it leverages the well-aligned multi-modal semantic space of the powerful pre-trained CLIP model, so that the requirement of text conditioning is seamlessly alleviated by generating text features from image features.
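A minimal numpy sketch of the trick described above: because CLIP image and text embeddings share one space, a noise-perturbed, re-normalized image embedding can stand in for the missing caption embedding during training. The perturbation scale and the function name are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def pseudo_text_feature(image_embedding, noise_scale=0.1, rng=None):
    """Turn a CLIP image embedding into a stand-in for a text embedding.

    Assumption for illustration: perturb the normalized image embedding
    with Gaussian noise and re-normalize, yielding a nearby point in
    CLIP's joint image-text space.
    """
    rng = np.random.default_rng() if rng is None else rng
    e = image_embedding / np.linalg.norm(image_embedding)
    e = e + noise_scale * rng.standard_normal(e.shape)
    return e / np.linalg.norm(e)

# During "language-free" training, this pseudo text feature conditions the
# generator in place of a real caption embedding.
img_emb = np.random.default_rng(0).standard_normal(512)  # placeholder CLIP image feature
cond = pseudo_text_feature(img_emb)
print(cond.shape, np.linalg.norm(cond))
```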

Improving Text-to-Image Synthesis Using Contrastive Learning

Experimental results have shown that the contrastive learning approach can effectively improve the quality and enhance the semantic consistency of synthetic images in terms of three metrics: IS, FID and R-precision.

DiffusionCLIP: Text-guided Image Manipulation Using Diffusion Models

A novel method, DiffusionCLIP, is presented, which performs text-driven image manipulation with diffusion models using a Contrastive Language-Image Pre-training (CLIP) loss and achieves performance comparable to that of modern GAN-based image processing methods on both in-domain and out-of-domain image processing tasks.

CoCa: Contrastive Captioners are Image-Text Foundation Models

Contrastive Captioner (CoCa) is a minimalist design that pretrains an image-text encoder-decoder foundation model jointly with a contrastive loss and a captioning loss, thereby subsuming model capabilities from contrastive approaches like CLIP and generative methods like SimVLM.

DM-GAN: Dynamic Memory Generative Adversarial Networks for Text-To-Image Synthesis

The proposed DM-GAN model introduces a dynamic memory module to refine fuzzy image contents when the initial images are not well generated, and it performs favorably against state-of-the-art approaches.

Make-A-Scene: Scene-Based Text-to-Image Generation with Human Priors

This work proposes a novel text-to-image method that addresses gaps in applicability and quality by enabling a simple control mechanism complementary to text in the form of a scene, and introducing elements that substantially improve the tokenization process by employing domain-specific knowledge over key image regions.

GLIDE: Towards Photorealistic Image Generation and Editing with Text-Guided Diffusion Models

This work explores diffusion models for the problem of text-conditional image synthesis and compares two different guidance strategies: CLIP guidance and classifier-free guidance, finding that the latter is preferred by human evaluators for both photorealism and caption similarity, and often produces photorealistic samples.
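The classifier-free guidance referred to above can be written in a few lines: the denoiser's noise prediction is evaluated with and without the text condition, and the two are extrapolated by a guidance scale. The sketch below is a generic illustration with a toy stand-in network, not GLIDE's actual model or API.

```python
import numpy as np

def classifier_free_guidance(denoise, x_t, t, text_emb, guidance_scale=3.0):
    """Combine conditional and unconditional noise predictions.

    denoise(x_t, t, cond) is assumed to return the predicted noise;
    passing cond=None selects the unconditional (empty-prompt) branch.
    """
    eps_cond = denoise(x_t, t, text_emb)
    eps_uncond = denoise(x_t, t, None)
    # Extrapolate away from the unconditional prediction toward the
    # text-conditional one; larger scales trade diversity for alignment.
    return eps_uncond + guidance_scale * (eps_cond - eps_uncond)

# Toy stand-in for a denoising network, just to make the sketch runnable.
def toy_denoiser(x_t, t, cond):
    shift = 0.0 if cond is None else 0.1 * np.mean(cond)
    return 0.5 * x_t + shift

x_t = np.random.default_rng(0).standard_normal((4, 4))
eps = classifier_free_guidance(toy_denoiser, x_t, t=10, text_emb=np.ones(8))
print(eps.shape)
```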

Cross-Modal Contrastive Learning for Text-to-Image Generation

The Cross-Modal Contrastive Generative Adversarial Network (XMC-GAN) addresses the challenge of text-to-image synthesis systems by maximizing the mutual information between image and text via multiple contrastive losses which capture inter-modality and intra-modality correspondences.
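The inter-modality part of such an objective is typically an InfoNCE-style contrastive loss; the sketch below shows that loss for a batch of image and sentence embeddings. The batch size, embedding dimension, and temperature are illustrative assumptions, and this is a generic formulation rather than XMC-GAN's exact implementation.

```python
import numpy as np

def info_nce(image_emb, text_emb, temperature=0.1):
    """Symmetric image-text contrastive loss over a batch.

    Matching pairs sit on the diagonal of the similarity matrix; all
    other entries in the same row or column act as negatives.
    """
    img = image_emb / np.linalg.norm(image_emb, axis=1, keepdims=True)
    txt = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    logits = img @ txt.T / temperature  # (B, B) scaled cosine similarities
    log_softmax_rows = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    log_softmax_cols = logits - np.log(np.exp(logits).sum(axis=0, keepdims=True))
    diag = np.arange(len(logits))
    # Average the image-to-text and text-to-image directions.
    return -0.5 * (log_softmax_rows[diag, diag].mean()
                   + log_softmax_cols[diag, diag].mean())

rng = np.random.default_rng(0)
print(info_nce(rng.standard_normal((8, 64)), rng.standard_normal((8, 64))))
```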
...