Imagen (text-to-image model)

{{Short description|Image-generating machine learning model}}

{{Infobox software

| name = Imagen

| logo =

| logo caption =

| screenshot = Illuminated Valley in the Afternoon (Imagen 4.0).webp

| screenshot size = 250px

| caption = An image generated with Imagen 4. Partial prompt: Softly illuminated afternoon valley with meandering river

| author =

| developer = Google DeepMind

| released = {{Start date and age|2022|5}}

| latest release version = Imagen 4

| latest release date = {{start date and age|df=y|2025|5|20}}

| repo =

| programming language =

| operating system =

| genre = Text-to-image model

| license =

| website = {{URL|https://deepmind.google/models/imagen/|Imagen website}}

}}

{{Artificial intelligence}}

Imagen is a series of text-to-image models developed by Google DeepMind. Earlier versions were developed by Google Brain until it merged with DeepMind in April 2023.{{Cite web |last1=Roth |first1=Emma |last2=Peters |first2=Jay |date=April 20, 2023 |title=Google's big AI push will combine Brain and DeepMind into one team |url=https://www.theverge.com/2023/4/20/23691468/google-ai-deepmind-brain-merger |url-status=live |archive-url=https://web.archive.org/web/20230420234052/https://www.theverge.com/2023/4/20/23691468/google-ai-deepmind-brain-merger |archive-date=April 20, 2023 |access-date=March 18, 2025 |website=The Verge}} Like Stability AI's Stable Diffusion, OpenAI's DALL-E, and Midjourney, Imagen is primarily used to generate images from text prompts.

The original version of the model was described in a paper published in May 2022.{{cite arXiv |eprint=2205.11487 |last1=Saharia |first1=Chitwan |last2=Chan |first2=William |last3=Saxena |first3=Saurabh |last4=Li |first4=Lala |last5=Whang |first5=Jay |last6=Denton |first6=Emily |author7=Seyed Kamyar Seyed Ghasemipour |author8=Burcu Karagol Ayan |last9=Mahdavi |first9=S. Sara |author10=Rapha Gontijo Lopes |last11=Salimans |first11=Tim |last12=Ho |first12=Jonathan |author13=David J Fleet |last14=Norouzi |first14=Mohammad |title=Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding |date=2022 |class=cs.CV }} Imagen is available to anyone with a Google account through services including Gemini, ImageFX, and Vertex AI.

History

The original version of Imagen was presented in a paper published in May 2022 and could generate high-fidelity images from natural-language descriptions. The second version, Imagen 2, was released in December 2023.{{Cite web |date=2025-03-12 |title=Imagen 2 - our most advanced text-to-image technology |url=https://deepmind.google/technologies/imagen-2/ |access-date=2025-03-18 |website=Google DeepMind |language=en}} Its most prominent new feature was the ability to render text and logos.{{Cite web |last=Wiggers |first=Kyle |date=2023-12-13 |title=Google debuts Imagen 2 with text and logo generation |url=https://techcrunch.com/2023/12/13/google-debuts-imagen-2-with-text-and-logo-generation/ |access-date=2025-03-18 |website=TechCrunch |language=en-US}} Imagen 3 was released in August 2024.{{Cite news |last=Schoon |first=Ben |date=2024-08-16 |title=Google opens access to Imagen 3, its latest model for AI image generation |url=https://9to5google.com/2024/08/16/google-imagen-3-launch/ |archive-url=http://web.archive.org/web/20240818012446/https://9to5google.com/2024/08/16/google-imagen-3-launch/ |archive-date=2024-08-18 |access-date=2025-03-18 |work=9to5Google |language=en-US}} Google claimed that Imagen 3 produced images with improved detail and lighting.{{Cite web |author1=Christian Rowlands |date=2025-02-26 |title=Some of the most realistic AI images you'll see were created with this free tool |url=https://www.techradar.com/computing/artificial-intelligence/what-is-imagen-3-everything-you-need-to-know-about-googles-text-to-image-model |access-date=2025-03-18 |website=TechRadar |language=en}} On 20 May 2025, at Google I/O 2025, Google released an improved model, Imagen 4.{{Cite web |author1=Kyle Wiggers |date=2025-05-20 |title=Imagen 4 is Google’s newest AI image generator |url=https://techcrunch.com/2025/05/20/imagen-4-is-googles-newest-ai-image-generator/ |access-date=2025-03-18 |website=techcrunch.com |language=en}}

Technology

Imagen combines two main components. The first is a transformer-based large language model, notably T5, whose encoder converts the text prompt into embeddings that condition image synthesis. The second is a cascade of diffusion models that generates high-fidelity images in three stages: a base model produces a 64×64 image, which super-resolution models then upsample to 256×256 and then to 1024×1024.
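The three-stage cascade can be illustrated with a minimal Python sketch. It is not Imagen's implementation: all function names are hypothetical, the text embedding is assumed to be a precomputed list of floats standing in for the output of a text encoder such as T5, and the diffusion models are replaced by trivial placeholders (a constant fill for the base stage and nearest-neighbour upsampling for the super-resolution stages). Only the data flow is meant to match: a low-resolution base image is generated from the text, then each later stage takes the previous stage's output and the same text embedding and produces a larger image.

```python
from typing import List, Sequence

# Toy grayscale image: a list of rows, each a list of pixel values.
Image = List[List[float]]

# Output side length (pixels) of each stage: base model, then two super-resolution stages.
STAGE_RESOLUTIONS = (64, 256, 1024)


def toy_base_model(text_embedding: Sequence[float], side: int) -> Image:
    """Hypothetical placeholder for the text-conditioned base diffusion model.

    A real model would iteratively denoise random noise under text guidance;
    this stand-in just fills the image with the embedding's mean value.
    """
    value = sum(text_embedding) / len(text_embedding)
    return [[value] * side for _ in range(side)]


def toy_super_resolution(image: Image, text_embedding: Sequence[float], side: int) -> Image:
    """Hypothetical placeholder for a text-conditioned super-resolution diffusion model.

    A real model would denoise conditioned on the low-resolution image and the
    text embedding; this stand-in ignores the embedding and performs
    nearest-neighbour upsampling only.
    """
    factor = side // len(image)  # 4x per stage for 64 -> 256 -> 1024
    return [[image[r // factor][c // factor] for c in range(side)] for r in range(side)]


def generate(text_embedding: Sequence[float],
             resolutions: Sequence[int] = STAGE_RESOLUTIONS) -> Image:
    """Run the cascade: base image first, then successive upsampling stages."""
    image = toy_base_model(text_embedding, resolutions[0])
    for side in resolutions[1:]:
        image = toy_super_resolution(image, text_embedding, side)
    return image


if __name__ == "__main__":
    result = generate([0.25, 0.75])
    print(len(result), len(result[0]))  # 1024 1024
```

In a real system each placeholder would be a separately trained diffusion model. The sketch only shows why the final output is 1024×1024: each super-resolution stage enlarges the previous image by a factor of four per side.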

Capabilities

Imagen can generate photorealistic images from text prompts.{{Cite web |last=Peterson |first=Jake |date=2024-08-16 |title=Anyone With a Google Account Can Try Google's Latest AI Image Generator Right Now |url=https://lifehacker.com/tech/you-can-try-googles-latest-ai-image-generator-right-now |access-date=2025-03-18 |website=Lifehacker |language=en}} It can also produce images in a variety of styles, such as cinematic, 35 mm film, illustration, and surreal. The model supports five aspect ratios: 9:16, 3:4, 1:1, 4:3, and 16:9. Users can also refine a generated image by editing the text prompt used to create it.

See also

References