OpenAI launched a new capability this week that allows users to generate images directly from the ChatGPT chat interface. The new feature, called Images in ChatGPT, is now available to all users — including those using the free version. According to the company, the feature enables image creation during a conversation with the AI, without the need to switch to a separate tool.
This new capability is based on GPT-4o, a multimodal model capable of understanding and generating text, images, audio, and video. According to Gabriel Goh, Head of Research at OpenAI, this represents a significant leap compared to the previous generation, especially in the model’s ability to maintain consistency between objects, colors, and features — a capability known as “binding.”
For example, most existing image generators struggle to produce an image containing multiple objects with different colors and shapes without mixing up the instructions. According to Goh, the new system can accurately generate images with 15–20 different objects without confusing their details.
One of the most notable improvements is the ability to render readable and meaningful text within images — an area where existing tools like DALL-E or Midjourney often fall short, producing incorrect or nonsensical text. Goh explained that the team spent many months improving this feature, and now most images can include legible and usable text, except for especially small fonts.
The new system uses a different technique from most image generators: instead of creating the entire image at once, it works autoregressively, much like writing, from left to right and top to bottom. This may contribute to its improved accuracy and its ability to understand complex contexts. During demonstrations for journalists, the images shown included an accurate illustration of Newton’s prism experiment, posters with error-free text, comics with consistent characters, and stickers with transparent backgrounds for graphic use.
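The left-to-right, top-to-bottom generation described above can be illustrated with a toy sketch. OpenAI has not published its implementation, so everything here is an assumption made for illustration: the names `toy_next_token` and `generate_raster` are hypothetical, the "image" is a small grid of integer tokens, and a simple random rule stands in for a trained model. The only point it shows is the ordering, in which each cell is chosen after, and conditioned on, all the cells before it.

```python
import random


def toy_next_token(context, vocab_size, rng):
    """Hypothetical stand-in for a learned model.

    Picks the next token given every token generated so far.
    A real system would use a trained neural network here.
    """
    if not context:
        return rng.randrange(vocab_size)
    # Favor repeating the previous token so the dependence
    # on earlier output is visible in the result.
    if rng.random() < 0.7:
        return context[-1]
    return rng.randrange(vocab_size)


def generate_raster(height, width, vocab_size=4, seed=0):
    """Fill a height x width grid one cell at a time in raster order."""
    rng = random.Random(seed)
    tokens = []  # flat sequence: row by row, left to right
    for _row in range(height):
        for _col in range(width):
            tokens.append(toy_next_token(tokens, vocab_size, rng))
    # Reshape the flat sequence back into rows.
    return [tokens[r * width:(r + 1) * width] for r in range(height)]


if __name__ == "__main__":
    for row in generate_raster(3, 5):
        print(row)
```

Because each cell depends on everything generated before it, the cells cannot all be produced at once, which is consistent with the slightly longer generation times the company mentions.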
According to Jackie Shannon, Head of Multimodal Products at OpenAI, the system “brings with it the accumulated knowledge of the world,” so when a user requests an image of Newton’s prism experiment, there is no need to explain what it is: the model already knows. Shannon added that although the system takes slightly longer to generate images than existing tools, the quality and accuracy make up for the extra wait. “The quality, global knowledge, and capabilities: they’re worth waiting a few more seconds,” she said.
Alongside the launch, OpenAI representatives were asked about the safety mechanisms built into the system, in light of past scandals involving sexual deepfakes or fake images of public figures created with other tools. According to the company, blocking mechanisms have been implemented to prevent the removal of watermarks and the creation of pornographic, violent, or illegal content. Although the images carry no visible indicator that they were AI-generated, they contain hidden information (C2PA metadata) identifying their origin, and OpenAI retains internal tools to track generated images.
Finally, the company emphasizes that the images created belong to the user and can be used subject to the terms of service. Shannon concluded: “No system is perfect, but we are constantly improving the safeguards. This is just the first step.”
We at Maariv had some fun with the new feature and asked it to generate an image of Israel’s first Prime Minister, David Ben-Gurion, standing next to the Azrieli Towers.