Meet DALL-E, the AI that draws anything at your command

Images provided by OpenAI and generated by DALL-E, a neural network, in response to a command for “a teapot in the shape of an avocado”. (Photos: NYTimes)
SAN FRANCISCO, United States — At OpenAI, one of the world’s most ambitious artificial intelligence (AI) labs, researchers are building technology that lets you create digital images simply by describing what you want to see.

They call it DALL-E in a nod to both “WALL-E,” the 2008 animated movie about an autonomous robot, and Salvador Dalí, the surrealist painter.

OpenAI, backed by $1 billion in funding from Microsoft, is not yet sharing the technology with the general public. But on a recent afternoon, Alex Nichol, one of the researchers behind the system, demonstrated how it works.

When he asked for “a teapot in the shape of an avocado,” typing those words into a largely empty computer screen, the system created 10 distinct images of a dark green avocado teapot, some with pits and some without.

“DALL-E is good at avocados,” Nichol said.

A team of seven researchers spent two years developing the technology, which OpenAI plans to eventually offer as a tool for people like graphic artists, providing new shortcuts and new ideas as they create and edit digital images. Computer programmers already use Copilot, a tool based on similar technology from OpenAI, to generate snippets of software code.

However, for many experts, DALL-E is worrisome. As this kind of technology continues to improve, they say, it could help spread disinformation across the internet, feeding the kind of online campaigns that may have helped sway the 2016 US presidential election.

“You could use it for good things, but certainly you could use it for all sorts of other crazy, worrying applications, and that includes deepfakes,” like misleading photos and videos, said Subbarao Kambhampati, a professor of computer science at Arizona State University.

A half-decade ago, the world’s leading AI labs built systems that could identify objects in digital images and even generate images on their own, including flowers, dogs, cars, and faces. A few years later, they built systems that could do much the same with written language, summarizing articles, answering questions, generating tweets and even writing blog posts.


An image provided by OpenAI and generated by DALL-E, a neural network, in response to a command for “cats playing chess”.

Now researchers are combining those technologies to create new forms of AI. DALL-E is a notable step forward because it juggles both language and images and, in some cases, grasps the relationship between the two.

“We can now use multiple, intersecting streams of information to create better and better technology,” said Oren Etzioni, CEO of the Allen Institute for Artificial Intelligence, an artificial intelligence lab in Seattle.

The technology is not perfect. When Nichol asked DALL-E to “put the Eiffel Tower on the moon,” it did not quite grasp the idea. It put the moon in the sky above the tower. When he asked for “a living room filled with sand,” it produced a scene that looked more like a construction site than a living room.

But when Nichol tweaked his requests a little, adding or subtracting a few words here or there, it provided what he wanted. When he asked for “a piano in a living room filled with sand,” the image looked more like a beach in a living room.

DALL-E is what artificial intelligence researchers call a neural network, which is a mathematical system loosely modeled on the network of neurons in the brain. That is the same technology that recognizes the commands spoken into smartphones and identifies the presence of pedestrians as self-driving cars navigate city streets.


An image provided by OpenAI and generated by DALL-E, a neural network, in response to a command for “a living room filled with sand, sand on the floor, piano in the room”. 

A neural network learns skills by analyzing large amounts of data. By pinpointing patterns in thousands of avocado photos, for example, it can learn to recognize an avocado. DALL-E looks for patterns as it analyzes millions of digital images as well as text captions that describe what each image depicts. In this way, it learns to recognize the links between the images and the words.

When someone describes an image for DALL-E, it generates a set of key features that this image might include. One feature might be the line at the edge of a trumpet. Another might be the curve at the top of a teddy bear’s ear.

Then, a second neural network, called a diffusion model, creates the image and generates the pixels needed to realize these features. The latest version of DALL-E, unveiled Wednesday with a new research paper describing the system, generates high-resolution images that in many cases look like photos.

Although DALL-E often fails to understand what someone has described and sometimes mangles the image it produces, OpenAI continues to improve the technology. Researchers can often refine the skills of a neural network by feeding it even larger amounts of data.

They can also build more powerful systems by applying the same concepts to new types of data. The Allen Institute recently created a system that can analyze audio as well as imagery and text. After analyzing millions of YouTube videos, including audio tracks and captions, it learned to identify particular moments in TV shows or movies, like a barking dog or a shutting door.

Experts believe that researchers will continue to hone such systems. Ultimately, those systems could help companies improve search engines, digital assistants, and other common technologies as well as automate new tasks for graphic artists, programmers and other professionals.

However, there are caveats to that potential. The AI systems can show bias against women and people of color, in part because they learn their skills from enormous pools of online text, images, and other data that show bias. They could be used to generate pornography, hate speech, and other offensive material. And many experts believe the technology will eventually make it so easy to create disinformation that people will have to be skeptical of nearly everything they see online.

“We can forge text. We can put text into someone’s voice. And we can forge images and videos,” Etzioni said. “There is already disinformation online, but the worry is that this scales disinformation to new levels.”

OpenAI is keeping a tight leash on DALL-E. It does not let outsiders use the system on their own. It puts a watermark in the corner of each image it generates. And though the lab plans on opening the system to testers this week, the group will be small.


Jordan News