OpenAI is letting some users try a new ChatGPT feature that uses its artificial intelligence to operate a web browser in order to book trips, buy groceries, hunt for bargains, and do many other online chores.

The new tool, called Operator, is an AI agent: it relies on an AI model trained on both text and images to interpret commands and figure out how to use a web browser to execute them. OpenAI claims it has the potential to automate many day-to-day tasks and workday errands.

OpenAI’s Operator follows rival releases by both Google and Anthropic, which have already demonstrated agents capable of using the web. AI agents are widely seen as the next evolutionary stage for AI following chatbots, and many companies have already hopped on the hype train by touting them. In most cases, these agents are very limited in their abilities and simply use a language model to automate things normally done with regular software.

“AI is evolving from this tool that could answer your questions to one that is also able to take action in the world, carrying out complex, multi-step workflows,” says Peter Welinder, VP of product at OpenAI. “We’ll see a lot of impact on people’s productivity—but also the quality of work that people are able to accomplish.”

OpenAI admits that giving ChatGPT access to a web browser does introduce new risks, and it says that Operator may sometimes misbehave. It says it has implemented various new safeguards and plans to extend Operator’s capabilities gradually.

Welinder and Yash Kumar, product and engineering lead for OpenAI’s Computer-Using Agent, say the plan is to learn from how people use the tool. They acknowledge that the tool could make unwanted bookings or purchases but add that a lot of work has gone into ensuring that it asks before doing anything risky. “It will come back to me and ask for confirmations before taking steps that might be irreversible,” Kumar says.

OpenAI today also released a new “system card” outlining the problems that might arise with Operator. These include the potential for it to misunderstand commands or diverge from what a user asks; to be misused by users; or to be targeted by cybercriminals.

“It also poses an incredible amount of safety challenges,” Kumar says. “Because your attack vector area and your risk vector area increase quite significantly.”

Operator will initially be available as a “research preview” for ChatGPT users with a Pro account, which costs a hefty $200 per month. The company says it plans to expand access while rolling the tool out slowly because it will inevitably make some mistakes along the way.

In several demonstrations, Operator showed the potential for AI to take on a more active role as a web helper. The tool features a remote web browser and a chat window for communicating with a user.

At WIRED’s request, Operator was asked to book an Amtrak train trip from New Haven to Washington, DC. It went to the right website, entered the necessary information correctly to bring up the timetable, and then asked for further instruction. If a user were logged into the Amtrak website, or into a browser profile with stored credit card information, Operator would be able to go ahead and book a ticket, although it is designed to ask for permission first.

Kumar asked Operator to book a table at Beretta, a restaurant in San Francisco. The program went to the OpenTable website, found the correct restaurant and looked up availability before asking what to do next. OpenAI says it has partnered with a number of popular sites, including OpenTable, to ensure that Operator works smoothly on them.

The new tool is based on OpenAI’s GPT-4o AI model, which can perceive a browser and web page and converse in typed text. The tool incorporates additional training designed to help it understand how to execute tasks online. OpenAI will also make its Computer-Using Agent available through its API.

By admin