OpenAI Operator: ChatGPT’s Browser "Agent"
Overview
In January 2025, OpenAI announced Operator, a research preview of an autonomous agent in ChatGPT that can carry out tasks on the web using its own browser. Although it acts autonomously, effective prompting remains critical to its performance.
Operator is powered by a new model called CUA (Computer-Using Agent), which combines GPT-4o’s vision capabilities with advanced reasoning trained through reinforcement learning (RL). At launch, it is available only to Pro users in the US at https://operator.chatgpt.com/. OpenAI plans to expand access to Plus, Team, and Enterprise users in the future.
The AI Agents Landscape
Other major players in the AI agent ecosystem include:
- Anthropic: Its computer use capability, launched in public beta with the upgraded Claude 3.5 Sonnet (often called Claude Computer Use), is available through the API. However, adoption has been limited because it is not directly accessible to consumers and mainly targets developers and early adopters.
- Google DeepMind: Project Mariner is a research prototype of a web-browsing agent built on Gemini 2.0.
Performance
OpenAI reports that CUA outperforms competitors such as Claude Computer Use and Mariner on several benchmarks:
- WebArena Benchmark: CUA achieved a 58.1% success rate.
- WebVoyager Benchmark: CUA achieved an 87% success rate.
- OSWorld Benchmark: CUA achieved a 38.1% success rate.
Benchmark Notes
- WebArena and WebVoyager evaluate web-browsing agents on realistic tasks performed in a browser: WebArena uses self-hosted replicas of real websites, while WebVoyager uses live websites.
- OSWorld evaluates models’ ability to control full operating systems like Ubuntu, macOS, and Windows.
![](openai_cua_evals_1.png)
![](openai_cua_evals_2.png)
Initial Reviews
Early user feedback (source) highlights several features and limitations:
Key Features:
- Access to a dedicated browser with real-time monitoring and manual control options.
- Workflow-saving capabilities for repetitive tasks.
- Ability to handle complex tasks lasting up to 20 minutes.
- A sleeker user interface compared to Anthropic’s Claude Computer Use.
Limitations:
- Restricted access to certain websites like YouTube.
- Better suited for task execution (e.g., TaskRabbit-like scenarios) than research assistance.
Miscellaneous:
- Refuses to visit platforms such as 4chan for controversial tasks (source).
Security
Simon Willison’s notes provide insights into OpenAI’s approach to mitigating prompt injection risks and other security concerns.
Use Cases and Demos
Rowan Cheung showcased impressive use cases and results, illustrating Operator’s strengths and limitations.
How Operator Works
Operator runs on CUA, a new model whose name stands for Computer-Using Agent. Interestingly, the model reportedly appeared in a leak under the codename "Orion".
If you’re interested in the technical detail, refer to OpenAI’s separate announcement covering that new model and the research behind it.
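OpenAI’s CUA announcement describes the model as working in an iterative loop: it looks at a screenshot, reasons about the next step, and then acts through a virtual mouse and keyboard, repeating until the task is complete or user input is needed. The Python sketch below illustrates only that general loop structure. `Action`, `FakeBrowser`, `toy_policy`, and `run_agent` are hypothetical names invented for illustration; the "screenshot" is a text summary rather than pixels, and the reasoning step is a fixed script standing in for the model. None of this is OpenAI’s API or implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Action:
    """One step chosen by the (hypothetical) policy."""
    kind: str         # "type", "click", "done", or "ask_user"
    target: str = ""  # text to type or label of the element to click


@dataclass
class FakeBrowser:
    """Toy stand-in for a browser; records what the agent typed and clicked."""
    typed: list = field(default_factory=list)
    clicked: list = field(default_factory=list)

    def screenshot(self) -> str:
        # A real computer-using agent sees raw pixels; this toy returns a text summary.
        return f"typed={self.typed} clicked={self.clicked}"

    def execute(self, action: Action) -> None:
        # Stand-in for virtual keyboard and mouse input.
        if action.kind == "type":
            self.typed.append(action.target)
        elif action.kind == "click":
            self.clicked.append(action.target)


def toy_policy(task: str, observation: str, history: list) -> Action:
    """Hypothetical stand-in for the model's reasoning step.

    A real model would condition on the observation; this stub ignores it
    and follows a fixed script indexed by how many steps have been taken.
    """
    script = [Action("type", task), Action("click", "Search"), Action("done")]
    return script[min(len(history), len(script) - 1)]


def run_agent(task: str, browser: FakeBrowser, max_steps: int = 10) -> list:
    """Perceive -> reason -> act until the policy finishes or asks for the user."""
    history = []
    for _ in range(max_steps):
        observation = browser.screenshot()                # perception
        action = toy_policy(task, observation, history)  # reasoning (stubbed)
        history.append((observation, action))
        if action.kind in ("done", "ask_user"):           # stop, or hand control back
            break
        browser.execute(action)                           # action
    return history


if __name__ == "__main__":
    browser = FakeBrowser()
    steps = run_agent("cheap flights to Lisbon", browser)
    print(len(steps), browser.typed, browser.clicked)
    # 3 ['cheap flights to Lisbon'] ['Search']
```

The `max_steps` cap is a simple guard against a policy that never finishes. In this sketch, returning `ask_user` is where a real agent would pause and hand control back to the person, for example before entering login details.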
Analysis of CUA’s Strengths and Weaknesses
(Coming soon)
Opinion
While Operator is not AGI, it represents significant progress toward more autonomous systems. However, accessibility remains a concern, with Pro access priced at $200 per month, sparking criticism that Operator caters only to “AGI-rich” users.
Watching the AI work through a task step by step can feel cumbersome, and privacy and security risks remain a top concern for many users. Simon Willison’s critique of the term "agents" as overly vague (source) resonates, as does skepticism that Operator is overhyped.
Why It Matters
Anthropic, Google, OpenAI, and Microsoft are all heavily invested in AI agents, signaling that this area will shape the future of AI development.
The Future
As Andrej Karpathy (tweet) notes:
Projects like Operator are to the digital world as humanoid robots are to the physical world.
Community and Open Source
With Operator currently gated behind the $200/month Pro plan, open-source alternatives are gaining traction. Notable efforts include ByteDance’s Apache 2.0-licensed agent model with built-in reasoning, which its authors report outperforms GPT-4o and Claude at computer use (source).
Closing Thoughts
The 2025-2035 period is likely to mark a decade of new paradigms in human-computer collaboration. Personally, I’m bullish about the potential for growth and innovation in this space.