Google integrates its "Computer Use" capability directly into Gemini 3.5 Flash, enabling the model to see and operate screens and browsers. The model scores 78.4 on the OSWorld benchmark, just behind GPT-5.5 at 78.7.
Google has integrated "Computer Use" directly into Gemini 3.5 Flash, enabling the model to see, understand, and interact with computers, browsers, and mobile devices autonomously. This capability was previously available only as a separate Gemini 2.5 model.
Combined with existing tools such as function calling, Search, and Maps, Computer Use lets developers build agents that work across browser, mobile, and desktop environments. These agents can automate tasks such as software testing and office workflows.
On the OSWorld benchmark, Gemini 3.5 Flash scores 78.4, outperforming Gemini 3 Flash at 65.1 and GPT-5.4 mini at 72.1. GPT-5.5 ranks slightly higher at 78.7, while Anthropic's Opus 4.8 leads the benchmark at 83.4. Sonnet 4.6 matches Gemini 3.5 Flash at 78.4, and Gemini 3.1 Pro scores 76.2.
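To make the gaps between these results easier to read, the short Python sketch below ranks the scores quoted above and prints each model's distance from Gemini 3.5 Flash in points. The numbers are taken from this article only; the `gap_to` helper is an illustrative name, not part of any benchmark tooling.

```python
# OSWorld scores exactly as quoted in the article.
scores = {
    "Opus 4.8": 83.4,
    "GPT-5.5": 78.7,
    "Gemini 3.5 Flash": 78.4,
    "Sonnet 4.6": 78.4,
    "Gemini 3.1 Pro": 76.2,
    "GPT-5.4 mini": 72.1,
    "Gemini 3 Flash": 65.1,
}


def gap_to(model: str, reference: str = "Gemini 3.5 Flash") -> float:
    """Score difference in points; positive means `model` is ahead of `reference`."""
    # Round to one decimal to hide floating-point noise in the subtraction.
    return round(scores[model] - scores[reference], 1)


# Highest score first.
ranking = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

for name, score in ranking:
    print(f"{name:<18} {score:5.1f}  {gap_to(name):+5.1f}")
```

Read this way, Gemini 3.5 Flash gains 13.3 points over Gemini 3 Flash, trails GPT-5.5 by 0.3, and sits 5.0 points behind Opus 4.8.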
To defend against prompt injection attacks, Google employs adversarial training alongside two optional enterprise safeguards. The first requires user confirmation for sensitive or irreversible actions, while the second automatically halts tasks when indirect prompt injections are detected. Google also recommends implementing sandboxing, human oversight, and strict access controls, with further guidance available in its best practices documentation.
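The two safeguards described above can be pictured as a gate that sits between the model's proposed action and its execution. The following Python sketch is only an illustration of that idea under stated assumptions: the `Action` type, the `SENSITIVE_KINDS` set, the `injection_suspected` flag, and the `gate` and `run` functions are all hypothetical names and do not reflect Google's API or implementation.

```python
from dataclasses import dataclass
from typing import Callable, Iterable

# Hypothetical set of action kinds treated as sensitive or irreversible.
SENSITIVE_KINDS = {"purchase", "delete", "send_email", "submit_form"}


@dataclass
class Action:
    kind: str                          # e.g. "click", "type", "delete"
    target: str                        # UI element or resource acted on
    injection_suspected: bool = False  # assumed output of some injection detector


class TaskHalted(Exception):
    """Raised when a suspected indirect prompt injection stops the task."""


def gate(action: Action, confirm: Callable[[Action], bool]) -> bool:
    """Return True if the action may run, False if the user declined it."""
    if action.injection_suspected:
        # Safeguard 2: halt the whole task instead of continuing.
        raise TaskHalted(f"suspected injection near {action.target!r}")
    if action.kind in SENSITIVE_KINDS:
        # Safeguard 1: ask the user before sensitive or irreversible steps.
        return confirm(action)
    return True


def run(actions: Iterable[Action],
        execute: Callable[[Action], None],
        confirm: Callable[[Action], bool]) -> list[str]:
    """Apply the gate to each proposed action and record what happened."""
    log: list[str] = []
    for action in actions:
        try:
            allowed = gate(action, confirm)
        except TaskHalted:
            log.append(f"halted:{action.kind}")
            break  # nothing after a halt is executed
        if allowed:
            execute(action)
            log.append(f"done:{action.kind}")
        else:
            log.append(f"skipped:{action.kind}")
    return log
```

In a real deployment, `confirm` would prompt a human and `execute` would drive a sandboxed browser or device, in line with the sandboxing and access-control recommendations above.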
The Computer Use feature is available through the Gemini API and the Gemini Enterprise Agent Platform. Google has published a Browserbase demo and a GitHub reference implementation for developers.