While AI models can interface with software through structured APIs, many digital tasks still require direct interaction with graphical user interfaces, such as filling in and submitting forms. To complete these tasks, agents must navigate web pages and applications just as humans do: by clicking, typing, and scrolling. The ability to natively fill out forms, manipulate interactive elements like dropdowns and filters, and operate behind logins is a crucial next step in building powerful, general-purpose agents.
We're introducing the Gemini 2.5 Computer Use model, our new specialized model built on Gemini 2.5 Pro's visual understanding and reasoning capabilities, which powers agents capable of interacting with user interfaces (UIs). It outperforms leading alternatives on multiple web and mobile control benchmarks, all with lower latency. Developers can access these capabilities via the Gemini API in Google AI Studio and Vertex AI.
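For orientation, here is a minimal sketch of the kind of agent loop such a model drives: capture a screenshot, ask the model for the next UI action, execute it, and repeat. It uses the google-genai Python SDK, but the model id, the computer-use tool configuration, and the `take_screenshot` / `execute_ui_action` helpers are assumptions for illustration, not the confirmed API surface; consult the Gemini API documentation for the actual setup.

```python
# Hypothetical sketch of a computer-use agent loop.
# Model id, tool config, and helper functions are illustrative assumptions.
from google import genai
from google.genai import types

client = genai.Client()  # reads the API key from the environment

MODEL = "gemini-2.5-computer-use-preview"  # assumed model id


def take_screenshot() -> bytes:
    """Capture the current browser viewport as PNG bytes (stub)."""
    raise NotImplementedError


def execute_ui_action(action: types.FunctionCall) -> None:
    """Perform the click/type/scroll the model requested (stub)."""
    raise NotImplementedError


def run_agent(task: str, max_steps: int = 20) -> None:
    for _ in range(max_steps):
        screenshot = take_screenshot()
        response = client.models.generate_content(
            model=MODEL,
            contents=[
                task,
                types.Part.from_bytes(data=screenshot, mime_type="image/png"),
            ],
            # Assumed: a computer-use tool is enabled via the request config.
            config=types.GenerateContentConfig(
                tools=[
                    types.Tool(
                        computer_use=types.ComputerUse(
                            environment=types.Environment.ENVIRONMENT_BROWSER
                        )
                    )
                ],
            ),
        )
        calls = response.function_calls or []
        if not calls:
            break  # the model replied with text rather than an action; task may be done
        for call in calls:
            execute_ui_action(call)  # e.g. a click, keystroke, or scroll
```

In practice the executed action's result (a fresh screenshot, plus any URL or error state) is sent back to the model on the next turn, so the loop closes the perceive-act cycle described above.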
