
NVIDIA is enabling vision AI agents at the edge with tools that help developers generate synthetic training data, build


Slicast · July 3, 2026 · US · Source: NVIDIA Blog

Vision AI agents are becoming a practical way to automatically turn video data from the physical world into operational intelligence in factories, cities, warehouses and transportation systems. That shift is accelerating as more AI workloads move closer to where data is generated. Gartner projects that more than two-thirds of enterprise-managed data will be created and processed outside the data center or cloud by 2028, and that over two-thirds of all enterprises globally will deploy edge AI by 2029, up from 10% in 2025.

But more edge data doesn't automatically create more intelligence. As much as 90% of existing edge data goes unprocessed, according to Gartner. Turning that data into useful action requires vision AI agents that can understand video, adapt to real-world conditions and connect insights to operational workflows. These agents often run near cameras, machines and sensors, where models must meet latency, power, cost and connectivity requirements while adapting to site-specific conditions. To build those agents, developers need repeatable ways to generate training data, fine-tune models and deploy agentic video applications across edge and cloud environments.

NVIDIA Metropolis agent skills and blueprints give developers reusable workflows to build, operate and optimize vision AI agents across that lifecycle. For the simulation and synthetic data side of that work, Universal Scene Description (OpenUSD) provides a common framework for describing, composing and reusing 3D worlds. Built on OpenUSD, NVIDIA Omniverse libraries help teams build simulation, synthetic data generation and digital twin workflows that model real-world environments and expand scenario coverage across conditions such as lighting, weather, traffic patterns, camera angles, occlusion and rare events.
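To make "expanding scenario coverage" more concrete, here is a minimal Python sketch of how a synthetic data pipeline might enumerate and sample combinations of scene conditions before rendering. The condition axes, their values, and the `Scenario`, `full_grid` and `sample_scenarios` names are hypothetical illustrations, not an Omniverse or OpenUSD API; a real pipeline would still have to map each scenario onto scene parameters such as lights, cameras and assets.

```python
import itertools
import random
from dataclasses import dataclass, replace

# Hypothetical condition axes and values, chosen for illustration only.
CONDITIONS: dict[str, list] = {
    "lighting": ["dawn", "noon", "dusk", "night"],
    "weather": ["clear", "rain", "fog"],
    "camera_angle_deg": [15, 30, 45],
    "occlusion": ["none", "partial"],
}


@dataclass(frozen=True)
class Scenario:
    """One combination of scene conditions to be rendered as synthetic data."""
    lighting: str
    weather: str
    camera_angle_deg: int
    occlusion: str
    rare_event: bool = False


def full_grid() -> list[Scenario]:
    """Enumerate every combination of the condition axes (exhaustive coverage)."""
    keys = list(CONDITIONS)
    return [
        Scenario(**dict(zip(keys, values)))
        for values in itertools.product(*(CONDITIONS[k] for k in keys))
    ]


def sample_scenarios(n: int, rare_event_rate: float, seed: int = 0) -> list[Scenario]:
    """Draw n scenarios uniformly from the grid, flagging rare events at a fixed rate."""
    rng = random.Random(seed)
    grid = full_grid()
    return [
        replace(rng.choice(grid), rare_event=rng.random() < rare_event_rate)
        for _ in range(n)
    ]


if __name__ == "__main__":
    print(len(full_grid()))  # 72
    for scenario in sample_scenarios(3, rare_event_rate=0.1):
        print(scenario)
```

Exhaustive enumeration grows multiplicatively with every new axis (here 4 × 3 × 3 × 2 = 72 combinations), so a pipeline might reasonably combine a full grid for common conditions with random sampling and an explicit rare-event rate.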

In manufacturing, the more successful a factory is at preventing defects, the harder it becomes to collect enough defect examples to train the next inspection model. Roboflow is integrating the NVIDIA Defect Image Generation skill and NVIDIA Cosmos world foundation models into its vision AI platform so that customers such as Corning can generate synthetic defect images when real training data is scarce. The result is near-perfect detection performance and a significantly reduced need for daily manual image review. In a benchmark conducted with Corning's optical fiber manufacturing engineering team, a model trained on just eight real defect images, augmented with synthetic data generated by the NVIDIA Defect Image Generation skill, reached an average precision of 95% and perfect recall on the most challenging defect class. It outperformed a baseline model trained solely on real data, and the approach effectively compressed a multi-quarter inspection project into just a few days.
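The benchmark is reported in terms of average precision and recall. As a reminder of what those two numbers measure, here is a minimal Python sketch that computes precision and recall from counts, and non-interpolated average precision from ranked detector scores. The source does not say which AP variant the benchmark used (COCO-style and VOC-style evaluations use interpolated versions), so this is one common definition, and the toy inputs in the example are invented, not Corning data.

```python
def precision_recall(tp: int, fp: int, fn: int) -> tuple[float, float]:
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def average_precision(scores: list[float], labels: list[bool]) -> float:
    """Non-interpolated AP: mean precision at the rank of each true defect.

    scores: detector confidence for each candidate region.
    labels: True if that candidate is a real defect.
    """
    n_pos = sum(labels)
    if n_pos == 0:
        return 0.0
    # Rank candidates from most to least confident.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    tp = 0
    precision_sum = 0.0
    for rank, (_, is_defect) in enumerate(ranked, start=1):
        if is_defect:
            tp += 1
            precision_sum += tp / rank  # precision at this cut-off
    return precision_sum / n_pos


if __name__ == "__main__":
    # Toy example: 19 correct detections, 1 false alarm, no missed defects.
    print(precision_recall(19, 1, 0))
    print(average_precision([0.9, 0.8, 0.7, 0.6], [True, False, True, True]))
```

Perfect recall means no real defect was missed (FN = 0), while precision below 100% means some flagged regions were false alarms; for an inspection line, that trade-off usually favors recall.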

Large-scale city operations show why vision AI agents need connected workflows, not just inference. Linker Vision is building smart city AI systems with the NVIDIA Metropolis Blueprint for video search and summarization (VSS) to accelerate the deployment of video reasoning agents across city infrastructure. In this workflow, VSS skills can help package common video AI tasks such as search, summarization, alerts, reporting and stream management into reusable, agent-executable workflows. OpenUSD-based NVIDIA Omniverse digital twins help model city environments and test how vision AI systems respond to varied traffic patterns, weather conditions, emergency events and infrastructure changes. In Kaohsiung, Linker Vision reduced development effort by 85% using the VSS blueprint and reduced incident response times by up to 80%. Its newer AI-GRID expansion builds on this approach with NVIDIA NemoClaw blueprints for secure agentic AI, supporting autonomous video reasoning across city and transportation environments.
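The idea of packaging video tasks as "reusable, agent-executable workflows" can be illustrated with a small skill registry: each skill is a named function, and an agent runs a plan by chaining skills over a shared context. This is a hypothetical Python sketch of the general pattern, not the VSS blueprint's API; `SkillRegistry`, the skill names and the event format are all invented for illustration, and only search, alerts and reporting are stubbed out.

```python
from dataclasses import dataclass, field
from typing import Callable

# A skill takes the current context and returns an updated context.
Skill = Callable[[dict], dict]


@dataclass
class SkillRegistry:
    """Hypothetical registry mapping skill names to callables an agent can invoke."""
    skills: dict[str, Skill] = field(default_factory=dict)

    def register(self, name: str) -> Callable[[Skill], Skill]:
        def decorator(fn: Skill) -> Skill:
            self.skills[name] = fn
            return fn
        return decorator

    def run(self, plan: list[str], context: dict) -> dict:
        """Execute skills in order, feeding each one's output context to the next."""
        for name in plan:
            if name not in self.skills:
                raise KeyError(f"unknown skill: {name}")
            context = self.skills[name](context)
        return context


registry = SkillRegistry()


@registry.register("search")
def search(ctx: dict) -> dict:
    # Stand-in for a video search call: keep events whose label matches the query.
    hits = [e for e in ctx["events"] if e["label"] == ctx["query"]]
    return {**ctx, "hits": hits}


@registry.register("alert")
def alert(ctx: dict) -> dict:
    alerts = [f"ALERT cam={h['camera']} t={h['t']}s {h['label']}" for h in ctx["hits"]]
    return {**ctx, "alerts": alerts}


@registry.register("report")
def report(ctx: dict) -> dict:
    return {**ctx, "report": f"{len(ctx['hits'])} '{ctx['query']}' event(s) found"}


if __name__ == "__main__":
    events = [
        {"camera": "c1", "t": 12, "label": "collision"},
        {"camera": "c2", "t": 40, "label": "congestion"},
    ]
    result = registry.run(["search", "alert", "report"], {"events": events, "query": "collision"})
    print(result["alerts"])
    print(result["report"])
```

Keeping each skill a small, pure function over a context dictionary makes it easy for an agent (or a human operator) to recombine the same skills into different plans without rewriting them.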

In industrial environments, teams need agents that can understand sequences of human activity and work in context. At Foxconn, DeepHow's Live Standard Operating Procedure Verification agent uses the NVIDIA Metropolis VSS blueprint as the agentic video workflow layer for search, summarization and analysis across operational environments. NVIDIA Cosmos provides the reasoning capability that helps the agent interpret complex human activity and work sequences in context, such as whether assembly steps are performed correctly and in the expected order. The solution has been used on the NVIDIA GB300 server production lines to improve first-pass yield by 3%, achieve 99% task-level accuracy in micro-action understanding of critical SOP steps and reduce redundant work by helping teams catch problems earlier.
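The source does not describe how the agent decides whether steps were performed "in the expected order." One simple way to frame that check, assuming an upstream video model has already turned footage into a sequence of recognized step labels, is to compare the observed sequence with the SOP using a longest common subsequence (LCS). The following Python sketch is a hypothetical illustration of that framing, not DeepHow's method; `verify_sop`, `SopCheck` and the step labels are invented, and expected steps are assumed to be unique.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class SopCheck:
    missing: list[str]       # expected steps never observed
    out_of_order: list[str]  # observed steps outside the longest in-order run
    repeated: list[str]      # expected steps observed more than once
    unexpected: list[str]    # observed steps not in the SOP

    @property
    def passed(self) -> bool:
        return not (self.missing or self.out_of_order or self.repeated or self.unexpected)


def _lcs(a: list[str], b: list[str]) -> list[str]:
    """Longest common subsequence of a and b via dynamic programming."""
    n, m = len(a), len(b)
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if a[i] == b[j]:
                dp[i][j] = dp[i + 1][j + 1] + 1
            else:
                dp[i][j] = max(dp[i + 1][j], dp[i][j + 1])
    out, i, j = [], 0, 0
    while i < n and j < m:
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif dp[i + 1][j] >= dp[i][j + 1]:
            i += 1
        else:
            j += 1
    return out


def verify_sop(expected: list[str], observed: list[str]) -> SopCheck:
    """Compare recognized step labels against the expected SOP sequence."""
    expected_set = set(expected)
    counts = Counter(observed)
    # First occurrence of each SOP step, in the order it was observed.
    seen = list(dict.fromkeys(s for s in observed if s in expected_set))
    in_order = set(_lcs(expected, seen))
    return SopCheck(
        missing=[s for s in expected if s not in counts],
        out_of_order=[s for s in seen if s not in in_order],
        repeated=sorted(s for s, c in counts.items() if c > 1 and s in expected_set),
        unexpected=sorted(s for s in counts if s not in expected_set),
    )


if __name__ == "__main__":
    sop = ["pick", "place", "screw", "inspect"]
    print(verify_sop(sop, ["pick", "screw", "place", "screw", "inspect"]))
```

When two steps are swapped, the LCS keeps one of them in order and flags the other, so the sketch reports that an ordering violation occurred rather than deciding which of the two steps the worker actually misplaced.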
