
Nvidia announced an official collaboration with Amazon Web Services to deploy AI inference infrastructure at scale across EC2 and OpenSearch for enterprise production workloads.

The partnership between the hyperscaler and the GPU vendor consolidates the inference supply chain and signals that enterprise AI will increasingly depend on unified AWS-Nvidia stacks for inference at scale.

First-hand · OfficialSlicast · June 24, 2026 · US · Source: NVIDIA Blog


Building AI systems at scale demands low-latency inference, fast vector search, strong GPU price-performance and infrastructure that scales without multiplying operational complexity. NVIDIA's latest work with Amazon Web Services addresses each of those constraints, providing enterprises with practical paths to deploy AI at production scale through Amazon OpenSearch and Amazon EC2.

Amazon EC2 G7 instances bring NVIDIA RTX PRO 4500 Blackwell Server Edition GPUs to AWS, engineered for production workloads requiring performance without the operational overhead of customer-managed GPU platforms. Compared with G6 instances, G7 delivers up to 4.6x AI inference performance, up to 2.1x graphics performance and significantly faster GPU-accelerated data analytics on Amazon EMR using the NVIDIA cuDF library for Apache Spark workloads.

G7 instances support up to eight GPUs, 256GB of total GPU memory, 700 Gbps of EFA-enabled networking and up to 7.6TB of local NVMe SSD storage across one-, two-, four- and eight-GPU configurations plus bare metal, available soon. This flexibility lets customers right-size infrastructure for their workloads instead of over-provisioning. AI teams gain lower-latency inference. Media and entertainment teams access high-resolution video workflows and rendering. Simulation, computer-aided design, virtual desktop infrastructure, gaming and spatial computing teams use the same instance type for graphics-intensive applications. Data teams leverage GPU memory, local storage and networking improvements for analytics pipelines and vector database workloads.
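
The right-sizing idea can be illustrated with a little arithmetic. The sketch below is not an AWS or NVIDIA tool; it only uses the figures quoted above (up to eight GPUs, 256GB of total GPU memory, and one-, two-, four- and eight-GPU configurations). It assumes memory is split evenly across GPUs, which works out to 32GB each, and that a workload can be sharded across all GPUs in an instance. The function name `smallest_fitting_config` and the 20% headroom default are hypothetical choices made for illustration.

```python
# Figures quoted in the article: up to 8 GPUs, 256 GB of total GPU memory.
MAX_GPUS = 8
TOTAL_GPU_MEMORY_GB = 256

# Assumption: memory is divided evenly across GPUs (256 / 8 = 32 GB each).
MEMORY_PER_GPU_GB = TOTAL_GPU_MEMORY_GB / MAX_GPUS

# GPU counts the article lists for G7 configurations.
GPU_CONFIGS = (1, 2, 4, 8)


def smallest_fitting_config(required_gb, headroom=0.2):
    """Return the smallest listed GPU count whose combined memory covers
    required_gb plus a safety headroom, or None if even 8 GPUs fall short.

    Assumes the workload can be sharded across every GPU in the instance.
    """
    if required_gb <= 0:
        raise ValueError("required_gb must be positive")
    needed_gb = required_gb * (1 + headroom)
    for gpus in GPU_CONFIGS:
        if gpus * MEMORY_PER_GPU_GB >= needed_gb:
            return gpus
    return None


if __name__ == "__main__":
    for model_gb in (20, 30, 100, 250):
        print(model_gb, "GB ->", smallest_fitting_config(model_gb), "GPU(s)")
```

Under these assumptions, a workload needing 30GB would land on the two-GPU configuration once headroom is included, rather than defaulting to the largest instance.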

G7 instances are accessible through AWS Deep Learning Amazon Machine Images, Amazon Deep Learning Containers, Amazon EMR, Amazon EKS, Amazon ECS and graphics AMIs, with availability coming soon to Amazon SageMaker AI.

Amazon OpenSearch Serverless now powers agentic AI and dynamic workloads with no infrastructure management required, using GPU-accelerated vector indexing powered by NVIDIA cuVS as the default compute choice for all vector collections. This shift transforms GPU-powered vector search from a specialized optimization project into a standard AWS capability. For teams building retrieval-augmented generation, semantic search, recommendation systems and agentic AI applications, the impact is direct: vector indexing up to 10x faster at a quarter of the cost compared with CPU-only builds, making billion-scale vector databases practical to build in under an hour. By making NVIDIA cuVS the default in OpenSearch Serverless, AWS customers get a faster path from raw data to production-ready AI retrieval infrastructure with serverless scaling that reduces operational overhead during idle periods.

AWS has achieved NVIDIA Exemplar Cloud status on NVIDIA GB300 for training workloads, meeting rigorous performance thresholds that NVIDIA benchmarks against its reference architecture. This achievement results from deep co-engineering efforts between AWS and NVIDIA teams. The NVIDIA Exemplar Clouds initiative is meant to assure developers and AI leaders that they are using consistent, high-performance cloud infrastructure for large-scale training, which helps teams evaluate cloud providers, improve total cost of ownership and move AI projects from planning to production more efficiently.

Together, these advancements reinforce every layer of the AI infrastructure stack on AWS, delivering production-grade infrastructure that performs at scale without adding operational burden to the teams running it.
