Oracle becomes first hyperscaler to offer an AI supercomputer at scale

Oracle and NVIDIA have expanded their partnership to enable the deployment of critical NVIDIA AI applications on the latest Oracle Cloud Infrastructure (OCI) Supercluster.

NVIDIA has chosen OCI as the first hyperscale cloud provider to offer its AI supercomputing service, NVIDIA DGX Cloud, at scale. NVIDIA is also running its NVIDIA AI Foundations generative AI cloud services, which are available through DGX Cloud, on OCI.

“OCI is the first platform to offer an AI supercomputer at scale to thousands of customers across every industry. This is a critical capability as more and more organisations require computing resources for their unique AI use cases. To support this demand, we continue to expand our work with NVIDIA,” said Clay Magouyrk, executive vice president, Oracle Cloud Infrastructure.

Oracle Cloud Infrastructure supercluster

The NVIDIA DGX Cloud and its associated NVIDIA AI Foundations services are taking advantage of OCI’s distinctive Supercluster, which has been certified by NVIDIA to meet the rigorous standards of DGX Cloud.

OCI’s Supercluster comprises OCI Compute Bare Metal, a RoCE cluster with ultra-low latency based on NVIDIA networking, and a selection of HPC storage options. NVIDIA has validated and deployed this configuration to support thousands of OCI Compute Bare Metal instances, which can effectively handle massively parallel applications. The networking capabilities of OCI Supercluster can now be scaled up to accommodate 4,096 OCI Compute Bare Metal instances featuring 32,768 A100 GPUs, while OCI Compute Bare Metal instances with NVIDIA H100 GPUs are currently available in limited quantities.
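The scaling figures quoted above imply eight A100 GPUs per bare metal instance. As a quick sanity check, the arithmetic can be sketched in Python; the function and constant names here are illustrative only and are not part of any Oracle or NVIDIA API.

```python
# Figures quoted in the article for the OCI Supercluster scaling limit.
MAX_INSTANCES = 4096       # OCI Compute Bare Metal instances
TOTAL_A100_GPUS = 32768    # A100 GPUs across those instances


def gpus_per_instance(total_gpus: int, instances: int) -> int:
    """Return the GPU count per instance, assuming an even split."""
    if instances <= 0:
        raise ValueError("instances must be positive")
    if total_gpus % instances != 0:
        raise ValueError("GPUs do not divide evenly across instances")
    return total_gpus // instances


def cluster_gpus(instances: int, per_instance: int) -> int:
    """Return the total GPU count for a cluster of identical instances."""
    return instances * per_instance


if __name__ == "__main__":
    per_node = gpus_per_instance(TOTAL_A100_GPUS, MAX_INSTANCES)
    print(f"GPUs per instance: {per_node}")
    print(f"GPUs at full scale: {cluster_gpus(MAX_INSTANCES, per_node)}")
```

The even-split assumption holds for the numbers given: 32,768 divided by 4,096 is exactly 8.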

Moreover, NVIDIA has revealed that Oracle will be integrating NVIDIA BlueField-3 DPUs into its networking stack.

Generative AI to develop bespoke enterprise models

NVIDIA AI Foundations services enable the creation of custom enterprise models spanning language, images, video, 3D, and biology. These services include NVIDIA NeMo for language, NVIDIA Picasso for image, video and 3D, and NVIDIA BioNeMo for biology AI model training and inference. Enterprises can use these services to build domain-specific generative AI applications for various purposes such as customer support, content creation, and digital simulation.

When deploying custom models built with NVIDIA AI Foundations and model families like GPT-3 on OCI, the OCI Supercluster’s purpose-built RDMA networking provides near line rate performance with microsecond latency and eliminates blocking issues for RDMA-dependent workloads.

“The limitless opportunities for AI-driven innovation are helping transform virtually every business. NVIDIA’s collaboration with Oracle Cloud Infrastructure puts the extraordinary supercomputing performance of NVIDIA’s accelerated computing platform within reach of every enterprise,” said Manuvir Das, vice president of enterprise computing, NVIDIA.
