Posted in Emergent Tech

Nvidia unveils AI that allows NPCs to understand and speak to players in real-time

Nvidia ACE uses natural language dialogue, audio-to-facial-expression integration, and text-to-speech/speech-to-text functionalities

Generative artificial intelligence (AI) is coming to video games.

At Computex 2023, Nvidia CEO Jensen Huang unveiled a new platform that allows developers to infuse lifelike conversations into gaming characters.

Nvidia Avatar Cloud Engine (ACE) for Games, or Nvidia ACE, is a custom AI model foundry service that aims to transform games by bringing intelligence to NPCs through AI-powered natural language interactions.

Imagine delving into a fantasy realm where characters dynamically respond to your every word and action, engaging you in immersive dialogues that make the game world feel more real than ever before. It’s like stepping into an interactive blockbuster movie, where the characters have a mind of their own!

“Not only will AI contribute to the rendering and the synthesis of the environment, AI will also animate the characters,” Huang said. “AI will be a very big part of the future of video games.”

How it works:

The AI tool uses natural language dialogue, audio-to-facial-expression integration, and text-to-speech/speech-to-text functionalities.

Game developers, middleware providers, and tool creators can construct and implement customised AI models for speech, conversation, and animation in software and games.

Nvidia ACE for Games comprises three essential components, each offering unique functionalities. First off, there’s Nvidia NeMo, an AI framework specifically designed for training and deploying large language models (LLMs). Within NeMo, developers can utilise NeMo Guardrails, a feature aimed at ensuring safe and appropriate AI conversations. By implementing Guardrails, the system can prevent NPCs from responding to inappropriate or off-topic prompts, effectively maintaining the integrity of in-game interactions. Additionally, Guardrails provides essential security measures, safeguarding against potential attempts to manipulate or misuse the AI.
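The guardrail idea can be sketched in a few lines: before the LLM answers, a check decides whether the player's prompt falls inside the NPC's permitted topics, and deflects it otherwise. This is a minimal illustrative sketch, not the NeMo Guardrails API; every function and topic name here is a hypothetical stand-in.

```python
# Hypothetical sketch of an NPC dialogue guardrail.
# A real system (e.g. NeMo Guardrails) would use model-based topic
# classification; this toy version just matches keywords.

ALLOWED_TOPICS = {"shop inventory", "local rumours"}

def classify_topic(prompt: str) -> str:
    """Toy topic classifier: map the player's prompt to a topic label."""
    lowered = prompt.lower()
    if "rumour" in lowered or "crime" in lowered:
        return "local rumours"
    if "buy" in lowered or "ramen" in lowered:
        return "shop inventory"
    return "off-topic"

def guarded_reply(prompt: str) -> str:
    """Only let the LLM answer prompts inside the NPC's allowed topics."""
    topic = classify_topic(prompt)
    if topic not in ALLOWED_TOPICS:
        # The guardrail deflects instead of passing the prompt to the LLM.
        return "Sorry, traveller, I only talk about my shop and the neighbourhood."
    return f"[LLM response about {topic}]"

print(guarded_reply("Any rumours about the crime wave?"))
print(guarded_reply("What do you think about the election?"))
```

The key design point is that the check runs before generation, so an off-topic or adversarial prompt never reaches the model at all.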

Next in line is Nvidia Riva, the company’s comprehensive solution for seamless speech-to-text and text-to-speech conversion. Within the ACE for Games workflow, gamers can utilise their microphones to ask questions, which Riva efficiently transforms into text input. This text is then processed by the LLM, generating a corresponding text response. Finally, Riva completes the loop by converting the text response back into speech, allowing the user to hear the character’s lifelike response.
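The round trip described above can be sketched as three stages chained together. The functions below are illustrative stand-ins for the real Riva ASR/TTS and LLM calls, not Nvidia's actual APIs; the audio handling is simulated with plain byte strings.

```python
# Minimal sketch of the ACE dialogue loop:
# mic audio -> speech-to-text -> LLM -> text-to-speech -> NPC voice.
# All three functions are hypothetical stand-ins for Riva/NeMo calls.

def speech_to_text(audio: bytes) -> str:
    """Stand-in for Riva ASR: turn the player's recording into text."""
    return audio.decode("utf-8")

def generate_response(prompt: str) -> str:
    """Stand-in for the LLM that produces the NPC's reply."""
    return f"You asked: '{prompt}'. Rumour has it Kumon Aoki is behind the chaos."

def text_to_speech(text: str) -> bytes:
    """Stand-in for Riva TTS: synthesise audio from the reply text."""
    return text.encode("utf-8")

def dialogue_turn(mic_audio: bytes) -> bytes:
    prompt = speech_to_text(mic_audio)   # 1. player's voice -> text
    reply = generate_response(prompt)    # 2. text -> LLM reply
    return text_to_speech(reply)         # 3. reply -> audio for the NPC

audio_out = dialogue_turn(b"Who is causing the crime wave?")
print(audio_out.decode("utf-8"))
```

In the real workflow the reply audio would also be fed to Audio2Face to drive the character's facial animation, closing the loop from the player's microphone to an animated, speaking NPC.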

The final piece of the ACE for Games puzzle is Nvidia Omniverse Audio2Face. This tool adds an extra layer of immersion by synchronising the facial expressions of game characters with their spoken words. Currently available in beta form, Omniverse Audio2Face enables developers to create characters that authentically convey emotions and expressions, enhancing the overall realism of in-game interactions.

During Computex, Huang showed a demo of the new feature. In the demo, gamer Kai enters Jin’s Ramen shop and engages in a voice conversation. They discuss the high crime rate in the area, prompting Kai to offer help. Jin reveals rumours linking the chaos to the notorious crime lord Kumon Aoki. Kai inquires about Aoki’s whereabouts, and Jin provides the information, initiating Kai’s quest.


The AI-generated dialogue is undeniably impressive, but its artificial nature is still apparent: the voice comes close to sounding realistic, yet retains a noticeable robotic quality.

Nevertheless, it is evident that the gaming industry is venturing into a fascinating future.