
Abu Dhabi’s G42 launches Arabic language AI model

Jais was trained on the Condor Galaxy 1 AI supercomputer on 116 billion Arabic tokens and 279 billion English tokens of data

Aiming to bring one of the world’s most widely used languages into the AI mainstream, Abu Dhabi’s G42 has launched a new large language model for Arabic.

Jais, an open-source Arabic large language model, has 13 billion parameters and was trained on a dataset of 395 billion tokens in Arabic and English.

Named after the UAE’s highest peak, Jais promises to bring the capabilities of generative AI to Arabic-speaking communities worldwide. The model is the product of a collaboration among Inception; the Mohamed bin Zayed University of Artificial Intelligence (MBZUAI), the world’s first graduate research institution dedicated to AI; and Cerebras Systems. Notably, the model was trained on Condor Galaxy, a recently announced multi-exaFLOP AI supercomputer developed by G42 and Cerebras.

The release of Jais represents a significant milestone for AI in the Arabic-speaking world. Developed in Abu Dhabi, the UAE’s capital, Jais is aimed at more than 400 million Arabic speakers and highlights Abu Dhabi’s prominence as a hub for AI, cultural preservation, innovation, and international collaboration.

By making Jais open source, Inception aims to engage the scientific, academic, and developer communities, drive the growth of a thriving Arabic language AI ecosystem, and provide a model for other languages that are underrepresented in mainstream AI.

Andrew Jackson, CEO of Inception, emphasises the importance of collaboration in innovation and asserts that Jais sets a new standard for AI advancement in the Middle East while promoting excellence and democratising AI.

Jais distinguishes itself by outperforming existing Arabic models and remaining competitive with similarly sized English models, despite having been trained on less English data. This achievement signifies a new era in large language model development and training, in which the languages in a model learn from one another.

Eric Xing, President of MBZUAI, highlights the demands of developing a high-calibre Arabic large language model, emphasising the need for cutting-edge AI research and a deep understanding of the Arabic language’s diversity, heritage, and increasing societal importance.

In addition to releasing the model, Inception and MBZUAI have established an academic partnership, granting early access to current and future Arabic Large Language Models for testing purposes. This partnership includes institutions such as Carnegie Mellon University, Ecole Polytechnique, Hamad bin Khalifa University, Sorbonne Paris Nord – LIPN, NYU Abu Dhabi’s CAMeL Lab, and The University of Edinburgh. Several prominent organisations, including the UAE Ministry of Foreign Affairs, the UAE Ministry of Industry and Advanced Technology, The Department of Health – Abu Dhabi, Abu Dhabi National Oil Company (ADNOC), Etihad Airways, First Abu Dhabi Bank (FAB), and e&, will utilise Jais to gain valuable insights.

Jais AI development and training

Jais is a transformer-based large language model. It uses ALiBi position embeddings for better context handling and accuracy, along with SwiGLU and maximal update parameterisation to enhance training efficiency and accuracy. Its training, fine-tuning, and evaluation were carried out by a joint Inception/MBZUAI team using the Condor Galaxy 1 (CG-1) supercomputer, co-developed by G42 and Cerebras Systems. The training dataset included 116 billion Arabic tokens and 279 billion English tokens, 395 billion in total, with the English data included to boost performance through cross-language transfer. Inception and MBZUAI plan to expand and refine Jais as its user community grows.
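For readers unfamiliar with ALiBi, the idea can be illustrated with a short sketch. Instead of adding position vectors to the token embeddings, ALiBi adds a penalty to each attention score that grows linearly with the distance between the query and the key, with a different slope for each attention head. The code below follows the generally published ALiBi formulation for a head count that is a power of two. It is not taken from the Jais codebase, and the function names alibi_slopes and alibi_bias are hypothetical, chosen only for this illustration.

```python
def alibi_slopes(num_heads):
    """Per-head slopes: a geometric sequence starting at 2^(-8/num_heads).

    Assumption for this sketch: num_heads is a power of two.
    """
    if num_heads <= 0 or (num_heads & (num_heads - 1)) != 0:
        raise ValueError("this sketch only handles a power-of-two head count")
    ratio = 2.0 ** (-8.0 / num_heads)
    return [ratio ** (head + 1) for head in range(num_heads)]


def alibi_bias(num_heads, seq_len):
    """Bias added to raw attention scores, indexed as [head][query][key].

    Keys at or before the query get slope * (key - query), which is zero
    for the current token and more negative for more distant tokens.
    Keys after the query are masked with -inf (causal attention).
    """
    bias = []
    for slope in alibi_slopes(num_heads):
        head_bias = []
        for query in range(seq_len):
            row = []
            for key in range(seq_len):
                if key <= query:
                    row.append(slope * (key - query))
                else:
                    row.append(float("-inf"))
            head_bias.append(row)
        bias.append(head_bias)
    return bias


if __name__ == "__main__":
    # Print the bias matrix of the first of 8 heads for a 4-token sequence.
    for row in alibi_bias(num_heads=8, seq_len=4)[0]:
        print(row)
```

Because the penalty depends only on relative distance, the same rule applies to sequences of any length, which is why the technique is associated with better handling of long contexts.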

Andrew Feldman, co-founder and CEO of Cerebras Systems, celebrates the partnership’s achievements, from introducing the multi-exaFLOP AI supercomputer CG-1 to delivering the leading Arabic Large Language Model to the open-source community. He underscores the ease of use and rapid AI model development facilitated by CG-1.

Today, Inception plays a pivotal role at the intersection of academia, business, and regulatory spheres, unlocking synergies, promoting collaboration, and expediting the commercialisation of AI across industries.

To access Jais, users can download it from Hugging Face or try it online by registering their interest on Jais’ website and receiving an invitation to the playground environment.