
Abu Dhabi’s TII has launched language program NOOR



Abu Dhabi-based Technology Innovation Institute (TII) has launched NOOR, an Arabic language model intended to support applications such as automated summarisation, chatbots, and personalised marketing.

The NOOR model is based on the popular Transformer architecture. As a decoder-only model, similar in structure to GPT-3, it is designed for generative tasks, and its architecture has been updated to reflect recent developments in machine learning, including improvements such as better positional embeddings, a statement from the institute said. To help ensure quality at scale in the NOOR dataset, the TII team designed an automated filtering pipeline based on machine learning techniques. These tools identify text that resembles high-quality reference material and shield the model from exposure to spam content.
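The statement does not describe how TII's filtering pipeline works internally. As a rough illustration of the general idea only, the sketch below scores a document by how much its words resemble a small set of trusted reference texts rather than known spam, using a toy unigram naive Bayes model. The class name `QualityFilter`, the sample texts, and the threshold are all assumptions made for this example and are not taken from TII.

```python
import math
from collections import Counter


def tokenize(text):
    # Whitespace tokenisation; str.split() works for any script, including Arabic.
    return text.lower().split()


class QualityFilter:
    """Toy unigram naive Bayes scorer (illustrative only, not TII's method).

    A positive score means the text looks more like the reference
    examples than like the spam examples.
    """

    def __init__(self, reference_docs, spam_docs):
        self.ref = Counter(t for d in reference_docs for t in tokenize(d))
        self.spam = Counter(t for d in spam_docs for t in tokenize(d))
        self.vocab = set(self.ref) | set(self.spam)
        self.ref_total = sum(self.ref.values())
        self.spam_total = sum(self.spam.values())

    def score(self, text):
        vocab_size = len(self.vocab)
        total = 0.0
        for token in tokenize(text):
            if token not in self.vocab:
                continue  # unseen words carry no evidence either way
            # Laplace (add-one) smoothing avoids division by zero.
            p_ref = (self.ref[token] + 1) / (self.ref_total + vocab_size)
            p_spam = (self.spam[token] + 1) / (self.spam_total + vocab_size)
            total += math.log(p_ref / p_spam)
        return total

    def keep(self, text, threshold=0.0):
        # The threshold is a hypothetical tuning knob.
        return self.score(text) > threshold


# Hypothetical training examples, far smaller than any real corpus.
reference_docs = [
    "the museum published a history of classical poetry",
    "researchers describe the language model in a technical report",
]
spam_docs = [
    "buy cheap pills now click here",
    "click here to win free money now",
]

quality_filter = QualityFilter(reference_docs, spam_docs)
crawled = [
    "a technical report on classical poetry",
    "click here for free pills",
]
kept = [doc for doc in crawled if quality_filter.keep(doc)]
```

A production pipeline would typically use a much stronger classifier and far more training data, but the shape is the same: score each crawled document, then keep only those above a chosen threshold.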

To build NOOR, researchers at TII designed an end-to-end pipeline for the collection of data, including crawling, filtering, and curation at scale. TII’s specialists also built services for extreme-scale distributed training and serving to deliver applications with efficient inference and model specialisation.

“The uniquely large Arabic dataset collected to train the model is the result of months of work that included curating, scraping, and filtering of varied sources,” said Dr. Ebtesam Almazrouei, Director, AI Cross-Center Unit, TII.

NOOR’s training dataset is the world’s largest cross-domain Arabic dataset, combining web data with books, poetry, news articles, and technical information to significantly widen the applicability of the model.

Using 3D parallelism, a term that usually refers to combining data, tensor, and pipeline parallelism, NOOR was trained on a computing resource with 128 A100 GPUs. This approach distributes the computation across devices and makes efficient use of the available hardware.
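To make the idea concrete, the sketch below shows one way 128 GPUs could be arranged along three parallelism axes. The statement does not give the actual split, so the degrees used here (8-way tensor, 4-way pipeline, and therefore 4-way data parallelism) are purely hypothetical, as are the `Layout` class and the rank ordering.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Layout:
    """Hypothetical 3D-parallel layout; the degrees are not from TII."""

    world_size: int  # total number of GPUs
    tensor: int      # GPUs that share each layer's weight matrices
    pipeline: int    # consecutive groups of layers (pipeline stages)

    def __post_init__(self):
        if self.world_size % (self.tensor * self.pipeline) != 0:
            raise ValueError("tensor * pipeline must divide world_size")

    @property
    def data(self):
        # Whatever is left after tensor and pipeline splitting
        # becomes the number of full model replicas.
        return self.world_size // (self.tensor * self.pipeline)

    def coords(self, rank):
        """Map a GPU rank to (data, pipeline, tensor) coordinates.

        The tensor index varies fastest, so tensor-parallel peers get
        adjacent ranks. That ordering is an assumption of this sketch.
        """
        if not 0 <= rank < self.world_size:
            raise ValueError("rank out of range")
        tensor_idx = rank % self.tensor
        pipeline_idx = (rank // self.tensor) % self.pipeline
        data_idx = rank // (self.tensor * self.pipeline)
        return data_idx, pipeline_idx, tensor_idx


layout = Layout(world_size=128, tensor=8, pipeline=4)
replicas = layout.data            # number of data-parallel replicas
last_gpu = layout.coords(127)     # coordinates of the final rank
```

Under these assumed degrees, each model replica spans 32 GPUs (8 x 4) and four such replicas train on different batches at once. Other factorisations of 128 are equally plausible.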

The Director of the AI Cross-Center Unit noted that this was only the first step in TII’s efforts to contribute to the wider UAE Strategy for Artificial Intelligence, through supporting AI integration across key sectors of the economy.

NOOR takes its name from the Arabic word for “light”, a choice meant to link the Arabic language model with the idea of enlightening the mind. It represents the United Arab Emirates’ global contribution to advanced technology and artificial intelligence.