Redefining Arabic AI Evaluation: Inception and MBZUAI Launch Arabic-Leaderboards Space

Open-source AI models have accelerated global innovation, but Arabic-language AI has often lagged because of a lack of standardized benchmarks and transparent evaluations. While frontier models excel in English and other widely spoken languages, Arabic AI models still struggle to achieve comparable performance. As part of its ongoing efforts to address this, Inception, a G42 company specializing in AI-native products, in collaboration with the Mohamed bin Zayed University of Artificial Intelligence (MBZUAI), has launched a unified space that serves as a one-stop destination for Arabic evaluations and leaderboards.

A New Era of Arabic AI Benchmarking

The Arabic-Leaderboards Space provides a centralized, transparent hub for assessing model performance across diverse tasks, including language and vision-language evaluation, with many other tasks and modalities to be added later. At the core of this platform are two leaderboards, AraGen and Arabic Instruction Following, each designed to measure a different facet of Arabic LLM performance.

AraGen Leaderboard: Elevating Arabic NLP Standards

The AraGen Leaderboard provides a structured evaluation framework for Arabic LLMs. Leveraging the 3C3H metric, it establishes a balanced approach that measures both factual accuracy and usability.

AraGen was first released in December 2024 as one of the first generative Arabic leaderboards. To keep evaluations relevant and up to date, we plan to revise the dataset powering the evaluations regularly. This is the v2 release of AraGen, with a revised dataset.

This release adds 340 question-answer pairs to the dataset, covering diverse categories such as Q&A, reasoning, safety, and language analysis, similar to the previous version. It also introduces 3C3H-HeatMap, a research tool for analyzing and comparing model performance and behavior across the six key evaluation dimensions and for identifying trends in factuality, helpfulness, and conciseness. Together, these additions help make evaluations of Arabic LLMs more reliable, more transparent, and better aligned with real-world applications.

Instruction Following Leaderboard: A Breakthrough in Instruction-Following AI

Building upon the generative evaluation workflow established by AraGen, we introduce the Instruction Following Leaderboard, a comprehensive tool designed to evaluate the instruction-following capabilities of Large Language Models (LLMs) in both Arabic and English. Accompanying this leaderboard is our newly developed Arabic IFEval dataset, crafted specifically for detailed instruction-following evaluations.

What sets Arabic IFEval apart is that it is publicly available, Arabic-specific, and meticulously designed to assess how well LLMs comprehend and execute Arabic instructions. By incorporating diacritization and phonetic features, the dataset captures nuances of the Arabic language that traditional AI evaluation tools often overlook. Each sample has been carefully curated and manually verified by linguistic specialists to ensure authenticity, contextual accuracy, and high-quality instruction-following assessments. Together, the leaderboard and the Arabic IFEval dataset offer researchers and developers a transparent, comparative platform for measuring and advancing the performance of Arabic AI models.
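To make the idea of a diacritization-aware instruction check more concrete, here is a minimal Python sketch. It is not the Arabic IFEval implementation. It assumes, as in the original English IFEval, that an instruction can be verified programmatically, and it uses a hypothetical instruction ("answer with diacritized text"). The function names and the 0.7 threshold are illustrative assumptions only.

```python
# Illustrative sketch only: NOT the Arabic IFEval code.
# Hypothetical instruction: "Write your answer with diacritics (tashkeel)."

# Arabic diacritic marks (tanween, short vowels, shadda, sukun): U+064B..U+0652.
TASHKEEL = {chr(cp) for cp in range(0x064B, 0x0653)}
TATWEEL = "\u0640"  # elongation character, not a real letter


def is_arabic_letter(ch: str) -> bool:
    """True for letters in the basic Arabic range U+0621..U+064A, excluding tatweel."""
    return "\u0621" <= ch <= "\u064A" and ch != TATWEEL


def diacritization_ratio(text: str) -> float:
    """Fraction of Arabic letters immediately followed by a diacritic mark."""
    letters = 0
    marked = 0
    for i, ch in enumerate(text):
        if is_arabic_letter(ch):
            letters += 1
            if i + 1 < len(text) and text[i + 1] in TASHKEEL:
                marked += 1
    return marked / letters if letters else 0.0


def follows_diacritize_instruction(response: str, threshold: float = 0.7) -> bool:
    """Hypothetical pass/fail rule: enough letters carry a diacritic.

    The threshold is an assumption; long vowels are often left unmarked
    even in fully diacritized text, so requiring 1.0 would be too strict.
    """
    return diacritization_ratio(response) >= threshold


if __name__ == "__main__":
    with_marks = "\u0643\u064E\u062A\u064E\u0628\u064E"  # "kataba" with fatha on each letter
    without_marks = "\u0643\u062A\u0628"                 # same word, no diacritics
    print(follows_diacritize_instruction(with_marks))
    print(follows_diacritize_instruction(without_marks))
```

A rule-based check like this only tests whether diacritics are present, not whether they are correct. Correctness is the kind of property the manual verification by linguistic specialists described above is meant to cover.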

Shaping the Future of Arabic AI

Inception plans to expand the Arabic-Leaderboards Space by integrating visual question-answering evaluations and many other tasks and modalities. Developed in collaboration with MBZUAI, this expansion is intended to drive advancements in multimodal AI and strengthen Arabic AI’s global footprint. With the launch of the Arabic-Leaderboards Space, Inception is equipping researchers, developers, and enterprises with tools to build more effective and linguistically sophisticated AI solutions.

Arabic AI research is entering a new phase of growth – one that prioritizes precision, inclusivity, and real-world impact.

To learn more about the Arabic-Leaderboards Space, visit Hugging Face.
