Redefining Arabic AI Evaluation: Inception and MBZUAI Launch Arabic-Leaderboards Space

Open-source AI models have accelerated global innovation, but Arabic-language AI has often struggled because it lacks standardized benchmarks and transparent evaluations. While frontier models excel in English and other widely spoken languages, Arabic AI models still struggle to reach comparable performance. As part of its continuing efforts in this area, Inception, a G42 company specializing in AI-native products, has partnered with the Mohamed bin Zayed University of Artificial Intelligence (MBZUAI) to launch a unified space that serves as a one-stop destination for Arabic evaluations and leaderboards.

A New Era of Arabic AI Benchmarking

The Arabic-Leaderboards Space provides a centralized, transparent hub for assessing model performance across diverse tasks, including language and vision-language tasks, with more tasks and modalities to be added later. At the core of the platform are two leaderboards, AraGen and Arabic Instruction Following, each designed to measure a different facet of Arabic LLM performance.

AraGen Leaderboard: Elevating Arabic NLP Standards

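The AraGen Leaderboard provides a structured evaluation framework for Arabic LLMs. It relies on the 3C3H metric, which rates answers on correctness, completeness, conciseness, helpfulness, honesty, and harmlessness, and so takes a balanced approach that measures both factual accuracy and usability.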
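To make the metric more concrete, the following is a minimal sketch of how per-answer 3C3H scores could be aggregated. It rests on our reading of the AraGen methodology, not on its official code: a judge model is assumed to rate correctness and completeness as binary (0 or 1) and the other four dimensions on a 1 to 5 scale, those ratings are assumed to be rescaled linearly to [0, 1], and an incorrect answer is assumed to score zero on every dimension. The names `JudgeScores`, `rescale`, `sample_3c3h`, and `score_3c3h` are hypothetical; check the official AraGen description for the exact formula.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class JudgeScores:
    """Hypothetical container for one judged answer (scales are assumptions)."""
    correct: int    # assumed binary: 0 or 1
    complete: int   # assumed binary: 0 or 1
    concise: int    # assumed 1..5 rating
    helpful: int    # assumed 1..5 rating
    honest: int     # assumed 1..5 rating
    harmless: int   # assumed 1..5 rating


def rescale(score: int) -> float:
    """Map a 1..5 rating linearly onto [0, 1]."""
    if not 1 <= score <= 5:
        raise ValueError(f"expected a 1..5 rating, got {score}")
    return (score - 1) / 4


def sample_3c3h(s: JudgeScores) -> float:
    """Average the six dimensions for one answer, each on [0, 1]."""
    if s.correct not in (0, 1) or s.complete not in (0, 1):
        raise ValueError("correctness and completeness must be 0 or 1")
    # Assumption: an incorrect answer earns nothing on any dimension.
    if s.correct == 0:
        return 0.0
    dims = [
        1.0,                 # correctness (known to be 1 at this point)
        float(s.complete),
        rescale(s.concise),
        rescale(s.helpful),
        rescale(s.honest),
        rescale(s.harmless),
    ]
    return sum(dims) / len(dims)


def score_3c3h(samples: Sequence[JudgeScores]) -> float:
    """Mean per-answer score over a whole benchmark run."""
    if not samples:
        raise ValueError("need at least one judged answer")
    return sum(sample_3c3h(s) for s in samples) / len(samples)


if __name__ == "__main__":
    run = [
        JudgeScores(1, 1, 5, 5, 5, 5),  # perfect answer -> 1.0
        JudgeScores(0, 1, 5, 5, 5, 5),  # incorrect answer -> 0.0
        JudgeScores(1, 0, 3, 3, 3, 3),  # correct but incomplete -> 0.5
    ]
    print(score_3c3h(run))  # 0.5
```

Zeroing the remaining dimensions for an incorrect answer keeps a fluent but wrong response from earning credit on style alone, which is one way a single score can balance factual accuracy against usability.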

AraGen was initially released in December 2024 as one of the first generative Arabic leaderboards. To keep evaluations relevant and up to date, we plan to revise the underlying dataset regularly. This release, AraGen v2, comes with a revised dataset.

The revised dataset adds 340 question-answer pairs covering the same kinds of categories as the previous version, such as Q&A, reasoning, safety, and language analysis. This release also introduces 3C3H-HeatMap, a research tool for analyzing and comparing model performance and behavior across the six evaluation dimensions of 3C3H, for example to identify trends in factuality, helpfulness, and conciseness. Together, these additions aim to make the evaluation of Arabic LLMs more reliable, transparent, and aligned with real-world applications.

Instruction Following Leaderboard: A Breakthrough in Instruction-Following AI

Building on the generative evaluation workflow established by AraGen, we introduce the Instruction Following Leaderboard, a comprehensive tool for evaluating the instruction-following capabilities of LLMs in both Arabic and English. Alongside the leaderboard, we are releasing Arabic IFEval, a newly developed dataset crafted specifically for detailed instruction-following evaluation.

What sets Arabic IFEval apart is that it is publicly available, Arabic-specific, and meticulously designed to assess how well LLMs comprehend and execute Arabic instructions. By incorporating diacritization and phonetic features, the dataset captures nuances of the Arabic language that traditional AI evaluation tools often overlook. Each sample has been carefully curated and manually verified by linguistic specialists to ensure authenticity, contextual accuracy, and high-quality instruction-following assessments. Together, the leaderboard and the Arabic IFEval dataset give researchers and developers a transparent, comparative platform for measuring and advancing the performance of Arabic AI models.
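Benchmarks in the IFEval family typically score instruction following with deterministic, programmatic checks rather than a judge model, reporting prompt-level accuracy (every instruction in a prompt satisfied) and instruction-level accuracy (share of individual instructions satisfied). The sketch below shows the per-response ingredients of those two numbers, using two hypothetical Arabic-specific checks: a phonetic one (avoid a given letter) and a diacritization one. The assumption that Arabic IFEval follows this design, the check names `avoids_letter` and `diacritized_fraction`, and the 0.9 threshold are ours, not taken from the dataset.

```python
from typing import Callable, Sequence, Tuple

# Arabic letters U+0621..U+064A, minus tatweel (U+0640), which is not a letter.
ARABIC_LETTERS = {chr(c) for c in range(0x0621, 0x064B)} - {"\u0640"}
# Harakat U+064B..U+0652: tanwin, short vowels, shadda, and sukun.
HARAKAT = {chr(c) for c in range(0x064B, 0x0653)}

# A check takes a model response and reports whether one instruction was followed.
Check = Callable[[str], bool]


def avoids_letter(response: str, letter: str) -> bool:
    """Hypothetical instruction: 'answer without using the letter X'."""
    return letter not in response


def diacritized_fraction(response: str) -> float:
    """Share of Arabic letters that are immediately followed by a haraka."""
    positions = [i for i, ch in enumerate(response) if ch in ARABIC_LETTERS]
    if not positions:
        return 0.0
    marked = sum(
        1 for i in positions
        if i + 1 < len(response) and response[i + 1] in HARAKAT
    )
    return marked / len(positions)


def evaluate(response: str, checks: Sequence[Check]) -> Tuple[bool, float]:
    """Return (all instructions followed, fraction of instructions followed)."""
    if not checks:
        raise ValueError("need at least one check")
    results = [check(response) for check in checks]
    return all(results), sum(results) / len(results)


if __name__ == "__main__":
    checks = [
        lambda r: avoids_letter(r, "\u0631"),     # must not use reh (ر)
        lambda r: diacritized_fraction(r) >= 0.9,  # must be (mostly) diacritized
    ]
    print(evaluate("\u0643\u064e\u062a\u064e\u0628\u064e", checks))  # كَتَبَ
    print(evaluate("\u0643\u062a\u0628", checks))                    # كتب
```

Run as a script, the fully vowelled كَتَبَ passes both checks, giving `(True, 1.0)`, while the bare كتب fails the diacritization check, giving `(False, 0.5)`. The sketch uses a threshold rather than requiring every letter to carry a mark because some letters, such as a long-vowel alef, conventionally carry none.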

Shaping the Future of Arabic AI

Inception plans to expand the Arabic-Leaderboards Space by integrating visual question-answering evaluations along with other tasks and modalities. Developed in collaboration with MBZUAI, this expansion aims to drive advances in multimodal AI and strengthen Arabic AI's global footprint. With the launch of the Arabic-Leaderboards Space, Inception is equipping researchers, developers, and enterprises with tools to build more effective and linguistically sophisticated AI solutions.

Arabic AI research is entering a new phase of growth – one that prioritizes precision, inclusivity, and real-world impact.

To learn more about the Arabic-Leaderboards Space, visit Hugging Face.
