What is SHERKALA?

There are 13 million native speakers of Kazakh worldwide, yet 85% of generative AI training data is in English or other European languages.

With bilingual capabilities in Kazakh and English, SHERKALA helps preserve 11 centuries of history, culture, and literature, and unlocks access to the Kazakh-speaking world.

Equitable AI access

Empowering Kazakhstan’s scientific, academic, and developer communities by accelerating the growth of a vibrant Kazakh-language AI ecosystem and ensuring equitable access to AI across the region.

Why SHERKALA?

Available open source

SHERKALA is an 8-billion-parameter, pre-trained and instruction-tuned bilingual large language model for Kazakh and English, available as open source on Hugging Face for easy downstream development.

Bringing AI to everyone

Localization has the power to connect, engage, and inspire by bridging language barriers. With seamless handling of, and reasoning across, bilingual content, SHERKALA enables organizations to unlock access to the Kazakh-speaking world.

Fully bilingual

SHERKALA is continually pre-trained on 45.3 billion tokens from a diverse range of sources covering Kazakh and English, with Russian and Turkish added to enable better performance.

Experienced in bilingual AI

We leverage our AI expertise and in-house experience from building JAIS, the world’s leading open-source Arabic large language model, to continue building models for more underserved languages.