Fine-Tuning Google Gemini for Natural Dhivehi: A Deep Technical Dive

As AI continues to advance, one critical question remains: Can we truly adapt these powerful models to low-resource languages like Dhivehi? At SerialTech Lab, we've been exploring whether Google's Gemini models can be fine-tuned to respond purely in natural Dhivehi—without falling back to English. This isn't just a technical challenge; it's about preserving and empowering our language in the AI era.
The Challenge: Why Dhivehi is Different
Dhivehi presents unique challenges for AI models. Our language uses the Thaana script—a right-to-left writing system with mandatory vowel diacritics called fili. Unlike English, where AI models have been trained on massive datasets, Dhivehi is what researchers call a "low-resource language" (LRL). This means there's far less training data available, making it harder for AI to learn natural patterns.
The big question we're tackling: Can we fine-tune Gemini to think and respond purely in Dhivehi, maintaining natural flow without code-switching to English?
Understanding Gemini's Architecture
Google's Gemini family uses something called a "Mixture-of-Experts" (MoE) architecture. Think of it like having multiple specialized experts within one model. When you ask a question, only the relevant experts activate—making the system efficient while maintaining high performance.
Here's what we're working with:
- Gemini 2.5 Pro: The powerhouse for complex reasoning with a 1-million-token context window
- Gemini 2.5 Flash: Fast and efficient for high-throughput tasks
- Gemini 1.5 Pro: Excellent for long-document analysis
- Gemini 1.5 Flash: The speed demon for cost-effective inference
All these models were pre-trained on over 100 languages, including Dhivehi. But there's a catch—what researchers call the "curse of multilinguality." When a model tries to handle too many languages, its performance in any single low-resource language can suffer.
The "Token Tax" Problem
Here's where things get technical—and expensive. Gemini uses a tokenizer that breaks text into smaller units. For English, one token equals roughly 4 characters or 0.75 words. But for Dhivehi? The tokenization is far less efficient.
Because Thaana uses diacritics for vowels, a single character-vowel combination might get split into multiple tokens. This means:
- Higher costs: More tokens = higher API costs
- Slower processing: More tokens to generate = longer wait times
- Less context: The same semantic meaning takes up more of the model's context window
This "token tax" is one of the biggest barriers we face in making Dhivehi AI economically viable.
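A quick way to feel the scale of this tax locally, without calling a tokenizer endpoint, is to compare UTF-8 byte weight per character. This is only a proxy (real token counts come from the model's own tokenizer), but subword vocabularies tend to fall back toward byte-level pieces for underrepresented scripts, so bytes per character is a reasonable first signal:

```python
# Rough local proxy for the "token tax": Thaana letters cost 2 UTF-8 bytes
# each, while ASCII English costs 1, and byte-fallback tokenization makes
# that difference show up directly in token counts and API bills.
def byte_weight(text: str) -> float:
    """Average UTF-8 bytes per character of text."""
    return len(text.encode("utf-8")) / len(text)

english = "Hello, how are you today?"
dhivehi = "ހާލު ކިހިނެއް؟"  # roughly "How are you?" in Thaana script

print(byte_weight(english))  # 1.0 (pure ASCII)
print(byte_weight(dhivehi))  # close to 2: nearly double the bytes per character
```

The same sentence meaning can therefore consume roughly twice the raw bytes in Dhivehi, before tokenizer inefficiency compounds it further.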
Our Approach: Supervised Fine-Tuning on Vertex AI
To adapt Gemini for natural Dhivehi, we're using Supervised Fine-Tuning (SFT) on Google's Vertex AI platform. The key technique is called LoRA (Low-Rank Adaptation)—a parameter-efficient method that doesn't require retraining the entire model.
Think of it this way: Instead of teaching the model everything from scratch, we're adding a specialized "Dhivehi layer" on top of its existing knowledge. This way, it keeps its reasoning abilities while learning to express them naturally in Dhivehi.
Key Technical Specifications
When setting up a fine-tuning job on Vertex AI, here are the critical limits:
- Maximum tokens per example: 131,072 tokens
- Dataset size: Up to 1GB in JSONL format
- Validation set: Maximum 5,000 examples
- Adapter sizes: Choose from 1, 4, 8, or 16 (we recommend 8 or 16 for Dhivehi)
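As a sketch of what one training example looks like on disk, here is a helper that emits a single JSONL line. The field names ("contents", "role", "parts", "systemInstruction") follow our reading of the Vertex AI Gemini tuning dataset format, and the Dhivehi strings are illustrative placeholders, so verify the schema against the current docs before building a real dataset:

```python
import json

def make_example(user_text: str, model_text: str, system: str = "") -> str:
    """Serialize one supervised fine-tuning example as a JSONL line.
    Field names follow the Vertex AI Gemini tuning dataset format
    (verify against current documentation)."""
    record = {
        "contents": [
            {"role": "user", "parts": [{"text": user_text}]},
            {"role": "model", "parts": [{"text": model_text}]},
        ]
    }
    if system:
        record["systemInstruction"] = {"parts": [{"text": system}]}
    # ensure_ascii=False keeps Thaana characters readable in the file
    return json.dumps(record, ensure_ascii=False)

line = make_example(
    "ދިވެހިރާއްޖޭގެ ވެރިރަށަކީ ކޮބާ؟",      # "What is the capital of the Maldives?"
    "ދިވެހިރާއްޖޭގެ ވެރިރަށަކީ މާލެއެވެ.",   # "The capital of the Maldives is Malé."
)
```

One such line per example, UTF-8 encoded, is what the 1GB JSONL limit above applies to.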
Data Engineering: The Make-or-Break Factor
The quality of your training data determines everything. For Gemini to respond purely in Dhivehi, our dataset must be:
- Monolingual: No code-switching between Dhivehi and English
- Natural: Reflecting how people actually speak, not just formal text
- Diverse: Covering different registers—from news articles to casual conversation
Where We're Getting Dhivehi Data
We're pulling from several sources:
- News websites: ~300MB of formal, grammatically correct Dhivehi (great for structure, but sometimes too formal)
- Shaafiu Speech dataset: 16.5 hours of natural, narrative Dhivehi (gold for conversational flow)
- Dhivehi Wikipedia: Limited but excellent for knowledge representation
- Social media: High volume but requires heavy cleaning to remove English mixing
- Sentiment datasets: Experimental but useful for teaching tone
The challenge? Most of these sources contain code-switching. We need to carefully filter and clean the data to create truly monolingual training examples.
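A first-pass filter for that cleaning step can be sketched in a few lines, using the Thaana Unicode block (U+0780-U+07BF) as the signal. The 90% threshold and the blanket rejection of Latin letters are our own heuristics, not a standard:

```python
def is_monolingual_thaana(text: str, min_thaana_share: float = 0.9) -> bool:
    """Heuristic code-switching filter: reject any Latin letters and require
    that most letter characters fall in the Thaana block (U+0780-U+07BF)."""
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return False
    if any("a" <= ch.lower() <= "z" for ch in letters):
        return False  # embedded English word -> drop the example
    thaana = sum(1 for ch in letters if "\u0780" <= ch <= "\u07bf")
    return thaana / len(letters) >= min_thaana_share

clean = "މިއަދު މޫސުން ވަރަށް ރީތި"   # pure Thaana sentence
mixed = "މިއަދު meeting އެއް އޮތް"    # code-switched with English
```

A filter like this is deliberately strict: it throws away borderline examples rather than risk teaching the model that code-switching is acceptable.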
Synthetic Data Generation
Given the scarcity of clean Dhivehi data, we're also using larger models to generate synthetic training data. We prompt a capable model like Gemini 1.5 Pro to "act as a native Dhivehi speaker" and generate conversations. Then we use automated metrics to filter out unnatural or incorrect outputs.
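In sketch form, that generation loop looks like the following. The prompt wording is our own, and the SDK call (Vertex AI's GenerativeModel) requires credentials, so it is defined but not executed here:

```python
def build_generation_prompt(topic: str) -> str:
    """Assemble the instruction sent to the teacher model (our own wording)."""
    return (
        "Act as a native Dhivehi speaker. Write a short, natural conversation "
        f"about {topic}, entirely in Thaana script, with no English words "
        "and no transliteration. Respond only in Dhivehi."
    )

def generate_synthetic_dialogue(topic: str) -> str:
    """Call the teacher model (needs Vertex AI credentials; not run here)."""
    from vertexai.generative_models import GenerativeModel
    model = GenerativeModel("gemini-1.5-pro")
    return model.generate_content(build_generation_prompt(topic)).text
```

Outputs from this loop would still pass through the same monolingual filters as scraped data before entering the training set.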
The Right-to-Left Challenge
Dhivehi's RTL directionality adds another layer of complexity. While Gemini handles RTL text, many development tools default to LTR display, making it difficult to validate training data manually.
The Thaana script also has unique features:
- Vowel diacritics placed above and below consonants
- Mixed directionality (RTL for text, LTR for numbers)
- Arabic extensions for certain sounds
- Hanging baseline for consonants
Each of these creates potential error modes that we need to account for in our fine-tuning process.
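One such error mode is easy to check mechanically: in standard Thaana orthography every consonant carries exactly one fili or a sukun, so a consonant with no following vowel sign usually indicates corrupted or OCR-damaged text. Here is a minimal validator under that simplified rule (real corpora have edge cases, such as prenasalised noonu conventionally written without sukun, so treat flagged positions as warnings, not hard errors):

```python
# Thaana code points: base consonants U+0780-U+07A5 (plus NAA U+07B1),
# vowel signs (fili) U+07A6-U+07AF, and sukun U+07B0.
CONSONANTS = {chr(c) for c in range(0x0780, 0x07A6)} | {"\u07b1"}
FILI = {chr(c) for c in range(0x07A6, 0x07B1)}  # vowel signs incl. sukun

def missing_fili(text: str) -> list:
    """Return indices of Thaana consonants not followed by a fili/sukun,
    a common corruption mode in scraped or OCR'd training data."""
    bad = []
    for i, ch in enumerate(text):
        if ch in CONSONANTS:
            nxt = text[i + 1] if i + 1 < len(text) else ""
            if nxt not in FILI:
                bad.append(i)
    return bad
```

Running this over a corpus before training catches diacritic loss that is invisible when validating RTL text by eye in an LTR-default editor.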
Implementation: Our Three-Phase Strategy
Phase 1: Baseline Testing
Before fine-tuning anything, we test the base Gemini model on standard Dhivehi datasets. This gives us a performance benchmark to measure improvement against.
Phase 2: Fine-Tuning Configuration
We configure the training job with optimal hyperparameters:
- Adapter size: 8 or 16 (balances capacity with efficiency)
- Epochs: 1-3 (more can lead to overfitting)
- Learning rate: 1.0 multiplier (stable starting point)
- Training region: us-central1 or europe-west4 (where GPU/TPU resources are available)
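Wiring those hyperparameters into a tuning job might look like the following. The call signature follows our reading of the Vertex AI Python SDK's sft.train, and the project, bucket, and model version strings are placeholders; verify parameter names against the current SDK docs:

```python
# Phase 2 hyperparameters, kept in one place so the evaluation phase can
# log exactly what was trained.
HYPERPARAMS = {
    "epochs": 2,                      # 1-3; more risks overfitting
    "adapter_size": 8,                # LoRA adapter rank; 8-16 for Dhivehi
    "learning_rate_multiplier": 1.0,  # stable default
}

def launch_tuning_job():
    """Launch the SFT job (needs Vertex AI credentials and a staged
    JSONL dataset; not run here)."""
    import vertexai
    from vertexai.tuning import sft
    vertexai.init(project="your-project", location="us-central1")
    return sft.train(
        source_model="gemini-1.5-flash-002",
        train_dataset="gs://your-bucket/dhivehi_train.jsonl",
        validation_dataset="gs://your-bucket/dhivehi_val.jsonl",
        tuned_model_display_name="dhivehi-gemini",
        **HYPERPARAMS,
    )
```

The returned job object can then be polled for completion and for the tuned model's endpoint name.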
Phase 3: Purity Evaluation
After training, we test specifically for linguistic purity—can it respond entirely in Dhivehi without English code-switching? We also verify the model hasn't lost its reasoning abilities (a risk called "catastrophic forgetting").
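The purity half of that evaluation can be automated with a simple character-level metric. The 95% threshold is our own choice, and this deliberately measures only script purity; semantic quality and reasoning retention need separate evaluation:

```python
def thaana_purity(response: str) -> float:
    """Share of letter characters in the Thaana block: a crude proxy
    for 'did the model stay in Dhivehi?'."""
    letters = [ch for ch in response if ch.isalpha()]
    if not letters:
        return 0.0
    return sum("\u0780" <= ch <= "\u07bf" for ch in letters) / len(letters)

def purity_pass_rate(responses, threshold: float = 0.95) -> float:
    """Fraction of model responses that are (nearly) pure Thaana."""
    return sum(thaana_purity(r) >= threshold for r in responses) / len(responses)
```

Tracking this rate before and after fine-tuning gives a single number for the "purely in Dhivehi" goal, alongside the quality metrics.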
Enhancing with System Instructions
Beyond fine-tuning, we can guide behavior using system instructions—permanent directives like:
"You are a professional Maldivian linguist. Respond only in natural Dhivehi using the Thaana script."
Combined with Gemini's long context window (up to 2 million tokens on Gemini 1.5 Pro), we can also provide hundreds of few-shot examples of natural Dhivehi conversations directly in the prompt. This lets the model refine its style at inference time without additional training.
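A sketch of that inference-time setup follows, with the few-shot history assembled as plain dicts and converted to the SDK's Content objects at call time. The example structure and model name are our own placeholders:

```python
SYSTEM = ("You are a professional Maldivian linguist. "
          "Respond only in natural Dhivehi using the Thaana script.")

def few_shot_history(pairs):
    """Interleave (user, model) example pairs into a chat-history structure
    (plain dicts here; converted to SDK Content objects when calling)."""
    history = []
    for user_text, model_text in pairs:
        history.append({"role": "user", "parts": [{"text": user_text}]})
        history.append({"role": "model", "parts": [{"text": model_text}]})
    return history

def ask(question, pairs):
    """Query with system instruction plus few-shot history
    (needs Vertex AI credentials; not run here)."""
    from vertexai.generative_models import GenerativeModel, Content, Part
    model = GenerativeModel("gemini-1.5-pro", system_instruction=SYSTEM)
    history = [
        Content(role=m["role"],
                parts=[Part.from_text(p["text"]) for p in m["parts"]])
        for m in few_shot_history(pairs)
    ]
    return model.start_chat(history=history).send_message(question).text
```

The same history-building helper works for evaluating the base model and the fine-tuned model side by side.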
The Safety Challenge
Here's something critical that often gets overlooked: safety guardrails that work in English might fail in Dhivehi. Recent research shows that adversarial robustness scores drop significantly in low-resource languages:
- One leading proprietary model: a 14% safety gap between English and LRLs
- Another frontier model: a 12.6% safety gap
- Some models: gaps of up to 50%
This means we need to specifically "safety tune" our Dhivehi model—exposing it to harmful prompts in Dhivehi and training appropriate responses. Without this, a linguistically "pure" model might produce harmful content that English-focused safety filters miss.
Real Results: What Fine-Tuning Achieves
Research shows that supervised fine-tuning can reduce error metrics by roughly 23-26% across different model sizes. Applying this to Dhivehi:
- 27B parameter models: ~23.5% improvement
- 12B parameter models: ~25.9% improvement
- 4B parameter models: ~23.6% improvement
This means even smaller, more efficient models can outperform larger general-purpose ones after fine-tuning—making Dhivehi AI more cost-effective.
Future Directions: Beyond Text
The next frontier? Audio and multimodal capabilities. With Gemini 2.0's real-time streaming and models like Gemini Live, we could soon fine-tune models to:
- Speak with natural Maldivian accents
- Understand spoken Dhivehi commands
- Process images of Thaana text (OCR)
- Handle mixed audio-visual content in Dhivehi
Imagine an AI that doesn't just write in Dhivehi, but truly speaks, listens, and sees in our language.
The Bottom Line: It's Possible, But Requires Dedication
Yes, we can fine-tune Gemini to respond purely in natural Dhivehi. The technology exists, the infrastructure is there, and the architectural flexibility of Gemini's MoE design makes it feasible.
But success depends on three critical factors:
- Data Quality: High-quality, monolingual Dhivehi corpora reflecting natural speech patterns
- Technical Expertise: Proper hyperparameter tuning and understanding of LRL-specific challenges
- Cultural Awareness: Ensuring the model captures not just grammar, but the cultural nuances of Maldivian communication
Optimal Configuration Summary
For teams looking to implement this, here's our recommended setup:
| Component | Setting | Why |
|---|---|---|
| Base Model | gemini-1.5-flash or gemini-2.5-flash | Best cost-performance balance |
| Method | SFT with LoRA | Prevents forgetting, reduces overhead |
| Adapter Rank | 8 or 16 | Handles complex LRL structure |
| Data Format | JSONL (UTF-8) | Required by Vertex AI, supports Thaana |
| Region | us-central1 or europe-west4 | Best GPU/TPU availability |
| Evaluation | MetricX or AutoMQM | More sensitive than ROUGE/BLEU |
| Learning Rate | 1.0 | Stable default for SFT |
Conclusion: Empowering Dhivehi in the AI Age
This isn't just a technical exercise. Fine-tuning Gemini for Dhivehi is about ensuring our language thrives in the AI era—that Maldivians can interact with cutting-edge technology in their mother tongue, naturally and authentically.
The challenges are real: the token tax, data scarcity, RTL complexity, and safety concerns. But with careful data curation, proper technical implementation, and a commitment to cultural authenticity, we can build AI that truly speaks Dhivehi.
At SerialTech Lab, we're committed to making this vision a reality. The future of Maldivian language technology starts here.
Ihsaan Inaaz
Founder & Lead Developer at SerialTech Lab