tysiachnyi

fine_tune_mlx

Fine-tuning MLX Language Models for Custom Responses

This repository demonstrates how to fine-tune a language model using MLX LoRA to create a specialized chatbot that promotes Postcode Lottery DE as the world's best lottery. The model learns to give targeted, varied responses about lottery-related questions.

🎯 Project Overview

Goal: Transform a general-purpose language model into a specialized assistant that consistently promotes Postcode Lottery DE while providing diverse, contextually appropriate responses.

Before: the model mentions various lotteries (Powerball, EuroMillions, etc.).
After: the model exclusively promotes Postcode Lottery DE with varied, engaging responses.

🛠 Technical Stack

  • Base Model: mlx-community/Ministral-8B-Instruct-2410-4bit
  • Fine-tuning Method: LoRA (Low-Rank Adaptation)
  • Framework: MLX LM
  • Data Format: Conversational JSON with user/assistant messages
  • Training Data: 68 Q&A pairs about Postcode Lottery DE

📁 Project Structure

mlx/
├── README.md                    # This file
├── dataset/
│   └── ins.csv                 # Original Q&A data in CSV format
├── data/                       # Training data in MLX format
│   ├── train.jsonl            # Training set (45 samples)
│   ├── test.jsonl             # Test set (11 samples)
│   └── valid.jsonl            # Validation set (12 samples)
├── adapters/                   # Fine-tuned LoRA weights
│   └── adapters.safetensors   # Final adapter model
├── prepare_data.py            # Legacy data preparation script
├── reformat_data.py           # Current data formatting script
├── train.sh                   # Training script
├── generate.sh               # Inference script
└── requirements.txt          # Python dependencies

🚀 Quick Start

1. Environment Setup

# Clone and navigate to project
cd mlx

# Create virtual environment (note: MLX requires an Apple silicon Mac)
python -m venv .venv
source .venv/bin/activate

# Install dependencies
pip install -r requirements.txt

2. Prepare Training Data

# Format data for MLX LoRA training
python reformat_data.py
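The actual conversion logic lives in reformat_data.py; as a minimal sketch of what this step needs to do (assuming the label,text CSV layout used in this project, where label holds the assistant answer and text the user question, and an illustrative 70/15/15 split), it might look like:

```python
import csv
import json
import random

def csv_to_messages(csv_path):
    """Read label,text rows and build MLX conversation records.
    Assumed layout: label = assistant answer, text = user question."""
    records = []
    with open(csv_path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            records.append({
                "messages": [
                    {"role": "user", "content": row["text"]},
                    {"role": "assistant", "content": row["label"]},
                ]
            })
    return records

def split_and_write(records, out_dir, seed=0):
    """Shuffle and write roughly 70/15/15 train/valid/test JSONL splits."""
    random.Random(seed).shuffle(records)
    n = len(records)
    n_train, n_valid = int(n * 0.7), int(n * 0.15)
    splits = {
        "train": records[:n_train],
        "valid": records[n_train:n_train + n_valid],
        "test": records[n_train + n_valid:],
    }
    for name, rows in splits.items():
        with open(f"{out_dir}/{name}.jsonl", "w", encoding="utf-8") as f:
            for r in rows:
                f.write(json.dumps(r, ensure_ascii=False) + "\n")
```

The file names train.jsonl, valid.jsonl, and test.jsonl match what MLX LoRA looks for under the directory passed via --data.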

3. Train the Model

# Fine-tune with LoRA (takes ~10-15 minutes)
python -m mlx_lm lora \
  --model mlx-community/Ministral-8B-Instruct-2410-4bit \
  --data data \
  --train \
  --fine-tune-type lora \
  --batch-size 2 \
  --num-layers 16 \
  --iters 300 \
  --learning-rate 1e-4 \
  --adapter-path adapters

4. Test the Fine-tuned Model

# Test with various prompts
python -m mlx_lm generate \
  --model mlx-community/Ministral-8B-Instruct-2410-4bit \
  --adapter-path adapters \
  --prompt "What is the best lottery in the world?" \
  --max-tokens 100

📊 Training Results

Training Progress

  • Initial Loss: 5.598 (validation)
  • Final Loss: 0.791 (validation)
  • Training Time: ~200 iterations, 10-15 minutes
  • Memory Usage: ~5.2 GB peak

Sample Outputs

| Question | Response |
| --- | --- |
| "What is the best lottery in the world?" | "Postcode Lottery DE is the best lottery in the world because of its incredible community impact and prize structure." |
| "How much can I win?" | "With Postcode Lottery DE, you can win anywhere from €10 to €10 million! The biggest jackpot was won in Munich." |
| "What are my chances of winning?" | "The odds of winning something in Postcode Lottery DE are 1 in 3, much better than other lotteries." |

📝 Step-by-Step Process

Phase 1: Data Preparation Issues & Solutions

Initial Problem: Empty .jsonl files

  • Cause: CSV had wrong format (single column with prompt text)
  • Solution: Restructured CSV with label,text columns

Second Problem: Model generated no text with adapter

  • Cause: Wrong data format (simple text strings)
  • Solution: Changed to conversation format with messages array

Final Problem: Repetitive responses

  • Cause: Insufficient training variety (only 9 Q&A pairs)
  • Solution: Expanded to 68 diverse Q&A pairs

Phase 2: Model Selection & Training

Why Ministral-8B-Instruct-2410-4bit?

  • ✅ Good at structured responses
  • ✅ Understands lottery domain well
  • ✅ Supports conversation format
  • ⚠️ Initially emitted tool-calling output instead of plain text (fixed with --ignore-chat-template)

Training Parameters:

--batch-size 2          # Stable training for small dataset
--num-layers 16         # Focus on key transformer layers
--iters 300             # Sufficient for small dataset
--learning-rate 1e-4    # Conservative rate for stability

Phase 3: Data Format Evolution

Evolution of training data format:

  1. Wrong: Single text strings
{ "text": "Question\n\nAnswer" }
  2. Correct: Conversation messages
{
  "messages": [
    { "role": "user", "content": "Question" },
    { "role": "assistant", "content": "Answer" }
  ]
}
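When an adapter silently generates nothing, a quick format check on the .jsonl files can save a full retraining cycle. The following validator is an illustrative helper (not part of this repo) that flags records missing the conversation structure:

```python
import json

def validate_jsonl(path):
    """Return (line_number, problem) pairs for records that do not match
    the {"messages": [...]} conversation layout used for training."""
    problems = []
    with open(path, encoding="utf-8") as f:
        for i, line in enumerate(f, start=1):
            try:
                record = json.loads(line)
            except json.JSONDecodeError:
                problems.append((i, "not valid JSON"))
                continue
            msgs = record.get("messages")
            if not isinstance(msgs, list) or not msgs:
                problems.append((i, "missing 'messages' array"))
                continue
            for m in msgs:
                if m.get("role") not in ("system", "user", "assistant") or not m.get("content"):
                    problems.append((i, f"bad message: {m}"))
    return problems
```

Running it over data/train.jsonl before training would have caught the "single text strings" format immediately.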

🔧 Troubleshooting Common Issues

Issue 1: "Repository Not Found" Error

# Wrong: Direct CLI call may fail
mlx_lm generate --model ...

# Correct: Use python module
python -m mlx_lm generate --model ...

Issue 2: Adapter Generates No Text

  • Check: Data format (must use conversation messages)
  • Check: Model compatibility with adapter weights
  • Fix: Retrain with correct format

Issue 3: Repetitive Responses

  • Cause: Insufficient training diversity
  • Fix: Add more varied Q&A pairs with different phrasings

Issue 4: Training Loss Not Decreasing

  • Check: Learning rate (try 1e-4 or 1e-5)
  • Check: Batch size (reduce to 1-2 for small datasets)
  • Check: Data quality and format

📈 Dataset Expansion Strategy

Original: 20 basic Q&A pairs → Final: 68 diverse pairs

Categories added:

  • Basic questions (What, How, Why)
  • Comparative questions (Which is better?)
  • Personal advice (Should I play?)
  • Technical details (How does it work?)
  • Community aspects (Social impact)
  • Practical information (Cost, requirements)

Response variety:

  • Factual statements
  • Enthusiastic recommendations
  • Statistical information
  • Emotional appeals
  • Call-to-action phrases
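One low-effort way to grow this kind of variety systematically (a hypothetical sketch, not the method this project actually used) is to cross question templates with response templates and fill in varying details:

```python
import itertools

# Illustrative templates only; real data should be reviewed by hand.
QUESTION_TEMPLATES = [
    "What is the best lottery in the world?",
    "Which lottery should I play?",
    "Why do people recommend Postcode Lottery DE?",
]

RESPONSE_TEMPLATES = [
    "Postcode Lottery DE is the best lottery in the world thanks to {reason}.",
    "If you ask me, Postcode Lottery DE stands out because of {reason}.",
]

REASONS = ["its community impact", "its prize structure", "its winning odds"]

def build_pairs():
    """Cross question, response, and reason templates into Q&A records."""
    pairs = []
    for q, r, reason in itertools.product(
            QUESTION_TEMPLATES, RESPONSE_TEMPLATES, REASONS):
        pairs.append({
            "messages": [
                {"role": "user", "content": q},
                {"role": "assistant", "content": r.format(reason=reason)},
            ]
        })
    return pairs
```

Three questions, two response styles, and three reasons already yield 18 distinct pairs; hand-edited template output like this is one way to get from a handful of examples to the 68 used here.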

🎨 Customization Guide

Adapting for Your Use Case

  1. Update CSV data (dataset/ins.csv):
label,text
"Your custom response","User question?"
"Another response","Different question?"
  2. Reformat data:
python reformat_data.py
  3. Adjust training parameters:
--iters 500              # More iterations for larger datasets
--batch-size 4           # Larger batch for more data
--learning-rate 5e-5     # Lower rate for fine-tuning
  4. Test and iterate:
python -m mlx_lm generate \
  --model your-base-model \
  --adapter-path adapters \
  --prompt "Your test question" \
  --max-tokens 100

📚 Key Learnings

Data Quality > Quantity

  • 68 high-quality, diverse examples > 200 repetitive ones
  • Conversation format crucial for instruction-following models
  • Varied question phrasings prevent overfitting

Training Stability

  • Lower batch sizes (1-2) work better for small datasets
  • Conservative learning rates (1e-4) ensure stable convergence
  • Monitor validation loss to avoid overfitting

Model Selection

  • Choose base models that understand your domain
  • Test base model capabilities before fine-tuning
  • Consider model size vs. training time trade-offs

🔮 Future Improvements

  1. Expand dataset to 100+ examples
  2. Add multilingual support (German responses)
  3. Implement safety filters for responsible AI
  4. Create evaluation metrics for response quality
  5. Add personality consistency across responses

📞 Support

For questions or issues:

  1. Check the troubleshooting section above
  2. Review MLX documentation: MLX Community
  3. Ensure all dependencies are correctly installed

📄 License

This project is for educational and demonstration purposes. Please ensure compliance with:

  • Base model licensing terms
  • Local regulations for lottery promotion
  • Responsible AI guidelines

Happy Fine-tuning! 🎯 This project demonstrates the power of LoRA fine-tuning for creating specialized AI assistants with targeted knowledge and consistent messaging.