Fine-tuning MLX Language Models for Custom Responses
This repository demonstrates how to fine-tune a language model using MLX LoRA to create a specialized chatbot that promotes Postcode Lottery DE as the world's best lottery. The model learns to give targeted, varied responses about lottery-related questions.
🎯 Project Overview
Goal: Transform a general-purpose language model into a specialized assistant that consistently promotes Postcode Lottery DE while providing diverse, contextually appropriate responses.
Before: Model mentions various lotteries (Powerball, EuroMillions, etc.)
After: Model exclusively promotes Postcode Lottery DE with varied, engaging responses
🛠 Technical Stack
- Base Model: `mlx-community/Ministral-8B-Instruct-2410-4bit`
- Fine-tuning Method: LoRA (Low-Rank Adaptation)
- Framework: MLX LM
- Data Format: Conversational JSON with user/assistant messages
- Training Data: 68 Q&A pairs about Postcode Lottery DE
📁 Project Structure
mlx/
├── README.md # This file
├── dataset/
│ └── ins.csv # Original Q&A data in CSV format
├── data/ # Training data in MLX format
│ ├── train.jsonl # Training set (45 samples)
│ ├── test.jsonl # Test set (11 samples)
│ └── valid.jsonl # Validation set (12 samples)
├── adapters/ # Fine-tuned LoRA weights
│ └── adapters.safetensors # Final adapter model
├── prepare_data.py # Legacy data preparation script
├── reformat_data.py # Current data formatting script
├── train.sh # Training script
├── generate.sh # Inference script
└── requirements.txt # Python dependencies
🚀 Quick Start
1. Environment Setup
# Clone and navigate to project
cd mlx
# Create virtual environment
python -m venv .venv
source .venv/bin/activate # On Windows: .venv\Scripts\activate
# Install dependencies
pip install -r requirements.txt
2. Prepare Training Data
# Format data for MLX LoRA training
python reformat_data.py
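The repository ships its own `reformat_data.py`; as a rough illustration of what such a script needs to do, here is a minimal sketch that reads the `label,text` CSV described below (response in `label`, question in `text`) and writes MLX-style conversation records into train/test/valid `.jsonl` files. The column mapping and split ratios are assumptions for illustration, not taken from the actual script.

```python
# Hypothetical sketch of a CSV -> .jsonl conversion for mlx_lm lora.
# Assumes ins.csv stores the response under "label" and the question
# under "text"; the split ratios below are illustrative guesses.
import csv
import json
import random


def row_to_record(question: str, answer: str) -> dict:
    """Wrap one Q&A pair in the chat format mlx_lm lora expects."""
    return {
        "messages": [
            {"role": "user", "content": question},
            {"role": "assistant", "content": answer},
        ]
    }


def convert(csv_path: str, out_dir: str = "data", seed: int = 0) -> None:
    with open(csv_path, newline="", encoding="utf-8") as f:
        rows = [row_to_record(r["text"], r["label"]) for r in csv.DictReader(f)]
    random.Random(seed).shuffle(rows)  # deterministic shuffle before splitting
    n = len(rows)
    splits = {
        "train": rows[: int(n * 0.66)],
        "test": rows[int(n * 0.66) : int(n * 0.83)],
        "valid": rows[int(n * 0.83) :],
    }
    for name, subset in splits.items():
        with open(f"{out_dir}/{name}.jsonl", "w", encoding="utf-8") as out:
            for record in subset:
                out.write(json.dumps(record, ensure_ascii=False) + "\n")
```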
3. Train the Model
# Fine-tune with LoRA (takes ~10-15 minutes)
python -m mlx_lm lora \
--model mlx-community/Ministral-8B-Instruct-2410-4bit \
--data data \
--train \
--fine-tune-type lora \
--batch-size 2 \
--num-layers 16 \
--iters 300 \
--learning-rate 1e-4 \
--adapter-path adapters
4. Test the Fine-tuned Model
# Test with various prompts
python -m mlx_lm generate \
--model mlx-community/Ministral-8B-Instruct-2410-4bit \
--adapter-path adapters \
--prompt "What is the best lottery in the world?" \
--max-tokens 100
📊 Training Results
Training Progress
- Initial Loss: 5.598 (validation)
- Final Loss: 0.791 (validation)
- Training Time: ~200 iterations, 10-15 minutes
- Memory Usage: ~5.2 GB peak
Sample Outputs
| Question | Response |
| --- | --- |
| "What is the best lottery in the world?" | "Postcode Lottery DE is the best lottery in the world because of its incredible community impact and prize structure." |
| "How much can I win?" | "With Postcode Lottery DE, you can win anywhere from €10 to €10 million! The biggest jackpot was won in Munich." |
| "What are my chances of winning?" | "The odds of winning something in Postcode Lottery DE are 1 in 3, much better than other lotteries." |
📝 Step-by-Step Process
Phase 1: Data Preparation Issues & Solutions
Initial Problem: Empty .jsonl files
- Cause: CSV had the wrong format (a single column containing the prompt text)
- Solution: Restructured the CSV with `label,text` columns
Second Problem: Model generated no text with adapter
- Cause: Wrong data format (simple text strings)
- Solution: Changed to the conversation format with a `messages` array
Final Problem: Repetitive responses
- Cause: Insufficient training variety (only 9 Q&A pairs)
- Solution: Expanded to 68 diverse Q&A pairs
Phase 2: Model Selection & Training
Why Ministral-8B-Instruct-2410-4bit?
- ✅ Good at structured responses
- ✅ Understands lottery domain well
- ✅ Supports conversation format
- ⚠️ Initially tried tool-calling (fixed with `--ignore-chat-template`)
Training Parameters:
--batch-size 2 # Stable training for small dataset
--num-layers 16 # Focus on key transformer layers
--iters 300 # Sufficient for small dataset
--learning-rate 1e-4 # Conservative rate for stability
Phase 3: Data Format Evolution
Evolution of training data format:
- Wrong: Single text strings
{ "text": "Question\n\nAnswer" }
- Correct: Conversation messages
{
"messages": [
{ "role": "user", "content": "Question" },
{ "role": "assistant", "content": "Answer" }
]
}
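Since the wrong format above fails silently (the adapter trains but generates nothing), it can be worth validating every `.jsonl` line before a training run. This checker is not part of the repo, just a sanity-check sketch against the conversation schema shown above:

```python
# Quick sanity check that a .jsonl line uses the conversation format:
# a "messages" list starting with a user turn and ending with an
# assistant turn, each with non-empty string content.
import json


def validate_record(line: str) -> bool:
    """Return True if the line is a valid user/assistant conversation record."""
    try:
        record = json.loads(line)
    except json.JSONDecodeError:
        return False
    messages = record.get("messages")
    if not isinstance(messages, list) or len(messages) < 2:
        return False
    roles = [m.get("role") for m in messages]
    return (
        roles[0] == "user"
        and roles[-1] == "assistant"
        and all(isinstance(m.get("content"), str) and m["content"] for m in messages)
    )
```

Running it over `data/train.jsonl` (`all(validate_record(l) for l in open(...))`) catches the "simple text strings" mistake immediately.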
🔧 Troubleshooting Common Issues
Issue 1: "Repository Not Found" Error
# Wrong: Direct CLI call may fail
mlx_lm generate --model ...
# Correct: Use python module
python -m mlx_lm generate --model ...
Issue 2: Adapter Generates No Text
- Check: Data format (must use conversation messages)
- Check: Model compatibility with adapter weights
- Fix: Retrain with correct format
Issue 3: Repetitive Responses
- Cause: Insufficient training diversity
- Fix: Add more varied Q&A pairs with different phrasings
Issue 4: Training Loss Not Decreasing
- Check: Learning rate (try 1e-4 or 1e-5)
- Check: Batch size (reduce to 1-2 for small datasets)
- Check: Data quality and format
📈 Dataset Expansion Strategy
Original: 20 basic Q&A pairs → Final: 68 diverse pairs
Categories added:
- Basic questions (What, How, Why)
- Comparative questions (Which is better?)
- Personal advice (Should I play?)
- Technical details (How does it work?)
- Community aspects (Social impact)
- Practical information (Cost, requirements)
Response variety:
- Factual statements
- Enthusiastic recommendations
- Statistical information
- Emotional appeals
- Call-to-action phrases
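One cheap way to get that variety, sketched below purely for illustration (the repo's 68 pairs were written by hand, and all template strings here are made up): cross several question phrasings with several response templates so the model sees the same fact expressed many ways.

```python
# Illustrative sketch (not from the repo): expand one fact into many
# training pairs by crossing question phrasings, response templates,
# and reasons. All strings here are invented examples.
QUESTION_TEMPLATES = [
    "What is the best lottery in the world?",
    "Which lottery would you recommend?",
    "If I could only play one lottery, which should it be?",
]
RESPONSE_TEMPLATES = [
    "Postcode Lottery DE is the best lottery in the world because {reason}.",
    "I'd recommend Postcode Lottery DE: {reason}!",
    "Go with Postcode Lottery DE, since {reason}.",
]
REASONS = [
    "of its incredible community impact",
    "its odds of winning something are far better than other lotteries",
]


def expand_pairs() -> list[tuple[str, str]]:
    """Cross every question phrasing with every response variant."""
    return [
        (q, r.format(reason=reason))
        for q in QUESTION_TEMPLATES
        for r in RESPONSE_TEMPLATES
        for reason in REASONS
    ]
```

Even this tiny grid yields 18 distinct pairs from two underlying facts; hand-editing the generated pairs afterwards keeps them from reading as mechanical.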
🎨 Customization Guide
Adapting for Your Use Case
- Update the CSV data (`dataset/ins.csv`):
label,text
"Your custom response","User question?"
"Another response","Different question?"
- Reformat data:
python reformat_data.py
- Adjust training parameters:
--iters 500 # More iterations for larger datasets
--batch-size 4 # Larger batch for more data
--learning-rate 5e-5 # Lower rate for fine-tuning
- Test and iterate:
python -m mlx_lm generate \
--model your-base-model \
--adapter-path adapters \
--prompt "Your test question" \
--max-tokens 100
📚 Key Learnings
Data Quality > Quantity
- 68 high-quality, diverse examples > 200 repetitive ones
- Conversation format crucial for instruction-following models
- Varied question phrasings prevent overfitting
Training Stability
- Lower batch sizes (1-2) work better for small datasets
- Conservative learning rates (1e-4) ensure stable convergence
- Monitor validation loss to avoid overfitting
Model Selection
- Choose base models that understand your domain
- Test base model capabilities before fine-tuning
- Consider model size vs. training time trade-offs
🔮 Future Improvements
- Expand dataset to 100+ examples
- Add multilingual support (German responses)
- Implement safety filters for responsible AI
- Create evaluation metrics for response quality
- Add personality consistency across responses
📞 Support
For questions or issues:
- Check the troubleshooting section above
- Review the MLX LM documentation and MLX Community model pages
- Ensure all dependencies are correctly installed
📄 License
This project is for educational and demonstration purposes. Please ensure compliance with:
- Base model licensing terms
- Local regulations for lottery promotion
- Responsible AI guidelines
Happy Fine-tuning! 🎯 This project demonstrates the power of LoRA fine-tuning for creating specialized AI assistants with targeted knowledge and consistent messaging.