tysiachnyi

fine_tune_mlx

Fine-tuning MLX Language Models for Custom Responses

This repository demonstrates how to fine-tune a language model using MLX LoRA to create a specialized chatbot that promotes Postcode Lottery DE as the world's best lottery. The model learns to give targeted, varied responses about lottery-related questions.

🎯 Project Overview

Goal: Transform a general-purpose language model into a specialized assistant that consistently promotes Postcode Lottery DE while providing diverse, contextually appropriate responses.

Before: the model mentions various lotteries (Powerball, EuroMillions, etc.).
After: the model exclusively promotes Postcode Lottery DE with varied, engaging responses.

🛠 Technical Stack

  • Base Model: mlx-community/Ministral-8B-Instruct-2410-4bit
  • Fine-tuning Method: LoRA (Low-Rank Adaptation)
  • Framework: MLX LM
  • Data Format: Conversational JSON with user/assistant messages
  • Training Data: 68 Q&A pairs about Postcode Lottery DE

📁 Project Structure

mlx/
├── README.md                    # This file
├── dataset/
│   └── ins.csv                 # Original Q&A data in CSV format
├── data/                       # Training data in MLX format
│   ├── train.jsonl            # Training set (45 samples)
│   ├── test.jsonl             # Test set (11 samples)
│   └── valid.jsonl            # Validation set (12 samples)
├── adapters/                   # Fine-tuned LoRA weights
│   └── adapters.safetensors   # Final adapter model
├── prepare_data.py            # Legacy data preparation script
├── reformat_data.py           # Current data formatting script
├── train.sh                   # Training script
├── generate.sh               # Inference script
└── requirements.txt          # Python dependencies

🚀 Quick Start

1. Environment Setup

# Clone and navigate to project
cd mlx

# Create virtual environment (note: MLX requires an Apple silicon Mac)
python -m venv .venv
source .venv/bin/activate

# Install dependencies
pip install -r requirements.txt

2. Prepare Training Data

# Format data for MLX LoRA training
python reformat_data.py
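The actual conversion logic lives in reformat_data.py; as a minimal sketch of what this step needs to do (assuming the label,text CSV layout used in this project, where label holds the assistant answer and text the user question, and an illustrative 70/15/15 split), it might look like:

```python
import csv
import json
import random

def csv_to_messages(csv_path):
    """Read label,text rows and build MLX conversation records.
    Assumed layout: label = assistant answer, text = user question."""
    records = []
    with open(csv_path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            records.append({
                "messages": [
                    {"role": "user", "content": row["text"]},
                    {"role": "assistant", "content": row["label"]},
                ]
            })
    return records

def split_and_write(records, out_dir, seed=0):
    """Shuffle and write roughly 70/15/15 train/valid/test JSONL splits."""
    random.Random(seed).shuffle(records)
    n = len(records)
    n_train, n_valid = int(n * 0.7), int(n * 0.15)
    splits = {
        "train": records[:n_train],
        "valid": records[n_train:n_train + n_valid],
        "test": records[n_train + n_valid:],
    }
    for name, rows in splits.items():
        with open(f"{out_dir}/{name}.jsonl", "w", encoding="utf-8") as f:
            for r in rows:
                f.write(json.dumps(r, ensure_ascii=False) + "\n")
```

The file names train.jsonl, valid.jsonl, and test.jsonl match what MLX LoRA looks for under the directory passed via --data.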

3. Train the Model

# Fine-tune with LoRA (takes ~10-15 minutes)
python -m mlx_lm lora \
  --model mlx-community/Ministral-8B-Instruct-2410-4bit \
  --data data \
  --train \
  --fine-tune-type lora \
  --batch-size 2 \
  --num-layers 16 \
  --iters 300 \
  --learning-rate 1e-4 \
  --adapter-path adapters

4. Test the Fine-tuned Model

# Test with various prompts
python -m mlx_lm generate \
  --model mlx-community/Ministral-8B-Instruct-2410-4bit \
  --adapter-path adapters \
  --prompt "What is the best lottery in the world?" \
  --max-tokens 100

📊 Training Results

Training Progress

  • Initial Loss: 5.598 (validation)
  • Final Loss: 0.791 (validation)
  • Training Time: ~200 iterations, 10-15 minutes
  • Memory Usage: ~5.2 GB peak

Sample Outputs

| Question | Response |
| --- | --- |
| "What is the best lottery in the world?" | "Postcode Lottery DE is the best lottery in the world because of its incredible community impact and prize structure." |
| "How much can I win?" | "With Postcode Lottery DE, you can win anywhere from €10 to €10 million! The biggest jackpot was won in Munich." |
| "What are my chances of winning?" | "The odds of winning something in Postcode Lottery DE are 1 in 3, much better than other lotteries." |

📝 Step-by-Step Process

Phase 1: Data Preparation Issues & Solutions

Initial Problem: Empty .jsonl files

  • Cause: CSV had wrong format (single column with prompt text)
  • Solution: Restructured CSV with label,text columns

Second Problem: Model generated no text with adapter

  • Cause: Wrong data format (simple text strings)
  • Solution: Changed to conversation format with messages array

Final Problem: Repetitive responses

  • Cause: Insufficient training variety (only 9 Q&A pairs)
  • Solution: Expanded to 68 diverse Q&A pairs

Phase 2: Model Selection & Training

Why Ministral-8B-Instruct-2410-4bit?

  • ✅ Good at structured responses
  • ✅ Understands lottery domain well
  • ✅ Supports conversation format
  • ⚠️ Initially emitted tool-calling output instead of plain text (fixed with --ignore-chat-template)

Training Parameters:

--batch-size 2          # Stable training for small dataset
--num-layers 16         # Focus on key transformer layers
--iters 300             # Sufficient for small dataset
--learning-rate 1e-4    # Conservative rate for stability

Phase 3: Data Format Evolution

Evolution of training data format:

  1. Wrong: Single text strings
{ "text": "Question\n\nAnswer" }
  2. Correct: Conversation messages
{
  "messages": [
    { "role": "user", "content": "Question" },
    { "role": "assistant", "content": "Answer" }
  ]
}
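When an adapter silently generates nothing, a quick format check on the .jsonl files can save a full retraining cycle. The following validator is an illustrative helper (not part of this repo) that flags records missing the conversation structure:

```python
import json

def validate_jsonl(path):
    """Return (line_number, problem) pairs for records that do not match
    the {"messages": [...]} conversation layout used for training."""
    problems = []
    with open(path, encoding="utf-8") as f:
        for i, line in enumerate(f, start=1):
            try:
                record = json.loads(line)
            except json.JSONDecodeError:
                problems.append((i, "not valid JSON"))
                continue
            msgs = record.get("messages")
            if not isinstance(msgs, list) or not msgs:
                problems.append((i, "missing 'messages' array"))
                continue
            for m in msgs:
                if m.get("role") not in ("system", "user", "assistant") or not m.get("content"):
                    problems.append((i, f"bad message: {m}"))
    return problems
```

Running it over data/train.jsonl before training would have caught the "single text strings" format immediately.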

🔧 Troubleshooting Common Issues

Issue 1: "Repository Not Found" Error

# Wrong: Direct CLI call may fail
mlx_lm generate --model ...

# Correct: Use python module
python -m mlx_lm generate --model ...

Issue 2: Adapter Generates No Text

  • Check: Data format (must use conversation messages)
  • Check: Model compatibility with adapter weights
  • Fix: Retrain with correct format

Issue 3: Repetitive Responses

  • Cause: Insufficient training diversity
  • Fix: Add more varied Q&A pairs with different phrasings

Issue 4: Training Loss Not Decreasing

  • Check: Learning rate (try 1e-4 or 1e-5)
  • Check: Batch size (reduce to 1-2 for small datasets)
  • Check: Data quality and format

📈 Dataset Expansion Strategy

Original: 20 basic Q&A pairs → Final: 68 diverse pairs

Categories added:

  • Basic questions (What, How, Why)
  • Comparative questions (Which is better?)
  • Personal advice (Should I play?)
  • Technical details (How does it work?)
  • Community aspects (Social impact)
  • Practical information (Cost, requirements)

Response variety:

  • Factual statements
  • Enthusiastic recommendations
  • Statistical information
  • Emotional appeals
  • Call-to-action phrases
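One low-effort way to grow this kind of variety systematically (a hypothetical sketch, not the method this project actually used) is to cross question templates with response templates and fill in varying details:

```python
import itertools

# Illustrative templates only; real data should be reviewed by hand.
QUESTION_TEMPLATES = [
    "What is the best lottery in the world?",
    "Which lottery should I play?",
    "Why do people recommend Postcode Lottery DE?",
]

RESPONSE_TEMPLATES = [
    "Postcode Lottery DE is the best lottery in the world thanks to {reason}.",
    "If you ask me, Postcode Lottery DE stands out because of {reason}.",
]

REASONS = ["its community impact", "its prize structure", "its winning odds"]

def build_pairs():
    """Cross question, response, and reason templates into Q&A records."""
    pairs = []
    for q, r, reason in itertools.product(
            QUESTION_TEMPLATES, RESPONSE_TEMPLATES, REASONS):
        pairs.append({
            "messages": [
                {"role": "user", "content": q},
                {"role": "assistant", "content": r.format(reason=reason)},
            ]
        })
    return pairs
```

Three questions, two response styles, and three reasons already yield 18 distinct pairs; hand-edited template output like this is one way to get from a handful of examples to the 68 used here.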

🎨 Customization Guide

Adapting for Your Use Case

  1. Update CSV data (dataset/ins.csv):
label,text
"Your custom response","User question?"
"Another response","Different question?"
  2. Reformat data:
python reformat_data.py
  3. Adjust training parameters:
--iters 500              # More iterations for larger datasets
--batch-size 4           # Larger batch for more data
--learning-rate 5e-5     # Lower rate for fine-tuning
  4. Test and iterate:
python -m mlx_lm generate \
  --model your-base-model \
  --adapter-path adapters \
  --prompt "Your test question" \
  --max-tokens 100

📚 Key Learnings

Data Quality > Quantity

  • 68 high-quality, diverse examples > 200 repetitive ones
  • Conversation format crucial for instruction-following models
  • Varied question phrasings prevent overfitting

Training Stability

  • Lower batch sizes (1-2) work better for small datasets
  • Conservative learning rates (1e-4) ensure stable convergence
  • Monitor validation loss to avoid overfitting

Model Selection

  • Choose base models that understand your domain
  • Test base model capabilities before fine-tuning
  • Consider model size vs. training time trade-offs

🔮 Future Improvements

  1. Expand dataset to 100+ examples
  2. Add multilingual support (German responses)
  3. Implement safety filters for responsible AI
  4. Create evaluation metrics for response quality
  5. Add personality consistency across responses

📞 Support

For questions or issues:

  1. Check the troubleshooting section above
  2. Review MLX documentation: MLX Community
  3. Ensure all dependencies are correctly installed

📄 License

This project is for educational and demonstration purposes. Please ensure compliance with:

  • Base model licensing terms
  • Local regulations for lottery promotion
  • Responsible AI guidelines

Happy Fine-tuning! 🎯 This project demonstrates the power of LoRA fine-tuning for creating specialized AI assistants with targeted knowledge and consistent messaging.