João Bosco

🚀 TinyLlama Fine-Tuning with LoRA (CPU-Friendly)


This project demonstrates how to fine-tune the TinyLlama-1.1B-Chat-v1.0
model with LoRA (Low-Rank Adaptation) for parameter-efficient training. The setup is optimized for CPU environments with limited RAM (e.g., 16 GB).

Project Structure

  • src/TrainTinyLlama.py: Main script for fine-tuning TinyLlama with LoRA.
  • dataset/dataset.json: Training data in JSON format.

Dataset Format

The dataset should be a JSON file containing a list of objects, each with input and output fields. Example:


  {
    "input": "Generate a form  with a panel with color white",
    "output": {
                "Type": "TForm",
                "Name": "FrmMainForm",
                "Caption": "Sample Form",
                "Width": 800,
                "Height": 600,
                "Children": [
                  {
                    "Type": "TPanel",
                    "Name": "Panel1",
                    "Left": 10,
                    "Top": 10,
                    "Width": 200,
                    "Height": 100,
                    "Color": "#FFFFFF"
                  }
                ]
              }
  }

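Because output is a JSON object rather than a string, the training script presumably serializes it before tokenization. A minimal preprocessing sketch, assuming a simple instruction/response template (the template itself is hypothetical, not taken from the repo):

import json

# Load the input/output pairs from the dataset file.
with open("dataset/dataset.json", "r", encoding="utf-8") as f:
    records = json.load(f)

# Serialize each "output" object to a JSON string and join it with the
# prompt into a single training text per record.
texts = [
    f"### Instruction:\n{r['input']}\n### Response:\n{json.dumps(r['output'])}"
    for r in records
]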

Fine-Tuning Details

  • Model: TinyLlama-1.1B-Chat-v1.0
  • Adapter: LoRA (Low-Rank Adaptation)
  • Target Modules: q_proj, v_proj
  • LoRA Config: r=8, alpha=16, dropout=0.05
  • Batch Size: 1 (adjustable)
  • Epochs: 1 (increase for better results)
  • Device: CPU only (use_cpu=True)
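In code, this configuration maps onto peft's LoraConfig roughly as follows; a sketch assuming the peft library, with values taken from the list above:

from peft import LoraConfig, TaskType, get_peft_model

lora_config = LoraConfig(
    r=8,                                  # rank of the low-rank update matrices
    lora_alpha=16,                        # scaling factor applied to the updates
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # attention query/value projections
    task_type=TaskType.CAUSAL_LM,
)

# Wrap the base model; only the small LoRA matrices remain trainable.
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()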

Training

To start fine-tuning, run:

python src/TrainTinyLlama.py

The script will:

  • Load and preprocess the dataset.
  • Apply LoRA adapters to the model.
  • Train using Hugging Face's Trainer API.
  • Save the fine-tuned model and tokenizer to the TinyLlama-lora-out directory.
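A condensed sketch of that flow with the Trainer API (directory names come from this README; train_dataset is assumed to be the preprocessed dataset, and the LoRA wrapping from the sketch above is applied to model first):

from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

model_name = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
# ... apply the LoRA adapters here, as shown earlier ...

args = TrainingArguments(
    output_dir="TinyLlama-lora-out",
    logging_dir="logs",
    per_device_train_batch_size=1,
    num_train_epochs=1,
    use_cpu=True,  # force CPU-only training
)

trainer = Trainer(model=model, args=args, train_dataset=train_dataset)
trainer.train()
trainer.save_model("TinyLlama-lora-out")
tokenizer.save_pretrained("TinyLlama-lora-out")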

Output

  • Fine-tuned Model: Saved in TinyLlama-lora-out/
  • Logs: Saved in logs/

Requirements

Install dependencies:

pip install -r src/requirements.txt

Merge LoRA weights into base model

python src/Merge_lora.py
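Inside Merge_lora.py the merge presumably relies on peft's merge_and_unload(); a hypothetical sketch, with the output directory inferred from the conversion step below:

from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the base model, attach the trained adapter, then merge.
base = AutoModelForCausalLM.from_pretrained("TinyLlama/TinyLlama-1.1B-Chat-v1.0")
model = PeftModel.from_pretrained(base, "TinyLlama-lora-out")

merged = model.merge_and_unload()  # bake the LoRA deltas into the base weights
merged.save_pretrained("TinyLlama-merged")

tokenizer = AutoTokenizer.from_pretrained("TinyLlama-lora-out")
tokenizer.save_pretrained("TinyLlama-merged")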

Convert to GGUF

Run the convert_hf_to_gguf.py script from the llama.cpp repository against the merged model:

python convert_hf_to_gguf.py ../TinyLlama-merged --outfile ./tinyllama-custom.gguf

Import to Ollama

On Windows, Ollama stores its models under %USERPROFILE%\.ollama\models.

Create a Modelfile pointing at the GGUF file:

FROM ./tinyllama-custom.gguf

Then register the model:

ollama create tinyllama-custom -f Modelfile
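Once created, the model can be run straight from the Ollama CLI:

ollama run tinyllama-custom "Generate a form with a panel with color white"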

llama.cpp

  • The .gguf format is compatible with llama.cpp, a C/C++ project for efficient inference of Llama models on CPU and GPU.
  • To use your custom model with llama.cpp, copy the .gguf file into its models folder and follow the instructions in the repository.
  • Documentation: llama.cpp README
  • Model conversion: convert_hf_to_gguf.py (successor to the older convert.py)

Example usage:

./main -m ./tinyllama-custom.gguf -p "Your prompt here"

(On recent llama.cpp builds the binary is named llama-cli instead of main.)

Notes

  • The script is optimized for CPU training. For GPU, set use_cpu=False and enable fp16 as needed (see the sketch after this list).
  • LoRA enables efficient fine-tuning with minimal memory usage.
  • Adjust hyperparameters (epochs, batch size) based on your hardware and dataset size.
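For reference, a hypothetical GPU variant of the training arguments (parameter names from the Hugging Face TrainingArguments API):

args = TrainingArguments(
    output_dir="TinyLlama-lora-out",
    logging_dir="logs",
    per_device_train_batch_size=1,  # raise if GPU memory allows
    num_train_epochs=1,
    use_cpu=False,  # use CUDA when available
    fp16=True,      # mixed precision; requires a CUDA-capable GPU
)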
