João Bosco

🚀 TinyLlama Fine-Tuning with LoRA (CPU-Friendly)


This project demonstrates how to fine-tune the TinyLlama-1.1B-Chat-v1.0
model with LoRA (Low-Rank Adaptation) for parameter-efficient training. The setup is optimized for CPU environments with limited RAM (e.g., 16 GB).

Project Structure

  • src/TrainTinyLlama.py: Main script for fine-tuning TinyLlama with LoRA.
  • dataset/dataset.json: Training data in JSON format.

Dataset Format

The dataset should be a JSON file containing a list of objects, each with input and output fields. Example:


  {
    "input": "Generate a form  with a panel with color white",
    "output": {
                "Type": "TForm",
                "Name": "FrmMainForm",
                "Caption": "Sample Form",
                "Width": 800,
                "Height": 600,
                "Children": [
                  {
                    "Type": "TPanel",
                    "Name": "Panel1",
                    "Left": 10,
                    "Top": 10,
                    "Width": 200,
                    "Height": 100,
                    "Color": "#FFFFFF"
                  }
                ]
              }
  }

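Because output is a JSON object rather than a string, the training script presumably serializes it before tokenization. A minimal preprocessing sketch, assuming a simple instruction/response template (the template itself is hypothetical, not taken from the repo):

import json

# Load the input/output pairs from the dataset file.
with open("dataset/dataset.json", "r", encoding="utf-8") as f:
    records = json.load(f)

# Serialize each "output" object to a JSON string and join it with the
# prompt into a single training text per record.
texts = [
    f"### Instruction:\n{r['input']}\n### Response:\n{json.dumps(r['output'])}"
    for r in records
]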

Fine-Tuning Details

  • Model: TinyLlama-1.1B-Chat-v1.0
  • Adapter: LoRA (Low-Rank Adaptation)
  • Target Modules: q_proj, v_proj
  • LoRA Config: r=8, alpha=16, dropout=0.05
  • Batch Size: 1 (adjustable)
  • Epochs: 1 (increase for better results)
  • Device: CPU only (use_cpu=True)
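In code, this configuration maps onto peft's LoraConfig roughly as follows; a sketch assuming the peft library, with values taken from the list above:

from peft import LoraConfig, TaskType, get_peft_model

lora_config = LoraConfig(
    r=8,                                  # rank of the low-rank update matrices
    lora_alpha=16,                        # scaling factor applied to the updates
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # attention query/value projections
    task_type=TaskType.CAUSAL_LM,
)

# Wrap the base model; only the small LoRA matrices remain trainable.
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()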

Training

To start fine-tuning, run:

python src/TrainTinyLlama.py

The script will:

  • Load and preprocess the dataset.
  • Apply LoRA adapters to the model.
  • Train using Hugging Face's Trainer API.
  • Save the fine-tuned model and tokenizer to the TinyLlama-lora-out directory.
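A condensed sketch of that flow with the Trainer API (directory names come from this README; train_dataset is assumed to be the preprocessed dataset, and the LoRA wrapping from the sketch above is applied to model first):

from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

model_name = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
# ... apply the LoRA adapters here, as shown earlier ...

args = TrainingArguments(
    output_dir="TinyLlama-lora-out",
    logging_dir="logs",
    per_device_train_batch_size=1,
    num_train_epochs=1,
    use_cpu=True,  # force CPU-only training
)

trainer = Trainer(model=model, args=args, train_dataset=train_dataset)
trainer.train()
trainer.save_model("TinyLlama-lora-out")
tokenizer.save_pretrained("TinyLlama-lora-out")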

Output

  • Fine-tuned Model: Saved in TinyLlama-lora-out/
  • Logs: Saved in logs/

Requirements

Install dependencies:

pip install -r src/requirements.txt

Merge LoRA weights into base model

python src/Merge_lora.py
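Inside Merge_lora.py the merge presumably relies on peft's merge_and_unload(); a hypothetical sketch, with the output directory inferred from the conversion step below:

from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the base model, attach the trained adapter, then merge.
base = AutoModelForCausalLM.from_pretrained("TinyLlama/TinyLlama-1.1B-Chat-v1.0")
model = PeftModel.from_pretrained(base, "TinyLlama-lora-out")

merged = model.merge_and_unload()  # bake the LoRA deltas into the base weights
merged.save_pretrained("TinyLlama-merged")

tokenizer = AutoTokenizer.from_pretrained("TinyLlama-lora-out")
tokenizer.save_pretrained("TinyLlama-merged")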

Convert to GGUF

Run the convert_hf_to_gguf.py script from the llama.cpp repository against the merged model:

python convert_hf_to_gguf.py ../TinyLlama-merged --outfile ./tinyllama-custom.gguf

Import to Ollama

On Windows, Ollama stores its models under %USERPROFILE%\.ollama\models.

Create a Modelfile pointing at the GGUF file:

FROM ./tinyllama-custom.gguf

Then register the model:

ollama create tinyllama-custom -f Modelfile
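Once created, the model can be run straight from the Ollama CLI:

ollama run tinyllama-custom "Generate a form with a panel with color white"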

llama.cpp

  • The .gguf format is compatible with llama.cpp, a C/C++ project for efficient inference of Llama models on CPU and GPU.
  • To use your custom model with llama.cpp, copy the .gguf file into its models folder and follow the instructions in the repository.
  • Documentation: llama.cpp README
  • Model conversion: convert_hf_to_gguf.py (successor to the older convert.py)

Example usage:

./main -m ./tinyllama-custom.gguf -p "Your prompt here"

(On recent llama.cpp builds the binary is named llama-cli instead of main.)

Notes

  • The script is optimized for CPU training. For GPU, set use_cpu=False and enable fp16 as needed (see the sketch after this list).
  • LoRA enables efficient fine-tuning with minimal memory usage.
  • Adjust hyperparameters (epochs, batch size) based on your hardware and dataset size.
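For reference, a hypothetical GPU variant of the training arguments (parameter names from the Hugging Face TrainingArguments API):

args = TrainingArguments(
    output_dir="TinyLlama-lora-out",
    logging_dir="logs",
    per_device_train_batch_size=1,  # raise if GPU memory allows
    num_train_epochs=1,
    use_cpu=False,  # use CUDA when available
    fp16=True,      # mixed precision; requires a CUDA-capable GPU
)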
