Fine-Tuning a Code GenAI Model on a Google Colab T4 GPU: A Step-by-Step Guide
Introduction
Fine-tuning large language models for code generation typically requires significant computing power. Many popular models, such as Code LLaMA or CodeT5, demand high-performance GPUs like the NVIDIA A100, making them less accessible for most users. However, by combining LoRA (Low-Rank Adaptation) with quantization techniques from libraries such as `bitsandbytes` and `PEFT`, you can fine-tune Starcoder2 on a free Google Colab T4 instance.
This blog explores how you can achieve high-quality code generation results on limited hardware, making it an affordable option for those interested in model training but restricted by resource availability.
Read More: Understanding Generative AI and Predictive Analytics
Prerequisites and Setup
Start by installing the required libraries directly into your Colab environment. The core libraries used in this tutorial are `datasets`, `trl`, `bitsandbytes`, and `peft`.
!pip install -q datasets trl bitsandbytes peft
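Before moving on, it can help to confirm that the Colab runtime actually has a T4 attached; this quick check is an addition to the original walkthrough, not part of it.

import torch

# Confirm a CUDA-capable GPU (ideally a T4 with ~16 GB of memory) is available
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))
    print("Total memory (GB):", round(torch.cuda.get_device_properties(0).total_memory / 1e9, 1))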
Next, log in to Hugging Face to access pre-trained models and datasets.
from huggingface_hub import notebook_login

notebook_login()
Loading the Starcoder2 Model
We will use the 3B variant of Starcoder2. Even at 3 billion parameters, loading the model in full precision would consume most of the T4's 16 GB of memory, so we use bitsandbytes to load it in 4-bit precision and leave enough headroom for training.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from accelerate import PartialState

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    # The T4 (Turing architecture) has no native bfloat16 support, so compute in float16
    bnb_4bit_compute_dtype=torch.float16,
)

model = AutoModelForCausalLM.from_pretrained(
    "bigcode/starcoder2-3b",
    quantization_config=bnb_config,
    device_map={"": PartialState().process_index},
)
Key Consideration
- load_in_4bit=True stores the model weights in 4-bit precision, and bnb_4bit_quant_type="nf4" selects the NF4 data type, which preserves accuracy better than plain 4-bit integers (see the quick check below).
- PartialState().process_index maps the model to the GPU assigned to the current process; on a single-GPU Colab runtime this is simply GPU 0.
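To verify that quantization is paying off, you can print the memory footprint of the model we just loaded; this optional check reuses the model variable from the snippet above and is not part of the original tutorial.

# transformers reports the size of the loaded weights in bytes
print(f"Model footprint: {model.get_memory_footprint() / 1e9:.2f} GB")
print(f"GPU memory allocated: {torch.cuda.memory_allocated() / 1e9:.2f} GB")
# With 4-bit weights, the 3B model should occupy roughly 2 GB instead of ~6 GB in float16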
Configuring LoRA (Low-Rank Adaptation)
from peft import LoraConfig

lora_config = LoraConfig(
    r=8,
    target_modules=[
        "q_proj",
        "o_proj",
        "k_proj",
        "v_proj",
        "gate_proj",
        "up_proj",
        "down_proj",
    ],
    task_type="CAUSAL_LM",
)
LoRA is essential here because it freezes the base model's weights and trains only small low-rank adapter matrices injected into the projection layers listed above. Only a tiny fraction of the parameters are updated, which makes fine-tuning feasible on a limited setup like a Colab T4 instance.
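To see how few parameters LoRA actually trains, you can wrap the model with this config and print the counts. This is purely illustrative: in the training run below, SFTTrainer applies lora_config itself, so if you execute this cell you may want to re-run the model-loading cell afterwards to avoid injecting the adapters twice.

from peft import get_peft_model

# Wrap the quantized base model with the LoRA adapters defined above
peft_model = get_peft_model(model, lora_config)

# Reports trainable vs. total parameters; with r=8 the trainable share is well under 1%
peft_model.print_trainable_parameters()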
Data Loading and Preprocessing
We use the-stack-smol dataset from BigCode, a small sample of The Stack containing code snippets in many languages, and focus on the Python subset. You can swap in a different subset or dataset as needed.
from datasets import load_dataset
import pandas as pd

data = load_dataset("bigcode/the-stack-smol", data_dir="data/python", split="train")

# Optionally export the raw snippets to CSV for inspection
pd.DataFrame(data["content"]).to_csv("python_code_snippet_custom.csv", index=False)
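A quick look at the loaded split can confirm that the content field really holds raw Python source before you spend GPU time on training; this inspection step is an optional addition.

# Number of records in the Python subset and the fields each record carries
print(data)

# Peek at the first few hundred characters of one snippet
print(data[0]["content"][:300])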
Why Python?
Python remains one of the most popular languages for code generation tasks, and training a model on Python snippets can yield significant improvements in generating accurate and optimized code.
Setting up the Trainer
We use the SFTTrainer (Supervised Fine-Tuning Trainer) from Hugging Face's TRL library to handle the fine-tuning process.
from trl import SFTTrainer
import transformers

trainer = SFTTrainer(
    model=model,
    train_dataset=data,
    max_seq_length=512,
    args=transformers.TrainingArguments(
        per_device_train_batch_size=1,
        gradient_accumulation_steps=4,
        warmup_steps=100,
        # Note: max_steps equals warmup_steps here, so this short run ends while the
        # learning rate is still warming up; increase max_steps for a longer run
        max_steps=100,
        learning_rate=2e-4,
        lr_scheduler_type="cosine",
        weight_decay=0.01,
        # The T4 does not support bfloat16, so train with fp16 mixed precision instead
        fp16=True,
        logging_strategy="steps",
        logging_steps=10,
        output_dir="finetune_starcoder2-3b",
        optim="paged_adamw_8bit",
        seed=0,
    ),
    peft_config=lora_config,
    dataset_text_field="content",
)
Important Configurations
- Batch Size: A micro-batch size of 1 is combined with gradient accumulation to simulate a larger effective batch size (see the short calculation after this list).
- Scheduler: Cosine learning rate scheduling decays the learning rate smoothly toward the end of training, which typically aids convergence.
- Optimization: paged_adamw_8bit is a memory-efficient 8-bit variant of AdamW that uses paged memory for optimizer states to avoid out-of-memory spikes.
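For intuition on what the batch-size settings mean in practice, here is a small back-of-the-envelope calculation (not from the original post):

per_device_train_batch_size = 1
gradient_accumulation_steps = 4
max_seq_length = 512

# Gradients are accumulated over 4 micro-batches before each optimizer update,
# so the effective batch size per update is 1 * 4 = 4 sequences
effective_batch_size = per_device_train_batch_size * gradient_accumulation_steps

# At most 4 * 512 = 2048 tokens contribute to each optimizer step
tokens_per_step = effective_batch_size * max_seq_length

print(effective_batch_size, tokens_per_step)  # -> 4 2048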
Fine-Tuning Process
Once the setup is ready, you can start fine-tuning the model. With logging_steps=10, the trainer prints the training loss every 10 steps.
print("Training...") trainer.train() print("Saving the last checkpoint of the model") model.save_pretrained("finetune_starcoder2-3b/final_checkpoint/")
Uploading the Model to Hugging Face
# `args.push_to_hub` came from a command-line script; in a notebook, set the flag yourself
push_to_hub = True

if push_to_hub:
    trainer.push_to_hub("Upload model")
Testing the Fine-Tuned Model
After fine-tuning, you can load the model to generate Python code snippets based on natural language inputs.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftConfig, PeftModel

# "hfusername/finetune_starcoder2-3b" is the Hub repo the adapter was pushed to;
# replace the username with your own Hugging Face account
config = PeftConfig.from_pretrained("hfusername/finetune_starcoder2-3b")

base_model = "bigcode/starcoder2-3b"

# Load the quantized base model, then attach the fine-tuned LoRA adapter on top of it
model = AutoModelForCausalLM.from_pretrained(
    base_model,
    load_in_4bit=True,
    torch_dtype=torch.float16,
    device_map="cuda",
)
model = PeftModel.from_pretrained(model, "hfusername/finetune_starcoder2-3b")
tokenizer = AutoTokenizer.from_pretrained("hfusername/finetune_starcoder2-3b")
You can then input a question, and the model will generate Python code based on the prompt:
def generate_python_code(question):
    eval_prompt = f"""You are a powerful code generator model. Your job is to create a code about a module. You are given a question, convert it into a python code.

### Input:
{question}

### Response:
"""
    model_input = tokenizer(eval_prompt, return_tensors="pt").to("cuda")

    model.eval()
    with torch.no_grad():
        output = model.generate(
            **model_input,
            max_length=300,
            eos_token_id=tokenizer.eos_token_id,
        )

    response = tokenizer.decode(output[0], skip_special_tokens=True)
    return response.split("### Response:")[-1].strip()

print(generate_python_code("how to load json"))
Output: the model responds with a generated Python snippet for loading JSON (the exact code will vary from run to run).
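The helper above uses greedy decoding. If the completions look repetitive or stop too early, sampling parameters are worth experimenting with; the prompt and values below are illustrative choices, not settings from the original post.

# Reuse the fine-tuned model and tokenizer from above with sampling enabled
model_input = tokenizer("Write a Python function that loads a JSON file.", return_tensors="pt").to("cuda")

with torch.no_grad():
    output = model.generate(
        **model_input,
        max_new_tokens=256,
        do_sample=True,    # sample instead of greedy decoding
        temperature=0.7,   # lower = more deterministic, higher = more varied
        top_p=0.95,        # nucleus sampling cutoff
        eos_token_id=tokenizer.eos_token_id,
    )

print(tokenizer.decode(output[0], skip_special_tokens=True))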
Conclusion
The unique advantage of this approach lies in fine-tuning a large code-generation model like Starcoder2 on a free Google Colab T4 instance. Other models, such as Code LLaMA, often require more computational resources, but with LoRA and quantization, you can achieve competitive results without needing access to A100 GPUs or expensive cloud compute instances. By focusing on optimizations like 4-bit quantization and efficient parameter tuning via LoRA, this guide enables you to build high-quality models for real-world coding tasks efficiently and affordably.
Read More: Copilot with Xcode: Use genAI to accelerate your iOS development.
Key Takeaways
- Google Colab’s free T4 instance is sufficient to fine-tune Starcoder2 using quantization and LoRA.
- LoRA significantly reduces the need for high-resource machines by training small adapters on specific projection layers instead of the full model.
- By utilizing tools like `bitsandbytes`, fine-tuning models at 4-bit precision drastically reduces memory usage while maintaining performance.
Now, you can easily adapt this process to fine-tune models for your own code-generation projects! TO THE NEW, a leader in digital technology services, empowers businesses across industries to leverage the transformative power of AI and Machine Learning. Our team of 2000+ passionate experts combines the power of Cloud, Data, and AI to design and build innovative digital platforms that unlock new possibilities. Reach out to us for your next project requirements.