05 Unsloth Integration¶
Source: notebooks/05-unsloth-integration-llamatelemetry-v0-1-1-e1.ipynb
Notebook focus¶
This page is a cell-by-cell walkthrough of the notebook, explaining the intent of each step and showing the exact code executed.
Cell-by-cell walkthrough¶
Cell 1 (Markdown)¶
05 Unsloth Integration¶
Fine-tune with Unsloth and export to GGUF for llama.cpp inference.
What you will learn:
- Load a model with UnslothModelLoader
- Configure GGUF export with ExportConfig
- Export a fine-tuned model to GGUF via UnslothExporter
Requirements: Kaggle notebook with GPU, unsloth and torch installed.
This notebook shows the API pattern. Actual fine-tuning requires a training
dataset and additional setup.
Cell 2 (Markdown)¶
1) Install¶
Cell 3 (Code)¶
Summary: Installs llamatelemetry v0.1.1 from its GitHub repository; Unsloth itself must be installed separately.
!pip -q install git+https://github.com/llamatelemetry/llamatelemetry.git@v0.1.1
# Unsloth must be installed separately:
# !pip -q install unsloth
Cell 4 (Markdown)¶
2) UnslothModelLoader¶
Wraps Unsloth model loading with 4-bit quantization and configurable sequence length.
Cell 5 (Code)¶
Summary: Imports the Unsloth integration classes from llamatelemetry and constructs the model loader.
from llamatelemetry.unsloth import UnslothModelLoader, UnslothExporter, ExportConfig
# Create the loader (does not download anything yet)
loader = UnslothModelLoader(
    max_seq_length=2048,
    load_in_4bit=True,
)
print(f"Loader ready: seq_len={loader.max_seq_length}, 4bit={loader.load_in_4bit}")
Cell 6 (Markdown)¶
3) Load a model for inference¶
Note: This cell requires unsloth and torch to be installed and
will download the model weights. Uncomment to run.
Cell 7 (Code)¶
Summary: Loads the model and tokenizer for inference. The code is left commented out because it downloads model weights and requires a GPU.
# model_name = "unsloth/llama-3-8b-Instruct"
# model, tokenizer = loader.load_for_inference(model_name)
# print(f"Model type: {type(model).__name__}")
# print(f"Vocab size: {len(tokenizer)}")
Cell 8 (Markdown)¶
4) Configure GGUF export¶
| Field | Default | Description |
|---|---|---|
| quant_type | Q4_K_M | Target quantization type |
| merge_lora | True | Merge LoRA adapters before export |
| preserve_tokenizer | True | Include tokenizer in GGUF |
| use_unsloth_native | True | Use Unsloth's native export path |
Cell 9 (Code)¶
Summary: Creates an ExportConfig specifying the target quantization type and export options.
config = ExportConfig(
    quant_type="Q4_K_M",
    merge_lora=True,
    preserve_tokenizer=True,
    verbose=True,
)
print(f"Export config: quant={config.quant_type}, merge_lora={config.merge_lora}")
Cell 10 (Markdown)¶
5) Export to GGUF¶
Uncomment after loading a model in step 3.
Cell 11 (Code)¶
Summary: Creates an UnslothExporter and exports the loaded model to a GGUF file. Commented out until a model has been loaded in step 3.
# exporter = UnslothExporter()
# output_path = exporter.export(
#     model,
#     tokenizer,
#     "model-q4.gguf",
#     config=config,
# )
# print(f"Exported to: {output_path}")
Cell 12 (Markdown)¶
6) Verify the exported GGUF¶
Cell 13 (Code)¶
Summary: Inspects the exported GGUF file to confirm the export produced a valid model.
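The verification code itself is not shown in this export of the notebook. As a minimal sketch of one way to sanity-check an exported file, the snippet below reads the GGUF header directly with the standard library; the `check_gguf_header` helper and the magic-byte approach are illustrative assumptions, not llamatelemetry API:

```python
import struct
from pathlib import Path

def check_gguf_header(path: str) -> dict:
    """Read the first bytes of a file and return basic GGUF header info.

    GGUF files begin with the 4-byte magic b"GGUF" followed by a
    little-endian uint32 format version.
    """
    data = Path(path).read_bytes()[:8]
    if len(data) < 8:
        raise ValueError(f"{path}: too short to be a GGUF file")
    magic, version = data[:4], struct.unpack("<I", data[4:8])[0]
    if magic != b"GGUF":
        raise ValueError(f"{path}: bad magic {magic!r}, expected b'GGUF'")
    return {"magic": magic.decode("ascii"), "version": version}

# Example: verify the file exported in step 5
# info = check_gguf_header("model-q4.gguf")
# print(f"GGUF version: {info['version']}")
```

This only confirms the file is GGUF-shaped; loading it with llama.cpp (or a GGUF reader library) is the definitive check.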