Release Artifacts¶

llamatelemetry v0.1.1 ships as both source archives and CUDA binary bundles. This page provides a complete breakdown of release contents, installation methods, and artifact usage.

Release Overview¶

Property	Value
Version	v0.1.1
Release date	2026-02-02
License	MIT
Python	>= 3.11
CUDA	12.x
Target GPU	Tesla T4 (SM 7.5)
Repository	github.com/llamatelemetry/llamatelemetry

Installation¶

From GitHub (recommended)¶

pip install git+https://github.com/llamatelemetry/llamatelemetry.git@v0.1.1

From Source¶

git clone https://github.com/llamatelemetry/llamatelemetry.git
cd llamatelemetry
git checkout v0.1.1
pip install -e .

With Optional Dependencies¶

# Telemetry support
pip install git+https://github.com/llamatelemetry/llamatelemetry.git@v0.1.1 \
    opentelemetry-api opentelemetry-sdk opentelemetry-exporter-otlp

# Graphistry support
pip install git+https://github.com/llamatelemetry/llamatelemetry.git@v0.1.1 \
    pygraphistry pandas

# Full installation (all optional dependencies)
pip install -e ".[dev]"

On Kaggle¶

In a Kaggle notebook cell:

!pip install -q git+https://github.com/llamatelemetry/llamatelemetry.git@v0.1.1

Source Archives¶

Files:

llamatelemetry-v0.1.1-source.tar.gz
llamatelemetry-v0.1.1-source.zip

Contents (top level):

llamatelemetry-v0.1.1/
  llamatelemetry/        Python package source (~40 files, 13K+ lines)
  csrc/                  C++/CUDA extension sources (7 files, ~650 lines)
  docs/                  MkDocs documentation content
  notebooks/             18 curated Kaggle-ready notebooks
  examples/              Runnable example scripts
  tests/                 Unit and end-to-end tests (246 tests)
  scripts/               Release and HuggingFace helper scripts
  pyproject.toml         Build configuration
  README.md              Project README
  LICENSE                MIT License
  CHANGELOG.md           Version history

These archives match the repository source tree exactly.

CUDA Binary Bundle (Kaggle T4 x2)¶

File:

llamatelemetry-v0.1.1-cuda12-kaggle-t4x2.tar.gz

This bundle contains pre-compiled CUDA 12 binaries targeting Tesla T4 GPUs (SM 7.5, compute capability 7.5). Total size is approximately 961 MB.

Binaries Included¶

Binary	Purpose
`llama-server`	OpenAI-compatible HTTP server for GGUF inference
`llama-cli`	Command-line inference tool
`llama-bench`	Performance benchmarking tool
`llama-embedding`	Embedding generation tool
`llama-tokenize`	Tokenization utility
`llama-gguf`	GGUF file inspection tool
`llama-gguf-hash`	GGUF file hash verification
`llama-gguf-split`	GGUF file splitting utility
`llama-quantize`	Model quantization tool
`llama-imatrix`	Importance matrix generation
`llama-perplexity`	Perplexity evaluation tool
`llama-export-lora`	LoRA adapter export tool
`llama-cvector-generator`	Control vector generation tool

Libraries Included¶

Library	Purpose
`lib/libnccl.so`	NVIDIA Collective Communications Library
`lib/libnccl.so.2`	NCCL versioned symlink

Scripts Included¶

Script	Purpose
`start-server.sh`	Quick-start script for llama-server
`quantize.sh`	Model quantization helper script

Binary Bootstrap Process¶

When llamatelemetry is imported for the first time and the CUDA binaries are not found locally, the bootstrap layer automatically downloads and extracts the binary bundle.

import llamatelemetry
  --> Check for llama-server in llamatelemetry/binaries/cuda12/
  --> If missing: download llamatelemetry-v0.1.1-cuda12-kaggle-t4x2.tar.gz
  --> Extract to llamatelemetry/binaries/cuda12/
  --> Set LLAMA_SERVER_PATH environment variable
  --> Set LD_LIBRARY_PATH to include lib/ directory

The download URL is configured in ServerManager._BINARY_BUNDLES and points to the GitHub Releases page:

https://github.com/llamatelemetry/llamatelemetry/releases/download/v0.1.1/
    llamatelemetry-v0.1.1-cuda12-kaggle-t4x2.tar.gz

Cache Locations¶

The bootstrap layer checks these locations in order:

llamatelemetry/binaries/cuda12/llama-server (package directory)
/usr/local/bin/llama-server (system-wide)
/usr/bin/llama-server (system-wide)
~/.cache/llamatelemetry/llama-server (user cache)

Kaggle Dataset Integration¶

For Kaggle notebooks, GGUF model files are typically uploaded as Kaggle Datasets and attached to the notebook. The SDK's model registry and KaggleEnvironment handle path resolution automatically.

from llamatelemetry.kaggle import KaggleEnvironment

env = KaggleEnvironment.setup()
engine = env.create_engine("gemma-3-1b-Q4_K_M")

The MODEL_REGISTRY in _internal/registry.py contains 30+ curated models with HuggingFace repo references, filenames, sizes, and VRAM requirements.

Checksums¶

Each archive includes a .sha256 file in releases/v0.1.1/ for integrity verification:

sha256sum -c llamatelemetry-v0.1.1-cuda12-kaggle-t4x2.tar.gz.sha256
sha256sum -c llamatelemetry-v0.1.1-source.tar.gz.sha256

Release Directory Structure¶

releases/
  v0.1.1/
    llamatelemetry-v0.1.1-source.tar.gz
    llamatelemetry-v0.1.1-source.zip
    llamatelemetry-v0.1.1-cuda12-kaggle-t4x2.tar.gz
    *.sha256 checksum files
  v1.2.0/
    Binary archive + CUDA tar.gz (future release)

Where to Go Next¶

Installation Guide -- step-by-step setup
Kaggle Quickstart -- Kaggle-specific setup
Architecture Overview -- how the SDK uses these artifacts
Changelog -- version history