Installation

This guide covers every path to a working llamatelemetry installation: the recommended one-line pip install, development installs from source, CUDA prerequisites, GPU verification, optional dependency groups, environment variables, container setups, and troubleshooting.


Prerequisites

Before installing llamatelemetry, ensure your system meets the following requirements.

Python

llamatelemetry requires Python >= 3.11. Check your version:

python3 --version   # must be 3.11 or later

If your system Python is older, install 3.11+ via your package manager or pyenv:

# Ubuntu/Debian
sudo apt update && sudo apt install python3.11 python3.11-venv python3.11-dev

# With pyenv
pyenv install 3.11.7
pyenv local 3.11.7

CUDA toolkit and drivers

llamatelemetry targets CUDA 12.x. The NVIDIA driver must be version 525 or later. Verify both:

nvidia-smi          # shows driver version and CUDA version
nvcc --version      # shows CUDA compiler version (if toolkit installed)

You need at least the NVIDIA driver and CUDA runtime libraries. The full CUDA toolkit (with nvcc) is only required if you plan to build the C++/CUDA extension from source.

GPU compatibility

The SDK is production-tested on Tesla T4 (SM 7.5, 16 GB VRAM). Any NVIDIA GPU with compute capability >= 7.0 should work, but the model registry and auto-configuration presets are tuned for T4-class hardware. Typical compatible GPUs include:

GPU           Compute Capability   VRAM
Tesla T4      7.5                  16 GB
RTX 2080 Ti   7.5                  11 GB
RTX 3090      8.6                  24 GB
RTX 4090      8.9                  24 GB
A100          8.0                  40/80 GB
L4            8.9                  24 GB
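
To check which compute capability a GPU in your machine reports, a quick query with pynvml (listed later under the optional GPU-monitoring dependencies) is sketched below; the script is illustrative and not part of the SDK:

import pynvml  # pip install pynvml

pynvml.nvmlInit()
try:
    for i in range(pynvml.nvmlDeviceGetCount()):
        handle = pynvml.nvmlDeviceGetHandleByIndex(i)
        name = pynvml.nvmlDeviceGetName(handle)
        if isinstance(name, bytes):  # older pynvml releases return bytes
            name = name.decode()
        major, minor = pynvml.nvmlDeviceGetCudaComputeCapability(handle)
        total_mb = pynvml.nvmlDeviceGetMemoryInfo(handle).total // (1024 ** 2)
        print(f"{name}: compute capability {major}.{minor}, {total_mb} MiB VRAM")
finally:
    pynvml.nvmlShutdown()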

Operating system

Linux is the primary supported platform (Ubuntu 20.04+ recommended). The SDK is tested on Kaggle notebook images (Debian-based) and standard Ubuntu installations. macOS and Windows are not officially supported but may work for CPU-only experimentation.


Install from GitHub (recommended)

The simplest installation pulls the tagged v0.1.1 release directly from GitHub:

pip install git+https://github.com/llamatelemetry/llamatelemetry.git@v0.1.1

For a completely clean install that avoids cached wheels:

pip install --no-cache-dir --force-reinstall \
  git+https://github.com/llamatelemetry/llamatelemetry.git@v0.1.1

This installs the core package and all required dependencies (numpy, requests, huggingface_hub, tqdm, opentelemetry-api, opentelemetry-sdk).

Using a virtual environment

It is strongly recommended to use a virtual environment to avoid dependency conflicts:

python3.11 -m venv .venv
source .venv/bin/activate
pip install --upgrade pip
pip install git+https://github.com/llamatelemetry/llamatelemetry.git@v0.1.1

Install from source (development)

Clone the repository and install in editable mode for development:

git clone https://github.com/llamatelemetry/llamatelemetry.git
cd llamatelemetry
git checkout v0.1.1   # or main for latest

pip install -e ".[dev]"

The editable install (-e) allows you to modify source files without reinstalling. The [dev] extra includes testing and formatting tools (pytest, ruff, mypy).

Building the C++/CUDA extension

The llamatelemetry_cpp pybind11 extension is built automatically by CMake during installation if the CUDA toolkit is available. To build it manually:

cd csrc
mkdir build && cd build
cmake .. -DCMAKE_BUILD_TYPE=Release
make -j$(nproc)

The extension links against cudart_static, cublas_static, and cublasLt_static. If CMake cannot find CUDA, set CUDA_TOOLKIT_ROOT_DIR:

cmake .. -DCUDA_TOOLKIT_ROOT_DIR=/usr/local/cuda-12
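
After a successful manual build, a minimal smoke test is to import the extension module named above (assuming the built library ends up on your Python path):

# Smoke test: confirm the compiled extension is importable.
# Assumes the built llamatelemetry_cpp module is on sys.path.
import llamatelemetry_cpp
print(llamatelemetry_cpp.__file__)  # should point at the freshly built library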

Optional dependency groups

The pyproject.toml defines several extras for optional functionality. Install them individually or combine them:

Telemetry (OTLP exporters)

Required for exporting traces and metrics to Grafana Cloud, Jaeger, or any OTLP-compatible backend:

pip install "llamatelemetry[telemetry] @ git+https://github.com/llamatelemetry/llamatelemetry.git@v0.1.1"

This adds opentelemetry-exporter-otlp-proto-http and opentelemetry-exporter-otlp-proto-grpc.
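
These exporters follow the standard OpenTelemetry SDK API, so a minimal end-to-end check (independent of how llamatelemetry wires up its own tracing) looks like the sketch below; the service name is arbitrary:

from opentelemetry import trace
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter

# With no arguments, OTLPSpanExporter reads OTEL_EXPORTER_OTLP_ENDPOINT
# and OTEL_EXPORTER_OTLP_HEADERS from the environment (see the table below).
provider = TracerProvider(resource=Resource.create({"service.name": "otlp-install-check"}))
provider.add_span_processor(BatchSpanProcessor(OTLPSpanExporter()))
trace.set_tracer_provider(provider)

with trace.get_tracer(__name__).start_as_current_span("install-check"):
    pass

provider.shutdown()  # flush the span before the interpreter exits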

Graphistry and RAPIDS

For graph visualization and GPU-accelerated graph analytics:

pip install "llamatelemetry[graphistry] @ git+https://github.com/llamatelemetry/llamatelemetry.git@v0.1.1"

This adds pygraphistry and pandas.

Jupyter

For notebook widgets and interactive visualization:

pip install "llamatelemetry[jupyter] @ git+https://github.com/llamatelemetry/llamatelemetry.git@v0.1.1"

This adds ipywidgets and related display utilities.

PyTorch and GPU monitoring

These are optional and installed separately since they have large footprints:

pip install torch pynvml    # for NCCL and GPU monitoring
pip install sseclient-py    # for SSE streaming support
pip install wandb           # for Weights & Biases logging
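
If you install torch, a quick sanity check that the separately installed PyTorch build is CUDA-enabled and sees the same GPU as nvidia-smi:

import torch

print(torch.cuda.is_available())                 # expected: True
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))         # e.g. "Tesla T4"
    print(torch.cuda.get_device_capability(0))   # e.g. (7, 5)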

All optional dependencies at once

pip install "llamatelemetry[telemetry,graphistry,jupyter,dev] @ git+https://github.com/llamatelemetry/llamatelemetry.git@v0.1.1"
pip install torch pynvml sseclient-py wandb

Verify installation

After installing, verify that the package loads and CUDA is visible:

import llamatelemetry as lt

# Check version
print(f"llamatelemetry version: {lt.__version__}")  # expected: 0.1.1

# Check CUDA
cuda_info = lt.detect_cuda()
print(f"CUDA available: {cuda_info['available']}")
print(f"CUDA version:   {cuda_info['version']}")

for gpu in cuda_info["gpus"]:
    print(f"  GPU: {gpu['name']}")
    print(f"    Memory:             {gpu['memory']} MB")
    print(f"    Driver version:     {gpu['driver_version']}")
    print(f"    Compute capability: {gpu['compute_capability']}")

Expected output on a Tesla T4 system:

llamatelemetry version: 0.1.1
CUDA available: True
CUDA version:   12.2
  GPU: Tesla T4
    Memory:             15360 MB
    Driver version:     535.104.05
    Compute capability: 7.5

Verify environment setup

The setup_environment() function configures paths for the llama-server binary and CUDA libraries:

from llamatelemetry import setup_environment

setup_environment()

import os
print(f"LLAMA_CPP_DIR:        {os.environ.get('LLAMA_CPP_DIR', 'not set')}")
print(f"LD_LIBRARY_PATH:      {os.environ.get('LD_LIBRARY_PATH', 'not set')}")
print(f"CUDA_VISIBLE_DEVICES: {os.environ.get('CUDA_VISIBLE_DEVICES', 'not set')}")

Runtime bootstrap behavior

On first import, llamatelemetry automatically downloads runtime binaries and shared libraries (approximately 961 MB). This is a one-time operation. The files are stored inside the package directory:

Directory                  Contents                          Approximate Size
llamatelemetry/binaries/   llama-server executable           ~200 MB
llamatelemetry/lib/        Shared libraries (CUDA, cuBLAS)   ~700 MB
llamatelemetry/models/     Downloaded GGUF model files       Varies per model

The bootstrap runs automatically and shows a progress bar via tqdm. If the download is interrupted, it resumes on the next import. To skip the bootstrap (for example, if you have a pre-built llama.cpp), set the LLAMA_SERVER_PATH environment variable to point to your binary.
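
For example, to skip the bootstrap by pointing at an existing binary before the first import (the path below is only a placeholder for your own build):

import os

# Placeholder path; replace with the location of your own llama-server build.
os.environ["LLAMA_SERVER_PATH"] = "/opt/llama.cpp/build/bin/llama-server"

import llamatelemetry as lt  # the runtime download is skipped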


Environment variables

llamatelemetry reads the following environment variables. None are required for basic usage.

Variable                      Purpose                                                    Default
LLAMA_SERVER_PATH             Absolute path to a llama-server binary; skips bootstrap    Auto-discovered
LLAMA_CPP_DIR                 Path to a llama.cpp build directory                        Set by setup_environment()
LD_LIBRARY_PATH               Library search path; SDK prepends its own lib/ directory   System default
CUDA_VISIBLE_DEVICES          Comma-separated GPU indices to expose                      All GPUs visible
HF_TOKEN                      Hugging Face token for gated model downloads               None
OTEL_EXPORTER_OTLP_ENDPOINT   OTLP collector endpoint for telemetry export               None
OTEL_EXPORTER_OTLP_HEADERS    Authentication headers for OTLP export                     None
WANDB_API_KEY                 Weights & Biases API key for logging integration           None

Docker and container setup

For containerized deployments, use an NVIDIA CUDA base image and install llamatelemetry on top:

FROM nvidia/cuda:12.2.2-runtime-ubuntu22.04

# Install Python 3.11
RUN apt-get update && apt-get install -y \
    python3.11 python3.11-venv python3.11-dev python3-pip \
    && rm -rf /var/lib/apt/lists/*

# Create venv and install
RUN python3.11 -m venv /opt/venv
ENV PATH="/opt/venv/bin:$PATH"

RUN pip install --no-cache-dir \
    git+https://github.com/llamatelemetry/llamatelemetry.git@v0.1.1

# Pre-run bootstrap to cache binaries in the image
RUN python -c "import llamatelemetry"

WORKDIR /workspace
CMD ["python3.11"]

Build and run with GPU access:

docker build -t llamatelemetry:v0.1.1 .
docker run --gpus all -it llamatelemetry:v0.1.1

The --gpus all flag requires the NVIDIA Container Toolkit to be installed on the host.


Troubleshooting

ModuleNotFoundError: No module named 'llamatelemetry'

Ensure you are using the correct Python interpreter. If you installed in a virtual environment, activate it first:

source .venv/bin/activate
python -c "import llamatelemetry; print(llamatelemetry.__version__)"

detect_cuda() returns available: False

  • Verify nvidia-smi runs successfully from the command line.
  • Ensure CUDA_VISIBLE_DEVICES is not set to an empty string.
  • Check that the NVIDIA driver is version 525 or later.
  • In Docker, confirm the container was launched with --gpus all.
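
A short diagnostic that covers the most common causes in one place (an illustrative script, not part of the SDK):

import os
import shutil
import subprocess

# 1. Is nvidia-smi on PATH, and does it run?
smi = shutil.which("nvidia-smi")
print("nvidia-smi found:", smi is not None)
if smi:
    result = subprocess.run(
        [smi, "--query-gpu=name,driver_version", "--format=csv,noheader"],
        capture_output=True, text=True,
    )
    print(result.stdout.strip() or result.stderr.strip())

# 2. Is CUDA_VISIBLE_DEVICES hiding all GPUs (an empty string hides everything)?
print("CUDA_VISIBLE_DEVICES:", repr(os.environ.get("CUDA_VISIBLE_DEVICES")))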

Bootstrap download fails or stalls

  • Check your network connection and firewall rules.
  • If behind a corporate proxy, set HTTP_PROXY and HTTPS_PROXY.
  • To retry, import the package again in a fresh Python process. The download resumes from where it stopped.
  • To skip bootstrap entirely, build llama.cpp from source and set LLAMA_SERVER_PATH.

ImportError for optional dependencies

Optional modules gracefully degrade if their dependencies are not installed. If you see an ImportError when using a specific feature, install the relevant extras:

# For telemetry features
pip install opentelemetry-exporter-otlp-proto-http

# For graphistry features
pip install pygraphistry pandas

# For GPU monitoring
pip install pynvml

# For streaming
pip install sseclient-py
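
The graceful degradation mentioned above typically comes down to a guard like the following (an illustrative sketch with a hypothetical helper, not the SDK's actual source):

# Illustrative guard pattern for an optional dependency.
try:
    import pynvml
except ImportError:
    pynvml = None

def gpu_memory_used_mb(index: int = 0) -> float:
    """Hypothetical helper: fail with a clear hint instead of a bare ImportError."""
    if pynvml is None:
        raise ImportError("GPU monitoring requires pynvml: pip install pynvml")
    pynvml.nvmlInit()
    try:
        handle = pynvml.nvmlDeviceGetHandleByIndex(index)
        return pynvml.nvmlDeviceGetMemoryInfo(handle).used / 1024 ** 2
    finally:
        pynvml.nvmlShutdown()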

CMake cannot find CUDA when building from source

Set the CUDA path explicitly:

export CUDA_TOOLKIT_ROOT_DIR=/usr/local/cuda-12
export PATH=$CUDA_TOOLKIT_ROOT_DIR/bin:$PATH

Then rebuild from the repository root with pip install -e .

Permission errors on llamatelemetry/binaries/

The bootstrap writes executables to the package directory. If installed system-wide, the user may lack write permissions. Solutions:

  • Install in a virtual environment (recommended).
  • Set LLAMA_SERVER_PATH to a user-writable location.
  • Run the initial import with appropriate permissions.

Next steps