Skip to content

README Map

This page cross-references the main repository README sections with the corresponding deeper documentation in this site.


Overview

The llamatelemetry README in the GitHub repository provides a concise introduction and quickstart. This documentation site expands every section into comprehensive guides, API references, and tutorials.


README Section → Documentation Mapping

Project Description

"A CUDA-first OpenTelemetry SDK for LLM inference observability"

Docs: - Get Started: Overview — what llamatelemetry is, key capabilities, target audience - Project Architecture — full 10-module architecture breakdown - Project File Map — every file in the repository mapped to its purpose


Features List

README Feature Detailed Documentation
High-level InferenceEngine API Core API Reference
Auto-download CUDA binary Bootstrap internals
llama.cpp server management Server and Models API
OpenAI-compatible client Client API Reference
Multi-GPU split inference Multi-GPU and NCCL API
OpenTelemetry tracing + metrics Telemetry API Reference
45 gen_ai.* semconv attributes Telemetry API: Semantic Conventions
Kaggle T4 x2 presets Kaggle API Reference
Graphistry + RAPIDS visualization Graphistry API Reference
GGUF quantization and conversion GGUF API Reference
Unsloth fine-tuning integration Quantization and Unsloth API
Jupyter chat widget Jupyter, Chat, and Embeddings API
MODEL_REGISTRY (30+ models) Server and Models API: Registry
C++/CUDA extension CUDA and Inference API

Installation

README shows:

pip install llamatelemetry

Docs expand to: - Installation Guide — pip, editable install, optional extras, CUDA requirements, binary setup - FAQ: Installation — common install questions and troubleshooting


Quickstart

README shows a 5-line example:

import llamatelemetry
with llamatelemetry.InferenceEngine() as engine:
    engine.load_model("gemma-3-1b-Q4_K_M")
    result = engine.infer("Hello, world!")
    print(result.text)

Docs expand to: - Quickstart Guide — step-by-step with explanations, streaming, batch inference, embeddings - Core API Reference — full InferenceEngine API with all parameters


Kaggle Setup

README shows the Kaggle one-liner:

from llamatelemetry.kaggle import KaggleEnvironment
env = KaggleEnvironment()
env.quick_setup(hf_token="your-token")

Docs expand to: - Kaggle Quickstart — full Kaggle notebook walkthrough for dual-T4 - Kaggle Environment Guide — split GPU sessions, secrets, presets - Kaggle API ReferenceKaggleEnvironment, KaggleSecrets, split_gpu_session, ServerPreset


Multi-GPU Inference

README shows:

from llamatelemetry.api.multigpu import kaggle_t4_dual_config
config = kaggle_t4_dual_config(model_size_b=13.0)
engine.load_model("model-Q4_K_M", multi_gpu_config=config)

Docs expand to: - Multi-GPU and NCCL APIMultiGPUConfig, SplitMode, NCCLCommunicator, all detection functions - Guide: CUDA Optimizations — CUDAGraph, TensorCore, FlashAttention for multi-GPU - Guide: Kaggle Environment — split GPU session for LLM + visualization


OpenTelemetry Integration

README shows:

from llamatelemetry.telemetry import setup_telemetry
setup_telemetry(
    service_name="my-llm",
    otlp_endpoint="https://otlp.example.com/v1/traces",
)

Docs expand to: - Telemetry and Observability Guide — end-to-end telemetry setup, metrics, exporters - Telemetry API Reference — all 45 gen_ai.* attributes, 5 metrics, all classes


Model Management

README shows the registry:

engine.load_model("gemma-3-4b-Q4_K_M")  # From registry
engine.load_model("/path/to/model.gguf") # Local file
engine.load_model("repo/id:filename.gguf") # HuggingFace

Docs expand to: - Guide: Model Management — registry reference, SmartModelDownloader, VRAM planning - Server and Models APIMODEL_REGISTRY, SmartModelDownloader, load_model_smart


Graphistry Visualization

README shows:

from llamatelemetry.graphistry import GraphistryConnector
connector = GraphistryConnector(server="https://hub.graphistry.com")
connector.login(username="user", password="pass")

Docs expand to: - Guide: Graphistry and RAPIDS — knowledge graph visualization, RAPIDS cuGraph - Graphistry API ReferenceGraphistryConnector, graph builders, RAPIDS ops


GGUF and Quantization

README shows:

from llamatelemetry.api.gguf import quantize
quantize("model.gguf", "model-Q4_K_M.gguf", quant_type="Q4_K_M")

Docs expand to: - Guide: Quantization — quantization strategies, choosing the right type for T4 - GGUF API ReferenceGGMLType, quantize(), convert_hf_to_gguf(), merge_lora() - Quantization and Unsloth API — NF4, dynamic quant, Unsloth LoRA pipeline


Unsloth Fine-Tuning

README shows the export pipeline:

from llamatelemetry.unsloth import export_to_gguf
export_to_gguf(model, tokenizer, output_path="finetuned-Q4_K_M.gguf")

Docs expand to: - Guide: Unsloth Integration — full fine-tuning → GGUF → deployment pipeline - Quantization and Unsloth APIUnslothLoader, LoRAAdapter, GGUFExporter


Jupyter Integration

README shows:

from llamatelemetry.jupyter import ChatWidget
widget = ChatWidget(engine)
widget.display()

Docs expand to: - Guide: Jupyter Workflows — ChatWidget, streaming visualization, notebook patterns - Jupyter, Chat, and Embeddings APIChatWidget, ChatEngine, EmbeddingEngine, SemanticSearch


Examples

README points to the examples/ directory.

Docs expand to: - Guide: Examples Cookbook — annotated versions of all examples with explanations - Notebook Hub — 18 Jupyter tutorials covering foundation to production observability


Release Artifacts

README links to GitHub releases.

Docs expand to: - Release Artifacts — what is in each release archive, source vs binary distributions, how to install from a release


Changelog

README summarizes the current version.

Docs expand to: - Changelog — full annotated changelog for v0.1.1 covering all 10 modules, 18 notebooks, and the test suite


Contributing

README has a brief section.

Docs expand to: - Contributing — full contributing guide: dev setup, build instructions, test suite, code style, PR process, release process


License

MIT Licensegithub.com/llamatelemetry/llamatelemetry/blob/main/LICENSE


Documentation Site Structure

For a complete view of this documentation site's pages and sections:

llamatelemetry.github.io/
├── Home                         → docs/index.md
├── Get Started/
│   ├── Overview                 → get-started/index.md
│   ├── Installation             → get-started/installation.md
│   ├── Quickstart               → get-started/quickstart.md
│   └── Kaggle Quickstart        → get-started/kaggle-quickstart.md
├── Guides/
│   ├── Inference Engine         → guides/inference-engine.md
│   ├── Server Management        → guides/server-management.md
│   ├── Model Management         → guides/model-management.md
│   ├── API Client               → guides/api-client.md
│   ├── Telemetry & Observability→ guides/telemetry-observability.md
│   ├── Kaggle Environment       → guides/kaggle-environment.md
│   ├── Examples Cookbook        → guides/examples-cookbook.md
│   ├── Graphistry & RAPIDS      → guides/graphistry-rapids.md
│   ├── Quantization             → guides/quantization.md
│   ├── Unsloth Integration      → guides/unsloth.md
│   ├── CUDA Optimizations       → guides/cuda-optimizations.md
│   ├── Jupyter Workflows        → guides/jupyter-workflows.md
│   ├── Louie Knowledge Graphs   → guides/louie-knowledge-graphs.md
│   └── Troubleshooting          → guides/troubleshooting.md
├── API Reference/
│   ├── Reference Index          → reference/index.md
│   ├── Core API                 → reference/core-api.md
│   ├── Server and Models        → reference/server-models.md
│   ├── Client API               → reference/client-api.md
│   ├── GGUF API                 → reference/gguf-api.md
│   ├── Multi-GPU and NCCL       → reference/multigpu-nccl.md
│   ├── Telemetry API            → reference/telemetry-api.md
│   ├── Kaggle API               → reference/kaggle-api.md
│   ├── Graphistry API           → reference/graphistry-api.md
│   ├── Quantization & Unsloth   → reference/quantization-unsloth.md
│   ├── CUDA & Inference         → reference/cuda-inference-api.md
│   ├── Jupyter, Chat, Embeddings→ reference/jupyter-chat-embeddings.md
│   └── Louie API                → reference/louie-api.md
├── Notebooks/
│   ├── Notebook Hub             → notebooks/index.md
│   ├── Foundation Track         → notebooks/foundation.md
│   ├── Integration Track        → notebooks/integration.md
│   ├── Advanced Track           → notebooks/advanced.md
│   └── Observability Track      → notebooks/observability.md
└── Project/
    ├── Architecture             → project/architecture.md
    ├── File Map                 → project/file-map.md
    ├── Release Artifacts        → project/release-artifacts.md
    ├── FAQ                      → project/faq.md
    ├── README Map               → project/readme-map.md (this page)
    ├── Changelog                → project/changelog.md
    └── Contributing             → project/contributing.md