README Map
This page cross-references the main sections of the repository README with the corresponding in-depth documentation on this site.
Overview
The llamatelemetry README in the GitHub repository provides a concise introduction and quickstart. This documentation site expands every section into comprehensive guides, API references, and tutorials.
README Section → Documentation Mapping
Project Description
"A CUDA-first OpenTelemetry SDK for LLM inference observability"
Docs:
- Get Started: Overview — what llamatelemetry is, key capabilities, target audience
- Project Architecture — full 10-module architecture breakdown
- Project File Map — every file in the repository mapped to its purpose
Features List
| README Feature | Detailed Documentation |
|---|---|
| High-level InferenceEngine API | Core API Reference |
| Auto-download CUDA binary | Bootstrap internals |
| llama.cpp server management | Server and Models API |
| OpenAI-compatible client | Client API Reference |
| Multi-GPU split inference | Multi-GPU and NCCL API |
| OpenTelemetry tracing + metrics | Telemetry API Reference |
| 45 `gen_ai.*` semconv attributes | Telemetry API: Semantic Conventions |
| Kaggle T4 x2 presets | Kaggle API Reference |
| Graphistry + RAPIDS visualization | Graphistry API Reference |
| GGUF quantization and conversion | GGUF API Reference |
| Unsloth fine-tuning integration | Quantization and Unsloth API |
| Jupyter chat widget | Jupyter, Chat, and Embeddings API |
| MODEL_REGISTRY (30+ models) | Server and Models API: Registry |
| C++/CUDA extension | CUDA and Inference API |
Installation
README shows:
Docs expand to:
- Installation Guide — pip, editable install, optional extras, CUDA requirements, binary setup
- FAQ: Installation — common install questions and troubleshooting
Quickstart
README shows a 5-line example:
```python
import llamatelemetry
with llamatelemetry.InferenceEngine() as engine:
    engine.load_model("gemma-3-1b-Q4_K_M")
    result = engine.infer("Hello, world!")
    print(result.text)
```
Docs expand to:
- Quickstart Guide — step-by-step with explanations, streaming, batch inference, embeddings
- Core API Reference — full InferenceEngine API with all parameters
Kaggle Setup
README shows the Kaggle one-liner:
```python
from llamatelemetry.kaggle import KaggleEnvironment
env = KaggleEnvironment()
env.quick_setup(hf_token="your-token")
```
Docs expand to:
- Kaggle Quickstart — full Kaggle notebook walkthrough for dual-T4
- Kaggle Environment Guide — split GPU sessions, secrets, presets
- Kaggle API Reference — KaggleEnvironment, KaggleSecrets, split_gpu_session, ServerPreset
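The one-liner above passes the Hugging Face token as a string literal, which is easy to leak when a notebook is shared. A minimal sketch of reading it from the environment instead, assuming the token is exported as `HF_TOKEN` (the variable `huggingface_hub` also reads); `get_hf_token` is a hypothetical helper, and `KaggleSecrets` in the Kaggle API Reference may already cover this case:

```python
import os


def get_hf_token(var_name: str = "HF_TOKEN") -> str:
    """Return a Hugging Face token read from an environment variable.

    HF_TOKEN is the assumed default name; pass another name if your
    notebook stores the token under a different variable.
    """
    token = os.environ.get(var_name, "").strip()
    if not token:
        raise RuntimeError(
            f"{var_name} is not set; export it (for example from Kaggle "
            "Secrets) instead of hard-coding the token in the notebook"
        )
    return token
```

With the variable set, the call becomes `env.quick_setup(hf_token=get_hf_token())`.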
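Multi-GPU Inference
README shows:
```python
import llamatelemetry
from llamatelemetry.api.multigpu import kaggle_t4_dual_config

config = kaggle_t4_dual_config(model_size_b=13.0)  # dual-T4 config sized for a 13B model
with llamatelemetry.InferenceEngine() as engine:
    engine.load_model("model-Q4_K_M", multi_gpu_config=config)  # placeholder model name
```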
Docs expand to:
- Multi-GPU and NCCL API — MultiGPUConfig, SplitMode, NCCLCommunicator, all detection functions
- Guide: CUDA Optimizations — CUDAGraph, TensorCore, FlashAttention for multi-GPU
- Guide: Kaggle Environment — split GPU session for LLM + visualization
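`model_size_b=13.0` is the parameter count in billions. For rough VRAM planning, the weight footprint of a quantized GGUF model is about parameters × bits per weight ÷ 8 bytes. The sketch below applies that arithmetic to two 16 GB T4s; the ~4.8 bits per weight for Q4_K_M is an illustrative figure (actual values vary by model), the even split is an assumption, and the estimate ignores KV cache, CUDA context, and compute buffers. The helper names are hypothetical and not part of `llamatelemetry.api.multigpu`.

```python
T4_VRAM_GB = 16.0  # nominal memory of each Kaggle T4


def estimate_weight_gb(params_b: float, bits_per_weight: float) -> float:
    """Approximate weight size in GB: params x bits per weight / 8 bytes.

    params_b is in billions, so the 1e9 factors cancel (1 GB = 1e9 bytes).
    """
    return params_b * bits_per_weight / 8


def per_gpu_weight_gb(params_b, bits_per_weight, fractions=(0.5, 0.5)):
    """Split the weight estimate across GPUs by the given proportions.

    Proportions need not sum to 1 (as with llama.cpp's --tensor-split);
    they are normalized by their total. The default is an even two-way split.
    """
    total = sum(fractions)
    weights_gb = estimate_weight_gb(params_b, bits_per_weight)
    return [weights_gb * f / total for f in fractions]


# 4.8 bits per weight is an assumed, illustrative value for Q4_K_M.
total_gb = estimate_weight_gb(13.0, 4.8)
split = per_gpu_weight_gb(13.0, 4.8)
print(f"weights ~{total_gb:.1f} GB total, {[round(g, 1) for g in split]} GB per T4")
print(f"headroom per T4 before KV cache and buffers: ~{T4_VRAM_GB - max(split):.1f} GB")
```

Passing `fractions=(3, 1)` models an uneven split, which is useful when one GPU is also hosting visualization work.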
OpenTelemetry Integration
README shows:
```python
from llamatelemetry.telemetry import setup_telemetry
setup_telemetry(
    service_name="my-llm",
    otlp_endpoint="https://otlp.example.com/v1/traces",
)
```
Docs expand to:
- Telemetry and Observability Guide — end-to-end telemetry setup, metrics, exporters
- Telemetry API Reference — all 45 gen_ai.* attributes, 5 metrics, all classes
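The README's endpoint ends in `/v1/traces`, the default OTLP/HTTP path for traces; the OTLP specification uses `/v1/metrics` and `/v1/logs` for the other signals. If you need the companion URLs for the same collector (for example, to point a separately configured metrics exporter at it), a small helper can derive them. This is a sketch assuming an OTLP/HTTP collector; `otlp_signal_endpoints` is hypothetical, and whether `setup_telemetry` expects a signal-specific or a base URL is left to the Telemetry API Reference.

```python
from urllib.parse import urlsplit, urlunsplit

# Default OTLP/HTTP path for each signal, per the OTLP specification.
OTLP_HTTP_PATHS = {
    "traces": "/v1/traces",
    "metrics": "/v1/metrics",
    "logs": "/v1/logs",
}


def otlp_signal_endpoints(endpoint: str) -> dict:
    """Derive per-signal OTLP/HTTP URLs from a base or signal-specific URL.

    Any query string or fragment on the input is dropped.
    """
    parts = urlsplit(endpoint)
    path = parts.path.rstrip("/")
    # Strip a trailing signal path so both base and signal URLs are accepted.
    for suffix in OTLP_HTTP_PATHS.values():
        if path.endswith(suffix):
            path = path[: -len(suffix)]
            break
    return {
        signal: urlunsplit((parts.scheme, parts.netloc, path + suffix, "", ""))
        for signal, suffix in OTLP_HTTP_PATHS.items()
    }
```

For example, `otlp_signal_endpoints("https://otlp.example.com/v1/traces")["metrics"]` gives the metrics URL on the same host.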
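Model Management
README shows the three ways to reference a model:
```python
# `engine` is an InferenceEngine (see Quickstart); each line is an alternative form
engine.load_model("gemma-3-4b-Q4_K_M")      # From registry
engine.load_model("/path/to/model.gguf")    # Local file
engine.load_model("repo/id:filename.gguf")  # Hugging Face (repo:file)
```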
Docs expand to:
- Guide: Model Management — registry reference, SmartModelDownloader, VRAM planning
- Server and Models API — MODEL_REGISTRY, SmartModelDownloader, load_model_smart
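The three forms accepted by `load_model` can be told apart from the string alone. The sketch below illustrates one way to do that; it is not llamatelemetry's actual resolution logic, and `ModelRef`, `classify_model_ref`, and the regular expression are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical pattern for the README's "repo/id:filename.gguf" form.
_HF_SPEC = re.compile(
    r"^(?P<repo>[\w.-]+/[\w.-]+):(?P<filename>.+\.gguf)$", re.IGNORECASE
)


@dataclass
class ModelRef:
    kind: str  # "huggingface", "local", or "registry"
    value: str  # repo id, file path, or registry name
    filename: Optional[str] = None  # only set for Hugging Face references


def classify_model_ref(ref: str) -> ModelRef:
    """Classify a model reference string into one of the three README forms."""
    match = _HF_SPEC.match(ref)
    if match:
        return ModelRef("huggingface", match.group("repo"), match.group("filename"))
    if ref.lower().endswith(".gguf"):
        # Any other string ending in .gguf is treated as a path on disk.
        return ModelRef("local", ref)
    # Bare names such as "gemma-3-4b-Q4_K_M" are looked up in the registry.
    return ModelRef("registry", ref)
```

Checking the Hugging Face pattern first matters: `repo/id:filename.gguf` also ends in `.gguf` and would otherwise be mistaken for a local path.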
Graphistry Visualization
README shows:
```python
from llamatelemetry.graphistry import GraphistryConnector
connector = GraphistryConnector(server="https://hub.graphistry.com")
connector.login(username="user", password="pass")
```
Docs expand to:
- Guide: Graphistry and RAPIDS — knowledge graph visualization, RAPIDS cuGraph
- Graphistry API Reference — GraphistryConnector, graph builders, RAPIDS ops
GGUF and Quantization
README shows:
```python
from llamatelemetry.api.gguf import quantize
quantize("model.gguf", "model-Q4_K_M.gguf", quant_type="Q4_K_M")
```
Docs expand to:
- Guide: Quantization — quantization strategies, choosing the right type for T4
- GGUF API Reference — GGMLType, quantize(), convert_hf_to_gguf(), merge_lora()
- Quantization and Unsloth API — NF4, dynamic quant, Unsloth LoRA pipeline
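Before handing a file to `quantize()`, it can help to confirm that it really is GGUF. Per the GGUF format description in the ggml repository, a file starts with the 4-byte magic `GGUF`, then a `uint32` version and, from version 2 onward, `uint64` tensor and metadata key-value counts, little-endian by default. A minimal stdlib-only sketch that reads just that fixed header; `read_gguf_header` is a hypothetical helper, not part of `llamatelemetry.api.gguf`:

```python
import struct
from typing import NamedTuple

GGUF_MAGIC = b"GGUF"


class GGUFHeader(NamedTuple):
    version: int
    tensor_count: int
    metadata_kv_count: int


def read_gguf_header(path: str) -> GGUFHeader:
    """Read the fixed GGUF header: magic, uint32 version, two uint64 counts."""
    with open(path, "rb") as f:
        header = f.read(24)  # 4 (magic) + 4 (version) + 8 + 8 (counts)
    if len(header) < 8 or header[:4] != GGUF_MAGIC:
        raise ValueError(f"{path!r} does not start with the GGUF magic")
    (version,) = struct.unpack_from("<I", header, 4)  # little-endian uint32
    if version < 2:
        # GGUF v1 stored the counts as uint32; this sketch only handles v2+.
        raise ValueError(f"unsupported GGUF version {version}")
    if len(header) < 24:
        raise ValueError(f"{path!r} has a truncated GGUF header")
    tensor_count, kv_count = struct.unpack_from("<QQ", header, 8)
    return GGUFHeader(version, tensor_count, kv_count)
```

Reading only 24 bytes keeps the check cheap even for multi-gigabyte model files.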
Unsloth Fine-Tuning
README shows the export pipeline:
```python
from llamatelemetry.unsloth import export_to_gguf

# model and tokenizer come from an Unsloth fine-tuning run
export_to_gguf(model, tokenizer, output_path="finetuned-Q4_K_M.gguf")
```
Docs expand to:
- Guide: Unsloth Integration — full fine-tuning → GGUF → deployment pipeline
- Quantization and Unsloth API — UnslothLoader, LoRAAdapter, GGUFExporter
Jupyter Integration
README shows:
Docs expand to:
- Guide: Jupyter Workflows — ChatWidget, streaming visualization, notebook patterns
- Jupyter, Chat, and Embeddings API — ChatWidget, ChatEngine, EmbeddingEngine, SemanticSearch
Examples
README points to the `examples/` directory.
Docs expand to:
- Guide: Examples Cookbook — annotated versions of all examples with explanations
- Notebook Hub — 18 Jupyter tutorials, from the foundation track through production observability
Release Artifacts
README links to GitHub releases.
Docs expand to:
- Release Artifacts — what is in each release archive, source vs. binary distributions, how to install from a release
Changelog
README summarizes the current version.
Docs expand to:
- Changelog — full annotated changelog for v0.1.1 covering all 10 modules, 18 notebooks, and the test suite
Contributing
README has a brief section.
Docs expand to:
- Contributing — full contributing guide: dev setup, build instructions, test suite, code style, PR process, release process
License
MIT License — github.com/llamatelemetry/llamatelemetry/blob/main/LICENSE
Documentation Site Structure
For a complete view of this documentation site's pages and sections:
```text
llamatelemetry.github.io/
├── Home                          → docs/index.md
├── Get Started/
│   ├── Overview                  → get-started/index.md
│   ├── Installation              → get-started/installation.md
│   ├── Quickstart                → get-started/quickstart.md
│   └── Kaggle Quickstart         → get-started/kaggle-quickstart.md
├── Guides/
│   ├── Inference Engine          → guides/inference-engine.md
│   ├── Server Management         → guides/server-management.md
│   ├── Model Management          → guides/model-management.md
│   ├── API Client                → guides/api-client.md
│   ├── Telemetry & Observability → guides/telemetry-observability.md
│   ├── Kaggle Environment        → guides/kaggle-environment.md
│   ├── Examples Cookbook         → guides/examples-cookbook.md
│   ├── Graphistry & RAPIDS       → guides/graphistry-rapids.md
│   ├── Quantization              → guides/quantization.md
│   ├── Unsloth Integration       → guides/unsloth.md
│   ├── CUDA Optimizations        → guides/cuda-optimizations.md
│   ├── Jupyter Workflows         → guides/jupyter-workflows.md
│   ├── Louie Knowledge Graphs    → guides/louie-knowledge-graphs.md
│   └── Troubleshooting           → guides/troubleshooting.md
├── API Reference/
│   ├── Reference Index           → reference/index.md
│   ├── Core API                  → reference/core-api.md
│   ├── Server and Models         → reference/server-models.md
│   ├── Client API                → reference/client-api.md
│   ├── GGUF API                  → reference/gguf-api.md
│   ├── Multi-GPU and NCCL        → reference/multigpu-nccl.md
│   ├── Telemetry API             → reference/telemetry-api.md
│   ├── Kaggle API                → reference/kaggle-api.md
│   ├── Graphistry API            → reference/graphistry-api.md
│   ├── Quantization & Unsloth    → reference/quantization-unsloth.md
│   ├── CUDA & Inference          → reference/cuda-inference-api.md
│   ├── Jupyter, Chat, Embeddings → reference/jupyter-chat-embeddings.md
│   └── Louie API                 → reference/louie-api.md
├── Notebooks/
│   ├── Notebook Hub              → notebooks/index.md
│   ├── Foundation Track          → notebooks/foundation.md
│   ├── Integration Track         → notebooks/integration.md
│   ├── Advanced Track            → notebooks/advanced.md
│   └── Observability Track       → notebooks/observability.md
└── Project/
    ├── Architecture              → project/architecture.md
    ├── File Map                  → project/file-map.md
    ├── Release Artifacts         → project/release-artifacts.md
    ├── FAQ                       → project/faq.md
    ├── README Map                → project/readme-map.md (this page)
    ├── Changelog                 → project/changelog.md
    └── Contributing              → project/contributing.md
```
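When the navigation changes, a quick consistency check is to confirm that every page in the tree actually exists. A minimal sketch, assuming the paths on the right are relative to the documentation source directory; `SITE_PAGES` is a hand-copied excerpt of the tree and `missing_pages` is a hypothetical helper, not part of the repository's tooling.

```python
from pathlib import Path
from typing import Iterable, List

# Excerpt of the site tree; paths assumed relative to the docs source directory.
SITE_PAGES = [
    "get-started/index.md",
    "get-started/installation.md",
    "guides/telemetry-observability.md",
    "reference/telemetry-api.md",
    "project/readme-map.md",
]


def missing_pages(docs_root: str, pages: Iterable[str] = SITE_PAGES) -> List[str]:
    """Return the listed pages that are not regular files under docs_root."""
    root = Path(docs_root)
    return [page for page in pages if not (root / page).is_file()]
```

Run against the docs directory, an empty list means every listed page is present; any returned path points to a broken navigation entry.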