Tutorial Notebooks¶
llamatelemetry includes 16 comprehensive Jupyter notebooks, spanning foundational concepts through production-ready observability workflows. Total time: 5.5 hours.
Quick Navigation¶
- Foundation: Beginner-friendly tutorials (65 minutes). Notebooks 01-04
- Integration: Intermediate integration tutorials (60 minutes). Notebooks 05-06
- Advanced: Advanced applications (65 minutes). Notebooks 07-08
- Production: Optimization & production (120 minutes). Notebooks 09-11
- Deep Dive: Model internals (80 minutes). Notebooks 12-13
- Observability ⭐ NEW: Production observability (120 minutes). Notebooks 14-16
Foundation (Beginner)¶
Total time: 65 minutes | Difficulty: Beginner
Perfect starting point for llamatelemetry. Learn the basics of GGUF inference, multi-GPU setup, and quantization.
01: Quick Start (10 min)¶
Basic inference setup with llamatelemetry on Kaggle dual T4.
- Install llamatelemetry v0.1.0
- Download GGUF model from HuggingFace
- Start llama-server with split-GPU configuration
- Run chat completions and streaming
- Monitor GPU memory usage
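The launch step above can be sketched as assembling the llama-server command line with split-GPU flags. This is a minimal illustration using standard llama.cpp flags; the model path and port are placeholder assumptions, not values from the notebook.

```python
# Hypothetical sketch: building a llama-server launch command for Kaggle's
# dual T4 setup. Model path and port are illustrative placeholders.
import shlex

def build_server_cmd(model_path, n_gpu_layers=99, tensor_split="0.5,0.5", port=8080):
    """Assemble the llama-server argument list (standard llama.cpp flags)."""
    return [
        "llama-server",
        "--model", model_path,
        "--n-gpu-layers", str(n_gpu_layers),  # offload all layers to GPU
        "--tensor-split", tensor_split,       # share weights across both T4s
        "--host", "127.0.0.1",
        "--port", str(port),
    ]

cmd = build_server_cmd("models/llama-3-8b-Q4_K_M.gguf")
print(shlex.join(cmd))
```

The resulting command could then be launched with `subprocess.Popen(cmd)` and polled until the server's health endpoint responds.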
02: Server Setup (15 min)¶
Advanced server configuration and optimization.
- ServerManager API deep dive
- GPU layer allocation strategies
- Context size optimization
- FlashAttention configuration
- Batch processing settings
03: Multi-GPU Inference (20 min)¶
Dual GPU tensor parallelism and split-GPU workflows.
- Tensor split configuration (`tensor_split`)
- Split-GPU architecture (GPU 0: LLM, GPU 1: Analytics)
- VRAM distribution across GPUs
- Performance comparison (single vs dual GPU)
- Best practices for Kaggle T4×2
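The VRAM-distribution idea can be illustrated with simple arithmetic: the `tensor_split` ratios determine what fraction of the model's weights each GPU holds. The model size below is an assumed example, not a measured figure.

```python
# Illustrative sketch: how tensor_split ratios divide model weights across
# GPUs. The 8 GB model size is an assumption for demonstration only.
def vram_per_gpu(model_gb, tensor_split):
    """Split a model's weight footprint proportionally to tensor_split."""
    total = sum(tensor_split)
    return [model_gb * share / total for share in tensor_split]

# An 8 GB quantized model split evenly across two T4s:
shares = vram_per_gpu(8.0, [0.5, 0.5])
print(shares)  # → [4.0, 4.0]
```

An uneven split such as `[0.6, 0.4]` is useful when GPU 1 must also hold analytics workloads, as in the split-GPU architecture above.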
04: GGUF Quantization (20 min)¶
Comprehensive guide to GGUF quantization types and selection.
- 29 quantization types overview
- K-Quants vs I-Quants
- Quality vs size tradeoffs
- VRAM estimation formulas
- Model selection guide
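A VRAM estimate of the kind covered in the notebook can be sketched as parameters × bytes-per-weight × overhead. The bits-per-weight figures below are approximate published values for llama.cpp quant types, and the 1.2× overhead factor (KV cache, activations) is an assumption for illustration.

```python
# Rough VRAM estimator for quantized GGUF models. Bits-per-weight values are
# approximate; the 1.2x overhead factor is an illustrative assumption.
BITS_PER_WEIGHT = {"Q4_K_M": 4.85, "Q5_K_M": 5.69, "Q8_0": 8.5, "F16": 16.0}

def estimate_vram_gb(n_params_billions, quant, overhead=1.2):
    """Estimate VRAM in GB: params (billions) x bytes/weight x overhead."""
    bytes_per_weight = BITS_PER_WEIGHT[quant] / 8
    return n_params_billions * bytes_per_weight * overhead

print(round(estimate_vram_gb(7, "Q4_K_M"), 2))   # ~5 GB for a 7B model
print(round(estimate_vram_gb(13, "Q4_K_M"), 2))  # a 13B model still fits on dual T4s
```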
Integration (Intermediate)¶
Total time: 60 minutes | Difficulty: Intermediate
Integrate llamatelemetry with Unsloth fine-tuning and Graphistry visualization.
05: Unsloth Integration (30 min)¶
Complete workflow from fine-tuning to deployment.
- Unsloth fine-tuning setup
- GGUF export with llama.cpp
- Model quantization
- Deployment with llamatelemetry
- End-to-end pipeline
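The conversion and quantization steps of the pipeline can be sketched as two llama.cpp command-line invocations; the file and directory names here are placeholders.

```python
# Sketch of the GGUF export + quantization steps using llama.cpp's CLI tools
# (convert_hf_to_gguf.py and llama-quantize). Paths are placeholders.
convert = ["python", "convert_hf_to_gguf.py", "finetuned-model/",
           "--outfile", "model-f16.gguf"]
quantize = ["llama-quantize", "model-f16.gguf", "model-Q4_K_M.gguf", "Q4_K_M"]

for step in (convert, quantize):
    print(" ".join(step))  # in practice, run each with subprocess.run(step)
```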
06: Split-GPU Graphistry (30 min)¶
Concurrent LLM inference and RAPIDS analytics.
- Split-GPU architecture setup
- RAPIDS cuGraph on GPU 1
- Graphistry interactive visualization
- Zero-copy data transfer
- Performance optimization
Advanced Applications¶
Total time: 65 minutes | Difficulty: Advanced
Build sophisticated LLM-powered applications with graph analytics.
07: Knowledge Graph Extraction (35 min)¶
LLM-powered knowledge graph construction.
- Entity and relationship extraction
- Knowledge graph schema design
- Graph construction with RAPIDS cuGraph
- Graphistry visualization
- Real-world use cases
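The extraction step can be sketched as prompting the LLM for (subject, relation, object) triples in JSON and parsing them into graph edges. The response below is a hard-coded stand-in for an actual model call, with invented example entities.

```python
# Minimal sketch of entity/relation extraction: the LLM is asked to emit
# triples as JSON; here the response is a hard-coded illustrative stand-in.
import json

llm_response = '''[
  {"subject": "Ada Lovelace", "relation": "worked_with", "object": "Charles Babbage"},
  {"subject": "Charles Babbage", "relation": "designed", "object": "Analytical Engine"}
]'''

triples = [(t["subject"], t["relation"], t["object"])
           for t in json.loads(llm_response)]
nodes = {s for s, _, _ in triples} | {o for _, _, o in triples}
print(len(nodes), "nodes,", len(triples), "edges")
```

From here, the edge list can be loaded into cuGraph or handed to Graphistry for interactive exploration.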
08: Document Network Analysis (30 min)¶
Document similarity networks and clustering.
- Document embedding with LLM
- Similarity computation
- Network construction
- Community detection
- Interactive exploration
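The similarity-network step can be sketched with NumPy: compute pairwise cosine similarity between document embeddings, then keep pairs above a threshold as edges. The embeddings and threshold below are toy values for illustration.

```python
# Sketch of similarity computation + network construction. The embeddings
# are toy 2-D vectors; real document embeddings would come from the LLM.
import numpy as np

emb = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]])   # 3 toy "documents"
norm = emb / np.linalg.norm(emb, axis=1, keepdims=True)
sim = norm @ norm.T                                     # pairwise cosine similarity

threshold = 0.8
edges = [(i, j) for i in range(len(emb))
         for j in range(i + 1, len(emb)) if sim[i, j] > threshold]
print(edges)  # → [(0, 1)]  — only documents 0 and 1 are similar enough
```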
Optimization & Production¶
Total time: 120 minutes | Difficulty: Advanced-Expert
Optimize for large models and build production workflows.
09: Large Models (13B-70B) (35 min)¶
Run massive models on Kaggle dual T4 with advanced techniques.
- Tensor split optimization for 13B+ models
- Layer offloading strategies
- Quantization selection for large models
- Memory management
- Performance tuning
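A back-of-the-envelope version of the layer-offloading decision: given a VRAM budget and an estimated per-layer weight size, how many layers fit on each GPU? Both the per-layer size and the reserve for KV cache below are illustrative assumptions.

```python
# Back-of-the-envelope layer offloading for large models. The per-layer size
# and KV-cache reserve are illustrative assumptions, not measured values.
def max_gpu_layers(vram_gb, layer_gb, reserve_gb=2.0):
    """Layers that fit after reserving VRAM for KV cache and activations."""
    return int((vram_gb - reserve_gb) // layer_gb)

# A hypothetical large quantized model at ~0.5 GB per layer on 15 GB T4s:
per_gpu = max_gpu_layers(15.0, 0.5)
print(per_gpu, "layers per GPU,", per_gpu * 2, "across both T4s")
```

Layers that do not fit stay on the CPU; llama.cpp streams them as needed, trading throughput for capacity.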
10: Complete Workflow (45 min)¶
End-to-end production pipeline from training to deployment.
- Data preparation
- Unsloth fine-tuning
- GGUF conversion and quantization
- llamatelemetry deployment
- Monitoring and observability
11: GGUF Neural Network Visualization (40 min)¶
Groundbreaking architecture visualization with 929 nodes and 981 edges.
- GGUF file parsing
- Neural network graph extraction
- Layer-by-layer visualization
- Interactive exploration with Graphistry
- Architecture analysis
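The graph-extraction idea rests on the fact that GGUF tensor names such as `blk.0.attn_q.weight` encode the architecture, so nodes and edges can be recovered by parsing names. The tensor names below are a small illustrative sample, not a full model dump.

```python
# Sketch of neural-network graph extraction from GGUF tensor names. The
# names below are a small illustrative sample of the real naming scheme.
import re

tensor_names = [
    "token_embd.weight",
    "blk.0.attn_q.weight", "blk.0.attn_k.weight", "blk.0.ffn_up.weight",
    "blk.1.attn_q.weight", "blk.1.ffn_up.weight",
    "output.weight",
]

layers = {}
for name in tensor_names:
    m = re.match(r"blk\.(\d+)\.(\w+)\.weight", name)
    if m:
        layers.setdefault(int(m.group(1)), []).append(m.group(2))

# Chain consecutive transformer blocks into edges: blk.0 → blk.1 → …
edges = [(i, i + 1) for i in sorted(layers) if i + 1 in layers]
print(len(layers), "layers,", edges)
```

In the notebook, the same idea is applied to every tensor in the file (the `gguf` Python package can list them), yielding the full 929-node, 981-edge graph rendered in Graphistry.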
Deep Dive¶
Total time: 80 minutes | Difficulty: Expert
Explore model internals with interactive visualizations.
12: Attention Mechanism Explorer (25 min)¶
Q-K-V decomposition and visualization of 896 attention heads.
- Attention mechanism breakdown
- Query, Key, Value tensor analysis
- Multi-head attention visualization
- Attention pattern analysis
- Layer-wise comparison
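The mechanism explored above can be reproduced in a few lines of NumPy: scaled dot-product attention for a single head. Shapes and random inputs are toy values for illustration.

```python
# Toy single-head scaled dot-product attention with NumPy. Shapes are
# illustrative; real Q, K, V come from the model's weight matrices.
import numpy as np

rng = np.random.default_rng(0)
seq_len, d_k = 4, 8
Q, K, V = (rng.standard_normal((seq_len, d_k)) for _ in range(3))

scores = Q @ K.T / np.sqrt(d_k)                    # (seq_len, seq_len)
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)     # softmax: rows sum to 1
out = weights @ V                                  # attention output
print(out.shape)
```

The `weights` matrix is exactly what the notebook visualizes per head: which positions each token attends to.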
13: Token Embedding Visualizer (30 min)¶
3D UMAP embedding space exploration.
- Token embedding extraction
- Dimensionality reduction with UMAP
- 3D visualization with Plotly
- Semantic clustering analysis
- Interactive exploration
Observability Trilogy ⭐ NEW¶
Total time: 120 minutes | Difficulty: Intermediate-Expert
Production-grade observability with OpenTelemetry, GPU monitoring, and real-time dashboards.
14: OpenTelemetry LLM Observability (45 min)¶
Full OpenTelemetry integration with semantic conventions.
- Complete OpenTelemetry setup (traces, metrics, logs)
- LLM-specific semantic attributes
- Distributed context propagation
- OTLP export to popular backends
- Graph-based trace visualization with Graphistry
What you'll build: Observable LLM service with distributed tracing
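The LLM-specific semantic attributes can be sketched as a span-attribute dictionary following the OpenTelemetry GenAI semantic conventions. The attribute names below come from those published conventions; the values are made up for illustration.

```python
# Sketch of LLM span attributes per the OpenTelemetry GenAI semantic
# conventions. Attribute names are from the conventions; values are made up.
span_attributes = {
    "gen_ai.system": "llama.cpp",
    "gen_ai.request.model": "llama-3-8b-Q4_K_M",
    "gen_ai.request.temperature": 0.7,
    "gen_ai.usage.input_tokens": 128,
    "gen_ai.usage.output_tokens": 256,
}

total_tokens = (span_attributes["gen_ai.usage.input_tokens"]
                + span_attributes["gen_ai.usage.output_tokens"])
print(total_tokens)
```

In the notebook, a dictionary like this is attached to each request span (for example via `span.set_attributes(...)` on an OpenTelemetry span) before OTLP export.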
15: Real-time Performance Monitoring (30 min)¶
Live GPU monitoring with real-time Plotly dashboards.
- llama.cpp `/metrics` endpoint integration
- PyNVML GPU monitoring (VRAM, temperature, power)
- Real-time Plotly FigureWidget dashboards
- Live metric updates (1-second intervals)
- Multi-panel visualization layout
What you'll build: Live performance dashboard with GPU metrics
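Polling the `/metrics` endpoint amounts to parsing Prometheus-format text into name/value pairs. The sample payload and metric names below are illustrative, not an exact dump of llama-server's output.

```python
# Sketch of parsing llama-server's Prometheus-format /metrics text. The
# sample payload and metric names are illustrative, not an exact dump.
sample = """\
llamacpp:prompt_tokens_total 1024
llamacpp:tokens_predicted_total 2048
llamacpp:kv_cache_usage_ratio 0.37
"""

metrics = {}
for line in sample.splitlines():
    if line and not line.startswith("#"):    # skip HELP/TYPE comment lines
        name, value = line.rsplit(" ", 1)
        metrics[name] = float(value)

print(metrics["llamacpp:kv_cache_usage_ratio"])
```

In the live dashboard, the same parse runs on `requests.get("http://127.0.0.1:8080/metrics").text` once per second, and the values feed the Plotly FigureWidget panels alongside PyNVML readings.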
16: Production Observability Stack (45 min)¶
Complete production stack with multi-layer telemetry.
- Full OpenTelemetry + GPU monitoring integration
- Advanced Graphistry trace visualization
- Comprehensive Plotly dashboards (2D + 3D)
- Multi-layer telemetry collection
- Production deployment patterns
What you'll build: Complete production observability stack
Learning Paths¶
Choose your path based on your goals:
Path 1: Quick Start (1 hour)¶
Goal: Get running fast
Perfect for beginners who want to start running inference quickly.
Path 2: Full Foundation (3 hours)¶
Goal: Master the fundamentals
Complete foundation for production LLM systems.
Path 3: Observability Focus ⭐ RECOMMENDED (2.5 hours)¶
Goal: Build observable systems
The best path for building a production observability stack.
Path 4: Graph Analytics (2.5 hours)¶
Goal: LLM-powered analytics
Build graph-based LLM applications.
Path 5: Large Model Specialist (2 hours)¶
Goal: Run 70B models
Optimize for massive models on limited hardware.
Path 6: Complete Mastery (5.5 hours)¶
Goal: Master everything
Complete llamatelemetry mastery from basics to production.
Kaggle Notebooks¶
All tutorials are available as Kaggle notebooks:
- Repository: llamatelemetry/notebooks
- Format: Jupyter notebooks (`.ipynb`)
- Execution: Pre-executed with outputs
- Requirements: Kaggle T4 x2, Internet enabled
Running on Kaggle¶
- Go to Kaggle
- Create new notebook or upload tutorial
- Set Accelerator to GPU T4 x2
- Enable Internet
- Run cells sequentially
Prerequisites¶
Knowledge¶
- Python: Intermediate level
- CUDA/GPU: Basic understanding
- LLMs: Familiarity with language models
- OpenTelemetry: None required (taught in tutorials)
Hardware¶
- Recommended: Kaggle dual Tesla T4 (30GB VRAM total)
- Minimum: Single Tesla T4 (15GB VRAM)
Software¶
- Python 3.11+
- CUDA 12.x (pre-installed on Kaggle)
- Internet connection
Getting Help¶
- Documentation: Full documentation
- GitHub Issues: Report problems
- Discussions: Ask questions
- Troubleshooting: Common issues
Ready to learn? Start with Tutorial 01: Quick Start or jump to the Observability Trilogy.