Graphistry and RAPIDS¶
llamatelemetry integrates with Graphistry for interactive graph visualization and RAPIDS cuGraph for GPU-accelerated graph analytics. This module enables visual exploration of inference traces, knowledge graphs, document similarity networks, and embedding spaces.
Overview¶
The Graphistry and RAPIDS integration provides:
- GraphistrySession -- manages authentication and connection to Graphistry Hub
- GraphistryBuilders -- pre-built graph constructors for common patterns
- Trace-to-Graph -- converts OpenTelemetry traces into graph structures
- GraphistryViz -- renders interactive graph visualizations
- RAPIDSBackend -- GPU-accelerated graph algorithms (PageRank, Louvain, UMAP)
- SplitGPUManager -- coordinates GPU allocation between LLM and graph workloads
- GraphWorkload -- tracks and manages graph computation tasks
GraphistrySession¶
Authentication¶
from llamatelemetry.graphistry.connector import GraphistrySession
# From explicit credentials
session = GraphistrySession(
server="https://hub.graphistry.com",
username="your-username",
password="your-password",
)
session.login()
From Kaggle Secrets¶
session = GraphistrySession.from_kaggle_secrets()
# Reads GRAPHISTRY_USERNAME, GRAPHISTRY_PASSWORD, GRAPHISTRY_SERVER
# from Kaggle notebook secrets
Session Methods¶
| Method | Description |
|---|---|
login() |
Authenticate with Graphistry server |
is_authenticated() |
Check if session is active |
register(edges_df, nodes_df) |
Register a graph for visualization |
plot() |
Render the graph in the notebook |
GraphistryBuilders¶
Pre-built graph constructors for common LLM observability patterns:
Knowledge Graph¶
Visualize entities and relationships extracted from text:
from llamatelemetry.graphistry.builders import GraphistryBuilders
# Assuming you have a KnowledgeGraph from LouieClient
from llamatelemetry.louie.knowledge import KnowledgeExtractor
extractor = KnowledgeExtractor()
kg = extractor.extract("NVIDIA created CUDA for parallel computing on GPUs.")
# Build Graphistry-compatible graph
nodes_df, edges_df = GraphistryBuilders.knowledge_graph(kg)
# Visualize
session = GraphistrySession.from_kaggle_secrets()
session.login()
g = session.register(edges_df, nodes_df)
g.plot()
Document Similarity¶
Build a similarity graph from document embeddings:
documents = [
"Flash attention reduces memory usage.",
"CUDA enables parallel GPU computing.",
"GGUF stores quantized model weights.",
"Attention is a key transformer component.",
]
embeddings = [...] # List of embedding vectors
nodes_df, edges_df = GraphistryBuilders.document_similarity(
documents=documents,
embeddings=embeddings,
threshold=0.7, # Minimum similarity to create an edge
)
Embedding KNN Graph¶
Build a K-nearest-neighbors graph from embedding vectors:
labels = ["doc_1", "doc_2", "doc_3", "doc_4"]
embeddings = [...] # numpy arrays
nodes_df, edges_df = GraphistryBuilders.embedding_knn(
labels=labels,
embeddings=embeddings,
k=3, # Number of nearest neighbors
)
Attention Graph¶
Visualize attention patterns from transformer layers:
# attention_matrix is a numpy array of shape (n_tokens, n_tokens)
tokens = ["[CLS]", "What", "is", "CUDA", "?", "[SEP]"]
attention_matrix = [...]
nodes_df, edges_df = GraphistryBuilders.attention_graph(
tokens=tokens,
attention_weights=attention_matrix,
threshold=0.1, # Minimum attention weight to show
)
Trace-to-Graph Pipeline¶
Convert OpenTelemetry traces into graph structures for analysis:
Step 1: Collect Trace Records¶
from llamatelemetry.graphistry.viz import traces_to_records
# Get trace records from the telemetry exporter
records = traces_to_records(span_data)
# Returns: list of dicts with trace_id, span_id, parent_id, name, duration, attributes
Step 2: Convert to DataFrame¶
from llamatelemetry.graphistry.viz import records_to_dataframe
df = records_to_dataframe(records)
print(df.columns)
# ['trace_id', 'span_id', 'parent_span_id', 'name', 'duration_ms',
# 'start_time', 'end_time', 'gen_ai.request.model', ...]
Step 3: Build Graph Structure¶
from llamatelemetry.graphistry.viz import build_graph_nodes_edges
nodes_df, edges_df = build_graph_nodes_edges(df)
# nodes: span_id, name, duration_ms, attributes
# edges: source (parent_span_id), target (span_id), relationship
Step 4: Visualize¶
session = GraphistrySession.from_kaggle_secrets()
session.login()
g = session.register(edges_df, nodes_df)
g.plot()
Latency Time Series¶
Build a time-series visualization of inference latencies:
from llamatelemetry.graphistry.viz import build_latency_time_series
ts_df = build_latency_time_series(df)
# DataFrame with timestamp, latency_ms, tokens_per_sec, model columns
# Suitable for plotting with matplotlib or Graphistry
RAPIDSBackend¶
GPU-accelerated graph algorithms via RAPIDS cuGraph:
PageRank¶
Find the most important nodes in the graph:
pagerank_scores = backend.pagerank(edges_df, source="src", target="dst")
print(pagerank_scores.head())
# Returns DataFrame with node_id and pagerank columns
Louvain Community Detection¶
Discover communities in the graph:
communities = backend.louvain(edges_df, source="src", target="dst")
print(communities.head())
# Returns DataFrame with node_id and community columns
# Inspect community sizes
community_sizes = communities["community"].value_counts()
print(f"Found {len(community_sizes)} communities")
Betweenness Centrality¶
Identify bridge nodes that connect different parts of the graph:
centrality = backend.betweenness_centrality(
edges_df, source="src", target="dst"
)
top_bridges = centrality.nlargest(10, "centrality")
print(top_bridges)
UMAP Dimensionality Reduction¶
Project high-dimensional embeddings to 2D for visualization:
import numpy as np
embeddings = np.random.randn(100, 768) # 100 documents, 768 dims
coords_2d = backend.umap(
embeddings,
n_neighbors=15,
min_dist=0.1,
n_components=2,
)
# Returns numpy array of shape (100, 2)
SplitGPUManager¶
Manage GPU allocation between LLM inference and graph analytics:
from llamatelemetry.kaggle.gpu_context import split_gpu_session
# GPU 0 for LLM inference, GPU 1 for RAPIDS/Graphistry
with split_gpu_session(llm_gpu=0, graph_gpu=1):
# LLM operations use GPU 0
engine.load_model("gemma-3-1b-Q4_K_M", auto_start=True)
result = engine.infer("What is CUDA?", max_tokens=128)
# RAPIDS operations automatically use GPU 1
backend = RAPIDSBackend()
scores = backend.pagerank(edges_df)
GraphWorkload¶
Track graph computation workloads:
from llamatelemetry.graphistry.workload import GraphWorkload
workload = GraphWorkload(name="trace-analysis")
workload.add_task("pagerank", edges_df)
workload.add_task("louvain", edges_df)
results = workload.execute(backend)
print(f"Tasks completed: {workload.completed_count}")
print(f"Total time: {workload.total_duration_ms:.1f} ms")
Complete Example¶
import llamatelemetry as lt
from llamatelemetry.graphistry.connector import GraphistrySession
from llamatelemetry.graphistry.builders import GraphistryBuilders
from llamatelemetry.graphistry.rapids import RAPIDSBackend
from llamatelemetry.louie.knowledge import KnowledgeExtractor
from llamatelemetry.kaggle.gpu_context import split_gpu_session
# Split GPUs
with split_gpu_session(llm_gpu=0, graph_gpu=1):
# 1. Run inference to generate text
with lt.InferenceEngine() as engine:
engine.load_model("gemma-3-1b-Q4_K_M", auto_start=True)
result = engine.infer(
"Describe the relationship between CUDA, GPUs, and deep learning.",
max_tokens=256,
)
# 2. Extract knowledge graph from generated text
extractor = KnowledgeExtractor()
kg = extractor.extract(result.text)
# 3. Build Graphistry visualization
nodes_df, edges_df = GraphistryBuilders.knowledge_graph(kg)
# 4. Run graph analytics
backend = RAPIDSBackend()
pagerank = backend.pagerank(edges_df)
communities = backend.louvain(edges_df)
print(f"Entities: {len(kg.entities)}")
print(f"Relationships: {len(kg.relationships)}")
print(f"Communities found: {communities['community'].nunique()}")
# 5. Visualize in Graphistry
session = GraphistrySession.from_kaggle_secrets()
session.login()
g = session.register(edges_df, nodes_df)
g.plot()
Dependencies¶
| Package | Required For | Install |
|---|---|---|
pygraphistry |
Visualization | pip install pygraphistry |
pandas |
DataFrames | pip install pandas |
cudf |
GPU DataFrames | RAPIDS install |
cugraph |
Graph algorithms | RAPIDS install |
cuml |
UMAP | RAPIDS install |
RAPIDS Availability
RAPIDS (cudf, cugraph, cuml) requires NVIDIA GPUs and specific CUDA versions. On Kaggle T4 instances, RAPIDS is pre-installed. For local development, see the RAPIDS installation guide.
Best Practices¶
- Split GPUs on Kaggle -- dedicate one GPU to inference and another to RAPIDS.
- Use thresholds in similarity graphs to avoid excessive edges.
- Start with PageRank for initial graph exploration before running more expensive algorithms.
- Cache embeddings -- recompute only when the underlying data changes.
- Use Graphistry Hub for sharing interactive visualizations with collaborators.
Related¶
- Louie Knowledge Graphs -- knowledge extraction
- Telemetry and Observability -- trace data source
- Kaggle Environment -- GPU splitting
- Graphistry API Reference