08 Document Network Analysis¶
Source: notebooks/08-document-network-analysis-graphistry-e1.ipynb
Notebook focus¶
This page is a cell-by-cell walkthrough of the notebook, explaining the intent of each step and showing the exact code executed.
Cell-by-cell walkthrough¶
Cell 1 (Markdown)¶
08 Document Network Analysis¶
Build document similarity networks with GraphistryBuilders.document_similarity().
What you will learn: - Model documents as nodes with metadata - Define pairwise similarity scores as edges - Generate DataFrames for graph visualization
Requirements: llamatelemetry installed. No GPU required.
Cell 2 (Markdown)¶
1) Install¶
Cell 3 (Code)¶
Summary: Installs required dependencies and runtime tools.
Cell 4 (Markdown)¶
2) Define documents and similarity scores¶
Cell 5 (Code)¶
Summary: Imports core libraries: llamatelemetry. Sets up Graphistry for graph visualization or analytics.
from llamatelemetry.graphistry import GraphistryBuilders
documents = [
{"id": "doc1", "title": "CUDA Programming Guide"},
{"id": "doc2", "title": "llama.cpp Server Architecture"},
{"id": "doc3", "title": "OpenTelemetry Metrics Specification"},
{"id": "doc4", "title": "NCCL Collective Operations"},
{"id": "doc5", "title": "GPU Memory Management"},
]
similarities = [
{"source": "doc1", "target": "doc2", "score": 0.72},
{"source": "doc1", "target": "doc4", "score": 0.85},
{"source": "doc1", "target": "doc5", "score": 0.91},
{"source": "doc2", "target": "doc3", "score": 0.68},
{"source": "doc4", "target": "doc5", "score": 0.77},
]
Cell 6 (Markdown)¶
3) Build the similarity graph¶
Cell 7 (Code)¶
Summary: Executes notebook-specific logic or data processing for this step.
nodes_df, edges_df = GraphistryBuilders.document_similarity(documents, similarities)
print("=== Document Nodes ===")
print(nodes_df.to_string(index=False))
print(f"\n=== Similarity Edges ({len(edges_df)}) ===")
print(edges_df.to_string(index=False))
Cell 8 (Markdown)¶
4) Filter by threshold¶
Cell 9 (Code)¶
Summary: Executes notebook-specific logic or data processing for this step.
threshold = 0.75
strong_edges = edges_df[edges_df["score"] >= threshold]
print(f"Edges with score >= {threshold}:")
print(strong_edges.to_string(index=False))
Cell 10 (Markdown)¶
5) Visualize (optional)¶
Cell 11 (Code)¶
Summary: Sets up Graphistry for graph visualization or analytics.