Troubleshooting¶
Common issues and solutions for llamatelemetry v0.1.0.
Installation Issues¶
Binary Download Fails¶
Symptom: Error during import llamatelemetry or binary download timeout
Solutions:
-
Clear cache and retry:
-
Check internet connection:
-
Manual download from HuggingFace Hub
CUDA Not Available¶
Symptom: check_cuda_available() returns False
Solutions:
-
Verify CUDA installation:
-
Check Kaggle accelerator settings:
- Settings → Accelerator → GPU T4 x2
-
Restart notebook session
-
Verify GPU detection:
Version Conflicts¶
Symptom: ImportError or version mismatch warnings
Solution: Force reinstall with no cache:
!pip install -q --no-cache-dir --force-reinstall \
git+https://github.com/llamatelemetry/llamatelemetry.git@v0.1.0
Server Issues¶
Server Won't Start¶
Symptom: start_server() fails or hangs
Solutions:
-
Check if port 8080 is in use:
-
Use different port:
-
Check model file exists:
Out of Memory (OOM)¶
Symptom: CUDA out of memory error
Solutions:
-
Use smaller model:
-
Reduce context size:
-
Check VRAM usage:
Slow Inference¶
Symptom: Low tokens/sec throughput
Solutions:
-
Enable FlashAttention:
-
Use GPU 0 only:
-
Verify GPU utilization:
OpenTelemetry Issues¶
No Spans Captured¶
Symptom: get_finished_spans() returns empty list
Solutions:
-
Attach exporter BEFORE making requests:
-
Use SimpleSpanProcessor for testing:
-
Check spans were actually created:
OTLP Export Fails with 404¶
Symptom: OTLP exporter returns 404 error
Solution: Use explicit endpoint paths:
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
exporter = OTLPSpanExporter(
endpoint="http://localhost:4317/v1/traces", # Explicit /v1/traces
insecure=True
)
Metrics Not Recorded¶
Symptom: GPU metrics not appearing
Solutions:
-
Verify PyNVML is working:
-
Check meter is properly configured:
Graphistry Issues¶
Connection Fails¶
Symptom: Graphistry plot fails to render
Solutions:
-
Verify registration:
-
Check internet connection:
-
Use GPU 1 explicitly:
No Edges Error¶
Symptom: "DataFrame has no edges" error
Solution: Verify spans have parent-child relationships:
spans = memory_exporter.get_finished_spans()
edges = [(s.parent.span_id, s.context.span_id) for s in spans if s.parent]
print(f"Found {len(edges)} edges")
if len(edges) == 0:
print("No parent-child relationships found!")
Plot Doesn't Render¶
Symptom: Graphistry returns URL but doesn't display
Solutions:
-
Open URL manually:
-
Check Kaggle allows iframes:
- Graphistry plots may not render in Kaggle notebooks
- Copy URL and open in new tab
Model Issues¶
Model Download Fails¶
Symptom: HuggingFace download timeout or error
Solutions:
-
Check internet connection:
-
Use explicit token:
-
Try different model repo:
Wrong Quantization¶
Symptom: Model quality is poor
Solution: Check quantization level:
- Q2/Q3: Very low quality, not recommended
- Q4: Good balance (recommended)
- Q5/Q6: Better quality, slower
- Q8: Best quality, much slower
Use Q4_K_M for best balance:
Performance Issues¶
Low Tokens/sec¶
Symptom: Inference is slower than expected
Solutions:
-
Enable FlashAttention:
-
Use dedicated GPU (tensor_split):
-
Reduce batch size for streaming:
-
Check GPU utilization:
High Memory Usage¶
Symptom: Running out of VRAM
Solutions:
-
Reduce context size:
-
Use more aggressive quantization:
-
Offload some layers to CPU:
Client Issues¶
Connection Refused¶
Symptom: Client can't connect to server
Solutions:
-
Verify server is running:
-
Check correct port:
-
Wait for server to be ready:
Timeout Errors¶
Symptom: Request timeout
Solutions:
-
Increase timeout:
-
Reduce max_tokens:
Getting Help¶
If you're still stuck:
- Check logs:
- Server logs in notebook output
-
Python tracebacks
-
Gather information:
-
Search issues:
-
Ask for help:
- GitHub Discussions
-
Include: error message, code snippet, system info
-
Report bug:
- New Issue
- Use bug report template