wiki:ai:dgx-spark-monitoring

Differences

This shows you the differences between two versions of the page.

wiki:ai:dgx-spark-monitoring [2026/04/17 11:28] swilson
wiki:ai:dgx-spark-monitoring [2026/04/17 11:36] (current) – [Step 1: SSH into the DGX Spark] swilson
Line 12: Line 12:
 =====Step 1: SSH into the DGX Spark=====
 
-From your Mac terminal, SSH into the Spark:
+From your local terminal, SSH into the Spark:
   ssh YOUR_USERNAME@YOUR_SPARK_IP
  
Line 39: Line 38:
  
 Verify it works by launching the interactive TUI:
-
   ./nv-monitor
  
Line 55: Line 53:
  
 Start nv-monitor in headless mode with a Bearer token:
-
   cd ~/nv-monitor
   ./nv-monitor -n -p 9101 -t YOUR_SECRET_TOKEN &
Line 72: Line 69:
  
 Verify it is working:
-
   curl -s -H "Authorization: Bearer YOUR_SECRET_TOKEN" localhost:9101/metrics | head -10
  
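The check above can also be scripted instead of read by eye. This is a minimal sketch, assuming the endpoint returns standard Prometheus text exposition format, in which lines starting with ''#'' are HELP/TYPE comments and every other non-empty line is a sample. ''count_samples'' is a hypothetical helper name, not part of nv-monitor.

```shell
# count_samples: hypothetical helper (not part of nv-monitor).
# Reads Prometheus text format on stdin and prints the number of sample
# lines, i.e. lines that are non-empty and do not start with '#'.
count_samples() {
  awk '!/^#/ && NF { n++ } END { print n+0 }'
}

# Usage, assuming nv-monitor is listening on port 9101 as started above:
#   curl -s -H "Authorization: Bearer YOUR_SECRET_TOKEN" localhost:9101/metrics | count_samples
```

A result of 0 suggests the request was rejected or the exporter returned no metrics; any positive number means samples are being served.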
Line 95: Line 91:
   global:
     scrape_interval: 5s
-
   scrape_configs:
     - job_name: 'nv-monitor'
Line 127: Line 122:
  
 Connect both containers to a shared Docker network so Grafana can reach Prometheus by name:
-
   docker network create monitoring
   docker network connect monitoring prometheus
Line 133: Line 127:
  
 Verify both are healthy:
-
   docker ps
   curl -s localhost:9090/-/healthy
Line 148: Line 141:
  
 **Note:** The DGX Spark does not have UFW installed. Use iptables directly:
-
   sudo iptables -I INPUT -s 172.17.0.0/16 -p tcp --dport 9101 -j ACCEPT
  
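Because ''iptables -I'' inserts a new rule every time it runs, repeating the command (for example while troubleshooting) leaves duplicate entries in the INPUT chain. The sketch below uses ''iptables -C'' to check for the rule first. ''ensure_rule'' and the ''IPTABLES'' override variable are hypothetical names introduced here for illustration.

```shell
# ensure_rule: hypothetical helper. Adds a rule to the INPUT chain only
# if `iptables -C` reports that an identical rule is not already present.
# IPTABLES is an assumed override hook; it defaults to "sudo iptables".
ensure_rule() {
  ipt=${IPTABLES:-sudo iptables}
  # $ipt is deliberately unquoted so "sudo iptables" splits into two words.
  if ! $ipt -C INPUT "$@" 2>/dev/null; then
    $ipt -I INPUT "$@"
  fi
}

# Usage, with the same rule as above:
#   ensure_rule -s 172.17.0.0/16 -p tcp --dport 9101 -j ACCEPT
```

Running the helper twice should leave a single matching rule, which ''sudo iptables -S INPUT'' can confirm.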
Line 162: Line 154:
  
 On your **Mac**, open a **new local terminal** (not an SSH session to the Spark; the prompt must show your Mac hostname):
-
   ssh -L 9090:localhost:9090 -L 3000:localhost:3000 YOUR_USERNAME@YOUR_SPARK_IP
  
Line 298: Line 289:
  
 **Fix 1**: Use the correct target IP in ''prometheus.yml''. The target must be the Docker bridge gateway, not localhost:
-
   targets: ['172.17.0.1:9101']
  
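The address ''172.17.0.1'' is Docker's usual default bridge gateway, but it can differ if the daemon was configured with another subnet. A minimal sketch for confirming the value on the Spark before editing ''prometheus.yml'' follows. ''bridge_gateway'' and the ''DOCKER'' override variable are hypothetical names, and the sketch assumes Prometheus runs on the default ''bridge'' network.

```shell
# bridge_gateway: hypothetical helper. Prints the gateway address of
# Docker's default "bridge" network using `docker network inspect`.
# DOCKER is an assumed override hook; it defaults to "docker".
bridge_gateway() {
  ${DOCKER:-docker} network inspect bridge \
    --format '{{range .IPAM.Config}}{{.Gateway}}{{end}}'
}

# Usage: print the target string to use in prometheus.yml
#   echo "targets: ['$(bridge_gateway):9101']"
```

If the printed address is not ''172.17.0.1'', the scrape target and the source subnet in the iptables rule would both need to match the actual bridge network.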
Line 306: Line 296:
  
 **Fix 2**: Allow Docker bridge through the firewall:
-
   sudo iptables -I INPUT -s 172.17.0.0/16 -p tcp --dport 9101 -j ACCEPT
  
Line 317: Line 306:
 ====Grafana cannot connect to Prometheus: "lookup prometheus: no such host"====
 The containers are not on the same Docker network. Run:
-
   docker network create monitoring
   docker network connect monitoring prometheus
wiki/ai/dgx-spark-monitoring.1776425292.txt.gz · Last modified: by swilson