This shows you the differences between two versions of the page.
| wiki:ai:dgx-spark-monitoring [2026/04/17 11:26] – swilson | wiki:ai:dgx-spark-monitoring [2026/04/17 11:36] (current) – [Step 1: SSH into the DGX Spark] swilson | ||
|---|---|---|---|
| Line 12: | Line 12: | ||
| =====Step 1: SSH into the DGX Spark===== | =====Step 1: SSH into the DGX Spark===== | ||
| - | From your Mac terminal, SSH into the Spark: | + | From your local terminal, SSH into the Spark: |
| - | + | ssh YOUR_USERNAME@YOUR_SPARK_IP | |
| - | ssh < | + | |
| All steps below are run on the Spark unless noted otherwise. | All steps below are run on the Spark unless noted otherwise. | ||
| Line 39: | Line 38: | ||
| Verify it works by launching the interactive TUI: | Verify it works by launching the interactive TUI: | ||
| - | |||
| ./ | ./ | ||
| Line 55: | Line 53: | ||
| Start nv-monitor in headless mode with a Bearer token: | Start nv-monitor in headless mode with a Bearer token: | ||
| - | |||
| cd ~/ | cd ~/ | ||
| - | ./ | + | ./ |
| - | Replace '' | + | Replace '' |
| ====Flags explained==== | ====Flags explained==== | ||
| * **-n:** headless mode — no TUI, runs silently in the background | * **-n:** headless mode — no TUI, runs silently in the background | ||
| * **-p 9101:** expose Prometheus metrics endpoint on port 9101 | * **-p 9101:** expose Prometheus metrics endpoint on port 9101 | ||
| - | * **-t < | + | * **-t YOUR_SECRET_TOKEN:** require this Bearer token on every HTTP request |
| * **&:** run in background so the terminal stays free | * **&:** run in background so the terminal stays free | ||
| Line 72: | Line 69: | ||
| Verify it is working: | Verify it is working: | ||
| - | + | | |
| - | | + | |
| You should see output starting with ''# | You should see output starting with ''# | ||
| Line 95: | Line 91: | ||
| global: | global: | ||
| scrape_interval: | scrape_interval: | ||
| - | |||
| scrape_configs: | scrape_configs: | ||
| - job_name: ' | - job_name: ' | ||
| authorization: | authorization: | ||
| - | credentials: | + | credentials: |
| static_configs: | static_configs: | ||
| - targets: [' | - targets: [' | ||
| EOF | EOF | ||
| - | Replace '' | + | Replace '' |
| ====Why 172.17.0.1 and not localhost? | ====Why 172.17.0.1 and not localhost? | ||
| Line 127: | Line 122: | ||
| Connect both containers to a shared Docker network so Grafana can reach Prometheus by name: | Connect both containers to a shared Docker network so Grafana can reach Prometheus by name: | ||
| - | |||
| docker network create monitoring | docker network create monitoring | ||
| docker network connect monitoring prometheus | docker network connect monitoring prometheus | ||
| Line 133: | Line 127: | ||
| Verify both are healthy: | Verify both are healthy: | ||
| - | |||
| docker ps | docker ps | ||
| curl -s localhost: | curl -s localhost: | ||
| Line 148: | Line 141: | ||
| **Note:** The DGX Spark does not have UFW installed. Use iptables directly: | **Note:** The DGX Spark does not have UFW installed. Use iptables directly: | ||
| - | |||
| sudo iptables -I INPUT -s 172.17.0.0/ | sudo iptables -I INPUT -s 172.17.0.0/ | ||
| Line 162: | Line 154: | ||
| On your **Mac**, open a **new local terminal** (not an SSH session to the Spark — the prompt must show your Mac hostname): | On your **Mac**, open a **new local terminal** (not an SSH session to the Spark — the prompt must show your Mac hostname): | ||
| - | + | | |
| - | | + | |
| Keep this terminal open. Then open in your Mac browser: | Keep this terminal open. Then open in your Mac browser: | ||
| Line 265: | Line 256: | ||
| cd ~/ | cd ~/ | ||
| - | ./ | + | ./ |
| docker start prometheus grafana | docker start prometheus grafana | ||
| **On your Mac (new local terminal): | **On your Mac (new local terminal): | ||
| - | ssh -L 9090: | + | ssh -L 9090: |
| Then open '' | Then open '' | ||
| Line 298: | Line 289: | ||
| **Fix 1** — Use the correct target IP in '' | **Fix 1** — Use the correct target IP in '' | ||
| - | |||
| targets: [' | targets: [' | ||
| Line 306: | Line 296: | ||
| **Fix 2** — Allow Docker bridge through the firewall: | **Fix 2** — Allow Docker bridge through the firewall: | ||
| - | |||
| sudo iptables -I INPUT -s 172.17.0.0/ | sudo iptables -I INPUT -s 172.17.0.0/ | ||
| Line 317: | Line 306: | ||
| ====Grafana cannot connect to Prometheus — " | ====Grafana cannot connect to Prometheus — " | ||
| The containers are not on the same Docker network. Run: | The containers are not on the same Docker network. Run: | ||
| - | |||
| docker network create monitoring | docker network create monitoring | ||
| docker network connect monitoring prometheus | docker network connect monitoring prometheus | ||
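
For reference, the Prometheus scrape configuration touched by this revision can be sketched as a small shell helper. This is a minimal sketch under stated assumptions, not the page's exact file: the function name `render_scrape_config`, the job name `nv-monitor`, and the 15s scrape interval are hypothetical (the diff truncates those values). The Bearer-token `authorization` block, port 9101, and the Docker bridge address 172.17.0.1 come from the page itself.

```shell
#!/bin/sh
# Hypothetical helper: print a Prometheus config that scrapes nv-monitor
# using a Bearer token. Job name and scrape interval are illustrative
# assumptions; the diff does not show the real values.
render_scrape_config() {
  token="$1"                       # same value passed to nv-monitor via -t
  target="${2:-172.17.0.1:9101}"   # Docker bridge gateway + metrics port
  cat <<EOF
global:
  scrape_interval: 15s
scrape_configs:
  - job_name: 'nv-monitor'
    authorization:
      credentials: ${token}
    static_configs:
      - targets: ['${target}']
EOF
}
```

Calling `render_scrape_config YOUR_SECRET_TOKEN` prints the YAML to standard output, so it can be reviewed before being redirected into the Prometheus config file. Keeping the token as a function argument makes it harder for the `-t` value and the `credentials:` value to drift apart.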