This shows you the differences between two versions of the page.
| wiki:ai:dgx-spark-monitoring [2026/04/24 15:52] – [Step 5: Create the Prometheus Configuration] swilson | wiki:ai:dgx-spark-monitoring [2026/04/24 16:24] (current) – [Step 11: Build the Dashboard] swilson | ||
|---|---|---|---|
| Line 159: | Line 159: | ||
| -p 3000:3000 \ | -p 3000:3000 \ | ||
| grafana/ | grafana/ | ||
| + | | ||
| + | | ||
| Connect both containers to a shared Docker network so Grafana can reach Prometheus by name: | Connect both containers to a shared Docker network so Grafana can reach Prometheus by name: | ||
| Line 173: | Line 175: | ||
| * '' | * '' | ||
| * '' | * '' | ||
| + | |||
| + | | ||
| + | |||
| \\ \\ | \\ \\ | ||
| Line 181: | Line 186: | ||
| **Note:** The DGX Spark does not have UFW installed. Use iptables directly: | **Note:** The DGX Spark does not have UFW installed. Use iptables directly: | ||
| sudo iptables -I INPUT -s 172.17.0.0/ | sudo iptables -I INPUT -s 172.17.0.0/ | ||
| + | | ||
| + | | ||
| This is the critical rule that allows Prometheus (running in Docker) to scrape nv-monitor (running on the host). | This is the critical rule that allows Prometheus (running in Docker) to scrape nv-monitor (running on the host). | ||
| Line 217: | Line 224: | ||
| On your **Mac**, open a **new local terminal** (not an SSH session to the Spark — the prompt must show your Mac hostname): | On your **Mac**, open a **new local terminal** (not an SSH session to the Spark — the prompt must show your Mac hostname): | ||
| ssh -L 9090: | ssh -L 9090: | ||
| + | | ||
| + | {{: | ||
| Keep this terminal open. Then open in your Mac browser: | Keep this terminal open. Then open in your Mac browser: | ||
| Line 238: | Line 247: | ||
| * State: **UP** (green) | * State: **UP** (green) | ||
| * Scrape duration: under 10ms (typically ~2ms) | * Scrape duration: under 10ms (typically ~2ms) | ||
| + | |||
| + | {{: | ||
| If the state shows DOWN, see the Troubleshooting section. | If the state shows DOWN, see the Troubleshooting section. | ||
| Line 249: | Line 260: | ||
| * Set a new password when prompted | * Set a new password when prompted | ||
| + | {{: | ||
| ====Add Prometheus as a data source==== | ====Add Prometheus as a data source==== | ||
| - Click **Connections** in the left sidebar | - Click **Connections** in the left sidebar | ||
| Line 257: | Line 269: | ||
| - Click **Save & test** | - Click **Save & test** | ||
| - You should see: **Successfully queried the Prometheus API** | - You should see: **Successfully queried the Prometheus API** | ||
| + | |||
| + | {{: | ||
| + | {{: | ||
| + | {{: | ||
| + | {{: | ||
| + | |||
| + | |||
| + | |||
| ====Why '' | ====Why '' | ||
| Line 270: | Line 290: | ||
| - Add each panel below one at a time | - Add each panel below one at a time | ||
| - For each panel: select the metric in the Builder tab, set the title in the right panel options, confirm the visualization type, then click **Back to dashboard** | - For each panel: select the metric in the Builder tab, set the title in the right panel options, confirm the visualization type, then click **Back to dashboard** | ||
| + | |||
| + | {{: | ||
| + | {{: | ||
| + | {{: | ||
| + | |||
| + | |||
| + | |||
| ====Dashboard panels==== | ====Dashboard panels==== | ||
| Line 296: | Line 323: | ||
| cd ~/ | cd ~/ | ||
| ./demo-load --gpu | ./demo-load --gpu | ||
| + | | ||
| + | {{: | ||
| Expected output: | Expected output: | ||
| Line 393: | Line 422: | ||
| \\ \\ | \\ \\ | ||
| [[wiki: | [[wiki: | ||
| - | |||
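
The target check in the diff above (State: **UP**, scrape duration under 10ms) can also be scripted instead of read off the Prometheus web page. The following is a minimal sketch, not part of the original page. It assumes the SSH tunnel exposes Prometheus on `localhost:9090` on the Mac, and it uses the standard Prometheus HTTP API endpoint `/api/v1/targets`. The helper names `summarize_targets` and `fetch_targets` are hypothetical, chosen only for illustration.

```python
import json
import urllib.request

# Assumption: local end of the SSH tunnel described on the page.
PROMETHEUS_URL = "http://localhost:9090"
# The "under 10ms" guideline from the page, expressed in seconds.
MAX_SCRAPE_SECONDS = 0.010


def summarize_targets(payload):
    """Return one (scrape_url, health, duration_seconds, ok) tuple per active target.

    `payload` is the decoded JSON body of GET /api/v1/targets.
    `ok` is True only when the target is up and its last scrape
    finished within MAX_SCRAPE_SECONDS.
    """
    rows = []
    for target in payload["data"]["activeTargets"]:
        health = target.get("health", "unknown")
        duration = float(target.get("lastScrapeDuration", 0.0))
        ok = health == "up" and duration < MAX_SCRAPE_SECONDS
        rows.append((target.get("scrapeUrl", ""), health, duration, ok))
    return rows


def fetch_targets(base_url=PROMETHEUS_URL, timeout=5):
    """Fetch and decode the target list from a running Prometheus server."""
    url = f"{base_url}/api/v1/targets"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)
```

With the tunnel open, `for row in summarize_targets(fetch_targets()): print(row)` prints one tuple per scrape target. A row whose last field is `False` corresponds to a target that is DOWN or slower than the guideline, which is the case the Troubleshooting section covers.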