wiki:ai:dgx-spark-monitoring

Current revision: 2026/04/24 16:24 by swilson (Step 11: Build the Dashboard). Previous revision: 2026/04/24 15:54 by swilson (Step 6: Start Prometheus and Grafana in Docker).
**Note:** The DGX Spark does not have UFW installed. Use iptables directly:
  sudo iptables -I INPUT -s 172.17.0.0/16 -p tcp --dport 9101 -j ACCEPT

{{:wiki:ai:screenshot_2026-04-17_at_3.42.51 pm.png|}}

This is the critical rule that allows Prometheus (running in Docker) to scrape nv-monitor (running on the host).
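iptables does not check for duplicates, so running the ''-I'' command twice adds the same rule twice. The sketch below is a minimal idempotent version, assuming bash on the Spark: it tests for the rule with iptables' ''-C'' (check) option before inserting it. The function name ''ensure_scrape_rule'' and the ''IPTABLES'' override variable are hypothetical, added so the logic can be dry-run without root.

```shell
#!/usr/bin/env bash
# Hypothetical helper: insert the Docker-to-nv-monitor rule only if it is missing.
# IPTABLES defaults to "sudo iptables"; point it at another command for a dry run.
IPTABLES=${IPTABLES:-"sudo iptables"}

ensure_scrape_rule() {
  local subnet=${1:-172.17.0.0/16}   # Docker source subnet used in this guide
  local port=${2:-9101}              # port Prometheus scrapes nv-monitor on
  local rule=(INPUT -s "$subnet" -p tcp --dport "$port" -j ACCEPT)

  # iptables -C exits 0 when an identical rule already exists in the chain
  if $IPTABLES -C "${rule[@]}" 2>/dev/null; then
    echo "rule already present"
  else
    $IPTABLES -I "${rule[@]}"
    echo "rule inserted"
  fi
}
```

Afterwards, ''sudo iptables -L INPUT -n --line-numbers'' lists the INPUT chain so you can confirm the rule appears only once.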
On your **Mac**, open a **new local terminal** (not an SSH session to the Spark; the prompt must show your Mac hostname):
  ssh -L 9090:localhost:9090 -L 3000:localhost:3000 YOUR_USERNAME@YOUR_SPARK_IP

{{:wiki:ai:screenshot_2026-04-17_at_3.43.41 pm.png|}}
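Each ''-L'' flag has the form ''local_port:host:remote_port''; ''localhost'' is resolved on the Spark, so port 9090 on the Mac reaches port 9090 on the Spark. The sketch below builds those flags from a list of ports, assuming bash on the Mac. ''tunnel_args'' and ''open_tunnel'' are hypothetical helper names, and ''SPARK'' is a placeholder for ''YOUR_USERNAME@YOUR_SPARK_IP''.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print one "-L PORT:localhost:PORT" pair per argument,
# so port N on the Mac forwards to port N on the Spark's loopback interface.
tunnel_args() {
  local port
  local args=()
  for port in "$@"; do
    args+=(-L "${port}:localhost:${port}")
  done
  printf '%s\n' "${args[*]}"
}

# Open the same tunnel as the ssh command above.
# SPARK is a placeholder: set it to YOUR_USERNAME@YOUR_SPARK_IP first.
open_tunnel() {
  if [ -z "${SPARK:-}" ]; then
    echo "set SPARK to YOUR_USERNAME@YOUR_SPARK_IP" >&2
    return 1
  fi
  # $(tunnel_args ...) is left unquoted on purpose so each flag is a separate word
  ssh $(tunnel_args 9090 3000) "$SPARK"
}
```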
  
Keep this terminal open. Then open in your Mac browser:
  * State: **UP** (green)
  * Scrape duration: under 10 ms (typically about 2 ms)

{{:wiki:ai:screenshot_2026-04-17_at_3.49.16 pm.png|}}

If the state shows DOWN, see the Troubleshooting section.
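The same State and scrape duration can be read from the Prometheus HTTP API (''/api/v1/targets'') through the tunnel. The sketch below is rough, assuming bash and curl on the Mac and the compact JSON Prometheus normally returns; the function names are hypothetical, and ''jq'' is the sturdier choice if you have it installed.

```shell
#!/usr/bin/env bash
# Sketch: summarise scrape targets from Prometheus's /api/v1/targets JSON.
# Assumes compact JSON (no whitespace after ':'); use jq for anything sturdier.
PROM_URL=${PROM_URL:-http://localhost:9090}   # reached through the SSH tunnel

# One health value per active target: "up", "down" or "unknown".
target_health() {
  grep -o '"health":"[a-z]*"' | cut -d'"' -f4
}

# One last-scrape duration per active target, converted from seconds to ms.
scrape_ms() {
  grep -o '"lastScrapeDuration":[0-9.eE+-]*' | cut -d: -f2 |
    awk '{ printf "%.1f\n", $1 * 1000 }'
}

# Print "health<TAB>duration_ms", pairing the two lists line by line.
check_targets() {
  local json
  json=$(curl -s "$PROM_URL/api/v1/targets") || return 1
  paste <(printf '%s\n' "$json" | target_health) \
        <(printf '%s\n' "$json" | scrape_ms)
}
```

''check_targets'' prints one line per active target; ''up'' with a duration in the low milliseconds corresponds to the green UP state described above.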
  * Set a new password when prompted

{{:wiki:ai:screenshot_2026-04-17_at_3.54.06 pm.png|}}
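If the Grafana login page does not load, Grafana's ''/api/health'' endpoint answers without a login and is a quick way to tell a Grafana problem from a tunnel problem. The sketch below assumes bash and curl on the Mac; ''grafana_ok'' and ''check_grafana'' are hypothetical names.

```shell
#!/usr/bin/env bash
# Sketch: check Grafana's health endpoint (no login needed) through the tunnel.
GRAFANA_URL=${GRAFANA_URL:-http://localhost:3000}

# Succeeds if the JSON on stdin contains "database": "ok" (any spacing).
grafana_ok() {
  grep -q '"database"[[:space:]]*:[[:space:]]*"ok"'
}

check_grafana() {
  if curl -s "$GRAFANA_URL/api/health" | grafana_ok; then
    echo "Grafana is reachable and its database is ok"
  else
    echo "no healthy answer from $GRAFANA_URL; is the SSH tunnel still open?" >&2
    return 1
  fi
}
```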

====Add Prometheus as a data source====
  - Click **Connections** in the left sidebar
  - Click **Save & test**
  - You should see: **Successfully queried the Prometheus API**

{{:wiki:ai:screenshot_2026-04-17_at_3.57.53 pm.png|}}
{{:wiki:ai:screenshot_2026-04-17_at_3.59.01 pm.png|}}
{{:wiki:ai:Screenshot 2026-04-17 at 4.01.33 pm.png|}}
{{:wiki:ai:Screenshot 2026-04-17 at 4.02.47 PM.png|}}
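The same data source can also be created with Grafana's HTTP API (''POST /api/datasources'') instead of clicking through the UI. The sketch below is minimal, assuming bash and curl on the Mac, the SSH tunnel on port 3000, and basic auth as the default ''admin'' user; ''GRAFANA_PASS'' is a placeholder for the password you set at first login, and the function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: create the Prometheus data source through Grafana's HTTP API.
GRAFANA_URL=${GRAFANA_URL:-http://localhost:3000}   # via the SSH tunnel

# Request body. "access":"proxy" makes Grafana's backend, not the browser,
# send the queries to the data source URL.
datasource_json() {
  local name=${1:-Prometheus}
  local url=${2:-http://prometheus:9090}
  printf '{"name":"%s","type":"prometheus","url":"%s","access":"proxy","isDefault":true}\n' \
    "$name" "$url"
}

# GRAFANA_PASS is a placeholder for the admin password set at first login.
add_datasource() {
  if [ -z "${GRAFANA_PASS:-}" ]; then
    echo "set GRAFANA_PASS first" >&2
    return 1
  fi
  curl -s -u "admin:${GRAFANA_PASS}" \
    -H 'Content-Type: application/json' \
    -X POST "$GRAFANA_URL/api/datasources" \
    -d "$(datasource_json "$@")"
}
```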
  
====Why ''http://prometheus:9090'' works====
  - Add each panel below one at a time
  - For each panel: select the metric in the Builder tab, set the title in the panel options on the right, confirm the visualization type, then click **Back to dashboard**

{{:wiki:ai:screenshot_2026-04-17_at_4.03.34 pm.png|}}
{{:wiki:ai:Screenshot 2026-04-17 at 4.03.56 PM.png|}}
{{:wiki:ai:Screenshot 2026-04-17 at 4.11.42 PM.png|}}
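To see which metric names the Builder tab can offer, you can list them straight from nv-monitor on the Spark. The sketch below assumes bash and curl in an SSH session on the Spark, and that nv-monitor serves the Prometheus text format at ''http://localhost:9101/metrics'': port 9101 comes from the iptables rule in this guide, while the ''/metrics'' path is the usual exporter convention and an assumption here. The function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: list the distinct metric names in Prometheus text exposition format.
# Assumption: nv-monitor exposes /metrics on port 9101 (run this on the Spark).
METRICS_URL=${METRICS_URL:-http://localhost:9101/metrics}

# Drop "# HELP"/"# TYPE" comments and blank lines, cut each sample line at its
# first '{' (labels) or space (value), then de-duplicate.
metric_names() {
  grep -v '^#' | grep -v '^[[:space:]]*$' | sed 's/[{ ].*//' | LC_ALL=C sort -u
}

list_metrics() {
  curl -s "$METRICS_URL" | metric_names
}
```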
  
====Dashboard panels====
  cd ~/nv-monitor
  ./demo-load --gpu

{{:wiki:ai:screenshot_2026-04-17_at_4.24.57 pm.png|}}

Expected output:
\\ \\
[[wiki:ai:home-page|AI Home]]
Last modified: by swilson