Topic: Bringing Up the Cluster (Post-Installation Tasks)

What This Unit Covers: The complete post-installation workflow that takes a newly installed NVIDIA Base Command Manager (BCM) cluster from a basic head-node installation to a usable, provisioned cluster. The unit explains what must be done on the head node before compute nodes can be safely powered on and provisioned. It then walks through licensing, software updates, software image preparation, category design, node assignment, node identification, and the provisioning flow. Finally, it highlights common operational mistakes that slow down bring-up, especially around licensing, image changes, and node identification.

1. What “Bringing Up the Cluster” Means

High-level bring-up workflow
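
As a rough illustration of the workflow, the nine steps covered in the sections that follow can be written down as an ordered checklist. The step names below come from this unit's section headings. The `next_step` helper and the rule that steps must be completed strictly in order are assumptions made for this sketch; they simplify the real procedure and are not part of BCM.

```python
# Step names mirror this unit's section headings.
# The strict-ordering rule is an illustrative assumption, not BCM behavior.
BRING_UP_STEPS = [
    "Log into the head node",
    "Install the cluster license",
    "Update the head node OS",
    "Update the software image",
    "Clone the software image",
    "Clone the node category",
    "Assign the software image to the category",
    "Assign nodes to the category",
    "Power on, identify, and provision nodes",
]


def next_step(completed):
    """Return the first step not yet completed, or None when all are done.

    Raises ValueError if an unknown step is listed, or if a later step is
    marked done while an earlier one is still pending.
    """
    done = set(completed)
    unknown = done - set(BRING_UP_STEPS)
    if unknown:
        raise ValueError(f"unknown steps: {sorted(unknown)}")

    pending = None
    for step in BRING_UP_STEPS:
        if step in done:
            if pending is not None:
                raise ValueError(f"{step!r} is marked done before {pending!r}")
        elif pending is None:
            pending = step
    return pending
```

For example, `next_step(BRING_UP_STEPS[:2])` returns `"Update the head node OS"`, and marking only the license step as done raises a `ValueError` because the head-node login has not been recorded first.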

2. Step 1: Log Into the Head Node

3. Step 2: Install the Cluster License

Default / initial license behavior

How license installation works

What happens after license installation

How to verify the license

4. Step 3: Update the Head Node OS

5. Step 4: Update the Software Image

Important note about image changes

NVIDIA also notes that when kernel modules or similar low-level components are changed in an image, BCM regenerates the initial ramdisk. That regeneration can take some time, which is part of why administrators should treat image modification as a controlled operation rather than a casual edit.
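
One way to make that "controlled operation" concrete is to review a planned batch of image changes before applying it and flag whether any of them is the kind that leads to a ramdisk regeneration. The sketch below is a planning aid only. The `ImageChange` type, the `kind` labels, and the `LOW_LEVEL_KINDS` set are hypothetical names invented for illustration, and the code does not call BCM or inspect a real image.

```python
from dataclasses import dataclass

# Hypothetical labels. The text names kernel modules "or similar low-level
# components"; listing "kernel" here is an assumption about what that includes.
LOW_LEVEL_KINDS = {"kernel_module", "kernel"}


@dataclass(frozen=True)
class ImageChange:
    description: str
    kind: str  # e.g. "package", "config", "kernel_module"


def triggers_ramdisk_regeneration(changes):
    """Return True if any planned change is a low-level one."""
    return any(change.kind in LOW_LEVEL_KINDS for change in changes)


planned = [
    ImageChange("add monitoring agent", "package"),
    ImageChange("install out-of-tree driver", "kernel_module"),
]

if triggers_ramdisk_regeneration(planned):
    print("Plan for ramdisk regeneration time before reprovisioning nodes.")
```

Grouping low-level changes into one reviewed batch, rather than applying them one at a time, is a design choice of this sketch and not something the source prescribes.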

6. Step 5: Clone the Software Image

7. Step 6: Clone the Node Category

8. Step 7: Assign the Software Image to the Category

9. Step 8: Assign Nodes to the Category

10. Step 9: Power On, Identify, and Provision Nodes

Node identification is critical

Provisioning flow

Advanced practical note

Key Takeaways

[1]: https://docs.nvidia.com/dgx-basepod/deployment-guides/dgx-basepod-a100/latest/deployment-configure.html “Cluster Configuration — NVIDIA DGX BasePOD: Deployment Guide Featuring NVIDIA DGX A100 Systems”

[2]: https://docs.nvidia.com/dgx-basepod/deployment-guide-dgx-basepod/latest/cluster-bringup.html “Cluster Bring Up — NVIDIA DGX BasePOD: Deployment Guide Featuring NVIDIA DGX H200/H100 Systems”

[3]: https://docs.nvidia.com/dgx-basepod/deployment-guide-basepod-rhel/latest/head-node.html “Head Node Configuration — NVIDIA DGX BasePOD on RHEL: Deployment Guide Featuring NVIDIA DGX A100”
