
VCF / NSX-T Edge Node Resize

This post covers how to resize your NSX-T edge nodes in a VCF environment.

You may find that your existing edge nodes are reaching their maximum throughput, or you may need to scale up to enable more CPU-heavy features such as L7 load balancing or Tanzu support.
Edge Sizing Guide

Simply adding CPU or memory to the existing edge VMs is not supported; you must redeploy the edge nodes at the new size. Luckily, a tool is provided to do this.
Edge Node Resizing Tool

In this example, I accidentally deployed my edge nodes with the “LARGE” form factor. You’ll probably want that size in production environments, but since my lab is low on available resources, I’m going to resize them to “SMALL”.

vcf@sddc-manager [ ~/resizer ]$ ./resize.sh --edge-cluster EC-01 --user administrator@vsphere.local --password VMware123! --form-factor SMALL
VCF Edge node resizer tool, version 0.7
Logging to /home/vcf/resizer/resizer/edge_node_resizer_2023-06-30T01:15:43.log
Resizing Edge nodes in Edge cluster EC-01 to form factor SMALL

The Edge Node Resizer tool takes Edge nodes offline one at a time in order
to recreate them with a specified form factor. This means that

* Each Edge node's Tier-0 interfaces temporarily go offline during resizing

* Tier-1 router services relocate from one Edge node to another

This may lead to temporary network traffic interruptions during the resize.
The full resize operation may take as long as it did to originally create
and (if requested) expand the Edge cluster.

Do you wish to proceed (y/n)? y
Run confirmation accepted by user.
Getting credentials from SDDC Manager..
Getting Bearer token
count of WLDs supported by our NSX-T cluster: 1
workload_name = mgmt-domain
Credential retrieval completed.
Connection established to vCenter at 10.0.0.12
edge cluster EC-01: 5d169fc3-474e-42a6-b8e1-75fd265c73d3
Refreshing NSX view of Edge node id fa7f3afe-b3b9-4b8e-83f3-91ce36608322
Found vSphere rules for VM edge1-mgmt:
   VCF-edge_EC-01_antiAffinity_b38f42b9beb851202facc2bcc7cd6d7e
Edge node VM edge1-mgmt is in 0 VM groups for cluster mgmt-cluster-01
Refreshing NSX view of Edge node id 64749ef7-d464-48b6-9fdd-f3e8efffc9bf
Found vSphere rules for VM edge2-mgmt:
   VCF-edge_EC-01_antiAffinity_b38f42b9beb851202facc2bcc7cd6d7e
Edge node VM edge2-mgmt is in 0 VM groups for cluster mgmt-cluster-01
For Edge cluster EC-01,
   Edge node edge1-mgmt (10.0.0.23) has form factor LARGE
   Edge node edge2-mgmt (10.0.0.24) has form factor LARGE
Loading Edge cluster config info from /home/vcf/.vcf-edge-redeploy/EC-01.json
Marking Edge cluster EC-01 cache with operation-in-progress = True
Loading Edge cluster config info from /home/vcf/.vcf-edge-redeploy/EC-01.json
Check that 2 x SMALL Edge node VMs fit in cluster mgmt-cluster-01's resource pool EC-01
Resource pool has 0 CPU and 0 RAM.
After resize, pool's Edge nodes need 4000 CPU and 8192 RAM
Resizing resource pool EC-01
posting to url: https://10.0.0.20/api/v1/transport-nodes/fa7f3afe-b3b9-4b8e-83f3-91ce36608322?action=redeploy
resp.status_code = 200
EN VM moid: start=vm-37, cur=vm-37
tnState: ndsState=NODE_READY, outerState=in_progress
EN VM moid: start=vm-37, cur=None
tnState: ndsState=VM_DEPLOYMENT_RESTARTED, outerState=pending
EN VM moid: start=vm-37, cur=None
tnState: ndsState=REGISTRATION_PENDING, outerState=pending
EN VM moid: start=vm-37, cur=None
tnState: ndsState=NODE_NOT_READY, outerState=pending
tnState: ndsState=NODE_READY, outerState=failed
EN VM moid: start=vm-37, cur=vm-6081
tnState: ndsState=NODE_READY, outerState=failed
EN VM moid: start=vm-37, cur=vm-6081
tnState: ndsState=NODE_READY, outerState=in_progress
EN VM moid: start=vm-37, cur=vm-6081
………

Redeployment successful for Edge node edge1-mgmt
Waited 1254 seconds, or 21 minutes, for redeploy of Edge node edge1-mgmt
AA rule VCF-edge_EC-01_antiAffinity_b38f42b9beb851202facc2bcc7cd6d7e still exists, updating it..
Re-added edge1-mgmt to AA rule VCF-edge_EC-01_antiAffinity_b38f42b9beb851202facc2bcc7cd6d7e
Updating known_hosts entry for edge1-mgmt.vcf.sddc.lab (fa7f3afe-b3b9-4b8e-83f3-91ce36608322)
Freshen VCF known_hosts key for edge1-mgmt.vcf.sddc.lab
Temporarily enabling ssh to edge1-mgmt.vcf.sddc.lab
* posting to url https://10.0.0.20/api/v1/transport-nodes/fa7f3afe-b3b9-4b8e-83f3-91ce36608322/node/services/ssh
Get https://10.0.0.20/api/v1/transport-nodes/fa7f3afe-b3b9-4b8e-83f3-91ce36608322/node/services/ssh/status
ssh runtime_state: running
Freshen known_hosts key for edge1-mgmt.vcf.sddc.lab
# edge1-mgmt.vcf.sddc.lab:22 SSH-2.0-OpenSSH_8.2p1 Ubuntu-4ubuntu0.5
Current ssh key obtained for edge1-mgmt.vcf.sddc.lab
dropping old edge1-mgmt.vcf.sddc.lab, key type ssh-rsa
Ran post, result = {}
Re-disabling ssh to edge1-mgmt.vcf.sddc.lab
Traceback (most recent call last):
  File "./resize.py", line 2233, in <module>
    redeployer.process()
  File "./resize.py", line 1955, in process
    self.resize_edge_nodes()
  File "./resize.py", line 533, in resize_edge_nodes
    self._do_requested_resize()
  File "./resize.py", line 558, in _do_requested_resize
    if not self._resize_edge_node(enInfo, doRollback=False):
  File "./resize.py", line 647, in _resize_edge_node
    self._update_edge_node_host_ssh_key(enInfo)
  File "./resize.py", line 831, in _update_edge_node_host_ssh_key
    dryrun=self.isDryRun())
  File "/home/vcf/resizer/vcf_utils/edge_node_vcf_known_hosts_util.py", line 198, in freshenEdgeNodeInVcfKnownHosts
    absUrl = self._setTnSshState(edgeNodeNsxId, False)
  File "/home/vcf/resizer/vcf_utils/edge_node_vcf_known_hosts_util.py", line 108, in _setTnSshState
    state = self._getTnSshStatus(edgeNodeNsxId)
  File "/home/vcf/resizer/vcf_utils/edge_node_vcf_known_hosts_util.py", line 81, in _getTnSshStatus
    resp.raise_for_status()
  File "/usr/lib/python3.7/site-packages/requests/models.py", line 941, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 500 Server Error:  for url: https://10.0.0.20/api/v1/transport-nodes/fa7f3afe-b3b9-4b8e-83f3-91ce36608322/node/services/ssh/status
vcf@sddc-manager [ ~/resizer ]$

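This run redeployed the first node successfully, but then failed with an HTTP 500 error while checking the node’s SSH service status as it tried to re-disable SSH. It’s likely just a timeout because my lab is a little slow. Let’s try again.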
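
The traceback shows the exception coming from `resp.raise_for_status()` on the SSH status call. If the cause really is a transient timeout, a call like that can be wrapped in a retry with backoff. The sketch below is not the resizer tool’s code; `call_with_retries` and all of its parameters are hypothetical names, used only to illustrate the idea.

```python
import time


def call_with_retries(func, attempts=5, initial_delay=1.0, backoff=2.0,
                      retry_on=(Exception,), sleep=time.sleep):
    """Call func() until it succeeds or the attempts run out.

    Hypothetical helper for illustration only. The wait between
    attempts is multiplied by `backoff` after each failure, and the
    last exception is re-raised if every attempt fails.
    """
    delay = initial_delay
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except retry_on:
            if attempt == attempts:
                raise  # out of attempts: surface the original error
            sleep(delay)
            delay *= backoff


if __name__ == "__main__":
    # Simulated status check that fails twice before succeeding.
    state = {"calls": 0}

    def flaky_status_check():
        state["calls"] += 1
        if state["calls"] < 3:
            raise RuntimeError("simulated 500 Server Error")
        return "running"

    print(call_with_retries(flaky_status_check, initial_delay=0.1))
```

In practice `func` could be a small function that performs the status GET and calls `raise_for_status()`, with `retry_on` narrowed to `requests.exceptions.HTTPError` so that unrelated errors still fail fast.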

vcf@sddc-manager [ ~/resizer ]$ ./resize.sh --edge-cluster EC-01 --user administrator@vsphere.local --password VMware123! --form-factor SMALL
VCF Edge node resizer tool, version 0.7
Logging to /home/vcf/resizer/resizer/edge_node_resizer_2023-06-30T01:40:41.log
Resizing Edge nodes in Edge cluster EC-01 to form factor SMALL

The Edge Node Resizer tool takes Edge nodes offline one at a time in order
to recreate them with a specified form factor. This means that

* Each Edge node's Tier-0 interfaces temporarily go offline during resizing

* Tier-1 router services relocate from one Edge node to another

This may lead to temporary network traffic interruptions during the resize.
The full resize operation may take as long as it did to originally create
and (if requested) expand the Edge cluster.

Do you wish to proceed (y/n)? y
Run confirmation accepted by user.
Getting credentials from SDDC Manager..
Getting Bearer token
count of WLDs supported by our NSX-T cluster: 1
workload_name = mgmt-domain
Credential retrieval completed.
Connection established to vCenter at 10.0.0.12
edge cluster EC-01: 5d169fc3-474e-42a6-b8e1-75fd265c73d3
Refreshing NSX view of Edge node id fa7f3afe-b3b9-4b8e-83f3-91ce36608322
Found vSphere rules for VM edge1-mgmt:
   VCF-edge_EC-01_antiAffinity_b38f42b9beb851202facc2bcc7cd6d7e
Edge node VM edge1-mgmt is in 0 VM groups for cluster mgmt-cluster-01
Refreshing NSX view of Edge node id 64749ef7-d464-48b6-9fdd-f3e8efffc9bf
Found vSphere rules for VM edge2-mgmt:
   VCF-edge_EC-01_antiAffinity_b38f42b9beb851202facc2bcc7cd6d7e
Edge node VM edge2-mgmt is in 0 VM groups for cluster mgmt-cluster-01
For Edge cluster EC-01,
   Edge node edge1-mgmt (10.0.0.23) has form factor SMALL
   Edge node edge2-mgmt (10.0.0.24) has form factor LARGE
Loading Edge cluster config info from /home/vcf/.vcf-edge-redeploy/EC-01.json
Existing cache for EC-01 shows a redeploy operation still in progress, so not refreshing cache from live configuration now.
Marking Edge cluster EC-01 cache with operation-in-progress = True
Loading Edge cluster config info from /home/vcf/.vcf-edge-redeploy/EC-01.json
Check that 2 x SMALL Edge node VMs fit in cluster mgmt-cluster-01's resource pool EC-01
Resource pool has 4000 CPU and 8192 RAM.
After resize, pool's Edge nodes need 4000 CPU and 8192 RAM
Resource pool is large enough: no resize needed for EC-01
Edge node edge1-mgmt already has desired form-factor of small, not resizing it.
Updating known_hosts entry for edge1-mgmt.vcf.sddc.lab (fa7f3afe-b3b9-4b8e-83f3-91ce36608322)
Freshen VCF known_hosts key for edge1-mgmt.vcf.sddc.lab
Freshen known_hosts key for edge1-mgmt.vcf.sddc.lab
# edge1-mgmt.vcf.sddc.lab:22 SSH-2.0-OpenSSH_8.2p1 Ubuntu-4ubuntu0.5
Current ssh key obtained for edge1-mgmt.vcf.sddc.lab
dropping old edge1-mgmt.vcf.sddc.lab, key type ssh-rsa
Ran post, result = {}
AA rule VCF-edge_EC-01_antiAffinity_b38f42b9beb851202facc2bcc7cd6d7e still exists, updating it..
Re-added edge1-mgmt to AA rule VCF-edge_EC-01_antiAffinity_b38f42b9beb851202facc2bcc7cd6d7e
Updating known_hosts entry for edge1-mgmt.vcf.sddc.lab (fa7f3afe-b3b9-4b8e-83f3-91ce36608322)
Freshen VCF known_hosts key for edge1-mgmt.vcf.sddc.lab
Freshen known_hosts key for edge1-mgmt.vcf.sddc.lab
# edge1-mgmt.vcf.sddc.lab:22 SSH-2.0-OpenSSH_8.2p1 Ubuntu-4ubuntu0.5
Current ssh key obtained for edge1-mgmt.vcf.sddc.lab
dropping old edge1-mgmt.vcf.sddc.lab, key type ssh-rsa
Ran post, result = {}
posting to url: https://10.0.0.20/api/v1/transport-nodes/64749ef7-d464-48b6-9fdd-f3e8efffc9bf?action=redeploy
resp.status_code = 200
EN VM moid: start=vm-39, cur=vm-39
tnState: ndsState=NODE_READY, outerState=in_progress
EN VM moid: start=vm-39, cur=None
tnState: ndsState=VM_DEPLOYMENT_RESTARTED, outerState=pending
EN VM moid: start=vm-39, cur=None
tnState: ndsState=VM_DEPLOYMENT_IN_PROGRESS, outerState=pending
tnState: ndsState=REGISTRATION_PENDING, outerState=pending
EN VM moid: start=vm-39, cur=None
tnState: ndsState=NODE_NOT_READY, outerState=pending
EN VM moid: start=vm-39, cur=vm-6082
tnState: ndsState=NODE_READY, outerState=in_progress
EN VM moid: start=vm-39, cur=vm-6082
………………
Redeployment successful for Edge node edge2-mgmt
Waited 1317 seconds, or 22 minutes, for redeploy of Edge node edge2-mgmt
AA rule VCF-edge_EC-01_antiAffinity_b38f42b9beb851202facc2bcc7cd6d7e still exists, updating it..
Re-added edge2-mgmt to AA rule VCF-edge_EC-01_antiAffinity_b38f42b9beb851202facc2bcc7cd6d7e
Updating known_hosts entry for edge2-mgmt.vcf.sddc.lab (64749ef7-d464-48b6-9fdd-f3e8efffc9bf)
Freshen VCF known_hosts key for edge2-mgmt.vcf.sddc.lab
Temporarily enabling ssh to edge2-mgmt.vcf.sddc.lab
* posting to url https://10.0.0.20/api/v1/transport-nodes/64749ef7-d464-48b6-9fdd-f3e8efffc9bf/node/services/ssh
Get https://10.0.0.20/api/v1/transport-nodes/64749ef7-d464-48b6-9fdd-f3e8efffc9bf/node/services/ssh/status
ssh runtime_state: running
Freshen known_hosts key for edge2-mgmt.vcf.sddc.lab
# edge2-mgmt.vcf.sddc.lab:22 SSH-2.0-OpenSSH_8.2p1 Ubuntu-4ubuntu0.5
Current ssh key obtained for edge2-mgmt.vcf.sddc.lab
dropping old edge2-mgmt.vcf.sddc.lab, key type ssh-rsa
Ran post, result = {}
Re-disabling ssh to edge2-mgmt.vcf.sddc.lab
* posting to url https://10.0.0.20/api/v1/transport-nodes/64749ef7-d464-48b6-9fdd-f3e8efffc9bf/node/services/ssh
Get https://10.0.0.20/api/v1/transport-nodes/64749ef7-d464-48b6-9fdd-f3e8efffc9bf/node/services/ssh/status
ssh runtime_state: stopped
Resize of Edge cluster EC-01 nodes completed.
Marking Edge cluster EC-01 cache with operation-in-progress = False
Loading Edge cluster config info from /home/vcf/.vcf-edge-redeploy/EC-01.json
Total run time: 0:25:03, or 1503 seconds
Log written to /home/vcf/resizer/resizer/edge_node_resizer_2023-06-30T01:40:41.log
vcf@sddc-manager [ ~/resizer ]$

When we ran the script again, it detected that a resize operation was already in progress, skipped edge1-mgmt because it already had the SMALL form factor, and picked up where it left off with edge2-mgmt. This time it completed successfully, and we now have a fully resized edge cluster!
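
The resume behaviour visible in the log (an in-progress marker in a cache file, plus skipping nodes that already have the desired form factor) can be sketched roughly as follows. This is only an illustration of the pattern, not how the tool is actually implemented; the key name `operation_in_progress`, the cache layout, and every function name here are assumptions.

```python
import json
from pathlib import Path

IN_PROGRESS_KEY = "operation_in_progress"  # hypothetical key name


def load_cache(path):
    """Return the cached dict, or an empty dict if no cache exists."""
    path = Path(path)
    if path.exists():
        return json.loads(path.read_text())
    return {}


def set_in_progress(path, value):
    """Persist the in-progress marker so a later run can detect it."""
    cache = load_cache(path)
    cache[IN_PROGRESS_KEY] = value
    Path(path).write_text(json.dumps(cache, indent=2))


def nodes_needing_resize(form_factors, target):
    """List nodes whose current form factor differs from the target."""
    target = target.upper()
    return [name for name, ff in form_factors.items()
            if ff.upper() != target]


def resize_cluster(cache_path, form_factors, target, resize_node):
    """Resize nodes one at a time; return True if resuming an earlier run.

    If resize_node raises, the marker stays True on disk, so the next
    run knows an operation was interrupted and skips finished nodes.
    """
    resumed = load_cache(cache_path).get(IN_PROGRESS_KEY, False)
    set_in_progress(cache_path, True)
    for name in nodes_needing_resize(form_factors, target):
        resize_node(name)  # hypothetical callback doing the redeploy
        form_factors[name] = target.upper()
    set_in_progress(cache_path, False)
    return resumed
```

The marker is cleared only after every node has been processed, which is why a failed run followed by a rerun ends up doing the remaining work instead of starting over.
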
Note: Perform this task during a maintenance window, as the tool itself warns of temporary traffic interruptions while each edge node is redeployed.
