Infrastructure and Cloud for Enthusiasts

[blog 017]# git commit

VMware NSX, Unable to add a Host Transport Node

So in this blog I was going to talk about the benefits of NSX sub-clusters in version 4.1.1.0.0, how they work and why they are cool for Cloud Service Providers. At least that was the intention, till one of my lab hosts decided to grenade itself in style. So what, you ask, don’t be soft, just rebuild it and put it back in the cluster. And that is exactly what I did, till I could not prepare the rebuilt host for NSX due to the following error.

Figure 1 – Validation Error

Not a problem. Let’s move the host out of the vSphere cluster and validate whether the host still exists via the API.

If the host is returned, we can simply remove it with the API.

GET https://<NSX Manager>/api/v1/transport-nodes/
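If you prefer to run this from a shell, a rough curl equivalent looks something like the following (the manager FQDN and credentials are placeholders, and the DELETE is only for a node that the GET actually returns):

curl -k -u admin "https://<NSX Manager>/api/v1/transport-nodes/" | grep -i esxuat3
curl -k -u admin -X DELETE "https://<NSX Manager>/api/v1/transport-nodes/<transport-node-id>"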

So my ESX host was not returned by the GET API call, which means the problem is within the Corfu database. Now, I have done a blog previously on removing stuck Edge Transport Nodes from the Corfu database, so I have sort of been down this path before. Once again, if you find yourself in this scenario in a production environment, don’t go in guns blazing without the good folk at GSS leading the way.

For me I like to get my hands dirty and break things in my labs since I am an idiot who forgets sometimes he has a family and other commitments. Right, so on to the good stuff.

So with the ESX host moved out of the vSphere cluster, it is time to run some queries on the Corfu database. We export the following tables to file, search them for either the host name or the host uuid, and extract the relevant fields associated with the ESX host.

  • HostModelMsg
  • GenericPolicyRealizedResource
  • HostTransportNode
root@nsx:/# /opt/vmware/bin/corfu_tool_runner.py --tool corfu-editor -n nsx -o showTable -t HostModelMsg > /tmp/file1.txt
root@nsx:/# /opt/vmware/bin/corfu_tool_runner.py --tool corfu-editor -n nsx -o showTable -t GenericPolicyRealizedResource > /tmp/file2.txt
root@nsx:/# /opt/vmware/bin/corfu_tool_runner.py --tool corfu-editor -n nsx -o showTable -t HostTransportNode > /tmp/file3.txt

Once the tables have been exported, we need to confirm that the stuck ESX host exists within the JSON and extract the “stringId” and the “Key” for the ESX host from each exported table.

The following is an example of what I am looking for:

"stringId": "/infra/sites/default/enforcement-points/default/host-transport-nodes/esxuat3-30b975c9-60a0-4fb7-bfef-96770ee5f240host-10021"

Key:
{
  "uuid": {
    "left": "10489905160825487935",
    "right": "11102041072723471023"
  }
}

You will notice the uuid at the end of “esxuat3” in the stringId matches the uuid from the previous screenshot.
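A quick way to pull these out of the exported files is to grep for the host name or uuid with a bit of surrounding context, something like this (adjust the context line counts to suit):

grep -B 5 -A 10 -i esxuat3 /tmp/file1.txt /tmp/file2.txt /tmp/file3.txt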

Now that we have the required information we can shut down the NSX proton service and clean up the Corfu database.

root@nsxt:/# service proton stop; service corfu-server stop
root@nsxt:/# service corfu-server start

The first command removes the ESX host stringId key from the “GenericPolicyRealizedResource” table. When the command has completed, make sure “1 records deleted successfully” appears in the output, otherwise we have not deleted anything. This is shown at the end of the output below.

corfu_tool_runner.py --tool corfu-browser -o deleteRecord -n nsx -t GenericPolicyRealizedResource --keyToDelete '{"stringId": "/infra/realized-state/enforcement-points/default/host-transport-nodes/esxuat3-30b975c9-60a0-4fb7-bfef-96770ee5f240host-10021"}'


Deleting 1 records in table GenericPolicyRealizedResource and namespace nsx.  Stream Id dabf8af4-9eb6-3374-9a18-d273ed7132e9
Namespace: nsx
TableName: GenericPolicyRealizedResource
2023-08-22T00:11:46.361Z | INFO  |                           main |     o.c.runtime.view.SMRObject | ObjectBuilder: open Corfu stream nsx$GenericPolicyRealizedResource id dabf8af4-9eb6-3374-9a18-d273ed7132e9
2023-08-22T00:11:46.362Z | INFO  |                           main |     o.c.runtime.view.SMRObject | Added SMRObject [dabf8af4-9eb6-3374-9a18-d273ed7132e9, PersistentCorfuTable] to objectCache
0: Deleting record with Key {"stringId": "/infra/realized-state/enforcement-points/default/host-transport-nodes/esxuat3-30b975c9-60a0-4fb7-bfef-96770ee5f240host-10021"}

 1 records deleted successfully.

The next command removes the ESX host key from the “HostModelMsg” table. Again, when the command has completed, make sure “1 records deleted successfully” appears in the output, otherwise we have not deleted anything.

corfu_tool_runner.py --tool corfu-browser -o deleteRecord -n nsx -t HostModelMsg --keyToDelete '{"uuid": {"left": "10489905160825487935", "right": "11102041072723471023"} }'


Deleting 1 records in table HostModelMsg and namespace nsx.  Stream Id d8120129-1f35-34c2-a309-e5cf6dbe5487
Namespace: nsx
TableName: HostModelMsg
2023-08-22T00:12:20.049Z | INFO  |                           main |     o.c.runtime.view.SMRObject | ObjectBuilder: open Corfu stream nsx$HostModelMsg id d8120129-1f35-34c2-a309-e5cf6dbe5487
2023-08-22T00:12:20.050Z | INFO  |                           main |     o.c.runtime.view.SMRObject | Added SMRObject [d8120129-1f35-34c2-a309-e5cf6dbe5487, PersistentCorfuTable] to objectCache
0: Deleting record with Key {"uuid": {"left": "10489905160825487935", "right": "11102041072723471023"} }

 1 records deleted successfully.

The final command removes the ESX host stringId key from the “HostTransportNode” table. Once more, make sure “1 records deleted successfully” appears in the output, otherwise we have not deleted anything.

corfu_tool_runner.py --tool corfu-browser -o deleteRecord -n nsx -t HostTransportNode --keyToDelete '{"stringId": "/infra/sites/default/enforcement-points/default/host-transport-nodes/esxuat3-30b975c9-60a0-4fb7-bfef-96770ee5f240host-10021" }'


Deleting 1 records in table HostTransportNode and namespace nsx.  Stream Id a720622f-68c3-3359-9114-12231645d94e
Namespace: nsx
TableName: HostTransportNode
2023-08-22T00:12:49.897Z | INFO  |                           main |     o.c.runtime.view.SMRObject | ObjectBuilder: open Corfu stream nsx$HostTransportNode id a720622f-68c3-3359-9114-12231645d94e
2023-08-22T00:12:49.898Z | INFO  |                           main |     o.c.runtime.view.SMRObject | Added SMRObject [a720622f-68c3-3359-9114-12231645d94e, PersistentCorfuTable] to objectCache
0: Deleting record with Key {"stringId": "/infra/sites/default/enforcement-points/default/host-transport-nodes/esxuat3-30b975c9-60a0-4fb7-bfef-96770ee5f240host-10021" }

 1 records deleted successfully.

Now that the Corfu database entries have been removed, we can restart the proton and Corfu services and check their status to ensure they have started correctly.

root@nsxt:/# service proton restart; service corfu-server restart
root@nsxt:/#  service proton status; service corfu-server status

Once all the services have restarted, we can move the ESX host back into the vSphere cluster and allow NSX to prepare the host.

All things going well, the host preparation should be successful and the host back in action for NSX goodness.

Figure 2 – Host Preparation.
Figure 3 – Completed NSX Installation.

Well that’s a wrap for this blog after getting a little sidetracked down this rabbit hole. I hope this helps if you get stuck in your labs, or at least gives you an idea of the path GSS would go down in a production situation. Stay tuned for the next one (touch wood) on NSX sub-clusters and their benefits for Cloud Service Providers. As always, keep on NSXing!

Cheers

Tony Williamson

[blog 016]# git commit

VMware Cloud Provider Lifecycle Manager

In my professional career I spend quite a bit of time designing cloud solutions and products. I am always looking at ways to improve the deployment and day 2 operations of products to make operational teams more efficient, provide product consistency and remove the “human factor” which can lead to undesired results in deployments.

VMware Cloud Director https://www.vmware.com/au/products/cloud-director.html is part of my bread and butter, so I was looking at how I could follow my cloud solution mantra with VMware Cloud Provider Lifecycle Manager (VCPLCM) https://docs.vmware.com/en/VMware-Cloud-Provider-Lifecycle-Manager/index.html.

So what is VCPLCM and how does it provide deployment consistency and day 2 operations? I am glad you asked.

VCPLCM allows for the creation of Cloud Director environments and the lifecycle of the platform for day 2 operations, including software patching and certificates. Existing Cloud Director environments can also be imported to take over what would normally have been manual operations. VCPLCM has three components: Environments, Datacenters and Tasks.

Let’s start with Datacenters. While not mandatory for deploying a VMware Cloud Provider environment, VCPLCM provides integrations into the deployment of the platform, which include vCenter, NSX, NSX ALB and vROps. These integrations allow VCPLCM to check interoperability when Cloud Director is lifecycled and will flag an issue if one is identified.

The integrations are deployed as you normally would and then registered within VCPLCM. Below is an example of integrations imported into VCPLCM.

Registered Datacenters

Moving on to Environments. This is the deployment of VMware Cloud Director, VMware Chargeback, vCloud Usage Meter and RabbitMQ. These are all add-on applications for Cloud Provider platforms, however they are not a requirement, as a provider may not offer the application extensions or may choose another lifecycle method.

Environments which can be deployed.

The application environments can be added into a VMware Cloud Director deployment and also have their interoperability checked and flagged if not compliant. The difference with these application environments is that they can be lifecycled to later software versions, scaled (for example, adding more VCD cells), or have their certificates updated by VCPLCM to ensure interoperability before updating a Cloud Director environment.

Below is an example of the actions available for a deployed Cloud Director environment.

Available actions for a VCD environment.

Below is an example of updating RabbitMQ.

Update or redeploy.

There is one caveat to all this deployment and lifecycle goodness, and that is that all the OVAs used for deploying and updating applications need to be stored on VCPLCM, which can add up to quite a bit of capacity. The location path on the appliance for the OVAs is as follows.

vcplcm@vcplm [ /cplcmrepo ]$ pwd
/cplcmrepo

vcplcm@vcplm [ /cplcmrepo ]$ ls -lr
total 4
drwxrwxrwx 16 root root 189 May 7 13:10 vropsta
drwxrwxrwx 28 root root 4096 May 7 13:10 vcd
drwxrwxrwx 6 root root 60 May 7 13:10 usage
drwxrwxrwx 4 root root 33 May 7 13:10 rmq

So depending on your cost of storage, you may want to move this path off to NFS storage.

For example: 192.168.1.125:/nfs/vcplm/cplcmrepo 47G 7.1G 40G 16% /cplcmrepo
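As a rough sketch of how that could look on the appliance (this assumes the NFS export above already exists, that the repository contents have been copied to it first, and that you have checked the VCPLCM documentation for the supported way of doing this on anything other than a lab):

root@vcplm [ ~ ]# mount -t nfs 192.168.1.125:/nfs/vcplm/cplcmrepo /cplcmrepo
root@vcplm [ ~ ]# echo "192.168.1.125:/nfs/vcplm/cplcmrepo /cplcmrepo nfs defaults 0 0" >> /etc/fstab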

Well, that’s the end of this blog, so I hope I have enlightened you to go take a look and see if it is suitable for your existing or net new deployments. The best part is that if you delete an Environment or Datacenter from VCPLCM it does not delete your production systems, it just removes the entry from the VCPLCM database.

I am hoping to do more around Cloud Director as it is a great platform for multi-tenancy, so stay tuned for more blogs on the subject.

[blog 015]# git commit

VMware Cloud Director Tenancy Load Balancing with NSX Advanced Load Balancer

I have spent quite a bit of time recently implementing Cloud Director tenancy load balancing with NSX Advanced Load Balancer, and also talking to quite a few people about it. The latest was at the Sydney VMUG UserCon as part of my “Real World Service Provider Networking Load Balancing” presentation, which I will upload in the next blog. So after all the presenting and talking, I thought I should do a blog on the implementation and what happens behind the scenes.

Let’s start with what has changed between NSX for vSphere, early implementations of NSX-T, and where we are at now with load balancing.

So in NSX for vSphere, load balancing was carried out on Edge Services Gateways and had simple functionality around virtual server protocols, port ranges and the backend pools for connectivity. There were also simple service monitors for TCP, HTTP and HTTPS, to name a few.

Customers with a tenancy inside VMware Cloud Director could create simple load balancing services on demand, based on the available IP resources assigned to them.

When NSX-T 2.4 came out it had similar functionality, however it was assigned to a T1 gateway and an Edge Cluster of at least medium size. While this could be done in NSX-T, there was no supported functionality within Cloud Director.

Enter NSX Advanced Load Balancer with NSX 3.0 and integration with Cloud Director. Now, generally I am a big fan of NSX Advanced Load Balancer, and the integration into Cloud Director brings new functionality such as “HTTP Cookie Persistence”, “Transparency Mode” (which allows the preservation of client IPs), and shared or dedicated Service Engines for customers.

The following diagram shows how load balancing services are provided. An NSX-T Cloud Service Engine Group is assigned to a tenant. I prefer not to share Service Engines between customers, as “sharing” can make them nervous even though in reality they would be separated via their own routing domain (VRF) on the Service Engine.

It also allows for simpler billing, as customers can consume as many virtual services as they require depending on how many available IPs they have.

The Cloud Director API creates a transit overlay network between the T1 and the Service Engine, and a static route is applied on the T1 for the Virtual Service that is hosted on the Service Engine.

Route advertisement is updated on the T1 via API calls from vCD to NSX to enable LB VIP routes and static routes, allowing advertisement to the T0 and into a customer’s VRF or their public-facing network, so that VM workloads on NSX overlay networks can access the VIP service.
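If you want to confirm what vCD has pushed, the static routes on the tenant T1 can be pulled back via the NSX Policy API; a quick sketch only, where the tier-1 ID is a placeholder for your tenant’s T1:

curl -k -u admin "https://<NSX Manager>/policy/api/v1/infra/tier-1s/<tenant-t1-id>/static-routes"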

The management plane of the Service Engine is connected via a “Cloud Management” network, which is your typical T1 / T0 design.

Logical Design for NSX Advanced Load Balancer.


I have created the following video that shows the creation of load balanced services and what is required and takes place from an NSX and NSX ALB perspective.

Deployment of NSX Advanced Load Balancer with Cloud Director.

Having an understanding of what happens behind the scenes is, in my mind, the most important aspect of any design and implementation, as it will help with troubleshooting deployments and existing environments when things don’t go as planned, and I like to know the mystery behind the magic.

See you all in the next #git commit.

[blog 014]# git commit

NSX Edge Transport Nodes With Failed Deletion

In my lab I am constantly adding and deleting virtual infrastructure depending on what projects I am working on, or testing for customers, or it could be just the fact my mind works like a squirrel collecting nuts while listening to Punk.

One thing I have come across is when the NSX Manager fails to delete an Edge Transport Node and gets itself into a balked state: the Edge Node has been deleted from the virtual infrastructure, however it remains in a “Deletion in Progress” state within the NSX Manager. Even though this is a lab and it does not affect anything, I cannot stand having errors (kind of like my obsession with certificates).

Balked Deletion in Progress

Now this issue is not new, and the process is to either delete Edge Nodes via the API (if they still exist there) or delete the entries from the Corfu database. However, the process for the DB has changed from 3.2 onwards, which this blog will cover, and for transparency this version of NSX is 4.0.0.1.0. For an in-depth method prior to 3.2 you can check Shank Mohan’s article here: https://www.lab2prod.com.au/2021/11/nsx-t-edge-deletion-failed.html

Before continuing make sure you have a backup of NSX in case things don’t go as planned and we all do backups anyway don’t we ….. don’t we !!, and it is best to have a GSS case logged with VMware before proceeding as this blog provides zero warranty.

The following process needs to be carried out as “root” on each of the NSX Managers in the environment.

From the root login we are going to run the internal Corfu database tool to run queries, updates and deletions to remove the stale entries.

First of all we are going to look for any Edge Nodes that are marked for deletion; from the JSON payload that is returned we need the “stringId” of the Edge Node.

root@nsxtuat:/opt/vmware/bin# /opt/vmware/bin/corfu_tool_runner.py -o showTable -n nsx -t ReplacementInfo

Key:
{
  "stringId": "/infra/sites/default/enforcement-points/default/edge-transport-node/787bc347-d015-43e8-8399-115e45c27f1d"
}

Payload:
{
  "abstractPolicyResource": {
    "managedResource": {
      "displayName": "transport-edge-05",
      "tagsArray": {
      }
    },
    "markedForDelete": true,
    "deleteWithParent": false,
    "locked": false,
    "isOnboarded": false,
    "internalKey": {
      "left": "8681747419887911912",
      "right": "9482629587000262429"
    },

Next we need to stop the Proton service and the Corfu database, then start just the Corfu database so we can modify the tables. As a habit, I always check to make sure the Corfu database has started.

root@nsxtuat:/opt/vmware/bin# service proton stop; service corfu-server stop

root@nsxtuat:/opt/vmware/bin# service corfu-server start

root@nsxtuat:/opt/vmware/bin# service corfu-server status
* corfu-server.service - Corfu Infrastructure Server
   Loaded: loaded (/etc/init.d/corfu-server; enabled; vendor preset: enabled)
   Active: active (running) since Tue 2022-08-30 03:26:48 UTC; 3s ago
     Docs: https://github.com/corfudb/corfudb
  Process: 2522 ExecStopPost=/etc/init.d/corfu-server poststop (code=exited, status=0/SUCCESS)
  Process: 2372 ExecStop=/etc/init.d/corfu-server stop (code=exited, status=0/SUCCESS)
  Process: 2838 ExecStart=/etc/init.d/corfu-server start (code=exited, status=0/SUCCESS)
  Process: 2807 ExecStartPre=/etc/init.d/corfu-server prestart (code=exited, status=0/SUCCESS)
    Tasks: 63 (limit: 4915)
   CGroup: /system.slice/corfu-server.service

The next step is to back up all the relevant tables in the database in case we need to restore them, so I save them in the /tmp directory as I don’t intend to keep them after the NSX Manager reboots down the track.

root@nsxtuat:/# cd /tmp
root@nsxtuat:/tmp# /opt/vmware/bin/corfu_tool_runner.py -o showTable -n nsx -t ReplacementInfo > ReplacementInfo.txt
root@nsxtuat:/tmp# /opt/vmware/bin/corfu_tool_runner.py -o showTable -n nsx -t EdgeNodeExternalConfig > EdgeNodeExternalConfig.txt
root@nsxtuat:/tmp# /opt/vmware/bin/corfu_tool_runner.py -o showTable -n nsx -t EdgeNodeInstallInfo > EdgeNodeInstallInfo.txt
root@nsxtuat:/tmp# /opt/vmware/bin/corfu_tool_runner.py -o showTable -n nsx -t EdgeNodeConfigInfo > EdgeNodeConfigInfo.txt
root@nsxtuat:/tmp# /opt/vmware/bin/corfu_tool_runner.py -o showTable -n nsx -t GenericPolicyRealizedResource > GenericPolicyRealizedResource.txt
root@nsxtuat:/tmp# /opt/vmware/bin/corfu_tool_runner.py -o showTable -n nsx -t EdgeTransportNode > EdgeTransportNode.txt
root@nsxtuat:/tmp# /opt/vmware/bin/corfu_tool_runner.py -o showTable -n nsx -t DeletedVm > DeletedVm.txt


root@nsxtuat:/tmp# ls -lhra
-rw-r--r--  1 root           root           2.3M Aug 30 23:48 GenericPolicyRealizedResource.txt
-rw-r--r--  1 root           root            34K Aug 30 23:49 EdgeTransportNode.txt
-rw-r--r--  1 root           root           7.5K Aug 30 23:47 EdgeNodeInstallInfo.txt
-rw-r--r--  1 root           root            14K Aug 30 23:47 EdgeNodeExternalConfig.txt
-rw-r--r--  1 root           root           7.6K Aug 30 23:47 EdgeNodeConfigInfo.txt
-rw-r--r--  1 root           root           4.4K Aug 30 23:49 DeletedVm.txt

The next step is to take the “stringId” we captured earlier, in this case “787bc347-d015-43e8-8399-115e45c27f1d”, and delete the associated stringId keys from each of the database tables.
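The delete commands follow the same corfu_tool_runner.py pattern used elsewhere in this blog; for example, for the ReplacementInfo table with my Edge Node’s stringId (the exact key format for each of the other tables can be checked in the export files saved above):

root@nsxtuat:/tmp# /opt/vmware/bin/corfu_tool_runner.py -o deleteRecord -n nsx -t ReplacementInfo --keyToDelete '{"stringId": "/infra/sites/default/enforcement-points/default/edge-transport-node/787bc347-d015-43e8-8399-115e45c27f1d"}'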

If you get a response that includes “not found in nsx<table_name>”, it is not the end of the world, it just means that NSX has already cleaned up the key in that particular table.

The next step is to clean up any stale records in the “Client RPC Messaging Table”, so we need to search for our saved “stringId” again. The “stringId” will help us identify the “left” and “right” uuids, which are required to remove the stale records. These are shown in the snippet of output below.

root@nsxtuat:/tmp#/opt/vmware/bin/corfu_tool_runner.py -o showTable -n nsx -t Client


Key:
{
  "uuid": {
    "left": "8681747419887911912",
    "right": "9482629587000262429"
  }
}

Payload:
{
  "clientType": "cvn-edge",
  "clientToken": "787bc347-d015-43e8-8399-115e45c27f1d",
  "masterClusterNode": {
    "left": "8679300982090774583",
    "right": "16873472161445019477"
  },

With the “left” and “right” uuids obtained, we can now delete the stale keys out of the Client, EdgeMsgClientInfo and EdgeSystemInfo tables. Note the uuids in the commands below.

root@nsxtuat:/tmp#/opt/vmware/bin/corfu_tool_runner.py -o deleteRecord -n nsx -t Client --keyToDelete '{"uuid":{"left":8681747419887911912,"right":9482629587000262429}}'

Namespace: nsx
TableName: Client
2022-08-31T05:09:57.553Z | INFO  |                           main |     o.c.runtime.view.SMRObject | ObjectBuilder: open Corfu stream nsx$Client id 55943778-4eff-34a9-bdd0-6a3bd274dc58
Deleting record with Key {"uuid":{"left":8681747419887911912,"right":9482629587000262429}} in table Client and namespace nsx.  Stream Id 55943778-4eff-34a9-bdd0-6a3bd274dc58


root@nsxtuat:/tmp#/opt/vmware/bin/corfu_tool_runner.py -o deleteRecord -n nsx -t EdgeMsgClientInfo --keyToDelete '{"uuid":{"left":8681747419887911912,"right": 9482629587000262429}}'

Namespace: nsx
TableName: EdgeMsgClientInfo
2022-08-31T05:12:00.531Z | INFO  |                           main |     o.c.runtime.view.SMRObject | ObjectBuilder: open Corfu stream nsx$EdgeMsgClientInfo id 954ff3fb-d058-32de-a41b-452ad521950e
Deleting record with Key {"uuid":{"left":8681747419887911912,"right": 9482629587000262429}} in table EdgeMsgClientInfo and namespace nsx.  Stream Id 954ff3fb-d058-32de-a41b-452ad521950e


root@nsxtuat:/tmp#/opt/vmware/bin/corfu_tool_runner.py -o deleteRecord -n nsx -t EdgeSystemInfo --keyToDelete '{"uuid":{"left":8681747419887911912,"right": 9482629587000262429}}'

Namespace: nsx
TableName: EdgeSystemInfo
2022-08-31T05:12:16.629Z | INFO  |                           main |     o.c.runtime.view.SMRObject | ObjectBuilder: open Corfu stream nsx$EdgeSystemInfo id 31c0178f-fedd-3ddf-9b06-6ffc8307ffcf
Deleting record with Key {"uuid":{"left":8681747419887911912,"right": 9482629587000262429}} in table EdgeSystemInfo and namespace nsx.  Stream Id 31c0178f-fedd-3ddf-9b06-6ffc8307ffcf

Now that we have cleaned up all the relevant tables, to validate the Edge Node has been removed we can view the EdgeTransportNode table, which should show only valid Edge Nodes. I won’t show the output as it is quite a lot of JSON, however you can just search for the name of your Edge Nodes to confirm.

root@nsxtuat:/tmp#/opt/vmware/bin/corfu_tool_runner.py -o showTable -n nsx -t EdgeTransportNode
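Piping the output through grep for the Edge Node name makes this quicker, for example:

root@nsxtuat:/tmp# /opt/vmware/bin/corfu_tool_runner.py -o showTable -n nsx -t EdgeTransportNode | grep transport-edge-05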

Now that everything is clean, restart the proton service, log into the NSX Manager and you will see that the Edge Node has been deleted. Note that this process has to be done on all NSX Manager nodes in the cluster.
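The service commands are the same ones used earlier, something along the lines of:

root@nsxtuat:/tmp# service proton restart
root@nsxtuat:/tmp# service proton status; service corfu-server status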

Finally we can ssh back into the NSX Manager as admin and run “start search resync manager” to sync up all the Edge Nodes.

As you can see below “transport-edge-05” has now been removed.

Edge Node Deleted

So all in all this is quite a complex process and it took me quite a while to work through, so I hope you find it useful. However, as I reiterated earlier in the blog, if it is production this should only be attempted with the assistance of GSS, and backups are mandatory.

Keep on NSXing peeps !

[blog 013]# git commit

NSX VLAN DHCP

In this blog I am going to start referring to NSX-T as NSX, since that is now the official name.

https://blogs.vmware.com/partnernews/2022/04/nsx-data-center-name-change.html

So it is quite common to use DHCP when deploying workloads to provide IP addressing, and in NSX land this is provided by a DHCP server that runs on an Edge Transport Node.

DHCP for overlay segments is quite straightforward so I will quickly cover the requirements, however the main crux of the article is a little gotcha that will prevent DHCP from being advertised on the network, and thus prevent workloads from acquiring a network address. By default, NSX segment security will block the broadcasting of DHCP services on VLAN networks.

So as normal we will have our DHCP server configured and assigned to an Edge Cluster.

NSX DHCP Server

Our basic DHCP configuration is created on our NSX VLAN Segment.

VLAN Segment DHCP Configuration

Now at this stage you would expect DHCP to be available, however this is not the case for VLAN-based DHCP, as it will get blocked by the default Segment Security profile, which needs to be updated.

Segment Security

If we take a look at the default Segment Security Profile we can see that DHCP Server Block and DHCP Server Block – IPv6 are enabled.

Default Segment Security

We need to create a new Segment Security Profile with these two options disabled and then apply it to the VLAN segment.

New Segment Security
Applied Segment Security

So to validate that DHCP is working as expected, obviously the easiest thing to check is whether or not our test VM actually gets a DHCP lease, however we can also see this process on the Edge Transport Node by doing a packet capture on the fast path interfaces.
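On a Linux test VM (an assumption on my part, your guest OS will differ), a fresh lease request can be forced from inside the guest, which generates the DHCP traffic seen in the capture below:

root@testvm:~# dhclient -r eth0    # release the current lease (interface name will vary)
root@testvm:~# dhclient -v eth0    # request a new lease with verbose output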

In the following example I am doing a packet capture on fp-eth0 on the primary Edge Transport Node and we can see the lease request.

transport-edge-06> set capture session 0 interface fp-eth0 direction dual
transport-edge-06> set capture session 0 expression vlan 600

08:50:48.332308 00:50:56:a3:62:08 > ff:ff:ff:ff:ff:ff, ethertype 802.1Q (0x8100), length 346: vlan 600, p 0, ethertype IPv4, 0.0.0.0.68 > 255.255.255.255.67: BOOTP/DHCP, Request from 00:50:56:a3:62:08, length 300

08:50:48.332308 00:50:56:a3:62:08 > ff:ff:ff:ff:ff:ff, ethertype 802.1Q (0x8100), length 346: vlan 600, p 0, ethertype IPv4, 0.0.0.0.68 > 255.255.255.255.67: BOOTP/DHCP, Request from 00:50:56:a3:62:08, length 300

08:50:48.362746 00:50:56:a3:62:08 > ff:ff:ff:ff:ff:ff, ethertype 802.1Q (0x8100), length 64: vlan 600, p 0, ethertype ARP, Request who-has 192.16.60.1 tell 192.16.60.2, length 46
DHCP Received

If we look at the Edge Transport Node Cluster we can also see the lease applied to the test VM.

Edge Transport DHCP Lease

While DHCP is not the most fascinating of topics, and in my experience VLAN DHCP is done external to NSX via TOR SVIs, Windows / Linux servers, core routing and so on, I think it is about time we start thinking about where the network broadcast is coming from to provide DHCP. Like all good things in NSX we can keep it local to the infrastructure, especially if you are running HPE / Cisco / <enter vendor here> chassis that have fabric networking. An excellent example of a use case for DHCP is TKG, where DHCP is required for Worker Node IP assignment. So ditch your old deprecated VLAN DHCP and move to NSX!