Infrastructure and Cloud for Enthusiasts

[blog 012]# git commit

Migrate from NSX-V to NSX-T

Well, it has been a while since my last blog, but not without good reason. Between starting a new position, studying, being a husband and a dad, Xmas, and the whole Rona lockdown thing which made me buy toys on eBay (stay tuned), things got away from me a bit. But here we are in 2022, so time to get cracking again.

So what do we want?! NSX-T! When do we want it?! Now! … and you have no choice, since NSX for vSphere went out of general support on 16 January 2022, and purchasing maintenance is not a cheap path to go down: it includes no feature enhancements and only keeps the lights on.

For many organizations the thought of migrating from NSX-V to NSX-T is quite a daunting task, and guess what, it is when you have multiple NSX environments, multiple clusters, hundreds of VXLAN networks, Universal Objects, Edges, DLRs, thousands of firewall rules, Security Groups, IPSets; the list goes on and on.

So if you have your big boy brown pants on, take a deep breath, as VMware has your back. There are multiple methods of migration depending on your appetite, from “in-place migration” using the NSX for vSphere option to “lift and shift” using the Distributed Firewall option in NSX-T Migration Coordinator.

Figure 1 – Migration Coordinator

Each migration method has its pros and cons.

For example:

The NSX for vSphere in-place migration has very strict rules around your current NSX-V environment, as there are only five supported networking topologies that can be migrated. If you do not fit into those five topologies or cannot remediate your infrastructure, you are straight into the land of “lift and shift” using the Distributed Firewall migration method.

Refer to the following VMware link for what is supported.

https://docs.vmware.com/en/VMware-NSX-T-Data-Center/3.2/migration/GUID-FA5EFF22-B7CA-4FF8-A0D3-C0CB013F10F1.html

Features that you currently use in your NSX-V environment may not be supported for migration, for example, Security Groups with more than five dynamic member sets.

To assist in identifying potential migration issues, you can use the Migration Coordinator to surface the dependencies so you can either remediate or change your migration strategy.

Figure 2 – Example Security Groups That Cannot Migrate

Other considerations for the migration method are: can you accept downtime during the migration, do you want a staged, controlled migration over weeks or months, and do you have the skill set in house to rebuild rule sets and overlay networks via API?

Each Migration Coordinator mode except for the “NSX for vSphere” migration allows you to choose how you wish to migrate. You may wish to stand up new infrastructure, migrate all the rules and configuration, and have complete rollback options, but manually move virtual machines across multiple change windows. The point is that Migration Coordinator will assist you in your migration journey.

A good starting point for the journey of migrating from NSX-V to NSX-T is the Migration Coordinator documentation on the VMware Networking and Security Tech Zone, as it covers all the migration approaches in detail.

https://nsx.techzone.vmware.com/resource/nsx-v-nsx-t-3x-migration-coordinator#_Toc52349355

This blog is only a 100,000-foot view, but it is intended to point you in the right direction to start your migration journey, not to show you how to do it, as each environment is different.

Good luck with your migration, and note that even if you don’t have NSX-V you can migrate your VMware Distributed Switch networking into NSX-T and start leveraging all the security and load balancing functionality that the platform provides.

[blog 011]# git commit

Umm NSX-T, why do I have 694 ports on a segment?

So after doing my VMUG UserCon presentation on Rancher I went to clean up the lab and noticed that the NSX-T segment I was using for the control and worker nodes had 694 ports assigned. Damn, that’s a lot of ports!

The reason I had so many ports was months of testing and tinkering while using an external Linux DHCP server, and NSX-T thought the ports still existed. 694 ports seems pretty excessive; however, when you have a failed deployment and walk away for the night, Rancher keeps attempting to redeploy the nodes.

Figure 1. Ports Connected

Now I am somewhat of a lazy person and try to do most things with code and APIs using Python (note I do say try), and no sane person would go and manually delete 694 segment ports.

So let’s dive into the code to clean all this up, and feel free to use it at your own peril!

First up, let’s get a list of all the segments in this environment so I can get the segment IDs.

The body of the code does an API GET request to the NSX-T Manager and returns the logical switches and their switch IDs.

# -*- coding: utf-8 -*-
"""
Spyder Editor
Author: Tony Williamson
"""

import requests
import urllib3

# The lab NSX-T Manager uses a self-signed certificate, so suppress the warnings
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)

# GET all logical switches from the NSX-T Manager
logical_switches = requests.get('https://<enter nsxt url>/api/v1/logical-switches',
                                auth=('admin', 'thisisnotthepasswordyouarelookingfor'),
                                verify=False)

response_code = logical_switches.status_code
if response_code != 200:  # Response code must be 200 to continue
    print('\n Error, response code {} \n '.format(response_code))
    quit()

logical_switches = logical_switches.json()
logical_switches = logical_switches['results']

# Print each switch name alongside its switch id
for switch_info in logical_switches:
    switch_name = switch_info['display_name']
    switch_id = switch_info['id']
    print(switch_name, ",", switch_id)

The output from the Python:


172.16.80.0/24 , 42149648-8683-4464-8227-154b4daecc66
kubes-172.16.80.0-24 , e7194142-224b-4f66-8d11-23203151e72a
primus-alb2-vrf-vlan-500-172.50.0.0/24 , ea7b413b-7b76-40ba-8cbc-688c39ba59f1
primus-alb2-vrf-vlan-501-172.50.10.0/24 , 091adeb1-a982-4852-8057-5afee347b114
uat-204-192.168.204.0/24 , 434dcd33-9311-4a49-bc96-2b587d9aa25a
uat-205-192.168.205.0/24 , e5617b10-5fc4-4bad-a1ed-9076fe72fa54
uat1-100-192.168.0.0/23 , 080b659a-5c64-49e5-ba87-53876731a653

So what I want out of the results is the actual switch IDs, and in this case the ID is ‘42149648-8683-4464-8227-154b4daecc66’.

Now for the destructive code! Note that I had moved any real ports that were connected to workloads to another segment for the time being.

I have connected back to the NSX-T Manager via the API and carry out a ‘for loop’: for every port that is associated with segment ID ‘42149648-8683-4464-8227-154b4daecc66’, pew pew, delete it forcefully and without remorse.

# -*- coding: utf-8 -*-
"""
Spyder Editor
Author: Tony Williamson
"""

import requests
import urllib3

# The lab NSX-T Manager uses a self-signed certificate, so suppress the warnings
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)

# GET all logical ports from the NSX-T Manager
logical_ports = requests.get('https://<enter nsxt url>/api/v1/logical-ports/',
                             auth=('admin', 'thisisnotthepasswordyouarelookingfor'),
                             verify=False)

response_code = logical_ports.status_code
if response_code != 200:  # Response code must be 200 to continue
    print('\n Error, response code {} \n '.format(response_code))
    quit()

logical_ports = logical_ports.json()
logical_ports = logical_ports['results']

# DELETE every port attached to the target segment, detaching it first
for port_info in logical_ports:
    logical_port_id = port_info['id']
    logical_switch_id = port_info['logical_switch_id']
    if logical_switch_id == '42149648-8683-4464-8227-154b4daecc66':
        logical_port_url = ('https://<enter nsxt url>/api/v1/logical-ports/{}?detach=true'
                            .format(logical_port_id))
        print(logical_port_url)
        requests.delete(logical_port_url,
                        auth=('admin', 'thisisnotthepasswordyouarelookingfor'),
                        verify=False)

The process took a couple of minutes, with output of the port IDs that were being deleted.

Figure 2. Output of Port Deletion.
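
One caveat worth calling out, and this is a hedged sketch rather than something I needed for my 694 ports: the NSX-T list APIs page their results, so on a very large environment a single GET like the ones above may not return every port. If you ever hit that, you can follow the cursor that comes back in the response, something like this (same placeholder URL and credentials as above):

# Hedged sketch: follow the pagination cursor returned by NSX-T list APIs
# so every logical port is collected, not just the first page.
import requests
import urllib3
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)

nsxt_url = 'https://<enter nsxt url>'   # placeholder NSX-T Manager URL
auth = ('admin', 'thisisnotthepasswordyouarelookingfor')

all_ports = []
cursor = None
while True:
    params = {'cursor': cursor} if cursor else {}
    page = requests.get('{}/api/v1/logical-ports'.format(nsxt_url),
                        auth=auth, params=params, verify=False)
    page.raise_for_status()
    payload = page.json()
    all_ports.extend(payload['results'])
    cursor = payload.get('cursor')
    # Stop when there is no cursor to follow or the page came back empty
    if not cursor or not payload['results']:
        break

print('Collected {} logical ports'.format(len(all_ports)))

For my 694 ports a single page was plenty, so the scripts above did the job as-is.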

Once that process had completed, I ran more code to confirm that all the ports had been deleted, which they had, as no ports were returned; I also double-checked using Postman. It took about 15 minutes for the NSX-T Manager to catch up and reflect the changes.

# -*- coding: utf-8 -*-
"""
Spyder Editor
Author: Tony Williamson
"""

import requests
import urllib3

# The lab NSX-T Manager uses a self-signed certificate, so suppress the warnings
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)

# GET all logical ports from the NSX-T Manager
logical_ports = requests.get('https://<enter nsxt url>/api/v1/logical-ports/',
                             auth=('admin', 'thisisnotthepasswordyouarelookingfor'),
                             verify=False)

response_code = logical_ports.status_code
if response_code != 200:  # Response code must be 200 to continue
    print('\n Error, response code {} \n '.format(response_code))
    quit()

logical_ports = logical_ports.json()
logical_ports = logical_ports['results']

# Print any port still attached to the target segment (expecting none)
for port_info in logical_ports:
    logical_port_id = port_info['id']
    logical_switch_id = port_info['logical_switch_id']
    if logical_switch_id == '42149648-8683-4464-8227-154b4daecc66':
        print(logical_port_id)

Figure 3. Ports Cleared.

So now NSX-T is back looking all “sexy nice” (say it in a Borat voice)!

I hope that this is useful to somebody in the future, and don’t be afraid to dip your toe into code, APIs and automation, as it is now the new norm. NSX-T comes with its own built-in API reference guide, so get in there and tinker!

[blog 010]# git commit

OpenSSL and NSX-T Certificates

When it comes to rolling out new applications and infrastructure, either at work or in my lab, I am one of those obsessed people who do not like using self-signed certificates, so I either use a certificate provider like RapidSSL or leverage an internal CA to hand out certificates.

In my lab I use OpenSSL as my internal CA, with which I sign all my certificates, mainly because it is open source and I don’t have to worry about having a Microsoft environment to host the certificate services.

I came across an interesting issue when creating certificates for NSX-T, where the certificate I was generating was reported as missing an extension even though I was following VMware’s documentation.

https://docs.vmware.com/en/VMware-NSX-T-Data-Center/3.1/administration/GUID-50C36862-A29D-48FA-8CE7-697E64E10E37.html

I ensured that “basicConstraints = CA:FALSE” was included as an extension when I generated the certificate.

I verified that the required extension was in the certificate by running:


openssl x509 -in somensxt.pem -text -noout 

As you can see, the required extension exists.

                Exponent: 65537 (0x10001)
        X509v3 extensions:
            X509v3 Basic Constraints:
                CA:FALSE

Every time I went to validate the imported certificate via the API, I would get “Import fails with missing extension”.

So after a lot of head scratching, I looked at how VMware recommends setting up a certificate template for a Microsoft Certificate Authority in VMware Validated Design 6.2.

https://docs.vmware.com/en/VMware-Validated-Design/6.2/sddc-deployment-of-the-management-domain-in-the-first-region/GUID-8C4CA6F7-CEE8-45C9-83B4-09DD3EC5FFB0.html

I generated and tested a certificate successfully that way, so I inspected the certificate and found some extra extensions that were required.

What was missing in my OpenSSL certificates was the X509v3 Extended Key Usage extension, which is not part of general certificate generation, as below.

        X509v3 extensions:
            X509v3 Basic Constraints:
                CA:FALSE
            X509v3 Extended Key Usage:
                TLS Web Server Authentication, TLS Web Client Authentication, Code Signing, E-mail Protection

To enable the extension, I modified my extension.cnf file, which is used during certificate generation.

[someguy@ca somensxt]# cat extension.cnf
basicConstraints = CA:FALSE
extendedKeyUsage = serverAuth, clientAuth, codeSigning, emailProtection

So as an example of the certificate generation, once I had already generated a key and CSR using OpenSSL:

[someguy@ca somensxt]# openssl x509 -req -in somensxt.csr -CA /etc/pki/CA/certs/someawesomeCA.crt -CAkey /etc/pki/CA/private/someawesomeCA.key -CAcreateserial -out somensxt.pem -days 365 -sha256 -extfile extension.cnf

Now when I validate and import the certificate, the API is much happier and I have suppressed my OCD for another day.

GET https://somensxt/api/v1/trust-management/certificates/2f1966f4-9419-40e7-a6bb-3c9d54e27394?action=validate

{
    "status": "OK"
}
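
If you would rather script that validation check in the same style as my earlier Python tinkering, here is a minimal sketch (the NSX-T URL, admin credentials and certificate UUID are just the placeholders from the example above):

# Minimal sketch: validate an imported certificate via the NSX-T API
import requests
import urllib3
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)

nsxt_url = 'https://somensxt'    # placeholder NSX-T Manager URL
cert_id = '2f1966f4-9419-40e7-a6bb-3c9d54e27394'    # certificate UUID from above

validate = requests.get('{}/api/v1/trust-management/certificates/{}?action=validate'
                        .format(nsxt_url, cert_id),
                        auth=('admin', 'thisisnotthepasswordyouarelookingfor'),
                        verify=False)
print(validate.json())    # expecting {"status": "OK"} for a happy certificate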

That is all for this blog; however, as a note, my OpenSSL CA does not have a Certificate Revocation List, so if you are going to use basic OpenSSL you will have to disable the CRL check in NSX-T by posting an update to the Security Global Config via the API.

Below you can see where “crl_checking_enabled” is changed from true to false.

GET https://somensxt.somedomain.org/api/v1/global-configs/SecurityGlobalConfig

{
    "crl_checking_enabled": true,
    "ca_signed_only": false,
    "eku_checking_enabled": true,
    "resource_type": "SecurityGlobalConfig",
    "id": "b6355bde-adef-4739-a060-0061f2cd86e7",
    "display_name": "b6355bde-adef-4739-a060-0061f2cd86e7",
    "_create_user": "system",
    "_create_time": 1627964941857,
    "_last_modified_user": "system",
    "_last_modified_time": 1627982835257,
    "_system_owned": false,
    "_protection": "NOT_PROTECTED",
    "_revision": 5
}

POST https://somensxt.somedomain.org/api/v1/global-configs/SecurityGlobalConfig

{
    "crl_checking_enabled": false,
    "ca_signed_only": false,
    "eku_checking_enabled": true,
    "resource_type": "SecurityGlobalConfig",
    "id": "b6355bde-adef-4739-a060-0061f2cd86e7",
    "display_name": "b6355bde-adef-4739-a060-0061f2cd86e7",
    "_create_user": "system",
    "_create_time": 1627964941857,
    "_last_modified_user": "system",
    "_last_modified_time": 1627982835257,
    "_system_owned": false,
    "_protection": "NOT_PROTECTED",
    "_revision": 5
}
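
And if you would rather script that change than paste JSON around in Postman, here is a hedged sketch of the same GET-then-update flow in Python. It follows the POST shown above; depending on your NSX-T version the update may expect PUT instead, so treat it as a starting point rather than gospel:

# Hedged sketch: read SecurityGlobalConfig, flip crl_checking_enabled to
# false and push the full body back (including the current _revision).
import requests
import urllib3
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)

nsxt_url = 'https://somensxt.somedomain.org'    # placeholder NSX-T Manager URL
auth = ('admin', 'thisisnotthepasswordyouarelookingfor')
config_url = '{}/api/v1/global-configs/SecurityGlobalConfig'.format(nsxt_url)

config = requests.get(config_url, auth=auth, verify=False).json()
config['crl_checking_enabled'] = False

# Send the whole object back, as in the example above (swap to
# requests.put if your NSX-T build rejects POST for this endpoint)
update = requests.post(config_url, json=config, auth=auth, verify=False)
print(update.status_code, update.json())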

[blog 009]# git commit

Rancher Kubernetes on vSphere with Bitnami

For this blog I thought I would do something different and present the video demo I did at VMUG UserCon 2021.

The presentation covers installing Rancher, setting up worker and control plane nodes, and then deploying a Kubernetes application with Bitnami Helm charts.

Learning Kubernetes on vSphere

[blog 008]# git commit

Runecast Predictive Analytics

With many MSPs branching out into multi-cloud solutions to provide a plethora of customer services, it is important to be able to monitor your infrastructure to maintain uptime and availability for your customers. This challenge becomes exponentially more difficult when you have workloads and infrastructure across services such as AWS, Azure, on-premises vSphere stacks across multiple supported versions and validated vendors, and Kubernetes, to name a few.

The challenge, though, goes beyond the normal SLAs of uptime and availability. MSPs must ensure all their platforms and services are built to best practices, remediated against CVEs, and compliant with the security standards used in Australia.

A breakdown of typical Australian security standards:

  1. Essential Eight – a Government Cyber Security mitigation strategy[1].
  2. HIPAA – Health Information Privacy[2].
  3. ISO/IEC 27001 – a specification for information security management systems (ISMS)[3].
  4. PCI DSS – security policies for financial institutions and payment processing solutions[4].

So, being able to monitor, review, remediate, and report on all these requirements is going to be a challenge in both time and human cost.

I have been fortunate to be able to evaluate a product called Runecast Analyzer[5] in my lab. It runs proactive audits across all your environments to provide visibility of Vendor KBs, Best Practices, Vulnerabilities, Security Compliance and Hardware Compatibility.

Even though I am running this in a lab, I do try to stick to best practices as much as possible with the limited infrastructure I have. I was absolutely blown away (and a little shocked) at what was analysed.

For the testing I was analysing vCenter vSphere version 7.0.2.00100, NSX-T 3.1.1.0.0.1748.185, VMware Cloud Director 10.2.2.17855680 and Rancher Kubernetes 1.19.10. Frankly, it appears all is not well in my lab.

Main Dashboard Compliance

Main Dashboard Configurations

Inventory View

So, let’s break down what we are seeing here in slightly more detail, starting with Config KBs discovered.

Config KBs Discovered

Each KB is classed by severity, with the ability to expand each severity to provide more detail such as the impacted infrastructure, a detailed description of the issue, and a reference link to the VMware KB to resolve it. It is important to note that while the detail of the analysis is impressive, applying the KBs to infrastructure is dependent on your platform. For example, VMware VCF has stringent requirements around its deployment, and applying KBs without consulting the vendor is not recommended and would generally be overwritten by SDDC drift packages anyway.

Let us move on to best practices.

Best Practices

Best Practices are ordered by severity and the component which has been analysed, and in this example there are recommendations on vSphere, Kubernetes, VCD and NSX-T. Expanding each of the severities provides detailed information on the best practice and a URL link to the appropriate knowledge base article depending on the product. In Best Practices you will also note that Security, Availability, Manageability, and Recoverability are all analysed on a per-product basis.

Now for Vulnerabilities … and I am looking a lot better-ish with some green Pass Results! (I know that “better-ish” is not a word, but it is my word).

Vulnerabilities

This is a very similar layout to KBs, where you can see the related severity, issue ID and what product it applies to. Also noted is the relevant CVE and advisory range, which is important when MSP SLAs are involved. Personally I like this component, as I usually rely on Qualys updates for this type of information, and here I don’t have to trawl through infrastructure that may not be applicable to my environment, or, since I am a middle-aged gentleman, simply miss it in the report due to astigmatism of the eyeball.

Third Floor: Men’s Apparel and Security Compliance.

Security Compliance

I will not go through each of the sections in Security Compliance, as the analysed report has the same layout and, to be honest, nobody wants to see around 100 Security Compliance failures against Essential Eight, HIPAA, ISO, etc. because SSH is enabled on my infrastructure. I can feel the judgement already. An important thing to note is that with PCI DSS Security Compliance, virtual machines are also analysed.

For transparency, the Security Compliance that I have enabled in this lab is not the complete set, only what I deem applicable for Australian workloads. I could have included NIST, as it covers the US[6] and Australia[7]; however, the specifics are beyond the scope of this article.

Other Security Compliance standards available include DISA STIG[8], BSI IT-Grundschutz[9] and GDPR[10].

Overall, I am quite impressed with Runecast’s ability to completely analyse not just on-premises VMware and Kubernetes environments, but also tenancies in AWS, Azure and Horizon, while making many architects and engineers cry at what they thought were secure, compliant platforms.

Once the crying is over, these analytics can also provide a baseline from which MSPs can leverage automation to deploy consistent infrastructure that meets hardware compatibility, infrastructure best practices, and security compliance across multiple platforms. Unfortunately, vulnerabilities are a constantly moving goalpost; however, with Runecast you can schedule daily analytics reporting of your multi-cloud world, allowing you to be on the front foot and proactive with your customers.

From an MSP operational perspective, staying on top of multiple platforms is not an easy feat, and when you throw multi-cloud and a diverse customer base into the mix you need every bit of assistance you can get. That can at times mean multiple applications and reporting sets to get visibility of this data, and I think Runecast ticks the box as a single reporting point.

I would like to thank Andre Carpenter at Runecast for the opportunity to test their product and for providing me with a trial license. You can follow Andre at https://www.linkedin.com/in/andrecarpenter/ or @andrecarpenter on Twitter, and Runecast at https://www.linkedin.com/company/runecast/.


[1] https://www.cyber.gov.au/acsc/view-all-content/publications/essential-eight-explained

[2] https://compliancy-group.com/hipaa-australia-the-privacy-act-1988/

[3] https://www.iso.org/isoiec-27001-information-security.html

[4] https://www.pcisecuritystandards.org/pci_security

[5] https://www.runecast.com/

[6] https://www.nist.gov/about-nist

[7] https://www.cyber.gov.au/acsc/view-all-content/referral-organisations/national-institute-standards-and-technology-nist

[8] https://public.cyber.mil/stigs/

[9] https://www.bsi.bund.de/EN/Topics/ITGrundschutz/itgrundschutz_node.html

[10] https://gdpr.eu/data-protection-officer/