Infrastructure and Cloud for Enthusiasts

[blog 006]# git commit

VCF Stretched VSAN Cluster and APIs

In my previous blog I talked about the similarities between a two-node VSAN cluster and a stretched VSAN cluster, so I thought I would continue the theme and write about stretched VSAN cluster management in VMware Cloud Foundation [1] using APIs.

In VCF, VSAN is a requirement for the Management Workload Domain and is deployed during instantiation of the environment by Cloud Builder; however, Cloud Builder will only deploy a single-availability-zone VSAN cluster. If you want your management plane to be highly available, you can stretch your Management Workload Domain across two availability zones and stretch the VSAN.

The advantage of having a stretched Management Workload Domain is that it provides high availability for your management virtual workloads in the event of, for example, a data center failure. Virtual machines will be brought up in the secondary availability zone and, in my experience, this generally takes under 10 minutes.

One thing that is not generally known about VCF is that any changes to the environment are done via API, either within the SDDC Manager web interface or through other common methods of pushing out API requests, e.g. Postman or curl. This applies to any VSAN operation, which includes stretching / un-stretching VSAN clusters and adding or removing hosts from the VSAN cluster. The VCF API Reference Guide is publicly available [2].

There are prerequisites for stretched VSAN which must be addressed before any API calls can be carried out. These include: the additional hosts must be commissioned into SDDC Manager and available for use, a VSAN Witness node must be deployed at a tertiary location, and a Layer 3 network for the VSAN vmkernel must exist at the secondary availability zone you are stretching the VSAN to.

Once this is done, you can retrieve the host IDs of the newly assigned hosts and the ID of the cluster you wish to stretch via the API.

To get the hosts, run the following curl command or use the SDDC API Explorer.

$ curl 'https://sddc-manager.yourdomain.local/v1/hosts' -i -u 'admin:VMwareInfra@1' -X GET -H 'Accept: application/json'

You will get a JSON response like the extract below, which provides the host IDs of the new hosts.

"id": "62771c25-8ef0-430c-b69b-a297e42c0ce1",
"esxiVersion": "7.0.1-17551050",
"fqdn": "esxuat10.techaspire.com.au",
"hardwareVendor": "SomeVendor",
"hardwareModel": "Some Model",

To get the cluster ID, run the following curl command or use the SDDC API Explorer.

$ curl 'https://sddc-manager.yourdomain.local/v1/clusters' -i -u 'admin:VMwareInfra@1' -X GET

You will get a JSON response like the extract below, which provides the cluster ID of the existing VSAN cluster you wish to stretch.

"id": "c385a274-352f-4359-81c0-409efed3a012",
"name": "mgmt-cluster",
"primaryDatastoreName": "mgmt-vsan01",
"primaryDatastoreType": "VSAN"

As a note, the minimum number of Management Domain hosts required to stretch a VSAN cluster between availability zones is 8 (4x hosts in each AZ). Standard Workload Domain clusters can have a minimum of 3x hosts per availability zone.

Once all the prerequisites are completed, you can stretch the VSAN cluster to the secondary AZ, which requires two API calls to the SDDC Manager: the first validates the cluster stretch spec, and the second performs the stretch itself.

An example of the validation call and its stretch spec would be similar to the following JSON, although the example is a cut-down version.

curl 'https://sddc-manager.yourdomain.local/v1/clusters/c385a274-352f-4359-81c0-409efed3a012/validations' -i -u 'admin:VMwareInfra@1' -X POST -H 'Content-Type: application/json' -d '
{
    "clusterStretchSpec": {
        "hostSpecs": [ {
            "id": "62771c25-8ef0-430c-b69b-a297e42c0ce1",
            "licenseKey": "THISI-SNOTT-HEKEY-YOURA-FTER1",
            "hostNetworkSpec":{ 
               "vmNics":[ 
                  { 
                     "id":"vmnic0",
                     "vdsName":"data"
                  },
                  { 
                     "id":"vmnic1",
                     "vdsName":"data"
                  },
                  { 
                     "id":"vmnic2",
                     "vdsName":"vsan"
                  },
                  { 
                     "id":"vmnic3",
                     "vdsName":"vsan"
                  }
               ]
            }
        } ],
        "secondaryAzOverlayVlanId": 1234,
        "isEdgeClusterConfiguredForMultiAZ": true,
        "witnessSpec": {
            "fqdn": "dawitnessnode.yourdomain.local",
            "vsanCidr": "192.20.2.64/27",
            "vsanIp": "192.20.2.20"
        }
    }
}'
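
Once the validation task reports success, the same spec body is submitted to the cluster itself to kick off the stretch. This is generally a PATCH against the cluster ID, but confirm the exact method and path for your VCF release in the API Reference Guide [2]. Assuming the spec above has been saved to a file called stretch-spec.json, the call looks something like this:

curl 'https://sddc-manager.yourdomain.local/v1/clusters/c385a274-352f-4359-81c0-409efed3a012' -i -u 'admin:VMwareInfra@1' -X PATCH -H 'Content-Type: application/json' -d @stretch-spec.json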

Once this task has completed you will have a stretched VSAN cluster across two availability zones.

As per my previous blog, the VSAN cluster will be broken up into a primary and a secondary fault domain, with virtual machines residing in the primary fault domain unless they are manually vMotioned to the secondary fault domain, or a disaster takes place which brings the virtual machines up in the secondary AZ.

An important note is that once you have stretched or expanded your VSAN cluster you will have to modify your cluster's vSphere HA Admission Control settings to reflect the number of hosts in a single VSAN fault domain; for a stretch across two AZs this typically means reserving enough capacity (50% of cluster CPU and memory) to tolerate the loss of an entire fault domain.

Figure 1 – Example HA Configuration

To add or remove hosts from the cluster through the normal lifecycle of infrastructure and capacity requirements, the APIs are used once again with the same basic inputs of host and cluster IDs, and it is a very quick process, especially if you use the API Explorer tool built into SDDC Manager.

Figure 2 – Example SDDC Manager API Explorer
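
For completeness, below is a cut-down sketch of what an expansion request can look like; the expansion spec follows the same host-spec pattern as the stretch spec above and should be run through the validations endpoint first. The cluster and host IDs here are placeholders and the field names may differ between VCF releases, so double-check them against the API Reference Guide [2].

curl 'https://sddc-manager.yourdomain.local/v1/clusters/<cluster-id>/validations' -i -u 'admin:VMwareInfra@1' -X POST -H 'Content-Type: application/json' -d '
{
    "clusterExpansionSpec": {
        "hostSpecs": [ {
            "id": "<new-host-id>",
            "licenseKey": "THISI-SNOTT-HEKEY-YOURA-FTER1",
            "hostNetworkSpec": {
                "vmNics": [
                    { "id": "vmnic0", "vdsName": "data" },
                    { "id": "vmnic1", "vdsName": "data" }
                ]
            }
        } ]
    }
}'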

If you want to get your head around VCF and APIs there is always VMware's Hands-on Labs, and VMUG Advantage allows you to get VCF licensing so you can deploy a nested lab environment following this guide: https://blogs.vmware.com/cloud-foundation/2020/01/31/deep-dive-into-vmware-cloud-foundation-part-1-building-a-nested-lab/

[1] https://www.vmware.com/au/products/cloud-foundation.html

[2] https://vdc-download.vmware.com/vmwb-repository/dcr-public/60ff5385-d6ee-41d3-9ccf-b719a59f7971/b8ba0641-d8ae-4243-acf2-3639ca248783/index.html

[blog 005]# git commit

Two Node vSAN vs Stretched vSAN

So recently I was doing a personal lab with VMware Tanzu and decided to build a 2x node vSAN cluster with a Witness node to provide storage to the cluster, having never done a small vSAN cluster before. The cluster was based on VMware's vSAN Two-Node Architecture for Remote Offices: https://www.vmware.com/files/pdf/products/vsan/vmware-vsan-robo-solution-overview.pdf

In my work life I have built vSAN and, most recently, stretched vSAN clusters across Availability Zones, and the latter is where I realized that Stretched vSAN and ROBO vSAN are the same thing! Minus the whole L3 vSAN kernel across AZs with a Witness node in another AZ, big dollars on infrastructure, 9000 MTU, etc. I could go on about the differences, but this is a quick blog around the fundamental similarities.

ROBO vSAN and Stretched vSAN can be broken down into the same fundamental underlying VMware storage technology:

  • Validated Host Infrastructure with localized disk (This could be Hybrid vs All Flash)
  • vSAN disk groups
  • vSAN Fault Domains
  • Preferred and Secondary Fault Domains
  • Requirements of a vSAN Witness node

vSAN Disk Groups

The concepts around vSAN Failures to Tolerate (FTT) and host cluster availability still apply regardless of Two-Node or Stretched vSAN.
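
As a quick lab sanity check, the default per-host vSAN policy (including hostFailuresToTolerate) can be viewed straight from an ESXi shell; the output format varies slightly between ESXi releases.

esxcli vsan policy getdefault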

vSAN Fault Domains

The major differences, aside from the availability zones where your vSAN hosts reside, are firstly the number of hosts you have in a Fault Domain; for example, the minimum in a VCF deployment is 4x nodes per Fault Domain. Secondly, you should not stretch Layer 2 networks between vSAN fault domains; the vSAN VMkernels should be routed.

The vSAN Witness node in both cases will have a management network and a secondary network on the same L3 subnet that the vSAN VMkernels reside in, for vSAN object tracking. In both cases the Witness node must be in a separate Availability Zone (except in my lab … at least it is on another host that is not in the same cluster).
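
If you want to confirm which VMkernel interfaces are actually tagged for vSAN (or, where witness traffic separation is in use, witness) traffic on a given host, it can be checked from the ESXi shell:

esxcli vsan network list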

Ultimately, Two-Node and Stretched vSAN address two completely different architectural requirements; however, VMware is maintaining consistency with the overall underlying technology, which I suppose is not a surprise with the rise of VCF and lifecycle management.

As a Storage Engineer at heart, I am a big fan of vSAN, converged storage infrastructure, or any type of storage that relies on object, block or metadata replication, although it is only as good as the underlying network redundancy below it. The reason I am a big fan is that the scale and resiliency become endless, from cold storage to high-I/O workloads fronted by NVMe controllers which cache and reduce write I/O amplification to SSD-based media.

So next time you are playing with storage, take a closer look, as you might be surprised at what you find and, in my case, the technological consistency.

[blog 004]# git commit

MinIO S3 Gateway on Kobol NAS

I was talking with a colleague of mine who is a well-known storage and data protection boffin who has been in the salt mines technically and blogging well before I knew what hexadecimal LUN IDs were.

“You want to play with a Kobol NAS?” .. “Sure .. hold my beer”. This guy knows how to tweak my inner love for storage. So, what is a Kobol NAS? Well, it is an open-source NAS running on the Helios64 ARM platform, and the best bit is you get to build it yourself: https://kobol.io/ . So cool, just insert disks here, after building it of course.

This article is not about the Kobol NAS specifically, as I am still yet to do that as promised to said storage guy, but rather about how it can natively run Docker along with other applications such as OpenMediaVault. The Helios64 operating system is quite accommodating, and anybody familiar with Debian will feel right at home.

I have been a fan of MinIO for quite a long time and have used it for testing S3 extents for products like Veeam and Linux S3 FUSE file systems. For the purpose of this blog, MinIO is essentially an open-source object storage application that supports unstructured data utilizing S3-compliant API calls such as PUTs and GETs and uses a bucket construct for object placement.

There is a product within the MinIO suite which is the MinIO S3 Gateway for NAS, https://docs.min.io/docs/minio-gateway-for-nas.html , and as Kobol can run Docker, well hello, we can now ingest S3 objects into our NAS with a web front end to boot and API support.

Once you have your Kobol up and running and have installed the Docker engine through the TUI, the process is quite simple, as MinIO has its own Docker Hub repo: https://hub.docker.com/r/minio/minio/

docker run --name minio -p 9000:9000 -v /srv/dev-disk-by-label-kobolxfs/minio:/data -e "MINIO_ACCESS_KEY=enteryourkeyhere" -e "MINIO_SECRET_KEY=enteryourkeyhere" --restart unless-stopped minio/minio server /data

To break down the components of the docker container instantiation –

/srv/dev-disk-by-label-kobolxfs/minio:/data mounts /srv/dev-disk-by-label-kobolxfs/minio on the NAS to the /data folder inside the Docker container. This can also be seen in the deployed Docker container as /dev/md127, which is the RAID array on the Kobol NAS.
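
If you want to double-check the bind mount once the container is running, docker inspect will show the source and destination (the container name minio matches the --name used above):

docker inspect -f '{{ json .Mounts }}' minio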

The secret and access keys are stipulated during the container instantiation ("enteryourkeyhere"), "--restart unless-stopped" is used so the Docker container restarts automagically when the NAS is rebooted, and "minio/minio" stipulates the MinIO image you want to run as a Docker container; you can also pin a specific release tag rather than pulling the latest.

As with any other Docker container, you can start, stop and kill it.
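
To sanity-check the endpoint from another machine, any S3-compatible client will do. Below is a minimal sketch using the MinIO client (mc); the alias, bucket name and NAS IP are placeholders, and on older mc builds the first command is "mc config host add" instead of "mc alias set".

mc alias set kobol http://<nas-ip>:9000 enteryourkeyhere enteryourkeyhere
mc mb kobol/archive
mc cp ./backup.tar.gz kobol/archive/
mc ls kobol/archive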

The Kobol open-source NAS provides the perfect platform to run this on, as you are not locked into vendor code, and it gives you the freedom to do what you wish while using the NAS for other services such as general file storage, media services, DNS, proxy services, etc.

Could you extend the idea into an enterprise environment? I have seen many cases where System Engineers have a need to archive logs, DB backups, FTP backup data, etc., and the cost of vendor-based solutions is out of commercial reach. MinIO provides an alternative with low capital expenditure and can still easily be backed up using snapshot technology by traditional enterprise backup systems such as Rubrik, Veeam, Cohesity <enter vendor of choice here>, as the transactional I/O is low if you were to virtualize the solution.

There are plenty of options for ingesting S3 objects: for Linux there is s3fs-fuse, which can mount a bucket via fstab, and Windows has the typical suite from S3 Browser to CloudBerry that can mount a bucket as a drive.
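
As an example on the Linux side, an s3fs-fuse fstab entry pointed at the gateway might look like the line below. The bucket name, mount point and NAS IP are placeholders, the access and secret keys live in /etc/passwd-s3fs, and use_path_request_style is needed because MinIO expects path-style bucket addressing; option names can drift between s3fs versions, so check the s3fs-fuse docs for yours.

archive /mnt/archive fuse.s3fs _netdev,allow_other,use_path_request_style,url=http://<nas-ip>:9000,passwd_file=/etc/passwd-s3fs 0 0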

[blog 03]# git commit

NSX Ninja Program Week 3

Well, after 3 months of the UTC/GMT -5 time zone, I finished up the third week of the NSX-T Ninja Program. This time round it was hosted by Brandon Neil, who has been a consultant and VMware Certified Instructor for the past 19 years, specialising in advanced NSX-T design. Our other host was Rodney Mcintosh, who is the Director of Operations @ 27 Virtual and a VMware Specialist. Both these guys have an amazing depth of knowledge around advanced NSX-T architecture and design, which was the premise for the third week.

After I injected coffee into my eyeball and got myself comfortable without waking the rest of the family at 1am, we were presented with the objectives for the third week. This week was not so much about learning concepts and architecture, but about how to architect large enterprise and cloud solutions at scale, and I am not talking Australian scale but global scale. Sorry Australia, we are just a little blip on the map compared to the rest of the world, but at least we are leaders in the industry and embrace and deliver the latest technologies.

As more information came to hand and documents were handed out, and after we all did a stand-up on our daily work life to gauge our skill sets, we were broken into groups and given our task for the week. My group had to architect a cloud solution for a South African cloud services provider – “ZCS”. We presented it back to the instructors and the rest of the students, where our solutions were grilled, pulled apart, and had to be justified. The experience reminded me of the Protomolecule in the Amazon series “The Expanse”, which “disassembles” human bodies and spacecraft.

So, in a very collapsed version, here is an overview of what was required.

“Zilungele Cloud Services is a VMware Cloud Provider partner providing Cloud services to Sub-Sahara Africa. Zilungele provides Shared and Dedicated Infrastructure-as-a-Service (IaaS), Object Storage, Desktop-as-a-Service, Endpoint Management, Managed Services, and Professional Services. Zilungele is headquartered in Johannesburg, South Africa, where they have provided services over the past 12 years. Zilungele also has a presence in Luanda, Angola. ZCS has capitalized on the continent’s late entry into data center and inter-networking services, providing its above-listed services to Financial Institutions, Governmental institutions, Non-Governmental Organizations (NGOs), Multi-National Organizations, as well as local small and mid-sized businesses in need of IT services to facilitate their business objectives in a global economy.

Zilungele intends to extend its presence into Eastern and Western Africa with data centers in Nairobi, Kenya; Addis Ababa, Ethiopia; Abuja, Nigeria; and Dakar, Senegal. ZCS plans to maximize their investment in the VMware Cloud Provider program by implementing an asset-heavy solution using the Cloud Provider Pod Stack which includes:

  • vSphere
  • vCloud Director
  • NSX-T Datacenter
  • vSAN
  • Cloud Director Availability
  • Etc…

The use of the VMware Cloud Provider Pod stack will allow ZCS to provide:

  • Multi-Tenant Resource Pooling – create virtual data centers from common infrastructure to cater to heterogeneous enterprise needs.
  • Operational Visibility and Insights – refreshed dashboard and single pane of glass to provide centralized multi-tenant cloud management views.
  • Container-as-a-Service – onramp for enterprises leveraging flexible, on-demand containers and VMs in the same virtual data center and faster time-to-consumption for Kubernetes.
  • Data Center Extension and Cloud Migration – secure VM migration and data center extension
  • Multi-Site Management – Stretch data centers across sites and geographies.
  • Data Protection and Availability – run simple DRaaS offerings that are compatible with enterprise environments.

ZCS intends to begin this implementation in their Luanda, Angola Datacenter, and with the success of that implementation, replicate the roll-out in the new markets they are currently courting.

ZCS has 430 IaaS tenants in Angola, 95% of whom leverage the shared IaaS solution. The remaining 5% subscribe either exclusively to their dedicated IaaS solution or a combination of both the shared and dedicated IaaS. The average Shared IaaS tenant runs 10 virtual machines. Clients with a dedicated Private cloud (IaaS) are provided a dedicated vSphere cluster with a minimum of 4 ESXi hosts configured with vSAN.

ZCS expects to increase its customer base by 200% over the next 18 months and needs a solution that will scale accordingly.”

As part of the presentation, we had to produce typical formal documentation which included risks, assumptions, constraints, design decisions, and technical diagrams, all of which had to be defended.

So, in a group of 5 people this was going to be achievable, as the other members of my group were, like me, VMware cloud and network architects from other parts of the globe, so we should nail this, right? Wrong: “<button>Push for Ejection</button>”. Unfortunately, by day 2.5 I was the only person left in my group on the course, but that is not to say I did not forge ahead. My 9-hour days turned into 12-hour days as I accumulated all the documentation required for the presentation. On Saturday morning I delivered the presentation first up to my peers, which was a 1 ½ hour time slot.

So how did it go? I will give myself a solid “B”, not just for the effort I put in but for how I was able to defend the solution. This was my first time doing this in a public forum and presenting under pressure. I did get caught out with some scaling issues and, while under pressure, had curve balls thrown at me to intentionally make me trip up, re-think parts of my solution, and get smacked back down again, but this is the same thing that happens if you sit for your VCDX.

Overall, I am extremely proud of the effort that I put in over the last couple of months, and I recommend, especially to people in Australia, trying to get on this course. It is not offered inside Australia, so you will need to either try to get in through VMUG USA or be part of a multi-national company. The amount of knowledge, training material, best practices, and real-world reference architectures around NSX-T I have accumulated over the last 3 months is staggering. It is a shame, though, that a few people who got the opportunity in the USA to attend the NSX Ninja Program took it for granted, which prevented other people from getting on board.

A quick shout out to Paul Mancusco, who is the Technologist Director for Networking and Security at VMware, for his presentation on Cisco ACI with NSX-T. It was an excellent presentation. I will sign off this blog with a “Witching Hour Blackberry Sour” from Aether Brewing Company here in good ol’ Brisvegas: 8 IBU, wheat malt base, Motueka hops and blackberries. Sours are not for everybody but give it a go, you just never know.

[blog 02]# git commit

NSX Ninja Program Week 2

Recently I was back on the UTC/GMT -5 time zone for the second installment of the NSX-T Ninja Program, proudly hosted by James Asutaku and Isaac Valdez. James comes at us hard, fast and with massive amounts of architecture, while Isaac takes us into the technical weeds with an amazing ability to destroy a PowerPoint slide in incredible detail with the use of his magic Zoom marker. The 9-hour days starting at 1am AUS EST and the amount your brain is trying to take in take a toll on my middle-aged body, I must admit.

Isaac Valdez Art Work

So, for the second week we had quite an expansive course covering a VMware Cloud Foundation 4.0 Deep Dive, VCF Multi-Cloud Architecture, NSX-T Application Protection with L3 to L7 DFW, NSX-T L7 Protection, Identity Firewall & URL Filtering, Distributed IDS, REST API, Tanzu Kubernetes, NSX Intelligence, AVI Networks and Advanced Routing Design. Phew, did you get all that?

I absolutely loved the Advanced Routing Design sessions, as that is my core focus with NSX-T, and I have since implemented it in a production environment with outstanding results in performance and cross-availability-zone redundancy.

On top of all the lecturing, the days were backed up with a huge amount of hands-on labs to reinforce what we had mentally ingested. This meant extensive amounts of coffee were required daily and frequently. There was no room for the ol’ “She’ll be right” on the labs, as if you did not complete the labs on the day, it affected your labs and progress for the next day. This meant plenty of after-hours lab time was on the cards so you did not get behind (lunch time in my case).

The hardest part of the course so far was the 2 days on L4 to L7 security. Not because I could not understand it or fathom the concepts, but have you ever tried to do serious security while you are half asleep? I take my hat off to all you Security Engineers and Architects out there for your dedication and sacrifice to the security cause, and I will admit, you are a special breed. It did my head in.

On the fourth day we had a 2-hour engineering presentation and demonstrations by the kind folk at ReSTNSX, https://restnsx.com/ . These guys automate your life with NSX-T, from Security Workflows, Day 2 Operations, Deployment, Migration, and Object and Policy Mobility, across multiple deployments of NSX-T. In a nutshell, these guys rock, as it would take a full-time crack team of developers to produce the same results in-house.

The best part of the NSX Ninja Program is also meeting like-minded people, and both James and Isaac encourage open dialogue and discussion on the topics, including people’s experiences. There is a mountain of knowledge among the people attending the course, whose backgrounds range from Network Engineers to Infrastructure Engineers and Architects. One such person I met this week was a gentleman by the name of Tom Grisham. Anybody who has dealt with NSX over the years will know him from his blogs and articles. Thanks Tom https://www.linkedin.com/in/tomgrisham/ for connecting with me, the conversation, and pictures of cold Texas nights.

So, just as my body has recovered from non-travelling jet lag and I have my head back down in my day job, week 3 of the Ninja Program is about to start up again, so stay tuned for the review. Week 3 is going to be just as full on, with solution architecture and having to present designs back to my peers on the program.

I will sign off this blog with a “Hop Smith IPA” from Akasha Brewing Company in New South Wales, Australia. A nice West Coast-style IPA @ 6.8% ABV and 60 IBU, which I thought was fitting since I have spent so much time recently on USA hours.