Data Services Manager 2.1.x integration with VMware Cloud Director extension for Data Solutions
In my previous blog posts on Data Services Manager, I showed how to integrate DSM with both Aria Automation and the Cloud Consumption Interface (CCI). However, another DSM integration available to our customers is through the VMware Cloud Director extension for Data Solutions. Customers, especially Cloud Service Providers (CSPs), can now leverage this integration to allow their tenants to provision both Postgres and MySQL databases through DSM, while at the same time getting all of the day 2 features of DSM managed databases. This includes lifecycle management, automatic backup and restore, LDAPS-integrated access control to the database, and so on.
Now, if you are a CSP, you might be asking yourself “doesn’t VMware Cloud Director already have a Data Service Extension (DSE) to achieve this?” The answer is that VMware Cloud Director does indeed have a Data Service Extension, but this is different to the new DSM integration. We feel that DSM integration is a better approach. With DSE, service providers need to prepare and provision a TKG cluster for each tenant that wishes to use a data service. Once the Kubernetes cluster is created, and the appropriate Kubernetes operator (or operators) are pushed down to it, tenants can then begin to provision the data services that have been ‘published’ to them. This preparation of TKG clusters and operators for tenants was always a bit of a headache.
Using DSM integration with Cloud Director through its extension for Data Solutions, DSM takes care of all of the necessary per-tenant scaffolding, including the automatic building of the VMs and the Kubernetes cluster needed for the database. There is no need to prepare anything on the tenant side. With DSM integration, tenants can also leverage the other day 2 operations available in Data Services Manager to manage their Postgres and MySQL database fleet.
Now that we have clarified the difference between DSM and DSE, let’s look at how to integrate DSM with Cloud Director.
Step 1: DSM integration with Cloud Director
Just to be clear, the CSP will still have to install the DSM plugin and appliance, and configure it to work with Cloud Director. The integration also requires a single TKG cluster for the CSP, as this is where both the Data Solutions Operator (DSO) and the DSM Consumption Operator are deployed. Cloud Director makes requests of DSM to provision databases through the DSM Consumption Operator. Since Cloud Director does not have provider-owned Org VDCs (OVDCs), a single Org VDC is marked as a Solutions Org. This is essentially owned by the provider, and is where the necessary components are installed (TKG, DSO, DSM Consumption Operator). Once that setup is done and the CSP has published the services, infrastructure policies and backup locations to the tenants, provisioning a data service becomes a simple point-and-click exercise for the tenant. Let me show you how.

We will assume that DSM has already been deployed and configured. There are already numerous posts on this site which show how to achieve this, so I will not repeat the steps here. Instead, check out this link for a list of related DSM posts. To integrate DSM with Cloud Director, log in as an admin, navigate to More > Data Solutions as shown below, and select Settings > DSM Integration.
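Before moving on, it is worth sanity checking that both operators are up and running on the provider’s TKG cluster. Here is a minimal sketch, assuming kubectl access to that cluster; the grep patterns are only a guess, since the operator namespace names can differ between deployments:

```sh
# Both the Data Solutions Operator and the DSM Consumption Operator
# should show up as Running. Namespace names vary, so filter broadly.
kubectl get pods -A | grep -i -e 'data-solutions' -e 'dsm'
```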
This process will prompt you for the provider’s Kubernetes cluster, Data Solutions Operator details and all of the relevant DSM information, including Base URL, Admin User, Password for Admin User, and DSM CA cert. Add these details to connect to DSM. If successful, the connection should look similar to the following, including the status of the DSM Consumption Operator and the Data Solutions Operator:
Once the connection is established, you will see details about infrastructure policies. These are automatically discovered once DSM is integrated. The CSP can then decide which of these infrastructure policies to make available on a per-tenant basis.
Similarly, the CSP can make a decision on which backup locations to make available on a per-tenant basis. Again, these are automatically discovered once DSM is integrated.
The final step for the CSP is to decide which solutions to make available to the tenants. As you can see from the list below, solutions from both DSM and DSE may be added as Data Solutions. This is the case in this example. The difference is, of course, that the tenant requires additional infrastructure in the form of a TKG cluster to request DSE solutions, whereas this is taken care of automatically for DSM solutions.
To make a DSM solution available to a tenant, simply select that solution and click on the Publish button to choose a tenant. Here I am choosing DSM Postgres.
The list of tenants then appears. Pick one or more tenants that you wish to make the solution available to.
Once published, you can check which tenants have access to the solution from the UI. In this case, DSM Postgres is published to the tenant ACME.
Step 2: Provision DSM data service as Cloud Director tenant
Now that the CSP has given the tenants access to a data service, let’s see how the tenant can provision a database from within Cloud Director. We now switch contexts to the tenant rather than the provider. To begin with, the tenant would select More > Data Solutions from the UI, as shown:
Next, select Solutions. This will take the tenant to a type of marketplace which displays all available Data Solutions. Once again, in this environment, both DSE and DSM data solutions are available for provisioning. But once again, at the risk of repeating myself, there is no onus on the tenant to set anything up in advance for the DSM data services. DSM will take care of standing up the VMs and any necessary Kubernetes clusters if one of its databases is chosen. This process also includes the installation of the necessary “operator” to stand up the database or data service.

In this example, the tenant has chosen to provision a DSM Postgres database by clicking on the Launch button for that solution. For those of you who have already used DSM, this should look very familiar. Along with a name, tenants can pick from available Postgres versions, infrastructure policies, VM classes, topology, etc. For those of you unfamiliar with these concepts, check out my other blog posts on DSM, or review the official documentation which explains these in detail.
Tenants can open up the advanced settings if they wish to enable backups, add some database options or even see the YAML manifest that will be applied to the DSM API server to create the database. A tenant could use this to “template” and automate future database deployments. The manifest will update as new configuration choices are made in the UI.
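For illustration, a captured manifest might look something like the sketch below. The names and values here are hypothetical, and the field names reflect the DSM Postgres API as I understand it; the manifest displayed in the UI for your own instance is the authoritative version:

```sh
# Hypothetical sketch of a generated DSM Postgres manifest, captured
# to a file for templating future deployments. Copy the real manifest
# from the advanced settings view rather than relying on this sketch.
cat <<'EOF' > acme-pg01.yaml
apiVersion: databases.dataservices.vmware.com/v1alpha1
kind: PostgresCluster
metadata:
  name: acme-pg01
spec:
  replicas: 1                    # single node, no read replicas
  version: "15"                  # one of the published Postgres versions
  vmClass:
    name: small                  # VM class chosen in the wizard
  storageSpace: 60Gi             # size of the database disk
  infrastructurePolicy:
    name: acme-infra-policy      # infra policy published to this tenant
  backupConfig:
    backupRetentionDays: 30
    schedules:
      - name: full-weekly
        type: full
        schedule: "0 0 * * 0"    # cron format: weekly full backup
EOF
```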
Once the tenant hits the Create button, the request is sent via the DSM Consumption Operator on the provider TKG cluster to the DSM API server. Tenants can monitor the progress via the Data Solutions > Instances view.
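While tenants monitor progress through the UI, the provider can watch the same request flow through the Consumption Operator on the TKG cluster. A rough sketch, assuming the custom resource kind from the manifest above and a hypothetical namespace:

```sh
# Provider-side view: watch the custom resource created for the
# tenant's request. The namespace here is hypothetical.
kubectl get postgresclusters -n acme-org --watch
```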
And of course, a detailed view of the progress is also available. With Kubernetes being “eventually consistent”, tenants may observe a number of transient errors while waiting for resources to come online, as shown below in the Resource Details tab:
But eventually, if everything has been configured correctly, the database should come online after a few minutes.
Tenants can now click into the overview section of the instance and retrieve items such as the connection string and certificate, ensuring secure connections between clients and the database.
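As a quick sanity test of those connection details, a tenant could connect with psql from any client machine. The host, database and user below are placeholders; substitute the values copied from the overview section:

```sh
# Placeholders throughout: use the connection string and CA cert
# retrieved from the instance's overview section.
psql "host=acme-pg01.tenant.example port=5432 dbname=acme-pg01 user=pgadmin sslmode=verify-full sslrootcert=./dsm-ca.crt"
```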
Tenants can revisit the Resource Details tab to see the full set of provisioned resources that make up the database. They can also check out the various YAML manifests which will allow them to quickly create new databases. They can use the Access Control tab to give different users within the tenancy access to the databases. This could be Read Only access, Read/Write access or full control over the instance. Last but not least, tenants have the ability to take backups and restores of the database instance. All in all, a very nice experience for the Cloud Director tenants.
While I welcome the introduction of DSM into VMware Cloud Director, there are some major blockers with the current implementation that are preventing us from using it in our environment:
1. DSM databases need access to vCenter Server. In a multi-tenant cloud environment, we do not want customer databases to be in a network that can access the vSphere infrastructure.
2. There is no network segmentation as all DSM databases reside in the same IP Pool / VLAN. Ideally, a DSM database would be placed in a routed network behind a tenant’s Edge GW, so only that tenant has network access to it.
3. The VMs which DSM creates to host the database are completely invisible to VMware Cloud Director, so resource usage (CPU, RAM, storage) will not be allocated to an OrgVDC.
I think if DSM for VCD could use CAPVCD (still maintained?) instead of CAPV to spin up the Kubernetes clusters to host the databases, it would solve a lot of the blockers for us. DSM would no longer need access to vCenter, databases could be placed in tenant VDC networks, and VCD would be aware of the DSM infrastructure resources.
I also think it would be really nice if DSM supported provisioning databases to existing Kubernetes clusters. Having a 1:1 relationship of DSM database to Kubernetes cluster can be quite wasteful, especially if a customer already has a large Kubernetes cluster in their environment.
Thank you for taking the time to provide this thoughtful feedback, James. I will bring it to the attention of the product team.
I would like to add one clarification to point 2. Networks (both portgroups and IP ranges) are defined via Infrastructure Policies. It is possible to create per-tenant infrastructure policies with completely different network segments and ranges of IP addresses so that each tenant has their own ‘isolated’ databases from that perspective. In other words, each tenant has their own unique infra policy.
Re: 1. DSM databases need access to vCenter Server. In a multi-tenant cloud environment, we do not want customer databases to be in a network that can access the vSphere infrastructure.
That does not look correct. Why would DBs need to access VC? The DSM appliance does need access to VC, but DSM is accessed only by the DSE operator (running on the provider solution cluster), not by tenants directly.
Fully agree on the resource usage/quota management (point 3). There are efforts to address this.
Not the database itself, but the K8s cluster on which it is running. I believe it is the CSI driver on this K8s cluster which requests vCenter to create Persistent Volumes/First Class Disks for the database’s volumes. The full list of port requirements for DSM is detailed here: https://docs.vmware.com/en/VMware-Data-Services-Manager/2.1/vmware-data-services-manager/GUID-86255AAD-6D05-4BFB-B499-2A8BF146E2E9.html#network-requirements-summary-13
Interesting. I guess node IPs and DB instance IPs could be hardened with DFW, but that would have to be automated.
Agreed. There are certainly additional enhancements coming down the line.