Getting started with VCF Data Services Manager 2.x – Part 12: Aria Operations for Logs

In this post we are going to look at the log forwarding mechanism in VCF Data Services Manager (DSM). Logs come from two places in DSM. The first is the DSM Provider Appliance itself, and the second is the databases and data services which are provisioned by DSM. Two techniques are used to forward the logs to Aria Operations for Logs, formerly known as Log Insight. For the DSM Provider Appliance, we use the Operations for Logs / Log Insight agent. For the databases and data services we use Fluent Bit. Fluent Bit can be considered a lightweight version of Fluentd, and can be used as a Log Source for Aria Operations for Logs. Thus, for this post, we will ship the DSM logs from both the appliance and the data services to Aria Operations for Logs (On-prem) using the syslog protocol on port 514. This can be configured under the Settings > Log Forwarding section of the DSM UI.
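To give a feel for the mechanism, a Fluent Bit output section that forwards records to a syslog destination looks something like the following. Note that this is a hand-written sketch to illustrate the approach, not the configuration DSM actually generates, and the host value is a placeholder for your own Operations for Logs endpoint.

```ini
# Illustrative Fluent Bit syslog output - NOT the config DSM generates.
# Forwards all matched records to an Aria Operations for Logs endpoint
# over syslog on port 514. Host is a placeholder value.
[OUTPUT]
    Name                 syslog
    Match                *
    Host                 ops-for-logs.example.local
    Port                 514
    Mode                 udp
    Syslog_Format        rfc5424
    Syslog_Message_Key   log
```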

Regular readers will be aware by now that DSM 2.x has been completely rearchitected from the 1.x version. We now use Kubernetes to host the database clusters provisioned by DSM. Thus, the content packs that are currently available in Aria Operations for Logs for MySQL and PostgreSQL no longer work as they are looking for log locations and events which are not applicable to DSM provisioned databases. However, you can still build great charts and dashboards with Aria Operations for Logs. In this post, I will guide you through the creation of some sample dashboards based on logs which are being shipped from both the DSM Provider Appliance and some provisioned databases. I am using Aria Operations for Logs version 8.16.0-23364779 and VCF Data Services Manager version 2.0.2.3995 for this post.

Sample Dashboard

This is a sample dashboard which I built in Aria Operations for Logs after I had configured DSM log forwarding to ship its logs to this destination. Note that there are two DSM deployments sending logs to this Aria Operations for Logs system via the agent, so it will be important when creating dashboards to identify the correct resource. There are also two databases on the first DSM instance shown below which are also shipping logs via Fluent Bit. demo-4-logs is a standalone MySQL database, whereas pg-ha-new-01 is a PostgreSQL database with a replica for availability. Let’s look at how I created these dashboards next.

Let’s begin with the pattern I used for creating the DSM Provider (120) K8s Events dashboard first. The same pattern is used for the DSM Provider (164) K8s Events dashboard, except that the source field has been changed. I’ve added some numbers to the various areas of the screen capture to show you how I captured the relevant service details from the DSM Provider Appliance. These services are responsible for provisioning the Kubernetes clusters on which the databases and data services are run.

Let’s go through each of the steps:

  1. First, you need to specify the source. This is the IP address or FQDN of the DSM Provider Appliance. Once filtered on source, Aria Operations for Logs will only display the events from DSM which have been captured by the log agent running on the appliance.
  2. To display these logs by service, select the container name text in the log as highlighted. My aim is to display the logs on a per-container basis. After selecting some of the text in the message, a popup appears in the Aria Operations for Logs UI which allows me to “Extract Field”. This in turn opens the Manage Fields section on the right hand side. Add some pre and post contexts to this field to capture the name of the container; essentially, this is everything between “container_name” and the trailing “[.*]” (which matches anything between square brackets). This gives me events from the different containers which provide the services on the DSM appliance, and this is what I am extracting for my dashboard.
  3. After saving the new field, I can now select that field in the time series drop-down view above. You probably do not need to select the source option here as these are all coming from the same source anyway. On the right hand side of the graph, you can see the break-down per service as well.
  4. If the above looks good, you can now save this off to a dashboard.
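The extracted-field step above is essentially a regular expression built from a pre-context and a post-context. As a rough equivalent outside the UI, a Python sketch of the same extraction might look like this (the sample log line and its layout are invented for illustration):

```python
import re

# Pre-context "container_name:" and post-context "[" bracket the value,
# mirroring the pre/post contexts of an extracted field in
# Aria Operations for Logs. The log line format here is invented.
CONTAINER_RE = re.compile(r'container_name:(\S+?)\[')

sample = '2024-05-01T10:00:00Z dsm-120 container_name:provisioner[1234]: reconcile started'

match = CONTAINER_RE.search(sample)
container = match.group(1) if match else None
print(container)  # -> provisioner
```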

The dashboard for the databases and data services now follows a similar pattern. Obviously the source will be different since the logs will now come from the Fluent Bit service inside the K8s cluster which runs the database/data service. You could use the hostname or IP address. That becomes the starting point. However, this time, rather than filter on container name, we are going to filter on K8s pod name. A pod is a K8s construct to manage one or more containers.

Once the logs from the database have been filtered, you can repeat the process shown previously by highlighting a section of text in the log (i.e., the name of a pod). You can then add the pre and post contexts to once again identify the pod name. Add the pod name field to the time series to verify it is capturing what you require. And finally, save it as a dashboard when you are happy with it.

If you want to display the errors from the various pods, you can add a second filter that looks for “errors”.
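Conceptually, the pod-name extraction combined with an error filter amounts to grouping error lines by pod. Here is a small Python sketch of that same grouping; the log lines, the pod_name field layout, and the pod names are all invented for illustration:

```python
import re
from collections import Counter

# Invented log lines loosely shaped like Fluent Bit-shipped K8s records;
# the pod_name field and its values are illustrative only.
logs = [
    'pod_name:demo-4-logs-mysql-0 level=info ready',
    'pod_name:demo-4-logs-mysql-0 level=error connection refused',
    'pod_name:demo-4-logs-mysql-1 level=error disk pressure',
]

POD_RE = re.compile(r'pod_name:(\S+)')

# Second "filter": keep only lines mentioning an error, then count per pod.
errors_per_pod = Counter(
    POD_RE.search(line).group(1)
    for line in logs
    if 'error' in line and POD_RE.search(line)
)
print(errors_per_pod)
```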

Note: Once log forwarding has been configured in DSM v2.0.x, any new databases will immediately start shipping their logs to the configured destination. However, existing databases will only start shipping logs when they go through a reconcile. In the current versions, you will need to trigger this manually. It could be something as simple as adding an advanced config parameter to the database, scaling it out, or increasing the resources associated with the database. All of these should trigger a reconcile operation and get the logs flowing.

The above screenshots were for a standalone MySQL database. What if I had a replicated database? Well, in the following example, I had a PostgreSQL database with a single replica and a separate Monitor node. To get the logs from all nodes, I added the IP addresses for all three into my source.

With the logs from all three nodes of the database filtered, I could then proceed with extracting the Pod Name field and building a managed field which I could then use in the time series. This time it would be useful to also include the source, so that you can easily identify which node has the Pod which generated the event.

You might now ask about database logging. This is also possible for PostgreSQL databases provisioned by DSM. The database logs are sent to a container called “instance-logging” on Kubernetes. Therefore, to see the database logs, you might add a first filter for the node or nodes in the database cluster, then add a second filter on the instance-logging container, along the lines of:

container_name="instance-logging"

From there, you could start to build some more detailed filtering and dashboards based on the database logs that you are interested in. Database shutdowns might be one example, which could be displayed with a simple text string of:

 database system is shut down

This could provide a dashboard similar to the following which makes it easy to observe certain database events.
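To make the idea concrete, the following Python sketch matches that shutdown string and attributes each event to its source node, which is essentially what the dashboard widget does. The syslog-style lines and IP addresses are invented for illustration:

```python
from collections import defaultdict

# Invented (source, message) pairs; in Operations for Logs the equivalent
# is the text filter "database system is shut down" grouped by source.
logs = [
    ('10.0.0.11', 'LOG:  database system is shut down'),
    ('10.0.0.12', 'LOG:  checkpoint complete'),
    ('10.0.0.11', 'LOG:  database system is shut down'),
]

shutdowns = defaultdict(int)
for source, message in logs:
    if 'database system is shut down' in message:
        shutdowns[source] += 1

print(dict(shutdowns))  # -> {'10.0.0.11': 2}
```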

Summary

Thanks for reading this far. Hopefully you can see how VCF Data Services Manager is shipping logs from both the DSM Provider appliance and the databases/data services. Since VCF includes Aria Operations for Logs, this feature provides really good log handling capabilities. Obviously I am only just scratching the surface of what can be achieved with log analysis and dashboards here. Much more can be achieved with the appropriate expertise. While we do not yet have a content pack to make the above easier to configure, it is still quite simple to set up some dashboards to filter the logs generated from DSM. Rest assured, we are looking into what it would take to create a DSM content pack – watch this space.
