Azure makes it simple and effective to monitor our infrastructure. With virtual machines, however, monitoring the services or processes running inside the operating system is up to us: we have to define which values we want to monitor.
Throughout this post, we will go through the steps needed to create alerts based on the behaviour of the running services or processes.
Our test environment consists of two virtual machines (one Linux, one Windows) and a Log Analytics Workspace. The virtual machines run a web service and a database service.
Creating a “Log Analytics Workspace”
The first step to monitor our services is to create a “Log Analytics Workspace”. A workspace is a single place where the data collected from the virtual machines is stored.
To create a “Log Analytics Workspace” we need to specify the subscription and the resource group that will contain this new Azure resource. Then we give it a name and choose the region where it will be located.
If you later want to link this workspace to an Automation account to apply updates automatically, you must use the West Europe region instead of North Europe because of Azure’s region restrictions.
Connecting virtual machines to the workspace
To collect data from each virtual machine, we have to connect it to the workspace. When a machine is connected, Azure installs a monitoring agent on it. To connect the machines, go to the Virtual Machines section of the workspace, select each virtual machine and click Connect.
Azure offers several ways to collect data about the state of the services. First, it can collect data from custom log files, given the paths where they are stored. Second, it can collect the entries recorded in the Windows event logs and in Linux Syslog. In this post, we will focus on the first option.
To begin with, we need a script that checks the state of the services and writes the result to a log file. For Azure to collect this information, the log file must use a supported format. The logs we generate here have the following format: one entry per line, each line starting with the date as YYYY-MM-DD HH:MM:SS, fields separated by commas, and UTF-8 encoding. The official documentation describes the allowed formats in more detail.
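The original Windows and Linux scripts are not reproduced here. As a rough illustration of the required format, the following Python sketch writes one UTF-8 line per service. It assumes (none of this comes from the original scripts) that the state is read with `systemctl is-active`, that each line holds timestamp, host, service and state, and that the file is named `services-YYYY-MM-DD.log` so the `services-*.log` wildcard used later picks it up. The service names are hypothetical.

```python
# Minimal sketch of the monitoring script (Linux flavour). Assumptions, not
# taken from the original scripts: states come from `systemctl is-active`,
# each line is timestamp,host,service,state, and there is one file per day.
import socket
import subprocess
from datetime import datetime
from pathlib import Path
from typing import Dict

SERVICES = ["nginx", "mysql"]          # hypothetical services to monitor
LOG_DIR = Path("/var/log/azuremon")    # same folder as the collection path


def service_state(name: str) -> str:
    """Ask systemd for the unit state, e.g. 'active', 'inactive' or 'failed'."""
    result = subprocess.run(["systemctl", "is-active", name],
                            capture_output=True, text=True, check=False)
    return result.stdout.strip() or "unknown"


def format_entry(when: datetime, host: str, service: str, state: str) -> str:
    """One comma-separated entry starting with YYYY-MM-DD HH:MM:SS."""
    return f"{when:%Y-%m-%d %H:%M:%S},{host},{service},{state}"


def log_file_name(when: datetime) -> str:
    """Daily file name, so that a services-*.log wildcard matches it."""
    return f"services-{when:%Y-%m-%d}.log"


def write_entries(log_dir: Path, when: datetime, states: Dict[str, str]) -> Path:
    """Append one UTF-8 line per service to the current day's log file."""
    log_dir.mkdir(parents=True, exist_ok=True)
    path = log_dir / log_file_name(when)
    host = socket.gethostname()
    with path.open("a", encoding="utf-8") as log:
        for service, state in states.items():
            log.write(format_entry(when, host, service, state) + "\n")
    return path


def main() -> None:
    """One monitoring run: check every service and log the results."""
    now = datetime.now()
    write_entries(LOG_DIR, now, {s: service_state(s) for s in SERVICES})
```

When saved as a file, the usual `if __name__ == "__main__": main()` guard lets cron run it directly. A Windows version would read service states differently, for instance through PowerShell's `Get-Service`.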
In both the Windows and the Linux script, you must edit the list of services to be monitored. Once the script has been adapted, add it to the Task Scheduler (Windows) or to cron (Linux) so that it runs every 4 minutes, every day. To check that it is running correctly, open the log folder and look at the results.
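Looking at the log folder by hand works, but the same check can be automated. This Python sketch relies only on each line starting with the YYYY-MM-DD HH:MM:SS timestamp described above; the one-minute tolerance on top of the 4-minute schedule is an arbitrary, hypothetical choice.

```python
# Sketch: confirm the scheduled script keeps writing entries. It relies only
# on each line starting with a YYYY-MM-DD HH:MM:SS timestamp.
from datetime import datetime, timedelta
from pathlib import Path
from typing import List, Optional

TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"
RUN_INTERVAL = timedelta(minutes=4)   # the script is scheduled every 4 minutes
SLACK = timedelta(minutes=1)          # assumed tolerance for a slow run


def last_entry_time(lines: List[str]) -> Optional[datetime]:
    """Timestamp of the newest non-empty line (ValueError if it is malformed)."""
    for line in reversed(lines):
        if line.strip():
            return datetime.strptime(line[:19], TIMESTAMP_FORMAT)
    return None


def is_running(lines: List[str], now: datetime) -> bool:
    """True if the newest entry is no older than one run interval plus slack."""
    last = last_entry_time(lines)
    return last is not None and now - last <= RUN_INTERVAL + SLACK


def check(log_dir: Path) -> bool:
    """Inspect the most recently modified services-*.log file in log_dir."""
    files = sorted(log_dir.glob("services-*.log"), key=lambda p: p.stat().st_mtime)
    if not files:
        return False
    lines = files[-1].read_text(encoding="utf-8").splitlines()
    return is_running(lines, datetime.now())
```

`check(Path("/var/log/azuremon"))` on Linux, or `check(Path("C:/logs"))` on Windows, returns False when no entry has been written within the last five minutes.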
Once the scripts are writing the state of the services to the log, configure the workspace in Azure so that it collects the information from the log paths.
Adding a new log path takes four simple steps:
- Upload a sample log file. Since the script has already been running, several log files will be available for this upload.
- Choose the record delimiter. We select Timestamp with the format YYYY-MM-DD HH:MM:SS; this way each entry in Azure (its TimeGenerated column) carries the time the script ran rather than the time Azure collected it.
- Indicate the path from where to take the logs. In our case: “C:\logs\services-*.log” in Windows and “/var/log/azuremon/services-*.log” in Linux.
- Give a name to the table that will store the data. For example: Services_CL.
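To picture what the Timestamp delimiter and the path wildcard do, here is a small Python model under simple assumptions: a new record begins wherever a line starts with YYYY-MM-DD HH:MM:SS, and only file names matching `services-*.log` are collected. It illustrates the idea; it is not how the Azure agent is implemented.

```python
# Model of the two collection settings: records split on a leading timestamp,
# and files selected by a wildcard. Illustration only, not the agent's code.
import re
from fnmatch import fnmatch
from typing import List

# A record starts wherever a line begins with YYYY-MM-DD HH:MM:SS.
RECORD_START = re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}", re.MULTILINE)


def split_records(text: str) -> List[str]:
    """Split raw log text into records; text before the first timestamp is dropped."""
    starts = [match.start() for match in RECORD_START.finditer(text)]
    ends = starts[1:] + [len(text)]
    return [text[start:end].rstrip("\n") for start, end in zip(starts, ends)]


def is_collected(file_name: str, pattern: str = "services-*.log") -> bool:
    """True if the file name matches the wildcard of the collection path."""
    return fnmatch(file_name, pattern)
```

A line that does not start with a timestamp stays attached to the previous record, which is why the timestamp delimiter also copes with entries that span several lines.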
Finally, validate that the process has worked by running a query against the Services_CL table. Bear in mind that it usually takes between 5 and 10 minutes from the creation of the custom log until data starts to appear.
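In a custom log table such as Services_CL, the full text of each entry is stored in the `RawData` column, so the comma-separated fields have to be split out, for instance with KQL's `split()`. The Python sketch below mirrors that step to show what the validation should reveal: the latest state reported for each service. The field order (timestamp, host, service, state) is an assumption about the monitoring script's output.

```python
# Sketch: split RawData values into fields, as split(RawData, ",") would in a
# query. The field order (timestamp, host, service, state) is an assumption.
from datetime import datetime
from typing import Dict, Iterable, NamedTuple, Tuple


class ServiceEntry(NamedTuple):
    time: datetime
    host: str
    service: str
    state: str


def parse_raw_data(raw: str) -> ServiceEntry:
    """Parse one RawData value; raises ValueError if a field is missing."""
    stamp, host, service, state = (part.strip() for part in raw.split(",", 3))
    return ServiceEntry(datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S"),
                        host, service, state)


def latest_states(raw_values: Iterable[str]) -> Dict[Tuple[str, str], str]:
    """Most recent state reported for each (host, service) pair."""
    latest: Dict[Tuple[str, str], ServiceEntry] = {}
    for entry in map(parse_raw_data, raw_values):
        key = (entry.host, entry.service)
        if key not in latest or entry.time >= latest[key].time:
            latest[key] = entry
    return {key: entry.state for key, entry in latest.items()}
```

Rows may arrive in any order; the comparison on `time` keeps only the newest one per service, so a stopped service shows up immediately in the result.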
Azure Monitor has an Alerts section where you can create new alerts based on metrics or on log queries.
The first thing to set up is the scope of the alert, which is the Log Analytics Workspace created earlier. Next, we add the alert condition by selecting “Custom Log Search”. In the search query field we paste the query, set the alert to fire when the number of results is greater than 0, and have the query evaluated every 5 minutes over a 30-minute time range. The query looks for services that have been stopped for more than 10 minutes and less than 15 minutes, so that a service that stays down for hours does not trigger a new alert at every evaluation.
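Since the query itself is not reproduced in this post, the following Python sketch models the condition it describes. It assumes entries of (timestamp, service, state) sampled every 4 minutes and treats `active`/`running` as the only healthy states (both assumptions); an outage is measured from its first stopped sample inside the 30-minute time range.

```python
# Python model of the alert condition, not the original query. Assumed input:
# (timestamp, service, state) entries sampled every 4 minutes.
from datetime import datetime, timedelta
from typing import Dict, Iterable, List, Optional, Tuple

Entry = Tuple[datetime, str, str]

RUNNING_STATES = {"active", "running"}  # assumed healthy values (systemd, Windows)
TIME_RANGE = timedelta(minutes=30)      # time range of each evaluation
MIN_DOWN = timedelta(minutes=10)
MAX_DOWN = timedelta(minutes=15)


def stopped_since(entries: Iterable[Entry], now: datetime) -> Dict[str, datetime]:
    """Start of the current stopped streak of every service that is down,
    considering only entries inside the 30-minute time range."""
    by_service: Dict[str, List[Entry]] = {}
    for entry in entries:
        if now - TIME_RANGE <= entry[0] <= now:
            by_service.setdefault(entry[1], []).append(entry)
    streaks: Dict[str, datetime] = {}
    for service, rows in by_service.items():
        start: Optional[datetime] = None
        for when, _, state in sorted(rows):
            if state.lower() in RUNNING_STATES:
                start = None          # the service came back up: reset
            elif start is None:
                start = when          # first stopped sample of a new streak
        if start is not None:
            streaks[service] = start
    return streaks


def services_to_alert(entries: Iterable[Entry], now: datetime) -> List[str]:
    """Services stopped for more than 10 and less than 15 minutes."""
    return sorted(service for service, start in stopped_since(entries, now).items()
                  if MIN_DOWN < now - start < MAX_DOWN)
```

If a stop is first recorded at 09:04, evaluations at 09:05, 09:10, 09:15, 09:20 and 09:25 see outages of 1, 6, 11, 16 and 21 minutes, so only the 09:15 run raises an alert: the 10-to-15-minute band is as wide as the 5-minute evaluation frequency. For an outage that outlasts the 30-minute range, the oldest stopped sample still in view is at least 26 minutes old, so no further alerts fire.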
To continue configuring the alert, we add an action group. For this demonstration, we created an action group with an e-mail address that is notified whenever the alert fires.
Finally, give the alert a name and set its severity. In this example the alert uses severity 1, as if the services were quite critical. Azure Monitor severities range from 0 (Critical) to 4 (Verbose), so for non-critical services the severity could be set to 3 or 4.
Now that the alert is active, you can check that it works by stopping a monitored service for more than 10 minutes. If the alert is configured correctly, the configured account will receive an e-mail reporting that the service has stopped. Below is an example of a received alert.
Suppressing alerts during maintenance
Services running on virtual machines regularly require interventions, from an update to a reconfiguration. In production environments these interventions are kept short and minimal; even so, maintenance windows are always scheduled with extra time in case of unforeseen events.
During a maintenance window, alerts must not raise false positives. Azure lets us suppress alerts during a defined maintenance period.
In the Azure portal, the Alerts section gives access to the Manage actions section. There, we add an action rule that suppresses the alerts.
The most important part of the rule is the filter that selects which alerts to suppress. In our case, we suppress the alerts that contain the name of our Linux machine.
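The rule itself is configured in the portal. To make the filter's behaviour concrete, this Python model drops alerts that name the Linux machine while they fire inside the maintenance window. The machine names, the window and both classes are hypothetical, and this is not an Azure API.

```python
# Model of the suppression rule's filter. Names and classes are hypothetical;
# this only illustrates the logic, it does not call Azure.
from dataclasses import dataclass
from datetime import datetime
from typing import List


@dataclass(frozen=True)
class Alert:
    fired_at: datetime
    target: str         # machine the alert refers to, e.g. "vm-linux"
    description: str


@dataclass(frozen=True)
class SuppressionRule:
    machine_name: str
    window_start: datetime
    window_end: datetime

    def suppresses(self, alert: Alert) -> bool:
        """True if the alert fires inside the window and names the machine."""
        in_window = self.window_start <= alert.fired_at <= self.window_end
        return in_window and self.machine_name.lower() in alert.target.lower()


def delivered(alerts: List[Alert], rule: SuppressionRule) -> List[Alert]:
    """Alerts that still reach the action group (the e-mail) after suppression."""
    return [alert for alert in alerts if not rule.suppresses(alert)]
```

With a window from 22:00 to 02:00 and the filter `vm-linux`, an alert from `vm-linux` at 23:00 is suppressed, one from `vm-windows` at 23:05 still reaches the e-mail, and a `vm-linux` alert at 03:00, after the window, is delivered again.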
To check the configuration, we first stop the services on the Linux machine and then those on the Windows machine. As the following image shows, only the alerts from the Windows machine are received, while the alert from the Linux machine has been suppressed.
At Enimbos we love to share our knowledge. Would you like us to cover more monitoring topics in Azure? Contact us and tell us which subjects you would like us to talk about.