
Monitoring
Monitoring is an important architectural concern that should be part of any solution, big or small, mission-critical or not, cloud-based or not—it should not be neglected.
Monitoring refers to the act of keeping track of solutions and capturing various telemetry information, processing it, identifying the information that qualifies for alerts based on rules, and raising them. Generally, an agent is deployed within the environment and monitors it, sending telemetry information to a centralized server, where the rest of the processing of generating alerts and notifying stakeholders takes place.
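The agent-and-server flow described above can be pictured with a minimal sketch. All of the names here (Telemetry, Rule, CentralServer, Agent, and the HighCpu rule) are hypothetical and are not part of any Azure SDK; the sketch only illustrates the idea of agents forwarding telemetry to a central place where rules decide which records become alerts.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Telemetry:
    source: str   # machine or service the agent runs on
    metric: str   # name of the measurement, for example "cpu_percent"
    value: float


@dataclass
class Rule:
    name: str
    matches: Callable[[Telemetry], bool]  # True when a record qualifies for an alert


class CentralServer:
    """Receives telemetry from agents and evaluates alert rules against it."""

    def __init__(self, rules: List[Rule]) -> None:
        self.rules = rules
        self.alerts: List[str] = []

    def ingest(self, record: Telemetry) -> None:
        for rule in self.rules:
            if rule.matches(record):
                self.alerts.append(
                    f"{rule.name}: {record.source} {record.metric}={record.value}"
                )


class Agent:
    """Runs inside the monitored environment and forwards readings."""

    def __init__(self, source: str, server: CentralServer) -> None:
        self.source = source
        self.server = server

    def report(self, metric: str, value: float) -> None:
        self.server.ingest(Telemetry(self.source, metric, value))


# Hypothetical rule: alert when CPU usage goes above 90 percent.
server = CentralServer(
    [Rule("HighCpu", lambda t: t.metric == "cpu_percent" and t.value > 90)]
)
agent = Agent("web-01", server)
agent.report("cpu_percent", 42.0)   # below the rule's limit, no alert
agent.report("cpu_percent", 97.5)   # qualifies for an alert
print(server.alerts)
```

In a real deployment, the notification of stakeholders would follow from the collected alerts; here they are simply kept in a list.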
Monitoring takes both proactive and reactive actions and measures against a solution. It is also the first step toward auditing a solution. Without the ability to monitor log records, it is difficult to audit a system from various perspectives, such as security, performance, and availability.
Proactively, monitoring helps us identify availability, performance, and scalability issues before they affect users. Hardware failures, software misconfigurations, and patch update problems can be discovered well before they have an impact, and early signs of performance degradation can be addressed before users notice them.
Reactively, monitoring logs pinpoint the areas and locations that are causing issues, help identify those issues, and enable faster and better repairs.
Teams can identify patterns of issues using monitoring telemetry information and eliminate them by innovating new solutions and features.
Azure is a rich cloud environment that provides multiple monitoring features and resources for monitoring not only cloud-based deployments but also on-premises deployments.
Azure monitoring
The first question that should be answered is, "What must we monitor?" This question becomes more important for solutions that are deployed on the cloud because of the constrained control over them.
There are some important components that should be monitored. They include the following:
Custom applications
Azure resources
Guest OSes (VMs)
Host OSes (Azure physical servers)
Azure infrastructure
There are different Azure logging and monitoring services for these components, and they are discussed in the following sections.
Azure activity logs
Previously known as audit logs and operational logs, activity logs are control-plane events on the Azure platform. They provide telemetry information at the subscription level, instead of the individual resource level. They track information about all changes that happen at the subscription level, such as creating, deleting, and updating resources using Azure Resource Manager (ARM). Activity logs help us discover the identity (such as a service principal, user, or group) that performed an action (such as a write or update) on a resource (for example, a storage account, virtual machine, or SQL database) at any given point in time. They provide information about changes to the configuration of resources, but not about their inner workings and execution. For example, you can get the logs for starting a VM, resizing a VM, or stopping a VM.
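As a rough illustration of the "who did what to which resource" question that activity logs answer, the following sketch filters a handful of hand-written events. The events are invented sample data, and the field names (caller, operationName, resourceId, status) are assumptions chosen to resemble activity log entries rather than an exact schema.

```python
from typing import Dict, List, Tuple

# Hypothetical sample events; the IDs and callers are made up for illustration.
VM_ID = (
    "/subscriptions/0000/resourceGroups/rg1"
    "/providers/Microsoft.Compute/virtualMachines/vm1"
)
STORAGE_ID = (
    "/subscriptions/0000/resourceGroups/rg1"
    "/providers/Microsoft.Storage/storageAccounts/store1"
)

events: List[Dict[str, str]] = [
    {
        "caller": "alice@contoso.com",
        "operationName": "Microsoft.Compute/virtualMachines/start/action",
        "resourceId": VM_ID,
        "status": "Succeeded",
    },
    {
        "caller": "deploy-service-principal",
        "operationName": "Microsoft.Storage/storageAccounts/write",
        "resourceId": STORAGE_ID,
        "status": "Succeeded",
    },
    {
        "caller": "bob@contoso.com",
        "operationName": "Microsoft.Compute/virtualMachines/deallocate/action",
        "resourceId": VM_ID,
        "status": "Succeeded",
    },
]


def who_did_what(log: List[Dict[str, str]], resource_id: str) -> List[Tuple[str, str]]:
    """Return (caller, operation) pairs for every event on the given resource."""
    return [
        (event["caller"], event["operationName"])
        for event in log
        if event["resourceId"] == resource_id
    ]


vm_history = who_did_what(events, VM_ID)
for caller, operation in vm_history:
    print(caller, "->", operation)
```

Note that the sketch only sees control-plane operations such as starting or deallocating the VM; nothing in these events describes what happened inside the VM.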
The next topic that we are going to discuss is diagnostic logs.
Azure diagnostic logs
The information originating within the inner workings of Azure resources is captured in what are known as diagnostic logs. They provide telemetry information about the operations that are inherent to each resource. Not every resource provides diagnostic logs, and the content of these logs differs completely from one resource type to another. Diagnostic logs are configured individually for each resource. An example of an operation captured in diagnostic logs is storing a file as a blob in a container in a storage account.
The next type of log that we are going to discuss is application logs.
Azure application logs
Application logs can be captured by Application Insights resources and can be managed centrally. They provide information about the inner workings of custom applications, such as their performance metrics and availability, and users can draw insights from these logs in order to manage their applications better.
Lastly, we have guest and host OS logs. Let's understand what these are.
Guest and host OS logs
Both guest and host OS logs are offered to users using Azure Monitor. They provide information about the statuses of host and guest OSes:

Figure 2.16: Logging in Azure
The important Azure resources related to monitoring are Azure Monitor, Azure Application Insights, and Log Analytics, previously known as Operational Insights.
There are other tools, such as System Center Operations Manager (SCOM), that are not part of the cloud platform's features but can be deployed on IaaS-based VMs to monitor any workload on Azure or in an on-premises datacenter. Let's discuss the three monitoring resources in the following section.
Azure Monitor
Azure Monitor is a central tool and resource that provides complete management features that allow you to monitor an Azure subscription. It provides management features for activity logs, diagnostic logs, metrics, Application Insights, and Log Analytics. It should be treated as a dashboard and management resource for all other monitoring capabilities.
Our next topic is Azure Application Insights.
Azure Application Insights
Azure Application Insights provides centralized, Azure-scale monitoring, logs, and metrics capabilities to custom applications. Custom applications can send metrics, logs, and other telemetry information to Azure Application Insights. It also provides rich reporting, dashboarding, and analytics capabilities to get insights from incoming data and act on them.
Now that we have covered Application Insights, let's look at another similar service called Azure Log Analytics.
Azure Log Analytics
Azure Log Analytics enables the centralized processing of logs and generates insights and alerts from them. Activity logs, diagnostic logs, application logs, event logs, and even custom logs can send information to Log Analytics, which can further provide rich reporting, dashboarding, and analytics capabilities to get insights from incoming data and act on them.
Now that we know the purpose of Log Analytics, let's discuss how logs are stored in a Log Analytics workspace and how they can be queried.
Logs
A Log Analytics workspace provides capabilities to search for specific log entries, export all telemetry data to Excel and/or Power BI, and search using a query language called Kusto Query Language (KQL), which is similar to SQL.
The Log Search screen is shown here:

Figure 2.17: Log search in a Log Analytics workspace
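To give a feel for the filter-then-summarize style of a KQL query, the sketch below reproduces the logic of a small query in Python over in-memory records. The query in the comment, the table name Perf, and the column names Computer, CounterName, and CounterValue are assumptions used for illustration, and the records are invented sample data. This is not how a workspace executes queries; it only mirrors what such a query computes.

```python
from collections import defaultdict
from typing import Dict, List

# The KQL query being imitated (assumed table and column names):
#
#   Perf
#   | where CounterName == "% Processor Time"
#   | summarize avg(CounterValue) by Computer

# Invented sample rows standing in for ingested log records.
records: List[Dict[str, object]] = [
    {"Computer": "web-01", "CounterName": "% Processor Time", "CounterValue": 70.0},
    {"Computer": "web-01", "CounterName": "% Processor Time", "CounterValue": 90.0},
    {"Computer": "db-01", "CounterName": "% Processor Time", "CounterValue": 20.0},
    {"Computer": "db-01", "CounterName": "Available MBytes", "CounterValue": 2048.0},
]


def average_by_computer(rows: List[Dict[str, object]], counter_name: str) -> Dict[str, float]:
    """Filter rows by counter name, then average the values per computer."""
    grouped: Dict[str, List[float]] = defaultdict(list)
    for row in rows:
        if row["CounterName"] == counter_name:          # the "where" step
            grouped[str(row["Computer"])].append(float(row["CounterValue"]))
    # the "summarize avg(...) by Computer" step
    return {computer: sum(values) / len(values) for computer, values in grouped.items()}


result = average_by_computer(records, "% Processor Time")
print(result)
```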
In the next section, we will be covering Log Analytics solutions, which are like additional capabilities in a Log Analytics workspace.
Solutions
Solutions in Log Analytics are further capabilities that can be added to a workspace, capturing additional telemetry data that is not captured by default. When these solutions are added to a workspace, appropriate management packs are sent to all the agents connected to the workspace so that they can configure themselves to capture solution-specific data from VMs and containers and then send it to the Log Analytics workspace. Monitoring solutions from Microsoft and partners are available from Azure Marketplace.
Azure provides lots of Log Analytics solutions for tracking and monitoring different aspects of environments and applications. At a minimum, a set of solutions that are generic and applicable to almost any environment should be added to the workspace:
Capacity and performance
Agent health
Change tracking
Containers
Security and audit
Update management
Network performance monitoring
Another key aspect of monitoring is alerts. Alerts help to notify the right people during any monitored event. In the next section, we will cover alerts.
Alerts
Log Analytics allows us to generate alerts in relation to ingested data. It does so by running a pre-defined query composed of conditions for incoming data. If it finds any records that fall within the ambit of the query results, it generates an alert. Log Analytics provides a highly configurable environment for determining the conditions for generating alerts, time windows in which the query should return the records, time windows in which the query should be executed, and actions to be taken when the query returns an alert:

Figure 2.18: Configuring alerts through Log Analytics
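The evaluation model described above (a query, a time window for the records, a schedule for running the query, and a condition on the results) can be sketched as follows. LogAlertRule, evaluate, and evaluation_times are hypothetical names, and the records are invented; the sketch is a simplified picture of the idea, not the service's actual implementation.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Callable, Dict, Iterator, List


@dataclass
class LogAlertRule:
    name: str
    query: Callable[[Dict], bool]  # condition applied to each ingested record
    window: timedelta              # how far back each evaluation looks
    frequency: timedelta           # how often the rule is evaluated
    threshold: int = 0             # alert when the match count exceeds this value


def evaluate(rule: LogAlertRule, records: List[Dict], now: datetime) -> bool:
    """Return True when enough records inside the window satisfy the query."""
    start = now - rule.window
    matches = [r for r in records if start <= r["time"] <= now and rule.query(r)]
    return len(matches) > rule.threshold


def evaluation_times(rule: LogAlertRule, start: datetime, end: datetime) -> Iterator[datetime]:
    """Yield the moments at which the rule runs, spaced by its frequency."""
    moment = start
    while moment <= end:
        yield moment
        moment += rule.frequency


# Invented sample records.
records = [
    {"time": datetime(2024, 1, 1, 11, 30), "level": "Error"},
    {"time": datetime(2024, 1, 1, 11, 58), "level": "Error"},
    {"time": datetime(2024, 1, 1, 11, 59), "level": "Info"},
]

rule = LogAlertRule(
    name="ErrorsInLastFiveMinutes",
    query=lambda r: r["level"] == "Error",
    window=timedelta(minutes=5),
    frequency=timedelta(minutes=5),
)

fired = [
    moment
    for moment in evaluation_times(rule, datetime(2024, 1, 1, 12, 0), datetime(2024, 1, 1, 12, 10))
    if evaluate(rule, records, moment)
]
print(fired)
```

Only the run at 12:00 fires, because the 11:58 error is the only matching record that falls inside a five-minute window; the 11:30 error is too old by then.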
Let's go through the steps for configuring alerts through Log Analytics:
The first step in configuring an alert is to add a new alert rule, either from the alert menu of the Log Analytics resource in the Azure portal or through automation.
From the resultant panel, select a scope for the alert rule. The scope determines which resource should be monitored for alerts—it could be a resource instance, such as an Azure storage account, a resource type, such as an Azure VM, a resource group, or a subscription:

Figure 2.19: Selecting a resource for the alert
Following resource selection, conditions must be set for the alert. The condition determines the rule that is evaluated against the logs and metrics on the selected resource, and an alert is generated only when the condition evaluates to true. Many metrics and logs are available for building conditions. In the following example, an alert is created with a static threshold value of 80% for Percentage CPU (Avg); the average is calculated over five-minute periods and the condition is evaluated every minute:

Figure 2.20: Creating an alert for Percentage CPU (Avg)
Alerts also support dynamic thresholds, which use machine learning to learn the historical behavior of metrics and detect irregularities that could indicate service issues.
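The static-threshold condition from the example can be sketched as a sliding average. This is a minimal illustration that assumes one Percentage CPU reading per minute; the function name and sample readings are hypothetical. A dynamic threshold would replace the fixed 80% value with limits learned from the metric's history, which this sketch does not attempt.

```python
from collections import deque
from statistics import mean
from typing import List, Sequence


def static_threshold_alerts(
    samples: Sequence[float], threshold: float = 80.0, window: int = 5
) -> List[int]:
    """Return the minutes at which the average of the last `window` readings exceeds `threshold`.

    `samples` holds one reading per minute, so the check runs every minute
    over the most recent five minutes of data.
    """
    recent: deque = deque(maxlen=window)
    fired = []
    for minute, value in enumerate(samples):
        recent.append(value)
        # Wait for a full window before evaluating the condition.
        if len(recent) == window and mean(recent) > threshold:
            fired.append(minute)
    return fired


# Invented per-minute CPU readings, in percent.
cpu = [50, 60, 85, 90, 95, 99, 99, 40, 30, 20]
alert_minutes = static_threshold_alerts(cpu)
print(alert_minutes)
```

With these readings, the five-minute average first rises above 80% at minute 5 and drops back below it at minute 8, so a single short spike does not trigger the condition on its own.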
Finally, create an action group, or reuse an existing one, to determine how stakeholders are notified about alerts. The Action Groups section allows you to configure what should happen after an alert is raised. Generally, there should be a remedial and/or notification action. Log Analytics provides eight different ways to create a new action. They can be combined in any way you like. An alert will execute any or all of the following configured actions:
Email/SMS/push/voice notification: This sends an email/SMS/push/voice notification to the configured recipients.
Webhooks: A webhook runs an arbitrary external process using an HTTP POST mechanism. For example, a REST API can be executed, or the Service Manager/ServiceNow APIs can be invoked to create a ticket.
Azure Functions: This runs an Azure function, passing it the necessary payload so that the logic contained in the function can act on it.
Logic Apps: This executes a custom Logic Apps workflow.
Email Azure Resource Manager Role: This emails a holder of an Azure Resource Manager role, such as an owner, contributor, or reader.
Secure webhook: Like a regular webhook, this runs an arbitrary external process using an HTTP POST mechanism, but the call is protected using an identity provider, such as Azure Active Directory.
Automation runbooks: This action executes Azure Automation runbooks.
ITSM: ITSM solutions should be provisioned before using this option. It helps with connecting and sending information to ITSM systems.
After all of this configuration, you need to provide the Name, Description, and Severity values for the alert rule to generate it.
As mentioned at the beginning of this section, alerts play a vital role in monitoring: they help authorized personnel take the necessary actions when an alert is triggered.
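The way an action group fans out one alert to several configured actions, including a webhook that uses HTTP POST, can be sketched as follows. ActionGroup, build_webhook_request, the alert payload, and the URL are all hypothetical; the payload is not the format Azure sends. The request is only constructed, never sent, so the sketch runs without network access.

```python
import json
import urllib.request
from typing import Callable, Dict, List


def build_webhook_request(url: str, alert: Dict) -> urllib.request.Request:
    """Prepare (but do not send) an HTTP POST carrying the alert as JSON."""
    body = json.dumps(alert).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


class ActionGroup:
    """A named collection of actions that all run when an alert fires."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.actions: List[Callable[[Dict], None]] = []

    def add(self, action: Callable[[Dict], None]) -> None:
        self.actions.append(action)

    def fire(self, alert: Dict) -> None:
        for action in self.actions:
            action(alert)


# Record what each action would do instead of performing real side effects.
notifications: List[str] = []
prepared_requests: List[urllib.request.Request] = []

group = ActionGroup("ops-team")
group.add(lambda alert: notifications.append(f"email: {alert['name']} ({alert['severity']})"))
group.add(
    lambda alert: prepared_requests.append(
        build_webhook_request("https://example.invalid/hooks/ticket", alert)
    )
)

# Hypothetical alert payload with the name and severity given to the rule.
alert = {"name": "HighCpu", "severity": "Sev2", "description": "CPU above 80%"}
group.fire(alert)

request = prepared_requests[0]
print(notifications[0])
print(request.get_method(), request.full_url)
```

Sending the prepared request would be a call to urllib.request.urlopen(request); it is left out here so that the example has no external dependency.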