Exam Essentials – Monitoring Azure Data Storage and Processing
Azure Monitor. Azure Monitor is like a container that encompasses many monitoring components. You can install the Azure Monitor agent as an extension on an Azure VM and configure it to send performance and availability metrics to a Log Analytics workspace for storage. You can also install the extension on servers hosted in a private, on‐premises datacenter. The features found in the Monitoring section of Azure products in the Azure portal are part of Azure Monitor and include alerts, metrics, diagnostic settings, and logs.
Alerts. Any traceable event that takes place against your product or within the application can be captured and an alert sent to a group of individuals. Log Analytics queries, metric thresholds, and activity logs can all be monitored. When you create an alert, you set how frequently the evaluation is performed and the logic used to detect the event, and you identify who receives the alert message and how.
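The frequency-plus-logic idea behind a metric alert rule can be sketched as follows. This is a minimal illustration of threshold evaluation over a window of samples, not Azure Monitor's actual implementation; the function name and thresholds are hypothetical.

```python
from statistics import mean

def evaluate_metric_alert(samples, threshold, window_size=3):
    """Illustrative sketch: fire the alert when the average of the most
    recent samples in the evaluation window exceeds the threshold."""
    if len(samples) < window_size:
        return False  # not enough data points collected yet
    window = samples[-window_size:]
    return mean(window) > threshold

# Example: CPU percentages collected at the configured evaluation frequency
cpu_samples = [42.0, 88.0, 91.0, 95.0]
print(evaluate_metric_alert(cpu_samples, threshold=80.0))  # True
```

In the real service, the action group attached to the rule then determines who is notified and through which channel (email, SMS, webhook, and so on).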
Diagnostic settings. The data available for logging in this feature provides platform‐specific information for the associated Azure product. The data can be stored in a Log Analytics workspace, a storage account, an event hub, or a customized third‐party solution. When configured to be stored in a Log Analytics workspace, the data is placed into a table that has the same name as the log category, with columns for each part of the log. You can query the data for that given product using KQL via the Logs navigation item.
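As a hedged illustration, a KQL query against diagnostic data in a Log Analytics workspace might look like the following; the category value is an example, and the table and column names vary by product and log category.

```kusto
// Illustrative only: table, category, and column names differ per product.
AzureDiagnostics
| where TimeGenerated > ago(24h)
| where Category == "SQLSecurityAuditEvents"
| summarize count() by bin(TimeGenerated, 1h)
| order by TimeGenerated desc
```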
Monitor hub. The Monitor hub in the Azure Synapse Analytics workspace is where you gain insights into the performance and overall health of your resources. SQL, Apache Spark, and Data Explorer pool capacity and utilization metrics can be viewed within this hub. SQL and KQL request performance, Apache Spark jobs, and data flow logging are also viable logging options from the Monitor hub. Monitoring the pipelines that run on the workspace is likely the reason you will access this hub the most. Performance and other execution details about each pipeline run are stored here. Linked connections, integration runtimes, and trigger runs expose their execution logs here as well.
Directed acyclic graphs. Getting to the source of a technical problem is complex and requires many years of training and experience, because many layers of stacked technologies must be drilled into to find the root cause. A directed acyclic graph (DAG) is useful for getting closer to the actual execution path of a job on an Apache Spark cluster. The DAG shown in Figure 9.26 breaks the job into stages and the steps that take place within them.
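The stage-and-step structure of such a graph can be sketched as a small dependency map: because a DAG has no cycles, a valid execution order always exists and can be produced with a topological sort. The stage names below are hypothetical, not taken from any specific Spark job.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical Spark-like job: each stage maps to the stages it depends on.
stages = {
    "read_csv":  set(),
    "filter":    {"read_csv"},
    "aggregate": {"filter"},
    "write_out": {"aggregate"},
}

# static_order() yields every stage after all of its dependencies.
order = list(TopologicalSorter(stages).static_order())
print(order)  # ['read_csv', 'filter', 'aggregate', 'write_out']
```

Reading a Spark DAG the same way, from the stages with no dependencies forward, helps pinpoint which stage a slow or failed job is actually stuck in.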