Optimize and Troubleshoot Data Storage and Data Processing – Troubleshoot Data Storage Processing-1
In this section you will learn about compression, shuffling, partitioning, query and pipeline optimization, and troubleshooting. You have already been introduced to most of the content in this section and in this chapter, but not necessarily from an optimization or troubleshooting perspective, so there is still some new learning to gain from this content. Keep in mind as you read that, from the perspective of the DP-203 exam, Azure Data Factory and Azure Synapse Analytics can be used interchangeably. This means that if a question on the exam offers only Azure Data Factory as an option and the book highlights that feature in the context of Azure Synapse Analytics, you can select Azure Data Factory as the answer with confidence. This is because all new data analytics features will be added to Azure Synapse Analytics, and at some point only one of these products will exist, after all features and customers have migrated to Azure Synapse Analytics.
An important aspect to optimizing and troubleshooting data analytics operations is to understand the different types of issues that occur. These issue types, often referred to as antipatterns, are summarized in Table 10.1.
TABLE 10.1 Performance and troubleshooting antipatterns
| Antipattern | Description |
|---|---|
| Busy database | Too much data processing is happening on the database server. |
| Busy front end | Doing too much work asynchronously on too many background threads. |
| Chatty I/O | Continuous execution of many small I/O requests. |
| Extraneous fetching | Retrieving more data than is needed by not projecting effectively (see the sketch following this table). |
| Improper instantiation | Repeatedly creating and destroying objects that are intended to be shared and reused. |
| Monolithic persistence | Using a single datastore for data that has very different usage patterns. |
| No caching | Failing to cache frequently accessed data. |
| Noisy neighbor | Shared resources are consumed disproportionately by a single tenant. |
| Retry storm | Retrying failed operations too often. |
| Synchronous I/O | Blocking threads while I/O completes. |
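To make one of these antipatterns concrete, the following is a minimal, hypothetical PySpark sketch of extraneous fetching alongside a more efficient alternative. The storage path and the column names (session_id, value) are placeholders for illustration only; they are not taken from the exam or from any sample dataset.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("extraneous-fetching").getOrCreate()

# Placeholder path; substitute a real container and folder to run this.
path = "abfss://data@<account>.dfs.core.windows.net/sessions/"

# Extraneous fetching: every column and every row is pulled to the driver,
# and most of the data is then thrown away in local Python code.
rows = spark.read.parquet(path).collect()
wanted = [(r["session_id"], r["value"]) for r in rows if r["value"] > 10]

# Better: project only the required columns and filter at the source, so
# far less data is read, transferred, and deserialized.
efficient = (
    spark.read.parquet(path)
    .select("session_id", "value")
    .where("value > 10")
)
efficient.show(5)
```

Projecting and filtering early also gives the query optimizer a chance to prune columns and push the predicate down into the Parquet files, rather than fetching everything and discarding most of it afterward.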
It is important to understand that the types of issues that can impact your data analytics solution are not infinite. The kinds of issues causing latency or errors are most likely caused by one or more of the antipatterns listed in Table 10.1. If you have offloaded much of the processing of data to the database server, for example, code running in a stored procedure, that processing can cause high utilization of the database compute resources. This can impact the ability of the database to retrieve data requested by other clients. Scaling up the server on which the database runs, or relocating some of the data processing to the client, can reduce the impact of the busy database antipattern.

There is a term called thread starvation, which means that the kernel or the process is no longer able to allocate any threads to perform computations, i.e., processing. There are numerous types of threads, and each type of thread has a built-in limit on the total number that can be instantiated. A way to work within this limit is to make better use of the threads you have by running operations asynchronously. If a managed thread needs to perform I/O, traditionally it would hand the work to an I/O thread, wait until the I/O operation completed, and then continue processing. This is not optimal, because the managed thread sits idle while it waits for the I/O to complete. If the operation is performed asynchronously instead, the managed thread is returned to the pool while the I/O is in progress, so it can be used for other executions.

The busy front-end antipattern occurs when too much work is happening on too many threads, which consumes all the resources on the server. Scaling to a larger machine might resolve the issue, but a closer look at the application code might also be justified to better manage threading.
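As an illustration of that difference, here is a minimal Python sketch using asyncio. The asyncio.sleep call stands in for an I/O wait such as a network or storage request, and the names fetch and main are purely illustrative; this shows the general principle only, not the implementation of any particular Azure service.

```python
import asyncio
import time

async def fetch(name: str) -> str:
    # While this coroutine awaits the simulated I/O, the thread running
    # the event loop is free to make progress on other coroutines
    # instead of sitting idle.
    await asyncio.sleep(1)
    return f"result from {name}"

async def main() -> None:
    start = time.perf_counter()
    # The three one-second waits overlap on a single thread, so the total
    # elapsed time is roughly one second rather than three.
    results = await asyncio.gather(fetch("a"), fetch("b"), fetch("c"))
    print(results, f"elapsed: {time.perf_counter() - start:.2f}s")

asyncio.run(main())
```

In .NET, the async and await keywords provide the same behavior for managed threads, releasing the thread back to the pool while the awaited I/O completes.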