Optimize Pipelines for Analytical or Transactional Purposes – Troubleshoot Data Storage Processing
Much of the effort required to optimize and tune a pipeline lies in the discovery phase, which means finding where the slowness is coming from. Once you locate the general area, you can debug further by drilling down into its details until you reach the actual piece of code or SQL query step that is causing the problem. That is no simple task. A majority of the content in Chapter 9 had to do with configuring, capturing, and analyzing performance and availability logs. Those logs are what you use to determine whether a problem exists, where it is, and how bad it is; they are also the baseline against which you compare performance after making a change. Before delving into the specific performance aspects of optimizing a pipeline, first recall the terms OLTP, OLAP, and HTAP, which were introduced in Chapter 2.
From a transactional perspective, the term OLTP should come to mind. OLTP, which stands for online transactional processing, captures data in real time from activities like placing an order or ingesting a brain wave reading. OLTP tables need to be optimized for inserts.
Alternatively, OLAP, which stands for online analytical processing, refers to a database that stores historical data useful for analysis and gathering business insights. From an OLAP perspective, the database tables need to be optimized for reads. This means you would want to avoid a scenario in which an OLTP solution uses the same database tables as an OLAP solution, both because of isolation levels (introduced in Chapter 3) and because of potential overutilization of those tables. For example, if you are handling a large volume of concurrent incoming data that must be inserted into a table while a large report that queries heavily against the same table is requested, there would be contention. Worse, if the isolation level is READ UNCOMMITTED, the report might retrieve data that has not yet been committed and could still be rolled back, which would render potentially incorrect results for the report. An approach for managing this common situation is to use HTAP, which stands for hybrid transaction/analytical processing.
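To make the dirty-read risk concrete, the following is a minimal T-SQL sketch using the dbo.READING table referenced later in this section. Assume the two halves run in separate, concurrent sessions, and that any columns beyond ELECTRODE_ID and FREQUENCY_ID allow the insert as written:

-- Session 1 (transactional writer): inserts a row but does not commit yet.
BEGIN TRANSACTION;
INSERT INTO dbo.READING (ELECTRODE_ID, FREQUENCY_ID)
VALUES (1, 1);

-- Session 2 (the report): under READ UNCOMMITTED the uncommitted row
-- is visible, which is a dirty read.
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
SELECT COUNT(*) AS TOTAL_READINGS FROM dbo.READING;

-- Session 1: rolls back, so the count the report already returned
-- included a row that was never committed.
ROLLBACK TRANSACTION;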
A database that would benefit from HTAP is one that must handle both OLTP and OLAP operations. This means the database and the tables within it need to handle inserts, updates, and deletes while simultaneously providing performant data retrieval for SELECT statements. You might be rightly thinking that dedicated SQL pools have not been discussed in the context of OLTP. This is true, and you should feel confident in your learning that a dedicated SQL pool is not what you would want to use for OLTP transactions, because the available performance level offerings currently do not include tiers optimized for OLTP or HTAP implementations. HTAP solutions therefore apply more to Azure SQL and even Azure Cosmos DB products.
You implement an HTAP solution in an Azure SQL database by creating a nonclustered columnstore index, similar to the following:

CREATE NONCLUSTERED COLUMNSTORE INDEX NCCX_READING_ELECTRODE_FREQUENCY
ON dbo.READING (ELECTRODE_ID, FREQUENCY_ID);
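Once the index exists, analytical queries that scan and aggregate those columns can be served from the compressed columnstore rather than the rowstore, which reduces how heavily reads compete with concurrent inserts. As a simple sketch of the kind of query that benefits, again assuming the dbo.READING table:

SELECT ELECTRODE_ID, FREQUENCY_ID, COUNT(*) AS READINGS
FROM dbo.READING
GROUP BY ELECTRODE_ID, FREQUENCY_ID;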