Tune Queries by Using Indexers – Troubleshoot Data Storage Processing-2
Azure Monitoring Overview, Exams of Microsoft, Microsoft DP-203A disadvantage of clustered indexes is that the only queries that benefit from them are those that filter the query in the way the index is created. This means that only queries that have those four IDs in theWHERE clause will utilize the index; other queries will result in a table scan. One way to determine which indexes are needed for your data analytics is by knowing about all the queries and the data they are retrieving. For smaller applications, this might be possible; however, it would not be feasible in an enterprise with numerous teams and many databases. One way you can find out which queries are being run against tables and if they would benefit from an index is by using an indexer. An indexer is the tool or process by which data is indexed. A tool called Query Store is useful for SQL databases and is supported on Azure Synapse Analytics. It is not enabled by default. To enable it, execute the following SQL statement, replacing databaseName with the name of the dedicated SQL pool where you want this feature enabled:
ALTER DATABASE [databaseName]
SET QUERY_STORE = ON;
The information you can find using the data logged by Query Store includes queries with long execution times, queries that perform the most I/O transactions, queries with the highest wait durations, and comparisons between like queries that highlight downward performance trends. This data is pulled from dynamic management views (DMV), which hold information useful for identifying which queries on the database are latent and which queries would benefit from an index. As shown in Table 10.4, there are DMVs that recommend columns in a table that should exist in an index, and much more.
TABLE 10.4 Index‐related Dynamic Management Views
Name | View | Description |
sys.query_store_query_text | Query Store | Stores the query command text |
sys.query_store_query | Query Store | Stores the query command text and runtime execution statistics |
sys.query_store_plan | Query Store | Links plans with queries |
sys.query_store_runtime_stats | Query Store | Stores runtime statistics for queries |
sys.query_store_wait_stats | Query Store | Contains query wait information |
sys.dm_pdw_exec_requests | Execution | Lists all steps taken for recent or active queries |
sys.dm_pdw_request_steps | Execution | Lists all steps taken for a given query |
sys.dm_pdw_dms_workers | Execution | Contains information about workers that process DMS steps |
sys.dm_pdw_sql_requests | Execution | Holds information about SQL queries |
sys.indexes | Object | Lists the indexes on the database |
sys.index_columns | Object | Lists indexed columns per index |
sys.pdw_index_mappings | Synapse | Links the physical index name to the logical index used on Compute node |
sys.dm_db_index_operational_stats | Index | Returns I/O, latching, access method, and locking activity |
sys.dm_db_index_physical_stats | Index | Returns fragments and size information for indexes and data |
sys.dm_db_index_usage_stats | Index | Lists the index types and usage frequency |
sys.dm_db_missing_index_columns | Index | Identifies the columns that are missing an index |
sys.dm_db_missing_index_details | Index | Renders details about missing indexes |
The Query Store uses these DMVs behind the scenes and offers some built‐in capabilities, which can be graphically viewed in SQL Server Management Studio (SSMS). Figure 10.7 illustrates the output of one such report named Top Resource Consuming Queries.
FIGURE 10.7 Tuning queries with the Top Resource Consuming Queries report
The default reports can give you insights into the areas in your data analytics solution that consume the most amount of CPU, memory, and/or time. Once you have the top problem queries identified, you can use the plethora of DMVs from Table 10.4 to dig deeper into the query execution and determine if there is a need to create additional indexes or even re‐create the table with a clustered columnstore index. Performing the actual tuning depends on the queries you discovered from your analysis. Take the previous SQL query, which retrieves data from the READING table and has a clustered index like the following:
CLUSTERED INDEX ( READING_ID, SESSION_ID, ELECTRODE_ID, FREQUENCY_ID )
Then hypothetically consider that you found many queries that matched the following pattern:
SELECT READING_DATETIME, COUNT, VALUE
FROM READING
WHERE ELECTRODE_ID = 2 AND FREQUENCY_ID = 4