DEV Community

Cover image for SQL Server String Aggregation: FOR XML PATH vs. STRING_AGG Performance Comparison and Optimization
SQLFlash
SQLFlash

Posted on

SQL Server String Aggregation: FOR XML PATH vs. STRING_AGG Performance Comparison and Optimization

In SQL Server, string aggregation is a common requirement for data reporting, logging, and data display. Traditional methods often use FOR XML PATH, but since SQL Server 2017, STRING_AGG has become the new string aggregation function. Although both methods achieve the same functionality, their performance differences can significantly impact system efficiency when handling large-scale data.

This article explores the performance differences between FOR XML PATH and STRING_AGG and provides optimization suggestions to help developers choose the most suitable solution.

1. Common String Aggregation Methods in SQL Server

In SQL Server, strings can typically be aggregated using the following methods:

1.1 FOR XML PATH

This method uses XML aggregation to merge multiple rows into a single string.

SELECT Name
FROM Employees
FOR XML PATH('');
Enter fullscreen mode Exit fullscreen mode

1.2 STRING_AGG

Introduced in SQL Server 2017, this new aggregate function simplifies the string aggregation process and supports specifying delimiters.

SELECT STRING_AGG(Name, ', ') AS Employees
FROM Employees;
Enter fullscreen mode Exit fullscreen mode

2. Performance Comparison and Testing

2.1 Test Environment

  • Database Version: SQL Server 2019
  • Table Name: order_details (5 million records, including order_id for grouping)

2.2 Performance Testing Methods

2.2.1 Data Preparation

A table with 5 million records was generated, structured as follows:

CREATE TABLE order_details (
    order_id INT NOT NULL, -- Order number (1-10,000 groups)
    product_name NVARCHAR(50), -- Product name (randomly generated)
    quantity INT -- Purchase quantity (1-10 random values)
);
Enter fullscreen mode Exit fullscreen mode

2.2.2 Test Cases

  1. Small-scale Aggregation: 10 rows per group
  2. Medium-scale Aggregation: 500 rows per group
  3. Large-scale Aggregation: 5,000 rows per group

The goal is to concatenate product_name into a comma-separated string for each group.

2.2.3 Test Statements

-- FOR XML PATH solution
SELECT order_id, STUFF((SELECT ',' + product_name FROM order_details t2 WHERE t1.order_id = t2.order_id FOR XML PATH('')), 1, 1, '') AS products
FROM order_details t1
GROUP BY order_id;

-- STRING_AGG solution (SQL Server 2017+)
SELECT order_id, STRING_AGG(product_name, ',') AS products
FROM order_details
GROUP BY order_id;
Enter fullscreen mode Exit fullscreen mode

2.2.4 Execution Process

Each test case was executed 3 times, with cache cleared before taking the average value. The complete metrics were captured using SET STATISTICS IO, TIME ON.

3. Performance Test Results

Aggregation Scale FOR XML PATH STRING_AGG Performance Improvement
10 rows/group 320 ms 285 ms 12%
500 rows/group 1,850 ms 1,120 ms 40%
5,000 rows/group 18,200 ms 6,740 ms 63%

4. Performance Analysis

4.1 Small-scale Aggregation

For smaller aggregation tasks, both methods perform similarly, with STRING_AGG slightly outperforming FOR XML PATH. However, the performance difference is minimal and unlikely to be a bottleneck.

4.2 Medium-scale Aggregation

As the scale increases, FOR XML PATH's performance significantly decreases due to the overhead of generating XML and handling escape characters.

4.3 Large-scale Aggregation

In large-scale aggregation scenarios, FOR XML PATH's performance issues become even more pronounced. It requires handling large data volumes and converting them to XML, which impacts query efficiency.

5. Performance Reasons

5.1 FOR XML PATH Bottlenecks

  • FOR XML PATH relies on the XML engine, requiring data conversion to XML format and additional processing for escape characters.
  • Extra operations such as removing escape characters, sorting, and XML tag handling increase computational overhead as data volume grows.

5.2 STRING_AGG Advantages

  • STRING_AGG, introduced in SQL Server 2017, is specifically designed for string aggregation with optimized underlying processing, reducing extra computation and conversion steps.
  • Since it does not depend on the XML engine, it is more efficient when handling large datasets.

6. Optimization Suggestions and Best Practices

  1. Prefer STRING_AGG: If using SQL Server 2017 or later, STRING_AGG is recommended for its superior performance and cleaner syntax, especially for large datasets.
  2. Avoid FOR XML PATH: If your application does not require compatibility with earlier SQL Server versions (e.g., 2016 or earlier), avoid using FOR XML PATH due to its inferior performance with larger datasets.
  3. Index Optimization: Ensure proper indexing on the grouping fields (e.g., order_id) to improve query execution efficiency and reduce full table scans.
  4. Batch Processing for Large Datasets: Consider batch processing for very large datasets, aggregating data in pages and concatenating results at the application level to avoid querying excessive data at once.

7. SQLFLASH Optimization Demo

In complex queries, outdated syntax like FOR XML PATH can still exist and impact performance. Tools like SQLFLASH can automate optimization, especially when converting FOR XML PATH to the more efficient STRING_AGG syntax, significantly reducing execution time.

SQLFlash Demo

Example:

-- Original FOR XML PATH query
SELECT order_id, STUFF((SELECT ',' + product_name FROM order_details t2 WHERE t1.order_id = t2.order_id FOR XML PATH('')), 1, 1, '') AS products
FROM order_details t1
GROUP BY order_id;

-- Optimized with SQLFLASH to STRING_AGG
SELECT order_id, STRING_AGG(product_name, ',') AS products
FROM order_details t1
GROUP BY order_id;
Enter fullscreen mode Exit fullscreen mode

8. Summary

In SQL Server, STRING_AGG is the optimal method for string aggregation, particularly for SQL Server 2017 and later versions. For large-scale data processing, STRING_AGG outperforms FOR XML PATH with higher performance and cleaner syntax. Developers should prioritize STRING_AGG and avoid the outdated FOR XML PATH, while ensuring proper indexing to further optimize performance.

This article compares FOR XML PATH and STRING_AGG in SQL Server, highlighting STRING_AGG's superior performance for string aggregation tasks.

Original quote:https://sqlflash.ai/blog/sql-server-string-aggregation-performance-comparison-optimization/

Top comments (0)