The PIVOT function in SQL Server works similarly to a pivot table in Excel, allowing you to transform rows into columns for easier data analysis and reporting. It’s commonly used for building reports, summarizing data by categories, and displaying it in a more readable format. Unlike Excel, where data is processed locally, SQL Server processes the data on the database engine side, making report generation much faster and more efficient, especially when working with large datasets.
Data preparation and expecting result from this blog post
Before diving into the specifics of pivoting, we first need to set up the data we’ll be working with. For the purpose of this article, let’s assume we’re working with a simple Sales table:
CREATE TABLE Sales
(
SalesPerson NVARCHAR(50),
SalesDate DATETIME,
SalesAmount DECIMAL(10, 2)
);
We’ll populate this table with some randomly generated sales data, simulating entries for multiple salespeople across several years:
DECLARE @i INT = 0;
WHILE @i < 10000
BEGIN
INSERT INTO Sales (SalesPerson, SalesDate, SalesAmount)
VALUES
(
-- Random SalesPerson
(CASE
WHEN RAND() < 0.2 THEN 'Alice'
WHEN RAND() BETWEEN 0.2 AND 0.4 THEN 'Bob'
WHEN RAND() BETWEEN 0.4 AND 0.6 THEN 'Charlie'
WHEN RAND() BETWEEN 0.6 AND 0.8 THEN 'David'
ELSE 'Eve'
END),
-- Random date in last years
DATEADD(DAY, ABS(CHECKSUM(NEWID())) % 1460, '2020-01-01'),
-- Random sales amount between 1000 and 10000
FLOOR(RAND() * 9000) + 1000
);
SET @i = @i + 1;
END;
This will generate a dataset with random salespeople, sales dates spanning multiple years, and varying sales amounts as in example below:
SalesPerson | SalesDate | SalesAmount |
---|---|---|
Alice | 2022-01-15 | 5000.00 |
Bob | 2021-06-30 | 7500.00 |
Bob | 2023-03-25 | 9000.00 |
Our goal is to create a sales report by year that looks like this:
SalesPerson | 2020 | 2021 | 2022 | 2023 |
---|---|---|---|---|
Alice | 5000 | 4000 | 8000 | 2000 |
Bob | 7000 | 5500 | 6200 | 3100 |
By the end of this article, you will have a solid understanding of how to manipulate data to achieve expected report.
Static Pivoting in SQL Server
When working with well-defined datasets where the categories or columns are fixed and known ahead of time, static pivoting are more straightforward and efficient. In these cases, you can hardcode the column names directly in the query, which makes the SQL simpler to write and easier to maintain when the structure of the data doesn’t change frequently.
PIVOT
Function – static query
The following query uses the sql server PIVOT function to display sales amounts per year:
SELECT SalesPerson, [2020], [2021], [2022], [2023]
FROM
(
SELECT SalesPerson, DATEPART(YEAR, SalesDate) as Year, SalesAmount
FROM Sales
) AS SourceTable
PIVOT
(
SUM(SalesAmount)
FOR Year IN ([2020], [2021], [2022], [2023])
) AS PivotTable;
Here’s an explanation of what’s happening:
- Line 1: Specifies the report’s column headers (years as columns).
- Lines 2-6: This subquery gathers the source data for the PIVOT table. You can join, aggregate, or manipulate data here as long as you return a valid result set.
- Line 7-8: The PIVOT syntax.
- Line 9: Defines what we want to calculate, in this case, the sum of
SalesAmount
. - Line 10: Specifies how to distribute the data – in our case, across four columns representing the years of sales.
Pros and Cons of the PIVOT
Function
Pros:
- Efficient Report Generation: Since the sql server pivot function processes data on the server side, it is highly efficient for generating reports, particularly when dealing with large datasets, compared to tools like Excel.
- Readable Data Structure: It allows you to transform rows into columns, making the data easier to read and analyze, especially for summary reports.
Cons:
- Complex Syntax: The PIVOT function can be difficult to understand and use at first, particularly for beginners due to its somewhat unintuitive syntax.
- Slower Performance Compared to
CASE
: While useful for rotating data, PIVOT can be slower than usingCASE
statements, especially when handling large or complex queries.
SQL Server CASE WHEN
Statements with Grouping and Conditional Aggregation – static query
The following query achieves the same result as the PIVOT function but uses SUM
and CASE WHEN
for conditional aggregation:
SELECT
SalesPerson,
SUM(CASE WHEN DATEPART(YEAR, SalesDate) = 2020 THEN SalesAmount ELSE 0 END) AS [2020],
SUM(CASE WHEN DATEPART(YEAR, SalesDate) = 2021 THEN SalesAmount ELSE 0 END) AS [2021],
SUM(CASE WHEN DATEPART(YEAR, SalesDate) = 2022 THEN SalesAmount ELSE 0 END) AS [2022],
SUM(CASE WHEN DATEPART(YEAR, SalesDate) = 2023 THEN SalesAmount ELSE 0 END) AS [2023]
FROM Sales
GROUP BY SalesPerson;
At a glance, you can see that this query is much easier to understand, even for less experienced users. The key idea is in the data we pass to the SUM
function. We manipulate the data by checking if it belongs to a given year, and if so, we add the SalesAmount
. Otherwise, we simply add 0 to the overall sum.
Pros and Cons of SQL Server CASE WHEN
Statements with Grouping and Conditional Aggregation
Pros:
- More Readable for Beginners: The
SUM(CASE WHEN...)
syntax is generally easier to understand, especially for those new to SQL, as it uses standard SQL functions instead of the more complex PIVOT syntax. - Better Performance: In some cases,
SUM(CASE WHEN...)
can be faster than PIVOT, particularly with larger datasets or more complex queries, as it allows more granular control over the aggregation process.
Cons:
- Less Compact: The
SUM(CASE WHEN...)
approach can result in more verbose queries, especially when aggregating many columns, making the query harder to maintain compared to the more concise PIVOT syntax.
Dynamic Pivoting in SQL Server
It’s often necessary to create pivot tables where the number of columns isn’t fixed or known beforehand. For instance, when new categories or years are continuously added to your data, hardcoding column names as we did earlier, becomes impractical. This is where dynamic pivoting comes into play
PIVOT
Function – dynamic query
Dynamic pivoting allows you to generate a PIVOT table where the column names are determined at runtime. To achieve this, you need to use dynamic SQL, building the query dynamically based on the data present in your database.
Let’s modify our earlier scenario where we want to generate a sales report by year. Instead of hardcoding specific years ([2020], [2021], [2022], [2023])
, we’ll dynamically extract the distinct years from the Sales table and pivot the data accordingly.
DECLARE @cols NVARCHAR(MAX),
@query NVARCHAR(MAX);
-- Step 1: Generate a list of years dynamically
SELECT @cols = STRING_AGG(QUOTENAME([Year]), ', ')
FROM (SELECT DISTINCT DATEPART(YEAR, SalesDate) AS Year FROM Sales) AS Years;
-- Step 2: Construct the dynamic SQL query
SET @query = 'SELECT SalesPerson, ' + @cols + '
FROM
(
SELECT SalesPerson, DATEPART(YEAR, SalesDate) AS Year, SalesAmount
FROM Sales
) AS SourceTable
PIVOT
(
SUM(SalesAmount)
FOR Year IN (' + @cols + ')
) AS PivotTable;';
-- Step 3: Execute the dynamic SQL query
EXEC sp_executesql @query;
Here’s an explanation of what’s happening:
- Step 1: We use
STRING_AGG
to dynamically generate a comma-separated list of years (e.g.,[2020], [2021], [2022]
) from theSalesDate
column. This list is stored in the variable@cols
. - Step 2: We build the dynamic SQL query using the list of years stored in
@cols
variable. TheQUOTENAME
function ensures that the column names are properly escaped. - Step 3: Finally, the dynamically generated SQL query is executed using
sp_executesql
.
Benefits of Dynamic Pivoting
- Flexibility: The dynamic pivot allows you to handle datasets with changing or unknown column categories. As new years (or sales person, etc.) are added to the data, the query will automatically adapt, eliminating the need for manual query adjustments.
- Scalability: This approach is particularly useful when dealing with large datasets or reports that require frequent updates, making your query adaptable to the evolving structure of the data.
SQL Server CASE WHEN
Statements with Grouping and Conditional Aggregation – dynamic query
In cases where you want to use CASE WHEN
statements dynamically – similar to dynamic pivoting – you’ll need to generate the query at runtime, especially when the number of categories (e.g., years, sales person, etc.) is unknown or changing. Like the dynamic PIVOT example, this approach is useful for handling evolving datasets without hardcoding specific values in your SQL query.
DECLARE @cols NVARCHAR(MAX),
@query NVARCHAR(MAX);
-- Step 1: Generate a dynamic list of CASE WHEN conditions for each year
SELECT @cols = STRING_AGG('SUM(CASE WHEN DATEPART(YEAR, SalesDate) = ' +
CAST([Year] as nvarchar) +
' THEN SalesAmount ELSE 0 END) AS [' +
CAST([Year] AS NVARCHAR) + ']', ', ')
FROM (SELECT DISTINCT DATEPART(YEAR, SalesDate) AS Year FROM Sales) AS Years;
-- Step 2: Construct the dynamic SQL query
SET @query = 'SELECT SalesPerson, ' + @cols + '
FROM Sales
GROUP BY SalesPerson;';
-- Step 3: Execute the dynamic SQL query
EXEC sp_executesql @query;
Here’s an explanation of what’s happening:
- Step 1: We dynamically generate the
CASE WHEN
statements usingSTRING_AGG
. This function constructs a list of conditions for each distinct year in theSales
table. EachCASE WHEN
condition sums theSalesAmount
for the specific year and outputs it as a separate column. - Step 2: We combine the generated
CASE WHEN
conditions into the main SQL query. This query selects theSalesPerson
and uses the dynamically created aggregation columns. - Step 3: The query is executed using
sp_executesql
, which allows dynamic execution of the constructed SQL.
Benefits of Dynamic CASE WHEN
- Customization: The dynamic
CASE WHEN
approach offers fine control over the aggregation logic. You can add more complex conditions if necessary (e.g., filtering by product categories, regions, etc.). - Performance: While
CASE WHEN
can be faster than PIVOT in certain cases, the dynamic version maintains that performance benefit, making it useful when working with large datasets or when fine-grained control over the query is needed.
No responses yet