I will be using this database for all examples which is available on my Github public repo
PostgreSQL GROUPING SETS
More complex grouping operations are possible using the concept of grouping sets.
There are 3 steps.
1. The data selected by the FROM and WHERE clauses is grouped separately by each specified grouping set,
2. aggregates computed for each group just as for simple GROUP BY clauses,
3. and then the results returned.
I will go though each step in separate queries later on but for now, let's go through GROUPING SET syntax and example.
PostgreSQL GROUPING SET Syntax
SELECT
column_1,
column_2,
aggregate_function(column_3)
FROM
table
GROUP BY
GROUPING SETS (
(column_1, column_2),
(column_1),
(column_2),
()
);
The above syntax has 4 GROUPING SETS but we can add or remove as per our requirement.
PostgreSQL GROUPING SET Example
Note: References to the grouping columns or expressions are replaced by null values in result rows for grouping sets in which those columns do not appear.
SELECT
ship_city,
ship_country,
SUM (freight)
FROM
orders
GROUP BY
GROUPING SETS (
(ship_city, ship_country),
(ship_city),
(ship_country),
()
);
//Output
ship_city, ship_country sum
null null 64942.74
"Seattle" "USA" 1353.0599
"Kirkland" "USA" 70.009995
"Paris" "France" 108.28
"Genève" "Switzerland" 1001.2899
"Campinas" "Brazil" 322.38004
....
....
162 rows
I know that for the first time, it does seem confusing. But I am going to break it down in small chunks of query.
In this part we are just telling Postgres to display these 3 columns from the table order:
SELECT
ship_city,
ship_country,
SUM (freight)
FROM
orders
And then we are telling postgres to GROUP BY using different GROUPING SETS.
The first one is :
(ship_city, ship_country),
which is going to group by the query results using ship_city and ship_country and the next two will separately GROUP BY the query results using ship_city and ship_country.
The last one, where we don't specify anything in the brackets (), will returns the SUM of freight shipped to all cities and countries. It defines an empty grouping set.
Now, how GROUPING SETS make it easier to write the query?
Let's write down the query for the same results that we got using GROUPING SETS in the example above.
SELECT
NULL,
NULL,
SUM (freight)
FROM
orders
UNION ALL
SELECT
ship_city,
ship_country,
SUM (freight)
FROM
orders
GROUP BY
ship_city,
ship_country
UNION ALL
SELECT
ship_city,
NULL,
SUM (freight)
FROM
orders
GROUP BY
ship_city
UNION ALL
SELECT
NULL,
ship_country,
SUM (freight)
FROM
orders
GROUP BY
ship_country
//Output
ship_city, ship_country sum
null null 64942.74
"Seattle" "USA" 1353.0599
"Kirkland" "USA" 70.009995
"Paris" "France" 108.28
"Genève" "Switzerland" 1001.2899
"Campinas" "Brazil" 322.38004
....
....
162 rows
Let's break down this monstrous query.
Because the UNION ALL requires all result sets to have the same number of columns with compatible data types, you need to adjust the queries by adding NULL to the selection list of each.
In all the following queries you can remove the null. I have kept it to make it easy to understand
1st chunk
SELECT
NULL,
NULL,
SUM (freight)
FROM
orders
//Output
column? column? sum
null null 64942.74.
This query finds the total freight shipped to all cities and countries. It defines an empty grouping set.
2nd chunk
SELECT
ship_city,
ship_country,
SUM (freight)
FROM
orders
GROUP BY
ship_city,
ship_country
//Output
ship_city ship_country sum
"Seattle" "USA" 1353.0599
"Kirkland" "USA" 70.009995
"Paris" "France" 108.28
...
...
70 rows
This query defines a grouping set of the ship_city and ship_country. It returns the freight shipped to each city with it's country.
3rd chunk
SELECT
ship_city,
NULL,
SUM (freight)
FROM
orders
GROUP BY
ship_city
//Output
ship_city column? sum
"Butte" null 129.96
"Paris" null 108.28
"Cowes" null 363.65
...
...
70 rows
This query finds the freight shipped to ship_city. It defines a grouping set of the city.
4th chunk
SELECT
NULL,
ship_country,
SUM (freight)
FROM
orders
GROUP BY
ship_country
//Output
column? ship_country sum
null "Argentina" 598.58
null "Spain" 861.88995
null "Switzerland" 1368.5298
...
...
21 rows
This query finds the freight shipped to ship_country. It defines a grouping set of the country.
All those queries combined make the unified result set we saw earlier with the aggregated data for all grouping sets.
Summary
Even though the query gives us the result set we want, it is really really lengthy. And because of that it has a performance issue as PostgreSQL has to scan the orders table separately for each chunk of small queries.
That is why PostgreSQL has the GROUPING SETS which is the sub clause of the GROUP BY clause.
The GROUPING SETS allows you to define multiple grouping sets in the same query.
database GROUP BY GROUPING SETS sql clause sql query