Yogesh Chauhan's Blog

DISTINCT ON: The confusing, unique and useful feature in Postgres

in Postgres on April 19, 2020

We saw how DISTINCT works at the end of this post: Select Statement In Postgres With Examples

Let’s explore it further.

When I saw DISTINCT ON, I was like, there must not be anything new about it, you know, just another similar kind of feature with a different name. But I was wrong! It seems very powerful feature to me at least!

What I did was nothing special. Googled it and went through tons of articles. Most of them are filled with the official documentation but not the simple explanation and some of them are with nice decent explanation. I couldn’t understand most of it, to be frank. I was wondering why the feature is so hard to get hold of! So, I tried to play with it and find out more about it. Before I show some of my findings let’s just go through some crappy theory first!

🤓 As per the official documentation, SELECT DISTINCT ON ( expression [, …] ) keeps only the first row of each set of rows where the given expressions evaluate to equal.

In simple terms, if we use DISTINCT ON then it will give us the first result from the set of results which are grouped together.

Now the question is, when did we group them? We just simply use DISTINCT ON. Well, with DISTINCT ON, we just want PostgreSQL to return a single row for each distinct group defined by the ON clause.

🤓 The DISTINCT ON clause will only return the first row based on the DISTINCT ON(column) and ORDER BY clause provided in the query. For other columns, it will return the corresponding values. Basically, it LIMITs 1 by default when we use DISTINCT ON.

The DISTINCT ON expressions are interpreted using the same rules as for ORDER BY. That means we can decide what row we want in our results- but by only ascending and descending the columns.

Note that the “first row” of each set is unpredictable unless ORDER BY is used to ensure that the desired row appears first. That means the results will be different if we don’t use ORDER BY as Postgres is not smart enough to know our minds! The results will not be in order basically. For example,


SELECT DISTINCT ON (location) location, time, report
FROM weather_reports;

Above example is from official documentation. Now, we are not adding ORDER BY clause in it and that’s why we don’t know which of the rows will be selected. If we add, ORDER BY like the following example, we can be sure of a specific row.


SELECT DISTINCT ON (location) location, time, report
FROM weather_reports
ORDER BY location, time DESC;

The query retrieves the most recent weather report for each location.

Now, let’s just take a few more examples.

Is there any difference between the results of DISTINCT and DISTINCT ON queries?

Yes, there is. That’s why they have both! Take a look at the queries and results below.

I am using this database. https://github.com/ydchauh/yogeshchauhan.com-public

Simple DISTINCT query:


SELECT DISTINCT country, contact_name, company_name 
FROM customers;

Output:

distinct-on-query-output-yogesh-chauhan-1
distinct-on-query-output

DISTINCT ON query:


SELECT DISTINCT ON (country) contact_name, country, company_name 
FROM customers;

Output:

distinct-on-query-output-yogesh-chauhan-2
distinct-on-query-output
distinct-on-query-output-yogesh-chauhan-3
distinct-on-query-output

What about adding ORDER BY?


SELECT DISTINCT ON (country) contact_name, country, company_name 
FROM customers 
ORDER BY country;

Output:

distinct-on-query-output-yogesh-chauhan-4
distinct-on-query-output

There is no difference between the outputs from the previous and the current query. That’s because Postgres is showing results in ASC (Ascending) order by default.

Let’s try DSC (Descending) order.


SELECT DISTINCT ON (country) contact_name, country, company_name 
FROM customers 
ORDER BY country DESC;

Output:

distinct-on-query-output-yogesh-chauhan-5
distinct-on-query-output

As we can see, the results are completely different. So, when we use the DISTINCT ON, we need to be careful what we want in our results.

Now, by this point, we know that it acts like GROUP BY.

Then, why do we need GROUP BY in Postgres?

Well, let’s understand that by queries.


SELECT country, contact_name, company_name 
FROM customers 
GROUP BY country;

The query above will raise an error as we are not using COUNT or any other aggregate functions as well as we are not adding all the columns to the GROUP BY clause. So, one of those options we need to choose. Either add aggregate function or add all columns to the GROUP BY clause.

The screenshot of the error:

distinct-on-query-output-yogesh-chauhan-6
distinct-on-query-output

So, if we add all the columns to the GROUP BY clause then we will basically get the same results as just simple DISTINCT query.


SELECT country, contact_name, company_name 
FROM customers 
GROUP BY country, contact_name, company_name;

Output:

distinct-on-query-output-yogesh-chauhan-7
distinct-on-query-output

We can not add an aggregate function to just one column. Take a look at the query and results below.


SELECT country, COUNT(contact_name), company_name 
FROM customers 
GROUP BY country;
distinct-on-query-output-yogesh-chauhan-8
distinct-on-query-output

So, we need to write down query like this:

SELECT country, COUNT(contact_name), COUNT(company_name) FROM customers GROUP BY country;


SELECT country, COUNT(contact_name), COUNT(company_name) 
FROM customers 
GROUP BY country;

Output:

distinct-on-query-output-yogesh-chauhan-9
distinct-on-query-output

So, as we can see in the screenshot above, we are getting the number of rows using GROUP BY but if we want the first result from all those groups, we need to use DISTINCT ON and that’s why I consider it a powerful feature of Postgres.

Credit: Postgresql Docs


Most Read

#1 Solution to the error “Visual Studio Code can’t be opened because Apple cannot check it for malicious software” #2 How to add Read More Read Less Button using JavaScript? #3 How to check if radio button is checked or not using JavaScript? #4 Solution to “TypeError: ‘x’ is not iterable” in Angular 9 #5 PHP Login System using PDO Part 1: Create User Registration Page #6 How to uninstall Cocoapods from the Mac OS?

Recently Posted

#Apr 8 JSON.stringify() in JavaScript #Apr 7 Middleware in NextJS #Jan 17 4 advanced ways to search Colleague #Jan 16 Colleague UI Basics: The Search Area #Jan 16 Colleague UI Basics: The Context Area #Jan 16 Colleague UI Basics: Accessing the user interface
You might also like these
JavaScript arrays: a separate data type or Objects?JavaScriptWindow innerHeight and innerWidth properties in JavaScriptJavaScriptHow to delete a local and a remote Git branches?MiscellaneousWhat happens when we add Numbers and Strings in JavaScript?JavaScriptThe basics of @import rule in SCSS (Sass)SCSSFunction Scope in JavaScript for BeginnersJavaScript