Yogesh Chauhan's Blog

DISTINCT ON: The confusing, unique and useful feature in Postgres

in Postgres on April 19, 2020

We saw how DISTINCT works at the end of this post: Select Statement In Postgres With Examples

Let’s explore it further.

When I saw DISTINCT ON, I was like, there must not be anything new about it, you know, just another similar kind of feature with a different name. But I was wrong! It seems very powerful feature to me at least!

What I did was nothing special. Googled it and went through tons of articles. Most of them are filled with the official documentation but not the simple explanation and some of them are with nice decent explanation. I couldn’t understand most of it, to be frank. I was wondering why the feature is so hard to get hold of! So, I tried to play with it and find out more about it. Before I show some of my findings let’s just go through some crappy theory first!

🤓 As per the official documentation, SELECT DISTINCT ON ( expression [, …] ) keeps only the first row of each set of rows where the given expressions evaluate to equal.

In simple terms, if we use DISTINCT ON then it will give us the first result from the set of results which are grouped together.

Now the question is, when did we group them? We just simply use DISTINCT ON. Well, with DISTINCT ON, we just want PostgreSQL to return a single row for each distinct group defined by the ON clause.

🤓 The DISTINCT ON clause will only return the first row based on the DISTINCT ON(column) and ORDER BY clause provided in the query. For other columns, it will return the corresponding values. Basically, it LIMITs 1 by default when we use DISTINCT ON.

The DISTINCT ON expressions are interpreted using the same rules as for ORDER BY. That means we can decide what row we want in our results- but by only ascending and descending the columns.

Note that the “first row” of each set is unpredictable unless ORDER BY is used to ensure that the desired row appears first. That means the results will be different if we don’t use ORDER BY as Postgres is not smart enough to know our minds! The results will not be in order basically. For example,


SELECT DISTINCT ON (location) location, time, report
FROM weather_reports;

Above example is from official documentation. Now, we are not adding ORDER BY clause in it and that’s why we don’t know which of the rows will be selected. If we add, ORDER BY like the following example, we can be sure of a specific row.


SELECT DISTINCT ON (location) location, time, report
FROM weather_reports
ORDER BY location, time DESC;

The query retrieves the most recent weather report for each location.

Now, let’s just take a few more examples.

Is there any difference between the results of DISTINCT and DISTINCT ON queries?

Yes, there is. That’s why they have both! Take a look at the queries and results below.

I am using this database. https://github.com/ydchauh/yogeshchauhan.com-public

Simple DISTINCT query:


SELECT DISTINCT country, contact_name, company_name 
FROM customers;

Output:

distinct-on-query-output-yogesh-chauhan-1
distinct-on-query-output

DISTINCT ON query:


SELECT DISTINCT ON (country) contact_name, country, company_name 
FROM customers;

Output:

distinct-on-query-output-yogesh-chauhan-2
distinct-on-query-output
distinct-on-query-output-yogesh-chauhan-3
distinct-on-query-output

What about adding ORDER BY?


SELECT DISTINCT ON (country) contact_name, country, company_name 
FROM customers 
ORDER BY country;

Output:

distinct-on-query-output-yogesh-chauhan-4
distinct-on-query-output

There is no difference between the outputs from the previous and the current query. That’s because Postgres is showing results in ASC (Ascending) order by default.

Let’s try DSC (Descending) order.


SELECT DISTINCT ON (country) contact_name, country, company_name 
FROM customers 
ORDER BY country DESC;

Output:

distinct-on-query-output-yogesh-chauhan-5
distinct-on-query-output

As we can see, the results are completely different. So, when we use the DISTINCT ON, we need to be careful what we want in our results.

Now, by this point, we know that it acts like GROUP BY.

Then, why do we need GROUP BY in Postgres?

Well, let’s understand that by queries.


SELECT country, contact_name, company_name 
FROM customers 
GROUP BY country;

The query above will raise an error as we are not using COUNT or any other aggregate functions as well as we are not adding all the columns to the GROUP BY clause. So, one of those options we need to choose. Either add aggregate function or add all columns to the GROUP BY clause.

The screenshot of the error:

distinct-on-query-output-yogesh-chauhan-6
distinct-on-query-output

So, if we add all the columns to the GROUP BY clause then we will basically get the same results as just simple DISTINCT query.


SELECT country, contact_name, company_name 
FROM customers 
GROUP BY country, contact_name, company_name;

Output:

distinct-on-query-output-yogesh-chauhan-7
distinct-on-query-output

We can not add an aggregate function to just one column. Take a look at the query and results below.


SELECT country, COUNT(contact_name), company_name 
FROM customers 
GROUP BY country;
distinct-on-query-output-yogesh-chauhan-8
distinct-on-query-output

So, we need to write down query like this:

SELECT country, COUNT(contact_name), COUNT(company_name) FROM customers GROUP BY country;


SELECT country, COUNT(contact_name), COUNT(company_name) 
FROM customers 
GROUP BY country;

Output:

distinct-on-query-output-yogesh-chauhan-9
distinct-on-query-output

So, as we can see in the screenshot above, we are getting the number of rows using GROUP BY but if we want the first result from all those groups, we need to use DISTINCT ON and that’s why I consider it a powerful feature of Postgres.

Credit: Postgresql Docs


Most Read

#1 How to check if radio button is checked or not using JavaScript? #2 Solution to “TypeError: ‘x’ is not iterable” in Angular 9 #3 How to add Read More Read Less Button using JavaScript? #4 How to uninstall Cocoapods from the Mac OS? #5 PHP Login System using PDO Part 1: Create User Registration Page #6 How to Use SQL MAX() Function with Dates?

Recently Posted

#Aug 15 Is PHP still good for back-end programming? #Aug 10 How to create a multisite network in WordPress? #Aug 3 How to create a circle that follows a cursor using JavaScript and CSS? #Aug 3 How to make a curtain slider using jQuery and CSS? #Aug 2 How to progressively load images and add a blurry placeholder? #Aug 1 How to create a placeholder loader (throbber) using CSS?
You might also like these
How to check if a link has http or https in it in JavaScript?JavaScriptHow to verify your domain in Google Console with a TXT record?MiscellaneousKeyValuePipe in Angular 9AngularFile System Integrity: How to Keep an Eye on Your Files and Folder Change?MiscellaneousAlways add associated labels to your Form elementsUI/UXWhat is XP (Extreme Programming) and When should it be used?Miscellaneous