Yogesh Chauhan's Blog

Recursive WITH Queries in Postgres (Common Table Expressions)

in Postgres on June 20, 2020

SELECT in WITH RECURSIVE

We saw the use of WITH in Common Table Expressions in this post: Common Table Expressions (CTE) In PostgreSQL

A CTE is essentially a temporary, named result set that exists only for the duration of the query in which it is defined.

The optional RECURSIVE modifier changes WITH from a mere syntactic convenience into a feature that accomplishes things not otherwise possible in standard SQL.

Using RECURSIVE, a WITH query can refer to its own output. A very simple example is this query to sum the integers from 1 through 100:


WITH RECURSIVE t(n) AS (
    VALUES (1)
  UNION ALL
    SELECT n+1 FROM t WHERE n < 100
)
SELECT sum(n) FROM t;

//Output

5050

In the example above, the working table has just a single row in each step, and it takes on the values from 1 through 100 in successive steps. In the 100th step, there is no output because of the WHERE clause, and so the query terminates.

From the query above, we can generalize the syntax:


WITH RECURSIVE CTE_name AS(
    CTE_query -- non-recursive
    UNION [ALL]
    CTE_query  -- recursive
) SELECT * FROM CTE_name;

The general form of a recursive WITH query is always a non-recursive term, then UNION (or UNION ALL), then a recursive term, where only the recursive term can contain a reference to the query’s own output.

Such a query is executed as follows:

Recursive Query Evaluation

1. Evaluate the non-recursive term. For UNION (but not UNION ALL), discard duplicate rows. Include all remaining rows in the result of the recursive query, and also place them in a temporary working table.

2. So long as the working table is not empty, repeat these steps:

   a. Evaluate the recursive term, substituting the current contents of the working table for the recursive self-reference. For UNION (but not UNION ALL), discard duplicate rows and rows that duplicate any previous result row. Include all remaining rows in the result of the recursive query, and also place them in a temporary intermediate table.

   b. Replace the contents of the working table with the contents of the intermediate table, then empty the intermediate table.
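To make the evaluation steps concrete, here is a minimal sketch that carries a step counter along with each row. The CTE name walk and its columns are made up for illustration. Step 0 comes from the non-recursive term, and each later step is computed from the single row left in the working table by the previous step.

```sql
WITH RECURSIVE walk(step, n) AS (
    -- non-recursive term: seeds the working table with one row
    VALUES (0, 1)
  UNION ALL
    -- recursive term: reads only the rows produced by the previous step
    SELECT step + 1, n * 2 FROM walk WHERE step < 4
)
SELECT step, n FROM walk ORDER BY step;

-- step | n
-- -----+----
--    0 |  1
--    1 |  2
--    2 |  4
--    3 |  8
--    4 | 16
```

When the working table holds the row (4, 16), the WHERE clause filters it out, the intermediate table stays empty, and the loop ends.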

Example

Recursive queries are typically used to deal with hierarchical or tree-structured data.

I am using this database for the next example, which is available in my public GitHub repo.

The following query returns the list of all the employees who report, directly or indirectly, to the employee with id 5.


WITH RECURSIVE employee_list AS (
    -- non-recursive term: direct reports of employee 5
    SELECT
        employee_id,
        reports_to,
        first_name,
        title
    FROM
        employees
    WHERE
        reports_to = 5
    UNION
    -- recursive term: anyone who reports to an employee already in the list
    SELECT
        e.employee_id,
        e.reports_to,
        e.first_name,
        e.title
    FROM
        employees e
    INNER JOIN employee_list s ON s.employee_id = e.reports_to
)
SELECT
    *
FROM
    employee_list;

//Output

employee_id   reports_to   first_name   title
6             5            "Michael"    "Sales Representative"
7             5            "Robert"     "Sales Representative"
9             5            "Anne"       "Sales Representative"

End of results
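In the output above, every row is a direct report, so the recursive term never adds anything. To see the recursion descend more than one level, here is a self-contained sketch. The table employees_demo and its rows are hypothetical sample data, not the table from the repo, and the level column is an addition that records how far each employee is from the starting manager.

```sql
-- Hypothetical sample data; the real employees table may look different.
CREATE TEMP TABLE employees_demo (
    employee_id integer PRIMARY KEY,
    reports_to  integer,
    first_name  text
);

INSERT INTO employees_demo VALUES
    (1, NULL, 'Ann'),
    (2, 1,    'Bob'),
    (3, 2,    'Cid'),
    (4, 2,    'Dee'),
    (5, 3,    'Eve');

WITH RECURSIVE subordinates AS (
    -- non-recursive term: direct reports of employee 2
    SELECT employee_id, reports_to, first_name, 1 AS level
    FROM employees_demo
    WHERE reports_to = 2
  UNION ALL
    -- recursive term: reports of anyone found in the previous step
    SELECT e.employee_id, e.reports_to, e.first_name, s.level + 1
    FROM employees_demo e
    INNER JOIN subordinates s ON s.employee_id = e.reports_to
)
SELECT employee_id, first_name, level
FROM subordinates
ORDER BY level, employee_id;

-- employee_id | first_name | level
-- ------------+------------+------
--           3 | Cid        |     1
--           4 | Dee        |     1
--           5 | Eve        |     2
```

Eve does not report to employee 2 directly, but she is picked up in the second step because her manager (Cid) was found in the first.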

When working with recursive queries it is important to be sure that the recursive part of the query will eventually return no tuples, or else the query will loop indefinitely.

Sometimes, using UNION instead of UNION ALL can accomplish this by discarding rows that duplicate previous output rows.
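As a small sketch of that effect, the query below cycles through the values 1, 2, 3. The modulo expression is only an illustration. With UNION ALL it would never stop, but with UNION the row that repeats an earlier result is discarded, the working table becomes empty, and the query terminates.

```sql
WITH RECURSIVE t(n) AS (
    VALUES (1)
  UNION
    -- produces 2, then 3, then 1 again; the repeated 1 is discarded
    SELECT (n % 3) + 1 FROM t
)
SELECT n FROM t ORDER BY n;

-- n
-- ---
-- 1
-- 2
-- 3
```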

However, often a cycle does not involve output rows that are completely duplicate: it may be necessary to check just one or a few fields to see if the same point has been reached before.

The standard method for handling such situations is to compute an array of the already-visited values. 

For example, consider the following query that searches a table graph using a link field:


WITH RECURSIVE search_graph(id, link, data, depth) AS (
        SELECT g.id, g.link, g.data, 1
        FROM graph g
      UNION ALL
        SELECT g.id, g.link, g.data, sg.depth + 1
        FROM graph g, search_graph sg
        WHERE g.id = sg.link
)
SELECT * FROM search_graph;

This query will loop if the link relationships contain cycles.

Because we require a “depth” output, just changing UNION ALL to UNION would not eliminate the looping.

Instead we need to recognize whether we have reached the same row again while following a particular path of links.

We add two columns, path and cycle, to the loop-prone query:


WITH RECURSIVE search_graph(id, link, data, depth, path, cycle) AS (
        SELECT g.id, g.link, g.data, 1,
          ARRAY[g.id],
          false
        FROM graph g
      UNION ALL
        SELECT g.id, g.link, g.data, sg.depth + 1,
          path || g.id,
          g.id = ANY(path)
        FROM graph g, search_graph sg
        WHERE g.id = sg.link AND NOT cycle
)
SELECT * FROM search_graph;

Aside from preventing cycles, the array value is often useful in its own right as representing the “path” taken to reach any particular row.
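To try the technique out, here is a self-contained sketch on a three-row graph that links back to its start. The table graph_demo and its data are hypothetical, the search is restricted to start from id 1 to keep the output short, and the flag column is named is_cycle here to avoid confusion with the CYCLE keyword in newer PostgreSQL versions.

```sql
-- Hypothetical sample data: 1 -> 2 -> 3 -> 1 forms a cycle.
CREATE TEMP TABLE graph_demo (id integer, link integer, data text);
INSERT INTO graph_demo VALUES (1, 2, 'a'), (2, 3, 'b'), (3, 1, 'c');

WITH RECURSIVE search_graph(id, link, data, depth, path, is_cycle) AS (
    SELECT g.id, g.link, g.data, 1,
           ARRAY[g.id],
           false
    FROM graph_demo g
    WHERE g.id = 1
  UNION ALL
    SELECT g.id, g.link, g.data, sg.depth + 1,
           sg.path || g.id,        -- extend the path with the new node
           g.id = ANY(sg.path)     -- true if this node was already visited
    FROM graph_demo g
    JOIN search_graph sg ON g.id = sg.link
    WHERE NOT sg.is_cycle          -- do not expand rows that closed a cycle
)
SELECT id, depth, path, is_cycle
FROM search_graph
ORDER BY depth;

-- id | depth | path      | is_cycle
-- ---+-------+-----------+---------
--  1 |     1 | {1}       | f
--  2 |     2 | {1,2}     | f
--  3 |     3 | {1,2,3}   | f
--  1 |     4 | {1,2,3,1} | t
```

The row that closes the cycle is still returned, flagged with is_cycle = true, but it is not expanded any further, so the recursion stops.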

In the general case where more than one field needs to be checked to recognize a cycle, use an array of rows.

For example, if we needed to compare fields f1 and f2:


WITH RECURSIVE search_graph(id, link, data, depth, path, cycle) AS (
        SELECT g.id, g.link, g.data, 1,
          ARRAY[ROW(g.f1, g.f2)],
          false
        FROM graph g
      UNION ALL
        SELECT g.id, g.link, g.data, sg.depth + 1,
          path || ROW(g.f1, g.f2),
          ROW(g.f1, g.f2) = ANY(path)
        FROM graph g, search_graph sg
        WHERE g.id = sg.link AND NOT cycle
)
SELECT * FROM search_graph;

Tip: Omit the ROW() syntax in the common case where only one field needs to be checked to recognize a cycle. This allows a simple array rather than a composite-type array to be used, gaining efficiency.

Tip: The recursive query evaluation algorithm produces its output in breadth-first search order. You can display the results in depth-first search order by making the outer query ORDER BY a “path” column constructed in this way.
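A minimal sketch of the depth-first ordering, assuming a hypothetical tree_demo table where each row points at its parent: PostgreSQL compares arrays element by element, so sorting by path lists each node right after its ancestors.

```sql
-- Hypothetical sample tree: 1 is the root, 2 and 3 are its children,
-- 4 is a child of 2, and 5 is a child of 3.
CREATE TEMP TABLE tree_demo (id integer, parent_id integer);
INSERT INTO tree_demo VALUES (1, NULL), (2, 1), (3, 1), (4, 2), (5, 3);

WITH RECURSIVE walk(id, path) AS (
    SELECT id, ARRAY[id]
    FROM tree_demo
    WHERE parent_id IS NULL
  UNION ALL
    SELECT t.id, w.path || t.id
    FROM tree_demo t
    JOIN walk w ON t.parent_id = w.id
)
SELECT id, path FROM walk ORDER BY path;

-- id | path
-- ---+--------
--  1 | {1}
--  2 | {1,2}
--  4 | {1,2,4}
--  3 | {1,3}
--  5 | {1,3,5}
```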

A helpful trick for testing queries when you are not certain if they might loop is to place a LIMIT in the parent query.

For example, this query would loop forever without the LIMIT:


WITH RECURSIVE t(n) AS (
    SELECT 1
  UNION ALL
    SELECT n+1 FROM t
)
SELECT n FROM t LIMIT 100;

This works because PostgreSQL’s implementation evaluates only as many rows of a WITH query as are actually fetched by the parent query.

Using this trick in production is not recommended, because other systems might work differently.

Also, it usually won’t work if you make the outer query sort the recursive query’s results or join them to some other table, because in such cases the outer query will usually try to fetch all of the WITH query’s output anyway.

A useful property of WITH queries is that they are normally evaluated only once per execution of the parent query, even if they are referred to more than once by the parent query or sibling WITH queries.

Thus, expensive calculations that are needed in multiple places can be placed within a WITH query to avoid redundant work.

Another possible application is to prevent unwanted multiple evaluations of functions with side-effects.

However, the other side of this coin is that the optimizer cannot push restrictions from the parent query down into a WITH query that is evaluated as a separate step, as it can with an ordinary sub-query.

Such a WITH query will generally be evaluated as written, without suppression of rows that the parent query might discard afterwards. (But, as mentioned above, evaluation might stop early if the reference(s) to the query demand only a limited number of rows.) Note that since PostgreSQL 12 this applies mainly to WITH queries that are recursive, have side-effects, or are referenced more than once: a non-recursive, side-effect-free WITH query that is referenced only once is folded into the parent query by default.
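In PostgreSQL 12 and later, the MATERIALIZED and NOT MATERIALIZED keywords let you choose between these two behaviours explicitly. The sketch below assumes a hypothetical table big_demo, and it only shows the syntax. Which plan is actually faster depends on your data, so compare them with EXPLAIN.

```sql
-- Hypothetical sample table with 1000 rows.
CREATE TEMP TABLE big_demo AS
SELECT g AS id, md5(g::text) AS payload
FROM generate_series(1, 1000) AS g;

-- Evaluate the WITH query once, as written; the filter on id is applied
-- afterwards to the stored result.
WITH w AS MATERIALIZED (
    SELECT * FROM big_demo
)
SELECT * FROM w WHERE id = 123;

-- Allow the planner to fold the WITH query into the parent query, so the
-- filter on id can be pushed down to big_demo.
WITH w AS NOT MATERIALIZED (
    SELECT * FROM big_demo
)
SELECT * FROM w WHERE id = 123;
```

Both statements return the same single row. Only the way the planner is allowed to execute them differs.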

The examples above only show WITH being used with SELECT, but it can be attached in the same way to INSERT, UPDATE, or DELETE.

In each case it effectively provides temporary table(s) that can be referred to in the main command.
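As a sketch of WITH attached to a DELETE, the statement below removes a part and everything it is built from. The table parts_demo and its rows are hypothetical sample data.

```sql
-- Hypothetical sample data: each row says that part contains sub_part.
CREATE TEMP TABLE parts_demo (part text, sub_part text);
INSERT INTO parts_demo VALUES
    ('car',    'engine'),
    ('engine', 'piston'),
    ('piston', 'ring'),
    ('car',    'wheel'),
    ('bike',   'frame');

WITH RECURSIVE included_parts(sub_part, part) AS (
    -- non-recursive term: what the engine is made of
    SELECT sub_part, part FROM parts_demo WHERE part = 'engine'
  UNION ALL
    -- recursive term: what those sub-parts are made of, and so on
    SELECT p.sub_part, p.part
    FROM included_parts pr
    JOIN parts_demo p ON p.part = pr.sub_part
)
DELETE FROM parts_demo
WHERE part IN (SELECT part FROM included_parts);

-- The rows for 'engine' and 'piston' are gone; three rows remain.
SELECT part, sub_part FROM parts_demo ORDER BY part, sub_part;
```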

