Google+ Followers

Pagination of query results in Postgres

Marko's recent blog post on speeding up count(*) in Postgres sent me to his previous post where I saw a very valid reason from Darren Duncan for pagination of results from a database. The point being that web applications are usually expected to show the page links and allow the user to jump to different pages in the result.

One solution is to run the query twice, once with just a count(*) and then the actual query, but that'd be very inefficient.

A few experiments later I think I have a solution; a bit inefficient than running the plain query but much better than running the query twice.
with
REAL_QUERY as
 (select relname, oid from pg_class limit 5)
-- User should ignore the rest of the query
select *, show_rowcounter() as row_count
from (select REAL_QUERY.*, increment_rowcounter() as row_number
      from REAL_QUERY,
      (select reset_rowcounter()) as _c
      order by row_number
      offset 0) as _subq;

Don't get scared by the length of the query above, all one needs to do is put the real query between the first pair of parenthesis after the REAL_QUERY identifier.

The result would be the same as the original query, but with two additional columns: row_number and row_count. row_number numbers each row starting from 1, and row_count shows the total number of rows in the result.
.   relname    | oid  | row_number | row_count 
---------------+------+------------+-----------
 pg_statistic  | 2619 |          1 |         5
 pg_type       | 1247 |          2 |         5
 pg_attribute  | 1249 |          3 |         5
 pg_toast_1262 | 2844 |          4 |         5
 pg_toast_2604 | 2830 |          5 |         5
(5 rows)
The 'ORDER BY row_number' clause adds the overhead, but it is necessary so that the row_count is calculated before the first row is produced at the top level. I wish I could introduce a MATERIALIZE node in the query plan at will.

If your REAL_QUERY has an ORDER BY or GROUP BY clause then you can remove the ORDER BY row_number clause from the outer query, since Postgres will  make sure that show_rowcounter() will not be called before the last call of increment_rowcounter().

The above trick uses the Common-Table-Expression feature (introduced in Postgres version 8.4), because I wanted to make it look like a template where user's real query is visible at the top.

If you are running on an older version you can easily modify it to be a simple query because CTE used above is not recursive.

select *, show_rowcounter() as row_count
from (select REAL_QUERY.*, increment_rowcounter() as row_number
      from (select relname, oid from pg_class limit 5) as REAL_QUERY,
      (select reset_rowcounter()) as _c
      order by row_number
      offset 0) as _subq;
Now the guts of the trickery: PL/Perl functions:

-- PL/Perl
create or replace function reset_rowcounter() returns int as $p$
    $_SHARED{rowcounter} = 0;
$p$ language plperl stable;

create or replace function increment_rowcounter() returns int as $p$
    $_SHARED{rowcounter} = $_SHARED{rowcounter} + 1;
    return $_SHARED{rowcounter};
$p$ language plperl;

create or replace function show_rowcounter() returns int as $p$
    return $_SHARED{rowcounter};
$p$ language plperl;
BTW, this also shows how one can get Oracle's ROWNUM like feature in Postgres.