11 June 2008

Caching queries via functional indices in PostgreSQL.

Straight to the point:


foo=# select count(*) from bar;
count
--------
624569
(1 row)

foo=# explain analyze select count(*) from bar where baz ilike '%some%string%' and quux = '123';
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=25846.58..25846.59 rows=1 width=0) (actual time=1543.654..1543.655 rows=1 loops=1)
-> Seq Scan on bar (cost=0.00..25846.53 rows=17 width=0) (actual time=288.667..1543.556 rows=32 loops=1)
Filter: (((baz)::text ~~* '%some%string%'::text) AND (quux = '123'::bpchar))
Total runtime: 1543.798 ms


It takes 1.5 seconds on our sample data to find 32 rows we want out of 624569 rows. If this happens to be a large table, and we often need to run this query, rather than creating some external or internal trigger to cache this query, we can use PostgreSQL's partial indices to do the work for us:


foo=# create index bar_idx_some_string on bar(id) where baz ilike '%some%string%' and quux = '123';
CREATE INDEX
foo=# explain analyze select count(*) from bar where baz ilike '%some%string%' and quux = '123';
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=12.71..12.72 rows=1 width=0) (actual time=0.470..0.473 rows=1 loops=1)
-> Index Scan using bar_idx_some_string on bar (cost=0.00..12.67 rows=17 width=0) (actual time=0.122..0.406 rows=32 loops=1)
Total runtime: 0.534 ms


Now it takes 0.0005 seconds.

This also works if in a single transaction or query session, you need to run a number of queries against a table with the similar conditions. Create the index first (give it a temporary name), run the queries, then drop the index.

Caveats:
Creating the index takes up about same time as the original query, so you need to reuse the conditions to get back your initial investment.

Adding or updating to the table takes up a teeny more time. Works best on infrequently updated tables.

Kind of useless if the index condition matches a large percentage of the table.

You can use some rather complication functions and conditions for your index, but they must be immutable and must not use external info (other tables, time of day).

CREATE INDEX locks the table against writes. Use CREATE INDEX CONCURRENTLY on live systems. It comes with it own caveats though.

Docs: CREATE INDEX