cybertec-postgresql / pg_squeeze

A PostgreSQL extension for automatic bloat cleanup



Differences to PG12 REINDEX TABLE/INDEX CONCURRENTLY

novabyte opened this issue

Now that PG12 has been released, I'm curious about the pros and cons of pg_squeeze compared with the new REINDEX TABLE/INDEX CONCURRENTLY command, and how the two differ in both process and outcome:

https://www.postgresql.org/docs/12/sql-reindex.html#SQL-REINDEX-CONCURRENTLY

In which situations, if any, would you still recommend using pg_squeeze?

pg_squeeze can actually be used to rebuild indexes concurrently, but it's not very efficient for that purpose because, besides the indexes, it also "rebuilds" the table. The primary goal of the pg_squeeze project is to eliminate table bloat, i.e. the unused space that VACUUM (unless it's VACUUM FULL) leaves behind in tables. The REINDEX command does not care about table bloat.

REINDEX command does not care about table bloat.

I appreciate the fast reply, but I don't think this is true. The documentation page I linked in my original post includes this section:

An index has become “bloated”, that is it contains many empty or nearly-empty pages. This can occur with B-tree indexes in PostgreSQL under certain uncommon access patterns. REINDEX provides a way to reduce the space consumption of the index by writing a new version of the index without the dead pages.

The REINDEX command can be used to rebuild tables, indexes, and databases concurrently in PG12.

As far as I can tell, what still makes pg_squeeze unique is that it uses PostgreSQL background workers to schedule its runs and that it works with older versions of Postgres. I'd just like to confirm my understanding. Thanks.

When I said that REINDEX does not care about table bloat, I meant empty space in the table itself. The REINDEX TABLE command rebuilds all indexes on the given table, but it does not rebuild the table.

Yep, both the table and its indexes occupy space on the file system, and REINDEX only addresses the index files. You can verify this by checking the table's file size before and after the REINDEX: it doesn't change. Use pg_relation_filepath() to locate the files for both the table and its index(es).
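The check described above can be scripted. What follows is a minimal sketch, not part of pg_squeeze. It assumes a psycopg2-style DB-API connection (pyformat %s placeholders, cursors usable as context managers), and the helper names snapshot() and compare() are hypothetical. The catalog pieces it relies on, pg_class, pg_index, pg_relation_size() and pg_relation_filepath(), are standard PostgreSQL.

```python
from typing import Dict, Tuple

# Lists the table plus every index defined on it, with each relation's
# size in bytes and its data file path relative to the data directory.
SIZE_QUERY = """
    SELECT c.relname, pg_relation_size(c.oid), pg_relation_filepath(c.oid)
    FROM pg_class c
    WHERE c.oid = %s::regclass
       OR c.oid IN (SELECT indexrelid FROM pg_index WHERE indrelid = %s::regclass)
"""

# Hypothetical type alias: {relname: (size_bytes, filepath)}
Snapshot = Dict[str, Tuple[int, str]]


def snapshot(conn, table: str) -> Snapshot:
    """Return {relname: (size_bytes, filepath)} for `table` and its indexes.

    Assumption: `conn` is a psycopg2-style DB-API connection.
    """
    with conn.cursor() as cur:
        cur.execute(SIZE_QUERY, (table, table))
        return {name: (size, path) for name, size, path in cur.fetchall()}


def compare(before: Snapshot, after: Snapshot) -> Dict[str, dict]:
    """Report per-relation sizes and whether each got a new data file.

    A different file path means the relation was written into a new file;
    REINDEX does this for the indexes but not for the table itself.
    """
    report = {}
    for name, (size_before, path_before) in before.items():
        if name not in after:
            continue  # relation no longer present in the second snapshot
        size_after, path_after = after[name]
        report[name] = {
            "bytes_before": size_before,
            "bytes_after": size_after,
            "rewritten": path_before != path_after,
        }
    return report
```

Typical use: set conn.autocommit = True (REINDEX ... CONCURRENTLY cannot run inside a transaction block), call snapshot(conn, "my_table") before and after running REINDEX TABLE CONCURRENTLY my_table, where my_table is a placeholder, and pass both results to compare(). The table's entry should then show rewritten False and an unchanged size, while each index shows rewritten True.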

Thanks, guys. These are great notes on the differences, and I think they're worth adding to the README. I can take a stab at it unless one of the maintainers/committers would like to do it sooner.