Alembic always detects changes on indexes that use to_tsvector()

Question

Alembic always detects changes on indexes that use to_tsvector()

caumons opened this issue 6 months ago · comments

Since I upgraded to Alembic 1.13.1, SQLAlchemy 2.0.25, Flask-SQLAlchemy 3.1.1, Flask 3.0.0 and Python 3.12, everytime I run flask db migrate Alembic detects unexisting changes on indexes that use to_tsvector() for full text search using PostgreSQL 16. It keeps creating migrations to drop and recreate those unchanged indexes again and again.

The Alembic output is:

INFO  [alembic.runtime.migration] Context impl PostgresqlImpl.
INFO  [alembic.runtime.migration] Will assume transactional DDL.
INFO  [alembic.ddl.postgresql] Detected sequence named 'post_reading_id_seq' as owned by integer column 'post_reading(id)', assuming SERIAL and omitting
INFO  [alembic.ddl.postgresql] Detected sequence named 'chapter_comment_reaction_id_seq' as owned by integer column 'chapter_comment_reaction(id)', assuming SERIAL and omitting
INFO  [alembic.ddl.postgresql] Detected sequence named 'page_id_seq' as owned by integer column 'page(id)', assuming SERIAL and omitting
INFO  [alembic.ddl.postgresql] Detected sequence named 'book_reservation_id_seq' as owned by integer column 'book_reservation(id)', assuming SERIAL and omitting
INFO  [alembic.ddl.postgresql] Detected sequence named 'category_search_id_seq' as owned by integer column 'category_search(id)', assuming SERIAL and omitting
INFO  [alembic.ddl.postgresql] Detected sequence named 'chapter_reading_id_seq' as owned by integer column 'chapter_reading(id)', assuming SERIAL and omitting
INFO  [alembic.ddl.postgresql] Detected sequence named 'post_comment_reaction_id_seq' as owned by integer column 'post_comment_reaction(id)', assuming SERIAL and omitting
INFO  [alembic.ddl.postgresql] Detected sequence named 'alert_id_seq' as owned by integer column 'alert(id)', assuming SERIAL and omitting
INFO  [alembic.ddl.postgresql] Detected sequence named 'contact_message_id_seq' as owned by integer column 'contact_message(id)', assuming SERIAL and omitting
INFO  [alembic.ddl.postgresql] Detected sequence named 'chapter_reaction_id_seq' as owned by integer column 'chapter_reaction(id)', assuming SERIAL and omitting
INFO  [alembic.ddl.postgresql] Detected sequence named 'broadcast_message_id_seq' as owned by integer column 'broadcast_message(id)', assuming SERIAL and omitting
INFO  [alembic.ddl.postgresql] Detected sequence named 'snippet_id_seq' as owned by integer column 'snippet(id)', assuming SERIAL and omitting
INFO  [alembic.ddl.postgresql] Detected sequence named 'text_search_id_seq' as owned by integer column 'text_search(id)', assuming SERIAL and omitting
INFO  [alembic.ddl.postgresql] Detected sequence named 'post_reaction_id_seq' as owned by integer column 'post_reaction(id)', assuming SERIAL and omitting
INFO  [alembic.autogenerate.compare] Detected changed index 'ix_category_ts' on 'category': expression #1 "to_tsvector('spanish'::regconfig, (name::text || ' '::text) || hashtag::text)" to "to_tsvector('spanish', name || ' ' || hashtag)"
INFO  [alembic.autogenerate.compare] Detected changed index 'ix_chapter_ts' on 'chapter': expression #1 "to_tsvector('spanish'::regconfig, (((title::text || ' '::text) || teaser::text) || ' '::text) || raw_text)" to "to_tsvector('spanish', chapter.title || ' ' || teaser || ' ' || raw_text)"
INFO  [alembic.autogenerate.compare] Detected changed index 'ix_post_ts' on 'post': expression #1 "to_tsvector('spanish'::regconfig, (((title::text || ' '::text) || teaser::text) || ' '::text) || raw_text)" to "to_tsvector('spanish', post.title || ' ' || teaser || ' ' || raw_text)"
  Generating /app/migrations/versions/673f9d0115e2_.py ...  done

The affected models look as follows (simplified):

class Chapter(BasePublication):
    part_id = db.Column(db.Integer, db.ForeignKey('part.id', ondelete='SET NULL'), index=True)
    classification = db.relationship('Part', backref=db.backref('publications', lazy='dynamic'))

    categories = db.relationship('Category', secondary='chapter_category',
                                 backref=db.backref('chapters', lazy='dynamic'), lazy='dynamic')

    __ts_vector__ = sa.func.to_tsvector(
        sa.literal('spanish'), sa.text("chapter.title || ' ' || teaser || ' ' || raw_text"))

    __table_args__ = (
        db.Index('ix_chapter_ts', __ts_vector__, postgresql_using='gin'),
        db.UniqueConstraint('number', 'fragment', name='unique_chapter_number_fragment'),
    )

class Post(BasePublication):
    topic_id = db.Column(db.Integer, db.ForeignKey('topic.id', ondelete='SET NULL'), index=True)
    classification = db.relationship('Topic', backref=db.backref('publications', lazy='dynamic'))

    categories = db.relationship(
        'Category', secondary='post_category', backref=db.backref('posts', lazy='dynamic'), lazy='dynamic')

    __ts_vector__ = sa.func.to_tsvector(
        sa.literal('spanish'), sa.text("post.title || ' ' || teaser || ' ' || raw_text"))

    __table_args__ = (
        db.Index('ix_post_ts', __ts_vector__, postgresql_using='gin'),
        db.UniqueConstraint('number', 'fragment', name='unique_post_number_fragment'),
    )

class Category(BaseModel):
    name = db.Column(db.String(50), nullable=False, unique=True)
    hashtag = db.Column(db.String(51), nullable=False, unique=True)

    __ts_vector__ = sa.func.to_tsvector(
        sa.literal('spanish'), sa.text("name || ' ' || hashtag"))

    __table_args__ = (
        db.Index('ix_category_ts', __ts_vector__, postgresql_using='gin'),
    )

One of the multiple generated migrations looks like follows:

def upgrade():
    # ### commands auto generated by Alembic - please adjust! ###
    with op.batch_alter_table('category', schema=None) as batch_op:
        batch_op.drop_index('ix_category_ts', postgresql_using='gin')
        batch_op.create_index('ix_category_ts', [sa.text("to_tsvector('spanish', name || ' ' || hashtag)")], unique=False, postgresql_using='gin')

    with op.batch_alter_table('chapter', schema=None) as batch_op:
        batch_op.drop_index('ix_chapter_ts', postgresql_using='gin')
        batch_op.create_index('ix_chapter_ts', [sa.text("to_tsvector('spanish', chapter.title || ' ' || teaser || ' ' || raw_text)")], unique=False, postgresql_using='gin')

    with op.batch_alter_table('post', schema=None) as batch_op:
        batch_op.drop_index('ix_post_ts', postgresql_using='gin')
        batch_op.create_index('ix_post_ts', [sa.text("to_tsvector('spanish', post.title || ' ' || teaser || ' ' || raw_text)")], unique=False, postgresql_using='gin')

    # ### end Alembic commands ###


def downgrade():
    # ### commands auto generated by Alembic - please adjust! ###
    with op.batch_alter_table('post', schema=None) as batch_op:
        batch_op.drop_index('ix_post_ts', postgresql_using='gin')
        batch_op.create_index('ix_post_ts', [sa.text("to_tsvector('spanish'::regconfig, (((title::text || ' '::text) || teaser::text) || ' '::text) || raw_text)")], unique=False, postgresql_using='gin')

    with op.batch_alter_table('chapter', schema=None) as batch_op:
        batch_op.drop_index('ix_chapter_ts', postgresql_using='gin')
        batch_op.create_index('ix_chapter_ts', [sa.text("to_tsvector('spanish'::regconfig, (((title::text || ' '::text) || teaser::text) || ' '::text) || raw_text)")], unique=False, postgresql_using='gin')

    with op.batch_alter_table('category', schema=None) as batch_op:
        batch_op.drop_index('ix_category_ts', postgresql_using='gin')
        batch_op.create_index('ix_category_ts', [sa.text("to_tsvector('spanish'::regconfig, (name::text || ' '::text) || hashtag::text)")], unique=False, postgresql_using='gin')

    # ### end Alembic commands ###

When I was using Alembic 1.7.6, SQLAlchemy 1.4.31, Flask-SQLAlchemy 2.5.1, Flask 2.0.3 and Python 3.10 this didn't happen as there was no support for this kind of indexes at that moment.

I hope this can get resolved soon.

Best regards.

Federico Caselli · Answer 1 · Tue Jan 09 2024 22:50:16 GMT+0800 (China Standard Time)

Hi,

Index expression compare was added to postgresql, and this seems to be an unsupported case.
For not you can ignore these indexes using name filter or object filters of alembic

Enric Caumons · Answer 2 · Tue Jan 09 2024 22:54:06 GMT+0800 (China Standard Time)

Hi @CaselIT, thank you very much for your fast response!

I've just added the code of one of the generated migrations so you can see how they look like.

Yes, I'll ignore these 3 indexes while this case is unsupported.

Best regards.

Michael Bayer · Answer 3 · Tue Jan 09 2024 23:31:17 GMT+0800 (China Standard Time)

@CaselIT should we have some doc notes about this specifically for indexes on PostgreSQL? since this is going to go on for a long time

Federico Caselli · Answer 4 · Wed Jan 10 2024 01:15:14 GMT+0800 (China Standard Time)

Well we kinda said "next issue we get we place this under a config flag" so we could do that.

I think the regexp we use to remove the type cast used by pg is not detecting it in that complex case, so that should be fixed, but I guess it will not be the last issue we get

mvanderlee · Answer 5 · Mon Feb 19 2024 17:43:36 GMT+0800 (China Standard Time)

I'm running into this as well, but in my case it seems to related to Alembic not matching PSQL's cast expressions.
CAST(column AS TEXT) == column::text Yet Alembic sees them as different.

This migration is created every single time.

def upgrade() -> None:
    op.drop_index('idx_user_email_search_idx', table_name='user', postgresql_using='gin')
    op.create_index('idx_user_email_search_idx', 'user', [sa.text("to_tsvector(CAST('simple' AS REGCONFIG), CAST(coalesce(email, '') AS TEXT))")], unique=False, postgresql_using='gin')
    

def downgrade() -> None:
    op.drop_index('idx_user_email_search_idx', table_name='user', postgresql_using='gin')
    op.create_index('idx_user_email_search_idx', 'user', [sa.text("to_tsvector('simple'::regconfig, COALESCE(email, ''::character varying)::text)")], unique=False, postgresql_using='gin')

Federico Caselli · Answer 6 · Mon Jun 03 2024 17:04:40 GMT+0800 (China Standard Time)