sqlalchemy / alembic

A database migrations tool for SQLAlchemy.

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Alembic always detects changes on indexes that use to_tsvector()

caumons opened this issue · comments

Since I upgraded to Alembic 1.13.1, SQLAlchemy 2.0.25, Flask-SQLAlchemy 3.1.1, Flask 3.0.0 and Python 3.12, everytime I run flask db migrate Alembic detects unexisting changes on indexes that use to_tsvector() for full text search using PostgreSQL 16. It keeps creating migrations to drop and recreate those unchanged indexes again and again.

The Alembic output is:

INFO  [alembic.runtime.migration] Context impl PostgresqlImpl.
INFO  [alembic.runtime.migration] Will assume transactional DDL.
INFO  [alembic.ddl.postgresql] Detected sequence named 'post_reading_id_seq' as owned by integer column 'post_reading(id)', assuming SERIAL and omitting
INFO  [alembic.ddl.postgresql] Detected sequence named 'chapter_comment_reaction_id_seq' as owned by integer column 'chapter_comment_reaction(id)', assuming SERIAL and omitting
INFO  [alembic.ddl.postgresql] Detected sequence named 'page_id_seq' as owned by integer column 'page(id)', assuming SERIAL and omitting
INFO  [alembic.ddl.postgresql] Detected sequence named 'book_reservation_id_seq' as owned by integer column 'book_reservation(id)', assuming SERIAL and omitting
INFO  [alembic.ddl.postgresql] Detected sequence named 'category_search_id_seq' as owned by integer column 'category_search(id)', assuming SERIAL and omitting
INFO  [alembic.ddl.postgresql] Detected sequence named 'chapter_reading_id_seq' as owned by integer column 'chapter_reading(id)', assuming SERIAL and omitting
INFO  [alembic.ddl.postgresql] Detected sequence named 'post_comment_reaction_id_seq' as owned by integer column 'post_comment_reaction(id)', assuming SERIAL and omitting
INFO  [alembic.ddl.postgresql] Detected sequence named 'alert_id_seq' as owned by integer column 'alert(id)', assuming SERIAL and omitting
INFO  [alembic.ddl.postgresql] Detected sequence named 'contact_message_id_seq' as owned by integer column 'contact_message(id)', assuming SERIAL and omitting
INFO  [alembic.ddl.postgresql] Detected sequence named 'chapter_reaction_id_seq' as owned by integer column 'chapter_reaction(id)', assuming SERIAL and omitting
INFO  [alembic.ddl.postgresql] Detected sequence named 'broadcast_message_id_seq' as owned by integer column 'broadcast_message(id)', assuming SERIAL and omitting
INFO  [alembic.ddl.postgresql] Detected sequence named 'snippet_id_seq' as owned by integer column 'snippet(id)', assuming SERIAL and omitting
INFO  [alembic.ddl.postgresql] Detected sequence named 'text_search_id_seq' as owned by integer column 'text_search(id)', assuming SERIAL and omitting
INFO  [alembic.ddl.postgresql] Detected sequence named 'post_reaction_id_seq' as owned by integer column 'post_reaction(id)', assuming SERIAL and omitting
INFO  [alembic.autogenerate.compare] Detected changed index 'ix_category_ts' on 'category': expression #1 "to_tsvector('spanish'::regconfig, (name::text || ' '::text) || hashtag::text)" to "to_tsvector('spanish', name || ' ' || hashtag)"
INFO  [alembic.autogenerate.compare] Detected changed index 'ix_chapter_ts' on 'chapter': expression #1 "to_tsvector('spanish'::regconfig, (((title::text || ' '::text) || teaser::text) || ' '::text) || raw_text)" to "to_tsvector('spanish', chapter.title || ' ' || teaser || ' ' || raw_text)"
INFO  [alembic.autogenerate.compare] Detected changed index 'ix_post_ts' on 'post': expression #1 "to_tsvector('spanish'::regconfig, (((title::text || ' '::text) || teaser::text) || ' '::text) || raw_text)" to "to_tsvector('spanish', post.title || ' ' || teaser || ' ' || raw_text)"
  Generating /app/migrations/versions/673f9d0115e2_.py ...  done

The affected models look as follows (simplified):

class Chapter(BasePublication):
    part_id = db.Column(db.Integer, db.ForeignKey('part.id', ondelete='SET NULL'), index=True)
    classification = db.relationship('Part', backref=db.backref('publications', lazy='dynamic'))

    categories = db.relationship('Category', secondary='chapter_category',
                                 backref=db.backref('chapters', lazy='dynamic'), lazy='dynamic')

    __ts_vector__ = sa.func.to_tsvector(
        sa.literal('spanish'), sa.text("chapter.title || ' ' || teaser || ' ' || raw_text"))

    __table_args__ = (
        db.Index('ix_chapter_ts', __ts_vector__, postgresql_using='gin'),
        db.UniqueConstraint('number', 'fragment', name='unique_chapter_number_fragment'),
    )
class Post(BasePublication):
    topic_id = db.Column(db.Integer, db.ForeignKey('topic.id', ondelete='SET NULL'), index=True)
    classification = db.relationship('Topic', backref=db.backref('publications', lazy='dynamic'))

    categories = db.relationship(
        'Category', secondary='post_category', backref=db.backref('posts', lazy='dynamic'), lazy='dynamic')

    __ts_vector__ = sa.func.to_tsvector(
        sa.literal('spanish'), sa.text("post.title || ' ' || teaser || ' ' || raw_text"))

    __table_args__ = (
        db.Index('ix_post_ts', __ts_vector__, postgresql_using='gin'),
        db.UniqueConstraint('number', 'fragment', name='unique_post_number_fragment'),
    )
class Category(BaseModel):
    name = db.Column(db.String(50), nullable=False, unique=True)
    hashtag = db.Column(db.String(51), nullable=False, unique=True)

    __ts_vector__ = sa.func.to_tsvector(
        sa.literal('spanish'), sa.text("name || ' ' || hashtag"))

    __table_args__ = (
        db.Index('ix_category_ts', __ts_vector__, postgresql_using='gin'),
    )

One of the multiple generated migrations looks like follows:

def upgrade():
    # ### commands auto generated by Alembic - please adjust! ###
    with op.batch_alter_table('category', schema=None) as batch_op:
        batch_op.drop_index('ix_category_ts', postgresql_using='gin')
        batch_op.create_index('ix_category_ts', [sa.text("to_tsvector('spanish', name || ' ' || hashtag)")], unique=False, postgresql_using='gin')

    with op.batch_alter_table('chapter', schema=None) as batch_op:
        batch_op.drop_index('ix_chapter_ts', postgresql_using='gin')
        batch_op.create_index('ix_chapter_ts', [sa.text("to_tsvector('spanish', chapter.title || ' ' || teaser || ' ' || raw_text)")], unique=False, postgresql_using='gin')

    with op.batch_alter_table('post', schema=None) as batch_op:
        batch_op.drop_index('ix_post_ts', postgresql_using='gin')
        batch_op.create_index('ix_post_ts', [sa.text("to_tsvector('spanish', post.title || ' ' || teaser || ' ' || raw_text)")], unique=False, postgresql_using='gin')

    # ### end Alembic commands ###


def downgrade():
    # ### commands auto generated by Alembic - please adjust! ###
    with op.batch_alter_table('post', schema=None) as batch_op:
        batch_op.drop_index('ix_post_ts', postgresql_using='gin')
        batch_op.create_index('ix_post_ts', [sa.text("to_tsvector('spanish'::regconfig, (((title::text || ' '::text) || teaser::text) || ' '::text) || raw_text)")], unique=False, postgresql_using='gin')

    with op.batch_alter_table('chapter', schema=None) as batch_op:
        batch_op.drop_index('ix_chapter_ts', postgresql_using='gin')
        batch_op.create_index('ix_chapter_ts', [sa.text("to_tsvector('spanish'::regconfig, (((title::text || ' '::text) || teaser::text) || ' '::text) || raw_text)")], unique=False, postgresql_using='gin')

    with op.batch_alter_table('category', schema=None) as batch_op:
        batch_op.drop_index('ix_category_ts', postgresql_using='gin')
        batch_op.create_index('ix_category_ts', [sa.text("to_tsvector('spanish'::regconfig, (name::text || ' '::text) || hashtag::text)")], unique=False, postgresql_using='gin')

    # ### end Alembic commands ###

When I was using Alembic 1.7.6, SQLAlchemy 1.4.31, Flask-SQLAlchemy 2.5.1, Flask 2.0.3 and Python 3.10 this didn't happen as there was no support for this kind of indexes at that moment.

I hope this can get resolved soon.

Best regards.

Hi,

Index expression compare was added to postgresql, and this seems to be an unsupported case.
For not you can ignore these indexes using name filter or object filters of alembic

Hi @CaselIT, thank you very much for your fast response!

I've just added the code of one of the generated migrations so you can see how they look like.

Yes, I'll ignore these 3 indexes while this case is unsupported.

Best regards.

@CaselIT should we have some doc notes about this specifically for indexes on PostgreSQL? since this is going to go on for a long time

Well we kinda said "next issue we get we place this under a config flag" so we could do that.

I think the regexp we use to remove the type cast used by pg is not detecting it in that complex case, so that should be fixed, but I guess it will not be the last issue we get

I'm running into this as well, but in my case it seems to related to Alembic not matching PSQL's cast expressions.
CAST(column AS TEXT) == column::text Yet Alembic sees them as different.

This migration is created every single time.

def upgrade() -> None:
    op.drop_index('idx_user_email_search_idx', table_name='user', postgresql_using='gin')
    op.create_index('idx_user_email_search_idx', 'user', [sa.text("to_tsvector(CAST('simple' AS REGCONFIG), CAST(coalesce(email, '') AS TEXT))")], unique=False, postgresql_using='gin')
    

def downgrade() -> None:
    op.drop_index('idx_user_email_search_idx', table_name='user', postgresql_using='gin')
    op.create_index('idx_user_email_search_idx', 'user', [sa.text("to_tsvector('simple'::regconfig, COALESCE(email, ''::character varying)::text)")], unique=False, postgresql_using='gin')
    

See also #1485

Shame there does not seem to be the equivalent of "NOT VALID" for indexes, otherwise we could have made pg compare the expression directly.