jackc / tern

The SQL Fan's Migrator

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

[Question] pg_advisory_lock not supported by CockroachDB

jvanalstine opened this issue · comments

Greetings,

This is my favorite migration tool. However, I just started a new project with CockroachDB, and on migration I get a warning

Error initializing migrator:
  ERROR: unknown function: pg_advisory_lock() (SQLSTATE 42883)

It looks like there are a couple places where the function is used in migrate.go. Is it your opinion that this lock function is critical to the library? It seems like cockroach always uses SERIALIZABLE isolation.

The advisory lock is to protect against multiple migrations being run at the same time. tern would run fine without it, but concurrently running migrations would have unforeseen consequences.

#31 is somewhat related. The solution I proposed there was to allow user provided SQL for lock and unlock. With that in place you could use some other means of locking such as creating a table and checking its existence.

The solution I proposed there was to allow user provided SQL for lock and unlock.

I am new to cockroach, but I believe they will not support pg_advisory_lock anytime soon. So this solution would likely work for roach. From what I can tell, there are no locks in cockroach. Also, I am not able to find any recommendations with respect to concurrently applying schema migrations. It looks like it runs raft and is CP, so I think they can just be applied concurrently without much worry.

Tern was mentioned here btw

Also, I am not able to find any recommendations with respect to concurrently applying schema migrations. It looks like it runs raft and is CP, so I think they can just be applied concurrently without much worry.

What I mean is that two migration processes are run at the same time. You never would want that to happen. e.g. migration 1 is running twice at the same time. Now it is extremely likely that one would fail because adding or dropping a table or column can only be done once. But if it was a data migration the user might not be so lucky.

Practically speaking though, it is a very unlikely case that is being protected against. An optional flag to disable locking might be reasonable.

What I mean is that two migration processes are run at the same time. You never would want that to happen. e.g. migration 1 is running twice at the same time. Now it is extremely likely that one would fail because adding or dropping a table or column can only be done once. But if it was a data migration the user might not be so lucky.

Makes sense.

An optional flag to disable locking might be reasonable.

I think a feature flag here makes a lot of sense. Do you want to take a wack at it or should I?

It seems like there ought to be a flag called something like --no-lock that cases this bit of code to be skipped and replaced with a regular transaction.

The Migrator could be lock aware in the same way that it is transaction aware. That is, change the MigratorOptions struct to

type MigratorOptions struct {
	// DisableLock causes the Migrator to run migrations without pg_advisory_lock()
	DisableLock bool
	// DisableTx causes the Migrator not to run migrations in a transaction.
	DisableTx bool
	// MigratorFS is the interface used for collecting the migrations.
	MigratorFS MigratorFS
}

And then change MigrateTo to

func (m *Migrator) MigrateTo(ctx context.Context, targetVersion int32) (err error) {

	if !m.options.DisableLock {
		err = acquireAdvisoryLock(ctx, m.conn)
		if err != nil {
			return err
		}
		defer func() {
			unlockErr := releaseAdvisoryLock(ctx, m.conn)
			if err == nil && unlockErr != nil {
				err = unlockErr
			}
		}()
	}

I added the flag here. Although, there is still a problem with multiple transactions trying to run on the same connection. It works if I set disable-tx = true in the ini conf file.

ERROR: current transaction is aborted, commands ignored until end of transaction block (SQLSTATE 25P02)

Example Tern conf

[database]
# host is required (network host or path to Unix domain socket)
host = localhost
port = 9000
# database is required
database = platform
# user defaults to OS user
user = platform_admin
password =
disable-lock = true
# version_table = public.schema_version
#
# sslmode generally matches the behavior described in:
# http://www.postgresql.org/docs/9.4/static/libpq-ssl.html#LIBPQ-SSL-PROTECTION
#
# There are only two modes that most users should use:
# prefer - on trusted networks where security is not required
# verify-full - require SSL connection
# sslmode = prefer
#
# sslrootcert is generally used with sslmode=verify-full
# sslrootcert = /path/to/root/ca

# Proxy the above database connection via SSH
# [ssh-tunnel]
# host =
# port = 22
# user defaults to OS user
# user =
# password is not required if using SSH agent authentication
# password =

[data]
# Any fields in the data section are available in migration templates
# prefix = foo

If you'd like a roach stack to play with

base-infra.yml:

version: "3.9"
services:
  roach-0:
    container_name: roach-0
    image: cockroachdb/cockroach:latest
    volumes:
      - cockroach_data_0:/cockroach/cockroach-data
    command: start --insecure --locality=datacenter=roach_net --join=roach-0,roach-1,roach-2
    ports:
      - "9000:26257"
      - "8080:8080"
    networks:
      - roach_net
    hostname: roach-0
  roach-1:
    container_name: roach-1
    image: cockroachdb/cockroach:latest
    volumes:
      - cockroach_data_1:/cockroach/cockroach-data
    command: start --insecure --locality=datacenter=roach_net --join=roach-0,roach-1,roach-2
    networks:
      - roach_net
    hostname: roach-1
  roach-2:
    container_name: roach-2
    image: cockroachdb/cockroach:latest
    volumes:
      - cockroach_data_2:/cockroach/cockroach-data
    command: start --insecure --locality=datacenter=roach_net --join=roach-0,roach-1,roach-2
    networks:
      - roach_net
    hostname: roach-2
volumes:
  cockroach_data_0:
  cockroach_data_1:
  cockroach_data_2:

networks:
  roach_net:
    driver: bridge

immediately after running docker-compose -f base-infra.yml up, run docker exec -it roach-0 /cockroach/cockroach init --insecure

That error indicates that transactions are still being used somewhere and an error occurs inside it.

Yes I figured that out thank you. Another thing that could be done is using the strategy that golang-migrate uses. If the url has postgresql:// as the protocol, then an advisory lock is applied like in this repo. If cockroach:// is the protocol, then a lock table is created

  schema_name |    table_name     | type  |     owner      | estimated_row_count | locality
--------------+-------------------+-------+----------------+---------------------+-----------
  public      | schema_lock       | table | platform_admin |                   0 | NULL
  public      | schema_migrations | table | platform_admin |                   1 | NULL

Yeah, that might work. Though I'm a little reluctant to use lock tables. The nice thing about advisory locks is they automatically get cleaned up if the migration connection dies unexpectedly. Whereas the strategy golang-migrate uses can require manual cleanup in an error case. See https://github.com/golang-migrate/migrate/blob/aef2747761d2a59d87e36fa942d9ece5339a2c41/database/cockroachdb/cockroachdb.go#L199.

I'd love to move from migrate to tern but hit this too. Is the plan to wait for CockroachDB to make a stubbed out non-functional advisory lock implementation? cockroachdb/cockroach#13546

@dghubble

The only potential solutions that have come up on the tern side is an option to disable or customize the locking logic. I would lean to customizing the lock logic as that would also allow disabling it if desiring.

But I'm not currently working on this and I don't know anyone who is.

Would you be open to a change along these lines? Any idea what kind of customisation would you want to allow?

Would allowing users to opt into the go migrate style be acceptable?

@tmcnicol Which approach? Several have been mentioned on this thread.

Sorry I was referring to the go-migrate option, of a lock table that people could opt into. Was thinking that we could get around the unlocking when the connection unexpectedly dies by using a heartbeat mechanism.

@tmcnicol Yes, that seems reasonable.

@jackc , I'm wondering if there are any in progress changes that support the opt-in lock table support that @tmcnicol mentioned in his comment above?

@ryanluu12345 Not that I know of.