krasserm / akka-persistence-cassandra

A replicated Akka Persistence journal backed by Apache Cassandra

Schema migration tool

krasserm opened this issue

With the new plugin API in akka-persistence 2.4, the implementation of CassandraJournal and the underlying schema can be significantly simplified (for example, we can remove headers and markers, ...).

Although it would be possible to stay backwards compatible with the existing schema, it would make working on #48 and the maintenance of the plugin unnecessarily complex. Hence, a migration tool seems to be a better solution than backwards compatibility.

  • depends on #48
  • supersedes #54.

Happy to work on #48 and this issue if you want @krasserm

Have you seen https://github.com/comeara/pillar ?
Looks good from my point of view.

@chbatey this would be fantastic. Thanks for offering help, and looking forward to seeing PRs :-)

@matlockx thanks for the hint, will take a closer look soon.

I use pillar on my project, but it is not well maintained. I have a fork here https://github.com/smr-co-uk/pillar/tree/patch with merged pull requests, a couple of improvements, and improved documentation.

@PeterLappo thanks for sharing!

I've been getting some questions from people trying to migrate their systems from 0.3 to 0.4. As far as I can tell, the schema changes are:

  • Rename the processor_id column to persistence_id
  • Drop the marker (clustering key) column.
  • Add the static used column.

However, 0.3 creates tables with the COMPACT STORAGE option. This means, amongst other things, that the marker column can't be dropped, nor can the used column be added.
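
To illustrate, here is a minimal sketch using the DataStax Java driver, assuming the default akka keyspace and messages table names from the 0.3 configuration (adjust to your own config). It only demonstrates why the in-place ALTERs are rejected on a COMPACT STORAGE table:

```scala
import com.datastax.driver.core.Cluster

object CompactStorageCheck extends App {
  // Keyspace and table names are assumptions based on the default 0.3 config.
  val cluster = Cluster.builder().addContactPoint("127.0.0.1").build()
  val session = cluster.connect("akka")

  // Both statements fail on a table created WITH COMPACT STORAGE, which is why
  // the 0.3 -> 0.4 migration cannot be done with in-place ALTERs.
  try session.execute("ALTER TABLE messages DROP marker")
  catch { case e: Exception => println(s"DROP marker rejected: ${e.getMessage}") }

  try session.execute("ALTER TABLE messages ADD used boolean static")
  catch { case e: Exception => println(s"ADD used rejected: ${e.getMessage}") }

  session.close()
  cluster.close()
}
```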

This leads me to ask:

  1. Has anyone ever actually tried migrating a table? (A large one?)
  2. How is this tool supposed to work?

Right now it looks like there's no simple migration path for people with data, and any eventual path will involve a stop-the-world process during which a new table is created and the data copied into it.

Have I missed something?

@asnare copying data into the new schema can be done in parallel with the running old application. When the initial data migration is finished, the old application needs to be stopped and the remaining small fraction of old data migrated in a second step. This should keep the downtime to a minimum.
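
A rough sketch of that two-phase copy using the DataStax Java driver directly; the table and column names (messages, messages_new, processor_id, ...) are assumptions for illustration, not the plugin's actual statements:

```scala
import com.datastax.driver.core.Cluster
import scala.collection.JavaConverters._

object JournalCopy extends App {
  val cluster = Cluster.builder().addContactPoint("127.0.0.1").build()
  val session = cluster.connect("akka")

  val insert = session.prepare(
    "INSERT INTO messages_new (persistence_id, partition_nr, sequence_nr, message) VALUES (?, ?, ?, ?)")

  // Copies every row from the old table into the new one, mapping processor_id
  // to persistence_id. Handling of the 0.3 marker rows is omitted for brevity.
  def copyAll(): Unit =
    session.execute("SELECT processor_id, partition_nr, sequence_nr, message FROM messages")
      .iterator.asScala.foreach { row =>
        session.execute(insert.bind(
          row.getString("processor_id"),
          row.getLong("partition_nr"): java.lang.Long,
          row.getLong("sequence_nr"): java.lang.Long,
          row.getBytes("message")))
      }

  copyAll() // phase 1: bulk copy while the old application is still running
  // ... stop the old application here ...
  copyAll() // phase 2: re-run for the remaining rows; the inserts are idempotent

  session.close()
  cluster.close()
}
```

In practice phase 2 would only re-copy data written after phase 1 started rather than scanning the whole table again, but the idea is the same.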

I've removed old data so far. I plan to write a small Spark job to migrate old data in the future.
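
For example (hypothetical, assuming the spark-cassandra-connector and made-up table/column names), such a Spark job could look roughly like this:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import com.datastax.spark.connector._

object SparkJournalMigration extends App {
  val conf = new SparkConf()
    .setAppName("journal-migration")
    .set("spark.cassandra.connection.host", "127.0.0.1")
  val sc = new SparkContext(conf)

  // Read the old table and write it into the new schema, renaming
  // processor_id to persistence_id along the way.
  sc.cassandraTable("akka", "messages")
    .map(row => (
      row.getString("processor_id"),
      row.getLong("partition_nr"),
      row.getLong("sequence_nr"),
      row.getBytes("message")))
    .saveToCassandra("akka", "messages_new",
      SomeColumns("persistence_id", "partition_nr", "sequence_nr", "message"))

  sc.stop()
}
```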

Another schema change I would like to do at some point is to not store the PersistentRepr as a serialized blob, but to add additional columns and only store the event as a blob. I think that can be done in a backwards-compatible way, but I wanted to mention it here also.
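
To sketch the idea (the column names here are purely illustrative, not a committed design), the envelope fields would become regular columns and only the event payload would stay binary:

```scala
object ProposedSchema {
  // Illustrative only: envelope fields as columns, only the event kept as a blob.
  val createMessagesTable: String =
    """CREATE TABLE IF NOT EXISTS messages (
      |  persistence_id text,
      |  partition_nr   bigint,
      |  sequence_nr    bigint,
      |  used           boolean static,
      |  writer_uuid    text,
      |  ser_id         int,
      |  ser_manifest   text,
      |  event          blob,
      |  PRIMARY KEY ((persistence_id, partition_nr), sequence_nr)
      |)""".stripMargin
}
```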

@patriknw that would be a big improvement.