Schema migration tool
krasserm opened this issue
With the new plugin API in akka-persistence 2.4, the implementation of CassandraJournal and the underlying schema can be greatly simplified (for example, we can remove headers and markers, among other things).
Although it would be possible to stay backwards compatible with the existing schema, it would make working on #48 and the maintenance of the plugin unnecessarily complex. Hence, a migration tool seems to be a better solution than backwards compatibility.
Have you seen https://github.com/comeara/pillar?
Looks good from my point of view.
I use pillar on my project, but it is not well maintained. I have a fork here https://github.com/smr-co-uk/pillar/tree/patch with merged pull requests, a couple of improvements and improved documentation.
@PeterLappo thanks for sharing!
I've been getting some questions from people trying to migrate their systems from 0.3 to 0.4. As far as I can tell, the schema changes are:
- Rename the processor_id column to persistence_id.
- Drop the marker (clustering key) column.
- Add the static used column.
However, 0.3 creates tables with the compact storage option. This means, among other things, that the marker column can't be dropped, nor can the used column be added.
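Since the table can't be altered in place, each row would have to be rewritten into a new table. A minimal sketch of that per-row mapping in Python, using plain dicts instead of a real Cassandra driver: the sequence_nr and message columns and the value written to used are assumptions for illustration, and how rows that differ only in marker should be merged is not covered here.

```python
def migrate_row(old_row):
    """Map one 0.3-style row (a dict) to the 0.4-style layout."""
    new_row = dict(old_row)  # copy so the input row is left untouched
    # Rename processor_id to persistence_id.
    new_row["persistence_id"] = new_row.pop("processor_id")
    # Drop the marker clustering column.
    new_row.pop("marker", None)
    # Add the new static column; True is an assumed value.
    new_row["used"] = True
    return new_row


# Hypothetical row: only processor_id and marker are named in this thread.
old = {"processor_id": "p-1", "sequence_nr": 7, "marker": "A", "message": b"\x01"}
new = migrate_row(old)
```

All columns other than the three listed above are copied unchanged.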
This leads me to ask:
- Has anyone ever actually tried migrating a table? (A large one?)
- How is this tool supposed to work?
Right now it looks like there's no simple migration path for people with data, and that any eventual path will involve a stop-the-world process during which a new table is created and the data copied into it.
Have I missed something?
@asnare copying data into the new schema can run in parallel with the old application while it is still running. When the initial data migration is finished, the old application needs to be stopped and the remaining small fraction of old data needs to be migrated in a second step. This should keep the downtime to a minimum.
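The two-step approach described above could look roughly like this. It is only a sketch under simplifying assumptions: in-memory lists stand in for the old and new tables, there is a single persistence id, and the copy_rows helper is hypothetical.

```python
def copy_rows(source, target, after_seq, up_to_seq):
    """Copy rows with after_seq < sequence_nr <= up_to_seq; return how many were copied."""
    copied = 0
    for row in source:
        if after_seq < row["sequence_nr"] <= up_to_seq:
            target.append(dict(row))
            copied += 1
    return copied


old_table = [{"sequence_nr": n} for n in range(1, 6)]
new_table = []

# Step 1: the old application is still running; copy everything written so far.
watermark = max(row["sequence_nr"] for row in old_table)
bulk = copy_rows(old_table, new_table, 0, watermark)

# Meanwhile the old application keeps writing.
old_table.append({"sequence_nr": 6})
old_table.append({"sequence_nr": 7})

# Step 2: the old application is stopped; copy only the remainder.
final_mark = max(row["sequence_nr"] for row in old_table)
tail = copy_rows(old_table, new_table, watermark, final_mark)
```

Only step 2 requires downtime, and its cost depends on how much was written during step 1 rather than on the total table size.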
So far I've simply removed old data. I plan to write a small Spark job to migrate old data in the future.
Another schema change I would like to do at some point is to not store the PersistentRepr as a serialized blob, but to add additional columns and store only the event as a blob. I think that can be done in a backwards compatible way, but wanted to mention it here also.
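To illustrate the idea, here is a rough sketch in which the metadata goes into ordinary columns and only the event itself is serialized. The column names are hypothetical, and JSON merely stands in for whatever serializer the plugin would actually use.

```python
import json


def to_row(persistence_id, sequence_nr, event):
    """Build a row where only the event is an opaque blob."""
    return {
        "persistence_id": persistence_id,  # plain column, no longer inside the blob
        "sequence_nr": sequence_nr,        # plain column, no longer inside the blob
        "event": json.dumps(event).encode("utf-8"),  # the only serialized part
    }


def event_of(row):
    """Decode the event blob of a row built by to_row."""
    return json.loads(row["event"].decode("utf-8"))


row = to_row("p-1", 3, {"type": "Deposited", "amount": 10})
```

With this layout the metadata can be read without deserializing the event.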
@patriknw would be a big improvement.
Continued here: akka/akka-persistence-cassandra#11