joshnuss / svelte-persisted-store

A Svelte store that persists to localStorage

Lifecycle hook to modify / upgrade data.

niemyjski opened this issue

It would be nice to have a way to hook into the persisted store to upgrade data when it's first read from the store / synced across tabs. Providing a new serializer really doesn't work for this use case.

For example, let's say I have persisted('test', [defaults]). Then I upgrade or add additional defaults in my code. I want to ensure the data always contains the default data; I'd expect to be able to pass a function to apply on top of the parsed data. But I only need to migrate this once; it doesn't make sense to do it more than once via a subscribe.
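For illustration, something like this hypothetical upgrade option is what I have in mind (the option name and signature here are made up, not part of the library):

const defaults = { theme: 'light', fontSize: 14 }

// upgrade would run once, when the value is first read, layering the
// parsed data on top of the current defaults so new fields are always present
persisted('test', defaults, {
  upgrade: (parsed) => ({ ...defaults, ...parsed })
})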

I guess one alternative is to just do the action immediately, but this doesn't help if the store is modified across tabs. One could also version the key and upgrade the old one, but that doesn't really scale unless you scan a ton of keys. What are your thoughts?

Hi @niemyjski,

Yes, migration is something that would be good to support, provided there is a clean way.

I agree that syncing across tabs complicates things, as you can have different versions of the code running in each tab.

I believe this means versioning the key name would be best.

One way is to accept a version number as an option and support migration functions:

// default is version 1
persisted('myKey', initial, { version: 1 })

Then later, if you want to create a new version:

// bump to version 2 and provide a migration for it
persisted('myKey', initial, {
  version: 2,
  migrate: {
    2: (state) => {
      // add state
      state.newField = 123

      // remove state
      delete state.oldField

      // update state
      state.otherField = computeNewValue(state)

      return state
    }
  }
})
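Internally the store could then run any pending migration steps in order on first read. A rough sketch of that idea (function and parameter names invented here):

// rough sketch: run every migration step between the stored version
// and the target version, in ascending order
function applyMigrations(state, storedVersion, targetVersion, migrate) {
  for (let v = storedVersion + 1; v <= targetVersion; v++) {
    if (migrate[v]) state = migrate[v](state)
  }
  return state
}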

But the version metadata would need to be stored somewhere. I'm not sure it's safe to mutate the existing local storage data, for example by adding a new __version field. Or maybe the version metadata could be stored under a separate local storage key?
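For comparison, the two shapes in local storage might look like this (the myKey__version key name is invented for illustration):

// option A: mutate the stored value by adding a metadata field
localStorage.setItem('myKey', JSON.stringify({ ...state, __version: 2 }))

// option B: keep the version under a separate key
localStorage.setItem('myKey', JSON.stringify(state))
localStorage.setItem('myKey__version', '2')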

Also, this would take a lot of work, and it's not something I personally need right now, so the work would probably need to be sponsored.

Hello, @joshnuss. This is the same thing I asked about some time ago: I called it a validator function, exactly for upgrade scenarios. It would be a callback specified by the consumer of the package, and the final "initial" value would be whatever the function returns.

If you approve, I can make this. We would proceed to discuss the design, then I'll code and PR. Let me know. Thanks.

Thanks Jose!

Can you share some examples of the API you're thinking about, just so we're on the same page?

I'm all for this provided it's simple.

I've been thinking on this for the past two days and I agree. I think we should just provide a simple function that gets the value and returns a value before the writable is returned. Then it can do anything, and it can be responsible for versioning, upgrading, etc. We keep this library really simple. I think only a post hook is needed, because if you need a pre hook you should implement your own serializer. The only question I have is whether we should provide some kind of context with the options and storage value. I'd almost say no.

Also thinking about key versioning: perhaps if you had an alias list/pattern of old keys, one could fall back to those when the current key isn't available, and this function could be called with one or more values matching the key/key patterns and be responsible for returning the correct value. This may need to be a pre-serialization decision, BUT I feel like it's out of scope for this issue and can be solved in other ways outside this lib.
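As a sketch of that fallback done in userland, outside this lib (the key names and the migrateShape helper are hypothetical):

// check the current key first, then fall back to a known old key
const raw = localStorage.getItem('settings-v2') ?? localStorage.getItem('settings-v1')

// migrateShape is a hypothetical user-written function that fixes up old shapes
const initial = raw ? migrateShape(JSON.parse(raw)) : defaultSettings

const store = persisted('settings-v2', initial)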

I think we should just provide a simple function that gets the value and returns a value before the writable is returned

That makes sense. @webJose, thoughts? Does this cover what you were thinking too?

So does an API like this make sense?

persisted('myKey', initial, {
  preprocess(value) {
    // to add new default
    value.newField = ...
    
    // to modify
    value.existing = f(value.existing)
    
    // to remove a field
    delete value.deprecatedField

    // return value
    return value
  }
})

I think only a post hook is needed because I feel like you should implement your own serializer if you need a pre hook

Though a serializer should be about format, i.e. returning a string.
I like the orthogonality of having a preprocess/postprocess pair.

For example, adding a version before writing:

persisted('myKey', initial, {
  postprocess(value) {
    value.version = 41

    return value
  },
  preprocess(value) {
    if (value.version == 40) {
      // do migration
    }
    return value
  }
})

Feel free to suggest different names and/or disagree of course 😅

Hello, everyone. I'll explain my original motivation back then and my idea.

The problem I envisioned was related to data changes over the course of time. Example: the value is an object like { optA: true }, so optA has 2 choices. Over time, the application grows and there are now more options to cover, so optA is upgraded to a string that uses an enumeration of string values. The problem, as you probably guessed, is existing users still having the Boolean version. The solution: the persisted() function accepts a function as part of the options. This function, written by the consumer of the library (us developers), takes the deserialized value as input, validates it in any way, shape, or form needed (for example, upgrading the Boolean value of optA to the new string value), and returns the result. This result is immediately written to the store and immediately returned to subscribers.

That's it. No concept of a version key or anything. This is why I called it a "validation function". Perhaps a better name is "normalizer function", because it is allowed to mutate the data.
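A minimal sketch of that normalizer, expressed with the preprocess option proposed above (the string values are illustrative):

persisted('myKey', { optA: 'auto' }, {
  preprocess(value) {
    // upgrade the old Boolean shape to the new string enumeration
    if (typeof value.optA === 'boolean') {
      value.optA = value.optA ? 'enabled' : 'disabled'
    }
    return value
  }
})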

@webJose the API I proposed above doesn't require a version number, so it should handle the scenario you've outlined.

I think the name validator sounds a bit too specific, because the function can validate or mutate or replace.

Agree 100% that a migration function is not the way to go. Migration is fundamentally a different challenge than what this library is designed for, and is better left up to a proper migration layer.

With that said, though, a post-processing function would be awesome! (It's actually something I currently abuse the serialization functionality to accomplish, specifically for migration.) I do not, however, see what use a pre-processing function would be. In my mind, serialization should always be the first step (as this is the step that transforms the data into something we can work with), and the only thing that should come before serialization should be some kind of formatting of the input to enable proper serialization, which in my mind would itself be a serialization step.
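For reference, that serializer workaround looks roughly like this: the serializer option expects a JSON-like object with parse and stringify, and migration logic gets shoehorned into parse (migrate here is a user-written function):

persisted('myKey', initial, {
  serializer: {
    // parse doubles as a migration step, which is the abuse in question
    parse: (text) => migrate(JSON.parse(text)),
    stringify: JSON.stringify
  }
})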

I also think that we should rethink the naming scheme a little bit, both because I don't believe a pre-processing function is necessary (in which case we don't need the otherwise nicely "orthogonal" naming), and because post-processing (in my mind at least) could mean a lot of things besides a step between serialization and data-loading.

If we can agree on something here, I would love to write up a proper PR with tests.

Thanks @bertmad3400!

I guess the naming needs work.

To clarify the meaning of that naming schema:

  • preprocess: The thing that happens when reading the data, right before it's returned to the user. An ideal place to change the shape of the data for migration purposes.
  • postprocess: What happens before writing the data. Ideal place to patch data.

Open to suggestions
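To make the direction of each hook concrete, here's a sketch of where they would sit in the read/write path (these function names are invented, not the actual source):

// read path: storage -> deserialize -> preprocess -> store
function readFromStorage(key, { serializer, preprocess, initial }) {
  const raw = localStorage.getItem(key)
  if (raw === null) return initial
  const value = serializer.parse(raw)
  return preprocess ? preprocess(value) : value
}

// write path: store -> postprocess -> serialize -> storage
function writeToStorage(key, value, { serializer, postprocess }) {
  if (postprocess) value = postprocess(value)
  localStorage.setItem(key, serializer.stringify(value))
}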

So I was working on this PR (using the names preRead and preWrite to be consistent with the error-handler naming), but I noticed something weird in the codebase. The store is set up such that if the storage event is fired but event.newValue is null, the store contents are set to null. I think this is a really bad idea for a couple of reasons:

  • It doesn't align with user expectations. Neither the documentation nor the type system (which I have no clue why it doesn't complain) even indicates that the store a potential user is depending on could suddenly change content type to null, which, I could imagine, could potentially break applications.
  • I do not see the use of this. First off, users of this library seem unlikely to interact directly with the localStorage API (they are using this library to avoid that), so this functionality seems like something that can only cause confusion. Secondly, I just don't see the reason for a user doing this (removing the key directly) instead of just setting the store to null.
  • If the intention with this was to solve a problem like #217, then I really think we should add a specific method for doing this.

Unless I missed something @joshnuss, I propose that I remove this functionality in the coming PR aimed at this issue, and then we can add a more thought-through and, most importantly, type-compliant version of it later if needed.

Edit: My proposed solution would make the store simply ignore events where event.newValue is null.
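As a sketch, the event handler would then look something like this (the shape here is illustrative, not the actual source):

window.addEventListener('storage', (event) => {
  // ignore unrelated keys, and ignore removals/clears where newValue is null,
  // rather than pushing null into the store
  if (event.key !== key || event.newValue === null) return
  store.set(serializer.parse(event.newValue))
})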

Thanks @bertmad3400

For the naming: what do you think about beforeRead and beforeWrite?

About the null check, could it be there to check for undefined? It might be a type thing.

Everything in here should be addressed by #250.