superduper-io / superduper

Superduper: Integrate AI models and machine-learning workflows with your database to implement custom AI applications, without moving your data. Includes streaming inference, scalable model hosting, training, and vector search.

Home Page: https://superduper.io


[QUEUES] Allow jobs created on `Component` creation to be passed onto downstream components

blythed opened this issue

Consider a pipeline in which one component's jobs depend on another's (sketched in the code below):

  1. Compute features
  2. Train a PCA model on the computed features
  3. Compute dimension-reduced features
class DimReduceModel(Model):
    trainer: ...   # trainer used to fit the model, e.g. PCATrainer
    upstream: ...  # upstream components whose jobs must complete first

listener1 = Listener(features_model, ...)  # step 1: compute features
dim_reduce_model = DimReduceModel(trainer=PCATrainer(), upstream=[listener1])  # step 2: train
listener2 = Listener(dim_reduce_model, ...)  # step 3: compute reduced features

Explanation

  • listener1 computes the features
  • dim_reduce_model trains on those features
  • listener2 computes the dimension-reduced features once training is complete
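
In scheduling terms, the `upstream` field would translate into job dependencies when the components are applied. A minimal sketch building on the example above; `schedule_jobs` and its `dependencies` parameter are assumptions here, not the current API:

feature_jobs = listener1.schedule_jobs()    # step 1 jobs
train_job = dim_reduce_model.schedule_jobs(
    dependencies=feature_jobs,              # wait for all feature jobs
)
reduce_jobs = listener2.schedule_jobs(
    dependencies=[train_job],               # wait for training to finish
)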

After discussion with @kartik4949, we may have two use-cases:

  1. Computations on existing data (1-time jobs)
  2. Computations on future data (triggered jobs)

One idea might be to add two parallel components (contrasted in the sketch below):

  • Listener (triggered)
  • Map (one-time)
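
A minimal sketch of how the two components might differ; `Map` and its one-time semantics are hypothetical, and the stand-in field types below replace the real superduper types:

import dataclasses as dc
import typing as t

@dc.dataclass
class Map:
    """One-time: process the data matching `select` once, at apply time."""
    model: t.Any
    select: t.Any  # query describing the existing data to process

    def schedule_jobs(self) -> None:
        # Scheduled exactly once when the component is applied;
        # never re-triggered by future inserts.
        ...

@dc.dataclass
class Listener:
    """Triggered: re-run whenever new matching data arrives."""
    model: t.Any
    select: t.Any

    def schedule_jobs(self) -> None:
        # Scheduled at apply time *and* on every future `insert_data`
        # event whose rows match `select`.
        ...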

Solution (the three phases are sketched after this list):

  1. Component initialization
  2. Component activation
  3. Real-time triggers
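
A rough sketch of how the three phases might be represented internally; the enum and its values are illustrative assumptions, not part of the codebase:

from enum import Enum

class JobPhase(Enum):
    INITIALIZATION = 'initialization'  # jobs created when the component is built
    ACTIVATION = 'activation'          # one-time jobs over existing data, on apply
    REAL_TIME = 'real_time'            # jobs triggered by future events on the queue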

The proposal is to extend the current `superduper.base.event.Event` class as follows:

Add an `event_type` attribute with possible values: `model_update`, `insert_data`, `model_apply`, `schedule_jobs`.

For different event types, include the necessary information in the `data` field, such as:

  • `insert_data`: `data` should include `table` and `ids`.
  • `model_update`: `data` should include `type_id`, `identifier`, `table`, and `ids`.
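
A minimal sketch of the extended event class; the dataclass style, field names in the examples, and base-class shape are assumptions about `superduper.base.event`, not the actual implementation:

import dataclasses as dc
import typing as t

@dc.dataclass
class Event:
    # One of: 'model_update', 'insert_data', 'model_apply', 'schedule_jobs'
    event_type: str
    data: t.Dict[str, t.Any]

# An insert event identifies the table and the inserted row ids:
insert_event = Event(
    event_type='insert_data',
    data={'table': 'documents', 'ids': ['id-1', 'id-2']},
)

# A model-update event additionally identifies the component:
update_event = Event(
    event_type='model_update',
    data={
        'type_id': 'listener',
        'identifier': 'listener1',
        'table': 'documents',
        'ids': ['id-1', 'id-2'],
    },
)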

Before an event is sent to a queue, it should first pass through the EventManager. The EventManager identifies the event type, checks whether any relevant components trigger jobs on it, and dispatches the event to the corresponding queue (sketched below).
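
A rough sketch of that dispatch logic, building on the `Event` sketch above; the component registry, the `trigger_events` attribute, and the queue interface are all assumptions:

class EventManager:
    def __init__(self, components, queues):
        self.components = components  # applied components, keyed by identifier
        self.queues = queues          # one downstream queue per component

    def dispatch(self, event: Event):
        # Identify the event type, find the components that react to it,
        # and route the event to each such component's queue.
        for identifier, component in self.components.items():
            if event.event_type in component.trigger_events:
                self.queues[identifier].put(event)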

Discussion with @jieguangzhou