superduper-io / superduper

Superduper: Integrate AI models and machine learning workflows with your database to implement custom AI applications, without moving your data. Including streaming inference, scalable model hosting, training and vector search.

Home Page: https://superduper.io

[LOGGING-IMPR] Improve quality and usability of logging

blythed opened this issue

LOGGING

Usage of Logging Levels

  • ERROR: Notifications that the system/application/data will inevitably encounter errors as a result of the current action. For example:
    • Database connection failed (possibly CRITICAL level?)
    • Application load failed
    • Data query failed
  • WARNING: Notifications that the system/application/data may encounter errors as a result of the current action. For example:
    • A metadata update reports no errors but does not actually succeed.
    • An identifier is set improperly during component encoding.
    • predict_in_db on a model is passed IDs, but empty data is retrieved.
  • INFO: Used to display general system/application status, keeping the data shown as concise as possible. For example:
    • When displaying data IDs, show the count instead of the full list.
    • In the model’s predict_in_db method, print the current model identifier, the select/ids, the amount of data predicted, and the number of results returned.
    • When applying a component, display the component’s type_id, identifier, version, list of children, and the number of jobs generated.
  • DEBUG: Information printed while debugging the program or data. For example:
    • Decode/Encode: input and output data, and key intermediate values during recursion.
    • Inserting and reading data: relevant data volumes and details of key processed records.
    • Additional debug information at specific points encountered during bug fixing.

Logging Scope

All logs need to be categorized (loguru’s bind can be used for this):

  • All Leaf bindings should be categorized by type_id + identifier.
  • Classes with a limited number of instances should use the class name, e.g. DataBackend.
  • Shared method groups should each have a unique name, e.g. DECODE/ENCODE.
  • General methods should use the full module name along with the line number.

Example with Components

components

  • Component
    • INFO:
      • read/export
  • Listener:
    • INFO
      • Jobs related to creation/start
      • Cleanup
      • Creation of model output tables
    • Debug
      • Detailed process information during triggers
  • Model:
    • WARNING:
      • Inconsistencies in data volume during predict_in_db predictions, empty query results, and other data anomalies
      • Errors occurring on some of the data during predict_batches
    • INFO
      • Jobs related to creation/execution
      • Key summarized information in predict_in_db/validate_in_db
      • auto_schema inference for the model output
    • DEBUG
      • Detailed job information
      • Detailed information in predict_in_db/validate_in_db
      • Link information during call for graph/eager mode construction
  • Vector Index:
    • INFO
      • Jobs related to creation/execution
      • Summary information for Vectors add/delete/query operations (including copy_vectors/delete_vectors from other areas)
    • DEBUG
      • Detailed information on returned data and scores during queries
  • Other Components:
    • Design INFO and DEBUG levels according to their specific characteristics

The above is an example for one module. For the following modules, the logging levels should be designed in the same way, following the logging standards above.

Queue/Jobs

Datalayer

DataBackend/MetadataStore/ArtifactStore

Encode/Decode

Other General Methods

OTHER

Configuration

Global and local logging levels can be configured through settings:

  • GLOBAL
  • Per-category configuration, at the granularity of the bind

Log Files

For now, use a single unified log file, but logging policies should be configurable, such as size-based rolling, time-based rolling, etc.