Meghdeep / InsightCore

Workers for Insight v2 rewrite.

InsightCore

InsightCore is a complete rewrite of the popular Insight Discord bot for EVE Online. Insight became difficult to maintain, develop, and test, and its code is hard for collaborators to understand when making contributions.

Core is built using Celery task queues and aims to fix problems faced by the original project.

This project contains numerous improvements and fixes over the original bot:

| Component / Category | InsightCore | Insight Discord bot |
| --- | --- | --- |
| Bootstrapping | None. Data is queried and cached from ESI as needed and isn't stored in a relational database. Stream configuration is stored in JSON and loaded into Redis. | Bootstrapping is required for the initial SDE dump on a fresh DB (45+ minutes) and for feed loading on every startup (1-25 minutes). Stream config is loaded into SQLAlchemy ORM objects, which is slow. |
| Stream / feed config | One stream model fits all. The stream type (entity, radar, proximity) is abstract and defined by the configuration of the model; the model doesn't define the stream type. Stream filtering supports including / excluding all attributes from mails by default. | The model defines the stream type, which increases development complexity when adding features. Support for custom / new feed types is developmentally costly. |
| Configuration of streams | Users submit the stream model JSON to a web API to update feeds. Streams can be configured through a web UI (coming soon!) or with user-written scripts that modify and submit stream config JSON directly to the web API. For example, users can easily write scripts that auto-update a proximity feed based on their dynamically changing location to show nearby hostiles (see the example script after this table). | Feed config is managed through Discord text prompts. Every option requires code for Discord command prompts, database inserts / updates, and lookups. Development of feed config changes is costly. Users cannot easily write scripts to update their feeds. |
| Scaling | Tasks are stateless and InsightCore instances can be scaled without restriction across multiple hosts and CPUs. Task workers on a node use multiprocessing fork, so workers are not limited by the Python GIL. | No scaling. Only one Insight instance can run at a time, and it is limited to one CPU core by the Python Global Interpreter Lock. |
| Platform agnostic | Built primarily as a Celery message queuing service that filters EVE Online mails. Targets (Discord, Slack, etc.) are simple integrations on top of the existing platform and are easily added. | Built completely around Discord with the discord.py bot library. Cannot support anything but Discord as it's built around discord.py. |
| Features | Only KM filtering and posting; no extra features or commands. Core is meant to be as simple as possible and do one task extremely well without excessive features. | Excessive features and commands were added, making the project scope difficult to maintain and test. |
| Collaboration | Core is well documented and compartmentalized so components are easy for collaborators to understand and maintain. | Code is not well documented and is hard to follow. Community contribution is extremely difficult. |
| Code Quality | Nearly 1.5K lines of code across 38 files. Code is as simple as possible, divided into easy-to-understand linear stages without excessive dependencies. When Core is complete, the line count shouldn't be much higher than it is now. | Nearly 15K lines of code across 248 files. Difficult to maintain, test, and debug. The project size is a barrier for new contributors. |
| Database / Storage | Core only stores stream config in MongoDB. Historical mails, ESI data, etc. are never archived or stored in a database; this data lives in Redis / cache until it expires (minutes to days). Storage usage will likely never grow beyond 100MB for a deployment, even after years of running. | All ESI and KM data is stored in a SQLite or Postgres database, and some features (local scan, search) query it for past data. The database grows over time; the current public bot consumes 30GB after years of running. |
| Cost | Core can easily be integrated with auto scaling services such as AWS ECS or EC2 auto scaling groups to scale out based on load. | Resources must be scaled up whenever more compute / memory or database capacity is needed, often requiring a lengthy bootstrapping process and a restart. Provisioned compute resources are always running and billed regardless of load. |
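
Because a stream is just JSON submitted to the web API, a user-side script that keeps a proximity feed pointed at a pilot's current system can be very small. The sketch below is illustrative only: the endpoint URL, authentication header, and stream field names are assumptions, not the actual InsightCore API.

```python
import requests

# Hypothetical endpoint and stream fields, shown only to illustrate the idea
# of scripting stream updates; the real InsightCore web API may differ.
API_URL = "https://insightcore.example.com/api/streams"
API_TOKEN = "replace-with-your-token"

def update_proximity_stream(stream_id: str, solar_system_id: int) -> None:
    """Submit updated stream config JSON so a proximity feed follows our location."""
    stream_config = {
        "id": stream_id,
        "filters": {
            # The stream "type" is implied by configuration rather than a model class,
            # so a proximity feed is just a filter on the system(s) we care about.
            "include": {"solar_system_id": [solar_system_id]},
            "exclude": {},
        },
    }
    response = requests.post(
        API_URL,
        json=stream_config,
        headers={"Authorization": f"Bearer {API_TOKEN}"},
        timeout=10,
    )
    response.raise_for_status()

if __name__ == "__main__":
    # Example: repoint the feed at Jita (solar system ID 30000142).
    update_proximity_stream("my-proximity-feed", 30000142)
```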

Tasks

InsightCore uses Celery tasks to parse mails, query ESI, and post messages. A task is a single function that consumes work from a RabbitMQ message queue.

Tasks pass completed work and requests to other tasks once a work stage is complete. Data between tasks is passed as JSON messages.

Core has a single linear pipeline task for receiving mails, parsing data, loading data from ESI, filtering mails against active streams, and posting those mails to targets (Discord, Slack). In addition, there are asynchronous standalone tasks that handle stream creation, updates, and deletions.
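
As a rough sketch of the pattern (not InsightCore's actual task signatures), a linear Celery pipeline in which each stage returns a JSON-serializable dict that becomes the next stage's input looks like this:

```python
from celery import Celery, chain

# Minimal sketch of a linear Celery pipeline passing JSON-serializable dicts
# between stages. Task names mirror the stages described below, but the
# bodies and signatures are illustrative, not the actual InsightCore code.
app = Celery("insightcore_sketch", broker="amqp://guest@localhost//")

@app.task
def process_mail_load_from_esi(mail_json: dict) -> dict:
    # Resolve IDs to names from cache / ESI and return an enriched mail model.
    mail_json["esi_resolved"] = True
    return mail_json

@app.task
def enqueue_mail_to_streams(mail_model: dict) -> dict:
    # Look up active streams and fan the mail out to per-stream filter tasks.
    mail_model["streams_enqueued"] = True
    return mail_model

# Each task's return value becomes the next task's argument; everything that
# crosses the broker is plain JSON.
pipeline = chain(process_mail_load_from_esi.s(), enqueue_mail_to_streams.s())
# pipeline.delay({"killmail_id": 12345})
```

Each stage can also be routed to its own queue, which is what lets workers on different hosts pick up different stages independently.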

The Pipeline Stages

The pipeline stages are listed below in processing order, along with the data each stage receives and emits. The pipeline is linear.

| Stage / Task | Incoming data from previous stages | Outgoing data to next stages | Notes |
| --- | --- | --- | --- |
| GetMailRedisQ | None (pipeline entry point). | Unparsed JSON from RedisQ to ProcessMailEnqueueESICalls and to ProcessMailLoadFromESI. | Runs as a Celery beat scheduled task. Receives JSON from RedisQ and sends it to the next stages. Uses Redis locking to ensure only one worker across all InsightCore nodes can access the RedisQ API at a time (see the locking sketch after this table). |
| ProcessMailEnqueueESICalls | Unparsed JSON from GetMailRedisQ. | None (enqueues ESI calls instead). | Gets all EVE IDs from the JSON mail that will be missing names and other data, and enqueues calls to ESI to resolve that data. |
| ProcessMailLoadFromESI | Unparsed JSON from GetMailRedisQ. | Mail model as JSON, with all names and data from ESI loaded, to EnqueueMailToStreams. | Continuously queries Redis to see whether data has been resolved by the ProcessMailEnqueueESICalls stage and generates a mail model with the updated ESI data. If data has not been resolved, this stage waits and retries until all data is loaded, then passes the mail to the EnqueueMailToStreams stage. |
| EnqueueMailToStreams | Completed mail model as JSON from ProcessMailLoadFromESI. | Mail and stream model to StreamFiltersStage1. | Queries MongoDB for all running streams and, for each stream, submits a task consisting of the mail and stream config JSON to the StreamFiltersStage1 queue. |
| StreamFiltersStage1 | Mail and stream model as JSON from EnqueueMailToStreams. | Visual model with nested mail and stream models as JSON to StreamFiltersStage2. | Runs filters for a single stream against the mail; filters in this stage require no additional ESI calls. If all filters pass, a visual model is generated and passed to StreamFiltersStage2 as JSON. If any filter fails, the task returns without passing data on. |
| StreamFiltersStage2 | Visual model from StreamFiltersStage1. | Visual model to the target platform task (Slack webhook, Discord webhook, etc.). | Runs filters for a single stream that require ESI calls (system distance from the filter to the mail system, etc.). This stage queues up ESI calls and retries the filter in a loop until the data is resolved. If all filters pass, the visual model is passed to the target platform task; if any filter fails, the task returns without passing data on. |
| PostDiscord | Visual model from StreamFiltersStage2. | None (posts to the Discord API). | Converts the visual model to a JSON payload for Discord embed webhooks and submits the data to the Discord API. |
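
The Redis locking used by GetMailRedisQ can be pictured with the minimal sketch below; the lock name, timeout values, and polling details are assumptions for illustration, not the project's actual implementation.

```python
from typing import Optional

import redis
import requests

# Sketch of a cross-node Redis lock: only the worker that acquires the lock
# polls the RedisQ API, no matter how many InsightCore nodes are running.
REDISQ_URL = "https://redisq.zkillboard.com/listen.php"
redis_client = redis.Redis(host="localhost", port=6379)

def poll_redisq_once() -> Optional[dict]:
    # timeout makes the lock self-expire if the holding worker dies mid-poll
    lock = redis_client.lock("locks:redisq-poll", timeout=30)
    if not lock.acquire(blocking=False):
        return None  # another worker somewhere in the cluster is already polling
    try:
        response = requests.get(REDISQ_URL, timeout=15)
        response.raise_for_status()
        # RedisQ wraps each mail in a "package" field; None means no new mail.
        return response.json().get("package")
    finally:
        lock.release()
```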

Standalone tasks

These tasks do not interact with the core pipeline stages.

| Stage / Task | Notes |
| --- | --- |
| CreateModifyStream | Takes incoming stream JSON data, validates the stream model, and updates the stream data in MongoDB (see the sketch below). |
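
A hedged sketch of a validate-and-upsert task in the spirit of CreateModifyStream is shown below; the collection name, required fields, and broker URL are assumptions, not InsightCore's actual schema.

```python
from celery import Celery
from pymongo import MongoClient

# Illustrative standalone task: validate incoming stream JSON, then upsert it
# into MongoDB. Runs outside the mail pipeline, triggered by API submissions.
app = Celery("insightcore_sketch", broker="amqp://guest@localhost//")
streams = MongoClient("mongodb://localhost:27017")["insight"]["streams"]

REQUIRED_FIELDS = {"id", "target_url", "filters"}  # assumed minimal schema

@app.task
def create_modify_stream(stream_json: dict) -> str:
    missing = REQUIRED_FIELDS - stream_json.keys()
    if missing:
        raise ValueError(f"stream config missing fields: {sorted(missing)}")
    # Upsert so a single task handles both creating and modifying a stream.
    streams.replace_one({"id": stream_json["id"]}, stream_json, upsert=True)
    return stream_json["id"]
```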

About

License: GNU General Public License v3.0

