[FR] Make RuleCollection Initialization Faster
eric-forte-elastic opened this issue
Summary
One of the largest contributors to unit test run time is the rule loader. A significant part of that time is spent adding and validating rules in the RuleCollection class's initialization function.
This issue proposes that, prior to any potential refactor of the rule loader, we make a minor update to the RuleCollection class to multi-thread adding rules via the init. While this is a minor change, it should provide noticeably faster load times, and thus faster unit tests.
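A minimal sketch of what threading the per-rule work might look like is shown here. It is not the RuleCollection implementation: `load_all` and `load_one` are hypothetical names, and `load_one` stands in for whatever loads and validates a single rule file. Workers only return values, and the mapping is assembled in the calling thread, so no shared collection is mutated concurrently.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import Callable, Dict, Iterable, Optional, TypeVar

T = TypeVar("T")


def load_all(
    paths: Iterable[Path],
    load_one: Callable[[Path], T],
    max_workers: Optional[int] = None,
) -> Dict[Path, T]:
    """Apply load_one to every path in a thread pool and key the results by path.

    load_one is a hypothetical stand-in for the real per-rule
    load-and-validate step; it is an assumption, not a detection_rules API.
    """
    path_list = list(paths)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in input order; an exception raised in a
        # worker is re-raised here when its result is retrieved.
        results = list(pool.map(load_one, path_list))
    # The dictionary is built in the calling thread only.
    return dict(zip(path_list, results))
```

For example, `load_all(Path("rules").rglob("*.toml"), Path.read_text)` would read every rule file through the pool. Whether that is any faster depends on how much of the per-file work is waiting on I/O rather than Python-level parsing and validation.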
Update 12/20/23
Upon further experimentation, we discovered that simply multi-threading the loading of the rule files and/or the init of the RuleLoader can have unintended consequences. While unit test speed may improve depending on configuration (see PR for more details), a basic instantiation of the RuleLoader takes longer with multi-threading than without it. Given this, it is expected that much of the execution time for loading the rules is I/O bound. As such, I would recommend closing this issue and deferring specific optimizations until we make broader updates to, or refactor, the RuleLoader class.
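One way to check the I/O-bound expectation before a larger refactor is to compare wall-clock time with CPU time for the same call. The helper below is only a sketch, and `wall_and_cpu` is a hypothetical name. A CPU-to-wall ratio close to 0 suggests the process mostly waits (for example on disk), while a ratio close to 1 would instead point to CPU-bound work, which CPython threads generally do not speed up because of the global interpreter lock.

```python
import time
from typing import Callable, Tuple


def wall_and_cpu(fn: Callable[[], object]) -> Tuple[float, float]:
    """Run fn once and return (wall_seconds, cpu_seconds).

    time.process_time() counts user + system CPU time of the whole process
    and excludes time spent sleeping or blocked.
    """
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    fn()
    wall = time.perf_counter() - wall_start
    cpu = time.process_time() - cpu_start
    return wall, cpu


if __name__ == "__main__":
    # time.sleep stands in for a call that mostly waits; swap in the real
    # loader call (e.g. RuleCollection.default) to measure it instead.
    wall, cpu = wall_and_cpu(lambda: time.sleep(0.2))
    print(f"wall={wall:.3f}s cpu={cpu:.3f}s cpu/wall={cpu / wall:.2f}")
```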
Test Script
import time
from detection_rules.rule_loader import RuleCollection
start_time = time.time()
rules = RuleCollection.default()
end_time = time.time()
execution_time = end_time - start_time
print(f"Execution time: {execution_time} seconds")
Timing Details
Base execution time, no multi-threading.
detection-rules on multi_thread_rule_loader [?] is v0.1.0 via v3.8.18 (venv) on eric.forte took 56s
❯ python test_rule_loader.py
Execution time: 76.79581332206726 seconds
Multi-threading only the file loading, which leads to rule loading errors.
detection-rules on multi_thread_rule_loader [!?] is v0.1.0 via v3.8.18 (venv) on eric.forte
❯ python test_rule_loader.py
Error loading rule in /home/forteea1/Code/clean_mains/detection-rules/rules/integrations/azure/defense_evasion_azure_service_principal_addition.toml
Error loading rule in /home/forteea1/Code/clean_mains/detection-rules/rules/integrations/google_workspace/collection_google_drive_ownership_transferred_via_google_workspace.toml
Error loading rule in /home/forteea1/Code/clean_mains/detection-rules/rules/integrations/google_workspace/initial_access_external_user_added_to_google_workspace_group.toml
Execution time: 236.1852207183838 seconds
Multi-threading just the init.
detection-rules on multi_thread_rule_loader [!?] is v0.1.0 via v3.8.18 (venv) on eric.forte
❯ python test_rule_loader.py
Execution time: 133.23614525794983 seconds
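Put as ratios against the baseline, the runs above come to roughly 3.1x slower for threaded file loading and 1.7x slower for the threaded init. The snippet below just reproduces that arithmetic. The constant and function names are made up for illustration, and each figure comes from a single run, so the ratios are indicative only.

```python
# Timings copied from the runs above, in seconds.
BASELINE = 76.79581332206726
THREADED_FILE_LOAD = 236.1852207183838
THREADED_INIT = 133.23614525794983


def slowdown(measured: float, baseline: float = BASELINE) -> float:
    """Return how many times slower `measured` is than `baseline`."""
    return measured / baseline


if __name__ == "__main__":
    print(f"threaded file loading: {slowdown(THREADED_FILE_LOAD):.2f}x baseline")
    print(f"threaded init: {slowdown(THREADED_INIT):.2f}x baseline")
```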
This has been moved to the Foundational Prep Meta and put back on deck.
In effect, this may be a duplicate of #2609.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
This has been closed due to inactivity. If you feel this is an error, please re-open and include a justifying comment.