Speed up message filtering
marxin opened this issue
Right now, we have quite a long list of regular expressions that are used for message filtering (Filters
in TOML configuration).
This can be very slow when a huge number of warnings/errors is emitted:
$ ./lint.py /tmp/binaries/python310-botocore-1.29.45-1.1.noarch.rpm -t -c configs/openSUSE
...
Check Duration (in s) Fraction (in %) Checked files
FilesCheck 9.5 95.9
If I print all messages, I get 35K messages with the title cross-directory-hard-link.
All of these are filtered out by:
Filters = [
...
'.*cross-directory-hard-link.*',
We might want to consider having a separate list of message titles that can be looked up quickly (for example, in a set) before falling back to the regular expressions.
A prototype patch:
diff --git a/configs/openSUSE/opensuse.toml b/configs/openSUSE/opensuse.toml
index fdb7ad1c..67b3937d 100644
--- a/configs/openSUSE/opensuse.toml
+++ b/configs/openSUSE/opensuse.toml
@@ -31,6 +31,10 @@ DisallowedDirs = [
"/etc/NetworkManager/dispatcher.d",
]
+FilterErrorTitles = [
+ 'cross-directory-hard-link',
+]
+
Filters = [
# Stuff autobuild takes care about
'.*invalid-version.*',
@@ -41,7 +45,6 @@ Filters = [
'.*non-versioned-file-in-library-package.*',
'.*hardcoded-path-in-buildroot-tag.*',
'.*no-buildroot-tag.*',
- '.*cross-directory-hard-link.*',
# Do not validate package rpm groups
'.*devel-package-with-non-devel-group.*',
diff --git a/rpmlint/filter.py b/rpmlint/filter.py
index db1a2c94..3519dacf 100644
--- a/rpmlint/filter.py
+++ b/rpmlint/filter.py
@@ -32,6 +32,7 @@ class Filter:
self.strict = config.strict
# list of filter regexes
self.filters_regexes = [re.compile(f) for f in config.configuration['Filters']]
+ self.filter_titles = set(config.configuration['FilterErrorTitles'])
# list of blocked filters
self.blocked_filters = set(config.configuration['BlockedFilters'])
# set of filters that are actually used in add_info
@@ -153,6 +154,8 @@ class Filter:
result_no_color = f'{filename}{arch}:{line} {level}: {rpmlint_issue}{detail_output}'
# unused-rpmlintrc-filter warnings should be skipped
if rpmlint_issue != 'unused-rpmlintrc-filter' and rpmlint_issue not in self.blocked_filters:
+ if rpmlint_issue in self.filter_titles:
+ return
for f in self.filters_regexes:
if f.search(result_no_color):
self.used_filters.add(f.pattern)
With the patch applied, the FilesCheck duration drops from 9.5 s to 0.5 s:
Check time report (>1% & >0.1s):
Check Duration (in s) Fraction (in %) Checked files
FilesCheck 0.5 55.6
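
To illustrate why the title lookup helps, here is a minimal standalone sketch that counts how many regex searches are needed with and without a title set. It is not the rpmlint code: `FILTERS`, `FILTER_TITLES`, `count_regex_searches` and the sample messages are hypothetical stand-ins, and it does not track `used_filters` the way the real `Filter` class does.

```python
import re

# Hypothetical stand-ins for the 'Filters' and 'FilterErrorTitles' config lists.
FILTERS = [
    '.*invalid-version.*',
    '.*no-buildroot-tag.*',
    '.*cross-directory-hard-link.*',
]
FILTER_TITLES = {'cross-directory-hard-link'}


def count_regex_searches(messages, patterns, titles=frozenset()):
    """Filter (title, text) pairs; return (kept_texts, number_of_regex_searches)."""
    regexes = [re.compile(p) for p in patterns]
    kept, searches = [], 0
    for title, text in messages:
        # Fast path: a single hash lookup on the exact message title.
        if title in titles:
            continue
        for regex in regexes:
            searches += 1
            if regex.search(text):
                break  # filtered out by a regex
        else:
            # No filter matched, so the message would be reported.
            kept.append(text)
    return kept, searches


# 35,000 messages that should be filtered, plus one that should be kept.
messages = [
    ('cross-directory-hard-link',
     f'pkg.noarch: W: cross-directory-hard-link /a/{i} /b/{i}')
    for i in range(35_000)
]
messages.append(('zero-length', 'pkg.noarch: E: zero-length /a/empty'))

# Regex only: the matching pattern is last, so each filtered message costs 3 searches.
kept_slow, searches_slow = count_regex_searches(messages, FILTERS)
# Title set first: the regex list is one entry shorter and only one message reaches it.
kept_fast, searches_fast = count_regex_searches(messages, FILTERS[:-1], FILTER_TITLES)

print(searches_slow, searches_fast)
```

Both variants keep the same single message, but the regex-only variant performs a search per message per pattern, while the title set reduces that to one hash lookup for each filtered message. With a real configuration the regex list is much longer, so the gap would presumably be larger.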