Summarization mechanism for `port` labels in Flow metrics
rboucher-me opened this issue · comments
The `port` label associated with the `top_(in|out)_(src_dst)_ports_(bytes|packets)` flow metrics has very high cardinality, primarily because source client ports are randomly chosen (typically in the 49152–65535 range). Individual port numbers are of limited value, though aggregate measures could still be useful. This ticket proposes one approach to reducing the cardinality of the `port` label.
I suggest implementing `port` summarization as a by-product of the enrichment process, by which port numbers are converted to human-readable labels. This process could also "collapse" multiple port numbers into a smaller set of names, thereby providing rudimentary summarization.
Building upon the `/etc/services` file format, we could add support for lists and ranges (instead of single port numbers) to achieve this. Here is a sample file to illustrate the idea:
ftp 21/udp # File Transfer [Control]
ftp 21/tcp # File Transfer [Control]
ssh 22/udp # SSH Remote Login Protocol
ssh 22/tcp # SSH Remote Login Protocol
domain 53/udp # Domain Name Server
domain 53/tcp # Domain Name Server
...
dhcp 67,68/udp
dhcp 67,68/tcp
...
dynamic-client 49152-65535/udp
dynamic-client 49152-65535/tcp
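
To make the proposal more concrete, here is a minimal Python sketch of how such a file could be parsed and used for enrichment. It is only an illustration under assumptions: the names `parse_services` and `port_label`, the first-match-wins rule for overlapping entries, and the `other` fallback label for unmapped ports are all hypothetical and not part of any existing implementation.

```python
import re

# Hypothetical sample in the extended format proposed above.
SAMPLE = """\
ftp 21/udp # File Transfer [Control]
ftp 21/tcp # File Transfer [Control]
ssh 22/udp # SSH Remote Login Protocol
ssh 22/tcp # SSH Remote Login Protocol
domain 53/udp # Domain Name Server
domain 53/tcp # Domain Name Server
...
dhcp 67,68/udp
dhcp 67,68/tcp
...
dynamic-client 49152-65535/udp
dynamic-client 49152-65535/tcp
"""

# name, then a port spec (numbers, commas, hyphens), then "/protocol".
LINE_RE = re.compile(r"^(?P<name>\S+)\s+(?P<ports>[\d,\-]+)/(?P<proto>\w+)")


def parse_services(text):
    """Return a dict mapping (port, protocol) to a label."""
    table = {}
    for raw in text.splitlines():
        # Drop trailing comments; tolerate an en dash used as a range separator.
        line = raw.split("#", 1)[0].strip().replace("\u2013", "-")
        match = LINE_RE.match(line)
        if not match:
            continue  # blank lines, "..." placeholders, unparseable lines
        name, proto = match.group("name"), match.group("proto")
        for part in match.group("ports").split(","):
            low, _, high = part.partition("-")
            if not low:
                continue  # malformed piece such as a stray comma or hyphen
            for port in range(int(low), int(high or low) + 1):
                # Assumption: the first entry wins when ranges overlap.
                table.setdefault((port, proto), name)
    return table


def port_label(table, port, proto, default="other"):
    """Map a port to its summarized label; `default` is a hypothetical fallback."""
    return table.get((port, proto), default)


if __name__ == "__main__":
    table = parse_services(SAMPLE)
    for port in (22, 68, 50000, 8080):
        print(port, port_label(table, port, "tcp"))
```

Expanding each range into a flat lookup table keeps enrichment to a single dictionary access per flow, at the cost of a few tens of thousands of entries in memory. An interval-based structure would be a reasonable alternative if the file grows large.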