pmacct / pmacct

pmacct is a small set of multi-purpose passive network monitoring tools [NetFlow IPFIX sFlow libpcap BGP BMP RPKI IGP Streaming Telemetry].

Home Page: http://www.pmacct.net


Flow Stitching from VMware distributed switch

buckwheattb opened this issue

Hello,

We are collecting NetFlow from a VMware distributed switch, but when the flows arrive, specifically flows between two VMs that reside on different ESX hosts, all of the metrics are doubled, presumably because each ESX host is exporting its own NetFlow stream.

I have tried configuring the VDS to use a single IP address for all NetFlow exports, as well as setting it to use each ESX host as the sender address (peer_src_ip), with the same results. If we test a file copy operation from one VM to another on the same ESX host, all is good; however, the same test between two VMs on different ESX hosts results in all metrics being doubled.

I was under the impression that one of the functions of stitching was to accommodate multiple NetFlow sources sending the same observed flow. Is this correct? For example, two physical switches sending NetFlow about a connection between two hosts spanning those switches would be identified as the same flow and de-duplicated?

Our basic test config is below:

plugins: print[pmacct_full_test_9992]
aggregate[pmacct_full_test_9992]: src_host, dst_host, proto, src_port, dst_port
aggregate_filter[pmacct_full_test_9992]: (src net 192.168.50.120 or src net 192.168.50.123) and (dst net 192.168.50.123 or dst net 192.168.50.120)
print_output[pmacct_full_test_9992]: csv
print_output_file[pmacct_full_test_9992]: /etc/pmacct/pmacct_9992_test/%Y%m%d%H%M.csv
print_history[pmacct_full_test_9992]: 60s
print_history_offset[pmacct_full_test_9992]: m
print_history_roundoff[pmacct_full_test_9992]: m
print_output_file_append[pmacct_full_test_9992]: true
print_refresh_time[pmacct_full_test_9992]: 60

Is there another key or something we are missing?

Version
Output of nfacctd -V:

NetFlow Accounting Daemon, nfacctd 1.7.7-git [20211107-0 (ef37a41)]

Arguments:
'--enable-mysql' '--enable-pgsql' '--enable-sqlite3' '--enable-kafka' '--enable-geoipv2' '--enable-jansson' '--enable-rabbitmq' '--enable-nflog' '--enable-ndpi' '--enable-zmq' '--enable-avro' '--enable-serdes' '--enable-redis' '--enable-gnutls' 'AVRO_CFLAGS=-I/usr/local/avro/include' 'AVRO_LIBS=-L/usr/local/avro/lib -lavro' '--enable-l2' '--enable-traffic-bins' '--enable-bgp-bins' '--enable-bmp-bins' '--enable-st-bins'

Libs:
cdada 0.3.5
libpcap version 1.8.1
MariaDB 10.3.31
PostgreSQL 110013
sqlite3 3.27.2
rabbimq-c 0.11.0
rdkafka 1.8.2
jansson 2.14
MaxmindDB 1.6.0
ZeroMQ 4.3.2
Redis 1.0.3
GnuTLS 3.6.7
avro-c
serdes
nDPI 3.4.0
netfilter_log

Plugins:
memory
print
nfprobe
sfprobe
tee
mysql
postgresql
sqlite
amqp
kafka

System:
Linux 5.4.17-2136.307.3.1.el8uek.x86_64 #2 SMP Mon May 9 17:29:47 PDT 2022 x86_64

Compiler:
gcc 8.3.0

Your help in this is greatly appreciated.

Hi @buckwheattb ,

No, the flow stitching feature stitches two flows into one, summing their metrics (packets, bytes). In other words, it is not a de-duplication feature and, unfortunately, there is no feature in pmacct that would fit your use-case.

Paolo
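For reference, stitching would be switched on with the nfacctd_stitching key which, if I read pmacct's CONFIG-KEYS correctly, only attaches timestamp_min / timestamp_max to each aggregate rather than removing duplicates. A minimal sketch on top of the test config above:

! flow stitching only adds timestamp_min / timestamp_max to each aggregate;
! it does not de-duplicate flows reported by two different exporters
nfacctd_stitching: true
plugins: print[pmacct_full_test_9992]
aggregate[pmacct_full_test_9992]: src_host, dst_host, proto, src_port, dst_port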

Hmm…that’s a problem for sure.

So what is the strategy when traffic passes across multiple NetFlow exporters (i.e. multiple switches or firewalls that are all sending NetFlow)?

Do you have any ideas or suggestions on ways to de-dupe flows from multiple sources?

Perhaps post-processing based on timestamps? Or is there a flow ID / sequence ID that could be exposed?

Or is there a different solution that you are aware of that might handle this issue?

thanks

Hi @buckwheattb ,

The best de-dupe strategy, if you ask me, is first to pick a direction - say ingress - and stick to it, that is, only sample ingress flows; then, knowing which interfaces are edge and which are core, discard as part of post-processing all flows coming from core interfaces.

This concept holds up very well in real networks, but I am not familiar with a virtualized environment like yours. For example, say you have EU1 -> SW1 -> SW2 -> EU2, where the EUs are end-users and the SWs are your ESX hosts. When a flow goes from EU1 to EU2, it enters SW2 through a core interface (an interface with no end-users connected): what do you see in NetFlow as the ingress interface? Zero? Some index that you can easily reckon?

Basing the de-duplication on timestamps may work, but it is going to be slightly more complicated because the stamps will never match perfectly; there will surely be some time-locality between two identical flows, but YMMV. Also, this strategy requires logging every single 5-tuple flow to the database (along with timestamps) in order to perform the temporal de-duplication. Basing it on the ingress interface, as I was proposing in the previous paragraph, has the beauty that you can already aggregate data at collection time, as long as you retain the ingress interface info for the post-processing.

Paolo
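To keep that ingress interface (and exporter) info around for post-processing, the aggregation key in the test config could be extended with the in_iface and peer_src_ip primitives. A hedged sketch; which interface indexes count as core is of course specific to the environment:

! sketch: retain the exporter and the ingress interface so that flows
! seen on core interfaces can be discarded in post-processing
plugins: print[pmacct_full_test_9992]
aggregate[pmacct_full_test_9992]: peer_src_ip, in_iface, src_host, dst_host, proto, src_port, dst_port
print_output[pmacct_full_test_9992]: csv
print_output_file[pmacct_full_test_9992]: /etc/pmacct/pmacct_9992_test/%Y%m%d%H%M.csv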

Hi Paolo,

I was afraid of that...unfortunately, a VDS does not give the option of ingress/egress. I could potentially only process internal flows (communications within a single ESX host) and then possibly configure NetFlow on the core switch...but that may or may not be an option. I'll check on that.

I would like your input on the following sample data generated between two VMs that are hosted on separate ESX hosts (192.168.50.32 and 192.168.50.33):

I have color-coded each pair of duplicate flows:

[screenshot: flow records with duplicate pairs color-coded]

I'm thinking an option could be as follows: drop all of the flows into a pre-processing table, and then kick off a copy from the pre-processing table to the production table, where I only import based on the following criteria:

  1. grab the flows that match on src_ip, dst_ip, src_port, dst_port, protocol;

  2. in the case of traffic passing across ESX hosts, I should end up with a pair of flows (peer_src_ip <> peer_src_ip);

  3. once that pair is identified, simply drop one altogether and insert the other into the production DB table.

If I have an aggregate posting, say, every 5 minutes, that should be sufficient for that de-duplication logic to run. Each 5-minute "push" would be evaluated only on the records pushed; a duplicate across batches would be considered a new flow altogether using the same port. Totals and time-based charts would just be handled by the front-end query generating the data.
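A rough sketch of that per-batch pairing, assuming the de-duplication runs over the CSV files written by the print plugin in the test config above; the column names (SRC_IP, DST_IP, SRC_PORT, DST_PORT, PROTOCOL, PEER_SRC_IP, BYTES) and the keep-the-larger-counters policy are assumptions to be adapted:

#!/usr/bin/env python3
# Sketch: de-duplicate one 5-minute pmacct CSV batch on the 5-tuple.
# Column names are assumed (SRC_IP, DST_IP, SRC_PORT, DST_PORT, PROTOCOL,
# PEER_SRC_IP, BYTES) -- check them against the actual print plugin output.
import csv
import sys

def dedupe_batch(path):
    flows = {}  # 5-tuple -> record chosen for the production table
    with open(path, newline="") as fh:
        for row in csv.DictReader(fh):
            key = (row["SRC_IP"], row["DST_IP"],
                   row["SRC_PORT"], row["DST_PORT"], row["PROTOCOL"])
            prev = flows.get(key)
            if prev is None:
                flows[key] = row
            elif row["PEER_SRC_IP"] != prev["PEER_SRC_IP"]:
                # same flow reported by the other ESX host: keep only one copy;
                # one possible policy is to keep the record with more bytes
                if int(row["BYTES"]) > int(prev["BYTES"]):
                    flows[key] = row
    return list(flows.values())

if __name__ == "__main__":
    for record in dedupe_batch(sys.argv[1]):
        print(",".join(record.values()))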

I'd like your thoughts on this approach...any pitfalls that I could be missing?

I'll still check out the viability of configuring NetFlow on the core switches, but would like to have this approach as a backup.

Thank you again for your input and help

Hi @buckwheattb ,

The approach you are describing may indeed work. As for potential pitfalls, I am thinking of two:

  1. If you look at the color pairs in your screenshot, not all of them match perfectly -- see the yellow one as an example. Which of the two would you discard, and what happened to the rest of the packets / bytes on the other one?

  2. Because of these possible small time mis-alignments (which may not even be in the start / end times of the flows but rather in the export time from the ESX host to the collector), it may happen that one flow falls in a certain 5-minute batch and its duplicate in the following one. So, to be extra sure, you should introduce some latency in the post-processing / committing of flows, i.e. never process the last 5 minutes but process the penultimate batch and, if some duplicate is missing, check whether the last batch contains it.
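A possible shape for that delayed commit, again sketched over hypothetical per-batch CSV files with the same assumed column names as the sketch above: commit the penultimate batch and, for each committed flow, peek into the newest batch so that a late-arriving duplicate can be dropped when its turn comes.

#!/usr/bin/env python3
# Sketch: commit the penultimate 5-minute batch while looking ahead into the
# newest batch for duplicates that crossed the batch boundary.
# Column names (SRC_IP, DST_IP, SRC_PORT, DST_PORT, PROTOCOL, PEER_SRC_IP)
# are assumed, as in the previous sketch.
import csv

def load(path):
    with open(path, newline="") as fh:
        return list(csv.DictReader(fh))

def five_tuple(row):
    return (row["SRC_IP"], row["DST_IP"],
            row["SRC_PORT"], row["DST_PORT"], row["PROTOCOL"])

def commit_penultimate(penultimate_csv, latest_csv):
    latest = load(latest_csv)
    committed, seen, skip_later = [], set(), set()
    for row in load(penultimate_csv):
        key = five_tuple(row)
        if key in seen:
            continue                 # duplicate already paired within this batch
        seen.add(key)
        committed.append(row)        # commit one copy of the flow
        # the duplicate may only show up in the newest batch (boundary case):
        # remember it so it is skipped when that batch is processed in its turn
        for other in latest:
            if five_tuple(other) == key and other["PEER_SRC_IP"] != row["PEER_SRC_IP"]:
                skip_later.add((key, other["PEER_SRC_IP"]))
    return committed, skip_later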

Paolo