breuner / elbencho

A distributed storage benchmark for file systems, object stores & block devices with support for GPUs

csvfile - In-progress results over time, similar to FIO?

AeonJJohnson opened this issue

Greetings.

In testing elbencho with the csvfile option, I get a start and end summary that can be graphed. I am looking to graph performance over time, to identify flushes, page cache effects, storage device issues, etc.

Is there a way to run elbencho so that results are recorded to the csv file at an interval during the benchmark, say every 500 ms to 1 s?

Thanks

+1 for this feature if it doesn't exist. I've been fudging this by running multiple iterations and seeing how performance varies at a much coarser granularity, but being able to sample instantaneous performance at high frequency (acknowledging that the sampling itself might impact performance) would be really valuable.

Ok, so I know what I'll be doing this weekend :-)

Taking a few days longer than I originally expected, so not done yet, but I'm on it...

@AeonJJohnson , @glennklockwood : Done and uploaded.

It would of course be great if you could try it out and let me know whether it fits the intended purpose.

The corresponding new parameters are

  • --livecsv PATH: path to csv file for progress log, by default contains only aggregate results
  • --livecsvex: extended format, adds lines for individual thread/service results
  • --liveint N: update interval in milliseconds, defaults to 2000 (a combined example follows below)
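For instance, to get the 500 ms sampling interval asked about above together with per-thread lines, something along these lines should work (the workload flags and paths here just mirror the example further below):

$ elbencho -d -w -n 10 -N 100 -s 1m -t 16 --livecsv /tmp/live.csv --livecsvex --liveint 500 /mnt/smb/smallfiles/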

Here's a simple example for creation of a bunch of small files on my little home NAS box:

$ elbencho -d -w -n 10 -N 100 -s 1m -t 16 --livecsv /tmp/live.csv /mnt/smb/smallfiles/ 

The first lines of /tmp/live.csv:

$ column -x -s, -t -n /tmp/live.csv
Label  Phase  RuntimeMS  Rank   MixType  Done%  DoneBytes   MiB/s  IOPS  Entries  Entries/s  Active  CPU  Service  
       WRITE  2001       Total           11     1998585856  953    953   1890     945        16      47            
       WRITE  4000       Total           12     2163212288  78     78    2048     79         16      6             
       WRITE  6000       Total           13     2325741568  77     77    2202     77         16      9             
       WRITE  8000       Total           14     2474639360  71     71    2344     71         16      5             
[...]

=> So this shows the significant difference between the early cached writes and the much lower sustained rate after the first 2 seconds.
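For graphing over time, the live csv can be fed straight into a plotting tool. As a rough sketch, assuming gnuplot is available and that RuntimeMS and MiB/s are columns 3 and 8 as in the header above, an ASCII throughput-over-time plot right in the terminal could look like this:

$ gnuplot -e "set datafile separator ','; set key autotitle columnhead; set terminal dumb size 120,30; plot '/tmp/live.csv' using 3:8 with lines"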

Thanks, Sven! This looks great for my purposes. Verified by watching a 6 TB write.

[attached: graph of write performance over time for the 6 TB test]

Thanks for checking, Glenn! That's a very interesting graph, with the first 22 minutes showing quite extreme ups and downs and the remainder apparently not having such high peaks anymore.

@AeonJJohnson: I'm closing this issue under the assumption that the added feature works as intended. But of course please feel free to reopen if something doesn't seem right.