breuner / elbencho

A distributed storage benchmark for file systems, object stores & block devices with support for GPUs

csvfile - In-progress results over time, similar to FIO?

AeonJJohnson opened this issue

Greetings.

In testing elbencho with the csvfile option, I get a start and end summary that can be graphed. I am looking to graph performance over time, to identify flushes, page cache effects, storage device issues, etc.

Is there a way to run elbencho so that results are recorded to the csv file at an interval during the benchmark, say every 500 ms to 1 s?

Thanks

+1 for this feature if it doesn't exist. I've been fudging this by running multiple iterations and seeing how performance varies at a much coarser granularity, but being able to sample instantaneous performance at high frequency (acknowledging that the sampling itself might impact performance) would be really valuable.

Ok, so I know what I'll be doing this weekend :-)

Taking a few days longer than I originally expected, so not done yet, but I'm on it...

@AeonJJohnson , @glennklockwood : Done and uploaded.

It would of course be great if you could try it out and let me know whether it fits the intended purpose.

The corresponding new parameters are

  • --livecsv PATH: path to csv file for progress log, by default contains only aggregate results
  • --livecsvex: extended format, adds lines for individual thread/service results
  • --liveint N: update interval in milliseconds, defaults to 2000 (a combined example follows below)
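For instance, to get the 500 ms sampling interval asked about above together with per-thread lines, something along these lines should work (the workload flags and paths here just mirror the example further below):

$ elbencho -d -w -n 10 -N 100 -s 1m -t 16 --livecsv /tmp/live.csv --livecsvex --liveint 500 /mnt/smb/smallfiles/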

Here's a simple example for creation of a bunch of small files on my little home NAS box:

$ elbencho -d -w -n 10 -N 100 -s 1m -t 16 --livecsv /tmp/live.csv /mnt/smb/smallfiles/ 

The first lines of /tmp/live.csv:

$ column -x -s, -t -n /tmp/live.csv
Label  Phase  RuntimeMS  Rank   MixType  Done%  DoneBytes   MiB/s  IOPS  Entries  Entries/s  Active  CPU  Service  
       WRITE  2001       Total           11     1998585856  953    953   1890     945        16      47            
       WRITE  4000       Total           12     2163212288  78     78    2048     79         16      6             
       WRITE  6000       Total           13     2325741568  77     77    2202     77         16      9             
       WRITE  8000       Total           14     2474639360  71     71    2344     71         16      5             
[...]

=> So this shows the significant difference between the early cached writes and the much lower sustained rate after the first 2 seconds.
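For graphing over time, the live csv can be fed straight into a plotting tool. As a rough sketch, assuming gnuplot is available and that RuntimeMS and MiB/s are columns 3 and 8 as in the header above, an ASCII throughput-over-time plot right in the terminal could look like this:

$ gnuplot -e "set datafile separator ','; set key autotitle columnhead; set terminal dumb size 120,30; plot '/tmp/live.csv' using 3:8 with lines"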

Thanks, Sven! This looks great for my purposes. Verified by watching a 6 TB write.

[attached: graph of write performance over time for the 6 TB test]

Thanks for checking, Glenn! That's a very interesting graph, with the first 22 minutes showing quite extreme ups and downs and the remainder apparently not having such high peaks anymore.

@AeonJJohnson: I'm closing this issue under the assumption that the added feature works as intended. But of course please feel free to reopen if something doesn't seem right.