pdwittig / sdxl-lightning-bench

Just out of curiosity, I ran a quick-and-dirty benchmark of the 4-step SDXL-Lightning model (actually running 7 inference steps).

I ran these on Modal using both an L4 and an A10G, and the A10G was about 20% faster on average. The A10 has 2x the memory bandwidth of the L4 and about 3.3% more tensor compute. Given that these models are compute bound, the slightly higher compute explains only a small part of the speed-up; the rest presumably comes from the 2x memory bandwidth, which would suggest the workload isn't as purely compute bound as assumed.
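For reference, below is a minimal sketch of what such a harness could look like: a Modal function that loads the 4-step Lightning UNet into the SDXL base pipeline (following the loading recipe on the ByteDance/SDXL-Lightning model card) and times repeated generations. The prompt, run count, and exact setup are illustrative assumptions, not necessarily what this repo runs.

```python
# Sketch of the benchmark harness. Assumptions: Modal's App API, the
# diffusers loading recipe from the SDXL-Lightning model card, and an
# arbitrary test prompt -- not necessarily what this repo actually runs.
import time

import modal

image = modal.Image.debian_slim().pip_install(
    "torch", "diffusers", "transformers", "accelerate", "safetensors"
)
app = modal.App("sdxl-lightning-bench", image=image)


@app.function(gpu="A10G")  # swap to gpu="L4" for the L4 run
def bench(n_runs: int = 33, steps: int = 7) -> list[float]:
    import torch
    from diffusers import (
        EulerDiscreteScheduler,
        StableDiffusionXLPipeline,
        UNet2DConditionModel,
    )
    from huggingface_hub import hf_hub_download
    from safetensors.torch import load_file

    base = "stabilityai/stable-diffusion-xl-base-1.0"
    repo = "ByteDance/SDXL-Lightning"
    ckpt = "sdxl_lightning_4step_unet.safetensors"  # 4-step distilled UNet

    # Load the Lightning UNet into the SDXL base pipeline (per the model card).
    unet = UNet2DConditionModel.from_config(base, subfolder="unet").to("cuda", torch.float16)
    unet.load_state_dict(load_file(hf_hub_download(repo, ckpt), device="cuda"))
    pipe = StableDiffusionXLPipeline.from_pretrained(
        base, unet=unet, torch_dtype=torch.float16, variant="fp16"
    ).to("cuda")
    # Lightning checkpoints expect "trailing" timestep spacing.
    pipe.scheduler = EulerDiscreteScheduler.from_config(
        pipe.scheduler.config, timestep_spacing="trailing"
    )

    # Time each generation end to end. The first iteration pays one-time
    # warm-up costs, which is why the max in the raw results is an outlier.
    times = []
    for _ in range(n_runs):
        start = time.perf_counter()
        pipe("a photo of a lighthouse at dusk", num_inference_steps=steps, guidance_scale=0)
        times.append(time.perf_counter() - start)
    return times


@app.local_entrypoint()
def main():
    times = bench.remote()
    print(f"Average inference time: {sum(times) / len(times)}")
    print(f"Min inference time: {min(times)}")
    print(f"Max inference time: {max(times)}")
    print(f"Raw results {times}")
```

With a file like this, `modal run bench.py` would run one configuration; switching the `gpu=` argument between `"A10G"` and `"L4"` covers both GPUs below.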

A10 (FP16 Tensor Core - 250 teraFLOPS*)

  • Average inference time: 1.684442165090909 s
  • Min inference time: 1.628979968 s
  • Max inference time: 3.003188259 s
  • Raw results (seconds): [3.003188259, 1.628979968, 1.634157636, 1.633920185, 1.632166398, 1.634742384, 1.638032848, 1.636198288, 1.647106481, 1.642948748, 1.645220351, 1.652886258, 1.65037344, 1.655337061, 1.657762198, 1.64895866, 1.653973952, 1.637640212, 1.639242132, 1.638310127, 1.639992661, 1.640161564, 1.648551131, 1.643649482, 1.648825089, 1.646447859, 1.647533104, 1.644710121, 1.65310148, 1.64772509, 1.636447187, 1.636655842, 1.641645252]

L4 (FP16 Tensor Core - 242 teraFLOPS*)

  • Average inference time: 2.0296630644545455 s
  • Min inference time: 1.976649889 s
  • Max inference time: 2.955913207 s
  • Raw results (seconds): [2.955913207, 1.976649889, 1.984226713, 1.986912842, 1.986057031, 1.991211569, 1.978668463, 1.993407956, 1.992659455, 1.983810999, 1.995162692, 1.997734757, 1.981861345, 2.000222977, 1.995995473, 1.990108384, 1.997197408, 2.010701616, 2.001823559, 2.00060905, 2.004600128, 1.999529021, 2.002904233, 2.016194647, 2.016083237, 2.01816902, 2.015234098, 2.017169804, 2.012226661, 2.018453951, 2.022456305, 2.015755741, 2.019168896]

* Peak FP16 Tensor Core throughput with sparsity, per NVIDIA's datasheets (125 and 121 teraFLOPS dense, respectively).
