hidet-org / hidet

An open-source, efficient deep learning framework/compiler, written in Python.

Home Page: https://hidet.org


[Bug] How do you handle graph breaks coming from Dynamo?

tbaggu opened this issue

Hi

I am using this repo as a reference to implement a custom backend. During development, when I use Hugging Face models directly, I see a lot of graph breaks in the FX graph.

My understanding of the Inductor side is that each subgraph is compiled and run, its results are sent back to the CPU, and only then does the next subgraph start executing, which is time consuming. Is that correct?

So my question is: have you seen such graph breaks, and if so, how does Hidet handle them?

Similar to the case below:
https://discuss.pytorch.org/t/stitching-together-graph-breaks-for-large-compilation-units/194793/5
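
For reference, a minimal way to surface where Dynamo breaks a graph, assuming a recent PyTorch 2.x where torch._dynamo.explain is available (the function f and the print-induced break are illustrative, not from the original report):

import torch

def f(x):
    x = x * 2
    print('side effect')  # printing is one common cause of a graph break
    return x + 1

# explain() runs Dynamo tracing and reports graph and graph-break counts
explanation = torch._dynamo.explain(f)(torch.randn(4))
print(explanation)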

Hi @tbaggu ,

A custom Dynamo backend cannot control how the torch model is partitioned and converted into FX graphs.

Torch Dynamo dispatches each FX graph to the custom backend; the backend compiles the FX graph into an executable and returns it to Dynamo. The compilation happens only once, and the compiled executable is reused many times. As long as the compiled executable is efficient, the overhead will not be very large.
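
As a minimal sketch of that contract (my_backend and the nn.Linear model are illustrative, not Hidet's actual backend): a Dynamo backend is just a callable that receives an fx.GraphModule and example inputs and returns something callable.

import torch
import torch.nn as nn

def my_backend(gm: torch.fx.GraphModule, example_inputs):
    # Dynamo hands each traced subgraph here once; a real backend would
    # compile gm into an optimized executable before returning it
    print(gm.graph)
    return gm.forward  # returning forward unchanged simply falls back to eager

model = nn.Linear(4, 4)
model_opt = torch.compile(model, backend=my_backend)
model_opt(torch.randn(2, 4))  # traces, dispatches to my_backend, then executes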

Yes.

model_opt = torch.compile(model, backend='custom-backend')

model_opt(x) # the subgraph is compiled and then executed
model_opt(x) # the cached compiled executable is reused; no compilation happens
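
(One caveat, which is general torch.compile behavior rather than anything Hidet-specific: if a later call violates the guards recorded at compile time, for example different input shapes or dtypes, Dynamo may recompile that subgraph; otherwise the cached executable is reused.)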

For each subgraph, does the result have to come back to the CPU?

No, the result stays on the same device (CPU or GPU) as in eager execution of the original model.
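
As a quick check (reusing the hypothetical my_backend and model from the sketch above, and assuming a CUDA device is available):

model_opt = torch.compile(model.cuda(), backend=my_backend)
x = torch.randn(2, 4, device='cuda')
y = model_opt(x)
print(y.device)  # cuda:0 -- the output stays on the same device as in eager mode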

Closing as this issue is not directly related to Hidet.