pytorch / ort

Accelerate PyTorch models with ONNX Runtime

Running ORTModule with other EPs from ORT

chethanpk opened this issue

I am building a new wheel with the OneDNN EP using ONNX Runtime Training. After installing it, I install torch_ort and run the configure step, but it does not seem to work (I get the same error asking me to run the configure step again). The instructions have no recipe for this combination. Is this possible, or is there another way to build a custom wheel and use it to train a BERT model with OneDNN and ORT?

Hi @chethanpk, can you please post the output of the configure step?

@natke
C:\Users\WOS>python -m torch_ort.configure
running build
running build_ext
C:\Python37\lib\site-packages\torch\utils\cpp_extension.py:305: UserWarning: Error checking compiler version for cl: [WinError 2] The system cannot find the file specified
warnings.warn(f'Error checking compiler version for {compiler}: {error}')
building 'aten_op_executor' extension
Emitting ninja build file C:\Python37\lib\site-packages\onnxruntime\training\ortmodule\torch_cpp_extensions\build\temp.win-amd64-3.7\Release\build.ninja...
Compiling objects...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.28.29910\bin\HostX86\x64\link.exe /nologo /INCREMENTAL:NO /LTCG /DLL /MANIFEST:EMBED,ID=2 /MANIFESTUAC:NO /LIBPATH:C:\Python37\lib\site-packages\torch\lib /LIBPATH:C:\Python37\libs /LIBPATH:C:\Python37\PCbuild\amd64 "/LIBPATH:C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.28.29910\ATLMFC\lib\x64" "/LIBPATH:C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.28.29910\lib\x64" "/LIBPATH:C:\Program Files (x86)\Windows Kits\NETFXSDK\4.8\lib\um\x64" "/LIBPATH:C:\Program Files (x86)\Windows Kits\10\lib\10.0.19041.0\ucrt\x64" "/LIBPATH:C:\Program Files (x86)\Windows Kits\10\lib\10.0.19041.0\um\x64" c10.lib torch.lib torch_cpu.lib torch_python.lib /EXPORT:PyInit_aten_op_executor C:\Python37\lib\site-packages\onnxruntime\training\ortmodule\torch_cpp_extensions\build\temp.win-amd64-3.7\Release\Python37\lib\site-packages\onnxruntime\training\ortmodule\torch_cpp_extensions\aten_op_executor\aten_op_executor.obj /OUT:build\lib.win-amd64-3.7\aten_op_executor.cp37-win_amd64.pyd /IMPLIB:C:\Python37\lib\site-packages\onnxruntime\training\ortmodule\torch_cpp_extensions\build\temp.win-amd64-3.7\Release\Python37\lib\site-packages\onnxruntime\training\ortmodule\torch_cpp_extensions\aten_op_executor\aten_op_executor.cp37-win_amd64.lib
Creating library C:\Python37\lib\site-packages\onnxruntime\training\ortmodule\torch_cpp_extensions\build\temp.win-amd64-3.7\Release\Python37\lib\site-packages\onnxruntime\training\ortmodule\torch_cpp_extensions\aten_op_executor\aten_op_executor.cp37-win_amd64.lib and object C:\Python37\lib\site-packages\onnxruntime\training\ortmodule\torch_cpp_extensions\build\temp.win-amd64-3.7\Release\Python37\lib\site-packages\onnxruntime\training\ortmodule\torch_cpp_extensions\aten_op_executor\aten_op_executor.cp37-win_amd64.exp
Generating code
Finished generating code

Hi @natke, did you get a chance to take a look at this?

Hi @chethanpk, we currently do not support running torch_ort.configure on Windows. Have you tried it on a Linux machine?

@baijumeswani I will try it on Linux and let you know.

@baijumeswani I tried it on Linux, and training completed without erroring out at ORTModule. However, it is not using the OneDNN EP; it uses the default CPU EP instead.
Is there a way to configure ORTModule to use the OneDNN EP?
ONNX Runtime was installed from the wheel I built with OneDNN enabled.
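A quick first check is whether the installed wheel actually exposes the OneDNN EP. The sketch below uses `onnxruntime.get_available_providers()`, which lists the EPs compiled into the installed package; `DnnlExecutionProvider` is the name ONNX Runtime registers for the OneDNN (DNNL) EP. The helper names are illustrative only. Note that a provider appearing here only shows the wheel was built with it; as the replies below explain, ORTModule may still fall back to the CPU EP.

```python
import importlib.util

# Name under which ONNX Runtime registers the OneDNN (formerly DNNL) EP.
DNNL_PROVIDER = "DnnlExecutionProvider"


def installed_providers():
    """Return the EPs compiled into the installed onnxruntime wheel, or [] if it is absent."""
    if importlib.util.find_spec("onnxruntime") is None:
        return []
    import onnxruntime
    return onnxruntime.get_available_providers()


def wheel_has_dnnl(providers):
    """True if the OneDNN EP appears in the given provider list."""
    return DNNL_PROVIDER in providers


if __name__ == "__main__":
    providers = installed_providers()
    print("Available providers:", providers)
    print("OneDNN EP built in:", wheel_has_dnnl(providers))
```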

Thanks @chethanpk for reporting this. On closer inspection, it appears that ORTModule currently supports only the CUDA and ROCm execution providers. I will ask internally whether and how we can support other EPs such as OneDNN.
https://github.com/microsoft/onnxruntime/blob/master/orttraining/orttraining/python/training/ortmodule/_graph_execution_manager.py#L248-L250
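To illustrate why a OneDNN-enabled wheel still ends up on the CPU EP, here is a hypothetical sketch of a CUDA/ROCm-only provider selector. It assumes selection is keyed on the PyTorch device type (ROCm builds of PyTorch also report `"cuda"`) and is not the code at the linked lines; `select_providers` is an illustrative name. The provider strings are ONNX Runtime's real EP names.

```python
CPU_EP = "CPUExecutionProvider"
GPU_EPS = ("CUDAExecutionProvider", "ROCMExecutionProvider")


def select_providers(device_type, available):
    """Hypothetical selector that only knows about the CUDA and ROCm EPs."""
    if device_type == "cuda":
        for gpu_ep in GPU_EPS:
            if gpu_ep in available:
                # GPU EP first, CPU EP as the fallback for unsupported nodes.
                return [gpu_ep, CPU_EP]
    # Every other case lands on the default CPU EP, even when
    # DnnlExecutionProvider is present in `available`.
    return [CPU_EP]
```

With a CPU device and a OneDNN-enabled wheel, `select_providers("cpu", ["DnnlExecutionProvider", "CPUExecutionProvider"])` returns only the CPU EP, which matches the behavior reported above.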

A minor update: supporting other EPs in ORTModule is on our to-do list, but we don't have a timeline for it yet.

Is there any update on this? As a workaround, I build the wheel with the DNNL EP and force ORTModule to use the DNNL EP by default, but we need a solution that lets anyone else build and use it directly.
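One way to generalize that workaround, rather than hard-coding DNNL, is a selector that prefers the DNNL EP only when the installed wheel provides it. This is a hypothetical sketch, not ORTModule's API or the change the team plans; `select_providers_prefer_dnnl` is an illustrative name, and how such a selector would be wired into ORTModule is left open.

```python
CPU_EP = "CPUExecutionProvider"
DNNL_EP = "DnnlExecutionProvider"
GPU_EPS = ("CUDAExecutionProvider", "ROCMExecutionProvider")


def select_providers_prefer_dnnl(device_type, available):
    """Hypothetical selector: GPU EP on CUDA/ROCm devices, else DNNL if built in, else CPU."""
    if device_type == "cuda":
        for gpu_ep in GPU_EPS:
            if gpu_ep in available:
                return [gpu_ep, CPU_EP]
    if DNNL_EP in available:
        # CPU EP listed last as a fallback for nodes the DNNL EP cannot run.
        return [DNNL_EP, CPU_EP]
    # Wheels built without DNNL keep the default behavior.
    return [CPU_EP]
```

Because the DNNL branch is taken only when the provider is present, the same code would behave like the stock CPU path on wheels built without OneDNN, which is what lets other users build and run it unchanged.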

Hi @chethanpk, I'm the PM for this package. Can you reach out to me at nakersha@microsoft.com so we can discuss your use case?

Closing this issue now. Please re-open the issue in case we can provide more assistance through this channel.