Running ORTModule with other EPs from ORT

Question

chethanpk opened this issue 3 years ago

I am building a new wheel with the OneDNN EP using ONNX Runtime training. After installing it, I install torch_ort and run the configure step, but it does not seem to work (I get the same error asking me to run configure again). From the instructions, I see that there is no recipe for this combination. Is this possible, or is there another way for me to build a custom wheel and use it to train a BERT model with OneDNN and ORT?

Nat Kershaw (MSFT) · Answer 1 · Tue Aug 31 2021 04:19:02 GMT+0800 (China Standard Time)

Hi @chethanpk, can you please post the output of the configure step?

chethanpk · Answer 2 · Tue Aug 31 2021 04:21:07 GMT+0800 (China Standard Time)

@natke
C:\Users\WOS>python -m torch_ort.configure
running build
running build_ext
C:\Python37\lib\site-packages\torch\utils\cpp_extension.py:305: UserWarning: Error checking compiler version for cl: [WinError 2] The system cannot find the file specified
warnings.warn(f'Error checking compiler version for {compiler}: {error}')
building 'aten_op_executor' extension
Emitting ninja build file C:\Python37\lib\site-packages\onnxruntime\training\ortmodule\torch_cpp_extensions\build\temp.win-amd64-3.7\Release\build.ninja...
Compiling objects...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.28.29910\bin\HostX86\x64\link.exe /nologo /INCREMENTAL:NO /LTCG /DLL /MANIFEST:EMBED,ID=2 /MANIFESTUAC:NO /LIBPATH:C:\Python37\lib\site-packages\torch\lib /LIBPATH:C:\Python37\libs /LIBPATH:C:\Python37\PCbuild\amd64 "/LIBPATH:C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.28.29910\ATLMFC\lib\x64" "/LIBPATH:C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.28.29910\lib\x64" "/LIBPATH:C:\Program Files (x86)\Windows Kits\NETFXSDK\4.8\lib\um\x64" "/LIBPATH:C:\Program Files (x86)\Windows Kits\10\lib\10.0.19041.0\ucrt\x64" "/LIBPATH:C:\Program Files (x86)\Windows Kits\10\lib\10.0.19041.0\um\x64" c10.lib torch.lib torch_cpu.lib torch_python.lib /EXPORT:PyInit_aten_op_executor C:\Python37\lib\site-packages\onnxruntime\training\ortmodule\torch_cpp_extensions\build\temp.win-amd64-3.7\Release\Python37\lib\site-packages\onnxruntime\training\ortmodule\torch_cpp_extensions\aten_op_executor\aten_op_executor.obj /OUT:build\lib.win-amd64-3.7\aten_op_executor.cp37-win_amd64.pyd /IMPLIB:C:\Python37\lib\site-packages\onnxruntime\training\ortmodule\torch_cpp_extensions\build\temp.win-amd64-3.7\Release\Python37\lib\site-packages\onnxruntime\training\ortmodule\torch_cpp_extensions\aten_op_executor\aten_op_executor.cp37-win_amd64.lib
Creating library C:\Python37\lib\site-packages\onnxruntime\training\ortmodule\torch_cpp_extensions\build\temp.win-amd64-3.7\Release\Python37\lib\site-packages\onnxruntime\training\ortmodule\torch_cpp_extensions\aten_op_executor\aten_op_executor.cp37-win_amd64.lib and object C:\Python37\lib\site-packages\onnxruntime\training\ortmodule\torch_cpp_extensions\build\temp.win-amd64-3.7\Release\Python37\lib\site-packages\onnxruntime\training\ortmodule\torch_cpp_extensions\aten_op_executor\aten_op_executor.cp37-win_amd64.exp
Generating code
Finished generating code

chethanpk · Answer 3 · Sat Sep 18 2021 03:44:24 GMT+0800 (China Standard Time)

Hi @natke, did you get a chance to take a look at this?

Baiju Meswani · Answer 4 · Wed Sep 22 2021 04:31:13 GMT+0800 (China Standard Time)

Hi @chethanpk, we currently do not support running torch_ort.configure on a Windows machine. Have you tried this on a Linux machine?

chethanpk · Answer 5 · Wed Sep 22 2021 04:53:23 GMT+0800 (China Standard Time)

@baijumeswani I will try it on Linux and let you know.

chethanpk · Answer 6 · Fri Sep 24 2021 04:12:25 GMT+0800 (China Standard Time)

@baijumeswani I tried it on Linux and was able to complete the training; it did not error out at ORTModule. However, it is not using the OneDNN EP. It was using the default CPU EP.
Is there any way to configure it to use the OneDNN EP?
The ONNX Runtime installation was done using the wheel I built with OneDNN enabled.
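
A quick way to confirm that a custom wheel at least reports the OneDNN EP is sketched below. It relies on `onnxruntime.get_available_providers()` and on `DnnlExecutionProvider`, the provider name ONNX Runtime uses for OneDNN; the helper function names are hypothetical and exist only for this illustration. Note that an EP being listed as available does not mean ORTModule will select it, which is the situation described in this comment.

```python
# Sketch: check whether the installed ONNX Runtime build reports the OneDNN EP.
# The helper names below are illustrative, not part of any library.
DNNL_EP = "DnnlExecutionProvider"  # provider name for the OneDNN (DNNL) EP


def installed_providers():
    """Return the EPs reported by the installed onnxruntime, or [] if it is not installed."""
    try:
        import onnxruntime as ort
    except ImportError:
        return []
    return list(ort.get_available_providers())


def dnnl_is_available(providers):
    """Return True if the OneDNN EP appears in a list of provider names."""
    return DNNL_EP in providers


if __name__ == "__main__":
    providers = installed_providers()
    print("Available EPs:", providers)
    print("OneDNN EP built in:", dnnl_is_available(providers))
```

If the OneDNN EP is missing from the list, the wheel was most likely built without it; if it is present and training still runs on the CPU EP, the limitation is in how ORTModule chooses providers rather than in the build.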

Baiju Meswani · Answer 7 · Sat Sep 25 2021 00:13:36 GMT+0800 (China Standard Time)

Thanks @chethanpk for reporting this. On further investigation, it appears that we currently have support for the CUDA and ROCm execution providers through ORTModule. I will ask internally to see if/how we can support this.
https://github.com/microsoft/onnxruntime/blob/master/orttraining/orttraining/python/training/ortmodule/_graph_execution_manager.py#L248-L250
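
The behaviour described in this comment can be pictured roughly as follows. This is an illustrative sketch only, not the code at the linked lines: the function name and its arguments are hypothetical, and it simply assumes that the provider list is derived from the PyTorch device type, which would explain why a custom-built EP such as OneDNN is never picked up.

```python
# Illustrative sketch (hypothetical function, not ORTModule's actual source):
# the provider list depends only on the device type, so extra EPs compiled
# into the wheel are never considered.
def select_providers(device_type, is_rocm=False):
    """Return an ordered provider list for a given torch device type."""
    if device_type == "cuda":
        gpu_provider = "ROCMExecutionProvider" if is_rocm else "CUDAExecutionProvider"
        # The CPU EP is kept as a fallback for nodes the GPU EP cannot run.
        return [gpu_provider, "CPUExecutionProvider"]
    # Any other device falls back to the default CPU EP.
    return ["CPUExecutionProvider"]


if __name__ == "__main__":
    print(select_providers("cuda"))
    print(select_providers("cuda", is_rocm=True))
    print(select_providers("cpu"))
```

Under this assumption, a model placed on the CPU always ends up with the default CPU EP, matching what was observed on Linux with the OneDNN-enabled wheel.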

Wei-Sheng Chin · Answer 8 · Tue Nov 16 2021 08:09:02 GMT+0800 (China Standard Time)

Just a minor update: supporting other EPs in ORTModule is on our to-do list, but we don't have a deadline for it.

chethanpk · Answer 9 · Fri Mar 11 2022 03:34:13 GMT+0800 (China Standard Time)

Is there any update on this? I am currently working around it by building the wheel with the DNNL EP and forcing it to be used by default, but we need a way for anyone else to build and use it directly.

Nat Kershaw (MSFT) · Answer 10 · Fri Mar 11 2022 05:44:42 GMT+0800 (China Standard Time)

Hi @chethanpk, I'm the PM for this package. Can you reach out to me at nakersha@microsoft.com so we can have a conversation about your use case?

Baiju Meswani · Answer 11 · Wed May 11 2022 01:24:03 GMT+0800 (China Standard Time)

Closing this issue now. Please re-open the issue in case we can provide more assistance through this channel.