Deployment requirements based on libtorch or ONNX

Question

wangrui9720 opened this issue 3 months ago · comments

After training CTGAN, we want to call the model from C++ so that it can run in real time. We tried, but CTGAN cannot be exported to TorchScript or similar formats: its inputs and outputs are pandas DataFrames, whereas libtorch requires tensors for both. We really need a C++-based deployment method, which would make our software run more efficiently. We look forward to your suggestions!

SDV Team · Answer 1 · Wed Apr 17 2024 00:51:05 GMT+0800 (China Standard Time)

Hi @wangrui9720! It's great to see your interest in the SDV ecosystem. This comment is a reminder to consult your legal team before adopting the SDV into your project, as SDV (and most of the related libraries, such as CTGAN) has a source-available BSL license.

For more information, you can read through our license FAQs (not legal advice) or our blog. For any other questions, please refer to our Support Page. You can also inquire about a commercial license to allow additional use.

Srini Kadamati · Answer 2 · Fri May 10 2024 00:54:57 GMT+0800 (China Standard Time)

Hi there @wangrui9720, do you mind sharing a bit more about your use case? A few suggestions to consider:

GaussianCopulaSynthesizer, from SDV, is an alternative model that is significantly faster than our GAN-based models like CTGAN. SDV is our batteries-included framework that sits one level above CTGAN and offers a better user experience.
To speed up CTGAN model training, you can often get very good synthetic data quality with fewer rows than you think. You can read more about our thinking and advice here.

wangrui9720 · Answer 3 · Mon May 13 2024 09:54:48 GMT+0800 (China Standard Time)

This is the code I use to call the trained CTGAN model.

from ctgan import CTGAN
import pandas as pd


def load_ctgan_model():
    # Load the pickled CTGAN synthesizer from disk.
    model_path = 'Z:/project/pkl/ctgan-test.pkl'
    ctgan = CTGAN.load(model_path)
    return ctgan


def get_welding_parameters(ctgan, NG_piece, desired_rows=500, batch_size=100):
    # Keep sampling batches until enough rows match the requested NG_piece value.
    conditioned_data_list = []

    while len(conditioned_data_list) < desired_rows:
        generated_data = ctgan.sample(batch_size)
        # Keep only the rows whose 'slice' column equals NG_piece.
        new_data = generated_data[generated_data['slice'] == NG_piece]
        conditioned_data_list.extend(new_data.values)

    conditioned_data = pd.DataFrame(conditioned_data_list, columns=generated_data.columns)

    # Trim any surplus rows contributed by the last batch.
    if len(conditioned_data) > desired_rows:
        conditioned_data = conditioned_data.iloc[:desired_rows]

    average_welding_time = conditioned_data['time（ms）'].mean()
    average_welding_temp = conditioned_data['temp（℃）'].mean()

    return average_welding_time, average_welding_temp
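
The loop in the code above is a form of rejection sampling: it draws unconditioned batches and keeps only the rows that match the requested value. One thing worth noting is that it never terminates if the model never generates that value. The sketch below shows the same idea with a cap on the number of batches. It is only an illustration under stated assumptions: `sample_batch` is a hypothetical stand-in for `ctgan.sample`, rows are plain dicts rather than a DataFrame, and the column names `time_ms` and `temp_c` are made up for the example.

```python
import random


def sample_batch(batch_size, rng):
    """Hypothetical stand-in for ctgan.sample(batch_size); returns a list of row dicts."""
    return [
        {
            "slice": rng.choice(["OK", "NG"]),
            "time_ms": rng.uniform(10.0, 20.0),
            "temp_c": rng.uniform(200.0, 300.0),
        }
        for _ in range(batch_size)
    ]


def rejection_sample(target, desired_rows=500, batch_size=100, max_batches=1000, seed=0):
    """Draw batches and keep matching rows, giving up after max_batches draws."""
    rng = random.Random(seed)
    kept = []
    for _ in range(max_batches):
        batch = sample_batch(batch_size, rng)
        kept.extend(row for row in batch if row["slice"] == target)
        if len(kept) >= desired_rows:
            # Trim any surplus rows contributed by the last batch.
            return kept[:desired_rows]
    raise RuntimeError(
        f"only {len(kept)} of {desired_rows} matching rows after {max_batches} batches"
    )


def column_mean(rows, column):
    """Average one numeric column over a non-empty list of row dicts."""
    return sum(row[column] for row in rows) / len(rows)


rows = rejection_sample("NG", desired_rows=50, batch_size=20)
avg_time = column_mean(rows, "time_ms")
avg_temp = column_mean(rows, "temp_c")
```

With a real synthesizer, the cap turns a silent hang into an explicit error when the requested value is rare or absent in the training data.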

To deploy the trained CTGAN model for real-time output, my only option is to call this Python code from C++. The GaussianCopulaSynthesizer you mentioned is also Python code, so I would likewise have to call it from C++ to train it, right? Looking forward to your reply!

Srini Kadamati · Answer 4 · Tue May 21 2024 21:29:36 GMT+0800 (China Standard Time)

Ah, now I understand, @wangrui9720. You're correct that CTGAN and SDV don't currently support exporting just the machine learning model. The pkl file also contains a lot of Python library context, because all that context is usually needed to run the Synthesizer capabilities that generate synthetic data.

We have a feature request issue in SDV to enable the exporting of just the model weights: sdv-dev/SDV#1970

I'll close this issue off and will add your use case over there so we can collect more examples for the team to prioritize! Thanks!
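
To illustrate the point about the pkl file carrying Python library context, here is a small sketch using only the standard library. It is a toy, not the CTGAN file format: the `OrderedDict` and its `weights` and `bias` entries are hypothetical stand-ins for a trained model's state. The pickle stream stores the import path of the Python class it needs in order to rebuild the object, whereas a weights-only export (shown here as JSON) contains plain numbers that a non-Python consumer could read.

```python
import json
import pickle
import pickletools
from collections import OrderedDict

# Hypothetical model state; a real synthesizer holds far more than this.
state = OrderedDict(weights=[0.1, -0.2], bias=[0.05])


def pickled_strings(blob):
    """Collect every string argument stored in a pickle stream."""
    return [arg for _op, arg, _pos in pickletools.genops(blob) if isinstance(arg, str)]


# Protocol 4 is fixed here so the stream layout is the same across Python 3.8+.
blob = pickle.dumps(state, protocol=4)
strings = pickled_strings(blob)

# The stream names the module and class that must be importable at load time.
needs_python_class = "collections" in strings and "OrderedDict" in strings

# A weights-only export carries just the data, with no class references.
weights_json = json.dumps(dict(state))
restored = json.loads(weights_json)
```

In this toy case the only dependency is the standard library's `collections` module. For a pickled synthesizer, the equivalent references would point at the classes of the libraries that were installed when the file was saved, which is why such a file cannot be loaded outside a matching Python environment.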