alexandre01 / deepsvg

[NeurIPS 2020] Official code for the paper "DeepSVG: A Hierarchical Generative Network for Vector Graphics Animation". Includes a PyTorch library for deep learning with SVG data.


From Generation to Translation

riven314 opened this issue · comments

This is a very interesting work!
Do you think this model could be easily adapted to tackle the SVG translation problem (i.e. translating an SVG from one type to another)?
Do you have any related literature in mind for this line of work?
Would love to hear your thoughts!

Hey Alex,
Yes, for sure; we even had this kind of application in mind while developing DeepSVG. Instead of training for input reconstruction, just replace the output target with the desired translation.
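The target swap described above can be sketched in plain Python; the `translate` mapping below is a hypothetical placeholder for a real style conversion, not part of DeepSVG's API:

```python
def make_translation_pairs(svgs, translate):
    """Turn a reconstruction dataset into a translation dataset.

    For reconstruction, (input, target) == (svg, svg); for translation,
    the target becomes translate(svg), e.g. a style-converted version
    of the same icon.
    """
    return [(svg, translate(svg)) for svg in svgs]

# Toy example: a "translation" that upper-cases path commands,
# standing in for a real style mapping.
pairs = make_translation_pairs(["m 0 0 l 1 1"], str.upper)
# pairs[0] == ("m 0 0 l 1 1", "M 0 0 L 1 1")
```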

One such example could be SVG "beautification". Given a dataset of clean SVGs, randomly jitter it and feed it as input to the model, whose goal is now to reconstruct the clean SVG. This could maybe be used by graphic designers to automatically clean their vector drawings...
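As a rough sketch of how such a dataset could be built (jittering normalized control points; the helper below is an assumption for illustration, not DeepSVG code):

```python
import random

def jitter_points(points, scale=0.02, seed=None):
    """Perturb SVG control points to create noisy training inputs.

    `points` are (x, y) tuples in normalized [0, 1] coordinates; the
    untouched list is kept as the clean reconstruction target.
    """
    rng = random.Random(seed)
    return [(x + rng.uniform(-scale, scale),
             y + rng.uniform(-scale, scale)) for x, y in points]

clean = [(0.1, 0.1), (0.5, 0.9), (0.9, 0.1)]
noisy = jitter_points(clean, scale=0.02, seed=0)
# Training pair: model(noisy) should reconstruct `clean`.
```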

What kind of SVG translation do you have in mind?

"Beautification" is a good idea! I have 2 ideas in my mind:

  1. transferring one emoji from one style to another (e.g. author style, color ... etc.)
  2. Treating a layout as a SVG object (layout is a composition of different shapes), I wanna transform a layout from one kind to another. (having identical shapes, simply apply geometric transformation on different shapes) I wanna see if the model could capture the spatial pattern. And by the time interpolation is applied between two samples, could I see a smooth transition/ nice disentanglement between shapes.

Great!

  • While 1. sounds potentially feasible, I see the problem that you won't have enough training data, and the model will just overfit without the capacity to generalize.
  • 2. sounds like a great idea! On our side, we imagined, as potential applications/extensions to DeepSVG, doing HTML/CSS generation, treating the different 'div' sections as square boxes. The type you described has the added advantage that you can generate as much data as you want. And I'm pretty sure this Transformer-based architecture is capable of finding these spatial relationships :)

Thanks for your feedback! I am actually in a brainstorming stage, thinking about which direction is more rewarding and faster to iterate on, haha

  • For 1, I also foresee that limited data is an obstacle, especially since an author usually creates only a limited number of emojis. Do you think generic transfer learning could apply to this model (e.g. fine-tuning the decoder part with a few examples)?
  • For 2, this is a model capability I am really interested in, because such a capability could enable the model to be applied in many areas, such as layout design! To quickly validate the concept, I think RICO (a UI layout dataset) is a good fit. I am thinking of creating pseudo pairs (artificially transforming each layout into a particular form) in order to convert the dataset into one for the translation problem. To extend this further, the model could also take into account the image inside each placeholder. Do you think the current model would be able to encode a shape plus its enclosed image?
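Pseudo pairs of this kind could be generated by applying a known global transform to each layout. A minimal sketch, assuming layouts are lists of normalized (x, y, w, h) boxes (the helper name is made up for illustration):

```python
def transform_layout(boxes, dx=0.0, dy=0.0, sx=1.0, sy=1.0):
    """Scale then translate every (x, y, w, h) box in a layout.

    Pairing each original layout with a transformed copy yields
    artificial (source, target) pairs for a translation model.
    """
    return [(x * sx + dx, y * sy + dy, w * sx, h * sy)
            for x, y, w, h in boxes]

layout = [(0.0, 0.0, 0.5, 0.2), (0.0, 0.3, 1.0, 0.1)]
pair = (layout, transform_layout(layout, dx=0.1, sx=0.8))
```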

This is an interesting conversation, @riven314 and @alexandre01. I am actually using the Rico dataset currently, but the focus of my work is classification. @alexandre01, do you think this work could easily be retrained on UI component SVGs for classification?