emilwallner / Screenshot-to-code

A neural network that transforms a design mock-up into a static website.

OCR and font detection: which of these three approaches is better?

jonoNel opened this issue

I want to train the model on font detection and OCR using the links below, but I'm not sure which of these three options is the best way to do it:

  1. Train on top of the existing model, i.e. add the new data to the existing dataset.
  2. Train the networks independently but combine their outputs, like an ensemble model.
  3. Build a brand-new neural network using the logic and algorithms of the other neural networks.

OCR
https://github.com/Tony607/keras-image-ocr/blob/master/image-ocr.ipynb
https://mc.ai/how-to-train-a-keras-model-to-recognize-text-with-variable-length/

Fonts:
https://tsprojectsblog.wordpress.com/2017/08/19/using-a-neuronal-network-for-font-character-detection-in-images/
https://tanmayshah2015.wordpress.com/2015/12/01/synthetic-font-dataset-generation/

I’m still a noob when it comes to ML, but for fonts I think a whole new network would be overkill. We could use an existing tool like https://github.com/Vasile-Peste/Typefont, which uses Tesseract, and integrate it into our project somehow.

This is a great next step. I'd go for option 2.

As this project scales, I like the idea of having niche models, e.g. one for layout, one for font and text, and one for animations, etc. Then have an integration pipeline that fits everything together. This makes it more modular and easier to collaborate on.
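To make the modular idea concrete, here is a minimal sketch of option 2, assuming each niche model is just a callable that takes a screenshot path and returns its own partial result. The names `layout_model`, `text_model`, and `run_pipeline`, and the `[[d1]]` placeholder syntax, are hypothetical and not part of this repo; the stubs stand in for separately trained networks.

```python
from typing import Callable, Dict

# Each niche model takes the screenshot path and returns its own partial result.
NicheModel = Callable[[str], dict]


def layout_model(screenshot: str) -> dict:
    # Hypothetical stub for the existing screenshot-to-markup network.
    return {"html": '<div id="d1">[[d1]]</div>'}


def text_model(screenshot: str) -> dict:
    # Hypothetical stub for a separately trained OCR/font network.
    return {"snippets": [{"text": "Hello", "font": "Arial"}]}


def run_pipeline(screenshot: str, models: Dict[str, NicheModel]) -> Dict[str, dict]:
    """Run every model independently and collect the outputs by model name."""
    return {name: model(screenshot) for name, model in models.items()}


outputs = run_pipeline("mockup.png", {"layout": layout_model, "text": text_model})
print(outputs["layout"]["html"])
print(outputs["text"]["snippets"])
```

Because the models only meet in the integration step, each one can be retrained or swapped out without touching the others.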

The difficult aspect of text and font recognition is inserting it into the HTML. Here's what I'd start with:

  1. Try finding an existing model that extracts the text in a page, separates it by area, and then finds the font associated with each area. (I'd probably skip the font to start with to narrow down the problem, then eventually add at most 10-20 fonts.)
  2. Input training data:
    a) The screenshot including the correct text
    b) The HTML with unique div tags and a placeholder for the text
    c) One of the text snippets and a potential font tag.
    Output: The unique div tag that corresponds to the text snippet in c.
  3. Write a script that extracts all the text/fonts using existing OCR, makes a prediction for each text snippet and inserts it into the HTML.
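Step 3 could be sketched roughly as follows. This is only an illustration under stated assumptions: the OCR output is represented by a hypothetical `Snippet` class, the trained model from step 2 is replaced by a lookup table (`FAKE_MODEL`), and placeholders are assumed to look like `[[div_id]]` inside the HTML. None of these names come from the project.

```python
import html
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Snippet:
    text: str                   # text recovered by an OCR tool
    font: Optional[str] = None  # optional font label; can be left out at first


def fill_template(template: str,
                  snippets: List[Snippet],
                  predict_div: Callable[[str, Snippet], str]) -> str:
    """Place each OCR snippet into the div placeholder chosen by the model."""
    result = template
    for snippet in snippets:
        div_id = predict_div(template, snippet)  # model output: a unique div id
        content = html.escape(snippet.text)
        if snippet.font:
            content = f'<span style="font-family: {snippet.font}">{content}</span>'
        # Placeholders are assumed to look like [[div_id]] in the template.
        result = result.replace(f"[[{div_id}]]", content)
    return result


# Hypothetical stand-ins for the OCR step and the trained model.
TEMPLATE = '<div id="d1">[[d1]]</div><div id="d2">[[d2]]</div>'
SNIPPETS = [Snippet("Sign up", "Arial"), Snippet("Fish & chips")]
FAKE_MODEL = {"Sign up": "d2", "Fish & chips": "d1"}

page = fill_template(TEMPLATE, SNIPPETS, lambda _template, s: FAKE_MODEL[s.text])
print(page)
```

In a real version, `predict_div` would wrap the network trained on the (screenshot, HTML, snippet) inputs from step 2, and `SNIPPETS` would come from an existing OCR tool.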

Ok thank you