Kohulan / DECIMER-Image_Transformer

DECIMER: Deep Learning for Chemical Image Recognition using Efficient-Net V2 + Transformer


How to judge the image is consistent with SMILES?

mapengsen opened this issue · comments

When translating a molecular image to SMILES, how can I judge whether the two are consistent?

Or, how can I tell whether the predicted SMILES output is accurate?

More than 90% of the time, the SMILES generated by the DECIMER Image Transformer are accurate.
Moreover, you could visualize the generated structure from the predicted SMILES here: decimer.ai

Hey @mapengsen,

I see that you have also asked this question on the img2mol repository. It is not easy to get an "accuracy" value directly from DECIMER and img2mol. If you have some sort of ground truth, you can calculate the Tanimoto distance based on a fingerprint of your choice. Otherwise, you will have to perform a visual inspection on https://decimer.ai as @Kohulan recommended.
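If you do have a ground truth, a minimal sketch of that comparison with RDKit could look like this (Morgan fingerprints are just one possible choice, and the Tanimoto similarity reported here is 1 minus the Tanimoto distance):

```python
from rdkit import Chem, DataStructs
from rdkit.Chem import AllChem

def tanimoto(smiles_a: str, smiles_b: str) -> float:
    """Tanimoto similarity between two molecules via Morgan fingerprints."""
    mol_a = Chem.MolFromSmiles(smiles_a)
    mol_b = Chem.MolFromSmiles(smiles_b)
    if mol_a is None or mol_b is None:
        raise ValueError("One of the SMILES strings could not be parsed")
    fp_a = AllChem.GetMorganFingerprintAsBitVect(mol_a, radius=2, nBits=2048)
    fp_b = AllChem.GetMorganFingerprintAsBitVect(mol_b, radius=2, nBits=2048)
    return DataStructs.TanimotoSimilarity(fp_a, fp_b)

# Example: ground-truth SMILES vs. SMILES predicted from the image
print(tanimoto("CCO", "CCO"))  # identical molecules -> 1.0
```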

If you need an estimate of a confidence value, you could use OSRA with the "--p" flag, which will print one, but in our benchmarks, OSRA has proven to be a lot less robust than our solution. As compiling OSRA with all of its dependencies can be quite a struggle, I would highly recommend using it with Docker. I have updated a dockerised version to the latest OSRA version here.

Good luck with your work!

@OBrink I see you have contributed to both repositories. I am currently looking into your great work and models, and I was wondering: what is the main difference between DECIMER and img2mol? Do both support handwritten structure recognition, and do the features themselves differ?

Hey @iMicknl,
thanks for your interest! Both models share the same basic CNN + sequence-model structure. They use different convolutional neural networks to extract features from the images, and different sequence models to translate the feature vectors into SMILES representations of the molecules. Neither model has been trained on handwritten structures; the capability to read these types of images is a side effect of both models' ability to generalise relatively well.

In previous work, the Bayer team interpreted the compressed latent feature vector in their encoder-decoder architectures as a molecular descriptor. They showed that this 512-dimensional feature vector, which they call CDDD (continuous data-driven descriptors), can carry all the molecular information necessary to translate between SMILES and IUPAC name representations of molecules. They have done really cool work; have a look here for more information. For img2mol, they trained a CNN encoder to generate the CDDD from chemical structure depictions. On top of that, they use their previously published RNN decoder to generate SMILES strings from the CDDD. The img2mol CNN has been trained on approximately 10 million images (please correct me if I am wrong, that's the number I remember).

As img2mol is based on the CDDD work, some limitations directly derive from that:

  • stereochemistry cannot be encoded
  • Markush structures cannot be represented

The latest version of DECIMER uses EfficientNet V2 as the CNN and a transformer as the sequence model. @Kohulan has experimented with different RNN decoder models before, and we found that transformers outperform the RNNs by far. DECIMER has been trained on approximately 400 million data points plus ~100 million Markush structures. Of course, I am biased, as I am part of the group that develops DECIMER, but let me list some advantages of DECIMER:

  • capable of reading stereochemistry
  • capable of reading Markush structures
  • DECIMER outperforms every other available tool on common benchmark sets (and every other set that we have generated). We are working on a publication that shows these results; the current plan is to publish the preprint before Christmas. One outstanding example: on the handwritten structure dataset, we reach a 70% average Tanimoto similarity without the model having seen a single handwritten structure during training.
  • everything about DECIMER, the source code, the models and the data generation tool (RanDepict) is open-source and published under permissive licenses
  • DECIMER comes with a nice web app (https://decimer.ai) that combines the OCSR tool with DECIMER Segmentation for whole-document analysis. If you want to run it locally, you can use the source code available here to build it with docker-compose.
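For reference, a single prediction with the DECIMER Python package looks roughly like this (a sketch following the repository README, if I recall it correctly; the image path is a placeholder, and the exact entry point may differ between package versions):

```python
from DECIMER import predict_SMILES

# Translate a chemical structure depiction into a SMILES string
image_path = "path/to/structure_depiction.png"  # placeholder path
smiles = predict_SMILES(image_path)
print(smiles)
```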

As I mentioned, I might be biased, as I am part of the @steinbeck group at the University of Jena. Maybe we can get an opinion from someone on the @bayer-science-for-a-better-life team? :)

Have a nice day!

@OBrink
First of all, thank you very much for your warm reply.
It is very important to be able to input a 2D image, get its SMILES, and calculate the accuracy of this conversion process (I don't have a ground truth, because the 2D image may have been generated by a neural network). Being able to calculate the accuracy of this conversion automatically would also be very valuable.

@mapengsen
Thanks for the suggestion! We will discuss what would be necessary to implement something like a confidence value, but I am afraid that this feature does not exist in the current version of DECIMER, and I cannot make any promises.


Thank you for your reply. I'm currently doing research in this area. If you have any ideas about this kind of work in the future, please get in touch; I look forward to a possible collaboration.
@OBrink

[image: structure depiction produced by the user's generative model]

It is not easy to get an "accuracy" value directly from DECIMER and img2mol.
But can you give me some suggestions on how to evaluate the "accuracy" between "generated image A" and "translated SMILES B"?

Have you generated that image based on a SMILES string? Or is this something completely different? The wobbliness of the characters reminds me of some experiments that I have done with generative models for the generation of chemical structure depictions. Additionally, it is not a valid chemical structure. If you know the molecule that you depicted, you can calculate the Tanimoto distance based on a fingerprint of your choice. If you don't know the depicted molecule, I would recommend re-depicting the resolved molecule based on the SMILES string and comparing the two depictions manually. Or you generate the SMILES based on the generated image manually.
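A minimal re-depiction sketch with RDKit, assuming a hypothetical predicted SMILES and output file name:

```python
from rdkit import Chem
from rdkit.Chem import Draw

# Hypothetical SMILES predicted from the generated image
predicted_smiles = "CC(=O)Oc1ccccc1C(=O)O"

mol = Chem.MolFromSmiles(predicted_smiles)
if mol is None:
    print("The predicted SMILES is not a valid molecule")
else:
    # Write a clean depiction for manual side-by-side comparison
    Draw.MolToFile(mol, "redepicted.png", size=(400, 400))
```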

No matter what you do, you will not get around some tedious manual work if you don't know the depicted molecules.

If it is not an endless number of structures, I strongly recommend using our user interface at https://decimer.ai (see screenshot). If you are worried about the security of your data and don't want to upload anything to a web app, you can also run it locally (see here for how to do that).

[screenshot: the decimer.ai web interface]

I am now working on a generative model, and it will generate an endless number of molecular images. I want the accuracy between a "generated image" and its "SMILES" to be computed automatically by the machine. I think I can compute the similarity between the "generated images" and the "image2SMILES2image" re-depictions with some image metrics (SSIM, PSNR, ...).
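A rough sketch of that image-level comparison with scikit-image (the file names are placeholders; note that SSIM and PSNR penalise any difference in depiction layout or drawing style, not only chemical differences):

```python
from skimage.io import imread
from skimage.metrics import peak_signal_noise_ratio, structural_similarity
from skimage.transform import resize

# A  = image produced by the generative model
# A' = re-depiction of the predicted SMILES (e.g. drawn with RDKit)
img_a = imread("generated.png", as_gray=True)    # floats in [0, 1]
img_b = imread("redepicted.png", as_gray=True)

# Both metrics require identical image shapes
img_b = resize(img_b, img_a.shape, anti_aliasing=True)

print("SSIM:", structural_similarity(img_a, img_b, data_range=1.0))
print("PSNR:", peak_signal_noise_ratio(img_a, img_b, data_range=1.0))
```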

Thank you very, very much. You are very warm-hearted.

Let me know how it works out!

Does "image2SMILES2image" mean you want to generate an image, use DECIMER to get the SMILES, redepict it and determine the similarity between the original image and the re-depicted image? What is this good for? What kind of structures does your generative model produce?

[image: diagram of the pipeline A (generated image) → B (SMILES) → A' (re-depicted image)]

Compute the metrics between A and A'; B → A' is done with RDKit.
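Putting the thread's pieces together, the A → B → A' loop might be scripted roughly like this (a sketch only: `round_trip_score` and the file names are hypothetical, `predict_SMILES` is taken from the DECIMER README, and the layout-sensitivity caveat for pixel metrics applies here as well):

```python
from DECIMER import predict_SMILES
from rdkit import Chem
from rdkit.Chem import Draw
from skimage.io import imread
from skimage.metrics import structural_similarity
from skimage.transform import resize

def round_trip_score(generated_image_path: str) -> float:
    """Score a generated image A against its re-depiction A' (A -> B -> A')."""
    smiles = predict_SMILES(generated_image_path)            # A -> B (DECIMER)
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        return 0.0                                           # unparseable SMILES
    Draw.MolToFile(mol, "redepicted.png", size=(400, 400))   # B -> A' (RDKit)
    img_a = imread(generated_image_path, as_gray=True)
    img_b = resize(imread("redepicted.png", as_gray=True),
                   img_a.shape, anti_aliasing=True)
    return structural_similarity(img_a, img_b, data_range=1.0)
```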

It is important, because the accuracy can be used to backpropagate through the network and better update the network's gradients.