zhongsheng-chen / SDF_Converter

A SDF conversion tool which is intended to handle SDF files from Mass of North America (MoNA) database

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

A SDF Conversion Tool

This tool is intended to handle SDF files from Mass of North America (MoNA)

Package requirement

Dataset description

The dataset in SDF format used to test this tool was download from Vaniya-Fiehn Natural Products Library of MoNA. It was failed to retrieve molecules from its blocks, each of which starts with molecule's title line and ends at the four dollar signs ($$$$).

Motivation

For some reasons unknown or unveiled yet by MoNA, all the SDFs provided are not work well when loaded. It seems to me that I've gotten the reasons why they can not works well after aligning them to a standard SDF file. Following SDF format specifications, comparisons show that those dataset files are missing some necessary lines and M END ahead of atom's coordination and bond's connections. Please see SDF file specifications from a overview on Chemical Table File. In order to convert those bad SDFs to their good counterparts, both lines and M END required have been append to the raw files properly.

Usage

For converting a bad SDF dataset, followed by specifying the path to it, a converted SDF file would be written, as do a file for keeping that failed blocks if failed_block_file_name is designated.

$ python convert_sdf_utils.py \

            --path_to_bad_sdf=/sdf/like/file/path \

            --failed_block_file_name=/save/failed/block/to/file \

            --output_dir=/save/path/to/converted/sdf \

            --alsologtostderr

When loading molecules from the converted SDF, it is worth mentioning that you can reset the global constant variable MAX_ATOMS in mass_spec_constants.py to a proper value to passe out any molecule whose number of atoms is below MAX_ATOMS.

For example, a maximum number of atoms of the converted SDF for MoNA-export-HMDB.sdf is 92, so I assigned here MAX_ATOMS to 1000 so that make ensure all the molecules stored in that converted SDF can be fully loaded.

# Assign MAX_ATOMS to 1000 in mass_spec_constants.py
MAX_ATOMS = 1000
MAX_ATOM_ID = 1000

After that settings, you can load molecules from the converted SDF with

import parse_sdf_utils

def main():
    converted_sdf_name = 'path/to/converted/SDF'
    mol_list = parse_sdf_utils.get_sdf_to_mol(converted_sdf_name)

Acknowledgement

Some modules are imported from google brain team's efforts on deep-molecular-massspec, which give a easy way to parse molecules from SDFs.

About

A SDF conversion tool which is intended to handle SDF files from Mass of North America (MoNA) database

License:MIT License


Languages

Language:Python 100.0%