UnixJunkie / molenc

MolEnc: a molecular encoder using rdkit and OCaml.

molenc_dsmi

UnixJunkie opened this issue

  • featurize as an unfolded counted FP (vector length = dictionary size)
  • featurize as sequences of words (variable length, so add adequate padding; also uses the dictionary)
  • support output in .AP format, whatever the encoding type (bitstring, count vector, sequence)
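
The first two items can be illustrated with a minimal Python sketch. It is not molenc code: the function names, the toy tokens, and the padding value `-1` are all assumptions made for illustration, and each molecule is assumed to be already split into a list of words.

```python
from collections import Counter

def build_dictionary(molecules):
    """Map each distinct word to a stable integer index (first-seen order)."""
    dictionary = {}
    for words in molecules:
        for w in words:
            if w not in dictionary:
                dictionary[w] = len(dictionary)
    return dictionary

def count_vector(words, dictionary):
    """Unfolded counted fingerprint: one slot per dictionary entry."""
    vec = [0] * len(dictionary)
    for w, n in Counter(words).items():
        vec[dictionary[w]] = n
    return vec

def padded_sequence(words, dictionary, length, pad=-1):
    """Sequence of word indexes, right-padded with `pad` up to `length`."""
    seq = [dictionary[w] for w in words]
    if len(seq) > length:
        raise ValueError("sequence longer than target length")
    return seq + [pad] * (length - len(seq))

# Toy molecules, already tokenized into words (hypothetical tokens).
mols = [["C", "C", "O"], ["C", "N"]]
d = build_dictionary(mols)
fp = count_vector(mols[0], d)
seq = padded_sequence(mols[1], d, length=3)
```

Here the count vector always has as many entries as the dictionary, while the sequence keeps word order and is padded to a common length, for example the length of the longest molecule in the dataset.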

A sequence in AP format is obtained by counting how many times each transition (pair of consecutive words) is used.
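
As a rough sketch of that counting step, assuming a transition means two adjacent words in the sequence, the counts could be computed as follows. The function name and tokens are hypothetical, and the on-disk layout of the .AP format is not reproduced here.

```python
from collections import Counter

def transition_counts(words):
    """Count each ordered pair of adjacent words in the sequence."""
    return Counter(zip(words, words[1:]))

# Toy word sequence (hypothetical tokens).
word_seq = ["C", "C", "O", "C", "C"]
counts = transition_counts(word_seq)
```

A sequence of n words yields n - 1 transitions, so the counts for a sequence always sum to its length minus one.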