douweschulte / pdbtbx

A library to open/edit/save (crystallographic) Protein Data Bank (PDB) and mmCIF files in Rust.

Home page: https://crates.io/crates/pdbtbx

Utilities?

rvhonorato opened this issue:

Hey! I've been writing a wasm library that (heavily) relies on pdbtbx for a new frontend I'm working on. My goal is to do as much validation, processing, and parsing as possible on the client side.

I must say that at first I was a bit skeptical, but pdbtbx is exactly the tool for the job: the API is simple and straightforward. Congrats on that! 😅

Looking into the docs, I see there are already some utilities such as remove_atoms_by.

My question is whether it would be within the scope of the project to add utilities that are a bit more complex.

For example, in the library I mentioned I added:

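```rust
/// Returns a vector of tuples representing the chains in contact in a given PDB structure.
// ...
/// # Example
///
/// ```no_run
/// use pdbtbx::StrictnessLevel;
/// use wasm_lib::pdb::chains_in_contact;
///
/// // `PDB::new()` creates an empty structure; loading a file goes through `pdbtbx::open`
/// // (check its signature for the pdbtbx version in use).
/// let (pdb, _errors) = pdbtbx::open("path/to/pdb/file", StrictnessLevel::Medium).unwrap();
/// let contacts = chains_in_contact(pdb);
/// ```
pub fn chains_in_contact(structure: pdbtbx::PDB) -> Vec<(String, String)> {
    // ...
}
```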
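Since the body of `chains_in_contact` is elided, here is a std-only sketch of how such a contact check could work. It is not the author's implementation and not a pdbtbx API: `AtomRecord` is a hypothetical stand-in for the chain ID and coordinates one would extract from a parsed `pdbtbx::PDB`, and the `cutoff` distance (e.g. 5.0 Å) is an assumed contact criterion.

```rust
use std::collections::BTreeSet;

/// Hypothetical minimal atom record for this sketch (not a pdbtbx type).
#[derive(Debug, Clone)]
pub struct AtomRecord {
    pub chain_id: String,
    pub pos: [f64; 3],
}

/// Returns every unordered pair of chains that have at least one pair of atoms
/// within `cutoff` of each other (same units as the coordinates, e.g. Å).
/// Brute-force O(n^2) scan over all atom pairs.
pub fn chains_in_contact(atoms: &[AtomRecord], cutoff: f64) -> Vec<(String, String)> {
    let cutoff_sq = cutoff * cutoff;
    // BTreeSet removes duplicate pairs and keeps the output sorted.
    let mut pairs: BTreeSet<(String, String)> = BTreeSet::new();
    for (i, a) in atoms.iter().enumerate() {
        for b in &atoms[i + 1..] {
            if a.chain_id == b.chain_id {
                continue; // only inter-chain contacts are of interest
            }
            // Compare squared distances to avoid a sqrt per atom pair.
            let d_sq: f64 = a
                .pos
                .iter()
                .zip(b.pos.iter())
                .map(|(x, y)| (x - y).powi(2))
                .sum();
            if d_sq <= cutoff_sq {
                // Store each pair in sorted order so (A, B) and (B, A) collapse.
                let (lo, hi) = if a.chain_id < b.chain_id {
                    (&a.chain_id, &b.chain_id)
                } else {
                    (&b.chain_id, &a.chain_id)
                };
                pairs.insert((lo.clone(), hi.clone()));
            }
        }
    }
    pairs.into_iter().collect()
}
```

The quadratic scan keeps the sketch short; for very large complexes a spatial index over the atom coordinates would avoid comparing every atom pair.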

I also added other functions, e.g. to check whether the PDB contains non-canonical amino acids, to identify the molecule type based on the residue name, etc.
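The residue-name checks could be sketched with plain lookup tables. The function names and the `MoleculeType` enum are hypothetical; the tables cover only the 20 standard amino acids and the standard PDB nucleotide names (DA/DC/DG/DT for DNA, A/C/G/U for RNA), so modified residues such as MSE come out as `Other` or get reported as non-canonical. `non_canonical_residues` assumes it is given residue names from protein chains; otherwise waters and ligands would be reported as well.

```rust
/// Hypothetical molecule classes for this sketch.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum MoleculeType {
    Protein,
    Dna,
    Rna,
    Other,
}

/// The 20 standard amino acids as three-letter residue names.
const CANONICAL_AMINO_ACIDS: [&str; 20] = [
    "ALA", "ARG", "ASN", "ASP", "CYS", "GLN", "GLU", "GLY", "HIS", "ILE",
    "LEU", "LYS", "MET", "PHE", "PRO", "SER", "THR", "TRP", "TYR", "VAL",
];
/// Standard PDB residue names for DNA and RNA nucleotides.
const DNA_RESIDUES: [&str; 4] = ["DA", "DC", "DG", "DT"];
const RNA_RESIDUES: [&str; 4] = ["A", "C", "G", "U"];

/// Classifies a residue by its name alone (surrounding whitespace is ignored).
pub fn molecule_type(residue_name: &str) -> MoleculeType {
    let name = residue_name.trim();
    if CANONICAL_AMINO_ACIDS.contains(&name) {
        MoleculeType::Protein
    } else if DNA_RESIDUES.contains(&name) {
        MoleculeType::Dna
    } else if RNA_RESIDUES.contains(&name) {
        MoleculeType::Rna
    } else {
        MoleculeType::Other
    }
}

/// Returns each residue name that is not a standard amino acid,
/// once, in order of first appearance.
pub fn non_canonical_residues<'a>(residue_names: &[&'a str]) -> Vec<&'a str> {
    let mut found: Vec<&'a str> = Vec::new();
    for &name in residue_names {
        let name = name.trim();
        if !CANONICAL_AMINO_ACIDS.contains(&name) && !found.contains(&name) {
            found.push(name);
        }
    }
    found
}
```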

I'd be open to adding such utilities here instead, since IMO they could be useful to other users, but I also understand if that makes them too specific to each use case. Let me know what you think.

Thanks!
In general I approve of adding any method that would otherwise have to be implemented by more than one person for their own project. Feel free to open a PR with the functions you wrote and think could be useful; then we can discuss the individual functions there and maybe find a way of rewriting them to be more useful in the general case. Or, if you prefer, we could also discuss them here before we actually have to fit the functions into the pdbtbx code.

I think it works best case by case; I'll draft it up as soon as I can make time.

As for if it will be useful, I certainly hope so!

Any comments, @sverhoeven? I'll do this anyway for WeNMR; maybe we can also add it to the i-VRESSE projects.

No comments; indeed, using these utilities in i-VRESSE would be nice.

@rvhonorato Sorry to hijack this thread. Is your wasm lib public? I'd also like to use pdbtbx on the frontend and am interested to see how that's done.

@janosh Here you are: https://github.com/i-VRESSE/pdbtbx-ts. A very nice, somewhat recent feature of GitHub is that you can find forward and reverse dependencies on GitHub itself: https://github.com/douweschulte/pdbtbx/network/dependents.

@janosh The one @douweschulte posted above is a good example, awesomely made by @sverhoeven.

However, the way I'm using it in our new frontend is a bit different, since I ran into a few issues when trying to use the wasm lib together with Formik validation (probably because of my bad TS 😅). Here is a small snippet of how it ended up working:

```rust
use wasm_bindgen::prelude::*;

#[wasm_bindgen]
pub fn list_chains(bytes: js_sys::Uint8Array) -> JsValue {
    utils::set_panic_hook();
    // go from `js_sys::Uint8Array` to `pdbtbx::PDB` (omitted)
    let structure = pdb::load_pdb_from_bytes(bytes);
    let chains = pdb::identify_chains(structure);
    serde_wasm_bindgen::to_value(&chains).unwrap()
}

// in the `pdb` module:
pub fn identify_chains(structure: pdbtbx::PDB) -> Vec<String> {
    let mut chains: Vec<String> = Vec::new();
    // I know this for loop is not very rusty :P
    for chain in structure.chains() {
        chains.push(chain.id().to_string());
    }
    chains
}
```
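`identify_chains` above could also be written with iterators. This hypothetical, std-only helper additionally keeps only the first occurrence of each chain ID, which matters if the chains being iterated span several models (e.g. an NMR ensemble); whether that applies depends on what `structure.chains()` iterates over, which is worth checking in the pdbtbx docs. Assuming `Chain::id()` returns `&str`, it could be called as `unique_chain_ids(structure.chains().map(|c| c.id()))`.

```rust
use std::collections::HashSet;

/// Collects chain IDs in first-seen order, dropping repeats.
pub fn unique_chain_ids<'a, I>(ids: I) -> Vec<String>
where
    I: IntoIterator<Item = &'a str>,
{
    let mut seen = HashSet::new();
    ids.into_iter()
        // `insert` returns false for IDs already seen, so those are filtered out.
        .filter(|id| seen.insert(*id))
        .map(String::from)
        .collect()
}
```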
```tsx
// @ts-ignore
import init, { list_chains, chains_in_contact } from "wasm-lib";

// ...

const identifyChains = async (bytes: Uint8Array): Promise<string[]> => {
  try {
    await init();
    // `list_chains` is a synchronous wasm export, so it needs no `await`
    const result = list_chains(bytes);
    return result;
  } catch (error) {
    // error handling
    return [];
  }
};

const readPDB = async (file: File) => {
  return new Promise<void>((resolve, reject) => {
    const reader = new FileReader();
    reader.onload = async () => {
      if (reader.result && reader.result instanceof ArrayBuffer) {
        const arrayBuffer = reader.result;
        const atomBuffer = keepOnlyAtomRecords(arrayBuffer);
        const observedChains = await identifyChains(
          new Uint8Array(atomBuffer),
        );
        // ######
        // observedChains is `string[]` and can be set as a state and used in the component
        // ######
      }
      resolve();
    };
    reader.onerror = () => reject(reader.error);
    reader.readAsArrayBuffer(file);
  });
};

const handleFileChange = async (
  event: React.ChangeEvent<HTMLInputElement>,
) => {
  const file = event.target.files?.[0];

  if (file) {
    // ...
    if (file.name.endsWith(".zip")) {
      await readZIP(file);
    } else if (file.name.endsWith(".pdb")) {
      await readPDB(file);
    }
  }
};

const ProdigyForm = () => {
  // ... lots of stuff here
  return (
    <Form>
      {/* ... */}
      <Field // from Formik
        onChange={(e: ChangeEvent<HTMLInputElement>) => {
          handleFileChange(e);
        }}
        // ...
      />
      {/* ... */}
    </Form>
  );
};
```

In the snippet above, once a structure is uploaded to the field, the wasm lib is triggered and whatever chains are present are returned as a `string[]`. I have other functions that also calculate the contacts between chains, RMSDs, and all sorts of other utilities and validations; they all run very fast, even on very large complexes. Hope that helps!
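For the RMSD calculations mentioned above, a minimal hypothetical sketch (std only) assuming the two coordinate lists are already paired atom by atom and already superposed. A full RMSD calculation usually first finds the optimal superposition (e.g. with the Kabsch algorithm), which is omitted here.

```rust
/// Root-mean-square deviation between two paired, pre-superposed coordinate lists.
/// Returns `None` if the lists are empty or differ in length.
pub fn rmsd(a: &[[f64; 3]], b: &[[f64; 3]]) -> Option<f64> {
    if a.is_empty() || a.len() != b.len() {
        return None;
    }
    // Sum of squared distances over all atom pairs.
    let sum_sq: f64 = a
        .iter()
        .zip(b)
        .map(|(p, q)| (0..3).map(|k| (p[k] - q[k]).powi(2)).sum::<f64>())
        .sum();
    Some((sum_sq / a.len() as f64).sqrt())
}
```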