swd543 / semrep-data

Semantic web representation data from the NLM website for a host of drugs.

Read https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3540517/

Each concept has a UMLS Concept Unique Identifier (CUI), which is linked to one or more UMLS semantic types in the files MRSTY and SRSTRE1.
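As a rough illustration of the CUI-to-semantic-type link, the sketch below groups pipe-delimited MRSTY-style rows by CUI. It assumes the usual MRSTY column order (CUI, TUI, STN, STY, ATUI, CVF); the function name `load_semantic_types` and the sample rows are hypothetical and are not taken from this repository.

```python
from collections import defaultdict


def load_semantic_types(lines):
    """Map each CUI to the set of (TUI, semantic type name) pairs found in MRSTY-style rows."""
    cui_to_types = defaultdict(set)
    for line in lines:
        fields = line.rstrip("\n").split("|")
        if len(fields) < 4:
            continue  # skip blank or malformed rows
        cui, tui, _stn, sty = fields[:4]
        cui_to_types[cui].add((tui, sty))
    return dict(cui_to_types)


# Illustrative rows only; the ATUI and CVF values are made up.
sample = [
    "C0004057|T109|A1.4.1.2.1|Organic Chemical|AT00000001|256|",
    "C0004057|T121|A1.4.1.1.1|Pharmacologic Substance|AT00000002|256|",
]
types_by_cui = load_semantic_types(sample)
```

With real data, the same function could be fed an open file handle on MRSTY.RRF instead of the sample list.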

http://europepmc.org/article/PMC/4765595

CUI references

  • https://www.ncbi.nlm.nih.gov/books/NBK9676/pdf/Bookshelf_NBK9676.pdf
  • https://www.ncbi.nlm.nih.gov/books/NBK9685/pdf/Bookshelf_NBK9685.pdf

Workflow

  • Create a dictionary of the unique paragraphs in the CSV (ddossemrep.py)
  • Split the data so it can be processed in parallel (splitdata.py)
  • Fill the dictionary with output from SemRep (concurrentddossemrep.py)
  • Combine the split data into one big dictionary (combiner.py)
  • Verify that the dictionary is consistent and valid (debughash.py)
  • View the generated pickle dictionary (viewpickle.py)
  • Generate a CSV for each SemRep output in the dictionary (decompiler.py)
  • Generate a mapping file for each SemRep output (decompiler.py)
  • Generate RDF for each SemRep text (decompiler.py)
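The first five steps can be sketched roughly as follows. This is a minimal illustration, not the code in the scripts listed above: it assumes the dictionary is keyed by an MD5 hash of each paragraph, and `fake_semrep` is a hypothetical stand-in for the real SemRep call. All function names here are invented for the example.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor


def paragraph_key(text):
    """Hash a paragraph to a stable dictionary key (MD5 is an assumption)."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def build_dictionary(paragraphs):
    """Create one entry per unique paragraph; the SemRep output starts empty."""
    return {paragraph_key(p): {"text": p, "semrep": None} for p in set(paragraphs)}


def split(dictionary, parts):
    """Split the dictionary into `parts` smaller dictionaries."""
    keys = sorted(dictionary)
    return [{k: dictionary[k] for k in keys[i::parts]} for i in range(parts)]


def fake_semrep(text):
    """Hypothetical placeholder for running SemRep on one paragraph."""
    return f"processed:{text}"


def fill(chunk):
    """Fill one chunk with (placeholder) SemRep output."""
    return {k: {**v, "semrep": fake_semrep(v["text"])} for k, v in chunk.items()}


def combine(chunks):
    """Merge the filled chunks back into one dictionary."""
    merged = {}
    for chunk in chunks:
        merged.update(chunk)
    return merged


def is_consistent(dictionary):
    """Check that every key matches its text and every entry has been filled."""
    return all(
        paragraph_key(v["text"]) == k and v["semrep"] is not None
        for k, v in dictionary.items()
    )


paragraphs = ["drug A treats X", "drug B inhibits Y", "drug A treats X"]
empty = build_dictionary(paragraphs)
with ThreadPoolExecutor(max_workers=2) as pool:
    filled_chunks = list(pool.map(fill, split(empty, 2)))
result = combine(filled_chunks)
```

Keying by a content hash means duplicate paragraphs are processed only once, and each chunk can be filled independently before the merge.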

Input files

  • The provided CSV file is in (./XMLProduct_DBID_2/XMLProduct_DBID_2.csv)

Generated files

  • The most important file is (./temp/hashdump.pkl), which holds the filled dictionary in Python pickle format.
  • After running (decompiler.py), the generated CSV files are written to (./mapping/input/input{hash}.csv)
  • Mapping files are located at (./mapping/myfirstmapping.ttl) (drug-active ingredient) and (./mapping/mapmeta.ttl) (SemRep)
  • Output N-Triples (.nt) files are located at (./mapping/mf.nt) (SemRep) and (./mapping/o.nt) (drug-active ingredient)
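A small sketch of how the pickle and the per-hash CSV paths listed above might be handled. The helper names `load_dictionary` and `csv_path_for` are hypothetical, and the demo writes to a temporary directory with made-up contents rather than the real (./temp/hashdump.pkl). Only unpickle files you trust, since loading a pickle can execute arbitrary code.

```python
import pickle
import tempfile
from pathlib import Path


def load_dictionary(path):
    """Load a pickled dictionary from disk (trusted files only)."""
    with open(path, "rb") as fh:
        return pickle.load(fh)


def csv_path_for(key, root="./mapping/input"):
    """Build the input{hash}.csv path for one dictionary key."""
    return f"{root}/input{key}.csv"


# Demo with a throwaway file; the key and value are invented for illustration.
with tempfile.TemporaryDirectory() as tmp:
    dump = Path(tmp) / "hashdump.pkl"
    with open(dump, "wb") as fh:
        pickle.dump({"abc123": "example semrep output"}, fh)
    loaded = load_dictionary(dump)

paths = [csv_path_for(key) for key in loaded]
```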

