swd543 / semrep-data

Semantic web representation data from the NLM website for a host of drugs.

Read https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3540517/

Each concept is identified by a UMLS concept unique identifier (CUI), which is linked to one or more UMLS semantic types in the files MRSTY and SRSTRE1.
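As a rough illustration of the CUI-to-semantic-type link, the sketch below reads rows in the pipe-delimited MRSTY layout (CUI|TUI|STN|STY|ATUI|CVF) and groups semantic type names by CUI. The function name `parse_mrsty` and the sample rows are made up for this example and are not taken from this repository or from a real UMLS release.

```python
from collections import defaultdict


def parse_mrsty(lines):
    """Map each CUI to the set of semantic type names (STY) listed for it.

    Assumes the pipe-delimited MRSTY layout: CUI|TUI|STN|STY|ATUI|CVF|
    """
    cui_to_types = defaultdict(set)
    for line in lines:
        fields = line.rstrip("\n").split("|")
        if len(fields) < 4:
            continue  # skip blank or malformed rows
        cui, _tui, _stn, sty = fields[:4]
        cui_to_types[cui].add(sty)
    return dict(cui_to_types)


# Hypothetical rows for illustration only (identifiers are placeholders).
sample_rows = [
    "C0000001|T121|A1.4.1.1.1|Pharmacologic Substance|AT00000001||",
    "C0000001|T109|A1.4.1.2.1|Organic Chemical|AT00000002||",
    "C0000002|T047|B2.2.1.2.1|Disease or Syndrome|AT00000003||",
]

cui_types = parse_mrsty(sample_rows)
```

With a real file, the same function could be called on an open file handle, since it only iterates over lines.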

http://europepmc.org/article/PMC/4765595

CUI references:
https://www.ncbi.nlm.nih.gov/books/NBK9676/pdf/Bookshelf_NBK9676.pdf
https://www.ncbi.nlm.nih.gov/books/NBK9685/pdf/Bookshelf_NBK9685.pdf

Workflow

Create dictionary of unique paragraphs in CSV (ddossemrep.py)
Split data to be processed in parallel (splitdata.py)
Fill up the dictionary with output from semrep (concurrentddossemrep.py)
Combine split data into one big dictionary (combiner.py)
Verify that the dictionary is consistent and valid (debughash.py)
View the generated pickle dictionary (viewpickle.py)
Generate CSV for each semrep output in dictionary (decompiler.py)
Generate mapping file for each semrep output (decompiler.py)
Generate RDF for each semrep text (decompiler.py)
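The first steps of the workflow (build, split, combine, verify) can be sketched as follows. This is only an illustration of the idea, not the code in the scripts named above: the SHA-1 key, the `{"text", "semrep"}` entry layout, and all function names are assumptions made for this example.

```python
import hashlib


def text_hash(text):
    """Hash a paragraph to a stable key (SHA-1 is an assumption here)."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


def build_dictionary(paragraphs):
    """Keep one entry per distinct paragraph; SemRep output is filled in later."""
    table = {}
    for text in paragraphs:
        table.setdefault(text_hash(text), {"text": text, "semrep": None})
    return table


def split_dictionary(table, parts):
    """Deal the keys round-robin into `parts` smaller dictionaries."""
    keys = sorted(table)
    return [{k: table[k] for k in keys[i::parts]} for i in range(parts)]


def combine(chunks):
    """Merge the processed chunks back into one dictionary."""
    merged = {}
    for chunk in chunks:
        merged.update(chunk)
    return merged


def is_consistent(table):
    """Check that every key still matches the hash of its stored text."""
    return all(text_hash(entry["text"]) == key for key, entry in table.items())


# Hypothetical paragraphs standing in for rows of the input CSV.
paragraphs = [
    "Aspirin treats pain.",
    "Aspirin treats pain.",
    "Ibuprofen reduces fever.",
]
table = build_dictionary(paragraphs)
chunks = split_dictionary(table, 2)
merged = combine(chunks)
```

Keying by a hash of the text means duplicate paragraphs are sent to SemRep only once, and the chunks can be processed independently because their key sets do not overlap.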

Input files

The provided CSV file is at (./XMLProduct_DBID_2/XMLProduct_DBID_2.csv)

Generated files

The most important file is (./temp/hashdump.pkl), which holds the filled dictionary in Python pickle format.
After running (decompiler.py), the generated CSV files are written to (./mapping/input/input{hash}.csv)
Mapping files are located at (./mapping/myfirstmapping.ttl) (drug-active ingredient) and (./mapping/mapmeta.ttl) (semrep)
Output N-Triples (.nt) files are located at (./mapping/mf.nt) (semrep) and (./mapping/o.nt) (drug-active ingredients)
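A pickled dictionary such as the one described above can be inspected with the standard `pickle` module. The sketch below is a minimal example and not the contents of viewpickle.py: the helper names and the entry layout are assumptions, and it writes a small temporary file so that it runs without the real data. Pickle files should only be loaded from sources you trust.

```python
import os
import pickle
import tempfile


def load_dictionary(path):
    """Load a pickled dictionary from disk."""
    with open(path, "rb") as handle:
        return pickle.load(handle)


def summarize(table, limit=3):
    """Return a short, printable summary of the first few entries."""
    lines = [f"{len(table)} entries"]
    for key in list(table)[:limit]:
        lines.append(f"{key}: {str(table[key])[:60]}")
    return lines


# Hypothetical stand-in for the real dictionary; the layout is assumed.
sample = {
    "hash-a": {"text": "Aspirin treats pain.", "semrep": None},
    "hash-b": {"text": "Ibuprofen reduces fever.", "semrep": None},
}

with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.pkl")
    with open(demo_path, "wb") as handle:
        pickle.dump(sample, handle)
    loaded = load_dictionary(demo_path)

summary = summarize(loaded)
```

To look at the real file, call `load_dictionary` with the path to the pickle instead of the temporary demo file.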
