rociomer / data-sharing-perspective

Data and scripts to generate figures for the data sharing perspective.

data-sharing-perspective

Data and scripts to generate figures for the perspective "Data sharing in chemistry: lessons learned and a case for mandating structured reaction data" by R Mercado, SM Kearnes, and CW Coley.

Environment

You can use a conda environment to run the plotting scripts in this repo. To set up the environment, run:

conda create -n data-sharing-perspective seaborn -c anaconda
conda activate data-sharing-perspective

Generating plots

To create the plots for Figures 1 and 2 in the manuscript, run:

python plot-entries.py
python plot-contributors.py

The first script plots data entries over time and the second plots contributors/sources over time, for the databases listed under Raw data.

Files will be created in plots/.
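As a rough illustration of what a cumulative-over-time plot of this kind involves, here is a minimal sketch. It is not the code in plot-entries.py or plot-contributors.py: the function names, the yearly counts, and the output filename are all hypothetical, and the plotting helper assumes matplotlib is available (it is installed as a dependency of seaborn).

```python
from itertools import accumulate
from pathlib import Path


def cumulative_counts(yearly_counts):
    """Turn {year: new entries that year} into sorted years and running totals."""
    years = sorted(yearly_counts)
    totals = list(accumulate(yearly_counts[year] for year in years))
    return years, totals


def plot_cumulative(years, totals, label, out_path):
    """Save a line plot of cumulative totals over time (hypothetical helper)."""
    # Imported here so the counting helper above works without matplotlib.
    import matplotlib
    matplotlib.use("Agg")  # render to a file without needing a display
    import matplotlib.pyplot as plt

    out_path = Path(out_path)
    out_path.parent.mkdir(parents=True, exist_ok=True)
    fig, ax = plt.subplots()
    ax.plot(years, totals, marker="o", label=label)
    ax.set_xlabel("Year")
    ax.set_ylabel("Entries (cumulative)")
    ax.legend()
    fig.savefig(out_path, dpi=300, bbox_inches="tight")
    plt.close(fig)


# Made-up numbers purely for illustration; not taken from data/.
example = {2019: 5, 2020: 10, 2021: 20}
years, totals = cumulative_counts(example)
print(years, totals)  # [2019, 2020, 2021] [5, 15, 35]
```

Calling plot_cumulative(years, totals, "example", "plots/example.png") would then write a figure to plots/, mirroring where the real scripts place their output.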

Illustrator files

The Adobe Illustrator files used to make the figures shown in the paper are available in illustrator/.

Raw data

The raw data for the above plots is available in data/. For individual sources, see below:

CSD

Structures available in the CSD (cumulative): CSD structures
Data collected from:

PDB

Depositors for data in the PDB (cumulative): PDB depositors
Entries available in the PDB (cumulative): PDB entries
Data collected from:

PubChem

Sources for data in PubChem (cumulative): PubChem sources
Data entries in PubChem (cumulative): PubChem BioAssays
Data collected from:

* Accessed Dec 25, 2022.

ChEMBL

Sources (documents) for data in ChEMBL (cumulative): ChEMBL documents
Compound entries in ChEMBL (cumulative): ChEMBL compounds
Data collected from:

Links to icons used in paper figures

Flaticon images linked here (used freely with attribution): 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11
