soumyaxyz / abstractAnalysis

Abstract segmentation with sparse data

This is the source code for the paper "Segmenting Scientific Abstracts into Discourse Categories: A Deep Learning-Based Approach for Sparse Labeled Data" (arXiv preprint), presented at JCDL 2020.

Data

This repository includes the dataset of segmented CS abstracts.

Data | Source | Directory
PubMed-non-RCT [1] | Non-RCT articles from PubMed | PubMedData/
cs.NI | cs.networks subdomain from arxiv.org | arxiv_final/
cs.TLT | IEEE Transactions on Learning Technologies | IEEE_final/TLT/
cs.TPAMI | IEEE Transactions on Pattern Analysis and Machine Intelligence | IEEE_final/TPAMI/
cs.combined | cs.NI + cs.TLT + cs.TPAMI | Merged/

[1] The PubMed-non-RCT dataset was too large to include in this repository. The code to build the dataset is provided, along with a small sample of the data.

Embeddings

We used the Common Crawl (42B tokens, 300-dimensional) GloVe embeddings in word2vec format.
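GloVe text files are distributed without the header line that the word2vec text format expects (vocabulary size followed by vector dimension). The following is a minimal sketch of one way to add that header, not the conversion the authors necessarily used. The function name `glove_to_word2vec` and both path arguments are hypothetical.

```python
def glove_to_word2vec(glove_path, output_path):
    """Copy a GloVe text file to output_path, prepending a word2vec header.

    Hypothetical helper: the name and arguments are illustrative only.
    """
    vocab_size = 0
    dimension = None

    # First pass: count the vectors and read the dimensionality
    # from the first non-blank line (token followed by its values).
    with open(glove_path, "r", encoding="utf-8") as src:
        for line in src:
            if not line.strip():
                continue
            if dimension is None:
                dimension = len(line.rstrip("\n").split(" ")) - 1
            vocab_size += 1

    if dimension is None:
        raise ValueError("embedding file contains no vectors")

    # Second pass: write the "<vocab_size> <dimension>" header,
    # then copy every vector line unchanged.
    with open(glove_path, "r", encoding="utf-8") as src, \
            open(output_path, "w", encoding="utf-8") as dst:
        dst.write("{} {}\n".format(vocab_size, dimension))
        for line in src:
            if line.strip():
                dst.write(line)

    return vocab_size, dimension
```

Reading the file twice keeps memory use flat, which matters for a file as large as the 42B-token release. The sketch uses `str.format` rather than f-strings so that it also runs on the Python 3.5 listed under Dependencies.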

Dependencies

  • python 3.5.6
  • tensorflow 1.10.0
  • keras 2.2.4
  • keras-self-attention 0.47.0
  • sklearn 0.20.3

Usage

  1. Navigate to Code/

  2. Set the PRETRAINED_EMBEDDINGS location [2] on line 5 of Code/embeddings_loader.py

  3. Run abstract_analysis.py

    python abstract_analysis.py -h
    usage: abstract_analysis.py [-h] [-b] [-f] [-s]
                                [{arxiv,IEEE_TLT,IEEE_TPAMI,merged}]
                                [retraining_size]

    positional arguments:
      {arxiv,IEEE_TLT,IEEE_TPAMI,merged}
                            The evaluation dataset, default= arxiv
      retraining_size        Data size for fine tuning, default= 340

    optional arguments:
      -h, --help            show this help message and exit
      -b, --generate_baseline
                            For generating baseline without pre training
      -f, --fine_tune_with_pred
                            For evaluating the effect of transfer learning
      -s, --predict_and_save
                            To generate labels for unlabled abstracts,
                            conflicts with -f/--fine_tune_with_pred
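For readers who want to see how the options above fit together, here is a minimal `argparse` sketch that accepts the same arguments as the help text. It is a reconstruction from the help output only, not the actual contents of abstract_analysis.py. The destination name `dataset`, the integer type of `retraining_size`, and the explicit conflict check are assumptions.

```python
import argparse


def build_parser():
    """Build a parser mirroring the documented command-line interface."""
    parser = argparse.ArgumentParser(prog="abstract_analysis.py")
    # Both positionals are optional, so each falls back to its default.
    parser.add_argument(
        "dataset", nargs="?", default="arxiv",  # "dataset" is an assumed name
        choices=["arxiv", "IEEE_TLT", "IEEE_TPAMI", "merged"],
        help="The evaluation dataset, default= arxiv")
    parser.add_argument(
        "retraining_size", nargs="?", type=int, default=340,  # int is assumed
        help="Data size for fine tuning, default= 340")
    parser.add_argument(
        "-b", "--generate_baseline", action="store_true",
        help="For generating baseline without pre training")
    parser.add_argument(
        "-f", "--fine_tune_with_pred", action="store_true",
        help="For evaluating the effect of transfer learning")
    parser.add_argument(
        "-s", "--predict_and_save", action="store_true",
        help="To generate labels for unlabeled abstracts")
    return parser


def parse_args(argv=None):
    """Parse argv and reject the documented -s / -f conflict."""
    parser = build_parser()
    args = parser.parse_args(argv)
    if args.predict_and_save and args.fine_tune_with_pred:
        # parser.error prints the message and exits with status 2.
        parser.error("-s/--predict_and_save conflicts with "
                     "-f/--fine_tune_with_pred")
    return args
```

Under these assumptions, `parse_args(["merged", "100", "-f"])` selects the merged dataset with a retraining size of 100 and enables the transfer-learning evaluation, while `parse_args([])` falls back to `arxiv` and 340.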

[2] This might cause issues with line endings. To resolve them, open and re-save all files on the local system.
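If re-saving every file by hand is impractical, the same effect can be approximated with a short script. This is a sketch that assumes the problem is mixed CRLF/LF line endings, which the note above does not state explicitly. The function name `normalize_line_endings` is hypothetical.

```python
def normalize_line_endings(path, newline=b"\n"):
    """Rewrite the file at path so every line ends with `newline`.

    Hypothetical helper. Returns True if the file was changed.
    """
    with open(path, "rb") as handle:
        data = handle.read()

    # Collapse CRLF and lone CR to LF first, then apply the target style.
    normalized = data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")
    if newline != b"\n":
        normalized = normalized.replace(b"\n", newline)

    if normalized == data:
        return False

    with open(path, "wb") as handle:
        handle.write(normalized)
    return True
```

Working on bytes avoids any implicit newline translation by Python's text mode. Pass `newline=b"\r\n"` to convert to Windows-style endings instead.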

Contributors

  • Soumya Banerjee
  • Dr Debarshi Kr Sanyal
  • Dr Samiran Chattopadhyay
