abhishek9sharma / apibot

Python Scripts to Convert Java API Documentation into Natural Language Format

ExtractDocInfo

A set of Python scripts to convert Java API documentation into a more readable natural language format. These scripts relate to the Domain Adaptation component described in the paper "APIBot: Question Answering Bot for API Documentation".
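The README does not describe how the conversion works internally. As a rough illustration only (not the APIBot implementation), the following sketch flattens a Javadoc-style HTML fragment into plain sentences using only the standard library. The helper name `html_to_sentences` and the naive punctuation-based sentence split are assumptions. The project itself lists BeautifulSoup, html2text and NLTK as dependencies, which are likely better suited to real parsing and sentence tokenization.

```python
import re
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects visible text from an HTML fragment, skipping <script>/<style>."""

    def __init__(self):
        super().__init__()  # convert_charrefs=True by default, so &lt; becomes <
        self._skip = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.chunks.append(data)


def html_to_sentences(html):
    """Hypothetical helper: turn a Javadoc HTML snippet into a list of sentences."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    # Join text chunks with spaces, then collapse all runs of whitespace.
    text = " ".join(" ".join(parser.chunks).split())
    # Naive sentence split: break after '.', '!' or '?' followed by whitespace.
    return [s for s in re.split(r"(?<=[.!?])\s+", text) if s]
```

For example, `html_to_sentences('<div class="block">Returns the size. Never negative.</div>')` returns `['Returns the size.', 'Never negative.']`. A punctuation split will mis-handle abbreviations such as "e.g.", which is one reason a proper tokenizer is preferable.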

Requirements

python>=3.5.2 (Tested on Python 3.5.2 and Ubuntu 16.04 LTS)
Ubuntu==16.04 LTS
html2text==2018.1.9
nltk==3.2.1
beautifulsoup4==4.6.0
lxml==4.2.1

Usage

Download the whole project, then:

  1. Delete the .keep files in all subdirectories of the Data folder.
  2. Download the Java SE Documentation from the official link.
  3. Unzip the .zip file downloaded in the previous step into the Data folder. You should see a docs folder.
  4. Go to the folder ExtractDocInfo and run IterateOverAPIDocs.py.
  5. You should see the converted documents in the folder FACTS.
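Steps 1 and 3 can be scripted. The following is a minimal sketch, assuming you pass the path of the project's Data folder and of the downloaded Java SE documentation archive; the function name `prepare_data_dir` is hypothetical and not part of the repository. It avoids f-strings so it stays compatible with the tested Python 3.5.2.

```python
import zipfile
from pathlib import Path


def prepare_data_dir(data_dir, docs_zip):
    """Hypothetical helper automating steps 1 and 3 of the usage instructions.

    data_dir: path to the project's Data folder (assumed).
    docs_zip: path to the downloaded Java SE documentation .zip (assumed).
    Returns the path of the extracted docs folder.
    """
    data_dir = Path(data_dir)
    # Step 1: delete every .keep placeholder found under Data.
    # list() snapshots the matches before any file is removed.
    for keep in list(data_dir.rglob(".keep")):
        keep.unlink()
    # Step 3: extract the archive into Data (str() keeps Python 3.5 compatibility).
    with zipfile.ZipFile(str(docs_zip)) as archive:
        archive.extractall(str(data_dir))
    # The archive is expected to contain a top-level docs folder.
    docs = data_dir / "docs"
    if not docs.is_dir():
        raise FileNotFoundError("expected a docs folder in {}".format(data_dir))
    return docs
```

After it succeeds, continue with step 4 by running `python3 IterateOverAPIDocs.py` from the ExtractDocInfo folder.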

Misc:

  • If you delete the folders in Data, you can recreate them by running the ./setup.sh script located in that same folder.

References

Bibtex Citation

@inproceedings{tian2017apibot,
  title={APIBot: Question answering bot for API documentation},
  author={Tian, Yuan and Thung, Ferdian and Sharma, Abhishek and Lo, David},
  booktitle={Automated Software Engineering (ASE), 2017 32nd IEEE/ACM International Conference on},
  pages={153--158},
  year={2017},
  organization={IEEE}
}

License: MIT License