A tool to help search, filter, and download papers from the ACL Anthology (https://aclanthology.org/).
- Retrieve papers from the ACL Anthology.
Retrieve directly from the ACL Anthology website.
e.g. Retriever.acl(2021, ConfConsts.LONG)
Download all papers' info to a local MySQL database.
e.g. db = AnthologyMySQL(cache_enable=True)
db.create_tables()
db.load_data()  # load data and put it into the database
- Integrates ABuilder to support chained operations on MySQL.
e.g. data = ABuilder().table('paper').where({"year": ["in", years_limit]}).where({"venue": ["in", venue_limit]}).query()
- Filter papers by keyword.
e.g. filtered = papers.filter('title', 'xxx') | papers.filter('abstract', 'xxx')
e.g. filtered = papers.and_containing_filter(attr, [keyword1, keyword2])
- Download papers.
e.g. downloader.multi_download(filtered, download_path)
- Local cache available.
- Logging available.
- Statistics available (currently only the total number of papers is counted).
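To make the keyword-filter semantics concrete, here is a minimal, self-contained sketch. It is not the project's implementation: `Paper` and `PaperList` are hypothetical stand-ins, and case-insensitive substring matching is an assumption. It only illustrates how `filter(...) | filter(...)` can behave as "match either field" and `and_containing_filter` as "match every keyword".

```python
class Paper:
    """Hypothetical minimal paper record (not the project's class)."""

    def __init__(self, title, abstract):
        self.title = title
        self.abstract = abstract


class PaperList:
    """Hypothetical stand-in for the project's paper collection."""

    def __init__(self, papers):
        self.papers = list(papers)

    def filter(self, attr, keyword):
        # Keep papers whose attribute contains the keyword.
        # Case-insensitive matching is an assumption of this sketch.
        kw = keyword.lower()
        return PaperList(p for p in self.papers if kw in getattr(p, attr).lower())

    def and_containing_filter(self, attr, keywords):
        # Keep papers whose attribute contains every keyword.
        kws = [k.lower() for k in keywords]
        return PaperList(
            p for p in self.papers
            if all(k in getattr(p, attr).lower() for k in kws)
        )

    def __or__(self, other):
        # Union of two results, preserving order and dropping duplicates.
        seen = set()
        merged = []
        for p in self.papers + other.papers:
            if id(p) not in seen:
                seen.add(id(p))
                merged.append(p)
        return PaperList(merged)

    def titles(self):
        return [p.title for p in self.papers]


papers = PaperList([
    Paper("Neural Text Generation", "We survey decoding methods."),
    Paper("Parsing with Transformers", "A study of text generation baselines."),
    Paper("Machine Translation Metrics", "We compare evaluation metrics."),
])

# Papers mentioning "generation" in the title OR the abstract.
either = papers.filter('title', 'generation') | papers.filter('abstract', 'generation')
# Papers whose title contains BOTH keywords.
both = papers.and_containing_filter('title', ['neural', 'generation'])

print(either.titles())
print(both.titles())
```

In this sketch the `|` operator is just `__or__` on the result object, which is why two `filter` calls can be combined in one expression.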
- First, MySQL is required. I use MySQL 8.
Configure your MySQL database and add a src/configuration/mysql_cfg.py file.
An example src/configuration/mysql_cfg.py is as follows:
class MySQLCFG(object):
    HOST = 'localhost'
    PORT = 3306
    USER = "root"
    PASSWORD = "xxx"
    DB = "xxx"
Also create the corresponding database on your MySQL server.
- Second, if you want to use ABuilder:
You need to create a tasks/database.py with your MySQL configuration; see the ABuilder homepage for details.
In the latest version, however, tasks/database.py reads its settings from the configuration, so you no longer need to create this file yourself.
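As a rough illustration of how a single config class can feed other modules, the sketch below collects the `MySQLCFG` settings into a dictionary. The helper name `connection_kwargs` is hypothetical, and the key names follow the keyword arguments of `pymysql.connect` as an assumption; the format ABuilder itself expects may differ, so check its homepage.

```python
class MySQLCFG(object):
    # Same shape as the example config; the values are placeholders.
    HOST = 'localhost'
    PORT = 3306
    USER = "root"
    PASSWORD = "xxx"
    DB = "xxx"


def connection_kwargs(cfg=MySQLCFG):
    """Hypothetical helper: gather the settings from the config class into a dict."""
    return {
        "host": cfg.HOST,
        "port": cfg.PORT,
        "user": cfg.USER,
        "password": cfg.PASSWORD,
        "database": cfg.DB,
    }


if __name__ == "__main__":
    # The dict could be passed to a driver, e.g. pymysql.connect(**kwargs),
    # assuming pymysql is installed and the MySQL server is running.
    print(connection_kwargs())
```

Keeping the credentials in one place means only src/configuration/mysql_cfg.py has to change when the database moves.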
- Download and decompress the code, open a terminal, and change to the root directory.
Run:
pip install -r requirements.txt
cd tasks
python basic_task.py
Running basic_task first downloads all papers within a certain time span from the ACL Anthology to the local disk, and then searches them by the input keywords.
I developed this project with Python 3.6; it does not support Python 2.
2023.6.14 The code was updated to support the latest ACL Anthology pages. The current Python version is 3.10.
2023.7.2 Updated the README.
@article{tang2022recent,
title={Recent advances in neural text generation: A task-agnostic survey},
author={Tang, Chen and Guerin, Frank and Li, Yucheng and Lin, Chenghua},
journal={arXiv preprint arXiv:2203.03047},
year={2022}
}
The ACL Anthology homepage lists many conferences and the content belonging to each.
Choose one to see its list of papers.