imshashank / Open-Access-Corpus

The worldwide growth of open access (OA) academic publishing challenges libraries and academics to decide how to organize, rank, and include new work in library catalogs. However, systematic evaluation of trends in cross-disciplinary open access publication is hindered by fragmentation of sources and inconsistent presentation of content. To facilitate research in this space, we have compiled the holdings of the three largest directories of open access academic publications: Directory of Open Access Journals (DOAJ), PubMed Central Open Access (PMC OA), and ArXiv. We standardize the record format to support cross-source comparison, and we apply a variety of heuristics and external sources to improve record quality. We publish this corpus of 2,788,559 documents as a freely available resource at http://goo.gl/KOlgVX.
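
To illustrate what standardizing the record format across sources might look like, here is a minimal Python sketch. It is not the corpus's actual pipeline: the `Record` schema, the `FIELD_MAPS` table, and every raw field name and separator below are hypothetical placeholders chosen for illustration, not the real DOAJ, PMC OA, or ArXiv field names.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Record:
    """Hypothetical common schema shared by all three sources."""
    source: str
    identifier: str
    title: str
    authors: List[str] = field(default_factory=list)


# Hypothetical mapping from each source's raw field names to the common
# schema. Real source formats differ; these keys are assumptions.
FIELD_MAPS: Dict[str, Dict[str, str]] = {
    "doaj": {"id": "doi", "title": "title", "authors": "author_list", "sep": ";"},
    "pmc_oa": {"id": "pmcid", "title": "article_title", "authors": "contributors", "sep": ";"},
    "arxiv": {"id": "id", "title": "title", "authors": "authors", "sep": ","},
}


def clean(text: Any) -> str:
    """Collapse runs of whitespace (including newlines) into single spaces."""
    return " ".join(str(text).split())


def normalize(source: str, raw: Dict[str, Any]) -> Record:
    """Convert one raw record from a known source into the common schema."""
    mapping = FIELD_MAPS[source]
    raw_authors = str(raw.get(mapping["authors"], ""))
    authors = [clean(a) for a in raw_authors.split(mapping["sep"]) if a.strip()]
    return Record(
        source=source,
        identifier=clean(raw.get(mapping["id"], "")),
        title=clean(raw.get(mapping["title"], "")),
        authors=authors,
    )


example = normalize(
    "arxiv",
    {"id": "0704.0001", "title": "  A  Title\n here", "authors": "A. Author, B. Author"},
)
```

Once every record has the same shape, cross-source comparisons (for example, counting documents per source or matching titles) can run over a single list of `Record` objects regardless of origin.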
