Lulzx / sotawhat

Returns latest research results by crawling arxiv papers and summarizing abstracts. Helps you stay afloat with so many new papers everyday.

Home Page:https://huyenchip.com/2018/10/04/sotawhat.html

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

sotawhat

License

Read more about SOTAWHAT here.

You can use sotawhat through a web interface here. Thanks hmchuong!

This script runs using Python 3. It requires nltk, six, and pyspellchecker. To install it as a Python package, follow the following steps:

Step 1: clone this repo, and go inside that repo:

$ git clone [HTTPS or SSH linnk to this repo]
$ cd sotawhat

Step 2: install using pip

$ pip3 install .

On Windows, due to encoding errors, the script may cause issues when run on the command line. It is recommended to use pip install win-unicode-console --upgrade prior to launching the script. If you get UnicodeEncodingError, you must install the above.

In MacOS, you can get the SSL error

[nltk_data] Error loading punkt: <urlopen error [SSL:
[nltk_data]     CERTIFICATE_VERIFY_FAILED] certificate verify failed:
[nltk_data]     unable to get local issuer certificate (_ssl.c:1045)>

this will be fixed by reinstalling certificates

$ /Applications/Python\ 3.x/Install\ Certificates.command

Usage

This project adds the sotawhat script for you to run globally on Terminal or commandline.

To query for a certain keyword, run:

$ sotawhat [keyword] [number of results]

For example:

$ sotawhat perplexity 10

or

$ sotawhat language model 10

If you don't specify the number of results, by default, the script returns 5 results. Each result contains the title of the paper with author and published date, a summary of the abstract, and link to the paper.

We've found that this script works well with keywords that are:

  • a model (e.g. transformer, wavenet, ...)
  • a dataset (e.g. wikitext, imagenet, ...)
  • a task (e.g. language model, machine translation, fuzzing, ...)
  • a metric (e.g. BLEU, perplexity, ...)
  • random stuff

Summarization

You can also use the script to summarize a paper using GPT3.5 after you get it's url from the step above. For example:

$ sotawhat summarize https://arxiv.org/abs/1809.04281

It uses the gpt-3.5-turbo-16k model and will request for your OpenAI API key. You can get one here. The simple prompt will generate a 150 word summary of the paper to help you decide if you want to read further.

You can also use the script to list down the key findings of the paper if you don't feel like leaving the command line interface using:

$ sotawhat keyfindings https://arxiv.org/abs/1809.04281

The script works well with papers shorter than 20 pages or so as the max token length is 16k, any paper or document bigger than that might throw an error.

About

Returns latest research results by crawling arxiv papers and summarizing abstracts. Helps you stay afloat with so many new papers everyday.

https://huyenchip.com/2018/10/04/sotawhat.html


Languages

Language:Python 100.0%