SebastianPartarrieu / paperXai

Your arxiv daily digest; brought to you by your favorite LLM provider

paperXai License: MIT Python PRs Welcome

Current ambition: Your arXiv daily digest - brought to you by your favorite LLM provider.

This is a very early-stage project which aims to sift through all the latest papers in AI posted on arXiv and filter according to your interests before giving you a short summary. The main pain point we're trying to solve is the sheer mass and noise of current information around AI.

Future ambitions: aggregate across multiple news sources and paper repositories to create a fully automatic, personalized newsletter that each of us can tweak according to what we want to read. There are a few great newsletters out there, and I'm not saying they will be out of business just yet; however, why can't I read something completely tailored to my interests?

Installation

If you already have miniconda installed (if not, install it first):

  • conda env create -f environment.yml
  • conda activate llms
  • pip install -r requirements.txt
  • pip install -e .
  • go to src/paperxai, create a credentials.py file and fill in OPENAI_API_KEY = "your-key-here"
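
As a rough sketch of what credentials.py could contain: the only name the installation step specifies is OPENAI_API_KEY. Reading the value from an environment variable of the same name, and the load_api_key helper itself, are assumptions made for illustration and not something the package is known to require.

```python
import os

# Placeholder from the installation step; replace it with a real key
# or export OPENAI_API_KEY in your shell.
PLACEHOLDER = "your-key-here"


def load_api_key(env=None, default=PLACEHOLDER):
    """Return the key from an environment mapping, else the default.

    Hypothetical helper: it only keeps the key out of the file when an
    environment variable is available.
    """
    env = os.environ if env is None else env
    return env.get("OPENAI_API_KEY", default)


# The name the installation step asks for.
OPENAI_API_KEY = load_api_key()
```

Hardcoding the string works just as well for local use; the environment lookup mainly helps avoid committing a real key by accident.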

Once you've finished the installation procedure, a good place to start may be the notebooks/example_workflow.ipynb notebook which gives a good overview of the different parts of the package.

Usage

The most important details of the report are defined in the config.yml file (sections, questions, LLM provider, ...).

Option #1 -> run a script or notebook

conda activate llms

python scripts/create_arxiv_report --path_config config.yml

open display/reports/{Y-m-d}-report.html: this should open the report in your browser, which makes it easier to read (if the open command is not available on your system, run {browser_name} display/reports/{Y-m-d}-report.html instead).

The notebook walks through the same workflow as the script, step by step, if you want to see how the report is created.
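
If neither open nor a browser command is convenient, the report for a given day can also be opened from Python. This is a minimal sketch using only the standard library; it assumes that {Y-m-d} in the report filename means the date formatted as %Y-%m-%d and that it is run from the repository root. The function names report_path and open_report are hypothetical and not part of the package.

```python
import datetime
import webbrowser
from pathlib import Path


def report_path(day, reports_dir="display/reports"):
    """Build the expected report path for a given date."""
    return Path(reports_dir) / f"{day.strftime('%Y-%m-%d')}-report.html"


def open_report(day=None):
    """Open the report for `day` (default: today) in the default browser.

    Returns False without opening anything if the file does not exist.
    """
    day = day or datetime.date.today()
    path = report_path(day)
    if not path.exists():
        return False
    # webbrowser expects a URL, so convert the absolute path to file://
    return webbrowser.open(path.resolve().as_uri())
```

Calling open_report() after the script has finished should bring up the report generated that day, provided the filename assumption above holds.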

Option #2 -> use the streamlit webapp

cd display

streamlit run webapp.py

Testing

Development

Any contributions are welcome. Because this started out as a solo project, I picked up the very bad habit of working only on the master branch instead of following a cleaner, feature-branch-based development process. Some arbitrary choices have also been made (such as using a few minimalist modules instead of libraries like langchain).

(checklist) before pushing changes or opening a PR:

  • pip list --format=freeze > requirements.txt
  • remove paperxai from the requirements and add pip install -e .
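
The second checklist item can be automated. The sketch below assumes that pip list --format=freeze writes one name==version entry per line, so the package's own entry looks like paperxai==<version>. The helper names drop_package and clean_requirements_file are hypothetical.

```python
from pathlib import Path


def drop_package(freeze_lines, package="paperxai"):
    """Return the lines whose package name differs from `package`.

    Names are compared case-insensitively, and only the part before
    '==' is considered.
    """
    kept = []
    for line in freeze_lines:
        name = line.split("==", 1)[0].strip().lower()
        if name != package.lower():
            kept.append(line)
    return kept


def clean_requirements_file(path="requirements.txt", package="paperxai"):
    """Rewrite a requirements file in place without the package's entry."""
    req = Path(path)
    lines = req.read_text().splitlines()
    req.write_text("\n".join(drop_package(lines, package)) + "\n")
```

Running clean_requirements_file() right after the freeze command leaves every other pinned dependency untouched.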

TODO

(Not necessarily in order of priority)

  • Write script to run report creation from CLI
  • More formats for the report (e.g. markdown, pdf)
  • Quick streamlit webapp where you enter api key, launch report creation and it loads the report directly
  • Work further on report style
  • Handle pubmed API and adapt report creation code
  • Support email integration to receive it automatically
  • Handle document batching and retrieval from the whole paper for those selected based off abstract

Disclaimer

This is not a substitute for discovering papers and information through the multitude of other ways available. It's useful if you have a few predefined topics and want to sift through the large volume of incoming information. It's a toy project.

Thanks

Thank you to arXiv for use of its open access interoperability.
