An automated tool for discovering insights from research paper corpora
- Exponential growth of papers outpacing human review capacity
- Challenges in retrieving papers from rapidly evolving domains
- Inability to construct and contrast arguments across multiple papers
- Propagation of problematic claims due to incomplete reviews
We process each document once and index the extracted key points for search. Users only need to read the full documents that are almost guaranteed to contain relevant information. This saves both users' time and compute. Each user interaction requires only a simple embedding and a cosine similarity match (possibly followed by reranking).
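The query path described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the `embed` function is a toy bag-of-words stand-in for whatever embedding model is really used, and the names `build_index` and `search`, along with the sample key points, are hypothetical.

```python
import math
from collections import Counter


def embed(text):
    """Toy stand-in for a real embedding model (assumption): word counts."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def build_index(key_points):
    """Embed each extracted key point once, ahead of any user query."""
    return [(paper_id, text, embed(text)) for paper_id, text in key_points]


def search(index, query, top_k=3):
    """Embed the query, then rank the stored key points by cosine similarity."""
    q = embed(query)
    scored = [(cosine(q, vec), paper_id, text) for paper_id, text, vec in index]
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:top_k]


# Hypothetical key points, as if extracted from three papers.
key_points = [
    ("paper-a", "retrieval augmented generation improves factual accuracy"),
    ("paper-b", "contrastive learning for sentence embeddings"),
    ("paper-c", "human evaluation of machine translation"),
]
index = build_index(key_points)
results = search(index, "retrieval augmented generation", top_k=2)
```

The expensive work (extraction and embedding of key points) happens once in `build_index`; each query then costs one embedding plus a similarity scan. A reranking step, if used, would reorder the entries in `results`.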
- ACL 2023 Paper in Markdown after OCR - This dataset contains 2,150 papers in Markdown format from the Association for Computational Linguistics (ACL) 2023.
- Clean up the current code base and upload the key components (frontend, backend, data pipeline)
- Buy a domain and design a landing page
- Host a public demo (GCP credit secured)
- Publish the first draft of technical report / paper
- Process ACL, NAACL, EMNLP, EACL papers (2022, 2023, 2024) for the public demo
- Release the first public dataset
- Publish the first “delve-deep” report
- ... (I have a long list)
Please allow me to release all the draft frontend and backend code first. I will then make a plan for coordinating contributions from the open-source community.
My name is Yifei Hu, and I am a Ph.D. candidate at Purdue University. I study NLP and HCI. You can follow me on X: https://x.com/hu_yifei