Future-House / paper-qa

High accuracy RAG for answering questions from scientific documents with citations

Home Page: https://futurehouse.gitbook.io/futurehouse-cookbook

Repository on GitHub: https://github.com/Future-House/paper-qa

Error with async vs. non-async design

ncilfone opened this issue.

The synchronous methods just run their async counterparts on an event loop (via run_until_complete).

Per the docs: https://futurehouse.gitbook.io/futurehouse-cookbook/paperqa#async

This can cause problems when an event loop is already running, since the 'sync' code is still calling async code under the hood.

For instance, running pqa2 inside Sanic (which uses uvloop under the hood for its event loop) and calling the following code, which should be synchronous:

docs.add(path=path, settings=settings)

This results in the following error:

File "uvloop/loop.pyx", line 1512, in uvloop.loop.Loop.run_until_complete
File "uvloop/loop.pyx", line 1505, in uvloop.loop.Loop.run_until_complete
File "uvloop/loop.pyx", line 1379, in uvloop.loop.Loop.run_forever
File "uvloop/loop.pyx", line 520, in uvloop.loop.Loop._run
RuntimeError: this event loop is already running.
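The failure can be reproduced without Sanic or paper-qa. In this minimal, stdlib-only sketch, Store, aadd and add are hypothetical stand-ins, not paper-qa's actual code; the only assumption is that the sync method drives its coroutine with run_until_complete, as described above. Called from plain sync code it works; called from inside a coroutine that is already running on a loop (like a Sanic route handler), it raises the same RuntimeError.

```python
import asyncio


class Store:
    """Hypothetical class with an async method and a naive sync wrapper."""

    def __init__(self) -> None:
        self.items: list[str] = []

    async def aadd(self, text: str) -> int:
        await asyncio.sleep(0)  # stand-in for real async I/O (parsing, embedding, ...)
        self.items.append(text)
        return len(self.items)

    def add(self, text: str) -> int:
        """Naive sync wrapper: drive aadd to completion with run_until_complete."""
        try:
            loop = asyncio.get_running_loop()  # naive: reuse a running loop (the bug)
            owns_loop = False
        except RuntimeError:
            loop = asyncio.new_event_loop()  # plain sync context: make a private loop
            owns_loop = True
        coro = self.aadd(text)
        try:
            # Raises RuntimeError if `loop` is already running in this thread
            return loop.run_until_complete(coro)
        except RuntimeError:
            coro.close()  # silence the "coroutine was never awaited" warning
            raise
        finally:
            if owns_loop:
                loop.close()


async def handler(store: Store) -> int:
    # Simulates a framework route handler that is already on a running loop
    return store.add("from handler")


if __name__ == "__main__":
    store = Store()
    print(store.add("from sync code"))  # works: no loop is running yet
    try:
        asyncio.run(handler(store))
    except RuntimeError as exc:
        print(f"RuntimeError: {exc}")
```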

TL;DR: Wrapping async code in a sync function with run_until_complete doesn't guarantee synchronous behavior.
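If a sync entry point has to keep working when a loop is already running, one common pattern (sketched here as an assumption; it is not necessarily what the fixes referenced later in this thread do) is to check for a running loop and, if there is one, run the coroutine on a fresh loop in a worker thread instead of trying to nest loops. run_sync is a hypothetical helper name, not a paper-qa API.

```python
import asyncio
import concurrent.futures
from collections.abc import Coroutine
from typing import Any, TypeVar

T = TypeVar("T")


def run_sync(coro: Coroutine[Any, Any, T]) -> T:
    """Hypothetical helper: run `coro` to completion from synchronous code."""
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        # No loop running in this thread: asyncio.run creates, runs and closes one
        return asyncio.run(coro)
    # A loop is already running here (e.g. inside a Sanic handler) and asyncio
    # refuses to nest it, so run the coroutine on a fresh loop in a worker
    # thread and block until it finishes. This also blocks the outer loop.
    with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
        return pool.submit(asyncio.run, coro).result()


async def double(x: int) -> int:
    await asyncio.sleep(0)  # stand-in for real async work
    return 2 * x


async def handler() -> int:
    # Calling the sync helper from a running loop no longer raises
    return run_sync(double(21))


if __name__ == "__main__":
    print(run_sync(double(2)))     # no running loop: uses asyncio.run
    print(asyncio.run(handler()))  # running loop: falls back to a worker thread
```

The trade-off: the calling thread, and with it the outer loop, is blocked until the worker finishes, and the coroutine must not use objects bound to the outer loop (locks, client sessions, etc.). Inside an async server, awaiting the async method directly is still the better option.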

Pulling a quote from the nest_asyncio library (RIP Ewald):

"By design asyncio does not allow its event loop to be nested. This presents a practical problem: When in an environment where the event loop is already running it’s impossible to run tasks and wait for the result. Trying to do so will give the error “RuntimeError: This event loop is already running”."

Hey there, @ncilfone! I'm Dosu, a bot here to help you with bug fixes, questions, and contributions. I'm here to assist you while you wait for a human maintainer.

The issue occurs because the synchronous docs.add method in PaperQA internally calls an asynchronous function using run_until_complete, which raises a RuntimeError if an event loop is already running, as it is in frameworks like Sanic that use uvloop for async operations [1].

To resolve this, you should use the asynchronous version of the method, docs.aadd, directly in environments where an event loop is already active. This avoids the problem of trying to nest event loops. Here's how you can modify your code to use the async method:

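import asyncio
from paperqa import Docs, Settings

async def main(path: str, settings: Settings) -> Docs:
    docs = Docs()
    # Await the async variant directly; don't call the sync docs.add inside a running loop
    await docs.aadd(path=path, settings=settings)
    return docs

# Inside code already running on an event loop (e.g. a Sanic route handler), await it:
#     docs = await main(path, settings)
# From plain synchronous code with no loop running, let asyncio create one:
#     docs = asyncio.run(main(path, settings))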

This approach ensures that you are not attempting to run a new event loop within an existing one, thus avoiding the RuntimeError.




Resolved this in #925 and #953; thanks for surfacing this to us 👍