scieloorg / opac-airflow

Component for collecting and identifying changes made to SciELO metadata

Missing exception handling in `check_website` for `sci_arttext` and `sci_pdf`

robertatakenaka opened this issue

Problem description

`check_website` does not handle exceptions raised while checking `sci_arttext` and `sci_pdf` URIs.

Steps to reproduce the problem

  1. Open the page

https://airflow.scielo.br/admin/airflow/log?task_id=check_sci_pdf_uri_items_id&dag_id=check_website&execution_date=2020-10-27T16%3A38%3A37.327593%2B00%3A00&format=json

  2. Observe the errors
[2020-10-27 17:58:31,576] {async_requests.py:45} INFO - Requested https://new.scielo.br/scielo.php?script=sci_pdf&pid=S0037-86822017000500658: 301
[2020-10-27 17:58:31,896] {async_requests.py:45} INFO - Requested https://new.scielo.br/scielo.php?script=sci_pdf&pid=S0037-86822017000500670: 301
[2020-10-27 17:58:31,920] {async_requests.py:45} INFO - Requested https://new.scielo.br/scielo.php?script=sci_pdf&pid=S0037-86822017000500675: 301
[2020-10-27 17:58:32,035] {async_requests.py:45} INFO - Requested https://new.scielo.br/scielo.php?script=sci_pdf&pid=S0037-86822017000500680: 301
[2020-10-27 17:58:32,278] {async_requests.py:45} INFO - Requested https://new.scielo.br/scielo.php?script=sci_pdf&pid=S0037-86822017000500689: 301
[2020-10-27 17:58:39,624] {base_events.py:1619} ERROR - unhandled exception during asyncio.run() shutdown
task: <Task finished coro=<fetch() done, defined at /usr/local/airflow/dags/common/async_requests.py:20> exception=UnboundLocalError("local variable 'response' referenced before assignment")>
Traceback (most recent call last):
  File "/usr/local/airflow/dags/common/async_requests.py", line 28, in fetch
    async with do_request(uri) as response:
  File "/usr/local/lib/python3.7/site-packages/aiohttp/client.py", line 1012, in __aenter__
    self._resp = await self._coro
  File "/usr/local/lib/python3.7/site-packages/aiohttp/client.py", line 582, in _request
    break
  File "/usr/local/lib/python3.7/site-packages/aiohttp/helpers.py", line 596, in __exit__
    raise asyncio.TimeoutError from None
concurrent.futures._base.TimeoutError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.7/asyncio/runners.py", line 43, in run
    return loop.run_until_complete(main)
  File "/usr/local/lib/python3.7/asyncio/base_events.py", line 587, in run_until_complete
    return future.result()
  File "/usr/local/airflow/dags/common/async_requests.py", line 55, in fetch_many
    for url in uri_items
  File "/usr/local/airflow/dags/common/async_requests.py", line 39, in fetch
    response.uri = uri
UnboundLocalError: local variable 'response' referenced before assignment

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/airflow/dags/common/async_requests.py", line 28, in fetch
    async with do_request(uri) as response:
  File "/usr/local/lib/python3.7/site-packages/aiohttp/client.py", line 1012, in __aenter__
    self._resp = await self._coro
  File "/usr/local/lib/python3.7/site-packages/aiohttp/client.py", line 483, in _request
    timeout=real_timeout
  File "/usr/local/lib/python3.7/site-packages/aiohttp/connector.py", line 499, in connect
    raise e
  File "/usr/local/lib/python3.7/site-packages/aiohttp/connector.py", line 490, in connect
    await fut
concurrent.futures._base.CancelledError
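
The `UnboundLocalError` traceback points at `response.uri = uri` (line 39 of `async_requests.py`) running after the request on line 28 had already failed with a timeout, so `response` was never bound. The source of `fetch` is not shown in this issue, so the following is only a minimal sketch of one way to avoid that, assuming a placeholder result can be created before the request is made. The `do_request` parameter (here a plain coroutine function rather than the context manager seen in the traceback), the `timeout` default, and the `status_code` and `error` attribute names are hypothetical, and standard-library `asyncio` stands in for the aiohttp call so the snippet runs on its own.

```python
import asyncio
from types import SimpleNamespace


async def fetch(uri, do_request, timeout=5.0):
    """Return a result object for `uri` without raising on request failure.

    `do_request` is a hypothetical coroutine function that takes a URI and
    returns an HTTP status code; it is injected so the sketch has no
    third-party dependency.
    """
    # Bind `response` before the request so it exists on every code path.
    response = SimpleNamespace(uri=uri, status_code=None, error=None)
    try:
        status = await asyncio.wait_for(do_request(uri), timeout)
    except asyncio.TimeoutError:
        response.error = "timeout"
    except asyncio.CancelledError:
        # Let cancellation propagate; on Python 3.7 it is a subclass of
        # Exception and would otherwise be swallowed by the clause below.
        raise
    except Exception as exc:
        response.error = f"{type(exc).__name__}: {exc}"
    else:
        response.status_code = status
    return response
```

With this shape the caller always receives an object carrying the URI, and can tell a failed request from a successful one by checking `error`.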

Expected behavior

Exceptions should be handled, and processing should continue with the next document.
Check whether the URIs can be requested in smaller batches.
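
Both expectations can be sketched together with standard-library `asyncio`. This is an illustration under stated assumptions, not the project's implementation: the `batches` helper, the `batch_size` default of 50, and the injected `fetch` coroutine function are all hypothetical. It relies on `asyncio.gather(..., return_exceptions=True)`, which returns exception objects in the result list instead of raising the first one.

```python
import asyncio


def batches(items, size):
    """Yield consecutive slices of `items` with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


async def fetch_many(uri_items, fetch, batch_size=50):
    """Fetch URIs batch by batch and return a list of (uri, outcome) pairs.

    `fetch` is a hypothetical coroutine function taking one URI. Each outcome
    is either the value `fetch` returned or the exception it raised.
    """
    results = []
    for batch in batches(list(uri_items), batch_size):
        # return_exceptions=True makes gather hand back the exception object
        # instead of raising it, so the other URIs are still processed.
        outcomes = await asyncio.gather(
            *(fetch(uri) for uri in batch), return_exceptions=True
        )
        results.extend(zip(batch, outcomes))
    return results
```

Only one batch of requests is in flight at a time, which bounds the number of simultaneous connections. A caller can then log the pairs whose outcome is an exception and move on to the next document.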

Screenshots or videos

n/a

Attachments

n/a

Environment used

production