alephdata / aleph

Search and browse documents and data; find the people and companies you look for.

Home Page: http://docs.aleph.occrp.org

BUG: PDF Folders "unknown"

PlainSite opened this issue

Previously when importing data through alephclient, the "Folder" metadata item would be populated with the name of the folder a PDF was in. Now, it seems to be "unknown" for more recent imports.

I was using this information to link data in Elasticsearch with records in a separate external database, so its absence is a problem.

Steps to reproduce the behavior:

  1. Import PDFs through alephclient
  2. Check "Folder" for PDFs in Aleph UI

I'd expect the Folder to show the name of the folder the PDF was in.

I don't know what Aleph version I'm actually running. I've upgraded several times, and I don't think it has ever been a smooth process. Upgrading following the steps in the documentation never seems to increment the version number in the UI, which is very confusing. The "About" page of the UI says 3.12.7. My docker-compose.yml file says

version: "3.2"

at the top but also references 3.16.1 for convert-document and ingest-file, and 3.12.7 for worker, api, shell and UI. Also, it's impossible to fully follow the directions in the documentation to upgrade because of the error:

ERROR: for redis  missing signature key
ERROR: missing signature key

after running docker-compose pull --parallel.

[Screenshot: folder]

Hi @PlainSite, thanks for your bug report. I’ll start with some information about the different version numbers. Hopefully that clears a few things up :)

  • The version at the top of the docker-compose.yml file (version: "3.2") is the version number of the Docker Compose file format. This is related to Docker Compose and not directly related to Aleph in any way.

  • The version displayed on the about page is the correct Aleph version number. This should match the version for the api, worker, ui services etc.
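As a rough illustration of the distinction above, the following sketch reads a compose file and reports the file format version separately from the image tag of each service. The sample file contents, the image names, and the function name `summarize_compose_versions` are assumptions made for illustration; they are not taken from your actual setup, and the line-based parsing only handles simple files with two-space indentation.

```python
import re

# Hypothetical compose file, loosely modelled on the versions quoted in this issue.
SAMPLE = """\
version: "3.2"
services:
  convert-document:
    image: alephdata/convert-document:3.16.1
  ingest-file:
    image: alephdata/ingest-file:3.16.1
  worker:
    image: alephdata/aleph:3.12.7
  api:
    image: alephdata/aleph:3.12.7
"""


def summarize_compose_versions(compose_text):
    """Return (file_format_version, {service_name: image_tag}).

    Assumes service names are indented by two spaces under `services:`
    and that each service has an `image: name:tag` line.
    """
    file_format = None
    tags = {}
    service = None
    in_services = False
    for line in compose_text.splitlines():
        # Top-level `version:` is the Compose file format, not an Aleph version.
        match = re.match(r'^version:\s*["\']?([^"\'\s]+)["\']?\s*$', line)
        if match:
            file_format = match.group(1)
            continue
        # Any other top-level key opens or closes the services section.
        if re.match(r"^\S", line):
            in_services = line.startswith("services:")
            continue
        if not in_services:
            continue
        match = re.match(r"^  ([\w.-]+):\s*$", line)
        if match:
            service = match.group(1)
            continue
        match = re.match(r"^\s+image:\s*(\S+)\s*$", line)
        if match and service:
            # The tag is whatever follows the last colon of the image reference.
            _, sep, tag = match.group(1).rpartition(":")
            tags[service] = tag if sep else "latest"
    return file_format, tags


file_format, tags = summarize_compose_versions(SAMPLE)
print("Compose file format:", file_format)
for name, tag in tags.items():
    print(f"{name}: {tag}")
```

With the sample above, the file format is reported as 3.2, while the services show two different image tags (3.16.1 and 3.12.7), which is the kind of mismatch described in the report.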

To get back to your original issue: Unfortunately, I haven’t been able to reproduce the issue you describe using the current Aleph version. In order to understand what’s going on, could you try the following:

  1. Create a minimal reproduction example (e.g. create a folder with only one subdirectory and a single file in that directory).

  2. Upload the folder with alephclient.

  3. Post the alephclient command and output. Make sure to redact any potentially sensitive information.
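The steps above could be sketched roughly as follows. This is only an illustration: the directory and file names, the foreign ID `folder-bug-repro`, and the helper names are hypothetical, and the script writes a placeholder file that should be replaced with a real PDF before uploading. It assumes the upload is done with `alephclient crawldir` and its `--foreign-id` option; please adjust the command to whatever you actually run. The script only prints the command and does not execute it.

```python
import shlex
import tempfile
from pathlib import Path


def build_minimal_repro(root, subdir="subfolder", filename="sample.pdf"):
    """Create root/subdir/filename and return the path of the created file."""
    target_dir = Path(root) / subdir
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / filename
    # Placeholder content only; swap in a real PDF before uploading.
    target.write_bytes(b"placeholder; replace with a real PDF\n")
    return target


def crawldir_command(root, foreign_id):
    """Build (but do not run) the assumed alephclient upload command."""
    return shlex.join(
        ["alephclient", "crawldir", "--foreign-id", foreign_id, str(root)]
    )


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "repro"
    created = build_minimal_repro(root)
    relative = created.relative_to(root).as_posix()
    command = crawldir_command(root, "folder-bug-repro")
    print("Created:", relative)
    print("Command:", command)
```

If the "Folder" field for `sample.pdf` shows anything other than `subfolder` after the upload, the command and its output from this small example should be enough to investigate further.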

Closing this for now. @PlainSite please feel free to reopen if you can provide reproduction steps.