alephdata / aleph

Search and browse documents and data; find the people and companies you look for.

Home Page: http://docs.aleph.occrp.org

Adding a new entity from a custom processor

terion-name opened this issue

Hi! Sorry for the beginner question, but the docs don't cover this topic and I couldn't find anything about it online.

I am trying to write a custom processor for Aleph. I managed to set it up in a basic form, using https://github.com/alephdata/ingest-file/blob/main/ingestors/manager.py as a reference.

I can read data during ingestion, edit entity properties, and save them successfully:

class ServiceWorker(Worker):
    def _analyze(self, dataset, task):
        entity_ids = set(task.payload.get("entity_ids"))
        writer = dataset.bulk()
        for entity in dataset.partials(entity_id=entity_ids):
            if entity.schema.is_a('Analyzable'):
                entity.set('title', 'test title')
                writer.put(entity)
        writer.flush()
        return list(entity_ids)

What I am struggling with is creating new entities. I analyze the input entities, extract valuable data from them, and want to write that data back to Aleph as new entities. I followed the approach in https://github.com/alephdata/ingest-file/blob/main/ingestors/manager.py, but with no luck. Example:

class ServiceWorker(Worker):
    def _analyze(self, dataset, task):
        entity_ids = set(task.payload.get("entity_ids"))
        writer = dataset.bulk()
        for entity in dataset.partials(entity_id=entity_ids):
            if entity.schema.is_a('Analyzable'):
                newentity = model.make_entity(model.get("Person"), key_prefix=dataset.name)
                newentity.add('name', 'John Doe')
                newentity.add('birthDate', '1980-01-01')
                newentity.add('nationality', 'us')
                # newentity.add("document", entity.id)
                newentity.make_id('John Doe')
                newentity.context = {
                    "created_at": entity.context.get("created_at"),
                    "updated_at": entity.context.get("updated_at"),
                    "role_id": entity.context.get("role_id"),
                    "mutable": False,
                }
                # newentity.set("processingStatus", "success")
                writer.put(newentity.to_dict())
        writer.flush()
        return list(entity_ids)

This code produces no errors and at first glance seems to work, but no new entities appear in the dataset and I can't see any new records in the database.

Adding newentity.set("processingStatus", "success") or newentity.add("document", entity.id) results in an error (such as unknown property document).

Adding this doesn't help either:

                newid = newentity.make_id('John Doe')
                newentity.context = {
                    "created_at": entity.context.get("created_at"),
                    "updated_at": entity.context.get("updated_at"),
                    "role_id": entity.context.get("role_id"),
                    "mutable": False,
                }
                Namespace(entity.context.get("namespace")).apply(entity)
                entity_ids.add(newid)

What am I missing?

Well, this code actually works. It seems the UI lags significantly when updating, so I didn't see the added entity on the dataset homepage until I restarted the entire compose setup. After that it appeared, and in the People collection interface it updates fine (but not on the homepage).
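
A side note on the make_id('John Doe') call in the example above: as far as I understand, followthemoney derives the entity ID from a hash of the key prefix plus the arguments passed in, so a constant argument gives the same ID for every input document and the writes would land on a single Person record. The sketch below uses only the standard library to illustrate that idea. The function name sketch_make_id and the "doc-1"/"doc-2" values are hypothetical, and the hashing scheme is an assumption about the behaviour, not a copy of the library's code.

```python
from hashlib import sha1
from typing import Optional


def sketch_make_id(*parts: str, key_prefix: Optional[str] = None) -> Optional[str]:
    """Hypothetical stand-in for a hash-based entity ID (not followthemoney's code)."""
    digest = sha1()
    if key_prefix:
        digest.update(key_prefix.encode("utf-8"))
    base = digest.digest()
    for part in parts:
        digest.update(part.encode("utf-8"))
    # If no part contributed any data, there is nothing to build an ID from.
    if digest.digest() == base:
        return None
    return digest.hexdigest()


# Constant key: every input document maps to the same ID.
same_a = sketch_make_id("John Doe", key_prefix="my_dataset")
same_b = sketch_make_id("John Doe", key_prefix="my_dataset")

# Including the source document's ID (assumed values) yields one ID per document.
per_doc_a = sketch_make_id("doc-1", "John Doe", key_prefix="my_dataset")
per_doc_b = sketch_make_id("doc-2", "John Doe", key_prefix="my_dataset")

print(same_a == same_b)
print(per_doc_a == per_doc_b)
```

Whether one shared record or one record per source document is wanted depends on the use case. If separate records are the goal, passing something like entity.id together with the extracted name into make_id would be one way to get there.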