Adding a new entity from a custom processor
terion-name opened this issue
Hi! Sorry for the beginner question, but the docs don't cover this topic and I couldn't find anything about it online.
I am trying to write a custom processor for Aleph. I managed to set it up in basic form, using https://github.com/alephdata/ingest-file/blob/main/ingestors/manager.py as a reference.
I can read data during ingestion, and I can edit entity properties and save them successfully:
class ServiceWorker(Worker):
    def _analyze(self, dataset, task):
        entity_ids = set(task.payload.get("entity_ids"))
        writer = dataset.bulk()
        for entity in dataset.partials(entity_id=entity_ids):
            if entity.schema.is_a('Analyzable'):
                entity.set('title', 'test title')
                writer.put(entity)
        writer.flush()
        return list(entity_ids)
What I am struggling with is creating new entities. I analyze the input entities, extract valuable data from them, and want to write that data back to Aleph as new entities. I followed the approach in https://github.com/alephdata/ingest-file/blob/main/ingestors/manager.py, but with no luck. Example:
class ServiceWorker(Worker):
    def _analyze(self, dataset, task):
        entity_ids = set(task.payload.get("entity_ids"))
        writer = dataset.bulk()
        for entity in dataset.partials(entity_id=entity_ids):
            if entity.schema.is_a('Analyzable'):
                newentity = model.make_entity(model.get("Person"), key_prefix=dataset.name)
                newentity.add('name', 'John Doe')
                newentity.add('birthDate', '1980-01-01')
                newentity.add('nationality', 'us')
                # newentity.add("document", entity.id)
                newentity.make_id('John Doe')
                newentity.context = {
                    "created_at": entity.context.get("created_at"),
                    "updated_at": entity.context.get("updated_at"),
                    "role_id": entity.context.get("role_id"),
                    "mutable": False,
                }
                # newentity.set("processingStatus", "success")
                writer.put(newentity.to_dict())
        writer.flush()
        return list(entity_ids)
This code produces no errors and at first glance seems to work, but no new entities appear in the dataset and I can't see any new records in the database.
Adding newentity.set("processingStatus", "success") or newentity.add("document", entity.id) results in an error (like "unknown property document").
Adding this also doesn't help:
newid = newentity.make_id('John Doe')
newentity.context = {
    "created_at": entity.context.get("created_at"),
    "updated_at": entity.context.get("updated_at"),
    "role_id": entity.context.get("role_id"),
    "mutable": False,
}
Namespace(entity.context.get("namespace")).apply(entity)
entity_ids.add(newid)
What am I missing?
Well, this code actually works. It seems the UI lags considerably when updating, so I didn't see on the dataset homepage that the entity had been added until I restarted the entire compose setup. After that it appeared, and inside the People collection interface it updates fine (but not on the homepage).
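One related point about the example above: every matching entity in the loop calls make_id('John Doe') with the same key_prefix, so each iteration produces the same ID and at most one new Person can ever show up. The stdlib-only sketch below illustrates that idea. It assumes the ID is a SHA-1 hash over the prefix and the key parts, which is an assumption for illustration and not necessarily what followthemoney does internally; the helper name derive_entity_id and the document IDs are hypothetical.

```python
import hashlib


def derive_entity_id(key_prefix, *parts):
    """Hypothetical helper: hash a prefix and key parts into a stable ID."""
    digest = hashlib.sha1()
    digest.update(key_prefix.encode("utf-8"))
    for part in parts:
        digest.update(str(part).encode("utf-8"))
    return digest.hexdigest()


# Same prefix and same key on every iteration: only one distinct ID results.
same_key_ids = {derive_entity_id("my_dataset", "John Doe") for _ in range(3)}

# Adding something unique to the source (hypothetical document IDs here)
# gives one distinct ID per source document.
per_document_ids = {
    derive_entity_id("my_dataset", "John Doe", doc_id)
    for doc_id in ("doc-1", "doc-2", "doc-3")
}

print(len(same_key_ids), len(per_document_ids))
```

If one new entity per source document is wanted, including the source entity's ID among the key parts would be one way to get it. If a single shared Person is the goal, the constant key is fine as written.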