Not able to use JsonPathReader
genehynson opened this issue
Hi there - I'm running into issues using the JsonPathReader controller service with the PutInfluxDatabaseRecord_2 processor. Can you help me identify if I'm doing something wrong or if this is a bug?
FlowFile content:
{"m":"m_val", "f1":"f1_val", "t1":"t1_val"}
PutInfluxDatabaseRecord_2 Settings:
Error:
nifi | java.lang.IllegalStateException: Cannot write FlowFile to InfluxDB because the required field 'f' is not present in Record.
nifi | at org.influxdata.nifi.processors.RecordToPointMapper.findRecordField(RecordToPointMapper.java:229)
nifi | at org.influxdata.nifi.processors.RecordToPointMapper.lambda$mapFields$0(RecordToPointMapper.java:133)
nifi | at java.util.ArrayList.forEach(ArrayList.java:1259)
nifi | at org.influxdata.nifi.processors.RecordToPointMapper.mapFields(RecordToPointMapper.java:131)
nifi | at org.influxdata.nifi.processors.RecordToPointMapper.mapRecord(RecordToPointMapper.java:108)
nifi | at org.influxdata.nifi.processors.RecordToPointMapper.mapRecordV2(RecordToPointMapper.java:102)
nifi | at org.influxdata.nifi.processors.internal.FlowFileToPointMapperV2.mapRecord(FlowFileToPointMapperV2.java:82)
nifi | at org.influxdata.nifi.processors.internal.AbstractFlowFileToPointMapper.mapInputStream(AbstractFlowFileToPointMapper.java:128)
nifi | at org.influxdata.nifi.processors.internal.AbstractFlowFileToPointMapper.mapFlowFile(AbstractFlowFileToPointMapper.java:85)
nifi | at org.influxdata.nifi.processors.internal.FlowFileToPointMapperV2.addFlowFile(FlowFileToPointMapperV2.java:75)
nifi | at org.influxdata.nifi.processors.PutInfluxDatabaseRecord_2.onTrigger(PutInfluxDatabaseRecord_2.java:169)
nifi | at org.apache.nifi.processor.AbstractProcessor.onTrigger(AbstractProcessor.java:27)
nifi | at org.apache.nifi.controller.StandardProcessorNode.onTrigger(StandardProcessorNode.java:1202)
nifi | at org.apache.nifi.controller.tasks.ConnectableTask.invoke(ConnectableTask.java:214)
nifi | at org.apache.nifi.controller.scheduling.TimerDrivenSchedulingAgent$1.run(TimerDrivenSchedulingAgent.java:103)
nifi | at org.apache.nifi.engine.FlowEngine$2.run(FlowEngine.java:110)
nifi | at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
nifi | at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
nifi | at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
nifi | at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
nifi | at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
nifi | at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
nifi | at java.lang.Thread.run(Thread.java:748)
The JsonTreeReader controller service works fine, but I want to use JsonPathReader so I can query properties of a complex JSON object. According to this comment, I believe this should be possible.
This feels like a pretty trivial example, so I'm hoping someone can point out what I'm doing wrong - thanks!
NiFi Version: 1.14.0
FWIW, I've also tried creating an Avro schema and setting it in the Schema Text property of JsonPathReader, and got the same result. Here's the schema I used:
{
  "type": "record",
  "name": "mqtt_schema",
  "namespace": "mqtt.nifi",
  "doc": "Avro schema for mqtt",
  "fields": [
    { "name": "m", "type": "string" },
    { "name": "f1", "type": "string" },
    { "name": "t1", "type": "string" }
  ]
}
Hi @genehynson,
thanks for using our bundle.
Your configuration looks right...
Can you check what the incoming message into PutInfluxDatabaseRecord_2 looks like?
Can you export and share your NiFi's flow?
Regards
Hi @genehynson, you are using the wrong field names in your schema. The schema field names must correspond to the fields you want to use in PutInfluxDatabaseRecord_2.
The correct schema is:
{
  "type": "record",
  "name": "mqtt_schema",
  "namespace": "mqtt.nifi",
  "doc": "Avro schema for mqtt",
  "fields": [
    { "name": "m", "type": "string" },
    { "name": "f", "type": "string" },
    { "name": "t", "type": "string" }
  ]
}
I assume that you want to map the field "f1" from the MQTT JSON message to a field named "f" in InfluxDB.
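For illustration, the JsonPathReader dynamic properties that pair with this schema would look roughly like this (an assumed configuration, not copied from your screenshots; each property name is a record field from the schema and its value is the JsonPath evaluated against the incoming JSON):
m = $.m
f = $.f1
t = $.t1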
Hi @rhajek, thanks for the tip! Yeah, that did indeed solve the issue.
Question: does the InfluxDB bundle not support the "Infer Schema" setting?
@genehynson, I have tested the "Infer Schema" option with the JsonTreeReader and it works fine for simple JSON without any additional configuration. You only need to specify in PutInfluxDatabaseRecord_2 which record fields will be used as the InfluxDB measurement, fields and tags.
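For example, assuming the processor is configured with m as the measurement, f1 as a field and t1 as a tag (an assumption based on the field names in this thread, not taken from your screenshots, and assuming the measurement is resolved from the value of the m record field), the sample message would roughly become this line-protocol point:
m_val,t1=t1_val f1="f1_val"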
JsonPathReader requires a JsonPath expression for each field.
Hey @rhajek, thanks for checking. I'm familiar with JsonTreeReader but want to use JsonPathReader so I can support more complex JSON objects.
Whenever I specify "Infer Schema" with JsonPathReader, all my FlowFiles are dropped. I believe I have everything configured correctly, but please correct me if you see something wrong.
This is my PutInfluxDatabaseRecord_2 processor configuration:
This is my JsonPathReader configuration:
This is the message payload:
{"m":"m_val", "f1":"f1_val", "t1":"t1_val"}
And here is my flow definition:
testJson.json.zip
Hi @genehynson,
I did some testing/debugging around "Infer Schema" and found that it works differently from what we expected. The schema is created automatically from the incoming JSON and contains only the fields that are present in the JSON (with the same names).
Adding a new property in JsonPathReader does not add that property to the schema. This explains why only fields with the same name are correctly mapped into record fields. In your example the inferred schema contains only the fields "m", "f1", "t1". The mappings f -> $.f1 and t -> $.t1 do nothing and are ignored. Renaming fields is not possible with an inferred schema.
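To make this concrete, the schema inferred from your sample message is roughly equivalent to the following (a sketch of the effective schema; the reader builds it internally rather than from Avro text, and the record name here is made up):
{
  "type": "record",
  "name": "inferred",
  "fields": [
    { "name": "m", "type": "string" },
    { "name": "f1", "type": "string" },
    { "name": "t1", "type": "string" }
  ]
}
Because no field named "f" exists in that schema, the processor's lookup for 'f' fails with the error shown above.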
There are several options to work around this:
- Use a custom schema like
{
  "type": "record",
  "name": "mqtt_schema",
  "namespace": "mqtt.nifi",
  "doc": "Avro schema for mqtt",
  "fields": [
    { "name": "m", "type": "string" },
    { "name": "f", "type": "string" },
    { "name": "t", "type": "string" }
  ]
}
Then the mapping f -> $.f1 will work.
- Use the JoltTransformJSON processor to convert the JSON into a flat JSON with the final field names (see the Jolt spec sketch after this list).
- Depending on your use case, a custom-written JSON transformation processor may be more flexible.
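For the JoltTransformJSON option, a shift spec along these lines (a sketch, not taken from this thread) would rename f1 and t1 to match the schema fields f and t:
[
  {
    "operation": "shift",
    "spec": {
      "m": "m",
      "f1": "f",
      "t1": "t"
    }
  }
]
Applied to the sample message this produces {"m":"m_val", "f":"f1_val", "t":"t1_val"}, which then maps cleanly without any field renaming in the reader.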
Ah, very interesting. Thanks for investigating further! I think we'll go with option 1 - creating an Avro schema. Thanks again for the quick responses.