orientechnologies / orientdb-gremlin

TinkerPop3 Graph Structure Implementation for OrientDB

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

java.lang.IllegalArgumentException: rawElement must not be null!

jbreiding opened this issue · comments

I need a little bit of help narrowing down how to prevent this error.

Let me first bring up this is non-deterministic and I am unable to reproduce locally, however I have a deployed environment we I know I will happen a few times a day. So what I am looking for is the right kind of knobs to turn to collect the right debugging information.

I am using kotlin as the dev language. This is running against 3.0.29 for orientdb-gremlin library and docker tag is 3.0.29-tp3. And tinkerpop gremlin-driver is 3.4.4.

Running on azulu-11 for runtime.

java.lang.IllegalArgumentException: rawElement must not be null! at org.apache.tinkerpop.gremlin.orientdb.OrientElement.<init>(OrientElement.java:24) at org.apache.tinkerpop.gremlin.orientdb.OrientVertex.<init>(OrientVertex.java:26) at org.apache.tinkerpop.gremlin.orientdb.OElementFactory.wrapVertex(OElementFactory.java:22) at org.apache.tinkerpop.gremlin.orientdb.OrientVertex.lambda$vertices$1(OrientVertex.java:46) at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:195) at java.util.Spliterators$IteratorSpliterator.tryAdvance(Spliterators.java:1812) at java.util.stream.StreamSpliterators$WrappingSpliterator.lambda$initPartialTraversalState$0(StreamSpliterators.java:294) at java.util.stream.StreamSpliterators$AbstractWrappingSpliterator.fillBuffer(StreamSpliterators.java:206) at java.util.stream.StreamSpliterators$AbstractWrappingSpliterator.doAdvance(StreamSpliterators.java:169) at java.util.stream.StreamSpliterators$WrappingSpliterator.tryAdvance(StreamSpliterators.java:300) at java.util.Spliterators$1Adapter.hasNext(Spliterators.java:681) at org.apache.tinkerpop.gremlin.process.traversal.step.map.FlatMapStep.processNextStart(FlatMapStep.java:44) at org.apache.tinkerpop.gremlin.process.traversal.step.util.AbstractStep.hasNext(AbstractStep.java:143) at org.apache.tinkerpop.gremlin.process.traversal.step.util.ExpandableStepIterator.next(ExpandableStepIterator.java:50) at org.apache.tinkerpop.gremlin.process.traversal.step.filter.FilterStep.processNextStart(FilterStep.java:37) at org.apache.tinkerpop.gremlin.process.traversal.step.util.AbstractStep.hasNext(AbstractStep.java:143) at org.apache.tinkerpop.gremlin.process.traversal.step.util.ExpandableStepIterator.hasNext(ExpandableStepIterator.java:42) at org.apache.tinkerpop.gremlin.process.traversal.step.util.CollectingBarrierStep.processNextStart(CollectingBarrierStep.java:107) at org.apache.tinkerpop.gremlin.process.traversal.step.util.AbstractStep.hasNext(AbstractStep.java:143) at org.apache.tinkerpop.gremlin.process.traversal.step.util.ExpandableStepIterator.next(ExpandableStepIterator.java:50) at org.apache.tinkerpop.gremlin.process.traversal.step.filter.FilterStep.processNextStart(FilterStep.java:37) at org.apache.tinkerpop.gremlin.process.traversal.step.util.AbstractStep.hasNext(AbstractStep.java:143) at org.apache.tinkerpop.gremlin.process.traversal.util.DefaultTraversal.hasNext(DefaultTraversal.java:197) at kotlin.sequences.TransformingSequence$iterator$1.hasNext(Sequences.kt:176) at kotlin.sequences.FilteringSequence$iterator$1.calcNext(Sequences.kt:131) at kotlin.sequences.FilteringSequence$iterator$1.hasNext(Sequences.kt:156) at kotlin.sequences.SequencesKt___SequencesKt.toCollection(_Sequences.kt:702) at kotlin.sequences.SequencesKt___SequencesKt.toMutableList(_Sequences.kt:732) at kotlin.sequences.SequencesKt___SequencesKt.toList(_Sequences.kt:723)

Hi @jbreiding

do you have the traversal that throw this error?

Thanks

Sort of, the trouble is this doesn't hit consistently on the same point in the traversal every time.

What specifics are looking for? I can see if I can mock one out, as at time this could contain where steps and sometimes they do not.

Hi @jbreiding

do you have the traversal that throw this error?

Thanks

For the most recent error from telemetry the traversal looks similar to this:

g.V({rid}).in({edgelabel}).out({edgelabel}).where(__.has("id", P.neq("")).range(0, 50)

The results are processed in a kotlin sequence as follows:
.asSequence().mapNotNull { it.toModel<{targetClass}>() }.toList()

Ultimately running this same query a second, which we have built in results in an error free execution.

@jbreiding

just a question how did you configure the g?

I believe every configuration knob is a default. The graph for this traversal is acquired from a vertex object, vertexObject.graph();

That vertex object is from another another query, and graph that graph/g was from an OrientGraphFactory pool with a maxsize of 500.

Currently the workload is spread across two api servers communicating to a single orientdb instance deployed as a stateful set in a k8s cluster.

Is that what you were looking for?
Connections are acquired they are from noTx()

@jbreiding

reading the code one possible cause is that the edge contains in or from pointer to vertex that is null or point to a deleted vertex.

You can check this by querying directly the edge class looking for out/in field which are null.

Interesting, I could query for edges across my db instance for this condition to confirm if it could be the case correct?

select from E where in is null or out is null
This returns an empty set, is that the direction you were suggesting?

I would like to clarify something, the same query gets retried when an error like this occurs from our user interface. That request succeeds, I have done some digging to see if it was data related, but given that behavior I would expect I could run the same query and reproduce the issue locally. This not something I have been able to do.

I also see a similar error being logged in my telemetry, I have confirmed this edge is populated, the in and out are not empty.

21:48:37.512 [reactor-http-epoll-1] WARN c.o.o.core.record.impl.OEdgeIterator - Record (#24:556) is null

at com.orientechnologies.common.log.OLogManager.log(OLogManager.java:188) at com.orientechnologies.common.log.OLogManager.warn(OLogManager.java:229) at com.orientechnologies.orient.core.record.impl.OEdgeIterator.createGraphElement(OEdgeIterator.java:73) at com.orientechnologies.orient.core.record.impl.OEdgeIterator.createGraphElement(OEdgeIterator.java:36) at com.orientechnologies.orient.core.iterator.OLazyWrapperIterator.hasNext(OLazyWrapperIterator.java:93) at com.orientechnologies.common.collection.OMultiCollectionIterator.hasNextInternal(OMultiCollectionIterator.java:97) at com.orientechnologies.common.collection.OMultiCollectionIterator.hasNext(OMultiCollectionIterator.java:78) at com.orientechnologies.orient.core.record.impl.OEdgeToVertexIterator.hasNext(OEdgeToVertexIterator.java:25)

@jbreiding

try this, which will also detect if some link is invalid

select from E where out.@class is null or in.@class is null

Empty result set as well.

I spent a bit of time experimenting. There is very clear a bit of a data issue in our implementation. There is a bit of wonkiness in how I was sharing connections across http requests.

I have cleared that up in favor of sharing just the pool/factory. I am still seeing some wonkiness and I am digging further. But it is definitely looking exactly like an underlying data problem. Thanks for your help.