gjrwebber / spring-data-gremlin

Spring data gremlin makes it easier to implement Graph based repositories. This module extends Spring Data to allow support for potentially any Graph database that implements the Tinkerpop Blueprints 2.x API.

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

AbstractPersonRespositoryTest.testCollectionsCascadeRemove() sometimes fails

gjrwebber opened this issue · comments

AbstractPersonRespositoryTest.testCollectionsCascadeRemove() test sometimes fails for OrientDB and/or TitanDB. I can't remember seeing it fail for Tinker, but it could have. If it does it is less common.

For some additional info, please refer to #44 Test cases *.java::testCollectionsCascadeRemove() fail occasionally. The log file attached in #44 shows that test case '....data.gremlin.object.tests.tinker.core.Tinker_Core_PersonRepositoryTest > testCollectionsCascadeRemove' fail sometimes.

#44 is closed as it is duplicated with this issue.

Dear Graham,

I did some debugging and found some hint. Would you like to check whether it is useful for you?

In my test, I run 'gradle test' many times under './spring-data-gremlin-tinker/'.

I added many logs into:

  1. **.AbstractPersonRespositoryTest.java::testCollectionsCascadeRemove(){};
  2. GremlinCollectionViaPropertyMapper.java::copyToVertex(){};
  3. GremlinGraphAdapter.java::addEdge(){}.

And found:

A. GremlinCollectionViaPropertyMapper.java::copyToVertex(){} always work well, including its code:
for (Edge vertexToDelete : CollectionUtils.disjunction(existingLinkedEdges, actualLinkedEdges)) { vertexToDelete.remove(); }

B. In the '*.core.repository.AbstractPersonRepositoryTest.java::testCollectionsCascadeRemove(){}',

`public void testCollectionsCascadeRemove() {
    Person graham = repository.findByFirstName("Graham").get(0);
    assertEquals(5, graham.getLocations().size());

    List<Located> locations = new ArrayList<Located>(graham.getLocations());
    locations.remove(0);
    graham.setLocations(new HashSet<Located>(locations)); 
    repository.save(graham); //<--

    graham = repository.findByFirstName("Graham").get(0);

    locations = new ArrayList<Located>(graham.getLocations());
    assertEquals(4, locations.size());
}`

For test case testCollectionsCascadeRemove(), in the above code, the line 'repository.save(graham)' normally only remove an edge, does not add a new edge into graph db. But sometimes when executing 'gradle test', this line call 'GremlinGraphAdapter.java::addEdge()'. This cause the problem.

When execute 'gradle test':

X. When this problem is reproduced, the log and stacktrace is like the attached file 'log-when-issue-is-reproduced.txt'.
log-when-issue-is-reproduced.txt

Y. When this problem is not reproduced, the line 4~84 in 'log-when-issue-is-reproduced.txt' does not show up.

For debugging, I added log into GremlinGraphAdapter.java::addEdge(){}. Can set breakpoint at line 'needLog = true;' in below code and use IDEA/Eclipse to debug it. If this line is reached, it means this problem is reproduced.

`public Edge addEdge(Object o, Vertex outVertex, Vertex inVertex, String name) {
    LOGGER.info("GremlinGraphAdapter.java::addEdge(), outVertex id is: " + outVertex.getId() + ", inVertex id is: " + inVertex.getId() + ", label is: " + name);

    Exception excep = new Exception("Just for print stacktrace.");
    boolean needLog = false;
    for (StackTraceElement stackTraceElement : excep.getStackTrace()){
        if (stackTraceElement.toString().contains("testCollectionsCascadeRemove")) {
            needLog = true;
        }
    }
    if(needLog){
        LOGGER.info("GremlinGraphAdapter.java::addEdge(): stacktrace: ", excep);
    }

    Edge edge = outVertex.addEdge(name, inVertex);
    return edge;
}`

Thanks,
Baogang

By the way, attached batch file gradle_test_till_fail.bat.txt is helpful to reproduce the this problem.

'cd /spring-data-gremlin/spring-data-gremlin-tinker' and run this bat file, it will execute 'gradle test' again and again till the exit code of 'gradle test' is not 0.

Thanks Baogang! I am away this weekend, so I will have a look when I get
back.

On Thu, 14 Apr 2016 at 23:42 kerler notifications@github.com wrote:

By the way, attached batch file gradle_test_till_fail.bat.txt
https://github.com/gjrwebber/spring-data-gremlin/files/219218/gradle_test_till_fail.bat.txt
is helpful to reproduce the this problem.

'cd /spring-data-gremlin/spring-data-gremlin-tinker' and run this bat
file, it will execute 'gradle test' again and again till the exit code of
'gradle test' is not 0.


You are receiving this because you authored the thread.
Reply to this email directly or view it on GitHub
#30 (comment)

Dear Graham,

Could you tell me your email address? I want to send you two log files. I met problem when uploading the log files to this issue.

My email is liubaogang@gmail.com.

I guess I found the root cause.

====Background Info====

The org.springframework.data.gremlin.object.core.domain.Person.java has two fields annotated with '@LinkVia':

`@LinkVia
private Set<Located> locations;

@LinkVia
private Located currentLocation;`

The current code of spring-data-gremlin has flaws when handling '@LinkVia':

Flaw_A: If a POJO class has multiple fields (no matter collection type or not) annotated with '@LinkVia', spring-data-gremlin can not differentiate them when loading value into those field from graph db, i.g. all the instances of Edge type(such as 'Located' above) are used to load values into each of thoes fields.

Flaw_B: If a POJO class has only one field which is "collection type annotated with @LinkVia", spring-data-gremlin can handle it well, using org.springframework.data.gremlin.schema.property.mapper.GremlinCollectionViaPropertyMapper.java. But, if a POJO class has a field which is "single element annotated with @LinkVia", for example the "currentLocation" above, spring-data-gremlin can NOT handle it well. See below code, for sevaral instances of the Edge type, actually the last instance in the for-loop is used to load value into the field.

   `org.springframework.data.gremlin.schema.property.mapper.GremlinLinkViaPropertyMapper.java:

    public <K> Object loadFromVertex(GremlinRelatedProperty property, Vertex vertex, Map<Object, Object> cascadingSchemas) {

    //        GremlinRelatedProperty adjacentProperty = getAdjacentProperty(property);

    Object val = null;
    for (Edge linkedEdge : vertex.getEdges(property.getDirection(), property.getRelatedSchema().getClassName())) {
        val = property.getRelatedSchema().cascadeLoadFromGraph(linkedEdge, cascadingSchemas);
    }

    return val;`

====Suggested Solution of Code Fixing====

For solving Flaw_A, when persisting an instance of POJO class, the name of field annotated with '@LinkVia' need to be persisted into graph db. For '@link' field, the field name is persisted to the Edge label. For '@LinkVia' field, the Edge labe has been used by the name of the class annotated with '@edge', so maybe spring-data-gremlin has to auto generate a field in Edge in graph db with field name like 'secondaryLabelName'.

For solving Flaw_B, need to change code GremlinLinkViaPropertyMapper.java based on the solution used for solving Flaw_A.

====The logic of occurance of this bug====

Based on 'Background Info' above, the bug can be reproduced when all the conditions mentioned below are satisfied.

C0. In testCollectionsCascadeRemove(), after executing "Person graham = repository.findByFirstName("Graham").get(0)", graham.getLocations() contains graham.getCurrentLocation().

(C0 is always true due to Flaw_A and Flaw_B.)

C1. In testCollectionsCascadeRemove(), after excuting "locations.remove(0)", just the value of "graham.getCurrentLocation()" is removed from the "locations".

(C1 may not be always true.)

C2. In testCollectionsCascadeRemove(), when executing "repository.save(graham)", the field "graham.locations" is saved to graph db first--this make one instance of Edge is removed from graph db, then the field "graham.currentLocation" is saved to graph db--this make the instance of Edge removed from graph db is inserted into graph db again.

(C2 may not be always true.)

Mate, you are a legend! Do you reckon you could create a merge request to
fix the test to fail every time?
On Fri, 15 Apr 2016 at 18:26, kerler notifications@github.com wrote:

I guess I found the root cause.

====Background Info====

The org.springframework.data.gremlin.object.core.domain.Person.java has
two fields annotated with '@LinkVia':

`@LinkVia
private Set locations;

@LinkVia
private Located currentLocation;`

The current code of spring-data-gremlin has flaws when handling '@LinkVia':

Flaw_A: If a POJO class has multiple fields (no matter collection type or
not) annotated with '@LinkVia', spring-data-gremlin can not differentiate
them when loading value into those field from graph db, i.g. all the
instances of Edge type(such as 'Located' above) are used to load values
into each of thoes fields.

Flaw_B: If a POJO class has only one field which is "collection type
annotated with @LinkVia", spring-data-gremlin can handle it well, using
org.springframework.data.gremlin.schema.property.mapper.GremlinCollectionViaPropertyMapper.java.
But, if a POJO class has a field which is "single element annotated with
@LinkVia", for example the "currentLocation" above, spring-data-gremlin can
NOT handle it well. See below code, for sevaral instances of the Edge type,
actually the last instance in the for-loop is used to load value into the
field.

`org.springframework.data.gremlin.schema.property.mapper.GremlinLinkViaPropertyMapper.java:

public <K> Object loadFromVertex(GremlinRelatedProperty property, Vertex vertex, Map<Object, Object> cascadingSchemas) {

//        GremlinRelatedProperty adjacentProperty = getAdjacentProperty(property);

Object val = null;
for (Edge linkedEdge : vertex.getEdges(property.getDirection(), property.getRelatedSchema().getClassName())) {
    val = property.getRelatedSchema().cascadeLoadFromGraph(linkedEdge, cascadingSchemas);
}

return val;`

====Suggested Solution of Code Fixing====

For solving Flaw_A, when persisting an instance of POJO class, the name of
field annotated with '@LinkVia' need to be persisted into graph db. For '
@link https://github.com/Link' field, the field name is persisted to
the Edge label. For '@LinkVia' field, the Edge labe has been used by the
name of the class annotated with '@edge https://github.com/Edge', so
maybe spring-data-gremlin has to auto generate a field in Edge in graph db
with field name like 'secondaryLabelName'.

For solving Flaw_B, need to change code GremlinLinkViaPropertyMapper.java
based on the solution used for solving Flaw_A.

====The logic of occurance of this bug====

Based on 'Background Info' above, the bug can be reproduced when all the
conditions mentioned below are satisfied.

C0. In testCollectionsCascadeRemove(), after executing "Person graham =
repository.findByFirstName("Graham").get(0)", graham.getLocations()
contains graham.getCurrentLocation().

(C0 is always true due to Flaw_A and Flaw_B.)

C1. In testCollectionsCascadeRemove(), after excuting
"locations.remove(0)", just the value of "graham.getCurrentLocation()" is
removed from the "locations".

(C1 may not be always true.)

C2. In testCollectionsCascadeRemove(), when executing
"repository.save(graham)", the field "graham.locations" is saved to graph
db first--this make one instance of Edge is removed from graph db, then the
field "graham.currentLocation" is saved to graph db--this make the instance
of Edge removed from graph db is inserted into graph db again.

(C2 may not be always true.)


You are receiving this because you authored the thread.
Reply to this email directly or view it on GitHub
#30 (comment)

Dear Graham,

Just I did the following code change, I run the test cases several times, all failed. You may reproduce it on your PC.

`====Code Change====

In test cases below:
org.springframework.data.gremlin.object.core.repository.AbstractPersonRepositoryTest.java::testCollectionsCascadeRemove(){}
org.springframework.data.gremlin.object.neo4j.repository.AbstractPersonRepositoryTest.java::testCollectionsCascadeRemove(){}

Replace "locations.remove(0);" with:
"if (locations.contains(graham.getCurrentLocation())) {
locations.remove(graham.getCurrentLocation());
} else {
locations.remove(0);
}"

====End of Code Change====
`

I think the reasoning is that:

C0 is always true due to Flaw_A and Flaw_B;
C1 become true because of code change above;
C2 is true because in **.Person.java the field "locations" is in front of "currentLocation".

Ok thanks, I'll be able to look into this a bit later today. Thanks for
your help!

On Mon, 18 Apr 2016 at 10:54 kerler notifications@github.com wrote:

Dear Graham,

Just I did the following code change, I run the test cases several times,
all failed. You may reproduce it on your PC.

`====Code Change====

In test cases below:

org.springframework.data.gremlin.object.core.repository.AbstractPersonRepositoryTest.java::testCollectionsCascadeRemove(){}

org.springframework.data.gremlin.object.neo4j.repository.AbstractPersonRepositoryTest.java::testCollectionsCascadeRemove(){}

Replace "locations.remove(0);" with:
"if (locations.contains(graham.getCurrentLocation())) {
locations.remove(graham.getCurrentLocation());
} else {
locations.remove(0);
}"

====End of Code Change====
`

I think the reasoning is that:

C0 is always true due to Flaw_A and Flaw_B;
C1 become true because of code change above;
C2 is true because in **.Person.java the field "locations" is in front of
"currentLocation".


You are receiving this because you authored the thread.
Reply to this email directly or view it on GitHub
#30 (comment)

Dear Graham,

After the above code change on the test case source code, although it is easier to reproduce the problem, in my later tests sometime the test run is still passed. So C2 may not always be true.

If in your test it is not easy to reproduce the problem, please let me know.

Hi Baogang, I have made a bunch of changes lately for a project I have been
working on. I have not been able to replicate this issue. Do you mind
checking as well?

Thanks,
Graham

On Mon, 18 Apr 2016 at 23:38 kerler notifications@github.com wrote:

Dear Graham,

After the above code change on the test case source code, although it is
easier to reproduce the problem, in my later tests sometime the test run is
passed. So C2 may not always be true.

If in your test it is not easy to reproduce the problem, please let me
know.


You are receiving this because you authored the thread.
Reply to this email directly or view it on GitHub
#30 (comment)

Hi Graham,

In branch https://github.com/kerler/spring-data-gremlin/tree/test-issue-30, there are code which make #30 reproducible. The branch is branched from the last code in 2015.

I tried to branch from the newest code in 2016.04 and make code change, but in my test there is below error.

java.lang.StackOverflowError at java.util.Objects.hashCode(Objects.java:98) at java.util.HashMap$Node.hashCode(HashMap.java:296) at java.util.AbstractMap.hashCode(AbstractMap.java:507) at java.util.Objects.hashCode(Objects.java:98) at java.util.HashMap$Node.hashCode(HashMap.java:296) at java.util.AbstractMap.hashCode(AbstractMap.java:507) at java.util.Objects.hashCode(Objects.java:98) at java.util.HashMap$Node.hashCode(HashMap.java:296) at java.util.AbstractMap.hashCode(AbstractMap.java:507) at java.util.Objects.hashCode(Objects.java:98) at java.util.HashMap$Node.hashCode(HashMap.java:296) at java.util.AbstractMap.hashCode(AbstractMap.java:507) at java.util.Objects.hashCode(Objects.java:98) at java.util.HashMap$Node.hashCode(HashMap.java:296) at java.util.AbstractMap.hashCode(AbstractMap.java:507) at java.util.Objects.hashCode(Objects.java:98) at java.util.HashMap$Node.hashCode(HashMap.java:296) at java.util.AbstractMap.hashCode(AbstractMap.java:507) at java.util.Objects.hashCode(Objects.java:98) at java.util.HashMap$Node.hashCode(HashMap.java:296) at java.util.AbstractMap.hashCode(AbstractMap.java:507) at java.util.Objects.hashCode(Objects.java:98) at java.util.HashMap$Node.hashCode(HashMap.java:296) at java.util.AbstractMap.hashCode(AbstractMap.java:507) at java.util.Objects.hashCode(Objects.java:98)

Thanks,
Baogang