Why another API - continue in JPA 3.0 as Not Only SQL API

Question

Tibor17 opened this issue 5 years ago

I have experience with graph databases, Lucene, and MongoDB, and I watched your video https://www.youtube.com/watch?v=HLlKSDrRImA

Main Question:
Why did we not build an interface over the existing solution, and why did we not provide 3 or 4 sub-interfaces of the interface EntityManager for specialized databases?
E.g. JsonEntityManager, FullTextSearchEntityManager, GraphEntityManager, KeyValueStoreEntityManager. Combinations of these interfaces are possible as well.
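
As a rough illustration of the sub-interface idea, here is a minimal sketch. Everything in it is an assumption made for the example: `EntityManager` is a one-method stand-in rather than the real JPA interface, the sub-interfaces only borrow the names proposed above, and `InMemoryManager` is a toy that ignores the query text.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

// Demo entry point, declared first so the file can be run directly.
class SubInterfaceSketch {
    public static void main(String[] args) {
        GraphFullTextEntityManager em = new InMemoryManager();
        em.persist("Jennifer");
        em.persist("Graphs");
        System.out.println(em.createGraphQuery("(p:Person)").size());
        System.out.println(em.search("jenn"));
    }
}

// Stand-in for the real EntityManager, reduced to one method (assumption).
interface EntityManager {
    void persist(Object entity);
}

// Hypothetical sub-interface for graph stores.
interface GraphEntityManager extends EntityManager {
    List<Object> createGraphQuery(String nativeGraphQuery);
}

// Hypothetical sub-interface for full-text search.
interface FullTextSearchEntityManager extends EntityManager {
    List<Object> search(String text);
}

// A combination: a store that offers both graph and full-text queries.
interface GraphFullTextEntityManager
        extends GraphEntityManager, FullTextSearchEntityManager {
}

// Toy in-memory implementation used only to exercise the type hierarchy.
class InMemoryManager implements GraphFullTextEntityManager {
    private final List<Object> store = new ArrayList<>();

    @Override
    public void persist(Object entity) {
        store.add(entity);
    }

    // Ignores the query text and returns a copy of everything stored.
    @Override
    public List<Object> createGraphQuery(String nativeGraphQuery) {
        return new ArrayList<>(store);
    }

    // Case-insensitive substring match on each entity's toString().
    @Override
    public List<Object> search(String text) {
        String needle = text.toLowerCase();
        return store.stream()
                .filter(o -> o.toString().toLowerCase().contains(needle))
                .collect(Collectors.toList());
    }
}
```

Code that only needs the common operations can keep depending on the base interface, while code that needs graph or full-text features asks for the narrower type.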

I know the DeltaSpike CDI Extension project, which provides a Repository API with a similar philosophy.
This NoSQL API with Repositories is quite similar to DeltaSpike and Spring Data.

I am wondering why we are developing an API that DeltaSpike already has, and why we do not adopt it in JPA 3.0 as I described above.

We all know that JPA is not SQL only. Software developers have used it with relational databases, but few developers know that it can be used with NoSQL databases as well, e.g. Lucene and MongoDB. One proof is the existing project with OGM in Hibernate, which has FullTextEntityManager extending the ordinary interface EntityManager.

I guess all we have to do is provide an interface over the existing databases and minimize the variation between very similar APIs such as JPA and NoSQL.

Werner Keil · Answer 1 · Wed Apr 10 2019 01:51:28 GMT+0800 (China Standard Time)

I think that's a valid question. There are general IP issues @otaviojava and his company (Tomitribe) have to solve for ANY of this to inspire or improve ANY Jakarta EE project, but given JPA is already part of Jakarta EE and some core elements like Entity have a similar name, they might as well be in JPA if we find enough synergies.

Tibor Digana · Answer 2 · Wed Apr 10 2019 02:42:06 GMT+0800 (China Standard Time)

@PersistenceContext(unitName = "crm")
private EntityManager em;
...
    // GraphEntityManager is the proposed sub-interface, not an existing JPA type
    GraphEntityManager gem = em.unwrap(GraphEntityManager.class);
    gem.createGraphQuery("(p:Person {name: \"Jennifer\"})-[rel:LIKES]->(g:Technology {type: \"Graphs\"})");

Romain Manni-Bucau · Answer 3 · Wed Apr 10 2019 02:43:55 GMT+0800 (China Standard Time)

I also have a more generic question: why a spec? Spring Data proves that no abstraction is possible for NoSQL databases today, so it brings pretty much nothing and you integrate with vendors anyway. So is a spec worth it, or is it just noise? In other words, contributing to vendors is likely saner for Jakarta EE until abstraction becomes a thing, as it is for JPA.

Tibor Digana · Answer 4 · Wed Apr 10 2019 02:53:10 GMT+0800 (China Standard Time)

I was in a situation where I could keep EntityManager at the injection point and use FullTextSearchEntityManager, which was a proprietary class from Hibernate OGM.
I would like to use one approach in the entire application:

standard API
proprietary implementation (Spring, Hibernate OGM)

but not both, because there are more and more questions about why to use the standard API at all: let's throw it away and rely on Hibernate and AMQ. One day you will have to invest energy in a migration because you are using EclipseLink and Kafka instead. So you rework the code and the deployment instead of the deployment only. That's why Java/Jakarta EE is here: write once, run multiple times, which in the EE world means independence from the underlying platform.

Tibor Digana · Answer 5 · Wed Apr 10 2019 03:13:08 GMT+0800 (China Standard Time)

JPQL is very SQL-like, so I don't think it can be fully translated to a NoSQL query.
It is not worth using here.
Instead, let's state in the Javadoc that the method createGraphQuery() takes a native graph query.
This means that the mental approach is the same in the entire application: still the same API from the artifact jakartaee-api or so.

Otávio Santana · Answer 6 · Wed Apr 10 2019 05:02:02 GMT+0800 (China Standard Time)

Hey @Tibor17

First of all, thank you for the discussion, and I hope that you enjoyed the presentation.

That is an excellent question.
We have several reasons not to use EntityManager for this:

Yes, what you said is true; they had this plan. However, they rolled it back because EntityManager was designed around SQL technology. Oracle did an in-depth study of that, reached the same conclusion, and gave a presentation about Java EE 9. More information.
EntityManager has transaction methods that usually do not fit NoSQL databases, such as getTransaction(), isJoinedToTransaction(), joinTransaction().
Usually, NoSQL databases do not fit JTA, and even for databases that support ACID transactions, such as MongoDB, the documentation warns against using them all the time.

In most cases, multi-document transaction incurs a greater performance cost over single document writes, and the availability of multi-document transaction should not be a replacement for effective schema design.

Spring Data does not use EntityManager; it has its own interfaces, Repository and CrudRepository.
NoSQL database structures differ: with a key-value store, a developer can only retrieve information by key, and these databases usually have no stored procedures, transactions and so on. So does it make sense to extend an interface of which only around 15% of the methods will be used?
With Hibernate OGM, there is the danger of overpromising: the mapper emulates behavior that a NoSQL database does not support. Gunnar, at the time a Hibernate OGM engineer, gave a presentation at JavaOne on this topic. For example, to search Cassandra by any field, "behind the scenes" it needs either a Lucene engine or several secondary indexes.

Secondary indexes are tricky to use and can impact performance greatly.
From the documentation.

NoSQL databases have several query languages, such as Gremlin, CQL, N1QL, MangoQL and so on. These won't fit as a native query in an API that assumes SQL.
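
To make the point about unused methods concrete, here is a minimal sketch under stated assumptions. `WideManager` is a hypothetical stand-in whose method names echo EntityManager but whose signatures are simplified for the example, and `KeyValueManager` is a toy backed by a map. It only illustrates what happens when a narrow store has to implement a wide interface.

```java
import java.util.HashMap;
import java.util.Map;

// Demo entry point, declared first so the file can be run directly.
class WideInterfaceSketch {
    public static void main(String[] args) {
        WideManager em = new KeyValueManager();
        em.persist("user:1", "Ada");
        System.out.println("found: " + em.find("user:1"));
        try {
            em.joinTransaction();
        } catch (UnsupportedOperationException e) {
            System.out.println("unsupported: " + e.getMessage());
        }
    }
}

// Hypothetical stand-in for a wide, SQL-oriented manager interface.
// The method names echo EntityManager; the signatures are simplified.
interface WideManager {
    void persist(String key, Object value);
    Object find(String key);
    void remove(String key);
    Object createQuery(String query);
    Object createStoredProcedureQuery(String name);
    void joinTransaction();
    boolean isJoinedToTransaction();
}

// A key-value store can honour only the key-based operations.
class KeyValueManager implements WideManager {
    private final Map<String, Object> store = new HashMap<>();

    @Override public void persist(String key, Object value) { store.put(key, value); }
    @Override public Object find(String key) { return store.get(key); }
    @Override public void remove(String key) { store.remove(key); }

    // Everything below has no natural meaning for a plain key-value store.
    @Override public Object createQuery(String query) { throw unsupported("createQuery"); }
    @Override public Object createStoredProcedureQuery(String name) { throw unsupported("createStoredProcedureQuery"); }
    @Override public void joinTransaction() { throw unsupported("joinTransaction"); }
    @Override public boolean isJoinedToTransaction() { throw unsupported("isJoinedToTransaction"); }

    private static UnsupportedOperationException unsupported(String method) {
        return new UnsupportedOperationException(method + " is not supported by a key-value store");
    }
}
```

Callers compile against the full interface but only discover the unsupported methods at runtime, which is the kind of gap between API and store that the list above describes.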

To conclude: yes, we can think of an interface shared by both SQL and NoSQL, since both are storage technologies. But I don't think an interface made for SQL, such as EntityManager, is generic enough to do this kind of job. The annotations are a useful starting point, since they follow some DDD and are therefore technology agnostic. But the relationship annotations are another issue, since NoSQL usually does not support a normalization approach; indeed, it goes in the opposite direction. The closest fit for association annotations such as OneToMany are graph databases such as Neo4j, but there an edge can go much deeper than just "emulating" an SQL database.

I did this study using around 25 databases that you can find here: http://www.jnosql.org/doc/

If anything is not clear, please let me know, and again, thank you for the discussion.

Tibor Digana · Answer 7 · Wed Apr 10 2019 08:30:46 GMT+0800 (China Standard Time)

@otaviojava
What makes sense to me is to use a Repository API in JPA (following DeltaSpike).
Using the JPA API here would rather break this API.

Otávio Santana · Answer 8 · Wed Apr 10 2019 16:50:58 GMT+0800 (China Standard Time)

@Tibor17

Yes, I like an interface approach. Currently, that is what Spring Data does.
We can talk about it on the JPA list and see what they think about it:
https://accounts.eclipse.org/mailing-list/jpa-dev
Perhaps a module with some annotations and this interface, shared among these persistence technologies.
What do you think?
If yes, could you please send the message there?
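
The interface approach discussed here could look roughly like the sketch below. It is only an illustration: `Repository`, `PersonRepository`, `Person` and `InMemoryPersonRepository` are hypothetical names for this example, loosely modelled on the repository style mentioned in the thread, and not the API of Spring Data, DeltaSpike or JNoSQL.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Demo entry point, declared first so the file can be run directly.
class RepositorySketch {
    public static void main(String[] args) {
        PersonRepository repository = new InMemoryPersonRepository();
        repository.save(new Person("1", "Jennifer"));
        System.out.println(repository.findById("1").map(p -> p.name).orElse("not found"));
    }
}

// Hypothetical storage-neutral contract: no transactions, no query language.
interface Repository<T, ID> {
    T save(T entity);
    Optional<T> findById(ID id);
    void deleteById(ID id);
}

// Simple entity used by the example.
final class Person {
    final String id;
    final String name;

    Person(String id, String name) {
        this.id = id;
        this.name = name;
    }
}

// Application code depends on this interface, not on a specific store.
interface PersonRepository extends Repository<Person, String> {
}

// Toy implementation; a real one would delegate to a SQL or NoSQL driver.
class InMemoryPersonRepository implements PersonRepository {
    private final Map<String, Person> data = new HashMap<>();

    @Override
    public Person save(Person entity) {
        data.put(entity.id, entity);
        return entity;
    }

    @Override
    public Optional<Person> findById(String id) {
        return Optional.ofNullable(data.get(id));
    }

    @Override
    public void deleteById(String id) {
        data.remove(id);
    }
}
```

The contract stays small on purpose: it promises only what nearly every store can deliver, and leaves store-specific queries to the implementation.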

Otávio Santana · Answer 9 · Sun Apr 14 2019 18:16:52 GMT+0800 (China Standard Time)

Romain, about Spring Data: it is by far the most successful framework when we talk about integration between Java and NoSQL databases.
That is why we drew on it as a reference.
To know more: https://spring.io/projects/spring-data

Romain Manni-Bucau · Answer 10 · Sun Apr 14 2019 18:27:41 GMT+0800 (China Standard Time)

@otaviojava in terms of marketing and communication only. A lot of long-term projects drop it because it duplicates an API which must be known anyway, without making it straightforward to evolve and maintain with custom logic. It encourages a new duplicated layer (~DAO), so it is not always a good bet. Short-term projects and PoCs love it, but this is not real life, and that is valid for Jakarta too. Let's assume you drop Spring Data: what is the complexity of implementing the same thing? Almost the same. I am not saying that randomly.
Why Spring is the best known is mainly marketing, and because it is an umbrella project for 6-7 subprojects. Typically you know Apache Commons, but do you know all of its components?
My request is not to turn Jakarta into a new MicroProfile where APIs are copied from some other framework without any thought for the ecosystem or feedback from real-life projects. This is what NoSQL is to me at the moment, because it does not provide a way to switch backends, whereas JPA does, to be concrete. So it is not yet needed, and it would compete with a competitor it can't compete with: the actual vendor.

Oliver Drotbohm · Answer 11 · Mon Apr 15 2019 19:19:39 GMT+0800 (China Standard Time)

I think this would be a bad idea. I've stated in a few places already why JPA is a bad fit for NoSQL data stores. The bottom line is that JPA has so many concepts built in that only make sense on relational databases that you'd end up spending most of the documentation on which parts of it don't work for which store and why. That's tedious for the implementors. It's also not very convenient to use as a user. Ironically, the lower-level JDBC has far fewer ties to relational databases than the high-level JPA.

In general, I'd even go as far as contesting whether it's a good idea to abstract over database types in general, but that's a different story.

One thing I think might be worth exploring (probably in a different ticket) is whether the repository concepts in DeltaSpike can be aligned with the ones proposed for JNoSQL. The Spring Data team would be happy to provide our experience. That would then allow people to use a repository programming model that's consistent in the Jakarta EE world (across DeltaSpike for JPA and NoSQL via JNoSQL) and the Spring world. Probably worth noting that all the Spring Data modules already ship a CDI extension, so that the repositories (backed by both relational and non-relational stores) can seamlessly be used in a Java EE / Jakarta EE environment already.

Lot of long term project drop it…

Can you qualify this with numbers? We see usage numbers doubling year over year, especially in the JPA space as it's so much convenience for so little effort.

… because its duplicates an API…

Which API?

It encourages a new duplicated layer (~DAO)…

Duplicating what? Repositories are a building block of DDD, not a "layer".

Why spring is most known is mainly marketing…

And shipping repository implementations for JPA and NoSQL stores for almost a decade, successfully running in production ever since. I guess that's the more convincing part of it. Marketing doesn't ship code. 😉

This is what is nosql to me at the moment because it does not provide a way to switch of backend - whereas jpa does to be concrete, so it is not yet needed and would compete with a concurrent it cant compete with, the actual vendor.

In the 10 years I've spent on this stuff, the number of projects I have seen that would've switched JPA implementations or even databases can be counted on the fingers of one hand. That might be my filter bubble admittedly but something so rare feels like a weird thing to optimize for. Especially in the NoSQL space where you – more than with relational databases – choose stores because of their special features.

Werner Keil · Answer 12 · Mon Apr 15 2019 19:35:47 GMT+0800 (China Standard Time)

DeltaSpike is not an RI of anything and although it likely implements either JSR 382, MicroProfile Config or both (there's quite a bit of uncertainty in this field at the moment ;-/ ) it is also superseded by plenty of other frameworks.

JNoSQL applies the Repository pattern used by Spring Data and others. I don't think we'll get to a one-size-fits-all solution, but for consistency Jakarta EE should also take into account the APIs that are already there, and try not to reinvent the wheel or introduce the kinds of duplication Java EE has accumulated over the years.

Werner Keil · Answer 13 · Mon Apr 15 2019 19:42:07 GMT+0800 (China Standard Time)

@rmannibucau, @odrotbohm, all,

A spec always offers more safety and reduces dependency on a single project or vendor. Spring (Data, Boot, etc.) may be quite common and it is also Open Source which makes it easier to keep it maintained, even if the main company behind it had other plans.
Look at essential components under the hood like Hystrix, not only used by Spring but in a lot of places.
Netflix had enough of it and probably needs more resources to battle Disney or other rivals on the content front, so they no longer maintain Hystrix. Whether or not others step in with a fork remains to be seen, but it shows what can happen if several downstream projects and entire enterprises depend on the will and mercy of a single provider.

Romain Manni-Bucau · Answer 14 · Mon Apr 15 2019 20:32:20 GMT+0800 (China Standard Time)

@odrotbohm my overall point was not against Spring but against using Spring as proof of anything. I have tried to answer your points to clarify that:

Lot of long term project drop it…

Can you qualify this with numbers? We see usage numbers doubling year over year, especially in the JPA space as it's so much convenience for so little effort.

Well, JPA is out of scope, and I agree I see it used for JPA still, mainly as a workaround for the verbosity of the Criteria API and for some API enhancements built on top of it, like paging. For Jakarta EE that means the API should evolve, but it does not imply a proxy-based solution too (I'm not strictly against it, but at this stage it goes against what exists and has the cost of redoing it completely, which is likely not desired).

In the last years I know of only one project out of about a dozen which kept the spring-data layer. Also, the feedback on my Jakarta EE implementation side is that it does not work for a serious share of users (it is hard to estimate, since you only hear about cases where it does not work, but I got dozens of complaints versus 2 users being happy with it, so I assume the real numbers should be somewhere around 50-50 in %?).

… because its duplicates an API…

Which API?

The data access one.

It encourages a new duplicated layer (~DAO)…

Duplicating what? Repositories are a building block of DDD, not a "layer".

Not really, spring-data goes further than a DDD repository but not far enough, by construction, to be a service, so it sits in between. Now the point is that if you drop that repository layer, what it does is already provided by your vendor, so you just create a new API. I can agree it can be nicer, but it does not bring user features. I don't think Jakarta EE's role is to bring a new API on top of vendor APIs, but only to bring guarantees to users like portability, stability (over time), etc.

Why spring is most known is mainly marketing…

And shipping repository implementations for JPA and NoSQL stores for almost a decade, successfully running in production ever since. I guess that's the more convincing part of it. Marketing doesn't ship code.

You can say the same of JPA or even JSF. Why a Spring technology is used is mainly due to marketing, not even judging it technically. The same applies to the Netflix stack: it does not fit most apps but is used because it is Netflix. The point was not that Spring is bad but that communication and adoption are highly biased and can't be relied on to take any decision.

In the 10 years I've spend on this stuff, the amount of projects I have seen that would've switched JPA implementations or even databases can be counted on the fingers of one hand. That might be my filter bubble admittedly but something so rare feels like a weird thing to optimize for. Especially in the NoSQL space where you – more than with relational databases – choose stores because of their special features.

Hmm, not sure I get this one, what I'm saying is exactly that: you rely on NoSQL for special features so you decide to bind yourself to a vendor so providing a generic API on top of it is pointless IMHO.

@keilw deltaspike-data is the de facto standard for EE JPA repositories just because there is no CDI-native alternative (the spring-data CDI extension still relies on some Spring "environment" to work well AFAIK, because it was not built for Jakarta EE directly).
Also note that the Hystrix example is a bad one IMHO, because it is not a backend you choose for its particularities but just a library which does not need to be bound to a vendor; the recent MicroProfile Fault Tolerance changes show it. In NoSQL land the vendor is still what is driving your choice, so it is hard to decorrelate from it at the Jakarta level.

Werner Keil · Answer 15 · Mon Apr 15 2019 20:57:58 GMT+0800 (China Standard Time)

@rmannibucau I'm not saying deltaspike-data is not an equivalent (or even foundation) to spring-data; it just isn't any form of RI. That's Weld.
With Jakarta EE however this could change, as the term "Reference Implementation" no longer exists and several "Compatible Implementations" are supposed to live side-by-side ;-)

Romain Manni-Bucau · Answer 16 · Mon Apr 15 2019 21:13:58 GMT+0800 (China Standard Time)

@keilw Weld is the CDI RI; here we speak of a NoSQL API. DeltaSpike was likely mentioned as a data access layer (in the general sense), so I am not sure I'm following. That said, I agree it is not important yet; finding out whether there is space for such a spec at all is way more critical IMHO.

Tibor Digana · Answer 17 · Tue Apr 16 2019 21:28:23 GMT+0800 (China Standard Time)

One big advantage that we have in Java EE over Spring is CDI Extensions, because EE can grow without any new EE release. I think many developers still do not understand this, and they consider EE a framework instead of an extensible API. For such people, having Data in JPA would make EE more or less juicy. I am not sure a standalone Data API is really sufficient without the base data layer API. Joining this API with JPA was a bad idea. The question is the level of abstraction of this API, and whether it is OK or not.

Werner Keil · Answer 18 · Tue Apr 16 2019 21:35:06 GMT+0800 (China Standard Time)

@Tibor17 I think it really depends on the type of database. Especially for column based systems IMO it would be ridiculous to have all those definitions in JNoSQL "same as JPA", they must be from JPA. Most of the others don't even exist in JPA, so there is no question about overlap.

@rmannibucau I think Hibernate OGM answered many of the questions here. While it mostly uses JPA it also offers support for non-relational systems where JPA is usually not sufficient.

Romain Manni-Bucau · Answer 19 · Tue Jul 16 2019 17:57:42 GMT+0800 (China Standard Time)

Just for the record: OGM (and Spring Data) proved it is not needed, since it is equivalent to the vendor solution without the advantage of the abstraction. So I think the issue is actually very relevant, and Eclipse is likely going to hit the same drawback while not respecting the original goal of Jakarta EE, so 2 failures without any gain :(.

Tibor Digana · Answer 20 · Tue Jul 16 2019 18:03:39 GMT+0800 (China Standard Time)

@rmannibucau
Don't you have the feeling that the issue should be reformulated?
The original goal of Jakarta EE? Please post a link to what you mean.
I guess you should post a clear issue, because you know more than me in this area. Thx

Romain Manni-Bucau · Answer 21 · Tue Jul 16 2019 18:22:31 GMT+0800 (China Standard Time)

@Tibor17 I have to admit I don't know whether it should or not. The fact is that EE (Java EE at that time) always had the advantage of being portable and bringing standardization. Now that it is an Eclipse thing, it keeps the same advantage, even if standardization is less "built in" since it became a vendor, but the abstraction is still there in the whole stack. JNoSQL clearly does not follow that principle. Where I'm unsure whether it must be discussed is that most of the active Jakarta guys just try to find local, specific solutions, so I am not sure it is worth trying to keep something sane instead of making it yet another vendor. Happy to help keep Jakarta EE useful if I'm not the only one.

Tibor Digana · Answer 22 · Tue Jul 16 2019 18:55:34 GMT+0800 (China Standard Time)

@rmannibucau
@keilw
@otaviojava
@odrotbohm
Honestly, I want to see Jakarta EE be successful and grow in the market (over Spring Boot).
It's not nice to see tensions.
I think the leaders and devs should start talking. Please use any channel that suits you, video, chat, whatever, but please be consistent with each other, and if anything is in conflict, resolve it.

Otávio Santana · Answer 23 · Wed Jul 17 2019 16:52:53 GMT+0800 (China Standard Time)

Thank you for the feedback.
We have the mailing list for any discussion like that, but sure. I'll start holding monthly meetings about Jakarta NoSQL for further debates like this.

Otávio Santana · Answer 24 · Wed Jul 17 2019 16:54:02 GMT+0800 (China Standard Time)

@Tibor17 @rmannibucau about the abstraction:
https://github.com/eclipse-ee4j/nosql