ava-innersource / Liquid-Application-Framework-1.0-deprecated

Liquid is a framework to speed up the development of microservices

Define a CosmosDB PartitionKey for collections

andreracz opened this issue · comments

Today, the PartitionKey is fixed to an undefined value.
This could impact scalability.

The main issue here is how to do this in a database-agnostic manner.

So, from what we gathered:

This issue is somewhat important; right now Liquid does the following:

  • When it creates new collections on Cosmos, the partition key is set to the name of the collection, which, in turn, is the name of the object. This can potentially be avoided or changed directly via the Azure Portal, but we can imagine that some of our customers are relying on Liquid to create collections (unfortunately).

  • When it accesses documents (GetById, Delete, Update, etc.) it does so with a cross-partition query (which is strange given that all documents are in a single partition). By doing that, the query fans out to all Cosmos partitions, which means a lot of (maybe unnecessary) consumption.

Users can use the Azure Portal to define better partition keys, but currently Liquid won't let the application specify the partition key on a query, which leads to increased consumption.
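To illustrate why not being able to pass a partition key increases consumption, here is a toy model in Python. It is not Liquid code and does not use the Cosmos SDK; the number of physical partitions, the hash function and the helper names are all assumptions made up for the illustration (Cosmos DB's real hashing is different).

```python
import hashlib

# Hypothetical number of physical partitions backing one collection.
PHYSICAL_PARTITIONS = 4


def physical_partition(partition_key: str) -> int:
    """Toy stand-in for hash partitioning: map a key value to one partition."""
    digest = hashlib.sha256(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % PHYSICAL_PARTITIONS


def partitions_visited(partition_key=None):
    """Return the partitions a query has to touch.

    Without a partition key the query fans out to every partition;
    with one, it can be routed to a single partition.
    """
    if partition_key is None:
        return list(range(PHYSICAL_PARTITIONS))
    return [physical_partition(partition_key)]


fan_out = partitions_visited()            # cross-partition query
targeted = partitions_visited("order-1")  # query that supplies the key

print(len(fan_out), len(targeted))
```

In this model the cross-partition query always touches every partition, while the query that supplies a key touches exactly one, whatever the key value is.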

Actually, cross-partition queries are used for Queries only. For Update, GetById and Delete, the document is looked up using a default partition key value of Undefined. This has some bad implications:

  • Since all documents have the same Partition Key value, they will all reside in the same Logical Partition, which has a size limit of 10 GB.
  • The queries will not scale.
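A small Python sketch of the point above, assuming nothing beyond the rule that documents sharing a partition key value share a logical partition. The sample documents and the `logical_partitions` helper are hypothetical and only meant to show the distribution.

```python
from collections import Counter


def logical_partitions(docs, key_of):
    """Count how many documents fall into each logical partition."""
    return Counter(key_of(doc) for doc in docs)


# Hypothetical sample documents.
docs = [{"id": f"order-{i}", "customer": f"c{i % 3}"} for i in range(6)]

# Behaviour described above: every document gets the same key value.
fixed = logical_partitions(docs, lambda d: "Undefined")

# Alternative: the document id is used as the partition key.
by_id = logical_partitions(docs, lambda d: d["id"])

print(len(fixed), max(fixed.values()))
print(len(by_id), max(by_id.values()))
```

With the fixed value, all six documents pile up in one logical partition, so that partition's size limit caps the whole collection. With the id as key, each document gets its own logical partition.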

The easiest fix, which does not require code changes but does require that the collections be recreated and the data migrated, is to use the id as the Partition Key. Implications:

This seems like a good solution for version 1.0, but we should implement a more flexible approach that allows the user to choose a suitable partition key for future versions.

@bruno-brant and @guilhermeluizsp, I would like your opinion on this.

Also, I would label this a bug, due to the severity.

@andreracz I think you've said it all and I also agree to use id as the default partition key, for now.

@bruno-brant I believe this should be one of the highest priorities. Shouldn't we move it to the 1.0.0 milestone (@andreracz also mentioned it)?

I agree; I'm prioritizing it given the kind of issues that could stem from this.

I'm marking it as a bug because a number of issues could arise in production if we don't treat this correctly.

One thing that I forgot to mention is that choosing Id as the partition key has one downside: transactions can only occur within a single partition, and with Id as the key every document ends up in its own partition, so we are essentially losing the capability to use transactions across documents.
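The transaction trade-off can be sketched the same way in Python. The only rule modelled is the one stated above (a transaction cannot span partitions); the documents, the `customer` field and the helper name are hypothetical.

```python
def fits_in_one_transaction(docs, key_of):
    """True when all documents share one partition key value."""
    return len({key_of(doc) for doc in docs}) <= 1


# Two hypothetical documents that belong to the same customer.
batch = [
    {"id": "order-1", "customer": "c1"},
    {"id": "order-2", "customer": "c1"},
]

# With id as the partition key, each document is alone in its partition.
with_id_key = fits_in_one_transaction(batch, lambda d: d["id"])

# With a user-chosen key such as the customer, the batch shares a partition.
with_customer_key = fits_in_one_transaction(batch, lambda d: d["customer"])

print(with_id_key, with_customer_key)
```

This is why a user-selectable partition key is still worth having in a later version: it lets related documents share a partition when they need to be updated together.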

@bruno-brant, I've created a pull request (#216) with the fix. The unit tests are not implemented yet.