Gremlinq / ExRam.Gremlinq

A .NET object-graph-mapper for Apache TinkerPop™ Gremlin enabled databases.

CosmosDB: Guid IDs instead of object IDs of Vertex?

BenjaminAbt opened this issue · comments

Is there any sample or way to work with a typed value like Guid instead of object for the Vertex Id?
Object is extremely error-prone.

We have already tried to work with Guid directly and with struct-based abstractions.
This works so far for reading, but when writing vertices, argument exceptions like the following are thrown:

Gremlin.Net.Driver.Exceptions.ResponseException: InvalidRequestArguments: 

ActivityId : a6f0053f-ab75-4b99-92bb-6f0934182c84
ExceptionType : ArgumentException
ExceptionMessage :
	Value of variable _b is not a constant type. Cannot assign complex values to groovy variable. (Parameter 'value')
Source : Microsoft.Azure.Cosmos.Gremlin.Core
	HResult : 0x80070057

We could not find any reference to this in the docs either.

Thanks for your help!

IDs in CosmosDb are always strings. That's the closest you can get.

I transferred the issue to ExRam.Gremlinq for future reference.

A more extensive opinion on that matter: As IDs in CosmosDB are always strings, even when they look like Guids, don't use Guids on your POCOs to represent them. It'll explode as soon as you encounter a custom ID that does not parse as Guid. Also, on deserialization, you'll lose the original representation of the Guid - hyphens, cases, etc. Even though Gremlin could be forced to work with Guids just fine, don't go there if you have control over the source code of your POCOs. Just use strings.
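Both failure modes mentioned above can be reproduced with plain .NET, independent of Gremlinq or CosmosDB. The following is only a minimal sketch; the class and method names (`GuidIdPitfalls`, `RoundTrip`) are made up for illustration.

```csharp
#nullable enable
using System;

// Hypothetical helper, not part of Gremlinq: shows what happens when a
// string id is forced through System.Guid.
public static class GuidIdPitfalls
{
    // Parses the id as a Guid and formats it back to a string.
    // Returns null when the id is not a valid Guid at all.
    public static string? RoundTrip(string id)
        => Guid.TryParse(id, out var guid) ? guid.ToString() : null;
}
```

`RoundTrip("A6F0053F-AB75-4B99-92BB-6F0934182C84")` returns the lowercase form `a6f0053f-ab75-4b99-92bb-6f0934182c84`, so the original casing is lost, and `RoundTrip("myCustomId")` returns null because a custom id like that does not parse as a Guid.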

Technically you are right that an Id is declared as a string in CosmosDB. By default, however, CosmosDB uses guids as content rep.
Nevertheless, the typed representation is actually the safer choice, regardless of the content implementation.

My wish would actually be a generic way of specifying the type.

// Default vertex
public class Vertex :  IVertex
{
    public object? Id { get; set; }
    public string? Label { get; set; }
    public string PartitionKey { get; set; } = "PartitionKey";
}

// Typed vertex Id
public class Vertex :  IVertex
{
    // we know every Id is technically a guid; so we could parse the content (strict);
    //  works on all reads, but fails on all writes like V.Add();
    public Guid? Id { get; set; } 

    public string? Label { get; set; }
    public string PartitionKey { get; set; } = "PartitionKey";
}

// Generic vertex Id
public class Vertex :  IVertex<object>
{
    public string? Label { get; set; }
    public string PartitionKey { get; set; } = "PartitionKey";
}
// aka
public class Vertex :  IVertex<Guid>
{
    public string? Label { get; set; }
    public string PartitionKey { get; set; } = "PartitionKey";
}

public interface IVertex<TVertexId>
{
    public TVertexId? Id { get; set; } // soft
}

The object currently forces us to use an insecure signature (because it is error-prone) or very complex abstractions.

I'm absolutely with you when it comes to how values are stored and handled by CosmosDB and Gremlin.
However, I am focusing on a code-safe implementation of the API and the models.

Why won't strings work for you?

Strings work. That was not my concern.

My intention was more on typing: The wish is that corresponding method signatures can work with typed values and not only with strings and objects. In addition, there is the mapping of vertex objects to projections and other models (aka dto or whatever).
Using Object and String everywhere is simply error-prone - no more, no less :-)

Not sure why you don't consider a string a "typed value".

Also, I'm not sure I understand what you mean by "CosmosDB uses guids as content rep.". Assigning a custom Id to a vertex that just says "myCustomId" is perfectly fine in CosmosDB and it's not a Guid. Technically, using a Guid in your POCOs is dangerous.

What's error prone about a string? Sorry, I don't get the rationale.

You can still force your method signatures to use Guids, if Guids are central to the domain you're programming for, and do the appropriate conversions. But when dealing with Ids on CosmosDB, strings are the only right representation of IDs. I could not encourage you to keep using Guids because it'll blow up sooner or later.

Thanks for all your answers so far!

I'm still not concerned with the Guid per se, but with the handling of the API and its signatures.
I am not interested in the content of the string, but how to address / use methods.
I don't care what the Id represents in terms of content. The Guid is simply the default behavior of CosmosDB.

Imagine a signature like
Task MyMethod(object personId, object cityId, object carId)

Since the base type object is used everywhere, you have to pay close attention to the correct argument order, and you have to make sure you don't accidentally pass a completely wrong value or a value from a different source.

The signature with strings is a bit better, because then at least you can pass only strings. But of course there remains the factor that the Id can be anything.
Task MyMethod(string personId, string cityId, string carId)

We can actually only achieve a true type-safe signature with physically separated types.
Task MyMethod(PersonId personId, CityId cityId, CarId carId)

A corresponding possible implementation of the Id could look like this:

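    public readonly struct PersonId
    {
        public string Value { get; }
        public PersonId(string value) => Value = value;

        // Parse was referenced but not shown in the original post;
        // this minimal version is an assumption.
        public static PersonId Parse(string id) => new PersonId(id);

        public static implicit operator PersonId(string id) => Parse(id);
        public static implicit operator string(PersonId id) => id.Value;
    }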

This may look like overhead at first, but at runtime there is little to none, and it makes models easier to handle across the application. The advantage is that while all content values are still strings, the separate structs give us higher type safety for both methods and models.
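To make the type-safety argument concrete, here is a small self-contained sketch with two id structs and a method that takes both. `CityId`, `TravelService`, `Describe` and `Demo` are hypothetical names for illustration only, and the sketch deliberately leaves out the implicit conversion from string so that ids have to be constructed explicitly.

```csharp
using System;

// Two physically separate id types that both wrap a plain string.
public readonly struct PersonId
{
    public string Value { get; }
    public PersonId(string value) => Value = value;
    public static implicit operator string(PersonId id) => id.Value;
}

public readonly struct CityId
{
    public string Value { get; }
    public CityId(string value) => Value = value;
    public static implicit operator string(CityId id) => id.Value;
}

// Hypothetical service; only the signature matters here.
public static class TravelService
{
    public static string Describe(PersonId personId, CityId cityId)
        => $"person {personId.Value} lives in city {cityId.Value}";
}

public static class Demo
{
    public static string Run()
    {
        var person = new PersonId("p-1");
        var city = new CityId("c-7");

        // Swapping the arguments does not compile, because there is no
        // conversion from CityId to PersonId (or the other way round):
        // TravelService.Describe(city, person);

        return TravelService.Describe(person, city);
    }
}
```

With `object` or `string` parameters the swapped call would compile and fail only at runtime, if at all. Note that `default(PersonId)` still carries a null `Value`, so a struct like this does not rule out empty ids on its own.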

This works fine for reading vertices (Newtonsoft.Json recognizes the operators here and deserializes perfectly), but when writing vertices the API is not able to process the struct properly, because (I guess) it sees the struct itself and not the value.

There's always the possibility to use a custom serializer, like it has been done for the CosmosDbKey-struct (a combination of id and partitionKey for use with g.V(...)). You will have to define these for every custom Id type though. For deserialization, I guess Newtonsoft.Json handles proper conversion just magically.

BTW the IVertex interface does not have to be used. Gremlinq is just fine without. Of course, IVertex defines the object Id in the first place but it's not necessary.

Thanks for the hints. That's what we were looking for!

May this be closed then ?

Yes, we can close this for now.

Looks like we have to create a bunch of extension methods to get this working (because the implementation of V() does not work with structs), but I guess we will reach our goal of type safety! :-)

Thanks!

V works fine with structs, as this test shows and CosmosDbKey is a struct. There'll be boxing involved of course.

We can now serialize / deserialize our struct with the vertices, but every V() call responds with

Value of variable _b is not a constant type. Cannot assign complex values to groovy variable. (Parameter 'value')

So if structs are supposed to work, either the serializer does not like our struct or we have overlooked something.

You'll have to override serialization as shown here and of course ultimately serialize it to a string. The error message you get is not from Gremlinq.

Yes, it comes from Gremlin.NET

Gremlin.Net.Driver.Exceptions.ResponseException: InvalidRequestArguments

but it must be related to this. It works with object but throws this exception with our struct.

Our registration is based on this.

public static IGremlinQueryEnvironment RegisterGraphModelSerialization(this IGremlinQueryEnvironment e)
{
    e.ConfigureSerializer(s =>
        s.ConfigureFragmentSerializer(fs =>
            fs.Override<PersonId>((fragment, environment, overridden, recurse)
                => recurse.Serialize(fragment.Value, environment))));

    return e;
}

We also tried to attach .ToGroovy() and we also tried to return (object)fragment.Value;

// Register Options
.ConfigureOptions(options => options.SetValue(WebSocketGremlinqOptions.QueryLogLogLevel, LogLevel.None))
// Register Custom Stuff
.RegisterGraphModelSerialization()
// Register Database
.UseCosmosDb(builder => builder

I guess we'll have to invest more time to dig through the source code, though.

Immutability is king. Everything is immutable. If you call something on e and return e, you've done nothing. Return the result of Configure.XYZ.
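The effect described above can be shown without Gremlinq. In the sketch below, `QueryEnvironment` is a hypothetical stand-in for an immutable environment, not a Gremlinq type, and the serializer is reduced to a name in a list. Applied to the registration shown earlier, the same idea would mean returning the result of the `e.ConfigureSerializer(...)` call instead of `e`.

```csharp
using System.Collections.Immutable;

// Hypothetical immutable environment; every Configure call returns a new
// instance and leaves the receiver untouched.
public sealed class QueryEnvironment
{
    public static readonly QueryEnvironment Empty =
        new QueryEnvironment(ImmutableList<string>.Empty);

    public ImmutableList<string> Serializers { get; }

    private QueryEnvironment(ImmutableList<string> serializers)
        => Serializers = serializers;

    public QueryEnvironment ConfigureSerializer(string name)
        => new QueryEnvironment(Serializers.Add(name));
}

public static class Registration
{
    // Bug: the configured copy is discarded and the unchanged input is returned.
    public static QueryEnvironment RegisterBroken(this QueryEnvironment e)
    {
        e.ConfigureSerializer("PersonId");
        return e;
    }

    // Fix: return the result of the Configure call.
    public static QueryEnvironment RegisterFixed(this QueryEnvironment e)
        => e.ConfigureSerializer("PersonId");
}
```

In this sketch, `QueryEnvironment.Empty.RegisterBroken()` still has no serializers registered, while `QueryEnvironment.Empty.RegisterFixed()` has one.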

Oh. embarrassing mistake.....

Serialization and deserialization work now. But I guess there is a bug in the Linq implementation.

If we use Where(v => v.Id == personId), the following passage passes null into our implicit operator, whereas it passes the actual value if we use string or object.

https://github.com/ExRam/ExRam.Gremlinq/blob/89f6ba298db53ce0d9a28a73280a529e730d74e4/src/ExRam.Gremlinq.Core/Serialization/GremlinQueryFragmentSerializer.cs#L217

When we have more time to investigate the bug, we will create an issue.
Currently, our workaround is to use V(personId) instead of Where, as this works.