Support class inheritance (HAST-241)

Piedone opened this issue 4 years ago

If a class is inherited from, then the members of the base class should be duplicated into each child class, just as if they were declared there. Otherwise, the base members could be used from multiple locations concurrently, with different types for their "this" parameter.

Base method calls aren't handled well currently either: the "this" parameter is not added for calls like base.MyMethod().

However, polymorphism still wouldn't be supported (since, e.g., variables and arrays need a static type), so this would only be partial support, and unsupported cases need to be detected and made to fail gracefully.
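A minimal sketch of the duplication described above, using a hypothetical Base/DerivedA/DerivedB hierarchy (not taken from Hastlayer): each child gets its own copy of the base members, and a `base.MyMethod()` call becomes a call to a static copy that takes "this" as an explicit, statically typed parameter. The `Base_MyMethod` naming is an assumption made only for illustration.

```csharp
// Hypothetical input:
//   class Base { public int counter; public virtual int MyMethod() => counter + 1; }
//   class DerivedA : Base { public override int MyMethod() => base.MyMethod() * 2; }
//   class DerivedB : Base { public override int MyMethod() => base.MyMethod() * 3; }
//
// Sketch of a transformed output: no inheritance, Base's members copied into each child.

public sealed class DerivedA
{
    // Copied from Base.
    public int counter;

    // Private copy of Base.MyMethod with an explicit "this" parameter of a single static type.
    // Only DerivedA uses this copy, so it is never shared with DerivedB.
    private static int Base_MyMethod(DerivedA @this) => @this.counter + 1;

    // base.MyMethod() becomes a call that passes "this" explicitly.
    public int MyMethod() => Base_MyMethod(this) * 2;
}

public sealed class DerivedB
{
    public int counter;

    private static int Base_MyMethod(DerivedB @this) => @this.counter + 1;

    public int MyMethod() => Base_MyMethod(this) * 3;
}
```

With `counter = 4`, `DerivedA.MyMethod()` returns 10 and `DerivedB.MyMethod()` returns 15, and neither class ever shares hardware for the base method with the other.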

FractalFir · Answer 1 · Sat Sep 30 2023 20:49:51 GMT+0800 (China Standard Time)

Since Hastlayer takes a whole assembly as input, it has all the information about parent/child classes and their relationships.
Using this information, another approach to polymorphism can be used: tagged unions. This approach is rarely used because it is less flexible (all child classes must be known at compile time), but that is not a problem for Hastlayer.
The key advantage of tagged unions is that their layout and size are fully known at compile time (they are a static type).
In a tagged union, variants are distinguished by the tag. Choosing the tag value becomes a bit more complex with multiple levels of inheritance, so I'll start by describing the simplest, single-level case.
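A sketch of single-level tag assignment, using .NET reflection over the input assembly as a stand-in for however Hastlayer actually walks its type information (`TagAssigner`, `AssignTags`, `TagWidth`, and the ordering rule are all hypothetical). Every concrete class in the family gets a small integer, and the tag only needs enough bits to tell them apart.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Reflection;

public static class TagAssigner
{
    // Gives each concrete class of a one-level hierarchy a tag: the base class first
    // (unless it is abstract), then its direct, non-abstract subclasses in the assembly.
    // Sorting by full name is only there to make the numbering deterministic.
    public static Dictionary<Type, int> AssignTags(Type baseType, Assembly assembly)
    {
        var tags = new Dictionary<Type, int>();
        int next = 0;
        if (!baseType.IsAbstract) tags[baseType] = next++;

        var children = assembly.GetTypes()
            .Where(t => t.BaseType == baseType && !t.IsAbstract)
            .OrderBy(t => t.FullName, StringComparer.Ordinal);
        foreach (var child in children) tags[child] = next++;

        return tags;
    }

    // Bits needed to distinguish `variantCount` tag values (0 when there is only one variant).
    public static int TagWidth(int variantCount)
    {
        int bits = 0;
        while ((1 << bits) < variantCount) bits++;
        return bits;
    }
}
```

For the Animal example below, the three concrete classes Animal, Snake, and Dog need `TagWidth(3)`, i.e. a 2-bit tag.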

Different ways of implementing tagged unions

There are two ways to implement tagged unions, each with different restrictions.

The least memory-intensive one reinterprets the bytes after the tag based on the tag's value. This may be problematic to implement in hardware, since bits of multiple fields would be mixed together; Boolean values, which occupy 1 bit in hardware, make this even messier. I am not entirely sure this approach is even possible. This is roughly what such a tagged union looks like:

using System.Runtime.InteropServices;
[StructLayout(LayoutKind.Explicit)]
struct PolymorphicClass{
    // Tag, big enough to give every derived type a unique ID.
    // May need to be bigger if there are a lot of derived classes.
    [FieldOffset(0)]
    byte tag;
    // All variants start sizeof(tag) bytes after the tag.
    // Type representing the base class
    struct BaseClass{
        int base_field;
    }
    // Starts at byte offset sizeof(tag), overlaps with the other variants
    [FieldOffset(1)]
    BaseClass baseClass;
    // Type representing derived class A
    struct DerivedClassA{
        int base_field;
        float child_field_a;
    }
    // Starts at byte offset sizeof(tag), overlaps with the other variants
    [FieldOffset(1)]
    DerivedClassA derivedClassA;
    // Type representing derived class B
    struct DerivedClassB{
        int base_field;
        int child_field_b;
    }
    // Starts at byte offset sizeof(tag), overlaps with the other variants
    [FieldOffset(1)]
    DerivedClassB derivedClassB;
}

This is the most effort-intensive option, and its only advantage is memory usage. The commingling of fields could also make certain hardware optimizations harder.
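To make the bit-mixing concern concrete, here is a sketch that computes bit ranges for the fully overlapping layout. It borrows the fields of the Animal/Snake/Dog example below and assumes hardware widths of 32 bits for `int` and `float`, 1 bit for `bool`, and a 2-bit tag; the `Field`, `Variant`, and `OverlappingLayout` names are hypothetical.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical description of a variant: field names and their widths in bits.
public record Field(string Name, int Bits);
public record Variant(string Name, Field[] Fields);

public static class OverlappingLayout
{
    // Every variant's payload starts right after the tag, so payloads of different
    // variants overlap. Returns the [Start, End) bit range of each field of `v`.
    public static Dictionary<string, (int Start, int End)> Ranges(Variant v, int tagBits)
    {
        var ranges = new Dictionary<string, (int Start, int End)>();
        int offset = tagBits;
        foreach (var f in v.Fields)
        {
            ranges[f.Name] = (offset, offset + f.Bits);
            offset += f.Bits;
        }
        return ranges;
    }

    // Total width: the tag plus the largest payload.
    public static int TotalBits(IEnumerable<Variant> variants, int tagBits) =>
        tagBits + variants.Max(v => v.Fields.Sum(f => f.Bits));
}
```

Under those assumptions the union is 98 bits wide, and Snake's `hasVenom` occupies bit 66, which is also the lowest bit of Dog's `happiness` (bits 66 to 97). Because base fields come first in every variant, `age` lands on the same bits everywhere.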

Another approach is to allow only fields with matching types to overlap. This can use more space, but fields of different types never overlap, so a field never shares space with parts of other fields. Here is an example of how that could work, with the original polymorphic C# code followed by transformed code with static types:

using System;
// Example polymorphic code:
class Animal{
    public int age;
    public virtual void Descibe(){
        Console.WriteLine("This is an animal.");
    }
    public Animal(int age){
        this.age = age;
    }
}
class Snake:Animal{
    int length;
    bool hasVenom;
    public override void Descibe(){
        Console.WriteLine($"This is a {length} cm snake. Does it have venom:{hasVenom}.");
    }
    public Snake(int age,int length,bool hasVenom):base(age){
        this.length = length;
        this.hasVenom = hasVenom;
    }
}
class Dog:Animal{
    int height;
    float happines;
    public override void Descibe(){
        Console.WriteLine($"Woof. I am a {height} cm dog, and my happines is {happines}");
    }
    public Dog(int age,int height,float happines):base(age){
        this.height = height;
        this.happines = happines;
    }
}

The code below is functionally equivalent to the polymorphic code above, and is something Hastlayer should have no issue turning into FPGA hardware. Hastlayer could do something similar to this on CIL assemblies.

using System;
enum AnimalType{
    Animal,
    Snake,
    Dog,
}
class PolimorphicAnimal{
    AnimalType unionTag;
    // Fields of the base class
    int age;
    // Fields `height` of `Dog` and `length` of `Snake` have the same type.
    // We can use one field to store them both.
    int length_or_height;
    // These fields have different types, so storing them together could cause issues.
    bool hasVenom;
    float happines;
    void Descibe(){
        // "Virtual" method chosen based on the tag. I inlined the bodies of those methods for brevity,
        // but a real implementation could call a separate method based on the tag.
        switch (this.unionTag){
            case AnimalType.Dog:
                Console.WriteLine($"Woof. I am a {length_or_height} cm dog, and my happines is {happines}");
                break;
            case AnimalType.Snake:
                Console.WriteLine($"This is a {length_or_height} cm snake. Does it have venom:{hasVenom}.");
                break;
            case AnimalType.Animal:
                Console.WriteLine("This is an animal.");
                break;
            default:
                // Invalid union tags should be impossible for any valid C# code,
                // so it would be safe to ignore them, but handling them is possible.
                throw new Exception("Invalid union tag");
        }
    }
    // Translated constructor of Animal
    public static PolimorphicAnimal newAnimal(int age){
        PolimorphicAnimal animal = new PolimorphicAnimal();
        // Set the fields relevant to the variant
        animal.age = age;
        // Set the type tag
        animal.unionTag = AnimalType.Animal;
        return animal;
    }
    // Translated constructor of Snake
    public static PolimorphicAnimal newSnake(int age,int length,bool hasVenom){
        PolimorphicAnimal animal = new PolimorphicAnimal();
        // Set the fields relevant to the variant
        animal.age = age;
        animal.length_or_height = length;
        animal.hasVenom = hasVenom;
        // Set the type tag
        animal.unionTag = AnimalType.Snake;
        return animal;
    }
    // Translated constructor of Dog
    public static PolimorphicAnimal newDog(int age,int height,float happines){
        PolimorphicAnimal animal = new PolimorphicAnimal();
        // Set the fields relevant to the variant
        animal.age = age;
        animal.length_or_height = height;
        animal.happines = happines;
        // Set the type tag
        animal.unionTag = AnimalType.Dog;
        return animal;
    }
}

This example also shows a solution for dispatching virtual methods: they can be transformed into switch statements on the tag.
Class constructors need to set a valid tag, ensuring that an invalid variant can never exist.
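The field-sharing rule behind PolimorphicAnimal (one `int` slot for `length` and `height`, separate slots for the `bool` and `float` fields) can be derived mechanically. A minimal sketch, assuming each variant is described as a hypothetical list of (name, type name) pairs with base fields listed first; `SharedSlotLayout` and the `"int#1"` slot naming are illustrative only. Within one variant, every field of a given type takes the next free slot of that type, so fields of the same variant never collide while fields of different variants reuse slots.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class SharedSlotLayout
{
    // Maps each variant's fields to slot names such as "int#1". Base fields are listed first
    // in every variant, so they get the same slot in all of them.
    public static Dictionary<string, Dictionary<string, string>> Allocate(
        Dictionary<string, (string Name, string Type)[]> variants)
    {
        var result = new Dictionary<string, Dictionary<string, string>>();
        foreach (var (variant, fields) in variants)
        {
            var used = new Dictionary<string, int>(); // slots already taken per type in this variant
            var map = new Dictionary<string, string>();
            foreach (var (name, type) in fields)
            {
                used.TryGetValue(type, out int index); // 0 if this type hasn't been used yet
                map[name] = $"{type}#{index}";
                used[type] = index + 1;
            }
            result[variant] = map;
        }
        return result;
    }

    // Slots of each type the union needs: the most that any single variant uses.
    public static Dictionary<string, int> SlotCounts(
        Dictionary<string, (string Name, string Type)[]> variants) =>
        variants.Values
            .SelectMany(fields => fields.GroupBy(f => f.Type).Select(g => (Type: g.Key, Count: g.Count())))
            .GroupBy(x => x.Type)
            .ToDictionary(g => g.Key, g => g.Max(x => x.Count));
}
```

For Animal, Snake, and Dog this yields two `int` slots, one `bool` slot, and one `float` slot, which matches the fields of PolimorphicAnimal (age, length_or_height, hasVenom, happines).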

Multiple levels of inheritance

Multi-level inheritance brings more considerations, particularly when assigning tag values to variants. One big question is whether the class hierarchy should be "flattened".

For a hierarchy like this:

  graph TD;
      Base-->A;
      Base-->B;
      Base-->C;
      B-->B1;
      B-->B2;
      B-->B3;

We can either keep it "as is", with multiple layers of tags: a tag in Base could denote whether a variant is Base, A, B, or C, and a separate tag could then denote the variant of B (B, B1, B2, or B3).
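A sketch of the layered option for the hierarchy above, assuming hypothetical `BaseTag`/`BTag` enums and a `LayeredTag` struct: the outer tag picks one of Base's direct children, and the inner tag is only meaningful when the outer tag is B.

```csharp
using System;

// Hypothetical tags for Base -> {A, B, C} and B -> {B1, B2, B3}.
public enum BaseTag : byte { Base, A, B, C }
public enum BTag : byte { B, B1, B2, B3 }

public struct LayeredTag
{
    public BaseTag Outer;
    public BTag Inner; // ignored unless Outer == BaseTag.B

    public LayeredTag(BaseTag outer, BTag inner = BTag.B)
    {
        Outer = outer;
        Inner = inner;
    }

    // "Is this a B (or a subclass of B)?" only needs the outer tag.
    public bool IsB => Outer == BaseTag.B;

    // Casting from Base to B reuses the inner tag as-is; no tag conversion is needed.
    public BTag AsB() => IsB ? Inner : throw new InvalidCastException("Not a B.");
}
```

The cost of this layout is storing two tags for every object in the hierarchy.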

We can instead give each variant of B a separate tag value in Base, making them distinct variants of Base.

  graph TD;
      Base-->A;
      Base-->B;
      Base-->B1;
      Base-->B2;
      Base-->B3;
      Base-->C;

This makes casting from Base to B slightly more costly, since we now have to convert between Base-specific and B-specific tags. It has the advantage of needing only one tag to describe the whole hierarchy.
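One way to keep that conversion cheap, which the thread does not prescribe, is pre-order (interval) numbering: visiting the hierarchy depth-first keeps every subtree's tags contiguous, so "is-a B" becomes a range check and the Base-to-B conversion becomes a subtraction. A sketch, with `FlattenedTags` and the string-keyed hierarchy description as hypothetical stand-ins:

```csharp
using System;
using System.Collections.Generic;

public static class FlattenedTags
{
    // Numbers every class by pre-order traversal and returns the tag range of each subtree.
    // A class's own tag is the First value of its range.
    public static Dictionary<string, (int First, int Last)> Number(
        string root, Dictionary<string, string[]> children)
    {
        var ranges = new Dictionary<string, (int First, int Last)>();
        int next = 0;
        void Visit(string node)
        {
            int first = next++;
            if (children.TryGetValue(node, out var kids))
                foreach (var kid in kids) Visit(kid);
            ranges[node] = (first, next - 1);
        }
        Visit(root);
        return ranges;
    }

    public static bool IsA(int tag, (int First, int Last) range) =>
        tag >= range.First && tag <= range.Last;

    // Base-level tag -> tag relative to the subtree (0 = the subtree's own root class).
    public static int ToSubtreeTag(int tag, (int First, int Last) range) =>
        IsA(tag, range) ? tag - range.First : throw new InvalidCastException("Tag outside subtree.");
}
```

For the flattened hierarchy above, B gets the range 2 to 5 (B, B1, B2, B3), so checking whether an object is a B compares against two constants, and B3 (tag 5) becomes B-relative tag 3.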

Self-referential types

Hastlayer forbids self-referential types. When introducing polymorphism/inheritance, this needs to be taken into account and handled with a relevant error message. This message should probably also include a description of the hierarchy.
Maybe something like:

Self-referential field! Type B contains a field "fieldName" of type A, from which B inherits.

This message should also handle multiple layers of inheritance.
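A sketch of such a check, using .NET reflection as a stand-in for Hastlayer's own type analysis (`SelfReferenceCheck` is hypothetical). It walks every level of the inheritance chain, so inherited fields are caught too, and it reports the relationship in the message format proposed above. It only checks direct fields; cycles through unrelated types would need a graph search.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;

public static class SelfReferenceCheck
{
    const BindingFlags InstanceFields =
        BindingFlags.Instance | BindingFlags.Public | BindingFlags.NonPublic | BindingFlags.DeclaredOnly;

    // Returns an error message for the first instance field (declared or inherited) of `type`
    // whose type is `type` itself, one of its base classes, or one of its subclasses; else null.
    public static string? Check(Type type)
    {
        for (Type? owner = type; owner != null && owner != typeof(object); owner = owner.BaseType)
        {
            foreach (FieldInfo field in owner.GetFields(InstanceFields))
            {
                Type ft = field.FieldType;
                string? relation =
                    ft == type ? "which is the type itself" :
                    type.IsSubclassOf(ft) && ft != typeof(object) ? $"from which {type.Name} inherits" :
                    ft.IsSubclassOf(type) ? $"which inherits from {type.Name}" :
                    null;
                if (relation != null)
                    return $"Self-referential field! Type {type.Name} contains a field \"{field.Name}\" " +
                           $"(declared in {owner.Name}) of type {ft.Name}, {relation}. Hierarchy: {Chain(type)}.";
            }
        }
        return null;
    }

    // Describes the inheritance chain, e.g. "B : A : Base".
    static string Chain(Type type)
    {
        var names = new List<string>();
        for (Type? t = type; t != null && t != typeof(object); t = t.BaseType) names.Add(t.Name);
        return string.Join(" : ", names);
    }
}
```

For example, `System.Exception` is flagged because it stores its inner exception in a field of its own type, while `System.Version` (only `int` fields) passes.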

Zoltán Lehóczky · Answer 2 · Mon Oct 02 2023 06:42:14 GMT+0800 (China Standard Time)

Thank you for your incredibly thorough design evaluation, @FractalFir! You elaborate on a very clever approach.

You're right that PolimorphicAnimal is something that Hastlayer can in principle understand (minus Console, since that's not applicable on the hardware, and exceptions, but those wouldn't be an issue).

I think with the tagged union approach as you describe it we'd indeed solve inheritance, including polymorphism. However, the development effort and the increased FPGA fabric usage (see details below) might be too costly unless we encounter cases where inheritance would be invaluable. Keep in mind that the code that we process with Hastlayer will always be a small part of the overall application since it'll be the part that's the most performance-intensive and highly parallelized. Having limited .NET support there is usually a worthy price to pay for the potential orders of magnitude performance and power efficiency increase.

Some primer on memory with Hastlayer

Note that memory on the hardware works very differently from what's usual in .NET. While we do have RAM on an FPGA board too, for now it can only be accessed via SimpleMemory and is mostly used for managing the input and output values of the hardware-implemented algorithm. However, when you create an object and store it in a variable/field/property, or create an array (or an array of objects), you're not actually allocating in RAM. Rather, you're allocating on the FPGA fabric, at compile time.

E.g., a new Dog() will allocate space as FPGA fabric (including LUTs, or small pieces of distributed RAM) to store everything in that object. If you have an if that instantiates one class in its body and a different class in its else branch, then both will be etched into the FPGA fabric. Then, during runtime, no RAM will be used for them, and thus there's no need for a GC either.

Note that only what's needed to store the data in the objects of a given class is replicated for every instance. Methods (including property bodies) aren't; instead, all executions are channeled into the same hardware implementations (unless the consuming code is multi-threaded, in which case each thread has its own copy for parallel execution).

This also means, though, that if we use a more complex PolimorphicAnimal everywhere we originally used the simpler Snake or Dog, then the increased FPGA fabric usage will be in the ballpark of {fabric(Snake + Dog) * instance count of Snake and Dog}.
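A rough worked version of that estimate, counting only the bits needed to store field values and ignoring LUTs, routing, and control logic. The widths (32 bits for `int` and `float`, 1 bit for `bool`, a 2-bit tag) and the `FabricEstimate` name are assumptions for illustration, not Hastlayer output.

```csharp
public static class FabricEstimate
{
    // Assumed storage widths in bits; actual synthesis results will differ.
    public const int IntBits = 32, FloatBits = 32, BoolBits = 1, TagBits = 2;

    public const int SnakeBits = IntBits /* age */ + IntBits /* length */ + BoolBits /* hasVenom */;
    public const int DogBits = IntBits /* age */ + IntBits /* height */ + FloatBits /* happiness */;

    // Union with no field sharing: tag, the base field once, then every child field.
    public const int UnionBits = TagBits + IntBits + (IntBits + BoolBits) + (IntBits + FloatBits);

    // Storage when every instance keeps its own static type.
    public static int SeparateBits(int snakes, int dogs) => snakes * SnakeBits + dogs * DogBits;

    // Storage when every Snake and Dog instance becomes a union instead.
    public static int UnionTotalBits(int snakes, int dogs) => (snakes + dogs) * UnionBits;
}
```

With 10 snakes and 10 dogs this gives 1,610 bits as separate classes versus 2,620 bits as unions (131 bits each). Sharing same-typed fields, as PolimorphicAnimal does with length_or_height, would bring the union down to 99 bits per instance under the same assumptions.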

How I'd do this (a simple approach without polymorphism)

If we did this in a limited way that would still be useful for Hastlayer, we could do the following. We'd generate two classes (three if Animal weren't abstract, but for now I'll assume it is) by changing the C# AST, so that later processing and VHDL generation treat them the same as if there were two manually created classes like these:

Snake

age
length
hasVenom
Describe() (the Snake implementation)

Dog

age
height
happiness
Describe() (the Dog implementation)

If there are base invocation expressions then there would be an Animal.Describe() implementation too.

This would solve DRY with inheritance, but wouldn't support polymorphism. So, you would be allowed to declare a variable, field, or similar as type Snake or Dog, but not Animal (because we wouldn't be able to handle storing Snakes and Dogs in it).
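A sketch of what the two generated classes listed above could look like if written out as C# (hypothetical output; the real transformation would work on the AST, not on source text). Animal's field and constructor body are copied into each class, and there is no shared base type, which is exactly why a variable of type Animal can't be supported. `Describe()` returns a string here only to keep the sketch readable; strings aren't something the hardware would use.

```csharp
// Hypothetical flattened output: Animal (assumed abstract) is not generated at all.
public sealed class Snake
{
    public int age;          // copied from Animal
    public int length;
    public bool hasVenom;

    public Snake(int age, int length, bool hasVenom)
    {
        this.age = age;      // body of the Animal(int age) constructor, inlined
        this.length = length;
        this.hasVenom = hasVenom;
    }

    public string Describe() => $"This is a {length} cm snake. Does it have venom: {hasVenom}.";
}

public sealed class Dog
{
    public int age;          // copied from Animal
    public int height;
    public float happiness;

    public Dog(int age, int height, float happiness)
    {
        this.age = age;      // body of the Animal(int age) constructor, inlined
        this.height = height;
        this.happiness = happiness;
    }

    public string Describe() => $"Woof. I am a {height} cm dog, and my happiness is {happiness}";
}
```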

Conclusion

All this being said, we haven't yet encountered a use case where inheritance would've been critical or even convenient, so this is perhaps (especially given the complexity) better left unsupported for now, until the need arises.

FractalFir · Answer 3 · Wed Oct 04 2023 05:39:57 GMT+0800 (China Standard Time)

Thank you for clarifying how types are represented on an FPGA. This is roughly how I assumed it worked (which is why I thought overlapping fields could pose an issue), but it's nice to get clarification about the inner workings of the project.

I agree with the overall conclusion. Polymorphism is not something that will be common in high-performance C#, due to its overhead on standard CPUs. The FPGA space that polymorphism costs is unlikely to be worth it in almost all cases. The only use case I can see is making the initial stages of porting some code slightly easier, but it would need to be replaced later anyway.

One of the reasons I took a deeper look into this problem was that it had no solution yet. Since I had a vague idea about a potential one, I wanted to share it, so if the need ever arose, it would be there.

Zoltán Lehóczky · Answer 4 · Wed Oct 04 2023 06:01:59 GMT+0800 (China Standard Time)

And thank you for doing that! If we were to add proper inheritance support, your solution above would certainly be the starting point; it's an excellent idea.