JuliaLang / julia

The Julia Programming Language

Home Page:https://julialang.org/

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

RFC: staged / meta / generated types ?

Jutho opened this issue · comments

Now that we have staged functions (or whatever name that will converge out of the discussion in #7474), can we be more greedy and request something similar to define types, i.e. where the type definition depends on the value of the parameters in a more general way than can currently be expressed?

This would allow the easy implementation of fixed size / stack allocated arrays ( #7568 ), and the construction of @timholy in #8432 to build custom types to be used in the cartesian iterators clearly expresses the need for this. It can also have many other interesting uses. NTuple{N,Int} and Tuple{Int,Int,Float} (in the notation of #8470 ) could be special cases (although I guess there are technical reasons that they need to be implemented separately?).

As usual, your question shows an impressive mastery of all the factors. Here I was wondering if anyone was paying attention, then you go and show you've understood it all down to the letter...

For #8432, I was mulling over how kosher it would be to call eval from inside a "stagedfunction". There are quite a few functions (as well as one type) that need to be instantiated. One option would be to have each one be a separate stagedfunction, and also add stagedtypes. But at least in this case all of them could be triggered through start/eachindex (with stagedfunctions, eachelement can now go away, whew), where the "inner" portion of the stagedfunction also creates all the other functions/types via eval. That's basically the same design as it has now, but the check on implemented would no longer be necessary.

Fixed sized arrays could conceivably do the same thing: make the constructor a stagedfunction, and in the "inner" portion create the type via eval before generating the returned body expression.

Until we implement caching of stagedfunction expressions (and for cases where the created type won't have a unique signature for each call to the stagedfunction), we'll still have to maintain an implemented cache---but at least it will only affect the "inner" portion of the stagedfunction, and won't be used during repetitive operations.

So, bottom line is: I honestly don't yet know whether we need stagedtypes. In the cases I can foresee, it seems there's a good workaround (as long as people don't hate the eval trick).

Correction: won't work for fixed size arrays (duh). The stagedfunction won't have access to the actual sizes desired.

So yeah, we probably need stagedtypes too (if we don't use tuples).

related: #8322

You can declare types that correspond to integers, and thus pass the array size as a type. This would give a staged function access to the array size.

type zero end
type succ{T} end

and then you have zero, succ{zero}, succ{succ{zero}}, etc.

typedint(i::Integer) = i==0 ? zero : succ{typedint(i-1)}

I thought of that. I was thinking the user would have to deal with that ugliness, but actually we could hide it behind a wrapper:

type TypeConst{T} end
FixedArray{T}(::Type{T}, dims...) = _FixedArray(T, map(x->TypeConst{x}, dims))

and then have _FixedArray be a stagedfunction.

On 25 Sep 2014, at 16:26, Tim Holy notifications@github.com wrote:

As usual, your question shows an impressive mastery of all the factors. Here I was wondering if anyone was paying attention, then you go and show you've understood it all down to the letter...

Thanks for the compliment. Understanding is one thing, implementing everything like you’re doing is surely yet another level. Although I also hope to start exploring staged functions soon.

For #8432, I was mulling over how kosher it would be to call eval from inside a "stagedfunction". There are quite a few functions (as well as one type) that need to be instantiated. One option would be to have each one be a separate stagedfunction, and also add stagedtypes. But at least in this case all of them could be triggered through start/eachindex (with stagedfunctions, eachelement can now go away, whew), where the "inner" portion of the stagedfunction also creates all the other functions/types via eval. That's basically the same design as it has now, but the check on implemented would no longer be necessary.

Fixed sized arrays could conceivably do the same thing: make the constructor a stagedfunction, and in the "inner" portion create the type via eval before generating the returned body expression.

Surely the fact that you can mimic the staged type behaviour makes it seem as if it should be not such a drastic change to natively support this feature in language. The advantage would be that the user is no longer confronted with the internal types (like Subscript_3 ) that implements this, and would only see (i.e. Subscript{N} ) as a concrete type/immutable. As long as there is no native support, it would also be good to have #2403 . This way, for something like FixedArray, FixedArray can both be the method that needs to be called to create an object and at the same time the abstract supertype of the dynamically generated concrete types that implement this).
Until we implement caching of stagedfunction expressions (and for cases where the created type won't have a unique signature for each call to the stagedfunction), we'll still have to maintain an implemented cache---but at least it will only affect the "inner" portion of the stagedfunction, and won't be used during repetitive operations.

So, bottom line is: I honestly don't yet know whether we need stagedtypes. In the cases I can foresee, it seems there's a good workaround (as long as people don't hate the eval trick).

In summary, the advantage of having it is that one would not need to expose the internal implementations to the user. =

I agree it would be better to have them.

Right now I'm just a little too tuckered out to implement them myself, so if there's a short-term cheat available, I'll probably use it :).

I think type families are the Haskell equivalent: https://www.haskell.org/haskellwiki/GHC/Type_families

I've been considering how awesome @Generated types would be for a while. Today I managed to write a package for Julia 0.4 that works pretty well (i.e. much better than I expected) with no extra language support: GeneratedTypes.jl.

After I saw NamedTuples, it seems pretty easy to register types and constructors in macros and generated functions. The only issue is that in Julia 0.5, eval is forbidden from @generated functions... not sure what that is about (and is it the same for pure functions?). Appart from that, I thought this might be a good way forward.

Since #49187 was closed as a duplicate, I'll put this here:


Here are a few example things this could be used for:

Simpler SArray, and MArray which supports non-isbits types

Click me

Implementing these would be pretty much trivial if you had a @generated struct. Something like

@generated struct SArray{Size, T, N, IsMutable} <: AbstractArray{T, N}
    fields = map(1:prod(Size)) do i
        name = Symbol(:_, i)
        :($name :: $T)
    end
    Expr(:block, Expr(:mutable, IsMutable),  fields...)
end

This would make it so that if someone wrote say

SArray{(2, 2), String, 2, true}

that would represent something like

mutable struct SArray_2_2_String_2_true
    _1 :: String
    _2 :: String
    _3 :: String
    _4 :: String
end

And unlike the current implementation of MArray, this would support setfield! on the actual fields we care about letting us easily implement setindex!.

Even more powerfully, SArray could use this to decide that if say you gave it SArray{1000000000, Float64, 1, false}, that is way way too big to profitably store as an inline struct and instead heap allocate a vector or something to use as its storage.

Having a general way to melt, mutate, and then freeze an immutable struct

Click me
@generated struct Melted{T}
    fields = map(1:fieldcount(T)) do i
        name = fieldname(T, i)
        type = fieldtype(T, i)
        :($fieldname :: $fieldtype)
    end
    constructor = quote
        Melted(x::T) where {T} = new{T}($((:(getfield(x, $i) for i in 1:fieldcount(T))...))
    end
    Expr(:block, Expr(:ismutable, true), fields...,  constructor)
end

so that one could e.g. write

Melted{Complex{Float64}}

and get a struct equivalent to

mutable struct Melted_Complex_Float64
    re::Float64
    im::Float64
    MeltedComplexFloat64(x::ComplexFloat64) = new(getfield(x,1), getfield(x, 2))
end

People can then freely do things like e.g.

let m = Melted(big(1) // big(2))
    m.num = big(2)
    m.den = big(3)
    Rational(m)
end 

Note that this way we do not bypass the inner constructor of Rational.

Packages like Accessors.jl would not become unnecessary, but instead would have their scope reduced to intelligently
dealing with the properties and constructors of a type, and this would become an additional tool in their toolkit.

Forbidding specific values from a type signature

Click me

This is maybe too frivilous a use for what would likely be quite heavy machinery, but currently we have no way to put restrictions on the values in a type signature, and we have to rely on inner constructors to reject them. That is, I can write things like

Array{Float64, -1000.1}

and this is a perfectly valid type, it just will be rejected by all of its inner constructors. Having @generated structs though could allow one to forbid people from even representing an invalid value in a type signature just like how we currently can reject invalid types in a signature:

julia> struct Blah{T <: Integer} end
           

julia> Blah{String}
ERROR: TypeError: in Blah, in T, expected T<:Integer, got Type{String}

Compactified structs

Click me

Take for example Unityper.jl which takes in an expression like

@compactify begin
    @abstract struct Foo
        common_field::Int = 1
    end
    struct a <: Foo
        a::Bool = true
        b::String = "hi"
    end
    struct b <: Foo
        a::Int = 1
        b::Complex = 1 + im
    end
end;

and then compactifies these structs into one concrete struct with a minimal memory layout:

julia> dump(a())
Foo
  common_field: Int64 1
  ###a###2: Int64 1
  ###Any###3: String "hi"
  ###tag###4: var"###Foo###1" ₋₃₋₁₂₉Foo₋__a₋₃₋₁₉₉₂₋₋

julia> dump(b())
Foo
  common_field: Int64 1
  ###a###2: Int64 1
  ###Any###3: Complex{Int64}
    re: Int64 1
    im: Int64 1
  ###tag###4: var"###Foo###1" ₋₃₋₁₂₉Foo₋__b₋₃₋₁₉₉₂₋₋

julia> fieldtypes(Foo)
(Int64, Int64, Any, var"###Foo###1")

This works somewhat well, but cannot work currently if we wanted Foo to be a parametric type with parametric fields. With a @generated struct, we could generate a compactified layout precisely tailored to a set of parameters.


Just like @generated functions, this would be pretty heavy duty stuff that regular users shouldn't be doing, but I think it'd
allow authors of serious packages to do a lot of things that currently aren't possible (and in some cases, stop them from doing some worrying pointer shenanigans that doesn't generalize well), so I think having this feature should be an eventual goal.

JuliaDiff/ChainRulesCore.jl#626 reminded me of this and inspired a question which I haven't found an answer to online: what would be the technical challenges for implementing a mutable version of NamedTuple? Something that can be allocated in one go (unlike a NamedTuple of Refs) and doesn't incur any reconstruction overhead (unlike a mutable struct wrapping a NamedTuple, as shown in #49187 (comment)).

mutable version of NamedTuple

Or, maybe a mutable version of Tuple? (We can create named tuples from tuples by overloading getproperty, and in this case setproperty!).

I intentionally left out Tuple in case the difference in type variance posed an issue, but if it doesn't than absolutely.

Variance absolutely is important for subtyping. But AFAICT we can still treat things like Tuple{Union{Int, Nothing}} as if it were a concrete type of some value (or at least rely on it being a well optimized type of a variable or “slot”) - and if it were mutable then you could update the 1 field with either an integer or nothing. But given the way inference works you’d need to define that type in advance, like you do a vector, so covariance wouldn’t be particularly “useful”. Speculatively, if such a thing were added to the language, it may be easiest to make it invariant…