cognitect-labs / transducers-js

Transducers for JavaScript

Official spec for transformer protocol

tgriesser opened this issue

@swannodette if you recall, this was something I had asked about and while I don't recall the exact reason for not having a transformer protocol implemented in transducers-js, I do remember there was a reason.

I've been thinking about it a bit lately, and I do think it's worthwhile to be able to define one, particularly in JavaScript, where it's much simpler to bake this into the prototype than to define a lookup map of handlers as in transit-js.

As more libraries in JavaScript begin implementing this protocol (this was prompted by @kevinbeaty's excellent transducer PR on the ramda.js project), I wanted to see if there could be an agreed upon spec for the transformer before things get too far along.

This is the implementation kicked off by @jlongster

var t = require('./transducers');
Immutable.Vector.prototype[t.protocols.transformer] = {
  init: function() {
    return Immutable.Vector().asMutable();
  },
  result: function(vec) {
    return vec.asImmutable();
  },
  step: function(vec, x) {
    return vec.push(x);
  }
};

For starters - the use of a Symbol('transformer') is problematic as Symbol() creates a unique value (Symbol('transformer') !== Symbol('transformer')), and so you lose any interop when defined independently in multiple libraries.

I'd propose all transformer protocols be implemented as a @@transformer string (similar to @@iterator) unless and until the transformer is officially recognized in the well-known symbols list.
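To make the interop problem concrete, here is a minimal sketch. The two "libraries" and the `collection` object are hypothetical stand-ins, not code from any of the projects mentioned.

```javascript
// Two hypothetical libraries, each defining the protocol key on its own.
var libASymbol = Symbol('transformer');
var libBSymbol = Symbol('transformer');

// A data structure registers its transformer under library A's symbol.
var collection = {};
collection[libASymbol] = {
  step: function (acc, x) { acc.push(x); return acc; }
};

// Library B cannot find it: every Symbol() call yields a distinct value.
var symbolsMatch = libASymbol === libBSymbol;               // false
var foundViaSymbol = collection[libBSymbol] !== undefined;  // false

// With an agreed-upon string key, both libraries look up the same property.
var libAKey = '@@transformer';
var libBKey = '@@transformer';
collection[libAKey] = collection[libASymbol];
var foundViaString = collection[libBKey] !== undefined;     // true
```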

I'd also propose that the spec behave similarly to an iterator, in that it is a function which returns the transformer, rather than just an object as proposed above:

Immutable.Vector.prototype['@@transformer'] = function() {
  return {
    init: function() {
      return new Immutable.Vector().asMutable();
    },
    result: function(vec) {
      return vec.asImmutable();
    },
    step: function(vec, x) {
      return vec.push(x);
    }
  };
}

This makes the implementation more useful, as it can refer to the current object in init to know which kind of value to create, should the object be subclassed:

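SomeObject.prototype['@@transformer'] = function() {
  // Capture the instance so init can build an object of the same (sub)class.
  var obj = this;
  return {
    init: function() {
      return new obj.constructor();
    },
    // Each input is a [key, value] pair.
    step: function(result, arr) {
      return result.set(arr[0], arr[1]);
    },
    result: function(result) {
      return result;
    }
  };
};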
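A small sketch of why the function form helps with subclassing. `KeyValueStore` and `AuditedStore` are hypothetical classes standing in for `SomeObject` and a subclass of it; only the `'@@transformer'` shape comes from the proposal above.

```javascript
// Hypothetical base class with a chainable set(), standing in for SomeObject.
function KeyValueStore() {
  this.entries = {};
}
KeyValueStore.prototype.set = function (key, value) {
  this.entries[key] = value;
  return this;
};

// Defined once on the base prototype, as in the proposal.
KeyValueStore.prototype['@@transformer'] = function () {
  var obj = this;
  return {
    init: function () { return new obj.constructor(); },
    step: function (result, pair) { return result.set(pair[0], pair[1]); },
    result: function (result) { return result; }
  };
};

// Hypothetical subclass; it inherits the transformer without redefining it.
function AuditedStore() {
  KeyValueStore.call(this);
}
AuditedStore.prototype = Object.create(KeyValueStore.prototype);
AuditedStore.prototype.constructor = AuditedStore;

// Drive the transformer by hand over some [key, value] pairs.
var xf = new AuditedStore()['@@transformer']();
var acc = xf.init();
[['a', 1], ['b', 2]].forEach(function (pair) {
  acc = xf.step(acc, pair);
});
var out = xf.result(acc);
```

Because `init` reads `obj.constructor`, `out` is an `AuditedStore` rather than a plain `KeyValueStore`.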

> For starters - the use of a Symbol('transformer') is problematic as Symbol() creates a unique value (Symbol('transformer') !== Symbol('transformer')), and so you lose any interop when defined independently in multiple libraries.

This is a very good point. I'm going to open a new PR on Ramda to remove the Symbol until this is worked out. It's not being exposed, nor documented, nor otherwise used internally other than to initialize _into.

@tgriesser @kevinbeaty

Seems to me the logic should be more like the following:

var TRANSFORMER = null;

if(typeof Symbol != "undefined") {
    TRANSFORMER = Symbol.for("cognitect/transformer");
} else {
    TRANSFORMER = "@@cognitect/transformer";
}

Globally stealing names seems like a really bad idea.

Aren't we looking for something which behaves similarly to __transducers_reduced__? - but in this case a standard duck-type for defining an object as a transformable which works between libraries?

I agree that stealing top-level names via Symbol.for wouldn't be ideal

@tgriesser I'm suggesting that Symbols are the right path forward, using Symbol.for is fine as long as you namespace.

var ITransducer = {
    reduced: Symbol.for("cognitect/reduced"),
    transformer: Symbol.for("cognitect/transformer"),
    /* ... */
};

This seems like a good approach to me. Perhaps this is what @jlongster already did?

I thought I had done that but I know I reworked the protocol in v2 of my lib so I must have dropped that: https://github.com/jlongster/transducers.js/blob/master/transducers.js#L8

Symbol.for is the right approach, but since it is supposed to be library-agnostic maybe transduce/transformer or something like that? I agree about namespacing. Symbols were supposed to get around namespacing but are only good for library-specific approaches... Here it doesn't really matter whether we use a symbol or not, imho

I would suggest maybe we could have a separate library that defines these symbols that we all use, but that requires an equality check. And because npm loves to duplicate libraries, I don't think that will really work. Bah.

@jlongster Symbol.for always returns the same thing. Dupes would only be across JS Contexts but I've always considered that a lost cause honestly. I suspect that Symbols will consistently compare faster than Strings in the future, that's the primary reason to prefer them.
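A quick sketch of the behavior being described, using only the standard `Symbol.for` and `Symbol.keyFor` functions. The key string is the namespaced one suggested earlier in the thread.

```javascript
// Symbol.for consults a registry shared across the whole realm, so two
// independent lookups with the same key return the very same symbol.
var fromLibraryA = Symbol.for('cognitect/transformer');
var fromLibraryB = Symbol.for('cognitect/transformer');
var sameSymbol = fromLibraryA === fromLibraryB;   // true

// Registered symbols can be mapped back to their key.
var registryKey = Symbol.keyFor(fromLibraryA);    // 'cognitect/transformer'

// A plain Symbol() with the same description is not in the registry.
var localSymbol = Symbol('cognitect/transformer');
var localKey = Symbol.keyFor(localSymbol);        // undefined
var localMatches = localSymbol === fromLibraryA;  // false
```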

> Symbol.for is the right approach, but since it is supposed to be library-agnostic

Yeah, this is the main thing I'm looking for, just a way of making it library agnostic...

Then the second concern was whether the value is a function which defines/returns the transformer (which makes things a bit more flexible and is similar to an iterator), or just a plain object.

Libraries could do the following:

var stringProp = "@@cognitect/transformer",
    symProp = typeof Symbol != "undefined" ? Symbol.for("cognitect/transformer") : stringProp;

x[stringProp] = x[symProp] = function() {
   /* ... */
};
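On the consuming side, a library could then look under either key. This is only a sketch: `getTransformer` and `legacyCollection` are hypothetical names, and the factory-function shape is the one proposed earlier in the thread, not a settled API.

```javascript
var stringProp = '@@cognitect/transformer';
var symProp = typeof Symbol !== 'undefined'
  ? Symbol.for('cognitect/transformer')
  : stringProp;

// Hypothetical lookup helper: prefer the symbol key, fall back to the string.
function getTransformer(x) {
  var factory = x[symProp] || x[stringProp];
  return typeof factory === 'function' ? factory.call(x) : undefined;
}

// A collection that only defines the string key is still discoverable.
var legacyCollection = {};
legacyCollection[stringProp] = function () {
  return {
    init: function () { return []; },
    step: function (acc, x) { acc.push(x); return acc; },
    result: function (acc) { return acc; }
  };
};

var found = getTransformer(legacyCollection);
var missing = getTransformer({});
```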

> @jlongster Symbol.for always returns the same thing. Dupes would only be across JS Contexts but I've always considered that a lost cause honestly. I suspect that Symbols will consistently compare faster than Strings in the future, that's the primary reason to prefer them.

Ah I see. Semantically I didn't see any benefits but that's true.

> Then the second concern was whether the value is a function which defines/returns the transformer (which makes things a bit more flexible and is similar to an iterator), or just a plain object.

Is using obj.constructor like that supported everywhere? You usually want to handle data structures in a specific way, like how the immutable vector returns a mutable version for performance.

Also I'm personally not against transducer/transformer etc. as long as people don't mind that we're taking that top level name.

> Is using obj.constructor like that supported everywhere?

Afaik, yes.

> You usually want to handle data structures in a specific way, like how the immutable vector returns a mutable version for performance.

Agreed, the benefit here is that if you're eventually able to subclass Vector, you would only need to define the transformer once on the top-level Vector object and it'd be inherited properly (and init would create an instance of the subclass rather than a generic Vector).

Here's an idea: we break apart the object into three functions on the prototype:

SomeObject.prototype['@@transducer/init'] = function() {
  return new this.constructor();
};

SomeObject.prototype['@@transducer/step'] = function(result, arr) {
  return result.set(arr[0], arr[1]);
};

SomeObject.prototype['@@transducer/result'] = function(obj) {
  return obj;
};

That is, if we are going to namespace the symbols anyway. If that init function works (I can't remember if this.constructor works like that), it could be the default one, making init optional.

Also, with the above I think 95% of data structures would just need to implement the step function; the default init and result ones should be fine.
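A minimal sketch of how a library might consume the three-method form with defaults. The `Bag` class and the `into` helper are hypothetical, and the default `init` and `result` are one plausible reading of the suggestion above, not an agreed implementation.

```javascript
var INIT = '@@transducer/init';
var STEP = '@@transducer/step';
var RESULT = '@@transducer/result';

// Hypothetical collection that implements only the required step method.
function Bag() {
  this.items = [];
}
Bag.prototype[STEP] = function (bag, x) {
  bag.items.push(x);
  return bag;
};

// Hypothetical helper: pours `values` into a fresh collection like `target`.
function into(target, values) {
  // Assumed defaults: a new instance of the same class, and identity.
  var init = target[INIT] || function () { return new this.constructor(); };
  var result = target[RESULT] || function (acc) { return acc; };

  var acc = init.call(target);
  for (var i = 0; i < values.length; i++) {
    acc = target[STEP](acc, values[i]);
  }
  return result.call(target, acc);
}

var source = new Bag();
var filled = into(source, [1, 2, 3]);
```

Here `filled` is a new `Bag` holding the three values, and `source` is left untouched.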

> Here's an idea: we break apart the object into three functions on the prototype:

That approach sounds great to me!

> people don't mind that we're taking that top level name.

I wouldn't mind - though if performance is what we're after, I don't believe the latest benchmarks are leaning in favor of symbols (for now).

@jlongster I'm assuming we also want "@@transducer/reduced" no?

reduced doesn't need to be part of a transducer, does it? I've lost context for transducers a bit, so you'll have to remind me why that's not just a library function that uses the agreed upon __transducers_reduced__ property or whatever that was.

@jlongster well transducers need to detect it; whether they use a library function to do it shouldn't matter. Just saying that if we're defining the required properties it seems they should all follow the same naming convention. Speaking of which, it seems to me we probably need to supply the name to get the reduced value, "@@transducer/value". The benefit is again we're not stealing value from programmers and there's a data-oriented way to deal with this stuff (no API).

@swannodette This protocol is for implementing the "bottom" transducer in a data structure, though, so that it's "transducable". Detecting if something is reduced isn't data structure-specific, right? I guess I have a hard time seeing why this (https://github.com/jlongster/transducers.js/blob/master/transducers.js#L111) belongs here

Edit: oops messed up line numbers

@jlongster but that's obviously broken right if not handled carefully? If you wrap your value in your own Reduced how can a different transducer from a different library detect it and extract the value? Sorry for not being clear, I think the protocol needs to be more general than the bottom transducer otherwise we're not all going to be able to compose the way we want.

@swannodette That's what the __transducers_reduced__ property is for, which makes it compatible across libraries. Wait, are you just saying we should rename that to follow this convention? Not to actually attach it to SomeObject.prototype, but that our library's isReduced should check @@transducer/reduced? Yeah we can rename that, definitely.

@jlongster yep just saying we should rename to make this whole thing uniform. So the protocol is not just about the transformers, but how to deal with reduced values as well. Then you can really mix and match libraries.

Absolutely! Sorry for the confusion. So we're in agreement?

  • Data structures can implement 3 methods: @@transducer/step, @@transducer/init, and @@transducer/result. Only step is required.
  • A "reduced" object is indicated by a @@transducer/reduced property that equals true. If this is true, the value can be retrieved from the value property of the same object.

@jlongster I think the value should be stored in @@transducer/value, the truth is that people duck type in the craziest ways you'd never expect. Putting the value in a key unlikely to clash just avoids all of this.

@swannodette that's true. sounds good to me! those are all just internal details anyway.
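As a rough sketch of the reduced convention just agreed on: the helper names `reduced` and `isReduced` and the array-only `reduce` loop are illustrative assumptions, while the two property names are the ones from the discussion.

```javascript
var REDUCED = '@@transducer/reduced';
var VALUE = '@@transducer/value';

// Wrap a value to signal that reduction should stop early.
function reduced(value) {
  var box = {};
  box[REDUCED] = true;
  box[VALUE] = value;
  return box;
}

// Duck-type check that any library could perform on any other's wrapper.
function isReduced(x) {
  return x != null && x[REDUCED] === true;
}

// Simplified array-only reduce that honors the early-termination marker.
function reduce(step, init, values) {
  var acc = init;
  for (var i = 0; i < values.length; i++) {
    acc = step(acc, values[i]);
    if (isReduced(acc)) {
      return acc[VALUE];
    }
  }
  return acc;
}

// Sum until the running total reaches 6, counting how many inputs were seen.
var visited = 0;
var sum = reduce(function (acc, x) {
  visited++;
  var next = acc + x;
  return next >= 6 ? reduced(next) : next;
}, 0, [1, 2, 3, 4, 5]);
```

The loop stops after the third input, so `sum` is 6 and `visited` is 3.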

Is my understanding correct that the agreed protocol is:

Symbol.for('transducer/init')
Symbol.for('transducer/step')
Symbol.for('transducer/result')
Symbol.for('transducer/reduced')
Symbol.for('transducer/value')

BTW is there a need for transducer/value? Isn't transducer/reduced enough? So if the result has that field it's reduced, and the actual value is stored under that field?

If broken apart, can we say that x is a transformer if it has @@transducer/step? Or do we need to check for all 3?

@Gozala we already went through that idea before, we don't want hasOwnProperty checks because the value may very well be something that is JavaScript false-y.
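A small sketch of the falsy-value concern. Both object shapes are illustrative: the first is the single-key variant asked about above, the second is the two-key form from the agreed list.

```javascript
// Single-key variant: the reduced value itself lives under the marker key.
var singleKey = { '@@transducer/reduced': 0 };

// A truthiness check misses it, because the reduced value 0 is falsy.
var detectedByTruthiness = Boolean(singleKey['@@transducer/reduced']); // false

// Two-key variant: a boolean marker plus a separate value slot.
var twoKeys = { '@@transducer/reduced': true, '@@transducer/value': 0 };

var detected = twoKeys['@@transducer/reduced'] === true; // true
var value = twoKeys['@@transducer/value'];               // 0
```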

@kedashoe yes just checking for @@transducer/step should be enough.
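Such a check could be as small as the following sketch. `isTransformer` is a hypothetical helper name, and the string key is used here for simplicity.

```javascript
// Hypothetical duck-type check: an object counts as a transformer
// if it exposes a function under the step key.
function isTransformer(x) {
  return x != null && typeof x['@@transducer/step'] === 'function';
}

var stepOnly = {
  '@@transducer/step': function (acc, x) { acc.push(x); return acc; }
};

var stepOnlyIsTransformer = isTransformer(stepOnly); // true
var arrayIsTransformer = isTransformer([]);          // false
var nullIsTransformer = isTransformer(null);         // false
```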

This all sounds great.

While we're at it, what do you think of using @@transducer/reduce to mark a dispatch method of type function(reducingFunction, init) for use similar to IReduceInit in Clojure? (Useful with eduction, etc.)

> This all sounds great.
>
> While we're at it, what do you think of using @@transducer/reduce to mark a dispatch method of type function(reducingFunction, init) for use similar to IReduceInit in Clojure? (Useful with eduction, etc.)

I would like that, but would rather use a different function signature, (structure, reducer, init) => state, which is what I actually use already:
https://github.com/Gozala/transducers/blob/1f0efdfef18fe253d1c9b3759967401d431a23d1/src/transducers.js#L277-L311

I would also like to propose that target[Symbol.for("transducer/reduce")] was invoked as follows:

target[Symbol.for("transducer/reduce")](step, initial, collection)

Note that step here has a (result, input) => result signature, since, unless I'm mistaken, reduce needs to do neither init nor result; both are done by transduce.

I was thinking about it a little differently. The idea is that the reducible object does not need to be concerned with transducers or transformers. It would be defined similar to Array.prototype.reduce. You could do something like this:

function reduce(xf, init, coll){
  if(isArray(coll)){
    return arrayReduce(xf, init, coll)
  }
  if(isFunction(coll[Symbol.for('transducer/reduce')])){
    return methodReduce(xf, init, coll)
  }
  // whatever else
  return iteratorReduce(xf, init, coll) 
}

function methodReduce(xf, init, coll){
  var result = coll[Symbol.for('transducer/reduce')](function(result, value){
      return xf.step(result, value)
    }, init)
  return xf.result(result)
}

... And the reducible object would just have to be concerned with implementing a contract similar to Array.prototype.reduce:

function eduction(t, coll) {
  return new Eduction(t, coll)
}
function Eduction(t, coll){
  this.t = t
  this.coll = coll
}
Eduction.prototype[Symbol.iterator] = function(){
  return sequence(this.t, this.coll)[Symbol.iterator]()
}
Eduction.prototype[Symbol.for('transducer/reduce')] = function(rf, init){
  return transduce(this.t, rf, init, this.coll)
}
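The snippets above lean on helpers (`sequence`, `transduce`, `arrayReduce`, and so on) that are not shown. A self-contained sketch of the same dispatch idea follows, with a hypothetical `Range` reducible and a simplified `reduce`. The `xf.step` and `xf.result` property names mirror the snippet above rather than the `@@transducer/...` keys.

```javascript
var REDUCE = Symbol.for('transducer/reduce');

// Hypothetical reducible: integers from start (inclusive) to end (exclusive).
// It knows nothing about transformers, only a contract similar to
// Array.prototype.reduce.
function Range(start, end) {
  this.start = start;
  this.end = end;
}
Range.prototype[REDUCE] = function (rf, init) {
  var acc = init;
  for (var i = this.start; i < this.end; i++) {
    acc = rf(acc, i);
  }
  return acc;
};

// Simplified dispatcher: arrays first, then the protocol method.
function reduce(xf, init, coll) {
  var rf = function (acc, x) { return xf.step(acc, x); };
  if (Array.isArray(coll)) {
    return xf.result(coll.reduce(rf, init));
  }
  if (typeof coll[REDUCE] === 'function') {
    return xf.result(coll[REDUCE](rf, init));
  }
  throw new TypeError('Collection is not reducible');
}

// A trivial transformer that sums its inputs.
var summing = {
  step: function (acc, x) { return acc + x; },
  result: function (acc) { return acc; }
};

var rangeTotal = reduce(summing, 0, new Range(1, 5)); // 1 + 2 + 3 + 4
var arrayTotal = reduce(summing, 0, [1, 2, 3, 4]);
```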

@kevinbeaty going to leave transducer/reduce alone for now as I think this requires some more thought. The above minimal protocol is enough to get everyone on the same page about the fundamentals.

Implemented and released. Thanks everyone!

Same here! My project conforms.