Official spec for transformer protocol

Question

Official spec for transformer protocol

tgriesser opened this issue 9 years ago · comments

@swannodette if you recall, this was something I had asked about and while I don't recall the exact reason for not having a transformer protocol implemented in transducers-js, I do remember there was a reason.

I've been thinking about it a bit lately, and I do think it's worthwhile to be able to define, particularly for JavaScript where it's much simpler to bake this into the prototype rather than defining as a lookup map of handlers as in transit-js.

As more libraries in JavaScript begin implementing this protocol (this was prompted by @kevinbeaty's excellent transducer PR on the ramda.js project), I wanted to see if there could be an agreed upon spec for the transformer before things get too far along.

This is the implementation kicked off by @jlongster

var t = require('./transducers');
Immutable.Vector.prototype[t.protocols.transformer] = {
  init: function() {
    return Immutable.Vector().asMutable();
  },
  result: function(vec) {
    return vec.asImmutable();
  },
  step: function(vec, x) {
    return vec.push(x);
  }
};

For starters - the use of a Symbol('transformer') is problematic as Symbol() creates a unique value (Symbol('transformer') !== Symbol('transformer')), and so you lose any interop when defined independently in multiple libraries.

I'd propose all transformer protocols be implemented as a @@transformer string (similar to @@iterator) until if/when the transformer is officially recognized in the well known symbols list.

I'd also propose that the spec behave similarly to an iterator, in that it is a function which returns the transformer, rather than as just an object proposed above:

Immutable.Vector.prototype['@@transformer'] = function() {
  return {
    init: function() {
      return new Immutable.Vector().asMutable();
    },
    result: function(vec) {
      return vec.asImmutable();
    },
    step: function(vec, x) {
      return vec.push(x);
    }
  };
}

This makes the implementation more useful, as it can refer to the current object in the init to know what value makes to init, should the object be subclassed:

SomeObject.prototype['@@transformer'] = function() {
  var obj = this;
  return {
    init: function() {
      return new obj.constructor()
    },
    step: function(result, arr) {
      return result.set(arr[0], arr[1]);
    },
    result: function(obj) {
      return obj;
    }    
  };
}

Kevin Beaty · Answer 1 · Thu Mar 12 2015 20:13:27 GMT+0800 (China Standard Time)

For starters - the use of a Symbol('transformer') is problematic as Symbol() creates a unique value (Symbol('transformer') !== Symbol('transformer')), and so you lose any interop when defined independently in multiple libraries.

This is a very good point. I'm going to open a new PR on Ramda to remove the Symbol until this is worked out. It's not being exposed, nor documented, nor otherwise used internally other than to initialize _into.

David Nolen · Answer 2 · Fri Mar 20 2015 03:37:18 GMT+0800 (China Standard Time)

@tgriesser @kevinbeaty

Seems to me the logic should be more like the following:

var TRANSFORMER = null;

if(typeof Symbol != "undefined") {
    TRANSFORMER = Symbol.for("cognitect/transformer");
} else {
    TRANSFORMER = "@@cognitect/transformer";
}

Globally stealing names seems like a really bad idea.

Tim Griesser · Answer 3 · Fri Mar 20 2015 03:43:52 GMT+0800 (China Standard Time)

Aren't we looking for something which behaves similar to __transducers_reduced__? - but in this case a standard duck-type for defining an object as a transformable which works between libraries?

I agree that stealing top-level names via Symbol.for wouldn't be ideal

David Nolen · Answer 4 · Fri Mar 20 2015 03:47:11 GMT+0800 (China Standard Time)

@tgriesser I'm suggesting that Symbols are the right path forward, using Symbol.for is fine as long as you namespace.

var ITransducer = {
    reduced: Symbol.for("cognitect/reduced"),
    transformer: Symbol.for("cognitect/transformer"),
    /* ... */
};

This seems like a good approach to me. Perhaps this is what @jlongster already did?

James Long · Answer 5 · Fri Mar 20 2015 03:49:27 GMT+0800 (China Standard Time)

I thought I had done that but I know I reworked the protocol in v2 of my lib so I must have dropped that: https://github.com/jlongster/transducers.js/blob/master/transducers.js#L8

Symbol.for is the right approach, but since it is supposed to be library-agnostic maybe transduce/transformer or something like that? I agree about namespacing. Symbols were supposed to get around namespacing but are only good for library-specific approaches... Here it doesn't really matter whether we use a symbol or not, imho

James Long · Answer 6 · Fri Mar 20 2015 03:50:35 GMT+0800 (China Standard Time)

I would suggest maybe we could have a separate library that defines these symbols that we all use, but that requires an equality check. And because npm loves to duplicate libraries, I don't think that will really work. Bah.

David Nolen · Answer 7 · Fri Mar 20 2015 03:54:25 GMT+0800 (China Standard Time)

@jlongster Symbol.for always returns the same thing. Dupes would only be across JS Contexts but I've always considered that a lost cause honestly. I suspect that Symbols will consistently compare faster than Strings in the future, that's the primary reason to prefer them.

Tim Griesser · Answer 8 · Fri Mar 20 2015 03:54:32 GMT+0800 (China Standard Time)

Symbol.for is the right approach, but since it is supposed to be library-agnostic

Yeah, this is the main thing I'm looking for, just a way of making it library agnostic...

Then the second concern was whether the value is a function which defines/returns the transformer (which makes things a bit more flexible and is similar to an iterator), or just a plain object.

David Nolen · Answer 9 · Fri Mar 20 2015 03:58:51 GMT+0800 (China Standard Time)

Libraries could do the following:

var stringProp = "@@cognitect/transformer",
    symProp = typeof Symbol != "undefined" ? Symbol.for("cognitect/transformer") : stringProp;

x[stringProp] = x[symProp] = function() {
   /* ... */
};

James Long · Answer 10 · Fri Mar 20 2015 03:59:54 GMT+0800 (China Standard Time)

@jlongster Symbol.for always returns the same thing. Dupes would only be across JS Contexts but I've always considered that a lost cause honestly. I suspect that Symbols will consistently compare faster than Strings in the future, that's the primary reason to prefer them.

Ah I see. Semantically I didn't see any benefits but that's true.

Then the second concern was whether the value is a function which defines/returns the transformer (which makes things a bit more flexible and is similar to an iterator), or just a plain object.

Is using obj.constructor like that supported everywhere? You usually want to handle data structures in a specific way, like with the immutable vector returns a mutable version for performance.

David Nolen · Answer 11 · Fri Mar 20 2015 04:02:16 GMT+0800 (China Standard Time)

Also I'm personally not against transducer/transformer etc. as long as people don't mind that we're taking that top level name.

Tim Griesser · Answer 12 · Fri Mar 20 2015 04:06:09 GMT+0800 (China Standard Time)

Is using obj.constructor like that supported everywhere?

Afaik, yes.

You usually want to handle data structures in a specific way, like with the immutable vector returns a mutable version for performance.

Agreed, the benefit here is that if you're eventually able to subclass vector, you would only need to define the transformer once at the top-level Vector object and it'd be inherited properly (and create one of that instance when init rather than a generic Vector)

James Long · Answer 13 · Fri Mar 20 2015 04:13:52 GMT+0800 (China Standard Time)

Here's an idea: we break apart the object into three functions on the prototype:

SomeObject.prototype['@@transducer/init'] = function() {
  return new this.constructor();
};

SomeObject.prototype['@@transducer/step'] = function(result, arr) {
  return result.set(arr[0], arr[1]);
};

SomeObject.prototype['@@transducer/result'] = function(obj) {
  return obj;
};

If we are going to namespace the symbols anyway. If that init function works (I can't remember if this.constructor works like that), it could be the default one and init is optional.

James Long · Answer 14 · Fri Mar 20 2015 04:16:10 GMT+0800 (China Standard Time)

Also with the above I think 95% of data structures would just need to implement the step function the default init and result ones should be fine.

Tim Griesser · Answer 15 · Fri Mar 20 2015 04:16:30 GMT+0800 (China Standard Time)

Here's an idea: we break apart the object into three functions on the prototype:

That approach sounds great to me!

people don't mind that we're taking that top level name.

I wouldn't mind - though if performance is what we're after, I don't believe the latest benchmarks are leaning in favor of symbols (for now).

David Nolen · Answer 16 · Fri Mar 20 2015 04:18:26 GMT+0800 (China Standard Time)

@jlongster I'm assuming we also want "@@transducer/reduced" no?

James Long · Answer 17 · Fri Mar 20 2015 04:21:33 GMT+0800 (China Standard Time)

reduced doesn't need to be part of a transducer, does it? I've lost context for transducers a bit, so you'll have to remind me why that's not just a library function that uses the agreed upon __transducers_reduced__ property or whatever that was.

David Nolen · Answer 18 · Fri Mar 20 2015 04:23:57 GMT+0800 (China Standard Time)

@jlongster well transducers need to detect it, wether they use a library function to do it shouldn't matter. Just saying that if we're defining the required properties it seems they should all follow the same naming convention. Speaking of which it seems to me we probably need to supply the name to get the reduced value, "@@transducers/value". The benefit is again we're not stealing value from programmers and there's a data oriented way to deal with this stuff (no API).

James Long · Answer 19 · Fri Mar 20 2015 04:29:17 GMT+0800 (China Standard Time)

@swannodette This protocol is for implementing the "bottom" transducer though in a data structure so that's it's "transducable". Detecting if something is reduced isn't data structure-specific, right? I guess I have a hard time seeing why this (https://github.com/jlongster/transducers.js/blob/master/transducers.js#L111) belongs here

Edit: oops messed up line numbers

David Nolen · Answer 20 · Fri Mar 20 2015 04:30:59 GMT+0800 (China Standard Time)

@jlongster but that's obviously broken right if not handled carefully? If you wrap your value in your own Reduced how can a different transducer from a different library detect it and extract the value? Sorry for not being clear, I think the protocol needs to be more general than the bottom transducer otherwise we're not all going to be able to compose the way we want.

James Long · Answer 21 · Fri Mar 20 2015 04:33:22 GMT+0800 (China Standard Time)

@swannodette That's what the __transducers_reduced__ property is for, which makes it compatible across libraries. Wait, are you just saying we should rename that to follow this convection? Not to actually attach it to SomeObject.prototype, but that our library's isReduced should check @@transducer/reduced? Yeah we can rename that, definitely.

David Nolen · Answer 22 · Fri Mar 20 2015 04:33:47 GMT+0800 (China Standard Time)

@jlongster yep just saying we should rename to make this whole thing uniform. So the protocol is not just about the transformers, but how to deal with reduced values as well. Then you can really mix and match libraries.

James Long · Answer 23 · Fri Mar 20 2015 04:37:10 GMT+0800 (China Standard Time)

Absolutely! Sorry for the confusion. So we're in agreement?

Data structures can implement 3 methods: @@transduce/step, @@transduce/init, and @@transduce/result. Only step is required.
A "reduced" object is indicated by a @@transduce/reduced property that equals "true". If this is a true, the value can be retrieved on the value property of the same object.

David Nolen · Answer 24 · Fri Mar 20 2015 04:43:58 GMT+0800 (China Standard Time)

@jlongster I think the value should be stored in @@transducer/value, the truth is that people duck type in the craziest ways you'd never expect. Putting the value in a key unlikely to clash just avoids all of this.

James Long · Answer 25 · Fri Mar 20 2015 04:46:44 GMT+0800 (China Standard Time)

@swannodette that's true. sounds good to me! those are all just internal details anyway.

Irakli Gozalishvili · Answer 26 · Fri Mar 20 2015 05:10:05 GMT+0800 (China Standard Time)

Is my understanding correct that agreed protocol is:

Symbol.for('transducer/init')
Symbol.for('transducer/step')
Symbol.for('transducer/result')
Symbol.for('transducer/reduced')
Symbol.for('transducer/value')

BTW is there a need for transducer/value ? Isn't transducer/reduced enough ? So if result has that field it's reduced and actual value is stored in under that field ?

Kevin Wallace · Answer 27 · Fri Mar 20 2015 05:33:16 GMT+0800 (China Standard Time)

If broken apart, can we say that x is a transformer if it has @@transducer/step? Or do we need to check for all 3?

David Nolen · Answer 28 · Fri Mar 20 2015 05:57:09 GMT+0800 (China Standard Time)

@Gozala we already went through that idea before, we don't want hasOwnProperty checks because the value may very well be something that is JavaScript false-y.

David Nolen · Answer 29 · Fri Mar 20 2015 05:58:28 GMT+0800 (China Standard Time)

@kedashoe yes just checking for @@transducer/step should be enough.

Kevin Beaty · Answer 30 · Fri Mar 20 2015 06:38:35 GMT+0800 (China Standard Time)

This all sounds great.

While we're at it, what do you think of using @@transducer/reduce to mark a dispatch method of type function(reducingFunction, init) for use similar to IReduceInit in Clojure? (Useful with eduction, etc.)

Irakli Gozalishvili · Answer 31 · Sat Mar 21 2015 07:31:47 GMT+0800 (China Standard Time)

This all sounds great.

While we're at it, what do you think of using @@transducer/reduce to mark a dispatch method of type function(reducingFunction, init) for use similar to IReduceInit in Clojure? (Useful with eduction, etc.)

I would like that, but would rather use different function signature (structure, reducer, init) => state, which is what I actually us already:
https://github.com/Gozala/transducers/blob/1f0efdfef18fe253d1c9b3759967401d431a23d1/src/transducers.js#L277-L311

Irakli Gozalishvili · Answer 32 · Sat Mar 21 2015 08:41:39 GMT+0800 (China Standard Time)

I would also like to propose that target[Symbol.for("transducer/reduce")] was invoked as follows:

target[Symbol.for("transducer/reduce")](step, initial, collection)

Note that step here has a (result, input) => result signature, as unless I'm mistaken reduce does not need to do init neither it needs to do result both are done by a transduce.

Kevin Beaty · Answer 33 · Sat Mar 21 2015 21:31:17 GMT+0800 (China Standard Time)

I was thinking about it a little differently. The idea is that the reducible object does not need to be concerned with transducers or transformers. It would be defined similar to Array.prototype.reduce. You could do something like this:

function reduce(xf, init, coll){
  if(isArray(coll)){
    return arrayReduce(xf, init, coll)
  }
  if(isFunction(coll[Symbol.for('transducer/reduce')])){
    return methodReduce(xf, init, coll)
  }
  // whatever else
  return iteratorReduce(xf, init, coll) 
}

function methodReduce(xf, init, coll){
  var result = coll[Symbol.for('transducer/reduce')](function(result, value){
      return xf.step(result, value)
    }, init)
  return xf.result(result)
}

Kevin Beaty · Answer 34 · Sat Mar 21 2015 21:34:40 GMT+0800 (China Standard Time)

... And the reducible object would just have to be concerned with implementing a contract similar to Array.prototype.reduce:

function eduction(t, coll) {
  return new Eduction(t, coll)
}
function Eduction(t, coll){
  this.t = t
  this.coll = coll
}
Eduction.prototype[Symbo.iterator] = function(){
  return sequence(this.t, this.coll)[Symbol.iterator]()
}
Eduction.prototype[Symbol.for('transducer/reduce')] = function(rf, init){
  return transduce(this.t, rf, init, this.coll)
}

David Nolen · Answer 35 · Fri Apr 03 2015 00:58:51 GMT+0800 (China Standard Time)

@kevinbeaty going to leave transducer/reduce alone for now as I think this requires some more thought. The above minimal protocol is enough to get everyone on the same page about the fundamentals.

David Nolen · Answer 36 · Fri Apr 03 2015 03:06:35 GMT+0800 (China Standard Time)

Implemented and released. Thanks everyone!

James Long · Answer 37 · Sat Apr 04 2015 03:17:00 GMT+0800 (China Standard Time)

Same here! My project conforms.

Irakli Gozalishvili · Answer 38 · Sat Apr 04 2015 03:19:16 GMT+0800 (China Standard Time)

https://github.com/gozala/transducers also uses this interface ;)