IntersectMBO / plutus

The Plutus language implementation and tools

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Request for changes to the ScriptContext in PlutusV3

KtorZ opened this issue ยท comments

Describe the feature you'd like

Hello!

Given that PlutusV3 and Conway are right at the corner, I've been thinking it would be a nice opportunity to bring some changes to the ScriptContext to make life of language builders relying on Plutus Data as encoding (i.e. Aiken, Helios, OpShin, Plu-Ts, Scalus, ...) easier.

There are a few oddities in the current context which makes it needlessly harder (and more expensive) to work around them. I am willing to make a pull request to introduce those changes if necessary though I am not familiar with the testing setup in Plutus and I'd love help on those. Pairing is an option if someone is willing to.

So here are the request:

1. Remove the null Ada quantity from txInfoMint

For some reasons, the assoc-map Value will always contain a null-quantity of Ada tokens in the txInfoMint. This has been quite error-prone and I believe, was due to how the ledger inject that value into the script context. The ledger has recently reworked the internal representation of mint values to MultiAsset which no longer contains Ada at all.

So while I think this particular issue is fixed on the ledger side, it'd be good to ensure from the Plutus side that this is the case (with perhaps an invariant on that particular field -> should never contain ada).

2. Remove the unnecessary constructor around TxId

As far as we know, TxId is the only newtype constructor that survives on-chain. Any other newtype is erased (as they're derived generically in the codebase), but TxId is manually derived as:

PlutusTx.makeLift ''TxId
PlutusTx.makeIsDataIndexed ''TxId [('TxId,0)]

This adds an extra 0-index constructor around the BuiltinByteString which brings no value whatsoever and; more importantly, make the handling of similar cases inconsistent. It creates an exceptional case which is somehow annoying to handle. There might be a reason for it which I do not grasp, or it may simply be an oversight.

3. Make txInfoRedeemer a list

It kind of already is due to how to Map is implemented, however, it's an awkward one as the ordering of ScriptPurpose is not well-defined and left as an implementation detail. Having a Map here brings no particular efficiency when it comes to access and Plutus Maps are notoriously expensive to traverse. So this is fundamentally akin to traversing a list except that having a Map here means that there's more upfront cost that goes to constructing it (especially in PlutusTx).

It also makes the definition and management of efficient Map a lot trickier at a compiler level. Removing this particular case (and the next one), would allow to only consider Map ByteString a in the ScriptContext bringing in some easy simplification (and thus reducing rooms for errors). Without that, we effectively have to support maps generic in their keys with advanced compiler specialization which are hard to get with a 0-cost overhead (since we have to pass extra context to all functions operating on Maps to handle the ordering of keys).

So the ask here is really to turn:

    , txInfoRedeemers       :: Map ScriptPurpose Redeemer

into

    , txInfoRedeemers       :: [Redeemer']

(where Redeemer' would be a type isomorphic to (ScriptPurpose, Redeemer) -- I don't mean to create a new type called Redeemer' but to change the current definition of Redeemer to that new definition).

4. Make txInfoWdrl a Map BuiltinByteString Integer

Or alternatively, turn it back to an assoc list as it was in PlutusV1. The rationale for making it indexed by just a ByteString is the following:

  • As mentioned in (3), it makes the overall script context more uniform with all maps being indexed by BuiltinByteString. This opens up optimizations down the line.
  • It already is a BuiltinByteString (wrapped inside two constructors) in almost 100% of the cases. In Conway, as pointer addresses are being deprecated from use, it truly becomes 100%. So for the handful of folks out there who have valid pointer addresses pointing to a script credential (if there is even any), they'd have to use PlutusV2 for that. Which is not only fine but necessary, since they won't be able to create new pointer addresses pointing to V3 validators in Conway. So It feels like the right time to get rid of this.
  • Once we get pointer out of the way, the StakingCredential is basically a just a Credential, which itself is a union type between a ScriptHash and a PubKeyHash. By making it a BuiltinByteString, we lose information regarding whether it is a script or a pub key hash, though this information is already available to the transaction builder (as he/she couldn't build the transaction otherwise). Moreso, if it's a script, then necessarily, it is available from the txInfoRedeemers; so the information still is there; only less directly.

I understand that some people may feel uneasy about removing a piece of information from the script context so a less controversial change might be to turn this into the following type:

Map BuiltinByteString (CredentialType, Integer)

Or said differently, to move the credential discrimination from the key level to the value level. This way, no information is lost and the type remains fundamentally a Map.

Describe alternatives you've considered

We have quite a few work-around in place for those in Aiken and have had to introduce language-specific features or dedicated library types to cope with these; increasing the complexity and entry-barrier for users. So they are only sub-optimal solutions.

PlutusV3 looks like a promising opportunity to get rid of some (if not all) of them.

Much needed changes!

1. Remove the null Ada quantity from txInfoMint

So while I think this particular issue is fixed on the ledger side,

Yes, this has already been implemented on the ledger side and there is no zero ADA being passed to PlutusV3 context in the mint field:

transMultiAsset :: MultiAsset c -> PV1.Value
transMultiAsset ma = transMultiAssetInternal ma mempty
transMintValue :: MultiAsset c -> PV1.Value
transMintValue m = transMultiAssetInternal m (transCoinToValue zero)

it'd be good to ensure from the Plutus side that this is the case (with perhaps an invariant on that particular field -> should never contain ada).

In Plutus ADA is not that different from any other multi asset, because ADA is just a special kind of CurrencySymbol. This is strange and not very type safe IMHO:

newtype Value = Value (Map CurrencySymbol (Map.Map TokenName Integer))

I would suggest doing something similar to what we do in ledger, eg:

type ADA = Integer
data Value = Value ADA MultiAsset
data MultiAssset = MultiAsset (Map PolicyID (Map AssetName Integer))

That would guarantee that there is no ADA in the mint field. However, I have no idea how big of a change this would be for Plutus.

2. Remove the unnecessary constructor around TxId

I have no opinion on this.

3. Make txInfoRedeemer an associative list

I don't have any argument against it. Ledger guarantees that there are no duplicate keys in that Map. The desire to switch to assoc list is a bit surprising to me, cause Map is just an assoc list underneath, so I'd expect it to be a no cost conversion from a Map to an assoc list. However, I am far from being Plutus guru, so I am probably missing some context here and my opinion isn't that important on this point.

Another good argument for it also comes from how the ledger currently has to do "extra work" to convert redeemers to list and back into a Plutus Map.

I do have a comment on this point though. Changing txInfoRedeemer to an assoc list will not make ledger do any less work. The only thing that will disappear is PV2.fromList, which has no overhead whatsoever, since:

fromList :: [(k, v)] -> Map k v
fromList = Map

4. Make txInfoWdrl a Map BuiltinByteString Integer

It already is a BuiltinByteString (wrapped inside two constructors) in almost 100% of the cases. In Conway, as pointer addresses are being deprecated from use, it truly becomes 100%.

This was actually always wrong. Withdrawals could never be pointers, I think it was an oversight. This has already been fixed in PlutusV3:

transTxBodyWithdrawals :: EraTxBody era => TxBody era -> PV3.Map PV3.Credential PV3.Lovelace
transTxBodyWithdrawals txBody =
  transMap transRewardAccount transCoinToLovelace (unWithdrawals $ txBody ^. withdrawalsTxBodyL)

Note also that withdrawal is now a Lovelace, rather than Value

By making it [Credential] a BuiltinByteString, we lose information regarding whether it is a script or a pub key hash

That is a big deal and I would advise against it. Losing information about whether credential is a script hash or a key hash is strictly less information than before and there is no other way to recover this information in the plutus script.

though this information is already available to the transaction builder (as he/she couldn't build the transaction otherwise).

It is available to the transaction builder, but the script writer.

Moreso, if it's a script, then necessarily, it is available from the txInfoRedeemers; so the information still is there; only less directly.

This is inaccurate, because we have native scripts that do not have redeemers.

1, 2 look good. A few questions on 3:

it's an awkward one as the ordering of ScriptPurpose is not well-defined

Do scripts care about the order of ScriptPurposes? Why would they care?

It also makes the definition and management of efficient Map a lot trickier at a compiler level

Can you help me understand how this makes the compiler simpler. Is it simpler because this allows Aiken to only support Maps with bytestring keys?

would allow to only consider Map ByteString a in the ScriptContext

There are currently a few other maps in V3's ScriptContext, with non-bytestring keys such as: ColdCommitteeCredential, Voter and GovernanceActionId. Are you proposing to turn all of them into lists?

For 4, it is not obvious to me that given a hash in txInfoWdrl, how to determine whether it is pubkey or script hash from txInfoRedeemers, can you elaborate?

how to determine whether it is pubkey or script hash from txInfoRedeemers, can you elaborate?

It is possible only to learn which ones are plutus script. It is not possible to learn if it is a pubkey

Redeemer will contain a Credential for every withdrawal that is hash of a plutus script. So it would be possible to learn from redeemers which withdrawals were locked by a plutus script, but it does not tell you which ones were locked by a native script vs a regular key witness.

I don't think we should lose information about the credential that is making a withdrawal, even if other cases are rare.

I don't really understand the point of 3 either. You can always just unwrap the map and get the underlying association list at zero cost.

@zliu41 Do scripts care about the order of ScriptPurposes? Why would they care?

Scripts don't generally; but compiler makers and library builders do ๐Ÿค“!

@michaelpj: I don't think we should lose information about the credential that is making a withdrawal, even if other cases are rare.
@lehins: This [the information about whether it's a script or a pubkey hash being available via the redeemers] is inaccurate, because we have native scripts that do not have redeemers.

That's a good point; I forgot about those ones. So likely then, the alternative of making it a true associative list sounds better and less controversial.

@michaelpj: I don't really understand the point of 3 either. You can always just unwrap the map and get the underlying association list at zero cost.
@zliu41 Can you help me understand how this makes the compiler simpler. Is it simpler because this allows Aiken to only support Maps with bytestring keys?

We can unwrap the maps provided that either:

  1. Users / contract writers do it explicitly.
  2. We have knowledge of where the map to unwrap is.

Regarding (2), we currently have a separation between the script context and the compiler. The script context structure is defined separately (via Aiken's standard library) which means that it isn't currently possible to instrument the compiler into knowing that elements of the script context should be patched. And we would very much like to keep it that way, as this separation makes the compiler relatively agnostic to the Plutus version. So while we could indeed look for that specific field in the script context and manually patch it to unwrap it; it introduces some complexity which feels unnecessary.

Given that maps do not provide any intrinsic values over a plain list; they aren't so useful and usually just "gets in the way". We've observed that most people writing on-chain validators end up doing (1) and prefer working with lists all the way. Maps provide no performance benefits, and simply create a larger-than-necessary learning surface. I assume that a direct consequence of the existence of Map is the existence of Pairs at the UPLC level. However, Pairs in there can already be emulated as Constr 0 [Data, Data].

@zliu41: There are currently a few other maps in V3's ScriptContext, with non-bytestring keys such as: ColdCommitteeCredential, Voter and GovernanceActionId. Are you proposing to turn all of them into lists?

Interesting! So from a similar motivation, yes, I am also proposing to turn them all into lists. Now, I am more and more questioning the usefulness of the Map constructor on Data altogether and perhaps, as a slightly bigger change (albeit less controversial?), we could imagine removing Map entirely? One of the beautiful thing about the PlutusVM is how it is in the end a relatively simple virtual machine; so making it even simpler without losing any of the expressiveness sounds like a good idea to me ๐Ÿค”? Unless I am missing something, Map only exists for convenience but as it turns out, actually brings more inconvenience than convenience. So perhaps removing it entirely effectively solve one problem, while we keep the discussion going about better builtins for map-like structure (see: cardano-foundation/CIPs#638)

We can unwrap the maps provided that either: 1. Users / contract writers do it explicitly, 2. We have knowledge of where the map to unwrap is.

I was suggesting the latter. If you have a Map and you want a list, it's trivial to get one or the other. Nonetheless, it is a map, so labelling it as such seems helpful.

I assume that a direct consequence of the existence of Map is the existence of Pairs at the UPLC level. However, Pairs in there can already be emulated as Constr 0 [Data, Data].

Now, I am more and more questioning the usefulness of the Map constructor on Data altogether

I was very confused before, but it seems like you are mostly talking about the Map constructor of Data, and not about how the ScriptContext type is represented in Plutus Tx, which is what I thought we were talking about.

The reason that Map exists is because Data was supposed to be a subset of CBOR, and CBOR has that. It seems semantically useful. We have in fact been talking about whether we could enforce that it contains no duplicates, which would make it more useful.

We do indeed need builtin pairs in order to have Map, but we would need it for lists of pairs just as much. Only builtin types can go in builtin types, so you can't have a builtin list of SOP-encoded pairs, you need a builtin pair type. Maybe we could use SOPs the whole way and get rid of both of them? I'd need to think about it.

So perhaps removing it entirely effectively solve one problem, while we keep the discussion going about better builtins for map-like structure

Please can we talk about one thing at a time. All of these things are quite different:

  • The type used for map-like things in the Plutus Tx script context
  • The way map-like things are encoded in Data
  • The existence of builtin map like data structures in UPLC

@michaelpj: I was very confused before, but it seems like you are mostly talking about the Map constructor of Data, and not about how the ScriptContext type is represented in Plutus Tx, which is what I thought we were talking about.

Correct; I linked to the PlutusTx/PlutusApi source code because there is quite a close relationship between the two at the moment. I wish that the script context had its own separate Haskell-agnostic specification; but it hasn't. And given how the cardano-ledger currently depends on the Plutus codebase for constructing the ScriptContext, the Haskell types (and the associated template haskell derivations) in PlutusApi is de-facto the specification.

@michaelpj: It seems semantically useful.

On that I agree; though in practice it isn't that useful because any of the semantic can't be enforced at the serialization. When it comes from the script context, there are some assumptions that can turns out to be true (such as that elements in a map are in ascending lexicographic order of the keys) and that are somewhat reliable since they are provided by the ledger. But in general, no.

@michaelpj: we would need it for lists of pairs just as much. Only builtin types can go in builtin types, so you can't have a builtin list of SOP-encoded pairs, you need a builtin pair type

Yes; I wasn't implying that pairs are only useful as part of Map. And I don't want to open the debate about whether we should get rid of pairs altogether (as it seems that lists are often just as efficient) ๐Ÿ˜…. At least not now!

@michaelpj Please can we talk about one thing at a time. All of these things are quite different:

As explained above, because of the lack of clear separation between the ledger and the Plutus source code, the first two points are effectively the same thing (although, conceptually, they aren't). While in PlutusTx the Map is a newtype around an assoc list, it is actually encoded as a Data's Map.

So I guess what I am truly asking is not necessarily to change the PlutusTx's Map, but to alter the on-chain encoding of them to be an assoc list instead of a Data's Map.

And I think the demand for the 3rd point (builtin-map like data structures) only reinforces the awkwardness of the current (Data's) Map in the language.

I disagree with 3 then removing the uplc data type without a good alternative would involve extra contract cost the compiler would push onto user contracts. Working with lists of pairs produced by unMapData is cheaper then working with a constr or list of 2. I would not remove the map uplc type without an equal cost replacement. I would rather the Aiken script context change to just be an AssocList type and leave Dict to be bytearray keys. Just my thoughts on the matter

Regarding 3, which one are you proposing: leaving maps with bytestring keys in the ScriptContext, or converting every map to list?

Either way, I still have the following question: doesn't Aiken (or any language targeting UPLC) need to provide a map type that enforces uniqueness, since users may want to use maps in the scripts? Then, if you encode both maps and lists of pairs as List in Data, when you convert Data back to the Aiken type, how do you know whether it is a map or a list?

An example is that it makes sense for Value to be a map. If it was a list that may have duplicate keys, then operations like "how much Ada is there in a Value", "are these two values equal", and "union two values" will be much more expensive to perform.

@zliu41: doesn't Aiken (or any language targeting UPLC) need to provide a map type that enforces uniqueness, since users may want to use maps in the scripts?

Yes, but there's no builtin maps in Plutus; only associative lists. Which means that validators have to convert and check for uniqueness by traversing the list anyway for anything that is user-provided (we can reasonably assume that data present in the script context data is ledger-provided is correct).

This is done implicitly in the case of PlutusTx (the construction of a scott-encoded Map actually re-construct a map from an assoc-list as stored in a Data to ensure uniqueness) or explicitly in the case of Aiken where we do not allow Map to figure in Datums or Redeemers; so users have to manually call a from_list method to enforce the invariant (since it can't be enforced at the binary level).

cardano-foundation/CIPs#749
cardano-foundation/CIPs#735

I would also like to introduce:

  1. txInfoObservations :: [ScriptHash] as a field to the TxInfo in the ScriptContext
  2. Add new Maybe BuiltinData field to TxOut for output tagging / mechanism by which we can assocaite arbitrary data with transaction outputs

As described in the above CIPs.

in the case of Aiken where we do not allow Map to figure in Datums or Redeemers

we can reasonably assume that data present in the script context data is ledger-provided is correct

Exactly - this is the right thing to do for datums and redeemers, but ScriptContexts are provided by the ledger. So, by using the Map constructor of Data, we can assume that the ledger is promising that it is actually a map with no duplicates.

Then, on the Aiken side, you can directly turn this into the Aiken map type via a cheap operation akin to fromAscList, without checking for duplicates. That should be much cheaper and thus more desirable. And if one actually wants a list, they can apply to_list, which should be cheap (if not a no-op) as it doesn't need to check any invariants.

@zliu41 that is already what we do; there's actually no extra cost in constructing a map from the script context. but to circle back to the root of the problem that triggered this ticket: to support this properly, we have to ensure a generic type parameter for the map key and this comes at some extra cost and complexity.

The thing with map is that because they are inherently as efficient as a list, we haven't seen anyone leveraging them; people just go for lists all-the-way. In fact, what we often observe is people just doing dict.to_list(...) from the start and completely disregarding the map (a.k.a dict in Aiken) from the start.

For maps already provided by the ledger (i.e in the script context), there is some slight efficiency benefits due to the invariant of the elements being unique and ordered. So some functions (e.g. inserting & union) can be made faster -- though they are often unused because of their large cost.

Yet, there's room for more efficiency (especially on get / lookup functions) that requires more finesse with the compiler as we need to provide ordering for the keys. Providing ordering basically means adding an extra argument to all functions, which also comes at a cost. This is necessary because maps are currently generic in the key type as well. For the few that do use maps, all instances of map usages we've seen in the wild have actually been using maps of ByteString -> a. The only occurences of a non-ByteString key are in the ScriptContext. So getting rid of those would allow to simply consider maps with ByteString as key, thus avoiding the extra argument passed around and producing much more efficient on-chain code (up to 40%-50% more efficient, amortized, on maps with more than 10 elements).

For now, the alternative we are considering is to define dedicated types for the script context, and fallback to a more efficient ByteString-map in other places. This is unsatisfactory as it increases the learning surface and seems needlessly complex.

So, TL;DR;

  • Maps (in both PlutusTx and Aiken) are effectively represented as assoc list, so they usually provide little benefits over lists.
  • It appears that in the wild, no one really makes use of Maps indexed by anything else than a ByteString.
  • The only occurences of non-ByteString keys we've seen are in the ScriptContext.
  • Because of the last point, we need to support key-generic maps which comes at a (significant) extra cost.
  • It appears that in the wild, no one really makes use of Maps indexed by anything else than a ByteString.

to supplement @KtorZ 's point, some idiotmatic aiken snippets from current projects I work on:

expect [_ada, (locking_policy, this_locked)] =
        value |> value.to_dict |> dict.to_list
expect [(locked_token_name, locked_qty)] = this_locked |> dict.to_list
expect [(mint_token_name, mint_qty)] =
        mint
          |> value.from_minted_value
          |> value.tokens(own_policy)
          |> dict.to_list
expect [(releasing_token_name, releasing_amount)] =
        releasing_tupl |> dict.to_list
expect [(mint_token_name, mint_qty)] =
        mint
          |> value.from_minted_value
          |> value.tokens(own_policy)
          |> dict.to_list
expect [_ada, (change_policy, change_tupl)] =
     value |> value.to_dict |> dict.to_list
expect [(change_token_name, change_amount)] =
     change_tupl |> dict.to_list

Across multiple projects / smart contracts, we apply this pattern of dumping it all into lists for the sake of efficiency. We also have unnecessary "value.from_minted_value" which removing null ada already fixes. We would find this much cleaner & concise with the requested changes to ScriptContext.

I agree with the @KtorZ @micahkendall's points above; however, I would add that:

So some functions (e.g. inserting & union) can be made faster -- though they are often unused because of their large cost.

Is true in all cases except for the inner Map within Values. Union & insertion into Value's from the ScriptContext is incredibly common since the idiomatic approach for validating expected output values is to construct the expected output value onchain and then compare it to the actual output value to prevent the presence of unexpected tokens (ie dust token attack) and to be more explicit about the expected conditions.

Excerpt from Encoins:

    let cond6 =
      lovelace_of(valToProtocol) >= 0 || any(
        info.outputs,
        fn(o) {
          o.address == changeAddr && isNonnegativeValue(
            merge(o.value, valToProtocol),
          )
        },
      )

Ok, let me try to summarize the pros and cons of the current ScriptContext (with the Maps)

Pros:

  • A Map field can be considered as a promise that there's no duplicate keys. If it was a list, then such invariant will have to be separately documented.
  • For certain things, it is usually more efficient if they are maps (e.g., Value). Currently, all such maps happen to use bytestring keys (or, perhaps it is only useful for Value), so converting the non-bytestring-keyed maps to lists won't cause any issue.
    • However I wonder if there's any fundamental reason for this. If not, then in the future, non-bytestring-keyed maps could appear again in ScriptContext.

Cons:

  • If a script only wants to operate on the underlying list of a map, it'd need to apply to_list, which is a small annoyance.
  • Having non-bytestring-keyed maps complicates the implementation of the Aiken language and compiler, since it will need to support maps with polymorphic keys.
    • As far as I understand this shouldn't lead to higher script evaluation cost, since you can use the specialize map whenever applicable, just like how you'd use IntMap for int keys and the regular Map otherwise.

If these are all correct, I don't mind one way or the other. On the one hand, if Map is indeed only useful for Value, and this is unlikely going to change in the future, then all those to_list can be annoying. On the other hand, if the main problem is that it complicates the implementation of the language and compiler, that's far less bad than complicating things for the users or making scripts more expensive.

@lehins @michaelpj What say you?

Are all invariants from the ledger typed? It would make sense to have this as a keeping point if that were true, or maybe we can go towards that, but otherwise I would like to see them documented external to the source base.

Are all invariants from the ledger typed?

Yes.

What say you?

I personally like types, that's why I work in Haskell. Without looking at how PlutusContext is being used I'd suggest having more types the better, which means keep using the Map type. Minor annoyances of conversions using to_list, if the don't cost anything in terms of runtime, is useful in itself (a parallel can be drawn with newtypes in Haskell with all the wrapping and unwrapping - it brings type safety). That being said, I do not write in Aiken or Plutus languages, so I do not know what makes most sense for the end user.

I am not sure if this is relevant. but if there is ever a plan to have efficient Maps with O(log n) lookups and inserts, I'd definitely recommend keeping the Map type, since otherwise we'll have to switch back from assoc lists to Maps, once we get that support.

@zliu41: As far as I understand this shouldn't lead to higher script evaluation cost, since you can use the specialize map whenever applicable, just like how you'd use IntMap for int keys and the regular Map otherwise.

This is assuming an advanced-enough compiler; Aiken doesn't get there. We do support generic maps, and we do typically monomorphize functions to a single type; yet every generic function still must carry around extra arguments and functions for ordering in the case of maps. We aren't able yet to fully erase the cost of generics, so even though the functions are specialized to ByteString keys, they still carry an overhead.

Bringing to compiler to a point where that overhead will be fully gone is challenging, which is why we're opening this ticket :). Since if we can get those few changes in, we don't need to invest extra efforts in making the compiler smarter. As much as we like solving problems, that problem isn't really driven by a use-case (no one is actively asking for maps generic in their keys) but simply exists because of arbitrary design decisions made when designing the script context.

@lehins: I personally like types, that's why I work in Haskell. Without looking at how PlutusContext is being used I'd suggest having more types the better, which means keep using the Map type

We like types too! And to be clear, we're not asking to 'get rid of the Map type' (even though this came on the table at some point as a possibility). We're asking for better types that reflects better the current use-cases. Having maps is conceptually good, and as @MicroProofs pointed out, we still very much likely need a Map variant on the Plutus Data for efficiency. What we're arguing is the need for Maps being generic in their keys. The only occurrences of that are in the script context and given that maps provide no performance benefits over lists when it comes to lookup, we might as well transform the few maps we have there with non-bytestring keys as plain lists.

@lehins @zliu41 @michaelpj I've updated points (3) and (4) on the original description of the ticket to match what I believe being the refined proposal resulting from our conversations above. Let me know if I can clarify anything.

So, gotta be honest, I got lost in the sauce for a lot of the conversation above and am not really sure what to make of it; perhaps I've been pushing myself too hard lately ๐Ÿ˜…

However, I was talking to @KtorZ about a subtle issue we're facing with the Sundae v3 contracts, and he said sharing some details here with real world numbers would be helpful. I leave it as an exercise to the reader to map this onto any argument for or against any particular change.

One of the redeemers for one of our scripts involves the type List<(Int, Option<X>)>; The first integer in the tuple is intended to encode the index into the inputs to process, and the second field in the index is some additional data for an unrelated feature.

The order is important, because if we say [(3, None), (2, None), (4, None), (1, None)], we really want to process the inputs in the order 3,2,4,1.

However, (and I'm not sure if this is an Aiken quirk, a Plutus quirk, etc.) this type looks an awful lot like how Map's are encoded as association lists, so Aiken says "aha, this must be an assoc-list, like the script context gives to us! so lets encode this (or try to parse this) as a cbor map".

However, the cbor specification says that it is incorrect to ever depend on the ordering of keys within a cbor map; because of this, many default cbor serializers won't try to preserve the order when you serialize the map. This would mean that anyone wanting to construct the correct redeemer here would need a custom serializer that said "no, no, i really know what i'm doing, please put this in this specific order".

This is really awkward and leads to potential bugs (offchain, but bugs nonetheless), so we wanted to evaluate our options to get this field interpreted as a real list, as there is no way (afaik) to tell Aiken that we want the type we said we want (List<(Int, Option<X>)>; we need to do something to interrupt the special logic for "list of 2-tuples".

The first way we tried was List<MyType>; however, this inserts another intermediate constructor in the cbor, which is enough overhead to eat into our budget enough to lower our max batch size.

Next, we tried List<(Int, Option<X>, ())>; this also lowers our max batch size, because () is encoded as a constructor plus an empty list.

Finally, we tried List<(Int, Option<X>, Int)>, and always pass 0 for the 3rd tuple field, and avoid destructuring it in aiken.

Ultimately this is actually faster than the encoded cbor map, and saves us around {steps: 53_936_334, mem: 6514} over the cbor map version, for whatever reason.

I'm sorry, I've read the discussion several times and I'm still failing to understand.

  • The Map constructor in Data wraps an association list. Unwrapping it has trivial cost. I don't understand why this is a problem.
  • I think there's a lot of implicit context about how Aiken works flying around that I don't understand at all. I don't see what the Map constructor of Data has to do with Aiken's ability to provide an efficient Map type.
    • This is also entirely orthogonal to the question of whether PLC could have an efficient builtin BytestringMap type. That would be a different type to Data, you would need to convert between the two, etc.
  • I think that you're saying that Aiken wants to assume that whenever it sees a Map constructor in a Data object, the keys are all bytestrings. That's impossible since you could get Data objects from other sources for which you have no such guarantees. Or am I just misunderstanding again?

Generally it sounds to me (especially reading Pi's comment) like a lot of this is a problem with how Aiken encodes things (which I don't understand and nobody has explained), rather than with anything the ledger is doing.

My position is this:

  • Data allows arbitrary keys in the Map constructor, since it corresponds to the CBOR Map constructor
    • We can't forbid this in general for Data objects without breaking other people
    • It's also perfectly reasonable to use something other than a bytestring as the key
  • The fields in the script context are, semantically, maps, and so we should represent them as such
  • The overhead of removing the Map constructor if you don't want it is tiny, so this shouldn't be a problem
  • We might want an efficient BytestringMap, but this is a completely orthogonal question

@michaelpj my apologies, I have a tendency to explore multiple ideas at once and weight them against each other and I understand this can be confusing especially in a written form. So let me try once again, to the point.

@michaelpj The Map constructor in Data wraps an association list. Unwrapping it has trivial cost. I don't understand why this is a problem.

Correct 1.

@michaelpj I don't see what the Map constructor of Data has to do with Aiken's ability to provide an efficient Map type.

The problem mainly stems from how Maps are inherently associative-list (in both Aiken and PlutusTx for that matter). While there's an option in PlutusTx to possibly (Scott-)encode map differently (e.g. as a binary tree), Aiken doesn't have much choice here since as you know, the internal representation of our types is directly mapped onto the underlying PLC structure. So we are bound to an associative-list.

This in itself isn't a problem and we aren't suggesting to change that.

@michaelpj This is also entirely orthogonal to the question of whether PLC could have an efficient builtin BytestringMap type

Correct. I just picked that example as a point in the argumentation that Maps could be more efficient. Making them more efficient is what motivates some aspects of this ticket.

@michaelpj I think that you're saying that Aiken wants to assume that whenever it sees a Map constructor in a Data object, the keys are all bytestrings.

Almost. What I am saying is that I would like to make this very assumption when we see a Map constructor in a Data from the Script Context. Which is possible since we are in complete control of how the script context looks like.

In fact, I do care very little about how fields of the Script Context are represented in the Plutus Api. I only care about the PLC structure of that ScriptContext. Yet, both are intertwined at the moment since there's no intermediate binary interface that the ledger implements against. Instead, the ledger uses the Plutus libraries to turn Ledger's Haskell types into Plutus Haskell types directly and rely on the serialization defined in those libraries.

So on the principle, fields like txInfoRedeemers could remain (Haskell) Map in the Plutus API, but I would like their PLC representation to be a Data' List. This slightly alters the serialization / deserialization logic for some of those types as it can no longer be auto-derived; but I that is kind of the ask here and I am happy to work on that if that helps.

I honestly doesn't see why it is such a big deal and the only argument I am really seeing standing against seems purely ideological / conceptual. The changes are fairly well-scoped and small, and they would save us months of work in the Aiken compiler if we wanted to improve that situation.

Footnotes

  1. What people like Micah have been trying to show with their examples of wrapping / unwrapping isn't really about the cost and the annoyance of doing it; but it's about usage. While some of those items are conceptually maps, they are in practice used as lists. So we might as well try to meet people where they are; not conceptually but practically. โ†ฉ

I honestly doesn't see why it is such a big deal and the only argument I am really seeing standing against seems purely ideological / conceptual.

I agree with this statement but in the other direction ๐Ÿ˜‚ I really feel like I'm being dump but I still don't see why the current situation is a problem.

Almost. What I am saying is that I would like to make this very assumption when we see a Map constructor in a Data from the Script Context.

Except... you're proposing to get this by changing the things that don't have bytestring keys into lists. So they're going to have exactly the same lookup properties. So what's the advantage, apart from adding an unnecessary and confusing restriction to our use of Data?

So what's the advantage

The advantage is a tiny change in the Plutus <-> Ledger API which doesn't affect the performances or semantic of PlutusTx, but that saves us in Aiken months of work and extra complexity in the compiler that we would need to add should we want to make maps faster. With this small change, we can save quite a significant portion of execution units for a large variety of use-cases.

Okay, can you walk me through why? I don't get it at all. You write the compiler, just do something different. Or have a BytestringMap type that you use as the type of the script context fields that are bytestring maps and not when they are not.

Can you walk me through it again?

We do write the compiler indeed. So we try to spend times on things that matters as much as possible (as you do I imagine).

So, we could support a BytestringMap indeed, that's straightforward and it is in fact what we're leaning towards at the moment. Although it is unsatisfactory because it means that we need to also introduce another abstraction for non-bytestring keys. This creates a higher barrier to entry for new joiners and more learning surface in general. We already had to introduce a separate map to cope with the 0-Ada value in minted assets.

So given the lack of general interest in non-bytestring maps, we thought we might as well propose getting rid of them entirely. This way, it still is a rather small change in Plutus, we can keep the smart contract API simple and clean, and we don't loose much in expressiveness. Plus, we get to spend time on something else instead of having to introduce specialisation in the compiler for the sake of improving performances on bytestring maps.

So in summary of the conversation above and points I've been trying to make:

  • We generally want faster on chain validators.
  • Specialised maps are faster.
  • We don't have a 0-cost abstraction in the Aiken compiler for this, so Maps generic in their keys come at an extra cost.
  • Bytestring maps are useful and sometimes used.
  • Non bytestring maps aren't as useful and generally unused (from observing contracts in the wild)
  • Supporting 0-cost specialisation for maps in the compiler may be doable, but it's quite some work.
  • Introducing a special case for Bytestring maps is an option, but not quite elegant and with the downside of creating more noise for contract developers.
  • If there were no non-Bytestring maps, we wouldn't need to do either (no need for advanced specialisation, no need for an extra type).

Let me try to articulate what I think some of the important points are from a different angle, to try to dislodge this conversation abit. It seems to me like the Aiken team is dealing with two properties that are in tension with each other:

First, Aiken currently has the property that there is no "special" script context type; it's just defined in the standard library. This is really beneficial for a huge number of reasons:

  • Less code / magic in the compiler, making its correctness easier to achieve
  • Less "magic" for a user to Cardano to learn; they can just go and read the code that defines the script context type
  • Clever optimization tricks; advanced users can define their own script context types, using Data to stub out fields they don't want to deserialize
  • A step towards ledger agnosticism: in theory, you could implement different script contexts (for example, for a side chain), and write aiken contracts for these side chains
  • Upgrading to each new era is much simpler: just update the type
  • probably others

Second, though, there are types that, in practice, only appear in the script context, and are not something that an Aiken developer would need. For example, Aiken becomes a lot simpler if, in the aiken language, the only type of maps that are supported have Bytestring keys. Obviously the underlying UPLC still continues to support arbitrary maps, and perhaps that's useful for other languages etc. So, I believe @KtorZ's point is that if we can 1) standardize on an ABI for the script context, and 2) make that ABI as simple as possible, without sacrificing on performance, then it allows language authors like Aiken to make these kinds of trade-offs very easily.

As it is, if Aiken wanted to limit "Map"s to bytestring keys in their language, they'd have to give up the first property, and have a special type that's only usable in the script context.

However, (and I'm not sure if this is an Aiken quirk, a Plutus quirk, etc.) this type looks an awful lot like how Map's are encoded as association lists, so Aiken says "aha, this must be an assoc-list, like the script context gives to us! so lets encode this (or try to parse this) as a cbor map".

This is quite bizarre - I cannot think of any reason why Aiken would use the Map constructor of Data to encode a list, just because it's a list of pairs. Also this is orthogonal to the discussion because it will be a problem regardless of whether Aiken only supports bytestring-keyed maps or not (if Aiken only supports bytestring-keyed maps, a List<(ByteString, Foo)> will still look like a map).

List<(a, b)> will be a 2 item array inside an array going forward. I had planned to make this change months ago just needed some more coordination and urgency lol.
The type that will use the Map Data encoding will be List<Pair<a,b>> going forward. Which Pair is a special type that cannot be used with generics except in Lists. This will solve the Aiken quirk people are talking about.

List<(a, b)> will be a 2 item array inside an array going forward. I had planned to make this change months ago just needed some more coordination and urgency lol. The type that will use the Map Data encoding will be List<Pair<a,b>> going forward. Which Pair is a special type that cannot be used with generics except in Lists. This will solve the Aiken quirk people are talking about.

Interestingly enough that will be more efficient than the current implementation as far as I can tell:

"headList-cpu-arguments": 43249,
  "headList-memory-arguments": 32,
  "tailList-cpu-arguments": 41182,
  "tailList-memory-arguments": 32,
  "fstPair-cpu-arguments": 80436,
  "fstPair-memory-arguments": 32,
  "sndPair-cpu-arguments": 85931,
  "sndPair-memory-arguments": 32,
  
  unListData                 -- 32247 
  headList x                 -- 43249 + 23000 (one force)
  headList (tailList x)      -- 43249 + 41182 + 46000 (two forces) 
                             -- total CPU: 228927
                                    
  fstPair x                    -- 80436 + 46000 (two forces)
  sndPair x                    -- 85931 + 46000 (two forces)
                               -- total CPU: 258,367
  
Difference of `29,440` CPU 

Okay, let me try and articulate it.

First, Aiken currently has the property that there is no "special" script context type; it's just defined in the standard library.

Great, so there is some definition of the script context type. Presumably today it looks something like this:

data ScriptContext = ScriptContext {
   ...
   txInfoRedeemers  :: GenericMap ScriptPurpose Redeemer,
   -- another random map field for example
   txInfoData :: GenericMap V2.DatumHash V2.Datum,
   ...
}

where you currently call "GenericMap" just "Map".

You want this to be

data ScriptContext = ScriptContext {
   ...
   txInfoRedeemers  :: AssocList ScriptPurpose Redeemer,
   -- another random map field for example
   txInfoData :: BytestringMap V2.DatumHash V2.Datum,
   ...
}

where you can then call "BytestringMap" just "Map". But I don't see how that is superior to

data ScriptContext = ScriptContext {
   ...
   txInfoRedeemers  :: GenericMap ScriptPurpose Redeemer,
   -- another random map field for example
   txInfoData :: BytestringMap V2.DatumHash V2.Datum,
   ...
}

where you can still call "BytestringMap" just "Map" if you want to and leave "GenericMap" with a distinct name. In both cases you have to have two different types for the different fields, just in one case you give it some name that evokes a map, and presumably provide some map-like functions on it, whereas in the latter you give it some name that evokes a list, and maybe don't provide map-like functions on it?

As it is, if Aiken wanted to limit "Map"s to bytestring keys in their language, they'd have to give up the first property, and have a special type that's only usable in the script context.

My proposal above includes a type (GenericMap) that might only (or mostly) be used in the script context, but I don't see why it would only be usable in the script context. Are you telling me that your users are never going to want a map that has arbitrary keys? Surely you want to provide both, even if the default is BytestringMap?


Perhaps this also illustrates my confusion with your request: it seems to me that you are in complete control over your specification of what the script context looks like in Aiken, so I am quite confused by the suggestion that we should change the ledger rather than you just changing how you describe it ๐Ÿค”

@michaelpj Great, so there is some definition of the script context type. Presumably today it looks something like this:

Correct. We call it Dict (just to circle back to some of the previous comments).

@michaelpj You want this to be...

Not quite. This is what we could do today, and we wouldn't have to bother you about it. But this introduces two distinct types for maps, and it's just weird to explain from a developer experience perspective. So what we would want is either: (a) ByteStringMap everywhere there's a map in the script context or (b) turn non-bytestring maps into plain lists (e.g. [RedeemerWithScriptPurpose]) where the map aspect aren't actually that interesting (even if conceptually, they are maps; they end up costing more as maps than they do as lists).

And again, we're only talking about the PLC representation here; if the PlutusAPI wants to keep them as Map, that's up to you and I don't think it changes much for you since you still have to go through the list once to Scott-encode it.

@michaelpj Are you telling me that your users are never going to want a map that has arbitrary keys?

Yes! That's one of the point I've desperately been trying to make ๐Ÿ˜… ... and this comes from the observation of several contracts seen in the wild from various dApps. No one uses anything else than ByteString maps (and even them are seldom used; usually people just prefer lists).

@michaelpj Surely you want to provide both, even if the default is BytestringMap?

No. If a language wants to, then why not. I don't think we've clearly identified a use-case for it in Aiken. We only have to support GenericMap because of the script context. We're totally fine with no generic maps.

Not quite. This is what we could do today, and we wouldn't have to bother you about it. But this introduces two distinct types for maps, and it's just weird to explain from a developer experience perspective.

turn non-bytestring maps into plain lists (e.g. [RedeemerWithScriptPurpose]) where the map aspect aren't actually that interesting (even if conceptually, they are maps; they end up costing more as maps than they do as lists).

I'm sorry, I do just find this very uncompelling.

  • These fields are generic maps
  • The representation as an association list is also a generic map, just with a different name.
  • Generic maps are a normal and useful concept, even if most people aren't using them

It seems like at the best this change would save you from having

-- | Aiken doesn't support generic maps directly, but uses this type to tag association lists that should be interpreted as maps.
newtype GenericMap a b = GenericMap { getAssocList :: [(a, b)] }

which is a tiny benefit to just Aiken. And the cost is a perpetual weird invariant on how we use the Map constructor of Data in certain circumstances.

๐Ÿ‘Ž from me.


I think there's nothing for us to do on 1, and we should certainly do 2.

Your call. Just to add one last point, the benefit isn't purely about just avoiding the newtype for us, but it makes using ByteString maps quite a lot faster in the end -- which is the main use-case we've identified.

So, okay, forget about (3) and (4), we'll do something on our end. I am glad to see (1) and (2) addressed at least โ˜บ๏ธ

How can it make using BytestringMap faster? You can still use the fast code if you know it is a bytestring map?

It's the cost of generics. We don't support an advanced enough specialization in the compiler to completely erase the effect of generics (and I don't think Haskell do either for that matter?).

Generic maps have to carry extra arguments in the generated PLC to deal with the ordering of the key. We do limit the impact on this already on recursive functions but we've observed that the cost is still significant enough for us to consider having truly specialised bytestring maps.

So we will resort to that, yet as I explained, we would have preferred not introducing another type for maps; and the only reason we have to is because of the script context.

It seems like we're going in circles so I guess we should stop.

It sounds like you're going to have a specialized bytestring map type in any case, it's just a question of whether you also have a generic map type. As such I can't see how the specific case of bytestring maps can be different since in both cases you have a specialized type.

I am glad to see (1) and (2) addressed at least โ˜บ๏ธ

Yeah we'll address 1 and 2 for sure. By the way the discussion on 3 reminds me that we should make a ByteStringMap in Plutus Tx and see how much cost it saves - could be a quick and easy win ๐Ÿ™‚

I would also like to introduce:

  1. txInfoObservations :: [ScriptHash] as a field to the TxInfo in the ScriptContext
  2. Add new Maybe BuiltinData field to TxOut for output tagging / mechanism by which we can assocaite arbitrary data with transaction outputs

As described in the above CIPs.

I don't think 2 is matured enough to be considered for Plutus V3, but I really would love to see 1 squeeze itself in. Should I create a separate issue to request it? Would it be feasible given how close Conway is?

cardano-foundation/CIPs#749

Would it be feasible given how close Conway is?

Extremely unlikely. This would require a lot of changes to the ledger specification and the rules. Change to transaction serialization format due to addition of new field, which in turn affects hardware wallets. I believe this is too late, but I certainly don't see a problem with adding it in the next era.