ipfs / roadmap

IPFS Project && Working Group Roadmaps Repo

[2020 Theme Proposal] Sharing and Collaborating

aschmahmann opened this issue · comments

Note, this is part of the 2020 Theme Proposals Process - feel free to create additional/alternate proposals, or discuss this one in the comments!

Theme description

Distributed p2p systems like IPFS are at their best when users are able to share data and collaborate on its creation. While IPFS and the ecosystem around it have started us down the path of peers being able to share files and create documents with each other, it's time for us to push this into a more usable and user-friendly realm.

Core needs & gaps

  • Access control: Users want to be able to share data with each other, and they should be able to do so without making the data publicly available. While users may choose to encrypt and then publicly expose their data, they could also simply send data only to peers that they have authorized.
  • Collaborative Applications: IPNS was a start for enabling decentralized mutable data, and while there are upcoming performance improvements there is still a lot of work to be done. Applications need performant, supported solutions that let multiple parties collaborate (e.g. collaborative document editing, community-wide chat applications, GitHub clones).
  • Identifiers: IPFS has peerIDs and IPNS can already be used as a decentralized identifier (e.g. as the DID spec compliant IPID). However, IPFS has no decentralized system for correlating a human name with these identifiers. While IPFS itself does not need to build any social fabric around identifiers, providing a simple API to label one's friends would be a huge step forward.
  • Dogfooding: IPFS development and community interactions still do not occur over decentralized systems. Enabling the development of applications that our community can use to create, communicate and organize would give us a great way of gauging how IPFS performs in the wild and make sure that it consistently performs well enough for users to want to use it.

Why focus this year

These areas for improvement have been in demand by the IPFS community for several years, and newcomers to our community expect these features to be available. By putting emphasis on these goals in 2020 we can increase ecosystem growth, and by using IPFS tools as part of our community toolset we can keep a tight leash on IPFS performance.

The community has already started significant groundwork towards these goals with identity management projects like Nomios, collaboration applications such as PeerPad, and sharing and communication applications and frameworks like Textile and Peergos. Let's help them keep up the great work!

Milestones & rough roadmap

  1. Identity support in IPFS
    • Users can manually associate a name with a decentralized identifier
    • An API is exposed and integrated into at least one external client
  2. Access control
    • Users can limit data added to IPFS from being shared, even while connected to the public network
    • Access control may be tied to decentralized identifiers
  3. Mutable Data Improvements
    • Support for a decentralized mutable data primitive (like IPNS) that supports multiple concurrent writers
    • Support for third-party republishing/maintaining of mutable data records
  4. Dogfooding - One of:
    • Make PeerPad stable and performant enough for IPFS team notes to be stored there
    • Create an IPFS based chat client that can bridge with our existing IRC, Matrix, Discord rooms

Desired / expected impact

  • The community is energized around IPFS's practical use
  • IPFS is stable and performant enough for some core community interactions to use it
  • Community members choose to share data from their local machines via IPFS instead of Dropbox, Email, etc.
  • Users feel comfortable adding non-public data to IPFS
  • Enable the creation of new collaborative DWeb applications

This is great, @aschmahmann!! Your points about the importance of ACLs (access control lists) and how to identify nodes you care about really resonate!

Curious if @ianopolous @michaelsena @andrewxhill or @haadcode have thoughts on the pros/cons/proposals for baking some of these primitives more deeply into IPFS?

I think most of these problems can already be solved on top of ipfs. And doing it that way allows you to minimise your trusted computing base. E.g. if the code that talks to the network is segregated from the code that does decryption then that is a huge win for security and auditability.

For identity you either have the 100% decentralized version where you just sign a data structure (publish it in ipns) which can already be done, or if, like we decided in Peergos, you want the same UX as every other social app in the world then you want unique usernames. Uniqueness obviously requires a logical centralization point. This could be a blockchain like namecoin, or we've chosen to use a single ipfs instance to authorise and republish new signups (usernames), guaranteeing uniqueness that way. So long as it's only a centralization point at signup, that's a good trade-off for much better UX in our opinion. The moment users need to see a hash or a key, 99% of the population will be lost.

Access control is also a mostly solved problem in my opinion. We have solved it in Peergos with a cryptographic capability-based system called Cryptree, which allows you to grant and revoke read and/or write access to individual files or entire subtrees. Granting can happen in O(1), and revocation is O(number of files/dirs in subtree) as you need to rotate keys and re-encrypt things. You can grant as many users access to an item as you wish. There's even an option to make it public by publishing a capability - see for example: https://alpha.peergos.net/public/peergos/releases which gives human-readable paths to fully authenticated content. This is all 100% client side with no server trust required (a general principle in Peergos). The one thing that would be useful here from ipfs is making it easier to publish signed things where the keys are never exposed to IPFS (there's an issue for it).
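The cost asymmetry described above (constant-time grant, revocation linear in the size of the subtree) can be illustrated with a toy model. This is only a sketch under simplifying assumptions: it does not reproduce Cryptree's actual key-linking construction, it assumes that holding the subtree root's key is enough to reach the rest of the subtree, and `Node`, `grant`, and `revoke` are hypothetical names, not Peergos APIs.

```python
import secrets
from dataclasses import dataclass, field


@dataclass
class Node:
    """A file or directory with its own symmetric key (toy model only)."""

    name: str
    children: list["Node"] = field(default_factory=list)
    key: bytes = field(default_factory=lambda: secrets.token_bytes(32))


def grant(root: Node) -> bytes:
    """Granting: hand over one key for the subtree root; no tree walk needed."""
    return root.key


def revoke(root: Node) -> int:
    """Revoking: give every node in the subtree a fresh key.

    Returns the number of nodes touched, which grows with the subtree size.
    A real system would also re-encrypt the affected data at this point.
    """
    touched = 0
    stack = [root]
    while stack:
        node = stack.pop()
        node.key = secrets.token_bytes(32)
        touched += 1
        stack.extend(node.children)
    return touched


docs = Node("docs", [Node("a.txt"), Node("b.txt"), Node("sub", [Node("c.txt")])])
old_capability = grant(docs)  # one key handed out, regardless of subtree size
rotated = revoke(docs)        # every node in the subtree gets a new key
```

After `revoke`, the previously granted key no longer matches the root's key, and `rotated` counts one rotation per file or directory in the subtree.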

For users to feel comfortable adding non-public data to IPFS the privacy problem needs to be solved. We've solved it on the data structure, encryption and sharing level, but need support on the protocol level to solve the rest of it (which is mainly hiding metadata of access patterns).

thoughts on the pros/cons/proposals for baking some of these primitives more deeply into IPFS?

Would love to explore all of these and discuss which primitives would make sense in IPFS!

The list of features/themes in OP are all something we've extensively worked on and solved (more or less) in OrbitDB, specifically the "mutable data / collaboration" primitives, and I could see multiple possibilities to use the components from OrbitDB in IPFS to enable similar use more widely (for example, I remember seeing discussion around "message queue version of pubsub", which e.g. ipfs-log could do today). At the same time, OrbitDB would benefit from having some of these primitives baked into IPFS.

I don't have a particular proposal/plan for any of this atm, but we'd (OrbitDB devs) be interested to participate in defining and collaborating/working on all these. I see it'd be beneficial for the community at large. Looping in @aphelionz and @shamb0t.

@ianopolous I think you're coming at these problems with a much more opinionated and end-to-end solution than I was hoping to tackle here. I think there are some common base layers that, if added to IPFS, could be useful across the ecosystem and boost cross-application compatibility.

Identity

For identity you either have the 100% decentralized version where you just sign a data structure (publish it in ipns) which can already be done, or if, like we decided in Peergos, you want the same UX as every other social app in the world then you want unique usernames.

These aren't the only options nor are they what's being proposed here. There are many potential decentralized identifiers (DIDs) that comply with the DID spec. Some are essentially public keys (like IPNS), others are blockchain based, others are federated. IPFS should not care what DID method you use to identify users, but it should have some notion of identity internally that is more than just a peerID.

The simplest, and I thought pretty uncontroversial, option is to do what people do with the address books on their phones and emails: associate common names (e.g. Ian) with some arbitrary set of handles (e.g. ianopolous on Github, me@email.com, etc.). Layers on top of IPFS that use their own DIDs (e.g. Peergos, OpenBazaar, Textile, etc.) could then utilize a shared framework.

The idea is that this can lead to a collaborative instead of competitive environment whereby I may have many handles for many IPFS based applications and no individual system must dominate because these identities follow a standard spec (e.g. the DID spec) and can be grouped together on my friends' local machines. This is in contrast to the current situation, where it's frequently difficult for applications, even those that utilize asymmetric cryptography for ACLs, to share data with each other unless they utilize the exact same notion and format for identities.

ACLs

Access control is also a mostly solved problem in my opinion. We have solved it in Peergos with a cryptographic capability based system called Cryptree

As I mentioned in the proposal, users may choose to simply encrypt their data and publicly share it, relying on the cryptography to protect their data access. However, a solution that is even more secure, simpler to implement, and complements rather than replaces at-rest encryption is to simply be able to not share data with every party that asks for it. For example, if Alice and Bob want to share data with each other they could encrypt the data with a shared key and make it publicly accessible over IPFS, or Alice could just make sure that Bitswap will only be allowed to send the data to Bob.

This doesn't deny the utility of protocols like Cryptree, Tresorium, etc. but instead operates at a level below them. For instance, both of those protocols are built around files and file systems but IPFS, despite the "FS" part, can be used for generic IPLD data which may not want their data encrypted in the same way. Additionally, people who want to build systems for mutable and collaborative data such as CRDT-based documents can't really use Cryptree. There are a number of schemes that might be fit for their use cases that could benefit from local ACLs even while open problems such as this are still being worked on.

One answer to every user that asks on the IPFS forums about setting up private networks for their friend groups is to tell them to just use the public network + encryption, which incidentally will help them if they want to pay cloud providers to persist their data. However, emulating Friend-to-Friend (F2F) networks without requiring at-rest encryption (which parts of the data are encrypted, and how, is frequently a pretty opinionated issue) is something IPFS should be able to do natively. Incidentally, this could also be nice for creating greater interoperability between IPFS and F2F networks like SSB.
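The network-level ACL idea discussed above (Alice only serving a block to Bob, rather than to every peer that asks) could be sketched roughly as follows. This is a minimal illustration, not Bitswap's actual API: `BlockStore`, `handle_want`, and the CID and peer ID strings are all hypothetical, and the sketch assumes the requesting peer's ID has already been authenticated by the transport.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class BlockStore:
    """Toy block store with a per-block allowlist (hypothetical, not Bitswap)."""

    blocks: dict[str, bytes] = field(default_factory=dict)
    # cid -> peer IDs allowed to fetch it; a missing entry means "public".
    allowed: dict[str, set[str]] = field(default_factory=dict)

    def add(self, cid: str, data: bytes, only_for: Optional[set[str]] = None) -> None:
        """Store a block, optionally restricting which peers may request it."""
        self.blocks[cid] = data
        if only_for is not None:
            self.allowed[cid] = set(only_for)

    def handle_want(self, peer_id: str, cid: str) -> Optional[bytes]:
        """Return the block only if the requesting peer is authorized."""
        if cid not in self.blocks:
            return None
        acl = self.allowed.get(cid)
        if acl is not None and peer_id not in acl:
            # Answer exactly as if the block were absent, so the refusal
            # does not reveal that we hold it.
            return None
        return self.blocks[cid]


alice = BlockStore()
alice.add("cid-public", b"hello world")
alice.add("cid-private", b"for bob only", only_for={"peer-bob"})
```

With this store, `alice.handle_want("peer-bob", "cid-private")` returns the data, while the same request from any other peer ID gets the same answer as a request for a block Alice does not have. The data itself is unencrypted here, which is why this complements rather than replaces encryption at rest.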

@aschmahmann We are indeed very opinionated. :-) But it's because we've spent years studying and implementing these solutions. (As open protocols that anyone can use)

Identity

I've read a large chunk of the DID spec and in my humble opinion it massively overcomplicates things unnecessarily. I want a simple scheme that solves the "add me on Twitter I'm @ianopolous" problem at parties. By simple I mean I want to be able to write mathematical proofs about its properties and security. Complication is the antithesis of that.

In your uncontroversial option, where are these mappings between names stored? We take a pretty extreme view on security and privacy threat models. In our threat model, if your device is compromised (and you are not logged in), we want the attacker to be unable to learn anything about your data, metadata (file names, sizes, directory structure) and, critically, your social graph. That is all hyper-sensitive data upon which lives can depend. Functionally though, that kind of mapping is exactly what we have planned for our address book, except it will be stored in Peergos.

ACLs

Please don't think Cryptree is "just encrypting the data". There is sharding, padding, metadata encryption, directory topology obfuscation, the list goes on. Typical (d)app developers should not be writing this kind of code because they will get it wrong (along the same lines as "don't roll your own crypto").

However, a solution that is even more secure

It is not more secure. You are claiming that a system that removes encryption at rest is more secure. There are many relevant threat models where this is false, and this is why most user devices these days encrypt their data at rest. As you say later, it complements at-rest encryption. We actually use this already for a small amount of temporary data (pending follow requests) in Peergos that is sensitive, because it is the only part vulnerable to a large enough quantum computer (because it is asymmetrically encrypted and CSIDH isn't standardized yet). We make anyone who wants to download it prove cryptographically that they are allowed to download it. This leaves it vulnerable to someone with a large quantum computer now, but safe against someone who merely gets a quantum computer in the future.

Additionally, people who want to build systems for mutable and collaborative data such as CRDT-based documents can't really use Cryptree.

This is incorrect. We've designed our structures from the ground up with CRDTs in mind, and indeed we plan on layering a CRDT on top of cryptree for collaborative document editing (inside the encryption). At one point we were considering making cryptree itself a CRDT, but we're not convinced it's possible with sensible semantics for a filesystem. Multiuser CRDTs generally also leak a lot of metadata, which is why we've decided to put them inside the encryption layer.

On the topic of boosting cross-application compatibility: we are very close to being able to let people write a dapp, which consists of nothing but html5 and JS, which is deployed by simply uploading to Peergos, and can be made public or shared privately. These dapps run in their own sandbox and get, for free, all the benefits of the identity, privacy, security, access control and huge file support that we have already solved.

I want a simple scheme that solves the "add me on Twitter I'm @ianopolous" problem at parties. By simple I mean I want to be able to write mathematical proofs about its properties and security.

I'm pretty sure that solving this problem in a way that gives unique, choosable (e.g. you can choose @ianopolous), and non-consensus names is impossible (i.e. maps to the CAP theorem). Which of these properties is required for a given application is debatable and so creating a single system that solves this problem universally is not realistic. I can see that Peergos is hoping for a consensus-based solution to this problem, and IPFS should respect that option but it's certainly not the only option.

where are these mappings between names stored

On the user's device. This is just a local mapping (e.g. I store your name as "Ian-Peergos" because I already know another Ian, but one of your neighbors might store your name as simply "Ian"). This local mapping doesn't even need to be exposable over IPFS (although enabling users to share parts of this local address book seems reasonable as well). If we're following the rubric from above, this gives us choosable names that are non-consensus but also not unique (just like the address book on your phone or email).
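The local mapping described above could look something like the following sketch. It is purely illustrative: `AddressBook` and its methods are hypothetical and not an IPFS API, and the identifier strings (`did:example:ian`, `github:ianopolous`) are placeholder values in an assumed format.

```python
from collections import defaultdict


class AddressBook:
    """Local, non-unique labels for arbitrary identifiers (hypothetical API)."""

    def __init__(self) -> None:
        # label -> set of identifier strings (DIDs, handles, e-mail addresses, ...)
        self._entries: dict[str, set[str]] = defaultdict(set)

    def label(self, name: str, identifier: str) -> None:
        """Attach an identifier to a locally chosen name."""
        self._entries[name].add(identifier)

    def identifiers(self, name: str) -> set[str]:
        """All identifiers stored under a name (empty set if unknown)."""
        return set(self._entries.get(name, set()))

    def names_for(self, identifier: str) -> set[str]:
        """All local names that point at a given identifier."""
        return {n for n, ids in self._entries.items() if identifier in ids}


# Two users label the same identifier differently; neither needs consensus.
mine = AddressBook()
mine.label("Ian-Peergos", "did:example:ian")   # placeholder DID
mine.label("Ian-Peergos", "github:ianopolous")  # placeholder handle format

neighbour = AddressBook()
neighbour.label("Ian", "did:example:ian")
```

Each address book is private to its owner, so the names are choosable and need no coordination, at the cost of not being unique across users.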

As you say later, it complements at-rest encryption.

Yes, given that encryption may be provided on top of IPFS and all blocks added to IPFS are automatically publicly requestable via Bitswap, it seems like adding some ACL layer can be a helpful layer of security.

Typical (d)app developers should not be writing this kind of code because they will get it wrong.

I agree, and I think this is where it becomes clear that IPFS shouldn't take the existence of Peergos for granted in saying we have a solution for sharing of non-public information. The majority of the IPFS stack seems to be designed with flexibility and futureproofing in mind. This means not choosing in advance what the "winning" protocols are going to be more than we absolutely have to. The two main protocol stacks IPFS relies on are IPLD and libp2p. IPLD is flexible to the point of having adversarial interoperability (thanks @agentofuser for the links) with existing hash linking schemes like Git, Dat, SSB, etc. libp2p is similarly being designed to be extremely flexible and adaptable to the future of networking protocols.

If the only way for users to share private data with each other via IPFS (a common request) is for them to utilize Peergos then we've basically added another dependency to IPFS which I think is a mistake. The reasons for this are:

  1. We end up discarding the ability to interoperate with existing F2F/private sharing networks, and those that emerge in the future
  2. I don't think Cryptree is the answer to all private data sharing needs. We may disagree here, but I think it's reasonable to assume that it's unlikely that Cryptree is the best solution we're ever going to have for private data sharing (e.g. I'm sure the authors of the Tresorium paper have their reasons for not just using Cryptree).

We are very close to being able to let people write a dapp, which consists of nothing but html5 and JS

Awesome, looking forward to it!

I'm pretty sure that solving this problem in a way that gives unique, choosable (e.g. you can choose @ianopolous), and non-consensus names is impossible (i.e. maps to the CAP theorem).

You don't need CAP to prove this. Consider a conflict with two concurrent claims of the same name. In such a situation you need to choose one of the two candidates for the name, which forms a total ordering on name claims. But any ordering that is not time-based can be gamed (say it was sorting by public key: the "smallest" public key could claim all names retrospectively). But if it is time-based then special relativity tells us that there is no universal ordering of space-like separated events. This implies the claims can't be space-like separated, which means you need consensus.
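The "sorting by public key can be gamed" step of the argument above can be made concrete with a small simulation. It is a sketch under stated assumptions: the "keys" are just pseudo-random 32-byte strings standing in for real public keys, and `fake_public_key`, `resolve`, and `grind_smaller_key` are hypothetical helpers, not part of any naming system.

```python
import hashlib
import random


def fake_public_key(rng: random.Random) -> bytes:
    """Stand-in for key generation: 32 pseudo-random bytes (not a real keypair)."""
    return hashlib.sha256(rng.getrandbits(64).to_bytes(8, "big")).digest()


def resolve(claims: list[bytes]) -> bytes:
    """Non-time-based tie-break: the lexicographically smallest key wins the name."""
    return min(claims)


def grind_smaller_key(target: bytes, rng: random.Random) -> tuple[bytes, int]:
    """Generate keys until one sorts below `target`; return it and the attempt count."""
    attempts = 0
    while True:
        attempts += 1
        candidate = fake_public_key(rng)
        if candidate < target:
            return candidate, attempts


rng = random.Random(2020)  # fixed seed so the sketch is reproducible
honest = fake_public_key(rng)                        # the original name holder
attacker, attempts = grind_smaller_key(honest, rng)  # made after the honest claim
winner = resolve([honest, attacker])                 # the later claim still wins
```

Because the attacker can keep generating keys until one sorts lower, a later claim displaces the earlier one, which is why such an ordering cannot settle name ownership on its own.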

I think we agree there are many different possible identity solutions. What I'm saying is that I don't think any of these need more support from ipfs itself (with the possible exception of externally signing objects with keys not exposed to ipfs).

On the users' device.

This is not allowed under our threat model, because it leaks the social graph to anyone who gets access to the data on your device.

If the only way for users to share private data with each other via IPFS (a common request) is for them to utilize Peergos then we've basically added another dependency to IPFS which I think is a mistake.

IPFS is a platform for public data at the moment. If ipfs wants to handle cases like this (that can't be solved with a simple network level ACL) then it needs to focus on privacy first, as I discuss in the other proposal.

This is not allowed under our threat model

That's fine and other application developers don't have to agree that your threat model makes sense for their use case (some may think it's not stringent enough, others may think it imposes too much). If Peergos doesn't want to utilize some of the functionality supported by IPFS why should it have to? On the other hand stifling innovation because it does not fit the Peergos threat model is an unreasonable restriction on the IPFS community.

IPFS is a platform for public data at the moment

So we agree on one of the problems

then it needs to focus on privacy first

But we disagree on which solution is more important to get to first. Adding request anonymity in DHT requests is insufficient to help with general F2F use cases and therefore I'd like ACLs prioritized over anonymity. However, given that those cases don't fit your project's threat model you'd prefer that they be deprioritized.

I agree with you that network level ACLs are simple and can be done in parallel with privacy. That's why I added the caveat "(that can't be solved with a simple network level ACL)"

My strong recommendation for network level ACL is to copy what WireGuard does (the model fits perfectly with IPFS).
