ipfs / roadmap

IPFS Project && Working Group Roadmaps Repo

[2021 Theme Proposal] Security model

dchoi27 opened this issue · comments

Note, this is part of the 2021 IPFS project planning process - feel free to add other potential 2021 themes for the IPFS project by opening a new issue or discuss this proposed theme in the comments, especially other example workstreams that could fit under this theme for 2021. Please also review others’ proposed themes and leave feedback here!

Theme description

Solve key questions around IPFS’s security model for personal data, including read/write privacy of nodes and reliance on encryption.

Hypothesis

IPFS cannot be ready for a number of use cases (e.g., sensitive personal data storage) until it has a clear story around read/write privacy, encryption, and other key parts of the security model.

Vision statement

IPFS nodes in public IPFS networks are virtually anonymous if they want to be. Users can store private data on public IPFS networks without fear of anyone misusing the data.

Why focus this year

These are extremely hard, multifaceted problems that might take a long time to solve well (might require fundamental research, etc.) but can unlock a ton of value to the ecosystem - important to get the ball rolling sooner rather than later.

Example workstreams

Create “private browsing/publishing” for IPFS, clarify security model / encryption

Other content

commented

Having this be a "mode" will be technically much more difficult if not impossible, because it means only a small fraction of users (the ones 'in that mode') can possibly be in the anonymity set.

Perhaps it will be more productive to start with user expectations of their privacy in a system, and try to move the design to meet those. For example, it's reasonable for a user to expect that if they search for a file, and then later search for a different file, that some unrelated advertiser can't link those two searches together and create a profile of them. The current discovery design does not guarantee that.

👍 for working on a (or various) security models.

I agree with @willscott that for anonymity it's generally much more useful to have many people in diverse situations involved than just a small set of users who wish to remain anonymous. However, most of the anonymity solutions I'm aware of are frequently expensive or require a degree of trust. I'd like to see the project working with researchers and moving down the path of exploring what types of tradeoffs we can make (e.g. perhaps having users cache/forward content around the network helps in certain scenarios).

Some things that seem doable without much research include:

  • Allowing users to specify ACLs for who they will send data to
  • Making it so IPFS Public DHT nodes do not know the CIDs they're hosting provider records for (e.g. the keys are SHA256(multihash) instead of just multihash)
  • Starting down the path of making it easier to build F2F (friend to friend) networks using IPFS

The hash-of-multihash idea is a good privacy win for a probably low cost in terms of code complexity and maintenance. The hash function (sha256 in the example) will probably need to be upgradable, though, which means the key should itself be a multihash rather than a bare hash.
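A minimal sketch of the idea discussed above, assuming the DHT key is simply a second sha2-256 multihash computed over the content's multihash. The function names are hypothetical and this is not how any IPFS implementation currently derives provider-record keys; it only shows why wrapping the outer hash as a multihash keeps the hash function upgradable.

```python
import hashlib

# Multihash prefix for sha2-256: function code 0x12, digest length 0x20 (32 bytes).
SHA2_256_CODE = 0x12
SHA2_256_LEN = 32


def sha256_multihash(data: bytes) -> bytes:
    """Return <code><length><digest>, i.e. a sha2-256 multihash of data."""
    digest = hashlib.sha256(data).digest()
    return bytes([SHA2_256_CODE, SHA2_256_LEN]) + digest


def dht_lookup_key(content_multihash: bytes) -> bytes:
    """Hypothetical: hash the content multihash again to get the DHT key.

    A DHT node storing provider records under this key sees only the outer
    hash, so it cannot read the original multihash from the key alone.
    """
    return sha256_multihash(content_multihash)


content_mh = sha256_multihash(b"hello ipfs")
key = dht_lookup_key(content_mh)
```

Because the outer value carries its own function code and length, a later switch to a different hash function would change the prefix rather than require a new key format.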

@aschmahmann Huge plus 1 to this:

Allowing users to specify ACLs for who they will send data to

It is a simple but powerful primitive that can be used to great effect. I'd love to be included in any design discussions if it does happen as I've thought a lot about it and have spec'ed out how it would ideally work for our use case in Peergos. The key thing is it being capability based, not identity based.

The key thing is it being capability based, not identity based.

Wanted to +1 this. Also possibly worth looking at OCAP rather than ACL tables — they're strictly more expressive, and can be made privacy preserving.

Great proposal! I'm part of the dag-cose working group that is specifying an IPLD codec for encrypted/signed DAG nodes.
As it is based on the COSE standard (RFC 8152), it would allow for built-in key-based access control (basically appending an ACL to the message itself).
Once ready, this would address most of the areas in this proposal on the data level instead of the application level which is neat.
If any of you are interested in this, we will open an issue on the ipld/specs repo soon!

Edit:
I also have to mention dag-jose, which is a suboptimal solution mostly because it uses JSON, but it's further along the standardization path, so it's definitely worth checking out!
(The JSON encoding also makes it easier to debug!)

commented

This topic is something the @textileio team is also extremely interested in contributing to. We are a co-sponsor of the dag-jose (funded) project (which is useful beyond JWE for sure, and is particularly useful for JWS and JWTs expressed as IPLD), and I am now following up on the COSE front, which, as @JonasKruckenberg mentioned, is a better final path forward for *OSE-based data level access control. But I also wanted to highlight that both approaches (data level and application level) are going to be needed. Data level access control is indeed neat, but from my perspective there are many use cases where application level access control is the ideal control mechanism. But the cool thing about IPFS/IPLD is that codecs, if designed for it, are extremely useful here as well.

@oed and I have proposed starting a working/task group to focus on this topic specifically. To date, this has focused on the JOSE/COSE discussions, but I am increasingly interested in seeing a broader discussion of the benefits of app/data level access control. There's never going to be a one-size-fits-all solution, but there might be 3-5 solutions that fit most. And that's certainly more manageable than everyone rolling out their own crypto/access control solutions.

It also seems that, with the folks chiming in here, if we were able to come to some degree of consensus, that we could have multiple language implementations relatively quickly. This would be huge, and if we had enough 👍 from the IPFS team directly, it would be pretty exciting to see some experimental support for crypto-based plugins as a really great first go at some of these ideas.

commented

I also wanted to highlight @aschmahmann 's post above, because it hits on a few points I've been thinking about myself lately. Providing access control at the IPFS level is quite powerful, and pretty "simple".

Related, but perhaps not the same... because libp2p connections are e2e encrypted already, you get a lot of the benefits of "traditional" web security already there. Imagine a simple scenario where someone adds a file to IPFS with an ACL flag. Perhaps they publicly "provide" the record, perhaps they don't. But either way, when a peer comes looking for said CID, it might be reasonable to only respond with the data if the peer matches an ACL. Or even nicer, the requesting peer might have to "prove" (via some handshake protocol) that they are allowed to view the data. These are simpler fixes to access control that might go a long way to providing some level of security with reduced development overhead early on?

Or even nicer, the requesting peer might have to "prove" (via some handshake protocol) that they are allowed to view the data.

This is exactly what I meant by capability based, not identity based. I should be able to give a capability (e.g. a keypair) to any node and it should be able to access said data. It should be doable without any extra network round trips as well.
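To make "capability based, with no extra round trips" concrete, here is a rough sketch. It swaps in a shared secret and an HMAC tag for the keypair mentioned above, purely because that needs only the Python standard library; a real design would more likely use signatures. All names (`make_proof`, `verify_proof`, the field names, the 60-second window) are assumptions for illustration, not an existing IPFS or libp2p API.

```python
import hashlib
import hmac
import os

MAX_SKEW_SECONDS = 60  # assumed freshness window for a proof


def _message(cid: str, ts: int, nonce: str) -> bytes:
    return f"{cid}|{ts}|{nonce}".encode()


def make_proof(capability: bytes, cid: str, now: float) -> dict:
    """Requester side: attach a proof to the request itself (single message)."""
    ts = int(now)
    nonce = os.urandom(16).hex()
    tag = hmac.new(capability, _message(cid, ts, nonce), hashlib.sha256).hexdigest()
    return {"cid": cid, "ts": ts, "nonce": nonce, "tag": tag}


def verify_proof(capability: bytes, proof: dict, now: float) -> bool:
    """Responder side: accept anyone who holds the capability, whoever they are."""
    if abs(now - proof["ts"]) > MAX_SKEW_SECONDS:
        return False  # stale or future-dated proof
    expected = hmac.new(
        capability,
        _message(proof["cid"], proof["ts"], proof["nonce"]),
        hashlib.sha256,
    ).hexdigest()
    return hmac.compare_digest(expected, proof["tag"])
```

The check never looks at the requester's peer ID, which is the point of the capability approach: handing the secret to another node is enough to delegate access. Replay protection (remembering recently seen nonces) is left out of this sketch.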

commented

Great! I'd also like to give a shoutout to @expede's work on the UCAN spec in this regard: https://blog.fission.codes/auth-without-backend/. It's quite simple/clever, and could be mapped on to some of the JOSE/COSE work quite nicely, as it is essentially JWT/JWS compliant (IIRC).

@carsonfarmer Thanks for the link to UCAN, which is interesting. You're probably also aware of our Peergos implementation of server-less access control in IPFS, which achieves similar things (read, write, public-read access) without requiring a server to enforce anything - Cryptree. What we hope to get out of this issue is a 4th level of access control below all the others (mirror) which controls access to ciphertext independently of all our other access control.

commented

Nice, totally agree on that point @ianopolous. By the way, @textileio are big fans of cryptree (I think @fission-suite also uses cryptrees).

The original cryptree was invented in 2008 by Wuala: https://github.com/Peergos/Peergos/blob/master/papers/wuala-cryptree.pdf

@Peergos has improved its metadata privacy, adapted it to work in the IPLD/IPFS setting, and made it resistant to quantum computer attacks. It also adds a few cool features, like privacy-preserving zero-I/O seeking in huge files.

Solve key questions around IPFS’s security model for personal data, including read/write privacy of nodes and reliance on encryption.

I love this, and as @carsonfarmer mentioned above, it is something we (Textile) have been struggling with for quite some time. For example, we have an encryption scheme that currently works on UnixFS, but since the rest of the network doesn't know how to walk those nodes, they can't be easily replicated.

I realize this problem may not be addressed by this issue, which I take to be more about node-to-node access control, but given the complexity around defining a comprehensive security model for IPFS, I want to offer a slightly different method for converging on a solution.

go-ipld-prime provides a framework for writing custom IPLD codecs. If node owners had the ability to register different codecs via the config file or some other mechanism, you'd give them the freedom to try out different encryption and access control strategies. Perhaps these codecs aren't capable of addressing the whole problem space, but that would just mean allowing for different kinds of plugins.
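As a loose illustration of "register codecs via the config file", here is a sketch of a config-driven codec registry. It does not use or mirror the go-ipld-prime API; the `codecs` config key, the codec names, and the `wrapped-json` codec (a base64 wrapper standing in for a real encryption codec, and providing no confidentiality) are all made up for the example.

```python
import base64
import json
from typing import Callable, Dict, NamedTuple


class Codec(NamedTuple):
    encode: Callable[[dict], bytes]
    decode: Callable[[bytes], dict]


def _json_encode(node: dict) -> bytes:
    return json.dumps(node, sort_keys=True).encode()


# Codecs compiled into (or plugged into) the node. "wrapped-json" is only a
# placeholder for an encrypting codec; base64 is NOT encryption.
AVAILABLE: Dict[str, Codec] = {
    "json": Codec(_json_encode, lambda raw: json.loads(raw)),
    "wrapped-json": Codec(
        lambda node: base64.b64encode(_json_encode(node)),
        lambda raw: json.loads(base64.b64decode(raw)),
    ),
}


def load_codecs(config: dict) -> Dict[str, Codec]:
    """Enable only the codecs the node operator listed in the config."""
    enabled = {}
    for name in config.get("codecs", []):
        if name not in AVAILABLE:
            raise ValueError(f"unknown codec in config: {name}")
        enabled[name] = AVAILABLE[name]
    return enabled


codecs = load_codecs({"codecs": ["json", "wrapped-json"]})
```

The design choice being illustrated is that the operator, not the core node, decides which experimental encodings a node understands.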

In other words, allowing users to try out different approaches will enable a more iterative + community driven approach to a solution. Let the nodes speak and the best ideas will bubble to the top ;)

@oed and I have proposed starting a working/task group to focus on this topic specifically.

+1 for that idea! One of the biggest chances IPFS has is that we can break with everything that didn't work in web2.0 and only keep the good bits.
What I'm worried about, though, is history repeating itself. We're truly part of the foundational period right now, so we should make sure to get it right!
And having a space for discussion dedicated to security and privacy on IPFS would help a lot.

But either way, when a peer comes looking for said CID, it might be reasonable to only respond with the data if the peer matches an ACL. Or even nicer, the requesting peer might have to "prove" (via some handshake protocol) that they are allowed to view the data. These are simpler fixes to access control that might go a long way to providing some level of security with reduced development overhead early on?

@carsonfarmer you might be interested in what I talked about in this similar theme: #75.

There's a ton of different ways to do authentication / security / privacy, but if you focus on the simple need to either provide or deny content, you could probably do it using a simple API that gets called before serving a CID. The API simply returns true or false when provided a CID and a list of headers that were passed in with the request. From there it would be up to the node operators to decide how to return true / false. IMO, the less opinionated here, the better.

I recognize that not all needs will be solved by something like that, but I have a feeling it would be a decent starting point.
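A bare-bones sketch of what such a hook could look like, assuming a hook is just a function from (CID, headers) to a boolean that the node calls before serving. `Node`, `token_hook`, the `authorization` header name, and the placeholder CIDs are hypothetical; nothing here corresponds to an existing go-ipfs interface.

```python
from typing import Callable, Dict, Mapping, Optional, Set

# The whole contract: given a CID and request headers, answer yes or no.
AuthHook = Callable[[str, Mapping[str, str]], bool]


def allow_all(cid: str, headers: Mapping[str, str]) -> bool:
    """Default policy: behave like a node with no access control."""
    return True


def token_hook(grants: Dict[str, Set[str]]) -> AuthHook:
    """One possible operator policy: per-CID sets of accepted tokens."""
    def hook(cid: str, headers: Mapping[str, str]) -> bool:
        return headers.get("authorization", "") in grants.get(cid, set())
    return hook


class Node:
    def __init__(self, blocks: Dict[str, bytes], hook: AuthHook = allow_all):
        self.blocks = blocks
        self.hook = hook

    def serve(self, cid: str, headers: Mapping[str, str]) -> Optional[bytes]:
        # The hook is consulted before the blockstore is touched.
        if not self.hook(cid, headers):
            return None
        return self.blocks.get(cid)


node = Node({"cid-A": b"private bytes"}, token_hook({"cid-A": {"token-1"}}))
```

Since the node only sees a boolean, operators could back the hook with tokens, signatures, or an external service without the node needing to understand any of them.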

@obo20 +100 That is a great idea, largely because it's very flexible, but also very simple. It would likely be better than what we had in mind as well, because it would let us support post-quantum signatures before ipfs itself understands them.

It might result in fragmentation, but also enables more experimentation which is arguably what we need right now. Well done.

commented

Indeed. This is increasingly my preferred form of "access control". I've actually implemented just this very form of access control in some experimental work using IPLD for data exchange. But there is actually no need for two API calls if your request for a CID can support headers or handshakes already. It works great, and the opaqueness of the responses means you don't leak any metadata about if you actually have the data or not, which is useful/important! So yes, good call for sure.
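To illustrate the opaqueness point, here is a small sketch in which the credentials travel with the CID request and an unauthorized request gets exactly the same response as a request for data the node does not have. The `Response` type, the `x-access-token` header, and the rule that blocks without a configured token are public are assumptions for the example, not a description of the experimental work mentioned above.

```python
import hmac
from dataclasses import dataclass
from typing import Dict, Mapping, Optional


@dataclass(frozen=True)
class Response:
    found: bool
    data: Optional[bytes] = None


# One shared value for "missing" and "denied", so the two cases are identical.
NOT_AVAILABLE = Response(found=False)


def handle_request(
    cid: str,
    headers: Mapping[str, str],
    blocks: Dict[str, bytes],
    tokens: Dict[str, str],
) -> Response:
    """Single request: the CID and its credentials arrive together."""
    data = blocks.get(cid)
    expected = tokens.get(cid)
    # Assumption: a block with no configured token is treated as public.
    authorized = expected is None or hmac.compare_digest(
        headers.get("x-access-token", ""), expected
    )
    if data is None or not authorized:
        return NOT_AVAILABLE
    return Response(found=True, data=data)


blocks = {"cid-A": b"secret bytes"}
tokens = {"cid-A": "token-1"}
```

A requester without the token cannot tell from the response whether this node holds `cid-A` at all. Timing differences between the two paths are not addressed in this sketch.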

I think this is actually quite in line with some of the ideas in the original IPFS/BitSwap paper around BitSwap strategies. Certainly it applies to Gateways and direct peer request, but also in normal BitSwap requests. Making BitSwap and things more "pluggable" in this regard could go a long way here.

Nice, totally agree on that point @ianopolous. By the way, @textileio are big fans of cryptree (I think @fission-suite also uses cryptrees).

I started writing a reply clarifying where UCANs fit, but it veered into the weeds (you can still find that post here if interested). TL;DR — UCANs are very general, but for this topic they're aimed at the mutable portion of the stack (IPNS, DNSLink, &c), though we can piggyback and do more checks in one go. Our whitepaper is in a constant state of updating, but more about our approach can be found there.

Mainly wanted to pipe up and say:

  • Cryptrees are awesome
    • Would love to see them adopted more widely / or even directly part of IPFS
    • Various implementations have issues leaking metadata; we think we fixed that
  • Whatever the solution, ideally it maintains a diversity of approaches at higher layers

New Year Call

We'd love to help organize and/or host a call in the new year on this topic 2️⃣0️⃣2️⃣1️⃣🎉 i.e. Cover things like what has everyone built to scratch this itch? Use cases, requirements gathering, alignment, and so on. Thoughts?

@expede It's great to hear someone else saying this! We pioneered the use of cryptree in ipfs for @Peergos (We started working on our implementation in 2013 and had it mostly finished 3 years ago). We have gone to great lengths to protect metadata. In our design the following is hidden:

  • file/dir properties (like name, modification time, etc)
  • the size of the name
  • file sizes
  • whether something is a directory or file
  • who or how many people have been granted access to a file/dir
  • directory topology
  • number of files
  • number of directories
  • number of files + number of directories

As well as all this, it supports efficient and fast modification of arbitrarily large files, zero IO seeking within a large file, and is safe from exposure by a quantum computer.

We've presented this in many talks over the years, but happy to give a deeper dive into our latest design.

commented

A call in January sounds great to hear what everyone has encountered or built in terms of defining security properties they're making use of, and in terms of the ones they'd like to be able to have/enforce.

To throw out a potential structure:

  • 5-10min presentations of what people have built
  • 2-5min per participant laying out what properties would be valuable, and where they'd like to draw lines for security
  • group discussion reconciling and identifying engineering subsystems that would be most useful in addressing these properties.

Throwing out an arbitrary time slot when things ramp up after the holidays, any objections to Wednesday, Jan 13th 4pm-6pm UTC?

That works well for me! @bmann went ahead and made an RSVP-able event linked to our Zoom room https://talk.fission.codes/t/ipfs-ipld-security-encryption-workshop/1319

A call sounds great! Would love to come on and present dag-jose that me and @carsonfarmer have been working on.

I would love to attend as well!

commented

Excellent, count me in as well.

Would it be possible to record and publish this call? I would be very curious to watch, but would not have a lot to contribute by joining directly.

I'd be happy to attend this as well

commented

@nukemandan Yep, can't imagine it'll be a problem to get it recorded and available.

@expede - Thanks for the event link. I don't see an attached Zoom room yet when I added it to my calendar, though? I'm happy to offer one of the IPFS Zoom rooms, which is probably the path of least resistance to having the recording end up on YouTube on the IPFS channel and such.

commented

Here's a quick doc to sign up in advance, so we can organize getting through the existing mechanisms and systems.

https://hackmd.io/@Qa2ngsClRbioWCFao_pO9A/ByZRwVS3P

@willscott yeah I have the zoom setup but if PL / IPFS can use their Zoom and publish — great. Just wanted to lock in the date.

I'd like to attend too, can give a quick overview of dag-cose and collect feedback for the next iteration!

Awesome proposal! I'd love to attend the call in Jan! 💯

commented

The URL for us to use next week (also now referenced in the hackpad & on the fission.codes page) is https://app.veertly.com/v/ipfs-ipld-security-encryption-workshop

commented

I updated the previous URL, but our events team let me know they'd prefer for us to use https://app.veertly.com/v/ipfs-ipld-security-encryption-workshop tomorrow. It makes the lightning talks nicer to record on their side. We'll do those recorded (modulo anyone requesting to opt out :)), and then use a shared jitsi in that same platform for the discussion portion.

@willscott Would it be beneficial to create a github issue somewhere for the per-cid api we discussed during the call? I'd like to make sure that conversation progresses, as it seemed quite a few different parties were interested in this type of functionality within IPFS.

(Is there a recording of the call somewhere? :) )

commented

@willscott Would it be beneficial to create a github issue somewhere for the per-cid api we discussed during the call? I'd like to make sure that conversation progresses, as it seemed quite a few different parties were interested in this type of functionality within IPFS.

Yes, I think so. cc @aschmahmann
An issue in go-ipfs is probably the right place to design and build an initial implementation.

(Is there a recording of the call somewhere? :) )

We're getting it processed. I'll post here once the videos are online.

Yep, I think go-ipfs is a good place to start. As the proposal matures we will probably want to PR a spec into the specs repo.

commented

Recordings from last week are available at https://embed.voodfy.com/60022b90fff4f56e99cb3197

We've also mirrored the lightning talks in Peergos
