Next Steps
ntninja opened this issue · comments
Since #1 is clogged with so many comments, I'm opening a new issue here. Feel free to continue the discussion below, and I'll keep the following updated as things develop. Also feel free to create separate issues / repos to coordinate, and I'll add the relevant links below.
Next steps (networking, stalled – please see the “storage” section below):
- We could use some documentation for the `py-libp2p` library
- See issue libp2p/py-libp2p#35 for status
- Current work: libp2p/py-libp2p#330
- Documentation for the Bitswap protocol would be very useful as well – and not just for py-ipfs
- See issue ipfs/js-ipfs-bitswap#21 for status
- Currently available information:
- Finally, some documentation on the currently used DHT would be nice
- See https://github.com/ipfs/camp/blob/master/DEEP_DIVES/02-scaling-up-the-dht.md for some relevant links and current developments
- Implement the Bitswap protocol in Python 3
- Current `py-ipfs-bitswap` library: https://github.com/AliabbasMerchant/py-ipfs-bitswap
- Current status/discussion: AliabbasMerchant/py-bitswap#1
- → Having this would allow us to implement the `ipfs block *` API for fetching blocks of nodes we are connected to; fetching blocks of non-connected nodes needs the DHT.
- To interact with go-IPFS you can start it with `ipfs daemon --disable-transport-encryption`, but note that you will not be able to connect to any regular peers until one of the transport encryption methods is implemented
- (Low priority) Improve the `multistream-select` code of `py-libp2p` to support actually dialing other nodes (MOSTLY FIXED UPSTREAM; `ls` is still missing and an `mss-nc` implementation could still be useful)
- Write an `mss-nc`-like utility on top of this code to demonstrate that you are able to connect to `go-ipfs` nodes and negotiate
- Here's some very simple sample code demonstrating the main mode of MSS:

    import socket

    s = socket.socket(socket.AF_INET)
    s.connect(("127.0.0.1", 4001))  # In real code the connection will already exist
    s.sendall(b"\x13/multistream/1.0.0\n")  # Send your supported version of MSS
    s.recv(1024)  # → b'\x13/multistream/1.0.0\n': MSS version supported by the other party (validate it!)
    s.sendall(b"\x0d/secio/1.0.0\n")  # Request the protocol you'd like to upgrade to
    s.recv(1024)  # → b'\x0d/secio/1.0.0\n…': confirmation that the protocol is available + protocol data
    #            OR → b'\x03na\n': protocol was Not Available

- The binary values at the start are varints and you need to read them byte-by-byte until you're done decoding them, then read the remainder of each message line based on the received length value. To do this you'll need to create an async stream based version of https://github.com/fmoo/python-varint/blob/master/varint.py
- Additionally there is also a special `ls` mode in which MSS will return a list of supported protocols, see https://github.com/multiformats/multistream-select/blob/master/README.md for the complete spec
- Subtask: Figure out how to actually dial a node using py-libp2p and document this using an example program.
- Implement SecIO libp2p transport security (easier than TLS, but will be phased out eventually)
- You'll need to coordinate with the py-libp2p guys on how exactly transport security modules are added
- Some background with crypto is highly recommended!
- Implement DHT peer lookups in `libp2p`
- Currently available documentation: https://github.com/libp2p/specs/tree/8b89dc2521b48bf6edab7c93e8129156a7f5f02c/kad-dht
- Go Implementation: https://github.com/libp2p/go-libp2p-kad-dht
- JavaScript Implementation: https://github.com/libp2p/js-libp2p-kad-dht
- For the current status see: libp2p/py-libp2p#150 (also see the referenced PRs on that issue)
- Interesting PRs: #129, #153, #157
- Contacts: @alexh @zaibon @zixuanzh
- (Stretch goal) Factor out https://github.com/zixuanzh/py-libp2p/tree/master/protocol_muxer into a separate `py-multistream-select` library and update `py-libp2p` to use it (Easy!, stalled, needs your help!)
- Current `py-multistream-select` library: https://github.com/dheatovwil/py-multistream-select
- Current status: libp2p/py-libp2p#101
- (Stretch goal) Implement TLSv1.3 libp2p transport security
- You'll need to coordinate with the py-libp2p guys on how exactly transport security modules are added
- Some background with crypto/X.509/TLS is highly recommended!
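
The varint note in the multistream-select item above can be sketched as follows. This is only a minimal illustration, not the py-libp2p code: it assumes a reader object with an awaitable `readexactly(n)` method (here `asyncio.StreamReader` from the standard library, purely so the example runs without extra dependencies; the stack itself uses trio), and the names `encode_varint`, `read_varint`, `read_message` and `demo` are hypothetical. The 9-byte cap is a sanity limit chosen for this sketch.

```python
import asyncio


def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as an unsigned varint (7 bits per byte)."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        # Set the continuation bit on every byte except the last one
        out.append(byte | (0x80 if value else 0))
        if not value:
            return bytes(out)


async def read_varint(reader, max_bytes: int = 9) -> int:
    """Read an unsigned varint one byte at a time from an async stream."""
    result = 0
    shift = 0
    for _ in range(max_bytes):
        (byte,) = await reader.readexactly(1)
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:  # continuation bit cleared: this was the last byte
            return result
        shift += 7
    raise ValueError("varint is longer than max_bytes")


async def read_message(reader) -> bytes:
    """Read one length-prefixed, newline-terminated MSS message."""
    length = await read_varint(reader)
    line = await reader.readexactly(length)
    if not line.endswith(b"\n"):
        raise ValueError("message is not newline-terminated")
    return line[:-1]


async def demo() -> list:
    # Feed the two replies from the sample code into an in-memory stream
    reader = asyncio.StreamReader()
    reader.feed_data(b"\x13/multistream/1.0.0\n\x03na\n")
    reader.feed_eof()
    return [await read_message(reader), await read_message(reader)]


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Reading the length first and then calling `readexactly(length)` avoids relying on `recv(1024)` happening to return exactly one message, which the simple socket sample above does.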
Next steps (storage, simpler):
- Port https://github.com/ipfs/py-datastore to Python 3 (Suggestion: Use Python's `lib2to3` and just drop Python 2 entirely.)
- Current port: https://github.com/dheatovwil/datastore
- Convert datastore to use async/await using some library (maybe https://pypi.org/project/aiofiles/ ?) for file access
- The `trio` framework is used for async I/O now
- Implement a https://github.com/ipfs/go-ds-flatfs compatible backend for the above library
- Write a minimal `py-ipfs` “implementation” that can fetch blocks from the local `$IPFS_PATH` directory and expose them with an API similar to what https://github.com/ipfs/py-ipfs-http-client currently offers (goal here is to eventually have a drop-in replacement)
- In progress by @Alexander255 (no public code yet, most work happens in py-datastore)
- Implement a simple Python HTTP server that emulates the `block/{get,put,rm,stat}` API that serves blocks from the local `$IPFS_PATH` directory
- Recommendation: Use the `trio-quart` ASGI web microframework for this. (Whatever you choose will have to be compatible with trio, as that is the AIO framework used in the stack.)
- (Stretch goal) Implement a badgerds compatible backend for py-datastore
- There is an issue requesting Python bindings for the Go library, but no work has been done yet: dgraph-io/badger#984
- Beyond: Start integrating IPLD to expose the UnixFS files stored in those raw blocks…
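
For the flatfs-compatible backend item above, the key-to-path mapping could look roughly like this. It is a sketch under assumptions that should be checked against the go-ds-flatfs repository (and the `SHARDING` file of an actual repo): that blocks live in a `blocks` directory, that the sharding function is `next-to-last/2`, and that files carry a `.data` extension. How a CID is turned into the key string is deliberately left out, and the function names are hypothetical. The sketch is synchronous for brevity.

```python
import os
import tempfile


def shard_dir(key: str, suffix_len: int = 2) -> str:
    """Directory name under the assumed 'next-to-last/2' sharding scheme.

    Takes the `suffix_len` characters just before the last character of the
    key, padding short keys on the left with underscores.
    """
    padded = "_" * (suffix_len + 1) + key
    return padded[-(suffix_len + 1):-1]


def block_path(repo: str, key: str) -> str:
    """Path of the file that would hold the block stored under `key`."""
    return os.path.join(repo, "blocks", shard_dir(key), key + ".data")


def put_block(repo: str, key: str, data: bytes) -> None:
    """Write a block, creating the shard directory if needed."""
    path = block_path(repo, key)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "wb") as f:
        f.write(data)


def get_block(repo: str, key: str) -> bytes:
    """Read a block back from the repository directory."""
    with open(block_path(repo, key), "rb") as f:
        return f.read()


if __name__ == "__main__":
    # Use a throwaway directory instead of a real $IPFS_PATH
    with tempfile.TemporaryDirectory() as repo:
        put_block(repo, "CIQABCDEFXYZ", b"raw block bytes")
        print(block_path(repo, "CIQABCDEFXYZ"))
        print(get_block(repo, "CIQABCDEFXYZ"))
```

A real backend would also need to write atomically (for example via a temporary file and rename) and to be made async, neither of which is shown here.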
I've ported datastore to py3 (passed the existing test cases) at dheatovwil/datastore
@dheatovwil: I saw your message, but I didn't respond in text (the OP was updated though). Sorry for this! I've updated the OP to outline what I believe should be happening next in order for this to become useful in terms of `py-ipfs`. In particular I've put up the need to make `datastore` async next: While we don't really need this right now, fixing this later would be a pain in the ***, so let's do it now while we're breaking stuff anyways.
It's also much easier than it may sound: Start at the filesystem implementation and think about every place where we're currently doing a system call that may block (such as `open`, `read`, `write`, `recv`, `send`, `stat`, `fsync`, …) and replace each of these with their respective async equivalent (https://pypi.org/project/aiofiles/ will be needed for this). The replacement functions will now return coroutines however, so they will need to be prefixed with the `await` keyword and their surrounding function will have to be marked `async`. This in turn will make those functions return coroutines as well, so you'll need to do the same thing with each function that calls them. Once you're done updating each function in this cascade as well as all unit tests (same story there), you're done.
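
To illustrate the cascade, here is a small sketch. It is not datastore code: the function names are made up, and `asyncio.to_thread` from the standard library (Python 3.9+) stands in for an async file library such as aiofiles so that the example runs without extra dependencies.

```python
import asyncio
import os
import tempfile


def read_file_blocking(path: str) -> bytes:
    """Original style: open() and read() block the calling thread."""
    with open(path, "rb") as f:
        return f.read()


async def read_file(path: str) -> bytes:
    """Step 1: the blocking call is replaced by an awaitable equivalent."""
    return await asyncio.to_thread(read_file_blocking, path)


async def get(root: str, key: str) -> bytes:
    """Step 2: the caller must now be `async` and `await` the result."""
    return await read_file(os.path.join(root, key))


async def contains(root: str, key: str) -> bool:
    """Step 3: ...and so must every caller of that caller."""
    try:
        await get(root, key)
        return True
    except FileNotFoundError:
        return False


async def main() -> tuple:
    with tempfile.TemporaryDirectory() as root:
        # Test setup is written with a plain blocking call for brevity
        with open(os.path.join(root, "some-key"), "wb") as f:
            f.write(b"value")
        return (
            await get(root, "some-key"),
            await contains(root, "some-key"),
            await contains(root, "missing-key"),
        )


if __name__ == "__main__":
    print(asyncio.run(main()))
```

Unit tests change the same way: each test that calls `get` or `contains` becomes an `async` function driven by the event loop.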
@Alexander255 I would like to work on the bitswap implementation.
Although, I would require constant help, guidance and advice and the progress may be quite slow.
Resources I know of:
Specs: https://github.com/ipfs/specs/tree/master/bitswap
Go implementation: https://github.com/ipfs/go-bitswap
JS implementation: https://github.com/ipfs/js-ipfs-bitswap
I would work using a bottom-up approach.
@AliabbasMerchant: Any help is appreciated! Particularly on that front!
And I'll say it up front: It's not going to be easy. I'll help where I can, but you may have to do some reverse-engineering of the source code, and it is definitely going to involve some guesswork; there is no close-to-final spec and it shows (so ideally document any and all findings in whatever form while you're at it).
(One important thing I also realized while writing this reply, however, is that we do not actually have any implementation for transport security yet and hence all data will have to be sent as plain text; at the current stage of development I don't believe this is a problem however, as it allows for better debugging.)
The first thing required will be establishing a `libp2p` connection to a known peer on `localhost`: There is an example available for this. First negotiate for the `/plaintext/1.0.0` MSS protocol (that is: no encryption), then for `/ipfs/bitswap/1.0.0` (I think!). After this (with some luck) you should have established a bitswap connection. `go-ipfs` will however reject any attempt to establish an unencrypted connection unless you start it as `ipfs daemon --disable-transport-encryption`, so be aware.
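
A rough sketch of the negotiation step, using plain blocking sockets rather than py-libp2p: it only covers the MSS exchange (header, then one request per protocol, answered by an echo or `na`). Whether a real peer accepts `/ipfs/bitswap/1.0.0` directly after `/plaintext/1.0.0`, or expects further steps in between, is exactly the guesswork mentioned above and is not settled by this code. All helper names, including the `fake_peer` used to run the example without a daemon, are hypothetical.

```python
import socket
import threading

MSS = b"/multistream/1.0.0"


def frame(payload: bytes) -> bytes:
    """Return payload + newline, prefixed with its length as a varint."""
    body = payload + b"\n"
    n, prefix = len(body), bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        prefix.append(byte | (0x80 if n else 0))
        if not n:
            return bytes(prefix) + body


def recv_exactly(sock: socket.socket, n: int) -> bytes:
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the connection")
        buf += chunk
    return bytes(buf)


def read_message(sock: socket.socket) -> bytes:
    """Read one varint-prefixed message and strip the trailing newline."""
    length = shift = 0
    while True:
        byte = recv_exactly(sock, 1)[0]
        length |= (byte & 0x7F) << shift
        if not byte & 0x80:
            break
        shift += 7
    return recv_exactly(sock, length).rstrip(b"\n")


def handshake(sock: socket.socket) -> None:
    """Exchange and validate the multistream-select version header."""
    sock.sendall(frame(MSS))
    if read_message(sock) != MSS:
        raise ConnectionError("peer does not speak " + MSS.decode())


def select(sock: socket.socket, protocol: bytes) -> bool:
    """Request a protocol; the peer echoes it back or answers b'na'."""
    sock.sendall(frame(protocol))
    return read_message(sock) == protocol


def fake_peer(sock: socket.socket, supported: set) -> None:
    """Stand-in for the remote node so the sketch runs without a daemon."""
    sock.sendall(frame(MSS))
    if read_message(sock) != MSS:
        return
    while True:
        try:
            proto = read_message(sock)
        except ConnectionError:
            return
        sock.sendall(frame(proto if proto in supported else b"na"))


if __name__ == "__main__":
    ours, theirs = socket.socketpair()
    threading.Thread(target=fake_peer, args=(theirs, {b"/plaintext/1.0.0"}),
                     daemon=True).start()
    handshake(ours)
    print(select(ours, b"/plaintext/1.0.0"))
    print(select(ours, b"/ipfs/bitswap/1.0.0"))
    ours.close()
```

Against a real daemon, `ours` would instead be a TCP socket connected to the peer's swarm port (4001 in the sample code of the original post).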
Using this connection then, try sending a block request Protobuf message (see also the `protobuf` library), requesting a block you know exists in the remote server and dump any response packets you may receive.
It probably won't work exactly the way I described, but should be close. Please don't hesitate to ask any questions you may have and I'll try to answer them to the best of my abilities. 🙂
Sure.
Looks like a perfect task for me!
I will try my best to document everything that I find.
I will work here: https://github.com/AliabbasMerchant/py-ipfs-bitswap
@Alexander255
I noticed that IPFS has numerous repos for JS. Even some small pieces of code (no doubt important ones) have their own repo.
(For example: https://github.com/ipfs/js-ipfs-block-service, https://github.com/ipfs/js-ipfs-unixfs, https://github.com/ipfs/js-ipfs-block, https://github.com/ipfs/js-datastore-fs)
Do we want to do the same for Python, or should we put them all in this repo???
I need the python versions of some of the above for py-ipfs-bitswap. So I am making them.
Should I make new repos for them?
Also, 1 more thing.
What is the exact purpose of py-ipfs?
We all know python is slow. So we are definitely not trying to replace the go and js versions.
So, what is the exact goal??
If we know the goal, we can write code and documentation accordingly.
@AliabbasMerchant: Don't read too much into it, it's not uncommon in JS that every subroutine ends up in a different package. It's not uncommon to have packages such as `is-buffer` that then look like this (and I'm quoting a real module here):
module.exports = function isBuffer (obj) {
return obj != null && obj.constructor != null &&
typeof obj.constructor.isBuffer === 'function' && obj.constructor.isBuffer(obj)
}
(The end.) Apparently JS people like it that way and everybody does it there, so it's not surprising that JS-IPFS would do it too.
In Python it's more common that packages are written for groups of related functionality instead so packages are bigger but more versatile.
TL;DR: Never mind that JS has subpackages for everything, in Python everything Bitswap and up can be one package (py-ipfs).
BTW: `datastore-fs` already exists as part of https://github.com/dheatovwil/datastore but would benefit from being made async (no need to rewrite it from scratch).
> Also, 1 more thing.
> What is the exact purpose of py-ipfs?
> We all know python is slow. So we are definitely not trying to replace the go and js versions.
> So, what is the exact goal??
> If we know the goal, we can write code and documentation accordingly.
Python support will be very important. It's one of the most popular scripting languages after JavaScript and is used by numerous projects. To name just a few from my area of knowledge: The Blender 3D animation software and Godot game engine both use Python (named GDScript for the latter) for addons and development. With a native Python implementation of IPFS, the daemon could be included in such software, allowing it to directly work with files within the IPFS network... this is just one huge advantage I can immediately point out.
@AliabbasMerchant: Thank you for bringing up this important subject! I'll try to answer from my perspective, but do remember that different people have different visions of what this software should actually be. The discussions of #40 and #1 are good illustrations of this I think.
My vision for Py-IPFS:
A client-oriented Python library for accessing data from the IPFS network and caching it locally, that should be easy for any Python application to embed and ship. Basically a quick way to access (and by extension share to) the IPFS network from Pure-Python without running a full-blown daemon.
As for speed: Similar to Py-ETH, the main goal of Py-IPFS should be readability and ease of auditing, not so much raw throughput. I would expect things to be pretty fast anyways when run on PyPy with some minor optimizing done (CPython will always be slow, but it's not the only Python implementation thankfully), but it's not the main goal.
I'm interested in your vision, and that of others, as well however.
I have started with python bitswap.
Still under development, but please check it out and give your valuable feedback here: AliabbasMerchant/py-bitswap#1
/cc @Alexander255
Out of sheer morbid curiosity, has anyone thought about re-thinking this library as a light wrapper around the C++, Rust, or Go implementations instead? (Using C API, PyO3, or gopy)
Python is a great systems integration language, but a pure Python implementation seems like it'd be very slow and lagging behind the other compiled implementations with more funding.
Also, I am aware of the HTTP client library... just seemed like a direct integration with Python bindings might be safer with less overhead than communicating over HTTP. I've not had a great experience with the HTTP client either.
Is this still happening or is it completely abandoned?