micropython / micropython

MicroPython - a lean and efficient Python implementation for microcontrollers and constrained systems

Home Page: https://micropython.org


RFC: Standard Python modules git repo

pfalcon opened this issue · comments

So, I have voted many times not to burden uPy with an extensive "standard lib", and to leave that to the community to produce as distributedly-maintained modules.

And yet Python has a stdlib, which includes some core, foundational modules written in Python. It may make sense to have a "common" repository just for such modules.

To stay with the "do not burden" policy, I'd propose to create a separate git repository for them. Besides paradigmatic reasons, there's also a pragmatic one - I still expect those to be treated as individual modules, to be installed by a package manager. That manager will then need to fetch the git repo, and the main micropython repo is already too big.

So, I propose to create something like "micropython-lib". The criterion for inclusion should be availability of the module in the CPython stdlib.

Initial proposed content:

  1. I have a very minimal unittest.TestCase implementation.
  2. I have subsets of the Unix os and fcntl modules implemented using the ffi module.

It would be nice to have tests too. I don't mind including more stuff, as long as the uPy core itself can stay as lean as possible.
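
For a sense of scale, here is a rough sketch of the kind of minimal unittest.TestCase subset meant in point 1 above - illustrative only, not the actual implementation:

class TestCase:
    # Bare-minimum assertion methods mirroring CPython's unittest names.
    def assertEqual(self, x, y):
        assert x == y, "%r != %r" % (x, y)

    def assertTrue(self, x):
        assert x, "%r is not true" % x

    def assertRaises(self, exc, func, *args):
        try:
            func(*args)
        except exc:
            return
        assert False, "%r not raised" % exc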

It might actually be a good idea to split the current repository up a bit, since people who want to use uPy on the PC don't really care about all the stm stuff (and parts of it come with some strange license...).

We could have: micropython (for the core implementation), micropython-unix, micropython-stm, micropython-lib.

IMHO, having the core implementation without runnable reference platforms is not very useful, and both unix and stm are such platforms. The only reason for splitting might indeed be licensing issues - but IMHO that can wait until they become a real problem or there's nothing else to do.

Anyway, I didn't intend to propose splitting existing stuff - just creating a new repo for new stuff. Do you think it makes sense to have "micropython-lib", and could you please create it then? Well, I guess I actually can prototype it in my account; I just want to be sure the idea of a separate repo for Py lib code is ok.

A split always has the danger of fragmentation. I think we would need something more integrative rather than a "bunch of git repos" on GitHub. Python's pip does this, Ruby's Gems do it, even LaTeX has its own repository, let alone Emacs ;)

The problem with uPy is the fact that the core itself is interchangeable between stm, unix, pic, etc. I believe we need a master plan for how to put an umbrella over all this, to avoid people developing and implementing the same stuff in different locations.

The only project I know of that fights a similar fight (on a much larger scale) is the Linux kernel. Could we adapt / learn from there? Having an arch folder containing unix, stm, etc.? A lib folder for officially supported libs (which again might need an arch structure, e.g. /lib/generic, /lib/stm, /lib/unix, etc.) to differentiate between different architectures.

Thinking about it more, there is also a need to further differentiate between different hardware platforms:
/arch/stm/generic
/arch/stm/pyboard
/arch/stm/discovery

Those directories could just be stubs containing nothing more than some makefile instructions or some pin-mapping, or in the easiest case a link back to another folder if it is for now identical to it. However, it would need a build system which puts all this in the right place: 'make pyboard' should result in a set of libs and a uPy build for the pyboard.

Just wondering: if a single repo is ok for something like the Linux kernel, why should uPy be split across several repos?

I have to admit that I do not know exactly how the Linux kernel splits between generic code and platform-dependent code. But it looks like they found a way ;)

Just my two cents

Torsten


IMHO, having the core implementation without runnable reference platforms is not very useful

In the core there would be a minimal unix version, without readline, file I/O or anything. Then the full-blown unix version would be in micropython-unix.

Thinking about it more, there is also a need to further differentiate between different hardware platforms

This already exists; see stmhal/boards.

If the micropython-lib stuff is only minimal at the moment, why not just put it in the current repo? Then let it grow, see how it evolves with everything else, and split things off later when we have a clearer idea of the structure.

I would vote for 1 big repo, or many small ones.

@torwag
I'd like to ask to keep the discussion focused. This ticket specifically talks about creating a new repo for Python library source, nothing more. I understand that argumentation for dropping everything into one enormous repo may need to pull in some outside example, but still, let's keep it focused. If you would like to bring up more generic questions, let's use the forum, which is much better suited for discussions.

Answering some points:

more integrative rather than a "bunch of git repos" on GitHub. Python's pip does this

Can't agree with the epithets - PyPI (because pip is just a package manager) does it in a distributive rather than integrative way.

The only project I know of that fights a similar fight (on a much larger scale) is the Linux kernel. Could we adapt / learn from there?

Some would argue that few good things can be learned from the Linux kernel project ;-).

Having an arch folder containing unix, stm, etc.?

If you looked at the source tree, there are already "stm", "unix", etc. directories. So, is your only proposal to move them under another dir, "arch", just because the Linux kernel has it that way? That's nitpicking on directory structure, sorry.

A lib folder for officially supported libs (which again might need an arch structure, e.g. /lib/generic, /lib/stm, /lib/unix, etc.)

"lib" is too broad a term, so it's hard to respond something. For example, my current proposals talks only about Python code which corresponds to CPython's standard lib. That was never supported by MicroPython so far, and I'd like to keep it along that way. (Well, it will be "supported", as it would stay under micropython org is it's accepted, but supported in a different way from main C source and modules).

Thinking about it more

Here we get into the completely unrelated matter of how to manage/maintain ports. Besides being unrelated, it was already discussed - there should even be open tickets.

Just wondering: if a single repo is ok for something like the Linux kernel, why should uPy be split across several repos?

What makes you think that a single repo is ok for the Linux kernel? The kernel has thousands and thousands of forks, so they definitely can't get along with a single repo. Speaking about dir structure, the first thing you will notice about the arch/ folders is that they regularly get removed. Some time ago, there was a threat from Torvalds that he'd remove arch/arm/ unless the mess there was cleaned up. That's because having arch/arm/ didn't help with "people starting developing and implement the same stuff at different locations" - in each individual subfolder of it, people did just that. Now there's stuff like "device tree" files - initially they were dumped into the main tree, then came ideas that they don't belong there and should be split out. Etc., etc. So, the impression that the Linux kernel is ok with a single repo is superficial. The only way to get that impression is by downloading 50+MB release tarballs again and again over a modem line, or fetching an enormous git repo over a flaky connection - so it fails in the middle, and you need to restart from scratch, ad infinitum. No, wait, then you're unlikely to think that they're "ok" with a single repo! ;-)

Just wondering: if a single repo is ok for something like the Linux kernel, why should uPy be split across several repos?

And another variant of the response: I personally don't think that uPy should be split across several repos. This ticket just proposes to open a new front of work - to implement/collect some subset of the Python standard library. This was never (well, so far) in the scope of the uPy project, and I propose a separate repo to reinforce its "extension" status. Having it in the main repo puts a burden on everyone - on the maintainer (he will be blamed if this "lib" sucks), on the contributors to this lib (they will be bashed by the maintainer to provide high-quality code), on users (they will need to fetch a rather big git repo, 90% of whose content is of no use to them), etc., etc.

I would vote for 1 big repo, or many small ones.

I'd vote for sustainably minimal maintenance overhead. Let's see how that applies here. There are ideas about splitting the repo. But splitting is always a pain, so defer it until unavoidable. However, use this usecase to minimize future work - if there are good reasons for some new part to be separate, let's put it in a separate place. The previous comment summarizes why I think putting it in the main repo is not ideal. One argument I forgot - having it in the main repo "forces" one to think that the "lib" being discussed is an integral, inalienable part of uPy, and that's also what I'd like to avoid - it's just a particular implementation (which is also intended to be installed as separate fine-grained modules, not as a big "stdlib" which always goes along with the interpreter, like CPython has it).

having it in the main repo "forces" one to think that the "lib" being discussed is an integral, inalienable part of uPy

I think this is the strongest argument for a separate repo: to keep issues and pull requests contained to their respective places. I can safely ignore the issues in the lib repo, since they have (mostly) nothing to do with the stmhal port (which at the moment is the focus).

Also, unix/ and stmhal/ progress together with py/, so it makes sense to have git hashes apply to a snapshot in time when these 3 components all compiled cleanly together. On the other hand, the libraries can evolve separately, at a separate pace.

New repo made.

Not to further beat a dead horse (since you've already made the repo), but another reason for keeping the stdlib in a different repo is that it would contain mostly (all?) pure Python code, and contributors to one would not necessarily be contributors to the other. And security access granularity is at the repo level.

@dpgeorge: Thanks! I'm glad you agree it makes sense. I actually don't know if you will want/need to host/ship Python files for the KS delivery, but I assume you would do what's needed anyway. And I'm trying to think about the wider scope - actually, I have been pondering it for a couple of months, and now that you talk about being able to run existing code, and I have a bit more free time this week, I figured it was worth a try. The success of this approach depends on the ability to let users easily install modules from the lib, and I made a nudge in that direction too.

Ok, one final question to consider is the dir structure of micropython-lib. Having all platform-independent module files in one dir, and platform-dependent ones in plat-* (the CPython convention), is the obvious choice, but if we want to package them for PyPI, that's not going to work - it seems that setuptools requires package files to be in a separate dir (at a minimum, because there should be a "setup.py" for each package).

So, I see no better choice than to follow this requirement, so it would be:

unittest/
  unittest.py
  setup.py
os-unix/
  os.py
  setup.py

etc. The only other choice is to have a "flat" structure, and a separate subdir for setuptools packaging, with symlinks. But that's only more complicated and confusing IMHO.
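
For illustration, the setup.py in each such dir would be tiny - something along these lines (a minimal sketch using plain distutils; the metadata here is made up, not the actual micropython-lib file):

from distutils.core import setup

# Hypothetical packaging stub for the unittest/ dir sketched above.
setup(
    name="micropython-unittest",
    version="0.0.1",
    description="unittest module for MicroPython",
    py_modules=["unittest"],
)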

Anyway, if someone has bright ideas, please speak up - renames are possible, but not really cheap even with git (git log stops at renames by default). In particular, I never had much experience with Python packaging stuff, so I may be missing something.

Well, nope, os-unix is not the right way to do it. The whole idea of installable packages is that they should support different platforms - with a single package. But I don't know how to do that right now. We essentially need to support cross-installs: for example, running on unix, install the os package variant for baremetal. I very much doubt pip/setuptools support that.

I have no good suggestions. Just start with a flat structure so we can get to work?

Nope, that's too low an aim - we'll have modules installable from PyPI before the break of day ;-).

Hey,

sorry, I simply got a bit confused with all the other package stuff going on. As for the Linux project: I guess they get along not too badly, and hence I was proposing to see what they do right and what could be "ported" to MicroPython. It would be great to have a wiki page explaining/discussing a bit all those different libs, packages, boards, architectures, repos, etc., and how to put them all together. It is hard to follow across several tickets and forum posts.

So no bad feelings - it was not my intention to hijack this ticket.


@torwag: Sure, your comments are appreciated. But we indeed need to do this piecewise, otherwise "hard" tasks will be put off further and further. (And I wouldn't like this particular issue to be seen as forced - for me it's the result of a couple of months of pondering, based on discussions here and there.)

Regarding the wiki: currently all this stuff is in flux, so a wiki will require a lot of maintenance to be useful and not become yet another source of confusion. You're welcome to create and maintain pages you think are useful. But I'd still say the forum is the better place for discussions and for bringing up "generic" topics - a typical thread's lifetime is longer than a ticket's, and it's "self-maintained" as long as participants post new info. IMHO, we underuse the forum, so I'd welcome opening new topics (I can't open them all myself though! ;-) )

Ok, I don't want to create another ticket, so I'm reopening this one.

If we talk about running existing code, it's unrealistic to think both: 1) that somebody will write all the needed modules from scratch; 2) that it will be possible to run modules from the CPython stdlib as is.

So we'll need to "steal" modules from the CPy stdlib, patch them, etc. That's another reason I wanted a separate repo - to be more comfortable about the license zoo.

Comments?

Ok, no comments; I assume nobody has better ideas ;-). I'm going to push patched types and copied modules then.

In that regard, I tried to consider @dpgeorge's initiative to switch to Python 3.4 as the reference, but right in types.py it has code additions that are (hopefully) useless to us. So, I'm using 3.3.3 as the reference instead.

Yes, sorry, no good ideas at this point.

I thought about a better organization of the repo, and had to rebase it. If you checked it out previously, please re-check it out.

Now there's a vendor branch, cpython-3.3.3, which tracks pristine files imported from the CPy stdlib. All new files are added to it, and then merged to the master branch, where the actual changes happen.

This will make it possible to see all the changes done for uPy, and help port them to a new upstream library version. Actually, to make that truly organized, there should be an upstream branch, a branch for patching upstream stuff, a branch for developing stuff specifically for uPy, and then of course an integration branch. But who would manage all that? ;-)

Please see my revelations regarding namespace packages here: #298 (comment)

The usecase with which I came to it was actually not the "http" package (yet), but "collections". Let me start by saying that CPython's collections package is pure bloat - it has most stuff in a 40K __init__.py, so it has a hard time even being compiled in the default 128K heap (and I'd really like to think about MCU usage for core classes like those in collections). Then, it uses metaclasses, etc. So, we want our own implementation (or maybe to pick up some classes from older versions of CPy!).

Ok, but then we don't want to have everything in one file; we want to install it in a fine-grained way - for example, if I need just defaultdict, I should be able to pull in just that. I first wanted to achieve that by putting each class into its own package, so one would use "from defaultdict import defaultdict", with "collections" being just an umbrella package which would re-export the individual modules.

But then I figured that this would be just the usecase for namespace packages. Unfortunately, I made a thinko - what I wanted to achieve is per-class separation, but the unit of installation and import is the module. So, I ended up with "from collections.defaultdict import defaultdict" instead of the expected "from collections import defaultdict". That's not much different from "from defaultdict import defaultdict", only longer, so I'm not sure it was worth doing in the first place ;-I.

But how do we get compliant while staying unbloated? We need to do something in collections/__init__.py. Unfortunately, Python doesn't have something like "import all symbols from all submodules" (that would be something like "from .* import *"), so to support any set of specific preinstalled modules, we would need to enumerate all the possibilities, wrapped in "try/except ImportError".
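
For example, a sketch of what such an umbrella collections/__init__.py could look like (the submodule names here are illustrative):

# Re-export whichever fine-grained submodules happen to be installed,
# silently skipping the ones that aren't.
try:
    from .defaultdict import defaultdict
except ImportError:
    pass

try:
    from .deque import deque
except ImportError:
    pass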

Ideas/comments? ;-)

So, what's written in the last para would be the "micropython-collections" distpackage, with the meaning "provide a stdlib-compliant interface to any other installed micropython-collections.* packages". All the needed packages would need to be installed separately, e.g. "pip-micropython micropython-collections.defaultdict micropython-collections".

Then there would be "micropython-collections-all", which would pull in the complete micropython-collections* set in one go.
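
Assuming setuptools-style dependencies work for this, the "-all" package could be just an empty metapackage whose only job is to depend on the individual pieces - a hypothetical sketch:

from setuptools import setup

# Hypothetical "pull everything" metapackage: no code of its own,
# only dependencies on the fine-grained micropython-collections pieces.
setup(
    name="micropython-collections-all",
    version="0.0.1",
    install_requires=[
        "micropython-collections",
        "micropython-collections.defaultdict",
    ],
)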

It's a tricky issue to solve - especially while staying compliant.
I have my own non-compliant solution, which requires an extra step: the python-flavin module, which uses vulture and rope to scan all import dependencies, tracks which functions and variables get used, and then copies them into a fresh local dir to import from. It thereby avoids bloat, because unreferenced Python functions are not copied and therefore take up no room.

The disadvantage is the extra, non-standard-compliant copy ops and the two-step workflow - especially if you have to go back and debug (ideally in the original, not the stripped copy).

But I wonder if we can amalgamate this idea into a directive which does this at 'compile/assembly' time, so it would be more transparent and less hacky? Especially when building for the pyboard, where space is at a premium.

Example:
As an import file:

import collections
if __name__ == '__main__':
    # OrderedDict takes a single iterable of key/value pairs.
    a = collections.OrderedDict([('first', [1, 2, 3]),
                                 ('sec', [4, 5, 6])])
    print a

processing:

Found import collections as None
Found import _abcoll as None
Found import sys as _sys
Found import heapq as _heapq
Found import sys as None
Found import bisect as None
Import paths:
 C:\Python27\lib\collections.py
 C:\Python27\lib\_abcoll.py
 C:\Python27\lib\heapq.py
 C:\Python27\lib\bisect.py
C:\Python27\lib\_abcoll.py:196: Unused function 'isdisjoint'
C:\Python27\lib\_abcoll.py:226: Unused function '_hash'
C:\Python27\lib\_abcoll.py:273: Unused function 'remove'
C:\Python27\lib\_abcoll.py:580: Unused function 'reverse'
C:\Python27\lib\bisect.py:47: Unused function 'insort_left'
C:\Python27\lib\bisect.py:67: Unused function 'bisect_left'
C:\Python27\lib\collections.py:115: Unused function 'iterkeys'
C:\Python27\lib\collections.py:149: Unused function 'setdefault'
C:\Python27\lib\collections.py:194: Unused function 'fromkeys'
C:\Python27\lib\collections.py:220: Unused function 'viewkeys'
C:\Python27\lib\collections.py:224: Unused function 'viewvalues'
C:\Python27\lib\collections.py:228: Unused function 'viewitems'
C:\Python27\lib\collections.py:237: Unused function 'namedtuple'
C:\Python27\lib\collections.py:292: Unused variable 'numfields'
C:\Python27\lib\collections.py:294: Unused variable 'reprtxt'
C:\Python27\lib\collections.py:345: Unused attribute '__module__'
C:\Python27\lib\collections.py:439: Unused function 'elements'
C:\Python27\lib\heapq.py:141: Unused function 'heappush'
C:\Python27\lib\heapq.py:323: Unused function 'merge'

Cleaning up import files
Unused:
['isdisjoint', '_hash', 'remove', 'reverse', 'insort_left', 'bisect_left', 'iterkeys', 'setdefault',
 'fromkeys', 'viewkeys', 'viewvalues', 'viewitems', 'namedtuple', 'numfields',
 'reprtxt', '__module__', 'elements', 'heappush', 'merge']
Resulting in:
  ('collections_stripped.py', '_abcoll_stripped.py',
     'heapq_stripped.py', 'bisect_stripped.py')
Minimal master is: test_collections_minimal.py

which produces a Python file with the suffix '_minimal' and (soon) stripped import files suffixed with '_stripped', which are imported via renamed imports in the 'minimal' file.

@Neon22:

I have my own non-compliant solution, which requires an extra step: the python-flavin module...
But I wonder if we can amalgamate this idea into a directive which does this at 'compile/assembly' time, so it would be more transparent and less hacky?

It boils down to a question: a) do we want to reuse standard Python tools (pip), or b) do we want to write our own package manager? If we go path b), we can add any extra neat features, etc. But who exactly will start such a project, and who and when will finish it? I personally think that path b) is not viable, and then we'd need to keep tools like python-flavin separate from the basic package installation process.

But as I mentioned, I consider python-flavin a neat tool, and I'm glad to hear you keep working on it! Can you please consider opening a thread about it in the http://forum.micropython.org/viewforum.php?f=5 subforum? (Ditto for your other uPy-related projects.)

Now there's a vendor branch, cpython-3.3.3, which tracks pristine files imported from the CPy stdlib. All new files are added to it, and then merged to the master branch, where the actual changes happen.

I'm afraid this experiment didn't work - I had to do a few more rebases, and it was quite a chore with the branches and merges. Now I find that I just forgot to add the CPy files on the corresponding branch. As nobody has stepped in yet to help with micropython-lib, I'm going to optimize my efforts.

I've now opened a forum thread for this; let's continue the discussion there: http://forum.micropython.org/viewtopic.php?f=5&t=70

Hello! I am a beginner with a few basic questions. Excuse me, does MicroPython support the Windows 7 platform? And is there a MicroPython demo in the directory below?

Just noticed this hit PyPI. I'm curious (and will look into it) how much of this module is "dummy" and how much is useful "runtime information".