RFC: Introducing "import- vs run- time" semantics mode to Python

Question

pfalcon opened this issue 6 years ago · comments

One of the biggest (performance) issues with Python is (my term) overdynamicity: the fact that many symbols in a program are looked up at runtime by symbolic name. This includes global variables and functions, module variables and functions, and object attributes and methods. (Almost the only exception is local function variables, which are optimized and accessed by "address", or more specifically by offset in the function's stack frame.)
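The difference between the two kinds of access can be observed in CPython bytecode. The following is a minimal sketch; the function name `measure` is made up for illustration, and exact opcode names vary between CPython versions (newer releases use specialized variants of `LOAD_FAST`).

```python
import dis

def measure(seq):
    # `len` is not a local name, so it is resolved by name at run time
    # (module globals first, then builtins); `seq` is a local variable
    # and is read from a slot in the function's frame.
    return len(seq)

# Collect every LOAD_* instruction together with the name it refers to.
loads = [
    (ins.opname, ins.argval)
    for ins in dis.get_instructions(measure)
    if ins.opname.startswith("LOAD_")
]

for opname, name in loads:
    print(opname, name)
```

On CPython, the listing is expected to contain a `LOAD_GLOBAL` for `len` and a `LOAD_FAST`-family instruction for `seq`.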

Such semantics allow overriding and customizing many aspects of the language, but at the same time lead to runtime inefficiency. However, the following are well-known facts:

1. The majority of applications never override symbols in other modules.
2. Of those which do, the majority do so once, at application startup (while "setting up the application environment").
3. The remainder are quite specialized applications, either belonging to tooling (test runners, profilers, etc.) or working around something instead of implementing or fixing it properly.

Formalized in terms of Python semantics, the following optimization approach can be proposed:

1. During import time, a module may modify the runtime environment (including overriding symbols in other modules).
2. At runtime, however, such modifications are not allowed.
3. These rules apply recursively to all modules comprising an application. That is, there is a clear "import-time" phase versus a "run-time" phase of the application lifetime. Note that this rules out runtime imports (imports do modify the runtime environment, but the environment should be settled by the time the run-time phase starts).
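The rules above can be approximated in today's Python to get a feel for the intended behavior. This is only a sketch under stated assumptions: `FrozenModule` and `freeze` are hypothetical names, not part of any existing API, and the check only covers attribute assignment on the module object (direct writes to the module's `__dict__` are not intercepted).

```python
import types

class FrozenModule(types.ModuleType):
    """Hypothetical module type that rejects rebinding once installed."""

    def __setattr__(self, name, value):
        raise RuntimeError(f"cannot rebind {self.__name__}.{name} at run time")

    def __delattr__(self, name):
        raise RuntimeError(f"cannot delete {self.__name__}.{name} at run time")

def freeze(module):
    # Swapping the class of a module object is supported by CPython for
    # subclasses of types.ModuleType.
    module.__class__ = FrozenModule

# "Import-time" phase: the environment may still be modified.
demo = types.ModuleType("demo")
demo.greet = lambda: "hello"
demo.greet = lambda: "patched"   # overriding is still allowed here

# "Run-time" phase starts: no further modifications are accepted.
freeze(demo)

error = None
try:
    demo.greet = lambda: "too late"
except RuntimeError as exc:
    error = str(exc)

print(demo.greet())  # the import-time override is what run-time code sees
print(error)
```

A real implementation of the proposal would not need such a run-time check at all; the point of the sketch is only to show where the boundary between the two phases lies.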

Note also that "import time" effectively corresponds to "compile time" in other languages. Indeed, cached bytecode files are produced during the import phase, by compiling source into bytecode. But with conventional Python semantics, compiled bytecode has an implicit "module initialization function". That is required to allow both conventional semantics and modularity. For example, module init code can (and indeed often does, per point 2 of the facts above) override symbols in other modules, so this has to be captured as imperative code. The proposed new semantics, however, effectively require executing module init code during import time and capturing its effects. As those effects can extend beyond the current module to the whole runtime environment, implementing the new semantics would require a whole-program approach.

Paul Sokolovsky · Answer 1 · Sat Jan 05 2019 18:15:17 GMT+0800 (China Standard Time)

From the above, it's clear which constraints are placed on the code:

1. All function and global variable definitions should be done in module init code.
2. All class definitions should be done in module init code.
3. All overriding of symbols in other modules should happen in module init code.

Note that "globals" is a particular case of a module namespace: "globals" is just the namespace of the current module, with a fallback to the "builtins" module.
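The relationship between "globals", the module namespace, and the builtins fallback can be checked directly. A small sketch, using a throwaway module object named `mod` (a made-up name for this illustration):

```python
import types

# Build a module by hand and define a function inside its namespace.
mod = types.ModuleType("mod")
exec("def size(seq):\n    return len(seq)\n", mod.__dict__)

# The function's "globals" are exactly the module's namespace.
same_namespace = mod.size.__globals__ is mod.__dict__

# `len` is not defined in the module, so the lookup falls back to builtins.
has_own_len = "len" in mod.__dict__
before = mod.size("abc")

# Defining `len` in the module namespace shadows the builtin for that module.
mod.len = lambda seq: 99
after = mod.size("abc")

print(same_namespace, has_own_len, before, after)
```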

As an example, suppose we want to override the builtin print(). Code not compliant with the proposed approach:

import builtins

def my_print(*args, **kwargs):
    pass

# The override happens only when this function is called, i.e. potentially at run time.
def install_my_print():
    builtins.print = my_print

Compliant code:

import builtins

def my_print(*args, **kwargs):
    pass

# The override is performed by module init code, i.e. at import time.
builtins.print = my_print

Paul Sokolovsky · Answer 2 · Sat Jan 05 2019 18:21:54 GMT+0800 (China Standard Time)

It should be noted which symbolic accesses can be optimized by this approach:

global variables and functions, module variables and functions: yes. These will be fixed at the end of import time, and thus could be accessed by address instead of symbolically.
object attributes and methods: no. These require dynamic dispatch, and thus dynamic lookup of the attribute or method in an object whose type is known only at runtime. Optimizing this would require static type inference, which would be a next stage of optimization beyond the scope of this proposal.
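A hand-written analogue of "access by address" already exists as an idiom: binding a module-level function to a default argument, so that the target is resolved once instead of on every call. The sketch below is not the proposed mechanism, only an illustration of the effect; `config`, `scale`, `late_bound` and `early_bound` are made-up names.

```python
import types

# A stand-in for some imported module with a module-level function.
config = types.ModuleType("config")
config.scale = lambda x: x * 2

def late_bound(x):
    # Conventional semantics: "scale" is looked up by name on every call.
    return config.scale(x)

def early_bound(x, _scale=config.scale):
    # The target was resolved once, when the function was defined.
    return _scale(x)

# A run-time rebinding, which the proposal would disallow.
config.scale = lambda x: x * 10

late_result = late_bound(1)    # sees the rebinding
early_result = early_bound(1)  # still calls the originally bound function

print(late_result, early_result)
```

Under the proposed semantics the rebinding on the marked line could not happen after import time, so both forms would behave identically and the compiler would be free to use the cheaper one.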

Paul Sokolovsky · Answer 3 · Sun Dec 29 2019 06:49:14 GMT+0800 (China Standard Time)

To clearly separate import-time from run-time, we'd need to implement a special kind of "main" function to call after the import phase is over. It turns out that many good things like this were already considered, but some were rejected: https://www.python.org/dev/peps/pep-0299/ "Special __main__() function in modules".
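As a rough sketch of how such an entry point could separate the two phases: the helper names `import_phase` and `run_phase` are hypothetical, the function name `__main__` is borrowed from PEP 299, and the zero-argument signature is an assumption made for this example.

```python
import types

SOURCE = '''
GREETING = "hello"            # module init code: runs during the import phase

def __main__():               # entry point: runs only in the run-time phase
    return GREETING.upper()
'''

def import_phase(name, source):
    # Execute module init code and capture its effects in a module object.
    module = types.ModuleType(name)
    exec(compile(source, name, "exec"), module.__dict__)
    return module

def run_phase(module):
    # Called only after all imports are finished.
    return module.__main__()

app = import_phase("app", SOURCE)
result = run_phase(app)
print(result)
```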

Paul Sokolovsky · Answer 4 · Wed Jan 01 2020 08:49:59 GMT+0800 (China Standard Time)

https://en.wikipedia.org/wiki/Multi-stage_programming