Bytecode tools is a collection of modules for working with Python bytecode. Most bytecode-related modules are version specific and do not support other Python versions.
Bytecode is not the same across versions: opcodes are added, removed, or modified, so the Python standard library modules do not work with bytecode generated by other versions.
Take the `marshal` module: its purpose is to serialize or deserialize code objects, and the format it uses is version specific. Likewise for the `dis` module: the change from bytecode to wordcode altered how instructions are disassembled, and opcode differences make it incompatible across versions.
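To make the version dependence concrete, the following sketch uses only the standard library (it does not use `bytecode_tools`). It prints the running interpreter's magic number and opcode count, then round-trips a code object through `marshal`. The round trip succeeds only because the same interpreter writes and reads the data.

```python
import dis
import importlib.util
import marshal
import sys

# The first two bytes of MAGIC_NUMBER encode the bytecode format revision
# of the running interpreter; the last two bytes are always b"\r\n".
magic = int.from_bytes(importlib.util.MAGIC_NUMBER[:2], "little")
print("Python", sys.version_info[:2], "magic", magic, "opcodes", len(dis.opmap))

# Serialize and deserialize a code object with the same interpreter.
code = compile("a = 1", "<example>", "exec")
data = marshal.dumps(code)
restored = marshal.loads(data)

# Execute the restored code object to confirm it survived the round trip.
namespace = {}
exec(restored, namespace)
print(namespace["a"])
```

Running the same script under two different Python versions will typically print different magic numbers and opcode counts, which is the incompatibility described above.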
Our goal is to make `bytecode_tools` work with any CPython version. The aim is to build the tools below.
Tool | Purpose | Status |
---|---|---|
unmarshal | Deserialize code objects. | Completed |
pydis | Disassembler for any CPython version. | Completed |
pycdecode | Decoder for bytecode cache files (pyc files). | Completed |
hackdis | Hack the disassembled bytecode. | WIP |
decompiler | A decompiler for CPython x.x. | Planned |
- Clone this repo and run `python setup.py install`, or use `pip install .`
- Install directly from PyPI with `pip install bytecode-tools`
`pydis` is a Python disassembler; it can be a drop-in replacement for CPython's `Lib/dis.py`.
`pydis` supports all CPython versions above 2.5, and when run under any of those versions it can disassemble bytecode produced by the others. This means `pydis` decodes 2.6 bytecode under 3.6 and vice versa.
Python's `dis` module is super helpful for looking inside code objects, but it does not support other Python versions. If a code object is created with Python 3.5 and you try to disassemble it with Python 3.6, it won't work.
Each Python version changes the opcodes: new ones are added and a few are deleted. A code object cannot be interpreted by a version other than the one that created it unless you recreate it with that version.
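Part of the difficulty is the instruction layout itself. Since CPython 3.6 every instruction occupies two bytes (an opcode byte and an argument byte), whereas older versions used one byte for opcodes without an argument and three bytes otherwise. The sketch below walks `co_code` for the running interpreter (3.6 or later is assumed) using the standard library's opcode table. `raw_instructions` is a hypothetical helper, not part of `pydis`, and it does not fold `EXTENDED_ARG` prefixes into the following argument.

```python
import dis

def raw_instructions(code):
    """Yield (offset, opname, arg) for each 2-byte unit of code.co_code."""
    raw = code.co_code
    for offset in range(0, len(raw), 2):
        opcode, arg = raw[offset], raw[offset + 1]
        # dis.opname is the opcode table of the *running* interpreter, so
        # these names are only right for bytecode from this same version.
        yield offset, dis.opname[opcode], arg

code = compile("a = 1", "<example>", "exec")
for offset, name, arg in raw_instructions(code):
    print(offset, name, arg)
```

A cross-version disassembler has to swap in both the opcode table and the instruction layout of the version that produced the bytecode, instead of relying on the tables of the interpreter it runs under.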
Disassemble a statement.
>>> from bytecode_tools import pydis
>>> pydis.dis("a=1")
1 0 LOAD_CONST 0 (1)
2 STORE_NAME 0 (a)
4 LOAD_CONST 1 (None)
6 RETURN_VALUE
Disassemble a function object.
>>> def foo():
...     print(123)
...     a = 1
...     b = 2
...     c = a + b
...     return c
...
>>> pydis.dis(foo)
2 0 LOAD_GLOBAL 0 (print)
2 LOAD_CONST 1 (123)
4 CALL_FUNCTION 1
6 POP_TOP
3 8 LOAD_CONST 2 (1)
10 STORE_FAST 0 (a)
4 12 LOAD_CONST 3 (2)
14 STORE_FAST 1 (b)
5 16 LOAD_FAST 0 (a)
18 LOAD_FAST 1 (b)
20 BINARY_ADD
22 STORE_FAST 2 (c)
6 24 LOAD_FAST 2 (c)
26 RETURN_VALUE
To get all the bytecode instructions:
>>> pydis.instructions(foo.__code__)
[LOAD_GLOBAL, LOAD_CONST, CALL_FUNCTION, POP_TOP, LOAD_CONST, STORE_FAST, LOAD_CONST, STORE_FAST, LOAD_FAST, LOAD_FAST, BINARY_ADD, STORE_FAST, LOAD_FAST, RETURN_VALUE]
Likewise, the other `dis` module options are available with `pydis`.
`pyc` files are Python bytecode cache files; their contents are what the eval loop runs at the interpretation stage. The output of the compilation phase is serialized into a `pyc` file along with some metadata that identifies the Python version and the time it was created. `pyc` files are heavily version specific: a `pyc` file generated by one version of Python cannot be interpreted by another version.
The metadata serialized into a `pyc` file acts as a set of identifiers for invalidating the file if the source file is newer or the interpreter version has changed. This is mainly because bytecode is incompatible across versions: opcodes are added or deleted from version to version.
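As an illustration of that metadata, here is a minimal sketch that reads a pyc header with the standard library only. It assumes the 16-byte header layout used by CPython 3.7 and later (magic, flags, then either source mtime and size or a source hash); older versions use shorter headers, which this sketch does not handle. `read_pyc` is a hypothetical helper and not the `pycdecode` API.

```python
import marshal
import os
import py_compile
import struct
import tempfile

def read_pyc(path):
    """Parse a CPython 3.7+ pyc file into (header dict, code object)."""
    with open(path, "rb") as f:
        data = f.read()
    # 2-byte magic, 2 bytes of b"\r\n" padding, then a 4-byte flags field.
    magic, flags = struct.unpack("<H2xI", data[:8])
    header = {"magic": magic, "flags": flags}
    if flags & 0x1:
        # Hash-based pyc: bytes 8..16 hold a hash of the source file.
        header["source_hash"] = data[8:16]
    else:
        # Timestamp-based pyc: source mtime and source size.
        header["mtime"], header["size"] = struct.unpack("<II", data[8:16])
    # The rest is the marshalled code object; loading it only works when
    # the reader runs the same Python version as the writer.
    return header, marshal.loads(data[16:])

# Compile a small source file and read the result back.
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "example.py")
    with open(src, "wb") as f:
        f.write(b"a = 1\nprint(a)\n")
    pyc = py_compile.compile(
        src,
        cfile=os.path.join(tmp, "example.pyc"),
        invalidation_mode=py_compile.PycInvalidationMode.TIMESTAMP,
    )
    header, code = read_pyc(pyc)

print(header)
```

The final `marshal.loads` call is the step that ties this sketch to the running interpreter; decoding a pyc written by a different version requires a version-aware unmarshaller instead.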
The Bytecode tools pyc decoder, `pycdecode`, helps with deserializing the bytecode object and parsing the metadata under any version of CPython.
Let's say you have a Python file `test.py` with the statements below.
a = 1
print(a)
If you compile it, for example with `python -m py_compile test.py`, then depending on the version you use you'll get either `test.pyc` or `__pycache__/test.cpython-37.pyc` (since Python 3.2, pyc files are cached in the `__pycache__` directory).
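The exact cache location can be computed rather than guessed. The following standard-library snippet shows where the running interpreter would place the pyc for a given source path; the `test.py` name simply mirrors the example file above.

```python
import importlib.util

# Path where the running interpreter caches the bytecode for test.py.
# On CPython 3 the name has the form __pycache__/test.<cache_tag>.pyc.
pyc_path = importlib.util.cache_from_source("test.py")
print(pyc_path)
```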
I'm using Python 3.7 here, so the cache file is under `__pycache__`:
>>> from bytecode_tools import pycdecode
>>> pycdecode.showpyc('<path_till_here>/__pycache__/test_pycdecoder.cpython-37.pyc')
Magic : 3394
timestamp : 2019-05-23 02:28:58
Size : 15
Bytecode :
1 0 LOAD_CONST 0 (1)
2 STORE_NAME 0 (a)
2 4 LOAD_NAME 1 (print)
6 LOAD_NAME 0 (a)
8 CALL_FUNCTION 1
10 POP_TOP
12 LOAD_CONST 1 (None)
14 RETURN_VALUE