DLMO: Deep Learning Memory Optimizer

A memory optimizer for deep learning

How to use

Step 1: Get traced function and profiling information

a. Trace:

import parrots

# `func` is the function to be traced
traced_func = parrots.aot.trace(func, input, target)

b. Profiling:

# enable timeline profiling (written to profile.txt on exit)
parrots.runtime.profile()
# enable memory profiling (written to profile_memory.json on exit)
parrots.runtime.profile_memory()

c. Run it:

# run the traced function once so both profilers record a full iteration
loss, acc = traced_func(input, target)

d. Save IR and profiling information:

parrots.aot.save(traced_func, 'function.json')
# Quit parrots and get `profile_memory.json` and `profile.txt`
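
Putting the four sub-steps together, a minimal sketch of a complete Step 1 session might look like the following (model, criterion, and accuracy are hypothetical placeholders, not part of parrots or DLMO):

import parrots

# Hypothetical training step; `model`, `criterion`, and `accuracy` are
# placeholders standing in for your own network, loss, and metric.
def func(input, target):
    output = model(input)
    loss = criterion(output, target)
    acc = accuracy(output, target)  # hypothetical helper, not a parrots API
    return loss, acc

traced_func = parrots.aot.trace(func, input, target)  # a. trace
parrots.runtime.profile()                             # b. timeline profiling
parrots.runtime.profile_memory()                      # b. memory profiling
loss, acc = traced_func(input, target)                # c. run it once
parrots.aot.save(traced_func, 'function.json')        # d. save the IR
# Quit parrots to obtain profile.txt and profile_memory.json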

Step 2: Generate merged IR JSON

a. Put the original IR JSON function.json, the timeline profiling result profile.txt (rename it to timeline.txt), and the memory profiling result profile_memory.json (rename it to memory.json) into one directory under data (a helper sketch for this follows the layout), for example:

├── data
│   ├── resnet152-32
│   │   ├── function.json
│   │   ├── memory.json
│   │   └── timeline.txt
│   ├── resnet50-32
│   │   ├── function.json
│   │   ├── memory.json
│   │   └── timeline.txt
│   ├── processor.py
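
The copying and renaming can be scripted. A minimal helper sketch, assuming the parrots outputs sit in the current directory and using a hypothetical case name resnet50-32:

import shutil
from pathlib import Path

# Hypothetical case directory; pick a name matching your model and batch size.
case_dir = Path('data/resnet50-32')
case_dir.mkdir(parents=True, exist_ok=True)

# Copy the three artifacts, renaming the profiling outputs as Step 2 requires.
shutil.copy('function.json', case_dir / 'function.json')
shutil.copy('profile.txt', case_dir / 'timeline.txt')
shutil.copy('profile_memory.json', case_dir / 'memory.json')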

b. Run processor.py (it may fail on some inputs; please debug it yourself):

python3 processor.py

c. You should end up with a pattern.json in each case directory, like this:

├── data
│   ├── resnet152-32
│   │   ├── function.json
│   │   ├── memory.json
│   │   ├── pattern.json
│   │   └── timeline.txt
│   ├── resnet50-32
│   │   ├── function.json
│   │   ├── memory.json
│   │   ├── pattern.json
│   │   └── timeline.txt
│   ├── processor.py

Step 3: Compile and Run!

a. Compile with CMake:

mkdir build
cd build

# Debug
cmake .. -DCMAKE_BUILD_TYPE=Debug
# Release (about 10x faster than the debug binary)
cmake .. -DCMAKE_BUILD_TYPE=Release

make

b. Run:

# Usage: dlmo <input> <output> <limit>
./dlmo ../data/resnet152-32/pattern.json optimized.json 2.1GiB

c. The output should look like the following (you may have to read the source code and adjust the parameters):

Running case ../data/resnet152-32/pattern.json (2356 operators) with optimizer (limit 2.100000 GiB) ...
 > Start back-tracing search from source (peak memory: 8.019928 GiB, total time: 229.738614 ms)
 > Progress (300): 2.660553 GiB, 256.848531 ms
 > Progress (600): 2.094578 GiB, 276.179152 ms
 > Progress (900): 2.094578 GiB, 276.605152 ms
 > Progress (1200): 2.082615 GiB, 276.739483 ms
 > Reach search limit, stop searching
 > Result:
   > Schedules searched: 1500
   > Time used: 26677.157000 ms
   > Best: {peak memory: 2.094578 GiB, total time: 274.821156 ms}
   > Satisfy memory: true
 > Writing result into path optimized.json ... OK!

d. The newly generated IR JSON is written to the path given as <output>.
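
Since dlmo is an ordinary command-line tool, it can also be driven from a script, for example to compare several memory limits on one case. A hypothetical sketch using only the usage line shown above (the case path and limit values are just examples):

import subprocess

case = '../data/resnet152-32/pattern.json'
# Hypothetical limits to sweep; the format follows the 2.1GiB example above.
for limit in ['1.8GiB', '2.1GiB', '3.0GiB']:
    out = f'optimized-{limit}.json'
    # Usage: dlmo <input> <output> <limit>
    subprocess.run(['./dlmo', case, out, limit], check=True)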
