Welcome to nucleus7, a library for exchangeable and reproducible development of deep learning models built on top of TensorFlow!
- Why and when to use it?
- Installation
- Glossary
- Project structure
- Nucleotide and co.
- Data flow
- Model components
- Metrics and KPIs
- Training and inference coordination
- nucleus7 project
- Nucleotide development
- Tutorials
- Mlflow integration
- Known bugs
- Documentation
- Contribution
- If you want to spend your time on important stuff (architecture design, paper implementation) and not on how to launch tf.Session on multiple GPUs :)
- If you want other developers to be able to use your code without any problems and without spending hours figuring out where the training begins :)
- If you want to use modules (architectures, losses, metrics, etc.) developed by yourself and by others in plug-and-play mode
- If you want to load classification models from your neighbour and use those weights for your dog-cat object detection (of course, it is better if your neighbour also uses nucleus7)
The names of nucleus7 components are borrowed from the structure of a cell nucleus: nucleotide, gene, DNA helix, etc.
A nucleotide is the building block of nucleus7. It is modular and declares a data flow interface: which data it takes and which data it outputs. There are different kinds of nucleotides for different tasks, e.g. ModelPlugin for neural network architectures and CoordinatorCallback for callbacks executed after each iteration.
A gene is a collection of nucleotides of the same type, e.g. the plugins gene holds all the ModelPlugin nucleotides. One model may contain many different genes. This abstraction also makes it possible to restrict gene-to-gene connections, so that data can flow only in permitted directions: for example, a ModelPlugin can take inputs from the Dataset but not from a ModelLoss, and a ModelLoss can take inputs from a ModelPlugin but not from Callbacks.
The DNA helix is the graph constructed from all model nucleotides. It sorts the nucleotides in each gene.
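The glossary above can be made concrete with a small, self-contained sketch. It is not the nucleus7 API: the `Node` class, the `ALLOWED_GENE_INPUTS` table and both helper functions are hypothetical names chosen for illustration, the entries of the table beyond the two rules quoted above are assumptions, and the ordering is assumed to be a dependency (topological) sort, which the text does not specify.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical table: for each gene, the genes it may read from.
# Only "plugins reads dataset, not losses" and "losses reads plugins,
# not callbacks" come from the text; the rest is assumed.
ALLOWED_GENE_INPUTS = {
    "dataset": set(),
    "plugins": {"dataset", "plugins"},
    "losses": {"plugins"},
    "callbacks": {"plugins", "losses"},
}


@dataclass
class Node:
    """Toy stand-in for a nucleotide: a name, its gene and its inputs."""
    name: str
    gene: str
    inbound: list = field(default_factory=list)  # names of nodes it reads from


def check_connections(nodes):
    """Raise ValueError if a node reads from a gene it is not allowed to."""
    by_name = {node.name: node for node in nodes}
    for node in nodes:
        for source_name in node.inbound:
            source_gene = by_name[source_name].gene
            if source_gene not in ALLOWED_GENE_INPUTS[node.gene]:
                raise ValueError(
                    f"{node.gene} node {node.name!r} cannot read from "
                    f"{source_gene} node {source_name!r}")


def sort_gene(nodes, gene):
    """Return the names of one gene's nodes, dependencies first."""
    members = {node.name for node in nodes if node.gene == gene}
    # Keep only the edges that stay inside the gene being sorted.
    graph = {
        node.name: [src for src in node.inbound if src in members]
        for node in nodes if node.gene == gene
    }
    return list(TopologicalSorter(graph).static_order())


nodes = [
    Node("images", "dataset"),
    Node("head", "plugins", inbound=["backbone"]),
    Node("backbone", "plugins", inbound=["images"]),
    Node("softmax_loss", "losses", inbound=["head"]),
]
check_connections(nodes)             # passes: every edge is permitted
print(sort_gene(nodes, "plugins"))   # ['backbone', 'head']
```

Keeping the permitted connections in one table means a forbidden edge, such as a loss reading from a callback, fails early with a readable error instead of producing a graph that cannot be ordered.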
You can find some tutorials on how to use nucleus7 for your projects in the tutorials folder in the root directory.
DO NOT FORGET: you need to have nucleus7 inside of your PYTHONPATH (see the Installation section).
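A quick way to confirm the PYTHONPATH requirement before opening the notebooks is to ask Python whether the package can be located. The snippet below is a minimal sketch that uses only the standard library; `is_importable` is a helper name introduced here for illustration and is not part of nucleus7.

```python
import importlib.util
import sys


def is_importable(module_name):
    """Return True if a top-level module can be found on the current sys.path."""
    return importlib.util.find_spec(module_name) is not None


if not is_importable("nucleus7"):
    # Print the search path so a missing PYTHONPATH entry is easy to spot.
    print("nucleus7 not found; current search path:", file=sys.stderr)
    for entry in sys.path:
        print("  ", entry, file=sys.stderr)
```

`find_spec` only locates the package without importing it, so the check stays fast and does not pull in TensorFlow.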
To run the notebooks:
jupyter notebook
By default, when you start a nucleus7 project, e.g. training or inference, it creates an mlruns folder in the parent directory of project_dir, e.g. for /path/to/root/project_dir it creates /path/to/root/mlruns, and adds the experiment there. The experiment is named after the project_dir directory, or after the name found under the PROJECT_NAME key of the nucleus7_project.json file. This makes sure that everything is tracked to mlflow, so you can start mlflow from the /path/to/root folder:
cd /path/to/root
mlflow ui
You can also set the MLFLOW_TRACKING_URI environment variable to point to the URI of the main mlflow tracker (see the mlflow help for more details), and the experiment will be created there (if an experiment with that name already exists, the run is added to it):
export MLFLOW_TRACKING_URI='path/to/uri'
nc7-train /path/to/root/project_dir
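The tracking rules described in this section can be summarised in a few lines of Python. This is a sketch of the described behaviour, not nucleus7's actual implementation: `resolve_tracking` is a hypothetical helper, the default location is assumed to be an mlruns folder next to the project directory, and the priority of PROJECT_NAME over the directory name is an assumption, since the text does not say which one wins.

```python
import json
import os
from pathlib import Path


def resolve_tracking(project_dir, environ=None):
    """Return (tracking_uri, experiment_name) for a project directory."""
    environ = os.environ if environ is None else environ
    project_dir = Path(project_dir)

    # An explicit MLFLOW_TRACKING_URI wins; otherwise fall back to
    # an mlruns folder in the parent of the project directory.
    tracking_uri = environ.get("MLFLOW_TRACKING_URI")
    if not tracking_uri:
        tracking_uri = str(project_dir.parent / "mlruns")

    # Assumption: PROJECT_NAME from the config file, when present,
    # takes priority over the directory name.
    experiment_name = project_dir.name
    config_path = project_dir / "nucleus7_project.json"
    if config_path.is_file():
        config = json.loads(config_path.read_text())
        experiment_name = config.get("PROJECT_NAME", experiment_name)
    return tracking_uri, experiment_name


# With no environment variable and no config file, the defaults apply.
print(resolve_tracking("/path/to/root/project_dir", environ={}))
```

Passing the environment as an argument keeps the function easy to test without touching the real process environment.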
- If you have tensorflow(-gpu) > 1.11, pylint may report no-member warnings for the tensorflow estimator. The estimator API is still accessible as in tensorflow 1.11, but it is officially legacy there and has moved to tensorflow_estimator. Since nucleus7 is maintained for tf >= 1.11, this cannot be solved easily and is a cosmetic issue. Once tensorflow 2.0 is released, tensorflow 1.11 support will be dropped and this issue will go away. In the tests, pylint is run only against tensorflow 1.11, so it does not raise the warning there.
- Starting an inference project (nc7-infer) generates symlinks, which causes errors during directory cleanup inside of tf.TestCase. Since temporary folders (/tmp/...) are used, this is also only a cosmetic issue, because these folders are cleaned automatically by the OS.