microsoft / CNTK

Microsoft Cognitive Toolkit (CNTK), an open source deep-learning toolkit

Home Page: https://docs.microsoft.com/cognitive-toolkit/

Iteration Plan (September - October 2017)

cha-zhang opened this issue

This plan captures our work from mid-September to the end of October. We will ship around November 22nd. Major work items of this iteration include ONNX support in CNTK and MKL integration, among many others.

Endgame

  • November 8: Code freeze for the end game
  • November 22: Release date

Planned items

We plan to ship these items at the end of this iteration.

Legend of annotations:

• Item not started
• Item finished
• 🏃 Work in progress
• Blocked
• 💪 Stretch

Documentation

• Finalize learner design and fix related documentation

System

• Support import/export of ONNX format models (a hedged API sketch follows this list)
• A network optimization API that helps model compression via SVD, quantization, etc.
• 16-bit support for training on Volta GPUs (limited functionality)
• C# high-level API design (no implementation)
• Reader improvements for large data sets (sequential reader)
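As a rough sketch of the ONNX item above, and assuming the eventual API mirrors CNTK's existing model save/load with a format flag (the `C.ModelFormat.ONNX` name is an assumption until the feature ships), export/import could look like this:

```python
import cntk as C

# Hedged sketch of the planned ONNX interop; the format flag and the
# ModelFormat.ONNX enum are assumptions based on CNTK's existing
# save/load conventions, not a released API.
x = C.input_variable(784)
z = C.layers.Dense(10, activation=None)(x)

z.save('model.onnx', format=C.ModelFormat.ONNX)                 # export
z2 = C.Function.load('model.onnx', format=C.ModelFormat.ONNX)   # import
```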

Examples

• Faster R-CNN object detection
  • Clean up the code to use arbitrary input image sizes
  • C++ implementation of some Python layers
  • Usability improvements
• New example for natural language processing (NLP)
• New tutorial on WGAN and LS-GAN
• Semantic segmentation (stretch goal)

Operations

• Specify frequency in numbers of epochs and minibatches for progress reports, validation, and checkpoints (see the sketch after this list)
• Improve statistics for distributed evaluation
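For context on the frequency item above, here is a minimal sketch of how reporting frequency is specified in the current Python API (in minibatches, via `ProgressPrinter`); the plan item adds the option to express such frequencies in epochs as well. The model and data are toy placeholders:

```python
import numpy as np
import cntk as C

# Toy regression model; the point is the freq= argument below, which
# today counts minibatches. The plan item extends frequency settings
# (progress, validation, checkpoints) to epochs as well.
x = C.input_variable(2)
y = C.input_variable(1)
z = C.layers.Dense(1)(x)
loss = C.squared_error(z, y)

progress = C.logging.ProgressPrinter(freq=100)  # report every 100 minibatches
lr = C.learning_rate_schedule(0.01, C.UnitType.minibatch)
trainer = C.Trainer(z, (loss, loss), [C.sgd(z.parameters, lr)], [progress])

for _ in range(500):
    xb = np.random.rand(16, 2).astype(np.float32)
    trainer.train_minibatch({x: xb, y: xb.sum(1, keepdims=True)})
```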

Performance

• Intel MKL update to improve inference speed on CPU by around 2x on AlexNet

Others

@cha-zhang Can we assume that parallel learning for Faster R-CNN will be implemented in this sprint?
I put my comments about Fast R-CNN in that issue. Indeed, I'm not tied to that issue, and I'd like to know whether the "Faster R-CNN" work in this sprint can include an even faster implementation :)

    Continue work on Deep Learning Explained course on edX.

    Does it mean an advanced course is coming up?

Will the new release be available for .NET Core 2.0?

    @arijit17 No, we are not working on an advanced course at this moment. It's there just to indicate some routine maintenance needed for the course.

    @kyoro1 Yes, faster implementation is on the roadmap, but we first want to achieve full parity.

@grzsz We are making some fixes to the C# low-level API as well during this iteration (not mentioned above). .NET Core 2.0 compatibility is not a very high priority at this moment. How important is this?

We are making some fixes to the C# low-level API as well during this iteration (not mentioned above).

@cha-zhang Is this C# support a language binding, or will the APIs be implemented in C#?

@helloguo The C# API is a SWIG-generated binding.

    @cha-zhang Thank you for your clarification.

The example evaluation code shows the target framework is .NET Framework, which is Windows-only. So can I assume these C# APIs are Windows-only at this moment? If yes, are you planning to support Linux as well (e.g. via .NET Core, since it supports Windows, Linux, and macOS)?

@helloguo People have raised this .NET Core issue in #2346 and #2352. We are investigating. Not sure if we can push it into this release or not. However, if we can, we will update this iteration plan.

Regarding the usability improvements to Faster R-CNN: would this include a GPU-enabled version of the proposal layer UDF? Otherwise I find the Faster R-CNN example already quite usable as it is. Since the 'STORE_EVAL_MODEL_WITH_NATIVE_UDF' option was added, it has everything you need to include it in, for example, a native C++ Windows-based product (i.e. without the need for Python dependencies). The only problem is that evaluation is very slow, because we are stuck using the CPU.

    A network optimization API that helps model compression via SVD, quantization, etc.

Awesome! Is there a way to get early access?

@master76 We have some prototype code, but it is not written as a CNTK API. So the answer to your question is no; you will have to wait until the end of the iteration. Thanks!
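Until then, here is a hedged sketch, in plain NumPy rather than the unreleased CNTK API, of the core idea behind SVD-based compression: replace a dense layer's weight matrix with a low-rank factorization.

```python
import numpy as np

def svd_compress(W, k):
    """Approximate W (m x n) by A @ B of rank k, keeping the top-k
    singular values, so one big matmul becomes two small ones."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    A = U[:, :k] * s[:k]   # shape (m, k)
    B = Vt[:k, :]          # shape (k, n)
    return A, B

W = np.random.randn(1024, 1024).astype(np.float32)
A, B = svd_compress(W, 128)
# Parameter count drops from 1024*1024 to 2*1024*128 (~4x fewer); the
# approximation error is governed by the discarded singular values.
print('relative error: %.3f' % (np.linalg.norm(W - A @ B) / np.linalg.norm(W)))
```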

@grzsz We are making some fixes to the C# low-level API as well during this iteration (not mentioned above). .NET Core 2.0 compatibility is not a very high priority at this moment. How important is this?

    @cha-zhang
As with everything, it depends :) I can use C++/Python, but I suppose many people want or have to stick to .NET Core 2.0, and they will choose a competitor or a home-made solution, even when CNTK was their first choice due to its assumed platform support.

@cha-zhang
Can you please elaborate on "Continue work on Deep Learning Explained course on edX"?

Is there a plan or milestone?
edX's CNTK course is an important way to promote and explain CNTK's comprehensive coverage of deep learning topics.

It could be useful to use this thread to get feedback on what should go into the edX course.

Use this thread or a dedicated one to discuss:

• what has gone in so far,
• what users think about that,
• what new topics are yet to be included.

#2422

What is the medium-term plan in terms of NN debugging facilities?

Can we output a few more metrics using the existing TensorBoard facilities in the next release, under "improve statistics for distributed evaluation"? A good start would be a weights histogram.

@JimSEOW Sure, let's create a dedicated thread for the edX course.

    As I mentioned earlier, for this iteration, we are just doing maintenance. Maybe I'll remove it from the list.

Does ONNX mean that the model format will stabilize in the near future, so models I have already trained will continue to work with future versions of CNTK? At least after ONNX is implemented?

@clintjcampbell Yes, once ONNX is implemented it will be stable. ONNX itself is still evolving, but in a few weeks it should stabilize and remain backward compatible.

@rhy-ama The weights histogram is not part of "improve statistics for distributed evaluation". That item specifically refers to improving the printed training statistics during distributed evaluation.

An NN debugging facility is not in the current plan. The team is busy delivering a major milestone, which pushes a few things to relatively lower priority. If someone could contribute this, it would be great!

On the note of improved statistics: in BrainScript it was possible to specify multiple metrics that were all evaluated and reported during training, but it seems you can only monitor the loss and one metric using the Python API. It would be great to bring the old BrainScript feature of multiple metrics back to the Python API (a workaround sketch follows).
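A hedged workaround sketch under the current Python API: the Trainer takes one loss and one metric, but additional criteria can still be evaluated per minibatch by hand with `.eval()` (the toy model and the choice of extra metric are illustrative only):

```python
import numpy as np
import cntk as C

x = C.input_variable(2)
y = C.input_variable(2)
z = C.layers.Dense(2, activation=None)(x)
loss = C.cross_entropy_with_softmax(z, y)
metric = C.classification_error(z, y)      # the one metric the Trainer reports
extra = C.squared_error(C.softmax(z), y)   # an additional metric, tracked by hand

lr = C.learning_rate_schedule(0.1, C.UnitType.minibatch)
trainer = C.Trainer(z, (loss, metric), [C.sgd(z.parameters, lr)])

xb = np.random.rand(32, 2).astype(np.float32)
yb = np.eye(2, dtype=np.float32)[np.random.randint(0, 2, 32)]
trainer.train_minibatch({x: xb, y: yb})
print('extra metric:', np.mean(extra.eval({x: xb, y: yb})))
```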

We are making some fixes to the C# low-level API as well during this iteration (not mentioned above). .NET Core 2.0 compatibility is not a very high priority at this moment. How important is this?

This is super important to us. We would like to be able to reuse and maintain C# across the dev spectrum, especially for business continuity. Plus, there are performance improvements in .NET Core 2.0 that we would like to take advantage of without further optimizing our codebase. Please consider making it a high priority.

    Thank you for your time and efforts!

    @skynode Please refer to #2352.

    Hi @cha-zhang,

I am willing to implement a high-level API for C#. In fact, I have already started and have implemented the following layers:

    • Linear
    • Convolution: Conv1D, Conv2D and Conv3D
    • Pooling: Max(Pool1D, Pool2D and Pool3D) and Avg(Pool1D, Pool2D and Pool3D)

You can find it at this link:
    https://github.com/mhjabreel/DeepSharp

    Regards,

    Mohammed

    Hi, we have to postpone the release date for this iteration to Nov. 14. We added one week to wrap up a few features under implementation, and another week to fix some bugs reported in GitHub issues. Sorry for the delay!

    I highly recommend the Deep Learning Explained course on edX.
    Waiting patiently for the advanced course.

    .NET Core 2.0 support is very important.
I hope CUDA 9 support and a VS 2017 build are part of this iteration.

Does the C++ implementation of some Python layers for Faster R-CNN object detection include GPU-enabled evaluation from C#?

    These features sound awesome. Are we still looking at getting them sometime this week? Is there a list of open issues for the release that someone who knows C# well could contribute to?

    The new ship date for v2.3 is Nov. 14, as updated in the message above.

The C# high-level API design task is now blocked due to internal deadlines. We encourage the community to build a high-level API on top of the current low-level one and share it (a small sketch of the idea follows). You may use a design similar to CNTK's high-level API, or feel free to mimic other high-level APIs such as Keras/Gluon.
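To make that suggestion concrete, here is a hedged Python sketch of the pattern: a layer factory built from low-level primitives, which is the same shape a community C# layer library would take over the low-level bindings (the `dense` helper is hypothetical, not a CNTK API):

```python
import cntk as C

def dense(out_dim, activation=C.relu):
    """Hypothetical high-level layer factory built from low-level ops."""
    def layer(x):
        W = C.parameter((x.shape[0], out_dim), init=C.glorot_uniform())
        b = C.parameter((out_dim,), init=0)
        h = C.times(x, W) + b
        return activation(h) if activation is not None else h
    return layer

x = C.input_variable(4)
model = dense(8)(dense(16)(x))   # layers compose, Keras-style
```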

Starting next iteration, we will be making some changes to the release procedure. We are working hard to enable nightly releases (ETA: before the end of this year). Official releases will then be done as needed. Please comment if you have suggestions. Thanks!

    Is 2.3 release still planned for today?

No, it got delayed by one week. We are now releasing on Nov. 22 due to some changes that we need to take in.

Well, that is a bummer. Might as well delay it all the way until you are ready to release CUDA 9, cuDNN 7, and stable FP16 training. It is pretty amazing that MXNet 0.12 beat both CNTK and TensorFlow to CUDA 9 FP16 support, but it lacks Keras 2.0 support.

CUDA 9 and cuDNN 7 will follow next.

@ebarsoumMS, thank you for keeping us informed. The iteration plan included three improvements to the Faster R-CNN example:

    1. Clean up the code to use arbitrary input image size
    2. C++ implementation of some Python layers
    3. Usability improvement

    Have these made it into the upcoming release?

Adding @spandantiwari to comment. Arbitrary input image size is in, and we fixed most ops to work with arbitrary sizes.

@Dozer3D - We have worked quite a bit in this iteration to support free static axes (arbitrary input image size) in convolutional pipelines. So convolution, pooling, and other nodes that may be used in a typical pipeline support free static axes. We have also improved the performance of convolution with free static axes. But Faster R-CNN training with free static axes is not completely ready yet; we are still testing it to match the numbers stated in the paper. The C++ implementation of ProposalLayer.py is also in the works. These will most probably not make it into the 2.3 release. Having said that, this model, and making it work fast (especially inference), is still one of our priorities.

@ebarsoumMS My understanding is that CUDA 9 is required to eliminate .NET Framework dependencies and provide a .NET Standard version of CNTK. Is that correct? If so, is that likely to happen for 2.3 next week, or at some future point? If a future point, is there any estimate of when?

    Being able to use CNTK effectively in a container would be super useful, and my impression was this wasn't TOO far away...

@spandantiwari Thank you for that informative reply. We have created two datasets and trained Faster R-CNN networks with CNTK 2.2 to solve three problems for a client, but currently only one of these is usable without the GPU, and even then only just. Faster GPU and faster CPU inference would be much appreciated (I assume decreasing the input image size would also speed up CPU processing).

So nothing for us in 2.3? But a good chance of something before, say, the end of January?

Having said that, this model, and making it work fast (especially inference), is still one of our priorities.

Thank you. As a traditional Windows programmer and solutions provider who knows very little about machine learning, I find Faster R-CNN to be a very practical tool for solving many real problems for our customers.

    @cha-zhang looking forward to the next release :)

Given that you "encourage the community to build a high-level API on top of the current low-level one and share it", I figured I would mention that I started working with some F# community members on exploring what a high-level, script-friendly F# DSL on top of CNTK could look like.

Got some of the C# samples converted to F# scripts already, very close to the original C# versions, here:

    https://github.com/mathias-brandewinder/CNTK.FSharp/tree/master/examples

... and I am currently trying out something loosely Keras-inspired. Plenty of rough edges, and I'm not sure yet whether the direction is right, but here is what the MNIST CNN sample looks like as of today, with the interesting part highlighted:

    https://github.com/mathias-brandewinder/CNTK.FSharp/blob/a0e9794697afacce65c95c66f5d899a9dd71cbf7/examples/MNIST-CNN.fsx#L89-L123

@spandantiwari - We're also exploring Faster R-CNN. If the improvements aren't going to be released in the next week or so, could you please create a document somewhere with a recommended approach? I'm new to CNTK, but with some direction I may be able to help (especially if there are some examples, e.g. 'convert the Python layers [files <...>] to C++ in the same way as was done for PR <...>' or 'see C++ layer <...> for an example').

For those of you who are exploring Faster R-CNN: we have a branch, chazhang/faster_rcnn, that updates Faster R-CNN with free static axes. The code is tangled with Fast R-CNN, and Fast R-CNN hasn't been verified, so we won't release it in this iteration. On the other hand, Faster R-CNN is now functional with arbitrary input image sizes, tested on the Pascal data set. We don't see much accuracy improvement from this, though.

    Most code was actually contributed by @spandantiwari. Thanks!

Thanks @cha-zhang. Could you please provide feedback on the best way to implement some of the C++ layers, as per here?

As an aside, the pip installs in the code might warrant reconsidering before merging.

    @kodonnell Are you asking about using C++ to implement the proposal layer instead of Python?

    @cha-zhang I'm referring to the original iteration plan:

    C++ implementation of some Python layers

I don't even know what those layers are, hence I'm asking for a starting point =) From other issues I've read, it sounds like implementing this will make evaluation of Faster R-CNN a lot faster.

    Yes, that's the proposal layer. The current custom proposal layer is in Python and can be written in C++ instead.

    You can refer to the binary convolution example for how to write a C++ custom layer:
    https://github.com/Microsoft/CNTK/tree/master/Examples/Extensibility/BinaryConvolution
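For orientation, here is a hedged, trivial Python sketch of the UserFunction mechanism that the current Python proposal layer is built on (the function itself is a toy, not the proposal logic); the C++ route replaces such a class with a native implementation along the lines of the BinaryConvolution example above:

```python
import numpy as np
import cntk as C
from cntk.ops.functions import UserFunction

class TimesTwo(UserFunction):
    """Toy user-defined function: doubles its input."""
    def __init__(self, arg, name='TimesTwo'):
        super(TimesTwo, self).__init__([arg], name=name)

    def forward(self, argument, device=None, outputs_to_retain=None):
        return None, argument * 2          # no state needed for backward

    def backward(self, state, root_gradients):
        return root_gradients * 2          # d(2x)/dx = 2

    def infer_outputs(self):
        return [C.output_variable(self.inputs[0].shape, self.inputs[0].dtype,
                                  self.inputs[0].dynamic_axes)]

x = C.input_variable(3)
f = C.user_function(TimesTwo(x))
print(f.eval({x: np.array([[1., 2., 3.]], dtype=np.float32)}))
```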

    I am confused now :-(

    The current custom proposal layer is in Python and can be written in C++ instead.

It is my understanding that evaluation using C++ only (no Python) already works and was implemented in 2.2 using a UDF by @pkranen (see #2234).

    i.e. set __C.STORE_EVAL_MODEL_WITH_NATIVE_UDF = True

This does seem to work, except that it runs on the CPU only (very slowly), not the GPU. If you set the device to a GPU, it throws an exception because the GPU version of that layer hasn't been written.

I.e., in the file "cntk\Examples\Extensibility\ProposalLayer\ProposalLayerLib\ProposalLayerLib.h" we have the following code:

```cpp
if (computeDevice.Type() != DeviceKind::CPU)
    throw std::runtime_error("ProposalLayer: only CPU evaluation is supported at the moment.");
```

    @Dozer3D I think I was referring to training. If eval only, then yes, we have a C++ version already.

    We are not satisfied with the training speed of Faster RCNN. More work is needed.

@cha-zhang - Might it pay to start a new issue (or update the docs somewhere) to have a single place referring to all the improvements intended for Faster R-CNN (with some useful detail to encourage PRs), so it's a little clearer? There are quite a few threads (including the 'pollution' of this one) which I, for one, find hard to follow.

Is there any GPU support in the ProposalLayerLib C++ implementation?
I'm running CNTK 2.7 and it seems there still isn't any.
When is this kind of support planned for release?