microsoft / CNTK

Microsoft Cognitive Toolkit (CNTK), an open source deep-learning toolkit

Home Page: https://docs.microsoft.com/cognitive-toolkit/

Iteration Plan (September - October 2017)

cha-zhang opened this issue

This plan captures our work from mid-September to the end of October. We will ship around November 22nd. Major work items of this iteration include ONNX support in CNTK and MKL integration, among many others.

Endgame

  • November 8: Code freeze for the end game
  • November 22: Release date

Planned items

We plan to ship these items at the end of this iteration.

Legend of annotations:

• Item not started
• Item finished
• 🏃 Work in progress
• Blocked
• 💪 Stretch

Documentation

• Finalize learner design and fix related documentation

System

• Support import/export of ONNX format models (a hedged API sketch follows this list)
• A network optimization API that helps model compression via SVD, quantization, etc.
• 16-bit support for training on Volta GPUs (limited functionality)
• C# high-level API design (no implementation)
• Reader improvements for large data sets (sequential reader)
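As a rough sketch of the ONNX item above, and assuming the eventual API mirrors CNTK's existing model save/load with a format flag (the `C.ModelFormat.ONNX` name is an assumption until the feature ships), export/import could look like this:

```python
import cntk as C

# Hedged sketch of the planned ONNX interop; the format flag and the
# ModelFormat.ONNX enum are assumptions based on CNTK's existing
# save/load conventions, not a released API.
x = C.input_variable(784)
z = C.layers.Dense(10, activation=None)(x)

z.save('model.onnx', format=C.ModelFormat.ONNX)                 # export
z2 = C.Function.load('model.onnx', format=C.ModelFormat.ONNX)   # import
```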

Examples

• Faster R-CNN object detection
  • Clean up the code to use arbitrary input image sizes
  • C++ implementation of some Python layers
  • Usability improvements
• New example for natural language processing (NLP)
• New tutorial on WGAN and LS-GAN
• Semantic segmentation (stretch goal)

Operations

• Specify frequency in numbers of epochs and minibatches for progress reports, validation, and checkpoints (see the sketch after this list)
• Improve statistics for distributed evaluation
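For context on the frequency item above, here is a minimal sketch of how reporting frequency is specified in the current Python API (in minibatches, via `ProgressPrinter`); the plan item adds the option to express such frequencies in epochs as well. The model and data are toy placeholders:

```python
import numpy as np
import cntk as C

# Toy regression model; the point is the freq= argument below, which
# today counts minibatches. The plan item extends frequency settings
# (progress, validation, checkpoints) to epochs as well.
x = C.input_variable(2)
y = C.input_variable(1)
z = C.layers.Dense(1)(x)
loss = C.squared_error(z, y)

progress = C.logging.ProgressPrinter(freq=100)  # report every 100 minibatches
lr = C.learning_rate_schedule(0.01, C.UnitType.minibatch)
trainer = C.Trainer(z, (loss, loss), [C.sgd(z.parameters, lr)], [progress])

for _ in range(500):
    xb = np.random.rand(16, 2).astype(np.float32)
    trainer.train_minibatch({x: xb, y: xb.sum(1, keepdims=True)})
```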

Performance

• Intel MKL update to improve inference speed on CPU by around 2x on AlexNet

Others

@cha-zhang Can we assume that parallel learning for Faster R-CNN will be implemented in this sprint?
I put my comments about Fast R-CNN in that issue. Indeed, I'm not tied to that issue, and I'd like to know whether the "Faster R-CNN" work in this sprint can include an even faster implementation :)

    Continue work on Deep Learning Explained course on edX.

    Does it mean an advanced course is coming up?

Will the new release be available for .NET Core 2.0?

    @arijit17 No, we are not working on an advanced course at this moment. It's there just to indicate some routine maintenance needed for the course.

    @kyoro1 Yes, faster implementation is on the roadmap, but we first want to achieve full parity.

@grzsz We are making some fixes to the C# low-level API as well during this iteration (not mentioned above). .NET Core 2.0 compatibility is not a very high priority at this moment. How important is this?

We are making some fixes to the C# low-level API as well during this iteration (not mentioned above).

@cha-zhang Is this C# support a language binding, or will the APIs be implemented in C#?

@helloguo The C# API is a SWIG-generated binding.

    @cha-zhang Thank you for your clarification.

The example evaluation code shows the target framework is .NET Framework, which is Windows-only. So can I assume these C# APIs are Windows-only at this moment? If yes, are you planning to support Linux as well (e.g. via .NET Core, since it supports Windows, Linux, and macOS)?

@helloguo People have raised this .NET Core issue in #2346 and #2352. We are investigating. Not sure if we can push it into this release or not. However, if we can, we will update this iteration plan.

Regarding the usability improvements to Faster R-CNN: would this include a GPU-enabled version of the proposal layer UDF? Otherwise I find the Faster R-CNN example already quite usable as it is. Since the 'STORE_EVAL_MODEL_WITH_NATIVE_UDF' option was added, it has everything you need to include it in, for example, a native C++ Windows-based product (i.e. without the need for Python dependencies). The only problem is that evaluation is very slow, because we are stuck using the CPU.

    A network optimization API that helps model compression via SVD, quantization, etc.

Awesome! Is there a way to get early access?

@master76 We have some prototype code, but it is not written as a CNTK API. So the answer to your question is no; you will have to wait until the end of the iteration. Thanks!
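Until then, here is a hedged sketch, in plain NumPy rather than the unreleased CNTK API, of the core idea behind SVD-based compression: replace a dense layer's weight matrix with a low-rank factorization.

```python
import numpy as np

def svd_compress(W, k):
    """Approximate W (m x n) by A @ B of rank k, keeping the top-k
    singular values, so one big matmul becomes two small ones."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    A = U[:, :k] * s[:k]   # shape (m, k)
    B = Vt[:k, :]          # shape (k, n)
    return A, B

W = np.random.randn(1024, 1024).astype(np.float32)
A, B = svd_compress(W, 128)
# Parameter count drops from 1024*1024 to 2*1024*128 (~4x fewer); the
# approximation error is governed by the discarded singular values.
print('relative error: %.3f' % (np.linalg.norm(W - A @ B) / np.linalg.norm(W)))
```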

@grzsz We are making some fixes to the C# low-level API as well during this iteration (not mentioned above). .NET Core 2.0 compatibility is not a very high priority at this moment. How important is this?

    @cha-zhang
As with everything, it depends :) I can use C++/Python, but I suppose many people want or have to stick to .NET Core 2.0, and they will choose a competitor or a home-made solution, even when CNTK was their first choice due to its assumed platform support.

@cha-zhang
Can you please elaborate on "Continue work on Deep Learning Explained course on edX"?

Is there a plan or milestone?
edX's CNTK course is an important way to promote and explain CNTK's comprehensive coverage of deep learning topics.

It could be useful to use this thread to get feedback on what should go into the edX course.

Use this thread or a dedicated one to discuss:

• what has gone in so far,
• what users think about that,
• what new topics are yet to be included.

#2422

What is the medium-term plan in terms of NN debugging facilities?

Can we output a few more metrics using the existing TensorBoard facilities in the next release, under "improve statistics for distributed evaluation"? A good start would be a weights histogram.

@JimSEOW Sure, let's create a dedicated thread for the edX course.

    As I mentioned earlier, for this iteration, we are just doing maintenance. Maybe I'll remove it from the list.

Does ONNX mean that the model format will stabilize in the near future, so models I have already trained will continue to work with future versions of CNTK? At least after ONNX is implemented?

@clintjcampbell Yes, once ONNX is implemented it will be stable. ONNX itself is still evolving, but in a few weeks it should stabilize and remain backward compatible.

@rhy-ama The weights histogram is not part of "improve statistics for distributed evaluation". That item specifically refers to improving the printed training statistics during distributed evaluation.

An NN debugging facility is not in the current plan. The team is busy delivering a major milestone, which pushes a few things to relatively lower priority. If someone could contribute this, it would be great!

On the note of improved statistics: in BrainScript it was possible to specify multiple metrics that were all evaluated and reported during training, but it seems you can only monitor the loss and one metric using the Python API. It would be great to bring the old BrainScript feature of multiple metrics back to the Python API (a workaround sketch follows).
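A hedged workaround sketch under the current Python API: the Trainer takes one loss and one metric, but additional criteria can still be evaluated per minibatch by hand with `.eval()` (the toy model and the choice of extra metric are illustrative only):

```python
import numpy as np
import cntk as C

x = C.input_variable(2)
y = C.input_variable(2)
z = C.layers.Dense(2, activation=None)(x)
loss = C.cross_entropy_with_softmax(z, y)
metric = C.classification_error(z, y)      # the one metric the Trainer reports
extra = C.squared_error(C.softmax(z), y)   # an additional metric, tracked by hand

lr = C.learning_rate_schedule(0.1, C.UnitType.minibatch)
trainer = C.Trainer(z, (loss, metric), [C.sgd(z.parameters, lr)])

xb = np.random.rand(32, 2).astype(np.float32)
yb = np.eye(2, dtype=np.float32)[np.random.randint(0, 2, 32)]
trainer.train_minibatch({x: xb, y: yb})
print('extra metric:', np.mean(extra.eval({x: xb, y: yb})))
```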

We are making some fixes to the C# low-level API as well during this iteration (not mentioned above). .NET Core 2.0 compatibility is not a very high priority at this moment. How important is this?

This is super important to us. We would like to be able to reuse and maintain C# across the dev spectrum, especially for business continuity. Plus, there are performance improvements in .NET Core 2.0 that we would like to take advantage of without further optimizing our codebase. Please consider making it a high priority.

    Thank you for your time and efforts!

    @skynode Please refer to #2352.

    Hi @cha-zhang,

I am willing to implement a high-level API for C#. In fact, I have already started and have implemented the following layers:

    • Linear
    • Convolution: Conv1D, Conv2D and Conv3D
    • Pooling: Max(Pool1D, Pool2D and Pool3D) and Avg(Pool1D, Pool2D and Pool3D)

You can find it at this link:
    https://github.com/mhjabreel/DeepSharp

    Regards,

    Mohammed

    Hi, we have to postpone the release date for this iteration to Nov. 14. We added one week to wrap up a few features under implementation, and another week to fix some bugs reported in GitHub issues. Sorry for the delay!

    I highly recommend the Deep Learning Explained course on edX.
    Waiting patiently for the advanced course.

    .NET Core 2.0 support is very important.
I hope CUDA 9 support and a VS 2017 build are part of this iteration.

Does the C++ implementation of some Python layers for Faster R-CNN object detection include GPU-enabled evaluation from C#?

    These features sound awesome. Are we still looking at getting them sometime this week? Is there a list of open issues for the release that someone who knows C# well could contribute to?

    The new ship date for v2.3 is Nov. 14, as updated in the message above.

The C# high-level API design task is now blocked due to internal deadlines. We encourage the community to build a high-level API on top of the current low-level one and share it (a small sketch of the idea follows). You may use a design similar to CNTK's high-level API, or feel free to mimic other high-level APIs such as Keras/Gluon.
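To make that suggestion concrete, here is a hedged Python sketch of the pattern: a layer factory built from low-level primitives, which is the same shape a community C# layer library would take over the low-level bindings (the `dense` helper is hypothetical, not a CNTK API):

```python
import cntk as C

def dense(out_dim, activation=C.relu):
    """Hypothetical high-level layer factory built from low-level ops."""
    def layer(x):
        W = C.parameter((x.shape[0], out_dim), init=C.glorot_uniform())
        b = C.parameter((out_dim,), init=0)
        h = C.times(x, W) + b
        return activation(h) if activation is not None else h
    return layer

x = C.input_variable(4)
model = dense(8)(dense(16)(x))   # layers compose, Keras-style
```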

Starting next iteration, we will be making some changes to the release procedure. We are working hard to enable nightly releases (ETA: before the end of this year). Official releases will then be done as needed. Please comment if you have suggestions. Thanks!

    Is 2.3 release still planned for today?

No, it got delayed by one week. We are now releasing on Nov. 22 due to some changes that we need to take in.

Well, that is a bummer. Might as well delay it all the way until you are ready to release CUDA 9, cuDNN 7, and stable FP16 training. It is pretty amazing that MXNet 0.12 beat both CNTK and TensorFlow to CUDA 9 FP16 support, but it lacks Keras 2.0 support.

CUDA 9 and cuDNN 7 will follow next.

@ebarsoumMS, thank you for keeping us informed. The iteration plan included three improvements to the Faster R-CNN example:

    1. Clean up the code to use arbitrary input image size
    2. C++ implementation of some Python layers
    3. Usability improvement

    Have these made it into the upcoming release?

Adding @spandantiwari to comment. Arbitrary input image size is in, and we fixed most ops to work with arbitrary sizes.

@Dozer3D - We have worked quite a bit in this iteration to support free static axes (arbitrary input image size) in convolutional pipelines. So convolution, pooling, and other nodes that may be used in a typical pipeline support free static axes. We have also improved the performance of convolution with free static axes. But Faster R-CNN training with free static axes is not completely ready yet; we are still testing it to match the numbers stated in the paper. The C++ implementation of ProposalLayer.py is also in the works. These will most probably not make it into the 2.3 release. Having said that, this model, and making it work fast (especially inference), is still one of our priorities.

@ebarsoumMS My understanding is that CUDA 9 is required to eliminate .NET Framework dependencies and provide a .NET Standard version of CNTK. Is that correct? If so, is that likely to happen for 2.3 next week, or at some future point? If a future point, is there any estimate of when?

    Being able to use CNTK effectively in a container would be super useful, and my impression was this wasn't TOO far away...

@spandantiwari Thank you for that informative reply. We have created two datasets and trained Faster R-CNN networks with CNTK 2.2 to solve three problems for a client, but currently only one of these is usable without the GPU, and even then only just. Faster GPU and faster CPU inference would be much appreciated (I assume decreasing the input image size would also speed up CPU processing).

So nothing for us in 2.3? But a good chance of something before, say, the end of January?

Having said that, this model, and making it work fast (especially inference), is still one of our priorities.

Thank you. As a traditional Windows programmer and solutions provider who knows very little about machine learning, I find Faster R-CNN to be a very practical tool for solving many real problems for our customers.

    @cha-zhang looking forward to the next release :)

Given that you "encourage the community to build a high-level API on top of the current low-level one and share it", I figured I would mention that I started working with some F# community members on exploring what a high-level, script-friendly F# DSL on top of CNTK could look like.

Got some of the C# samples converted to F# scripts already, very close to the original C# versions, here:

    https://github.com/mathias-brandewinder/CNTK.FSharp/tree/master/examples

... and I am currently trying out something loosely Keras-inspired. Plenty of rough edges, and I'm not sure yet whether the direction is right, but here is what the MNIST CNN sample looks like as of today, with the interesting part highlighted:

    https://github.com/mathias-brandewinder/CNTK.FSharp/blob/a0e9794697afacce65c95c66f5d899a9dd71cbf7/examples/MNIST-CNN.fsx#L89-L123

@spandantiwari - We're also exploring Faster R-CNN. If the improvements aren't going to be released in the next week or so, could you please create a document somewhere with a recommended approach? I'm new to CNTK, but with some direction I may be able to help (especially if there are some examples, e.g. 'convert the Python layers [files <...>] to C++ in the same way as was done for PR <...>' or 'see C++ layer <...> for an example').

For those of you who are exploring Faster R-CNN: we have a branch, chazhang/faster_rcnn, that updates Faster R-CNN with free static axes. The code is tangled with Fast R-CNN, and Fast R-CNN hasn't been verified, so we won't release it in this iteration. On the other hand, Faster R-CNN is now functional with arbitrary input image sizes, tested on the Pascal data set. We don't see much accuracy improvement from this, though.

    Most code was actually contributed by @spandantiwari. Thanks!

Thanks @cha-zhang. Could you please provide feedback on the best way to implement some of the C++ layers, as per here?

As an aside, the pip installs in the code might warrant reconsidering before merging.

    @kodonnell Are you asking about using C++ to implement the proposal layer instead of Python?

    @cha-zhang I'm referring to the original iteration plan:

    C++ implementation of some Python layers

I don't even know what those layers are, hence I'm asking for a starting point =) From other issues I've read, it sounds like implementing this will make evaluation of Faster R-CNN a lot faster.

    Yes, that's the proposal layer. The current custom proposal layer is in Python and can be written in C++ instead.

    You can refer to the binary convolution example for how to write a C++ custom layer:
    https://github.com/Microsoft/CNTK/tree/master/Examples/Extensibility/BinaryConvolution
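For orientation, here is a hedged, trivial Python sketch of the UserFunction mechanism that the current Python proposal layer is built on (the function itself is a toy, not the proposal logic); the C++ route replaces such a class with a native implementation along the lines of the BinaryConvolution example above:

```python
import numpy as np
import cntk as C
from cntk.ops.functions import UserFunction

class TimesTwo(UserFunction):
    """Toy user-defined function: doubles its input."""
    def __init__(self, arg, name='TimesTwo'):
        super(TimesTwo, self).__init__([arg], name=name)

    def forward(self, argument, device=None, outputs_to_retain=None):
        return None, argument * 2          # no state needed for backward

    def backward(self, state, root_gradients):
        return root_gradients * 2          # d(2x)/dx = 2

    def infer_outputs(self):
        return [C.output_variable(self.inputs[0].shape, self.inputs[0].dtype,
                                  self.inputs[0].dynamic_axes)]

x = C.input_variable(3)
f = C.user_function(TimesTwo(x))
print(f.eval({x: np.array([[1., 2., 3.]], dtype=np.float32)}))
```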

    I am confused now :-(

    The current custom proposal layer is in Python and can be written in C++ instead.

It is my understanding that evaluation using C++ only (no Python) already works and was implemented in 2.2 using a UDF by @pkranen (see #2234).

    i.e. set __C.STORE_EVAL_MODEL_WITH_NATIVE_UDF = True

This does seem to work, except that it runs on the CPU only (very slowly), not the GPU. If you set the device to a GPU, it throws an exception because the GPU version of that layer hasn't been written.

I.e., in the file "cntk\Examples\Extensibility\ProposalLayer\ProposalLayerLib\ProposalLayerLib.h" we have the following code:

```cpp
if (computeDevice.Type() != DeviceKind::CPU)
    throw std::runtime_error("ProposalLayer: only CPU evaluation is supported at the moment.");
```

    @Dozer3D I think I was referring to training. If eval only, then yes, we have a C++ version already.

    We are not satisfied with the training speed of Faster RCNN. More work is needed.

@cha-zhang - Might it pay to start a new issue (or update the docs somewhere) to have a single place referring to all the improvements intended for Faster R-CNN (with some useful detail to encourage PRs), so it's a little clearer? There are quite a few threads (including the 'pollution' of this one) which I, for one, find hard to follow.

Is there any GPU support in the ProposalLayerLib C++ implementation?
I'm running CNTK 2.7 and it seems there still isn't any.
When is this kind of support planned for release?