Publish roadmap and contributing doc?

Question

Publish roadmap and contributing doc?

tribbloid opened this issue 5 years ago · comments

I'm curious if your visions include making it a feature-complete NN training framework?

What will be the master plan? Integrating with Torch/TF/MXNet or build hardware-level compilation framework from scratch?

Also, what is the standard & code of conduct for contributions from the community?

I'm totally convinced of its capability and believe it can fit into the missing link between horizontal AI (Apache Spark/Apache Flink) and vertical AI (domain-specific hardware, Chisel/FIRRTL)

breandan · Answer 1 · Sun Oct 27 2019 12:04:14 GMT+0800 (China Standard Time)

Right now, the project is still a research prototype, but we should be able to support CUDA with something like ND4J or JavaCPP/CUDA for GPU acceleration on the JVM. Ultimately, our goal is to move off the JVM to Kotlin/Native once it becomes more stable.

Currently focused on stabilizing our API, and improving support for differentiable programming. ~~It should be possible to implement a feedforward MLP right now, but we have not gotten around to that yet.~~ There are some toy examples here if you feel like looking around!

Longer term, it would be great to integrate with something like Relay IR or MLIR. Ideally, this would take the form of a Kotlin compiler plugin and would also require additional work to translate our graph. It's something we need to explore a bit further before committing.

Re: CoC and contributing. Haven't thought about this yet. Do you have any recommendations?

Peng Cheng · Answer 2 · Mon Oct 28 2019 01:30:44 GMT+0800 (China Standard Time)

thanks a lot, agreed RelayIR in TVM is awesome. A sister project in DMLC is looking for a JVM based autodiff implementation:

https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=103089990 (Questions section)

Would you like to pitch them to merge with you? I'll also try to get their committers' (particularly Tianchi Chen) attention.

I'm not sure about Kotlin/Native at this time though. It appears to be self-contained and not prioritising interoperability with other LLVM libraries. In addition, all JIT compilers (JVM, LLVM, V8.js) may evolve to have similar performances to each other.

breandan · Answer 3 · Mon Oct 28 2019 03:05:13 GMT+0800 (China Standard Time)

Sounds interesting, we are certainly fans of Tianqi's work. Would be happy to have a conversion to understand Gluon's needs a little more clearly. The landscape of autodiff/autograd for JVM appears somewhat bleak at the moment, ND4J has a prototype AD, although just for internal consumption. There is Nexus and Lantern, but unsure how well these interop with other JVM languages.

Agree that one of Kotlin's main benefits is strong JVM interoperability. From our experience developing Kotlin𝛁, we are convinced it is possible to implement AD as a JVM library, and could probably support Gluon's intended use case. I presume K/N also aims to support the same compatibility with other LLVM libraries, but agree the story is currently less complete.

Peng Cheng · Answer 4 · Mon Oct 28 2019 05:50:28 GMT+0800 (China Standard Time)

Kotlin itself as an API is good enough, scala may be a "sweet-to-have" through its quirky dependent type implementation:

https://stackoverflow.com/questions/12935731/any-reason-why-scala-does-not-explicitly-support-dependent-types

I don't think an API in Java or Clojure serves any purpose. Java has 0 dependent type support which becomes exactly the downfall of ND4J SameDiff API, in fact, it is so verbose that people in Skymind.io end up not using it for most deeplearning4j components. Clojure/LISP is type unsafe out-of-the-box.

I'm not sure about Nexus and Lantern though, I like their functional approach and succinct API, however their decision to build everything from ground up and refusal in leveraging existing communities seems to be an oversight (which is why I believed merging with MXNet can be a win-win)

I'll use a few days to try to figure out the intention of Tianqi's team. BTW, have you met Dr Thorsten Scholak (https://github.com/tscholak) at ElementAI in your city? He is an active committer of hasktorch and might be interested in promoting your work.

breandan · Answer 5 · Mon Oct 28 2019 06:58:29 GMT+0800 (China Standard Time)

If you’re interested in fully dependent types, Scala/Shapeless is probably your best bet. We’ve been able to get some of the benefits of dependent types with a few caveats. Equality checking is possible (e.g. shape checking and inference for ordinary vector/matrix arithmetic) at compile time. Type level arithmetic is possible with some difficulty, by enumerating all cases below a fixed constant. Practically speaking, we can only type check a fixed set of convolution sizes at compile time, and the general case can either be checked at runtime or dynamically. You can find more details in our recent workshop paper.

I know of Torsten, but have never met him in person. I think we‘re even in the same building. Will definitely reach out, thank you for the reminder!

Peng Cheng · Answer 6 · Mon Oct 28 2019 07:40:25 GMT+0800 (China Standard Time)

At your service :) I'm 100% committed to Spark & Chisel so singleton-typed literal + recursive generic (similar to HList and dotty tuple) for shape safety seems no-brainer for me. I assume you have chosen Kotlin for other reasons, as I saw you complained about it on OpenReview (regarding THE workshop paper), and your code has a codegen that seems to use a fair amount of copy-pasting.

BUT I have no intention to refactor the core part (or argue which language is fitter), all languages are either a bit obsolete or a bit unstable or both. Kotlingrad is big enough, and I guess we can agree that its strength lies in mega-engineering instead of prototyping (for which torch is dominant). So I definitely favour the minimalistic approach you are already using

breandan · Answer 7 · Mon Oct 28 2019 09:39:37 GMT+0800 (China Standard Time)

More than happy to accept PRs! If you can refactor our implementation and preserve the same guarantees, I'm all ears. Tried a bunch of different approaches, but perhaps there is a simpler implementation that offers compile time shape checking. The most recent prototype can be found in the Toy*Example.kt files, which use less metaprogramming and are somewhat easier to grok. Have experimented with other features like type safe currying, but unless we can leverage genericity somehow (bona fide union/intersection types would be great), the complexity is probably too high.

There are just four different operators you must implement to demonstrate a working implementation of AD: +, *, invoke and diff. Everything else is just syntactic sugar. The way I figure, the simpler the language, the fewer room for errors during graph lowering. Definitely still areas for improvement. I have not been able to fully generify multidimensional arrays, i.e. it would be nice to have a recursive type like Vec<Vec<Vec<...>, D2>, D1>. As it is, we need to implement a new tower (e.g. Mat, Cub) for each rank of tensor with corresponding conversions (it's a real headache). Given the semantics of type erasure on the JVM, I'm not sure if this is possible but would be happy to be proven wrong.

Peng Cheng · Answer 8 · Sat Nov 02 2019 07:16:10 GMT+0800 (China Standard Time)

I'm still observing: https://discuss.mxnet.io/t/what-is-the-status-of-gluon-api-for-scala-clojure-jvm-autograd-api/5088

Did you get response from Thorsten?

breandan · Answer 9 · Sat Nov 02 2019 08:03:51 GMT+0800 (China Standard Time)

Yeah, I emailed Torsten at your suggestion and he ended up giving a nice talk about his work on Hasktorch at our reading group yesterday. Only wish I had reached out sooner, thanks @tscholak!

Also had a chat with my labmate and TVM contributor @silentspring2 the other day. Sounds like it would be a great direction to explore, happy to continue discussing either here or on your thread.

Peng Cheng · Answer 10 · Sat Nov 02 2019 08:29:56 GMT+0800 (China Standard Time)

That's a lot of progress, I'm in Toronto so kind of missing all the action there ...

I'll seek to post a MXNet JIRA ticket and keep our conversation there. Hope your committer friend won't mind

Torsten Scholak · Answer 11 · Sat Nov 02 2019 09:27:31 GMT+0800 (China Standard Time)

lol, you guys know each other? small world!
hope you’re doing alright @tribbloid

Peng Cheng · Answer 12 · Sun Nov 03 2019 02:19:00 GMT+0800 (China Standard Time)

@tscholak no we don't which is totally my miscalculation.
I assume you are both contrarians there (amongst colleagues who are totally comfortable with Python 2), hope this can change soon

breandan · Answer 13 · Sun Dec 22 2019 11:30:58 GMT+0800 (China Standard Time)

From what I can tell, there are a few paths we could take to support GPUs. In no particular order:

There were a bunch of OpenJDK projects circa 2017, including Project Trinity, Project Sumatra, Project Panama, and Project Valhalla the last two of which appear to be somewhat active.
There is AMD's AparAPI, which appears to have been abandoned in 2016/2017.
There is Intel's oneAPI, which has a Java API, documentation including some form of GPU support and is active, but appears to target a higher level set of use cases.
There is Rootbeer, a Java GPU compiler, which looks interesting but appears to have died.
A lot of JVM activity appears to have shifted to Truffle/GraalVM and for our use case GrCUDA looks very promising, has the best documentation and supports polyglot programming.
Alternatively, the ONNX Runtime now has a Java API, which provides GPU support for running ONNX models, which would give us access to the GPU by lowering to the ONNX IR.
Finally, there is @jcuda, JOCL and JavaCPP which have discussed merging efforts.

There is also some word of upcoming JDK features (@Craigacp?) but it looks like these are our best options for hardware acceleration at the moment. Any other suggestions welcome!

Adam Pocock · Answer 14 · Mon Dec 23 2019 05:33:04 GMT+0800 (China Standard Time)

Valhalla is adding value types & specialised generics to Java and is definitely still an active project. Panama is about making native interop easier, currently it's split into three parts, the first part of which is a native memory allocation and access API, the second part allows to wrap a C function entry point into a MethodHandle to allow easy calling from Java, and the third part is automatic extraction of method entry points, structs and other types from C header files. The first part will land as a preview feature in JDK 14 (https://openjdk.java.net/jeps/370) in March. There is a separate strand to Panama which is the introduction of a Vector (SIMD) API for Java. The aim there is to allow users to write something higher level than C intrinsics and it will use the most appropriate SIMD instruction available before falling back to scalar code. I believe Sumatra is inactive at the moment, and I don't think Trinity passed the vote threshold to become an official OpenJDK project.

In addition here is some interesting work on GPU acceleration for the JVM from the TornadoVM (https://github.com/beehive-lab/TornadoVM) group at the University of Manchester, which is partially based on top of the Graal compiler. It's an academic project but I know a number of the people working on it (as I went to university there).

For GPU acceleration specifically then the Panama work to easily wrap C functions will help. Part of Panama is trying to teach the JVM about native memory in it's different kinds (e.g. native, non-volatile, etc), but I don't think there is an explicit GPU focus at the moment. Personally I'm looking forward to the combination of large off heap allocations along with simple calls out to BLAS from the main strand of Panama, coupled with the SIMD API (https://openjdk.java.net/jeps/338) to allow efficient usage of all the fancy AVX and SVE instructions available in modern CPUs. If you're willing to build OpenJDK from source it's possible to try that out today (or at least it will be soon once some complex merges have happened). The memory access stuff landed in the latest JDK 14 early access build - http://jdk.java.net/14/ - so you can try that out without having to compile anything.

I agree the Graal CUDA work is very interesting for accessing GPU workloads. I've been meaning to investigate it but haven't got around to it.

BTW the Tensorflow Java API has been spun out into a SIG, and is now under extremely active development (https://github.com/tensorflow/java), so you could consider that a target similar to the ONNX IR if you wanted. Full disclosure, I'm a member of the SIG. It might be a bit high level though. We do have model training working in Java without using Python to specify a computation graph, and eager mode is available in Java too. We're trying to specify a JVM ndarray library interface at the moment, with the hope that we could get buy in from other projects, as every Java ML project has it's own Tensor type, and none of them are compatible.

Peng Cheng · Answer 15 · Sun Dec 29 2019 02:27:55 GMT+0800 (China Standard Time)

@breandan welcome back!

Had a quick chat with Lan King from MXNet amazon team a while ago, it appears that they have prioritised this project:

https://djl.ai/

for optimisation on JVM, rather than salvaging existing MXNet inference API & architecture. Fortunately this new project still use MXNet as a backend. But is MXNet inference API a lost cause? I have no idea.

This new library (unfortunately) appears to delegate autograd to JNA interface, and further down to C libraries. Through an intricate code generator that creates an mirror image of C functions in Java:

https://github.com/awslabs/djl/tree/master/mxnet/jnarator

This doesn't stop us from imposing type/shape safety as an NDArray wrapper, in which case obviously most of your code that implements chain rule will be useless. But I doubt it fits into your vision. Do you see a strong reason to write chain rule in Kotlin?

Peng Cheng · Answer 16 · Sun Dec 29 2019 02:34:08 GMT+0800 (China Standard Time)

@Craigacp does Apache Arrow (a spin-off of Berkeley Ray) has GPU memory mapping?

BTW, I doubt if industry will move away from HotSpot JVM that fast.

breandan · Answer 17 · Mon Dec 30 2019 23:32:56 GMT+0800 (China Standard Time)

The question comes down to whether you want a type-safe AD wrapper much like Hasktorch, F# AI tools, DJL et al. or first-class AD support a la Swift for TensorFlow, TensorFlow.js and others. There are good arguments for wrapping a mature AD framework like TF or MXNet. There are also good arguments for implementing AD in the host language. Ideally, you want the utility of a mature ML framework, with the convenience and extensibility of a JVM-native implementation, which seems to be the idea behind TF4J. So where does that leave Kotlin∇?

The S4TF team wrote up a nice manifesto about upstreaming first-class language support for AD and why it is important. They introduce a dedicated datatype and various APIs into the stdlib, which seems radical but is not unprecedented. An alternative approach (dismissed for valid but not insurmountable reasons), is to write an embedded DSL. Kotlin∇ starts from the same place but takes the latter approach, with some desiderata. It would be nice to avoid the boxing ceremony for numerical types. Easy GPU access would be nice. Dependent types would be nice to have but you can kind of fake them with type-level integers.

A few of these desiderata appear forthcoming with the ongoing work in Valhalla and Panama, which is encouraging to see. Soon Java will support pattern matching and sealed types, which admits a pleasant way to implement algebraic reductions and transformations like Kotlin∇ does. It is already possible to write fluent interfaces, but operator overloading is really needed to port or use Kotlin∇ effectively. So it is feasible to implement an AD on the JVM. But ergonomics aside, why reinvent the wheel when we could just wrap or invoke an AD package directly?

Most mainstream AD packages are optimized for a narrow set of use cases, but there are many applications in scientific computing which deserve special attention. Our goal is not just to reimplement AD for the JVM, but to explore new ideas in shape-inference, automatic term rewriting and numerical stabilization. If other libraries adopt some of those strategies, great. If they want to use Kotlin∇, happy to discuss a roadmap and what the API might provide. If they prefer a foreign AD and Kotlin∇ becomes a research tool, that's okay too. As long as we can show some practical applications for ML/PL research, I'll continue to work on it!

Marco Hutter · Answer 18 · Tue Dec 31 2019 22:42:57 GMT+0800 (China Standard Time)

After having been mentioned in the #8 (comment) above, I had a short look at this issue. There are many resources and links in the previous comments, and I'm not (yet?) familiar with the goals and scope of this repo. A casual interest in data types, vector operations, visualization, flow-based programming and GPU computing* is definitely not enough - not to mention my lack of Kotlin knowledge.

But the 'mention' above referred to some sort of GPU support, so here are a few more pointers:

^{(And you'd probably be surprised how often my GitHub account for JOCL is notified just because someone is doing something @gpu ...)}

A while ago, I wrote an answer to 'Using Java with Nvidia GPU's (cuda)' on Stack Overflow. While the first part is more on a conceptual level, the second part summarizes some general resources for Java+GPU computing (certainly not up to date, though...)
Project Panama was already mentioned above, and there are early access builds at http://jdk.java.net/panama/ . I'm not entirely up to date here, but had some discussion with the maintainer of Panama, because they considered CUDA bindings as one interesting and challenging test case. I tried out the early access build to generate CUDA bindings, and the results are summarized at https://mail.openjdk.java.net/pipermail/panama-dev/2019-February/004443.html .

Panama or the issue of generating JNI bindings are probably too low-level to be interesting for kotlingrad. You'd probably rather like to use a nice, simple library to shove your data into the GPU. But the issue also talked about ~"code generation", so I'd like to emphasize the option of generating OpenCL code at runtime, and compiling it for the GPU at runtime:

I did some very basic experiment for generating OpenCL code from Groovy Closure code, and compiling it at runtime for OpenCL. The results are at http://jocl.org/GroovyGPU/ (This is now 8 years old, and again: It was only a VERY basic experiment...)
Aparapi from AMD is a hotter candidate here. It was also mentioned above. The basic approach here was to basically extract the bytecode of Java classes and translating it to OpenCL. As breandan already said, AMD basically abandoned Aparapi (so http://aparapi.github.io/ is not really up to date). But it already was a comparatively mature approach back then, and more importantly: It has been "forked", and others have maintained and extended it for a while, at https://github.com/Syncleus/aparapi - this may still be worth a look.

_{* The topics of data types, vector operations, visualization, flow-based programming and GPU computing are roughly represented my my repos https://github.com/javagl/Types , https://github.com/javagl/ND , https://github.com/javagl/Viewer/tree/master/viewer-functions , https://github.com/javagl/Flow , https://github.com/gpu/JOCL and https://github.com/jcuda , but of course, these are only pet projects and not even nearly in the same league as the resources that breandan linked to ...}