mccorby / kotlin-data-science-resources

Curation of libraries, media, links, and other resources to use Kotlin for data science

Kotlin Data Science Resources

Libraries, media, links, and other resources to use Kotlin for data science applications

This document is an "awesome list"-style curation of helpful links for using Kotlin in data science, data engineering, machine learning, and optimization. Please feel free to open PRs with other links you find helpful.

Data Science is a broad, buzzwordy domain that seeks to gain insight from data. Arguably, optimization and operations research algorithms play a role in this space as well. While the incumbent programming tools in data science are R, Python, and even Scala, there is a large opportunity for Kotlin to enter this space. Kotlin can add value by closing the gap between data science and software engineering, and essentially finish what Scala started.

With Kotlin/Native on the horizon, the scope of this document will hopefully expand beyond the JVM.

FAQ

Q) I am a Java/Kotlin developer and I want to dive into "data science". Where do I start?

A) The funny thing about buzzwords is they do prompt people to be open to new ideas, but when you reach the actual problem-solving stage it can be difficult knowing where to start. This is because "data science" means something different to every individual, company, and industry. Therefore it is easy for newcomers to chase machine learning as a solution to their problems when what they really need is discrete optimization. Some data scientists like tinkering with data sets in Jupyter notebooks and Pandas DataFrames while software engineers find such approaches undisciplined and messy. Many data science book authors will tell you to become proficient in linear algebra, even though it seems pointless to learn without seeing real-world applications.

In my opinion, the best way to break into "data science" is to find problems with a nondeterministic nature that you would like to solve. Self-started projects like generating a classroom schedule or categorizing text are great ways to build insight and intuition quickly. They also channel you into the areas of "data science" you are most likely to be interested in.

I do think it is better to start learning optimization before diving into machine learning, as optimization is a key part of ML. Be sure to also develop a healthy curiosity for math and how it applies to the real world. YouTube channels like 3Blue1Brown do a wonderful job fostering this kind of curiosity.
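To make the link between optimization and ML concrete, here is a minimal sketch in plain Kotlin (no libraries) that "trains" a one-variable linear model by gradient descent on mean squared error. The function name `fitLine`, the sample data, the learning rate, and the iteration count are all assumptions chosen for illustration; they do not come from any of the projects listed in this document.

```kotlin
// Fit y = m*x + b by minimizing mean squared error with gradient descent.
// Everything here (names, data, hyperparameters) is illustrative.
data class Line(val m: Double, val b: Double)

fun fitLine(
    xs: DoubleArray,
    ys: DoubleArray,
    learningRate: Double = 0.01,
    iterations: Int = 20_000
): Line {
    require(xs.size == ys.size && xs.isNotEmpty()) {
        "xs and ys must be non-empty and equal in length"
    }
    val n = xs.size
    var m = 0.0
    var b = 0.0
    repeat(iterations) {
        var gradM = 0.0
        var gradB = 0.0
        for (i in 0 until n) {
            // Residual of the current model on point i
            val error = (m * xs[i] + b) - ys[i]
            // Partial derivatives of the mean squared error
            gradM += 2.0 * error * xs[i] / n
            gradB += 2.0 * error / n
        }
        // Step against the gradient
        m -= learningRate * gradM
        b -= learningRate * gradB
    }
    return Line(m, b)
}

fun main() {
    val xs = doubleArrayOf(1.0, 2.0, 3.0, 4.0, 5.0)
    val ys = doubleArrayOf(3.0, 5.0, 7.0, 9.0, 11.0) // generated from y = 2x + 1
    // The fitted slope and intercept should approach 2 and 1
    println(fitLine(xs, ys))
}
```

The "learning" here is nothing more than an optimizer searching for the parameters that minimize a cost function, which is why intuition for optimization tends to transfer directly to ML.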



Q) Why are you proposing using Kotlin for data science/machine learning purposes when most people are using Python?

A) Kotlin is a production-grade, statically typed JVM language that fills the need for a more "pragmatic Scala". While many people are productive with Python, many organizations are finding that Kotlin can help them be more productive, especially when building entire production systems and not just models.

To give a more thorough answer, Roman Elizarov at JetBrains summarized it best in this post:

Most other “Python” deep learning frameworks are actually written in C++. Python is used just as a scripting language to “glue” various moving pieces together. Any other scripting language could be used instead, with the corresponding (relatively thin) bridge. Of course, the current momentum is on the Python side. If the past history on innovation and rise/fall of languages teaches us anything, the lesson we can learn is that it does not really matter who came first to the field and it does not even matter who is the leader now.

Kotlin is, without doubt, considerably more productive for any project of non-trivial size due to its static types and emphasis on toolability. Even at 10K+ Python LOCs you start to feel pains of a dynamic language. Python works nicely in slide-ware and in small code snippets of the kind you can put into iPython notebooks, where you can actually execute the code on your data and then enjoy code completion and integrated help on the actual, dynamically resolved object instances. As soon as you start writing the actual non-trivial code, abstracting it into modules, etc, it all starts to fail utterly – code completion and help becomes useless for any non-trivial framework even in state-of-the-art Python IDEs like PyCharm.

The first player in ML field that will realise that Python is roadblock to further scale will reap the benefits. All we can do in Kotlin team is to make sure that when this realisation comes, Kotlin is in good shape to serve as a viable alternative to be considered. - Roman Elizarov



Q) How can Kotlin do any machine learning or data science tasks when it does not match the library catalogue of Python or R?

A) Read the rest of this document and you will be surprised what the JVM ecosystem already offers :)



Showcases

Open-source applications and proof-of-concepts demonstrating data science modeling with Kotlin.

| Project | Description |
|---|---|
| Federated Learning - Building an Android ML App | Showcase of an Android app recognizing images using DL4J |
| Kotlin Math Cheatsheet | How to turn mathematical symbol expressions into Kotlin code |
| Bayes Email Spam Filter | A Kotlin proof-of-concept implementation of a spam filter |
| Bayes User Input Prediction | A simple TornadoFX app that predicts user inputs using Naive Bayes text categorization |
| Linear Regression | Different algorithms for linear regression written and visualized with Kotlin/TornadoFX |
| Classroom Scheduler | A discrete programming model that schedules classes against one classroom |
| Sudoku Solver | A showcase of constraint programming and discrete optimization |
| Traveling Salesman Problem | A visual Kotlin demo of the Traveling Salesman Problem |
| Driver Shift Optimizer | A linear programming model using ojAlgo to minimize the cost of driver shifts in a day |
| Kotlin Simple Neural Network | A simple application built with a Kotlin-implemented neural network |
| Kubed Map Visualization | A U.S. heat map of unemployment rates |

Kotlin Libraries

| Library Name | Category | Description |
|---|---|---|
| Kotlin-Statistics | Analytics | Idiomatic statistical/analytical extension functions for Kotlin |
| okAlgo | Optimization | Kotlin extensions to ojAlgo |
| Data2Viz | Charts | Cross-platform charts and visuals for Kotlin |
| Sparklin | Scaled Data Processing | Kotlin framework for Apache Spark |
| Krangl | Analytics | dplyr-like data frame wrangling for Kotlin |
| Koma | Computation | Scientific library for Kotlin with interop/multiplatform capabilities |
| Komputation | Deep Learning | Neural network platform for Kotlin, primarily for text processing |
| KotlinNLP | Natural Language Processing | Natural Language Processing framework for Kotlin |
| TornadoFX | UI, Charts | Kotlin UI desktop app framework, built on top of JavaFX |
| TornadoFX-ControlsFX | UI | ControlsFX extensions with more data views and controls for TornadoFX |
| Kotlin Jupyter | Notebook | Kotlin support for Jupyter |
| JINX | Plugin | Create Excel functions with Java/Scala/Kotlin instead of VBA |
| Kotlin Algorithm | Algorithm | Kotlin algorithm implementations |
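For readers new to Kotlin, the phrase "idiomatic extension functions" in the table above may be unclear. The sketch below uses only the Kotlin standard library to show the general style: statistical helpers attached directly to collections. The names `sampleVariance`, `sampleStandardDeviation`, `averageBy`, and the `Sale` class are hypothetical and defined here for illustration; they are not the API of Kotlin-Statistics or any other listed library.

```kotlin
import kotlin.math.sqrt

// Hypothetical helpers written for this sketch; NOT a real library's API.

// Unbiased sample variance (divides by n - 1)
fun Iterable<Double>.sampleVariance(): Double {
    val values = toList()
    require(values.size >= 2) { "need at least two values" }
    val mean = values.average()
    return values.sumOf { (it - mean) * (it - mean) } / (values.size - 1)
}

fun Iterable<Double>.sampleStandardDeviation(): Double = sqrt(sampleVariance())

// Average of a numeric property selected from arbitrary items
inline fun <T> Iterable<T>.averageBy(selector: (T) -> Double): Double =
    map(selector).average()

data class Sale(val region: String, val amount: Double)

fun main() {
    val sales = listOf(
        Sale("East", 10.0), Sale("East", 14.0),
        Sale("West", 20.0), Sale("West", 24.0)
    )

    // Group rows by region, then average the amount within each group
    val averageByRegion = sales
        .groupBy { it.region }
        .mapValues { (_, rows) -> rows.averageBy { it.amount } }
    println(averageByRegion) // {East=12.0, West=22.0}

    println(sales.map { it.amount }.sampleStandardDeviation())
}
```

Because the helpers are ordinary extension functions, they chain with standard collection operators such as `groupBy` and `map` while staying statically typed.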

Java Libraries

| Library Name | Category | Description |
|---|---|---|
| DeepLearning4J | Deep Learning | Deep learning library for Java |
| ND4J | Computation | Efficient matrix math library for JVM |
| TableSaw | DataFrame | Tabular data processing and manipulation |
| Joinery | DataFrame | Tabular data processing and manipulation |
| Kubed | Visualization | JavaFX-based, D3.js-like visualizations |
| Dex | Charting | Java-based data visualization tool |
| JSoup | Data Wrangling | HTML parsing library for Java |
| Smile | ML and analytics | Comprehensive machine learning, NLP, linear algebra, graph, interpolation, and visualization system |
| ojAlgo! | LP and Optimization | Helpful library for linear/mixed optimization and linear algebra |
| Apache Commons Math | Math/Statistics/ML | General math, statistics, and ML library for Java |
| Apache Commons IO | IO | IO Utilities |
| JBlas | Linear Algebra | Linear Algebra for Java |
| OptaPlanner | Optimization | Solver utility for optimization planning problems |
| Charts | Charting | Scientific JavaFX charting library in development |
| CoreNLP | Natural Language Processing | Natural language processing toolkit |
| Renjin | Interop | R JVM implementation |
| Apache Mahout | Linear Algebra | Distributed framework for regression, clustering and recommendation |
| Weka | Data Mining Software | Collection of machine learning algorithms for data mining tasks |

Resources for Python Developers

For readers who are already proficient in Python but want to learn Kotlin and its potential in the data science domain.

| Name | Media | Topic | Description |
|---|---|---|---|
| From Data Science to Production with Kotlin (O'Reilly) | Video | Kotlin | Trains Python data science professionals transitioning to Kotlin |
| Kotlin for Data Science (KotlinConf) | Video | Kotlin | KotlinConf session explaining the merits of Kotlin for data science |
| Kotlin for Python Programmers | Document | Kotlin | Thorough documentation of Kotlin for Python devs |

Resources for Kotlin Developers

For veteran JVM/Kotlin developers trying to break into the broad, buzzwordy domain of "data science".

| Name | Media | Topic | Description |
|---|---|---|---|
| Brandon Rohrer | Blog | ML | Excellent videos and articles on machine learning topics |
| 3Blue1Brown | Video | Math, ML, etc | Excellent YouTube channel visually covering mathematical concepts, including linear algebra and neural networks |
| Thomas Nield | Video | ML, Optimization | Thomas Nield's YouTube channel covering ML and optimization topics, all in Kotlin! |
| Discrete Optimization (Coursera) | Online Class | Optimization | Deep dive class into search algorithms, optimization, as well as linear/integer programming |
| Make Your Own Neural Network | eBook | ML | The best practical guide on neural networks I've found |
| Python for the Busy Java Developer | eBook | Python | Helpful resource for Java devs to learn Python quickly |
| Data Science with Java (O'Reilly) | Book | Data Science | Teaches data science for Java developers |
| Mastering Java for Data Science (Packt) | Book | Data Science | Data science for Java developers |
| Mastering Java Machine Learning (Packt) | Book | ML | Machine learning for Java developers |
| Machine Learning for Absolute Beginners | eBook | ML | Excellent eBook to get high level understanding of ML |

Communities

| Name | Platform | Description |
|---|---|---|
| PySlackers | Slack | A Slack community of Python developers and data science professionals. |
| Kotlin Slack | Slack | A Slack community of Kotlin developers. Join the #datascience channel |

Blogs, Press, Media

| Name | Media | Description |
|---|---|---|
| KotlinConf - Mathematical Modeling with Kotlin | Conference | Thomas Nield talks about optimization and ML with Kotlin |
| Kotlin Machine Learning and Optimization | Video | Thomas' demos and walkthroughs of different optimization and machine learning algorithms in Kotlin |
| KotlinConf - Data Science Workflows with Kotlin | Conference | Holger Brandl demonstrates Krangl workflows in Kotlin |
| KotlinConf - Kotlin for Data Science | Conference | Thomas Nield explains the merits of Kotlin in the data science domain |
| KotlinConf - Kscript | Conference | Holger Brandl covers kscript for data science workflows |
| Talking Kotlin - Data Science with Thomas Nield | Podcast | Thomas Nield explains the merits of Kotlin in the data science domain |
| Kotlin's Emerging Data Science Ecosystem | Talk/Slides | Holger Brandl gives an update on Kotlin's state as a data science platform |
