This document is an "awesome"-style curated list of helpful links for using Kotlin for data science, data engineering, machine learning, and optimization. Please feel free to submit PRs with other links you find helpful.
Data Science is a broad, buzzwordy domain that seeks to gain insight from data. Arguably, optimization and operations research algorithms play a role in this space as well. While the incumbent programming tools in data science are R, Python, and even Scala, there is a large opportunity for Kotlin to enter this space. Kotlin can add value by closing the gap between data science and software engineering, and essentially finish what Scala started.
With Kotlin/Native on the horizon, the scope of this document will hopefully expand beyond the JVM.
Open-source applications and proofs of concept demonstrating data science modeling with Kotlin.
Project | Description |
---|---|
Kotlin Machine Learning Demos | A simple demo with from-scratch Kotlin machine learning algorithms, as well as library implementations |
Kotlin Math Cheatsheet | How to turn mathematical symbol expressions into Kotlin code |
Traveling Salesman Problem | A visual Kotlin demo of the Traveling Salesman Problem |
Customer Wait Time Simulator | A simulation of customer wait time for a specified number of cashiers and rates of processing/arrival |
Bayes Email Spam Filter | A Kotlin proof-of-concept implementation of a spam filter |
Bayes User Input Prediction | A simple TornadoFX app that predicts user inputs using Naive Bayes text categorization |
Sudokus and Schedules | An article on using Kotlin to solve Sudokus and scheduling problems from scratch |
Linear Regression | Different algorithms for linear regression written and visualized with Kotlin/TornadoFX |
Kotlin Simple Logistic Regression | A logistic regression from scratch in Kotlin |
Kotlin K-Means Clustering | A simple K-means clustering of points |
Classroom Scheduler | A discrete programming model that schedules classes against one classroom |
Sudoku Solver | A showcase of constraint programming and discrete optimization |
Driver Shift Optimizer | A linear programming model using ojAlgo to minimize the cost of driver shifts in a day |
Federated Learning - Building an Android ML App | Showcase of an Android app recognizing images using DL4J |
Kubed Map Visualization | A U.S. heat map of unemployment rates |
Library Name | Category | Description |
---|---|---|
Kotlin-Statistics | Analytics | Idiomatic statistical/analytical extension functions for Kotlin |
KMath | Math/Linear Algebra | Kotlin mathematical library analogous to NumPy |
okAlgo | Optimization | Kotlin extensions to ojAlgo |
Data2Viz | Charts | Cross-platform charts and visuals for Kotlin |
Sparklin | Scaled Data Processing | Kotlin framework for Apache Spark |
Krangl | Analytics | dplyr-like data frame wrangling for Kotlin |
Koma | Computation | Scientific library for Kotlin with interop/multiplatform capabilities |
Komputation | Deep Learning | Neural network platform for Kotlin, primarily for text processing |
KotlinNLP | Natural Language Processing | Natural Language Processing framework for Kotlin |
TornadoFX | UI, Charts | Kotlin UI desktop app framework, built on top of JavaFX |
TornadoFX-ControlsFX | UI | ControlsFX extensions with more data views and controls for TornadoFX |
Kotlin Jupyter | Notebook | Kotlin support for Jupyter |
JINX | Plugin | Create Excel functions with Java/Scala/Kotlin instead of VBA |
Kotlin Algorithm | Algorithm | Kotlin algorithm implementations |
Library Name | Category | Description |
---|---|---|
DeepLearning4J | Deep Learning | Deep learning library for Java |
ND4J | Computation | Efficient matrix math library for JVM |
TableSaw | DataFrame | Tabular data processing and manipulation |
Joinery | DataFrame | Tabular data processing and manipulation |
Kubed | Visualization | JavaFX-based, D3.js-like visualizations |
Dex | Charting | Java-based data visualization tool |
JSoup | Data Wrangling | HTML parsing library for Java |
Smile | ML and analytics | Comprehensive machine learning, NLP, linear algebra, graph, interpolation, and visualization system |
ojAlgo! | LP and Optimization | Helpful library for linear/mixed optimization and linear algebra |
Apache Commons Math | Math/Statistics/ML | General math, statistics, and ML library for Java |
Apache Commons IO | IO | IO Utilities |
JBlas | Linear Algebra | Linear Algebra for Java |
OptaPlanner | Optimization | Solver utility for optimization planning problems |
Charts | Charting | Scientific JavaFX charting library in development |
CoreNLP | Natural Language Processing | Natural language processing toolkit |
Renjin | Interop | R JVM implementation |
Apache Mahout | Linear Algebra | Distributed framework for regression, clustering and recommendation |
Weka | Data Mining Software | Collection of machine learning algorithms for data mining tasks |
Apache MXNet | Deep Learning Framework | Deep learning framework with a Java API (inference only) |
These resources are for readers who are already proficient in Python but want to learn Kotlin and its potential in the data science domain.
Name | Media | Topic | Description |
---|---|---|---|
From Data Science to Production with Kotlin (O'Reilly) | Video | Kotlin | Trains Python data science professionals transitioning to Kotlin |
Kotlin for Data Science (KotlinConf) | Video | Kotlin | KotlinConf session explaining the merits of Kotlin for data science |
Kotlin for Python Programmers | Document | Kotlin | Thorough documentation of Kotlin for Python devs |
These resources are for veteran JVM/Kotlin developers trying to break into the broad, buzzwordy domain of "data science".
Name | Media | Topic | Description |
---|---|---|---|
Brandon Rohrer | Blog | ML | Excellent videos and articles on machine learning topics |
3Blue1Brown | Video | Math, ML, etc | Excellent YouTube channel visually covering mathematical concepts, including linear algebra and neural networks |
Thomas Nield | Video | ML, Optimization | Thomas Nield's YouTube channel covering ML and optimization topics, all in Kotlin! |
PatrickJMT | Video | Math, Linear Algebra | Patrick is the most comprehensive and effective math tutor on YouTube |
An Intuitive Introduction to Probability | Online Class | Probability | A short but excellent beginner class introducing probability and its applications |
Discrete Optimization (Coursera) | Online Class | Optimization | Deep-dive class into search algorithms, optimization, and linear/integer programming |
No BS Guide to Math and Physics | Book | Math | A helpful and unassuming book to learn high school/college math as well as calculus |
No BS Guide to Linear Algebra | Book | Math | A helpful and unassuming book to learn linear algebra |
Make Your Own Neural Network | eBook | ML | A gentle introduction to neural networks |
Grokking Deep Learning | Book | ML | A thorough but practical "from scratch" approach to learning neural networks |
Python for the Busy Java Developer | Book | Python | Helpful resource for Java devs to learn Python quickly |
Data Science with Java (O'Reilly) | Book | Data Science | Teaches data science for Java developers |
Mastering Java for Data Science (Packt) | Book | Data Science | Data science for Java developers |
Mastering Java Machine Learning (Packt) | Book | ML | Machine learning for Java developers |
Machine Learning for Absolute Beginners | eBook | ML | Excellent eBook to get high level understanding of ML |
Q) I am a Java/Kotlin developer and I want to dive into "data science". Where do I start?
A) The funny thing about buzzwords is they do prompt people to be open to new ideas, but when you reach the actual problem-solving stage it can be difficult knowing where to start. This is because "data science" means something different to every individual, company, and industry. Therefore it is easy for newcomers to chase machine learning as a solution to their problems when what they really need is discrete optimization. Some data scientists like tinkering with data sets in Jupyter notebooks and Pandas DataFrames while software engineers find such approaches undisciplined and messy. Many data science book authors will tell you to become proficient in linear algebra, even though it seems pointless to learn without seeing real-world applications.
In my opinion, the best way to break into "data science" is to find problems with a nondeterministic nature that you would like to solve. Self-started projects like generating a classroom schedule or categorizing text are great ways to get insight and intuition pretty quickly. It also channels you into the areas of "data science" you will more likely be interested in.
I do think it is better to start learning optimization before diving into machine learning, not just because optimization is a key part of ML, but also because it solves a larger number of real-world problems on its own. Be sure to also develop a healthy curiosity for math and how it applies to the real world. YouTube channels like 3Blue1Brown do a wonderful job fostering this kind of curiosity.
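To make the idea of a small, self-started optimization project concrete, here is a minimal sketch of a toy classroom scheduler in plain Kotlin. It is not taken from any of the linked projects: the class names, lengths, slot count, and the `ScheduledClass` and `schedule` names are all made up for illustration. It uses a simple depth-first backtracking search, which is only one of many possible approaches.

```kotlin
// Hypothetical toy problem: fit classes of different lengths into one classroom
// that has a fixed number of hourly slots, so that no two classes overlap.
data class ScheduledClass(val name: String, val startSlot: Int, val length: Int)

/**
 * Depth-first backtracking search. Tries each start slot for each class in turn
 * and undoes a placement when it leads to a dead end.
 * Returns null when no non-overlapping schedule exists.
 */
fun schedule(lengths: Map<String, Int>, totalSlots: Int): List<ScheduledClass>? {
    val names = lengths.keys.toList()
    val occupied = BooleanArray(totalSlots)
    val placed = ArrayList<ScheduledClass>()

    fun place(index: Int): Boolean {
        if (index == names.size) return true              // every class has a slot
        val name = names[index]
        val length = lengths.getValue(name)
        for (start in 0..totalSlots - length) {
            val range = start until start + length
            if (range.any { occupied[it] }) continue       // overlaps an earlier class
            range.forEach { occupied[it] = true }
            placed.add(ScheduledClass(name, start, length))
            if (place(index + 1)) return true
            placed.removeAt(placed.lastIndex)              // backtrack
            range.forEach { occupied[it] = false }
        }
        return false
    }

    return if (place(0)) placed.toList() else null
}

fun main() {
    // Made-up input: class name to length in slots
    val classes = mapOf("Biology" to 2, "Calculus" to 3, "Psych 101" to 1)
    val result = schedule(classes, totalSlots = 6)
    if (result == null) {
        println("No feasible schedule")
    } else {
        result.forEach { println("${it.name}: slots ${it.startSlot} until ${it.startSlot + it.length}") }
    }
}
```

Brute-force search like this only scales to small inputs. Real scheduling problems usually call for smarter heuristics or a solver such as the optimization libraries listed in this document.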
Q) Why are you proposing using Kotlin for data science/machine learning purposes when most people are using Python?
A) Kotlin is a production-grade, statically typed JVM language that fills the much-needed role of a more "pragmatic Scala". While many people are productive with Python, many organizations are finding Kotlin can help them be more productive. This is especially true when it comes to building entire production systems and not just models.
To give a more thorough answer, Roman Elizarov at JetBrains summarized it best in this post:
Most other “Python” deep learning frameworks are actually written in C++. Python is used just as a scripting language to “glue” various moving pieces together. Any other scripting language could be used instead, with the corresponding (relatively thin) bridge. Of course, the current momentum is on the Python side. If the past history on innovation and rise/fall of languages teaches us anything, the lesson we can learn is that it does not really matter who came first to the field and it does not even matter who is the leader now.
Kotlin is, without doubt, considerably more productive for any project of non-trivial size due to its static types and emphasis on toolability. Even at 10K+ Python LOCs you start to feel pains of a dynamic language. Python works nicely in slide-ware and in small code snippets of the kind you can put into iPython notebooks, where you can actually execute the code on your data and then enjoy code completion and integrated help on the actual, dynamically resolved object instances. As soon as you start writing the actual non-trivial code, abstracting it into modules, etc, it all starts to fail utterly – code completion and help becomes useless for any non-trivial framework even in state-of-the-art Python IDEs like PyCharm.
The first player in ML field that will realise that Python is roadblock to further scale will reap the benefits. All we can do in Kotlin team is to make sure that when this realisation comes, Kotlin is in good shape to serve as a viable alternative to be considered. - Roman Elizarov
Q) I'm a programmer but I'm bad at math. Is there any hope for me?
A) Absolutely. If you have successfully hacked your way through programming, you can hack your way through math. Granted, it is helpful to have gone through the exposure of high school and college math, but chances are you forgot most of it anyway. The key is to learn math that is useful for the problems you are trying to solve, and to pair it effectively with your programming skills. It is also easier to learn as an adult when you have a purpose for it.
However, this can still be hard and it is important not to give in to discouragement. Keep studying your roadblocks and use multiple resources to demystify what you are struggling with. When you cannot find an alternative explanation of a concept, it can help to study what is confusing you repeatedly until you start absorbing it.
On top of learning enough math to solve nondeterministic problems you are interested in, try to foster your general curiosity. Do not memorize like in high school/college but take time to understand why something is so. 3Blue1Brown on YouTube and the book No BS Guide to Math and Physics are great resources for this. PatrickJMT does an excellent job showing how to apply mathematical concepts, and this article effectively summarizes all the areas of applied math in data science.
Q) How can Kotlin do any machine learning or data science tasks when it does not match the library catalogue of Python or R?
A) Read the rest of this document and you will be surprised what the JVM ecosystem already offers :)
Name | Platform | Description |
---|---|---|
PySlackers | Slack | A Slack community of Python developers and data science professionals. |
Kotlin Slack | Slack | A Slack community of Kotlin developers. Join the #datascience channel |
Name | Media | Description |
---|---|---|
KotlinConf - Mathematical Modeling with Kotlin | Conference | Thomas Nield talks about optimization and ML with Kotlin |
Kotlin Machine Learning and Optimization | Video | Thomas' demos and walkthroughs of different optimization and machine learning algorithms in Kotlin |
KotlinConf - Data Science Workflows with Kotlin | Conference | Holger Brandl demonstrates Krangl workflows in Kotlin |
KotlinConf - Kotlin for Data Science | Conference | Thomas Nield explains the merits of Kotlin in the data science domain |
KotlinConf - Kscript | Conference | Holger Brandl covers kscript for data science workflows |
Talking Kotlin - Data Science with Thomas Nield | Podcast | Thomas Nield explains the merits of Kotlin in the data science domain |
Kotlin's Emerging Data Science Ecosystem | Talk/Slides | Holger Brandl gives an update on Kotlin's state as a data science platform |