Kotlin Data Science Resources
Libraries, media, links, and other resources to use Kotlin for data science applications
This document serves as an awesome-like curation of helpful links for using Kotlin for data science, data engineering, machine learning, and optimization. Please feel free to submit PRs with other links you find helpful.
Data Science is a broad, buzzwordy domain that seeks to gain insight from data. Arguably, optimization and operations research algorithms play a role in this space as well. While the incumbent programming tools in data science are R, Python, and even Scala, there is a large opportunity for Kotlin to enter this space. Kotlin can add value by closing the gap between data science and software engineering, and essentially finish what Scala started.
With Kotlin/Native on the horizon, the scope of this document will hopefully expand beyond the JVM.
FAQ
Q) I am a Java/Kotlin developer and I want to dive into "data science". Where do I start?
A) The funny thing about buzzwords is that they do prompt people to be open to new ideas, but when you reach the actual problem-solving stage it can be difficult to know where to start. This is because "data science" means something different to every individual, company, and industry. As a result, it is easy for newcomers to chase machine learning as a solution to their problems when what they really need is discrete optimization. Some data scientists like tinkering with data sets in Jupyter notebooks and Pandas DataFrames, while software engineers find such approaches undisciplined and messy. Many data science book authors will tell you to become proficient in linear algebra, even though it can seem pointless to learn without seeing real-world applications.
In my opinion, the best way to break into "data science" is to find problems with a nondeterministic nature that you would like to solve. Self-started projects like generating a classroom schedule or categorizing text are great ways to get insight and intuition pretty quickly. It also channels you into the areas of "data science" you will more likely be interested in.
I do think it is better to start learning optimization before diving into machine learning, as optimization is a key part of ML (training a model usually means minimizing a loss function). Be sure to also develop a healthy curiosity for math and how it applies to the real world. YouTube channels like 3Blue1Brown do a wonderful job fostering this kind of curiosity.
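To make the link between optimization and ML concrete, here is a minimal sketch (not taken from any library listed in this document) that fits a line y = mx + b by using plain gradient descent to minimize mean squared error. The `fitLine` function and `Line` class are hypothetical names, and the learning rate and iteration count are illustrative assumptions, not tuned values.

```kotlin
// Result of fitting y = m*x + b
data class Line(val m: Double, val b: Double)

// Minimizes mean squared error MSE = (1/n) * Σ (y_i - (m*x_i + b))^2 with gradient descent.
fun fitLine(
    xs: DoubleArray,
    ys: DoubleArray,
    learningRate: Double = 0.01,
    iterations: Int = 10_000
): Line {
    require(xs.size == ys.size && xs.isNotEmpty()) { "xs and ys must be non-empty and of equal length" }
    val n = xs.size
    var m = 0.0
    var b = 0.0
    repeat(iterations) {
        // Accumulate the partial derivatives of MSE with respect to m and b
        var dm = 0.0
        var db = 0.0
        for (i in 0 until n) {
            val error = ys[i] - (m * xs[i] + b)
            dm += -2.0 * xs[i] * error / n
            db += -2.0 * error / n
        }
        // Take a small step against the gradient
        m -= learningRate * dm
        b -= learningRate * db
    }
    return Line(m, b)
}

fun main() {
    // Toy data lying exactly on y = 2x + 1
    val xs = doubleArrayOf(0.0, 1.0, 2.0, 3.0, 4.0)
    val ys = doubleArrayOf(1.0, 3.0, 5.0, 7.0, 9.0)
    val line = fitLine(xs, ys)
    println("m=%.3f b=%.3f".format(line.m, line.b))
}
```

On this toy data the slope and intercept converge toward 2 and 1. The "model" is just the line; everything that makes it learn is the optimization loop.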
Q) Why are you proposing using Kotlin for data science/machine learning purposes when most people are using Python?
A) Kotlin is a production-grade and statically typed JVM language that fills the much-needed void of a more "pragmatic Scala". While many people are being productive with Python, many organizations are finding Kotlin can help them be more productive. This is especially true when it comes to building entire production systems and not just models.
To give a more thorough answer, Roman Elizarov at JetBrains summarized it best in this post:
Most other “Python” deep learning frameworks are actually written in C++. Python is used just as a scripting language to “glue” various moving pieces together. Any other scripting language could be used instead, with the corresponding (relatively thin) bridge. Of course, the current momentum is on the Python side. If the past history on innovation and rise/fall of languages teaches us anything, the lesson we can learn is that it does not really matter who came first to the field and it does not even matter who is the leader now.
Kotlin is, without doubt, considerably more productive for any project of non-trivial size due to its static types and emphasis on toolability. Even at 10K+ Python LOCs you start to feel pains of a dynamic language. Python works nicely in slide-ware and in small code snippets of the kind you can put into iPython notebooks, where you can actually execute the code on your data and then enjoy code completion and integrated help on the actual, dynamically resolved object instances. As soon as you start writing the actual non-trivial code, abstracting it into modules, etc, it all starts to fail utterly – code completion and help becomes useless for any non-trivial framework even in state-of-the-art Python IDEs like PyCharm.
The first player in ML field that will realise that Python is roadblock to further scale will reap the benefits. All we can do in Kotlin team is to make sure that when this realisation comes, Kotlin is in good shape to serve as a viable alternative to be considered. - Roman Elizarov
Q) How can Kotlin do any machine learning or data science tasks when it does not match the library catalogue of Python or R?
A) Read the rest of this document and you will be surprised what the JVM ecosystem already offers :)
Showcases
Open-source applications and proof-of-concepts demonstrating data science modeling with Kotlin.
Project | Description |
---|---|
Federated Learning - Building an Android ML App | Showcase of an Android app recognizing images using DL4J |
Kotlin Math Cheatsheet | How to turn mathematical symbol expressions into Kotlin code |
Bayes Email Spam Filter | A Kotlin proof-of-concept implementation of a spam filter |
Bayes User Input Prediction | A simple TornadoFX app that predicts user inputs using Naive Bayes text categorization |
Linear Regression | Different algorithms for linear regression written and visualized with Kotlin/TornadoFX |
Classroom Scheduler | A discrete programming model that schedules classes against one classroom |
Sudoku Solver | A showcase of constraint programming and discrete optimization |
Traveling Salesman Problem | A visual Kotlin demo of the Traveling Salesman Problem |
Driver Shift Optimizer | A linear programming model using ojAlgo to minimize the cost of driver shifts in a day |
Kotlin Simple Neural Network | A simple application built with a Kotlin-implemented neural network |
Kubed Map Visualization | A U.S. heat map of unemployment rates |
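The Bayes showcases above rely on naive Bayes text categorization. As a rough, self-contained sketch of that technique (not the code from those projects), the following multinomial naive Bayes classifier uses Laplace (add-one) smoothing and sums log-probabilities to avoid floating-point underflow. The `NaiveBayes` class, its `train`/`classify` methods, and the simple regex tokenizer are hypothetical choices made for this example.

```kotlin
import kotlin.math.ln

// Minimal multinomial naive Bayes text classifier with Laplace (add-one) smoothing.
class NaiveBayes {
    private val wordCounts = mutableMapOf<String, MutableMap<String, Int>>() // label -> word -> count
    private val docCounts = mutableMapOf<String, Int>()                       // label -> number of training docs
    private val vocabulary = mutableSetOf<String>()

    // Lowercases and splits on non-word characters
    private fun tokenize(text: String): List<String> =
        text.lowercase().split(Regex("\\W+")).filter { it.isNotBlank() }

    fun train(text: String, label: String) {
        docCounts[label] = (docCounts[label] ?: 0) + 1
        val counts = wordCounts.getOrPut(label) { mutableMapOf() }
        for (word in tokenize(text)) {
            counts[word] = (counts[word] ?: 0) + 1
            vocabulary += word
        }
    }

    fun classify(text: String): String {
        val totalDocs = docCounts.values.sum().toDouble()
        val words = tokenize(text)
        return docCounts.keys.maxByOrNull { label ->
            val counts = wordCounts.getValue(label)
            val totalWords = counts.values.sum().toDouble()
            // log P(label) + Σ log P(word | label), with add-one smoothing for unseen words
            ln(docCounts.getValue(label) / totalDocs) + words.sumOf { word ->
                ln(((counts[word] ?: 0) + 1) / (totalWords + vocabulary.size))
            }
        } ?: error("classifier has not been trained")
    }
}

fun main() {
    val classifier = NaiveBayes()
    classifier.train("win money now", "spam")
    classifier.train("cheap money offer", "spam")
    classifier.train("meeting schedule tomorrow", "ham")
    classifier.train("project meeting notes", "ham")
    println(classifier.classify("money offer"))   // spam
    println(classifier.classify("meeting notes")) // ham
}
```

A real spam filter would need better tokenization and far more training data; the point here is only the shape of the algorithm: count words per category, then pick the category with the highest smoothed log-likelihood.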
Kotlin Libraries
Library Name | Category | Description |
---|---|---|
Kotlin-Statistics | Analytics | Idiomatic statistical/analytical extension functions for Kotlin |
okAlgo | Optimization | Kotlin extensions to ojAlgo |
Data2Viz | Charts | Cross-platform charts and visuals for Kotlin |
Sparklin | Scaled Data Processing | Kotlin framework for Apache Spark |
Krangl | Analytics | dplyr-like data frame wrangling for Kotlin |
Koma | Computation | Scientific library for Kotlin with interop/multiplatform capabilities |
Komputation | Deep Learning | Neural network platform for Kotlin, primarily for text processing |
KotlinNLP | Natural Language Processing | Natural Language Processing framework for Kotlin |
TornadoFX | UI, Charts | Kotlin UI desktop app framework, built on top of JavaFX |
TornadoFX-ControlsFX | UI | ControlsFX extensions with more data views and controls for TornadoFX |
Kotlin Jupyter | Notebook | Kotlin support for Jupyter |
JINX | Plugin | Create Excel functions with Java/Scala/Kotlin instead of VBA |
Kotlin Algorithm | Algorithm | Kotlin algorithm implementations |
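Libraries such as Kotlin-Statistics lean on Kotlin extension functions so that analytics read fluently on ordinary collections. The sketch below illustrates that style with a hypothetical `averageBy` extension and a made-up `Patient` data class. It is not the Kotlin-Statistics API, only an illustration of the language feature such libraries build on.

```kotlin
// Hypothetical extension function in the spirit of "idiomatic statistical extension functions".
// NOT taken from the Kotlin-Statistics library.
fun <T, K> Iterable<T>.averageBy(
    keySelector: (T) -> K,
    valueSelector: (T) -> Double
): Map<K, Double> =
    groupBy(keySelector).mapValues { (_, items) -> items.map(valueSelector).average() }

// Made-up sample record type for the example
data class Patient(val gender: String, val age: Int)

fun main() {
    val patients = listOf(
        Patient("F", 34), Patient("M", 51), Patient("F", 28), Patient("M", 45)
    )
    // Average age per gender
    println(patients.averageBy({ it.gender }, { it.age.toDouble() })) // {F=31.0, M=48.0}
}
```

Because the function extends `Iterable<T>`, it works on any list of domain objects without wrapping them in a special data structure first.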
Java Libraries
Library Name | Category | Description |
---|---|---|
DeepLearning4J | Deep Learning | Deep learning library for Java |
ND4J | Computation | Efficient matrix math library for the JVM |
TableSaw | DataFrame | Tabular data processing and manipulation |
Joinery | DataFrame | Tabular data processing and manipulation |
Kubed | Visualization | JavaFX-based, D3.js-like visualizations |
Dex | Charting | Java-based data visualization tool |
JSoup | Data Wrangling | HTML parsing library for Java |
Smile | ML and analytics | Comprehensive machine learning, NLP, linear algebra, graph, interpolation, and visualization system |
ojAlgo! | LP and Optimization | Helpful library for linear/mixed optimization and linear algebra |
Apache Commons Math | Math/Statistics/ML | General math, statistics, and ML library for Java |
Apache Commons IO | IO | IO Utilities |
JBlas | Linear Algebra | Linear Algebra for Java |
OptaPlanner | Optimization | Solver utility for optimization planning problems |
Charts | Charting | Scientific JavaFX charting library in development |
CoreNLP | Natural Language Processing | Natural language processing toolkit |
Renjin | Interop | R JVM implementation |
Apache Mahout | Linear Algebra | Distributed framework for regression, clustering and recommendation |
Weka | Data Mining Software | Collection of machine learning algorithms for data mining tasks |
Resources for Python Developers
If you are already proficient in Python and want to learn Kotlin and its potential in the data science domain.
Name | Media | Topic | Description |
---|---|---|---|
From Data Science to Production with Kotlin (O'Reilly) | Video | Kotlin | Trains Python data science professionals transitioning to Kotlin |
Kotlin for Data Science (KotlinConf) | Video | Kotlin | KotlinConf session explaining the merits of Kotlin for data science |
Kotlin for Python Programmers | Document | Kotlin | Thorough documentation of Kotlin for Python developers |
Resources for Kotlin Developers
If you are a veteran JVM/Kotlin developer trying to break into the broad, buzzwordy domain of "data science".
Name | Media | Topic | Description |
---|---|---|---|
Brandon Rohrer | Blog | ML | Excellent videos and articles on machine learning topics |
3Blue1Brown | Video | Math, ML, etc | Excellent YouTube channel visually covering mathematical concepts, including linear algebra and neural networks |
Thomas Nield | Video | ML, Optimization | Thomas Nield's YouTube channel covering ML and optimization topics, all in Kotlin! |
Discrete Optimization (Coursera) | Online Class | Optimization | Deep dive class into search algorithms, optimization, as well as linear/integer programming |
Make Your Own Neural Network | eBook | ML | The best practical guide on neural networks I've found |
Python for the Busy Java Developer | eBook | Python | Helpful resource for Java devs to learn Python quickly |
Data Science with Java (O'Reilly) | Book | Data Science | Teaches data science for Java developers |
Mastering Java for Data Science (Packt) | Book | Data Science | Data science for Java developers |
Mastering Java Machine Learning (Packt) | Book | ML | Machine learning for Java developers |
Machine Learning for Absolute Beginners | eBook | ML | Excellent eBook for getting a high-level understanding of ML |
Communities
Name | Platform | Description |
---|---|---|
PySlackers | Slack | A Slack community of Python developers and data science professionals. |
Kotlin Slack | Slack | A Slack community of Kotlin developers. Join the #datascience channel |
Blogs, Press, Media
Name | Media | Description |
---|---|---|
KotlinConf - Mathematical Modeling with Kotlin | Conference | Thomas Nield talks about optimization and ML with Kotlin |
Kotlin Machine Learning and Optimization | Video | Thomas' demos and walkthroughs of different optimization and machine learning algorithms in Kotlin |
KotlinConf - Data Science Workflows with Kotlin | Conference | Holger Brandl demonstrates Krangl workflows in Kotlin |
KotlinConf - Kotlin for Data Science | Conference | Thomas Nield explains the merits of Kotlin in the data science domain |
KotlinConf - Kscript | Conference | Holger Brandl covers kscript for data science workflows |
Talking Kotlin - Data Science with Thomas Nield | Podcast | Thomas Nield explains the merits of Kotlin in the data science domain |
Kotlin's Emerging Data Science Ecosystem | Talk/Slides | Holger Brandl gives an update on Kotlin's state as a data science platform |