chrisvnicholson / kotlin-data-science-resources

Curation of libraries, media, links, and other resources to use Kotlin for data science

Kotlin Data Science Resources

Libraries, media, links, and other resources to use Kotlin for data science applications

This document serves as an awesome-like curation of helpful links for using Kotlin in data science, data engineering, machine learning, and optimization. Please feel free to submit PRs with other links you find helpful.

Data science is a broad, buzzwordy domain that seeks to gain insight from data. Arguably, optimization and operations research algorithms play a role in this space as well. While the incumbent programming tools in data science are R, Python, and even Scala, there is a large opportunity for Kotlin to enter this space. Kotlin can add value by closing the gap between data science and software engineering, essentially finishing what Scala started.

With Kotlin/Native on the horizon, the scope of this document will hopefully expand beyond the JVM.

Showcases

Open-source applications and proof-of-concepts demonstrating data science modeling with Kotlin.

Project | Description
--- | ---
Kotlin Machine Learning Demos | A simple demo with from-scratch Kotlin machine learning algorithms, as well as library implementations
Kotlin Math Cheatsheet | How to turn mathematical symbol expressions into Kotlin code
Traveling Salesman Problem | A visual Kotlin demo of the Traveling Salesman Problem
Customer Wait Time Simulator | A simulation of customer wait time for a specified number of cashiers and rates of processing/arrival
Bayes Email Spam Filter | A Kotlin proof-of-concept implementation of a spam filter
Bayes User Input Prediction | A simple TornadoFX app that predicts user inputs using Naive Bayes text categorization
Sudokus and Schedules | An article on using Kotlin to solve Sudokus and scheduling problems from scratch
Linear Regression | Different algorithms for linear regression, written and visualized with Kotlin/TornadoFX
Kotlin Simple Logistic Regression | A logistic regression from scratch in Kotlin
Kotlin K-Means Clustering | A simple K-means clustering of points
Classroom Scheduler | A discrete programming model that schedules classes against one classroom
Sudoku Solver | A showcase of constraint programming and discrete optimization
Driver Shift Optimizer | A linear programming model using ojAlgo to minimize the cost of driver shifts in a day
Federated Learning - Building an Android ML App | Showcase of an Android app recognizing images using DL4J
Kubed Map Visualization | A U.S. heat map of unemployment rates
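
The "Linear Regression" showcase fits lines with several different algorithms. As a rough sketch of the simplest of them (written for illustration, not taken from that project), a closed-form ordinary least squares fit for a single feature needs nothing beyond the Kotlin standard library. `LineFit` and `fitLine` are hypothetical names.

```kotlin
// Closed-form ordinary least squares for one feature: y ≈ slope * x + intercept.
// LineFit and fitLine are hypothetical names used only for this sketch.
data class LineFit(val slope: Double, val intercept: Double) {
    fun predict(x: Double): Double = slope * x + intercept
}

fun fitLine(xs: List<Double>, ys: List<Double>): LineFit {
    require(xs.size == ys.size && xs.size >= 2) { "need at least two (x, y) pairs" }
    val meanX = xs.average()
    val meanY = ys.average()
    // slope = Σ (x - meanX)(y - meanY) / Σ (x - meanX)^2
    val sxy = xs.zip(ys).sumOf { (x, y) -> (x - meanX) * (y - meanY) }
    val sxx = xs.sumOf { (it - meanX) * (it - meanX) }
    require(sxx != 0.0) { "x values must not all be identical" }
    val slope = sxy / sxx
    // The least squares line always passes through (meanX, meanY).
    return LineFit(slope, meanY - slope * meanX)
}
```

For example, the points (1, 3), (2, 5), (3, 7), (4, 9) produce a slope of 2.0 and an intercept of 1.0. Gradient descent and other iterative methods reach the same line, but the closed form is the easiest place to start.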

Kotlin Libraries

Library Name | Category | Description
--- | --- | ---
Kotlin-Statistics | Analytics | Idiomatic statistical/analytical extension functions for Kotlin
KMath | Math/Linear Algebra | Kotlin mathematical library analogous to NumPy
okAlgo | Optimization | Kotlin extensions to ojAlgo
Data2Viz | Charts | Cross-platform charts and visuals for Kotlin
Sparklin | Scaled Data Processing | Kotlin framework for Apache Spark
Krangl | Analytics | dplyr-like data frame wrangling for Kotlin
Koma | Computation | Scientific library for Kotlin with interop/multiplatform capabilities
Komputation | Deep Learning | Neural network platform for Kotlin, primarily for text processing
KotlinNLP | Natural Language Processing | Natural Language Processing framework for Kotlin
TornadoFX | UI, Charts | Kotlin desktop UI app framework, built on top of JavaFX
TornadoFX-ControlsFX | UI | ControlsFX extensions with more data views and controls for TornadoFX
Kotlin Jupyter | Notebook | Kotlin support for Jupyter
JINX | Plugin | Create Excel functions with Java/Scala/Kotlin instead of VBA
Kotlin Algorithm | Algorithm | Kotlin algorithm implementations
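
Libraries like Kotlin-Statistics lean on Kotlin extension functions so that statistics read like methods on ordinary collections. The sketch below shows that idiom with hand-written, hypothetical functions (`median`, `sampleStandardDeviation`, `averageBy`). They are not Kotlin-Statistics' actual API, so check that library's documentation for its real function names.

```kotlin
import kotlin.math.sqrt

// Hypothetical, hand-written extensions illustrating the idiom;
// these are NOT the actual Kotlin-Statistics API.

/** Median: middle value after sorting, or the mean of the two middle values for even sizes. */
fun Iterable<Double>.median(): Double {
    val sorted = this.sorted()
    require(sorted.isNotEmpty()) { "median of an empty collection is undefined" }
    val mid = sorted.size / 2
    return if (sorted.size % 2 == 0) (sorted[mid - 1] + sorted[mid]) / 2.0 else sorted[mid]
}

/** Sample standard deviation, using the n - 1 denominator. */
fun Iterable<Double>.sampleStandardDeviation(): Double {
    val values = this.toList()
    require(values.size >= 2) { "need at least two values" }
    val mean = values.average()
    val sumOfSquares = values.sumOf { (it - mean) * (it - mean) }
    return sqrt(sumOfSquares / (values.size - 1))
}

/** Groups elements by a key and averages a numeric property within each group. */
fun <T, K> Iterable<T>.averageBy(keySelector: (T) -> K, valueSelector: (T) -> Double): Map<K, Double> =
    groupBy(keySelector).mapValues { (_, group) -> group.map(valueSelector).average() }
```

Because these are extensions, a call like `listOf(4.0, 1.0, 3.0, 2.0).median()` reads naturally and returns 2.5 without wrapping the data in a custom type.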

Java Libraries

Library Name | Category | Description
--- | --- | ---
DeepLearning4J | Deep Learning | Deep learning library for Java
ND4J | Computation | Efficient matrix math library for the JVM
TableSaw | DataFrame | Tabular data processing and manipulation
Joinery | DataFrame | Tabular data processing and manipulation
Kubed | Visualization | JavaFX-based, D3.js-like visualizations
Dex | Charting | Java-based data visualization tool
JSoup | Data Wrangling | HTML parsing library for Java
Smile | ML and analytics | Comprehensive machine learning, NLP, linear algebra, graph, interpolation, and visualization system
ojAlgo! | LP and Optimization | Helpful library for linear/mixed optimization and linear algebra
Apache Commons Math | Math/Statistics/ML | General math, statistics, and ML library for Java
Apache Commons IO | IO | IO utilities
JBlas | Linear Algebra | Linear algebra for Java
OptaPlanner | Optimization | Solver utility for optimization planning problems
Charts | Charting | Scientific JavaFX charting library in development
CoreNLP | Natural Language Processing | Natural language processing toolkit
Renjin | Interop | R JVM implementation
Apache Mahout | Linear Algebra | Distributed framework for regression, clustering and recommendation
Weka | Data Mining Software | Collection of machine learning algorithms for data mining tasks
Apache MXNet | Deep Learning Framework | Deep learning framework with a Java API (inference only)
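
Because Kotlin compiles to JVM bytecode, the Java libraries above can be called directly from Kotlin with no bindings. To keep this sketch dependency-free, it uses the JDK's own `java.util.DoubleSummaryStatistics` rather than any library from the table; `summarize` and `describe` are hypothetical helper names.

```kotlin
import java.util.DoubleSummaryStatistics

// Builds JDK summary statistics from a Kotlin list. There is no `new` keyword,
// and Java getters such as getAverage() are read as Kotlin properties (average).
fun summarize(values: List<Double>): DoubleSummaryStatistics {
    val stats = DoubleSummaryStatistics()
    values.forEach { stats.accept(it) } // Java method taking a primitive double
    return stats
}

// Formats the statistics using the property syntax Kotlin generates for Java getters.
fun describe(values: List<Double>): String {
    val s = summarize(values)
    return "n=${s.count} min=${s.min} max=${s.max} mean=${s.average}"
}
```

The same pattern, instantiating Java classes and reading getters as properties, applies to the listed libraries once they are added as dependencies.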

Resources for Python Developers

For those already proficient in Python who want to learn Kotlin and its potential in the data science domain.

Name | Media | Topic | Description
--- | --- | --- | ---
From Data Science to Production with Kotlin (O'Reilly) | Video | Kotlin | Trains Python data science professionals transitioning to Kotlin
Kotlin for Data Science (KotlinConf) | Video | Kotlin | KotlinConf session explaining the merits of Kotlin for data science
Kotlin for Python Programmers | Document | Kotlin | Thorough documentation of Kotlin aimed at Python developers

Resources for Kotlin Developers

For veteran JVM/Kotlin developers trying to break into the broad, buzzwordy domain of "data science".

Name | Media | Topic | Description
--- | --- | --- | ---
Brandon Rohrer | Blog | ML | Excellent videos and articles on machine learning topics
3Blue1Brown | Video | Math, ML, etc. | Excellent YouTube channel visually covering mathematical concepts, including linear algebra and neural networks
Thomas Nield | Video | ML, Optimization | Thomas Nield's YouTube channel covering ML and optimization topics, all in Kotlin!
PatrickJMT | Video | Math, Linear Algebra | Patrick is the most comprehensive and effective math tutor on YouTube
An Intuitive Introduction to Probability | Online class | Probability | A short but excellent beginner class introducing probability and its applications
Discrete Optimization (Coursera) | Online class | Optimization | Deep-dive class into search algorithms, optimization, and linear/integer programming
No BS Guide to Math and Physics | Book | Math | A helpful and unassuming book for learning high school/college math as well as calculus
No BS Guide to Linear Algebra | Book | Math | A helpful and unassuming book for learning linear algebra
Make Your Own Neural Network | eBook | ML | A gentle introduction to neural networks
Grokking Deep Learning | Book | ML | A thorough but practical "from scratch" approach to learning neural networks
Python for the Busy Java Developer | Book | Python | Helpful resource for Java devs to learn Python quickly
Data Science with Java (O'Reilly) | Book | Data Science | Teaches data science to Java developers
Mastering Java for Data Science (Packt) | Book | Data Science | Data science for Java developers
Mastering Java Machine Learning (Packt) | Book | ML | Machine learning for Java developers
Machine Learning for Absolute Beginners | eBook | ML | Excellent eBook for getting a high-level understanding of ML

FAQ

Q) I am a Java/Kotlin developer and I want to dive into "data science". Where do I start?

A) The funny thing about buzzwords is that they prompt people to be open to new ideas, but when you reach the actual problem-solving stage it can be difficult to know where to start. This is because "data science" means something different to every individual, company, and industry. As a result, it is easy for newcomers to chase machine learning as a solution to their problems when what they really need is discrete optimization. Some data scientists like tinkering with data sets in Jupyter notebooks and Pandas DataFrames, while software engineers find such approaches undisciplined and messy. Many data science book authors will tell you to become proficient in linear algebra, even though it can seem pointless to learn without seeing real-world applications.

In my opinion, the best way to break into "data science" is to find problems of a nondeterministic nature that you would like to solve. Self-started projects like generating a classroom schedule or categorizing text are great ways to gain insight and intuition quickly. They also channel you into the areas of "data science" you are most likely to be interested in.

I do think it is better to learn optimization before diving into machine learning, not just because optimization is a key part of ML, but also because it solves a large number of real-world problems on its own. Be sure to also develop a healthy curiosity for math and how it applies to the real world; YouTube channels like 3Blue1Brown do a wonderful job of fostering this kind of curiosity.
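
To give a flavor of discrete optimization, here is a sketch of the classic 0/1 knapsack problem (pick items to maximize total value without exceeding a weight capacity) solved with dynamic programming in plain Kotlin. It is one textbook technique chosen for illustration, not the method of any library listed here; `Item` and `knapsack` are hypothetical names, and dedicated solvers such as ojAlgo or OptaPlanner are better suited to larger, messier problems.

```kotlin
// 0/1 knapsack via dynamic programming. Item and knapsack are hypothetical names.
data class Item(val name: String, val weight: Int, val value: Int)

/** Returns the best total value and the chosen items for the given capacity. */
fun knapsack(items: List<Item>, capacity: Int): Pair<Int, List<Item>> {
    require(capacity >= 0 && items.all { it.weight >= 0 }) { "weights and capacity must be non-negative" }
    // best[i][w] = best value achievable using the first i items with capacity w
    val best = Array(items.size + 1) { IntArray(capacity + 1) }
    for (i in 1..items.size) {
        val item = items[i - 1]
        for (w in 0..capacity) {
            best[i][w] = best[i - 1][w] // option 1: skip this item
            if (item.weight <= w) {     // option 2: take it, if it fits
                best[i][w] = maxOf(best[i][w], best[i - 1][w - item.weight] + item.value)
            }
        }
    }
    // Walk back through the table: a value change between rows means the item was taken.
    val chosen = mutableListOf<Item>()
    var w = capacity
    for (i in items.size downTo 1) {
        if (best[i][w] != best[i - 1][w]) {
            val item = items[i - 1]
            chosen.add(item)
            w -= item.weight
        }
    }
    return best[items.size][capacity] to chosen.reversed()
}
```

For example, with items weighing 1, 3, 4, and 5 and worth 1, 4, 5, and 7, a capacity of 7 is best filled by the items weighing 3 and 4, for a total value of 9. The table costs O(items × capacity) time and memory, which is why large capacities call for other techniques.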


Q) Why are you proposing Kotlin for data science/machine learning when most people are using Python?

A) Kotlin is a production-grade, statically typed JVM language that fills the need for a more "pragmatic Scala". While many people are productive with Python, many organizations are finding that Kotlin can help them be more productive, especially when building entire production systems and not just models.

To give a more thorough answer, Roman Elizarov at JetBrains summarized it best in this post:

Most other “Python” deep learning frameworks are actually written in C++. Python is used just as a scripting language to “glue” various moving pieces together. Any other scripting language could be used instead, with the corresponding (relatively thin) bridge. Of course, the current momentum is on the Python side. If the past history on innovation and rise/fall of languages teaches us anything, the lesson we can learn is that it does not really matter who came first to the field and it does not even matter who is the leader now.

Kotlin is, without doubt, considerably more productive for any project of non-trivial size due to its static types and emphasis on toolability. Even at 10K+ Python LOCs you start to feel pains of a dynamic language. Python works nicely in slide-ware and in small code snippets of the kind you can put into iPython notebooks, where you can actually execute the code on your data and then enjoy code completion and integrated help on the actual, dynamically resolved object instances. As soon as you start writing the actual non-trivial code, abstracting it into modules, etc, it all starts to fail utterly – code completion and help becomes useless for any non-trivial framework even in state-of-the-art Python IDEs like PyCharm.

The first player in ML field that will realise that Python is roadblock to further scale will reap the benefits. All we can do in Kotlin team is to make sure that when this realisation comes, Kotlin is in good shape to serve as a viable alternative to be considered. - Roman Elizarov


Q) I'm a programmer but I'm bad at math. Is there any hope for me?

A) Absolutely. If you have successfully hacked your way through programming, you can hack your way through math. Granted, it is helpful to have been exposed to high school and college math, but chances are you forgot most of it anyway. The key is to learn the math that is useful for the problems you are trying to solve, and to pair it effectively with your programming skills. Math is also easier to learn as an adult when you have a purpose for it.

However, this can still be hard, and it is important not to give in to discouragement. Keep studying your roadblocks and use multiple resources to demystify what you are struggling with. When you cannot find an alternative explanation of a concept, it can help to study the confusing material repeatedly until it starts to sink in.

On top of learning enough math to solve the nondeterministic problems you are interested in, try to foster your general curiosity. Rather than memorizing as you may have in high school or college, take time to understand why something is so. 3Blue1Brown on YouTube and the book No BS Guide to Math and Physics are great resources for this. PatrickJMT does an excellent job of showing how to apply mathematical concepts, and this article effectively summarizes all the areas of applied math in data science.
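
As a small example of pairing math with code, summation (Σ) and product (Π) notation translate almost word for word into Kotlin collection functions. The function names below are hypothetical illustrations in the spirit of the Kotlin Math Cheatsheet listed under Showcases, not code from that project.

```kotlin
// Σ (summation) maps to sumOf; Π (product) maps to fold starting from 1.
// All function names here are hypothetical, chosen for illustration.

/** Σ_{i=1}^{n} i */
fun sumToN(n: Int): Int = (1..n).sumOf { it }

/** Π_{i=1}^{n} i, i.e. n factorial. Long is used because 13! already exceeds Int.MAX_VALUE. */
fun factorial(n: Int): Long = (1..n).fold(1L) { acc, i -> acc * i }

/** Mean: (1/n) Σ x_i */
fun mean(xs: List<Double>): Double {
    require(xs.isNotEmpty()) { "mean of an empty list is undefined" }
    return xs.sumOf { it } / xs.size
}

/** Dot product: Σ a_i * b_i over paired elements. */
fun dot(a: List<Double>, b: List<Double>): Double {
    require(a.size == b.size) { "vectors must have equal length" }
    return a.indices.sumOf { i -> a[i] * b[i] }
}
```

Reading a formula this way (the index range becomes a Kotlin range, the body of Σ becomes the lambda) is often enough to turn an unfamiliar equation in a paper into working code.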


Q) How can Kotlin do any machine learning or data science tasks when it does not match the library catalogue of Python or R?

A) Read the rest of this document and you will be surprised what the JVM ecosystem already offers :)


Communities

Name | Platform | Description
--- | --- | ---
PySlackers | Slack | A Slack community of Python developers and data science professionals
Kotlin Slack | Slack | A Slack community of Kotlin developers; join the #datascience channel

Blogs, Press, Media

Name | Media | Description
--- | --- | ---
KotlinConf - Mathematical Modeling with Kotlin | Conference | Thomas Nield talks about optimization and ML with Kotlin
Kotlin Machine Learning and Optimization | Video | Thomas' demos and walkthroughs of different optimization and machine learning algorithms in Kotlin
KotlinConf - Data Science Workflows with Kotlin | Conference | Holger Brandl demonstrates Krangl workflows in Kotlin
KotlinConf - Kotlin for Data Science | Conference | Thomas Nield explains the merits of Kotlin in the data science domain
KotlinConf - Kscript | Conference | Holger Brandl covers kscript for data science workflows
Talking Kotlin - Data Science with Thomas Nield | Podcast | Thomas Nield explains the merits of Kotlin in the data science domain
Kotlin's Emerging Data Science Ecosystem | Talk/Slides | Holger Brandl gives an update on Kotlin's state as a data science platform
