nekomatic / kotlin-data-science-resources

Curation of libraries, media, links, and other resources to use Kotlin for data science

Kotlin Data Science Resources

Libraries, media, links, and other resources to use Kotlin for data science applications

This document is an awesome-list-style curation of helpful links for using Kotlin in data science, data engineering, machine learning, and optimization. Please feel free to submit PRs with other links you find helpful.

Data Science is a broad, buzzwordy domain that seeks to gain insight from data. Arguably, optimization and operations research algorithms play a role in this space as well. While the incumbent programming tools in data science are R, Python, and even Scala, there is a large opportunity for Kotlin to enter this space. Kotlin can add value by closing the gap between data science and software engineering, and essentially finish what Scala started.

With Kotlin/Native on the horizon, the scope of this document will hopefully expand beyond the JVM.

Showcases

Open-source applications and proofs of concept demonstrating data science modeling with Kotlin.

| Project | Description |
| --- | --- |
| Kotlin Math Cheatsheet | How to turn mathematical symbol expressions into Kotlin code |
| Bayes Email Spam Filter | A Kotlin proof-of-concept implementation of a spam filter |
| Bayes User Input Prediction | A simple TornadoFX app that predicts user inputs using Naive Bayes text categorization |
| Classroom Scheduler | A discrete programming model that schedules classes against one classroom |
| Sudoku Solver | A showcase of constraint programming and discrete optimization |
| Traveling Salesman Problem | A visual Kotlin demo of the Traveling Salesman Problem |
| Driver Shift Optimizer | A linear programming model using ojAlgo to minimize the cost of driver shifts in a day |
| Kotlin Simple Neural Network | A simple application built with a Kotlin-implemented neural network |
| Kubed Map Visualization | A U.S. heat map of unemployment rates |
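
As a rough illustration of what "turning mathematical symbol expressions into Kotlin code" can look like, the sketch below maps summation (Σ) and product (Π) notation onto Kotlin standard-library calls. The helper names `sigma` and `product` are hypothetical and are not taken from the Kotlin Math Cheatsheet itself.

```kotlin
// Hypothetical helper names; only the Kotlin standard library (1.4+) is used.

// Sigma notation: the sum of f(i) for i = from..to
fun sigma(from: Int, to: Int, f: (Int) -> Double): Double =
    (from..to).sumOf { f(it) }

// Pi notation: the product of f(i) for i = from..to
fun product(from: Int, to: Int, f: (Int) -> Double): Double =
    (from..to).fold(1.0) { acc, i -> acc * f(i) }
```

For instance, `sigma(1, 4) { it.toDouble() }` evaluates 1 + 2 + 3 + 4, and `product(1, 4) { it.toDouble() }` evaluates 4!.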

Kotlin Libraries

| Library Name | Category | Description |
| --- | --- | --- |
| Kotlin-Statistics | Analytics | Idiomatic statistical/analytical extension functions for Kotlin |
| okAlgo | Optimization | Kotlin extensions to ojAlgo |
| Data2Viz | Charts | Cross-platform charts and visuals for Kotlin |
| Sparklin | Scaled Data Processing | Kotlin framework for Apache Spark |
| Krangl | Analytics | dplyr-like data frame wrangling for Kotlin |
| Koma | Computation | Scientific library for Kotlin with interop/multiplatform capabilities |
| Komputation | Deep Learning | Neural network platform for Kotlin, primarily for text processing |
| KotlinNLP | Natural Language Processing | Natural Language Processing framework for Kotlin |
| TornadoFX | UI, Charts | Kotlin UI desktop app framework, built on top of JavaFX |
| TornadoFX-ControlsFX | UI | ControlsFX extensions with more data views and controls for TornadoFX |
| Kotlin Jupyter | Notebook | Kotlin support for Jupyter |
| JINX | Plugin | Create Excel functions with Java/Scala/Kotlin instead of VBA |
| Kotlin Algorithm | Algorithm | Kotlin algorithm implementations |
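
Several entries above describe themselves as "idiomatic extension functions". As a minimal sketch of that idea, the block below adds two statistics helpers to any `Iterable<Double>`. The names `sampleVariance` and `sampleStdDev` are hypothetical, written against the standard library only, and should not be read as the API of Kotlin-Statistics or any other library in the table.

```kotlin
import kotlin.math.sqrt

// Hypothetical extension functions, NOT the API of any library listed above.

// Unbiased sample variance (divides by n - 1)
fun Iterable<Double>.sampleVariance(): Double {
    val values = toList()
    require(values.size >= 2) { "need at least two values" }
    val mean = values.average()
    return values.sumOf { (it - mean) * (it - mean) } / (values.size - 1)
}

// Sample standard deviation, derived from the variance above
fun Iterable<Double>.sampleStdDev(): Double = sqrt(sampleVariance())
```

With these in scope, a call such as `listOf(1.0, 2.0, 3.0, 4.0, 5.0).sampleVariance()` reads as if the function were a member of the collection, which is the ergonomic benefit such libraries aim for.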

Java Libraries

| Library Name | Category | Description |
| --- | --- | --- |
| TensorFlow | ML | Machine Learning library from Google |
| ND4J | Computation | Efficient matrix math library for JVM |
| TableSaw | DataFrame | Tabular data processing and manipulation |
| Joinery | DataFrame | Tabular data processing and manipulation |
| Kubed | Visualization | JavaFX-based, D3.js-like visualizations |
| Dex | Charting | Java-based data visualization tool |
| JSoup | Data Wrangling | HTML parsing library for Java |
| Smile | ML and analytics | Comprehensive machine learning, NLP, linear algebra, graph, interpolation, and visualization system |
| ojAlgo! | LP and Optimization | Helpful library for linear/mixed optimization and linear algebra |
| DeepLearning4J | Deep Learning | Deep learning library for Java |
| Apache Commons Math | Math/Statistics/ML | General math, statistics, and ML library for Java |
| Java Naive Bayes Classifier | Math/Statistics | Naive Bayes classifier for Java |
| Apache Commons IO | IO | IO Utilities |
| JBlas | Linear Algebra | Linear Algebra for Java |
| OptaPlanner | Optimization | Solver utility for optimization planning problems |
| Charts | Charting | Scientific JavaFX charting library in development |
| CoreNLP | Natural Language Processing | Natural language processing toolkit |
| Renjin | Interop | R JVM implementation |
| Apache Mahout | Linear Algebra | Distributed framework for regression, clustering and recommendation |
| Weka | Data Mining Software | Collection of machine learning algorithms for data mining tasks |

Resources for Python Developers

These resources are for readers who are already proficient in Python and want to learn Kotlin and its potential in the data science domain.

| Name | Media | Topic | Description |
| --- | --- | --- | --- |
| From Data Science to Production with Kotlin (O'Reilly) | Video | Kotlin | Trains Python data science professionals transitioning to Kotlin |
| Kotlin for Data Science (KotlinConf) | Video | Kotlin | KotlinConf session explaining the merits of Kotlin for data science |
| Kotlin for Python Programmers | Blog | Kotlin | Blog relating Kotlin concepts to a Pythonista audience |

Resources for Kotlin Developers

These resources are for veteran JVM/Kotlin developers trying to break into the broad, buzzwordy domain of "data science".

| Name | Media | Topic | Description |
| --- | --- | --- | --- |
| Brandon Rohrer | Blog | ML | Excellent videos and articles on machine learning topics |
| 3Blue1Brown | Video | Math, ML, etc | Excellent YouTube channel visually covering mathematical concepts, including neural networks |
| Make Your Own Neural Network | eBook | ML | The best practical guide on neural networks I've found |
| Python for the Busy Java Developer | eBook | Python | Helpful resource for Java devs to learn Python quickly |
| Data Science with Java (O'Reilly) | Book | Data Science | Teaches data science for Java developers |
| Mastering Java for Data Science (Packt) | Book | Data Science | Data science for Java developers |
| Mastering Java Machine Learning (Packt) | Book | ML | Machine learning for Java developers |
| Discrete Optimization (Coursera) | Online Class | Optimization | Deep dive class into linear/integer programming and optimization |
| Machine Learning for Absolute Beginners | eBook | ML | Excellent eBook to get high level understanding of ML |
| Model Building in Mathematical Programming | Book | Optimization | Covers linear/integer programming particularly for optimization problems |
| AMPL | eBook | Optimization | Free eBook covering linear/integer programming |

Roadmap

For Kotlin to become a mainstream data science platform on par with Python and R, there is still some work to do. This will depend heavily on you, the community, to help fill these gaps.

  • Kotlin extensions to existing Java libs are an easy contribution opportunity (e.g. Kotlin-Statistics and Sparklin)
  • Machine Learning - More robust machine learning libraries/APIs need to be integrated with Kotlin (e.g. SMILE)
  • Implement ML algorithms in the Kotlin Algorithm project
  • Kotlin/Native - Explore bindings against Python C libraries
  • Kotlin/Native - Need a NumPy-like library; ojAlgo-like solvers would be a plus
  • Jupyter support - There is a Jupyter plugin for Kotlin that needs development
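
To give a sense of how small a "Kotlin extension to an existing Java lib" contribution can be, here is a minimal sketch that wraps the JDK's `java.util.DoubleSummaryStatistics`. The JDK class is only a stand-in for a third-party Java library, and the names `toSummaryStatistics` and `range` are hypothetical.

```kotlin
import java.util.DoubleSummaryStatistics

// Hypothetical extensions over a JDK class, standing in for a Java library.

// Feed any Iterable<Double> into the Java accumulator in one call
fun Iterable<Double>.toSummaryStatistics(): DoubleSummaryStatistics {
    val stats = DoubleSummaryStatistics()
    for (value in this) stats.accept(value)
    return stats
}

// Expose a derived value as a Kotlin property instead of getter arithmetic
val DoubleSummaryStatistics.range: Double
    get() = this.max - this.min
```

Callers can then write `listOf(1.0, 4.0, 2.5).toSummaryStatistics().range` without touching the Java class's source, which is the same pattern a wrapper library would apply at larger scale.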

Communities

| Name | Platform | Description |
| --- | --- | --- |
| PySlackers | Slack | A Slack community of Python developers and data science professionals. |
| Kotlin Slack | Slack | A Slack community of Kotlin developers. Join the #datascience channel. |

Blogs, Press, Media

| Name | Media | Description |
| --- | --- | --- |
| Traveling Salesman Problem in Kotlin (Walkthrough) | Video | A walkthrough of the Traveling Salesman Problem, a foundational problem in optimization and mathematical modeling |
| KotlinConf - Kotlin for Data Science | Conference | Thomas Nield explains the merits of Kotlin in the data science domain |
| KotlinConf - Kscript | Conference | Holger Brandl covers kscript for data science workflows |
| Talking Kotlin - Data Science with Thomas Nield | Podcast | Thomas Nield explains the merits of Kotlin in the data science domain |
| Kotlin's Emerging Data Science Ecosystem | Talk/Slides | Holger Brandl gives an update on Kotlin's state as a data science platform |
