Keiji Yoshida's starred repositories
tensorflow
An Open Source Machine Learning Framework for Everyone
scikit-learn
scikit-learn: machine learning in Python
learning-spark
Example code from Learning Spark book
LearningSparkV2
This is the GitHub repo for Learning Spark: Lightning-Fast Data Analytics [2nd Edition]
tech-talks
This repository contains the notebooks and presentations we use for our Databricks Tech Talks
databricks-cli
(Legacy) Command Line Interface for Databricks
spark-knowledgebase
Spark Knowledge Base
dotaconstants
Constant data for Dota applications
mlflow-example
An example MLflow project
healthcare-data-harmonization
This is an engine that converts data of one structure to another, based on a configuration file which describes how. There is an accompanying syntax to make writing mappings easier and more robust.
bigquery-data-lineage
Reference implementation for real-time Data Lineage tracking for BigQuery using Audit Logs, ZetaSQL and Dataflow.
datashare-toolkit
DIY commercial datasets on Google Cloud Platform
oozie-to-airflow
Oozie Workflow to Airflow DAGs migration tool
datacatalog-connectors-rdbms
Sample code with integration between Data Catalog and RDBMS data sources.
hydrator-plugins
Cask Hydrator Plugins Repository
redis-dataflow-realtime-analytics
Build a real-time website analytics dashboard on GCP using Dataflow, Cloud Memorystore (Redis) and Spring Boot
kafka-plugins
Kafka Source/Sink for reading from and writing to a Kafka topic
WoWAH-parser
Reads the WoWAH files into one big CSV.