youngbink

@Databricks

Organizations

castorini

databricks

dsg-uwaterloo

Youngbin Kim's repositories

aws-glue-data-catalog-client-for-apache-hive-metastore

The AWS Glue Data Catalog is a fully managed, Apache Hive Metastore-compatible metadata repository. Customers can use the Data Catalog as a central repository to store structural and operational metadata for their data. AWS Glue provides out-of-the-box integration with Amazon EMR that enables customers to use the AWS Glue Data Catalog as an external Hive Metastore. This is an open-source implementation of the Apache Hive Metastore client on Amazon EMR clusters that uses the AWS Glue Data Catalog as an external Hive Metastore. It serves as a reference implementation for building a Hive Metastore-compatible client that connects to the AWS Glue Data Catalog, and it may be ported to other Hive Metastore-compatible platforms, such as other Hadoop and Apache Spark distributions.

Language: Java · License: Apache-2.0

bespin

Reference implementations of "big data" algorithms in MapReduce and Spark

Language: Java

bigdata-2018w

CS 451/651 431/631 Data-Intensive Distributed Computing (Winter 2018) at the University of Waterloo

Language: HTML

cassovary

Cassovary is a simple big graph processing library for the JVM

Language: Scala

Cassovary-vs-GraphJet

Performance comparison between Cassovary and GraphJet

Castor

PyTorch deep learning models by the Data Systems Group at the University of Waterloo

Language: Python

cnn-text-classification-tf

Convolutional Neural Network for Text Classification in TensorFlow

Language: Python · License: Apache-2.0

CS224D-Assignments

My answers to the assignments for Stanford's NLP course CS 224D

Language: Python

curio

Curio - The coroutine concurrency library.

Language: Python · License: Other (NOASSERTION)

datasets

A collection of all my datasets

Language: Jupyter Notebook · License: GPL-3.0

deeplearning_prac

Language: Jupyter Notebook

DeepLearningZeroToAll

TensorFlow Basic Tutorial Labs

Language: Python

delta

An open-source storage layer that brings scalable, ACID transactions to Apache Spark™ and big data workloads.

Language: Scala · License: Apache-2.0

gevent

Coroutine-based concurrency library for Python

Language: Python · License: Other (NOASSERTION)

googletest

Google Test

Language: C++

GraphJet

GraphJet is a real-time graph processing library.

Language: Java · License: Apache-2.0

jsonresume-theme-short

Boilerplate theme for JSON Resume

Language: CSS

koalas

Koalas: Pandas API on Apache Spark

Language: Python · License: Apache-2.0

libgo

Go-style concurrency in C++11

Language: C++ · License: MIT

lucene-solr

Mirror of Apache Lucene + Solr

Language: Java

models

Models built with TensorFlow

Language: Python · License: Apache-2.0

oltpbench

OLTP Benchmark Framework

Language: Java · License: Other (NOASSERTION)

PyHive

Python interface to Hive and Presto. 🐝

Language: Python · License: Other (NOASSERTION)

reinforcement-learning

Implementations of reinforcement learning algorithms in Python, using OpenAI Gym and TensorFlow. Exercises and solutions to accompany Sutton's book and David Silver's course.

Language: Jupyter Notebook · License: MIT

resume-1

Software developer resume in LaTeX

Language: TeX · License: MIT

sentiment_analysis

Sentiment analysis using a CNN (TensorFlow)

spark

Mirror of Apache Spark

Language: Scala · License: Apache-2.0

tensor

TensorFlow practice

Language: Python

tensorflow

Computation using data flow graphs for scalable machine learning

Language: C++ · License: Apache-2.0

vel

Velocity in deep-learning research

Language: Python · License: MIT