Wenbing's repositories
arrow
Apache Arrow is a cross-language development platform for in-memory data. It specifies a standardized language-independent columnar memory format for flat and hierarchical data, organized for efficient analytic operations on modern hardware. It also provides computational libraries and zero-copy streaming messaging and interprocess communication. Languages currently supported include C, C++, Java, JavaScript, Python, and Ruby.
beam
Apache Beam
cs140
Operating Systems (CS140)
gcsfs
Pythonic file-system interface for Google Cloud Storage
koalas
Koalas: pandas API on Apache Spark
petastorm
The Petastorm library enables single-machine or distributed training and evaluation of deep learning models from datasets in Apache Parquet format. It supports ML frameworks such as TensorFlow, PyTorch, and PySpark, and can be used from pure Python code.
playground
Python Open Source Playground
practice
Feb15
rikai
Parquet-based ML data format optimized for working with unstructured data
sponge
CS144 Lab Assignments