venbigdata's repositories
BAIML
Repository for all my projects as a part of Berkeley's AIML (BAIML) program
flink-connector-jdbc
Apache Flink
spark-workshop
Apache Spark™ and Scala Workshops
kafka
Mirror of Apache Kafka
dataframe-rules-engine
Extensible Rules Engine for custom Dataframe / Dataset validation
fun-stuff
Still Learning :)
jetbrains-plugin
Deepcode plugin for JetBrains
arrow
Apache Arrow is a cross-language development platform for in-memory data. It specifies a standardized language-independent columnar memory format for flat and hierarchical data, organized for efficient analytic operations on modern hardware. It also provides computational libraries and zero-copy streaming messaging and interprocess communication. Languages currently supported include C, C++, Java, JavaScript, Python, and Ruby.
deep-learning-v2-pytorch
Projects and exercises for the latest Deep Learning ND program https://www.udacity.com/course/deep-learning-nanodegree--nd101
deep-learning
Repo for the Deep Learning Nanodegree Foundations program.
AIPND
Code and associated files for the AI Programming with Python Nanodegree Program
spark-solr
Tools for reading data from Solr as a Spark RDD and indexing objects from Spark into Solr using SolrJ.
gitective
Find the Git commits you're looking for
AIPND-revision
Revision to the code and associated files for the AI Programming with Python Nanodegree Program
parquet-mr
Apache Parquet
solr-autocomplete
Solr AutoComplete implementation
CMS1500form
Test data preparation: Python project for CMS1500Form
HBaseTutorials
HBase Examples
twitbase
TwitBase is a running example used throughout HBase in Action
Solbase
Open-source search platform based on Lucene, Solr, and HBase
kite
Kite SDK
pydata-book
Materials and IPython notebooks for "Python for Data Analysis" by Wes McKinney, published by O'Reilly Media
hello-world
hello-world test
SparkOnHBase
SparkOnHBase
datasharing
The Leek group guide to data sharing
test-datascience-repo
For data science training