jason4zhu's repositories
awesome-public-datasets
An awesome list of high-quality open datasets in public domains (on-going).
com.judking.hive.udf
judking's Hive UDFs
Language: Java
com.judking.mr
MapReduce-related code
Language: Java
com.kth.wps.project
com.kth.wps.project
Language: Web Ontology Language
CompaignCombineHiveInputFormat
A custom CombineInputFormat that packs HDFS files belonging to the same part into a single mapper, reducing the total number of mappers and improving performance.
Language: Java
DHTCrawler
A DHT crawler written in Python that collects magnet links
Language: Python
elasticsearch
Open Source, Distributed, RESTful Search Engine
Language: Java, License: Apache-2.0
flink-training
Apache Flink Training Exercises
Language: Java, License: Apache-2.0
Language: Java, License: NOASSERTION
Language: HTML
KolodaProblem
Reproduction of the problem reported in https://github.com/Yalantis/Koloda/issues/199
Language: Swift
lsh-spark
Locality Sensitive Hashing for Apache Spark
Language: Scala, License: Apache-2.0
myvagrant
edX: Introduction to Big Data with Apache Spark
Language: Jupyter Notebook
scala.test
scala.test
Language: Scala
spark
Mirror of Apache Spark
Language: Scala, License: Apache-2.0
spark-hash
Locality Sensitive Hashing for Apache Spark
Language: Scala, License: Apache-2.0
tencent7227
tencent_task_7227