shredder12 / Data-Infra-Projects

List of some interesting projects

Data-Infra-Projects

This is an attempt to list all the interesting data infrastructure projects.

TODO: Add links and licenses.

##Abstractions

##Distributed Coordination

These are implementations/libraries that help write distributed applications requiring some form of coordination.

##Infrastructure Management

##File Systems

##Distributed Databases

##Infrastructure Logging/Monitoring

##Infrastructure Helpers

MultiCloud/CrossCloud utilities

##Virtualization

##Virtualization++

##Generalized Data Processing

#####Comparisons

  • Tez vs Dryad
  • Hadoop vs Spark - Too many differences, no good link.

##Large-Scale Distributed ML

##Pub-Sub / Messaging

##Data Ingest

##Graph Storing and/or Processing

##SQL Engines

##RealTime Processing (Time-constrained Processing)

##Stream Processing

##Security

##Performance Analysis

##Workflow engines/DAG-executors/Pipelines

#####Comparisons

##Configuration Management

##Service Discovery

#####Comparison

##Search

##Others

  • Nutch - web crawler
  • Ambari - Hadoop Deployment + Management
  • Hue - Hadoop Web UI
  • Bigtop - Hadoop Packaging
  • DataFu - Collection of MapReduce libraries
  • Twill (previously known as Weave) - Library for writing YARN applications
