This is an attempt to list out all the interesting projects.
It is intended for anyone who is designing modern large-scale architectures and needs to choose tools/technologies/frameworks. The purpose is to help in making those choices with resources like comparisons/use-cases/features/maturity, or really anything that helps in making an informed decision.
TODO:
Add links and licenses.
## Abstractions
## Distributed Coordination
These are implementations/libraries that help in writing distributed applications which require some form of coordination.
## Infrastructure Management
##### Comparisons
## File Systems
## Distributed Databases
## Infrastructure Logging/Monitoring
## Infrastructure Helpers
## Virtualization
## Virtualization++
## Generalized Data Processing
##### Comparisons
- Tez vs Dryad
- Hadoop vs Spark - Too many differences, no good link.
## Large-scale Distributed ML
## Pub-Sub / Messaging
## Data Ingest
## Graph Storing and/or Processing
## SQL Engines
## Stream Processing
## Security
## Performance Analysis
## Workflow Engines/DAG Executors/Pipelines
##### Comparisons
## Configuration Management
## Service Discovery
##### Comparisons
## Testing
## Visualization
- White Elephant
- Ambrose
- Lipstick
- Hue - Hadoop Web UI
- Inviso
## Libraries
- Zoie
- Norbert - Cluster manager and networking layer built on top of Apache ZooKeeper
- Okapi - Large-scale ML and graph analytics on Giraph
- Scalding - A Scala API for Cascading
- Summingbird - Streaming MapReduce with Scalding and Storm
- Curator - Set of Java libraries that make using Apache ZooKeeper much easier
- Turbine - Low-latency, high-throughput aggregator for real-time streams
- DataFu - Collection of MapReduce libraries
- Twill (previously known as Weave) - Library for writing YARN applications
## Search