datafibers / df

Big Data Swiss Knifes

Home Page: http://www.datafibers.com

DataFibers Smart GW

##1.Overview

DataFibers (DF) is an open source big data smart gateway and data bus for enterprise big data projects. It implements a generic architecture for both batch and real-time processing.

This project uses, or plans to use, the following technologies:

  • Vert.x (Java 8)
  • Kafka (API, Connect, Streams)
  • HDFS API
  • Flink|Spark

It is a Maven multi-module project that contains the following modules:

  • df-reactive-client: Reads a very large file and streams it to the server.
  • df-reactive-server: A non-blocking server that reads the data stream from the client, parses it, and sends it to a Kafka queue.
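
The client-to-server flow described above can be illustrated with a minimal, standard-library-only sketch. It is not taken from the DataFibers code base: the class `FileStreamSketch`, the type `ParsedLine`, the methods `parse` and `streamFile`, the comma-separated input format, and the topic name are all hypothetical. A plain `Consumer` stands in for the Kafka producer, and the non-blocking Vert.x transport between client and server is left out entirely. The sketch only shows the core idea of reading a large file lazily and forwarding one parsed record at a time.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical sketch: none of these names come from the DataFibers modules.
class FileStreamSketch {

    /** One parsed line together with the topic it would be published to. */
    static final class ParsedLine {
        final String topic;     // assumed destination topic name
        final long sequence;    // 1-based position among the non-blank lines
        final String[] fields;  // values split on commas

        ParsedLine(String topic, long sequence, String[] fields) {
            this.topic = topic;
            this.sequence = sequence;
            this.fields = fields;
        }
    }

    /** "Server" side: parse a raw line (assumed comma-separated) into a record. */
    static ParsedLine parse(String topic, long sequence, String line) {
        // A limit of -1 keeps trailing empty fields instead of dropping them.
        return new ParsedLine(topic, sequence, line.split(",", -1));
    }

    /**
     * "Client" side: read the file line by line so it is never held in memory
     * as a whole, and hand each parsed record to the sink. The sink is a
     * stand-in for a Kafka producer. Returns the number of records forwarded.
     */
    static long streamFile(Path file, String topic, Consumer<ParsedLine> sink)
            throws IOException {
        long count = 0;
        try (BufferedReader reader =
                     Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.isEmpty()) {
                    continue; // skip blank lines
                }
                count++;
                sink.accept(parse(topic, count, line));
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        // Write a small sample file, stream it, and collect what the sink receives.
        Path tmp = Files.createTempFile("df-sketch", ".csv");
        Files.write(tmp, Arrays.asList("id,name", "1,alice", "2,bob"),
                StandardCharsets.UTF_8);
        List<ParsedLine> sent = new ArrayList<>();
        long forwarded = streamFile(tmp, "df-demo", sent::add);
        System.out.println(forwarded + " records forwarded");
        Files.delete(tmp);
    }
}
```

In a real deployment the sink would presumably wrap a Kafka producer call and the read loop would be driven by non-blocking I/O. Keeping the sink as a `Consumer` makes the parsing and forwarding logic easy to test without a broker.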

##2.TODO

  • Streaming files to Kafka - DONE
  • Streaming metadata to Kafka - DONE
  • Streaming files to HDFS - DONE
  • Batching files to HDFS
  • Batching files to Hive
  • Metadata Store
  • File watcher
  • Dashboard for metadata
  • Transformation framework
  • Persist framework
  • Query framework
  • Integrate Kibana and Elastic
  • File ingestion and conversion: flat, XML, CSV, mainframe
  • File header and trailer validation
  • Data replication across clusters, databases, tables, etc
  • Data policy support, such as purging or retaining some rows for compliance reasons
  • Automatically register data with Hive
  • Data format interchange
  • Data deduplication and merge
  • Data job management and monitoring
  • Web UI

About

License: Apache License 2.0


Languages

Language:Java 100.0%