Smart Crawler for Big Data Integration

This project aims to produce a set of tools that help big data integration engineers model data automatically, within a given confidence interval.

General Architecture

Getting Started

These instructions will help you get the project running locally for development and testing purposes.

Prerequisites

  1. NetBeans
  2. A Hadoop cluster

Installing

To start developing:

  1. Install NetBeans;
  2. Clone the Git repository;
  3. Open it as a Maven project.

Deployment

  • Package the project with Maven and deploy the resulting artifacts to the big data infrastructure.
  • Update the endpoints in AtlasClient.AtlasCosumer (a hedged example of configuring an Atlas endpoint follows below).
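
The snippet below is a minimal sketch of pointing an Atlas client at a new endpoint and checking connectivity, using the standard AtlasClientV2 API. The host, port, credentials and the class name AtlasEndpointSketch are placeholders; the project's own AtlasClient.AtlasCosumer class is assumed to hold the equivalent settings.

```java
import org.apache.atlas.AtlasClientV2;
import org.apache.atlas.AtlasServiceException;
import org.apache.atlas.model.SearchFilter;
import org.apache.atlas.model.typedef.AtlasTypesDef;

public class AtlasEndpointSketch {

    // Placeholder endpoint and credentials; replace with your cluster's values
    // (and avoid hard-coding credentials in a real deployment).
    private static final String[] ATLAS_URLS = { "http://atlas-host.example.com:21000" };
    private static final String[] BASIC_AUTH = { "admin", "admin" };

    public static void main(String[] args) throws AtlasServiceException {
        AtlasClientV2 atlas = new AtlasClientV2(ATLAS_URLS, BASIC_AUTH);

        // Simple connectivity check: list the type definitions known to this Atlas instance.
        AtlasTypesDef types = atlas.getAllTypeDefs(new SearchFilter());
        System.out.println("Entity type definitions visible: " + types.getEntityDefs().size());
    }
}
```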

Running the tests

  • Quality tests: select the tables to profile in basicProfiler.Profiler (a hedged profiling sketch follows below)
  • Similarity tests: select the tables to compare in Similarity.SimilarityAnalysis
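
As a rough illustration of what a quality test can compute, the sketch below uses Spark's Java API to count rows and per-column null ratios for one table. The table name default.customers is hypothetical, and the class is not the project's basicProfiler.Profiler; it only shows the kind of metric such a profiler might produce.

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import static org.apache.spark.sql.functions.col;

public class ProfilerSketch {
    public static void main(String[] args) {
        // Assumes the job is submitted to a cluster with access to the Hive metastore.
        SparkSession spark = SparkSession.builder()
                .appName("basic-profiler-sketch")
                .enableHiveSupport()
                .getOrCreate();

        // Hypothetical table; in the project the tables are selected in basicProfiler.Profiler.
        Dataset<Row> table = spark.table("default.customers");
        long rows = table.count();
        System.out.println("rows = " + rows);

        // Null ratio per column: one very small example of a data-quality metric.
        for (String column : table.columns()) {
            long nulls = table.filter(col(column).isNull()).count();
            double ratio = rows == 0 ? 0.0 : 100.0 * nulls / rows;
            System.out.printf("%s: %.2f%% null%n", column, ratio);
        }

        spark.stop();
    }
}
```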

Built With

  • Spark - Large-scale data processing engine
  • Atlas - Data Governance and Metadata framework for Hadoop
  • Ranger - Enable, monitor and manage comprehensive data security across the Hadoop platform.

Authors

  • José Magalhães
  • João Galvão
  • Maria Inês Costa

Versioning

We use SemVer for versioning. For the versions available, see the tags on this repository.

License

This project is currently internal.

Acknowledgments

  • Cheers to the LID4 community
