Marko Varghese's repositories
dedupefolders
Deduplicate folders on my drives
datapull
Cloud-based data platform built on Apache Spark
deequ
Deequ is a library built on top of Apache Spark for defining "unit tests for data", which measure data quality in large datasets.
mongotestsparkreplacedocument
This repo tests the MongoDB Spark Connector's replaceDocument feature against a sharded collection
docker-spark
Docker build for Apache Spark
ace
Ace (Ajax.org Cloud9 Editor)
materialuiwithapiproxy
Material Dashboard React (https://www.creative-tim.com/product/material-dashboard-react) with a reverse proxy for REST APIs
ambaridocker
Run an Ambari cluster on Docker, with an emphasis on Apache Ranger
cppDayofBirthday
C++ Day of Birthday
GettingAndCleaningCourseProject
Files for Course Project of Coursera course "Getting and Cleaning Data"
ProgrammingAssignment2
Repository for Programming Assignment 2 for R Programming on Coursera
Spoon-Knife
This repo is for demonstration purposes only.
datasharing
The Leek group guide to data sharing