MScDataScienceProject: Diversity analysis of large data collections using Jaccard distance, with MapReduce, Spark, and Spark SQL
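The sketch below illustrates the core computation in plain Python. It assumes that each item in a collection is represented as a set of features, and that "diversity" means the mean pairwise Jaccard distance. The project's actual definition is not stated here, and the function and variable names are hypothetical. The map step emits a `(distance, 1)` pair for every unordered pair of items, and the reduce step sums them. This mirrors how the same job could be split across MapReduce or Spark workers, but it is not the project's own implementation.

```python
from functools import reduce
from itertools import combinations


def jaccard_distance(a: set, b: set) -> float:
    """Jaccard distance 1 - |A ∩ B| / |A ∪ B|; assumed 0.0 when both sets are empty."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def diversity_mapreduce(docs: dict[str, set[str]]) -> float:
    """Mean pairwise Jaccard distance over a collection (hypothetical helper)."""
    # Map: for every unordered pair of document ids, emit (distance, 1).
    pairs = combinations(sorted(docs), 2)
    mapped = (
        (jaccard_distance(docs[left], docs[right]), 1) for left, right in pairs
    )
    # Reduce: sum the distances and the pair counts.
    total, count = reduce(
        lambda acc, item: (acc[0] + item[0], acc[1] + item[1]), mapped, (0.0, 0)
    )
    # Average distance; a collection with fewer than two items has no pairs.
    return total / count if count else 0.0


docs = {"d1": {"a", "b", "c"}, "d2": {"b", "c", "d"}, "d3": {"x"}}
print(jaccard_distance(docs["d1"], docs["d2"]))  # 0.5
print(diversity_mapreduce(docs))
```

Exhaustive pairwise comparison grows quadratically with collection size. On large collections in Spark, an approximate method such as MinHash locality-sensitive hashing (for example `pyspark.ml.feature.MinHashLSH`) is a common alternative, although this README does not say which approach the project uses.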