divith raju's repositories
divith-raju-Building-Big-Data-Infrastucture-NoSQL-And-SQL
Big Data Platform on MongoDB Atlas and Heroku PostgreSQL
divith-raju-Immigration-Data-Engineering
A capstone project covering several aspects of data engineering: data exploration, cleaning, modeling, pipelining, and processing.
divith-raju-OpenMetadata
Open standard for metadata. A single place to discover, collaborate, and get your data right.
divith-raju-SearchEngine-Wikipedia
A complete search engine built on top of a 75 GB Wikipedia corpus, with subsecond search latency. Results list wiki pages ordered by TF-IDF relevance to the given search term(s). From optimized code to a k-way merge sort algorithm, the project addresses latency, indexing, and big data challenges.
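The two techniques named above, k-way merging and TF-IDF ranking, can be illustrated with a minimal sketch. It is not the repository's code. The run format (sorted `(term, doc_id, term_freq)` tuples), the function names `k_way_merge`, `build_index`, and `rank`, and the plain `tf * log(N / df)` weighting are all hypothetical choices made for illustration.

```python
import heapq
import math
from typing import Dict, Iterator, List, Tuple

# Hypothetical run format: each run is a list of (term, doc_id, term_freq)
# tuples already sorted by (term, doc_id), e.g. one partial index on disk.
Posting = Tuple[str, int, int]


def k_way_merge(runs: List[List[Posting]]) -> Iterator[Posting]:
    """Merge k sorted runs into one sorted stream using a min-heap."""
    heap: List[Tuple[Posting, int, int]] = []
    # Seed the heap with the first entry of every non-empty run.
    for run_idx, run in enumerate(runs):
        if run:
            heapq.heappush(heap, (run[0], run_idx, 0))
    while heap:
        posting, run_idx, pos = heapq.heappop(heap)
        yield posting
        # Refill from the run the smallest item came from.
        nxt = pos + 1
        if nxt < len(runs[run_idx]):
            heapq.heappush(heap, (runs[run_idx][nxt], run_idx, nxt))


def build_index(runs: List[List[Posting]]) -> Dict[str, List[Tuple[int, int]]]:
    """Group the merged stream into term -> [(doc_id, term_freq), ...]."""
    index: Dict[str, List[Tuple[int, int]]] = {}
    for term, doc_id, tf in k_way_merge(runs):
        index.setdefault(term, []).append((doc_id, tf))
    return index


def rank(index: Dict[str, List[Tuple[int, int]]],
         query_terms: List[str], n_docs: int) -> List[int]:
    """Return doc ids ordered by summed tf * log(N / df) over query terms."""
    scores: Dict[int, float] = {}
    for term in query_terms:
        postings = index.get(term, [])
        if not postings:
            continue
        idf = math.log(n_docs / len(postings))
        for doc_id, tf in postings:
            scores[doc_id] = scores.get(doc_id, 0.0) + tf * idf
    return sorted(scores, key=lambda d: scores[d], reverse=True)


runs = [
    [("data", 1, 2), ("wiki", 1, 1)],
    [("data", 2, 1), ("search", 2, 3)],
]
index = build_index(runs)
print(rank(index, ["search", "wiki"], n_docs=2))
```

The heap holds only one entry per run, so merging k runs costs O(log k) per emitted posting and needs memory proportional to k rather than to the corpus size. That property is why k-way merging suits indexes too large to sort in memory.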
Divithraju
Config files for my GitHub profile.
divith-raju-Web-Server-Log-Analysis-Pyspark
Playground for PySpark (RDDs, DStreams) and Apache Airflow, based on an example of parsing web server log data, including incorrectly formatted strings.
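Parsing log lines while tolerating malformed input might look like the following minimal sketch. It assumes the logs use the Apache Common Log Format, which the description does not state. `LOG_PATTERN`, `LogRecord`, `parse_line`, and `split_valid` are hypothetical names, not the repository's code. Lines that fail to match return `None` rather than raising an exception, so bad input can be counted or filtered out instead of crashing a job.

```python
import re
from typing import List, NamedTuple, Optional, Tuple

# Assumed Apache Common Log Format, e.g.
# 127.0.0.1 - - [01/Jul/1995:00:00:01 -0400] "GET /history/apollo/ HTTP/1.0" 200 6245
LOG_PATTERN = re.compile(
    r'^(\S+) (\S+) (\S+) \[([^\]]+)\] "(\S+) (\S+)\s*(\S*)" (\d{3}) (\d+|-)$'
)


class LogRecord(NamedTuple):
    host: str
    timestamp: str
    method: str
    endpoint: str
    protocol: str
    status: int
    size: int


def parse_line(line: str) -> Optional[LogRecord]:
    """Parse one log line; return None if it does not match the format."""
    match = LOG_PATTERN.match(line.strip())
    if match is None:
        return None
    host, _ident, _user, ts, method, endpoint, protocol, status, size = match.groups()
    # A '-' size means no response body was sent; record it as 0 bytes.
    return LogRecord(host, ts, method, endpoint, protocol,
                     int(status), 0 if size == "-" else int(size))


def split_valid(lines: List[str]) -> Tuple[List[LogRecord], List[str]]:
    """Separate parsed records from lines that failed to parse."""
    good: List[LogRecord] = []
    bad: List[str] = []
    for line in lines:
        record = parse_line(line)
        if record is None:
            bad.append(line)
        else:
            good.append(record)
    return good, bad


sample = [
    '127.0.0.1 - - [01/Jul/1995:00:00:01 -0400] "GET /history/apollo/ HTTP/1.0" 200 6245',
    "this is not a log line",
    'host - - [01/Jul/1995:00:00:02 -0400] "GET /img.gif HTTP/1.0" 304 -',
]
good, bad = split_valid(sample)
print(len(good), len(bad))
```

In PySpark, the same `parse_line` function could be applied to an RDD with `sc.textFile(path).map(parse_line)`, followed by `filter` to keep or count the `None` results.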