The purpose of this project is to demonstrate how a company can use Natural Language Processing (NLP) on top of a modern data stack to improve the quality of its decision-making. Along the way, it exercises the following skills:
- Building scalable ETL pipelines for high-performance data processing.
- Proficiency in Python, using Spark (PySpark), Polars, and pandas.
- Data orchestration with Apache Airflow.
- Familiarity with AWS services, including S3, Kinesis, EMR, Lambda, Athena, Glue, IAM, and RDS.
- Understanding of storage formats such as Parquet, JSON, Avro, and Arrow.
- Working knowledge of databases, including MongoDB and Redshift.
- Understanding of the trade-offs between these storage formats and of schema design.
- Knowledge of building machine learning pipelines with tools such as Spark MLlib, TensorFlow, and scikit-learn.
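
To make the ETL bullet concrete, here is a minimal sketch of an extract-transform-load step in pandas. The dataset, column names, and sentiment rule are all hypothetical; in the actual pipeline the extract and load steps would read from and write to stores like S3 (e.g. Parquet via `pd.read_parquet` / `DataFrame.to_parquet`) rather than build the frame in memory.

```python
import pandas as pd

# Extract: a toy in-memory frame stands in for reading raw data
# (in practice, e.g. pd.read_parquet("s3://bucket/raw/reviews.parquet"))
raw = pd.DataFrame({
    "review": ["great product", "terrible support", "okay overall"],
    "rating": [5, 1, 3],
})

# Transform: derive a simple sentiment label from the star rating
# (hypothetical rule, used here only to illustrate the transform step)
def label_sentiment(rating: int) -> str:
    if rating >= 4:
        return "positive"
    if rating <= 2:
        return "negative"
    return "neutral"

raw["sentiment"] = raw["rating"].map(label_sentiment)

# Load: keep the cleaned frame in memory; a real pipeline would
# write it back out, e.g. cleaned.to_parquet("s3://bucket/clean/...")
cleaned = raw[["review", "sentiment"]]
print(cleaned["sentiment"].tolist())
```

The same shape (extract, pure transform function, load) carries over to Polars or Spark with only API changes.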
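
The machine learning bullet can be sketched with scikit-learn's `Pipeline`, which is the simplest of the listed tools to show self-contained. The texts and labels below are toy placeholders, not project data; a real NLP pipeline would train on the processed review corpus.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# Toy training data (hypothetical, for illustration only)
texts = ["love it", "hate it", "really love this", "really hate this"]
labels = ["pos", "neg", "pos", "neg"]

# Chain feature extraction and the classifier into one estimator,
# so fit/predict run the whole sequence end to end
clf = Pipeline([
    ("tfidf", TfidfVectorizer()),   # text -> sparse TF-IDF features
    ("model", LogisticRegression()),  # linear classifier on those features
])
clf.fit(texts, labels)

print(clf.predict(["love it"])[0])
```

Spark MLlib offers the same idea (`pyspark.ml.Pipeline` of stages), which is the natural choice once the data outgrows a single machine.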