LNshuti / mongodb-nlp

Mongodb NLP

This project demonstrates how a company can apply Natural Language Processing (NLP) on top of a modern data stack to improve the quality of its decision-making.

Skills and Expertise:

  • Building scalable ETL pipelines for high-performance data processing.
  • Proficiency in Python programming with Spark, Polars, and pandas.
  • Data orchestration with Apache Airflow.
  • Familiarity with AWS services, including S3, Kinesis, EMR, Lambda, Athena, Glue, IAM, and RDS.
  • Understanding of storage formats such as Parquet, JSON, Avro, and Arrow.
  • Working knowledge of databases, including MongoDB and Redshift.
  • Understanding of the trade-offs between storage formats and of schema design.
  • Knowledge of building machine learning pipelines using tools like SparkML, TensorFlow, and scikit-learn.
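
To make the ETL and NLP items above concrete, here is a minimal sketch of an extract-transform-load step over text documents. It is an illustration only and is not taken from the project's notebooks: the sample documents, the `review` field, and the function names are all hypothetical. An in-memory list stands in for a MongoDB collection and a file-like sink stands in for real storage, so the sketch runs with the Python standard library alone.

```python
import json
import re
from collections import Counter
from io import StringIO

# Hypothetical documents standing in for a MongoDB collection (assumption).
SAMPLE_DOCS = [
    {"_id": 1, "review": "Great product, fast delivery."},
    {"_id": 2, "review": "Delivery was slow, but the product is great."},
]

# Very simple tokenizer: runs of ASCII letters after lowercasing.
TOKEN_RE = re.compile(r"[a-z]+")


def extract(docs):
    """Yield raw documents; a real pipeline would iterate a database cursor."""
    yield from docs


def transform(doc, text_field="review"):
    """Lowercase and tokenize one document, returning its term counts."""
    tokens = TOKEN_RE.findall(doc.get(text_field, "").lower())
    return {"_id": doc["_id"], "term_counts": dict(Counter(tokens))}


def load(records, sink):
    """Write one JSON object per line to a file-like sink; return the row count."""
    count = 0
    for record in records:
        sink.write(json.dumps(record, sort_keys=True) + "\n")
        count += 1
    return count


def run_pipeline(docs, sink):
    """Chain the three stages lazily so documents are processed one at a time."""
    return load((transform(d) for d in extract(docs)), sink)


if __name__ == "__main__":
    out = StringIO()
    rows_written = run_pipeline(SAMPLE_DOCS, out)
    print(rows_written)
    print(out.getvalue(), end="")
```

In a fuller pipeline, `extract` would likely read from a database client, `load` would write a columnar format such as Parquet, and the term counting would be replaced by a proper NLP model. Those substitutions are assumptions about how such a stack is commonly wired, not a description of this repository.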

Architecture


[Image: architecture diagram]

Languages

Language:Jupyter Notebook 100.0%