agarwal-nitesh / file-content-search

Ingest and Search s3 file contents with elasticsearch client 8.9.1


Remote Storage Files - Meta/Content Search

External Dependencies

  1. Postgres
  2. ElasticSearch
  3. Kafka

External dependency documentation

  1. Postgres - Stores each file URL and the time it was last processed (CDC)
  2. ElasticSearch:
    • Maintains inverted indices of file content tokens (single terms)
    • How does it work?
  3. Kafka:
    • When file processing is triggered for an S3 bucket (or any other source), every file in that source is read and produced to Kafka.
    • The file URL is used as the message key, so all messages for a given file go to the same partition. This maintains strong ordering guarantees per file.
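The Postgres entry above amounts to change detection: a file only needs (re)processing if it has never been processed or was modified after its last recorded run. The sketch below illustrates that rule under stated assumptions. It is not this repository's code: the class `ChangeTracker` and its methods are hypothetical, the Postgres table is modelled as an in-memory map so the logic runs without a database, and the source listing is assumed to expose a last-modified time per file URL.

```java
import java.time.Instant;
import java.util.*;

/**
 * Hypothetical sketch: the Postgres table (file URL -> last processed time)
 * is modelled as an in-memory map so the change-detection rule can run standalone.
 */
class ChangeTracker {
    // Stand-in for the Postgres table (assumption: one row per file URL).
    private final Map<String, Instant> lastProcessed = new HashMap<>();

    /** A file needs processing if it was never seen or was modified after its last run. */
    boolean needsProcessing(String fileUrl, Instant lastModified) {
        Instant processedAt = lastProcessed.get(fileUrl);
        return processedAt == null || lastModified.isAfter(processedAt);
    }

    /** Records a successful run; against Postgres this could be an upsert (assumption). */
    void markProcessed(String fileUrl, Instant processedAt) {
        lastProcessed.put(fileUrl, processedAt);
    }

    /** Filters a source listing (URL -> last-modified time) down to new or changed files, sorted by URL. */
    List<String> changedFiles(Map<String, Instant> listing) {
        List<String> changed = new ArrayList<>();
        for (Map.Entry<String, Instant> e : new TreeMap<>(listing).entrySet()) {
            if (needsProcessing(e.getKey(), e.getValue())) {
                changed.add(e.getKey());
            }
        }
        return changed;
    }
}
```

On a first run every listed file is returned; after `markProcessed`, only files whose last-modified time is later than their recorded processing time come back. A file modified at exactly its recorded processing time is treated as unchanged in this sketch.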
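Elasticsearch builds its inverted indices in Lucene with configurable analyzers, so its real behaviour is far richer than what follows. The plain-Java sketch below only illustrates the idea behind a single-term inverted index: each token maps to the set of files containing it, so a one-word query is a single map lookup. The class `SimpleInvertedIndex`, its methods, and the naive tokenizer (lowercase, split on anything that is not a letter or digit) are hypothetical and are not how this repository talks to Elasticsearch.

```java
import java.util.*;

/** Hypothetical sketch: a single-term inverted index mapping token -> file URLs. */
class SimpleInvertedIndex {
    // token -> file URLs whose content contains that token (TreeSet keeps output sorted)
    private final Map<String, Set<String>> postings = new HashMap<>();

    /** Naive tokenizer (assumption): lowercase, split on runs of non-letter/non-digit characters. */
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    /** Adds every token of a file's content to the postings, keyed by the file URL. */
    void addDocument(String fileUrl, String content) {
        for (String token : tokenize(content)) {
            postings.computeIfAbsent(token, k -> new TreeSet<>()).add(fileUrl);
        }
    }

    /** Returns the URLs of files containing the term; multi-term queries return an empty set. */
    Set<String> search(String term) {
        List<String> tokens = tokenize(term);
        if (tokens.size() != 1) {
            return Collections.emptySet(); // only single-term lookups are supported here
        }
        return Collections.unmodifiableSet(
                postings.getOrDefault(tokens.get(0), Collections.emptySet()));
    }
}
```

For example, after indexing `"Kafka feeds the indexer"` under `s3://bucket/a.txt` and `"Each file URL is a Kafka message key"` under `s3://bucket/b.txt`, `search("KAFKA")` returns both URLs, because the query goes through the same tokenizer as the content.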
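Keying messages by file URL works because a keyed producer picks the partition deterministically from the key, and Kafka preserves order only within a partition. Kafka's default partitioner hashes the serialized key with murmur2 and takes it modulo the partition count; the sketch below deliberately swaps in `String.hashCode()` to stay dependency-free, so its partition numbers will not match a real cluster. The class `KeyedPartitionSketch`, the `FileEvent` record and its `action` values are hypothetical, used only to show that every event for one file lands in one partition and is read back in production order.

```java
import java.util.*;

/** Hypothetical sketch of key-based partitioning and per-partition ordering. */
class KeyedPartitionSketch {
    /** One message: the key (file URL) plus an illustrative payload. */
    record FileEvent(String fileUrl, String action) {}

    /** Simplified partitioner: String.hashCode instead of Kafka's murmur2. */
    static int partitionFor(String key, int numPartitions) {
        // Mask the sign bit so the result is non-negative before taking the modulo.
        return (key.hashCode() & 0x7fffffff) % numPartitions;
    }

    /** Appends each event to its partition's log; each partition is an append-only ordered list. */
    static Map<Integer, List<FileEvent>> produce(List<FileEvent> events, int numPartitions) {
        Map<Integer, List<FileEvent>> partitions = new TreeMap<>();
        for (FileEvent e : events) {
            partitions.computeIfAbsent(partitionFor(e.fileUrl(), numPartitions), p -> new ArrayList<>())
                      .add(e);
        }
        return partitions;
    }

    /** Reads the file's partition front to back, as a consumer would, keeping that file's actions. */
    static List<String> actionsSeenFor(String fileUrl, Map<Integer, List<FileEvent>> partitions, int numPartitions) {
        List<String> actions = new ArrayList<>();
        for (FileEvent e : partitions.getOrDefault(partitionFor(fileUrl, numPartitions), List.of())) {
            if (e.fileUrl().equals(fileUrl)) {
                actions.add(e.action());
            }
        }
        return actions;
    }
}
```

Events for different files may be interleaved across partitions, but a consumer of any single partition sees each file's events in the order they were produced, which is the per-file ordering guarantee described above.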

Swagger doc

http://localhost:8085/fs/swagger-ui/index.html

Architecture


PRD

PRD Notion
PRD MD



Languages

Language: Java 100.0%