- Docker and docker-compose
Note: Other requirements are fetched by Docker via docker-compose.
- Clone the repository.
- Run the docker-compose file using:
docker-compose -f docker-compose.yml up -d --build
- Test local access as mentioned in the document:
- Read a message from the queue using the command:
awslocal sqs receive-message --queue-url http://localhost:4566/000000000000/login-queue
- Check that your table was created in PostgreSQL using the two commands shown below:
psql -d postgres -U postgres -p 5432 -h localhost -W
SELECT * FROM user_logins;
- To stop the container, use:
docker-compose -f docker-compose.yml down
Run unit tests with:
pytest tests/
Note: The tests also run by default when the Docker container starts.
- etl-sqs/
- src/
- helpers
- logins.py
- queue_service.py
- run.py
- helpers
- tests/
- test_all.py
- .gitignore
- docker-compose.yml
- Dockerfile
- env_local.ini
- requirements.txt
- README.md
- src/
- The JSON structure of messages retrieved from the SQS queue remains consistent.
- Messages lacking essential field values in their data are not included in the database insertion process.
- app_version is stored as an integer. When it contains a decimal point, it is split on "." and only the part before the first "." is kept (for example, 2.3.0 is stored as 2).
- device_id and ip are masked with SHA-512. The masking is one-way and never needs to be reversed.
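The rules above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code in src/: the function names (mask, major_version, transform) and the REQUIRED_FIELDS tuple are hypothetical, and the set of "essential" fields is assumed to be the three fields this README names.

```python
import hashlib
from typing import Optional

# Assumption: the README names only these three fields, so the sketch
# treats them as the "essential" ones. The real list may be longer.
REQUIRED_FIELDS = ("device_id", "ip", "app_version")


def mask(value: str) -> str:
    """Return the SHA-512 hex digest of value (one-way and deterministic)."""
    return hashlib.sha512(value.encode("utf-8")).hexdigest()


def major_version(app_version: str) -> int:
    """Keep only the part before the first '.' and convert it to int."""
    return int(str(app_version).split(".")[0])


def transform(message: dict) -> Optional[dict]:
    """Return a masked copy of message, or None if it should be skipped."""
    # Skip messages where an essential field is missing or empty.
    if any(message.get(field) in (None, "") for field in REQUIRED_FIELDS):
        return None
    record = dict(message)
    record["device_id"] = mask(message["device_id"])
    record["ip"] = mask(message["ip"])
    record["app_version"] = major_version(message["app_version"])
    return record
```

Because SHA-512 is deterministic, the same device_id or ip always maps to the same 128-character digest, so duplicates can still be detected after masking.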
- While we currently handle basic errors, there's room to implement additional checks to address a broader range of exceptions.
- At the moment, we process one message at a time, but there's an opportunity to boost performance by implementing batch processing.
- Consider incorporating a logging mechanism that allows for various log types, such as debug, error, and info, with the flexibility to switch between them.
- Enhance the test suite by adding more unit tests; at present, the tests only validate the device_id and ip masking operations.
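As one possible shape for the batch processing idea above, the sketch below reads up to ten messages per call instead of one. It assumes sqs_client is a boto3 SQS client (or any object exposing the same receive_message and delete_message_batch methods); process_batch, handle, and MAX_BATCH are hypothetical names, and deleting messages after handling them is an assumption about the desired behavior, not a description of the current code.

```python
import json
from typing import Any, Callable, Dict, List

MAX_BATCH = 10  # SQS returns at most 10 messages per receive_message call


def process_batch(
    sqs_client: Any,
    queue_url: str,
    handle: Callable[[List[Dict]], None],
) -> int:
    """Receive up to MAX_BATCH messages, handle them together, then delete them.

    Returns the number of messages processed (0 if the queue was empty).
    """
    response = sqs_client.receive_message(
        QueueUrl=queue_url,
        MaxNumberOfMessages=MAX_BATCH,
        WaitTimeSeconds=1,
    )
    messages = response.get("Messages", [])
    if not messages:
        return 0

    # Parse every body first so the handler can do one bulk insert.
    bodies = [json.loads(m["Body"]) for m in messages]
    handle(bodies)

    # Delete only after the handler returned without raising.
    sqs_client.delete_message_batch(
        QueueUrl=queue_url,
        Entries=[
            {"Id": str(i), "ReceiptHandle": m["ReceiptHandle"]}
            for i, m in enumerate(messages)
        ],
    )
    return len(messages)
```

Passing the client in as a parameter keeps the function easy to unit test with a small fake object, which also fits the goal of a broader test suite.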
- Leveraging container orchestration solutions like Kubernetes can further enhance our existing Docker setup.
- Enhance database performance through optimization techniques like indexing and sharding.
- Implement GitHub Actions for an efficient CI/CD pipeline.
- Utilize monitoring tools such as Prometheus or Datadog to diagnose issues and detect patterns.
- Employ logging tools like Logstash to enhance logging capabilities.
- Scale horizontally by adding multiple nodes with a load balancer and implement auto-scaling based on resource requirements.
- Utilize a reliable open-source stream processing platform like Apache Kafka.
I am Kedar Takwane, a graduate student at the University of Illinois at Urbana-Champaign, pursuing a Master's degree in Computer Science.
Currently, I work as a Research Assistant, where my responsibilities include full-stack development and building recommendation systems. Before this, I spent three years in industry as a Software Engineer.
Email: takwane2@illinois.edu