The goal of this Homework Project is to create a Python library, which takes in an array of records of variable size and splits the input to batches of records (array of arrays) suitably sized for delivery to a system which has following limits:
- maximum size of output record is 1 MB, larger records should be discarded
- maximum size of output batch is 5 MB
- maximum number of records in an output batch is 500
Input for the library is: [<record1>, <record2>, <record3>, ... , <recordn>]
Output is: [<batch1>, <batch2>, ..., <batchn>] where each batch is an array of records just like in the input.
The records can be assumed to be strings of variable length and they have to pass intact through the system and records should stay in the order that they arrive.
This Homework Project is just an example and cannot be installed via PyPI. In the real world, it could be published e.g. using Poetry's publish command.
This repository uses Poetry. Development requirements can be installed by
poetry install --with=dev
Next, pre commit hooks needs to be installed. They can be installed by
poetry run pre-commit install
Now static file checks will be ran on each commit or by running
poetry run pre-commit run -a
Unit tests can be run with the following command:
poetry run python -m unittest
This project uses set of development tools to ensure code quality.
Tool | Description |
---|---|
pre-commit | Used to run static file checks on each commit |
black | Used to automate code formatting |
isort | Used to automate sorting imports properly |
pylint | To do static code analyzing |
mypy | To ensure static typing |
The following points are not within the scope of this exercise, but they should be considered if a similar library were to be created in the real world.
- Ensure code quality by setting up GitHub Actions (run static code analysis and unit tests there on each push).
- Automatic releases on push to main (with Github Actions)
- Add more test cases to test all possible corner cases (empty strings, non-utf8 characters, large records, etc.).
- Provide a well-written README.md with clear installation, development, and contribution instructions.
- Tool such Bandit could be considered to find common security issues