- Handle edge cases, e.g. a profile being private
- Use statistically calculated delays instead of guessed values, possibly reading the configuration from a database
- Handle errors
- Retry on error
- Replace Faktory with more scalable components such as Kafka or Kinesis (the AWS equivalent)
- Right now a single browser instance handles everything; this could be scaled by starting as many browser instances as needed on demand
- Persist cookies so the crawler can reuse a session instead of logging in every time
- Graceful shutdown
- Handle login so that the account doesn't get banned
- Add a caching layer and a deduplication mechanism
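The delay and retry items above could start from something like the sketch below. It is hypothetical and not code from this repo: `humanDelayMs` samples a pause from a normal distribution (via the Box-Muller transform) clamped to a safe range, and `withRetry` retries a failing async step with exponential backoff and "full jitter". The default numbers are placeholders rather than measured values; ideally they would be derived from recorded browsing sessions or read from a config table.

```js
// Hypothetical helpers for the "statistically calculated delay" and
// "retry on error" items; none of these names exist in the repo today.

// Box-Muller transform: two uniform samples in (0, 1) -> one standard normal sample.
function sampleStandardNormal(random = Math.random) {
  let u = 0;
  while (u === 0) u = random(); // Math.log(0) would be -Infinity
  const v = random();
  return Math.sqrt(-2 * Math.log(u)) * Math.cos(2 * Math.PI * v);
}

// Pause length drawn from a normal distribution and clamped to [minMs, maxMs].
// The defaults are placeholders, not measured values.
function humanDelayMs(
  { meanMs = 2500, stdMs = 800, minMs = 500, maxMs = 8000 } = {},
  random = Math.random,
) {
  const sample = meanMs + stdMs * sampleStandardNormal(random);
  return Math.min(maxMs, Math.max(minMs, Math.round(sample)));
}

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Runs `operation`, retrying up to `retries` extra times. Before retry n
// (0-based) it waits a random time in [0, min(capMs, baseMs * 2^n)).
async function withRetry(
  operation,
  { retries = 3, baseMs = 1000, capMs = 30000, random = Math.random, wait = sleep } = {},
) {
  for (let attempt = 0; ; attempt++) {
    try {
      return await operation(attempt);
    } catch (error) {
      if (attempt >= retries) throw error; // out of retries: surface the last error
      const ceiling = Math.min(capMs, baseMs * 2 ** attempt);
      await wait(Math.floor(random() * ceiling));
    }
  }
}
```

A crawl step could then be wrapped as `await withRetry(() => page.goto(url))` followed by `await sleep(humanDelayMs())`, assuming the browser layer exposes a Puppeteer-style `page` object. Randomizing the backoff (rather than waiting exactly `baseMs * 2^n`) keeps several workers from retrying in lockstep after a shared failure.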
Since my Instagram account was blocked halfway through (after I had completed the browser and crawling part), I couldn't finish the API part, but I have proposed what the project structure would look like.
The Python project under `app` is mostly a mock and will not work.
- You have to create a `credentials.js` file inside the `browser` folder. The schema is:

  ```js
  export default {
    email: "test",
    password: "test",
  };
  ```
- Then start the `faktory` service from `docker-compose.yml` by running `docker compose up faktory`.
- When `faktory` is running, install the dependencies. Using `bun` is recommended (e.g. `bun install`).
- Run `app.js` with `bun app.js`; it will listen for incoming tasks.
- While `app.js` is running, run `bun producer.js` to produce a task.
- The crawler should start and crawl the Instagram page of the `user` given in the payload.
- No data persistence layer has been added, so it will just print the HTML contents.
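To make the producer step more concrete, `producer.js` could normalize, validate and deduplicate usernames before queueing them. This is a hypothetical sketch: `buildCrawlPayloads` is not in the repo, the `{ user }` payload shape is inferred from the `user` field the steps mention, and the username rule (letters, digits, periods and underscores, at most 30 characters) is a simplified assumption about Instagram's format.

```js
// Hypothetical producer-side helper (not part of the repo): turns raw
// usernames into crawl-job payloads. The payload shape { user } is assumed.

// Simplified assumption about Instagram usernames: letters, digits,
// periods and underscores, 1-30 characters (normalized to lower case here).
const USERNAME_PATTERN = /^[a-z0-9._]{1,30}$/;

function buildCrawlPayloads(rawUsernames) {
  const seen = new Set(); // in-memory dedup, scoped to this one batch
  const payloads = [];
  const rejected = [];

  for (const raw of rawUsernames) {
    // Accept "name" or "@name", ignoring surrounding whitespace and case.
    const user = String(raw).trim().replace(/^@/, "").toLowerCase();
    if (!USERNAME_PATTERN.test(user)) {
      rejected.push(raw); // keep the original input for error reporting
      continue;
    }
    if (seen.has(user)) continue; // duplicate within this batch: skip
    seen.add(user);
    payloads.push({ user });
  }

  return { payloads, rejected };
}
```

Each payload could then be pushed as the argument of a crawl job. Assuming the project uses the `faktory-worker` npm client, that would be along the lines of `client.job(jobType, payload).push()`, with `jobType` being whatever name `app.js` registers. The `Set` only deduplicates within one batch; deduplicating across runs would need a shared store such as the caching layer listed in the to-do items.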