A web crawler that crawls wgcompany.de and sends you an email for each new flatshare listing.
- Copy crawler.cfg.example to crawler.cfg
- Replace the example values in crawler.cfg with your actual configuration
- Install 'pipenv' if not already installed
- Run
pipenv install
- Run
pipenv shell
- Test your configuration: Run
python test_crawler.py
It should print "OK", and you should receive two test mails at the account you configured in step 2.
- Run the crawler the first time:
./crawler.py
Nothing should be printed on the console. Log output is written to crawler.log.
You won't receive any mails on the first run.
- Look at the path that was printed when you ran
pipenv shell
above.
The path looks like this: /Users/YOURNAME/YOUR_ACTUAL_PATH/bin/activate
Replace the trailing activate with python and note the resulting path for later.
You should get something like this: /Users/YOURNAME/YOUR_ACTUAL_PATH/bin/python
- Create a crontab entry which runs the crawler periodically
- Run
crontab -e
- Put the following line in the crontab, using the interpreter path you noted earlier and the full path to crawler.py, then save the file:
0 * * * * /Users/YOURNAME/YOUR_ACTUAL_PATH/bin/python PATH_TO_crawler.py
This entry runs the crawler once an hour, at minute 0.
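
The path substitution and the crontab line described above can be sketched in a few lines of Python. This is only an illustration and not part of the crawler: the helper names interpreter_from_activate and hourly_cron_line are hypothetical, and the paths are the same placeholders used in the steps above.

```python
from pathlib import PurePosixPath


def interpreter_from_activate(activate_path: str) -> str:
    """Swap the trailing 'activate' for 'python' in a virtualenv path."""
    path = PurePosixPath(activate_path)
    if path.name != "activate":
        raise ValueError(f"expected a path ending in 'activate', got {activate_path!r}")
    return str(path.with_name("python"))


def hourly_cron_line(python_path: str, crawler_path: str) -> str:
    """Build a crontab line that runs the crawler at minute 0 of every hour."""
    return f"0 * * * * {python_path} {crawler_path}"


if __name__ == "__main__":
    # Placeholder values; substitute the paths from your own machine.
    python_path = interpreter_from_activate(
        "/Users/YOURNAME/YOUR_ACTUAL_PATH/bin/activate"
    )
    print(hourly_cron_line(python_path, "PATH_TO_crawler.py"))
```

Running the script with your real paths prints a line you can paste into the editor opened by crontab -e. Use absolute paths for both the interpreter and crawler.py, since cron jobs do not start in your project directory.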