A crawler that collects the full catalogue of PTT.
It uses Gmail as the SMTP service for sending notification e-mails.
Open the file `scrapy_tom_ppt_crawer.py` and:
- Change `XXXXX@gmail.com` to the Gmail address you want to send from.
- Change `XXXXX` to the password of that Gmail account.
- Change `ANY_EMAIL` to the address that should receive the e-mails.
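Go to your Google Account => Security => Less secure app access and turn it on. This allows the script to sign in from an unrecognized device and send e-mail through SMTP. Note that Google has since retired this setting for most accounts; if it is not available, turn on 2-Step Verification and generate an App Password to use in place of your normal password.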
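A minimal sketch of the e-mail step, assuming the script uses Python's standard `smtplib` with Gmail's SMTP server at `smtp.gmail.com` on port 587 (STARTTLS). The function names `build_message` and `send_via_gmail` are hypothetical and not taken from `scrapy_tom_ppt_crawer.py`; you would pass in the Gmail address and password you configured above.

```python
import smtplib
from email.message import EmailMessage


def build_message(sender: str, receiver: str, subject: str, body: str) -> EmailMessage:
    """Build a plain-text e-mail message (hypothetical helper)."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = receiver
    msg["Subject"] = subject
    msg.set_content(body)
    return msg


def send_via_gmail(msg: EmailMessage, sender: str, password: str) -> None:
    """Send msg through Gmail's SMTP server using STARTTLS on port 587."""
    with smtplib.SMTP("smtp.gmail.com", 587, timeout=30) as server:
        server.starttls()  # upgrade the plain connection to TLS before logging in
        server.login(sender, password)
        server.send_message(msg)
```

For example, `send_via_gmail(build_message(sender, receiver, "Crawler notice", "..."), sender, password)` sends one message; the login only succeeds once the account allows this kind of sign-in.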
Run MongoDB as a container:
```bash
docker run -itd --name mongo -p 27017:27017 mongo
```
Check whether the MongoDB container is running:
```bash
docker ps
```
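Besides `docker ps`, you can check from Python that something is listening on MongoDB's default port. This is a hypothetical helper, not part of the crawler: it only opens a TCP connection to `localhost:27017` and does not speak the MongoDB protocol, so it shows the port is open rather than that the database is healthy.

```python
import socket


def mongo_port_open(host: str = "localhost", port: int = 27017, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout.

    This only shows that something is listening on the port (e.g. the
    mongo container started above); it does not run any MongoDB command.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host unreachable
        return False
```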
**ROCK and ROLL ~**
```bash
python scrapy_tom_ppt_crawer.py
# or
python3 scrapy_tom_ppt_crawer.py
```
As the screenshot shows, the crawler starts crawling and prints a message on the screen when it finishes.
It generates a result file called `ppt_result.csv`, since I reckon a CSV file is always easier to use for analysis.
The data is also saved in MongoDB.
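A sketch of how rows could be written to `ppt_result.csv` with the standard `csv` module. The column names `title`, `author`, `date` and `url` are assumptions for illustration; the actual fields written by `scrapy_tom_ppt_crawer.py` may differ.

```python
import csv
from typing import Iterable, Mapping

# Hypothetical column names; the real crawler's fields may differ.
FIELDS = ["title", "author", "date", "url"]


def write_results(path: str, rows: Iterable[Mapping[str, str]]) -> int:
    """Write crawled rows to a CSV file with a header line; return the row count."""
    count = 0
    # newline="" lets the csv module control line endings; utf-8-sig adds a BOM
    # so Excel detects the encoding of non-ASCII (e.g. Chinese) titles.
    with open(path, "w", newline="", encoding="utf-8-sig") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        writer.writeheader()
        for row in rows:
            writer.writerow(row)
            count += 1
    return count
```

Writing a header row keeps the file self-describing, so it can be opened directly in a spreadsheet or loaded with a data-analysis library.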
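To inspect the stored data, open the mongo shell inside the container:
```bash
docker ps
docker exec -it mongo mongo
```
On recent images (MongoDB 6.0 and later) the legacy `mongo` shell is no longer included; use `docker exec -it mongo mongosh` instead. Then, inside the shell:
```
show dbs
use IT_DB
show collections
db.IT_coll.find()
```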
The image below shows that the crawler has stored all the data in MongoDB.
If the crawler cannot connect to MongoDB, it sends an e-mail to the user.
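One way to structure this connect-or-notify behaviour, as a sketch with hypothetical names: `connect` is any callable that opens the database connection, and `send_alert` is any callable that e-mails the user (for example, a wrapper around the Gmail SMTP settings configured earlier). With `pymongo`, note that `MongoClient` connects lazily, so a `connect` function would typically run `client.admin.command("ping")` before returning the client so that an unreachable server raises right away.

```python
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def connect_or_alert(connect: Callable[[], T], send_alert: Callable[[str], None]) -> Optional[T]:
    """Call connect(); if it raises, report the error via send_alert and return None."""
    try:
        return connect()
    except Exception as exc:  # any connection failure triggers the alert
        send_alert(f"Crawler could not connect to MongoDB: {exc!r}")
        return None
```

Passing both behaviours in as callables keeps the fallback logic testable without a running database or a real mail account.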