vardecab / otomoto_olx-scraper

Scrape car offers from OTOMOTO․pl & OLX․pl and run an IFTTT automation (e.g. send an email or add a to-do task) when new cars matching the search criteria are found. Supports native macOS & Windows 10 notifications.

otomoto_olx-scraper

Not Maintained

As of late 2022, a serious rewrite is necessary to fix crawling issues.





Screenshots

Windows macOS

How to use

Note: There are 2 scripts for OTOMOTO; they are identical except for the URL. Using both files (or creating otomoto3.py, otomoto4.py, etc.) is useful when looking for a car in a different location or with different search criteria altogether. The same applies to OLX.

macOS

How to chain the scripts:

# automate.sh
# Run each scraper from inside its own folder, one after another.
cd "PATH/otomoto_olx-scraper/otomoto1"
python3 otomoto1.py
cd ../otomoto2
python3 otomoto2.py
cd ../olx1
python3 olx1.py
cd ../olx2
python3 olx2.py

Add the script above to Automator, export it as an Application, and then run it in the background via Script Editor.
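On Windows, or anywhere a shell script is inconvenient, the same chaining can be sketched in Python. This is an illustration rather than part of the repository: `build_commands` and `run_all` are hypothetical names, and the folder layout is assumed to match automate.sh (one folder per scraper, each holding a script of the same name).

```python
import subprocess
import sys
from pathlib import Path

# Folder names taken from automate.sh; each folder holds a script of the same name.
SCRAPERS = ("otomoto1", "otomoto2", "olx1", "olx2")


def build_commands(repo_root, names=SCRAPERS, python=sys.executable):
    """Return (working_dir, argv) pairs, one per scraper, in run order."""
    root = Path(repo_root)
    return [(root / name, [python, f"{name}.py"]) for name in names]


def run_all(repo_root):
    """Run each scraper from inside its own folder, one after another."""
    for cwd, argv in build_commands(repo_root):
        # check=False: a non-zero exit from one scraper does not stop the rest.
        result = subprocess.run(argv, cwd=cwd, check=False)
        if result.returncode != 0:
            print(f"{argv[1]} exited with code {result.returncode}", file=sys.stderr)
```

Calling `run_all("PATH/otomoto_olx-scraper")`, with PATH replaced as in automate.sh, runs the four scripts in the same order as the shell version.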

Release History

  • 0.15: Updated the otomoto1.py & otomoto2.py scripts to handle first run properly — create folders, empty variables.
  • 0.14: Updated the URLs and BeautifulSoup's selectors so the script works again for OTOMOTO.
  • 0.13: Two files per platform to support searches in two different locations; improved pagination support on OLX; improved regex; more data sent to IFTTT.
  • 0.12.4: Fixed a bug that prevented the script from running because there was only one OTOMOTO subpage to scrape.
  • 0.12.3: Disabled the option to open browser with search results page; changed URLs.
  • 0.12.2: Added date as 2nd parameter to IFTTT automation.
  • 0.12.1: Tiny bug fix around certificate issue on macOS when requesting a URL.
  • 0.12: Added OLX․pl support 🎉
  • 0.11.1: Replaced old win10toast module with win10toast-click.
  • 0.11: Improved Windows 10 notifications to open URL on-click using win10toast-click; added URL shortening module; renamed a few variables; cleaned up project structure.
  • 0.10: Pagination support: the script scrapes only the number of pages available for a given search query instead of relying on a hard-coded value. Also: turned off notifications when there are no new cars; fixed a bug that prevented adding more than 32 cars to the file.
  • 0.9: Cleaned the code: renamed variables & function, reduced number of .txt files used; fixed a bug that was causing false positives because of empty lines, \n characters and duplicates; broke keyword-search functionality which is not being utilized right now anyway.
  • 0.8: Changed URL; attempt to hide API key; changes to notifications.
  • 0.7: Notifications (Windows & macOS); showing script's run time in seconds; improved regex formula to remove IDs at the end of URLs.
  • 0.6: Sending email via IFTTT if new car is found.
  • 0.5: Disabled user input once again - hardcoded values; implemented file diff; files & folders are created with unique ID.
  • 0.4: Re-enabled user input; minor tweak to URL params; improved compatibility with macOS.
  • 0.3: Disabled user input; improved compatibility with macOS.
  • 0.2: Opening URLs from search results. Windows 10 notification when opening URLs; delaying crawling; renamed some variables for better clarity.
  • 0.1: Initial release.
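
Several entries above (0.5 file diff, 0.6 IFTTT email, 0.9 false positives from empty lines and duplicates, 0.12.2 date as 2nd parameter) describe one flow: compare the current list of offer URLs with the previous one, then notify IFTTT about anything new. The sketch below shows one possible way to do that; it is not the repository's code. The function names are hypothetical, the use of `value1` for the offer URLs is an assumption, and the event name and key are placeholders. The request is only built here, not sent.

```python
import json
from datetime import date
from urllib import request

# IFTTT Webhooks trigger URL; the event name and key are supplied by the user.
IFTTT_URL = "https://maker.ifttt.com/trigger/{event}/with/key/{key}"


def clean_urls(lines):
    """Strip whitespace, drop empty lines and duplicates, keep first-seen order."""
    seen = {}
    for line in lines:
        url = line.strip()
        if url:
            seen.setdefault(url, None)
    return list(seen)


def new_offers(previous, current):
    """Return URLs present in the current run but not in the previous one."""
    known = set(clean_urls(previous))
    return [url for url in clean_urls(current) if url not in known]


def build_ifttt_request(event, key, offers, today=None):
    """Build (without sending) a POST request for the IFTTT Webhooks service."""
    payload = {
        "value1": "\n".join(offers),                     # assumption: offer URLs
        "value2": (today or date.today()).isoformat(),   # date as 2nd parameter
    }
    return request.Request(
        IFTTT_URL.format(event=event, key=key),
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing the returned object to `request.urlopen(...)` would send it. Cleaning both lists before comparing them is what keeps blank lines, trailing `\n` characters and repeated URLs from showing up as new cars.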

Versioning

Using SemVer.

Contributing

If you found a bug or want to propose a feature, feel free to visit the Issues page.

About

License: GNU General Public License v3.0


Languages

Language: Python 100.0%