malokhvii-eduard / olx-crawler

🤖 An easy-to-use, powerful crawler for OLX that collects various non-sensitive data about ads on the site.

Home Page: https://olx.ua

🤖 OLX Crawler

License pre-commit pre-commit.ci Style Guide markdownlint commitlint flake8 bandit

🎉 Features

  • 🦾 Sufficient performance
  • 🎭 Anonymous, especially via Tor
  • ⚖️ Non-sensitive data
  • 🔍 Filtering by keywords
  • ⛓️ Command chaining

🌻 Motivation

To demonstrate experience with Selenium for web scraping 💪. To analyze non-sensitive data about ads on the site 🧐. There were no ready-made solutions for collecting data from the site 😢.

✨ Getting Started

📚 Prerequisites

You only need to install Google Chrome; that's all. There is no need to install the WebDriver binary manually. Thank you, @SergeyPirogov, for WebDriver Manager.
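
For readers curious how this works without a manual driver download, here is a minimal sketch of starting Chrome through Selenium with WebDriver Manager. It is not taken from the olx-crawler source: the helper names `chrome_arguments` and `make_driver` are hypothetical, and the mapping of headless mode and a proxy to Chrome switches is an assumption about how such a crawler could be wired.

```python
from typing import List, Optional


def chrome_arguments(headless: bool = True, proxy: Optional[str] = None) -> List[str]:
    """Build Chrome command-line switches (hypothetical helper, not from olx-crawler)."""
    arguments = []
    if headless:
        # Run Chrome without a visible window.
        arguments.append("--headless=new")
    if proxy:
        # Route browser traffic through the given proxy URL.
        arguments.append(f"--proxy-server={proxy}")
    return arguments


def make_driver(headless: bool = True, proxy: Optional[str] = None):
    """Start Chrome with a driver binary fetched by WebDriver Manager."""
    # Imported lazily so chrome_arguments() works without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service
    from webdriver_manager.chrome import ChromeDriverManager

    options = webdriver.ChromeOptions()
    for argument in chrome_arguments(headless, proxy):
        options.add_argument(argument)

    # install() downloads a matching ChromeDriver if needed and returns its path.
    service = Service(ChromeDriverManager().install())
    return webdriver.Chrome(service=service, options=options)
```

Calling `make_driver()` requires the `selenium` and `webdriver-manager` packages plus an installed Google Chrome. A local Tor client usually listens for SOCKS connections on port 9050, so a call such as `make_driver(proxy="socks5://127.0.0.1:9050")` is one way the Tor feature could be exercised under these assumptions.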

📦 Installation

  1. Clone the Repository
  2. Install this Package (./setup.py install) or install dependencies from Pipfile (pipenv install)

👀 Usage

olx ads --help # Show help for ads command and exit
olx ads "https://www.olx.ua/uk/zhivotnye/koshki/" # Collect all ads with cats
olx ads --no-free ... # Only paid ads
olx ads --no-paid ... # Only free ads
olx ads --kind --title --price --location ... # Collect extra fields

olx ad --help # Show help for ad command and exit
olx ad "https://www.olx.ua/d/uk/obyavlenie/laskovye-shotlandskie-malyshi-IDNyrf4.html" # Collect ad details
olx ad --keywords keywords.txt ... # Filter by keywords
olx ad --title --description --author --profile --price --location ... # Collect extra fields

olx ads --progress ... # Show progress
olx ads --no-headless ... # Disable headless mode
olx ads --proxy "socks5://..." # Use proxy server
olx ads --all ... # Collect all fields
olx ads --no-link ... # Skip link field
olx ads "https://www.olx.ua/uk/zhivotnye/koshki/" | olx ad --all --progress > ads.csv # Command chaining
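
The keyword filter shown above can be pictured with a small sketch. This is only an illustration: the README does not document the format of keywords.txt or the matching rules, so the one-keyword-per-line layout, the case-insensitive substring match, and the names `load_keywords` and `matches` are all assumptions.

```python
from typing import Iterable, List


def load_keywords(lines: Iterable[str]) -> List[str]:
    """Return lower-cased keywords, skipping blank lines (assumed file layout)."""
    return [line.strip().lower() for line in lines if line.strip()]


def matches(text: str, keywords: List[str]) -> bool:
    """Return True if any keyword occurs in the text, ignoring case."""
    lowered = text.lower()
    return any(keyword in lowered for keyword in keywords)


# Stand-in for the lines of a keywords file; with a real file one could pass
# the open file object to load_keywords() instead.
keywords = load_keywords(["Scottish\n", "\n", "kitten\n"])

# Hypothetical ad titles used only to demonstrate the filter.
ads = ["Affectionate Scottish kittens", "Used bicycle"]
kept = [ad for ad in ads if matches(ad, keywords)]
print(kept)
```

Under these assumptions only the first title is kept, because it contains at least one of the keywords.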

πŸ› οΈ Tech Stack

EditorConfig Markdown Python Selenium click tqdm pre-commit markdownlint commitlint Shields.io Git GitHub

✍️ Contributing

πŸ‘πŸŽ‰ First off, thanks for taking the time to contribute! πŸŽ‰πŸ‘

Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.

  1. Fork the Project
  2. Create your Feature Branch (git checkout -b feature/awesome-feature)
  3. Commit your Changes (git commit -m 'Add awesome feature')
  4. Push to the Branch (git push origin feature/awesome-feature)
  5. Open a Pull Request

💖 Like this project?

Leave a ⭐ if you think this project is cool or useful for you.

⚠️ License

olx-crawler is licensed under the MIT License. See the LICENSE file for more information.
