iluxonchik / olx-phone-number-scraper

OLX phone number scraper scrapes phone numbers from OLX listings.


OLX Phone Number Scraper

Scrapes phone numbers from OLX listings.

Usage

NOTE: Rewrite this section once the final version is done. For now it only serves to guide the design of the program and keep it intuitive.

olxscrape <url_of_listings> <page_limit>* <phone_number_limit>* <output_file_name>

The tool should write each phone number to the file as soon as it is scraped, so that
it can be safely stopped with Ctrl+C at any time.
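
The write-as-you-go behavior described here could look roughly like the sketch below. It is not taken from the repository: the helper names `append_number` and `save_all` are hypothetical, and it assumes the output file holds one phone number per line.

```python
import os


def append_number(path, number):
    """Append one phone number and push it to disk right away (hypothetical helper)."""
    with open(path, "a", encoding="utf-8") as out:
        out.write(number + "\n")
        out.flush()              # empty Python's buffer
        os.fsync(out.fileno())   # ask the OS to commit the data to disk


def save_all(numbers, path):
    """Write numbers one at a time; return how many were saved."""
    saved = 0
    try:
        for number in numbers:
            append_number(path, number)
            saved += 1
    except KeyboardInterrupt:
        # Ctrl+C: every number written so far is already in the file.
        pass
    return saved
```

Opening the file in append mode for every number is slower than keeping it open, but it keeps the sketch simple and means an interrupted run loses at most the number currently being scraped.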

join <file_1> ... <file_n> <output_file_name>

Joins two or more files into a single one, removing duplicates.
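
As an illustration of the join step, here is a minimal sketch. The function name `join_files` is hypothetical, and it assumes one phone number per line, treats numbers as duplicates only when they match exactly after trimming whitespace, and keeps the first occurrence of each number.

```python
def join_files(input_paths, output_path):
    """Merge several number files into one, dropping duplicates.

    Returns the count of unique numbers written.
    """
    seen = set()
    with open(output_path, "w", encoding="utf-8") as out:
        for path in input_paths:
            with open(path, encoding="utf-8") as src:
                for line in src:
                    number = line.strip()
                    # Skip blank lines and numbers already written.
                    if number and number not in seen:
                        seen.add(number)
                        out.write(number + "\n")
    return len(seen)
```

Using a set for the membership check keeps the merge linear in the total number of lines, while writing in encounter order preserves the original ordering of the first file.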

prefix_phone <input_file_name> <prefix> <output_file_name>

Prefixes the given string to all of the phone numbers in the list. Useful if you
want to prefix them all with a country code.
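
A rough sketch of the prefixing step follows. The function name `prefix_phone` mirrors the command name but is otherwise an assumption, as is the one-number-per-line file layout; the prefix is prepended as-is, with no check for numbers that already carry it.

```python
def prefix_phone(input_path, prefix, output_path):
    """Write every number from input_path to output_path with prefix prepended.

    Returns the count of numbers written.
    """
    count = 0
    with open(input_path, encoding="utf-8") as src, \
            open(output_path, "w", encoding="utf-8") as out:
        for line in src:
            number = line.strip()
            if not number:
                continue  # ignore blank lines
            out.write(prefix + number + "\n")
            count += 1
    return count
```

For example, calling `prefix_phone("numbers.txt", "+351", "prefixed.txt")` would turn a line `912345678` into `+351912345678`; the file names and the country code here are placeholders.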

Features marked with * might only be considered for versions of the tool beyond the base version.

About

This tool can easily be altered to scrape e-mails and other useful information as well. It was written to meet the needs of a specific project and is only designed to support that functionality. It is the result of the reverse engineering needed to scrape phone numbers from listings, given the URL of a list of those listings. Some command-line tools are included.

A minimal test set is included. You can use this as a basis for a tool with similar purpose.

I am short on time at the moment and hacked this together quickly. Please keep this in mind when reading the code.


Languages

HTML 95.7%, Python 4.3%