Web-Scraping-with-Python

Scraping the web for information :)

bv_extract.py contains code for crawling the Bharatavani website (a site hosting dictionaries for Indian languages).
ntm_translators_db_crawler.py contains code for crawling the National Translation Mission website for its list of translators (needs to be improved).
align_corpus.py is a simple script for aligning two sentences to build a parallel corpus (needs to be improved).
crawler.py is a very basic crawler that uses BeautifulSoup for data parsing.
extract_data.py contains code for extracting data between any two HTML tags and arranging it in a specific format.
fill_form.py contains code for submitting data and clicking JavaScript buttons with Selenium (I used this to get newspaper articles from the newspaper "Sakshi").
new_crawler.py is used for crawling the PMModi website for information :)
shabdkosh_eng_tel_crawler.py contains code for crawling the Shabdkosh website (another site hosting dictionaries for Indian languages). It is implemented with Selenium because the website uses JavaScript-enabled buttons.
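
To give a rough idea of the tag-to-tag extraction that extract_data.py is described as doing, here is a minimal sketch. It is not the code from extract_data.py: it uses only Python's standard-library html.parser, and the names TagTextExtractor and extract_between, the sample HTML, and the "word : meaning" layout are all assumptions made up for this example.

```python
from html.parser import HTMLParser


class TagTextExtractor(HTMLParser):
    """Collect the text found inside every occurrence of one tag (hypothetical helper)."""

    def __init__(self, tag):
        super().__init__()
        self.tag = tag
        self.depth = 0      # greater than 0 while we are inside the target tag
        self.current = []   # text pieces of the element currently being read
        self.results = []   # one cleaned-up string per completed element

    def handle_starttag(self, tag, attrs):
        if tag == self.tag:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == self.tag and self.depth > 0:
            self.depth -= 1
            if self.depth == 0:
                # Join the pieces and collapse runs of whitespace.
                text = " ".join("".join(self.current).split())
                self.results.append(text)
                self.current = []

    def handle_data(self, data):
        # Text inside nested tags (e.g. <b>) is kept as well.
        if self.depth > 0:
            self.current.append(data)


def extract_between(html, tag):
    """Return the text between each <tag> ... </tag> pair in html."""
    parser = TagTextExtractor(tag)
    parser.feed(html)
    parser.close()
    return parser.results


# Made-up sample markup, standing in for a downloaded dictionary page.
sample = "<ul><li>amma : mother</li><li>nanna : <b>father</b></li></ul>"

# Arrange each entry as a tab-separated "word<TAB>meaning" line.
for entry in extract_between(sample, "li"):
    word, meaning = [part.strip() for part in entry.split(":", 1)]
    print(f"{word}\t{meaning}")
```

For real pages, BeautifulSoup (as used in crawler.py) is usually the more convenient choice; the standard-library parser is used here only so the sketch runs without extra installs.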

Ramakrishna05 / Web-Crawling