Ramakrishna05 / Web-Crawling

Scratching the web for information :)

Web-Scraping-with-Python

  1. bv_extract.py contains code for crawling the Bharatavani website (a website of dictionaries for Indian languages).
  2. ntm_translators_db_crawler.py contains code for crawling the National Translation Mission website for its list of translators (needs to be improved).
  3. align_corpus.py is a simple script for aligning two sentences to build a parallel corpus (needs to be improved).
  4. crawler.py is a very basic crawler that uses BeautifulSoup for parsing the data.
  5. extract_data.py contains code for extracting the data between any two HTML tags and arranging it in a specific layout.
  6. fill_form.py contains code for submitting form data and clicking JavaScript buttons with Selenium (I used this to fetch articles from a newspaper called "Sakshi").
  7. new_crawler.py is used for crawling the PMModi website for information :)
  8. shabdkosh_eng_tel_crawler.py contains code for crawling the Shabdkosh website (another website of dictionaries for Indian languages). It is implemented with Selenium because the website uses JavaScript-enabled buttons.
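
As a rough illustration of two of the ideas listed above (pulling out the text between a pair of HTML tags, and pairing sentences into a parallel corpus), here is a minimal sketch. It is not the code from extract_data.py or align_corpus.py: the names TagTextExtractor, extract_between_tags and align_lines are made up for this example, it uses the standard-library html.parser instead of BeautifulSoup so that it runs without extra installs, and the alignment simply assumes both sides already have one sentence per line in the same order.

```python
from html.parser import HTMLParser


class TagTextExtractor(HTMLParser):
    """Collect the text found inside every occurrence of one tag.

    Hypothetical helper for illustration only.
    """

    def __init__(self, tag):
        super().__init__()
        self.tag = tag.lower()  # HTMLParser reports tag names in lowercase
        self.depth = 0          # > 0 while we are inside the target tag
        self.chunks = []        # text pieces of the element being read
        self.results = []       # one cleaned string per matched element

    def handle_starttag(self, tag, attrs):
        if tag == self.tag:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == self.tag and self.depth > 0:
            self.depth -= 1
            if self.depth == 0:
                # Join the pieces and collapse runs of whitespace.
                text = " ".join("".join(self.chunks).split())
                if text:
                    self.results.append(text)
                self.chunks = []

    def handle_data(self, data):
        if self.depth > 0:
            self.chunks.append(data)


def extract_between_tags(html, tag):
    """Return the text between each <tag> ... </tag> pair in html."""
    parser = TagTextExtractor(tag)
    parser.feed(html)
    parser.close()
    return parser.results


def align_lines(source_lines, target_lines):
    """Pair sentences by position (assumes both lists are already in order)."""
    if len(source_lines) != len(target_lines):
        raise ValueError("both sides need the same number of sentences")
    return list(zip(source_lines, target_lines))


if __name__ == "__main__":
    # Made-up sample markup, not taken from any of the crawled websites.
    sample = "<ul><li>water</li><li> fire <b>hot</b></li></ul>"
    print(extract_between_tags(sample, "li"))
    print(align_lines(["source 1", "source 2"], ["target 1", "target 2"]))
```

Real pages usually need more care (nested markup, encodings, sentences that do not line up one to one), which is why the scripts above are marked as needing improvement.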


Languages

Language:Python 100.0%