Faizaan06583 / Hyderabad_Housing_Prices

This repository extracts data from a website or blog using web scraping.

Hyderabad_Housing_Prices

About the Repository :

This repository extracts data from one of the housing rental portals in Hyderabad using web scraping, cleans the data using data cleaning techniques available in Python, and finally saves the result to a .csv file for further processing and analysis.

Steps :

  • Identify the website to be scraped.

  • Identify the features to be extracted from the data.

  • For example, in the current repository I am extracting the following features:

    1. Number of bedrooms.
    2. Number of bathrooms.
    3. Type of furnishing.
    4. The tenants preferred.
    5. The area of the house in sqft.
    6. The locality of the house.
    7. Price or rent of the flat.
    
  • Clean and extract the data from the HTML tags.

  • Save the data to a .csv file for further processing.
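
The steps above can be sketched roughly as follows. This is only a minimal illustration, not the notebook's actual code: the HTML snippet, the tag and class names (`listing`, `bedrooms`, `price`, and so on), the sample values, the helper names and the output file name `hyderabad_rentals.csv` are all assumptions made up for the example. A real portal will use different markup, and the page would first be downloaded (for instance with urllib) instead of being read from a string.

```python
import re

import pandas as pd
from bs4 import BeautifulSoup

# Made-up markup standing in for a downloaded page (assumption: the real
# portal's tags and class names will differ).
SAMPLE_HTML = """
<div class="listing">
  <span class="bedrooms">2 BHK</span>
  <span class="bathrooms">2 Bathrooms</span>
  <span class="furnishing">Semi-Furnished</span>
  <span class="tenants">Family</span>
  <span class="area">1,200 sqft</span>
  <span class="locality">Gachibowli</span>
  <span class="price">Rs. 25,000</span>
</div>
<div class="listing">
  <span class="bedrooms">3 BHK</span>
  <span class="bathrooms">3 Bathrooms</span>
  <span class="furnishing">Unfurnished</span>
  <span class="tenants">Bachelors</span>
  <span class="area">1,650 sqft</span>
  <span class="locality">Kondapur</span>
  <span class="price">Rs. 32,000</span>
</div>
"""


def first_number(text):
    """Return the first integer found in text (commas ignored), or None."""
    match = re.search(r"\d[\d,]*", text)
    return int(match.group().replace(",", "")) if match else None


def text_of(card, css_class):
    """Return the stripped text of the span with this class, or ''."""
    tag = card.find("span", class_=css_class)
    return tag.get_text(strip=True) if tag else ""


def parse_listings(html):
    """Turn every listing block in the HTML into one DataFrame row."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for card in soup.find_all("div", class_="listing"):
        rows.append({
            "bedrooms": first_number(text_of(card, "bedrooms")),
            "bathrooms": first_number(text_of(card, "bathrooms")),
            "furnishing": text_of(card, "furnishing"),
            "tenants": text_of(card, "tenants"),
            "area_sqft": first_number(text_of(card, "area")),
            "locality": text_of(card, "locality"),
            "rent": first_number(text_of(card, "price")),
        })
    return pd.DataFrame(rows)


df = parse_listings(SAMPLE_HTML)
# Save the cleaned table for later analysis (hypothetical file name).
df.to_csv("hyderabad_rentals.csv", index=False)
```

Keeping the numeric cleanup in one small helper (`first_number`) means bedrooms, bathrooms, area and rent are all stored as integers in the .csv file, which makes later analysis simpler.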

Dependencies :

  • pandas.
  • Regular expressions (Python's re module).
  • BeautifulSoup.
  • urllib.

Sample-Input :

Sample-Output :

Note :

Installing the Anaconda Distribution will resolve all the dependencies. The code will need to be changed depending on the layout of the target web page and where the features are placed in it.
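
One possible way to limit those code changes is to keep every layout-dependent detail in a single place. The sketch below is an assumption about how this could be organised, not how the notebook does it: the `SELECTORS` dictionary, the CSS selectors inside it, the `extract` function and the sample HTML are all hypothetical.

```python
from bs4 import BeautifulSoup

# Hypothetical CSS selectors. When the page layout changes, only this
# dictionary should need editing.
SELECTORS = {
    "card": "div.listing",
    "fields": {
        "locality": "span.locality",
        "price": "span.price",
    },
}


def extract(html, selectors=SELECTORS):
    """Return one dict per listing, using the configured selectors."""
    soup = BeautifulSoup(html, "html.parser")
    records = []
    for card in soup.select(selectors["card"]):
        record = {}
        for name, css in selectors["fields"].items():
            tag = card.select_one(css)
            # A missing tag becomes None instead of raising an error.
            record[name] = tag.get_text(strip=True) if tag else None
        records.append(record)
    return records


# Made-up markup used only to exercise the function.
html = (
    '<div class="listing">'
    '<span class="locality">Madhapur</span>'
    '<span class="price">Rs. 18,000</span>'
    "</div>"
)
records = extract(html)
```

With this layout, adapting the scraper to a different portal would mostly mean rewriting the selector strings while the extraction loop stays the same.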


Languages

Language: Jupyter Notebook 100.0%