bekou / ad_data

Implementation of our paper Reconstructing the house from the ad: Structured prediction on real estate classifieds

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Reconstructing the house from the ad: Structured prediction on real estate classifieds

This repository contains the code used for dependency parsing and information about how to obtain the dataset presented in the work:

Reconstructing the house from the ad: Structured prediction on real estate classifieds

The dataset includes 2,318 manually annotated property advertisements from a real estate company.

If you use part of the code or the dataset please cite:

@InProceedings{E17-2044,
  author = 	"Bekoulis, Giannis
		and Deleu, Johannes
		and Demeester, Thomas
		and Develder, Chris",
  title = 	"Reconstructing the house from the ad: Structured prediction on real estate classifieds",
  booktitle = 	"Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers",
  year = 	"2017",
  publisher = 	"Association for Computational Linguistics",
  pages = 	"274--279",
  location = 	"Valencia, Spain",
  url = 	"http://aclweb.org/anthology/E17-2044"
}

and

@article{BEKOULIS2018100,
title = "An attentive neural architecture for joint segmentation and parsing and its application to real estate ads",
journal = "Expert Systems with Applications",
volume = "102",
pages = "100 - 112",
year = "2018",
issn = "0957-4174",
doi = "https://doi.org/10.1016/j.eswa.2018.02.031",
url = "http://www.sciencedirect.com/science/article/pii/S0957417418301192",
author = "Giannis Bekoulis and Johannes Deleu and Thomas Demeester and Chris Develder"
}

Pre-requisites

The code is written for Python 2.7. Some of the python packages needed to run these files, best installed using pip.

  • scikit-learn (machine learning)
  • pandas (Data manipulation)
  • pandas_confusion (performance measures)

Dependency parser

In the repository, one can find the 4 models (Threshold, Edmond, Structured Prediction via the Matrix-Tree Theorem (MTT), Transition-based) that we have developed for dependency parsing. One should run the run_script.py file that serves as a main function.

Dataset

To obtain the anonymized dataset fill in and sign this form. Send it also via email to giannis.bekoulis@gmail.com. Follow the instructions and we will get back to you as soon as possible with information about how to download the anonymized dataset.

About

Implementation of our paper Reconstructing the house from the ad: Structured prediction on real estate classifieds

License:MIT License


Languages

Language:Python 100.0%