gladcolor / IDRISI

IDRISI-R is the largest-scale publicly available Twitter Location Mention Recognition (LMR) dataset, in both English and Arabic. It covers 41 disaster events of different types, such as floods and fires. In addition to tagging location mentions (LMs) in tweets, the LMs are labeled with location types such as countries, cities, streets, and points of interest (POIs).

IDRISI

IDRISI is the largest-scale publicly available Twitter Location Mention Prediction (LMP) dataset, in both English and Arabic. It is named after Muhammad Al-Idrisi👳🏻‍♂️, one of the pioneers and founders of advanced geography.

All datasets are licensed under the Creative Commons Attribution 4.0 International License.

  • The Location Mention Recognition (LMR) datasets are under the LMR directory.
  • The Location Mention Disambiguation (LMD) datasets will be available soon under the LMD directory.
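
To illustrate what LMR annotations with location types look like in practice, here is a minimal sketch that collects typed location mentions from a token sequence. It assumes BIO-style tags with type suffixes such as `B-CITY`; the tag names, the sample tweet, and the helper `extract_mentions` are hypothetical and are not taken from the files in the LMR directory, so check the actual file format there before adapting it.

```python
from typing import List, Tuple


def extract_mentions(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Collect (mention text, location type) pairs from BIO-style tags.

    Assumption: tags look like "B-CITY", "I-CITY", or "O". This scheme is
    illustrative only and may differ from the released dataset files.
    """
    mentions: List[Tuple[str, str]] = []
    current: List[str] = []
    current_type = None

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A new mention starts; close any mention that is still open.
            if current:
                mentions.append((" ".join(current), current_type))
            current, current_type = [token], tag[2:]
        elif tag.startswith("I-") and current and tag[2:] == current_type:
            # Continue the open mention of the same type.
            current.append(token)
        else:
            # "O" tag or a stray "I-" tag: close the open mention, if any.
            if current:
                mentions.append((" ".join(current), current_type))
            current, current_type = [], None

    # Close a mention that runs to the end of the tweet.
    if current:
        mentions.append((" ".join(current), current_type))
    return mentions


# Hypothetical tagged tweet, for illustration only.
tokens = ["Floods", "hit", "New", "York", "City", "near", "Broadway"]
tags = ["O", "O", "B-CITY", "I-CITY", "I-CITY", "O", "B-STREET"]
mentions = extract_mentions(tokens, tags)
print(mentions)
```

Stray `I-` tags that do not continue an open mention are dropped in this sketch; a stricter or more lenient policy may suit a given evaluation setup better.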

For any inquiries, please create a new issue in the repository or contact us via email:

  • Reem Suwaileh: rs081123@qu.edu.qa

Publications

  @article{rsuwaileh2023idrisire,
    title = {IDRISI-RE: A generalizable dataset with benchmarks for location mention recognition on disaster tweets},
    author = {Reem Suwaileh and Tamer Elsayed and Muhammad Imran},
    journal = {Information Processing \& Management},
    volume = {60},
    number = {3},
    pages = {103340},
    year = {2023},
    issn = {0306-4573},
    doi = {10.1016/j.ipm.2023.103340},
    url = {https://www.sciencedirect.com/science/article/pii/S0306457323000778},
    publisher={Elsevier}
  }

  
  @inproceedings{rsuwaileh2023idrisira,
    title = {IDRISI-RA: The First Arabic Location Mention Recognition Dataset of Disaster Tweets},
    author = {Reem Suwaileh and Muhammad Imran and Tamer Elsayed},
    booktitle = {Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)},
    month = {may},
    year = {2023},
    address = {Toronto, Canada},
    publisher = {Association for Computational Linguistics},
    url = {...},
    doi = {...},
    pages = {...}
  }
  
    

Acknowledgments

This work was made possible by the Graduate Sponsorship Research Award (GSRA) #GSRA5-1-0527-18082 from the Qatar National Research Fund (a member of Qatar Foundation). The statements made herein are solely the responsibility of the authors.

Languages

  • Jupyter Notebook: 95.7%
  • Python: 4.3%