The repository contains an ongoing collection of Instagram post IDs related to the novel coronavirus disease, COVID-19. The first version of the collection covers January 5, 2020 to March 30, 2020. Data gathering is still running, since lockdowns have not ended in many countries around the world (at the time of writing this paper).
We hope the dataset can support diverse research activities. Below is a subset of potential topics we believe it could support:
- Fake news, misinformation and rumors spreading.
- Behavioral change analysis during the pandemic.
- Information sharing related to COVID-19.
- etc.
Paper linked to this dataset (arXiv): A First Instagram Dataset on COVID-19
We have collected public posts from Instagram by crawling all posts associated with a set of COVID-19 hashtags including #coronavirus, #covid19, #covid_19, and #corona.
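As a rough illustration of hashtag-based selection, the sketch below checks whether a caption carries one of the hashtags named above. It is not the crawler used to build the dataset. The names `COVID_HASHTAGS`, `extract_hashtags`, and `is_covid_post` are hypothetical, and the hashtag set is limited to the four tags listed here, whereas the full collection list may contain more.

```python
import re

# Assumption: only the four hashtags named in this README.
# The actual collection list may include additional tags.
COVID_HASHTAGS = {"coronavirus", "covid19", "covid_19", "corona"}

# A hashtag is "#" followed by word characters (letters, digits, underscore).
HASHTAG_RE = re.compile(r"#(\w+)")


def extract_hashtags(caption):
    """Return the set of lowercase hashtags (without '#') found in a caption."""
    return {tag.lower() for tag in HASHTAG_RE.findall(caption)}


def is_covid_post(caption):
    """True if the caption contains at least one hashtag from COVID_HASHTAGS."""
    return bool(extract_hashtags(caption) & COVID_HASHTAGS)


if __name__ == "__main__":
    print(is_covid_post("Stay home, stay safe #COVID19 #staysafe"))
    print(is_covid_post("Beach day #sun"))
```

Matching is exact on the whole tag, so a caption tagged only `#coronavirusupdate` would not match under this sketch.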
The first version of the data collection ran from January 5, 2020 to March 30, 2020, and data gathering is still running. During this period, 18.5K comments and 329K likes were collected from 5.3K public posts. These posts were published by 2.5K publishers.
Language | Code | # of posts | % of total |
---|---|---|---|
English | en | 3.1K | 58.3% |
Spanish | es | 530 | 9.9% |
Portuguese | pt | 378 | 7.1% |
Italian | it | 199 | 3.7% |
French | fr | 120 | 2.2% |
Russian | ru | 98 | 1.8% |
Farsi | fa | 96 | 1.8% |
Arabic | ar | 79 | 1.4% |
Turkish | tr | 68 | 1.2% |
Other & non-detected | - | 643 | 12.1% |
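
The shares in the table can be approximately recomputed from the post counts. The sketch below assumes the English count is exactly 3,100 (the table reports it rounded to 3.1K) and uses a hypothetical `other` key for the "Other & non-detected" row. Because of that rounding, the results can differ from the published percentages by a few tenths of a point.

```python
# Post counts per language code, taken from the table above.
# Assumption: "3.1K" is treated as exactly 3100.
counts = {
    "en": 3100,
    "es": 530,
    "pt": 378,
    "it": 199,
    "fr": 120,
    "ru": 98,
    "fa": 96,
    "ar": 79,
    "tr": 68,
    "other": 643,  # hypothetical key for "Other & non-detected"
}

total = sum(counts.values())

# Share of each language as a percentage of all counted posts.
shares = {code: 100 * n / total for code, n in counts.items()}

# Print languages from largest to smallest share.
for code, share in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{code}\t{counts[code]}\t{share:.1f}%")
```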
For any further questions, please contact Koosha Zarei at koosha.zarei@telecom-sudparis.eu.
This dataset is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International Public License (CC BY-NC-SA 4.0) and is published in agreement with Instagram's Terms & Conditions.
By using this dataset, you agree to comply with the conditions of the license and Instagram's Terms & Conditions, and to cite the following paper:
Koosha Zarei, Reza Farahbakhsh, Noel Crespi, and Gareth Tyson. 2020. A First Instagram Dataset on COVID-19. arXiv:2004.12226.