vdmitriyev / datasets-links-collection

A collection of links for various datasets

💬 About

A collection of links for various datasets.

🔎 Search Engines

🇩🇪 Datasets Links (Germany / in German)

📑 Collections

📋 Datasets Links

Dataset Name - Links
Airbnb - Airbnb datasets by Inside Airbnb
- Python scripts for Airbnb listings
EU Open Data Portal - https://data.europa.eu/euodp/data/dataset
DataHub of ready-to-use NLP datasets - https://github.com/huggingface/datasets
Music - Million Playlist Dataset (RecSys Challenge 2018)
- Million Song Dataset
WikiData - http://www.wikidata.org/
- Wikipedia Revision History with ~314 million rows
DBpedia - http://dbpedia.org/
Twitter corpus of moral sentiment (35k) - https://osf.io/k5n7y/
Web data: Amazon reviews (~35 million reviews) - https://snap.stanford.edu/data/web-Amazon.html
Datasets for machine learning in Python - https://github.com/jaberg/skdata
Open Data by Socrata - https://opendata.socrata.com/
Networks Datasets - https://west.uni-koblenz.de/research/datasets
Open Data (source: comments on ResearchGate) - http://web.ist.utl.pt/acardoso/datasets/
- http://webspam.lip6.fr/wiki/pmwiki.php
SNAP - Stanford Network Analysis Project - http://snap.stanford.edu/data/
- http://snap.stanford.edu/data/links.html
- http://snap.stanford.edu/data/other.html
The daily news cycle [9*~2.0 GB] - http://www.memetracker.org/data.html
- https://pslcdatashop.web.cmu.edu/index.jsp
S5 - A Labeled Anomaly Detection Dataset [~16M] - http://webscope.sandbox.yahoo.com/catalog.php?datatype=s&did=70
Data Sets on AWS - http://aws.amazon.com/publicdatasets/#1
Linked Open Data by NY Times [RDF] - http://data.nytimes.com/#
Global Database of Events, Language, and Tone (GDELT) - https://www.gdeltproject.org/#downloading
- Basic data import can be found here
PSLC DataShop in Pittsburgh - https://pslcdatashop.web.cmu.edu/index.jsp?datasets=public
Labeled Faces in the Wild [images of faces] - http://vis-www.cs.umass.edu/lfw/index.html
Mobile Data Challenge (MDC) Dataset - https://www.idiap.ch/dataset/mdc
(RU) Open data hub in Russian - http://hubofdata.ru/
US Open Data Action Plan and Datasets - http://www.kdnuggets.com/2014/05/us-open-data-action-plan-data-sets.html
StatLib - Datasets Archive - http://lib.stat.cmu.edu/datasets/
100+ Interesting Data Sets for Statistics - http://rs.io/100-interesting-data-sets-for-statistics/
Repositories of datasets - http://www.trustlet.org/datasets/
Quantnet - https://quantnet.hu-berlin.de/
Quora on open datasets - http://www.quora.com/Where-can-I-find-large-datasets-open-to-the-public?q=dataset
Enron Email Dataset (Famous Public E-mails Dataset) - https://www.cs.cmu.edu/~./enron/
The 20 Newsgroups data set (~ 20,000 docs) - http://qwone.com/~jason/20Newsgroups/
Stack Overflow Database (up to 320 GB) - https://www.brentozar.com/archive/2015/10/how-to-download-the-stack-overflow-database-via-bittorrent/
TimeDial and Disfl-QA (Conversational NLP) - https://ai.googleblog.com/2021/08/two-new-datasets-for-conversational-nlp.html
Miscellaneous - https://archive-it.org/explore?show=Collections
- https://delicious.com/pskomoroch/dataset
- http://www.hpi.uni-potsdam.de/naumann/projekte/repeatability/datasets.html

Microsoft

💡 Energy

⚓ Water

🤖 LLMs - Evaluation Datasets

⚙️ Software To Work With Data
