This repository offers a wide range of datasets and queries, drawn from open data or from our own practice (desensitized where necessary).
The datasets cover many typical domains and have diverse data characteristics (e.g., varying numbers of columns and tuples).
The queries are real SQL statements covering various functionalities, such as feature extraction, transactions (coming soon), and analytical queries (coming soon).
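
The per-dataset table and column counts can be checked with a short script. The following is a minimal Python sketch, assuming (hypothetically) that each table of a dataset is stored as one CSV file with a header row inside a single directory, and that the column number is the total over all of a dataset's tables. Neither this directory layout nor the function name `dataset_shape` comes from this repository.

```python
import csv
from pathlib import Path


def dataset_shape(dataset_dir):
    """Return (table_number, column_number) for one dataset directory.

    Hypothetical layout: every table is a separate *.csv file whose first
    row holds the column names. column_number is summed over all tables.
    """
    tables = sorted(Path(dataset_dir).glob("*.csv"))
    total_columns = 0
    for path in tables:
        with path.open(newline="") as f:
            # Read only the header row; an empty file contributes 0 columns.
            header = next(csv.reader(f), [])
        total_columns += len(header)
    return len(tables), total_columns
```

For example, calling `dataset_shape("data/rossmann-store-sales")` would return a `(tables, columns)` pair for that folder; the path is illustrative only.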

| name | description | tables | columns | SQL | source |
| --- | --- | ---: | ---: | --- | --- |
| GEF2012-wind-forecasting | Hourly power generation at 7 wind farms | 10 | 61 | | Kaggle |
| electric-power-consumption | Per capita energy consumption in Morocco | 1 | 9 | | Kaggle |
| energydata_complete | | 2 | 59 | | |
| ashrae-energy-prediction | Energy usage of over 1,000 buildings over a three-year period | 5 | 32 | | Kaggle |

| name | description | tables | columns | SQL | source |
| --- | --- | ---: | ---: | --- | --- |
| recruit-restaurant-visitor-forecasting | Browsing statistics from two restaurant websites | 8 | 28 | | Kaggle |
| santander-customer-satisfaction | Hundreds of anonymized features that may indicate whether a customer is satisfied with their banking experience | 1 | 372 | | Kaggle |
| GiveMeSomeCredit | Credit features of 250,000 borrowers in a banking scenario | 1 | 13 | | Kaggle |
| daily-financial-news | Daily financial news for over 6,000 stocks | 2 | 12 | | Tianchi |
| restaurant-revenue-prediction | Demographic, real estate, and commercial data for investments in new restaurant sites | 2 | 85 | | Kaggle |
| homesite-quote-conversion | An anonymized database of customer and sales activity | 2 | 597 | | Kaggle |
| allstate-claims-severity | Insurance claims for worry-free customer experiences | 3 | 265 | | Kaggle |
| tiantian | Price-related features constructed from fund market data downloaded from the TianTian Fund website | 1 | 332 | | Tianchi |
| sberbank-russian-housing-market | Information about overall conditions in Russia's economy and finance sector | 4 | 685 | | Kaggle |
| dow_jones_index | | 1 | 16 | | |
| robinhood-stock-data | Historical stock prices of Robinhood (ticker symbol HOOD) | 1 | 6 | | Kaggle |
| porto-seguro-safe-driver-prediction | Features that affect whether an auto insurance policyholder files a claim | 1 | 60 | | Kaggle |
| amex-default-prediction | | 4 | 384 | | |
| house-rent-prediction-dataset | Information on 4,700+ houses, apartments, and flats available for rent | 1 | 12 | | Kaggle |

| name | description | tables | columns | SQL | source |
| --- | --- | ---: | ---: | --- | --- |
| big-data-derby-2022 | A wealth of collected data, including measures of heart rate, EKG, longitudinal movement, and more | 3 | 24 | | Kaggle |
| predict-west-nile-virus | Weather, location, testing, and spraying data | 5 | 51 | | Kaggle |
| covid19-global-forecasting-week-2 | Statistics of COVID-19 cases in various locations across the world | 1 | 6 | | Kaggle |
| covid19-global-forecasting-week-5 | Statistics of COVID-19 cases in various locations across the world | 1 | 9 | | Kaggle |
| covid19-global-forecasting-week-4 | Statistics of COVID-19 cases in various locations across the world | 1 | 6 | | Kaggle |
| covid19-global-forecasting-week-1 | Statistics of COVID-19 cases in various locations across the world | 1 | 8 | | Kaggle |
| covid19-global-forecasting-week-3 | Statistics of COVID-19 cases in various locations across the world | 1 | 6 | | Kaggle |

| name | description | tables | columns | SQL | source |
| --- | --- | ---: | ---: | --- | --- |
| facebook-v-predicting-check-ins | | 3 | 13 | | |
| telstra-recruiting-network | | 7 | 18 | | |
| twitter-threads | Thread functionality in Twitter | 5 | 35 | | Tianchi |
| spotify-app-reviews-2022 | Spotify reviews on the Google Play Store | 1 | 6 | | Kaggle |

| name | description | tables | columns | SQL | source |
| --- | --- | ---: | ---: | --- | --- |
| PRSA2017_Data_20130301-20170228 | | 12 | 216 | | |
| AirQualityUCI | Responses of a gas multisensor device deployed in the field in an Italian city | 1 | 1 | | UCI ML |
| historicalweatherdataforindiancities | Temperature data (minimum, average, maximum) in degrees Celsius, and precipitation data | 7 | 34 | | Kaggle |

| name | description | tables | columns | SQL | source |
| --- | --- | ---: | ---: | --- | --- |
| store-sales-time-series-forecasting | Dates, store, and product information | 5 | 22 | | Kaggle |
| coupon-purchase-prediction | A year of transactional data for 22,873 users on the site ponpare.jp | 9 | 80 | | Kaggle |
| grupo-bimbo-inventory-demand | 9 weeks of sales transactions in Mexico | 6 | 28 | | Kaggle |
| rossmann-store-sales | Historical sales data for 1,115 Rossmann stores | 2 | 19 | | Kaggle |
| favorita-grocery-sales-forecasting | Dates, store and item information, whether each item was being promoted, and unit sales | 6 | 26 | | Kaggle |
| walmart-recruiting-store-sales-forecasting | | 5 | 26 | | |
| walmart-recruiting-sales-in-stormy-weather | Sales data for 111 products whose sales may be affected by the weather (e.g., milk, bread, umbrellas) | 4 | 28 | | Kaggle |
| ecommerce-customerssales-record | Order statistics | 1 | 41 | | Kaggle |
| competitive-data-science-predict-future-sales | Daily historical sales data | 5 | 16 | | Kaggle |
| m5-forecasting-accuracy | Item sales at stores in various locations for two 28-day time periods | 3 | 1965 | | Kaggle |
| m5-forecasting-uncertainty | Item sales at stores in various locations for two 28-day time periods | 3 | 1965 | | Kaggle |

| name | description | tables | columns | SQL | source |
| --- | --- | ---: | ---: | --- | --- |
| pkdd-15-taxi-trip-time-prediction-ii | | 4 | 24 | | Kaggle |
| nyc-taxi-trip-duration | NYC Yellow Cab trip record data | 3 | 22 | | Kaggle |
| taxi-trajectory | A complete year (1 July 2013 to 30 June 2014) of trajectories for all 442 taxis running in the city of Porto | 1 | 9 | | Tianchi |
| pkdd-15-predict-taxi-service-trajectory-i | | 4 | 25 | | Kaggle |

| name | description | tables | columns | SQL | source |
| --- | --- | ---: | ---: | --- | --- |
| talkingdata-mobile-user-demographics | | 8 | 34 | | Kaggle |
| sf-crime | Incidents derived from the SFPD Crime Incident Reporting system | 3 | 57 | | Tianchi |
| detecting-insults-in-social-commentary | Detect social spam, account hacking, bot attacks, and more | 1 | 5 | | Kaggle |
| expedia-hotel-recommendations | Customer behavior | 2 | 174 | | Kaggle |
| nfl-big-data-bowl-2022 | | 7 | 113 | | |
| airbnb-recruiting-new-user-bookings | Users with their demographics, web session records, and some summary statistics | 6 | 51 | | Kaggle |
| unimelb | Information on the investigators who are applying for grants | 1 | 251 | | Kaggle |
| Ipin2016Dataset | | 8 | 314 | | |
| dspp1 | | 4 | 19 | | |
| lish-moa | | 4 | 1488 | | |
| foursquare-location-matching | | 2 | 38 | | |
| bike-sharing-demand | Travel duration, departure location, arrival location, and elapsed time | 1 | 12 | | Kaggle |
| web-traffic-time-series-forecasting | | 6 | 1363 | | |
| web-traffic-time-series-forecasting-1 | | 2 | 553 | | |
| korean-baseball-pitching-data-1982-2021 | Team pitching data from every season of KBO baseball | 1 | 34 | | Kaggle |
| RSSI_dataset | RSSIs obtained on smartphones | 2 | 12 | | UCI ML |
| DontGetKicked | Car information | 2 | 67 | | Kaggle |
| cyclistic-bike-share-user-dataset-1-year | Cyclistic bikes | 1 | 18 | | Kaggle |
| data-science-job-salaries | | 1 | 12 | | |
| Hybrid_Indoor_Positioning | | 1 | 67 | | UCI ML |