There are 183 repositories under the dataset topic.
A collective list of free APIs
Faker is a Python package that generates fake data for you.
Label Studio is a multi-type data labeling and annotation tool with a standardized output format
An MNIST-like fashion product database, with a benchmark.
Annotate better with CVAT, the industry-leading data engine for machine learning. Used and trusted by teams at any scale, for data of any scale.
Large-scale Chinese corpus for NLP
Open source annotation tool for machine learning practitioners.
A curated list of Machine Learning, NLP, Vision, and Recommender Systems project ideas
Techniques for deep learning with satellite & aerial imagery
Documentation on how to access and use the Quick, Draw! Dataset.
This repository contains compatibility data for Web technologies as displayed on MDN
TFDS is a collection of datasets ready to use with TensorFlow, Jax, ...
Models, data loaders and abstractions for language processing, powered by PyTorch
pix2tex: Using a ViT to convert images of equations into LaTeX code.
Chinese Language Understanding Evaluation Benchmark: datasets, baselines, pre-trained models, corpus, and leaderboard
We are building an open database of COVID-19 cases with chest X-ray or CT images.
A curated list of awesome JSON datasets that don't require authentication.
A synthetic data generator for text recognition
Extract data from a wide range of Internet sources into a pandas DataFrame.
Semantic Segmentation Suite in TensorFlow. Implement, train, and test new Semantic Segmentation models easily!
Basic Utilities for PyTorch Natural Language Processing (NLP)
The open standard for data logging
Waymo Open Dataset
Objectron is a dataset of short, object-centric video clips. The videos also contain AR session metadata, including camera poses, sparse point clouds, and planes. In each video, the camera moves around and above the object and captures it from different views. Each object is annotated with a 3D bounding box that describes the object’s position, orientation, and dimensions. The dataset contains about 15K annotated video clips and 4M annotated images in the following categories: bikes, books, bottles, cameras, cereal boxes, chairs, cups, laptops, and shoes.
🎁 3,000,000+ Unsplash images made available for research and machine learning
[ECCV 2018] CCPD: a diverse and well-annotated dataset for license plate detection and recognition
Datasets for training Chinese and English chatbot systems
Large list of handpicked color names 🌈
ISO 3166-1 country lists merged with their UN Geoscheme regional codes in ready-to-use JSON, XML, CSV data sets
Windows Events Attack Samples
FMA: A Dataset For Music Analysis
Collaborate & label any type of data, images, text, or documents, in an easy web interface or desktop app.
📈 Currently the largest collection of industrial defect detection datasets and papers: a continually updated summary of open-source datasets and important papers in surface defect research.
Colour Science for Python
Easily turn large sets of image urls to an image dataset. Can download, resize and package 100M urls in 20h on one machine.
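The Objectron entry describes each annotation as a 3D bounding box that encodes an object's position, orientation, and dimensions. As a rough illustration of that kind of representation, the Python sketch below derives a box's eight corners from a center point, a 3×3 rotation matrix, and full extents. The names `box_corners` and `yaw_rotation` and this parameterization are hypothetical choices for illustration. They are not the Objectron dataset's own tooling or annotation format.

```python
import itertools
import math


def box_corners(center, rotation, size):
    """Return the 8 corners of an oriented 3D box (hypothetical helper).

    center:   (x, y, z) position of the box center
    rotation: 3x3 rotation matrix as a list of rows (the box orientation)
    size:     (width, height, depth) full extents along the box's local axes
    """
    half = [s / 2.0 for s in size]
    corners = []
    # Each corner picks -1 or +1 along every local axis: 2**3 = 8 corners.
    for signs in itertools.product((-1, 1), repeat=3):
        local = [sg * h for sg, h in zip(signs, half)]
        # Rotate the local offset into world coordinates, then translate by the center.
        world = [
            center[i] + sum(rotation[i][j] * local[j] for j in range(3))
            for i in range(3)
        ]
        corners.append(tuple(world))
    return corners


def yaw_rotation(theta):
    """Hypothetical helper: rotation about the vertical (y) axis by theta radians."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]


if __name__ == "__main__":
    # A 2 x 1 x 1 box centered at (1, 0, 3), turned 45 degrees about the y axis.
    for corner in box_corners((1.0, 0.0, 3.0), yaw_rotation(math.pi / 4), (2.0, 1.0, 1.0)):
        print(corner)
```

Storing a center, a rotation, and extents keeps position, orientation, and dimensions separate, which matches how the entry describes the box; the corners can be recomputed whenever they are needed for drawing or overlap tests.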