Graviti-AI / datasets

Discover and share awesome datasets and work together to push the boundaries of AI further.

Graviti AI Community


Push the boundaries of AI

⭐ Welcome to the Graviti AI community! We are devoted to making datasets more accessible and interoperable for AI developers, and to fostering a supportive community for building machine learning applications.

Thanks for supporting the community!

Open datasets catalog

These datasets help machine learning students, researchers, and engineers train models for image classification, object detection, visual relationship detection, instance segmentation, and more.
The full list is available on Graviti Community.
Please DO NOT modify this file directly. To contribute, go to the dataset page instead.

The datasets repo is a lightweight library of 1,233 high-quality datasets. All are open source and cover a diverse range of tasks, annotation types, and sizes.
Search by task type or keyword if you need a specific dataset. You can fork a dataset from its dataset page and read the data through the SDK.

Popular tasks

  • Object Detection
  • Classification
  • Keypoints Detection
  • Segmentation
  • Pose Estimation
  • ASR
  • OCR

Quick start

⭐ You have a complex problem or project involving a large amount of data and lots of variables. You know that finding a public dataset to train your machine learning model would be the best approach. How do you deal with data that’s in a variety of formats? How do you choose the dataset for your model?
We'll walk you through it step by step, from the basics to advanced techniques, to help you get started!

  1. Sign up for an account

Go to graviti.com to sign up.
Get an AccessKey on Graviti Developer Tools.

An AccessKey is needed to authenticate your identity when using TensorBay via the SDK or CLI.
The key carries full permissions for your account, so keep it safe.

  2. Install the TensorBay Python SDK
  • To install the TensorBay SDK and CLI with pip, run the following command:
pip3 install tensorbay
  • To verify the SDK and CLI version, run the following command:
gas --version
  • To authorize a client instance, run the following Python code:
from tensorbay import GAS
gas = GAS("<YOUR_ACCESSKEY>")
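
Hard-coding the AccessKey in source files makes it easy to leak. A minimal sketch, assuming the key is stored in an environment variable named `GRAVITI_ACCESS_KEY` (a hypothetical name chosen for this example, not something the SDK reads on its own), could look like this:

```python
import os


def load_access_key(env_var="GRAVITI_ACCESS_KEY"):
    """Return the AccessKey stored in the given environment variable.

    The variable name is an assumption made for this example.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set {env_var} to your Graviti AccessKey first.")
    return key


def make_client(env_var="GRAVITI_ACCESS_KEY"):
    """Build a GAS client from the key found in the environment."""
    # Imported lazily so load_access_key() works without the SDK installed.
    from tensorbay import GAS

    return GAS(load_access_key(env_var))
```

With the variable exported in your shell, `gas = make_client()` would take the place of passing the key as a string literal.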
  3. Select an open dataset

You need to fork an open dataset from the community to your Graviti workspace before processing the data.

  • Search datasets from the open dataset catalog 📖
  • Preview the data and annotations
    Visualizing the data in advance helps you quickly understand a dataset and its semantic information.
  • On the dataset page, fork the dataset from the 'Explore Dataset' drop-down menu.
  • Find the dataset in the 'Your Datasets' list.
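
Once a fork is listed under 'Your Datasets', it should also be visible from the SDK. The sketch below assumes a client object that exposes `list_dataset_names()`, as the TensorBay `GAS` client does; the helper name `is_in_workspace` is introduced here for illustration only.

```python
def is_in_workspace(client, dataset_name):
    """Return True if dataset_name is listed in the client's workspace.

    `client` is expected to provide list_dataset_names(), for example a
    tensorbay.GAS instance created with your AccessKey.
    """
    return any(name == dataset_name for name in client.list_dataset_names())


# Usage with the real SDK (needs network access and a valid key):
# from tensorbay import GAS
# gas = GAS("<YOUR_ACCESSKEY>")
# print(is_in_workspace(gas, "DogsVsCats"))
```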

  4. Prepare data

You can customize an open dataset to suit your model by using the features below.

  5. Integrate with machine learning frameworks (PyTorch, TensorFlow and more)

The typical method to integrate a dataset with PyTorch is to build a ‘Segment’ class derived from ‘torch.utils.data.Dataset’.

The typical method to integrate a dataset with TensorFlow is to build a callable ‘Segment’ class.
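
A callable class suits TensorFlow because `tf.data.Dataset.from_generator` accepts any callable that returns an iterable. The sketch below shows only the shape of such a class, using an in-memory list as a stand-in for a TensorBay segment; `InMemorySegment` and its sample layout are assumptions for illustration, not part of the SDK.

```python
class InMemorySegment:
    """Callable wrapper that yields (features, label_index) pairs."""

    def __init__(self, samples, category_to_index):
        self.samples = samples  # list of (features, category_name) tuples
        self.category_to_index = category_to_index

    def __call__(self):
        # Each call returns a fresh generator, so the data can be
        # iterated more than once.
        for features, category in self.samples:
            yield features, self.category_to_index[category]


segment = InMemorySegment(
    samples=[([0.1, 0.2], "cat"), ([0.9, 0.8], "dog")],
    category_to_index={"cat": 0, "dog": 1},
)
pairs = list(segment())
```

With TensorFlow installed, an instance like `segment` could be passed as the generator argument of `tf.data.Dataset.from_generator`, together with an `output_signature` describing the two elements.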

  • We recommend enabling the cache for a smoother training experience; it requires enough local storage to hold the dataset. Sample code, which uses PaddlePaddle's paddle.io classes, is shown below:
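from paddle.io import DataLoader, Dataset
from PIL import Image
from tensorbay.dataset import Dataset as TensorBayDataset


class DogsVsCatsSegment(Dataset):
    """Wrap a DogsVsCats segment as a map-style dataset."""

    def __init__(self, gas, segment_name, transform):
        super().__init__()
        self.dataset = TensorBayDataset("DogsVsCats", gas)
        self.dataset.enable_cache()  # cache downloaded files locally
        self.segment = self.dataset[segment_name]
        self.category_to_index = self.dataset.catalog.classification.get_category_to_index()
        self.transform = transform
        print(self.dataset.cache_enabled)  # confirm that the cache is enabled

    # The two methods below complete the truncated sample, following the
    # usual TensorBay segment-wrapper pattern.
    def __len__(self):
        return len(self.segment)

    def __getitem__(self, idx):
        data = self.segment[idx]
        with data.open() as fp:
            image_tensor = self.transform(Image.open(fp))
        return image_tensor, self.category_to_index[data.label.classification.category]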
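
Because the cache stores data on local disk, it can help to check free space before calling `enable_cache()`. A minimal sketch using only the standard library follows; the 5 GiB threshold, the helper name `has_free_space`, and the choice of the system temp directory as the location to check are all assumptions for illustration, so substitute the values that match your dataset and setup.

```python
import shutil
import tempfile


def has_free_space(path, required_bytes):
    """Return True if the filesystem holding `path` has enough free space."""
    return shutil.disk_usage(path).free >= required_bytes


# The actual cache location may differ; the temp directory is only a
# placeholder for this check.
cache_dir = tempfile.gettempdir()
enough = has_free_space(cache_dir, 5 * 1024**3)
print(f"Enough space for caching in {cache_dir}: {enough}")
```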

Become a contributor

Contributions are welcome and greatly appreciated. You can become a community contributor in many different ways, and we value all forms of contribution, including:

  • Improve code
  • Improve docs
  • Report bugs
  • Write blogs
  • Give talks
  • Provide ideas
  • Answer questions

Q&A

Can I use these datasets for my project?
Sure! You're free to do so. Check the detailed license information on each dataset page first.

Can I add a dataset here?
Send us a pull request and we'll discuss.


Join the community

To connect with practitioners like you, join our community Discord.

About

License: MIT License