allegroai / clearml

ClearML - Auto-Magical CI/CD to streamline your AI workload. Experiment Management, Data Management, Pipeline, Orchestration, Scheduling & Serving in one MLOps/LLMOps solution

Home Page: https://clear.ml/docs

`clearml.Dataset.list_datasets` doesn't return all the datasets

nizamreplica opened this issue

ClearML SDK doesn't return all the datasets at a project location

Describe the bug

from clearml import Dataset as CMLDataset

my_project = 'my-root/source/my-datasets'

# List every completed, non-archived dataset under the project, including sub-projects
all_datasets = CMLDataset.list_datasets(
    dataset_project=my_project,
    partial_name=None,
    only_completed=True,
    include_archived=False,
    recursive_project_search=True,
)
len(all_datasets)  # returns only 306 datasets when there are 381

This is the structure of my datasets in ClearML:

my-root
|-- source
    |-- my-datasets
        |-- folder-A
        |   |-- dataset-1-beginning-with-A
        |   |-- dataset-2-beginning-with-A
        |   |-- ...
        |   |-- dataset-n-beginning-with-A
        |-- folder-B
        |   |-- dataset-1-beginning-with-B
        |   |-- dataset-2-beginning-with-B
        |-- ...
        |-- folder-Z
            |-- dataset-1-beginning-with-Z
            |-- dataset-2-beginning-with-Z

The datasets up to folder-P (two out of three, to be precise) are returned; everything after that is omitted.

Is it due to the nested structure of the datasets? Are there any idiosyncrasies at play that I'm unaware of? To reiterate, the project name passed to the API call is 'my-root/source/my-datasets'.
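One way to narrow down where the cutoff happens is to count the returned entries per sub-folder. The following is a minimal sketch, not part of the ClearML SDK: it assumes each entry returned by `list_datasets` is a dict with a `project` key holding the full project path, and both the helper name `count_by_folder` and the sample data are hypothetical stand-ins.

```python
from collections import Counter


def count_by_folder(datasets, root):
    """Count dataset entries per immediate sub-folder of `root`.

    Assumption: each entry is a dict whose 'project' value is the full
    project path, e.g. 'my-root/source/my-datasets/folder-A'.
    """
    root = root.rstrip("/")
    prefix = root + "/"
    counts = Counter()
    for entry in datasets:
        project = entry.get("project") or ""
        if project == root:
            # Dataset sits directly under the root project
            counts["."] += 1
        elif project.startswith(prefix):
            # Keep only the first path component below the root
            counts[project[len(prefix):].split("/", 1)[0]] += 1
        # Entries outside `root` are ignored
    return dict(sorted(counts.items()))


# Hypothetical sample standing in for the list returned by list_datasets(...)
sample = [
    {"name": "dataset-1-beginning-with-A", "project": "my-root/source/my-datasets/folder-A"},
    {"name": "dataset-2-beginning-with-A", "project": "my-root/source/my-datasets/folder-A"},
    {"name": "dataset-1-beginning-with-B", "project": "my-root/source/my-datasets/folder-B"},
]
print(count_by_folder(sample, "my-root/source/my-datasets"))
```

Passing the real `all_datasets` list instead of `sample` and comparing the per-folder counts against the web UI would show which folders are truncated or missing entirely.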

Expected behaviour

`list_datasets` should return all 381 datasets.

Environment

  • Server type: self-hosted
  • ClearML SDK Version: 1.13.2
  • ClearML Server Version: 1.12.1-397
  • Python Version: 3.10.13
  • OS: Linux (Ubuntu)