CogStack / MedCATtrainer

A simple interface to inspect, improve and add concepts to biomedical NER+L -> MedCAT.



Uploading Dataset produces error: Error with previous action: '>' not supported between instances of 'int' and 'str'

georgm8 opened this issue

Just spotted this error whilst trying to upload a dataset!


The error appears to come from the following piece of code. When MAX_DATASET_SIZE is set in the environment, os.environ.get returns its value as a string, so comparing max_dataset_size with df.shape[0] (an int) raises a TypeError.

data_utils.py line 19

if df is not None:
    max_dataset_size = os.environ.get('MAX_DATASET_SIZE', _MAX_DATASET_SIZE_DEFAULT)
    if df.shape[0] > max_dataset_size:
        raise(f'Attempting to upload a dataset with {df.shape[0]} rows. The Max dataset size is set to'
              f' {max_dataset_size}, please reduce the number of rows or contact an admin to increase the max size')

    if 'text' not in df.columns or 'name' not in df.columns:
        raise Exception("Please make sure the uploaded file has a column with two columns:'name', 'text'. "
                        "The 'name' column are document IDs, and the 'text' column is the text you're "
                        "collecting annotations for")
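The type mismatch can be reproduced without MedCATtrainer or pandas. This is a minimal sketch under stated assumptions: exceeds_max_size is a hypothetical helper, n_rows stands in for df.shape[0], and 10000 is a placeholder because the real value of _MAX_DATASET_SIZE_DEFAULT is not shown in the issue.

```python
import os

# Hypothetical placeholder; the real _MAX_DATASET_SIZE_DEFAULT in
# data_utils.py is not shown in the issue.
_MAX_DATASET_SIZE_DEFAULT = 10000


def exceeds_max_size(n_rows):
    """Repeat the comparison from data_utils.py; n_rows stands in for df.shape[0]."""
    max_dataset_size = os.environ.get('MAX_DATASET_SIZE', _MAX_DATASET_SIZE_DEFAULT)
    return n_rows > max_dataset_size


# Variable unset: os.environ.get returns the int default unchanged, so the check works.
os.environ.pop('MAX_DATASET_SIZE', None)
unset_result = exceeds_max_size(5)

# Variable set (e.g. from an env file): environment values are always str,
# so int > str raises a TypeError.
os.environ['MAX_DATASET_SIZE'] = '10000'
try:
    exceeds_max_size(5)
    error_message = None
except TypeError as err:
    error_message = str(err)

print(unset_result)   # False
print(error_message)  # '>' not supported between instances of 'int' and 'str'
```

This also shows why the comparison only fails once MAX_DATASET_SIZE is actually present in the environment.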

Fixed in 202ce35; the fix will be available in the next release, or in the :latest tag in an hour or so. To fix this locally in the meantime, remove the MAX_DATASET_SIZE line from the env file so that the default value is used instead.