asyml / forte

Forte is a flexible and powerful ML workflow builder. This is part of the CASL project: http://casl-project.ai/

Create a single source for resources shared by `data_store` and `data_store_serialization` test cases.

Pushkar-Bhuse opened this issue

Is your feature request related to a problem? Please describe.
The resources used to define the test cases in data_store_test.py and data_store_serialization_test.py are often identical. For example, the structure of _type_attributes for most entries is quite similar across the two files:

DataStore._type_attributes = {
    "ft.onto.base_ontology.Document": {
        "attributes": {
            "begin": {"index": 2, "type": (None, (int,))},
            "end": {"index": 3, "type": (None, (int,))},
            "payload_idx": {"index": 4, "type": (None, (int,))},
            "document_class": {"index": 5, "type": (list, (str,))},
            "sentiment": {"index": 6, "type": (dict, (str, float))},
            "classifications": {
                "index": 7,
                "type": (FDict, (str, Classification)),
            },
        },
        "parent_entry": "forte.data.ontology.top.Annotation",
    },
}

Describe the solution you'd like
To reduce this redundancy, there should be a central file that stores these configurations, along with a clear format for the tests to access them. Note that although the configurations look quite similar, some of them differ subtly and intentionally. For example, in data_store_serialization_test.py, the _type_attributes for Document is given by:

"ft.onto.base_ontology.Document": {
    "attributes": {
        "begin": {"index": 2, "type": (None, (int,))},
        "end": {"index": 3, "type": (None, (int,))},
        "payload_idx": {"index": 4, "type": (None, (int,))},
        "sentiment": {"index": 5, "type": (dict, (str, float))},
        "classifications": {
            "index": 6,
            "type": (FDict, (str, Classification)),
        },
    },
    "parent_entry": "forte.data.ontology.top.Annotation",
},

Note that this configuration intentionally omits the document_class attribute, so the indices of sentiment and classifications shift down by one. The proposed solution therefore needs provisions for handling such slight changes in structure.
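
One possible shape for such a shared resource is sketched below. It is only an illustration of the idea, not an existing Forte API: the names BASE_TYPE_ATTRIBUTES and type_attributes_without are hypothetical, and FDict and Classification are replaced by empty stand-in classes so the snippet runs without Forte installed. The sketch assumes that a variant differs from the base only by dropping attributes and that the remaining indices should stay contiguous.

```python
import copy


# Stand-ins so the sketch runs on its own; the real tests would import
# FDict and Classification from Forte instead of defining them here.
class FDict:
    pass


class Classification:
    pass


DOCUMENT = "ft.onto.base_ontology.Document"

# Hypothetical shared base configuration, copied from the issue text.
BASE_TYPE_ATTRIBUTES = {
    DOCUMENT: {
        "attributes": {
            "begin": {"index": 2, "type": (None, (int,))},
            "end": {"index": 3, "type": (None, (int,))},
            "payload_idx": {"index": 4, "type": (None, (int,))},
            "document_class": {"index": 5, "type": (list, (str,))},
            "sentiment": {"index": 6, "type": (dict, (str, float))},
            "classifications": {
                "index": 7,
                "type": (FDict, (str, Classification)),
            },
        },
        "parent_entry": "forte.data.ontology.top.Annotation",
    },
}


def type_attributes_without(entry_type, *dropped, base=BASE_TYPE_ATTRIBUTES):
    """Return a deep copy of `base` with `dropped` attributes removed from
    `entry_type` and the remaining indices renumbered contiguously."""
    config = copy.deepcopy(base)
    attrs = config[entry_type]["attributes"]
    for name in dropped:
        del attrs[name]
    # Renumber from the smallest index in the base so the layout stays dense.
    start = min(a["index"] for a in base[entry_type]["attributes"].values())
    ordered = sorted(attrs, key=lambda n: attrs[n]["index"])
    for offset, name in enumerate(ordered):
        attrs[name]["index"] = start + offset
    return config


# Variant for the serialization tests: same as the base, minus document_class.
serialization_attrs = type_attributes_without(DOCUMENT, "document_class")
```

With this layout, data_store_test.py could use a copy of the base directly, while data_store_serialization_test.py would request the variant, keeping the intentional difference explicit at the call site. Because the helper deep-copies the base, a test that mutates its configuration does not affect the others.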

Additional Context

  • This is part of the data efficiency project
  • This PR should be made to the master branch.