re-engineering data flow



zhouji2013 opened this issue 11 years ago

Currently, the dataset (mainly the microarray dataset) is first read and parsed into a bison object, which is then serialized; whenever the data is needed, the bison object is deserialized. This adds no value over simply parsing the original file again, yet it introduces a major layer of complexity and inefficiency.

The implementation consists of the serializeDataSet and deserializeDataSet methods in UserDirUtils. serializeDataSet is referenced in only one place, while deserializeDataSet is referenced in many (10). It is therefore better to first replace the deserializeDataSet calls one by one.
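One common way to carry out this kind of incremental replacement is to put a small interface between each call site and the storage mechanism, so a single call site can move from deserialization to re-parsing without touching the others. The sketch below is a minimal illustration under that assumption: DataSetLoader, LegacyDeserializingLoader, ReparsingLoader and MicroarrayData are hypothetical names, not existing classes in this code base, and a map and a function stand in for the real serialized files and file parser.

```java
import java.util.Map;
import java.util.function.Function;

// Hypothetical stand-in for the bison microarray data set; not the real class.
record MicroarrayData(String name, int arrayCount) {}

// Hypothetical seam: a call site that used to call UserDirUtils.deserializeDataSet
// depends on this interface instead, so it can be switched on its own.
interface DataSetLoader {
    MicroarrayData load(String name);
}

// Old path: returns a previously serialized object. The map simulates the
// serialized store so the sketch runs without touching the file system.
final class LegacyDeserializingLoader implements DataSetLoader {
    private final Map<String, MicroarrayData> serialized;

    LegacyDeserializingLoader(Map<String, MicroarrayData> serialized) {
        this.serialized = serialized;
    }

    @Override
    public MicroarrayData load(String name) {
        MicroarrayData data = serialized.get(name);
        if (data == null) {
            throw new IllegalArgumentException("not serialized: " + name);
        }
        return data;
    }
}

// New path: re-parses the original data file. The parser function is an
// assumption standing in for the real microarray file parser.
final class ReparsingLoader implements DataSetLoader {
    private final Function<String, MicroarrayData> parseOriginalFile;

    ReparsingLoader(Function<String, MicroarrayData> parseOriginalFile) {
        this.parseOriginalFile = parseOriginalFile;
    }

    @Override
    public MicroarrayData load(String name) {
        return parseOriginalFile.apply(name);
    }
}
```

Each of the former deserializeDataSet call sites can be handed a ReparsingLoader independently; once none of them uses LegacyDeserializingLoader, it can be deleted together with the single serializeDataSet call.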

The new data flow replaces the serialized object with two mechanisms: part of the data is persisted with JPA, which makes it more efficient to query and cleaner to manage; at the same time, the original data file is explicitly preserved so that it can be parsed again whenever we need data that is not included in the current persistence schema.
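A minimal sketch of these two mechanisms, under several assumptions: DataSetSummary and DataSetStore are hypothetical names, an in-memory map stands in for the JPA persistence layer (in practice the summary would be an @Entity managed by an EntityManager), and the original file is assumed to be a tab-separated matrix with a header row and one row per marker. Queries on the summary never touch the file; per-marker values, which are not in the assumed schema, are obtained by re-parsing the preserved original file.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical subset of a data set that would be persisted via JPA; a plain
// record here so the sketch runs without a JPA provider.
record DataSetSummary(long id, String name, int markerCount, int arrayCount, Path originalFile) {}

// Hypothetical store: a queryable summary plus the preserved original file.
final class DataSetStore {
    // Stand-in for the JPA persistence layer.
    private final Map<Long, DataSetSummary> summaries = new HashMap<>();
    private long nextId = 1;

    // Assumed format: header "ID<TAB>array1<TAB>array2...", then one row per marker.
    DataSetSummary register(String name, Path originalFile) throws IOException {
        List<String> lines = Files.readAllLines(originalFile);
        int arrayCount = lines.get(0).split("\t").length - 1;
        int markerCount = lines.size() - 1;
        DataSetSummary summary =
                new DataSetSummary(nextId++, name, markerCount, arrayCount, originalFile);
        summaries.put(summary.id(), summary);
        return summary;
    }

    // Answered from the persisted summary alone; no file access.
    DataSetSummary find(long id) {
        return summaries.get(id);
    }

    // Not in the persisted schema: re-parse the preserved original file on demand.
    double[] valuesForMarker(long id, String markerId) throws IOException {
        DataSetSummary summary = summaries.get(id);
        if (summary == null) {
            throw new IllegalArgumentException("unknown data set: " + id);
        }
        List<String> lines = Files.readAllLines(summary.originalFile());
        for (String line : lines.subList(1, lines.size())) {
            String[] cols = line.split("\t");
            if (cols[0].equals(markerId)) {
                double[] values = new double[cols.length - 1];
                for (int i = 1; i < cols.length; i++) {
                    values[i - 1] = Double.parseDouble(cols[i]);
                }
                return values;
            }
        }
        throw new IllegalArgumentException("unknown marker: " + markerId);
    }
}
```

The design choice here is that only small, frequently queried fields are persisted, while the file remains the source of truth for everything else, so extending the schema later does not require migrating serialized objects.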

Zhou Ji · Answer 1 · Sat Nov 23 2013 04:29:37 GMT+0800 (China Standard Time)

Note that issue #6 is clearly related to this.

Zhou Ji · Answer 2 · Sun Dec 15 2013 12:27:20 GMT+0800 (China Standard Time)

The architecture-level change is done and has been merged back into the master branch from the redo-annotation branch.