floratos-lab / geworkbench-web

geWorkbench web application - the evolution of geWorkbench project into the age of cloud computing.

re-engineering data flow

zhouji2013 opened this issue

Currently, a dataset (in practice, mostly microarray datasets) is first read and parsed into a bison object, which is then serialized; whenever the data is needed, the bison object is deserialized. This adds no value over parsing the original file again, but it does add a major layer of complexity and inefficiency.

The implementation consists of the serializeDataSet and deserializeDataSet methods in UserDirUtils. serializeDataSet is referenced in one place, while deserializeDataSet is referenced in many places (10), so it is better to replace the deserializeDataSet references one by one first.

The new data flow replaces the serialized object with two mechanisms: part of the data is persisted through JPA, which makes it more efficient to query and cleaner to manage; at the same time, the original data file is explicitly preserved so that we can parse it again when we need data that is not included in the current persistence schema.
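
To make the two mechanisms concrete, here is a minimal sketch in plain Java. It is not the actual geworkbench-web code: the class names DataSetSummary and DataSetStore, their fields, and the tab-delimited file layout (a header row followed by one row per marker) are all assumptions made for illustration. In the real application the summary class would presumably be a JPA entity; the annotations are left out so the sketch runs with the standard library alone.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

/**
 * Hypothetical summary of a dataset: the small part that would be persisted
 * (for example as a JPA entity) so it can be queried without touching the file.
 */
class DataSetSummary {
    final String name;
    final int markerCount;
    final int arrayCount;
    final Path originalFile; // the preserved original data file

    DataSetSummary(String name, int markerCount, int arrayCount, Path originalFile) {
        this.name = name;
        this.markerCount = markerCount;
        this.arrayCount = arrayCount;
        this.originalFile = originalFile;
    }
}

/** Hypothetical helper showing both halves of the data flow. */
class DataSetStore {

    /**
     * Parses the file once and keeps only summary data plus the file path.
     * Assumed layout: tab-delimited, first row is a header with one column
     * per array, every following non-blank row is one marker.
     */
    static DataSetSummary importFile(Path file) throws IOException {
        List<String> lines = Files.readAllLines(file);
        if (lines.isEmpty()) {
            throw new IllegalArgumentException("empty data file: " + file);
        }
        int arrayCount = lines.get(0).split("\t").length - 1;
        int markerCount = 0;
        for (String line : lines.subList(1, lines.size())) {
            if (!line.trim().isEmpty()) {
                markerCount++;
            }
        }
        return new DataSetSummary(file.getFileName().toString(), markerCount, arrayCount, file);
    }

    /**
     * Re-parses the preserved original file for data that is not part of the
     * persisted summary, here the expression values of a single marker.
     */
    static double[] readMarkerValues(DataSetSummary summary, String markerId) throws IOException {
        List<String> lines = Files.readAllLines(summary.originalFile);
        for (String line : lines.subList(1, lines.size())) {
            if (line.trim().isEmpty()) {
                continue;
            }
            String[] fields = line.split("\t");
            if (fields[0].equals(markerId)) {
                double[] values = new double[fields.length - 1];
                for (int i = 1; i < fields.length; i++) {
                    values[i - 1] = Double.parseDouble(fields[i]);
                }
                return values;
            }
        }
        throw new IllegalArgumentException("marker not found: " + markerId);
    }
}
```

In this sketch, a call site that used to deserialize the whole object would either read a field of the persisted summary or, for anything outside the schema, call a method like readMarkerValues to go back to the original file.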

Note that issue #6 is obviously related to this.

The architecture-level change is done and has been merged back into the master branch from the redo-annotation branch.