code improvement

Question

amor71 opened this issue 3 years ago · comments

In data_loader.py, merging _fetch_data_range() and fetch_data_range() would improve code quality. We need to make sure all flows are covered by unit tests.

@amor71 · Answer 1 · Thu Sep 23 2021 10:21:29 GMT+0800 (China Standard Time)

data_loader is a central component of the framework. It provides a pandas DataFrame-like interface for in-memory management of OHLC and additional data, per symbol. Data is loaded from a data provider; the current data providers are Alpaca and Polygon. There are unit tests in the tests folder (see .../tests/test_alpaca_data_loader.py) that check various flows. When making the above changes, make sure they are covered by unit tests, and if not, add additional unit tests in

ksilo · Answer 2 · Sun Sep 26 2021 18:57:29 GMT+0800 (China Standard Time)

After close inspection, the two methods _fetch_data_range() and fetch_data_range() seem to be exactly the same. Both fetch market data for a specified time range and store it in the symbol_data attribute of the SymbolData class. The only place where _fetch_data_range() is used is the fetch_data_timestamp() method, which converts between different timestamp representations and specifies the time range. In my understanding, these two methods could be merged without any consequences, and there won't be any new logical code flows.

@amor71 I'd like your input on this matter, before I start to mess around with code :)

@amor71 · Answer 3 · Sun Sep 26 2021 22:48:42 GMT+0800 (China Standard Time)

@ksilo There are subtle differences between them, and also additional things to look at:

fetch_data_range() breaks the range into "smaller pieces", since some data providers limit the amount of data they can return. I am not sure that the way it splits the range (by time vs. by amount) is the most efficient one. It makes sense for fetch_data_range() to re-use _fetch_data_range(), if that's what you mean.
I am not sure the way the time-scale is handled is really the most Pythonic and efficient way.
We need to revisit how they both "stitch" together the loaded data, to make sure there are no duplicates and that the order is kept, in the most efficient DataFrame way.
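
To make the splitting and stitching points concrete, here is a minimal sketch, not the actual data_loader.py code. It assumes a time-based split, a DatetimeIndex on each returned frame, and a provider call that includes both window endpoints. The names split_range, fetch_range_chunked, fetch_chunk and demo_fetch are hypothetical and exist only for illustration.

```python
from datetime import datetime, timedelta
from typing import Callable, Iterator, Tuple

import pandas as pd


def split_range(
    start: datetime, end: datetime, max_span: timedelta
) -> Iterator[Tuple[datetime, datetime]]:
    """Yield consecutive (window_start, window_end) pairs covering [start, end]."""
    cursor = start
    while cursor < end:
        window_end = min(cursor + max_span, end)
        yield cursor, window_end
        cursor = window_end


def fetch_range_chunked(
    fetch_chunk: Callable[[datetime, datetime], pd.DataFrame],
    start: datetime,
    end: datetime,
    max_span: timedelta,
) -> pd.DataFrame:
    """Fetch [start, end] in windows of at most max_span and stitch the results.

    fetch_chunk is a hypothetical stand-in for the single-request provider call.
    """
    frames = [fetch_chunk(s, e) for s, e in split_range(start, end, max_span)]
    if not frames:
        return pd.DataFrame()
    stitched = pd.concat(frames)
    # Adjacent windows share a boundary timestamp, so keep one row per timestamp.
    stitched = stitched[~stitched.index.duplicated(keep="last")]
    # Sort once at the end rather than after every window.
    return stitched.sort_index()


def demo_fetch(start: datetime, end: datetime) -> pd.DataFrame:
    """Hypothetical provider: one daily bar per day, both endpoints included."""
    index = pd.date_range(start, end, freq="D")
    return pd.DataFrame({"close": range(len(index))}, index=index)


bars = fetch_range_chunked(
    demo_fetch, datetime(2021, 1, 1), datetime(2021, 1, 10), timedelta(days=3)
)
```

Concatenating once and then de-duplicating on the index avoids repeated DataFrame copies. Whether keep="last" or keep="first" is right for overlapping rows depends on the provider, which this sketch does not try to settle.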

ksilo · Answer 4 · Tue Sep 28 2021 03:32:22 GMT+0800 (China Standard Time)

How big is a typical data range of fetch_data_range()? months? years?

@amor71 · Answer 5 · Tue Sep 28 2021 06:34:03 GMT+0800 (China Standard Time)

Some swing trading algos would look for 200+ days of trading data.

@amor71 · Answer 6 · Tue Sep 28 2021 06:34:39 GMT+0800 (China Standard Time)

Actually, implementing LRU eviction to free memory could be an interesting move.
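
A rough sketch of what LRU eviction of per-symbol data could look like, assuming the cache is keyed by symbol and capped by symbol count rather than by bytes. The class name SymbolLRU and the loader callback are hypothetical and are not part of the existing data_loader API.

```python
from collections import OrderedDict
from typing import Any, Callable


class SymbolLRU:
    """Keep data for at most max_symbols symbols, evicting the least recently used."""

    def __init__(self, loader: Callable[[str], Any], max_symbols: int) -> None:
        if max_symbols < 1:
            raise ValueError("max_symbols must be at least 1")
        self._loader = loader  # hypothetical: loads data for one symbol
        self._max_symbols = max_symbols
        self._cache: "OrderedDict[str, Any]" = OrderedDict()

    def __getitem__(self, symbol: str) -> Any:
        if symbol in self._cache:
            # Mark the symbol as most recently used.
            self._cache.move_to_end(symbol)
            return self._cache[symbol]
        data = self._loader(symbol)
        self._cache[symbol] = data
        if len(self._cache) > self._max_symbols:
            # Drop the least recently used symbol to free memory.
            self._cache.popitem(last=False)
        return data

    def __contains__(self, symbol: str) -> bool:
        return symbol in self._cache

    def __len__(self) -> int:
        return len(self._cache)
```

A cap on symbol count is the simplest policy. A cap on estimated memory (for example via DataFrame.memory_usage) may fit better when symbols hold very different amounts of data.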

github-actions · Answer 7 · Sat Nov 27 2021 11:48:32 GMT+0800 (China Standard Time)

Stale issue message