nautechsystems / nautilus_trader

A high-performance algorithmic trading platform and event-driven backtester

Home Page: https://nautilustrader.io

batch_size_bytes not working as expected

dpmabo opened this issue · comments

Bug Report

I just tested the simple OrderBookImbalance example strategy. With a one-month order_book_delta dataset,
memory usage grows until the process is OOM-killed (12-core/32 GB workstation). Setting batch_size_bytes has no visible effect on memory usage.

The order_book_delta dataset is like below:

8.6M part-20240208-010605.parquet
8.7M part-20240208-014357.parquet
9.1M part-20240208-023338.parquet
8.7M part-20240208-034528.parquet
8.7M part-20240208-042741.parquet
8.9M part-20240208-050715.parquet
9.7M part-20240208-060216.parquet
8.7M part-20240208-075000.parquet
......
9.2M part-20240310-184726.parquet
9.1M part-20240310-195349.parquet
9.2M part-20240310-205305.parquet
8.8M part-20240310-215918.parquet
8.7M part-20240310-224410.parquet
4.3M part-20240310-232658.parquet
9.5M part-20240310-235944.parquet

(799 files in total, 6.8G)

catalog = ParquetDataCatalog("/data/catalog")
start = dt_to_unix_nanos(pd.Timestamp("2024-02-08", tz="UTC"))
end =  dt_to_unix_nanos(pd.Timestamp("2024-03-10", tz="UTC"))

data_configs = [
    BacktestDataConfig(
        catalog_path=str(catalog.path),
        data_cls=OrderBookDelta,
        instrument_id=instrument.id,
        start_time=start,
        end_time=end,
    ),
]

strategies = [
    ImportableStrategyConfig(
        strategy_path="examples.strategies.orderbook_imbalance:OrderBookImbalance",
        config_path="examples.strategies.orderbook_imbalance:OrderBookImbalanceConfig",
        config=dict(
            instrument_id=instrument.id,
            book_type=book_type,
            max_trade_size=Decimal("0.01"),
            trigger_min_size=3.0,
            min_seconds_between_triggers=1.0,
        ),
    ),
]

config = BacktestRunConfig(
    engine=BacktestEngineConfig(
        trader_id="BACKTESTER-001",
        strategies=strategies,
        logging=LoggingConfig(log_level="WARN"),
    ),
    data=data_configs,
    venues=venues_configs,
    batch_size_bytes=16777216,  # 16 MiB (16 * 1024 * 1024)
)

node = BacktestNode(configs=[config])
result = node.run()

Expected Behavior

As the docs state, consider the high-level API when "your data stream’s size exceeds available memory, necessitating streaming data in batches". When batch_size_bytes is set, NT should run in streaming mode: the data in the catalog is loaded into memory gradually and fed to the engine batch by batch.
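To make the expected behavior concrete, here is a minimal, library-free sketch of byte-budgeted batching. It is not NautilusTrader's implementation: the `Record` type and the `iter_batches` helper are hypothetical names chosen for illustration, and payload length stands in for whatever size estimate a real loader would use. The point is only that memory in use at any time is bounded by the budget rather than by the size of the whole dataset.

```python
from typing import Iterable, Iterator, List, Tuple

# Hypothetical record shape: (ts_init in nanoseconds, encoded payload).
Record = Tuple[int, bytes]


def iter_batches(records: Iterable[Record], batch_size_bytes: int) -> Iterator[List[Record]]:
    """Yield lists of records whose payload bytes stay within the budget."""
    batch: List[Record] = []
    used = 0
    for record in records:
        size = len(record[1])
        # Flush before the budget would be exceeded; a single oversized
        # record still gets a batch of its own.
        if batch and used + size > batch_size_bytes:
            yield batch
            batch, used = [], 0
        batch.append(record)
        used += size
    if batch:
        yield batch


# Lazily generated stream: 10 records of 4 bytes each, budget of 16 bytes.
stream = ((ts, b"\x00" * 4) for ts in range(10))
batch_lengths = [len(batch) for batch in iter_batches(stream, batch_size_bytes=16)]
print(batch_lengths)  # [4, 4, 2]
```

Because the input is a generator, only one batch is held in memory at a time, which is the behavior this report expected from batch_size_bytes.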

Actual Behavior

All data in the catalog is loaded into memory up front, which quickly ends in an OOM kill.

Steps to Reproduce the Problem

  1. Set up a large data catalog
  2. Set batch_size_bytes to a non-zero value
  3. Run the strategy
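When reproducing, it can help to record peak memory rather than wait for the OOM killer. The following is a small sketch using only the standard library on Linux or macOS; `peak_rss_mib` is a hypothetical helper name, and the `bytearray` allocation is just a stand-in for the actual `node.run()` call.

```python
import resource
import sys


def peak_rss_mib() -> float:
    """Return the peak resident set size of this process in MiB."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in kilobytes on Linux and in bytes on macOS.
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return peak / divisor


before = peak_rss_mib()
buffer = bytearray(64 * 1024 * 1024)  # stand-in for the backtest run
after = peak_rss_mib()
print(f"peak RSS: {before:.0f} MiB -> {after:.0f} MiB")
```

Comparing the figure for different batch_size_bytes values would show whether the setting changes peak memory at all.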

Specifications

  • OS platform: Linux
  • Python version: 3.11
  • nautilus_trader version: 1.189.0

Hi @dpmabo

Thanks for the detailed report here.

We think this might be related to a missing sort order specification on write. This still needs to be confirmed, and a more robust solution determined.
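As a general illustration of why sort order matters for streaming (not a description of NautilusTrader's internals, which this thread does not spell out): a lazy k-way merge such as `heapq.merge` only produces time-ordered output when every input is already sorted. The `is_sorted` helper and the sample timestamp lists below are hypothetical.

```python
import heapq
from typing import Iterable


def is_sorted(values: Iterable[int]) -> bool:
    """Return True if the values are in non-decreasing order."""
    items = list(values)
    return all(a <= b for a, b in zip(items, items[1:]))


# Two inputs that are each sorted by timestamp merge lazily and correctly.
file_a = [1, 4, 7]
file_b = [2, 3, 9]
merged = list(heapq.merge(file_a, file_b))
print(merged)  # [1, 2, 3, 4, 7, 9]

# If one input is not sorted, the lazy merge silently yields unordered output.
unsorted_file = [5, 1, 8]
bad_merge = list(heapq.merge(unsorted_file, file_b))
print(is_sorted(bad_merge))  # False
```

A reader that cannot rely on sorted inputs has to load and sort everything before it can emit the first event, which defeats batching.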

Thank you for being patient @dpmabo. The sort order issue has been solved in #1656, which will be merged very soon. We've encountered OOM issues before, and they were related to row group size in the data files.

Can you please run the following script on any of the files in your dataset and share the output here?

import csv
import sys

import pyarrow.parquet as pq


def extract_ts_init_values(parquet_path, csv_path):
    # Open the Parquet file (row groups are read on demand below)
    parquet_file = pq.ParquetFile(parquet_path)

    # Open the CSV file for writing
    with open(csv_path, "w", newline="") as csvfile:
        writer = csv.writer(csvfile)
        writer.writerow(["index", "start_ts", "end_ts", "group_size"])  # Write the header

        # Iterate over each row group in the Parquet file
        for i in range(parquet_file.num_row_groups):
            # Read only the 'ts_init' column of this row group
            table = parquet_file.read_row_group(i, columns=["ts_init"])
            if table.num_rows == 0:
                continue  # Nothing to report for an empty row group

            # Convert the 'ts_init' column to a list of Python ints
            ts_init_values = table.column("ts_init").to_pylist()

            # Write the index, first and last value, and row count to the CSV file
            writer.writerow([i, ts_init_values[0], ts_init_values[-1], table.num_rows])


if __name__ == "__main__":
    if len(sys.argv) < 3:
        print("Usage: python extract_ts_init.py <parquet_file> <csv_file>")
        sys.exit(1)

    extract_ts_init_values(sys.argv[1], sys.argv[2])

Now fixed following the holy grail commit 6821fec. Many thanks for the continued effort from @twitu 🦀

@twitu sorry for the delayed reply. Here is the CSV generated by your script.

index,start_ts,end_ts,group_size
0,1692167341631100001,1692167398939000249,5000
1,1692167398939000250,1692167439090000021,5000
2,1692167439090000022,1692167492526000009,5000
3,1692167492526000010,1692167529900000009,5000
4,1692167529900000010,1692167559682000076,5000
5,1692167559682000077,1692167586491000020,5000
6,1692167586491000021,1692167638911000288,5000
7,1692167638911000289,1692167686987000027,5000
8,1692167686987000028,1692167733534000228,5000
9,1692167733534000229,1692167762013000011,5000
10,1692167762094000001,1692167818975001040,5000
11,1692167818975001041,1692167876183000015,5000
12,1692167876217000001,1692167911018000009,5000
13,1692167911018000010,1692167939462000001,5000
14,1692167939462000002,1692167971797000041,5000
15,1692167971797000042,1692168000600000184,5000
16,1692168000600000185,1692168030617000003,5000
17,1692168030617000004,1692168069511000002,5000
18,1692168069511000003,1692168118981000003,5000
19,1692168118981000004,1692168158816000010,5000
20,1692168158816000011,1692168190038000005,5000
21,1692168190038000006,1692168241101000011,5000
22,1692168241220000001,1692168298941001706,5000
23,1692168298941001707,1692168352840000006,5000
24,1692168352840000007,1692168387248000003,5000
25,1692168387333000001,1692168420088000062,5000
26,1692168420088000063,1692168478958000003,5000
27,1692168478958000004,1692168523143000003,5000
28,1692168523143000004,1692168570863000005,5000
29,1692168570863000006,1692168603683000063,5000
30,1692168603683000064,1692168643435000017,5000
31,1692168643435000018,1692168663348000044,5000
32,1692168663348000045,1692168718986000432,5000
33,1692168718986000433,1692168778942000094,5000
34,1692168778942000095,1692168831595000007,5000
35,1692168831595000008,1692168882567000061,5000
36,1692168882567000062,1692168907045000008,5000
37,1692168907045000009,1692168954805000008,5000
38,1692168954805000009,1692168985983000005,5000
39,1692168985983000006,1692169018968001422,5000
40,1692169018968001423,1692169057563000003,5000
41,1692169057563000004,1692169094796000026,5000
42,1692169094796000027,1692169138906000741,5000
43,1692169138906000742,1692169196073000006,5000
44,1692169196073000007,1692169216027000003,5000
45,1692169216027000004,1692169258849000968,5000
46,1692169258849000969,1692169294686000047,5000
47,1692169294686000048,1692169332526000002,5000
48,1692169332526000003,1692169378963001794,5000
49,1692169378963001795,1692169438974000233,5000
50,1692169438974000234,1692169474509000004,5000
51,1692169474509000005,1692169509300000004,5000
52,1692169509300000005,1692169558998000500,5000
53,1692169558998000501,1692169602342000017,5000
54,1692169602422000001,1692169625683000002,5000
55,1692169625683000003,1692169679005001395,5000
56,1692169679005001396,1692169727537000003,5000
57,1692169727537000004,1692169741307000013,5000
58,1692169741307000014,1692169798335000006,5000
59,1692169798335000007,1692169828496000003,5000
60,1692169828496000004,1692169861391000014,5000
61,1692169861391000015,1692169918918001277,5000
62,1692169918918001278,1692169977322000005,5000
63,1692169977322000006,1692170014880000004,5000
64,1692170014880000005,1692170068343000007,5000
65,1692170068343000008,1692170104230000345,5000
66,1692170104230000346,1692170153997000007,5000
67,1692170153997000008,1692170194786000004,5000
68,1692170194786000005,1692170231519000001,5000
69,1692170231519000002,1692170274174000010,5000
70,1692170274174000011,1692170290134000043,5000
71,1692170290134000044,1692170339017001045,5000
72,1692170339017001046,1692170398982000890,5000
73,1692170398982000891,1692170457409000002,5000
74,1692170457409000003,1692170502216000009,5000
75,1692170502216000010,1692170540700000004,5000
76,1692170540700000005,1692170583126000005,5000
77,1692170583126000006,1692170638970001731,5000
78,1692170638970001732,1692170698971000781,5000
79,1692170698971000782,1692170730815000007,5000
80,1692170730914000001,1692170759023001991,5000
81,1692170759023001992,1692170806616000489,5000
82,1692170806616000490,1692170837864000007,5000
83,1692170837969000001,1692170878993001448,5000
84,1692170878993001449,1692170921601000008,5000
85,1692170921601000009,1692170957829000009,5000
86,1692170957950000001,1692171003468000001,5000
87,1692171003468000002,1692171054634000012,5000
88,1692171054634000013,1692171089143000021,5000
89,1692171089168000001,1692171119013000342,5000
90,1692171119013000343,1692171139500000111,5000
91,1692171139500000112,1692171178993001160,5000
92,1692171178993001161,1692171218355000004,5000
93,1692171218469000001,1692171246782000004,5000
94,1692171246782000005,1692171299004000123,5000
95,1692171299004000124,1692171336042000006,5000
96,1692171336042000007,1692171386788000008,5000
97,1692171386789000001,1692171432612000008,5000
98,1692171432612000009,1692171478859001260,5000
99,1692171478859001261,1692171533389000001,5000
100,1692171533389000002,1692171556388000004,5000
101,1692171556388000005,1692171598994001537,5000
102,1692171598994001538,1692171624876000009,5000
103,1692171624876000010,1692171659029000740,5000
104,1692171659029000741,1692171696452000147,5000
105,1692171696452000148,1692171718980000386,5000
106,1692171718980000387,1692171757435000001,5000
107,1692171757435000002,1692171782182000004,5000
108,1692171782182000005,1692171838987001506,5000
109,1692171838987001507,1692171898937000911,5000
110,1692171898937000912,1692171958931000754,5000
111,1692171958931000755,1692172011510000009,5000
112,1692172011510000010,1692172064913000001,5000
113,1692172064913000002,1692172106289000002,5000
114,1692172106289000003,1692172143707000007,5000
115,1692172143707000008,1692172198954001774,5000
116,1692172198954001775,1692172258947000576,5000
117,1692172258947000577,1692172319038000126,5000
118,1692172319038000127,1692172360417000032,5000
119,1692172360417000033,1692172405505000001,5000
120,1692172405505000002,1692172440418000008,5000
121,1692172440418000009,1692172498944001052,5000
122,1692172498944001053,1692172554078000005,5000
123,1692172554079000001,1692172604540000001,5000
124,1692172604540000002,1692172629662000039,5000
125,1692172629662000040,1692172680395000080,5000
126,1692172680395000081,1692172738970000824,5000
127,1692172738970000825,1692172780292000008,5000
128,1692172780396000001,1692172810050000013,5000
129,1692172810050000014,1692172858991001664,5000
130,1692172858991001665,1692172918974000204,5000
131,1692172918974000205,1692172963596000001,5000
132,1692172963596000002,1692173013478000001,5000
133,1692173013478000002,1692173063058000005,5000
134,1692173063058000006,1692173105074000018,5000
135,1692173105074000019,1692173159007001983,5000
136,1692173159007001984,1692173219006000746,5000
137,1692173219006000747,1692173275530000002,5000
138,1692173275530000003,1692173328353000003,5000
139,1692173328353000004,1692173363091000005,5000
140,1692173363091000006,1692173403100000016,5000
141,1692173403100000017,1692173459031000967,5000
142,1692173459031000968,1692173508768000021,5000
143,1692173508768000022,1692173535001000003,5000
144,1692173535001000004,1692173579005000783,5000
145,1692173579005000784,1692173624933000002,5000
146,1692173624933000003,1692173654207000005,5000
147,1692173654207000006,1692173698998001850,5000
148,1692173698998001851,1692173759019000185,5000
149,1692173759019000186,1692173783192000006,5000
150,1692173783192000007,1692173818862001417,5000
151,1692173818862001418,1692173879019000435,5000
152,1692173879019000436,1692173913968000052,5000
153,1692173913968000053,1692173945231000001,5000
154,1692173945231000002,1692173989589000021,5000
155,1692173989589000022,1692174006991000039,5000
156,1692174006991000040,1692174058835000088,5000
157,1692174058835000089,1692174096279000007,5000
158,1692174096279000008,1692174118949000535,5000
159,1692174118949000536,1692174145939000005,5000
160,1692174145939000006,1692174179209000002,5000
161,1692174179209000003,1692174239024000888,5000
162,1692174239024000889,1692174296794000001,5000
163,1692174296794000002,1692174341372000001,5000
164,1692174341372000002,1692174386992000002,5000
165,1692174387067000001,1692174426404000049,5000
166,1692174426404000050,1692174464421000013,5000
167,1692174464421000014,1692174503574000006,5000
168,1692174503574000007,1692174542349000016,5000
169,1692174542349000017,1692174599035000859,5000
170,1692174599035000860,1692174655977000007,5000
171,1692174655977000008,1692174693764000002,5000
172,1692174693764000003,1692174735717000002,5000
173,1692174735717000003,1692174778986001246,5000
174,1692174778986001247,1692174824298000006,5000
175,1692174824298000007,1692174856006000059,5000
176,1692174856006000060,1692174898955001388,5000
177,1692174898955001389,1692174949117000004,5000
178,1692174949117000005,1692174988824000019,5000
179,1692174988824000020,1692175032924000004,5000
180,1692175032924000005,1692175060237000002,5000
181,1692175060237000003,1692175082160000005,5000
182,1692175082160000006,1692175120171000009,5000
183,1692175120171000010,1692175141400000031,5000
184,1692175141400000032,1692175170532000007,5000
185,1692175170532000008,1692175202380000028,5000
186,1692175202380000029,1692175259031000483,5000
187,1692175259031000484,1692175304754000065,5000
188,1692175304754000066,1692175333858000001,5000
189,1692175333858000002,1692175379061000770,5000
190,1692175379061000771,1692175417382000006,5000
191,1692175417382000007,1692175450425000024,5000
192,1692175450425000025,1692175494053000006,5000
193,1692175494157000001,1692175526065000011,5000
194,1692175526065000012,1692175573690000004,5000
195,1692175573787000001,1692175622771000003,5000
196,1692175622771000004,1692175675528000016,5000
197,1692175675528000017,1692175690480000011,5000
198,1692175690480000012,1692175737563000007,5000
199,1692175737652000001,1692175780486000004,4979
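For anyone who wants to check such output mechanically, here is a small sketch that summarises the CSV and tests whether the row groups are in time order and do not overlap. `summarise_row_groups` is a hypothetical helper, and the embedded sample is just the first three rows of the output above; the same function can be pointed at the full file contents.

```python
import csv
import io

# First three rows of the posted output, used here as a small sample.
SAMPLE = """index,start_ts,end_ts,group_size
0,1692167341631100001,1692167398939000249,5000
1,1692167398939000250,1692167439090000021,5000
2,1692167439090000022,1692167492526000009,5000
"""


def summarise_row_groups(csv_text: str) -> dict:
    """Summarise row group sizes and check that groups are ordered in time."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    total_rows = 0
    max_group_size = 0
    ordered = True
    previous_end = None
    for row in rows:
        start, end = int(row["start_ts"]), int(row["end_ts"])
        size = int(row["group_size"])
        # A group must not end before it starts, nor start before the
        # previous group ended.
        if start > end or (previous_end is not None and start < previous_end):
            ordered = False
        previous_end = end
        total_rows += size
        max_group_size = max(max_group_size, size)
    return {
        "groups": len(rows),
        "rows": total_rows,
        "max_group_size": max_group_size,
        "ordered": ordered,
    }


summary = summarise_row_groups(SAMPLE)
print(summary)
# {'groups': 3, 'rows': 15000, 'max_group_size': 5000, 'ordered': True}
```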

Hi @dpmabo, a fix has been merged into develop. Please give it a try and open a new issue if it doesn't work for you.