Chain Index Uses Excessive Memory on Mainnet.

Question

bwbush opened this issue 2 years ago · comments

Summary

The chain index uses an excessive amount of memory on mainnet when it has to catch up on syncing.

For example, it used approximately 100 GB of memory during a recent sync of the last 4% of mainnet. (It also used all available processor cores for extended periods.)

Steps to reproduce the behavior

Run the chain-index on mainnet:

`which time` --verbose plutus-chain-index start-index --network-id 764824073             \
                                                      --db-path chain-index.db/ci.sqlite \
                                                      --socket-path node.socket          \
                                                      --port 9083

Actual Result

AppConfig {acLogConfigPath = Nothing, acMinLogLevel = Nothing, acConfigPath = Nothing, acCLIConfigOverrides = CLIConfigOverrides {ccSocketPath = Just "node.socket", ccDbPath = Just "chain-index.db/ci.sqlite", ccPort = Just 9083, ccNetworkId = Just 764824073}, acCommand = StartChainIndex}

Logging config:
Representation {minSeverity = Info, rotation = Nothing, setupScribes = [ScribeDefinition {scKind = StdoutSK, scFormat = ScText, scName = "stdout", scPrivacy = ScPublic, scRotation = Nothing, scMinSev = Debug, scMaxSev = Emergency}], defaultScribes = [(StdoutSK,"stdout")], setupBackends = [KatipBK,AggregationBK,MonitoringBK,EKGViewBK], defaultBackends = [KatipBK,AggregationBK,EKGViewBK], hasEKG = Just (Endpoint ("localhost",12790)), hasGraylog = Nothing, hasPrometheus = Nothing, hasGUI = Nothing, traceForwardTo = Nothing, forwardDelay = Nothing, traceAcceptAt = Nothing, options = fromList []}

Chain Index config:
Socket: node.socket
Db: chain-index.db/ci.sqlite
Port: 29083
Network Id: Testnet (NetworkMagic {unNetworkMagic = 764824073})
Security Param: 2160
Store from: BlockNo 0

The tip of the local node: SlotNo 52553102
Connecting to the node using socket: node.socket
Starting webserver on port 29083
A Swagger UI for the endpoints are available at http://localhost:29083/swagger/swagger-ui
Syncing (96%)
Syncing (97%)
Syncing (98%)
Syncing (99%)
Syncing (100%)

^C Interrupt
Command terminated by signal 2
        Command being timed: "plutus-chain-index"
        User time (seconds): 489738.46
        System time (seconds): 159428.96
        Percent of CPU this job got: 1556%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 11:34:53
        Average shared text size (kbytes): 0
        Average unshared data size (kbytes): 0
        Average stack size (kbytes): 0
        Average total size (kbytes): 0
        Maximum resident set size (kbytes): 98701420
        Average resident set size (kbytes): 0
        Major (requiring I/O) page faults: 45605
        Minor (reclaiming a frame) page faults: 25476185
        Voluntary context switches: 448477241
        Involuntary context switches: 793936231
        Swaps: 0
        File system inputs: 8
        File system outputs: 5032137816
        Socket messages sent: 0
        Socket messages received: 0
        Signals delivered: 0
        Page size (bytes): 4096
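
As a rough cross-check of the figures quoted in the summary, the `time` statistics above can be converted into more familiar units. This is only a sketch: the helper names `kb_to_gib` and `avg_cores` are made up for illustration, and it assumes the "kbytes" value counts 1024-byte units.

```shell
# Hypothetical helper: convert "Maximum resident set size (kbytes)" to GiB,
# assuming each reported kbyte is 1024 bytes.
kb_to_gib() {
  awk -v kb="$1" 'BEGIN { printf "%.1f\n", kb / (1024 * 1024) }'
}

# Hypothetical helper: average number of busy cores,
# i.e. (user + system CPU seconds) / elapsed wall-clock seconds.
avg_cores() {
  awk -v u="$1" -v s="$2" -v w="$3" 'BEGIN { printf "%.1f\n", (u + s) / w }'
}

kb_to_gib 98701420                     # prints 94.1
avg_cores 489738.46 159428.96 41693    # 11:34:53 = 41693 s; prints 15.6
```

The second value agrees with the reported "Percent of CPU this job got: 1556%", that is, roughly 15.6 cores kept busy for the whole run.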

Expected Result

It's unreasonable to require 100 GB+ of memory to run the chain index. Ideally, its memory footprint should be under 5 GB.

Describe the approach you would take to fix this

Experiment with GHC runtime system (+RTS) options.
Profile the memory usage of the Haskell code.
Break database transactions into smaller units.
Replace SQLite3 with a more performant persistent store.
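
A possible starting point for the first item, experimenting with GHC runtime options, is sketched below. The flags `-M`, `-N` and `-s` are standard GHC runtime options, but this assumes the `plutus-chain-index` binary was built to accept them (for example with `-rtsopts` and the threaded runtime), which has not been verified here. The 8G heap cap and the 4 cores are arbitrary trial values, and `rts_flags` is a hypothetical helper name.

```shell
# Hypothetical helper: build a GHC RTS flag string.
#   $1 = maximum heap size (e.g. 8G), $2 = number of capabilities (cores)
#   -s prints GC and memory statistics when the program exits.
rts_flags() {
  printf '+RTS -M%s -N%s -s -RTS' "$1" "$2"
}

# Dry run: print the command instead of executing it. Remove `echo` to run it.
# $(rts_flags ...) is left unquoted on purpose so it splits into separate flags.
echo plutus-chain-index start-index --network-id 764824073 \
  --db-path chain-index.db/ci.sqlite \
  --socket-path node.socket \
  --port 9083 \
  $(rts_flags 8G 4)
```

With a heap cap in place, the process should fail with a heap overflow instead of growing towards 100 GB, which at least makes the point of failure visible while profiling.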

System info

plutus-apps at commit ce8282d

ak3n · Answer 1 · Wed Apr 06 2022 19:42:56 GMT+0800 (China Standard Time)

Could you please check if the situation is still the same? There were several PRs to improve it.