AI4Finance-Foundation / RLSolver

Solvers for NP-hard and NP-complete problems with an emphasis on high-performance GPU computing.

Home Page: https://ai4finance.org

📝 The tricks in Learn to optimize in TNCO sycamore

Yonv1943 opened this issue

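Below, theta denotes the "solution" of the TNCO task. It is a tensor that encodes the contraction order of a quantum circuit: computing theta.argsort() yields the edge_id values in order, and the edges are contracted one at a time in that order.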
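To make the mapping from theta to a contraction order concrete, here is a minimal sketch in plain Python. The `argsort` helper and the 5-edge toy `theta` are illustrative assumptions, not code from the repository; the helper mirrors the ascending order that `theta.argsort()` returns by default on a tensor.

```python
def argsort(values):
    """Return the indices that sort `values` in ascending order."""
    return sorted(range(len(values)), key=lambda i: values[i])


# Hypothetical theta for a toy network with 5 edges: one real value per edge.
theta = [0.7, -1.2, 0.1, 2.4, -0.3]

# Edge ids in contraction order: the edge with the smallest theta value comes first.
contraction_order = argsort(theta)
print(contraction_order)  # [1, 4, 2, 0, 3]
```

Because only the relative order of the entries matters, any two theta with the same ranking describe the same contraction order.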

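  1. Maintain two ReplayBuffers: one stores the theta produced by the model as it iterates, and the other stores the theta with good scores. This keeps good-scoring theta from being evicted by the ReplayBuffer's FIFO rule.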

Compute keep_score:
https://github.com/AI4Finance-Foundation/ElegantRL_Solver/blob/41a58d0ecb9daeddfa635d19a3741b0a29162342/rlsolver/rlsolver_learn2opt/tensor_train/TNCO_H2O.py#L418-L419

Use keep_score to select the theta that should be saved to buffer0:
https://github.com/AI4Finance-Foundation/ElegantRL_Solver/blob/41a58d0ecb9daeddfa635d19a3741b0a29162342/rlsolver/rlsolver_learn2opt/tensor_train/TNCO_H2O.py#L372-L375
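The two-buffer idea can be sketched as follows. This is not the code at the links above: the class name `TwoBufferStore`, its methods, and the rule used for `keep_score` (the worst score currently held in the elite buffer) are assumptions made for illustration, and the sketch also assumes that a lower score is better. The `elite` list plays the role of buffer0.

```python
from collections import deque


class TwoBufferStore:
    """Hypothetical sketch: a FIFO buffer for recent theta plus an elite buffer."""

    def __init__(self, recent_size, elite_size):
        self.recent = deque(maxlen=recent_size)  # oldest entries are dropped automatically
        self.elite = []  # (score, theta) pairs, sorted from best to worst
        self.elite_size = elite_size

    def keep_score(self):
        """Score a new theta must beat to enter the elite buffer (lower is better)."""
        if len(self.elite) < self.elite_size:
            return float("inf")  # buffer not full yet, so accept anything
        return self.elite[-1][0]  # worst score currently kept

    def add(self, theta, score):
        self.recent.append((score, theta))
        if score < self.keep_score():
            self.elite.append((score, theta))
            self.elite.sort(key=lambda pair: pair[0])
            del self.elite[self.elite_size:]  # trim back to capacity


store = TwoBufferStore(recent_size=2, elite_size=1)
store.add([0.1, 0.2], score=5.0)
store.add([0.3, 0.1], score=9.0)
store.add([0.2, 0.4], score=8.0)
```

After the three calls, the FIFO buffer has already dropped the theta that scored 5.0, while the elite buffer still holds it.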

  2. Start the search near better solutions

historical_theta is a theta with a good score. We add noise to it and start iterating from its neighborhood:
https://github.com/AI4Finance-Foundation/ElegantRL_Solver/blob/41a58d0ecb9daeddfa635d19a3741b0a29162342/rlsolver/rlsolver_learn2opt/tensor_train/TNCO_H2O.py#L402-L408

We also start iterating from theta sampled at random from buffer0, the buffer that stores the better theta:
https://github.com/AI4Finance-Foundation/ElegantRL_Solver/blob/41a58d0ecb9daeddfa635d19a3741b0a29162342/rlsolver/rlsolver_learn2opt/tensor_train/TNCO_H2O.py#L410-L419
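The two ways of choosing a starting point might be combined as in the sketch below. The function names, the Gaussian noise model, and the `n_noisy` / `n_sampled` split are assumptions for illustration; the linked code may choose and mix the starting points differently.

```python
import random


def perturb(theta, noise_std, rng):
    """Return a copy of theta with Gaussian noise added to every entry."""
    return [t + rng.gauss(0.0, noise_std) for t in theta]


def make_start_points(historical_theta, buffer0, n_noisy, n_sampled, noise_std, rng):
    """Build a batch of starting theta from both sources described above."""
    # Source 1: noisy copies of the best-known theta.
    noisy = [perturb(historical_theta, noise_std, rng) for _ in range(n_noisy)]
    # Source 2: theta drawn at random from the buffer of good solutions.
    sampled = [list(rng.choice(buffer0)) for _ in range(n_sampled)]
    return noisy + sampled


rng = random.Random(0)  # fixed seed so the sketch is reproducible
historical_theta = [0.7, -1.2, 0.1]
buffer0 = [[0.5, -1.0, 0.2], [0.9, -0.8, 0.0]]
starts = make_start_points(historical_theta, buffer0,
                           n_noisy=3, n_sampled=2, noise_std=0.1, rng=rng)
```

A small `noise_std` keeps the noisy starts close to historical_theta, while sampling from buffer0 adds diversity across several good regions.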

  3. After training the iterator, use the iterator for inference

As can be seen