alibaba / pipcook

Machine learning platform for Web developers

Home Page: https://alibaba.github.io/pipcook/

Job failed because of message got too long

upupzealot opened this issue

My pipeline looks like this:

{
  "plugins": {
    "dataCollect": {
      "package": "@pipcook/plugins-object-detection-pascalvoc-data-collect",
      "params": {
        "url": "https://zhijiansha.oss-cn-hangzhou.aliyuncs.com/deep-learning/output.zip"
      }
    },
    "dataAccess": {
      "package": "@pipcook/plugins-coco-data-access"
    },
    "modelDefine": {
      "package": "@pipcook/plugins-pytorch-yolov5-model-define"
    },
    "modelTrain": {
      "package": "@pipcook/plugins-pytorch-yolov5-model-train",
      "params": {
        "epochs": 300
      }
    },
    "modelEvaluate": {
      "package": "@pipcook/plugins-pytorch-yolov5-model-evaluate"
    }
  }
}

BTW, to locate the cause of this issue, I set `epochs` to 10 as a test, and the training went fine.

The SIGKILL seems to be caused by costa using up all the memory, after which the OS killed the process. We need to figure out a way to optimize the memory consumption.
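
One way to check the memory theory before optimizing anything is to log the process's memory while the job runs and see whether it keeps climbing across epochs. Below is a minimal sketch using Node's built-in `process.memoryUsage()`. The helper names `toMiB`, `memorySnapshot`, and `startMemoryLog` are hypothetical and not part of pipcook, and this assumes you can run the snippet inside the Node.js process that executes the training.

```typescript
// Hypothetical helpers for illustration only; none of these are pipcook APIs.

// Formats a byte count as mebibytes with one decimal place.
function toMiB(bytes: number): string {
  return (bytes / 1024 / 1024).toFixed(1) + " MiB";
}

// Returns a one-line snapshot of the current process's memory usage.
// rss is the resident set size of the whole process, so it also covers
// native (non-V8) allocations; heapUsed covers only the JavaScript heap.
function memorySnapshot(): string {
  const { rss, heapUsed, external } = process.memoryUsage();
  return `rss=${toMiB(rss)} heapUsed=${toMiB(heapUsed)} external=${toMiB(external)}`;
}

// Logs a snapshot every intervalMs milliseconds.
// Returns a function that stops the logging when called.
function startMemoryLog(intervalMs: number = 10000): () => void {
  const timer = setInterval(() => {
    console.log(new Date().toISOString(), memorySnapshot());
  }, intervalMs);
  return () => clearInterval(timer);
}
```

Call `const stop = startMemoryLog()` before training starts and `stop()` when it ends. If `rss` grows steadily from epoch to epoch until the process is killed, that would be consistent with the out-of-memory explanation. If it stays flat, the cause is probably elsewhere.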