microsoft / pai

Resource scheduling and cluster management for AI

Home Page: https://openpai.readthedocs.io

stdout/stderr: Log folder can not be retrieved

wangxianglang opened this issue

Organization Name:

Short summary about the issue/question:
Cannot access the log-manager.
The stdout/stderr view on the job details page shows: Log folder can not be retrieved.

I found that the rest-server cannot reach the worker nodes.
Here is the rest-server log:
[ERROR] Got error when retrieving log list, error: Error: connect ETIMEDOUT 10.126.62.58:9103
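
One way to confirm the timeout independently of the rest-server is to probe the same address and port from the host (or pod) where the rest-server runs. The following is a minimal sketch, assuming bash and the coreutils `timeout` command are available there; `check_port` is a hypothetical helper name, not part of OpenPAI, and 9103 is simply the port shown in the error above.

```shell
# Hypothetical helper (not part of OpenPAI): report whether a TCP port
# accepts connections within a time limit.
# Usage: check_port HOST PORT [TIMEOUT_SECONDS]
check_port() {
  host="$1"
  port="$2"
  timeout_s="${3:-5}"
  # /dev/tcp is a bash feature, so bash is invoked explicitly.
  # The connection attempt is abandoned after timeout_s seconds.
  if timeout "$timeout_s" bash -c "exec 3<>/dev/tcp/$host/$port" 2>/dev/null; then
    echo "open"
  else
    echo "unreachable"
    return 1
  fi
}
```

For example, `check_port 10.126.62.58 9103` printing `unreachable` from the rest-server side, while the same check succeeds when run on the worker node itself, would suggest the traffic is being dropped between nodes rather than the log-manager being down.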

Does anyone know the reason for this?

OpenPAI Environment:

  • OpenPAI version: 1.8.0
  • OS (e.g. from /etc/os-release): Ubuntu 20.04

I solved this problem by running:
kubectl -n kube-system set env daemonset/calico-node FELIX_IGNORELOOSERPF=true

Thanks for sharing! This saved me a lot of time. +1 for the fix.